Re: [Bacula-users] concurrent backups to disk still hangs in 1.38.1
On Thu, 17 Nov 2005, Daniel Holtkamp wrote:

> Hi !
>
> Luke Dean wrote:
>> Last month I reported a problem where my system would hang whenever I
>> ran multiple concurrent jobs that backup to a single RAID array after
>> upgrading from bacula 1.36.2 to 1.36.3. This problem still persists
>> in 1.38.1.
>
> I'm still using 1.37.40 but since you seem to have the problem in
> 1.36.2 and 1.38.1 i don't think that matters ...

I did not have the problem in 1.36.2. The problem was introduced in 1.36.3. Someone else on this list confirmed that earlier. I never tried anything in the 1.37 series though.

> I have about 10 clients that all do a backup at 1:05. Each client has
> his own job, his own pool and his own storage device. I don't know if
> by four storage daemons you mean 4x bacula-sd but i'm only running
> each service once.

I'm also running one bacula-sd. I've got four Storage sections in bacula-dir.conf, each with a unique Name and Device. I tried unique Media Types too, but that didn't help.

> Max concurrent jobs is set to 10 of course. All these backups go to a
> 350GB Raid-5 array and the directory structure is like this:
>
> ./storage/clientA/clientA.0001
> ./storage/clientB/clientB.0001
> etc
>
> I'm having absolutely NO problems whatsoever with all jobs
> concurrently writing to the raid device.

I was backing up to

  /mirror/bacula

That locks up within about 10 seconds of four jobs running concurrently. I reconfigured it to back up to

  /mirror/bacula/client1
  /mirror/bacula/client2
  etc...

It takes several minutes to lock up the machine that way, but it still freezes.

> Maybe the problem is somewhere else ?

Maybe. Are you doing software raid or hardware raid? I was using hardware raid in 1.36.2 when I didn't have problems. Now I'm using FreeBSD's gmirror as software raid. I really doubt it matters, but perhaps the internal locking is handled differently enough to be a problem.

I'll post the configuration that produces the lockup, in case anyone wants to check it for mistakes.
Here's bacula-dir.conf and bacula-sd.conf. Everything is Bacula version 1.38.1, even the file daemons on the Windows machines. The server, border, is running FreeBSD 6, and I got the same problems with FreeBSD 5.4. I'm using the 4BSD scheduler. /mirror is a gmirror.

This produces a hard freeze on border just a few seconds after multiple jobs start accessing the storage daemon at the same time. Making the changes in the storage daemon to have each Device point to a different subdirectory delays the freeze by several minutes. Changing the configuration to have only one Storage defined, or cutting back Maximum Concurrent Jobs on the director to 1, avoids the freeze.

bacula-dir.conf
--
Director {                            # define myself
  Name = border-dir
  DIRport = 9101                      # where we listen for UA connections
  QueryFile = /usr/local/share/bacula/query.sql
  WorkingDirectory = /var/db/bacula
  PidDirectory = /var/run
  Maximum Concurrent Jobs = 4
  FD Connect Timeout = 60 min         # retry for an hour
  Password = xxx                      # Console password
  Messages = Standard
}

JobDefs {
  Name = DefaultJob
  Type = Backup
  Level = Incremental
  FileSet = Windows
  Storage = File
  Messages = Standard
  Pool = Default
  Priority = 10
  Max Start Delay = 82800             # 23 hours
}

Job {
  Name = Abigail
  JobDefs = DefaultJob
  Client = Abigail-fd
  Write Bootstrap = /var/db/bacula/Abigail.bsr
  Storage = AbigailFile
  Pool = Abigail
  Schedule = WeeklyCycle1
}

Job {
  Name = Tani
  JobDefs = DefaultJob
  Client = Tani-fd
  Write Bootstrap = /var/db/bacula/Tani.bsr
  Storage = TaniFile
  Pool = Tani
  Schedule = WeeklyCycle2
}

Job {
  Name = greentower
  JobDefs = DefaultJob
  Client = greentower-fd
  Write Bootstrap = /var/db/bacula/greentower.bsr
  Storage = greentowerFile
  Pool = greentower
  Schedule = WeeklyCycle3
  FileSet = greentowerFileSet
}

Job {
  Name = border
  JobDefs = DefaultJob
  Client = border-fd
  Write Bootstrap = /var/db/bacula/border.bsr
  Storage = borderFile
  Pool = border
  Schedule = WeeklyCycle4
  FileSet = borderFileSet
}

# Backup the catalog database (after the nightly save)
Job {
  Name = BackupCatalog
  JobDefs = DefaultJob
  Client = border-fd
  Level = Full
  FileSet=Catalog
  Schedule = WeeklyCycleAfterBackup
  # This creates an ASCII copy of the catalog
  RunBeforeJob = /usr/local/share/bacula/make_catalog_backup bacula bacula
  ## This deletes the copy of the catalog
  #RunAfterJob = /usr/local/share/bacula/delete_catalog_backup
  Write Bootstrap = /var/db/bacula/BackupCatalog.bsr
  Priority = 11                       # run after main backup
}

Job {
  Name = RestoreAbigail
  Type = Restore
  Client=Abigail-fd
  FileSet=Windows
  Storage = AbigailFile
  Pool = Abigail
  Messages = Standard
  Where = /tmp/bacula-restores
}

Job {
  Name = RestoreTani
  Type = Restore
  Client=Tani-fd
  FileSet=Windows
  Storage = TaniFile
  Pool = Tani
Re: [Bacula-users] dot commands: how to use .status??
> I need to know how to use the .status command in bconsole. When I
> execute it, it keeps telling me: 1900 Bad .status command, missing
> arguments.

You have to specify what you want the status of. Try putting one of these after it:

  all
  dir=dir-name
  director
  client=client-name
  storage=storage-name

___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
[Bacula-users] concurrent backups to disk still hangs in 1.38.1
Last month I reported a problem where my system would hang whenever I ran multiple concurrent jobs that backup to a single RAID array after upgrading from bacula 1.36.2 to 1.36.3. This problem still persists in 1.38.1. I've got one director, four clients, four backup jobs (one per client), four pools (one per client), and four storage daemons (one per client). Max Concurrent Jobs on the director is upped to four so that I can back up all four machines at the same time. The storage daemons all write to the same RAID array, and I think this is the problem. The documentation says that Storage daemons can't share Devices, and I suppose that's what I'm doing. I tried setting the Devices to different folders within the array, but that didn't work. I tried giving them all different Media Types too. That didn't work either. It looks like bacula really is locking by physical device, and I can't get around that by asking it to treat folders as devices. If I were backing up to tape instead of disk, this wouldn't be an issue, I suppose. I cut back my configuration to just one Storage daemon. This works, but all four jobs have to take turns waiting on the Storage daemon. Maybe if I had all the jobs share a single pool too, they might be able to run concurrently, but I've grown to like having a separate file for each machine.
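The one-job, one-pool, one-Storage-per-client layout described above can be sketched as a small generator. This is only an illustration: the directive names and the naming scheme (`<client>-fd`, `<client>File`, `WeeklyCycleN`) are copied from the bacula-dir.conf posted elsewhere in this thread, while the `job_stanza` and `all_jobs` helpers are hypothetical and not part of Bacula.

```python
# Hypothetical helper, not part of Bacula: emits one Job resource per client
# so that the Client, Storage, Pool and bootstrap names stay consistent.
# Directive names follow the bacula-dir.conf posted in this thread.

CLIENTS = ["Abigail", "Tani", "greentower", "border"]


def job_stanza(client: str, schedule: str) -> str:
    """Return the text of one backup Job resource for the given client."""
    return (
        "Job {\n"
        f"  Name = {client}\n"
        "  JobDefs = DefaultJob\n"
        f"  Client = {client}-fd\n"
        f"  Write Bootstrap = /var/db/bacula/{client}.bsr\n"
        f"  Storage = {client}File\n"
        f"  Pool = {client}\n"
        f"  Schedule = {schedule}\n"
        "}\n"
    )


def all_jobs(clients) -> str:
    """Join one Job resource per client, numbering the schedules from 1."""
    return "\n".join(
        job_stanza(client, f"WeeklyCycle{index}")
        for index, client in enumerate(clients, start=1)
    )
```

Calling `print(all_jobs(CLIENTS))` prints four Job resources. Per-client overrides such as the FileSet lines used for greentower and border in the posted configuration are deliberately left out of this sketch.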
Re: [Bacula-users] Upgrade Bacula to 1.38 on FreeBSD 5.4
On Mon, 14 Nov 2005, Kern Sibbald wrote:

> On Monday 14 November 2005 22:20, Matt Bettinger wrote:
>> Good Afternoon,
>>
>> I decided to upgrade bacula on FreeBSD 5.4 (mysql) from 1.36? to
>> 1.38.0. It compiled fine but it appears that my mysql database tables
>> are out of date. When starting the bacula daemons I get an error:
>>
>> helpdesk# sh /usr/local/etc/rc.d/z-bacula.sh start
>> Starting the Bacula Storage daemon
>> Starting the Bacula File daemon
>> Starting the Bacula Director daemon
>> 14-Nov 15:16 bacula-dir: Fatal error: Version error for database bacula. Wanted 9, got 8
>> 14-Nov 15:16 bacula-dir: Fatal error: Could not open database bacula.
>> 14-Nov 15:16 bacula-dir: Fatal error: Version error for database bacula. Wanted 9, got 8
>> 14-Nov 15:16 bacula-dir ERROR TERMINATION
>> Please correct configuration file: /usr/local/etc/bacula-dir.conf
>> helpdesk#
>>
>> I know about the table update scripts but do not see one for MySQL
>> versions 8-9.
>>
>> helpdesk# ls
>> README                        update_postgresql_tables_7_to_8
>> update_mysql_tables_4_to_5    update_sqlite_tables_4_to_5
>> update_mysql_tables_5_to_6    update_sqlite_tables_5_to_6
>> update_mysql_tables_6_to_7    update_sqlite_tables_6_to_7
>> update_mysql_tables_7_to_8    update_sqlite_tables_7_to_8
>> helpdesk# pwd
>> /usr/ports/sysutils/bacula-server/work/bacula-1.38.0/updatedb
>>
>> Thanks for any assistance. I'd check the archives first but
>> sourceforge's site is not up at the moment and I really would like to
>> get this fixed before tonight. ;-)
>
> I cannot speak for the FreeBSD port, but the normal update script from
> one version to the next is found in src/cats/update_bacula_tables
>
> Kern

Check the README file in that folder. Per the note in there from November, these scripts aren't necessary for what you're trying to do.

(And you're farther along with the upgrade than I am, if that's any consolation.) :)

Luke Dean
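To see at a glance why the updatedb directory listed above cannot take a MySQL catalog from version 8 to version 9, the script names can be checked mechanically. The sketch below is an illustration only: it assumes nothing beyond the `update_<backend>_tables_<from>_to_<to>` naming pattern visible in that listing, and `update_chain` is a hypothetical helper, not a Bacula tool. As Kern notes, the normal script for moving to the next version is src/cats/update_bacula_tables.

```python
import re

# File names copied from the `ls` output quoted in this message.
LISTING = [
    "README",
    "update_postgresql_tables_7_to_8",
    "update_mysql_tables_4_to_5", "update_sqlite_tables_4_to_5",
    "update_mysql_tables_5_to_6", "update_sqlite_tables_5_to_6",
    "update_mysql_tables_6_to_7", "update_sqlite_tables_6_to_7",
    "update_mysql_tables_7_to_8", "update_sqlite_tables_7_to_8",
]


def update_chain(scripts, backend, have, want):
    """Return the script names that step `backend` from catalog version
    `have` to `want`, or None if any step is missing from `scripts`."""
    pattern = re.compile(
        rf"^update_{re.escape(backend)}_tables_(\d+)_to_(\d+)$"
    )
    steps = {}
    for name in scripts:
        match = pattern.match(name)
        if match:
            steps[int(match.group(1))] = (int(match.group(2)), name)

    chain = []
    version = have
    while version < want:
        if version not in steps:
            return None          # no script starts at this version
        next_version, name = steps[version]
        if next_version <= version:
            return None          # malformed name; avoid looping forever
        chain.append(name)
        version = next_version
    return chain
```

With this listing, `update_chain(LISTING, "mysql", 8, 9)` returns `None`, which matches the observation that no 8-to-9 script is present in that directory.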
[Bacula-users] v1.38 on FreeBSD with SQLite3
The bacula-server port was recently updated and I'm very happy about that! I think I've found a problem in the Makefile when I use the WITH_SQLITE3 option, and I thought I'd pass along my solution. I've submitted a PR to FreeBSD.

The line

  LIB_DEPENDS+= sqlite.3:${PORTSDIR}/databases/sqlite3

in the Makefile causes the build to fail. I believe the proper line should be

  LIB_DEPENDS+= sqlite3:${PORTSDIR}/databases/sqlite3

At least this works on mine. I think the default SQLite version 2 line has a similar typo.

Again, I'm happy to finally have the new version of bacula in the ports collection, and I'm looking forward to using it.
Re: [Bacula-users] 1.36.3 Director dying/freezing/rebooting
On Fri, 14 Oct 2005, Alan Brown wrote:

> On Wed, 12 Oct 2005, Arno Lehmann wrote:
>
>>> but right now I'm thinking that something changed between 1.36.2 and
>>> 1.36.3 that keeps me from being able to run more than one job at a
>>> time now, or my hardware just can't handle it.
>>
>> Hmm. Well, I know of people who used these versions with multiple
>> jobs without serious problems, and I did, too. And if *my* hardware
>> manages that, yours should, too. (iP200MMX, 128MB)
>
> The issue isn't hardware and it only manifests if there is a deadlock
> on simultaneous jobs requiring tapes from different pools in the same
> drive.
>
> AB

Ah yes, that is exactly the situation my configuration produces when I run multiple concurrent jobs. Also, the problem hasn't resurfaced since I reconfigured 1.36.3's director to run only one job at a time.

Thank you for the information and confirmation of a fix in 1.37.
Re: [Bacula-users] 1.36.3 Director dying/freezing/rebooting
On Tue, 11 Oct 2005, Arno Lehmann wrote:

> Hello,
>
> On 11.10.2005 08:35, Luke Dean wrote:
>> Hello, I just subscribed to the list, though I've been happily using
>> Bacula for about a year now. Last month I ran into my first serious
>> problem, and I'm not sure how to troubleshoot it. I'd been using
>> version 1.36.2 on an SMP machine running FreeBSD 5.4 (i386 platform)
>> backing up several different machines on a network to a hardware RAID
>> array. It worked great. Then I decided to put the backup
>> responsibilities on a different machine.
>
> ... upgrade to 1.36.3 on single-CPU FreeBSD 5.4 machine
>
>> Then the problems started. Often (nearly always) whenever I'd attempt
>> a full backup, the director daemon would (a) silently terminate (b)
>> cause the system to hang or (c) reboot the system. There was never
>> anything in the Bacula log, syslog, or the console message log. It
>> doesn't matter if the job starts automatically or manually from
>> bconsole. Likelihood of a problem seems directly proportional to the
>> size of the fileset.
>
> I'll remove the rest of your description - looks like you tried to
> rule out problems not related to Bacula.
>
> My first impression was that there should be something OS- or
> hardware related. After all, a reboot without log entries etc.
> usually indicates that. Anyway, what you experience might prove hard
> to analyze.
>
> Concerning bacula - I understand you are using file storage only and
> your backups are running rather unreliably right now. I'd suggest
> upgrading to the current development version (1.37.40) and seeing if
> that fixes your problems (I guess it will not, but you never know).
> There were, as far as I remember, some deadlock problems in the 1.36
> versions which should be fixed in 1.37. An upgrade to 1.37.40 will
> require a catalog database change, but the configuration can remain
> (mostly) unchanged. Personally, I consider 1.37 stable since
> 1.37.3something, although it is not tested as thoroughly as a release
> version, of course.
>
> Anyway, even if this doesn't fix anything for you, you will not lose
> much considering the current situation :-)
>
> Then it would seem useful to analyze the server crashes, reboots, and
> hangs. The first step I'd take is to set up system logging to another
> host - that can sometimes catch the last log messages before or
> during a crash.
>
> Then I'd suggest removing the new disk controller - that seems to be
> the only new hardware that can physically reset or hold your machine.
> Pull it out and use a test-setup for your backups. For example, set
> up disk volumes with very short retention times and limited size.
> Have them automatically recycled, and let some big jobs run on them.
> Of course, you will not be able to use these backups - they will
> overwrite their own data - but as far as I know you can (still) do
> this and it allows testing with limited disk space.
>
> Then run bacula with debug output enabled and capture the files,
> which, in case of a crash, might be difficult. NFS mount and
> synchronous writing could be one solution for the logging directory.
> See if you can determine if bacula always does the same when the
> server crashes.
>
> And, of course, observe the temperature in your server and of your
> disks. I have an old machine I use as file server, and during normal
> operation without many accesses the disks report temperatures of more
> than 50 degrees (Celsius, of course). I wouldn't try to use that
> setup for high throughput applications...
>
> Arno

I haven't tried 1.37 yet, but I did try several of your other suggestions. I eventually figured out how to run the director inside the debugger, get some debugging information, and watch the machine lock up. What I saw was that the director tends to run in a loop where it talks to the other daemons, and occasionally gets interrupted with a scheduling routine. The system freeze would happen whenever the director tried to talk to file daemons on multiple machines at the same time.
On a whim, I changed the Maximum Concurrent Jobs setting in my director configuration from 4 back down to the default of 1. Sometime in 2004 on the machine I used to run bacula on, I experimented with concurrent jobs and had great success with it, so I didn't think anything about keeping the same configuration on this new machine and new version of bacula. Since I've made that change, I've queued up a lot of full backup jobs, and bacula has been chewing through them just great for the last five hours now. Admittedly the whole server could crash any minute now, and it's likely waiting until I send this email just to spite me, but right now I'm thinking that something changed between 1.36.2 and 1.36.3 that keeps me from being able to run more than one job at a time now, or my hardware just can't handle it. Either way, if the system keeps running like this, I'm happy. I'll probably just need to reorder my jobs and cut down the retry time for those clients that sometimes get turned off at night so I
[Bacula-users] 1.36.3 Director dying/freezing/rebooting
Hello, I just subscribed to the list, though I've been happily using Bacula for about a year now. Last month I ran into my first serious problem, and I'm not sure how to troubleshoot it.

I'd been using version 1.36.2 on an SMP machine running FreeBSD 5.4 (i386 platform) backing up several different machines on a network to a hardware RAID array. It worked great.

Then I decided to put the backup responsibilities on a different machine. This one is a single-processor AMD machine that also runs the i386 port of FreeBSD 5.4. It's headless, and it also runs a webserver, mailserver, firewall, and several other always-on services, so I thought it would be good to give it the backup responsibilities too. I got a SATA controller and a couple of 200GB drives to hold the backups, set up gmirroring on them, and installed Bacula 1.36.3 from the ports collection, since that was the new current version. I chose the default SQLite database, since I was familiar with it and had always been happy with it before. I copied my old configuration over to the new machine, changing names and addresses where appropriate. I built Bacula from source using the ports collection with the default options. I'm not using any graphical components at all.

Then the problems started. Often (nearly always) whenever I'd attempt a full backup, the director daemon would (a) silently terminate (b) cause the system to hang or (c) reboot the system. There was never anything in the Bacula log, syslog, or the console message log. It doesn't matter if the job starts automatically or manually from bconsole. Likelihood of a problem seems directly proportional to the size of the fileset.

My first suspicion was hardware, since the new and old machines were running the same OS and almost the same version of Bacula with almost the same configuration. First I replaced my hub with a switch, since I was getting tons of packet collisions.
This improved my traffic situation, but then I realized that the director would sometimes die even on a totally local backup, so that rules out network problems.

Next I suspected a problem with the new controller, drives, or gmirror configuration. I stress-tested these drives as much as I could, copying huge amounts of data in several different threads all at the same time, pushing the drives to the limits according to gstat, but never had any problems. I'm not ruling out bad hardware or gmirror problems, but if that is the problem, I don't know how to prove it. Simply loading down the drives with prolonged heavy write activity doesn't seem to cause a problem.

Then I decided to upgrade all the file daemons on my network from 1.36.2 to 1.36.3, just in case there was some compatibility problem between the two versions. No change.

Through sheer persistence and luck, I managed to get Bacula to make full backups of all the machines on my network. I left Bacula running, and it ran fine for most of a month doing small incremental backups... but when it came time for some new full backups, the system hung again.

Next I started over from scratch and tried a different database. I already had SQLite3 on this machine and I thought perhaps there was some conflict with the SQLite2 that Bacula used. I switched from SQLite to PostgreSQL 8.0. No change. The director still usually terminates, hangs the system, or reboots the system soon after I begin a full backup of anything.

I haven't tried getting a traceback. I thought I'd try to get more information on how to proceed before I crash my server any more. I've gone several pages into the bugs database and don't see anything relevant that hasn't already been fixed.

I've got to believe that this is a hardware/OS problem that I don't know how to isolate, some bizarre configuration problem that this machine has that the other machine did not, or a difference between 1.36.2 and 1.36.3.
Thank you for this great software and any hints on how I could proceed.
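For anyone who wants to repeat the disk stress test described in this message (several threads writing large amounts of data at once), here is a minimal sketch. It is an assumption-laden illustration rather than the commands actually used: the `write_stream` and `stress` helpers, the per-client directory layout, and the sizes are all hypothetical. Note that this kind of synthetic load did not reproduce the hang on the machine in question.

```python
import os
import threading


def write_stream(path, total_bytes, chunk_size=1 << 20):
    """Write total_bytes of zeros to path in chunks, then fsync the file."""
    chunk = b"\0" * chunk_size
    written = 0
    with open(path, "wb") as handle:
        while written < total_bytes:
            size = min(chunk_size, total_bytes - written)
            handle.write(chunk[:size])
            written += size
        handle.flush()
        os.fsync(handle.fileno())   # push the data to the device
    return written


def stress(base_dir, writers=4, bytes_per_writer=4 << 20):
    """Run `writers` threads, each writing its own file in its own
    subdirectory of base_dir. Returns {path: bytes_written}."""
    results = {}

    def worker(index):
        directory = os.path.join(base_dir, f"client{index}")
        os.makedirs(directory, exist_ok=True)
        path = os.path.join(directory, f"client{index}.0001")
        results[path] = write_stream(path, bytes_per_writer)

    threads = [
        threading.Thread(target=worker, args=(index,))
        for index in range(1, writers + 1)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

To load a real array, point `base_dir` at a scratch directory on it and raise `bytes_per_writer` to several gigabytes, while watching gstat in another terminal.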