Re: [Bacula-users] concurrent backups to disk still hangs in 1.38.1

2005-11-17 Thread Luke Dean



On Thu, 17 Nov 2005, Daniel Holtkamp wrote:


Hi!

Luke Dean wrote:

Last month I reported a problem where my system would hang whenever I ran 
multiple concurrent jobs that backup to a single RAID array after upgrading 
from bacula 1.36.2 to 1.36.3.  This problem still persists in 1.38.1.


I'm still using 1.37.40, but since you seem to have the problem in both 1.36.2 and 
1.38.1, I don't think that matters...


I did not have the problem in 1.36.2.  The problem was introduced in 
1.36.3.  Someone else on this list confirmed that earlier.  I never tried 
anything in the 1.37 series though.


I have about 10 clients that all do a backup at 1:05. Each client has its own 
job, its own pool and its own storage device. I don't know if by four storage 
daemons you mean four instances of bacula-sd, but I'm only running each service once.


I'm also running one bacula-sd.  I've got four Storage sections in 
bacula-dir.conf, each with a unique Name and Device.  I tried unique 
Media Types too, but that didn't help.




Max concurrent jobs is set to 10 of course.

All these backups go to a 350GB Raid-5 array and the directory structure is 
like this:

./storage/clientA/clientA.0001
./storage/clientB/clientB.0001
etc

I'm having absolutely NO problems whatsoever with all jobs concurrently 
writing to the RAID device.


I was backing up to
/mirror/bacula
That locks up within about 10 seconds of four jobs running concurrently.
I reconfigured it to back up to
/mirror/bacula/client1
/mirror/bacula/client2
etc...
It takes several minutes to lock up the machine that way, but it still 
freezes.



Maybe the problem is somewhere else?


Maybe.  Are you doing software raid or hardware raid?  I was using 
hardware raid in 1.36.2 when I didn't have problems.  Now I'm using 
FreeBSD's gmirror as software raid.  I really doubt it matters, but 
perhaps the internal locking is handled differently enough to be a 
problem.


I'll post the configuration that produces the lockup, in case anyone 
wants to check it for mistakes.  Here's bacula-dir.conf and 
bacula-sd.conf.
Everything is Bacula version 1.38.1, even the file daemons on the Windows 
machines.
border is running FreeBSD 6, and had the same problems with FreeBSD 5.4. 
I'm using the 4BSD scheduler.

/mirror is a gmirror.
This produces a hard freeze on border just a few seconds after multiple 
jobs start accessing the storage daemon at the same time.  Making the 
changes in the storage daemon to have each Device point to a different 
subdirectory delays the freeze by several minutes.  Changing the 
configuration to only have one Storage defined or cutting back Maximum 
Concurrent Jobs on the director to 1 avoids the freeze.
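As a hedged illustration only, the per-subdirectory layout described above might look roughly like the fragment below, with each director Storage resource pointing at its own storage-daemon Device and each Device writing to its own directory on the gmirror volume. The Device names, Address = border, SDPort, and the per-device Media Type values are assumptions, not the actual settings from this setup; only two of the four clients are shown.

```
# bacula-sd.conf (hypothetical sketch): one Device per client,
# each writing to its own subdirectory of the gmirror volume
Device {
  Name = AbigailFileDevice          # hypothetical name
  Media Type = FileAbigail          # unique Media Type per Device (assumption)
  Archive Device = /mirror/bacula/client1
  LabelMedia = yes                  # let Bacula label new volumes
  Random Access = yes               # disk, not tape
  AutomaticMount = yes
  RemovableMedia = no
  AlwaysOpen = no
}
Device {
  Name = TaniFileDevice             # hypothetical name
  Media Type = FileTani             # unique Media Type per Device (assumption)
  Archive Device = /mirror/bacula/client2
  LabelMedia = yes
  Random Access = yes
  AutomaticMount = yes
  RemovableMedia = no
  AlwaysOpen = no
}

# bacula-dir.conf (hypothetical sketch): matching Storage resources
Storage {
  Name = AbigailFile                # used by the Abigail job
  Address = border                  # assumed storage daemon host
  SDPort = 9103
  Password = xxx                    # must match the SD's Director password
  Device = AbigailFileDevice        # must match the SD Device Name
  Media Type = FileAbigail          # must match the SD Device Media Type
}
Storage {
  Name = TaniFile                   # used by the Tani job
  Address = border
  SDPort = 9103
  Password = xxx
  Device = TaniFileDevice
  Media Type = FileTani
}
```

With Maximum Concurrent Jobs = 4 in the Director resource, jobs that use different Storage resources are allowed to run at the same time; per the report above, this arrangement only delays the freeze rather than avoiding it.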


bacula-dir.conf
--
Director {                            # define myself
  Name = border-dir
  DIRport = 9101                # where we listen for UA connections
  QueryFile = /usr/local/share/bacula/query.sql
  WorkingDirectory = /var/db/bacula
  PidDirectory = /var/run
  Maximum Concurrent Jobs = 4
  FD Connect Timeout = 60 min  #retry for an hour
  Password = xxx # Console password
  Messages = Standard
}

JobDefs {
  Name = DefaultJob
  Type = Backup
  Level = Incremental
  FileSet = Windows
  Storage = File
  Messages = Standard
  Pool = Default
  Priority = 10
  Max Start Delay = 82800  #23 hours
}

Job {
  Name = Abigail
  JobDefs = DefaultJob
  Client = Abigail-fd
  Write Bootstrap = /var/db/bacula/Abigail.bsr
  Storage = AbigailFile
  Pool = Abigail
  Schedule = WeeklyCycle1
}
Job {
  Name = Tani
  JobDefs = DefaultJob
  Client = Tani-fd
  Write Bootstrap = /var/db/bacula/Tani.bsr
  Storage = TaniFile
  Pool = Tani
  Schedule = WeeklyCycle2
}
Job {
  Name = greentower
  JobDefs = DefaultJob
  Client = greentower-fd
  Write Bootstrap = /var/db/bacula/greentower.bsr
  Storage = greentowerFile
  Pool = greentower
  Schedule = WeeklyCycle3
  FileSet = greentowerFileSet
}
Job {
  Name = border
  JobDefs = DefaultJob
  Client = border-fd
  Write Bootstrap = /var/db/bacula/border.bsr
  Storage = borderFile
  Pool = border
  Schedule = WeeklyCycle4
  FileSet = borderFileSet
}


# Backup the catalog database (after the nightly save)
Job {
  Name = BackupCatalog
  JobDefs = DefaultJob
  Client = border-fd
  Level = Full
  FileSet=Catalog
  Schedule = WeeklyCycleAfterBackup
  # This creates an ASCII copy of the catalog
  RunBeforeJob = /usr/local/share/bacula/make_catalog_backup bacula bacula
  ## This deletes the copy of the catalog
  #RunAfterJob  = /usr/local/share/bacula/delete_catalog_backup
  Write Bootstrap = /var/db/bacula/BackupCatalog.bsr
  Priority = 11   # run after main backup
}

Job {
  Name = RestoreAbigail
  Type = Restore
  Client=Abigail-fd
  FileSet=Windows
  Storage = AbigailFile
  Pool = Abigail
  Messages = Standard
  Where = /tmp/bacula-restores
}
Job {
  Name = RestoreTani
  Type = Restore
  Client=Tani-fd
  FileSet=Windows
  Storage = TaniFile
  Pool = Tani

Re: [Bacula-users] dot commands: how to use .status??

2005-11-17 Thread Luke Dean



I need to know how to use the .status command in bconsole. When I 
execute it, it keeps telling me:

1900 Bad .status command, missing arguments.


You have to specify what you want the status of.

Try putting one of these after it:
all
dir=dir-name
director
client=client-name
storage=storage-name


---
This SF.Net email is sponsored by the JBoss Inc.  Get Certified Today
Register for a JBoss Training Course.  Free Certification Exam
for All Training Attendees Through End of 2005. For more info visit:
http://ads.osdn.com/?ad_id=7628alloc_id=16845op=click
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


[Bacula-users] concurrent backups to disk still hangs in 1.38.1

2005-11-16 Thread Luke Dean


Last month I reported a problem where my system would hang whenever I ran 
multiple concurrent jobs that backup to a single RAID array after 
upgrading from bacula 1.36.2 to 1.36.3.  This problem still persists in 
1.38.1.


I've got one director, four clients, four backup jobs (one per client), 
four pools (one per client), and four storage daemons (one per client). 
Max Concurrent Jobs on the director is upped to four so that I can back up 
all four machines at the same time.


The storage daemons all write to the same RAID array, and I think this is 
the problem.  The documentation says that Storage daemons can't share 
Devices, and I suppose that's what I'm doing.


I tried setting the Devices to different folders within the array, but 
that didn't work.  I tried giving them all different Media Types too. 
That didn't work either.  It looks like bacula really is locking by 
physical device, and I can't get around that by asking it to treat folders 
as devices.  If I were backing up to tape instead of disk, this wouldn't 
be an issue, I suppose.


I cut back my configuration to just one Storage daemon.  This works, but 
all four jobs have to take turns waiting on the Storage daemon.
Maybe if I had all the jobs share a single pool too, they might be able to 
run concurrently, but I've grown to like having a separate file for each 
machine.





Re: [Bacula-users] Upgrade Bacula to 1.38 on FreeBSD 5.4

2005-11-14 Thread Luke Dean



On Mon, 14 Nov 2005, Kern Sibbald wrote:


On Monday 14 November 2005 22:20, Matt Bettinger wrote:

Good Afternoon,

I decided to upgrade Bacula on FreeBSD 5.4 (MySQL) from 1.36? to 1.38.0.
It compiled fine but it appears that my mysql database tables are out of
date.  When starting the bacula daemons I get an error:

helpdesk# sh /usr/local/etc/rc.d/z-bacula.sh start
Starting the Bacula Storage daemon
Starting the Bacula File daemon
Starting the Bacula Director daemon
14-Nov 15:16 bacula-dir:  Fatal error: Version error for database
bacula. Wanted 9, got 8
14-Nov 15:16 bacula-dir:  Fatal error: Could not open database bacula.
14-Nov 15:16 bacula-dir:  Fatal error: Version error for database
bacula. Wanted 9, got 8
14-Nov 15:16 bacula-dir ERROR TERMINATION
Please correct configuration file: /usr/local/etc/bacula-dir.conf
helpdesk#

I know about the table update scripts but do not see one for MySQL
versions 8-9.

helpdesk# ls
README  update_postgresql_tables_7_to_8
update_mysql_tables_4_to_5  update_sqlite_tables_4_to_5
update_mysql_tables_5_to_6  update_sqlite_tables_5_to_6
update_mysql_tables_6_to_7  update_sqlite_tables_6_to_7
update_mysql_tables_7_to_8  update_sqlite_tables_7_to_8
helpdesk# pwd
/usr/ports/sysutils/bacula-server/work/bacula-1.38.0/updatedb

Thanks for any assistance.  I'd check the archives first but
SourceForge's site is not up at the moment and I really would like to
get this fixed before tonight.  ;-)


I cannot speak for the FreeBSD port, but the normal update script from one
version to the next is found in src/cats/update_bacula_tables

Kern


Check the README file in that folder.  Per the note in there from 
November, these scripts aren't necessary for what you're trying to do.
(And you're farther along with the upgrade than I am, if that's any 
consolation.)  :)


Luke Dean




[Bacula-users] v1.38 on FreeBSD with SQLite3

2005-11-14 Thread Luke Dean


The bacula-server port was recently updated and I'm very happy about that!

I think I've found a problem in the Makefile when I use the WITH_SQLITE3 
option, and I thought I'd pass along my solution.

I've submitted a PR to FreeBSD.

The line
LIB_DEPENDS+=   sqlite.3:${PORTSDIR}/databases/sqlite3
in the Makefile causes the build to fail.


I believe the proper line should be
LIB_DEPENDS+=   sqlite3:${PORTSDIR}/databases/sqlite3

At least this works on mine.

I think the default SQLite version 2 line has a similar typo.

Again, I'm happy to finally have the new version of bacula in the ports 
collection, and I'm looking forward to using it.





Re: [Bacula-users] 1.36.3 Director dying/freezing/rebooting

2005-10-14 Thread Luke Dean



On Fri, 14 Oct 2005, Alan Brown wrote:


On Wed, 12 Oct 2005, Arno Lehmann wrote:

but right now I'm thinking that something changed between 1.36.2 and 
1.36.3 that keeps me from being able to run more than one job at a time 
now, or my hardware just can't handle it.


Hmm. Well, I know of people who used these versions with multiple jobs 
without serious problems, and I did, too. And if *my* hardware manages that, 
yours should, too. (iP200MMX, 128MB)


The issue isn't hardware and it only manifests if there is a deadlock on 
simultaneous jobs requiring tapes from different pools in the same drive.


AB


Ah yes, that is exactly the situation my configuration produces when I 
run multiple concurrent jobs. 
Also, the problem hasn't resurfaced since I reconfigured 1.36.3's director to 
run only one job at a time.

Thank you for the information and confirmation of a fix in 1.37.




Re: [Bacula-users] 1.36.3 Director dying/freezing/rebooting

2005-10-12 Thread Luke Dean



On Tue, 11 Oct 2005, Arno Lehmann wrote:


Hello,

On 11.10.2005 08:35, Luke Dean wrote:



Hello, I just subscribed to the list, though I've been happily using Bacula 
for about a year now.  Last month I ran into my first serious problem, and 
I'm not sure how to troubleshoot it.


I'd been using version 1.36.2 on an SMP machine running FreeBSD 5.4 (i386 
platform) backing up several different machines on a network to a hardware 
RAID array.  It worked great.


Then I decided to put the backup responsibilities on a different machine.

... upgrade to 1.36.3 on single-CPU FreeBSD 5.4 machine


Then the problems started.

Often (nearly always) whenever I'd attempt a full backup, the director 
daemon would (a) silently terminate (b) cause the system to hang or (c) 
reboot the system.  There was never anything in the Bacula log, syslog, or 
the console message log.  It doesn't matter if the job starts automatically 
or manually from bconsole.  The likelihood of a problem seems directly 
proportional to the size of the fileset.


I'll remove the rest of your description - looks like you tried to rule out 
problems not related to Bacula.


My first impression was that there should be something OS- or hardware 
related. After all, a reboot without log entries etc. usually indicates that. 
Anyway, what you experience might prove hard to analyze.


Concerning Bacula: I understand you are using file storage only and your 
backups are running rather unreliably right now.
I'd suggest upgrading to the current development version (1.37.40) and seeing 
if that fixes your problems (I guess it will not, but you never know). There 
were, as far as I remember, some deadlock problems in the 1.36 versions which 
should be fixed in 1.37.


An upgrade to 1.37.40 will require a catalog database change, but the 
configuration can remain (mostly) unchanged. Personally, I have considered 1.37 
stable since 1.37.3-something, although it is not tested as thoroughly as a 
release version, of course. Anyway, even if this doesn't fix anything for you, 
you will not lose much considering the current situation :-)


Then it would seem useful to analyze the server crashes, reboots, and hangs.

The first step I'd take is to set up system logging to another host - that 
can sometimes catch the last log messages before or during a crash.


Then I'd suggest removing the new disk controller - that seems to be the only 
new hardware that can physically reset or hold your machine. Pull it out and 
use a test-setup for your backups. For example, set up disk volumes with very 
short retention times and limited size. Have them automatically recycled, and 
let some big jobs run on them. Of course, you will not be able to use these 
backups - they will overwrite their own data - but as far as I know you can 
(still) do this and it allows testing with limited disk space.
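A test pool along the lines suggested above might be sketched like this in bacula-dir.conf. The pool name, retention period, volume size, and volume count are arbitrary assumptions chosen only to keep disk usage small, and automatic labeling assumes the file Device has LabelMedia = yes.

```
# Hypothetical throwaway pool for stress-testing disk backups.
# Volumes are small, expire quickly and are recycled automatically,
# so the data written here is NOT usable for restores.
Pool {
  Name = StressTest                 # hypothetical pool name
  Pool Type = Backup
  Recycle = yes                     # reuse volumes once they are purged
  AutoPrune = yes                   # prune expired jobs/files automatically
  Volume Retention = 2 hours        # short retention so volumes expire fast
  Maximum Volume Bytes = 1073741824 # cap each volume at about 1 GB
  Maximum Volumes = 3               # never create more than three volumes
  Recycle Oldest Volume = yes       # when none is free, prune and recycle the oldest
  Label Format = "StressTest-"      # auto-label new volumes (needs LabelMedia = yes)
}
```

A large job pointed at this pool with Pool = StressTest can then be run repeatedly to generate sustained write load without filling the array.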


Then run Bacula with debug output enabled and capture the files, which, in 
case of a crash, might be difficult. Mounting the logging directory over NFS 
with synchronous writes could be one solution. See if you can determine whether 
Bacula always does the same thing when the server crashes.


And, of course, observe the temperature in your server and of your disks. I 
have an old machine I use as file server, and during normal operation without 
many accesses the disks report temperatures of more than 50 degrees (Celsius, 
of course). I wouldn't try to use that setup for high throughput 
applications...


Arno


I haven't tried 1.37 yet, but I did try several of your other suggestions.
I eventually figured out how to run the director inside the debugger, get 
some debugging information, and watch the machine lock up.  What I saw was 
that the director tends to run in a loop where it talks to the other 
daemons, and occasionally gets interrupted with a scheduling routine.  The 
system freeze would happen whenever the director tried to talk to file 
daemons on multiple machines at the same time.


On a whim, I changed the Maximum Concurrent Jobs setting in my director 
configuration from 4 back down to the default of 1.  Sometime in 2004 on 
the machine I used to run bacula on, I experimented with concurrent jobs 
and had great success with it, so I didn't think anything about keeping 
the same configuration on this new machine and new version of bacula.


Since I've made that change, I've queued up a lot of full backup jobs, and 
bacula has been chewing through them just great for the last five hours 
now.


Admittedly the whole server could crash any minute now, and it's likely 
waiting until I send this email just to spite me, but right now I'm 
thinking that something changed between 1.36.2 and 1.36.3 that keeps me 
from being able to run more than one job at a time now, or my hardware 
just can't handle it.  Either way, if the system keeps running like this, 
I'm happy.  I'll probably just need to reorder my jobs and cut down the 
retry time for those clients that sometimes get turned off at night so I

[Bacula-users] 1.36.3 Director dying/freezing/rebooting

2005-10-11 Thread Luke Dean


Hello, I just subscribed to the list, though I've been happily using 
Bacula for about a year now.  Last month I ran into my first serious 
problem, and I'm not sure how to troubleshoot it.


I'd been using version 1.36.2 on an SMP machine running FreeBSD 5.4 (i386 
platform) backing up several different machines on a network to a hardware 
RAID array.  It worked great.


Then I decided to put the backup responsibilities on a different machine. 
This one is a single-processor AMD machine that also runs the i386 port of 
FreeBSD 5.4.  It's headless, and it also runs a webserver, mailserver, 
firewall, and several other always on services, so I thought it would be 
good to give it the backup responsibilities too.  I got a SATA controller 
and a couple of 200GB drives to hold the backups, set up gmirroring on 
them, and installed Bacula 1.36.3 from the ports collection, since that 
was the new current version.  I chose the default SQLite database, since I 
was familiar with it and had always been happy with it before.  I copied 
my old configuration over to the new machine, changing names and addresses 
where appropriate.  I built Bacula from source using the ports 
collection with the default options.  I'm not using any graphical 
components at all.


Then the problems started.

Often (nearly always) whenever I'd attempt a full backup, the director 
daemon would (a) silently terminate (b) cause the system to hang or (c) 
reboot the system.  There was never anything in the Bacula log, syslog, or 
the console message log.  It doesn't matter if the job starts 
automatically or manually from bconsole.  The likelihood of a problem 
seems directly proportional to the size of the fileset.


My first suspicion was hardware, since the new and old machines were 
running the same OS and almost the same version of Bacula with almost the 
same configuration.


First I replaced my hub with a switch, since I was getting tons of packet 
collisions.  This improved my traffic situation, but then I realized that 
the director would sometimes die even on a totally local backup, so that 
rules out network problems.


Next I suspected a problem with the new controller, drives, or gmirror 
configuration.  I stress-tested these drives as much as I could, copying 
huge amounts of data in several different threads all at the same time, 
pushing the drives to the limits according to gstat, but never had any 
problems.  I'm not ruling out bad hardware or gmirror problems, but if 
that is the problem, I don't know how to prove it.  Simply loading down 
the drives with prolonged heavy write activity doesn't seem to cause a 
problem.


Then I decided to upgrade all the file daemons on my network from 1.36.2 
to 1.36.3, just in case there was some compatibility problem between the 
two versions.  No change.


Through sheer persistence and luck, I managed to get Bacula to make full 
backups of all the machines on my network.  I left Bacula running, and it 
ran fine for most of a month doing small incremental backups... but when 
it came time for some new full backups, the system hung again.


Next I started over from scratch and tried a different database.  I 
already had SQLite3 on this machine and I thought perhaps there was some 
conflict with the SQLite2 that Bacula used.  I switched from SQLite to 
PostgreSQL 8.0.  No change.  The director still usually terminates, hangs
the system, or reboots the system soon after I begin a full backup of 
anything.


I haven't tried getting a traceback.  I thought I'd try to get more 
information on how to proceed before I crash my server anymore.  I've gone 
several pages into the bugs database and don't see anything relevant that 
hasn't already been fixed.  I've got to believe that this is a hardware/OS 
problem that I don't know how to isolate, some bizarre configuration 
problem that this machine has that the other machine did not, or a 
difference between 1.36.2 and 1.36.3.


Thank you for this great software and any hints on how I could proceed.

