Re: AW: [Bacula-users] Bacula director freezing
On Mon, 30 May 2005, Masopust Christian wrote:

> Maybe you'll have a look at bugs.bacula.org, bug 331. I had a similar problem where bacula-dir randomly hangs. After applying Kern's patch it hasn't happened again so far, but before closing this bug I would prefer to wait at least one week ;-))

As there is a HIGH chance that there will be attempts to use different volumes on the same drive, this fix won't work for me.

AB

---
This SF.Net email is sponsored by Yahoo.
Introducing Yahoo! Search Developer Network - Create apps using Yahoo! Search APIs
Find out how you can build Yahoo! directly into your own Applications - visit http://developer.yahoo.net/?fr=offad-ysdn-ostg-q22005
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: AW: [Bacula-users] Bacula director freezing
On Tuesday 31 May 2005 16:14, Alan Brown wrote:
> On Tue, 31 May 2005, Kern Sibbald wrote:
>
>> On version 1.37.20, providing you are using the new Autochanger resource in the SD, your second job will automatically select another drive if one is available, otherwise wait.
>
> Management won't let me test unstable versions.

Yes, I imagined so. This is the case with most bigger operations ...

>> Within a week or so, I hope to ensure that Bacula will never try to load the same tape simultaneously on two drives -- I received my 2 drive autochanger this morning and confirmed that it works a few minutes ago. My biggest problem is programming with it turned on, since it makes almost as much noise as a vacuum cleaner :-)
>
> Yes, this is always a problem with the Overland beasties. They are designed for use in a machine room - and they can pick up dust as effectively as a vacuum cleaner too, so a suitable enclosure wouldn't hurt.

At the moment, it is probably the worst of all situations -- a desktop unit sitting on my carpeted floor next to a window. As soon as possible (when I get another SCSI card, and when a couple of guys with strong arms are around), it will go down two flights of stairs into the basement with closed windows on a real desktop next to my servers.

-- 
Best regards,

Kern
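For readers unfamiliar with the Autochanger resource mentioned above, a two-drive setup in the Storage daemon's configuration might look roughly like the sketch below. Everything specific in it is an assumption for illustration: the resource and device names, the device paths, the media type and the mtx-changer location are made up, the Device resources omit their other directives, and the directive spelling follows the later Bacula manual rather than being checked against 1.37.20.

```
# bacula-sd.conf (sketch only; all names and paths are hypothetical)
Autochanger {
  Name = Changer-1
  # Both drives belong to the same changer, so the SD can pick a free one
  Device = Drive-0, Drive-1
  Changer Device = /dev/sg0
  Changer Command = "/etc/bacula/mtx-changer %c %o %S %a %d"
}

Device {
  Name = Drive-0
  Drive Index = 0
  Media Type = LTO
  Archive Device = /dev/nst0
  Autochanger = yes
  # remaining Device directives omitted
}

Device {
  Name = Drive-1
  Drive Index = 1
  Media Type = LTO
  Archive Device = /dev/nst1
  Autochanger = yes
  # remaining Device directives omitted
}
```

With a layout like this, jobs would be pointed at the changer rather than at an individual drive, which is what lets a second job land on the other drive instead of contending for the first.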
AW: [Bacula-users] Bacula director freezing
Maybe you'll have a look at bugs.bacula.org, bug 331. I had a similar problem where bacula-dir randomly hangs. After applying Kern's patch it hasn't happened again so far, but before closing this bug I would prefer to wait at least one week ;-))

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On behalf of Alan Brown
Sent: Thursday, 26 May 2005 17:32
To: Arno Lehmann
Cc: Ali Zaidi; bacula-users@lists.sourceforge.net
Subject: Re: [Bacula-users] Bacula director freezing

On Sat, 21 May 2005, Arno Lehmann wrote:

>> However for the last two Friday nights the bacula director has been freezing after backing up the first seven clients.
>
> I did experience the same, couldn't find any reason, but after the upgrade to 1.36.2 that didn't happen again. So, I suggest you do an upgrade to that version or the current release version 1.36.3.

It's happening for me on 1.36.2 and 1.36.3.

>> Another peculiar thing I saw was that the number of bacula-dir processes on the system increases from 4 to around 21.
>
> That's normal - that's all the worker threads for the different jobs and bookkeeping. I don't know how many I had, but when the director crashed I usually had four jobs running and saw about 10-20 jobs / threads.

It's about the same number as the maximums set in the config files.

AB
Re: AW: [Bacula-users] Bacula director freezing
Please see bug report 331 (if I am not mistaken). I've uploaded a correction that should fix the problem.

On Wednesday 25 May 2005 16:09, Jeffery P. Humes wrote:
> I am not going to be much help here, but just wanted to say that I am having the same issue with (I believe) the director freezing. It is seemingly random. Sometimes it stops responding every other day, sometimes it will go 1-2 weeks. I have been running this version of Bacula for about 2 months.
>
> Version: kninfratemp-dir Version: 1.36.2 (28 February 2005) (with Tape EOF restore patch applied)
>
> I will most likely upgrade to 1.36.3 in the near future. I just don't even know where to start troubleshooting this; I don't get a traceback at all when it freezes.
>
> -Jeff Humes
>
> Masopust Christian wrote:
>> Hi Kern,
>>
>> all right, I submitted this problem as a bug (331). I'm not sure if this is really a problem with a timeout, as I don't have any time limits configured in my config. The freeze of the director occurred when trying to start the first job in the evening. The last job that ran before was at 2pm and it finished without problems.
>>
>> Anyway, the bug is submitted, and thanks for your help! (But first, please enjoy your holidays!!)
>>
>> chris
>>
>> -----Original Message-----
>> From: Kern Sibbald [mailto:[EMAIL PROTECTED]]
>> Sent: Tuesday, 24 May 2005 22:42
>> To: bacula-users@lists.sourceforge.net
>> Cc: Masopust Christian
>> Subject: Re: [Bacula-users] Bacula director freezing
>>
>> Hello,
>>
>> This appears to be a deadlock situation, and seems to be triggered by a watchdog timeout, which means you have probably set some maximum time limit for a job. Though the deadlock could be related to version 1.36.3, I'd be a bit surprised. At this point, I cannot exclude a 1.36.3 specific problem, so I'll carefully check that after returning from vacation.
>>
>> I'd appreciate it if you would submit this traceback as a bug report along with your Director's conf file.
>>
>> On Tuesday 24 May 2005 13:25, Masopust Christian wrote:
>>> Yesterday in the evening, just when starting some jobs, my director froze again... here's the output of btraceback (my system is Fedora Core 3, Bacula is 1.36.3):
>>>
>>> From [EMAIL PROTECTED] Mon May 23 22:01:32 2005
>>> Return-Path: [EMAIL PROTECTED]
>>> Received: from atpcc7fc.sie.siemens.at (atpcc7fc.sie.siemens.at [127.0.0.1]) by atpcc7fc.sie.siemens.at (8.13.1/8.13.1) with SMTP id j4NK1VFi027151 for [EMAIL PROTECTED]; Mon, 23 May 2005 22:01:31 +0200
>>> Message-Id: [EMAIL PROTECTED]
>>> From: [EMAIL PROTECTED]
>>> Subject: Bacula GDB traceback of bacula-dir
>>> Sender: [EMAIL PROTECTED]
>>> To: [EMAIL PROTECTED]
>>> Date: Mon, 23 May 2005 22:01:31 +0200
>>> Status: R
>>>
>>> Using host libthread_db library "/lib/libthread_db.so.1".
>>> [Thread debugging using libthread_db enabled]
>>> [New Thread 16384 (LWP 3346)]
>>> [New Thread 32769 (LWP 3351)]
>>> [Thread debugging using libthread_db enabled]
>>> [New Thread 16384 (LWP 3346)]
>>> [New Thread 32769 (LWP 3351)]
>>> [Thread debugging using libthread_db enabled]
>>> [New Thread 16384 (LWP 3346)]
>>> [New Thread 32769 (LWP 3351)]
>>> [New Thread 16386 (LWP 3352)]
>>> [New Thread 32771 (LWP 3353)]
>>> [New Thread 19726340 (LWP 26151)]
>>> [New Thread 19742725 (LWP 26152)]
>>> [New Thread 19759110 (LWP 26164)]
>>> [New Thread 19775495 (LWP 26172)]
>>> [New Thread 19791880 (LWP 26180)]
>>> [New Thread 19808265 (LWP 26203)]
>>> [New Thread 19824650 (LWP 26267)]
>>> [New Thread 19841035 (LWP 26294)]
>>> [New Thread 19857420 (LWP 26320)]
>>> [New Thread 19873805 (LWP 26381)]
>>> [New Thread 19890190 (LWP 26411)]
>>> [New Thread 19906575 (LWP 26434)]
>>> 0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
>>> $1 = "atpcc7fc-dir", '\0' <repeats 17 times>
>>> $2 = 0x80b5230 "bacula-dir"
>>> $3 = 0x80b5dd0 "/opt/bacula/sbin/"
>>> $4 = "MySQL"
>>> $5 = 0x80a321c "1.36.3 (22 April 2005)"
>>> $6 = 0x809bfb8 "i686-redhat-linux-gnu"
>>> $7 = 0x809bfb1 "redhat"
>>> $8 = 0x809bfa4 "(Heidelberg)"
>>> #0 0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
>>> #1 0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
>>> #2 0x004c9720 in __pthread_alt_lock () from /lib/i686/libpthread.so.0
>>> #3 0x004c614e in pthread_mutex_lock () from /lib/i686/libpthread.so.0
>>> #4 0x08057dab in jobq_add (jq=0x80b4300, jcr=0x80fc570) at jobq.c:240
>>> #5 0x080566d8 in run_job (jcr=0x80fc570) at job.c:140
>>> #6 0x0804c034 in main (argc=0, argv=0x8090b55) at dird.c:241
>>>
>>> Thread 16 (Thread 19906575 (LWP 26434)):
>>> #0 0x004c80d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
>>> #1 0x004c7708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
>>> #2 0x004c3fab in [EMAIL PROTECTED] () from /lib/i686/libpthread.so.0
>>> #3 0x08087a7a in rwl_writelock
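The "maximum time limit for a job" raised in this message would normally be set with per-job directives in the Director's configuration. A fragment of roughly the following shape is the kind of setting in question. This is only a sketch: the job name and durations are invented, the other required Job directives are left out, and the directive names are taken from the Bacula manual, so they should be checked against the version in use. Note that Christian reports having no such limits configured.

```
# bacula-dir.conf (fragment; job name and values are hypothetical)
Job {
  Name = "NightlySave"
  # Type, Client, FileSet, Storage, Pool, Messages etc. omitted

  # Cancel the job if it has not started within this time of its scheduled start
  Max Start Delay = 4 hours

  # Cancel the job if it runs longer than this, counted from when it starts
  Max Run Time = 8 hours

  # Cancel the job if it stays blocked waiting for a resource this long
  Max Wait Time = 2 hours
}
```

If none of these appear in any Job resource, a timeout set this way is unlikely to be the trigger, which is the point Christian makes in his reply.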
Re: AW: [Bacula-users] Bacula director freezing
I tried this with the version I have currently. I got the below error:

g++ -c -I. -I.. -g -O2 -Wall jobq.c
jobq.c: In function `void* jobq_server(void*)':
jobq.c:489: error: `dird_free_jcr_pointers' undeclared (first use this function)
jobq.c:489: error: (Each undeclared identifier is reported only once for each function it appears in.)
make[1]: *** [jobq.o] Error 1
make[1]: Leaving directory `/usr/src/bacula-1.36.2/src/dird'

I will upgrade to 1.36.3.

-Jeff

Kern Sibbald wrote:
> Please see bug report 331 (if I am not mistaken). I've uploaded a correction that should fix the problem.
Re: AW: [Bacula-users] Bacula director freezing
Sorry, but I'm not too surprised it doesn't work on prior versions. I built and tested (regression) the fix (actually code from 1.37) on version 1.36.3.

On Wednesday 25 May 2005 21:55, Jeffery P. Humes wrote:
> I tried this with the version I have currently. I got the below error:
>
> g++ -c -I. -I.. -g -O2 -Wall jobq.c
> jobq.c: In function `void* jobq_server(void*)':
> jobq.c:489: error: `dird_free_jcr_pointers' undeclared (first use this function)
> jobq.c:489: error: (Each undeclared identifier is reported only once for each function it appears in.)
> make[1]: *** [jobq.o] Error 1
> make[1]: Leaving directory `/usr/src/bacula-1.36.2/src/dird'
>
> I will upgrade to 1.36.3.
>
> -Jeff