Re: [Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-08-24 Thread Volker Sauer
On Di, 23 Aug 2005, José Luis Tallón <[EMAIL PROTECTED]> wrote: > Volker Sauer wrote: > > >On Di, 23 Aug 2005, Kern Sibbald <[EMAIL PROTECTED]> wrote: > > > > > >>Is there any chance you can upgrade to version 1.37.36 at least for this > >>machine? I'm 99% sure I've resolved all these kinds of

Re: [Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-08-23 Thread Kern Sibbald
On Tuesday 23 August 2005 20:26, Martin Simmons wrote: > > On Tue, 23 Aug 2005 14:44:45 +0200, Kern Sibbald <[EMAIL PROTECTED]> > > said: > > Kern> On Tuesday 23 August 2005 13:35, Martin Simmons wrote: > >> > On Tue, 23 Aug 2005 12:30:45 +0200, Kern Sibbald > >> > <[EMAIL PRO

Re: [Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-08-23 Thread Martin Simmons
> On Tue, 23 Aug 2005 14:44:45 +0200, Kern Sibbald <[EMAIL PROTECTED]> said: Kern> On Tuesday 23 August 2005 13:35, Martin Simmons wrote: >> > On Tue, 23 Aug 2005 12:30:45 +0200, Kern Sibbald <[EMAIL PROTECTED]> >> > said: >> Kern> Hello Volker, >> Kern> I've now found

Re: [Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-08-23 Thread José Luis Tallón
Volker Sauer wrote: >On Di, 23 Aug 2005, Kern Sibbald <[EMAIL PROTECTED]> wrote: > > >>Is there any chance you can upgrade to version 1.37.36 at least for this >>machine? I'm 99% sure I've resolved all these kinds of lockups in the >>Director. You would be a really good test case. >> >>

Re: [Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-08-23 Thread Kern Sibbald
On Tuesday 23 August 2005 16:07, Volker Sauer wrote: > On Di, 23 Aug 2005, Kern Sibbald <[EMAIL PROTECTED]> wrote: > > Is there any chance you can upgrade to version 1.37.36 at least for this > > machine? I'm 99% sure I've resolved all these kinds of lockups in the > > Director. You would be a re

Re: [Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-08-23 Thread Volker Sauer
On Di, 23 Aug 2005, Kern Sibbald <[EMAIL PROTECTED]> wrote: > > Is there any chance you can upgrade to version 1.37.36 at least for this > machine? I'm 99% sure I've resolved all these kinds of lockups in the > Director. You would be a really good test case. Actually I wanted to wait for 1.38

Re: [Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-08-23 Thread Martin Simmons
> On Tue, 23 Aug 2005 14:28:53 +0200, Volker Sauer <[EMAIL PROTECTED]> said: Volker> On Di, 23 Aug 2005, Martin Simmons <[EMAIL PROTECTED]> wrote: >> > On Tue, 23 Aug 2005 12:30:45 +0200, Kern Sibbald <[EMAIL PROTECTED]> Volker> said: >> Kern> I've now found the time to loo

Re: [Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-08-23 Thread Kern Sibbald
Try running Bacula without the debugger. If it still crashes or locks up, then you will need to run it under the debugger. Is there any chance you can upgrade to version 1.37.36 at least for this machine? I'm 99% sure I've resolved all these kinds of lockups in the Director. You would be a r

Re: [Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-08-23 Thread Kern Sibbald
On Tuesday 23 August 2005 13:35, Martin Simmons wrote: > > On Tue, 23 Aug 2005 12:30:45 +0200, Kern Sibbald <[EMAIL PROTECTED]> > > said: > > Kern> Hello Volker, > > Kern> I've now found the time to look over your debug output below. My > analysis Kern> leads me to believe that what is

Re: [Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-08-23 Thread Volker Sauer
On Di, 23 Aug 2005, Martin Simmons <[EMAIL PROTECTED]> wrote: > > On Tue, 23 Aug 2005 12:30:45 +0200, Kern Sibbald <[EMAIL PROTECTED]> > > said: > > Kern> I've now found the time to look over your debug output below. My > analysis > Kern> leads me to believe that what is shown is "i

Re: [Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-08-23 Thread Martin Simmons
> On Tue, 23 Aug 2005 12:30:45 +0200, Kern Sibbald <[EMAIL PROTECTED]> said: Kern> Hello Volker, Kern> I've now found the time to look over your debug output below. My analysis Kern> leads me to believe that what is shown is "impossible". That is, the code flow Kern> as created in

Re: [Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-08-23 Thread Kern Sibbald
Hello Volker, I've now found the time to look over your debug output below. My analysis leads me to believe that what is shown is "impossible". That is, the code flow as created in the source code cannot possibly do what is indicated in the dump. What is shown in the dump is that the subroutine

Re: [Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-08-21 Thread Kern Sibbald
Hello Volker, It looks like the segfault was in Bacula. This occurs if you reload the config file -- there is a problem if there are jobs scheduled. It shows up when the jobs run, and produces the seg fault. I'll fix this after version 1.38 is released. I don't think this seg fault is related

Re: [Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-08-21 Thread Volker Sauer
On Di, 16 Aug 2005, Volker Sauer <[EMAIL PROTECTED]> wrote: > does it make sense to move /lib/tls ? Hi Kern, I tried to move /lib/tls and ran bacula-dir again under the debugger: (gdb) run -s -f -c /etc/bacula/bacula-dir.conf Starting program: /usr/sbin/bacula-dir -s -f -c /etc/bacula/bacula-d

Re: [Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-08-16 Thread Volker Sauer
On Sa, 30 Jul 2005, Kern Sibbald <[EMAIL PROTECTED]> wrote: > Wait a bit before running the debugger on the other daemons. This backtrace > is very different, and seems to show a mutex lock up. I need to look at it in > detail ... Hi Kern, did you look at it in detail? Can I provide some

Re: [Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-07-30 Thread Kern Sibbald
Wait a bit before running the debugger on the other daemons. This backtrace is very different, and seems to show a mutex lock up. I need to look at it in detail ... On Friday 29 July 2005 23:31, Volker Sauer wrote: > On Fr, 29 Jul 2005, Kern Sibbald <[EMAIL PROTECTED]> wrote: > > What I see fro

Re: [Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-07-29 Thread Volker Sauer
On Fr, 29 Jul 2005, Volker Sauer <[EMAIL PROTECTED]> wrote: > > Again, the director locked. This time it locked up at the first job > (Client Conc. Jobs = 1) and I was *not* able to connect with bconsole. > Therefore I couldn't get the status from sd or the clients. > Some additional info: I had

Re: [Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-07-29 Thread Volker Sauer
On Fr, 29 Jul 2005, Kern Sibbald <[EMAIL PROTECTED]> wrote: > > What I see from this is that everything in the Director is normal. It thinks > that something like 5 jobs are running. The threads are all waiting on input > from one of the other daemons, and there is no mutex dead lock situation

Re: [Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-07-29 Thread Kern Sibbald
Hello Volker, On Friday 29 July 2005 18:23, Volker Sauer wrote: > On Do, 28 Jul 2005, Kern Sibbald <[EMAIL PROTECTED]> wrote: > > On Thursday 28 July 2005 16:17, Volker Sauer wrote: > > > > I'll run the director under the debugger and we'll see.. > > Hi Kern, > > it happened again ;-) Surprisingly t

Re: [Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-07-29 Thread Volker Sauer
On Do, 28 Jul 2005, Kern Sibbald <[EMAIL PROTECTED]> wrote: > On Thursday 28 July 2005 16:17, Volker Sauer wrote: > > > I'll run the director under the debugger and we'll see.. Hi Kern, it happened again ;-) Surprisingly this time the bconsole was not locked - I could still connect to bacula-dir wh

Re: [Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-07-28 Thread Kern Sibbald
On Thursday 28 July 2005 16:17, Volker Sauer wrote: > > I'll run the director under the debugger and we'll see.. > > Short question on this: > I rebuilt the bacula package and removed the "strip" commands from the > build script. The result is the unstripped binary of bacula-dir. > What I'll do now

Re: [Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-07-28 Thread Volker Sauer
> > I'll run the director under the debugger and we'll see.. > Short question on this: I rebuilt the bacula package and removed the "strip" commands from the build script. The result is the unstripped binary of bacula-dir. What I'll do now is copy only this unstripped binary to the production mac

Re: [Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-07-22 Thread Volker Sauer
On Fr, 22 Jul 2005, Kern Sibbald <[EMAIL PROTECTED]> wrote: > Hello Volker, > > Please remind me what Linux distro and what kernel version you are using. Debian Sarge Kernel 2.6.8 > If you are on a 2.4 kernel, then the problem is 99% for sure the /lib/tls bug > that is mentioned in the manual.

Re: [Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-07-22 Thread Kern Sibbald
Hello Volker, Please remind me what Linux distro and what kernel version you are using. If you are on a 2.4 kernel, then the problem is 99% for sure the /lib/tls bug that is mentioned in the manual. The manual suggests two solutions -- I prefer to zap the /lib/tls library by moving it to /lib/

Re: [Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-07-22 Thread Russell Howe
Volker Sauer wrote: > 0x401a7436 in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0 > #0 0x401a5295 in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib/tls/libpthread.so.0 > #7 0x401a2b63 in start_thread () from /lib/tls/libpthread.so.0 > #8 0x4037418a in clone () from /lib/tls/lib

Re: [Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-07-22 Thread Volker Sauer
On So, 17 Jul 2005, Kern Sibbald <[EMAIL PROTECTED]> wrote: > > You could try running it under the debugger without debugging symbols. This > is > not ideal, but if it is an internal deadlock, I should be able to see it. > That might avoid you having to spend the time to rebuild it. Unfortuna

Re: [Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-07-19 Thread Volker Sauer
Hi Kern, On So, 17 Jul 2005, Kern Sibbald wrote: > Hello Volker, > > > Mmh, this could be but actually there's no sign of stale NFS > > handles in the logs. > > In one of your earlier emails there were NFS version warning messages. It is > possible these could be a source of problems, b

Re: [Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-07-17 Thread Kern Sibbald
Hello Volker, On Sunday 17 July 2005 21:29, Volker Sauer wrote: > Hi Kern, > > the bacula-server runs independently of NFS mounts, sorry to say that. > The bsr files are copied by a cron-job to NFS and all the bacula files > and mysql are on local disks. The machine whose jobs caused the hangs > i

[Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-07-17 Thread Volker Sauer
Hi Kern, the bacula-server runs independently of NFS mounts, sorry to say that. The bsr files are copied by a cron-job to NFS and all the bacula files and mysql are on local disks. The machine whose jobs caused the hangs is actually independent from NFS, too - at least I do not write or read an

Re: [Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-07-17 Thread Phil Stracchino
On Sun, Jul 17, 2005 at 12:28:24PM -0400, Phil Stracchino wrote: > I'm CC'ing everyone involved on this because I seem unable to send mail > to sourceforge (because Verizon refuses to give me a static IP on their > residential DSL service, and I can't get any service except Verizon > because I'm on

[Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-07-17 Thread Kern Sibbald
Hello Phil, Yes, now that you mention it, I remember having looked at these options a few years ago. I changed my hard into a soft and ended up having frequent NFS failures during long file transfers. I finally had to go back to using hard mounts. Perhaps things have improved since my trials

[Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-07-17 Thread Phil Stracchino
On Sun, Jul 17, 2005 at 03:57:54PM +0200, Kern Sibbald wrote: > Hello Volker, > > About the only thing I can think of is that you have a stale or bad NFS > connection and you are trying to write the bootstrap file to another machine > with the bad NFS link -- or perhaps the other machine is just

[Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-07-17 Thread Kern Sibbald
Hello Volker, About the only thing I can think of is that you have a stale or bad NFS connection and you are trying to write the bootstrap file to another machine with the bad NFS link -- or perhaps the other machine is just down. In that case, Bacula will hang forever. Don't blame me -- I do

[Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-07-15 Thread Volker Sauer
On Fr, 15 Jul 2005, Arno Lehmann wrote: > >I'll upgrade to 1.36.3 and see what happens. Maybe "Fix deadlock in > >multiple simultaneous jobs." (from ReleaseNotes) could be the right one. > >I already setup this site with 1.36.3 FileFormat because I knew it's > >going to be required! > I had the s

[Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-07-14 Thread Arno Lehmann
Hello, Volker Sauer wrote: Hi Kern, I'll upgrade to 1.36.3 and see what happens. Maybe "Fix deadlock in multiple simultaneous jobs." (from ReleaseNotes) could be the right one. I already setup this site with 1.36.3 FileFormat because I knew it's going to be required! One thing to note - I ha

[Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-07-13 Thread Volker Sauer
Hi Kern, I'll upgrade to 1.36.3 and see what happens. Maybe "Fix deadlock in multiple simultaneous jobs." (from ReleaseNotes) could be the right one. I already setup this site with 1.36.3 FileFormat because I knew it's going to be required! Regards Volker On Mi, 13 Jul 2005, Kern Sibbald wrote:

[Bacula-users] Re: [Bacula-devel] Severe problem: director hangs in production system

2005-07-13 Thread Kern Sibbald
Hello Volker, There were one or two race conditions that I fixed in 1.36.3. You might look at the release notes and see if they apply to you. Beware: 1.36.3 requires the new format FileSets (and hence a Full backup unless you explicitly disable it). On Tuesday 12 July 2005 00:24, Volker Sauer