Re: [Bacula-users] Crashing storage director. Need help getting trace.

2009-12-15 Thread Martin Simmons
 On Mon, 14 Dec 2009 16:47:00 +0800, Jim Barber said:
 
 Jim Barber wrote:
  
  Thanks Martin.
  
  I've compiled and installed version 3.1.6 from a git pull I did on 10th Dec.
  I'm not sure if this new version will crash or not.
  But I've manually attached a gdb session to it just in case it does.
  
  Thanks.
 
 I'm not having much luck with this.
 When I attached to the process with gdb it seems to interfere with it.
 It's like it stops running.
 It no longer responds to status commands etc.
 
 I'm not familiar enough with gdb to resolve it.
 I tried the 'c'ontinue command just in case attaching stops the process.
 But it doesn't make any difference.

Yes, you do need to use the continue command after attaching, but that should
work.  Possibly your version of gdb is broken, which might also explain the
lack of email from btraceback.

Did gdb print anything after you did that?  It may be worth posting the whole
gdb session.
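
For posting the session, a minimal sketch of one way to keep a copy of what gdb prints. The helper name run_logged and the log file name are made up for illustration, and the binary path and PID in the usage line are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: run a command and keep a copy of everything it prints.
run_logged() {
    log="$1"; shift
    # 2>&1 merges stderr into stdout so error messages land in the log too;
    # tee shows the output on the terminal and writes it to the log file.
    "$@" 2>&1 | tee "$log"
}

# Usage (path and PID are placeholders):
#   run_logged gdb-session.log gdb /usr/sbin/bacula-sd 1234
```

This captures the program's output only; whether the commands you type also end up in the log depends on the terminal setup, so it may be worth noting them by hand.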

__Martin

--
Return on Information:
Google Enterprise Search pays you back
Get the facts.
http://p.sf.net/sfu/google-dev2dev
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Crashing storage director. Need help getting trace.

2009-12-14 Thread Jim Barber
Jim Barber wrote:
 
 Thanks Martin.
 
 I've compiled and installed version 3.1.6 from a git pull I did on 10th Dec.
 I'm not sure if this new version will crash or not.
 But I've manually attached a gdb session to it just in case it does.
 
 Thanks.

I'm not having much luck with this.
When I attached to the process with gdb it seems to interfere with it.
It's like it stops running.
It no longer responds to status commands etc.

I'm not familiar enough with gdb to resolve it.
I tried the 'c'ontinue command just in case attaching stops the process.
But it doesn't make any difference.

Regards,

--
Jim Barber
DDI Health



Re: [Bacula-users] Crashing storage director. Need help getting trace.

2009-12-13 Thread Jim Barber
Martin Simmons wrote:
 
 Try doing it interactively by attaching gdb to the bacula-sd process before it
 crashes (run gdb /path/to/bacula-sd and then use gdb's attach command).  Then
 use the commands in btraceback.gdb when it crashes.
 
 __Martin

Thanks Martin.

I've compiled and installed version 3.1.6 from a git pull I did on 10th Dec.
I'm not sure if this new version will crash or not.
But I've manually attached a gdb session to it just in case it does.

Thanks.

--
Jim Barber
DDI Health



Re: [Bacula-users] Crashing storage director. Need help getting trace.

2009-12-11 Thread Martin Simmons
 On Mon, 07 Dec 2009 14:30:41 +0800, Jim Barber said:
 
 Hi all.
 
 I have a problem where every weekend (or more frequently) my storage daemon 
 crashes.
 The crash is random, but is happening either while running VirtualFull jobs 
 or Copy jobs.
 So far it hasn't crashed during regular incremental backups.
 
 I am running version 3.0.3 of the Bacula software.
 
 First of all I tried adding a '-d 200' to the arguments that start bacula-sd.
 This produced a lot of messages, but nothing unusual that I can see prior to 
 the crash.
 The last few lines in this log look like so:
 
   vc-sd: mac.c:241-468 before write JobId=468 FI=363302 SessId=1 Strm=MD5 len=16
   vc-sd: mac.c:241-468 before write JobId=468 FI=363303 SessId=1 Strm=UATTR len=104
   vc-sd: mac.c:241-468 before write JobId=468 FI=363304 SessId=1 Strm=UATTR len=122
   vc-sd: mac.c:241-468 before write JobId=468 FI=363305 SessId=1 Strm=UATTR len=77
   vc-sd: mac.c:241-468 before write JobId=468 FI=363305 SessId=1 Strm=DATA len=4496
   vc-sd: mac.c:241-468 before write JobId=468 FI=363305 SessId=1 Strm=MD5 len=16
 
 So next I have been trying to get the btraceback program running.
 
 I am using Debian packages (self built based on the 3.0.2 Debian sources).
 These run the storage daemon under the bacula:tape user:group.
 So I modified the btraceback program to use sudo to run gdb.
 I also configured sudo to allow the bacula user to do so without being 
 prompted for a password.
 I then modified the Debian sources so that packages with debugging symbols 
 are produced.
 
 If I become the bacula user and run a test like so:
 
   /usr/sbin/btraceback /usr/sbin/bacula-sd $PID
 
 Where: $PID = the process ID of the bacula-sd process,
 then I get an email showing debugging information.
 So as far as I can tell the btraceback program should be working.
 
 I had another crash of the storage daemon after making the changes and no 
 email was sent.
 Nor was a bacula-sd.9103.traceback file produced.
 So I can't send any useful information to try and track down why the storage 
 daemon is so unstable.
 
 It was also unstable when using the 3.0.2 Debian package as well so I don't 
 think it is my rebuild that is causing the issue.
 Although I feel 3.0.3 is more stable than 3.0.2 was, I still can't get a 
 complete week's cycle working without a crash.
 
 The /etc/init.d/bacula-sd script defines the PATH to be, 
 PATH=/sbin:/bin:/usr/sbin:/usr/bin
 So /usr/sbin is in the PATH and so I'd imagine the program should be able to 
 find the traceback program.
 
 Any ideas how I can get some useful information from the crash?

Try doing it interactively by attaching gdb to the bacula-sd process before it
crashes (run gdb /path/to/bacula-sd and then use gdb's attach command).  Then
use the commands in btraceback.gdb when it crashes.
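
A minimal sketch of that approach as a small shell helper. The function name sd_gdb_cmd is made up for illustration, and the default location of btraceback.gdb is an assumption (check where your packages install it). It only prints the gdb command line, so it can be inspected before being run with enough privileges to attach.

```shell
#!/bin/sh
# Hypothetical helper: print a gdb command line that attaches to a running
# bacula-sd, resumes it, and runs a gdb command file the next time it stops.
# All default paths are assumptions; adjust them to your installation.
sd_gdb_cmd() {
    pid="$1"                                          # PID of the running daemon
    bin="${2:-/usr/sbin/bacula-sd}"                   # binary, needed for symbols
    cmds="${3:-/etc/bacula/scripts/btraceback.gdb}"   # assumed location

    # "gdb program pid" attaches to the process, which stops it.
    # -ex continue resumes it; -x runs the command file once gdb regains
    # control, for example when the daemon crashes.
    printf 'gdb -q -ex continue -x %s %s %s\n' "$cmds" "$bin" "$pid"
}

# Example: show the command for PID 1234
sd_gdb_cmd 1234
```

If gdb stops on a signal that is not the crash, the command file would run early. gdb's `handle SIGNAL nostop noprint pass` tells it to pass a signal through without stopping, but which signals (if any) need that here is something to check on your system.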

__Martin



[Bacula-users] Crashing storage director. Need help getting trace.

2009-12-06 Thread Jim Barber
Hi all.

I have a problem where every weekend (or more frequently) my storage daemon 
crashes.
The crash is random, but is happening either while running VirtualFull jobs or 
Copy jobs.
So far it hasn't crashed during regular incremental backups.

I am running version 3.0.3 of the Bacula software.

First of all I tried adding a '-d 200' to the arguments that start bacula-sd.
This produced a lot of messages, but nothing unusual that I can see prior to 
the crash.
The last few lines in this log look like so:

vc-sd: mac.c:241-468 before write JobId=468 FI=363302 SessId=1 Strm=MD5 len=16
vc-sd: mac.c:241-468 before write JobId=468 FI=363303 SessId=1 Strm=UATTR len=104
vc-sd: mac.c:241-468 before write JobId=468 FI=363304 SessId=1 Strm=UATTR len=122
vc-sd: mac.c:241-468 before write JobId=468 FI=363305 SessId=1 Strm=UATTR len=77
vc-sd: mac.c:241-468 before write JobId=468 FI=363305 SessId=1 Strm=DATA len=4496
vc-sd: mac.c:241-468 before write JobId=468 FI=363305 SessId=1 Strm=MD5 len=16
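
To compare the last record written before each crash, a rough sketch that pulls the fields out of the final "before write" line of a debug log. The function name last_write is made up, and the output labels simply reuse the field names from the log lines above.

```shell
#!/bin/sh
# Hypothetical helper: summarise the last "before write" line in a debug log.
last_write() {
    # Keep only the "before write" lines, take the final one, and print
    # its JobId, FI, Strm and len values on a single line.
    grep 'before write' "$1" | tail -n 1 |
        sed -n 's/.*JobId=\([0-9]*\) FI=\([0-9]*\) .*Strm=\([A-Za-z0-9]*\) len=\([0-9]*\).*/jobid=\1 fi=\2 strm=\3 len=\4/p'
}

# Usage (log file name is a placeholder):
#   last_write bacula-sd-debug.log
```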

So next I have been trying to get the btraceback program running.

I am using Debian packages (self built based on the 3.0.2 Debian sources).
These run the storage daemon under the bacula:tape user:group.
So I modified the btraceback program to use sudo to run gdb.
I also configured sudo to allow the bacula user to do so without being prompted 
for a password.
I then modified the Debian sources so that packages with debugging symbols are 
produced.

If I become the bacula user and run a test like so:

/usr/sbin/btraceback /usr/sbin/bacula-sd $PID

Where: $PID = the process ID of the bacula-sd process,
then I get an email showing debugging information.
So as far as I can tell the btraceback program should be working.
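
For reference, a sketch of how the PID for that test can be looked up from a pid file instead of typed by hand. The function name pid_from_file is made up, and the pid file path in the usage line is an assumption based on the bacula-sd.9103 naming; check the directory your configuration actually uses.

```shell
#!/bin/sh
# Hypothetical helper: print the PID stored in a pid file, but only if it is
# a plain number and a process with that PID currently exists.
pid_from_file() {
    pidfile="$1"
    [ -r "$pidfile" ] || return 1
    pid=$(tr -d '[:space:]' < "$pidfile")
    # Reject empty or non-numeric content.
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # kill -0 sends no signal; it only checks that the process exists
    # and that the current user is allowed to signal it.
    kill -0 "$pid" 2>/dev/null || return 1
    printf '%s\n' "$pid"
}

# Usage (pid file path is an assumption):
#   PID=$(pid_from_file /var/run/bacula/bacula-sd.9103.pid) &&
#       /usr/sbin/btraceback /usr/sbin/bacula-sd "$PID"
```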

I had another crash of the storage daemon after making the changes and no email 
was sent.
Nor was a bacula-sd.9103.traceback file produced.
So I can't send any useful information to try and track down why the storage 
daemon is so unstable.

It was also unstable when using the 3.0.2 Debian package as well so I don't 
think it is my rebuild that is causing the issue.
Although I feel 3.0.3 is more stable than 3.0.2 was, I still can't get a 
complete week's cycle working without a crash.

The /etc/init.d/bacula-sd script defines the PATH to be, 
PATH=/sbin:/bin:/usr/sbin:/usr/bin
So /usr/sbin is in the PATH and so I'd imagine the program should be able to 
find the traceback program.
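
A small sketch for checking that assumption: it reports where a command resolves under a given PATH, without touching the PATH of the current shell. The function name resolve_with_path is made up. Whether the daemon actually looks up the traceback program through PATH is not confirmed here, so this only tests the lookup itself.

```shell
#!/bin/sh
# Hypothetical helper: show where a command resolves under a given PATH.
resolve_with_path() {
    search_path="$1"
    name="$2"
    # The subshell keeps the caller's PATH unchanged; command -v prints the
    # resolved location and returns non-zero if nothing is found.
    ( PATH="$search_path"; command -v "$name" )
}

# Usage with the PATH from the init script:
#   resolve_with_path /sbin:/bin:/usr/sbin:/usr/bin btraceback
```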

Any ideas how I can get some useful information from the crash?

--
Jim Barber
DDI Health
