Re: intermittent amanda failure

2010-03-09 Thread Steve Wray
Gene Heskett wrote: On Tuesday 09 March 2010, Dustin J. Mitchell wrote: On Wed, Mar 3, 2010 at 1:58 PM, Steve Wray steve.w...@cwa.co.nz wrote: Right, so the LATEST most up-to-date version of Debian uses a 3 year old version of amanda. Fantastic, thanks Debian for keeping things so 'stable'.

Re: intermittent amanda failure

2010-03-09 Thread Gene Heskett
On Tuesday 09 March 2010, Steve Wray wrote: Gene Heskett wrote: On Tuesday 09 March 2010, Dustin J. Mitchell wrote: On Wed, Mar 3, 2010 at 1:58 PM, Steve Wray steve.w...@cwa.co.nz wrote: Right, so the LATEST most up-to-date version of Debian uses a 3 year old version of amanda. Fantastic,

Re: intermittent amanda failure

2010-03-08 Thread Dustin J. Mitchell
On Wed, Mar 3, 2010 at 1:58 PM, Steve Wray steve.w...@cwa.co.nz wrote: Right, so the LATEST most up-to-date version of Debian uses a 3 year old version of amanda. Fantastic, thanks Debian for keeping things so 'stable'. To be fair, that's exactly the intent, and maintaining a Linux distribution

Re: intermittent amanda failure

2010-03-08 Thread Gene Heskett
On Tuesday 09 March 2010, Dustin J. Mitchell wrote: On Wed, Mar 3, 2010 at 1:58 PM, Steve Wray steve.w...@cwa.co.nz wrote: Right, so the LATEST most up-to-date version of Debian uses a 3 year old version of amanda. Fantastic, thanks Debian for keeping things so 'stable'. To be fair, that's

Re: intermittent amanda failure

2010-03-03 Thread Steve Wray
I'd like to put in an update to this thread, for anyone interested. The backup server had been running Debian Etch. The version of amanda on Etch was giving the errors described in this thread. I upgraded the server to Debian Lenny. The problems still occurred with the version in Lenny.

Re: intermittent amanda failure

2010-01-25 Thread Steve Wray
Dustin J. Mitchell wrote: On Thu, Jan 21, 2010 at 5:05 PM, Jean-Louis Martineau martin...@zmanda.com wrote: xinetd is still configured to accept a TCP connection, but amandad expects a UDP packet, so amandad does nothing and the server fails while waiting for an ACK. Right - it was the failure I

Re: intermittent amanda failure

2010-01-21 Thread Dustin J. Mitchell
On Wed, Jan 20, 2010 at 5:43 PM, Steve Wray steve.w...@cwa.co.nz wrote: The problem I had before was, to reiterate: Disklist with two entries. One entry uses bsdtcp The other entry uses BSD. If the client of the disklist entry that is configured on the server to use bsdtcp is not

Re: intermittent amanda failure

2010-01-21 Thread Jean-Louis Martineau
Dustin J. Mitchell wrote: If I restart euclid's inetd config to run it with the wrong -auth parameter, then amcheck says: WARNING: euclid: selfcheck request failed: timeout waiting for ACK Client check: 2 hosts checked in 30.030 seconds. 1 problem found. xinetd is still configured to accept
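The amcheck summary quoted above has a regular shape, so a wrapper script could pull the counts out of it. The following is only a sketch: the function name `parse_client_check` is hypothetical, and the pattern is written against the single line quoted in this thread (tolerating a singular "host"/"problem" is an assumption, not something the thread shows).

```python
import re

# Pattern modelled on the one summary line quoted in the thread.
# Accepting singular "host"/"problem" is an assumption.
SUMMARY_RE = re.compile(
    r"Client check: (\d+) hosts? checked in ([\d.]+) seconds\.\s+(\d+) problems? found\."
)

def parse_client_check(text):
    """Return (hosts, seconds, problems) from amcheck output, or None if absent."""
    match = SUMMARY_RE.search(text)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2)), int(match.group(3))

quoted = (
    "WARNING: euclid: selfcheck request failed: timeout waiting for ACK "
    "Client check: 2 hosts checked in 30.030 seconds. 1 problem found."
)
print(parse_client_check(quoted))
```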

Re: intermittent amanda failure

2010-01-21 Thread Dustin J. Mitchell
On Thu, Jan 21, 2010 at 5:05 PM, Jean-Louis Martineau martin...@zmanda.com wrote: xinetd is still configured to accept a TCP connection, but amandad expects a UDP packet, so amandad does nothing and the server fails while waiting for an ACK. Right - it was the failure I expected to see, not

Re: intermittent amanda failure

2010-01-20 Thread Steve Wray
I am going to try and resurrect this thread having been able to home in on the apparent bug I found with bsdtcp auth. I have now converted most of our systems to using bsdtcp and amcheck is showing no errors at this time. The problem I had before was, to reiterate: Disklist with two

Re: intermittent amanda failure

2010-01-11 Thread Steve Wray
Steve Wray wrote: Jean-Louis Martineau wrote: Run 'amadmin CONFIG disklist' and check the auth is set as expected for all dles. I've done this, with the amanda.conf having bsdudp and with it having bsdtcp for that entry. In both cases all auth entries for all other DLE's are 'BSD'. In

Re: intermittent amanda failure

2010-01-07 Thread Brian Cuttler
On Wed, Jan 06, 2010 at 04:15:41PM -0600, Dustin J. Mitchell wrote: On Wed, Jan 6, 2010 at 4:01 PM, Steve Wray steve.w...@cwa.co.nz wrote: Am I to understand that there could be a problem in having 'too many' DLE's for bsd or bsdudp to cope with? I never thought of there being a limit

Re: intermittent amanda failure

2010-01-06 Thread Steve Wray
Steve Wray wrote: Dustin J. Mitchell wrote: I suspect an estimate or data timeout. Have you tried increasing dtimeout and etimeout? etimeout 2000 dtimeout 2000 I'd be surprised. These seem like fairly substantial values. 2000 seconds is roughly half an hour. I'll increase them by another

Re: intermittent amanda failure

2010-01-06 Thread Jean-Louis Martineau
Steve Wray wrote: On the client, in the sendbackup.20100106012630.debug log I see: sendbackup-gnutar: time 0.056: /usr/lib/amanda/runtar: pid 3348 sendbackup: time 0.057: started backup sendbackup: time 90.352: index tee cannot write [Broken pipe] sendbackup: time 90.352: pid 3346 finish time

Re: intermittent amanda failure

2010-01-06 Thread Steve Wray
Jean-Louis Martineau wrote: Steve Wray wrote: On the client, in the sendbackup.20100106012630.debug log I see: sendbackup-gnutar: time 0.056: /usr/lib/amanda/runtar: pid 3348 sendbackup: time 0.057: started backup sendbackup: time 90.352: index tee cannot write [Broken pipe] sendbackup: time

Re: intermittent amanda failure

2010-01-06 Thread Steve Wray
Jean-Louis Martineau wrote: Steve Wray wrote: On the client, in the sendbackup.20100106012630.debug log I see: sendbackup-gnutar: time 0.056: /usr/lib/amanda/runtar: pid 3348 sendbackup: time 0.057: started backup sendbackup: time 90.352: index tee cannot write [Broken pipe] sendbackup: time

Re: intermittent amanda failure

2010-01-06 Thread Dustin J. Mitchell
On Wed, Jan 6, 2010 at 3:42 PM, Steve Wray steve.w...@cwa.co.nz wrote: Ah hang on, am I right in understanding that you can't have just one dle using bsdtcp auth? That they would all have to have it? (ie the inetd configuration) Well, all DLEs on a given host have to have the same auth. If
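The rule stated in this reply, that all DLEs on a given host have to use the same auth, is easy to check mechanically once the host and auth of each DLE are known. The sketch below assumes the entries have already been extracted into (host, disk, auth) tuples, for example by hand from 'amadmin CONFIG disklist' output; the function name and the sample hosts are hypothetical and are not taken from the thread's real disklist.

```python
from collections import defaultdict

def hosts_with_mixed_auth(dles):
    """Return {host: sorted auths} for hosts whose DLEs do not all share one auth."""
    auths_by_host = defaultdict(set)
    for host, _disk, auth in dles:
        # Compare case-insensitively, since the thread shows both 'BSD' and 'bsd'.
        auths_by_host[host].add(auth.lower())
    return {
        host: sorted(auths)
        for host, auths in auths_by_host.items()
        if len(auths) > 1
    }

# Hypothetical entries for illustration only.
example = [
    ("clienta", "/home", "BSD"),
    ("clienta", "/etc", "bsd"),
    ("clientb", "/home", "bsdtcp"),
    ("clientb", "/var", "bsd"),
]
print(hosts_with_mixed_auth(example))
```

An empty result means every host uses a single auth across its DLEs; different hosts may still use different auths.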

Re: intermittent amanda failure

2010-01-06 Thread Jean-Louis Martineau
Steve Wray wrote: Jean-Louis Martineau wrote: Steve Wray wrote: On the client, in the sendbackup.20100106012630.debug log I see: sendbackup-gnutar: time 0.056: /usr/lib/amanda/runtar: pid 3348 sendbackup: time 0.057: started backup sendbackup: time 90.352: index tee cannot write [Broken

Re: intermittent amanda failure

2010-01-06 Thread Steve Wray
Jean-Louis Martineau wrote: Steve Wray wrote: Dustin J. Mitchell wrote: On Wed, Jan 6, 2010 at 4:01 PM, Steve Wray steve.w...@cwa.co.nz wrote: Am I to understand that there could be a problem in having 'too many' DLE's for bsd or bsdudp to cope with? I never thought of there being a limit

Re: intermittent amanda failure

2010-01-06 Thread Steve Wray
Dustin J. Mitchell wrote: On Wed, Jan 6, 2010 at 3:42 PM, Steve Wray steve.w...@cwa.co.nz wrote: Ah hang on, am I right in understanding that you can't have just one dle using bsdtcp auth? That they would all have to have it? (ie the inetd configuration) Well, all DLEs on a given host have to

Re: intermittent amanda failure

2010-01-06 Thread Jean-Louis Martineau
Steve Wray wrote: Dustin J. Mitchell wrote: On Wed, Jan 6, 2010 at 4:01 PM, Steve Wray steve.w...@cwa.co.nz wrote: Am I to understand that there could be a problem in having 'too many' DLE's for bsd or bsdudp to cope with? I never thought of there being a limit to the number of DLE's

Re: intermittent amanda failure

2010-01-06 Thread Dustin J. Mitchell
On Wed, Jan 6, 2010 at 4:01 PM, Steve Wray steve.w...@cwa.co.nz wrote: Am I to understand that there could be a problem in having 'too many' DLE's for bsd or bsdudp to cope with? I never thought of there being a limit to the number of DLE's before... Our disklist file has 178. Yes, it's

Re: intermittent amanda failure

2010-01-06 Thread Steve Wray
Dustin J. Mitchell wrote: On Wed, Jan 6, 2010 at 4:01 PM, Steve Wray steve.w...@cwa.co.nz wrote: Am I to understand that there could be a problem in having 'too many' DLE's for bsd or bsdudp to cope with? I never thought of there being a limit to the number of DLE's before... Our disklist file

Re: intermittent amanda failure

2010-01-06 Thread Steve Wray
Jean-Louis Martineau wrote: Steve Wray wrote: Jean-Louis Martineau wrote: Steve Wray wrote: On the client, in the sendbackup.20100106012630.debug log I see: sendbackup-gnutar: time 0.056: /usr/lib/amanda/runtar: pid 3348 sendbackup: time 0.057: started backup sendbackup: time 90.352: index

Re: intermittent amanda failure

2010-01-06 Thread Jean-Louis Martineau
Run 'amadmin CONFIG disklist' and check the auth is set as expected for all dles. Jean-Louis Steve Wray wrote: Jean-Louis Martineau wrote: Steve Wray wrote: Jean-Louis Martineau wrote: Steve Wray wrote: On the client, in the sendbackup.20100106012630.debug log I see: sendbackup-gnutar:

Re: intermittent amanda failure

2010-01-06 Thread Steve Wray
Jean-Louis Martineau wrote: Run 'amadmin CONFIG disklist' and check the auth is set as expected for all dles. I've done this, with the amanda.conf having bsdudp and with it having bsdtcp for that entry. In both cases all auth entries for all other DLE's are 'BSD'. In both cases only that

Re: intermittent amanda failure

2010-01-06 Thread Steve Wray
I'll attach some debug logs. For the purposes of this test I cut the disklist file down to two entries: One entry is a client which is configured to use simple BSD auth. The other entry is a client which is configured to use bsdtcp auth. Both of these have been verified by running amcheck

intermittent amanda failure

2010-01-05 Thread Steve Wray
Hi there I have a server, at a remote location, which has recently started to intermittently fail backups. The amanda client is running Debian Lenny, the amanda server is running Debian Etch. On the amanda server, dpkg -s amanda-server shows Version: 1:2.5.2p1-5 On the amanda client, dpkg

Re: intermittent amanda failure

2010-01-05 Thread Dustin J. Mitchell
I suspect an estimate or data timeout. Have you tried increasing dtimeout and etimeout? Dustin -- Open Source Storage Engineer http://www.zmanda.com

Re: intermittent amanda failure

2010-01-05 Thread Steve Wray
Dustin J. Mitchell wrote: I suspect an estimate or data timeout. Have you tried increasing dtimeout and etimeout? etimeout 2000 dtimeout 2000 I'd be surprised. These seem like fairly substantial values. 2000 seconds is roughly half an hour. I'll increase them by another 1000 seconds
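The arithmetic behind the timeout values quoted above can be checked with a short Python sketch. The helper name `describe_timeout` is hypothetical; only the value 2000 and the planned increase of another 1000 seconds come from the thread.

```python
def describe_timeout(seconds: int) -> str:
    """Render a timeout given in seconds as whole minutes plus leftover seconds."""
    minutes, rest = divmod(seconds, 60)
    return f"{seconds} s = {minutes} min {rest} s"

current = 2000              # etimeout/dtimeout value quoted in the thread
proposed = current + 1000   # after the planned increase of another 1000 seconds

print(describe_timeout(current))
print(describe_timeout(proposed))
```

So "roughly half an hour" is a fair description of 2000 seconds (a little over 33 minutes), and the increased value comes to 50 minutes.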