date:20090824

Re: New verbosity in the log when running amcheck after 2.6.2alpha20090824

2009-08-24 Thread Gene Heskett

:35:35 coyote xinetd[2394]: Swapping defaults
>Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amanda
>Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amandaidx
>Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amidxtape
>Aug 24 10:35:35 coyote xinetd[2394]: Reconfigured: new=0 old=3 dropped=0
>(services)
>Aug 24 10:35:35 coyote xinetd[2394]: Starting reconfiguration
>Aug 24 10:35:35 coyote xinetd[2394]: Swapping defaults
>Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amanda
>Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amandaidx
>Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amidxtape
>Aug 24 10:35:35 coyote xinetd[2394]: Reconfigured: new=0 old=3 dropped=0
>(services)
>Aug 24 10:35:35 coyote xinetd[2394]: Starting reconfiguration
>Aug 24 10:35:35 coyote xinetd[2394]: Swapping defaults
>Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amanda
>Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amandaidx
>Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amidxtape
>Aug 24 10:35:35 coyote xinetd[2394]: Reconfigured: new=0 old=3 dropped=0
>(services)
>Aug 24 10:35:36 coyote xinetd[2394]: Starting reconfiguration
>Aug 24 10:35:36 coyote xinetd[2394]: Swapping defaults
>Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amanda
>Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amandaidx
>Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amidxtape
>Aug 24 10:35:36 coyote xinetd[2394]: Reconfigured: new=0 old=3 dropped=0
>(services)
>Aug 24 10:35:36 coyote xinetd[2394]: Starting reconfiguration
>Aug 24 10:35:36 coyote xinetd[2394]: Swapping defaults
>Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amanda
>Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amandaidx
>Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amidxtape
>Aug 24 10:35:36 coyote xinetd[2394]: Reconfigured: new=0 old=3 dropped=0
>(services)
>Aug 24 10:35:36 coyote xinetd[2394]: Starting reconfiguration
>Aug 24 10:35:36 coyote xinetd[2394]: Swapping defaults
>Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amanda
>Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amandaidx
>Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amidxtape
>Aug 24 10:35:36 coyote xinetd[2394]: Reconfigured: new=0 old=3 dropped=0
>(services)
>Aug 24 10:35:36 coyote xinetd[2394]: Starting reconfiguration
>Aug 24 10:35:36 coyote xinetd[2394]: Swapping defaults
>Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amanda
>Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amandaidx
>Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amidxtape
>Aug 24 10:35:36 coyote xinetd[2394]: Reconfigured: new=0 old=3 dropped=0
>(services)
>Aug 24 10:35:36 coyote xinetd[2394]: Starting reconfiguration
>Aug 24 10:35:36 coyote xinetd[2394]: Swapping defaults
>Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amanda
>Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amandaidx
>Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amidxtape
>Aug 24 10:35:36 coyote xinetd[2394]: Reconfigured: new=0 old=3 dropped=0
>(services)
>Aug 24 10:35:36 coyote xinetd[2394]: Starting reconfiguration
>Aug 24 10:35:36 coyote xinetd[2394]: Swapping defaults
>Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amanda
>Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amandaidx
>Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amidxtape
>Aug 24 10:35:36 coyote xinetd[2394]: Reconfigured: new=0 old=3 dropped=0
>(services)
>Aug 24 10:35:36 coyote xinetd[2394]: Starting reconfiguration
>Aug 24 10:35:36 coyote xinetd[2394]: Swapping defaults
>Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amanda
>Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amandaidx
>Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amidxtape
>Aug 24 10:35:36 coyote xinetd[2394]: Reconfigured: new=0 old=3 dropped=0
>(services)
>Aug 24 10:35:38 coyote xinetd[2394]: Starting reconfiguration
>Aug 24 10:35:38 coyote xinetd[2394]: Swapping defaults
>Aug 24 10:35:38 coyote xinetd[2394]: readjusting service amanda
>Aug 24 10:35:38 coyote xinetd[2394]: readjusting service amandaidx
>Aug 24 10:35:38 coyote xinetd[2394]: readjusting service amidxtape
>Aug 24 10:35:38 coyote xinetd[2394]: Reconfigured: new=0 old=3 dropped=0
>(services)
>
>Then this afternoon, 2 more lines I don't recall seeing before:
>
>Aug 24 16:58:45 coyote xinetd[2394]: START: amanda pid=13481
>from=:::192.168.71.3
>Aug 24 16:59:15 coyote xinetd[2394]: EXIT: amanda status=0 pid=13481
>duration=30(sec)
>
>Anybody have a clue I could borrow?
>
>Thank you.

Another set of oddities:

[r...@coyote am

New verbosity in the log when running amcheck after 2.6.2alpha20090824

2009-08-24 Thread Gene Heskett

Greetings;

This doesn't look terribly comforting, and occurred apparently at the amcheck 
phase of a script I wrote to install new amanda snapshots:

Aug 24 10:35:34 coyote xinetd[2394]: Starting reconfiguration   
  
Aug 24 10:35:34 coyote xinetd[2394]: Swapping defaults  
  
Aug 24 10:35:34 coyote xinetd[2394]: readjusting service amanda 
  
Aug 24 10:35:34 coyote xinetd[2394]: readjusting service amandaidx  
  
Aug 24 10:35:34 coyote xinetd[2394]: readjusting service amidxtape  
  
Aug 24 10:35:34 coyote xinetd[2394]: Reconfigured: new=0 old=3 dropped=0 
(services)   
Aug 24 10:35:35 coyote xinetd[2394]: Starting reconfiguration   
  
Aug 24 10:35:35 coyote xinetd[2394]: Swapping defaults  
  
Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amanda 
  
Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amandaidx  
  
Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amidxtape  
  
Aug 24 10:35:35 coyote xinetd[2394]: Reconfigured: new=0 old=3 dropped=0 
(services)   
Aug 24 10:35:35 coyote xinetd[2394]: Starting reconfiguration   
  
Aug 24 10:35:35 coyote xinetd[2394]: Swapping defaults  
  
Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amanda 
  
Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amandaidx  
  
Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amidxtape  
  
Aug 24 10:35:35 coyote xinetd[2394]: Reconfigured: new=0 old=3 dropped=0 
(services)   
Aug 24 10:35:35 coyote xinetd[2394]: Starting reconfiguration   
  
Aug 24 10:35:35 coyote xinetd[2394]: Swapping defaults  
  
Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amanda 
  
Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amandaidx  
  
Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amidxtape  
  
Aug 24 10:35:35 coyote xinetd[2394]: Reconfigured: new=0 old=3 dropped=0 
(services)   
Aug 24 10:35:35 coyote xinetd[2394]: Starting reconfiguration   
  
Aug 24 10:35:35 coyote xinetd[2394]: Swapping defaults  
  
Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amanda 
  
Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amandaidx  
  
Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amidxtape  
  
Aug 24 10:35:35 coyote xinetd[2394]: Reconfigured: new=0 old=3 dropped=0 
(services)   
Aug 24 10:35:35 coyote xinetd[2394]: Starting reconfiguration   
  
Aug 24 10:35:35 coyote xinetd[2394]: Swapping defaults  
  
Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amanda 
  
Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amandaidx  
  
Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amidxtape

Re: Backup issues with OpenBSD 4.5 machines

2009-08-24 Thread John Hein

stan wrote at 16:59 -0400 on Aug 24, 2009:
 > The firts thing I notice when comparing this function in 2.5.0 vs 2.5.2 is
 > that 2.5.0 does:
 > 
 > tv.tv_usec = 0;
 > 
 > and 2.5.2 does not. Could thim make a difference? Both do 
 > 
 > tv.tv_sec = timeout;

In 2.5.2, the memset sets the entire struct to 0.
2.5.0 is slightly more efficient, but otherwise the results
wind up being the same.

Nothing to see there.

Re: How can a write to a blocking pipe return EAGAIN?

2009-08-24 Thread stan

On Mon, Aug 24, 2009 at 09:34:50PM +0100, Owain Ainsworth wrote:
> On Mon, Aug 24, 2009 at 04:25:39PM -0400, stan wrote:
> >  I am trying to get Amanda of a recent vintage working on 4.5. The developrs
> >  don't understand how writing to a blocking pipe can return EAGAIN.
> >  Should the write just block?
> 
> Are they using pthreads?

I am fairly certain not.

-- 
One of the main causes of the fall of the roman empire was that, lacking
zero, they had no way to indicate successful termination of their C
programs.

Re: How can a write to a blocking pipe return EAGAIN?

2009-08-24 Thread stan

On Mon, Aug 24, 2009 at 10:39:21PM +0200, Bret S. Lambert wrote:
> On Mon, Aug 24, 2009 at 04:25:39PM -0400, stan wrote:
> >  I am trying to get Amanda of a recent vintage working on 4.5. The developrs
> >  don't understand how writing to a blocking pipe can return EAGAIN.
> >  Should the write just block?
> 
> Yes, it should, and a quick, about-to-go-to-bed readthrough of the
> pipe write path shows that it shouldn't; is Amanda doing fcntls to
> the fd? If so, check those to be absolutely sure that they're not
> doing something hinkey.
> 
Thanks for the quick reply.

I am trying to come up with a small demonstration peice of code to try to
localize teh problem.

-- 
One of the main causes of the fall of the roman empire was that, lacking
zero, they had no way to indicate successful termination of their C
programs.

Re: Backup issues with OpenBSD 4.5 machines

2009-08-24 Thread stan

On Fri, Aug 21, 2009 at 09:57:36AM -0600, John Hein wrote:
> stan wrote at 10:56 -0400 on Aug 21, 2009:
>  > OK here is the latest on this saga :-)
>  > 
>  > On one of the OpenBSD 4.5 machines I have built 2.5.0p1, and was able to
>  > back this machine up successfully (using classic UDP based authentication)
>  > 
>  > On another of them, I built 2.5.2p1. The first attempt to back this machine
>  > up failed. I checked the log files, and found they were having issues
>  > because /etc/amdates was missing. I corrected that, and started a 2nd
>  > backup run. (Remember amcheck reports all is well with this machine). I 
>  > got the following from amstatus when I attempted to back up this machine.
>  > Also remember, one of the test I ran with a 2.6.1 client was to connect a
>  > test machine directly to the client, using a crossover cable to eliminate
>  > any firewall, or router type issues.
>  > 
>  > I am attaching, what I think is, the amadnad debug file associated with 
> this
>  > failure.
>  > 
>  > Can anyone suggest what I can do to further troubleshoot this?
>  > 
>  > pb48:wd0f 1  dumper: [could not connect DATA stream:
>  > can't connect stream to pb48.meadwestvaco.com port 11996: Connection
>  > refused] (10:37:27)
>  > 
>.
>.
>.
>  > amandad: time 30.019: stream_accept: timeout after 30 seconds
>  > amandad: time 30.019: security_stream_seterr(0x86b67000, can't accept new 
> stream connection: No such file or directory)
>  > amandad: time 30.019: stream 0 accept failed: unknown protocol error
>  > amandad: time 30.019: security_stream_close(0x86b67000)
>  > amandad: time 60.027: stream_accept: timeout after 30 seconds
>  > amandad: time 60.027: security_stream_seterr(0x81212000, can't accept new 
> stream connection: No such file or directory)
>  > amandad: time 60.027: stream 1 accept failed: unknown protocol error
>  > amandad: time 60.027: security_stream_close(0x81212000)
>  > amandad: time 90.035: stream_accept: timeout after 30 seconds
>  > amandad: time 90.036: security_stream_seterr(0x84877000, can't accept new 
> stream connection: No such file or directory)
>  > amandad: time 90.036: stream 2 accept failed: unknown protocol error
>  > amandad: time 90.036: security_stream_close(0x84877000)
>  > amandad: time 90.036: security_close(handle=0x81bbf800, driver=0x298a9240 
> (BSD))
>  > amandad: time 120.044: pid 17702 finish time Fri Aug 21 10:39:27 2009
> 
> For some reason the socket is not getting marked ready for read.
> select(2) is timing out waiting.  Firewall setup perhaps?
> 
> This bit of code in 2.5.2p1's common-src/stream.c is where
> the failure is happening for you...
> 
> int
> stream_accept(
> int server_socket,
> int timeout,
> size_t sendsize,
> size_t recvsize)
> {
> SELECT_ARG_TYPE readset;
> struct timeval tv;
> int nfound, connected_socket;
> int save_errno;
> int ntries = 0;
> in_port_t port;
> 
> assert(server_socket >= 0);
> 
> do {
> ntries++;
> memset(&tv, 0, SIZEOF(tv));
> tv.tv_sec = timeout;
> memset(&readset, 0, SIZEOF(readset));
> FD_ZERO(&readset);
> FD_SET(server_socket, &readset);
> nfound = select(server_socket+1, &readset, NULL, NULL, &tv);
> if(nfound <= 0 || !FD_ISSET(server_socket, &readset)) {
> save_errno = errno;
> if(nfound < 0) {
> dbprintf(("%s: stream_accept: select() failed: %s\n",
>   debug_prefix_time(NULL),
>   strerror(save_errno)));
> } else if(nfound == 0) {
> dbprintf(("%s: stream_accept: timeout after %d second%s\n",
>   debug_prefix_time(NULL),
>   timeout,
>   (timeout == 1) ? "" : "s"));
> errno = ENOENT; /* ??? */
> return -1;
The firts thing I notice when comparing this function in 2.5.0 vs 2.5.2 is
that 2.5.0 does:

tv.tv_usec = 0;

and 2.5.2 does not. Could thim make a difference? Both do 

tv.tv_sec = timeout;

-- 
One of the main causes of the fall of the roman empire was that, lacking
zero, they had no way to indicate successful termination of their C
programs.

[no subject]

2009-08-24 Thread Robert Reilly

Hello List

I am using amvault to copy from disk to tape it works great The one
problem I see it does not record the label and barcode like it does if
it was a backup. I am using the chg-zd-mtx script

Any thoughts ?

 

 

 

 

 

 

 

<>BEGIN:VCARD
VERSION:2.1
X-MS-SIGNATURE:YES
N;LANGUAGE=en-us:Reilly;Robert
FN:Robert Reilly
ORG:Fresh Direct
TITLE:Director IT / Systems Engineering
TEL;WORK;VOICE:(718) 928-1172
TEL;HOME;VOICE:(203) 938-2838
TEL;CELL;VOICE:(203) 297-3653
ADR;WORK;PREF:;;23-30 Borden Ave;Long Island City;NY.;11101;United States of America
LABEL;WORK;PREF;ENCODING=QUOTED-PRINTABLE:23-30 Borden Ave=0D=0A=
Long Island City, NY. 11101
ADR;HOME:;;20 White Birch Road;Redding;CT.;06896;United States of America
LABEL;HOME;ENCODING=QUOTED-PRINTABLE:20 White Birch Road=0D=0A=
Redding, CT. 06896
X-MS-OL-DEFAULT-POSTAL-ADDRESS:2
URL;WORK:http://www.freshdirect.com
EMAIL;PREF;INTERNET:rrei...@freshdirect.com
X-MS-OL-DESIGN;CHARSET=utf-8:http://schemas.microsoft.com/office/outlook/12/electronicbusinesscards"; ver="1.0" layout="left" bgcolor="ff">WorkMobileHome
REV:20090820T181709Z
END:VCARD

Re: Backup issues with OpenBSD 4.5 machines

2009-08-24 Thread stan

On Mon, Aug 24, 2009 at 02:01:09PM -0400, Jean-Louis Martineau wrote:
> 
> You can also try the attached, it check the pipe are opened in blocking 
> mode.
> 
OK, I applied that to 2.61 (after figuriing out it would not apply to 2.5.2
:-)).

Of 4 DLE's on the test OpenBSD machine 3failed with PARTIAL, one tiny one
syas complete, and the only signifcant;y sized on seems to be stuck forever
at 0.04%, at least that's what amstatus has been reporting for 15 minutes.

I grep'ed the amanda debug files for pipe (that's what the new error
messages start with right?) and did not find any instances of this.

I can send this debug files, if it will help.

-- 
One of the main causes of the fall of the roman empire was that, lacking
zero, they had no way to indicate successful termination of their C
programs.

Re: failure of backups after reboot

2009-08-24 Thread Jean-Louis Martineau


Chris Hoogendyk wrote:

planner: create debug directory "/tmp/amanda": File exists
It's a race, both planner and driver tried to create the directory at 
the same time.

Next run should be fine.

Jean-Louis

Re: failure of backups after reboot

2009-08-24 Thread Chris Hoogendyk




Jean-Louis Martineau wrote:

Chris Hoogendyk wrote:

 driver: FATAL Did not get DATE line from planner


The planner crashed, what's in the amdump.1 log file?


eclogite:/usr/local/etc/amanda/daily/logs:root# more amdump.1
amdump: start at Mon Aug 24 00:45:00 EDT 2009
amdump: datestamp 20090824
amdump: starttime 20090824004500
planner: create debug directory "/tmp/amanda": File exists
driver: pid 12625 executable /usr/local/libexec/driver version 2.5.1p3
driver: Did not get DATE line from planner
amdump: end at Mon Aug 24 00:45:01 EDT 2009
Scanning /var/spool/amanda/disk2...
Scanning /var/spool/amanda/disk1...
Scanning /var/spool/amanda/disk3...
 lost+found:   recover: skipping cruft directory, perhaps you should delete it.



--
---

Chris Hoogendyk

-
  O__   Systems Administrator
 c/ /'_ --- Biology & Geology Departments
(*) \(*) -- 140 Morrill Science Center
~~ - University of Massachusetts, Amherst 




--- 


Erdös 4

Re: Backup issues with OpenBSD 4.5 machines

2009-08-24 Thread stan

On Mon, Aug 24, 2009 at 02:01:09PM -0400, Jean-Louis Martineau wrote:
> 
> You can also try the attached, it check the pipe are opened in blocking 
> mode.
> 

I will try to test that today.

Thanks.

-- 
One of the main causes of the fall of the roman empire was that, lacking
zero, they had no way to indicate successful termination of their C
programs.

Re: failure of backups after reboot

2009-08-24 Thread Jean-Louis Martineau


Chris Hoogendyk wrote:

 driver: FATAL Did not get DATE line from planner


The planner crashed, what's in the amdump.1 log file?

Re: Backup issues with OpenBSD 4.5 machines

2009-08-24 Thread stan

On Mon, Aug 24, 2009 at 02:01:09PM -0400, Jean-Louis Martineau wrote:
> This bug can't be fixed until we understand it.

Agreed.
> 
> Ask on a OpenBSD list how a write to a blocking pipe can return EAGAIN.
> Or the pipe semantics changed and they don't default to blocking.

I already posted the code snipet to the OpenBSD list, but no one has
commented on it yet. I have found, in the past, that having a simple
reproducable free standing code snippet that demonstrates the problem, can
often result in a better result on mailing lists.

I have not had time to try to understnad the code well enough to produce
such an example, yet.
> 

-- 
One of the main causes of the fall of the roman empire was that, lacking
zero, they had no way to indicate successful termination of their C
programs.

failure of backups after reboot

2009-08-24 Thread Chris Hoogendyk

This is a recently configured Amanda 2.5.1p3 setup on a group of E250's
running Solaris 9. It has been running smoothly for a few weeks now.
Just before building Amanda, I had done a reconfigure reboot to get
drivers running for a new tape library. This weekend, after the weekend
backups had completed successfully and everything had flushed out to
tape, I did a `touch /reconfigure; reboot` to get a new drive recognized
(should have been a hot swappable drive, but seagate has changed their
firmware so it doesn't work the way it once did). Anyway, all the
services came up alright as far as I could tell. This is actually a
departmental server with sendmail, apache, samba, etc. all running on it
(they couldn't afford the parts to set up a dedicated backup server).

When I looked at the backup reports this morning, this server had:

*** THE DUMPS DID NOT FINISH PROPERLY!

The next tape Amanda expects to use is: geo-daily-05.
The next new tape already labelled is: geo-daily-03.

FAILURE AND STRANGE DUMP SUMMARY:
eclogite.geo.umass.edu / RESULTS MISSING
..
eclogite.geo.umass.edu /var/mail RESULTS MISSING
gis.geo.umass.edu / RESULTS MISSING
..
gis.geo.umass.edu /usr/local RESULTS MISSING
mogollon.geo.umass.edu / RESULTS MISSING
..
mogollon.geo.umass.edu /usr/local RESULTS MISSING
driver: FATAL Did not get DATE line from planner

The debug files were sparce:

eclogite:/:root# cd /tmp/amanda
eclogite:/tmp/amanda:root# ls -l
total 16
drwx-- 3 amandabackup amandabackup 179 Aug 24 00:45 server
eclogite:/tmp/amanda:root# cd server
eclogite:/tmp/amanda/server:root# ls -l
total 16
drwx-- 2 amandabackup amandabackup 546 Aug 24 00:45 daily
eclogite:/tmp/amanda/server:root# cd daily
eclogite:/tmp/amanda/server/daily:root# ls -l
total 80
-rw-r- 1 amandabackup amandabackup 223 Aug 24 00:45
amlogroll.20090824004501.debug
-rw-r- 1 amandabackup amandabackup 220 Aug 24 00:45
amreport.20090824004501.debug
-rw-r- 1 amandabackup amandabackup3875 Aug 24 00:45
amtrmidx.20090824004501.debug
-rw-r- 1 amandabackup amandabackup 286 Aug 24 00:45
amtrmlog.20090824004501.debug
-rw-r- 1 amandabackup amandabackup 281 Aug 24 00:45
driver.20090824004500.debug
eclogite:/tmp/amanda/server/daily:root# more driver.20090824004500.debug
driver: debug 1 pid 12625 ruid 555 euid 555: start at Mon Aug 24 00:45:00 2009

driver: debug 1 pid 12625 ruid 555 euid 555: rename at Mon Aug 24 00:45:01 2009
driver: time 0.017: Did not get DATE line from planner
driver: time 0.017: pid 12625 finish time Mon Aug 24 00:45:01 2009
eclogite:/tmp/amanda/server/daily:root# date

And there were no debug files on either of the clients for that
date/time. The cronlog entry shows a return code of 8:

CMD: /usr/local/sbin/amdump daily -o reserve=100 -o tapedev="" -o tpchanger=""
amandaba 12613 c Mon Aug 24 00:45:00 2009

< amandaba 12613 c Mon Aug 24 00:45:02 2009 rc=8

Paradoxically, when I ran an amcheck this morning to see if I could get
more detail, amcheck reported no problems. I've looked over everything I
can think of. Since the critical event here is a server reboot, I
particularly looked for things that might have been affected by that --
supposing, for example, I had set something up by hand, and on a reboot
it didn't come back properly. But, amanda isn't susceptible to too much
in that respect, since it runs off cron and isn't dependent on running
daemons, etc. Various mount points and permissions seemed to be alright,
/tmp/amanda was getting debug logs.

I suppose I could have tweaked the amanda entries for inetd incorrectly
and forgot to HUP inetd. Then the reboot would cause it to fail. But the
entries present match my other amanda servers. /etc/inetd.conf has the
following entry:

amanda dgram udp wait amandabackup /usr/local/libexec/amandad amandad -auth=bsd amdump amindexd amidxtaped

and /etc/services has these entries:

amanda 10080/udp
amandaidx 10082/tcp
amidxtape 10083/tcp

And, amcheck reports no problems right now (and in fact produces debug
logs on each of the clients in /tmp/amanda). So, to repeat myself --
this setup has been running smoothly for a couple of weeks. The reboot
borked it. I'm stuck. Might it just work tonight? I kind of doubt it.
Though something may have gotten poked by the amdump that failed and/or
by the amcheck that succeeded.

Here's the output from the amcheck:

eclogite:/usr/local/etc/amanda/daily$ amcheck daily
Amanda Tape Server Host Check
-
Holding disk /var/spool/amanda/disk2: 3631598 KB disk space available, using
2095598 KB
Holding disk /var/spool/amanda/disk1: 3631598 KB disk space available, using
2095598 KB
Holding disk /var/spool/amanda/disk3: 234427600 KB disk space available, using
223941840 KB

Re: Error redirecting stderr to fd 52

2009-08-24 Thread John Hein

Jean-Louis Martineau wrote at 11:48 -0400 on Aug 24, 2009:
 > John Hein wrote:
 > > On a 2.6.1b1 client ...
 > >   
 > Hmm, beta software ...
 > 
 > It's not fixed in 2.6.1 neither in 2.6.1p1.
 > 
 > You must use the latest 2.6.1p1 snapshot from 
 > http://www.zmanda.com/community-builds.php

Building a new version now.

Another interesting note.  On the client in question, amandad is still
running, but it shouldn't be.  It's got a couple unreaped zombies and
is waiting in select.

ps awwx -o user,pid,ppid,start,stat,wchan,command | grep backup
backup   37628 60010  4:43AM Z-  
backup   37629 60010  4:43AM Z-  
backup   60010 23993  7:01PM Ss   select amandad


lsof shows quite a few file descriptors still open.

COMMAND   PID   USER   FD   TYPE   DEVICE SIZE/OFFNODE NAME
amandad 60010 backup0u  IPv4   0xc87f5438  0t0 UDP *:amanda
amandad 60010 backup1u  IPv4   0xc87f5438  0t0 UDP *:amanda
amandad 60010 backup2u  IPv4   0xc87f5438  0t0 UDP *:amanda
amandad 60010 backup4u  IPv4   0xcdd09bf4  0t0 UDP *:58068
amandad 60010 backup   10w  VREG 0,88   107863  331113 / -- 
amandad/amandad.20090823190102.debug
amandad 60010 backup   21u  PIPE   0xcb0064c816384 ->0xcb006580
amandad 60010 backup   22u  PIPE   0xc916a99016384 ->0xc916aa48
amandad 60010 backup   23u  PIPE   0xcd62999016384 ->0xcd629a48
amandad 60010 backup   24u  PIPE   0xce8a899016384 ->0xce8a8a48
amandad 60010 backup   28u  PIPE   0xc86b47f816384 ->0xc86b48b0
amandad 60010 backup   29u  IPv4   0xce6f63a0  0t0 TCP *:6108 (LISTEN)
amandad 60010 backup   30u  PIPE   0xc8d2f33016384 ->0xc8d2f3e8
amandad 60010 backup   31u  PIPE   0xc946619816384 ->0xc9466250
amandad 60010 backup   33u  PIPE   0xca15c33016384 ->0xca15c3e8
amandad 60010 backup   37u  IPv4   0t0 TCP no PCB, 
CANTSENDMORE, CANTRCVMORE
amandad 60010 backup   39u  PIPE   0xcdeb4cc016384 ->0xcdeb4d78
amandad 60010 backup   40u  PIPE   0xc9c307f816384 ->0xc9c308b0
amandad 60010 backup   42u  PIPE   0xc96317f816384 ->0xc96318b0
amandad 60010 backup   44u  PIPE   0xc9c3066016384 ->0xc9c30718
amandad 60010 backup   46u  PIPE   0xc9c1d33016384 ->0xc9c1d3e8
amandad 60010 backup   47u  PIPE   0xc914e7180 ->0xc914e660
amandad 60010 backup   48u  PIPE   0xcc4dd4c816384 ->0xcc4dd580
amandad 60010 backup   49u  PIPE   0xc96be7f816384 ->0xc96be8b0
amandad 60010 backup   50u  PIPE   0xce8a833016384 ->0xce8a83e8
amandad 60010 backup   51u  PIPE   0xc914e5800 ->0xc914e4c8
amandad 60010 backup   52u  PIPE   0xc914ecc016384 ->0xc914ed78
amandad 60010 backup   60u  PIPE   0xc8e1033016384 ->0xc8e103e8
amandad 60010 backup   61u  PIPE   0xcc37fb2816384 ->0xcc37fbe0
amandad 60010 backup   62u  PIPE   0xce64966016384 ->0xce649718
amandad 60010 backup   63u  PIPE   0xcaa0099016384 ->0xcaa00a48
amandad 60010 backup   64u  PIPE   0xc912f66016384 ->0xc912f718
amandad 60010 backup   65u  PIPE   0xc90f1cc016384 ->0xc90f1d78

A few seconds of tracing the process shows..

 60010 amandad  RET   poll 0
 60010 amandad  CALL  gettimeofday(0xbfbfea88,0)
 60010 amandad  RET   gettimeofday 0
 60010 amandad  CALL  gettimeofday(0xbfbfea88,0)
 60010 amandad  RET   gettimeofday 0
 60010 amandad  CALL  poll(0x8052600,0x1,0x7530)

Re: Backup issues with OpenBSD 4.5 machines

2009-08-24 Thread Jean-Louis Martineau

This bug can't be fixed until we understand it.

Ask on a OpenBSD list how a write to a blocking pipe can return EAGAIN.
Or the pipe semantics changed and they don't default to blocking.

You can also try the attached, it check the pipe are opened in blocking 
mode.

Jean-Louis

stan wrote:

On Fri, Aug 21, 2009 at 01:23:29PM -0600, John Hein wrote:

stan wrote at 13:56 -0400 on Aug 21, 2009:
 > OK, I reproduced the failure with only a crossover cable between the test
 > client and the Amanda Master:

Just because you're using a crossover cable doesn't rule out firewall
or other such socket level interference.  I'm not saying that's your
problem, but using a crossover cable doesn't rule it out.

Fair enough, but I think we can rule that out in this case. First, I
control the 2 computersin question. The Master backus up 55 machines a day
taht consist of a diverse nix of OS'es, and Amanda versions. It's a Linux
machine, and I am not running a firwall on it. The client is an OpenBSD
machine. I use these for my firewalls, so I am familiar with configuring
firewalls on them, Doing so requires (as a minimum) running pf. I am not
running pf on the test machine at all.

I am not certain whether the sendbackup executable, or amandad opens the
port on the client side. My guess is that it is not suceding in doing this.
Thus when the Master tries to connect to it, there is no port for it to
connect to.

Index: amandad-src/amandad.c
===
--- amandad-src/amandad.c	(revision 2110)
+++ amandad-src/amandad.c	(working copy)
@@ -1517,6 +1517,7 @@
 struct active_service *as;
 pid_t pid;
 int newfd;
+int r;

 assert(security_handle != NULL);
 assert(cmd != NULL);
@@ -1535,15 +1536,40 @@
 	error(_("pipe: %s\n"), strerror(errno));
 	/*NOTREACHED*/
 	}
+	r = fcntl(data_read[i][0], F_GETFL, 0);
+	if (r & O_NONBLOCK) {
+	error(_("data_read[%d][0] is non blocking"), i);
+	}
+	r = fcntl(data_read[i][1], F_GETFL, 0);
+	if (r & O_NONBLOCK) {
+	error(_("data_read[%d][1] is non blocking"), i);
+	}
+
 	if (pipe(data_write[i]) < 0) {
 	error(_("pipe: %s\n"), strerror(errno));
 	/*NOTREACHED*/
 	}
+	r = fcntl(data_write[i][0], F_GETFL, 0);
+	if (r & O_NONBLOCK) {
+	error(_("data_write[%d][0] is non blocking"), i);
+	}
+	r = fcntl(data_write[i][1], F_GETFL, 0);
+	if (r & O_NONBLOCK) {
+	error(_("data_write[%d][1] is non blocking"), i);
+	}
 }
 if (pipe(data_write[STDERR_PIPE]) < 0) {
 	error(_("pipe: %s\n"), strerror(errno));
 	/*NOTREACHED*/
 }
+r = fcntl(data_write[STDERR_PIPE][0], F_GETFL, 0);
+if (r & O_NONBLOCK) {
+	error(_("data_write[%d][0] is non blocking"), STDERR_PIPE);
+}
+r = fcntl(data_write[STDERR_PIPE][1], F_GETFL, 0);
+if (r & O_NONBLOCK) {
+	error(_("data_write[%d][1] is non blocking"), STDERR_PIPE);
+}

 switch(pid = fork()) {
 case -1:

Re: Backup issues with OpenBSD 4.5 machines

2009-08-24 Thread Michael Burk

I'm using bsdtcp auth.
I'll try 2.5.1 today or tomorrow, to see if I can narrow down the range of
releases in which OpenBSD support broke.

-- Michael

On Mon, Aug 24, 2009 at 5:17 AM, stan  wrote:

> On Fri, Aug 21, 2009 at 03:53:13PM -0600, Michael Burk wrote:
> > Stan's not alone on this one. I have two OpenBSD 4.5 machines also on Sun
> > SPARC hardware. I had this same trouble a couple months ago with 2.6.1,
> but
> > didn't have time to look deeper. After seeing this discussion, I built
> > amanda-2.6.2alpha-20090812 (can't get 0820 to compile). I'm getting
> exactly
> > the same errors. I tried dump with and without compression.
> >
> > My Amanda server is on Solaris 10, and I have a mix of Solaris, Linux,
> and
> > OpenBSD systems. Except for the above-mentioned build, everything is at
> > 2.6.1p1. There are no firewalls on the Amanda server or the OpenBSD
> clients.
> >
> > I know "me too" responses aren't usually helpful, but maybe this will
> rule
> > out the firewall question, at least.
> >
> >
>
> Thanks for confirming the problem report.
>
> Any cn
> hance of you trying 2.500, and 2.5.2 to see if the first works, and the 2nd
> doesn't? Are you using bsd auth?
>
> --
> One of the main causes of the fall of the roman empire was that, lacking
> zero, they had no way to indicate successful termination of their C
> programs.
>

Re: Error redirecting stderr to fd 52

2009-08-24 Thread Jean-Louis Martineau


John Hein wrote:

On a 2.6.1b1 client ...
  

Hmm, beta software ...

It's not fixed in 2.6.1 neither in 2.6.1p1.

You must use the latest 2.6.1p1 snapshot from 
http://www.zmanda.com/community-builds.php


Jean-Louis

Error redirecting stderr to fd 52

2009-08-24 Thread John Hein

On a 2.6.1b1 client ...

1251090802.506210: sendbackup: pid 61161 ruid 5001 euid 5001 version 2.6.1b1: 
start at Sun Aug 23 23:13:22 2009
1251090802.506278: sendbackup: Version 2.6.1b1
1251090802.511032: sendbackup: pid 61161 ruid 5001 euid 5001 version 2.6.1b1: 
rename at Sun Aug 23 23:13:22 2009
1251090802.511055: sendbackup:   sendbackup req: 
1251090802.511102: sendbackup:   Parsed request as: program `GNUTAR'
1251090802.511109: sendbackup:  disk `/data'
1251090802.53: sendbackup:  device `/data'
1251090802.58: sendbackup:  level 1
1251090802.511123: sendbackup:  since 2009:8:21:6:55:50
1251090802.511128: sendbackup:  options 
`|;auth=bsd;compress-fast;index;exclude-list=/site/etc/amanda/exclude-gtar;'
1251090802.511203: sendbackup: Error redirecting stderr to fd 52: Bad file 
descriptor
1251090802.511215: sendbackup: pid 61161 finish time Sun Aug 23 23:13:22 2009


Has anyone ever seen that?

This is on a client with about a dozen DLEs with possibly 3
dumping in parallel at a time.  The DLE in question is
not small - certainly not so small to complete in 5 ms.

This DLE sometimes works.  Sometimes a different one fails the same
way.

It looks like the mesgfd in client-src/sendbackup.c is getting
closed before dup2(2) runs.  Perhaps a race.

amandad log (below) shows no obvious trouble other than it is continuing to do
work after the child sendbackup has failed (as seen in the log output above).

The two security_stream_close messages seem to be different than the
log messages associated with DLEs that worked.  The working ones have
three security_stream_close messages.  But, oddly, the DLE that worked
(and was small) right _before_ the failed DLE did _not_ have any
security_stream_close messages.  Possibly a clue.  In fact, it
seems all the failures are happening right after a small (< 10 MB),
and thus quick, dump.  Could just be a coincidence.

1251090802.228997: amandad: dgram_recv(dgram=0x280c2a04, timeout=0, 
fromaddr=0x280d29f0)
1251090802.229037: amandad: (sockaddr_in *)0x280d29f0 = { 2, 703, 
206.168.13.161 }
1251090802.229055: amandad: security_handleinit(handle=0x8052600, 
driver=0x280bc520 (BSD))
1251090802.235787: amandad: accept recv REQ pkt:
<
SERVICE sendbackup
OPTIONS features=9ffe00;hostname=bunny;config=test;
GNUTAR /data  1 2009:8:21:6:55:50 OPTIONS 
|;auth=bsd;compress-fast;index;exclude-list=/site/etc/amanda/exclude-gtar;
>
1251090802.237043: amandad: creating new service: sendbackup
OPTIONS features=9ffe00;hostname=bunny;config=test;
GNUTAR /data  1 2009:8:21:6:55:50 OPTIONS 
|;auth=bsd;compress-fast;index;exclude-list=/site/etc/amanda/exclude-gtar;

1251090802.237710: amandad: sending ACK pkt:
<
>
1251090802.237764: amandad: dgram_send_addr(addr=0x8052620, dgram=0x280c2a04)
1251090802.237772: amandad: (sockaddr_in *)0x8052620 = { 2, 703, 206.168.13.161 
}
1251090802.237779: amandad: dgram_send_addr: 0x280c2a04->socket = 0
1251090802.511364: amandad: security_streaminit(stream=0x81dd000, 
driver=0x280bc520 (BSD))
1251090802.511719: amandad: stream_server opening socket with family 2 
(requested family was 2)
1251090802.511736: amandad: try_socksize: send buffer size is 65536
1251090802.511743: amandad: try_socksize: receive buffer size is 65536
1251090802.512604: amandad: bind_portrange2: Try  port 6108: Available - Success
1251090802.512617: amandad: stream_server: waiting for connection: 0.0.0.0.6108
1251090802.512643: amandad: security_streaminit(stream=0x81e6000, 
driver=0x280bc520 (BSD))
1251090802.512658: amandad: stream_server opening socket with family 2 
(requested family was 2)
1251090802.512669: amandad: try_socksize: send buffer size is 65536
1251090802.512677: amandad: try_socksize: receive buffer size is 65536
1251090802.513496: amandad: bind_portrange2: Try  port 6108: Available - 
Address already in use
1251090802.514300: amandad: bind_portrange2: Try  port 6109: Available - Success
1251090802.514311: amandad: stream_server: waiting for connection: 0.0.0.0.6109
1251090802.514319: amandad: security_streaminit(stream=0x81ef000, 
driver=0x280bc520 (BSD))
1251090802.514333: amandad: stream_server opening socket with family 2 
(requested family was 2)
1251090802.514344: amandad: try_socksize: send buffer size is 65536
1251090802.514351: amandad: try_socksize: receive buffer size is 65536
1251090802.515189: amandad: bind_portrange2: Try  port 6108: Available - 
Address already in use
1251090802.515991: amandad: bind_portrange2: Try  port 6109: Available - 
Address already in use
1251090802.516772: amandad: bind_portrange2: Skip port 6110: Owned by softcm.
1251090802.517540: amandad: bind_portrange2: Skip port 6111: Owned by spc.
1251090802.518336: amandad: bind_portrange2: Try  port 6112: Available - Success
1251090802.518348: amandad: stream_server: waiting for connection: 0.0.0.0.6112
1251090802.518355: amandad: sen

Re: Backup issues with OpenBSD 4.5 machines

2009-08-24 Thread stan

On Fri, Aug 21, 2009 at 03:53:13PM -0600, Michael Burk wrote:
> Stan's not alone on this one. I have two OpenBSD 4.5 machines also on Sun
> SPARC hardware. I had this same trouble a couple months ago with 2.6.1, but
> didn't have time to look deeper. After seeing this discussion, I built
> amanda-2.6.2alpha-20090812 (can't get 0820 to compile). I'm getting exactly
> the same errors. I tried dump with and without compression.
> 
> My Amanda server is on Solaris 10, and I have a mix of Solaris, Linux, and
> OpenBSD systems. Except for the above-mentioned build, everything is at
> 2.6.1p1. There are no firewalls on the Amanda server or the OpenBSD clients.
> 
> I know "me too" responses aren't usually helpful, but maybe this will rule
> out the firewall question, at least.
> 
> 

Thanks for confirming the problem report. 

Any cn
hance of you trying 2.500, and 2.5.2 to see if the first works, and the 2nd
doesn't? Are you using bsd auth?

-- 
One of the main causes of the fall of the roman empire was that, lacking
zero, they had no way to indicate successful termination of their C
programs.

Re: New verbosity in the log when running amcheck after 2.6.2alpha20090824

New verbosity in the log when running amcheck after 2.6.2alpha20090824

Re: Backup issues with OpenBSD 4.5 machines

Re: How can a write to a blocking pipe return EAGAIN?

Re: How can a write to a blocking pipe return EAGAIN?

Re: Backup issues with OpenBSD 4.5 machines

[no subject]

Re: Backup issues with OpenBSD 4.5 machines

Re: failure of backups after reboot

Re: failure of backups after reboot

Re: Backup issues with OpenBSD 4.5 machines

Re: failure of backups after reboot

Re: Backup issues with OpenBSD 4.5 machines

failure of backups after reboot

Re: Error redirecting stderr to fd 52

Re: Backup issues with OpenBSD 4.5 machines

Re: Backup issues with OpenBSD 4.5 machines

Re: Error redirecting stderr to fd 52

Error redirecting stderr to fd 52

Re: Backup issues with OpenBSD 4.5 machines

20 matches

Site Navigation

Mail list logo

Footer information