Re: New verbosity in the log when running amcheck after 2.6.2alpha20090824
:35:35 coyote xinetd[2394]: Swapping defaults >Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amanda >Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amandaidx >Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amidxtape >Aug 24 10:35:35 coyote xinetd[2394]: Reconfigured: new=0 old=3 dropped=0 >(services) >Aug 24 10:35:35 coyote xinetd[2394]: Starting reconfiguration >Aug 24 10:35:35 coyote xinetd[2394]: Swapping defaults >Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amanda >Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amandaidx >Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amidxtape >Aug 24 10:35:35 coyote xinetd[2394]: Reconfigured: new=0 old=3 dropped=0 >(services) >Aug 24 10:35:35 coyote xinetd[2394]: Starting reconfiguration >Aug 24 10:35:35 coyote xinetd[2394]: Swapping defaults >Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amanda >Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amandaidx >Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amidxtape >Aug 24 10:35:35 coyote xinetd[2394]: Reconfigured: new=0 old=3 dropped=0 >(services) >Aug 24 10:35:36 coyote xinetd[2394]: Starting reconfiguration >Aug 24 10:35:36 coyote xinetd[2394]: Swapping defaults >Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amanda >Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amandaidx >Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amidxtape >Aug 24 10:35:36 coyote xinetd[2394]: Reconfigured: new=0 old=3 dropped=0 >(services) >Aug 24 10:35:36 coyote xinetd[2394]: Starting reconfiguration >Aug 24 10:35:36 coyote xinetd[2394]: Swapping defaults >Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amanda >Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amandaidx >Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amidxtape >Aug 24 10:35:36 coyote xinetd[2394]: Reconfigured: new=0 old=3 dropped=0 >(services) >Aug 24 10:35:36 coyote xinetd[2394]: Starting reconfiguration >Aug 24 10:35:36 coyote xinetd[2394]: Swapping defaults >Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amanda >Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amandaidx >Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amidxtape >Aug 24 10:35:36 coyote xinetd[2394]: Reconfigured: new=0 old=3 dropped=0 >(services) >Aug 24 10:35:36 coyote xinetd[2394]: Starting reconfiguration >Aug 24 10:35:36 coyote xinetd[2394]: Swapping defaults >Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amanda >Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amandaidx >Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amidxtape >Aug 24 10:35:36 coyote xinetd[2394]: Reconfigured: new=0 old=3 dropped=0 >(services) >Aug 24 10:35:36 coyote xinetd[2394]: Starting reconfiguration >Aug 24 10:35:36 coyote xinetd[2394]: Swapping defaults >Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amanda >Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amandaidx >Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amidxtape >Aug 24 10:35:36 coyote xinetd[2394]: Reconfigured: new=0 old=3 dropped=0 >(services) >Aug 24 10:35:36 coyote xinetd[2394]: Starting reconfiguration >Aug 24 10:35:36 coyote xinetd[2394]: Swapping defaults >Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amanda >Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amandaidx >Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amidxtape >Aug 24 10:35:36 coyote xinetd[2394]: Reconfigured: new=0 old=3 dropped=0 >(services) >Aug 24 10:35:36 coyote xinetd[2394]: Starting reconfiguration >Aug 24 10:35:36 coyote xinetd[2394]: Swapping defaults >Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amanda >Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amandaidx >Aug 24 10:35:36 coyote xinetd[2394]: readjusting service amidxtape >Aug 24 10:35:36 coyote xinetd[2394]: Reconfigured: new=0 old=3 dropped=0 >(services) >Aug 24 10:35:38 coyote xinetd[2394]: Starting reconfiguration >Aug 24 10:35:38 coyote xinetd[2394]: Swapping defaults >Aug 24 10:35:38 coyote xinetd[2394]: readjusting service amanda >Aug 24 10:35:38 coyote xinetd[2394]: readjusting service amandaidx >Aug 24 10:35:38 coyote xinetd[2394]: readjusting service amidxtape >Aug 24 10:35:38 coyote xinetd[2394]: Reconfigured: new=0 old=3 dropped=0 >(services) > >Then this afternoon, 2 more lines I don't recall seeing before: > >Aug 24 16:58:45 coyote xinetd[2394]: START: amanda pid=13481 >from=:::192.168.71.3 >Aug 24 16:59:15 coyote xinetd[2394]: EXIT: amanda status=0 pid=13481 >duration=30(sec) > >Anybody have a clue I could borrow? > >Thank you. Another set of oddities: [r...@coyote am
New verbosity in the log when running amcheck after 2.6.2alpha20090824
Greetings; This doesn't look terribly comforting, and occurred apparently at the amcheck phase of a script I wrote to install new amanda snapshots: Aug 24 10:35:34 coyote xinetd[2394]: Starting reconfiguration Aug 24 10:35:34 coyote xinetd[2394]: Swapping defaults Aug 24 10:35:34 coyote xinetd[2394]: readjusting service amanda Aug 24 10:35:34 coyote xinetd[2394]: readjusting service amandaidx Aug 24 10:35:34 coyote xinetd[2394]: readjusting service amidxtape Aug 24 10:35:34 coyote xinetd[2394]: Reconfigured: new=0 old=3 dropped=0 (services) Aug 24 10:35:35 coyote xinetd[2394]: Starting reconfiguration Aug 24 10:35:35 coyote xinetd[2394]: Swapping defaults Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amanda Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amandaidx Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amidxtape Aug 24 10:35:35 coyote xinetd[2394]: Reconfigured: new=0 old=3 dropped=0 (services) Aug 24 10:35:35 coyote xinetd[2394]: Starting reconfiguration Aug 24 10:35:35 coyote xinetd[2394]: Swapping defaults Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amanda Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amandaidx Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amidxtape Aug 24 10:35:35 coyote xinetd[2394]: Reconfigured: new=0 old=3 dropped=0 (services) Aug 24 10:35:35 coyote xinetd[2394]: Starting reconfiguration Aug 24 10:35:35 coyote xinetd[2394]: Swapping defaults Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amanda Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amandaidx Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amidxtape Aug 24 10:35:35 coyote xinetd[2394]: Reconfigured: new=0 old=3 dropped=0 (services) Aug 24 10:35:35 coyote xinetd[2394]: Starting reconfiguration Aug 24 10:35:35 coyote xinetd[2394]: Swapping defaults Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amanda Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amandaidx Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amidxtape Aug 24 10:35:35 coyote xinetd[2394]: Reconfigured: new=0 old=3 dropped=0 (services) Aug 24 10:35:35 coyote xinetd[2394]: Starting reconfiguration Aug 24 10:35:35 coyote xinetd[2394]: Swapping defaults Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amanda Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amandaidx Aug 24 10:35:35 coyote xinetd[2394]: readjusting service amidxtape
Re: Backup issues with OpenBSD 4.5 machines
stan wrote at 16:59 -0400 on Aug 24, 2009: > The firts thing I notice when comparing this function in 2.5.0 vs 2.5.2 is > that 2.5.0 does: > > tv.tv_usec = 0; > > and 2.5.2 does not. Could thim make a difference? Both do > > tv.tv_sec = timeout; In 2.5.2, the memset sets the entire struct to 0. 2.5.0 is slightly more efficient, but otherwise the results wind up being the same. Nothing to see there.
Re: How can a write to a blocking pipe return EAGAIN?
On Mon, Aug 24, 2009 at 09:34:50PM +0100, Owain Ainsworth wrote: > On Mon, Aug 24, 2009 at 04:25:39PM -0400, stan wrote: > > I am trying to get Amanda of a recent vintage working on 4.5. The developrs > > don't understand how writing to a blocking pipe can return EAGAIN. > > Should the write just block? > > Are they using pthreads? I am fairly certain not. -- One of the main causes of the fall of the roman empire was that, lacking zero, they had no way to indicate successful termination of their C programs.
Re: How can a write to a blocking pipe return EAGAIN?
On Mon, Aug 24, 2009 at 10:39:21PM +0200, Bret S. Lambert wrote: > On Mon, Aug 24, 2009 at 04:25:39PM -0400, stan wrote: > > I am trying to get Amanda of a recent vintage working on 4.5. The developrs > > don't understand how writing to a blocking pipe can return EAGAIN. > > Should the write just block? > > Yes, it should, and a quick, about-to-go-to-bed readthrough of the > pipe write path shows that it shouldn't; is Amanda doing fcntls to > the fd? If so, check those to be absolutely sure that they're not > doing something hinkey. > Thanks for the quick reply. I am trying to come up with a small demonstration peice of code to try to localize teh problem. -- One of the main causes of the fall of the roman empire was that, lacking zero, they had no way to indicate successful termination of their C programs.
Re: Backup issues with OpenBSD 4.5 machines
On Fri, Aug 21, 2009 at 09:57:36AM -0600, John Hein wrote: > stan wrote at 10:56 -0400 on Aug 21, 2009: > > OK here is the latest on this saga :-) > > > > On one of the OpenBSD 4.5 machines I have built 2.5.0p1, and was able to > > back this machine up successfully (using classic UDP based authentication) > > > > On another of them, I built 2.5.2p1. The first attempt to back this machine > > up failed. I checked the log files, and found they were having issues > > because /etc/amdates was missing. I corrected that, and started a 2nd > > backup run. (Remember amcheck reports all is well with this machine). I > > got the following from amstatus when I attempted to back up this machine. > > Also remember, one of the test I ran with a 2.6.1 client was to connect a > > test machine directly to the client, using a crossover cable to eliminate > > any firewall, or router type issues. > > > > I am attaching, what I think is, the amadnad debug file associated with > this > > failure. > > > > Can anyone suggest what I can do to further troubleshoot this? > > > > pb48:wd0f 1 dumper: [could not connect DATA stream: > > can't connect stream to pb48.meadwestvaco.com port 11996: Connection > > refused] (10:37:27) > > >. >. >. > > amandad: time 30.019: stream_accept: timeout after 30 seconds > > amandad: time 30.019: security_stream_seterr(0x86b67000, can't accept new > stream connection: No such file or directory) > > amandad: time 30.019: stream 0 accept failed: unknown protocol error > > amandad: time 30.019: security_stream_close(0x86b67000) > > amandad: time 60.027: stream_accept: timeout after 30 seconds > > amandad: time 60.027: security_stream_seterr(0x81212000, can't accept new > stream connection: No such file or directory) > > amandad: time 60.027: stream 1 accept failed: unknown protocol error > > amandad: time 60.027: security_stream_close(0x81212000) > > amandad: time 90.035: stream_accept: timeout after 30 seconds > > amandad: time 90.036: security_stream_seterr(0x84877000, can't accept new > stream connection: No such file or directory) > > amandad: time 90.036: stream 2 accept failed: unknown protocol error > > amandad: time 90.036: security_stream_close(0x84877000) > > amandad: time 90.036: security_close(handle=0x81bbf800, driver=0x298a9240 > (BSD)) > > amandad: time 120.044: pid 17702 finish time Fri Aug 21 10:39:27 2009 > > For some reason the socket is not getting marked ready for read. > select(2) is timing out waiting. Firewall setup perhaps? > > This bit of code in 2.5.2p1's common-src/stream.c is where > the failure is happening for you... > > int > stream_accept( > int server_socket, > int timeout, > size_t sendsize, > size_t recvsize) > { > SELECT_ARG_TYPE readset; > struct timeval tv; > int nfound, connected_socket; > int save_errno; > int ntries = 0; > in_port_t port; > > assert(server_socket >= 0); > > do { > ntries++; > memset(&tv, 0, SIZEOF(tv)); > tv.tv_sec = timeout; > memset(&readset, 0, SIZEOF(readset)); > FD_ZERO(&readset); > FD_SET(server_socket, &readset); > nfound = select(server_socket+1, &readset, NULL, NULL, &tv); > if(nfound <= 0 || !FD_ISSET(server_socket, &readset)) { > save_errno = errno; > if(nfound < 0) { > dbprintf(("%s: stream_accept: select() failed: %s\n", > debug_prefix_time(NULL), > strerror(save_errno))); > } else if(nfound == 0) { > dbprintf(("%s: stream_accept: timeout after %d second%s\n", > debug_prefix_time(NULL), > timeout, > (timeout == 1) ? "" : "s")); > errno = ENOENT; /* ??? */ > return -1; The firts thing I notice when comparing this function in 2.5.0 vs 2.5.2 is that 2.5.0 does: tv.tv_usec = 0; and 2.5.2 does not. Could thim make a difference? Both do tv.tv_sec = timeout; -- One of the main causes of the fall of the roman empire was that, lacking zero, they had no way to indicate successful termination of their C programs.
[no subject]
Hello List I am using amvault to copy from disk to tape it works great The one problem I see it does not record the label and barcode like it does if it was a backup. I am using the chg-zd-mtx script Any thoughts ? <>BEGIN:VCARD VERSION:2.1 X-MS-SIGNATURE:YES N;LANGUAGE=en-us:Reilly;Robert FN:Robert Reilly ORG:Fresh Direct TITLE:Director IT / Systems Engineering TEL;WORK;VOICE:(718) 928-1172 TEL;HOME;VOICE:(203) 938-2838 TEL;CELL;VOICE:(203) 297-3653 ADR;WORK;PREF:;;23-30 Borden Ave;Long Island City;NY.;11101;United States of America LABEL;WORK;PREF;ENCODING=QUOTED-PRINTABLE:23-30 Borden Ave=0D=0A= Long Island City, NY. 11101 ADR;HOME:;;20 White Birch Road;Redding;CT.;06896;United States of America LABEL;HOME;ENCODING=QUOTED-PRINTABLE:20 White Birch Road=0D=0A= Redding, CT. 06896 X-MS-OL-DEFAULT-POSTAL-ADDRESS:2 URL;WORK:http://www.freshdirect.com EMAIL;PREF;INTERNET:rrei...@freshdirect.com X-MS-OL-DESIGN;CHARSET=utf-8:http://schemas.microsoft.com/office/outlook/12/electronicbusinesscards"; ver="1.0" layout="left" bgcolor="ff">WorkMobileHome REV:20090820T181709Z END:VCARD
Re: Backup issues with OpenBSD 4.5 machines
On Mon, Aug 24, 2009 at 02:01:09PM -0400, Jean-Louis Martineau wrote: > > You can also try the attached, it check the pipe are opened in blocking > mode. > OK, I applied that to 2.61 (after figuriing out it would not apply to 2.5.2 :-)). Of 4 DLE's on the test OpenBSD machine 3failed with PARTIAL, one tiny one syas complete, and the only signifcant;y sized on seems to be stuck forever at 0.04%, at least that's what amstatus has been reporting for 15 minutes. I grep'ed the amanda debug files for pipe (that's what the new error messages start with right?) and did not find any instances of this. I can send this debug files, if it will help. -- One of the main causes of the fall of the roman empire was that, lacking zero, they had no way to indicate successful termination of their C programs.
Re: failure of backups after reboot
Chris Hoogendyk wrote: planner: create debug directory "/tmp/amanda": File exists It's a race, both planner and driver tried to create the directory at the same time. Next run should be fine. Jean-Louis
Re: failure of backups after reboot
Jean-Louis Martineau wrote: Chris Hoogendyk wrote: driver: FATAL Did not get DATE line from planner The planner crashed, what's in the amdump.1 log file? eclogite:/usr/local/etc/amanda/daily/logs:root# more amdump.1 amdump: start at Mon Aug 24 00:45:00 EDT 2009 amdump: datestamp 20090824 amdump: starttime 20090824004500 planner: create debug directory "/tmp/amanda": File exists driver: pid 12625 executable /usr/local/libexec/driver version 2.5.1p3 driver: Did not get DATE line from planner amdump: end at Mon Aug 24 00:45:01 EDT 2009 Scanning /var/spool/amanda/disk2... Scanning /var/spool/amanda/disk1... Scanning /var/spool/amanda/disk3... lost+found: recover: skipping cruft directory, perhaps you should delete it. -- --- Chris Hoogendyk - O__ Systems Administrator c/ /'_ --- Biology & Geology Departments (*) \(*) -- 140 Morrill Science Center ~~ - University of Massachusetts, Amherst --- Erdös 4
Re: Backup issues with OpenBSD 4.5 machines
On Mon, Aug 24, 2009 at 02:01:09PM -0400, Jean-Louis Martineau wrote: > > You can also try the attached, it check the pipe are opened in blocking > mode. > I will try to test that today. Thanks. -- One of the main causes of the fall of the roman empire was that, lacking zero, they had no way to indicate successful termination of their C programs.
Re: failure of backups after reboot
Chris Hoogendyk wrote: driver: FATAL Did not get DATE line from planner The planner crashed, what's in the amdump.1 log file?
Re: Backup issues with OpenBSD 4.5 machines
On Mon, Aug 24, 2009 at 02:01:09PM -0400, Jean-Louis Martineau wrote: > This bug can't be fixed until we understand it. Agreed. > > Ask on a OpenBSD list how a write to a blocking pipe can return EAGAIN. > Or the pipe semantics changed and they don't default to blocking. I already posted the code snipet to the OpenBSD list, but no one has commented on it yet. I have found, in the past, that having a simple reproducable free standing code snippet that demonstrates the problem, can often result in a better result on mailing lists. I have not had time to try to understnad the code well enough to produce such an example, yet. > -- One of the main causes of the fall of the roman empire was that, lacking zero, they had no way to indicate successful termination of their C programs.
failure of backups after reboot
This is a recently configured Amanda 2.5.1p3 setup on a group of E250's running Solaris 9. It has been running smoothly for a few weeks now. Just before building Amanda, I had done a reconfigure reboot to get drivers running for a new tape library. This weekend, after the weekend backups had completed successfully and everything had flushed out to tape, I did a `touch /reconfigure; reboot` to get a new drive recognized (should have been a hot swappable drive, but seagate has changed their firmware so it doesn't work the way it once did). Anyway, all the services came up alright as far as I could tell. This is actually a departmental server with sendmail, apache, samba, etc. all running on it (they couldn't afford the parts to set up a dedicated backup server). When I looked at the backup reports this morning, this server had: *** THE DUMPS DID NOT FINISH PROPERLY! The next tape Amanda expects to use is: geo-daily-05. The next new tape already labelled is: geo-daily-03. FAILURE AND STRANGE DUMP SUMMARY: eclogite.geo.umass.edu / RESULTS MISSING .. eclogite.geo.umass.edu /var/mail RESULTS MISSING gis.geo.umass.edu / RESULTS MISSING .. gis.geo.umass.edu /usr/local RESULTS MISSING mogollon.geo.umass.edu / RESULTS MISSING .. mogollon.geo.umass.edu /usr/local RESULTS MISSING driver: FATAL Did not get DATE line from planner The debug files were sparce: eclogite:/:root# cd /tmp/amanda eclogite:/tmp/amanda:root# ls -l total 16 drwx-- 3 amandabackup amandabackup 179 Aug 24 00:45 server eclogite:/tmp/amanda:root# cd server eclogite:/tmp/amanda/server:root# ls -l total 16 drwx-- 2 amandabackup amandabackup 546 Aug 24 00:45 daily eclogite:/tmp/amanda/server:root# cd daily eclogite:/tmp/amanda/server/daily:root# ls -l total 80 -rw-r- 1 amandabackup amandabackup 223 Aug 24 00:45 amlogroll.20090824004501.debug -rw-r- 1 amandabackup amandabackup 220 Aug 24 00:45 amreport.20090824004501.debug -rw-r- 1 amandabackup amandabackup3875 Aug 24 00:45 amtrmidx.20090824004501.debug -rw-r- 1 amandabackup amandabackup 286 Aug 24 00:45 amtrmlog.20090824004501.debug -rw-r- 1 amandabackup amandabackup 281 Aug 24 00:45 driver.20090824004500.debug eclogite:/tmp/amanda/server/daily:root# more driver.20090824004500.debug driver: debug 1 pid 12625 ruid 555 euid 555: start at Mon Aug 24 00:45:00 2009 driver: debug 1 pid 12625 ruid 555 euid 555: rename at Mon Aug 24 00:45:01 2009 driver: time 0.017: Did not get DATE line from planner driver: time 0.017: pid 12625 finish time Mon Aug 24 00:45:01 2009 eclogite:/tmp/amanda/server/daily:root# date And there were no debug files on either of the clients for that date/time. The cronlog entry shows a return code of 8: CMD: /usr/local/sbin/amdump daily -o reserve=100 -o tapedev="" -o tpchanger="" amandaba 12613 c Mon Aug 24 00:45:00 2009 < amandaba 12613 c Mon Aug 24 00:45:02 2009 rc=8 Paradoxically, when I ran an amcheck this morning to see if I could get more detail, amcheck reported no problems. I've looked over everything I can think of. Since the critical event here is a server reboot, I particularly looked for things that might have been affected by that -- supposing, for example, I had set something up by hand, and on a reboot it didn't come back properly. But, amanda isn't susceptible to too much in that respect, since it runs off cron and isn't dependent on running daemons, etc. Various mount points and permissions seemed to be alright, /tmp/amanda was getting debug logs. I suppose I could have tweaked the amanda entries for inetd incorrectly and forgot to HUP inetd. Then the reboot would cause it to fail. But the entries present match my other amanda servers. /etc/inetd.conf has the following entry: amanda dgram udp wait amandabackup /usr/local/libexec/amandad amandad -auth=bsd amdump amindexd amidxtaped and /etc/services has these entries: amanda 10080/udp amandaidx 10082/tcp amidxtape 10083/tcp And, amcheck reports no problems right now (and in fact produces debug logs on each of the clients in /tmp/amanda). So, to repeat myself -- this setup has been running smoothly for a couple of weeks. The reboot borked it. I'm stuck. Might it just work tonight? I kind of doubt it. Though something may have gotten poked by the amdump that failed and/or by the amcheck that succeeded. Here's the output from the amcheck: eclogite:/usr/local/etc/amanda/daily$ amcheck daily Amanda Tape Server Host Check - Holding disk /var/spool/amanda/disk2: 3631598 KB disk space available, using 2095598 KB Holding disk /var/spool/amanda/disk1: 3631598 KB disk space available, using 2095598 KB Holding disk /var/spool/amanda/disk3: 234427600 KB disk space available, using 223941840 KB
Re: Error redirecting stderr to fd 52
Jean-Louis Martineau wrote at 11:48 -0400 on Aug 24, 2009: > John Hein wrote: > > On a 2.6.1b1 client ... > > > Hmm, beta software ... > > It's not fixed in 2.6.1 neither in 2.6.1p1. > > You must use the latest 2.6.1p1 snapshot from > http://www.zmanda.com/community-builds.php Building a new version now. Another interesting note. On the client in question, amandad is still running, but it shouldn't be. It's got a couple unreaped zombies and is waiting in select. ps awwx -o user,pid,ppid,start,stat,wchan,command | grep backup backup 37628 60010 4:43AM Z- backup 37629 60010 4:43AM Z- backup 60010 23993 7:01PM Ss select amandad lsof shows quite a few file descriptors still open. COMMAND PID USER FD TYPE DEVICE SIZE/OFFNODE NAME amandad 60010 backup0u IPv4 0xc87f5438 0t0 UDP *:amanda amandad 60010 backup1u IPv4 0xc87f5438 0t0 UDP *:amanda amandad 60010 backup2u IPv4 0xc87f5438 0t0 UDP *:amanda amandad 60010 backup4u IPv4 0xcdd09bf4 0t0 UDP *:58068 amandad 60010 backup 10w VREG 0,88 107863 331113 / -- amandad/amandad.20090823190102.debug amandad 60010 backup 21u PIPE 0xcb0064c816384 ->0xcb006580 amandad 60010 backup 22u PIPE 0xc916a99016384 ->0xc916aa48 amandad 60010 backup 23u PIPE 0xcd62999016384 ->0xcd629a48 amandad 60010 backup 24u PIPE 0xce8a899016384 ->0xce8a8a48 amandad 60010 backup 28u PIPE 0xc86b47f816384 ->0xc86b48b0 amandad 60010 backup 29u IPv4 0xce6f63a0 0t0 TCP *:6108 (LISTEN) amandad 60010 backup 30u PIPE 0xc8d2f33016384 ->0xc8d2f3e8 amandad 60010 backup 31u PIPE 0xc946619816384 ->0xc9466250 amandad 60010 backup 33u PIPE 0xca15c33016384 ->0xca15c3e8 amandad 60010 backup 37u IPv4 0t0 TCP no PCB, CANTSENDMORE, CANTRCVMORE amandad 60010 backup 39u PIPE 0xcdeb4cc016384 ->0xcdeb4d78 amandad 60010 backup 40u PIPE 0xc9c307f816384 ->0xc9c308b0 amandad 60010 backup 42u PIPE 0xc96317f816384 ->0xc96318b0 amandad 60010 backup 44u PIPE 0xc9c3066016384 ->0xc9c30718 amandad 60010 backup 46u PIPE 0xc9c1d33016384 ->0xc9c1d3e8 amandad 60010 backup 47u PIPE 0xc914e7180 ->0xc914e660 amandad 60010 backup 48u PIPE 0xcc4dd4c816384 ->0xcc4dd580 amandad 60010 backup 49u PIPE 0xc96be7f816384 ->0xc96be8b0 amandad 60010 backup 50u PIPE 0xce8a833016384 ->0xce8a83e8 amandad 60010 backup 51u PIPE 0xc914e5800 ->0xc914e4c8 amandad 60010 backup 52u PIPE 0xc914ecc016384 ->0xc914ed78 amandad 60010 backup 60u PIPE 0xc8e1033016384 ->0xc8e103e8 amandad 60010 backup 61u PIPE 0xcc37fb2816384 ->0xcc37fbe0 amandad 60010 backup 62u PIPE 0xce64966016384 ->0xce649718 amandad 60010 backup 63u PIPE 0xcaa0099016384 ->0xcaa00a48 amandad 60010 backup 64u PIPE 0xc912f66016384 ->0xc912f718 amandad 60010 backup 65u PIPE 0xc90f1cc016384 ->0xc90f1d78 A few seconds of tracing the process shows.. 60010 amandad RET poll 0 60010 amandad CALL gettimeofday(0xbfbfea88,0) 60010 amandad RET gettimeofday 0 60010 amandad CALL gettimeofday(0xbfbfea88,0) 60010 amandad RET gettimeofday 0 60010 amandad CALL poll(0x8052600,0x1,0x7530)
Re: Backup issues with OpenBSD 4.5 machines
This bug can't be fixed until we understand it. Ask on a OpenBSD list how a write to a blocking pipe can return EAGAIN. Or the pipe semantics changed and they don't default to blocking. You can also try the attached, it check the pipe are opened in blocking mode. Jean-Louis stan wrote: On Fri, Aug 21, 2009 at 01:23:29PM -0600, John Hein wrote: stan wrote at 13:56 -0400 on Aug 21, 2009: > OK, I reproduced the failure with only a crossover cable between the test > client and the Amanda Master: Just because you're using a crossover cable doesn't rule out firewall or other such socket level interference. I'm not saying that's your problem, but using a crossover cable doesn't rule it out. Fair enough, but I think we can rule that out in this case. First, I control the 2 computersin question. The Master backus up 55 machines a day taht consist of a diverse nix of OS'es, and Amanda versions. It's a Linux machine, and I am not running a firwall on it. The client is an OpenBSD machine. I use these for my firewalls, so I am familiar with configuring firewalls on them, Doing so requires (as a minimum) running pf. I am not running pf on the test machine at all. I am not certain whether the sendbackup executable, or amandad opens the port on the client side. My guess is that it is not suceding in doing this. Thus when the Master tries to connect to it, there is no port for it to connect to. Index: amandad-src/amandad.c === --- amandad-src/amandad.c (revision 2110) +++ amandad-src/amandad.c (working copy) @@ -1517,6 +1517,7 @@ struct active_service *as; pid_t pid; int newfd; +int r; assert(security_handle != NULL); assert(cmd != NULL); @@ -1535,15 +1536,40 @@ error(_("pipe: %s\n"), strerror(errno)); /*NOTREACHED*/ } + r = fcntl(data_read[i][0], F_GETFL, 0); + if (r & O_NONBLOCK) { + error(_("data_read[%d][0] is non blocking"), i); + } + r = fcntl(data_read[i][1], F_GETFL, 0); + if (r & O_NONBLOCK) { + error(_("data_read[%d][1] is non blocking"), i); + } + if (pipe(data_write[i]) < 0) { error(_("pipe: %s\n"), strerror(errno)); /*NOTREACHED*/ } + r = fcntl(data_write[i][0], F_GETFL, 0); + if (r & O_NONBLOCK) { + error(_("data_write[%d][0] is non blocking"), i); + } + r = fcntl(data_write[i][1], F_GETFL, 0); + if (r & O_NONBLOCK) { + error(_("data_write[%d][1] is non blocking"), i); + } } if (pipe(data_write[STDERR_PIPE]) < 0) { error(_("pipe: %s\n"), strerror(errno)); /*NOTREACHED*/ } +r = fcntl(data_write[STDERR_PIPE][0], F_GETFL, 0); +if (r & O_NONBLOCK) { + error(_("data_write[%d][0] is non blocking"), STDERR_PIPE); +} +r = fcntl(data_write[STDERR_PIPE][1], F_GETFL, 0); +if (r & O_NONBLOCK) { + error(_("data_write[%d][1] is non blocking"), STDERR_PIPE); +} switch(pid = fork()) { case -1:
Re: Backup issues with OpenBSD 4.5 machines
I'm using bsdtcp auth. I'll try 2.5.1 today or tomorrow, to see if I can narrow down the range of releases in which OpenBSD support broke. -- Michael On Mon, Aug 24, 2009 at 5:17 AM, stan wrote: > On Fri, Aug 21, 2009 at 03:53:13PM -0600, Michael Burk wrote: > > Stan's not alone on this one. I have two OpenBSD 4.5 machines also on Sun > > SPARC hardware. I had this same trouble a couple months ago with 2.6.1, > but > > didn't have time to look deeper. After seeing this discussion, I built > > amanda-2.6.2alpha-20090812 (can't get 0820 to compile). I'm getting > exactly > > the same errors. I tried dump with and without compression. > > > > My Amanda server is on Solaris 10, and I have a mix of Solaris, Linux, > and > > OpenBSD systems. Except for the above-mentioned build, everything is at > > 2.6.1p1. There are no firewalls on the Amanda server or the OpenBSD > clients. > > > > I know "me too" responses aren't usually helpful, but maybe this will > rule > > out the firewall question, at least. > > > > > > Thanks for confirming the problem report. > > Any cn > hance of you trying 2.500, and 2.5.2 to see if the first works, and the 2nd > doesn't? Are you using bsd auth? > > -- > One of the main causes of the fall of the roman empire was that, lacking > zero, they had no way to indicate successful termination of their C > programs. >
Re: Error redirecting stderr to fd 52
John Hein wrote: On a 2.6.1b1 client ... Hmm, beta software ... It's not fixed in 2.6.1 neither in 2.6.1p1. You must use the latest 2.6.1p1 snapshot from http://www.zmanda.com/community-builds.php Jean-Louis
Error redirecting stderr to fd 52
On a 2.6.1b1 client ... 1251090802.506210: sendbackup: pid 61161 ruid 5001 euid 5001 version 2.6.1b1: start at Sun Aug 23 23:13:22 2009 1251090802.506278: sendbackup: Version 2.6.1b1 1251090802.511032: sendbackup: pid 61161 ruid 5001 euid 5001 version 2.6.1b1: rename at Sun Aug 23 23:13:22 2009 1251090802.511055: sendbackup: sendbackup req: 1251090802.511102: sendbackup: Parsed request as: program `GNUTAR' 1251090802.511109: sendbackup: disk `/data' 1251090802.53: sendbackup: device `/data' 1251090802.58: sendbackup: level 1 1251090802.511123: sendbackup: since 2009:8:21:6:55:50 1251090802.511128: sendbackup: options `|;auth=bsd;compress-fast;index;exclude-list=/site/etc/amanda/exclude-gtar;' 1251090802.511203: sendbackup: Error redirecting stderr to fd 52: Bad file descriptor 1251090802.511215: sendbackup: pid 61161 finish time Sun Aug 23 23:13:22 2009 Has anyone ever seen that? This is on a client with about a dozen DLEs with possibly 3 dumping in parallel at a time. The DLE in question is not small - certainly not so small to complete in 5 ms. This DLE sometimes works. Sometimes a different one fails the same way. It looks like the mesgfd in client-src/sendbackup.c is getting closed before dup2(2) runs. Perhaps a race. amandad log (below) shows no obvious trouble other than it is continuing to do work after the child sendbackup has failed (as seen in the log output above). The two security_stream_close messages seem to be different than the log messages associated with DLEs that worked. The working ones have three security_stream_close messages. But, oddly, the DLE that worked (and was small) right _before_ the failed DLE did _not_ have any security_stream_close messages. Possibly a clue. In fact, it seems all the failures are happening right after a small (< 10 MB), and thus quick, dump. Could just be a coincidence. 1251090802.228997: amandad: dgram_recv(dgram=0x280c2a04, timeout=0, fromaddr=0x280d29f0) 1251090802.229037: amandad: (sockaddr_in *)0x280d29f0 = { 2, 703, 206.168.13.161 } 1251090802.229055: amandad: security_handleinit(handle=0x8052600, driver=0x280bc520 (BSD)) 1251090802.235787: amandad: accept recv REQ pkt: < SERVICE sendbackup OPTIONS features=9ffe00;hostname=bunny;config=test; GNUTAR /data 1 2009:8:21:6:55:50 OPTIONS |;auth=bsd;compress-fast;index;exclude-list=/site/etc/amanda/exclude-gtar; > 1251090802.237043: amandad: creating new service: sendbackup OPTIONS features=9ffe00;hostname=bunny;config=test; GNUTAR /data 1 2009:8:21:6:55:50 OPTIONS |;auth=bsd;compress-fast;index;exclude-list=/site/etc/amanda/exclude-gtar; 1251090802.237710: amandad: sending ACK pkt: < > 1251090802.237764: amandad: dgram_send_addr(addr=0x8052620, dgram=0x280c2a04) 1251090802.237772: amandad: (sockaddr_in *)0x8052620 = { 2, 703, 206.168.13.161 } 1251090802.237779: amandad: dgram_send_addr: 0x280c2a04->socket = 0 1251090802.511364: amandad: security_streaminit(stream=0x81dd000, driver=0x280bc520 (BSD)) 1251090802.511719: amandad: stream_server opening socket with family 2 (requested family was 2) 1251090802.511736: amandad: try_socksize: send buffer size is 65536 1251090802.511743: amandad: try_socksize: receive buffer size is 65536 1251090802.512604: amandad: bind_portrange2: Try port 6108: Available - Success 1251090802.512617: amandad: stream_server: waiting for connection: 0.0.0.0.6108 1251090802.512643: amandad: security_streaminit(stream=0x81e6000, driver=0x280bc520 (BSD)) 1251090802.512658: amandad: stream_server opening socket with family 2 (requested family was 2) 1251090802.512669: amandad: try_socksize: send buffer size is 65536 1251090802.512677: amandad: try_socksize: receive buffer size is 65536 1251090802.513496: amandad: bind_portrange2: Try port 6108: Available - Address already in use 1251090802.514300: amandad: bind_portrange2: Try port 6109: Available - Success 1251090802.514311: amandad: stream_server: waiting for connection: 0.0.0.0.6109 1251090802.514319: amandad: security_streaminit(stream=0x81ef000, driver=0x280bc520 (BSD)) 1251090802.514333: amandad: stream_server opening socket with family 2 (requested family was 2) 1251090802.514344: amandad: try_socksize: send buffer size is 65536 1251090802.514351: amandad: try_socksize: receive buffer size is 65536 1251090802.515189: amandad: bind_portrange2: Try port 6108: Available - Address already in use 1251090802.515991: amandad: bind_portrange2: Try port 6109: Available - Address already in use 1251090802.516772: amandad: bind_portrange2: Skip port 6110: Owned by softcm. 1251090802.517540: amandad: bind_portrange2: Skip port 6111: Owned by spc. 1251090802.518336: amandad: bind_portrange2: Try port 6112: Available - Success 1251090802.518348: amandad: stream_server: waiting for connection: 0.0.0.0.6112 1251090802.518355: amandad: sen
Re: Backup issues with OpenBSD 4.5 machines
On Fri, Aug 21, 2009 at 03:53:13PM -0600, Michael Burk wrote: > Stan's not alone on this one. I have two OpenBSD 4.5 machines also on Sun > SPARC hardware. I had this same trouble a couple months ago with 2.6.1, but > didn't have time to look deeper. After seeing this discussion, I built > amanda-2.6.2alpha-20090812 (can't get 0820 to compile). I'm getting exactly > the same errors. I tried dump with and without compression. > > My Amanda server is on Solaris 10, and I have a mix of Solaris, Linux, and > OpenBSD systems. Except for the above-mentioned build, everything is at > 2.6.1p1. There are no firewalls on the Amanda server or the OpenBSD clients. > > I know "me too" responses aren't usually helpful, but maybe this will rule > out the firewall question, at least. > > Thanks for confirming the problem report. Any cn hance of you trying 2.500, and 2.5.2 to see if the first works, and the 2nd doesn't? Are you using bsd auth? -- One of the main causes of the fall of the roman empire was that, lacking zero, they had no way to indicate successful termination of their C programs.