Re: Backup issues with OpenBSD 4.5 machines

2009-09-29 Thread Dustin J. Mitchell
On Mon, Sep 28, 2009 at 9:42 PM, Nathan Stratton Treadway
natha...@ontko.com wrote:
 As you say, having this extra call shouldn't really hurt anything, but I
 wondering if it would make sense to tweak the #ifdef so that is possible
 to compile Amanda on OpenBSD without having to include the work-around
 even in environments where it's not necessary (e.g. when using the new
 theading library you mention)?

I wouldn't mind doing this, but it's only six fcntl() calls on
startup, so the cost really is negligible.

 (For example, does Configure generate any particular #define related to
 the use of the pthread library?)

I don't know a good way to do this -- the best I can think of is to
check GLIB_LDFLAGS for /pthread/, but I don't know if OpenBSD compiles
require that pthread be listed explicitly on the link line when
compiling against libgthread.  It would help if I had an OpenBSD
system handy..

When a version of OpenBSD is released where this is no longer an issue
(5.0?), we can conditionalize it on that version.

Dustin

-- 
Open Source Storage Engineer
http://www.zmanda.com


Re: Backup issues with OpenBSD 4.5 machines

2009-09-29 Thread Michael Burk
I added the following lines to util.c as a temporary fix:
#define DATA_FD_COUNT   3   /* number of general-use pipes */
#define DATA_FD_OFFSET  50

I ran three test backups, and each succeeded:
- forced full, without compression, no index
- forced full, with compression and index
- unforced, with compression and index

Thanks,
Michael


On Mon, Sep 28, 2009 at 2:46 PM, Michael Burk bur...@gmail.com wrote:
 Hi Dustin,

 Great analysis; thanks for sharing the details of the problem.


 On Mon, Sep 28, 2009 at 12:11 PM, Dustin J. Mitchell dus...@zmanda.com 
 wrote:
  while putting together the patch
 (attached),

 I patched the 0928 snapshot, but it didn't compile:

 util.c: In function `openbsd_fd_inform':
 util.c:1290: error: `DATA_FD_OFFSET' undeclared (first use in this
 function)
 util.c:1290: error: (Each undeclared identifier is reported only once
 util.c:1290: error: for each function it appears in.)
 util.c:1290: error: `DATA_FD_COUNT' undeclared (first use in this
 function)

 I tried adding amandad.h to the file, but that didn't help. Did I
 patch the wrong version?

 Thanks
 Michael



Re: Backup issues with OpenBSD 4.5 machines

2009-09-29 Thread Dustin J. Mitchell
On Tue, Sep 29, 2009 at 12:55 PM, Michael Burk bur...@gmail.com wrote:
 I added the following lines to util.c as a temporary fix:
 #define DATA_FD_COUNT   3               /* number of general-use pipes */
 #define DATA_FD_OFFSET  50

 I ran three test backups, and each succeeded:
 - forced full, without compression, no index
 - forced full, with compression and index
 - unforced, with compression and index

Thanks!  I'll fix up the compile problem -- not sure how it worked for me :)

Dustin

-- 
Open Source Storage Engineer
http://www.zmanda.com


Re: Backup issues with OpenBSD 4.5 machines

2009-09-29 Thread Dustin J. Mitchell
On Tue, Sep 29, 2009 at 3:10 PM, Dustin J. Mitchell dus...@zmanda.com wrote:
 Thanks!  I'll fix up the compile problem -- not sure how it worked for me :)

Duh, it worked for me because I'm not running OpenBSD..

Anyway, fixed in r2149.  Thanks to everyone on this 100+-message thread!

Dustin

-- 
Open Source Storage Engineer
http://www.zmanda.com


Re: Backup issues with OpenBSD 4.5 machines

2009-09-28 Thread Dustin J. Mitchell
On Mon, Sep 28, 2009 at 1:48 PM, Dustin J. Mitchell dus...@zmanda.com wrote:
 Option 1 would be temporary -- eventually, I would like to be able to
 use threads on clients, to support compression and encryption, for
 example.  Option 1 is also harder than it sounds -- futzing with the
 build process is like playing whack-a-mole, where any change causes
 problems on another platform.

Apologies for replying to myself, but while putting together the patch
(attached), I noticed that this will affect the server, too -- amandad
is the gateway for amrecover to get access to data for recoveries.  So
option 1 is not really feasible.

Dustin

-- 
Open Source Storage Engineer
http://www.zmanda.com
help out OpenBSD's pthreads
(f369fd11620b09b5efb71eb05ce088fa326b4082)

 client-src/noop.c|1 +
 client-src/selfcheck.c   |1 +
 client-src/sendbackup.c  |1 +
 client-src/sendsize.c|1 +
 common-src/util.c|   12 
 common-src/util.h|   10 ++
 restore-src/amidxtaped.c |1 +
 server-src/amindexd.c|1 +
 8 files changed, 28 insertions(+), 0 deletions(-)
diff --git a/client-src/noop.c b/client-src/noop.c
index 1edf46f..e9b6270 100644
--- a/client-src/noop.c
+++ b/client-src/noop.c
@@ -65,6 +65,7 @@ main(
 signal(SIGPIPE, SIG_IGN);
 
 safe_fd(-1, 0);
+openbsd_fd_inform();
 
 check_running_as(RUNNING_AS_CLIENT_LOGIN);
 
diff --git a/client-src/selfcheck.c b/client-src/selfcheck.c
index 1f22717..ba1ee50 100644
--- a/client-src/selfcheck.c
+++ b/client-src/selfcheck.c
@@ -108,6 +108,7 @@ main(
 textdomain(amanda); 
 
 safe_fd(-1, 0);
+openbsd_fd_inform();
 safe_cd();
 
 set_pname(selfcheck);
diff --git a/client-src/sendbackup.c b/client-src/sendbackup.c
index fca1f98..219ebf6 100644
--- a/client-src/sendbackup.c
+++ b/client-src/sendbackup.c
@@ -137,6 +137,7 @@ main(
 textdomain(amanda); 
 
 safe_fd(DATA_FD_OFFSET, DATA_FD_COUNT*2);
+openbsd_fd_inform();
 
 safe_cd();
 
diff --git a/client-src/sendsize.c b/client-src/sendsize.c
index e4d589d..165c363 100644
--- a/client-src/sendsize.c
+++ b/client-src/sendsize.c
@@ -151,6 +151,7 @@ main(
 textdomain(amanda); 
 
 safe_fd(-1, 0);
+openbsd_fd_inform();
 safe_cd();
 
 set_pname(sendsize);
diff --git a/common-src/util.c b/common-src/util.c
index 9ccd9fb..d173e34 100644
--- a/common-src/util.c
+++ b/common-src/util.c
@@ -1282,3 +1282,15 @@ get_pcontext(void)
 return pcontext;
 }
 
+#ifdef __OpenBSD__
+void
+openbsd_fd_inform(void)
+{
+int i;
+for (i = DATA_FD_OFFSET; i  DATA_FD_OFFSET + DATA_FD_COUNT*2; i++) {
+   /* a simple fcntl() will cause the library to look at this file
+* descriptor, which is good enough */
+   (void)fcntl(i, F_GETFL);
+}
+}
+#endif
diff --git a/common-src/util.h b/common-src/util.h
index 8604cf1..f74910a 100644
--- a/common-src/util.h
+++ b/common-src/util.h
@@ -348,4 +348,14 @@ void proplist_add_to_argv(gpointer key_p,
  gpointer user_data_p);
 
 
+/* Inform the OpenBSD pthread library about the high-numbered file descriptors
+ * that an amandad service inherits.  This won't be necessary once the new
+ * threading library is availble (OpenBSD 5.0?), but won't hurt anyway.  See 
the
+ * thread Backup issues with OpenBSD 4.5 machines from September 2009. */
+#ifdef __OpenBSD__
+void openbsd_fd_inform(void);
+#else
+#define openbsd_fd_inform()
+#endif
+
 #endif /* UTIL_H */
diff --git a/restore-src/amidxtaped.c b/restore-src/amidxtaped.c
index 3b5b904..42aa369 100644
--- a/restore-src/amidxtaped.c
+++ b/restore-src/amidxtaped.c
@@ -261,6 +261,7 @@ main(
 textdomain(amanda); 
 
 safe_fd(DATA_FD_OFFSET, 4);
+openbsd_fd_inform();
 safe_cd();
 
 /* Don't die when child closes pipe */
diff --git a/server-src/amindexd.c b/server-src/amindexd.c
index e2d56d5..98e3c28 100644
--- a/server-src/amindexd.c
+++ b/server-src/amindexd.c
@@ -1251,6 +1251,7 @@ main(
 textdomain(amanda); 
 
 safe_fd(DATA_FD_OFFSET, 2);
+openbsd_fd_inform();
 safe_cd();
 
 /*


Re: Backup issues with OpenBSD 4.5 machines

2009-09-28 Thread Dustin J. Mitchell
OK, I have a more in-depth summary of exactly what's going on here,
and why the fcntl() calls fix it.  The good news: we've stumbled on a
pretty stable fix for this problem.

As background, the Amanda client operates something like this:

amandad is invoked by (x)inetd or some other mechanism
amandad uses the Amanda protocol to determine which service the
server is requesting (sendbackup in this case)
amandad forks and executes that service, after setting up a bunch of
pipes for it.

The unusual thing is that, aside from the usual stdin/stdout/stderr
(fd's 0-2), amandad sets up six pipes at hard-wired file descriptors
50-55, and sendbackup uses those to send the data, index, and message
streams back to the server.

As background on POSIX:

Multiple processes can hold the same file open at the same time.  This
is how, for example, a backgrounded process in the shell can share
your terminal with the shell itself.  Each of those processes would
like to access the file either in blocking mode (waiting for data to
be available) or nonblocking mode (immediately returning when no data
is available, to allow the application to work on something else while
waiting).  Unfortunately, POSIX specifies that the file *itself*
carries the O_NONBLOCK flag, so it is not specific to the application.
 In the case at hand, Amanda is accessing a particular pipe in
nonblocking mode (for reasons explained below), while gzip expects it
to be in blocking mode, and this leads to the EAGAIN that is killing
gzip.

As background on the OpenBSD pthreads (or, more accurately, uthreads
-- lib/libpthread/uthread):

This library shims its way between an application and the kernel, and
implements blocking threaded operations on file descriptors using
nonblocking kernel operations and a select() loop.  In order to do so,
it must set O_NONBLOCK on every file it accesses.  This is easily
accomplished by wrapping open(), pipe(), dup(), dup2(), socket(), and
so on -- the syscalls which create new file descriptors.  However,
inherited file descriptors -- those opened by the parent before
calling execve() -- are a little bit harder, because the library has
no way to know about them.  It hides this O_NONBLOCK flag from the
application by masking it out of the fcntl() return value.

The solution that uthreads uses is to start tracking a file the first
time it is referenced in a syscall (in _thread_fd_lock, to be
precise).  It sets O_NONBLOCK when it starts tracking the file, and
then removes the flag at the appropriate time (execve, in particular).

In the failure mode, uthreads finds out about the index file *after*
it has forked the gzip child, when sendbackup tries to close the file
descriptor.  Due to the design of the library, it carefully sets
O_NONBLOCK before closing the file descriptor, leading gzip to get an
EAGAIN error.

The mysterious fcntl() calls, however, serve as a warning to uthreads
that the index file exists.  Uthreads sets the O_NONBLOCK flag when
performing the fcntl(), but then clears it on execve(), so everything
works as expected.

So, there are really two fixes available, until OpenBSD's new
threading library is available:
 1. don't link Amanda client libraries with threading libraries
 2. inform uthreads of the high-numbered FD's in all of the service
binaries, using fcntl()

Option 1 would be temporary -- eventually, I would like to be able to
use threads on clients, to support compression and encryption, for
example.  Option 1 is also harder than it sounds -- futzing with the
build process is like playing whack-a-mole, where any change causes
problems on another platform.

From my analysis above, option 2 is fairly robust (as robust as
OpenBSD's pthreads, anyway), and won't cause any trouble on systems
with non-buggy threading libraries.  So I'm leaning that direction.

Dustin

-- 
Open Source Storage Engineer
http://www.zmanda.com


Re: Backup issues with OpenBSD 4.5 machines

2009-09-28 Thread Michael Burk
Hi Dustin,

Great analysis; thanks for sharing the details of the problem.


On Mon, Sep 28, 2009 at 12:11 PM, Dustin J. Mitchell dus...@zmanda.com wrote:
  while putting together the patch
 (attached),

I patched the 0928 snapshot, but it didn't compile:

util.c: In function `openbsd_fd_inform':
util.c:1290: error: `DATA_FD_OFFSET' undeclared (first use in this
function)
util.c:1290: error: (Each undeclared identifier is reported only once
util.c:1290: error: for each function it appears in.)
util.c:1290: error: `DATA_FD_COUNT' undeclared (first use in this
function)

I tried adding amandad.h to the file, but that didn't help. Did I
patch the wrong version?

Thanks
Michael


Re: Backup issues with OpenBSD 4.5 machines

2009-09-28 Thread Nathan Stratton Treadway
On Mon, Sep 28, 2009 at 13:48:37 -0400, Dustin J. Mitchell wrote:
 The mysterious fcntl() calls, however, serve as a warning to uthreads
 that the index file exists.  Uthreads sets the O_NONBLOCK flag when
 performing the fcntl(), but then clears it on execve(), so everything
 works as expected.

Congrats on tracking down all that! :)

One followup, question, though:

Early on in this thread (Aug 30 and 31), Michael reported that one of
the early patches that Jean-Louis sent to him (or to Stan?) did fix the
original gzip: stdout: Resource temporarily unavailable message... but
then there was instead a similar error from the sed process that's
part of the index-generation pipeline.  (Based on the followup
discussion, it seems that early patch only touched the handle for the
data file, though I am not sure exact where/when.)

However, a bit later on (Sept 4), Michael reported that the one-line
patch that touched just datafd (in sendbackup.c/main(), immediately
after values are assigned to datafd/mesgfd/indexfd) was enough to
allow his dumps to complete without error, no matter what combination of
indexing and compression he had configured.


Now that you have figured out that this is all, in fact, related to
pthreads, do you understand why this later patch was able to eliminate
the errors from the sed process as well?


Nathan



Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko  Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


signature.asc
Description: Digital signature


Re: Backup issues with OpenBSD 4.5 machines

2009-09-28 Thread Nathan Stratton Treadway
On Mon, Sep 28, 2009 at 14:11:51 -0400, Dustin J. Mitchell wrote:
 +++ b/common-src/util.c
 @@ -1282,3 +1282,15 @@ get_pcontext(void)
  return pcontext;
  }
  
 +#ifdef __OpenBSD__
[...]
 diff --git a/common-src/util.h b/common-src/util.h
 index 8604cf1..f74910a 100644
 --- a/common-src/util.h
 +++ b/common-src/util.h
 @@ -348,4 +348,14 @@ void proplist_add_to_argv(gpointer key_p,
 gpointer user_data_p);
  
  
 +/* Inform the OpenBSD pthread library about the high-numbered file 
 descriptors
 + * that an amandad service inherits.  This won't be necessary once the new
 + * threading library is availble (OpenBSD 5.0?), but won't hurt anyway.  See 
 the
 + * thread Backup issues with OpenBSD 4.5 machines from September 2009. */
 +#ifdef __OpenBSD__
 +void openbsd_fd_inform(void);
 +#else
[...]

As you say, having this extra call shouldn't really hurt anything, but I
wondering if it would make sense to tweak the #ifdef so that is possible
to compile Amanda on OpenBSD without having to include the work-around
even in environments where it's not necessary (e.g. when using the new
theading library you mention)?

(For example, does Configure generate any particular #define related to
the use of the pthread library?)

Nathan

Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko  Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


signature.asc
Description: Digital signature


Re: Backup issues with OpenBSD 4.5 machines

2009-09-09 Thread Dustin J. Mitchell
On Tue, Sep 8, 2009 at 10:20 PM, Nathan Stratton
Treadwaynatha...@ontko.com wrote:
 At that time the discussion was focused on common-src/stream.c, which
 hadn't changed significantly between those versions, but it would be
 interesting to know if there were any changes in the sendbackup code
 path after 2.5.0p1 that seem like they might be related to this bug

Sigh, that reaches way back into the CVS days and Jean-Louis' memory
-- he's not around right now.  There were *bunches* of changes between
2.5.0 and 2.5.2, not least of which the introduction of the Backup API
(which became the Application API).  If we could reproduce the
behavior in a test program, exploring the history might be a more
fruitful avenue for pursuit, since we'd know what to look for.

One thing we could do is remove all of the other patches and replace
the fixing fcntl() calls with something similar -- maybe just
sleep(1)?  If this fixes the problem, then that's evidence this was
a race condition.

Alternately, remove all of the other patches and add:

1285 ptr = buffer;
1286 bytes_written = 0;

g_debug(fcntl(%d) = %d, fileno(pipe_fp), fcntl(fileno(pipe_fp),
F_GETFL, 0), O_NONBLOCK);

1287 just_written = full_write(fileno(pipe_fp), ptr, (size_t)bytes_read);
1288 if (just_written  (size_t)bytes_read) {

and try to trigger the same error.  I suspect you'll find that the
fcntl is returning something with bit 2 (O_NONBLOCK) set.

Dustin

--
Open Source Storage Engineer
http://www.zmanda.com


Re: Backup issues with OpenBSD 4.5 machines

2009-09-09 Thread Michael Burk
Hi Dustin - one question before I post on the OpenBSD list. In reviewing a
post Stan made a couple weeks ago to the OpenBSD list, someone asked if
Amanda uses pthreads. I noticed that ldd reports that the binaries link to
libpthread. Does Amanda use pthreads, either directly or through some other
library?

-- Michael

On Wed, Sep 9, 2009 at 7:31 AM, Dustin J. Mitchell dus...@zmanda.comwrote:

 On Tue, Sep 8, 2009 at 10:20 PM, Nathan Stratton
 Treadwaynatha...@ontko.com wrote:
  At that time the discussion was focused on common-src/stream.c, which
  hadn't changed significantly between those versions, but it would be
  interesting to know if there were any changes in the sendbackup code
  path after 2.5.0p1 that seem like they might be related to this bug

 Sigh, that reaches way back into the CVS days and Jean-Louis' memory
 -- he's not around right now.  There were *bunches* of changes between
 2.5.0 and 2.5.2, not least of which the introduction of the Backup API
 (which became the Application API).  If we could reproduce the
 behavior in a test program, exploring the history might be a more
 fruitful avenue for pursuit, since we'd know what to look for.

 One thing we could do is remove all of the other patches and replace
 the fixing fcntl() calls with something similar -- maybe just
 sleep(1)?  If this fixes the problem, then that's evidence this was
 a race condition.

 Alternately, remove all of the other patches and add:

 1285 ptr = buffer;
 1286 bytes_written = 0;

 g_debug(fcntl(%d) = %d, fileno(pipe_fp), fcntl(fileno(pipe_fp),
 F_GETFL, 0), O_NONBLOCK);

 1287 just_written = full_write(fileno(pipe_fp), ptr,
 (size_t)bytes_read);
 1288 if (just_written  (size_t)bytes_read) {

 and try to trigger the same error.  I suspect you'll find that the
 fcntl is returning something with bit 2 (O_NONBLOCK) set.

 Dustin

 --
 Open Source Storage Engineer
 http://www.zmanda.com



Re: Backup issues with OpenBSD 4.5 machines

2009-09-09 Thread Dustin J. Mitchell
On Wed, Sep 9, 2009 at 11:57 AM, Michael Burkbur...@gmail.com wrote:
 Hi Dustin - one question before I post on the OpenBSD list. In reviewing a
 post Stan made a couple weeks ago to the OpenBSD list, someone asked if
 Amanda uses pthreads. I noticed that ldd reports that the binaries link to
 libpthread. Does Amanda use pthreads, either directly or through some other
 library?

Yes, for uses = links against in the particular cases of amandad
and sendbackup.  No client-side processes ever run more than one
thread, though.

Dustin

-- 
Open Source Storage Engineer
http://www.zmanda.com


Re: Backup issues with OpenBSD 4.5 machines

2009-09-08 Thread Dustin J. Mitchell
On Tue, Sep 8, 2009 at 11:45 AM, Michael Burkbur...@gmail.com wrote:
 Thanks Dustin. Attached are the gory details.

What about the amandad logfile?

Dustin

-- 
Open Source Storage Engineer
http://www.zmanda.com


Re: Backup issues with OpenBSD 4.5 machines

2009-09-08 Thread Dustin J. Mitchell
On Tue, Sep 8, 2009 at 12:42 PM, Michael Burkbur...@gmail.com wrote:
 Sorry - here they are.

OK, I don't see anything funny there.  I think it's finally time to
take this to the OpenBSD list and see if they can find anything in
this wilderness.  It might help to provide a pointer to the archive of
this (now *very* long) thread.

I took a look at the kernel source.. EAGAIN only appears in a few
places, and from what I can tell is adequately protected by
O_NONBLOCK/FNONBLOCK in sys_pipe.c.  Looking at sys_fcntl, the F_GETFL
path is very simple and doesn't do any kind of caching that might
change behavior.

At this point, I'm guessing that this may be a race condition, and
that the fcntl() calls are just enough to change the winner of the
race.  File flags are global to the open file -- they are not
duplicated on a fork, and do not track with a particular process.  So,
hypothetically, if amandad opens a file descriptor, hands it off to
sendbackup, and then sets it to nonblocking mode, then we would see
this behavior.  The only wrinkle in this theory is that amandad very
quickly closes its copy of the file, and doesn't ever set it to
nonblocking mode.

At this point, I'm out of options.

Dustin

-- 
Open Source Storage Engineer
http://www.zmanda.com


Re: Backup issues with OpenBSD 4.5 machines

2009-09-05 Thread Dustin J. Mitchell
OK, sorry for the delay.  Attached is a patch, also at
  
http://github.com/djmitche/amanda/commit/435ed5a820819188578df4a8a730a6b084a9f29f.patch

which adds a whole bunch of debugging.  I'd like to hear how this
works, and to see the sendbackup and amandad debug log files.

This should elucidate any weird file-descriptor stuff going on.

Dustin

-- 
Open Source Storage Engineer
http://www.zmanda.com


lots_of_debugging.patch
Description: Binary data


Re: Backup issues with OpenBSD 4.5 machines

2009-09-04 Thread Nathan Stratton Treadway
On Wed, Sep 02, 2009 at 11:11:39 -0600, Michael Burk wrote:
 This was a good idea; I tried it with one modification:  I determined
 earlier that the failure happens without indexing also, so I added just the
 line:
 fcntl(datafd, F_GETFL, 0);
 and that fixed the problem as well. So I guess this is truly the minimal
 patch!

Are you referring back to the results you posted to here (to
amanda-users) on Sept 1?  (That is, this message:
  http://www.mail-archive.com/amanda-users@amanda.org/msg42749.html
)

I take it you are concluding that the End of ape detected message in
that log file is actually caused by the same underlying resource
temporarily unavailable error on the pipe as the the index tee cannot
write message you originally reported?  (It does seem to show up in the
same spot during the dump, right after Pass III starts.)

Did you try running a dump with the above one-line patch applied and
indexing turned off?

Nathan



Nathan



Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko  Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


Re: Backup issues with OpenBSD 4.5 machines

2009-09-04 Thread Dustin J. Mitchell
Attached is a test program I just put together which does about what
Jean-Louis specified above (with the addition of some closed fd's).
This works fine on my mac, which is the closest approximation to
OpenBSD I have access to at the moment.  How does it work on 4.5?

The file-descriptor gymnastics that amandad performs is pretty crazy,
so it's quite possible I've misread the code and it's doing something
stupid, but please let me know what this test program does.

Second, in Jean-Louis' minimal patch, can you change it to read

  g_debug(first, fcntl returns %d; O_NONBLOCK=%d, fcntl(datafd,
F_GETFL, 0), O_NONBLOCK);
  g_debug(and then fcntl returns %d; O_NONBLOCK=%d, fcntl(datafd,
F_GETFL, 0), O_NONBLOCK);

and let me know what the corresponding lines in the debug file say?
It will be interesting to know if the descriptor flags have O_NONBLOCK
set either time.

We'll get to the bottom of this :)

Dustin

-- 
Open Source Storage Engineer
http://www.zmanda.com


test.c
Description: Binary data


Re: Backup issues with OpenBSD 4.5 machines

2009-09-04 Thread Nathan Stratton Treadway
On Wed, Sep 02, 2009 at 11:11:39 -0600, Michael Burk wrote:
 This was a good idea; I tried it with one modification:  I determined
 earlier that the failure happens without indexing also, so I added just the
 line:
 fcntl(datafd, F_GETFL, 0);
 and that fixed the problem as well. So I guess this is truly the minimal
 patch!
 

I tried to understand the program flow for sendbackup.c and
sendbackup_dump.c, and I agree with Dustin that it is pretty crazy,
but as far as I can tell the only place that would generate the message
index tee cannot write [ is from the following code found in 
the start_index() function in sendbackup.c:

 just_written = full_write(3, ptr, bytes_read);
 if (just_written  (size_t)bytes_read) {
error(_(index tee cannot write [%s]), strerror(errno));
/*NOTREACHED*/
 } else {



Tracing backwards from there, it appears that the file descriptor #3 in
this chunck of code will point to the same file as the datafd
referenced in the patch you have been trying only if client-side compression
is not turned on.  If it is turned on, it looks like fd #3 will instead
point to the input side of the process that is doing the compression.

So, if other avenues of research don't lead to a solution first, it
might be interesting to try turning on client side compression (both
with and without the one-line patch applied), to see if a similar error
occurs in that case (and if so, whether the error is still in the index-
tee-write operation or if the compression process complains instead).


Also, when switched to your one-line patch (where you call fnctl() on
datafd only), did you start getting the 
   strange(?): sed: stdout: Resource temporarily unavailable
messages in the log file again?


Nathan


Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko  Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


Re: Backup issues with OpenBSD 4.5 machines

2009-09-04 Thread Dustin J. Mitchell
On Fri, Sep 4, 2009 at 4:57 PM, Michael Burkbur...@gmail.com wrote:
 Thanks again for your help. Here's the output of the test prog:

Looks just like it does locally.  If the test had managed to reproduce
this failure, then I would have expected to see
  write: Resource temporarily unavailable

That means something deeper is going on here.

 And here's the output after the patch change (again with the 0831 snapshot):

And this shows me that, as far as the kernel is concerned, O_NONBLOCK
is not set on the file descriptor even on the first call.

Since my test program hasn't replicated the error, I think we should
explore a bit more before going to the openbsd lists.  What I'd like
to put together is a patch to amandad and sendbackup that posts a
debug message about every dup(), dup2(), pipe(), fcntl(), and close()
operation that they perform.  I'll whip that up and send it along in a
few minutes.

Dustin

-- 
Open Source Storage Engineer
http://www.zmanda.com


Re: Backup issues with OpenBSD 4.5 machines

2009-09-04 Thread Michael Burk
Hi Dustin,

Thanks again for your help. Here's the output of the test prog:

bu...@selenium$ ./test parent
pipe = r...@3 w...@4
parent closing
parent sleeping
child closing p[0]
child exec'ing
child duping
child closing B
child writing
child write done
parent reading
parent got 4 bytes
bu...@selenium$


And here's the output after the patch change (again with the 0831 snapshot):

1252076894.258118: sendbackup: pid 23986 ruid 150 euid 150 version
2.6.2alpha: start at Fri Sep  4 09:08:14 2009
1252076894.258292: sendbackup: Version 2.6.2alpha
1252076894.280268: sendbackup: pid 23986 ruid 150 euid 150 version
2.6.2alpha: rename at Fri Sep  4 09:08:14 2009
1252076894.280787: sendbackup:   Parsed request as: program `DUMP'
1252076894.280799: sendbackup:  disk `/'
1252076894.280808: sendbackup:  device `/'
1252076894.280818: sendbackup:  level 0
1252076894.280827: sendbackup:  since NODATE
1252076894.280836: sendbackup:  options `'
1252076894.280909: sendbackup: first, fcntl returns 2; O_NONBLOCK=4
1252076894.280933: sendbackup: and then fcntl returns 2; O_NONBLOCK=4
1252076894.281309: sendbackup: start: selenium.___.com:/ lev 0
1252076894.281838: sendbackup: dumping device '/dev/rsd0a' with 'ffs'
1252076894.281943: sendbackup: Spawning /sbin/dump dump 0usf 1048576 -
/dev/rsd0a in pipeline
1252076894.282765: sendbackup: Started backup
1252076894.286940: sendbackup:  90:  normal(|):   DUMP: Date of this level 0
dump: Fri Sep  4 09:08:14 2009
1252076894.288165: sendbackup:  90:  normal(|):   DUMP: Date of last level 0
dump: the epoch
1252076894.350837: sendbackup:  90:  normal(|):   DUMP: Dumping /dev/rsd0a
(/) to standard output
1252076894.352274: sendbackup:  90:  normal(|):   DUMP: mapping (Pass I)
[regular files]
1252076895.520191: sendbackup:  90:  normal(|):   DUMP: mapping (Pass II)
[directories]
1252076895.520782: sendbackup:  90:  normal(|):   DUMP: estimated 47910 tape
blocks.
1252076895.522299: sendbackup:  90:  normal(|):   DUMP: Volume 1 started at:
Fri Sep  4 09:08:15 2009
1252076895.522802: sendbackup:  90:  normal(|):   DUMP: dumping (Pass III)
[directories]
1252076895.669127: sendbackup:  90:  normal(|):   DUMP: dumping (Pass IV)
[regular files]
1252076906.028912: sendbackup:  43:size(|):   DUMP: 48385 tape blocks
1252076906.029606: sendbackup:  90:  normal(|):   DUMP: Date of this level 0
dump: Fri Sep  4 09:08:14 2009
1252076906.030489: sendbackup:  90:  normal(|):   DUMP: Volume 1 completed
at: Fri Sep  4 09:08:26 2009
1252076906.030896: sendbackup:  90:  normal(|):   DUMP: Volume 1 took
0:00:11
1252076906.031250: sendbackup:  90:  normal(|):   DUMP: Volume 1 transfer
rate: 4398 KB/s
1252076906.031614: sendbackup:  90:  normal(|):   DUMP: Date this dump
completed:  Fri Sep  4 09:08:26 2009
1252076906.031962: sendbackup:  90:  normal(|):   DUMP: Average transfer
rate: 4398 KB/s
1252076906.032322: sendbackup:  90:  normal(|):   DUMP: level 0 dump on Fri
Sep  4 09:08:14 2009
1252076906.032667: sendbackup:  90:  normal(|):   DUMP: DUMP IS DONE
1252076906.032777: sendbackup: Parsed backup messages
1252076906.032848: sendbackup: pid 23986 finish time Fri Sep  4 09:08:26
2009

Thanks,
Michael


On Fri, Sep 4, 2009 at 11:11 AM, Dustin J. Mitchell dus...@zmanda.comwrote:

 Attached is a test program I just put together which does about what
 Jean-Louis specified above (with the addition of some closed fd's).
 This works fine on my mac, which is the closest approximation to
 OpenBSD I have access to at the moment.  How does it work on 4.5?

 The file-descriptor gymnastics that amandad performs is pretty crazy,
 so it's quite possible I've misread the code and it's doing something
 stupid, but please let me know what this test program does.

 Second, in Jean-Louis' minimal patch, can you change it to read

  g_debug(first, fcntl returns %d; O_NONBLOCK=%d, fcntl(datafd,
 F_GETFL, 0), O_NONBLOCK);
  g_debug(and then fcntl returns %d; O_NONBLOCK=%d, fcntl(datafd,
 F_GETFL, 0), O_NONBLOCK);

 and let me know what the corresponding lines in the debug file say?
 It will be interesting to know if the descriptor flags have O_NONBLOCK
 set either time.

 We'll get to the bottom of this :)

 Dustin

 --
 Open Source Storage Engineer
 http://www.zmanda.com



Re: Backup issues with OpenBSD 4.5 machines

2009-09-04 Thread Nathan Stratton Treadway
On Fri, Sep 04, 2009 at 17:13:19 -0400, Dustin J. Mitchell wrote:
 On Fri, Sep 4, 2009 at 4:57 PM, Michael Burkbur...@gmail.com wrote:
  Thanks again for your help. Here's the output of the test prog:
 
 Looks just like it does locally.  If the test had managed to reproduce
 this failure, then I would have expected to see
   write: Resource temporarily unavailable
 
 That means something deeper is going on here.

I wonder if more data needs to get sent down the pipe before the
unexpected Resource temporarily unavailable result is returned?


  And here's the output after the patch change (again with the 0831 snapshot):
 
 And this shows me that, as far as the kernel is concerned, O_NONBLOCK
 is not set on the file descriptor even on the first call.
 
 Since my test program hasn't replicated the error, I think we should
 explore a bit more before going to the openbsd lists.  What I'd like
 to put together is a patch to amandad and sendbackup that posts a
 debug message about every dup(), dup2(), pipe(), fcntl(), and close()
 operation that they perform.  I'll whip that up and send it along in a
 few minutes.

If I have followed this thread correctly, it seems like the tests show
that the kernel treating the file descriptor as if O_NONBLOCK is set even
when it's not... but seems to revert to the correct behavior if
fcntl(XX, F_GETFL, 0) is called.  

If just running a F_GETFL operation changes the behavior of a file
handle, it seems to point to a kernel bug of some kind But the
tracing you are talking about, combined with moving the F_GETFL
operation to different stages of the program flow may help narrow down
which point in the file-descriptor gymnastics is triggering that bug.

For example, if you found that adding
  fcntl(3, F_GETFL, 0); 
right after the 
  dup2(input, 3);
line in that in sendbackup.c:start_index() behaved differently than
adding 
  fcntl(input, F_GETFL, 0); 
in that same spot, that would tend to point to something going wrong
in dup2


Nathan




Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko  Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


Re: Backup issues with OpenBSD 4.5 machines

2009-09-04 Thread Michael Burk
Nathan, I think the following truth table will answer all your questions!
(view in monospace font)

Patch   gzipIndex Result
no   no  no   dump failed (end of tape)
no   no yes   index tee cannot write
no  yes  no   gzip strange, dump failed
no  yes yes   gzip strange, index tee strange, dump failed
yes   *   *   success, nothing strange


On Fri, Sep 4, 2009 at 3:01 PM, Nathan Stratton Treadway natha...@ontko.com
 wrote:

 On Wed, Sep 02, 2009 at 11:11:39 -0600, Michael Burk wrote:
  This was a good idea; I tried it with one modification:  I determined
  earlier that the failure happens without indexing also, so I added just
 the
  line:
  fcntl(datafd, F_GETFL, 0);
  and that fixed the problem as well. So I guess this is truly the minimal
  patch!
 

 I tried to understand the program flow for sendbackup.c and
 sendbackup_dump.c, and I agree with Dustin that it is pretty crazy,
 but as far as I can tell the only place that would generate the message
 index tee cannot write [ is from the following code found in
 the start_index() function in sendbackup.c:

 just_written = full_write(3, ptr, bytes_read);
 if (just_written  (size_t)bytes_read) {
error(_(index tee cannot write [%s]), strerror(errno));
/*NOTREACHED*/
 } else {



 Tracing backwards from there, it appears that the file descriptor #3 in
 this chunck of code will point to the same file as the datafd
 referenced in the patch you have been trying only if client-side
 compression
 is not turned on.  If it is turned on, it looks like fd #3 will instead
 point to the input side of the process that is doing the compression.

 So, if other avenues of research don't lead to a solution first, it
 might be interesting to try turning on client side compression (both
 with and without the one-line patch applied), to see if a similar error
 occurs in that case (and if so, whether the error is still in the index-
 tee-write operation or if the compression process complains instead).


 Also, when switched to your one-line patch (where you call fnctl() on
 datafd only), did you start getting the
strange(?): sed: stdout: Resource temporarily unavailable
 messages in the log file again?


Nathan


 
 Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
 Ray Ontko  Co.  -  Software consulting services  -
 http://www.ontko.com/
  GPG Key: 
 http://www.ontko.com/~nathanst/gpg_key.txthttp://www.ontko.com/%7Enathanst/gpg_key.txt
   ID: 1023D/ECFB6239
  Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239



Re: Backup issues with OpenBSD 4.5 machines

2009-09-04 Thread Nathan Stratton Treadway
On Fri, Sep 04, 2009 at 16:41:31 -0600, Michael Burk wrote:
 Nathan, I think the following truth table will answer all your questions!
 (view in monospace font)
 
 Patch   gzipIndex Result
 no   no  no   dump failed (end of tape)
 no   no yes   index tee cannot write
 no  yes  no   gzip strange, dump failed
 no  yes yes   gzip strange, index tee strange, dump failed
 yes   *   *   success, nothing strange
 

Most of them :)

Am I guessing correctly that the  Patch here was one of the earlier
patches that touched all three file descriptors?  And that when you
tried the one-line patch (that touched only datafd), your dump
succeeded, but sed complained about stdout again?

Nathan



Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko  Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


Re: Backup issues with OpenBSD 4.5 machines

2009-09-04 Thread Michael Burk
Sorry I didn't make that clear. The patch was the one-liner (datafd only). I
ran it again with the Dustin's 2-liner with the same results.

BTW, I ran these latest tests (all 8 runs) on the 0904 snapshot. It builds
cleanly on OpenBSD now.

-- Michael

On Fri, Sep 4, 2009 at 4:47 PM, Nathan Stratton Treadway natha...@ontko.com
 wrote:

 On Fri, Sep 04, 2009 at 16:41:31 -0600, Michael Burk wrote:
  Nathan, I think the following truth table will answer all your questions!
  (view in monospace font)
 
  Patch   gzipIndex Result
  no   no  no   dump failed (end of tape)
  no   no yes   index tee cannot write
  no  yes  no   gzip strange, dump failed
  no  yes yes   gzip strange, index tee strange, dump failed
  yes   *   *   success, nothing strange
 

 Most of them :)

 Am I guessing correctly that the  Patch here was one of the earlier
 patches that touched all three file descriptors?  And that when you
 tried the one-line patch (that touched only datafd), your dump
 succeeded, but sed complained about stdout again?

Nathan



 
 Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
 Ray Ontko  Co.  -  Software consulting services  -
 http://www.ontko.com/
  GPG Key: 
 http://www.ontko.com/~nathanst/gpg_key.txthttp://www.ontko.com/%7Enathanst/gpg_key.txt
   ID: 1023D/ECFB6239
  Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239



Re: Backup issues with OpenBSD 4.5 machines

2009-09-04 Thread Nathan Stratton Treadway
On Fri, Sep 04, 2009 at 16:49:24 -0600, Michael Burk wrote:
 Sorry I didn't make that clear. The patch was the one-liner (datafd only). I
 ran it again with the Dustin's 2-liner with the same results.

Ah, interesting.

I wonder if doing a one-line patch against (say) indexfd instead would
also prevent the index tee cannot write [ error message.  That would
imply that just making _any_ fnctl(XX, F_GETFL,0) call is sufficient to
get the correct O_NONBLOCK behavior on _all_ file descriptors

Nathan




Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko  Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


Re: Backup issues with OpenBSD 4.5 machines

2009-09-04 Thread Nathan Stratton Treadway
On Fri, Sep 04, 2009 at 19:40:28 -0400, Nathan Stratton Treadway wrote:
 On Fri, Sep 04, 2009 at 16:49:24 -0600, Michael Burk wrote:
  Sorry I didn't make that clear. The patch was the one-liner (datafd only). I
  ran it again with the Dustin's 2-liner with the same results.
 
 Ah, interesting.
 
 I wonder if doing a one-line patch against (say) indexfd instead would
 also prevent the index tee cannot write [ error message.  That would
 imply that just making _any_ fnctl(XX, F_GETFL,0) call is sufficient to
 get the correct O_NONBLOCK behavior on _all_ file descriptors

But, on the other hand, I don't immediately understand why the one-line
datafd patch would avoid the sed stdout error that you got when tried
Jean-Louis's pipe-blocking.diff patch [from Aug 24], as you reported on
Aug 30.  

That patch, I'm assuming, affected the file descriptor called datafd
(though by a different name), but not indexfd, and it seemed to allow
the index tee write to succeed, thus allowing the data to flow along far
enough to reach the stdout error from sed (which, I belive, is tied to
that indexfd file descriptor).

The later three-line patch touched both datafd and indexfd, and it
seemed to fix that second problem (and the first stayed fixed as well).

But it sounds like you are saying that the one-line patch, which touched
only datafd, actually fixed both problems as well


Nathan


Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko  Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


Re: Backup issues with OpenBSD 4.5 machines

2009-09-04 Thread Michael Burk
On Fri, Sep 4, 2009 at 6:21 PM, Nathan Stratton Treadway natha...@ontko.com
 wrote:

 snip

 But it sounds like you are saying that the one-line patch, which touched
 only datafd, actually fixed both problems as well

 Exactly.


Re: Backup issues with OpenBSD 4.5 machines

2009-09-02 Thread Michael Burk
This was a good idea; I tried it with one modification:  I determined
earlier that the failure happens without indexing also, so I added just the
line:
fcntl(datafd, F_GETFL, 0);
and that fixed the problem as well. So I guess this is truly the minimal
patch!

-- Michael

On Tue, Sep 1, 2009 at 1:43 PM, Nathan Stratton Treadway natha...@ontko.com
 wrote:

 On Tue, Sep 01, 2009 at 11:31:26 -0600, Michael Burk wrote:
  I applied the 3-line patch to the 0831 snapshot and ran a full backup on
  both machines, with 4 file systems each. All 8 completed successfully
 with
  no strange messages.
 
  Next, I commented out the 3 new lines and tried the backup again on one
 of
  the machines. This time all 4 file systems failed; e.g.:
 [...]
  So it seems reliable that those 3 lines fix the problem somehow.
  Anything else you want to try before I ask for help on the OpenBSD list?

 I'm no expert on this topic, but if I were investigating something like
 this, I'd be curious to know if all three of the lines in the patch were
 necessary for the fix.

 Since, as you pointed out, the error seems to be tied to the indexing
 subprocess, I wonder what would happen if you included only the one

   fcntl(indexfd, F_GETFL, 0);

 line of the patch, but not the other two


Nathan


 
 Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
 Ray Ontko  Co.  -  Software consulting services  -
 http://www.ontko.com/
  GPG Key: 
 http://www.ontko.com/~nathanst/gpg_key.txthttp://www.ontko.com/%7Enathanst/gpg_key.txt
   ID: 1023D/ECFB6239
  Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239



Re: Backup issues with OpenBSD 4.5 machines

2009-09-02 Thread Michael Burk
I modified sendbackup-dump.c to run ktrace, e.g.:
/usr/bin/ktrace -id -t censw -f /tmp/sendbackup.trc /sbin/dump 0usf 1048576
- /dev/rsd0d

Unfortunately, I don't get a backup, even with the patch applied. The trace
output shows write errors because of a broken pipe with or without the
patches, like this:
 21835 dump CALL  write(0x2,0x53fbec,0x8)
 21835 dump RET   write -1 errno 32 Broken pipe
So I suspect my approach is not correct. Any other ideas how I might get
some useful trace output?

Thanks,
Michael

On Tue, Sep 1, 2009 at 1:25 PM, Dustin J. Mitchell dus...@zmanda.comwrote:

 On Tue, Sep 1, 2009 at 3:18 PM, Jean-Louis
 Martineaumartin...@zmanda.com wrote:
  I have nothing else to try.
  The order of system call is a follow:

 If it's not too hard, it would be nice to have a ktrace or equivalent
 of this, first to look at here, and second to take to the OpenBSD
 list.  I know that's tricky since this is a daemon process..

 Dustin

 --
 Open Source Storage Engineer
 http://www.zmanda.com



Re: Backup issues with OpenBSD 4.5 machines

2009-09-02 Thread Dustin J. Mitchell
On Wed, Sep 2, 2009 at 1:07 PM, Michael Burkbur...@gmail.com wrote:
 So I suspect my approach is not correct. Any other ideas how I might get
 some useful trace output?

Can you have amandad sleep for, say, 120 seconds just before it
launches sendbackup, and somenow notify you of the pid to which you
should attach ktrace?  Maybe by writing it to /dev/console or to
syslog?

By the way, it's the launching of sendbackup that seems to be failing,
not the launching of dump.

Dustin

-- 
Open Source Storage Engineer
http://www.zmanda.com


Re: Backup issues with OpenBSD 4.5 machines

2009-09-01 Thread Jean-Louis Martineau

We need to find a minimal patch that fix the problem.
Cat you try the attached patch?

Jean-Louis

Michael Burk wrote:
I applied the patches to the 0827 version again (had to do the 
sendbackup-dump.c patch by hand since the patch was for another 
version). I ran a full backup again, and all 3 file systems were 
successful and there were no strange messages. The debug logs show 
these lines:

1251756671.281800: sendbackup: fd 3 is set with O_NONBLOCK before dup2
1251756681.457166: sendbackup: Index created successfully

This is looking like some bizarre instance of the observer effect!


Index: client-src/sendbackup.c
===
--- client-src/sendbackup.c	(revision 2126)
+++ client-src/sendbackup.c	(working copy)
@@ -406,6 +406,11 @@
 	mesgfd = DATA_FD_OFFSET + 2;
 	indexfd = DATA_FD_OFFSET + 4;
 }
+
+fcntl(datafd, F_GETFL, 0);
+fcntl(mesgfd, F_GETFL, 0);
+fcntl(indexfd, F_GETFL, 0);
+
 if (!dle-create_index)
 	indexfd = -1;
 


Re: Backup issues with OpenBSD 4.5 machines

2009-09-01 Thread Michael Burk
I checked the errata for OpenBSD 4.5, but saw nothing that looked related.
I applied the patch to the 0831 snapshot and am building it now. After we
find the minimal patch, as Jean-Louis said, I'll post on the OpenBSD-misc
list to see if anyone has an explanation.

Thanks guys for working on this - I know it's a frustrating one.

-- Michael

On Tue, Sep 1, 2009 at 8:33 AM, Dustin J. Mitchell dus...@zmanda.comwrote:

 On Tue, Sep 1, 2009 at 7:45 AM, Jean-Louis
 Martineaumartin...@zmanda.com wrote:
  We need to find a minimal patch that fix the problem.
  Cat you try the attached patch?

 This is starting to look like a kernel bug -- is there an associated
 OpenBSD bug or something that we could reference in comments in the
 code to explain this strange formulation?  Also, we'll probably need
 to do this everywhere we fork and plumb a new process -- in
 pipespawn.c at least.

 Dustin

 --
 Open Source Storage Engineer
 http://www.zmanda.com



Re: Backup issues with OpenBSD 4.5 machines

2009-09-01 Thread Dustin J. Mitchell
On Tue, Sep 1, 2009 at 7:45 AM, Jean-Louis
Martineaumartin...@zmanda.com wrote:
 We need to find a minimal patch that fix the problem.
 Cat you try the attached patch?

This is starting to look like a kernel bug -- is there an associated
OpenBSD bug or something that we could reference in comments in the
code to explain this strange formulation?  Also, we'll probably need
to do this everywhere we fork and plumb a new process -- in
pipespawn.c at least.

Dustin

-- 
Open Source Storage Engineer
http://www.zmanda.com


Re: Backup issues with OpenBSD 4.5 machines

2009-09-01 Thread Michael Burk
Since the logs indicate an error with the index tee, I thought I'd try
turning off the index generation. The backup still failed, but with a
different error:

1251826429.341455: sendbackup: pid 26726 ruid 150 euid 150 version
2.6.2alpha: start at Tue Sep  1 11:33:49 2009
1251826429.342070: sendbackup: Version 2.6.2alpha
1251826429.364640: sendbackup: pid 26726 ruid 150 euid 150 version
2.6.2alpha: rename at Tue Sep  1 11:33:49 2009
1251826429.366628: sendbackup:   Parsed request as: program `DUMP'
1251826429.35: sendbackup:  disk `/'
1251826429.366690: sendbackup:  device `/'
1251826429.366714: sendbackup:  level 0
1251826429.366737: sendbackup:  since NODATE
1251826429.366761: sendbackup:  options `'
1251826429.367758: sendbackup: start: zirconium.example.com:/ lev 0
1251826429.369745: sendbackup: dumping device '/dev/rsd1a' with 'ffs'
1251826429.370181: sendbackup: Spawning /sbin/dump dump 0usf 1048576 -
/dev/rsd1a in pipeline
1251826429.375384: sendbackup: Started backup
1251826429.395855: sendbackup:  90:  normal(|):   DUMP: Date of this level 0
dump: Tue Sep  1 11:33:49 2009
1251826429.460067: sendbackup:  90:  normal(|):   DUMP: Date of last level 0
dump: the epoch
1251826429.46: sendbackup:  90:  normal(|):   DUMP: Dumping /dev/rsd1a
(/) to standard output
1251826429.538792: sendbackup:  90:  normal(|):   DUMP: mapping (Pass I)
[regular files]
1251826430.811052: sendbackup:  90:  normal(|):   DUMP: mapping (Pass II)
[directories]
1251826430.812848: sendbackup:  90:  normal(|):   DUMP: estimated 47600 tape
blocks.
1251826430.814960: sendbackup:  90:  normal(|):   DUMP: Volume 1 started at:
Tue Sep  1 11:33:50 2009
1251826430.820737: sendbackup:  90:  normal(|):   DUMP: dumping (Pass III)
[directories]
1251826430.829936: sendbackup:  90:  normal(|):   DUMP: End of tape detected
1251826430.833563: sendbackup:  90:  normal(|):   DUMP: Volume 1 completed
at: Tue Sep  1 11:33:50 2009
1251826430.836947: sendbackup:  90:  normal(|):   DUMP: Change Volumes:
Mount volume #2
1251826430.838051: sendbackup:  86: strange(?):   DUMP: fopen on /dev/tty
fails: Device not configured
1251826430.839081: sendbackup:  90:  normal(|):   DUMP: The ENTIRE dump is
aborted.
1251826430.839642: sendbackup: critical (fatal): error [dump (1011)
/sbin/dump returned 3]

Hope that helps.

-- Michael


On Tue, Sep 1, 2009 at 11:31 AM, Michael Burk bur...@gmail.com wrote:

 I applied the 3-line patch to the 0831 snapshot and ran a full backup on
 both machines, with 4 file systems each. All 8 completed successfully with
 no strange messages.

 Next, I commented out the 3 new lines and tried the backup again on one of
 the machines. This time all 4 file systems failed; e.g.:

 1251825776.552104: sendbackup: pid 24374 ruid 150 euid 150 version
 2.6.2alpha: start at Tue Sep  1 11:22:56 2009
 1251825776.552688: sendbackup: Version 2.6.2alpha
 1251825776.575326: sendbackup: pid 24374 ruid 150 euid 150 version
 2.6.2alpha: rename at Tue Sep  1 11:22:56 2009
 1251825776.577315: sendbackup:   Parsed request as: program `DUMP'
 1251825776.577352: sendbackup:  disk `/'
 1251825776.577377: sendbackup:  device `/'
 1251825776.577400: sendbackup:  level 0
 1251825776.577424: sendbackup:  since NODATE
 1251825776.577446: sendbackup:  options `'
 1251825776.578210: sendbackup: start: zirconium.example.com:/ lev 0
 1251825776.579426: sendbackup: dumping device '/dev/rsd1a' with 'ffs'
 1251825776.582918: sendbackup: Spawning /sbin/dump dump 0usf 1048576 -
 /dev/rsd1a in pipeline
 1251825776.587089: sendbackup: Started backup
 1251825776.619734: sendbackup:  90:  normal(|):   DUMP: Date of this level
 0 dump: Tue Sep  1 11:22:56 2009
 1251825776.632529: sendbackup: Started index creator: /sbin/restore -tvf -
 21 | sed -e '
 s/^leaf[]*[0-9]*[   ]*\.//
 t
 /^dir[  ]/ {
 s/^dir[ ]*[0-9]*[   ]*\.//
 s%$%/%
 t
 }
 d
 '
 1251825776.721646: sendbackup:  90:  normal(|):   DUMP: Date of last level
 0 dump: the epoch
 1251825776.722686: sendbackup:  90:  normal(|):   DUMP: Dumping /dev/rsd1a
 (/) to standard output
 1251825776.779507: sendbackup:  90:  normal(|):   DUMP: mapping (Pass I)
 [regular files]
 1251825778.010107: sendbackup:  90:  normal(|):   DUMP: mapping (Pass II)
 [directories]
 1251825778.012085: sendbackup:  90:  normal(|):   DUMP: estimated 47600
 tape blocks.
 1251825778.013367: sendbackup:  90:  normal(|):   DUMP: Volume 1 started
 at: Tue Sep  1 11:22:58 2009
 1251825778.020808: sendbackup: critical (fatal): index tee cannot write
 [Resource temporarily unavailable]
 1251825778.020918: sendbackup:  90:  normal(|):   DUMP: dumping (Pass III)
 [directories]
 1251825778.022805: sendbackup: 115: strange(?): sendbackup: index tee
 cannot write [Resource temporarily unavailable]
 1251825778.042344: sendbackup:  90:  

Re: Backup issues with OpenBSD 4.5 machines

2009-09-01 Thread Michael Burk
I applied the 3-line patch to the 0831 snapshot and ran a full backup on
both machines, with 4 file systems each. All 8 completed successfully with
no strange messages.

Next, I commented out the 3 new lines and tried the backup again on one of
the machines. This time all 4 file systems failed; e.g.:

1251825776.552104: sendbackup: pid 24374 ruid 150 euid 150 version
2.6.2alpha: start at Tue Sep  1 11:22:56 2009
1251825776.552688: sendbackup: Version 2.6.2alpha
1251825776.575326: sendbackup: pid 24374 ruid 150 euid 150 version
2.6.2alpha: rename at Tue Sep  1 11:22:56 2009
1251825776.577315: sendbackup:   Parsed request as: program `DUMP'
1251825776.577352: sendbackup:  disk `/'
1251825776.577377: sendbackup:  device `/'
1251825776.577400: sendbackup:  level 0
1251825776.577424: sendbackup:  since NODATE
1251825776.577446: sendbackup:  options `'
1251825776.578210: sendbackup: start: zirconium.example.com:/ lev 0
1251825776.579426: sendbackup: dumping device '/dev/rsd1a' with 'ffs'
1251825776.582918: sendbackup: Spawning /sbin/dump dump 0usf 1048576 -
/dev/rsd1a in pipeline
1251825776.587089: sendbackup: Started backup
1251825776.619734: sendbackup:  90:  normal(|):   DUMP: Date of this level 0
dump: Tue Sep  1 11:22:56 2009
1251825776.632529: sendbackup: Started index creator: /sbin/restore -tvf -
21 | sed -e '
s/^leaf[]*[0-9]*[   ]*\.//
t
/^dir[  ]/ {
s/^dir[ ]*[0-9]*[   ]*\.//
s%$%/%
t
}
d
'
1251825776.721646: sendbackup:  90:  normal(|):   DUMP: Date of last level 0
dump: the epoch
1251825776.722686: sendbackup:  90:  normal(|):   DUMP: Dumping /dev/rsd1a
(/) to standard output
1251825776.779507: sendbackup:  90:  normal(|):   DUMP: mapping (Pass I)
[regular files]
1251825778.010107: sendbackup:  90:  normal(|):   DUMP: mapping (Pass II)
[directories]
1251825778.012085: sendbackup:  90:  normal(|):   DUMP: estimated 47600 tape
blocks.
1251825778.013367: sendbackup:  90:  normal(|):   DUMP: Volume 1 started at:
Tue Sep  1 11:22:58 2009
1251825778.020808: sendbackup: critical (fatal): index tee cannot write
[Resource temporarily unavailable]
1251825778.020918: sendbackup:  90:  normal(|):   DUMP: dumping (Pass III)
[directories]
1251825778.022805: sendbackup: 115: strange(?): sendbackup: index tee cannot
write [Resource temporarily unavailable]
1251825778.042344: sendbackup:  90:  normal(|):   DUMP: Broken pipe
1251825778.047559: sendbackup:  90:  normal(|):   DUMP: The ENTIRE dump is
aborted.
1251825778.048234: sendbackup: critical (fatal): error [dump (21317)
/sbin/dump returned 3]

So it seems reliable that those 3 lines fix the problem somehow.
Anything else you want to try before I ask for help on the OpenBSD list?

Thanks,
Michael

On Tue, Sep 1, 2009 at 9:32 AM, Michael Burk bur...@gmail.com wrote:

 I checked the errata for OpenBSD 4.5, but saw nothing that looked related.
 I applied the patch to the 0831 snapshot and am building it now. After we
 find the minimal patch, as Jean-Louis said, I'll post on the OpenBSD-misc
 list to see if anyone has an explanation.

 Thanks guys for working on this - I know it's a frustrating one.

 -- Michael


 On Tue, Sep 1, 2009 at 8:33 AM, Dustin J. Mitchell dus...@zmanda.comwrote:

 On Tue, Sep 1, 2009 at 7:45 AM, Jean-Louis
 Martineaumartin...@zmanda.com wrote:
  We need to find a minimal patch that fix the problem.
  Cat you try the attached patch?

 This is starting to look like a kernel bug -- is there an associated
 OpenBSD bug or something that we could reference in comments in the
 code to explain this strange formulation?  Also, we'll probably need
 to do this everywhere we fork and plumb a new process -- in
 pipespawn.c at least.

 Dustin

 --
 Open Source Storage Engineer
 http://www.zmanda.com





Re: Backup issues with OpenBSD 4.5 machines

2009-09-01 Thread Jean-Louis Martineau

Michael Burk wrote:

So it seems reliable that those 3 lines fix the problem somehow.
Anything else you want to try before I ask for help on the OpenBSD list?


I have nothing else to try.
The order of system call is a follow:

In amandad process:
 pipe(pipefd)
 dup2(pipefd[1],b)
 fork
 in the child:
 exec sendbackup process
 dup2(b,c)
 write c #fail with EAGAIN


The fix is to do the fcntl call at the beginning of the sendbackup 
process, before the dup2.


Jean-Louis


Re: Backup issues with OpenBSD 4.5 machines

2009-09-01 Thread Dustin J. Mitchell
On Tue, Sep 1, 2009 at 3:18 PM, Jean-Louis
Martineaumartin...@zmanda.com wrote:
 I have nothing else to try.
 The order of system call is a follow:

If it's not too hard, it would be nice to have a ktrace or equivalent
of this, first to look at here, and second to take to the OpenBSD
list.  I know that's tricky since this is a daemon process..

Dustin

-- 
Open Source Storage Engineer
http://www.zmanda.com


Re: Backup issues with OpenBSD 4.5 machines

2009-08-31 Thread Jean-Louis Martineau
You get the error in the index pipe instead of the data path. The backup 
is correct but your index is empty. That's why you get a STRANGE result 
instead of a failure.


Can you post a sendbackup.*.debug for a dle that failed?

Jean-Louis

Michael Burk wrote:

Hello,

I applied the patches to the 20090827 snapshot. I tried it on two 
OpenBSD 4.5 sparc64 systems, forcing both to do full backups. One 
system seemed to work on all 3 file systems, the other failed on all 4 
file systems with the same errors as before (exactly like what Stan 
reported). I'm using bsdtcp auth.


The system that worked, however, did give a strange result in the 
report on one filesystem. Here's the corresponding sendbackup debug 
file:


1251692700.529842: sendbackup: pid 30459 ruid 150 euid 150 version 
2.6.2alpha: start at Sun Aug 30 2

2:25:00 2009
1251692700.530411: sendbackup: Version 2.6.2alpha
1251692700.553135: sendbackup: pid 30459 ruid 150 euid 150 version 
2.6.2alpha: rename at Sun Aug 30

22:25:00 2009
1251692700.555394: sendbackup:   Parsed request as: program `DUMP'
1251692700.555435: sendbackup:  disk `/usr'
1251692700.555461: sendbackup:  device `/usr'
1251692700.555484: sendbackup:  level 0
1251692700.07: sendbackup:  since NODATE
1251692700.30: sendbackup:  options `'
1251692700.556750: sendbackup: start: zirconium.example.com:/usr lev 0
1251692700.559495: sendbackup: dumping device '/dev/rsd1d' with 'ffs'
1251692700.566094: sendbackup: Spawning /sbin/dump dump 0usf 1048576 
- /dev/rsd1d in pipeline

1251692700.571185: sendbackup: Started backup
1251692700.572425: sendbackup: fd 3 is set with O_NONBLOCK before dup2
1251692700.597388: sendbackup: Started index creator: /sbin/restore 
-tvf - 21 | sed -e '

s/^leaf[]*[0-9]*[   ]*\.//
t
/^dir[  ]/ {
s/^dir[ ]*[0-9]*[   ]*\.//
s%$%/%
t
}
d
'
1251692700.611526: sendbackup:  90:  normal(|):   DUMP: Date of this 
level 0 dump: Sun Aug 30 22:25:

00 2009
1251692700.669703: sendbackup:  90:  normal(|):   DUMP: Date of last 
level 0 dump: the epoch
1251692700.670724: sendbackup:  90:  normal(|):   DUMP: Dumping 
/dev/rsd1d (/usr) to standard output
1251692700.763600: sendbackup:  90:  normal(|):   DUMP: mapping (Pass 
I) [regular files]
1251692705.618345: sendbackup:  90:  normal(|):   DUMP: mapping (Pass 
II) [directories]
1251692705.620040: sendbackup:  90:  normal(|):   DUMP: estimated 
1330085 tape blocks.
1251692705.623976: sendbackup:  90:  normal(|):   DUMP: Volume 1 
started at: Sun Aug 30 22:25:05 2009
1251692705.650687: sendbackup:  90:  normal(|):   DUMP: dumping (Pass 
III) [directories]
1251692745.051547: sendbackup:  90:  normal(|):   DUMP: dumping (Pass 
IV) [regular files]
1251692747.915767: sendbackup: 115: strange(?): sed: stdout: Resource 
temporarily unavailable
1251692954.633083: sendbackup:  43:size(|):   DUMP: 1437829 tape 
blocks
1251692954.637414: sendbackup:  90:  normal(|):   DUMP: Date of this 
level 0 dump: Sun Aug 30 22:25:00 2009
1251692954.641414: sendbackup:  90:  normal(|):   DUMP: Volume 1 
completed at: Sun Aug 30 22:29:14 2009

1251692954.641858: sendbackup: Index pipe exited with status 1
1251692954.642719: sendbackup:  90:  normal(|):   DUMP: Volume 1 took 
0:04:09
1251692954.643918: sendbackup:  90:  normal(|):   DUMP: Volume 1 
transfer rate: 5774 KB/s
1251692954.645095: sendbackup:  90:  normal(|):   DUMP: Date this dump 
completed:  Sun Aug 30 22:29:14 2009
1251692954.646239: sendbackup:  90:  normal(|):   DUMP: Average 
transfer rate: 5774 KB/s
1251692954.647430: sendbackup:  90:  normal(|):   DUMP: level 0 dump 
on Sun Aug 30 22:25:00 2009

1251692954.653480: sendbackup:  90:  normal(|):   DUMP: DUMP IS DONE
1251692954.653841: sendbackup: Parsed backup messages
1251692954.654032: sendbackup: pid 30459 finish time Sun Aug 30 
22:29:14 2009


-- Michael

On Fri, Aug 28, 2009 at 5:22 AM, stan st...@panix.com 
mailto:st...@panix.com wrote:


On Thu, Aug 27, 2009 at 03:35:47PM -0600, Michael Burk wrote:
 Cool - can you send me the patch? Or is it already in the 0826
snapshot?

 FYI - I got 2.5.0p2 working, but could never get 2.5.1p3
working, even with
 the same config. No idea why.
 I started looking at the source code last night in the 0825
snapshot. I'll
 discontinue that search if there's a patch.

This is interesting. I sent Jean-Louis Martineau an email, saying
teaks for
fixing things, and he replied that the patch he sent me just added
debugging,
it was not intended to fix anything.

Bit, using this snapshot 20090813, and the patch, which I will
send to you.
I have it working on my 3 OpenBSD 4.5 machines that I have
upgraded so far.
I ran several test yesterday, and a full fledged backup run last
night,
and I have not seen any issues since installing this code.

I am going to attach the patch to 

Re: Backup issues with OpenBSD 4.5 machines

2009-08-31 Thread Michael Burk
Here are the two sendbackup.*.debug files for the / fs.

First:

1251693102.343436: sendbackup: pid 29087 ruid 150 euid 150 version
2.6.2alpha: start at Sun Aug 30 22:31:42 2009
1251693102.345326: sendbackup: Version 2.6.2alpha
1251693102.379331: sendbackup: pid 29087 ruid 150 euid 150 version
2.6.2alpha: rename at Sun Aug 30 22:31:42 2009
1251693102.382860: sendbackup:   Parsed request as: program `DUMP'
1251693102.382940: sendbackup:  disk `/'
1251693102.382991: sendbackup:  device `/'
1251693102.383039: sendbackup:  level 0
1251693102.383086: sendbackup:  since NODATE
1251693102.383134: sendbackup:  options `'
1251693102.388833: sendbackup: start: bromine.example.com:/ lev 0
1251693102.391475: sendbackup: dumping device '/dev/rsd0a' with 'ffs'
1251693102.398325: sendbackup: Spawning /sbin/dump dump 0usf 1048576 -
/dev/rsd0a in pipeline
1251693102.403120: sendbackup: Started backup
1251693102.461663: sendbackup:  90:  normal(|):   DUMP: Date of this level 0
dump: Sun Aug 30 22:31:42 2009
1251693102.468669: sendbackup:  90:  normal(|):   DUMP: Date of last level 0
dump: the epoch
1251693102.487536: sendbackup: Started index creator: /sbin/restore -tvf -
21 | sed -e '
s/^leaf[]*[0-9]*[   ]*\.//
t
/^dir[  ]/ {
s/^dir[ ]*[0-9]*[   ]*\.//
s%$%/%
t
}
d
'
1251693102.519939: sendbackup:  90:  normal(|):   DUMP: Dumping /dev/rsd0a
(/) to standard output
1251693102.588086: sendbackup:  90:  normal(|):   DUMP: mapping (Pass I)
[regular files]
1251693104.583392: sendbackup:  90:  normal(|):   DUMP: mapping (Pass II)
[directories]
1251693104.585824: sendbackup:  90:  normal(|):   DUMP: estimated 35134 tape
blocks.
1251693104.595314: sendbackup:  90:  normal(|):   DUMP: Volume 1 started at:
Sun Aug 30 22:31:44 2009
1251693104.615444: sendbackup:  90:  normal(|):   DUMP: dumping (Pass III)
[directories]
1251693104.919691: sendbackup:  90:  normal(|):   DUMP: dumping (Pass IV)
[regular files]
1251693105.800795: sendbackup: critical (fatal): index tee cannot write
[Resource temporarily unavailable]
1251693105.803486: sendbackup: 115: strange(?): sendbackup: index tee cannot
write [Resource temporarily unavailable]
1251693105.827339: sendbackup:  90:  normal(|):   DUMP: Broken pipe
1251693105.832068: sendbackup:  90:  normal(|):   DUMP: The ENTIRE dump is
aborted.
1251693105.839675: sendbackup: critical (fatal): error [dump (7257)
/sbin/dump returned 3]

Second:

1251693123.840560: sendbackup: pid 32144 ruid 150 euid 150 version
2.6.2alpha: start at Sun Aug 30 22:32:03 2009
1251693123.841613: sendbackup: Version 2.6.2alpha
1251693123.892773: sendbackup: pid 32144 ruid 150 euid 150 version
2.6.2alpha: rename at Sun Aug 30 22:32:03 2009
1251693123.896360: sendbackup:   Parsed request as: program `DUMP'
1251693123.896437: sendbackup:  disk `/'
1251693123.896487: sendbackup:  device `/'
1251693123.896535: sendbackup:  level 0
1251693123.896584: sendbackup:  since NODATE
1251693123.896632: sendbackup:  options `'
1251693123.902332: sendbackup: start: bromine.example.com:/ lev 0
1251693123.905012: sendbackup: dumping device '/dev/rsd0a' with 'ffs'
1251693123.911778: sendbackup: Spawning /sbin/dump dump 0usf 1048576 -
/dev/rsd0a in pipeline
1251693123.916860: sendbackup: Started backup
1251693123.992261: sendbackup: Started index creator: /sbin/restore -tvf -
21 | sed -e '
s/^leaf[]*[0-9]*[   ]*\.//
t
/^dir[  ]/ {
s/^dir[ ]*[0-9]*[   ]*\.//
s%$%/%
t
}
d
'
1251693123.995743: sendbackup:  90:  normal(|):   DUMP: Date of this level 0
dump: Sun Aug 30 22:32:03 2009
1251693124.009013: sendbackup:  90:  normal(|):   DUMP: Date of last level 0
dump: the epoch
1251693124.038977: sendbackup:  90:  normal(|):   DUMP: Dumping /dev/rsd0a
(/) to standard output
1251693124.143454: sendbackup:  90:  normal(|):   DUMP: mapping (Pass I)
[regular files]
1251693126.173916: sendbackup:  90:  normal(|):   DUMP: mapping (Pass II)
[directories]
1251693126.176287: sendbackup:  90:  normal(|):   DUMP: estimated 35134 tape
blocks.
1251693126.185677: sendbackup:  90:  normal(|):   DUMP: Volume 1 started at:
Sun Aug 30 22:32:06 2009
1251693126.205305: sendbackup:  90:  normal(|):   DUMP: dumping (Pass III)
[directories]
1251693126.505055: sendbackup:  90:  normal(|):   DUMP: dumping (Pass IV)
[regular files]
1251693126.797271: sendbackup: critical (fatal): index tee cannot write
[Resource temporarily unavailable]
1251693126.799929: sendbackup: 115: strange(?): sendbackup: index tee cannot
write [Resource temporarily unavailable]
1251693126.821488: sendbackup:  90:  normal(|):   DUMP: Broken pipe
1251693126.825377: sendbackup:  90:  normal(|):   DUMP: The ENTIRE dump is
aborted.
1251693126.833573: sendbackup: critical (fatal): error [dump (22791)
/sbin/dump returned 3]

Thanks,
Michael

On Mon, Aug 31, 2009 at 

Re: Backup issues with OpenBSD 4.5 machines

2009-08-31 Thread Jean-Louis Martineau

Do you have the patch I sent to stan on this system?

The patch check before and after the write if the pipe is in O_NONBLOCK 
or not and give an error if it is.


I'm totally lost since it is in blocking mode and you get EAGAIN, which 
is impossible


Jean-louis

Michael Burk wrote:

Here are the two sendbackup.*.debug files for the / fs.

First:

1251693102.343436: sendbackup: pid 29087 ruid 150 euid 150 version 
2.6.2alpha: start at Sun Aug 30 22:31:42 2009

1251693102.345326: sendbackup: Version 2.6.2alpha
1251693102.379331: sendbackup: pid 29087 ruid 150 euid 150 version 
2.6.2alpha: rename at Sun Aug 30 22:31:42 2009

1251693102.382860: sendbackup:   Parsed request as: program `DUMP'
1251693102.382940: sendbackup:  disk `/'
1251693102.382991: sendbackup:  device `/'
1251693102.383039: sendbackup:  level 0
1251693102.383086: sendbackup:  since NODATE
1251693102.383134: sendbackup:  options `'
1251693102.388833: sendbackup: start: bromine.example.com:/ lev 0
1251693102.391475: sendbackup: dumping device '/dev/rsd0a' with 'ffs'
1251693102.398325: sendbackup: Spawning /sbin/dump dump 0usf 1048576 
- /dev/rsd0a in pipeline

1251693102.403120: sendbackup: Started backup
1251693102.461663: sendbackup:  90:  normal(|):   DUMP: Date of this 
level 0 dump: Sun Aug 30 22:31:42 2009
1251693102.468669: sendbackup:  90:  normal(|):   DUMP: Date of last 
level 0 dump: the epoch
1251693102.487536: sendbackup: Started index creator: /sbin/restore 
-tvf - 21 | sed -e '

s/^leaf[]*[0-9]*[   ]*\.//
t
/^dir[  ]/ {
s/^dir[ ]*[0-9]*[   ]*\.//
s%$%/%
t
}
d
'
1251693102.519939: sendbackup:  90:  normal(|):   DUMP: Dumping 
/dev/rsd0a (/) to standard output
1251693102.588086: sendbackup:  90:  normal(|):   DUMP: mapping (Pass 
I) [regular files]
1251693104.583392: sendbackup:  90:  normal(|):   DUMP: mapping (Pass 
II) [directories]
1251693104.585824: sendbackup:  90:  normal(|):   DUMP: estimated 
35134 tape blocks.
1251693104.595314: sendbackup:  90:  normal(|):   DUMP: Volume 1 
started at: Sun Aug 30 22:31:44 2009
1251693104.615444: sendbackup:  90:  normal(|):   DUMP: dumping (Pass 
III) [directories]
1251693104.919691: sendbackup:  90:  normal(|):   DUMP: dumping (Pass 
IV) [regular files]
1251693105.800795: sendbackup: critical (fatal): index tee cannot 
write [Resource temporarily unavailable]
1251693105.803486: sendbackup: 115: strange(?): sendbackup: index tee 
cannot write [Resource temporarily unavailable]

1251693105.827339: sendbackup:  90:  normal(|):   DUMP: Broken pipe
1251693105.832068: sendbackup:  90:  normal(|):   DUMP: The ENTIRE 
dump is aborted.
1251693105.839675: sendbackup: critical (fatal): error [dump (7257) 
/sbin/dump returned 3]


Second:

1251693123.840560: sendbackup: pid 32144 ruid 150 euid 150 version 
2.6.2alpha: start at Sun Aug 30 22:32:03 2009

1251693123.841613: sendbackup: Version 2.6.2alpha
1251693123.892773: sendbackup: pid 32144 ruid 150 euid 150 version 
2.6.2alpha: rename at Sun Aug 30 22:32:03 2009

1251693123.896360: sendbackup:   Parsed request as: program `DUMP'
1251693123.896437: sendbackup:  disk `/'
1251693123.896487: sendbackup:  device `/'
1251693123.896535: sendbackup:  level 0
1251693123.896584: sendbackup:  since NODATE
1251693123.896632: sendbackup:  options `'
1251693123.902332: sendbackup: start: bromine.example.com:/ lev 0
1251693123.905012: sendbackup: dumping device '/dev/rsd0a' with 'ffs'
1251693123.911778: sendbackup: Spawning /sbin/dump dump 0usf 1048576 
- /dev/rsd0a in pipeline

1251693123.916860: sendbackup: Started backup
1251693123.992261: sendbackup: Started index creator: /sbin/restore 
-tvf - 21 | sed -e '

s/^leaf[]*[0-9]*[   ]*\.//
t
/^dir[  ]/ {
s/^dir[ ]*[0-9]*[   ]*\.//
s%$%/%
t
}
d
'
1251693123.995743: sendbackup:  90:  normal(|):   DUMP: Date of this 
level 0 dump: Sun Aug 30 22:32:03 2009
1251693124.009013: sendbackup:  90:  normal(|):   DUMP: Date of last 
level 0 dump: the epoch
1251693124.038977: sendbackup:  90:  normal(|):   DUMP: Dumping 
/dev/rsd0a (/) to standard output
1251693124.143454: sendbackup:  90:  normal(|):   DUMP: mapping (Pass 
I) [regular files]
1251693126.173916: sendbackup:  90:  normal(|):   DUMP: mapping (Pass 
II) [directories]
1251693126.176287: sendbackup:  90:  normal(|):   DUMP: estimated 
35134 tape blocks.
1251693126.185677: sendbackup:  90:  normal(|):   DUMP: Volume 1 
started at: Sun Aug 30 22:32:06 2009
1251693126.205305: sendbackup:  90:  normal(|):   DUMP: dumping (Pass 
III) [directories]
1251693126.505055: sendbackup:  90:  normal(|):   DUMP: dumping (Pass 
IV) [regular files]
1251693126.797271: sendbackup: critical (fatal): index tee cannot 
write [Resource temporarily unavailable]
1251693126.799929: sendbackup: 115: strange(?): sendbackup: index tee 

Re: Backup issues with OpenBSD 4.5 machines

2009-08-31 Thread Michael Burk
Hi Jean-Louis,

I thought I had applied the patches on this machine also, but it turns out I
didn't (sorry about that). I applied the patches and ran a new dump. This
time all 4 file systems succeeded, though /usr got the same strange
message as on the other machine:

1251752282.483677: sendbackup: pid 15574 ruid 150 euid 150 version
2.6.2alpha: start at Mon Aug 31 14:58:02 2009
1251752282.484788: sendbackup: Version 2.6.2alpha
1251752282.523952: sendbackup: pid 15574 ruid 150 euid 150 version
2.6.2alpha: rename at Mon Aug 31 14:58:02 2009
1251752282.527795: sendbackup:   Parsed request as: program `DUMP'
1251752282.527880: sendbackup:  disk `/usr'
1251752282.527931: sendbackup:  device `/usr'
1251752282.527978: sendbackup:  level 0
1251752282.528025: sendbackup:  since NODATE
1251752282.528072: sendbackup:  options `'
1251752282.533704: sendbackup: start: bromine.example.com:/usr lev 0
1251752282.537190: sendbackup: dumping device '/dev/rsd0d' with 'ffs'
1251752282.543664: sendbackup: Spawning /sbin/dump dump 0usf 1048576 -
/dev/rsd0d in pipeline
1251752282.548532: sendbackup: Started backup
1251752282.561558: sendbackup: fd 3 is set with O_NONBLOCK before dup2
1251752282.625055: sendbackup: Started index creator: /sbin/restore -tvf -
21 | sed -e '
s/^leaf[]*[0-9]*[   ]*\.//
t
/^dir[  ]/ {
s/^dir[ ]*[0-9]*[   ]*\.//
s%$%/%
t
}
d
'
1251752282.627853: sendbackup:  90:  normal(|):   DUMP: Date of this level 0
dump: Mon Aug 31 14:58:02 2009
1251752282.641040: sendbackup:  90:  normal(|):   DUMP: Date of last level 0
dump: the epoch
1251752282.754862: sendbackup:  90:  normal(|):   DUMP: Dumping /dev/rsd0d
(/usr) to standard output
1251752282.792706: sendbackup:  90:  normal(|):   DUMP: mapping (Pass I)
[regular files]
1251752286.777085: sendbackup:  90:  normal(|):   DUMP: mapping (Pass II)
[directories]
1251752286.779538: sendbackup:  90:  normal(|):   DUMP: estimated 653358
tape blocks.
1251752286.789190: sendbackup:  90:  normal(|):   DUMP: Volume 1 started at:
Mon Aug 31 14:58:06 2009
1251752286.827782: sendbackup:  90:  normal(|):   DUMP: dumping (Pass III)
[directories]
1251752290.646004: sendbackup:  90:  normal(|):   DUMP: dumping (Pass IV)
[regular files]
1251752290.822019: sendbackup: 115: strange(?): sed: stdout: Resource
temporarily unavailable
1251752424.852658: sendbackup:  43:size(|):   DUMP: 664333 tape blocks
1251752424.854957: sendbackup:  90:  normal(|):   DUMP: Date of this level 0
dump: Mon Aug 31 14:58:02 2009
1251752424.857083: sendbackup:  90:  normal(|):   DUMP: Volume 1 completed
at: Mon Aug 31 15:00:24 2009
1251752424.859064: sendbackup:  90:  normal(|):   DUMP: Volume 1 took
0:02:18
1251752424.860994: sendbackup:  90:  normal(|):   DUMP: Volume 1 transfer
rate: 4814 KB/s
1251752424.863042: sendbackup:  90:  normal(|):   DUMP: Date this dump
completed:  Mon Aug 31 15:00:24 2009
1251752424.864993: sendbackup:  90:  normal(|):   DUMP: Average transfer
rate: 4814 KB/s
1251752424.869138: sendbackup:  90:  normal(|):   DUMP: level 0 dump on Mon
Aug 31 14:58:02 2009
1251752424.878026: sendbackup:  90:  normal(|):   DUMP: DUMP IS DONE
1251752424.888324: sendbackup: Index pipe exited with status 1
1251752424.906986: sendbackup: Parsed backup messages
1251752424.907361: sendbackup: pid 15574 finish time Mon Aug 31 15:00:24
2009

So now I've seen the same behavior that Stan noted: applying the debug patch
seems to fix the dump.

Also, in case you didn't see it before, here is a snippet from an email
thread Stan started on an OpenBSD list:
   I am trying to get Amanda of a recent vintage working on 4.5. The
developrs
   don't understand how writing to a blocking pipe can return EAGAIN.
   Should the write just block?

 Yes, it should, and a quick, about-to-go-to-bed readthrough of the
 pipe write path shows that it shouldn't; is Amanda doing fcntls to
 the fd? If so, check those to be absolutely sure that they're not
 doing something hinkey.

Thanks,
Michael

On Mon, Aug 31, 2009 at 2:41 PM, Jean-Louis Martineau
martin...@zmanda.comwrote:

 Do you have the patch I sent to stan on this system?

 The patch check before and after the write if the pipe is in O_NONBLOCK or
 not and give an error if it is.

 I'm totally lost since it is in blocking mode and you get EAGAIN, which is
 impossible

 Jean-louis

 Michael Burk wrote:

 Here are the two sendbackup.*.debug files for the / fs.

 First:

 1251693102.343436: sendbackup: pid 29087 ruid 150 euid 150 version
 2.6.2alpha: start at Sun Aug 30 22:31:42 2009
 1251693102.345326: sendbackup: Version 2.6.2alpha
 1251693102.379331: sendbackup: pid 29087 ruid 150 euid 150 version
 2.6.2alpha: rename at Sun Aug 30 22:31:42 2009
 1251693102.382860: sendbackup:   Parsed request as: program `DUMP'
 1251693102.382940: sendbackup:  disk `/'
 1251693102.382991: sendbackup:  device `/'
 

Re: Backup issues with OpenBSD 4.5 machines

2009-08-31 Thread Jean-Louis Martineau

amanda doesn't do fcntl on these file descriptor.
But my patch do: fcntl(fd, F_GETFL, 0) to check if it have O_NONBLOCK 
set, the patch doesn't change it.
Can you try the attached patch, it do the same trick for the index file 
descriptor (remove previous patch before applying).


Jean-Louis

Michael Burk wrote:

Hi Jean-Louis,

I thought I had applied the patches on this machine also, but it turns 
out I didn't (sorry about that). I applied the patches and ran a new 
dump. This time all 4 file systems succeeded, though /usr got the same 
strange message as on the other machine:


1251752282.483677: sendbackup: pid 15574 ruid 150 euid 150 version 
2.6.2alpha: start at Mon Aug 31 14:58:02 2009

1251752282.484788: sendbackup: Version 2.6.2alpha
1251752282.523952: sendbackup: pid 15574 ruid 150 euid 150 version 
2.6.2alpha: rename at Mon Aug 31 14:58:02 2009

1251752282.527795: sendbackup:   Parsed request as: program `DUMP'
1251752282.527880: sendbackup:  disk `/usr'
1251752282.527931: sendbackup:  device `/usr'
1251752282.527978: sendbackup:  level 0
1251752282.528025: sendbackup:  since NODATE
1251752282.528072: sendbackup:  options `'
1251752282.533704: sendbackup: start: bromine.example.com:/usr lev 0
1251752282.537190: sendbackup: dumping device '/dev/rsd0d' with 'ffs'
1251752282.543664: sendbackup: Spawning /sbin/dump dump 0usf 1048576 
- /dev/rsd0d in pipeline

1251752282.548532: sendbackup: Started backup
1251752282.561558: sendbackup: fd 3 is set with O_NONBLOCK before dup2
1251752282.625055: sendbackup: Started index creator: /sbin/restore 
-tvf - 21 | sed -e '

s/^leaf[]*[0-9]*[   ]*\.//
t
/^dir[  ]/ {
s/^dir[ ]*[0-9]*[   ]*\.//
s%$%/%
t
}
d
'
1251752282.627853: sendbackup:  90:  normal(|):   DUMP: Date of this 
level 0 dump: Mon Aug 31 14:58:02 2009
1251752282.641040: sendbackup:  90:  normal(|):   DUMP: Date of last 
level 0 dump: the epoch
1251752282.754862: sendbackup:  90:  normal(|):   DUMP: Dumping 
/dev/rsd0d (/usr) to standard output
1251752282.792706: sendbackup:  90:  normal(|):   DUMP: mapping (Pass 
I) [regular files]
1251752286.777085: sendbackup:  90:  normal(|):   DUMP: mapping (Pass 
II) [directories]
1251752286.779538: sendbackup:  90:  normal(|):   DUMP: estimated 
653358 tape blocks.
1251752286.789190: sendbackup:  90:  normal(|):   DUMP: Volume 1 
started at: Mon Aug 31 14:58:06 2009
1251752286.827782: sendbackup:  90:  normal(|):   DUMP: dumping (Pass 
III) [directories]
1251752290.646004: sendbackup:  90:  normal(|):   DUMP: dumping (Pass 
IV) [regular files]
1251752290.822019: sendbackup: 115: strange(?): sed: stdout: Resource 
temporarily unavailable

1251752424.852658: sendbackup:  43:size(|):   DUMP: 664333 tape blocks
1251752424.854957: sendbackup:  90:  normal(|):   DUMP: Date of this 
level 0 dump: Mon Aug 31 14:58:02 2009
1251752424.857083: sendbackup:  90:  normal(|):   DUMP: Volume 1 
completed at: Mon Aug 31 15:00:24 2009
1251752424.859064: sendbackup:  90:  normal(|):   DUMP: Volume 1 took 
0:02:18
1251752424.860994: sendbackup:  90:  normal(|):   DUMP: Volume 1 
transfer rate: 4814 KB/s
1251752424.863042: sendbackup:  90:  normal(|):   DUMP: Date this dump 
completed:  Mon Aug 31 15:00:24 2009
1251752424.864993: sendbackup:  90:  normal(|):   DUMP: Average 
transfer rate: 4814 KB/s
1251752424.869138: sendbackup:  90:  normal(|):   DUMP: level 0 dump 
on Mon Aug 31 14:58:02 2009

1251752424.878026: sendbackup:  90:  normal(|):   DUMP: DUMP IS DONE
1251752424.888324: sendbackup: Index pipe exited with status 1
1251752424.906986: sendbackup: Parsed backup messages
1251752424.907361: sendbackup: pid 15574 finish time Mon Aug 31 
15:00:24 2009


So now I've seen the same behavior that Stan noted: applying the debug 
patch seems to fix the dump.


Also, in case you didn't see it before, here is a snippet from an 
email thread Stan started on an OpenBSD list:
   I am trying to get Amanda of a recent vintage working on 4.5. The 
developrs

   don't understand how writing to a blocking pipe can return EAGAIN.
   Should the write just block?

 Yes, it should, and a quick, about-to-go-to-bed readthrough of the
 pipe write path shows that it shouldn't; is Amanda doing fcntls to
 the fd? If so, check those to be absolutely sure that they're not
 doing something hinkey.

Thanks,
Michael

On Mon, Aug 31, 2009 at 2:41 PM, Jean-Louis Martineau 
martin...@zmanda.com mailto:martin...@zmanda.com wrote:


Do you have the patch I sent to stan on this system?

The patch check before and after the write if the pipe is in
O_NONBLOCK or not and give an error if it is.

I'm totally lost since it is in blocking mode and you get EAGAIN,
which is impossible

Jean-louis

Michael Burk wrote:

Here are the two sendbackup.*.debug files for the / fs.

First:

1251693102.343436: sendbackup: pid 29087 ruid 150 euid 

Re: Backup issues with OpenBSD 4.5 machines

2009-08-30 Thread Michael Burk
Hello,

I applied the patches to the 20090827 snapshot. I tried it on two OpenBSD
4.5 sparc64 systems, forcing both to do full backups. One system seemed to
work on all 3 file systems, the other failed on all 4 file systems with the
same errors as before (exactly like what Stan reported). I'm using bsdtcp
auth.

The system that worked, however, did give a strange result in the report
on one filesystem. Here's the corresponding sendbackup debug file:

1251692700.529842: sendbackup: pid 30459 ruid 150 euid 150 version
2.6.2alpha: start at Sun Aug 30 2
2:25:00 2009
1251692700.530411: sendbackup: Version 2.6.2alpha
1251692700.553135: sendbackup: pid 30459 ruid 150 euid 150 version
2.6.2alpha: rename at Sun Aug 30
22:25:00 2009
1251692700.555394: sendbackup:   Parsed request as: program `DUMP'
1251692700.555435: sendbackup:  disk `/usr'
1251692700.555461: sendbackup:  device `/usr'
1251692700.555484: sendbackup:  level 0
1251692700.07: sendbackup:  since NODATE
1251692700.30: sendbackup:  options `'
1251692700.556750: sendbackup: start: zirconium.example.com:/usr lev 0
1251692700.559495: sendbackup: dumping device '/dev/rsd1d' with 'ffs'
1251692700.566094: sendbackup: Spawning /sbin/dump dump 0usf 1048576 -
/dev/rsd1d in pipeline
1251692700.571185: sendbackup: Started backup
1251692700.572425: sendbackup: fd 3 is set with O_NONBLOCK before dup2
1251692700.597388: sendbackup: Started index creator: /sbin/restore -tvf -
21 | sed -e '
s/^leaf[]*[0-9]*[   ]*\.//
t
/^dir[  ]/ {
s/^dir[ ]*[0-9]*[   ]*\.//
s%$%/%
t
}
d
'
1251692700.611526: sendbackup:  90:  normal(|):   DUMP: Date of this level 0
dump: Sun Aug 30 22:25:
00 2009
1251692700.669703: sendbackup:  90:  normal(|):   DUMP: Date of last level 0
dump: the epoch
1251692700.670724: sendbackup:  90:  normal(|):   DUMP: Dumping /dev/rsd1d
(/usr) to standard output
1251692700.763600: sendbackup:  90:  normal(|):   DUMP: mapping (Pass I)
[regular files]
1251692705.618345: sendbackup:  90:  normal(|):   DUMP: mapping (Pass II)
[directories]
1251692705.620040: sendbackup:  90:  normal(|):   DUMP: estimated 1330085
tape blocks.
1251692705.623976: sendbackup:  90:  normal(|):   DUMP: Volume 1 started at:
Sun Aug 30 22:25:05 2009
1251692705.650687: sendbackup:  90:  normal(|):   DUMP: dumping (Pass III)
[directories]
1251692745.051547: sendbackup:  90:  normal(|):   DUMP: dumping (Pass IV)
[regular files]
1251692747.915767: sendbackup: 115: strange(?): sed: stdout: Resource
temporarily unavailable
1251692954.633083: sendbackup:  43:size(|):   DUMP: 1437829 tape blocks
1251692954.637414: sendbackup:  90:  normal(|):   DUMP: Date of this level 0
dump: Sun Aug 30 22:25:00 2009
1251692954.641414: sendbackup:  90:  normal(|):   DUMP: Volume 1 completed
at: Sun Aug 30 22:29:14 2009
1251692954.641858: sendbackup: Index pipe exited with status 1
1251692954.642719: sendbackup:  90:  normal(|):   DUMP: Volume 1 took
0:04:09
1251692954.643918: sendbackup:  90:  normal(|):   DUMP: Volume 1 transfer
rate: 5774 KB/s
1251692954.645095: sendbackup:  90:  normal(|):   DUMP: Date this dump
completed:  Sun Aug 30 22:29:14 2009
1251692954.646239: sendbackup:  90:  normal(|):   DUMP: Average transfer
rate: 5774 KB/s
1251692954.647430: sendbackup:  90:  normal(|):   DUMP: level 0 dump on Sun
Aug 30 22:25:00 2009
1251692954.653480: sendbackup:  90:  normal(|):   DUMP: DUMP IS DONE
1251692954.653841: sendbackup: Parsed backup messages
1251692954.654032: sendbackup: pid 30459 finish time Sun Aug 30 22:29:14
2009

-- Michael

On Fri, Aug 28, 2009 at 5:22 AM, stan st...@panix.com wrote:

 On Thu, Aug 27, 2009 at 03:35:47PM -0600, Michael Burk wrote:
  Cool - can you send me the patch? Or is it already in the 0826 snapshot?
 
  FYI - I got 2.5.0p2 working, but could never get 2.5.1p3 working, even
 with
  the same config. No idea why.
  I started looking at the source code last night in the 0825 snapshot.
 I'll
  discontinue that search if there's a patch.
 
 This is interesting. I sent Jean-Louis Martineau an email, saying teaks for
 fixing things, and he replied that the patch he sent me just added
 debugging,
 it was not intended to fix anything.

 Bit, using this snapshot 20090813, and the patch, which I will send to you.
 I have it working on my 3 OpenBSD 4.5 machines that I have upgraded so far.
 I ran several test yesterday, and a full fledged backup run last night,
 and I have not seen any issues since installing this code.

 I am going to attach the patch to this email.

 I would greatly appreciate it, if you would apply this patch to the above
 snapshot, test, and tell me what your results are.

 I am very puzzled at the moment.

 BTW, is your failure the same as mine? That is amcheck passes, and most
 times you get a PARTIAL backup?

 BTW, I am using bsdtcp auth and client compression now, but I have made
 this work with this code, with bsd auth, and no 

Re: Backup issues with OpenBSD 4.5 machines

2009-08-24 Thread stan
On Fri, Aug 21, 2009 at 03:53:13PM -0600, Michael Burk wrote:
 Stan's not alone on this one. I have two OpenBSD 4.5 machines also on Sun
 SPARC hardware. I had this same trouble a couple months ago with 2.6.1, but
 didn't have time to look deeper. After seeing this discussion, I built
 amanda-2.6.2alpha-20090812 (can't get 0820 to compile). I'm getting exactly
 the same errors. I tried dump with and without compression.
 
 My Amanda server is on Solaris 10, and I have a mix of Solaris, Linux, and
 OpenBSD systems. Except for the above-mentioned build, everything is at
 2.6.1p1. There are no firewalls on the Amanda server or the OpenBSD clients.
 
 I know me too responses aren't usually helpful, but maybe this will rule
 out the firewall question, at least.
 
 

Thanks for confirming the problem report. 

Any cn
hance of you trying 2.500, and 2.5.2 to see if the first works, and the 2nd
doesn't? Are you using bsd auth?

-- 
One of the main causes of the fall of the roman empire was that, lacking
zero, they had no way to indicate successful termination of their C
programs.


Re: Backup issues with OpenBSD 4.5 machines

2009-08-24 Thread Michael Burk
I'm using bsdtcp auth.
I'll try 2.5.1 today or tomorrow, to see if I can narrow down the range of
releases in which OpenBSD support broke.

-- Michael

On Mon, Aug 24, 2009 at 5:17 AM, stan st...@panix.com wrote:

 On Fri, Aug 21, 2009 at 03:53:13PM -0600, Michael Burk wrote:
  Stan's not alone on this one. I have two OpenBSD 4.5 machines also on Sun
  SPARC hardware. I had this same trouble a couple months ago with 2.6.1,
 but
  didn't have time to look deeper. After seeing this discussion, I built
  amanda-2.6.2alpha-20090812 (can't get 0820 to compile). I'm getting
 exactly
  the same errors. I tried dump with and without compression.
 
  My Amanda server is on Solaris 10, and I have a mix of Solaris, Linux,
 and
  OpenBSD systems. Except for the above-mentioned build, everything is at
  2.6.1p1. There are no firewalls on the Amanda server or the OpenBSD
 clients.
 
  I know me too responses aren't usually helpful, but maybe this will
 rule
  out the firewall question, at least.
 
 

 Thanks for confirming the problem report.

 Any cn
 hance of you trying 2.500, and 2.5.2 to see if the first works, and the 2nd
 doesn't? Are you using bsd auth?

 --
 One of the main causes of the fall of the roman empire was that, lacking
 zero, they had no way to indicate successful termination of their C
 programs.



Re: Backup issues with OpenBSD 4.5 machines

2009-08-24 Thread Jean-Louis Martineau

This bug can't be fixed until we understand it.

Ask on a OpenBSD list how a write to a blocking pipe can return EAGAIN.
Or the pipe semantics changed and they don't default to blocking.

You can also try the attached, it check the pipe are opened in blocking 
mode.


Jean-Louis

stan wrote:

On Fri, Aug 21, 2009 at 01:23:29PM -0600, John Hein wrote:
  

stan wrote at 13:56 -0400 on Aug 21, 2009:
  OK, I reproduced the failure with only a crossover cable between the test
  client and the Amanda Master:

Just because you're using a crossover cable doesn't rule out firewall
or other such socket level interference.  I'm not saying that's your
problem, but using a crossover cable doesn't rule it out.



Fair enough, but I think we can rule that out in this case. First, I
control the 2 computersin question. The Master backus up 55 machines a day
taht consist of a diverse nix of OS'es, and Amanda versions. It's a Linux
machine, and I am not running a firwall on it. The client is an OpenBSD
machine. I use these for my firewalls, so I am familiar with configuring
firewalls on them, Doing so requires (as a minimum) running pf. I am not
running pf on the test machine at all.

I am not certain whether the sendbackup executable, or amandad opens the
port on the client side. My guess is that it is not suceding in doing this.
Thus when the Master tries to connect to it, there is no port for it to
connect to.

  


Index: amandad-src/amandad.c
===
--- amandad-src/amandad.c	(revision 2110)
+++ amandad-src/amandad.c	(working copy)
@@ -1517,6 +1517,7 @@
 struct active_service *as;
 pid_t pid;
 int newfd;
+int r;
 
 assert(security_handle != NULL);
 assert(cmd != NULL);
@@ -1535,15 +1536,40 @@
 	error(_(pipe: %s\n), strerror(errno));
 	/*NOTREACHED*/
 	}
+	r = fcntl(data_read[i][0], F_GETFL, 0);
+	if (r  O_NONBLOCK) {
+	error(_(data_read[%d][0] is non blocking), i);
+	}
+	r = fcntl(data_read[i][1], F_GETFL, 0);
+	if (r  O_NONBLOCK) {
+	error(_(data_read[%d][1] is non blocking), i);
+	}
+
 	if (pipe(data_write[i])  0) {
 	error(_(pipe: %s\n), strerror(errno));
 	/*NOTREACHED*/
 	}
+	r = fcntl(data_write[i][0], F_GETFL, 0);
+	if (r  O_NONBLOCK) {
+	error(_(data_write[%d][0] is non blocking), i);
+	}
+	r = fcntl(data_write[i][1], F_GETFL, 0);
+	if (r  O_NONBLOCK) {
+	error(_(data_write[%d][1] is non blocking), i);
+	}
 }
 if (pipe(data_write[STDERR_PIPE])  0) {
 	error(_(pipe: %s\n), strerror(errno));
 	/*NOTREACHED*/
 }
+r = fcntl(data_write[STDERR_PIPE][0], F_GETFL, 0);
+if (r  O_NONBLOCK) {
+	error(_(data_write[%d][0] is non blocking), STDERR_PIPE);
+}
+r = fcntl(data_write[STDERR_PIPE][1], F_GETFL, 0);
+if (r  O_NONBLOCK) {
+	error(_(data_write[%d][1] is non blocking), STDERR_PIPE);
+}
 
 switch(pid = fork()) {
 case -1:


Re: Backup issues with OpenBSD 4.5 machines

2009-08-24 Thread stan
On Mon, Aug 24, 2009 at 02:01:09PM -0400, Jean-Louis Martineau wrote:
 This bug can't be fixed until we understand it.

Agreed.
 
 Ask on a OpenBSD list how a write to a blocking pipe can return EAGAIN.
 Or the pipe semantics changed and they don't default to blocking.

I already posted the code snipet to the OpenBSD list, but no one has
commented on it yet. I have found, in the past, that having a simple
reproducable free standing code snippet that demonstrates the problem, can
often result in a better result on mailing lists.

I have not had time to try to understnad the code well enough to produce
such an example, yet.
 


-- 
One of the main causes of the fall of the roman empire was that, lacking
zero, they had no way to indicate successful termination of their C
programs.


Re: Backup issues with OpenBSD 4.5 machines

2009-08-24 Thread stan
On Mon, Aug 24, 2009 at 02:01:09PM -0400, Jean-Louis Martineau wrote:
 
 You can also try the attached, it check the pipe are opened in blocking 
 mode.
 

I will try to test that today.

Thanks.

-- 
One of the main causes of the fall of the roman empire was that, lacking
zero, they had no way to indicate successful termination of their C
programs.


Re: Backup issues with OpenBSD 4.5 machines

2009-08-24 Thread stan
On Mon, Aug 24, 2009 at 02:01:09PM -0400, Jean-Louis Martineau wrote:
 
 You can also try the attached, it check the pipe are opened in blocking 
 mode.
 
OK, I applied that to 2.61 (after figuriing out it would not apply to 2.5.2
:-)).

Of 4 DLE's on the test OpenBSD machine 3failed with PARTIAL, one tiny one
syas complete, and the only signifcant;y sized on seems to be stuck forever
at 0.04%, at least that's what amstatus has been reporting for 15 minutes.

I grep'ed the amanda debug files for pipe (that's what the new error
messages start with right?) and did not find any instances of this.

I can send this debug files, if it will help.


-- 
One of the main causes of the fall of the roman empire was that, lacking
zero, they had no way to indicate successful termination of their C
programs.


Re: Backup issues with OpenBSD 4.5 machines

2009-08-24 Thread stan
On Fri, Aug 21, 2009 at 09:57:36AM -0600, John Hein wrote:
 stan wrote at 10:56 -0400 on Aug 21, 2009:
   OK here is the latest on this saga :-)
   
   On one of the OpenBSD 4.5 machines I have built 2.5.0p1, and was able to
   back this machine up successfully (using classic UDP based authentication)
   
   On another of them, I built 2.5.2p1. The first attempt to back this machine
   up failed. I checked the log files, and found they were having issues
   because /etc/amdates was missing. I corrected that, and started a 2nd
   backup run. (Remember amcheck reports all is well with this machine). I 
   got the following from amstatus when I attempted to back up this machine.
   Also remember, one of the test I ran with a 2.6.1 client was to connect a
   test machine directly to the client, using a crossover cable to eliminate
   any firewall, or router type issues.
   
   I am attaching, what I think is, the amadnad debug file associated with 
 this
   failure.
   
   Can anyone suggest what I can do to further troubleshoot this?
   
   pb48:wd0f 1  dumper: [could not connect DATA stream:
   can't connect stream to pb48.meadwestvaco.com port 11996: Connection
   refused] (10:37:27)
   
.
.
.
   amandad: time 30.019: stream_accept: timeout after 30 seconds
   amandad: time 30.019: security_stream_seterr(0x86b67000, can't accept new 
 stream connection: No such file or directory)
   amandad: time 30.019: stream 0 accept failed: unknown protocol error
   amandad: time 30.019: security_stream_close(0x86b67000)
   amandad: time 60.027: stream_accept: timeout after 30 seconds
   amandad: time 60.027: security_stream_seterr(0x81212000, can't accept new 
 stream connection: No such file or directory)
   amandad: time 60.027: stream 1 accept failed: unknown protocol error
   amandad: time 60.027: security_stream_close(0x81212000)
   amandad: time 90.035: stream_accept: timeout after 30 seconds
   amandad: time 90.036: security_stream_seterr(0x84877000, can't accept new 
 stream connection: No such file or directory)
   amandad: time 90.036: stream 2 accept failed: unknown protocol error
   amandad: time 90.036: security_stream_close(0x84877000)
   amandad: time 90.036: security_close(handle=0x81bbf800, driver=0x298a9240 
 (BSD))
   amandad: time 120.044: pid 17702 finish time Fri Aug 21 10:39:27 2009
 
 For some reason the socket is not getting marked ready for read.
 select(2) is timing out waiting.  Firewall setup perhaps?
 
 This bit of code in 2.5.2p1's common-src/stream.c is where
 the failure is happening for you...
 
 int
 stream_accept(
 int server_socket,
 int timeout,
 size_t sendsize,
 size_t recvsize)
 {
 SELECT_ARG_TYPE readset;
 struct timeval tv;
 int nfound, connected_socket;
 int save_errno;
 int ntries = 0;
 in_port_t port;
 
 assert(server_socket = 0);
 
 do {
 ntries++;
 memset(tv, 0, SIZEOF(tv));
 tv.tv_sec = timeout;
 memset(readset, 0, SIZEOF(readset));
 FD_ZERO(readset);
 FD_SET(server_socket, readset);
 nfound = select(server_socket+1, readset, NULL, NULL, tv);
 if(nfound = 0 || !FD_ISSET(server_socket, readset)) {
 save_errno = errno;
 if(nfound  0) {
 dbprintf((%s: stream_accept: select() failed: %s\n,
   debug_prefix_time(NULL),
   strerror(save_errno)));
 } else if(nfound == 0) {
 dbprintf((%s: stream_accept: timeout after %d second%s\n,
   debug_prefix_time(NULL),
   timeout,
   (timeout == 1) ?  : s));
 errno = ENOENT; /* ??? */
 return -1;
The firts thing I notice when comparing this function in 2.5.0 vs 2.5.2 is
that 2.5.0 does:

tv.tv_usec = 0;

and 2.5.2 does not. Could thim make a difference? Both do 

tv.tv_sec = timeout;

-- 
One of the main causes of the fall of the roman empire was that, lacking
zero, they had no way to indicate successful termination of their C
programs.


Re: Backup issues with OpenBSD 4.5 machines

2009-08-24 Thread John Hein
stan wrote at 16:59 -0400 on Aug 24, 2009:
  The firts thing I notice when comparing this function in 2.5.0 vs 2.5.2 is
  that 2.5.0 does:
  
  tv.tv_usec = 0;
  
  and 2.5.2 does not. Could thim make a difference? Both do 
  
  tv.tv_sec = timeout;

In 2.5.2, the memset sets the entire struct to 0.
2.5.0 is slightly more efficient, but otherwise the results
wind up being the same.

Nothing to see there.


Re: Backup issues with OpenBSD 4.5 machines

2009-08-21 Thread stan
On Tue, Aug 18, 2009 at 08:06:46AM -0400, Jean-Louis Martineau wrote:
 stan wrote:
 Any thoughts as to why I have twice as many debug files as I expect?
   
 
 When a dump fail, amanda try it a second time.
 

OK here is the latest on this saga :-)

On one of the OpenBSD 4.5 machines I have built 2.5.0p1, and was able to
back this machine up successfully (using classic UDP based authentication)

On another of them, I built 2.5.2p1. The first attempt to back this machine
up failed. I checked the log files, and found they were having issues
because /etc/amdates was missing. I corrected that, and started a 2nd
backup run. (Remember amcheck reports all is well with this machine). I 
got the following from amstatus when I attempted to back up this machine.
Also remember, one of the test I ran with a 2.6.1 client was to connect a
test machine directly to the client, using a crossover cable to eliminate
any firewall, or router type issues.

I am attaching, what I think is, the amadnad debug file associated with this
failure.

Can anyone suggest what I can do to further troubleshoot this?

pb48:wd0f 1  dumper: [could not connect DATA stream:
can't connect stream to pb48.meadwestvaco.com port 11996: Connection
refused] (10:37:27)


-- 
One of the main causes of the fall of the roman empire was that, lacking
zero, they had no way to indicate successful termination of their C
programs.
amandad: debug 1 pid 17702 ruid 1002 euid 1002: start at Fri Aug 21 10:37:27 
2009
Reading conf file /opt/amanda/etc/amanda-client.conf.
amandad: time 0.000: security_getdriver(name=BSD) returns 0x298a9240
amandad: version 2.5.2p1
amandad: time 0.001: build: VERSION=Amanda-2.5.2p1
amandad: time 0.001:BUILT_DATE=Fri Aug 21 07:26:22 EDT 2009
amandad: time 0.001:BUILT_MACH=OpenBSD pb48 4.5 GENERIC#1749 i386 
CC=gcc
amandad: time 0.001:CONFIGURE_COMMAND='./configure' 
'--with-config=DailyDump' '--prefix=/opt/amanda' 
'--with-index-server=amanda.meadwestvaco.com' 
'--with-tape-server=amanda.meadwestvaco.com' '--with-tape-device=chg-multi' 
'--with-changer-device=chg-multi' '--with-user=amanda' '--with-group=operator' 
'--with-debugging' '--sbindir=/opt/amanda/sbin' 
'--libexecdir=/opt/amanda/libexec' '--libdir=/opt/amanda/lib' 
'--with-configdir=/opt/amanda/etc' 
'--with-gnutar-listdir=/opt/amanda/var/gnutar-lists' '--mandir=/opt/amanda/man' 
'--with-buffered-dump' '--without-rundump'
amandad: time 0.001: paths: bindir=/opt/amanda/bin sbindir=/opt/amanda/sbin
amandad: time 0.001:libexecdir=/opt/amanda/libexec 
mandir=/opt/amanda/man
amandad: time 0.001:AMANDA_TMPDIR=/tmp/amanda 
AMANDA_DBGDIR=/tmp/amanda
amandad: time 0.001:CONFIG_DIR=/opt/amanda/etc DEV_PREFIX=/dev/
amandad: time 0.001:RDEV_PREFIX=/dev/r DUMP=/sbin/dump
amandad: time 0.001:RESTORE=/sbin/restore VDUMP=UNDEF VRESTORE=UNDEF
amandad: time 0.001:XFSDUMP=UNDEF XFSRESTORE=UNDEF VXDUMP=UNDEF 
VXRESTORE=UNDEF
amandad: time 0.001:SAMBA_CLIENT=/usr/local/bin/smbclient
amandad: time 0.001:GNUTAR=/usr/local/bin/gtar 
COMPRESS_PATH=/usr/bin/gzip
amandad: time 0.001:UNCOMPRESS_PATH=/usr/bin/gzip 
LPRCMD=/usr/bin/lpr
amandad: time 0.001:MAILER=/usr/bin/Mail
amandad: time 0.001:listed_incr_dir=/opt/amanda/var/gnutar-lists
amandad: time 0.001: defs:  DEFAULT_SERVER=amanda.meadwestvaco.com
amandad: time 0.001:DEFAULT_CONFIG=DailyDump
amandad: time 0.001:DEFAULT_TAPE_SERVER=amanda.meadwestvaco.com
amandad: time 0.001:DEFAULT_TAPE_DEVICE=chg-multi HAVE_MMAP 
NEED_STRSTR
amandad: time 0.001:HAVE_SYSVSHM LOCKING=**NONE** DEBUG_CODE 
AMANDA_DEBUG_DAYS=4
amandad: time 0.001:BSD_SECURITY RSH_SECURITY USE_AMANDAHOSTS
amandad: time 0.001:CLIENT_LOGIN=amanda FORCE_USERID HAVE_GZIP
amandad: time 0.001:COMPRESS_SUFFIX=.gz COMPRESS_FAST_OPT=--fast
amandad: time 0.001:COMPRESS_BEST_OPT=--best UNCOMPRESS_OPT=-dc
amandad: time 0.001: dgram_recv(dgram=0x298ad244, timeout=0, 
fromaddr=0x298bd230)
amandad: time 0.001: (sockaddr_in *)0x298bd230 = { 2, 927, 10.209.129.22 }
amandad: time 0.001: security_handleinit(handle=0x81bbf800, driver=0x298a9240 
(BSD))
amandad: time 0.006: accept recv REQ pkt:

SERVICE sendbackup
OPTIONS features=9ffe7f;hostname=pb48;config=DailyDump;
DUMP wd0f  1 2009:8:20:2:48:55 OPTIONS |;auth=BSD;index;

amandad: time 0.006: creating new service: sendbackup
OPTIONS features=9ffe7f;hostname=pb48;config=DailyDump;
DUMP wd0f  1 2009:8:20:2:48:55 OPTIONS |;auth=BSD;index;

amandad: time 0.007: sending ACK pkt:


amandad: time 0.007: dgram_send_addr(addr=0x81bbf820, dgram=0x298ad244)
amandad: time 0.007: (sockaddr_in *)0x81bbf820 = { 2, 927, 10.209.129.22 }
amandad: time 0.007: dgram_send_addr: 0x298ad244-socket = 0
amandad: time 0.013: security_streaminit(stream=0x86b67000, driver=0x298a9240 
(BSD))
amandad: time 0.013: try_socksize: send buffer size is 65536

Re: Backup issues with OpenBSD 4.5 machines

2009-08-21 Thread stan
On Tue, Aug 18, 2009 at 08:06:46AM -0400, Jean-Louis Martineau wrote:
 stan wrote:
 Any thoughts as to why I have twice as many debug files as I expect?
   
 
One thing I forgot to put in the previous message is that 10.209.129.22 is
the Amanda Master machine's address.

-- 
One of the main causes of the fall of the roman empire was that, lacking
zero, they had no way to indicate successful termination of their C
programs.


Re: Backup issues with OpenBSD 4.5 machines

2009-08-21 Thread John Hein
stan wrote at 10:56 -0400 on Aug 21, 2009:
  OK here is the latest on this saga :-)
  
  On one of the OpenBSD 4.5 machines I have built 2.5.0p1, and was able to
  back this machine up successfully (using classic UDP based authentication)
  
  On another of them, I built 2.5.2p1. The first attempt to back this machine
  up failed. I checked the log files, and found they were having issues
  because /etc/amdates was missing. I corrected that, and started a 2nd
  backup run. (Remember amcheck reports all is well with this machine). I 
  got the following from amstatus when I attempted to back up this machine.
  Also remember, one of the test I ran with a 2.6.1 client was to connect a
  test machine directly to the client, using a crossover cable to eliminate
  any firewall, or router type issues.
  
  I am attaching, what I think is, the amadnad debug file associated with this
  failure.
  
  Can anyone suggest what I can do to further troubleshoot this?
  
  pb48:wd0f 1  dumper: [could not connect DATA stream:
  can't connect stream to pb48.meadwestvaco.com port 11996: Connection
  refused] (10:37:27)
  
   .
   .
   .
  amandad: time 30.019: stream_accept: timeout after 30 seconds
  amandad: time 30.019: security_stream_seterr(0x86b67000, can't accept new 
  stream connection: No such file or directory)
  amandad: time 30.019: stream 0 accept failed: unknown protocol error
  amandad: time 30.019: security_stream_close(0x86b67000)
  amandad: time 60.027: stream_accept: timeout after 30 seconds
  amandad: time 60.027: security_stream_seterr(0x81212000, can't accept new 
  stream connection: No such file or directory)
  amandad: time 60.027: stream 1 accept failed: unknown protocol error
  amandad: time 60.027: security_stream_close(0x81212000)
  amandad: time 90.035: stream_accept: timeout after 30 seconds
  amandad: time 90.036: security_stream_seterr(0x84877000, can't accept new 
  stream connection: No such file or directory)
  amandad: time 90.036: stream 2 accept failed: unknown protocol error
  amandad: time 90.036: security_stream_close(0x84877000)
  amandad: time 90.036: security_close(handle=0x81bbf800, driver=0x298a9240 
  (BSD))
  amandad: time 120.044: pid 17702 finish time Fri Aug 21 10:39:27 2009

For some reason the socket is not getting marked ready for read.
select(2) is timing out waiting.  Firewall setup perhaps?

This bit of code in 2.5.2p1's common-src/stream.c is where
the failure is happening for you...

int
stream_accept(
int server_socket,
int timeout,
size_t sendsize,
size_t recvsize)
{
SELECT_ARG_TYPE readset;
struct timeval tv;
int nfound, connected_socket;
int save_errno;
int ntries = 0;
in_port_t port;

assert(server_socket = 0);

do {
ntries++;
memset(tv, 0, SIZEOF(tv));
tv.tv_sec = timeout;
memset(readset, 0, SIZEOF(readset));
FD_ZERO(readset);
FD_SET(server_socket, readset);
nfound = select(server_socket+1, readset, NULL, NULL, tv);
if(nfound = 0 || !FD_ISSET(server_socket, readset)) {
save_errno = errno;
if(nfound  0) {
dbprintf((%s: stream_accept: select() failed: %s\n,
  debug_prefix_time(NULL),
  strerror(save_errno)));
} else if(nfound == 0) {
dbprintf((%s: stream_accept: timeout after %d second%s\n,
  debug_prefix_time(NULL),
  timeout,
  (timeout == 1) ?  : s));
errno = ENOENT; /* ??? */
return -1;


Re: Backup issues with OpenBSD 4.5 machines

2009-08-21 Thread stan
On Fri, Aug 21, 2009 at 09:57:36AM -0600, John Hein wrote:
 stan wrote at 10:56 -0400 on Aug 21, 2009:
   OK here is the latest on this saga :-)
   
   On one of the OpenBSD 4.5 machines I have built 2.5.0p1, and was able to
   back this machine up successfully (using classic UDP based authentication)
   
   On another of them, I built 2.5.2p1. The first attempt to back this machine
   up failed. I checked the log files, and found they were having issues
   because /etc/amdates was missing. I corrected that, and started a 2nd
   backup run. (Remember amcheck reports all is well with this machine). I 
   got the following from amstatus when I attempted to back up this machine.
   Also remember, one of the test I ran with a 2.6.1 client was to connect a
   test machine directly to the client, using a crossover cable to eliminate
   any firewall, or router type issues.
   
   I am attaching, what I think is, the amadnad debug file associated with 
 this
   failure.
   
   Can anyone suggest what I can do to further troubleshoot this?
   
   pb48:wd0f 1  dumper: [could not connect DATA stream:
   can't connect stream to pb48.meadwestvaco.com port 11996: Connection
   refused] (10:37:27)
   
.
.
.
   amandad: time 30.019: stream_accept: timeout after 30 seconds
   amandad: time 30.019: security_stream_seterr(0x86b67000, can't accept new 
 stream connection: No such file or directory)
   amandad: time 30.019: stream 0 accept failed: unknown protocol error
   amandad: time 30.019: security_stream_close(0x86b67000)
   amandad: time 60.027: stream_accept: timeout after 30 seconds
   amandad: time 60.027: security_stream_seterr(0x81212000, can't accept new 
 stream connection: No such file or directory)
   amandad: time 60.027: stream 1 accept failed: unknown protocol error
   amandad: time 60.027: security_stream_close(0x81212000)
   amandad: time 90.035: stream_accept: timeout after 30 seconds
   amandad: time 90.036: security_stream_seterr(0x84877000, can't accept new 
 stream connection: No such file or directory)
   amandad: time 90.036: stream 2 accept failed: unknown protocol error
   amandad: time 90.036: security_stream_close(0x84877000)
   amandad: time 90.036: security_close(handle=0x81bbf800, driver=0x298a9240 
 (BSD))
   amandad: time 120.044: pid 17702 finish time Fri Aug 21 10:39:27 2009
 
 For some reason the socket is not getting marked ready for read.
 select(2) is timing out waiting.  Firewall setup perhaps?
 
 This bit of code in 2.5.2p1's common-src/stream.c is where
 the failure is happening for you...
 

Interesting.

Is this different in 2.5.0, than in 2.5.2 ?

RE firewall: Always my first suspicion around here :-( But, I tride this
with a 2.6.1 on OpenBSD 4.5 yesterday, which I had connected to the Amanda
master machine via a crossover cable to eliminate thta, and the failure
there _appeared_ to be identical to the original failure. I have noot tried
this with a 2.5.2 client. I supose I should do that test..


-- 
One of the main causes of the fall of the roman empire was that, lacking
zero, they had no way to indicate successful termination of their C
programs.


Re: Backup issues with OpenBSD 4.5 machines

2009-08-21 Thread stan
On Fri, Aug 21, 2009 at 09:57:36AM -0600, John Hein wrote:
 stan wrote at 10:56 -0400 on Aug 21, 2009:
   OK here is the latest on this saga :-)
   
   On one of the OpenBSD 4.5 machines I have built 2.5.0p1, and was able to
   back this machine up successfully (using classic UDP based authentication)
   
   On another of them, I built 2.5.2p1. The first attempt to back this machine
   up failed. I checked the log files, and found they were having issues
   because /etc/amdates was missing. I corrected that, and started a 2nd
   backup run. (Remember amcheck reports all is well with this machine). I 
   got the following from amstatus when I attempted to back up this machine.
   Also remember, one of the test I ran with a 2.6.1 client was to connect a
   test machine directly to the client, using a crossover cable to eliminate
   any firewall, or router type issues.
   
   I am attaching, what I think is, the amadnad debug file associated with 
 this
   failure.
   
   Can anyone suggest what I can do to further troubleshoot this?
   
   pb48:wd0f 1  dumper: [could not connect DATA stream:
   can't connect stream to pb48.meadwestvaco.com port 11996: Connection
   refused] (10:37:27)
   
.
.
.
   amandad: time 30.019: stream_accept: timeout after 30 seconds
   amandad: time 30.019: security_stream_seterr(0x86b67000, can't accept new 
 stream connection: No such file or directory)
   amandad: time 30.019: stream 0 accept failed: unknown protocol error
   amandad: time 30.019: security_stream_close(0x86b67000)
   amandad: time 60.027: stream_accept: timeout after 30 seconds
   amandad: time 60.027: security_stream_seterr(0x81212000, can't accept new 
 stream connection: No such file or directory)
   amandad: time 60.027: stream 1 accept failed: unknown protocol error
   amandad: time 60.027: security_stream_close(0x81212000)
   amandad: time 90.035: stream_accept: timeout after 30 seconds
   amandad: time 90.036: security_stream_seterr(0x84877000, can't accept new 
 stream connection: No such file or directory)
   amandad: time 90.036: stream 2 accept failed: unknown protocol error
   amandad: time 90.036: security_stream_close(0x84877000)
   amandad: time 90.036: security_close(handle=0x81bbf800, driver=0x298a9240 
 (BSD))
   amandad: time 120.044: pid 17702 finish time Fri Aug 21 10:39:27 2009
 
 For some reason the socket is not getting marked ready for read.
 select(2) is timing out waiting.  Firewall setup perhaps?
 
 This bit of code in 2.5.2p1's common-src/stream.c is where
 the failure is happening for you...
 
OK, I reproduced the failure with only a crossover cable between the test
client and the Amanda Master:

192.168.1.2:wd0f 0  dumper: [could not connect DATA stream: can't connect
stream to 192.168.1.2 port 24376: Connection refused] (13:48:23)

Note the 192.168.1.2 address :-)

This is with a 2.5.2p1 clinet on OpenBSD 4.5 2.5.0p1 works on this same
machine/OS/netwrok configuration.

So, it appears to me that this must be because of something that changed
between 2.5.0p1 and 2.5.2p1. And we have a pretty good idea where in the
code this is failing. So can anyone enlighten me as to what chaged in this
area between those 2 versions?


-- 
One of the main causes of the fall of the roman empire was that, lacking
zero, they had no way to indicate successful termination of their C
programs.


Re: Backup issues with OpenBSD 4.5 machines

2009-08-21 Thread John Hein
stan wrote at 13:56 -0400 on Aug 21, 2009:
  OK, I reproduced the failure with only a crossover cable between the test
  client and the Amanda Master:

Just because you're using a crossover cable doesn't rule out firewall
or other such socket level interference.  I'm not saying that's your
problem, but using a crossover cable doesn't rule it out.


  192.168.1.2:wd0f 0  dumper: [could not connect DATA stream: can't connect
  stream to 192.168.1.2 port 24376: Connection refused] (13:48:23)
  
  Note the 192.168.1.2 address :-)
  
  This is with a 2.5.2p1 clinet on OpenBSD 4.5 2.5.0p1 works on this same
  machine/OS/netwrok configuration.
  
  So, it appears to me that this must be because of something that changed
  between 2.5.0p1 and 2.5.2p1. And we have a pretty good idea where in the
  code this is failing. So can anyone enlighten me as to what chaged in this
  area between those 2 versions?

I haven't looked to see what changed between 2.5.0 and 2.5.2.  It's
all pretty basic socket stuff.  I wouldn't be surprised if that is
when the additional auth mechanisms (bsdudp, bsdtcp) were added.
However, if no one chimes in, it's not that hard to look yourself.  If
you can narrow it down a bit to where there seems to be a problem in
the code, the amanda-hackers@ list might be able to help more.


Re: Backup issues with OpenBSD 4.5 machines

2009-08-21 Thread stan
On Fri, Aug 21, 2009 at 01:23:29PM -0600, John Hein wrote:
 stan wrote at 13:56 -0400 on Aug 21, 2009:
   OK, I reproduced the failure with only a crossover cable between the test
   client and the Amanda Master:
 
 Just because you're using a crossover cable doesn't rule out firewall
 or other such socket level interference.  I'm not saying that's your
 problem, but using a crossover cable doesn't rule it out.
 
Fair enough, but I think we can rule that out in this case. First, I
control the 2 computersin question. The Master backus up 55 machines a day
taht consist of a diverse nix of OS'es, and Amanda versions. It's a Linux
machine, and I am not running a firwall on it. The client is an OpenBSD
machine. I use these for my firewalls, so I am familiar with configuring
firewalls on them, Doing so requires (as a minimum) running pf. I am not
running pf on the test machine at all.

I am not certain whether the sendbackup executable, or amandad opens the
port on the client side. My guess is that it is not suceding in doing this.
Thus when the Master tries to connect to it, there is no port for it to
connect to.

-- 
One of the main causes of the fall of the roman empire was that, lacking
zero, they had no way to indicate successful termination of their C
programs.


Re: Backup issues with OpenBSD 4.5 machines

2009-08-21 Thread Michael Burk
Stan's not alone on this one. I have two OpenBSD 4.5 machines also on Sun
SPARC hardware. I had this same trouble a couple months ago with 2.6.1, but
didn't have time to look deeper. After seeing this discussion, I built
amanda-2.6.2alpha-20090812 (can't get 0820 to compile). I'm getting exactly
the same errors. I tried dump with and without compression.

My Amanda server is on Solaris 10, and I have a mix of Solaris, Linux, and
OpenBSD systems. Except for the above-mentioned build, everything is at
2.6.1p1. There are no firewalls on the Amanda server or the OpenBSD clients.

I know me too responses aren't usually helpful, but maybe this will rule
out the firewall question, at least.


On Fri, Aug 21, 2009 at 1:37 PM, stan st...@panix.com wrote:

 On Fri, Aug 21, 2009 at 01:23:29PM -0600, John Hein wrote:
  stan wrote at 13:56 -0400 on Aug 21, 2009:
OK, I reproduced the failure with only a crossover cable between the
 test
client and the Amanda Master:
 
  Just because you're using a crossover cable doesn't rule out firewall
  or other such socket level interference.  I'm not saying that's your
  problem, but using a crossover cable doesn't rule it out.
 
 Fair enough, but I think we can rule that out in this case. First, I
 control the 2 computersin question. The Master backus up 55 machines a day
 taht consist of a diverse nix of OS'es, and Amanda versions. It's a Linux
 machine, and I am not running a firwall on it. The client is an OpenBSD
 machine. I use these for my firewalls, so I am familiar with configuring
 firewalls on them, Doing so requires (as a minimum) running pf. I am not
 running pf on the test machine at all.

 I am not certain whether the sendbackup executable, or amandad opens the
 port on the client side. My guess is that it is not suceding in doing this.
 Thus when the Master tries to connect to it, there is no port for it to
 connect to.

 --
 One of the main causes of the fall of the roman empire was that, lacking
 zero, they had no way to indicate successful termination of their C
 programs.



Re: Backup issues with OpenBSD 4.5 machines

2009-08-20 Thread stan
On Tue, Aug 18, 2009 at 10:39:45AM -0400, Jean-Louis Martineau wrote:
 stan wrote:
  I know I have asked this before, but I can't recall getting a definative
  answer. Is port usage different between 2.5.x and 2.6 clients, when
  talkimng to a 2.6 server?
   
 
 No, the same port are used if configured with the same port range.
 Do yousee something in the firewall log?

Sorry to take so long to get back to everyone on this. I had one of my
contractors  decide he did not want to work, so I have been trying to do
his job, and mine :-(

In any case, today I took a crossover cable, and set up a direct 192.169
connection between my Amanda master machine and one of the OpenBSD machines
in the lab.

h behavior is exactly the same. That is, check passes, estimate gets done,
and the dump fails in mid dump. It is interesting that the files on the
dumpdisk are of varying sizes, so it's not like the dumps are not getting
started.

Now that we have eliminated the firewall issues. what can I do next to
further diagnose this failure?

-- 
One of the main causes of the fall of the roman empire was that, lacking
zero, they had no way to indicate successful termination of their C
programs.


Re: Backup issues with OpenBSD 4.5 machines

2009-08-20 Thread stan
On Tue, Aug 18, 2009 at 10:39:45AM -0400, Jean-Louis Martineau wrote:
 stan wrote:
  I know I have asked this before, but I can't recall getting a definative
  answer. Is port usage different between 2.5.x and 2.6 clients, when
  talkimng to a 2.6 server?
   
 
 No, the same port are used if configured with the same port range.
 Do yousee something in the firewall log?

Well, this is getting interesting.

I have a working client. It's 2.5.0p1. And it's using BSD auth, rather than
bsdtcp, and at the moment I am not using compression. But it works on
OpenBSD 4.5.

This was the version that was working on one of the OpenBSD 3,9 machines. I
recompiled it for I386, and copied over the inetd.conf line, and it worked
first time.

I made an attempt to get 2.5.2p1 working, but things were not going well
with that, and I decided to drop all the way back to a version that i had
working on OpenBSD.

I am not certain if 2.5.2p1 will work on OpenBSD 4,5 yet or not, as I was
having problems getting it set up configuration wise, so I cannot assume
that it has the same issue as 2.6.1 yet.

So, this raises the question, what changed in sendbackup between 2.5.0p1,
and 2.6.1? 

And given that this narrows the issue down to Amanda, and not the OS or
firewalls etc, how can i further troubleshoot the prolem?

-- 
One of the main causes of the fall of the roman empire was that, lacking
zero, they had no way to indicate successful termination of their C
programs.


Re: Backup issues with OpenBSD 4.5 machines

2009-08-18 Thread Jean-Louis Martineau

man amanda.conf
  compress [client|server]  string
  Default: client fast.

Set it to none if you don't want compression.
ig. compress none

Jean-Louis

stan wrote:

On Fri, Aug 14, 2009 at 03:26:10PM -0400, Jean-Louis Martineau wrote:
  

Check system log
Post complete amandad.*.debug and sendbackup.*.debug.

You can try to disable client compression.



OK, I am very puzzled this morning. Some of the dum,ps failled on the
OpenBSD machines last night. I had thought that I had set up dumptypes for
them without compression. Here is what I did:

define dumptype stan-test {
global
comment Non-root partitions on reasonably fast machines
maxdumps 4
priority medium
}

And hee is the global def:

define dumptype global {
comment Global definitions
# This is quite useful for setting global parameters, so you don't have
# to type them everywhere.  All dumptype definitions in this sample file
# do include these definitions, either directly or indirectly.
# There's nothing special about the name `global'; if you create any
# dumptype that does not contain the word `global' or the name of any
# other dumptype that contains it, these definitions won't apply.
# Note that these definitions may be overridden in other
# dumptypes, if the redefinitions appear *after* the `global'
# dumptype name.
# You may want to use this for globally enabling or disabling
	# indexing, recording, etc.  Some examples: 
	index yes 
	record yes
	# split_diskbuffer /raid/amanda 
	# fallback_splitsize 64m

}

And here are the DLE's for the machine in question:

pblab wd0a stan-test 1 eth0
pblab wd0f stan-test 1 eth0
pblab wd0d stan-test 1 eth0
pblab wd0e stan-test 1 eth0

Should be no compression, right?

But look at what the sendnackup debug files have to say:

-rw-r-  1 amanda  wheel  2865 Aug 17 22:52 sendbackup.20090817224841.debug
-rw-r-  1 amanda  wheel  2959 Aug 17 22:55 sendbackup.20090817225226.debug
-rw-r-  1 amanda  wheel  2851 Aug 17 22:56 sendbackup.20090817225558.debug
-rw-r-  1 amanda  wheel  3298 Aug 17 22:56 sendbackup.20090817225623.debug
-rw-r-  1 amanda  wheel  3195 Aug 17 22:56 sendbackup.20090817225638.debug
-rw-r-  1 amanda  wheel  3192 Aug 17 22:56 sendbackup.20090817225655.debug
-rw-r-  1 amanda  wheel  3190 Aug 17 22:57 sendbackup.20090817225728.debug
-rw-r-  1 amanda  wheel  3362 Aug 17 22:58 sendbackup.20090817225803.debug

Let's start off with why are there twice as many as I expect, based on the
DLE's?

The look at this:

$ grep -l gzip *backup*
sendbackup.20090817224841.debug
sendbackup.20090817225226.debug
sendbackup.20090817225558.debug
sendbackup.20090817225623.debug
sendbackup.20090817225638.debug
sendbackup.20090817225655.debug
sendbackup.20090817225728.debug
sendbackup.20090817225803.debug

I will atach a tarball with these files

I am very puzzled, at the moment.

Thanks for any clarification.
$ 




  




Re: Backup issues with OpenBSD 4.5 machines

2009-08-18 Thread stan
On Tue, Aug 18, 2009 at 07:34:11AM -0400, Jean-Louis Martineau wrote:
 man amanda.conf
   compress [client|server]  string
   Default: client fast.
 
 Set it to none if you don't want compression.
 ig. compress none

Thanks, I was just going to write an email noting that I had found that,
and done as you sugestd. I am starting atest abckup, as we speak. I should
have results in an hour or 2.

Any thoughts as to why I have twice as many debug files as I expect?
-- 
One of the main causes of the fall of the roman empire was that, lacking
zero, they had no way to indicate successful termination of their C
programs.


Re: Backup issues with OpenBSD 4.5 machines

2009-08-18 Thread Jean-Louis Martineau

stan wrote:

Any thoughts as to why I have twice as many debug files as I expect?
  


When a dump fail, amanda try it a second time.



Re: Backup issues with OpenBSD 4.5 machines

2009-08-18 Thread stan
On Tue, Aug 18, 2009 at 08:06:46AM -0400, Jean-Louis Martineau wrote:
 stan wrote:
 Any thoughts as to why I have twice as many debug files as I expect?
   
 
 When a dump fail, amanda try it a second time.
 

Got it.

Now, hee are what I think are the pertinant lines from a dump this morning
W/O compression:

250596583.437206: sendbackup:  91:  normal(|):   DUMP: Volume 1 started at: Tue
 Aug 18 07:56:23 2009
1250596583.481419: sendbackup: critical (fatal): index tee cannot write [Resourc
 e temporarily unavailable]
1250596583.484261: sendbackup: 116: strange(?): sendbackup: index tee cannot 
wri te [Resource temporarily unavailable]
1250596583.485035: sendbackup:  91:  normal(|):   DUMP: Broken pipe
1250596583.485826: sendbackup:  91:  normal(|):   DUMP: The ENTIRE dump is 
aborted.
1250596583.500258: sendbackup: critical (fatal): error [dump (10626) /sbin/dump 
returned 3]

That's from a sendbackup debug file.

Do you need me to send the entire debug files?

What can I try to help further narow down the problem?


-- 
One of the main causes of the fall of the roman empire was that, lacking
zero, they had no way to indicate successful termination of their C
programs.


Re: Backup issues with OpenBSD 4.5 machines

2009-08-18 Thread stan
On Mon, Aug 17, 2009 at 11:30:51AM -0400, Jean-Louis Martineau wrote:
 And what is the error?
 
 It's probably not a gzip: ...
 
One more data point that I need to clarify. Thes 2 OpenBSD machines I am
fighting with did have older versions of Amanda on them, before we
upgradedd. They had 2.5.0p1, and 2.5.2p1 on them respectivly. I live in a
world where people with little understanding of our needs control the
netwrk, and there are (IMHO) way to many firwall rules. Is it possible that
the port utilizaton has chnaged between these versions and 2.6.1?

I'd look at this with wiresharkm but at the moment, compiling that on the
newer version of OpenBSD is itself an issue :-(

Thanks for the ehlp on this.

-- 
One of the main causes of the fall of the roman empire was that, lacking
zero, they had no way to indicate successful termination of their C
programs.


Re: Backup issues with OpenBSD 4.5 machines

2009-08-18 Thread Jean-Louis Martineau

From the write man page:

[EAGAIN]   The file was marked for non-blocking I/O, and no data could
   be written immediately.


But sendbackup or gzip write the index to a blocking pipe.
Maybe it's the firewall that returns that error.

You can try to switch to the 'bsdtcp' auth, it is a lot more firewall 
friendly.


Jean-Louis

stan wrote:

On Tue, Aug 18, 2009 at 08:06:46AM -0400, Jean-Louis Martineau wrote:
  

stan wrote:


Any thoughts as to why I have twice as many debug files as I expect?
 
  

When a dump fail, amanda try it a second time.




Got it.

Now, hee are what I think are the pertinant lines from a dump this morning
W/O compression:

250596583.437206: sendbackup:  91:  normal(|):   DUMP: Volume 1 started at: Tue
 Aug 18 07:56:23 2009
1250596583.481419: sendbackup: critical (fatal): index tee cannot write [Resourc
 e temporarily unavailable]
1250596583.484261: sendbackup: 116: strange(?): sendbackup: index tee cannot 
wri te [Resource temporarily unavailable]
1250596583.485035: sendbackup:  91:  normal(|):   DUMP: Broken pipe
1250596583.485826: sendbackup:  91:  normal(|):   DUMP: The ENTIRE dump is 
aborted.
1250596583.500258: sendbackup: critical (fatal): error [dump (10626) /sbin/dump 
returned 3]

That's from a sendbackup debug file.

Do you need me to send the entire debug files?

What can I try to help further narow down the problem?


  




Re: Backup issues with OpenBSD 4.5 machines

2009-08-18 Thread stan
On Tue, Aug 18, 2009 at 09:32:21AM -0400, Jean-Louis Martineau wrote:
 From the write man page:
 
 [EAGAIN]   The file was marked for non-blocking I/O, and no data 
 could
be written immediately.
 
 
 But sendbackup or gzip write the index to a blocking pipe.
 Maybe it's the firewall that returns that error.
 
 You can try to switch to the 'bsdtcp' auth, it is a lot more firewall 
 friendly.
 
Same result:

1250605398.423331: sendbackup:  91:  normal(|):   DUMP: Volume 1 started at: Tue
 Aug 18 10:23:18 2009
1250605398.435367: sendbackup: critical (fatal): index tee cannot write 
[Resourc e temporarily unavailable]
1250605398.437948: sendbackup: 116: strange(?): sendbackup: index tee cannot 
write [Resource temporarily unavailable]
1250605398.439291: sendbackup:  91:  normal(|):   DUMP: Broken pipe
1250605398.440143: sendbackup:  91:  normal(|):   DUMP: The ENTIRE dump is 
aborted.
1250605398.440752: sendbackup: critical (fatal): error [dump (7497) /sbin/dump r
 eturned 3]

 I know I have asked this before, but I can't recall getting a definative
 answer. Is port usage different between 2.5.x and 2.6 clients, when
 talkimng to a 2.6 server?


-- 
One of the main causes of the fall of the roman empire was that, lacking
zero, they had no way to indicate successful termination of their C
programs.


Re: Backup issues with OpenBSD 4.5 machines

2009-08-18 Thread stan
On Tue, Aug 18, 2009 at 10:39:45AM -0400, Jean-Louis Martineau wrote:
 stan wrote:
  I know I have asked this before, but I can't recall getting a definative
  answer. Is port usage different between 2.5.x and 2.6 clients, when
  talkimng to a 2.6 server?
   
 
 No, the same port are used if configured with the same port range.
 Do yousee something in the firewall log?

Geting access to that is um, chalenging. I may be better off setting up a
network i have control of in the lab, and see if I can reproduce the
problem there. 

After all, we are not _certain- it's a firewall issue ata all.

-- 
One of the main causes of the fall of the roman empire was that, lacking
zero, they had no way to indicate successful termination of their C
programs.


Re: Backup issues with OpenBSD 4.5 machines

2009-08-18 Thread Jean-Louis Martineau

stan wrote:

 I know I have asked this before, but I can't recall getting a definative
 answer. Is port usage different between 2.5.x and 2.6 clients, when
 talkimng to a 2.6 server?
  


No, the same port are used if configured with the same port range.
Do yousee something in the firewall log?

Jean-Louis



Re: Backup issues with OpenBSD 4.5 machines

2009-08-18 Thread Dewey Hylton
On Mon, Aug 17, 2009 at 4:32 PM, stanst...@panix.com wrote:
 On Tue, Aug 18, 2009 at 09:32:21AM -0400, Jean-Louis Martineau wrote:
 From the write man page:

     [EAGAIN]       The file was marked for non-blocking I/O, and no data
     could
                    be written immediately.


 But sendbackup or gzip write the index to a blocking pipe.
 Maybe it's the firewall that returns that error.

 You can try to switch to the 'bsdtcp' auth, it is a lot more firewall
 friendly.

 Same result:

 1250605398.423331: sendbackup:  91:  normal(|):   DUMP: Volume 1 started at: 
 Tue
  Aug 18 10:23:18 2009
 1250605398.435367: sendbackup: critical (fatal): index tee cannot write 
 [Resourc e temporarily unavailable]
 1250605398.437948: sendbackup: 116: strange(?): sendbackup: index tee cannot 
 write [Resource temporarily unavailable]
 1250605398.439291: sendbackup:  91:  normal(|):   DUMP: Broken pipe
 1250605398.440143: sendbackup:  91:  normal(|):   DUMP: The ENTIRE dump is 
 aborted.
 1250605398.440752: sendbackup: critical (fatal): error [dump (7497) 
 /sbin/dump r
  eturned 3]

  I know I have asked this before, but I can't recall getting a definative
  answer. Is port usage different between 2.5.x and 2.6 clients, when
  talkimng to a 2.6 server?

i have no idea what else your system is doing. is it possible you're
simply running out of file descriptors?
pstat -T | grep open


Re: Backup issues with OpenBSD 4.5 machines

2009-08-18 Thread stan
On Tue, Aug 18, 2009 at 01:52:56PM -0400, Dewey Hylton wrote:
 On Mon, Aug 17, 2009 at 4:32 PM, stanst...@panix.com wrote:
  On Tue, Aug 18, 2009 at 09:32:21AM -0400, Jean-Louis Martineau wrote:
  From the write man page:
 
  ? ? [EAGAIN] ? ? ? The file was marked for non-blocking I/O, and no data
  ? ? could
  ? ? ? ? ? ? ? ? ? ?be written immediately.
 
 
  But sendbackup or gzip write the index to a blocking pipe.
  Maybe it's the firewall that returns that error.
 
  You can try to switch to the 'bsdtcp' auth, it is a lot more firewall
  friendly.
 
  Same result:
 
  1250605398.423331: sendbackup: ?91: ?normal(|): ? DUMP: Volume 1 started 
  at: Tue
  ?Aug 18 10:23:18 2009
  1250605398.435367: sendbackup: critical (fatal): index tee cannot write 
  [Resourc e temporarily unavailable]
  1250605398.437948: sendbackup: 116: strange(?): sendbackup: index tee 
  cannot write [Resource temporarily unavailable]
  1250605398.439291: sendbackup: ?91: ?normal(|): ? DUMP: Broken pipe
  1250605398.440143: sendbackup: ?91: ?normal(|): ? DUMP: The ENTIRE dump is 
  aborted.
  1250605398.440752: sendbackup: critical (fatal): error [dump (7497) 
  /sbin/dump r
  ?eturned 3]
 
  ?I know I have asked this before, but I can't recall getting a definative
  ?answer. Is port usage different between 2.5.x and 2.6 clients, when
  ?talkimng to a 2.6 server?
 
 i have no idea what else your system is doing. is it possible you're
 simply running out of file descriptors?
 pstat -T | grep open

A useful thought, but it does not appear this is the problem

$ pstat -T | grep open
169/7030 open files
$

-- 
One of the main causes of the fall of the roman empire was that, lacking
zero, they had no way to indicate successful termination of their C
programs.


Re: Backup issues with OpenBSD 4.5 machines

2009-08-17 Thread stan
On Fri, Aug 14, 2009 at 03:26:10PM -0400, Jean-Louis Martineau wrote:
 Check system log
 Post complete amandad.*.debug and sendbackup.*.debug.
 
 You can try to disable client compression.
 
It still fails with client side compression truned off.

-- 
One of the main causes of the fall of the roman empire was that, lacking
zero, they had no way to indicate successful termination of their C
programs.


Re: Backup issues with OpenBSD 4.5 machines

2009-08-17 Thread Jean-Louis Martineau

And what is the error?

It's probably not a gzip: ...

Jean-Louis

stan wrote:

On Fri, Aug 14, 2009 at 03:26:10PM -0400, Jean-Louis Martineau wrote:
  

Check system log
Post complete amandad.*.debug and sendbackup.*.debug.

You can try to disable client compression.



It still fails with client side compression truned off.

  




Re: Backup issues with OpenBSD 4.5 machines

2009-08-17 Thread stan
On Mon, Aug 17, 2009 at 11:30:51AM -0400, Jean-Louis Martineau wrote:
 And what is the error?
 
 It's probably not a gzip: ...

Interesting. I set up a test doing just one of these machine (4 DLE's), and
it worked in non-compressed. I will leave a couple of the OpenBSD machines
set fro non-compressed in tonights ru, and see what ahppens.

Thanks.
-- 
One of the main causes of the fall of the roman empire was that, lacking
zero, they had no way to indicate successful termination of their C
programs.


Re: Backup issues with OpenBSD 4.5 machines

2009-08-14 Thread Jean-Louis Martineau

gzip: stdout: Resource temporarily unavailable

I have no idea what's that error mean.

Can you try the latest 2.6.1p1 snapshot from 
http://www.zmanda.com/community-builds.php


Jean-Louis

stan wrote:

WE deployed 3 OpenBSD machines yesterday to replace older OpenBSD machines
that had been backing up happily. I honestly cannot remember which version
of Amanda was on these machines. The new ones have 2.6.1. The Amanda master
machine is 2.6.1, and has been happily backing up 55 machines every night
prior to this.

At least one filessytem failed on each of the upgraded machines on last
nights run. I manually started a run again this morning, and was able to
reproduce the failure. I am attaching 2 sendbackup debug files from one of
the machines. I am not quite certain why there are 2 files. I deleted all
the Amanda debug files, prior to doing the manual run, so I so I expected
just to have one debug file for a given filessytem, but found 2.

What can I do to further debug this?


  




Re: Backup issues with OpenBSD 4.5 machines

2009-08-14 Thread stan
On Fri, Aug 14, 2009 at 09:50:22AM -0400, Jean-Louis Martineau wrote:
 gzip: stdout: Resource temporarily unavailable
 
 I have no idea what's that error mean.
 
 Can you try the latest 2.6.1p1 snapshot from 
 http://www.zmanda.com/community-builds.php
 
 Jean-Louis
 
OK, I compiled and installed that version. Re-ran a backup, and the results
were similar. A total of 2 of the 12 filesystems on 3 machines suceded, the
rest were PARTIAL.

Debuging sugestions welcome.

-- 
One of the main causes of the fall of the roman empire was that, lacking
zero, they had no way to indicate successful termination of their C
programs.


Re: Backup issues with OpenBSD 4.5 machines

2009-08-14 Thread Jean-Louis Martineau

Check system log
Post complete amandad.*.debug and sendbackup.*.debug.

You can try to disable client compression.

Jean-Louis

stan wrote:

On Fri, Aug 14, 2009 at 09:50:22AM -0400, Jean-Louis Martineau wrote:
  

gzip: stdout: Resource temporarily unavailable

I have no idea what's that error mean.

Can you try the latest 2.6.1p1 snapshot from 
http://www.zmanda.com/community-builds.php


Jean-Louis



OK, I compiled and installed that version. Re-ran a backup, and the results
were similar. A total of 2 of the 12 filesystems on 3 machines suceded, the
rest were PARTIAL.

Debuging sugestions welcome.