On Thursday, March 17, 2005 12:33:12 PM +0100 Harald Barth <[EMAIL PROTECTED]> wrote:


I suppose it's possible you could construct something that does this
using the convert-RO-to-RW functionality that is in very recent
servers.  But I'd have to think about it for a lot longer to convince
myself that this would actually be stable.

Yes. Something like that would be nice.

Those aren't error messages; they're log messages.  They are normal.
The -overwrite switch doesn't mean the volume already exists; it tells
vos what to do _if_ the volume already exists.  The way it tells that
is by trying to create the volume and looking at the error code.
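That create-then-inspect flow can be sketched like this (the function names and the error-code value are made up for illustration; this is not the actual vos source):

```shell
# Illustrative sketch of vos's -overwrite decision, NOT the real code:
# try to create the volume; if the volserver reports it already exists,
# act according to the -overwrite setting.
VVOLEXISTS=101  # hypothetical stand-in for the "volume exists" error code

create_volume() {
    # Stub: pretend the volume already exists on the server.
    return "$VVOLEXISTS"
}

restore_volume() {
    overwrite="$1"   # abort | full | incremental
    rc=0
    create_volume || rc=$?
    if [ "$rc" -eq "$VVOLEXISTS" ]; then
        case "$overwrite" in
            abort)       echo "volume exists; aborting restore"; return 1 ;;
            full)        echo "volume exists; doing full overwrite" ;;
            incremental) echo "volume exists; doing incremental restore" ;;
        esac
    fi
    return 0
}

restore_volume full
```

The point is just that the "volume exists" condition is discovered by attempting the create, which is why those log lines show up even on a normal run.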

The problem is that they look dangerous to the non-suspecting sysadmin. "Abort, abort - all brace for impact" ;-)

The non-suspecting sysadmin needs to get out of the habit of assuming that any output produced by any program must be a horrible fatal error. Solve that problem, and then we can talk about whether the messages are meaningful enough.



> Tue Mar 15 11:05:19 2005 1 Volser: Delete: volume 537057012 deleted
> Tue Mar 15 11:05:19 2005 1 Volser: CreateVolume: volume 537057012 (dah.test.flopp) created
> Tue Mar 15 11:05:19 2005 1 Volser: RestoreVolume: Error reading header file for dump; aborted
>
> And this is the log from the broken -overwrite full which results in
> the vl-volser inconsistency.

Yeah, that makes sense.  The error is referring to the volserver's
inability to read the dump header over the wire, which is not
surprising since in your example, vos will never send one.

And here, aborted actually means it fell over.

No, it means the volserver aborted the RPC, just like the first case. Before, the operation it was aborting was CreateVolume; in this example, it's RestoreVolume. Really, people who want to know the result of a command they ran with vos should look at the output of vos, not the contents of the volserver log.
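For example (the vos invocation here is a stub standing in for a real restore; the point is to test vos's own exit status rather than grep the volserver log for "aborted"):

```shell
# Illustrative: judge success by vos's exit status and its own output,
# not by "aborted" lines in VolserLog.  The stub below stands in for a
# real 'vos restore' invocation.
vos_restore() {
    # Replace with something like:
    #   vos restore -server <fileserver> -partition <part> \
    #       -name <volume> -file <dumpfile> -overwrite full
    return 1   # pretend the restore failed, as in the example above
}

if vos_restore; then
    echo "restore succeeded"
else
    echo "restore failed -- check vos's own output, not VolserLog"
fi
```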



> cysteine# tail -1 BosLog
> Mon Mar 14 18:15:08 2005: fs:vol exited on signal 6

What version and platform?

We are running OpenAFS 1.3.77, built 2005-01-18, on i386 RH9.

Seems to be the threaded beast:

Well, then, that kills my theory that it's the 25-day bug, which only affects LWP processes, and apparently only on fairly new Linux.




cysteine# ldd  /usr/openafs/libexec/openafs/volserver
        libpthread.so.0 => /lib/i686/libpthread.so.0 (0x4001e000)
        libresolv.so.2 => /lib/libresolv.so.2 (0x4006f000)
        libc.so.6 => /lib/i686/libc.so.6 (0x40081000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
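A quick way to make that check repeatable (this heuristic assumes a glibc of that era, where libpthread is a separate shared object that only the threaded build links against; the volserver path is taken from the transcript above):

```shell
# Heuristic: a threaded (pthreaded) server binary is linked against
# libpthread; the LWP build is not.
is_threaded() {
    ldd "$1" 2>/dev/null | grep -q 'libpthread'
}

if is_threaded /usr/openafs/libexec/openafs/volserver; then
    echo "threaded volserver"
else
    echo "LWP volserver (or ldd failed)"
fi
```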

What is the current status about OpenAFS and Linux threads? I know the
thread situation on Linux sucks in general, just tell me your best
practice, ok? :-)

Ok. My best practice is to run fileservers on SPARC Solaris, thereby avoiding the Linux threads mess, the horrible kludge that is the namei fileserver, and all sorts of other problems that the rest of you have seen.
:-)


Really, I can't tell you much about OpenAFS and Linux threads. Maybe Derrick can field that one.


-- Jeff

_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel
