We have been regression testing our production environment on Ubuntu
Hardy, and I believe this kernel bug is the root cause of an intermittent
NFS failure we are seeing on Hardy (but not on previous versions of
Ubuntu).  It took a long time to track the issue down, as our use case is
significantly more complex than the one described here, but when you
boil it down, I think it's the same thing.

We are testing the 2.6.24-24-generic kernel on Ubuntu Hardy.  All
patches are applied as of 06/04/09.  I am unable to reproduce the
problem with the test case provided by Ari Mujunen, but I have been
able to reproduce it reliably a different way:

System A: NFS Server
System B: NFS Client

System A:
mkdir tmpdir
touch tmpdir/tmpfile
tar -cvf x.tar tmpdir

System B:
stat tmpdir/tmpfile

System A:
tar -xvf x.tar

System B:
stat tmpdir/tmpfile

Something about tar and having a subdirectory in the mix seems to
trigger the issue for me in a way that Ari's steps don't.
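
For anyone who wants to script the steps above, here is a rough sketch.
The function names are my own invention for illustration, and it
assumes each function is run from the exported directory on System A
or from the matching mounted directory on System B.  I dropped tar's -v
flag only to keep the output quiet.  On an affected client, the second
client_check is the call expected to fail.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the manual reproduction steps.
# Run each one on the host named in its comment.

# System A (NFS server): create the tree and archive it.
server_prepare() {
    mkdir -p tmpdir &&
    touch tmpdir/tmpfile &&
    tar -cf x.tar tmpdir
}

# System B (NFS client): returns non-zero if the file cannot be stat'ed.
client_check() {
    stat tmpdir/tmpfile >/dev/null
}

# System A (NFS server): extract the archive over the existing tree.
server_reextract() {
    tar -xf x.tar
}
```

The order is server_prepare on A, client_check on B, server_reextract
on A, then client_check on B again.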

Here is the server export:
xxx         client(rw,async,no_root_squash,no_subtree_check)
and the client mount:
server:/xxx  /x       nfs     soft,intr,rsize=8192,wsize=8192,nosuid,noac,tcp,timeo=20

This is a very serious issue for us, and unlike the .Xauthority use case,
we can't just 'work around it'.  Our production environment implements
an Active/Active redundant NFS store at the application level.  This
issue will (rightly so) make our NFS layer believe the NFS server has
failed.  Adding in arbitrary directory listing calls isn't a practical
option, and doesn't reliably work either.  The ls trick described above
seems to MOSTLY work for getting the NFS client back alive, but not
always.  We have directory structures that may be nested a few levels
deep, and this seems to cause further issues.  I've seen our application
get the kernel NFS client into a state where whole directories of files
are returning 'Stale NFS file handle' errors.  One time I used the ls
trick to try to get the client working again, and it stopped the 'Stale
NFS file handle' errors, but ls returned completely corrupted garbage
(it listed the files but with a lot of ??????'s for the attributes).
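
To make the ls trick concrete, this is roughly what it amounts to when
scripted.  It is only a sketch: stat_with_relist is a made-up name, it
retries after any stat failure rather than only after a stale handle,
and as noted above the listing does not always revive the client.

```shell
#!/bin/sh
# Hypothetical helper: stat a path, and if that fails, list the parent
# directory once and try again.  Returns the status of the last stat.
stat_with_relist() {
    path=$1
    stat "$path" >/dev/null 2>&1 && return 0
    # First stat failed: list the parent directory (the "ls trick"),
    # then retry exactly once.
    ls -la "$(dirname "$path")" >/dev/null 2>&1
    stat "$path" >/dev/null 2>&1
}
```

Called as stat_with_relist /x/tmpdir/tmpfile, it returns 0 if either
attempt succeeds.  With nested directories you would presumably have to
list each changed level, which is part of why this is impractical for
us.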

I have found reference to this bug here:
http://bugzilla.kernel.org/show_bug.cgi?id=12557

The bug has a candidate patch attached, but I haven't tried it yet.

This seems like a fairly serious regression.  The tar use case I have
provided seems like a very frequent operation over NFS, and this isn't a
client race condition.  The client is just toast... it doesn't come back
unless you remount, or you do some trickery by finding out what's
changed and doing directory listings of those directories.


** Bug watch added: Linux Kernel Bug Tracker #12557
   http://bugzilla.kernel.org/show_bug.cgi?id=12557

-- 
ssh -X breaks Xauthority on NFS mounted home dir
https://bugs.launchpad.net/bugs/269954
