We have been regression testing our production environment on Ubuntu Hardy, and I believe the root cause of an intermittent nfs failure issue we are seeing on Hardy (but not previous versions of ubuntu) is sourced from this kernel bug. It took a long time to track the issue down, as our use case is significantly more complex than the one described here, but when you boil it down, I think it's the same thing.
We are testing the 2.6.24-24-generic kernel on Ubuntu hardy. All patches are applied as of 06/04/09. The test case provided by Ari Mujunen does not work for me, and I am unable to reproduce it by those steps. I have been able to reproduce it reliably a different way: System A: NFS Server System B: NFS Client System A: mkdir tmpdir touch tmpdir/tmpfile tar -cvf x.tar tmpdir System B: stat tmpdir/tmpfile System A: tar -xvf x.tar System B: stat tmpdir/tmpfile There's something about tar and having a subdirectory in the mix that seems to trigger the issue for me, that Ari's steps don't. Here are the server export: xxx client(rw,async,no_root_squash,no_subtree_check) and the client mount: server:/xxx /x nfs soft,intr,rsize=8192,wsize=8192,nosuid,noac,tcp,timeo=20 This is a very serious issue for us, and unlike the .XAuthority usecase, we can't just 'work around it'. Our production environment implements an Active/Active redundant NFS store at the application level. This issue will (rightly so) make our nfs layer believe the nfs server has failed, adding in arbitrary directory listing calls isn't a practical option, and doesn't reliably work either. The ls trick described above seems to MOSTLY work for getting the nfs client back alive, but not always. We have directory structures that may be nested a few levels deep, and this seems to cause further issues. I've seen our application get the kernel nfs client into a state where whole directories of files are returning 'Stale NFS file handle' errors... One time I used the ls trick to try to get the client working again, and it stopped the 'Stale NFS file handle' errors, but ls returned completely corrupted garbage (it listed the files but with alot of ??????'s for the attributes). I have found reference to this bug here: http://bugzilla.kernel.org/show_bug.cgi?id=12557 The but has a candidate patch attached, but I haven't tried it yet. This seems like a fairly serious regression. The tar usecase I have provided seems like a very frequent operation over nfs, and this isn't a client race condition. The client is just toast... it doesn't come back unless you remount, or you do some trickery by finding out what's changed and trying to do directory listings of those directories. ** Bug watch added: Linux Kernel Bug Tracker #12557 http://bugzilla.kernel.org/show_bug.cgi?id=12557 -- ssh -X breaks Xauthority on NFS mounted home dir https://bugs.launchpad.net/bugs/269954 You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. -- ubuntu-bugs mailing list [email protected] https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
