On Thu, 12 Apr 2007, Stephan Wiesand wrote:
On Wed, 11 Apr 2007, Derrick J Brashear wrote:
On Wed, 11 Apr 2007, Stephan Wiesand wrote:
One of our systems panicked two times within 2 hours yesterday, at the
same location in the OpenAFS client. I attached the kernel's last words
below.
This is an SL3 system, kernel 2.4.21-47.0.1.ELsmp, i686. The client build
has two patches on top of 1.4.4: linux-task-pointer-safety-20070320 from
CVS, and the one from
https://lists.openafs.org/pipermail/openafs-devel/2007-March/014985.html
[]
so basically you appear to have an unhashed dcache entry. Either there's a
locking bug or something is becoming erroneously unhashed.
How reproducible is it?
Good news: it is reproducible. The user confessed that he'd run "less than
20" parallel rsyncs transferring data to our cell. The files are a mixture of
data and log files, with typical sizes of 15MB and 100kB.
So I set up a dozen rsyncs to copy this data into another volume, and after
some 9 hours got the panic you find below.
I'm going to repeat this exercise now, and will also try to make the panic
happen earlier (more rsyncs, read data from a faster source - any other
ideas?).
Just wondering what to do next then.
I'm thinking about a patch. I have something else I need to deal with first,
but I will try to work something up afterwards. There's a third possibility,
namely that the missing object is mishashed. Instead of panicking, we can
presumably just iterate over everything and dump state.
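
To make the "iterate everything and dump state" idea concrete, here is a
minimal user-space sketch, assuming a simple chained hash table. It is not
OpenAFS code: struct entry, struct table, hash_key, insert_at, find_bucket
and diagnose are all hypothetical names made up for illustration, and the
real dcache structures and locking are not modelled. It only shows how a
full walk could tell an unhashed entry apart from a mishashed one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define NBUCKETS 8              /* hypothetical table size */

/* Hypothetical entry type; NOT the OpenAFS dcache structure. */
struct entry {
    unsigned int key;
    struct entry *next;         /* chain within one bucket */
};

struct table {
    struct entry *bucket[NBUCKETS];
};

enum lookup_state { ENTRY_HASHED, ENTRY_MISHASHED, ENTRY_UNHASHED };

/* Hypothetical hash function: bucket index for a key. */
static unsigned int hash_key(unsigned int key)
{
    return key % NBUCKETS;
}

/* Insert at the head of an explicitly chosen bucket, so that a caller
 * can deliberately place an entry in the wrong bucket for testing. */
static void insert_at(struct table *t, struct entry *e, unsigned int b)
{
    assert(b < NBUCKETS);
    e->next = t->bucket[b];
    t->bucket[b] = e;
}

/* Walk every bucket; return the index holding e, or -1 if absent. */
static int find_bucket(const struct table *t, const struct entry *e)
{
    for (int b = 0; b < NBUCKETS; b++)
        for (const struct entry *p = t->bucket[b]; p != NULL; p = p->next)
            if (p == e)
                return b;
    return -1;
}

/* Instead of panicking when e is not where it should be, report what
 * the full walk found and classify the failure. */
static enum lookup_state diagnose(const struct table *t, const struct entry *e)
{
    int expected = (int)hash_key(e->key);
    int found = find_bucket(t, e);

    if (found == expected)
        return ENTRY_HASHED;

    fprintf(stderr, "entry %p key %u: expected bucket %d, ",
            (void *)e, e->key, expected);
    if (found < 0) {
        fprintf(stderr, "not found in any bucket (unhashed)\n");
        return ENTRY_UNHASHED;
    }
    fprintf(stderr, "found in bucket %d (mishashed)\n", found);
    return ENTRY_MISHASHED;
}
```

A caller would build a table, insert entries with
insert_at(&t, &e, hash_key(e.key)), and call diagnose(&t, &e) at the point
where the code would otherwise panic. In a kernel the walk would have to
run under whatever lock protects the table, which this sketch leaves out.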
I suppose the other possibility would be to get a kernel crash dump but
it's sort of cumbersome to move those around so unless you're comfortable
with a debugger on a kernel dump that's probably a non-starter.
Derrick