Re: [OpenAFS] 1.4.4 client on EL3: panic in afs_HashOutDcache

Derrick J Brashear Thu, 12 Apr 2007 07:47:11 -0700

On Thu, 12 Apr 2007, Stephan Wiesand wrote:

On Wed, 11 Apr 2007, Derrick J Brashear wrote:
On Wed, 11 Apr 2007, Stephan Wiesand wrote:
One of our systems panicked two times within 2 hours yesterday, at the same location in the OpenAFS client. I attached the kernel's last words below.
This is an SL3 system, kernel 2.4.21-47.0.1.ELsmp, i686. The client build has two patches on top of 1.4.4: linux-task-pointer-safety-20070320 from CVS, and the one from
https://lists.openafs.org/pipermail/openafs-devel/2007-March/014985.html

[]

so basically you appear to have an unhashed dcache entry. Either there's a locking bug or something is becoming erroneously unhashed.
How reproducible is it?
Good news: it is reproducible. The user confessed that he'd run "less than 20" parallel rsyncs transferring data to our cell. The files are a mixture of data and log files, with typical sizes of 15MB and 100kB.
So I set up a dozen rsyncs to copy this data into another volume, and after some 9 hours got the panic you find below.
I'm going to repeat this exercise now, and will also try to make the panic happen earlier (more rsyncs, read data from a faster source - any other
ideas?).

Just wondering what to do next then.

I'm thinking about a patch. I have something else I need to deal with but I will try to work something up after. There's a 3rd possibility, namely the missing object being mishashed. Instead of panicking, we can presumably just iterate over everything and dump state.
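
To illustrate the "iterate everything and dump state" idea, here is a minimal sketch on a toy chained hash table. It is not the OpenAFS dcache code: struct entry, table, NBUCKETS, bucket_of, hash_in, unlink_from and hash_out are all hypothetical names made up for this example. When the entry is missing from the chain its key maps to, the removal routine scans every other chain and reports whether the entry was mishashed (found on the wrong chain) or unhashed (found nowhere), rather than panicking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define NBUCKETS 8              /* hypothetical table size */

struct entry {                  /* hypothetical stand-in for a cache entry */
    unsigned key;
    struct entry *next;
};

struct entry *table[NBUCKETS];  /* one singly linked chain per bucket */

unsigned bucket_of(unsigned key)
{
    return key % NBUCKETS;
}

/* Insert e at the head of the chain its key maps to. */
void hash_in(struct entry *e)
{
    unsigned b = bucket_of(e->key);

    e->next = table[b];
    table[b] = e;
}

/* Unlink e from chain b. Returns 1 if e was on that chain, else 0. */
int unlink_from(unsigned b, struct entry *e)
{
    struct entry **pp;

    for (pp = &table[b]; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == e) {
            *pp = e->next;
            e->next = NULL;
            return 1;
        }
    }
    return 0;
}

/*
 * Remove e from the table without panicking on a miss.
 * Returns  0 if e was on the expected chain,
 *          1 if e was found on a different chain (mishashed),
 *         -1 if e was on no chain at all (unhashed).
 */
int hash_out(struct entry *e)
{
    unsigned expected, b;

    assert(e != NULL);
    expected = bucket_of(e->key);
    if (unlink_from(expected, e))
        return 0;

    /* Not where it should be: walk every other chain and report. */
    for (b = 0; b < NBUCKETS; b++) {
        if (b != expected && unlink_from(b, e)) {
            fprintf(stderr, "entry %p key %u: expected bucket %u, found in %u\n",
                    (void *)e, e->key, expected, b);
            return 1;
        }
    }
    fprintf(stderr, "entry %p key %u: not on any chain\n", (void *)e, e->key);
    return -1;
}
```

The return value separates the cases discussed in this thread: 1 points at a mishashed object, while -1 means the entry was never hashed or was already unhashed, which would be consistent with either a locking bug or an erroneous unhash.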

I suppose the other possibility would be to get a kernel crash dump but it's sort of cumbersome to move those around so unless you're comfortable with a debugger on a kernel dump that's probably a non-starter.


Derrick