On Mon, 4 May 2009, Marc Dionne wrote:

Traces of the usual deadlocked suspects are attached. At that point, just
about any process can deadlock, I suppose. Apparently, the system ceases to
balance dirty pages (which appears plausible to me, but I have no experience
with virtual memory implementations whatsoever).

Ok this brought back some memories... I think you're seeing a problem
with older kernels that was addressed by Peter Zijlstra's "per BDI
dirty threshold" patch set in kernel 2.6.24:
   
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=04fbfdc14e5f4

Note the mention of "deadlocks with stacked BDIs", which is exactly
the case for AFS when using a disk cache.  The congestion on the AFS
backing device keeps processes from writing to other devices,
including the ext2/3 device holding the disk cache.  So the cache
manager can't make progress in writing back its dirty data.
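
To make the stacked-BDI problem above concrete, here is a toy model of the
two throttling schemes. It is only a sketch and not kernel code: the function
names, the page counts and the fixed 80/20 split are all assumptions made up
for illustration (as far as I understand it, the real patch set derives each
device's share dynamically rather than from a fixed table).

```python
# Toy model only: not kernel code. All names and numbers are hypothetical.

GLOBAL_LIMIT = 100                               # total dirty pages allowed
PER_BDI_LIMIT = {"afs": 80, "cache_disk": 20}    # assumed fixed split

def may_dirty_global(dirty, device):
    """Single shared threshold: 'device' is ignored, every writer is
    throttled as soon as the system-wide dirty total reaches the limit."""
    return sum(dirty.values()) < GLOBAL_LIMIT

def may_dirty_per_bdi(dirty, device):
    """Per-device threshold: a writer is throttled only by the dirty
    pages of the device it is writing to."""
    return dirty[device] < PER_BDI_LIMIT[device]

# AFS has filled the whole budget; nothing is dirty on the cache disk yet.
dirty = {"afs": 100, "cache_disk": 0}

# The cache manager has to write to the cache disk to clean AFS pages.
stuck = not may_dirty_global(dirty, "cache_disk")
progress = may_dirty_per_bdi(dirty, "cache_disk")
print("global threshold, cache manager blocked:", stuck)
print("per-BDI threshold, cache manager may write:", progress)
```

With the shared limit, the writer that could clean the AFS pages is itself
throttled by them, which is the cycle described above. With separate limits,
AFS writers are still throttled but the cache disk stays writable.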

See for instance: https://bugzilla.redhat.com/show_bug.cgi?id=453811 -
a request to backport the patch set to 2.6.18 for RHEL 5.

This doesn't look too promising - it's been in their pipeline for almost a year.

It may well be that there's no way to work around this kernel problem
in the AFS code.

That seems likely. So this is actually good news. Nothing any of us can do,
right ;-)

Seriously though, the fix will eventually be available in RHEL (even if not
before EL6, we'll see).
Furthermore, this particular deadlock is a lot harder to reproduce than
the one fixed by the linux-mmap-antirecursion patch, and personally we never
even had problems with that one.
As such, we'd rather chance a deadlock we've never seen happen than have data
corruption catch us unawares.

For 1.4.10 testing, I'm in the process of deploying clients with
linux-mmap-antirecursion-20081020 reversed.
Needless to say, I would rather have solved this thoroughly, but I lack both
the time and the expertise to hope for any success.

Cheers
 - Felix
_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel
