On Wed, 26 Oct 2011, Andrew Deason wrote:
On Wed, 26 Oct 2011 18:41:15 +0200
Stephan Wiesand wrote:
Booker and I would probably be ok with errors being returned upon
access to a single volume that's being overwhelmed with I/O requests -
if only it didn't make the fileserver as a whole grind to a halt and
stop servicing requests altogether.
Well, see, it depends on _what_ is causing it to do that, as Jeffrey
said. If the threads are hanging on a lock somewhere in the host package
or Rx or something, this won't help a whole lot since we still have to
go through those layers and we'll still hang on those locks (same thing
for chewing up CPU, or moving memory around, etc). In fact, we'll do so
even more, since we (eventually) have to go through all that at least
twice for the VBUSY case.
The symptom we see is thread exhaustion due to write callbacks
from many clients for a single volume[1]. The problem is
insidious because the failure is not gradual: everything works just fine
until you hit a tipping point in the number of batch jobs.
The culprit is often a small file that the user isn't even aware
they are opening, because it is used by some library they rely on.
Sometimes tracking down the file can take significant effort.
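
To illustrate why this looks like a cliff rather than a slope, here is a
toy model of a fixed thread pool in Python. It is only a sketch under
stated assumptions, not OpenAFS code: the function name, the parameters
(pool_size, hot_per_tick, other_per_tick) and the one-request-per-tick
service rate for the busy volume are all made up for illustration, and
requests for the busy volume are assumed to grab threads first.

```python
def simulate(pool_size, hot_per_tick, other_per_tick, ticks):
    """Toy model of a fixed-size thread pool (hypothetical, not OpenAFS code).

    Requests for one "hot" volume are serialized: only one of them
    completes per tick, and every waiting one keeps holding a thread.
    Requests for other volumes need a free thread for a single tick.
    Returns, per tick, how many "other" requests found no free thread.
    """
    hot_blocked = 0   # threads currently parked on the hot volume
    unserved = []
    for _ in range(ticks):
        # Assumption: the hot volume completes one request per tick.
        if hot_blocked > 0:
            hot_blocked -= 1
        # Assumption: new hot requests take threads first, up to the pool size.
        hot_blocked = min(pool_size, hot_blocked + hot_per_tick)
        free = pool_size - hot_blocked
        unserved.append(max(0, other_per_tick - free))
    return unserved


if __name__ == "__main__":
    # At one hot request per tick the pool keeps up indefinitely.
    print(simulate(pool_size=16, hot_per_tick=1, other_per_tick=4, ticks=20))
    # At two per tick the backlog grows until nothing else gets a thread.
    print(simulate(pool_size=16, hot_per_tick=2, other_per_tick=4, ticks=20))
```

In this model, one extra request per tick against the busy volume is the
difference between no impact at all and every other request being
starved a few ticks later, which matches the tipping-point behaviour
described above.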
- Booker C. Bense
[1]- I'm not the stuckee when this happens, just an interested
bystander, so I may have the details slightly wrong.