Simon Wilkinson wrote:

On 18 Nov 2009, at 23:51, Jason Edgecombe wrote:

Nate Gordon wrote:

As someone who also runs AFS as the backend to a webserver, I can understand your problems. My problems stem more specifically from PHP on AFS and that PHP the language feels it is necessary to perform lots and lots of trivial stat operations. I have theorized that there are some global locking issues

This is the crux of the problem. Sadly, the AFS kernel module has a single global lock, which it uses to prevent two processes from being in the module at the same time. This does lead to contention, especially around operations like lookup and getattr, which applications expect to be low-cost. I do have a cunning plan to get round this, but it's going to require a bit more thought, and a lot of testing, before it's ready to see the light of day.
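As a rough user-space illustration of that serialisation (this is not the AFS kernel code; the function names and the sleep standing in for a lookup or getattr call are assumptions made for the sketch), the following counts how many threads are ever inside a simulated module at once. With one shared lock the peak is 1, however many callers there are.

```python
import threading
import time


def peak_concurrency(locks, workers=4, hold_seconds=0.05):
    """Return the largest number of threads inside the simulated module at once.

    locks[i] is the lock that worker i must hold while it does its work.
    """
    counter_lock = threading.Lock()      # protects the two counters below
    inside = 0
    peak = 0
    start = threading.Barrier(workers)   # release all workers together

    def worker(i):
        nonlocal inside, peak
        start.wait()
        with locks[i]:
            with counter_lock:
                inside += 1
                peak = max(peak, inside)
            time.sleep(hold_seconds)     # stand-in for a lookup/getattr call
            with counter_lock:
                inside -= 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


def run_demo(workers=4):
    """Compare one global lock against a hypothetical lock per caller."""
    one_lock = threading.Lock()
    shared = peak_concurrency([one_lock] * workers, workers)
    separate = peak_concurrency([threading.Lock() for _ in range(workers)], workers)
    return shared, separate


if __name__ == "__main__":
    shared, separate = run_demo()
    print("peak with one global lock:", shared)
    print("peak with separate locks: ", separate)
```

The second figure depends on thread scheduling, so treat it only as a contrast with the first, which is always 1.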

In addition to our own global lock, we also hold the Linux Big Kernel Lock around most of our VFS operations. This means that not only can we never run concurrently, but a number of other kernel operations are prevented from doing so, too. Matt Benjamin has done some work in the 1.5 tree which suggests that we can get rid of the BKL when we're using memory cache - I suspect that we may be able to generalise this to remove it for many operations, even when the disk cache is in use.

Derrick, I have 1.4.10 with the STABLE14-background-fsync-consistency-issues patch already compiled and ready to deploy. Would that be new enough to consider debugging?

If you are rolling out 1.4.10, then I would recommend that you disable the dynamic vcache support in it. Whilst dynamic vcaches are a huge improvement, the implementation in 1.4.10 aggressively minimises the number of vcaches that AFS holds by invalidating the Linux directory lookup cache every 5 minutes. If you are already seeing contention problems on lookup then this is likely to make things worse, by causing more time to be spent under the global lock.

1. My web server is currently on Solaris.
2. How do I disable dynamic vcache support?
3. I'm rolling out 1.4.10 because I already have it compiled and packaged for deployment (using the AFS package program, if that matters). I have already tested these binaries on other machines, and recompiling would mean restarting the whole test cycle.

I have some data that suggests I need to increase the size of the vcache: its miss rate is about 5%, versus about 2% for the dcache. I'll tweak the vcache size if the upgrade doesn't improve things.
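For reference, a miss rate like the ones quoted here is just misses divided by total lookups. A minimal sketch, assuming the hit and miss counters have already been collected somehow; the counter values below are made up so that they reproduce the 5% and 2% figures, and are not real measurements:

```python
def miss_rate(hits, misses):
    """Fraction of cache lookups that missed; 0.0 if there were no lookups."""
    total = hits + misses
    if total == 0:
        return 0.0
    return misses / total


# Hypothetical counters, chosen only to match the percentages in the message.
vcache = {"hits": 9500, "misses": 500}
dcache = {"hits": 9800, "misses": 200}

if __name__ == "__main__":
    print(f"vcache miss rate: {miss_rate(**vcache):.1%}")
    print(f"dcache miss rate: {miss_rate(**dcache):.1%}")
```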

Thanks,
Jason
_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
