Simon Wilkinson <s...@inf.ed.ac.uk> writes:

> The first possible cause is journalling filesystems. Many of these flush
> their journals to disk at regular intervals, blocking or reducing access
> to the filesystem during the journal flush. This block can be enough to
> cause the fileserver to start queuing incoming connections, and in a
> site that is finely balanced, may be enough to cause performance to
> stall. This was made considerably worse by the fileserver performing a
> sync() operation every 10 seconds. This is fixed in 1.6.0 - a 1.4.x
> release containing the fix has yet to appear.
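
For anyone who wants to check whether periodic journal flushes line up with the stalls described above, here is a rough sketch. It is not from this thread and is not part of OpenAFS: the function names, the sample count, and the 0.25 second threshold are all made up for illustration. It times small write-plus-fsync operations in a directory and reports the ones that took unusually long.

```python
import os
import tempfile
import time


def probe_write_latency(directory, samples=20, interval=0.5, payload=b"x" * 4096):
    """Time small write+fsync operations in `directory`.

    Returns a list of (start_time, elapsed_seconds) tuples, one per sample.
    All parameter defaults are arbitrary illustrative values.
    """
    results = []
    for _ in range(samples):
        fd, path = tempfile.mkstemp(dir=directory)
        try:
            start = time.monotonic()
            os.write(fd, payload)
            os.fsync(fd)  # force the data to disk so a blocked flush shows up
            elapsed = time.monotonic() - start
        finally:
            os.close(fd)
            os.unlink(path)
        results.append((start, elapsed))
        time.sleep(interval)
    return results


def slow_samples(results, threshold=0.25):
    """Return the samples whose elapsed time is at or above `threshold` seconds."""
    return [r for r in results if r[1] >= threshold]
```

Run it against a scratch directory on the same filesystem as the server partition and look at the spacing between the slow samples. Slow writes that recur at a regular interval would be consistent with a journal flush or a periodic sync(), though this alone does not prove either is the cause.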

I *think* we're currently running a file server with patches applied to
disable some of the sync() calls, but I may be misremembering. I know
we've had this discussion before.

> The next cause is due to deadlocks between the client and the
> fileserver. The Linux dynamic vcaches code which was added in 1.4.10 is
> known to interact badly with fileserver callback breaks, especially in
> situations where the fileserver is under heavy load. There is a fix in
> 1.6.0, but we have yet to ship a 1.4.x release which contains it. You
> can also work around this particular problem by disabling dynamic
> vcaches in your clients.

The www.stanford.edu clients that are having problems are running with a
patch to not hold the lock that causes the deadlock condition with
callback breaks.

-- 
Russ Allbery (r...@stanford.edu) <http://www.eyrie.org/~eagle/>
_______________________________________________
OpenAFS-devel mailing list
OpenAFS-devel@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-devel