I've found that if you run a program that generates tokens and PAGs frequently (about once per second), CPU system time on the machine gradually climbs and starts to eat into performance. It takes a little while to become noticeable, but if you keep it up long enough, the machine eventually grinds to a halt. This behavior started between OpenAFS 1.4.1 and 1.4.2, when keyring support was enabled. Some experimentation shows that the problem is related to PAG garbage collection being effectively disabled when keyring support is compiled in.
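
For anyone who wants to watch the system-time growth described above, here is a rough sketch of a sampler. It is not part of OpenAFS; it only assumes a Linux /proc/stat with the usual aggregate "cpu" line (user, nice, system, idle, ...). The names cpu_sample, parse_cpu_line, system_percent and the SYSTIME_DEMO_MAIN macro are made up for this illustration, and the 5-second interval is arbitrary.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Counters from the aggregate "cpu" line of /proc/stat, in clock ticks.
 * (Hypothetical helper type for this sketch.) */
struct cpu_sample {
    unsigned long long user, nice, system, idle, iowait, irq, softirq, steal;
};

/* Parse a line such as "cpu  4705 150 1120 16250 520 30 45 0".
 * Returns 0 on success, -1 if this is not the aggregate cpu line. */
int parse_cpu_line(const char *line, struct cpu_sample *s)
{
    assert(line != NULL && s != NULL);
    memset(s, 0, sizeof(*s));
    if (strncmp(line, "cpu ", 4) != 0)
        return -1;              /* rejects per-CPU lines like "cpu0 ..." */
    int n = sscanf(line + 4, "%llu %llu %llu %llu %llu %llu %llu %llu",
                   &s->user, &s->nice, &s->system, &s->idle,
                   &s->iowait, &s->irq, &s->softirq, &s->steal);
    /* Older kernels report fewer fields; require at least the first four. */
    return n >= 4 ? 0 : -1;
}

/* Sum of all counters in one sample. */
unsigned long long total_ticks(const struct cpu_sample *s)
{
    return s->user + s->nice + s->system + s->idle +
           s->iowait + s->irq + s->softirq + s->steal;
}

/* Percentage of the interval between samples a and b spent in system time. */
double system_percent(const struct cpu_sample *a, const struct cpu_sample *b)
{
    unsigned long long total = total_ticks(b) - total_ticks(a);
    if (total == 0)
        return 0.0;
    return 100.0 * (double)(b->system - a->system) / (double)total;
}

/* Read the first line of /proc/stat into s. Returns 0 on success. */
int read_sample(struct cpu_sample *s)
{
    char line[256];
    FILE *f = fopen("/proc/stat", "r");
    if (f == NULL)
        return -1;
    int rc = fgets(line, sizeof(line), f) ? parse_cpu_line(line, s) : -1;
    fclose(f);
    return rc;
}

#ifdef SYSTIME_DEMO_MAIN
/* Print the system-time share every 5 seconds until interrupted. */
int main(void)
{
    struct cpu_sample prev, cur;
    if (read_sample(&prev) != 0)
        return 1;
    for (;;) {
        sleep(5);
        if (read_sample(&cur) != 0)
            return 1;
        printf("system: %.1f%%\n", system_percent(&prev, &cur));
        fflush(stdout);
        prev = cur;
    }
}
#endif
```

Built with -DSYSTIME_DEMO_MAIN and left running next to the token/PAG generator, it should show the system percentage creeping upward over time if the machine is affected.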

Interestingly, just changing the code so that OpenAFS with keyring support still does PAG GC makes the problem go away: with afs.GCPAGs=1, system time no longer spikes or grows without bound, but switching to afs.GCPAGs=0 brings the problem back. So if PAG GC can make things better, the keyring code apparently isn't cleaning up everything it should.

That patch is just:

--- src/afs/afs_osi.c.orig      2010-03-01 19:54:52.000000000 -0500
+++ src/afs/afs_osi.c   2010-03-01 19:55:00.000000000 -0500
@@ -841,7 +841,6 @@
 void
 afs_osi_TraverseProcTable()
 {
-#if !defined(LINUX_KEYRING_SUPPORT)
     struct task_struct *p;

 #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,18) && defined(EXPORTED_TASKLIST_LOCK)
@@ -888,7 +888,6 @@
 #endif /* EXPORTED_TASKLIST_LOCK && LINUX_VERSION_CODE < KERNEL_VERSION(2,6,18) */
        rcu_read_unlock();
 #endif /* LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,16) */
-#endif
 }
 #endif

Maybe this isn't the best fix, but it definitely points to a problem.

(I also noticed that compiling 1.4.12pre{3,4} fails because of what appears to be a misapplied patch: the code contains "crfee" where it was presumably meant to be "crfree".)