Hi,

I'm invoking some OpenAFS API methods via JAFS from multiple threads in Java. The Java VM and JAFS use the same /lib/tls/libpthread.so.0 library. At random times it crashes with an assertion failure in libadmin, at random places where the UNLOCK_GLOBAL_MUTEX macro is called.

Possible assertion places:

  Assertion failed! file ../auth/cellconfig.c, line 968.
  Assertion failed! file afs_clientAdmin.c, line 1968.
  Assertion failed! file afs_utilAdmin.c, line 460

I've put a printf trace into the else block of pthread_recursive_mutex_unlock in src/util/pthread_glock.c, and in every case I get:

  pthread_mutex_unlock else: mut->locked: 1, mut->times_inside: 2, mut->owner: 1133333424, pthread_self: 1133067184

(so the owner and the calling pthread look different, and the function sets rc=-1, which causes the assert).

The error occurs rarely, but when it does occur, it shows up within 10 minutes to 2 hours. I've written a C OpenAFS API thread test, but the error has not shown up there (so far...). (Note: I've read that the Java VM suspends threads very often; perhaps that's why I'm getting it _only_ from my Java test.)

I have 2 suspicions:

(1) pthread_glock.c, .h is buggy: the members of struct pthread_recursive_mutex_t are not "volatile". I'm still testing this case... (it seems to me a real error in theory, but first I have to test a lot to see whether making them "volatile" solves my problem...)

(2) the Java VM thread handling is not 100% compatible(?) with OpenAFS's use of pthreads (it's strange, because they _seem_ to work together well most of the time). And I've read that if the Java VM and the C part use the same pthread implementation, they should work together.

I was able to reproduce the error on 2 platforms:

- SLES9, 2.6.5-7.193-smp, i686, Classic VM (build 1.4.2, J2RE 1.4.2 IBM build cxia321420-20040626 (JIT enabled: jitc)), libpthread: NPTL, 2 cpu, glibc-2.3.3-98.47
- SuSE8, 2.6.11.5, i686, Java HotSpot(TM) Client VM (build 1.4.2-b28, mixed mode), libpthread: LinuxThreads, 1 cpu, glibc-2.3.2-88

(openafs-1.3.87)

If anybody can strengthen or reject one of my suspicions above, or has experienced the same, please tell me.

Thank you in advance.

Note 1: I've opened a ticket about this in RT: #21526

Note 2: using the native recursive mutex implementation of pthread _seems_ to solve this problem (after 1-2 tests), and it seems to be a faster and well-tested implementation. Why not use it?

Note 3: an _example_ of _part_ of a stack trace, just to help in understanding the problem:

...
3HPNATIVESTACK Native Stack of "Thread-2" PID 7065
NULL -------------------------
3HPSTACKLINE FFFFE410
3HPSTACKLINE abort at 40073CE9 in libc.so.6
3HPSTACKLINE ?? at 433E121E in libjafsadm.so
3HPSTACKLINE util_AdminServerAddressGetFromName at 433B1F62 in libjafsadm.so
3HPSTACKLINE bos_ServerOpen at 433A3D70 in libjafsadm.so
3HPSTACKLINE Java_org_openafs_jafs_Server_getBosServerHandle at 4338340E in libjafsadm.so
3HPSTACKLINE 431FA9C8
3HPSTACKLINE mmipExecuteJava at 402F3D03 in libjvm.so
3HPSTACKLINE 438D2B58
...

(I have more javacore files containing something like the part above.)

--
Peter Somogyi
Software Developer, Gamax Ltd.
1114 Budapest, Bartok B. u 15/d
Tel.: +36-1-381-0544
e-mail: [EMAIL PROTECTED]
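
On Note 2: a minimal sketch of what relying on the native pthread recursive mutex could look like. This is not the OpenAFS code; `glock`, `glock_init`, `LOCK_GLOCK`, `UNLOCK_GLOCK`, `worker` and `run_glock_demo` are hypothetical names made up for illustration, and the thread and iteration counts are arbitrary. It only shows the standard POSIX calls (`pthread_mutexattr_settype` with `PTHREAD_MUTEX_RECURSIVE`), where the library itself tracks the owner and the nesting count.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600   /* exposes PTHREAD_MUTEX_RECURSIVE on older glibc */
#endif
#include <assert.h>
#include <pthread.h>

#define NTHREADS 4          /* arbitrary values for the demo */
#define NITER    100000

/* Hypothetical stand-ins; these are not the OpenAFS definitions. */
static pthread_mutex_t glock;
static long counter = 0;

/* Assert on failure, keeping the call itself outside assert(). */
#define LOCK_GLOCK \
    do { int rc_ = pthread_mutex_lock(&glock); assert(rc_ == 0); (void)rc_; } while (0)
#define UNLOCK_GLOCK \
    do { int rc_ = pthread_mutex_unlock(&glock); assert(rc_ == 0); (void)rc_; } while (0)

/* Initialise glock as a native recursive mutex; returns 0 on success. */
static int glock_init(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0)
        rc = pthread_mutex_init(&glock, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

static void *worker(void *arg)
{
    int i;

    (void)arg;
    for (i = 0; i < NITER; i++) {
        LOCK_GLOCK;         /* outer acquisition */
        LOCK_GLOCK;         /* nested acquisition by the same thread */
        counter++;          /* protected by glock */
        UNLOCK_GLOCK;       /* still held once */
        UNLOCK_GLOCK;       /* released here */
    }
    return NULL;
}

/* Returns 0 if all threads ran and no increment was lost, nonzero otherwise. */
static int run_glock_demo(void)
{
    pthread_t tid[NTHREADS];
    int i, started = 0;
    int rc;

    counter = 0;
    rc = glock_init();
    if (rc != 0)
        return rc;
    for (i = 0; i < NTHREADS; i++) {
        if (pthread_create(&tid[i], NULL, worker, NULL) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    pthread_mutex_destroy(&glock);
    if (started != NTHREADS)
        return -1;
    return counter == (long)NTHREADS * NITER ? 0 : -1;
}
```

Built with `gcc -pthread` and called from a `main`, `run_glock_demo()` should return 0. A test like this only exercises nested locking under contention; it does not by itself prove anything about the behaviour under a Java VM.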
