Hi,

I'm invoking some OpenAFS API methods via JAFS from multiple threads in Java.
The Java VM and JAFS use the same /lib/tls/libpthread.so.0 library.
At random times it crashes with an assertion failure in libadmin, at random 
places where the UNLOCK_GLOBAL_MUTEX macro is called.
Possible assertion places:
: Assertion failed! file ../auth/cellconfig.c, line 968.
: Assertion failed! file afs_clientAdmin.c, line 1968.
: Assertion failed! file afs_utilAdmin.c, line 460

I've put a printf trace into the else block of pthread_recursive_mutex_unlock 
in src/util/pthread_glock.c, and in each case I get:
pthread_mutex_unlock else: mut->locked: 1, mut->times_inside: 2, mut->owner: 
1133333424, pthread_self: 1133067184
(so the owner and the calling pthread are different, and the function 
sets rc=-1, which triggers the assert)
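
To make the failing check concrete, here is a minimal sketch of a hand-rolled 
recursive mutex with the same kind of owner test. It is NOT the code from 
pthread_glock.c: the sketch_* names, the guard mutex and the condition 
variable are my own assumptions for illustration. Only the fields locked, 
times_inside and owner, and the rc=-1 result, come from the trace above.

```c
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

/* Hypothetical type for illustration; not the struct from pthread_glock.h. */
typedef struct {
    pthread_mutex_t guard;        /* protects every field below */
    pthread_cond_t  released;     /* signalled when the lock becomes free */
    pthread_t       owner;        /* meaningful only while locked != 0 */
    int             locked;
    int             times_inside;
} sketch_rmutex_t;

void sketch_rmutex_init(sketch_rmutex_t *m)
{
    int rc = pthread_mutex_init(&m->guard, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&m->released, NULL);
    assert(rc == 0);
    (void)rc;
    m->locked = 0;
    m->times_inside = 0;
}

int sketch_rmutex_lock(sketch_rmutex_t *m)
{
    pthread_mutex_lock(&m->guard);
    if (m->locked && pthread_equal(m->owner, pthread_self())) {
        m->times_inside++;                 /* re-entry by the owner */
    } else {
        while (m->locked)                  /* wait until the owner leaves */
            pthread_cond_wait(&m->released, &m->guard);
        m->locked = 1;
        m->owner = pthread_self();
        m->times_inside = 1;
    }
    pthread_mutex_unlock(&m->guard);
    return 0;
}

int sketch_rmutex_unlock(sketch_rmutex_t *m)
{
    int rc = 0;
    pthread_mutex_lock(&m->guard);
    if (m->locked && pthread_equal(m->owner, pthread_self())) {
        if (--m->times_inside == 0) {
            m->locked = 0;
            pthread_cond_signal(&m->released);
        }
    } else {
        rc = -1;   /* caller is not the owner: the case seen in the trace */
    }
    pthread_mutex_unlock(&m->guard);
    return rc;
}

/* Helper: try to unlock from a second thread and report what it got. */
struct unlock_attempt { sketch_rmutex_t *m; int rc; };

static void *unlock_attempt_main(void *arg)
{
    struct unlock_attempt *a = arg;
    a->rc = sketch_rmutex_unlock(a->m);
    return NULL;
}

int sketch_unlock_from_other_thread(sketch_rmutex_t *m)
{
    struct unlock_attempt a = { m, 0 };
    pthread_t t;
    int rc = pthread_create(&t, NULL, unlock_attempt_main, &a);
    assert(rc == 0);
    rc = pthread_join(t, NULL);
    assert(rc == 0);
    (void)rc;
    return a.rc;
}
```

If one thread holds the lock twice and sketch_unlock_from_other_thread() is 
called, the second thread gets -1 and times_inside stays at 2, which matches 
the values in my trace. In the sketch every field is read and written only 
under the guard mutex, so a thread can never act on a stale owner value.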

The error occurs rarely, but when it does, it shows up within 10 minutes to 2 
hours of running.

I've written a C OpenAFS API thread test, but the error hasn't shown up there 
(so far...).
(Note: I've read that the Java VM suspends threads very often; perhaps that's 
why I'm getting it _only_ from my Java test.)

I have 2 suspicions:

(1) pthread_glock.c/.h is buggy: the members of struct 
pthread_recursive_mutex_t are not "volatile". I'm still testing this case...
(In theory it looks like a real error to me, but first I have to test a lot 
to see whether making them "volatile" solves my problem...)

(2) the Java VM thread handling is not 100% compatible(?) with the way 
OpenAFS uses pthreads (which is strange, because they _seem_ to work together 
well most of the time).
I've also read that if the Java VM and the C part use the same pthread 
implementation, they should work together.

I was able to reproduce the error on 2 platforms:
- SLES9, 2.6.5-7.193-smp, i686, Classic VM (build 1.4.2, J2RE 1.4.2 IBM build 
cxia321420-20040626 (JIT enabled: jitc)), libpthread: NPTL, 2 cpu, 
glibc-2.3.3-98.47
- SuSE8, 2.6.11.5, i686, Java HotSpot(TM) Client VM (build 1.4.2-b28, mixed 
mode), libpthread: LinuxThreads, 1 cpu, glibc-2.3.2-88
(openafs-1.3.87)

If anybody can confirm or rule out one of my suspicions above, or has 
experienced the same, please tell me.
Thank you in advance.

Note 1: I've opened a ticket about this in RT: #21526
Note 2: using pthread's native recursive mutex implementation _seems_ to 
solve this problem (after 1-2 tests), and it seems to be a faster and 
better-tested implementation. Why not use it?
Note 3: _part_ of a stack trace as an _example_, just to help in 
understanding the problem:
...
3HPNATIVESTACK         Native Stack of "Thread-2" PID 7065
NULL                   -------------------------
3HPSTACKLINE            FFFFE410
3HPSTACKLINE            abort at 40073CE9 in libc.so.6
3HPSTACKLINE            ?? at 433E121E in libjafsadm.so
3HPSTACKLINE            util_AdminServerAddressGetFromName at 433B1F62 in libjafsadm.so
3HPSTACKLINE            bos_ServerOpen at 433A3D70 in libjafsadm.so
3HPSTACKLINE            Java_org_openafs_jafs_Server_getBosServerHandle at 4338340E in libjafsadm.so
3HPSTACKLINE            431FA9C8
3HPSTACKLINE            mmipExecuteJava at 402F3D03 in libjvm.so
3HPSTACKLINE            438D2B58
...
(I have more javacore files containing something like the part above)
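
Regarding Note 2, this is roughly what I mean by the native implementation: a 
minimal sketch using the standard PTHREAD_MUTEX_RECURSIVE mutex type. The 
native_rmutex_* function names are hypothetical and only for illustration; 
this is not a patch against pthread_glock.c.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* makes PTHREAD_MUTEX_RECURSIVE visible */
#endif
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

/* Initialise *m as a recursive mutex; returns 0 or a pthread error code. */
int native_rmutex_init(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc;

    assert(m != NULL);
    rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Lock twice and unlock twice from the same thread.
 * Returns 0 if every call succeeded, otherwise the first error code. */
int native_rmutex_nested_demo(pthread_mutex_t *m)
{
    int rc;

    assert(m != NULL);
    rc = pthread_mutex_lock(m);
    if (rc != 0)
        return rc;
    rc = pthread_mutex_lock(m);          /* same thread re-enters */
    if (rc != 0) {
        pthread_mutex_unlock(m);
        return rc;
    }
    rc = pthread_mutex_unlock(m);        /* leaves the inner level */
    if (rc != 0)
        return rc;
    return pthread_mutex_unlock(m);      /* actually releases the mutex */
}
```

With this mutex type the library itself keeps the owner and the nesting 
count, so there is no hand-written bookkeeping left to get out of sync. An 
unlock by a thread that does not hold the mutex is reported as an error by 
pthread_mutex_unlock itself.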

-- 
Peter Somogyi
Software Developer, Gamax Ltd.
1114 Budapest, Bartok B. u 15/d
Tel.: +36-1-381-0544
e-mail: [EMAIL PROTECTED]
