This Sunday we suffered two simultaneous fileserver crashes. Both crashed while talking to the same client, and both crashed at the same line of code.
This was using openafs-1.2.6 on Tru64 5.0a on Alpha. The crash occurred in GetClient() at line 1485 of viced/host.c:

  (ladebug) where
  #0 0x3ff800d08f8 in _sigprocmask(0x2, 0x20, 0x0, 0x0, 0x0, 0x2000324cea5) in /usr/shlib/libc.so
  #1 0x3ff800d2b6c in __sigprocmask(0x2, 0x20, 0x0, 0x0, 0x0, 0x2000324cea5) in /usr/shlib/libc.so
  #2 0x3ff801894f0 in abort(0x2, 0x20, 0x0, 0x0, 0x0, 0x2000324cea5) in /usr/shlib/libc.so
  #3 0x120042334 in AssertionFailed(file=0x140004cf0="../viced/host.c", line=32) "../util/assert.c":44
  >4 0x12003344c in GetClient(tcon=0x1419c0900, cp=0x2000324d858) "../viced/host.c":1485
  #5 0x120029f54 in GetVolumePackage(tcon=0x1419c0900, Fid=0x2000324d988, volptr=0x2000324d860, targetptr=0x2000324d870, chkforDir=<no value>, parent=0x2000324d868, client=0x2000324d858, locktype=1, rights=0x2000324d850, anyrights=0x2000324d848) "../viced/afsfileprocs.c":4922
  #6 0x12001dce4 in SAFSS_FetchStatus(tcall=0x1418d1c00, Fid=0x2000324d988, OutStatus=0x2000324d9a8, CallBack=0x2000324d978, Sync=0x2000324d960) "../viced/afsfileprocs.c":817
  #7 0x12001ee84 in SRXAFS_FetchStatus(tcon=0x1419c0900, Fid=0x2000324d988, OutStatus=0x2000324d9a8, CallBack=0x2000324d978, Sync=0x2000324d960) "../viced/afsfileprocs.c":1173
  #8 0x12005df18 in _RXAFS_FetchStatus(z_call=0x1418d1c00, z_xdrs=0x2000324da20) "../fsint/afsint.ss.c":174
  #9 0x120063c8c in RXAFS_ExecuteRequest(z_call=0x1418d1c00) "../fsint/afsint.ss.c":1892
  #10 0x12007ccb8 in rxi_ServerProc(threadID=<no value>, newcall=0x0, socketp=0x2000324dac0) "../rx/rx.c":1326
  #11 0x120092c9c in rx_ServerProc() "../rx/rx_pthread.c":288
  #12 0x12009246c in server_entry(argp=0x3ff805b44a0) "../rx/rx_pthread.c":94
  #13 0x3ff805b5f3c in __thdBase(0x2, 0x20, 0x0, 0x0, 0x0, 0x2000324cea5) in /usr/shlib/libpthread.so
  (ladebug) p client
  0x0
  (ladebug) p tcon->nSpecific
  2
  (ladebug) p rxcon_client_key
  1
  (ladebug) p tcon->specific[1]
  0x0

The way I read the code, tcon->specific[rxcon_client_key] can only be NULL while tcon->nSpecific is 2
if rx_SetSpecific has been called to set it to NULL, which is what h_TossStuff_r() does.

The log shows:

  Sun Sep 15 17:40:43 2002 CB: new identity for host 130.237.49.75:26386, deleting
  Sun Sep 15 17:40:59 2002 CB: new identity for host 130.237.49.75:26386, deleting
  Sun Sep 15 17:40:59 2002 CB: new identity for host 130.237.49.75:26386, deleting
  Sun Sep 15 17:40:59 2002 CB: new identity for host 130.237.49.75:26386, deleting
  Sun Sep 15 17:40:59 2002 CB: new identity for host 130.237.49.75:26386, deleting
  Sun Sep 15 17:42:20 2002 CB: new identity for host 130.237.49.75:26386, deleting

26386 is 4711 with its bytes swapped, so there seems to be a byte-order error somewhere, but that is probably unrelated.

  : datan mattiasa $ ; rxdebug 130.237.49.75 4711 -version
  Trying 130.237.49.75 (port 4711):
  AFS version: arla-0.35.8pre1

So I assume that h_GetHost_r (called from the preamble) manages to call h_Release_r(host) and toss the host. What I don't understand is how it can later be reused.

/mattiasa

_______________________________________________
OpenAFS-devel mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-devel
