After our fileserver fell over, the salvager had to run, and it crashed as well:
    Program terminated with signal 6, Aborted.
    #0  0x00007f2b3fe328a5 in raise () from /lib64/libc.so.6

gdb on the core gives:

    (gdb) where
    #0  0x00007f2b3fe328a5 in raise () from /lib64/libc.so.6
    #1  0x00007f2b3fe34085 in abort () from /lib64/libc.so.6
    #2  0x0000000000424851 in osi_Panic (msg=0x43ef88 "assertion failed: %s, file: %s, line: %d\n") at rx_user.c:251
    #3  0x000000000042486e in osi_AssertFailU (expr=0xed1a <Address 0xed1a out of bounds>, file=0x6 <Address 0x6 out of bounds>, line=-1) at rx_user.c:261
    #4  0x000000000040a29b in SalvageVolume (salvinfo=0x7fffd0c150b0, rwIsp=<value optimized out>, alinkH=0x17125b0) at vol-salvage.c:3986
    #5  0x000000000040cb2d in DoSalvageVolumeGroup (salvinfo=<value optimized out>, isp=0x1710450, nVols=1) at vol-salvage.c:2092
    #6  0x000000000040db85 in SalvageFileSys1 (partP=<value optimized out>, singleVolumeNumber=0) at vol-salvage.c:937
    #7  0x000000000040e1c5 in SalvageFileSysParallel (partP=0x16ebbe0) at vol-salvage.c:667
    #8  0x000000000040ee2f in handleit (as=<value optimized out>, arock=<value optimized out>) at ./salvager.c:375
    #9  0x0000000000410687 in cmd_Dispatch (argc=7, argv=0x16e74b0) at cmd.c:905
    #10 0x000000000040e9ce in main (argc=6, argv=0x7fffd0c15cc8) at ./salvager.c:534
    (gdb) up
    #1  0x00007f2b3fe34085 in abort () from /lib64/libc.so.6
    (gdb) up
    #2  0x0000000000424851 in osi_Panic (msg=0x43ef88 "assertion failed: %s, file: %s, line: %d\n") at rx_user.c:251
    251         afs_abort();
    (gdb) up
    #3  0x000000000042486e in osi_AssertFailU (expr=0xed1a <Address 0xed1a out of bounds>, file=0x6 <Address 0x6 out of bounds>, line=-1) at rx_user.c:261
    261         osi_Panic("assertion failed: %s, file: %s, line: %d\n", expr,
    (gdb) up
    #4  0x000000000040a29b in SalvageVolume (salvinfo=0x7fffd0c150b0, rwIsp=<value optimized out>, alinkH=0x17125b0) at vol-salvage.c:3986
    3986            osi_Assert(Delete(&dh, "..") == 0);
    (gdb) list
    3981            SetSalvageDirHandle(&dh, vid, salvinfo->fileSysDevice,
    3982                                salvinfo->vnodeInfo[class].inodes[v],
    3983                                &salvinfo->VolumeChanged);
    3984            pa.Vnode = LFVnode;
    3985            pa.Unique = LFUnique;
    3986            osi_Assert(Delete(&dh, "..") == 0);
    3987            osi_Assert(Create(&dh, "..", &pa) == 0);
    3988
    3989            /* The original parent's link count was decremented above.
    3990             * Here we increment the new parent's link count.
    (gdb)

I assume the salvager is trying to delete the directory entry ".." and then create it again. It looks to me like FindItem() in dir.c:Delete() came up empty-handed, so Delete() returned ENOENT and the osi_Assert() aborted.

Do you think it's safe to change line 3986 to something less dramatic than Abort(), or do you have a better suggestion?

Harald.

_______________________________________________
OpenAFS-devel mailing list
OpenAFS-devel@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-devel
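A minimal, self-contained sketch of what a less drastic line 3986 could look like: treat ENOENT from the delete of ".." as "already gone", log it, and still create the new ".." entry, while any other error is returned to the caller. The directory table and the helpers find_entry(), dir_delete(), dir_create() and reset_dotdot() are hypothetical stand-ins, not the OpenAFS dir package; only the "missing name gives ENOENT" behaviour is taken from the report above.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a salvager directory: a small table of
 * name -> (vnode, unique) entries. This is NOT the OpenAFS dir package. */
#define MAX_ENTRIES 8
#define MAX_NAME 32

struct fid { unsigned vnode; unsigned unique; };
struct entry { char name[MAX_NAME]; struct fid fid; int used; };
struct dir { struct entry e[MAX_ENTRIES]; };

static struct entry *find_entry(struct dir *d, const char *name)
{
    assert(d != NULL && name != NULL);
    for (int i = 0; i < MAX_ENTRIES; i++)
        if (d->e[i].used && strcmp(d->e[i].name, name) == 0)
            return &d->e[i];
    return NULL;
}

/* Modelled on the reported behaviour: ENOENT when the name is absent. */
static int dir_delete(struct dir *d, const char *name)
{
    struct entry *ent = find_entry(d, name);
    if (ent == NULL)
        return ENOENT;
    ent->used = 0;
    return 0;
}

static int dir_create(struct dir *d, const char *name, const struct fid *fid)
{
    if (strlen(name) >= MAX_NAME)
        return ENAMETOOLONG;
    if (find_entry(d, name) != NULL)
        return EEXIST;
    for (int i = 0; i < MAX_ENTRIES; i++) {
        if (!d->e[i].used) {
            snprintf(d->e[i].name, sizeof d->e[i].name, "%s", name);
            d->e[i].fid = *fid;
            d->e[i].used = 1;
            return 0;
        }
    }
    return ENOSPC;
}

/* Point ".." at a new parent. A missing ".." is logged and tolerated
 * instead of aborting; any other delete/create error goes to the caller. */
static int reset_dotdot(struct dir *d, const struct fid *parent)
{
    int code = dir_delete(d, "..");
    if (code == ENOENT)
        fprintf(stderr, "'..' entry missing; creating a new one\n");
    else if (code != 0)
        return code;
    return dir_create(d, "..", parent);
}
```

Still creating ".." after a tolerated ENOENT keeps the directory consistent with the link-count increment for the new parent that the comment at lines 3989-3990 describes; whether skipping the assertion is safe for every other state the salvager can reach is exactly the open question here.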