Hello,
We are running OpenAFS 1.4.5 on a zLinux system (Novell SLES-9
distribution), and about once a week afs_cachetrim fails like this:

Feb 18 01:02:49 mclinx kernel: openafs: dcache hvkernel BUG at /usr/src/packages/BUILD/openafs-1.4.5/obj/s390/src/libafs/MODLOAD-2.6.5-7.287.3-s390x-MP/afs_dcache.c:719!
Feb 18 01:02:49 mclinx kernel: illegal operation: 0001 [#1]
Feb 18 01:02:49 mclinx kernel: CPU: 1 Tainted: PF U (2.6.5-7.287.3-s390x SLES9_SP3_BRANCH-20071002073136)
Feb 18 01:02:49 mclinx kernel: Process afs_cachetrim (pid: 6092, task: 00000003d4bf9000, ksp: 00000003e1d437b0)
Feb 18 01:02:49 mclinx kernel: Krnl PSW : 0700000180000000 00000003fe45a3c2 (afs_HashOutDCache+0x25a/0x284 [libafs])
Feb 18 01:02:49 mclinx kernel: Krnl GPRS: 00000000000000f0 0000000000507fc0 0000000000000079 00000003e1d43970
Feb 18 01:02:49 mclinx kernel: 00000003fe45a3c0 000000000002740c 00000003fe508670 0000000000000000
Feb 18 01:02:49 mclinx kernel: 0000000300000001 00000003fe4f1ff8 00000000000000c8 00000003feaae9c0
Feb 18 01:02:49 mclinx kernel: 00000003fe445000 00000003fe4c8e50 00000003fe45a3c0 00000003e1d43a70
Feb 18 01:02:49 mclinx kernel: Krnl Code: 00 00 e3 10 b0 a6 00 90 a5 1b 00 02 42 10 b0 a6 a7 18 00 00
Feb 18 01:02:49 mclinx kernel: Call Trace:
Feb 18 01:02:49 mclinx kernel: [<00000003fe45ac9e>] afs_GetDownD+0x79a/0x960 [libafs]
Feb 18 01:02:49 mclinx kernel: [<00000003fe45ea02>] afs_CacheTruncateDaemon+0x196/0x5e8 [libafs]
Feb 18 01:02:49 mclinx kernel: [<00000003fe4bf7a4>] afsd_thread+0x3c8/0x8e8 [libafs]
Feb 18 01:02:49 mclinx kernel: [<0000000000108b60>] kernel_thread_starter+0x14/0x1c
Feb 18 01:02:49 mclinx kernel:

Unfortunately, I have no idea how to reproduce this, and we cannot
predict when it will happen next.

Looking at the source code of afs_HashOutDCache(), we can see where
the panic is raised:

    /* remove entry from *other* hash chain */
    i = DVHash(&adc->f.fid);
    us = afs_dvhashTbl[i];
    if (us == adc->index) {
        /* first dude in the list */
        afs_dvhashTbl[i] = afs_dvnextTbl[adc->index];
    } else {
        /* somewhere on the chain */
        while (us != NULLIDX) {
            if (afs_dvnextTbl[us] == adc->index) {
                /* found item pointing at the one to delete */
                afs_dvnextTbl[us] = afs_dvnextTbl[adc->index];
                break;
            }
            us = afs_dvnextTbl[us];
        }
        if (us == NULLIDX)
            osi_Panic("dcache hv"); /* <-- line 719, as shown in the panic message */
    }

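Well, I don't know exactly what DVHash computes, but from the code it
selects a bucket i of afs_dvhashTbl[] based on the fid in adc->f.fid.
The panic is raised when the chain starting at that bucket ends in
NULLIDX without containing adc->index: either afs_dvhashTbl[i] already
holds NULLIDX, or the walk along afs_dvnextTbl[] reaches NULLIDX before
it finds the entry to delete. In other words, the dcache entry is not on
the hash chain its fid points to, and afs_HashOutDCache() treats this
as fatal.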

My question is: under which circumstances does the cachetrim thread end
up in this situation? And is there really no other way to handle this
than calling osi_Panic()?
The cachetrim crash is painful for us because we have to restart the
whole system every time it happens.

With kind regards,
Carsten Jacobi (*120-4468)
Firmware Development in Böblingen
IBM Deutschland Entwicklung GmbH
Chairman of the Supervisory Board: Martin Jetter
Management: Herbert Kircher
Registered office: Böblingen
Register court: Amtsgericht Stuttgart, HRB 243294