On Wed, 11 Apr 2007, Derrick J Brashear wrote:

On Wed, 11 Apr 2007, Stephan Wiesand wrote:

One of our systems panicked twice within 2 hours yesterday, both times at the same location in the OpenAFS client. I have attached the kernel's last words below.

This is an SL3 system, kernel 2.4.21-47.0.1.ELsmp, i686. The client build has two patches on top of 1.4.4: linux-task-pointer-safety-20070320 from CVS, and the one from
https://lists.openafs.org/pipermail/openafs-devel/2007-March/014985.html

afs_HashOutDCache has
    /* if this guy is in the hash table, pull him out */
    if (adc->f.fid.Fid.Volume != 0) {
        i = DCHash(&adc->f.fid, adc->f.chunk);
        us = afs_dchashTbl[i];
        if (us == adc->index) {
            ..
        } else {
            /* somewhere on the chain */
            while (us != NULLIDX) {
                if (afs_dcnextTbl[us] == adc->index) {
                    /* found item pointing at the one to delete */
                    afs_dcnextTbl[us] = afs_dcnextTbl[adc->index];
                    break;
                }
                us = afs_dcnextTbl[us];
            }
            if (us == NULLIDX)
                osi_Panic("dcache hc");

So basically you appear to have an unhashed dcache entry: the walk reached the end of the chain without finding the entry it was asked to remove. Either there's a locking bug or something is being unhashed erroneously.
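
To make the failure mode concrete, here is a minimal standalone sketch of the same index-chained hash removal. It is not the OpenAFS code: the table sizes and the names hash_tbl, next_tbl, hash_in, hash_out and demo_double_unhash are made up for illustration, and where the real code calls osi_Panic("dcache hc") the sketch just returns -1. It only shows that removing an entry that is no longer on its chain walks to NULLIDX, which is the condition that triggers the panic.

```c
#include <assert.h>

#define NULLIDX   (-1)  /* end-of-chain marker, named as in the excerpt */
#define NBUCKETS  4     /* hypothetical sizes, chosen only for the sketch */
#define NENTRIES  8

static int hash_tbl[NBUCKETS];  /* bucket -> index of first entry on chain */
static int next_tbl[NENTRIES];  /* entry  -> index of next entry on chain  */

/* Mark every bucket and every entry as unchained. */
static void tables_init(void)
{
    int i;
    for (i = 0; i < NBUCKETS; i++)
        hash_tbl[i] = NULLIDX;
    for (i = 0; i < NENTRIES; i++)
        next_tbl[i] = NULLIDX;
}

/* Push entry idx onto the front of the chain for bucket. */
static void hash_in(int bucket, int idx)
{
    next_tbl[idx] = hash_tbl[bucket];
    hash_tbl[bucket] = idx;
}

/*
 * Unlink entry idx from the chain for bucket.
 * Returns 0 on success, or -1 if idx is not on the chain; the latter
 * is the case in which the excerpt above panics.
 */
static int hash_out(int bucket, int idx)
{
    int us = hash_tbl[bucket];

    if (us == idx) {
        /* first on the chain */
        hash_tbl[bucket] = next_tbl[idx];
        next_tbl[idx] = NULLIDX;
        return 0;
    }
    /* somewhere on the chain */
    while (us != NULLIDX) {
        if (next_tbl[us] == idx) {
            /* found the item pointing at the one to delete */
            next_tbl[us] = next_tbl[idx];
            next_tbl[idx] = NULLIDX;
            return 0;
        }
        us = next_tbl[us];
    }
    return -1;  /* walked off the end: entry was already unhashed */
}

/* Remove the same entry twice; the second removal cannot find it. */
static int demo_double_unhash(void)
{
    tables_init();
    hash_in(1, 3);
    hash_in(1, 5);
    hash_in(1, 6);                /* chain for bucket 1: 6 -> 5 -> 3 */
    assert(hash_out(1, 5) == 0);  /* first removal succeeds */
    return hash_out(1, 5);        /* second removal fails */
}
```

In this sketch the second hash_out() call fails because the entry is already gone. In the real client the same end state could presumably also come from two threads racing on the tables without the right lock, which is why both possibilities are worth checking.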

How reproducible is it?

Good news: it is reproducible. The user confessed that he'd run "less than 20" parallel rsyncs transferring data to our cell. The files are a mixture of data and log files, with typical sizes of 15MB and 100kB.

So I set up a dozen rsyncs to copy this data into another volume, and after some 9 hours got the panic you find below.

I'm going to repeat this exercise now, and will also try to make the panic happen earlier (more rsyncs, read data from a faster source - any other
ideas?).

Just wondering what to do next then.

Thanks for caring,
        Stephan

PS Here's the Oops:

dcache hc<1>Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
f8a6da50
*pde = 34669001
*pte = 5b103067
Oops: 0002
panfs nfs lockd sunrpc openafs netconsole 3c59x mii microcode ohci1394 ieee1394 loop keybdev mousedev hid input usb-uhci usbcore ext3 jbd lvm-mod aic7xxx disk
CPU:    2
EIP:    0060:[<f8a6da50>]    Tainted: P
EFLAGS: 00010282

EIP is at osi_Panic [openafs] 0x20 (2.4.21-47.0.1.ELsmp/i686)
eax: 00000009   ebx: f8b74000   ecx: 00000046   edx: c0388e98
esi: f8c328c0   edi: 0015fa73   ebp: 0000000d   esp: f5427e04
ds: 0068   es: 0068   ss: 0068
Process afs_cachetrim (pid: 987, stackpage=f5427000)
Stack: f8a9365b 00000002 00000000 f8a46e77 f8c328c0 0015fa73 0000000d f8a2d9ef
       f8a9365b 00000002 00000000 f8a46e77 f8c328c0 d4938380 0015fa73 f8a2d6a8
       f8c328c0 00000000 00000000 0000f2da d0928990 00000000 00000000 4dd6d295
Call Trace:
[<f8a9365b>] .rodata.str1.1 [openafs] 0x11f (0xf5427e04)
[<f8a46e77>] shutdown_vcache [openafs] 0x357 (0xf5427e10)
[<f8a2d9ef>] afs_HashOutDCache [openafs] 0x7f (0xf5427e20)
[<f8a9365b>] .rodata.str1.1 [openafs] 0x11f (0xf5427e24)
[<f8a46e77>] shutdown_vcache [openafs] 0x357 (0xf5427e30)
[<f8a2d6a8>] afs_GetDownD [openafs] 0x528 (0xf5427e40)
[<f8a2cd2e>] afs_CacheTruncateDaemon [openafs] 0x12e (0xf5427fa0)
[<f8a7f9f0>] afsd_thread [openafs] 0x3e0 (0xf5427fe0)
[<f8a7f610>] afsd_thread [openafs] 0x0 (0xf5427fe4)
[<c01095cd>] kernel_thread_helper [kernel] 0x5 (0xf5427ff0)

Code: c6 05 00 00 00 00 00 83 c4 1c c3 90 8d 74 26 00 b8 4f 42 a9

Kernel panic: Fatal exception


--
Stephan Wiesand
  DESY - DV -
  Platanenallee 6
  15738 Zeuthen, Germany