Hi!

I have a 2.6.12-gentoo-r4 kernel on a single-CPU P4, with SMP (HT) enabled and preemption disabled. I'm running OpenAFS 1.3.87.

The problem occurs when I start afsd with the parameters

    -memcache -chunksize 14 -afsdb -dynroot

and the following /etc/openafs/cacheinfo:

    /afs:/usr/vice/cache:500000

With a cache size of 50000 the problem doesn't occur, or at least not as easily. (I have seen errors with smaller cache sizes, but they may well have been caused by something else.)

The console displays "afsd: All AFS daemons started." and then waits forever. Very shortly after that, I get a kernel oops. The machine doesn't hang, however. In ps auxwf I find:

root 12829 0.0 0.0 2000 868 tty3 D+  12:56 0:00 \_ /usr/sbin/afsd -memcache -chunksize 14 -afsdb -dynroot
root 12833 0.0 0.0    0   0 tty3 Z<+ 12:56 0:00 \_ [afsd] <defunct>
root 12834 0.0 0.0    0   0 tty3 Z+  12:56 0:00 \_ [afsd] <defunct>
root 12837 0.0 0.0    0   0 tty3 Z<+ 12:56 0:00 \_ [afsd] <defunct>
root 12839 0.0 0.0    0   0 tty3 Z+  12:56 0:00 \_ [afsd] <defunct>
root 12842 0.0 0.0    0   0 tty3 Z+  12:56 0:00 \_ [afsd] <defunct>
root 12844 0.0 0.0 1996 860 tty3 D+  12:56 0:00 \_ /usr/sbin/afsd -memcache -chunksize 14 -afsdb -dynroot
root 12846 0.0 0.0    0   0 tty3 Z+  12:56 0:00 \_ [afsd] <defunct>
root 12848 0.0 0.0 1996 860 tty3 D+  12:56 0:00 \_ /usr/sbin/afsd -memcache -chunksize 14 -afsdb -dynroot
root 12850 0.0 0.0    0   0 tty3 Z+  12:56 0:00 \_ [afsd] <defunct>

and also:

root 12835 0.0 0.0    0   0 ?    S   12:56 0:00 [afs_rxlistener]
root 12836 0.0 0.0    0   0 ?    S   12:56 0:00 [afs_callback]
root 12838 0.0 0.0    0   0 ?    D   12:56 0:00 [afs_rxevent]
root 12840 0.0 0.0 1996 860 ?    Ss  12:56 0:00 /usr/sbin/afsd -memcache -chunksize 14 -afsdb -dynroot
root 12843 0.0 0.0    0   0 ?    D   12:56 0:00 [afsd]
root 12845 0.0 0.0    0   0 ?    D   12:56 0:00 [afs_checkserver]
root 12847 0.0 0.0    0   0 ?    S   12:56 0:00 [afs_background]
root 12849 0.0 0.0    0   0 ?    S   12:56 0:00 [afs_background]

The oops looks like this (dmesg | ksymoops):

ksymoops 2.4.11 on i686 2.6.12-gentoo-r4.
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.6.12-gentoo-r4/ (default)
     -m /boot/kernel-2.6.12-gentoo-r4/System.map (specified)

Error (regular_file): read_ksyms stat /proc/ksyms failed
ksymoops: No such file or directory
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
Machine check exception polling timer started.
SGI XFS with large block numbers, no debug enabled
ehci_hcd 0000:00:1d.7: debug port 1
Unable to handle kernel NULL pointer dereference at virtual address 00000147
f9b58c77
*pde = 00000000
Oops: 0000 [#1]
CPU:    1
EIP:    0060:[<f9b58c77>]    Tainted: P      VLI
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010206   (2.6.12-gentoo-r4)
eax: f9bd34d4   ebx: 000000d7   ecx: 00007a12   edx: 00000000
esi: 0000000a   edi: 00000000   ebp: 00000000   esp: cea27e30
ds: 007b   es: 007b   ss: 0068
Stack: c011cb47 cf372520 f6329300 32b7b53b 32b7b53b 0000006e cea27e68 c011cc9e
       cf372520 c1807558 d17226e0 d1722520 f5cca580 d1722648 00000004 000000d7
       00000000 00000009 c042aa62 00000000 00000002 00000001 00000000 cea27ea8
Call Trace:
 [<c011cb47>] recalc_task_prio+0x8e/0x155
 [<c011cc9e>] activate_task+0x90/0xa4
 [<c042aa62>] schedule+0x3c6/0xc81
 [<c042a514>] __down+0xcc/0xdb
 [<c011ed5a>] default_wake_function+0x0/0x12
 [<c0137a91>] remove_wait_queue+0x1a/0x4a
 [<f9ba772c>] afs_osi_SleepSig+0x150/0x1a7 [libafs]
 [<f9b5821a>] afs_CacheTruncateDaemon+0x0/0x456 [libafs]
 [<c011ed5a>] default_wake_function+0x0/0x12
 [<f9ba7819>] afs_osi_Sleep+0x96/0xbb [libafs]
 [<c010788c>] do_gettimeofday+0x1e/0xbf
 [<f9b58325>] afs_CacheTruncateDaemon+0x10b/0x456 [libafs]
 [<f9bac7b0>] afsd_thread+0x3d0/0x656 [libafs]
 [<f9bac3e0>] afsd_thread+0x0/0x656 [libafs]
 [<c0101401>] kernel_thread_helper+0x5/0xb
Code: 31 bd f9 7c ec 8b 84 24 74 01 00 00 85 c0 0f 8e 55 01 00 00 8b 0d 44 31 bd f9 e9 39 fb ff ff a1 64 31 bd f9 8b 1c b0 85 db 74 0b <66> 83 7b 70 00 0f 85 5a fb ff ff a1 e4 31 bd f9 80 e2 08 8b 3c

>>EIP; f9b58c77 <pg0+3959fc77/3fa45400>   <=====

>>eax; f9bd34d4 <pg0+3961a4d4/3fa45400>
>>esp; cea27e30 <pg0+e46ee30/3fa45400>

Trace; c011cb47 <recalc_task_prio+8e/155>
Trace; c011cc9e <activate_task+90/a4>
Trace; c042aa62 <schedule+3c6/c81>
Trace; c042a514 <__down+cc/db>
Trace; c011ed5a <default_wake_function+0/12>
Trace; c0137a91 <remove_wait_queue+1a/4a>
Trace; f9ba772c <pg0+395ee72c/3fa45400>
Trace; f9b5821a <pg0+3959f21a/3fa45400>
Trace; c011ed5a <default_wake_function+0/12>
Trace; f9ba7819 <pg0+395ee819/3fa45400>
Trace; c010788c <do_gettimeofday+1e/bf>
Trace; f9b58325 <pg0+3959f325/3fa45400>
Trace; f9bac7b0 <pg0+395f37b0/3fa45400>
Trace; f9bac3e0 <pg0+395f33e0/3fa45400>
Trace; c0101401 <kernel_thread_helper+5/b>

This architecture has variable length instructions, decoding before eip
is unreliable, take these instructions with a pinch of salt.

Code; f9b58c4c <pg0+3959fc4c/3fa45400>
00000000 <_EIP>:
Code; f9b58c4c <pg0+3959fc4c/3fa45400>
   0:   31 bd f9 7c ec 8b         xor    %edi,0x8bec7cf9(%ebp)
Code; f9b58c52 <pg0+3959fc52/3fa45400>
   6:   84 24 74                  test   %ah,(%esp,%esi,2)
Code; f9b58c55 <pg0+3959fc55/3fa45400>
   9:   01 00                     add    %eax,(%eax)
Code; f9b58c57 <pg0+3959fc57/3fa45400>
   b:   00 85 c0 0f 8e 55         add    %al,0x558e0fc0(%ebp)
Code; f9b58c5d <pg0+3959fc5d/3fa45400>
  11:   01 00                     add    %eax,(%eax)
Code; f9b58c5f <pg0+3959fc5f/3fa45400>
  13:   00 8b 0d 44 31 bd         add    %cl,0xbd31440d(%ebx)
Code; f9b58c65 <pg0+3959fc65/3fa45400>
  19:   f9                        stc
Code; f9b58c66 <pg0+3959fc66/3fa45400>
  1a:   e9 39 fb ff ff            jmp    fffffb58 <_EIP+0xfffffb58>
Code; f9b58c6b <pg0+3959fc6b/3fa45400>
  1f:   a1 64 31 bd f9            mov    0xf9bd3164,%eax
Code; f9b58c70 <pg0+3959fc70/3fa45400>
  24:   8b 1c b0                  mov    (%eax,%esi,4),%ebx
Code; f9b58c73 <pg0+3959fc73/3fa45400>
  27:   85 db                     test   %ebx,%ebx
Code; f9b58c75 <pg0+3959fc75/3fa45400>
  29:   74 0b                     je     36 <_EIP+0x36>

This decode from eip onwards should be reliable

Code; f9b58c77 <pg0+3959fc77/3fa45400>
00000000 <_EIP>:
Code; f9b58c77 <pg0+3959fc77/3fa45400>   <=====
   0:   66 83 7b 70 00            cmpw   $0x0,0x70(%ebx)   <=====
Code; f9b58c7c <pg0+3959fc7c/3fa45400>
   5:   0f 85 5a fb ff ff         jne    fffffb65 <_EIP+0xfffffb65>
Code; f9b58c82 <pg0+3959fc82/3fa45400>
   b:   a1 e4 31 bd f9            mov    0xf9bd31e4,%eax
Code; f9b58c87 <pg0+3959fc87/3fa45400>
  10:   80 e2 08                  and    $0x8,%dl
Code; f9b58c8a <pg0+3959fc8a/3fa45400>
  13:   8b                        .byte 0x8b
Code; f9b58c8b <pg0+3959fc8b/3fa45400>
  14:   3c                        .byte 0x3c

1 error issued. Results may not be reliable.
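
A quick sanity check on the numbers in the oops above, as a rough sketch only. The faulting instruction is cmpw $0x0,0x70(%ebx), and ebx holds 0xd7, so the access at 0x147 is ebx plus the 0x70 displacement. In other words, ebx looks like a small garbage value rather than a true NULL pointer, which would explain why the test %ebx,%ebx just before it (if that part of the decode can be trusted) did not catch it. The second function rests on two assumptions that are not stated in this report: that the cacheinfo size is counted in 1 KiB blocks and that -chunksize is a power-of-two exponent in bytes. Under those assumptions the cache works out to 31250 chunks, which happens to equal the value in ecx (0x7a12). That may be a coincidence. The function and constant names are made up for illustration.

```python
# Values copied from the oops report.
EBX = 0x000000D7             # ebx at the time of the fault
DISPLACEMENT = 0x70          # from "cmpw $0x0,0x70(%ebx)"
REPORTED_FAULT = 0x00000147  # "virtual address 00000147"
ECX = 0x00007A12             # ecx at the time of the fault

# Values copied from the afsd command line and cacheinfo.
CACHE_SIZE = 500000          # third field of cacheinfo
CHUNK_EXPONENT = 14          # argument of -chunksize


def fault_address(base, displacement):
    """Effective address of a base+displacement access on 32-bit x86."""
    return (base + displacement) & 0xFFFFFFFF


def chunk_count(cache_size_kib, chunk_exponent):
    """Number of whole chunks in the cache.

    ASSUMPTION: cache size is in 1 KiB blocks and the chunk size is
    2**chunk_exponent bytes. Neither is stated in the report itself.
    """
    return (cache_size_kib * 1024) >> chunk_exponent


if __name__ == "__main__":
    addr = fault_address(EBX, DISPLACEMENT)
    print("fault address: 0x%08x (reported 0x%08x)" % (addr, REPORTED_FAULT))

    chunks = chunk_count(CACHE_SIZE, CHUNK_EXPONENT)
    print("chunks at cachesize %d: %d (0x%x), ecx = 0x%x"
          % (CACHE_SIZE, chunks, chunks, ECX))

    # The smaller cache size that does not trigger the problem as easily.
    print("chunks at cachesize 50000: %d" % chunk_count(50000, CHUNK_EXPONENT))
```

If the ecx match is not a coincidence, the crash may be tied to the number of cache entries, which would fit with it not showing up as easily at cachesize 50000. The arithmetic alone does not prove that.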

Cheers,
Stefaan
_______________________________________________
OpenAFS-devel mailing list
OpenAFS-devel@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-devel