Re: wm0 panic
On Sat, Jun 27, 2020 at 04:24:21PM +0100, Patrick Welche wrote:
> Trying today's -current/amd64 with DIAGNOSTIC/DEBUG/LOCKDEBUG, I can
> boot multiuser without a network. If I log in as root, as soon as I hit
> enter:
>
> # ifconfig wm0 inet 10.0.0.62 netmask 0xff00
> [ 127.5763268] Kernel lock error 127.5763268] lock address :
> 0x8106ab40 type : spin

I can't reproduce this after
http://mail-index.netbsd.org/source-changes/2020/07/07/msg119158.html

Cheers,

Patrick
Re: wm0 panic
Hello, all.

On 2020/07/06 17:03, Masanobu SAITOH wrote:
> Hi, all.
>
> On 2020/06/29 12:53, Kengo NAKAHARA wrote:
>> Hi,
>>
>> On 2020/06/28 0:24, Patrick Welche wrote:
>>> Trying today's -current/amd64 with DIAGNOSTIC/DEBUG/LOCKDEBUG, I can
>>> boot multiuser without a network. If I log in as root, as soon as I hit
>>> enter:
>>>
>>> # ifconfig wm0 inet 10.0.0.62 netmask 0xff00
>>> [...]
>>
>> It seems some other code has held KERNEL_LOCK for too long.
>> Could you show the function of the last locked address?
>> # e.g. addr2line -e "your kernel image" -f 0x80a7d2f5
>>
>> If the panic can reappear, could you show "show all locks/t" of ddb?
>>
>> Thanks,
>
> It seems this problem is the same as the one in the following mail:
>
> http://mail-index.netbsd.org/current-users/2020/06/03/msg038785.html

Patrick and I tried to find the root cause of this problem off-list.
(Note that I can't reproduce the same problem on my machines.)

Every time Patrick got this problem, ifconfig had tried to take
module_hook.mtx, which was initialized in module_hook_init(). It seems
ddb can't trace the stack of ifconfig. Was the process in a module?
Is there a way to know what's happening in it?

--
[ 22.3977608] Kernel lock error: _kernel_lock,244: spinout
[ 22.4077593] lock address : 0x818a9600 type : spin
[ 22.4177593] initialized : 0x80e7e7e9
[ 22.4277592] shared holds : 0 exclusive: 1
[ 22.4477593] shares wanted: 0 exclusive: 2
[ 22.4577592] relevant cpu : 6 last held: 0
[ 22.4677593] relevant lwp : 0x850788f52140 last held: 0x85078a413b40
[ 22.4877595] last locked* : 0x80db867b unlocked : 0x80db866c
[ 22.4977596] curcpu holds : 0 wanted by: 0x850788f52140
[ 22.5077596] panic: LOCKDEBUG: Kernel lock error: _kernel_lock,244: spinout
[ 22.5277594] cpu6: Begin traceback...
[ 22.5377595] vpanic() at netbsd:vpanic+0x152
[ 22.5377595] snprintf() at netbsd:snprintf
[ 22.5477594] lockdebug_more() at netbsd:lockdebug_more
[ 22.5577621] _kernel_lock() at netbsd:_kernel_lock+0x244
[ 22.5777595] frag6_fasttimo() at netbsd:frag6_fasttimo+0x1a
[ 22.5877596] pffasttimo() at netbsd:pffasttimo+0x34
[ 22.5977596] callout_softclock() at netbsd:callout_softclock+0x10f
[ 22.6077597] softint_dispatch() at netbsd:softint_dispatch+0x108
[ 22.6177598] DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0x96025d0d1ff0
[ 22.6377596] Xsoftintr() at netbsd:Xsoftintr+0x4f
[ 22.6377596] --- interrupt ---
[ 22.6477596] d787ba705305aa3:
[ 22.6577597] cpu6: End traceback...
[ 22.6577597] fatal breakpoint trap in supervisor mode
[ 22.6677597] trap type 1 code 0 rip 0x8022098d cs 0x8 rflags 0x202 cr2 0 ilevel 0x2 rsp 0x96025d0d1d70
[ 22.6877599] curlwp 0x850788f52140 pid 0.118 lowest kstack 0x96025d0cd2c0
Stopped in pid 0.118 (system) at netbsd:breakpoint+0x5: leave
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x152
snprintf() at netbsd:snprintf
lockdebug_more() at netbsd:lockdebug_more
_kernel_lock() at netbsd:_kernel_lock+0x244
frag6_fasttimo() at netbsd:frag6_fasttimo+0x1a
pffasttimo() at netbsd:pffasttimo+0x34
callout_softclock() at netbsd:callout_softclock+0x10f
softint_dispatch() at netbsd:softint_dispatch+0x108
DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0x96025d0d1ff0
Xsoftintr() at netbsd:Xsoftintr+0x4f
--- interrupt ---
d787ba705305aa3:
ds 1d80
es 1d30
fs 1d70
gs 10
rdi 2
rsi 2d5
rbp 96025d0d1d70
rbx 104
rdx 1
rcx 2
rax 0
r8 104
r9
Re: wm0 panic
Hi, all.

On 2020/06/29 12:53, Kengo NAKAHARA wrote:
> Hi,
>
> On 2020/06/28 0:24, Patrick Welche wrote:
>> Trying today's -current/amd64 with DIAGNOSTIC/DEBUG/LOCKDEBUG, I can
>> boot multiuser without a network. If I log in as root, as soon as I hit
>> enter:
>>
>> # ifconfig wm0 inet 10.0.0.62 netmask 0xff00
>> [...]
>
> It seems some other code has held KERNEL_LOCK for too long.
> Could you show the function of the last locked address?
> # e.g. addr2line -e "your kernel image" -f 0x80a7d2f5
>
> If the panic can reappear, could you show "show all locks/t" of ddb?
>
> Thanks,

It seems this problem is the same as the one in the following mail:

http://mail-index.netbsd.org/current-users/2020/06/03/msg038785.html

--
---
SAITOH Masanobu (msai...@execsw.org
                 msai...@netbsd.org)
Re: wm0 panic
On Mon, Jun 29, 2020 at 12:53:23PM +0900, Kengo NAKAHARA wrote:
> It seems some other code has held KERNEL_LOCK for too long.
> Could you show the function of the last locked address?
> # e.g. addr2line -e "your kernel image" -f 0x80a7d2f5

With the Jun 28 14:26 code:

# addr2line -e netbsd.3.gdb -f 0x80a4c526
doifioctl
/usr/src/sys/arch/amd64/compile/QUANTZDBG/../../../../net/if.c:3403 (discriminator 3)

> If the panic can reappear, could you show "show all locks/t" of ddb?

It is nicely reproducible (boot single user, type "ifconfig wm0 up"). I
have a core dump and a serial console, but debugging locking issues is
"interesting"!

Thanks,

Patrick

type : spin
initialized : 0x80ada119
shared holds : 0 exclusive: 1
shares wanted: 0 exclusive: 3
relevant cpu : 1 last held: 0
relevant lwp : 0xf1c63767f200 last held: 0xf1c6387c8a40
last locked* : 0x80a4c526 unlocked : 0x80a4c517
curcpu holds : 0 wanted by: 0xf1c63767f200

db{1}> show all locks /t
[Locks tracked through LWPs]
** LWP 330.330 (ifconfig) @ 0xf1c6387c8a40, l_stat=7
*** Locks held:
* Lock 0 (ick address : 0xf1c637a4e380
type : sleep/adaptive
initialized : 0x80a475fd
shared holds : 0 exclusive: 1
shares wanted: 0 exclusive: 0
relevant cpu : 0 last held: 0
relevant lwp : 0xf1c6388a40
last locked* : 0x80a4bf94 unlocked : 0x80a4c02b
owner field : 0xf1c6387c8a40 wait/spin:0/0
Turnstile: no active turnstile for this lock.
*** Loczed at module_hook_init)
lock address : 0x8106a800
type : sleep/adaptive
initialized : 0x80952c6e
shared holds : 0 exclusive: 0
shares wanted: 0 exclusive: 0
relevant cpu : 0 last held: 0
relevant lwp : 0xf1c6387c8a40 last held: 00
last locked : 00 unlocked*: 00
owner field : 00 wait/spin:0/0
Turnstile: no active turnstile for this lock.
*** Traceback:
trace: pid 330 lid 330 at 0x8
address 0x283 is invalid
?() at 283
address 0x10 is invalid
address 0x8 is invalid
db_printf() at netbsd:db_printf
** LWP 0.402 (iic1) @ 0xf1c637f1aa40, l_stat=7
*** Locks held: none
*** Locks wanted:
* Lock 0 (initialized at main)
lock address : 0x8106a700
type :0x80ada119
shared holds : 0 exclusive: 1
shares wanted: 0 exclusive: 3
relevant cpu : 2 last hellwp : 0xf1c637f1aa40 last held: 0xf1c6387c8a40
last locked* : 0x80a4c526 unlocked : 0x80a4c517
curcpu holds : 0 wanted by: 0xf1c63767f200
*** 02 at 0xb0025da16ec0
sleepq_block() at netbsd:sleepq_block+0x211
iic_smbus_intr_thread() at netbsd:iic_smbus_intr_thread+0x52
** LWP 0.401 (iic0) @ 0xf1c637f1a600, l_stat=7
*** Locks held: none
*** Locks wanted:
* Lock 0 (initiax8106a700 type : spin
initialized : 0x80ada119
shared holds : 0 exclusive: 1
shares wanted: 0 exclusive: 3
relevant cpu : 1 last held: 0
relevant lwp : 0xf1c637f1a600 last held: 0xf1c6387c8a40
last locked* : 0x80a4c526 unlocked : 0x80a4c517
curcpu holds : 0 wanted by: 0xf1c63767f200
*** Traceback:
trace: pid 0 lid 401 at 0xb0025da11ec0
sleepq_block() at netbsd:sleepq_block+0x211
iic_smbus_intr_thread() at netbsd:iic_smbus_intr_thread+0x52
** LWP 0.23 (softclk/1) @ 0xf1c63767f200, l_stat=7
*** Locks held:
* Lock 0 (initialized at soinit)
lock address : 0xf1cd177e3080
type : sleep/adaptive
initialed holds : 0 exclusive: 1
shares wanted: 0 exclusive: 0
relevant cpu : 1 last held: 1 r last held: 0xf1c63767f200
last locked* : 0x806c3e65 unlocked : 0x806d5ebd
owner field : 0xf1c63767f200 wait/spin:0/0
Turnstile: no active turnstileted:
* Lock 0 (initialized at main)
lock address : 0x8106a700
type : spin
initialized : 0x80ada119
shared holds : 0 exclusive: 1
shares wanted: 0 exclusive: 3
relevant cpu : 1 last held: 0
relevant lwp : 0xf1c63767f200 last held: 0xf1c6387c8a40
last locked* : 0x80a4c526 unlocked :
Re: wm0 panic
Hi,

On 2020/06/28 0:24, Patrick Welche wrote:
> Trying today's -current/amd64 with DIAGNOSTIC/DEBUG/LOCKDEBUG, I can
> boot multiuser without a network. If I log in as root, as soon as I hit
> enter:
>
> # ifconfig wm0 inet 10.0.0.62 netmask 0xff00
> [...]

It seems some other code has held KERNEL_LOCK for too long.
Could you show the function of the last locked address?
# e.g. addr2line -e "your kernel image" -f 0x80a7d2f5

If the panic can reappear, could you show "show all locks/t" of ddb?

Thanks,
--
//
Internet Initiative Japan Inc.

Device Engineering Section,
Product Development Department,
Product Division,
Technology Unit

Kengo NAKAHARA
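Kengo's reading of the report is that `_kernel_lock()` gave up ("spinout") after spinning too long on a lock that another LWP kept holding, which is why the "last locked*" address, the place where the current holder took the lock, is the one worth resolving with addr2line. The C sketch below is a user-space toy, not NetBSD's LOCKDEBUG code: `struct dbg_spinlock`, `dbg_spin_lock()` and `dbg_spin_unlock()` are hypothetical names, and the spin is bounded by a plain iteration count as a simplification of whatever spinout limit the kernel applies. It only illustrates the idea of recording who last took a lock and naming that holder when a waiter gives up.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical debug spin lock: a test-and-set flag plus a tag naming
 * the place that last acquired it (compare "last locked*" above). */
struct dbg_spinlock {
	atomic_flag held;
	_Atomic(const char *) last_locked;
};

#define DBG_SPINLOCK_INIT { ATOMIC_FLAG_INIT, NULL }

/*
 * Spin at most max_spins times trying to take the lock.  On success,
 * record the caller's tag and return true.  On "spinout", report the
 * recorded holder and return false (a LOCKDEBUG kernel panics instead).
 */
static bool
dbg_spin_lock(struct dbg_spinlock *l, const char *who, unsigned long max_spins)
{
	for (unsigned long n = 0; n < max_spins; n++) {
		if (!atomic_flag_test_and_set_explicit(&l->held,
		    memory_order_acquire)) {
			atomic_store_explicit(&l->last_locked, who,
			    memory_order_relaxed);
			return true;
		}
	}
	const char *holder = atomic_load_explicit(&l->last_locked,
	    memory_order_relaxed);
	fprintf(stderr, "spinout: %s gave up, last locked by %s\n",
	    who, holder != NULL ? holder : "(unknown)");
	return false;
}

/* Release the lock; the last_locked tag is kept for later reports. */
static void
dbg_spin_unlock(struct dbg_spinlock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Used single-threaded, `dbg_spin_lock(&kl, "doifioctl", 10)` succeeds, and a following `dbg_spin_lock(&kl, "ip_slowtimo", 1000)` while the lock is still held returns false and prints `spinout: ip_slowtimo gave up, last locked by doifioctl` to stderr. The two tags are borrowed from the reports in this thread purely as labels; in the real panics the waiter and the holder run on different CPUs.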
Re: wm0 panic
On Sat, Jun 27, 2020 at 04:24:21PM +0100, Patrick Welche wrote:
> (must try with biosboot instead of EFI which is the case here)

Makes no difference.
wm0 panic
Trying today's -current/amd64 with DIAGNOSTIC/DEBUG/LOCKDEBUG, I can
boot multiuser without a network. If I log in as root, as soon as I hit
enter:

# ifconfig wm0 inet 10.0.0.62 netmask 0xff00
[ 127.5763268] Kernel lock error 127.5763268] lock address : 0x8106ab40 type : spin
[ 127.5863237] initialized : 0x80b0bbb9
[ 127.5863237] shared holds : 0 exclusive: 1
[ 127.5963238] shares wanted: 0 exclusive: 1
[ 127.6063236] relevant cpu : 1 last held: 0
[ 127.6163235] relevant lwp : 0x8d419a07f20
[ 127.6163235] last locked* : 0x80a7d2f5 unlocked : 0x80a7d2e6
[ 127.6263235] curcpu holds : 0 wanted by: 0x8d419a07f200
[ 127.6363234] panic: LOCKDEBock,244: spinout
[ 127.6363234] cpu1: Begin traceback...
[ 127.6463233] vpanic() at netbsd:vpanic+0x152
[ 127.6463233] snprintf() at netbsd:snprintf
[ 127.6563232] lockdebug_more() at netbsd:lockdebug_more
[ 127.6563232] _kernel_lock() at netbsd:_kernel_lock+0x244
[ 127.6663231] ip_slowtimo() at netbsd:ip_slowtimo+0x1a
[ 127.6763231] pfslowtimo() at netbsd:pfslowtimo+0x34
[ 127.6763231] callout_softclock() at netbsd:callout_softclock+0x10f
[ 127.6863230] softint_disph+0x108
[ 127.6863230] DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xa4825d02eff0
[ 127.6963230] Xsoftintr() at netbsd:Xsoftintr+0x4f
[ 127.7063229] --- interrupt ---
[ 127.706322traceback...

(box is happily usable without LOCKDEBUG; it just means I can't debug
what I'm trying to get at...)

(must try with biosboot instead of EFI which is the case here)

wm0 at pci7 dev 0 function 0: I211 Ethernet (COPPER) (rev. 0x03)
wm0: for TX and RX interrupting at msix3 vec 0 affinity to 1
wm0: for TX and RX interrupting at msix3 vec 1 affinity to 2
wm0: for LINK interrupting at msix3 vec 2
wm0: PCI-Express bus
wm0: 64 words iNVM, version 0.6
wm0: Ethernet address 60:45:cb:9e:13:dd
wm0: COMPAT =
wm0: Copper
wm0: 0xc614420
makphy0 at wm0 phy 1: I210 10/100/1000 media interface, rev. 0

# strings /netbsd | grep if_wm.c
$NetBSD: if_wm.c,v 1.679 2020/06/27 13:32:00 jmcneill Exp $

Cheers,

Patrick