Re: wm0 panic

2020-07-23 Thread Patrick Welche
On Sat, Jun 27, 2020 at 04:24:21PM +0100, Patrick Welche wrote:
> Trying today's -current/amd64 with DIAGNOSTIC/DEBUG/LOCKDEBUG, I can
> boot multiuser without a network. If I log in as root, as soon as I hit
> enter:
> 
> # ifconfig wm0 inet 10.0.0.62 netmask 0xff00
> [ 127.5763268] Kernel lock error: _kernel_lock,244: spinout
> [ 127.5763268] lock address : 0x8106ab40 type :   spin

I can no longer reproduce this after the following commit:
 
  http://mail-index.netbsd.org/source-changes/2020/07/07/msg119158.html
 
Cheers,
 
Patrick



Re: wm0 panic

2020-07-13 Thread SAITOH Masanobu

Hello, all.

On 2020/07/06 17:03, Masanobu SAITOH wrote:
> Hi, all.
>
> On 2020/06/29 12:53, Kengo NAKAHARA wrote:
>> Hi,
>>
>> On 2020/06/28 0:24, Patrick Welche wrote:
>>> Trying today's -current/amd64 with DIAGNOSTIC/DEBUG/LOCKDEBUG, I can
>>> boot multiuser without a network. If I log in as root, as soon as I hit
>>> enter:
>>>
>>> # ifconfig wm0 inet 10.0.0.62 netmask 0xff00
>>> [ 127.5763268] Kernel lock error: _kernel_lock,244: spinout
>>> [ 127.5763268] lock address : 0x8106ab40 type :   spin
>>> [ 127.5863237] initialized  : 0x80b0bbb9
>>> [ 127.5863237] shared holds :  0 exclusive:  1
>>> [ 127.5963238] shares wanted:  0 exclusive:  1
>>> [ 127.6063236] relevant cpu :  1 last held:  0
>>> [ 127.6163235] relevant lwp : 0x8d419a07f20
>>> [ 127.6163235] last locked* : 0x80a7d2f5 unlocked : 0x80a7d2e6
>>> [ 127.6263235] curcpu holds :  0 wanted by: 0x8d419a07f200
>>> [ 127.6363234] panic: LOCKDEBUG: Kernel lock error: _kernel_lock,244: spinout
>>> [ 127.6363234] cpu1: Begin traceback...
>>> [ 127.6463233] vpanic() at netbsd:vpanic+0x152
>>> [ 127.6463233] snprintf() at netbsd:snprintf
>>> [ 127.6563232] lockdebug_more() at netbsd:lockdebug_more
>>> [ 127.6563232] _kernel_lock() at netbsd:_kernel_lock+0x244
>>> [ 127.6663231] ip_slowtimo() at netbsd:ip_slowtimo+0x1a
>>> [ 127.6763231] pfslowtimo() at netbsd:pfslowtimo+0x34
>>> [ 127.6763231] callout_softclock() at netbsd:callout_softclock+0x10f
>>> [ 127.6863230] softint_dispatch() at netbsd:softint_dispatch+0x108
>>> [ 127.6863230] DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xa4825d02eff0
>>> [ 127.6963230] Xsoftintr() at netbsd:Xsoftintr+0x4f
>>> [ 127.7063229] --- interrupt ---
>>> [ 127.706322traceback...
>>
>> It seems some other code has held KERNEL_LOCK for too long.
>> Could you show the function at the last locked address?
>> # e.g. addr2line -e "your kernel image" -f 0x80a7d2f5
>>
>> If the panic is reproducible, could you show the output of
>> "show all locks/t" in ddb?
>>
>> Thanks,
>
> It seems this problem is the same as the one in the following mail:
>
> http://mail-index.netbsd.org/current-users/2020/06/03/msg038785.html


Patrick and I tried to find the root cause of this problem off-list.
(Note that I can't reproduce the problem on my own machines.)

Every time Patrick hit this problem, ifconfig was trying to take
module_hook.mtx, which is initialized in module_hook_init().
It seems ddb can't trace the stack of ifconfig. Was the process
in a module? Is there a way to find out what is happening in it?


--
[  22.3977608] Kernel lock error: _kernel_lock,244: spinout

[  22.4077593] lock address : 0x818a9600 type :   spin
[  22.4177593] initialized  : 0x80e7e7e9
[  22.4277592] shared holds :  0 exclusive:  1
[  22.4477593] shares wanted:  0 exclusive:  2
[  22.4577592] relevant cpu :  6 last held:  0
[  22.4677593] relevant lwp : 0x850788f52140 last held: 0x85078a413b40
[  22.4877595] last locked* : 0x80db867b unlocked : 0x80db866c
[  22.4977596] curcpu holds :  0 wanted by: 0x850788f52140

[  22.5077596] panic: LOCKDEBUG: Kernel lock error: _kernel_lock,244: spinout
[  22.5277594] cpu6: Begin traceback...
[  22.5377595] vpanic() at netbsd:vpanic+0x152
[  22.5377595] snprintf() at netbsd:snprintf
[  22.5477594] lockdebug_more() at netbsd:lockdebug_more
[  22.5577621] _kernel_lock() at netbsd:_kernel_lock+0x244
[  22.5777595] frag6_fasttimo() at netbsd:frag6_fasttimo+0x1a
[  22.5877596] pffasttimo() at netbsd:pffasttimo+0x34
[  22.5977596] callout_softclock() at netbsd:callout_softclock+0x10f
[  22.6077597] softint_dispatch() at netbsd:softint_dispatch+0x108
[  22.6177598] DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0x96025d0d1ff0
[  22.6377596] Xsoftintr() at netbsd:Xsoftintr+0x4f
[  22.6377596] --- interrupt ---
[  22.6477596] d787ba705305aa3:
[  22.6577597] cpu6: End traceback...
[  22.6577597] fatal breakpoint trap in supervisor mode
[  22.6677597] trap type 1 code 0 rip 0x8022098d cs 0x8 rflags 0x202 cr2 0 ilevel 0x2 rsp 0x96025d0d1d70
[  22.6877599] curlwp 0x850788f52140 pid 0.118 lowest kstack 0x96025d0cd2c0
Stopped in pid 0.118 (system) at netbsd:breakpoint+0x5:  leave
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x152
snprintf() at netbsd:snprintf
lockdebug_more() at netbsd:lockdebug_more
_kernel_lock() at netbsd:_kernel_lock+0x244
frag6_fasttimo() at netbsd:frag6_fasttimo+0x1a
pffasttimo() at netbsd:pffasttimo+0x34
callout_softclock() at netbsd:callout_softclock+0x10f
softint_dispatch() at netbsd:softint_dispatch+0x108
DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0x96025d0d1ff0
Xsoftintr() at netbsd:Xsoftintr+0x4f
--- interrupt ---
d787ba705305aa3:
ds  1d80
es  1d30
fs  1d70
gs  10
rdi 2
rsi 2d5
rbp 96025d0d1d70
rbx 104
rdx 1
rcx 2
rax 0
r8  104
r9   

Re: wm0 panic

2020-07-06 Thread Masanobu SAITOH
Hi, all.

On 2020/06/29 12:53, Kengo NAKAHARA wrote:
> Hi,
> 
> On 2020/06/28 0:24, Patrick Welche wrote:
>> Trying today's -current/amd64 with DIAGNOSTIC/DEBUG/LOCKDEBUG, I can
>> boot multiuser without a network. If I log in as root, as soon as I hit
>> enter:
>>
>> # ifconfig wm0 inet 10.0.0.62 netmask 0xff00
>> [ 127.5763268] Kernel lock error: _kernel_lock,244: spinout
>> [ 127.5763268] lock address : 0x8106ab40 type :   spin
>> [ 127.5863237] initialized  : 0x80b0bbb9
>> [ 127.5863237] shared holds :  0 exclusive:  1
>> [ 127.5963238] shares wanted:  0 exclusive:  1
>> [ 127.6063236] relevant cpu :  1 last held:  0
>> [ 127.6163235] relevant lwp : 0x8d419a07f20
>> [ 127.6163235] last locked* : 0x80a7d2f5 unlocked : 0x80a7d2e6
>> [ 127.6263235] curcpu holds :  0 wanted by: 0x8d419a07f200
>> [ 127.6363234] panic: LOCKDEBUG: Kernel lock error: _kernel_lock,244: spinout
>> [ 127.6363234] cpu1: Begin traceback...
>> [ 127.6463233] vpanic() at netbsd:vpanic+0x152
>> [ 127.6463233] snprintf() at netbsd:snprintf
>> [ 127.6563232] lockdebug_more() at netbsd:lockdebug_more
>> [ 127.6563232] _kernel_lock() at netbsd:_kernel_lock+0x244
>> [ 127.6663231] ip_slowtimo() at netbsd:ip_slowtimo+0x1a
>> [ 127.6763231] pfslowtimo() at netbsd:pfslowtimo+0x34
>> [ 127.6763231] callout_softclock() at netbsd:callout_softclock+0x10f
>> [ 127.6863230] softint_dispatch() at netbsd:softint_dispatch+0x108
>> [ 127.6863230] DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xa4825d02eff0
>> [ 127.6963230] Xsoftintr() at netbsd:Xsoftintr+0x4f
>> [ 127.7063229] --- interrupt ---
>> [ 127.706322traceback...
> It seems some other code has held KERNEL_LOCK for too long.
> Could you show the function at the last locked address?
> # e.g. addr2line -e "your kernel image" -f 0x80a7d2f5
> 
> If the panic is reproducible, could you show the output of
> "show all locks/t" in ddb?
> 
> 
> Thanks,

It seems this problem is the same as the one in the following mail:

http://mail-index.netbsd.org/current-users/2020/06/03/msg038785.html


-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: wm0 panic

2020-06-29 Thread Patrick Welche
On Mon, Jun 29, 2020 at 12:53:23PM +0900, Kengo NAKAHARA wrote:
> It seems some other code has held KERNEL_LOCK for too long.
> Could you show the function at the last locked address?
> # e.g. addr2line -e "your kernel image" -f 0x80a7d2f5

With Jun 28 14:26 code

# addr2line -e netbsd.3.gdb -f 0x80a4c526
doifioctl
/usr/src/sys/arch/amd64/compile/QUANTZDBG/../../../../net/if.c:3403 (discriminator 3)
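
For anyone who wants to script the lookup above, here is a minimal sketch in Python. It only wraps the same `addr2line -e <image> -f <address>` invocation used in this thread; the helper names `addr2line_command` and `resolve` are made up for illustration, and the second address in the example is the "unlocked" value from the lock dump, added purely as a second input.

```python
import shutil
import subprocess


def addr2line_command(kernel_image, addresses):
    """Build the argv for: addr2line -e <image> -f <addr> [<addr> ...]."""
    return ["addr2line", "-e", kernel_image, "-f", *addresses]


def resolve(kernel_image, addresses):
    """Run addr2line; return (function, file:line) pairs, or None if the tool is missing."""
    if shutil.which("addr2line") is None:
        return None
    out = subprocess.run(
        addr2line_command(kernel_image, addresses),
        capture_output=True, text=True, check=True,
    ).stdout
    lines = out.splitlines()
    # With -f, addr2line prints two lines per address: the function name,
    # then the file:line location.
    return list(zip(lines[0::2], lines[1::2]))


# Only print the command here; resolve() needs a real debug kernel image.
print(addr2line_command("netbsd.3.gdb", ["0x80a4c526", "0x80a4c517"]))
```

Calling `resolve("netbsd.3.gdb", ["0x80a4c526"])` against the matching debug kernel should give the same function and source line as the manual run, assuming the image corresponds to the kernel that panicked.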

> If the panic is reproducible, could you show the output of "show all locks/t" in ddb?

It is nicely reproducible (boot single user, type "ifconfig wm0 up"),
I have a core dump and a serial console, but debugging locking issues
is "interesting"!


Thanks,

Patrick
lock address : 0x8106a700 type :   spin
initialized  : 0x80ada119
shared holds :  0 exclusive:  1
shares wanted:  0 exclusive:  3
relevant cpu :  1 last held:  0
relevant lwp : 0xf1c63767f200 last held: 0xf1c6387c8a40
last locked* : 0x80a4c526 unlocked : 0x80a4c517
curcpu holds :  0 wanted by: 0xf1c63767f200

db{1}> show all locks /t
[Locks tracked through LWPs]

** LWP 330.330 (ifconfig) @ 0xf1c6387c8a40, l_stat=7

*** Locks held:

* Lock 0 (ick address : 0xf1c637a4e380 type : sleep/adaptive
initialized  : 0x80a475fd
shared holds :  0 exclusive:  1
shares wanted:  0 exclusive:  0
relevant cpu :  0 last held:  0
relevant lwp : 0xf1c6388a40
last locked* : 0x80a4bf94 unlocked : 0x80a4c02b
owner field  : 0xf1c6387c8a40 wait/spin:0/0
Turnstile: no active turnstile for this lock.

*** Locks wanted:

* Lock 0 (initialized at module_hook_init)
lock address : 0x8106a800 type : sleep/adaptive
initialized  : 0x80952c6e
shared holds :  0 exclusive:  0
shares wanted:  0 exclusive:  0
relevant cpu :  0 last held:  0
relevant lwp : 0xf1c6387c8a40 last held: 00
last locked  : 00 unlocked*: 00
owner field  : 00 wait/spin:0/0
Turnstile: no active turnstile for this lock.
   
*** Traceback:
   
trace: pid 330 lid 330 at 0x8
address 0x283 is invalid
?() at 283 
address 0x10 is invalid
address 0x8 is invalid
db_printf() at netbsd:db_printf
   
   
** LWP 0.402 (iic1) @ 0xf1c637f1aa40, l_stat=7
   
*** Locks held: none

*** Locks wanted:

* Lock 0 (initialized at main)
lock address : 0x8106a700 type :   spin
initialized  : 0x80ada119
shared holds :  0 exclusive:  1
shares wanted:  0 exclusive:  3
relevant cpu :  2 last held:  0
relevant lwp : 0xf1c637f1aa40 last held: 0xf1c6387c8a40
last locked* : 0x80a4c526 unlocked : 0x80a4c517
curcpu holds :  0 wanted by: 0xf1c63767f200

*** Traceback:

trace: pid 0 lid 402 at 0xb0025da16ec0
sleepq_block() at netbsd:sleepq_block+0x211
iic_smbus_intr_thread() at netbsd:iic_smbus_intr_thread+0x52


** LWP 0.401 (iic0) @ 0xf1c637f1a600, l_stat=7
   
*** Locks held: none

*** Locks wanted:

* Lock 0 (initialized at main)
lock address : 0x8106a700 type :   spin
initialized  : 0x80ada119
shared holds :  0 exclusive:  1
shares wanted:  0 exclusive:  3
relevant cpu :  1 last held:  0
relevant lwp : 0xf1c637f1a600 last held: 0xf1c6387c8a40
last locked* : 0x80a4c526 unlocked : 0x80a4c517
curcpu holds :  0 wanted by: 0xf1c63767f200

*** Traceback:

trace: pid 0 lid 401 at 0xb0025da11ec0
sleepq_block() at netbsd:sleepq_block+0x211
iic_smbus_intr_thread() at netbsd:iic_smbus_intr_thread+0x52


** LWP 0.23 (softclk/1) @ 0xf1c63767f200, l_stat=7
   
*** Locks held:

* Lock 0 (initialized at soinit)
lock address : 0xf1cd177e3080 type : sleep/adaptive
initialed holds :  0 exclusive:  1
shares wanted:  0 exclusive:  0
relevant cpu :  1 last held:  1
relevant lwp : 0xf1c63767f200 last held: 0xf1c63767f200
last locked* : 0x806c3e65 unlocked : 0x806d5ebd
owner field  : 0xf1c63767f200 wait/spin:0/0
Turnstile: no active turnstile for this lock.

*** Locks wanted:

* Lock 0 (initialized at main)
lock address : 0x8106a700 type :   spin
initialized  : 0x80ada119
shared holds :  0 exclusive:  1
shares wanted:  0 exclusive:  3
relevant cpu :  1 last held:  0
relevant lwp : 0xf1c63767f200 last held: 0xf1c6387c8a40
last locked* : 0x80a4c526 unlocked : 
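
Since the dump above is long, a rough Python sketch for summarising it may help. It assumes the "show all locks /t" layout seen in this thread ("** LWP" headers, "*** Locks held:" and "*** Locks wanted:" sections, "lock address :" lines); the function name `summarize` and the returned dict layout are hypothetical, and the sample is a short excerpt of the iic0 entry.

```python
import re

LWP_RE = re.compile(r"^\*\* LWP (\S+) \((.+?)\) @ (0x[0-9a-fA-F]+)")
LOCK_RE = re.compile(r"^lock address\s*:\s*(0x[0-9a-fA-F]+)\s+type\s*:\s*(\S+)")


def summarize(dump):
    """Map each LWP name to the lock addresses it holds and wants."""
    result = {}
    current = None  # entry for the LWP currently being parsed
    section = None  # "held", "wanted", or None
    for raw in dump.splitlines():
        line = raw.strip()
        m = LWP_RE.match(line)
        if m:
            current = {"addr": m.group(3), "held": [], "wanted": []}
            result[m.group(2)] = current
            section = None
        elif line.startswith("*** Locks held"):
            section = "held"
        elif line.startswith("*** Locks wanted"):
            section = "wanted"
        elif line.startswith("***"):
            section = None  # e.g. "*** Traceback:"
        else:
            m = LOCK_RE.match(line)
            if m and current is not None and section:
                current[section].append(m.group(1))
    return result


sample = """\
** LWP 0.401 (iic0) @ 0xf1c637f1a600, l_stat=7

*** Locks held: none

*** Locks wanted:

* Lock 0 (initialized at main)
lock address : 0x8106a700 type :   spin

*** Traceback:
"""
print(summarize(sample))
```

Run over the full dump, this should make it easier to see that several LWPs want the lock at 0x8106a700 while its "last held" field points at the ifconfig LWP. Garbled or truncated lines are simply skipped by the parser.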

Re: wm0 panic

2020-06-28 Thread Kengo NAKAHARA

Hi,

On 2020/06/28 0:24, Patrick Welche wrote:

> Trying today's -current/amd64 with DIAGNOSTIC/DEBUG/LOCKDEBUG, I can
> boot multiuser without a network. If I log in as root, as soon as I hit
> enter:
>
> # ifconfig wm0 inet 10.0.0.62 netmask 0xff00
> [ 127.5763268] Kernel lock error: _kernel_lock,244: spinout
> [ 127.5763268] lock address : 0x8106ab40 type :   spin
> [ 127.5863237] initialized  : 0x80b0bbb9
> [ 127.5863237] shared holds :  0 exclusive:  1
> [ 127.5963238] shares wanted:  0 exclusive:  1
> [ 127.6063236] relevant cpu :  1 last held:  0
> [ 127.6163235] relevant lwp : 0x8d419a07f20
> [ 127.6163235] last locked* : 0x80a7d2f5 unlocked : 0x80a7d2e6
> [ 127.6263235] curcpu holds :  0 wanted by: 0x8d419a07f200
> [ 127.6363234] panic: LOCKDEBUG: Kernel lock error: _kernel_lock,244: spinout
> [ 127.6363234] cpu1: Begin traceback...
> [ 127.6463233] vpanic() at netbsd:vpanic+0x152
> [ 127.6463233] snprintf() at netbsd:snprintf
> [ 127.6563232] lockdebug_more() at netbsd:lockdebug_more
> [ 127.6563232] _kernel_lock() at netbsd:_kernel_lock+0x244
> [ 127.6663231] ip_slowtimo() at netbsd:ip_slowtimo+0x1a
> [ 127.6763231] pfslowtimo() at netbsd:pfslowtimo+0x34
> [ 127.6763231] callout_softclock() at netbsd:callout_softclock+0x10f
> [ 127.6863230] softint_dispatch() at netbsd:softint_dispatch+0x108
> [ 127.6863230] DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xa4825d02eff0
> [ 127.6963230] Xsoftintr() at netbsd:Xsoftintr+0x4f
> [ 127.7063229] --- interrupt ---
> [ 127.706322traceback...

It seems some other code has held KERNEL_LOCK for too long.
Could you show the function at the last locked address?
# e.g. addr2line -e "your kernel image" -f 0x80a7d2f5

If the panic is reproducible, could you show the output of
"show all locks/t" in ddb?
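
To pull the address out of a saved console log instead of copying it by hand, something like the following Python sketch could be used. It assumes the "last locked" field looks the way it does in the LOCKDEBUG output quoted above; the name `last_locked_addresses` is hypothetical.

```python
import re

# Matches the "last locked" field of a LOCKDEBUG dump, with or without
# the trailing "*" that the dump prints after the field name.
LAST_LOCKED = re.compile(r"last locked\*?\s*:\s*(0x[0-9a-fA-F]+)")


def last_locked_addresses(log_text):
    """Return the unique 'last locked' addresses, in order of appearance."""
    seen = []
    for match in LAST_LOCKED.finditer(log_text):
        addr = match.group(1)
        if addr not in seen:
            seen.append(addr)
    return seen


sample = """\
[ 127.6163235] last locked* : 0x80a7d2f5 unlocked : 0x80a7d2e6
[ 127.6263235] curcpu holds :  0 wanted by: 0x8d419a07f200
"""
print(last_locked_addresses(sample))
```

The resulting addresses can then be passed to addr2line as in the example command above.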


Thanks,

--
//
Internet Initiative Japan Inc.

Device Engineering Section,
Product Development Department,
Product Division,
Technology Unit

Kengo NAKAHARA 


Re: wm0 panic

2020-06-27 Thread Patrick Welche
On Sat, Jun 27, 2020 at 04:24:21PM +0100, Patrick Welche wrote:
> (must try with biosboot instead of EFI which is the case here)
makes no difference


wm0 panic

2020-06-27 Thread Patrick Welche
Trying today's -current/amd64 with DIAGNOSTIC/DEBUG/LOCKDEBUG, I can
boot multiuser without a network. If I log in as root, as soon as I hit
enter:

# ifconfig wm0 inet 10.0.0.62 netmask 0xff00
[ 127.5763268] Kernel lock error: _kernel_lock,244: spinout
[ 127.5763268] lock address : 0x8106ab40 type :   spin
[ 127.5863237] initialized  : 0x80b0bbb9
[ 127.5863237] shared holds :  0 exclusive:  1
[ 127.5963238] shares wanted:  0 exclusive:  1
[ 127.6063236] relevant cpu :  1 last held:  0
[ 127.6163235] relevant lwp : 0x8d419a07f20
[ 127.6163235] last locked* : 0x80a7d2f5 unlocked : 0x80a7d2e6
[ 127.6263235] curcpu holds :  0 wanted by: 0x8d419a07f200
[ 127.6363234] panic: LOCKDEBUG: Kernel lock error: _kernel_lock,244: spinout
[ 127.6363234] cpu1: Begin traceback...
[ 127.6463233] vpanic() at netbsd:vpanic+0x152
[ 127.6463233] snprintf() at netbsd:snprintf
[ 127.6563232] lockdebug_more() at netbsd:lockdebug_more
[ 127.6563232] _kernel_lock() at netbsd:_kernel_lock+0x244
[ 127.6663231] ip_slowtimo() at netbsd:ip_slowtimo+0x1a
[ 127.6763231] pfslowtimo() at netbsd:pfslowtimo+0x34
[ 127.6763231] callout_softclock() at netbsd:callout_softclock+0x10f
[ 127.6863230] softint_dispatch() at netbsd:softint_dispatch+0x108
[ 127.6863230] DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xa4825d02eff0
[ 127.6963230] Xsoftintr() at netbsd:Xsoftintr+0x4f
[ 127.7063229] --- interrupt ---
[ 127.706322traceback...


(box is happily usable without the LOCKDEBUG - it just means I can't debug
what I'm trying to get at...)
(must try with biosboot instead of EFI which is the case here)

wm0 at pci7 dev 0 function 0: I211 Ethernet (COPPER) (rev. 0x03)
wm0: for TX and RX interrupting at msix3 vec 0 affinity to 1
wm0: for TX and RX interrupting at msix3 vec 1 affinity to 2
wm0: for LINK interrupting at msix3 vec 2
wm0: PCI-Express bus
wm0: 64 words iNVM, version 0.6
wm0: Ethernet address 60:45:cb:9e:13:dd
wm0: COMPAT = 
wm0: Copper
wm0: 0xc614420
makphy0 at wm0 phy 1: I210 10/100/1000 media interface, rev. 0

# strings /netbsd | grep if_wm.c
$NetBSD: if_wm.c,v 1.679 2020/06/27 13:32:00 jmcneill Exp $
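
The same check can be done without strings(1). The Python sketch below scans raw bytes for `$NetBSD: ... $` ident strings and keeps those for a given source file; the function name `netbsd_idents` is made up, and the sample bytes simply embed the ident line shown above.

```python
import re

# A $NetBSD$ ident string: "$NetBSD: <file>,v <rev> <date> <time> <user> Exp $"
IDENT_RE = re.compile(rb"\$NetBSD: ([^$]*?) \$")


def netbsd_idents(data, filename):
    """Return the $NetBSD$ ident strings in `data` that belong to `filename`."""
    needle = filename.encode() + b",v"
    return [
        m.group(0).decode("ascii", "replace")
        for m in IDENT_RE.finditer(data)
        if m.group(1).startswith(needle)
    ]


blob = b"\x00junk\x00$NetBSD: if_wm.c,v 1.679 2020/06/27 13:32:00 jmcneill Exp $\x00"
print(netbsd_idents(blob, "if_wm.c"))
```

For a real kernel, `data` would come from something like `open("/netbsd", "rb").read()`.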



Cheers,

Patrick