On 2017/11/13 13:17, Martin Pieuchot wrote:
> On 13/11/17(Mon) 10:03, Stuart Henderson wrote:
> > On 2017/11/13 08:44, Martin Pieuchot wrote:
> > > On 12/11/17(Sun) 22:10, Stuart Henderson wrote:
> > > > On 2017/11/12 22:48, Martin Pieuchot wrote:
> > > > > On 12/11/17(Sun) 21:30, Stuart Henderson wrote:
> > > > > > iked box, GENERIC.MP + WITNESS, -current as of Friday 10th:
> > > > > 
> > > > > Weird, did you tweak "kern.splassert" on this box?  Otherwise it
> > > > > looks like a major corruption.
> > > > 
> > > > It would have kern.splassert=2. (I know this can cause problems
> > > > sometimes, though this would be the first time in 5+ years I've bumped
> > > > into it, most of my routers where I have serial console have this set).
> > > 
> > > Well, the panic below corresponds to a value of 0 or > 3.
> > 
> > Confirmed, it was definitely set to 2.
> 
> So it seems that two of your CPUs end up looking at/dealing with
> corrupted memory...

Is that for sure? 2 does normally print a trace, 3 also drops into ddb.
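
For reference, the levels mentioned so far can be written down as a small
lookup. This is only a toy model of what is said in this thread, not the
kernel code: level 1 ("message only") is my assumption from reading
sysctl(2), and the names SPLASSERT_ACTIONS and splassert_action are made up
for illustration.

```python
# Toy model of the kern.splassert levels discussed in this thread.
# Levels 2 and 3 are as described above; level 1 is an assumption.
SPLASSERT_ACTIONS = {
    1: "print message",
    2: "print message and stack trace",
    3: "print message and stack trace, then enter ddb",
}

def splassert_action(level):
    """Return what a failed assertion does at this level.

    Per the discussion above, 0 or anything above 3 ends in a panic
    on this code path.
    """
    return SPLASSERT_ACTIONS.get(level, "panic")

if __name__ == "__main__":
    for level in range(5):
        print(level, "->", splassert_action(level))
```

With the value confirmed as 2, the model says a trace and no panic, which
is why I'm asking.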

> > > > I'm trying to get more information because it had either hung or
> > > > panicked previously (it didn't have serial connected at the time and
> > > > the machine was needed so it had to be rebooted before I had a chance
> > > > to dig into it).
> > > 
> > > From which snapshot was the kernel that hung or panicked?
> > > 
> > 
> > It was running this:
> > 
> > OpenBSD 6.2-current (GENERIC.MP) #199: Tue Nov  7 18:41:54 MST 2017
> > 
> > I've got it onto a remote control PDU now, now looking for some machine
> > with an old enough ssh client to be able to connect to the PDU :-|
> > 
> > Which kernel would be most useful to run now?
> 
> -current
> 
> > I have now moved it to -current GENERIC.MP with the "fast path" chunk
> > removed from amd64/amd64/fpu.c fpu_kernel_enter(), which we still suspect
> > as maybe having some issues.
> 
> That's perfect from my point of view.
> 

Same after an hour or two uptime, but this time I get some "netlock:
lock not held" from some cpu or other, and some functions in the bits of
the trace that get displayed:

login: panic: kernel diagnostic assertion "_kernel_lock_held()" failed: file "/src/cvs-openbsd/sys/kern/uipc_socket2.c", line 310
Starting stack trace...
panic() at panic+0x11b
__assert(ffffffff812105d4,ffff80001f898a70,ffffff0063dc5b00,ffffff0061804318) at __assert+0x24
sbappendaddr(0,ffffff0061804318,ffffff005fca5600,0,ffffff0063dc5b00) at sbappendaddrpanic: netlock: lock not held
Faulted in traceback, aborting...
+0x276
pfkey_sendup(4,c,ffff8000008f8b00) at pfkey_sendup+0x75
pfkeyv2_sendmessage(ffffff00617e9160,ffff800000902700,ffffff00617e00a0,1,ffff8000009027d8,2) at pfkeyv2_sendmessage+0x228
pfkeyv2_acquire(ffffff00617e924c,ffffff0067772090,ffffff006777201c,ffffff00617e9160,ffff80001f898dc8) at pfkeyv2_acquire+0x553
ipsp_acquire_sa(ffffff00617e9160,0,ffff8000004d3880,ffff80001f898f20,0) at panic: netlock: lock not heldipsp_acquire_sa
Faulted in traceback, aborting...
+0x4c6panic: netlock: lock not held
Faulted in traceback, aborting...

panic: netlock: lock not held
Faulted in traceback, aborting...
ipsp_spd_lookup(panic: ffffff0005747400,netlock: lock not held
Faulted in traceback, aborting...
0,panic: netlock: lock not heldffff8000004dc900,ffff80001f898fb0
Faulted in traceback, aborting...
,panic: netlock: lock not held
Faulted in traceback, aborting...
0,panic: netlock: lock not held
Faulted in traceback, aborting...
9c519d9d517a98c1) at panic: netlock: lock not held
Faulted in traceback, aborting...
ipsp_spd_lookuppanic: netlock: lock not held+0xcbe
Faulted in traceback, aborting...

panic: netlock: lock not held
Faulted in traceback, aborting...
ip_output_ipsec_lookup(panic: netlock: lock not held
Faulted in traceback, aborting...
ffff80001f898fc0,panic: netlock: lock not held
Faulted in traceback, aborting...
ffffff006276f4d4,panic: netlock: lock not heldffff8000004dc900
Faulted in traceback, aborting...
,panic: netlock: lock not held
Faulted in traceback, aborting...
ffff80001f898fb0,panic: netlock: lock not held
Faulted in traceback, aborting...
0) at panic: netlock: lock not held
Faulted in traceback, aborting...
ip_output_ipsec_lookuppanic: netlock: lock not held+0x34
Faulted in traceback, aborting...

panic: netlock: lock not held
Faulted in traceback, aborting...
ip_output(panic: netlock: lock not held
Faulted in traceback, aborting...
0,panic: 0,netlock: lock not held
Faulted in traceback, aborting...
1,panic: netlock: lock not held
Faulted in traceback, aborting...
ffffff00615ed020panic: netlock: lock not held
Faulted in traceback, aborting...
,panic: ffffff0005747400,netlock: lock not held
Faulted in traceback, aborting...
9c519d9d517a98c1) at panic: ip_outputnetlock: lock not held
Faulted in traceback, aborting...
+0x3e7panic: netlock: lock not held
Faulted in traceback, aborting...

panic: netlock: lock not held
Faulted in traceback, aborting...
ip_forward(panic: netlock: lock not held
Faulted in traceback, aborting...
ffff8000008f9800,panic: netlock: lock not held14,
Faulted in traceback, aborting...
ffff80001f899190,panic: netlock: lock not held
Faulted in traceback, aborting...
ffff80001f89918cpanic: netlock: lock not held
Faulted in traceback, aborting...
) at panic: netlock: lock not held
Faulted in traceback, aborting...
ip_forwardpanic: netlock: lock not held+0x25a
Faulted in traceback, aborting...

panic: netlock: lock not held
Faulted in traceback, aborting...
ip_input_if(panic: netlock: lock not held
ffff8000008f0800,Faulted in traceback, aborting...
panic: ffffff006276f4c6,netlock: lock not held
Faulted in traceback, aborting...
800,panic: netlock: lock not heldffffff0005747400,
Faulted in traceback, aborting...
ffffff0005747400) at panic: netlock: lock not held
ip_input_ifFaulted in traceback, aborting...
+0x5cepanic: netlock: lock not held
Faulted in traceback, aborting...

panic: netlock: lock not held
Faulted in traceback, aborting...
ipv4_input(panic: netlock: lock not held9c519d9d517a98c1
Faulted in traceback, aborting...
,ffffff0005747400) at panic: netlock: lock not heldipv4_input
Faulted in traceback, aborting...
+0x39panic: netlock: lock not held
Faulted in traceback, aborting...

panic: netlock: lock not held
Faulted in traceback, aborting...
panic: netlock: lock not heldether_input(
Faulted in traceback, aborting...
ffff8000008f99f8,panic: netlock: lock not held
Faulted in traceback, aborting...
ffffff0005747400,panic: netlock: lock not held
Faulted in traceback, aborting...
ffff8000000b1f20) at panic: netlock: lock not held
Faulted in traceback, aborting...
ether_inputpanic: +0x2cbnetlock: lock not held
Faulted in traceback, aborting...

panic: netlock: lock not held
Faulted in[halt sent]
PCEngines apu2
coreboot build 20160311
-2064 MB DRAM
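
The frames can still be picked out of the interleaved part of the output
above. A rough sketch of how, assuming the only noise is the three
fragments the other CPUs keep printing and that the characters of the
trace itself stay in order (recover_frames, NOISE, FRAME and SAMPLE are
just names for this example; SAMPLE is the ip_forward frame copied from
the log):

```python
import re

# Fragments printed by the other CPUs, as they appear in the log above.
NOISE = (
    "Faulted in traceback, aborting...",
    "netlock: lock not held",
    "panic: ",
)

# One frame as printed in the trace: name(args) at name+0xoffset
FRAME = re.compile(r"\w+\([^()]*\) at \w+\+0x[0-9a-f]+")

def recover_frames(raw):
    """Drop the interleaved panic fragments and return the frames found."""
    for fragment in NOISE:
        raw = raw.replace(fragment, "")
    # Newlines can no longer be trusted as frame separators, so drop them
    # and let the regular expression find the frame boundaries instead.
    joined = raw.replace("\n", "")
    return FRAME.findall(joined)

SAMPLE = """\
ip_forward(panic: netlock: lock not held
Faulted in traceback, aborting...
ffff8000008f9800,panic: netlock: lock not held14,
Faulted in traceback, aborting...
ffff80001f899190,panic: netlock: lock not held
Faulted in traceback, aborting...
ffff80001f89918cpanic: netlock: lock not held
Faulted in traceback, aborting...
) at panic: netlock: lock not held
Faulted in traceback, aborting...
ip_forwardpanic: netlock: lock not held+0x25a
Faulted in traceback, aborting...
"""

if __name__ == "__main__":
    for frame in recover_frames(SAMPLE):
        print(frame)
```

This should only be fed the interleaved section: the first "panic: kernel
diagnostic assertion" line is real output, and stripping "panic: " from it
would lose information.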
