On 2017/11/13 13:17, Martin Pieuchot wrote:
> On 13/11/17(Mon) 10:03, Stuart Henderson wrote:
> > On 2017/11/13 08:44, Martin Pieuchot wrote:
> > > On 12/11/17(Sun) 22:10, Stuart Henderson wrote:
> > > > On 2017/11/12 22:48, Martin Pieuchot wrote:
> > > > > On 12/11/17(Sun) 21:30, Stuart Henderson wrote:
> > > > > > iked box, GENERIC.MP + WITNESS, -current as of Friday 10th:
> > > > >
> > > > > Weird, did you tweak "kern.splassert" on this box? Otherwise is
> > > > > looks like a major corruption.
> > > >
> > > > It would have kern.splassert=2. (I know this can cause problems
> > > > sometimes, though this would be the first time in 5+ years I've bumped
> > > > into it, most of my routers where I have serial console have this set).
> > >
> > > Well the panic below correspond to a value of 0 or > 3.
> >
> > Confirmed, it was definitely set to 2.
>
> So it seems that two of your CPU end up looking at/dealing with
> corrupted memory...

Is that for sure? 2 does normally print a trace, 3 also drops into ddb.

> > > > I'm trying to get more information because it had either hanged or
> > > > panicked previously (it didn't have serial connected at the time and
> > > > the machine was needed so it had to be rebooted before I had chance
> > > > to dig into it).
> > >
> > > From which snapshot was the kernel that hanged or panic'd?
> > >
> >
> > It was running this:
> >
> > OpenBSD 6.2-current (GENERIC.MP) #199: Tue Nov 7 18:41:54 MST 2017
> >
> > I've got it onto a remote control PDU now, now looking for some machine
> > with an old enough ssh client to be able to connect to the PDU :-|
> >
> > Which kernel would be most useful to run now?
>
> -current
>
> > I have now moved it to -current GENERIC.MP with the "fast path chunk
> > removed from amd64/amd64/fpu.c fpu_kernel_enter() which we still suspect
> > as maybe having some issues.
>
> That's perfect from my point of view.
>

Same after an hour or two uptime, but this time I get some "netlock: lock
not held" from some cpu or other, and some functions in the bits of the
trace that get displayed:

login: panic: kernel diagnostic assertion "_kernel_lock_held()" failed: file "/src/cvs-openbsd/sys/kern/uipc_socket2.c", line 310
Starting stack trace...
panic() at panic+0x11b
__assert(ffffffff812105d4,ffff80001f898a70,ffffff0063dc5b00,ffffff0061804318) at __assert+0x24
sbappendaddr(0,ffffff0061804318,ffffff005fca5600,0,ffffff0063dc5b00) at sbappendaddrpanic: netlock: lock not held
Faulted in traceback, aborting...
+0x276
pfkey_sendup(4,c,ffff8000008f8b00) at pfkey_sendup+0x75
pfkeyv2_sendmessage(ffffff00617e9160,ffff800000902700,ffffff00617e00a0,1,ffff8000009027d8,2) at pfkeyv2_sendmessage+0x228
pfkeyv2_acquire(ffffff00617e924c,ffffff0067772090,ffffff006777201c,ffffff00617e9160,ffff80001f898dc8) at pfkeyv2_acquire+0x553
ipsp_acquire_sa(ffffff00617e9160,0,ffff8000004d3880,ffff80001f898f20,0) at panic: netlock: lock not heldipsp_acquire_sa
Faulted in traceback, aborting...
+0x4c6panic: netlock: lock not held
Faulted in traceback, aborting...
panic: netlock: lock not held
Faulted in traceback, aborting...
ipsp_spd_lookup(panic: ffffff0005747400,netlock: lock not held
Faulted in traceback, aborting...
0,panic: netlock: lock not heldffff8000004dc900,ffff80001f898fb0
Faulted in traceback, aborting...
,panic: netlock: lock not held
Faulted in traceback, aborting...
0,panic: netlock: lock not held
Faulted in traceback, aborting...
9c519d9d517a98c1) at panic: netlock: lock not held
Faulted in traceback, aborting...
ipsp_spd_lookuppanic: netlock: lock not held+0xcbe
Faulted in traceback, aborting...
panic: netlock: lock not held
Faulted in traceback, aborting...
ip_output_ipsec_lookup(panic: netlock: lock not held
Faulted in traceback, aborting...
ffff80001f898fc0,panic: netlock: lock not held
Faulted in traceback, aborting...
ffffff006276f4d4,panic: netlock: lock not heldffff8000004dc900
Faulted in traceback, aborting...
,panic: netlock: lock not held
Faulted in traceback, aborting...
ffff80001f898fb0,panic: netlock: lock not held
Faulted in traceback, aborting...
0) at panic: netlock: lock not held
Faulted in traceback, aborting...
ip_output_ipsec_lookuppanic: netlock: lock not held+0x34
Faulted in traceback, aborting...
panic: netlock: lock not held
Faulted in traceback, aborting...
ip_output(panic: netlock: lock not held
Faulted in traceback, aborting...
0,panic: 0,netlock: lock not held
Faulted in traceback, aborting...
1,panic: netlock: lock not held
Faulted in traceback, aborting...
ffffff00615ed020panic: netlock: lock not held
Faulted in traceback, aborting...
,panic: ffffff0005747400,netlock: lock not held
Faulted in traceback, aborting...
9c519d9d517a98c1) at panic: ip_outputnetlock: lock not held
Faulted in traceback, aborting...
+0x3e7panic: netlock: lock not held
Faulted in traceback, aborting...
panic: netlock: lock not held
Faulted in traceback, aborting...
ip_forward(panic: netlock: lock not held
Faulted in traceback, aborting...
ffff8000008f9800,panic: netlock: lock not held14,
Faulted in traceback, aborting...
ffff80001f899190,panic: netlock: lock not held
Faulted in traceback, aborting...
ffff80001f89918cpanic: netlock: lock not held
Faulted in traceback, aborting...
) at panic: netlock: lock not held
Faulted in traceback, aborting...
ip_forwardpanic: netlock: lock not held+0x25a
Faulted in traceback, aborting...
panic: netlock: lock not held
Faulted in traceback, aborting...
ip_input_if(panic: netlock: lock not held
ffff8000008f0800,Faulted in traceback, aborting...
panic: ffffff006276f4c6,netlock: lock not held
Faulted in traceback, aborting...
800,panic: netlock: lock not heldffffff0005747400,
Faulted in traceback, aborting...
ffffff0005747400) at panic: netlock: lock not held
ip_input_ifFaulted in traceback, aborting...
+0x5cepanic: netlock: lock not held
Faulted in traceback, aborting...
panic: netlock: lock not held
Faulted in traceback, aborting...
ipv4_input(panic: netlock: lock not held9c519d9d517a98c1
Faulted in traceback, aborting...
,ffffff0005747400) at panic: netlock: lock not heldipv4_input
Faulted in traceback, aborting...
+0x39panic: netlock: lock not held
Faulted in traceback, aborting...
panic: netlock: lock not held
Faulted in traceback, aborting...
panic: netlock: lock not heldether_input(
Faulted in traceback, aborting...
ffff8000008f99f8,panic: netlock: lock not held
Faulted in traceback, aborting...
ffffff0005747400,panic: netlock: lock not held
Faulted in traceback, aborting...
ffff8000000b1f20) at panic: netlock: lock not held
Faulted in traceback, aborting...
ether_inputpanic: +0x2cbnetlock: lock not held
Faulted in traceback, aborting...
panic: netlock: lock not held
Faulted in[halt sent]
PCEngines apu2 coreboot build 20160311 -2064 MB DRAM