Re: assertion "_kernel_lock_held()" failed, uipc_socket2.c: ipsec
On 13/11/17(Mon) 12:33, Stuart Henderson wrote:
> On 2017/11/13 13:17, Martin Pieuchot wrote:
> > [...]
> > So it seems that two of your CPUs end up looking at/dealing with
> > corrupted memory...
>
> Is that for sure? 2 does normally print a trace, 3 also drops into ddb.

But none of them print:

  panic: spl assertion failure in soassertlocked.

However it might just be a race because the other CPU just entered panic
and set splassert_ctl to 0.

> Same after an hour or two uptime, but this time I get some "netlock:
> lock not held" from some cpu or other, and some functions in the bits of
> the trace that get displayed:
>
> login: panic: kernel diagnostic assertion "_kernel_lock_held()" failed: file
> "/src/cvs-openbsd/sys/kern/uipc_socket2.c", line 310
> Starting stack trace...
> panic() at panic+0x11b
> __assert(812105d4,80001f898a70,ff0063dc5b00,ff0061804318)
> at __assert+0x24
> sbappendaddr(0,ff0061804318,ff005fca5600,0,ff0063dc5b00) at
> sbappendaddrpanic: netlock: lock not held

Does the diff below help?  It should in any case reduce the "netlock: lock
not held" noises.

Index: net/pfkeyv2.c
===
RCS file: /cvs/src/sys/net/pfkeyv2.c,v
retrieving revision 1.173
diff -u -p -r1.173 pfkeyv2.c
--- net/pfkeyv2.c 12 Nov 2017 14:11:15 - 1.173
+++ net/pfkeyv2.c 13 Nov 2017 12:57:36 -
@@ -428,12 +428,14 @@ pfkeyv2_sendmessage(void **headers, int
 		 * Search for promiscuous listeners, skipping the
 		 * original destination.
 		 */
+		KERNEL_LOCK();
 		LIST_FOREACH(s, _sockets, kcb_list) {
 			if ((s->flags & PFKEYV2_SOCKETFLAGS_PROMISC) &&
 			    (s->rcb.rcb_socket != so) &&
 			    (s->rdomain == rdomain))
 				pfkey_sendup(s, packet, 1);
 		}
+		KERNEL_UNLOCK();
 		m_freem(packet);
 		break;
@@ -442,6 +444,7 @@ pfkeyv2_sendmessage(void **headers, int
 		 * Send the message to all registered sockets that match
 		 * the specified satype (e.g., all IPSEC-ESP negotiators)
 		 */
+		KERNEL_LOCK();
 		LIST_FOREACH(s, _sockets, kcb_list) {
 			if ((s->flags & PFKEYV2_SOCKETFLAGS_REGISTERED) &&
 			    (s->rdomain == rdomain)) {
@@ -454,6 +457,7 @@ pfkeyv2_sendmessage(void **headers, int
 				}
 			}
 		}
+		KERNEL_UNLOCK();
 		/* Free last/original copy of the packet */
 		m_freem(packet);
@@ -472,21 +476,25 @@ pfkeyv2_sendmessage(void **headers, int
 			goto ret;
 		/* Send to all registered promiscuous listeners */
+		KERNEL_LOCK();
 		LIST_FOREACH(s, _sockets, kcb_list) {
 			if ((s->flags & PFKEYV2_SOCKETFLAGS_PROMISC) &&
 			    !(s->flags & PFKEYV2_SOCKETFLAGS_REGISTERED) &&
 			    (s->rdomain == rdomain))
 				pfkey_sendup(s, packet, 1);
 		}
+		KERNEL_UNLOCK();
 		m_freem(packet);
 		break;
 	case PFKEYV2_SENDMESSAGE_BROADCAST:
 		/* Send message to all sockets */
+		KERNEL_LOCK();
 		LIST_FOREACH(s, _sockets, kcb_list) {
 			if (s->rdomain == rdomain)
 				pfkey_sendup(s, packet, 1);
 		}
+		KERNEL_UNLOCK();
 		m_freem(packet);
 		break;
 	}
@@ -1010,11 +1018,13 @@ pfkeyv2_send(struct socket *so, void *me
 			goto ret;
 		/* Send to all promiscuous listeners */
+		KERNEL_LOCK();
 		LIST_FOREACH(bkp, _sockets, kcb_list) {
 			if ((bkp->flags & PFKEYV2_SOCKETFLAGS_PROMISC) &&
 			    (bkp->rdomain == rdomain))
 				pfkey_sendup(bkp, packet, 1);
 		}
+		KERNEL_UNLOCK();
 		m_freem(packet);
@@ -1788,12 +1798,15 @@ pfkeyv2_send(struct socket *so, void *me
 		if ((rval = pfdatatopacket(message, len, )) != 0)
 			goto ret;
-		LIST_FOREACH(bkp, _sockets, kcb_list)
+		KERNEL_LOCK();
+		LIST_FOREACH(bkp, _sockets, kcb_list) {
 			if ((bkp != kp) && (bkp->rdomain == rdomain) &&
 			    (!smsg->sadb_msg_seq ||
 			    (smsg->sadb_msg_seq == kp->pid)))
 				pfkey_sendup(bkp, packet, 1);
+
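The diff in this message takes the kernel lock around each walk over the PF_KEY socket list, so that the delivery path which asserts _kernel_lock_held() is only reached with the lock held. A minimal userland sketch of that pattern follows. It is not the kernel code: a plain counter stands in for the kernel lock, and every name in it (klock, sendup, send_promisc, struct listener) is hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel lock: a simple hold counter. */
static int klock_depth;

void klock(void) { klock_depth++; }
void kunlock(void) { assert(klock_depth > 0); klock_depth--; }
int klock_held(void) { return klock_depth > 0; }

/* Hypothetical listener record, loosely modelled on a PF_KEY socket. */
struct listener {
	int promisc;		/* wants copies of every message */
	int rdomain;		/* routing domain it belongs to */
	int delivered;		/* number of messages received */
	struct listener *next;
};

/*
 * Stand-in for the delivery path. The kernel would fire an assertion
 * when the lock is missing; this model returns -1 instead.
 */
int
sendup(struct listener *l)
{
	if (!klock_held())
		return -1;
	l->delivered++;
	return 0;
}

/*
 * Deliver to all promiscuous listeners in one rdomain, taking the
 * lock around the whole list walk as the diff does. Returns the
 * number of listeners that received the message.
 */
int
send_promisc(struct listener *head, int rdomain)
{
	struct listener *l;
	int n = 0;

	klock();
	for (l = head; l != NULL; l = l->next) {
		if (l->promisc && l->rdomain == rdomain && sendup(l) == 0)
			n++;
	}
	kunlock();
	return n;
}
```

Calling sendup() directly, without the lock, fails in this model; going through send_promisc() succeeds and leaves the lock released afterwards. Holding the lock across the whole loop rather than per call keeps the list walk and the delivery under the same protection, which appears to be the intent of the diff.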
Re: assertion "_kernel_lock_held()" failed, uipc_socket2.c: ipsec
On Mon, Nov 13, 2017 at 12:33:35PM +, Stuart Henderson wrote:
>
> Same after an hour or two uptime, but this time I get some "netlock:
> lock not held" from some cpu or other, and some functions in the bits of
> the trace that get displayed:
>
> login: panic: kernel diagnostic assertion "_kernel_lock_held()" failed: file
> "/src/cvs-openbsd/sys/kern/uipc_socket2.c", line 310

Just a simple question regarding the previous line: is the start of the
line ("login: ") part of the kernel output, or is it just the login(1)
prompt on the console (printed a long time before the panic), and you
simply copied the whole line?

Thanks.
-- 
Sebastien Marie
Re: assertion "_kernel_lock_held()" failed, uipc_socket2.c: ipsec
On 2017/11/13 13:17, Martin Pieuchot wrote:
> On 13/11/17(Mon) 10:03, Stuart Henderson wrote:
> > On 2017/11/13 08:44, Martin Pieuchot wrote:
> > > On 12/11/17(Sun) 22:10, Stuart Henderson wrote:
> > > > On 2017/11/12 22:48, Martin Pieuchot wrote:
> > > > > On 12/11/17(Sun) 21:30, Stuart Henderson wrote:
> > > > > > iked box, GENERIC.MP + WITNESS, -current as of Friday 10th:
> > > > >
> > > > > Weird, did you tweak "kern.splassert" on this box?  Otherwise it
> > > > > looks like a major corruption.
> > > >
> > > > It would have kern.splassert=2. (I know this can cause problems
> > > > sometimes, though this would be the first time in 5+ years I've bumped
> > > > into it, most of my routers where I have serial console have this set).
> > >
> > > Well the panic below corresponds to a value of 0 or > 3.
> >
> > Confirmed, it was definitely set to 2.
>
> So it seems that two of your CPUs end up looking at/dealing with
> corrupted memory...

Is that for sure? 2 does normally print a trace, 3 also drops into ddb.

> > > > I'm trying to get more information because it had either hanged or
> > > > panicked previously (it didn't have serial connected at the time and
> > > > the machine was needed so it had to be rebooted before I had chance
> > > > to dig into it).
> > >
> > > From which snapshot was the kernel that hanged or panic'd?
> > >
> >
> > It was running this:
> >
> > OpenBSD 6.2-current (GENERIC.MP) #199: Tue Nov 7 18:41:54 MST 2017
> >
> > I've got it onto a remote control PDU now, now looking for some machine
> > with an old enough ssh client to be able to connect to the PDU :-|
> >
> > Which kernel would be most useful to run now?
>
> -current
>
> > I have now moved it to -current GENERIC.MP with the "fast path" chunk
> > removed from amd64/amd64/fpu.c fpu_kernel_enter() which we still suspect
> > as maybe having some issues.
>
> That's perfect from my point of view.
>

Same after an hour or two uptime, but this time I get some "netlock: lock
not held" from some cpu or other, and some functions in the bits of the
trace that get displayed:

login: panic: kernel diagnostic assertion "_kernel_lock_held()" failed: file "/src/cvs-openbsd/sys/kern/uipc_socket2.c", line 310
Starting stack trace...
panic() at panic+0x11b
__assert(812105d4,80001f898a70,ff0063dc5b00,ff0061804318) at __assert+0x24
sbappendaddr(0,ff0061804318,ff005fca5600,0,ff0063dc5b00) at sbappendaddrpanic: netlock: lock not held
Faulted in traceback, aborting...
+0x276
pfkey_sendup(4,c,808f8b00) at pfkey_sendup+0x75
pfkeyv2_sendmessage(ff00617e9160,80902700,ff00617e00a0,1,809027d8,2) at pfkeyv2_sendmessage+0x228
pfkeyv2_acquire(ff00617e924c,ff0067772090,ff006777201c,ff00617e9160,80001f898dc8) at pfkeyv2_acquire+0x553
ipsp_acquire_sa(ff00617e9160,0,804d3880,80001f898f20,0) at panic: netlock: lock not heldipsp_acquire_sa
Faulted in traceback, aborting...
+0x4c6panic: netlock: lock not held
Faulted in traceback, aborting...
panic: netlock: lock not held
Faulted in traceback, aborting...
ipsp_spd_lookup(panic: ff0005747400,netlock: lock not held
Faulted in traceback, aborting...
0,panic: netlock: lock not held804dc900,80001f898fb0
Faulted in traceback, aborting...
,panic: netlock: lock not held
Faulted in traceback, aborting...
0,panic: netlock: lock not held
Faulted in traceback, aborting...
9c519d9d517a98c1) at panic: netlock: lock not held
Faulted in traceback, aborting...
ipsp_spd_lookuppanic: netlock: lock not held+0xcbe
Faulted in traceback, aborting...
panic: netlock: lock not held
Faulted in traceback, aborting...
ip_output_ipsec_lookup(panic: netlock: lock not held
Faulted in traceback, aborting...
80001f898fc0,panic: netlock: lock not held
Faulted in traceback, aborting...
ff006276f4d4,panic: netlock: lock not held804dc900
Faulted in traceback, aborting...
,panic: netlock: lock not held
Faulted in traceback, aborting...
80001f898fb0,panic: netlock: lock not held
Faulted in traceback, aborting...
0) at panic: netlock: lock not held
Faulted in traceback, aborting...
ip_output_ipsec_lookuppanic: netlock: lock not held+0x34
Faulted in traceback, aborting...
panic: netlock: lock not held
Faulted in traceback, aborting...
ip_output(panic: netlock: lock not held
Faulted in traceback, aborting...
0,panic: 0,netlock: lock not held
Faulted in traceback, aborting...
1,panic: netlock: lock not held
Faulted in traceback, aborting...
ff00615ed020panic: netlock: lock not held
Faulted in traceback, aborting...
,panic: ff0005747400,netlock: lock not held
Faulted in traceback, aborting...
9c519d9d517a98c1) at panic: ip_outputnetlock: lock not held
Faulted in traceback, aborting...
+0x3e7panic: netlock: lock not held
Faulted in traceback, aborting...
panic: netlock: lock not held
Faulted in traceback, aborting...
ip_forward(panic: netlock: lock not held
Faulted in traceback,
Re: assertion "_kernel_lock_held()" failed, uipc_socket2.c: ipsec
On 13/11/17(Mon) 10:03, Stuart Henderson wrote:
> On 2017/11/13 08:44, Martin Pieuchot wrote:
> > On 12/11/17(Sun) 22:10, Stuart Henderson wrote:
> > > On 2017/11/12 22:48, Martin Pieuchot wrote:
> > > > On 12/11/17(Sun) 21:30, Stuart Henderson wrote:
> > > > > iked box, GENERIC.MP + WITNESS, -current as of Friday 10th:
> > > >
> > > > Weird, did you tweak "kern.splassert" on this box?  Otherwise it looks
> > > > like a major corruption.
> > >
> > > It would have kern.splassert=2. (I know this can cause problems
> > > sometimes, though this would be the first time in 5+ years I've bumped
> > > into it, most of my routers where I have serial console have this set).
> >
> > Well the panic below corresponds to a value of 0 or > 3.
>
> Confirmed, it was definitely set to 2.

So it seems that two of your CPUs end up looking at/dealing with
corrupted memory...

> > > I'm trying to get more information because it had either hanged or
> > > panicked previously (it didn't have serial connected at the time and
> > > the machine was needed so it had to be rebooted before I had chance
> > > to dig into it).
> >
> > From which snapshot was the kernel that hanged or panic'd?
> >
>
> It was running this:
>
> OpenBSD 6.2-current (GENERIC.MP) #199: Tue Nov 7 18:41:54 MST 2017
>
> I've got it onto a remote control PDU now, now looking for some machine
> with an old enough ssh client to be able to connect to the PDU :-|
>
> Which kernel would be most useful to run now?

-current

> I have now moved it to -current GENERIC.MP with the "fast path" chunk
> removed from amd64/amd64/fpu.c fpu_kernel_enter() which we still suspect
> as maybe having some issues.

That's perfect from my point of view.
Re: assertion "_kernel_lock_held()" failed, uipc_socket2.c: ipsec
On 2017/11/13 08:44, Martin Pieuchot wrote:
> On 12/11/17(Sun) 22:10, Stuart Henderson wrote:
> > On 2017/11/12 22:48, Martin Pieuchot wrote:
> > > On 12/11/17(Sun) 21:30, Stuart Henderson wrote:
> > > > iked box, GENERIC.MP + WITNESS, -current as of Friday 10th:
> > >
> > > Weird, did you tweak "kern.splassert" on this box?  Otherwise it looks
> > > like a major corruption.
> >
> > It would have kern.splassert=2. (I know this can cause problems
> > sometimes, though this would be the first time in 5+ years I've bumped
> > into it, most of my routers where I have serial console have this set).
>
> Well the panic below corresponds to a value of 0 or > 3.

Confirmed, it was definitely set to 2.

> > > > login: panic: kernel diagnostic assertion "_kernel_lock_held()" failed:
> > > > file "/src/cvs-openbsd/sys/kern/uipc_socket2.c", line 310
> > > ^^^
> > > Looks like one CPU is triggering this.
> > >
> > > > splassert: soassertlocked: want 1 have 256
> > > >
> > > > panic: spl assertion failure in soassertlocked
> > > ^^^
> > > That can't be coming from the same CPU..
> > >
> > > > Starting stack trace...
> > > > Faulted in traceback, aborting...
> > > > panic(splassert: if_down: want 1 have 256
> > > > panic: spl assertion failure in if_down) at
> > > > Faulted in traceback, aborting...
> > > > panicsplassert: if_down: want 1 have 256
> > > > +0x133panic: spl assertion failure in if_down
> > > > Faulted in traceback, aborting...
> > > >
> > > > It's stuck at this point, I can't enter ddb.
> > >
> > > Are you running with WITNESS on purpose?  Can you reproduce such a problem
> > > without it?  I'm not saying it's WITNESS's fault, but it's clear that
> > > WITNESS kernels aren't ready for production yet.
> > >
> >
> > I'm trying to get more information because it had either hanged or
> > panicked previously (it didn't have serial connected at the time and
> > the machine was needed so it had to be rebooted before I had chance
> > to dig into it).
>
> From which snapshot was the kernel that hanged or panic'd?
>

It was running this:

OpenBSD 6.2-current (GENERIC.MP) #199: Tue Nov 7 18:41:54 MST 2017

I've got it onto a remote control PDU now, now looking for some machine
with an old enough ssh client to be able to connect to the PDU :-|

Which kernel would be most useful to run now?

I have now moved it to -current GENERIC.MP with the "fast path" chunk
removed from amd64/amd64/fpu.c fpu_kernel_enter() which we still suspect
as maybe having some issues.
Re: assertion "_kernel_lock_held()" failed, uipc_socket2.c: ipsec
On 12/11/17(Sun) 22:10, Stuart Henderson wrote:
> On 2017/11/12 22:48, Martin Pieuchot wrote:
> > On 12/11/17(Sun) 21:30, Stuart Henderson wrote:
> > > iked box, GENERIC.MP + WITNESS, -current as of Friday 10th:
> >
> > Weird, did you tweak "kern.splassert" on this box?  Otherwise it looks
> > like a major corruption.
>
> It would have kern.splassert=2. (I know this can cause problems
> sometimes, though this would be the first time in 5+ years I've bumped
> into it, most of my routers where I have serial console have this set).

Well the panic below corresponds to a value of 0 or > 3.

> > > login: panic: kernel diagnostic assertion "_kernel_lock_held()" failed:
> > > file "/src/cvs-openbsd/sys/kern/uipc_socket2.c", line 310
> > ^^^
> > Looks like one CPU is triggering this.
> >
> > > splassert: soassertlocked: want 1 have 256
> > >
> > > panic: spl assertion failure in soassertlocked
> > ^^^
> > That can't be coming from the same CPU..
> >
> > > Starting stack trace...
> > > Faulted in traceback, aborting...
> > > panic(splassert: if_down: want 1 have 256
> > > panic: spl assertion failure in if_down) at
> > > Faulted in traceback, aborting...
> > > panicsplassert: if_down: want 1 have 256
> > > +0x133panic: spl assertion failure in if_down
> > > Faulted in traceback, aborting...
> > >
> > > It's stuck at this point, I can't enter ddb.
> >
> > Are you running with WITNESS on purpose?  Can you reproduce such a problem
> > without it?  I'm not saying it's WITNESS's fault, but it's clear that
> > WITNESS kernels aren't ready for production yet.
> >
>
> I'm trying to get more information because it had either hanged or
> panicked previously (it didn't have serial connected at the time and
> the machine was needed so it had to be rebooted before I had chance
> to dig into it).

From which snapshot was the kernel that hanged or panic'd?
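The thread turns on what a failed spl assertion does for each kern.splassert value: 2 prints a trace, 3 also drops into ddb, and the "spl assertion failure" panic corresponds to 0 or a value above 3. The sketch below models that mapping in userland C, together with the race suggested elsewhere in this thread, where another CPU's panic sets the control value to 0 after the check has already started. It is a rough model, not the kernel's implementation: all identifiers are hypothetical, the behaviour for value 1 (message only) follows the sysctl documentation rather than this thread, and the assumption that checks are only entered while the value is above 0 is mine.

```c
#include <assert.h>
#include <stdio.h>

/* What a failed spl assertion does; names here are hypothetical. */
enum spl_action {
	SPL_PRINT,	/* 1: print the "splassert: ..." message only */
	SPL_TRACE,	/* 2: message plus stack trace */
	SPL_TRACE_DDB,	/* 3: message, stack trace, then enter ddb */
	SPL_PANIC	/* any other value: panic */
};

/* Map a kern.splassert value to the action taken on failure. */
enum spl_action
spl_action_for(int ctl)
{
	switch (ctl) {
	case 1:
		return SPL_PRINT;
	case 2:
		return SPL_TRACE;
	case 3:
		return SPL_TRACE_DDB;
	default:
		return SPL_PANIC;	/* 0, or anything above 3 */
	}
}

/*
 * Assumed race: the check is entered only while ctl > 0, but the
 * failure handler reads ctl a second time. Returns 1 if that second
 * read leads to a panic, 0 otherwise.
 */
int
spl_race_panics(int ctl_at_check, int ctl_in_handler)
{
	if (ctl_at_check <= 0)
		return 0;	/* assertion checking was off */
	return spl_action_for(ctl_in_handler) == SPL_PANIC;
}

/* Print a line shaped like the console messages quoted in this thread. */
void
spl_report(const char *func, int want, int have)
{
	assert(func != NULL);
	printf("splassert: %s: want %d have %d\n", func, want, have);
}
```

Under these assumptions, spl_race_panics(2, 0) returns 1: a box configured with kern.splassert=2 could still print the panic if the value is zeroed between the check and the handler, which would be consistent with the setting reported here without requiring memory corruption.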
Re: assertion "_kernel_lock_held()" failed, uipc_socket2.c: ipsec
On 2017/11/12 22:48, Martin Pieuchot wrote:
> On 12/11/17(Sun) 21:30, Stuart Henderson wrote:
> > iked box, GENERIC.MP + WITNESS, -current as of Friday 10th:
>
> Weird, did you tweak "kern.splassert" on this box?  Otherwise it looks
> like a major corruption.

It would have kern.splassert=2. (I know this can cause problems
sometimes, though this would be the first time in 5+ years I've bumped
into it, most of my routers where I have serial console have this set).

> > login: panic: kernel diagnostic assertion "_kernel_lock_held()" failed:
> > file "/src/cvs-openbsd/sys/kern/uipc_socket2.c", line 310
> ^^^
> Looks like one CPU is triggering this.
>
> > splassert: soassertlocked: want 1 have 256
> >
> > panic: spl assertion failure in soassertlocked
> ^^^
> That can't be coming from the same CPU..
>
> > Starting stack trace...
> > Faulted in traceback, aborting...
> > panic(splassert: if_down: want 1 have 256
> > panic: spl assertion failure in if_down) at
> > Faulted in traceback, aborting...
> > panicsplassert: if_down: want 1 have 256
> > +0x133panic: spl assertion failure in if_down
> > Faulted in traceback, aborting...
> >
> > It's stuck at this point, I can't enter ddb.
>
> Are you running with WITNESS on purpose?  Can you reproduce such a problem
> without it?  I'm not saying it's WITNESS's fault, but it's clear that
> WITNESS kernels aren't ready for production yet.
>

I'm trying to get more information because it had either hanged or
panicked previously (it didn't have serial connected at the time and
the machine was needed so it had to be rebooted before I had chance
to dig into it).
Re: assertion "_kernel_lock_held()" failed, uipc_socket2.c: ipsec
On 12/11/17(Sun) 21:30, Stuart Henderson wrote:
> iked box, GENERIC.MP + WITNESS, -current as of Friday 10th:

Weird, did you tweak "kern.splassert" on this box?  Otherwise it looks
like a major corruption.

> login: panic: kernel diagnostic assertion "_kernel_lock_held()" failed: file
> "/src/cvs-openbsd/sys/kern/uipc_socket2.c", line 310
^^^
Looks like one CPU is triggering this.

> splassert: soassertlocked: want 1 have 256
>
> panic: spl assertion failure in soassertlocked
^^^
That can't be coming from the same CPU..

> Starting stack trace...
> Faulted in traceback, aborting...
> panic(splassert: if_down: want 1 have 256
> panic: spl assertion failure in if_down) at
> Faulted in traceback, aborting...
> panicsplassert: if_down: want 1 have 256
> +0x133panic: spl assertion failure in if_down
> Faulted in traceback, aborting...
>
> It's stuck at this point, I can't enter ddb.

Are you running with WITNESS on purpose?  Can you reproduce such a problem
without it?  I'm not saying it's WITNESS's fault, but it's clear that
WITNESS kernels aren't ready for production yet.