On 2017/11/13 08:44, Martin Pieuchot wrote:
> On 12/11/17(Sun) 22:10, Stuart Henderson wrote:
> > On 2017/11/12 22:48, Martin Pieuchot wrote:
> > > On 12/11/17(Sun) 21:30, Stuart Henderson wrote:
> > > > iked box, GENERIC.MP + WITNESS, -current as of Friday 10th:
> > > 
> > > Weird, did you tweak "kern.splassert" on this box?   Otherwise it looks
> > > like a major corruption.
> > 
> > It would have kern.splassert=2. (I know this can cause problems
> > sometimes, though this would be the first time in 5+ years I've bumped
> > into it; most of my routers where I have a serial console have this set).
> 
> Well, the panic below corresponds to a value of 0 or > 3.

Confirmed, it was definitely set to 2.
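
For reference, a rough model of how the kern.splassert levels are
documented to behave once the failure handler is reached. This is a
sketch, not the code in the tree; the enum and function names are
hypothetical and exist only for illustration.

```c
/*
 * Hypothetical model of the kern.splassert levels; names are made up.
 * It describes what happens after an assertion failure is reported.
 */
enum splassert_action {
	ACT_LOG,		/* 1: print the "splassert:" message only */
	ACT_LOG_TRACE,		/* 2: message plus a stack trace attempt */
	ACT_LOG_TRACE_DDB,	/* 3: message, trace, then enter ddb */
	ACT_PANIC		/* any other value: panic */
};

enum splassert_action
splassert_action(int ctl)
{
	switch (ctl) {
	case 1:
		return ACT_LOG;
	case 2:
		return ACT_LOG_TRACE;
	case 3:
		return ACT_LOG_TRACE_DDB;
	default:
		return ACT_PANIC;
	}
}
```

Under this model a value of 2 yields a message and a trace, so the
"panic: spl assertion failure" lines in the quoted log would not be
expected from that setting alone.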

> > > > login: panic: kernel diagnostic assertion "_kernel_lock_held()" failed: 
> > > > file "/src/cvs-openbsd/sys/kern/uipc_socket2.c", line 310
> > > ^^^
> > > Looks like one CPU is triggering this.
> > > 
> > > > splassert: soassertlocked: want 1 have 256
> > > > 
> > > > panic: spl assertion failure in soassertlocked
> > > ^^^
> > > That can't be coming from the same CPU..
> > > 
> > > 
> > > 
> > > 
> > > > Starting stack trace...
> > > > Faulted in traceback, aborting...
> > > > panic(splassert: if_down: want 1 have 256
> > > > panic: spl assertion failure in if_down) at
> > > > Faulted in traceback, aborting...
> > > > panicsplassert: if_down: want 1 have 256
> > > > +0x133panic: spl assertion failure in if_down
> > > > Faulted in traceback, aborting...
> > > > 
> > > > <repeated a few times>
> > > > 
> > > > It's stuck at this point, I can't enter ddb.
> > > 
> > > Are you running with WITNESS on purpose?  Can you reproduce such a problem
> > > without it?  I'm not saying it's WITNESS's fault, but it's clear that
> > > WITNESS kernels aren't ready for production yet.
> > > 
> > 
> > I'm trying to get more information because it had either hung or
> > panicked previously (it didn't have serial connected at the time and
> > the machine was needed so it had to be rebooted before I had chance
> > to dig into it).
> 
> From which snapshot was the kernel that hung or panic'd?
> 

It was running this:

OpenBSD 6.2-current (GENERIC.MP) #199: Tue Nov  7 18:41:54 MST 2017

I've got it onto a remote control PDU now, and am now looking for some
machine with an old enough ssh client to be able to connect to the PDU :-|

Which kernel would be most useful to run now?

I have now moved it to -current GENERIC.MP with the "fast path" chunk
removed from amd64/amd64/fpu.c fpu_kernel_enter(), which we still suspect
may have some issues.
