Re: Double fault trap in rtable_l2

2020-04-21 Hinnerk van Bruinehsen
Hi,

I guess I hit the same issue on a production box. At first I thought it
might be related to faulty memory, but the description fits my issue. Same
(minimal) trace.
I first encountered it after applying the latest syspatch (just the last
one, fixing a seemingly unrelated check for drm).
The box encountered the issue twice yesterday, and after reverting the
patch it seems to be stable again.
It's also related to iked/ikev2, as I seem to be able to trigger it
relatively reliably by restarting iked.
Since it's a VPN gateway and I am working from home today, I don't want to
trigger it now, but I should be able to reapply the syspatch and test it
tomorrow.

Why the drm-related syspatch would matter doesn't make much sense
to me, though...


- Hinnerk



On Mon, Apr 20, 2020 at 08:21:27AM +0200, Otto Moerbeek wrote:
> On Mon, Apr 20, 2020 at 08:03:23AM +0200, Thomas de Grivel wrote:
> 
> > Thanks Otto,
> > 
> > Now I still don't know what could cause the double fault, I see no
> > interrupt-related code in rtable_l2. What am I missing? I would like
> > to investigate more but I'm not really a kernel developer.
> 
> Traps are used for more things than interrupts.
> 
> > 
> > The Wikipedia page says it has to be a kernel bug, as in not from
> > userland. It also says it would probably not happen on SPARC64. x86
> > has some flawed designs at its core.
> > 
> > I have a small diff for >2GB ext2fs partitions, though I don't see how
> > it could be related?
> 
> First retest with a kernel without diffs.
> 
> If you collect more information you can file a bug report, see
> http://www.openbsd.org/report.html
> 
>   -Otto
> 
> 
> > 
> > On Sun, Apr 19, 2020 at 17:30, Otto Moerbeek wrote:
> > >
> > > On Sun, Apr 19, 2020 at 10:26:20AM +0200, Thomas de Grivel wrote:
> > >
> > > > Hello,
> > > >
> > > > I got this error last night on an OpenBSD 6.6-stable amd64 on which I
> > > > recently enabled IKEv2:
> > > >
> > > > > kernel: double fault trap, code=0
> > > > > Stopped at rtable_l2+0x27: callq   srp_enter+0x4
> > > >
> > > > I'm a bit puzzled by the "double fault trap" part of the message. What
> > > > does it mean?
> > > >
> > > > The relevant sources seem to be /sys/net/rtable.c and
> > > > /sys/kern/kern_srp.c though I don't really grok what I'm looking at
> > > > there either.
> > > >
> > > > --
> > > >  Thomas de Grivel
> > > >  kmx.io
> > > >
> > >
> > > Googling is not that hard: https://en.wikipedia.org/wiki/Double_fault
> > >
> > > -Otto
> > 
> > 
> > 
> > -- 
> >  Thomas de Grivel
> >  kmx.io
> 



Re: Double fault trap in rtable_l2

2020-04-20 Philip Guenther
On Sat, Apr 18, 2020 at 11:28 PM Thomas de Grivel wrote:

> I got this error last night on an OpenBSD 6.6-stable amd64 on which I
> recently enabled IKEv2:
>
> > kernel: double fault trap, code=0
> > Stopped at rtable_l2+0x27: callq   srp_enter+0x4
>

That was the *complete* output from ddb?  Really?  Not a screen full of
backtrace after that showing that it has a very deep stack?

As you might guess from my questions: the #1 cause of double fault traps
is a kernel bug causing deep recursion that runs off the end of the
allocated stack, triggering a page fault exception which itself faults when
the CPU can't write the stack frame for the page fault.  That "fault while
trying to fault" results in a double fault, which I configured to be
delivered on its own stack so that we can report this.
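To make the "fault while trying to fault" rule concrete, here is a minimal C sketch of the table the Intel SDM uses to decide when a second exception, raised while the CPU is still delivering a first one, escalates to a double fault (vector 8) instead of being handled serially. This is a simplified model of the CPU's rules, not OpenBSD code; `raises_double_fault()` and `enum exc_class` are hypothetical names. In the scenario above, the first exception is a page fault (vector 14) from running off the stack, and pushing its frame onto that same unmapped stack raises a second page fault, which is the page-fault/page-fault case. The SDM also states that the error code pushed for a double fault is always zero, which matches the `code=0` in the report.

```c
#include <stdbool.h>

/* Exception classes used by the Intel SDM double-fault rules. */
enum exc_class { EXC_BENIGN, EXC_CONTRIBUTORY, EXC_PAGE_FAULT };

/* Map an exception vector to its class (hypothetical helper). */
static enum exc_class classify(unsigned vec)
{
    switch (vec) {
    case 0:  /* #DE divide error */
    case 10: /* #TS invalid TSS */
    case 11: /* #NP segment not present */
    case 12: /* #SS stack-segment fault */
    case 13: /* #GP general protection */
        return EXC_CONTRIBUTORY;
    case 14: /* #PF page fault */
    case 20: /* #VE virtualization exception */
        return EXC_PAGE_FAULT;
    default: /* debug, NMI, breakpoint, INT n, external interrupts, ... */
        return EXC_BENIGN;
    }
}

/*
 * Return true if a `second` exception raised while delivering `first`
 * escalates to a double fault (#DF).  A first exception of #DF itself
 * (vector 8) is not modeled: a further fault while delivering #DF puts
 * the processor into shutdown (a triple fault).
 */
bool raises_double_fault(unsigned first, unsigned second)
{
    enum exc_class a = classify(first);
    enum exc_class b = classify(second);

    if (a == EXC_CONTRIBUTORY)
        return b == EXC_CONTRIBUTORY;
    if (a == EXC_PAGE_FAULT)
        return b == EXC_CONTRIBUTORY || b == EXC_PAGE_FAULT;
    return false; /* benign first exception: handled serially */
}
```

For example, `raises_double_fault(14, 14)` is true (a page fault while pushing the page-fault frame), while `raises_double_fault(13, 14)` is false: a page fault during delivery of a general protection fault is simply handled after it.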

Fixing the deep recursion in this case would require you to provide the full
stack trace to the list, so that the correct parties can see it and
identify where it's incorrectly looping.


Philip Guenther


Re: Double fault trap in rtable_l2

2020-04-19 Otto Moerbeek
On Mon, Apr 20, 2020 at 08:03:23AM +0200, Thomas de Grivel wrote:

> Thanks Otto,
> 
> Now I still don't know what could cause the double fault, I see no
> interrupt-related code in rtable_l2. What am I missing? I would like
> to investigate more but I'm not really a kernel developer.

Traps are used for more things than interrupts.
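On x86 this shows up directly in the vector table: vectors 0 to 31 are reserved for CPU exceptions (faults, traps and aborts raised by the instruction stream itself, apart from NMI at vector 2), and only vectors 32 to 255 are used for external interrupts and `INT n`. The sketch below maps a vector number to the names used in the Intel SDM; `x86_vector_name()` is a hypothetical helper, and the list deliberately stops at vector 20, lumping 21 to 31 together.

```c
#include <stddef.h>

/* x86 exception vectors 0-20, named as in the Intel SDM. */
static const char *const x86_exc_names[] = {
    "divide error (#DE)",             /* 0 */
    "debug (#DB)",                    /* 1 */
    "non-maskable interrupt (NMI)",   /* 2 */
    "breakpoint (#BP)",               /* 3 */
    "overflow (#OF)",                 /* 4 */
    "BOUND range exceeded (#BR)",     /* 5 */
    "invalid opcode (#UD)",           /* 6 */
    "device not available (#NM)",     /* 7 */
    "double fault (#DF)",             /* 8 */
    "coprocessor segment overrun",    /* 9 */
    "invalid TSS (#TS)",              /* 10 */
    "segment not present (#NP)",      /* 11 */
    "stack-segment fault (#SS)",      /* 12 */
    "general protection (#GP)",       /* 13 */
    "page fault (#PF)",               /* 14 */
    "reserved",                       /* 15 */
    "x87 floating-point error (#MF)", /* 16 */
    "alignment check (#AC)",          /* 17 */
    "machine check (#MC)",            /* 18 */
    "SIMD floating-point (#XM)",      /* 19 */
    "virtualization exception (#VE)", /* 20 */
};

/* Describe an interrupt vector; returns NULL for values above 255. */
const char *x86_vector_name(unsigned vec)
{
    if (vec < sizeof x86_exc_names / sizeof x86_exc_names[0])
        return x86_exc_names[vec];
    if (vec < 32)
        return "reserved or later-defined exception";
    if (vec < 256)
        return "external interrupt or INT n";
    return NULL;
}
```

So the "double fault trap" in the report is vector 8, an exception raised by the CPU itself, not a device interrupt.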

> 
> The Wikipedia page says it has to be a kernel bug, as in not from
> userland. It also says it would probably not happen on SPARC64. x86
> has some flawed designs at its core.
> 
> I have a small diff for >2GB ext2fs partitions, though I don't see how
> it could be related?

First retest with a kernel without diffs.

If you collect more information you can file a bug report, see
http://www.openbsd.org/report.html

-Otto


> 
> On Sun, Apr 19, 2020 at 17:30, Otto Moerbeek wrote:
> >
> > On Sun, Apr 19, 2020 at 10:26:20AM +0200, Thomas de Grivel wrote:
> >
> > > Hello,
> > >
> > > I got this error last night on an OpenBSD 6.6-stable amd64 on which I
> > > recently enabled IKEv2:
> > >
> > > > kernel: double fault trap, code=0
> > > > Stopped at rtable_l2+0x27: callq   srp_enter+0x4
> > >
> > > I'm a bit puzzled by the "double fault trap" part of the message. What
> > > does it mean?
> > >
> > > The relevant sources seem to be /sys/net/rtable.c and
> > > /sys/kern/kern_srp.c though I don't really grok what I'm looking at
> > > there either.
> > >
> > > --
> > >  Thomas de Grivel
> > >  kmx.io
> > >
> >
> > Googling is not that hard: https://en.wikipedia.org/wiki/Double_fault
> >
> > -Otto
> 
> 
> 
> -- 
>  Thomas de Grivel
>  kmx.io



Re: Double fault trap in rtable_l2

2020-04-19 Thomas de Grivel
Thanks Otto,

Now I still don't know what could cause the double fault, I see no
interrupt-related code in rtable_l2. What am I missing? I would like
to investigate more but I'm not really a kernel developer.

The Wikipedia page says it has to be a kernel bug, as in not from
userland. It also says it would probably not happen on SPARC64. x86
has some flawed designs at its core.

I have a small diff for >2GB ext2fs partitions, though I don't see how
it could be related?

On Sun, Apr 19, 2020 at 17:30, Otto Moerbeek wrote:
>
> On Sun, Apr 19, 2020 at 10:26:20AM +0200, Thomas de Grivel wrote:
>
> > Hello,
> >
> > I got this error last night on an OpenBSD 6.6-stable amd64 on which I
> > recently enabled IKEv2:
> >
> > > kernel: double fault trap, code=0
> > > Stopped at rtable_l2+0x27: callq   srp_enter+0x4
> >
> > I'm a bit puzzled by the "double fault trap" part of the message. What
> > does it mean?
> >
> > The relevant sources seem to be /sys/net/rtable.c and
> > /sys/kern/kern_srp.c though I don't really grok what I'm looking at
> > there either.
> >
> > --
> >  Thomas de Grivel
> >  kmx.io
> >
>
> Googling is not that hard: https://en.wikipedia.org/wiki/Double_fault
>
> -Otto



-- 
 Thomas de Grivel
 kmx.io



Re: Double fault trap in rtable_l2

2020-04-19 Otto Moerbeek
On Sun, Apr 19, 2020 at 10:26:20AM +0200, Thomas de Grivel wrote:

> Hello,
> 
> I got this error last night on an OpenBSD 6.6-stable amd64 on which I
> recently enabled IKEv2:
> 
> > kernel: double fault trap, code=0
> > Stopped at rtable_l2+0x27: callq   srp_enter+0x4
> 
> I'm a bit puzzled by the "double fault trap" part of the message. What
> does it mean?
> 
> The relevant sources seem to be /sys/net/rtable.c and
> /sys/kern/kern_srp.c though I don't really grok what I'm looking at
> there either.
> 
> -- 
>  Thomas de Grivel
>  kmx.io
> 

Googling is not that hard: https://en.wikipedia.org/wiki/Double_fault

-Otto



Double fault trap in rtable_l2

2020-04-19 Thomas de Grivel
Hello,

I got this error last night on an OpenBSD 6.6-stable amd64 on which I
recently enabled IKEv2:

> kernel: double fault trap, code=0
> Stopped at rtable_l2+0x27: callq   srp_enter+0x4

I'm a bit puzzled by the "double fault trap" part of the message. What
does it mean?

The relevant sources seem to be /sys/net/rtable.c and
/sys/kern/kern_srp.c though I don't really grok what I'm looking at
there either.

-- 
 Thomas de Grivel
 kmx.io