Re: Double fault trap in rtable_l2
Hi,

I think I hit the same issue on a production box. At first I thought it
might be related to faulty memory, but the description fits my issue.
Same (minimal) trace.

I first encountered it after applying the latest syspatch (just the last
one, fixing a seemingly unrelated check for drm). The box encountered the
issue twice yesterday, and after reverting the patch it seems to be
stable again.

It's also related to iked/IKEv2, as I seem to be able to trigger it
relatively reliably by restarting iked. Since it's a VPN gateway and I am
working from home today, I don't want to trigger it now, but I should be
able to reapply the syspatch and test it tomorrow.

Why the drm-related syspatch would be involved doesn't make much sense to
me, though...

- Hinnerk

On Mon, Apr 20, 2020 at 08:21:27AM +0200, Otto Moerbeek wrote:
> On Mon, Apr 20, 2020 at 08:03:23AM +0200, Thomas de Grivel wrote:
>
> > Thanks Otto,
> >
> > Now I still don't know what could cause the double fault, I see no
> > interrupt-related code in rtable_l2. What am I missing? I would like
> > to investigate more but I'm not really a kernel developer.
>
> Traps are used for more things than interrupts.
>
> >
> > The Wikipedia page says it has to be a kernel bug, as in not from
> > userland. It also says it would probably not happen on SPARC64. x86
> > has some flawed designs at its core.
> >
> > I have a small diff for >2GB ext2fs partitions though I don't see how
> > it could be related?
>
> First retest with a kernel without diffs.
>
> If you collect more information you can file a bug report, see
> http://www.openbsd.org/report.html
>
> -Otto
>
> >
> > On Sun, Apr 19, 2020 at 17:30, Otto Moerbeek wrote:
> > >
> > > On Sun, Apr 19, 2020 at 10:26:20AM +0200, Thomas de Grivel wrote:
> > >
> > > > Hello,
> > > >
> > > > I got this error last night on an OpenBSD 6.6-stable amd64 on which I
> > > > recently enabled IKEv2:
> > > >
> > > > > kernel: double fault trap, code=0
> > > > > Stopped at rtable_l2+0x27: callq srp_enter+0x4
> > > >
> > > > I'm a bit puzzled by the "double fault trap" part of the message, what
> > > > does it mean?
> > > >
> > > > The relevant sources seem to be /sys/net/rtable.c and
> > > > /sys/kern/kern_srp.c though I don't really grok what I'm looking at
> > > > there either.
> > > >
> > > > --
> > > > Thomas de Grivel
> > > > kmx.io
> > > >
> > >
> > > Googling is not that hard: https://en.wikipedia.org/wiki/Double_fault
> > >
> > > -Otto
> > >
> >
> > --
> > Thomas de Grivel
> > kmx.io
>
Re: Double fault trap in rtable_l2
On Sat, Apr 18, 2020 at 11:28 PM Thomas de Grivel wrote:
> I got this error last night on an OpenBSD 6.6-stable amd64 on which I
> recently enabled IKEv2:
>
> > kernel: double fault trap, code=0
> > Stopped at rtable_l2+0x27: callq srp_enter+0x4
>

That was the *complete* output from ddb? Really? Not a screen full of
backtrace after that showing that it has a very deep stack?

As you might guess from my questions: the #1 cause of double fault traps
is kernel bugs causing deep recursion that runs off the end of the
allocated stack, triggering a page fault exception which itself faults
when it can't write the stack frame for the page fault. That "fault
while trying to fault" results in a double fault, which I configured to
be delivered on its own stack so that we can report this.

Fixing the deep recursion in this case would require you to provide the
full stack trace to the list, so that the correct parties can see it and
identify where it's incorrectly looping.

Philip Guenther
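The sequence Philip describes (recursion exhausts the stack, the page fault cannot push its own frame, the resulting double fault is delivered on a separate stack) can be sketched as a toy model. This is only an illustration, not OpenBSD code: the `Stack` class, the `simulate` function, and the `FRAME` and `TRAP_FRAME` sizes are all hypothetical values chosen for the example.

```python
class StackOverflow(Exception):
    """Raised by the toy stack when a push would exceed its size."""


class Stack:
    """A fixed-size stack that only tracks how many bytes are in use."""

    def __init__(self, name, size):
        self.name = name
        self.size = size
        self.used = 0

    def push(self, nbytes):
        if self.used + nbytes > self.size:
            raise StackOverflow(self.name)
        self.used += nbytes


FRAME = 64         # hypothetical size of one function call frame
TRAP_FRAME = 176   # hypothetical size of the frame pushed for an exception


def simulate(depth, kstack, df_stack):
    """Model `depth` nested calls on kstack and report what happens."""
    try:
        for _ in range(depth):
            kstack.push(FRAME)          # each nested call consumes a frame
    except StackOverflow:
        # First fault: the call ran off the end of the stack (page fault).
        try:
            kstack.push(TRAP_FRAME)     # the fault needs room on the same stack
        except StackOverflow:
            # Second fault while delivering the first: double fault.
            # It is delivered on its own stack, so it can still be reported.
            df_stack.push(TRAP_FRAME)
            return "double fault trap"
        return "page fault"
    return "ok"


if __name__ == "__main__":
    print(simulate(10, Stack("kernel", 4096), Stack("double-fault", 4096)))
    print(simulate(1000, Stack("kernel", 4096), Stack("double-fault", 4096)))
```

In this model a shallow call chain finishes normally, while a runaway recursion ends in the double fault path. That is why a full backtrace matters: it shows which functions keep calling each other until the stack is gone.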
Re: Double fault trap in rtable_l2
On Mon, Apr 20, 2020 at 08:03:23AM +0200, Thomas de Grivel wrote:

> Thanks Otto,
>
> Now I still don't know what could cause the double fault, I see no
> interrupt-related code in rtable_l2. What am I missing? I would like
> to investigate more but I'm not really a kernel developer.

Traps are used for more things than interrupts.

>
> The Wikipedia page says it has to be a kernel bug, as in not from
> userland. It also says it would probably not happen on SPARC64. x86
> has some flawed designs at its core.
>
> I have a small diff for >2GB ext2fs partitions though I don't see how
> it could be related?

First retest with a kernel without diffs.

If you collect more information you can file a bug report, see
http://www.openbsd.org/report.html

-Otto

>
> On Sun, Apr 19, 2020 at 17:30, Otto Moerbeek wrote:
> >
> > On Sun, Apr 19, 2020 at 10:26:20AM +0200, Thomas de Grivel wrote:
> >
> > > Hello,
> > >
> > > I got this error last night on an OpenBSD 6.6-stable amd64 on which I
> > > recently enabled IKEv2:
> > >
> > > > kernel: double fault trap, code=0
> > > > Stopped at rtable_l2+0x27: callq srp_enter+0x4
> > >
> > > I'm a bit puzzled by the "double fault trap" part of the message, what
> > > does it mean?
> > >
> > > The relevant sources seem to be /sys/net/rtable.c and
> > > /sys/kern/kern_srp.c though I don't really grok what I'm looking at
> > > there either.
> > >
> > > --
> > > Thomas de Grivel
> > > kmx.io
> > >
> >
> > Googling is not that hard: https://en.wikipedia.org/wiki/Double_fault
> >
> > -Otto
> >
>
> --
> Thomas de Grivel
> kmx.io
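To illustrate the point that traps cover more than interrupts, here is a small lookup of a few x86 exception vectors as listed in the Intel and AMD architecture manuals. It is a partial selection for illustration only and is not OpenBSD's trap table; the names `X86_EXCEPTIONS` and `describe` are made up for this sketch.

```python
# A few x86 exception vectors (partial list, for illustration only).
X86_EXCEPTIONS = {
    0: ("#DE", "divide error"),
    3: ("#BP", "breakpoint"),
    6: ("#UD", "invalid opcode"),
    8: ("#DF", "double fault"),
    13: ("#GP", "general protection fault"),
    14: ("#PF", "page fault"),
}


def describe(vector):
    """Return a one-line description of an exception vector."""
    if vector not in X86_EXCEPTIONS:
        return f"vector {vector}: not in this partial table"
    mnemonic, name = X86_EXCEPTIONS[vector]
    return f"vector {vector}: {mnemonic} ({name})"


if __name__ == "__main__":
    for v in sorted(X86_EXCEPTIONS):
        print(describe(v))
```

None of these are raised by external devices; they come from the CPU executing an instruction. The double fault (vector 8) always pushes an error code of zero, which is consistent with the "code=0" in the reported message.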
Re: Double fault trap in rtable_l2
Thanks Otto,

Now I still don't know what could cause the double fault, I see no
interrupt-related code in rtable_l2. What am I missing? I would like
to investigate more but I'm not really a kernel developer.

The Wikipedia page says it has to be a kernel bug, as in not from
userland. It also says it would probably not happen on SPARC64. x86
has some flawed designs at its core.

I have a small diff for >2GB ext2fs partitions though I don't see how
it could be related?

On Sun, Apr 19, 2020 at 17:30, Otto Moerbeek wrote:
>
> On Sun, Apr 19, 2020 at 10:26:20AM +0200, Thomas de Grivel wrote:
>
> > Hello,
> >
> > I got this error last night on an OpenBSD 6.6-stable amd64 on which I
> > recently enabled IKEv2:
> >
> > > kernel: double fault trap, code=0
> > > Stopped at rtable_l2+0x27: callq srp_enter+0x4
> >
> > I'm a bit puzzled by the "double fault trap" part of the message, what
> > does it mean?
> >
> > The relevant sources seem to be /sys/net/rtable.c and
> > /sys/kern/kern_srp.c though I don't really grok what I'm looking at
> > there either.
> >
> > --
> > Thomas de Grivel
> > kmx.io
> >
>
> Googling is not that hard: https://en.wikipedia.org/wiki/Double_fault
>
> -Otto

--
Thomas de Grivel
kmx.io
Re: Double fault trap in rtable_l2
On Sun, Apr 19, 2020 at 10:26:20AM +0200, Thomas de Grivel wrote:

> Hello,
>
> I got this error last night on an OpenBSD 6.6-stable amd64 on which I
> recently enabled IKEv2:
>
> > kernel: double fault trap, code=0
> > Stopped at rtable_l2+0x27: callq srp_enter+0x4
>
> I'm a bit puzzled by the "double fault trap" part of the message, what
> does it mean?
>
> The relevant sources seem to be /sys/net/rtable.c and
> /sys/kern/kern_srp.c though I don't really grok what I'm looking at
> there either.
>
> --
> Thomas de Grivel
> kmx.io
>

Googling is not that hard: https://en.wikipedia.org/wiki/Double_fault

-Otto
Double fault trap in rtable_l2
Hello,

I got this error last night on an OpenBSD 6.6-stable amd64 on which I
recently enabled IKEv2:

> kernel: double fault trap, code=0
> Stopped at rtable_l2+0x27: callq srp_enter+0x4

I'm a bit puzzled by the "double fault trap" part of the message, what
does it mean?

The relevant sources seem to be /sys/net/rtable.c and
/sys/kern/kern_srp.c though I don't really grok what I'm looking at
there either.

--
Thomas de Grivel
kmx.io