Re: kernels > v4.12 oops/crash with ipsec-traffic: bisected to b838d5e1c5b6e57b10ec8af2268824041e3ea911: ipv4: mark DST_NOGC and remove the operation of dst_free()

2018-09-19 Thread Tobias Hommel
> After running for about 24 hours, I now encountered another panic. This time it
> is caused by an out of memory situation. Although the trace shows action in the
> filesystem code I'm posting it here because I cannot isolate the error and
> maybe it is caused by our NULL pointer bug or by the new fix.
> I do not have a serial console attached, so I could only attach a screenshot of
> the panic to this mail.
> 
> I am running v4.19-rc3 from git with the above mentioned patch applied.
> After 19 hours everything still looked fine, XfrmFwdHdrError value was at ~950.
> Overall memory usage shown by htop was at 1.2G/15.6G.
> I had htop running via ssh so I was able to see at least some status post
> mortem. Uptime: 23:50:57
> Overall memory usage was at 10.2G/15.6G and user processes were just
> using the usual amount of memory, so it looks like the kernel was eating up at
> least 9G of RAM.
> 
> Maybe this information is not very helpful for debugging, but it is at least a
> warning that something might still be wrong.
> 
> I'll try to gather some more information and keep you updated.

The system has now been running stable under load for more than 5 days and I was
not able to reproduce that OOM situation. I'll leave it at that; the fix for the
initial bug is fine for me.


Re: kernels > v4.12 oops/crash with ipsec-traffic: bisected to b838d5e1c5b6e57b10ec8af2268824041e3ea911: ipv4: mark DST_NOGC and remove the operation of dst_free()

2018-09-12 Thread Tobias Hommel
On Wed, Sep 12, 2018 at 10:50:46AM +0200, Steffen Klassert wrote:
> On Tue, Sep 11, 2018 at 09:02:48PM +0200, Tobias Hommel wrote:
> > > > Subject: [PATCH RFC] xfrm: Fix NULL pointer dereference when skb_dst_force
> > > > clears the dst_entry.
> > > > 
> > > > Since commit 222d7dbd258d ("net: prevent dst uses after free")
> > > > skb_dst_force() might clear the dst_entry attached to the skb.
> > > > The xfrm code doesn't expect this to happen, so we crash with
> > > > a NULL pointer dereference in this case. Fix it by checking
> > > > skb_dst(skb) for NULL after skb_dst_force() and drop the packet
> > > > in case the dst_entry was cleared.
> > > > 
> > > > Fixes: 222d7dbd258d ("net: prevent dst uses after free")
> > > > Reported-by: Tobias Hommel 
> > > > Reported-by: Kristian Evensen 
> > > > Reported-by: Wolfgang Walter 
> > > > Signed-off-by: Steffen Klassert 
> > > > ---
> > > 
> > > This patch fixes the problem here.
> > > 
> > > XfrmFwdHdrError gets around 80 at the very beginning and remains so.
> > > Probably this happens when some routes are changed/set then.
> > > 
> > > Regards and thanks,
> > 
> > Same here, we're now running stable for ~6 hours, XfrmFwdHdrError is at 220.
> > This is less than 1 lost packet per minute, which seems to be okay for now.
> 
> Thanks a lot for testing! This is now applied to the ipsec tree.

After running for about 24 hours, I now encountered another panic. This time it
is caused by an out of memory situation. Although the trace shows action in the
filesystem code I'm posting it here because I cannot isolate the error and
maybe it is caused by our NULL pointer bug or by the new fix.
I do not have a serial console attached, so I could only attach a screenshot of
the panic to this mail.

I am running v4.19-rc3 from git with the above mentioned patch applied.
After 19 hours everything still looked fine, XfrmFwdHdrError value was at ~950.
Overall memory usage shown by htop was at 1.2G/15.6G.
I had htop running via ssh so I was able to see at least some status post
mortem. Uptime: 23:50:57
Overall memory usage was at 10.2G/15.6G and user processes were just
using the usual amount of memory, so it looks like the kernel was eating up at
least 9G of RAM.

Maybe this information is not very helpful for debugging, but it is at least a
warning that something might still be wrong.

I'll try to gather some more information and keep you updated.
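The XfrmFwdHdrError figures reported in this thread are kernel xfrm statistics counters; with CONFIG_XFRM_STATISTICS enabled they are exported in /proc/net/xfrm_stat. Below is a minimal C sketch, assuming that file holds one whitespace-separated name/value pair per line, that extracts a single counter and converts it to an average rate. The helper names are hypothetical. By this arithmetic, ~950 errors in 19 hours is roughly 0.8 per minute, and 220 errors in 6 hours is roughly 0.6 per minute.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find "name" at the start of a line in text that is
 * assumed to look like /proc/net/xfrm_stat ("Name<spaces/tab>value\n").
 * Stores the value in *out and returns 0, or returns -1 if not found.
 */
static int xfrm_stat_get(const char *text, const char *name, unsigned long *out)
{
    size_t len;
    const char *line = text;

    assert(text != NULL && name != NULL && out != NULL);
    len = strlen(name);

    while (line != NULL && *line != '\0') {
        /* Exact name match: the name must be followed by whitespace. */
        if (strncmp(line, name, len) == 0 &&
            (line[len] == ' ' || line[len] == '\t'))
            return sscanf(line + len, "%lu", out) == 1 ? 0 : -1;

        line = strchr(line, '\n');      /* advance to the next line */
        if (line != NULL)
            line++;
    }
    return -1;
}

/* Average increments per minute for a counter observed over `hours` hours. */
static double per_minute(unsigned long count, double hours)
{
    return (double)count / (hours * 60.0);
}
```

Sampling the counter this way at two points in time and dividing by the elapsed time gives the same kind of rate estimate as the one made by hand in the thread.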


Re: kernels > v4.12 oops/crash with ipsec-traffic: bisected to b838d5e1c5b6e57b10ec8af2268824041e3ea911: ipv4: mark DST_NOGC and remove the operation of dst_free()

2018-09-11 Thread Tobias Hommel
> > Subject: [PATCH RFC] xfrm: Fix NULL pointer dereference when skb_dst_force
> > clears the dst_entry.
> > 
> > Since commit 222d7dbd258d ("net: prevent dst uses after free")
> > skb_dst_force() might clear the dst_entry attached to the skb.
> > The xfrm code doesn't expect this to happen, so we crash with
> > a NULL pointer dereference in this case. Fix it by checking
> > skb_dst(skb) for NULL after skb_dst_force() and drop the packet
> > in case the dst_entry was cleared.
> > 
> > Fixes: 222d7dbd258d ("net: prevent dst uses after free")
> > Reported-by: Tobias Hommel 
> > Reported-by: Kristian Evensen 
> > Reported-by: Wolfgang Walter 
> > Signed-off-by: Steffen Klassert 
> > ---
> >  net/xfrm/xfrm_output.c | 4 
> >  net/xfrm/xfrm_policy.c | 4 
> >  2 files changed, 8 insertions(+)
> > 
> > diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
> > index 89b178a78dc7..36d15a38ce5e 100644
> > --- a/net/xfrm/xfrm_output.c
> > +++ b/net/xfrm/xfrm_output.c
> > @@ -101,6 +101,10 @@ static int xfrm_output_one(struct sk_buff *skb, int err)
> >  spin_unlock_bh(>lock);
> > 
> > skb_dst_force(skb);
> > +   if (!skb_dst(skb)) {
> > +   XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTERROR);
> > +   goto error_nolock;
> > +   }
> > 
> > if (xfrm_offload(skb)) {
> > x->type_offload->encap(x, skb);
> > diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
> > index 7c5e8978aeaa..626e0f4d1749 100644
> > --- a/net/xfrm/xfrm_policy.c
> > +++ b/net/xfrm/xfrm_policy.c
> > @@ -2548,6 +2548,10 @@ int __xfrm_route_forward(struct sk_buff *skb, unsigned short family)
> >  }
> > 
> > skb_dst_force(skb);
> > +   if (!skb_dst(skb)) {
> > +   XFRM_INC_STATS(net, LINUX_MIB_XFRMFWDHDRERROR);
> > +   return 0;
> > +   }
> > 
> > dst = xfrm_lookup(net, skb_dst(skb), , NULL, XFRM_LOOKUP_QUEUE);
> > if (IS_ERR(dst)) {
> 
> This patch fixes the problem here.
> 
> XfrmFwdHdrError gets around 80 at the very beginning and remains so.
> Probably this happens when some routes are changed/set then.
> 
> Regards and thanks,

Same here, we're now running stable for ~6 hours, XfrmFwdHdrError is at 220.
This is less than 1 lost packet per minute, which seems to be okay for now.
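The failure mode the patch above addresses can be illustrated with a small userspace model. This is a sketch under simplifying assumptions, not kernel code: mock_dst, mock_skb and the mock_* functions are hypothetical, there is no RCU and no atomics, and the "take a reference only if the count is not already zero" rule merely mirrors the idea behind skb_dst_force() since commit 222d7dbd258d. It shows why skb_dst(skb) can be NULL right after forcing, and why the fix counts an error and drops the packet instead of dereferencing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real structures have many more fields. */
struct mock_dst {
    int refcnt;              /* 0: entry is already on its way to being freed */
};

struct mock_skb {
    struct mock_dst *dst;
    bool dst_noref;          /* true: dst is borrowed, no reference held */
};

/* Take a reference only if the count has not already reached zero. */
static bool mock_dst_hold_safe(struct mock_dst *dst)
{
    if (dst->refcnt == 0)
        return false;
    dst->refcnt++;
    return true;
}

/* Turn a borrowed dst into a held one; if that fails, clear the dst. */
static void mock_skb_dst_force(struct mock_skb *skb)
{
    if (skb->dst_noref) {
        assert(skb->dst != NULL);
        if (!mock_dst_hold_safe(skb->dst))
            skb->dst = NULL;
        skb->dst_noref = false;
    }
}

/* Caller side of the fix: re-check the dst after forcing it. */
static int mock_route_forward(struct mock_skb *skb, unsigned long *fwd_hdr_errors)
{
    mock_skb_dst_force(skb);
    if (skb->dst == NULL) {
        (*fwd_hdr_errors)++;   /* plays the role of LINUX_MIB_XFRMFWDHDRERROR */
        return 0;              /* 0: caller drops the packet */
    }
    return 1;                  /* skb->dst is safe to dereference */
}
```

A packet whose dst is still alive goes through with an extra reference; one whose dst already dropped to zero is dropped and counted, which matches the small but non-zero XfrmFwdHdrError counts reported after applying the fix.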


Re: kernels > v4.12 oops/crash with ipsec-traffic: bisected to b838d5e1c5b6e57b10ec8af2268824041e3ea911: ipv4: mark DST_NOGC and remove the operation of dst_free()

2018-09-10 Thread Tobias Hommel
On Mon, Sep 10, 2018 at 08:37:39AM +0200, Steffen Klassert wrote:
...
> The other thing I wonder about is why Tobias bisected this to
> 
> commit b838d5e1c5b6e57b10ec8af2268824041e3ea911
> ipv4: mark DST_NOGC and remove the operation of dst_free()
> 
> from 'Jun 17 2017' and not to
> 
> commit 222d7dbd258dad4cd5241c43ef818141fad5a87a
> net: prevent dst uses after free
> 
> from 'Sep 21 2017'.
> 
> Maybe Tobias has seen two bugs. Before
> ("net: prevent dst uses after free"), it was the
> use after free, and after this fix it was a NULL
> pointer dereference of skb->dst.
> 
Uhm, yeah, I checked back, we actually had different bugs. My mistake, sorry
for the confusion.


Re: BUG: 4.14.11 unable to handle kernel NULL pointer dereference in xfrm_lookup

2018-09-06 Thread Tobias Hommel
Hey guys,

I finally got some time to do a bisect and we narrowed the problem down to:

b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit
commit b838d5e1c5b6e57b10ec8af2268824041e3ea911
Author: Wei Wang 
Date:   Sat Jun 17 10:42:32 2017 -0700

ipv4: mark DST_NOGC and remove the operation of dst_free()

With the previous preparation patches, we are ready to get rid of the
dst gc operation in ipv4 code and release dst based on refcnt only.
So this patch adds DST_NOGC flag for all IPv4 dst and remove the calls
to dst_free().
At this point, all dst created in ipv4 code do not use the dst gc
anymore and will be destroyed at the point when refcnt drops to 0.

Signed-off-by: Wei Wang 
Acked-by: Martin KaFai Lau 
Signed-off-by: David S. Miller 

:04 04 9b7e7fb641de6531fc7887473ca47ef7cb6a11da 831a73b71d3df1755f3e24c0d3c86d7a93fd55e2 M  net
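The commit message above describes the lifetime change: IPv4 dst entries are no longer reclaimed by the dst garbage collector but destroyed as soon as their reference count drops to 0. A minimal userspace sketch of that rule follows; the mock_* names are hypothetical and there is no RCU or atomics. It shows the two ways such a scheme goes wrong: taking a reference after the count reached zero touches a destroyed entry, and one release too many drives the count negative, which is the kind of condition the "refcnt: -1" messages reported later in the thread may point to.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a refcount-only lifetime (no garbage collector). */
struct mock_dst {
    int refcnt;
    bool destroyed;
    int underflows;            /* releases that drove the count below zero */
};

static void mock_dst_hold(struct mock_dst *dst)
{
    assert(!dst->destroyed);   /* holding a destroyed entry is a use after free */
    dst->refcnt++;
}

static void mock_dst_release(struct mock_dst *dst)
{
    int newrefcnt = --dst->refcnt;

    if (newrefcnt < 0)
        dst->underflows++;     /* where a "refcnt: -1"-style warning would fit */
    if (newrefcnt == 0)
        dst->destroyed = true; /* destroyed as soon as the last reference goes */
}
```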


I also saw there was a new thread some days ago reporting a similar problem. So
I put you guys (Wolfgang, Wei) into Cc.

Tobi

On Thu, Jun 14, 2018 at 10:38:01AM +0200, Kristian Evensen wrote:
> Hello,
> 
> On Tue, Jun 12, 2018 at 10:29 AM, Kristian Evensen
>  wrote:
> > Thanks for spending time on this. I will see what I can manage in
> > terms of a bisect. Our last good kernel was 4.9, so at least it
> > narrows the scope down a bit compared to 4.4 or 4.1.
> 
> I hope we might have got somewhere. While looking more into ipsec and
> 4.14, we noticed large performance regressions (-~20%) on some
> low-powered devices we are also using. We quickly identified the
> removal of the flow cache as the "culprit", and the performance
> regression is discussed in the netdev-thread for the removal of the
> cache ("xfrm: remove flow cache"). For the time being and in order to
> restore the performance, we have reverted the patch series removing
> the flow cache. When running our tests (on the APU) after the revert,
> we no longer see the crash. Before the revert, the APU would always
> crash within some hours. After the revert, our tests have been running
> for 24+ hours. Our test is quite basic: we establish 1, 2, 3, ..., 50
> tunnels and then run iperf on all tunnels in parallel. The tunnels are
> torn down between iterations.
> 
> We are still running the test and will keep doing so, but I thought I
> should share this finding in case it can help in fixing the error. I
> will report back in case we find out something more, and please let me
> know if you have any suggestions for things I can test. I don't, for
> example, know if it is safe to revert the flow cache commits one by one
> to pin the crash down further.
> 
> BR,
> Kristian


Re: BUG: 4.14.11 unable to handle kernel NULL pointer dereference in xfrm_lookup

2018-06-12 Thread Tobias Hommel
On Fri, Jun 08, 2018 at 10:41:37AM +0200, Kristian Evensen wrote:
> Hi,
> 
> On Wed, Jun 6, 2018 at 6:03 PM, Tobias Hommel  
> wrote:
> > Sorry, no progress so far; I currently don't have time to take a deeper look
> > into that. We're back to 4.1.6 right now.
> 
> Thanks for letting me know. In the project I am currently involved in,
> we unfortunately don't have the option of reverting the kernel, so we
> are finding ways to live with the error. We have been looking into the
> error a bit more, and have made the following observations:
> 
> * First of all, as discussed earlier in the thread, the error is
> triggered by dst_orig being NULL. Our current work-around is just to
> return from xfrm_lookup if dst_orig is NULL and this seems to work
> fine, the error doesn't happen that often (in our use-cases at least).
> * The machine we use for testing (and where we first saw the error) is
> used as initiator.
The machine where I encountered the bug is a "roadwarrior gateway", so it only
serves as a responder.

> * When we compare the logs from Strongswan with the ones from the
> kernel, it seems that the error is typically triggered when a tunnel
> is torn down or about to come up. We need quite a lot of tunnels for
> the error to trigger, usually around 30+. I guess this might point to
> some race or some condition not being met when packets are
> sent/received.
> * We see the error much more frequently when hardware encryption is enabled.
> * Yesterday, we upgraded the kernel from 4.14.34 to 4.14.48, and the
> error happens much less frequently. I see that 4.14.48 includes
> several IPsec fixes (for example the previously mentioned ("xfrm: Fix
> a race in the xdst pcpu cache.")).
> 
> BR,
> Kristian
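The work-around described above, returning early from xfrm_lookup when dst_orig is NULL, can be sketched in userspace as follows. This is not the kernel function: mock_dst, mock_dst_ops and mock_xfrm_lookup are hypothetical, and returning NULL for "no dst" is an assumption of the sketch, since callers of the real function may not handle a NULL result safely. The patch that was later applied to the ipsec tree takes a different route and checks skb_dst(skb) in the callers after skb_dst_force().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; only the fields the sketch needs. */
struct mock_dst_ops {
    unsigned short family;
};

struct mock_dst {
    const struct mock_dst_ops *ops;
};

/*
 * Returns dst_orig and reports the address family it read, or returns NULL
 * without touching memory when dst_orig is NULL.
 */
static const struct mock_dst *mock_xfrm_lookup(const struct mock_dst *dst_orig,
                                               unsigned short *family)
{
    assert(family != NULL);

    if (dst_orig == NULL)   /* the guard from the work-around */
        return NULL;

    /* First dereference: with dst_orig == NULL this is the faulting load. */
    *family = dst_orig->ops->family;
    return dst_orig;        /* a real lookup would go on to match xfrm policies */
}
```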


Re: BUG: 4.14.11 unable to handle kernel NULL pointer dereference in xfrm_lookup

2018-06-06 Thread Tobias Hommel
Hi,

On Wed, Jun 06, 2018 at 12:41:53PM +0200, Kristian Evensen wrote:
> Hi,
> 
> I am experiencing the same issue on a PC Engines APU2 running kernel
> 4.14.34, both with and without hardware encryption. With hw.
> encryption, the crash occurs within 2-4 hours. Without hw. encryption,
> it takes 7-8 hours. My setup is nothing crazy, between 7 and 20
> tunnels with heavy RX/TX.
> 
> On Fri, Feb 2, 2018 at 9:09 AM, Steffen Klassert
>  wrote:
> > Thanks for offering help, but I fear we have to wait until
> > Tobias has bisected it.
> 
> Has any progress been made here? I would like to try a bisect myself,

Sorry, no progress so far; I currently don't have time to take a deeper look
into that. We're back to 4.1.6 right now.

> but my board is running OpenWRT, which makes bisecting hard. The only
> observation I have so far, is that I did not see the issue with kernel
> 4.9.
> 
> BR,
> Kristian

Tobi


Re: BUG: 4.14.11 unable to handle kernel NULL pointer dereference in xfrm_lookup

2018-01-29 Thread Tobias Hommel
On Wed, Jan 24, 2018 at 10:59:21AM +0100, Steffen Klassert wrote:
> On Fri, Jan 19, 2018 at 03:45:46PM +0100, Tobias Hommel wrote:
> > 
> > I tried to strip down the system configuration and was able to reproduce the
> > problem with a minimal configuration:
> > * ipsets are not used anymore
> > * no firewall markings are used any longer
> > * iptables are "completely empty", i.e. all policies set to ACCEPT and there is
> >   no rule in any table
> > * no additional routing policies (ip rule) except the default ones
> > * only main routing table is used
> > * using a "minimal" kernel config:
> >  * run `make defconfig`
> >  * add basic things (ESP, IGB driver, some crypto algorithms)
> >  * add options required to boot up the system (TPM crypt, some device mapper
> >    options, overlayfs)
> > 
> > I attached the minimal config (minimal.config) and the defconfig for reference
> > (minimal.defconfig).
> > (minimal.defconfig).
> > 
> > The setup is really simple now, the gateway is forwarding HTTP connections
> > between eth1(IPSec tunnels) and eth0 without any firewall, NAT, whatsoever.
> 
> Thanks a lot for your debugging effort!
> 
> > 
> > The only thing I can think of are the rather aggressive roadwarrior clients.
> > There are 750 roadwarriors that are controlled by a script which starts and
> > stops the IPSec connection.
> 
> I still can't reproduce it with my tests. This is probably some race
> triggered due to your aggressive roadwarrior setup which I don't have.
> 
> > I tried 4.15-rc8 and have the same problem here (see attached
> > kernel-4.15-rc8.log). SMP affinity for IRQs has changed in 4.15 and something's
> 
> There is one patch that could influence this which is not in v4.15-rc8:
> 
> commit 76a4201191814a0061cb5c861fafb9ecaa764846
> ("xfrm: Fix a race in the xdst pcpu cache.")
> 
> It is included in v4.15-rc9.
I already tested that one with 4.14 some weeks ago, when it appeared on the
mailing list, without any luck.

> 
> If this does not fix your problem, I'm out of ideas. In this case
> I have to ask to do a bisection to find the offending commit.
> 
I'll do a bisect session then. It'll take some time though as the hardware is
currently occupied with other tests. I'll keep you up-to-date about the
results.


Re: BUG: 4.14.11 unable to handle kernel NULL pointer dereference in xfrm_lookup

2018-01-10 Thread Tobias Hommel
On Wed, Jan 10, 2018 at 08:30:38AM +0100, Steffen Klassert wrote:
> On Tue, Jan 09, 2018 at 03:49:21PM +0100, Tobias Hommel wrote:
> > 
> > I copied the config from my 4.14.12 sources to a fresh 4.13.16 source tree, ran
> > `make olddefconfig` and built a new kernel.
> > The kernel config is attached as kernel-4.13.16.config.
> > The panic*.log files are kernel logs from different crashes of this 4.13.16
> > kernel, but all from the same scenario as before.
> > I also enabled CONFIG_DEBUG_INFO, so if any disassemblies are required, I'd be
> > happy to provide them.
> > 
> > So, the system still crashes, but the traces are completely different from
> > those with 4.14.12. This time there are also WARNINGs and "refcnt: -1" messages
> > sometimes before the actual panic, so not sure if there is maybe some other
> > problem. Still, the crashes all seem to be related to ip routing somehow.
> 
> Strange, you must do something that other people don't do.
> Do you have some uncommon netfilter rules, namespaces, etc?
No, no namespaces yet.
However, the box uses marks and routing based on marks. Firewall marks are a
bit strange sometimes, so I'll try to clean up everything and see if it is
possible to reproduce the bug without marks.

> 
> Please try to build your kernels with
> 
> CONFIG_ORC_UNWINDER (v4.14 and above)
> 
> and
> 
> CONFIG_KASAN
> 
> This can give better debug information (depending on the compiler
> version).
I'll also try that. I'm currently using GCC 5.4.0.

> 
> There are some things we can do now:
> 
> - Try v4.15-rc7, just to be sure that we don't search for
>   something that is already fixed.
And that one, too. All this will probably take some time though. ;-)
I'll keep you informed.

> 
> - Find a working kernel version and try to bisect.
> 
> - Minimalize the configuration with that the bug happens,
>   so that I can try to reproduce it here.
> 


Re: BUG: 4.14.11 unable to handle kernel NULL pointer dereference in xfrm_lookup

2018-01-09 Thread Tobias Hommel
On Tue, Jan 09, 2018 at 03:49:21PM +0100, Tobias Hommel wrote:
> On Tue, Jan 09, 2018 at 10:26:24AM +0100, Steffen Klassert wrote:
> > On Tue, Jan 09, 2018 at 10:06:51AM +0100, Tobias Hommel wrote:
> > > > 
> > > > You have CONFIG_INET_ESP_OFFLOAD enabled; this is new, so maybe it
> > > > still has some problems. You should not hit an offload codepath
> > > > because all your SAs are configured with UDP encapsulation which
> > > > is still not supported with offload.
I ran some new tests with 4.14.12. This time I removed encap=yes from the
strongswan config so I have plain ESP tunnels, without UDP encapsulation. Just
to be sure. It still crashes, the attached panic.noencap.log is pretty much
the same as the logs before.

> > > > 
> > > > Please try to disable GRO on both interfaces and see what happens:
> > > > 
> > > > ethtool -K eth0 gro off
> > > > ethtool -K eth1 gro off
> > > I had actually already tried that with only eth1; to verify, I turned offloading
> > > off for both interfaces. Same problem: see attached panic.gro_off.log
> > > 
> > > > 
> > > > Then disable CONFIG_INET_ESP_OFFLOAD and try again.
> > > Rebuilt with CONFIG_INET_ESP_OFFLOAD disabled, same problem: see attached
> > > panic.esp_offload_disabled.log
> > 
> > So ESP offload is not the problem. Next thing that comes to my mind
> > is the flowcache removal, this was introduced with v4.14.
> > 
> > > 
> > > > 
> > > > This should show us if this feature is responsible for the bug.
> > > > 
> > > 
> > > I will try narrowing down the problem by trying out some older kernels for now.
> > 
> > Thanks!
> > 
> > Let me know about the results.
> 
> I copied the config from my 4.14.12 sources to a fresh 4.13.16 source tree, ran
> `make olddefconfig` and built a new kernel.
> The kernel config is attached as kernel-4.13.16.config.
> The panic*.log files are kernel logs from different crashes of this 4.13.16
> kernel, but all from the same scenario as before.
> I also enabled CONFIG_DEBUG_INFO, so if any disassemblies are required, I'd be
> happy to provide them.
> 
> So, the system still crashes, but the traces are completely different from
> those with 4.14.12. This time there are also WARNINGs and "refcnt: -1" messages
> sometimes before the actual panic, so not sure if there is maybe some other
> problem. Still, the crashes all seem to be related to ip routing somehow.
[ 2298.720212] BUG: unable to handle kernel NULL pointer dereference at 0020
[ 2298.728193] IP: xfrm_lookup+0x2a/0x7d0
[ 2298.731986] PGD 0 P4D 0 
[ 2298.734535] Oops:  [#1] SMP PTI
[ 2298.738035] Modules linked in:
[ 2298.741121] CPU: 0 PID: 7 Comm: ksoftirqd/0 Not tainted 4.14.12 #3
[ 2298.747362] Hardware name: To be filled by O.E.M. CAR-2051/CAR, BIOS 1.01 07/11/2016
[ 2298.755136] task: a0dafb08dc00 task.stack: a211c004
[ 2298.761091] RIP: 0010:xfrm_lookup+0x2a/0x7d0
[ 2298.765403] RSP: 0018:a211c0043ad0 EFLAGS: 00010246
[ 2298.770656] RAX:  RBX: 87074080 RCX: 
[ 2298.777851] RDX: a211c0043b48 RSI:  RDI: 87074080
[ 2298.785025] RBP: 87074080 R08: 0002 R09: 
[ 2298.792184] R10: 0020 R11: 0020 R12: a211c0043b48
[ 2298.799351] R13:  R14: 0002 R15: a0dafb240078
[ 2298.806511] FS:  () GS:a0daffc0() knlGS:
[ 2298.814647] CS:  0010 DS:  ES:  CR0: 80050033
[ 2298.820428] CR2: 0020 CR3: 000177dcc000 CR4: 001006f0
[ 2298.827587] Call Trace:
[ 2298.830045]  __xfrm_route_forward+0xa4/0x110
[ 2298.834340]  ip_forward+0x3da/0x450
[ 2298.837851]  ? ip_rcv_finish+0x61/0x390
[ 2298.841708]  ip_rcv+0x2b5/0x380
[ 2298.844871]  ? inet_del_offload+0x30/0x30
[ 2298.848910]  __netif_receive_skb_core+0x751/0xb00
[ 2298.853640]  ? netif_receive_skb_internal+0x47/0xf0
[ 2298.858573]  ? inet_gro_receive+0x1fa/0x2a0
[ 2298.862785]  netif_receive_skb_internal+0x47/0xf0
[ 2298.867523]  dev_gro_receive+0x270/0x440
[ 2298.871487]  napi_gro_receive+0x28/0x90
[ 2298.875350]  igb_poll+0x600/0xe80
[ 2298.878695]  net_rx_action+0x1fc/0x310
[ 2298.882478]  __do_softirq+0xd5/0x1cf
[ 2298.886064]  run_ksoftirqd+0x14/0x30
[ 2298.889670]  smpboot_thread_fn+0xf9/0x150
[ 2298.893707]  kthread+0xf2/0x130
[ 2298.896869]  ? sort_range+0x20/0x20
[ 2298.900387]  ? kthread_park+0x60/0x60
[ 2298.904080]  ret_from_fork+0x1f/0x30
[ 2298.907684] Code: 00 41 57 41 56 45 89 c6 41 55 41 54 49 89 f5 55 53 49 89 d4 48 89 fb 48 83 ec 40 65 

Re: BUG: 4.14.11 unable to handle kernel NULL pointer dereference in xfrm_lookup

2018-01-09 Thread Tobias Hommel
On Tue, Jan 09, 2018 at 09:19:39AM +0100, Steffen Klassert wrote:
> On Mon, Jan 08, 2018 at 02:53:48PM +0100, Tobias Hommel wrote:
> 
> ...
> 
> > [  439.095554] BUG: unable to handle kernel NULL pointer dereference at 0020
> > [  439.103664] IP: xfrm_lookup+0x2a/0x7d0
> > [  439.107551] PGD 0 P4D 0 
> > [  439.110144] Oops:  [#1] SMP PTI
> > [  439.113653] Modules linked in:
> > [  439.116774] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 4.14.12 #1
> > [  439.122900] Hardware name: To be filled by O.E.M. CAR-2051/CAR, BIOS 1.01 07/11/2016
> > [  439.130769] task: 8cf33b0ea280 task.stack: 9492c009
> > [  439.136726] RIP: 0010:xfrm_lookup+0x2a/0x7d0
> > [  439.141005] RSP: 0018:8cf33fd83bd0 EFLAGS: 00010246
> > [  439.146315] RAX:  RBX: 87074080 RCX: 
> > [  439.153537] RDX: 8cf33fd83c48 RSI:  RDI: 87074080
> > [  439.160780] RBP: 87074080 R08: 0002 R09: 
> > [  439.167958] R10: 0020 R11: 0020 R12: 8cf33fd83c48
> > [  439.175115] R13:  R14: 0002 R15: 8cf33b240078
> > [  439.182337] FS:  () GS:8cf33fd8() knlGS:
> > [  439.190456] CS:  0010 DS:  ES:  CR0: 80050033
> > [  439.196227] CR2: 0020 CR3: 00013200a000 CR4: 001006e0
> > [  439.203386] Call Trace:
> > [  439.205869]  
> > [  439.207886]  __xfrm_route_forward+0xa4/0x110
> > [  439.212195]  ip_forward+0x3da/0x450
> > [  439.215696]  ? ip_rcv_finish+0x61/0x390
> > [  439.219542]  ip_rcv+0x2b5/0x380
> > [  439.222716]  ? inet_del_offload+0x30/0x30
> > [  439.226736]  __netif_receive_skb_core+0x751/0xb00
> > [  439.231469]  ? netif_receive_skb_internal+0x47/0xf0
> > [  439.236391]  netif_receive_skb_internal+0x47/0xf0
> > [  439.241150]  napi_gro_flush+0x50/0x70
> > [  439.244831]  napi_complete_done+0x90/0xd0
> > [  439.248872]  igb_poll+0x8fd/0xe80
> > [  439.252190]  net_rx_action+0x1fc/0x310
> > [  439.255978]  __do_softirq+0xd5/0x1cf
> > [  439.259584]  irq_exit+0xa3/0xb0
> > [  439.262763]  do_IRQ+0x45/0xc0
> > [  439.265772]  common_interrupt+0x95/0x95
> > [  439.269609]  
> > [  439.271733] RIP: 0010:cpuidle_enter_state+0x120/0x200
> > [  439.276810] RSP: 0018:9492c0093eb8 EFLAGS: 0282 ORIG_RAX: ff5d
> > [  439.284436] RAX: 8cf33fd9ea80 RBX: 0002 RCX: 00663c21ea0f
> > [  439.291604] RDX:  RSI: 36ca RDI: 
> > [  439.298772] RBP: 8cf33fda71e8 R08: 0003 R09: 0018
> > [  439.305930] R10:  R11: 057c R12: 00663c21ea0f
> > [  439.313089] R13: 00663c1c6c33 R14: 0002 R15: 
> > [  439.320259]  ? cpuidle_enter_state+0x11c/0x200
> > [  439.324740]  do_idle+0xd6/0x170
> > [  439.327885]  cpu_startup_entry+0x67/0x70
> > [  439.331837]  start_secondary+0x167/0x190
> > [  439.335788]  secondary_startup_64+0xa5/0xb0
> > [  439.340001] Code: 00 41 57 41 56 45 89 c6 41 55 41 54 49 89 f5 55 53 49 89 d4 48 89 fb 48 83 ec 40 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 <48> 8b 46 20 48 85 c9 44 0f b7 38 c7 44 24 0c 00 00 00 00 0f 84 
> > [  439.358988] RIP: xfrm_lookup+0x2a/0x7d0 RSP: 8cf33fd83bd0
> > [  439.364759] CR2: 0020
> > [  439.368105] ---[ end trace c6b298b556ea7769 ]---
> > [  439.372752] Kernel panic - not syncing: Fatal exception in interrupt
> > [  439.379255] Kernel Offset: 0x500 from 0x8100 (relocation range: 0x8000-0xbfff)
> > [  439.390029] Rebooting in 10 seconds..
> 
> ...
> 
> > 4230 :
> > 4230:   41 57   push   %r15
> > 4232:   41 56   push   %r14
> > 4234:   45 89 c6   mov    %r8d,%r14d
> > 4237:   41 55   push   %r13
> > 4239:   41 54   push   %r12
> > 423b:   49 89 f5   mov    %rsi,%r13
> > 423e:   55  push   %rbp
> > 423f:   53  push   %rbx
> > 4240:   49 89 d4   mov    %rdx,%r12
> > 4243:   48 89 fb   mov    %rdi,%rbx
> > 4246:   48 83 ec 40   sub    $0x40,%rsp
> > 424a:   65 48 8b 04 2
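Reading the "Code:" line of these traces: counting from the first byte of xfrm_lookup (41 57 at 0x4230 in the disassembly above), the marked byte sits at offset 0x2a, matching xfrm_lookup+0x2a, and `<48> 8b 46 20` decodes to `mov 0x20(%rsi),%rax`. In the x86-64 System V calling convention %rsi carries the second argument, which for xfrm_lookup is dst_orig, and the register dumps show RSI with an empty (presumably zero) value, so the load hits address 0x20, as reported in CR2. The next instruction, `44 0f b7 38` (`movzwl (%rax),%r15d`), is a 16-bit load, consistent with reading `dst_orig->ops->family`. The sketch below is a hypothetical mock, assuming that on a 64-bit target ops follows a pointer, a two-pointer rcu_head and another pointer in struct dst_entry; it only shows how such a layout puts the field at offset 0x20.

```c
#include <assert.h>
#include <stddef.h>

static_assert(sizeof(void *) == 8, "sketch assumes a 64-bit target");

/* Same shape as the kernel's callback_head: two pointer-sized members. */
struct mock_rcu_head {
    struct mock_rcu_head *next;
    void (*func)(struct mock_rcu_head *head);
};

/* Hypothetical mock of the leading fields assumed for struct dst_entry. */
struct mock_dst_entry {
    void *dev;
    struct mock_rcu_head rcu_head;
    void *child;
    const void *ops;   /* xfrm_lookup reads dst_orig->ops->family first */
};

/* Address a load of dst->ops touches when dst is NULL. */
static size_t faulting_address_for_null_dst(void)
{
    return offsetof(struct mock_dst_entry, ops);
}
```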

Re: BUG: 4.14.11 unable to handle kernel NULL pointer dereference in xfrm_lookup

2018-01-08 Thread Tobias Hommel
On Fri, Jan 05, 2018 at 09:55:23PM +, Tobias Hommel wrote:
> On Sat, Jan 06, 2018 at 12:27:11AM +0300, Ozgur wrote:
> > 
> > 
> > 06.01.2018, 00:20, "Tobias Hommel" <netdev-l...@genoetigt.de>:
> > > Hi,
> > 
> > Hi Tobias,
> > 
> > > I'm running into a NULL pointer dereference after updating from Linux 4.1.6 to
> > > 4.14.11 (see kernel log below). I tried 4.14.3 initially which did not work
> > > either.
> > > Anyone has an idea what is happening here?
> > >
> > > The affected machine has 2 active ethernet interfaces (igb driver) and acts as
> > > a VPN gateway running strongswan. There are several hundreds of IPSec
> > > roadwarriors connecting to eth1. eth0 connects to an infrastructure running an
> > > HTTP server.
> > > During my tests these roadwarriors connect to the gateway, sometimes download a
> > > large file from the HTTP server, disconnect and after a random delay repeat
> > > these steps.
> > >
> > > Some observations I made:
> > > * SMP Affinity for IRQs of the NICs Rx/Tx queues (/proc/irq/$IRQ/smp_affinity)
> > >   * all affinities set to default ff is broken
> > >   * setting affinity for all queues of both interfaces to the same CPU seems to
> > >     work fine (running stable for more than 1 day now)
> > >   * setting affinity of eth0 queues to CPU 1 and affinity of eth1 queues to CPU
> > >     2 is broken and seems to always trigger the bug on CPU 1
> > > * the top 6 entries of the call trace are the same every time the system
> > >   crashes, the other entries differ sometimes
> > >
> > > The bug is 100% reproducible on the Intel Atom machine from the log below and
> > > also on a HP ProLiant Gen6 (also igb driver).
> > > I can, of course, provide further information (CPU, NIC, kernel config, more
> > > traces, etc.) if required.
> > > If helpful I could also run tests on HP ProLiant Gen9 which has different NICs
> > > (tg3).
> > >
> > > [ 7998.489094] BUG: unable to handle kernel NULL pointer dereference at 0020
> > > [ 7998.496993] IP: xfrm_lookup+0x2a/0x7e0
> > > [ 7998.500759] PGD 0 P4D 0
> > > [ 7998.503316] Oops:  [#1] SMP PTI
> > > [ 7998.506835] Modules linked in:
> > > [ 7998.509929] CPU: 2 PID: 22 Comm: ksoftirqd/2 Not tainted 4.14.11 #3
> > > [ 7998.516244] Hardware name: To be filled by O.E.M. CAR-2051/CAR, BIOS 1.01 07/11/2016
> > > [ 7998.524039] task: 8826bb118000 task.stack: 947ac00f
> > > [ 7998.530004] RIP: 0010:xfrm_lookup+0x2a/0x7e0
> > > [ 7998.534298] RSP: 0018:947ac00f3b60 EFLAGS: 00010246
> > > [ 7998.539550] RAX:  RBX: 93074040 RCX: 
> > > [ 7998.546709] RDX: 947ac00f3bd8 RSI:  RDI: 93074040
> > > [ 7998.553868] RBP: 93074040 R08: 0002 R09: 0001
> > > [ 7998.561026] R10: 0032 R11:  R12: 947ac00f3bd8
> > > [ 7998.568212] R13:  R14: 0002 R15: 8826b69a8078
> > > [ 7998.575395] FS: () GS:8826bfc8() knlGS:
> > > [ 7998.583550] CS: 0010 DS:  ES:  CR0: 80050033
> > > [ 7998.589324] CR2: 0020 CR3: 0001781da000 CR4: 001006e0
> > > [ 7998.596482] Call Trace:
> > > [ 7998.598959] __xfrm_route_forward+0xa4/0x110
> > > [ 7998.603263] ip_forward+0x3e0/0x450
> > > [ 7998.606778] ? ip_rcv_finish+0x61/0x3a0
> > > [ 7998.610645] ip_rcv+0x2c4/0x390
> > > [ 7998.613818] ? inet_del_offload+0x30/0x30
> > > [ 7998.617857] __netif_receive_skb_core+0x751/0xb00
> > > [ 7998.622562] ? skb_send_sock+0x40/0x40
> > > [ 7998.626356] ? netif_receive_skb_internal+0x47/0xf0
> > > [ 7998.631252] netif_receive_skb_internal+0x47/0xf0
> > > [ 7998.635987] napi_gro_receive+0x70/0x90
> > > [ 7998.639835] gro_cell_poll+0x53/0x90
> > > [ 7998.643439] net_rx_action+0x1fc/0x310
> > > [ 7998.647210] ? rebalance_domains+0x101/0x2b0
> > > [ 7998.651500] __do_softirq+0xd5/0x1cf
> > > [ 7998.655105] run_ksoftirqd+0x14/0x30
> > > [ 7998.658712] smpboot_thr
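The smp_affinity values in the observations quoted above are hexadecimal CPU bitmasks: bit n set means CPU n may service the IRQ, so ff covers CPUs 0 to 7, and pinning a queue to one CPU means writing a mask with a single bit set. A small C sketch of that mask arithmetic follows; the helper names are hypothetical, CPU numbers are assumed to count from 0, and it stays within a single 32-bit mask group (machines with more CPUs use comma-separated groups).

```c
#include <assert.h>
#include <stdio.h>

/* Mask with only bit `cpu` set, formatted in hex as smp_affinity shows it. */
static int affinity_mask_for_cpu(unsigned int cpu, char *buf, size_t len)
{
    assert(cpu < 32);   /* sketch stays within one 32-bit mask group */
    return snprintf(buf, len, "%lx", 1UL << cpu);
}

/* Mask with bits 0..ncpus-1 set, e.g. 8 CPUs -> "ff". */
static int affinity_mask_all(unsigned int ncpus, char *buf, size_t len)
{
    unsigned long mask;

    assert(ncpus >= 1 && ncpus <= 32);
    mask = (ncpus == 32) ? 0xffffffffUL : ((1UL << ncpus) - 1UL);
    return snprintf(buf, len, "%lx", mask);
}
```

For example, the string produced for CPU 2 is "4", which would be written with something like `echo 4 > /proc/irq/$IRQ/smp_affinity` for each queue IRQ.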

Re: BUG: 4.14.11 unable to handle kernel NULL pointer dereference in xfrm_lookup

2018-01-05 Thread Tobias Hommel
On Fri, Jan 05, 2018 at 09:51:16PM +, Holger Hoffstätte wrote:
> On Fri, 05 Jan 2018 22:13:23 +0100, Tobias Hommel wrote:
> 
> > Hi,
> > 
> > I'm running into a NULL pointer dereference after updating from Linux 4.1.6 to
> > 4.14.11 (see kernel log below). I tried 4.14.3 initially which did not work
> > either.
> > Anyone has an idea what is happening here?
> 
> Try 4.14.12 because of:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.14.y=2d01ac8cc12b973668bf898b03bf9ffb12d83b83

I'm using tunnel mode here, not transport mode. Anyway, I tried it and got the
same problem:
[  275.655170] BUG: unable to handle kernel NULL pointer dereference at 0020
[  275.663230] IP: xfrm_lookup+0x2a/0x7d0
[  275.666986] PGD 0 P4D 0 
[  275.669579] Oops:  [#1] SMP PTI
[  275.673097] Modules linked in:
[  275.676182] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 4.14.12 #1
[  275.682215] Hardware name: To be filled by O.E.M. CAR-2051/CAR, BIOS 1.01 07/11/2016
[  275.690013] task: 9b43fb0ed080 task.stack: b0af0009
[  275.695960] RIP: 0010:xfrm_lookup+0x2a/0x7d0
[  275.700256] RSP: 0018:9b43ffd83bd0 EFLAGS: 00010246
[  275.705528] RAX:  RBX: 8e074080 RCX: 
[  275.712710] RDX: 9b43ffd83c48 RSI:  RDI: 8e074080
[  275.719895] RBP: 8e074080 R08: 0002 R09: 
[  275.727071] R10: 0020 R11: 0020 R12: 9b43ffd83c48
[  275.734248] R13:  R14: 0002 R15: 9b43fb240078
[  275.741415] FS:  () GS:9b43ffd8() knlGS:
[  275.749527] CS:  0010 DS:  ES:  CR0: 80050033
[  275.755307] CR2: 0020 CR3: 00013e00a000 CR4: 001006e0
[  275.762474] Call Trace:
[  275.764939]   
[  275.766986]  __xfrm_route_forward+0xa4/0x110
[  275.771282]  ip_forward+0x3da/0x450
[  275.774803]  ? ip_rcv_finish+0x61/0x390
[  275.778666]  ip_rcv+0x2b5/0x380
[  275.781840]  ? inet_del_offload+0x30/0x30
[  275.785860]  __netif_receive_skb_core+0x751/0xb00
[  275.790593]  ? tcp_gro_receive+0x24d/0x310
[  275.794716]  ? netif_receive_skb_internal+0x47/0xf0
[  275.799620]  netif_receive_skb_internal+0x47/0xf0
[  275.804381]  napi_gro_flush+0x50/0x70
[  275.808071]  napi_complete_done+0x90/0xd0
[  275.812111]  igb_poll+0x8fd/0xe80
[  275.815458]  net_rx_action+0x1fc/0x310
[  275.819227]  __do_softirq+0xd5/0x1cf
[  275.822834]  irq_exit+0xa3/0xb0
[  275.826003]  do_IRQ+0x45/0xc0
[  275.829004]  common_interrupt+0x95/0x95
[  275.832868]  
[  275.835002] RIP: 0010:cpuidle_enter_state+0x120/0x200
[  275.840076] RSP: 0018:b0af00093eb8 EFLAGS: 0282 ORIG_RAX: ff7d
[  275.847685] RAX: 9b43ffd9ea80 RBX: 0002 RCX: 00402e53956c
[  275.854844] RDX:  RSI: 36ca RDI: 
[  275.862039] RBP: 9b43ffda71e8 R08: 0003 R09: 0018
[  275.869222] R10:  R11: 0102 R12: 00402e53956c
[  275.876398] R13: 00402e4e8838 R14: 0002 R15: 
[  275.883568]  ? cpuidle_enter_state+0x11c/0x200
[  275.888023]  do_idle+0xd6/0x170
[  275.891177]  cpu_startup_entry+0x67/0x70
[  275.895129]  start_secondary+0x167/0x190
[  275.899080]  secondary_startup_64+0xa5/0xb0
[  275.903291] Code: 00 41 57 41 56 45 89 c6 41 55 41 54 49 89 f5 55 53 49 89 d4 48 89 fb 48 83 ec 40 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 <48> 8b 46 20 48 85 c9 44 0f b7 38 c7 44
[  275.922273] RIP: xfrm_lookup+0x2a/0x7d0 RSP: 9b43ffd83bd0
[  275.928070] CR2: 0020
[  275.931417] ---[ end trace 453df6e200be3ed0 ]---
[  275.936061] Kernel panic - not syncing: Fatal exception in interrupt
[  275.942566] Kernel Offset: 0xc00 from 0x8100 (relocation range: 0x8000-0xbfff)
[  275.953309] Rebooting in 10 seconds..

> 
> -h
> 


Re: BUG: 4.14.11 unable to handle kernel NULL pointer dereference in xfrm_lookup

2018-01-05 Thread Tobias Hommel
On Sat, Jan 06, 2018 at 12:27:11AM +0300, Ozgur wrote:
> 
> 
> 06.01.2018, 00:20, "Tobias Hommel" <netdev-l...@genoetigt.de>:
> > Hi,
> 
> Hi Tobias,
> 
> > I'm running into a NULL pointer dereference after updating from Linux 4.1.6
> > to 4.14.11 (see kernel log below). I tried 4.14.3 initially, which did not
> > work either.
> > Does anyone have an idea what is happening here?
> >
> > The affected machine has 2 active Ethernet interfaces (igb driver) and acts
> > as a VPN gateway running strongSwan. Several hundred IPsec road warriors
> > connect to eth1. eth0 connects to an infrastructure running an HTTP server.
> > During my tests, these road warriors connect to the gateway, sometimes
> > download a large file from the HTTP server, disconnect, and repeat these
> > steps after a random delay.
> >
> > Some observations I made:
> > * SMP affinity for the IRQs of the NICs' Rx/Tx queues
> >   (/proc/irq/$IRQ/smp_affinity):
> >   * all affinities at the default (ff): broken
> >   * affinity for all queues of both interfaces set to the same CPU: seems to
> >     work fine (running stable for more than 1 day now)
> >   * affinity of the eth0 queues set to CPU 1 and of the eth1 queues set to
> >     CPU 2: broken, and seems to always trigger the bug on CPU 1
> > * the top 6 entries of the call trace are the same every time the system
> >   crashes; the other entries sometimes differ
> >
> > The bug is 100% reproducible on the Intel Atom machine from the log below
> > and also on an HP ProLiant Gen6 (also igb driver).
> > I can, of course, provide further information (CPU, NIC, kernel config, more
> > traces, etc.) if required.
> > If helpful, I could also run tests on an HP ProLiant Gen9, which has
> > different NICs (tg3).
> >
> > [ 7998.489094] BUG: unable to handle kernel NULL pointer dereference at 0020
> > [ 7998.496993] IP: xfrm_lookup+0x2a/0x7e0
> > [ 7998.500759] PGD 0 P4D 0
> > [ 7998.503316] Oops:  [#1] SMP PTI
> > [ 7998.506835] Modules linked in:
> > [ 7998.509929] CPU: 2 PID: 22 Comm: ksoftirqd/2 Not tainted 4.14.11 #3
> > [ 7998.516244] Hardware name: To be filled by O.E.M. CAR-2051/CAR, BIOS 1.01 07/11/2016
> > [ 7998.524039] task: 8826bb118000 task.stack: 947ac00f
> > [ 7998.530004] RIP: 0010:xfrm_lookup+0x2a/0x7e0
> > [ 7998.534298] RSP: 0018:947ac00f3b60 EFLAGS: 00010246
> > [ 7998.539550] RAX:  RBX: 93074040 RCX:
> > [ 7998.546709] RDX: 947ac00f3bd8 RSI:  RDI: 93074040
> > [ 7998.553868] RBP: 93074040 R08: 0002 R09: 0001
> > [ 7998.561026] R10: 0032 R11:  R12: 947ac00f3bd8
> > [ 7998.568212] R13:  R14: 0002 R15: 8826b69a8078
> > [ 7998.575395] FS: () GS:8826bfc8() knlGS:
> > [ 7998.583550] CS: 0010 DS:  ES:  CR0: 80050033
> > [ 7998.589324] CR2: 0020 CR3: 0001781da000 CR4: 001006e0
> > [ 7998.596482] Call Trace:
> > [ 7998.598959] __xfrm_route_forward+0xa4/0x110
> > [ 7998.603263] ip_forward+0x3e0/0x450
> > [ 7998.606778] ? ip_rcv_finish+0x61/0x3a0
> > [ 7998.610645] ip_rcv+0x2c4/0x390
> > [ 7998.613818] ? inet_del_offload+0x30/0x30
> > [ 7998.617857] __netif_receive_skb_core+0x751/0xb00
> > [ 7998.622562] ? skb_send_sock+0x40/0x40
> > [ 7998.626356] ? netif_receive_skb_internal+0x47/0xf0
> > [ 7998.631252] netif_receive_skb_internal+0x47/0xf0
> > [ 7998.635987] napi_gro_receive+0x70/0x90
> > [ 7998.639835] gro_cell_poll+0x53/0x90
> > [ 7998.643439] net_rx_action+0x1fc/0x310
> > [ 7998.647210] ? rebalance_domains+0x101/0x2b0
> > [ 7998.651500] __do_softirq+0xd5/0x1cf
> > [ 7998.655105] run_ksoftirqd+0x14/0x30
> > [ 7998.658712] smpboot_thread_fn+0xf9/0x150
> > [ 7998.662723] kthread+0xef/0x130
> > [ 7998.665893] ? sort_range+0x20/0x20
> > [ 7998.669404] ? kthread_park+0x60/0x60
> > [ 7998.673098] ret_from_fork+0x1f/0x30
> > [ 7998.676674] Code: 00 41 57 41 56 45 89 c6 41 55 41 54 49 89 f5 55 53 49 89 d4 48 89 fb 48 83 ec 40 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 <48> 8b 46 20 48 85 c9 44 0f b7 38 c7 44 24 0c 00 00 00 00 0f 84
> > [ 7998.695681] RIP: xfrm_lookup+0x2a/0x7e0 RSP: 947ac00f3b60
> > [ 7998.701479] CR

BUG: 4.14.11 unable to handle kernel NULL pointer dereference in xfrm_lookup

2018-01-05 Thread Tobias Hommel
Hi,

I'm running into a NULL pointer dereference after updating from Linux 4.1.6 to
4.14.11 (see kernel log below). I tried 4.14.3 initially, which did not work
either.
Does anyone have an idea what is happening here?

The affected machine has 2 active Ethernet interfaces (igb driver) and acts as
a VPN gateway running strongSwan. Several hundred IPsec road warriors connect
to eth1. eth0 connects to an infrastructure running an HTTP server.
During my tests, these road warriors connect to the gateway, sometimes download
a large file from the HTTP server, disconnect, and repeat these steps after a
random delay.

Some observations I made:
* SMP affinity for the IRQs of the NICs' Rx/Tx queues (/proc/irq/$IRQ/smp_affinity):
  * all affinities at the default (ff): broken
  * affinity for all queues of both interfaces set to the same CPU: seems to
    work fine (running stable for more than 1 day now)
  * affinity of the eth0 queues set to CPU 1 and of the eth1 queues set to
    CPU 2: broken, and seems to always trigger the bug on CPU 1
* the top 6 entries of the call trace are the same every time the system
  crashes; the other entries sometimes differ

The bug is 100% reproducible on the Intel Atom machine from the log below and
also on an HP ProLiant Gen6 (also igb driver).
I can, of course, provide further information (CPU, NIC, kernel config, more
traces, etc.) if required.
If helpful, I could also run tests on an HP ProLiant Gen9, which has different
NICs (tg3).

[ 7998.489094] BUG: unable to handle kernel NULL pointer dereference at 0020
[ 7998.496993] IP: xfrm_lookup+0x2a/0x7e0
[ 7998.500759] PGD 0 P4D 0 
[ 7998.503316] Oops:  [#1] SMP PTI
[ 7998.506835] Modules linked in:
[ 7998.509929] CPU: 2 PID: 22 Comm: ksoftirqd/2 Not tainted 4.14.11 #3
[ 7998.516244] Hardware name: To be filled by O.E.M. CAR-2051/CAR, BIOS 1.01 07/11/2016
[ 7998.524039] task: 8826bb118000 task.stack: 947ac00f
[ 7998.530004] RIP: 0010:xfrm_lookup+0x2a/0x7e0
[ 7998.534298] RSP: 0018:947ac00f3b60 EFLAGS: 00010246
[ 7998.539550] RAX:  RBX: 93074040 RCX: 
[ 7998.546709] RDX: 947ac00f3bd8 RSI:  RDI: 93074040
[ 7998.553868] RBP: 93074040 R08: 0002 R09: 0001
[ 7998.561026] R10: 0032 R11:  R12: 947ac00f3bd8
[ 7998.568212] R13:  R14: 0002 R15: 8826b69a8078
[ 7998.575395] FS:  () GS:8826bfc8() knlGS:
[ 7998.583550] CS:  0010 DS:  ES:  CR0: 80050033
[ 7998.589324] CR2: 0020 CR3: 0001781da000 CR4: 001006e0
[ 7998.596482] Call Trace:
[ 7998.598959]  __xfrm_route_forward+0xa4/0x110
[ 7998.603263]  ip_forward+0x3e0/0x450
[ 7998.606778]  ? ip_rcv_finish+0x61/0x3a0
[ 7998.610645]  ip_rcv+0x2c4/0x390
[ 7998.613818]  ? inet_del_offload+0x30/0x30
[ 7998.617857]  __netif_receive_skb_core+0x751/0xb00
[ 7998.622562]  ? skb_send_sock+0x40/0x40
[ 7998.626356]  ? netif_receive_skb_internal+0x47/0xf0
[ 7998.631252]  netif_receive_skb_internal+0x47/0xf0
[ 7998.635987]  napi_gro_receive+0x70/0x90
[ 7998.639835]  gro_cell_poll+0x53/0x90
[ 7998.643439]  net_rx_action+0x1fc/0x310
[ 7998.647210]  ? rebalance_domains+0x101/0x2b0
[ 7998.651500]  __do_softirq+0xd5/0x1cf
[ 7998.655105]  run_ksoftirqd+0x14/0x30
[ 7998.658712]  smpboot_thread_fn+0xf9/0x150
[ 7998.662723]  kthread+0xef/0x130
[ 7998.665893]  ? sort_range+0x20/0x20
[ 7998.669404]  ? kthread_park+0x60/0x60
[ 7998.673098]  ret_from_fork+0x1f/0x30
[ 7998.676674] Code: 00 41 57 41 56 45 89 c6 41 55 41 54 49 89 f5 55 53 49 89 d4 48 89 fb 48 83 ec 40 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 <48> 8b 46 20 48 85 c9 44 0f b7 38 c7 44 24 0c 00 00 00 00 0f 84
[ 7998.695681] RIP: xfrm_lookup+0x2a/0x7e0 RSP: 947ac00f3b60
[ 7998.701479] CR2: 0020
[ 7998.704799] ---[ end trace 0544b1946919baad ]---
[ 7998.709442] Kernel panic - not syncing: Fatal exception in interrupt
[ 7998.715918] Kernel Offset: 0x1100 from 0x8100 (relocation range: 0x8000-0xbfff)

Best regards,

Tobias Hommel