Re: [BUG] xfrm: unable to handle kernel NULL pointer dereference
On Fri, Nov 16, 2018 at 08:12:46PM +0100, Steffen Klassert wrote: > On Fri, Nov 16, 2018 at 08:48:00PM +0200, Lennert Buytenhek wrote: > > On Sat, Nov 10, 2018 at 08:34:34PM +0100, Jean-Philippe Menil wrote: > > > > > we're seeing unexpected crashes from kernel 4.15 to 4.18.17, using > > > IPsec VTI interfaces, on several vpn hosts, since upgrade from 4.4. > > > > I looked into this with Jean-Philippe, and it appears to be crashing > > on a NULL pointer dereference in the inlined xfrm_policy_check() call > > in vti_rcv_cb(), and specifically on the skb_dst(skb) dereference in > > __xfrm_policy_check2(): > > > > return (!net->xfrm.policy_count[dir] && !skb->sp) || > > (skb_dst(skb)->flags & DST_NOPOLICY) || <= > > __xfrm_policy_check(sk, ndir, skb, family); > > > > Commit 9e1437937807 ("xfrm: Fix NULL pointer dereference when > > skb_dst_force clears the dst_entry.") fixes a very similar problem on > > the output and forward paths, but our issue seems to be triggering on > > the input path. > > Yes, this is the same problem. skb_dst_force() does not > really force a refcount anymore, it might clear the dst > pointer instead (maybe this function should be renamed). I plan to apply this patch to the ipsec tree: [PATCH RFC] xfrm: Fix NULL pointer dereference in xfrm_input when skb_dst_force clears the dst_entry. Since commit 222d7dbd258d ("net: prevent dst uses after free") skb_dst_force() might clear the dst_entry attached to the skb. The xfrm code don't expect this to happen, so we crash with a NULL pointer dereference in this case. Fix it by checking skb_dst(skb) for NULL after skb_dst_force() and drop the packet in cast the dst_entry was cleared. We also move the skb_dst_force() to a codepath that is not used when the transformation was offloaded, because in this case we don't have a dst_entry attached to the skb. The output and forwarding path was already fixed by commit 9e1437937807 ("xfrm: Fix NULL pointer dereference when skb_dst_force clears the dst_entry.") Fixes: 222d7dbd258d ("net: prevent dst uses after free") Reported-by: Jean-Philippe Menil Signed-off-by: Steffen Klassert --- net/xfrm/xfrm_input.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c index 684c0bc01e2c..d5635908587f 100644 --- a/net/xfrm/xfrm_input.c +++ b/net/xfrm/xfrm_input.c @@ -346,6 +346,12 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type) skb->sp->xvec[skb->sp->len++] = x; + skb_dst_force(skb); + if (!skb_dst(skb)) { + XFRM_INC_STATS(net, LINUX_MIB_XFRMINERROR); + goto drop; + } + lock: spin_lock(&x->lock); @@ -385,7 +391,6 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type) XFRM_SKB_CB(skb)->seq.input.low = seq; XFRM_SKB_CB(skb)->seq.input.hi = seq_hi; - skb_dst_force(skb); dev_hold(skb->dev); if (crypto_done) -- 2.17.1
Re: [BUG] xfrm: unable to handle kernel NULL pointer dereference
On Fri, Nov 16, 2018 at 08:48:00PM +0200, Lennert Buytenhek wrote: > On Sat, Nov 10, 2018 at 08:34:34PM +0100, Jean-Philippe Menil wrote: > > > we're seeing unexpected crashes from kernel 4.15 to 4.18.17, using > > IPsec VTI interfaces, on several vpn hosts, since upgrade from 4.4. > > I looked into this with Jean-Philippe, and it appears to be crashing > on a NULL pointer dereference in the inlined xfrm_policy_check() call > in vti_rcv_cb(), and specifically on the skb_dst(skb) dereference in > __xfrm_policy_check2(): > > return (!net->xfrm.policy_count[dir] && !skb->sp) || > (skb_dst(skb)->flags & DST_NOPOLICY) || <= > __xfrm_policy_check(sk, ndir, skb, family); > > Commit 9e1437937807 ("xfrm: Fix NULL pointer dereference when > skb_dst_force clears the dst_entry.") fixes a very similar problem on > the output and forward paths, but our issue seems to be triggering on > the input path. Yes, this is the same problem. skb_dst_force() does not really force a refcount anymore, it might clear the dst pointer instead (maybe this function should be renamed). Want to submit a fix? If not I'll go to fix that. Thanks!
Re: [BUG] xfrm: unable to handle kernel NULL pointer dereference
On Sat, Nov 10, 2018 at 08:34:34PM +0100, Jean-Philippe Menil wrote: > we're seeing unexpected crashes from kernel 4.15 to 4.18.17, using > IPsec VTI interfaces, on several vpn hosts, since upgrade from 4.4. I looked into this with Jean-Philippe, and it appears to be crashing on a NULL pointer dereference in the inlined xfrm_policy_check() call in vti_rcv_cb(), and specifically on the skb_dst(skb) dereference in __xfrm_policy_check2(): return (!net->xfrm.policy_count[dir] && !skb->sp) || (skb_dst(skb)->flags & DST_NOPOLICY) || <= __xfrm_policy_check(sk, ndir, skb, family); Commit 9e1437937807 ("xfrm: Fix NULL pointer dereference when skb_dst_force clears the dst_entry.") fixes a very similar problem on the output and forward paths, but our issue seems to be triggering on the input path. This hack patch seems to make the crashes go away, and the printk added triggers with approximately the same regularity as the crashes used to occur, so the fix from 9e1437937807 probably needs to be extended to the input path somewhat like this. Thanks! diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c index 352abca2605f..c666e29441b4 100644 --- a/net/xfrm/xfrm_input.c +++ b/net/xfrm/xfrm_input.c @@ -381,6 +381,12 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type) XFRM_SKB_CB(skb)->seq.input.hi = seq_hi; skb_dst_force(skb); + if (!skb_dst(skb)) { + if (net_ratelimit()) + printk(KERN_CRIT "OH CRAP\n"); + goto drop; + } + dev_hold(skb->dev); if (crypto_done) > Attached, the offended oops against 4.18. > > Output of decodedecode: > > [ 37.134864] Code: 8b 44 24 70 0f c8 89 87 b4 00 00 00 48 8b 86 20 05 00 00 > 8b 80 f8 14 00 00 85 c0 75 05 48 85 d2 74 0e 48 8b 43 58 48 83 e0 fe 40 > 38 04 74 7d 44 89 b3 b4 00 00 00 49 8b 44 24 20 48 39 86 20 > All code > >0: 8b 44 24 70 mov0x70(%rsp),%eax >4: 0f c8 bswap %eax >6: 89 87 b4 00 00 00 mov%eax,0xb4(%rdi) >c: 48 8b 86 20 05 00 00mov0x520(%rsi),%rax > 13: 8b 80 f8 14 00 00 mov0x14f8(%rax),%eax > 19: 85 c0 test %eax,%eax > 1b: 75 05 jne0x22 > 1d: 48 85 d2test %rdx,%rdx > 20: 74 0e je 0x30 > 22: 48 8b 43 58 mov0x58(%rbx),%rax > 26: 48 83 e0 fe and$0xfffe,%rax > 2a:* f6 40 38 04 testb $0x4,0x38(%rax) <-- trapping > instruction > 2e: 74 7d je 0xad > 30: 44 89 b3 b4 00 00 00mov%r14d,0xb4(%rbx) > 37: 49 8b 44 24 20 mov0x20(%r12),%rax > 3c: 48 rex.W > 3d: 39 .byte 0x39 > 3e: 86 20 xchg %ah,(%rax) > > Code starting with the faulting instruction > === >0: f6 40 38 04 testb $0x4,0x38(%rax) >4: 74 7d je 0x83 >6: 44 89 b3 b4 00 00 00mov%r14d,0xb4(%rbx) >d: 49 8b 44 24 20 mov0x20(%r12),%rax > 12: 48 rex.W > 13: 39 .byte 0x39 > 14: 86 20 xchg %ah,(%rax) > > > if my understanding is correct, we fail here: > > /build/linux-hwe-edge-yHKLQJ/linux-hwe-edge-4.18.0/include/net/xfrm.h: > 1169return (!net->xfrm.policy_count[dir] && !skb->sp) || >0x0b19 <+185>: testb $0x4,0x38(%rax) >0x0b1d <+189>: je 0xb9c > > (gdb) list *0x0b19 > 0xb19 is in vti_rcv_cb > (/build/linux-hwe-edge-yHKLQJ/linux-hwe-edge-4.18.0/include/net/xfrm.h:1169). > 1164int ndir = dir | (reverse ? XFRM_POLICY_MASK + 1 : 0); > 1165 > 1166if (sk && sk->sk_policy[XFRM_POLICY_IN]) > 1167return __xfrm_policy_check(sk, ndir, skb, family); > 1168 > 1169return (!net->xfrm.policy_count[dir] && !skb->sp) || > 1170(skb_dst(skb)->flags & DST_NOPOLICY) || > 1171__xfrm_policy_check(sk, ndir, skb, family); > 1172} > 1173 > > I really have hard time to understand why skb seem to be freed twice. > > I'm not able to repeat the bug in lab, but it happened regulary in prod, > seem to depend of the workload. > > Any help will be appreciated. > > Let me know if you need further informations. > > Regards, > > Jean-Philippe > [ 31.154360] BUG: unable to handle kernel NULL pointer dereference at > 0038 > [ 31.162233] PGD 0 P4D 0 > [ 31.164786] Oops: [#1] SMP PTI > [ 31.168291] CPU: 5 PID: 42 Comm: ksoftirqd/5 Not tainted 4.18.0-11-generic > #12~18.04.1-Ubuntu > [ 31.176854] Hardware name: Supermicro
[BUG] xfrm: unable to handle kernel NULL pointer dereference
Hi guys, we're seeing unexpected crashes from kernel 4.15 to 4.18.17, using IPsec VTI interfaces, on several vpn hosts, since upgrade from 4.4. Attached, the offended oops against 4.18. Output of decodedecode: [ 37.134864] Code: 8b 44 24 70 0f c8 89 87 b4 00 00 00 48 8b 86 20 05 00 00 8b 80 f8 14 00 00 85 c0 75 05 48 85 d2 74 0e 48 8b 43 58 48 83 e0 fe 40 38 04 74 7d 44 89 b3 b4 00 00 00 49 8b 44 24 20 48 39 86 20 All code 0: 8b 44 24 70 mov0x70(%rsp),%eax 4: 0f c8 bswap %eax 6: 89 87 b4 00 00 00 mov%eax,0xb4(%rdi) c: 48 8b 86 20 05 00 00mov0x520(%rsi),%rax 13: 8b 80 f8 14 00 00 mov0x14f8(%rax),%eax 19: 85 c0 test %eax,%eax 1b: 75 05 jne0x22 1d: 48 85 d2test %rdx,%rdx 20: 74 0e je 0x30 22: 48 8b 43 58 mov0x58(%rbx),%rax 26: 48 83 e0 fe and$0xfffe,%rax 2a:* f6 40 38 04 testb $0x4,0x38(%rax) <-- trapping instruction 2e: 74 7d je 0xad 30: 44 89 b3 b4 00 00 00mov%r14d,0xb4(%rbx) 37: 49 8b 44 24 20 mov0x20(%r12),%rax 3c: 48 rex.W 3d: 39 .byte 0x39 3e: 86 20 xchg %ah,(%rax) Code starting with the faulting instruction === 0: f6 40 38 04 testb $0x4,0x38(%rax) 4: 74 7d je 0x83 6: 44 89 b3 b4 00 00 00mov%r14d,0xb4(%rbx) d: 49 8b 44 24 20 mov0x20(%r12),%rax 12: 48 rex.W 13: 39 .byte 0x39 14: 86 20 xchg %ah,(%rax) if my understanding is correct, we fail here: /build/linux-hwe-edge-yHKLQJ/linux-hwe-edge-4.18.0/include/net/xfrm.h: 1169return (!net->xfrm.policy_count[dir] && !skb->sp) || 0x0b19 <+185>: testb $0x4,0x38(%rax) 0x0b1d <+189>: je 0xb9c (gdb) list *0x0b19 0xb19 is in vti_rcv_cb (/build/linux-hwe-edge-yHKLQJ/linux-hwe-edge-4.18.0/include/net/xfrm.h:1169). 1164int ndir = dir | (reverse ? XFRM_POLICY_MASK + 1 : 0); 1165 1166if (sk && sk->sk_policy[XFRM_POLICY_IN]) 1167return __xfrm_policy_check(sk, ndir, skb, family); 1168 1169return (!net->xfrm.policy_count[dir] && !skb->sp) || 1170(skb_dst(skb)->flags & DST_NOPOLICY) || 1171__xfrm_policy_check(sk, ndir, skb, family); 1172} 1173 I really have hard time to understand why skb seem to be freed twice. I'm not able to repeat the bug in lab, but it happened regulary in prod, seem to depend of the workload. Any help will be appreciated. Let me know if you need further informations. Regards, Jean-Philippe [ 31.154360] BUG: unable to handle kernel NULL pointer dereference at 0038 [ 31.162233] PGD 0 P4D 0 [ 31.164786] Oops: [#1] SMP PTI [ 31.168291] CPU: 5 PID: 42 Comm: ksoftirqd/5 Not tainted 4.18.0-11-generic #12~18.04.1-Ubuntu [ 31.176854] Hardware name: Supermicro Super Server/X10SDV-4C-7TP4F, BIOS 1.0b 11/21/2016 [ 31.184980] RIP: 0010:vti_rcv_cb+0xb9/0x1a0 [ip_vti] [ 31.189962] Code: 8b 44 24 70 0f c8 89 87 b4 00 00 00 48 8b 86 20 05 00 00 8b 80 f8 14 00 00 85 c0 75 05 48 85 d2 74 0e 48 8b 43 58 48 83 e0 fe 40 38 04 74 7d 44 89 b3 b4 00 00 00 49 8b 44 24 20 48 39 86 20 [ 31.208916] RSP: 0018:bc61832e3920 EFLAGS: 00010246 [ 31.214160] RAX: RBX: 9a3504964a00 RCX: 0002 [ 31.221328] RDX: 9a351add4080 RSI: 9a351aa08000 RDI: 9a3504964a00 [ 31.228485] RBP: bc61832e3940 R08: 0004 R09: c0aa612b [ 31.235643] R10: 0008f09b99881884 R11: 1884bd4e2d6b1fac R12: 9a3507b31900 [ 31.242803] R13: 9a3507b31000 R14: R15: 9a3504964a00 [ 31.249964] FS: () GS:9a35bfd4() knlGS: [ 31.258077] CS: 0010 DS: ES: CR0: 80050033 [ 31.263848] CR2: 0038 CR3: 00041a40a003 CR4: 003606e0 [ 31.271004] DR0: DR1: DR2: [ 31.278163] DR3: DR6: fffe0ff0 DR7: 0400 [ 31.285320] Call Trace: [ 31.287789] xfrm4_rcv_cb+0x4a/0x70 [ 31.291297] xfrm_input+0x58f/0x8f0 [ 31.294807] vti_input+0xaa/0x110 [ip_vti] [ 31.298926] vti_rcv+0x33/0x3c [ip_vti] [ 31.302783] xfrm4_esp_rcv+0x39/0x50 [ 31.306375] ip_local_deliver_finish+0x62/0x200 [ 31.310923] ip_local_deliver+0xdf/0xf0 [ 31.314775] ? ip_rcv_finish+0x420/0x420 [ 31.318718] ip_rcv_finish+0x126/0x420 [ 31.322486] ip_rcv+0x28f/0x360 [ 31.325655] ? inet_del_offload+0x40/0x40 [ 31.329686] __netif_receive_skb_core+0x48c/0xb70 [ 31.