Re: [BUG] xfrm: unable to handle kernel NULL pointer dereference

2018-11-21 Thread Steffen Klassert
On Fri, Nov 16, 2018 at 08:12:46PM +0100, Steffen Klassert wrote:
> On Fri, Nov 16, 2018 at 08:48:00PM +0200, Lennert Buytenhek wrote:
> > On Sat, Nov 10, 2018 at 08:34:34PM +0100, Jean-Philippe Menil wrote:
> > 
> > > we're seeing unexpected crashes from kernel 4.15 to 4.18.17, using
> > > IPsec VTI interfaces, on several vpn hosts, since upgrade from 4.4.
> > 
> > I looked into this with Jean-Philippe, and it appears to be crashing
> > on a NULL pointer dereference in the inlined xfrm_policy_check() call
> > in vti_rcv_cb(), and specifically on the skb_dst(skb) dereference in
> > __xfrm_policy_check2():
> > 
> > return  (!net->xfrm.policy_count[dir] && !skb->sp) ||
> > (skb_dst(skb)->flags & DST_NOPOLICY) || <=
> > __xfrm_policy_check(sk, ndir, skb, family);
> > 
> > Commit 9e1437937807 ("xfrm: Fix NULL pointer dereference when
> > skb_dst_force clears the dst_entry.") fixes a very similar problem on
> > the output and forward paths, but our issue seems to be triggering on
> > the input path.
> 
> Yes, this is the same problem. skb_dst_force() does not
> really force a refcount anymore, it might clear the dst
> pointer instead (maybe this function should be renamed).

I plan to apply this patch to the ipsec tree:

[PATCH RFC] xfrm: Fix NULL pointer dereference in xfrm_input when skb_dst_force 
clears the dst_entry.

Since commit 222d7dbd258d ("net: prevent dst uses after free")
skb_dst_force() might clear the dst_entry attached to the skb.
The xfrm code don't expect this to happen, so we crash with
a NULL pointer dereference in this case.

Fix it by checking skb_dst(skb) for NULL after skb_dst_force()
and drop the packet in cast the dst_entry was cleared. We also
move the skb_dst_force() to a codepath that is not used when
the transformation was offloaded, because in this case we
don't have a  dst_entry attached to the skb.

The output and forwarding path was already fixed by
commit 9e1437937807 ("xfrm: Fix NULL pointer dereference when
skb_dst_force clears the dst_entry.")

Fixes: 222d7dbd258d ("net: prevent dst uses after free")
Reported-by: Jean-Philippe Menil 
Signed-off-by: Steffen Klassert 
---
 net/xfrm/xfrm_input.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 684c0bc01e2c..d5635908587f 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -346,6 +346,12 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 
spi, int encap_type)
 
skb->sp->xvec[skb->sp->len++] = x;
 
+   skb_dst_force(skb);
+   if (!skb_dst(skb)) {
+   XFRM_INC_STATS(net, LINUX_MIB_XFRMINERROR);
+   goto drop;
+   }
+
 lock:
spin_lock(&x->lock);
 
@@ -385,7 +391,6 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 
spi, int encap_type)
XFRM_SKB_CB(skb)->seq.input.low = seq;
XFRM_SKB_CB(skb)->seq.input.hi = seq_hi;
 
-   skb_dst_force(skb);
dev_hold(skb->dev);
 
if (crypto_done)
-- 
2.17.1



Re: [BUG] xfrm: unable to handle kernel NULL pointer dereference

2018-11-16 Thread Steffen Klassert
On Fri, Nov 16, 2018 at 08:48:00PM +0200, Lennert Buytenhek wrote:
> On Sat, Nov 10, 2018 at 08:34:34PM +0100, Jean-Philippe Menil wrote:
> 
> > we're seeing unexpected crashes from kernel 4.15 to 4.18.17, using
> > IPsec VTI interfaces, on several vpn hosts, since upgrade from 4.4.
> 
> I looked into this with Jean-Philippe, and it appears to be crashing
> on a NULL pointer dereference in the inlined xfrm_policy_check() call
> in vti_rcv_cb(), and specifically on the skb_dst(skb) dereference in
> __xfrm_policy_check2():
> 
>   return  (!net->xfrm.policy_count[dir] && !skb->sp) ||
>   (skb_dst(skb)->flags & DST_NOPOLICY) || <=
>   __xfrm_policy_check(sk, ndir, skb, family);
> 
> Commit 9e1437937807 ("xfrm: Fix NULL pointer dereference when
> skb_dst_force clears the dst_entry.") fixes a very similar problem on
> the output and forward paths, but our issue seems to be triggering on
> the input path.

Yes, this is the same problem. skb_dst_force() does not
really force a refcount anymore, it might clear the dst
pointer instead (maybe this function should be renamed).

Want to submit a fix? If not I'll go to fix that.

Thanks!


Re: [BUG] xfrm: unable to handle kernel NULL pointer dereference

2018-11-16 Thread Lennert Buytenhek
On Sat, Nov 10, 2018 at 08:34:34PM +0100, Jean-Philippe Menil wrote:

> we're seeing unexpected crashes from kernel 4.15 to 4.18.17, using
> IPsec VTI interfaces, on several vpn hosts, since upgrade from 4.4.

I looked into this with Jean-Philippe, and it appears to be crashing
on a NULL pointer dereference in the inlined xfrm_policy_check() call
in vti_rcv_cb(), and specifically on the skb_dst(skb) dereference in
__xfrm_policy_check2():

return  (!net->xfrm.policy_count[dir] && !skb->sp) ||
(skb_dst(skb)->flags & DST_NOPOLICY) || <=
__xfrm_policy_check(sk, ndir, skb, family);

Commit 9e1437937807 ("xfrm: Fix NULL pointer dereference when
skb_dst_force clears the dst_entry.") fixes a very similar problem on
the output and forward paths, but our issue seems to be triggering on
the input path.

This hack patch seems to make the crashes go away, and the printk added
triggers with approximately the same regularity as the crashes used
to occur, so the fix from 9e1437937807 probably needs to be extended
to the input path somewhat like this.

Thanks!


diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 352abca2605f..c666e29441b4 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -381,6 +381,12 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 
spi, int encap_type)
XFRM_SKB_CB(skb)->seq.input.hi = seq_hi;
 
skb_dst_force(skb);
+   if (!skb_dst(skb)) {
+   if (net_ratelimit())
+   printk(KERN_CRIT "OH CRAP\n");
+   goto drop;
+   }
+
dev_hold(skb->dev);
 
if (crypto_done)



> Attached, the offended oops against 4.18.
> 
> Output of decodedecode:
> 
> [ 37.134864] Code: 8b 44 24 70 0f c8 89 87 b4 00 00 00 48 8b 86 20 05 00 00
> 8b 80 f8 14 00 00 85 c0 75 05 48 85 d2 74 0e 48 8b 43 58 48 83 e0 fe  40
> 38 04 74 7d 44 89 b3 b4 00 00 00 49 8b 44 24 20 48 39 86 20
> All code
> 
>0:   8b 44 24 70 mov0x70(%rsp),%eax
>4:   0f c8   bswap  %eax
>6:   89 87 b4 00 00 00   mov%eax,0xb4(%rdi)
>c:   48 8b 86 20 05 00 00mov0x520(%rsi),%rax
>   13:   8b 80 f8 14 00 00   mov0x14f8(%rax),%eax
>   19:   85 c0   test   %eax,%eax
>   1b:   75 05   jne0x22
>   1d:   48 85 d2test   %rdx,%rdx
>   20:   74 0e   je 0x30
>   22:   48 8b 43 58 mov0x58(%rbx),%rax
>   26:   48 83 e0 fe and$0xfffe,%rax
>   2a:*  f6 40 38 04 testb  $0x4,0x38(%rax)  <-- trapping
> instruction
>   2e:   74 7d   je 0xad
>   30:   44 89 b3 b4 00 00 00mov%r14d,0xb4(%rbx)
>   37:   49 8b 44 24 20  mov0x20(%r12),%rax
>   3c:   48  rex.W
>   3d:   39  .byte 0x39
>   3e:   86 20   xchg   %ah,(%rax)
> 
> Code starting with the faulting instruction
> ===
>0:   f6 40 38 04 testb  $0x4,0x38(%rax)
>4:   74 7d   je 0x83
>6:   44 89 b3 b4 00 00 00mov%r14d,0xb4(%rbx)
>d:   49 8b 44 24 20  mov0x20(%r12),%rax
>   12:   48  rex.W
>   13:   39  .byte 0x39
>   14:   86 20   xchg   %ah,(%rax)
> 
> 
> if my understanding is correct, we fail here:
> 
> /build/linux-hwe-edge-yHKLQJ/linux-hwe-edge-4.18.0/include/net/xfrm.h:
> 1169return  (!net->xfrm.policy_count[dir] && !skb->sp) ||
>0x0b19 <+185>:   testb  $0x4,0x38(%rax)
>0x0b1d <+189>:   je 0xb9c 
> 
> (gdb) list *0x0b19
> 0xb19 is in vti_rcv_cb
> (/build/linux-hwe-edge-yHKLQJ/linux-hwe-edge-4.18.0/include/net/xfrm.h:1169).
> 1164int ndir = dir | (reverse ? XFRM_POLICY_MASK + 1 : 0);
> 1165
> 1166if (sk && sk->sk_policy[XFRM_POLICY_IN])
> 1167return __xfrm_policy_check(sk, ndir, skb, family);
> 1168
> 1169return  (!net->xfrm.policy_count[dir] && !skb->sp) ||
> 1170(skb_dst(skb)->flags & DST_NOPOLICY) ||
> 1171__xfrm_policy_check(sk, ndir, skb, family);
> 1172}
> 1173
> 
> I really have hard time to understand why skb seem to be freed twice.
> 
> I'm not able to repeat the bug in lab, but it happened regulary in prod,
> seem to depend of the workload.
> 
> Any help will be appreciated.
> 
> Let me know if you need further informations.
> 
> Regards,
> 
> Jean-Philippe

> [   31.154360] BUG: unable to handle kernel NULL pointer dereference at 
> 0038
> [   31.162233] PGD 0 P4D 0
> [   31.164786] Oops:  [#1] SMP PTI
> [   31.168291] CPU: 5 PID: 42 Comm: ksoftirqd/5 Not tainted 4.18.0-11-generic 
> #12~18.04.1-Ubuntu
> [   31.176854] Hardware name: Supermicro 

[BUG] xfrm: unable to handle kernel NULL pointer dereference

2018-11-10 Thread Jean-Philippe Menil

Hi guys,

we're seeing unexpected crashes from kernel 4.15 to 4.18.17, using IPsec 
VTI interfaces, on several vpn hosts, since upgrade from 4.4.


Attached, the offended oops against 4.18.

Output of decodedecode:

[ 37.134864] Code: 8b 44 24 70 0f c8 89 87 b4 00 00 00 48 8b 86 20 05 00 
00 8b 80 f8 14 00 00 85 c0 75 05 48 85 d2 74 0e 48 8b 43 58 48 83 e0 fe 
 40 38 04 74 7d 44 89 b3 b4 00 00 00 49 8b 44 24 20 48 39 86 20

All code

   0:   8b 44 24 70 mov0x70(%rsp),%eax
   4:   0f c8   bswap  %eax
   6:   89 87 b4 00 00 00   mov%eax,0xb4(%rdi)
   c:   48 8b 86 20 05 00 00mov0x520(%rsi),%rax
  13:   8b 80 f8 14 00 00   mov0x14f8(%rax),%eax
  19:   85 c0   test   %eax,%eax
  1b:   75 05   jne0x22
  1d:   48 85 d2test   %rdx,%rdx
  20:   74 0e   je 0x30
  22:   48 8b 43 58 mov0x58(%rbx),%rax
  26:   48 83 e0 fe and$0xfffe,%rax
  2a:*  f6 40 38 04 testb  $0x4,0x38(%rax)  <-- 
trapping instruction

  2e:   74 7d   je 0xad
  30:   44 89 b3 b4 00 00 00mov%r14d,0xb4(%rbx)
  37:   49 8b 44 24 20  mov0x20(%r12),%rax
  3c:   48  rex.W
  3d:   39  .byte 0x39
  3e:   86 20   xchg   %ah,(%rax)

Code starting with the faulting instruction
===
   0:   f6 40 38 04 testb  $0x4,0x38(%rax)
   4:   74 7d   je 0x83
   6:   44 89 b3 b4 00 00 00mov%r14d,0xb4(%rbx)
   d:   49 8b 44 24 20  mov0x20(%r12),%rax
  12:   48  rex.W
  13:   39  .byte 0x39
  14:   86 20   xchg   %ah,(%rax)


if my understanding is correct, we fail here:

/build/linux-hwe-edge-yHKLQJ/linux-hwe-edge-4.18.0/include/net/xfrm.h:
1169return  (!net->xfrm.policy_count[dir] && !skb->sp) ||
   0x0b19 <+185>:   testb  $0x4,0x38(%rax)
   0x0b1d <+189>:   je 0xb9c 

(gdb) list *0x0b19
0xb19 is in vti_rcv_cb 
(/build/linux-hwe-edge-yHKLQJ/linux-hwe-edge-4.18.0/include/net/xfrm.h:1169).

1164int ndir = dir | (reverse ? XFRM_POLICY_MASK + 1 : 0);
1165
1166if (sk && sk->sk_policy[XFRM_POLICY_IN])
1167return __xfrm_policy_check(sk, ndir, skb, family);
1168
1169return  (!net->xfrm.policy_count[dir] && !skb->sp) ||
1170(skb_dst(skb)->flags & DST_NOPOLICY) ||
1171__xfrm_policy_check(sk, ndir, skb, family);
1172}
1173

I really have hard time to understand why skb seem to be freed twice.

I'm not able to repeat the bug in lab, but it happened regulary in prod, 
seem to depend of the workload.


Any help will be appreciated.

Let me know if you need further informations.

Regards,

Jean-Philippe
[   31.154360] BUG: unable to handle kernel NULL pointer dereference at 
0038
[   31.162233] PGD 0 P4D 0
[   31.164786] Oops:  [#1] SMP PTI
[   31.168291] CPU: 5 PID: 42 Comm: ksoftirqd/5 Not tainted 4.18.0-11-generic 
#12~18.04.1-Ubuntu
[   31.176854] Hardware name: Supermicro Super Server/X10SDV-4C-7TP4F, BIOS 
1.0b 11/21/2016
[   31.184980] RIP: 0010:vti_rcv_cb+0xb9/0x1a0 [ip_vti]
[   31.189962] Code: 8b 44 24 70 0f c8 89 87 b4 00 00 00 48 8b 86 20 05 00 00 
8b 80 f8 14 00 00 85 c0 75 05 48 85 d2 74 0e 48 8b 43 58 48 83 e0 fe  40 38 
04 74 7d 44 89 b3 b4 00 00 00 49 8b 44 24 20 48 39 86 20
[   31.208916] RSP: 0018:bc61832e3920 EFLAGS: 00010246
[   31.214160] RAX:  RBX: 9a3504964a00 RCX: 0002
[   31.221328] RDX: 9a351add4080 RSI: 9a351aa08000 RDI: 9a3504964a00
[   31.228485] RBP: bc61832e3940 R08: 0004 R09: c0aa612b
[   31.235643] R10: 0008f09b99881884 R11: 1884bd4e2d6b1fac R12: 9a3507b31900
[   31.242803] R13: 9a3507b31000 R14:  R15: 9a3504964a00
[   31.249964] FS:  () GS:9a35bfd4() 
knlGS:
[   31.258077] CS:  0010 DS:  ES:  CR0: 80050033
[   31.263848] CR2: 0038 CR3: 00041a40a003 CR4: 003606e0
[   31.271004] DR0:  DR1:  DR2: 
[   31.278163] DR3:  DR6: fffe0ff0 DR7: 0400
[   31.285320] Call Trace:
[   31.287789]  xfrm4_rcv_cb+0x4a/0x70
[   31.291297]  xfrm_input+0x58f/0x8f0
[   31.294807]  vti_input+0xaa/0x110 [ip_vti]
[   31.298926]  vti_rcv+0x33/0x3c [ip_vti]
[   31.302783]  xfrm4_esp_rcv+0x39/0x50
[   31.306375]  ip_local_deliver_finish+0x62/0x200
[   31.310923]  ip_local_deliver+0xdf/0xf0
[   31.314775]  ? ip_rcv_finish+0x420/0x420
[   31.318718]  ip_rcv_finish+0x126/0x420
[   31.322486]  ip_rcv+0x28f/0x360
[   31.325655]  ? inet_del_offload+0x40/0x40
[   31.329686]  __netif_receive_skb_core+0x48c/0xb70
[   31.