Re: kernel BUG at net/key/af_key.c:LINE!

2017-12-03 Thread Eric Biggers
On Wed, Nov 15, 2017 at 12:29:19PM +0100, Steffen Klassert wrote:
> On Fri, Nov 10, 2017 at 02:14:06PM +1100, Herbert Xu wrote:
> > On Fri, Nov 10, 2017 at 01:30:38PM +1100, Herbert Xu wrote:
> > > 
> > > I found the problem.  This crap is coming from clone_policy.  Now
> > > let me see where this code came from.
> > 
> > ---8<---
> > Subject: xfrm: Copy policy family in clone_policy
> > 
> > The syzbot found an ancient bug in the IPsec code.  When we cloned
> > a socket policy (for example, for a child TCP socket derived from a
> > listening socket), we did not copy the family field.  This results
> > in a live policy with a zero family field.  This triggers a BUG_ON
> > check in the af_key code when the cloned policy is retrieved.
> > 
> > This patch fixes it by copying the family field over.
> > 
> > Reported-by: syzbot 
> > Signed-off-by: Herbert Xu 
> 
> Patch applied, thanks Herbert!

And to tell the bot what fixes this:

#syz fix: xfrm: Copy policy family in clone_policy

Also, does this fix need to go to stable?  The commit doesn't have Cc: stable.


Re: kernel BUG at net/key/af_key.c:LINE!

2017-11-15 Thread Steffen Klassert
On Fri, Nov 10, 2017 at 02:14:06PM +1100, Herbert Xu wrote:
> On Fri, Nov 10, 2017 at 01:30:38PM +1100, Herbert Xu wrote:
> > 
> > I found the problem.  This crap is coming from clone_policy.  Now
> > let me see where this code came from.
> 
> ---8<---
> Subject: xfrm: Copy policy family in clone_policy
> 
> The syzbot found an ancient bug in the IPsec code.  When we cloned
> a socket policy (for example, for a child TCP socket derived from a
> listening socket), we did not copy the family field.  This results
> in a live policy with a zero family field.  This triggers a BUG_ON
> check in the af_key code when the cloned policy is retrieved.
> 
> This patch fixes it by copying the family field over.
> 
> Reported-by: syzbot 
> Signed-off-by: Herbert Xu 

Patch applied, thanks Herbert!


Re: kernel BUG at net/key/af_key.c:LINE!

2017-11-09 Thread Herbert Xu
On Fri, Nov 10, 2017 at 01:30:38PM +1100, Herbert Xu wrote:
> 
> I found the problem.  This crap is coming from clone_policy.  Now
> let me see where this code came from.

---8<---
Subject: xfrm: Copy policy family in clone_policy

The syzbot found an ancient bug in the IPsec code.  When we cloned
a socket policy (for example, for a child TCP socket derived from a
listening socket), we did not copy the family field.  This results
in a live policy with a zero family field.  This triggers a BUG_ON
check in the af_key code when the cloned policy is retrieved.

This patch fixes it by copying the family field over.

Reported-by: syzbot 
Signed-off-by: Herbert Xu 

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 8cafb3c..c238959 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1306,6 +1306,7 @@ static struct xfrm_policy *clone_policy(const struct xfrm_policy *old, int dir)
 		newp->xfrm_nr = old->xfrm_nr;
 		newp->index = old->index;
 		newp->type = old->type;
+		newp->family = old->family;
 		memcpy(newp->xfrm_vec, old->xfrm_vec,
 		       newp->xfrm_nr*sizeof(struct xfrm_tmpl));
 		spin_lock_bh(&net->xfrm.xfrm_policy_lock);
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
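
For readers following the patch, here is a minimal userspace sketch of the failure mode the commit message describes: a field-by-field clone that forgets one field. It is not the kernel code. `struct toy_policy`, `toy_clone_buggy` and `toy_clone_fixed` are hypothetical names, and the zero-filled starting state is an assumption standing in for the allocator.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>   /* AF_INET6 */

/* Hypothetical, cut-down stand-in for the policy structure. */
struct toy_policy {
    uint32_t index;
    uint8_t  type;
    uint16_t family;
};

/* Models the clone before the patch: family is never assigned,
 * so it keeps the zero it got from the (assumed) zeroed allocation. */
static struct toy_policy toy_clone_buggy(const struct toy_policy *old)
{
    struct toy_policy newp;

    memset(&newp, 0, sizeof(newp));
    newp.index = old->index;
    newp.type = old->type;
    return newp;
}

/* Models the clone after the patch: one extra assignment. */
static struct toy_policy toy_clone_fixed(const struct toy_policy *old)
{
    struct toy_policy newp = toy_clone_buggy(old);

    newp.family = old->family;
    return newp;
}
```

Cloning a policy whose family is AF_INET6 with the first function yields a copy whose family is 0, which matches the "live policy with a zero family field" in the description; the second function preserves it.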


Re: kernel BUG at net/key/af_key.c:LINE!

2017-11-09 Thread Herbert Xu
On Fri, Nov 10, 2017 at 01:11:45PM +1100, Herbert Xu wrote:
>
> Oh and this is an important clue.  We have two policies with
> identical index values.  The index value is meant to be unique
> so clearly something funny is going on.

I found the problem.  This crap is coming from clone_policy.  Now
let me see where this code came from.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: kernel BUG at net/key/af_key.c:LINE!

2017-11-09 Thread Herbert Xu
On Fri, Nov 10, 2017 at 01:04:59PM +1100, Herbert Xu wrote:
> 
> By castrating the reproducer to not perform a pfkey dump I have
> captured the corrupted policy via xfrm:
> 
> src ???/0 dst ???/0 uid 0
> socket in action allow index 2083 priority 0 ptype main share any flag  (0x)
> lifetime config:
>   limit: soft 0(bytes), hard 0(bytes)
>   limit: soft 0(packets), hard 0(packets)
>   expire add: soft 0(sec), hard 0(sec)
>   expire use: soft 0(sec), hard 0(sec)
> lifetime current:
>   0(bytes), 0(packets)
>   add 2017-11-10 09:58:17 use 2017-11-10 09:58:20
> tmpl src ac14:bb:: dst ::
> proto 0 spi 0x(0) reqid 0(0x) mode transport
> level 5 share any 
> enc-mask  auth-mask  comp-mask 
> 
> For comparison here is a good policy that was also created by the
> reproducer:
> 
> src fe80::bb/0 dst ::/0 uid 0
> socket in action allow index 2083 priority 0 ptype main share any flag  (0x)
> lifetime config:
>   limit: soft 0(bytes), hard 0(bytes)
>   limit: soft 0(packets), hard 0(packets)
>   expire add: soft 0(sec), hard 0(sec)
>   expire use: soft 0(sec), hard 0(sec)
> lifetime current:
>   0(bytes), 0(packets)
>   add 2017-11-10 09:58:17 use 2017-11-10 09:58:17
> tmpl src ac14:bb:: dst ::
> proto 0 spi 0x(0) reqid 0(0x) mode transport
> level 5 share any 
> enc-mask  auth-mask  comp-mask 

Oh and this is an important clue.  We have two policies with
identical index values.  The index value is meant to be unique
so clearly something funny is going on.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
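
The duplicate-index observation can be checked mechanically over a captured dump. The following is a rough sketch under stated assumptions: `struct toy_entry` and `toy_has_duplicate_index` are hypothetical names, and the entries would have to be filled in by hand from the dump output; nothing here talks to the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record holding the two fields of interest from a dump. */
struct toy_entry {
    uint32_t index;
    uint16_t family;
};

/* Returns 1 if any two entries share an index value, 0 otherwise.
 * Quadratic, which is acceptable for a handful of dumped policies. */
static int toy_has_duplicate_index(const struct toy_entry *e, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (e[i].index == e[j].index)
                return 1;
    return 0;
}
```

Fed the two policies shown above (both with index 2083), the check reports a duplicate.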


Re: kernel BUG at net/key/af_key.c:LINE!

2017-11-09 Thread Herbert Xu
On Thu, Nov 09, 2017 at 10:38:57PM +1100, Herbert Xu wrote:
> 
> The xfrm code path is meant to forbid the creation of such a policy.
> I don't currently see how this is bypassing that check.  But
> clearly it has found a way through the check since it's crashing.

By castrating the reproducer to not perform a pfkey dump I have
captured the corrupted policy via xfrm:

src ???/0 dst ???/0 uid 0
socket in action allow index 2083 priority 0 ptype main share any flag  (0x)
lifetime config:
  limit: soft 0(bytes), hard 0(bytes)
  limit: soft 0(packets), hard 0(packets)
  expire add: soft 0(sec), hard 0(sec)
  expire use: soft 0(sec), hard 0(sec)
lifetime current:
  0(bytes), 0(packets)
  add 2017-11-10 09:58:17 use 2017-11-10 09:58:20
tmpl src ac14:bb:: dst ::
proto 0 spi 0x(0) reqid 0(0x) mode transport
level 5 share any 
enc-mask  auth-mask  comp-mask 

For comparison here is a good policy that was also created by the
reproducer:

src fe80::bb/0 dst ::/0 uid 0
socket in action allow index 2083 priority 0 ptype main share any flag  (0x)
lifetime config:
  limit: soft 0(bytes), hard 0(bytes)
  limit: soft 0(packets), hard 0(packets)
  expire add: soft 0(sec), hard 0(sec)
  expire use: soft 0(sec), hard 0(sec)
lifetime current:
  0(bytes), 0(packets)
  add 2017-11-10 09:58:17 use 2017-11-10 09:58:17
tmpl src ac14:bb:: dst ::
proto 0 spi 0x(0) reqid 0(0x) mode transport
level 5 share any 
enc-mask  auth-mask  comp-mask 

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: kernel BUG at net/key/af_key.c:LINE!

2017-11-09 Thread Herbert Xu
On Wed, Nov 08, 2017 at 08:59:15AM +0100, Dmitry Vyukov wrote:
>
> Also the repro needs to be compiled with -m32 (but it does not compile
> without it due to missing __NR_mmap2, so I guess you passed -m32).

OK that's what I was missing.  I had hacked it to compile in
64-bit :)

However, I still don't understand why it's crashing yet.  What is
clear is that we're getting a socket policy with xp->family set
to zero, and the policy is created via the xfrm code path (as
opposed to af_key).

The xfrm code path is meant to forbid the creation of such a policy.
I don't currently see how this is bypassing that check.  But
clearly it has found a way through the check since it's crashing.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
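
To illustrate why a zero family is fatal for a dump path, here is a simplified userspace model, not the af_key code itself. It assumes the dumper needs a socket address size for the policy's family and has no valid answer for anything other than IPv4 or IPv6. `toy_sockaddr_size` and `toy_policy_dumpable` are hypothetical names.

```c
#include <netinet/in.h>   /* struct sockaddr_in, struct sockaddr_in6 */
#include <sys/socket.h>   /* AF_INET, AF_INET6 */

/* Hypothetical helper: address size for a family, 0 if unsupported. */
static int toy_sockaddr_size(unsigned short family)
{
    switch (family) {
    case AF_INET:
        return (int)sizeof(struct sockaddr_in);
    case AF_INET6:
        return (int)sizeof(struct sockaddr_in6);
    default:
        return 0;
    }
}

/* Returns 0 if a policy with this family could be serialized,
 * -1 in the case where this model assumes the kernel would BUG. */
static int toy_policy_dumpable(unsigned short family)
{
    return toy_sockaddr_size(family) > 0 ? 0 : -1;
}
```

In this model a policy with family 0 has no usable address size, so the only options are to reject it at creation time or to fail when it is dumped.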


Re: kernel BUG at net/key/af_key.c:LINE!

2017-11-08 Thread Herbert Xu
On Tue, Oct 24, 2017 at 05:10:06PM +0200, Dmitry Vyukov wrote:
> On Tue, Oct 24, 2017 at 5:08 PM, syzbot wrote:
> > Hello,
> >
> > syzkaller hit the following crash on
> > 02a2b05395dde2f49eb67b51a5fbc6606943
> > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
> > compiler: gcc (GCC) 7.1.1 20170620
> > .config is attached
> > Raw console output is attached.
> > C reproducer is attached
> > syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> > for information about syzkaller reproducers
> 
> This also happened on more recent commits, including net-next
> 833e0e2f24fd0525090878f71e129a8a4cb8bf78 (Oct 10) with similar
> signature:

Unfortunately I cannot reproduce the crash with your reproducer.
Does it always crash for you?

> ------------[ cut here ]------------
> kernel BUG at net/key/af_key.c:2068!
> invalid opcode:  [#1] SMP KASAN
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 1 PID: 11011 Comm: syz-executor1 Not tainted 4.14.0-rc4+ #80
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 01/01/2011
> task: 8801d4ecc1c0 task.stack: 8801c13f8000
> RIP: 0010:pfkey_xfrm_policy2msg+0x209c/0x22b0 net/key/af_key.c:2068

This shows that you have a xfrm policy that has a bogus family
field in your policy database.  But it gives no clue as to how
it got there.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: kernel BUG at net/key/af_key.c:LINE!

2017-11-08 Thread Dmitry Vyukov
On Wed, Nov 8, 2017 at 8:59 AM, Dmitry Vyukov wrote:
> On Wed, Nov 8, 2017 at 8:47 AM, Herbert Xu wrote:
>> On Tue, Oct 24, 2017 at 05:10:06PM +0200, Dmitry Vyukov wrote:
>>> On Tue, Oct 24, 2017 at 5:08 PM, syzbot wrote:
>>> > Hello,
>>> >
>>> > syzkaller hit the following crash on
>>> > 02a2b05395dde2f49eb67b51a5fbc6606943
>>> > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
>>> > compiler: gcc (GCC) 7.1.1 20170620
>>> > .config is attached
>>> > Raw console output is attached.
>>> > C reproducer is attached
>>> > syzkaller reproducer is attached. See https://goo.gl/kgGztJ
>>> > for information about syzkaller reproducers
>>>
>>> This also happened on more recent commits, including net-next
>>> 833e0e2f24fd0525090878f71e129a8a4cb8bf78 (Oct 10) with similar
>>> signature:
>>
>> Unfortunately I cannot reproduce the crash with your reproducer.
>> Does it always crash for you?
>>
>>> ------------[ cut here ]------------
>>> kernel BUG at net/key/af_key.c:2068!
>>> invalid opcode:  [#1] SMP KASAN
>>> Dumping ftrace buffer:
>>>    (ftrace buffer empty)
>>> Modules linked in:
>>> CPU: 1 PID: 11011 Comm: syz-executor1 Not tainted 4.14.0-rc4+ #80
>>> Hardware name: Google Google Compute Engine/Google Compute Engine,
>>> BIOS Google 01/01/2011
>>> task: 8801d4ecc1c0 task.stack: 8801c13f8000
>>> RIP: 0010:pfkey_xfrm_policy2msg+0x209c/0x22b0 net/key/af_key.c:2068
>>
>> This shows that you have a xfrm policy that has a bogus family
>> field in your policy database.  But it gives no clue as to how
>> it got there.
>
> Just triggered it within a second.
> Are you using the provided config?
> Also the repro needs to be compiled with -m32 (but it does not compile
> without it due to missing __NR_mmap2, so I guess you passed -m32).


That was on linux-next:

commit 8b82a8a7ab53ee1a065ac69c835737a701f46b2e (HEAD, tag: next-20171107, linux-next/master)
Author: Stephen Rothwell
Date:   Tue Nov 7 16:18:10 2017 +1100

    Add linux-next specific files for 20171107


Re: kernel BUG at net/key/af_key.c:LINE!

2017-11-08 Thread Dmitry Vyukov
On Wed, Nov 8, 2017 at 8:47 AM, Herbert Xu wrote:
> On Tue, Oct 24, 2017 at 05:10:06PM +0200, Dmitry Vyukov wrote:
>> On Tue, Oct 24, 2017 at 5:08 PM, syzbot wrote:
>> > Hello,
>> >
>> > syzkaller hit the following crash on
>> > 02a2b05395dde2f49eb67b51a5fbc6606943
>> > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
>> > compiler: gcc (GCC) 7.1.1 20170620
>> > .config is attached
>> > Raw console output is attached.
>> > C reproducer is attached
>> > syzkaller reproducer is attached. See https://goo.gl/kgGztJ
>> > for information about syzkaller reproducers
>>
>> This also happened on more recent commits, including net-next
>> 833e0e2f24fd0525090878f71e129a8a4cb8bf78 (Oct 10) with similar
>> signature:
>
> Unfortunately I cannot reproduce the crash with your reproducer.
> Does it always crash for you?
>
>> ------------[ cut here ]------------
>> kernel BUG at net/key/af_key.c:2068!
>> invalid opcode:  [#1] SMP KASAN
>> Dumping ftrace buffer:
>>    (ftrace buffer empty)
>> Modules linked in:
>> CPU: 1 PID: 11011 Comm: syz-executor1 Not tainted 4.14.0-rc4+ #80
>> Hardware name: Google Google Compute Engine/Google Compute Engine,
>> BIOS Google 01/01/2011
>> task: 8801d4ecc1c0 task.stack: 8801c13f8000
>> RIP: 0010:pfkey_xfrm_policy2msg+0x209c/0x22b0 net/key/af_key.c:2068
>
> This shows that you have a xfrm policy that has a bogus family
> field in your policy database.  But it gives no clue as to how
> it got there.

Just triggered it within a second.
Are you using the provided config?
Also the repro needs to be compiled with -m32 (but it does not compile
without it due to missing __NR_mmap2, so I guess you passed -m32).
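
As a side note on the -m32 requirement: whether `__NR_mmap2` exists depends on the target ABI, and that can be probed at compile time. The sketch below assumes a Linux toolchain where `<sys/syscall.h>` exposes the `__NR_*` constants; `toy_have_mmap2` is a hypothetical helper name and is not part of the reproducer.

```c
#include <sys/syscall.h>   /* __NR_* syscall numbers for the target ABI */

/* Returns 1 when the target ABI defines the mmap2 syscall number
 * (for example 32-bit x86), 0 when it does not (for example x86-64). */
static int toy_have_mmap2(void)
{
#ifdef __NR_mmap2
    return 1;
#else
    return 0;
#endif
}
```

Building the same file with and without -m32 on an x86-64 host is expected to flip the result, which is consistent with the reproducer failing to compile as a 64-bit binary.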


Re: kernel BUG at net/key/af_key.c:LINE!

2017-10-24 Thread Dmitry Vyukov
On Tue, Oct 24, 2017 at 5:08 PM, syzbot wrote:
> Hello,
>
> syzkaller hit the following crash on
> 02a2b05395dde2f49eb67b51a5fbc6606943
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> C reproducer is attached
> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> for information about syzkaller reproducers

This also happened on more recent commits, including net-next
833e0e2f24fd0525090878f71e129a8a4cb8bf78 (Oct 10) with similar
signature:

------------[ cut here ]------------
kernel BUG at net/key/af_key.c:2068!
invalid opcode:  [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 11011 Comm: syz-executor1 Not tainted 4.14.0-rc4+ #80
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
task: 8801d4ecc1c0 task.stack: 8801c13f8000
RIP: 0010:pfkey_xfrm_policy2msg+0x209c/0x22b0 net/key/af_key.c:2068
RSP: 0018:8801c13ff4b0 EFLAGS: 00010212
RAX: 0001 RBX: 8801ceaa828c RCX: c90001f3c000
RDX: 0599 RSI: 8444c4fc RDI: 8801ceaa812c
RBP: 8801c13ff588 R08: 0001 R09: 8801d55dbb40
R10: 001b R11: ed003aabb782 R12: 8801ceaa8148
R13: 8801ceaa8040 R14: 0008 R15: 0001
FS:  7fc611208700() GS:8801db30() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 2ff0 CR3: 0001a13b6000 CR4: 001406e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 dump_sp+0x14f/0x510 net/key/af_key.c:2669
 xfrm_policy_walk+0x2f1/0xa30 net/xfrm/xfrm_policy.c:1015
 pfkey_dump_sp+0x42/0x50 net/key/af_key.c:2692
 pfkey_do_dump+0xaa/0x3f0 net/key/af_key.c:299
 pfkey_spddump+0x1a0/0x210 net/key/af_key.c:2719
 pfkey_process+0x60b/0x720 net/key/af_key.c:2809
 pfkey_sendmsg+0x4d6/0x9f0 net/key/af_key.c:3648
 sock_sendmsg_nosec net/socket.c:633 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:643
 sock_write_iter+0x320/0x5e0 net/socket.c:912
 call_write_iter include/linux/fs.h:1770 [inline]
 new_sync_write fs/read_write.c:468 [inline]
 __vfs_write+0x68a/0x970 fs/read_write.c:481
 vfs_write+0x18f/0x510 fs/read_write.c:543
 SYSC_write fs/read_write.c:588 [inline]
 SyS_write+0xef/0x220 fs/read_write.c:580
 entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x4520a9
RSP: 002b:7fc611207c08 EFLAGS: 0216 ORIG_RAX: 0001
RAX: ffda RBX: 00718000 RCX: 004520a9
RDX: 0010 RSI: 2ff0 RDI: 0019
RBP: 0086 R08:  R09: 
R10:  R11: 0216 R12: 004bf3b0
R13:  R14: 0005 R15: 0029
Code: ff ff 48 89 95 58 ff ff ff 89 8d 70 ff ff ff e8 fb 70 5e fd 48
8b 95 58 ff ff ff 8b 8d 70 ff ff ff e9 04 e3 ff ff e8 74 4c 29 fd <0f>
0b be 02 00 00 00 4c 89 f7 e8 15 72 5e fd e9 6f e3 ff ff 48
RIP: pfkey_xfrm_policy2msg+0x209c/0x22b0 net/key/af_key.c:2068 RSP:
8801c13ff4b0
---[ end trace 3103e09d7f60a307 ]---



> ------------[ cut here ]------------
> kernel BUG at net/key/af_key.c:2068!
> invalid opcode:  [#1] SMP KASAN
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 3024 Comm: syzkaller790413 Not tainted 4.14.0-rc2+ #16
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> task: 8801cddc8100 task.stack: 8801c0a88000
> RIP: 0010:pfkey_xfrm_policy2msg+0x209c/0x22b0 net/key/af_key.c:2068
> RSP: 0018:8801c0a8f318 EFLAGS: 00010297
> RAX: 8801cddc8100 RBX: 8801cea778cc RCX: 
> RDX:  RSI: 204e RDI: 8801cea7776c
> RBP: 8801c0a8f3f0 R08: 0001 R09: 8801d0b66dc0
> R10: 001b R11: ed003a16cdd2 R12: 8801cea77788
> R13: 8801cea77680 R14: 0008 R15: 0001
> FS:  () GS:8801db20(0063) knlGS:ecf1fb40
> CS:  0010 DS: 002b ES: 002b CR0: 80050033
> CR2: 20002ff0 CR3: 0001d4b3c000 CR4: 001406f0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  dump_sp+0x14f/0x510 net/key/af_key.c:2669
>  xfrm_policy_walk+0x2f1/0xa30 net/xfrm/xfrm_policy.c:1015
>  pfkey_dump_sp+0x42/0x50 net/key/af_key.c:2692
>  pfkey_do_dump+0xaa/0x3f0 net/key/af_key.c:299
>  pfkey_spddump+0x1a0/0x210 net/key/af_key.c:2719
>  pfkey_process+0x60b/0x720 net/key/af_key.c:2809
>  pfkey_sendmsg+0x4d6/0x9f0 net/key/af_key.c:3648
>  sock_sendmsg_nosec net/socket.c:633 [inline]
>  sock_sendmsg+0xca/0x110