Re: skb_over_panic using UDP and 6lowpan / fakelb

2017-04-03 Thread Cong Wang
(Cc'ing netdev and maintainers)

On Mon, Mar 27, 2017 at 2:16 AM, David Palma  wrote:
>
> Hi,
>
> Sending a simple UDP packet (39 bytes long), over a 6lowpan interface
> (using fakelb), creates a kernel panic (skb_over_panic).
>
> Steps to reproduce, and more details, can be found in:
> https://github.com/PalmaITEM/6lowpan-skb_over_panic
>
> This bug has been reported in
> https://bugzilla.kernel.org/show_bug.cgi?id=195059
>
> I have found that lengths around 39 bytes can also trigger this
> behaviour and that longer packets are handled without problem.
>
> Verified in:
>
> - Linux version 4.9.0-0.bpo.2-amd64 (debian-ker...@lists.debian.org)
> (gcc version 4.9.2 (Debian 4.9.2-10) ) #1 SMP Debian 4.9.13-1~bpo8+1
> (2017-02-27)
> - Linux version 4.10.4-1-ARCH (builduser@tobias) (gcc version 6.3.1
> 20170306 (GCC) ) #1 SMP PREEMPT Sat Mar 18 19:39:18 CET 2017
>
>
> I am not familiar with the process of reporting kernel bugs, so
> apologies beforehand. I am also available to provide any missing
> information.
>
> Cheers,
> --
> David Palma
>


Re: net/sched: GPF in qdisc_hash_add

2017-03-24 Thread Cong Wang
On Thu, Mar 23, 2017 at 12:10 PM, Eric Dumazet <eduma...@google.com> wrote:
> On Thu, Mar 23, 2017 at 12:06 PM, Dmitry Vyukov <dvyu...@google.com> wrote:
>>
>> On Thu, Mar 23, 2017 at 8:00 PM, Cong Wang <xiyou.wangc...@gmail.com> wrote:
>> > On Thu, Mar 23, 2017 at 9:06 AM, Dmitry Vyukov <dvyu...@google.com> wrote:
>> >> kasan: CONFIG_KASAN_INLINE enabled
>> >> kasan: GPF could be caused by NULL-ptr deref or user memory access
>> >> general protection fault:  [#1] SMP KASAN
>> >> Dumping ftrace buffer:
>> >>(ftrace buffer empty)
>> >> Modules linked in:
>> >> CPU: 2 PID: 12732 Comm: syz-executor6 Not tainted 4.11.0-rc3+ #365
>> >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 
>> >> 01/01/2011
>> >> task: 880062b7a2c0 task.stack: 88003348
>> >> RIP: 0010:qdisc_hash_add.part.19+0xb6/0x3c0 net/sched/sch_api.c:280
>> >> RSP: 0018:880033487820 EFLAGS: 00010202
>> >> RAX: dc00 RBX: 85357e00 RCX: c90002b24000
>> >> RDX: 007a RSI: 835a523a RDI: 03d0
>> >> RBP: 8800334878b8 R08: fbfff0a6afeb R09: fbfff0a6afeb
>> >> R10: 0001 R11: fbfff0a6afea R12: 85357e48
>> >> R13: 110006690f06 R14: 880033487890 R15: 
>> >> FS:  7f68665d0700() GS:88006e20() 
>> >> knlGS:
>> >> CS:  0010 DS:  ES:  CR0: 80050033
>> >> CR2: 004c2d44 CR3: 3c6f8000 CR4: 26e0
>> >> Call Trace:
>> >>  qdisc_hash_add+0x76/0x90 net/sched/sch_api.c:279
>> >>  attach_default_qdiscs net/sched/sch_generic.c:798 [inline]
>> >>  dev_activate+0x6ca/0x920 net/sched/sch_generic.c:829
>> >>  __dev_open+0x25b/0x360 net/core/dev.c:1348
>> >>  __dev_change_flags+0x159/0x3d0 net/core/dev.c:6460
>> >>  dev_change_flags+0x88/0x140 net/core/dev.c:6525
>> >>  dev_ifsioc+0x51f/0x9b0 net/core/dev_ioctl.c:254
>> >>  dev_ioctl+0x1fe/0x1030 net/core/dev_ioctl.c:532
>> >>  sock_do_ioctl+0x94/0xb0 net/socket.c:902
>> >>  sock_ioctl+0x2c2/0x440 net/socket.c:993
>> >>  vfs_ioctl fs/ioctl.c:45 [inline]
>> >>  do_vfs_ioctl+0x1af/0x16d0 fs/ioctl.c:685
>> >>  SYSC_ioctl fs/ioctl.c:700 [inline]
>> >>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
>> >>  entry_SYSCALL_64_fastpath+0x1f/0xc2
>> >
>> > The interesting part is why the NULL dereference is in
>> > qdisc_hash_add(), since we have a check before calling
>> > it:
>> >
>> > #ifdef CONFIG_NET_SCHED
>> > if (dev->qdisc)
>> > qdisc_hash_add(dev->qdisc);
>> > #endif
>> >
>> >
>> > When attach_one_default_qdisc() fails, we should trigger
>> > the NULL pointer dereference bug at:
>> >
>> > atomic_inc(>qdisc->refcnt);
>>
>> I think qdisc is not NULL, it's something _in_ qdisc that is NULL. The
>> crash happens here:
>>
>> struct Qdisc *root = qdisc_dev(q)->qdisc;
>>
>> so it's probably device.
>
>
>
> Looks like this bug came with commit 59cc1f61f09c
> ("net: sched: convert qdisc linked list to hashtable")
>
> I would simply guard qdisc_hash_add()
>
> (Against _qdisc)

Yeah, I missed that dev_init_scheduler() could assign noop_qdisc
to each tx queue. Then the check in attach_default_qdiscs()
is always false? If so we need...

 #ifdef CONFIG_NET_SCHED
-   if (dev->qdisc)
+   if (dev->qdisc != _qdisc)
qdisc_hash_add(dev->qdisc);
 #endif


Re: net/sched: GPF in qdisc_hash_add

2017-03-24 Thread Cong Wang
On Thu, Mar 23, 2017 at 12:10 PM, Eric Dumazet  wrote:
> On Thu, Mar 23, 2017 at 12:06 PM, Dmitry Vyukov  wrote:
>>
>> On Thu, Mar 23, 2017 at 8:00 PM, Cong Wang  wrote:
>> > On Thu, Mar 23, 2017 at 9:06 AM, Dmitry Vyukov  wrote:
>> >> kasan: CONFIG_KASAN_INLINE enabled
>> >> kasan: GPF could be caused by NULL-ptr deref or user memory access
>> >> general protection fault:  [#1] SMP KASAN
>> >> Dumping ftrace buffer:
>> >>(ftrace buffer empty)
>> >> Modules linked in:
>> >> CPU: 2 PID: 12732 Comm: syz-executor6 Not tainted 4.11.0-rc3+ #365
>> >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 
>> >> 01/01/2011
>> >> task: 880062b7a2c0 task.stack: 88003348
>> >> RIP: 0010:qdisc_hash_add.part.19+0xb6/0x3c0 net/sched/sch_api.c:280
>> >> RSP: 0018:880033487820 EFLAGS: 00010202
>> >> RAX: dc00 RBX: 85357e00 RCX: c90002b24000
>> >> RDX: 007a RSI: 835a523a RDI: 03d0
>> >> RBP: 8800334878b8 R08: fbfff0a6afeb R09: fbfff0a6afeb
>> >> R10: 0001 R11: fbfff0a6afea R12: 85357e48
>> >> R13: 110006690f06 R14: 880033487890 R15: 
>> >> FS:  7f68665d0700() GS:88006e20() 
>> >> knlGS:
>> >> CS:  0010 DS:  ES:  CR0: 80050033
>> >> CR2: 004c2d44 CR3: 3c6f8000 CR4: 26e0
>> >> Call Trace:
>> >>  qdisc_hash_add+0x76/0x90 net/sched/sch_api.c:279
>> >>  attach_default_qdiscs net/sched/sch_generic.c:798 [inline]
>> >>  dev_activate+0x6ca/0x920 net/sched/sch_generic.c:829
>> >>  __dev_open+0x25b/0x360 net/core/dev.c:1348
>> >>  __dev_change_flags+0x159/0x3d0 net/core/dev.c:6460
>> >>  dev_change_flags+0x88/0x140 net/core/dev.c:6525
>> >>  dev_ifsioc+0x51f/0x9b0 net/core/dev_ioctl.c:254
>> >>  dev_ioctl+0x1fe/0x1030 net/core/dev_ioctl.c:532
>> >>  sock_do_ioctl+0x94/0xb0 net/socket.c:902
>> >>  sock_ioctl+0x2c2/0x440 net/socket.c:993
>> >>  vfs_ioctl fs/ioctl.c:45 [inline]
>> >>  do_vfs_ioctl+0x1af/0x16d0 fs/ioctl.c:685
>> >>  SYSC_ioctl fs/ioctl.c:700 [inline]
>> >>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
>> >>  entry_SYSCALL_64_fastpath+0x1f/0xc2
>> >
>> > The interesting part is why the NULL dereference is in
>> > qdisc_hash_add(), since we have a check before calling
>> > it:
>> >
>> > #ifdef CONFIG_NET_SCHED
>> > if (dev->qdisc)
>> > qdisc_hash_add(dev->qdisc);
>> > #endif
>> >
>> >
>> > When attach_one_default_qdisc() fails, we should trigger
>> > the NULL pointer dereference bug at:
>> >
>> > atomic_inc(>qdisc->refcnt);
>>
>> I think qdisc is not NULL, it's something _in_ qdisc that is NULL. The
>> crash happens here:
>>
>> struct Qdisc *root = qdisc_dev(q)->qdisc;
>>
>> so it's probably device.
>
>
>
> Looks like this bug came with commit 59cc1f61f09c
> ("net: sched: convert qdisc linked list to hashtable")
>
> I would simply guard qdisc_hash_add()
>
> (Against _qdisc)

Yeah, I missed that dev_init_scheduler() could assign noop_qdisc
to each tx queue. Then the check in attach_default_qdiscs()
is always false? If so we need...

 #ifdef CONFIG_NET_SCHED
-   if (dev->qdisc)
+   if (dev->qdisc != _qdisc)
qdisc_hash_add(dev->qdisc);
 #endif


Re: net/sched: GPF in qdisc_hash_add

2017-03-23 Thread Cong Wang
On Thu, Mar 23, 2017 at 9:06 AM, Dmitry Vyukov  wrote:
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault:  [#1] SMP KASAN
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 2 PID: 12732 Comm: syz-executor6 Not tainted 4.11.0-rc3+ #365
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 880062b7a2c0 task.stack: 88003348
> RIP: 0010:qdisc_hash_add.part.19+0xb6/0x3c0 net/sched/sch_api.c:280
> RSP: 0018:880033487820 EFLAGS: 00010202
> RAX: dc00 RBX: 85357e00 RCX: c90002b24000
> RDX: 007a RSI: 835a523a RDI: 03d0
> RBP: 8800334878b8 R08: fbfff0a6afeb R09: fbfff0a6afeb
> R10: 0001 R11: fbfff0a6afea R12: 85357e48
> R13: 110006690f06 R14: 880033487890 R15: 
> FS:  7f68665d0700() GS:88006e20() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 004c2d44 CR3: 3c6f8000 CR4: 26e0
> Call Trace:
>  qdisc_hash_add+0x76/0x90 net/sched/sch_api.c:279
>  attach_default_qdiscs net/sched/sch_generic.c:798 [inline]
>  dev_activate+0x6ca/0x920 net/sched/sch_generic.c:829
>  __dev_open+0x25b/0x360 net/core/dev.c:1348
>  __dev_change_flags+0x159/0x3d0 net/core/dev.c:6460
>  dev_change_flags+0x88/0x140 net/core/dev.c:6525
>  dev_ifsioc+0x51f/0x9b0 net/core/dev_ioctl.c:254
>  dev_ioctl+0x1fe/0x1030 net/core/dev_ioctl.c:532
>  sock_do_ioctl+0x94/0xb0 net/socket.c:902
>  sock_ioctl+0x2c2/0x440 net/socket.c:993
>  vfs_ioctl fs/ioctl.c:45 [inline]
>  do_vfs_ioctl+0x1af/0x16d0 fs/ioctl.c:685
>  SYSC_ioctl fs/ioctl.c:700 [inline]
>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
>  entry_SYSCALL_64_fastpath+0x1f/0xc2

The interesting part is why the NULL dereference is in
qdisc_hash_add(), since we have a check before calling
it:

#ifdef CONFIG_NET_SCHED
if (dev->qdisc)
qdisc_hash_add(dev->qdisc);
#endif


When attach_one_default_qdisc() fails, we should trigger
the NULL pointer dereference bug at:

atomic_inc(>qdisc->refcnt);


Re: net/sched: GPF in qdisc_hash_add

2017-03-23 Thread Cong Wang
On Thu, Mar 23, 2017 at 9:06 AM, Dmitry Vyukov  wrote:
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault:  [#1] SMP KASAN
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 2 PID: 12732 Comm: syz-executor6 Not tainted 4.11.0-rc3+ #365
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 880062b7a2c0 task.stack: 88003348
> RIP: 0010:qdisc_hash_add.part.19+0xb6/0x3c0 net/sched/sch_api.c:280
> RSP: 0018:880033487820 EFLAGS: 00010202
> RAX: dc00 RBX: 85357e00 RCX: c90002b24000
> RDX: 007a RSI: 835a523a RDI: 03d0
> RBP: 8800334878b8 R08: fbfff0a6afeb R09: fbfff0a6afeb
> R10: 0001 R11: fbfff0a6afea R12: 85357e48
> R13: 110006690f06 R14: 880033487890 R15: 
> FS:  7f68665d0700() GS:88006e20() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 004c2d44 CR3: 3c6f8000 CR4: 26e0
> Call Trace:
>  qdisc_hash_add+0x76/0x90 net/sched/sch_api.c:279
>  attach_default_qdiscs net/sched/sch_generic.c:798 [inline]
>  dev_activate+0x6ca/0x920 net/sched/sch_generic.c:829
>  __dev_open+0x25b/0x360 net/core/dev.c:1348
>  __dev_change_flags+0x159/0x3d0 net/core/dev.c:6460
>  dev_change_flags+0x88/0x140 net/core/dev.c:6525
>  dev_ifsioc+0x51f/0x9b0 net/core/dev_ioctl.c:254
>  dev_ioctl+0x1fe/0x1030 net/core/dev_ioctl.c:532
>  sock_do_ioctl+0x94/0xb0 net/socket.c:902
>  sock_ioctl+0x2c2/0x440 net/socket.c:993
>  vfs_ioctl fs/ioctl.c:45 [inline]
>  do_vfs_ioctl+0x1af/0x16d0 fs/ioctl.c:685
>  SYSC_ioctl fs/ioctl.c:700 [inline]
>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
>  entry_SYSCALL_64_fastpath+0x1f/0xc2

The interesting part is why the NULL dereference is in
qdisc_hash_add(), since we have a check before calling
it:

#ifdef CONFIG_NET_SCHED
if (dev->qdisc)
qdisc_hash_add(dev->qdisc);
#endif


When attach_one_default_qdisc() fails, we should trigger
the NULL pointer dereference bug at:

atomic_inc(>qdisc->refcnt);


Re: net/kcm: double free of kcm inode

2017-03-23 Thread Cong Wang
On Thu, Mar 23, 2017 at 5:09 AM, Dmitry Vyukov  wrote:
> Hello,
>
> I've got the following report while running syzkaller fuzzer. Note the
> preceding kmem_cache_alloc injected failure, it's most likely the root
> cause.
>
> FAULT_INJECTION: forcing a failure.
> name failslab, interval 1, probability 0, space 0, times 0
> CPU: 1 PID: 21839 Comm: syz-executor4 Not tainted 4.11.0-rc3+ #364
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:16 [inline]
>  dump_stack+0x1b8/0x28d lib/dump_stack.c:52
>  fail_dump lib/fault-inject.c:45 [inline]
>  should_fail+0x78a/0x870 lib/fault-inject.c:154
>  should_failslab+0xec/0x120 mm/failslab.c:31
>  slab_pre_alloc_hook mm/slab.h:434 [inline]
>  slab_alloc mm/slab.c:3394 [inline]
>  kmem_cache_alloc+0x200/0x720 mm/slab.c:3570
>  sk_prot_alloc+0x65/0x2a0 net/core/sock.c:1331
>  sk_alloc+0x8c/0x710 net/core/sock.c:1393
>  kcm_clone net/kcm/kcmsock.c:1655 [inline]
>  kcm_ioctl+0xb65/0x17e0 net/kcm/kcmsock.c:1713
>  sock_do_ioctl+0x65/0xb0 net/socket.c:895
>  sock_ioctl+0x2c2/0x440 net/socket.c:993
>  vfs_ioctl fs/ioctl.c:45 [inline]
>  do_vfs_ioctl+0x1af/0x16d0 fs/ioctl.c:685
>  SYSC_ioctl fs/ioctl.c:700 [inline]
>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
>  entry_SYSCALL_64_fastpath+0x1f/0xc2

I don't know if this patch could fix this bug or not:
https://patchwork.ozlabs.org/patch/742860/

This is why I don't add your Reported-by. But it could be related.

Thanks.

> RIP: 0033:0x445b79
> RSP: 002b:7f05eb28e858 EFLAGS: 0286 ORIG_RAX: 0010
> RAX: ffda RBX: 00708000 RCX: 00445b79
> RDX: 20001000 RSI: 89e2 RDI: 0005
> RBP: 0086 R08:  R09: 
> R10:  R11: 0286 R12: 004a7e31
> R13:  R14: 7f05eb28e618 R15: 7f05eb28e788
> ==
> BUG: KASAN: use-after-free in __fput+0x6b0/0x7f0 fs/file_table.c:211
> at addr 880037a25670
> Read of size 2 by task syz-executor4/21839
> CPU: 1 PID: 21839 Comm: syz-executor4 Not tainted 4.11.0-rc3+ #364
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:16 [inline]
>  dump_stack+0x1b8/0x28d lib/dump_stack.c:52
>  kasan_object_err+0x1c/0x70 mm/kasan/report.c:166
>  print_address_description mm/kasan/report.c:210 [inline]
>  kasan_report_error mm/kasan/report.c:294 [inline]
>  kasan_report.part.2+0x1be/0x480 mm/kasan/report.c:316
>  kasan_report mm/kasan/report.c:335 [inline]
>  __asan_report_load2_noabort+0x29/0x30 mm/kasan/report.c:335
>  __fput+0x6b0/0x7f0 fs/file_table.c:211
>  fput+0x15/0x20 fs/file_table.c:245
>  task_work_run+0x1a4/0x270 kernel/task_work.c:116
>  tracehook_notify_resume include/linux/tracehook.h:191 [inline]
>  exit_to_usermode_loop+0x24d/0x2d0 arch/x86/entry/common.c:161
>  prepare_exit_to_usermode arch/x86/entry/common.c:191 [inline]
>  syscall_return_slowpath+0x3bd/0x460 arch/x86/entry/common.c:260
>  entry_SYSCALL_64_fastpath+0xc0/0xc2
> RIP: 0033:0x445b79
> RSP: 002b:7f05eb28e858 EFLAGS: 0286 ORIG_RAX: 0010
> RAX: fff4 RBX: 00708000 RCX: 00445b79
> RDX: 20001000 RSI: 89e2 RDI: 0005
> RBP: 2170 R08:  R09: 
> R10:  R11: 0286 R12: 006e0230
> R13: 89e2 R14: 20001000 R15: 0005
> Object at 880037a25640, in cache sock_inode_cache size: 944
> Allocated:
> PID = 21839
>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:517
>  set_track mm/kasan/kasan.c:529 [inline]
>  kasan_kmalloc+0xbc/0xf0 mm/kasan/kasan.c:620
>  kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:559
>  kmem_cache_alloc+0x110/0x720 mm/slab.c:3572
>  sock_alloc_inode+0x70/0x300 net/socket.c:250
>  alloc_inode+0x65/0x180 fs/inode.c:207
>  new_inode_pseudo+0x69/0x190 fs/inode.c:889
>  sock_alloc+0x41/0x270 net/socket.c:565
>  kcm_clone net/kcm/kcmsock.c:1634 [inline]
>  kcm_ioctl+0x990/0x17e0 net/kcm/kcmsock.c:1713
>  sock_do_ioctl+0x65/0xb0 net/socket.c:895
>  sock_ioctl+0x2c2/0x440 net/socket.c:993
>  vfs_ioctl fs/ioctl.c:45 [inline]
>  do_vfs_ioctl+0x1af/0x16d0 fs/ioctl.c:685
>  SYSC_ioctl fs/ioctl.c:700 [inline]
>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
> Freed:
> PID = 21839
>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:517
>  set_track mm/kasan/kasan.c:529 [inline]
>  kasan_slab_free+0x81/0xc0 mm/kasan/kasan.c:593
>  __cache_free mm/slab.c:3514 [inline]
>  kmem_cache_free+0x71/0x240 mm/slab.c:3774
>  sock_destroy_inode+0x56/0x70 net/socket.c:280
>  destroy_inode+0x15d/0x200 fs/inode.c:264
>  evict+0x57e/0x920 fs/inode.c:570
>  

Re: net/kcm: double free of kcm inode

2017-03-23 Thread Cong Wang
On Thu, Mar 23, 2017 at 5:09 AM, Dmitry Vyukov  wrote:
> Hello,
>
> I've got the following report while running syzkaller fuzzer. Note the
> preceding kmem_cache_alloc injected failure, it's most likely the root
> cause.
>
> FAULT_INJECTION: forcing a failure.
> name failslab, interval 1, probability 0, space 0, times 0
> CPU: 1 PID: 21839 Comm: syz-executor4 Not tainted 4.11.0-rc3+ #364
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:16 [inline]
>  dump_stack+0x1b8/0x28d lib/dump_stack.c:52
>  fail_dump lib/fault-inject.c:45 [inline]
>  should_fail+0x78a/0x870 lib/fault-inject.c:154
>  should_failslab+0xec/0x120 mm/failslab.c:31
>  slab_pre_alloc_hook mm/slab.h:434 [inline]
>  slab_alloc mm/slab.c:3394 [inline]
>  kmem_cache_alloc+0x200/0x720 mm/slab.c:3570
>  sk_prot_alloc+0x65/0x2a0 net/core/sock.c:1331
>  sk_alloc+0x8c/0x710 net/core/sock.c:1393
>  kcm_clone net/kcm/kcmsock.c:1655 [inline]
>  kcm_ioctl+0xb65/0x17e0 net/kcm/kcmsock.c:1713
>  sock_do_ioctl+0x65/0xb0 net/socket.c:895
>  sock_ioctl+0x2c2/0x440 net/socket.c:993
>  vfs_ioctl fs/ioctl.c:45 [inline]
>  do_vfs_ioctl+0x1af/0x16d0 fs/ioctl.c:685
>  SYSC_ioctl fs/ioctl.c:700 [inline]
>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
>  entry_SYSCALL_64_fastpath+0x1f/0xc2

I don't know if this patch could fix this bug or not:
https://patchwork.ozlabs.org/patch/742860/

This is why I don't add your Reported-by. But it could be related.

Thanks.

> RIP: 0033:0x445b79
> RSP: 002b:7f05eb28e858 EFLAGS: 0286 ORIG_RAX: 0010
> RAX: ffda RBX: 00708000 RCX: 00445b79
> RDX: 20001000 RSI: 89e2 RDI: 0005
> RBP: 0086 R08:  R09: 
> R10:  R11: 0286 R12: 004a7e31
> R13:  R14: 7f05eb28e618 R15: 7f05eb28e788
> ==
> BUG: KASAN: use-after-free in __fput+0x6b0/0x7f0 fs/file_table.c:211
> at addr 880037a25670
> Read of size 2 by task syz-executor4/21839
> CPU: 1 PID: 21839 Comm: syz-executor4 Not tainted 4.11.0-rc3+ #364
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:16 [inline]
>  dump_stack+0x1b8/0x28d lib/dump_stack.c:52
>  kasan_object_err+0x1c/0x70 mm/kasan/report.c:166
>  print_address_description mm/kasan/report.c:210 [inline]
>  kasan_report_error mm/kasan/report.c:294 [inline]
>  kasan_report.part.2+0x1be/0x480 mm/kasan/report.c:316
>  kasan_report mm/kasan/report.c:335 [inline]
>  __asan_report_load2_noabort+0x29/0x30 mm/kasan/report.c:335
>  __fput+0x6b0/0x7f0 fs/file_table.c:211
>  fput+0x15/0x20 fs/file_table.c:245
>  task_work_run+0x1a4/0x270 kernel/task_work.c:116
>  tracehook_notify_resume include/linux/tracehook.h:191 [inline]
>  exit_to_usermode_loop+0x24d/0x2d0 arch/x86/entry/common.c:161
>  prepare_exit_to_usermode arch/x86/entry/common.c:191 [inline]
>  syscall_return_slowpath+0x3bd/0x460 arch/x86/entry/common.c:260
>  entry_SYSCALL_64_fastpath+0xc0/0xc2
> RIP: 0033:0x445b79
> RSP: 002b:7f05eb28e858 EFLAGS: 0286 ORIG_RAX: 0010
> RAX: fff4 RBX: 00708000 RCX: 00445b79
> RDX: 20001000 RSI: 89e2 RDI: 0005
> RBP: 2170 R08:  R09: 
> R10:  R11: 0286 R12: 006e0230
> R13: 89e2 R14: 20001000 R15: 0005
> Object at 880037a25640, in cache sock_inode_cache size: 944
> Allocated:
> PID = 21839
>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:517
>  set_track mm/kasan/kasan.c:529 [inline]
>  kasan_kmalloc+0xbc/0xf0 mm/kasan/kasan.c:620
>  kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:559
>  kmem_cache_alloc+0x110/0x720 mm/slab.c:3572
>  sock_alloc_inode+0x70/0x300 net/socket.c:250
>  alloc_inode+0x65/0x180 fs/inode.c:207
>  new_inode_pseudo+0x69/0x190 fs/inode.c:889
>  sock_alloc+0x41/0x270 net/socket.c:565
>  kcm_clone net/kcm/kcmsock.c:1634 [inline]
>  kcm_ioctl+0x990/0x17e0 net/kcm/kcmsock.c:1713
>  sock_do_ioctl+0x65/0xb0 net/socket.c:895
>  sock_ioctl+0x2c2/0x440 net/socket.c:993
>  vfs_ioctl fs/ioctl.c:45 [inline]
>  do_vfs_ioctl+0x1af/0x16d0 fs/ioctl.c:685
>  SYSC_ioctl fs/ioctl.c:700 [inline]
>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
> Freed:
> PID = 21839
>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:517
>  set_track mm/kasan/kasan.c:529 [inline]
>  kasan_slab_free+0x81/0xc0 mm/kasan/kasan.c:593
>  __cache_free mm/slab.c:3514 [inline]
>  kmem_cache_free+0x71/0x240 mm/slab.c:3774
>  sock_destroy_inode+0x56/0x70 net/socket.c:280
>  destroy_inode+0x15d/0x200 fs/inode.c:264
>  evict+0x57e/0x920 fs/inode.c:570
>  iput_final 

Re: net/sctp: recursive locking in sctp_do_peeloff

2017-03-15 Thread Cong Wang
On Wed, Mar 15, 2017 at 5:52 AM, Marcelo Ricardo Leitner
<marcelo.leit...@gmail.com> wrote:
> On Tue, Mar 14, 2017 at 09:52:15PM -0700, Cong Wang wrote:
>> Instead of checking for the status of the sock, I believe the following
>> one-line fix should do the trick too. Can you give it a try?
>>
>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>> index 0f378ea..4de62d4 100644
>> --- a/net/sctp/socket.c
>> +++ b/net/sctp/socket.c
>> @@ -1494,7 +1494,7 @@ static void sctp_close(struct sock *sk, long timeout)
>>
>> pr_debug("%s: sk:%p, timeout:%ld\n", __func__, sk, timeout);
>>
>> -   lock_sock(sk);
>> +   lock_sock_nested(sk, SINGLE_DEPTH_NESTING);
>> sk->sk_shutdown = SHUTDOWN_MASK;
>> sk->sk_state = SCTP_SS_CLOSING;
>
> I refrained on doing this just because it will change the lock signature
> for the first level too, as sctp_close() can be called directly, and
> might avoid some other lockdep detections.

I knew, but for the first level it is fine to use a different class,
it is merely to make lockdep happy. There is no real deadlock here
since they are two different socks anyway.

>
> Then you probably also need:
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 465a9c8464f9..02506b4406d2 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -1543,7 +1543,7 @@ static void sctp_close(struct sock *sk, long timeout)
>  * held and that should be grabbed before socket lock.
>  */
> spin_lock_bh(>sctp.addr_wq_lock);
> -   bh_lock_sock(sk);
> +   bh_lock_sock_nested(sk);
>
> /* Hold the sock, since sk_common_release() will put sock_put()
>  * and we have just a little more cleanup.
>
> because sctp_close will re-lock the socket a little later (for backlog
> processing).
>

Ah, of course I missed the re-lock. Dmitry, please add this piece too.

Thanks.


Re: net/sctp: recursive locking in sctp_do_peeloff

2017-03-15 Thread Cong Wang
On Wed, Mar 15, 2017 at 5:52 AM, Marcelo Ricardo Leitner
 wrote:
> On Tue, Mar 14, 2017 at 09:52:15PM -0700, Cong Wang wrote:
>> Instead of checking for the status of the sock, I believe the following
>> one-line fix should do the trick too. Can you give it a try?
>>
>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>> index 0f378ea..4de62d4 100644
>> --- a/net/sctp/socket.c
>> +++ b/net/sctp/socket.c
>> @@ -1494,7 +1494,7 @@ static void sctp_close(struct sock *sk, long timeout)
>>
>> pr_debug("%s: sk:%p, timeout:%ld\n", __func__, sk, timeout);
>>
>> -   lock_sock(sk);
>> +   lock_sock_nested(sk, SINGLE_DEPTH_NESTING);
>> sk->sk_shutdown = SHUTDOWN_MASK;
>> sk->sk_state = SCTP_SS_CLOSING;
>
> I refrained on doing this just because it will change the lock signature
> for the first level too, as sctp_close() can be called directly, and
> might avoid some other lockdep detections.

I knew, but for the first level it is fine to use a different class,
it is merely to make lockdep happy. There is no real deadlock here
since they are two different socks anyway.

>
> Then you probably also need:
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 465a9c8464f9..02506b4406d2 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -1543,7 +1543,7 @@ static void sctp_close(struct sock *sk, long timeout)
>  * held and that should be grabbed before socket lock.
>  */
> spin_lock_bh(>sctp.addr_wq_lock);
> -   bh_lock_sock(sk);
> +   bh_lock_sock_nested(sk);
>
> /* Hold the sock, since sk_common_release() will put sock_put()
>  * and we have just a little more cleanup.
>
> because sctp_close will re-lock the socket a little later (for backlog
> processing).
>

Ah, of course I missed the re-lock. Dmitry, please add this piece too.

Thanks.


Re: net/sctp: recursive locking in sctp_do_peeloff

2017-03-14 Thread Cong Wang
On Fri, Mar 10, 2017 at 12:04 PM, Dmitry Vyukov  wrote:
> On Fri, Mar 10, 2017 at 8:46 PM, Marcelo Ricardo Leitner
>  wrote:
>> On Fri, Mar 10, 2017 at 4:11 PM, Dmitry Vyukov  wrote:
>>> Hello,
>>>
>>> I've got the following recursive locking report while running
>>> syzkaller fuzzer on net-next/9c28286b1b4b9bce6e35dd4c8a1265f03802a89a:
>>>
>>> [ INFO: possible recursive locking detected ]
>>> 4.10.0+ #14 Not tainted
>>> -
>>> syz-executor3/5560 is trying to acquire lock:
>>>  (sk_lock-AF_INET6){+.+.+.}, at: [] lock_sock
>>> include/net/sock.h:1460 [inline]
>>>  (sk_lock-AF_INET6){+.+.+.}, at: []
>>> sctp_close+0xcd/0x9d0 net/sctp/socket.c:1497
>>>
>>> but task is already holding lock:
>>>  (sk_lock-AF_INET6){+.+.+.}, at: [] lock_sock
>>> include/net/sock.h:1460 [inline]
>>>  (sk_lock-AF_INET6){+.+.+.}, at: []
>>> sctp_getsockopt+0x450/0x67e0 net/sctp/socket.c:6611
>>>
>>> other info that might help us debug this:
>>>  Possible unsafe locking scenario:
>>>
>>>CPU0
>>>
>>>   lock(sk_lock-AF_INET6);
>>>   lock(sk_lock-AF_INET6);
>>>
>>>  *** DEADLOCK ***
>>>
>>>  May be due to missing lock nesting notation
>>
>> Pretty much the case, I suppose. The lock held by sctp_getsockopt() is
>> on one socket, while the other lock that sctp_close() is getting later
>> is on the newly created (which failed) socket during peeloff
>> operation.
>
>
> Does this mean that never-ever lock 2 sockets at a time except for
> this case? If so, it probably suggests that this case should not do it
> either.
>

Yeah, actually for the error path we don't even need to lock sock
since it is newly allocated and no one else could see it yet.

Instead of checking for the status of the sock, I believe the following
one-line fix should do the trick too. Can you give it a try?

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 0f378ea..4de62d4 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1494,7 +1494,7 @@ static void sctp_close(struct sock *sk, long timeout)

pr_debug("%s: sk:%p, timeout:%ld\n", __func__, sk, timeout);

-   lock_sock(sk);
+   lock_sock_nested(sk, SINGLE_DEPTH_NESTING);
sk->sk_shutdown = SHUTDOWN_MASK;
sk->sk_state = SCTP_SS_CLOSING;


Re: net/sctp: recursive locking in sctp_do_peeloff

2017-03-14 Thread Cong Wang
On Fri, Mar 10, 2017 at 12:04 PM, Dmitry Vyukov  wrote:
> On Fri, Mar 10, 2017 at 8:46 PM, Marcelo Ricardo Leitner
>  wrote:
>> On Fri, Mar 10, 2017 at 4:11 PM, Dmitry Vyukov  wrote:
>>> Hello,
>>>
>>> I've got the following recursive locking report while running
>>> syzkaller fuzzer on net-next/9c28286b1b4b9bce6e35dd4c8a1265f03802a89a:
>>>
>>> [ INFO: possible recursive locking detected ]
>>> 4.10.0+ #14 Not tainted
>>> -
>>> syz-executor3/5560 is trying to acquire lock:
>>>  (sk_lock-AF_INET6){+.+.+.}, at: [] lock_sock
>>> include/net/sock.h:1460 [inline]
>>>  (sk_lock-AF_INET6){+.+.+.}, at: []
>>> sctp_close+0xcd/0x9d0 net/sctp/socket.c:1497
>>>
>>> but task is already holding lock:
>>>  (sk_lock-AF_INET6){+.+.+.}, at: [] lock_sock
>>> include/net/sock.h:1460 [inline]
>>>  (sk_lock-AF_INET6){+.+.+.}, at: []
>>> sctp_getsockopt+0x450/0x67e0 net/sctp/socket.c:6611
>>>
>>> other info that might help us debug this:
>>>  Possible unsafe locking scenario:
>>>
>>>CPU0
>>>
>>>   lock(sk_lock-AF_INET6);
>>>   lock(sk_lock-AF_INET6);
>>>
>>>  *** DEADLOCK ***
>>>
>>>  May be due to missing lock nesting notation
>>
>> Pretty much the case, I suppose. The lock held by sctp_getsockopt() is
>> on one socket, while the other lock that sctp_close() is getting later
>> is on the newly created (which failed) socket during peeloff
>> operation.
>
>
> Does this mean that never-ever lock 2 sockets at a time except for
> this case? If so, it probably suggests that this case should not do it
> either.
>

Yeah, actually for the error path we don't even need to lock sock
since it is newly allocated and no one else could see it yet.

Instead of checking for the status of the sock, I believe the following
one-line fix should do the trick too. Can you give it a try?

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 0f378ea..4de62d4 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1494,7 +1494,7 @@ static void sctp_close(struct sock *sk, long timeout)

pr_debug("%s: sk:%p, timeout:%ld\n", __func__, sk, timeout);

-   lock_sock(sk);
+   lock_sock_nested(sk, SINGLE_DEPTH_NESTING);
sk->sk_shutdown = SHUTDOWN_MASK;
sk->sk_state = SCTP_SS_CLOSING;


Re: net: deadlock between ip_expire/sch_direct_xmit

2017-03-14 Thread Cong Wang
On Tue, Mar 14, 2017 at 7:56 AM, Eric Dumazet  wrote:
> On Tue, Mar 14, 2017 at 7:46 AM, Dmitry Vyukov  wrote:
>
>> I am confused. Lockdep has observed both of these stacks:
>>
>>CPU0CPU1
>>
>>   lock(&(>lock)->rlock);
>>lock(_xmit_ETHER#2);
>>lock(&(>lock)->rlock);
>>   lock(_xmit_ETHER#2);
>>
>>
>> So it somehow happened. Or what do you mean?
>>
>
> Lockdep said " possible circular locking dependency detected " .
> It is not an actual deadlock, but lockdep machinery firing.
>
> For a dead lock to happen, this would require that he ICMP message
> sent by ip_expire() is itself fragmented and reassembled.
> This cannot be, because ICMP messages are not candidates for
> fragmentation, but lockdep can not know that of course...

It doesn't have to be ICMP, as long as get the same hash for
the inet_frag_queue, we will need to take the same lock and
deadlock will happen.

hash = ipqhashfn(iph->id, iph->saddr, iph->daddr, iph->protocol);

So it is really up to this hash function.


Re: net: deadlock between ip_expire/sch_direct_xmit

2017-03-14 Thread Cong Wang
On Tue, Mar 14, 2017 at 7:56 AM, Eric Dumazet  wrote:
> On Tue, Mar 14, 2017 at 7:46 AM, Dmitry Vyukov  wrote:
>
>> I am confused. Lockdep has observed both of these stacks:
>>
>>CPU0CPU1
>>
>>   lock(&(>lock)->rlock);
>>lock(_xmit_ETHER#2);
>>lock(&(>lock)->rlock);
>>   lock(_xmit_ETHER#2);
>>
>>
>> So it somehow happened. Or what do you mean?
>>
>
> Lockdep said " possible circular locking dependency detected " .
> It is not an actual deadlock, but lockdep machinery firing.
>
> For a dead lock to happen, this would require that he ICMP message
> sent by ip_expire() is itself fragmented and reassembled.
> This cannot be, because ICMP messages are not candidates for
> fragmentation, but lockdep can not know that of course...

It doesn't have to be ICMP, as long as get the same hash for
the inet_frag_queue, we will need to take the same lock and
deadlock will happen.

hash = ipqhashfn(iph->id, iph->saddr, iph->daddr, iph->protocol);

So it is really up to this hash function.


Re: net: BUG in unix_notinflight

2017-03-10 Thread Cong Wang
On Tue, Mar 7, 2017 at 2:23 PM, Nikolay Borisov
 wrote:
>
>>>
>>>
>>> New report from linux-next/c0b7b2b33bd17f7155956d0338ce92615da686c9
>>>
>>> [ cut here ]
>>> kernel BUG at net/unix/garbage.c:149!
>>> invalid opcode:  [#1] SMP KASAN
>>> Dumping ftrace buffer:
>>>(ftrace buffer empty)
>>> Modules linked in:
>>> CPU: 0 PID: 1806 Comm: syz-executor7 Not tainted 4.10.0-next-20170303+ #6
>>> Hardware name: Google Google Compute Engine/Google Compute Engine,
>>> BIOS Google 01/01/2011
>>> task: 880121c64740 task.stack: 88012c9e8000
>>> RIP: 0010:unix_notinflight+0x417/0x5d0 net/unix/garbage.c:149
>>> RSP: 0018:88012c9ef0f8 EFLAGS: 00010297
>>> RAX: 880121c64740 RBX: 11002593de23 RCX: 8801c490c628
>>> RDX:  RSI: 11002593de27 RDI: 8557e504
>>> RBP: 88012c9ef220 R08: 0001 R09: 
>>> R10: dc00 R11: ed002593de55 R12: 8801c490c0c0
>>> R13: 88012c9ef1f8 R14: 85101620 R15: dc00
>>> FS:  013d3940() GS:8801dbe0() knlGS:
>>> CS:  0010 DS:  ES:  CR0: 80050033
>>> CR2: 01fd8cd8 CR3: 0001cce69000 CR4: 001426f0
>>> Call Trace:
>>>  unix_detach_fds.isra.23+0xfa/0x170 net/unix/af_unix.c:1490
>>>  unix_destruct_scm+0xf4/0x200 net/unix/af_unix.c:1499
>>
>> The problem here is there is no lock protecting concurrent unix_detach_fds()
>> even though unix_notinflight() is already serialized, if we call
>> unix_notinflight()
>> twice on the same file pointer, we trigger this bug...
>>
>> I don't know what is the right lock here to serialize it.
>>
>
>
> I reported something similar a while ago
> https://lists.gt.net/linux/kernel/2534612
>
> And Miklos Szeredi then produced the following patch :
>
> https://patchwork.kernel.org/patch/9305121/
>
> However, this was never applied. I wonder if the patch makes sense?

I doubt it is the same case. According to Miklos' description,
the case he tried to fix is MSG_PEEK, but Dmitry's test case does not
set it... They are different problems probably.


Re: net: BUG in unix_notinflight

2017-03-10 Thread Cong Wang
On Tue, Mar 7, 2017 at 2:23 PM, Nikolay Borisov
 wrote:
>
>>>
>>>
>>> New report from linux-next/c0b7b2b33bd17f7155956d0338ce92615da686c9
>>>
>>> [ cut here ]
>>> kernel BUG at net/unix/garbage.c:149!
>>> invalid opcode:  [#1] SMP KASAN
>>> Dumping ftrace buffer:
>>>(ftrace buffer empty)
>>> Modules linked in:
>>> CPU: 0 PID: 1806 Comm: syz-executor7 Not tainted 4.10.0-next-20170303+ #6
>>> Hardware name: Google Google Compute Engine/Google Compute Engine,
>>> BIOS Google 01/01/2011
>>> task: 880121c64740 task.stack: 88012c9e8000
>>> RIP: 0010:unix_notinflight+0x417/0x5d0 net/unix/garbage.c:149
>>> RSP: 0018:88012c9ef0f8 EFLAGS: 00010297
>>> RAX: 880121c64740 RBX: 11002593de23 RCX: 8801c490c628
>>> RDX:  RSI: 11002593de27 RDI: 8557e504
>>> RBP: 88012c9ef220 R08: 0001 R09: 
>>> R10: dc00 R11: ed002593de55 R12: 8801c490c0c0
>>> R13: 88012c9ef1f8 R14: 85101620 R15: dc00
>>> FS:  013d3940() GS:8801dbe0() knlGS:
>>> CS:  0010 DS:  ES:  CR0: 80050033
>>> CR2: 01fd8cd8 CR3: 0001cce69000 CR4: 001426f0
>>> Call Trace:
>>>  unix_detach_fds.isra.23+0xfa/0x170 net/unix/af_unix.c:1490
>>>  unix_destruct_scm+0xf4/0x200 net/unix/af_unix.c:1499
>>
>> The problem here is there is no lock protecting concurrent unix_detach_fds()
>> even though unix_notinflight() is already serialized, if we call
>> unix_notinflight()
>> twice on the same file pointer, we trigger this bug...
>>
>> I don't know what is the right lock here to serialize it.
>>
>
>
> I reported something similar a while ago
> https://lists.gt.net/linux/kernel/2534612
>
> And Miklos Szeredi then produced the following patch :
>
> https://patchwork.kernel.org/patch/9305121/
>
> However, this was never applied. I wonder if the patch makes sense?

I doubt it is the same case. According to Miklos' description,
the case he tried to fix is MSG_PEEK, but Dmitry's test case does not
set it... They are different problems probably.


Re: net: BUG in unix_notinflight

2017-03-07 Thread Cong Wang
On Tue, Mar 7, 2017 at 12:37 AM, Dmitry Vyukov <dvyu...@google.com> wrote:
> On Mon, Mar 6, 2017 at 11:34 PM, Cong Wang <xiyou.wangc...@gmail.com> wrote:
>> The problem here is there is no lock protecting concurrent unix_detach_fds()
>> even though unix_notinflight() is already serialized, if we call
>> unix_notinflight()
>> twice on the same file pointer, we trigger this bug...
>>
>> I don't know what is the right lock here to serialize it.
>
>
> What exactly here needs to be protected?
>
> 1484 static void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb)
> 1485 {
> 1486 int i;
> 1487
> 1488 scm->fp = UNIXCB(skb).fp;
> 1489 UNIXCB(skb).fp = NULL;
> 1490
> 1491 for (i = scm->fp->count-1; i >= 0; i--)
> 1492 unix_notinflight(scm->fp->user, scm->fp->fp[i]);
> 1493 }
>
> Whole unix_notinflight happens under global unix_gc_lock.
>
> Is it that 2 threads call unix_detach_fds for the same skb, and then
> call unix_notinflight for the same fd twice?

Not the same skb, but their UNIXCB(skb).fp points to the same place,
therefore we call unix_notinflight() twice on the same fp->user and
fp->fp[i], although we have refcounting but still able to trigger this
warning.


Re: net: BUG in unix_notinflight

2017-03-07 Thread Cong Wang
On Tue, Mar 7, 2017 at 12:37 AM, Dmitry Vyukov  wrote:
> On Mon, Mar 6, 2017 at 11:34 PM, Cong Wang  wrote:
>> The problem here is there is no lock protecting concurrent unix_detach_fds()
>> even though unix_notinflight() is already serialized, if we call
>> unix_notinflight()
>> twice on the same file pointer, we trigger this bug...
>>
>> I don't know what is the right lock here to serialize it.
>
>
> What exactly here needs to be protected?
>
> 1484 static void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb)
> 1485 {
> 1486 int i;
> 1487
> 1488 scm->fp = UNIXCB(skb).fp;
> 1489 UNIXCB(skb).fp = NULL;
> 1490
> 1491 for (i = scm->fp->count-1; i >= 0; i--)
> 1492 unix_notinflight(scm->fp->user, scm->fp->fp[i]);
> 1493 }
>
> Whole unix_notinflight happens under global unix_gc_lock.
>
> Is it that 2 threads call unix_detach_fds for the same skb, and then
> call unix_notinflight for the same fd twice?

Not the same skb, but their UNIXCB(skb).fp points to the same place,
therefore we call unix_notinflight() twice on the same fp->user and
fp->fp[i], although we have refcounting but still able to trigger this
warning.


Re: net: BUG in unix_notinflight

2017-03-06 Thread Cong Wang
On Mon, Mar 6, 2017 at 2:40 AM, Dmitry Vyukov  wrote:
> Now with a nice single-threaded C reproducer!

Excellent...

>
> // autogenerated by syzkaller (http://github.com/google/syzkaller)
> #define _GNU_SOURCE
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
>
> void test()
> {
>   long r[54];
>   memset(r, -1, sizeof(r));
>   syscall(__NR_mmap, 0x2000ul, 0xfff000ul, 0x3ul, 0x32ul, -1, 0);
>   r[1] = syscall(__NR_socketpair, 0x1ul, 0x5ul, 0x0ul, 0x20521ff8ul);
>   r[2] = *(uint32_t*)0x20521ff8;
>   r[3] = *(uint32_t*)0x20521ffc;
>   r[5] = syscall(__NR_open, "/dev/net/tun", 0x20ul);
>   r[6] = syscall(__NR_socketpair, 0x1ul, 0x5ul, 0x0ul,
>  0x20d85000ul, 0, 0, 0, 0, 0);
>   r[7] = *(uint32_t*)0x20d85000;
>   (*(uint64_t*)0x2fc8 = (uint64_t)0x2000);
>   (*(uint32_t*)0x2fd0 = (uint32_t)0xa);
>   (*(uint64_t*)0x2fd8 = (uint64_t)0x2005d000);
>   (*(uint64_t*)0x2fe0 = (uint64_t)0x8);
>   (*(uint64_t*)0x2fe8 = (uint64_t)0x2ff0);
>   (*(uint64_t*)0x2ff0 = (uint64_t)0x1);
>   (*(uint32_t*)0x2ff8 = (uint32_t)0x0);
>   (*(uint16_t*)0x2000 = (uint16_t)0x1);
>   memcpy((void*)0x2002, "\x2e\x2f\x66\x69\x6c\x65\x30\x00", 8);
>   (*(uint64_t*)0x2005d000 = (uint64_t)0x20784f06);
>   (*(uint64_t*)0x2005d008 = (uint64_t)0x0);
>   (*(uint64_t*)0x2005d010 = (uint64_t)0x209a5f78);
>   (*(uint64_t*)0x2005d018 = (uint64_t)0x0);
>   (*(uint64_t*)0x2005d020 = (uint64_t)0x20ec3ffc);
>   (*(uint64_t*)0x2005d028 = (uint64_t)0x0);
>   (*(uint64_t*)0x2005d030 = (uint64_t)0x2057e000);
>   (*(uint64_t*)0x2005d038 = (uint64_t)0x0);
>   (*(uint64_t*)0x2005d040 = (uint64_t)0x200c9f9d);
>   (*(uint64_t*)0x2005d048 = (uint64_t)0x0);
>   (*(uint64_t*)0x2005d050 = (uint64_t)0x20331000);
>   (*(uint64_t*)0x2005d058 = (uint64_t)0x0);
>   (*(uint64_t*)0x2005d060 = (uint64_t)0x206a1f7b);
>   (*(uint64_t*)0x2005d068 = (uint64_t)0x0);
>   (*(uint64_t*)0x2005d070 = (uint64_t)0x20e7f000);
>   (*(uint64_t*)0x2005d078 = (uint64_t)0x0);
>   (*(uint64_t*)0x2ff0 = (uint64_t)0x18);
>   (*(uint32_t*)0x2ff8 = (uint32_t)0x1);
>   (*(uint32_t*)0x2ffc = (uint32_t)0x1);
>   (*(uint32_t*)0x20001000 = r[5]);
>   (*(uint32_t*)0x20001004 = r[7]);
>   syscall(__NR_sendmsg, r[7], 0x2fc8ul, 0x0ul);
>   (*(uint64_t*)0x2fc8 = (uint64_t)0x2000);
>   (*(uint32_t*)0x2fd0 = (uint32_t)0x8);
>   (*(uint64_t*)0x2fd8 = (uint64_t)0x20026000);
>   (*(uint64_t*)0x2fe0 = (uint64_t)0x0);
>   (*(uint64_t*)0x2fe8 = (uint64_t)0x2ff0);
>   (*(uint64_t*)0x2ff0 = (uint64_t)0x1);
>   (*(uint32_t*)0x2ff8 = (uint32_t)0x0);
>   (*(uint16_t*)0x2000 = (uint16_t)0x0);
>   (*(uint8_t*)0x2002 = (uint8_t)0x0);
>   (*(uint32_t*)0x2004 = (uint32_t)0x4e20);
>   (*(uint64_t*)0x2ff0 = (uint64_t)0x18);
>   (*(uint32_t*)0x2ff8 = (uint32_t)0x1);
>   (*(uint32_t*)0x2ffc = (uint32_t)0x1);
>   (*(uint32_t*)0x20001000 = r[2]);
>   syscall(__NR_sendmsg, r[3], 0x2fc8ul, 0x0ul);
> }
>
> int main()
> {
>   int i, pid, status;
>   for (i = 0; i < 4; i++) {
> if (fork() == 0) {
>   for (;;) {
> pid = fork();
> if (pid == 0) {
>   test();
>   exit(0);
> }
> while (waitpid(pid, , __WALL) != pid) {}
>   }
> }
>   }
>   sleep(100);
>   return 0;
> }
>
>
>
> New report from linux-next/c0b7b2b33bd17f7155956d0338ce92615da686c9
>
> [ cut here ]
> kernel BUG at net/unix/garbage.c:149!
> invalid opcode:  [#1] SMP KASAN
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 1806 Comm: syz-executor7 Not tainted 4.10.0-next-20170303+ #6
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 01/01/2011
> task: 880121c64740 task.stack: 88012c9e8000
> RIP: 0010:unix_notinflight+0x417/0x5d0 net/unix/garbage.c:149
> RSP: 0018:88012c9ef0f8 EFLAGS: 00010297
> RAX: 880121c64740 RBX: 11002593de23 RCX: 8801c490c628
> RDX:  RSI: 11002593de27 RDI: 8557e504
> RBP: 88012c9ef220 R08: 0001 R09: 
> R10: dc00 R11: ed002593de55 R12: 8801c490c0c0
> R13: 88012c9ef1f8 R14: 85101620 R15: dc00
> FS:  013d3940() GS:8801dbe0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 01fd8cd8 CR3: 0001cce69000 CR4: 001426f0
> Call Trace:
>  unix_detach_fds.isra.23+0xfa/0x170 net/unix/af_unix.c:1490
>  unix_destruct_scm+0xf4/0x200 net/unix/af_unix.c:1499

The problem here is there is no lock protecting concurrent unix_detach_fds()
even though unix_notinflight() is already serialized, if we call
unix_notinflight()
twice on the same file pointer, we trigger this bug...

I don't know what is the right lock here to serialize it.


Re: net: BUG in unix_notinflight

2017-03-06 Thread Cong Wang
On Mon, Mar 6, 2017 at 2:40 AM, Dmitry Vyukov  wrote:
> Now with a nice single-threaded C reproducer!

Excellent...

>
> // autogenerated by syzkaller (http://github.com/google/syzkaller)
> #define _GNU_SOURCE
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
>
> void test()
> {
>   long r[54];
>   memset(r, -1, sizeof(r));
>   syscall(__NR_mmap, 0x2000ul, 0xfff000ul, 0x3ul, 0x32ul, -1, 0);
>   r[1] = syscall(__NR_socketpair, 0x1ul, 0x5ul, 0x0ul, 0x20521ff8ul);
>   r[2] = *(uint32_t*)0x20521ff8;
>   r[3] = *(uint32_t*)0x20521ffc;
>   r[5] = syscall(__NR_open, "/dev/net/tun", 0x20ul);
>   r[6] = syscall(__NR_socketpair, 0x1ul, 0x5ul, 0x0ul,
>  0x20d85000ul, 0, 0, 0, 0, 0);
>   r[7] = *(uint32_t*)0x20d85000;
>   (*(uint64_t*)0x2fc8 = (uint64_t)0x2000);
>   (*(uint32_t*)0x2fd0 = (uint32_t)0xa);
>   (*(uint64_t*)0x2fd8 = (uint64_t)0x2005d000);
>   (*(uint64_t*)0x2fe0 = (uint64_t)0x8);
>   (*(uint64_t*)0x2fe8 = (uint64_t)0x2ff0);
>   (*(uint64_t*)0x2ff0 = (uint64_t)0x1);
>   (*(uint32_t*)0x2ff8 = (uint32_t)0x0);
>   (*(uint16_t*)0x2000 = (uint16_t)0x1);
>   memcpy((void*)0x2002, "\x2e\x2f\x66\x69\x6c\x65\x30\x00", 8);
>   (*(uint64_t*)0x2005d000 = (uint64_t)0x20784f06);
>   (*(uint64_t*)0x2005d008 = (uint64_t)0x0);
>   (*(uint64_t*)0x2005d010 = (uint64_t)0x209a5f78);
>   (*(uint64_t*)0x2005d018 = (uint64_t)0x0);
>   (*(uint64_t*)0x2005d020 = (uint64_t)0x20ec3ffc);
>   (*(uint64_t*)0x2005d028 = (uint64_t)0x0);
>   (*(uint64_t*)0x2005d030 = (uint64_t)0x2057e000);
>   (*(uint64_t*)0x2005d038 = (uint64_t)0x0);
>   (*(uint64_t*)0x2005d040 = (uint64_t)0x200c9f9d);
>   (*(uint64_t*)0x2005d048 = (uint64_t)0x0);
>   (*(uint64_t*)0x2005d050 = (uint64_t)0x20331000);
>   (*(uint64_t*)0x2005d058 = (uint64_t)0x0);
>   (*(uint64_t*)0x2005d060 = (uint64_t)0x206a1f7b);
>   (*(uint64_t*)0x2005d068 = (uint64_t)0x0);
>   (*(uint64_t*)0x2005d070 = (uint64_t)0x20e7f000);
>   (*(uint64_t*)0x2005d078 = (uint64_t)0x0);
>   (*(uint64_t*)0x2ff0 = (uint64_t)0x18);
>   (*(uint32_t*)0x2ff8 = (uint32_t)0x1);
>   (*(uint32_t*)0x2ffc = (uint32_t)0x1);
>   (*(uint32_t*)0x20001000 = r[5]);
>   (*(uint32_t*)0x20001004 = r[7]);
>   syscall(__NR_sendmsg, r[7], 0x2fc8ul, 0x0ul);
>   (*(uint64_t*)0x2fc8 = (uint64_t)0x2000);
>   (*(uint32_t*)0x2fd0 = (uint32_t)0x8);
>   (*(uint64_t*)0x2fd8 = (uint64_t)0x20026000);
>   (*(uint64_t*)0x2fe0 = (uint64_t)0x0);
>   (*(uint64_t*)0x2fe8 = (uint64_t)0x2ff0);
>   (*(uint64_t*)0x2ff0 = (uint64_t)0x1);
>   (*(uint32_t*)0x2ff8 = (uint32_t)0x0);
>   (*(uint16_t*)0x2000 = (uint16_t)0x0);
>   (*(uint8_t*)0x2002 = (uint8_t)0x0);
>   (*(uint32_t*)0x2004 = (uint32_t)0x4e20);
>   (*(uint64_t*)0x2ff0 = (uint64_t)0x18);
>   (*(uint32_t*)0x2ff8 = (uint32_t)0x1);
>   (*(uint32_t*)0x2ffc = (uint32_t)0x1);
>   (*(uint32_t*)0x20001000 = r[2]);
>   syscall(__NR_sendmsg, r[3], 0x2fc8ul, 0x0ul);
> }
>
> int main()
> {
>   int i, pid, status;
>   for (i = 0; i < 4; i++) {
> if (fork() == 0) {
>   for (;;) {
> pid = fork();
> if (pid == 0) {
>   test();
>   exit(0);
> }
> while (waitpid(pid, , __WALL) != pid) {}
>   }
> }
>   }
>   sleep(100);
>   return 0;
> }
>
>
>
> New report from linux-next/c0b7b2b33bd17f7155956d0338ce92615da686c9
>
> [ cut here ]
> kernel BUG at net/unix/garbage.c:149!
> invalid opcode:  [#1] SMP KASAN
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 1806 Comm: syz-executor7 Not tainted 4.10.0-next-20170303+ #6
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 01/01/2011
> task: 880121c64740 task.stack: 88012c9e8000
> RIP: 0010:unix_notinflight+0x417/0x5d0 net/unix/garbage.c:149
> RSP: 0018:88012c9ef0f8 EFLAGS: 00010297
> RAX: 880121c64740 RBX: 11002593de23 RCX: 8801c490c628
> RDX:  RSI: 11002593de27 RDI: 8557e504
> RBP: 88012c9ef220 R08: 0001 R09: 
> R10: dc00 R11: ed002593de55 R12: 8801c490c0c0
> R13: 88012c9ef1f8 R14: 85101620 R15: dc00
> FS:  013d3940() GS:8801dbe0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 01fd8cd8 CR3: 0001cce69000 CR4: 001426f0
> Call Trace:
>  unix_detach_fds.isra.23+0xfa/0x170 net/unix/af_unix.c:1490
>  unix_destruct_scm+0xf4/0x200 net/unix/af_unix.c:1499

The problem here is there is no lock protecting concurrent unix_detach_fds()
even though unix_notinflight() is already serialized, if we call
unix_notinflight()
twice on the same file pointer, we trigger this bug...

I don't know what is the right lock here to serialize it.


Re: netlink: GPF in netlink_unicast

2017-03-06 Thread Cong Wang
On Mon, Mar 6, 2017 at 2:54 AM, Dmitry Vyukov  wrote:
> Hello,
>
> I've got the following crash while running syzkaller fuzzer on
> net-next/8d70eeb84ab277377c017af6a21d0a337025dede:
>
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault:  [#1] SMP KASAN
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 883 Comm: kauditd Not tainted 4.10.0+ #6
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 01/01/2011
> task: 8801d79f0240 task.stack: 8801d7a2
> RIP: 0010:sock_sndtimeo include/net/sock.h:2162 [inline]
> RIP: 0010:netlink_unicast+0xdd/0x730 net/netlink/af_netlink.c:1249
> RSP: 0018:8801d7a27c38 EFLAGS: 00010206
> RAX: 0056 RBX: 8801d7a27cd0 RCX: 
> RDX:  RSI:  RDI: 02b0
> RBP: 8801d7a27cf8 R08: ed00385cf286 R09: ed00385cf286
> R10: 0006 R11: ed00385cf285 R12: 
> R13: dc00 R14: 8801c2fc3c80 R15: 014000c0
> FS:  () GS:8801dbe0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 20cfd000 CR3: 0001c758f000 CR4: 001406f0
> Call Trace:
>  kauditd_send_unicast_skb+0x3c/0x70 kernel/audit.c:482
>  kauditd_thread+0x174/0xb00 kernel/audit.c:599
>  kthread+0x326/0x3f0 kernel/kthread.c:229
>  ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
> Code: 44 89 fe e8 56 15 ff ff 8b 8d 70 ff ff ff 49 89 c6 31 c0 85 c9
> 75 27 e8 b2 b2 f4 fd 49 8d bc 24 b0 02 00 00 48 89 f8 48 c1 e8 03 <42>
> 80 3c 28 00 0f 85 37 06 00 00 49 8b 84 24 b0 02 00 00 4c 8d
> RIP: sock_sndtimeo include/net/sock.h:2162 [inline] RSP: 8801d7a27c38
> RIP: netlink_unicast+0xdd/0x730 net/netlink/af_netlink.c:1249 RSP:
> 8801d7a27c38
> ---[ end trace ad1bba9d457430b6 ]---
> Kernel panic - not syncing: Fatal exception
>
>
> This is not reproducible and seems to be caused by an elusive race.
> However, looking at the code I don't see any proper protection of
> audit_sock (other than the if (!audit_pid) which is obviously not
> enough to protect against races).

audit_cmd_mutex is supposed to protect it, I think.
But kauditd_send_unicast_skb() seems not holding this mutex.

Richard?


Re: netlink: GPF in netlink_unicast

2017-03-06 Thread Cong Wang
On Mon, Mar 6, 2017 at 2:54 AM, Dmitry Vyukov  wrote:
> Hello,
>
> I've got the following crash while running syzkaller fuzzer on
> net-next/8d70eeb84ab277377c017af6a21d0a337025dede:
>
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault:  [#1] SMP KASAN
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 883 Comm: kauditd Not tainted 4.10.0+ #6
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 01/01/2011
> task: 8801d79f0240 task.stack: 8801d7a2
> RIP: 0010:sock_sndtimeo include/net/sock.h:2162 [inline]
> RIP: 0010:netlink_unicast+0xdd/0x730 net/netlink/af_netlink.c:1249
> RSP: 0018:8801d7a27c38 EFLAGS: 00010206
> RAX: 0056 RBX: 8801d7a27cd0 RCX: 
> RDX:  RSI:  RDI: 02b0
> RBP: 8801d7a27cf8 R08: ed00385cf286 R09: ed00385cf286
> R10: 0006 R11: ed00385cf285 R12: 
> R13: dc00 R14: 8801c2fc3c80 R15: 014000c0
> FS:  () GS:8801dbe0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 20cfd000 CR3: 0001c758f000 CR4: 001406f0
> Call Trace:
>  kauditd_send_unicast_skb+0x3c/0x70 kernel/audit.c:482
>  kauditd_thread+0x174/0xb00 kernel/audit.c:599
>  kthread+0x326/0x3f0 kernel/kthread.c:229
>  ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
> Code: 44 89 fe e8 56 15 ff ff 8b 8d 70 ff ff ff 49 89 c6 31 c0 85 c9
> 75 27 e8 b2 b2 f4 fd 49 8d bc 24 b0 02 00 00 48 89 f8 48 c1 e8 03 <42>
> 80 3c 28 00 0f 85 37 06 00 00 49 8b 84 24 b0 02 00 00 4c 8d
> RIP: sock_sndtimeo include/net/sock.h:2162 [inline] RSP: 8801d7a27c38
> RIP: netlink_unicast+0xdd/0x730 net/netlink/af_netlink.c:1249 RSP:
> 8801d7a27c38
> ---[ end trace ad1bba9d457430b6 ]---
> Kernel panic - not syncing: Fatal exception
>
>
> This is not reproducible and seems to be caused by an elusive race.
> However, looking at the code I don't see any proper protection of
> audit_sock (other than the if (!audit_pid) which is obviously not
> enough to protect against races).

audit_cmd_mutex is supposed to protect it, I think.
But kauditd_send_unicast_skb() seems not holding this mutex.

Richard?


Re: net/ipv4: deadlock in ip_ra_control

2017-03-05 Thread Cong Wang
On Fri, Mar 3, 2017 at 10:43 AM, Dmitry Vyukov <dvyu...@google.com> wrote:
> On Thu, Mar 2, 2017 at 10:40 AM, Dmitry Vyukov <dvyu...@google.com> wrote:
>> On Wed, Mar 1, 2017 at 6:18 PM, Cong Wang <xiyou.wangc...@gmail.com> wrote:
>>> On Wed, Mar 1, 2017 at 2:44 AM, Dmitry Vyukov <dvyu...@google.com> wrote:
>>>> Hello,
>>>>
>>>> I've got the following deadlock report while running syzkaller fuzzer
>>>> on linux-next/51788aebe7cae79cb334ad50641347465fc188fd:
>>>>
>>>> ==
>>>> [ INFO: possible circular locking dependency detected ]
>>>> 4.10.0-next-20170301+ #1 Not tainted
>>>> ---
>>>> syz-executor1/3394 is trying to acquire lock:
>>>>  (sk_lock-AF_INET){+.+.+.}, at: [] lock_sock
>>>> include/net/sock.h:1460 [inline]
>>>>  (sk_lock-AF_INET){+.+.+.}, at: []
>>>> do_ip_setsockopt.isra.12+0x21c/0x3540 net/ipv4/ip_sockglue.c:652
>>>>
>>>> but task is already holding lock:
>>>>  (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x17/0x20
>>>> net/core/rtnetlink.c:70
>>>>
>>>> which lock already depends on the new lock.
>>>>
>>>>
>>>> the existing dependency chain (in reverse order) is:
>>>>
>>>> -> #1 (rtnl_mutex){+.+.+.}:
>>>>validate_chain kernel/locking/lockdep.c:2265 [inline]
>>>>__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
>>>>lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
>>>>__mutex_lock_common kernel/locking/mutex.c:754 [inline]
>>>>__mutex_lock+0x172/0x1730 kernel/locking/mutex.c:891
>>>>mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:906
>>>>rtnl_lock+0x17/0x20 net/core/rtnetlink.c:70
>>>>mrtsock_destruct+0x86/0x2c0 net/ipv4/ipmr.c:1281
>>>>ip_ra_control+0x459/0x600 net/ipv4/ip_sockglue.c:372
>>>>do_ip_setsockopt.isra.12+0x1064/0x3540 net/ipv4/ip_sockglue.c:1161
>>>>ip_setsockopt+0x3a/0xb0 net/ipv4/ip_sockglue.c:1264
>>>>raw_setsockopt+0xb7/0xd0 net/ipv4/raw.c:839
>>>>sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2725
>>>>SYSC_setsockopt net/socket.c:1786 [inline]
>>>>SyS_setsockopt+0x25c/0x390 net/socket.c:1765
>>>>entry_SYSCALL_64_fastpath+0x1f/0xc2
>>>>
>>>> -> #0 (sk_lock-AF_INET){+.+.+.}:
>>>>check_prev_add kernel/locking/lockdep.c:1828 [inline]
>>>>check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1938
>>>>validate_chain kernel/locking/lockdep.c:2265 [inline]
>>>>__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
>>>>lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
>>>>lock_sock_nested+0xcb/0x120 net/core/sock.c:2530
>>>>lock_sock include/net/sock.h:1460 [inline]
>>>>do_ip_setsockopt.isra.12+0x21c/0x3540 net/ipv4/ip_sockglue.c:652
>>>>ip_setsockopt+0x3a/0xb0 net/ipv4/ip_sockglue.c:1264
>>>>tcp_setsockopt+0x82/0xd0 net/ipv4/tcp.c:2721
>>>>sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2725
>>>>SYSC_setsockopt net/socket.c:1786 [inline]
>>>>SyS_setsockopt+0x25c/0x390 net/socket.c:1765
>>>>entry_SYSCALL_64_fastpath+0x1f/0xc2
>>>>
>>>
>>> Please try the attached patch (compile only).
>>
>>
>> Pushed the patch to the bots.
>> Thanks
>
>
> This patch triggers:

Ah, update the patch to fix this.
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index ebd953b..bda318a 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -591,6 +591,7 @@ static bool setsockopt_needs_rtnl(int optname)
case MCAST_LEAVE_GROUP:
case MCAST_LEAVE_SOURCE_GROUP:
case MCAST_UNBLOCK_SOURCE:
+   case IP_ROUTER_ALERT:
return true;
}
return false;
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index c0317c9..b036e85 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1278,7 +1278,7 @@ static void mrtsock_destruct(struct sock *sk)
struct net *net = sock_net(sk);
struct mr_table *mrt;
 
-   rtnl_lock();
+   ASSERT_RTNL();
ipmr_for_each_table(mrt, net) {
if (sk == rtnl_dereference(mrt->mroute_sk)) {
IPV4_DEVCONF_ALL(net

Re: net/ipv4: deadlock in ip_ra_control

2017-03-05 Thread Cong Wang
On Fri, Mar 3, 2017 at 10:43 AM, Dmitry Vyukov  wrote:
> On Thu, Mar 2, 2017 at 10:40 AM, Dmitry Vyukov  wrote:
>> On Wed, Mar 1, 2017 at 6:18 PM, Cong Wang  wrote:
>>> On Wed, Mar 1, 2017 at 2:44 AM, Dmitry Vyukov  wrote:
>>>> Hello,
>>>>
>>>> I've got the following deadlock report while running syzkaller fuzzer
>>>> on linux-next/51788aebe7cae79cb334ad50641347465fc188fd:
>>>>
>>>> ==
>>>> [ INFO: possible circular locking dependency detected ]
>>>> 4.10.0-next-20170301+ #1 Not tainted
>>>> ---
>>>> syz-executor1/3394 is trying to acquire lock:
>>>>  (sk_lock-AF_INET){+.+.+.}, at: [] lock_sock
>>>> include/net/sock.h:1460 [inline]
>>>>  (sk_lock-AF_INET){+.+.+.}, at: []
>>>> do_ip_setsockopt.isra.12+0x21c/0x3540 net/ipv4/ip_sockglue.c:652
>>>>
>>>> but task is already holding lock:
>>>>  (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x17/0x20
>>>> net/core/rtnetlink.c:70
>>>>
>>>> which lock already depends on the new lock.
>>>>
>>>>
>>>> the existing dependency chain (in reverse order) is:
>>>>
>>>> -> #1 (rtnl_mutex){+.+.+.}:
>>>>validate_chain kernel/locking/lockdep.c:2265 [inline]
>>>>__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
>>>>lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
>>>>__mutex_lock_common kernel/locking/mutex.c:754 [inline]
>>>>__mutex_lock+0x172/0x1730 kernel/locking/mutex.c:891
>>>>mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:906
>>>>rtnl_lock+0x17/0x20 net/core/rtnetlink.c:70
>>>>mrtsock_destruct+0x86/0x2c0 net/ipv4/ipmr.c:1281
>>>>ip_ra_control+0x459/0x600 net/ipv4/ip_sockglue.c:372
>>>>do_ip_setsockopt.isra.12+0x1064/0x3540 net/ipv4/ip_sockglue.c:1161
>>>>ip_setsockopt+0x3a/0xb0 net/ipv4/ip_sockglue.c:1264
>>>>raw_setsockopt+0xb7/0xd0 net/ipv4/raw.c:839
>>>>sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2725
>>>>SYSC_setsockopt net/socket.c:1786 [inline]
>>>>SyS_setsockopt+0x25c/0x390 net/socket.c:1765
>>>>entry_SYSCALL_64_fastpath+0x1f/0xc2
>>>>
>>>> -> #0 (sk_lock-AF_INET){+.+.+.}:
>>>>check_prev_add kernel/locking/lockdep.c:1828 [inline]
>>>>check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1938
>>>>validate_chain kernel/locking/lockdep.c:2265 [inline]
>>>>__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
>>>>lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
>>>>lock_sock_nested+0xcb/0x120 net/core/sock.c:2530
>>>>lock_sock include/net/sock.h:1460 [inline]
>>>>do_ip_setsockopt.isra.12+0x21c/0x3540 net/ipv4/ip_sockglue.c:652
>>>>ip_setsockopt+0x3a/0xb0 net/ipv4/ip_sockglue.c:1264
>>>>tcp_setsockopt+0x82/0xd0 net/ipv4/tcp.c:2721
>>>>sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2725
>>>>SYSC_setsockopt net/socket.c:1786 [inline]
>>>>SyS_setsockopt+0x25c/0x390 net/socket.c:1765
>>>>entry_SYSCALL_64_fastpath+0x1f/0xc2
>>>>
>>>
>>> Please try the attached patch (compile only).
>>
>>
>> Pushed the patch to the bots.
>> Thanks
>
>
> This patch triggers:

Ah, update the patch to fix this.
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index ebd953b..bda318a 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -591,6 +591,7 @@ static bool setsockopt_needs_rtnl(int optname)
case MCAST_LEAVE_GROUP:
case MCAST_LEAVE_SOURCE_GROUP:
case MCAST_UNBLOCK_SOURCE:
+   case IP_ROUTER_ALERT:
return true;
}
return false;
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index c0317c9..b036e85 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1278,7 +1278,7 @@ static void mrtsock_destruct(struct sock *sk)
struct net *net = sock_net(sk);
struct mr_table *mrt;
 
-   rtnl_lock();
+   ASSERT_RTNL();
ipmr_for_each_table(mrt, net) {
if (sk == rtnl_dereference(mrt->mroute_sk)) {
IPV4_DEVCONF_ALL(net, MC_FORWARDING)--;
@@ -1289,7 +1289,6 @@ static void mrtsock_destruct(struct sock 

Re: net/ipv6: null-ptr-deref in ip6mr_sk_done

2017-03-05 Thread Cong Wang
On Sun, Mar 5, 2017 at 8:54 AM, Andrey Konovalov  wrote:
> Hi,
>
> I've got the following bug report while fuzzing the kernel with syzkaller.
> Unfortunately it's not reproducible.
>
> On commit e5d56efc97f8240d0b5d66c03949382b6d7e5570 (Feb 26).
>
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault:  [#1] SMP KASAN
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 14446 Comm: syz-executor6 Not tainted 4.10.0+ #82
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 88001f311700 task.stack: 88001f6e8000
> RIP: 0010:ip6mr_sk_done+0x15a/0x3d0 net/ipv6/ip6mr.c:1618
> RSP: 0018:88001f6ef418 EFLAGS: 00010202
> RAX: dc00 RBX: 110003edde8c RCX: c900043ee000
> RDX: 0004 RSI: 83e3b3f8 RDI: 0020
> RBP: 88001f6ef508 R08: fbfff0dcc5d8 R09: 
> R10: 86e62ec0 R11:  R12: 
> R13:  R14: 88001f6ef4e0 R15: 8800380a0040
> FS:  7f7a52cec700() GS:88003ec0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 0061c500 CR3: 1f1ae000 CR4: 06f0
> DR0: 2000 DR1: 2000 DR2: 
> DR3:  DR6: 0ff0 DR7: 0600
> Call Trace:
>  rawv6_close+0x4c/0x80 net/ipv6/raw.c:1217
>  inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
>  inet6_release+0x50/0x70 net/ipv6/af_inet6.c:432
>  sock_release+0x8d/0x1e0 net/socket.c:597
>  __sock_create+0x39d/0x880 net/socket.c:1226
>  sock_create_kern+0x3f/0x50 net/socket.c:1243
>  inet_ctl_sock_create+0xbb/0x280 net/ipv4/af_inet.c:1526
>  icmpv6_sk_init+0x163/0x500 net/ipv6/icmp.c:954
>  ops_init+0x10a/0x550 net/core/net_namespace.c:115
>  setup_net+0x261/0x660 net/core/net_namespace.c:291
>  copy_net_ns+0x27e/0x540 net/core/net_namespace.c:396
> 9pnet_virtio: no channels available for device ./file1
>  create_new_namespaces+0x437/0x9b0 kernel/nsproxy.c:106
>  unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:205
>  SYSC_unshare kernel/fork.c:2281 [inline]
>  SyS_unshare+0x64e/0x1000 kernel/fork.c:2231
>  entry_SYSCALL_64_fastpath+0x1f/0xc2

Hmm, I think net->ipv6.mr6_tables is not initialized at that point,
ip6mr_rules_init() is not called yet, therefore on the error path when
we iterator the list, we trigger this oops. We need some other way
to check if it is initialized or not.


Re: net/ipv6: null-ptr-deref in ip6mr_sk_done

2017-03-05 Thread Cong Wang
On Sun, Mar 5, 2017 at 8:54 AM, Andrey Konovalov  wrote:
> Hi,
>
> I've got the following bug report while fuzzing the kernel with syzkaller.
> Unfortunately it's not reproducible.
>
> On commit e5d56efc97f8240d0b5d66c03949382b6d7e5570 (Feb 26).
>
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault:  [#1] SMP KASAN
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 14446 Comm: syz-executor6 Not tainted 4.10.0+ #82
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 88001f311700 task.stack: 88001f6e8000
> RIP: 0010:ip6mr_sk_done+0x15a/0x3d0 net/ipv6/ip6mr.c:1618
> RSP: 0018:88001f6ef418 EFLAGS: 00010202
> RAX: dc00 RBX: 110003edde8c RCX: c900043ee000
> RDX: 0004 RSI: 83e3b3f8 RDI: 0020
> RBP: 88001f6ef508 R08: fbfff0dcc5d8 R09: 
> R10: 86e62ec0 R11:  R12: 
> R13:  R14: 88001f6ef4e0 R15: 8800380a0040
> FS:  7f7a52cec700() GS:88003ec0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 0061c500 CR3: 1f1ae000 CR4: 06f0
> DR0: 2000 DR1: 2000 DR2: 
> DR3:  DR6: 0ff0 DR7: 0600
> Call Trace:
>  rawv6_close+0x4c/0x80 net/ipv6/raw.c:1217
>  inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
>  inet6_release+0x50/0x70 net/ipv6/af_inet6.c:432
>  sock_release+0x8d/0x1e0 net/socket.c:597
>  __sock_create+0x39d/0x880 net/socket.c:1226
>  sock_create_kern+0x3f/0x50 net/socket.c:1243
>  inet_ctl_sock_create+0xbb/0x280 net/ipv4/af_inet.c:1526
>  icmpv6_sk_init+0x163/0x500 net/ipv6/icmp.c:954
>  ops_init+0x10a/0x550 net/core/net_namespace.c:115
>  setup_net+0x261/0x660 net/core/net_namespace.c:291
>  copy_net_ns+0x27e/0x540 net/core/net_namespace.c:396
> 9pnet_virtio: no channels available for device ./file1
>  create_new_namespaces+0x437/0x9b0 kernel/nsproxy.c:106
>  unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:205
>  SYSC_unshare kernel/fork.c:2281 [inline]
>  SyS_unshare+0x64e/0x1000 kernel/fork.c:2231
>  entry_SYSCALL_64_fastpath+0x1f/0xc2

Hmm, I think net->ipv6.mr6_tables is not initialized at that point,
ip6mr_rules_init() is not called yet, therefore on the error path when
we iterator the list, we trigger this oops. We need some other way
to check if it is initialized or not.


Re: net/kcm: use-after-free in kcm_wq

2017-03-03 Thread Cong Wang
On Fri, Mar 3, 2017 at 2:11 AM, Dmitry Vyukov  wrote:
> Also like this one:
>
> ==
> BUG: KASAN: use-after-free in atomic_long_read
> include/linux/compiler.h:254 [inline] at addr 8800538aba60
> BUG: KASAN: use-after-free in get_work_pool+0x2f2/0x340
> kernel/workqueue.c:709 at addr 8800538aba60
> Read of size 8 by task syz-executor6/7965
> CPU: 2 PID: 7965 Comm: syz-executor6 Not tainted 4.10.0+ #248
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:15 [inline]
>  dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
>  kasan_object_err+0x1c/0x70 mm/kasan/report.c:166
>  print_address_description mm/kasan/report.c:204 [inline]
>  kasan_report_error mm/kasan/report.c:288 [inline]
>  kasan_report.part.2+0x198/0x440 mm/kasan/report.c:310
>  kasan_report mm/kasan/report.c:331 [inline]
>  __asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:331
>  atomic_long_read include/linux/compiler.h:254 [inline]
>  get_work_pool+0x2f2/0x340 kernel/workqueue.c:709
>  __queue_work+0x2b3/0x1210 kernel/workqueue.c:1401
>  queue_work_on+0x2e9/0x330 kernel/workqueue.c:1486
>  queue_work include/linux/workqueue.h:487 [inline]
>  strp_check_rcv+0x25/0x30 net/strparser/strparser.c:494


It is not kcm_wq, it is strp_wq, and the work struct is strp->rx_work
which lives in struct kcm_psock. The work is cancelled by strp_done(),
it seems get queued again after strp_done()...


>  kcm_attach net/kcm/kcmsock.c:1434 [inline]
>  kcm_attach_ioctl net/kcm/kcmsock.c:1455 [inline]
>  kcm_ioctl+0x8bb/0x1800 net/kcm/kcmsock.c:1690
>  sock_do_ioctl+0x65/0xb0 net/socket.c:895
>  sock_ioctl+0x2c2/0x440 net/socket.c:993
>  vfs_ioctl fs/ioctl.c:43 [inline]
>  do_vfs_ioctl+0x1bf/0x1790 fs/ioctl.c:683
>  SYSC_ioctl fs/ioctl.c:698 [inline]
>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:689
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
> RIP: 0033:0x4458d9
> RSP: 002b:7f1dce9d1b58 EFLAGS: 0286 ORIG_RAX: 0010
> RAX: ffda RBX: 0024 RCX: 004458d9
> RDX: 20b68000 RSI: 89e0 RDI: 0024
> RBP: 006e0220 R08:  R09: 
> R10:  R11: 0286 R12: 007080a8
> R13:  R14: 7f1dce9d29c0 R15: 7f1dce9d2700
> Object at 8800538ab940, in cache kcm_psock_cache size: 616
> Allocated:
> PID = 7965
>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:502
>  set_track mm/kasan/kasan.c:514 [inline]
>  kasan_kmalloc+0xaa/0xd0 mm/kasan/kasan.c:605
>  kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:544
>  kmem_cache_alloc+0x102/0x680 mm/slab.c:3571
>  kmem_cache_zalloc include/linux/slab.h:653 [inline]
>  kcm_attach net/kcm/kcmsock.c:1384 [inline]
>  kcm_attach_ioctl net/kcm/kcmsock.c:1455 [inline]
>  kcm_ioctl+0x303/0x1800 net/kcm/kcmsock.c:1690
>  sock_do_ioctl+0x65/0xb0 net/socket.c:895
>  sock_ioctl+0x2c2/0x440 net/socket.c:993
>  vfs_ioctl fs/ioctl.c:43 [inline]
>  do_vfs_ioctl+0x1bf/0x1790 fs/ioctl.c:683
>  SYSC_ioctl fs/ioctl.c:698 [inline]
>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:689
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
> Freed:
> PID = 7982
>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:502
>  set_track mm/kasan/kasan.c:514 [inline]
>  kasan_slab_free+0x6f/0xb0 mm/kasan/kasan.c:578
>  __cache_free mm/slab.c:3513 [inline]
>  kmem_cache_free+0x71/0x240 mm/slab.c:3773
>  kcm_unattach+0xee7/0x1520 net/kcm/kcmsock.c:1558
>  kcm_unattach_ioctl net/kcm/kcmsock.c:1603 [inline]
>  kcm_ioctl+0xfae/0x1800 net/kcm/kcmsock.c:1700
>  sock_do_ioctl+0x65/0xb0 net/socket.c:895
>  sock_ioctl+0x2c2/0x440 net/socket.c:993
>  vfs_ioctl fs/ioctl.c:43 [inline]
>  do_vfs_ioctl+0x1bf/0x1790 fs/ioctl.c:683
>  SYSC_ioctl fs/ioctl.c:698 [inline]
>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:689
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
> Memory state around the buggy address:
>  8800538ab900: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
>  8800538ab980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>8800538aba00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>^
>  8800538aba80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  8800538abb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ==


Re: net/kcm: use-after-free in kcm_wq

2017-03-03 Thread Cong Wang
On Fri, Mar 3, 2017 at 2:11 AM, Dmitry Vyukov  wrote:
> Also like this one:
>
> ==
> BUG: KASAN: use-after-free in atomic_long_read
> include/linux/compiler.h:254 [inline] at addr 8800538aba60
> BUG: KASAN: use-after-free in get_work_pool+0x2f2/0x340
> kernel/workqueue.c:709 at addr 8800538aba60
> Read of size 8 by task syz-executor6/7965
> CPU: 2 PID: 7965 Comm: syz-executor6 Not tainted 4.10.0+ #248
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:15 [inline]
>  dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
>  kasan_object_err+0x1c/0x70 mm/kasan/report.c:166
>  print_address_description mm/kasan/report.c:204 [inline]
>  kasan_report_error mm/kasan/report.c:288 [inline]
>  kasan_report.part.2+0x198/0x440 mm/kasan/report.c:310
>  kasan_report mm/kasan/report.c:331 [inline]
>  __asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:331
>  atomic_long_read include/linux/compiler.h:254 [inline]
>  get_work_pool+0x2f2/0x340 kernel/workqueue.c:709
>  __queue_work+0x2b3/0x1210 kernel/workqueue.c:1401
>  queue_work_on+0x2e9/0x330 kernel/workqueue.c:1486
>  queue_work include/linux/workqueue.h:487 [inline]
>  strp_check_rcv+0x25/0x30 net/strparser/strparser.c:494


It is not kcm_wq, it is strp_wq, and the work struct is strp->rx_work
which lives in struct kcm_psock. The work is cancelled by strp_done(),
it seems get queued again after strp_done()...


>  kcm_attach net/kcm/kcmsock.c:1434 [inline]
>  kcm_attach_ioctl net/kcm/kcmsock.c:1455 [inline]
>  kcm_ioctl+0x8bb/0x1800 net/kcm/kcmsock.c:1690
>  sock_do_ioctl+0x65/0xb0 net/socket.c:895
>  sock_ioctl+0x2c2/0x440 net/socket.c:993
>  vfs_ioctl fs/ioctl.c:43 [inline]
>  do_vfs_ioctl+0x1bf/0x1790 fs/ioctl.c:683
>  SYSC_ioctl fs/ioctl.c:698 [inline]
>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:689
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
> RIP: 0033:0x4458d9
> RSP: 002b:7f1dce9d1b58 EFLAGS: 0286 ORIG_RAX: 0010
> RAX: ffda RBX: 0024 RCX: 004458d9
> RDX: 20b68000 RSI: 89e0 RDI: 0024
> RBP: 006e0220 R08:  R09: 
> R10:  R11: 0286 R12: 007080a8
> R13:  R14: 7f1dce9d29c0 R15: 7f1dce9d2700
> Object at 8800538ab940, in cache kcm_psock_cache size: 616
> Allocated:
> PID = 7965
>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:502
>  set_track mm/kasan/kasan.c:514 [inline]
>  kasan_kmalloc+0xaa/0xd0 mm/kasan/kasan.c:605
>  kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:544
>  kmem_cache_alloc+0x102/0x680 mm/slab.c:3571
>  kmem_cache_zalloc include/linux/slab.h:653 [inline]
>  kcm_attach net/kcm/kcmsock.c:1384 [inline]
>  kcm_attach_ioctl net/kcm/kcmsock.c:1455 [inline]
>  kcm_ioctl+0x303/0x1800 net/kcm/kcmsock.c:1690
>  sock_do_ioctl+0x65/0xb0 net/socket.c:895
>  sock_ioctl+0x2c2/0x440 net/socket.c:993
>  vfs_ioctl fs/ioctl.c:43 [inline]
>  do_vfs_ioctl+0x1bf/0x1790 fs/ioctl.c:683
>  SYSC_ioctl fs/ioctl.c:698 [inline]
>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:689
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
> Freed:
> PID = 7982
>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:502
>  set_track mm/kasan/kasan.c:514 [inline]
>  kasan_slab_free+0x6f/0xb0 mm/kasan/kasan.c:578
>  __cache_free mm/slab.c:3513 [inline]
>  kmem_cache_free+0x71/0x240 mm/slab.c:3773
>  kcm_unattach+0xee7/0x1520 net/kcm/kcmsock.c:1558
>  kcm_unattach_ioctl net/kcm/kcmsock.c:1603 [inline]
>  kcm_ioctl+0xfae/0x1800 net/kcm/kcmsock.c:1700
>  sock_do_ioctl+0x65/0xb0 net/socket.c:895
>  sock_ioctl+0x2c2/0x440 net/socket.c:993
>  vfs_ioctl fs/ioctl.c:43 [inline]
>  do_vfs_ioctl+0x1bf/0x1790 fs/ioctl.c:683
>  SYSC_ioctl fs/ioctl.c:698 [inline]
>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:689
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
> Memory state around the buggy address:
>  8800538ab900: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
>  8800538ab980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>8800538aba00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>^
>  8800538aba80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  8800538abb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ==


Re: net: use-after-free in neigh_timer_handler/sock_wfree

2017-03-01 Thread Cong Wang
On Wed, Mar 1, 2017 at 3:15 PM, Eric Dumazet <eduma...@google.com> wrote:
> On Wed, Mar 1, 2017 at 3:09 PM, Cong Wang <xiyou.wangc...@gmail.com> wrote:
>
>>
>> But I doubt skb_orphan() is the solution here, shouldn't we just
>> update sk->sk_wmem_alloc with skb->truesize changes?
>
> Is it worth it ? Apart from syszkaller I mean...
>
> We started with something that had a real impact on real workloads.
>
> 158f323b9868b59967ad96957c4ca388161be321 net: adjust skb->truesize in
> pskb_expand_head()
>
> Note that auditing the stack took me a while.

I don't know how sk refcnt could work correctly without making
sk_wmem_alloc correctly. We certainly could just call skb_orphan()
is we don't need skb->sk any more, probably like the frag case,
but for this case, the neigh one, the skb's sitting in neigh->arp_queue
are not going to be freed unless in failed case, therefore skb->sk
should not be orphaned so early.


Re: net: use-after-free in neigh_timer_handler/sock_wfree

2017-03-01 Thread Cong Wang
On Wed, Mar 1, 2017 at 3:15 PM, Eric Dumazet  wrote:
> On Wed, Mar 1, 2017 at 3:09 PM, Cong Wang  wrote:
>
>>
>> But I doubt skb_orphan() is the solution here, shouldn't we just
>> update sk->sk_wmem_alloc with skb->truesize changes?
>
> Is it worth it ? Apart from syszkaller I mean...
>
> We started with something that had a real impact on real workloads.
>
> 158f323b9868b59967ad96957c4ca388161be321 net: adjust skb->truesize in
> pskb_expand_head()
>
> Note that auditing the stack took me a while.

I don't know how sk refcnt could work correctly without making
sk_wmem_alloc correctly. We certainly could just call skb_orphan()
is we don't need skb->sk any more, probably like the frag case,
but for this case, the neigh one, the skb's sitting in neigh->arp_queue
are not going to be freed unless in failed case, therefore skb->sk
should not be orphaned so early.


Re: [PATCH v4] net: don't call strlen() on the user buffer in packet_bind_spkt()

2017-03-01 Thread Cong Wang
On Wed, Mar 1, 2017 at 3:57 AM, Alexander Potapenko  wrote:
> This happens because addr.sa_data copied from the userspace is not
> zero-terminated, and copying it with strlcpy() in packet_bind_spkt()
> results in calling strlen() on the kernel copy of that non-terminated
> buffer.

Very similar to

commit b301f2538759933cf9ff1f7c4f968da72e3f0757
Author: Pablo Neira Ayuso 
Date:   Thu Mar 24 21:29:53 2016 +0100

netfilter: x_tables: enforce nul-terminated table name from
getsockopt GET_ENTRIES


Re: [PATCH v4] net: don't call strlen() on the user buffer in packet_bind_spkt()

2017-03-01 Thread Cong Wang
On Wed, Mar 1, 2017 at 3:57 AM, Alexander Potapenko  wrote:
> This happens because addr.sa_data copied from the userspace is not
> zero-terminated, and copying it with strlcpy() in packet_bind_spkt()
> results in calling strlen() on the kernel copy of that non-terminated
> buffer.

Very similar to

commit b301f2538759933cf9ff1f7c4f968da72e3f0757
Author: Pablo Neira Ayuso 
Date:   Thu Mar 24 21:29:53 2016 +0100

netfilter: x_tables: enforce nul-terminated table name from
getsockopt GET_ENTRIES


Re: net: use-after-free in neigh_timer_handler/sock_wfree

2017-03-01 Thread Cong Wang
On Wed, Mar 1, 2017 at 1:54 PM, Eric Dumazet <eduma...@google.com> wrote:
> On Wed, Mar 1, 2017 at 1:43 PM, Cong Wang <xiyou.wangc...@gmail.com> wrote:
>>>
>>> This one looks very similar to a previous one:
>>> https://groups.google.com/forum/#!topic/syzkaller/BhyN5OFd7sQ
>>>
>>> Both happen on raw v6 sockets.
>>>
>>> For me, it seems the sk refcnt is not correct, skb should still hold
>>> a refcnt so it should not be freed before kfree_skb() in a timer
>>> handler...
>>
>> More precisely, after this commit:
>>
>> commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
>> Author: Eric Dumazet <eric.duma...@gmail.com>
>> Date:   Thu Jun 11 02:55:43 2009 -0700
>>
>> net: No more expensive sock_hold()/sock_put() on each tx
>>
>> we don't take (old) refcnt any more on TX path, sk_wmem_alloc
>> is the new refcnt. ;)
>
> So the bug is that skb->truesize is mangled by reassembly unit,
> while sbk->sk is tracking sk_wmem_alloc changes in order
> to decide when it is safe to free sk.

That is my suspicion as well, skb->truesize is updated somewhere
but sk->sk_wmem_alloc isn't, so leads to this bug.

>
> This is why we need to call skb_orphan(), as we did for IPv4 in
> 8282f27449bf15548


But I doubt skb_orphan() is the solution here, shouldn't we just
update sk->sk_wmem_alloc with skb->truesize changes?


Re: net: use-after-free in neigh_timer_handler/sock_wfree

2017-03-01 Thread Cong Wang
On Wed, Mar 1, 2017 at 1:54 PM, Eric Dumazet  wrote:
> On Wed, Mar 1, 2017 at 1:43 PM, Cong Wang  wrote:
>>>
>>> This one looks very similar to a previous one:
>>> https://groups.google.com/forum/#!topic/syzkaller/BhyN5OFd7sQ
>>>
>>> Both happen on raw v6 sockets.
>>>
>>> For me, it seems the sk refcnt is not correct, skb should still hold
>>> a refcnt so it should not be freed before kfree_skb() in a timer
>>> handler...
>>
>> More precisely, after this commit:
>>
>> commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
>> Author: Eric Dumazet 
>> Date:   Thu Jun 11 02:55:43 2009 -0700
>>
>> net: No more expensive sock_hold()/sock_put() on each tx
>>
>> we don't take (old) refcnt any more on TX path, sk_wmem_alloc
>> is the new refcnt. ;)
>
> So the bug is that skb->truesize is mangled by reassembly unit,
> while sbk->sk is tracking sk_wmem_alloc changes in order
> to decide when it is safe to free sk.

That is my suspicion as well, skb->truesize is updated somewhere
but sk->sk_wmem_alloc isn't, so leads to this bug.

>
> This is why we need to call skb_orphan(), as we did for IPv4 in
> 8282f27449bf15548


But I doubt skb_orphan() is the solution here, shouldn't we just
update sk->sk_wmem_alloc with skb->truesize changes?


Re: net: use-after-free in neigh_timer_handler/sock_wfree

2017-03-01 Thread Cong Wang
On Wed, Mar 1, 2017 at 1:24 PM, Cong Wang <xiyou.wangc...@gmail.com> wrote:
> On Wed, Mar 1, 2017 at 11:27 AM, Dmitry Vyukov <dvyu...@google.com> wrote:
>> Hello,
>>
>> I am seeing the following use-after-free report while running
>> syzkaller fuzzer on
>> linux-next/3e7350242c6f3d41d28e03418bd781cc1b7bad5f:
>>
>> ==
>> BUG: KASAN: use-after-free in constant_test_bit
>> arch/x86/include/asm/bitops.h:324 [inline] at addr 8801c56d5460
>> BUG: KASAN: use-after-free in sock_flag include/net/sock.h:789
>> [inline] at addr 8801c56d5460
>> BUG: KASAN: use-after-free in sock_wfree+0x118/0x120
>> net/core/sock.c:1630 at addr 8801c56d5460
>> Read of size 8 by task syz-fuzzer/3261
>> CPU: 0 PID: 3261 Comm: syz-fuzzer Not tainted 4.10.0-next-20170224+ #1
>> Hardware name: Google Google Compute Engine/Google Compute Engine,
>> BIOS Google 01/01/2011
>> Call Trace:
>>  
>>  __asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:332
>>  constant_test_bit arch/x86/include/asm/bitops.h:324 [inline]
>>  sock_flag include/net/sock.h:789 [inline]
>>  sock_wfree+0x118/0x120 net/core/sock.c:1630
>>  skb_release_head_state+0xfc/0x200 net/core/skbuff.c:654
>>  skb_release_all+0x15/0x60 net/core/skbuff.c:667
>>  __kfree_skb+0x15/0x20 net/core/skbuff.c:683
>>  kfree_skb+0x16e/0x4c0 net/core/skbuff.c:704
>>  ndisc_error_report+0xbb/0x190 net/ipv6/ndisc.c:683
>>  neigh_invalidate+0x23e/0x570 net/core/neighbour.c:848
>>  neigh_timer_handler+0x4e7/0x1140 net/core/neighbour.c:933
>>  call_timer_fn+0x241/0x820 kernel/time/timer.c:1266
>>  expire_timers kernel/time/timer.c:1305 [inline]
>>  __run_timers+0x960/0xcf0 kernel/time/timer.c:1599
>>  run_timer_softirq+0x21/0x80 kernel/time/timer.c:1612
>>  __do_softirq+0x31f/0xbe7 kernel/softirq.c:284
>>  invoke_softirq kernel/softirq.c:364 [inline]
>>  irq_exit+0x1cc/0x200 kernel/softirq.c:405
>>  exiting_irq arch/x86/include/asm/apic.h:658 [inline]
>>  smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:962
>>  apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707
>
> This one looks very similar to a previous one:
> https://groups.google.com/forum/#!topic/syzkaller/BhyN5OFd7sQ
>
> Both happen on raw v6 sockets.
>
> For me, it seems the sk refcnt is not correct, skb should still hold
> a refcnt so it should not be freed before kfree_skb() in a timer
> handler...

More precisely, after this commit:

commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
Author: Eric Dumazet <eric.duma...@gmail.com>
Date:   Thu Jun 11 02:55:43 2009 -0700

net: No more expensive sock_hold()/sock_put() on each tx

we don't take (old) refcnt any more on TX path, sk_wmem_alloc
is the new refcnt. ;)


Re: net: use-after-free in neigh_timer_handler/sock_wfree

2017-03-01 Thread Cong Wang
On Wed, Mar 1, 2017 at 1:24 PM, Cong Wang  wrote:
> On Wed, Mar 1, 2017 at 11:27 AM, Dmitry Vyukov  wrote:
>> Hello,
>>
>> I am seeing the following use-after-free report while running
>> syzkaller fuzzer on
>> linux-next/3e7350242c6f3d41d28e03418bd781cc1b7bad5f:
>>
>> ==
>> BUG: KASAN: use-after-free in constant_test_bit
>> arch/x86/include/asm/bitops.h:324 [inline] at addr 8801c56d5460
>> BUG: KASAN: use-after-free in sock_flag include/net/sock.h:789
>> [inline] at addr 8801c56d5460
>> BUG: KASAN: use-after-free in sock_wfree+0x118/0x120
>> net/core/sock.c:1630 at addr 8801c56d5460
>> Read of size 8 by task syz-fuzzer/3261
>> CPU: 0 PID: 3261 Comm: syz-fuzzer Not tainted 4.10.0-next-20170224+ #1
>> Hardware name: Google Google Compute Engine/Google Compute Engine,
>> BIOS Google 01/01/2011
>> Call Trace:
>>  
>>  __asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:332
>>  constant_test_bit arch/x86/include/asm/bitops.h:324 [inline]
>>  sock_flag include/net/sock.h:789 [inline]
>>  sock_wfree+0x118/0x120 net/core/sock.c:1630
>>  skb_release_head_state+0xfc/0x200 net/core/skbuff.c:654
>>  skb_release_all+0x15/0x60 net/core/skbuff.c:667
>>  __kfree_skb+0x15/0x20 net/core/skbuff.c:683
>>  kfree_skb+0x16e/0x4c0 net/core/skbuff.c:704
>>  ndisc_error_report+0xbb/0x190 net/ipv6/ndisc.c:683
>>  neigh_invalidate+0x23e/0x570 net/core/neighbour.c:848
>>  neigh_timer_handler+0x4e7/0x1140 net/core/neighbour.c:933
>>  call_timer_fn+0x241/0x820 kernel/time/timer.c:1266
>>  expire_timers kernel/time/timer.c:1305 [inline]
>>  __run_timers+0x960/0xcf0 kernel/time/timer.c:1599
>>  run_timer_softirq+0x21/0x80 kernel/time/timer.c:1612
>>  __do_softirq+0x31f/0xbe7 kernel/softirq.c:284
>>  invoke_softirq kernel/softirq.c:364 [inline]
>>  irq_exit+0x1cc/0x200 kernel/softirq.c:405
>>  exiting_irq arch/x86/include/asm/apic.h:658 [inline]
>>  smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:962
>>  apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707
>
> This one looks very similar to a previous one:
> https://groups.google.com/forum/#!topic/syzkaller/BhyN5OFd7sQ
>
> Both happen on raw v6 sockets.
>
> For me, it seems the sk refcnt is not correct, skb should still hold
> a refcnt so it should not be freed before kfree_skb() in a timer
> handler...

More precisely, after this commit:

commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
Author: Eric Dumazet 
Date:   Thu Jun 11 02:55:43 2009 -0700

net: No more expensive sock_hold()/sock_put() on each tx

we don't take (old) refcnt any more on TX path, sk_wmem_alloc
is the new refcnt. ;)


Re: net: use-after-free in neigh_timer_handler/sock_wfree

2017-03-01 Thread Cong Wang
On Wed, Mar 1, 2017 at 11:27 AM, Dmitry Vyukov  wrote:
> Hello,
>
> I am seeing the following use-after-free report while running
> syzkaller fuzzer on
> linux-next/3e7350242c6f3d41d28e03418bd781cc1b7bad5f:
>
> ==
> BUG: KASAN: use-after-free in constant_test_bit
> arch/x86/include/asm/bitops.h:324 [inline] at addr 8801c56d5460
> BUG: KASAN: use-after-free in sock_flag include/net/sock.h:789
> [inline] at addr 8801c56d5460
> BUG: KASAN: use-after-free in sock_wfree+0x118/0x120
> net/core/sock.c:1630 at addr 8801c56d5460
> Read of size 8 by task syz-fuzzer/3261
> CPU: 0 PID: 3261 Comm: syz-fuzzer Not tainted 4.10.0-next-20170224+ #1
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 01/01/2011
> Call Trace:
>  
>  __asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:332
>  constant_test_bit arch/x86/include/asm/bitops.h:324 [inline]
>  sock_flag include/net/sock.h:789 [inline]
>  sock_wfree+0x118/0x120 net/core/sock.c:1630
>  skb_release_head_state+0xfc/0x200 net/core/skbuff.c:654
>  skb_release_all+0x15/0x60 net/core/skbuff.c:667
>  __kfree_skb+0x15/0x20 net/core/skbuff.c:683
>  kfree_skb+0x16e/0x4c0 net/core/skbuff.c:704
>  ndisc_error_report+0xbb/0x190 net/ipv6/ndisc.c:683
>  neigh_invalidate+0x23e/0x570 net/core/neighbour.c:848
>  neigh_timer_handler+0x4e7/0x1140 net/core/neighbour.c:933
>  call_timer_fn+0x241/0x820 kernel/time/timer.c:1266
>  expire_timers kernel/time/timer.c:1305 [inline]
>  __run_timers+0x960/0xcf0 kernel/time/timer.c:1599
>  run_timer_softirq+0x21/0x80 kernel/time/timer.c:1612
>  __do_softirq+0x31f/0xbe7 kernel/softirq.c:284
>  invoke_softirq kernel/softirq.c:364 [inline]
>  irq_exit+0x1cc/0x200 kernel/softirq.c:405
>  exiting_irq arch/x86/include/asm/apic.h:658 [inline]
>  smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:962
>  apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707

This one looks very similar to a previous one:
https://groups.google.com/forum/#!topic/syzkaller/BhyN5OFd7sQ

Both happen on raw v6 sockets.

For me, it seems the sk refcnt is not correct, skb should still hold
a refcnt so it should not be freed before kfree_skb() in a timer
handler...



> RIP: 0033:0x46a7c3
> RSP: 002b:00c83e2d5180 EFLAGS: 0202 ORIG_RAX: ff10
> RAX:  RBX: 0046a7b0 RCX: 00c820471200
> RDX: 0020 RSI: 00c839e1bba0 RDI: 00c83e2d5190
> RBP: 0002 R08: 0002 R09: 0073
> R10: 00c839a31b03 R11: 00c839e1bbf8 R12: 
> R13:  R14: 0010 R15: 01263e90
>  
> Object at 8801c56d5400, in cache RAWv6 size: 1480
> Allocated:
> PID = 12540
>  kmem_cache_alloc+0x102/0x680 mm/slab.c:3568
>  sk_prot_alloc+0x65/0x2a0 net/core/sock.c:1332
>  sk_alloc+0x8c/0x470 net/core/sock.c:1394
>  inet6_create+0x44d/0x1140 net/ipv6/af_inet6.c:183
>  __sock_create+0x4e4/0x870 net/socket.c:1197
>  sock_create net/socket.c:1237 [inline]
>  SYSC_socket net/socket.c:1267 [inline]
>  SyS_socket+0xf9/0x230 net/socket.c:1247
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
> Freed:
> PID = 12572
>  kasan_slab_free+0x6f/0xb0 mm/kasan/kasan.c:580
>  __cache_free mm/slab.c:3510 [inline]
>  kmem_cache_free+0x71/0x240 mm/slab.c:3770
>  sk_prot_free net/core/sock.c:1375 [inline]
>  __sk_destruct+0x487/0x6b0 net/core/sock.c:1450
>  sk_destruct+0x47/0x80 net/core/sock.c:1458
>  __sk_free+0x57/0x230 net/core/sock.c:1466
>  sk_free+0x23/0x30 net/core/sock.c:1477
>  sock_put include/net/sock.h:1644 [inline]
>  sk_common_release+0x3bf/0x5e0 net/core/sock.c:2781
>  rawv6_close+0x4c/0x80 net/ipv6/raw.c:1218
>  inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
>  inet6_release+0x50/0x70 net/ipv6/af_inet6.c:432
>  sock_release+0x8d/0x1e0 net/socket.c:597
>  sock_close+0x16/0x20 net/socket.c:1061
>  __fput+0x332/0x7f0 fs/file_table.c:208
>  fput+0x15/0x20 fs/file_table.c:244
>  task_work_run+0x18a/0x260 kernel/task_work.c:116
>  exit_task_work include/linux/task_work.h:21 [inline]
>  do_exit+0x1956/0x2900 kernel/exit.c:873
>  do_group_exit+0x149/0x420 kernel/exit.c:977
>  get_signal+0x7e0/0x1820 kernel/signal.c:2313
>  do_signal+0xd2/0x2190 arch/x86/kernel/signal.c:807
>  exit_to_usermode_loop+0x200/0x2a0 arch/x86/entry/common.c:156
>  prepare_exit_to_usermode arch/x86/entry/common.c:190 [inline]
>  syscall_return_slowpath+0x4d3/0x570 arch/x86/entry/common.c:259
>  entry_SYSCALL_64_fastpath+0xc0/0xc2


Re: net: use-after-free in neigh_timer_handler/sock_wfree

2017-03-01 Thread Cong Wang
On Wed, Mar 1, 2017 at 11:27 AM, Dmitry Vyukov  wrote:
> Hello,
>
> I am seeing the following use-after-free report while running
> syzkaller fuzzer on
> linux-next/3e7350242c6f3d41d28e03418bd781cc1b7bad5f:
>
> ==
> BUG: KASAN: use-after-free in constant_test_bit
> arch/x86/include/asm/bitops.h:324 [inline] at addr 8801c56d5460
> BUG: KASAN: use-after-free in sock_flag include/net/sock.h:789
> [inline] at addr 8801c56d5460
> BUG: KASAN: use-after-free in sock_wfree+0x118/0x120
> net/core/sock.c:1630 at addr 8801c56d5460
> Read of size 8 by task syz-fuzzer/3261
> CPU: 0 PID: 3261 Comm: syz-fuzzer Not tainted 4.10.0-next-20170224+ #1
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 01/01/2011
> Call Trace:
>  
>  __asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:332
>  constant_test_bit arch/x86/include/asm/bitops.h:324 [inline]
>  sock_flag include/net/sock.h:789 [inline]
>  sock_wfree+0x118/0x120 net/core/sock.c:1630
>  skb_release_head_state+0xfc/0x200 net/core/skbuff.c:654
>  skb_release_all+0x15/0x60 net/core/skbuff.c:667
>  __kfree_skb+0x15/0x20 net/core/skbuff.c:683
>  kfree_skb+0x16e/0x4c0 net/core/skbuff.c:704
>  ndisc_error_report+0xbb/0x190 net/ipv6/ndisc.c:683
>  neigh_invalidate+0x23e/0x570 net/core/neighbour.c:848
>  neigh_timer_handler+0x4e7/0x1140 net/core/neighbour.c:933
>  call_timer_fn+0x241/0x820 kernel/time/timer.c:1266
>  expire_timers kernel/time/timer.c:1305 [inline]
>  __run_timers+0x960/0xcf0 kernel/time/timer.c:1599
>  run_timer_softirq+0x21/0x80 kernel/time/timer.c:1612
>  __do_softirq+0x31f/0xbe7 kernel/softirq.c:284
>  invoke_softirq kernel/softirq.c:364 [inline]
>  irq_exit+0x1cc/0x200 kernel/softirq.c:405
>  exiting_irq arch/x86/include/asm/apic.h:658 [inline]
>  smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:962
>  apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707

This one looks very similar to a previous one:
https://groups.google.com/forum/#!topic/syzkaller/BhyN5OFd7sQ

Both happen on raw v6 sockets.

For me, it seems the sk refcnt is not correct, skb should still hold
a refcnt so it should not be freed before kfree_skb() in a timer
handler...



> RIP: 0033:0x46a7c3
> RSP: 002b:00c83e2d5180 EFLAGS: 0202 ORIG_RAX: ff10
> RAX:  RBX: 0046a7b0 RCX: 00c820471200
> RDX: 0020 RSI: 00c839e1bba0 RDI: 00c83e2d5190
> RBP: 0002 R08: 0002 R09: 0073
> R10: 00c839a31b03 R11: 00c839e1bbf8 R12: 
> R13:  R14: 0010 R15: 01263e90
>  
> Object at 8801c56d5400, in cache RAWv6 size: 1480
> Allocated:
> PID = 12540
>  kmem_cache_alloc+0x102/0x680 mm/slab.c:3568
>  sk_prot_alloc+0x65/0x2a0 net/core/sock.c:1332
>  sk_alloc+0x8c/0x470 net/core/sock.c:1394
>  inet6_create+0x44d/0x1140 net/ipv6/af_inet6.c:183
>  __sock_create+0x4e4/0x870 net/socket.c:1197
>  sock_create net/socket.c:1237 [inline]
>  SYSC_socket net/socket.c:1267 [inline]
>  SyS_socket+0xf9/0x230 net/socket.c:1247
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
> Freed:
> PID = 12572
>  kasan_slab_free+0x6f/0xb0 mm/kasan/kasan.c:580
>  __cache_free mm/slab.c:3510 [inline]
>  kmem_cache_free+0x71/0x240 mm/slab.c:3770
>  sk_prot_free net/core/sock.c:1375 [inline]
>  __sk_destruct+0x487/0x6b0 net/core/sock.c:1450
>  sk_destruct+0x47/0x80 net/core/sock.c:1458
>  __sk_free+0x57/0x230 net/core/sock.c:1466
>  sk_free+0x23/0x30 net/core/sock.c:1477
>  sock_put include/net/sock.h:1644 [inline]
>  sk_common_release+0x3bf/0x5e0 net/core/sock.c:2781
>  rawv6_close+0x4c/0x80 net/ipv6/raw.c:1218
>  inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
>  inet6_release+0x50/0x70 net/ipv6/af_inet6.c:432
>  sock_release+0x8d/0x1e0 net/socket.c:597
>  sock_close+0x16/0x20 net/socket.c:1061
>  __fput+0x332/0x7f0 fs/file_table.c:208
>  fput+0x15/0x20 fs/file_table.c:244
>  task_work_run+0x18a/0x260 kernel/task_work.c:116
>  exit_task_work include/linux/task_work.h:21 [inline]
>  do_exit+0x1956/0x2900 kernel/exit.c:873
>  do_group_exit+0x149/0x420 kernel/exit.c:977
>  get_signal+0x7e0/0x1820 kernel/signal.c:2313
>  do_signal+0xd2/0x2190 arch/x86/kernel/signal.c:807
>  exit_to_usermode_loop+0x200/0x2a0 arch/x86/entry/common.c:156
>  prepare_exit_to_usermode arch/x86/entry/common.c:190 [inline]
>  syscall_return_slowpath+0x4d3/0x570 arch/x86/entry/common.c:259
>  entry_SYSCALL_64_fastpath+0xc0/0xc2


Re: net/ipv4: deadlock in ip_ra_control

2017-03-01 Thread Cong Wang
On Wed, Mar 1, 2017 at 2:44 AM, Dmitry Vyukov  wrote:
> Hello,
>
> I've got the following deadlock report while running syzkaller fuzzer
> on linux-next/51788aebe7cae79cb334ad50641347465fc188fd:
>
> ==
> [ INFO: possible circular locking dependency detected ]
> 4.10.0-next-20170301+ #1 Not tainted
> ---
> syz-executor1/3394 is trying to acquire lock:
>  (sk_lock-AF_INET){+.+.+.}, at: [] lock_sock
> include/net/sock.h:1460 [inline]
>  (sk_lock-AF_INET){+.+.+.}, at: []
> do_ip_setsockopt.isra.12+0x21c/0x3540 net/ipv4/ip_sockglue.c:652
>
> but task is already holding lock:
>  (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x17/0x20
> net/core/rtnetlink.c:70
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (rtnl_mutex){+.+.+.}:
>validate_chain kernel/locking/lockdep.c:2265 [inline]
>__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
>lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
>__mutex_lock_common kernel/locking/mutex.c:754 [inline]
>__mutex_lock+0x172/0x1730 kernel/locking/mutex.c:891
>mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:906
>rtnl_lock+0x17/0x20 net/core/rtnetlink.c:70
>mrtsock_destruct+0x86/0x2c0 net/ipv4/ipmr.c:1281
>ip_ra_control+0x459/0x600 net/ipv4/ip_sockglue.c:372
>do_ip_setsockopt.isra.12+0x1064/0x3540 net/ipv4/ip_sockglue.c:1161
>ip_setsockopt+0x3a/0xb0 net/ipv4/ip_sockglue.c:1264
>raw_setsockopt+0xb7/0xd0 net/ipv4/raw.c:839
>sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2725
>SYSC_setsockopt net/socket.c:1786 [inline]
>SyS_setsockopt+0x25c/0x390 net/socket.c:1765
>entry_SYSCALL_64_fastpath+0x1f/0xc2
>
> -> #0 (sk_lock-AF_INET){+.+.+.}:
>check_prev_add kernel/locking/lockdep.c:1828 [inline]
>check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1938
>validate_chain kernel/locking/lockdep.c:2265 [inline]
>__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
>lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
>lock_sock_nested+0xcb/0x120 net/core/sock.c:2530
>lock_sock include/net/sock.h:1460 [inline]
>do_ip_setsockopt.isra.12+0x21c/0x3540 net/ipv4/ip_sockglue.c:652
>ip_setsockopt+0x3a/0xb0 net/ipv4/ip_sockglue.c:1264
>tcp_setsockopt+0x82/0xd0 net/ipv4/tcp.c:2721
>sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2725
>SYSC_setsockopt net/socket.c:1786 [inline]
>SyS_setsockopt+0x25c/0x390 net/socket.c:1765
>entry_SYSCALL_64_fastpath+0x1f/0xc2
>

Please try the attached patch (compile only).

Thanks.
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index ebd953b..bda318a 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -591,6 +591,7 @@ static bool setsockopt_needs_rtnl(int optname)
case MCAST_LEAVE_GROUP:
case MCAST_LEAVE_SOURCE_GROUP:
case MCAST_UNBLOCK_SOURCE:
+   case IP_ROUTER_ALERT:
return true;
}
return false;
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index beacd02..932321b 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1278,7 +1278,7 @@ static void mrtsock_destruct(struct sock *sk)
struct net *net = sock_net(sk);
struct mr_table *mrt;
 
-   rtnl_lock();
+   ASSERT_RTNL();
ipmr_for_each_table(mrt, net) {
if (sk == rtnl_dereference(mrt->mroute_sk)) {
IPV4_DEVCONF_ALL(net, MC_FORWARDING)--;
@@ -1289,7 +1289,6 @@ static void mrtsock_destruct(struct sock *sk)
mroute_clean_tables(mrt, false);
}
}
-   rtnl_unlock();
 }
 
 /* Socket options and virtual interface manipulation. The whole
@@ -1353,13 +1352,8 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, 
char __user *optval,
if (sk != rcu_access_pointer(mrt->mroute_sk)) {
ret = -EACCES;
} else {
-   /* We need to unlock here because mrtsock_destruct takes
-* care of rtnl itself and we can't change that due to
-* the IP_ROUTER_ALERT setsockopt which runs without it.
-*/
-   rtnl_unlock();
ret = ip_ra_control(sk, 0, NULL);
-   goto out;
+   goto out_unlock;
}
break;
case MRT_ADD_VIF:
@@ -1470,7 +1464,6 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, 
char __user *optval,
}
 out_unlock:
rtnl_unlock();
-out:
return ret;
 }
 


Re: net/ipv4: deadlock in ip_ra_control

2017-03-01 Thread Cong Wang
On Wed, Mar 1, 2017 at 2:44 AM, Dmitry Vyukov  wrote:
> Hello,
>
> I've got the following deadlock report while running syzkaller fuzzer
> on linux-next/51788aebe7cae79cb334ad50641347465fc188fd:
>
> ==
> [ INFO: possible circular locking dependency detected ]
> 4.10.0-next-20170301+ #1 Not tainted
> ---
> syz-executor1/3394 is trying to acquire lock:
>  (sk_lock-AF_INET){+.+.+.}, at: [] lock_sock
> include/net/sock.h:1460 [inline]
>  (sk_lock-AF_INET){+.+.+.}, at: []
> do_ip_setsockopt.isra.12+0x21c/0x3540 net/ipv4/ip_sockglue.c:652
>
> but task is already holding lock:
>  (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x17/0x20
> net/core/rtnetlink.c:70
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (rtnl_mutex){+.+.+.}:
>validate_chain kernel/locking/lockdep.c:2265 [inline]
>__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
>lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
>__mutex_lock_common kernel/locking/mutex.c:754 [inline]
>__mutex_lock+0x172/0x1730 kernel/locking/mutex.c:891
>mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:906
>rtnl_lock+0x17/0x20 net/core/rtnetlink.c:70
>mrtsock_destruct+0x86/0x2c0 net/ipv4/ipmr.c:1281
>ip_ra_control+0x459/0x600 net/ipv4/ip_sockglue.c:372
>do_ip_setsockopt.isra.12+0x1064/0x3540 net/ipv4/ip_sockglue.c:1161
>ip_setsockopt+0x3a/0xb0 net/ipv4/ip_sockglue.c:1264
>raw_setsockopt+0xb7/0xd0 net/ipv4/raw.c:839
>sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2725
>SYSC_setsockopt net/socket.c:1786 [inline]
>SyS_setsockopt+0x25c/0x390 net/socket.c:1765
>entry_SYSCALL_64_fastpath+0x1f/0xc2
>
> -> #0 (sk_lock-AF_INET){+.+.+.}:
>check_prev_add kernel/locking/lockdep.c:1828 [inline]
>check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1938
>validate_chain kernel/locking/lockdep.c:2265 [inline]
>__lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
>lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
>lock_sock_nested+0xcb/0x120 net/core/sock.c:2530
>lock_sock include/net/sock.h:1460 [inline]
>do_ip_setsockopt.isra.12+0x21c/0x3540 net/ipv4/ip_sockglue.c:652
>ip_setsockopt+0x3a/0xb0 net/ipv4/ip_sockglue.c:1264
>tcp_setsockopt+0x82/0xd0 net/ipv4/tcp.c:2721
>sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2725
>SYSC_setsockopt net/socket.c:1786 [inline]
>SyS_setsockopt+0x25c/0x390 net/socket.c:1765
>entry_SYSCALL_64_fastpath+0x1f/0xc2
>

Please try the attached patch (compile only).

Thanks.
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index ebd953b..bda318a 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -591,6 +591,7 @@ static bool setsockopt_needs_rtnl(int optname)
case MCAST_LEAVE_GROUP:
case MCAST_LEAVE_SOURCE_GROUP:
case MCAST_UNBLOCK_SOURCE:
+   case IP_ROUTER_ALERT:
return true;
}
return false;
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index beacd02..932321b 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1278,7 +1278,7 @@ static void mrtsock_destruct(struct sock *sk)
struct net *net = sock_net(sk);
struct mr_table *mrt;
 
-   rtnl_lock();
+   ASSERT_RTNL();
ipmr_for_each_table(mrt, net) {
if (sk == rtnl_dereference(mrt->mroute_sk)) {
IPV4_DEVCONF_ALL(net, MC_FORWARDING)--;
@@ -1289,7 +1289,6 @@ static void mrtsock_destruct(struct sock *sk)
mroute_clean_tables(mrt, false);
}
}
-   rtnl_unlock();
 }
 
 /* Socket options and virtual interface manipulation. The whole
@@ -1353,13 +1352,8 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, 
char __user *optval,
if (sk != rcu_access_pointer(mrt->mroute_sk)) {
ret = -EACCES;
} else {
-   /* We need to unlock here because mrtsock_destruct takes
-* care of rtnl itself and we can't change that due to
-* the IP_ROUTER_ALERT setsockopt which runs without it.
-*/
-   rtnl_unlock();
ret = ip_ra_control(sk, 0, NULL);
-   goto out;
+   goto out_unlock;
}
break;
case MRT_ADD_VIF:
@@ -1470,7 +1464,6 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, 
char __user *optval,
}
 out_unlock:
rtnl_unlock();
-out:
return ret;
 }
 


Re: net: GPF in rt6_nexthop_info

2017-02-28 Thread Cong Wang
On Tue, Feb 28, 2017 at 5:10 AM, Eric Dumazet  wrote:
> On Tue, 2017-02-28 at 12:34 +0100, Dmitry Vyukov wrote:
>> Hello,
>>
>> The following program triggers GPF in rt6_nexthop_info
>>
>> // autogenerated by syzkaller (http://github.com/google/syzkaller)
>> #include 
>> #include 
>> #include 
>> #include 
>> #include 
>>
>> int main()
>> {
>>   unshare(CLONE_NEWNET);
>>   int fd = socket(PF_NETLINK, SOCK_DGRAM|SOCK_NONBLOCK, 0);
>>   const char* data ="\x22\x00\x00\x00\x1a\x00\x07\x00\x08\x09\x00\x0f"
>> "\x09\x00\x07\x00\x0a\xff\xfc\x03\x1b\x00\xa8\xc6"
>> "\x19\xf3\xff\xff\x05\x00\x10\x00\xbe\x45";
>>   write(fd, data, 34);
>>   return 0;
>> }
>>
>> general protection fault:  [#1] SMP KASAN
>> Modules linked in:
>> CPU: 2 PID: 2965 Comm: a.out Not tainted 4.10.0+ #229
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> task: 8800650fc100 task.stack: 880065378000
>> RIP: 0010:rt6_nexthop_info+0x278/0x3b0 net/ipv6/route.c:3325
>> RSP: 0018:88006537f350 EFLAGS: 00010203
>> RAX: dc00 RBX: 88006865da00 RCX: 
>> RDX:  RSI: 004a RDI: 0254
>> RBP: 88006537f3f0 R08: ed000cc55820 R09: ed000cc55820
>> R10: 0001 R11: ed000cc5581f R12: 11000ca6fe6d
>> R13: 8800662ac0d8 R14: 88006537f3c8 R15: dc00
>> FS:  00c39880() GS:88006d10() knlGS:
>> CS:  0010 DS:  ES:  CR0: 80050033
>> CR2: 004b20e0 CR3: 6cdb7000 CR4: 001406e0
>> Call Trace:
>>  rt6_fill_node.isra.61+0xea4/0x1780 net/ipv6/route.c:3513
>>  inet6_rtm_getroute+0x7da/0xce0 net/ipv6/route.c:3639
>>  rtnetlink_rcv_msg+0x609/0x860 net/core/rtnetlink.c:4104
>>  netlink_rcv_skb+0x2ab/0x390 net/netlink/af_netlink.c:2298
>>  rtnetlink_rcv+0x2a/0x40 net/core/rtnetlink.c:4110
>>  netlink_unicast_kernel net/netlink/af_netlink.c:1231 [inline]
>>  netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1257
>>  netlink_sendmsg+0xa9f/0xe50 net/netlink/af_netlink.c:1803
>>  sock_sendmsg_nosec net/socket.c:633 [inline]
>>  sock_sendmsg+0xca/0x110 net/socket.c:643
>>  sock_write_iter+0x326/0x600 net/socket.c:846
>>  new_sync_write fs/read_write.c:499 [inline]
>>  __vfs_write+0x483/0x740 fs/read_write.c:512
>>  vfs_write+0x187/0x530 fs/read_write.c:560
>>  SYSC_write fs/read_write.c:607 [inline]
>>  SyS_write+0xfb/0x230 fs/read_write.c:599
>>  entry_SYSCALL_64_fastpath+0x1f/0xc2
>> RIP: 0033:0x433ef0
>> RSP: 002b:7ffcea6171b8 EFLAGS: 0246 ORIG_RAX: 0001
>> RAX: ffda RBX: 00401730 RCX: 00433ef0
>> RDX: 0022 RSI: 00493548 RDI: 0003
>> RBP:  R08: 000b R09: 0004
>> R10: 000d R11: 0246 R12: 004002b0
>> R13: 7ffcea6172c8 R14: 0002 R15: 
>> Code: ff df 80 3c 01 00 0f 85 3c 01 00 00 48 8b 8b 58 01 00 00 48 b8
>> 00 00 00 00 00 fc ff df 48 8d b9 54 02 00 00 48 89 fe 48 c1 ee 03 <0f>
>> b6 34 06 48 89 f8 83 e0 07 83 c0 03 40 38 f0 7c 05 40 84 f6
>> RIP: rt6_nexthop_info+0x278/0x3b0 net/ipv6/route.c:3325 RSP: 88006537f350
>> ---[ end trace 6a59074c79adfb5f ]---
>>
>> On commit e5d56efc97f8240d0b5d66c03949382b6d7e5570
>
> Not sure how you chose all these addresses on CC, while you forgot David
> Ahern. I doubt Yuchung is interested in fixing IPv6 bugs.
>
> David, rt->rt6i_idev can be NULL.

Probably we need to skip null entry again here:

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f54f426..25590d1 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3627,6 +3627,12 @@ static int inet6_rtm_getroute(struct sk_buff
*in_skb, struct nlmsghdr *nlh)
rt = (struct rt6_info *)ip6_route_output(net, NULL, );
}

+   if (rt == net->ipv6.ip6_null_entry) {
+   ip6_rt_put(rt);
+   err = -ENOENT;
+   goto errout;
+   }
+
skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
if (!skb) {
ip6_rt_put(rt);


Re: net: GPF in rt6_nexthop_info

2017-02-28 Thread Cong Wang
On Tue, Feb 28, 2017 at 5:10 AM, Eric Dumazet  wrote:
> On Tue, 2017-02-28 at 12:34 +0100, Dmitry Vyukov wrote:
>> Hello,
>>
>> The following program triggers GPF in rt6_nexthop_info
>>
>> // autogenerated by syzkaller (http://github.com/google/syzkaller)
>> #include 
>> #include 
>> #include 
>> #include 
>> #include 
>>
>> int main()
>> {
>>   unshare(CLONE_NEWNET);
>>   int fd = socket(PF_NETLINK, SOCK_DGRAM|SOCK_NONBLOCK, 0);
>>   const char* data ="\x22\x00\x00\x00\x1a\x00\x07\x00\x08\x09\x00\x0f"
>> "\x09\x00\x07\x00\x0a\xff\xfc\x03\x1b\x00\xa8\xc6"
>> "\x19\xf3\xff\xff\x05\x00\x10\x00\xbe\x45";
>>   write(fd, data, 34);
>>   return 0;
>> }
>>
>> general protection fault:  [#1] SMP KASAN
>> Modules linked in:
>> CPU: 2 PID: 2965 Comm: a.out Not tainted 4.10.0+ #229
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> task: 8800650fc100 task.stack: 880065378000
>> RIP: 0010:rt6_nexthop_info+0x278/0x3b0 net/ipv6/route.c:3325
>> RSP: 0018:88006537f350 EFLAGS: 00010203
>> RAX: dc00 RBX: 88006865da00 RCX: 
>> RDX:  RSI: 004a RDI: 0254
>> RBP: 88006537f3f0 R08: ed000cc55820 R09: ed000cc55820
>> R10: 0001 R11: ed000cc5581f R12: 11000ca6fe6d
>> R13: 8800662ac0d8 R14: 88006537f3c8 R15: dc00
>> FS:  00c39880() GS:88006d10() knlGS:
>> CS:  0010 DS:  ES:  CR0: 80050033
>> CR2: 004b20e0 CR3: 6cdb7000 CR4: 001406e0
>> Call Trace:
>>  rt6_fill_node.isra.61+0xea4/0x1780 net/ipv6/route.c:3513
>>  inet6_rtm_getroute+0x7da/0xce0 net/ipv6/route.c:3639
>>  rtnetlink_rcv_msg+0x609/0x860 net/core/rtnetlink.c:4104
>>  netlink_rcv_skb+0x2ab/0x390 net/netlink/af_netlink.c:2298
>>  rtnetlink_rcv+0x2a/0x40 net/core/rtnetlink.c:4110
>>  netlink_unicast_kernel net/netlink/af_netlink.c:1231 [inline]
>>  netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1257
>>  netlink_sendmsg+0xa9f/0xe50 net/netlink/af_netlink.c:1803
>>  sock_sendmsg_nosec net/socket.c:633 [inline]
>>  sock_sendmsg+0xca/0x110 net/socket.c:643
>>  sock_write_iter+0x326/0x600 net/socket.c:846
>>  new_sync_write fs/read_write.c:499 [inline]
>>  __vfs_write+0x483/0x740 fs/read_write.c:512
>>  vfs_write+0x187/0x530 fs/read_write.c:560
>>  SYSC_write fs/read_write.c:607 [inline]
>>  SyS_write+0xfb/0x230 fs/read_write.c:599
>>  entry_SYSCALL_64_fastpath+0x1f/0xc2
>> RIP: 0033:0x433ef0
>> RSP: 002b:7ffcea6171b8 EFLAGS: 0246 ORIG_RAX: 0001
>> RAX: ffda RBX: 00401730 RCX: 00433ef0
>> RDX: 0022 RSI: 00493548 RDI: 0003
>> RBP:  R08: 000b R09: 0004
>> R10: 000d R11: 0246 R12: 004002b0
>> R13: 7ffcea6172c8 R14: 0002 R15: 
>> Code: ff df 80 3c 01 00 0f 85 3c 01 00 00 48 8b 8b 58 01 00 00 48 b8
>> 00 00 00 00 00 fc ff df 48 8d b9 54 02 00 00 48 89 fe 48 c1 ee 03 <0f>
>> b6 34 06 48 89 f8 83 e0 07 83 c0 03 40 38 f0 7c 05 40 84 f6
>> RIP: rt6_nexthop_info+0x278/0x3b0 net/ipv6/route.c:3325 RSP: 88006537f350
>> ---[ end trace 6a59074c79adfb5f ]---
>>
>> On commit e5d56efc97f8240d0b5d66c03949382b6d7e5570
>
> Not sure how you chose all these addresses on CC, while you forgot David
> Ahern. I doubt Yuchung is interested in fixing IPv6 bugs.
>
> David, rt->rt6i_idev can be NULL.

Probably we need to skip null entry again here:

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f54f426..25590d1 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3627,6 +3627,12 @@ static int inet6_rtm_getroute(struct sk_buff
*in_skb, struct nlmsghdr *nlh)
rt = (struct rt6_info *)ip6_route_output(net, NULL, );
}

+   if (rt == net->ipv6.ip6_null_entry) {
+   ip6_rt_put(rt);
+   err = -ENOENT;
+   goto errout;
+   }
+
skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
if (!skb) {
ip6_rt_put(rt);


Re: net/ipv6: null-ptr-deref in ip6_route_del/lock_acquire

2017-02-27 Thread Cong Wang
On Mon, Feb 27, 2017 at 12:05 PM, Andrey Konovalov
<andreyk...@google.com> wrote:
> On Mon, Feb 27, 2017 at 8:59 PM, David Ahern <d...@cumulusnetworks.com> wrote:
>> On 2/27/17 10:11 AM, Cong Wang wrote:
>>> The attached patch fixes this crash, but I am not sure if it is the
>>> best way to fix this bug yet...
>>
>> I'll take a look. I can not reproduce this using route or ip, so the
>> fuzzer is doing something interesting.
>
> Hi David,
>
> I've attached a simple reproducer to the report, it doesn't work for you?

It works for me and I have verified the formal patch I sent.


Re: net/ipv6: null-ptr-deref in ip6_route_del/lock_acquire

2017-02-27 Thread Cong Wang
On Mon, Feb 27, 2017 at 12:05 PM, Andrey Konovalov
 wrote:
> On Mon, Feb 27, 2017 at 8:59 PM, David Ahern  wrote:
>> On 2/27/17 10:11 AM, Cong Wang wrote:
>>> The attached patch fixes this crash, but I am not sure if it is the
>>> best way to fix this bug yet...
>>
>> I'll take a look. I can not reproduce this using route or ip, so the
>> fuzzer is doing something interesting.
>
> Hi David,
>
> I've attached a simple reproducer to the report, it doesn't work for you?

It works for me and I have verified the formal patch I sent.


Re: net/ipv6: null-ptr-deref in ip6_route_del/lock_acquire

2017-02-27 Thread Cong Wang
On Mon, Feb 27, 2017 at 7:28 AM, Andrey Konovalov  wrote:
> Hi,
>
> I've got the following error report while fuzzing the kernel with syzkaller.
>
> On commit e5d56efc97f8240d0b5d66c03949382b6d7e5570 (Feb 26).
>
> A reproducer and .config are attached.
>
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault:  [#1] SMP KASAN
> Modules linked in:
> CPU: 0 PID: 4045 Comm: a.out Not tainted 4.10.0+ #54
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 88006b6bac00 task.stack: 88006a688000
> RIP: 0010:__lock_acquire+0xac4/0x3270 kernel/locking/lockdep.c:3224
> RSP: 0018:88006a68f250 EFLAGS: 00010006
> RAX: dc00 RBX: dc00 RCX: 
> RDX: 0006 RSI:  RDI: 11000d4d1ea4
> RBP: 88006a68f788 R08: 0001 R09: 
> R10: 0030 R11:  R12: 88006b6bac00
> R13:  R14: 86e64ec0 R15: 0001
> FS:  7fda492ff700() GS:88006ca0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 208c4000 CR3: 6a7e9000 CR4: 06f0
> Call Trace:
>  lock_acquire+0x241/0x580 kernel/locking/lockdep.c:3753
>  __raw_write_lock_bh ./include/linux/rwlock_api_smp.h:203
>  _raw_write_lock_bh+0x3a/0x50 kernel/locking/spinlock.c:319
>  __ip6_del_rt_siblings net/ipv6/route.c:2177
>  ip6_route_del+0x4dd/0xa70 net/ipv6/route.c:2257
>  ipv6_route_ioctl+0x62d/0x790 net/ipv6/route.c:2620
>  inet6_ioctl+0xef/0x1e0 net/ipv6/af_inet6.c:520
>  sock_do_ioctl+0x65/0xb0 net/socket.c:895
>  sock_ioctl+0x28f/0x440 net/socket.c:993
>  vfs_ioctl fs/ioctl.c:43
>  do_vfs_ioctl+0x1bf/0x1780 fs/ioctl.c:683
>  SYSC_ioctl fs/ioctl.c:698
>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:689
>  entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:204

The attached patch fixes this crash, but I am not sure if it is the
best way to fix this bug yet...

Thanks.
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f54f426..3d1b260 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2216,12 +2216,13 @@ static int __ip6_del_rt_siblings(struct rt6_info *rt, 
struct fib6_config *cfg)
 
 static int ip6_route_del(struct fib6_config *cfg)
 {
+   struct net *net = cfg->fc_nlinfo.nl_net;
struct fib6_table *table;
struct fib6_node *fn;
struct rt6_info *rt;
int err = -ESRCH;
 
-   table = fib6_get_table(cfg->fc_nlinfo.nl_net, cfg->fc_table);
+   table = fib6_get_table(net, cfg->fc_table);
if (!table)
return err;
 
@@ -2247,6 +2248,8 @@ static int ip6_route_del(struct fib6_config *cfg)
continue;
if (cfg->fc_protocol && cfg->fc_protocol != 
rt->rt6i_protocol)
continue;
+   if (rt == net->ipv6.ip6_null_entry)
+   continue;
dst_hold(>dst);
read_unlock_bh(>tb6_lock);
 


Re: net/ipv6: null-ptr-deref in ip6_route_del/lock_acquire

2017-02-27 Thread Cong Wang
On Mon, Feb 27, 2017 at 7:28 AM, Andrey Konovalov  wrote:
> Hi,
>
> I've got the following error report while fuzzing the kernel with syzkaller.
>
> On commit e5d56efc97f8240d0b5d66c03949382b6d7e5570 (Feb 26).
>
> A reproducer and .config are attached.
>
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault:  [#1] SMP KASAN
> Modules linked in:
> CPU: 0 PID: 4045 Comm: a.out Not tainted 4.10.0+ #54
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 88006b6bac00 task.stack: 88006a688000
> RIP: 0010:__lock_acquire+0xac4/0x3270 kernel/locking/lockdep.c:3224
> RSP: 0018:88006a68f250 EFLAGS: 00010006
> RAX: dc00 RBX: dc00 RCX: 
> RDX: 0006 RSI:  RDI: 11000d4d1ea4
> RBP: 88006a68f788 R08: 0001 R09: 
> R10: 0030 R11:  R12: 88006b6bac00
> R13:  R14: 86e64ec0 R15: 0001
> FS:  7fda492ff700() GS:88006ca0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 208c4000 CR3: 6a7e9000 CR4: 06f0
> Call Trace:
>  lock_acquire+0x241/0x580 kernel/locking/lockdep.c:3753
>  __raw_write_lock_bh ./include/linux/rwlock_api_smp.h:203
>  _raw_write_lock_bh+0x3a/0x50 kernel/locking/spinlock.c:319
>  __ip6_del_rt_siblings net/ipv6/route.c:2177
>  ip6_route_del+0x4dd/0xa70 net/ipv6/route.c:2257
>  ipv6_route_ioctl+0x62d/0x790 net/ipv6/route.c:2620
>  inet6_ioctl+0xef/0x1e0 net/ipv6/af_inet6.c:520
>  sock_do_ioctl+0x65/0xb0 net/socket.c:895
>  sock_ioctl+0x28f/0x440 net/socket.c:993
>  vfs_ioctl fs/ioctl.c:43
>  do_vfs_ioctl+0x1bf/0x1780 fs/ioctl.c:683
>  SYSC_ioctl fs/ioctl.c:698
>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:689
>  entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:204

The attached patch fixes this crash, but I am not sure if it is the
best way to fix this bug yet...

Thanks.
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f54f426..3d1b260 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2216,12 +2216,13 @@ static int __ip6_del_rt_siblings(struct rt6_info *rt, 
struct fib6_config *cfg)
 
 static int ip6_route_del(struct fib6_config *cfg)
 {
+   struct net *net = cfg->fc_nlinfo.nl_net;
struct fib6_table *table;
struct fib6_node *fn;
struct rt6_info *rt;
int err = -ESRCH;
 
-   table = fib6_get_table(cfg->fc_nlinfo.nl_net, cfg->fc_table);
+   table = fib6_get_table(net, cfg->fc_table);
if (!table)
return err;
 
@@ -2247,6 +2248,8 @@ static int ip6_route_del(struct fib6_config *cfg)
continue;
if (cfg->fc_protocol && cfg->fc_protocol != 
rt->rt6i_protocol)
continue;
+   if (rt == net->ipv6.ip6_null_entry)
+   continue;
dst_hold(>dst);
read_unlock_bh(>tb6_lock);
 


Re: net: possible deadlock in skb_queue_tail

2017-02-20 Thread Cong Wang
On Mon, Feb 20, 2017 at 5:29 AM, Andrey Konovalov  wrote:
> other info that might help us debug this:
>
>  Possible unsafe locking scenario:
>
>CPU0CPU1
>
>   lock(&(>lock)->rlock);
>lock(&(>lock)->rlock#3);
>lock(&(>lock)->rlock);
>   lock(&(>lock)->rlock#3);
>

They are different types of sockets and different lists of skb's,
one is netlink socket the other is udp socket, so I don't think
we could have a deadlock in this scenario, we probably need to
explicitly mark them as different lockdep classes.


Re: net: possible deadlock in skb_queue_tail

2017-02-20 Thread Cong Wang
On Mon, Feb 20, 2017 at 5:29 AM, Andrey Konovalov  wrote:
> other info that might help us debug this:
>
>  Possible unsafe locking scenario:
>
>CPU0CPU1
>
>   lock(&(>lock)->rlock);
>lock(&(>lock)->rlock#3);
>lock(&(>lock)->rlock);
>   lock(&(>lock)->rlock#3);
>

They are different types of sockets and different lists of skb's,
one is netlink socket the other is udp socket, so I don't think
we could have a deadlock in this scenario, we probably need to
explicitly mark them as different lockdep classes.


Re: e1000_netpoll() , BUG: sleeping function called from invalid context

2017-02-17 Thread Cong Wang
On Fri, Feb 17, 2017 at 3:38 PM, Gabriel C  wrote:
>
> My card seems to use the e1000e driver which is buit-in..
>
> Anyway here an objdump -x :
>
> http://ftp.frugalware.org/pub/other/people/crazy/kernel/t/objdump-x_e1000.ko.txt
>

Found disable_hardirq() but not disable_irq().

Are you sure the kernel warning was emitted by this binary rather than
some old one?


Re: e1000_netpoll() , BUG: sleeping function called from invalid context

2017-02-17 Thread Cong Wang
On Fri, Feb 17, 2017 at 3:38 PM, Gabriel C  wrote:
>
> My card seems to use the e1000e driver which is buit-in..
>
> Anyway here an objdump -x :
>
> http://ftp.frugalware.org/pub/other/people/crazy/kernel/t/objdump-x_e1000.ko.txt
>

Found disable_hardirq() but not disable_irq().

Are you sure the kernel warning was emitted by this binary rather than
some old one?


Re: e1000_netpoll() , BUG: sleeping function called from invalid context

2017-02-17 Thread Cong Wang
On Fri, Feb 17, 2017 at 3:16 PM, Gabriel C <nix.or@gmail.com> wrote:
>
>
> On 17.02.2017 23:38, Cong Wang wrote:
>>
>> On Fri, Feb 17, 2017 at 1:20 PM, Gabriel C <nix.or@gmail.com> wrote:
>>>
>>> Hi all,
>>>
>>> while poking at a different issue I found the following on my logs :
>>>
>>> [85362.132770] BUG: sleeping function called from invalid context at
>>> kernel/irq/manage.c:110
>>> [85362.132771] in_atomic(): 1, irqs_disabled(): 1, pid: 1153, name:
>>> systemd-journal
>>> [85362.132772] no locks held by systemd-journal/1153.
>>> [85362.132772] irq event stamp: 60088359
>>> [85362.132777] hardirqs last  enabled at (60088359): []
>>> vprintk_emit+0x432/0x470
>>> [85362.132779] hardirqs last disabled at (60088358): []
>>> vprintk_emit+0x5c/0x470
>>> [85362.132782] softirqs last  enabled at (60088258): []
>>> __do_softirq+0x22d/0x290
>>> [85362.132784] softirqs last disabled at (60088233): []
>>> irq_exit+0x6a/0xd0
>>> [85362.132784] Preemption disabled at:
>>> [85362.132787] [] write_msg+0x4e/0xf0
>>> [85362.132790] CPU: 0 PID: 1153 Comm: systemd-journal Tainted: G
>>> I
>>> 4.10.0-rc8-debug-1-ga1015e374d94-dirty #5
>>> [85362.132791] Hardware name: FUJITSU  PRIMERGY
>>> TX200 S5 /D2709, BIOS 6.00 Rev. 1.14.2709
>>> 02/04/2013
>>> [85362.132792] Call Trace:
>>> [85362.132796]  dump_stack+0x86/0xc1
>>> [85362.132799]  ___might_sleep+0x213/0x230
>>> [85362.132801]  __might_sleep+0x6b/0x80
>>> [85362.132803]  synchronize_irq+0x33/0x90
>>> [85362.132805]  ? __irq_put_desc_unlock+0x19/0x40
>>> [85362.132807]  ? __disable_irq_nosync+0x4e/0x60
>>> [85362.132808]  disable_irq+0x17/0x20
>>
>>
>>
>> Hmm, your kernel base version is 4.10.0-rc8 but the symbols here
>> look like prior to my commit, because with my commit here should
>> be disable_hardirq() calling synchronize_hardirq().
>>
>> Did you revert it or make any local changes?
>
>
> The kernel is -rc8 with reverted d966564fcdc19e13eb6ba1fbe6b8101070339c3d
> +
> http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=202461e2f3c15dbfb05825d29ace0d20cdf55fa4
> + an debug patch from Thomas to find these goldfish issues.
> (
> http://ftp.frugalware.org/pub/other/people/crazy/kernel/t/goldfish-debug.patch
> )
>
> No other changes..

That is weird, the stack trace doesn't match the source code for some reason.
Can you objdump your e1000.ko module to see if that is true?


Re: e1000_netpoll() , BUG: sleeping function called from invalid context

2017-02-17 Thread Cong Wang
On Fri, Feb 17, 2017 at 3:16 PM, Gabriel C  wrote:
>
>
> On 17.02.2017 23:38, Cong Wang wrote:
>>
>> On Fri, Feb 17, 2017 at 1:20 PM, Gabriel C  wrote:
>>>
>>> Hi all,
>>>
>>> while poking at a different issue I found the following on my logs :
>>>
>>> [85362.132770] BUG: sleeping function called from invalid context at
>>> kernel/irq/manage.c:110
>>> [85362.132771] in_atomic(): 1, irqs_disabled(): 1, pid: 1153, name:
>>> systemd-journal
>>> [85362.132772] no locks held by systemd-journal/1153.
>>> [85362.132772] irq event stamp: 60088359
>>> [85362.132777] hardirqs last  enabled at (60088359): []
>>> vprintk_emit+0x432/0x470
>>> [85362.132779] hardirqs last disabled at (60088358): []
>>> vprintk_emit+0x5c/0x470
>>> [85362.132782] softirqs last  enabled at (60088258): []
>>> __do_softirq+0x22d/0x290
>>> [85362.132784] softirqs last disabled at (60088233): []
>>> irq_exit+0x6a/0xd0
>>> [85362.132784] Preemption disabled at:
>>> [85362.132787] [] write_msg+0x4e/0xf0
>>> [85362.132790] CPU: 0 PID: 1153 Comm: systemd-journal Tainted: G
>>> I
>>> 4.10.0-rc8-debug-1-ga1015e374d94-dirty #5
>>> [85362.132791] Hardware name: FUJITSU  PRIMERGY
>>> TX200 S5 /D2709, BIOS 6.00 Rev. 1.14.2709
>>> 02/04/2013
>>> [85362.132792] Call Trace:
>>> [85362.132796]  dump_stack+0x86/0xc1
>>> [85362.132799]  ___might_sleep+0x213/0x230
>>> [85362.132801]  __might_sleep+0x6b/0x80
>>> [85362.132803]  synchronize_irq+0x33/0x90
>>> [85362.132805]  ? __irq_put_desc_unlock+0x19/0x40
>>> [85362.132807]  ? __disable_irq_nosync+0x4e/0x60
>>> [85362.132808]  disable_irq+0x17/0x20
>>
>>
>>
>> Hmm, your kernel base version is 4.10.0-rc8 but the symbols here
>> look like prior to my commit, because with my commit here should
>> be disable_hardirq() calling synchronize_hardirq().
>>
>> Did you revert it or make any local changes?
>
>
> The kernel is -rc8 with reverted d966564fcdc19e13eb6ba1fbe6b8101070339c3d
> +
> http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=202461e2f3c15dbfb05825d29ace0d20cdf55fa4
> + an debug patch from Thomas to find these goldfish issues.
> (
> http://ftp.frugalware.org/pub/other/people/crazy/kernel/t/goldfish-debug.patch
> )
>
> No other changes..

That is weird, the stack trace doesn't match the source code for some reason.
Can you objdump your e1000.ko module to see if that is true?


Re: e1000_netpoll() , BUG: sleeping function called from invalid context

2017-02-17 Thread Cong Wang
On Fri, Feb 17, 2017 at 1:20 PM, Gabriel C  wrote:
> Hi all,
>
> while poking at a different issue I found the following on my logs :
>
> [85362.132770] BUG: sleeping function called from invalid context at
> kernel/irq/manage.c:110
> [85362.132771] in_atomic(): 1, irqs_disabled(): 1, pid: 1153, name:
> systemd-journal
> [85362.132772] no locks held by systemd-journal/1153.
> [85362.132772] irq event stamp: 60088359
> [85362.132777] hardirqs last  enabled at (60088359): []
> vprintk_emit+0x432/0x470
> [85362.132779] hardirqs last disabled at (60088358): []
> vprintk_emit+0x5c/0x470
> [85362.132782] softirqs last  enabled at (60088258): []
> __do_softirq+0x22d/0x290
> [85362.132784] softirqs last disabled at (60088233): []
> irq_exit+0x6a/0xd0
> [85362.132784] Preemption disabled at:
> [85362.132787] [] write_msg+0x4e/0xf0
> [85362.132790] CPU: 0 PID: 1153 Comm: systemd-journal Tainted: G  I
> 4.10.0-rc8-debug-1-ga1015e374d94-dirty #5
> [85362.132791] Hardware name: FUJITSU  PRIMERGY
> TX200 S5 /D2709, BIOS 6.00 Rev. 1.14.2709
> 02/04/2013
> [85362.132792] Call Trace:
> [85362.132796]  dump_stack+0x86/0xc1
> [85362.132799]  ___might_sleep+0x213/0x230
> [85362.132801]  __might_sleep+0x6b/0x80
> [85362.132803]  synchronize_irq+0x33/0x90
> [85362.132805]  ? __irq_put_desc_unlock+0x19/0x40
> [85362.132807]  ? __disable_irq_nosync+0x4e/0x60
> [85362.132808]  disable_irq+0x17/0x20


Hmm, your kernel base version is 4.10.0-rc8 but the symbols here
look like prior to my commit, because with my commit here should
be disable_hardirq() calling synchronize_hardirq().

Did you revert it or make any local changes?


> [85362.132810]  e1000_netpoll+0x3d/0x110
> [85362.132813]  netpoll_poll_dev+0xa0/0x170
> [85362.132814]  netpoll_send_skb_on_dev+0x1ab/0x2b0
> [85362.132816]  netpoll_send_udp+0x417/0x450
> [85362.132817]  write_msg+0xdb/0xf0
> [85362.132819]  console_unlock+0x2e5/0x430
> [85362.132821]  ? __down_trylock_console_sem+0x41/0x60
> [85362.132822]  vprintk_emit+0x456/0x470
> [85362.132825]  printk_emit+0x2e/0x36
> [85362.132827]  ? simple_strtoull+0x2c/0x50
> [85362.132829]  devkmsg_write+0x115/0x160
> [85362.132831]  do_iter_readv_writev+0xd8/0x100
> [85362.132833]  do_readv_writev+0xdf/0x220
> [85362.132835]  ? __might_fault+0x37/0x90
> [85362.132838]  ? entry_SYSCALL_64_fastpath+0x5/0xc2
> [85362.132839]  vfs_writev+0x3a/0x50
> [85362.132841]  do_writev+0x4c/0xd0
> [85362.132842]  SyS_writev+0xb/0x10
> [85362.132843]  entry_SYSCALL_64_fastpath+0x1f/0xc2
> [85362.132845] RIP: 0033:0x7fc6d305c46d
> [85362.132846] RSP: 002b:7ffca94161e0 EFLAGS: 0293 ORIG_RAX:
> 0014
> [85362.132847] RAX: ffda RBX: 813afad3 RCX:
> 7fc6d305c46d
> [85362.132848] RDX: 0005 RSI: 7ffca94162f0 RDI:
> 0006
> [85362.132849] RBP: c9000d4c3f88 R08:  R09:
> 0008
> [85362.132850] R10: 0069 R11: 0293 R12:
> 7ffca94162f0
> [85362.132851] R13: 55768b19c920 R14: 7ffca9416330 R15:
> 7ffca94163c0
> [85362.132853]  ? __this_cpu_preempt_check+0x13/0x20
>
>
> Seems to be introduced by :
>
> commit 311191297125156319be8f86d546ea1c569f7e95
> Author: WANG Cong 
> Date:   Sat Dec 10 14:22:42 2016 -0800
>
> e1000: use disable_hardirq() for e1000_netpoll()
>
> ...
>
> The used config can be found there:
>
> http://ftp.frugalware.org/pub/other/people/crazy/kernel/t/config-rcx
>
>
> Regards,
>
> Gabriel C


Re: e1000_netpoll() , BUG: sleeping function called from invalid context

2017-02-17 Thread Cong Wang
On Fri, Feb 17, 2017 at 1:20 PM, Gabriel C  wrote:
> Hi all,
>
> while poking at a different issue I found the following on my logs :
>
> [85362.132770] BUG: sleeping function called from invalid context at
> kernel/irq/manage.c:110
> [85362.132771] in_atomic(): 1, irqs_disabled(): 1, pid: 1153, name:
> systemd-journal
> [85362.132772] no locks held by systemd-journal/1153.
> [85362.132772] irq event stamp: 60088359
> [85362.132777] hardirqs last  enabled at (60088359): []
> vprintk_emit+0x432/0x470
> [85362.132779] hardirqs last disabled at (60088358): []
> vprintk_emit+0x5c/0x470
> [85362.132782] softirqs last  enabled at (60088258): []
> __do_softirq+0x22d/0x290
> [85362.132784] softirqs last disabled at (60088233): []
> irq_exit+0x6a/0xd0
> [85362.132784] Preemption disabled at:
> [85362.132787] [] write_msg+0x4e/0xf0
> [85362.132790] CPU: 0 PID: 1153 Comm: systemd-journal Tainted: G  I
> 4.10.0-rc8-debug-1-ga1015e374d94-dirty #5
> [85362.132791] Hardware name: FUJITSU  PRIMERGY
> TX200 S5 /D2709, BIOS 6.00 Rev. 1.14.2709
> 02/04/2013
> [85362.132792] Call Trace:
> [85362.132796]  dump_stack+0x86/0xc1
> [85362.132799]  ___might_sleep+0x213/0x230
> [85362.132801]  __might_sleep+0x6b/0x80
> [85362.132803]  synchronize_irq+0x33/0x90
> [85362.132805]  ? __irq_put_desc_unlock+0x19/0x40
> [85362.132807]  ? __disable_irq_nosync+0x4e/0x60
> [85362.132808]  disable_irq+0x17/0x20


Hmm, your kernel base version is 4.10.0-rc8 but the symbols here
look like prior to my commit, because with my commit here should
be disable_hardirq() calling synchronize_hardirq().

Did you revert it or make any local changes?


> [85362.132810]  e1000_netpoll+0x3d/0x110
> [85362.132813]  netpoll_poll_dev+0xa0/0x170
> [85362.132814]  netpoll_send_skb_on_dev+0x1ab/0x2b0
> [85362.132816]  netpoll_send_udp+0x417/0x450
> [85362.132817]  write_msg+0xdb/0xf0
> [85362.132819]  console_unlock+0x2e5/0x430
> [85362.132821]  ? __down_trylock_console_sem+0x41/0x60
> [85362.132822]  vprintk_emit+0x456/0x470
> [85362.132825]  printk_emit+0x2e/0x36
> [85362.132827]  ? simple_strtoull+0x2c/0x50
> [85362.132829]  devkmsg_write+0x115/0x160
> [85362.132831]  do_iter_readv_writev+0xd8/0x100
> [85362.132833]  do_readv_writev+0xdf/0x220
> [85362.132835]  ? __might_fault+0x37/0x90
> [85362.132838]  ? entry_SYSCALL_64_fastpath+0x5/0xc2
> [85362.132839]  vfs_writev+0x3a/0x50
> [85362.132841]  do_writev+0x4c/0xd0
> [85362.132842]  SyS_writev+0xb/0x10
> [85362.132843]  entry_SYSCALL_64_fastpath+0x1f/0xc2
> [85362.132845] RIP: 0033:0x7fc6d305c46d
> [85362.132846] RSP: 002b:7ffca94161e0 EFLAGS: 0293 ORIG_RAX:
> 0014
> [85362.132847] RAX: ffda RBX: 813afad3 RCX:
> 7fc6d305c46d
> [85362.132848] RDX: 0005 RSI: 7ffca94162f0 RDI:
> 0006
> [85362.132849] RBP: c9000d4c3f88 R08:  R09:
> 0008
> [85362.132850] R10: 0069 R11: 0293 R12:
> 7ffca94162f0
> [85362.132851] R13: 55768b19c920 R14: 7ffca9416330 R15:
> 7ffca94163c0
> [85362.132853]  ? __this_cpu_preempt_check+0x13/0x20
>
>
> Seems to be introduced by :
>
> commit 311191297125156319be8f86d546ea1c569f7e95
> Author: WANG Cong 
> Date:   Sat Dec 10 14:22:42 2016 -0800
>
> e1000: use disable_hardirq() for e1000_netpoll()
>
> ...
>
> The used config can be found there:
>
> http://ftp.frugalware.org/pub/other/people/crazy/kernel/t/config-rcx
>
>
> Regards,
>
> Gabriel C


Re: net: use-after-free in tw_timer_handler

2017-02-17 Thread Cong Wang
On Fri, Feb 17, 2017 at 12:36 PM, Dmitry Vyukov <dvyu...@google.com> wrote:
> On Fri, Feb 17, 2017 at 7:51 PM, Cong Wang <xiyou.wangc...@gmail.com> wrote:
>>>>>>
>>>>>> This code was changed a long time ago :
>>>>>>
>>>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054
>>>>>>
>>>>>> So I suspect a recent patch broke the logic.
>>>>>>
>>>>>> You might start a bisection :
>>>>>>
>>>>>> I would check if 4.7 and 4.8 trigger the issue you noticed.
>>>>>
>>>>>
>>>>> It happens with too low rate for bisecting (few times per day). I
>>>>> could add some additional checks into code, but I don't know what
>>>>> checks could be useful.
>>>>
>>>> If you can not tell if 4.7 and/or 4.8 have the problem, I am not sure
>>>> we are able to help.
>>>
>>>
>>> There are also chances that the problem is older.
>>>
>>> Looking at the code, this part of inet_twsk_purge looks fishy:
>>>
>>> 285 if (unlikely((tw->tw_family != family) ||
>>> 286  
>>> atomic_read(_net(tw)->count))) {
>>>
>>> It uses net->count == 0 check to find the right sockets. But what if
>>> there are several nets with count == 0 in flight, can't there be
>>> several inet_twsk_purge calls running concurrently freeing each other
>>> sockets? If so it looks like inet_twsk_purge can call
>>> inet_twsk_deschedule_put twice for a socket. Namely, two calls for
>>> different nets discover the socket, check that net->count==0 and both
>>> call inet_twsk_deschedule_put. Shouldn't we just give inet_twsk_purge
>>> net that it needs to purge?
>>
>> I don't think this could happen, because cleanup_net() is called in a
>> work struct, and this work can't be scheduled twice, so there should
>> not be any parallel cleanup_net().
>>
>> Also, inet_twsk_deschedule_put() already waits for the flying timer,
>> net->count==0 at this point and all sockets in this netns are already
>> gone, so I don't know how a timer could be still fired after this.
>
>
> One thing that I noticed is that use-after-free always happens in
> tw_timer_handler, but never in timer code. This suggests that:
> 1. Whoever frees the struct waits for the timer mutex unlock.
> 2. Free happens almost at the same time as timer fires (otherwise the
> timer would probably be cancelled).
>
> That could happen if there is a race in timer code during cancellation
> or rearming. I understand that it's heavily stressed code, but who
> knows...

Good point! I think this could happen since timer is deactivated before
unlinking the tw sock, so another CPU could find it and rearm the timer
during such window?

>
> I still wonder if twsk_net(tw)->count==0 is the right condition. This
> net may not yet have been scheduled for destruction, but another task
> could pick it up and start destroying...

Good question. net_exit_list is just a snapshot of the cleanup_list, so
it is true that inet_twsk_purge() could purge more netns'es than in
net_exit_list, but I don't see any problem with this, the work later can't
find the same twsck again since it is unlinked from hashtable.
Do you see otherwise?

>
> Cong, do you have any ideas as to what debugging checks I could add to
> help track this down?

I don't have any theory for the cause of this bug. Your point above actually
makes sense for me. Maybe you can add some code in between the following
code:

if (del_timer_sync(>tw_timer))
inet_twsk_kill(tw);

to verify is the timer is rearmed or not.


Re: net: use-after-free in tw_timer_handler

2017-02-17 Thread Cong Wang
On Fri, Feb 17, 2017 at 12:36 PM, Dmitry Vyukov  wrote:
> On Fri, Feb 17, 2017 at 7:51 PM, Cong Wang  wrote:
>>>>>>
>>>>>> This code was changed a long time ago :
>>>>>>
>>>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054
>>>>>>
>>>>>> So I suspect a recent patch broke the logic.
>>>>>>
>>>>>> You might start a bisection :
>>>>>>
>>>>>> I would check if 4.7 and 4.8 trigger the issue you noticed.
>>>>>
>>>>>
>>>>> It happens with too low rate for bisecting (few times per day). I
>>>>> could add some additional checks into code, but I don't know what
>>>>> checks could be useful.
>>>>
>>>> If you can not tell if 4.7 and/or 4.8 have the problem, I am not sure
>>>> we are able to help.
>>>
>>>
>>> There are also chances that the problem is older.
>>>
>>> Looking at the code, this part of inet_twsk_purge looks fishy:
>>>
>>> 285 if (unlikely((tw->tw_family != family) ||
>>> 286  
>>> atomic_read(_net(tw)->count))) {
>>>
>>> It uses net->count == 0 check to find the right sockets. But what if
>>> there are several nets with count == 0 in flight, can't there be
>>> several inet_twsk_purge calls running concurrently freeing each other
>>> sockets? If so it looks like inet_twsk_purge can call
>>> inet_twsk_deschedule_put twice for a socket. Namely, two calls for
>>> different nets discover the socket, check that net->count==0 and both
>>> call inet_twsk_deschedule_put. Shouldn't we just give inet_twsk_purge
>>> net that it needs to purge?
>>
>> I don't think this could happen, because cleanup_net() is called in a
>> work struct, and this work can't be scheduled twice, so there should
>> not be any parallel cleanup_net().
>>
>> Also, inet_twsk_deschedule_put() already waits for the flying timer,
>> net->count==0 at this point and all sockets in this netns are already
>> gone, so I don't know how a timer could be still fired after this.
>
>
> One thing that I noticed is that use-after-free always happens in
> tw_timer_handler, but never in timer code. This suggests that:
> 1. Whoever frees the struct waits for the timer mutex unlock.
> 2. Free happens almost at the same time as timer fires (otherwise the
> timer would probably be cancelled).
>
> That could happen if there is a race in timer code during cancellation
> or rearming. I understand that it's heavily stressed code, but who
> knows...

Good point! I think this could happen since timer is deactivated before
unlinking the tw sock, so another CPU could find it and rearm the timer
during such window?

>
> I still wonder if twsk_net(tw)->count==0 is the right condition. This
> net may not yet have been scheduled for destruction, but another task
> could pick it up and start destroying...

Good question. net_exit_list is just a snapshot of the cleanup_list, so
it is true that inet_twsk_purge() could purge more netns'es than in
net_exit_list, but I don't see any problem with this, the work later can't
find the same twsck again since it is unlinked from hashtable.
Do you see otherwise?

>
> Cong, do you have any ideas as to what debugging checks I could add to
> help track this down?

I don't have any theory for the cause of this bug. Your point above actually
makes sense for me. Maybe you can add some code in between the following
code:

if (del_timer_sync(>tw_timer))
inet_twsk_kill(tw);

to verify is the timer is rearmed or not.


Re: net: use-after-free in tw_timer_handler

2017-02-17 Thread Cong Wang
On Wed, Feb 8, 2017 at 9:36 AM, Dmitry Vyukov  wrote:
> On Tue, Jan 24, 2017 at 4:52 PM, Eric Dumazet  wrote:
>> On Tue, Jan 24, 2017 at 7:06 AM, Dmitry Vyukov  wrote:

 This code was changed a long time ago :

 https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054

 So I suspect a recent patch broke the logic.

 You might start a bisection :

 I would check if 4.7 and 4.8 trigger the issue you noticed.
>>>
>>>
>>> It happens with too low rate for bisecting (few times per day). I
>>> could add some additional checks into code, but I don't know what
>>> checks could be useful.
>>
>> If you can not tell if 4.7 and/or 4.8 have the problem, I am not sure
>> we are able to help.
>
>
> There are also chances that the problem is older.
>
> Looking at the code, this part of inet_twsk_purge looks fishy:
>
> 285 if (unlikely((tw->tw_family != family) ||
> 286  atomic_read(_net(tw)->count))) {
>
> It uses net->count == 0 check to find the right sockets. But what if
> there are several nets with count == 0 in flight, can't there be
> several inet_twsk_purge calls running concurrently freeing each other
> sockets? If so it looks like inet_twsk_purge can call
> inet_twsk_deschedule_put twice for a socket. Namely, two calls for
> different nets discover the socket, check that net->count==0 and both
> call inet_twsk_deschedule_put. Shouldn't we just give inet_twsk_purge
> net that it needs to purge?

I don't think this could happen, because cleanup_net() is called in a
work struct, and this work can't be scheduled twice, so there should
not be any parallel cleanup_net().

Also, inet_twsk_deschedule_put() already waits for the flying timer,
net->count==0 at this point and all sockets in this netns are already
gone, so I don't know how a timer could be still fired after this.


Re: net: use-after-free in tw_timer_handler

2017-02-17 Thread Cong Wang
On Wed, Feb 8, 2017 at 9:36 AM, Dmitry Vyukov  wrote:
> On Tue, Jan 24, 2017 at 4:52 PM, Eric Dumazet  wrote:
>> On Tue, Jan 24, 2017 at 7:06 AM, Dmitry Vyukov  wrote:

 This code was changed a long time ago :

 https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054

 So I suspect a recent patch broke the logic.

 You might start a bisection :

 I would check if 4.7 and 4.8 trigger the issue you noticed.
>>>
>>>
>>> It happens with too low rate for bisecting (few times per day). I
>>> could add some additional checks into code, but I don't know what
>>> checks could be useful.
>>
>> If you can not tell if 4.7 and/or 4.8 have the problem, I am not sure
>> we are able to help.
>
>
> There are also chances that the problem is older.
>
> Looking at the code, this part of inet_twsk_purge looks fishy:
>
> 285 if (unlikely((tw->tw_family != family) ||
> 286  atomic_read(_net(tw)->count))) {
>
> It uses net->count == 0 check to find the right sockets. But what if
> there are several nets with count == 0 in flight, can't there be
> several inet_twsk_purge calls running concurrently freeing each other
> sockets? If so it looks like inet_twsk_purge can call
> inet_twsk_deschedule_put twice for a socket. Namely, two calls for
> different nets discover the socket, check that net->count==0 and both
> call inet_twsk_deschedule_put. Shouldn't we just give inet_twsk_purge
> net that it needs to purge?

I don't think this could happen, because cleanup_net() is called in a
work struct, and this work can't be scheduled twice, so there should
not be any parallel cleanup_net().

Also, inet_twsk_deschedule_put() already waits for the flying timer,
net->count==0 at this point and all sockets in this netns are already
gone, so I don't know how a timer could be still fired after this.


Re: net/dccp: use-after-free in dccp_feat_activate_values

2017-02-13 Thread Cong Wang
On Mon, Feb 13, 2017 at 11:19 AM, Andrey Konovalov
 wrote:
> Hi,
>
> I've got the following error report while fuzzing the kernel with syzkaller.
>
> On commit 926af6273fc683cd98cd0ce7bf0d04a02eed6742.
>
> A reproducer and .config are attached.
> Note, that it takes quite some time to trigger the bug (up to 10 minutes).
>
> BUG: KASAN: use-after-free in dccp_feat_activate_values+0x967/0xab0
> net/dccp/feat.c:1541 at addr 88003713be68
> Read of size 8 by task syz-executor2/8457
> CPU: 2 PID: 8457 Comm: syz-executor2 Not tainted 4.10.0-rc7+ #127
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  
>  __dump_stack lib/dump_stack.c:15 [inline]
>  dump_stack+0x292/0x398 lib/dump_stack.c:51
>  kasan_object_err+0x1c/0x70 mm/kasan/report.c:162
>  print_address_description mm/kasan/report.c:200 [inline]
>  kasan_report_error mm/kasan/report.c:289 [inline]
>  kasan_report.part.1+0x20e/0x4e0 mm/kasan/report.c:311
>  kasan_report mm/kasan/report.c:332 [inline]
>  __asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:332
>  dccp_feat_activate_values+0x967/0xab0 net/dccp/feat.c:1541
>  dccp_create_openreq_child+0x464/0x610 net/dccp/minisocks.c:121
>  dccp_v6_request_recv_sock+0x1f6/0x1960 net/dccp/ipv6.c:457
>  dccp_check_req+0x335/0x5a0 net/dccp/minisocks.c:186
>  dccp_v6_rcv+0x69e/0x1d00 net/dccp/ipv6.c:711
>  ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279
>  NF_HOOK include/linux/netfilter.h:257 [inline]
>  ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322
>  dst_input include/net/dst.h:507 [inline]
>  ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69
>  NF_HOOK include/linux/netfilter.h:257 [inline]
>  ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203
>  __netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190
>  __netif_receive_skb+0x2a/0x170 net/core/dev.c:4228
>  process_backlog+0xe5/0x6c0 net/core/dev.c:4839
>  napi_poll net/core/dev.c:5202 [inline]
>  net_rx_action+0xe70/0x1900 net/core/dev.c:5267
>  __do_softirq+0x2fb/0xb7d kernel/softirq.c:284
>  do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:902


Seems there is a race condition between iterating dccp_feat_entry
and freeing it, bh_lock_sock() seems not held in this path.


>  
>  do_softirq.part.17+0x1e8/0x230 kernel/softirq.c:328
>  do_softirq kernel/softirq.c:176 [inline]
>  __local_bh_enable_ip+0x1f2/0x200 kernel/softirq.c:181
>  local_bh_enable include/linux/bottom_half.h:31 [inline]
>  rcu_read_unlock_bh include/linux/rcupdate.h:971 [inline]
>  ip6_finish_output2+0xbb0/0x23d0 net/ipv6/ip6_output.c:123
>  ip6_finish_output+0x302/0x960 net/ipv6/ip6_output.c:148
>  NF_HOOK_COND include/linux/netfilter.h:246 [inline]
>  ip6_output+0x1cb/0x8d0 net/ipv6/ip6_output.c:162
>  ip6_xmit+0xcdf/0x20d0 include/net/dst.h:501
>  inet6_csk_xmit+0x320/0x5f0 net/ipv6/inet6_connection_sock.c:179
>  dccp_transmit_skb+0xb09/0x1120 net/dccp/output.c:141
>  dccp_xmit_packet+0x215/0x760 net/dccp/output.c:280
>  dccp_write_xmit+0x168/0x1d0 net/dccp/output.c:362
>  dccp_sendmsg+0x79c/0xb10 net/dccp/proto.c:796
>  inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
>  sock_sendmsg_nosec net/socket.c:635 [inline]
>  sock_sendmsg+0xca/0x110 net/socket.c:645
>  SYSC_sendto+0x660/0x810 net/socket.c:1687
>  SyS_sendto+0x40/0x50 net/socket.c:1655
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
> RIP: 0033:0x4458b9
> RSP: 002b:7f8ceb77bb58 EFLAGS: 0282 ORIG_RAX: 002c
> RAX: ffda RBX: 0017 RCX: 004458b9
> RDX: 0023 RSI: 20e6 RDI: 0017
> RBP: 006e1b90 R08: 200f9fe1 R09: 0020
> R10: 8010 R11: 0282 R12: 007080a8
> R13:  R14: 7f8ceb77c9c0 R15: 7f8ceb77c700
> Object at 88003713be50, in cache kmalloc-64 size: 64
> Allocated:
> PID = 8446
>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:502
>  set_track mm/kasan/kasan.c:514 [inline]
>  kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:605
>  kmem_cache_alloc_trace+0x82/0x270 mm/slub.c:2738
>  kmalloc include/linux/slab.h:490 [inline]
>  dccp_feat_entry_new+0x214/0x410 net/dccp/feat.c:467
>  dccp_feat_push_change+0x38/0x220 net/dccp/feat.c:487
>  __feat_register_sp+0x223/0x2f0 net/dccp/feat.c:741
>  dccp_feat_propagate_ccid+0x22b/0x2b0 net/dccp/feat.c:949
>  dccp_feat_server_ccid_dependencies+0x1b3/0x250 net/dccp/feat.c:1012
>  dccp_make_response+0x1f1/0xc90 net/dccp/output.c:423
>  dccp_v6_send_response+0x4ec/0xc20 net/dccp/ipv6.c:217
>  dccp_v6_conn_request+0xaba/0x11b0 net/dccp/ipv6.c:377
>  dccp_rcv_state_process+0x51e/0x1650 net/dccp/input.c:606
>  dccp_v6_do_rcv+0x213/0x350 net/dccp/ipv6.c:632
>  sk_backlog_rcv include/net/sock.h:893 [inline]
>  __sk_receive_skb+0x36f/0xcc0 net/core/sock.c:479
>  dccp_v6_rcv+0xba5/0x1d00 net/dccp/ipv6.c:742
>  ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279
>  NF_HOOK include/linux/netfilter.h:257 [inline]
>  

Re: net/dccp: use-after-free in dccp_feat_activate_values

2017-02-13 Thread Cong Wang
On Mon, Feb 13, 2017 at 11:19 AM, Andrey Konovalov
 wrote:
> Hi,
>
> I've got the following error report while fuzzing the kernel with syzkaller.
>
> On commit 926af6273fc683cd98cd0ce7bf0d04a02eed6742.
>
> A reproducer and .config are attached.
> Note, that it takes quite some time to trigger the bug (up to 10 minutes).
>
> BUG: KASAN: use-after-free in dccp_feat_activate_values+0x967/0xab0
> net/dccp/feat.c:1541 at addr 88003713be68
> Read of size 8 by task syz-executor2/8457
> CPU: 2 PID: 8457 Comm: syz-executor2 Not tainted 4.10.0-rc7+ #127
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  
>  __dump_stack lib/dump_stack.c:15 [inline]
>  dump_stack+0x292/0x398 lib/dump_stack.c:51
>  kasan_object_err+0x1c/0x70 mm/kasan/report.c:162
>  print_address_description mm/kasan/report.c:200 [inline]
>  kasan_report_error mm/kasan/report.c:289 [inline]
>  kasan_report.part.1+0x20e/0x4e0 mm/kasan/report.c:311
>  kasan_report mm/kasan/report.c:332 [inline]
>  __asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:332
>  dccp_feat_activate_values+0x967/0xab0 net/dccp/feat.c:1541
>  dccp_create_openreq_child+0x464/0x610 net/dccp/minisocks.c:121
>  dccp_v6_request_recv_sock+0x1f6/0x1960 net/dccp/ipv6.c:457
>  dccp_check_req+0x335/0x5a0 net/dccp/minisocks.c:186
>  dccp_v6_rcv+0x69e/0x1d00 net/dccp/ipv6.c:711
>  ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279
>  NF_HOOK include/linux/netfilter.h:257 [inline]
>  ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322
>  dst_input include/net/dst.h:507 [inline]
>  ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69
>  NF_HOOK include/linux/netfilter.h:257 [inline]
>  ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203
>  __netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190
>  __netif_receive_skb+0x2a/0x170 net/core/dev.c:4228
>  process_backlog+0xe5/0x6c0 net/core/dev.c:4839
>  napi_poll net/core/dev.c:5202 [inline]
>  net_rx_action+0xe70/0x1900 net/core/dev.c:5267
>  __do_softirq+0x2fb/0xb7d kernel/softirq.c:284
>  do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:902


Seems there is a race condition between iterating dccp_feat_entry
and freeing it, bh_lock_sock() seems not held in this path.


>  
>  do_softirq.part.17+0x1e8/0x230 kernel/softirq.c:328
>  do_softirq kernel/softirq.c:176 [inline]
>  __local_bh_enable_ip+0x1f2/0x200 kernel/softirq.c:181
>  local_bh_enable include/linux/bottom_half.h:31 [inline]
>  rcu_read_unlock_bh include/linux/rcupdate.h:971 [inline]
>  ip6_finish_output2+0xbb0/0x23d0 net/ipv6/ip6_output.c:123
>  ip6_finish_output+0x302/0x960 net/ipv6/ip6_output.c:148
>  NF_HOOK_COND include/linux/netfilter.h:246 [inline]
>  ip6_output+0x1cb/0x8d0 net/ipv6/ip6_output.c:162
>  ip6_xmit+0xcdf/0x20d0 include/net/dst.h:501
>  inet6_csk_xmit+0x320/0x5f0 net/ipv6/inet6_connection_sock.c:179
>  dccp_transmit_skb+0xb09/0x1120 net/dccp/output.c:141
>  dccp_xmit_packet+0x215/0x760 net/dccp/output.c:280
>  dccp_write_xmit+0x168/0x1d0 net/dccp/output.c:362
>  dccp_sendmsg+0x79c/0xb10 net/dccp/proto.c:796
>  inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
>  sock_sendmsg_nosec net/socket.c:635 [inline]
>  sock_sendmsg+0xca/0x110 net/socket.c:645
>  SYSC_sendto+0x660/0x810 net/socket.c:1687
>  SyS_sendto+0x40/0x50 net/socket.c:1655
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
> RIP: 0033:0x4458b9
> RSP: 002b:7f8ceb77bb58 EFLAGS: 0282 ORIG_RAX: 002c
> RAX: ffda RBX: 0017 RCX: 004458b9
> RDX: 0023 RSI: 20e6 RDI: 0017
> RBP: 006e1b90 R08: 200f9fe1 R09: 0020
> R10: 8010 R11: 0282 R12: 007080a8
> R13:  R14: 7f8ceb77c9c0 R15: 7f8ceb77c700
> Object at 88003713be50, in cache kmalloc-64 size: 64
> Allocated:
> PID = 8446
>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:502
>  set_track mm/kasan/kasan.c:514 [inline]
>  kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:605
>  kmem_cache_alloc_trace+0x82/0x270 mm/slub.c:2738
>  kmalloc include/linux/slab.h:490 [inline]
>  dccp_feat_entry_new+0x214/0x410 net/dccp/feat.c:467
>  dccp_feat_push_change+0x38/0x220 net/dccp/feat.c:487
>  __feat_register_sp+0x223/0x2f0 net/dccp/feat.c:741
>  dccp_feat_propagate_ccid+0x22b/0x2b0 net/dccp/feat.c:949
>  dccp_feat_server_ccid_dependencies+0x1b3/0x250 net/dccp/feat.c:1012
>  dccp_make_response+0x1f1/0xc90 net/dccp/output.c:423
>  dccp_v6_send_response+0x4ec/0xc20 net/dccp/ipv6.c:217
>  dccp_v6_conn_request+0xaba/0x11b0 net/dccp/ipv6.c:377
>  dccp_rcv_state_process+0x51e/0x1650 net/dccp/input.c:606
>  dccp_v6_do_rcv+0x213/0x350 net/dccp/ipv6.c:632
>  sk_backlog_rcv include/net/sock.h:893 [inline]
>  __sk_receive_skb+0x36f/0xcc0 net/core/sock.c:479
>  dccp_v6_rcv+0xba5/0x1d00 net/dccp/ipv6.c:742
>  ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279
>  NF_HOOK include/linux/netfilter.h:257 [inline]
>  ip6_input+0xdb/0x590 

Re: net/kcm: GPF in kcm_sendmsg

2017-02-13 Thread Cong Wang
On Mon, Feb 13, 2017 at 7:14 AM, Dmitry Vyukov  wrote:
> Hello,
>
> The following program triggers GPF in kcm_sendmsg:
>
>
> // autogenerated by syzkaller (http://github.com/google/syzkaller)
> #define _GNU_SOURCE
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
>
> int main()
> {
>   int sock = socket(41 /*AF_KCM*/, SOCK_SEQPACKET, 0);
>   struct mmsghdr msg;
>   memset(, 0, sizeof(msg));
>   sendmmsg(sock, , 1, 0);
>   return 0;
> }
>
>
> general protection fault:  [#1] SMP KASAN
> Modules linked in:
> CPU: 2 PID: 2935 Comm: a.out Not tainted 4.10.0-rc8+ #218
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 88006b506440 task.stack: 8800662b8000
> RIP: 0010:kcm_sendmsg+0x92e/0x2240 net/kcm/kcmsock.c:1048

Hmm, head is NULL in kcm_tx_msg(head)->last_skb = skb;,
I missed the !eor case in the previous fix.


> RSP: 0018:8800662bf720 EFLAGS: 00010202
> RAX: dc00 RBX:  RCX: 
> RDX: 0008 RSI: 88006b506c38 RDI: 0040
> RBP: 8800662bfa00 R08: 0001 R09: 
> R10: 0006 R11:  R12: 7fff
> R13: dc00 R14:  R15: 88006af12040
> FS:  01077880() GS:88006d10() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 004b2140 CR3: 651b7000 CR4: 001406e0
> Call Trace:
>  sock_sendmsg_nosec net/socket.c:635 [inline]
>  sock_sendmsg+0xca/0x110 net/socket.c:645
>  ___sys_sendmsg+0x4a3/0x9f0 net/socket.c:1985
>  __sys_sendmmsg+0x25c/0x750 net/socket.c:2075
>  SYSC_sendmmsg net/socket.c:2106 [inline]
>  SyS_sendmmsg+0x35/0x60 net/socket.c:2101
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
> RIP: 0033:0x436dc9
> RSP: 002b:7ffe84e1a938 EFLAGS: 0246 ORIG_RAX: 0133
> RAX: ffda RBX: 00401730 RCX: 00436dc9
> RDX: 0001 RSI: 7ffe84e1a950 RDI: 0003
> RBP:  R08: 000b R09: 0004
> R10:  R11: 0246 R12: 004002b0
> R13: 7ffe84e1aa88 R14: 0002 R15: 
> Code: 02 00 0f 85 d4 14 00 00 48 8b 85 c0 fd ff ff 48 8d 78 40 49 89
> 87 30 05 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80>
> 3c 02 00 0f 85 9d 14 00 00 48 8b 85 c0 fd ff ff 4c 89 70 40
> RIP: kcm_sendmsg+0x92e/0x2240 net/kcm/kcmsock.c:1048 RSP: 8800662bf720
> ---[ end trace 62093774c8609871 ]---
>
>
> On commit 7089db84e356562f8ba737c29e472cc42d530dbc (4.10-rc8).


Re: net/kcm: GPF in kcm_sendmsg

2017-02-13 Thread Cong Wang
On Mon, Feb 13, 2017 at 7:14 AM, Dmitry Vyukov  wrote:
> Hello,
>
> The following program triggers GPF in kcm_sendmsg:
>
>
> // autogenerated by syzkaller (http://github.com/google/syzkaller)
> #define _GNU_SOURCE
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
>
> int main()
> {
>   int sock = socket(41 /*AF_KCM*/, SOCK_SEQPACKET, 0);
>   struct mmsghdr msg;
>   memset(, 0, sizeof(msg));
>   sendmmsg(sock, , 1, 0);
>   return 0;
> }
>
>
> general protection fault:  [#1] SMP KASAN
> Modules linked in:
> CPU: 2 PID: 2935 Comm: a.out Not tainted 4.10.0-rc8+ #218
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 88006b506440 task.stack: 8800662b8000
> RIP: 0010:kcm_sendmsg+0x92e/0x2240 net/kcm/kcmsock.c:1048

Hmm, head is NULL in kcm_tx_msg(head)->last_skb = skb;,
I missed the !eor case in the previous fix.


> RSP: 0018:8800662bf720 EFLAGS: 00010202
> RAX: dc00 RBX:  RCX: 
> RDX: 0008 RSI: 88006b506c38 RDI: 0040
> RBP: 8800662bfa00 R08: 0001 R09: 
> R10: 0006 R11:  R12: 7fff
> R13: dc00 R14:  R15: 88006af12040
> FS:  01077880() GS:88006d10() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 004b2140 CR3: 651b7000 CR4: 001406e0
> Call Trace:
>  sock_sendmsg_nosec net/socket.c:635 [inline]
>  sock_sendmsg+0xca/0x110 net/socket.c:645
>  ___sys_sendmsg+0x4a3/0x9f0 net/socket.c:1985
>  __sys_sendmmsg+0x25c/0x750 net/socket.c:2075
>  SYSC_sendmmsg net/socket.c:2106 [inline]
>  SyS_sendmmsg+0x35/0x60 net/socket.c:2101
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
> RIP: 0033:0x436dc9
> RSP: 002b:7ffe84e1a938 EFLAGS: 0246 ORIG_RAX: 0133
> RAX: ffda RBX: 00401730 RCX: 00436dc9
> RDX: 0001 RSI: 7ffe84e1a950 RDI: 0003
> RBP:  R08: 000b R09: 0004
> R10:  R11: 0246 R12: 004002b0
> R13: 7ffe84e1aa88 R14: 0002 R15: 
> Code: 02 00 0f 85 d4 14 00 00 48 8b 85 c0 fd ff ff 48 8d 78 40 49 89
> 87 30 05 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80>
> 3c 02 00 0f 85 9d 14 00 00 48 8b 85 c0 fd ff ff 4c 89 70 40
> RIP: kcm_sendmsg+0x92e/0x2240 net/kcm/kcmsock.c:1048 RSP: 8800662bf720
> ---[ end trace 62093774c8609871 ]---
>
>
> On commit 7089db84e356562f8ba737c29e472cc42d530dbc (4.10-rc8).


Re: net/packet: use-after-free in packet_rcv_fanout

2017-02-10 Thread Cong Wang
On Fri, Feb 10, 2017 at 10:02 AM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Fri, 2017-02-10 at 09:59 -0800, Eric Dumazet wrote:
>> On Fri, 2017-02-10 at 09:49 -0800, Cong Wang wrote:
>> > On Thu, Feb 9, 2017 at 7:23 PM, Eric Dumazet <eric.duma...@gmail.com> 
>> > wrote:
>> > > On Thu, 2017-02-09 at 19:19 -0800, Eric Dumazet wrote:
>> > >
>> > >> More likely the bug is in fanout_add(), with a buggy sequence in error
>> > >> case, and not correct locking.
>> > >>
>> > >> kfree(po->rollover);
>> > >> po->rollover = NULL;
>> > >>
>> > >> Two cpus entering fanout_add() (using the same af_packet socket,
>> > >> syzkaller courtesy...) might both see po->fanout being NULL.
>> > >>
>> > >> Then they grab the mutex.  Too late...
>> > >
>> > > Patch could be :
>> > >
>> >
>> > For me, clearly the data structure that use-after-free'd is struct sock
>> > rather than struct packet_rollover.
>>
>> Fine. But your patch makes absolutely no sense.
>
> At least, Anoob patch is making a step into the right direction ;)
>
> https://patchwork.ozlabs.org/patch/726532/
>

Yeah, but still looks like a different one with the one Dmitry reported.


Re: net/packet: use-after-free in packet_rcv_fanout

2017-02-10 Thread Cong Wang
On Fri, Feb 10, 2017 at 10:02 AM, Eric Dumazet  wrote:
> On Fri, 2017-02-10 at 09:59 -0800, Eric Dumazet wrote:
>> On Fri, 2017-02-10 at 09:49 -0800, Cong Wang wrote:
>> > On Thu, Feb 9, 2017 at 7:23 PM, Eric Dumazet  
>> > wrote:
>> > > On Thu, 2017-02-09 at 19:19 -0800, Eric Dumazet wrote:
>> > >
>> > >> More likely the bug is in fanout_add(), with a buggy sequence in error
>> > >> case, and not correct locking.
>> > >>
>> > >> kfree(po->rollover);
>> > >> po->rollover = NULL;
>> > >>
>> > >> Two cpus entering fanout_add() (using the same af_packet socket,
>> > >> syzkaller courtesy...) might both see po->fanout being NULL.
>> > >>
>> > >> Then they grab the mutex.  Too late...
>> > >
>> > > Patch could be :
>> > >
>> >
>> > For me, clearly the data structure that use-after-free'd is struct sock
>> > rather than struct packet_rollover.
>>
>> Fine. But your patch makes absolutely no sense.
>
> At least, Anoob patch is making a step into the right direction ;)
>
> https://patchwork.ozlabs.org/patch/726532/
>

Yeah, but still looks like a different one with the one Dmitry reported.


Re: net/packet: use-after-free in packet_rcv_fanout

2017-02-10 Thread Cong Wang
On Fri, Feb 10, 2017 at 9:59 AM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Fri, 2017-02-10 at 09:49 -0800, Cong Wang wrote:
>> On Thu, Feb 9, 2017 at 7:23 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
>> > On Thu, 2017-02-09 at 19:19 -0800, Eric Dumazet wrote:
>> >
>> >> More likely the bug is in fanout_add(), with a buggy sequence in error
>> >> case, and not correct locking.
>> >>
>> >> kfree(po->rollover);
>> >> po->rollover = NULL;
>> >>
>> >> Two cpus entering fanout_add() (using the same af_packet socket,
>> >> syzkaller courtesy...) might both see po->fanout being NULL.
>> >>
>> >> Then they grab the mutex.  Too late...
>> >
>> > Patch could be :
>> >
>>
>> For me, clearly the data structure that use-after-free'd is struct sock
>> rather than struct packet_rollover.
>
> Fine. But your patch makes absolutely no sense.

I don't have to give a 100% correct patch to prove my explanation
of the crash. At least it makes more sense than yours...


Re: net/packet: use-after-free in packet_rcv_fanout

2017-02-10 Thread Cong Wang
On Fri, Feb 10, 2017 at 9:59 AM, Eric Dumazet  wrote:
> On Fri, 2017-02-10 at 09:49 -0800, Cong Wang wrote:
>> On Thu, Feb 9, 2017 at 7:23 PM, Eric Dumazet  wrote:
>> > On Thu, 2017-02-09 at 19:19 -0800, Eric Dumazet wrote:
>> >
>> >> More likely the bug is in fanout_add(), with a buggy sequence in error
>> >> case, and not correct locking.
>> >>
>> >> kfree(po->rollover);
>> >> po->rollover = NULL;
>> >>
>> >> Two cpus entering fanout_add() (using the same af_packet socket,
>> >> syzkaller courtesy...) might both see po->fanout being NULL.
>> >>
>> >> Then they grab the mutex.  Too late...
>> >
>> > Patch could be :
>> >
>>
>> For me, clearly the data structure that use-after-free'd is struct sock
>> rather than struct packet_rollover.
>
> Fine. But your patch makes absolutely no sense.

I don't have to give a 100% correct patch to prove my explanation
of the crash. At least it makes more sense than yours...


Re: net/packet: use-after-free in packet_rcv_fanout

2017-02-10 Thread Cong Wang
On Thu, Feb 9, 2017 at 7:33 PM, Sowmini Varadhan
 wrote:
> On (02/09/17 19:19), Eric Dumazet wrote:
>>
>> More likely the bug is in fanout_add(), with a buggy sequence in error
>> case, and not correct locking.
>>
>> kfree(po->rollover);
>> po->rollover = NULL;
>>
>> Two cpus entering fanout_add() (using the same af_packet socket,
>> syzkaller courtesy...) might both see po->fanout being NULL.
>>
>> Then they grab the mutex.  Too late...
>
> I'm not sure I follow- aiui the panic was in acceessing the
> sk_receive_queue.lock in a socket that had been closed earlier. I think
> the assumption is that rcu_read_lock_bh in __dev_queue_xmit (and
> rcu_read_lock in dev_queue_xmit_nit?) should make sure that the nit
> packet delivery can be done safely, and the synchronize_net in
> packet_release() makes sure that the Tx paths are quiesced before freeing
> the socket.  What is the race-hole here? Does it have to do with the
> _bh and softirq context, somehow?

My understanding about the race here is packet_release() doesn't
wait for flying packets correctly, which leads to a flying packet still
refers to the struct sock which is being released.

This could happen because struct packet_fanout is refcn'ted, it is
still there when this is not the last sock referring it, therefore, the
callback packet_rcv_fanout() is not removed yet. When packet_release()
tries to remove the pointer to struct sock from f->arr[i] in
__fanout_unlink(), a flying packet could race with f->arr[i]:

po = pkt_sk(f->arr[idx]);

Of course, the fix may not be as easy as just adding a synchronize_net(),
perhaps we need the spinlock too in fanout_demux_rollover().

At least I believe this explains the crash Dmitry reported.


Re: net/packet: use-after-free in packet_rcv_fanout

2017-02-10 Thread Cong Wang
On Thu, Feb 9, 2017 at 7:33 PM, Sowmini Varadhan
 wrote:
> On (02/09/17 19:19), Eric Dumazet wrote:
>>
>> More likely the bug is in fanout_add(), with a buggy sequence in error
>> case, and not correct locking.
>>
>> kfree(po->rollover);
>> po->rollover = NULL;
>>
>> Two cpus entering fanout_add() (using the same af_packet socket,
>> syzkaller courtesy...) might both see po->fanout being NULL.
>>
>> Then they grab the mutex.  Too late...
>
> I'm not sure I follow- aiui the panic was in acceessing the
> sk_receive_queue.lock in a socket that had been closed earlier. I think
> the assumption is that rcu_read_lock_bh in __dev_queue_xmit (and
> rcu_read_lock in dev_queue_xmit_nit?) should make sure that the nit
> packet delivery can be done safely, and the synchronize_net in
> packet_release() makes sure that the Tx paths are quiesced before freeing
> the socket.  What is the race-hole here? Does it have to do with the
> _bh and softirq context, somehow?

My understanding about the race here is packet_release() doesn't
wait for flying packets correctly, which leads to a flying packet still
refers to the struct sock which is being released.

This could happen because struct packet_fanout is refcn'ted, it is
still there when this is not the last sock referring it, therefore, the
callback packet_rcv_fanout() is not removed yet. When packet_release()
tries to remove the pointer to struct sock from f->arr[i] in
__fanout_unlink(), a flying packet could race with f->arr[i]:

po = pkt_sk(f->arr[idx]);

Of course, the fix may not be as easy as just adding a synchronize_net(),
perhaps we need the spinlock too in fanout_demux_rollover().

At least I believe this explains the crash Dmitry reported.


Re: net/packet: use-after-free in packet_rcv_fanout

2017-02-10 Thread Cong Wang
On Thu, Feb 9, 2017 at 7:23 PM, Eric Dumazet  wrote:
> On Thu, 2017-02-09 at 19:19 -0800, Eric Dumazet wrote:
>
>> More likely the bug is in fanout_add(), with a buggy sequence in error
>> case, and not correct locking.
>>
>> kfree(po->rollover);
>> po->rollover = NULL;
>>
>> Two cpus entering fanout_add() (using the same af_packet socket,
>> syzkaller courtesy...) might both see po->fanout being NULL.
>>
>> Then they grab the mutex.  Too late...
>
> Patch could be :
>

For me, clearly the data structure that use-after-free'd is struct sock
rather than struct packet_rollover.


Re: net/packet: use-after-free in packet_rcv_fanout

2017-02-10 Thread Cong Wang
On Thu, Feb 9, 2017 at 7:23 PM, Eric Dumazet  wrote:
> On Thu, 2017-02-09 at 19:19 -0800, Eric Dumazet wrote:
>
>> More likely the bug is in fanout_add(), with a buggy sequence in error
>> case, and not correct locking.
>>
>> kfree(po->rollover);
>> po->rollover = NULL;
>>
>> Two cpus entering fanout_add() (using the same af_packet socket,
>> syzkaller courtesy...) might both see po->fanout being NULL.
>>
>> Then they grab the mutex.  Too late...
>
> Patch could be :
>

For me, clearly the data structure that use-after-free'd is struct sock
rather than struct packet_rollover.


Re: fs, net: deadlock between bind/splice on af_unix

2017-02-09 Thread Cong Wang
On Tue, Feb 7, 2017 at 6:20 AM, Mateusz Guzik  wrote:
>
> Yes, but unix_release_sock is expected to leave the file behind.
> Note I'm not claiming there is a leak, but that racing threads will be
> able to trigger a condition where you create a file and fail to bind it.
>

Which is expected, right? No one guarantees the success of file
creation is the success of bind, the previous code does but it is not
part of API AFAIK. Should a sane user-space application check
the file creation for a successful bind() or just check its return value?

> What to do with the file now?
>

We just do what unix_release_sock() does, so why do you keep
asking the same question?

If you still complain about the race with user-space, think about the
same race in-between a successful bind() and close(), nothing is new.


Re: fs, net: deadlock between bind/splice on af_unix

2017-02-09 Thread Cong Wang
On Tue, Feb 7, 2017 at 6:20 AM, Mateusz Guzik  wrote:
>
> Yes, but unix_release_sock is expected to leave the file behind.
> Note I'm not claiming there is a leak, but that racing threads will be
> able to trigger a condition where you create a file and fail to bind it.
>

Which is expected, right? No one guarantees the success of file
creation is the success of bind, the previous code does but it is not
part of API AFAIK. Should a sane user-space application check
the file creation for a successful bind() or just check its return value?

> What to do with the file now?
>

We just do what unix_release_sock() does, so why do you keep
asking the same question?

If you still complain about the race with user-space, think about the
same race in-between a successful bind() and close(), nothing is new.


Re: net/packet: use-after-free in packet_rcv_fanout

2017-02-09 Thread Cong Wang
On Thu, Feb 9, 2017 at 5:14 AM, Dmitry Vyukov  wrote:
> Hello,
>
> I've got the following use-after-free report in packet_rcv_fanout
> while running syzkaller fuzzer on linux-next
> e3e6c5f3544c5d05c6b3b309a34f4f2c3537e993. So far it happened once and
> is not reproducible, but maybe the stacks will allow you to figure out
> what happens.
>
> BUG: KASAN: use-after-free in __lock_acquire+0x3212/0x3430
> kernel/locking/lockdep.c:3224 at addr 8801d903d538
> Read of size 8 by task syz-executor1/10596
> CPU: 1 PID: 10596 Comm: syz-executor1 Not tainted 4.10.0-rc7-next-20170208 #1
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 01/01/2011
>
> Call Trace:
>  __asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:332
>  __lock_acquire+0x3212/0x3430 kernel/locking/lockdep.c:3224
>  lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
>  __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
>  _raw_spin_lock_bh+0x3a/0x50 kernel/locking/spinlock.c:175
>  spin_lock_bh include/linux/spinlock.h:304 [inline]
>  packet_rcv_has_room+0x25/0xb0 net/packet/af_packet.c:1308
>  fanout_demux_rollover+0x3bb/0x6b0 net/packet/af_packet.c:1388
>  packet_rcv_fanout+0x674/0x800 net/packet/af_packet.c:1490
>  dev_queue_xmit_nit+0x73a/0xa90 net/core/dev.c:1898
>  xmit_one net/core/dev.c:2870 [inline]
>  dev_hard_start_xmit+0x16b/0xab0 net/core/dev.c:2890
>  __dev_queue_xmit+0x16d1/0x1e60 net/core/dev.c:3355
>  dev_queue_xmit+0x17/0x20 net/core/dev.c:3388
>  neigh_hh_output include/net/neighbour.h:468 [inline]
>  dst_neigh_output include/net/dst.h:452 [inline]
>  ip6_finish_output2+0x1461/0x2380 net/ipv6/ip6_output.c:123
>  ip6_finish_output+0x2f9/0x950 net/ipv6/ip6_output.c:149
>  NF_HOOK_COND include/linux/netfilter.h:246 [inline]
>  ip6_output+0x1cb/0x8c0 net/ipv6/ip6_output.c:163
>  ip6_xmit+0xc2f/0x1e80 include/net/dst.h:498
>  inet6_csk_xmit+0x320/0x5d0 net/ipv6/inet6_connection_sock.c:139
>  tcp_transmit_skb+0x1ab4/0x3460 net/ipv4/tcp_output.c:1054
>  tcp_send_syn_data net/ipv4/tcp_output.c:3343 [inline]
>  tcp_connect+0x11a7/0x2f50 net/ipv4/tcp_output.c:3375
>  tcp_v6_connect+0x1a6e/0x1f70 net/ipv6/tcp_ipv6.c:295
>  __inet_stream_connect+0x2d1/0xf80 net/ipv4/af_inet.c:618
>  tcp_sendmsg_fastopen net/ipv4/tcp.c:1110 [inline]
>  tcp_sendmsg+0x23ac/0x3bd0 net/ipv4/tcp.c:1133
>  inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:761
>  sock_sendmsg_nosec net/socket.c:633 [inline]
>  sock_sendmsg+0xca/0x110 net/socket.c:643
>  SYSC_sendto+0x660/0x810 net/socket.c:1685
>  SyS_sendto+0x40/0x50 net/socket.c:1653
>  entry_SYSCALL_64_fastpath+0x1f/0xc2

It seems on-flying packets could still refer the struct sock pointer
via f->arr[i], if so we need a sync before unlinking it:

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index d56ee46..8724a98 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2924,6 +2924,8 @@ static int packet_release(struct socket *sock)
sock_prot_inuse_add(net, sk->sk_prot, -1);
preempt_enable();

+   synchronize_net();
+
spin_lock(>bind_lock);
unregister_prot_hook(sk, false);
packet_cached_dev_reset(po);


Re: net/packet: use-after-free in packet_rcv_fanout

2017-02-09 Thread Cong Wang
On Thu, Feb 9, 2017 at 5:14 AM, Dmitry Vyukov  wrote:
> Hello,
>
> I've got the following use-after-free report in packet_rcv_fanout
> while running syzkaller fuzzer on linux-next
> e3e6c5f3544c5d05c6b3b309a34f4f2c3537e993. So far it happened once and
> is not reproducible, but maybe the stacks will allow you to figure out
> what happens.
>
> BUG: KASAN: use-after-free in __lock_acquire+0x3212/0x3430
> kernel/locking/lockdep.c:3224 at addr 8801d903d538
> Read of size 8 by task syz-executor1/10596
> CPU: 1 PID: 10596 Comm: syz-executor1 Not tainted 4.10.0-rc7-next-20170208 #1
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 01/01/2011
>
> Call Trace:
>  __asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:332
>  __lock_acquire+0x3212/0x3430 kernel/locking/lockdep.c:3224
>  lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
>  __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
>  _raw_spin_lock_bh+0x3a/0x50 kernel/locking/spinlock.c:175
>  spin_lock_bh include/linux/spinlock.h:304 [inline]
>  packet_rcv_has_room+0x25/0xb0 net/packet/af_packet.c:1308
>  fanout_demux_rollover+0x3bb/0x6b0 net/packet/af_packet.c:1388
>  packet_rcv_fanout+0x674/0x800 net/packet/af_packet.c:1490
>  dev_queue_xmit_nit+0x73a/0xa90 net/core/dev.c:1898
>  xmit_one net/core/dev.c:2870 [inline]
>  dev_hard_start_xmit+0x16b/0xab0 net/core/dev.c:2890
>  __dev_queue_xmit+0x16d1/0x1e60 net/core/dev.c:3355
>  dev_queue_xmit+0x17/0x20 net/core/dev.c:3388
>  neigh_hh_output include/net/neighbour.h:468 [inline]
>  dst_neigh_output include/net/dst.h:452 [inline]
>  ip6_finish_output2+0x1461/0x2380 net/ipv6/ip6_output.c:123
>  ip6_finish_output+0x2f9/0x950 net/ipv6/ip6_output.c:149
>  NF_HOOK_COND include/linux/netfilter.h:246 [inline]
>  ip6_output+0x1cb/0x8c0 net/ipv6/ip6_output.c:163
>  ip6_xmit+0xc2f/0x1e80 include/net/dst.h:498
>  inet6_csk_xmit+0x320/0x5d0 net/ipv6/inet6_connection_sock.c:139
>  tcp_transmit_skb+0x1ab4/0x3460 net/ipv4/tcp_output.c:1054
>  tcp_send_syn_data net/ipv4/tcp_output.c:3343 [inline]
>  tcp_connect+0x11a7/0x2f50 net/ipv4/tcp_output.c:3375
>  tcp_v6_connect+0x1a6e/0x1f70 net/ipv6/tcp_ipv6.c:295
>  __inet_stream_connect+0x2d1/0xf80 net/ipv4/af_inet.c:618
>  tcp_sendmsg_fastopen net/ipv4/tcp.c:1110 [inline]
>  tcp_sendmsg+0x23ac/0x3bd0 net/ipv4/tcp.c:1133
>  inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:761
>  sock_sendmsg_nosec net/socket.c:633 [inline]
>  sock_sendmsg+0xca/0x110 net/socket.c:643
>  SYSC_sendto+0x660/0x810 net/socket.c:1685
>  SyS_sendto+0x40/0x50 net/socket.c:1653
>  entry_SYSCALL_64_fastpath+0x1f/0xc2

It seems on-flying packets could still refer the struct sock pointer
via f->arr[i], if so we need a sync before unlinking it:

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index d56ee46..8724a98 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2924,6 +2924,8 @@ static int packet_release(struct socket *sock)
sock_prot_inuse_add(net, sk->sk_prot, -1);
preempt_enable();

+   synchronize_net();
+
spin_lock(>bind_lock);
unregister_prot_hook(sk, false);
packet_cached_dev_reset(po);


[PATCH resend] 9p: fix a potential acl leak

2017-02-09 Thread Cong Wang
posix_acl_update_mode() could possibly clear 'acl', if so
we leak the memory pointed by 'acl'. Save this pointer
before calling posix_acl_update_mode() and release the memory
if 'acl' really gets cleared.

Reported-by: Mark Salyzyn <saly...@android.com>
Reviewed-by: Jan Kara <j...@suse.cz>
Reviewed-by: Greg Kurz <gr...@kaod.org>
Cc: Eric Van Hensbergen <eri...@gmail.com>
Cc: Ron Minnich <rminn...@sandia.gov>
Cc: Latchesar Ionkov <lu...@ionkov.net>
Cc: Andrew Morton <a...@linux-foundation.org>
Signed-off-by: Cong Wang <xiyou.wangc...@gmail.com>
---
 fs/9p/acl.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/9p/acl.c b/fs/9p/acl.c
index b3c2cc7..082d227 100644
--- a/fs/9p/acl.c
+++ b/fs/9p/acl.c
@@ -277,6 +277,7 @@ static int v9fs_xattr_set_acl(const struct xattr_handler 
*handler,
case ACL_TYPE_ACCESS:
if (acl) {
struct iattr iattr;
+   struct posix_acl *old_acl = acl;
 
retval = posix_acl_update_mode(inode, _mode, 
);
if (retval)
@@ -287,6 +288,7 @@ static int v9fs_xattr_set_acl(const struct xattr_handler 
*handler,
 * by the mode bits. So don't
 * update ACL.
 */
+   posix_acl_release(old_acl);
value = NULL;
size = 0;
}
-- 
2.5.5



[PATCH resend] 9p: fix a potential acl leak

2017-02-09 Thread Cong Wang
posix_acl_update_mode() could possibly clear 'acl', if so
we leak the memory pointed by 'acl'. Save this pointer
before calling posix_acl_update_mode() and release the memory
if 'acl' really gets cleared.

Reported-by: Mark Salyzyn 
Reviewed-by: Jan Kara 
Reviewed-by: Greg Kurz 
Cc: Eric Van Hensbergen 
Cc: Ron Minnich 
Cc: Latchesar Ionkov 
Cc: Andrew Morton 
Signed-off-by: Cong Wang 
---
 fs/9p/acl.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/9p/acl.c b/fs/9p/acl.c
index b3c2cc7..082d227 100644
--- a/fs/9p/acl.c
+++ b/fs/9p/acl.c
@@ -277,6 +277,7 @@ static int v9fs_xattr_set_acl(const struct xattr_handler 
*handler,
case ACL_TYPE_ACCESS:
if (acl) {
struct iattr iattr;
+   struct posix_acl *old_acl = acl;
 
retval = posix_acl_update_mode(inode, _mode, 
);
if (retval)
@@ -287,6 +288,7 @@ static int v9fs_xattr_set_acl(const struct xattr_handler 
*handler,
 * by the mode bits. So don't
 * update ACL.
 */
+   posix_acl_release(old_acl);
value = NULL;
size = 0;
}
-- 
2.5.5



Re: net/ipv6: double free in ipip6_dev_free

2017-02-08 Thread Cong Wang
On Wed, Feb 8, 2017 at 5:56 AM, Dmitry Vyukov  wrote:
> First dev->tstats was freed here:
>
> 1376 static int ipip6_tunnel_init(struct net_device *dev)
> 1377 {
> 1378 struct ip_tunnel *tunnel = netdev_priv(dev);
> 1379 int err;
> 1380
> 1381 tunnel->dev = dev;
> 1382 tunnel->net = dev_net(dev);
> 1383 strcpy(tunnel->parms.name, dev->name);
> 1384
> 1385 ipip6_tunnel_bind_dev(dev);
> 1386 dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
> 1387 if (!dev->tstats)
> 1388 return -ENOMEM;
> 1389
> 1390 err = dst_cache_init(>dst_cache, GFP_KERNEL);
> 1391 if (err) {
> 1392 free_percpu(dev->tstats);
> 1393 return err;
> 1394 }
> 1395
> 1396 return 0;
> 1397 }
>
> And then again here:
>
> 1342 static void ipip6_dev_free(struct net_device *dev)
> 1343 {
> 1344 struct ip_tunnel *tunnel = netdev_priv(dev);
> 1345
> 1346 dst_cache_destroy(>dst_cache);
> 1347 free_percpu(dev->tstats);
> 1348 free_netdev(dev);
> 1349 }

Probably we need to NULL dev->tstats in the ndo_init(),
and ipip6 tunnel seems not the only one missing it.


Re: net/ipv6: double free in ipip6_dev_free

2017-02-08 Thread Cong Wang
On Wed, Feb 8, 2017 at 5:56 AM, Dmitry Vyukov  wrote:
> First dev->tstats was freed here:
>
> 1376 static int ipip6_tunnel_init(struct net_device *dev)
> 1377 {
> 1378 struct ip_tunnel *tunnel = netdev_priv(dev);
> 1379 int err;
> 1380
> 1381 tunnel->dev = dev;
> 1382 tunnel->net = dev_net(dev);
> 1383 strcpy(tunnel->parms.name, dev->name);
> 1384
> 1385 ipip6_tunnel_bind_dev(dev);
> 1386 dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
> 1387 if (!dev->tstats)
> 1388 return -ENOMEM;
> 1389
> 1390 err = dst_cache_init(>dst_cache, GFP_KERNEL);
> 1391 if (err) {
> 1392 free_percpu(dev->tstats);
> 1393 return err;
> 1394 }
> 1395
> 1396 return 0;
> 1397 }
>
> And then again here:
>
> 1342 static void ipip6_dev_free(struct net_device *dev)
> 1343 {
> 1344 struct ip_tunnel *tunnel = netdev_priv(dev);
> 1345
> 1346 dst_cache_destroy(>dst_cache);
> 1347 free_percpu(dev->tstats);
> 1348 free_netdev(dev);
> 1349 }

Probably we need to NULL dev->tstats in the ndo_init(),
and ipip6 tunnel seems not the only one missing it.


Re: [RFC] igmp: address pmc kmemleak from on igmpv3_del_delrec()

2017-02-06 Thread Cong Wang
On Fri, Feb 3, 2017 at 1:20 PM, Luis R. Rodriguez  wrote:
> When we igmpv3_add_delrec() we kzalloc the pmc, but when users
> calligmpv3_del_delrec() we never free the pmc. This was caught
> by the following kmemleak splat:
>
> unreferenced object 0x99666ff43b40 (size 192):
>   comm "systemd-resolve", pid 1258, jiffies 4309905600 (age 2138.352s)
>   hex dump (first 32 bytes):
> 00 6a 64 72 66 99 ff ff e0 00 00 fc 00 00 00 00  .jdrf...
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
>   backtrace:
> [] kmemleak_alloc+0x4a/0xa0
> [] kmem_cache_alloc_trace+0x107/0x240
> [] igmp_group_dropped+0xfd/0x270
> [] ip_mc_dec_group+0xaf/0x110
> [] ip_mc_leave_group+0xb6/0x140
> [] do_ip_setsockopt.isra.13+0x4c7/0xed0
> [] ip_setsockopt+0x34/0xb0
> [] udp_setsockopt+0x1b/0x30
> [] sock_common_setsockopt+0x14/0x20
> [] SyS_setsockopt+0x80/0xe0
> [] do_syscall_64+0x5b/0xc0
> [] return_from_SYSCALL_64+0x0/0x6a
> [] 0x
>
> Signed-off-by: Luis R. Rodriguez 
> ---
>
> I can reproduce this over time on a qemu box running next-20170125.
> After running this for a while I no longer see the splat. This needs
> confirmation form folks more familiar with the code, hence RFC. If
> this is a real fix we need appropriate tags for the patch.


Looks good to me. Adding some people who recent touched it to CC.

>
>  net/ipv4/igmp.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
> index 5b15459955f8..44fd86de2823 100644
> --- a/net/ipv4/igmp.c
> +++ b/net/ipv4/igmp.c
> @@ -1172,6 +1172,7 @@ static void igmpv3_del_delrec(struct in_device *in_dev, 
> struct ip_mc_list *im)
> psf->sf_crcount = im->crcount;
> }
> in_dev_put(pmc->interface);
> +   kfree(pmc);
> }
> spin_unlock_bh(>lock);
>  }
> --
> 2.11.0
>


Re: [RFC] igmp: address pmc kmemleak from on igmpv3_del_delrec()

2017-02-06 Thread Cong Wang
On Fri, Feb 3, 2017 at 1:20 PM, Luis R. Rodriguez  wrote:
> When we igmpv3_add_delrec() we kzalloc the pmc, but when users
> calligmpv3_del_delrec() we never free the pmc. This was caught
> by the following kmemleak splat:
>
> unreferenced object 0x99666ff43b40 (size 192):
>   comm "systemd-resolve", pid 1258, jiffies 4309905600 (age 2138.352s)
>   hex dump (first 32 bytes):
> 00 6a 64 72 66 99 ff ff e0 00 00 fc 00 00 00 00  .jdrf...
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
>   backtrace:
> [] kmemleak_alloc+0x4a/0xa0
> [] kmem_cache_alloc_trace+0x107/0x240
> [] igmp_group_dropped+0xfd/0x270
> [] ip_mc_dec_group+0xaf/0x110
> [] ip_mc_leave_group+0xb6/0x140
> [] do_ip_setsockopt.isra.13+0x4c7/0xed0
> [] ip_setsockopt+0x34/0xb0
> [] udp_setsockopt+0x1b/0x30
> [] sock_common_setsockopt+0x14/0x20
> [] SyS_setsockopt+0x80/0xe0
> [] do_syscall_64+0x5b/0xc0
> [] return_from_SYSCALL_64+0x0/0x6a
> [] 0x
>
> Signed-off-by: Luis R. Rodriguez 
> ---
>
> I can reproduce this over time on a qemu box running next-20170125.
> After running this for a while I no longer see the splat. This needs
> confirmation form folks more familiar with the code, hence RFC. If
> this is a real fix we need appropriate tags for the patch.


Looks good to me. Adding some people who recent touched it to CC.

>
>  net/ipv4/igmp.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
> index 5b15459955f8..44fd86de2823 100644
> --- a/net/ipv4/igmp.c
> +++ b/net/ipv4/igmp.c
> @@ -1172,6 +1172,7 @@ static void igmpv3_del_delrec(struct in_device *in_dev, 
> struct ip_mc_list *im)
> psf->sf_crcount = im->crcount;
> }
> in_dev_put(pmc->interface);
> +   kfree(pmc);
> }
> spin_unlock_bh(>lock);
>  }
> --
> 2.11.0
>


Re: net/kcm: WARNING in kcm_write_msgs

2017-02-06 Thread Cong Wang
On Mon, Feb 6, 2017 at 4:43 AM, Dmitry Vyukov  wrote:
> [resending as plain text]
>
> Hello,
>
> The following program triggers WARNING in kcm_write_msgs:
>
> WARNING: CPU: 3 PID: 2936 at net/kcm/kcmsock.c:627
> kcm_write_msgs+0x12e3/0x1b90 net/kcm/kcmsock.c:627
> CPU: 3 PID: 2936 Comm: a.out Not tainted 4.10.0-rc6+ #209
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:15 [inline]
>  dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
>  panic+0x1fb/0x412 kernel/panic.c:179
>  __warn+0x1c4/0x1e0 kernel/panic.c:539
>  warn_slowpath_null+0x2c/0x40 kernel/panic.c:582
>  kcm_write_msgs+0x12e3/0x1b90 net/kcm/kcmsock.c:627
>  kcm_sendmsg+0x163a/0x2200 net/kcm/kcmsock.c:1029
>  sock_sendmsg_nosec net/socket.c:635 [inline]
>  sock_sendmsg+0xca/0x110 net/socket.c:645
>  sock_write_iter+0x326/0x600 net/socket.c:848
>  new_sync_write fs/read_write.c:499 [inline]
>  __vfs_write+0x483/0x740 fs/read_write.c:512
>  vfs_write+0x187/0x530 fs/read_write.c:560
>  SYSC_write fs/read_write.c:607 [inline]
>  SyS_write+0xfb/0x230 fs/read_write.c:599
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
[...]
>   syscall(__NR_write, sock2, 0x208aaf27ul, 0x0ul);

Looks like len == 0 case is not handled correctly in kcm_sendmsg().
The attached patch fixes it, but I am not sure if it is correct in all
cases yet, the logic is complicated.
diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
index 7e08a4d..64f0e85 100644
--- a/net/kcm/kcmsock.c
+++ b/net/kcm/kcmsock.c
@@ -929,23 +929,25 @@ static int kcm_sendmsg(struct socket *sock, struct msghdr 
*msg, size_t len)
goto out_error;
}
 
-   /* New message, alloc head skb */
-   head = alloc_skb(0, sk->sk_allocation);
-   while (!head) {
-   kcm_push(kcm);
-   err = sk_stream_wait_memory(sk, );
-   if (err)
-   goto out_error;
-
+   if (msg_data_left(msg)) {
+   /* New message, alloc head skb */
head = alloc_skb(0, sk->sk_allocation);
-   }
+   while (!head) {
+   kcm_push(kcm);
+   err = sk_stream_wait_memory(sk, );
+   if (err)
+   goto out_error;
 
-   skb = head;
+   head = alloc_skb(0, sk->sk_allocation);
+   }
 
-   /* Set ip_summed to CHECKSUM_UNNECESSARY to avoid calling
-* csum_and_copy_from_iter from skb_do_copy_data_nocache.
-*/
-   skb->ip_summed = CHECKSUM_UNNECESSARY;
+   skb = head;
+
+   /* Set ip_summed to CHECKSUM_UNNECESSARY to avoid calling
+* csum_and_copy_from_iter from skb_do_copy_data_nocache.
+*/
+   skb->ip_summed = CHECKSUM_UNNECESSARY;
+   }
 
 start:
while (msg_data_left(msg)) {
@@ -1018,10 +1020,12 @@ static int kcm_sendmsg(struct socket *sock, struct 
msghdr *msg, size_t len)
if (eor) {
bool not_busy = skb_queue_empty(>sk_write_queue);
 
-   /* Message complete, queue it on send buffer */
-   __skb_queue_tail(>sk_write_queue, head);
-   kcm->seq_skb = NULL;
-   KCM_STATS_INCR(kcm->stats.tx_msgs);
+   if (head) {
+   /* Message complete, queue it on send buffer */
+   __skb_queue_tail(>sk_write_queue, head);
+   kcm->seq_skb = NULL;
+   KCM_STATS_INCR(kcm->stats.tx_msgs);
+   }
 
if (msg->msg_flags & MSG_BATCH) {
kcm->tx_wait_more = true;


Re: net/kcm: WARNING in kcm_write_msgs

2017-02-06 Thread Cong Wang
On Mon, Feb 6, 2017 at 4:43 AM, Dmitry Vyukov  wrote:
> [resending as plain text]
>
> Hello,
>
> The following program triggers WARNING in kcm_write_msgs:
>
> WARNING: CPU: 3 PID: 2936 at net/kcm/kcmsock.c:627
> kcm_write_msgs+0x12e3/0x1b90 net/kcm/kcmsock.c:627
> CPU: 3 PID: 2936 Comm: a.out Not tainted 4.10.0-rc6+ #209
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:15 [inline]
>  dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
>  panic+0x1fb/0x412 kernel/panic.c:179
>  __warn+0x1c4/0x1e0 kernel/panic.c:539
>  warn_slowpath_null+0x2c/0x40 kernel/panic.c:582
>  kcm_write_msgs+0x12e3/0x1b90 net/kcm/kcmsock.c:627
>  kcm_sendmsg+0x163a/0x2200 net/kcm/kcmsock.c:1029
>  sock_sendmsg_nosec net/socket.c:635 [inline]
>  sock_sendmsg+0xca/0x110 net/socket.c:645
>  sock_write_iter+0x326/0x600 net/socket.c:848
>  new_sync_write fs/read_write.c:499 [inline]
>  __vfs_write+0x483/0x740 fs/read_write.c:512
>  vfs_write+0x187/0x530 fs/read_write.c:560
>  SYSC_write fs/read_write.c:607 [inline]
>  SyS_write+0xfb/0x230 fs/read_write.c:599
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
[...]
>   syscall(__NR_write, sock2, 0x208aaf27ul, 0x0ul);

Looks like len == 0 case is not handled correctly in kcm_sendmsg().
The attached patch fixes it, but I am not sure if it is correct in all
cases yet, the logic is complicated.
diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
index 7e08a4d..64f0e85 100644
--- a/net/kcm/kcmsock.c
+++ b/net/kcm/kcmsock.c
@@ -929,23 +929,25 @@ static int kcm_sendmsg(struct socket *sock, struct msghdr 
*msg, size_t len)
goto out_error;
}
 
-   /* New message, alloc head skb */
-   head = alloc_skb(0, sk->sk_allocation);
-   while (!head) {
-   kcm_push(kcm);
-   err = sk_stream_wait_memory(sk, );
-   if (err)
-   goto out_error;
-
+   if (msg_data_left(msg)) {
+   /* New message, alloc head skb */
head = alloc_skb(0, sk->sk_allocation);
-   }
+   while (!head) {
+   kcm_push(kcm);
+   err = sk_stream_wait_memory(sk, );
+   if (err)
+   goto out_error;
 
-   skb = head;
+   head = alloc_skb(0, sk->sk_allocation);
+   }
 
-   /* Set ip_summed to CHECKSUM_UNNECESSARY to avoid calling
-* csum_and_copy_from_iter from skb_do_copy_data_nocache.
-*/
-   skb->ip_summed = CHECKSUM_UNNECESSARY;
+   skb = head;
+
+   /* Set ip_summed to CHECKSUM_UNNECESSARY to avoid calling
+* csum_and_copy_from_iter from skb_do_copy_data_nocache.
+*/
+   skb->ip_summed = CHECKSUM_UNNECESSARY;
+   }
 
 start:
while (msg_data_left(msg)) {
@@ -1018,10 +1020,12 @@ static int kcm_sendmsg(struct socket *sock, struct 
msghdr *msg, size_t len)
if (eor) {
bool not_busy = skb_queue_empty(>sk_write_queue);
 
-   /* Message complete, queue it on send buffer */
-   __skb_queue_tail(>sk_write_queue, head);
-   kcm->seq_skb = NULL;
-   KCM_STATS_INCR(kcm->stats.tx_msgs);
+   if (head) {
+   /* Message complete, queue it on send buffer */
+   __skb_queue_tail(>sk_write_queue, head);
+   kcm->seq_skb = NULL;
+   KCM_STATS_INCR(kcm->stats.tx_msgs);
+   }
 
if (msg->msg_flags & MSG_BATCH) {
kcm->tx_wait_more = true;


Re: net/icmp: null-ptr-deref in ping_v4_push_pending_frames

2017-02-06 Thread Cong Wang
On Mon, Feb 6, 2017 at 11:39 AM, Andrey Konovalov  wrote:
> Hi,
>
> I've got the following error report while running the syzkaller fuzzer.
>
> The null-ptr-deref is caused by sendto() on a socket(PF_INET,
> SOCK_DGRAM, PROT_ICMP).
> Note, that this requires the ability to create such sockets, which can
> be configured by net.ipv4.ping_group_range
> (https://lwn.net/Articles/422330/).
>
> A reproducer and .config are attached.
>
> On commit a572a1b999489efb591287632279c6c9eca3e4ed.
>
> general protection fault:  [#1] SMP KASAN
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 2 PID: 3880 Comm: syz-executor1 Not tainted 4.10.0-rc6+ #124
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 880060048040 task.stack: 880069be8000
> RIP: 0010:ping_v4_push_pending_frames net/ipv4/ping.c:647 [inline]
> RIP: 0010:ping_v4_sendmsg+0x1acd/0x23f0 net/ipv4/ping.c:837
> RSP: 0018:880069bef8b8 EFLAGS: 00010206
> RAX: dc00 RBX: 880069befb90 RCX: 
> RDX: 0018 RSI: 880069befa30 RDI: 00c2
> RBP: 880069befbb8 R08: 0008 R09: 
> R10: 0002 R11:  R12: 880069befab0
> R13: 88006c624a80 R14: 880069befa70 R15: 
> FS:  7f6f7c716700() GS:88006de0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 004a6f28 CR3: 3a134000 CR4: 06e0
> Call Trace:
>  inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
>  sock_sendmsg_nosec net/socket.c:635 [inline]
>  sock_sendmsg+0xca/0x110 net/socket.c:645
>  SYSC_sendto+0x660/0x810 net/socket.c:1687
>  SyS_sendto+0x40/0x50 net/socket.c:1655
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
> RIP: 0033:0x445879
> RSP: 002b:7f6f7c715b58 EFLAGS: 0282 ORIG_RAX: 002c
> RAX: ffda RBX: 0005 RCX: 00445879
> RDX: 0008 RSI: 20001000 RDI: 0005
> RBP: 006e1ca0 R08: 20ed9ff0 R09: 0010
> R10: 2010 R11: 0282 R12: 00708000
> R13:  R14: 7f6f7c7169c0 R15: 7f6f7c716700
> Code: 38 ca 7c 08 84 c9 0f 85 35 03 00 00 49 8d bf c2 00 00 00 66 89
> 83 a2 fe ff ff 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f>
> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85
> RIP: ping_v4_push_pending_frames net/ipv4/ping.c:647 [inline] RSP:
> 880069bef8b8
> RIP: ping_v4_sendmsg+0x1acd/0x23f0 net/ipv4/ping.c:837 RSP: 880069bef8b8
> ---[ end trace 13dad24b243d08a7 ]---

This fixes it for me:

diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 86cca61..68d77b1 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -642,6 +642,8 @@ static int ping_v4_push_pending_frames(struct sock
*sk, struct pingfakehdr *pfh,
 {
struct sk_buff *skb = skb_peek(>sk_write_queue);

+   if (!skb)
+   return 0;
pfh->wcheck = csum_partial((char *)>icmph,
sizeof(struct icmphdr), pfh->wcheck);
pfh->icmph.checksum = csum_fold(pfh->wcheck);


Re: net/icmp: null-ptr-deref in ping_v4_push_pending_frames

2017-02-06 Thread Cong Wang
On Mon, Feb 6, 2017 at 11:39 AM, Andrey Konovalov  wrote:
> Hi,
>
> I've got the following error report while running the syzkaller fuzzer.
>
> The null-ptr-deref is caused by sendto() on a socket(PF_INET,
> SOCK_DGRAM, PROT_ICMP).
> Note, that this requires the ability to create such sockets, which can
> be configured by net.ipv4.ping_group_range
> (https://lwn.net/Articles/422330/).
>
> A reproducer and .config are attached.
>
> On commit a572a1b999489efb591287632279c6c9eca3e4ed.
>
> general protection fault:  [#1] SMP KASAN
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 2 PID: 3880 Comm: syz-executor1 Not tainted 4.10.0-rc6+ #124
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 880060048040 task.stack: 880069be8000
> RIP: 0010:ping_v4_push_pending_frames net/ipv4/ping.c:647 [inline]
> RIP: 0010:ping_v4_sendmsg+0x1acd/0x23f0 net/ipv4/ping.c:837
> RSP: 0018:880069bef8b8 EFLAGS: 00010206
> RAX: dc00 RBX: 880069befb90 RCX: 
> RDX: 0018 RSI: 880069befa30 RDI: 00c2
> RBP: 880069befbb8 R08: 0008 R09: 
> R10: 0002 R11:  R12: 880069befab0
> R13: 88006c624a80 R14: 880069befa70 R15: 
> FS:  7f6f7c716700() GS:88006de0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 004a6f28 CR3: 3a134000 CR4: 06e0
> Call Trace:
>  inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
>  sock_sendmsg_nosec net/socket.c:635 [inline]
>  sock_sendmsg+0xca/0x110 net/socket.c:645
>  SYSC_sendto+0x660/0x810 net/socket.c:1687
>  SyS_sendto+0x40/0x50 net/socket.c:1655
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
> RIP: 0033:0x445879
> RSP: 002b:7f6f7c715b58 EFLAGS: 0282 ORIG_RAX: 002c
> RAX: ffda RBX: 0005 RCX: 00445879
> RDX: 0008 RSI: 20001000 RDI: 0005
> RBP: 006e1ca0 R08: 20ed9ff0 R09: 0010
> R10: 2010 R11: 0282 R12: 00708000
> R13:  R14: 7f6f7c7169c0 R15: 7f6f7c716700
> Code: 38 ca 7c 08 84 c9 0f 85 35 03 00 00 49 8d bf c2 00 00 00 66 89
> 83 a2 fe ff ff 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f>
> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85
> RIP: ping_v4_push_pending_frames net/ipv4/ping.c:647 [inline] RSP:
> 880069bef8b8
> RIP: ping_v4_sendmsg+0x1acd/0x23f0 net/ipv4/ping.c:837 RSP: 880069bef8b8
> ---[ end trace 13dad24b243d08a7 ]---

This fixes it for me:

diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 86cca61..68d77b1 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -642,6 +642,8 @@ static int ping_v4_push_pending_frames(struct sock
*sk, struct pingfakehdr *pfh,
 {
struct sk_buff *skb = skb_peek(>sk_write_queue);

+   if (!skb)
+   return 0;
pfh->wcheck = csum_partial((char *)>icmph,
sizeof(struct icmphdr), pfh->wcheck);
pfh->icmph.checksum = csum_fold(pfh->wcheck);


Re: fs, net: deadlock between bind/splice on af_unix

2017-02-05 Thread Cong Wang
On Tue, Jan 31, 2017 at 10:14 AM, Mateusz Guzik <mgu...@redhat.com> wrote:
> On Mon, Jan 30, 2017 at 10:44:03PM -0800, Cong Wang wrote:
>> Mind being more specific?
>
> Consider 2 threads which bind the same socket, but with different paths.
>
> Currently exactly one file will get created, the one used to bind.
>
> With your patch both threads can succeed creating their respective
> files, but only one will manage to bind. The other one must error out,
> but it already created a file it is unclear what to do with.

In this case, it simply puts the path back:

err = -EINVAL;
if (u->addr)
goto out_up;
[...]

out_up:
mutex_unlock(>bindlock);
out_put:
if (err)
path_put();
out:
return err;


Which is what unix_release_sock() does too:

if (path.dentry)
path_put();


Re: fs, net: deadlock between bind/splice on af_unix

2017-02-05 Thread Cong Wang
On Tue, Jan 31, 2017 at 10:14 AM, Mateusz Guzik  wrote:
> On Mon, Jan 30, 2017 at 10:44:03PM -0800, Cong Wang wrote:
>> Mind being more specific?
>
> Consider 2 threads which bind the same socket, but with different paths.
>
> Currently exactly one file will get created, the one used to bind.
>
> With your patch both threads can succeed creating their respective
> files, but only one will manage to bind. The other one must error out,
> but it already created a file it is unclear what to do with.

In this case, it simply puts the path back:

err = -EINVAL;
if (u->addr)
goto out_up;
[...]

out_up:
mutex_unlock(>bindlock);
out_put:
if (err)
path_put();
out:
return err;


Which is what unix_release_sock() does too:

if (path.dentry)
path_put();


Re: net: deadlock on genl_mutex

2017-02-05 Thread Cong Wang
On Sun, Jan 29, 2017 at 2:11 AM, Dmitry Vyukov <dvyu...@google.com> wrote:
> On Fri, Dec 9, 2016 at 6:08 AM, Cong Wang <xiyou.wangc...@gmail.com> wrote:
>>>> Chain exists of:
>>>>  Possible unsafe locking scenario:
>>>>
>>>>CPU0CPU1
>>>>
>>>>   lock(genl_mutex);
>>>>lock(nlk->cb_mutex);
>>>>lock(genl_mutex);
>>>>   lock(rtnl_mutex);
>>>>
>>>>  *** DEADLOCK ***
>>>
>>> This one looks legitimate, because nlk->cb_mutex could be rtnl_mutex.
>>> Let me think about it.
>>
>> Never mind. Actually both reports in this thread are legitimate.
>>
>> I know what happened now, the lock chain is so long, 4 locks are involved
>> to form a chain!!!
>>
>> Let me think about how to break the chain.
>
>
> Cong, any success with breaking the chain?

No luck yet. Each part of the chain seems legit, not sure which
one could be reordered. :-/


Re: net: deadlock on genl_mutex

2017-02-05 Thread Cong Wang
On Sun, Jan 29, 2017 at 2:11 AM, Dmitry Vyukov  wrote:
> On Fri, Dec 9, 2016 at 6:08 AM, Cong Wang  wrote:
>>>> Chain exists of:
>>>>  Possible unsafe locking scenario:
>>>>
>>>>CPU0CPU1
>>>>
>>>>   lock(genl_mutex);
>>>>lock(nlk->cb_mutex);
>>>>lock(genl_mutex);
>>>>   lock(rtnl_mutex);
>>>>
>>>>  *** DEADLOCK ***
>>>
>>> This one looks legitimate, because nlk->cb_mutex could be rtnl_mutex.
>>> Let me think about it.
>>
>> Never mind. Actually both reports in this thread are legitimate.
>>
>> I know what happened now, the lock chain is so long, 4 locks are involved
>> to form a chain!!!
>>
>> Let me think about how to break the chain.
>
>
> Cong, any success with breaking the chain?

No luck yet. Each part of the chain seems legit, not sure which
one could be reordered. :-/


Re: net: suspicious RCU usage in nf_hook

2017-02-02 Thread Cong Wang
On Wed, Feb 1, 2017 at 3:59 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Wed, 2017-02-01 at 15:48 -0800, Eric Dumazet wrote:
>> On Wed, Feb 1, 2017 at 3:29 PM, Cong Wang <xiyou.wangc...@gmail.com> wrote:
>>
>> > Not sure if it is better. The difference is caught up in 
>> > net_enable_timestamp(),
>> > which is called setsockopt() path and sk_clone() path, so we could be
>> > in netstamp_needed state for a long time too until user-space exercises
>> > these paths.
>> >
>> > I am feeling we probably need to get rid of netstamp_needed_deferred,
>> > and simply defer the whole static_key_slow_dec(), like the attached patch
>> > (compile only).
>> >
>> > What do you think?
>>
>> I think we need to keep the atomic.
>>
>> If two cpus call net_disable_timestamp() roughly at the same time, the
>> work will be scheduled once.

Good point! Yeah, the same work will not be schedule twice.

>
> Updated patch (but not tested yet)

I can't think out a better way to fix this. I expect jump_label to provide
an API for this, but it doesn't, static_key_slow_dec_deferred()
is just for batching. Probably we should introduce one to avoid these
ugly #ifdef HAVE_JUMP_LABEL here, but that is a -next material.

So, please feel free to send it formally.

Thanks.


Re: net: suspicious RCU usage in nf_hook

2017-02-02 Thread Cong Wang
On Wed, Feb 1, 2017 at 3:59 PM, Eric Dumazet  wrote:
> On Wed, 2017-02-01 at 15:48 -0800, Eric Dumazet wrote:
>> On Wed, Feb 1, 2017 at 3:29 PM, Cong Wang  wrote:
>>
>> > Not sure if it is better. The difference is caught up in 
>> > net_enable_timestamp(),
>> > which is called setsockopt() path and sk_clone() path, so we could be
>> > in netstamp_needed state for a long time too until user-space exercises
>> > these paths.
>> >
>> > I am feeling we probably need to get rid of netstamp_needed_deferred,
>> > and simply defer the whole static_key_slow_dec(), like the attached patch
>> > (compile only).
>> >
>> > What do you think?
>>
>> I think we need to keep the atomic.
>>
>> If two cpus call net_disable_timestamp() roughly at the same time, the
>> work will be scheduled once.

Good point! Yeah, the same work will not be schedule twice.

>
> Updated patch (but not tested yet)

I can't think out a better way to fix this. I expect jump_label to provide
an API for this, but it doesn't, static_key_slow_dec_deferred()
is just for batching. Probably we should introduce one to avoid these
ugly #ifdef HAVE_JUMP_LABEL here, but that is a -next material.

So, please feel free to send it formally.

Thanks.


Re: net: suspicious RCU usage in nf_hook

2017-02-01 Thread Cong Wang
On Wed, Feb 1, 2017 at 1:16 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Wed, 2017-02-01 at 12:51 -0800, Cong Wang wrote:
>> On Tue, Jan 31, 2017 at 7:44 AM, Eric Dumazet <eric.duma...@gmail.com> wrote:
>> > On Mon, 2017-01-30 at 22:19 -0800, Cong Wang wrote:
>> >
>> >>
>> >> The context is process context (TX path before hitting qdisc), and
>> >> BH is not disabled, so in_interrupt() doesn't catch it. Hmm, this
>> >> makes me thinking maybe we really need to disable BH in this
>> >> case for nf_hook()? But it is called in RX path too, and BH is
>> >> already disabled there.
>> >
>> > ipt_do_table() and similar netfilter entry points disable BH.
>> >
>> > Maybe it is done too late.
>>
>> I think we need a fix like the following one for minimum impact.
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 727b6fd..eee7d63 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -1720,12 +1720,10 @@ EXPORT_SYMBOL(net_enable_timestamp);
>>  void net_disable_timestamp(void)
>>  {
>>  #ifdef HAVE_JUMP_LABEL
>> -   if (in_interrupt()) {
>> -   atomic_inc(_needed_deferred);
>> -   return;
>> -   }
>> -#endif
>> +   atomic_inc(_needed_deferred);
>> +#else
>> static_key_slow_dec(_needed);
>> +#endif
>>  }
>>  EXPORT_SYMBOL(net_disable_timestamp);
>
> This would permanently leave the kernel in the netstamp_needed state.
>
> I would prefer the patch using a process context to perform the
> cleanup ? Note there is a race window, but probably not a big deal.

Not sure if it is better. The difference is caught up in net_enable_timestamp(),
which is called setsockopt() path and sk_clone() path, so we could be
in netstamp_needed state for a long time too until user-space exercises
these paths.

I am feeling we probably need to get rid of netstamp_needed_deferred,
and simply defer the whole static_key_slow_dec(), like the attached patch
(compile only).

What do you think?
diff --git a/net/core/dev.c b/net/core/dev.c
index 727b6fd..0ef1734 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1694,38 +1694,28 @@ EXPORT_SYMBOL_GPL(net_dec_egress_queue);
 #endif
 
 static struct static_key netstamp_needed __read_mostly;
-#ifdef HAVE_JUMP_LABEL
-/* We are not allowed to call static_key_slow_dec() from irq context
- * If net_disable_timestamp() is called from irq context, defer the
- * static_key_slow_dec() calls.
- */
-static atomic_t netstamp_needed_deferred;
-#endif
 
-void net_enable_timestamp(void)
+static void netstamp_clear(struct work_struct *work)
 {
-#ifdef HAVE_JUMP_LABEL
-   int deferred = atomic_xchg(_needed_deferred, 0);
+   static_key_slow_dec(_needed);
+}
 
-   if (deferred) {
-   while (--deferred)
-   static_key_slow_dec(_needed);
-   return;
-   }
-#endif
+static DECLARE_WORK(netstamp_work, netstamp_clear);
+
+void net_enable_timestamp(void)
+{
+   flush_work(_work);
static_key_slow_inc(_needed);
 }
 EXPORT_SYMBOL(net_enable_timestamp);
 
 void net_disable_timestamp(void)
 {
-#ifdef HAVE_JUMP_LABEL
-   if (in_interrupt()) {
-   atomic_inc(_needed_deferred);
-   return;
-   }
-#endif
-   static_key_slow_dec(_needed);
+   /* We are not allowed to call static_key_slow_dec() from irq context
+* If net_disable_timestamp() is called from irq context, defer the
+* static_key_slow_dec() calls.
+*/
+   schedule_work(_work);
 }
 EXPORT_SYMBOL(net_disable_timestamp);
 


Re: net: suspicious RCU usage in nf_hook

2017-02-01 Thread Cong Wang
On Wed, Feb 1, 2017 at 1:16 PM, Eric Dumazet  wrote:
> On Wed, 2017-02-01 at 12:51 -0800, Cong Wang wrote:
>> On Tue, Jan 31, 2017 at 7:44 AM, Eric Dumazet  wrote:
>> > On Mon, 2017-01-30 at 22:19 -0800, Cong Wang wrote:
>> >
>> >>
>> >> The context is process context (TX path before hitting qdisc), and
>> >> BH is not disabled, so in_interrupt() doesn't catch it. Hmm, this
>> >> makes me thinking maybe we really need to disable BH in this
>> >> case for nf_hook()? But it is called in RX path too, and BH is
>> >> already disabled there.
>> >
>> > ipt_do_table() and similar netfilter entry points disable BH.
>> >
>> > Maybe it is done too late.
>>
>> I think we need a fix like the following one for minimum impact.
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 727b6fd..eee7d63 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -1720,12 +1720,10 @@ EXPORT_SYMBOL(net_enable_timestamp);
>>  void net_disable_timestamp(void)
>>  {
>>  #ifdef HAVE_JUMP_LABEL
>> -   if (in_interrupt()) {
>> -   atomic_inc(_needed_deferred);
>> -   return;
>> -   }
>> -#endif
>> +   atomic_inc(_needed_deferred);
>> +#else
>> static_key_slow_dec(_needed);
>> +#endif
>>  }
>>  EXPORT_SYMBOL(net_disable_timestamp);
>
> This would permanently leave the kernel in the netstamp_needed state.
>
> I would prefer the patch using a process context to perform the
> cleanup ? Note there is a race window, but probably not a big deal.

Not sure if it is better. The difference is caught up in net_enable_timestamp(),
which is called setsockopt() path and sk_clone() path, so we could be
in netstamp_needed state for a long time too until user-space exercises
these paths.

I am feeling we probably need to get rid of netstamp_needed_deferred,
and simply defer the whole static_key_slow_dec(), like the attached patch
(compile only).

What do you think?
diff --git a/net/core/dev.c b/net/core/dev.c
index 727b6fd..0ef1734 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1694,38 +1694,28 @@ EXPORT_SYMBOL_GPL(net_dec_egress_queue);
 #endif
 
 static struct static_key netstamp_needed __read_mostly;
-#ifdef HAVE_JUMP_LABEL
-/* We are not allowed to call static_key_slow_dec() from irq context
- * If net_disable_timestamp() is called from irq context, defer the
- * static_key_slow_dec() calls.
- */
-static atomic_t netstamp_needed_deferred;
-#endif
 
-void net_enable_timestamp(void)
+static void netstamp_clear(struct work_struct *work)
 {
-#ifdef HAVE_JUMP_LABEL
-   int deferred = atomic_xchg(_needed_deferred, 0);
+   static_key_slow_dec(_needed);
+}
 
-   if (deferred) {
-   while (--deferred)
-   static_key_slow_dec(_needed);
-   return;
-   }
-#endif
+static DECLARE_WORK(netstamp_work, netstamp_clear);
+
+void net_enable_timestamp(void)
+{
+   flush_work(_work);
static_key_slow_inc(_needed);
 }
 EXPORT_SYMBOL(net_enable_timestamp);
 
 void net_disable_timestamp(void)
 {
-#ifdef HAVE_JUMP_LABEL
-   if (in_interrupt()) {
-   atomic_inc(_needed_deferred);
-   return;
-   }
-#endif
-   static_key_slow_dec(_needed);
+   /* We are not allowed to call static_key_slow_dec() from irq context
+* If net_disable_timestamp() is called from irq context, defer the
+* static_key_slow_dec() calls.
+*/
+   schedule_work(_work);
 }
 EXPORT_SYMBOL(net_disable_timestamp);
 


Re: net: suspicious RCU usage in nf_hook

2017-02-01 Thread Cong Wang
On Tue, Jan 31, 2017 at 7:44 AM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Mon, 2017-01-30 at 22:19 -0800, Cong Wang wrote:
>
>>
>> The context is process context (TX path before hitting qdisc), and
>> BH is not disabled, so in_interrupt() doesn't catch it. Hmm, this
>> makes me thinking maybe we really need to disable BH in this
>> case for nf_hook()? But it is called in RX path too, and BH is
>> already disabled there.
>
> ipt_do_table() and similar netfilter entry points disable BH.
>
> Maybe it is done too late.

I think we need a fix like the following one for minimum impact.

diff --git a/net/core/dev.c b/net/core/dev.c
index 727b6fd..eee7d63 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1720,12 +1720,10 @@ EXPORT_SYMBOL(net_enable_timestamp);
 void net_disable_timestamp(void)
 {
 #ifdef HAVE_JUMP_LABEL
-   if (in_interrupt()) {
-   atomic_inc(_needed_deferred);
-   return;
-   }
-#endif
+   atomic_inc(_needed_deferred);
+#else
static_key_slow_dec(_needed);
+#endif
 }
 EXPORT_SYMBOL(net_disable_timestamp);


Re: net: suspicious RCU usage in nf_hook

2017-02-01 Thread Cong Wang
On Tue, Jan 31, 2017 at 7:44 AM, Eric Dumazet  wrote:
> On Mon, 2017-01-30 at 22:19 -0800, Cong Wang wrote:
>
>>
>> The context is process context (TX path before hitting qdisc), and
>> BH is not disabled, so in_interrupt() doesn't catch it. Hmm, this
>> makes me thinking maybe we really need to disable BH in this
>> case for nf_hook()? But it is called in RX path too, and BH is
>> already disabled there.
>
> ipt_do_table() and similar netfilter entry points disable BH.
>
> Maybe it is done too late.

I think we need a fix like the following one for minimum impact.

diff --git a/net/core/dev.c b/net/core/dev.c
index 727b6fd..eee7d63 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1720,12 +1720,10 @@ EXPORT_SYMBOL(net_enable_timestamp);
 void net_disable_timestamp(void)
 {
 #ifdef HAVE_JUMP_LABEL
-   if (in_interrupt()) {
-   atomic_inc(_needed_deferred);
-   return;
-   }
-#endif
+   atomic_inc(_needed_deferred);
+#else
static_key_slow_dec(_needed);
+#endif
 }
 EXPORT_SYMBOL(net_disable_timestamp);


Re: fs, net: deadlock between bind/splice on af_unix

2017-01-30 Thread Cong Wang
On Thu, Jan 26, 2017 at 10:41 PM, Mateusz Guzik <mgu...@redhat.com> wrote:
> On Thu, Jan 26, 2017 at 09:11:07PM -0800, Cong Wang wrote:
>> On Thu, Jan 26, 2017 at 3:29 PM, Mateusz Guzik <mgu...@redhat.com> wrote:
>> > Currently the file creation is potponed until unix_bind can no longer
>> > fail otherwise. With it reordered, it may be someone races you with a
>> > different path and now you are left with a file to clean up. Except it
>> > is quite unclear for me if you can unlink it.
>>
>> What races do you mean here? If you mean someone could get a
>> refcount of that file, it could happen no matter we have bindlock or not
>> since it is visible once created. The filesystem layer should take care of
>> the file refcount so all we need to do here is calling path_put() as in my
>> patch. Or if you mean two threads calling unix_bind() could race without
>> binlock, only one of them should succeed the other one just fails out.
>
> Two threads can race and one fails with EINVAL.
>
> With your patch there is a new file created and it is unclear what to
> do with it - leaving it as it is sounds like the last resort and
> unlinking it sounds extremely fishy as it opens you to games played by
> the user.

But the file is created and visible to users too even without my patch,
the file is also put when the unix sock is released. So the only difference
my patch makes is bindlock is no longer taken during file creation, which
does not seem to be the cause of the problem you complain here.

Mind being more specific?


Re: fs, net: deadlock between bind/splice on af_unix

2017-01-30 Thread Cong Wang
On Thu, Jan 26, 2017 at 10:41 PM, Mateusz Guzik  wrote:
> On Thu, Jan 26, 2017 at 09:11:07PM -0800, Cong Wang wrote:
>> On Thu, Jan 26, 2017 at 3:29 PM, Mateusz Guzik  wrote:
>> > Currently the file creation is potponed until unix_bind can no longer
>> > fail otherwise. With it reordered, it may be someone races you with a
>> > different path and now you are left with a file to clean up. Except it
>> > is quite unclear for me if you can unlink it.
>>
>> What races do you mean here? If you mean someone could get a
>> refcount of that file, it could happen no matter we have bindlock or not
>> since it is visible once created. The filesystem layer should take care of
>> the file refcount so all we need to do here is calling path_put() as in my
>> patch. Or if you mean two threads calling unix_bind() could race without
>> binlock, only one of them should succeed the other one just fails out.
>
> Two threads can race and one fails with EINVAL.
>
> With your patch there is a new file created and it is unclear what to
> do with it - leaving it as it is sounds like the last resort and
> unlinking it sounds extremely fishy as it opens you to games played by
> the user.

But the file is created and visible to users too even without my patch,
the file is also put when the unix sock is released. So the only difference
my patch makes is bindlock is no longer taken during file creation, which
does not seem to be the cause of the problem you complain here.

Mind being more specific?


Re: net: suspicious RCU usage in nf_hook

2017-01-30 Thread Cong Wang
On Fri, Jan 27, 2017 at 5:31 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Fri, 2017-01-27 at 17:00 -0800, Cong Wang wrote:
>> On Fri, Jan 27, 2017 at 3:35 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
>> > Oh well, I forgot to submit the official patch I think, Jan 9th.
>> >
>> > https://groups.google.com/forum/#!topic/syzkaller/BhyN5OFd7sQ
>> >
>>
>> Hmm, but why only fragments need skb_orphan()? It seems like
>> any kfree_skb() inside a nf hook needs to have a preceding
>> skb_orphan().
>
>
>>
>> Also, I am not convinced it is similar to commit 8282f27449bf15548
>> which is on RX path.
>
> Well, we clearly see IPv6 reassembly being part of the equation in both
> cases.

Yeah, of course. My worry is that this problem is more than just
IPv6 reassembly.

>
> I was replying to first part of the splat [1], which was already
> diagnosed and had a non official patch.
>
> use after free is also a bug, regardless of jump label being used or
> not.
>
> I still do not really understand this nf_hook issue, I thought we were
> disabling BH in netfilter.

It is a different warning from use-after-free, this one is about sleep
in atomic context, mutex lock is acquired with RCU read lock held.


>
> So the in_interrupt() check in net_disable_timestamp() should trigger,
> this was the intent of netstamp_needed_deferred existence.
>
> Not sure if we can test for rcu_read_lock() as well.
>

The context is process context (TX path before hitting qdisc), and
BH is not disabled, so in_interrupt() doesn't catch it. Hmm, this
makes me thinking maybe we really need to disable BH in this
case for nf_hook()? But it is called in RX path too, and BH is
already disabled there.


Re: net: suspicious RCU usage in nf_hook

2017-01-30 Thread Cong Wang
On Fri, Jan 27, 2017 at 5:31 PM, Eric Dumazet  wrote:
> On Fri, 2017-01-27 at 17:00 -0800, Cong Wang wrote:
>> On Fri, Jan 27, 2017 at 3:35 PM, Eric Dumazet  wrote:
>> > Oh well, I forgot to submit the official patch I think, Jan 9th.
>> >
>> > https://groups.google.com/forum/#!topic/syzkaller/BhyN5OFd7sQ
>> >
>>
>> Hmm, but why only fragments need skb_orphan()? It seems like
>> any kfree_skb() inside a nf hook needs to have a preceding
>> skb_orphan().
>
>
>>
>> Also, I am not convinced it is similar to commit 8282f27449bf15548
>> which is on RX path.
>
> Well, we clearly see IPv6 reassembly being part of the equation in both
> cases.

Yeah, of course. My worry is that this problem is more than just
IPv6 reassembly.

>
> I was replying to first part of the splat [1], which was already
> diagnosed and had a non official patch.
>
> use after free is also a bug, regardless of jump label being used or
> not.
>
> I still do not really understand this nf_hook issue, I thought we were
> disabling BH in netfilter.

It is a different warning from use-after-free, this one is about sleep
in atomic context, mutex lock is acquired with RCU read lock held.


>
> So the in_interrupt() check in net_disable_timestamp() should trigger,
> this was the intent of netstamp_needed_deferred existence.
>
> Not sure if we can test for rcu_read_lock() as well.
>

The context is process context (TX path before hitting qdisc), and
BH is not disabled, so in_interrupt() doesn't catch it. Hmm, this
makes me thinking maybe we really need to disable BH in this
case for nf_hook()? But it is called in RX path too, and BH is
already disabled there.


Re: net: suspicious RCU usage in nf_hook

2017-01-27 Thread Cong Wang
On Fri, Jan 27, 2017 at 3:35 PM, Eric Dumazet  wrote:
> Oh well, I forgot to submit the official patch I think, Jan 9th.
>
> https://groups.google.com/forum/#!topic/syzkaller/BhyN5OFd7sQ
>

Hmm, but why only fragments need skb_orphan()? It seems like
any kfree_skb() inside a nf hook needs to have a preceding
skb_orphan().

Also, I am not convinced it is similar to commit 8282f27449bf15548
which is on RX path.


Re: net: suspicious RCU usage in nf_hook

2017-01-27 Thread Cong Wang
On Fri, Jan 27, 2017 at 3:35 PM, Eric Dumazet  wrote:
> Oh well, I forgot to submit the official patch I think, Jan 9th.
>
> https://groups.google.com/forum/#!topic/syzkaller/BhyN5OFd7sQ
>

Hmm, but why only fragments need skb_orphan()? It seems like
any kfree_skb() inside a nf hook needs to have a preceding
skb_orphan().

Also, I am not convinced it is similar to commit 8282f27449bf15548
which is on RX path.


Re: net: suspicious RCU usage in nf_hook

2017-01-27 Thread Cong Wang
On Fri, Jan 27, 2017 at 3:22 PM, Cong Wang <xiyou.wangc...@gmail.com> wrote:
> On Fri, Jan 27, 2017 at 1:15 PM, Dmitry Vyukov <dvyu...@google.com> wrote:
>> stack backtrace:
>> CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> Call Trace:
>>  __dump_stack lib/dump_stack.c:15 [inline]
>>  dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
>>  lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4452
>>  rcu_preempt_sleep_check include/linux/rcupdate.h:560 [inline]
>>  ___might_sleep+0x560/0x650 kernel/sched/core.c:7748
>>  __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
>>  mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
>>  atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
>>  __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
>>  static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
>>  net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
>>  sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
>>  __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
>>  sk_destruct+0x47/0x80 net/core/sock.c:1460
>
> jump label uses a mutex and we call jump label API in softIRQ context...
> Maybe we have to move it to a work struct as what we did for netlink.

Correct myself: process context but with RCU read lock held...


<    2   3   4   5   6   7   8   9   10   11   >