Re: INFO: task hung in nbd_ioctl (3)

2021-02-15 Thread syzbot
syzbot has found a reproducer for the following issue on:

HEAD commit:    f40ddce8 Linux 5.11
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=10a8b204d0
kernel config:  https://syzkaller.appspot.com/x/.config?x=4b919ebed7b4902
dashboard link: https://syzkaller.appspot.com/bug?extid=fe03c50d25c0188f7487
compiler:   Debian clang version 11.0.1-2
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=11a7953cd0
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11bc9914d0

The issue was bisected to:

commit e9e006f5fcf2bab59149cb38a48a4817c1b538b4
Author: Mike Christie 
Date:   Sun Aug 4 19:10:06 2019 +

nbd: fix max number of supported devs

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=171556f050
final oops: https://syzkaller.appspot.com/x/report.txt?x=149556f050
console output: https://syzkaller.appspot.com/x/log.txt?x=109556f050

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+fe03c50d25c0188f7...@syzkaller.appspotmail.com
Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs")

INFO: task syz-executor645:8465 blocked for more than 143 seconds.
  Not tainted 5.11.0-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor645 state:D stack:28256 pid: 8465 ppid:  8464 flags:0x4004
Call Trace:
 context_switch kernel/sched/core.c:4327 [inline]
 __schedule+0x999/0xe70 kernel/sched/core.c:5078
 schedule+0x14b/0x200 kernel/sched/core.c:5157
 schedule_timeout+0x43/0x250 kernel/time/timer.c:1854
 do_wait_for_common+0x266/0x3a0 kernel/sched/completion.c:85
 __wait_for_common kernel/sched/completion.c:106 [inline]
 wait_for_common kernel/sched/completion.c:117 [inline]
 wait_for_completion+0x43/0x50 kernel/sched/completion.c:138
 flush_workqueue+0x704/0x1620 kernel/workqueue.c:2838
 nbd_start_device_ioctl drivers/block/nbd.c:1332 [inline]
 __nbd_ioctl drivers/block/nbd.c:1393 [inline]
 nbd_ioctl+0x76d/0x940 drivers/block/nbd.c:1433
 blkdev_ioctl+0x2d6/0x5f0 block/ioctl.c:576
 block_ioctl+0xae/0xf0 fs/block_dev.c:1658
 vfs_ioctl fs/ioctl.c:48 [inline]
 __do_sys_ioctl fs/ioctl.c:753 [inline]
 __se_sys_ioctl+0xfb/0x170 fs/ioctl.c:739
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x4441e9
RSP: 002b:7ffc87363948 EFLAGS: 0246 ORIG_RAX: 0010
RAX: ffda RBX: 004004a0 RCX: 004441e9
RDX:  RSI: ab03 RDI: 0003
RBP:  R08: 7ffc87363ae8 R09: 7ffc87363ae8
R10: 7ffc87363ae8 R11: 0246 R12: 00403500
R13: 431bde82d7b634db R14: 004b2018 R15: 004004a0

Showing all locks held in the system:
1 lock held by khungtaskd/1644:
 #0: 8c711680 (rcu_read_lock){}-{1:2}, at: rcu_lock_acquire+0x0/0x30 arch/x86/pci/mmconfig_64.c:151
3 locks held by kworker/u5:0/2034:
 #0: 888016152938 ((wq_completion)knbd0-recv){+.+.}-{0:0}, at: process_one_work+0x6f4/0xfc0 kernel/workqueue.c:2248
 #1: c900072afd78 ((work_completion)(>work)){+.+.}-{0:0}, at: process_one_work+0x733/0xfc0 kernel/workqueue.c:2250
 #2: 888018ca7120 (sk_lock-AF_AX25){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1594 [inline]
 #2: 888018ca7120 (sk_lock-AF_AX25){+.+.}-{0:0}, at: ax25_recvmsg+0x86/0x740 net/ax25/af_ax25.c:1626
1 lock held by in:imklog/8097:
 #0: 8880198b14f0 (>f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x24e/0x2f0 fs/file.c:947

=

NMI backtrace for cpu 1
CPU: 1 PID: 1644 Comm: khungtaskd Not tainted 5.11.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x137/0x1be lib/dump_stack.c:120
 nmi_cpu_backtrace+0x16c/0x190 lib/nmi_backtrace.c:105
 nmi_trigger_cpumask_backtrace+0x191/0x2f0 lib/nmi_backtrace.c:62
 trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
 check_hung_uninterruptible_tasks kernel/hung_task.c:209 [inline]
 watchdog+0xce9/0xd30 kernel/hung_task.c:294
 kthread+0x39a/0x3c0 kernel/kthread.c:292
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 4866 Comm: systemd-journal Not tainted 5.11.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:rw_verify_area+0xef/0x370 fs/read_write.c:392
Code: 49 8d 9f 20 02 00 00 48 89 d8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df 80 3c 08 00 74 08 48 89 df e8 55 a5 f6 ff 48 83 3b 00 <74> 6f 49 8d 5f 28 48 89 d8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff
RSP: 0018:c9000167fde0 EFLAGS: 0246
RAX: 1110280e66bc RBX: 8881407335e0 RCX: dc00
RDX:  RSI: 2000 RDI: 
RBP: c9000167ff00 R08: 81c4ad94 R09: 

Re: INFO: task hung in nbd_ioctl (3)

2020-10-10 Thread syzbot
syzbot has bisected this issue to:

commit e9e006f5fcf2bab59149cb38a48a4817c1b538b4
Author: Mike Christie 
Date:   Sun Aug 4 19:10:06 2019 +

nbd: fix max number of supported devs

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=171556f050
start commit:   fb0155a0 Merge tag 'nfs-for-5.9-3' of git://git.linux-nfs...
git tree:   upstream
final oops: https://syzkaller.appspot.com/x/report.txt?x=149556f050
console output: https://syzkaller.appspot.com/x/log.txt?x=109556f050
kernel config:  https://syzkaller.appspot.com/x/.config?x=41b736b7ce1b3ea4
dashboard link: https://syzkaller.appspot.com/bug?extid=fe03c50d25c0188f7487
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=173d9b1790

Reported-by: syzbot+fe03c50d25c0188f7...@syzkaller.appspotmail.com
Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection


Re: INFO: task hung in nbd_ioctl

2019-10-17 Thread Richard W.M. Jones
On Thu, Oct 17, 2019 at 10:47:59AM -0500, Mike Christie wrote:
> On 10/17/2019 09:03 AM, Richard W.M. Jones wrote:
> > On Tue, Oct 01, 2019 at 04:19:25PM -0500, Mike Christie wrote:
> >> Hey Josef and nbd list,
> >>
> >> I had a question: are there any socket family restrictions for nbd?
> > 
> > In normal circumstances, in userspace, the NBD protocol would only be
> > used over AF_UNIX or AF_INET/AF_INET6.
> > 
> > There's a bit of confusion because netlink is used by nbd-client to
> > configure the NBD device, setting things like block size and timeouts
> > (instead of ioctl which is deprecated).  I think you don't mean this
> > use of netlink?
> 
> I didn't. It looks like it is just a bad test.
> 
> For the automated test in this thread, the test created an AF_NETLINK
> socket and passed it into the NBD_SET_SOCK ioctl. That is what got used
> for the NBD_DO_IT ioctl.
> 
> I was not sure if the test creator picked any old socket and it just
> happened to pick one nbd never supported, or it was trying to simulate
> sockets that did not support the shutdown method.
> 
> I attached the automated test that got run (test.c).

I'd say it sounds like a bad test, but I'm not familiar with syzkaller
nor how / from where it generates these tests.  Did someone report a
bug and then syzkaller wrote this test?

Rich.
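
For reference, the -EOPNOTSUPP behaviour discussed in this thread can also be
observed from userspace. The following is only a minimal sketch, not the
reproducer and not nbd code: the function names are made up for illustration,
and it assumes a Linux system where an unprivileged process may open a
NETLINK_ROUTE socket. It calls shutdown(2) on a connected AF_UNIX socket and
on an AF_NETLINK socket and reports the result of each.

```c
#include <assert.h>   /* only needed for the usage shown below */
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>

/* Returns 0 if shutdown(SHUT_RDWR) succeeds on fd, else the errno value. */
int shutdown_errno(int fd)
{
    return shutdown(fd, SHUT_RDWR) == 0 ? 0 : errno;
}

/* shutdown() on one end of a connected AF_UNIX pair; -1 if setup failed. */
int unix_shutdown_result(void)
{
    int sv[2];
    int ret;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    ret = shutdown_errno(sv[0]);
    close(sv[0]);
    close(sv[1]);
    return ret;
}

/* shutdown() on an AF_NETLINK socket; -1 if the socket cannot be created. */
int netlink_shutdown_result(void)
{
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    int ret;

    if (fd < 0)
        return -1;
    ret = shutdown_errno(fd);
    close(fd);
    return ret;
}
```

Called from a main(), assert(unix_shutdown_result() == 0) should hold, and
netlink_shutdown_result() is expected to return EOPNOTSUPP (or -1 if the
netlink socket could not be created in the first place).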

> > 
> >> The bug here is that some socket families do not support the
> >> sock->ops->shutdown callout, and when nbd calls kernel_sock_shutdown
> >> their callout returns -EOPNOTSUPP. That then leaves recv_work stuck in
> >> nbd_read_stat -> sock_xmit -> sock_recvmsg. My patch added a
> >> flush_workqueue call, so for socket families like AF_NETLINK in this bug
> >> we hang like we see below.
> >>
> >> I can just remove the flush_workqueue call in that code path since it's
> >> not needed there, but that leaves the original bug my patch was hitting,
> >> where we leave the recv_work running, which can then result in leaked
> >> resources or possible use-after-free crashes, and you still get the hang
> >> if you remove the module.
> >>
> >> It looks like we have used kernel_sock_shutdown for a while so I thought
> >> we might never have supported sockets that did not support the callout.
> >> Is that correct? If so then I can just add a check for this in
> >> nbd_add_socket and fix that bug too.
> > 
> > Rich.
> > 
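
The dependency described in the quoted text above (a receiver blocked in
recvmsg that only wakes up once the socket is shut down, and a flush that
waits for that receiver) can be sketched with a userspace analogy. This is
an illustration under assumptions, not the kernel code path: a pthread
stands in for recv_work, pthread_join stands in for flush_workqueue, and
the names recv_worker and shutdown_unblocks_recv are hypothetical.

```c
#include <assert.h>   /* only needed for the usage shown below */
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Worker: blocks in recv() until data arrives or the socket is shut down. */
static void *recv_worker(void *arg)
{
    int fd = *(int *)arg;
    char buf[16];
    ssize_t n = recv(fd, buf, sizeof(buf), 0);

    return (void *)(intptr_t)n;
}

/*
 * Starts a worker that blocks in recv(), shuts the socket down, then waits
 * for the worker (the analogue of flushing the workqueue).
 * Returns the worker's recv() result, or -2 if setup failed.
 */
int shutdown_unblocks_recv(void)
{
    int sv[2];
    pthread_t tid;
    void *res;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -2;
    if (pthread_create(&tid, NULL, recv_worker, &sv[0]) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -2;
    }

    /* Give the worker a moment to block; the outcome does not depend on it. */
    usleep(100 * 1000);

    /* AF_UNIX implements shutdown, so this wakes the blocked recv(). */
    shutdown(sv[0], SHUT_RDWR);

    /* With a working shutdown the join returns; without one it would hang. */
    pthread_join(tid, &res);

    close(sv[0]);
    close(sv[1]);
    return (int)(intptr_t)res;
}
```

With AF_UNIX, assert(shutdown_unblocks_recv() == 0) should hold, because
recv() sees end-of-file after the shutdown. If the socket family rejected
the shutdown, as AF_NETLINK does with -EOPNOTSUPP, the worker would stay
blocked and the join would never return, which mirrors the hang reported
in this thread.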

Re: INFO: task hung in nbd_ioctl

2019-10-01 Thread Mike Christie
On 09/30/2019 05:39 PM, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    bb2aee77 Add linux-next specific files for 20190926
> git tree:   linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=13385ca360
> kernel config:  https://syzkaller.appspot.com/x/.config?x=e60af4ac5a01e964
> dashboard link: https://syzkaller.appspot.com/bug?extid=24c12fa8d218ed26011a
> compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=12abc2a360
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11712c0560
> 
> The bug was bisected to:
> 
> commit e9e006f5fcf2bab59149cb38a48a4817c1b538b4
> Author: Mike Christie 
> Date:   Sun Aug 4 19:10:06 2019 +
> 
> nbd: fix max number of supported devs
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=1226f3c560
> final crash:    https://syzkaller.appspot.com/x/report.txt?x=1126f3c560
> console output: https://syzkaller.appspot.com/x/log.txt?x=1626f3c560
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+24c12fa8d218ed260...@syzkaller.appspotmail.com
> Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs")
> 
> INFO: task syz-executor390:8778 can't die for more than 143 seconds.
> syz-executor390 D27432  8778   8777 0x4004
> Call Trace:
>  context_switch kernel/sched/core.c:3384 [inline]
>  __schedule+0x828/0x1c20 kernel/sched/core.c:4065
>  schedule+0xd9/0x260 kernel/sched/core.c:4132
>  schedule_timeout+0x717/0xc50 kernel/time/timer.c:1871
>  do_wait_for_common kernel/sched/completion.c:83 [inline]
>  __wait_for_common kernel/sched/completion.c:104 [inline]
>  wait_for_common kernel/sched/completion.c:115 [inline]
>  wait_for_completion+0x29c/0x440 kernel/sched/completion.c:136
>  flush_workqueue+0x40f/0x14c0 kernel/workqueue.c:2826
>  nbd_start_device_ioctl drivers/block/nbd.c:1272 [inline]
>  __nbd_ioctl drivers/block/nbd.c:1347 [inline]
>  nbd_ioctl+0xb2e/0xc44 drivers/block/nbd.c:1387
>  __blkdev_driver_ioctl block/ioctl.c:304 [inline]
>  blkdev_ioctl+0xedb/0x1c20 block/ioctl.c:606
>  block_ioctl+0xee/0x130 fs/block_dev.c:1954
>  vfs_ioctl fs/ioctl.c:47 [inline]
>  file_ioctl fs/ioctl.c:539 [inline]
>  do_vfs_ioctl+0xdb6/0x13e0 fs/ioctl.c:726
>  ksys_ioctl+0xab/0xd0 fs/ioctl.c:743
>  __do_sys_ioctl fs/ioctl.c:750 [inline]
>  __se_sys_ioctl fs/ioctl.c:748 [inline]
>  __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:748
>  do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x4452d9
> Code: Bad RIP value.
> RSP: 002b:7ffde928d288 EFLAGS: 0246 ORIG_RAX: 0010
> RAX: ffda RBX:  RCX: 004452d9
> RDX:  RSI: ab03 RDI: 0004
> RBP:  R08: 004025b0 R09: 004025b0
> R10:  R11: 0246 R12: 00402520
> R13: 004025b0 R14:  R15: 
> INFO: task syz-executor390:8778 blocked for more than 143 seconds.
>   Not tainted 5.3.0-next-20190926 #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor390 D27432  8778   8777 0x4004
> Call Trace:
>  context_switch kernel/sched/core.c:3384 [inline]
>  __schedule+0x828/0x1c20 kernel/sched/core.c:4065
>  schedule+0xd9/0x260 kernel/sched/core.c:4132
>  schedule_timeout+0x717/0xc50 kernel/time/timer.c:1871
>  do_wait_for_common kernel/sched/completion.c:83 [inline]
>  __wait_for_common kernel/sched/completion.c:104 [inline]
>  wait_for_common kernel/sched/completion.c:115 [inline]
>  wait_for_completion+0x29c/0x440 kernel/sched/completion.c:136
>  flush_workqueue+0x40f/0x14c0 kernel/workqueue.c:2826
>  nbd_start_device_ioctl drivers/block/nbd.c:1272 [inline]
>  __nbd_ioctl drivers/block/nbd.c:1347 [inline]
>  nbd_ioctl+0xb2e/0xc44 drivers/block/nbd.c:1387
>  __blkdev_driver_ioctl block/ioctl.c:304 [inline]
>  blkdev_ioctl+0xedb/0x1c20 block/ioctl.c:606
>  block_ioctl+0xee/0x130 fs/block_dev.c:1954
>  vfs_ioctl fs/ioctl.c:47 [inline]
>  file_ioctl fs/ioctl.c:539 [inline]
>  do_vfs_ioctl+0xdb6/0x13e0 fs/ioctl.c:726
>  ksys_ioctl+0xab/0xd0 fs/ioctl.c:743
>  __do_sys_ioctl fs/ioctl.c:750 [inline]
>  __se_sys_ioctl fs/ioctl.c:748 [inline]
>  __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:748
>  do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x4452d9
> Code: Bad RIP value.
> RSP: 002b:7ffde928d288 EFLAGS: 0246 ORIG_RAX: 0010
> RAX: ffda RBX:  RCX: 004452d9
> RDX:  RSI: ab03 RDI: 0004
> RBP:  R08: 004025b0 R09: 004025b0
> R10:  R11: 0246 R12: 00402520
> R13: 004025b0 R14:  R15: 
> 

I will send a fix for