Re: INFO: task hung in nbd_ioctl (3)
syzbot has found a reproducer for the following issue on:

HEAD commit:    f40ddce8 Linux 5.11
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=10a8b204d0
kernel config:  https://syzkaller.appspot.com/x/.config?x=4b919ebed7b4902
dashboard link: https://syzkaller.appspot.com/bug?extid=fe03c50d25c0188f7487
compiler:       Debian clang version 11.0.1-2
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=11a7953cd0
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11bc9914d0

The issue was bisected to:

commit e9e006f5fcf2bab59149cb38a48a4817c1b538b4
Author: Mike Christie
Date:   Sun Aug 4 19:10:06 2019 +

    nbd: fix max number of supported devs

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=171556f050
final oops:     https://syzkaller.appspot.com/x/report.txt?x=149556f050
console output: https://syzkaller.appspot.com/x/log.txt?x=109556f050

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+fe03c50d25c0188f7...@syzkaller.appspotmail.com
Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs")

INFO: task syz-executor645:8465 blocked for more than 143 seconds.
      Not tainted 5.11.0-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor645 state:D stack:28256 pid: 8465 ppid: 8464 flags:0x4004
Call Trace:
 context_switch kernel/sched/core.c:4327 [inline]
 __schedule+0x999/0xe70 kernel/sched/core.c:5078
 schedule+0x14b/0x200 kernel/sched/core.c:5157
 schedule_timeout+0x43/0x250 kernel/time/timer.c:1854
 do_wait_for_common+0x266/0x3a0 kernel/sched/completion.c:85
 __wait_for_common kernel/sched/completion.c:106 [inline]
 wait_for_common kernel/sched/completion.c:117 [inline]
 wait_for_completion+0x43/0x50 kernel/sched/completion.c:138
 flush_workqueue+0x704/0x1620 kernel/workqueue.c:2838
 nbd_start_device_ioctl drivers/block/nbd.c:1332 [inline]
 __nbd_ioctl drivers/block/nbd.c:1393 [inline]
 nbd_ioctl+0x76d/0x940 drivers/block/nbd.c:1433
 blkdev_ioctl+0x2d6/0x5f0 block/ioctl.c:576
 block_ioctl+0xae/0xf0 fs/block_dev.c:1658
 vfs_ioctl fs/ioctl.c:48 [inline]
 __do_sys_ioctl fs/ioctl.c:753 [inline]
 __se_sys_ioctl+0xfb/0x170 fs/ioctl.c:739
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x4441e9
RSP: 002b:7ffc87363948 EFLAGS: 0246 ORIG_RAX: 0010
RAX: ffda RBX: 004004a0 RCX: 004441e9
RDX: RSI: ab03 RDI: 0003
RBP: R08: 7ffc87363ae8 R09: 7ffc87363ae8
R10: 7ffc87363ae8 R11: 0246 R12: 00403500
R13: 431bde82d7b634db R14: 004b2018 R15: 004004a0

Showing all locks held in the system:
1 lock held by khungtaskd/1644:
 #0: 8c711680 (rcu_read_lock){}-{1:2}, at: rcu_lock_acquire+0x0/0x30 arch/x86/pci/mmconfig_64.c:151
3 locks held by kworker/u5:0/2034:
 #0: 888016152938 ((wq_completion)knbd0-recv){+.+.}-{0:0}, at: process_one_work+0x6f4/0xfc0 kernel/workqueue.c:2248
 #1: c900072afd78 ((work_completion)(>work)){+.+.}-{0:0}, at: process_one_work+0x733/0xfc0 kernel/workqueue.c:2250
 #2: 888018ca7120 (sk_lock-AF_AX25){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1594 [inline]
 #2: 888018ca7120 (sk_lock-AF_AX25){+.+.}-{0:0}, at: ax25_recvmsg+0x86/0x740 net/ax25/af_ax25.c:1626
1 lock held by in:imklog/8097:
 #0: 8880198b14f0 (>f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x24e/0x2f0 fs/file.c:947

=

NMI backtrace for cpu 1
CPU: 1 PID: 1644 Comm: khungtaskd Not tainted 5.11.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x137/0x1be lib/dump_stack.c:120
 nmi_cpu_backtrace+0x16c/0x190 lib/nmi_backtrace.c:105
 nmi_trigger_cpumask_backtrace+0x191/0x2f0 lib/nmi_backtrace.c:62
 trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
 check_hung_uninterruptible_tasks kernel/hung_task.c:209 [inline]
 watchdog+0xce9/0xd30 kernel/hung_task.c:294
 kthread+0x39a/0x3c0 kernel/kthread.c:292
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 4866 Comm: systemd-journal Not tainted 5.11.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:rw_verify_area+0xef/0x370 fs/read_write.c:392
Code: 49 8d 9f 20 02 00 00 48 89 d8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df 80 3c 08 00 74 08 48 89 df e8 55 a5 f6 ff 48 83 3b 00 <74> 6f 49 8d 5f 28 48 89 d8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff
RSP: 0018:c9000167fde0 EFLAGS: 0246
RAX: 1110280e66bc RBX: 8881407335e0 RCX: dc00
RDX: RSI: 2000 RDI:
RBP: c9000167ff00 R08: 81c4ad94 R09:
Re: INFO: task hung in nbd_ioctl (3)
syzbot has bisected this issue to:

commit e9e006f5fcf2bab59149cb38a48a4817c1b538b4
Author: Mike Christie
Date:   Sun Aug 4 19:10:06 2019 +

    nbd: fix max number of supported devs

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=171556f050
start commit:   fb0155a0 Merge tag 'nfs-for-5.9-3' of git://git.linux-nfs...
git tree:       upstream
final oops:     https://syzkaller.appspot.com/x/report.txt?x=149556f050
console output: https://syzkaller.appspot.com/x/log.txt?x=109556f050
kernel config:  https://syzkaller.appspot.com/x/.config?x=41b736b7ce1b3ea4
dashboard link: https://syzkaller.appspot.com/bug?extid=fe03c50d25c0188f7487
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=173d9b1790

Reported-by: syzbot+fe03c50d25c0188f7...@syzkaller.appspotmail.com
Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection
Re: INFO: task hung in nbd_ioctl
On Thu, Oct 17, 2019 at 10:47:59AM -0500, Mike Christie wrote:
> On 10/17/2019 09:03 AM, Richard W.M. Jones wrote:
> > On Tue, Oct 01, 2019 at 04:19:25PM -0500, Mike Christie wrote:
> >> Hey Josef and nbd list,
> >>
> >> I had a question about if there are any socket family restrictions for nbd?
> >
> > In normal circumstances, in userspace, the NBD protocol would only be
> > used over AF_UNIX or AF_INET/AF_INET6.
> >
> > There's a bit of confusion because netlink is used by nbd-client to
> > configure the NBD device, setting things like block size and timeouts
> > (instead of ioctl which is deprecated). I think you don't mean this
> > use of netlink?
>
> I didn't. It looks like it is just a bad test.
>
> For the automated test in this thread the test created an AF_NETLINK
> socket and passed it into the NBD_SET_SOCK ioctl. That is what got used
> for the NBD_DO_IT ioctl.
>
> I was not sure if the test creator picked any old socket and it just
> happened to pick one nbd never supported, or it was trying to simulate
> sockets that did not support the shutdown method.
>
> I attached the automated test that got run (test.c).

I'd say it sounds like a bad test, but I'm not familiar with syzkaller
nor how / from where it generates these tests. Did someone report a
bug and then syzkaller wrote this test?

Rich.

> >
> >> The bug here is that some socket families do not support the
> >> sock->ops->shutdown callout, and when nbd calls kernel_sock_shutdown
> >> their callout returns -EOPNOTSUPP. That then leaves recv_work stuck in
> >> nbd_read_stat -> sock_xmit -> sock_recvmsg. My patch added a
> >> flush_workqueue call, so for socket families like AF_NETLINK in this bug
> >> we hang like we see below.
> >>
> >> I can just remove the flush_workqueue call in that code path since it's
> >> not needed there, but it leaves the original bug my patch was hitting
> >> where we leave the recv_work running which can then result in leaked
> >> resources, or possible use after free crashes and you still get the hang
> >> if you remove the module.
> >>
> >> It looks like we have used kernel_sock_shutdown for a while so I thought
> >> we might never have supported sockets that did not support the callout.
> >> Is that correct? If so then I can just add a check for this in
> >> nbd_add_socket and fix that bug too.
> >
> > Rich.
> >
> >> On 09/30/2019 05:39 PM, syzbot wrote:
> >>> Hello,
> >>>
> >>> syzbot found the following crash on:
> >>>
> >>> HEAD commit:    bb2aee77 Add linux-next specific files for 20190926
> >>> git tree:       linux-next
> >>> console output: https://syzkaller.appspot.com/x/log.txt?x=13385ca360
> >>> kernel config:  https://syzkaller.appspot.com/x/.config?x=e60af4ac5a01e964
> >>> dashboard link:
> >>> https://syzkaller.appspot.com/bug?extid=24c12fa8d218ed26011a
> >>> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> >>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12abc2a360
> >>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11712c0560
> >>>
> >>> The bug was bisected to:
> >>>
> >>> commit e9e006f5fcf2bab59149cb38a48a4817c1b538b4
> >>> Author: Mike Christie
> >>> Date:   Sun Aug 4 19:10:06 2019 +
> >>>
> >>>     nbd: fix max number of supported devs
> >>>
> >>> bisection log:
> >>> https://syzkaller.appspot.com/x/bisect.txt?x=1226f3c560
> >>> final crash:
> >>> https://syzkaller.appspot.com/x/report.txt?x=1126f3c560
> >>> console output: https://syzkaller.appspot.com/x/log.txt?x=1626f3c560
> >>>
> >>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> >>> Reported-by: syzbot+24c12fa8d218ed260...@syzkaller.appspotmail.com
> >>> Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs")
> >>>
> >>> INFO: task syz-executor390:8778 can't die for more than 143 seconds.
> >>> syz-executor390 D27432 8778 8777 0x4004
> >>> Call Trace:
> >>> context_switch kernel/sched/core.c:3384 [inline]
> >>> __schedule+0x828/0x1c20 kernel/sched/core.c:4065
> >>> schedule+0xd9/0x260 kernel/sched/core.c:4132
> >>> schedule_timeout+0x717/0xc50 kernel/time/timer.c:1871
> >>> do_wait_for_common kernel/sched/completion.c:83 [inline]
> >>> __wait_for_common kernel/sched/completion.c:104 [inline]
> >>> wait_for_common kernel/sched/completion.c:115 [inline]
> >>> wait_for_completion+0x29c/0x440 kernel/sched/completion.c:136
> >>> flush_workqueue+0x40f/0x14c0 kernel/workqueue.c:2826
> >>> nbd_start_device_ioctl drivers/block/nbd.c:1272 [inline]
> >>> __nbd_ioctl drivers/block/nbd.c:1347 [inline]
> >>> nbd_ioctl+0xb2e/0xc44 drivers/block/nbd.c:1387
> >>> __blkdev_driver_ioctl block/ioctl.c:304 [inline]
> >>> blkdev_ioctl+0xedb/0x1c20 block/ioctl.c:606
> >>> block_ioctl+0xee/0x130 fs/block_dev.c:1954
> >>> vfs_ioctl fs/ioctl.c:47 [inline]
> >>> file_ioctl fs/ioctl.c:539 [inline]
> >>> do_vfs_ioctl+0xdb6/0x13e0 fs/ioctl.c:726
> >>> ksys_ioctl+0xab/0xd0 fs/ioctl.c:743
> >>> __do_sys_ioctl fs/ioctl.c:750
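
The quoted explanation in this message turns on one behaviour: some socket families have no working shutdown callout, so the shutdown request fails with -EOPNOTSUPP and a blocked receive is never woken. That behaviour can also be observed from userspace. The following is only a rough sketch, not the kernel check proposed for nbd_add_socket; the helper name supports_shutdown is hypothetical, and it assumes a Linux host where AF_NETLINK sockets can be created. Note that the probe is destructive: when shutdown() succeeds, the socket really is shut down.

```python
import errno
import socket


def supports_shutdown(sock: socket.socket) -> bool:
    """Hypothetical helper: report whether shutdown() works on this socket.

    Returns False when the socket family rejects the call with EOPNOTSUPP,
    True when the call succeeds. Any other error is re-raised. The probe is
    destructive: a successful call actually shuts the socket down.
    """
    try:
        sock.shutdown(socket.SHUT_RDWR)
    except OSError as exc:
        if exc.errno == errno.EOPNOTSUPP:
            return False
        raise
    return True


if __name__ == "__main__":
    # A connected AF_UNIX pair, one of the families NBD is normally used over.
    left, right = socket.socketpair()
    print("AF_UNIX supports shutdown:", supports_shutdown(left))
    left.close()
    right.close()

    # AF_NETLINK is the family the automated test passed to NBD_SET_SOCK.
    # It only exists on Linux, so guard the attribute and the creation.
    if hasattr(socket, "AF_NETLINK"):
        try:
            nl = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW,
                               socket.NETLINK_ROUTE)
        except OSError as exc:
            print("could not create an AF_NETLINK socket:", exc)
        else:
            with nl:
                print("AF_NETLINK supports shutdown:", supports_shutdown(nl))
```

A check inside the driver would presumably reject such a socket when it is added, before recv_work is ever queued, instead of probing it like this.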
Re: INFO: task hung in nbd_ioctl
On 09/30/2019 05:39 PM, syzbot wrote:
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:    bb2aee77 Add linux-next specific files for 20190926
> git tree:       linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=13385ca360
> kernel config:  https://syzkaller.appspot.com/x/.config?x=e60af4ac5a01e964
> dashboard link:
> https://syzkaller.appspot.com/bug?extid=24c12fa8d218ed26011a
> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12abc2a360
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11712c0560
>
> The bug was bisected to:
>
> commit e9e006f5fcf2bab59149cb38a48a4817c1b538b4
> Author: Mike Christie
> Date:   Sun Aug 4 19:10:06 2019 +
>
>     nbd: fix max number of supported devs
>
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=1226f3c560
> final crash:    https://syzkaller.appspot.com/x/report.txt?x=1126f3c560
> console output: https://syzkaller.appspot.com/x/log.txt?x=1626f3c560
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+24c12fa8d218ed260...@syzkaller.appspotmail.com
> Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs")
>
> INFO: task syz-executor390:8778 can't die for more than 143 seconds.
> syz-executor390 D27432 8778 8777 0x4004
> Call Trace:
> context_switch kernel/sched/core.c:3384 [inline]
> __schedule+0x828/0x1c20 kernel/sched/core.c:4065
> schedule+0xd9/0x260 kernel/sched/core.c:4132
> schedule_timeout+0x717/0xc50 kernel/time/timer.c:1871
> do_wait_for_common kernel/sched/completion.c:83 [inline]
> __wait_for_common kernel/sched/completion.c:104 [inline]
> wait_for_common kernel/sched/completion.c:115 [inline]
> wait_for_completion+0x29c/0x440 kernel/sched/completion.c:136
> flush_workqueue+0x40f/0x14c0 kernel/workqueue.c:2826
> nbd_start_device_ioctl drivers/block/nbd.c:1272 [inline]
> __nbd_ioctl drivers/block/nbd.c:1347 [inline]
> nbd_ioctl+0xb2e/0xc44 drivers/block/nbd.c:1387
> __blkdev_driver_ioctl block/ioctl.c:304 [inline]
> blkdev_ioctl+0xedb/0x1c20 block/ioctl.c:606
> block_ioctl+0xee/0x130 fs/block_dev.c:1954
> vfs_ioctl fs/ioctl.c:47 [inline]
> file_ioctl fs/ioctl.c:539 [inline]
> do_vfs_ioctl+0xdb6/0x13e0 fs/ioctl.c:726
> ksys_ioctl+0xab/0xd0 fs/ioctl.c:743
> __do_sys_ioctl fs/ioctl.c:750 [inline]
> __se_sys_ioctl fs/ioctl.c:748 [inline]
> __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:748
> do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x4452d9
> Code: Bad RIP value.
> RSP: 002b:7ffde928d288 EFLAGS: 0246 ORIG_RAX: 0010
> RAX: ffda RBX: RCX: 004452d9
> RDX: RSI: ab03 RDI: 0004
> RBP: R08: 004025b0 R09: 004025b0
> R10: R11: 0246 R12: 00402520
> R13: 004025b0 R14: R15:
> INFO: task syz-executor390:8778 blocked for more than 143 seconds.
> Not tainted 5.3.0-next-20190926 #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor390 D27432 8778 8777 0x4004
> Call Trace:
> context_switch kernel/sched/core.c:3384 [inline]
> __schedule+0x828/0x1c20 kernel/sched/core.c:4065
> schedule+0xd9/0x260 kernel/sched/core.c:4132
> schedule_timeout+0x717/0xc50 kernel/time/timer.c:1871
> do_wait_for_common kernel/sched/completion.c:83 [inline]
> __wait_for_common kernel/sched/completion.c:104 [inline]
> wait_for_common kernel/sched/completion.c:115 [inline]
> wait_for_completion+0x29c/0x440 kernel/sched/completion.c:136
> flush_workqueue+0x40f/0x14c0 kernel/workqueue.c:2826
> nbd_start_device_ioctl drivers/block/nbd.c:1272 [inline]
> __nbd_ioctl drivers/block/nbd.c:1347 [inline]
> nbd_ioctl+0xb2e/0xc44 drivers/block/nbd.c:1387
> __blkdev_driver_ioctl block/ioctl.c:304 [inline]
> blkdev_ioctl+0xedb/0x1c20 block/ioctl.c:606
> block_ioctl+0xee/0x130 fs/block_dev.c:1954
> vfs_ioctl fs/ioctl.c:47 [inline]
> file_ioctl fs/ioctl.c:539 [inline]
> do_vfs_ioctl+0xdb6/0x13e0 fs/ioctl.c:726
> ksys_ioctl+0xab/0xd0 fs/ioctl.c:743
> __do_sys_ioctl fs/ioctl.c:750 [inline]
> __se_sys_ioctl fs/ioctl.c:748 [inline]
> __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:748
> do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x4452d9
> Code: Bad RIP value.
> RSP: 002b:7ffde928d288 EFLAGS: 0246 ORIG_RAX: 0010
> RAX: ffda RBX: RCX: 004452d9
> RDX: RSI: ab03 RDI: 0004
> RBP: R08: 004025b0 R09: 004025b0
> R10: R11: 0246 R12: 00402520
> R13: 004025b0 R14: R15:
>

I will send a fix for