Re: INFO: task hung in stop_sync_thread (2)

2018-03-30 Thread Dmitry Vyukov
On Fri, Mar 30, 2018 at 6:46 PM, syzbot
 wrote:
> syzbot has found reproducer for the following crash on net-next commit

/\/\/\/\/\/\

now syzbot can test fixes

> 18845557fd6fc1998f2d0d8c30467f86db587529 (Thu Mar 29 20:24:06 2018 +)
> Merge tag 'wireless-drivers-next-for-davem-2018-03-29' of
> git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=5fe074c01b2032ce9618
>
> So far this crash happened 2 times on net-next.
> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5531070485233664
> syzkaller reproducer:
> https://syzkaller.appspot.com/x/repro.syz?id=5632385408303104
> Raw console output:
> https://syzkaller.appspot.com/x/log.txt?id=6520566391504896
> Kernel config: https://syzkaller.appspot.com/x/.config?id=-37309782588693906
> compiler: gcc (GCC) 7.1.1 20170620
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+5fe074c01b2032ce9...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed.


Re: INFO: task hung in stop_sync_thread (2)

2018-03-30 Thread syzbot

syzbot has found reproducer for the following crash on net-next commit
18845557fd6fc1998f2d0d8c30467f86db587529 (Thu Mar 29 20:24:06 2018 +)
Merge tag 'wireless-drivers-next-for-davem-2018-03-29' of  
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=5fe074c01b2032ce9618


So far this crash happened 2 times on net-next.
C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5531070485233664
syzkaller reproducer:  
https://syzkaller.appspot.com/x/repro.syz?id=5632385408303104
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=6520566391504896

Kernel config: https://syzkaller.appspot.com/x/.config?id=-37309782588693906
compiler: gcc (GCC) 7.1.1 20170620

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+5fe074c01b2032ce9...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed.

INFO: task syzkaller923914:4319 blocked for more than 120 seconds.
  Not tainted 4.16.0-rc6+ #286
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syzkaller923914 D23312  4319   4316 0x
Call Trace:
 context_switch kernel/sched/core.c:2862 [inline]
 __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
 schedule+0xf5/0x430 kernel/sched/core.c:3499
 schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
 do_wait_for_common kernel/sched/completion.c:86 [inline]
 __wait_for_common kernel/sched/completion.c:107 [inline]
 wait_for_common kernel/sched/completion.c:118 [inline]
 wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
 kthread_stop+0x14a/0x7a0 kernel/kthread.c:530
 stop_sync_thread+0x3d9/0x740 net/netfilter/ipvs/ip_vs_sync.c:1996
 do_ip_vs_set_ctl+0x2b1/0x1cc0 net/netfilter/ipvs/ip_vs_ctl.c:2394
 nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
 nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115
 ip_setsockopt+0x97/0xa0 net/ipv4/ip_sockglue.c:1253
 udp_setsockopt+0x45/0x80 net/ipv4/udp.c:2400
 sock_common_setsockopt+0x95/0xd0 net/core/sock.c:3039
 SYSC_setsockopt net/socket.c:1850 [inline]
 SyS_setsockopt+0x189/0x360 net/socket.c:1829
 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x4459d9
RSP: 002b:7f1d6f47cdb8 EFLAGS: 0297 ORIG_RAX: 0036
RAX: ffda RBX: 006dac24 RCX: 004459d9
RDX: 048c RSI:  RDI: 0008
RBP: 006dac20 R08: 0018 R09: 
R10: 2000 R11: 0297 R12: 
R13: 7fffcf128acf R14: 7f1d6f47d9c0 R15: 0001
INFO: lockdep is turned off.
NMI backtrace for cpu 1
CPU: 1 PID: 869 Comm: khungtaskd Not tainted 4.16.0-rc6+ #286
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x194/0x24d lib/dump_stack.c:53
 nmi_cpu_backtrace+0x1d2/0x210 lib/nmi_backtrace.c:103
 nmi_trigger_cpumask_backtrace+0x123/0x180 lib/nmi_backtrace.c:62
 arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
 trigger_all_cpu_backtrace include/linux/nmi.h:138 [inline]
 check_hung_task kernel/hung_task.c:132 [inline]
 check_hung_uninterruptible_tasks kernel/hung_task.c:190 [inline]
 watchdog+0x90c/0xd60 kernel/hung_task.c:249
 kthread+0x33c/0x400 kernel/kthread.c:238
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 4317 Comm: syzkaller923914 Not tainted 4.16.0-rc6+ #286
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

RIP: 0010:__sanitizer_cov_trace_pc+0x0/0x50
RSP: 0018:8801b56df138 EFLAGS: 0093
RAX: 8801cbaae640 RBX:  RCX: 866a3971
RDX:  RSI: 0040 RDI: 8801db218038
RBP: 8801b56df168 R08: 88021fff801c R09: 88021fff8008
R10: 88021fff8010 R11: 88021fff801d R12: 8801db218038
R13: 0040 R14: 8801db218038 R15: 
FS:  01968880() GS:8801db20() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: ff600400 CR3: 0001c94dc001 CR4: 001606f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 cpumask_next+0x24/0x30 lib/cpumask.c:21
 select_idle_smt kernel/sched/fair.c:6116 [inline]
 select_idle_sibling+0x86d/0xda0 kernel/sched/fair.c:6238
 select_task_rq_fair+0xe0a/0x2910 kernel/sched/fair.c:6394
 select_task_rq kernel/sched/core.c:1554 [inline]
 try_to_wake_up+0x4ee/0x15f0 kernel/sched/core.c:2064
 default_wake_function+0x30/0x50 kernel/sched/core.c:3693
 autoremove_wake_function+0x78/0x350 kernel/sched/wait.c:377
 __wake_up_common+0x18e/0x780 kernel/sched/wait.c:97
 __wake_up_common_lock+0x1b4/0

Re: INFO: task hung in stop_sync_thread (2)

2018-03-29 Thread Julian Anastasov

Hello,

On Thu, 29 Mar 2018, syzbot wrote:

> syzbot hit the following crash on net-next commit
> 5d22d47b9ed96eddb35821dc2cc4f629f45827f7 (Tue Mar 27 17:33:21 2018 +)
> Merge branch 'sfc-filter-locking'
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=5fe074c01b2032ce9618
> 
> Unfortunately, I don't have any reproducer for this crash yet.
> Raw console output:
> https://syzkaller.appspot.com/x/log.txt?id=6119456711376896
> Kernel config: https://syzkaller.appspot.com/x/.config?id=4372867303600475372
> compiler: gcc (GCC) 7.1.1 20170620
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+5fe074c01b2032ce9...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for details.
> If you forward the report, please keep this part and the footer.
> 
> IPVS: sync thread started: state = BACKUP, mcast_ifn = syz_tun, syncid = 4, id
> IPVS: = 0
> IPVS: stopping backup sync thread 25415 ...
> INFO: task syz-executor7:25421 blocked for more than 120 seconds.

I think, I know what happens: start_sync_thread holds
rtnl_lock while calling kthread_stop on error. The backup kthread is
signalled and it tries to rtnl_lock in sock_release => lockup.
I think, my fix will solve both reports, only that there is no
reproducer for this one to test the fix with test command.

>  Not tainted 4.16.0-rc6+ #284
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor7   D23688 25421   4408 0x0004
> Call Trace:
> context_switch kernel/sched/core.c:2862 [inline]
> __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> schedule+0xf5/0x430 kernel/sched/core.c:3499
> schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> do_wait_for_common kernel/sched/completion.c:86 [inline]
> __wait_for_common kernel/sched/completion.c:107 [inline]
> wait_for_common kernel/sched/completion.c:118 [inline]
> wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
> kthread_stop+0x14a/0x7a0 kernel/kthread.c:530
> stop_sync_thread+0x3d9/0x740 net/netfilter/ipvs/ip_vs_sync.c:1996
> do_ip_vs_set_ctl+0x2b1/0x1cc0 net/netfilter/ipvs/ip_vs_ctl.c:2394
> nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
> nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115
> ip_setsockopt+0x97/0xa0 net/ipv4/ip_sockglue.c:1253
> sctp_setsockopt+0x2ca/0x63e0 net/sctp/socket.c:4154
> sock_common_setsockopt+0x95/0xd0 net/core/sock.c:3039
> SYSC_setsockopt net/socket.c:1850 [inline]
> SyS_setsockopt+0x189/0x360 net/socket.c:1829
> do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
> entry_SYSCALL_64_after_hwframe+0x42/0xb7
> RIP: 0033:0x454889
> RSP: 002b:7fc927626c68 EFLAGS: 0246 ORIG_RAX: 0036
> RAX: ffda RBX: 7fc9276276d4 RCX: 00454889
> RDX: 048c RSI:  RDI: 0017
> RBP: 0072bf58 R08: 0018 R09: 
> R10: 2000 R11: 0246 R12: 
> R13: 051c R14: 006f9b40 R15: 0001
> 
> Showing all locks held in the system:
> 2 locks held by khungtaskd/868:
> #0:  (rcu_read_lock){}, at: []
> check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
> #0:  (rcu_read_lock){}, at: [] watchdog+0x1c5/0xd60
> kernel/hung_task.c:249
> #1:  (tasklist_lock){.+.+}, at: [<37c2f8f9>]
> debug_show_all_locks+0xd3/0x3d0 kernel/locking/lockdep.c:4470
> 1 lock held by rsyslogd/4247:
> #0:  (&f->f_pos_lock){+.+.}, at: [<0d8d6983>] __fdget_pos+0x12b/0x190
> fs/file.c:765
> 2 locks held by getty/4338:
> #0:  (&tty->ldisc_sem){}, at: []
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> #1:  (&ldata->atomic_read_lock){+.+.}, at: []
> n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4339:
> #0:  (&tty->ldisc_sem){}, at: []
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> #1:  (&ldata->atomic_read_lock){+.+.}, at: []
> n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4340:
> #0:  (&tty->ldisc_sem){}, at: []
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> #1:  (&ldata->atomic_read_lock){+.+.}, at: []
> n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4341:
> #0:  (&tty->ldisc_sem){}, at: []
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> #1:  (&ldata->atomic_read_lock){+.+.}, at: []
> n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4342:
> #0:  (&tty->ldisc_sem){}, at: []
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> #1:  (&ldata->atomic_read_lock){+.+.}, at: []
> n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4343:
> #0:  (&tty->ldisc_se