Re: INFO: task hung in linkwatch_event (2)

2020-12-10 Thread syzbot
syzbot has found a reproducer for the following issue on:

HEAD commit:a7105e34 Merge branch 'hns3-next'
git tree:   net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=155af80f50
kernel config:  https://syzkaller.appspot.com/x/.config?x=2ac2dabe250b3a58
dashboard link: https://syzkaller.appspot.com/bug?extid=96ff6cfc4551fcc29342
compiler:   gcc (GCC) 10.1.0-syz 20200507
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=11bc7b1350
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1674046b50

The issue was bisected to:

commit 386d4716fd91869e07c731657f2cde5a33086516
Author: Luo bin 
Date:   Thu Feb 27 06:34:44 2020 +

hinic: fix a bug of rss configuration

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=16626fcfe0
final oops: https://syzkaller.appspot.com/x/report.txt?x=15626fcfe0
console output: https://syzkaller.appspot.com/x/log.txt?x=11626fcfe0

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+96ff6cfc4551fcc29...@syzkaller.appspotmail.com
Fixes: 386d4716fd91 ("hinic: fix a bug of rss configuration")

INFO: task kworker/0:2:3004 blocked for more than 143 seconds.
  Not tainted 5.10.0-rc6-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/0:2 state:D stack:28448 pid: 3004 ppid: 2 flags:0x4000
Workqueue: events linkwatch_event
Call Trace:
 context_switch kernel/sched/core.c:3779 [inline]
 __schedule+0x893/0x2130 kernel/sched/core.c:4528
 schedule+0xcf/0x270 kernel/sched/core.c:4606
 schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:4665
 __mutex_lock_common kernel/locking/mutex.c:1033 [inline]
 __mutex_lock+0x3e2/0x10e0 kernel/locking/mutex.c:1103
 linkwatch_event+0xb/0x60 net/core/link_watch.c:250
 process_one_work+0x933/0x15a0 kernel/workqueue.c:2272
 worker_thread+0x64c/0x1120 kernel/workqueue.c:2418
 kthread+0x3b1/0x4a0 kernel/kthread.c:292
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296
INFO: task kworker/0:0:8837 blocked for more than 143 seconds.
  Not tainted 5.10.0-rc6-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/0:0 state:D stack:29768 pid: 8837 ppid: 2 flags:0x4000
Workqueue: ipv6_addrconf addrconf_verify_work
Call Trace:
 context_switch kernel/sched/core.c:3779 [inline]
 __schedule+0x893/0x2130 kernel/sched/core.c:4528
 schedule+0xcf/0x270 kernel/sched/core.c:4606
 schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:4665
 __mutex_lock_common kernel/locking/mutex.c:1033 [inline]
 __mutex_lock+0x3e2/0x10e0 kernel/locking/mutex.c:1103
 addrconf_verify_work+0xa/0x20 net/ipv6/addrconf.c:4569
 process_one_work+0x933/0x15a0 kernel/workqueue.c:2272
 worker_thread+0x64c/0x1120 kernel/workqueue.c:2418
 kthread+0x3b1/0x4a0 kernel/kthread.c:292
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296

Showing all locks held in the system:
1 lock held by khungtaskd/1655:
 #0: 8b337a20 (rcu_read_lock){}-{1:2}, at: 
debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6254
3 locks held by kworker/0:2/3004:
 #0: 888010064d38 ((wq_completion)events){+.+.}-{0:0}, at: 
arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
 #0: 888010064d38 ((wq_completion)events){+.+.}-{0:0}, at: atomic64_set 
include/asm-generic/atomic-instrumented.h:856 [inline]
 #0: 888010064d38 ((wq_completion)events){+.+.}-{0:0}, at: atomic_long_set 
include/asm-generic/atomic-long.h:41 [inline]
 #0: 888010064d38 ((wq_completion)events){+.+.}-{0:0}, at: set_work_data 
kernel/workqueue.c:616 [inline]
 #0: 888010064d38 ((wq_completion)events){+.+.}-{0:0}, at: 
set_work_pool_and_clear_pending kernel/workqueue.c:643 [inline]
 #0: 888010064d38 ((wq_completion)events){+.+.}-{0:0}, at: 
process_one_work+0x821/0x15a0 kernel/workqueue.c:2243
 #1: c90001dafda8 ((linkwatch_work).work){+.+.}-{0:0}, at: 
process_one_work+0x854/0x15a0 kernel/workqueue.c:2247
 #2: 8c92d448 (rtnl_mutex){+.+.}-{3:3}, at: linkwatch_event+0xb/0x60 
net/core/link_watch.c:250
1 lock held by in:imklog/8186:
 #0: 888017c900f0 (>f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 
fs/file.c:932
2 locks held by syz-executor047/8830:
3 locks held by kworker/0:0/8837:
 #0: 888147499138 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: 
arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
 #0: 888147499138 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: 
atomic64_set include/asm-generic/atomic-instrumented.h:856 [inline]
 #0: 888147499138 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: 
atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
 #0: 888147499138 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: 
set_work_data kernel/workqueue.c:616 [inline]
 #0: 888147499138 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: 
set_work_pool_and_clear_pending kernel/workqueue.c:643 [inline]
 #0: 

Re: INFO: task hung in linkwatch_event (2)

2020-05-06 Thread Yunsheng Lin
+cc Xiaoguang & Jens
On 2020/5/6 14:56, Hillf Danton wrote:
> 
> Wed, 29 Apr 2020 02:59:13 -0700
>> syzbot found the following crash on:
>>
>> HEAD commit:b4f63322 Merge branch 'for-linus' of git://git.kernel.org/..
>> git tree:   upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=1558936fe0
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=b7a70e992f2f9b68
>> dashboard link: https://syzkaller.appspot.com/bug?extid=96ff6cfc4551fcc29342
>> compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
>> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=14a5782810
>>
>> The bug was bisected to:
>>
>> commit 386d4716fd91869e07c731657f2cde5a33086516
>> Author: Luo bin 
>> Date:   Thu Feb 27 06:34:44 2020 +
>>
>> hinic: fix a bug of rss configuration
>>
>> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=16626fcfe0
>> final crash:https://syzkaller.appspot.com/x/report.txt?x=15626fcfe0
>> console output: https://syzkaller.appspot.com/x/log.txt?x=11626fcfe0
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+96ff6cfc4551fcc29...@syzkaller.appspotmail.com
>> Fixes: 386d4716fd91 ("hinic: fix a bug of rss configuration")
>>
>> INFO: task kworker/1:5:2724 blocked for more than 143 seconds.
>>   Not tainted 5.7.0-rc2-syzkaller #0
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> kworker/1:5 D27416  2724  2 0x80004000
>> Workqueue: events linkwatch_event
>> Call Trace:
>>  schedule+0xd0/0x2a0 kernel/sched/core.c:4163
>>  schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:4222
>>  __mutex_lock_common kernel/locking/mutex.c:1033 [inline]
>>  __mutex_lock+0x7ab/0x13c0 kernel/locking/mutex.c:1103
>>  linkwatch_event+0xb/0x60 net/core/link_watch.c:242
>>  process_one_work+0x965/0x16a0 kernel/workqueue.c:2268
>>  worker_thread+0x96/0xe20 kernel/workqueue.c:2414
>>  kthread+0x388/0x470 kernel/kthread.c:268
>>  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
>> INFO: task syz-executor.0:7053 blocked for more than 143 seconds.
>>   Not tainted 5.7.0-rc2-syzkaller #0
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> syz-executor.0  D23512  7053  1 0x80004006
>> Call Trace:
>>  schedule+0xd0/0x2a0 kernel/sched/core.c:4163
>>  schedule_timeout+0x55b/0x850 kernel/time/timer.c:1874
>>  do_wait_for_common kernel/sched/completion.c:85 [inline]
>>  __wait_for_common kernel/sched/completion.c:106 [inline]
>>  wait_for_common kernel/sched/completion.c:117 [inline]
>>  wait_for_completion+0x16a/0x270 kernel/sched/completion.c:138
>>  __flush_work+0x4fd/0xa80 kernel/workqueue.c:3045
>>  flush_all_backlogs net/core/dev.c:5527 [inline]
>>  rollback_registered_many+0x562/0xe70 net/core/dev.c:8813
>>  rollback_registered+0xf2/0x1c0 net/core/dev.c:8873
>>  unregister_netdevice_queue net/core/dev.c:9969 [inline]
>>  unregister_netdevice_queue+0x1d7/0x2b0 net/core/dev.c:9962
>>  unregister_netdevice include/linux/netdevice.h:2725 [inline]
>>  __tun_detach+0xe42/0x1110 drivers/net/tun.c:690
>>  tun_detach drivers/net/tun.c:707 [inline]
>>  tun_chr_close+0xd9/0x180 drivers/net/tun.c:3413
>>  __fput+0x33e/0x880 fs/file_table.c:280
>>  task_work_run+0xf4/0x1b0 kernel/task_work.c:123
>>  exit_task_work include/linux/task_work.h:22 [inline]
>>  do_exit+0xb34/0x2dd0 kernel/exit.c:795
>>  do_group_exit+0x125/0x340 kernel/exit.c:893
>>  get_signal+0x47b/0x24e0 kernel/signal.c:2739
>>  do_signal+0x81/0x2240 arch/x86/kernel/signal.c:784
>>  exit_to_usermode_loop+0x26c/0x360 arch/x86/entry/common.c:161
>>  prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
>>  syscall_return_slowpath arch/x86/entry/common.c:279 [inline]
>>  do_syscall_64+0x6b1/0x7d0 arch/x86/entry/common.c:305
>>  entry_SYSCALL_64_after_hwframe+0x49/0xb3
>> RIP: 0033:0x4166ca
>> Code: Bad RIP value.
>> RSP: 002b:7ffd4022d478 EFLAGS: 0246 ORIG_RAX: 003d
>> RAX: fe00 RBX: 01d60940 RCX: 004166ca
>> RDX: 4000 RSI: 7ffd4022d4b0 RDI: 
>> RBP: 2996 R08: 0001 R09: 0001
>> R10:  R11: 0246 R12: 0003
>> R13: 7ffd4022d4b0 R14: 01d6099b R15: 7ffd4022d4c0
>>
>> Showing all locks held in the system:
>> 1 lock held by khungtaskd/1125:
>>  #0: 899beb00 (rcu_read_lock){}-{1:2}, at: 
>> debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:5754
>> 3 locks held by kworker/1:5/2724:
>>  #0: 8880aa026d38 ((wq_completion)events){+.+.}-{0:0}, at: 
>> __write_once_size include/linux/compiler.h:226 [inline]
>>  #0: 8880aa026d38 ((wq_completion)events){+.+.}-{0:0}, at: 
>> arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
>>  #0: 8880aa026d38 ((wq_completion)events){+.+.}-{0:0}, at: atomic64_set 
>> include/asm-generic/atomic-instrumented.h:855 [inline]
>>  #0: 

Re: INFO: task hung in linkwatch_event (2)

2020-05-06 Thread Yunsheng Lin
On 2020/5/6 12:25, Hillf Danton wrote:
> 
> On Wed, 6 May 2020 09:38:21 Yunsheng Lin wrote:
>>
>> On 2020/4/29 17:59, syzbot wrote:
>>> Hello,
>>>
>>> syzbot found the following crash on:
>>>
>>> HEAD commit:b4f63322 Merge branch 'for-linus' of git://git.kernel.org/..
>>> git tree:   upstream
>>> console output: https://syzkaller.appspot.com/x/log.txt?x=1558936fe0
>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=b7a70e992f2f9b68
>>> dashboard link: https://syzkaller.appspot.com/bug?extid=96ff6cfc4551fcc29342
>>> compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
>>> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=14a5782810
>>>
>>> The bug was bisected to:
>>>
>>> commit 386d4716fd91869e07c731657f2cde5a33086516
>>> Author: Luo bin 
>>> Date:   Thu Feb 27 06:34:44 2020 +
>>>
>>> hinic: fix a bug of rss configuration
>>
>> The above patch does not seem to be the cause of the crash.
>>
>> From the below call trace, it seems the blocking is caused by
>> the tun_detach() which need to flush the all the pending work
> 
> queued on system_highpri_wq
> 
>> for each online cpu, it is the linkwatch_work that need to be
>> flushed in this crash case.
> 
> Not so sure it's linkwatch_work because it's on system_wq.

Yes, you are right. The work of flush_backlog() is queued on
system_highpri_wq, and maybe that is the work tun_detach() is
trying to flush.

So the tun_detach is flushing a work queued on the system_highpri_wq
while holding the RTNL lock, and linkwatch_event work is running to
try to take the RTNL lock, do they compete for the same worker in the
same cpu even they are queued for different wq? I do not understand
wq very well, if Yes, there may be a dead loop here?

> 
>> But the linkwatch_work() need to take
>> RTNL lock, which is already taken by the tun_detach(), and that is
>> where the blocking is happening.
>>
>> Possible way to fix or avoid this:
>> 1. Call flush_all_backlogs() without holding the RTNL lock, I am not
>>sure it is safe to do this.
>> 2. Disabling adding link event to the unregisterring netdev, and flush
>>all the pending link event without taking RTNL lock before calling
>>unregister_netdevice() in tun_detach().
>>
>> Any better suggestion? Thanks.
>>
> Not before some extra info about what's going on the highpri wq is available.
> 
>>>
>>> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=16626fcfe0
>>> final crash:https://syzkaller.appspot.com/x/report.txt?x=15626fcfe0
>>> console output: https://syzkaller.appspot.com/x/log.txt?x=11626fcfe0
>>>
>>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>>> Reported-by: syzbot+96ff6cfc4551fcc29...@syzkaller.appspotmail.com
>>> Fixes: 386d4716fd91 ("hinic: fix a bug of rss configuration")
>>>
>>> INFO: task kworker/1:5:2724 blocked for more than 143 seconds.
>>>   Not tainted 5.7.0-rc2-syzkaller #0
>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> kworker/1:5 D27416  2724  2 0x80004000
>>> Workqueue: events linkwatch_event
>>> Call Trace:
>>>  schedule+0xd0/0x2a0 kernel/sched/core.c:4163
>>>  schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:4222
>>>  __mutex_lock_common kernel/locking/mutex.c:1033 [inline]
>>>  __mutex_lock+0x7ab/0x13c0 kernel/locking/mutex.c:1103
>>>  linkwatch_event+0xb/0x60 net/core/link_watch.c:242
>>>  process_one_work+0x965/0x16a0 kernel/workqueue.c:2268
>>>  worker_thread+0x96/0xe20 kernel/workqueue.c:2414
>>>  kthread+0x388/0x470 kernel/kthread.c:268
>>>  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
>>> INFO: task syz-executor.0:7053 blocked for more than 143 seconds.
>>>   Not tainted 5.7.0-rc2-syzkaller #0
>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> syz-executor.0  D23512  7053  1 0x80004006
>>> Call Trace:
>>>  schedule+0xd0/0x2a0 kernel/sched/core.c:4163
>>>  schedule_timeout+0x55b/0x850 kernel/time/timer.c:1874
>>>  do_wait_for_common kernel/sched/completion.c:85 [inline]
>>>  __wait_for_common kernel/sched/completion.c:106 [inline]
>>>  wait_for_common kernel/sched/completion.c:117 [inline]
>>>  wait_for_completion+0x16a/0x270 kernel/sched/completion.c:138
>>>  __flush_work+0x4fd/0xa80 kernel/workqueue.c:3045
>>>  flush_all_backlogs net/core/dev.c:5527 [inline]
>>>  rollback_registered_many+0x562/0xe70 net/core/dev.c:8813
>>>  rollback_registered+0xf2/0x1c0 net/core/dev.c:8873
>>>  unregister_netdevice_queue net/core/dev.c:9969 [inline]
>>>  unregister_netdevice_queue+0x1d7/0x2b0 net/core/dev.c:9962
>>>  unregister_netdevice include/linux/netdevice.h:2725 [inline]
>>>  __tun_detach+0xe42/0x1110 drivers/net/tun.c:690
>>>  tun_detach drivers/net/tun.c:707 [inline]
>>>  tun_chr_close+0xd9/0x180 drivers/net/tun.c:3413
>>>  __fput+0x33e/0x880 fs/file_table.c:280
>>>  task_work_run+0xf4/0x1b0 kernel/task_work.c:123
>>>  exit_task_work include/linux/task_work.h:22 [inline]
>>>  

Re: INFO: task hung in linkwatch_event (2)

2020-05-05 Thread Yunsheng Lin
On 2020/4/29 17:59, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:b4f63322 Merge branch 'for-linus' of git://git.kernel.org/..
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1558936fe0
> kernel config:  https://syzkaller.appspot.com/x/.config?x=b7a70e992f2f9b68
> dashboard link: https://syzkaller.appspot.com/bug?extid=96ff6cfc4551fcc29342
> compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=14a5782810
> 
> The bug was bisected to:
> 
> commit 386d4716fd91869e07c731657f2cde5a33086516
> Author: Luo bin 
> Date:   Thu Feb 27 06:34:44 2020 +
> 
> hinic: fix a bug of rss configuration

The above patch does not seem to be the cause of the crash.

>From the below call trace, it seems the blocking is caused by
the tun_detach() which need to flush the all the pending work
for each online cpu, it is the linkwatch_work that need to be
flushed in this crash case. But the linkwatch_work() need to take
RTNL lock, which is already taken by the tun_detach(), and that is
where the blocking is happening.

Possible way to fix or avoid this:
1. Call flush_all_backlogs() without holding the RTNL lock, I am not
   sure it is safe to do this.
2. Disabling adding link event to the unregisterring netdev, and flush
   all the pending link event without taking RTNL lock before calling
   unregister_netdevice() in tun_detach().

Any better suggestion? Thanks.

> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=16626fcfe0
> final crash:https://syzkaller.appspot.com/x/report.txt?x=15626fcfe0
> console output: https://syzkaller.appspot.com/x/log.txt?x=11626fcfe0
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+96ff6cfc4551fcc29...@syzkaller.appspotmail.com
> Fixes: 386d4716fd91 ("hinic: fix a bug of rss configuration")
> 
> INFO: task kworker/1:5:2724 blocked for more than 143 seconds.
>   Not tainted 5.7.0-rc2-syzkaller #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/1:5 D27416  2724  2 0x80004000
> Workqueue: events linkwatch_event
> Call Trace:
>  schedule+0xd0/0x2a0 kernel/sched/core.c:4163
>  schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:4222
>  __mutex_lock_common kernel/locking/mutex.c:1033 [inline]
>  __mutex_lock+0x7ab/0x13c0 kernel/locking/mutex.c:1103
>  linkwatch_event+0xb/0x60 net/core/link_watch.c:242
>  process_one_work+0x965/0x16a0 kernel/workqueue.c:2268
>  worker_thread+0x96/0xe20 kernel/workqueue.c:2414
>  kthread+0x388/0x470 kernel/kthread.c:268
>  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
> INFO: task syz-executor.0:7053 blocked for more than 143 seconds.
>   Not tainted 5.7.0-rc2-syzkaller #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor.0  D23512  7053  1 0x80004006
> Call Trace:
>  schedule+0xd0/0x2a0 kernel/sched/core.c:4163
>  schedule_timeout+0x55b/0x850 kernel/time/timer.c:1874
>  do_wait_for_common kernel/sched/completion.c:85 [inline]
>  __wait_for_common kernel/sched/completion.c:106 [inline]
>  wait_for_common kernel/sched/completion.c:117 [inline]
>  wait_for_completion+0x16a/0x270 kernel/sched/completion.c:138
>  __flush_work+0x4fd/0xa80 kernel/workqueue.c:3045
>  flush_all_backlogs net/core/dev.c:5527 [inline]
>  rollback_registered_many+0x562/0xe70 net/core/dev.c:8813
>  rollback_registered+0xf2/0x1c0 net/core/dev.c:8873
>  unregister_netdevice_queue net/core/dev.c:9969 [inline]
>  unregister_netdevice_queue+0x1d7/0x2b0 net/core/dev.c:9962
>  unregister_netdevice include/linux/netdevice.h:2725 [inline]
>  __tun_detach+0xe42/0x1110 drivers/net/tun.c:690
>  tun_detach drivers/net/tun.c:707 [inline]
>  tun_chr_close+0xd9/0x180 drivers/net/tun.c:3413
>  __fput+0x33e/0x880 fs/file_table.c:280
>  task_work_run+0xf4/0x1b0 kernel/task_work.c:123
>  exit_task_work include/linux/task_work.h:22 [inline]
>  do_exit+0xb34/0x2dd0 kernel/exit.c:795
>  do_group_exit+0x125/0x340 kernel/exit.c:893
>  get_signal+0x47b/0x24e0 kernel/signal.c:2739
>  do_signal+0x81/0x2240 arch/x86/kernel/signal.c:784
>  exit_to_usermode_loop+0x26c/0x360 arch/x86/entry/common.c:161
>  prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
>  syscall_return_slowpath arch/x86/entry/common.c:279 [inline]
>  do_syscall_64+0x6b1/0x7d0 arch/x86/entry/common.c:305
>  entry_SYSCALL_64_after_hwframe+0x49/0xb3
> RIP: 0033:0x4166ca
> Code: Bad RIP value.
> RSP: 002b:7ffd4022d478 EFLAGS: 0246 ORIG_RAX: 003d
> RAX: fe00 RBX: 01d60940 RCX: 004166ca
> RDX: 4000 RSI: 7ffd4022d4b0 RDI: 
> RBP: 2996 R08: 0001 R09: 0001
> R10:  R11: 0246 R12: 0003
> R13: 7ffd4022d4b0 R14: 01d6099b R15: 7ffd4022d4c0
> 
> Showing all 

Re: INFO: task hung in linkwatch_event

2018-02-14 Thread Dmitry Vyukov
On Wed, Feb 14, 2018 at 2:59 PM, syzbot
 wrote:
> Hello,
>
> syzbot hit the following crash on upstream commit
> a2e5790d841658485d642196dbb0927303d6c22f (Wed Feb 7 06:15:42 2018 +)
> Merge branch 'akpm' (patches from Andrew)
>
> Unfortunately, I don't have any reproducer for this crash yet.
> Raw console output is attached.
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached.

Another hang on rtnl lock:

#syz dup: INFO: task hung in netdev_run_todo

> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+a3926c3116d237751...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
>
> INFO: task kworker/1:3:1828 blocked for more than 120 seconds.
>   Not tainted 4.15.0+ #301
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/1:3 D19968  1828  2 0x8000
> Workqueue: events linkwatch_event
> Call Trace:
>  context_switch kernel/sched/core.c:2857 [inline]
>  __schedule+0x8e2/0x2040 kernel/sched/core.c:3435
>  schedule+0xf5/0x430 kernel/sched/core.c:3494
>  schedule_preempt_disabled+0x10/0x20 kernel/sched/core.c:3552
>  __mutex_lock_common kernel/locking/mutex.c:833 [inline]
>  __mutex_lock+0xaad/0x1a80 kernel/locking/mutex.c:893
>  mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
>  rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74
>  linkwatch_event+0x57/0xc0 net/core/link_watch.c:236
>  process_one_work+0xbbf/0x1af0 kernel/workqueue.c:2113
>  worker_thread+0x223/0x1990 kernel/workqueue.c:2247
>  kthread+0x33c/0x400 kernel/kthread.c:238
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:429
>
> Showing all locks held in the system:
> 3 locks held by kworker/1:1/23:
>  #0:  ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [<3027d709>]
> process_one_work+0xaaf/0x1af0 kernel/workqueue.c:2084
>  #1:  ((addr_chk_work).work){+.+.}, at: []
> process_one_work+0xb01/0x1af0 kernel/workqueue.c:2088
>  #2:  (rtnl_mutex){+.+.}, at: [<55e4e532>] rtnl_lock+0x17/0x20
> net/core/rtnetlink.c:74
> 2 locks held by khungtaskd/760:
>  #0:  (rcu_read_lock){}, at: [<7d511abd>]
> check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
>  #0:  (rcu_read_lock){}, at: [<7d511abd>] watchdog+0x1c5/0xd60
> kernel/hung_task.c:249
>  #1:  (tasklist_lock){.+.+}, at: [<78b2aec7>]
> debug_show_all_locks+0xd3/0x3d0 kernel/locking/lockdep.c:4470
> 3 locks held by kworker/1:3/1828:
>  #0:  ((wq_completion)"events"){+.+.}, at: [<3027d709>]
> process_one_work+0xaaf/0x1af0 kernel/workqueue.c:2084
>  #1:  ((linkwatch_work).work){+.+.}, at: []
> process_one_work+0xb01/0x1af0 kernel/workqueue.c:2088
>  #2:  (rtnl_mutex){+.+.}, at: [<55e4e532>] rtnl_lock+0x17/0x20
> net/core/rtnetlink.c:74
> 1 lock held by rsyslogd/3879:
>  #0:  (>f_pos_lock){+.+.}, at: [<1e0f1a09>]
> __fdget_pos+0x12b/0x190 fs/file.c:765
> 2 locks held by getty/4002:
>  #0:  (>ldisc_sem){}, at: [<4b23cd0c>]
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>  #1:  (>atomic_read_lock){+.+.}, at: [<7b8990ce>]
> n_tty_read+0x2ef/0x1a00 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4003:
>  #0:  (>ldisc_sem){}, at: [<4b23cd0c>]
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>  #1:  (>atomic_read_lock){+.+.}, at: [<7b8990ce>]
> n_tty_read+0x2ef/0x1a00 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4004:
>  #0:  (>ldisc_sem){}, at: [<4b23cd0c>]
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>  #1:  (>atomic_read_lock){+.+.}, at: [<7b8990ce>]
> n_tty_read+0x2ef/0x1a00 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4005:
>  #0:  (>ldisc_sem){}, at: [<4b23cd0c>]
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>  #1:  (>atomic_read_lock){+.+.}, at: [<7b8990ce>]
> n_tty_read+0x2ef/0x1a00 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4006:
>  #0:  (>ldisc_sem){}, at: [<4b23cd0c>]
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>  #1:  (>atomic_read_lock){+.+.}, at: [<7b8990ce>]
> n_tty_read+0x2ef/0x1a00 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4007:
>  #0:  (>ldisc_sem){}, at: [<4b23cd0c>]
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>  #1:  (>atomic_read_lock){+.+.}, at: [<7b8990ce>]
> n_tty_read+0x2ef/0x1a00 drivers/tty/n_tty.c:2131
> 2 locks held by getty/22206:
>  #0:  (>ldisc_sem){}, at: [<4b23cd0c>]
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>  #1:  (>atomic_read_lock){+.+.}, at: [<7b8990ce>]
> n_tty_read+0x2ef/0x1a00 drivers/tty/n_tty.c:2131
> 1 lock held by syz-executor0/953:
>  #0:  (sk_lock-AF_INET6){+.+.}, at: [<918a22c0>] lock_sock
> include/net/sock.h:1463 [inline]
>  

Re: INFO: task hung in linkwatch_event

2018-02-14 Thread Dmitry Vyukov
On Wed, Feb 14, 2018 at 2:59 PM, syzbot
 wrote:
> Hello,
>
> syzbot hit the following crash on upstream commit
> a2e5790d841658485d642196dbb0927303d6c22f (Wed Feb 7 06:15:42 2018 +)
> Merge branch 'akpm' (patches from Andrew)
>
> Unfortunately, I don't have any reproducer for this crash yet.
> Raw console output is attached.
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached.

Another hang on rtnl lock:

#syz dup: INFO: task hung in netdev_run_todo

> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+a3926c3116d237751...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
>
> INFO: task kworker/1:3:1828 blocked for more than 120 seconds.
>   Not tainted 4.15.0+ #301
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/1:3 D19968  1828  2 0x8000
> Workqueue: events linkwatch_event
> Call Trace:
>  context_switch kernel/sched/core.c:2857 [inline]
>  __schedule+0x8e2/0x2040 kernel/sched/core.c:3435
>  schedule+0xf5/0x430 kernel/sched/core.c:3494
>  schedule_preempt_disabled+0x10/0x20 kernel/sched/core.c:3552
>  __mutex_lock_common kernel/locking/mutex.c:833 [inline]
>  __mutex_lock+0xaad/0x1a80 kernel/locking/mutex.c:893
>  mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
>  rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74
>  linkwatch_event+0x57/0xc0 net/core/link_watch.c:236
>  process_one_work+0xbbf/0x1af0 kernel/workqueue.c:2113
>  worker_thread+0x223/0x1990 kernel/workqueue.c:2247
>  kthread+0x33c/0x400 kernel/kthread.c:238
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:429
>
> Showing all locks held in the system:
> 3 locks held by kworker/1:1/23:
>  #0:  ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [<3027d709>]
> process_one_work+0xaaf/0x1af0 kernel/workqueue.c:2084
>  #1:  ((addr_chk_work).work){+.+.}, at: []
> process_one_work+0xb01/0x1af0 kernel/workqueue.c:2088
>  #2:  (rtnl_mutex){+.+.}, at: [<55e4e532>] rtnl_lock+0x17/0x20
> net/core/rtnetlink.c:74
> 2 locks held by khungtaskd/760:
>  #0:  (rcu_read_lock){}, at: [<7d511abd>]
> check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
>  #0:  (rcu_read_lock){}, at: [<7d511abd>] watchdog+0x1c5/0xd60
> kernel/hung_task.c:249
>  #1:  (tasklist_lock){.+.+}, at: [<78b2aec7>]
> debug_show_all_locks+0xd3/0x3d0 kernel/locking/lockdep.c:4470
> 3 locks held by kworker/1:3/1828:
>  #0:  ((wq_completion)"events"){+.+.}, at: [<3027d709>]
> process_one_work+0xaaf/0x1af0 kernel/workqueue.c:2084
>  #1:  ((linkwatch_work).work){+.+.}, at: []
> process_one_work+0xb01/0x1af0 kernel/workqueue.c:2088
>  #2:  (rtnl_mutex){+.+.}, at: [<55e4e532>] rtnl_lock+0x17/0x20
> net/core/rtnetlink.c:74
> 1 lock held by rsyslogd/3879:
>  #0:  (>f_pos_lock){+.+.}, at: [<1e0f1a09>]
> __fdget_pos+0x12b/0x190 fs/file.c:765
> 2 locks held by getty/4002:
>  #0:  (>ldisc_sem){}, at: [<4b23cd0c>]
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>  #1:  (>atomic_read_lock){+.+.}, at: [<7b8990ce>]
> n_tty_read+0x2ef/0x1a00 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4003:
>  #0:  (>ldisc_sem){}, at: [<4b23cd0c>]
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>  #1:  (>atomic_read_lock){+.+.}, at: [<7b8990ce>]
> n_tty_read+0x2ef/0x1a00 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4004:
>  #0:  (>ldisc_sem){}, at: [<4b23cd0c>]
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>  #1:  (>atomic_read_lock){+.+.}, at: [<7b8990ce>]
> n_tty_read+0x2ef/0x1a00 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4005:
>  #0:  (>ldisc_sem){}, at: [<4b23cd0c>]
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>  #1:  (>atomic_read_lock){+.+.}, at: [<7b8990ce>]
> n_tty_read+0x2ef/0x1a00 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4006:
>  #0:  (>ldisc_sem){}, at: [<4b23cd0c>]
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>  #1:  (>atomic_read_lock){+.+.}, at: [<7b8990ce>]
> n_tty_read+0x2ef/0x1a00 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4007:
>  #0:  (>ldisc_sem){}, at: [<4b23cd0c>]
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>  #1:  (>atomic_read_lock){+.+.}, at: [<7b8990ce>]
> n_tty_read+0x2ef/0x1a00 drivers/tty/n_tty.c:2131
> 2 locks held by getty/22206:
>  #0:  (>ldisc_sem){}, at: [<4b23cd0c>]
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>  #1:  (>atomic_read_lock){+.+.}, at: [<7b8990ce>]
> n_tty_read+0x2ef/0x1a00 drivers/tty/n_tty.c:2131
> 1 lock held by syz-executor0/953:
>  #0:  (sk_lock-AF_INET6){+.+.}, at: [<918a22c0>] lock_sock
> include/net/sock.h:1463 [inline]
>  #0:  (sk_lock-AF_INET6){+.+.}, at: