Re: INFO: task hung in perf_trace_event_unreg

2018-04-12 Thread Paul E. McKenney
On Thu, Apr 12, 2018 at 11:39:42AM +0200, Dmitry Vyukov wrote:
> On Wed, Apr 11, 2018 at 9:36 PM, Paul E. McKenney
>  wrote:
> >> >> >> >>  wrote:
> >> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> >> > Hello,
> >> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> >> > syzbot hit the following crash on upstream commit
> >> >> >> >> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 
> >> >> >> >> >> >> >> >> > 21:20:27 2018 +)
> >> >> >> >> >> >> >> >> > Linux 4.16
> >> >> >> >> >> >> >> >> > syzbot dashboard link:
> >> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> >> > Unfortunately, I don't have any reproducer for this 
> >> >> >> >> >> >> >> >> > crash yet.
> >> >> >> >> >> >> >> >> > Raw console output:
> >> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> >> >> >> >> >> >> >> >> > Kernel config:
> >> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> >> >> >> >> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
> >> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> >> > IMPORTANT: if you fix the bug, please add the 
> >> >> >> >> >> >> >> >> > following tag to the commit:
> >> >> >> >> >> >> >> >> > Reported-by: 
> >> >> >> >> >> >> >> >> > syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
> >> >> >> >> >> >> >> >> > It will help syzbot understand when the bug is 
> >> >> >> >> >> >> >> >> > fixed. See footer for
> >> >> >> >> >> >> >> >> > details.
> >> >> >> >> >> >> >> >> > If you forward the report, please keep this part 
> >> >> >> >> >> >> >> >> > and the footer.
> >> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> >> > REISERFS warning (device loop4): super-6502 
> >> >> >> >> >> >> >> >> > reiserfs_getopt: unknown mount
> >> >> >> >> >> >> >> >> > option "g �;e�K�׫>pquota"
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> > Might not hurt to look into the above, though perhaps 
> >> >> >> >> >> >> >> > this is just syzkaller
> >> >> >> >> >> >> >> > playing around with mount options.
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more 
> >> >> >> >> >> >> >> >> > than 120 seconds.
> >> >> >> >> >> >> >> >> >Not tainted 4.16.0+ #10
> >> >> >> >> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
> >> >> >> >> >> >> >> >> > disables this message.
> >> >> >> >> >> >> >> >> > syz-executor3   D20944 10803   4492 0x8002
> >> >> >> >> >> >> >> >> > Call Trace:
> >> >> >> >> >> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
> >> >> >> >> >> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >> >> >> >> >> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
> >> >> >> >> >> >> >> >> >   schedule_timeout+0x1a3/0x230 
> >> >> >> >> >> >> >> >> > kernel/time/timer.c:1777
> >> >> >> >> >> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 
> >> >> >> >> >> >> >> >> > [inline]
> >> >> >> >> >> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 
> >> >> >> >> >> >> >> >> > [inline]
> >> >> >> >> >> >> >> >> >   wait_for_common kernel/sched/completion.c:118 
> >> >> >> >> >> >> >> >> > [inline]
> >> >> >> >> >> >> >> >> >   wait_for_completion+0x415/0x770 
> >> >> >> >> >> >> >> >> > kernel/sched/completion.c:139
> >> >> >> >> >> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >> >> >> >> >> >> >> >> >   synchronize_sched.part.64+0xac/0x100 
> >> >> >> >> >> >> >> >> > kernel/rcu/tree.c:3212
> >> >> >> >> >> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> >> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> >> I don't think this is a perf issue. Looks like 
> >> >> >> >> >> >> >> >> something is preventing
> >> >> >> >> >> >> >> >> rcu_sched from completing. If there's a CPU that is 
> >> >> >> >> >> >> >> >> running in kernel
> >> >> >> >> >> >> >> >> space and never scheduling, that can cause this 
> >> >> >> >> >> >> >> >> issue. Or if RCU
> >> >> >> >> >> >> >> >> somehow missed a transition into idle or user space.
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> > The RCU CPU stall warning below strongly supports this 
> >> >> >> >> >> >> >> > position ...
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> I think this is this guy then:
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > Seems likely to me!
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> Looking retrospectively at the various hang/stall bugs 
> >> >> >> >> >> >> >> that we have, I
> >> >> >> >> >> >> >> think we need some kind of priority between them. I.e. 
> >> >> >> >> >> >> >> we have rcu
> >> >> >> >> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, 
> >> >> >> >> >> >> >> silent machine
> 

Re: INFO: task hung in perf_trace_event_unreg

2018-04-12 Thread Dmitry Vyukov
On Wed, Apr 11, 2018 at 9:36 PM, Paul E. McKenney
 wrote:
>> >> >> >>  wrote:
>> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> > Hello,
>> >> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >> > syzbot hit the following crash on upstream commit
>> >> >> >> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 
>> >> >> >> >> >> >> >> > 21:20:27 2018 +)
>> >> >> >> >> >> >> >> > Linux 4.16
>> >> >> >> >> >> >> >> > syzbot dashboard link:
>> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
>> >> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >> > Unfortunately, I don't have any reproducer for this 
>> >> >> >> >> >> >> >> > crash yet.
>> >> >> >> >> >> >> >> > Raw console output:
>> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
>> >> >> >> >> >> >> >> > Kernel config:
>> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
>> >> >> >> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
>> >> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >> > IMPORTANT: if you fix the bug, please add the 
>> >> >> >> >> >> >> >> > following tag to the commit:
>> >> >> >> >> >> >> >> > Reported-by: 
>> >> >> >> >> >> >> >> > syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
>> >> >> >> >> >> >> >> > It will help syzbot understand when the bug is fixed. 
>> >> >> >> >> >> >> >> > See footer for
>> >> >> >> >> >> >> >> > details.
>> >> >> >> >> >> >> >> > If you forward the report, please keep this part and 
>> >> >> >> >> >> >> >> > the footer.
>> >> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >> > REISERFS warning (device loop4): super-6502 
>> >> >> >> >> >> >> >> > reiserfs_getopt: unknown mount
>> >> >> >> >> >> >> >> > option "g �;e�K�׫>pquota"
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > Might not hurt to look into the above, though perhaps 
>> >> >> >> >> >> >> > this is just syzkaller
>> >> >> >> >> >> >> > playing around with mount options.
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 
>> >> >> >> >> >> >> >> > 120 seconds.
>> >> >> >> >> >> >> >> >Not tainted 4.16.0+ #10
>> >> >> >> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
>> >> >> >> >> >> >> >> > disables this message.
>> >> >> >> >> >> >> >> > syz-executor3   D20944 10803   4492 0x8002
>> >> >> >> >> >> >> >> > Call Trace:
>> >> >> >> >> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
>> >> >> >> >> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>> >> >> >> >> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
>> >> >> >> >> >> >> >> >   schedule_timeout+0x1a3/0x230 
>> >> >> >> >> >> >> >> > kernel/time/timer.c:1777
>> >> >> >> >> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 
>> >> >> >> >> >> >> >> > [inline]
>> >> >> >> >> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 
>> >> >> >> >> >> >> >> > [inline]
>> >> >> >> >> >> >> >> >   wait_for_common kernel/sched/completion.c:118 
>> >> >> >> >> >> >> >> > [inline]
>> >> >> >> >> >> >> >> >   wait_for_completion+0x415/0x770 
>> >> >> >> >> >> >> >> > kernel/sched/completion.c:139
>> >> >> >> >> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>> >> >> >> >> >> >> >> >   synchronize_sched.part.64+0xac/0x100 
>> >> >> >> >> >> >> >> > kernel/rcu/tree.c:3212
>> >> >> >> >> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
>> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> I don't think this is a perf issue. Looks like 
>> >> >> >> >> >> >> >> something is preventing
>> >> >> >> >> >> >> >> rcu_sched from completing. If there's a CPU that is 
>> >> >> >> >> >> >> >> running in kernel
>> >> >> >> >> >> >> >> space and never scheduling, that can cause this issue. 
>> >> >> >> >> >> >> >> Or if RCU
>> >> >> >> >> >> >> >> somehow missed a transition into idle or user space.
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > The RCU CPU stall warning below strongly supports this 
>> >> >> >> >> >> >> > position ...
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> I think this is this guy then:
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
>> >> >> >> >> >> >
>> >> >> >> >> >> > Seems likely to me!
>> >> >> >> >> >> >
>> >> >> >> >> >> >> Looking retrospectively at the various hang/stall bugs 
>> >> >> >> >> >> >> that we have, I
>> >> >> >> >> >> >> think we need some kind of priority between them. I.e. we 
>> >> >> >> >> >> >> have rcu
>> >> >> >> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, 
>> >> >> >> >> >> >> silent machine
>> >> >> >> >> >> >> hang and maybe something else. It would be useful if they 
>> >> >> >> >> >> >> fire
>> >> >> >> >> >> >> deterministically according to priorities. If there is an 
>> >> >> >> >> >> >> rcu stall,
>> >> >> >> >> >> >> 

Re: INFO: task hung in perf_trace_event_unreg

2018-04-11 Thread Paul E. McKenney
On Wed, Apr 11, 2018 at 12:06:27PM +0200, Dmitry Vyukov wrote:
> On Tue, Apr 10, 2018 at 7:02 PM, Paul E. McKenney
>  wrote:
> >> >> >> On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney
> >> >> >>  wrote:
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> > Hello,
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> > syzbot hit the following crash on upstream commit
> >> >> >> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 
> >> >> >> >> >> >> >> > 21:20:27 2018 +)
> >> >> >> >> >> >> >> > Linux 4.16
> >> >> >> >> >> >> >> > syzbot dashboard link:
> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> > Unfortunately, I don't have any reproducer for this 
> >> >> >> >> >> >> >> > crash yet.
> >> >> >> >> >> >> >> > Raw console output:
> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> >> >> >> >> >> >> >> > Kernel config:
> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> >> >> >> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> > IMPORTANT: if you fix the bug, please add the 
> >> >> >> >> >> >> >> > following tag to the commit:
> >> >> >> >> >> >> >> > Reported-by: 
> >> >> >> >> >> >> >> > syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
> >> >> >> >> >> >> >> > It will help syzbot understand when the bug is fixed. 
> >> >> >> >> >> >> >> > See footer for
> >> >> >> >> >> >> >> > details.
> >> >> >> >> >> >> >> > If you forward the report, please keep this part and 
> >> >> >> >> >> >> >> > the footer.
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> > REISERFS warning (device loop4): super-6502 
> >> >> >> >> >> >> >> > reiserfs_getopt: unknown mount
> >> >> >> >> >> >> >> > option "g �;e�K�׫>pquota"
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > Might not hurt to look into the above, though perhaps 
> >> >> >> >> >> >> > this is just syzkaller
> >> >> >> >> >> >> > playing around with mount options.
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 
> >> >> >> >> >> >> >> > 120 seconds.
> >> >> >> >> >> >> >> >Not tainted 4.16.0+ #10
> >> >> >> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
> >> >> >> >> >> >> >> > disables this message.
> >> >> >> >> >> >> >> > syz-executor3   D20944 10803   4492 0x8002
> >> >> >> >> >> >> >> > Call Trace:
> >> >> >> >> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
> >> >> >> >> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >> >> >> >> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
> >> >> >> >> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> >> >> >> >> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 
> >> >> >> >> >> >> >> > [inline]
> >> >> >> >> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 
> >> >> >> >> >> >> >> > [inline]
> >> >> >> >> >> >> >> >   wait_for_common kernel/sched/completion.c:118 
> >> >> >> >> >> >> >> > [inline]
> >> >> >> >> >> >> >> >   wait_for_completion+0x415/0x770 
> >> >> >> >> >> >> >> > kernel/sched/completion.c:139
> >> >> >> >> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >> >> >> >> >> >> >> >   synchronize_sched.part.64+0xac/0x100 
> >> >> >> >> >> >> >> > kernel/rcu/tree.c:3212
> >> >> >> >> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> I don't think this is a perf issue. Looks like something 
> >> >> >> >> >> >> >> is preventing
> >> >> >> >> >> >> >> rcu_sched from completing. If there's a CPU that is 
> >> >> >> >> >> >> >> running in kernel
> >> >> >> >> >> >> >> space and never scheduling, that can cause this issue. 
> >> >> >> >> >> >> >> Or if RCU
> >> >> >> >> >> >> >> somehow missed a transition into idle or user space.
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > The RCU CPU stall warning below strongly supports this 
> >> >> >> >> >> >> > position ...
> >> >> >> >> >> >>
> >> >> >> >> >> >> I think this is this guy then:
> >> >> >> >> >> >>
> >> >> >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> >> >> >> >> >> >>
> >> >> >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
> >> >> >> >> >> >
> >> >> >> >> >> > Seems likely to me!
> >> >> >> >> >> >
> >> >> >> >> >> >> Looking retrospectively at the various hang/stall bugs that 
> >> >> >> >> >> >> we have, I
> >> >> >> >> >> >> think we need some kind of priority between them. I.e. we 
> >> >> >> >> >> >> have rcu
> >> >> >> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, 
> >> >> >> >> >> >> silent machine
> >> >> >> >> >> >> hang and maybe something else. It would be useful if they 
> >> >> >> >> >> >> fire
> >> >> >> >> >> >> deterministically according to priorities. If there is an 
> >> >> >> >> >> >> rcu stall,
> >> >> >> >> 

Re: INFO: task hung in perf_trace_event_unreg

2018-04-11 Thread Dmitry Vyukov
On Tue, Apr 10, 2018 at 7:02 PM, Paul E. McKenney
 wrote:
>> >> >> On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney
>> >> >>  wrote:
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> > Hello,
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > syzbot hit the following crash on upstream commit
>> >> >> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 
>> >> >> >> >> >> >> > 21:20:27 2018 +)
>> >> >> >> >> >> >> > Linux 4.16
>> >> >> >> >> >> >> > syzbot dashboard link:
>> >> >> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > Unfortunately, I don't have any reproducer for this 
>> >> >> >> >> >> >> > crash yet.
>> >> >> >> >> >> >> > Raw console output:
>> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
>> >> >> >> >> >> >> > Kernel config:
>> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
>> >> >> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > IMPORTANT: if you fix the bug, please add the following 
>> >> >> >> >> >> >> > tag to the commit:
>> >> >> >> >> >> >> > Reported-by: 
>> >> >> >> >> >> >> > syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
>> >> >> >> >> >> >> > It will help syzbot understand when the bug is fixed. 
>> >> >> >> >> >> >> > See footer for
>> >> >> >> >> >> >> > details.
>> >> >> >> >> >> >> > If you forward the report, please keep this part and the 
>> >> >> >> >> >> >> > footer.
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > REISERFS warning (device loop4): super-6502 
>> >> >> >> >> >> >> > reiserfs_getopt: unknown mount
>> >> >> >> >> >> >> > option "g �;e�K�׫>pquota"
>> >> >> >> >> >> >
>> >> >> >> >> >> > Might not hurt to look into the above, though perhaps this 
>> >> >> >> >> >> > is just syzkaller
>> >> >> >> >> >> > playing around with mount options.
>> >> >> >> >> >> >
>> >> >> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 
>> >> >> >> >> >> >> > seconds.
>> >> >> >> >> >> >> >Not tainted 4.16.0+ #10
>> >> >> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
>> >> >> >> >> >> >> > disables this message.
>> >> >> >> >> >> >> > syz-executor3   D20944 10803   4492 0x8002
>> >> >> >> >> >> >> > Call Trace:
>> >> >> >> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
>> >> >> >> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>> >> >> >> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
>> >> >> >> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
>> >> >> >> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 
>> >> >> >> >> >> >> > [inline]
>> >> >> >> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 
>> >> >> >> >> >> >> > [inline]
>> >> >> >> >> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
>> >> >> >> >> >> >> >   wait_for_completion+0x415/0x770 
>> >> >> >> >> >> >> > kernel/sched/completion.c:139
>> >> >> >> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>> >> >> >> >> >> >> >   synchronize_sched.part.64+0xac/0x100 
>> >> >> >> >> >> >> > kernel/rcu/tree.c:3212
>> >> >> >> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> I don't think this is a perf issue. Looks like something 
>> >> >> >> >> >> >> is preventing
>> >> >> >> >> >> >> rcu_sched from completing. If there's a CPU that is 
>> >> >> >> >> >> >> running in kernel
>> >> >> >> >> >> >> space and never scheduling, that can cause this issue. Or 
>> >> >> >> >> >> >> if RCU
>> >> >> >> >> >> >> somehow missed a transition into idle or user space.
>> >> >> >> >> >> >
>> >> >> >> >> >> > The RCU CPU stall warning below strongly supports this 
>> >> >> >> >> >> > position ...
>> >> >> >> >> >>
>> >> >> >> >> >> I think this is this guy then:
>> >> >> >> >> >>
>> >> >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
>> >> >> >> >> >>
>> >> >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
>> >> >> >> >> >
>> >> >> >> >> > Seems likely to me!
>> >> >> >> >> >
>> >> >> >> >> >> Looking retrospectively at the various hang/stall bugs that 
>> >> >> >> >> >> we have, I
>> >> >> >> >> >> think we need some kind of priority between them. I.e. we 
>> >> >> >> >> >> have rcu
>> >> >> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent 
>> >> >> >> >> >> machine
>> >> >> >> >> >> hang and maybe something else. It would be useful if they fire
>> >> >> >> >> >> deterministically according to priorities. If there is an rcu 
>> >> >> >> >> >> stall,
>> >> >> >> >> >> that's always detected as CPU stall. Then if there is no RCU 
>> >> >> >> >> >> stall,
>> >> >> >> >> >> but a workqueue stall, then that's always detected as 
>> >> >> >> >> >> workqueue stall,
>> >> >> >> >> >> etc.
>> >> >> >> >> >> Currently if we have an RCU stall (effectively CPU 

Re: INFO: task hung in perf_trace_event_unreg

2018-04-10 Thread Paul E. McKenney
On Tue, Apr 10, 2018 at 01:13:13PM +0200, Dmitry Vyukov wrote:
> On Mon, Apr 9, 2018 at 8:11 PM, Paul E. McKenney
>  wrote:
> > On Mon, Apr 09, 2018 at 06:28:16PM +0200, Dmitry Vyukov wrote:
> >> On Mon, Apr 9, 2018 at 6:20 PM, Paul E. McKenney
> >>  wrote:
> >> > On Mon, Apr 09, 2018 at 02:54:20PM +0200, Dmitry Vyukov wrote:
> >> >> On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney
> >> >>  wrote:
> >> >> >> >> >> >>
> >> >> >> >> >> >> > Hello,
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > syzbot hit the following crash on upstream commit
> >> >> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 
> >> >> >> >> >> >> > 21:20:27 2018 +)
> >> >> >> >> >> >> > Linux 4.16
> >> >> >> >> >> >> > syzbot dashboard link:
> >> >> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > Unfortunately, I don't have any reproducer for this crash 
> >> >> >> >> >> >> > yet.
> >> >> >> >> >> >> > Raw console output:
> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> >> >> >> >> >> >> > Kernel config:
> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> >> >> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > IMPORTANT: if you fix the bug, please add the following 
> >> >> >> >> >> >> > tag to the commit:
> >> >> >> >> >> >> > Reported-by: 
> >> >> >> >> >> >> > syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
> >> >> >> >> >> >> > It will help syzbot understand when the bug is fixed. See 
> >> >> >> >> >> >> > footer for
> >> >> >> >> >> >> > details.
> >> >> >> >> >> >> > If you forward the report, please keep this part and the 
> >> >> >> >> >> >> > footer.
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > REISERFS warning (device loop4): super-6502 
> >> >> >> >> >> >> > reiserfs_getopt: unknown mount
> >> >> >> >> >> >> > option "g �;e�K�׫>pquota"
> >> >> >> >> >> >
> >> >> >> >> >> > Might not hurt to look into the above, though perhaps this 
> >> >> >> >> >> > is just syzkaller
> >> >> >> >> >> > playing around with mount options.
> >> >> >> >> >> >
> >> >> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 
> >> >> >> >> >> >> > seconds.
> >> >> >> >> >> >> >Not tainted 4.16.0+ #10
> >> >> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
> >> >> >> >> >> >> > disables this message.
> >> >> >> >> >> >> > syz-executor3   D20944 10803   4492 0x8002
> >> >> >> >> >> >> > Call Trace:
> >> >> >> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
> >> >> >> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >> >> >> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
> >> >> >> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> >> >> >> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
> >> >> >> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
> >> >> >> >> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
> >> >> >> >> >> >> >   wait_for_completion+0x415/0x770 
> >> >> >> >> >> >> > kernel/sched/completion.c:139
> >> >> >> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >> >> >> >> >> >> >   synchronize_sched.part.64+0xac/0x100 
> >> >> >> >> >> >> > kernel/rcu/tree.c:3212
> >> >> >> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> >> >> >> >> >> >>
> >> >> >> >> >> >> I don't think this is a perf issue. Looks like something is 
> >> >> >> >> >> >> preventing
> >> >> >> >> >> >> rcu_sched from completing. If there's a CPU that is running 
> >> >> >> >> >> >> in kernel
> >> >> >> >> >> >> space and never scheduling, that can cause this issue. Or 
> >> >> >> >> >> >> if RCU
> >> >> >> >> >> >> somehow missed a transition into idle or user space.
> >> >> >> >> >> >
> >> >> >> >> >> > The RCU CPU stall warning below strongly supports this 
> >> >> >> >> >> > position ...
> >> >> >> >> >>
> >> >> >> >> >> I think this is this guy then:
> >> >> >> >> >>
> >> >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> >> >> >> >> >>
> >> >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
> >> >> >> >> >
> >> >> >> >> > Seems likely to me!
> >> >> >> >> >
> >> >> >> >> >> Looking retrospectively at the various hang/stall bugs that we 
> >> >> >> >> >> have, I
> >> >> >> >> >> think we need some kind of priority between them. I.e. we have 
> >> >> >> >> >> rcu
> >> >> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent 
> >> >> >> >> >> machine
> >> >> >> >> >> hang and maybe something else. It would be useful if they fire
> >> >> >> >> >> deterministically according to priorities. If there is an rcu 
> >> >> >> >> >> stall,
> >> >> >> >> >> that's always detected as CPU stall. Then if there is no RCU 
> >> >> >> >> >> stall,
> >> >> >> >> >> but a workqueue stall, then that's always detected 

Re: INFO: task hung in perf_trace_event_unreg

2018-04-10 Thread Dmitry Vyukov
On Mon, Apr 9, 2018 at 8:11 PM, Paul E. McKenney
 wrote:
> On Mon, Apr 09, 2018 at 06:28:16PM +0200, Dmitry Vyukov wrote:
>> On Mon, Apr 9, 2018 at 6:20 PM, Paul E. McKenney
>>  wrote:
>> > On Mon, Apr 09, 2018 at 02:54:20PM +0200, Dmitry Vyukov wrote:
>> >> On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney
>> >>  wrote:
>> >> >> >> >> >>
>> >> >> >> >> >> > Hello,
>> >> >> >> >> >> >
>> >> >> >> >> >> > syzbot hit the following crash on upstream commit
>> >> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 
>> >> >> >> >> >> > 21:20:27 2018 +)
>> >> >> >> >> >> > Linux 4.16
>> >> >> >> >> >> > syzbot dashboard link:
>> >> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
>> >> >> >> >> >> >
>> >> >> >> >> >> > Unfortunately, I don't have any reproducer for this crash 
>> >> >> >> >> >> > yet.
>> >> >> >> >> >> > Raw console output:
>> >> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
>> >> >> >> >> >> > Kernel config:
>> >> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
>> >> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
>> >> >> >> >> >> >
>> >> >> >> >> >> > IMPORTANT: if you fix the bug, please add the following tag 
>> >> >> >> >> >> > to the commit:
>> >> >> >> >> >> > Reported-by: 
>> >> >> >> >> >> > syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
>> >> >> >> >> >> > It will help syzbot understand when the bug is fixed. See 
>> >> >> >> >> >> > footer for
>> >> >> >> >> >> > details.
>> >> >> >> >> >> > If you forward the report, please keep this part and the 
>> >> >> >> >> >> > footer.
>> >> >> >> >> >> >
>> >> >> >> >> >> > REISERFS warning (device loop4): super-6502 
>> >> >> >> >> >> > reiserfs_getopt: unknown mount
>> >> >> >> >> >> > option "g �;e�K�׫>pquota"
>> >> >> >> >> >
>> >> >> >> >> > Might not hurt to look into the above, though perhaps this is 
>> >> >> >> >> > just syzkaller
>> >> >> >> >> > playing around with mount options.
>> >> >> >> >> >
>> >> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 
>> >> >> >> >> >> > seconds.
>> >> >> >> >> >> >Not tainted 4.16.0+ #10
>> >> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
>> >> >> >> >> >> > this message.
>> >> >> >> >> >> > syz-executor3   D20944 10803   4492 0x8002
>> >> >> >> >> >> > Call Trace:
>> >> >> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
>> >> >> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>> >> >> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
>> >> >> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
>> >> >> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
>> >> >> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
>> >> >> >> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
>> >> >> >> >> >> >   wait_for_completion+0x415/0x770 
>> >> >> >> >> >> > kernel/sched/completion.c:139
>> >> >> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>> >> >> >> >> >> >   synchronize_sched.part.64+0xac/0x100 
>> >> >> >> >> >> > kernel/rcu/tree.c:3212
>> >> >> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
>> >> >> >> >> >>
>> >> >> >> >> >> I don't think this is a perf issue. Looks like something is 
>> >> >> >> >> >> preventing
>> >> >> >> >> >> rcu_sched from completing. If there's a CPU that is running 
>> >> >> >> >> >> in kernel
>> >> >> >> >> >> space and never scheduling, that can cause this issue. Or if 
>> >> >> >> >> >> RCU
>> >> >> >> >> >> somehow missed a transition into idle or user space.
>> >> >> >> >> >
>> >> >> >> >> > The RCU CPU stall warning below strongly supports this 
>> >> >> >> >> > position ...
>> >> >> >> >>
>> >> >> >> >> I think this is this guy then:
>> >> >> >> >>
>> >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
>> >> >> >> >>
>> >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
>> >> >> >> >
>> >> >> >> > Seems likely to me!
>> >> >> >> >
>> >> >> >> >> Looking retrospectively at the various hang/stall bugs that we 
>> >> >> >> >> have, I
>> >> >> >> >> think we need some kind of priority between them. I.e. we have 
>> >> >> >> >> rcu
>> >> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent 
>> >> >> >> >> machine
>> >> >> >> >> hang and maybe something else. It would be useful if they fire
>> >> >> >> >> deterministically according to priorities. If there is an rcu 
>> >> >> >> >> stall,
>> >> >> >> >> that's always detected as CPU stall. Then if there is no RCU 
>> >> >> >> >> stall,
>> >> >> >> >> but a workqueue stall, then that's always detected as workqueue 
>> >> >> >> >> stall,
>> >> >> >> >> etc.
>> >> >> >> >> Currently if we have an RCU stall (effectively CPU stall), that 
>> >> >> >> >> can be
>> >> >> >> >> detected either RCU stall or a task hung, producing 2 different 
>> >> >> >> >> 

Re: INFO: task hung in perf_trace_event_unreg

2018-04-09 Thread Paul E. McKenney
On Mon, Apr 09, 2018 at 06:28:16PM +0200, Dmitry Vyukov wrote:
> On Mon, Apr 9, 2018 at 6:20 PM, Paul E. McKenney
>  wrote:
> > On Mon, Apr 09, 2018 at 02:54:20PM +0200, Dmitry Vyukov wrote:
> >> On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney
> >>  wrote:
> >> >> >> >> >>
> >> >> >> >> >> > Hello,
> >> >> >> >> >> >
> >> >> >> >> >> > syzbot hit the following crash on upstream commit
> >> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 
> >> >> >> >> >> > 2018 +)
> >> >> >> >> >> > Linux 4.16
> >> >> >> >> >> > syzbot dashboard link:
> >> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >> >> >> >> >> >
> >> >> >> >> >> > Unfortunately, I don't have any reproducer for this crash 
> >> >> >> >> >> > yet.
> >> >> >> >> >> > Raw console output:
> >> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> >> >> >> >> >> > Kernel config:
> >> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> >> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
> >> >> >> >> >> >
> >> >> >> >> >> > IMPORTANT: if you fix the bug, please add the following tag 
> >> >> >> >> >> > to the commit:
> >> >> >> >> >> > Reported-by: 
> >> >> >> >> >> > syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
> >> >> >> >> >> > It will help syzbot understand when the bug is fixed. See 
> >> >> >> >> >> > footer for
> >> >> >> >> >> > details.
> >> >> >> >> >> > If you forward the report, please keep this part and the 
> >> >> >> >> >> > footer.
> >> >> >> >> >> >
> >> >> >> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: 
> >> >> >> >> >> > unknown mount
> >> >> >> >> >> > option "g �;e�K�׫>pquota"
> >> >> >> >> >
> >> >> >> >> > Might not hurt to look into the above, though perhaps this is 
> >> >> >> >> > just syzkaller
> >> >> >> >> > playing around with mount options.
> >> >> >> >> >
> >> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 
> >> >> >> >> >> > seconds.
> >> >> >> >> >> >Not tainted 4.16.0+ #10
> >> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> >> >> >> >> >> > this message.
> >> >> >> >> >> > syz-executor3   D20944 10803   4492 0x8002
> >> >> >> >> >> > Call Trace:
> >> >> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
> >> >> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >> >> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
> >> >> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> >> >> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
> >> >> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
> >> >> >> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
> >> >> >> >> >> >   wait_for_completion+0x415/0x770 
> >> >> >> >> >> > kernel/sched/completion.c:139
> >> >> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >> >> >> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
> >> >> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> >> >> >> >> >>
> >> >> >> >> >> I don't think this is a perf issue. Looks like something is 
> >> >> >> >> >> preventing
> >> >> >> >> >> rcu_sched from completing. If there's a CPU that is running in 
> >> >> >> >> >> kernel
> >> >> >> >> >> space and never scheduling, that can cause this issue. Or if 
> >> >> >> >> >> RCU
> >> >> >> >> >> somehow missed a transition into idle or user space.
> >> >> >> >> >
> >> >> >> >> > The RCU CPU stall warning below strongly supports this position 
> >> >> >> >> > ...
> >> >> >> >>
> >> >> >> >> I think this is this guy then:
> >> >> >> >>
> >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> >> >> >> >>
> >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
> >> >> >> >
> >> >> >> > Seems likely to me!
> >> >> >> >
> >> >> >> >> Looking retrospectively at the various hang/stall bugs that we 
> >> >> >> >> have, I
> >> >> >> >> think we need some kind of priority between them. I.e. we have rcu
> >> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent 
> >> >> >> >> machine
> >> >> >> >> hang and maybe something else. It would be useful if they fire
> >> >> >> >> deterministically according to priorities. If there is an rcu 
> >> >> >> >> stall,
> >> >> >> >> that's always detected as CPU stall. Then if there is no RCU 
> >> >> >> >> stall,
> >> >> >> >> but a workqueue stall, then that's always detected as workqueue 
> >> >> >> >> stall,
> >> >> >> >> etc.
> >> >> >> >> Currently if we have an RCU stall (effectively CPU stall), that 
> >> >> >> >> can be
> >> >> >> >> detected either RCU stall or a task hung, producing 2 different 
> >> >> >> >> bug
> >> >> >> >> reports (which is bad).
> >> >> >> >> One can say that it's only a matter of tuning timeouts, but at 
> >> >> >> >> least

Re: INFO: task hung in perf_trace_event_unreg

2018-04-09 Thread Paul E. McKenney
On Mon, Apr 09, 2018 at 06:28:16PM +0200, Dmitry Vyukov wrote:
> On Mon, Apr 9, 2018 at 6:20 PM, Paul E. McKenney
>  wrote:
> > On Mon, Apr 09, 2018 at 02:54:20PM +0200, Dmitry Vyukov wrote:
> >> On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney
> >>  wrote:
> >> >> >> >> >>
> >> >> >> >> >> > Hello,
> >> >> >> >> >> >
> >> >> >> >> >> > syzbot hit the following crash on upstream commit
> >> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 
> >> >> >> >> >> > 2018 +)
> >> >> >> >> >> > Linux 4.16
> >> >> >> >> >> > syzbot dashboard link:
> >> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >> >> >> >> >> >
> >> >> >> >> >> > Unfortunately, I don't have any reproducer for this crash 
> >> >> >> >> >> > yet.
> >> >> >> >> >> > Raw console output:
> >> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> >> >> >> >> >> > Kernel config:
> >> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> >> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
> >> >> >> >> >> >
> >> >> >> >> >> > IMPORTANT: if you fix the bug, please add the following tag 
> >> >> >> >> >> > to the commit:
> >> >> >> >> >> > Reported-by: 
> >> >> >> >> >> > syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
> >> >> >> >> >> > It will help syzbot understand when the bug is fixed. See 
> >> >> >> >> >> > footer for
> >> >> >> >> >> > details.
> >> >> >> >> >> > If you forward the report, please keep this part and the 
> >> >> >> >> >> > footer.
> >> >> >> >> >> >
> >> >> >> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: 
> >> >> >> >> >> > unknown mount
> >> >> >> >> >> > option "g �;e�K�׫>pquota"
> >> >> >> >> >
> >> >> >> >> > Might not hurt to look into the above, though perhaps this is 
> >> >> >> >> > just syzkaller
> >> >> >> >> > playing around with mount options.
> >> >> >> >> >
> >> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 
> >> >> >> >> >> > seconds.
> >> >> >> >> >> >       Not tainted 4.16.0+ #10
> >> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> >> >> >> >> >> > this message.
> >> >> >> >> >> > syz-executor3   D20944 10803   4492 0x8002
> >> >> >> >> >> > Call Trace:
> >> >> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
> >> >> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >> >> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
> >> >> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> >> >> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
> >> >> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
> >> >> >> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
> >> >> >> >> >> >   wait_for_completion+0x415/0x770 
> >> >> >> >> >> > kernel/sched/completion.c:139
> >> >> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >> >> >> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
> >> >> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> >> >> >> >> >>
> >> >> >> >> >> I don't think this is a perf issue. Looks like something is 
> >> >> >> >> >> preventing
> >> >> >> >> >> rcu_sched from completing. If there's a CPU that is running in 
> >> >> >> >> >> kernel
> >> >> >> >> >> space and never scheduling, that can cause this issue. Or if 
> >> >> >> >> >> RCU
> >> >> >> >> >> somehow missed a transition into idle or user space.
> >> >> >> >> >
> >> >> >> >> > The RCU CPU stall warning below strongly supports this position 
> >> >> >> >> > ...
> >> >> >> >>
> >> >> >> >> I think this is this guy then:
> >> >> >> >>
> >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> >> >> >> >>
> >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
> >> >> >> >
> >> >> >> > Seems likely to me!
> >> >> >> >
> >> >> >> >> Looking retrospectively at the various hang/stall bugs that we 
> >> >> >> >> have, I
> >> >> >> >> think we need some kind of priority between them. I.e. we have rcu
> >> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent 
> >> >> >> >> machine
> >> >> >> >> hang and maybe something else. It would be useful if they fire
> >> >> >> >> deterministically according to priorities. If there is an rcu 
> >> >> >> >> stall,
> >> >> >> >> that's always detected as CPU stall. Then if there is no RCU 
> >> >> >> >> stall,
> >> >> >> >> but a workqueue stall, then that's always detected as workqueue 
> >> >> >> >> stall,
> >> >> >> >> etc.
> >> >> >> >> Currently if we have an RCU stall (effectively CPU stall), that 
> >> >> >> >> can be
> >> >> >> >> detected either RCU stall or a task hung, producing 2 different 
> >> >> >> >> bug
> >> >> >> >> reports (which is bad).
> >> >> >> >> One can say that it's only a matter of tuning timeouts, but at 
> >> >> >> >> least
> >> >> >> >> task hung detector has a problem that if 

Re: INFO: task hung in perf_trace_event_unreg

2018-04-09 Thread Dmitry Vyukov
On Mon, Apr 9, 2018 at 6:20 PM, Paul E. McKenney
 wrote:
> On Mon, Apr 09, 2018 at 02:54:20PM +0200, Dmitry Vyukov wrote:
>> On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney
>>  wrote:
>> >> >> >> >>
>> >> >> >> >> > Hello,
>> >> >> >> >> >
>> >> >> >> >> > syzbot hit the following crash on upstream commit
>> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 
>> >> >> >> >> > 2018 +)
>> >> >> >> >> > Linux 4.16
>> >> >> >> >> > syzbot dashboard link:
>> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
>> >> >> >> >> >
>> >> >> >> >> > Unfortunately, I don't have any reproducer for this crash yet.
>> >> >> >> >> > Raw console output:
>> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
>> >> >> >> >> > Kernel config:
>> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
>> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
>> >> >> >> >> >
>> >> >> >> >> > IMPORTANT: if you fix the bug, please add the following tag to 
>> >> >> >> >> > the commit:
>> >> >> >> >> > Reported-by: 
>> >> >> >> >> > syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
>> >> >> >> >> > It will help syzbot understand when the bug is fixed. See 
>> >> >> >> >> > footer for
>> >> >> >> >> > details.
>> >> >> >> >> > If you forward the report, please keep this part and the 
>> >> >> >> >> > footer.
>> >> >> >> >> >
>> >> >> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: 
>> >> >> >> >> > unknown mount
>> >> >> >> >> > option "g �;e�K�׫>pquota"
>> >> >> >> >
>> >> >> >> > Might not hurt to look into the above, though perhaps this is 
>> >> >> >> > just syzkaller
>> >> >> >> > playing around with mount options.
>> >> >> >> >
>> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 
>> >> >> >> >> > seconds.
>> >> >> >> >> >       Not tainted 4.16.0+ #10
>> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
>> >> >> >> >> > this message.
>> >> >> >> >> > syz-executor3   D20944 10803   4492 0x8002
>> >> >> >> >> > Call Trace:
>> >> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
>> >> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>> >> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
>> >> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
>> >> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
>> >> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
>> >> >> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
>> >> >> >> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
>> >> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>> >> >> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
>> >> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
>> >> >> >> >>
>> >> >> >> >> I don't think this is a perf issue. Looks like something is 
>> >> >> >> >> preventing
>> >> >> >> >> rcu_sched from completing. If there's a CPU that is running in 
>> >> >> >> >> kernel
>> >> >> >> >> space and never scheduling, that can cause this issue. Or if RCU
>> >> >> >> >> somehow missed a transition into idle or user space.
>> >> >> >> >
>> >> >> >> > The RCU CPU stall warning below strongly supports this position 
>> >> >> >> > ...
>> >> >> >>
>> >> >> >> I think this is this guy then:
>> >> >> >>
>> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
>> >> >> >>
>> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
>> >> >> >
>> >> >> > Seems likely to me!
>> >> >> >
>> >> >> >> Looking retrospectively at the various hang/stall bugs that we 
>> >> >> >> have, I
>> >> >> >> think we need some kind of priority between them. I.e. we have rcu
>> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
>> >> >> >> hang and maybe something else. It would be useful if they fire
>> >> >> >> deterministically according to priorities. If there is an rcu stall,
>> >> >> >> that's always detected as CPU stall. Then if there is no RCU stall,
>> >> >> >> but a workqueue stall, then that's always detected as workqueue 
>> >> >> >> stall,
>> >> >> >> etc.
>> >> >> >> Currently if we have an RCU stall (effectively CPU stall), that can 
>> >> >> >> be
>> >> >> >> detected either RCU stall or a task hung, producing 2 different bug
>> >> >> >> reports (which is bad).
>> >> >> >> One can say that it's only a matter of tuning timeouts, but at least
>> >> >> >> task hung detector has a problem that if you set timeout to X, it 
>> >> >> >> can
>> >> >> >> detect hung anywhere between X and 2*X. And on one hand we need 
>> >> >> >> quite
>> >> >> >> large timeout (a minute may not be enough), and on the other hand we
>> >> >> >> can't wait for an hour just to make sure that the machine is indeed
>> >> >> >> dead (these things happen every few minutes).
>> >> >> >
>> >> >> > I 

Re: INFO: task hung in perf_trace_event_unreg

2018-04-09 Thread Paul E. McKenney
On Mon, Apr 09, 2018 at 02:54:20PM +0200, Dmitry Vyukov wrote:
> On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney
>  wrote:
> >> >> >> >>
> >> >> >> >> > Hello,
> >> >> >> >> >
> >> >> >> >> > syzbot hit the following crash on upstream commit
> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 
> >> >> >> >> > 2018 +)
> >> >> >> >> > Linux 4.16
> >> >> >> >> > syzbot dashboard link:
> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >> >> >> >> >
> >> >> >> >> > Unfortunately, I don't have any reproducer for this crash yet.
> >> >> >> >> > Raw console output:
> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> >> >> >> >> > Kernel config:
> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
> >> >> >> >> >
> >> >> >> >> > IMPORTANT: if you fix the bug, please add the following tag to 
> >> >> >> >> > the commit:
> >> >> >> >> > Reported-by: 
> >> >> >> >> > syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
> >> >> >> >> > It will help syzbot understand when the bug is fixed. See 
> >> >> >> >> > footer for
> >> >> >> >> > details.
> >> >> >> >> > If you forward the report, please keep this part and the footer.
> >> >> >> >> >
> >> >> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: 
> >> >> >> >> > unknown mount
> >> >> >> >> > option "g �;e�K�׫>pquota"
> >> >> >> >
> >> >> >> > Might not hurt to look into the above, though perhaps this is just 
> >> >> >> > syzkaller
> >> >> >> > playing around with mount options.
> >> >> >> >
> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 
> >> >> >> >> > seconds.
> >> >> >> >> >       Not tainted 4.16.0+ #10
> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> >> >> >> >> > this message.
> >> >> >> >> > syz-executor3   D20944 10803   4492 0x8002
> >> >> >> >> > Call Trace:
> >> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
> >> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
> >> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> >> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
> >> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
> >> >> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
> >> >> >> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
> >> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >> >> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
> >> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> >> >> >> >>
> >> >> >> >> I don't think this is a perf issue. Looks like something is 
> >> >> >> >> preventing
> >> >> >> >> rcu_sched from completing. If there's a CPU that is running in 
> >> >> >> >> kernel
> >> >> >> >> space and never scheduling, that can cause this issue. Or if RCU
> >> >> >> >> somehow missed a transition into idle or user space.
> >> >> >> >
> >> >> >> > The RCU CPU stall warning below strongly supports this position ...
> >> >> >>
> >> >> >> I think this is this guy then:
> >> >> >>
> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> >> >> >>
> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
> >> >> >
> >> >> > Seems likely to me!
> >> >> >
> >> >> >> Looking retrospectively at the various hang/stall bugs that we have, 
> >> >> >> I
> >> >> >> think we need some kind of priority between them. I.e. we have rcu
> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
> >> >> >> hang and maybe something else. It would be useful if they fire
> >> >> >> deterministically according to priorities. If there is an rcu stall,
> >> >> >> that's always detected as CPU stall. Then if there is no RCU stall,
> >> >> >> but a workqueue stall, then that's always detected as workqueue 
> >> >> >> stall,
> >> >> >> etc.
> >> >> >> Currently if we have an RCU stall (effectively CPU stall), that can 
> >> >> >> be
> >> >> >> detected either RCU stall or a task hung, producing 2 different bug
> >> >> >> reports (which is bad).
> >> >> >> One can say that it's only a matter of tuning timeouts, but at least
> >> >> >> task hung detector has a problem that if you set timeout to X, it can
> >> >> >> detect hung anywhere between X and 2*X. And on one hand we need quite
> >> >> >> large timeout (a minute may not be enough), and on the other hand we
> >> >> >> can't wait for an hour just to make sure that the machine is indeed
> >> >> >> dead (these things happen every few minutes).
> >> >> >
> >> >> > I suppose that we could have a global variable that was set to the
> >> >> > priority of the complaint in question, which would suppress all
> >> >> > lower-priority complaints.  Might need to be opt-in, though -- I 

Re: INFO: task hung in perf_trace_event_unreg

2018-04-09 Thread Dmitry Vyukov
On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney
 wrote:
>> >> >> >>
>> >> >> >> > Hello,
>> >> >> >> >
>> >> >> >> > syzbot hit the following crash on upstream commit
>> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 
>> >> >> >> > +)
>> >> >> >> > Linux 4.16
>> >> >> >> > syzbot dashboard link:
>> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
>> >> >> >> >
>> >> >> >> > Unfortunately, I don't have any reproducer for this crash yet.
>> >> >> >> > Raw console output:
>> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
>> >> >> >> > Kernel config:
>> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
>> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
>> >> >> >> >
>> >> >> >> > IMPORTANT: if you fix the bug, please add the following tag to 
>> >> >> >> > the commit:
>> >> >> >> > Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
>> >> >> >> > It will help syzbot understand when the bug is fixed. See footer 
>> >> >> >> > for
>> >> >> >> > details.
>> >> >> >> > If you forward the report, please keep this part and the footer.
>> >> >> >> >
>> >> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: 
>> >> >> >> > unknown mount
>> >> >> >> > option "g �;e�K�׫>pquota"
>> >> >> >
>> >> >> > Might not hurt to look into the above, though perhaps this is just 
>> >> >> > syzkaller
>> >> >> > playing around with mount options.
>> >> >> >
>> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
>> >> >> >> >       Not tainted 4.16.0+ #10
>> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
>> >> >> >> > message.
>> >> >> >> > syz-executor3   D20944 10803   4492 0x8002
>> >> >> >> > Call Trace:
>> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
>> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
>> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
>> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
>> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
>> >> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
>> >> >> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
>> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>> >> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
>> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
>> >> >> >>
>> >> >> >> I don't think this is a perf issue. Looks like something is 
>> >> >> >> preventing
>> >> >> >> rcu_sched from completing. If there's a CPU that is running in 
>> >> >> >> kernel
>> >> >> >> space and never scheduling, that can cause this issue. Or if RCU
>> >> >> >> somehow missed a transition into idle or user space.
>> >> >> >
>> >> >> > The RCU CPU stall warning below strongly supports this position ...
>> >> >>
>> >> >> I think this is this guy then:
>> >> >>
>> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
>> >> >>
>> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
>> >> >
>> >> > Seems likely to me!
>> >> >
>> >> >> Looking retrospectively at the various hang/stall bugs that we have, I
>> >> >> think we need some kind of priority between them. I.e. we have rcu
>> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
>> >> >> hang and maybe something else. It would be useful if they fire
>> >> >> deterministically according to priorities. If there is an rcu stall,
>> >> >> that's always detected as CPU stall. Then if there is no RCU stall,
>> >> >> but a workqueue stall, then that's always detected as workqueue stall,
>> >> >> etc.
>> >> >> Currently if we have an RCU stall (effectively CPU stall), that can be
>> >> >> detected either RCU stall or a task hung, producing 2 different bug
>> >> >> reports (which is bad).
>> >> >> One can say that it's only a matter of tuning timeouts, but at least
>> >> >> task hung detector has a problem that if you set timeout to X, it can
>> >> >> detect hung anywhere between X and 2*X. And on one hand we need quite
>> >> >> large timeout (a minute may not be enough), and on the other hand we
>> >> >> can't wait for an hour just to make sure that the machine is indeed
>> >> >> dead (these things happen every few minutes).
>> >> >
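The "anywhere between X and 2*X" window mentioned above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: it supposes that the detector scans once every `period` seconds, remembers each task's context-switch count, and complains only when the count is unchanged since the previous scan. The function name `hung_report_latency` and the one-switch-per-second behaviour of a running task are hypothetical and chosen purely for illustration.

```c
#include <assert.h>

/*
 * Toy model (an assumption, not the kernel's code): the detector scans
 * every `period` seconds and reports a task whose context-switch count
 * has not changed since the previous scan.
 *
 * The task switches once per second until it blocks at `block_time`,
 * after which its count stays frozen.
 *
 * Returns the number of seconds between the task blocking and the report.
 */
unsigned hung_report_latency(unsigned period, unsigned block_time)
{
    unsigned long switches_seen = 0;   /* count recorded at the last scan */
    unsigned t;

    assert(period > 0);

    for (t = period; ; t += period) {
        /* Switch count visible to the detector at scan time t. */
        unsigned long switches = (t < block_time) ? t : block_time;

        if (switches != switches_seen) {
            /* The task ran since the last scan: remember and move on. */
            switches_seen = switches;
            continue;
        }
        /* Count unchanged over a whole period: report the task as hung. */
        return t - block_time;
    }
}
```

In this model a task that blocks exactly at a scan is reported one period later, while a task that blocks just after a scan has to wait for the next scan to record its frozen count and for the scan after that to complain, which gives a delay just under two periods.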
>> >> > I suppose that we could have a global variable that was set to the
>> >> > priority of the complaint in question, which would suppress all
>> >> > lower-priority complaints.  Might need to be opt-in, though -- I would
>> >> > guess that not everyone is going to be happy with one complaint 
>> >> > suppressing
>> >> > others, especially given the possibility that the two complaints might
>> >> > be about different things.
>> >> >
>> >> > Or did you have something more deft in mind?
>> >>
>> >>
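The global-priority idea quoted above could look roughly like the following. This is a minimal userspace sketch using C11 atomics, not kernel code and not anything proposed as a patch in this thread. The names `stall_prio`, `stall_reported_prio`, `stall_suppress_enabled` and `stall_report_claim` are hypothetical, and the ordering below (RCU above workqueue above task hung) is an assumption based on the order in which the detectors are listed in the quoted text.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical complaint priorities; a larger value wins. */
enum stall_prio {
    STALL_NONE = 0,
    STALL_TASK_HUNG,
    STALL_WORKQUEUE,
    STALL_RCU,
};

/* Highest priority reported so far (assumed global, as in the proposal). */
static atomic_int stall_reported_prio = STALL_NONE;

/* Opt-in switch: when false, nothing is ever suppressed. */
static atomic_bool stall_suppress_enabled = true;

/*
 * Called by a detector before it prints. Returns true if the complaint
 * may be printed, false if a higher-priority complaint was already made.
 */
bool stall_report_claim(enum stall_prio prio)
{
    int cur;

    assert(prio > STALL_NONE);

    if (!atomic_load(&stall_suppress_enabled))
        return true;

    cur = atomic_load(&stall_reported_prio);
    while (cur <= (int)prio) {
        /* Raise the recorded priority; on failure cur is reloaded. */
        if (atomic_compare_exchange_weak(&stall_reported_prio, &cur, (int)prio))
            return true;
    }
    return false;
}
```

With this sketch, a task-hung complaint that arrives after an RCU stall has been claimed returns false and would be dropped, while a second RCU stall is still allowed. Whether equal-priority repeats should pass, and whether the recorded priority should ever be reset, are open design choices that the quoted proposal does not settle.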

Re: INFO: task hung in perf_trace_event_unreg

2018-04-09 Thread Dmitry Vyukov
On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney
 wrote:
>> >> >> >>
>> >> >> >> > Hello,
>> >> >> >> >
>> >> >> >> > syzbot hit the following crash on upstream commit
>> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 
>> >> >> >> > +)
>> >> >> >> > Linux 4.16
>> >> >> >> > syzbot dashboard link:
>> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
>> >> >> >> >
>> >> >> >> > Unfortunately, I don't have any reproducer for this crash yet.
>> >> >> >> > Raw console output:
>> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
>> >> >> >> > Kernel config:
>> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
>> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
>> >> >> >> >
>> >> >> >> > IMPORTANT: if you fix the bug, please add the following tag to 
>> >> >> >> > the commit:
>> >> >> >> > Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
>> >> >> >> > It will help syzbot understand when the bug is fixed. See footer 
>> >> >> >> > for
>> >> >> >> > details.
>> >> >> >> > If you forward the report, please keep this part and the footer.
>> >> >> >> >
>> >> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: 
>> >> >> >> > unknown mount
>> >> >> >> > option "g �;e�K�׫>pquota"
>> >> >> >
>> >> >> > Might not hurt to look into the above, though perhaps this is just 
>> >> >> > syzkaller
>> >> >> > playing around with mount options.
>> >> >> >
>> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
>> >> >> >> >Not tainted 4.16.0+ #10
>> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
>> >> >> >> > message.
>> >> >> >> > syz-executor3   D20944 10803   4492 0x8002
>> >> >> >> > Call Trace:
>> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
>> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
>> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
>> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
>> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
>> >> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
>> >> >> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
>> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>> >> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
>> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
>> >> >> >>
>> >> >> >> I don't think this is a perf issue. Looks like something is 
>> >> >> >> preventing
>> >> >> >> rcu_sched from completing. If there's a CPU that is running in 
>> >> >> >> kernel
>> >> >> >> space and never scheduling, that can cause this issue. Or if RCU
>> >> >> >> somehow missed a transition into idle or user space.
>> >> >> >
>> >> >> > The RCU CPU stall warning below strongly supports this position ...
>> >> >>
>> >> >> I think this is this guy then:
>> >> >>
>> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
>> >> >>
>> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
>> >> >
>> >> > Seems likely to me!
>> >> >
>> >> >> Looking retrospectively at the various hang/stall bugs that we have, I
>> >> >> think we need some kind of priority between them. I.e. we have rcu
>> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
>> >> >> hang and maybe something else. It would be useful if they fire
>> >> >> deterministically according to priorities. If there is an rcu stall,
>> >> >> that's always detected as CPU stall. Then if there is no RCU stall,
>> >> >> but a workqueue stall, then that's always detected as workqueue stall,
>> >> >> etc.
>> >> >> Currently if we have an RCU stall (effectively CPU stall), that can be
>> >> >> detected either RCU stall or a task hung, producing 2 different bug
>> >> >> reports (which is bad).
>> >> >> One can say that it's only a matter of tuning timeouts, but at least
>> >> >> task hung detector has a problem that if you set timeout to X, it can
>> >> >> detect hung anywhere between X and 2*X. And on one hand we need quite
>> >> >> large timeout (a minute may not be enough), and on the other hand we
>> >> >> can't wait for an hour just to make sure that the machine is indeed
>> >> >> dead (these things happen every few minutes).
>> >> >
>> >> > I suppose that we could have a global variable that was set to the
>> >> > priority of the complaint in question, which would suppress all
>> >> > lower-priority complaints.  Might need to be opt-in, though -- I would
>> >> > guess that not everyone is going to be happy with one complaint suppressing
>> >> > others, especially given the possibility that the two complaints might
>> >> > be about different things.
>> >> >
>> >> > Or did you have something more deft in mind?
>> >>
>> >>
>> >> syzkaller generally 

Re: INFO: task hung in perf_trace_event_unreg

2018-04-02 Thread Paul E. McKenney
On Mon, Apr 02, 2018 at 07:11:50PM +0200, Dmitry Vyukov wrote:
> On Mon, Apr 2, 2018 at 6:39 PM, Paul E. McKenney
>  wrote:
> > On Mon, Apr 02, 2018 at 06:32:03PM +0200, Dmitry Vyukov wrote:
> >> On Mon, Apr 2, 2018 at 6:21 PM, Paul E. McKenney
> >>  wrote:
> >> > On Mon, Apr 02, 2018 at 06:04:35PM +0200, Dmitry Vyukov wrote:
> >> >> On Mon, Apr 2, 2018 at 5:33 PM, Paul E. McKenney
> >> >>  wrote:
> >> >> > On Mon, Apr 02, 2018 at 09:40:40AM -0400, Steven Rostedt wrote:
> >> >> >> On Mon, 02 Apr 2018 02:20:02 -0700
> >> >> >> syzbot  wrote:
> >> >> >>
> >> >> >> > Hello,
> >> >> >> >
> >> >> >> > syzbot hit the following crash on upstream commit
> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 
> >> >> >> > +)
> >> >> >> > Linux 4.16
> >> >> >> > syzbot dashboard link:
> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >> >> >> >
> >> >> >> > Unfortunately, I don't have any reproducer for this crash yet.
> >> >> >> > Raw console output:
> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> >> >> >> > Kernel config:
> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
> >> >> >> >
> >> >> >> > IMPORTANT: if you fix the bug, please add the following tag to the 
> >> >> >> > commit:
> >> >> >> > Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
> >> >> >> > It will help syzbot understand when the bug is fixed. See footer 
> >> >> >> > for
> >> >> >> > details.
> >> >> >> > If you forward the report, please keep this part and the footer.
> >> >> >> >
> >> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: 
> >> >> >> > unknown mount
> >> >> >> > option "g �;e�K�׫>pquota"
> >> >> >
> >> >> > Might not hurt to look into the above, though perhaps this is just 
> >> >> > syzkaller
> >> >> > playing around with mount options.
> >> >> >
> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
> >> >> >> >Not tainted 4.16.0+ #10
> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
> >> >> >> > message.
> >> >> >> > syz-executor3   D20944 10803   4492 0x8002
> >> >> >> > Call Trace:
> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
> >> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
> >> >> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> >> >> >>
> >> >> >> I don't think this is a perf issue. Looks like something is 
> >> >> >> preventing
> >> >> >> rcu_sched from completing. If there's a CPU that is running in kernel
> >> >> >> space and never scheduling, that can cause this issue. Or if RCU
> >> >> >> somehow missed a transition into idle or user space.
> >> >> >
> >> >> > The RCU CPU stall warning below strongly supports this position ...
> >> >>
> >> >> I think this is this guy then:
> >> >>
> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> >> >>
> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
> >> >
> >> > Seems likely to me!
> >> >
> >> >> Looking retrospectively at the various hang/stall bugs that we have, I
> >> >> think we need some kind of priority between them. I.e. we have rcu
> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
> >> >> hang and maybe something else. It would be useful if they fire
> >> >> deterministically according to priorities. If there is an rcu stall,
> >> >> that's always detected as CPU stall. Then if there is no RCU stall,
> >> >> but a workqueue stall, then that's always detected as workqueue stall,
> >> >> etc.
> >> >> Currently if we have an RCU stall (effectively CPU stall), that can be
> >> >> detected either RCU stall or a task hung, producing 2 different bug
> >> >> reports (which is bad).
> >> >> One can say that it's only a matter of tuning timeouts, but at least
> >> >> task hung detector has a problem that if you set timeout to X, it can
> >> >> detect hung anywhere between X and 2*X. And on one hand we need quite
> >> >> large timeout (a minute may not be enough), and on the other hand we
> >> >> can't wait for an hour just to make sure that the machine is indeed
> >> >> dead (these things happen every few minutes).
> >> >
> >> > I suppose that we could have a global variable that was set to the
> >> > priority of the complaint in question, which would suppress 

Re: INFO: task hung in perf_trace_event_unreg

2018-04-02 Thread Dmitry Vyukov
On Mon, Apr 2, 2018 at 6:39 PM, Paul E. McKenney
 wrote:
> On Mon, Apr 02, 2018 at 06:32:03PM +0200, Dmitry Vyukov wrote:
>> On Mon, Apr 2, 2018 at 6:21 PM, Paul E. McKenney
>>  wrote:
>> > On Mon, Apr 02, 2018 at 06:04:35PM +0200, Dmitry Vyukov wrote:
>> >> On Mon, Apr 2, 2018 at 5:33 PM, Paul E. McKenney
>> >>  wrote:
>> >> > On Mon, Apr 02, 2018 at 09:40:40AM -0400, Steven Rostedt wrote:
>> >> >> On Mon, 02 Apr 2018 02:20:02 -0700
>> >> >> syzbot  wrote:
>> >> >>
>> >> >> > Hello,
>> >> >> >
>> >> >> > syzbot hit the following crash on upstream commit
>> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 
>> >> >> > +)
>> >> >> > Linux 4.16
>> >> >> > syzbot dashboard link:
>> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
>> >> >> >
>> >> >> > Unfortunately, I don't have any reproducer for this crash yet.
>> >> >> > Raw console output:
>> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
>> >> >> > Kernel config:
>> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
>> >> >> > compiler: gcc (GCC) 7.1.1 20170620
>> >> >> >
>> >> >> > IMPORTANT: if you fix the bug, please add the following tag to the 
>> >> >> > commit:
>> >> >> > Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
>> >> >> > It will help syzbot understand when the bug is fixed. See footer for
>> >> >> > details.
>> >> >> > If you forward the report, please keep this part and the footer.
>> >> >> >
>> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown 
>> >> >> > mount
>> >> >> > option "g �;e�K�׫>pquota"
>> >> >
>> >> > Might not hurt to look into the above, though perhaps this is just 
>> >> > syzkaller
>> >> > playing around with mount options.
>> >> >
>> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
>> >> >> >Not tainted 4.16.0+ #10
>> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
>> >> >> > message.
>> >> >> > syz-executor3   D20944 10803   4492 0x8002
>> >> >> > Call Trace:
>> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
>> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
>> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
>> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
>> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
>> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
>> >> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
>> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
>> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
>> >> >>
>> >> >> I don't think this is a perf issue. Looks like something is preventing
>> >> >> rcu_sched from completing. If there's a CPU that is running in kernel
>> >> >> space and never scheduling, that can cause this issue. Or if RCU
>> >> >> somehow missed a transition into idle or user space.
>> >> >
>> >> > The RCU CPU stall warning below strongly supports this position ...
>> >>
>> >> I think this is this guy then:
>> >>
>> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
>> >>
>> >> #syz dup: INFO: rcu detected stall in __process_echoes
>> >
>> > Seems likely to me!
>> >
>> >> Looking retrospectively at the various hang/stall bugs that we have, I
>> >> think we need some kind of priority between them. I.e. we have rcu
>> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
>> >> hang and maybe something else. It would be useful if they fire
>> >> deterministically according to priorities. If there is an rcu stall,
>> >> that's always detected as CPU stall. Then if there is no RCU stall,
>> >> but a workqueue stall, then that's always detected as workqueue stall,
>> >> etc.
>> >> Currently if we have an RCU stall (effectively CPU stall), that can be
>> >> detected either RCU stall or a task hung, producing 2 different bug
>> >> reports (which is bad).
>> >> One can say that it's only a matter of tuning timeouts, but at least
>> >> task hung detector has a problem that if you set timeout to X, it can
>> >> detect hung anywhere between X and 2*X. And on one hand we need quite
>> >> large timeout (a minute may not be enough), and on the other hand we
>> >> can't wait for an hour just to make sure that the machine is indeed
>> >> dead (these things happen every few minutes).
>> >
>> > I suppose that we could have a global variable that was set to the
>> > priority of the complaint in question, which would suppress all
>> > lower-priority complaints.  Might need to be opt-in, though -- I would
>> > guess that not everyone is going to be happy with one complaint suppressing
>> > others, especially given the possibility that the two complaints might
>> > be about different things.
>> >
>> > Or 

Re: INFO: task hung in perf_trace_event_unreg

2018-04-02 Thread Paul E. McKenney
On Mon, Apr 02, 2018 at 06:32:03PM +0200, Dmitry Vyukov wrote:
> On Mon, Apr 2, 2018 at 6:21 PM, Paul E. McKenney
>  wrote:
> > On Mon, Apr 02, 2018 at 06:04:35PM +0200, Dmitry Vyukov wrote:
> >> On Mon, Apr 2, 2018 at 5:33 PM, Paul E. McKenney
> >>  wrote:
> >> > On Mon, Apr 02, 2018 at 09:40:40AM -0400, Steven Rostedt wrote:
> >> >> On Mon, 02 Apr 2018 02:20:02 -0700
> >> >> syzbot  wrote:
> >> >>
> >> >> > Hello,
> >> >> >
> >> >> > syzbot hit the following crash on upstream commit
> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 
> >> >> > +)
> >> >> > Linux 4.16
> >> >> > syzbot dashboard link:
> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >> >> >
> >> >> > Unfortunately, I don't have any reproducer for this crash yet.
> >> >> > Raw console output:
> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> >> >> > Kernel config:
> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> >> >> > compiler: gcc (GCC) 7.1.1 20170620
> >> >> >
> >> >> > IMPORTANT: if you fix the bug, please add the following tag to the 
> >> >> > commit:
> >> >> > Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
> >> >> > It will help syzbot understand when the bug is fixed. See footer for
> >> >> > details.
> >> >> > If you forward the report, please keep this part and the footer.
> >> >> >
> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown 
> >> >> > mount
> >> >> > option "g �;e�K�׫>pquota"
> >> >
> >> > Might not hurt to look into the above, though perhaps this is just 
> >> > syzkaller
> >> > playing around with mount options.
> >> >
> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
> >> >> >Not tainted 4.16.0+ #10
> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
> >> >> > message.
> >> >> > syz-executor3   D20944 10803   4492 0x8002
> >> >> > Call Trace:
> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
> >> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> >> >>
> >> >> I don't think this is a perf issue. Looks like something is preventing
> >> >> rcu_sched from completing. If there's a CPU that is running in kernel
> >> >> space and never scheduling, that can cause this issue. Or if RCU
> >> >> somehow missed a transition into idle or user space.
> >> >
> >> > The RCU CPU stall warning below strongly supports this position ...
> >>
> >> I think this is this guy then:
> >>
> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> >>
> >> #syz dup: INFO: rcu detected stall in __process_echoes
> >
> > Seems likely to me!
> >
> >> Looking retrospectively at the various hang/stall bugs that we have, I
> >> think we need some kind of priority between them. I.e. we have rcu
> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
> >> hang and maybe something else. It would be useful if they fire
> >> deterministically according to priorities. If there is an rcu stall,
> >> that's always detected as CPU stall. Then if there is no RCU stall,
> >> but a workqueue stall, then that's always detected as workqueue stall,
> >> etc.
> >> Currently if we have an RCU stall (effectively CPU stall), that can be
> >> detected either RCU stall or a task hung, producing 2 different bug
> >> reports (which is bad).
> >> One can say that it's only a matter of tuning timeouts, but at least
> >> task hung detector has a problem that if you set timeout to X, it can
> >> detect hung anywhere between X and 2*X. And on one hand we need quite
> >> large timeout (a minute may not be enough), and on the other hand we
> >> can't wait for an hour just to make sure that the machine is indeed
> >> dead (these things happen every few minutes).
> >
> > I suppose that we could have a global variable that was set to the
> > priority of the complaint in question, which would suppress all
> > lower-priority complaints.  Might need to be opt-in, though -- I would
> > guess that not everyone is going to be happy with one complaint suppressing
> > others, especially given the possibility that the two complaints might
> > be about different things.
> >
> > Or did you have something more deft in mind?
> 
> 
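
The "global variable set to the priority of the complaint" idea above could look roughly like the following user-space sketch. Everything here is an assumption made for illustration: the names `Complaint` and `ComplaintGate`, the `suppress_lower` switch standing in for the opt-in, and the relative ordering of the three detectors (the thread only says that an RCU stall should win over a workqueue stall, "etc.").

```python
from enum import IntEnum


class Complaint(IntEnum):
    # Hypothetical ordering: a larger value means a higher priority.
    TASK_HANG = 1
    WORKQUEUE_STALL = 2
    RCU_STALL = 3


class ComplaintGate:
    """Remembers the highest-priority complaint seen so far."""

    def __init__(self, suppress_lower=False):
        # Opt-in: with the default, every complaint is still reported.
        self.suppress_lower = suppress_lower
        self.highest_seen = 0  # plays the role of the "global variable"

    def should_report(self, kind):
        # Drop a complaint only if suppression is enabled and a strictly
        # higher-priority complaint has already been reported.
        if self.suppress_lower and kind < self.highest_seen:
            return False
        self.highest_seen = max(self.highest_seen, kind)
        return True


if __name__ == "__main__":
    gate = ComplaintGate(suppress_lower=True)
    print(gate.should_report(Complaint.RCU_STALL))
    print(gate.should_report(Complaint.TASK_HANG))
```

Note that this sketch only suppresses complaints that arrive after a higher-priority one. It does nothing about the case where the lower-priority detector fires first, and it cannot tell whether the two complaints are about different problems, which is the concern raised above.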

Re: INFO: task hung in perf_trace_event_unreg

2018-04-02 Thread Paul E. McKenney
On Mon, Apr 02, 2018 at 06:32:03PM +0200, Dmitry Vyukov wrote:
> On Mon, Apr 2, 2018 at 6:21 PM, Paul E. McKenney
>  wrote:
> > On Mon, Apr 02, 2018 at 06:04:35PM +0200, Dmitry Vyukov wrote:
> >> On Mon, Apr 2, 2018 at 5:33 PM, Paul E. McKenney
> >>  wrote:
> >> > On Mon, Apr 02, 2018 at 09:40:40AM -0400, Steven Rostedt wrote:
> >> >> On Mon, 02 Apr 2018 02:20:02 -0700
> >> >> syzbot  wrote:
> >> >>
> >> >> > Hello,
> >> >> >
> >> >> > syzbot hit the following crash on upstream commit
> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 
> >> >> > +)
> >> >> > Linux 4.16
> >> >> > syzbot dashboard link:
> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >> >> >
> >> >> > Unfortunately, I don't have any reproducer for this crash yet.
> >> >> > Raw console output:
> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> >> >> > Kernel config:
> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> >> >> > compiler: gcc (GCC) 7.1.1 20170620
> >> >> >
> >> >> > IMPORTANT: if you fix the bug, please add the following tag to the 
> >> >> > commit:
> >> >> > Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
> >> >> > It will help syzbot understand when the bug is fixed. See footer for
> >> >> > details.
> >> >> > If you forward the report, please keep this part and the footer.
> >> >> >
> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown 
> >> >> > mount
> >> >> > option "g �;e�K�׫>pquota"
> >> >
> >> > Might not hurt to look into the above, though perhaps this is just 
> >> > syzkaller
> >> > playing around with mount options.
> >> >
> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
> >> >> >Not tainted 4.16.0+ #10
> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
> >> >> > message.
> >> >> > syz-executor3   D20944 10803   4492 0x8002
> >> >> > Call Trace:
> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
> >> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> >> >>
> >> >> I don't think this is a perf issue. Looks like something is preventing
> >> >> rcu_sched from completing. If there's a CPU that is running in kernel
> >> >> space and never scheduling, that can cause this issue. Or if RCU
> >> >> somehow missed a transition into idle or user space.
> >> >
> >> > The RCU CPU stall warning below strongly supports this position ...
> >>
> >> I think this is this guy then:
> >>
> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> >>
> >> #syz dup: INFO: rcu detected stall in __process_echoes
> >
> > Seems likely to me!
> >
> >> Looking retrospectively at the various hang/stall bugs that we have, I
> >> think we need some kind of priority between them. I.e. we have rcu
> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
> >> hang and maybe something else. It would be useful if they fire
> >> deterministically according to priorities. If there is an rcu stall,
> >> that's always detected as CPU stall. Then if there is no RCU stall,
> >> but a workqueue stall, then that's always detected as workqueue stall,
> >> etc.
> >> Currently if we have an RCU stall (effectively CPU stall), that can be
> >> detected either RCU stall or a task hung, producing 2 different bug
> >> reports (which is bad).
> >> One can say that it's only a matter of tuning timeouts, but at least
> >> task hung detector has a problem that if you set timeout to X, it can
> >> detect hung anywhere between X and 2*X. And on one hand we need quite
> >> large timeout (a minute may not be enough), and on the other hand we
> >> can't wait for an hour just to make sure that the machine is indeed
> >> dead (these things happen every few minutes).
> >
> > I suppose that we could have a global variable that was set to the
> > priority of the complaint in question, which would suppress all
> > lower-priority complaints.  Might need to be opt-in, though -- I would
> > guess that not everyone is going to be happy with one complaint suppressing
> > others, especially given the possibility that the two complaints might
> > be about different things.
> >
> > Or did you have something more deft in mind?
> 
> 
> syzkaller generally looks only at the first report. One does not know
> if/when there will be a second one, 

Re: INFO: task hung in perf_trace_event_unreg

2018-04-02 Thread Dmitry Vyukov
On Mon, Apr 2, 2018 at 6:21 PM, Paul E. McKenney
 wrote:
> On Mon, Apr 02, 2018 at 06:04:35PM +0200, Dmitry Vyukov wrote:
>> On Mon, Apr 2, 2018 at 5:33 PM, Paul E. McKenney
>>  wrote:
>> > On Mon, Apr 02, 2018 at 09:40:40AM -0400, Steven Rostedt wrote:
>> >> On Mon, 02 Apr 2018 02:20:02 -0700
>> >> syzbot  wrote:
>> >>
>> >> > Hello,
>> >> >
>> >> > syzbot hit the following crash on upstream commit
>> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +)
>> >> > Linux 4.16
>> >> > syzbot dashboard link:
>> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
>> >> >
>> >> > Unfortunately, I don't have any reproducer for this crash yet.
>> >> > Raw console output:
>> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
>> >> > Kernel config:
>> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
>> >> > compiler: gcc (GCC) 7.1.1 20170620
>> >> >
>> >> > IMPORTANT: if you fix the bug, please add the following tag to the 
>> >> > commit:
>> >> > Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
>> >> > It will help syzbot understand when the bug is fixed. See footer for
>> >> > details.
>> >> > If you forward the report, please keep this part and the footer.
>> >> >
>> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown 
>> >> > mount
>> >> > option "g �;e�K�׫>pquota"
>> >
>> > Might not hurt to look into the above, though perhaps this is just 
>> > syzkaller
>> > playing around with mount options.
>> >
>> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
>> >> >Not tainted 4.16.0+ #10
>> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
>> >> > message.
>> >> > syz-executor3   D20944 10803   4492 0x8002
>> >> > Call Trace:
>> >> >   context_switch kernel/sched/core.c:2862 [inline]
>> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
>> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
>> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
>> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
>> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
>> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
>> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
>> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
>> >>
>> >> I don't think this is a perf issue. Looks like something is preventing
>> >> rcu_sched from completing. If there's a CPU that is running in kernel
>> >> space and never scheduling, that can cause this issue. Or if RCU
>> >> somehow missed a transition into idle or user space.
>> >
>> > The RCU CPU stall warning below strongly supports this position ...
>>
>> I think this is this guy then:
>>
>> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
>>
>> #syz dup: INFO: rcu detected stall in __process_echoes
>
> Seems likely to me!
>
>> Looking retrospectively at the various hang/stall bugs that we have, I
>> think we need some kind of priority between them. I.e. we have rcu
>> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
>> hang and maybe something else. It would be useful if they fire
>> deterministically according to priorities. If there is an rcu stall,
>> that's always detected as CPU stall. Then if there is no RCU stall,
>> but a workqueue stall, then that's always detected as workqueue stall,
>> etc.
>> Currently if we have an RCU stall (effectively CPU stall), that can be
>> detected either RCU stall or a task hung, producing 2 different bug
>> reports (which is bad).
>> One can say that it's only a matter of tuning timeouts, but at least
>> task hung detector has a problem that if you set timeout to X, it can
>> detect hung anywhere between X and 2*X. And on one hand we need quite
>> large timeout (a minute may not be enough), and on the other hand we
>> can't wait for an hour just to make sure that the machine is indeed
>> dead (these things happen every few minutes).
>
> I suppose that we could have a global variable that was set to the
> priority of the complaint in question, which would suppress all
> lower-priority complaints.  Might need to be opt-in, though -- I would
> guess that not everyone is going to be happy with one complaint suppressing
> others, especially given the possibility that the two complaints might
> be about different things.
>
> Or did you have something more deft in mind?


syzkaller generally looks only at the first report. One does not know
if/when there will be a second one, or the second one can be induced
by the first one, and we generally want clean reports on a non-tainted
kernel. So we don't just need to suppress lower priority ones, we need
to produce the right report first.
I am thinking maybe setting:
 - rcu stalls at 1.5 minutes
 - 

Re: INFO: task hung in perf_trace_event_unreg

2018-04-02 Thread Paul E. McKenney
On Mon, Apr 02, 2018 at 06:04:35PM +0200, Dmitry Vyukov wrote:
> On Mon, Apr 2, 2018 at 5:33 PM, Paul E. McKenney
>  wrote:
> > On Mon, Apr 02, 2018 at 09:40:40AM -0400, Steven Rostedt wrote:
> >> On Mon, 02 Apr 2018 02:20:02 -0700
> >> syzbot  wrote:
> >>
> >> > Hello,
> >> >
> >> > syzbot hit the following crash on upstream commit
> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +)
> >> > Linux 4.16
> >> > syzbot dashboard link:
> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >> >
> >> > Unfortunately, I don't have any reproducer for this crash yet.
> >> > Raw console output:
> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> >> > Kernel config:
> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> >> > compiler: gcc (GCC) 7.1.1 20170620
> >> >
> >> > IMPORTANT: if you fix the bug, please add the following tag to the 
> >> > commit:
> >> > Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
> >> > It will help syzbot understand when the bug is fixed. See footer for
> >> > details.
> >> > If you forward the report, please keep this part and the footer.
> >> >
> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown 
> >> > mount
> >> > option "g �;e�K�׫>pquota"
> >
> > Might not hurt to look into the above, though perhaps this is just syzkaller
> > playing around with mount options.
> >
> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
> >> >Not tainted 4.16.0+ #10
> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> > syz-executor3   D20944 10803   4492 0x8002
> >> > Call Trace:
> >> >   context_switch kernel/sched/core.c:2862 [inline]
> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> >>
> >> I don't think this is a perf issue. Looks like something is preventing
> >> rcu_sched from completing. If there's a CPU that is running in kernel
> >> space and never scheduling, that can cause this issue. Or if RCU
> >> somehow missed a transition into idle or user space.
> >
> > The RCU CPU stall warning below strongly supports this position ...
> 
> I think this is this guy then:
> 
> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> 
> #syz dup: INFO: rcu detected stall in __process_echoes

Seems likely to me!

> Looking retrospectively at the various hang/stall bugs that we have, I
> think we need some kind of priority between them. I.e. we have rcu
> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
> hang and maybe something else. It would be useful if they fire
> deterministically according to priorities. If there is an rcu stall,
> that's always detected as CPU stall. Then if there is no RCU stall,
> but a workqueue stall, then that's always detected as workqueue stall,
> etc.
> Currently if we have an RCU stall (effectively CPU stall), that can be
> detected either RCU stall or a task hung, producing 2 different bug
> reports (which is bad).
> One can say that it's only a matter of tuning timeouts, but at least
> task hung detector has a problem that if you set timeout to X, it can
> detect hung anywhere between X and 2*X. And on one hand we need quite
> large timeout (a minute may not be enough), and on the other hand we
> can't wait for an hour just to make sure that the machine is indeed
> dead (these things happen every few minutes).

I suppose that we could have a global variable that was set to the
priority of the complaint in question, which would suppress all
lower-priority complaints.  Might need to be opt-in, though -- I would
guess that not everyone is going to be happy with one complaint suppressing
others, especially given the possibility that the two complaints might
be about different things.

Or did you have something more deft in mind?

Thanx, Paul
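
A minimal sketch of the global-variable idea above, in userspace C11
rather than kernel code. Every name here (stall_prio, reported_prio,
suppress_lower_prio, stall_report_allowed) is hypothetical, and the
ordering RCU stall > workqueue stall > hung task is an assumption taken
from the priority list discussed in this thread. The opt-in switch
defaults to off, so nothing is suppressed unless it is enabled.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical priorities; a larger value means a higher priority. */
enum stall_prio {
	STALL_NONE = 0,
	STALL_TASK_HUNG,
	STALL_WORKQUEUE,
	STALL_RCU,
};

/* Highest priority reported so far; zero-initialized to STALL_NONE. */
static atomic_int reported_prio;

/* Opt-in switch (assumption): suppression is off by default. */
static bool suppress_lower_prio;

/*
 * Returns true if a detector of priority 'prio' may print its report.
 * A report raises the recorded priority, and later reports of strictly
 * lower priority are refused. Reports of equal priority still pass.
 */
static bool stall_report_allowed(enum stall_prio prio)
{
	int seen;

	if (!suppress_lower_prio)
		return true;

	seen = atomic_load(&reported_prio);
	while (seen < (int)prio) {
		/* On failure, 'seen' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&reported_prio, &seen, (int)prio))
			return true;
	}
	return seen == (int)prio;
}
```

Each detector would call stall_report_allowed() just before printing.
Note that this only suppresses lower-priority reports that come after a
higher-priority one; it does not change which detector fires first.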

> >> >   tracepoint_synchronize_unregister include/linux/tracepoint.h:80 
> >> > [inline]
> >> >   perf_trace_event_unreg.isra.2+0xb7/0x1f0
> >> > kernel/trace/trace_event_perf.c:161
> >> >   perf_trace_destroy+0xbc/0x100 kernel/trace/trace_event_perf.c:236
> >> >   tp_perf_event_destroy+0x15/0x20 kernel/events/core.c:7976
> >> >   _free_event+0x3bd/0x10f0 kernel/events/core.c:4121
> >> >   put_event+0x24/0x30 kernel/events/core.c:4204
> >> >   perf_event_release_kernel+0x6e8/0xfc0 

Re: INFO: task hung in perf_trace_event_unreg

2018-04-02 Thread Dmitry Vyukov
On Mon, Apr 2, 2018 at 5:33 PM, Paul E. McKenney
 wrote:
> On Mon, Apr 02, 2018 at 09:40:40AM -0400, Steven Rostedt wrote:
>> On Mon, 02 Apr 2018 02:20:02 -0700
>> syzbot  wrote:
>>
>> > Hello,
>> >
>> > syzbot hit the following crash on upstream commit
>> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +)
>> > Linux 4.16
>> > syzbot dashboard link:
>> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
>> >
>> > Unfortunately, I don't have any reproducer for this crash yet.
>> > Raw console output:
>> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
>> > Kernel config:
>> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
>> > compiler: gcc (GCC) 7.1.1 20170620
>> >
>> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> > Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
>> > It will help syzbot understand when the bug is fixed. See footer for
>> > details.
>> > If you forward the report, please keep this part and the footer.
>> >
>> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount
>> > option "g �;e�K�׫>pquota"
>
> Might not hurt to look into the above, though perhaps this is just syzkaller
> playing around with mount options.
>
>> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
>> >Not tainted 4.16.0+ #10
>> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> > syz-executor3   D20944 10803   4492 0x8002
>> > Call Trace:
>> >   context_switch kernel/sched/core.c:2862 [inline]
>> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
>> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
>> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
>> >   __wait_for_common kernel/sched/completion.c:107 [inline]
>> >   wait_for_common kernel/sched/completion.c:118 [inline]
>> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
>> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
>> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
>>
>> I don't think this is a perf issue. Looks like something is preventing
>> rcu_sched from completing. If there's a CPU that is running in kernel
>> space and never scheduling, that can cause this issue. Or if RCU
>> somehow missed a transition into idle or user space.
>
> The RCU CPU stall warning below strongly supports this position ...


I think this is this guy then:

https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40

#syz dup: INFO: rcu detected stall in __process_echoes


Looking retrospectively at the various hang/stall bugs that we have, I
think we need some kind of priority between them. I.e. we have RCU
stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
hangs and maybe something else. It would be useful if they fired
deterministically according to priorities. If there is an RCU stall,
that's always detected as a CPU stall. Then if there is no RCU stall,
but a workqueue stall, then that's always detected as a workqueue stall,
etc.
Currently if we have an RCU stall (effectively a CPU stall), that can be
detected as either an RCU stall or a hung task, producing 2 different
bug reports (which is bad).
One can say that it's only a matter of tuning timeouts, but at least
the hung task detector has a problem: if you set the timeout to X, it
can detect a hang anywhere between X and 2*X. And on one hand we need
quite a large timeout (a minute may not be enough), and on the other
hand we can't wait for an hour just to make sure that the machine is
indeed dead (these things happen every few minutes).
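
The X to 2*X window can be illustrated with a simplified model of a
periodic hung task check. This is a sketch under stated assumptions, not
the kernel code: the checker is assumed to wake once per timeout period
and to report a task only if its context-switch count has not changed
since the previous check. The names task_model, check_task and
detection_latency are hypothetical.

```c
#include <stdbool.h>

/* Simplified per-task state; field names are illustrative. */
struct task_model {
	unsigned long switch_count;      /* context switches so far */
	unsigned long last_switch_count; /* value seen at the previous check */
};

/* One periodic check: returns true if the task would be reported. */
static bool check_task(struct task_model *t)
{
	if (t->switch_count != t->last_switch_count) {
		/* The task ran since the last check; just remember the count. */
		t->last_switch_count = t->switch_count;
		return false;
	}
	return true;
}

/*
 * The task context-switches once per second up to and including second
 * 'block_at', then blocks forever. The check runs at the end of every
 * second that is a multiple of 'period' (period must be non-zero).
 * Returns how long the task had been blocked when it was reported.
 */
static unsigned long detection_latency(unsigned long block_at,
				       unsigned long period)
{
	struct task_model t = { 0, 0 };
	unsigned long s;

	for (s = 1; ; s++) {
		if (s <= block_at)
			t.switch_count++;
		if (s % period == 0 && check_task(&t))
			return s - block_at;
	}
}
```

With period = 120, a task that blocks at second 120 (just before a
check) is reported after being blocked for 120 seconds, while one that
blocks at second 121 (just after a check) is reported only after 239
seconds, so the same timeout covers almost the whole range from X to
2*X.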





>> >   tracepoint_synchronize_unregister include/linux/tracepoint.h:80 [inline]
>> >   perf_trace_event_unreg.isra.2+0xb7/0x1f0
>> > kernel/trace/trace_event_perf.c:161
>> >   perf_trace_destroy+0xbc/0x100 kernel/trace/trace_event_perf.c:236
>> >   tp_perf_event_destroy+0x15/0x20 kernel/events/core.c:7976
>> >   _free_event+0x3bd/0x10f0 kernel/events/core.c:4121
>> >   put_event+0x24/0x30 kernel/events/core.c:4204
>> >   perf_event_release_kernel+0x6e8/0xfc0 kernel/events/core.c:4310
>> >   perf_release+0x37/0x50 kernel/events/core.c:4320
>> >   __fput+0x327/0x7e0 fs/file_table.c:209
>> >   fput+0x15/0x20 fs/file_table.c:243
>> >   task_work_run+0x199/0x270 kernel/task_work.c:113
>> >   exit_task_work include/linux/task_work.h:22 [inline]
>> >   do_exit+0x9bb/0x1ad0 kernel/exit.c:865
>> >   do_group_exit+0x149/0x400 kernel/exit.c:968
>> >   get_signal+0x73a/0x16d0 kernel/signal.c:2469
>> >   do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
>> >   exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
>> >   prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
>> >   syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
>> >   do_syscall_64+0x6ec/0x940 arch/x86/entry/common.c:292
>> >   

Re: INFO: task hung in perf_trace_event_unreg

2018-04-02 Thread Paul E. McKenney
On Mon, Apr 02, 2018 at 09:40:40AM -0400, Steven Rostedt wrote:
> On Mon, 02 Apr 2018 02:20:02 -0700
> syzbot wrote:
> 
> > Hello,
> > 
> > syzbot hit the following crash on upstream commit
> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +)
> > Linux 4.16
> > syzbot dashboard link:  
> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> > 
> > Unfortunately, I don't have any reproducer for this crash yet.
> > Raw console output:  
> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> > Kernel config:  
> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> > compiler: gcc (GCC) 7.1.1 20170620
> > 
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
> > It will help syzbot understand when the bug is fixed. See footer for  
> > details.
> > If you forward the report, please keep this part and the footer.
> > 
> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount  
> > option "g�;e�K�׫>pquota"

Might not hurt to look into the above, though perhaps this is just syzkaller
playing around with mount options.

> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
> >Not tainted 4.16.0+ #10
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > syz-executor3   D20944 10803   4492 0x8002
> > Call Trace:
> >   context_switch kernel/sched/core.c:2862 [inline]
> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
> >   __wait_for_common kernel/sched/completion.c:107 [inline]
> >   wait_for_common kernel/sched/completion.c:118 [inline]
> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> 
> I don't think this is a perf issue. Looks like something is preventing
> rcu_sched from completing. If there's a CPU that is running in kernel
> space and never scheduling, that can cause this issue. Or if RCU
> somehow missed a transition into idle or user space.

The RCU CPU stall warning below strongly supports this position ...

> -- Steve
> 
> >   tracepoint_synchronize_unregister include/linux/tracepoint.h:80 [inline]
> >   perf_trace_event_unreg.isra.2+0xb7/0x1f0  
> > kernel/trace/trace_event_perf.c:161
> >   perf_trace_destroy+0xbc/0x100 kernel/trace/trace_event_perf.c:236
> >   tp_perf_event_destroy+0x15/0x20 kernel/events/core.c:7976
> >   _free_event+0x3bd/0x10f0 kernel/events/core.c:4121
> >   put_event+0x24/0x30 kernel/events/core.c:4204
> >   perf_event_release_kernel+0x6e8/0xfc0 kernel/events/core.c:4310
> >   perf_release+0x37/0x50 kernel/events/core.c:4320
> >   __fput+0x327/0x7e0 fs/file_table.c:209
> >   fput+0x15/0x20 fs/file_table.c:243
> >   task_work_run+0x199/0x270 kernel/task_work.c:113
> >   exit_task_work include/linux/task_work.h:22 [inline]
> >   do_exit+0x9bb/0x1ad0 kernel/exit.c:865
> >   do_group_exit+0x149/0x400 kernel/exit.c:968
> >   get_signal+0x73a/0x16d0 kernel/signal.c:2469
> >   do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
> >   exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
> >   prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
> >   syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
> >   do_syscall_64+0x6ec/0x940 arch/x86/entry/common.c:292
> >   entry_SYSCALL_64_after_hwframe+0x42/0xb7
> > RIP: 0033:0x455269
> > RSP: 002b:7f8976371ce8 EFLAGS: 0246 ORIG_RAX: 00ca
> > RAX:  RBX: 0072bec8 RCX: 00455269
> > RDX:  RSI:  RDI: 0072bec8
> > RBP: 0072bec8 R08:  R09: 0072bea0
> > R10:  R11: 0246 R12: 
> > R13: 7ffe793f79cf R14: 7f89763729c0 R15: 
> > 
> > Showing all locks held in the system:
> > 2 locks held by khungtaskd/876:
> >   #0:  (rcu_read_lock){}, at: [<8f2bec4b>]  
> > check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
> >   #0:  (rcu_read_lock){}, at: [<8f2bec4b>] watchdog+0x1c5/0xd60
> > kernel/hung_task.c:249

... And two places to start looking are the two above rcu_read_lock() calls.
Especially given that khungtaskd shows up below.

> >   #1:  (tasklist_lock){.+.+}, at: [<06b3009f>]  
> > debug_show_all_locks+0xd3/0x3d0 kernel/locking/lockdep.c:4470
> > 2 locks held by getty/4414:
> >   #0:  (>ldisc_sem){}, at: []  
> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> >   #1:  (>atomic_read_lock){+.+.}, at: [<762a7320>]  
> > n_tty_read+0x2ef/0x1a40 

Re: INFO: task hung in perf_trace_event_unreg

2018-04-02 Thread Steven Rostedt
On Mon, 02 Apr 2018 02:20:02 -0700
syzbot wrote:

> Hello,
> 
> syzbot hit the following crash on upstream commit
> 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +)
> Linux 4.16
> syzbot dashboard link:  
> https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> 
> Unfortunately, I don't have any reproducer for this crash yet.
> Raw console output:  
> https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> Kernel config:  
> https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> compiler: gcc (GCC) 7.1.1 20170620
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for  
> details.
> If you forward the report, please keep this part and the footer.
> 
> REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount  
> option "g�;e�K�׫>pquota"
> INFO: task syz-executor3:10803 blocked for more than 120 seconds.
>Not tainted 4.16.0+ #10
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor3   D20944 10803   4492 0x8002
> Call Trace:
>   context_switch kernel/sched/core.c:2862 [inline]
>   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>   schedule+0xf5/0x430 kernel/sched/core.c:3499
>   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
>   do_wait_for_common kernel/sched/completion.c:86 [inline]
>   __wait_for_common kernel/sched/completion.c:107 [inline]
>   wait_for_common kernel/sched/completion.c:118 [inline]
>   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
>   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
>   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213

I don't think this is a perf issue. Looks like something is preventing
rcu_sched from completing. If there's a CPU that is running in kernel
space and never scheduling, that can cause this issue. Or if RCU
somehow missed a transition into idle or user space.

-- Steve
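
The two failure modes described above can be shown with a toy Python model
of an RCU-sched grace period. It is only a sketch: it assumes a grace period
ends once every CPU has been seen in a quiescent state (a context switch,
idle, or user-mode execution) since the grace period began. The event names
and function names are hypothetical.

```python
# Events that count as quiescent states in this toy model.
QUIESCENT = {"context_switch", "idle", "user"}


def stalled_cpus(cpu_events):
    """Return the CPUs with no recorded quiescent state, sorted by id.

    cpu_events maps a CPU id to the list of events recorded for that CPU
    since the grace period began.
    """
    return sorted(
        cpu
        for cpu, events in cpu_events.items()
        if not any(event in QUIESCENT for event in events)
    )


def grace_period_completes(cpu_events):
    """The grace period ends only when no CPU is holding it up."""
    return not stalled_cpus(cpu_events)


# CPU 1 loops in the kernel without ever scheduling, so the grace period
# cannot end and synchronize_sched() callers stay blocked.
events = {0: ["context_switch"], 1: ["kernel", "kernel", "kernel"]}
print(stalled_cpus(events), grace_period_completes(events))
```

The second case (a missed transition into idle or user space) looks the same
to this model: the transition happened, but no quiescent event was recorded
for that CPU, so the CPU still appears to be holding up the grace period.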

>   tracepoint_synchronize_unregister include/linux/tracepoint.h:80 [inline]
>   perf_trace_event_unreg.isra.2+0xb7/0x1f0  
> kernel/trace/trace_event_perf.c:161
>   perf_trace_destroy+0xbc/0x100 kernel/trace/trace_event_perf.c:236
>   tp_perf_event_destroy+0x15/0x20 kernel/events/core.c:7976
>   _free_event+0x3bd/0x10f0 kernel/events/core.c:4121
>   put_event+0x24/0x30 kernel/events/core.c:4204
>   perf_event_release_kernel+0x6e8/0xfc0 kernel/events/core.c:4310
>   perf_release+0x37/0x50 kernel/events/core.c:4320
>   __fput+0x327/0x7e0 fs/file_table.c:209
>   fput+0x15/0x20 fs/file_table.c:243
>   task_work_run+0x199/0x270 kernel/task_work.c:113
>   exit_task_work include/linux/task_work.h:22 [inline]
>   do_exit+0x9bb/0x1ad0 kernel/exit.c:865
>   do_group_exit+0x149/0x400 kernel/exit.c:968
>   get_signal+0x73a/0x16d0 kernel/signal.c:2469
>   do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
>   exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
>   prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
>   syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
>   do_syscall_64+0x6ec/0x940 arch/x86/entry/common.c:292
>   entry_SYSCALL_64_after_hwframe+0x42/0xb7
> RIP: 0033:0x455269
> RSP: 002b:7f8976371ce8 EFLAGS: 0246 ORIG_RAX: 00ca
> RAX:  RBX: 0072bec8 RCX: 00455269
> RDX:  RSI:  RDI: 0072bec8
> RBP: 0072bec8 R08:  R09: 0072bea0
> R10:  R11: 0246 R12: 
> R13: 7ffe793f79cf R14: 7f89763729c0 R15: 
> 
> Showing all locks held in the system:
> 2 locks held by khungtaskd/876:
>   #0:  (rcu_read_lock){}, at: [<8f2bec4b>]  
> check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
>   #0:  (rcu_read_lock){}, at: [<8f2bec4b>] watchdog+0x1c5/0xd60  
> kernel/hung_task.c:249
>   #1:  (tasklist_lock){.+.+}, at: [<06b3009f>]  
> debug_show_all_locks+0xd3/0x3d0 kernel/locking/lockdep.c:4470
> 2 locks held by getty/4414:
>   #0:  (>ldisc_sem){}, at: []  
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>   #1:  (>atomic_read_lock){+.+.}, at: [<762a7320>]  
> n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4415:
>   #0:  (>ldisc_sem){}, at: []  
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>   #1:  (>atomic_read_lock){+.+.}, at: [<762a7320>]  
> n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4416:
>   #0:  (>ldisc_sem){}, at: []  
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>   #1:  (>atomic_read_lock){+.+.}, at: [<762a7320>]  
> n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4417:
> 

INFO: task hung in perf_trace_event_unreg

2018-04-02 Thread syzbot

Hello,

syzbot hit the following crash on upstream commit
0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +)
Linux 4.16
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd


Unfortunately, I don't have any reproducer for this crash yet.
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
Kernel config:  
https://syzkaller.appspot.com/x/.config?id=-2374466361298166459

compiler: gcc (GCC) 7.1.1 20170620

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for  
details.

If you forward the report, please keep this part and the footer.

REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount  
option "g�;e�K�׫>pquota"

INFO: task syz-executor3:10803 blocked for more than 120 seconds.
  Not tainted 4.16.0+ #10
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor3   D20944 10803   4492 0x8002
Call Trace:
 context_switch kernel/sched/core.c:2862 [inline]
 __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
 schedule+0xf5/0x430 kernel/sched/core.c:3499
 schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
 do_wait_for_common kernel/sched/completion.c:86 [inline]
 __wait_for_common kernel/sched/completion.c:107 [inline]
 wait_for_common kernel/sched/completion.c:118 [inline]
 wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
 __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
 synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
 synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
 tracepoint_synchronize_unregister include/linux/tracepoint.h:80 [inline]
 perf_trace_event_unreg.isra.2+0xb7/0x1f0  
kernel/trace/trace_event_perf.c:161

 perf_trace_destroy+0xbc/0x100 kernel/trace/trace_event_perf.c:236
 tp_perf_event_destroy+0x15/0x20 kernel/events/core.c:7976
 _free_event+0x3bd/0x10f0 kernel/events/core.c:4121
 put_event+0x24/0x30 kernel/events/core.c:4204
 perf_event_release_kernel+0x6e8/0xfc0 kernel/events/core.c:4310
 perf_release+0x37/0x50 kernel/events/core.c:4320
 __fput+0x327/0x7e0 fs/file_table.c:209
 fput+0x15/0x20 fs/file_table.c:243
 task_work_run+0x199/0x270 kernel/task_work.c:113
 exit_task_work include/linux/task_work.h:22 [inline]
 do_exit+0x9bb/0x1ad0 kernel/exit.c:865
 do_group_exit+0x149/0x400 kernel/exit.c:968
 get_signal+0x73a/0x16d0 kernel/signal.c:2469
 do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
 exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
 do_syscall_64+0x6ec/0x940 arch/x86/entry/common.c:292
 entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x455269
RSP: 002b:7f8976371ce8 EFLAGS: 0246 ORIG_RAX: 00ca
RAX:  RBX: 0072bec8 RCX: 00455269
RDX:  RSI:  RDI: 0072bec8
RBP: 0072bec8 R08:  R09: 0072bea0
R10:  R11: 0246 R12: 
R13: 7ffe793f79cf R14: 7f89763729c0 R15: 

Showing all locks held in the system:
2 locks held by khungtaskd/876:
 #0:  (rcu_read_lock){}, at: [<8f2bec4b>]  
check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
 #0:  (rcu_read_lock){}, at: [<8f2bec4b>] watchdog+0x1c5/0xd60  
kernel/hung_task.c:249
 #1:  (tasklist_lock){.+.+}, at: [<06b3009f>]  
debug_show_all_locks+0xd3/0x3d0 kernel/locking/lockdep.c:4470

2 locks held by getty/4414:
 #0:  (>ldisc_sem){}, at: []  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
 #1:  (>atomic_read_lock){+.+.}, at: [<762a7320>]  
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131

2 locks held by getty/4415:
 #0:  (>ldisc_sem){}, at: []  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
 #1:  (>atomic_read_lock){+.+.}, at: [<762a7320>]  
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131

2 locks held by getty/4416:
 #0:  (>ldisc_sem){}, at: []  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
 #1:  (>atomic_read_lock){+.+.}, at: [<762a7320>]  
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131

2 locks held by getty/4417:
 #0:  (>ldisc_sem){}, at: []  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
 #1:  (>atomic_read_lock){+.+.}, at: [<762a7320>]  
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131

2 locks held by getty/4418:
 #0:  (>ldisc_sem){}, at: []  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
 #1:  (>atomic_read_lock){+.+.}, at: [<762a7320>]  
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131

2 locks held by getty/4419:
 #0:  (>ldisc_sem){}, at: []  
