Re: INFO: task hung in perf_trace_event_unreg

2018-04-12 Thread Paul E. McKenney
On Thu, Apr 12, 2018 at 11:39:42AM +0200, Dmitry Vyukov wrote:
> On Wed, Apr 11, 2018 at 9:36 PM, Paul E. McKenney
>  wrote:
> >> >> >> >>  wrote:
> >> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> >> > Hello,
> >> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> >> > syzbot hit the following crash on upstream commit
> >> >> >> >> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 
> >> >> >> >> >> >> >> >> > 21:20:27 2018 +)
> >> >> >> >> >> >> >> >> > Linux 4.16
> >> >> >> >> >> >> >> >> > syzbot dashboard link:
> >> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> >> > Unfortunately, I don't have any reproducer for this 
> >> >> >> >> >> >> >> >> > crash yet.
> >> >> >> >> >> >> >> >> > Raw console output:
> >> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> >> >> >> >> >> >> >> >> > Kernel config:
> >> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> >> >> >> >> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
> >> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> >> > IMPORTANT: if you fix the bug, please add the 
> >> >> >> >> >> >> >> >> > following tag to the commit:
> >> >> >> >> >> >> >> >> > Reported-by: 
> >> >> >> >> >> >> >> >> > syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
> >> >> >> >> >> >> >> >> > It will help syzbot understand when the bug is 
> >> >> >> >> >> >> >> >> > fixed. See footer for
> >> >> >> >> >> >> >> >> > details.
> >> >> >> >> >> >> >> >> > If you forward the report, please keep this part 
> >> >> >> >> >> >> >> >> > and the footer.
> >> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> >> > REISERFS warning (device loop4): super-6502 
> >> >> >> >> >> >> >> >> > reiserfs_getopt: unknown mount
> >> >> >> >> >> >> >> >> > option "g �;e�K�׫>pquota"
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> > Might not hurt to look into the above, though perhaps 
> >> >> >> >> >> >> >> > this is just syzkaller
> >> >> >> >> >> >> >> > playing around with mount options.
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more 
> >> >> >> >> >> >> >> >> > than 120 seconds.
> >> >> >> >> >> >> >> >> >Not tainted 4.16.0+ #10
> >> >> >> >> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
> >> >> >> >> >> >> >> >> > disables this message.
> >> >> >> >> >> >> >> >> > syz-executor3   D20944 10803   4492 0x8002
> >> >> >> >> >> >> >> >> > Call Trace:
> >> >> >> >> >> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
> >> >> >> >> >> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >> >> >> >> >> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
> >> >> >> >> >> >> >> >> >   schedule_timeout+0x1a3/0x230 
> >> >> >> >> >> >> >> >> > kernel/time/timer.c:1777
> >> >> >> >> >> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 
> >> >> >> >> >> >> >> >> > [inline]
> >> >> >> >> >> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 
> >> >> >> >> >> >> >> >> > [inline]
> >> >> >> >> >> >> >> >> >   wait_for_common kernel/sched/completion.c:118 
> >> >> >> >> >> >> >> >> > [inline]
> >> >> >> >> >> >> >> >> >   wait_for_completion+0x415/0x770 
> >> >> >> >> >> >> >> >> > kernel/sched/completion.c:139
> >> >> >> >> >> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >> >> >> >> >> >> >> >> >   synchronize_sched.part.64+0xac/0x100 
> >> >> >> >> >> >> >> >> > kernel/rcu/tree.c:3212
> >> >> >> >> >> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> >> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> >> I don't think this is a perf issue. Looks like 
> >> >> >> >> >> >> >> >> something is preventing
> >> >> >> >> >> >> >> >> rcu_sched from completing. If there's a CPU that is 
> >> >> >> >> >> >> >> >> running in kernel
> >> >> >> >> >> >> >> >> space and never scheduling, that can cause this 
> >> >> >> >> >> >> >> >> issue. Or if RCU
> >> >> >> >> >> >> >> >> somehow missed a transition into idle or user space.
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> > The RCU CPU stall warning below strongly supports this 
> >> >> >> >> >> >> >> > position ...
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> I think this is this guy then:
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > Seems likely to me!
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> Looking retrospectively at the various hang/stall bugs 
> >> >> >> >> >> >> >> that we have, I
> >> >> >> >> >> >> >> think we need some kind of priority between them. I.e. 
> >> >> >> >> >> >> >> we have rcu
> >> >> >> >> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, 
> >> >> >> >> >> >> >> silent machine
> 

Re: INFO: task hung in perf_trace_event_unreg

2018-04-12 Thread Dmitry Vyukov
On Wed, Apr 11, 2018 at 9:36 PM, Paul E. McKenney
 wrote:
>> >> >> >>  wrote:
>> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> > Hello,
>> >> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >> > syzbot hit the following crash on upstream commit
>> >> >> >> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 
>> >> >> >> >> >> >> >> > 21:20:27 2018 +)
>> >> >> >> >> >> >> >> > Linux 4.16
>> >> >> >> >> >> >> >> > syzbot dashboard link:
>> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
>> >> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >> > Unfortunately, I don't have any reproducer for this 
>> >> >> >> >> >> >> >> > crash yet.
>> >> >> >> >> >> >> >> > Raw console output:
>> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
>> >> >> >> >> >> >> >> > Kernel config:
>> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
>> >> >> >> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
>> >> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >> > IMPORTANT: if you fix the bug, please add the 
>> >> >> >> >> >> >> >> > following tag to the commit:
>> >> >> >> >> >> >> >> > Reported-by: 
>> >> >> >> >> >> >> >> > syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
>> >> >> >> >> >> >> >> > It will help syzbot understand when the bug is fixed. 
>> >> >> >> >> >> >> >> > See footer for
>> >> >> >> >> >> >> >> > details.
>> >> >> >> >> >> >> >> > If you forward the report, please keep this part and 
>> >> >> >> >> >> >> >> > the footer.
>> >> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >> > REISERFS warning (device loop4): super-6502 
>> >> >> >> >> >> >> >> > reiserfs_getopt: unknown mount
>> >> >> >> >> >> >> >> > option "g �;e�K�׫>pquota"
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > Might not hurt to look into the above, though perhaps 
>> >> >> >> >> >> >> > this is just syzkaller
>> >> >> >> >> >> >> > playing around with mount options.
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 
>> >> >> >> >> >> >> >> > 120 seconds.
>> >> >> >> >> >> >> >> >Not tainted 4.16.0+ #10
>> >> >> >> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
>> >> >> >> >> >> >> >> > disables this message.
>> >> >> >> >> >> >> >> > syz-executor3   D20944 10803   4492 0x8002
>> >> >> >> >> >> >> >> > Call Trace:
>> >> >> >> >> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
>> >> >> >> >> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>> >> >> >> >> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
>> >> >> >> >> >> >> >> >   schedule_timeout+0x1a3/0x230 
>> >> >> >> >> >> >> >> > kernel/time/timer.c:1777
>> >> >> >> >> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 
>> >> >> >> >> >> >> >> > [inline]
>> >> >> >> >> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 
>> >> >> >> >> >> >> >> > [inline]
>> >> >> >> >> >> >> >> >   wait_for_common kernel/sched/completion.c:118 
>> >> >> >> >> >> >> >> > [inline]
>> >> >> >> >> >> >> >> >   wait_for_completion+0x415/0x770 
>> >> >> >> >> >> >> >> > kernel/sched/completion.c:139
>> >> >> >> >> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>> >> >> >> >> >> >> >> >   synchronize_sched.part.64+0xac/0x100 
>> >> >> >> >> >> >> >> > kernel/rcu/tree.c:3212
>> >> >> >> >> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
>> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> I don't think this is a perf issue. Looks like 
>> >> >> >> >> >> >> >> something is preventing
>> >> >> >> >> >> >> >> rcu_sched from completing. If there's a CPU that is 
>> >> >> >> >> >> >> >> running in kernel
>> >> >> >> >> >> >> >> space and never scheduling, that can cause this issue. 
>> >> >> >> >> >> >> >> Or if RCU
>> >> >> >> >> >> >> >> somehow missed a transition into idle or user space.
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > The RCU CPU stall warning below strongly supports this 
>> >> >> >> >> >> >> > position ...
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> I think this is this guy then:
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
>> >> >> >> >> >> >
>> >> >> >> >> >> > Seems likely to me!
>> >> >> >> >> >> >
>> >> >> >> >> >> >> Looking retrospectively at the various hang/stall bugs 
>> >> >> >> >> >> >> that we have, I
>> >> >> >> >> >> >> think we need some kind of priority between them. I.e. we 
>> >> >> >> >> >> >> have rcu
>> >> >> >> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, 
>> >> >> >> >> >> >> silent machine
>> >> >> >> >> >> >> hang and maybe something else. It would be useful if they 
>> >> >> >> >> >> >> fire
>> >> >> >> >> >> >> deterministically according to priorities. If there is an 
>> >> >> >> >> >> >> rcu stall,
>> >> >> >> >> >> >> 

Re: INFO: task hung in perf_trace_event_unreg

2018-04-11 Thread Paul E. McKenney
On Wed, Apr 11, 2018 at 12:06:27PM +0200, Dmitry Vyukov wrote:
> On Tue, Apr 10, 2018 at 7:02 PM, Paul E. McKenney
>  wrote:
> >> >> >> On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney
> >> >> >>  wrote:
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> > Hello,
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> > syzbot hit the following crash on upstream commit
> >> >> >> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 
> >> >> >> >> >> >> >> > 21:20:27 2018 +)
> >> >> >> >> >> >> >> > Linux 4.16
> >> >> >> >> >> >> >> > syzbot dashboard link:
> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> > Unfortunately, I don't have any reproducer for this 
> >> >> >> >> >> >> >> > crash yet.
> >> >> >> >> >> >> >> > Raw console output:
> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> >> >> >> >> >> >> >> > Kernel config:
> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> >> >> >> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> > IMPORTANT: if you fix the bug, please add the 
> >> >> >> >> >> >> >> > following tag to the commit:
> >> >> >> >> >> >> >> > Reported-by: 
> >> >> >> >> >> >> >> > syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
> >> >> >> >> >> >> >> > It will help syzbot understand when the bug is fixed. 
> >> >> >> >> >> >> >> > See footer for
> >> >> >> >> >> >> >> > details.
> >> >> >> >> >> >> >> > If you forward the report, please keep this part and 
> >> >> >> >> >> >> >> > the footer.
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> > REISERFS warning (device loop4): super-6502 
> >> >> >> >> >> >> >> > reiserfs_getopt: unknown mount
> >> >> >> >> >> >> >> > option "g �;e�K�׫>pquota"
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > Might not hurt to look into the above, though perhaps 
> >> >> >> >> >> >> > this is just syzkaller
> >> >> >> >> >> >> > playing around with mount options.
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 
> >> >> >> >> >> >> >> > 120 seconds.
> >> >> >> >> >> >> >> >Not tainted 4.16.0+ #10
> >> >> >> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
> >> >> >> >> >> >> >> > disables this message.
> >> >> >> >> >> >> >> > syz-executor3   D20944 10803   4492 0x8002
> >> >> >> >> >> >> >> > Call Trace:
> >> >> >> >> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
> >> >> >> >> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >> >> >> >> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
> >> >> >> >> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> >> >> >> >> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 
> >> >> >> >> >> >> >> > [inline]
> >> >> >> >> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 
> >> >> >> >> >> >> >> > [inline]
> >> >> >> >> >> >> >> >   wait_for_common kernel/sched/completion.c:118 
> >> >> >> >> >> >> >> > [inline]
> >> >> >> >> >> >> >> >   wait_for_completion+0x415/0x770 
> >> >> >> >> >> >> >> > kernel/sched/completion.c:139
> >> >> >> >> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >> >> >> >> >> >> >> >   synchronize_sched.part.64+0xac/0x100 
> >> >> >> >> >> >> >> > kernel/rcu/tree.c:3212
> >> >> >> >> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> I don't think this is a perf issue. Looks like something 
> >> >> >> >> >> >> >> is preventing
> >> >> >> >> >> >> >> rcu_sched from completing. If there's a CPU that is 
> >> >> >> >> >> >> >> running in kernel
> >> >> >> >> >> >> >> space and never scheduling, that can cause this issue. 
> >> >> >> >> >> >> >> Or if RCU
> >> >> >> >> >> >> >> somehow missed a transition into idle or user space.
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > The RCU CPU stall warning below strongly supports this 
> >> >> >> >> >> >> > position ...
> >> >> >> >> >> >>
> >> >> >> >> >> >> I think this is this guy then:
> >> >> >> >> >> >>
> >> >> >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> >> >> >> >> >> >>
> >> >> >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
> >> >> >> >> >> >
> >> >> >> >> >> > Seems likely to me!
> >> >> >> >> >> >
> >> >> >> >> >> >> Looking retrospectively at the various hang/stall bugs that 
> >> >> >> >> >> >> we have, I
> >> >> >> >> >> >> think we need some kind of priority between them. I.e. we 
> >> >> >> >> >> >> have rcu
> >> >> >> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, 
> >> >> >> >> >> >> silent machine
> >> >> >> >> >> >> hang and maybe something else. It would be useful if they 
> >> >> >> >> >> >> fire
> >> >> >> >> >> >> deterministically according to priorities. If there is an 
> >> >> >> >> >> >> rcu stall,
> >> >> >> >> 

Re: INFO: task hung in perf_trace_event_unreg

2018-04-11 Thread Dmitry Vyukov
On Tue, Apr 10, 2018 at 7:02 PM, Paul E. McKenney
 wrote:
>> >> >> On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney
>> >> >>  wrote:
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> > Hello,
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > syzbot hit the following crash on upstream commit
>> >> >> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 
>> >> >> >> >> >> >> > 21:20:27 2018 +0000)
>> >> >> >> >> >> >> > Linux 4.16
>> >> >> >> >> >> >> > syzbot dashboard link:
>> >> >> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > Unfortunately, I don't have any reproducer for this 
>> >> >> >> >> >> >> > crash yet.
>> >> >> >> >> >> >> > Raw console output:
>> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
>> >> >> >> >> >> >> > Kernel config:
>> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
>> >> >> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > IMPORTANT: if you fix the bug, please add the following 
>> >> >> >> >> >> >> > tag to the commit:
>> >> >> >> >> >> >> > Reported-by: 
>> >> >> >> >> >> >> > syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
>> >> >> >> >> >> >> > It will help syzbot understand when the bug is fixed. 
>> >> >> >> >> >> >> > See footer for
>> >> >> >> >> >> >> > details.
>> >> >> >> >> >> >> > If you forward the report, please keep this part and the 
>> >> >> >> >> >> >> > footer.
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > REISERFS warning (device loop4): super-6502 
>> >> >> >> >> >> >> > reiserfs_getopt: unknown mount
>> >> >> >> >> >> >> > option "g �;e�K�׫>pquota"
>> >> >> >> >> >> >
>> >> >> >> >> >> > Might not hurt to look into the above, though perhaps this 
>> >> >> >> >> >> > is just syzkaller
>> >> >> >> >> >> > playing around with mount options.
>> >> >> >> >> >> >
>> >> >> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 
>> >> >> >> >> >> >> > seconds.
>> >> >> >> >> >> >> > Not tainted 4.16.0+ #10
>> >> >> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
>> >> >> >> >> >> >> > disables this message.
>> >> >> >> >> >> >> > syz-executor3   D20944 10803   4492 0x8002
>> >> >> >> >> >> >> > Call Trace:
>> >> >> >> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
>> >> >> >> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>> >> >> >> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
>> >> >> >> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
>> >> >> >> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 
>> >> >> >> >> >> >> > [inline]
>> >> >> >> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 
>> >> >> >> >> >> >> > [inline]
>> >> >> >> >> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
>> >> >> >> >> >> >> >   wait_for_completion+0x415/0x770 
>> >> >> >> >> >> >> > kernel/sched/completion.c:139
>> >> >> >> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>> >> >> >> >> >> >> >   synchronize_sched.part.64+0xac/0x100 
>> >> >> >> >> >> >> > kernel/rcu/tree.c:3212
>> >> >> >> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> I don't think this is a perf issue. Looks like something 
>> >> >> >> >> >> >> is preventing
>> >> >> >> >> >> >> rcu_sched from completing. If there's a CPU that is 
>> >> >> >> >> >> >> running in kernel
>> >> >> >> >> >> >> space and never scheduling, that can cause this issue. Or 
>> >> >> >> >> >> >> if RCU
>> >> >> >> >> >> >> somehow missed a transition into idle or user space.
>> >> >> >> >> >> >
>> >> >> >> >> >> > The RCU CPU stall warning below strongly supports this 
>> >> >> >> >> >> > position ...
>> >> >> >> >> >>
>> >> >> >> >> >> I think this is this guy then:
>> >> >> >> >> >>
>> >> >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
>> >> >> >> >> >>
>> >> >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
>> >> >> >> >> >
>> >> >> >> >> > Seems likely to me!
>> >> >> >> >> >
>> >> >> >> >> >> Looking retrospectively at the various hang/stall bugs that 
>> >> >> >> >> >> we have, I
>> >> >> >> >> >> think we need some kind of priority between them. I.e. we 
>> >> >> >> >> >> have rcu
>> >> >> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent 
>> >> >> >> >> >> machine
>> >> >> >> >> >> hang and maybe something else. It would be useful if they fire
>> >> >> >> >> >> deterministically according to priorities. If there is an rcu 
>> >> >> >> >> >> stall,
>> >> >> >> >> >> that's always detected as CPU stall. Then if there is no RCU 
>> >> >> >> >> >> stall,
>> >> >> >> >> >> but a workqueue stall, then that's always detected as 
>> >> >> >> >> >> workqueue stall,
>> >> >> >> >> >> etc.
>> >> >> >> >> >> Currently if we have an RCU stall (effectively CPU 

Re: INFO: task hung in perf_trace_event_unreg

2018-04-10 Thread Paul E. McKenney
On Tue, Apr 10, 2018 at 01:13:13PM +0200, Dmitry Vyukov wrote:
> On Mon, Apr 9, 2018 at 8:11 PM, Paul E. McKenney
>  wrote:
> > On Mon, Apr 09, 2018 at 06:28:16PM +0200, Dmitry Vyukov wrote:
> >> On Mon, Apr 9, 2018 at 6:20 PM, Paul E. McKenney
> >>  wrote:
> >> > On Mon, Apr 09, 2018 at 02:54:20PM +0200, Dmitry Vyukov wrote:
> >> >> On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney
> >> >>  wrote:
> >> >> >> >> >> >>
> >> >> >> >> >> >> > Hello,
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > syzbot hit the following crash on upstream commit
> >> >> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 
> >> >> >> >> >> >> > 21:20:27 2018 +0000)
> >> >> >> >> >> >> > Linux 4.16
> >> >> >> >> >> >> > syzbot dashboard link:
> >> >> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > Unfortunately, I don't have any reproducer for this crash 
> >> >> >> >> >> >> > yet.
> >> >> >> >> >> >> > Raw console output:
> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> >> >> >> >> >> >> > Kernel config:
> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> >> >> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > IMPORTANT: if you fix the bug, please add the following 
> >> >> >> >> >> >> > tag to the commit:
> >> >> >> >> >> >> > Reported-by: 
> >> >> >> >> >> >> > syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
> >> >> >> >> >> >> > It will help syzbot understand when the bug is fixed. See 
> >> >> >> >> >> >> > footer for
> >> >> >> >> >> >> > details.
> >> >> >> >> >> >> > If you forward the report, please keep this part and the 
> >> >> >> >> >> >> > footer.
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > REISERFS warning (device loop4): super-6502 
> >> >> >> >> >> >> > reiserfs_getopt: unknown mount
> >> >> >> >> >> >> > option "g �;e�K�׫>pquota"
> >> >> >> >> >> >
> >> >> >> >> >> > Might not hurt to look into the above, though perhaps this 
> >> >> >> >> >> > is just syzkaller
> >> >> >> >> >> > playing around with mount options.
> >> >> >> >> >> >
> >> >> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 
> >> >> >> >> >> >> > seconds.
> >> >> >> >> >> >> > Not tainted 4.16.0+ #10
> >> >> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
> >> >> >> >> >> >> > disables this message.
> >> >> >> >> >> >> > syz-executor3   D20944 10803   4492 0x8002
> >> >> >> >> >> >> > Call Trace:
> >> >> >> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
> >> >> >> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >> >> >> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
> >> >> >> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> >> >> >> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
> >> >> >> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
> >> >> >> >> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
> >> >> >> >> >> >> >   wait_for_completion+0x415/0x770 
> >> >> >> >> >> >> > kernel/sched/completion.c:139
> >> >> >> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >> >> >> >> >> >> >   synchronize_sched.part.64+0xac/0x100 
> >> >> >> >> >> >> > kernel/rcu/tree.c:3212
> >> >> >> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> >> >> >> >> >> >>
> >> >> >> >> >> >> I don't think this is a perf issue. Looks like something is 
> >> >> >> >> >> >> preventing
> >> >> >> >> >> >> rcu_sched from completing. If there's a CPU that is running 
> >> >> >> >> >> >> in kernel
> >> >> >> >> >> >> space and never scheduling, that can cause this issue. Or 
> >> >> >> >> >> >> if RCU
> >> >> >> >> >> >> somehow missed a transition into idle or user space.
> >> >> >> >> >> >
> >> >> >> >> >> > The RCU CPU stall warning below strongly supports this 
> >> >> >> >> >> > position ...
> >> >> >> >> >>
> >> >> >> >> >> I think this is this guy then:
> >> >> >> >> >>
> >> >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> >> >> >> >> >>
> >> >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
> >> >> >> >> >
> >> >> >> >> > Seems likely to me!
> >> >> >> >> >
> >> >> >> >> >> Looking retrospectively at the various hang/stall bugs that we 
> >> >> >> >> >> have, I
> >> >> >> >> >> think we need some kind of priority between them. I.e. we have 
> >> >> >> >> >> rcu
> >> >> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent 
> >> >> >> >> >> machine
> >> >> >> >> >> hang and maybe something else. It would be useful if they fire
> >> >> >> >> >> deterministically according to priorities. If there is an rcu 
> >> >> >> >> >> stall,
> >> >> >> >> >> that's always detected as CPU stall. Then if there is no RCU 
> >> >> >> >> >> stall,
> >> >> >> >> >> but a workqueue stall, then that's always detected 

Re: INFO: task hung in perf_trace_event_unreg

2018-04-10 Thread Dmitry Vyukov
On Mon, Apr 9, 2018 at 8:11 PM, Paul E. McKenney
 wrote:
> On Mon, Apr 09, 2018 at 06:28:16PM +0200, Dmitry Vyukov wrote:
>> On Mon, Apr 9, 2018 at 6:20 PM, Paul E. McKenney
>>  wrote:
>> > On Mon, Apr 09, 2018 at 02:54:20PM +0200, Dmitry Vyukov wrote:
>> >> On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney
>> >>  wrote:
>> >> >> >> >> >>
>> >> >> >> >> >> > Hello,
>> >> >> >> >> >> >
>> >> >> >> >> >> > syzbot hit the following crash on upstream commit
>> >> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 
>> >> >> >> >> >> > 21:20:27 2018 +0000)
>> >> >> >> >> >> > Linux 4.16
>> >> >> >> >> >> > syzbot dashboard link:
>> >> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
>> >> >> >> >> >> >
>> >> >> >> >> >> > Unfortunately, I don't have any reproducer for this crash 
>> >> >> >> >> >> > yet.
>> >> >> >> >> >> > Raw console output:
>> >> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
>> >> >> >> >> >> > Kernel config:
>> >> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
>> >> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
>> >> >> >> >> >> >
>> >> >> >> >> >> > IMPORTANT: if you fix the bug, please add the following tag 
>> >> >> >> >> >> > to the commit:
>> >> >> >> >> >> > Reported-by: 
>> >> >> >> >> >> > syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
>> >> >> >> >> >> > It will help syzbot understand when the bug is fixed. See 
>> >> >> >> >> >> > footer for
>> >> >> >> >> >> > details.
>> >> >> >> >> >> > If you forward the report, please keep this part and the 
>> >> >> >> >> >> > footer.
>> >> >> >> >> >> >
>> >> >> >> >> >> > REISERFS warning (device loop4): super-6502 
>> >> >> >> >> >> > reiserfs_getopt: unknown mount
>> >> >> >> >> >> > option "g �;e�K�׫>pquota"
>> >> >> >> >> >
>> >> >> >> >> > Might not hurt to look into the above, though perhaps this is 
>> >> >> >> >> > just syzkaller
>> >> >> >> >> > playing around with mount options.
>> >> >> >> >> >
>> >> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 
>> >> >> >> >> >> > seconds.
>> >> >> >> >> >> > Not tainted 4.16.0+ #10
>> >> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
>> >> >> >> >> >> > this message.
>> >> >> >> >> >> > syz-executor3   D20944 10803   4492 0x8002
>> >> >> >> >> >> > Call Trace:
>> >> >> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
>> >> >> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>> >> >> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
>> >> >> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
>> >> >> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
>> >> >> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
>> >> >> >> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
>> >> >> >> >> >> >   wait_for_completion+0x415/0x770 
>> >> >> >> >> >> > kernel/sched/completion.c:139
>> >> >> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>> >> >> >> >> >> >   synchronize_sched.part.64+0xac/0x100 
>> >> >> >> >> >> > kernel/rcu/tree.c:3212
>> >> >> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
>> >> >> >> >> >>
>> >> >> >> >> >> I don't think this is a perf issue. Looks like something is 
>> >> >> >> >> >> preventing
>> >> >> >> >> >> rcu_sched from completing. If there's a CPU that is running 
>> >> >> >> >> >> in kernel
>> >> >> >> >> >> space and never scheduling, that can cause this issue. Or if 
>> >> >> >> >> >> RCU
>> >> >> >> >> >> somehow missed a transition into idle or user space.
>> >> >> >> >> >
>> >> >> >> >> > The RCU CPU stall warning below strongly supports this 
>> >> >> >> >> > position ...
>> >> >> >> >>
>> >> >> >> >> I think this is this guy then:
>> >> >> >> >>
>> >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
>> >> >> >> >>
>> >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
>> >> >> >> >
>> >> >> >> > Seems likely to me!
>> >> >> >> >
>> >> >> >> >> Looking retrospectively at the various hang/stall bugs that we 
>> >> >> >> >> have, I
>> >> >> >> >> think we need some kind of priority between them. I.e. we have 
>> >> >> >> >> rcu
>> >> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent 
>> >> >> >> >> machine
>> >> >> >> >> hang and maybe something else. It would be useful if they fire
>> >> >> >> >> deterministically according to priorities. If there is an rcu 
>> >> >> >> >> stall,
>> >> >> >> >> that's always detected as CPU stall. Then if there is no RCU 
>> >> >> >> >> stall,
>> >> >> >> >> but a workqueue stall, then that's always detected as workqueue 
>> >> >> >> >> stall,
>> >> >> >> >> etc.
>> >> >> >> >> Currently if we have an RCU stall (effectively CPU stall), that 
>> >> >> >> >> can be
>> >> >> >> >> detected either RCU stall or a task hung, producing 2 different 
>> >> >> >> >> 

Re: INFO: task hung in perf_trace_event_unreg

2018-04-09 Thread Paul E. McKenney
On Mon, Apr 09, 2018 at 06:28:16PM +0200, Dmitry Vyukov wrote:
> On Mon, Apr 9, 2018 at 6:20 PM, Paul E. McKenney
>  wrote:
> > On Mon, Apr 09, 2018 at 02:54:20PM +0200, Dmitry Vyukov wrote:
> >> On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney
> >>  wrote:
> >> >> >> >> >>
> >> >> >> >> >> > Hello,
> >> >> >> >> >> >
> >> >> >> >> >> > syzbot hit the following crash on upstream commit
> >> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 
> >> >> >> >> >> > 2018 +0000)
> >> >> >> >> >> > Linux 4.16
> >> >> >> >> >> > syzbot dashboard link:
> >> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >> >> >> >> >> >
> >> >> >> >> >> > Unfortunately, I don't have any reproducer for this crash 
> >> >> >> >> >> > yet.
> >> >> >> >> >> > Raw console output:
> >> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> >> >> >> >> >> > Kernel config:
> >> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> >> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
> >> >> >> >> >> >
> >> >> >> >> >> > IMPORTANT: if you fix the bug, please add the following tag 
> >> >> >> >> >> > to the commit:
> >> >> >> >> >> > Reported-by: 
> >> >> >> >> >> > syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
> >> >> >> >> >> > It will help syzbot understand when the bug is fixed. See 
> >> >> >> >> >> > footer for
> >> >> >> >> >> > details.
> >> >> >> >> >> > If you forward the report, please keep this part and the 
> >> >> >> >> >> > footer.
> >> >> >> >> >> >
> >> >> >> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: 
> >> >> >> >> >> > unknown mount
> >> >> >> >> >> > option "g �;e�K�׫>pquota"
> >> >> >> >> >
> >> >> >> >> > Might not hurt to look into the above, though perhaps this is 
> >> >> >> >> > just syzkaller
> >> >> >> >> > playing around with mount options.
> >> >> >> >> >
> >> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 
> >> >> >> >> >> > seconds.
> >> >> >> >> >> > Not tainted 4.16.0+ #10
> >> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> >> >> >> >> >> > this message.
> >> >> >> >> >> > syz-executor3   D20944 10803   4492 0x8002
> >> >> >> >> >> > Call Trace:
> >> >> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
> >> >> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >> >> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
> >> >> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> >> >> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
> >> >> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
> >> >> >> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
> >> >> >> >> >> >   wait_for_completion+0x415/0x770 
> >> >> >> >> >> > kernel/sched/completion.c:139
> >> >> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >> >> >> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
> >> >> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> >> >> >> >> >>
> >> >> >> >> >> I don't think this is a perf issue. Looks like something is 
> >> >> >> >> >> preventing
> >> >> >> >> >> rcu_sched from completing. If there's a CPU that is running in 
> >> >> >> >> >> kernel
> >> >> >> >> >> space and never scheduling, that can cause this issue. Or if 
> >> >> >> >> >> RCU
> >> >> >> >> >> somehow missed a transition into idle or user space.
> >> >> >> >> >
> >> >> >> >> > The RCU CPU stall warning below strongly supports this position 
> >> >> >> >> > ...
> >> >> >> >>
> >> >> >> >> I think this is this guy then:
> >> >> >> >>
> >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> >> >> >> >>
> >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
> >> >> >> >
> >> >> >> > Seems likely to me!
> >> >> >> >
> >> >> >> >> Looking retrospectively at the various hang/stall bugs that we 
> >> >> >> >> have, I
> >> >> >> >> think we need some kind of priority between them. I.e. we have rcu
> >> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent 
> >> >> >> >> machine
> >> >> >> >> hang and maybe something else. It would be useful if they fire
> >> >> >> >> deterministically according to priorities. If there is an rcu 
> >> >> >> >> stall,
> >> >> >> >> that's always detected as CPU stall. Then if there is no RCU 
> >> >> >> >> stall,
> >> >> >> >> but a workqueue stall, then that's always detected as workqueue 
> >> >> >> >> stall,
> >> >> >> >> etc.
> >> >> >> >> Currently if we have an RCU stall (effectively CPU stall), that 
> >> >> >> >> can be
> >> >> >> >> detected either RCU stall or a task hung, producing 2 different 
> >> >> >> >> bug
> >> >> >> >> reports (which is bad).
> >> >> >> >> One can say that it's only a matter of tuning timeouts, but at 
> >> >> >> >> least

Re: INFO: task hung in perf_trace_event_unreg

2018-04-09 Thread Paul E. McKenney
On Mon, Apr 09, 2018 at 06:28:16PM +0200, Dmitry Vyukov wrote:
> On Mon, Apr 9, 2018 at 6:20 PM, Paul E. McKenney
>  wrote:
> > On Mon, Apr 09, 2018 at 02:54:20PM +0200, Dmitry Vyukov wrote:
> >> On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney
> >>  wrote:
> >> >> >> >> >>
> >> >> >> >> >> > Hello,
> >> >> >> >> >> >
> >> >> >> >> >> > syzbot hit the following crash on upstream commit
> >> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 
> >> >> >> >> >> > 2018 +)
> >> >> >> >> >> > Linux 4.16
> >> >> >> >> >> > syzbot dashboard link:
> >> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >> >> >> >> >> >
> >> >> >> >> >> > Unfortunately, I don't have any reproducer for this crash 
> >> >> >> >> >> > yet.
> >> >> >> >> >> > Raw console output:
> >> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> >> >> >> >> >> > Kernel config:
> >> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> >> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
> >> >> >> >> >> >
> >> >> >> >> >> > IMPORTANT: if you fix the bug, please add the following tag 
> >> >> >> >> >> > to the commit:
> >> >> >> >> >> > Reported-by: 
> >> >> >> >> >> > syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
> >> >> >> >> >> > It will help syzbot understand when the bug is fixed. See 
> >> >> >> >> >> > footer for
> >> >> >> >> >> > details.
> >> >> >> >> >> > If you forward the report, please keep this part and the 
> >> >> >> >> >> > footer.
> >> >> >> >> >> >
> >> >> >> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: 
> >> >> >> >> >> > unknown mount
> >> >> >> >> >> > option "g �;e�K�׫>pquota"
> >> >> >> >> >
> >> >> >> >> > Might not hurt to look into the above, though perhaps this is 
> >> >> >> >> > just syzkaller
> >> >> >> >> > playing around with mount options.
> >> >> >> >> >
> >> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
> >> >> >> >> >> >       Not tainted 4.16.0+ #10
> >> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> >> >> >> >> > syz-executor3   D20944 10803   4492 0x8002
> >> >> >> >> >> > Call Trace:
> >> >> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
> >> >> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >> >> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
> >> >> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> >> >> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
> >> >> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
> >> >> >> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
> >> >> >> >> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
> >> >> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >> >> >> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
> >> >> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> >> >> >> >> >>
> >> >> >> >> >> I don't think this is a perf issue. Looks like something is 
> >> >> >> >> >> preventing
> >> >> >> >> >> rcu_sched from completing. If there's a CPU that is running in 
> >> >> >> >> >> kernel
> >> >> >> >> >> space and never scheduling, that can cause this issue. Or if 
> >> >> >> >> >> RCU
> >> >> >> >> >> somehow missed a transition into idle or user space.
> >> >> >> >> >
> >> >> >> >> > The RCU CPU stall warning below strongly supports this position 
> >> >> >> >> > ...
> >> >> >> >>
> >> >> >> >> I think this is this guy then:
> >> >> >> >>
> >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> >> >> >> >>
> >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
> >> >> >> >
> >> >> >> > Seems likely to me!
> >> >> >> >
> >> >> >> >> Looking retrospectively at the various hang/stall bugs that we 
> >> >> >> >> have, I
> >> >> >> >> think we need some kind of priority between them. I.e. we have rcu
> >> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent 
> >> >> >> >> machine
> >> >> >> >> hang and maybe something else. It would be useful if they fire
> >> >> >> >> deterministically according to priorities. If there is an rcu 
> >> >> >> >> stall,
> >> >> >> >> that's always detected as CPU stall. Then if there is no RCU 
> >> >> >> >> stall,
> >> >> >> >> but a workqueue stall, then that's always detected as workqueue 
> >> >> >> >> stall,
> >> >> >> >> etc.
> >> >> >> >> Currently if we have an RCU stall (effectively CPU stall), that 
> >> >> >> >> can be
> >> >> >> >> detected either RCU stall or a task hung, producing 2 different 
> >> >> >> >> bug
> >> >> >> >> reports (which is bad).
> >> >> >> >> One can say that it's only a matter of tuning timeouts, but at 
> >> >> >> >> least
> >> >> >> >> task hung detector has a problem that if 

Re: INFO: task hung in perf_trace_event_unreg

2018-04-09 Thread Dmitry Vyukov
On Mon, Apr 9, 2018 at 6:20 PM, Paul E. McKenney
 wrote:
> On Mon, Apr 09, 2018 at 02:54:20PM +0200, Dmitry Vyukov wrote:
>> On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney
>>  wrote:
>> >> >> >> >>
>> >> >> >> >> > Hello,
>> >> >> >> >> >
>> >> >> >> >> > syzbot hit the following crash on upstream commit
>> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 
>> >> >> >> >> > 2018 +)
>> >> >> >> >> > Linux 4.16
>> >> >> >> >> > syzbot dashboard link:
>> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
>> >> >> >> >> >
>> >> >> >> >> > Unfortunately, I don't have any reproducer for this crash yet.
>> >> >> >> >> > Raw console output:
>> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
>> >> >> >> >> > Kernel config:
>> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
>> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
>> >> >> >> >> >
>> >> >> >> >> > IMPORTANT: if you fix the bug, please add the following tag to 
>> >> >> >> >> > the commit:
>> >> >> >> >> > Reported-by: 
>> >> >> >> >> > syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
>> >> >> >> >> > It will help syzbot understand when the bug is fixed. See 
>> >> >> >> >> > footer for
>> >> >> >> >> > details.
>> >> >> >> >> > If you forward the report, please keep this part and the 
>> >> >> >> >> > footer.
>> >> >> >> >> >
>> >> >> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: 
>> >> >> >> >> > unknown mount
>> >> >> >> >> > option "g �;e�K�׫>pquota"
>> >> >> >> >
>> >> >> >> > Might not hurt to look into the above, though perhaps this is 
>> >> >> >> > just syzkaller
>> >> >> >> > playing around with mount options.
>> >> >> >> >
>> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
>> >> >> >> >> >       Not tainted 4.16.0+ #10
>> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> >> >> >> >> > syz-executor3   D20944 10803   4492 0x8002
>> >> >> >> >> > Call Trace:
>> >> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
>> >> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>> >> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
>> >> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
>> >> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
>> >> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
>> >> >> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
>> >> >> >> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
>> >> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>> >> >> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
>> >> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
>> >> >> >> >>
>> >> >> >> >> I don't think this is a perf issue. Looks like something is 
>> >> >> >> >> preventing
>> >> >> >> >> rcu_sched from completing. If there's a CPU that is running in 
>> >> >> >> >> kernel
>> >> >> >> >> space and never scheduling, that can cause this issue. Or if RCU
>> >> >> >> >> somehow missed a transition into idle or user space.
>> >> >> >> >
>> >> >> >> > The RCU CPU stall warning below strongly supports this position 
>> >> >> >> > ...
>> >> >> >>
>> >> >> >> I think this is this guy then:
>> >> >> >>
>> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
>> >> >> >>
>> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
>> >> >> >
>> >> >> > Seems likely to me!
>> >> >> >
>> >> >> >> Looking retrospectively at the various hang/stall bugs that we 
>> >> >> >> have, I
>> >> >> >> think we need some kind of priority between them. I.e. we have rcu
>> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
>> >> >> >> hang and maybe something else. It would be useful if they fire
>> >> >> >> deterministically according to priorities. If there is an rcu stall,
>> >> >> >> that's always detected as CPU stall. Then if there is no RCU stall,
>> >> >> >> but a workqueue stall, then that's always detected as workqueue 
>> >> >> >> stall,
>> >> >> >> etc.
>> >> >> >> Currently if we have an RCU stall (effectively CPU stall), that can 
>> >> >> >> be
>> >> >> >> detected either RCU stall or a task hung, producing 2 different bug
>> >> >> >> reports (which is bad).
>> >> >> >> One can say that it's only a matter of tuning timeouts, but at least
>> >> >> >> task hung detector has a problem that if you set timeout to X, it 
>> >> >> >> can
>> >> >> >> detect hung anywhere between X and 2*X. And on one hand we need 
>> >> >> >> quite
>> >> >> >> large timeout (a minute may not be enough), and on the other hand we
>> >> >> >> can't wait for an hour just to make sure that the machine is indeed
>> >> >> >> dead (these things happen every few minutes).
>> >> >> >
>> >> >> > I 

Re: INFO: task hung in perf_trace_event_unreg

2018-04-09 Thread Paul E. McKenney
On Mon, Apr 09, 2018 at 02:54:20PM +0200, Dmitry Vyukov wrote:
> On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney
>  wrote:
> >> >> >> >>
> >> >> >> >> > Hello,
> >> >> >> >> >
> >> >> >> >> > syzbot hit the following crash on upstream commit
> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 
> >> >> >> >> > 2018 +)
> >> >> >> >> > Linux 4.16
> >> >> >> >> > syzbot dashboard link:
> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >> >> >> >> >
> >> >> >> >> > Unfortunately, I don't have any reproducer for this crash yet.
> >> >> >> >> > Raw console output:
> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> >> >> >> >> > Kernel config:
> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
> >> >> >> >> >
> >> >> >> >> > IMPORTANT: if you fix the bug, please add the following tag to 
> >> >> >> >> > the commit:
> >> >> >> >> > Reported-by: 
> >> >> >> >> > syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
> >> >> >> >> > It will help syzbot understand when the bug is fixed. See 
> >> >> >> >> > footer for
> >> >> >> >> > details.
> >> >> >> >> > If you forward the report, please keep this part and the footer.
> >> >> >> >> >
> >> >> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: 
> >> >> >> >> > unknown mount
> >> >> >> >> > option "g �;e�K�׫>pquota"
> >> >> >> >
> >> >> >> > Might not hurt to look into the above, though perhaps this is just 
> >> >> >> > syzkaller
> >> >> >> > playing around with mount options.
> >> >> >> >
> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
> >> >> >> >> >       Not tainted 4.16.0+ #10
> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> >> >> >> > syz-executor3   D20944 10803   4492 0x8002
> >> >> >> >> > Call Trace:
> >> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
> >> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
> >> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> >> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
> >> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
> >> >> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
> >> >> >> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
> >> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >> >> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
> >> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> >> >> >> >>
> >> >> >> >> I don't think this is a perf issue. Looks like something is 
> >> >> >> >> preventing
> >> >> >> >> rcu_sched from completing. If there's a CPU that is running in 
> >> >> >> >> kernel
> >> >> >> >> space and never scheduling, that can cause this issue. Or if RCU
> >> >> >> >> somehow missed a transition into idle or user space.
> >> >> >> >
> >> >> >> > The RCU CPU stall warning below strongly supports this position ...
> >> >> >>
> >> >> >> I think this is this guy then:
> >> >> >>
> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> >> >> >>
> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
> >> >> >
> >> >> > Seems likely to me!
> >> >> >
> >> >> >> Looking retrospectively at the various hang/stall bugs that we have, 
> >> >> >> I
> >> >> >> think we need some kind of priority between them. I.e. we have rcu
> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
> >> >> >> hang and maybe something else. It would be useful if they fire
> >> >> >> deterministically according to priorities. If there is an rcu stall,
> >> >> >> that's always detected as CPU stall. Then if there is no RCU stall,
> >> >> >> but a workqueue stall, then that's always detected as workqueue 
> >> >> >> stall,
> >> >> >> etc.
> >> >> >> Currently if we have an RCU stall (effectively CPU stall), that can 
> >> >> >> be
> >> >> >> detected either RCU stall or a task hung, producing 2 different bug
> >> >> >> reports (which is bad).
> >> >> >> One can say that it's only a matter of tuning timeouts, but at least
> >> >> >> task hung detector has a problem that if you set timeout to X, it can
> >> >> >> detect hung anywhere between X and 2*X. And on one hand we need quite
> >> >> >> large timeout (a minute may not be enough), and on the other hand we
> >> >> >> can't wait for an hour just to make sure that the machine is indeed
> >> >> >> dead (these things happen every few minutes).
> >> >> >
> >> >> > I suppose that we could have a global variable that was set to the
> >> >> > priority of the complaint in question, which would suppress all
> >> >> > lower-priority complaints.  Might need to be opt-in, though -- I 

Re: INFO: task hung in perf_trace_event_unreg

2018-04-09 Thread Dmitry Vyukov
On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney
 wrote:
>> >> >> >>
>> >> >> >> > Hello,
>> >> >> >> >
>> >> >> >> > syzbot hit the following crash on upstream commit
>> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 
>> >> >> >> > +)
>> >> >> >> > Linux 4.16
>> >> >> >> > syzbot dashboard link:
>> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
>> >> >> >> >
>> >> >> >> > Unfortunately, I don't have any reproducer for this crash yet.
>> >> >> >> > Raw console output:
>> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
>> >> >> >> > Kernel config:
>> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
>> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
>> >> >> >> >
>> >> >> >> > IMPORTANT: if you fix the bug, please add the following tag to 
>> >> >> >> > the commit:
>> >> >> >> > Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
>> >> >> >> > It will help syzbot understand when the bug is fixed. See footer 
>> >> >> >> > for
>> >> >> >> > details.
>> >> >> >> > If you forward the report, please keep this part and the footer.
>> >> >> >> >
>> >> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: 
>> >> >> >> > unknown mount
>> >> >> >> > option "g �;e�K�׫>pquota"
>> >> >> >
>> >> >> > Might not hurt to look into the above, though perhaps this is just
>> >> >> > syzkaller playing around with mount options.
>> >> >> >
>> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
>> >> >> >> >       Not tainted 4.16.0+ #10
>> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> >> >> >> > syz-executor3   D20944 10803   4492 0x8002
>> >> >> >> > Call Trace:
>> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
>> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
>> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
>> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
>> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
>> >> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
>> >> >> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
>> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>> >> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
>> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
>> >> >> >>
>> >> >> >> I don't think this is a perf issue. Looks like something is
>> >> >> >> preventing rcu_sched from completing. If there's a CPU that is
>> >> >> >> running in kernel space and never scheduling, that can cause this
>> >> >> >> issue. Or if RCU somehow missed a transition into idle or user space.
>> >> >> >
>> >> >> > The RCU CPU stall warning below strongly supports this position ...
>> >> >>
>> >> >> I think this is this guy then:
>> >> >>
>> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
>> >> >>
>> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
>> >> >
>> >> > Seems likely to me!
>> >> >
>> >> >> Looking retrospectively at the various hang/stall bugs that we have, I
>> >> >> think we need some kind of priority between them. I.e. we have rcu
>> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
>> >> >> hang and maybe something else. It would be useful if they fire
>> >> >> deterministically according to priorities. If there is an rcu stall,
>> >> >> that's always detected as CPU stall. Then if there is no RCU stall,
>> >> >> but a workqueue stall, then that's always detected as workqueue stall,
>> >> >> etc.
>> >> >> Currently if we have an RCU stall (effectively CPU stall), that can be
>> >> >> detected either RCU stall or a task hung, producing 2 different bug
>> >> >> reports (which is bad).
>> >> >> One can say that it's only a matter of tuning timeouts, but at least
>> >> >> task hung detector has a problem that if you set timeout to X, it can
>> >> >> detect hung anywhere between X and 2*X. And on one hand we need quite
>> >> >> large timeout (a minute may not be enough), and on the other hand we
>> >> >> can't wait for an hour just to make sure that the machine is indeed
>> >> >> dead (these things happen every few minutes).
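The X-to-2*X window described above can be illustrated with a small model. This is a sketch in Python, not the kernel's hung-task code. It assumes the detector scans once every `timeout` seconds, that the first scan after a task blocks only records the task's state, and that the following scan reports it if nothing has changed. The name `detection_delay` and the scan schedule are illustrative assumptions.

```python
def detection_delay(block_time, timeout):
    """Seconds between a task blocking and a periodic scanner reporting it.

    Simplified model (an assumption, not the kernel implementation):
    scans run at t = timeout, 2*timeout, 3*timeout, ...
    The first scan strictly after block_time only records the task's state;
    the next scan sees no progress and reports the task as hung.
    """
    # Time of the first scan that happens strictly after the task blocked.
    first_scan = (block_time // timeout + 1) * timeout
    # The report is raised one full scan period later.
    report_scan = first_scan + timeout
    return report_scan - block_time


if __name__ == "__main__":
    timeout = 120
    for block_time in (0, 60, 119):
        print(block_time, detection_delay(block_time, timeout))
```

In this model, with a 120-second timeout, a task that blocks right at a scan is reported 240 seconds later, while one that blocks one second before a scan is reported after 121 seconds. The reported delay therefore depends on when the task happened to block relative to the scan schedule, which is why tuning the timeout alone does not give a deterministic ordering between detectors.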
>> >> >
>> >> > I suppose that we could have a global variable that was set to the
>> >> > priority of the complaint in question, which would suppress all
>> >> > lower-priority complaints.  Might need to be opt-in, though -- I would
>> >> > guess that not everyone is going to be happy with one complaint
>> >> > suppressing others, especially given the possibility that the two
>> >> > complaints might be about different things.
>> >> >
>> >> > Or did you have something more deft in mind?
>> >>
>> >>
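The "global variable set to the priority of the complaint" idea quoted above could look roughly like the following. This is a userspace sketch in Python of the logic only, not kernel code and not anything proposed as a patch in this thread. The class name `ComplaintGate`, the `PRIORITY` table, and the numeric ordering (RCU stall highest, task hang lowest) are assumptions made for illustration. The `enabled` flag models the opt-in behaviour mentioned in the quoted text.

```python
import threading

# Hypothetical priority table: a larger number means higher priority.
# The ordering is an assumption loosely based on the list in the thread.
PRIORITY = {
    "rcu_stall": 4,
    "spinlock_stall": 3,
    "workqueue_hang": 2,
    "task_hang": 1,
}


class ComplaintGate:
    """Remembers the highest-priority complaint seen so far and
    suppresses any later complaint of strictly lower priority."""

    def __init__(self, enabled=True):
        self.enabled = enabled      # opt-in: when False nothing is suppressed
        self._highest = 0           # the "global variable" from the thread
        self._lock = threading.Lock()

    def should_report(self, kind):
        prio = PRIORITY[kind]
        if not self.enabled:
            return True
        with self._lock:
            if prio < self._highest:
                return False        # a higher-priority complaint already fired
            self._highest = prio
            return True


if __name__ == "__main__":
    gate = ComplaintGate()
    for kind in ("task_hang", "rcu_stall", "task_hang", "workqueue_hang"):
        print(kind, gate.should_report(kind))
```

Note that in this sketch a lower-priority complaint that fires first is still reported, since nothing higher has been seen yet. Suppression alone therefore does not make the detectors fire in a fixed order, and it would also hide a lower-priority complaint that is about an unrelated problem.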

Re: INFO: task hung in perf_trace_event_unreg

2018-04-09 Thread Dmitry Vyukov
On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney
 wrote:
>> >> >> >>
>> >> >> >> > Hello,
>> >> >> >> >
>> >> >> >> > syzbot hit the following crash on upstream commit
>> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 
>> >> >> >> > +)
>> >> >> >> > Linux 4.16
>> >> >> >> > syzbot dashboard link:
>> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
>> >> >> >> >
>> >> >> >> > Unfortunately, I don't have any reproducer for this crash yet.
>> >> >> >> > Raw console output:
>> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
>> >> >> >> > Kernel config:
>> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
>> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
>> >> >> >> >
>> >> >> >> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> >> >> >> > Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
>> >> >> >> > It will help syzbot understand when the bug is fixed. See footer for
>> >> >> >> > details.
>> >> >> >> > If you forward the report, please keep this part and the footer.
>> >> >> >> >
>> >> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount
>> >> >> >> > option "g �;e�K�׫>pquota"
>> >> >> >
>> >> >> > Might not hurt to look into the above, though perhaps this is just syzkaller
>> >> >> > playing around with mount options.
>> >> >> >
>> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
>> >> >> >> >       Not tainted 4.16.0+ #10
>> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> >> >> >> > syz-executor3   D20944 10803   4492 0x8002
>> >> >> >> > Call Trace:
>> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
>> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
>> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
>> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
>> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
>> >> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
>> >> >> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
>> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>> >> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
>> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
>> >> >> >>
>> >> >> >> I don't think this is a perf issue. Looks like something is preventing
>> >> >> >> rcu_sched from completing. If there's a CPU that is running in kernel
>> >> >> >> space and never scheduling, that can cause this issue. Or if RCU
>> >> >> >> somehow missed a transition into idle or user space.
>> >> >> >
>> >> >> > The RCU CPU stall warning below strongly supports this position ...
>> >> >>
>> >> >> I think this is this guy then:
>> >> >>
>> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
>> >> >>
>> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
>> >> >
>> >> > Seems likely to me!
>> >> >
>> >> >> Looking retrospectively at the various hang/stall bugs that we have, I
>> >> >> think we need some kind of priority between them. I.e. we have rcu
>> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
>> >> >> hang and maybe something else. It would be useful if they fire
>> >> >> deterministically according to priorities. If there is an rcu stall,
>> >> >> that's always detected as CPU stall. Then if there is no RCU stall,
>> >> >> but a workqueue stall, then that's always detected as workqueue stall,
>> >> >> etc.
>> >> >> Currently if we have an RCU stall (effectively CPU stall), that can be
>> >> >> detected either RCU stall or a task hung, producing 2 different bug
>> >> >> reports (which is bad).
>> >> >> One can say that it's only a matter of tuning timeouts, but at least
>> >> >> task hung detector has a problem that if you set timeout to X, it can
>> >> >> detect hung anywhere between X and 2*X. And on one hand we need quite
>> >> >> large timeout (a minute may not be enough), and on the other hand we
>> >> >> can't wait for an hour just to make sure that the machine is indeed
>> >> >> dead (these things happen every few minutes).
>> >> >
>> >> > I suppose that we could have a global variable that was set to the
>> >> > priority of the complaint in question, which would suppress all
>> >> > lower-priority complaints.  Might need to be opt-in, though -- I would
>> >> > guess that not everyone is going to be happy with one complaint suppressing
>> >> > others, especially given the possibility that the two complaints might
>> >> > be about different things.
>> >> >
>> >> > Or did you have something more deft in mind?
>> >>
>> >>
>> >> syzkaller generally 

Re: INFO: task hung in perf_trace_event_unreg

2018-04-02 Thread Paul E. McKenney
On Mon, Apr 02, 2018 at 07:11:50PM +0200, Dmitry Vyukov wrote:
> On Mon, Apr 2, 2018 at 6:39 PM, Paul E. McKenney
>  wrote:
> > On Mon, Apr 02, 2018 at 06:32:03PM +0200, Dmitry Vyukov wrote:
> >> On Mon, Apr 2, 2018 at 6:21 PM, Paul E. McKenney
> >>  wrote:
> >> > On Mon, Apr 02, 2018 at 06:04:35PM +0200, Dmitry Vyukov wrote:
> >> >> On Mon, Apr 2, 2018 at 5:33 PM, Paul E. McKenney
> >> >>  wrote:
> >> >> > On Mon, Apr 02, 2018 at 09:40:40AM -0400, Steven Rostedt wrote:
> >> >> >> On Mon, 02 Apr 2018 02:20:02 -0700
> >> >> >> syzbot  wrote:
> >> >> >>
> >> >> >> > Hello,
> >> >> >> >
> >> >> >> > syzbot hit the following crash on upstream commit
> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +)
> >> >> >> > Linux 4.16
> >> >> >> > syzbot dashboard link:
> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >> >> >> >
> >> >> >> > Unfortunately, I don't have any reproducer for this crash yet.
> >> >> >> > Raw console output:
> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> >> >> >> > Kernel config:
> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
> >> >> >> >
> >> >> >> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> >> >> >> > Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
> >> >> >> > It will help syzbot understand when the bug is fixed. See footer for
> >> >> >> > details.
> >> >> >> > If you forward the report, please keep this part and the footer.
> >> >> >> >
> >> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount
> >> >> >> > option "g �;e�K�׫>pquota"
> >> >> >
> >> >> > Might not hurt to look into the above, though perhaps this is just syzkaller
> >> >> > playing around with mount options.
> >> >> >
> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
> >> >> >> >       Not tainted 4.16.0+ #10
> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> >> >> > syz-executor3   D20944 10803   4492 0x8002
> >> >> >> > Call Trace:
> >> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
> >> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
> >> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> >> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
> >> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
> >> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
> >> >> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
> >> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
> >> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> >> >> >>
> >> >> >> I don't think this is a perf issue. Looks like something is preventing
> >> >> >> rcu_sched from completing. If there's a CPU that is running in kernel
> >> >> >> space and never scheduling, that can cause this issue. Or if RCU
> >> >> >> somehow missed a transition into idle or user space.
> >> >> >
> >> >> > The RCU CPU stall warning below strongly supports this position ...
> >> >>
> >> >> I think this is this guy then:
> >> >>
> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> >> >>
> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
> >> >
> >> > Seems likely to me!
> >> >
> >> >> Looking retrospectively at the various hang/stall bugs that we have, I
> >> >> think we need some kind of priority between them. I.e. we have rcu
> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
> >> >> hang and maybe something else. It would be useful if they fire
> >> >> deterministically according to priorities. If there is an rcu stall,
> >> >> that's always detected as CPU stall. Then if there is no RCU stall,
> >> >> but a workqueue stall, then that's always detected as workqueue stall,
> >> >> etc.
> >> >> Currently if we have an RCU stall (effectively CPU stall), that can be
> >> >> detected either RCU stall or a task hung, producing 2 different bug
> >> >> reports (which is bad).
> >> >> One can say that it's only a matter of tuning timeouts, but at least
> >> >> task hung detector has a problem that if you set timeout to X, it can
> >> >> detect hung anywhere between X and 2*X. And on one hand we need quite
> >> >> large timeout (a minute may not be enough), and on the other hand we
> >> >> can't wait for an hour just to make sure that the machine is indeed
> >> >> dead (these things happen every few minutes).
> >> >
> >> > I suppose that we could have a global variable that was set to the
> >> > priority of the complaint in question, which would suppress 

Re: INFO: task hung in perf_trace_event_unreg

2018-04-02 Thread Dmitry Vyukov
On Mon, Apr 2, 2018 at 6:39 PM, Paul E. McKenney
 wrote:
> On Mon, Apr 02, 2018 at 06:32:03PM +0200, Dmitry Vyukov wrote:
>> On Mon, Apr 2, 2018 at 6:21 PM, Paul E. McKenney
>>  wrote:
>> > On Mon, Apr 02, 2018 at 06:04:35PM +0200, Dmitry Vyukov wrote:
>> >> On Mon, Apr 2, 2018 at 5:33 PM, Paul E. McKenney
>> >>  wrote:
>> >> > On Mon, Apr 02, 2018 at 09:40:40AM -0400, Steven Rostedt wrote:
>> >> >> On Mon, 02 Apr 2018 02:20:02 -0700
>> >> >> syzbot  wrote:
>> >> >>
>> >> >> > Hello,
>> >> >> >
>> >> >> > syzbot hit the following crash on upstream commit
>> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +)
>> >> >> > Linux 4.16
>> >> >> > syzbot dashboard link:
>> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
>> >> >> >
>> >> >> > Unfortunately, I don't have any reproducer for this crash yet.
>> >> >> > Raw console output:
>> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
>> >> >> > Kernel config:
>> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
>> >> >> > compiler: gcc (GCC) 7.1.1 20170620
>> >> >> >
>> >> >> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> >> >> > Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
>> >> >> > It will help syzbot understand when the bug is fixed. See footer for
>> >> >> > details.
>> >> >> > If you forward the report, please keep this part and the footer.
>> >> >> >
>> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount
>> >> >> > option "g �;e�K�׫>pquota"
>> >> >
>> >> > Might not hurt to look into the above, though perhaps this is just syzkaller
>> >> > playing around with mount options.
>> >> >
>> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
>> >> >> >       Not tainted 4.16.0+ #10
>> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> >> >> > syz-executor3   D20944 10803   4492 0x8002
>> >> >> > Call Trace:
>> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
>> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
>> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
>> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
>> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
>> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
>> >> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
>> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
>> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
>> >> >>
>> >> >> I don't think this is a perf issue. Looks like something is preventing
>> >> >> rcu_sched from completing. If there's a CPU that is running in kernel
>> >> >> space and never scheduling, that can cause this issue. Or if RCU
>> >> >> somehow missed a transition into idle or user space.
>> >> >
>> >> > The RCU CPU stall warning below strongly supports this position ...
>> >>
>> >> I think this is this guy then:
>> >>
>> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
>> >>
>> >> #syz dup: INFO: rcu detected stall in __process_echoes
>> >
>> > Seems likely to me!
>> >
>> >> Looking retrospectively at the various hang/stall bugs that we have, I
>> >> think we need some kind of priority between them. I.e. we have rcu
>> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
>> >> hang and maybe something else. It would be useful if they fire
>> >> deterministically according to priorities. If there is an rcu stall,
>> >> that's always detected as CPU stall. Then if there is no RCU stall,
>> >> but a workqueue stall, then that's always detected as workqueue stall,
>> >> etc.
>> >> Currently if we have an RCU stall (effectively CPU stall), that can be
>> >> detected either RCU stall or a task hung, producing 2 different bug
>> >> reports (which is bad).
>> >> One can say that it's only a matter of tuning timeouts, but at least
>> >> task hung detector has a problem that if you set timeout to X, it can
>> >> detect hung anywhere between X and 2*X. And on one hand we need quite
>> >> large timeout (a minute may not be enough), and on the other hand we
>> >> can't wait for an hour just to make sure that the machine is indeed
>> >> dead (these things happen every few minutes).
>> >
>> > I suppose that we could have a global variable that was set to the
>> > priority of the complaint in question, which would suppress all
>> > lower-priority complaints.  Might need to be opt-in, though -- I would
>> > guess that not everyone is going to be happy with one complaint suppressing
>> > others, especially given the possibility that the two complaints might
>> > be about different things.
>> >
>> > Or 

Re: INFO: task hung in perf_trace_event_unreg

2018-04-02 Thread Paul E. McKenney
On Mon, Apr 02, 2018 at 06:32:03PM +0200, Dmitry Vyukov wrote:
> On Mon, Apr 2, 2018 at 6:21 PM, Paul E. McKenney
>  wrote:
> > On Mon, Apr 02, 2018 at 06:04:35PM +0200, Dmitry Vyukov wrote:
> >> On Mon, Apr 2, 2018 at 5:33 PM, Paul E. McKenney
> >>  wrote:
> >> > On Mon, Apr 02, 2018 at 09:40:40AM -0400, Steven Rostedt wrote:
> >> >> On Mon, 02 Apr 2018 02:20:02 -0700
> >> >> syzbot  wrote:
> >> >>
> >> >> > Hello,
> >> >> >
> >> >> > syzbot hit the following crash on upstream commit
> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +)
> >> >> > Linux 4.16
> >> >> > syzbot dashboard link:
> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >> >> >
> >> >> > Unfortunately, I don't have any reproducer for this crash yet.
> >> >> > Raw console output:
> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> >> >> > Kernel config:
> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> >> >> > compiler: gcc (GCC) 7.1.1 20170620
> >> >> >
> >> >> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> >> >> > Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
> >> >> > It will help syzbot understand when the bug is fixed. See footer for
> >> >> > details.
> >> >> > If you forward the report, please keep this part and the footer.
> >> >> >
> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount
> >> >> > option "g �;e�K�׫>pquota"
> >> >
> >> > Might not hurt to look into the above, though perhaps this is just syzkaller
> >> > playing around with mount options.
> >> >
> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
> >> >> >       Not tainted 4.16.0+ #10
> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> >> > syz-executor3   D20944 10803   4492 0x8002
> >> >> > Call Trace:
> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
> >> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> >> >>
> >> >> I don't think this is a perf issue. Looks like something is preventing
> >> >> rcu_sched from completing. If there's a CPU that is running in kernel
> >> >> space and never scheduling, that can cause this issue. Or if RCU
> >> >> somehow missed a transition into idle or user space.
> >> >
> >> > The RCU CPU stall warning below strongly supports this position ...
> >>
> >> I think this is this guy then:
> >>
> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> >>
> >> #syz dup: INFO: rcu detected stall in __process_echoes
> >
> > Seems likely to me!
> >
> >> Looking retrospectively at the various hang/stall bugs that we have, I
> >> think we need some kind of priority between them. I.e. we have rcu
> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
> >> hang and maybe something else. It would be useful if they fire
> >> deterministically according to priorities. If there is an rcu stall,
> >> that's always detected as CPU stall. Then if there is no RCU stall,
> >> but a workqueue stall, then that's always detected as workqueue stall,
> >> etc.
> >> Currently if we have an RCU stall (effectively CPU stall), that can be
> >> detected either RCU stall or a task hung, producing 2 different bug
> >> reports (which is bad).
> >> One can say that it's only a matter of tuning timeouts, but at least
> >> task hung detector has a problem that if you set timeout to X, it can
> >> detect hung anywhere between X and 2*X. And on one hand we need quite
> >> large timeout (a minute may not be enough), and on the other hand we
> >> can't wait for an hour just to make sure that the machine is indeed
> >> dead (these things happen every few minutes).
> >
> > I suppose that we could have a global variable that was set to the
> > priority of the complaint in question, which would suppress all
> > lower-priority complaints.  Might need to be opt-in, though -- I would
> > guess that not everyone is going to be happy with one complaint suppressing
> > others, especially given the possibility that the two complaints might
> > be about different things.
> >
> > Or did you have something more deft in mind?
> 
> 
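
The timing remark quoted in this message (a timeout of X can fire anywhere between X and 2*X) can be reproduced with a small model. The sketch below assumes the detector behaves like a periodic checker that wakes at every multiple of the timeout and reports a task only if it made no progress between two consecutive wakeups. That is a simplification for illustration, not the kernel's code, and the function name `detection_delay` is hypothetical.

```python
def detection_delay(block_time, timeout):
    """Seconds a task has been blocked when a periodic checker reports it.

    Simplified model (an assumption): the checker wakes at every multiple
    of `timeout` and reports a task only if nothing changed between two
    consecutive wakeups. Both arguments are whole seconds.
    """
    # First wakeup at or after the moment the task blocks. The checker
    # sees that the task ran since the previous wakeup, so it only
    # records the new state and does not complain yet.
    first_check = -(-block_time // timeout) * timeout  # round up to a multiple
    # Next wakeup: no progress for a whole period, so the task is reported.
    report_time = first_check + timeout
    return report_time - block_time
```

With a 120-second timeout, a task that blocks one second after a wakeup is reported after 239 seconds in this model, while one that blocks one second before a wakeup is reported after 121 seconds. The delay therefore depends on where in the checking period the task happened to block, which is why tuning the timeout alone does not make the detectors fire in a fixed order.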

Re: INFO: task hung in perf_trace_event_unreg

2018-04-02 Thread Paul E. McKenney
On Mon, Apr 02, 2018 at 06:32:03PM +0200, Dmitry Vyukov wrote:
> On Mon, Apr 2, 2018 at 6:21 PM, Paul E. McKenney
>  wrote:
> > On Mon, Apr 02, 2018 at 06:04:35PM +0200, Dmitry Vyukov wrote:
> >> On Mon, Apr 2, 2018 at 5:33 PM, Paul E. McKenney
> >>  wrote:
> >> > On Mon, Apr 02, 2018 at 09:40:40AM -0400, Steven Rostedt wrote:
> >> >> On Mon, 02 Apr 2018 02:20:02 -0700
> >> >> syzbot  wrote:
> >> >>
> >> >> > Hello,
> >> >> >
> >> >> > syzbot hit the following crash on upstream commit
> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +)
> >> >> > Linux 4.16
> >> >> > syzbot dashboard link:
> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >> >> >
> >> >> > Unfortunately, I don't have any reproducer for this crash yet.
> >> >> > Raw console output:
> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> >> >> > Kernel config:
> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> >> >> > compiler: gcc (GCC) 7.1.1 20170620
> >> >> >
> >> >> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> >> >> > Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
> >> >> > It will help syzbot understand when the bug is fixed. See footer for
> >> >> > details.
> >> >> > If you forward the report, please keep this part and the footer.
> >> >> >
> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount
> >> >> > option "g �;e�K�׫>pquota"
> >> >
> >> > Might not hurt to look into the above, though perhaps this is just syzkaller
> >> > playing around with mount options.
> >> >
> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
> >> >> >       Not tainted 4.16.0+ #10
> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> >> > syz-executor3   D20944 10803   4492 0x8002
> >> >> > Call Trace:
> >> >> >   context_switch kernel/sched/core.c:2862 [inline]
> >> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
> >> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> >> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
> >> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
> >> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
> >> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
> >> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
> >> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> >> >>
> >> >> I don't think this is a perf issue. Looks like something is preventing
> >> >> rcu_sched from completing. If there's a CPU that is running in kernel
> >> >> space and never scheduling, that can cause this issue. Or if RCU
> >> >> somehow missed a transition into idle or user space.
> >> >
> >> > The RCU CPU stall warning below strongly supports this position ...
> >>
> >> I think this is this guy then:
> >>
> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> >>
> >> #syz dup: INFO: rcu detected stall in __process_echoes
> >
> > Seems likely to me!
> >
> >> Looking retrospectively at the various hang/stall bugs that we have, I
> >> think we need some kind of priority between them. I.e. we have rcu
> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
> >> hang and maybe something else. It would be useful if they fire
> >> deterministically according to priorities. If there is an rcu stall,
> >> that's always detected as CPU stall. Then if there is no RCU stall,
> >> but a workqueue stall, then that's always detected as workqueue stall,
> >> etc.
> >> Currently if we have an RCU stall (effectively CPU stall), that can be
> >> detected either RCU stall or a task hung, producing 2 different bug
> >> reports (which is bad).
> >> One can say that it's only a matter of tuning timeouts, but at least
> >> task hung detector has a problem that if you set timeout to X, it can
> >> detect hung anywhere between X and 2*X. And on one hand we need quite
> >> large timeout (a minute may not be enough), and on the other hand we
> >> can't wait for an hour just to make sure that the machine is indeed
> >> dead (these things happen every few minutes).
> >
> > I suppose that we could have a global variable that was set to the
> > priority of the complaint in question, which would suppress all
> > lower-priority complaints.  Might need to be opt-in, though -- I would
> > guess that not everyone is going to be happy with one complaint suppressing
> > others, especially given the possibility that the two complaints might
> > be about different things.
> >
> > Or did you have something more deft in mind?
> 
> 
> syzkaller generally looks only at the first report. One does not know
> if/when there will be a second one, 

Re: INFO: task hung in perf_trace_event_unreg

2018-04-02 Thread Dmitry Vyukov
On Mon, Apr 2, 2018 at 6:21 PM, Paul E. McKenney
 wrote:
> On Mon, Apr 02, 2018 at 06:04:35PM +0200, Dmitry Vyukov wrote:
>> On Mon, Apr 2, 2018 at 5:33 PM, Paul E. McKenney
>>  wrote:
>> > On Mon, Apr 02, 2018 at 09:40:40AM -0400, Steven Rostedt wrote:
>> >> On Mon, 02 Apr 2018 02:20:02 -0700
>> >> syzbot  wrote:
>> >>
>> >> > Hello,
>> >> >
>> >> > syzbot hit the following crash on upstream commit
>> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +)
>> >> > Linux 4.16
>> >> > syzbot dashboard link:
>> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
>> >> >
>> >> > Unfortunately, I don't have any reproducer for this crash yet.
>> >> > Raw console output:
>> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
>> >> > Kernel config:
>> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
>> >> > compiler: gcc (GCC) 7.1.1 20170620
>> >> >
>> >> > IMPORTANT: if you fix the bug, please add the following tag to the 
>> >> > commit:
>> >> > Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
>> >> > It will help syzbot understand when the bug is fixed. See footer for
>> >> > details.
>> >> > If you forward the report, please keep this part and the footer.
>> >> >
>> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown 
>> >> > mount
>> >> > option "g �;e�K�׫>pquota"
>> >
>> > Might not hurt to look into the above, though perhaps this is just 
>> > syzkaller
>> > playing around with mount options.
>> >
>> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
>> >> >Not tainted 4.16.0+ #10
>> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
>> >> > message.
>> >> > syz-executor3   D20944 10803   4492 0x8002
>> >> > Call Trace:
>> >> >   context_switch kernel/sched/core.c:2862 [inline]
>> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
>> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
>> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
>> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
>> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
>> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
>> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
>> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
>> >>
>> >> I don't think this is a perf issue. Looks like something is preventing
>> >> rcu_sched from completing. If there's a CPU that is running in kernel
>> >> space and never scheduling, that can cause this issue. Or if RCU
>> >> somehow missed a transition into idle or user space.
>> >
>> > The RCU CPU stall warning below strongly supports this position ...
>>
>> I think this is this guy then:
>>
>> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
>>
>> #syz dup: INFO: rcu detected stall in __process_echoes
>
> Seems likely to me!
>
>> Looking retrospectively at the various hang/stall bugs that we have, I
>> think we need some kind of priority between them. I.e. we have rcu
>> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
>> hang and maybe something else. It would be useful if they fire
>> deterministically according to priorities. If there is an rcu stall,
>> that's always detected as CPU stall. Then if there is no RCU stall,
>> but a workqueue stall, then that's always detected as workqueue stall,
>> etc.
>> Currently if we have an RCU stall (effectively CPU stall), that can be
>> detected either RCU stall or a task hung, producing 2 different bug
>> reports (which is bad).
>> One can say that it's only a matter of tuning timeouts, but at least
>> task hung detector has a problem that if you set timeout to X, it can
>> detect hung anywhere between X and 2*X. And on one hand we need quite
>> large timeout (a minute may not be enough), and on the other hand we
>> can't wait for an hour just to make sure that the machine is indeed
>> dead (these things happen every few minutes).
>
> I suppose that we could have a global variable that was set to the
> priority of the complaint in question, which would suppress all
> lower-priority complaints.  Might need to be opt-in, though -- I would
> guess that not everyone is going to be happy with one complaint suppressing
> others, especially given the possibility that the two complaints might
> be about different things.
>
> Or did you have something more deft in mind?


syzkaller generally looks only at the first report. One does not know
if/when there will be a second one, the second one may be induced
by the first one, and we generally want clean reports on a non-tainted
kernel. So we don't just need to suppress lower-priority ones; we need
to produce the right report first.
I am thinking maybe setting:
 - rcu stalls at 1.5 minutes
 - 

Re: INFO: task hung in perf_trace_event_unreg

2018-04-02 Thread Paul E. McKenney
On Mon, Apr 02, 2018 at 06:04:35PM +0200, Dmitry Vyukov wrote:
> On Mon, Apr 2, 2018 at 5:33 PM, Paul E. McKenney
>  wrote:
> > On Mon, Apr 02, 2018 at 09:40:40AM -0400, Steven Rostedt wrote:
> >> On Mon, 02 Apr 2018 02:20:02 -0700
> >> syzbot  wrote:
> >>
> >> > Hello,
> >> >
> >> > syzbot hit the following crash on upstream commit
> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +)
> >> > Linux 4.16
> >> > syzbot dashboard link:
> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >> >
> >> > Unfortunately, I don't have any reproducer for this crash yet.
> >> > Raw console output:
> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> >> > Kernel config:
> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> >> > compiler: gcc (GCC) 7.1.1 20170620
> >> >
> >> > IMPORTANT: if you fix the bug, please add the following tag to the 
> >> > commit:
> >> > Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
> >> > It will help syzbot understand when the bug is fixed. See footer for
> >> > details.
> >> > If you forward the report, please keep this part and the footer.
> >> >
> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown 
> >> > mount
> >> > option "g �;e�K�׫>pquota"
> >
> > Might not hurt to look into the above, though perhaps this is just syzkaller
> > playing around with mount options.
> >
> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
> >> >Not tainted 4.16.0+ #10
> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> > syz-executor3   D20944 10803   4492 0x8002
> >> > Call Trace:
> >> >   context_switch kernel/sched/core.c:2862 [inline]
> >> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
> >> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> >> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
> >> >   __wait_for_common kernel/sched/completion.c:107 [inline]
> >> >   wait_for_common kernel/sched/completion.c:118 [inline]
> >> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
> >> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
> >> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> >>
> >> I don't think this is a perf issue. Looks like something is preventing
> >> rcu_sched from completing. If there's a CPU that is running in kernel
> >> space and never scheduling, that can cause this issue. Or if RCU
> >> somehow missed a transition into idle or user space.
> >
> > The RCU CPU stall warning below strongly supports this position ...
> 
> I think this is this guy then:
> 
> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> 
> #syz dup: INFO: rcu detected stall in __process_echoes

Seems likely to me!

> Looking retrospectively at the various hang/stall bugs that we have, I
> think we need some kind of priority between them. I.e. we have rcu
> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
> hang and maybe something else. It would be useful if they fire
> deterministically according to priorities. If there is an rcu stall,
> that's always detected as CPU stall. Then if there is no RCU stall,
> but a workqueue stall, then that's always detected as workqueue stall,
> etc.
> Currently if we have an RCU stall (effectively CPU stall), that can be
> detected either RCU stall or a task hung, producing 2 different bug
> reports (which is bad).
> One can say that it's only a matter of tuning timeouts, but at least
> task hung detector has a problem that if you set timeout to X, it can
> detect hung anywhere between X and 2*X. And on one hand we need quite
> large timeout (a minute may not be enough), and on the other hand we
> can't wait for an hour just to make sure that the machine is indeed
> dead (these things happen every few minutes).

I suppose that we could have a global variable that was set to the
priority of the complaint in question, which would suppress all
lower-priority complaints.  Might need to be opt-in, though -- I would
guess that not everyone is going to be happy with one complaint suppressing
others, especially given the possibility that the two complaints might
be about different things.
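
For illustration only, here is a minimal userspace sketch in Python of the
"global priority variable" idea described above. It is not kernel code. The
names (Complaint, ComplaintGate, should_report) are hypothetical, and the
priority ordering is an assumption loosely based on the list of hang/stall
kinds quoted above, with RCU stalls ranked highest.

```python
import threading
from enum import IntEnum


class Complaint(IntEnum):
    # Assumed ordering: a larger value means a higher priority.
    TASK_HANG = 1
    WORKQUEUE_HANG = 2
    SPINLOCK_STALL = 3
    RCU_STALL = 4


class ComplaintGate:
    """Remembers the highest-priority complaint seen so far."""

    def __init__(self, opt_in=True):
        # opt_in=False models the default where nothing is suppressed.
        self.opt_in = opt_in
        self._highest = 0
        self._lock = threading.Lock()

    def should_report(self, kind):
        """Return True if a complaint of this kind should be printed."""
        if not self.opt_in:
            return True
        with self._lock:
            if kind < self._highest:
                # A higher-priority complaint was already reported.
                return False
            self._highest = int(kind)
            return True


gate = ComplaintGate()
for kind in (Complaint.WORKQUEUE_HANG, Complaint.TASK_HANG, Complaint.RCU_STALL):
    print(kind.name, gate.should_report(kind))
```

Note that a gate like this only suppresses complaints that arrive after a
higher-priority one; it does not reorder complaints that fire first.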

Or did you have something more deft in mind?

Thanx, Paul

> >> >   tracepoint_synchronize_unregister include/linux/tracepoint.h:80 
> >> > [inline]
> >> >   perf_trace_event_unreg.isra.2+0xb7/0x1f0
> >> > kernel/trace/trace_event_perf.c:161
> >> >   perf_trace_destroy+0xbc/0x100 kernel/trace/trace_event_perf.c:236
> >> >   tp_perf_event_destroy+0x15/0x20 kernel/events/core.c:7976
> >> >   _free_event+0x3bd/0x10f0 kernel/events/core.c:4121
> >> >   put_event+0x24/0x30 kernel/events/core.c:4204
> >> >   perf_event_release_kernel+0x6e8/0xfc0 

Re: INFO: task hung in perf_trace_event_unreg

2018-04-02 Thread Dmitry Vyukov
On Mon, Apr 2, 2018 at 5:33 PM, Paul E. McKenney
 wrote:
> On Mon, Apr 02, 2018 at 09:40:40AM -0400, Steven Rostedt wrote:
>> On Mon, 02 Apr 2018 02:20:02 -0700
>> syzbot  wrote:
>>
>> > Hello,
>> >
>> > syzbot hit the following crash on upstream commit
>> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +)
>> > Linux 4.16
>> > syzbot dashboard link:
>> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
>> >
>> > Unfortunately, I don't have any reproducer for this crash yet.
>> > Raw console output:
>> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
>> > Kernel config:
>> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
>> > compiler: gcc (GCC) 7.1.1 20170620
>> >
>> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> > Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
>> > It will help syzbot understand when the bug is fixed. See footer for
>> > details.
>> > If you forward the report, please keep this part and the footer.
>> >
>> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount
>> > option "g �;e�K�׫>pquota"
>
> Might not hurt to look into the above, though perhaps this is just syzkaller
> playing around with mount options.
>
>> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
>> >Not tainted 4.16.0+ #10
>> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> > syz-executor3   D20944 10803   4492 0x8002
>> > Call Trace:
>> >   context_switch kernel/sched/core.c:2862 [inline]
>> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
>> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
>> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
>> >   __wait_for_common kernel/sched/completion.c:107 [inline]
>> >   wait_for_common kernel/sched/completion.c:118 [inline]
>> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
>> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
>> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
>>
>> I don't think this is a perf issue. Looks like something is preventing
>> rcu_sched from completing. If there's a CPU that is running in kernel
>> space and never scheduling, that can cause this issue. Or if RCU
>> somehow missed a transition into idle or user space.
>
> The RCU CPU stall warning below strongly supports this position ...


I think this is this guy then:

https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40

#syz dup: INFO: rcu detected stall in __process_echoes


Looking retrospectively at the various hang/stall bugs that we have, I
think we need some kind of priority between them. I.e. we have RCU
stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
hangs and maybe something else. It would be useful if they fired
deterministically according to priorities. If there is an RCU stall,
that's always detected as a CPU stall. Then if there is no RCU stall,
but a workqueue stall, then that's always detected as a workqueue stall,
etc.
Currently if we have an RCU stall (effectively a CPU stall), it can be
detected either as an RCU stall or as a hung task, producing 2 different
bug reports (which is bad).
One can say that it's only a matter of tuning timeouts, but at least the
hung task detector has the problem that if you set the timeout to X, it
can detect a hang anywhere between X and 2*X. On one hand we need quite
a large timeout (a minute may not be enough), and on the other hand we
can't wait for an hour just to make sure that the machine is indeed
dead (these things happen every few minutes).
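
To make the "between X and 2*X" window concrete, here is a small Python
model. It rests on a simplifying assumption: the checker wakes once per
timeout period and flags a task only when it finds it blocked at two
consecutive checks. The function name detection_delay is hypothetical and
this is a sketch of the arithmetic, not the kernel's implementation.

```python
def detection_delay(block_start, timeout):
    """How long a task has been blocked when the periodic checker flags it.

    Assumed model: checks run at t = 0, timeout, 2*timeout, ... and a task
    is flagged at the first check that finds it still blocked one full
    period after a check that already saw it blocked.
    """
    # Index of the first check at or after the moment the task blocks
    # (integer ceiling division).
    first_check = -(-block_start // timeout)
    first_seen = first_check * timeout
    # The following check sees no progress and reports the hang.
    flagged_at = first_seen + timeout
    return flagged_at - block_start


timeout = 120
delays = [detection_delay(start, timeout) for start in range(timeout)]
# Every delay falls in the half-open range [timeout, 2*timeout).
print(min(delays), max(delays))
```

A task that blocks right at a check is reported after one period, while a
task that blocks just after a check waits almost two periods, which is the
nondeterminism described above.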

>> >   tracepoint_synchronize_unregister include/linux/tracepoint.h:80 [inline]
>> >   perf_trace_event_unreg.isra.2+0xb7/0x1f0
>> > kernel/trace/trace_event_perf.c:161
>> >   perf_trace_destroy+0xbc/0x100 kernel/trace/trace_event_perf.c:236
>> >   tp_perf_event_destroy+0x15/0x20 kernel/events/core.c:7976
>> >   _free_event+0x3bd/0x10f0 kernel/events/core.c:4121
>> >   put_event+0x24/0x30 kernel/events/core.c:4204
>> >   perf_event_release_kernel+0x6e8/0xfc0 kernel/events/core.c:4310
>> >   perf_release+0x37/0x50 kernel/events/core.c:4320
>> >   __fput+0x327/0x7e0 fs/file_table.c:209
>> >   fput+0x15/0x20 fs/file_table.c:243
>> >   task_work_run+0x199/0x270 kernel/task_work.c:113
>> >   exit_task_work include/linux/task_work.h:22 [inline]
>> >   do_exit+0x9bb/0x1ad0 kernel/exit.c:865
>> >   do_group_exit+0x149/0x400 kernel/exit.c:968
>> >   get_signal+0x73a/0x16d0 kernel/signal.c:2469
>> >   do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
>> >   exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
>> >   prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
>> >   syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
>> >   do_syscall_64+0x6ec/0x940 arch/x86/entry/common.c:292
>> >   

Re: INFO: task hung in perf_trace_event_unreg

2018-04-02 Thread Paul E. McKenney
On Mon, Apr 02, 2018 at 09:40:40AM -0400, Steven Rostedt wrote:
> On Mon, 02 Apr 2018 02:20:02 -0700
> syzbot  wrote:
> 
> > Hello,
> > 
> > syzbot hit the following crash on upstream commit
> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +)
> > Linux 4.16
> > syzbot dashboard link:  
> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> > 
> > Unfortunately, I don't have any reproducer for this crash yet.
> > Raw console output:  
> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> > Kernel config:  
> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> > compiler: gcc (GCC) 7.1.1 20170620
> > 
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
> > It will help syzbot understand when the bug is fixed. See footer for  
> > details.
> > If you forward the report, please keep this part and the footer.
> > 
> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount  
> > option "g�;e�K�׫>pquota"

Might not hurt to look into the above, though perhaps this is just syzkaller
playing around with mount options.

> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
> >Not tainted 4.16.0+ #10
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > syz-executor3   D20944 10803   4492 0x8002
> > Call Trace:
> >   context_switch kernel/sched/core.c:2862 [inline]
> >   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >   schedule+0xf5/0x430 kernel/sched/core.c:3499
> >   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> >   do_wait_for_common kernel/sched/completion.c:86 [inline]
> >   __wait_for_common kernel/sched/completion.c:107 [inline]
> >   wait_for_common kernel/sched/completion.c:118 [inline]
> >   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
> >   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
> >   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> 
> I don't think this is a perf issue. Looks like something is preventing
> rcu_sched from completing. If there's a CPU that is running in kernel
> space and never scheduling, that can cause this issue. Or if RCU
> somehow missed a transition into idle or user space.

The RCU CPU stall warning below strongly supports this position ...

> -- Steve
> 
> >   tracepoint_synchronize_unregister include/linux/tracepoint.h:80 [inline]
> >   perf_trace_event_unreg.isra.2+0xb7/0x1f0  
> > kernel/trace/trace_event_perf.c:161
> >   perf_trace_destroy+0xbc/0x100 kernel/trace/trace_event_perf.c:236
> >   tp_perf_event_destroy+0x15/0x20 kernel/events/core.c:7976
> >   _free_event+0x3bd/0x10f0 kernel/events/core.c:4121
> >   put_event+0x24/0x30 kernel/events/core.c:4204
> >   perf_event_release_kernel+0x6e8/0xfc0 kernel/events/core.c:4310
> >   perf_release+0x37/0x50 kernel/events/core.c:4320
> >   __fput+0x327/0x7e0 fs/file_table.c:209
> >   fput+0x15/0x20 fs/file_table.c:243
> >   task_work_run+0x199/0x270 kernel/task_work.c:113
> >   exit_task_work include/linux/task_work.h:22 [inline]
> >   do_exit+0x9bb/0x1ad0 kernel/exit.c:865
> >   do_group_exit+0x149/0x400 kernel/exit.c:968
> >   get_signal+0x73a/0x16d0 kernel/signal.c:2469
> >   do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
> >   exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
> >   prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
> >   syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
> >   do_syscall_64+0x6ec/0x940 arch/x86/entry/common.c:292
> >   entry_SYSCALL_64_after_hwframe+0x42/0xb7
> > RIP: 0033:0x455269
> > RSP: 002b:7f8976371ce8 EFLAGS: 0246 ORIG_RAX: 00ca
> > RAX:  RBX: 0072bec8 RCX: 00455269
> > RDX:  RSI:  RDI: 0072bec8
> > RBP: 0072bec8 R08:  R09: 0072bea0
> > R10:  R11: 0246 R12: 
> > R13: 7ffe793f79cf R14: 7f89763729c0 R15: 
> > 
> > Showing all locks held in the system:
> > 2 locks held by khungtaskd/876:
> >   #0:  (rcu_read_lock){}, at: [<8f2bec4b>]  
> > check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
> >   #0:  (rcu_read_lock){}, at: [<8f2bec4b>] watchdog+0x1c5/0xd60
> > kernel/hung_task.c:249

... And two places to start looking are the two above rcu_read_lock() calls.
Especially given that khungtaskd shows up below.

> >   #1:  (tasklist_lock){.+.+}, at: [<06b3009f>]  
> > debug_show_all_locks+0xd3/0x3d0 kernel/locking/lockdep.c:4470
> > 2 locks held by getty/4414:
> >   #0:  (>ldisc_sem){}, at: []  
> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> >   #1:  (>atomic_read_lock){+.+.}, at: [<762a7320>]  
> > n_tty_read+0x2ef/0x1a40 

Re: INFO: task hung in perf_trace_event_unreg

2018-04-02 Thread Steven Rostedt
On Mon, 02 Apr 2018 02:20:02 -0700
syzbot  wrote:

> Hello,
> 
> syzbot hit the following crash on upstream commit
> 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +)
> Linux 4.16
> syzbot dashboard link:  
> https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> 
> Unfortunately, I don't have any reproducer for this crash yet.
> Raw console output:  
> https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> Kernel config:  
> https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> compiler: gcc (GCC) 7.1.1 20170620
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for  
> details.
> If you forward the report, please keep this part and the footer.
> 
> REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount  
> option "g�;e�K�׫>pquota"
> INFO: task syz-executor3:10803 blocked for more than 120 seconds.
>Not tainted 4.16.0+ #10
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor3   D20944 10803   4492 0x8002
> Call Trace:
>   context_switch kernel/sched/core.c:2862 [inline]
>   __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>   schedule+0xf5/0x430 kernel/sched/core.c:3499
>   schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
>   do_wait_for_common kernel/sched/completion.c:86 [inline]
>   __wait_for_common kernel/sched/completion.c:107 [inline]
>   wait_for_common kernel/sched/completion.c:118 [inline]
>   wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
>   __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>   synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
>   synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213

I don't think this is a perf issue. Looks like something is preventing
rcu_sched from completing. If there's a CPU that is running in kernel
space and never scheduling, that can cause this issue. Or if RCU
somehow missed a transition into idle or user space.
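
As a toy illustration of the point above (Python, not how
kernel/rcu/tree.c actually tracks this; the function name and the CPU
numbering are assumptions for the example): a grace period can only end
once every CPU has reported a quiescent state, so a single CPU that never
schedules holds it open and synchronize_sched() keeps waiting.

```python
def grace_period_complete(cpus_needing_qs, quiescent_reports):
    """Toy model: a grace period ends only when every CPU has reported
    a quiescent state (context switch, idle, or user-space execution)."""
    remaining = set(cpus_needing_qs) - set(quiescent_reports)
    return len(remaining) == 0, sorted(remaining)


# CPUs 0, 2 and 3 schedule or go idle; CPU 1 spins in the kernel forever.
done, holdouts = grace_period_complete(range(4), [0, 2, 3])
print(done, holdouts)  # -> False [1]
```

The same model covers the second case: if a CPU did go idle but the
report was lost, it still shows up as a holdout.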

-- Steve

>   tracepoint_synchronize_unregister include/linux/tracepoint.h:80 [inline]
>   perf_trace_event_unreg.isra.2+0xb7/0x1f0  
> kernel/trace/trace_event_perf.c:161
>   perf_trace_destroy+0xbc/0x100 kernel/trace/trace_event_perf.c:236
>   tp_perf_event_destroy+0x15/0x20 kernel/events/core.c:7976
>   _free_event+0x3bd/0x10f0 kernel/events/core.c:4121
>   put_event+0x24/0x30 kernel/events/core.c:4204
>   perf_event_release_kernel+0x6e8/0xfc0 kernel/events/core.c:4310
>   perf_release+0x37/0x50 kernel/events/core.c:4320
>   __fput+0x327/0x7e0 fs/file_table.c:209
>   fput+0x15/0x20 fs/file_table.c:243
>   task_work_run+0x199/0x270 kernel/task_work.c:113
>   exit_task_work include/linux/task_work.h:22 [inline]
>   do_exit+0x9bb/0x1ad0 kernel/exit.c:865
>   do_group_exit+0x149/0x400 kernel/exit.c:968
>   get_signal+0x73a/0x16d0 kernel/signal.c:2469
>   do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
>   exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
>   prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
>   syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
>   do_syscall_64+0x6ec/0x940 arch/x86/entry/common.c:292
>   entry_SYSCALL_64_after_hwframe+0x42/0xb7
> RIP: 0033:0x455269
> RSP: 002b:7f8976371ce8 EFLAGS: 0246 ORIG_RAX: 00ca
> RAX:  RBX: 0072bec8 RCX: 00455269
> RDX:  RSI:  RDI: 0072bec8
> RBP: 0072bec8 R08:  R09: 0072bea0
> R10:  R11: 0246 R12: 
> R13: 7ffe793f79cf R14: 7f89763729c0 R15: 
> 
> Showing all locks held in the system:
> 2 locks held by khungtaskd/876:
>   #0:  (rcu_read_lock){}, at: [<8f2bec4b>]  
> check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
>   #0:  (rcu_read_lock){}, at: [<8f2bec4b>] watchdog+0x1c5/0xd60  
> kernel/hung_task.c:249
>   #1:  (tasklist_lock){.+.+}, at: [<06b3009f>]  
> debug_show_all_locks+0xd3/0x3d0 kernel/locking/lockdep.c:4470
> 2 locks held by getty/4414:
>   #0:  (>ldisc_sem){}, at: []  
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>   #1:  (>atomic_read_lock){+.+.}, at: [<762a7320>]  
> n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4415:
>   #0:  (>ldisc_sem){}, at: []  
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>   #1:  (>atomic_read_lock){+.+.}, at: [<762a7320>]  
> n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4416:
>   #0:  (>ldisc_sem){}, at: []  
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
>   #1:  (>atomic_read_lock){+.+.}, at: [<762a7320>]  
> n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4417:
>