Re: INFO: task hung in perf_trace_event_unreg
On Thu, Apr 12, 2018 at 11:39:42AM +0200, Dmitry Vyukov wrote:
> On Wed, Apr 11, 2018 at 9:36 PM, Paul E. McKenney
> wrote:
> >> >> >> >> wrote:
> >> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> >> > Hello,
> >> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> >> > syzbot hit the following crash on upstream commit
> >> >> >> >> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1
> >> >> >> >> >> >> >> >> > 21:20:27 2018 +)
> >> >> >> >> >> >> >> >> > Linux 4.16
> >> >> >> >> >> >> >> >> > syzbot dashboard link:
> >> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> >> > Unfortunately, I don't have any reproducer for this
> >> >> >> >> >> >> >> >> > crash yet.
> >> >> >> >> >> >> >> >> > Raw console output:
> >> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> >> >> >> >> >> >> >> >> > Kernel config:
> >> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> >> >> >> >> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
> >> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> >> > IMPORTANT: if you fix the bug, please add the
> >> >> >> >> >> >> >> >> > following tag to the commit:
> >> >> >> >> >> >> >> >> > Reported-by:
> >> >> >> >> >> >> >> >> > syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
> >> >> >> >> >> >> >> >> > It will help syzbot understand when the bug is
> >> >> >> >> >> >> >> >> > fixed. See footer for
> >> >> >> >> >> >> >> >> > details.
> >> >> >> >> >> >> >> >> > If you forward the report, please keep this part
> >> >> >> >> >> >> >> >> > and the footer.
> >> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> >> > REISERFS warning (device loop4): super-6502
> >> >> >> >> >> >> >> >> > reiserfs_getopt: unknown mount
> >> >> >> >> >> >> >> >> > option "g �;e�K�>pquota"
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> > Might not hurt to look into the above, though perhaps
> >> >> >> >> >> >> >> > this is just syzkaller
> >> >> >> >> >> >> >> > playing around with mount options.
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more
> >> >> >> >> >> >> >> >> > than 120 seconds.
> >> >> >> >> >> >> >> >> > Not tainted 4.16.0+ #10
> >> >> >> >> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> >> >> >> >> >> >> >> >> > disables this message.
> >> >> >> >> >> >> >> >> > syz-executor3 D20944 10803 4492 0x8002
> >> >> >> >> >> >> >> >> > Call Trace:
> >> >> >> >> >> >> >> >> > context_switch kernel/sched/core.c:2862 [inline]
> >> >> >> >> >> >> >> >> > __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >> >> >> >> >> >> >> >> > schedule+0xf5/0x430 kernel/sched/core.c:3499
> >> >> >> >> >> >> >> >> > schedule_timeout+0x1a3/0x230
> >> >> >> >> >> >> >> >> > kernel/time/timer.c:1777
> >> >> >> >> >> >> >> >> > do_wait_for_common kernel/sched/completion.c:86
> >> >> >> >> >> >> >> >> > [inline]
> >> >> >> >> >> >> >> >> > __wait_for_common kernel/sched/completion.c:107
> >> >> >> >> >> >> >> >> > [inline]
> >> >> >> >> >> >> >> >> > wait_for_common kernel/sched/completion.c:118
> >> >> >> >> >> >> >> >> > [inline]
> >> >> >> >> >> >> >> >> > wait_for_completion+0x415/0x770
> >> >> >> >> >> >> >> >> > kernel/sched/completion.c:139
> >> >> >> >> >> >> >> >> > __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >> >> >> >> >> >> >> >> > synchronize_sched.part.64+0xac/0x100
> >> >> >> >> >> >> >> >> > kernel/rcu/tree.c:3212
> >> >> >> >> >> >> >> >> > synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> >> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> >> I don't think this is a perf issue. Looks like
> >> >> >> >> >> >> >> >> something is preventing
> >> >> >> >> >> >> >> >> rcu_sched from completing. If there's a CPU that is
> >> >> >> >> >> >> >> >> running in kernel
> >> >> >> >> >> >> >> >> space and never scheduling, that can cause this
> >> >> >> >> >> >> >> >> issue. Or if RCU
> >> >> >> >> >> >> >> >> somehow missed a transition into idle or user space.
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> > The RCU CPU stall warning below strongly supports this
> >> >> >> >> >> >> >> > position ...
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> I think this is this guy then:
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > Seems likely to me!
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> Looking retrospectively at the various hang/stall bugs
> >> >> >> >> >> >> >> that we have, I
> >> >> >> >> >> >> >> think we need some kind of priority between them. I.e.
> >> >> >> >> >> >> >> we have rcu
> >> >> >> >> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs,
> >> >> >> >> >> >> >> silent machine
Re: INFO: task hung in perf_trace_event_unreg
On Wed, Apr 11, 2018 at 9:36 PM, Paul E. McKenney wrote:
>> >> >> >> wrote:
>> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> > Hello,
>> >> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >> > syzbot hit the following crash on upstream commit
>> >> >> >> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1
>> >> >> >> >> >> >> >> > 21:20:27 2018 +)
>> >> >> >> >> >> >> >> > Linux 4.16
>> >> >> >> >> >> >> >> > syzbot dashboard link:
>> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
>> >> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >> > Unfortunately, I don't have any reproducer for this
>> >> >> >> >> >> >> >> > crash yet.
>> >> >> >> >> >> >> >> > Raw console output:
>> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
>> >> >> >> >> >> >> >> > Kernel config:
>> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
>> >> >> >> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
>> >> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >> > IMPORTANT: if you fix the bug, please add the
>> >> >> >> >> >> >> >> > following tag to the commit:
>> >> >> >> >> >> >> >> > Reported-by:
>> >> >> >> >> >> >> >> > syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
>> >> >> >> >> >> >> >> > It will help syzbot understand when the bug is fixed.
>> >> >> >> >> >> >> >> > See footer for
>> >> >> >> >> >> >> >> > details.
>> >> >> >> >> >> >> >> > If you forward the report, please keep this part and
>> >> >> >> >> >> >> >> > the footer.
>> >> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >> > REISERFS warning (device loop4): super-6502
>> >> >> >> >> >> >> >> > reiserfs_getopt: unknown mount
>> >> >> >> >> >> >> >> > option "g �;e�K�>pquota"
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > Might not hurt to look into the above, though perhaps
>> >> >> >> >> >> >> > this is just syzkaller
>> >> >> >> >> >> >> > playing around with mount options.
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than
>> >> >> >> >> >> >> >> > 120 seconds.
>> >> >> >> >> >> >> >> > Not tainted 4.16.0+ #10
>> >> >> >> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> >> >> >> >> >> >> >> > disables this message.
>> >> >> >> >> >> >> >> > syz-executor3 D20944 10803 4492 0x8002
>> >> >> >> >> >> >> >> > Call Trace:
>> >> >> >> >> >> >> >> > context_switch kernel/sched/core.c:2862 [inline]
>> >> >> >> >> >> >> >> > __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>> >> >> >> >> >> >> >> > schedule+0xf5/0x430 kernel/sched/core.c:3499
>> >> >> >> >> >> >> >> > schedule_timeout+0x1a3/0x230
>> >> >> >> >> >> >> >> > kernel/time/timer.c:1777
>> >> >> >> >> >> >> >> > do_wait_for_common kernel/sched/completion.c:86
>> >> >> >> >> >> >> >> > [inline]
>> >> >> >> >> >> >> >> > __wait_for_common kernel/sched/completion.c:107
>> >> >> >> >> >> >> >> > [inline]
>> >> >> >> >> >> >> >> > wait_for_common kernel/sched/completion.c:118
>> >> >> >> >> >> >> >> > [inline]
>> >> >> >> >> >> >> >> > wait_for_completion+0x415/0x770
>> >> >> >> >> >> >> >> > kernel/sched/completion.c:139
>> >> >> >> >> >> >> >> > __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>> >> >> >> >> >> >> >> > synchronize_sched.part.64+0xac/0x100
>> >> >> >> >> >> >> >> > kernel/rcu/tree.c:3212
>> >> >> >> >> >> >> >> > synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
>> >> >> >> >> >> >> >>
>> >> >> >> >> >> >> >> I don't think this is a perf issue. Looks like
>> >> >> >> >> >> >> >> something is preventing
>> >> >> >> >> >> >> >> rcu_sched from completing. If there's a CPU that is
>> >> >> >> >> >> >> >> running in kernel
>> >> >> >> >> >> >> >> space and never scheduling, that can cause this issue.
>> >> >> >> >> >> >> >> Or if RCU
>> >> >> >> >> >> >> >> somehow missed a transition into idle or user space.
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > The RCU CPU stall warning below strongly supports this
>> >> >> >> >> >> >> > position ...
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> I think this is this guy then:
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
>> >> >> >> >> >> >
>> >> >> >> >> >> > Seems likely to me!
>> >> >> >> >> >> >
>> >> >> >> >> >> >> Looking retrospectively at the various hang/stall bugs
>> >> >> >> >> >> >> that we have, I
>> >> >> >> >> >> >> think we need some kind of priority between them. I.e. we
>> >> >> >> >> >> >> have rcu
>> >> >> >> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs,
>> >> >> >> >> >> >> silent machine
>> >> >> >> >> >> >> hang and maybe something else. It would be useful if they
>> >> >> >> >> >> >> fire
>> >> >> >> >> >> >> deterministically according to priorities. If there is an
>> >> >> >> >> >> >> rcu stall,
Re: INFO: task hung in perf_trace_event_unreg
On Wed, Apr 11, 2018 at 12:06:27PM +0200, Dmitry Vyukov wrote:
> On Tue, Apr 10, 2018 at 7:02 PM, Paul E. McKenney
> wrote:
> >> >> >> On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney
> >> >> >> wrote:
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> > Hello,
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> > syzbot hit the following crash on upstream commit
> >> >> >> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1
> >> >> >> >> >> >> >> > 21:20:27 2018 +)
> >> >> >> >> >> >> >> > Linux 4.16
> >> >> >> >> >> >> >> > syzbot dashboard link:
> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> > Unfortunately, I don't have any reproducer for this
> >> >> >> >> >> >> >> > crash yet.
> >> >> >> >> >> >> >> > Raw console output:
> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> >> >> >> >> >> >> >> > Kernel config:
> >> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> >> >> >> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> > IMPORTANT: if you fix the bug, please add the
> >> >> >> >> >> >> >> > following tag to the commit:
> >> >> >> >> >> >> >> > Reported-by:
> >> >> >> >> >> >> >> > syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
> >> >> >> >> >> >> >> > It will help syzbot understand when the bug is fixed.
> >> >> >> >> >> >> >> > See footer for
> >> >> >> >> >> >> >> > details.
> >> >> >> >> >> >> >> > If you forward the report, please keep this part and
> >> >> >> >> >> >> >> > the footer.
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> > REISERFS warning (device loop4): super-6502
> >> >> >> >> >> >> >> > reiserfs_getopt: unknown mount
> >> >> >> >> >> >> >> > option "g �;e�K�>pquota"
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > Might not hurt to look into the above, though perhaps
> >> >> >> >> >> >> > this is just syzkaller
> >> >> >> >> >> >> > playing around with mount options.
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than
> >> >> >> >> >> >> >> > 120 seconds.
> >> >> >> >> >> >> >> > Not tainted 4.16.0+ #10
> >> >> >> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> >> >> >> >> >> >> >> > disables this message.
> >> >> >> >> >> >> >> > syz-executor3 D20944 10803 4492 0x8002
> >> >> >> >> >> >> >> > Call Trace:
> >> >> >> >> >> >> >> > context_switch kernel/sched/core.c:2862 [inline]
> >> >> >> >> >> >> >> > __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >> >> >> >> >> >> >> > schedule+0xf5/0x430 kernel/sched/core.c:3499
> >> >> >> >> >> >> >> > schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> >> >> >> >> >> >> >> > do_wait_for_common kernel/sched/completion.c:86
> >> >> >> >> >> >> >> > [inline]
> >> >> >> >> >> >> >> > __wait_for_common kernel/sched/completion.c:107
> >> >> >> >> >> >> >> > [inline]
> >> >> >> >> >> >> >> > wait_for_common kernel/sched/completion.c:118
> >> >> >> >> >> >> >> > [inline]
> >> >> >> >> >> >> >> > wait_for_completion+0x415/0x770
> >> >> >> >> >> >> >> > kernel/sched/completion.c:139
> >> >> >> >> >> >> >> > __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >> >> >> >> >> >> >> > synchronize_sched.part.64+0xac/0x100
> >> >> >> >> >> >> >> > kernel/rcu/tree.c:3212
> >> >> >> >> >> >> >> > synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> I don't think this is a perf issue. Looks like something
> >> >> >> >> >> >> >> is preventing
> >> >> >> >> >> >> >> rcu_sched from completing. If there's a CPU that is
> >> >> >> >> >> >> >> running in kernel
> >> >> >> >> >> >> >> space and never scheduling, that can cause this issue.
> >> >> >> >> >> >> >> Or if RCU
> >> >> >> >> >> >> >> somehow missed a transition into idle or user space.
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > The RCU CPU stall warning below strongly supports this
> >> >> >> >> >> >> > position ...
> >> >> >> >> >> >>
> >> >> >> >> >> >> I think this is this guy then:
> >> >> >> >> >> >>
> >> >> >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
> >> >> >> >> >> >>
> >> >> >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
> >> >> >> >> >> >
> >> >> >> >> >> > Seems likely to me!
> >> >> >> >> >> >
> >> >> >> >> >> >> Looking retrospectively at the various hang/stall bugs that
> >> >> >> >> >> >> we have, I
> >> >> >> >> >> >> think we need some kind of priority between them. I.e. we
> >> >> >> >> >> >> have rcu
> >> >> >> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs,
> >> >> >> >> >> >> silent machine
> >> >> >> >> >> >> hang and maybe something else. It would be useful if they
> >> >> >> >> >> >> fire
> >> >> >> >> >> >> deterministically according to priorities. If there is an
> >> >> >> >> >> >> rcu stall,
Re: INFO: task hung in perf_trace_event_unreg
On Tue, Apr 10, 2018 at 7:02 PM, Paul E. McKenney wrote:
>> >> >> On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney
>> >> >> wrote:
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> > Hello,
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > syzbot hit the following crash on upstream commit
>> >> >> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1
>> >> >> >> >> >> >> > 21:20:27 2018 +)
>> >> >> >> >> >> >> > Linux 4.16
>> >> >> >> >> >> >> > syzbot dashboard link:
>> >> >> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > Unfortunately, I don't have any reproducer for this
>> >> >> >> >> >> >> > crash yet.
>> >> >> >> >> >> >> > Raw console output:
>> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
>> >> >> >> >> >> >> > Kernel config:
>> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
>> >> >> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > IMPORTANT: if you fix the bug, please add the following
>> >> >> >> >> >> >> > tag to the commit:
>> >> >> >> >> >> >> > Reported-by:
>> >> >> >> >> >> >> > syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
>> >> >> >> >> >> >> > It will help syzbot understand when the bug is fixed.
>> >> >> >> >> >> >> > See footer for
>> >> >> >> >> >> >> > details.
>> >> >> >> >> >> >> > If you forward the report, please keep this part and the
>> >> >> >> >> >> >> > footer.
>> >> >> >> >> >> >> >
>> >> >> >> >> >> >> > REISERFS warning (device loop4): super-6502
>> >> >> >> >> >> >> > reiserfs_getopt: unknown mount
>> >> >> >> >> >> >> > option "g �;e�K�>pquota"
>> >> >> >> >> >> >
>> >> >> >> >> >> > Might not hurt to look into the above, though perhaps this
>> >> >> >> >> >> > is just syzkaller
>> >> >> >> >> >> > playing around with mount options.
>> >> >> >> >> >> >
>> >> >> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120
>> >> >> >> >> >> >> > seconds.
>> >> >> >> >> >> >> > Not tainted 4.16.0+ #10
>> >> >> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> >> >> >> >> >> >> > disables this message.
>> >> >> >> >> >> >> > syz-executor3 D20944 10803 4492 0x8002
>> >> >> >> >> >> >> > Call Trace:
>> >> >> >> >> >> >> > context_switch kernel/sched/core.c:2862 [inline]
>> >> >> >> >> >> >> > __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>> >> >> >> >> >> >> > schedule+0xf5/0x430 kernel/sched/core.c:3499
>> >> >> >> >> >> >> > schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
>> >> >> >> >> >> >> > do_wait_for_common kernel/sched/completion.c:86
>> >> >> >> >> >> >> > [inline]
>> >> >> >> >> >> >> > __wait_for_common kernel/sched/completion.c:107
>> >> >> >> >> >> >> > [inline]
>> >> >> >> >> >> >> > wait_for_common kernel/sched/completion.c:118 [inline]
>> >> >> >> >> >> >> > wait_for_completion+0x415/0x770
>> >> >> >> >> >> >> > kernel/sched/completion.c:139
>> >> >> >> >> >> >> > __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>> >> >> >> >> >> >> > synchronize_sched.part.64+0xac/0x100
>> >> >> >> >> >> >> > kernel/rcu/tree.c:3212
>> >> >> >> >> >> >> > synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> I don't think this is a perf issue. Looks like something
>> >> >> >> >> >> >> is preventing
>> >> >> >> >> >> >> rcu_sched from completing. If there's a CPU that is
>> >> >> >> >> >> >> running in kernel
>> >> >> >> >> >> >> space and never scheduling, that can cause this issue. Or
>> >> >> >> >> >> >> if RCU
>> >> >> >> >> >> >> somehow missed a transition into idle or user space.
>> >> >> >> >> >> >
>> >> >> >> >> >> > The RCU CPU stall warning below strongly supports this
>> >> >> >> >> >> > position ...
>> >> >> >> >> >>
>> >> >> >> >> >> I think this is this guy then:
>> >> >> >> >> >>
>> >> >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
>> >> >> >> >> >>
>> >> >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes
>> >> >> >> >> >
>> >> >> >> >> > Seems likely to me!
>> >> >> >> >> >
>> >> >> >> >> >> Looking retrospectively at the various hang/stall bugs that
>> >> >> >> >> >> we have, I
>> >> >> >> >> >> think we need some kind of priority between them. I.e. we
>> >> >> >> >> >> have rcu
>> >> >> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent
>> >> >> >> >> >> machine
>> >> >> >> >> >> hang and maybe something else. It would be useful if they fire
>> >> >> >> >> >> deterministically according to priorities. If there is an rcu
>> >> >> >> >> >> stall,
>> >> >> >> >> >> that's always detected as CPU stall. Then if there is no RCU
>> >> >> >> >> >> stall,
>> >> >> >> >> >> but a workqueue stall, then that's always detected as
>> >> >> >> >> >> workqueue stall,
>> >> >> >> >> >> etc.
Re: INFO: task hung in perf_trace_event_unreg
On Tue, Apr 10, 2018 at 7:02 PM, Paul E. McKenney wrote: >> >> >> On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney >> >> >> wrote: >> >> >> >> >> >> >> >> >> >> >> >> >> >> > Hello, >> >> >> >> >> >> >> > >> >> >> >> >> >> >> > syzbot hit the following crash on upstream commit >> >> >> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 >> >> >> >> >> >> >> > 21:20:27 2018 +) >> >> >> >> >> >> >> > Linux 4.16 >> >> >> >> >> >> >> > syzbot dashboard link: >> >> >> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd >> >> >> >> >> >> >> > >> >> >> >> >> >> >> > Unfortunately, I don't have any reproducer for this >> >> >> >> >> >> >> > crash yet. >> >> >> >> >> >> >> > Raw console output: >> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400 >> >> >> >> >> >> >> > Kernel config: >> >> >> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459 >> >> >> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620 >> >> >> >> >> >> >> > >> >> >> >> >> >> >> > IMPORTANT: if you fix the bug, please add the following >> >> >> >> >> >> >> > tag to the commit: >> >> >> >> >> >> >> > Reported-by: >> >> >> >> >> >> >> > syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com >> >> >> >> >> >> >> > It will help syzbot understand when the bug is fixed. >> >> >> >> >> >> >> > See footer for >> >> >> >> >> >> >> > details. >> >> >> >> >> >> >> > If you forward the report, please keep this part and the >> >> >> >> >> >> >> > footer. >> >> >> >> >> >> >> > >> >> >> >> >> >> >> > REISERFS warning (device loop4): super-6502 >> >> >> >> >> >> >> > reiserfs_getopt: unknown mount >> >> >> >> >> >> >> > option "g �;e�K�>pquota" >> >> >> >> >> >> > >> >> >> >> >> >> > Might not hurt to look into the above, though perhaps this >> >> >> >> >> >> > is just syzkaller >> >> >> >> >> >> > playing around with mount options. 
>> >> >> >> >> >> > >> >> >> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 >> >> >> >> >> >> >> > seconds. >> >> >> >> >> >> >> >Not tainted 4.16.0+ #10 >> >> >> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >> >> >> >> >> >> >> > disables this message. >> >> >> >> >> >> >> > syz-executor3 D20944 10803 4492 0x8002 >> >> >> >> >> >> >> > Call Trace: >> >> >> >> >> >> >> > context_switch kernel/sched/core.c:2862 [inline] >> >> >> >> >> >> >> > __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440 >> >> >> >> >> >> >> > schedule+0xf5/0x430 kernel/sched/core.c:3499 >> >> >> >> >> >> >> > schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777 >> >> >> >> >> >> >> > do_wait_for_common kernel/sched/completion.c:86 >> >> >> >> >> >> >> > [inline] >> >> >> >> >> >> >> > __wait_for_common kernel/sched/completion.c:107 >> >> >> >> >> >> >> > [inline] >> >> >> >> >> >> >> > wait_for_common kernel/sched/completion.c:118 [inline] >> >> >> >> >> >> >> > wait_for_completion+0x415/0x770 >> >> >> >> >> >> >> > kernel/sched/completion.c:139 >> >> >> >> >> >> >> > __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414 >> >> >> >> >> >> >> > synchronize_sched.part.64+0xac/0x100 >> >> >> >> >> >> >> > kernel/rcu/tree.c:3212 >> >> >> >> >> >> >> > synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213 >> >> >> >> >> >> >> >> >> >> >> >> >> >> I don't think this is a perf issue. Looks like something >> >> >> >> >> >> >> is preventing >> >> >> >> >> >> >> rcu_sched from completing. If there's a CPU that is >> >> >> >> >> >> >> running in kernel >> >> >> >> >> >> >> space and never scheduling, that can cause this issue. Or >> >> >> >> >> >> >> if RCU >> >> >> >> >> >> >> somehow missed a transition into idle or user space. >> >> >> >> >> >> > >> >> >> >> >> >> > The RCU CPU stall warning below strongly supports this >> >> >> >> >> >> > position ... 
>> >> >> >> >> >> >> >> >> >> >> >> I think this is this guy then: >> >> >> >> >> >> >> >> >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40 >> >> >> >> >> >> >> >> >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes >> >> >> >> >> > >> >> >> >> >> > Seems likely to me! >> >> >> >> >> > >> >> >> >> >> >> Looking retrospectively at the various hang/stall bugs that >> >> >> >> >> >> we have, I >> >> >> >> >> >> think we need some kind of priority between them. I.e. we >> >> >> >> >> >> have rcu >> >> >> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent >> >> >> >> >> >> machine >> >> >> >> >> >> hang and maybe something else. It would be useful if they fire >> >> >> >> >> >> deterministically according to priorities. If there is an rcu >> >> >> >> >> >> stall, >> >> >> >> >> >> that's always detected as CPU stall. Then if there is no RCU >> >> >> >> >> >> stall, >> >> >> >> >> >> but a workqueue stall, then that's always detected as >> >> >> >> >> >> workqueue stall, >> >> >> >> >> >> etc. >> >> >> >> >> >> Currently if we have an RCU stall (effectively CPU
Re: INFO: task hung in perf_trace_event_unreg
On Tue, Apr 10, 2018 at 01:13:13PM +0200, Dmitry Vyukov wrote: > On Mon, Apr 9, 2018 at 8:11 PM, Paul E. McKenney > wrote: > > On Mon, Apr 09, 2018 at 06:28:16PM +0200, Dmitry Vyukov wrote: > >> On Mon, Apr 9, 2018 at 6:20 PM, Paul E. McKenney > >> wrote: > >> > On Mon, Apr 09, 2018 at 02:54:20PM +0200, Dmitry Vyukov wrote: > >> >> On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney > >> >> wrote: > >> >> >> >> >> >> > >> >> >> >> >> >> > Hello, > >> >> >> >> >> >> > > >> >> >> >> >> >> > syzbot hit the following crash on upstream commit > >> >> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 > >> >> >> >> >> >> > 21:20:27 2018 +) > >> >> >> >> >> >> > Linux 4.16 > >> >> >> >> >> >> > syzbot dashboard link: > >> >> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd > >> >> >> >> >> >> > > >> >> >> >> >> >> > Unfortunately, I don't have any reproducer for this crash > >> >> >> >> >> >> > yet. > >> >> >> >> >> >> > Raw console output: > >> >> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400 > >> >> >> >> >> >> > Kernel config: > >> >> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459 > >> >> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620 > >> >> >> >> >> >> > > >> >> >> >> >> >> > IMPORTANT: if you fix the bug, please add the following > >> >> >> >> >> >> > tag to the commit: > >> >> >> >> >> >> > Reported-by: > >> >> >> >> >> >> > syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com > >> >> >> >> >> >> > It will help syzbot understand when the bug is fixed. See > >> >> >> >> >> >> > footer for > >> >> >> >> >> >> > details. > >> >> >> >> >> >> > If you forward the report, please keep this part and the > >> >> >> >> >> >> > footer. 
> >> >> >> >> >> >> > > >> >> >> >> >> >> > REISERFS warning (device loop4): super-6502 > >> >> >> >> >> >> > reiserfs_getopt: unknown mount > >> >> >> >> >> >> > option "g �;e�K�>pquota" > >> >> >> >> >> > > >> >> >> >> >> > Might not hurt to look into the above, though perhaps this > >> >> >> >> >> > is just syzkaller > >> >> >> >> >> > playing around with mount options. > >> >> >> >> >> > > >> >> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 > >> >> >> >> >> >> > seconds. > >> >> >> >> >> >> >Not tainted 4.16.0+ #10 > >> >> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > >> >> >> >> >> >> > disables this message. > >> >> >> >> >> >> > syz-executor3 D20944 10803 4492 0x8002 > >> >> >> >> >> >> > Call Trace: > >> >> >> >> >> >> > context_switch kernel/sched/core.c:2862 [inline] > >> >> >> >> >> >> > __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440 > >> >> >> >> >> >> > schedule+0xf5/0x430 kernel/sched/core.c:3499 > >> >> >> >> >> >> > schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777 > >> >> >> >> >> >> > do_wait_for_common kernel/sched/completion.c:86 [inline] > >> >> >> >> >> >> > __wait_for_common kernel/sched/completion.c:107 [inline] > >> >> >> >> >> >> > wait_for_common kernel/sched/completion.c:118 [inline] > >> >> >> >> >> >> > wait_for_completion+0x415/0x770 > >> >> >> >> >> >> > kernel/sched/completion.c:139 > >> >> >> >> >> >> > __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414 > >> >> >> >> >> >> > synchronize_sched.part.64+0xac/0x100 > >> >> >> >> >> >> > kernel/rcu/tree.c:3212 > >> >> >> >> >> >> > synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213 > >> >> >> >> >> >> > >> >> >> >> >> >> I don't think this is a perf issue. Looks like something is > >> >> >> >> >> >> preventing > >> >> >> >> >> >> rcu_sched from completing. If there's a CPU that is running > >> >> >> >> >> >> in kernel > >> >> >> >> >> >> space and never scheduling, that can cause this issue. 
Or > >> >> >> >> >> >> if RCU > >> >> >> >> >> >> somehow missed a transition into idle or user space. > >> >> >> >> >> > > >> >> >> >> >> > The RCU CPU stall warning below strongly supports this > >> >> >> >> >> > position ... > >> >> >> >> >> > >> >> >> >> >> I think this is this guy then: > >> >> >> >> >> > >> >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40 > >> >> >> >> >> > >> >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes > >> >> >> >> > > >> >> >> >> > Seems likely to me! > >> >> >> >> > > >> >> >> >> >> Looking retrospectively at the various hang/stall bugs that we > >> >> >> >> >> have, I > >> >> >> >> >> think we need some kind of priority between them. I.e. we have > >> >> >> >> >> rcu > >> >> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent > >> >> >> >> >> machine > >> >> >> >> >> hang and maybe something else. It would be useful if they fire > >> >> >> >> >> deterministically according to priorities. If there is an rcu > >> >> >> >> >> stall, > >> >> >> >> >> that's always detected as CPU stall. Then if there is no RCU > >> >> >> >> >> stall, > >> >> >> >> >> but a workqueue stall, then that's always detected
Re: INFO: task hung in perf_trace_event_unreg
On Mon, Apr 9, 2018 at 8:11 PM, Paul E. McKenney wrote: > On Mon, Apr 09, 2018 at 06:28:16PM +0200, Dmitry Vyukov wrote: >> On Mon, Apr 9, 2018 at 6:20 PM, Paul E. McKenney >> wrote: >> > On Mon, Apr 09, 2018 at 02:54:20PM +0200, Dmitry Vyukov wrote: >> >> On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney >> >> wrote: >> >> >> >> >> >> >> >> >> >> >> >> > Hello, >> >> >> >> >> >> > >> >> >> >> >> >> > syzbot hit the following crash on upstream commit >> >> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 >> >> >> >> >> >> > 21:20:27 2018 +) >> >> >> >> >> >> > Linux 4.16 >> >> >> >> >> >> > syzbot dashboard link: >> >> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd >> >> >> >> >> >> > >> >> >> >> >> >> > Unfortunately, I don't have any reproducer for this crash >> >> >> >> >> >> > yet. >> >> >> >> >> >> > Raw console output: >> >> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400 >> >> >> >> >> >> > Kernel config: >> >> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459 >> >> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620 >> >> >> >> >> >> > >> >> >> >> >> >> > IMPORTANT: if you fix the bug, please add the following tag >> >> >> >> >> >> > to the commit: >> >> >> >> >> >> > Reported-by: >> >> >> >> >> >> > syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com >> >> >> >> >> >> > It will help syzbot understand when the bug is fixed. See >> >> >> >> >> >> > footer for >> >> >> >> >> >> > details. >> >> >> >> >> >> > If you forward the report, please keep this part and the >> >> >> >> >> >> > footer. >> >> >> >> >> >> > >> >> >> >> >> >> > REISERFS warning (device loop4): super-6502 >> >> >> >> >> >> > reiserfs_getopt: unknown mount >> >> >> >> >> >> > option "g �;e�K�>pquota" >> >> >> >> >> > >> >> >> >> >> > Might not hurt to look into the above, though perhaps this is >> >> >> >> >> > just syzkaller >> >> >> >> >> > playing around with mount options. 
>> >> >> >> >> > >> >> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 >> >> >> >> >> >> > seconds. >> >> >> >> >> >> >Not tainted 4.16.0+ #10 >> >> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables >> >> >> >> >> >> > this message. >> >> >> >> >> >> > syz-executor3 D20944 10803 4492 0x8002 >> >> >> >> >> >> > Call Trace: >> >> >> >> >> >> > context_switch kernel/sched/core.c:2862 [inline] >> >> >> >> >> >> > __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440 >> >> >> >> >> >> > schedule+0xf5/0x430 kernel/sched/core.c:3499 >> >> >> >> >> >> > schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777 >> >> >> >> >> >> > do_wait_for_common kernel/sched/completion.c:86 [inline] >> >> >> >> >> >> > __wait_for_common kernel/sched/completion.c:107 [inline] >> >> >> >> >> >> > wait_for_common kernel/sched/completion.c:118 [inline] >> >> >> >> >> >> > wait_for_completion+0x415/0x770 >> >> >> >> >> >> > kernel/sched/completion.c:139 >> >> >> >> >> >> > __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414 >> >> >> >> >> >> > synchronize_sched.part.64+0xac/0x100 >> >> >> >> >> >> > kernel/rcu/tree.c:3212 >> >> >> >> >> >> > synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213 >> >> >> >> >> >> >> >> >> >> >> >> I don't think this is a perf issue. Looks like something is >> >> >> >> >> >> preventing >> >> >> >> >> >> rcu_sched from completing. If there's a CPU that is running >> >> >> >> >> >> in kernel >> >> >> >> >> >> space and never scheduling, that can cause this issue. Or if >> >> >> >> >> >> RCU >> >> >> >> >> >> somehow missed a transition into idle or user space. >> >> >> >> >> > >> >> >> >> >> > The RCU CPU stall warning below strongly supports this >> >> >> >> >> > position ... 
>> >> >> >> >> >> >> >> >> >> I think this is this guy then: >> >> >> >> >> >> >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40 >> >> >> >> >> >> >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes >> >> >> >> > >> >> >> >> > Seems likely to me! >> >> >> >> > >> >> >> >> >> Looking retrospectively at the various hang/stall bugs that we >> >> >> >> >> have, I >> >> >> >> >> think we need some kind of priority between them. I.e. we have >> >> >> >> >> rcu >> >> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent >> >> >> >> >> machine >> >> >> >> >> hang and maybe something else. It would be useful if they fire >> >> >> >> >> deterministically according to priorities. If there is an rcu >> >> >> >> >> stall, >> >> >> >> >> that's always detected as CPU stall. Then if there is no RCU >> >> >> >> >> stall, >> >> >> >> >> but a workqueue stall, then that's always detected as workqueue >> >> >> >> >> stall, >> >> >> >> >> etc. >> >> >> >> >> Currently if we have an RCU stall (effectively CPU stall), that >> >> >> >> >> can be >> >> >> >> >> detected either RCU stall or a task hung, producing 2 different >> >> >> >> >>
Re: INFO: task hung in perf_trace_event_unreg
On Mon, Apr 09, 2018 at 06:28:16PM +0200, Dmitry Vyukov wrote: > On Mon, Apr 9, 2018 at 6:20 PM, Paul E. McKenney > wrote: > > On Mon, Apr 09, 2018 at 02:54:20PM +0200, Dmitry Vyukov wrote: > >> On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney > >> wrote: > >> >> >> >> >> > >> >> >> >> >> > Hello, > >> >> >> >> >> > > >> >> >> >> >> > syzbot hit the following crash on upstream commit > >> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 > >> >> >> >> >> > 2018 +) > >> >> >> >> >> > Linux 4.16 > >> >> >> >> >> > syzbot dashboard link: > >> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd > >> >> >> >> >> > > >> >> >> >> >> > Unfortunately, I don't have any reproducer for this crash > >> >> >> >> >> > yet. > >> >> >> >> >> > Raw console output: > >> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400 > >> >> >> >> >> > Kernel config: > >> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459 > >> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620 > >> >> >> >> >> > > >> >> >> >> >> > IMPORTANT: if you fix the bug, please add the following tag > >> >> >> >> >> > to the commit: > >> >> >> >> >> > Reported-by: > >> >> >> >> >> > syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com > >> >> >> >> >> > It will help syzbot understand when the bug is fixed. See > >> >> >> >> >> > footer for > >> >> >> >> >> > details. > >> >> >> >> >> > If you forward the report, please keep this part and the > >> >> >> >> >> > footer. > >> >> >> >> >> > > >> >> >> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: > >> >> >> >> >> > unknown mount > >> >> >> >> >> > option "g �;e�K�>pquota" > >> >> >> >> > > >> >> >> >> > Might not hurt to look into the above, though perhaps this is > >> >> >> >> > just syzkaller > >> >> >> >> > playing around with mount options. 
> >> >> >> >> > > >> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 > >> >> >> >> >> > seconds. > >> >> >> >> >> >Not tainted 4.16.0+ #10 > >> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > >> >> >> >> >> > this message. > >> >> >> >> >> > syz-executor3 D20944 10803 4492 0x8002 > >> >> >> >> >> > Call Trace: > >> >> >> >> >> > context_switch kernel/sched/core.c:2862 [inline] > >> >> >> >> >> > __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440 > >> >> >> >> >> > schedule+0xf5/0x430 kernel/sched/core.c:3499 > >> >> >> >> >> > schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777 > >> >> >> >> >> > do_wait_for_common kernel/sched/completion.c:86 [inline] > >> >> >> >> >> > __wait_for_common kernel/sched/completion.c:107 [inline] > >> >> >> >> >> > wait_for_common kernel/sched/completion.c:118 [inline] > >> >> >> >> >> > wait_for_completion+0x415/0x770 > >> >> >> >> >> > kernel/sched/completion.c:139 > >> >> >> >> >> > __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414 > >> >> >> >> >> > synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212 > >> >> >> >> >> > synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213 > >> >> >> >> >> > >> >> >> >> >> I don't think this is a perf issue. Looks like something is > >> >> >> >> >> preventing > >> >> >> >> >> rcu_sched from completing. If there's a CPU that is running in > >> >> >> >> >> kernel > >> >> >> >> >> space and never scheduling, that can cause this issue. Or if > >> >> >> >> >> RCU > >> >> >> >> >> somehow missed a transition into idle or user space. > >> >> >> >> > > >> >> >> >> > The RCU CPU stall warning below strongly supports this position > >> >> >> >> > ... > >> >> >> >> > >> >> >> >> I think this is this guy then: > >> >> >> >> > >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40 > >> >> >> >> > >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes > >> >> >> > > >> >> >> > Seems likely to me! 
> >> >> >> > > >> >> >> >> Looking retrospectively at the various hang/stall bugs that we > >> >> >> >> have, I > >> >> >> >> think we need some kind of priority between them. I.e. we have rcu > >> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent > >> >> >> >> machine > >> >> >> >> hang and maybe something else. It would be useful if they fire > >> >> >> >> deterministically according to priorities. If there is an rcu > >> >> >> >> stall, > >> >> >> >> that's always detected as CPU stall. Then if there is no RCU > >> >> >> >> stall, > >> >> >> >> but a workqueue stall, then that's always detected as workqueue > >> >> >> >> stall, > >> >> >> >> etc. > >> >> >> >> Currently if we have an RCU stall (effectively CPU stall), that > >> >> >> >> can be > >> >> >> >> detected either RCU stall or a task hung, producing 2 different > >> >> >> >> bug > >> >> >> >> reports (which is bad). > >> >> >> >> One can say that it's only a matter of tuning timeouts, but at > >> >> >> >> least > >> >> >> >> task hung detector has a problem that if
Re: INFO: task hung in perf_trace_event_unreg
On Mon, Apr 9, 2018 at 6:20 PM, Paul E. McKenney wrote: > On Mon, Apr 09, 2018 at 02:54:20PM +0200, Dmitry Vyukov wrote: >> On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney >> wrote: >> >> >> >> >> >> >> >> >> >> > Hello, >> >> >> >> >> > >> >> >> >> >> > syzbot hit the following crash on upstream commit >> >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 >> >> >> >> >> > 2018 +) >> >> >> >> >> > Linux 4.16 >> >> >> >> >> > syzbot dashboard link: >> >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd >> >> >> >> >> > >> >> >> >> >> > Unfortunately, I don't have any reproducer for this crash yet. >> >> >> >> >> > Raw console output: >> >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400 >> >> >> >> >> > Kernel config: >> >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459 >> >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620 >> >> >> >> >> > >> >> >> >> >> > IMPORTANT: if you fix the bug, please add the following tag to >> >> >> >> >> > the commit: >> >> >> >> >> > Reported-by: >> >> >> >> >> > syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com >> >> >> >> >> > It will help syzbot understand when the bug is fixed. See >> >> >> >> >> > footer for >> >> >> >> >> > details. >> >> >> >> >> > If you forward the report, please keep this part and the >> >> >> >> >> > footer. >> >> >> >> >> > >> >> >> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: >> >> >> >> >> > unknown mount >> >> >> >> >> > option "g �;e�K�>pquota" >> >> >> >> > >> >> >> >> > Might not hurt to look into the above, though perhaps this is >> >> >> >> > just syzkaller >> >> >> >> > playing around with mount options. >> >> >> >> > >> >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 >> >> >> >> >> > seconds. >> >> >> >> >> >Not tainted 4.16.0+ #10 >> >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables >> >> >> >> >> > this message. 
>> >> >> >> >> > syz-executor3 D20944 10803 4492 0x8002 >> >> >> >> >> > Call Trace: >> >> >> >> >> > context_switch kernel/sched/core.c:2862 [inline] >> >> >> >> >> > __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440 >> >> >> >> >> > schedule+0xf5/0x430 kernel/sched/core.c:3499 >> >> >> >> >> > schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777 >> >> >> >> >> > do_wait_for_common kernel/sched/completion.c:86 [inline] >> >> >> >> >> > __wait_for_common kernel/sched/completion.c:107 [inline] >> >> >> >> >> > wait_for_common kernel/sched/completion.c:118 [inline] >> >> >> >> >> > wait_for_completion+0x415/0x770 kernel/sched/completion.c:139 >> >> >> >> >> > __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414 >> >> >> >> >> > synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212 >> >> >> >> >> > synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213 >> >> >> >> >> >> >> >> >> >> I don't think this is a perf issue. Looks like something is >> >> >> >> >> preventing >> >> >> >> >> rcu_sched from completing. If there's a CPU that is running in >> >> >> >> >> kernel >> >> >> >> >> space and never scheduling, that can cause this issue. Or if RCU >> >> >> >> >> somehow missed a transition into idle or user space. >> >> >> >> > >> >> >> >> > The RCU CPU stall warning below strongly supports this position >> >> >> >> > ... >> >> >> >> >> >> >> >> I think this is this guy then: >> >> >> >> >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40 >> >> >> >> >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes >> >> >> > >> >> >> > Seems likely to me! >> >> >> > >> >> >> >> Looking retrospectively at the various hang/stall bugs that we >> >> >> >> have, I >> >> >> >> think we need some kind of priority between them. I.e. we have rcu >> >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine >> >> >> >> hang and maybe something else. 
It would be useful if they fire >> >> >> >> deterministically according to priorities. If there is an rcu stall, >> >> >> >> that's always detected as CPU stall. Then if there is no RCU stall, >> >> >> >> but a workqueue stall, then that's always detected as workqueue >> >> >> >> stall, >> >> >> >> etc. >> >> >> >> Currently if we have an RCU stall (effectively CPU stall), that can >> >> >> >> be >> >> >> >> detected either RCU stall or a task hung, producing 2 different bug >> >> >> >> reports (which is bad). >> >> >> >> One can say that it's only a matter of tuning timeouts, but at least >> >> >> >> task hung detector has a problem that if you set timeout to X, it >> >> >> >> can >> >> >> >> detect hung anywhere between X and 2*X. And on one hand we need >> >> >> >> quite >> >> >> >> large timeout (a minute may not be enough), and on the other hand we >> >> >> >> can't wait for an hour just to make sure that the machine is indeed >> >> >> >> dead (these things happen every few minutes). >> >> >> > >> >> >> > I
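The X to 2*X window described in the quoted text can be illustrated with a small user-space model. This is only a sketch in Python, not kernel code. It assumes the detector scans once every `timeout` seconds and reports a task only on the second consecutive scan that finds it blocked, with no context switch in between. The function name `detection_delay` and the scan schedule are hypothetical, chosen for illustration.

```python
def detection_delay(block_time, timeout):
    """Return seconds between a task blocking and the hang being reported.

    Assumed model (not taken from the kernel source): scans run at
    t = timeout, 2*timeout, 3*timeout, ...  The first scan after the task
    blocks only records that it is blocked; the following scan reports it.
    """
    # First scan strictly after the task blocked: the task is noted, not reported.
    first_seen = (block_time // timeout + 1) * timeout
    # One full scan period later the task is still blocked, so it is reported.
    reported_at = first_seen + timeout
    return reported_at - block_time


if __name__ == "__main__":
    timeout = 120
    for block_time in (0, 60, 119):
        print(block_time, detection_delay(block_time, timeout))
```

With a 120-second timeout, a task that blocks right after a scan is reported 240 seconds later, while one that blocks one second before a scan is reported after 121 seconds. Under these assumptions the delay always falls in the range (X, 2*X].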
Re: INFO: task hung in perf_trace_event_unreg
On Mon, Apr 09, 2018 at 02:54:20PM +0200, Dmitry Vyukov wrote: > On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney > wrote: > >> >> >> >> > >> >> >> >> > Hello, > >> >> >> >> > > >> >> >> >> > syzbot hit the following crash on upstream commit > >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 > >> >> >> >> > 2018 +) > >> >> >> >> > Linux 4.16 > >> >> >> >> > syzbot dashboard link: > >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd > >> >> >> >> > > >> >> >> >> > Unfortunately, I don't have any reproducer for this crash yet. > >> >> >> >> > Raw console output: > >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400 > >> >> >> >> > Kernel config: > >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459 > >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620 > >> >> >> >> > > >> >> >> >> > IMPORTANT: if you fix the bug, please add the following tag to > >> >> >> >> > the commit: > >> >> >> >> > Reported-by: > >> >> >> >> > syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com > >> >> >> >> > It will help syzbot understand when the bug is fixed. See > >> >> >> >> > footer for > >> >> >> >> > details. > >> >> >> >> > If you forward the report, please keep this part and the footer. > >> >> >> >> > > >> >> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: > >> >> >> >> > unknown mount > >> >> >> >> > option "g �;e�K�>pquota" > >> >> >> > > >> >> >> > Might not hurt to look into the above, though perhaps this is just > >> >> >> > syzkaller > >> >> >> > playing around with mount options. > >> >> >> > > >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 > >> >> >> >> > seconds. > >> >> >> >> >Not tainted 4.16.0+ #10 > >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > >> >> >> >> > this message. 
> >> >> >> >> > syz-executor3 D20944 10803 4492 0x8002 > >> >> >> >> > Call Trace: > >> >> >> >> > context_switch kernel/sched/core.c:2862 [inline] > >> >> >> >> > __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440 > >> >> >> >> > schedule+0xf5/0x430 kernel/sched/core.c:3499 > >> >> >> >> > schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777 > >> >> >> >> > do_wait_for_common kernel/sched/completion.c:86 [inline] > >> >> >> >> > __wait_for_common kernel/sched/completion.c:107 [inline] > >> >> >> >> > wait_for_common kernel/sched/completion.c:118 [inline] > >> >> >> >> > wait_for_completion+0x415/0x770 kernel/sched/completion.c:139 > >> >> >> >> > __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414 > >> >> >> >> > synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212 > >> >> >> >> > synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213 > >> >> >> >> > >> >> >> >> I don't think this is a perf issue. Looks like something is > >> >> >> >> preventing > >> >> >> >> rcu_sched from completing. If there's a CPU that is running in > >> >> >> >> kernel > >> >> >> >> space and never scheduling, that can cause this issue. Or if RCU > >> >> >> >> somehow missed a transition into idle or user space. > >> >> >> > > >> >> >> > The RCU CPU stall warning below strongly supports this position ... > >> >> >> > >> >> >> I think this is this guy then: > >> >> >> > >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40 > >> >> >> > >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes > >> >> > > >> >> > Seems likely to me! > >> >> > > >> >> >> Looking retrospectively at the various hang/stall bugs that we have, > >> >> >> I > >> >> >> think we need some kind of priority between them. I.e. we have rcu > >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine > >> >> >> hang and maybe something else. It would be useful if they fire > >> >> >> deterministically according to priorities. 
If there is an rcu stall, > >> >> >> that's always detected as CPU stall. Then if there is no RCU stall, > >> >> >> but a workqueue stall, then that's always detected as workqueue > >> >> >> stall, > >> >> >> etc. > >> >> >> Currently if we have an RCU stall (effectively CPU stall), that can > >> >> >> be > >> >> >> detected either RCU stall or a task hung, producing 2 different bug > >> >> >> reports (which is bad). > >> >> >> One can say that it's only a matter of tuning timeouts, but at least > >> >> >> task hung detector has a problem that if you set timeout to X, it can > >> >> >> detect hung anywhere between X and 2*X. And on one hand we need quite > >> >> >> large timeout (a minute may not be enough), and on the other hand we > >> >> >> can't wait for an hour just to make sure that the machine is indeed > >> >> >> dead (these things happen every few minutes). > >> >> > > >> >> > I suppose that we could have a global variable that was set to the > >> >> > priority of the complaint in question, which would suppress all > >> >> > lower-priority complaints. Might need to be opt-in, though -- I
Re: INFO: task hung in perf_trace_event_unreg
On Mon, Apr 2, 2018 at 7:23 PM, Paul E. McKenney wrote: >> >> >> >> >> >> >> >> > Hello, >> >> >> >> > >> >> >> >> > syzbot hit the following crash on upstream commit >> >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 >> >> >> >> > +) >> >> >> >> > Linux 4.16 >> >> >> >> > syzbot dashboard link: >> >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd >> >> >> >> > >> >> >> >> > Unfortunately, I don't have any reproducer for this crash yet. >> >> >> >> > Raw console output: >> >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400 >> >> >> >> > Kernel config: >> >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459 >> >> >> >> > compiler: gcc (GCC) 7.1.1 20170620 >> >> >> >> > >> >> >> >> > IMPORTANT: if you fix the bug, please add the following tag to >> >> >> >> > the commit: >> >> >> >> > Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com >> >> >> >> > It will help syzbot understand when the bug is fixed. See footer >> >> >> >> > for >> >> >> >> > details. >> >> >> >> > If you forward the report, please keep this part and the footer. >> >> >> >> > >> >> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: >> >> >> >> > unknown mount >> >> >> >> > option "g �;e�K�>pquota" >> >> >> > >> >> >> > Might not hurt to look into the above, though perhaps this is just >> >> >> > syzkaller >> >> >> > playing around with mount options. >> >> >> > >> >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds. >> >> >> >> >Not tainted 4.16.0+ #10 >> >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this >> >> >> >> > message. 
>> >> >> >> > syz-executor3 D20944 10803 4492 0x8002 >> >> >> >> > Call Trace: >> >> >> >> > context_switch kernel/sched/core.c:2862 [inline] >> >> >> >> > __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440 >> >> >> >> > schedule+0xf5/0x430 kernel/sched/core.c:3499 >> >> >> >> > schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777 >> >> >> >> > do_wait_for_common kernel/sched/completion.c:86 [inline] >> >> >> >> > __wait_for_common kernel/sched/completion.c:107 [inline] >> >> >> >> > wait_for_common kernel/sched/completion.c:118 [inline] >> >> >> >> > wait_for_completion+0x415/0x770 kernel/sched/completion.c:139 >> >> >> >> > __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414 >> >> >> >> > synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212 >> >> >> >> > synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213 >> >> >> >> >> >> >> >> I don't think this is a perf issue. Looks like something is >> >> >> >> preventing >> >> >> >> rcu_sched from completing. If there's a CPU that is running in >> >> >> >> kernel >> >> >> >> space and never scheduling, that can cause this issue. Or if RCU >> >> >> >> somehow missed a transition into idle or user space. >> >> >> > >> >> >> > The RCU CPU stall warning below strongly supports this position ... >> >> >> >> >> >> I think this is this guy then: >> >> >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40 >> >> >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes >> >> > >> >> > Seems likely to me! >> >> > >> >> >> Looking retrospectively at the various hang/stall bugs that we have, I >> >> >> think we need some kind of priority between them. I.e. we have rcu >> >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine >> >> >> hang and maybe something else. It would be useful if they fire >> >> >> deterministically according to priorities. If there is an rcu stall, >> >> >> that's always detected as CPU stall. 
Then if there is no RCU stall, >> >> >> but a workqueue stall, then that's always detected as workqueue stall, >> >> >> etc. >> >> >> Currently if we have an RCU stall (effectively CPU stall), that can be >> >> >> detected either RCU stall or a task hung, producing 2 different bug >> >> >> reports (which is bad). >> >> >> One can say that it's only a matter of tuning timeouts, but at least >> >> >> task hung detector has a problem that if you set timeout to X, it can >> >> >> detect hung anywhere between X and 2*X. And on one hand we need quite >> >> >> large timeout (a minute may not be enough), and on the other hand we >> >> >> can't wait for an hour just to make sure that the machine is indeed >> >> >> dead (these things happen every few minutes). >> >> > >> >> > I suppose that we could have a global variable that was set to the >> >> > priority of the complaint in question, which would suppress all >> >> > lower-priority complaints. Might need to be opt-in, though -- I would >> >> > guess that not everyone is going to be happy with one complaint >> >> > suppressing >> >> > others, especially given the possibility that the two complaints might >> >> > be about different things. >> >> > >> >> > Or did you have something more deft in mind? >> >> >> >> >> >> syzkaller generally
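The suggestion quoted above, a global variable holding the priority of the complaint so that lower-priority complaints are suppressed, can be sketched as a user-space model. This is an illustration in Python rather than a kernel patch. The class `StallReporter`, the enum `Complaint`, and the `opt_in` flag are hypothetical names. The thread only says that an RCU stall should win over a workqueue stall, so the position of task hangs at the bottom is an assumption.

```python
from enum import IntEnum


class Complaint(IntEnum):
    # Higher value = higher priority. Only "RCU before workqueue" comes from
    # the thread; placing task hangs lowest is an assumption.
    TASK_HANG = 1
    WORKQUEUE_STALL = 2
    RCU_STALL = 3


class StallReporter:
    """Models one shared 'highest complaint so far' variable."""

    def __init__(self, opt_in=True):
        self.opt_in = opt_in          # suppression is opt-in, as suggested
        self.highest_reported = 0     # nothing reported yet
        self.log = []

    def report(self, kind, message):
        """Record the complaint unless a higher-priority one was already seen."""
        if self.opt_in and kind < self.highest_reported:
            return False              # suppressed
        self.highest_reported = max(self.highest_reported, kind)
        self.log.append((kind.name, message))
        return True


if __name__ == "__main__":
    reporter = StallReporter()
    print(reporter.report(Complaint.RCU_STALL, "rcu_sched stalled"))
    print(reporter.report(Complaint.TASK_HANG, "task blocked"))
    print(reporter.log)
```

The sketch suppresses a lower-priority complaint only if the higher-priority one fired first. If the task-hang report arrives before the RCU stall warning, both are still logged, so the scheme does not by itself give the deterministic ordering asked for earlier in the thread.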
Re: INFO: task hung in perf_trace_event_unreg
On Mon, Apr 02, 2018 at 07:11:50PM +0200, Dmitry Vyukov wrote: > On Mon, Apr 2, 2018 at 6:39 PM, Paul E. McKenney > wrote: > > On Mon, Apr 02, 2018 at 06:32:03PM +0200, Dmitry Vyukov wrote: > >> On Mon, Apr 2, 2018 at 6:21 PM, Paul E. McKenney > >> wrote: > >> > On Mon, Apr 02, 2018 at 06:04:35PM +0200, Dmitry Vyukov wrote: > >> >> On Mon, Apr 2, 2018 at 5:33 PM, Paul E. McKenney > >> >> wrote: > >> >> > On Mon, Apr 02, 2018 at 09:40:40AM -0400, Steven Rostedt wrote: > >> >> >> On Mon, 02 Apr 2018 02:20:02 -0700 > >> >> >> syzbot wrote: > >> >> >> > >> >> >> > Hello, > >> >> >> > > >> >> >> > syzbot hit the following crash on upstream commit > >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 > >> >> >> > +) > >> >> >> > Linux 4.16 > >> >> >> > syzbot dashboard link: > >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd > >> >> >> > > >> >> >> > Unfortunately, I don't have any reproducer for this crash yet. > >> >> >> > Raw console output: > >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400 > >> >> >> > Kernel config: > >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459 > >> >> >> > compiler: gcc (GCC) 7.1.1 20170620 > >> >> >> > > >> >> >> > IMPORTANT: if you fix the bug, please add the following tag to the > >> >> >> > commit: > >> >> >> > Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com > >> >> >> > It will help syzbot understand when the bug is fixed. See footer > >> >> >> > for > >> >> >> > details. > >> >> >> > If you forward the report, please keep this part and the footer. > >> >> >> > > >> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: > >> >> >> > unknown mount > >> >> >> > option "g �;e�K�>pquota" > >> >> > > >> >> > Might not hurt to look into the above, though perhaps this is just > >> >> > syzkaller > >> >> > playing around with mount options. 
> >> >> > > >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds. > >> >> >> >Not tainted 4.16.0+ #10 > >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this > >> >> >> > message. > >> >> >> > syz-executor3 D20944 10803 4492 0x8002 > >> >> >> > Call Trace: > >> >> >> > context_switch kernel/sched/core.c:2862 [inline] > >> >> >> > __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440 > >> >> >> > schedule+0xf5/0x430 kernel/sched/core.c:3499 > >> >> >> > schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777 > >> >> >> > do_wait_for_common kernel/sched/completion.c:86 [inline] > >> >> >> > __wait_for_common kernel/sched/completion.c:107 [inline] > >> >> >> > wait_for_common kernel/sched/completion.c:118 [inline] > >> >> >> > wait_for_completion+0x415/0x770 kernel/sched/completion.c:139 > >> >> >> > __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414 > >> >> >> > synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212 > >> >> >> > synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213 > >> >> >> > >> >> >> I don't think this is a perf issue. Looks like something is > >> >> >> preventing > >> >> >> rcu_sched from completing. If there's a CPU that is running in kernel > >> >> >> space and never scheduling, that can cause this issue. Or if RCU > >> >> >> somehow missed a transition into idle or user space. > >> >> > > >> >> > The RCU CPU stall warning below strongly supports this position ... > >> >> > >> >> I think this is this guy then: > >> >> > >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40 > >> >> > >> >> #syz dup: INFO: rcu detected stall in __process_echoes > >> > > >> > Seems likely to me! > >> > > >> >> Looking retrospectively at the various hang/stall bugs that we have, I > >> >> think we need some kind of priority between them. I.e. we have rcu > >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine > >> >> hang and maybe something else. 
It would be useful if they fire > >> >> deterministically according to priorities. If there is an rcu stall, > >> >> that's always detected as CPU stall. Then if there is no RCU stall, > >> >> but a workqueue stall, then that's always detected as workqueue stall, > >> >> etc. > >> >> Currently if we have an RCU stall (effectively CPU stall), that can be > >> >> detected either RCU stall or a task hung, producing 2 different bug > >> >> reports (which is bad). > >> >> One can say that it's only a matter of tuning timeouts, but at least > >> >> task hung detector has a problem that if you set timeout to X, it can > >> >> detect hung anywhere between X and 2*X. And on one hand we need quite > >> >> large timeout (a minute may not be enough), and on the other hand we > >> >> can't wait for an hour just to make sure that the machine is indeed > >> >> dead (these things happen every few minutes). > >> > > >> > I suppose that we could have a global variable that was set to the > >> > priority of the complaint in question, which would suppress
Re: INFO: task hung in perf_trace_event_unreg
On Mon, Apr 2, 2018 at 6:39 PM, Paul E. McKenney wrote: > On Mon, Apr 02, 2018 at 06:32:03PM +0200, Dmitry Vyukov wrote: >> On Mon, Apr 2, 2018 at 6:21 PM, Paul E. McKenney >> wrote: >> > On Mon, Apr 02, 2018 at 06:04:35PM +0200, Dmitry Vyukov wrote: >> >> On Mon, Apr 2, 2018 at 5:33 PM, Paul E. McKenney >> >> wrote: >> >> > On Mon, Apr 02, 2018 at 09:40:40AM -0400, Steven Rostedt wrote: >> >> >> On Mon, 02 Apr 2018 02:20:02 -0700 >> >> >> syzbot wrote: >> >> >> >> >> >> > Hello, >> >> >> > >> >> >> > syzbot hit the following crash on upstream commit >> >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 >> >> >> > +) >> >> >> > Linux 4.16 >> >> >> > syzbot dashboard link: >> >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd >> >> >> > >> >> >> > Unfortunately, I don't have any reproducer for this crash yet. >> >> >> > Raw console output: >> >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400 >> >> >> > Kernel config: >> >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459 >> >> >> > compiler: gcc (GCC) 7.1.1 20170620 >> >> >> > >> >> >> > IMPORTANT: if you fix the bug, please add the following tag to the >> >> >> > commit: >> >> >> > Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com >> >> >> > It will help syzbot understand when the bug is fixed. See footer for >> >> >> > details. >> >> >> > If you forward the report, please keep this part and the footer. >> >> >> > >> >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown >> >> >> > mount >> >> >> > option "g �;e�K�>pquota" >> >> > >> >> > Might not hurt to look into the above, though perhaps this is just >> >> > syzkaller >> >> > playing around with mount options. >> >> > >> >> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds. >> >> >> >Not tainted 4.16.0+ #10 >> >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this >> >> >> > message. 
>> >> >> > syz-executor3 D20944 10803 4492 0x8002 >> >> >> > Call Trace: >> >> >> > context_switch kernel/sched/core.c:2862 [inline] >> >> >> > __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440 >> >> >> > schedule+0xf5/0x430 kernel/sched/core.c:3499 >> >> >> > schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777 >> >> >> > do_wait_for_common kernel/sched/completion.c:86 [inline] >> >> >> > __wait_for_common kernel/sched/completion.c:107 [inline] >> >> >> > wait_for_common kernel/sched/completion.c:118 [inline] >> >> >> > wait_for_completion+0x415/0x770 kernel/sched/completion.c:139 >> >> >> > __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414 >> >> >> > synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212 >> >> >> > synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213 >> >> >> >> >> >> I don't think this is a perf issue. Looks like something is preventing >> >> >> rcu_sched from completing. If there's a CPU that is running in kernel >> >> >> space and never scheduling, that can cause this issue. Or if RCU >> >> >> somehow missed a transition into idle or user space. >> >> > >> >> > The RCU CPU stall warning below strongly supports this position ... >> >> >> >> I think this is this guy then: >> >> >> >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40 >> >> >> >> #syz dup: INFO: rcu detected stall in __process_echoes >> > >> > Seems likely to me! >> > >> >> Looking retrospectively at the various hang/stall bugs that we have, I >> >> think we need some kind of priority between them. I.e. we have rcu >> >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine >> >> hang and maybe something else. It would be useful if they fire >> >> deterministically according to priorities. If there is an rcu stall, >> >> that's always detected as CPU stall. Then if there is no RCU stall, >> >> but a workqueue stall, then that's always detected as workqueue stall, >> >> etc. 
>> >> Currently if we have an RCU stall (effectively CPU stall), that can be >> >> detected either RCU stall or a task hung, producing 2 different bug >> >> reports (which is bad). >> >> One can say that it's only a matter of tuning timeouts, but at least >> >> task hung detector has a problem that if you set timeout to X, it can >> >> detect hung anywhere between X and 2*X. And on one hand we need quite >> >> large timeout (a minute may not be enough), and on the other hand we >> >> can't wait for an hour just to make sure that the machine is indeed >> >> dead (these things happen every few minutes). >> > >> > I suppose that we could have a global variable that was set to the >> > priority of the complaint in question, which would suppress all >> > lower-priority complaints. Might need to be opt-in, though -- I would >> > guess that not everyone is going to be happy with one complaint suppressing >> > others, especially given the possibility that the two complaints might >> > be about different things. >> > >> > Or
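The "anywhere between X and 2*X" window described in the message above can be illustrated with a small model. This is only a sketch under an assumption that is not stated in the thread: the checker wakes once per timeout interval and reports a task only if it was already blocked at the previous wake-up. The function name `detection_delay` is made up for illustration and is not kernel code.

```python
def detection_delay(block_time, timeout):
    """Seconds a task has been blocked when a periodic checker reports it.

    Assumed model, not kernel code: the checker wakes at 0, timeout,
    2*timeout, ... and reports a task only if it was already blocked at
    the previous wake-up too.
    """
    # First wake-up at or after the task blocks (ceiling to a multiple of
    # timeout); the checker only records the task's state here.
    first_check = -(-block_time // timeout) * timeout
    # One interval later the state is unchanged, so the task is reported.
    report_time = first_check + timeout
    return report_time - block_time


# With a 120-second timeout, the delay depends on when the task blocks
# relative to the checker's wake-ups.
for block_time in (120, 119, 1):
    print(block_time, detection_delay(block_time, 120))
```

In this model a task that blocks exactly at a wake-up is reported after X seconds, while one that blocks just after a wake-up waits almost 2*X, which is why tuning the timeout alone cannot order this detector reliably against the others.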
Re: INFO: task hung in perf_trace_event_unreg
On Mon, Apr 02, 2018 at 06:32:03PM +0200, Dmitry Vyukov wrote: > On Mon, Apr 2, 2018 at 6:21 PM, Paul E. McKenney > wrote: > > On Mon, Apr 02, 2018 at 06:04:35PM +0200, Dmitry Vyukov wrote: > >> On Mon, Apr 2, 2018 at 5:33 PM, Paul E. McKenney > >> wrote: > >> > On Mon, Apr 02, 2018 at 09:40:40AM -0400, Steven Rostedt wrote: > >> >> On Mon, 02 Apr 2018 02:20:02 -0700 > >> >> syzbot wrote: > >> >> > >> >> > Hello, > >> >> > > >> >> > syzbot hit the following crash on upstream commit > >> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 > >> >> > +) > >> >> > Linux 4.16 > >> >> > syzbot dashboard link: > >> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd > >> >> > > >> >> > Unfortunately, I don't have any reproducer for this crash yet. > >> >> > Raw console output: > >> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400 > >> >> > Kernel config: > >> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459 > >> >> > compiler: gcc (GCC) 7.1.1 20170620 > >> >> > > >> >> > IMPORTANT: if you fix the bug, please add the following tag to the > >> >> > commit: > >> >> > Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com > >> >> > It will help syzbot understand when the bug is fixed. See footer for > >> >> > details. > >> >> > If you forward the report, please keep this part and the footer. > >> >> > > >> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown > >> >> > mount > >> >> > option "g �;e�K�>pquota" > >> > > >> > Might not hurt to look into the above, though perhaps this is just > >> > syzkaller > >> > playing around with mount options. > >> > > >> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds. > >> >> >Not tainted 4.16.0+ #10 > >> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this > >> >> > message. 
> >> >> > syz-executor3 D20944 10803 4492 0x8002 > >> >> > Call Trace: > >> >> > context_switch kernel/sched/core.c:2862 [inline] > >> >> > __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440 > >> >> > schedule+0xf5/0x430 kernel/sched/core.c:3499 > >> >> > schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777 > >> >> > do_wait_for_common kernel/sched/completion.c:86 [inline] > >> >> > __wait_for_common kernel/sched/completion.c:107 [inline] > >> >> > wait_for_common kernel/sched/completion.c:118 [inline] > >> >> > wait_for_completion+0x415/0x770 kernel/sched/completion.c:139 > >> >> > __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414 > >> >> > synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212 > >> >> > synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213 > >> >> > >> >> I don't think this is a perf issue. Looks like something is preventing > >> >> rcu_sched from completing. If there's a CPU that is running in kernel > >> >> space and never scheduling, that can cause this issue. Or if RCU > >> >> somehow missed a transition into idle or user space. > >> > > >> > The RCU CPU stall warning below strongly supports this position ... > >> > >> I think this is this guy then: > >> > >> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40 > >> > >> #syz dup: INFO: rcu detected stall in __process_echoes > > > > Seems likely to me! > > > >> Looking retrospectively at the various hang/stall bugs that we have, I > >> think we need some kind of priority between them. I.e. we have rcu > >> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine > >> hang and maybe something else. It would be useful if they fire > >> deterministically according to priorities. If there is an rcu stall, > >> that's always detected as CPU stall. Then if there is no RCU stall, > >> but a workqueue stall, then that's always detected as workqueue stall, > >> etc. 
> >> Currently if we have an RCU stall (effectively CPU stall), that can be > >> detected either RCU stall or a task hung, producing 2 different bug > >> reports (which is bad). > >> One can say that it's only a matter of tuning timeouts, but at least > >> task hung detector has a problem that if you set timeout to X, it can > >> detect hung anywhere between X and 2*X. And on one hand we need quite > >> large timeout (a minute may not be enough), and on the other hand we > >> can't wait for an hour just to make sure that the machine is indeed > >> dead (these things happen every few minutes). > > > > I suppose that we could have a global variable that was set to the > > priority of the complaint in question, which would suppress all > > lower-priority complaints. Might need to be opt-in, though -- I would > > guess that not everyone is going to be happy with one complaint suppressing > > others, especially given the possibility that the two complaints might > > be about different things. > > > > Or did you have something more deft in mind? > > > syzkaller generally looks only at the first report. One does not know > if/when there will be a second one,
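The global-variable idea in the message above could look roughly like the following. This is a hedged sketch rather than anything proposed in code on the list: the names `Complaint` and `ComplaintFilter` and the numeric priorities are hypothetical, and the thread only fixes the relative order (an RCU stall wins over a workqueue stall, and so on).

```python
from enum import IntEnum


class Complaint(IntEnum):
    # Hypothetical priority values; the thread only says that an RCU stall
    # should win over a workqueue stall, and so on down the list.
    TASK_HANG = 1
    WORKQUEUE_STALL = 2
    RCU_STALL = 3


class ComplaintFilter:
    """Tracks the highest-priority complaint seen so far."""

    def __init__(self, enabled=False):
        self.enabled = enabled   # opt-in, as suggested in the thread
        self.highest_seen = 0    # stands in for the proposed global variable

    def should_report(self, kind):
        """Return True if the complaint should be printed."""
        if self.enabled and kind < self.highest_seen:
            return False         # a higher-priority complaint already fired
        self.highest_seen = max(self.highest_seen, kind)
        return True


f = ComplaintFilter(enabled=True)
print(f.should_report(Complaint.RCU_STALL))
print(f.should_report(Complaint.TASK_HANG))
```

Note the limit of this approach: a lower-priority complaint that fires first is still reported, so it does not by itself help a consumer that looks only at the first report.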
Re: INFO: task hung in perf_trace_event_unreg
On Mon, Apr 2, 2018 at 6:21 PM, Paul E. McKenney wrote:
> On Mon, Apr 02, 2018 at 06:04:35PM +0200, Dmitry Vyukov wrote:
>> On Mon, Apr 2, 2018 at 5:33 PM, Paul E. McKenney wrote:
>> > On Mon, Apr 02, 2018 at 09:40:40AM -0400, Steven Rostedt wrote:
>> >> On Mon, 02 Apr 2018 02:20:02 -0700
>> >> syzbot wrote:
>> >>
>> >> > Hello,
>> >> >
>> >> > syzbot hit the following crash on upstream commit
>> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +)
>> >> > Linux 4.16
>> >> > syzbot dashboard link:
>> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
>> >> >
>> >> > Unfortunately, I don't have any reproducer for this crash yet.
>> >> > Raw console output:
>> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
>> >> > Kernel config:
>> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
>> >> > compiler: gcc (GCC) 7.1.1 20170620
>> >> >
>> >> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> >> > Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
>> >> > It will help syzbot understand when the bug is fixed. See footer for
>> >> > details.
>> >> > If you forward the report, please keep this part and the footer.
>> >> >
>> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount
>> >> > option "g �;e�K�>pquota"
>> >
>> > Might not hurt to look into the above, though perhaps this is just syzkaller
>> > playing around with mount options.
>> >
>> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
>> >> >       Not tainted 4.16.0+ #10
>> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> >> > syz-executor3 D20944 10803 4492 0x8002
>> >> > Call Trace:
>> >> > context_switch kernel/sched/core.c:2862 [inline]
>> >> > __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>> >> > schedule+0xf5/0x430 kernel/sched/core.c:3499
>> >> > schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
>> >> > do_wait_for_common kernel/sched/completion.c:86 [inline]
>> >> > __wait_for_common kernel/sched/completion.c:107 [inline]
>> >> > wait_for_common kernel/sched/completion.c:118 [inline]
>> >> > wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
>> >> > __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>> >> > synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
>> >> > synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
>> >>
>> >> I don't think this is a perf issue. Looks like something is preventing
>> >> rcu_sched from completing. If there's a CPU that is running in kernel
>> >> space and never scheduling, that can cause this issue. Or if RCU
>> >> somehow missed a transition into idle or user space.
>> >
>> > The RCU CPU stall warning below strongly supports this position ...
>>
>> I think this is this guy then:
>>
>> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
>>
>> #syz dup: INFO: rcu detected stall in __process_echoes
>
> Seems likely to me!
>
>> Looking retrospectively at the various hang/stall bugs that we have, I
>> think we need some kind of priority between them. I.e. we have rcu
>> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
>> hang and maybe something else. It would be useful if they fire
>> deterministically according to priorities. If there is an rcu stall,
>> that's always detected as CPU stall. Then if there is no RCU stall,
>> but a workqueue stall, then that's always detected as workqueue stall,
>> etc.
>> Currently if we have an RCU stall (effectively CPU stall), that can be
>> detected either RCU stall or a task hung, producing 2 different bug
>> reports (which is bad).
>> One can say that it's only a matter of tuning timeouts, but at least
>> task hung detector has a problem that if you set timeout to X, it can
>> detect hung anywhere between X and 2*X. And on one hand we need quite
>> large timeout (a minute may not be enough), and on the other hand we
>> can't wait for an hour just to make sure that the machine is indeed
>> dead (these things happen every few minutes).
>
> I suppose that we could have a global variable that was set to the
> priority of the complaint in question, which would suppress all
> lower-priority complaints. Might need to be opt-in, though -- I would
> guess that not everyone is going to be happy with one complaint suppressing
> others, especially given the possibility that the two complaints might
> be about different things.
>
> Or did you have something more deft in mind?

syzkaller generally looks only at the first report. One does not know
if/when there will be a second one, or the second one can be induced
by the first one, and we generally want clean reports on a non-tainted
kernel. So we don't just need to suppress lower priority ones, we need
to produce the right report first.

I am thinking maybe setting:
- rcu stalls at 1.5 minutes
-
Re: INFO: task hung in perf_trace_event_unreg
On Mon, Apr 02, 2018 at 06:04:35PM +0200, Dmitry Vyukov wrote:
> On Mon, Apr 2, 2018 at 5:33 PM, Paul E. McKenney wrote:
> > On Mon, Apr 02, 2018 at 09:40:40AM -0400, Steven Rostedt wrote:
> >> On Mon, 02 Apr 2018 02:20:02 -0700
> >> syzbot wrote:
> >>
> >> > Hello,
> >> >
> >> > syzbot hit the following crash on upstream commit
> >> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +)
> >> > Linux 4.16
> >> > syzbot dashboard link:
> >> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >> >
> >> > Unfortunately, I don't have any reproducer for this crash yet.
> >> > Raw console output:
> >> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> >> > Kernel config:
> >> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> >> > compiler: gcc (GCC) 7.1.1 20170620
> >> >
> >> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> >> > Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
> >> > It will help syzbot understand when the bug is fixed. See footer for
> >> > details.
> >> > If you forward the report, please keep this part and the footer.
> >> >
> >> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount
> >> > option "g �;e�K�>pquota"
> >
> > Might not hurt to look into the above, though perhaps this is just syzkaller
> > playing around with mount options.
> >
> >> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
> >> >       Not tainted 4.16.0+ #10
> >> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> > syz-executor3 D20944 10803 4492 0x8002
> >> > Call Trace:
> >> > context_switch kernel/sched/core.c:2862 [inline]
> >> > __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> >> > schedule+0xf5/0x430 kernel/sched/core.c:3499
> >> > schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> >> > do_wait_for_common kernel/sched/completion.c:86 [inline]
> >> > __wait_for_common kernel/sched/completion.c:107 [inline]
> >> > wait_for_common kernel/sched/completion.c:118 [inline]
> >> > wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
> >> > __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> >> > synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
> >> > synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
> >>
> >> I don't think this is a perf issue. Looks like something is preventing
> >> rcu_sched from completing. If there's a CPU that is running in kernel
> >> space and never scheduling, that can cause this issue. Or if RCU
> >> somehow missed a transition into idle or user space.
> >
> > The RCU CPU stall warning below strongly supports this position ...
>
> I think this is this guy then:
>
> https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40
>
> #syz dup: INFO: rcu detected stall in __process_echoes

Seems likely to me!

> Looking retrospectively at the various hang/stall bugs that we have, I
> think we need some kind of priority between them. I.e. we have rcu
> stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
> hang and maybe something else. It would be useful if they fire
> deterministically according to priorities. If there is an rcu stall,
> that's always detected as CPU stall. Then if there is no RCU stall,
> but a workqueue stall, then that's always detected as workqueue stall,
> etc.
> Currently if we have an RCU stall (effectively CPU stall), that can be
> detected either RCU stall or a task hung, producing 2 different bug
> reports (which is bad).
> One can say that it's only a matter of tuning timeouts, but at least
> task hung detector has a problem that if you set timeout to X, it can
> detect hung anywhere between X and 2*X. And on one hand we need quite
> large timeout (a minute may not be enough), and on the other hand we
> can't wait for an hour just to make sure that the machine is indeed
> dead (these things happen every few minutes).

I suppose that we could have a global variable that was set to the
priority of the complaint in question, which would suppress all
lower-priority complaints. Might need to be opt-in, though -- I would
guess that not everyone is going to be happy with one complaint suppressing
others, especially given the possibility that the two complaints might
be about different things.

Or did you have something more deft in mind?

							Thanx, Paul

> >> > tracepoint_synchronize_unregister include/linux/tracepoint.h:80 [inline]
> >> > perf_trace_event_unreg.isra.2+0xb7/0x1f0 kernel/trace/trace_event_perf.c:161
> >> > perf_trace_destroy+0xbc/0x100 kernel/trace/trace_event_perf.c:236
> >> > tp_perf_event_destroy+0x15/0x20 kernel/events/core.c:7976
> >> > _free_event+0x3bd/0x10f0 kernel/events/core.c:4121
> >> > put_event+0x24/0x30 kernel/events/core.c:4204
> >> > perf_event_release_kernel+0x6e8/0xfc0
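A minimal userspace sketch of the "global variable set to the priority of
the complaint" idea from the message above. The priority order and every
name here (complaint_prio, reported_prio, suppress_lower_prio,
complaint_allowed) are made up for illustration; this is not existing kernel
code. Each detector would call complaint_allowed() before printing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical ordering, higher value wins. Not taken from the kernel. */
enum complaint_prio {
	PRIO_NONE = 0,
	PRIO_TASK_HUNG,
	PRIO_WORKQUEUE_STALL,
	PRIO_SPINLOCK_STALL,
	PRIO_RCU_STALL,
};

/* Highest complaint priority recorded so far. */
static _Atomic int reported_prio = PRIO_NONE;

/* Opt-in switch: when false, no complaint is ever suppressed. */
static bool suppress_lower_prio = true;

/* Returns true if a detector with this priority may print its complaint. */
bool complaint_allowed(enum complaint_prio prio)
{
	int seen = atomic_load(&reported_prio);

	assert(prio > PRIO_NONE);
	if (!suppress_lower_prio)
		return true;

	for (;;) {
		if ((int)prio < seen)
			return false;	/* a higher-priority complaint already fired */
		if ((int)prio == seen)
			return true;	/* same class of complaint, let it through */
		/* Try to raise the recorded priority to ours. */
		if (atomic_compare_exchange_weak(&reported_prio, &seen, (int)prio))
			return true;
		/* Exchange failed: seen now holds the current value, so retry. */
	}
}
```

With suppression enabled, a PRIO_TASK_HUNG call followed by a PRIO_RCU_STALL
call is allowed both times, and a later PRIO_WORKQUEUE_STALL call is refused.
A lower-priority complaint that happens to fire first still gets printed, so
this scheme on its own does not decide which report appears first.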
Re: INFO: task hung in perf_trace_event_unreg
On Mon, Apr 2, 2018 at 5:33 PM, Paul E. McKenney wrote:
> On Mon, Apr 02, 2018 at 09:40:40AM -0400, Steven Rostedt wrote:
>> On Mon, 02 Apr 2018 02:20:02 -0700
>> syzbot wrote:
>>
>> > Hello,
>> >
>> > syzbot hit the following crash on upstream commit
>> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +)
>> > Linux 4.16
>> > syzbot dashboard link:
>> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
>> >
>> > Unfortunately, I don't have any reproducer for this crash yet.
>> > Raw console output:
>> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
>> > Kernel config:
>> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
>> > compiler: gcc (GCC) 7.1.1 20170620
>> >
>> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> > Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
>> > It will help syzbot understand when the bug is fixed. See footer for
>> > details.
>> > If you forward the report, please keep this part and the footer.
>> >
>> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount
>> > option "g �;e�K�>pquota"
>
> Might not hurt to look into the above, though perhaps this is just syzkaller
> playing around with mount options.
>
>> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
>> >       Not tainted 4.16.0+ #10
>> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> > syz-executor3 D20944 10803 4492 0x8002
>> > Call Trace:
>> > context_switch kernel/sched/core.c:2862 [inline]
>> > __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
>> > schedule+0xf5/0x430 kernel/sched/core.c:3499
>> > schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
>> > do_wait_for_common kernel/sched/completion.c:86 [inline]
>> > __wait_for_common kernel/sched/completion.c:107 [inline]
>> > wait_for_common kernel/sched/completion.c:118 [inline]
>> > wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
>> > __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
>> > synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
>> > synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
>>
>> I don't think this is a perf issue. Looks like something is preventing
>> rcu_sched from completing. If there's a CPU that is running in kernel
>> space and never scheduling, that can cause this issue. Or if RCU
>> somehow missed a transition into idle or user space.
>
> The RCU CPU stall warning below strongly supports this position ...

I think this is this guy then:

https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40

#syz dup: INFO: rcu detected stall in __process_echoes

Looking retrospectively at the various hang/stall bugs that we have, I
think we need some kind of priority between them. I.e. we have rcu
stalls, spinlock stalls, workqueue hangs, task hangs, silent machine
hang and maybe something else. It would be useful if they fire
deterministically according to priorities. If there is an rcu stall,
that's always detected as CPU stall. Then if there is no RCU stall,
but a workqueue stall, then that's always detected as workqueue stall,
etc.
Currently if we have an RCU stall (effectively CPU stall), that can be
detected either RCU stall or a task hung, producing 2 different bug
reports (which is bad).
One can say that it's only a matter of tuning timeouts, but at least
task hung detector has a problem that if you set timeout to X, it can
detect hung anywhere between X and 2*X. And on one hand we need quite
large timeout (a minute may not be enough), and on the other hand we
can't wait for an hour just to make sure that the machine is indeed
dead (these things happen every few minutes).

>> > tracepoint_synchronize_unregister include/linux/tracepoint.h:80 [inline]
>> > perf_trace_event_unreg.isra.2+0xb7/0x1f0 kernel/trace/trace_event_perf.c:161
>> > perf_trace_destroy+0xbc/0x100 kernel/trace/trace_event_perf.c:236
>> > tp_perf_event_destroy+0x15/0x20 kernel/events/core.c:7976
>> > _free_event+0x3bd/0x10f0 kernel/events/core.c:4121
>> > put_event+0x24/0x30 kernel/events/core.c:4204
>> > perf_event_release_kernel+0x6e8/0xfc0 kernel/events/core.c:4310
>> > perf_release+0x37/0x50 kernel/events/core.c:4320
>> > __fput+0x327/0x7e0 fs/file_table.c:209
>> > fput+0x15/0x20 fs/file_table.c:243
>> > task_work_run+0x199/0x270 kernel/task_work.c:113
>> > exit_task_work include/linux/task_work.h:22 [inline]
>> > do_exit+0x9bb/0x1ad0 kernel/exit.c:865
>> > do_group_exit+0x149/0x400 kernel/exit.c:968
>> > get_signal+0x73a/0x16d0 kernel/signal.c:2469
>> > do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
>> > exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
>> > prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
>> > syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
>> > do_syscall_64+0x6ec/0x940 arch/x86/entry/common.c:292
>> >
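The "anywhere between X and 2*X" remark in the message above can be
illustrated with a toy model. It assumes the detector samples once per
timeout period and reports a task only when its switch count has not changed
since the previous sample, and that the task was being scheduled normally
before it hung. The function name hung_report_delay and the time units are
hypothetical; this is a sketch, not the kernel's code.

```c
#include <assert.h>

/*
 * Toy model of a periodic hung-task check. Samples happen at
 * t = 0, X, 2X, ... where X is the timeout. hang_start is the time at
 * which the task stops being scheduled. The return value is how long
 * the task has been hung when the report fires.
 */
long hung_report_delay(long timeout, long hang_start)
{
	long first_sample, report_time;

	assert(timeout > 0 && hang_start >= 0);

	/*
	 * First sample at or after the hang: the task ran at some point
	 * since the previous sample, so its switch count changed and the
	 * detector only records the new count.
	 */
	first_sample = ((hang_start + timeout - 1) / timeout) * timeout;

	/* The next sample sees an unchanged count and reports the task. */
	report_time = first_sample + timeout;

	return report_time - hang_start;
}
```

With a timeout of 120, a hang that begins exactly on a sample (t = 240) is
reported after 120, while a hang that begins just after one (t = 241) is
reported after 239. In this model the delay always falls in [X, 2X).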
Re: INFO: task hung in perf_trace_event_unreg
On Mon, Apr 02, 2018 at 09:40:40AM -0400, Steven Rostedt wrote:
> On Mon, 02 Apr 2018 02:20:02 -0700
> syzbot wrote:
>
> > Hello,
> >
> > syzbot hit the following crash on upstream commit
> > 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +)
> > Linux 4.16
> > syzbot dashboard link:
> > https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
> >
> > Unfortunately, I don't have any reproducer for this crash yet.
> > Raw console output:
> > https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> > Kernel config:
> > https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> > compiler: gcc (GCC) 7.1.1 20170620
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
> > It will help syzbot understand when the bug is fixed. See footer for
> > details.
> > If you forward the report, please keep this part and the footer.
> >
> > REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount
> > option "g�;e�K�>pquota"

Might not hurt to look into the above, though perhaps this is just syzkaller
playing around with mount options.

> > INFO: task syz-executor3:10803 blocked for more than 120 seconds.
> >       Not tainted 4.16.0+ #10
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > syz-executor3 D20944 10803 4492 0x8002
> > Call Trace:
> > context_switch kernel/sched/core.c:2862 [inline]
> > __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> > schedule+0xf5/0x430 kernel/sched/core.c:3499
> > schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> > do_wait_for_common kernel/sched/completion.c:86 [inline]
> > __wait_for_common kernel/sched/completion.c:107 [inline]
> > wait_for_common kernel/sched/completion.c:118 [inline]
> > wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
> > __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> > synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
> > synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
>
> I don't think this is a perf issue. Looks like something is preventing
> rcu_sched from completing. If there's a CPU that is running in kernel
> space and never scheduling, that can cause this issue. Or if RCU
> somehow missed a transition into idle or user space.

The RCU CPU stall warning below strongly supports this position ...

> -- Steve
>
> > tracepoint_synchronize_unregister include/linux/tracepoint.h:80 [inline]
> > perf_trace_event_unreg.isra.2+0xb7/0x1f0 kernel/trace/trace_event_perf.c:161
> > perf_trace_destroy+0xbc/0x100 kernel/trace/trace_event_perf.c:236
> > tp_perf_event_destroy+0x15/0x20 kernel/events/core.c:7976
> > _free_event+0x3bd/0x10f0 kernel/events/core.c:4121
> > put_event+0x24/0x30 kernel/events/core.c:4204
> > perf_event_release_kernel+0x6e8/0xfc0 kernel/events/core.c:4310
> > perf_release+0x37/0x50 kernel/events/core.c:4320
> > __fput+0x327/0x7e0 fs/file_table.c:209
> > fput+0x15/0x20 fs/file_table.c:243
> > task_work_run+0x199/0x270 kernel/task_work.c:113
> > exit_task_work include/linux/task_work.h:22 [inline]
> > do_exit+0x9bb/0x1ad0 kernel/exit.c:865
> > do_group_exit+0x149/0x400 kernel/exit.c:968
> > get_signal+0x73a/0x16d0 kernel/signal.c:2469
> > do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
> > exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
> > prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
> > syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
> > do_syscall_64+0x6ec/0x940 arch/x86/entry/common.c:292
> > entry_SYSCALL_64_after_hwframe+0x42/0xb7
> > RIP: 0033:0x455269
> > RSP: 002b:7f8976371ce8 EFLAGS: 0246 ORIG_RAX: 00ca
> > RAX: RBX: 0072bec8 RCX: 00455269
> > RDX: RSI: RDI: 0072bec8
> > RBP: 0072bec8 R08: R09: 0072bea0
> > R10: R11: 0246 R12:
> > R13: 7ffe793f79cf R14: 7f89763729c0 R15:
> >
> > Showing all locks held in the system:
> > 2 locks held by khungtaskd/876:
> > #0: (rcu_read_lock){}, at: [<8f2bec4b>]
> > check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
> > #0: (rcu_read_lock){}, at: [<8f2bec4b>] watchdog+0x1c5/0xd60
> > kernel/hung_task.c:249

... And two places to start looking are the two above rcu_read_lock() calls.
Especially given that khungtask shows up below.

> > #1: (tasklist_lock){.+.+}, at: [<06b3009f>]
> > debug_show_all_locks+0xd3/0x3d0 kernel/locking/lockdep.c:4470
> > 2 locks held by getty/4414:
> > #0: (>ldisc_sem){}, at: []
> > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> > #1: (>atomic_read_lock){+.+.}, at: [<762a7320>]
> > n_tty_read+0x2ef/0x1a40
Re: INFO: task hung in perf_trace_event_unreg
On Mon, 02 Apr 2018 02:20:02 -0700
syzbot wrote:

> Hello,
>
> syzbot hit the following crash on upstream commit
> 0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +)
> Linux 4.16
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd
>
> Unfortunately, I don't have any reproducer for this crash yet.
> Raw console output:
> https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
> Kernel config:
> https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
> compiler: gcc (GCC) 7.1.1 20170620
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
>
> REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount
> option "g�;e�K�>pquota"
> INFO: task syz-executor3:10803 blocked for more than 120 seconds.
> Not tainted 4.16.0+ #10
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor3 D20944 10803 4492 0x8002
> Call Trace:
> context_switch kernel/sched/core.c:2862 [inline]
> __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
> schedule+0xf5/0x430 kernel/sched/core.c:3499
> schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
> do_wait_for_common kernel/sched/completion.c:86 [inline]
> __wait_for_common kernel/sched/completion.c:107 [inline]
> wait_for_common kernel/sched/completion.c:118 [inline]
> wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
> __wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
> synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
> synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213

I don't think this is a perf issue. Looks like something is preventing
rcu_sched from completing. If there's a CPU that is running in kernel
space and never scheduling, that can cause this issue. Or if RCU
somehow missed a transition into idle or user space.

-- Steve

> tracepoint_synchronize_unregister include/linux/tracepoint.h:80 [inline]
> perf_trace_event_unreg.isra.2+0xb7/0x1f0
> kernel/trace/trace_event_perf.c:161
> perf_trace_destroy+0xbc/0x100 kernel/trace/trace_event_perf.c:236
> tp_perf_event_destroy+0x15/0x20 kernel/events/core.c:7976
> _free_event+0x3bd/0x10f0 kernel/events/core.c:4121
> put_event+0x24/0x30 kernel/events/core.c:4204
> perf_event_release_kernel+0x6e8/0xfc0 kernel/events/core.c:4310
> perf_release+0x37/0x50 kernel/events/core.c:4320
> __fput+0x327/0x7e0 fs/file_table.c:209
> fput+0x15/0x20 fs/file_table.c:243
> task_work_run+0x199/0x270 kernel/task_work.c:113
> exit_task_work include/linux/task_work.h:22 [inline]
> do_exit+0x9bb/0x1ad0 kernel/exit.c:865
> do_group_exit+0x149/0x400 kernel/exit.c:968
> get_signal+0x73a/0x16d0 kernel/signal.c:2469
> do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
> exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
> prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
> syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
> do_syscall_64+0x6ec/0x940 arch/x86/entry/common.c:292
> entry_SYSCALL_64_after_hwframe+0x42/0xb7
> RIP: 0033:0x455269
> RSP: 002b:7f8976371ce8 EFLAGS: 0246 ORIG_RAX: 00ca
> RAX: RBX: 0072bec8 RCX: 00455269
> RDX: RSI: RDI: 0072bec8
> RBP: 0072bec8 R08: R09: 0072bea0
> R10: R11: 0246 R12:
> R13: 7ffe793f79cf R14: 7f89763729c0 R15:
>
> Showing all locks held in the system:
> 2 locks held by khungtaskd/876:
> #0: (rcu_read_lock){}, at: [<8f2bec4b>]
> check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
> #0: (rcu_read_lock){}, at: [<8f2bec4b>] watchdog+0x1c5/0xd60
> kernel/hung_task.c:249
> #1: (tasklist_lock){.+.+}, at: [<06b3009f>]
> debug_show_all_locks+0xd3/0x3d0 kernel/locking/lockdep.c:4470
> 2 locks held by getty/4414:
> #0: (>ldisc_sem){}, at: [ ]
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> #1: (>atomic_read_lock){+.+.}, at: [<762a7320>]
> n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4415:
> #0: (>ldisc_sem){}, at: [ ]
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> #1: (>atomic_read_lock){+.+.}, at: [<762a7320>]
> n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
> 2 locks held by getty/4416:
> #0: (>ldisc_sem){}, at: [ ]
> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
> #1: (>atomic_read_lock){+.+.}, at: [<762a7320>]
> n_tty_read+0x2ef/0x1a40
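
Steve's diagnosis above can be illustrated with a toy model. What follows is a minimal Python sketch, not kernel code. It assumes only what the reply states: an rcu_sched grace period cannot complete while some CPU stays in kernel space without ever scheduling. The names `Cpu` and `grace_period_completes`, and the tick loop with its 120-tick limit (a loose nod to the 120-second hung-task timeout in the report), are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    """Toy CPU. `schedules` says whether it ever leaves kernel code."""
    cpu_id: int
    schedules: bool  # False models a CPU spinning in kernel space


def grace_period_completes(cpus, max_ticks=120):
    """Return the tick at which every CPU has reported a quiescent state,
    or None if some CPU is still pending after max_ticks (a "hang")."""
    pending = {cpu.cpu_id for cpu in cpus}
    for tick in range(1, max_ticks + 1):
        for cpu in cpus:
            # In this model a context switch counts as a quiescent state.
            if cpu.schedules:
                pending.discard(cpu.cpu_id)
        if not pending:
            return tick
    return None


healthy = [Cpu(0, True), Cpu(1, True)]
stuck = [Cpu(0, True), Cpu(1, False)]  # CPU 1 never schedules

print(grace_period_completes(healthy))  # 1
print(grace_period_completes(stuck))    # None
```

In the second case the waiter never gets its answer, which is the toy analogue of synchronize_sched() sitting in wait_for_completion() until the hung-task watchdog reports it.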
INFO: task hung in perf_trace_event_unreg
Hello,

syzbot hit the following crash on upstream commit
0adb32858b0bddf4ada5f364a84ed60b196dbcda (Sun Apr 1 21:20:27 2018 +)
Linux 4.16
syzbot dashboard link:
https://syzkaller.appspot.com/bug?extid=2dbc55da20fa246378fd

Unfortunately, I don't have any reproducer for this crash yet.
Raw console output:
https://syzkaller.appspot.com/x/log.txt?id=5487937873510400
Kernel config:
https://syzkaller.appspot.com/x/.config?id=-2374466361298166459
compiler: gcc (GCC) 7.1.1 20170620

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+2dbc55da20fa24637...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for details.
If you forward the report, please keep this part and the footer.

REISERFS warning (device loop4): super-6502 reiserfs_getopt: unknown mount option "g�;e�K�>pquota"
INFO: task syz-executor3:10803 blocked for more than 120 seconds.
Not tainted 4.16.0+ #10
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor3 D20944 10803 4492 0x8002
Call Trace:
context_switch kernel/sched/core.c:2862 [inline]
__schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
schedule+0xf5/0x430 kernel/sched/core.c:3499
schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777
do_wait_for_common kernel/sched/completion.c:86 [inline]
__wait_for_common kernel/sched/completion.c:107 [inline]
wait_for_common kernel/sched/completion.c:118 [inline]
wait_for_completion+0x415/0x770 kernel/sched/completion.c:139
__wait_rcu_gp+0x221/0x340 kernel/rcu/update.c:414
synchronize_sched.part.64+0xac/0x100 kernel/rcu/tree.c:3212
synchronize_sched+0x76/0xf0 kernel/rcu/tree.c:3213
tracepoint_synchronize_unregister include/linux/tracepoint.h:80 [inline]
perf_trace_event_unreg.isra.2+0xb7/0x1f0 kernel/trace/trace_event_perf.c:161
perf_trace_destroy+0xbc/0x100 kernel/trace/trace_event_perf.c:236
tp_perf_event_destroy+0x15/0x20 kernel/events/core.c:7976
_free_event+0x3bd/0x10f0 kernel/events/core.c:4121
put_event+0x24/0x30 kernel/events/core.c:4204
perf_event_release_kernel+0x6e8/0xfc0 kernel/events/core.c:4310
perf_release+0x37/0x50 kernel/events/core.c:4320
__fput+0x327/0x7e0 fs/file_table.c:209
fput+0x15/0x20 fs/file_table.c:243
task_work_run+0x199/0x270 kernel/task_work.c:113
exit_task_work include/linux/task_work.h:22 [inline]
do_exit+0x9bb/0x1ad0 kernel/exit.c:865
do_group_exit+0x149/0x400 kernel/exit.c:968
get_signal+0x73a/0x16d0 kernel/signal.c:2469
do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
do_syscall_64+0x6ec/0x940 arch/x86/entry/common.c:292
entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x455269
RSP: 002b:7f8976371ce8 EFLAGS: 0246 ORIG_RAX: 00ca
RAX: RBX: 0072bec8 RCX: 00455269
RDX: RSI: RDI: 0072bec8
RBP: 0072bec8 R08: R09: 0072bea0
R10: R11: 0246 R12:
R13: 7ffe793f79cf R14: 7f89763729c0 R15:

Showing all locks held in the system:
2 locks held by khungtaskd/876:
#0: (rcu_read_lock){}, at: [<8f2bec4b>] check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
#0: (rcu_read_lock){}, at: [<8f2bec4b>] watchdog+0x1c5/0xd60 kernel/hung_task.c:249
#1: (tasklist_lock){.+.+}, at: [<06b3009f>] debug_show_all_locks+0xd3/0x3d0 kernel/locking/lockdep.c:4470
2 locks held by getty/4414:
#0: (>ldisc_sem){}, at: [] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (>atomic_read_lock){+.+.}, at: [<762a7320>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4415:
#0: (>ldisc_sem){}, at: [ ] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (>atomic_read_lock){+.+.}, at: [<762a7320>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4416:
#0: (>ldisc_sem){}, at: [ ] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (>atomic_read_lock){+.+.}, at: [<762a7320>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4417:
#0: (>ldisc_sem){}, at: [ ] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (>atomic_read_lock){+.+.}, at: [<762a7320>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4418:
#0: (>ldisc_sem){}, at: [ ] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (>atomic_read_lock){+.+.}, at: [<762a7320>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4419:
#0: (>ldisc_sem){}, at: [ ]