Re: INFO: task hung in bpf_exit_net
On Fri, Dec 22, 2017 at 05:04:37PM -0200, Marcelo Ricardo Leitner wrote:
> On Fri, Dec 22, 2017 at 04:28:07PM -0200, Marcelo Ricardo Leitner wrote:
> > On Fri, Dec 22, 2017 at 11:58:08AM +0100, Dmitry Vyukov wrote:
> > ...
> > > > Same with this one, perhaps related to / fixed by:
> > > > http://patchwork.ozlabs.org/patch/850957/
> > >
> > > Looking at the log, this one seems to be an infinite loop in SCTP code
> > > with console output in it. Kernel is busy printing a gazillion of:
> > >
> > > [ 176.491099] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too
> > > low, using default minimum of 512
> > > ** 110 printk messages dropped **
> > > [ 176.503409] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too
> > > low, using default minimum of 512
> > > ** 103 printk messages dropped **
> > > ...
> > > [ 246.742374] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too
> > > low, using default minimum of 512
> > > [ 246.742484] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too
> > > low, using default minimum of 512
> > > [ 246.742590] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too
> > > low, using default minimum of 512
> > >
> > > Looks like a different issue.
> >
> > Oh. I guess this is caused by the interface having an MTU smaller than
> > SCTP_DEFAULT_MINSEGMENT (512), as the ICMP frag needed handler
> > (sctp_icmp_frag_needed) will trigger an instant retransmission.
> > But as the MTU is smaller, SCTP won't update it, but will issue the
> > retransmission anyway.
> >
> > I will test this soon. Should be fairly easy to trigger it.
>
> Reproduced it.
>
> netns   A  veth0(1500) - veth1(1500)  B  veth2(508) - veth3(508)  C
>
> When A sends an sctp packet bigger than 508, it triggers the issue, as B
> will reply with an ICMP frag needed carrying a size that sctp won't accept
> but will retransmit for anyway.

syzbot hasn't encountered this hang again (although it only happened once
in the first place). I assume it was fixed by commit b6c5734db070, so
telling syzbot this:

#syz fix: sctp: fix the handling of ICMP Frag Needed for too small MTUs

- Eric
Re: INFO: task hung in bpf_exit_net
On Fri, Dec 22, 2017 at 04:28:07PM -0200, Marcelo Ricardo Leitner wrote:
> On Fri, Dec 22, 2017 at 11:58:08AM +0100, Dmitry Vyukov wrote:
> ...
> > > Same with this one, perhaps related to / fixed by:
> > > http://patchwork.ozlabs.org/patch/850957/
> >
> > Looking at the log, this one seems to be an infinite loop in SCTP code
> > with console output in it. Kernel is busy printing a gazillion of:
> >
> > [ 176.491099] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too
> > low, using default minimum of 512
> > ** 110 printk messages dropped **
> > [ 176.503409] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too
> > low, using default minimum of 512
> > ** 103 printk messages dropped **
> > ...
> > [ 246.742374] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too
> > low, using default minimum of 512
> > [ 246.742484] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too
> > low, using default minimum of 512
> > [ 246.742590] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too
> > low, using default minimum of 512
> >
> > Looks like a different issue.
>
> Oh. I guess this is caused by the interface having an MTU smaller than
> SCTP_DEFAULT_MINSEGMENT (512), as the ICMP frag needed handler
> (sctp_icmp_frag_needed) will trigger an instant retransmission.
> But as the MTU is smaller, SCTP won't update it, but will issue the
> retransmission anyway.
>
> I will test this soon. Should be fairly easy to trigger it.

Reproduced it.

netns   A  veth0(1500) - veth1(1500)  B  veth2(508) - veth3(508)  C

When A sends an sctp packet bigger than 508, it triggers the issue, as B
will reply with an ICMP frag needed carrying a size that sctp won't accept
but will retransmit for anyway.

  Marcelo
Re: INFO: task hung in bpf_exit_net
On Fri, Dec 22, 2017 at 11:58:08AM +0100, Dmitry Vyukov wrote:
...
> > Same with this one, perhaps related to / fixed by:
> > http://patchwork.ozlabs.org/patch/850957/
>
> Looking at the log, this one seems to be an infinite loop in SCTP code
> with console output in it. Kernel is busy printing a gazillion of:
>
> [ 176.491099] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too
> low, using default minimum of 512
> ** 110 printk messages dropped **
> [ 176.503409] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too
> low, using default minimum of 512
> ** 103 printk messages dropped **
> ...
> [ 246.742374] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too
> low, using default minimum of 512
> [ 246.742484] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too
> low, using default minimum of 512
> [ 246.742590] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too
> low, using default minimum of 512
>
> Looks like a different issue.

Oh. I guess this is caused by the interface having an MTU smaller than
SCTP_DEFAULT_MINSEGMENT (512), as the ICMP frag needed handler
(sctp_icmp_frag_needed) will trigger an instant retransmission.
But as the MTU is smaller, SCTP won't update it, but will issue the
retransmission anyway.

I will test this soon. Should be fairly easy to trigger it.

  Marcelo
Re: INFO: task hung in bpf_exit_net
On Fri, Dec 22, 2017 at 12:16 PM, Marcelo Ricardo Leitner wrote: > On Fri, Dec 22, 2017 at 11:58:08AM +0100, Dmitry Vyukov wrote: >> On Tue, Dec 19, 2017 at 7:20 PM, David Ahern wrote: >> > On 12/19/17 5:47 AM, Dmitry Vyukov wrote: >> >> On Tue, Dec 19, 2017 at 1:36 PM, syzbot >> >> >> >> wrote: >> >>> Hello, >> >>> >> >>> syzkaller hit the following crash on >> >>> 7ceb97a071e80f1b5e4cd5a36de135612a836388 >> >>> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master >> >>> compiler: gcc (GCC) 7.1.1 20170620 >> >>> .config is attached >> >>> Raw console output is attached. >> >>> >> >>> Unfortunately, I don't have any reproducer for this bug yet. >> >>> >> >>> >> >>> sctp: sctp_transport_update_pmtu: Reported pmtu 508 too low, using >> >>> default >> >>> minimum of 512 >> >>> INFO: task kworker/u4:0:5 blocked for more than 120 seconds. >> >>> Not tainted 4.15.0-rc2-next-20171205+ #59 >> >>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> >>> kworker/u4:0D15808 5 2 0x8000 >> >>> Workqueue: netns cleanup_net >> >>> Call Trace: >> >>> context_switch kernel/sched/core.c:2800 [inline] >> >>> __schedule+0x8eb/0x2060 kernel/sched/core.c:3376 >> >>> schedule+0xf5/0x430 kernel/sched/core.c:3435 >> >>> schedule_preempt_disabled+0x10/0x20 kernel/sched/core.c:3493 >> >>> __mutex_lock_common kernel/locking/mutex.c:833 [inline] >> >>> __mutex_lock+0xaad/0x1a80 kernel/locking/mutex.c:893 >> >>> mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908 >> >>> rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 >> >>> tc_action_net_exit include/net/act_api.h:125 [inline] >> >>> bpf_exit_net+0x1a2/0x340 net/sched/act_bpf.c:408 >> >>> ops_exit_list.isra.6+0xae/0x150 net/core/net_namespace.c:142 >> >>> cleanup_net+0x5c7/0xb60 net/core/net_namespace.c:484 >> >>> process_one_work+0xbfd/0x1bc0 kernel/workqueue.c:2113 >> >>> worker_thread+0x223/0x1990 kernel/workqueue.c:2247 >> >>> kthread+0x37a/0x440 kernel/kthread.c:238 >> >>> ret_from_fork+0x24/0x30 
arch/x86/entry/entry_64.S:517 >> >>> >> >>> Showing all locks held in the system: >> >>> 4 locks held by kworker/u4:0/5: >> >>> #0: ((wq_completion)"%s""netns"){+.+.}, at: [] >> >>> __write_once_size include/linux/compiler.h:212 [inline] >> >>> #0: ((wq_completion)"%s""netns"){+.+.}, at: [ ] >> >>> atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline] >> >>> #0: ((wq_completion)"%s""netns"){+.+.}, at: [ ] >> >>> atomic_long_set include/asm-generic/atomic-long.h:57 [inline] >> >>> #0: ((wq_completion)"%s""netns"){+.+.}, at: [ ] >> >>> set_work_data kernel/workqueue.c:619 [inline] >> >>> #0: ((wq_completion)"%s""netns"){+.+.}, at: [ ] >> >>> set_work_pool_and_clear_pending kernel/workqueue.c:646 [inline] >> >>> #0: ((wq_completion)"%s""netns"){+.+.}, at: [ ] >> >>> process_one_work+0xad4/0x1bc0 kernel/workqueue.c:2084 >> >>> #1: (net_cleanup_work){+.+.}, at: [<6c7c48a3>] >> >>> process_one_work+0xb2f/0x1bc0 kernel/workqueue.c:2088 >> >>> #2: (net_mutex){+.+.}, at: [ ] cleanup_net+0x247/0xb60 >> >>> net/core/net_namespace.c:450 >> >>> #3: (rtnl_mutex){+.+.}, at: [<53390f0b>] rtnl_lock+0x17/0x20 >> >>> net/core/rtnetlink.c:74 >> >>> 3 locks held by kworker/1:0/17: >> >>> #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: >> >>> [ ] >> >>> __write_once_size include/linux/compiler.h:212 [inline] >> >>> #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: >> >>> [ ] >> >>> atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline] >> >>> #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: >> >>> [ ] >> >>> atomic_long_set include/asm-generic/atomic-long.h:57 [inline] >> >>> #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: >> >>> [ ] >> >>> set_work_data kernel/workqueue.c:619 [inline] >> >>> #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: >> >>> [ ] >> >>> set_work_pool_and_clear_pending kernel/workqueue.c:646 [inline] >> >>> #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: >> >>> [ ] >> >>> process_one_work+0xad4/0x1bc0 
kernel/workqueue.c:2084 >> >>> #1: ((addr_chk_work).work){+.+.}, at: [<6c7c48a3>] >> >>> process_one_work+0xb2f/0x1bc0 kernel/workqueue.c:2088 >> >>> #2: (rtnl_mutex){+.+.}, at: [<53390f0b>] rtnl_lock+0x17/0x20 >> >>> net/core/rtnetlink.c:74 >> >>> 2 locks held by khungtaskd/675: >> >>> #0: (rcu_read_lock){}, at: [<587c8471>] >> >>> check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline] >> >>> #0: (rcu_read_lock){}, at: [<587c8471>] >> >>> watchdog+0x1c5/0xd60 >> >>> kernel/hung_task.c:249 >> >>> #1: (tasklist_lock){.+.+}, at: [<5288685e>] >> >>> debug_show_all_locks+0xd3/0x400 kernel/locking/lockdep.c:4554 >> >>> 1 lock held by
Re: INFO: task hung in bpf_exit_net
On Tue, Dec 19, 2017 at 7:20 PM, David Ahern wrote: > On 12/19/17 5:47 AM, Dmitry Vyukov wrote: >> On Tue, Dec 19, 2017 at 1:36 PM, syzbot >> >> wrote: >>> Hello, >>> >>> syzkaller hit the following crash on >>> 7ceb97a071e80f1b5e4cd5a36de135612a836388 >>> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master >>> compiler: gcc (GCC) 7.1.1 20170620 >>> .config is attached >>> Raw console output is attached. >>> >>> Unfortunately, I don't have any reproducer for this bug yet. >>> >>> >>> sctp: sctp_transport_update_pmtu: Reported pmtu 508 too low, using default >>> minimum of 512 >>> INFO: task kworker/u4:0:5 blocked for more than 120 seconds. >>> Not tainted 4.15.0-rc2-next-20171205+ #59 >>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. >>> kworker/u4:0D15808 5 2 0x8000 >>> Workqueue: netns cleanup_net >>> Call Trace: >>> context_switch kernel/sched/core.c:2800 [inline] >>> __schedule+0x8eb/0x2060 kernel/sched/core.c:3376 >>> schedule+0xf5/0x430 kernel/sched/core.c:3435 >>> schedule_preempt_disabled+0x10/0x20 kernel/sched/core.c:3493 >>> __mutex_lock_common kernel/locking/mutex.c:833 [inline] >>> __mutex_lock+0xaad/0x1a80 kernel/locking/mutex.c:893 >>> mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908 >>> rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 >>> tc_action_net_exit include/net/act_api.h:125 [inline] >>> bpf_exit_net+0x1a2/0x340 net/sched/act_bpf.c:408 >>> ops_exit_list.isra.6+0xae/0x150 net/core/net_namespace.c:142 >>> cleanup_net+0x5c7/0xb60 net/core/net_namespace.c:484 >>> process_one_work+0xbfd/0x1bc0 kernel/workqueue.c:2113 >>> worker_thread+0x223/0x1990 kernel/workqueue.c:2247 >>> kthread+0x37a/0x440 kernel/kthread.c:238 >>> ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:517 >>> >>> Showing all locks held in the system: >>> 4 locks held by kworker/u4:0/5: >>> #0: ((wq_completion)"%s""netns"){+.+.}, at: [] >>> __write_once_size include/linux/compiler.h:212 [inline] >>> #0: 
((wq_completion)"%s""netns"){+.+.}, at: [ ] >>> atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline] >>> #0: ((wq_completion)"%s""netns"){+.+.}, at: [ ] >>> atomic_long_set include/asm-generic/atomic-long.h:57 [inline] >>> #0: ((wq_completion)"%s""netns"){+.+.}, at: [ ] >>> set_work_data kernel/workqueue.c:619 [inline] >>> #0: ((wq_completion)"%s""netns"){+.+.}, at: [ ] >>> set_work_pool_and_clear_pending kernel/workqueue.c:646 [inline] >>> #0: ((wq_completion)"%s""netns"){+.+.}, at: [ ] >>> process_one_work+0xad4/0x1bc0 kernel/workqueue.c:2084 >>> #1: (net_cleanup_work){+.+.}, at: [<6c7c48a3>] >>> process_one_work+0xb2f/0x1bc0 kernel/workqueue.c:2088 >>> #2: (net_mutex){+.+.}, at: [ ] cleanup_net+0x247/0xb60 >>> net/core/net_namespace.c:450 >>> #3: (rtnl_mutex){+.+.}, at: [<53390f0b>] rtnl_lock+0x17/0x20 >>> net/core/rtnetlink.c:74 >>> 3 locks held by kworker/1:0/17: >>> #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [ ] >>> __write_once_size include/linux/compiler.h:212 [inline] >>> #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [ ] >>> atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline] >>> #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [ ] >>> atomic_long_set include/asm-generic/atomic-long.h:57 [inline] >>> #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [ ] >>> set_work_data kernel/workqueue.c:619 [inline] >>> #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [ ] >>> set_work_pool_and_clear_pending kernel/workqueue.c:646 [inline] >>> #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [ ] >>> process_one_work+0xad4/0x1bc0 kernel/workqueue.c:2084 >>> #1: ((addr_chk_work).work){+.+.}, at: [<6c7c48a3>] >>> process_one_work+0xb2f/0x1bc0 kernel/workqueue.c:2088 >>> #2: (rtnl_mutex){+.+.}, at: [<53390f0b>] rtnl_lock+0x17/0x20 >>> net/core/rtnetlink.c:74 >>> 2 locks held by khungtaskd/675: >>> #0: (rcu_read_lock){}, at: [<587c8471>] >>> check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline] >>> #0: 
(rcu_read_lock){}, at: [<587c8471>] watchdog+0x1c5/0xd60 >>> kernel/hung_task.c:249 >>> #1: (tasklist_lock){.+.+}, at: [<5288685e>] >>> debug_show_all_locks+0xd3/0x400 kernel/locking/lockdep.c:4554 >>> 1 lock held by rsyslogd/2974: >>> #0: (&f->f_pos_lock){+.+.}, at: [<11e00499>] >>> __fdget_pos+0x131/0x1a0 fs/file.c:770 >>> 2 locks held by getty/3056: >>> #0: (&tty->ldisc_sem){}, at: [ ] >>> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 >>> #1: (&ldata->atomic_read_lock){+.+.}, at: [ ] >>> n_tty_read+0x2f2/0x1a10 drivers/tty/n_tty.c:2131 >>> 2 locks held by getty/3057: >>> #0: (&tty->ldisc_sem){}, a
Re: INFO: task hung in bpf_exit_net
On 12/19/17 5:47 AM, Dmitry Vyukov wrote: > On Tue, Dec 19, 2017 at 1:36 PM, syzbot > > wrote: >> Hello, >> >> syzkaller hit the following crash on >> 7ceb97a071e80f1b5e4cd5a36de135612a836388 >> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master >> compiler: gcc (GCC) 7.1.1 20170620 >> .config is attached >> Raw console output is attached. >> >> Unfortunately, I don't have any reproducer for this bug yet. >> >> >> sctp: sctp_transport_update_pmtu: Reported pmtu 508 too low, using default >> minimum of 512 >> INFO: task kworker/u4:0:5 blocked for more than 120 seconds. >> Not tainted 4.15.0-rc2-next-20171205+ #59 >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> kworker/u4:0D15808 5 2 0x8000 >> Workqueue: netns cleanup_net >> Call Trace: >> context_switch kernel/sched/core.c:2800 [inline] >> __schedule+0x8eb/0x2060 kernel/sched/core.c:3376 >> schedule+0xf5/0x430 kernel/sched/core.c:3435 >> schedule_preempt_disabled+0x10/0x20 kernel/sched/core.c:3493 >> __mutex_lock_common kernel/locking/mutex.c:833 [inline] >> __mutex_lock+0xaad/0x1a80 kernel/locking/mutex.c:893 >> mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908 >> rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 >> tc_action_net_exit include/net/act_api.h:125 [inline] >> bpf_exit_net+0x1a2/0x340 net/sched/act_bpf.c:408 >> ops_exit_list.isra.6+0xae/0x150 net/core/net_namespace.c:142 >> cleanup_net+0x5c7/0xb60 net/core/net_namespace.c:484 >> process_one_work+0xbfd/0x1bc0 kernel/workqueue.c:2113 >> worker_thread+0x223/0x1990 kernel/workqueue.c:2247 >> kthread+0x37a/0x440 kernel/kthread.c:238 >> ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:517 >> >> Showing all locks held in the system: >> 4 locks held by kworker/u4:0/5: >> #0: ((wq_completion)"%s""netns"){+.+.}, at: [] >> __write_once_size include/linux/compiler.h:212 [inline] >> #0: ((wq_completion)"%s""netns"){+.+.}, at: [ ] >> atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline] >> #0: 
((wq_completion)"%s""netns"){+.+.}, at: [ ] >> atomic_long_set include/asm-generic/atomic-long.h:57 [inline] >> #0: ((wq_completion)"%s""netns"){+.+.}, at: [ ] >> set_work_data kernel/workqueue.c:619 [inline] >> #0: ((wq_completion)"%s""netns"){+.+.}, at: [ ] >> set_work_pool_and_clear_pending kernel/workqueue.c:646 [inline] >> #0: ((wq_completion)"%s""netns"){+.+.}, at: [ ] >> process_one_work+0xad4/0x1bc0 kernel/workqueue.c:2084 >> #1: (net_cleanup_work){+.+.}, at: [<6c7c48a3>] >> process_one_work+0xb2f/0x1bc0 kernel/workqueue.c:2088 >> #2: (net_mutex){+.+.}, at: [ ] cleanup_net+0x247/0xb60 >> net/core/net_namespace.c:450 >> #3: (rtnl_mutex){+.+.}, at: [<53390f0b>] rtnl_lock+0x17/0x20 >> net/core/rtnetlink.c:74 >> 3 locks held by kworker/1:0/17: >> #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [ ] >> __write_once_size include/linux/compiler.h:212 [inline] >> #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [ ] >> atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline] >> #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [ ] >> atomic_long_set include/asm-generic/atomic-long.h:57 [inline] >> #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [ ] >> set_work_data kernel/workqueue.c:619 [inline] >> #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [ ] >> set_work_pool_and_clear_pending kernel/workqueue.c:646 [inline] >> #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [ ] >> process_one_work+0xad4/0x1bc0 kernel/workqueue.c:2084 >> #1: ((addr_chk_work).work){+.+.}, at: [<6c7c48a3>] >> process_one_work+0xb2f/0x1bc0 kernel/workqueue.c:2088 >> #2: (rtnl_mutex){+.+.}, at: [<53390f0b>] rtnl_lock+0x17/0x20 >> net/core/rtnetlink.c:74 >> 2 locks held by khungtaskd/675: >> #0: (rcu_read_lock){}, at: [<587c8471>] >> check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline] >> #0: (rcu_read_lock){}, at: [<587c8471>] watchdog+0x1c5/0xd60 >> kernel/hung_task.c:249 >> #1: (tasklist_lock){.+.+}, at: [<5288685e>] >> 
debug_show_all_locks+0xd3/0x400 kernel/locking/lockdep.c:4554 >> 1 lock held by rsyslogd/2974: >> #0: (&f->f_pos_lock){+.+.}, at: [<11e00499>] >> __fdget_pos+0x131/0x1a0 fs/file.c:770 >> 2 locks held by getty/3056: >> #0: (&tty->ldisc_sem){}, at: [ ] >> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 >> #1: (&ldata->atomic_read_lock){+.+.}, at: [ ] >> n_tty_read+0x2f2/0x1a10 drivers/tty/n_tty.c:2131 >> 2 locks held by getty/3057: >> #0: (&tty->ldisc_sem){}, at: [ ] >> ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 >> #1: (&ldata->atomic_read_lock){+.+.}, at: [ ]
Re: INFO: task hung in bpf_exit_net
On Tue, Dec 19, 2017 at 8:47 PM, Dmitry Vyukov wrote:
> On Tue, Dec 19, 2017 at 1:36 PM, syzbot wrote:
>> Hello,
>>
>> syzkaller hit the following crash on
>> 7ceb97a071e80f1b5e4cd5a36de135612a836388
>> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
>> compiler: gcc (GCC) 7.1.1 20170620
>> .config is attached
>> Raw console output is attached.
>>
>> Unfortunately, I don't have any reproducer for this bug yet.
>>
>> sctp: sctp_transport_update_pmtu: Reported pmtu 508 too low, using default
>> minimum of 512
>> INFO: task kworker/u4:0:5 blocked for more than 120 seconds.
>> Not tainted 4.15.0-rc2-next-20171205+ #59
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> kworker/u4:0 D15808 5 2 0x8000
>> Workqueue: netns cleanup_net
>> Call Trace:
>> context_switch kernel/sched/core.c:2800 [inline]
>> __schedule+0x8eb/0x2060 kernel/sched/core.c:3376
>> schedule+0xf5/0x430 kernel/sched/core.c:3435
>> schedule_preempt_disabled+0x10/0x20 kernel/sched/core.c:3493
>> __mutex_lock_common kernel/locking/mutex.c:833 [inline]
>> __mutex_lock+0xaad/0x1a80 kernel/locking/mutex.c:893
>> mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
>> rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74
>> tc_action_net_exit include/net/act_api.h:125 [inline]
>> bpf_exit_net+0x1a2/0x340 net/sched/act_bpf.c:408
>> ops_exit_list.isra.6+0xae/0x150 net/core/net_namespace.c:142
>> cleanup_net+0x5c7/0xb60 net/core/net_namespace.c:484
>> process_one_work+0xbfd/0x1bc0 kernel/workqueue.c:2113
>> worker_thread+0x223/0x1990 kernel/workqueue.c:2247
>> kthread+0x37a/0x440 kernel/kthread.c:238
>> ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:517
>> [...]
>> Call Trace:
>> serial_in drivers/tty/serial/8250/8250.h:111 [inline]
>> wait_for_xmitr+0x93/0x1e0 drivers/tty/serial/8250/8250_port.c:2033

I saw this call trace on both the 'bpf_exit_net task hung' and the
'cleanup_net task hung' reports. Note that when the cpu is here, it is
still holding the rtnl_lock in both cases: one is in
nl80211_dump_interface(), and the other one is in dev_ioctl(). I noticed
this patch:

commit 54f19b4a679149130f78413c421a5780e90a9d0a
Author: Jiri Olsa
Date:   Wed Sep 21 16:43:15 2016 +0200

    tty/serial/8250: Touch NMI watchdog in wait_for_xmitr

It means that a watchdog timeout could be triggered here before, and that
patch fixed it by resetting the NMI watchdog with a call to
touch_nmi_watchdog(). But it missed that the task is still holding
rtnl_lock(), so other threads may hit the hung-task watchdog while
waiting to acquire rtnl_lock().

>> serial8250_console_putchar+0x1f/0x60
>> drivers/tty/serial/8250/8250_port.c:3170
>> uart_console_write+0xac/0xe0 drivers/tty/serial/serial_core.c:1858
>> serial8250_console_write+0x647/0xa20
>> drivers/tty/serial/8250/8250_port.c:3236
>> univ8250_console_write+0x5f/0x70 drivers/tty/serial/8250/8250_core.c:590
>> call_console_drivers kernel/printk/printk.c:1574 [inline]
>> console_unlock+0x788/0xd70 kernel/printk/printk.c:2233
>> vprintk_emit+0x4ad/0x590 kernel/printk/printk.c:1757
>> vprintk_default+0x28/0x30 kernel/printk/printk.c:1796
>> vprintk_func+0x57/0xc0 kernel/printk/printk_safe.c:379
>> printk+0xaa/0xca kernel/printk/printk.c:1829
>> nla_parse+0x374/0x3d0 lib/nlattr.c:257
>> nlmsg_parse include/net/netlink.h:398 [inline]
>> nl80211_dump_wiphy_parse.isra.37.constprop.83+0x138/0x5c0
>> net/wireless/nl80211.c:1920
>> nl80211_dump_interface+0x596/0x820 net/wireless/nl80211.c:2660
>> genl_lock_dumpit+0x68/0x90 net/netlink/genetlink.c:480
>> netlink_dump+0x48c/0xce0 net/netlink/af_netlink.c:2186
>> __netlink_dump_start+0x4f0/0x6d0 net/netlink/af_netlink.c:2283
>> genl_family_rcv_msg+0xd27/0xfc0 net/netlink/genetlink.c:548
>> genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:624
>> netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2405
>> genl_rcv+0x28/0x40 net/netlink/genetlink.c:635
>> netlink_unicast_kernel net/netlink/af_netlink.c:1272 [inline]
>> netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1298
>> netlink_sendmsg+0xa4a/0xe70 net/netlink/af_netlink.c:1861
>> sock_sendmsg_nosec net/socket.c:636 [inline]
>> sock_sendmsg+0xca/0x110 net/socket.c:646
>> sock_write_iter+0x320/0x5e0 net/socket.c:915
>> call_write_iter include/linux/fs.h:1776 [inline]
>> new_sync_write fs/read_write.c:469 [inline]
>> __vfs_write+0x68a/0x970 fs/read_write.c:482
>> vfs_write+0x18f/0x510 fs/read_write.c:544
>> SYSC_write fs/read_write.c:589 [inline]
>> SyS_write+0xef/0x220 fs/read_write.c:581
>> entry_SYSCALL_64_fastpath+0x1f/0x96
>> RIP: 0033:0x4529d9
>> RSP: 002b:7f6d52e3ec58 EFLAGS: 0212 ORIG_RAX: 0001
>> RAX: ffda RBX: 7f6d52e3f700 RCX: 004529d9
>> RDX: 0024 RSI: 20454000 RDI: 0016
>> RBP: R08: R09:
>> R10: R11: 0212 R12:
>> R13: 00a6f7ff R14:
Re: INFO: task hung in bpf_exit_net
On Tue, Dec 19, 2017 at 1:36 PM, syzbot wrote: > Hello, > > syzkaller hit the following crash on > 7ceb97a071e80f1b5e4cd5a36de135612a836388 > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master > compiler: gcc (GCC) 7.1.1 20170620 > .config is attached > Raw console output is attached. > > Unfortunately, I don't have any reproducer for this bug yet. > > > sctp: sctp_transport_update_pmtu: Reported pmtu 508 too low, using default > minimum of 512 > INFO: task kworker/u4:0:5 blocked for more than 120 seconds. > Not tainted 4.15.0-rc2-next-20171205+ #59 > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > kworker/u4:0D15808 5 2 0x8000 > Workqueue: netns cleanup_net > Call Trace: > context_switch kernel/sched/core.c:2800 [inline] > __schedule+0x8eb/0x2060 kernel/sched/core.c:3376 > schedule+0xf5/0x430 kernel/sched/core.c:3435 > schedule_preempt_disabled+0x10/0x20 kernel/sched/core.c:3493 > __mutex_lock_common kernel/locking/mutex.c:833 [inline] > __mutex_lock+0xaad/0x1a80 kernel/locking/mutex.c:893 > mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908 > rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 > tc_action_net_exit include/net/act_api.h:125 [inline] > bpf_exit_net+0x1a2/0x340 net/sched/act_bpf.c:408 > ops_exit_list.isra.6+0xae/0x150 net/core/net_namespace.c:142 > cleanup_net+0x5c7/0xb60 net/core/net_namespace.c:484 > process_one_work+0xbfd/0x1bc0 kernel/workqueue.c:2113 > worker_thread+0x223/0x1990 kernel/workqueue.c:2247 > kthread+0x37a/0x440 kernel/kthread.c:238 > ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:517 > > Showing all locks held in the system: > 4 locks held by kworker/u4:0/5: > #0: ((wq_completion)"%s""netns"){+.+.}, at: [] > __write_once_size include/linux/compiler.h:212 [inline] > #0: ((wq_completion)"%s""netns"){+.+.}, at: [ ] > atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline] > #0: ((wq_completion)"%s""netns"){+.+.}, at: [ ] > atomic_long_set include/asm-generic/atomic-long.h:57 
[inline] > #0: ((wq_completion)"%s""netns"){+.+.}, at: [ ] > set_work_data kernel/workqueue.c:619 [inline] > #0: ((wq_completion)"%s""netns"){+.+.}, at: [ ] > set_work_pool_and_clear_pending kernel/workqueue.c:646 [inline] > #0: ((wq_completion)"%s""netns"){+.+.}, at: [ ] > process_one_work+0xad4/0x1bc0 kernel/workqueue.c:2084 > #1: (net_cleanup_work){+.+.}, at: [<6c7c48a3>] > process_one_work+0xb2f/0x1bc0 kernel/workqueue.c:2088 > #2: (net_mutex){+.+.}, at: [ ] cleanup_net+0x247/0xb60 > net/core/net_namespace.c:450 > #3: (rtnl_mutex){+.+.}, at: [<53390f0b>] rtnl_lock+0x17/0x20 > net/core/rtnetlink.c:74 > 3 locks held by kworker/1:0/17: > #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [ ] > __write_once_size include/linux/compiler.h:212 [inline] > #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [ ] > atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline] > #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [ ] > atomic_long_set include/asm-generic/atomic-long.h:57 [inline] > #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [ ] > set_work_data kernel/workqueue.c:619 [inline] > #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [ ] > set_work_pool_and_clear_pending kernel/workqueue.c:646 [inline] > #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [ ] > process_one_work+0xad4/0x1bc0 kernel/workqueue.c:2084 > #1: ((addr_chk_work).work){+.+.}, at: [<6c7c48a3>] > process_one_work+0xb2f/0x1bc0 kernel/workqueue.c:2088 > #2: (rtnl_mutex){+.+.}, at: [<53390f0b>] rtnl_lock+0x17/0x20 > net/core/rtnetlink.c:74 > 2 locks held by khungtaskd/675: > #0: (rcu_read_lock){}, at: [<587c8471>] > check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline] > #0: (rcu_read_lock){}, at: [<587c8471>] watchdog+0x1c5/0xd60 > kernel/hung_task.c:249 > #1: (tasklist_lock){.+.+}, at: [<5288685e>] > debug_show_all_locks+0xd3/0x400 kernel/locking/lockdep.c:4554 > 1 lock held by rsyslogd/2974: > #0: (&f->f_pos_lock){+.+.}, at: [<11e00499>] > 
__fdget_pos+0x131/0x1a0 fs/file.c:770 > 2 locks held by getty/3056: > #0: (&tty->ldisc_sem){}, at: [ ] > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 > #1: (&ldata->atomic_read_lock){+.+.}, at: [ ] > n_tty_read+0x2f2/0x1a10 drivers/tty/n_tty.c:2131 > 2 locks held by getty/3057: > #0: (&tty->ldisc_sem){}, at: [ ] > ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 > #1: (&ldata->atomic_read_lock){+.+.}, at: [ ] > n_tty_read+0x2f2/0x1a10 drivers/tty/n_tty.c:2131 > 2 locks held by getty/3058: > #0: (&tty->ldisc_sem){}, at: [ ] >