Re: net: deadlock on genl_mutex
On Sun, Jan 29, 2017 at 2:11 AM, Dmitry Vyukov wrote:
> On Fri, Dec 9, 2016 at 6:08 AM, Cong Wang wrote:
>>>> Chain exists of:
>>>> Possible unsafe locking scenario:
>>>>
>>>>        CPU0                    CPU1
>>>>
>>>>   lock(genl_mutex);
>>>>                                lock(nlk->cb_mutex);
>>>>                                lock(genl_mutex);
>>>>   lock(rtnl_mutex);
>>>>
>>>>  *** DEADLOCK ***
>>>
>>> This one looks legitimate, because nlk->cb_mutex could be rtnl_mutex.
>>> Let me think about it.
>>
>> Never mind. Actually both reports in this thread are legitimate.
>>
>> I know what happened now, the lock chain is so long, 4 locks are involved
>> to form a chain!!!
>>
>> Let me think about how to break the chain.
>
> Cong, any success with breaking the chain?

No luck yet. Each part of the chain seems legit, not sure which one
could be reordered. :-/
Re: net: deadlock on genl_mutex
On Fri, Dec 9, 2016 at 6:08 AM, Cong Wang wrote:
>>> Chain exists of:
>>> Possible unsafe locking scenario:
>>>
>>>        CPU0                    CPU1
>>>
>>>   lock(genl_mutex);
>>>                                lock(nlk->cb_mutex);
>>>                                lock(genl_mutex);
>>>   lock(rtnl_mutex);
>>>
>>>  *** DEADLOCK ***
>>
>> This one looks legitimate, because nlk->cb_mutex could be rtnl_mutex.
>> Let me think about it.
>
> Never mind. Actually both reports in this thread are legitimate.
>
> I know what happened now, the lock chain is so long, 4 locks are involved
> to form a chain!!!
>
> Let me think about how to break the chain.

Cong, any success with breaking the chain?
Still happening on f0ad17712b9f71c24e2b8b9725230ef57232377f. Or is it a
different one?

[ INFO: possible circular locking dependency detected ]
4.10.0-rc3+ #4 Not tainted
---
syz-executor9/2705 is trying to acquire lock:
 (genl_mutex){+.+.+.}, at: [] genl_lock net/netlink/genetlink.c:32 [inline]
 (genl_mutex){+.+.+.}, at: [] genl_family_rcv_msg+0xdae/0x1040 net/netlink/genetlink.c:547

but task is already holding lock:
 (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:70

which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:

-> #1 (rtnl_mutex){+.+.+.}:
       [] validate_chain kernel/locking/lockdep.c:2265 [inline]
       [] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
       [] lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
       [] __mutex_lock_common kernel/locking/mutex.c:639 [inline]
       [] mutex_lock_nested+0x290/0x1730 kernel/locking/mutex.c:753
       [] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:70
       [] nl80211_pre_doit+0x2fe/0x570 net/wireless/nl80211.c:11847
       [] genl_family_rcv_msg+0x760/0x1040 net/netlink/genetlink.c:591
       [] genl_rcv_msg+0x19a/0x330 net/netlink/genetlink.c:620
       [] netlink_rcv_skb+0x2ab/0x390 net/netlink/af_netlink.c:2298
       [] genl_rcv+0x28/0x40 net/netlink/genetlink.c:631
       [] netlink_unicast_kernel net/netlink/af_netlink.c:1231 [inline]
       [] netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1257
       [] netlink_sendmsg+0xa9f/0xe50 net/netlink/af_netlink.c:1803
       [] sock_sendmsg_nosec net/socket.c:635 [inline]
       [] sock_sendmsg+0xca/0x110 net/socket.c:645
       [] ___sys_sendmsg+0x8fa/0x9f0 net/socket.c:1985
       [] __sys_sendmsg+0x138/0x300 net/socket.c:2019
       [] SYSC_sendmsg net/socket.c:2030 [inline]
       [] SyS_sendmsg+0x2d/0x50 net/socket.c:2026
       [] entry_SYSCALL_64_fastpath+0x1f/0xc2

-> #0 (genl_mutex){+.+.+.}:
       [] check_prev_add kernel/locking/lockdep.c:1828 [inline]
       [] check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1938
       [] validate_chain kernel/locking/lockdep.c:2265 [inline]
       [] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
       [] lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
       [] __mutex_lock_common kernel/locking/mutex.c:639 [inline]
       [] mutex_lock_nested+0x290/0x1730 kernel/locking/mutex.c:753
       [] genl_lock net/netlink/genetlink.c:32 [inline]
       [] genl_family_rcv_msg+0xdae/0x1040 net/netlink/genetlink.c:547
       [] genl_rcv_msg+0x19a/0x330 net/netlink/genetlink.c:620
       [] netlink_rcv_skb+0x2ab/0x390 net/netlink/af_netlink.c:2298
       [] genl_rcv+0x28/0x40 net/netlink/genetlink.c:631
       [] netlink_unicast_kernel net/netlink/af_netlink.c:1231 [inline]
       [] netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1257
       [] netlink_sendmsg+0xa9f/0xe50 net/netlink/af_netlink.c:1803
       [] sock_sendmsg_nosec net/socket.c:635 [inline]
       [] sock_sendmsg+0xca/0x110 net/socket.c:645
       [] sock_write_iter+0x326/0x600 net/socket.c:848
       [] new_sync_write fs/read_write.c:499 [inline]
       [] __vfs_write+0x483/0x740 fs/read_write.c:512
       [] vfs_write+0x187/0x530 fs/read_write.c:560
       [] SYSC_write fs/read_write.c:607 [inline]
       [] SyS_write+0xfb/0x230 fs/read_write.c:599
       [] entry_SYSCALL_64_fastpath+0x1f/0xc2

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1

  lock(rtnl_mutex);
                               lock(genl_mutex);
                               lock(rtnl_mutex);
  lock(genl_mutex);

 *** DEADLOCK ***

2 locks held by syz-executor9/2705:
 #0: (cb_lock){++}, at: [] genl_rcv+0x19/0x40 net/netlink/genetlink.c:630
 #1: (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:70

stack backtrace:
CPU: 1 PID: 2705 Comm: syz-executor9 Not tainted 4.10.0-rc3+ #4
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:15 [inline]
 dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
 print_circular_bug+0x307/0x3b0 kernel/locking/lockdep.c:1202
 check_prev_add kernel/locking/lockdep.c:1828 [inline]
 check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1938
 validate_chain kernel/locking/lockdep.c:2265 [inline]
 __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
 lock_acquire+0x2a1/0x630
Re: net: deadlock on genl_mutex
On Fri, Dec 9, 2016 at 6:08 AM, Cong Wang wrote:
> On Thu, Dec 8, 2016 at 4:32 PM, Cong Wang wrote:
>> On Thu, Dec 8, 2016 at 9:16 AM, Dmitry Vyukov wrote:
>>> Chain exists of:
>>> Possible unsafe locking scenario:
>>>
>>>        CPU0                    CPU1
>>>
>>>   lock(genl_mutex);
>>>                                lock(nlk->cb_mutex);
>>>                                lock(genl_mutex);
>>>   lock(rtnl_mutex);
>>>
>>>  *** DEADLOCK ***
>>
>> This one looks legitimate, because nlk->cb_mutex could be rtnl_mutex.
>> Let me think about it.
>
> Never mind. Actually both reports in this thread are legitimate.
>
> I know what happened now, the lock chain is so long, 4 locks are involved
> to form a chain!!!
>
> Let me think about how to break the chain.

Seems to be a related one, now on nfnl_lock:

[ INFO: possible circular locking dependency detected ]
4.9.0-rc8+ #82 Not tainted
---
syz-executor3/10151 is trying to acquire lock:
 ([i].mutex){+.+.+.}, at: [] nfnl_lock+0x2d/0x30 net/netfilter/nfnetlink.c:61

but task is already holding lock:
 (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x1c/0x20 net/core/rtnetlink.c:70

which lock already depends on the new lock.
the existing dependency chain (in reverse order) is: [ 231.942041] [< inline >] validate_chain kernel/locking/lockdep.c:2265 [ 231.942041] [] __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338 floppy0: disk absent or changed during operation floppy0: disk absent or changed during operation [ 231.950342] [] lock_acquire+0x2a2/0x790 kernel/locking/lockdep.c:3749 [ 231.950342] [< inline >] __mutex_lock_common kernel/locking/mutex.c:521 [ 231.950342] [] mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621 [ 231.950342] [] rtnl_lock+0x1c/0x20 net/core/rtnetlink.c:70 [ 231.950342] [] nl80211_pre_doit+0x309/0x5b0 net/wireless/nl80211.c:11750 [ 231.950342] [] genl_family_rcv_msg+0x780/0x1070 net/netlink/genetlink.c:631 [ 231.950342] [] genl_rcv_msg+0x1b0/0x260 net/netlink/genetlink.c:660 [ 231.950342] [] netlink_rcv_skb+0x2bc/0x3a0 net/netlink/af_netlink.c:2298 [ 231.950342] [] genl_rcv+0x2d/0x40 net/netlink/genetlink.c:671 [ 231.950342] [< inline >] netlink_unicast_kernel net/netlink/af_netlink.c:1231 [ 231.950342] [] netlink_unicast+0x51a/0x740 net/netlink/af_netlink.c:1257 [ 231.950342] [] netlink_sendmsg+0xaa4/0xe50 net/netlink/af_netlink.c:1803 [ 231.950342] [< inline >] sock_sendmsg_nosec net/socket.c:621 [ 231.950342] [] sock_sendmsg+0xcf/0x110 net/socket.c:631 [ 231.950342] [] sock_write_iter+0x32b/0x620 net/socket.c:829 [ 231.950342] [< inline >] new_sync_write fs/read_write.c:499 [ 231.950342] [] __vfs_write+0x4fe/0x830 fs/read_write.c:512 [ 231.950342] [] vfs_write+0x175/0x4e0 fs/read_write.c:560 [ 231.950342] [< inline >] SYSC_write fs/read_write.c:607 [ 231.950342] [] SyS_write+0x100/0x240 fs/read_write.c:599 [ 231.950342] [] entry_SYSCALL_64_fastpath+0x23/0xc6 [ 231.950342] [< inline >] validate_chain kernel/locking/lockdep.c:2265 [ 231.950342] [] __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338 [ 231.950342] [] lock_acquire+0x2a2/0x790 kernel/locking/lockdep.c:3749 [ 231.950342] [< inline >] __mutex_lock_common 
kernel/locking/mutex.c:521 [ 231.950342] [] mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621 [ 231.950342] [< inline >] genl_lock net/netlink/genetlink.c:31 [ 231.950342] [] genl_lock_dumpit+0x46/0xa0 net/netlink/genetlink.c:518 [ 231.950342] [] netlink_dump+0x57c/0xd70 net/netlink/af_netlink.c:2127 [ 231.950342] [] __netlink_dump_start+0x4ea/0x760 net/netlink/af_netlink.c:2217 [ 231.950342] [] genl_family_rcv_msg+0xdc9/0x1070 net/netlink/genetlink.c:586 [ 231.950342] [] genl_rcv_msg+0x1b0/0x260 net/netlink/genetlink.c:660 [ 231.950342] [] netlink_rcv_skb+0x2bc/0x3a0 net/netlink/af_netlink.c:2298 [ 231.950342] [] genl_rcv+0x2d/0x40 net/netlink/genetlink.c:671 [ 231.950342] [< inline >] netlink_unicast_kernel net/netlink/af_netlink.c:1231 [ 231.950342] [] netlink_unicast+0x51a/0x740 net/netlink/af_netlink.c:1257 [ 231.950342] [] netlink_sendmsg+0xaa4/0xe50 net/netlink/af_netlink.c:1803 [ 231.950342] [< inline >] sock_sendmsg_nosec net/socket.c:621 [ 231.950342] [] sock_sendmsg+0xcf/0x110 net/socket.c:631 [ 231.950342] [] sock_write_iter+0x32b/0x620 net/socket.c:829 [ 231.950342] [] do_iter_readv_writev+0x363/0x670 fs/read_write.c:695 [ 231.950342] [] do_readv_writev+0x431/0x9b0 fs/read_write.c:872 [
Re: net: deadlock on genl_mutex
On Thu, Dec 8, 2016 at 4:32 PM, Cong Wang wrote:
> On Thu, Dec 8, 2016 at 9:16 AM, Dmitry Vyukov wrote:
>> Chain exists of:
>> Possible unsafe locking scenario:
>>
>>        CPU0                    CPU1
>>
>>   lock(genl_mutex);
>>                                lock(nlk->cb_mutex);
>>                                lock(genl_mutex);
>>   lock(rtnl_mutex);
>>
>>  *** DEADLOCK ***
>
> This one looks legitimate, because nlk->cb_mutex could be rtnl_mutex.
> Let me think about it.

Never mind. Actually both reports in this thread are legitimate.

I know what happened now, the lock chain is so long, 4 locks are involved
to form a chain!!!

Let me think about how to break the chain.
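To make the four-lock chain concrete, here is a minimal sketch (plain Python, not kernel code) of the kind of check lockdep performs: record a "held A, then acquired B" edge for each nesting, then look for a cycle. The edge list is only my reading of the traces posted in this thread and should be treated as an assumption; `nfnl_mutex` is a stand-in name for the `[i].mutex` taken by nfnl_lock(), and `EDGES` and `find_cycle` are hypothetical names.

```python
from typing import Dict, List, Optional

# "Held A, then acquired B" edges, as read from the traces in this thread.
# This edge list is an assumption, not lockdep's actual recorded chain.
EDGES: Dict[str, List[str]] = {
    "genl_mutex": ["rtnl_mutex"],      # nl80211_pre_doit() under genl_rcv_msg()
    "rtnl_mutex": ["nfnl_mutex"],      # nfnl_lock() while holding rtnl_mutex
    "nfnl_mutex": ["nlk->cb_mutex"],   # __netlink_dump_start() from nfnetlink
    "nlk->cb_mutex": ["genl_mutex"],   # genl_lock_dumpit() from netlink_dump()
}


def find_cycle(edges: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one lock-order cycle as a list of lock names, or None."""
    visiting: List[str] = []  # locks on the current depth-first path
    done = set()              # locks fully explored without finding a cycle

    def dfs(lock: str) -> Optional[List[str]]:
        if lock in visiting:
            # Back edge: the path from the first visit to here is a cycle.
            return visiting[visiting.index(lock):] + [lock]
        if lock in done:
            return None
        visiting.append(lock)
        for nxt in edges.get(lock, []):
            cycle = dfs(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(lock)
        return None

    for start in edges:
        cycle = dfs(start)
        if cycle:
            return cycle
    return None


if __name__ == "__main__":
    print(find_cycle(EDGES))
```

Dropping any single edge from `EDGES` makes `find_cycle` return None, which is what "breaking the chain" amounts to in this model. Which real nesting can actually be reordered is a separate question that the sketch does not answer.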
Re: net: deadlock on genl_mutex
On Thu, Dec 8, 2016 at 9:16 AM, Dmitry Vyukov wrote:
> Chain exists of:
> Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>
>   lock(genl_mutex);
>                                lock(nlk->cb_mutex);
>                                lock(genl_mutex);
>   lock(rtnl_mutex);
>
>  *** DEADLOCK ***

This one looks legitimate, because nlk->cb_mutex could be rtnl_mutex.
Let me think about it.
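The point about nlk->cb_mutex possibly being rtnl_mutex can be illustrated with a small sketch (plain Python, not kernel code). It takes the two acquisition orders from the quoted scenario and reports lock pairs taken in opposite orders, optionally after resolving an alias. The function name `inversions` and the `CPU0`/`CPU1` lists are hypothetical, and whether the alias holds for a given socket is exactly the assumption being tested.

```python
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple


def inversions(orders: List[List[str]],
               alias: Optional[Dict[str, str]] = None) -> Set[Tuple[str, str]]:
    """Return lock pairs that some paths take in opposite orders."""
    alias = alias or {}
    before: Set[Tuple[str, str]] = set()
    for order in orders:
        # Map each lock name to the lock it really is, if aliased.
        resolved = [alias.get(name, name) for name in order]
        # combinations() keeps positional order: (earlier, later).
        for first, second in combinations(resolved, 2):
            if first != second:
                before.add((first, second))
    # Keep each conflicting pair once, in sorted order.
    return {(a, b) for (a, b) in before if (b, a) in before and a < b}


# Acquisition orders from the quoted lockdep scenario.
CPU0 = ["genl_mutex", "rtnl_mutex"]
CPU1 = ["nlk->cb_mutex", "genl_mutex"]

if __name__ == "__main__":
    print(inversions([CPU0, CPU1]))
    print(inversions([CPU0, CPU1], alias={"nlk->cb_mutex": "rtnl_mutex"}))
```

With distinct locks the two orders do not conflict in this model; once `nlk->cb_mutex` is treated as `rtnl_mutex`, the pair (genl_mutex, rtnl_mutex) is taken in both orders, which is the classic ABBA pattern.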
Re: net: deadlock on genl_mutex
On Thu, Dec 8, 2016 at 10:02 AM, Dmitry Vyukov wrote:
> Chain exists of:
> Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>
>   lock(nlk->cb_mutex);
>                                lock([i].mutex);
>                                lock(nlk->cb_mutex);
>   lock(genl_mutex);

Similar to the unix bindlock, this one looks like a false positive to me too.
Re: net: deadlock on genl_mutex
On Thu, Dec 8, 2016 at 6:16 PM, Dmitry Vyukovwrote: > On Thu, Dec 8, 2016 at 5:16 PM, Dmitry Vyukov wrote: >> On Tue, Nov 29, 2016 at 6:59 AM, wrote: Issue was reported yesterday and is under investigation. http://marc.info/?l=linux-netdev=148014004331663=2 Thanks ! >>> >>> >>> Hi Dmitry >>> >>> Can you try the patch below with your reproducer? I haven't seen similar >>> crashes reported after this (or even with Eric's patch). >> >> I've synced to 318c8932ddec5c1c26a4af0f3c053784841c598e (Dec 7) and do >> _not_ see this report happening anymore. >> Thanks. > > > But now I am seeing "possible deadlock" warnings involving genl_lock: > > [ INFO: possible circular locking dependency detected ] > 4.9.0-rc8+ #77 Not tainted > --- > syz-executor7/18794 is trying to acquire lock: > (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x1c/0x20 > net/core/rtnetlink.c:70 > but task is already holding lock: > (genl_mutex){+.+.+.}, at: [< inline >] genl_lock > net/netlink/genetlink.c:31 > (genl_mutex){+.+.+.}, at: [] > genl_rcv_msg+0x209/0x260 net/netlink/genetlink.c:658 > which lock already depends on the new lock. 
> > > the existing dependency chain (in reverse order) is: > >[ 315.403815] [< inline >] validate_chain > kernel/locking/lockdep.c:2265 >[ 315.403815] [] > __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338 >[ 315.403815] [] lock_acquire+0x2a2/0x790 > kernel/locking/lockdep.c:3749 >[ 315.403815] [< inline >] __mutex_lock_common > kernel/locking/mutex.c:521 >[ 315.403815] [] > mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621 >[ 315.403815] [< inline >] genl_lock > net/netlink/genetlink.c:31 >[ 315.403815] [] genl_lock_dumpit+0x46/0xa0 > net/netlink/genetlink.c:518 >[ 315.403815] [] netlink_dump+0x57c/0xd70 > net/netlink/af_netlink.c:2127 >[ 315.403815] [] > __netlink_dump_start+0x4ea/0x760 net/netlink/af_netlink.c:2217 >[ 315.403815] [] > genl_family_rcv_msg+0xdc9/0x1070 net/netlink/genetlink.c:586 >[ 315.403815] [] genl_rcv_msg+0x1b0/0x260 > net/netlink/genetlink.c:660 >[ 315.403815] [] netlink_rcv_skb+0x2bc/0x3a0 > net/netlink/af_netlink.c:2298 >[ 315.403815] [] genl_rcv+0x2d/0x40 > net/netlink/genetlink.c:671 >[ 315.403815] [< inline >] netlink_unicast_kernel > net/netlink/af_netlink.c:1231 >[ 315.403815] [] netlink_unicast+0x51a/0x740 > net/netlink/af_netlink.c:1257 >[ 315.403815] [] netlink_sendmsg+0xaa4/0xe50 > net/netlink/af_netlink.c:1803 >[ 315.403815] [< inline >] sock_sendmsg_nosec net/socket.c:621 >[ 315.403815] [] sock_sendmsg+0xcf/0x110 > net/socket.c:631 >[ 315.403815] [] sock_write_iter+0x32b/0x620 > net/socket.c:829 >[ 315.403815] [< inline >] new_sync_write fs/read_write.c:499 >[ 315.403815] [] __vfs_write+0x4fe/0x830 > fs/read_write.c:512 >[ 315.403815] [] vfs_write+0x175/0x4e0 > fs/read_write.c:560 >[ 315.403815] [< inline >] SYSC_write fs/read_write.c:607 >[ 315.403815] [] SyS_write+0x100/0x240 > fs/read_write.c:599 >[ 315.403815] [] entry_SYSCALL_64_fastpath+0x23/0xc6 > >[ 315.403815] [< inline >] validate_chain > kernel/locking/lockdep.c:2265 >[ 315.403815] [] > __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338 >[ 
315.403815] [] lock_acquire+0x2a2/0x790 > kernel/locking/lockdep.c:3749 >[ 315.403815] [< inline >] __mutex_lock_common > kernel/locking/mutex.c:521 >[ 315.403815] [] > mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621 >[ 315.403815] [] > __netlink_dump_start+0xf9/0x760 net/netlink/af_netlink.c:2187 >[ 315.403815] [< inline >] netlink_dump_start > include/linux/netlink.h:165 >[ 315.403815] [] > ctnetlink_stat_ct_cpu+0x198/0x1e0 > net/netfilter/nf_conntrack_netlink.c:2045 >[ 315.403815] [] > nfnetlink_rcv_msg+0x9be/0xd60 net/netfilter/nfnetlink.c:212 >[ 315.403815] [] netlink_rcv_skb+0x2bc/0x3a0 > net/netlink/af_netlink.c:2298 >[ 315.403815] [] nfnetlink_rcv+0x7e1/0x10d0 > net/netfilter/nfnetlink.c:474 >[ 315.403815] [< inline >] netlink_unicast_kernel > net/netlink/af_netlink.c:1231 >[ 315.403815] [] netlink_unicast+0x51a/0x740 > net/netlink/af_netlink.c:1257 >[ 315.403815] [] netlink_sendmsg+0xaa4/0xe50 > net/netlink/af_netlink.c:1803 >[ 315.403815] [< inline >] sock_sendmsg_nosec net/socket.c:621 >[ 315.403815] [] sock_sendmsg+0xcf/0x110 > net/socket.c:631 >[ 315.403815] [] sock_write_iter+0x32b/0x620 > net/socket.c:829 >[ 315.403815] [< inline >] new_sync_write fs/read_write.c:499 >[
Re: net: deadlock on genl_mutex
On Thu, Dec 8, 2016 at 5:16 PM, Dmitry Vyukov wrote:
> On Tue, Nov 29, 2016 at 6:59 AM, subas...@codeaurora.org wrote:
>>>
>>> Issue was reported yesterday and is under investigation.
>>>
>>> http://marc.info/?l=linux-netdev&m=148014004331663&w=2
>>>
>>> Thanks !
>>
>> Hi Dmitry
>>
>> Can you try the patch below with your reproducer? I haven't seen similar
>> crashes reported after this (or even with Eric's patch).
>
> I've synced to 318c8932ddec5c1c26a4af0f3c053784841c598e (Dec 7) and do
> _not_ see this report happening anymore.
> Thanks.

But now I am seeing "possible deadlock" warnings involving genl_lock:

[ INFO: possible circular locking dependency detected ]
4.9.0-rc8+ #77 Not tainted
---
syz-executor7/18794 is trying to acquire lock:
 (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x1c/0x20 net/core/rtnetlink.c:70

but task is already holding lock:
 (genl_mutex){+.+.+.}, at: [< inline >] genl_lock net/netlink/genetlink.c:31
 (genl_mutex){+.+.+.}, at: [] genl_rcv_msg+0x209/0x260 net/netlink/genetlink.c:658

which lock already depends on the new lock.
the existing dependency chain (in reverse order) is: [ 315.403815] [< inline >] validate_chain kernel/locking/lockdep.c:2265 [ 315.403815] [] __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338 [ 315.403815] [] lock_acquire+0x2a2/0x790 kernel/locking/lockdep.c:3749 [ 315.403815] [< inline >] __mutex_lock_common kernel/locking/mutex.c:521 [ 315.403815] [] mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621 [ 315.403815] [< inline >] genl_lock net/netlink/genetlink.c:31 [ 315.403815] [] genl_lock_dumpit+0x46/0xa0 net/netlink/genetlink.c:518 [ 315.403815] [] netlink_dump+0x57c/0xd70 net/netlink/af_netlink.c:2127 [ 315.403815] [] __netlink_dump_start+0x4ea/0x760 net/netlink/af_netlink.c:2217 [ 315.403815] [] genl_family_rcv_msg+0xdc9/0x1070 net/netlink/genetlink.c:586 [ 315.403815] [] genl_rcv_msg+0x1b0/0x260 net/netlink/genetlink.c:660 [ 315.403815] [] netlink_rcv_skb+0x2bc/0x3a0 net/netlink/af_netlink.c:2298 [ 315.403815] [] genl_rcv+0x2d/0x40 net/netlink/genetlink.c:671 [ 315.403815] [< inline >] netlink_unicast_kernel net/netlink/af_netlink.c:1231 [ 315.403815] [] netlink_unicast+0x51a/0x740 net/netlink/af_netlink.c:1257 [ 315.403815] [] netlink_sendmsg+0xaa4/0xe50 net/netlink/af_netlink.c:1803 [ 315.403815] [< inline >] sock_sendmsg_nosec net/socket.c:621 [ 315.403815] [] sock_sendmsg+0xcf/0x110 net/socket.c:631 [ 315.403815] [] sock_write_iter+0x32b/0x620 net/socket.c:829 [ 315.403815] [< inline >] new_sync_write fs/read_write.c:499 [ 315.403815] [] __vfs_write+0x4fe/0x830 fs/read_write.c:512 [ 315.403815] [] vfs_write+0x175/0x4e0 fs/read_write.c:560 [ 315.403815] [< inline >] SYSC_write fs/read_write.c:607 [ 315.403815] [] SyS_write+0x100/0x240 fs/read_write.c:599 [ 315.403815] [] entry_SYSCALL_64_fastpath+0x23/0xc6 [ 315.403815] [< inline >] validate_chain kernel/locking/lockdep.c:2265 [ 315.403815] [] __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338 [ 315.403815] [] lock_acquire+0x2a2/0x790 kernel/locking/lockdep.c:3749 [ 315.403815] [< 
inline >] __mutex_lock_common kernel/locking/mutex.c:521 [ 315.403815] [] mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621 [ 315.403815] [] __netlink_dump_start+0xf9/0x760 net/netlink/af_netlink.c:2187 [ 315.403815] [< inline >] netlink_dump_start include/linux/netlink.h:165 [ 315.403815] [] ctnetlink_stat_ct_cpu+0x198/0x1e0 net/netfilter/nf_conntrack_netlink.c:2045 [ 315.403815] [] nfnetlink_rcv_msg+0x9be/0xd60 net/netfilter/nfnetlink.c:212 [ 315.403815] [] netlink_rcv_skb+0x2bc/0x3a0 net/netlink/af_netlink.c:2298 [ 315.403815] [] nfnetlink_rcv+0x7e1/0x10d0 net/netfilter/nfnetlink.c:474 [ 315.403815] [< inline >] netlink_unicast_kernel net/netlink/af_netlink.c:1231 [ 315.403815] [] netlink_unicast+0x51a/0x740 net/netlink/af_netlink.c:1257 [ 315.403815] [] netlink_sendmsg+0xaa4/0xe50 net/netlink/af_netlink.c:1803 [ 315.403815] [< inline >] sock_sendmsg_nosec net/socket.c:621 [ 315.403815] [] sock_sendmsg+0xcf/0x110 net/socket.c:631 [ 315.403815] [] sock_write_iter+0x32b/0x620 net/socket.c:829 [ 315.403815] [< inline >] new_sync_write fs/read_write.c:499 [ 315.403815] [] __vfs_write+0x4fe/0x830 fs/read_write.c:512 [ 315.403815] [] vfs_write+0x175/0x4e0 fs/read_write.c:560 [ 315.403815] [< inline >] SYSC_write fs/read_write.c:607 [ 315.403815] [] SyS_write+0x100/0x240 fs/read_write.c:599 [ 315.403815] []
Re: net: deadlock on genl_mutex
On Tue, Nov 29, 2016 at 6:59 AM, subas...@codeaurora.org wrote:
>>
>> Issue was reported yesterday and is under investigation.
>>
>> http://marc.info/?l=linux-netdev&m=148014004331663&w=2
>>
>> Thanks !
>
> Hi Dmitry
>
> Can you try the patch below with your reproducer? I haven't seen similar
> crashes reported after this (or even with Eric's patch).

I've synced to 318c8932ddec5c1c26a4af0f3c053784841c598e (Dec 7) and do
_not_ see this report happening anymore.
Thanks.
Re: net: deadlock on genl_mutex
On Mon, 2016-11-28 at 22:59 -0700, subas...@codeaurora.org wrote:
> > Issue was reported yesterday and is under investigation.
> >
> > http://marc.info/?l=linux-netdev&m=148014004331663&w=2
> >
> > Thanks !
>
> Hi Dmitry
>
> Can you try the patch below with your reproducer? I haven't seen similar
> crashes reported after this (or even with Eric's patch).
>
> https://patchwork.ozlabs.org/patch/699937/

Yeah, I will post my patch on top of this one.
Re: net: deadlock on genl_mutex
> Issue was reported yesterday and is under investigation.
>
> http://marc.info/?l=linux-netdev&m=148014004331663&w=2
>
> Thanks !

Hi Dmitry

Can you try the patch below with your reproducer? I haven't seen similar
crashes reported after this (or even with Eric's patch).

https://patchwork.ozlabs.org/patch/699937/

--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
Re: net: deadlock on genl_mutex
On Sat, Nov 26, 2016 at 9:04 AM, Dmitry Vyukovwrote: > Hello, > > The following program triggers deadlock warnings on genl_mutex: > > https://gist.githubusercontent.com/dvyukov/65e33d053e507d2ab0bf6ae83d989585/raw/b3c640ec58e894b50bcbf255c471406466cfa5d0/gistfile1.txt > > On commit 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24). > > BUG: sleeping function called from invalid context at > kernel/locking/mutex.c:620 > in_atomic(): 1, irqs_disabled(): 0, pid: 32289, name: syz-executor > CPU: 0 PID: 32289 Comm: syz-executor Not tainted 4.9.0-rc5+ #54 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > 88003ec06420 834c2e39 110007d80c17 > ed0007d80c0f 41b58ab3 89575550 834c2b4b > 8baab1a0 dc00 880068f794e0 > Call Trace: > [ 287.394552] [< inline >] __dump_stack lib/dump_stack.c:15 > [ 287.394552] [] dump_stack+0x2ee/0x3f5 > lib/dump_stack.c:51 > [] ___might_sleep+0x483/0x660 kernel/sched/core.c:7761 > [] __might_sleep+0x9a/0x1a0 kernel/sched/core.c:7720 > [] mutex_lock_nested+0x1ea/0xf20 kernel/locking/mutex.c:620 > [< inline >] genl_lock net/netlink/genetlink.c:31 > [] genl_lock_done+0x71/0xd0 net/netlink/genetlink.c:531 > [] netlink_sock_destruct+0xf8/0x400 > net/netlink/af_netlink.c:331 > [] __sk_destruct+0xf4/0x7f0 net/core/sock.c:1423 > [] sk_destruct+0x4c/0x80 net/core/sock.c:1453 > [] __sk_free+0x5c/0x230 net/core/sock.c:1461 > [] sk_free+0x28/0x30 net/core/sock.c:1472 > [< inline >] sock_put include/net/sock.h:1591 > [] deferred_put_nlk_sk+0x31/0x40 > net/netlink/af_netlink.c:652 > [< inline >] __rcu_reclaim kernel/rcu/rcu.h:118 > [] rcu_do_batch.isra.70+0x9ed/0xe20 kernel/rcu/tree.c:2776 > [< inline >] invoke_rcu_callbacks kernel/rcu/tree.c:3040 > [< inline >] __rcu_process_callbacks kernel/rcu/tree.c:3007 > [] rcu_process_callbacks+0x48c/0xd70 kernel/rcu/tree.c:3024 > [] __do_softirq+0x32b/0xca8 kernel/softirq.c:284 > [< inline >] invoke_softirq kernel/softirq.c:364 > [] irq_exit+0x1d1/0x210 kernel/softirq.c:405 > [< inline >] 
exiting_irq arch/x86/include/asm/apic.h:659 > [] smp_apic_timer_interrupt+0x80/0xa0 > arch/x86/kernel/apic/apic.c:960 > [] apic_timer_interrupt+0x8c/0xa0 > arch/x86/entry/entry_64.S:489 > [ 287.403717] [] ? lock_is_held+0x247/0x310 > [] ___might_sleep+0x59e/0x660 kernel/sched/core.c:7729 > [] __might_sleep+0x9a/0x1a0 kernel/sched/core.c:7720 > [] down_read+0x78/0x160 kernel/locking/rwsem.c:21 > [< inline >] anon_vma_lock_read include/linux/rmap.h:127 > [] validate_mm+0xe5/0x880 mm/mmap.c:347 > [] vma_link+0x11b/0x180 mm/mmap.c:605 > [] mmap_region+0x1076/0x1880 mm/mmap.c:1692 > [] do_mmap+0x6ff/0xe80 mm/mmap.c:1450 > [< inline >] do_mmap_pgoff include/linux/mm.h:2039 > [] vm_mmap_pgoff+0x1b7/0x210 mm/util.c:305 > [< inline >] SYSC_mmap_pgoff mm/mmap.c:1500 > [] SyS_mmap_pgoff+0x231/0x5e0 mm/mmap.c:1458 > [< inline >] SYSC_mmap arch/x86/kernel/sys_x86_64.c:95 > [] SyS_mmap+0x1b/0x30 arch/x86/kernel/sys_x86_64.c:86 > [] entry_SYSCALL_64_fastpath+0x23/0xc6 > > = > [ INFO: inconsistent lock state ] > 4.9.0-rc5+ #54 Tainted: GW > - > inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. 
> syz-executor/32289 [HC0[0]:SC1[1]:HE1:SE0] takes: > ([ 287.580014] genl_mutex > [< inline >] genl_lock net/netlink/genetlink.c:31 > [] genl_lock_done+0x71/0xd0 net/netlink/genetlink.c:531 > {SOFTIRQ-ON-W} state was registered at: > [ 287.580014] [< inline >] mark_irqflags > kernel/locking/lockdep.c:2938 > [ 287.580014] [] __lock_acquire+0x6e7/0x3380 > kernel/locking/lockdep.c:3292 > [ 287.580014] [] lock_acquire+0x2a2/0x790 > kernel/locking/lockdep.c:3746 > [ 287.580014] [< inline >] __mutex_lock_common > kernel/locking/mutex.c:521 > [ 287.580014] [] mutex_lock_nested+0x23f/0xf20 > kernel/locking/mutex.c:621 > [ 287.580014] [< inline >] genl_lock net/netlink/genetlink.c:31 > [ 287.580014] [< inline >] genl_lock_all net/netlink/genetlink.c:52 > [ 287.580014] [] > __genl_register_family+0x2ce/0x1870 net/netlink/genetlink.c:374 > [ 287.580014] [< inline >] > _genl_register_family_with_ops_grps include/net/genetlink.h:173 > [ 287.580014] [] genl_init+0x11d/0x185 > net/netlink/genetlink.c:1084 > [ 287.580014] [] do_one_initcall+0xfb/0x3f0 > init/main.c:778 > [ 287.580014] [< inline >] do_initcall_level init/main.c:844 > [ 287.580014] [< inline >] do_initcalls init/main.c:852 > [ 287.580014] [< inline >] do_basic_setup init/main.c:870 > [ 287.580014] [] kernel_init_freeable+0x5c4/0x69e > init/main.c:1017 > [ 287.580014] [] kernel_init+0x18/0x180 init/main.c:943 > [ 287.580014] [] ret_from_fork+0x2a/0x40 >
net: deadlock on genl_mutex
Hello,

The following program triggers deadlock warnings on genl_mutex:

https://gist.githubusercontent.com/dvyukov/65e33d053e507d2ab0bf6ae83d989585/raw/b3c640ec58e894b50bcbf255c471406466cfa5d0/gistfile1.txt

On commit 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24).

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
in_atomic(): 1, irqs_disabled(): 0, pid: 32289, name: syz-executor
CPU: 0 PID: 32289 Comm: syz-executor Not tainted 4.9.0-rc5+ #54
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 88003ec06420 834c2e39 110007d80c17
 ed0007d80c0f 41b58ab3 89575550 834c2b4b
 8baab1a0 dc00 880068f794e0
Call Trace:
 [ 287.394552] [< inline >] __dump_stack lib/dump_stack.c:15
 [ 287.394552] [] dump_stack+0x2ee/0x3f5 lib/dump_stack.c:51
 [] ___might_sleep+0x483/0x660 kernel/sched/core.c:7761
 [] __might_sleep+0x9a/0x1a0 kernel/sched/core.c:7720
 [] mutex_lock_nested+0x1ea/0xf20 kernel/locking/mutex.c:620
 [< inline >] genl_lock net/netlink/genetlink.c:31
 [] genl_lock_done+0x71/0xd0 net/netlink/genetlink.c:531
 [] netlink_sock_destruct+0xf8/0x400 net/netlink/af_netlink.c:331
 [] __sk_destruct+0xf4/0x7f0 net/core/sock.c:1423
 [] sk_destruct+0x4c/0x80 net/core/sock.c:1453
 [] __sk_free+0x5c/0x230 net/core/sock.c:1461
 [] sk_free+0x28/0x30 net/core/sock.c:1472
 [< inline >] sock_put include/net/sock.h:1591
 [] deferred_put_nlk_sk+0x31/0x40 net/netlink/af_netlink.c:652
 [< inline >] __rcu_reclaim kernel/rcu/rcu.h:118
 [] rcu_do_batch.isra.70+0x9ed/0xe20 kernel/rcu/tree.c:2776
 [< inline >] invoke_rcu_callbacks kernel/rcu/tree.c:3040
 [< inline >] __rcu_process_callbacks kernel/rcu/tree.c:3007
 [] rcu_process_callbacks+0x48c/0xd70 kernel/rcu/tree.c:3024
 [] __do_softirq+0x32b/0xca8 kernel/softirq.c:284
 [< inline >] invoke_softirq kernel/softirq.c:364
 [] irq_exit+0x1d1/0x210 kernel/softirq.c:405
 [< inline >] exiting_irq arch/x86/include/asm/apic.h:659
 [] smp_apic_timer_interrupt+0x80/0xa0 arch/x86/kernel/apic/apic.c:960
 [] apic_timer_interrupt+0x8c/0xa0 arch/x86/entry/entry_64.S:489
 [ 287.403717] [] ? lock_is_held+0x247/0x310
 [] ___might_sleep+0x59e/0x660 kernel/sched/core.c:7729
 [] __might_sleep+0x9a/0x1a0 kernel/sched/core.c:7720
 [] down_read+0x78/0x160 kernel/locking/rwsem.c:21
 [< inline >] anon_vma_lock_read include/linux/rmap.h:127
 [] validate_mm+0xe5/0x880 mm/mmap.c:347
 [] vma_link+0x11b/0x180 mm/mmap.c:605
 [] mmap_region+0x1076/0x1880 mm/mmap.c:1692
 [] do_mmap+0x6ff/0xe80 mm/mmap.c:1450
 [< inline >] do_mmap_pgoff include/linux/mm.h:2039
 [] vm_mmap_pgoff+0x1b7/0x210 mm/util.c:305
 [< inline >] SYSC_mmap_pgoff mm/mmap.c:1500
 [] SyS_mmap_pgoff+0x231/0x5e0 mm/mmap.c:1458
 [< inline >] SYSC_mmap arch/x86/kernel/sys_x86_64.c:95
 [] SyS_mmap+0x1b/0x30 arch/x86/kernel/sys_x86_64.c:86
 [] entry_SYSCALL_64_fastpath+0x23/0xc6

=
[ INFO: inconsistent lock state ]
4.9.0-rc5+ #54 Tainted: GW
-
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
syz-executor/32289 [HC0[0]:SC1[1]:HE1:SE0] takes:
 ([ 287.580014] genl_mutex
 [< inline >] genl_lock net/netlink/genetlink.c:31
 [] genl_lock_done+0x71/0xd0 net/netlink/genetlink.c:531
{SOFTIRQ-ON-W} state was registered at:
 [ 287.580014] [< inline >] mark_irqflags kernel/locking/lockdep.c:2938
 [ 287.580014] [] __lock_acquire+0x6e7/0x3380 kernel/locking/lockdep.c:3292
 [ 287.580014] [] lock_acquire+0x2a2/0x790 kernel/locking/lockdep.c:3746
 [ 287.580014] [< inline >] __mutex_lock_common kernel/locking/mutex.c:521
 [ 287.580014] [] mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
 [ 287.580014] [< inline >] genl_lock net/netlink/genetlink.c:31
 [ 287.580014] [< inline >] genl_lock_all net/netlink/genetlink.c:52
 [ 287.580014] [] __genl_register_family+0x2ce/0x1870 net/netlink/genetlink.c:374
 [ 287.580014] [< inline >] _genl_register_family_with_ops_grps include/net/genetlink.h:173
 [ 287.580014] [] genl_init+0x11d/0x185 net/netlink/genetlink.c:1084
 [ 287.580014] [] do_one_initcall+0xfb/0x3f0 init/main.c:778
 [ 287.580014] [< inline >] do_initcall_level init/main.c:844
 [ 287.580014] [< inline >] do_initcalls init/main.c:852
 [ 287.580014] [< inline >] do_basic_setup init/main.c:870
 [ 287.580014] [] kernel_init_freeable+0x5c4/0x69e init/main.c:1017
 [ 287.580014] [] kernel_init+0x18/0x180 init/main.c:943
 [ 287.580014] [] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433

[ 78.258919] [ INFO: inconsistent lock state ]
[ 78.258919] 4.9.0-rc5+ #54 Tainted: GW
[ 78.258919] -
[ 78.258919] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 78.258919] syz-fuzzer/5211