Re: net: deadlock on genl_mutex

2017-02-05 Thread Cong Wang
On Sun, Jan 29, 2017 at 2:11 AM, Dmitry Vyukov  wrote:
> On Fri, Dec 9, 2016 at 6:08 AM, Cong Wang  wrote:
>>>> Chain exists of:
>>>>  Possible unsafe locking scenario:
>>>>
>>>>        CPU0                    CPU1
>>>>        ----                    ----
>>>>   lock(genl_mutex);
>>>>                                lock(nlk->cb_mutex);
>>>>                                lock(genl_mutex);
>>>>   lock(rtnl_mutex);
>>>>
>>>>  *** DEADLOCK ***
>>>
>>> This one looks legitimate, because nlk->cb_mutex could be rtnl_mutex.
>>> Let me think about it.
>>
>> Never mind. Actually both reports in this thread are legitimate.
>>
>> I know what happened now, the lock chain is so long, 4 locks are involved
>> to form a chain!!!
>>
>> Let me think about how to break the chain.
>
>
> Cong, any success with breaking the chain?

No luck yet. Each part of the chain seems legit, not sure which
one could be reordered. :-/
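
To make the difficulty concrete, here is a minimal Python sketch (not kernel code, and not how lockdep is implemented) that treats each "held, then acquired" pair as a directed edge and lists the edges whose removal alone would leave the graph acyclic. The four edges are one reading of the traces quoted in this thread and should be treated as an assumption; `is_acyclic` and `breaking_edges` are hypothetical helper names.

```python
from collections import defaultdict, deque


def is_acyclic(edges):
    """Kahn's algorithm: True if the 'held -> acquired' graph has no cycle."""
    indegree = defaultdict(int)
    successors = defaultdict(list)
    nodes = set()
    for held, acquired in edges:
        successors[held].append(acquired)
        indegree[acquired] += 1
        nodes.update((held, acquired))

    ready = deque(n for n in nodes if indegree[n] == 0)
    ordered = 0
    while ready:
        node = ready.popleft()
        ordered += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    # Every node gets ordered only when no cycle blocks the sort.
    return ordered == len(nodes)


def breaking_edges(edges):
    """Edges whose removal, on its own, leaves the graph acyclic."""
    return [e for i, e in enumerate(edges)
            if is_acyclic(edges[:i] + edges[i + 1:])]


# Assumed lock-order edges, read off the reports in this thread.
chain = [
    ("genl_mutex", "rtnl_mutex"),     # genl_rcv_msg -> nl80211_pre_doit
    ("rtnl_mutex", "nfnl_mutex"),     # nfnl_lock, printed as ([i].mutex)
    ("nfnl_mutex", "nlk->cb_mutex"),  # ctnetlink_stat_ct_cpu -> __netlink_dump_start
    ("nlk->cb_mutex", "genl_mutex"),  # netlink_dump -> genl_lock_dumpit
]

for held, acquired in breaking_edges(chain):
    print(f"reordering {held} -> {acquired} would break the cycle")
```

For a single four-lock cycle every edge qualifies, which is the problem described above: the graph alone cannot say which ordering is safe to change.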


Re: net: deadlock on genl_mutex

2017-01-29 Thread Dmitry Vyukov
On Fri, Dec 9, 2016 at 6:08 AM, Cong Wang  wrote:
>>> Chain exists of:
>>>  Possible unsafe locking scenario:
>>>
>>>        CPU0                    CPU1
>>>        ----                    ----
>>>   lock(genl_mutex);
>>>                                lock(nlk->cb_mutex);
>>>                                lock(genl_mutex);
>>>   lock(rtnl_mutex);
>>>
>>>  *** DEADLOCK ***
>>
>> This one looks legitimate, because nlk->cb_mutex could be rtnl_mutex.
>> Let me think about it.
>
> Never mind. Actually both reports in this thread are legitimate.
>
> I know what happened now, the lock chain is so long, 4 locks are involved
> to form a chain!!!
>
> Let me think about how to break the chain.


Cong, any success with breaking the chain?

Still happening on f0ad17712b9f71c24e2b8b9725230ef57232377f. Or is it
a different one?


[ INFO: possible circular locking dependency detected ]
4.10.0-rc3+ #4 Not tainted
---
syz-executor9/2705 is trying to acquire lock:
 (genl_mutex){+.+.+.}, at: [] genl_lock
net/netlink/genetlink.c:32 [inline]
 (genl_mutex){+.+.+.}, at: []
genl_family_rcv_msg+0xdae/0x1040 net/netlink/genetlink.c:547

but task is already holding lock:
 (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:70

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (rtnl_mutex){+.+.+.}:

[] validate_chain kernel/locking/lockdep.c:2265 [inline]
[] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
[] lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
[] __mutex_lock_common kernel/locking/mutex.c:639 [inline]
[] mutex_lock_nested+0x290/0x1730 kernel/locking/mutex.c:753
[] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:70
[] nl80211_pre_doit+0x2fe/0x570 net/wireless/nl80211.c:11847
[] genl_family_rcv_msg+0x760/0x1040
net/netlink/genetlink.c:591
[] genl_rcv_msg+0x19a/0x330 net/netlink/genetlink.c:620
[] netlink_rcv_skb+0x2ab/0x390 net/netlink/af_netlink.c:2298
[] genl_rcv+0x28/0x40 net/netlink/genetlink.c:631
[] netlink_unicast_kernel
net/netlink/af_netlink.c:1231 [inline]
[] netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1257
[] netlink_sendmsg+0xa9f/0xe50 net/netlink/af_netlink.c:1803
[] sock_sendmsg_nosec net/socket.c:635 [inline]
[] sock_sendmsg+0xca/0x110 net/socket.c:645
[] ___sys_sendmsg+0x8fa/0x9f0 net/socket.c:1985
[] __sys_sendmsg+0x138/0x300 net/socket.c:2019
[] SYSC_sendmsg net/socket.c:2030 [inline]
[] SyS_sendmsg+0x2d/0x50 net/socket.c:2026
[] entry_SYSCALL_64_fastpath+0x1f/0xc2

-> #0 (genl_mutex){+.+.+.}:

[] check_prev_add kernel/locking/lockdep.c:1828 [inline]
[] check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1938
[] validate_chain kernel/locking/lockdep.c:2265 [inline]
[] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
[] lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
[] __mutex_lock_common kernel/locking/mutex.c:639 [inline]
[] mutex_lock_nested+0x290/0x1730 kernel/locking/mutex.c:753
[] genl_lock net/netlink/genetlink.c:32 [inline]
[] genl_family_rcv_msg+0xdae/0x1040
net/netlink/genetlink.c:547
[] genl_rcv_msg+0x19a/0x330 net/netlink/genetlink.c:620
[] netlink_rcv_skb+0x2ab/0x390 net/netlink/af_netlink.c:2298
[] genl_rcv+0x28/0x40 net/netlink/genetlink.c:631
[] netlink_unicast_kernel
net/netlink/af_netlink.c:1231 [inline]
[] netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1257
[] netlink_sendmsg+0xa9f/0xe50 net/netlink/af_netlink.c:1803
[] sock_sendmsg_nosec net/socket.c:635 [inline]
[] sock_sendmsg+0xca/0x110 net/socket.c:645
[] sock_write_iter+0x326/0x600 net/socket.c:848
[] new_sync_write fs/read_write.c:499 [inline]
[] __vfs_write+0x483/0x740 fs/read_write.c:512
[] vfs_write+0x187/0x530 fs/read_write.c:560
[] SYSC_write fs/read_write.c:607 [inline]
[] SyS_write+0xfb/0x230 fs/read_write.c:599
[] entry_SYSCALL_64_fastpath+0x1f/0xc2

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(rtnl_mutex);
                               lock(genl_mutex);
                               lock(rtnl_mutex);
  lock(genl_mutex);

 *** DEADLOCK ***

2 locks held by syz-executor9/2705:
 #0:  (cb_lock){++}, at: [] genl_rcv+0x19/0x40
net/netlink/genetlink.c:630
 #1:  (rtnl_mutex){+.+.+.}, at: []
rtnl_lock+0x17/0x20 net/core/rtnetlink.c:70

stack backtrace:
CPU: 1 PID: 2705 Comm: syz-executor9 Not tainted 4.10.0-rc3+ #4
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:15 [inline]
 dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
 print_circular_bug+0x307/0x3b0 kernel/locking/lockdep.c:1202
 check_prev_add kernel/locking/lockdep.c:1828 [inline]
 check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1938
 validate_chain kernel/locking/lockdep.c:2265 [inline]
 __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
 lock_acquire+0x2a1/0x630 
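
When comparing several of these reports, it can help to pull out just the lock being acquired and the lock already held. The following is a rough Python sketch under the assumption that the report header keeps the wording shown above; `wanted_and_held` and `LOCK_RE` are hypothetical names, and the sample text is copied from the report in this message.

```python
import re

# Matches a lock class as lockdep prints it, e.g. "(genl_mutex){+.+.+.}".
LOCK_RE = re.compile(r"\(([^(){}]+)\)\{[^}]*\}")

REPORT = """\
syz-executor9/2705 is trying to acquire lock:
 (genl_mutex){+.+.+.}, at: [] genl_lock
net/netlink/genetlink.c:32 [inline]
 (genl_mutex){+.+.+.}, at: []
genl_family_rcv_msg+0xdae/0x1040 net/netlink/genetlink.c:547

but task is already holding lock:
 (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:70

which lock already depends on the new lock.
"""


def wanted_and_held(report):
    """Return (locks being acquired, locks already held) from a report header."""
    wanted, held = [], []
    target = None
    for line in report.splitlines():
        if "is trying to acquire lock:" in line:
            target = wanted
        elif "but task is already holding lock:" in line:
            target = held
        elif "which lock already depends on the new lock" in line:
            target = None
        elif target is not None:
            match = LOCK_RE.search(line)
            # Inlined frames repeat the same lock; record each name once.
            if match and match.group(1) not in target:
                target.append(match.group(1))
    return wanted, held


print(wanted_and_held(REPORT))
```

Here the task holds rtnl_mutex and wants genl_mutex, the reverse of the order seen in the nl80211_pre_doit trace.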

Re: net: deadlock on genl_mutex

2016-12-11 Thread Dmitry Vyukov
On Fri, Dec 9, 2016 at 6:08 AM, Cong Wang  wrote:
> On Thu, Dec 8, 2016 at 4:32 PM, Cong Wang  wrote:
>> On Thu, Dec 8, 2016 at 9:16 AM, Dmitry Vyukov  wrote:
>>> Chain exists of:
>>>  Possible unsafe locking scenario:
>>>
>>>        CPU0                    CPU1
>>>        ----                    ----
>>>   lock(genl_mutex);
>>>                                lock(nlk->cb_mutex);
>>>                                lock(genl_mutex);
>>>   lock(rtnl_mutex);
>>>
>>>  *** DEADLOCK ***
>>
>> This one looks legitimate, because nlk->cb_mutex could be rtnl_mutex.
>> Let me think about it.
>
> Never mind. Actually both reports in this thread are legitimate.
>
> I know what happened now, the lock chain is so long, 4 locks are involved
> to form a chain!!!
>
> Let me think about how to break the chain.



This seems to be a related one, now on nfnl_lock:



[ INFO: possible circular locking dependency detected ]
4.9.0-rc8+ #82 Not tainted
---
syz-executor3/10151 is trying to acquire lock:
 ([i].mutex){+.+.+.}, at: []
nfnl_lock+0x2d/0x30 net/netfilter/nfnetlink.c:61
but task is already holding lock:
 (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x1c/0x20
net/core/rtnetlink.c:70
which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

   [  231.942041] [< inline >] validate_chain
kernel/locking/lockdep.c:2265
   [  231.942041] []
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
floppy0: disk absent or changed during operation
floppy0: disk absent or changed during operation
   [  231.950342] [] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
   [  231.950342] [< inline >] __mutex_lock_common
kernel/locking/mutex.c:521
   [  231.950342] []
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
   [  231.950342] [] rtnl_lock+0x1c/0x20
net/core/rtnetlink.c:70
   [  231.950342] []
nl80211_pre_doit+0x309/0x5b0 net/wireless/nl80211.c:11750
   [  231.950342] []
genl_family_rcv_msg+0x780/0x1070 net/netlink/genetlink.c:631
   [  231.950342] [] genl_rcv_msg+0x1b0/0x260
net/netlink/genetlink.c:660
   [  231.950342] [] netlink_rcv_skb+0x2bc/0x3a0
net/netlink/af_netlink.c:2298
   [  231.950342] [] genl_rcv+0x2d/0x40
net/netlink/genetlink.c:671
   [  231.950342] [< inline >] netlink_unicast_kernel
net/netlink/af_netlink.c:1231
   [  231.950342] [] netlink_unicast+0x51a/0x740
net/netlink/af_netlink.c:1257
   [  231.950342] [] netlink_sendmsg+0xaa4/0xe50
net/netlink/af_netlink.c:1803
   [  231.950342] [< inline >] sock_sendmsg_nosec net/socket.c:621
   [  231.950342] [] sock_sendmsg+0xcf/0x110
net/socket.c:631
   [  231.950342] [] sock_write_iter+0x32b/0x620
net/socket.c:829
   [  231.950342] [< inline >] new_sync_write fs/read_write.c:499
   [  231.950342] [] __vfs_write+0x4fe/0x830
fs/read_write.c:512
   [  231.950342] [] vfs_write+0x175/0x4e0
fs/read_write.c:560
   [  231.950342] [< inline >] SYSC_write fs/read_write.c:607
   [  231.950342] [] SyS_write+0x100/0x240
fs/read_write.c:599
   [  231.950342] [] entry_SYSCALL_64_fastpath+0x23/0xc6

   [  231.950342] [< inline >] validate_chain
kernel/locking/lockdep.c:2265
   [  231.950342] []
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
   [  231.950342] [] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
   [  231.950342] [< inline >] __mutex_lock_common
kernel/locking/mutex.c:521
   [  231.950342] []
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
   [  231.950342] [< inline >] genl_lock net/netlink/genetlink.c:31
   [  231.950342] [] genl_lock_dumpit+0x46/0xa0
net/netlink/genetlink.c:518
   [  231.950342] [] netlink_dump+0x57c/0xd70
net/netlink/af_netlink.c:2127
   [  231.950342] []
__netlink_dump_start+0x4ea/0x760 net/netlink/af_netlink.c:2217
   [  231.950342] []
genl_family_rcv_msg+0xdc9/0x1070 net/netlink/genetlink.c:586
   [  231.950342] [] genl_rcv_msg+0x1b0/0x260
net/netlink/genetlink.c:660
   [  231.950342] [] netlink_rcv_skb+0x2bc/0x3a0
net/netlink/af_netlink.c:2298
   [  231.950342] [] genl_rcv+0x2d/0x40
net/netlink/genetlink.c:671
   [  231.950342] [< inline >] netlink_unicast_kernel
net/netlink/af_netlink.c:1231
   [  231.950342] [] netlink_unicast+0x51a/0x740
net/netlink/af_netlink.c:1257
   [  231.950342] [] netlink_sendmsg+0xaa4/0xe50
net/netlink/af_netlink.c:1803
   [  231.950342] [< inline >] sock_sendmsg_nosec net/socket.c:621
   [  231.950342] [] sock_sendmsg+0xcf/0x110
net/socket.c:631
   [  231.950342] [] sock_write_iter+0x32b/0x620
net/socket.c:829
   [  231.950342] []
do_iter_readv_writev+0x363/0x670 fs/read_write.c:695
   [  231.950342] [] do_readv_writev+0x431/0x9b0
fs/read_write.c:872

Re: net: deadlock on genl_mutex

2016-12-08 Thread Cong Wang
On Thu, Dec 8, 2016 at 4:32 PM, Cong Wang  wrote:
> On Thu, Dec 8, 2016 at 9:16 AM, Dmitry Vyukov  wrote:
>> Chain exists of:
>>  Possible unsafe locking scenario:
>>
>>        CPU0                    CPU1
>>        ----                    ----
>>   lock(genl_mutex);
>>                                lock(nlk->cb_mutex);
>>                                lock(genl_mutex);
>>   lock(rtnl_mutex);
>>
>>  *** DEADLOCK ***
>
> This one looks legitimate, because nlk->cb_mutex could be rtnl_mutex.
> Let me think about it.

Never mind. Actually both reports in this thread are legitimate.

I know what happened now, the lock chain is so long, 4 locks are involved
to form a chain!!!

Let me think about how to break the chain.


Re: net: deadlock on genl_mutex

2016-12-08 Thread Cong Wang
On Thu, Dec 8, 2016 at 9:16 AM, Dmitry Vyukov  wrote:
> Chain exists of:
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(genl_mutex);
>                                lock(nlk->cb_mutex);
>                                lock(genl_mutex);
>   lock(rtnl_mutex);
>
>  *** DEADLOCK ***

This one looks legitimate, because nlk->cb_mutex could be rtnl_mutex.
Let me think about it.
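
A small Python sketch of why the aliasing matters (an illustration only, not lockdep's algorithm; `find_cycle` and `rename` are hypothetical helpers). The two edges come from the CPU0 and CPU1 columns of the scenario quoted above. With nlk->cb_mutex as a separate lock there is no cycle; once it is treated as rtnl_mutex, the two orderings invert each other.

```python
from collections import defaultdict


def find_cycle(edges):
    """Return one cycle in the 'held -> acquired' graph as a list, or None."""
    graph = defaultdict(list)
    for held, acquired in edges:
        graph[held].append(acquired)

    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = defaultdict(int)
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:
                # Back edge: the cycle is the path from nxt back to nxt.
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


def rename(edges, aliases):
    """Rewrite lock names, e.g. when two names refer to the same mutex."""
    return [(aliases.get(a, a), aliases.get(b, b)) for a, b in edges]


edges = [
    ("genl_mutex", "rtnl_mutex"),     # CPU0: genl_mutex held, rtnl_mutex taken
    ("nlk->cb_mutex", "genl_mutex"),  # CPU1: cb_mutex held, genl_mutex taken
]

print(find_cycle(edges))
print(find_cycle(rename(edges, {"nlk->cb_mutex": "rtnl_mutex"})))
```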


Re: net: deadlock on genl_mutex

2016-12-08 Thread Cong Wang
On Thu, Dec 8, 2016 at 10:02 AM, Dmitry Vyukov  wrote:
> Chain exists of:
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(nlk->cb_mutex);
>                                lock([i].mutex);
>                                lock(nlk->cb_mutex);
>   lock(genl_mutex);

Similar to the unix bindlock, this one looks false positive to me too.


Re: net: deadlock on genl_mutex

2016-12-08 Thread Dmitry Vyukov
On Thu, Dec 8, 2016 at 6:16 PM, Dmitry Vyukov  wrote:
> On Thu, Dec 8, 2016 at 5:16 PM, Dmitry Vyukov  wrote:
>> On Tue, Nov 29, 2016 at 6:59 AM,   wrote:

>>>> Issue was reported yesterday and is under investigation.
>>>>
>>>>
>>>> http://marc.info/?l=linux-netdev&m=148014004331663&w=2
>>>>
>>>>
>>>> Thanks !
>>>
>>>
>>> Hi Dmitry
>>>
>>> Can you try the patch below with your reproducer? I haven't seen similar
>>> crashes reported after this (or even with Eric's patch).
>>
>> I've synced to 318c8932ddec5c1c26a4af0f3c053784841c598e (Dec 7) and do
>> _not_ see this report happening anymore.
>> Thanks.
>
>
> But now I am seeing "possible deadlock" warnings involving genl_lock:
>
> [ INFO: possible circular locking dependency detected ]
> 4.9.0-rc8+ #77 Not tainted
> ---
> syz-executor7/18794 is trying to acquire lock:
>  (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x1c/0x20
> net/core/rtnetlink.c:70
> but task is already holding lock:
>  (genl_mutex){+.+.+.}, at: [< inline >] genl_lock
> net/netlink/genetlink.c:31
>  (genl_mutex){+.+.+.}, at: []
> genl_rcv_msg+0x209/0x260 net/netlink/genetlink.c:658
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
>[  315.403815] [< inline >] validate_chain
> kernel/locking/lockdep.c:2265
>[  315.403815] []
> __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
>[  315.403815] [] lock_acquire+0x2a2/0x790
> kernel/locking/lockdep.c:3749
>[  315.403815] [< inline >] __mutex_lock_common
> kernel/locking/mutex.c:521
>[  315.403815] []
> mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
>[  315.403815] [< inline >] genl_lock 
> net/netlink/genetlink.c:31
>[  315.403815] [] genl_lock_dumpit+0x46/0xa0
> net/netlink/genetlink.c:518
>[  315.403815] [] netlink_dump+0x57c/0xd70
> net/netlink/af_netlink.c:2127
>[  315.403815] []
> __netlink_dump_start+0x4ea/0x760 net/netlink/af_netlink.c:2217
>[  315.403815] []
> genl_family_rcv_msg+0xdc9/0x1070 net/netlink/genetlink.c:586
>[  315.403815] [] genl_rcv_msg+0x1b0/0x260
> net/netlink/genetlink.c:660
>[  315.403815] [] netlink_rcv_skb+0x2bc/0x3a0
> net/netlink/af_netlink.c:2298
>[  315.403815] [] genl_rcv+0x2d/0x40
> net/netlink/genetlink.c:671
>[  315.403815] [< inline >] netlink_unicast_kernel
> net/netlink/af_netlink.c:1231
>[  315.403815] [] netlink_unicast+0x51a/0x740
> net/netlink/af_netlink.c:1257
>[  315.403815] [] netlink_sendmsg+0xaa4/0xe50
> net/netlink/af_netlink.c:1803
>[  315.403815] [< inline >] sock_sendmsg_nosec net/socket.c:621
>[  315.403815] [] sock_sendmsg+0xcf/0x110
> net/socket.c:631
>[  315.403815] [] sock_write_iter+0x32b/0x620
> net/socket.c:829
>[  315.403815] [< inline >] new_sync_write fs/read_write.c:499
>[  315.403815] [] __vfs_write+0x4fe/0x830
> fs/read_write.c:512
>[  315.403815] [] vfs_write+0x175/0x4e0
> fs/read_write.c:560
>[  315.403815] [< inline >] SYSC_write fs/read_write.c:607
>[  315.403815] [] SyS_write+0x100/0x240
> fs/read_write.c:599
>[  315.403815] [] entry_SYSCALL_64_fastpath+0x23/0xc6
>
>[  315.403815] [< inline >] validate_chain
> kernel/locking/lockdep.c:2265
>[  315.403815] []
> __lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
>[  315.403815] [] lock_acquire+0x2a2/0x790
> kernel/locking/lockdep.c:3749
>[  315.403815] [< inline >] __mutex_lock_common
> kernel/locking/mutex.c:521
>[  315.403815] []
> mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
>[  315.403815] []
> __netlink_dump_start+0xf9/0x760 net/netlink/af_netlink.c:2187
>[  315.403815] [< inline >] netlink_dump_start
> include/linux/netlink.h:165
>[  315.403815] []
> ctnetlink_stat_ct_cpu+0x198/0x1e0
> net/netfilter/nf_conntrack_netlink.c:2045
>[  315.403815] []
> nfnetlink_rcv_msg+0x9be/0xd60 net/netfilter/nfnetlink.c:212
>[  315.403815] [] netlink_rcv_skb+0x2bc/0x3a0
> net/netlink/af_netlink.c:2298
>[  315.403815] [] nfnetlink_rcv+0x7e1/0x10d0
> net/netfilter/nfnetlink.c:474
>[  315.403815] [< inline >] netlink_unicast_kernel
> net/netlink/af_netlink.c:1231
>[  315.403815] [] netlink_unicast+0x51a/0x740
> net/netlink/af_netlink.c:1257
>[  315.403815] [] netlink_sendmsg+0xaa4/0xe50
> net/netlink/af_netlink.c:1803
>[  315.403815] [< inline >] sock_sendmsg_nosec net/socket.c:621
>[  315.403815] [] sock_sendmsg+0xcf/0x110
> net/socket.c:631
>[  315.403815] [] sock_write_iter+0x32b/0x620
> net/socket.c:829
>[  315.403815] [< inline >] new_sync_write fs/read_write.c:499

Re: net: deadlock on genl_mutex

2016-12-08 Thread Dmitry Vyukov
On Thu, Dec 8, 2016 at 5:16 PM, Dmitry Vyukov  wrote:
> On Tue, Nov 29, 2016 at 6:59 AM,   wrote:
>>>
>>> Issue was reported yesterday and is under investigation.
>>>
>>>
>>> http://marc.info/?l=linux-netdev&m=148014004331663&w=2
>>>
>>>
>>> Thanks !
>>
>>
>> Hi Dmitry
>>
>> Can you try the patch below with your reproducer? I haven't seen similar
>> crashes reported after this (or even with Eric's patch).
>
> I've synced to 318c8932ddec5c1c26a4af0f3c053784841c598e (Dec 7) and do
> _not_ see this report happening anymore.
> Thanks.


But now I am seeing "possible deadlock" warnings involving genl_lock:

[ INFO: possible circular locking dependency detected ]
4.9.0-rc8+ #77 Not tainted
---
syz-executor7/18794 is trying to acquire lock:
 (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x1c/0x20
net/core/rtnetlink.c:70
but task is already holding lock:
 (genl_mutex){+.+.+.}, at: [< inline >] genl_lock
net/netlink/genetlink.c:31
 (genl_mutex){+.+.+.}, at: []
genl_rcv_msg+0x209/0x260 net/netlink/genetlink.c:658
which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

   [  315.403815] [< inline >] validate_chain
kernel/locking/lockdep.c:2265
   [  315.403815] []
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
   [  315.403815] [] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
   [  315.403815] [< inline >] __mutex_lock_common
kernel/locking/mutex.c:521
   [  315.403815] []
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
   [  315.403815] [< inline >] genl_lock net/netlink/genetlink.c:31
   [  315.403815] [] genl_lock_dumpit+0x46/0xa0
net/netlink/genetlink.c:518
   [  315.403815] [] netlink_dump+0x57c/0xd70
net/netlink/af_netlink.c:2127
   [  315.403815] []
__netlink_dump_start+0x4ea/0x760 net/netlink/af_netlink.c:2217
   [  315.403815] []
genl_family_rcv_msg+0xdc9/0x1070 net/netlink/genetlink.c:586
   [  315.403815] [] genl_rcv_msg+0x1b0/0x260
net/netlink/genetlink.c:660
   [  315.403815] [] netlink_rcv_skb+0x2bc/0x3a0
net/netlink/af_netlink.c:2298
   [  315.403815] [] genl_rcv+0x2d/0x40
net/netlink/genetlink.c:671
   [  315.403815] [< inline >] netlink_unicast_kernel
net/netlink/af_netlink.c:1231
   [  315.403815] [] netlink_unicast+0x51a/0x740
net/netlink/af_netlink.c:1257
   [  315.403815] [] netlink_sendmsg+0xaa4/0xe50
net/netlink/af_netlink.c:1803
   [  315.403815] [< inline >] sock_sendmsg_nosec net/socket.c:621
   [  315.403815] [] sock_sendmsg+0xcf/0x110
net/socket.c:631
   [  315.403815] [] sock_write_iter+0x32b/0x620
net/socket.c:829
   [  315.403815] [< inline >] new_sync_write fs/read_write.c:499
   [  315.403815] [] __vfs_write+0x4fe/0x830
fs/read_write.c:512
   [  315.403815] [] vfs_write+0x175/0x4e0
fs/read_write.c:560
   [  315.403815] [< inline >] SYSC_write fs/read_write.c:607
   [  315.403815] [] SyS_write+0x100/0x240
fs/read_write.c:599
   [  315.403815] [] entry_SYSCALL_64_fastpath+0x23/0xc6

   [  315.403815] [< inline >] validate_chain
kernel/locking/lockdep.c:2265
   [  315.403815] []
__lock_acquire+0x2156/0x3380 kernel/locking/lockdep.c:3338
   [  315.403815] [] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3749
   [  315.403815] [< inline >] __mutex_lock_common
kernel/locking/mutex.c:521
   [  315.403815] []
mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621
   [  315.403815] []
__netlink_dump_start+0xf9/0x760 net/netlink/af_netlink.c:2187
   [  315.403815] [< inline >] netlink_dump_start
include/linux/netlink.h:165
   [  315.403815] []
ctnetlink_stat_ct_cpu+0x198/0x1e0
net/netfilter/nf_conntrack_netlink.c:2045
   [  315.403815] []
nfnetlink_rcv_msg+0x9be/0xd60 net/netfilter/nfnetlink.c:212
   [  315.403815] [] netlink_rcv_skb+0x2bc/0x3a0
net/netlink/af_netlink.c:2298
   [  315.403815] [] nfnetlink_rcv+0x7e1/0x10d0
net/netfilter/nfnetlink.c:474
   [  315.403815] [< inline >] netlink_unicast_kernel
net/netlink/af_netlink.c:1231
   [  315.403815] [] netlink_unicast+0x51a/0x740
net/netlink/af_netlink.c:1257
   [  315.403815] [] netlink_sendmsg+0xaa4/0xe50
net/netlink/af_netlink.c:1803
   [  315.403815] [< inline >] sock_sendmsg_nosec net/socket.c:621
   [  315.403815] [] sock_sendmsg+0xcf/0x110
net/socket.c:631
   [  315.403815] [] sock_write_iter+0x32b/0x620
net/socket.c:829
   [  315.403815] [< inline >] new_sync_write fs/read_write.c:499
   [  315.403815] [] __vfs_write+0x4fe/0x830
fs/read_write.c:512
   [  315.403815] [] vfs_write+0x175/0x4e0
fs/read_write.c:560
   [  315.403815] [< inline >] SYSC_write fs/read_write.c:607
   [  315.403815] [] SyS_write+0x100/0x240
fs/read_write.c:599
   [  315.403815] [] 

Re: net: deadlock on genl_mutex

2016-12-08 Thread Dmitry Vyukov
On Tue, Nov 29, 2016 at 6:59 AM,   wrote:
>>
>> Issue was reported yesterday and is under investigation.
>>
>>
>> http://marc.info/?l=linux-netdev&m=148014004331663&w=2
>>
>>
>> Thanks !
>
>
> Hi Dmitry
>
> Can you try the patch below with your reproducer? I haven't seen similar
> crashes reported after this (or even with Eric's patch).

I've synced to 318c8932ddec5c1c26a4af0f3c053784841c598e (Dec 7) and do
_not_ see this report happening anymore.
Thanks.


Re: net: deadlock on genl_mutex

2016-11-28 Thread Eric Dumazet
On Mon, 2016-11-28 at 22:59 -0700, subas...@codeaurora.org wrote:
> > 
> > Issue was reported yesterday and is under investigation.
> > 
> > 
> > http://marc.info/?l=linux-netdev&m=148014004331663&w=2
> > 
> > 
> > Thanks !
> 
> Hi Dmitry
> 
> Can you try the patch below with your reproducer? I haven't seen similar 
> crashes reported after this (or even with Eric's patch).
> 
> https://patchwork.ozlabs.org/patch/699937/

Yeah, I will post my patch on top of this one.





Re: net: deadlock on genl_mutex

2016-11-28 Thread subashab


> Issue was reported yesterday and is under investigation.
>
>
> http://marc.info/?l=linux-netdev&m=148014004331663&w=2
>
>
> Thanks !


Hi Dmitry

Can you try the patch below with your reproducer? I haven't seen similar 
crashes reported after this (or even with Eric's patch).


https://patchwork.ozlabs.org/patch/699937/

--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a 
Linux Foundation Collaborative Project


Re: net: deadlock on genl_mutex

2016-11-26 Thread Eric Dumazet
On Sat, Nov 26, 2016 at 9:04 AM, Dmitry Vyukov  wrote:
> Hello,
>
> The following program triggers deadlock warnings on genl_mutex:
>
> https://gist.githubusercontent.com/dvyukov/65e33d053e507d2ab0bf6ae83d989585/raw/b3c640ec58e894b50bcbf255c471406466cfa5d0/gistfile1.txt
>
> On commit 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24).
>
> BUG: sleeping function called from invalid context at 
> kernel/locking/mutex.c:620
> in_atomic(): 1, irqs_disabled(): 0, pid: 32289, name: syz-executor
> CPU: 0 PID: 32289 Comm: syz-executor Not tainted 4.9.0-rc5+ #54
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>  88003ec06420 834c2e39  110007d80c17
>  ed0007d80c0f 41b58ab3 89575550 834c2b4b
>  8baab1a0 dc00  880068f794e0
> Call Trace:
>   [  287.394552]  [< inline >] __dump_stack lib/dump_stack.c:15
>   [  287.394552]  [] dump_stack+0x2ee/0x3f5
> lib/dump_stack.c:51
>  [] ___might_sleep+0x483/0x660 kernel/sched/core.c:7761
>  [] __might_sleep+0x9a/0x1a0 kernel/sched/core.c:7720
>  [] mutex_lock_nested+0x1ea/0xf20 kernel/locking/mutex.c:620
>  [< inline >] genl_lock net/netlink/genetlink.c:31
>  [] genl_lock_done+0x71/0xd0 net/netlink/genetlink.c:531
>  [] netlink_sock_destruct+0xf8/0x400
> net/netlink/af_netlink.c:331
>  [] __sk_destruct+0xf4/0x7f0 net/core/sock.c:1423
>  [] sk_destruct+0x4c/0x80 net/core/sock.c:1453
>  [] __sk_free+0x5c/0x230 net/core/sock.c:1461
>  [] sk_free+0x28/0x30 net/core/sock.c:1472
>  [< inline >] sock_put include/net/sock.h:1591
>  [] deferred_put_nlk_sk+0x31/0x40 
> net/netlink/af_netlink.c:652
>  [< inline >] __rcu_reclaim kernel/rcu/rcu.h:118
>  [] rcu_do_batch.isra.70+0x9ed/0xe20 kernel/rcu/tree.c:2776
>  [< inline >] invoke_rcu_callbacks kernel/rcu/tree.c:3040
>  [< inline >] __rcu_process_callbacks kernel/rcu/tree.c:3007
>  [] rcu_process_callbacks+0x48c/0xd70 kernel/rcu/tree.c:3024
>  [] __do_softirq+0x32b/0xca8 kernel/softirq.c:284
>  [< inline >] invoke_softirq kernel/softirq.c:364
>  [] irq_exit+0x1d1/0x210 kernel/softirq.c:405
>  [< inline >] exiting_irq arch/x86/include/asm/apic.h:659
>  [] smp_apic_timer_interrupt+0x80/0xa0
> arch/x86/kernel/apic/apic.c:960
>  [] apic_timer_interrupt+0x8c/0xa0
> arch/x86/entry/entry_64.S:489
>   [  287.403717]  [] ? lock_is_held+0x247/0x310
>  [] ___might_sleep+0x59e/0x660 kernel/sched/core.c:7729
>  [] __might_sleep+0x9a/0x1a0 kernel/sched/core.c:7720
>  [] down_read+0x78/0x160 kernel/locking/rwsem.c:21
>  [< inline >] anon_vma_lock_read include/linux/rmap.h:127
>  [] validate_mm+0xe5/0x880 mm/mmap.c:347
>  [] vma_link+0x11b/0x180 mm/mmap.c:605
>  [] mmap_region+0x1076/0x1880 mm/mmap.c:1692
>  [] do_mmap+0x6ff/0xe80 mm/mmap.c:1450
>  [< inline >] do_mmap_pgoff include/linux/mm.h:2039
>  [] vm_mmap_pgoff+0x1b7/0x210 mm/util.c:305
>  [< inline >] SYSC_mmap_pgoff mm/mmap.c:1500
>  [] SyS_mmap_pgoff+0x231/0x5e0 mm/mmap.c:1458
>  [< inline >] SYSC_mmap arch/x86/kernel/sys_x86_64.c:95
>  [] SyS_mmap+0x1b/0x30 arch/x86/kernel/sys_x86_64.c:86
>  [] entry_SYSCALL_64_fastpath+0x23/0xc6
>
> =
> [ INFO: inconsistent lock state ]
> 4.9.0-rc5+ #54 Tainted: GW
> -
> inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
> syz-executor/32289 [HC0[0]:SC1[1]:HE1:SE0] takes:
>  ([  287.580014] genl_mutex
> [< inline >] genl_lock net/netlink/genetlink.c:31
> [] genl_lock_done+0x71/0xd0 net/netlink/genetlink.c:531
> {SOFTIRQ-ON-W} state was registered at:
>   [  287.580014] [< inline >] mark_irqflags
> kernel/locking/lockdep.c:2938
>   [  287.580014] [] __lock_acquire+0x6e7/0x3380
> kernel/locking/lockdep.c:3292
>   [  287.580014] [] lock_acquire+0x2a2/0x790
> kernel/locking/lockdep.c:3746
>   [  287.580014] [< inline >] __mutex_lock_common
> kernel/locking/mutex.c:521
>   [  287.580014] [] mutex_lock_nested+0x23f/0xf20
> kernel/locking/mutex.c:621
>   [  287.580014] [< inline >] genl_lock net/netlink/genetlink.c:31
>   [  287.580014] [< inline >] genl_lock_all net/netlink/genetlink.c:52
>   [  287.580014] []
> __genl_register_family+0x2ce/0x1870 net/netlink/genetlink.c:374
>   [  287.580014] [< inline >]
> _genl_register_family_with_ops_grps include/net/genetlink.h:173
>   [  287.580014] [] genl_init+0x11d/0x185
> net/netlink/genetlink.c:1084
>   [  287.580014] [] do_one_initcall+0xfb/0x3f0 
> init/main.c:778
>   [  287.580014] [< inline >] do_initcall_level init/main.c:844
>   [  287.580014] [< inline >] do_initcalls init/main.c:852
>   [  287.580014] [< inline >] do_basic_setup init/main.c:870
>   [  287.580014] [] kernel_init_freeable+0x5c4/0x69e
> init/main.c:1017
>   [  287.580014] [] kernel_init+0x18/0x180 init/main.c:943
>   [  287.580014] [] ret_from_fork+0x2a/0x40
> 

net: deadlock on genl_mutex

2016-11-26 Thread Dmitry Vyukov
Hello,

The following program triggers deadlock warnings on genl_mutex:

https://gist.githubusercontent.com/dvyukov/65e33d053e507d2ab0bf6ae83d989585/raw/b3c640ec58e894b50bcbf255c471406466cfa5d0/gistfile1.txt

On commit 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24).

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
in_atomic(): 1, irqs_disabled(): 0, pid: 32289, name: syz-executor
CPU: 0 PID: 32289 Comm: syz-executor Not tainted 4.9.0-rc5+ #54
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 88003ec06420 834c2e39  110007d80c17
 ed0007d80c0f 41b58ab3 89575550 834c2b4b
 8baab1a0 dc00  880068f794e0
Call Trace:
  [  287.394552]  [< inline >] __dump_stack lib/dump_stack.c:15
  [  287.394552]  [] dump_stack+0x2ee/0x3f5
lib/dump_stack.c:51
 [] ___might_sleep+0x483/0x660 kernel/sched/core.c:7761
 [] __might_sleep+0x9a/0x1a0 kernel/sched/core.c:7720
 [] mutex_lock_nested+0x1ea/0xf20 kernel/locking/mutex.c:620
 [< inline >] genl_lock net/netlink/genetlink.c:31
 [] genl_lock_done+0x71/0xd0 net/netlink/genetlink.c:531
 [] netlink_sock_destruct+0xf8/0x400
net/netlink/af_netlink.c:331
 [] __sk_destruct+0xf4/0x7f0 net/core/sock.c:1423
 [] sk_destruct+0x4c/0x80 net/core/sock.c:1453
 [] __sk_free+0x5c/0x230 net/core/sock.c:1461
 [] sk_free+0x28/0x30 net/core/sock.c:1472
 [< inline >] sock_put include/net/sock.h:1591
 [] deferred_put_nlk_sk+0x31/0x40 net/netlink/af_netlink.c:652
 [< inline >] __rcu_reclaim kernel/rcu/rcu.h:118
 [] rcu_do_batch.isra.70+0x9ed/0xe20 kernel/rcu/tree.c:2776
 [< inline >] invoke_rcu_callbacks kernel/rcu/tree.c:3040
 [< inline >] __rcu_process_callbacks kernel/rcu/tree.c:3007
 [] rcu_process_callbacks+0x48c/0xd70 kernel/rcu/tree.c:3024
 [] __do_softirq+0x32b/0xca8 kernel/softirq.c:284
 [< inline >] invoke_softirq kernel/softirq.c:364
 [] irq_exit+0x1d1/0x210 kernel/softirq.c:405
 [< inline >] exiting_irq arch/x86/include/asm/apic.h:659
 [] smp_apic_timer_interrupt+0x80/0xa0
arch/x86/kernel/apic/apic.c:960
 [] apic_timer_interrupt+0x8c/0xa0
arch/x86/entry/entry_64.S:489
  [  287.403717]  [] ? lock_is_held+0x247/0x310
 [] ___might_sleep+0x59e/0x660 kernel/sched/core.c:7729
 [] __might_sleep+0x9a/0x1a0 kernel/sched/core.c:7720
 [] down_read+0x78/0x160 kernel/locking/rwsem.c:21
 [< inline >] anon_vma_lock_read include/linux/rmap.h:127
 [] validate_mm+0xe5/0x880 mm/mmap.c:347
 [] vma_link+0x11b/0x180 mm/mmap.c:605
 [] mmap_region+0x1076/0x1880 mm/mmap.c:1692
 [] do_mmap+0x6ff/0xe80 mm/mmap.c:1450
 [< inline >] do_mmap_pgoff include/linux/mm.h:2039
 [] vm_mmap_pgoff+0x1b7/0x210 mm/util.c:305
 [< inline >] SYSC_mmap_pgoff mm/mmap.c:1500
 [] SyS_mmap_pgoff+0x231/0x5e0 mm/mmap.c:1458
 [< inline >] SYSC_mmap arch/x86/kernel/sys_x86_64.c:95
 [] SyS_mmap+0x1b/0x30 arch/x86/kernel/sys_x86_64.c:86
 [] entry_SYSCALL_64_fastpath+0x23/0xc6

=
[ INFO: inconsistent lock state ]
4.9.0-rc5+ #54 Tainted: GW
-
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
syz-executor/32289 [HC0[0]:SC1[1]:HE1:SE0] takes:
 ([  287.580014] genl_mutex
[< inline >] genl_lock net/netlink/genetlink.c:31
[] genl_lock_done+0x71/0xd0 net/netlink/genetlink.c:531
{SOFTIRQ-ON-W} state was registered at:
  [  287.580014] [< inline >] mark_irqflags
kernel/locking/lockdep.c:2938
  [  287.580014] [] __lock_acquire+0x6e7/0x3380
kernel/locking/lockdep.c:3292
  [  287.580014] [] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3746
  [  287.580014] [< inline >] __mutex_lock_common
kernel/locking/mutex.c:521
  [  287.580014] [] mutex_lock_nested+0x23f/0xf20
kernel/locking/mutex.c:621
  [  287.580014] [< inline >] genl_lock net/netlink/genetlink.c:31
  [  287.580014] [< inline >] genl_lock_all net/netlink/genetlink.c:52
  [  287.580014] []
__genl_register_family+0x2ce/0x1870 net/netlink/genetlink.c:374
  [  287.580014] [< inline >]
_genl_register_family_with_ops_grps include/net/genetlink.h:173
  [  287.580014] [] genl_init+0x11d/0x185
net/netlink/genetlink.c:1084
  [  287.580014] [] do_one_initcall+0xfb/0x3f0 init/main.c:778
  [  287.580014] [< inline >] do_initcall_level init/main.c:844
  [  287.580014] [< inline >] do_initcalls init/main.c:852
  [  287.580014] [< inline >] do_basic_setup init/main.c:870
  [  287.580014] [] kernel_init_freeable+0x5c4/0x69e
init/main.c:1017
  [  287.580014] [] kernel_init+0x18/0x180 init/main.c:943
  [  287.580014] [] ret_from_fork+0x2a/0x40
arch/x86/entry/entry_64.S:433
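
The "inconsistent lock state" report above says genl_mutex was first recorded as taken with softirqs enabled ({SOFTIRQ-ON-W}) and is now being taken from softirq context ({IN-SOFTIRQ-W}). A minimal Python sketch of that kind of bookkeeping follows. It is an illustration only: lockdep tracks far more state than this, and the function name and event format are hypothetical.

```python
def first_inconsistent_lock(events):
    """Return the first lock seen both with softirqs on and inside a softirq.

    events: iterable of (lock_name, usage) pairs, where usage is
    "SOFTIRQ-ON-W" or "IN-SOFTIRQ-W". Returns None if no lock mixes the two.
    """
    usage_seen = {}
    for lock, usage in events:
        states = usage_seen.setdefault(lock, set())
        states.add(usage)
        if {"SOFTIRQ-ON-W", "IN-SOFTIRQ-W"} <= states:
            return lock
    return None


events = [
    # genl_init -> __genl_register_family -> genl_lock_all, process context
    ("genl_mutex", "SOFTIRQ-ON-W"),
    # hypothetical extra event, added only for contrast
    ("rtnl_mutex", "SOFTIRQ-ON-W"),
    # genl_lock_done reached from the RCU callback run in softirq
    ("genl_mutex", "IN-SOFTIRQ-W"),
]

print(first_inconsistent_lock(events))
```

The mix is unsafe because a softirq can run on a CPU whose interrupted task already holds the lock. For a mutex it is also the "sleeping function called from invalid context" bug reported first.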

[   78.258919] [ INFO: inconsistent lock state ]
[   78.258919] 4.9.0-rc5+ #54 Tainted: GW
[   78.258919] -
[   78.258919] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[   78.258919] syz-fuzzer/5211