Re: rtnl_mutex deadlock?

2015-08-08 Thread Thomas Graf
On 08/07/15 at 08:00am, Herbert Xu wrote: On Fri, Aug 07, 2015 at 01:58:15AM +0200, Daniel Borkmann wrote: Looks like we had a WARN_ON() in rhashtable_insert_rehash() before, but was removed in a87b9ebf1709 (rhashtable: Do not schedule more than one rehash if we can't grow further). Do

Re: rtnl_mutex deadlock?

2015-08-06 Thread Herbert Xu
On Fri, Aug 07, 2015 at 01:58:15AM +0200, Daniel Borkmann wrote: Looks like we had a WARN_ON() in rhashtable_insert_rehash() before, but was removed in a87b9ebf1709 (rhashtable: Do not schedule more than one rehash if we can't grow further). Do you want to re-add a WARN_ON_ONCE()? I think so.

Re: rtnl_mutex deadlock?

2015-08-06 Thread Daniel Borkmann
On 08/06/2015 04:50 PM, Daniel Borkmann wrote: On 08/06/2015 02:30 AM, Herbert Xu wrote: On Wed, Aug 05, 2015 at 08:59:07PM +0200, Daniel Borkmann wrote: Here's a theory and patch below. Herbert, Thomas, does this make any sense to you resp. sound plausible? ;) It's certainly possible.

Re: rtnl_mutex deadlock?

2015-08-06 Thread Herbert Xu
On Thu, Aug 06, 2015 at 04:50:39PM +0200, Daniel Borkmann wrote: Then, in __rhashtable_insert_fast(), I could trigger an -EBUSY when I'm really unlucky and exceed the ht->elasticity limit of 16. I would then end up in rhashtable_insert_rehash() to find out there's already one ongoing and

Re: rtnl_mutex deadlock?

2015-08-06 Thread Herbert Xu
On Fri, Aug 07, 2015 at 12:39:47AM +0200, Daniel Borkmann wrote: window was too small to trigger an error. I think in any case, remapping seems okay. Oh there is no doubt that we need your EBUSY remapping patch. It's just that it's very unlikely for this to be responsible for the dead-lock

Re: rtnl_mutex deadlock?

2015-08-06 Thread Daniel Borkmann
On 08/07/2015 01:41 AM, Herbert Xu wrote: On Thu, Aug 06, 2015 at 04:50:39PM +0200, Daniel Borkmann wrote: Then, in __rhashtable_insert_fast(), I could trigger an -EBUSY when I'm really unlucky and exceed the ht->elasticity limit of 16. I would then end up in rhashtable_insert_rehash() to find

Re: rtnl_mutex deadlock?

2015-08-06 Thread Daniel Borkmann
On 08/06/2015 02:30 AM, Herbert Xu wrote: On Wed, Aug 05, 2015 at 08:59:07PM +0200, Daniel Borkmann wrote: Here's a theory and patch below. Herbert, Thomas, does this make any sense to you resp. sound plausible? ;) It's certainly possible. Whether it's plausible I'm not so sure. The netlink

Re: rtnl_mutex deadlock?

2015-08-05 Thread Jiri Pirko
Wed, Aug 05, 2015 at 07:31:30AM CEST, cw...@twopensource.com wrote: On Tue, Aug 4, 2015 at 8:48 AM, Linus Torvalds torva...@linux-foundation.org wrote: Sorry for the spamming of random rtnetlink people, but I just resumed my laptop at PDX, and networking was dead. It looks like a deadlock on

Re: rtnl_mutex deadlock?

2015-08-05 Thread Linus Torvalds
On Wed, Aug 5, 2015 at 9:43 AM, Jiri Pirko j...@resnulli.us wrote: Indeed. Most probably, NETLINK_CB(skb).portid got zeroed. Linus, are you able to reproduce this or is it a one-time issue? I don't think I'm able to reproduce this, it's happened only once so far. Linus

Re: rtnl_mutex deadlock?

2015-08-05 Thread Daniel Borkmann
On 08/05/2015 10:44 AM, Linus Torvalds wrote: On Wed, Aug 5, 2015 at 9:43 AM, Jiri Pirko j...@resnulli.us wrote: Indeed. Most probably, NETLINK_CB(skb).portid got zeroed. Linus, are you able to reproduce this or is it a one-time issue? I don't think I'm able to reproduce this, it's happened

Re: rtnl_mutex deadlock?

2015-08-05 Thread Herbert Xu
On Wed, Aug 05, 2015 at 08:59:07PM +0200, Daniel Borkmann wrote: Here's a theory and patch below. Herbert, Thomas, does this make any sense to you resp. sound plausible? ;) It's certainly possible. Whether it's plausible I'm not so sure. The netlink hashtable is unlimited in size. So it

rtnl_mutex deadlock?

2015-08-04 Thread Linus Torvalds
Sorry for the spamming of random rtnetlink people, but I just resumed my laptop at PDX, and networking was dead. It looks like a deadlock on rtnl_mutex, possibly due to some error path not releasing the lock. No network op was making any progress, and as you can see from the attached sysrq-w, it

Re: rtnl_mutex deadlock?

2015-08-04 Thread Cong Wang
On Tue, Aug 4, 2015 at 8:48 AM, Linus Torvalds torva...@linux-foundation.org wrote: Sorry for the spamming of random rtnetlink people, but I just resumed my laptop at PDX, and networking was dead. It looks like a deadlock on rtnl_mutex, possibly due to some error path not releasing the lock.