Re: [BUG] RTNL and flush_scheduled_work deadlocks

2007-02-20 Thread Jarek Poplawski
On Fri, Feb 16, 2007 at 08:06:25AM -0800, Ben Greear wrote: ... Well, I had lockdep and all of the locking debugging I could find enabled, but it did not catch this problem. I had to use sysctl -t and manually dig through the backtraces to find the deadlock It may be that lockdep

Re: [BUG] RTNL and flush_scheduled_work deadlocks

2007-02-16 Thread Jarek Poplawski
On Thu, Feb 15, 2007 at 11:40:32PM -0800, Ben Greear wrote: ... Maybe there should be something like an ASSERT_NOT_RTNL() in the flush_scheduled_work() method? If it's performance critical, #ifdef it out if we're not debugging locks? Yes! I thought about the same (at first). But in my
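The ASSERT_NOT_RTNL() idea can be sketched in user space. This is a hypothetical model, not a kernel patch: `rtnl_lock_dbg()`, `rtnl_unlock_dbg()`, `flush_checked()`, `flush_stub()` and `flush_from_other_thread()` are invented names, and the check returns `-EDEADLK` instead of asserting. One design choice is an assumption of the sketch, not something the thread settled: ownership is tracked per thread, so the check fires only when the caller itself holds the lock, not when some other thread happens to hold it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of RTNL that also records, per thread, who holds it. */
static pthread_mutex_t rtnl = PTHREAD_MUTEX_INITIALIZER;
static _Thread_local bool this_thread_holds_rtnl;

void rtnl_lock_dbg(void)
{
    pthread_mutex_lock(&rtnl);
    this_thread_holds_rtnl = true;
}

void rtnl_unlock_dbg(void)
{
    assert(this_thread_holds_rtnl);  /* unlocking a lock we don't hold is a bug */
    this_thread_holds_rtnl = false;
    pthread_mutex_unlock(&rtnl);
}

int flush_count;  /* how many times the stub flush actually ran */

/* Stub standing in for the real flush of the shared workqueue. */
void flush_stub(void)
{
    flush_count++;
}

/* Debug-checked flush in the spirit of the proposed ASSERT_NOT_RTNL():
 * refuse with -EDEADLK when the calling thread holds rtnl, because a
 * queued work item that needs rtnl could then never finish. */
int flush_checked(void)
{
    if (this_thread_holds_rtnl)
        return -EDEADLK;
    flush_stub();
    return 0;
}

/* Thread body: call flush_checked() from a thread that does not hold
 * rtnl and store the result in *arg. */
void *flush_from_other_thread(void *arg)
{
    *(int *)arg = flush_checked();
    return NULL;
}
```

In this model, a thread that calls `flush_checked()` while holding `rtnl` gets `-EDEADLK`, while a different thread calling it at the same moment proceeds normally.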

Re: [BUG] RTNL and flush_scheduled_work deadlocks

2007-02-16 Thread Ben Greear
Jarek Poplawski wrote: On Thu, Feb 15, 2007 at 11:40:32PM -0800, Ben Greear wrote: ... Maybe there should be something like an ASSERT_NOT_RTNL() in the flush_scheduled_work() method? If it's performance critical, #ifdef it out if we're not debugging locks? Yes! I thought about the

Re: [BUG] RTNL and flush_scheduled_work deadlocks

2007-02-16 Thread Jarek Poplawski
On Fri, Feb 16, 2007 at 12:23:05AM -0800, Ben Greear wrote: Jarek Poplawski wrote: On Thu, Feb 15, 2007 at 11:40:32PM -0800, Ben Greear wrote: ... Maybe there should be something like an ASSERT_NOT_RTNL() in the flush_scheduled_work() method? If it's performance critical, #ifdef it

Re: [BUG] RTNL and flush_scheduled_work deadlocks

2007-02-16 Thread Jarek Poplawski
On Fri, Feb 16, 2007 at 10:04:25AM +0100, Jarek Poplawski wrote: On Fri, Feb 16, 2007 at 12:23:05AM -0800, Ben Greear wrote: ... I don't see how asserting it in the rtnl_lock would help anything, because at that point we are about to deadlock anyway... (and this is probably very rare,

Re: [BUG] RTNL and flush_scheduled_work deadlocks

2007-02-16 Thread Ben Greear
Jarek Poplawski wrote: On Fri, Feb 16, 2007 at 10:04:25AM +0100, Jarek Poplawski wrote: On Fri, Feb 16, 2007 at 12:23:05AM -0800, Ben Greear wrote: ... I don't see how asserting it in the rtnl_lock would help anything, because at that point we are about to deadlock anyway... (and

Re: [BUG] RTNL and flush_scheduled_work deadlocks

2007-02-16 Thread Stephen Hemminger
On Thu, 15 Feb 2007 23:40:32 -0800 Ben Greear [EMAIL PROTECTED] wrote: Jarek Poplawski wrote: On 14-02-2007 22:27, Stephen Hemminger wrote: Ben found this but the problem seems pretty widespread. The following places are subject to deadlock between flush_scheduled_work and the

Re: [BUG] RTNL and flush_scheduled_work deadlocks

2007-02-16 Thread Ben Greear
Stephen Hemminger wrote: On Thu, 15 Feb 2007 23:40:32 -0800 Ben Greear [EMAIL PROTECTED] wrote: Maybe there should be something like an ASSERT_NOT_RTNL() in the flush_scheduled_work() method? If it's performance critical, #ifdef it out if we're not debugging locks? You can't safely add

Re: [BUG] RTNL and flush_scheduled_work deadlocks

2007-02-15 Thread Ben Greear
Francois Romieu wrote: Ben Greear [EMAIL PROTECTED] : [...] I seem to be able to trigger this within about 1 minute on a particular 2.6.18.2 system with some 8139too devices, so if someone has a patch that could be tested, I'll gladly test it. For whatever reason, I haven't hit this problem on

Re: [BUG] RTNL and flush_scheduled_work deadlocks

2007-02-15 Thread Jarek Poplawski
On 14-02-2007 22:27, Stephen Hemminger wrote: Ben found this but the problem seems pretty widespread. The following places are subject to deadlock between flush_scheduled_work and the RTNL mutex. What can happen is that a work queue routine (like bridge port_carrier_check) is waiting forever

Re: [BUG] RTNL and flush_scheduled_work deadlocks

2007-02-15 Thread Ben Greear
Jarek Poplawski wrote: On 14-02-2007 22:27, Stephen Hemminger wrote: Ben found this but the problem seems pretty widespread. The following places are subject to deadlock between flush_scheduled_work and the RTNL mutex. What can happen is that a work queue routine (like bridge

[BUG] RTNL and flush_scheduled_work deadlocks

2007-02-14 Thread Stephen Hemminger
Ben found this but the problem seems pretty widespread. The following places are subject to deadlock between flush_scheduled_work and the RTNL mutex. What can happen is that a work queue routine (like bridge port_carrier_check) is waiting forever for RTNL, and the driver routine has called
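The cycle described in the original report can be modelled in user space. The sketch below is a minimal analogue, not kernel code: a pthread mutex named `rtnl` stands in for the RTNL mutex, a single worker thread stands in for a work item such as port_carrier_check, and `flush_with_timeout()` stands in for flush_scheduled_work(). These names are hypothetical, and the timeout exists only so the demo reports the deadlock instead of hanging; the real flush waits with no timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* User-space stand-ins (hypothetical): "rtnl" plays the RTNL mutex and
 * one worker thread plays a work item on the shared workqueue. */
static pthread_mutex_t rtnl = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;
static bool work_done;

/* Work routine that, like the bridge's port_carrier_check, needs RTNL. */
static void *work_fn(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&rtnl);   /* waits while the "driver" holds rtnl */
    pthread_mutex_unlock(&rtnl);

    pthread_mutex_lock(&done_lock);
    work_done = true;
    pthread_cond_signal(&done_cond);
    pthread_mutex_unlock(&done_lock);
    return NULL;
}

/* Stand-in for flush_scheduled_work(): wait until the work item finishes.
 * Returns 0 if it completed, ETIMEDOUT if it did not finish in time. */
static int flush_with_timeout(int timeout_sec)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    int rc = 0;
    pthread_mutex_lock(&done_lock);
    while (!work_done && rc == 0)   /* loop also absorbs spurious wakeups */
        rc = pthread_cond_timedwait(&done_cond, &done_lock, &deadline);
    if (work_done)
        rc = 0;
    pthread_mutex_unlock(&done_lock);
    return rc;
}

/* Queue the work, then flush either while still holding rtnl (the buggy
 * driver pattern) or after releasing it. Returns the flush result. */
int run_demo(bool flush_under_rtnl)
{
    pthread_t worker;
    work_done = false;               /* no worker is running yet */

    pthread_mutex_lock(&rtnl);
    int err = pthread_create(&worker, NULL, work_fn, NULL);
    assert(err == 0);

    int rc;
    if (flush_under_rtnl) {
        rc = flush_with_timeout(1);  /* worker is stuck on rtnl: times out */
        pthread_mutex_unlock(&rtnl); /* only now can the worker proceed */
    } else {
        pthread_mutex_unlock(&rtnl);
        rc = flush_with_timeout(1);  /* worker can take rtnl and finish */
    }
    pthread_join(worker, NULL);
    return rc;
}
```

With this sketch, `run_demo(true)` returns `ETIMEDOUT` after about a second, because the worker can never acquire `rtnl` while the flushing thread holds it, and `run_demo(false)` returns 0. Releasing the lock before flushing breaks the cycle in the model; whether that is safe in a given driver depends on what the lock was protecting there.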

Re: [BUG] RTNL and flush_scheduled_work deadlocks

2007-02-14 Thread Ben Greear
Stephen Hemminger wrote: Ben found this but the problem seems pretty widespread. The following places are subject to deadlock between flush_scheduled_work and the RTNL mutex. What can happen is that a work queue routine (like bridge port_carrier_check) is waiting forever for RTNL, and the

Re: [BUG] RTNL and flush_scheduled_work deadlocks

2007-02-14 Thread Francois Romieu
Ben Greear [EMAIL PROTECTED] : [...] I seem to be able to trigger this within about 1 minute on a particular 2.6.18.2 system with some 8139too devices, so if someone has a patch that could be tested, I'll gladly test it. For whatever reason, I haven't hit this problem on 2.6.20 yet, but that