Re: [NET]: Fix possible dev_deactivate race condition

2007-10-19 Thread Herbert Xu
On Fri, Oct 19, 2007 at 09:35:19AM +0200, Peter Zijlstra wrote:
>
> > /* Wait for outstanding qdisc_run calls. */
> > -   while (test_bit(__LINK_STATE_QDISC_RUNNING, &dev->state))
> > -   yield();
> > +   do {
> > +   while (test_bit(__LINK_STATE_QDISC_RUNNING, &dev->state))
> > +   yield();
> > +
> 
> Ouch!, is there really no sane locking alternative? Hashed waitqueues
> like for the page lock come to mind.

Well if we ever moved the transmission to full process context
then we'll gladly accept your patch :)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [NET]: Fix possible dev_deactivate race condition

2007-10-19 Thread Peter Zijlstra
On Fri, 2007-10-19 at 13:36 +0800, Herbert Xu wrote:
> On Fri, Oct 19, 2007 at 12:20:25PM +0800, Herbert Xu wrote:
> >
> > In fact this bug exists elsewhere too.  For example, the network
> > stack does this in net/sched/sch_generic.c:
> > 
> > /* Wait for outstanding qdisc_run calls. */
> >   while (test_bit(__LINK_STATE_QDISC_RUNNING, &dev->state))
> >   yield();
> > 
> > This has the same problem as the current synchronize_irq code.
> 

> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index e01d576..b3b7420 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -556,6 +556,7 @@ void dev_deactivate(struct net_device *dev)
>  {
> struct Qdisc *qdisc;
> struct sk_buff *skb;
> +   int running;
>  
> spin_lock_bh(&dev->queue_lock);
> qdisc = dev->qdisc;
> @@ -571,12 +572,31 @@ void dev_deactivate(struct net_device *dev)
>  
> dev_watchdog_down(dev);
>  
> -   /* Wait for outstanding dev_queue_xmit calls. */
> +   /* Wait for outstanding qdisc-less dev_queue_xmit calls. */
> synchronize_rcu();
>  
> /* Wait for outstanding qdisc_run calls. */
> -   while (test_bit(__LINK_STATE_QDISC_RUNNING, &dev->state))
> -   yield();
> +   do {
> +   while (test_bit(__LINK_STATE_QDISC_RUNNING, &dev->state))
> +   yield();
> +

Ouch!, is there really no sane locking alternative? Hashed waitqueues
like for the page lock come to mind.

> +   /*
> +* Double-check inside queue lock to ensure that all effects
> +* of the queue run are visible when we return.
> +*/
> +   spin_lock_bh(&dev->queue_lock);
> +   running = test_bit(__LINK_STATE_QDISC_RUNNING, &dev->state);
> +   spin_unlock_bh(&dev->queue_lock);
> +
> +   /*
> +* The running flag should never be set at this point because
> +* we've already set dev->qdisc to noop_qdisc *inside* the same
> +* pair of spin locks.  That is, if any qdisc_run starts after
> +* our initial test it should see the noop_qdisc and then
> +* clear the RUNNING bit before dropping the queue lock.  So
> +* if it is set here then we've found a bug.
> +*/
> +   } while (WARN_ON_ONCE(running));
>  }
>  
>  void dev_init_scheduler(struct net_device *dev) 
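
For reference, the waitqueue-based alternative Peter mentions might look roughly like the sketch below, replacing the test_bit()/yield() loop with a sleep/wakeup pair. This is only an illustration, not a posted patch: qdisc_running_waitq is a made-up name, and a real version would presumably use per-device or hashed bit waitqueues (as the page lock does) rather than a single global head.

	/* illustrative sketch only; qdisc_running_waitq is invented */
	static DECLARE_WAIT_QUEUE_HEAD(qdisc_running_waitq);

	/* qdisc_run() side, where the RUNNING bit is cleared: */
	clear_bit(__LINK_STATE_QDISC_RUNNING, &dev->state);
	smp_mb__after_clear_bit();
	wake_up(&qdisc_running_waitq);

	/* dev_deactivate() side, replacing the yield() loop: */
	wait_event(qdisc_running_waitq,
		   !test_bit(__LINK_STATE_QDISC_RUNNING, &dev->state));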



Re: [NET]: Fix possible dev_deactivate race condition

2007-10-18 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Fri, 19 Oct 2007 13:36:24 +0800

> [NET]: Fix possible dev_deactivate race condition
> 
> The function dev_deactivate is supposed to only return when
> all outstanding transmissions have completed.  Unfortunately
> it is possible for store operations in the driver's transmit
> function to only become visible after dev_deactivate returns.
> 
> This patch fixes this by taking the queue lock after we see
> the end of the queue run.  This ensures that all effects of
> any previous transmit calls are visible.
> 
> If however we detect that there is another queue run occurring,
> then we'll warn about it because this should never happen as
> we have pointed dev->qdisc to noop_qdisc within the same queue
> lock earlier in the function.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied, thanks Herbert!


[NET]: Fix possible dev_deactivate race condition

2007-10-18 Thread Herbert Xu
On Fri, Oct 19, 2007 at 12:20:25PM +0800, Herbert Xu wrote:
>
> In fact this bug exists elsewhere too.  For example, the network
> stack does this in net/sched/sch_generic.c:
> 
> /* Wait for outstanding qdisc_run calls. */
>   while (test_bit(__LINK_STATE_QDISC_RUNNING, &dev->state))
>   yield();
> 
> This has the same problem as the current synchronize_irq code.

OK here is the fix for that case.

[NET]: Fix possible dev_deactivate race condition

The function dev_deactivate is supposed to only return when
all outstanding transmissions have completed.  Unfortunately
it is possible for store operations in the driver's transmit
function to only become visible after dev_deactivate returns.

This patch fixes this by taking the queue lock after we see
the end of the queue run.  This ensures that all effects of
any previous transmit calls are visible.

If however we detect that there is another queue run occurring,
then we'll warn about it because this should never happen as
we have pointed dev->qdisc to noop_qdisc within the same queue
lock earlier in the function.

Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index e01d576..b3b7420 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -556,6 +556,7 @@ void dev_deactivate(struct net_device *dev)
 {
struct Qdisc *qdisc;
struct sk_buff *skb;
+   int running;
 
spin_lock_bh(&dev->queue_lock);
qdisc = dev->qdisc;
@@ -571,12 +572,31 @@ void dev_deactivate(struct net_device *dev)
 
dev_watchdog_down(dev);
 
-   /* Wait for outstanding dev_queue_xmit calls. */
+   /* Wait for outstanding qdisc-less dev_queue_xmit calls. */
synchronize_rcu();
 
/* Wait for outstanding qdisc_run calls. */
-   while (test_bit(__LINK_STATE_QDISC_RUNNING, &dev->state))
-   yield();
+   do {
+   while (test_bit(__LINK_STATE_QDISC_RUNNING, &dev->state))
+   yield();
+
+   /*
+* Double-check inside queue lock to ensure that all effects
+* of the queue run are visible when we return.
+*/
+   spin_lock_bh(&dev->queue_lock);
+   running = test_bit(__LINK_STATE_QDISC_RUNNING, &dev->state);
+   spin_unlock_bh(&dev->queue_lock);
+
+   /*
+* The running flag should never be set at this point because
+* we've already set dev->qdisc to noop_qdisc *inside* the same
+* pair of spin locks.  That is, if any qdisc_run starts after
+* our initial test it should see the noop_qdisc and then
+* clear the RUNNING bit before dropping the queue lock.  So
+* if it is set here then we've found a bug.
+*/
+   } while (WARN_ON_ONCE(running));
 }
 
 void dev_init_scheduler(struct net_device *dev)
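
To make the changelog's visibility argument concrete, here is a small self-contained userspace analogue of the same pattern (purely illustrative: queue_lock, running and xmit_data stand in for dev->queue_lock, the __LINK_STATE_QDISC_RUNNING bit and the driver's transmit-side stores, and the relaxed atomics mimic test_bit()/clear_bit(), which impose no ordering of their own). Build with -pthread.

	#include <assert.h>
	#include <pthread.h>
	#include <sched.h>
	#include <stdatomic.h>
	#include <stdio.h>

	static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
	static atomic_int running = 1;   /* "__LINK_STATE_QDISC_RUNNING", already set */
	static int xmit_data;            /* plain stores made by the "driver" */

	static void *qdisc_run_thread(void *unused)
	{
		(void)unused;
		xmit_data = 42;          /* the driver's transmit-side stores */

		pthread_mutex_lock(&queue_lock);
		/* the flag is cleared while holding the lock, as in qdisc_run() */
		atomic_store_explicit(&running, 0, memory_order_relaxed);
		pthread_mutex_unlock(&queue_lock);
		return NULL;
	}

	int main(void)
	{
		pthread_t t;
		pthread_create(&t, NULL, qdisc_run_thread, NULL);

		/* Old scheme: spin until the flag looks clear.  The relaxed load
		 * mirrors test_bit(); it gives no ordering, so a read of
		 * xmit_data right here could in principle still miss the store. */
		while (atomic_load_explicit(&running, memory_order_relaxed))
			sched_yield();

		/* The fix: take and release the lock the flag was cleared under.
		 * The acquire pairs with the worker's final unlock, so all of its
		 * earlier stores are visible from here on. */
		pthread_mutex_lock(&queue_lock);
		int still_running = atomic_load_explicit(&running, memory_order_relaxed);
		pthread_mutex_unlock(&queue_lock);
		assert(!still_running);                 /* mirrors the WARN_ON_ONCE() */

		printf("xmit_data = %d\n", xmit_data);  /* guaranteed to be 42 */
		pthread_join(t, NULL);
		return 0;
	}

The point is the same as in the patch: observing the flag clear through a plain load says nothing about the worker's earlier stores, while taking and releasing the lock the flag was cleared under does.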