Re: [PATCH] via-rhine: Fix tx_timeout handling

2013-07-21 Thread Ben Hutchings
On Sun, 2013-07-21 at 18:32 +0200, Richard Weinberger wrote:
> Am 21.07.2013 18:18, schrieb Ben Hutchings:
> > On Fri, 2013-07-19 at 23:30 +0200, Richard Weinberger wrote:
> >> rhine_reset_task() misses to call netif_stop_queue(),
> >> this can lead to a crash if work is still scheduled while
> >> we're resetting the tx queue.
> >>
> >> Fixes:
> >> [   93.591707] BUG: unable to handle kernel NULL pointer dereference at 
> >> 004c
> >> [   93.595514] IP: [] rhine_napipoll+0x491/0x6e
> >>
> >> Signed-off-by: Richard Weinberger 
> >> ---
> >>  drivers/net/ethernet/via/via-rhine.c | 1 +
> >>  1 file changed, 1 insertion(+)
> >>
> >> diff --git a/drivers/net/ethernet/via/via-rhine.c 
> >> b/drivers/net/ethernet/via/via-rhine.c
> >> index b75eb9e..57e1b40 100644
> >> --- a/drivers/net/ethernet/via/via-rhine.c
> >> +++ b/drivers/net/ethernet/via/via-rhine.c
> >> @@ -1615,6 +1615,7 @@ static void rhine_reset_task(struct work_struct 
> >> *work)
> >>goto out_unlock;
> >>  
> >>napi_disable(>napi);
> >> +  netif_stop_queue(dev);
> > 
> > This is not really fixing the bug because there is no synchronisation
> > with the TX scheduler.  You can call netif_tx_disable() instead to do
> > that.
> 
> I guess other drivers suffer from that too.
> A quick grep showed that not many drivers are using netif_tx_disable().
> 
> > (I also think that it is preferable to use
> > netif_device_{detach,attach}() to stop the queue during reconfiguration,
> > as this is independent of TX completions and the watchdog.)

Actually, this is not independent of TX completions - netif_wake_queue()
will still start the TX scheduler while the device is not present, so
you have to avoid that.

> So the correct down sequence is napi_disable() -> netif_tx_disable() -> 
> netif_device_detach()?

No, that's redundant.  You can do:

napi_disable();
netif_tx_lock_bh(); /* sync with TX scheduler */
netif_device_detach();
netif_tx_unlock_bh();

and then when the queue is ready to use again:

netif_device_attach();
napi_enable();

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] via-rhine: Fix tx_timeout handling

2013-07-21 Thread Richard Weinberger
Am 21.07.2013 18:18, schrieb Ben Hutchings:
> On Fri, 2013-07-19 at 23:30 +0200, Richard Weinberger wrote:
>> rhine_reset_task() misses to call netif_stop_queue(),
>> this can lead to a crash if work is still scheduled while
>> we're resetting the tx queue.
>>
>> Fixes:
>> [   93.591707] BUG: unable to handle kernel NULL pointer dereference at 
>> 004c
>> [   93.595514] IP: [] rhine_napipoll+0x491/0x6e
>>
>> Signed-off-by: Richard Weinberger 
>> ---
>>  drivers/net/ethernet/via/via-rhine.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/net/ethernet/via/via-rhine.c 
>> b/drivers/net/ethernet/via/via-rhine.c
>> index b75eb9e..57e1b40 100644
>> --- a/drivers/net/ethernet/via/via-rhine.c
>> +++ b/drivers/net/ethernet/via/via-rhine.c
>> @@ -1615,6 +1615,7 @@ static void rhine_reset_task(struct work_struct *work)
>>  goto out_unlock;
>>  
>>  napi_disable(>napi);
>> +netif_stop_queue(dev);
> 
> This is not really fixing the bug because there is no synchronisation
> with the TX scheduler.  You can call netif_tx_disable() instead to do
> that.

I guess other drivers suffer from that too.
A quick grep showed that not many drivers are using netif_tx_disable().

> (I also think that it is preferable to use
> netif_device_{detach,attach}() to stop the queue during reconfiguration,
> as this is independent of TX completions and the watchdog.)

So the correct down sequence is napi_disable() -> netif_tx_disable() -> 
netif_device_detach()?

Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] via-rhine: Fix tx_timeout handling

2013-07-21 Thread Ben Hutchings
On Fri, 2013-07-19 at 23:30 +0200, Richard Weinberger wrote:
> rhine_reset_task() misses to call netif_stop_queue(),
> this can lead to a crash if work is still scheduled while
> we're resetting the tx queue.
> 
> Fixes:
> [   93.591707] BUG: unable to handle kernel NULL pointer dereference at 
> 004c
> [   93.595514] IP: [] rhine_napipoll+0x491/0x6e
> 
> Signed-off-by: Richard Weinberger 
> ---
>  drivers/net/ethernet/via/via-rhine.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/ethernet/via/via-rhine.c 
> b/drivers/net/ethernet/via/via-rhine.c
> index b75eb9e..57e1b40 100644
> --- a/drivers/net/ethernet/via/via-rhine.c
> +++ b/drivers/net/ethernet/via/via-rhine.c
> @@ -1615,6 +1615,7 @@ static void rhine_reset_task(struct work_struct *work)
>   goto out_unlock;
>  
>   napi_disable(>napi);
> + netif_stop_queue(dev);

This is not really fixing the bug because there is no synchronisation
with the TX scheduler.  You can call netif_tx_disable() instead to do
that.

(I also think that it is preferable to use
netif_device_{detach,attach}() to stop the queue during reconfiguration,
as this is independent of TX completions and the watchdog.)

Ben.

>   spin_lock_bh(>lock);
>  
>   /* clear all descriptors */

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] via-rhine: Fix tx_timeout handling

2013-07-21 Thread Ben Hutchings
On Fri, 2013-07-19 at 23:30 +0200, Richard Weinberger wrote:
 rhine_reset_task() misses to call netif_stop_queue(),
 this can lead to a crash if work is still scheduled while
 we're resetting the tx queue.
 
 Fixes:
 [   93.591707] BUG: unable to handle kernel NULL pointer dereference at 
 004c
 [   93.595514] IP: [c119d10d] rhine_napipoll+0x491/0x6e
 
 Signed-off-by: Richard Weinberger rich...@nod.at
 ---
  drivers/net/ethernet/via/via-rhine.c | 1 +
  1 file changed, 1 insertion(+)
 
 diff --git a/drivers/net/ethernet/via/via-rhine.c 
 b/drivers/net/ethernet/via/via-rhine.c
 index b75eb9e..57e1b40 100644
 --- a/drivers/net/ethernet/via/via-rhine.c
 +++ b/drivers/net/ethernet/via/via-rhine.c
 @@ -1615,6 +1615,7 @@ static void rhine_reset_task(struct work_struct *work)
   goto out_unlock;
  
   napi_disable(rp-napi);
 + netif_stop_queue(dev);

This is not really fixing the bug because there is no synchronisation
with the TX scheduler.  You can call netif_tx_disable() instead to do
that.

(I also think that it is preferable to use
netif_device_{detach,attach}() to stop the queue during reconfiguration,
as this is independent of TX completions and the watchdog.)

Ben.

   spin_lock_bh(rp-lock);
  
   /* clear all descriptors */

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] via-rhine: Fix tx_timeout handling

2013-07-21 Thread Richard Weinberger
Am 21.07.2013 18:18, schrieb Ben Hutchings:
 On Fri, 2013-07-19 at 23:30 +0200, Richard Weinberger wrote:
 rhine_reset_task() misses to call netif_stop_queue(),
 this can lead to a crash if work is still scheduled while
 we're resetting the tx queue.

 Fixes:
 [   93.591707] BUG: unable to handle kernel NULL pointer dereference at 
 004c
 [   93.595514] IP: [c119d10d] rhine_napipoll+0x491/0x6e

 Signed-off-by: Richard Weinberger rich...@nod.at
 ---
  drivers/net/ethernet/via/via-rhine.c | 1 +
  1 file changed, 1 insertion(+)

 diff --git a/drivers/net/ethernet/via/via-rhine.c 
 b/drivers/net/ethernet/via/via-rhine.c
 index b75eb9e..57e1b40 100644
 --- a/drivers/net/ethernet/via/via-rhine.c
 +++ b/drivers/net/ethernet/via/via-rhine.c
 @@ -1615,6 +1615,7 @@ static void rhine_reset_task(struct work_struct *work)
  goto out_unlock;
  
  napi_disable(rp-napi);
 +netif_stop_queue(dev);
 
 This is not really fixing the bug because there is no synchronisation
 with the TX scheduler.  You can call netif_tx_disable() instead to do
 that.

I guess other drivers suffer from that too.
A quick grep showed that not many drivers are using netif_tx_disable().

 (I also think that it is preferable to use
 netif_device_{detach,attach}() to stop the queue during reconfiguration,
 as this is independent of TX completions and the watchdog.)

So the correct down sequence is napi_disable() - netif_tx_disable() - 
netif_device_detach()?

Thanks,
//richard
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] via-rhine: Fix tx_timeout handling

2013-07-21 Thread Ben Hutchings
On Sun, 2013-07-21 at 18:32 +0200, Richard Weinberger wrote:
 Am 21.07.2013 18:18, schrieb Ben Hutchings:
  On Fri, 2013-07-19 at 23:30 +0200, Richard Weinberger wrote:
  rhine_reset_task() misses to call netif_stop_queue(),
  this can lead to a crash if work is still scheduled while
  we're resetting the tx queue.
 
  Fixes:
  [   93.591707] BUG: unable to handle kernel NULL pointer dereference at 
  004c
  [   93.595514] IP: [c119d10d] rhine_napipoll+0x491/0x6e
 
  Signed-off-by: Richard Weinberger rich...@nod.at
  ---
   drivers/net/ethernet/via/via-rhine.c | 1 +
   1 file changed, 1 insertion(+)
 
  diff --git a/drivers/net/ethernet/via/via-rhine.c 
  b/drivers/net/ethernet/via/via-rhine.c
  index b75eb9e..57e1b40 100644
  --- a/drivers/net/ethernet/via/via-rhine.c
  +++ b/drivers/net/ethernet/via/via-rhine.c
  @@ -1615,6 +1615,7 @@ static void rhine_reset_task(struct work_struct 
  *work)
 goto out_unlock;
   
 napi_disable(rp-napi);
  +  netif_stop_queue(dev);
  
  This is not really fixing the bug because there is no synchronisation
  with the TX scheduler.  You can call netif_tx_disable() instead to do
  that.
 
 I guess other drivers suffer from that too.
 A quick grep showed that not many drivers are using netif_tx_disable().
 
  (I also think that it is preferable to use
  netif_device_{detach,attach}() to stop the queue during reconfiguration,
  as this is independent of TX completions and the watchdog.)

Actually, this is not independent of TX completions - netif_wake_queue()
will still start the TX scheduler while the device is not present, so
you have to avoid that.

 So the correct down sequence is napi_disable() - netif_tx_disable() - 
 netif_device_detach()?

No, that's redundant.  You can do:

napi_disable();
netif_tx_lock_bh(); /* sync with TX scheduler */
netif_device_detach();
netif_tx_unlock_bh();

and then when the queue is ready to use again:

netif_device_attach();
napi_enable();

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] via-rhine: Fix tx_timeout handling

2013-07-19 Thread Richard Weinberger
rhine_reset_task() misses to call netif_stop_queue(),
this can lead to a crash if work is still scheduled while
we're resetting the tx queue.

Fixes:
[   93.591707] BUG: unable to handle kernel NULL pointer dereference at 004c
[   93.595514] IP: [] rhine_napipoll+0x491/0x6e

Signed-off-by: Richard Weinberger 
---
 drivers/net/ethernet/via/via-rhine.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/via/via-rhine.c 
b/drivers/net/ethernet/via/via-rhine.c
index b75eb9e..57e1b40 100644
--- a/drivers/net/ethernet/via/via-rhine.c
+++ b/drivers/net/ethernet/via/via-rhine.c
@@ -1615,6 +1615,7 @@ static void rhine_reset_task(struct work_struct *work)
goto out_unlock;
 
napi_disable(>napi);
+   netif_stop_queue(dev);
spin_lock_bh(>lock);
 
/* clear all descriptors */
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] via-rhine: Fix tx_timeout handling

2013-07-19 Thread Richard Weinberger
rhine_reset_task() misses to call netif_stop_queue(),
this can lead to a crash if work is still scheduled while
we're resetting the tx queue.

Fixes:
[   93.591707] BUG: unable to handle kernel NULL pointer dereference at 004c
[   93.595514] IP: [c119d10d] rhine_napipoll+0x491/0x6e

Signed-off-by: Richard Weinberger rich...@nod.at
---
 drivers/net/ethernet/via/via-rhine.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/via/via-rhine.c 
b/drivers/net/ethernet/via/via-rhine.c
index b75eb9e..57e1b40 100644
--- a/drivers/net/ethernet/via/via-rhine.c
+++ b/drivers/net/ethernet/via/via-rhine.c
@@ -1615,6 +1615,7 @@ static void rhine_reset_task(struct work_struct *work)
goto out_unlock;
 
napi_disable(rp-napi);
+   netif_stop_queue(dev);
spin_lock_bh(rp-lock);
 
/* clear all descriptors */
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/