Re: [PATCH] [NET]: Fix Ooops of napi net_rx_action.

2007-12-16 Thread David Miller
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Tue, 11 Dec 2007 15:42:22 -0800

> Perhaps we should change the warning to identify the guilty device.

Applied.

Stephen, you often don't supply a proper signoff line
for one-off changes like this and I find it very irritating.
It doesn't cost you anything to hit one or two keys in your
outgoing email editor to include the signoff line.

So please do so, or I'll silently drop patches without them
in the future.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [NET]: Fix Ooops of napi net_rx_action.

2007-12-16 Thread David Miller
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Tue, 11 Dec 2007 15:42:22 -0800

 Perhaps we should change the warning to identify the guilty device.

Applied.

Stephen, you often don't supply a proper signoff line
for one-off changes like this and I find it very irritating.
It doesn't cost you anything to hit one or two keys in your
outgoing email editor to include the signoff line.

So please do so, or I'll silently drop patches without them
in the future.

Thanks.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [NET]: Fix Ooops of napi net_rx_action.

2007-12-12 Thread David Miller
From: "Brandeburg, Jesse" <[EMAIL PROTECTED]>
Date: Tue, 11 Dec 2007 16:38:37 -0800

> @@ -3933,6 +3933,10 @@ quit_polling:
> e1000_set_itr(adapter);
> netif_rx_complete(poll_dev, napi);
> e1000_irq_enable(adapter);
> +   if (work_done == weight)
> +   return work_done - 1;
> +   else
> +   return work_done;

Don't do this.

If you processed "weight" worth of packets, return that
exact value and do not netif_rx_complete() and do not
re-enable interrupts.

That is the only correct fix.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [NET]: Fix Ooops of napi net_rx_action.

2007-12-12 Thread David Miller
From: Brandeburg, Jesse [EMAIL PROTECTED]
Date: Tue, 11 Dec 2007 16:38:37 -0800

 @@ -3933,6 +3933,10 @@ quit_polling:
 e1000_set_itr(adapter);
 netif_rx_complete(poll_dev, napi);
 e1000_irq_enable(adapter);
 +   if (work_done == weight)
 +   return work_done - 1;
 +   else
 +   return work_done;

Don't do this.

If you processed weight worth of packets, return that
exact value and do not netif_rx_complete() and do not
re-enable interrupts.

That is the only correct fix.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] [NET]: Fix Ooops of napi net_rx_action.

2007-12-11 Thread Brandeburg, Jesse
Joonwoo Park wrote:
> 2007/12/12, Brandeburg, Jesse <[EMAIL PROTECTED]>:
>> 
>> all drivers using NAPI in 2.6.24+ (NNAPI??) must return zero here,
>> after calling netif_rx_complete.  netif_rx_complete plus work_done
>> != 0 causes a bug. 
>> 
> 
> Brandeburg,
> Don't we need to return non-zero work_done after netif_rx_complete if
> work_done != weight?

Actually we just need to make sure we don't return work_done == weight
as it is checked on line 2210 of dev.c.
I should also note that I was wrong above, and that the requirement is
that we MUST not return work_done == weight when indicating
netif_rx_complete.

So, returning 0 when we actually did some work, but are being removed
from the poll list because something like !netif_running, is probably
okay, but I'm sure someone will disagree with me.  Maybe returning like
this would be better
diff --git a/drivers/net/e1000/e1000_main.c
b/drivers/net/e1000/e1000_main.c
index 724f067..76d5e3b 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -3933,6 +3933,10 @@ quit_polling:
e1000_set_itr(adapter);
netif_rx_complete(poll_dev, napi);
e1000_irq_enable(adapter);
+   if (work_done == weight)
+   return work_done - 1;
+   else
+   return work_done;
}

return work_done;

Jesse

 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [NET]: Fix Ooops of napi net_rx_action.

2007-12-11 Thread Joonwoo Park
2007/12/12, Brandeburg, Jesse <[EMAIL PROTECTED]>:
>
> all drivers using NAPI in 2.6.24+ (NNAPI??) must return zero here, after
> calling netif_rx_complete.  netif_rx_complete plus work_done != 0 causes
> a bug.
>

Brandeburg,
Don't we need to return non-zero work_done after netif_rx_complete if
work_done != weight?

Thanks,
Joonwoo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [NET]: Fix Ooops of napi net_rx_action.

2007-12-11 Thread Stephen Hemminger
Perhaps we should change the warning to identify the guilty device.


--- a/net/core/dev.c2007-11-19 09:09:57.0 -0800
+++ b/net/core/dev.c2007-12-07 15:54:03.0 -0800
@@ -2196,7 +2196,13 @@ static void net_rx_action(struct softirq
if (test_bit(NAPI_STATE_SCHED, >state))
work = n->poll(n, weight);
 
-   WARN_ON_ONCE(work > weight);
+   if (unlikely(work > weight)) {
+   if (net_ratelimit())
+   printk(KERN_WARNING
+  "%s: driver poll bug (work=%d 
weight=%d)\n",
+  work, weight);
+   work = weight;
+   }
 
budget -= work;
 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [NET]: Fix Ooops of napi net_rx_action.

2007-12-11 Thread Joonwoo Park
2007/12/12, Brandeburg, Jesse <[EMAIL PROTECTED]>:
> Joonwoo Park wrote:
> > /* If no Tx and not enough Rx work done, exit the polling mode */
> > if ((!tx_cleaned && (work_done == 0)) ||
> >!netif_running(poll_dev)) {
> > quit_polling:
> > if (likely(adapter->itr_setting & 3))
> > e1000_set_itr(adapter);
> > netif_rx_complete(poll_dev, napi);
> > e1000_irq_enable(adapter);
>
> all drivers using NAPI in 2.6.24+ (NNAPI??) must return zero here, after
> calling netif_rx_complete.  netif_rx_complete plus work_done != 0 causes
> a bug.
>
> > }
> >
> > return work_done;
> > }
> >
>

I'm working another patch for drivers (maybe patches)

Thanks.
Joonwoo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] [NET]: Fix Ooops of napi net_rx_action.

2007-12-11 Thread Brandeburg, Jesse
Joonwoo Park wrote:
> /* If no Tx and not enough Rx work done, exit the polling mode */
> if ((!tx_cleaned && (work_done == 0)) ||
>!netif_running(poll_dev)) {
> quit_polling:
> if (likely(adapter->itr_setting & 3))
> e1000_set_itr(adapter);
> netif_rx_complete(poll_dev, napi);
> e1000_irq_enable(adapter);

all drivers using NAPI in 2.6.24+ (NNAPI??) must return zero here, after
calling netif_rx_complete.  netif_rx_complete plus work_done != 0 causes
a bug.

> }
> 
> return work_done;
> }
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [NET]: Fix Ooops of napi net_rx_action.

2007-12-11 Thread Joonwoo Park
2007/12/11, David Miller <[EMAIL PROTECTED]>:
> From: "Joonwoo Park" <[EMAIL PROTECTED]>
> Date: Tue, 11 Dec 2007 18:13:34 +0900
>
> Joonwoo-ssi annyoung haseyo,

Wow Great! :-)

> How can the NAPI_STATE_SCHED bit be cleared externally yet we take
> this list_move_tail() code path?
>
> If NAPI_STATE_SCHED is cleared, work will be zero which will never be
> equal to 'weight', and this we'll never attempt the list_move_tail().
>
> If something clears NAPI_STATE_SCHED meanwhile, we have a serious race
> and your patch is an incomplete bandaid.  For example, if it can
> happen, then a case like:
>
>if (test_bit(NAPI_STATE_SCHED, >state))
>... something clears NAPI_STATE_SCHED right now ...
>work = n->poll(n, weight);
>
> can crash too.
>

David,
With your suggestions, I'm feeling a doubt at e1000.
It's seems to me e1000_clean does call netif_rx_complete and returns
non-zero work_done when !netif_running()
So net_rx_action has (work == weight) as true with NAPI_STATE_SCHED cleared.

Here is the e1000_clean.

static int
e1000_clean(struct napi_struct *napi, int budget)
{
/* Keep link state information with original netdev */
if (!netif_carrier_ok(poll_dev))
goto quit_polling;

...

adapter->clean_rx(adapter, >rx_ring[0],
  _done, budget);

/* If no Tx and not enough Rx work done, exit the polling mode */
if ((!tx_cleaned && (work_done == 0)) ||
   !netif_running(poll_dev)) {
quit_polling:
if (likely(adapter->itr_setting & 3))
e1000_set_itr(adapter);
netif_rx_complete(poll_dev, napi);
e1000_irq_enable(adapter);
}

return work_done;
}

Thanks.
Joonwoo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [NET]: Fix Ooops of napi net_rx_action.

2007-12-11 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Tue, 11 Dec 2007 20:36:21 +0800

> David Miller <[EMAIL PROTECTED]> wrote:
> >
> > How can the NAPI_STATE_SCHED bit be cleared externally yet we take
> > this list_move_tail() code path?
> 
> His driver is probably buggy.  When we had two drivers beginning
> with e100 we often forgot to apply fixes to the both of them.  Now
> that we have three it's even more confusing.
> 
> I just checked and indeed e1000e seems to be missing the NAPI fix
> that was applied to e1000.  Of course it doesn't rule out the
> possibility of another NAPI bug in e1000.

Thanks for checking.

Indeed I stuck the huge comment there in net_rx_action() above the
list move to try and explain things to people, so that if you saw a
crash in the list manipulation, you're go check the driver first.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [NET]: Fix Ooops of napi net_rx_action.

2007-12-11 Thread Herbert Xu
David Miller <[EMAIL PROTECTED]> wrote:
>
> How can the NAPI_STATE_SCHED bit be cleared externally yet we take
> this list_move_tail() code path?

His driver is probably buggy.  When we had two drivers beginning
with e100 we often forgot to apply fixes to the both of them.  Now
that we have three it's even more confusing.

I just checked and indeed e1000e seems to be missing the NAPI fix
that was applied to e1000.  Of course it doesn't rule out the
possibility of another NAPI bug in e1000.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [NET]: Fix Ooops of napi net_rx_action.

2007-12-11 Thread David Miller
From: "Joonwoo Park" <[EMAIL PROTECTED]>
Date: Tue, 11 Dec 2007 18:13:34 +0900

Joonwoo-ssi annyoung haseyo,

> [NET]: Fix Ooops of napi net_rx_action.
> Before doing list_move_tail napi poll_list, it should be ensured
> 
> Signed-off-by: Joonwoo Park <[EMAIL PROTECTED]>
> ---
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 86d6261..74bd5ab 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2207,7 +2207,8 @@ static void net_rx_action(struct softirq_action *h)
>* still "owns" the NAPI instance and therefore can
>* move the instance around on the list at-will.
>*/
> - if (unlikely(work == weight))
> + if (unlikely((test_bit(NAPI_STATE_SCHED, >state))
> + && (work == weight)))
>   list_move_tail(>poll_list, list);
>  
>   netpoll_poll_unlock(have);

How can the NAPI_STATE_SCHED bit be cleared externally yet we take
this list_move_tail() code path?

If NAPI_STATE_SCHED is cleared, work will be zero which will never be
equal to 'weight', and this we'll never attempt the list_move_tail().

If something clears NAPI_STATE_SCHED meanwhile, we have a serious race
and your patch is an incomplete bandaid.  For example, if it can
happen, then a case like:

if (test_bit(NAPI_STATE_SCHED, >state))
... something clears NAPI_STATE_SCHED right now ...
work = n->poll(n, weight);

can crash too.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] [NET]: Fix Ooops of napi net_rx_action.

2007-12-11 Thread Joonwoo Park
[NET]: Fix Ooops of napi net_rx_action.
Before doing list_move_tail napi poll_list, it should be ensured

Signed-off-by: Joonwoo Park <[EMAIL PROTECTED]>
---
diff --git a/net/core/dev.c b/net/core/dev.c
index 86d6261..74bd5ab 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2207,7 +2207,8 @@ static void net_rx_action(struct softirq_action *h)
 * still "owns" the NAPI instance and therefore can
 * move the instance around on the list at-will.
 */
-   if (unlikely(work == weight))
+   if (unlikely((test_bit(NAPI_STATE_SCHED, >state))
+   && (work == weight)))
list_move_tail(>poll_list, list);
 
netpoll_poll_unlock(have);
---

I was able to make it real oops on 2.6.24-rc4 + e1000 napi + smp.
eth0 of my laptop is connected directly to a packet generator.
The packet generator is making unicast udp packets to laptop (approximately 
7pps).
At that time, if the interface is went down, Ooops occurred.

Thanks.
Joonwoo


[EMAIL PROTECTED]:~# ifconfig eth0 down
BUG: unable to handle kernel paging request at virtual address 00100104
printing eip: c42ecc35 *pdpt = 1fc89001 *pde = 
Ooops: 0002 [#1] SMP
Modules linked in:

Pid: 4, comm:ksoftirqd/0 Not tainted (2.6.24-rc4 #27)
EIP: 0060:[] EFLAGS: 00010046 CPU: 0
EIP is at net_rx_action+0x1d5/0x1f0
EAX: 00100100 EBX: f77c965c ECX:  EDX: 00200200
ESI: 0040 EDI: f77c965c EBP: f7c6df94 ESP: f7c6df64
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
 Process ksoftirqd/0 (pid: 4, ti=f7c6c000 task=f7c68ab0 task.ti=f7c6c000)
Stack: 0002 0001 c42ecabe c4598f40 c1d0f5cc c7da 00ac f77c965c
   0040 0001 c4543b18 c4598f40 f7c6dfb0 c4032a07 0004 
   0246  c4032f60 f7c6dfbc c4032ad7 c459ba80 f6c6dfcc c4032fb9
Call Trace:
 [] show_trace_log_lvl+0x1a/0x30
 [] show_stack_log_lvl+0xa9/0xd0
 [] show_registers+0xca/0x1c0
 [] die+0x115/0x230
 [] do_page_fault+0x34c/0x7f0
 [] error_code+0x72/0x78
 [] __do_softirq+0x87/0x60
 [] do_softirq+0x57/0xe0
 [] kthread+0x42/0x70
 [] kernel_thread_helper+0x7/0x14
 
gdb outputs
(gdb)  info line *net_rx_action+0x1d5
Line 157 of "include/linux/list.h"
   starts at address 0xc42ecc35 
   and ends at 0xc42ecc38 .
(gdb) l include/linux/list.h:157
152  * This is only for internal list manipulation where we know
153  * the prev/next entries already!
154  */
155 static inline void __list_del(struct list_head * prev, struct list_head 
* next)
156 {
157 next->prev = prev;
158 prev->next = next;
159 }
160
161 /**
(gdb) info line *__do_softirq+0x87
Line 130 of "include/linux/rcupdate.h"
   starts at address 0xc4032a07 <__do_softirq+135>
   and ends at 0xc4032a0a <__do_softirq+138>.
(gdb) l include/linux/rcupdate.h:130
125 struct rcu_data *rdp = _cpu(rcu_data, cpu);
126 rdp->passed_quiesc = 1;
127 }
128 static inline void rcu_bh_qsctr_inc(int cpu)
129 {
130 struct rcu_data *rdp = _cpu(rcu_bh_data, cpu);
131 rdp->passed_quiesc = 1;
132 }
133
134 extern int rcu_pending(int cpu);
(gdb) info line *do_softirq+0x57
Line 269 of "kernel/softirq.c" starts at address 0xc4032ad2 
   and ends at 0xc4032ae0 .
(gdb) l kernel/softirq.c:269
264 local_irq_save(flags);
265
266 pending = local_softirq_pending();
267
268 if (pending)
269 __do_softirq();
270
271 local_irq_restore(flags);
272 }
273

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] [NET]: Fix Ooops of napi net_rx_action.

2007-12-11 Thread Joonwoo Park
[NET]: Fix Ooops of napi net_rx_action.
Before doing list_move_tail napi poll_list, it should be ensured

Signed-off-by: Joonwoo Park [EMAIL PROTECTED]
---
diff --git a/net/core/dev.c b/net/core/dev.c
index 86d6261..74bd5ab 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2207,7 +2207,8 @@ static void net_rx_action(struct softirq_action *h)
 * still owns the NAPI instance and therefore can
 * move the instance around on the list at-will.
 */
-   if (unlikely(work == weight))
+   if (unlikely((test_bit(NAPI_STATE_SCHED, n-state))
+(work == weight)))
list_move_tail(n-poll_list, list);
 
netpoll_poll_unlock(have);
---

I was able to make it real oops on 2.6.24-rc4 + e1000 napi + smp.
eth0 of my laptop is connected directly to a packet generator.
The packet generator is making unicast udp packets to laptop (approximately 
7pps).
At that time, if the interface is went down, Ooops occurred.

Thanks.
Joonwoo


[EMAIL PROTECTED]:~# ifconfig eth0 down
BUG: unable to handle kernel paging request at virtual address 00100104
printing eip: c42ecc35 *pdpt = 1fc89001 *pde = 
Ooops: 0002 [#1] SMP
Modules linked in:

Pid: 4, comm:ksoftirqd/0 Not tainted (2.6.24-rc4 #27)
EIP: 0060:[c42ecc35] EFLAGS: 00010046 CPU: 0
EIP is at net_rx_action+0x1d5/0x1f0
EAX: 00100100 EBX: f77c965c ECX:  EDX: 00200200
ESI: 0040 EDI: f77c965c EBP: f7c6df94 ESP: f7c6df64
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
 Process ksoftirqd/0 (pid: 4, ti=f7c6c000 task=f7c68ab0 task.ti=f7c6c000)
Stack: 0002 0001 c42ecabe c4598f40 c1d0f5cc c7da 00ac f77c965c
   0040 0001 c4543b18 c4598f40 f7c6dfb0 c4032a07 0004 
   0246  c4032f60 f7c6dfbc c4032ad7 c459ba80 f6c6dfcc c4032fb9
Call Trace:
 [c40053ca] show_trace_log_lvl+0x1a/0x30
 [c4005489] show_stack_log_lvl+0xa9/0xd0
 [c400557a] show_registers+0xca/0x1c0
 [c4005785] die+0x115/0x230
 [c4020f0c] do_page_fault+0x34c/0x7f0
 [c43c9cea] error_code+0x72/0x78
 [c4032a07] __do_softirq+0x87/0x60
 [c4032ad7] do_softirq+0x57/0xe0
 [c4041ee2] kthread+0x42/0x70
 [c4004fc3] kernel_thread_helper+0x7/0x14
 
gdb outputs
(gdb)  info line *net_rx_action+0x1d5
Line 157 of include/linux/list.h
   starts at address 0xc42ecc35 net_rx_action+469
   and ends at 0xc42ecc38 net_rx_action+472.
(gdb) l include/linux/list.h:157
152  * This is only for internal list manipulation where we know
153  * the prev/next entries already!
154  */
155 static inline void __list_del(struct list_head * prev, struct list_head 
* next)
156 {
157 next-prev = prev;
158 prev-next = next;
159 }
160
161 /**
(gdb) info line *__do_softirq+0x87
Line 130 of include/linux/rcupdate.h
   starts at address 0xc4032a07 __do_softirq+135
   and ends at 0xc4032a0a __do_softirq+138.
(gdb) l include/linux/rcupdate.h:130
125 struct rcu_data *rdp = per_cpu(rcu_data, cpu);
126 rdp-passed_quiesc = 1;
127 }
128 static inline void rcu_bh_qsctr_inc(int cpu)
129 {
130 struct rcu_data *rdp = per_cpu(rcu_bh_data, cpu);
131 rdp-passed_quiesc = 1;
132 }
133
134 extern int rcu_pending(int cpu);
(gdb) info line *do_softirq+0x57
Line 269 of kernel/softirq.c starts at address 0xc4032ad2 do_softirq+82
   and ends at 0xc4032ae0 tasklet_kill.
(gdb) l kernel/softirq.c:269
264 local_irq_save(flags);
265
266 pending = local_softirq_pending();
267
268 if (pending)
269 __do_softirq();
270
271 local_irq_restore(flags);
272 }
273

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [NET]: Fix Ooops of napi net_rx_action.

2007-12-11 Thread David Miller
From: Joonwoo Park [EMAIL PROTECTED]
Date: Tue, 11 Dec 2007 18:13:34 +0900

Joonwoo-ssi annyoung haseyo,

 [NET]: Fix Ooops of napi net_rx_action.
 Before doing list_move_tail napi poll_list, it should be ensured
 
 Signed-off-by: Joonwoo Park [EMAIL PROTECTED]
 ---
 diff --git a/net/core/dev.c b/net/core/dev.c
 index 86d6261..74bd5ab 100644
 --- a/net/core/dev.c
 +++ b/net/core/dev.c
 @@ -2207,7 +2207,8 @@ static void net_rx_action(struct softirq_action *h)
* still owns the NAPI instance and therefore can
* move the instance around on the list at-will.
*/
 - if (unlikely(work == weight))
 + if (unlikely((test_bit(NAPI_STATE_SCHED, n-state))
 +  (work == weight)))
   list_move_tail(n-poll_list, list);
  
   netpoll_poll_unlock(have);

How can the NAPI_STATE_SCHED bit be cleared externally yet we take
this list_move_tail() code path?

If NAPI_STATE_SCHED is cleared, work will be zero which will never be
equal to 'weight', and this we'll never attempt the list_move_tail().

If something clears NAPI_STATE_SCHED meanwhile, we have a serious race
and your patch is an incomplete bandaid.  For example, if it can
happen, then a case like:

if (test_bit(NAPI_STATE_SCHED, n-state))
... something clears NAPI_STATE_SCHED right now ...
work = n-poll(n, weight);

can crash too.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [NET]: Fix Ooops of napi net_rx_action.

2007-12-11 Thread Herbert Xu
David Miller [EMAIL PROTECTED] wrote:

 How can the NAPI_STATE_SCHED bit be cleared externally yet we take
 this list_move_tail() code path?

His driver is probably buggy.  When we had two drivers beginning
with e100 we often forgot to apply fixes to the both of them.  Now
that we have three it's even more confusing.

I just checked and indeed e1000e seems to be missing the NAPI fix
that was applied to e1000.  Of course it doesn't rule out the
possibility of another NAPI bug in e1000.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [NET]: Fix Ooops of napi net_rx_action.

2007-12-11 Thread David Miller
From: Herbert Xu [EMAIL PROTECTED]
Date: Tue, 11 Dec 2007 20:36:21 +0800

 David Miller [EMAIL PROTECTED] wrote:
 
  How can the NAPI_STATE_SCHED bit be cleared externally yet we take
  this list_move_tail() code path?
 
 His driver is probably buggy.  When we had two drivers beginning
 with e100 we often forgot to apply fixes to the both of them.  Now
 that we have three it's even more confusing.
 
 I just checked and indeed e1000e seems to be missing the NAPI fix
 that was applied to e1000.  Of course it doesn't rule out the
 possibility of another NAPI bug in e1000.

Thanks for checking.

Indeed I stuck the huge comment there in net_rx_action() above the
list move to try and explain things to people, so that if you saw a
crash in the list manipulation, you're go check the driver first.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [NET]: Fix Ooops of napi net_rx_action.

2007-12-11 Thread Joonwoo Park
2007/12/11, David Miller [EMAIL PROTECTED]:
 From: Joonwoo Park [EMAIL PROTECTED]
 Date: Tue, 11 Dec 2007 18:13:34 +0900

 Joonwoo-ssi annyoung haseyo,

Wow Great! :-)

 How can the NAPI_STATE_SCHED bit be cleared externally yet we take
 this list_move_tail() code path?

 If NAPI_STATE_SCHED is cleared, work will be zero which will never be
 equal to 'weight', and this we'll never attempt the list_move_tail().

 If something clears NAPI_STATE_SCHED meanwhile, we have a serious race
 and your patch is an incomplete bandaid.  For example, if it can
 happen, then a case like:

if (test_bit(NAPI_STATE_SCHED, n-state))
... something clears NAPI_STATE_SCHED right now ...
work = n-poll(n, weight);

 can crash too.


David,
With your suggestions, I'm feeling a doubt at e1000.
It's seems to me e1000_clean does call netif_rx_complete and returns
non-zero work_done when !netif_running()
So net_rx_action has (work == weight) as true with NAPI_STATE_SCHED cleared.

Here is the e1000_clean.

static int
e1000_clean(struct napi_struct *napi, int budget)
{
/* Keep link state information with original netdev */
if (!netif_carrier_ok(poll_dev))
goto quit_polling;

...

adapter-clean_rx(adapter, adapter-rx_ring[0],
  work_done, budget);

/* If no Tx and not enough Rx work done, exit the polling mode */
if ((!tx_cleaned  (work_done == 0)) ||
   !netif_running(poll_dev)) {
quit_polling:
if (likely(adapter-itr_setting  3))
e1000_set_itr(adapter);
netif_rx_complete(poll_dev, napi);
e1000_irq_enable(adapter);
}

return work_done;
}

Thanks.
Joonwoo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] [NET]: Fix Ooops of napi net_rx_action.

2007-12-11 Thread Brandeburg, Jesse
Joonwoo Park wrote:
 /* If no Tx and not enough Rx work done, exit the polling mode */
 if ((!tx_cleaned  (work_done == 0)) ||
!netif_running(poll_dev)) {
 quit_polling:
 if (likely(adapter-itr_setting  3))
 e1000_set_itr(adapter);
 netif_rx_complete(poll_dev, napi);
 e1000_irq_enable(adapter);

all drivers using NAPI in 2.6.24+ (NNAPI??) must return zero here, after
calling netif_rx_complete.  netif_rx_complete plus work_done != 0 causes
a bug.

 }
 
 return work_done;
 }
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [NET]: Fix Ooops of napi net_rx_action.

2007-12-11 Thread Joonwoo Park
2007/12/12, Brandeburg, Jesse [EMAIL PROTECTED]:
 Joonwoo Park wrote:
  /* If no Tx and not enough Rx work done, exit the polling mode */
  if ((!tx_cleaned  (work_done == 0)) ||
 !netif_running(poll_dev)) {
  quit_polling:
  if (likely(adapter-itr_setting  3))
  e1000_set_itr(adapter);
  netif_rx_complete(poll_dev, napi);
  e1000_irq_enable(adapter);

 all drivers using NAPI in 2.6.24+ (NNAPI??) must return zero here, after
 calling netif_rx_complete.  netif_rx_complete plus work_done != 0 causes
 a bug.

  }
 
  return work_done;
  }
 


I'm working another patch for drivers (maybe patches)

Thanks.
Joonwoo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [NET]: Fix Ooops of napi net_rx_action.

2007-12-11 Thread Stephen Hemminger
Perhaps we should change the warning to identify the guilty device.


--- a/net/core/dev.c2007-11-19 09:09:57.0 -0800
+++ b/net/core/dev.c2007-12-07 15:54:03.0 -0800
@@ -2196,7 +2196,13 @@ static void net_rx_action(struct softirq
if (test_bit(NAPI_STATE_SCHED, n-state))
work = n-poll(n, weight);
 
-   WARN_ON_ONCE(work  weight);
+   if (unlikely(work  weight)) {
+   if (net_ratelimit())
+   printk(KERN_WARNING
+  %s: driver poll bug (work=%d 
weight=%d)\n,
+  work, weight);
+   work = weight;
+   }
 
budget -= work;
 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [NET]: Fix Ooops of napi net_rx_action.

2007-12-11 Thread Joonwoo Park
2007/12/12, Brandeburg, Jesse [EMAIL PROTECTED]:

 all drivers using NAPI in 2.6.24+ (NNAPI??) must return zero here, after
 calling netif_rx_complete.  netif_rx_complete plus work_done != 0 causes
 a bug.


Brandeburg,
Don't we need to return non-zero work_done after netif_rx_complete if
work_done != weight?

Thanks,
Joonwoo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] [NET]: Fix Ooops of napi net_rx_action.

2007-12-11 Thread Brandeburg, Jesse
Joonwoo Park wrote:
 2007/12/12, Brandeburg, Jesse [EMAIL PROTECTED]:
 
 all drivers using NAPI in 2.6.24+ (NNAPI??) must return zero here,
 after calling netif_rx_complete.  netif_rx_complete plus work_done
 != 0 causes a bug. 
 
 
 Brandeburg,
 Don't we need to return non-zero work_done after netif_rx_complete if
 work_done != weight?

Actually we just need to make sure we don't return work_done == weight
as it is checked on line 2210 of dev.c.
I should also note that I was wrong above, and that the requirement is
that we MUST not return work_done == weight when indicating
netif_rx_complete.

So, returning 0 when we actually did some work, but are being removed
from the poll list because something like !netif_running, is probably
okay, but I'm sure someone will disagree with me.  Maybe returning like
this would be better
diff --git a/drivers/net/e1000/e1000_main.c
b/drivers/net/e1000/e1000_main.c
index 724f067..76d5e3b 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -3933,6 +3933,10 @@ quit_polling:
e1000_set_itr(adapter);
netif_rx_complete(poll_dev, napi);
e1000_irq_enable(adapter);
+   if (work_done == weight)
+   return work_done - 1;
+   else
+   return work_done;
}

return work_done;

Jesse

 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/