Re: [RFC] net: napi fix

2007-12-20 Thread Robert Olsson
David Miller writes: Is the netif_running() check even required? No, it is not. When a device is brought down, one of the first things that happens is that we wait for all pending NAPI polls to complete, then block any new polls from starting. Hello! Yes but the reason was

Re: [RFC] net: napi fix

2007-12-20 Thread David Miller
From: Robert Olsson [EMAIL PROTECTED] Date: Thu, 20 Dec 2007 10:52:17 +0100 Yes but the reason was not to wait for all pending polls to complete so a server/router could be rebooted even under high- load and DOS. We've experienced some nasty problems with this. I know, see the rest of the

Re: [RFC] net: napi fix

2007-12-13 Thread David Miller
From: Jarek Poplawski [EMAIL PROTECTED] Date: Thu, 13 Dec 2007 14:49:53 +0100 As a matter of fact, since it's unlikely() in net_rx_action() anyway, I wonder what is the main reason or gain of leaving such a tricky exception, instead of letting drivers to always decide which is the best moment

Re: [RFC] net: napi fix

2007-12-13 Thread Jarek Poplawski
On 12-12-2007 19:41, Kok, Auke wrote: David Miller wrote: From: Andrew Gallatin [EMAIL PROTECTED] Date: Wed, 12 Dec 2007 12:29:23 -0500 Is the netif_running() check even required? No, it is not. When a device is brought down, one of the first things that happens is that we wait for all

Re: [RFC] net: napi fix

2007-12-13 Thread David Miller
From: Andrew Gallatin [EMAIL PROTECTED] Date: Thu, 13 Dec 2007 09:13:54 -0500 If the netif_running() check is indeed required to make a device break out of napi polling and respond to an ifconfig down, then I think the netif_running() check should be moved up into net_rx_action() to avoid

Re: [RFC] net: napi fix

2007-12-13 Thread Andrew Gallatin
Joonwoo Park wrote: 2007/12/13, Kok, Auke [EMAIL PROTECTED]: David Miller wrote: From: Andrew Gallatin [EMAIL PROTECTED] Date: Wed, 12 Dec 2007 12:29:23 -0500 Is the netif_running() check even required? No, it is not. When a device is brought down, one of the first things that happens

Re: [RFC] net: napi fix

2007-12-13 Thread Jarek Poplawski
On Thu, Dec 13, 2007 at 05:50:13AM -0800, David Miller wrote: From: Jarek Poplawski [EMAIL PROTECTED] Date: Thu, 13 Dec 2007 14:49:53 +0100 As a matter of fact, since it's unlikely() in net_rx_action() anyway, I wonder what is the main reason or gain of leaving such a tricky exception,

Re: [RFC] net: napi fix

2007-12-13 Thread Kok, Auke
David Miller wrote: From: Andrew Gallatin [EMAIL PROTECTED] Date: Thu, 13 Dec 2007 09:13:54 -0500 If the netif_running() check is indeed required to make a device break out of napi polling and respond to an ifconfig down, then I think the netif_running() check should be moved up into

Re: [RFC] net: napi fix

2007-12-13 Thread Stephen Hemminger
On Thu, 13 Dec 2007 06:19:38 -0800 (PST) David Miller [EMAIL PROTECTED] wrote: From: Andrew Gallatin [EMAIL PROTECTED] Date: Thu, 13 Dec 2007 09:13:54 -0500 If the netif_running() check is indeed required to make a device break out of napi polling and respond to an ifconfig down, then I

Re: [RFC] net: napi fix

2007-12-13 Thread David Miller
From: Andrew Gallatin [EMAIL PROTECTED] Date: Thu, 13 Dec 2007 14:02:25 -0500 Or perhaps we should just leave things as is. We should probably add a disabling state bit to the napi struct flags, this will be set by napi_disable() before it loops trying to set the sched bit. net_rx_action() can

Re: [RFC] net: napi fix

2007-12-13 Thread Andrew Gallatin
Stephen Hemminger wrote: On Thu, 13 Dec 2007 06:19:38 -0800 (PST) David Miller [EMAIL PROTECTED] wrote: From: Andrew Gallatin [EMAIL PROTECTED] Date: Thu, 13 Dec 2007 09:13:54 -0500 If the netif_running() check is indeed required to make a device break out of napi polling and respond to an

Re: [RFC] net: napi fix

2007-12-13 Thread Stephen Hemminger
David Miller wrote: From: Andrew Gallatin [EMAIL PROTECTED] Date: Thu, 13 Dec 2007 14:02:25 -0500 Or perhaps we should just leave things as is. We should probably add a disabling state bit to the napi struct flags, this will be set by napi_disable() before it loops trying to set the

Re: [RFC] net: napi fix

2007-12-13 Thread Jarek Poplawski
David Miller wrote, On 12/13/2007 02:50 PM: From: Jarek Poplawski [EMAIL PROTECTED] Date: Thu, 13 Dec 2007 14:49:53 +0100 As a matter of fact, since it's unlikely() in net_rx_action() anyway, I wonder what is the main reason or gain of leaving such a tricky exception, instead of letting

Re: [RFC] net: napi fix

2007-12-13 Thread David Miller
From: Jarek Poplawski [EMAIL PROTECTED] Date: Thu, 13 Dec 2007 21:16:12 +0100 I see in a nearby thread you would prefer to save some work to drivers (like this netif_running() check), but I think this all is at the cost of flexibility, and there will probably appear new problems, when a

Re: [RFC] net: napi fix

2007-12-13 Thread David Miller
From: Stephen Hemminger [EMAIL PROTECTED] Date: Thu, 13 Dec 2007 11:35:07 -0800 How about allowing a return value of -1 from napi_poll and letting device check itself. It doesn't avoid the code duplication in the -poll() fast paths. I don't care, on the other hand, if crap accumulates in

Re: [RFC] net: napi fix

2007-12-13 Thread Stephen Hemminger
David Miller wrote: From: Jarek Poplawski [EMAIL PROTECTED] Date: Thu, 13 Dec 2007 21:16:12 +0100 I see in a nearby thread you would prefer to save some work to drivers (like this netif_running() check), but I think this all is at the cost of flexibility, and there will probably appear new

Re: [RFC] net: napi fix

2007-12-13 Thread Jarek Poplawski
Stephen Hemminger wrote, On 12/13/2007 09:41 PM: David Miller wrote: From: Jarek Poplawski [EMAIL PROTECTED] Date: Thu, 13 Dec 2007 21:16:12 +0100 I see in a nearby thread you would prefer to save some work to drivers (like this netif_running() check), but I think this all is at the

Re: [RFC] net: napi fix

2007-12-13 Thread Jarek Poplawski
David Miller wrote, On 12/13/2007 09:37 PM: ... For example, if we export the list handling widget into the -poll() routines, god help the person who wants to change how the poll list is managed in net_rx_action() :-/ ...I'm afraid I can't understand: I mean doing the same but without passing

Re: [RFC] net: napi fix

2007-12-13 Thread David Miller
From: Jarek Poplawski [EMAIL PROTECTED] Date: Thu, 13 Dec 2007 23:28:41 +0100 ...I'm afraid I can't understand: I mean doing the same but without passing this info with 'work == weight': if driver sends this info, why it can't instead call something like napi_continue() with this

Re: [RFC] net: napi fix

2007-12-13 Thread Jarek Poplawski
David Miller wrote, On 12/13/2007 11:34 PM: From: Jarek Poplawski [EMAIL PROTECTED] Date: Thu, 13 Dec 2007 23:28:41 +0100 ...I'm afraid I can't understand: I mean doing the same but without passing this info with 'work == weight': if driver sends this info, why it can't instead call

Re: [RFC] net: napi fix

2007-12-13 Thread Joonwoo Park
2007/12/13, Andrew Gallatin [EMAIL PROTECTED]: If the netif_running() check is indeed required to make a device break out of napi polling and respond to an ifconfig down, then I think the netif_running() check should be moved up into net_rx_action() to avoid potential for driver complexity

Re: [RFC] net: napi fix

2007-12-12 Thread David Miller
From: Stephen Hemminger [EMAIL PROTECTED] Date: Tue, 11 Dec 2007 21:46:34 -0800 Isn't this a better fix for all drivers, rather than peppering every driver with the special case. This is how the logic worked up until 2.6.24. Stephen this is not the problem. The problem is that the driver is

Re: [RFC] net: napi fix

2007-12-12 Thread David Miller
From: Joonwoo Park [EMAIL PROTECTED] Date: Wed, 12 Dec 2007 15:05:26 +0900 Could you explain how it fix the problem? IMHO I think your patch cannot solve the problem. The drivers can call netif_rx_complete and net_rx_action can do list_move_tail also. Stephen is confused about what the bug

Re: [RFC] net: napi fix

2007-12-12 Thread Andrew Gallatin
David Miller wrote: From: Andrew Gallatin [EMAIL PROTECTED] Date: Wed, 12 Dec 2007 12:29:23 -0500 Is the netif_running() check even required? No, it is not. When a device is brought down, one of the first things that happens is that we wait for all pending NAPI polls to complete, then block

Re: [RFC] net: napi fix

2007-12-12 Thread Andrew Gallatin
[I apologize for loosing threading, I'm replying from the archives] The problem is that the driver is doing a NAPI completion and re-enabling chip interrupts with work_done == weight, and that is illegal. The only time at least myri10ge will do this is due to the !netif_running(netdev)

Re: [RFC] net: napi fix

2007-12-12 Thread David Miller
From: Andrew Gallatin [EMAIL PROTECTED] Date: Wed, 12 Dec 2007 12:29:23 -0500 Is the netif_running() check even required? No, it is not. When a device is brought down, one of the first things that happens is that we wait for all pending NAPI polls to complete, then block any new polls from

Re: [RFC] net: napi fix

2007-12-12 Thread Kok, Auke
David Miller wrote: From: Andrew Gallatin [EMAIL PROTECTED] Date: Wed, 12 Dec 2007 12:29:23 -0500 Is the netif_running() check even required? No, it is not. When a device is brought down, one of the first things that happens is that we wait for all pending NAPI polls to complete, then

Re: [RFC] net: napi fix

2007-12-12 Thread Joonwoo Park
2007/12/13, Kok, Auke [EMAIL PROTECTED]: David Miller wrote: From: Andrew Gallatin [EMAIL PROTECTED] Date: Wed, 12 Dec 2007 12:29:23 -0500 Is the netif_running() check even required? No, it is not. When a device is brought down, one of the first things that happens is that we

[RFC] net: napi fix

2007-12-11 Thread Stephen Hemminger
Isn't this a better fix for all drivers, rather than peppering every driver with the special case. This is how the logic worked up until 2.6.24. --- a/net/core/dev.c2007-12-11 12:16:20.0 -0800 +++ b/net/core/dev.c2007-12-11 21:43:39.0 -0800 @@ -2184,7 +2184,7 @@ static

Re: [RFC] net: napi fix

2007-12-11 Thread Joonwoo Park
2007/12/12, Stephen Hemminger [EMAIL PROTECTED]: Isn't this a better fix for all drivers, rather than peppering every driver with the special case. This is how the logic worked up until 2.6.24. --- a/net/core/dev.c2007-12-11 12:16:20.0 -0800 +++ b/net/core/dev.c2007-12-11