Re: [RFC] net: napi fix
From: Robert Olsson <[EMAIL PROTECTED]> Date: Thu, 20 Dec 2007 10:52:17 +0100 > Yes but the reason was not to wait for all pending polls to > complete so a server/router could be rebooted even under high- > load and DOS. We've experienced some nasty problems with this. I know, see the rest of the thread where I agree that we need to deal with this somehow. The device is marked down first, and somehow we need to tip off of that to break out of the NAPI loop. This "how" is what hasn't been resolved yet. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
David Miller writes: > > Is the netif_running() check even required? > > No, it is not. > > When a device is brought down, one of the first things > that happens is that we wait for all pending NAPI polls > to complete, then block any new polls from starting. Hello! Yes but the reason was not to wait for all pending polls to complete so a server/router could be rebooted even under high- load and DOS. We've experienced some nasty problems with this. Cheers. --ro -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
David Miller writes: Is the netif_running() check even required? No, it is not. When a device is brought down, one of the first things that happens is that we wait for all pending NAPI polls to complete, then block any new polls from starting. Hello! Yes but the reason was not to wait for all pending polls to complete so a server/router could be rebooted even under high- load and DOS. We've experienced some nasty problems with this. Cheers. --ro -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
From: Robert Olsson [EMAIL PROTECTED] Date: Thu, 20 Dec 2007 10:52:17 +0100 Yes but the reason was not to wait for all pending polls to complete so a server/router could be rebooted even under high- load and DOS. We've experienced some nasty problems with this. I know, see the rest of the thread where I agree that we need to deal with this somehow. The device is marked down first, and somehow we need to tip off of that to break out of the NAPI loop. This how is what hasn't been resolved yet. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
2007/12/13, Andrew Gallatin <[EMAIL PROTECTED]>: > > If the netif_running() check is indeed required to make a device break > out of napi polling and respond to an ifconfig down, then I think the > netif_running() check should be moved up into net_rx_action() to avoid > potential for driver complexity and bugs like the ones you found. > > Drew > Yep, It looks good. Joonwoo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
David Miller wrote, On 12/13/2007 11:34 PM: > From: Jarek Poplawski <[EMAIL PROTECTED]> > Date: Thu, 13 Dec 2007 23:28:41 +0100 > >> ...I'm afraid I can't understand: I mean doing the same but without >> passing this info with 'work == weight': if driver sends this info, >> why it can't instead call something like napi_continue() with >> this list_move_tail() (and probably additional local_irq_disable()/ >> enble() - but since it's unlikely()?) which looks much more readable, >> and saves one whole unlikely if ()? > > Because the poll list is private to net_rx_action() and we don't > want to expose implementation details like that to every > ->poll() implementation. So, it seems 'we' failed e.g. exposing napi_complete()... OK, no offense, I'll only mention at the end that there is always a possibility to redefine such a function to {} with any change of implementation. Jarek P. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
From: Jarek Poplawski <[EMAIL PROTECTED]> Date: Thu, 13 Dec 2007 23:28:41 +0100 > ...I'm afraid I can't understand: I mean doing the same but without > passing this info with 'work == weight': if driver sends this info, > why it can't instead call something like napi_continue() with > this list_move_tail() (and probably additional local_irq_disable()/ > enble() - but since it's unlikely()?) which looks much more readable, > and saves one whole unlikely if ()? Because the poll list is private to net_rx_action() and we don't want to expose implementation details like that to every ->poll() implementation. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
David Miller wrote, On 12/13/2007 09:37 PM: ... > For example, if we export the list handling widget into the ->poll() > routines, god help the person who wants to change how the poll list is > managed in net_rx_action() :-/ ...I'm afraid I can't understand: I mean doing the same but without passing this info with 'work == weight': if driver sends this info, why it can't instead call something like napi_continue() with this list_move_tail() (and probably additional local_irq_disable()/ enble() - but since it's unlikely()?) which looks much more readable, and saves one whole unlikely if ()? Jarek P. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
Stephen Hemminger wrote, On 12/13/2007 09:41 PM: > David Miller wrote: >> From: Jarek Poplawski <[EMAIL PROTECTED]> >> Date: Thu, 13 Dec 2007 21:16:12 +0100 >> >> >>> I see in a nearby thread you would prefer to save some work to drivers >>> (like this netif_running() check), but I think this all is at the cost >>> of flexibility, and there will probably appear new problems, when a >>> driver simply can't wait till the next poll (which btw. looks strange >>> with all these hotplugging, usb and powersaving). >>> >> As someone who has actually had to edit the NAPI support of _EVERY_ >> single driver in the tree I can tell you that code duplication and >> subtle semantic differences are a huge issue. >> >> And when you talk about driver flexibility, it's wise to mention that >> this comes at the expense of flexibility in the core implmentation. >> For example, if we export the list handling widget into the ->poll() >> routines, god help the person who wants to change how the poll list is >> managed in net_rx_action() :-/ >> >> So we don't want to export datastructure details like that to the >> driver. (I hope you both don't mind I save some 'paper' and do this 2 in 1...) So, you've seen a few drivers, know this much better than me, and maybe even thought why they all so unnecessarily different... Of course, if you think that despite those differences they all can work with simpler napi api then OK (until they don't have to do any cheating, like with this 'work' here). > Also, most of the drivers should/could be doing the same thing. It is > seems that > driver writers just want to get "creative" and do things differently. > The code is > cleaner, safer, and less buggy if every device uses the interface in the > same way. > > When I did the initial pass on this, I didn't see a single variation on > NAPI usage > that was better than the simple "get N packets and return" variation. > But Dave > did way more detailed grunt work on this. It seems there are some differences in thinking what is simple/complex. I think drivers' developers are used to controlling their devices, so they know better when to turn on/off interrupts. So, maybe similar model could be appropriate here. Sometimes doing more looks simpler than doing less and remembering how and when the rest will be done (like this netif_running() test). But I hope I'm wrong here, and this will work after all! Cheers, Jarek P. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
David Miller wrote: From: Jarek Poplawski <[EMAIL PROTECTED]> Date: Thu, 13 Dec 2007 21:16:12 +0100 I see in a nearby thread you would prefer to save some work to drivers (like this netif_running() check), but I think this all is at the cost of flexibility, and there will probably appear new problems, when a driver simply can't wait till the next poll (which btw. looks strange with all these hotplugging, usb and powersaving). As someone who has actually had to edit the NAPI support of _EVERY_ single driver in the tree I can tell you that code duplication and subtle semantic differences are a huge issue. And when you talk about driver flexibility, it's wise to mention that this comes at the expense of flexibility in the core implmentation. For example, if we export the list handling widget into the ->poll() routines, god help the person who wants to change how the poll list is managed in net_rx_action() :-/ So we don't want to export datastructure details like that to the driver. Also, most of the drivers should/could be doing the same thing. It is seems that driver writers just want to get "creative" and do things differently. The code is cleaner, safer, and less buggy if every device uses the interface in the same way. When I did the initial pass on this, I didn't see a single variation on NAPI usage that was better than the simple "get N packets and return" variation. But Dave did way more detailed grunt work on this. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
From: Stephen Hemminger <[EMAIL PROTECTED]> Date: Thu, 13 Dec 2007 11:35:07 -0800 > How about allowing a return value of -1 from napi_poll and letting > device check itself. It doesn't avoid the code duplication in the ->poll() fast paths. I don't care, on the other hand, if crap accumulates in non-critical slow paths like napi_disable() and dev_close(). That's why I'm suggesting solutions in that area. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
From: Jarek Poplawski <[EMAIL PROTECTED]> Date: Thu, 13 Dec 2007 21:16:12 +0100 > I see in a nearby thread you would prefer to save some work to drivers > (like this netif_running() check), but I think this all is at the cost > of flexibility, and there will probably appear new problems, when a > driver simply can't wait till the next poll (which btw. looks strange > with all these hotplugging, usb and powersaving). As someone who has actually had to edit the NAPI support of _EVERY_ single driver in the tree I can tell you that code duplication and subtle semantic differences are a huge issue. And when you talk about driver flexibility, it's wise to mention that this comes at the expense of flexibility in the core implmentation. For example, if we export the list handling widget into the ->poll() routines, god help the person who wants to change how the poll list is managed in net_rx_action() :-/ So we don't want to export datastructure details like that to the driver. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
David Miller wrote, On 12/13/2007 02:50 PM: > From: Jarek Poplawski <[EMAIL PROTECTED]> > Date: Thu, 13 Dec 2007 14:49:53 +0100 > >> As a matter of fact, since it's "unlikely()" in net_rx_action() anyway, >> I wonder what is the main reason or gain of leaving such a tricky >> exception, instead of letting drivers to always decide which is the >> best moment for napi_complete()? (Or maybe even, in such a case, they >> should call some function with this list_move_tail() if it's so >> useful?) > > It is the only sane way to synchronize the list manipulations. > > There has to be a way for ->poll() to tell net_rx_action() two things: > > 1) How much work was completed, so we can adjust 'budget' The 'budget' line would stay where it is. IMHO, it's only about this list_move_tail(). (Probably also doing netpoll_poll_unlock() during n->poll() could be considered to let the driver even destroy napi just after napi_complete() - but it's another subject.) > 2) Was the NAPI quota exhausted? So that we know that >net_rx_action() still "owns" the polling context and >thus can do the list manipulation safely. > > And these both need to be encoded into one single return value, thus > the adopted convention that "work == weight" means that the device has > not done a NAPI complete. Of course, with some care and explanations to driver maintainers, like in this case, this all should probably work like it is. But IMHO it would be easier to remember and maintain if there are some simple rules with no exceptions, so here e.g. driver always "owns" (with functions like napi_schedule(), napi_complete(), and maybe napi_move_tail()), and net_rx_action() only reads the list and runs these functions?! I see in a nearby thread you would prefer to save some work to drivers (like this netif_running() check), but I think this all is at the cost of flexibility, and there will probably appear new problems, when a driver simply can't wait till the next poll (which btw. looks strange with all these hotplugging, usb and powersaving). Regards, Jarek P. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
David Miller wrote: From: Andrew Gallatin <[EMAIL PROTECTED]> Date: Thu, 13 Dec 2007 14:02:25 -0500 Or perhaps we should just leave things as is. We should probably add a "disabling" state bit to the napi struct flags, this will be set by napi_disable() before it loops trying to set the sched bit. net_rx_action() can then check this. How about allowing a return value of -1 from napi_poll and letting device check itself. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
Stephen Hemminger wrote: On Thu, 13 Dec 2007 06:19:38 -0800 (PST) David Miller <[EMAIL PROTECTED]> wrote: From: Andrew Gallatin <[EMAIL PROTECTED]> Date: Thu, 13 Dec 2007 09:13:54 -0500 If the netif_running() check is indeed required to make a device break out of napi polling and respond to an ifconfig down, then I think the netif_running() check should be moved up into net_rx_action() to avoid potential for driver complexity and bugs like the ones you found. That, or something like it, definitely sounds reasonable and much better than putting the check into every driver :-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html It is not possible to do netif_running() check in generic code as currently written because of the case of devices where a single NAPI object is being used to handle two devices. The association between napi and netdevice is M to N. There are cases like niu that have multiple NAPI's and one netdevice; and devices like sky2 that can have one NAPI and 2 netdevice's. Ah, now I see. I forgot that not every device has a 1:1::napi:netdev relationship. Could we make an optional *dev_state field in the napi structure. It would be initialized to __LINK_STATE_START. Devices which have a 1:1 NAPI:netdevice relationship would set it to >state. The generic code would then do a test_bit(__LINK_STATE_START, napi->dev_state), and 1:1 drivers could remove this check. M:N drivers would pay for a useless (to them) test_bit, and would have to provide their own netif_running check to get termination under heavy load. Just an idea, perhaps there is a better way which is less hacky. Or perhaps we should just leave things as is. Drew -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
From: Andrew Gallatin <[EMAIL PROTECTED]> Date: Thu, 13 Dec 2007 14:02:25 -0500 > Or perhaps we should just leave things as is. We should probably add a "disabling" state bit to the napi struct flags, this will be set by napi_disable() before it loops trying to set the sched bit. net_rx_action() can then check this. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
On Thu, 13 Dec 2007 06:19:38 -0800 (PST) David Miller <[EMAIL PROTECTED]> wrote: > From: Andrew Gallatin <[EMAIL PROTECTED]> > Date: Thu, 13 Dec 2007 09:13:54 -0500 > > > If the netif_running() check is indeed required to make a device break > > out of napi polling and respond to an ifconfig down, then I think the > > netif_running() check should be moved up into net_rx_action() to avoid > > potential for driver complexity and bugs like the ones you found. > > That, or something like it, definitely sounds reasonable and much > better than putting the check into every driver :-) > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html It is not possible to do netif_running() check in generic code as currently written because of the case of devices where a single NAPI object is being used to handle two devices. The association between napi and netdevice is M to N. There are cases like niu that have multiple NAPI's and one netdevice; and devices like sky2 that can have one NAPI and 2 netdevice's. The existing pointer from napi to netdevice is only used by netconsole now. For devices like sky2 it means that netconsole can't work on the the second port which is a not a big problem. But adding a netif_running() check would be a big issue. -- Stephen Hemminger <[EMAIL PROTECTED]> -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
David Miller wrote: > From: Andrew Gallatin <[EMAIL PROTECTED]> > Date: Thu, 13 Dec 2007 09:13:54 -0500 > >> If the netif_running() check is indeed required to make a device break >> out of napi polling and respond to an ifconfig down, then I think the >> netif_running() check should be moved up into net_rx_action() to avoid >> potential for driver complexity and bugs like the ones you found. > > That, or something like it, definitely sounds reasonable and much > better than putting the check into every driver :-) hear hear! Auke -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
From: Andrew Gallatin <[EMAIL PROTECTED]> Date: Thu, 13 Dec 2007 09:13:54 -0500 > If the netif_running() check is indeed required to make a device break > out of napi polling and respond to an ifconfig down, then I think the > netif_running() check should be moved up into net_rx_action() to avoid > potential for driver complexity and bugs like the ones you found. That, or something like it, definitely sounds reasonable and much better than putting the check into every driver :-) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
Joonwoo Park wrote: > 2007/12/13, Kok, Auke <[EMAIL PROTECTED]>: >> David Miller wrote: >>> From: Andrew Gallatin <[EMAIL PROTECTED]> >>> Date: Wed, 12 Dec 2007 12:29:23 -0500 >>> Is the netif_running() check even required? >>> No, it is not. >>> >>> When a device is brought down, one of the first things >>> that happens is that we wait for all pending NAPI polls >>> to complete, then block any new polls from starting. >> I think this was previously (pre-2.6.24) not the case, which is why e1000 et al >> has this check as well and that's exactly what is causing most of the >> net_rx_action oopses in the first place. Without the netif_running() check >> previously the drivers were just unusable with NAPI and prone to many races with >> down (i.e. touching some ethtool ioctl which wants to do a reset while routing >> small packets at high numbers). that's why we added the netif_running() check in >> the first place :) >> >> There might be more drivers lurking that need this change... >> >> Auke >> > > Also in my case, without netif_running() check, I cannot do ifconfig down. > It stucked if packet generator was sending packets. If the netif_running() check is indeed required to make a device break out of napi polling and respond to an ifconfig down, then I think the netif_running() check should be moved up into net_rx_action() to avoid potential for driver complexity and bugs like the ones you found. Drew -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
On Thu, Dec 13, 2007 at 05:50:13AM -0800, David Miller wrote: > From: Jarek Poplawski <[EMAIL PROTECTED]> > Date: Thu, 13 Dec 2007 14:49:53 +0100 > > > As a matter of fact, since it's "unlikely()" in net_rx_action() anyway, > > I wonder what is the main reason or gain of leaving such a tricky > > exception, instead of letting drivers to always decide which is the > > best moment for napi_complete()? (Or maybe even, in such a case, they > > should call some function with this list_move_tail() if it's so > > useful?) > > It is the only sane way to synchronize the list manipulations. > > There has to be a way for ->poll() to tell net_rx_action() two things: > > 1) How much work was completed, so we can adjust 'budget' > 2) Was the NAPI quota exhausted? So that we know that >net_rx_action() still "owns" the polling context and >thus can do the list manipulation safely. > > And these both need to be encoded into one single return value, thus > the adopted convention that "work == weight" means that the device has > not done a NAPI complete. Thanks! So, I've to rethink this all... Jarek P. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
From: Jarek Poplawski <[EMAIL PROTECTED]> Date: Thu, 13 Dec 2007 14:49:53 +0100 > As a matter of fact, since it's "unlikely()" in net_rx_action() anyway, > I wonder what is the main reason or gain of leaving such a tricky > exception, instead of letting drivers to always decide which is the > best moment for napi_complete()? (Or maybe even, in such a case, they > should call some function with this list_move_tail() if it's so > useful?) It is the only sane way to synchronize the list manipulations. There has to be a way for ->poll() to tell net_rx_action() two things: 1) How much work was completed, so we can adjust 'budget' 2) Was the NAPI quota exhausted? So that we know that net_rx_action() still "owns" the polling context and thus can do the list manipulation safely. And these both need to be encoded into one single return value, thus the adopted convention that "work == weight" means that the device has not done a NAPI complete. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
On 12-12-2007 19:41, Kok, Auke wrote: > David Miller wrote: >> From: Andrew Gallatin <[EMAIL PROTECTED]> >> Date: Wed, 12 Dec 2007 12:29:23 -0500 >> >>> Is the netif_running() check even required? >> No, it is not. >> >> When a device is brought down, one of the first things >> that happens is that we wait for all pending NAPI polls >> to complete, then block any new polls from starting. > > I think this was previously (pre-2.6.24) not the case, which is why e1000 et > al > has this check as well and that's exactly what is causing most of the > net_rx_action oopses in the first place. Without the netif_running() check > previously the drivers were just unusable with NAPI and prone to many races > with > down (i.e. touching some ethtool ioctl which wants to do a reset while routing > small packets at high numbers). that's why we added the netif_running() check > in > the first place :) > > There might be more drivers lurking that need this change... > As a matter of fact, since it's "unlikely()" in net_rx_action() anyway, I wonder what is the main reason or gain of leaving such a tricky exception, instead of letting drivers to always decide which is the best moment for napi_complete()? (Or maybe even, in such a case, they should call some function with this list_move_tail() if it's so useful?) Regards, Jarek P. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
From: Jarek Poplawski [EMAIL PROTECTED] Date: Thu, 13 Dec 2007 14:49:53 +0100 As a matter of fact, since it's unlikely() in net_rx_action() anyway, I wonder what is the main reason or gain of leaving such a tricky exception, instead of letting drivers to always decide which is the best moment for napi_complete()? (Or maybe even, in such a case, they should call some function with this list_move_tail() if it's so useful?) It is the only sane way to synchronize the list manipulations. There has to be a way for -poll() to tell net_rx_action() two things: 1) How much work was completed, so we can adjust 'budget' 2) Was the NAPI quota exhausted? So that we know that net_rx_action() still owns the polling context and thus can do the list manipulation safely. And these both need to be encoded into one single return value, thus the adopted convention that work == weight means that the device has not done a NAPI complete. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
On 12-12-2007 19:41, Kok, Auke wrote: David Miller wrote: From: Andrew Gallatin [EMAIL PROTECTED] Date: Wed, 12 Dec 2007 12:29:23 -0500 Is the netif_running() check even required? No, it is not. When a device is brought down, one of the first things that happens is that we wait for all pending NAPI polls to complete, then block any new polls from starting. I think this was previously (pre-2.6.24) not the case, which is why e1000 et al has this check as well and that's exactly what is causing most of the net_rx_action oopses in the first place. Without the netif_running() check previously the drivers were just unusable with NAPI and prone to many races with down (i.e. touching some ethtool ioctl which wants to do a reset while routing small packets at high numbers). that's why we added the netif_running() check in the first place :) There might be more drivers lurking that need this change... As a matter of fact, since it's unlikely() in net_rx_action() anyway, I wonder what is the main reason or gain of leaving such a tricky exception, instead of letting drivers to always decide which is the best moment for napi_complete()? (Or maybe even, in such a case, they should call some function with this list_move_tail() if it's so useful?) Regards, Jarek P. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
From: Andrew Gallatin [EMAIL PROTECTED] Date: Thu, 13 Dec 2007 09:13:54 -0500 If the netif_running() check is indeed required to make a device break out of napi polling and respond to an ifconfig down, then I think the netif_running() check should be moved up into net_rx_action() to avoid potential for driver complexity and bugs like the ones you found. That, or something like it, definitely sounds reasonable and much better than putting the check into every driver :-) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
Joonwoo Park wrote: 2007/12/13, Kok, Auke [EMAIL PROTECTED]: David Miller wrote: From: Andrew Gallatin [EMAIL PROTECTED] Date: Wed, 12 Dec 2007 12:29:23 -0500 Is the netif_running() check even required? No, it is not. When a device is brought down, one of the first things that happens is that we wait for all pending NAPI polls to complete, then block any new polls from starting. I think this was previously (pre-2.6.24) not the case, which is why e1000 et al has this check as well and that's exactly what is causing most of the net_rx_action oopses in the first place. Without the netif_running() check previously the drivers were just unusable with NAPI and prone to many races with down (i.e. touching some ethtool ioctl which wants to do a reset while routing small packets at high numbers). that's why we added the netif_running() check in the first place :) There might be more drivers lurking that need this change... Auke Also in my case, without netif_running() check, I cannot do ifconfig down. It stucked if packet generator was sending packets. If the netif_running() check is indeed required to make a device break out of napi polling and respond to an ifconfig down, then I think the netif_running() check should be moved up into net_rx_action() to avoid potential for driver complexity and bugs like the ones you found. Drew -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
On Thu, Dec 13, 2007 at 05:50:13AM -0800, David Miller wrote: From: Jarek Poplawski [EMAIL PROTECTED] Date: Thu, 13 Dec 2007 14:49:53 +0100 As a matter of fact, since it's unlikely() in net_rx_action() anyway, I wonder what is the main reason or gain of leaving such a tricky exception, instead of letting drivers to always decide which is the best moment for napi_complete()? (Or maybe even, in such a case, they should call some function with this list_move_tail() if it's so useful?) It is the only sane way to synchronize the list manipulations. There has to be a way for -poll() to tell net_rx_action() two things: 1) How much work was completed, so we can adjust 'budget' 2) Was the NAPI quota exhausted? So that we know that net_rx_action() still owns the polling context and thus can do the list manipulation safely. And these both need to be encoded into one single return value, thus the adopted convention that work == weight means that the device has not done a NAPI complete. Thanks! So, I've to rethink this all... Jarek P. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
David Miller wrote: From: Andrew Gallatin [EMAIL PROTECTED] Date: Thu, 13 Dec 2007 09:13:54 -0500 If the netif_running() check is indeed required to make a device break out of napi polling and respond to an ifconfig down, then I think the netif_running() check should be moved up into net_rx_action() to avoid potential for driver complexity and bugs like the ones you found. That, or something like it, definitely sounds reasonable and much better than putting the check into every driver :-) hear hear! Auke -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
On Thu, 13 Dec 2007 06:19:38 -0800 (PST) David Miller [EMAIL PROTECTED] wrote: From: Andrew Gallatin [EMAIL PROTECTED] Date: Thu, 13 Dec 2007 09:13:54 -0500 If the netif_running() check is indeed required to make a device break out of napi polling and respond to an ifconfig down, then I think the netif_running() check should be moved up into net_rx_action() to avoid potential for driver complexity and bugs like the ones you found. That, or something like it, definitely sounds reasonable and much better than putting the check into every driver :-) -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html It is not possible to do netif_running() check in generic code as currently written because of the case of devices where a single NAPI object is being used to handle two devices. The association between napi and netdevice is M to N. There are cases like niu that have multiple NAPI's and one netdevice; and devices like sky2 that can have one NAPI and 2 netdevice's. The existing pointer from napi to netdevice is only used by netconsole now. For devices like sky2 it means that netconsole can't work on the the second port which is a not a big problem. But adding a netif_running() check would be a big issue. -- Stephen Hemminger [EMAIL PROTECTED] -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
From: Andrew Gallatin [EMAIL PROTECTED] Date: Thu, 13 Dec 2007 14:02:25 -0500 Or perhaps we should just leave things as is. We should probably add a disabling state bit to the napi struct flags, this will be set by napi_disable() before it loops trying to set the sched bit. net_rx_action() can then check this. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
Stephen Hemminger wrote: On Thu, 13 Dec 2007 06:19:38 -0800 (PST) David Miller [EMAIL PROTECTED] wrote: From: Andrew Gallatin [EMAIL PROTECTED] Date: Thu, 13 Dec 2007 09:13:54 -0500 If the netif_running() check is indeed required to make a device break out of napi polling and respond to an ifconfig down, then I think the netif_running() check should be moved up into net_rx_action() to avoid potential for driver complexity and bugs like the ones you found. That, or something like it, definitely sounds reasonable and much better than putting the check into every driver :-) -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html It is not possible to do netif_running() check in generic code as currently written because of the case of devices where a single NAPI object is being used to handle two devices. The association between napi and netdevice is M to N. There are cases like niu that have multiple NAPI's and one netdevice; and devices like sky2 that can have one NAPI and 2 netdevice's. Ah, now I see. I forgot that not every device has a 1:1::napi:netdev relationship. Could we make an optional *dev_state field in the napi structure. It would be initialized to __LINK_STATE_START. Devices which have a 1:1 NAPI:netdevice relationship would set it to netdev-state. The generic code would then do a test_bit(__LINK_STATE_START, napi-dev_state), and 1:1 drivers could remove this check. M:N drivers would pay for a useless (to them) test_bit, and would have to provide their own netif_running check to get termination under heavy load. Just an idea, perhaps there is a better way which is less hacky. Or perhaps we should just leave things as is. Drew -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
David Miller wrote: From: Andrew Gallatin [EMAIL PROTECTED] Date: Thu, 13 Dec 2007 14:02:25 -0500 Or perhaps we should just leave things as is. We should probably add a disabling state bit to the napi struct flags, this will be set by napi_disable() before it loops trying to set the sched bit. net_rx_action() can then check this. How about allowing a return value of -1 from napi_poll and letting device check itself. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
David Miller wrote, On 12/13/2007 02:50 PM: From: Jarek Poplawski [EMAIL PROTECTED] Date: Thu, 13 Dec 2007 14:49:53 +0100 As a matter of fact, since it's unlikely() in net_rx_action() anyway, I wonder what is the main reason or gain of leaving such a tricky exception, instead of letting drivers to always decide which is the best moment for napi_complete()? (Or maybe even, in such a case, they should call some function with this list_move_tail() if it's so useful?) It is the only sane way to synchronize the list manipulations. There has to be a way for -poll() to tell net_rx_action() two things: 1) How much work was completed, so we can adjust 'budget' The 'budget' line would stay where it is. IMHO, it's only about this list_move_tail(). (Probably also doing netpoll_poll_unlock() during n-poll() could be considered to let the driver even destroy napi just after napi_complete() - but it's another subject.) 2) Was the NAPI quota exhausted? So that we know that net_rx_action() still owns the polling context and thus can do the list manipulation safely. And these both need to be encoded into one single return value, thus the adopted convention that work == weight means that the device has not done a NAPI complete. Of course, with some care and explanations to driver maintainers, like in this case, this all should probably work like it is. But IMHO it would be easier to remember and maintain if there are some simple rules with no exceptions, so here e.g. driver always owns (with functions like napi_schedule(), napi_complete(), and maybe napi_move_tail()), and net_rx_action() only reads the list and runs these functions?! I see in a nearby thread you would prefer to save some work to drivers (like this netif_running() check), but I think this all is at the cost of flexibility, and there will probably appear new problems, when a driver simply can't wait till the next poll (which btw. looks strange with all these hotplugging, usb and powersaving). Regards, Jarek P. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
From: Jarek Poplawski [EMAIL PROTECTED] Date: Thu, 13 Dec 2007 21:16:12 +0100 I see in a nearby thread you would prefer to save some work to drivers (like this netif_running() check), but I think this all is at the cost of flexibility, and there will probably appear new problems, when a driver simply can't wait till the next poll (which btw. looks strange with all these hotplugging, usb and powersaving). As someone who has actually had to edit the NAPI support of _EVERY_ single driver in the tree I can tell you that code duplication and subtle semantic differences are a huge issue. And when you talk about driver flexibility, it's wise to mention that this comes at the expense of flexibility in the core implmentation. For example, if we export the list handling widget into the -poll() routines, god help the person who wants to change how the poll list is managed in net_rx_action() :-/ So we don't want to export datastructure details like that to the driver. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
From: Stephen Hemminger [EMAIL PROTECTED] Date: Thu, 13 Dec 2007 11:35:07 -0800 How about allowing a return value of -1 from napi_poll and letting device check itself. It doesn't avoid the code duplication in the -poll() fast paths. I don't care, on the other hand, if crap accumulates in non-critical slow paths like napi_disable() and dev_close(). That's why I'm suggesting solutions in that area. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
David Miller wrote: From: Jarek Poplawski [EMAIL PROTECTED] Date: Thu, 13 Dec 2007 21:16:12 +0100 I see in a nearby thread you would prefer to save some work to drivers (like this netif_running() check), but I think this all is at the cost of flexibility, and there will probably appear new problems, when a driver simply can't wait till the next poll (which btw. looks strange with all these hotplugging, usb and powersaving). As someone who has actually had to edit the NAPI support of _EVERY_ single driver in the tree I can tell you that code duplication and subtle semantic differences are a huge issue. And when you talk about driver flexibility, it's wise to mention that this comes at the expense of flexibility in the core implmentation. For example, if we export the list handling widget into the -poll() routines, god help the person who wants to change how the poll list is managed in net_rx_action() :-/ So we don't want to export datastructure details like that to the driver. Also, most of the drivers should/could be doing the same thing. It is seems that driver writers just want to get creative and do things differently. The code is cleaner, safer, and less buggy if every device uses the interface in the same way. When I did the initial pass on this, I didn't see a single variation on NAPI usage that was better than the simple get N packets and return variation. But Dave did way more detailed grunt work on this. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
Stephen Hemminger wrote, On 12/13/2007 09:41 PM: David Miller wrote: From: Jarek Poplawski [EMAIL PROTECTED] Date: Thu, 13 Dec 2007 21:16:12 +0100 I see in a nearby thread you would prefer to save some work to drivers (like this netif_running() check), but I think this all is at the cost of flexibility, and there will probably appear new problems, when a driver simply can't wait till the next poll (which btw. looks strange with all these hotplugging, usb and powersaving). As someone who has actually had to edit the NAPI support of _EVERY_ single driver in the tree I can tell you that code duplication and subtle semantic differences are a huge issue. And when you talk about driver flexibility, it's wise to mention that this comes at the expense of flexibility in the core implmentation. For example, if we export the list handling widget into the -poll() routines, god help the person who wants to change how the poll list is managed in net_rx_action() :-/ So we don't want to export datastructure details like that to the driver. (I hope you both don't mind I save some 'paper' and do this 2 in 1...) So, you've seen a few drivers, know this much better than me, and maybe even thought why they all so unnecessarily different... Of course, if you think that despite those differences they all can work with simpler napi api then OK (until they don't have to do any cheating, like with this 'work' here). Also, most of the drivers should/could be doing the same thing. It is seems that driver writers just want to get creative and do things differently. The code is cleaner, safer, and less buggy if every device uses the interface in the same way. When I did the initial pass on this, I didn't see a single variation on NAPI usage that was better than the simple get N packets and return variation. But Dave did way more detailed grunt work on this. It seems there are some differences in thinking what is simple/complex. I think drivers' developers are used to controlling their devices, so they know better when to turn on/off interrupts. So, maybe similar model could be appropriate here. Sometimes doing more looks simpler than doing less and remembering how and when the rest will be done (like this netif_running() test). But I hope I'm wrong here, and this will work after all! Cheers, Jarek P. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
David Miller wrote, On 12/13/2007 09:37 PM: ... For example, if we export the list handling widget into the -poll() routines, god help the person who wants to change how the poll list is managed in net_rx_action() :-/ ...I'm afraid I can't understand: I mean doing the same but without passing this info with 'work == weight': if driver sends this info, why it can't instead call something like napi_continue() with this list_move_tail() (and probably additional local_irq_disable()/ enble() - but since it's unlikely()?) which looks much more readable, and saves one whole unlikely if ()? Jarek P. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
From: Jarek Poplawski [EMAIL PROTECTED] Date: Thu, 13 Dec 2007 23:28:41 +0100 ...I'm afraid I can't understand: I mean doing the same but without passing this info with 'work == weight': if driver sends this info, why it can't instead call something like napi_continue() with this list_move_tail() (and probably additional local_irq_disable()/ enble() - but since it's unlikely()?) which looks much more readable, and saves one whole unlikely if ()? Because the poll list is private to net_rx_action() and we don't want to expose implementation details like that to every -poll() implementation. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
David Miller wrote, On 12/13/2007 11:34 PM: From: Jarek Poplawski [EMAIL PROTECTED] Date: Thu, 13 Dec 2007 23:28:41 +0100 ...I'm afraid I can't understand: I mean doing the same but without passing this info with 'work == weight': if driver sends this info, why it can't instead call something like napi_continue() with this list_move_tail() (and probably additional local_irq_disable()/ enble() - but since it's unlikely()?) which looks much more readable, and saves one whole unlikely if ()? Because the poll list is private to net_rx_action() and we don't want to expose implementation details like that to every -poll() implementation. So, it seems 'we' failed e.g. exposing napi_complete()... OK, no offense, I'll only mention at the end that there is always a possibility to redefine such a function to {} with any change of implementation. Jarek P. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
2007/12/13, Andrew Gallatin [EMAIL PROTECTED]: If the netif_running() check is indeed required to make a device break out of napi polling and respond to an ifconfig down, then I think the netif_running() check should be moved up into net_rx_action() to avoid potential for driver complexity and bugs like the ones you found. Drew Yep, It looks good. Joonwoo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
2007/12/13, Kok, Auke <[EMAIL PROTECTED]>: > David Miller wrote: > > From: Andrew Gallatin <[EMAIL PROTECTED]> > > Date: Wed, 12 Dec 2007 12:29:23 -0500 > > > >> Is the netif_running() check even required? > > > > No, it is not. > > > > When a device is brought down, one of the first things > > that happens is that we wait for all pending NAPI polls > > to complete, then block any new polls from starting. > > I think this was previously (pre-2.6.24) not the case, which is why e1000 et > al > has this check as well and that's exactly what is causing most of the > net_rx_action oopses in the first place. Without the netif_running() check > previously the drivers were just unusable with NAPI and prone to many races > with > down (i.e. touching some ethtool ioctl which wants to do a reset while routing > small packets at high numbers). that's why we added the netif_running() check > in > the first place :) > > There might be more drivers lurking that need this change... > > Auke > Also in my case, without netif_running() check, I cannot do ifconfig down. It stucked if packet generator was sending packets. Thanks Joonwoo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
David Miller wrote: > From: Andrew Gallatin <[EMAIL PROTECTED]> > Date: Wed, 12 Dec 2007 12:29:23 -0500 > >> Is the netif_running() check even required? > > No, it is not. > > When a device is brought down, one of the first things > that happens is that we wait for all pending NAPI polls > to complete, then block any new polls from starting. I think this was previously (pre-2.6.24) not the case, which is why e1000 et al has this check as well and that's exactly what is causing most of the net_rx_action oopses in the first place. Without the netif_running() check previously the drivers were just unusable with NAPI and prone to many races with down (i.e. touching some ethtool ioctl which wants to do a reset while routing small packets at high numbers). that's why we added the netif_running() check in the first place :) There might be more drivers lurking that need this change... Auke -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
David Miller wrote: From: Andrew Gallatin <[EMAIL PROTECTED]> Date: Wed, 12 Dec 2007 12:29:23 -0500 Is the netif_running() check even required? No, it is not. When a device is brought down, one of the first things that happens is that we wait for all pending NAPI polls to complete, then block any new polls from starting. Great, thanks. I will submit a patch to remove the bogus check. This should fix myri10ge properly. Thank you, Drew -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
From: Andrew Gallatin <[EMAIL PROTECTED]> Date: Wed, 12 Dec 2007 12:29:23 -0500 > Is the netif_running() check even required? No, it is not. When a device is brought down, one of the first things that happens is that we wait for all pending NAPI polls to complete, then block any new polls from starting. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
[I apologize for loosing threading, I'm replying from the archives] > The problem is that the driver is doing a NAPI completion and > re-enabling chip interrupts with work_done == weight, and that is > illegal. The only time at least myri10ge will do this is due to the !netif_running(netdev) check. Eg, from myri10ge's poll: work_done = myri10ge_clean_rx_done(mgp, budget); if (work_done < budget || !netif_running(netdev)) { netif_rx_complete(netdev, napi); put_be32(htonl(3), mgp->irq_claim); } Is the netif_running() check even required? Is this just a bad way to solve a race with running NAPI at down() time that would be better solved by putting a napi_synchronize() in the driver's down() routine? I'd rather fix this right than add another check to a questionable code path. Thanks, Drew -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
From: "Joonwoo Park" <[EMAIL PROTECTED]> Date: Wed, 12 Dec 2007 15:05:26 +0900 > Could you explain how it fix the problem? > IMHO I think your patch cannot solve the problem. > The drivers can call netif_rx_complete and net_rx_action can do > list_move_tail also. Stephen is confused about what the bug is in these drivers, that's all. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
From: Stephen Hemminger <[EMAIL PROTECTED]> Date: Tue, 11 Dec 2007 21:46:34 -0800 > Isn't this a better fix for all drivers, rather than peppering every > driver with the special case. This is how the logic worked up until > 2.6.24. Stephen this is not the problem. The problem is that the driver is doing a NAPI completion and re-enabling chip interrupts with work_done == weight, and that is illegal. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
From: Stephen Hemminger [EMAIL PROTECTED] Date: Tue, 11 Dec 2007 21:46:34 -0800 Isn't this a better fix for all drivers, rather than peppering every driver with the special case. This is how the logic worked up until 2.6.24. Stephen this is not the problem. The problem is that the driver is doing a NAPI completion and re-enabling chip interrupts with work_done == weight, and that is illegal. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
From: Joonwoo Park [EMAIL PROTECTED] Date: Wed, 12 Dec 2007 15:05:26 +0900 Could you explain how it fix the problem? IMHO I think your patch cannot solve the problem. The drivers can call netif_rx_complete and net_rx_action can do list_move_tail also. Stephen is confused about what the bug is in these drivers, that's all. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
From: Andrew Gallatin [EMAIL PROTECTED] Date: Wed, 12 Dec 2007 12:29:23 -0500 Is the netif_running() check even required? No, it is not. When a device is brought down, one of the first things that happens is that we wait for all pending NAPI polls to complete, then block any new polls from starting. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
[I apologize for loosing threading, I'm replying from the archives] The problem is that the driver is doing a NAPI completion and re-enabling chip interrupts with work_done == weight, and that is illegal. The only time at least myri10ge will do this is due to the !netif_running(netdev) check. Eg, from myri10ge's poll: work_done = myri10ge_clean_rx_done(mgp, budget); if (work_done budget || !netif_running(netdev)) { netif_rx_complete(netdev, napi); put_be32(htonl(3), mgp-irq_claim); } Is the netif_running() check even required? Is this just a bad way to solve a race with running NAPI at down() time that would be better solved by putting a napi_synchronize() in the driver's down() routine? I'd rather fix this right than add another check to a questionable code path. Thanks, Drew -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
David Miller wrote: From: Andrew Gallatin [EMAIL PROTECTED] Date: Wed, 12 Dec 2007 12:29:23 -0500 Is the netif_running() check even required? No, it is not. When a device is brought down, one of the first things that happens is that we wait for all pending NAPI polls to complete, then block any new polls from starting. Great, thanks. I will submit a patch to remove the bogus check. This should fix myri10ge properly. Thank you, Drew -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
David Miller wrote: From: Andrew Gallatin [EMAIL PROTECTED] Date: Wed, 12 Dec 2007 12:29:23 -0500 Is the netif_running() check even required? No, it is not. When a device is brought down, one of the first things that happens is that we wait for all pending NAPI polls to complete, then block any new polls from starting. I think this was previously (pre-2.6.24) not the case, which is why e1000 et al has this check as well and that's exactly what is causing most of the net_rx_action oopses in the first place. Without the netif_running() check previously the drivers were just unusable with NAPI and prone to many races with down (i.e. touching some ethtool ioctl which wants to do a reset while routing small packets at high numbers). that's why we added the netif_running() check in the first place :) There might be more drivers lurking that need this change... Auke -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
2007/12/13, Kok, Auke [EMAIL PROTECTED]: David Miller wrote: From: Andrew Gallatin [EMAIL PROTECTED] Date: Wed, 12 Dec 2007 12:29:23 -0500 Is the netif_running() check even required? No, it is not. When a device is brought down, one of the first things that happens is that we wait for all pending NAPI polls to complete, then block any new polls from starting. I think this was previously (pre-2.6.24) not the case, which is why e1000 et al has this check as well and that's exactly what is causing most of the net_rx_action oopses in the first place. Without the netif_running() check previously the drivers were just unusable with NAPI and prone to many races with down (i.e. touching some ethtool ioctl which wants to do a reset while routing small packets at high numbers). that's why we added the netif_running() check in the first place :) There might be more drivers lurking that need this change... Auke Also in my case, without netif_running() check, I cannot do ifconfig down. It stucked if packet generator was sending packets. Thanks Joonwoo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
2007/12/12, Stephen Hemminger <[EMAIL PROTECTED]>: > Isn't this a better fix for all drivers, rather than peppering every > driver with the special case. This is how the logic worked up until > 2.6.24. > > > --- a/net/core/dev.c2007-12-11 12:16:20.0 -0800 > +++ b/net/core/dev.c2007-12-11 21:43:39.0 -0800 > @@ -2184,7 +2184,7 @@ static void net_rx_action(struct softirq > >have = netpoll_poll_lock(n); > > - weight = n->weight; > + weight = min(n->weight, budget); > >/* This NAPI_STATE_SCHED test is for avoiding a race > * with netpoll's poll_napi(). Only the entity which > Stephen, Could you explain how it fix the problem? IMHO I think your patch cannot solve the problem. The drivers can call netif_rx_complete and net_rx_action can do list_move_tail also. Am I missing something? Thanks Joonwoo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC] net: napi fix
Isn't this a better fix for all drivers, rather than peppering every driver with the special case. This is how the logic worked up until 2.6.24. --- a/net/core/dev.c2007-12-11 12:16:20.0 -0800 +++ b/net/core/dev.c2007-12-11 21:43:39.0 -0800 @@ -2184,7 +2184,7 @@ static void net_rx_action(struct softirq have = netpoll_poll_lock(n); - weight = n->weight; + weight = min(n->weight, budget); /* This NAPI_STATE_SCHED test is for avoiding a race * with netpoll's poll_napi(). Only the entity which -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC] net: napi fix
Isn't this a better fix for all drivers, rather than peppering every driver with the special case. This is how the logic worked up until 2.6.24. --- a/net/core/dev.c2007-12-11 12:16:20.0 -0800 +++ b/net/core/dev.c2007-12-11 21:43:39.0 -0800 @@ -2184,7 +2184,7 @@ static void net_rx_action(struct softirq have = netpoll_poll_lock(n); - weight = n-weight; + weight = min(n-weight, budget); /* This NAPI_STATE_SCHED test is for avoiding a race * with netpoll's poll_napi(). Only the entity which -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] net: napi fix
2007/12/12, Stephen Hemminger [EMAIL PROTECTED]: Isn't this a better fix for all drivers, rather than peppering every driver with the special case. This is how the logic worked up until 2.6.24. --- a/net/core/dev.c2007-12-11 12:16:20.0 -0800 +++ b/net/core/dev.c2007-12-11 21:43:39.0 -0800 @@ -2184,7 +2184,7 @@ static void net_rx_action(struct softirq have = netpoll_poll_lock(n); - weight = n-weight; + weight = min(n-weight, budget); /* This NAPI_STATE_SCHED test is for avoiding a race * with netpoll's poll_napi(). Only the entity which Stephen, Could you explain how it fix the problem? IMHO I think your patch cannot solve the problem. The drivers can call netif_rx_complete and net_rx_action can do list_move_tail also. Am I missing something? Thanks Joonwoo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/