Re: NAPI poll behavior in various Intel drivers
On 04-01-2008 12:40, David Miller wrote: ... That tx_cleaned thing clouds the logic in all of these driver's poll routines. The one necessary precondition is that when work_done budget we exit polling and return a value less than budget. If the -poll() returns a value less than budget, net_rx_action() assumes that the device has been removed from the poll() list. /* Drivers must not modify the NAPI state if they * consume the entire weight. In such cases this code * still owns the NAPI instance and therefore can * move the instance around on the list at-will. */ if (unlikely(work == weight)) list_move_tail(n-poll_list, list); Probably it's because of this clouded logic, but IMHO: Drivers must not modify the NAPI state if... doesn't imply: drivers must modify the NAPI state otherwise. (But it seems I must have missed the real reason for this quote here.) Regards, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: NAPI poll behavior in various Intel drivers
David Miller [EMAIL PROTECTED] writes: From: James Chapman [EMAIL PROTECTED] Date: Sat, 05 Jan 2008 00:18:31 + David Miller wrote: From: James Chapman [EMAIL PROTECTED] Date: Fri, 04 Jan 2008 20:10:30 + With the latest NAPI, this code has to change. But rather than remove the tx_cleaned logic completely, shouldn't transmit processing be included in the work_done accounting when a driver does transmit cleanup processing in the poll? Most other NAPI drivers don't do this, they just process all the pending TX work unconditionally and do not account it into the NAPI poll work. This will cause the interface to thrash in/out of polled mode very quickly when it is doing almost all transmit work. That's something to avoid, no? I see your point although I've never seen this in practice with tg3 or niu. In 2.4 we used to have (haven't checked recently) performance regressions with NAPI vs non NAPI (or versus the old BCM vendor driver) on tg3 for some workloads that didn't fully fill the link. The theory was always that the reason for that was something like the regular switching in and out. So I think we saw that problem on tg3 too. -Andi -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: NAPI poll behavior in various Intel drivers
From: Andi Kleen [EMAIL PROTECTED] Date: Sat, 05 Jan 2008 14:29:05 +0100 In 2.4 we used to have (haven't checked recently) performance regressions with NAPI vs non NAPI (or versus the old BCM vendor driver) on tg3 for some workloads that didn't fully fill the link. The theory was always that the reason for that was something like the regular switching in and out. So I think we saw that problem on tg3 too. It was because we originally didn't program the HW interrupt mitigation settings at all when using NAPI, now we do and the problem is long gone. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
NAPI poll behavior in various Intel drivers
Several Intel networking drivers such as e1000, e1000e and e100 all do this to exit NAPI polling: if ((!tx_cleaned (work_done == 0)) || !netif_running(poll_dev)) { I tried to make this use in the NAPI rework: if ((!tx_cleaned (work_done budget)) || !netif_running(poll_dev)) { But that got reverted by: commit f7bbb9098315d712351aba7861a8c9fcf6bf0213 e1000: Fix NAPI state bug when Rx complete Don't exit polling when we have not yet used our budget, this causes the NAPI system to end up with a messed up poll list. Signed-off-by: Auke Kok [EMAIL PROTECTED] Signed-off-by: Jeff Garzik [EMAIL PROTECTED] I definitely would not have signed off on that :-) That tx_cleaned thing clouds the logic in all of these driver's poll routines. The one necessary precondition is that when work_done budget we exit polling and return a value less than budget. If the -poll() returns a value less than budget, net_rx_action() assumes that the device has been removed from the poll() list. /* Drivers must not modify the NAPI state if they * consume the entire weight. In such cases this code * still owns the NAPI instance and therefore can * move the instance around on the list at-will. */ if (unlikely(work == weight)) list_move_tail(n-poll_list, list); This work_done == 0 test in these drivers, is thus, wrong. It should be work_done budget and the whole tx_cleaned thing needs to be removed. It happens to work, because what happens is that we loop again and process the same NAPI struct again. As a result, E1000 devices get polled TWICE every time they process at least one RX packet, but do not consume the whole quota. I smell a performance hack, and if so this is wrong and against all of the principles of NAPI. Either that or it's a workaround for the !netif_running() case. I noticed this while trying to work on a generic fix for the -poll() does not exit when device is brought down while being bombed with packets bug. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: NAPI poll behavior in various Intel drivers
David Miller wrote: Several Intel networking drivers such as e1000, e1000e and e100 all do this to exit NAPI polling: if ((!tx_cleaned (work_done == 0)) || !netif_running(poll_dev)) { I tried to make this use in the NAPI rework: if ((!tx_cleaned (work_done budget)) || !netif_running(poll_dev)) { But that got reverted by: commit f7bbb9098315d712351aba7861a8c9fcf6bf0213 e1000: Fix NAPI state bug when Rx complete Don't exit polling when we have not yet used our budget, this causes the NAPI system to end up with a messed up poll list. Signed-off-by: Auke Kok [EMAIL PROTECTED] Signed-off-by: Jeff Garzik [EMAIL PROTECTED] I definitely would not have signed off on that :-) That tx_cleaned thing clouds the logic in all of these driver's poll routines. The one necessary precondition is that when work_done budget we exit polling and return a value less than budget. If the -poll() returns a value less than budget, net_rx_action() assumes that the device has been removed from the poll() list. /* Drivers must not modify the NAPI state if they * consume the entire weight. In such cases this code * still owns the NAPI instance and therefore can * move the instance around on the list at-will. */ if (unlikely(work == weight)) list_move_tail(n-poll_list, list); This work_done == 0 test in these drivers, is thus, wrong. It should be work_done budget and the whole tx_cleaned thing needs to be removed. It happens to work, because what happens is that we loop again and process the same NAPI struct again. As a result, E1000 devices get polled TWICE every time they process at least one RX packet, but do not consume the whole quota. I smell a performance hack, and if so this is wrong and against all of the principles of NAPI. Either that or it's a workaround for the !netif_running() case. You have a good nose, Dave. :) And you can probably partly blame me for this scheme. I worked with the e100 driver author (Scott Feldman) 5 years ago to performance tune and stress test the driver. I found much better packets/sec forwarding performance (packets in one e100, out another) by staying in polled mode until no receive or transmit was done. With the latest NAPI, this code has to change. But rather than remove the tx_cleaned logic completely, shouldn't transmit processing be included in the work_done accounting when a driver does transmit cleanup processing in the poll? If not, then when an interface is transmitting far more than receiving, it will exit polled mode on every poll. I noticed this while trying to work on a generic fix for the -poll() does not exit when device is brought down while being bombed with packets bug. -- James Chapman Katalix Systems Ltd http://www.katalix.com Catalysts for your Embedded Linux software development -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: NAPI poll behavior in various Intel drivers
From: James Chapman [EMAIL PROTECTED] Date: Fri, 04 Jan 2008 20:10:30 + With the latest NAPI, this code has to change. But rather than remove the tx_cleaned logic completely, shouldn't transmit processing be included in the work_done accounting when a driver does transmit cleanup processing in the poll? Most other NAPI drivers don't do this, they just process all the pending TX work unconditionally and do not account it into the NAPI poll work. The logic is that, like link state handling, TX work is very cheap and not the cpu cycle eater that RX packet processing is. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: NAPI poll behavior in various Intel drivers
David Miller wrote: From: James Chapman [EMAIL PROTECTED] Date: Fri, 04 Jan 2008 20:10:30 + With the latest NAPI, this code has to change. But rather than remove the tx_cleaned logic completely, shouldn't transmit processing be included in the work_done accounting when a driver does transmit cleanup processing in the poll? Most other NAPI drivers don't do this, they just process all the pending TX work unconditionally and do not account it into the NAPI poll work. This will cause the interface to thrash in/out of polled mode very quickly when it is doing almost all transmit work. That's something to avoid, no? The logic is that, like link state handling, TX work is very cheap and not the cpu cycle eater that RX packet processing is. The processing is cheap but it is being done in the poll so it is scheduled by NAPI. The event rate for TX events can be just as high as RX. For link state handling, the event rate is always low so won't cause NAPI to schedule rapidly. -- James Chapman Katalix Systems Ltd http://www.katalix.com Catalysts for your Embedded Linux software development -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: NAPI poll behavior in various Intel drivers
From: James Chapman [EMAIL PROTECTED] Date: Sat, 05 Jan 2008 00:18:31 + David Miller wrote: From: James Chapman [EMAIL PROTECTED] Date: Fri, 04 Jan 2008 20:10:30 + With the latest NAPI, this code has to change. But rather than remove the tx_cleaned logic completely, shouldn't transmit processing be included in the work_done accounting when a driver does transmit cleanup processing in the poll? Most other NAPI drivers don't do this, they just process all the pending TX work unconditionally and do not account it into the NAPI poll work. This will cause the interface to thrash in/out of polled mode very quickly when it is doing almost all transmit work. That's something to avoid, no? I see your point although I've never seen this in practice with tg3 or niu. For a heavy transmit load with TCP, the cpu killer is the ACK receive processing. I guess for a datagram send situation it might start to edge up. However, you're likely deferring TX events (say every 1/4 of the TX ring or similar) which should make the effect matter much less. But anyways, it's someone else's driver so if they want to take TX work into account sure. :-) I'll keep this in mind in my NAPI changes. Thanks. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html