Re: NAPI poll behavior in various Intel drivers

2008-01-07 Thread Jarek Poplawski
On 04-01-2008 12:40, David Miller wrote:
...
 That tx_cleaned thing clouds the logic in all of these driver's
 poll routines.
 
 The one necessary precondition is that when work_done  budget
 we exit polling and return a value less than budget.
 
 If the -poll() returns a value less than budget, net_rx_action()
 assumes that the device has been removed from the poll() list.
 
   /* Drivers must not modify the NAPI state if they
* consume the entire weight.  In such cases this code
* still owns the NAPI instance and therefore can
* move the instance around on the list at-will.
*/
   if (unlikely(work == weight))
   list_move_tail(n-poll_list, list);
 

Probably it's because of this clouded logic, but IMHO: Drivers must
not modify the NAPI state if... doesn't imply: drivers must modify
the NAPI state otherwise. (But it seems I must have missed the real
reason for this quote here.)

Regards,
Jarek P.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NAPI poll behavior in various Intel drivers

2008-01-05 Thread Andi Kleen
David Miller [EMAIL PROTECTED] writes:

 From: James Chapman [EMAIL PROTECTED]
 Date: Sat, 05 Jan 2008 00:18:31 +

 David Miller wrote:
  From: James Chapman [EMAIL PROTECTED]
  Date: Fri, 04 Jan 2008 20:10:30 +
  
  With the latest NAPI, this code has to change. But rather than remove
  the tx_cleaned logic completely, shouldn't transmit processing be
  included in the work_done accounting when a driver does transmit cleanup
  processing in the poll?
  
  Most other NAPI drivers don't do this, they just process all the
  pending TX work unconditionally and do not account it into the NAPI
  poll work.
 
 This will cause the interface to thrash in/out of polled mode very
 quickly when it is doing almost all transmit work. That's something to
 avoid, no?

 I see your point although I've never seen this in practice
 with tg3 or niu.

In 2.4 we used to have (haven't checked recently) performance regressions
with NAPI vs non NAPI (or versus the old BCM vendor driver) on tg3 for
some workloads that didn't fully fill the link. The theory was always
that the reason for that was something like the regular switching in
and out. So I think we saw that problem on tg3 too.

-Andi
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NAPI poll behavior in various Intel drivers

2008-01-05 Thread David Miller
From: Andi Kleen [EMAIL PROTECTED]
Date: Sat, 05 Jan 2008 14:29:05 +0100

 In 2.4 we used to have (haven't checked recently) performance regressions
 with NAPI vs non NAPI (or versus the old BCM vendor driver) on tg3 for
 some workloads that didn't fully fill the link. The theory was always
 that the reason for that was something like the regular switching in
 and out. So I think we saw that problem on tg3 too.

It was because we originally didn't program the HW interrupt
mitigation settings at all when using NAPI, now we do and the problem
is long gone.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


NAPI poll behavior in various Intel drivers

2008-01-04 Thread David Miller

Several Intel networking drivers such as e1000, e1000e
and e100 all do this to exit NAPI polling:

if ((!tx_cleaned  (work_done == 0)) ||
   !netif_running(poll_dev)) {

I tried to make this use in the NAPI rework:

if ((!tx_cleaned  (work_done  budget)) ||
   !netif_running(poll_dev)) {

But that got reverted by:

commit f7bbb9098315d712351aba7861a8c9fcf6bf0213

e1000: Fix NAPI state bug when Rx complete

Don't exit polling when we have not yet used our budget, this causes
the NAPI system to end up with a messed up poll list.

Signed-off-by: Auke Kok [EMAIL PROTECTED]
Signed-off-by: Jeff Garzik [EMAIL PROTECTED]

I definitely would not have signed off on that :-)

That tx_cleaned thing clouds the logic in all of these driver's
poll routines.

The one necessary precondition is that when work_done  budget
we exit polling and return a value less than budget.

If the -poll() returns a value less than budget, net_rx_action()
assumes that the device has been removed from the poll() list.

/* Drivers must not modify the NAPI state if they
 * consume the entire weight.  In such cases this code
 * still owns the NAPI instance and therefore can
 * move the instance around on the list at-will.
 */
if (unlikely(work == weight))
list_move_tail(n-poll_list, list);

This work_done == 0 test in these drivers, is thus, wrong.  It
should be work_done  budget and the whole tx_cleaned thing needs to
be removed.

It happens to work, because what happens is that we loop again and
process the same NAPI struct again.

As a result, E1000 devices get polled TWICE every time they
process at least one RX packet, but do not consume the whole
quota.

I smell a performance hack, and if so this is wrong and against
all of the principles of NAPI.  Either that or it's a workaround
for the !netif_running() case.

I noticed this while trying to work on a generic fix for the
-poll() does not exit when device is brought down while being
bombed with packets bug.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NAPI poll behavior in various Intel drivers

2008-01-04 Thread James Chapman
David Miller wrote:
 Several Intel networking drivers such as e1000, e1000e
 and e100 all do this to exit NAPI polling:
 
   if ((!tx_cleaned  (work_done == 0)) ||
  !netif_running(poll_dev)) {
 
 I tried to make this use in the NAPI rework:
 
   if ((!tx_cleaned  (work_done  budget)) ||
  !netif_running(poll_dev)) {
 
 But that got reverted by:
 
   commit f7bbb9098315d712351aba7861a8c9fcf6bf0213
 
   e1000: Fix NAPI state bug when Rx complete
 
   Don't exit polling when we have not yet used our budget, this causes
   the NAPI system to end up with a messed up poll list.
 
   Signed-off-by: Auke Kok [EMAIL PROTECTED]
   Signed-off-by: Jeff Garzik [EMAIL PROTECTED]
 
 I definitely would not have signed off on that :-)
 
 That tx_cleaned thing clouds the logic in all of these driver's
 poll routines.
 
 The one necessary precondition is that when work_done  budget
 we exit polling and return a value less than budget.
 
 If the -poll() returns a value less than budget, net_rx_action()
 assumes that the device has been removed from the poll() list.
 
   /* Drivers must not modify the NAPI state if they
* consume the entire weight.  In such cases this code
* still owns the NAPI instance and therefore can
* move the instance around on the list at-will.
*/
   if (unlikely(work == weight))
   list_move_tail(n-poll_list, list);
 
 This work_done == 0 test in these drivers, is thus, wrong.  It
 should be work_done  budget and the whole tx_cleaned thing needs to
 be removed.
 
 It happens to work, because what happens is that we loop again and
 process the same NAPI struct again.
 
 As a result, E1000 devices get polled TWICE every time they
 process at least one RX packet, but do not consume the whole
 quota.
 
 I smell a performance hack, and if so this is wrong and against
 all of the principles of NAPI.  Either that or it's a workaround
 for the !netif_running() case.

You have a good nose, Dave. :) And you can probably partly blame me for
this scheme. I worked with the e100 driver author (Scott Feldman) 5
years ago to performance tune and stress test the driver. I found much
better packets/sec forwarding performance (packets in one e100, out
another) by staying in polled mode until no receive or transmit was done.

With the latest NAPI, this code has to change. But rather than remove
the tx_cleaned logic completely, shouldn't transmit processing be
included in the work_done accounting when a driver does transmit cleanup
processing in the poll? If not, then when an interface is transmitting
far more than receiving, it will exit polled mode on every poll.

 I noticed this while trying to work on a generic fix for the
 -poll() does not exit when device is brought down while being
 bombed with packets bug.

-- 
James Chapman
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NAPI poll behavior in various Intel drivers

2008-01-04 Thread David Miller
From: James Chapman [EMAIL PROTECTED]
Date: Fri, 04 Jan 2008 20:10:30 +

 With the latest NAPI, this code has to change. But rather than remove
 the tx_cleaned logic completely, shouldn't transmit processing be
 included in the work_done accounting when a driver does transmit cleanup
 processing in the poll?

Most other NAPI drivers don't do this, they just process all the
pending TX work unconditionally and do not account it into the NAPI
poll work.

The logic is that, like link state handling, TX work is very cheap and
not the cpu cycle eater that RX packet processing is.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NAPI poll behavior in various Intel drivers

2008-01-04 Thread James Chapman
David Miller wrote:
 From: James Chapman [EMAIL PROTECTED]
 Date: Fri, 04 Jan 2008 20:10:30 +
 
 With the latest NAPI, this code has to change. But rather than remove
 the tx_cleaned logic completely, shouldn't transmit processing be
 included in the work_done accounting when a driver does transmit cleanup
 processing in the poll?
 
 Most other NAPI drivers don't do this, they just process all the
 pending TX work unconditionally and do not account it into the NAPI
 poll work.

This will cause the interface to thrash in/out of polled mode very
quickly when it is doing almost all transmit work. That's something to
avoid, no?

 The logic is that, like link state handling, TX work is very cheap and
 not the cpu cycle eater that RX packet processing is.

The processing is cheap but it is being done in the poll so it is
scheduled by NAPI. The event rate for TX events can be just as high as
RX. For link state handling, the event rate is always low so won't cause
NAPI to schedule rapidly.

-- 
James Chapman
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NAPI poll behavior in various Intel drivers

2008-01-04 Thread David Miller
From: James Chapman [EMAIL PROTECTED]
Date: Sat, 05 Jan 2008 00:18:31 +

 David Miller wrote:
  From: James Chapman [EMAIL PROTECTED]
  Date: Fri, 04 Jan 2008 20:10:30 +
  
  With the latest NAPI, this code has to change. But rather than remove
  the tx_cleaned logic completely, shouldn't transmit processing be
  included in the work_done accounting when a driver does transmit cleanup
  processing in the poll?
  
  Most other NAPI drivers don't do this, they just process all the
  pending TX work unconditionally and do not account it into the NAPI
  poll work.
 
 This will cause the interface to thrash in/out of polled mode very
 quickly when it is doing almost all transmit work. That's something to
 avoid, no?

I see your point although I've never seen this in practice
with tg3 or niu.

For a heavy transmit load with TCP, the cpu killer is the ACK receive
processing.

I guess for a datagram send situation it might start to edge up.

However, you're likely deferring TX events (say every 1/4 of the TX
ring or similar) which should make the effect matter much less.

But anyways, it's someone else's driver so if they want to take
TX work into account sure. :-)  I'll keep this in mind in my NAPI
changes.

Thanks.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html