Re: [PATCH 1/2] e1000: fix netpoll with NAPI

2006-06-20 Thread Andrew Grover

(trimmed CC to just netdev)


  One of our engineers (on the I/O AT team) has been tasked with modifying
  the Linux kernel to properly support multiple hardware queues (both TX and
  RX).  We'll make sure that he looks at the netpoll interface as part of
  that process.

 Might I ask who this is?  I might like to ping him/her on this topic.
 There is potentially some overlap with wireless, at least on the
 transmit side.
 John W. Linville


Hi John, so yeah we want multiple TX queues on wired ethernet for QoS,
same as wireless. Also for SMP scaling (maybe not needed until 10G+).
Did wireless people settle on a design since the email thread in
January?

Regards -- Andy
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] e1000: fix netpoll with NAPI

2006-06-15 Thread John W. Linville
On Wed, Jun 14, 2006 at 04:44:56PM -0700, Mitch Williams wrote:

 One of our engineers (on the I/O AT team) has been tasked with modifying
 the Linux kernel to properly support multiple hardware queues (both TX and
 RX).  We'll make sure that he looks at the netpoll interface as part of
 that process.

Might I ask who this is?  I might like to ping him/her on this topic.
There is potentially some overlap with wireless, at least on the
transmit side.

John
-- 
John W. Linville
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] e1000: fix netpoll with NAPI

2006-06-15 Thread Mitch Williams
Andy Grover [EMAIL PROTECTED] is the guy.  I'm not sure when he'll
be working on this; it's somewhere in his TODO pile.

On Thu, 15 Jun 2006, John W. Linville wrote:


 On Wed, Jun 14, 2006 at 04:44:56PM -0700, Mitch Williams wrote:

  One of our engineers (on the I/O AT team) has been tasked with modifying
  the Linux kernel to properly support multiple hardware queues (both TX and
  RX).  We'll make sure that he looks at the netpoll interface as part of
  that process.

 Might I ask who this is?  I might like to ping him/her on this topic.
 There is potentially some overlap with wireless, at least on the
 transmit side.

 John
 --
 John W. Linville
 [EMAIL PROTECTED]



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] e1000: fix netpoll with NAPI

2006-06-14 Thread Neil Horman
On Mon, Jun 12, 2006 at 02:06:00PM -0400, Neil Horman wrote:
 On Mon, Jun 12, 2006 at 09:42:14AM -0700, Mitch Williams wrote:
  On Sun, 2006-06-11 at 17:13 -0700, Neil Horman wrote:
   Any further thoughts on this guys?  I still think my last solution
   solves all of
   the netpoll problems, and isn't going to have any noticable impact on
   performance.
   
  I haven't had time to evaluate performance on your patch (sorry!), but
  after thinking about it, I agree that it should not have any noticeable
  impact.  OTOH, performance tuning is a funny thing, and things you think
  won't cause problems often do.
  
 Thats ok, I just didn't hear out of anyone on friday, so I was curious as to
 where we were on this.  I don't have the ability to do any real world
 performance testing here, but I'll try to record the run time of the interrupt
 routine on a limited number of frames here.
 

Hey, as promised, I've done some rudimentary performance benchmarking on various
ways that we have talked about to solve this problem.  As I previously mentioned
I didn't have the equipment to do any real full scale testing here, so what I
did was take a read of the real time counter at the start and end of the
e1000_intr routine with various patches applied, and I recorded the number of
ticks elapsed on the tsc during its run.  I did this on my single cpu x86_64
machine here, using the latest unpatched e1000 driver as a base, and then
comparing it to the e1000 driver using my patch and separately with a patch that
spinlocks the e1000_clean_rx_irq routine (so as to serialize the critical
section that would otherwise be subject to data corruption.  Here are my
results:

Base line: 
Avg. 8145 Ticks on the tsc.

With my patch: 
http://marc.theaimsgroup.com/?l=linux-netdevm=114970807606096w=2 
Avg. 8159 Ticks on the tsc. (+0.17% increase)

With a spinlock added to e1000_clean_rx_irq:
Avg. 8238 Ticks on the tsc. (+1.1% increase)

If you like I can send you the time stamp counter patch that I used, as well as
the patch which adds spinlocks to the clean routine.  Note that the free running
counter values will vary so you probably want to look at percentage increase.
Either way, I think either solution provides very little impact on interrupt run
time.  Given that my patch (granted using my test methodology here) is the
faster of the two, and arguably the more correct in terms of not using the poll
controller method to recieve frames, We should go with that patch.

Thoughts/opinions? 
Neil


-- 
/***
 *Neil Horman
 *Software Engineer
 *gpg keyid: 1024D / 0x92A74FA1 - http://pgp.mit.edu
 ***/
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] e1000: fix netpoll with NAPI

2006-06-14 Thread Mitch Williams


On Wed, 14 Jun 2006, Neil Horman wrote:

 Hey, as promised, I've done some rudimentary performance benchmarking on 
 various
 ways that we have talked about to solve this problem.  As I previously 
 mentioned

We see the same results here, Neil.  However, we've got a much less
invasive patch undergoing internal review, and which we will post to
netdev once everybody gets happy with it.  Basically, we just do our NAPI
scheduling on the real netdev structure instead of our polling netdev,
in the case where we only have one RX queue.  Since this is the case for
all our currently-shipping parts under Linux, netpoll works again across
the board.  It's a short-term fix because we do want to support multiple
queues going forward, but for now we need to get everybody working.

One of our engineers (on the I/O AT team) has been tasked with modifying
the Linux kernel to properly support multiple hardware queues (both TX and
RX).  We'll make sure that he looks at the netpoll interface as part of
that process.

Stay tuned for our impending patch.
-Mitch

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] e1000: fix netpoll with NAPI

2006-06-12 Thread Neil Horman
On Mon, Jun 12, 2006 at 09:42:14AM -0700, Mitch Williams wrote:
 On Sun, 2006-06-11 at 17:13 -0700, Neil Horman wrote:
  Any further thoughts on this guys?  I still think my last solution
  solves all of
  the netpoll problems, and isn't going to have any noticable impact on
  performance.
  
 I haven't had time to evaluate performance on your patch (sorry!), but
 after thinking about it, I agree that it should not have any noticeable
 impact.  OTOH, performance tuning is a funny thing, and things you think
 won't cause problems often do.
 
Thats ok, I just didn't hear out of anyone on friday, so I was curious as to
where we were on this.  I don't have the ability to do any real world
performance testing here, but I'll try to record the run time of the interrupt
routine on a limited number of frames here.

 Anyway, I'm still not quite ready to ACK this because it's just not
 future-proof.  Eventually, we will need to support multiple RX queues,
 and this solution will not work in that situation.
 
Not sure I understand this.  My patch:
http://marc.theaimsgroup.com/?l=linux-netdevm=114970506406155w=2
still allows for the use of multiple rx
queues in the nominal case.  Only when we have to use a relatively slow netpoll
driven operation (kgdb, netdump, etc), do we need to receive on the same
interface that we transmit on.

 A simpler short-term solution is just to schedule our NAPI polling on
 the real netdev instead of our polling netdev.  This is a trivial
 change and works correctly with a single queue.  But, like your patch,
 it isn't future-proof.
 
Again, not sure what you mean here.  My last patch proposal:
http://marc.theaimsgroup.com/?l=linux-netdevm=114970807606096w=2
Does precisely what you describe, but it still allows for multiple rx queues in
the nominal (non poll_controller driven) receive case.

 So, I'm still thinking and pondering on this one.
 
 If we get a patch in to fix the recursive loop in netpoll, my original
 patch will work, right?  Or is there still another issue?
 
I assume you are referring to this patch:
http://marc.theaimsgroup.com/?l=linux-netdevm=114970506406155w=2
If so, then no, that patch is still broken for the reasons Jeff outlined
previously, The recursion patch that I proposed earlier today is related to a
different recursion problem, and while the e1000 driver might be suceptable to
it, your patch is also suceptible to the data corruption that arises from when
the poll_controller calls adapter-clean_rx at the same that that dev-poll is
called for the same adapter on another cpu.  If that happens we can have two
cpu's writing to the same private net_device data without the benefit of a
spinlock to protect them.  And yes, you can add a spin lock to protect the case
where you have a dev-poll_controller and a dev-poll operation at the same
time, but that seems to me like it will also re-serialize all the parallel
operations that you could otherwise do with multiple rx queues

I'll post again if I get a chance to do some performance measurements
Regrds
Neil

 -Mitch
 -
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
/***
 *Neil Horman
 *Software Engineer
 *gpg keyid: 1024D / 0x92A74FA1 - http://pgp.mit.edu
 ***/
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] e1000: fix netpoll with NAPI

2006-06-11 Thread Neil Horman
On Thu, Jun 08, 2006 at 01:29:00PM -0400, Jeff Moyer wrote:
 == Regarding Re: [PATCH 1/2] e1000: fix netpoll with NAPI; Mitch Williams 
 [EMAIL PROTECTED] adds:
 
 mitch.a.williams On Wed, 2006-06-07 at 11:44 -0700, Jeff Moyer wrote:
  That patch locks around the tx clean routine.  As such, it doesn't
  prevent the problem.
 
 mitch.a.williams The call to netif_rx_schedule_prep provides locking
 mitch.a.williams because it sets the __LINK_STATE_RX_SCHED bit atomically.
 mitch.a.williams The spinlock around e1000_clean_tx_irq is to protect it
 mitch.a.williams from other calls to the transmit routine, not NAPI.
 
 Yes, but what prevents recursion in the poll routine?  Consider that the
 poll routine could end up triggerring a printk (think iptables, here).  In
 that case, you end up calling into netpoll, and if the tx ring is full, we
 call the poll_controller routine.  We've now recursed.
 
 The poll lock was originally introduced to prevent recursion, not
 concurrent access.
 
Any further thoughts on this guys?  I still think my last solution solves all of
the netpoll problems, and isn't going to have any noticable impact on
performance.

Regards
Neil

 -Jeff
 -
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] e1000: fix netpoll with NAPI

2006-06-08 Thread Mitch Williams
On Wed, 2006-06-07 at 11:44 -0700, Jeff Moyer wrote:

 That patch locks around the tx clean routine.  As such, it doesn't
 prevent
 the problem.

The call to netif_rx_schedule_prep provides locking because it sets the
__LINK_STATE_RX_SCHED bit atomically.  The spinlock around
e1000_clean_tx_irq is to protect it from other calls to the transmit
routine, not NAPI.

-Mitch

  + disable_irq(adapter-pdev-irq);
  + if
 (likely(netif_rx_schedule_prep(adapter-polling_netdev[0]))) {
  + if (spin_trylock(adapter-tx_queue_lock)) {
  + e1000_clean_tx_irq(adapter,
 adapter-tx_ring[0]);
  + spin_unlock(adapter-tx_queue_lock);
  + }
  + adapter-clean_rx(adapter, adapter-rx_ring,
  + budget, netdev-weight);
  + clear_bit(__LINK_STATE_RX_SCHED,
  + adapter-polling_netdev[0].state);
 
 -Jeff
 
 
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] e1000: fix netpoll with NAPI

2006-06-08 Thread Jeff Moyer
== Regarding Re: [PATCH 1/2] e1000: fix netpoll with NAPI; Mitch Williams 
[EMAIL PROTECTED] adds:

mitch.a.williams On Wed, 2006-06-07 at 11:44 -0700, Jeff Moyer wrote:
 That patch locks around the tx clean routine.  As such, it doesn't
 prevent the problem.

mitch.a.williams The call to netif_rx_schedule_prep provides locking
mitch.a.williams because it sets the __LINK_STATE_RX_SCHED bit atomically.
mitch.a.williams The spinlock around e1000_clean_tx_irq is to protect it
mitch.a.williams from other calls to the transmit routine, not NAPI.

Yes, but what prevents recursion in the poll routine?  Consider that the
poll routine could end up triggerring a printk (think iptables, here).  In
that case, you end up calling into netpoll, and if the tx ring is full, we
call the poll_controller routine.  We've now recursed.

The poll lock was originally introduced to prevent recursion, not
concurrent access.

-Jeff
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] e1000: fix netpoll with NAPI

2006-06-08 Thread Mitch Williams
On Wed, 2006-06-07 at 11:54 -0700, John W. Linville wrote:

 Pedantic objection, but I think this would read easier w/o the extra
 newline before disable_irq.

Heh.  I prefer to have a newline between declarations and code.  The
real problem is the position of the #ifdef -- that's what makes it
difficult to read.  The other solution would be
{
struct e1000_adapter *adapter = netdev_priv(netdev);
#ifdef CONFIG_E1000_NAPI
int budget = 0;
#endif

disable_irq(adapter-pdev-irq);

#ifdef CONFIG_E1000_NAPI
 all that stuff 
#else
rest of the stuff 
#endif
}

Which I think is worse to read.
-Mitch
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] e1000: fix netpoll with NAPI

2006-06-08 Thread John W. Linville
On Thu, Jun 08, 2006 at 10:23:56AM -0700, Mitch Williams wrote:
 On Wed, 2006-06-07 at 11:54 -0700, John W. Linville wrote:
 
  Pedantic objection, but I think this would read easier w/o the extra
  newline before disable_irq.
 
 Heh.  I prefer to have a newline between declarations and code.  The

Normally I would agree.  But in this case, I find the distraction of
the random newline after the #else to be more compelling.

 real problem is the position of the #ifdef -- that's what makes it
 difficult to read.  The other solution would be
 {
 struct e1000_adapter *adapter = netdev_priv(netdev);
 #ifdef CONFIG_E1000_NAPI
   int budget = 0;
 #endif
 
   disable_irq(adapter-pdev-irq);
 
 #ifdef CONFIG_E1000_NAPI
all that stuff 
 #else
   rest of the stuff 
 #endif
 }
 
 Which I think is worse to read.

I presume it is the double #ifdef that you find objectionable?
I don't really like it, but at least that idiom is quite common.

Given that the disable_irq appears in both code paths (almost by
necessity), there is a certain appeal to having it outside of the
#ifdef block.  That seems more maintainable.

To me, the idiomatic #ifdef placement seems more readable, if for no
other reason than familiarity.  I suppose we can agree to disagree.

John
-- 
John W. Linville
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] e1000: fix netpoll with NAPI

2006-06-07 Thread Neil Horman
  
  Matt, any ideas on this?
 
 Not at the moment.

how about this for a solution?  It doesn't make netpoll any more robust, but I
think in the interests of efficiency it would be fair to require that, when
netpolled, a driver must receive frames on the same net device for which it was
polled.  With this patch we detect that condition and handle it accordingly in
e1000_intr.  This eliminates the need for us to call the clean_rx method from
the poll_controller when napi is configured, instead allowing the poll method to
be called from napi_poll, as the netpoll model currently does.  This fixes the
netdump regression, and eliminates the layering violation and the potential race
that we've been discussing.  I've just tested it with netdump here and it works
quite well.

Thoughts appreciated.

Thanks  Regards
Neil

Signed-off-by: Neil Horman [EMAIL PROTECTED]


 e1000_main.c |   54 +-
 1 files changed, 45 insertions(+), 9 deletions(-)


--- linux-2.6.9/drivers/net/e1000/e1000_main.c.neil 2006-06-06 
10:37:42.0 -0400
+++ linux-2.6.9/drivers/net/e1000/e1000_main.c  2006-06-07 10:48:22.0 
-0400
@@ -3207,8 +3207,9 @@
  * @pt_regs: CPU registers structure
  **/
 
+
 static irqreturn_t
-e1000_intr(int irq, void *data, struct pt_regs *regs)
+__e1000_intr(int irq, void *data, struct pt_regs *regs, int netpoll_op)
 {
struct net_device *netdev = data;
struct e1000_adapter *adapter = netdev_priv(netdev);
@@ -3217,6 +3218,7 @@
 #ifndef CONFIG_E1000_NAPI
int i;
 #else
+   struct net_device *dev_to_sched;
/* Interrupt Auto-Mask...upon reading ICR,
 * interrupts are masked.  No need for the
 * IMC write, but it does mean we should
@@ -3255,8 +3257,22 @@
E1000_WRITE_REG(hw, IMC, ~0);
E1000_WRITE_FLUSH(hw);
}
-   if (likely(netif_rx_schedule_prep(adapter-polling_netdev[0])))
-   __netif_rx_schedule(adapter-polling_netdev[0]);
+
+   /*
+* netpoll operations, in the interests of efficiency 
+* only do napi polling on the device passed to the 
+* poll_controller.  Therefore, if we are preforming 
+* a netpoll operation, then we can't schedule a receive
+* to one of the dummy net devices that exist for sole
+* purpose of spreading out rx schedules
+*/ 
+   if (netpoll_op)
+   dev_to_sched = netdev;
+   else
+   dev_to_sched = adapter-polling_netdev[0];
+
+   if (likely(netif_rx_schedule_prep(dev_to_sched)))
+   __netif_rx_schedule(dev_to_sched);
else
e1000_irq_enable(adapter);
 #else
@@ -3288,6 +3304,13 @@
return IRQ_HANDLED;
 }
 
+static irqreturn_t
+e1000_intr(int irq, void *data, struct pt_regs *regs)
+{
+   return __e1000_intr(irq, data, regs, 0);
+}
+
+
 #ifdef CONFIG_E1000_NAPI
 /**
  * e1000_clean - NAPI Rx polling callback
@@ -3300,7 +3323,6 @@
struct e1000_adapter *adapter;
int work_to_do = min(*budget, poll_dev-quota);
int tx_cleaned = 0, i = 0, work_done = 0;
-
/* Must NOT use netdev_priv macro here. */
adapter = poll_dev-priv;
 
@@ -3308,10 +3330,24 @@
if (!netif_carrier_ok(adapter-netdev))
goto quit_polling;
 
-   while (poll_dev != adapter-polling_netdev[i]) {
-   i++;
-   if (unlikely(i == adapter-num_rx_queues))
-   BUG();
+   /*
+* only search for a matching polling_netdev in the event
+* that this isn't a real registered net_device
+* A real net device can be passed in here in the event
+* that netdump has been activated (this comes through 
+* netpoll_poll_dev).  We detect this by virtue of the 
+* fact that each polling_netdev-priv points to the private
+* data of its parent (registered) netdev.  So if:
+* poll_dev-priv == netdev_priv(poll_dev), its a real device
+* otherwise its a polling_netdev.
+*/
+   if (adapter != netdev_priv(poll_dev)) {
+   while (poll_dev != adapter-polling_netdev[i]) {
+   i++;
+   if (unlikely(i == adapter-num_rx_queues))
+   BUG();
+   }
+
}
 
if (likely(adapter-num_tx_queues == 1)) {
@@ -4624,7 +4660,7 @@
 {
struct e1000_adapter *adapter = netdev_priv(netdev);
disable_irq(adapter-pdev-irq);
-   e1000_intr(adapter-pdev-irq, netdev, NULL);
+   __e1000_intr(adapter-pdev-irq, netdev, NULL, 1);
e1000_clean_tx_irq(adapter, adapter-tx_ring);
 #ifndef CONFIG_E1000_NAPI
adapter-clean_rx(adapter, adapter-rx_ring);
-- 
/***
 *Neil Horman
 *Software Engineer
 *gpg keyid: 1024D / 0x92A74FA1 - http://pgp.mit.edu
 ***/
-
To unsubscribe 

Re: [PATCH 1/2] e1000: fix netpoll with NAPI

2006-06-07 Thread Matt Mackall
On Wed, Jun 07, 2006 at 11:05:22AM -0400, Neil Horman wrote:
   
   Matt, any ideas on this?
  
  Not at the moment.
 
 how about this for a solution?  It doesn't make netpoll any more robust, but I
 think in the interests of efficiency it would be fair to require that, when
 netpolled, a driver must receive frames on the same net device for which it 
 was
 polled.  With this patch we detect that condition and handle it accordingly in
 e1000_intr.  This eliminates the need for us to call the clean_rx method from
 the poll_controller when napi is configured, instead allowing the poll method 
 to
 be called from napi_poll, as the netpoll model currently does.  This fixes the
 netdump regression, and eliminates the layering violation and the potential 
 race
 that we've been discussing.  I've just tested it with netdump here and it 
 works
 quite well.
 
 Thoughts appreciated.

This looks pretty reasonable, mostly from the perspective that it
doesn't put any further ugliness in netpoll. We might want to add a
comment somewhere in netpoll of the new rule we're now observing.
I'll let the e1000 guys comment on the particulars of the driver change.

 Signed-off-by: Neil Horman [EMAIL PROTECTED]
Signed-off-by: Matt Mackall [EMAIL PROTECTED]

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] e1000: fix netpoll with NAPI

2006-06-07 Thread Auke Kok

Matt Mackall wrote:

On Wed, Jun 07, 2006 at 11:05:22AM -0400, Neil Horman wrote:
 

Matt, any ideas on this?

Not at the moment.

how about this for a solution?  It doesn't make netpoll any more robust, but I
think in the interests of efficiency it would be fair to require that, when
netpolled, a driver must receive frames on the same net device for which it was
polled.  With this patch we detect that condition and handle it accordingly in
e1000_intr.  This eliminates the need for us to call the clean_rx method from
the poll_controller when napi is configured, instead allowing the poll method to
be called from napi_poll, as the netpoll model currently does.  This fixes the
netdump regression, and eliminates the layering violation and the potential race
that we've been discussing.  I've just tested it with netdump here and it works
quite well.

Thoughts appreciated.


This looks pretty reasonable, mostly from the perspective that it
doesn't put any further ugliness in netpoll. We might want to add a
comment somewhere in netpoll of the new rule we're now observing.
I'll let the e1000 guys comment on the particulars of the driver change.


Hi,

we're not too happy with this as it puts a branch right in the regular receive 
path. We haven't ran the numbers on it yet but it is likely that this will 
lower performance significantly during normal receives for something that is 
not common use.


Attached below a (revised) patch that adds proper locking around the rx_clean 
to prevent the race.


Cheers,

Auke

---

e1000: fix netpoll with NAPI

This fixes netpoll when using NAPI, and protects against a rare race condition
in the netpoll routine, while staying out of our ways from the normal hotpath.

Signed-off-by: Mitch Williams [EMAIL PROTECTED]
Signed-off-by: Auke Kok [EMAIL PROTECTED]

---
 e1000_main.c |   17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index bd709e5..876251c 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -4584,10 +4584,25 @@ static void
 e1000_netpoll(struct net_device *netdev)
 {
 	struct e1000_adapter *adapter = netdev_priv(netdev);
+#ifdef CONFIG_E1000_NAPI
+	int budget = 0;
+
+	disable_irq(adapter-pdev-irq);
+	if (likely(netif_rx_schedule_prep(adapter-polling_netdev[0]))) {
+		if (spin_trylock(adapter-tx_queue_lock)) {
+			e1000_clean_tx_irq(adapter, adapter-tx_ring[0]);
+			spin_unlock(adapter-tx_queue_lock);
+		}
+		adapter-clean_rx(adapter, adapter-rx_ring,
+budget, netdev-weight);
+		clear_bit(__LINK_STATE_RX_SCHED,
+adapter-polling_netdev[0].state);
+	}
+#else
+
 	disable_irq(adapter-pdev-irq);
 	e1000_intr(adapter-pdev-irq, netdev, NULL);
 	e1000_clean_tx_irq(adapter, adapter-tx_ring);
-#ifndef CONFIG_E1000_NAPI
 	adapter-clean_rx(adapter, adapter-rx_ring);
 #endif
 	enable_irq(adapter-pdev-irq);


Re: [PATCH 1/2] e1000: fix netpoll with NAPI

2006-06-07 Thread Jeff Moyer
== Regarding Re: [PATCH 1/2] e1000: fix netpoll with NAPI; Auke Kok [EMAIL 
PROTECTED] adds:

auke-jan.h.kok Hi,

auke-jan.h.kok we're not too happy with this as it puts a branch right in
auke-jan.h.kok the regular receive path. We haven't ran the numbers on it
auke-jan.h.kok yet but it is likely that this will lower performance
auke-jan.h.kok significantly during normal receives for something that is
auke-jan.h.kok not common use.

auke-jan.h.kok Attached below a (revised) patch that adds proper locking
auke-jan.h.kok around the rx_clean to prevent the race.

That patch locks around the tx clean routine.  As such, it doesn't prevent
the problem.

 + disable_irq(adapter-pdev-irq);
 + if (likely(netif_rx_schedule_prep(adapter-polling_netdev[0]))) {
 + if (spin_trylock(adapter-tx_queue_lock)) {
 + e1000_clean_tx_irq(adapter, adapter-tx_ring[0]);
 + spin_unlock(adapter-tx_queue_lock);
 + }
 + adapter-clean_rx(adapter, adapter-rx_ring,
 + budget, netdev-weight);
 + clear_bit(__LINK_STATE_RX_SCHED,
 + adapter-polling_netdev[0].state);

-Jeff
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] e1000: fix netpoll with NAPI

2006-06-07 Thread John W. Linville
On Wed, Jun 07, 2006 at 11:25:29AM -0700, Auke Kok wrote:

 @@ -4584,10 +4584,25 @@ static void
  e1000_netpoll(struct net_device *netdev)
  {
 struct e1000_adapter *adapter = netdev_priv(netdev);
 +#ifdef CONFIG_E1000_NAPI
 +   int budget = 0;
 +
 +   disable_irq(adapter-pdev-irq);
 +   if (likely(netif_rx_schedule_prep(adapter-polling_netdev[0]))) {
 +   if (spin_trylock(adapter-tx_queue_lock)) {
 +   e1000_clean_tx_irq(adapter, adapter-tx_ring[0]);
 +   spin_unlock(adapter-tx_queue_lock);
 +   }
 +   adapter-clean_rx(adapter, adapter-rx_ring,
 +   budget, netdev-weight);
 +   clear_bit(__LINK_STATE_RX_SCHED,
 +   adapter-polling_netdev[0].state);
 +   }
 +#else
 +
 disable_irq(adapter-pdev-irq);

Pedantic objection, but I think this would read easier w/o the extra
newline before disable_irq.

John
-- 
John W. Linville
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] e1000: fix netpoll with NAPI

2006-06-07 Thread Neil Horman
On Wed, Jun 07, 2006 at 02:44:54PM -0400, Jeff Moyer wrote:
 == Regarding Re: [PATCH 1/2] e1000: fix netpoll with NAPI; Auke Kok [EMAIL 
 PROTECTED] adds:
 
 auke-jan.h.kok Hi,
 
 auke-jan.h.kok we're not too happy with this as it puts a branch right in
 auke-jan.h.kok the regular receive path. We haven't ran the numbers on it
 auke-jan.h.kok yet but it is likely that this will lower performance
 auke-jan.h.kok significantly during normal receives for something that is
 auke-jan.h.kok not common use.
 
 auke-jan.h.kok Attached below a (revised) patch that adds proper locking
 auke-jan.h.kok around the rx_clean to prevent the race.
 
 That patch locks around the tx clean routine.  As such, it doesn't prevent
 the problem.
 
Further to that, do tests on this if you like, but I would certainly think a
properly formed conditional operation is going to provide better performance
than a spin_lock operation in the receive path.  Why not just turn the:
if(netpoll_op)
into an
if(unlikely(netpoll_op))

I expect that will reduce the overhead of the conditional to effectively zero
for the normal receive case.  The following patch does that, and I expect you
performance won't suffer at all:

Signed-off-by: Neil Horman [EMAIL PROTECTED]



 e1000_main.c |   54 +-
 1 files changed, 45 insertions(+), 9 deletions(-)



--- linux-2.6.9/drivers/net/e1000/e1000_main.c.neil 2006-06-06 
10:37:42.0 -0400
+++ linux-2.6.9/drivers/net/e1000/e1000_main.c  2006-06-07 10:48:22.0 
-0400
@@ -3207,8 +3207,9 @@ e1000_update_stats(struct e1000_adapter 
  * @pt_regs: CPU registers structure
  **/
 
+
 static irqreturn_t
-e1000_intr(int irq, void *data, struct pt_regs *regs)
+__e1000_intr(int irq, void *data, struct pt_regs *regs, int netpoll_op)
 {
struct net_device *netdev = data;
struct e1000_adapter *adapter = netdev_priv(netdev);
@@ -3217,6 +3218,7 @@ e1000_intr(int irq, void *data, struct p
 #ifndef CONFIG_E1000_NAPI
int i;
 #else
+   struct net_device *dev_to_sched;
/* Interrupt Auto-Mask...upon reading ICR,
 * interrupts are masked.  No need for the
 * IMC write, but it does mean we should
@@ -3255,8 +3257,22 @@ e1000_intr(int irq, void *data, struct p
E1000_WRITE_REG(hw, IMC, ~0);
E1000_WRITE_FLUSH(hw);
}
-   if (likely(netif_rx_schedule_prep(adapter-polling_netdev[0])))
-   __netif_rx_schedule(adapter-polling_netdev[0]);
+
+   /*
+* netpoll operations, in the interests of efficiency 
+* only do napi polling on the device passed to the 
+* poll_controller.  Therefore, if we are preforming 
+* a netpoll operation, then we can't schedule a receive
+* to one of the dummy net devices that exist for sole
+* purpose of spreading out rx schedules
+*/ 
+   if (unlikely(netpoll_op))
+   dev_to_sched = netdev;
+   else
+   dev_to_sched = adapter-polling_netdev[0];
+
+   if (likely(netif_rx_schedule_prep(dev_to_sched)))
+   __netif_rx_schedule(dev_to_sched);
else
e1000_irq_enable(adapter);
 #else
@@ -3288,6 +3304,13 @@ e1000_intr(int irq, void *data, struct p
return IRQ_HANDLED;
 }
 
+static irqreturn_t
+e1000_intr(int irq, void *data, struct pt_regs *regs)
+{
+   return __e1000_intr(irq, data, regs, 0);
+}
+
+
 #ifdef CONFIG_E1000_NAPI
 /**
  * e1000_clean - NAPI Rx polling callback
@@ -3300,7 +3323,6 @@ e1000_clean(struct net_device *poll_dev,
struct e1000_adapter *adapter;
int work_to_do = min(*budget, poll_dev-quota);
int tx_cleaned = 0, i = 0, work_done = 0;
-
/* Must NOT use netdev_priv macro here. */
adapter = poll_dev-priv;
 
@@ -3308,10 +3330,24 @@ e1000_clean(struct net_device *poll_dev,
if (!netif_carrier_ok(adapter-netdev))
goto quit_polling;
 
-   while (poll_dev != adapter-polling_netdev[i]) {
-   i++;
-   if (unlikely(i == adapter-num_rx_queues))
-   BUG();
+   /*
+* only search for a matching polling_netdev in the event
+* that this isn't a real registered net_device
+* A real net device can be passed in here in the event
+* that netdump has been activated (this comes through 
+* netpoll_poll_dev).  We detect this by virtue of the 
+* fact that each polling_netdev-priv points to the private
+* data of its parent (registered) netdev.  So if:
+* poll_dev-priv == netdev_priv(poll_dev), its a real device
+* otherwise its a polling_netdev.
+*/
+   if (likely(adapter != netdev_priv(poll_dev))) {
+   while (poll_dev != adapter-polling_netdev[i]) {
+   i++;
+   if (unlikely(i == adapter-num_rx_queues))
+   BUG

Re: [PATCH 1/2] e1000: fix netpoll with NAPI

2006-06-06 Thread Neil Horman
On Mon, Jun 05, 2006 at 04:11:25PM -0700, Kok, Auke wrote:
 
 Netpoll was broken due to the earlier addition of multiqueue.
 
 Signed-off-by: Mitch Williams [EMAIL PROTECTED]
 Signed-off-by: Auke Kok [EMAIL PROTECTED]
 ---
 
  drivers/net/e1000/e1000_main.c |9 -
  1 files changed, 8 insertions(+), 1 deletions(-)
 
 diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
 index ed15fca..7103a0e 100644
 --- a/drivers/net/e1000/e1000_main.c
 +++ b/drivers/net/e1000/e1000_main.c
 @@ -4629,10 +4629,17 @@ static void
  e1000_netpoll(struct net_device *netdev)
  {
   struct e1000_adapter *adapter = netdev_priv(netdev);
 +#ifdef CONFIG_E1000_NAPI
 + int budget = 0;
 +
 + disable_irq(adapter-pdev-irq);
 + e1000_clean_tx_irq(adapter, adapter-tx_ring);
 + adapter-clean_rx(adapter, adapter-rx_ring, budget, netdev-weight);
 +#else
 +
I've been speaking about this fix with a Jeff Moyer, and we've come up with some
concerns regarding its implementation.  Specifically the call to
adapter-clean_rx in the case of the e1000 driver is rather a layering
violation in the netpoll code, in the sense that the function pointed to by 
clean_rx
is functionality that is nominally used by the dev-poll method.  In fact in
this case, it would appear possible since dev-poll is called under the
poll_lock, but dev-poll_controller is not, that is is possible to have cpus in
a system executing in e1000_clean_rx_irq_[ps] at the same time leading to data
corruption:

CPU0:
netpoll_poll_dev
 dev-poll_controller (e1000_netpoll)
  adapter-clean_rx (e1000_clean_rx_irq)

CPU1:
napi_poll
 dev-poll (e1000_clean)
  e1000_clean_rx_irq

I'm not sure what the right fix is here.  A spinlock in e1000_clean_rx_irq[_ps]
would seem to be an easy and direct fix, but I don't know that thats the best
solution.

Something I don't understand is why the call to adapter-clean_rx is
required in the first place when the napi_poll routine calls it itself directly.
All other drivers schedule a napi poll and receive frames via that path.  My
guess is that it has to do with the fact that we schedule polls on the device in
the polling_netdev array, rather than the actual registered netdev. Looking at
the driver code I note that while an entire array is allocated for polling
netdevs, we only ever use entry 0.  Would it make sense to just remove the the
polling_netdev array and use the registered device like all the other drivers.
I expect that would eliminate the need for this patch as well.

Regards
Neil

   disable_irq(adapter-pdev-irq);
   e1000_intr(adapter-pdev-irq, netdev, NULL);
   e1000_clean_tx_irq(adapter, adapter-tx_ring);
 -#ifndef CONFIG_E1000_NAPI
   adapter-clean_rx(adapter, adapter-rx_ring);
  #endif
   enable_irq(adapter-pdev-irq);
 
 
 
 --
 Auke Kok [EMAIL PROTECTED]
 -
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
/***
 *Neil Horman
 *Software Engineer
 *gpg keyid: 1024D / 0x92A74FA1 - http://pgp.mit.edu
 ***/
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] e1000: fix netpoll with NAPI

2006-06-06 Thread Mitch Williams
On Tue, 2006-06-06 at 09:52 -0400, Neil Horman wrote:
 I've been speaking about this fix with a Jeff Moyer, and we've come up with 
 some
 concerns regarding its implementation.  Specifically the call to
 adapter-clean_rx in the case of the e1000 driver is rather a layering
 violation in the netpoll code, in the sense that the function pointed to by 
 clean_rx
 is functionality that is nominally used by the dev-poll method.  In fact in
 this case, it would appear possible since dev-poll is called under the
 poll_lock, but dev-poll_controller is not, that is is possible to have cpus 
 in
 a system executing in e1000_clean_rx_irq_[ps] at the same time leading to data
 corruption:
 
 CPU0:
 netpoll_poll_dev
  dev-poll_controller (e1000_netpoll)
   adapter-clean_rx (e1000_clean_rx_irq)
 
 CPU1:
 napi_poll
  dev-poll (e1000_clean)
   e1000_clean_rx_irq

Hmmm. You may have a point.  I don't think a spinlock is required, but
we do need to check if the poll is already scheduled on another CPU,
like netpoll does in poll_napi().

In practice, of course, we never see this.  The only netpoll client in
the kernel is netconsole, which doesn't do receives.  A few Major
Distros use netdump, which does do receives, but only after the system
has crashed.  In that case, all other CPUs are stopped anyway.

However, just for the sake of correctness (and paranoia), I'll whip up
another patch that does this check.

Jeff, please do not commit this patch.

-Mitch

NB:  The root of this problem is that e1000 uses a dummy netdev to
schedule NAPI polling.  Since netpoll doesn't know about the dummy
netdev, it checks the real e1000 netdev struct to see if it's
scheduled for polling.  Since this is never the case, netpoll fails when
e1000 is configured to use NAPI.  Hence, this patch.

Why is the dummy netdev in place?  To support multi-queue RX.  Our PCIe
adapters support this, but the kernel's not _quite_ ready yet.
Hopefully, the VJ net channel stuff will enable this feature.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] e1000: fix netpoll with NAPI

2006-06-06 Thread Neil Horman
On Tue, Jun 06, 2006 at 09:39:25AM -0700, Mitch Williams wrote:
 On Tue, 2006-06-06 at 09:52 -0400, Neil Horman wrote:
  I've been speaking about this fix with a Jeff Moyer, and we've come up with 
  some
  concerns regarding its implementation.  Specifically the call to
  adapter-clean_rx in the case of the e1000 driver is rather a layering
  violation in the netpoll code, in the sense that the function pointed to by 
  clean_rx
  is functionality that is nominally used by the dev-poll method.  In fact in
  this case, it would appear possible since dev-poll is called under the
  poll_lock, but dev-poll_controller is not, that is is possible to have 
  cpus in
  a system executing in e1000_clean_rx_irq_[ps] at the same time leading to 
  data
  corruption:
  
  CPU0:
  netpoll_poll_dev
   dev-poll_controller (e1000_netpoll)
adapter-clean_rx (e1000_clean_rx_irq)
  
  CPU1:
  napi_poll
   dev-poll (e1000_clean)
e1000_clean_rx_irq
 
 Hmmm. You may have a point.  I don't think a spinlock is required, but
 we do need to check if the poll is already scheduled on another CPU,
 like netpoll does in poll_napi().
 
 In practice, of course, we never see this.  The only netpoll client in
 the kernel is netconsole, which doesn't do receives.  A few Major
 Distros use netdump, which does do receives, but only after the system
 has crashed.  In that case, all other CPUs are stopped anyway.
 
You are probably right, this is a rare case, if it ever happens at all, but I
think there is (at least as Jeff explained it to me) a corner case, where
netconsole on transmit, notes an exhaustion of tx descriptors, and in response
calls the poll controller method of the driver to clean up and make some of
those descriptors available:

printk
  release_console_sem
call_console_drivers
  netconsole.c:write_msg
netpoll_send_udp
  netpoll_send_skb

if (netif_queue_stopped(np-dev)) --- out of descriptors ?
netpoll_poll(np); --- trigger a poll to clean up

If this is being done at the same time as a napi_poll on another cpu, we have a
real set of conditions under which a corruption can occur.

 However, just for the sake of correctness (and paranoia), I'll whip up
 another patch that does this check.
 
Thanks for the quick feedback!

Regards
Neil

 Jeff, please do not commit this patch.
 
 -Mitch
 
 NB:  The root of this problem is that e1000 uses a dummy netdev to
 schedule NAPI polling.  Since netpoll doesn't know about the dummy
 netdev, it checks the real e1000 netdev struct to see if it's
 scheduled for polling.  Since this is never the case, netpoll fails when
 e1000 is configured to use NAPI.  Hence, this patch.
 
 Why is the dummy netdev in place?  To support multi-queue RX.  Our PCIe
 adapters support this, but the kernel's not _quite_ ready yet.
 Hopefully, the VJ net channel stuff will enable this feature.
 -
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
/***
 *Neil Horman
 *Software Engineer
 *gpg keyid: 1024D / 0x92A74FA1 - http://pgp.mit.edu
 ***/
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] e1000: fix netpoll with NAPI

2006-06-06 Thread Auke Kok

Neil Horman wrote:

On Tue, Jun 06, 2006 at 09:39:25AM -0700, Mitch Williams wrote:

On Tue, 2006-06-06 at 09:52 -0400, Neil Horman wrote:



[snip]



However, just for the sake of correctness (and paranoia), I'll whip up
another patch that does this check.


Thanks for the quick feedback!

Regards
Neil


Jeff, please do not commit this patch.


Jeff,

I've popped the patch off from our gitserver, so you can pull the two 
outstanding patches while we revamp this one.


Auke

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] e1000: fix netpoll with NAPI

2006-06-06 Thread Jeff Moyer
== Regarding Re: [PATCH 1/2] e1000: fix netpoll with NAPI; Mitch Williams 
[EMAIL PROTECTED] adds:

mitch On Tue, 2006-06-06 at 09:52 -0400, Neil Horman wrote:
 I've been speaking about this fix with a Jeff Moyer, and we've come up with 
 some
 concerns regarding its implementation.  Specifically the call to
adapter- clean_rx in the case of the e1000 driver is rather a layering
 violation in the netpoll code, in the sense that the function pointed to by 
 clean_rx
 is functionality that is nominally used by the dev-poll method.  In fact in
 this case, it would appear possible since dev-poll is called under the
 poll_lock, but dev-poll_controller is not, that is is possible to have cpus 
 in
 a system executing in e1000_clean_rx_irq_[ps] at the same time leading to 
 data
 corruption:
 
 CPU0:
 netpoll_poll_dev
dev- poll_controller (e1000_netpoll)
adapter- clean_rx (e1000_clean_rx_irq)
 
 CPU1:
 napi_poll
dev- poll (e1000_clean)
 e1000_clean_rx_irq

mitch Hmmm. You may have a point.  I don't think a spinlock is required, but
mitch we do need to check if the poll is already scheduled on another CPU,
mitch like netpoll does in poll_napi().

mitch In practice, of course, we never see this.  The only netpoll client in
mitch the kernel is netconsole, which doesn't do receives.  A few Major
mitch Distros use netdump, which does do receives, but only after the system
mitch has crashed.  In that case, all other CPUs are stopped anyway.

Don't forget about driver/net/kgdb_eth.c, which is in-tree, and does sends
and receives.

-Jeff
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] e1000: fix netpoll with NAPI

2006-06-06 Thread Auke Kok

Jeff Moyer wrote:

== Regarding Re: [PATCH 1/2] e1000: fix netpoll with NAPI; Auke Kok [EMAIL 
PROTECTED] adds:

auke-jan.h.kok Neil Horman wrote:

On Tue, Jun 06, 2006 at 09:39:25AM -0700, Mitch Williams wrote:

On Tue, 2006-06-06 at 09:52 -0400, Neil Horman wrote:



auke-jan.h.kok [snip]



However, just for the sake of correctness (and paranoia), I'll whip up
another patch that does this check.


Thanks for the quick feedback!
Regards
Neil


Jeff, please do not commit this patch.


auke-jan.h.kok Jeff,

auke-jan.h.kok I've popped the patch off from our gitserver, so you can pull 
the two
auke-jan.h.kok outstanding patches while we revamp this one.

Would you please send patches to this list?  I'd certainly like to review
them.  I don't think the problem needs solving in the e1000 driver.  I
think it is an issue that should be handled properly by netpoll.


???

that message was directed to Jeff Garzik, perhaps that was too confusing.


They were sent here in the first place:

http://marc.theaimsgroup.com/?l=linux-netdevm=114954878711789w=2


Cheers,

Auke

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] e1000: fix netpoll with NAPI

2006-06-06 Thread Jeff Moyer
== Regarding Re: [PATCH 1/2] e1000: fix netpoll with NAPI; Auke Kok [EMAIL 
PROTECTED] adds:

auke-jan.h.kok Jeff Moyer wrote:
 == Regarding Re: [PATCH 1/2] e1000: fix netpoll with NAPI; Auke Kok [EMAIL 
 PROTECTED] adds:
auke-jan.h.kok Neil Horman wrote:
 On Tue, Jun 06, 2006 at 09:39:25AM -0700, Mitch Williams wrote:
 On Tue, 2006-06-06 at 09:52 -0400, Neil Horman wrote:
auke-jan.h.kok [snip]
 
 However, just for the sake of correctness (and paranoia), I'll whip up
 another patch that does this check.
 
 Thanks for the quick feedback!
 Regards
 Neil
 
 Jeff, please do not commit this patch.
auke-jan.h.kok Jeff,
auke-jan.h.kok I've popped the patch off from our gitserver, so you can
 pull the two
auke-jan.h.kok outstanding patches while we revamp this one.
 Would you please send patches to this list?  I'd certainly like to review
 them.  I don't think the problem needs solving in the e1000 driver.  I
 think it is an issue that should be handled properly by netpoll.

auke-jan.h.kok ???

auke-jan.h.kok that message was directed to Jeff Garzik, perhaps that was too 
confusing.

I figured it was directed at Jeff G.

auke-jan.h.kok They were sent here in the first place:

auke-jan.h.kok 
http://marc.theaimsgroup.com/?l=linux-netdevm=114954878711789w=2

Thanks for the pointer.

As you noted, e1000 now uses a separate device for polling.  Netpoll should
be able to deal with this, though.  I'd like to solicit mpm's input on
this, as I'm having difficulties coming up with a clean solution.

For some background, the netpoll_poll_lock calls were introduced to prevent
recursion in a device driver's -poll routine.  By essentially calling the
poll routine from the poll_controller routine, you are no longer prevented
from such recursion.

It would be best if the poll_controller routine was kept simple.  It should
do the equivalent of the interrupt processing portion of the work, and
leave the delivery to the network stack for a call to the -poll routine.

Solving this at the netpoll layer will be a bit difficult, since we have no
insight into the driver design (as your driver illustrates).  Up until now,
our model worked well.

We could, potentially, walk the list of devices scheduled for a poll much
the same as net_rx_action does.  However, we don't want to process work
pending on other network adapters, only the one associated with the netpoll
net device.  I can think of at least one way to make this distinction, but
it feels too much like a hack.

Matt, any ideas on this?

-Jeff
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html