Re: [RFC] net: napi fix

2007-12-20 Thread David Miller
From: Robert Olsson <[EMAIL PROTECTED]>
Date: Thu, 20 Dec 2007 10:52:17 +0100

>  Yes but the reason was not to wait for all pending polls to
>  complete so a server/router could be rebooted even under high-
>  load and DOS. We've experienced some nasty problems with this.

I know, see the rest of the thread where I agree that
we need to deal with this somehow.

The device is marked down first, and somehow we need to
tip off of that to break out of the NAPI loop.  This
"how" is what hasn't been resolved yet.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-20 Thread Robert Olsson

David Miller writes:

 > > Is the netif_running() check even required?
 > 
 > No, it is not.
 > 
 > When a device is brought down, one of the first things
 > that happens is that we wait for all pending NAPI polls
 > to complete, then block any new polls from starting.

 Hello!

 Yes but the reason was not to wait for all pending polls to
 complete so a server/router could be rebooted even under high-
 load and DOS. We've experienced some nasty problems with this.

 Cheers.
--ro
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-20 Thread Robert Olsson

David Miller writes:

   Is the netif_running() check even required?
  
  No, it is not.
  
  When a device is brought down, one of the first things
  that happens is that we wait for all pending NAPI polls
  to complete, then block any new polls from starting.

 Hello!

 Yes but the reason was not to wait for all pending polls to
 complete so a server/router could be rebooted even under high-
 load and DOS. We've experienced some nasty problems with this.

 Cheers.
--ro
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-20 Thread David Miller
From: Robert Olsson [EMAIL PROTECTED]
Date: Thu, 20 Dec 2007 10:52:17 +0100

  Yes but the reason was not to wait for all pending polls to
  complete so a server/router could be rebooted even under high-
  load and DOS. We've experienced some nasty problems with this.

I know, see the rest of the thread where I agree that
we need to deal with this somehow.

The device is marked down first, and somehow we need to
tip off of that to break out of the NAPI loop.  This
how is what hasn't been resolved yet.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread Joonwoo Park
2007/12/13, Andrew Gallatin <[EMAIL PROTECTED]>:
>
> If the netif_running() check is indeed required to make a device break
> out of napi polling and respond to an ifconfig down, then I think the
> netif_running() check should be moved up into net_rx_action() to avoid
> potential for driver complexity and bugs like the ones you found.
>
> Drew
>

Yep, It looks good.

Joonwoo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread Jarek Poplawski
David Miller wrote, On 12/13/2007 11:34 PM:

> From: Jarek Poplawski <[EMAIL PROTECTED]>
> Date: Thu, 13 Dec 2007 23:28:41 +0100
> 
>> ...I'm afraid I can't understand: I mean doing the same but without
>> passing this info with 'work == weight': if driver sends this info,
>> why it can't instead call something like napi_continue() with
>> this list_move_tail() (and probably additional local_irq_disable()/
>> enble() - but since it's unlikely()?) which looks much more readable,
>> and saves one whole unlikely if ()?
> 
> Because the poll list is private to net_rx_action() and we don't
> want to expose implementation details like that to every
> ->poll() implementation.

So, it seems 'we' failed e.g. exposing napi_complete()...
OK, no offense, I'll only mention at the end that there is
always a possibility to redefine such a function to {} with any
change of implementation.

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread David Miller
From: Jarek Poplawski <[EMAIL PROTECTED]>
Date: Thu, 13 Dec 2007 23:28:41 +0100

> ...I'm afraid I can't understand: I mean doing the same but without
> passing this info with 'work == weight': if driver sends this info,
> why it can't instead call something like napi_continue() with
> this list_move_tail() (and probably additional local_irq_disable()/
> enble() - but since it's unlikely()?) which looks much more readable,
> and saves one whole unlikely if ()?

Because the poll list is private to net_rx_action() and we don't
want to expose implementation details like that to every
->poll() implementation.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread Jarek Poplawski
David Miller wrote, On 12/13/2007 09:37 PM:
...

> For example, if we export the list handling widget into the ->poll()
> routines, god help the person who wants to change how the poll list is
> managed in net_rx_action() :-/

...I'm afraid I can't understand: I mean doing the same but without
passing this info with 'work == weight': if driver sends this info,
why it can't instead call something like napi_continue() with
this list_move_tail() (and probably additional local_irq_disable()/
enble() - but since it's unlikely()?) which looks much more readable,
and saves one whole unlikely if ()?

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread Jarek Poplawski
Stephen Hemminger wrote, On 12/13/2007 09:41 PM:

> David Miller wrote:
>> From: Jarek Poplawski <[EMAIL PROTECTED]>
>> Date: Thu, 13 Dec 2007 21:16:12 +0100
>>
>>   
>>> I see in a nearby thread you would prefer to save some work to drivers
>>> (like this netif_running() check), but I think this all is at the cost
>>> of flexibility, and there will probably appear new problems, when a
>>> driver simply can't wait till the next poll (which btw. looks strange
>>> with all these hotplugging, usb and powersaving).
>>> 
>> As someone who has actually had to edit the NAPI support of _EVERY_
>> single driver in the tree I can tell you that code duplication and
>> subtle semantic differences are a huge issue.
>>
>> And when you talk about driver flexibility, it's wise to mention that
>> this comes at the expense of flexibility in the core implmentation.
>> For example, if we export the list handling widget into the ->poll()
>> routines, god help the person who wants to change how the poll list is
>> managed in net_rx_action() :-/
>>
>> So we don't want to export datastructure details like that to the
>> driver.
   

(I hope you both don't mind I save some 'paper' and do this
2 in 1...)

So, you've seen a few drivers, know this much better than me, and
maybe even thought why they all so unnecessarily different... Of
course, if you think that despite those differences they all can
work with simpler napi api then OK (until they don't have to do
any cheating, like with this 'work' here).

> Also, most of the drivers should/could be doing the same thing. It is 
> seems that
> driver writers just want to get "creative" and do things differently. 
> The code is
> cleaner, safer, and less buggy if every device uses the interface in the 
> same way.
> 
> When I did the initial pass on this, I didn't see a single variation on 
> NAPI usage
> that was better than the simple "get N packets and return" variation.  
> But Dave
> did way more detailed grunt work on this.

It seems there are some differences in thinking what is simple/complex.
I think drivers' developers are used to controlling their devices, so
they know better when to turn on/off interrupts. So, maybe similar model
could be appropriate here. Sometimes doing more looks simpler than doing
less and remembering how and when the rest will be done (like
this netif_running() test). But I hope I'm wrong here, and this will
work after all!

Cheers,
Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread Stephen Hemminger

David Miller wrote:

From: Jarek Poplawski <[EMAIL PROTECTED]>
Date: Thu, 13 Dec 2007 21:16:12 +0100

  

I see in a nearby thread you would prefer to save some work to drivers
(like this netif_running() check), but I think this all is at the cost
of flexibility, and there will probably appear new problems, when a
driver simply can't wait till the next poll (which btw. looks strange
with all these hotplugging, usb and powersaving).



As someone who has actually had to edit the NAPI support of _EVERY_
single driver in the tree I can tell you that code duplication and
subtle semantic differences are a huge issue.

And when you talk about driver flexibility, it's wise to mention that
this comes at the expense of flexibility in the core implmentation.
For example, if we export the list handling widget into the ->poll()
routines, god help the person who wants to change how the poll list is
managed in net_rx_action() :-/

So we don't want to export datastructure details like that to the
driver.
  
Also, most of the drivers should/could be doing the same thing. It is 
seems that
driver writers just want to get "creative" and do things differently. 
The code is
cleaner, safer, and less buggy if every device uses the interface in the 
same way.


When I did the initial pass on this, I didn't see a single variation on 
NAPI usage
that was better than the simple "get N packets and return" variation.  
But Dave

did way more detailed grunt work on this.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread David Miller
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Thu, 13 Dec 2007 11:35:07 -0800

> How about allowing a return value of -1 from napi_poll and letting
> device check itself.

It doesn't avoid the code duplication in the ->poll() fast paths.

I don't care, on the other hand, if crap accumulates in non-critical
slow paths like napi_disable() and dev_close().  That's why I'm
suggesting solutions in that area.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread David Miller
From: Jarek Poplawski <[EMAIL PROTECTED]>
Date: Thu, 13 Dec 2007 21:16:12 +0100

> I see in a nearby thread you would prefer to save some work to drivers
> (like this netif_running() check), but I think this all is at the cost
> of flexibility, and there will probably appear new problems, when a
> driver simply can't wait till the next poll (which btw. looks strange
> with all these hotplugging, usb and powersaving).

As someone who has actually had to edit the NAPI support of _EVERY_
single driver in the tree I can tell you that code duplication and
subtle semantic differences are a huge issue.

And when you talk about driver flexibility, it's wise to mention that
this comes at the expense of flexibility in the core implmentation.
For example, if we export the list handling widget into the ->poll()
routines, god help the person who wants to change how the poll list is
managed in net_rx_action() :-/

So we don't want to export datastructure details like that to the
driver.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread Jarek Poplawski
David Miller wrote, On 12/13/2007 02:50 PM:

> From: Jarek Poplawski <[EMAIL PROTECTED]>
> Date: Thu, 13 Dec 2007 14:49:53 +0100
> 
>> As a matter of fact, since it's "unlikely()" in net_rx_action() anyway,
>> I wonder what is the main reason or gain of leaving such a tricky
>> exception, instead of letting drivers to always decide which is the
>> best moment for napi_complete()? (Or maybe even, in such a case, they
>> should call some function with this list_move_tail() if it's so
>> useful?)
> 
> It is the only sane way to synchronize the list manipulations.
> 
> There has to be a way for ->poll() to tell net_rx_action() two things:
> 
> 1) How much work was completed, so we can adjust 'budget'


The 'budget' line would stay where it is. IMHO, it's only about this
list_move_tail(). (Probably also doing netpoll_poll_unlock()
during n->poll() could be considered to let the driver even destroy
napi just after napi_complete() - but it's another subject.)

> 2) Was the NAPI quota exhausted?  So that we know that
>net_rx_action() still "owns" the polling context and
>thus can do the list manipulation safely.
> 
> And these both need to be encoded into one single return value, thus
> the adopted convention that "work == weight" means that the device has
> not done a NAPI complete.

Of course, with some care and explanations to driver maintainers, like in
this case, this all should probably work like it is. But IMHO it would be
easier to remember and maintain if there are some simple rules with no
exceptions, so here e.g. driver always "owns" (with functions like
napi_schedule(), napi_complete(), and maybe napi_move_tail()), and
net_rx_action() only reads the list and runs these functions?!

I see in a nearby thread you would prefer to save some work to drivers
(like this netif_running() check), but I think this all is at the cost
of flexibility, and there will probably appear new problems, when a
driver simply can't wait till the next poll (which btw. looks strange
with all these hotplugging, usb and powersaving).

Regards,
Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread Stephen Hemminger

David Miller wrote:

From: Andrew Gallatin <[EMAIL PROTECTED]>
Date: Thu, 13 Dec 2007 14:02:25 -0500

  

Or perhaps we should just leave things as is.



We should probably add a "disabling" state bit to the
napi struct flags, this will be set by napi_disable()
before it loops trying to set the sched bit.

net_rx_action() can then check this.
  

How about allowing a return value of -1 from napi_poll and letting device
check itself.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread Andrew Gallatin

Stephen Hemminger wrote:

On Thu, 13 Dec 2007 06:19:38 -0800 (PST)
David Miller <[EMAIL PROTECTED]> wrote:


From: Andrew Gallatin <[EMAIL PROTECTED]>
Date: Thu, 13 Dec 2007 09:13:54 -0500


If the netif_running() check is indeed required to make a device break
out of napi polling and respond to an ifconfig down, then I think the
netif_running() check should be moved up into net_rx_action() to avoid
potential for driver complexity and bugs like the ones you found.

That, or something like it, definitely sounds reasonable and much
better than putting the check into every driver :-)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


It is not possible to do netif_running() check in generic code as currently
written because of the case of devices where a single NAPI object is
being used to handle two devices. The association between napi and netdevice
is M to N.  There are cases like niu that have multiple NAPI's and one
netdevice; and devices like sky2 that can have one NAPI and 2 netdevice's.


Ah, now I see.  I forgot that not every device has a 1:1::napi:netdev
relationship.

Could we make an optional *dev_state field in the napi structure.
It would be initialized to __LINK_STATE_START.  Devices which have
a 1:1 NAPI:netdevice relationship would set it to >state.
The generic code would then do a test_bit(__LINK_STATE_START, 
napi->dev_state), and 1:1 drivers could remove this check.

M:N drivers would pay for a useless (to them) test_bit, and would
have to provide their own netif_running check to get termination
under heavy load.

Just an idea, perhaps there is a better way which is less hacky.

Or perhaps we should just leave things as is.

Drew
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread David Miller
From: Andrew Gallatin <[EMAIL PROTECTED]>
Date: Thu, 13 Dec 2007 14:02:25 -0500

> Or perhaps we should just leave things as is.

We should probably add a "disabling" state bit to the
napi struct flags, this will be set by napi_disable()
before it loops trying to set the sched bit.

net_rx_action() can then check this.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread Stephen Hemminger
On Thu, 13 Dec 2007 06:19:38 -0800 (PST)
David Miller <[EMAIL PROTECTED]> wrote:

> From: Andrew Gallatin <[EMAIL PROTECTED]>
> Date: Thu, 13 Dec 2007 09:13:54 -0500
> 
> > If the netif_running() check is indeed required to make a device break
> > out of napi polling and respond to an ifconfig down, then I think the
> > netif_running() check should be moved up into net_rx_action() to avoid
> > potential for driver complexity and bugs like the ones you found.
> 
> That, or something like it, definitely sounds reasonable and much
> better than putting the check into every driver :-)
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

It is not possible to do netif_running() check in generic code as currently
written because of the case of devices where a single NAPI object is
being used to handle two devices. The association between napi and netdevice
is M to N.  There are cases like niu that have multiple NAPI's and one
netdevice; and devices like sky2 that can have one NAPI and 2 netdevice's.

The existing pointer from napi to netdevice is only used by netconsole
now. For devices like sky2 it means that netconsole can't work on the the
second port which is a not a big problem. But adding a netif_running()
check would be a big issue.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread Kok, Auke
David Miller wrote:
> From: Andrew Gallatin <[EMAIL PROTECTED]>
> Date: Thu, 13 Dec 2007 09:13:54 -0500
> 
>> If the netif_running() check is indeed required to make a device break
>> out of napi polling and respond to an ifconfig down, then I think the
>> netif_running() check should be moved up into net_rx_action() to avoid
>> potential for driver complexity and bugs like the ones you found.
> 
> That, or something like it, definitely sounds reasonable and much
> better than putting the check into every driver :-)

hear hear!

Auke
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread David Miller
From: Andrew Gallatin <[EMAIL PROTECTED]>
Date: Thu, 13 Dec 2007 09:13:54 -0500

> If the netif_running() check is indeed required to make a device break
> out of napi polling and respond to an ifconfig down, then I think the
> netif_running() check should be moved up into net_rx_action() to avoid
> potential for driver complexity and bugs like the ones you found.

That, or something like it, definitely sounds reasonable and much
better than putting the check into every driver :-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread Andrew Gallatin

Joonwoo Park wrote:
> 2007/12/13, Kok, Auke <[EMAIL PROTECTED]>:
>> David Miller wrote:
>>> From: Andrew Gallatin <[EMAIL PROTECTED]>
>>> Date: Wed, 12 Dec 2007 12:29:23 -0500
>>>
 Is the netif_running() check even required?
>>> No, it is not.
>>>
>>> When a device is brought down, one of the first things
>>> that happens is that we wait for all pending NAPI polls
>>> to complete, then block any new polls from starting.
>> I think this was previously (pre-2.6.24) not the case, which is why 
e1000 et al

>> has this check as well and that's exactly what is causing most of the
>> net_rx_action oopses in the first place. Without the netif_running() 
check
>> previously the drivers were just unusable with NAPI and prone to 
many races with
>> down (i.e. touching some ethtool ioctl which wants to do a reset 
while routing
>> small packets at high numbers). that's why we added the 
netif_running() check in

>> the first place :)
>>
>> There might be more drivers lurking that need this change...
>>
>> Auke
>>
>
> Also in my case, without netif_running() check, I cannot do ifconfig 
down.

> It stucked if packet generator was sending packets.

If the netif_running() check is indeed required to make a device break
out of napi polling and respond to an ifconfig down, then I think the
netif_running() check should be moved up into net_rx_action() to avoid
potential for driver complexity and bugs like the ones you found.

Drew
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread Jarek Poplawski
On Thu, Dec 13, 2007 at 05:50:13AM -0800, David Miller wrote:
> From: Jarek Poplawski <[EMAIL PROTECTED]>
> Date: Thu, 13 Dec 2007 14:49:53 +0100
> 
> > As a matter of fact, since it's "unlikely()" in net_rx_action() anyway,
> > I wonder what is the main reason or gain of leaving such a tricky
> > exception, instead of letting drivers to always decide which is the
> > best moment for napi_complete()? (Or maybe even, in such a case, they
> > should call some function with this list_move_tail() if it's so
> > useful?)
> 
> It is the only sane way to synchronize the list manipulations.
> 
> There has to be a way for ->poll() to tell net_rx_action() two things:
> 
> 1) How much work was completed, so we can adjust 'budget'
> 2) Was the NAPI quota exhausted?  So that we know that
>net_rx_action() still "owns" the polling context and
>thus can do the list manipulation safely.
> 
> And these both need to be encoded into one single return value, thus
> the adopted convention that "work == weight" means that the device has
> not done a NAPI complete.

Thanks! So, I've to rethink this all...

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread David Miller
From: Jarek Poplawski <[EMAIL PROTECTED]>
Date: Thu, 13 Dec 2007 14:49:53 +0100

> As a matter of fact, since it's "unlikely()" in net_rx_action() anyway,
> I wonder what is the main reason or gain of leaving such a tricky
> exception, instead of letting drivers to always decide which is the
> best moment for napi_complete()? (Or maybe even, in such a case, they
> should call some function with this list_move_tail() if it's so
> useful?)

It is the only sane way to synchronize the list manipulations.

There has to be a way for ->poll() to tell net_rx_action() two things:

1) How much work was completed, so we can adjust 'budget'
2) Was the NAPI quota exhausted?  So that we know that
   net_rx_action() still "owns" the polling context and
   thus can do the list manipulation safely.

And these both need to be encoded into one single return value, thus
the adopted convention that "work == weight" means that the device has
not done a NAPI complete.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread Jarek Poplawski
On 12-12-2007 19:41, Kok, Auke wrote:
> David Miller wrote:
>> From: Andrew Gallatin <[EMAIL PROTECTED]>
>> Date: Wed, 12 Dec 2007 12:29:23 -0500
>>
>>> Is the netif_running() check even required?
>> No, it is not.
>>
>> When a device is brought down, one of the first things
>> that happens is that we wait for all pending NAPI polls
>> to complete, then block any new polls from starting.
> 
> I think this was previously (pre-2.6.24) not the case, which is why e1000 et 
> al
> has this check as well and that's exactly what is causing most of the
> net_rx_action oopses in the first place. Without the netif_running() check
> previously the drivers were just unusable with NAPI and prone to many races 
> with
> down (i.e. touching some ethtool ioctl which wants to do a reset while routing
> small packets at high numbers). that's why we added the netif_running() check 
> in
> the first place :)
> 
> There might be more drivers lurking that need this change...
> 

As a matter of fact, since it's "unlikely()" in net_rx_action() anyway,
I wonder what is the main reason or gain of leaving such a tricky
exception, instead of letting drivers to always decide which is the
best moment for napi_complete()? (Or maybe even, in such a case, they
should call some function with this list_move_tail() if it's so
useful?)

Regards,
Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread David Miller
From: Jarek Poplawski [EMAIL PROTECTED]
Date: Thu, 13 Dec 2007 14:49:53 +0100

 As a matter of fact, since it's unlikely() in net_rx_action() anyway,
 I wonder what is the main reason or gain of leaving such a tricky
 exception, instead of letting drivers to always decide which is the
 best moment for napi_complete()? (Or maybe even, in such a case, they
 should call some function with this list_move_tail() if it's so
 useful?)

It is the only sane way to synchronize the list manipulations.

There has to be a way for -poll() to tell net_rx_action() two things:

1) How much work was completed, so we can adjust 'budget'
2) Was the NAPI quota exhausted?  So that we know that
   net_rx_action() still owns the polling context and
   thus can do the list manipulation safely.

And these both need to be encoded into one single return value, thus
the adopted convention that work == weight means that the device has
not done a NAPI complete.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread Jarek Poplawski
On 12-12-2007 19:41, Kok, Auke wrote:
 David Miller wrote:
 From: Andrew Gallatin [EMAIL PROTECTED]
 Date: Wed, 12 Dec 2007 12:29:23 -0500

 Is the netif_running() check even required?
 No, it is not.

 When a device is brought down, one of the first things
 that happens is that we wait for all pending NAPI polls
 to complete, then block any new polls from starting.
 
 I think this was previously (pre-2.6.24) not the case, which is why e1000 et 
 al
 has this check as well and that's exactly what is causing most of the
 net_rx_action oopses in the first place. Without the netif_running() check
 previously the drivers were just unusable with NAPI and prone to many races 
 with
 down (i.e. touching some ethtool ioctl which wants to do a reset while routing
 small packets at high numbers). that's why we added the netif_running() check 
 in
 the first place :)
 
 There might be more drivers lurking that need this change...
 

As a matter of fact, since it's unlikely() in net_rx_action() anyway,
I wonder what is the main reason or gain of leaving such a tricky
exception, instead of letting drivers to always decide which is the
best moment for napi_complete()? (Or maybe even, in such a case, they
should call some function with this list_move_tail() if it's so
useful?)

Regards,
Jarek P.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread David Miller
From: Andrew Gallatin [EMAIL PROTECTED]
Date: Thu, 13 Dec 2007 09:13:54 -0500

 If the netif_running() check is indeed required to make a device break
 out of napi polling and respond to an ifconfig down, then I think the
 netif_running() check should be moved up into net_rx_action() to avoid
 potential for driver complexity and bugs like the ones you found.

That, or something like it, definitely sounds reasonable and much
better than putting the check into every driver :-)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread Andrew Gallatin

Joonwoo Park wrote:
 2007/12/13, Kok, Auke [EMAIL PROTECTED]:
 David Miller wrote:
 From: Andrew Gallatin [EMAIL PROTECTED]
 Date: Wed, 12 Dec 2007 12:29:23 -0500

 Is the netif_running() check even required?
 No, it is not.

 When a device is brought down, one of the first things
 that happens is that we wait for all pending NAPI polls
 to complete, then block any new polls from starting.
 I think this was previously (pre-2.6.24) not the case, which is why 
e1000 et al

 has this check as well and that's exactly what is causing most of the
 net_rx_action oopses in the first place. Without the netif_running() 
check
 previously the drivers were just unusable with NAPI and prone to 
many races with
 down (i.e. touching some ethtool ioctl which wants to do a reset 
while routing
 small packets at high numbers). that's why we added the 
netif_running() check in

 the first place :)

 There might be more drivers lurking that need this change...

 Auke


 Also in my case, without netif_running() check, I cannot do ifconfig 
down.

 It stucked if packet generator was sending packets.

If the netif_running() check is indeed required to make a device break
out of napi polling and respond to an ifconfig down, then I think the
netif_running() check should be moved up into net_rx_action() to avoid
potential for driver complexity and bugs like the ones you found.

Drew
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread Jarek Poplawski
On Thu, Dec 13, 2007 at 05:50:13AM -0800, David Miller wrote:
 From: Jarek Poplawski [EMAIL PROTECTED]
 Date: Thu, 13 Dec 2007 14:49:53 +0100
 
  As a matter of fact, since it's unlikely() in net_rx_action() anyway,
  I wonder what is the main reason or gain of leaving such a tricky
  exception, instead of letting drivers to always decide which is the
  best moment for napi_complete()? (Or maybe even, in such a case, they
  should call some function with this list_move_tail() if it's so
  useful?)
 
 It is the only sane way to synchronize the list manipulations.
 
 There has to be a way for -poll() to tell net_rx_action() two things:
 
 1) How much work was completed, so we can adjust 'budget'
 2) Was the NAPI quota exhausted?  So that we know that
net_rx_action() still owns the polling context and
thus can do the list manipulation safely.
 
 And these both need to be encoded into one single return value, thus
 the adopted convention that work == weight means that the device has
 not done a NAPI complete.

Thanks! So, I've to rethink this all...

Jarek P.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread Kok, Auke
David Miller wrote:
 From: Andrew Gallatin [EMAIL PROTECTED]
 Date: Thu, 13 Dec 2007 09:13:54 -0500
 
 If the netif_running() check is indeed required to make a device break
 out of napi polling and respond to an ifconfig down, then I think the
 netif_running() check should be moved up into net_rx_action() to avoid
 potential for driver complexity and bugs like the ones you found.
 
 That, or something like it, definitely sounds reasonable and much
 better than putting the check into every driver :-)

hear hear!

Auke
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread Stephen Hemminger
On Thu, 13 Dec 2007 06:19:38 -0800 (PST)
David Miller [EMAIL PROTECTED] wrote:

 From: Andrew Gallatin [EMAIL PROTECTED]
 Date: Thu, 13 Dec 2007 09:13:54 -0500
 
  If the netif_running() check is indeed required to make a device break
  out of napi polling and respond to an ifconfig down, then I think the
  netif_running() check should be moved up into net_rx_action() to avoid
  potential for driver complexity and bugs like the ones you found.
 
 That, or something like it, definitely sounds reasonable and much
 better than putting the check into every driver :-)
 --
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

It is not possible to do netif_running() check in generic code as currently
written because of the case of devices where a single NAPI object is
being used to handle two devices. The association between napi and netdevice
is M to N.  There are cases like niu that have multiple NAPI's and one
netdevice; and devices like sky2 that can have one NAPI and 2 netdevice's.

The existing pointer from napi to netdevice is only used by netconsole
now. For devices like sky2 it means that netconsole can't work on the the
second port which is a not a big problem. But adding a netif_running()
check would be a big issue.

-- 
Stephen Hemminger [EMAIL PROTECTED]
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread David Miller
From: Andrew Gallatin [EMAIL PROTECTED]
Date: Thu, 13 Dec 2007 14:02:25 -0500

 Or perhaps we should just leave things as is.

We should probably add a disabling state bit to the
napi struct flags, this will be set by napi_disable()
before it loops trying to set the sched bit.

net_rx_action() can then check this.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread Andrew Gallatin

Stephen Hemminger wrote:

On Thu, 13 Dec 2007 06:19:38 -0800 (PST)
David Miller [EMAIL PROTECTED] wrote:


From: Andrew Gallatin [EMAIL PROTECTED]
Date: Thu, 13 Dec 2007 09:13:54 -0500


If the netif_running() check is indeed required to make a device break
out of napi polling and respond to an ifconfig down, then I think the
netif_running() check should be moved up into net_rx_action() to avoid
potential for driver complexity and bugs like the ones you found.

That, or something like it, definitely sounds reasonable and much
better than putting the check into every driver :-)
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


It is not possible to do netif_running() check in generic code as currently
written because of the case of devices where a single NAPI object is
being used to handle two devices. The association between napi and netdevice
is M to N.  There are cases like niu that have multiple NAPI's and one
netdevice; and devices like sky2 that can have one NAPI and 2 netdevice's.


Ah, now I see.  I forgot that not every device has a 1:1::napi:netdev
relationship.

Could we make an optional *dev_state field in the napi structure.
It would be initialized to __LINK_STATE_START.  Devices which have
a 1:1 NAPI:netdevice relationship would set it to netdev-state.
The generic code would then do a test_bit(__LINK_STATE_START, 
napi-dev_state), and 1:1 drivers could remove this check.

M:N drivers would pay for a useless (to them) test_bit, and would
have to provide their own netif_running check to get termination
under heavy load.

Just an idea, perhaps there is a better way which is less hacky.

Or perhaps we should just leave things as is.

Drew
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread Stephen Hemminger

David Miller wrote:

From: Andrew Gallatin [EMAIL PROTECTED]
Date: Thu, 13 Dec 2007 14:02:25 -0500

  

Or perhaps we should just leave things as is.



We should probably add a disabling state bit to the
napi struct flags, this will be set by napi_disable()
before it loops trying to set the sched bit.

net_rx_action() can then check this.
  

How about allowing a return value of -1 from napi_poll and letting device
check itself.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread Jarek Poplawski
David Miller wrote, On 12/13/2007 02:50 PM:

 From: Jarek Poplawski [EMAIL PROTECTED]
 Date: Thu, 13 Dec 2007 14:49:53 +0100
 
 As a matter of fact, since it's unlikely() in net_rx_action() anyway,
 I wonder what is the main reason or gain of leaving such a tricky
 exception, instead of letting drivers to always decide which is the
 best moment for napi_complete()? (Or maybe even, in such a case, they
 should call some function with this list_move_tail() if it's so
 useful?)
 
 It is the only sane way to synchronize the list manipulations.
 
 There has to be a way for -poll() to tell net_rx_action() two things:
 
 1) How much work was completed, so we can adjust 'budget'


The 'budget' line would stay where it is. IMHO, it's only about this
list_move_tail(). (Probably also doing netpoll_poll_unlock()
during n-poll() could be considered to let the driver even destroy
napi just after napi_complete() - but it's another subject.)

 2) Was the NAPI quota exhausted?  So that we know that
net_rx_action() still owns the polling context and
thus can do the list manipulation safely.
 
 And these both need to be encoded into one single return value, thus
 the adopted convention that work == weight means that the device has
 not done a NAPI complete.

Of course, with some care and explanations to driver maintainers, like in
this case, this all should probably work like it is. But IMHO it would be
easier to remember and maintain if there are some simple rules with no
exceptions, so here e.g. driver always owns (with functions like
napi_schedule(), napi_complete(), and maybe napi_move_tail()), and
net_rx_action() only reads the list and runs these functions?!

I see in a nearby thread you would prefer to save some work to drivers
(like this netif_running() check), but I think this all is at the cost
of flexibility, and there will probably appear new problems, when a
driver simply can't wait till the next poll (which btw. looks strange
with all these hotplugging, usb and powersaving).

Regards,
Jarek P.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread David Miller
From: Jarek Poplawski [EMAIL PROTECTED]
Date: Thu, 13 Dec 2007 21:16:12 +0100

 I see in a nearby thread you would prefer to save some work to drivers
 (like this netif_running() check), but I think this all is at the cost
 of flexibility, and there will probably appear new problems, when a
 driver simply can't wait till the next poll (which btw. looks strange
 with all these hotplugging, usb and powersaving).

As someone who has actually had to edit the NAPI support of _EVERY_
single driver in the tree I can tell you that code duplication and
subtle semantic differences are a huge issue.

And when you talk about driver flexibility, it's wise to mention that
this comes at the expense of flexibility in the core implmentation.
For example, if we export the list handling widget into the -poll()
routines, god help the person who wants to change how the poll list is
managed in net_rx_action() :-/

So we don't want to export datastructure details like that to the
driver.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread David Miller
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Thu, 13 Dec 2007 11:35:07 -0800

 How about allowing a return value of -1 from napi_poll and letting
 device check itself.

It doesn't avoid the code duplication in the -poll() fast paths.

I don't care, on the other hand, if crap accumulates in non-critical
slow paths like napi_disable() and dev_close().  That's why I'm
suggesting solutions in that area.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread Stephen Hemminger

David Miller wrote:

From: Jarek Poplawski [EMAIL PROTECTED]
Date: Thu, 13 Dec 2007 21:16:12 +0100

  

I see in a nearby thread you would prefer to save some work to drivers
(like this netif_running() check), but I think this all is at the cost
of flexibility, and there will probably appear new problems, when a
driver simply can't wait till the next poll (which btw. looks strange
with all these hotplugging, usb and powersaving).



As someone who has actually had to edit the NAPI support of _EVERY_
single driver in the tree I can tell you that code duplication and
subtle semantic differences are a huge issue.

And when you talk about driver flexibility, it's wise to mention that
this comes at the expense of flexibility in the core implmentation.
For example, if we export the list handling widget into the -poll()
routines, god help the person who wants to change how the poll list is
managed in net_rx_action() :-/

So we don't want to export datastructure details like that to the
driver.
  
Also, most of the drivers should/could be doing the same thing. It is 
seems that
driver writers just want to get creative and do things differently. 
The code is
cleaner, safer, and less buggy if every device uses the interface in the 
same way.


When I did the initial pass on this, I didn't see a single variation on 
NAPI usage
that was better than the simple get N packets and return variation.  
But Dave

did way more detailed grunt work on this.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread Jarek Poplawski
Stephen Hemminger wrote, On 12/13/2007 09:41 PM:

 David Miller wrote:
 From: Jarek Poplawski [EMAIL PROTECTED]
 Date: Thu, 13 Dec 2007 21:16:12 +0100

   
 I see in a nearby thread you would prefer to save some work to drivers
 (like this netif_running() check), but I think this all is at the cost
 of flexibility, and there will probably appear new problems, when a
 driver simply can't wait till the next poll (which btw. looks strange
 with all these hotplugging, usb and powersaving).
 
 As someone who has actually had to edit the NAPI support of _EVERY_
 single driver in the tree I can tell you that code duplication and
 subtle semantic differences are a huge issue.

 And when you talk about driver flexibility, it's wise to mention that
 this comes at the expense of flexibility in the core implmentation.
 For example, if we export the list handling widget into the -poll()
 routines, god help the person who wants to change how the poll list is
 managed in net_rx_action() :-/

 So we don't want to export datastructure details like that to the
 driver.
   

(I hope you both don't mind I save some 'paper' and do this
2 in 1...)

So, you've seen a few drivers, know this much better than me, and
maybe even thought why they all so unnecessarily different... Of
course, if you think that despite those differences they all can
work with simpler napi api then OK (until they don't have to do
any cheating, like with this 'work' here).

 Also, most of the drivers should/could be doing the same thing. It is 
 seems that
 driver writers just want to get creative and do things differently. 
 The code is
 cleaner, safer, and less buggy if every device uses the interface in the 
 same way.
 
 When I did the initial pass on this, I didn't see a single variation on 
 NAPI usage
 that was better than the simple get N packets and return variation.  
 But Dave
 did way more detailed grunt work on this.

It seems there are some differences in thinking what is simple/complex.
I think drivers' developers are used to controlling their devices, so
they know better when to turn on/off interrupts. So, maybe similar model
could be appropriate here. Sometimes doing more looks simpler than doing
less and remembering how and when the rest will be done (like
this netif_running() test). But I hope I'm wrong here, and this will
work after all!

Cheers,
Jarek P.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread Jarek Poplawski
David Miller wrote, On 12/13/2007 09:37 PM:
...

 For example, if we export the list handling widget into the -poll()
 routines, god help the person who wants to change how the poll list is
 managed in net_rx_action() :-/

...I'm afraid I can't understand: I mean doing the same but without
passing this info with 'work == weight': if driver sends this info,
why it can't instead call something like napi_continue() with
this list_move_tail() (and probably additional local_irq_disable()/
enble() - but since it's unlikely()?) which looks much more readable,
and saves one whole unlikely if ()?

Jarek P.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread David Miller
From: Jarek Poplawski [EMAIL PROTECTED]
Date: Thu, 13 Dec 2007 23:28:41 +0100

 ...I'm afraid I can't understand: I mean doing the same but without
 passing this info with 'work == weight': if driver sends this info,
 why it can't instead call something like napi_continue() with
 this list_move_tail() (and probably additional local_irq_disable()/
 enble() - but since it's unlikely()?) which looks much more readable,
 and saves one whole unlikely if ()?

Because the poll list is private to net_rx_action() and we don't
want to expose implementation details like that to every
-poll() implementation.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread Jarek Poplawski
David Miller wrote, On 12/13/2007 11:34 PM:

 From: Jarek Poplawski [EMAIL PROTECTED]
 Date: Thu, 13 Dec 2007 23:28:41 +0100
 
 ...I'm afraid I can't understand: I mean doing the same but without
 passing this info with 'work == weight': if driver sends this info,
 why it can't instead call something like napi_continue() with
 this list_move_tail() (and probably additional local_irq_disable()/
 enble() - but since it's unlikely()?) which looks much more readable,
 and saves one whole unlikely if ()?
 
 Because the poll list is private to net_rx_action() and we don't
 want to expose implementation details like that to every
 -poll() implementation.

So, it seems 'we' failed e.g. exposing napi_complete()...
OK, no offense, I'll only mention at the end that there is
always a possibility to redefine such a function to {} with any
change of implementation.

Jarek P.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-13 Thread Joonwoo Park
2007/12/13, Andrew Gallatin [EMAIL PROTECTED]:

 If the netif_running() check is indeed required to make a device break
 out of napi polling and respond to an ifconfig down, then I think the
 netif_running() check should be moved up into net_rx_action() to avoid
 potential for driver complexity and bugs like the ones you found.

 Drew


Yep, It looks good.

Joonwoo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-12 Thread Joonwoo Park
2007/12/13, Kok, Auke <[EMAIL PROTECTED]>:
> David Miller wrote:
> > From: Andrew Gallatin <[EMAIL PROTECTED]>
> > Date: Wed, 12 Dec 2007 12:29:23 -0500
> >
> >> Is the netif_running() check even required?
> >
> > No, it is not.
> >
> > When a device is brought down, one of the first things
> > that happens is that we wait for all pending NAPI polls
> > to complete, then block any new polls from starting.
>
> I think this was previously (pre-2.6.24) not the case, which is why e1000 et 
> al
> has this check as well and that's exactly what is causing most of the
> net_rx_action oopses in the first place. Without the netif_running() check
> previously the drivers were just unusable with NAPI and prone to many races 
> with
> down (i.e. touching some ethtool ioctl which wants to do a reset while routing
> small packets at high numbers). that's why we added the netif_running() check 
> in
> the first place :)
>
> There might be more drivers lurking that need this change...
>
> Auke
>

Also in my case, without netif_running() check, I cannot do ifconfig down.
It stucked if packet generator was sending packets.

Thanks
Joonwoo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-12 Thread Kok, Auke
David Miller wrote:
> From: Andrew Gallatin <[EMAIL PROTECTED]>
> Date: Wed, 12 Dec 2007 12:29:23 -0500
> 
>> Is the netif_running() check even required?
> 
> No, it is not.
> 
> When a device is brought down, one of the first things
> that happens is that we wait for all pending NAPI polls
> to complete, then block any new polls from starting.

I think this was previously (pre-2.6.24) not the case, which is why e1000 et al
has this check as well and that's exactly what is causing most of the
net_rx_action oopses in the first place. Without the netif_running() check
previously the drivers were just unusable with NAPI and prone to many races with
down (i.e. touching some ethtool ioctl which wants to do a reset while routing
small packets at high numbers). that's why we added the netif_running() check in
the first place :)

There might be more drivers lurking that need this change...

Auke
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-12 Thread Andrew Gallatin

David Miller wrote:

From: Andrew Gallatin <[EMAIL PROTECTED]>
Date: Wed, 12 Dec 2007 12:29:23 -0500


Is the netif_running() check even required?


No, it is not.

When a device is brought down, one of the first things
that happens is that we wait for all pending NAPI polls
to complete, then block any new polls from starting.


Great, thanks.  I will submit a patch to remove the bogus
check.  This should fix myri10ge properly.


Thank you,

Drew
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-12 Thread David Miller
From: Andrew Gallatin <[EMAIL PROTECTED]>
Date: Wed, 12 Dec 2007 12:29:23 -0500

> Is the netif_running() check even required?

No, it is not.

When a device is brought down, one of the first things
that happens is that we wait for all pending NAPI polls
to complete, then block any new polls from starting.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-12 Thread Andrew Gallatin

[I apologize for loosing threading, I'm replying from the archives]

> The problem is that the driver is doing a NAPI completion and
> re-enabling chip interrupts with work_done == weight, and that is
> illegal.

The only time at least myri10ge will do this is due to
the !netif_running(netdev) check.   Eg, from myri10ge's poll:

work_done = myri10ge_clean_rx_done(mgp, budget);

if (work_done < budget || !netif_running(netdev)) {
netif_rx_complete(netdev, napi);
put_be32(htonl(3), mgp->irq_claim);
}

Is the netif_running() check even required? Is this just
a bad way to solve a race with running NAPI at down() time
that would be better solved by putting a napi_synchronize()
in the driver's down() routine?

I'd rather fix this right than add another check to a
questionable code path.

Thanks,

Drew
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-12 Thread David Miller
From: "Joonwoo Park" <[EMAIL PROTECTED]>
Date: Wed, 12 Dec 2007 15:05:26 +0900

> Could you explain how it fix the problem?
> IMHO I think your patch cannot solve the problem.
> The drivers can call netif_rx_complete and net_rx_action can do
> list_move_tail also.

Stephen is confused about what the bug is in these drivers,
that's all.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-12 Thread David Miller
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Tue, 11 Dec 2007 21:46:34 -0800

> Isn't this a better fix for all drivers, rather than peppering every
> driver with the special case. This is how the logic worked up until
> 2.6.24.

Stephen this is not the problem.

The problem is that the driver is doing a NAPI completion and
re-enabling chip interrupts with work_done == weight, and that is
illegal.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-12 Thread David Miller
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Tue, 11 Dec 2007 21:46:34 -0800

 Isn't this a better fix for all drivers, rather than peppering every
 driver with the special case. This is how the logic worked up until
 2.6.24.

Stephen this is not the problem.

The problem is that the driver is doing a NAPI completion and
re-enabling chip interrupts with work_done == weight, and that is
illegal.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-12 Thread David Miller
From: Joonwoo Park [EMAIL PROTECTED]
Date: Wed, 12 Dec 2007 15:05:26 +0900

 Could you explain how it fix the problem?
 IMHO I think your patch cannot solve the problem.
 The drivers can call netif_rx_complete and net_rx_action can do
 list_move_tail also.

Stephen is confused about what the bug is in these drivers,
that's all.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-12 Thread David Miller
From: Andrew Gallatin [EMAIL PROTECTED]
Date: Wed, 12 Dec 2007 12:29:23 -0500

 Is the netif_running() check even required?

No, it is not.

When a device is brought down, one of the first things
that happens is that we wait for all pending NAPI polls
to complete, then block any new polls from starting.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-12 Thread Andrew Gallatin

[I apologize for loosing threading, I'm replying from the archives]

 The problem is that the driver is doing a NAPI completion and
 re-enabling chip interrupts with work_done == weight, and that is
 illegal.

The only time at least myri10ge will do this is due to
the !netif_running(netdev) check.   Eg, from myri10ge's poll:

work_done = myri10ge_clean_rx_done(mgp, budget);

if (work_done  budget || !netif_running(netdev)) {
netif_rx_complete(netdev, napi);
put_be32(htonl(3), mgp-irq_claim);
}

Is the netif_running() check even required? Is this just
a bad way to solve a race with running NAPI at down() time
that would be better solved by putting a napi_synchronize()
in the driver's down() routine?

I'd rather fix this right than add another check to a
questionable code path.

Thanks,

Drew
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-12 Thread Andrew Gallatin

David Miller wrote:

From: Andrew Gallatin [EMAIL PROTECTED]
Date: Wed, 12 Dec 2007 12:29:23 -0500


Is the netif_running() check even required?


No, it is not.

When a device is brought down, one of the first things
that happens is that we wait for all pending NAPI polls
to complete, then block any new polls from starting.


Great, thanks.  I will submit a patch to remove the bogus
check.  This should fix myri10ge properly.


Thank you,

Drew
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-12 Thread Kok, Auke
David Miller wrote:
 From: Andrew Gallatin [EMAIL PROTECTED]
 Date: Wed, 12 Dec 2007 12:29:23 -0500
 
 Is the netif_running() check even required?
 
 No, it is not.
 
 When a device is brought down, one of the first things
 that happens is that we wait for all pending NAPI polls
 to complete, then block any new polls from starting.

I think this was previously (pre-2.6.24) not the case, which is why e1000 et al
has this check as well and that's exactly what is causing most of the
net_rx_action oopses in the first place. Without the netif_running() check
previously the drivers were just unusable with NAPI and prone to many races with
down (i.e. touching some ethtool ioctl which wants to do a reset while routing
small packets at high numbers). that's why we added the netif_running() check in
the first place :)

There might be more drivers lurking that need this change...

Auke
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-12 Thread Joonwoo Park
2007/12/13, Kok, Auke [EMAIL PROTECTED]:
 David Miller wrote:
  From: Andrew Gallatin [EMAIL PROTECTED]
  Date: Wed, 12 Dec 2007 12:29:23 -0500
 
  Is the netif_running() check even required?
 
  No, it is not.
 
  When a device is brought down, one of the first things
  that happens is that we wait for all pending NAPI polls
  to complete, then block any new polls from starting.

 I think this was previously (pre-2.6.24) not the case, which is why e1000 et 
 al
 has this check as well and that's exactly what is causing most of the
 net_rx_action oopses in the first place. Without the netif_running() check
 previously the drivers were just unusable with NAPI and prone to many races 
 with
 down (i.e. touching some ethtool ioctl which wants to do a reset while routing
 small packets at high numbers). that's why we added the netif_running() check 
 in
 the first place :)

 There might be more drivers lurking that need this change...

 Auke


Also in my case, without netif_running() check, I cannot do ifconfig down.
It stucked if packet generator was sending packets.

Thanks
Joonwoo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-11 Thread Joonwoo Park
2007/12/12, Stephen Hemminger <[EMAIL PROTECTED]>:
> Isn't this a better fix for all drivers, rather than peppering every
> driver with the special case. This is how the logic worked up until
> 2.6.24.
>
>
> --- a/net/core/dev.c2007-12-11 12:16:20.0 -0800
> +++ b/net/core/dev.c2007-12-11 21:43:39.0 -0800
> @@ -2184,7 +2184,7 @@ static void net_rx_action(struct softirq
>
>have = netpoll_poll_lock(n);
>
> -   weight = n->weight;
> +   weight = min(n->weight, budget);
>
>/* This NAPI_STATE_SCHED test is for avoiding a race
> * with netpoll's poll_napi().  Only the entity which
>

Stephen,
Could you explain how it fix the problem?
IMHO I think your patch cannot solve the problem.
The drivers can call netif_rx_complete and net_rx_action can do
list_move_tail also.
Am I missing something?

Thanks
Joonwoo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC] net: napi fix

2007-12-11 Thread Stephen Hemminger
Isn't this a better fix for all drivers, rather than peppering every
driver with the special case. This is how the logic worked up until
2.6.24.


--- a/net/core/dev.c2007-12-11 12:16:20.0 -0800
+++ b/net/core/dev.c2007-12-11 21:43:39.0 -0800
@@ -2184,7 +2184,7 @@ static void net_rx_action(struct softirq
 
have = netpoll_poll_lock(n);
 
-   weight = n->weight;
+   weight = min(n->weight, budget);
 
/* This NAPI_STATE_SCHED test is for avoiding a race
 * with netpoll's poll_napi().  Only the entity which
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC] net: napi fix

2007-12-11 Thread Stephen Hemminger
Isn't this a better fix for all drivers, rather than peppering every
driver with the special case. This is how the logic worked up until
2.6.24.


--- a/net/core/dev.c2007-12-11 12:16:20.0 -0800
+++ b/net/core/dev.c2007-12-11 21:43:39.0 -0800
@@ -2184,7 +2184,7 @@ static void net_rx_action(struct softirq
 
have = netpoll_poll_lock(n);
 
-   weight = n-weight;
+   weight = min(n-weight, budget);
 
/* This NAPI_STATE_SCHED test is for avoiding a race
 * with netpoll's poll_napi().  Only the entity which
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] net: napi fix

2007-12-11 Thread Joonwoo Park
2007/12/12, Stephen Hemminger [EMAIL PROTECTED]:
 Isn't this a better fix for all drivers, rather than peppering every
 driver with the special case. This is how the logic worked up until
 2.6.24.


 --- a/net/core/dev.c2007-12-11 12:16:20.0 -0800
 +++ b/net/core/dev.c2007-12-11 21:43:39.0 -0800
 @@ -2184,7 +2184,7 @@ static void net_rx_action(struct softirq

have = netpoll_poll_lock(n);

 -   weight = n-weight;
 +   weight = min(n-weight, budget);

/* This NAPI_STATE_SCHED test is for avoiding a race
 * with netpoll's poll_napi().  Only the entity which


Stephen,
Could you explain how it fix the problem?
IMHO I think your patch cannot solve the problem.
The drivers can call netif_rx_complete and net_rx_action can do
list_move_tail also.
Am I missing something?

Thanks
Joonwoo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/