Re: [RESEND][PATCH] ebtables: clean up vmalloc usage in net/bridge/netfilter/ebtables.c

2006-04-20 Thread Jayachandran C
On Wed, Apr 19, 2006 at 04:13:24PM -0700, Andrew Morton wrote:
 David S. Miller [EMAIL PROTECTED] wrote:
 
  From: Andrew Morton [EMAIL PROTECTED]
  Date: Wed, 19 Apr 2006 15:59:25 -0700
  
   David S. Miller [EMAIL PROTECTED] wrote:
   
An earlier variant of your patch was applied already, included below.
You'll need to submit the newer parts relative to the current tree.
   
   This is a similar-but-different patch.  It applies OK.
   
   I reviewed it (mostly - it's somewhat non-trivial to do this) and queued it
   up and was planning on sending it to you for post-2.6.17.
  
  It's at least fixing a few bugs, and the parts which are cleanups
  undoubtedly should prevent bugs in the future, so I think we
  should consider it for 2.6.17 right?
 
 afaict it's just a cleanup, but whatever - I'll send it over now.

The first patch (which is already applied) was a bug fix. This one
is just a cleanup; it applies the same cleanup that Andrew made to
the original patch to other places in the same file.

This is not at all critical, so it can be moved post-2.6.17 without
any problem.

Regards,
JC.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Please pull upstream-fixes branch of wireless-2.6

2006-04-20 Thread Michael Buesch
On Thursday 20 April 2006 03:12, John W. Linville wrote:
   bcm43xx: fix dyn tssi2dbm memleak
   bcm43xx: fix pctl slowclock limit calculation
   bcm43xx: sysfs code cleanup

These are already in -mm and on their way into linus's tree.
Is it possible to cause problems?
If not, fine. If yes, we need some clearly defined rules where
to put patches and a clearly defined statement of how often
patches are pushed upstream.

-- 
Greetings Michael.




Re: Please pull upstream-fixes branch of wireless-2.6

2006-04-20 Thread Andrew Morton
Michael Buesch [EMAIL PROTECTED] wrote:

 On Thursday 20 April 2006 03:12, John W. Linville wrote:
bcm43xx: fix dyn tssi2dbm memleak
bcm43xx: fix pctl slowclock limit calculation
bcm43xx: sysfs code cleanup
 
 These are already in -mm and on their way into linus's tree.

I don't send netdev patches to Linus except under unusual circumstances. 
I'd expect these patches to go upstream via John or Jeff.

 Is it possible to cause problems?

Nope, I'll just drop them when they appear in a git tree.

And I really need to find a way of getting git-wireless into -mm.  Problem
is, it's based off git-netdev-all and when John's tree is synced to a later
version of Linus's tree than Jeff's tree, all hell breaks loose at my end. 
Junio and I weren't able to work out a way of extracting the jeff-john
diffs so I gave up.

Probably, I'll need to actually do a git merge, generate the diff then
throw away the resulting git tree.  Or something.  I've avoided doing git
merges because I'm dealing with 58 trees and I suspect I'd go insane.

 If not, fine. If yes, we need some clearly defined rules where
 to put patches and a clearly defined statement of how often
 patches are pushed upstream.

Because I don't carry git-wireless I don't have visibility of when John has
merged something.  Ordinarily you'd have seen me drop the patches again
when they popped up in John's tree.




Re: Please pull upstream-fixes branch of wireless-2.6

2006-04-20 Thread Jeff Garzik

Michael Buesch wrote:

On Thursday 20 April 2006 03:12, John W. Linville wrote:

  bcm43xx: fix dyn tssi2dbm memleak
  bcm43xx: fix pctl slowclock limit calculation
  bcm43xx: sysfs code cleanup


These are already in -mm and on their way into linus's tree.
Is it possible to cause problems?
If not, fine. If yes, we need some clearly defined rules where
to put patches and a clearly defined statement of how often
patches are pushed upstream.


Ideally, patches should be sent to John, who will send me -> Linus.  If
they are bug fixes, the turnaround can be same-day once I get them from John
(and Linus is taking patches).


That's always been the standard route:  wireless patches -> wireless
maintainer.


Jeff





Re: Please pull upstream-fixes branch of wireless-2.6

2006-04-20 Thread Michael Buesch
On Thursday 20 April 2006 10:57, you wrote:
 Michael Buesch [EMAIL PROTECTED] wrote:
 
  On Thursday 20 April 2006 03:12, John W. Linville wrote:
 bcm43xx: fix dyn tssi2dbm memleak
 bcm43xx: fix pctl slowclock limit calculation
 bcm43xx: sysfs code cleanup
  
  These are already in -mm and on their way into linus's tree.
 
 I don't send netdev patches to Linus except under unusual circumstances. 
 I'd expect these patches to go upstream via John or Jeff.
 
  Is it possible to cause problems?
 
 Nope, I'll just drop them when they appear in a git tree.
 
 And I really need to find a way of getting git-wireless into -mm.  Problem
 is, it's based off git-netdev-all and when John's tree is synced to a later
 version of Linus's tree than Jeff's tree, all hell breaks loose at my end. 
 Junio and I weren't able to work out a way of extracting the jeff-john
 diffs so I gave up.
 
 Probably, I'll need to actually do a git merge, generate the diff then
 throw away the resulting git tree.  Or something.  I've avoided doing git
 merges because I'm dealing with 58 trees and I suspect I'd go insane.
 
  If not, fine. If yes, we need some clearly defined rules where
  to put patches and a clearly defined statement of how often
  patches are pushed upstream.
 
 Because I don't carry git-wireless I don't have visibility of when John has
 merged something.  Ordinarily you'd have seen me drop the patches again
 when they popped up in John's tree.

Ok, that is perfectly fine and it will work.
Thanks for the clarification.

-- 
Greetings Michael.




Re: [XFRM Doc]: aevent description

2006-04-20 Thread jamal
On Fri, 2006-14-04 at 15:05 -0700, David S. Miller wrote:
 From: jamal [EMAIL PROTECTED]
 Date: Thu, 13 Apr 2006 09:00:08 -0400
 
  There is dependency on the previous patch i sent since the issue that
  patch fixes is assumed in this text description. It would be a good
  idea to apply at the same time as the other.
 
 Applied, after fixing 28 lines containing trailing whitespace :-)

yikes ;->
Ok, so how do i avoid this in the future? Note, this was a _brand new_
file, so it is a little bizarre.

cheers,
jamal 



Re: Please pull upstream-fixes branch of wireless-2.6

2006-04-20 Thread John W. Linville
On Thu, Apr 20, 2006 at 01:57:52AM -0700, Andrew Morton wrote:

 And I really need to find a way of getting git-wireless into -mm.  Problem
 is, it's based off git-netdev-all and when John's tree is synced to a later
 version of Linus's tree than Jeff's tree, all hell breaks loose at my end. 

FWIW, I think this issue should be gone (hopefully never to return).
For a while I was pulling from Jeff's netdev tree as a way to fix-up
a git administration error I had inflicted upon myself...  That need
has disappeared since 2.6.17 opened and Jeff pushed his upstream
branch to Linus.

At present, all the branches in wireless-2.6 only pull from linux-2.6.
I am still pushing (i.e. requesting Jeff's pull) to netdev-2.6,
if that matters.

Maybe the current wireless-2.6 tree fits into your system better?

Thanks,

John
-- 
John W. Linville
[EMAIL PROTECTED]


Re: [patch] ipv4: initialize arp_tbl rw lock

2006-04-20 Thread Heiko Carstens
  As spinlock debugging still does not work with the qeth driver I
  want to pick up the discussion.
 
 Does something like the patch below work?
 
 But this all begs the question, what happens if you want to
 dig into the internals of a protocol which is built modular and
 hasn't been loaded yet?
 
 diff --git a/include/linux/init.h b/include/linux/init.h
 index 93dcbe1..8169f25 100644
 --- a/include/linux/init.h
 +++ b/include/linux/init.h
 @@ -95,8 +95,9 @@ #define postcore_initcall(fn)   __define_
  #define arch_initcall(fn)        __define_initcall(3,fn)
  #define subsys_initcall(fn)      __define_initcall(4,fn)
  #define fs_initcall(fn)          __define_initcall(5,fn)
 -#define device_initcall(fn)      __define_initcall(6,fn)
 -#define late_initcall(fn)        __define_initcall(7,fn)
 +#define net_initcall(fn)         __define_initcall(6,fn)
 +#define device_initcall(fn)      __define_initcall(7,fn)
 +#define late_initcall(fn)        __define_initcall(8,fn)
  
  #define __initcall(fn) device_initcall(fn)
  
 diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
 index dc206f1..9803a57 100644
 --- a/net/ipv4/af_inet.c
 +++ b/net/ipv4/af_inet.c
 @@ -1257,7 +1257,7 @@ out_unregister_udp_proto:
   goto out;
  }
  
 -module_init(inet_init);
 +net_initcall(inet_init);

That's exactly the same thing that I tried, too. It didn't work for me, since I
sometimes saw the described rcu_update latencies.
Today I was able to boot the machine 30 times and just saw it once... Not very
helpful for debugging this :(
Btw.: I guess the linker scripts need an update too, so that the new
.initcall8.init section doesn't get discarded.


SIOCGIWSCAN wireless event behaviour

2006-04-20 Thread Daniel Drake

Hi Jean,

A query regarding wireless events: under which circumstances should a 
driver/stack send a SIOCGIWSCAN event to userspace?


Should it be sent whenever a driver has new scan results available, or 
only when the user requested a scan a short time beforehand (via 
SIOCSIWSCAN)?


I ask this because softmac is sending the SIOCGIWSCAN event even when 
the user did not explicitly ask for it.


For example, the user sets an essid. softmac starts a scan in order to 
find the requested network. The network is found, the scan completes, 
and softmac sends SIOCGIWSCAN. softmac then authenticates to that 
network, associates, and then sends SIOCGIWAP.


I think the 'extra' SIOCGIWSCAN event may be confusing wpa_supplicant 
(but have not confirmed that yet).


Thanks,
Daniel


Re: [RFC] Netlink and user-space buffer pointers

2006-04-20 Thread James Smart


Mike Christie wrote:

For the tasks you want to do for the fc class is performance critical?


No, it should not be.


If not, you could do what the iscsi class (for the netdev people this is
drivers/scsi/scsi_transport_iscsi.c) does and just suffer a couple
copies. For iscsi we do this in userspace to send down a login pdu:

/*
 * xmitbuf is a buffer that is large enough for the iscsi_event,
 * iscsi pdu (hdr_size) and iscsi pdu data (data_size)
 */


Well, the real difference is that the payload of the message is actually
the payload of the SCSI command or ELS/CT Request. Thus, the payload may
range in size from a few hundred bytes to several kbytes ( 1 page) to
Mbytes in size. Rather than buffer all of this and push it over the socket,
with the extra copies that implies, it would be best to have the LLDD simply
DMA the payload like on a typical SCSI command.  Additionally, there will be
response data that can be several kbytes in length.


... I think there may be issues with packing structs or 32 bit
userspace and 64 bit kernels and other fun things like this so the iscsi
pdu and iscsi event have to be defined correctly and I guess we are back
to some of the problems with ioctls :(


Agreed. In this use of netlink, there's not a lot of wins for netlink over
ioctls. It all comes down to 2 things: a) proper portable message definition;
and b) what do you do with that non-portable user space buffer pointer ?

-- james s


Re: SIOCGIWSCAN wireless event behaviour

2006-04-20 Thread Dan Williams
On Thu, 2006-04-20 at 15:15 +0100, Daniel Drake wrote:
 Hi Jean,
 
 A query regarding wireless events: under which circumstances should a 
 driver/stack send a SIOCGIWSCAN event to userspace?
 
 Should it be sent whenever a driver has new scan results available, or 
 only when the user requested a scan a short time beforehand (via 
 SIOCSIWSCAN)?

Similar situation:  when wpa_supplicant requests a scan, the driver
scans and pushes the GIWSCAN at completion.  _Every_ process (like
NetworkManager) listening for netlink WE messages gets the GIWSCAN event
even though only wpa_supplicant requested the original scan.

So what I'm saying is that applications that process GIWSCAN netlink
messages today should _already_ be able to handle random GIWSCAN events
at any time even when they have not explicitly requested a scan with
SIWSCAN.  The events are broadcast and the driver shouldn't really care
which user app initiated any particular request.  Multiple apps can
theoretically request scans at any time, though this isn't so good in
practice.

 I ask this because softmac is sending the SIOCGIWSCAN event even when 
 the user did not explicitly ask for it.

Given the above, I think this behavior is fine and even desirable.

 I think the 'extra' SIOCGIWSCAN event may be confusing wpa_supplicant 
 (but have not confirmed that yet).

If this is the case, wpa_supplicant should not be getting confused by
GIWSCAN events happening at random times, and should be fixed.  However,
in my experience with 0.4.8, this isn't a problem and wpa_supplicant
handles random scan events correctly.  Not sure about the 0.5.x branch
though.

Dan




Re: I/OAT: Call for discussion

2006-04-20 Thread Jack Vogel
On 4/19/06, Christoph Hellwig [EMAIL PROTECTED] wrote:
 On Wed, Apr 19, 2006 at 10:28:41AM -0700, John Ronciak wrote:
  The hardware is going to generally available in June.  There are also
  lots of OEMs, OSVs and hardware vendors that have the system to test
  on today.  The early rollout of hardware has been very large.

 As a start to get people actually interested you should stop talking
 like a jerk and kill all these silly three-letter acronyms from your language.


??? For a community absolutely FILLED with everyday use of acronyms
it boggles the mind why you would call someone names for using them.

So if they were 4 letter ones it would make him a savant instead??

Jack


Re: I/OAT: Call for discussion

2006-04-20 Thread Arnaldo Carvalho de Melo
On 4/20/06, Jack Vogel [EMAIL PROTECTED] wrote:
 On 4/19/06, Christoph Hellwig [EMAIL PROTECTED] wrote:
  On Wed, Apr 19, 2006 at 10:28:41AM -0700, John Ronciak wrote:
   The hardware is going to generally available in June.  There are also
   lots of OEMs, OSVs and hardware vendors that have the system to test
   on today.  The early rollout of hardware has been very large.
 
  As a start to get people actually interested you should stop talking
  like a jerk and kill all these silly three-letter acronyms from your 
  language.

 ??? For a community absolutely FILLED with everyday use of acronyms
 it boggles the mind why you would call someone names for using them.

 So if they were 4 letter ones it would make him a savant instead??

hch is not complaining about TLA usage, he is complaining about _silly_ TLAs
usage, as in to justify new feature acceptance in mainline.

- Arnaldo


Re: SIOCGIWSCAN wireless event behaviour

2006-04-20 Thread Jouni Malinen
On Thu, Apr 20, 2006 at 03:15:59PM +0100, Daniel Drake wrote:

 I think the 'extra' SIOCGIWSCAN event may be confusing wpa_supplicant 
 (but have not confirmed that yet).

No, they don't. madwifi-ng is already doing this with background
scanning and as was pointed out, there can be multiple programs asking
for scans, so user space must be prepared for multiple events anyway.

-- 
Jouni MalinenPGP id EFC895FA


Re: SIOCGIWSCAN wireless event behaviour

2006-04-20 Thread Jean Tourrilhes
On Thu, Apr 20, 2006 at 10:37:32AM -0400, Dan Williams wrote:
 On Thu, 2006-04-20 at 15:15 +0100, Daniel Drake wrote:
  Hi Jean,
  
  A query regarding wireless events: under which circumstances should a 
  driver/stack send a SIOCGIWSCAN event to userspace?
  
  Should it be sent whenever a driver has new scan results available, or 
  only when the user requested a scan a short time beforehand (via 
  SIOCSIWSCAN)?

The original behaviour was that the event was sent only when a
user did request a scan. At that time, cards did not do background
scanning, so new scan results would be produced only as a result of a
user scan.
After a short discussion with Dan, we agreed to change that:
the driver should send the event whenever a new scan result is available,
regardless of how it happens (background scan or user scan). This
allows smart applications to synchronise on background scans and avoids
them generating useless user scans. Minimising the number of user scans
is actually good.

 Similar situation:  when wpa_supplicant requests a scan, the driver
 scans and pushes the GIWSCAN at completion.  _Every_ process (like
 NetworkManager) listening for netlink WE messages gets the GIWSCAN event
 even though only wpa_supplicant requested the original scan.
 
 So what I'm saying is that applications that process GIWSCAN netlink
 messages today should _already_ be able to handle random GIWSCAN events
 at any time even when they have not explicitly requested a scan with
 SIWSCAN.  The events are broadcast and the driver shouldn't really care
 which user app initiated any particular request.  Multiple apps can
 theoretically request scans at any time, though this isn't so good in
 practice.

100% correct.

  I ask this because softmac is sending the SIOCGIWSCAN event even when 
  the user did not explicitly ask for it.
 
 Given the above, I think this behavior is fine and even desirable.

Yes.

  I think the 'extra' SIOCGIWSCAN event may be confusing wpa_supplicant 
  (but have not confirmed that yet).
 
 If this is the case, wpa_supplicant should not be getting confused by
 GIWSCAN events happening at random times, and should be fixed.  However,
 in my experience with 0.4.8, this isn't a problem and wpa_supplicant
 handles random scan events correctly.  Not sure about the 0.5.x branch
 though.

After we changed the behaviour of ipw, various users reported
that wpa_supplicant was confused. I particularly trust the report of
Bill Moss, who has been hacking ipw for a long time:

http://sourceforge.net/mailarchive/forum.php?thread_id=10091113&forum_id=38938

Jouni was notified, but did not really answer that bug report.
Then, the ipw maintainers committed the following patch to ipw
that fixes or works around that issue:

http://marc.theaimsgroup.com/?l=linux-netdev&m=114492056522667&w=2

I would still like Jouni to have a look at the issue to tell
us where the problem is. Two drivers having the issue is not a coincidence. I
would hate for drivers to start implementing various workarounds if the
problem is really in wpa_supplicant.

Have fun...

Jean


Re: SIOCGIWSCAN wireless event behaviour

2006-04-20 Thread Jouni Malinen
On Thu, Apr 20, 2006 at 09:43:54AM -0700, Jean Tourrilhes wrote:
   After we changed to behaviour of ipw, various users reported
 that wpa_supplicant was confused. I particularly trust the report of
 Bill Moss, who has been hacking ipw for a long time :
 
 http://sourceforge.net/mailarchive/forum.php?thread_id=10091113&forum_id=38938

Hmm.. Can someone please describe what was changed? Just sending
SIOCGIWSCAN events more frequently? I have not seen any problems with
this in my tests (though, mainly with madwifi-ng). Is the broken case
available in one of the kernel trees? 2.6.16? wireless-2.6? (i.e., where
can I get the exact version of ipw2200 driver that is expected to show
incorrect behavior)?

   Jouni was notified, but did not really answer to that bug report.
   Then, the ipw maintainers commited the following patch to ipw
 that fix or workaround that issue :
 
 http://marc.theaimsgroup.com/?l=linux-netdev&m=114492056522667&w=2

Hmm.. I don't remember having seen that report from Bill Moss.. How was
I notified? ;-) The patch here seems to be moving ipw_disassociate()
call, so it is not obviously clear from that what the impact on behavior
is. I can try to reproduce this, but I would like to know what version
to test with in order to avoid any possible workarounds from hiding the
issue.

-- 
Jouni MalinenPGP id EFC895FA


Re: [RFC] Netlink and user-space buffer pointers

2006-04-20 Thread Mike Christie
James Smart wrote:
 
 Mike Christie wrote:
 For the tasks you want to do for the fc class is performance critical?
 
 No, it should not be.
 
 If not, you could do what the iscsi class (for the netdev people this is
 drivers/scsi/scsi_transport_iscsi.c) does and just suffer a couple
 copies. For iscsi we do this in userspace to send down a login pdu:

 /*
  * xmitbuf is a buffer that is large enough for the iscsi_event,
  * iscsi pdu (hdr_size) and iscsi pdu data (data_size)
  */
 
 Well, the real difference is that the payload of the message is actually
 the payload of the SCSI command or ELS/CT Request. Thus, the payload may

I am not sure I follow. For iscsi, everything after the iscsi_event
struct can be the iscsi request that is to be transmitted. The payload
will not normally be Mbytes but it is not a couple of bytes.

 range in size from a few hundred bytes to several kbytes ( 1 page) to
 Mbyte's in size. Rather than buffer all of this, and push it over the
 socket,
 thus the extra copies - it would best to have the LLDD simply DMA the
 payload like on a typical SCSI command.  Additionally, there will be
 response data that can be several kbytes in length.
 

Once you have got the buffer to the class, the class can create a
scatterlist to DMA from for the LLD. I thought. iscsi does not do this
just because it is software right now. For qla4xxx we do not need
something like what you are talking about (see below for what I was
thinking about for the initiators). If you are saying the extra step of
the copy is plain dumb, I agree, but this happens (you have to suffer
some copy and cannot do dio) for sg io as well in some cases. I think
for the sg driver the copy_*_user is the default.

Instead of netlink for scsi commands and transport requests

For scsi commands could we just use sg io, or is there something special
about the command you want to send? If you can use sg io for scsi
commands, maybe for transport level requests (in my example iscsi pdu)
we could modify something like sg/bsg/block layer scsi_ioctl.c to send
down transport requests to the classes and encapsulate them in some new
struct transport_requests or use the existing struct request but do that
thing people keep talking about using the request/request_queue for
message passing.


Re: [RFC] Netlink and user-space buffer pointers

2006-04-20 Thread Mike Christie
Mike Christie wrote:
 James Smart wrote:
 Mike Christie wrote:
 For the tasks you want to do for the fc class is performance critical?
 No, it should not be.

 If not, you could do what the iscsi class (for the netdev people this is
 drivers/scsi/scsi_transport_iscsi.c) does and just suffer a couple
 copies. For iscsi we do this in userspace to send down a login pdu:

 /*
  * xmitbuf is a buffer that is large enough for the iscsi_event,
  * iscsi pdu (hdr_size) and iscsi pdu data (data_size)
  */
 Well, the real difference is that the payload of the message is actually
 the payload of the SCSI command or ELS/CT Request. Thus, the payload may
 
 I am not sure I follow. For iscsi, everything after the iscsi_event
 struct can be the iscsi request that is to be transmitted. The payload
 will not normally be Mbytes but it is not a couple of bytes.
 
 range in size from a few hundred bytes to several kbytes ( 1 page) to
 Mbyte's in size. Rather than buffer all of this, and push it over the
 socket,
 thus the extra copies - it would best to have the LLDD simply DMA the
 payload like on a typical SCSI command.  Additionally, there will be
 response data that can be several kbytes in length.

 
 Once you have got the buffer to the class, the class can create a
 scatterlist to DMA from for the LLD. I thought. iscsi does not do this
 just because it is software right now. For qla4xxx we do not need

That should be, we do need.


[patch 1/3] softmac: add SIOCSIWMLME

2006-04-20 Thread Johannes Berg
This patch adds the SIOCSIWMLME wext to softmac, this functionality
appears to be used by wpa_supplicant and is softmac-specific.

Signed-off-by: Johannes Berg [EMAIL PROTECTED]
Cc: Jouni Malinen [EMAIL PROTECTED]

--- wireless-2.6.orig/net/ieee80211/softmac/ieee80211softmac_priv.h	2006-04-19 18:44:51.710074158 +0200
+++ wireless-2.6/net/ieee80211/softmac/ieee80211softmac_priv.h	2006-04-20 00:50:54.930882874 +0200
@@ -150,6 +150,7 @@ int ieee80211softmac_handle_disassoc(str
 int ieee80211softmac_handle_reassoc_req(struct net_device * dev,
 					struct ieee80211_reassoc_request * reassoc);
 void ieee80211softmac_assoc_timeout(void *d);
+void ieee80211softmac_disassoc(struct ieee80211softmac_device *mac, u16 reason);
 
 /* some helper functions */
 static inline int ieee80211softmac_scan_handlers_check_self(struct ieee80211softmac_device *sm)
--- wireless-2.6.orig/net/ieee80211/softmac/ieee80211softmac_wx.c	2006-04-19 18:44:51.710074158 +0200
+++ wireless-2.6/net/ieee80211/softmac/ieee80211softmac_wx.c	2006-04-19 18:48:52.200074158 +0200
@@ -424,3 +424,35 @@ ieee80211softmac_wx_get_genie(struct net
 }
 EXPORT_SYMBOL_GPL(ieee80211softmac_wx_get_genie);
 
+int
+ieee80211softmac_wx_set_mlme(struct net_device *dev,
+			     struct iw_request_info *info,
+			     union iwreq_data *wrqu,
+			     char *extra)
+{
+	struct ieee80211softmac_device *mac = ieee80211_priv(dev);
+	struct iw_mlme *mlme = (struct iw_mlme *)extra;
+	u16 reason = cpu_to_le16(mlme->reason_code);
+	struct ieee80211softmac_network *net;
+
+	if (memcmp(mac->associnfo.bssid, mlme->addr.sa_data, ETH_ALEN)) {
+		printk(KERN_DEBUG PFX "wx_set_mlme: requested operation on net we don't use\n");
+		return -EINVAL;
+	}
+
+	switch (mlme->cmd) {
+	case IW_MLME_DEAUTH:
+		net = ieee80211softmac_get_network_by_bssid_locked(mac, mlme->addr.sa_data);
+		if (!net) {
+			printk(KERN_DEBUG PFX "wx_set_mlme: we should know the net here...\n");
+			return -EINVAL;
+		}
+		return ieee80211softmac_deauth_req(mac, net, reason);
+	case IW_MLME_DISASSOC:
+		ieee80211softmac_disassoc(mac, reason);
+		return 0;
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+EXPORT_SYMBOL_GPL(ieee80211softmac_wx_set_mlme);
--- wireless-2.6.orig/include/net/ieee80211softmac_wx.h	2006-03-28 16:23:31.0 +0200
+++ wireless-2.6/include/net/ieee80211softmac_wx.h	2006-04-19 18:48:30.640074158 +0200
@@ -91,4 +91,9 @@ ieee80211softmac_wx_get_genie(struct net
 			      struct iw_request_info *info,
 			      union iwreq_data *wrqu,
 			      char *extra);
+extern int
+ieee80211softmac_wx_set_mlme(struct net_device *dev,
+			     struct iw_request_info *info,
+			     union iwreq_data *wrqu,
+			     char *extra);
 #endif /* _IEEE80211SOFTMAC_WX */
--- wireless-2.6.orig/net/ieee80211/softmac/ieee80211softmac_assoc.c	2006-04-19 18:46:29.0 +0200
+++ wireless-2.6/net/ieee80211/softmac/ieee80211softmac_assoc.c	2006-04-19 18:46:47.300074158 +0200
@@ -82,7 +82,7 @@ ieee80211softmac_assoc_timeout(void *d)
 }
 
 /* Sends out a disassociation request to the desired AP */
-static void
+void
 ieee80211softmac_disassoc(struct ieee80211softmac_device *mac, u16 reason)
 {
 	unsigned long flags;

--



[patch 2/3] softmac: fix SIOCSIWAP

2006-04-20 Thread Johannes Berg
There are some bugs in the current implementation of the SIOCSIWAP wext,
for example that when you do it twice and it fails, it may still try
another access point for some reason. This patch fixes this by introducing
a new flag that tells the association code that the bssid that is in use
was fixed by the user and shouldn't be deviated from.

Signed-off-by: Johannes Berg [EMAIL PROTECTED]

--- wireless-2.6.orig/include/net/ieee80211softmac.h	2006-04-13 15:48:12.0 +0200
+++ wireless-2.6/include/net/ieee80211softmac.h	2006-04-20 01:10:32.770882874 +0200
@@ -96,10 +96,13 @@ struct ieee80211softmac_assoc_info {
 	 *
 	 * bssvalid is true if we found a matching network
 	 * and saved it's BSSID into the bssid above.
+	 *
+	 * bssfixed is used for SIOCSIWAP.
 	 */
 	u8 static_essid:1,
 	   associating:1,
-	   bssvalid:1;
+	   bssvalid:1,
+	   bssfixed:1;
 
 	/* Scan retries remaining */
 	int scan_retry;
--- wireless-2.6.orig/net/ieee80211/softmac/ieee80211softmac_assoc.c
2006-04-19 18:46:47.0 +0200
+++ wireless-2.6/net/ieee80211/softmac/ieee80211softmac_assoc.c 2006-04-20 
01:30:59.090882874 +0200
@@ -144,6 +144,12 @@ network_matches_request(struct ieee80211
if (!we_support_all_basic_rates(mac, net-rates_ex, net-rates_ex_len))
return 0;
 
+   /* assume that users know what they're doing ...
+* (note we don't let them select a net we're incompatible with) */
+   if (mac-associnfo.bssfixed) {
+   return !memcmp(mac-associnfo.bssid, net-bssid, ETH_ALEN);
+   }
+
/* if 'ANY' network requested, take any that doesn't have privacy 
enabled */
if (mac-associnfo.req_essid.len == 0 
 !(net-capability  WLAN_CAPABILITY_PRIVACY))
@@ -176,7 +182,7 @@ ieee80211softmac_assoc_work(void *d)
ieee80211softmac_disassoc(mac, WLAN_REASON_DISASSOC_STA_HAS_LEFT);
 
/* try to find the requested network in our list, if we found one already */
-   if (mac->associnfo.bssvalid)
+   if (mac->associnfo.bssvalid || mac->associnfo.bssfixed)
found = ieee80211softmac_get_network_by_bssid(mac, mac->associnfo.bssid);

/* Search the ieee80211 networks for this network if we didn't find it by bssid,
@@ -241,19 +247,25 @@ ieee80211softmac_assoc_work(void *d)
if (ieee80211softmac_start_scan(mac))
dprintk(KERN_INFO PFX "Associate: failed to initiate scan. Is device up?\n");
return;
-   }
-   else {
+   } else {
spin_lock_irqsave(&mac->lock, flags);
mac->associnfo.associating = 0;
mac->associated = 0;
spin_unlock_irqrestore(&mac->lock, flags);
 
dprintk(KERN_INFO PFX "Unable to find matching network after scan!\n");
+   /* reset the retry counter for the next user request since we
+* break out and don't reschedule ourselves after this point. */
+   mac->associnfo.scan_retry = IEEE80211SOFTMAC_ASSOC_SCAN_RETRY_LIMIT;
ieee80211softmac_call_events(mac, IEEE80211SOFTMAC_EVENT_ASSOCIATE_NET_NOT_FOUND, NULL);
return;
}
}
-   
+
+   /* reset the retry counter for the next user request since we
+* now found a net and will try to associate to it, but not
+* schedule this function again. */
+   mac->associnfo.scan_retry = IEEE80211SOFTMAC_ASSOC_SCAN_RETRY_LIMIT;
mac->associnfo.bssvalid = 1;
memcpy(mac->associnfo.bssid, found->bssid, ETH_ALEN);
/* copy the ESSID for displaying it */
--- wireless-2.6.orig/net/ieee80211/softmac/ieee80211softmac_wx.c 2006-04-19 18:48:52.0 +0200
+++ wireless-2.6/net/ieee80211/softmac/ieee80211softmac_wx.c 2006-04-20 15:27:26.122486954 +0200
@@ -27,7 +27,8 @@
 #include "ieee80211softmac_priv.h"
 
 #include <net/iw_handler.h>
-
+/* for is_broadcast_ether_addr and is_zero_ether_addr */
+#include <linux/etherdevice.h>
 
 int
 ieee80211softmac_wx_trigger_scan(struct net_device *net_dev,
@@ -83,7 +84,6 @@ ieee80211softmac_wx_set_essid(struct net
sm->associnfo.static_essid = 1;
}
}
-   sm->associnfo.scan_retry = IEEE80211SOFTMAC_ASSOC_SCAN_RETRY_LIMIT;
 
/* set our requested ESSID length.
 * If applicable, we have already copied the data in */
@@ -310,8 +310,6 @@ ieee80211softmac_wx_set_wap(struct net_d
char *extra)
 {
struct ieee80211softmac_device *mac = ieee80211_priv(net_dev);
-   static const unsigned char any[] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
-   static const unsigned char off[] = {0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
unsigned long flags;
 

[patch 0/3] softmac: more fixes

2006-04-20 Thread Johannes Berg
This patchset fixes more things in softmac, the first patch implements
the SIOCSIWMLME wext, the second fixes the SIOCSIWAP wext and the third
cleans up the event code.

The second is a fairly important fix for wpa_supplicant and should probably
still go to 2.6.17, the others can go in too of course but aren't that
important I think.

johannes


Re: [XFRM Doc]: aevent description

2006-04-20 Thread David S. Miller
From: jamal [EMAIL PROTECTED]
Date: Thu, 20 Apr 2006 06:58:45 -0400

 On Fri, 2006-14-04 at 15:05 -0700, David S. Miller wrote:
  From: jamal [EMAIL PROTECTED]
  Date: Thu, 13 Apr 2006 09:00:08 -0400
  
   There is dependency on the previous patch i sent since the issue that
   patch fixes is assumed in this text description. It would be a good
   idea to apply at the same time as the other.
  
  Applied, after fixing 28 lines containing trailing whitespace :-)
 
 yikes ;-
 Ok, so how do i avoid this in the future? Note, this was a _brand new_
 file, so it is a little bizarre.

This command:

git apply --check --whitespace=error-all $1

will spit out errors if your patch adds trailing whitespace
or will not apply cleanly to the current GIT tree.
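
For illustration only, the trailing-whitespace half of that check can be approximated in a few lines of C. This is a sketch under stated assumptions, not what git does internally: the function name is made up, it only looks at added lines of a unified diff, and it treats any line starting with "+++" as a file header instead of tracking hunks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count the lines a unified diff adds ('+' prefix) that end in a space
 * or a tab. Lines starting with "+++" are treated as file headers and
 * skipped; a real parser would track hunk boundaries instead.
 */
static int count_trailing_ws_additions(const char *patch)
{
	int bad = 0;
	const char *line = patch;

	assert(patch);

	while (*line) {
		const char *end = strchr(line, '\n');
		size_t len = end ? (size_t)(end - line) : strlen(line);

		if (len > 1 && line[0] == '+' && strncmp(line, "+++", 3) != 0) {
			char last = line[len - 1];

			if (last == ' ' || last == '\t')
				bad++;
		}

		/* Step past this line and its newline, if there is one. */
		line += len;
		if (*line == '\n')
			line++;
	}
	return bad;
}
```

Context and removed lines are ignored on purpose, since only whitespace that the patch itself introduces is of interest.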


Re: Van Jacobson's net channels and real-time

2006-04-20 Thread David S. Miller

[ Maybe ask questions like this on netdev where the networking
  developers hang out?  Added to CC: ]

Van fell off the face of the planet after giving his presentation and
never published his code, only his slides.

I've started to make a slow attempt at implementing his ideas, nothing
but pure infrastructure so far, but you can look at what I have here:

kernel.org:/pub/scm/linux/kernel/git/davem/vj-2.6.git

don't expect major progress and don't expect anything beyond a simple
channel to softint packet processing on receive any time soon.

Going all the way to the socket is a large endeavor and will require a
lot of restructuring to do it right, so expect this to take on the
order of months.


Re: sendpage and high mem pages

2006-04-20 Thread David S. Miller
From: Mike Christie [EMAIL PROTECTED]
Date: Thu, 20 Apr 2006 14:29:06 -0500

 I was wondering if it is ok to pass sendpage high mem pages. If a piece
 of code does this:
 
 struct socket *sock;
 
 sock->ops->sendpage(pg...)
 
 and pg is a highmem page will the network layer do the right thing or
 should the caller check the page type and call sock_no_sendpage() for
 highmem? It looks like net/sunrpc/xprtsock.c does a check but
 drivers/scsi/iscsi_tcp.c and some others do not.

TCP and others handle this just fine, if something doesn't then it
needs to be fixed.  Any page in the page cache can be sent over this
interface.


Re: Please pull upstream-fixes branch of wireless-2.6

2006-04-20 Thread Andrew Morton
John W. Linville [EMAIL PROTECTED] wrote:

 At present, all the branches in wireless-2.6 only pull from linux-2.6.
  I am still pushing (i.e. requesting Jeff's pull) to netdev-2.6,
  if that matters.
 
  Maybe the current wireless-2.6 tree fits into your system better?

Works well, thanks.   I have some patches for you ;)


Re: [RFC] Netlink and user-space buffer pointers

2006-04-20 Thread James Smart

Note: We've transitioned off topic. If what this means is there isn't a good
way except by ioctls (which still isn't easily portable) or system calls,
then that's ok. Then at least we know the limits and can look at other
implementation alternatives.

Mike Christie wrote:

James Smart wrote:

Mike Christie wrote:

For the tasks you want to do for the fc class is performance critical?

No, it should not be.


If not, you could do what the iscsi class (for the netdev people this is
drivers/scsi/scsi_transport_iscsi.c) does and just suffer a couple
copies. For iscsi we do this in userspace to send down a login pdu:

/*
 * xmitbuf is a buffer that is large enough for the iscsi_event,
 * iscsi pdu (hdr_size) and iscsi pdu data (data_size)
 */

Well, the real difference is that the payload of the message is actually
the payload of the SCSI command or ELS/CT Request. Thus, the payload may


I am not sure I follow. For iscsi, everything after the iscsi_event
struct can be the iscsi request that is to be transmitted. The payload
will not normally be Mbytes but it is not a couple of bytes.


True... For a large read/write - it will eventually total what the i/o
request size was, and you did have to push it through the socket.
What this discussion really comes down to is the difference between initiator
offload and what a target does.

The initiator offloads the full i/o from the users - e.g. send command,
get response. In the initiator case, the user isn't aware of each and
every IU that makes up the i/o. As it's on an i/o basis, the LLDD doing
the offload needs the full buffer sitting and ready. DMA is preferred so
the buffer doesn't have to be consuming socket/kernel/driver buffers while
it's pending - plus speed.

In the target case, the target controls each IU and its size, thus it
only has to have access to as much buffer space as it wants to push the next
IU. The i/o can be paced by the target. Unfortunately, this is an entirely
different use model than users of a scsi initiator expect, and it won't map
well into replacing things like our sg_io ioctls.


Instead of netlink for scsi commands and transport requests

For scsi commands could we just use sg io, or is there something special
about the command you want to send? If you can use sg io for scsi
commands, maybe for transport level requests (in my example iscsi pdu)
we could modify something like sg/bsg/block layer scsi_ioctl.c to send
down transport requests to the classes and encapsulate them in some new
struct transport_requests or use the existing struct request but do that
thing people keep talking about using the request/request_queue for
message passing.


Well - there's 2 parts to this answer:

First : IOCTL's are considered dangerous/bad practice and therefore it would
  be nice to find a replacement mechanism that eliminates them. If that
  mechanism has some of the cool features that netlink does, even better.
  Using sg io, in the manner you indicate, wouldn't remove the ioctl use.
  Note: I have OEMs/users that are very confused about the community's statement
  about ioctls. They've heard they are bad, should never be allowed, will no
  longer be supported, but yet they are at the heart of DM and sg io and other
  subsystems. Other than a grandfathered explanation, they don't understand
  why the rules bend for one piece of code but not for another. To them, all
  the features are just as critical regardless of who's providing them.

Second: transport level i/o could be done like you suggest, and we've
  prototyped some of this as well. However, there's something very wrong
  about putting block device wrappers and settings around something that
  is not a block device.  In general, it's a heck of a lot of overhead and
  still doesn't solve the real issue - how to portably pass that user buffer
  in to/out of the kernel.


-- james s


Re: [RFC] Netlink and user-space buffer pointers

2006-04-20 Thread Douglas Gilbert
Mike Christie wrote:
 James Smart wrote:
 
Mike Christie wrote:

For the tasks you want to do for the fc class is performance critical?

No, it should not be.


If not, you could do what the iscsi class (for the netdev people this is
drivers/scsi/scsi_transport_iscsi.c) does and just suffer a couple
copies. For iscsi we do this in userspace to send down a login pdu:

/*
 * xmitbuf is a buffer that is large enough for the iscsi_event,
 * iscsi pdu (hdr_size) and iscsi pdu data (data_size)
 */

Well, the real difference is that the payload of the message is actually
the payload of the SCSI command or ELS/CT Request. Thus, the payload may
 
 
 I am not sure I follow. For iscsi, everything after the iscsi_event
 struct can be the iscsi request that is to be transmitted. The payload
 will not normally be Mbytes but it is not a couple of bytes.
 
 
range in size from a few hundred bytes to several kbytes ( 1 page) to
Mbyte's in size. Rather than buffer all of this, and push it over the
socket,
thus the extra copies - it would be best to have the LLDD simply DMA the
payload like on a typical SCSI command.  Additionally, there will be
response data that can be several kbytes in length.

 
 
 Once you have got the buffer to the class, the class can create a
 scatterlist to DMA from for the LLD. I thought. iscsi does not do this
 just because it is software right now. For qla4xxx we do not need
 something like what you are talking about (see below for what I was
 thinking about for the initiators). If you are saying the extra step of
 the copy is plain dumb, I agree, but this happens (you have to suffer
 some copy and cannot do dio) for sg io as well in some cases. I think
 for the sg driver the copy_*_user is the default.

Mike,
Indirect IO is the default in the sg driver because:
  - it has always been thus
  - the sg driver is less constrained (e.g. max number
of scatter-gather elements is a bigger issue with dio)
  - the only alignment to worry about is byte
alignment (some folks would like bit alignment
but you can't please everybody)
  - there is no need for the sg driver to pin user
pages in memory (as there is with direct IO and
mmaped-IO)
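
To make the indirect versus direct distinction concrete, here is a hedged sketch of how a pass-through user might prepare an SG_IO request and opt in to direct IO. The helper name and the choice of a 6-byte INQUIRY are illustrative assumptions; sg_io_hdr_t, SG_DXFER_FROM_DEV and SG_FLAG_DIRECT_IO come from <scsi/sg.h>. Without the flag the driver uses its default indirect (copying) path.

```c
#include <assert.h>
#include <string.h>
#include <scsi/sg.h>

/*
 * Hypothetical helper: fill an sg_io_hdr_t for a 6-byte INQUIRY.
 * When want_direct_io is non-zero the caller asks the sg driver to
 * attempt direct IO; the driver may still fall back to indirect IO.
 */
static void demo_fill_inquiry(sg_io_hdr_t *hdr, unsigned char cdb[6],
			      unsigned char *buf, unsigned char buf_len,
			      unsigned char *sense, unsigned char sense_len,
			      int want_direct_io)
{
	assert(hdr && cdb && buf && sense);

	memset(hdr, 0, sizeof(*hdr));
	memset(cdb, 0, 6);
	cdb[0] = 0x12;		/* INQUIRY opcode */
	cdb[4] = buf_len;	/* allocation length */

	hdr->interface_id = 'S';
	hdr->dxfer_direction = SG_DXFER_FROM_DEV;
	hdr->cmd_len = 6;
	hdr->cmdp = cdb;
	hdr->dxferp = buf;
	hdr->dxfer_len = buf_len;
	hdr->sbp = sense;
	hdr->mx_sb_len = sense_len;
	hdr->timeout = 5000;	/* milliseconds */

	if (want_direct_io)
		hdr->flags |= SG_FLAG_DIRECT_IO;
}
```

The filled header would then be passed to ioctl(fd, SG_IO, &hdr) on an open sg device; afterwards (hdr.info & SG_INFO_DIRECT_IO_MASK) reports whether direct IO was actually used.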

 Instead of netlink for scsi commands and transport requests

With a netlink based pass through one might:
  - improve on the SG_IO ioctl and add things like
tags that are currently missing
  - introduce a proper SCSI task management function
pass through (no request queue please)
  - make other pass throughs for SAS: SMP and STP
  - have an alternative to sysfs for various control
functions in a HBA (e.g. in SAS: link and hard
reset) and fetching performance data from a HBA

Apart from how to get data efficiently between the HBA
and the user space, another major issue is the flexibility
of the bind() in s_netlink (storage netlink??).

 For scsi commands could we just use sg io, or is there something special
 about the command you want to send? If you can use sg io for scsi
 commands, maybe for transport level requests (in my example iscsi pdu)
 we could modify something like sg/bsg/block layer scsi_ioctl.c to send
 down transport requests to the classes and encapsulate them in some new
 struct transport_requests or use the existing struct request but do that
 thing people keep talking about using the request/request_queue for
 message passing.

Some SG_IO ioctl users want up to 32 MB in one transaction
and others want their data fast. Many pass through users
view the kernel as an impediment (not so much as the way
as in the way).

Doug Gilbert


Re: [RFC] Netlink and user-space buffer pointers

2006-04-20 Thread Mike Christie
James Smart wrote:
 Note: We've transitioned off topic. If what this means is there isn't a
 good
 way except by ioctls (which still isn't easily portable) or system calls,
 then that's ok. Then at least we know the limits and can look at other
 implementation alternatives.
 
 Mike Christie wrote:
 James Smart wrote:
 Mike Christie wrote:
 For the tasks you want to do for the fc class is performance critical?
 No, it should not be.

 If not, you could do what the iscsi class (for the netdev people
 this is
 drivers/scsi/scsi_transport_iscsi.c) does and just suffer a couple
 copies. For iscsi we do this in userspace to send down a login pdu:

 /*
  * xmitbuf is a buffer that is large enough for the iscsi_event,
  * iscsi pdu (hdr_size) and iscsi pdu data (data_size)
  */
 Well, the real difference is that the payload of the message is
 actually
 the payload of the SCSI command or ELS/CT Request. Thus, the payload may

 I am not sure I follow. For iscsi, everything after the iscsi_event
 struct can be the iscsi request that is to be transmitted. The payload
 will not normally be Mbytes but it is not a couple of bytes.
 
 True... For a large read/write - it will eventually total what the i/o
 request size was, and you did have to push it through the socket.
 What this discussion really comes down to is the difference between
 initiator
 offload and what a target does.
 
 The initiator offloads the full i/o from the users - e.g. send command,
 get response. In the initiator case, the user isn't aware of each and
 every IU that makes up the i/o. As it's on an i/o basis, the LLDD doing
 the offload needs the full buffer sitting and ready. DMA is preferred so
 the buffer doesn't have to be consuming socket/kernel/driver buffers while
 it's pending - plus speed.
 
 In the target case, the target controls each IU and its size, thus it
 only has to have access to as much buffer space as it wants to push the
 next
 IU. The i/o can be paced by the target. Unfortunately, this is an
 entirely
 different use model than users of a scsi initiator expect, and it won't map
 well into replacing things like our sg_io ioctls.


I am not talking about the target here. For the open-iscsi initiator
that is in mainline, which I referenced in the example, we send pdus from
userspace to the LLD. In the future, for initiators that offload some iscsi
processing and will log in from userspace or have userspace monitor the
transport by doing iscsi pings, we need to be able to send these pdus.
And the iscsi pdu cannot be broken up at the iscsi level (it can at
the interconnect level though). From the iscsi host level they have to go
out like a scsi command would, in that the LLD cannot decide to send out
multiple pdus for the pdu that userspace sends down.

I do agree with you that targets can break down a scsi command into
multiple transport level packets as they see fit.


 
 Instead of netlink for scsi commands and transport requests

 For scsi commands could we just use sg io, or is there something special
 about the command you want to send? If you can use sg io for scsi
 commands, maybe for transport level requests (in my example iscsi pdu)
 we could modify something like sg/bsg/block layer scsi_ioctl.c to send
 down transport requests to the classes and encapsulate them in some new
 struct transport_requests or use the existing struct request but do that
 thing people keep talking about using the request/request_queue for
 message passing.
 
 Well - there's 2 parts to this answer:
 
 First : IOCTL's are considered dangerous/bad practice and therefore it
 would

Yeah, I am not trying to kill ioctls. I go where the community goes.
What I am trying to do is just reuse the sg io mapping code so that we do
not end up with sg, st, target, blk scsi_ioctl.c and bsg all doing
similar things.


   be nice to find a replacement mechanism that eliminates them. If that
   mechanism has some of the cool features that netlink does, even better.
   Using sg io, in the manner you indicate, wouldn't remove the ioctl use.
   Note: I have OEMs/users that are very confused about the community's
   statement about ioctls. They've heard they are bad, should never be
   allowed, will no longer be supported, but yet they are at the heart of
   DM and sg io and other subsystems. Other than a grandfathered
   explanation, they don't understand why the rules bend for one piece of
   code but not for another. To them, all the features are just as
   critical regardless of who's providing them.
 
 Second: transport level i/o could be done like you suggest, and we've
   prototyped some of this as well. However, there's something very wrong
   about putting block device wrappers and settings around something that
   is not a block device.  In general, it's a heck of a lot of overhead and
   still doesn't solve the real issue - how to portably pass that user
 buffer


I am not talking about putting block device wrappers. This the magic
part and the 

Re: [RFC] Netlink and user-space buffer pointers

2006-04-20 Thread Mike Christie
Mike Christie wrote:
 James Smart wrote:
 Note: We've transitioned off topic. If what this means is there isn't a
 good
 way except by ioctls (which still isn't easily portable) or system calls,
 then that's ok. Then at least we know the limits and can look at other
 implementation alternatives.

 Mike Christie wrote:
 James Smart wrote:
 Mike Christie wrote:
 For the tasks you want to do for the fc class is performance critical?
 No, it should not be.

 If not, you could do what the iscsi class (for the netdev people
 this is
 drivers/scsi/scsi_transport_iscsi.c) does and just suffer a couple
 copies. For iscsi we do this in userspace to send down a login pdu:

 /*
  * xmitbuf is a buffer that is large enough for the iscsi_event,
  * iscsi pdu (hdr_size) and iscsi pdu data (data_size)
  */
 Well, the real difference is that the payload of the message is
 actually
 the payload of the SCSI command or ELS/CT Request. Thus, the payload may
 I am not sure I follow. For iscsi, everything after the iscsi_event
 struct can be the iscsi request that is to be transmitted. The payload
 will not normally be Mbytes but it is not a couple of bytes.
 True... For a large read/write - it will eventually total what the i/o
 request size was, and you did have to push it through the socket.
 What this discussion really comes down to is the difference between
 initiator
 offload and what a target does.

 The initiator offloads the full i/o from the users - e.g. send command,
 get response. In the initiator case, the user isn't aware of each and
 every IU that makes up the i/o. As it's on an i/o basis, the LLDD doing
 the offload needs the full buffer sitting and ready. DMA is preferred so
 the buffer doesn't have to be consuming socket/kernel/driver buffers while
 it's pending - plus speed.

 In the target case, the target controls each IU and its size, thus it
 only has to have access to as much buffer space as it wants to push the
 next
 IU. The i/o can be paced by the target. Unfortunately, this is an
 entirely
 different use model than users of a scsi initiator expect, and it won't map
 well into replacing things like our sg_io ioctls.
 
 
 I am not talking about the target here. For the open-iscsi initiator
 that is in mainline, which I referenced in the example, we send pdus from
 userspace to the LLD. In the future, for initiators that offload some iscsi
 processing and will log in from userspace or have userspace monitor the
 transport by doing iscsi pings, we need to be able to send these pdus.
 And the iscsi pdu cannot be broken up at the iscsi level (it can at
 the interconnect level though). From the iscsi host level they have to go
 out like a scsi command would, in that the LLD cannot decide to send out
 multiple pdus for the pdu that userspace sends down.
 
 I do agree with you that targets can break down a scsi command into
 multiple transport level packets as they see fit.
 

Oh yeah is

FC IU == iscsi tcp packet
or
FC IU == iscsi pdu
?


[PATCH 4/10] [IOAT] Setup the net subsystem as DMA client

2006-04-20 Thread Andrew Grover

Attempts to allocate per-CPU DMA channels

Signed-off-by: Chris Leech [EMAIL PROTECTED]

---

 drivers/dma/Kconfig   |   12 +
 include/linux/netdevice.h |4 ++
 include/net/netdma.h  |   38 
 net/core/dev.c|  104 +

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 0f15e76..30d021d 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -10,6 +10,18 @@ config DMA_ENGINE
  DMA engines offload copy operations from the CPU to dedicated
  hardware, allowing the copies to happen asynchronously.
 
+comment "DMA Clients"
+
+config NET_DMA
+   bool "Network: TCP receive copy offload"
+   depends on DMA_ENGINE && NET
+   default y
+   ---help---
+ This enables the use of DMA engines in the network stack to
+ offload receive copy-to-user operations, freeing CPU cycles.
+ Since this is the main user of the DMA engine, it should be enabled;
+ say Y here.
+
 comment "DMA Devices"
 
 config INTEL_IOATDMA
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 950dc55..7fda35f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -37,6 +37,7 @@
 #include <linux/config.h>
 #include <linux/device.h>
 #include <linux/percpu.h>
+#include <linux/dmaengine.h>
 
 struct divert_blk;
 struct vlan_group;
@@ -592,6 +593,9 @@ struct softnet_data
struct sk_buff  *completion_queue;
 
struct net_device   backlog_dev;/* Sorry. 8) */
+#ifdef CONFIG_NET_DMA
+   struct dma_chan *net_dma;
+#endif
 };
 
 DECLARE_PER_CPU(struct softnet_data,softnet_data);
diff --git a/include/net/netdma.h b/include/net/netdma.h
new file mode 100644
index 000..cbfe89d
--- /dev/null
+++ b/include/net/netdma.h
@@ -0,0 +1,38 @@
+/*
+ * Copyright(c) 2004 - 2006 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59
+ * Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+ *
+ * The full GNU General Public License is included in this distribution in the
+ * file called COPYING.
+ */
+#ifndef NETDMA_H
+#define NETDMA_H
+#include <linux/config.h>
+#ifdef CONFIG_NET_DMA
+#include <linux/dmaengine.h>
+
+static inline struct dma_chan *get_softnet_dma(void)
+{
+   struct dma_chan *chan;
+   rcu_read_lock();
+   chan = rcu_dereference(__get_cpu_var(softnet_data.net_dma));
+   if (chan)
+   dma_chan_get(chan);
+   rcu_read_unlock();
+   return chan;
+}
+#endif /* CONFIG_NET_DMA */
+#endif /* NETDMA_H */
diff --git a/net/core/dev.c b/net/core/dev.c
index a3ab11f..ffd3d6d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -115,6 +115,7 @@
 #include <net/iw_handler.h>
 #include <asm/current.h>
 #include <linux/audit.h>
+#include <linux/dmaengine.h>
 
 /*
  * The list of packet types we will receive (as opposed to discard)
@@ -148,6 +149,12 @@ static DEFINE_SPINLOCK(ptype_lock);
 static struct list_head ptype_base[16];/* 16 way hashed list */
 static struct list_head ptype_all; /* Taps */
 
+#ifdef CONFIG_NET_DMA
+static struct dma_client *net_dma_client;
+static unsigned int net_dma_count;
+static spinlock_t net_dma_event_lock;
+#endif
+
 /*
  * The @dev_base list is protected by @dev_base_lock and the rtln
  * semaphore.
@@ -1780,6 +1787,19 @@ static void net_rx_action(struct softirq
}
}
 out:
+#ifdef CONFIG_NET_DMA
+   /*
+* There may not be any more sk_buffs coming right now, so push
+* any pending DMA copies to hardware
+*/
+   if (net_dma_client) {
+   struct dma_chan *chan;
+   rcu_read_lock();
+   list_for_each_entry_rcu(chan, &net_dma_client->channels, client_node)
+   dma_async_memcpy_issue_pending(chan);
+   rcu_read_unlock();
+   }
+#endif
local_irq_enable();
return;
 
@@ -3243,6 +3263,88 @@ static int dev_cpu_callback(struct notif
 }
 #endif /* CONFIG_HOTPLUG_CPU */
 
+#ifdef CONFIG_NET_DMA
+/**
+ * net_dma_rebalance -
+ * This is called when the number of channels allocated to the net_dma_client
+ * changes.  The net_dma_client tries to have one DMA channel per CPU.
+ */
+static void net_dma_rebalance(void)
+{
+   unsigned int cpu, i, n;
+   struct dma_chan *chan;
+
+   
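
The archive cuts the function off at this point, so the real body is not shown here. As a rough userspace sketch of what "one DMA channel per CPU" could mean when the counts do not match, the following spreads CPUs over channels as evenly as possible. The function name, the plain array standing in for the per-CPU pointer and the even-split policy are all assumptions for illustration, not the patch's code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: cpu_to_chan[cpu] receives the index of the channel
 * serving that CPU, or -1 when no channel is available. Each channel
 * takes ncpus / nchans CPUs, and the first ncpus % nchans channels take
 * one extra, so every CPU ends up with exactly one channel.
 */
static void demo_rebalance(int *cpu_to_chan, unsigned int ncpus,
			   unsigned int nchans)
{
	unsigned int cpu = 0, chan, n;

	assert(cpu_to_chan || ncpus == 0);

	if (nchans == 0) {
		for (cpu = 0; cpu < ncpus; cpu++)
			cpu_to_chan[cpu] = -1;
		return;
	}

	for (chan = 0; chan < nchans && cpu < ncpus; chan++) {
		/* Number of CPUs this channel serves. */
		n = ncpus / nchans + (chan < ncpus % nchans ? 1 : 0);

		for (; n > 0 && cpu < ncpus; n--)
			cpu_to_chan[cpu++] = (int)chan;
	}
}
```

With more channels than CPUs the surplus channels simply stay unused in this model; with fewer, neighbouring CPUs share a channel.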

[PATCH 3/10] [IOAT] Driver for the I/OAT DMA engine part 2

2006-04-20 Thread Andrew Grover

Adds a new ioatdma driver, ioatdma.c

Signed-off-by: Chris Leech [EMAIL PROTECTED]

---

 drivers/dma/ioatdma.c   |  805 +++

diff --git a/drivers/dma/ioatdma.c b/drivers/dma/ioatdma.c
new file mode 100644
index 000..ffe47dd
--- /dev/null
+++ b/drivers/dma/ioatdma.c
@@ -0,0 +1,805 @@
+/*
+ * Copyright(c) 2004 - 2006 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59
+ * Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+ *
+ * The full GNU General Public License is included in this distribution in the
+ * file called COPYING.
+ */
+
+/*
+ * This driver supports an Intel I/OAT DMA engine, which does asynchronous
+ * copy operations.
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/interrupt.h>
+#include <linux/dmaengine.h>
+#include <linux/delay.h>
+#include "ioatdma.h"
+#include "ioatdma_io.h"
+#include "ioatdma_registers.h"
+#include "ioatdma_hw.h"
+
+#define to_ioat_chan(chan) container_of(chan, struct ioat_dma_chan, common)
+#define to_ioat_device(dev) container_of(dev, struct ioat_device, common)
+#define to_ioat_desc(lh) container_of(lh, struct ioat_desc_sw, node)
+
+/* internal functions */
+static int __devinit ioat_probe(struct pci_dev *pdev, const struct pci_device_id *ent);
+static void __devexit ioat_remove(struct pci_dev *pdev);
+
+static int enumerate_dma_channels(struct ioat_device *device)
+{
+   u8 xfercap_scale;
+   u32 xfercap;
+   int i;
+   struct ioat_dma_chan *ioat_chan;
+
+   device->common.chancnt = ioatdma_read8(device, IOAT_CHANCNT_OFFSET);
+   xfercap_scale = ioatdma_read8(device, IOAT_XFERCAP_OFFSET);
+   xfercap = (xfercap_scale == 0 ? -1 : (1UL << xfercap_scale));
+
+   for (i = 0; i < device->common.chancnt; i++) {
+   ioat_chan = kzalloc(sizeof(*ioat_chan), GFP_KERNEL);
+   if (!ioat_chan) {
+   device->common.chancnt = i;
+   break;
+   }
+
+   ioat_chan->device = device;
+   ioat_chan->reg_base = device->reg_base + (0x80 * (i + 1));
+   ioat_chan->xfercap = xfercap;
+   spin_lock_init(&ioat_chan->cleanup_lock);
+   spin_lock_init(&ioat_chan->desc_lock);
+   INIT_LIST_HEAD(&ioat_chan->free_desc);
+   INIT_LIST_HEAD(&ioat_chan->used_desc);
+   /* This should be made common somewhere in dmaengine.c */
+   ioat_chan->common.device = &device->common;
+   ioat_chan->common.client = NULL;
+   list_add_tail(&ioat_chan->common.device_node,
+ &device->common.channels);
+   }
+   return device->common.chancnt;
+}
+
+static struct ioat_desc_sw *ioat_dma_alloc_descriptor(struct ioat_dma_chan *ioat_chan, int flags)
+{
+   struct ioat_dma_descriptor *desc;
+   struct ioat_desc_sw *desc_sw;
+   struct ioat_device *ioat_device;
+   dma_addr_t phys;
+
+   ioat_device = to_ioat_device(ioat_chan->common.device);
+   desc = pci_pool_alloc(ioat_device->dma_pool, flags, &phys);
+   if (unlikely(!desc))
+   return NULL;
+
+   desc_sw = kzalloc(sizeof(*desc_sw), flags);
+   if (unlikely(!desc_sw)) {
+   pci_pool_free(ioat_device->dma_pool, desc, phys);
+   return NULL;
+   }
+
+   memset(desc, 0, sizeof(*desc));
+   desc_sw->hw = desc;
+   desc_sw->phys = phys;
+
+   return desc_sw;
+}
+
+#define INITIAL_IOAT_DESC_COUNT 128
+
+static void ioat_start_null_desc(struct ioat_dma_chan *ioat_chan);
+
+/* returns the actual number of allocated descriptors */
+static int ioat_dma_alloc_chan_resources(struct dma_chan *chan)
+{
+   struct ioat_dma_chan *ioat_chan = to_ioat_chan(chan);
+   struct ioat_desc_sw *desc = NULL;
+   u16 chanctrl;
+   u32 chanerr;
+   int i;
+
+   /*
+* In-use bit automatically set by reading chanctrl
+* If 0, we got it, if 1, someone else did
+*/
+   chanctrl = ioatdma_chan_read16(ioat_chan, IOAT_CHANCTRL_OFFSET);
+   if (chanctrl & IOAT_CHANCTRL_CHANNEL_IN_USE)
+   return -EBUSY;
+
+/* Setup register to interrupt and write completion status on error */
+   chanctrl = IOAT_CHANCTRL_CHANNEL_IN_USE |
+   IOAT_CHANCTRL_ERR_INT_EN 

[PATCH 7/10] [IOAT] cleanup_rbuf -> tcp_cleanup_rbuf and make non-static

2006-04-20 Thread Andrew Grover

Needed to be able to call tcp_cleanup_rbuf in tcp_input.c for I/OAT

Signed-off-by: Chris Leech [EMAIL PROTECTED]

---

 include/net/tcp.h |2 ++
 net/ipv4/tcp.c|   10 +-

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 54e4367..ca5bdaf 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -294,6 +294,8 @@ extern int  tcp_rcv_established(struct 
 
 extern void tcp_rcv_space_adjust(struct sock *sk);
 
+extern void tcp_cleanup_rbuf(struct sock *sk, int copied);
+
 extern int tcp_twsk_unique(struct sock *sk,
struct sock *sktw, void *twp);
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 87f68e7..b10f78c 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -937,7 +937,7 @@ static int tcp_recv_urg(struct sock *sk,
  * calculation of whether or not we must ACK for the sake of
  * a window update.
  */
-static void cleanup_rbuf(struct sock *sk, int copied)
+void tcp_cleanup_rbuf(struct sock *sk, int copied)
 {
struct tcp_sock *tp = tcp_sk(sk);
int time_to_ack = 0;
@@ -1086,7 +1086,7 @@ int tcp_read_sock(struct sock *sk, read_
 
/* Clean up data we have read: This will do ACK frames. */
if (copied)
-   cleanup_rbuf(sk, copied);
+   tcp_cleanup_rbuf(sk, copied);
return copied;
 }
 
@@ -1220,7 +1220,7 @@ int tcp_recvmsg(struct kiocb *iocb, stru
}
}
 
-   cleanup_rbuf(sk, copied);
+   tcp_cleanup_rbuf(sk, copied);
 
if (!sysctl_tcp_low_latency && tp->ucopy.task == user_recv) {
/* Install new reader */
@@ -1391,7 +1391,7 @@ skip_copy:
 */
 
/* Clean up data we have read: This will do ACK frames. */
-   cleanup_rbuf(sk, copied);
+   tcp_cleanup_rbuf(sk, copied);
 
TCP_CHECK_TIMER(sk);
release_sock(sk);
@@ -1853,7 +1853,7 @@ static int do_tcp_setsockopt(struct sock
(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT) &&
inet_csk_ack_scheduled(sk)) {
icsk->icsk_ack.pending |= ICSK_ACK_PUSHED;
-   cleanup_rbuf(sk, 1);
+   tcp_cleanup_rbuf(sk, 1);
if (!(val & 1))
icsk->icsk_ack.pingpong = 1;
}
-- 
1.2.6





[PATCH 9/10] [IOAT] Add sysctl to tuning IOAT offloaded IO threshold

2006-04-20 Thread Andrew Grover

Any socket recv of less than this amount will not be offloaded

Signed-off-by: Chris Leech [EMAIL PROTECTED]

---

 include/linux/sysctl.h |1 +
 include/net/tcp.h  |1 +
 net/core/user_dma.c|4 
 net/ipv4/sysctl_net_ipv4.c |   10 ++

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 76eaeff..cd9e7c0 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -403,6 +403,7 @@ enum
NET_TCP_MTU_PROBING=113,
NET_TCP_BASE_MSS=114,
NET_IPV4_TCP_WORKAROUND_SIGNED_WINDOWS=115,
+   NET_TCP_DMA_COPYBREAK=116,
 };
 
 enum {
diff --git a/include/net/tcp.h b/include/net/tcp.h
index ca5bdaf..2e6fdef 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -219,6 +219,7 @@ extern int sysctl_tcp_adv_win_scale;
 extern int sysctl_tcp_tw_reuse;
 extern int sysctl_tcp_frto;
 extern int sysctl_tcp_low_latency;
+extern int sysctl_tcp_dma_copybreak;
 extern int sysctl_tcp_nometrics_save;
 extern int sysctl_tcp_moderate_rcvbuf;
 extern int sysctl_tcp_tso_win_divisor;
diff --git a/net/core/user_dma.c b/net/core/user_dma.c
index ec177ef..642a3f3 100644
--- a/net/core/user_dma.c
+++ b/net/core/user_dma.c
@@ -33,6 +33,10 @@
 
 #ifdef CONFIG_NET_DMA
 
+#define NET_DMA_DEFAULT_COPYBREAK 1024
+
+int sysctl_tcp_dma_copybreak = NET_DMA_DEFAULT_COPYBREAK;
+
 /**
  * dma_skb_copy_datagram_iovec - Copy a datagram to an iovec.
  * @skb - buffer to copy
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 6b6c3ad..6a6aa53 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -688,6 +688,16 @@ ctl_table ipv4_table[] = {
.mode   = 0644,
.proc_handler   = proc_dointvec
},
+#ifdef CONFIG_NET_DMA
+   {
+   .ctl_name   = NET_TCP_DMA_COPYBREAK,
+   .procname   = "tcp_dma_copybreak",
+   .data   = &sysctl_tcp_dma_copybreak,
+   .maxlen = sizeof(int),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec
+   },
+#endif
{ .ctl_name = 0 }
 };
 
-- 
1.2.6





[PATCH 10/10] [IOAT] Actual changes to the net stack to use IOAT

2006-04-20 Thread Andrew Grover

Locks down user pages and sets up for DMA in tcp_recvmsg, then calls
dma_async_try_early_copy in tcp_v4_do_rcv

Signed-off-by: Chris Leech [EMAIL PROTECTED]

---

 net/ipv4/tcp.c   |  101 --
 net/ipv4/tcp_input.c |   74 +
 net/ipv4/tcp_ipv4.c  |   18 -
 net/ipv6/tcp_ipv6.c  |   12 +-

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 2346539..8be8d69 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -263,7 +263,7 @@
 #include <net/tcp.h>
 #include <net/xfrm.h>
 #include <net/ip.h>
-
+#include <net/netdma.h>
 
 #include <asm/uaccess.h>
 #include <asm/ioctls.h>
@@ -1110,6 +1110,7 @@ int tcp_recvmsg(struct kiocb *iocb, stru
int target; /* Read at least this many bytes */
long timeo;
struct task_struct *user_recv = NULL;
+   int copied_early = 0;
 
lock_sock(sk);
 
@@ -1133,6 +1134,15 @@ int tcp_recvmsg(struct kiocb *iocb, stru
 
target = sock_rcvlowat(sk, flags & MSG_WAITALL, len);
 
+#ifdef CONFIG_NET_DMA
+   tp->ucopy.dma_chan = NULL;
+   preempt_disable();
+   if ((len > sysctl_tcp_dma_copybreak) && !(flags & MSG_PEEK) &&
+   !sysctl_tcp_low_latency && __get_cpu_var(softnet_data.net_dma))
+   tp->ucopy.pinned_list = dma_pin_iovec_pages(msg->msg_iov, len);
+   preempt_enable_no_resched();
+#endif
+
do {
struct sk_buff *skb;
u32 offset;
@@ -1274,6 +1284,10 @@ int tcp_recvmsg(struct kiocb *iocb, stru
} else
sk_wait_data(sk, timeo);
 
+#ifdef CONFIG_NET_DMA
+   tp->ucopy.wakeup = 0;
+#endif
+
if (user_recv) {
int chunk;
 
@@ -1329,13 +1343,39 @@ do_prequeue:
}
 
if (!(flags & MSG_TRUNC)) {
-   err = skb_copy_datagram_iovec(skb, offset,
- msg->msg_iov, used);
-   if (err) {
-   /* Exception. Bailout! */
-   if (!copied)
-   copied = -EFAULT;
-   break;
+#ifdef CONFIG_NET_DMA
+   if (!tp->ucopy.dma_chan && tp->ucopy.pinned_list)
+   tp->ucopy.dma_chan = get_softnet_dma();
+
+   if (tp->ucopy.dma_chan) {
+   tp->ucopy.dma_cookie = dma_skb_copy_datagram_iovec(
+   tp->ucopy.dma_chan, skb, offset,
+   msg->msg_iov, used,
+   tp->ucopy.pinned_list);
+
+   if (tp->ucopy.dma_cookie < 0) {
+
+   printk(KERN_ALERT "dma_cookie < 0\n");
+
+   /* Exception. Bailout! */
+   if (!copied)
+   copied = -EFAULT;
+   break;
+   }
+   if ((offset + used) == skb->len)
+   copied_early = 1;
+
+   } else
+#endif
+   {
+   err = skb_copy_datagram_iovec(skb, offset,
+   msg->msg_iov, used);
+   if (err) {
+   /* Exception. Bailout! */
+   if (!copied)
+   copied = -EFAULT;
+   break;
+   }
}
}
 
@@ -1355,15 +1395,19 @@ skip_copy:
 
if (skb->h.th->fin)
goto found_fin_ok;
-   if (!(flags & MSG_PEEK))
-   sk_eat_skb(sk, skb, 0);
+   if (!(flags & MSG_PEEK)) {
+   sk_eat_skb(sk, skb, copied_early);
+   copied_early = 0;
+   }
continue;
 
found_fin_ok:
/* Process the FIN. */
++*seq;
-   if (!(flags & MSG_PEEK))
-   sk_eat_skb(sk, skb, 0);
+   if (!(flags & MSG_PEEK)) {
+   sk_eat_skb(sk, skb, copied_early);
+   copied_early = 0;
+   }
break;
} while (len > 0);
 
@@ -1386,6 +1430,36 @@ skip_copy:
tp->ucopy.len = 0;
}
 
+#ifdef CONFIG_NET_DMA
+   if (tp->ucopy.dma_chan) {
+   struct sk_buff *skb;
+   dma_cookie_t done, used;
+
+   dma_async_memcpy_issue_pending(tp->ucopy.dma_chan);
+
+   while (dma_async_memcpy_complete(tp->ucopy.dma_chan,
+
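
The diff above is cut off in this archive. As a reading aid, the offload
decision made at the top of tcp_recvmsg() can be modeled in plain userspace C.
This is only a sketch under stated assumptions, not the patch's code:
struct offload_inputs, should_try_dma_offload() and SKETCH_MSG_PEEK are names
invented here, and the kernel's per-CPU channel lookup is reduced to a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the MSG_PEEK flag bit; value chosen for this sketch. */
#define SKETCH_MSG_PEEK 0x2

/* Inputs the patch consults before pinning the reader's pages (names assumed). */
struct offload_inputs {
	size_t len;            /* bytes requested by the reader */
	int flags;             /* recv flags */
	int copybreak;         /* models sysctl_tcp_dma_copybreak */
	bool low_latency;      /* models sysctl_tcp_low_latency */
	bool cpu_has_channel;  /* models the per-CPU net_dma channel pointer */
};

/* True when a read is large enough, is not a peek, and a channel is available. */
static bool should_try_dma_offload(const struct offload_inputs *in)
{
	assert(in != NULL);
	assert(in->copybreak >= 0);

	return in->len > (size_t)in->copybreak &&
	       !(in->flags & SKETCH_MSG_PEEK) &&
	       !in->low_latency &&
	       in->cpu_has_channel;
}
```

With the default copybreak of 1024 from patch 9, a 4096-byte read would
qualify and a 512-byte read would not; a peek never qualifies because the data
must stay on the receive queue.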

[PATCH 1/10] [IOAT] DMA memcpy subsystem

2006-04-20 Thread Andrew Grover

Provides an API for offloading memory copies to DMA devices

Signed-off-by: Chris Leech [EMAIL PROTECTED]

---

 drivers/Kconfig   |2 
 drivers/Makefile  |1 
 drivers/dma/Kconfig   |   13 +
 drivers/dma/Makefile  |1 
 drivers/dma/dmaengine.c   |  405 +
 include/linux/dmaengine.h |  337 +

diff --git a/drivers/Kconfig b/drivers/Kconfig
index 9f5c0da..f89ac05 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -72,4 +72,6 @@ source "drivers/edac/Kconfig"
 
 source "drivers/rtc/Kconfig"
 
+source "drivers/dma/Kconfig"
+
 endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index 4249552..9b808a6 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -74,3 +74,4 @@ obj-$(CONFIG_SGI_SN)  += sn/
 obj-y  += firmware/
 obj-$(CONFIG_CRYPTO)   += crypto/
 obj-$(CONFIG_SUPERH)   += sh/
+obj-$(CONFIG_DMA_ENGINE)   += dma/
diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
new file mode 100644
index 000..f9ac4bc
--- /dev/null
+++ b/drivers/dma/Kconfig
@@ -0,0 +1,13 @@
+#
+# DMA engine configuration
+#
+
+menu "DMA Engine support"
+
+config DMA_ENGINE
+   bool "Support for DMA engines"
+   ---help---
+ DMA engines offload copy operations from the CPU to dedicated
+ hardware, allowing the copies to happen asynchronously.
+
+endmenu
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
new file mode 100644
index 000..10b7391
--- /dev/null
+++ b/drivers/dma/Makefile
@@ -0,0 +1 @@
+obj-y += dmaengine.o
diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
new file mode 100644
index 000..683456a
--- /dev/null
+++ b/drivers/dma/dmaengine.c
@@ -0,0 +1,405 @@
+/*
+ * Copyright(c) 2004 - 2006 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59
+ * Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+ *
+ * The full GNU General Public License is included in this distribution in the
+ * file called COPYING.
+ */
+
+/*
+ * This code implements the DMA subsystem. It provides a HW-neutral interface
+ * for other kernel code to use asynchronous memory copy capabilities,
+ * if present, and allows different HW DMA drivers to register as providing
+ * this capability.
+ *
+ * Due to the fact we are accelerating what is already a relatively fast
+ * operation, the code goes to great lengths to avoid additional overhead,
+ * such as locking.
+ *
+ * LOCKING:
+ *
+ * The subsystem keeps two global lists, dma_device_list and dma_client_list.
+ * Both of these are protected by a spinlock, dma_list_lock.
+ *
+ * Each device has a channels list, which runs unlocked but is never modified
+ * once the device is registered, it's just setup by the driver.
+ *
+ * Each client has a channels list, it's only modified under the client->lock
+ * and in an RCU callback, so it's safe to read under rcu_read_lock().
+ *
+ * Each device has a kref, which is initialized to 1 when the device is
+ * registered. A kref_get is done for each class_device registered.  When the
+ * class_device is released, the corresponding kref_put is done in the release
+ * method. Every time one of the device's channels is allocated to a client,
+ * a kref_get occurs.  When the channel is freed, the corresponding kref_put
+ * happens. The device's release function does a completion, so
+ * unregister_device does a remove event, class_device_unregister, a kref_put
+ * for the first reference, then waits on the completion for all other
+ * references to finish.
+ *
+ * Each channel has an open-coded implementation of Rusty Russell's bigref,
+ * with a kref and a per_cpu local_t.  A single reference is set on an
+ * ADDED event, and removed with a REMOVE event.  Net DMA client takes an
+ * extra reference per outstanding transaction.  The release function does a
+ * kref_put on the device. -ChrisL
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/dmaengine.h>
+#include <linux/hardirq.h>
+#include <linux/spinlock.h>
+#include <linux/percpu.h>
+#include <linux/rcupdate.h>
+
+static DEFINE_SPINLOCK(dma_list_lock);
+static LIST_HEAD(dma_device_list);
+static LIST_HEAD(dma_client_list);
+
+/* --- sysfs implementation --- */
+
+static ssize_t show_memcpy_count(struct class_device 
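
The device reference counting described in the LOCKING comment above can be
modeled in userspace C. This is a rough sketch, not the dmaengine code: the
sketch_device type and its helpers are invented for illustration, the kref is
reduced to a plain int, and the completion that unregister waits on is reduced
to a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for a device kref plus the completion unregister waits on. */
struct sketch_device {
	int refcount;   /* models the device kref */
	bool released;  /* models the completion signalled by the release function */
};

/* Registration leaves the device holding one initial reference. */
static void sketch_device_register(struct sketch_device *dev)
{
	dev->refcount = 1;
	dev->released = false;
}

/* Taken each time a channel is handed to a client. */
static void sketch_device_get(struct sketch_device *dev)
{
	assert(dev->refcount > 0);  /* reviving a released device would be a bug */
	dev->refcount++;
}

/* Dropped when a channel is freed, and once more at unregister time. */
static void sketch_device_put(struct sketch_device *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0)
		dev->released = true;  /* the release function would complete() here */
}
```

The point of the scheme is that unregister may drop the initial reference
while a client still holds a channel; the release only fires once the last
channel has been given back.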

[PATCH 8/10] [IOAT] Make sk_eat_skb() IOAT-aware

2006-04-20 Thread Andrew Grover

Add an extra argument to sk_eat_skb, and make it move early copied packets
to the async_wait_queue instead of freeing them.

Signed-off-by: Chris Leech [EMAIL PROTECTED]

---

 include/net/sock.h |   13 -
 net/dccp/proto.c   |4 ++--
 net/ipv4/tcp.c |8 
 net/llc/af_llc.c   |2 +-

diff --git a/include/net/sock.h b/include/net/sock.h
index 190809c..e3723b6 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1272,11 +1272,22 @@ sock_recv_timestamp(struct msghdr *msg, 
  * This routine must be called with interrupts disabled or with the socket
  * locked so that the sk_buff queue operation is ok.
 */
-static inline void sk_eat_skb(struct sock *sk, struct sk_buff *skb)
+#ifdef CONFIG_NET_DMA
+static inline void sk_eat_skb(struct sock *sk, struct sk_buff *skb, int copied_early)
+{
+   __skb_unlink(skb, &sk->sk_receive_queue);
+   if (!copied_early)
+   __kfree_skb(skb);
+   else
+   __skb_queue_tail(&sk->sk_async_wait_queue, skb);
+}
+#else
+static inline void sk_eat_skb(struct sock *sk, struct sk_buff *skb, int copied_early)
 {
__skb_unlink(skb, &sk->sk_receive_queue);
__kfree_skb(skb);
 }
+#endif
 
 extern void sock_enable_timestamp(struct sock *sk);
 extern int sock_get_timestamp(struct sock *, struct timeval __user *);
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index 1ff7328..35d7dfd 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -719,7 +719,7 @@ int dccp_recvmsg(struct kiocb *iocb, str
}
dccp_pr_debug("packet_type=%s\n",
  dccp_packet_name(dh->dccph_type));
-   sk_eat_skb(sk, skb);
+   sk_eat_skb(sk, skb, 0);
 verify_sock_status:
if (sock_flag(sk, SOCK_DONE)) {
len = 0;
@@ -773,7 +773,7 @@ verify_sock_status:
}
found_fin_ok:
if (!(flags & MSG_PEEK))
-   sk_eat_skb(sk, skb);
+   sk_eat_skb(sk, skb, 0);
break;
} while (1);
 out:
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index b10f78c..2346539 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1072,11 +1072,11 @@ int tcp_read_sock(struct sock *sk, read_
break;
}
if (skb->h.th->fin) {
-   sk_eat_skb(sk, skb);
+   sk_eat_skb(sk, skb, 0);
++seq;
break;
}
-   sk_eat_skb(sk, skb);
+   sk_eat_skb(sk, skb, 0);
if (!desc->count)
break;
}
@@ -1356,14 +1356,14 @@ skip_copy:
if (skb->h.th->fin)
goto found_fin_ok;
if (!(flags & MSG_PEEK))
-   sk_eat_skb(sk, skb);
+   sk_eat_skb(sk, skb, 0);
continue;
 
found_fin_ok:
/* Process the FIN. */
++*seq;
if (!(flags & MSG_PEEK))
-   sk_eat_skb(sk, skb);
+   sk_eat_skb(sk, skb, 0);
break;
} while (len > 0);
 
diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index 5a04db7..7465170 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -789,7 +789,7 @@ static int llc_ui_recvmsg(struct kiocb *
continue;
 
if (!(flags & MSG_PEEK)) {
-   sk_eat_skb(sk, skb);
+   sk_eat_skb(sk, skb, 0);
*seq = 0;
}
} while (len > 0);
-- 
1.2.6





[PATCH 6/10] [IOAT] Struct changes for TCP recv offload to IOAT

2006-04-20 Thread Andrew Grover

Adds an async_wait_queue and some additional fields to tcp_sock, and a
dma_cookie_t to sk_buff.

Signed-off-by: Chris Leech [EMAIL PROTECTED]

---

 include/linux/skbuff.h |4 
 include/linux/tcp.h|8 
 include/net/sock.h |2 ++
 include/net/tcp.h  |7 +++
 net/core/sock.c|6 ++

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 613b951..76861a8 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -29,6 +29,7 @@
 #include <linux/net.h>
 #include <linux/textsearch.h>
 #include <net/checksum.h>
+#include <linux/dmaengine.h>
 
 #define HAVE_ALLOC_SKB /* For the drivers to know */
 #define HAVE_ALIGNABLE_SKB /* Ditto 8)*/
@@ -285,6 +286,9 @@ struct sk_buff {
__u16   tc_verd;/* traffic control verdict */
 #endif
 #endif
+#ifdef CONFIG_NET_DMA
+   dma_cookie_t dma_cookie;
+#endif
 
 
/* These elements must be at the end, see alloc_skb() for details.  */
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 542d395..c90daa5 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -18,6 +18,7 @@
 #define _LINUX_TCP_H
 
 #include <linux/types.h>
+#include <linux/dmaengine.h>
 #include <asm/byteorder.h>
 
 struct tcphdr {
@@ -233,6 +234,13 @@ struct tcp_sock {
struct iovec *iov;
int memory;
int len;
+#ifdef CONFIG_NET_DMA
+   /* members for async copy */
+   struct dma_chan *dma_chan;
+   int wakeup;
+   struct dma_pinned_list  *pinned_list;
+   dma_cookie_t dma_cookie;
+#endif
} ucopy;
 
__u32   snd_wl1;/* Sequence for window update   */
diff --git a/include/net/sock.h b/include/net/sock.h
index af2b054..190809c 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -132,6 +132,7 @@ struct sock_common {
   *@sk_receive_queue: incoming packets
   *@sk_wmem_alloc: transmit queue bytes committed
   *@sk_write_queue: Packet sending queue
+  *@sk_async_wait_queue: DMA copied packets
   *@sk_omem_alloc: o is option or other
   *@sk_wmem_queued: persistent queue size
   *@sk_forward_alloc: space allocated forward
@@ -205,6 +206,7 @@ struct sock {
atomic_t sk_omem_alloc;
struct sk_buff_head sk_receive_queue;
struct sk_buff_head sk_write_queue;
+   struct sk_buff_head sk_async_wait_queue;
int sk_wmem_queued;
int sk_forward_alloc;
gfp_t   sk_allocation;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 9418f4d..54e4367 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -28,6 +28,7 @@
 #include <linux/cache.h>
 #include <linux/percpu.h>
 #include <linux/skbuff.h>
+#include <linux/dmaengine.h>
 
 #include <net/inet_connection_sock.h>
 #include <net/inet_timewait_sock.h>
@@ -820,6 +821,12 @@ static inline void tcp_prequeue_init(str
tp->ucopy.len = 0;
tp->ucopy.memory = 0;
skb_queue_head_init(&tp->ucopy.prequeue);
+#ifdef CONFIG_NET_DMA
+   tp->ucopy.dma_chan = NULL;
+   tp->ucopy.wakeup = 0;
+   tp->ucopy.pinned_list = NULL;
+   tp->ucopy.dma_cookie = 0;
+#endif
 }
 
 /* Packet is added to VJ-style prequeue for processing in process
diff --git a/net/core/sock.c b/net/core/sock.c
index a96ea7d..d2acd35 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -818,6 +818,9 @@ struct sock *sk_clone(const struct sock 
atomic_set(&newsk->sk_omem_alloc, 0);
skb_queue_head_init(&newsk->sk_receive_queue);
skb_queue_head_init(&newsk->sk_write_queue);
+#ifdef CONFIG_NET_DMA
+   skb_queue_head_init(&newsk->sk_async_wait_queue);
+#endif
 
rwlock_init(&newsk->sk_dst_lock);
rwlock_init(&newsk->sk_callback_lock);
@@ -1369,6 +1372,9 @@ void sock_init_data(struct socket *sock,
skb_queue_head_init(&sk->sk_receive_queue);
skb_queue_head_init(&sk->sk_write_queue);
skb_queue_head_init(&sk->sk_error_queue);
+#ifdef CONFIG_NET_DMA
+   skb_queue_head_init(&sk->sk_async_wait_queue);
+#endif
 
sk->sk_send_head=   NULL;
 
-- 
1.2.6






[PATCH 0/10] [IOAT] I/OAT patches repost

2006-04-20 Thread Andrew Grover
Hi I'm reposting these, originally posted by Chris Leech a few weeks ago.
However, there is an extra part since I broke up one patch that was too 
big for netdev last time into two (patches 2 and 3).

Of course we're always looking for more style improvement comments, but 
more importantly we're posting these to talk about the larger issues 
around I/OAT and this code making it in upstream at some point.

These are also available on the wiki,  
http://linux-net.osdl.org/index.php/I/OAT .

Thanks -- Andy



[PATCH 2a/2] [IOAT] Driver for the I/OAT engine part 2a

2006-04-20 Thread Andrew Grover
patch 2 got blocked due to size, here is the diff in 2 parts. -- Andy


Adds a new ioatdma driver, ioatdma.c

Signed-off-by: Chris Leech [EMAIL PROTECTED]

---

 drivers/dma/ioatdma.c   |  805 +++

diff --git a/drivers/dma/ioatdma.c b/drivers/dma/ioatdma.c
new file mode 100644
index 000..ffe47dd
--- /dev/null
+++ b/drivers/dma/ioatdma.c
@@ -0,0 +1,805 @@
+/*
+ * Copyright(c) 2004 - 2006 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59
+ * Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+ *
+ * The full GNU General Public License is included in this distribution in the
+ * file called COPYING.
+ */
+
+/*
+ * This driver supports an Intel I/OAT DMA engine, which does asynchronous
+ * copy operations.
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/interrupt.h>
+#include <linux/dmaengine.h>
+#include <linux/delay.h>
+#include "ioatdma.h"
+#include "ioatdma_io.h"
+#include "ioatdma_registers.h"
+#include "ioatdma_hw.h"
+
+#define to_ioat_chan(chan) container_of(chan, struct ioat_dma_chan, common)
+#define to_ioat_device(dev) container_of(dev, struct ioat_device, common)
+#define to_ioat_desc(lh) container_of(lh, struct ioat_desc_sw, node)
+
+/* internal functions */
+static int __devinit ioat_probe(struct pci_dev *pdev, const struct pci_device_id *ent);
+static void __devexit ioat_remove(struct pci_dev *pdev);
+
+static int enumerate_dma_channels(struct ioat_device *device)
+{
+   u8 xfercap_scale;
+   u32 xfercap;
+   int i;
+   struct ioat_dma_chan *ioat_chan;
+
+   device->common.chancnt = ioatdma_read8(device, IOAT_CHANCNT_OFFSET);
+   xfercap_scale = ioatdma_read8(device, IOAT_XFERCAP_OFFSET);
+   xfercap = (xfercap_scale == 0 ? -1 : (1UL << xfercap_scale));
+
+   for (i = 0; i < device->common.chancnt; i++) {
+   ioat_chan = kzalloc(sizeof(*ioat_chan), GFP_KERNEL);
+   if (!ioat_chan) {
+   device->common.chancnt = i;
+   break;
+   }
+
+   ioat_chan->device = device;
+   ioat_chan->reg_base = device->reg_base + (0x80 * (i + 1));
+   ioat_chan->xfercap = xfercap;
+   spin_lock_init(&ioat_chan->cleanup_lock);
+   spin_lock_init(&ioat_chan->desc_lock);
+   INIT_LIST_HEAD(&ioat_chan->free_desc);
+   INIT_LIST_HEAD(&ioat_chan->used_desc);
+   /* This should be made common somewhere in dmaengine.c */
+   ioat_chan->common.device = &device->common;
+   ioat_chan->common.client = NULL;
+   list_add_tail(&ioat_chan->common.device_node,
+ &device->common.channels);
+   }
+   return device->common.chancnt;
+}
+
+static struct ioat_desc_sw *ioat_dma_alloc_descriptor(struct ioat_dma_chan *ioat_chan, int flags)
+{
+   struct ioat_dma_descriptor *desc;
+   struct ioat_desc_sw *desc_sw;
+   struct ioat_device *ioat_device;
+   dma_addr_t phys;
+
+   ioat_device = to_ioat_device(ioat_chan->common.device);
+   desc = pci_pool_alloc(ioat_device->dma_pool, flags, &phys);
+   if (unlikely(!desc))
+   return NULL;
+
+   desc_sw = kzalloc(sizeof(*desc_sw), flags);
+   if (unlikely(!desc_sw)) {
+   pci_pool_free(ioat_device->dma_pool, desc, phys);
+   return NULL;
+   }
+
+   memset(desc, 0, sizeof(*desc));
+   desc_sw->hw = desc;
+   desc_sw->phys = phys;
+
+   return desc_sw;
+}
+
+#define INITIAL_IOAT_DESC_COUNT 128
+
+static void ioat_start_null_desc(struct ioat_dma_chan *ioat_chan);
+
+/* returns the actual number of allocated descriptors */
+static int ioat_dma_alloc_chan_resources(struct dma_chan *chan)
+{
+   struct ioat_dma_chan *ioat_chan = to_ioat_chan(chan);
+   struct ioat_desc_sw *desc = NULL;
+   u16 chanctrl;
+   u32 chanerr;
+   int i;
+
+   /*
+* In-use bit automatically set by reading chanctrl
+* If 0, we got it, if 1, someone else did
+*/
+   chanctrl = ioatdma_chan_read16(ioat_chan, IOAT_CHANCTRL_OFFSET);
+   if (chanctrl & IOAT_CHANCTRL_CHANNEL_IN_USE)
+   return -EBUSY;
+
+/* Setup register to interrupt and write completion status on error */
+   chanctrl = 
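
The comment in ioat_dma_alloc_chan_resources() describes a register read with
a side effect: reading CHANCTRL sets the in-use bit, so only the first reader
sees it clear. A userspace model of that claim protocol might look like the
sketch below. It rests on assumptions: the SKETCH_IN_USE bit position is made
up, the function names are invented, and a C11 atomic stands in for the
hardware register.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit position for the in-use flag; the real layout is not shown here. */
#define SKETCH_IN_USE 0x0100u

/*
 * Models the hardware side effect: the read returns the previous register
 * value and leaves the in-use bit set afterwards.
 */
static uint16_t sketch_read_chanctrl(_Atomic uint16_t *reg)
{
	assert(reg != NULL);
	return atomic_fetch_or(reg, (uint16_t)SKETCH_IN_USE);
}

/* If the bit was 0 we got the channel; if it was 1, someone else did. */
static bool sketch_claim_channel(_Atomic uint16_t *reg)
{
	uint16_t seen = sketch_read_chanctrl(reg);

	return !(seen & SKETCH_IN_USE);
}
```

Because the test and the set happen in one step, two callers racing for the
same channel cannot both see the bit clear, which is why the driver can return
-EBUSY without taking a lock first.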

[PATCH 2b/2] [IOAT] Driver for the I/OAT DMA engine

2006-04-20 Thread Andrew Grover
Second half of the ioatdma.c diff, split up to make it past netdev size 
block -- Andy

Adds a new ioatdma driver, ioatdma.c

Signed-off-by: Chris Leech [EMAIL PROTECTED]

---

 drivers/dma/ioatdma.c   |  805 +++

diff --git a/drivers/dma/ioatdma.c b/drivers/dma/ioatdma.c
new file mode 100644
index 000..ffe47dd
--- /dev/null
+++ b/drivers/dma/ioatdma.c

[see previous post for first half of file. sorry]

+/**
+ * ioat_dma_memcpy_issue_pending - push potentially unrecognized appended descriptors to hw
+ * @chan: DMA channel handle
+ */
+
+static void ioat_dma_memcpy_issue_pending(struct dma_chan *chan)
+{
+   struct ioat_dma_chan *ioat_chan = to_ioat_chan(chan);
+
+   if (ioat_chan->pending != 0) {
+   ioat_chan->pending = 0;
+   ioatdma_chan_write8(ioat_chan,
+   IOAT_CHANCMD_OFFSET,
+   IOAT_CHANCMD_APPEND);
+   }
+}
+
+static void ioat_dma_memcpy_cleanup(struct ioat_dma_chan *chan)
+{
+   unsigned long phys_complete;
+   struct ioat_desc_sw *desc, *_desc;
+   dma_cookie_t cookie = 0;
+
+   prefetch(chan->completion_virt);
+
+   if (!spin_trylock(&chan->cleanup_lock))
+   return;
+
+   /* The completion writeback can happen at any time,
+  so reads by the driver need to be atomic operations
+  The descriptor physical addresses are limited to 32-bits
+  when the CPU can only do a 32-bit mov */
+
+#if (BITS_PER_LONG == 64)
+   phys_complete = chan->completion_virt->full & IOAT_CHANSTS_COMPLETED_DESCRIPTOR_ADDR;
+#else
+   phys_complete = chan->completion_virt->low & IOAT_LOW_COMPLETION_MASK;
+#endif
+
+   if ((chan->completion_virt->full & IOAT_CHANSTS_DMA_TRANSFER_STATUS) ==
+   IOAT_CHANSTS_DMA_TRANSFER_STATUS_HALTED) {
+   printk("IOAT: Channel halted, chanerr = %x\n",
+   ioatdma_chan_read32(chan, IOAT_CHANERR_OFFSET));
+
+   /* TODO do something to salvage the situation */
+   }
+
+   if (phys_complete == chan->last_completion) {
+   spin_unlock(&chan->cleanup_lock);
+   return;
+   }
+
+   spin_lock_bh(&chan->desc_lock);
+   list_for_each_entry_safe(desc, _desc, &chan->used_desc, node) {
+
+   /*
+* Incoming DMA requests may use multiple descriptors, due to
+* exceeding xfercap, perhaps. If so, only the last one will
+* have a cookie, and require unmapping.
+*/
+   if (desc->cookie) {
+   cookie = desc->cookie;
+
+   /* yes we are unmapping both _page and _single alloc'd
+  regions with unmap_page. Is this *really* that bad?
+   */
+   pci_unmap_page(chan->device->pdev,
+   pci_unmap_addr(desc, dst),
+   pci_unmap_len(desc, dst_len),
+   PCI_DMA_FROMDEVICE);
+   pci_unmap_page(chan->device->pdev,
+   pci_unmap_addr(desc, src),
+   pci_unmap_len(desc, src_len),
+   PCI_DMA_TODEVICE);
+   }
+
+   if (desc->phys != phys_complete) {
+   /* a completed entry, but not the last, so cleanup */
+   list_del(&desc->node);
+   list_add_tail(&desc->node, &chan->free_desc);
+   } else {
+   /* last used desc. Do not remove, so we can append from
+  it, but don't look at it next time, either */
+   desc->cookie = 0;
+
+   /* TODO check status bits? */
+   break;
+   }
+   }
+
+   spin_unlock_bh(&chan->desc_lock);
+
+   chan->last_completion = phys_complete;
+   if (cookie != 0)
+   chan->completed_cookie = cookie;
+
+   spin_unlock(&chan->cleanup_lock);
+}
+
+/**
+ * ioat_dma_is_complete - poll the status of a IOAT DMA transaction
+ * @chan: IOAT DMA channel handle
+ * @cookie: DMA transaction identifier
+ */
+
+static enum dma_status ioat_dma_is_complete(struct dma_chan *chan, dma_cookie_t cookie, dma_cookie_t *done, dma_cookie_t *used)
+{
+   struct ioat_dma_chan *ioat_chan = to_ioat_chan(chan);
+   dma_cookie_t last_used;
+   dma_cookie_t last_complete;
+   enum dma_status ret;
+
+   last_used = chan->cookie;
+   last_complete = ioat_chan->completed_cookie;
+
+   if (done)
+   *done = last_complete;
+   if (used)
+   *used = last_used;
+
+   ret = dma_async_is_complete(cookie, last_complete, last_used);
+   if (ret == DMA_SUCCESS)
+   return ret;
+
+   ioat_dma_memcpy_cleanup(ioat_chan);
+
+   
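
The comment in the cleanup loop notes that one request may span several
descriptors when it exceeds xfercap, and that only the last descriptor carries
a cookie. The userspace sketch below illustrates that splitting rule. It is
not the driver's submission path, which is not shown in this post: the
sketch_desc type and both helpers are invented here, and a scale of 0 is
mapped to SIZE_MAX to mirror the -1 used in enumerate_dma_channels().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_DESCS 16

/* Reduced stand-in for a software descriptor (names assumed). */
struct sketch_desc {
	size_t offset;  /* byte offset of this chunk within the request */
	size_t len;     /* bytes covered by this chunk */
	int cookie;     /* 0 means "no cookie", as the cleanup loop assumes */
};

/* A scale of 0 is treated as "no limit"; otherwise the cap is 1 << scale. */
static size_t sketch_xfercap(uint8_t scale)
{
	assert(scale < sizeof(size_t) * 8);
	return scale == 0 ? SIZE_MAX : ((size_t)1 << scale);
}

/*
 * Split len bytes into chunks of at most xfercap bytes. Returns the number
 * of descriptors written, or 0 when len is 0 or the table is too small.
 */
static size_t sketch_split(size_t len, size_t xfercap, int cookie,
			   struct sketch_desc *out, size_t max)
{
	size_t n = 0, off = 0;

	assert(xfercap > 0 && cookie != 0);
	while (len > 0) {
		size_t chunk = len < xfercap ? len : xfercap;

		if (n == max)
			return 0;
		out[n].offset = off;
		out[n].len = chunk;
		out[n].cookie = 0;  /* intermediate chunks carry no cookie */
		off += chunk;
		len -= chunk;
		n++;
	}
	if (n > 0)
		out[n - 1].cookie = cookie;  /* only the final chunk completes the request */
	return n;
}
```

Tagging only the final chunk means a cleanup pass that walks the used list can
report a request as complete only after every piece of it has finished.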

[2.6 patch] net/802/tr.c: remove an unused export

2006-04-20 Thread Adrian Bunk
This patch removes the unused EXPORT_SYMBOL(tr_source_route).

(No, the usage in net/llc/llc_output.c can't be modular.)

Signed-off-by: Adrian Bunk [EMAIL PROTECTED]

--- linux-2.6.17-rc1-mm3-full/net/802/tr.c.old  2006-04-20 22:45:07.0 
+0200
+++ linux-2.6.17-rc1-mm3-full/net/802/tr.c  2006-04-20 22:45:18.0 
+0200
@@ -643,6 +643,5 @@
 
 module_init(rif_init);
 
-EXPORT_SYMBOL(tr_source_route);
 EXPORT_SYMBOL(tr_type_trans);
 EXPORT_SYMBOL(alloc_trdev);



Re: [PATCH RESEND 1/2] s390: remove tty support from ctc network device driver [1/2]

2006-04-20 Thread Jeff Garzik

Frank Pavlic wrote:
Hi jeff, 
after the first shot I sent to you did not apply I

resend  two new patches I've made today to remove tty from ctc network driver.
Please apply 

Thank you ...


applied 1-2 to #upstream (queued for 2.6.18, since 2.6.17 is in -rc)




Re: [PATCH 0/2] e1000: fix two mispatches

2006-04-20 Thread Jeff Garzik

Kok, Auke wrote:

Hi,

This patch series implements two e1000 fixes for an old and new patch mishap.

[1] fix mispatch for media type detect.
[2] fix mismerge skb_put.


These changes are available through git.

git://63.64.152.142/~ahkok/git/netdev-2.6 e1000-7.0.38-k2-fixes


applied




Re: Please pull upstream-fixes branch of wireless-2.6

2006-04-20 Thread Jeff Garzik

John W. Linville wrote:

The following changes since commit 0efd9323f32c137b5cf48bc6582cd08556e7cdfc:
  Linus Torvalds:
Merge branch 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block

are found in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git 
upstream-fixes


pulled




Re: [PATCH 9/10] [IOAT] Add sysctl to tuning IOAT offloaded IO threshold

2006-04-20 Thread Olof Johansson
Hi,

On Thu, Apr 20, 2006 at 01:50:40PM -0700, Andrew Grover wrote:
 
 Any socket recv of less than this amount will not be offloaded
[...]
 --- a/net/core/user_dma.c
 +++ b/net/core/user_dma.c
 @@ -33,6 +33,10 @@
  
  #ifdef CONFIG_NET_DMA
  
 +#define NET_DMA_DEFAULT_COPYBREAK 1024
 +
 +int sysctl_tcp_dma_copybreak = NET_DMA_DEFAULT_COPYBREAK;
 +

The breakpoint is highly likely to be at different points on various
architectures and platforms depending on what they look like, where in
the system the DMA engine is, how efficient regular memcpy is, etc.

I would like to see it as a config option instead, so it will at least
be possible to tune per-arch (via default config, etc).
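
One way to read this suggestion, sketched in userspace C: keep the run-time
tunable, but seed it from a build-time default that each architecture's
default config could override. CONFIG_SKETCH_DMA_COPYBREAK is a hypothetical
symbol used only for this illustration, not one proposed in the thread, and
the helper names are likewise invented.

```c
#include <assert.h>

/*
 * Hypothetical build-time default. An arch-specific config could define a
 * different value before this point; 1024 is the default from the patch.
 */
#ifndef CONFIG_SKETCH_DMA_COPYBREAK
#define CONFIG_SKETCH_DMA_COPYBREAK 1024
#endif

static_assert(CONFIG_SKETCH_DMA_COPYBREAK >= 0,
	      "copybreak default must be non-negative");

/* Run-time tunable, seeded from the build-time default. */
static int sketch_dma_copybreak = CONFIG_SKETCH_DMA_COPYBREAK;

/* Models a write to the tunable; negative thresholds are rejected. */
static int sketch_set_copybreak(int bytes)
{
	if (bytes < 0)
		return -1;
	sketch_dma_copybreak = bytes;
	return 0;
}

/* Models a read of the current threshold. */
static int sketch_get_copybreak(void)
{
	return sketch_dma_copybreak;
}
```

Under this reading the two mechanisms are complementary: the config option
picks a sensible starting point per platform, and the sysctl still allows
adjustment on a running system.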


-Olof



Re: [PATCH] Fix locking in gianfar

2006-04-20 Thread Jeff Garzik

Andy Fleming wrote:

This patch fixes several bugs in the gianfar driver, including a major one
where spinlocks were horribly broken:

* Split gianfar locks into two types: TX and RX
* Made it so gfar_start() now clears RHALT
* Fixed a bug where calling gfar_start_xmit() with interrupts off would
corrupt the interrupt state
* Fixed a bug where a frame could potentially arrive, and never be handled
(if no more frames arrived)
* Fixed a bug where the rx_work_limit would never be observed by the rx
completion code
* Fixed a bug where the interrupt handlers were not actually protected by
their spinlocks

Signed-off-by: Andy Fleming [EMAIL PROTECTED]


ACK but failed:


[EMAIL PROTECTED] netdev-2.6]$ git-applymbox /g/tmp/mbox ~/info/signoff.txt
1 patch(es) to process.

Applying 'Fix locking in gianfar'

fatal: corrupt patch at line 19



Re: [PATCH 0/10] [IOAT] I/OAT patches repost

2006-04-20 Thread Olof Johansson
On Thu, Apr 20, 2006 at 01:49:16PM -0700, Andrew Grover wrote:
 Hi I'm reposting these, originally posted by Chris Leech a few weeks ago.
 However, there is an extra part since I broke up one patch that was too 
 big for netdev last time into two (patches 2 and 3).
 
 Of course we're always looking for more style improvement comments, but 
 more importantly we're posting these to talk about the larger issues 
 around I/OAT and this code making it in upstream at some point.
 
 These are also available on the wiki,  
 http://linux-net.osdl.org/index.php/I/OAT .

Hi,

Since you didn't provide the current issues in this email, I will copy
and paste them from the wiki page.

I guess the overall question is, how much of this needs to be addressed
in the implementation before merge, and how much should be done when
more drivers (with more features) are merged down the road. It might not
make sense to implement all of it now if the only available public
driver lacks the abilities.   But I'm bringing up the points anyway.

Maybe it could make sense to add a software-based driver for reference,
and for others to play around with.

I would also prefer to see the series clearly split between the DMA
framework and first clients (networking) and the I/OAT driver. Right now
I/OAT and DMA are used interchangeably, especially when describing
the later patches. It might help you in the perception that this is
something unique to the Intel chipsets as well.  :-)

(I have also proposed DMA offload discussions as a topic for the Kernel
Summit. I have kept Chris Leech Cc:d on most of the emails in question. It
should be a good place to get input from other subsystems regarding what
functionality they would like to see provided, etc.)


From the wiki:

 Current issues of concern:

1. Performance improvement may be on too narrow a set of workloads

Maybe from I/OAT and the current client, but the introduction of the
DMA infrastructure opens up for other uses that are not yet possible in
the API. For example, DMA with functions is a very natural extension,
and something that's very common on various platforms (XOR for RAID use,
checksums, encryption).

The API needs to be expanded to cover this by adding function types and
adding them to the channel allocation interface and logic.
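
One way the function-type idea could look, as a rough sketch only. Every name here (enum dma_func, struct dma_chan_desc, dma_chan_request) is made up for illustration and does not come from the posted series.

```c
#include <stddef.h>

/* Hypothetical capability bits; none of these names are in the posted series. */
enum dma_func {
	DMA_FUNC_MEMCPY = 1u << 0,
	DMA_FUNC_XOR    = 1u << 1,	/* e.g. RAID parity */
	DMA_FUNC_CSUM   = 1u << 2,
	DMA_FUNC_CRYPTO = 1u << 3,
};

struct dma_chan_desc {
	const char *name;
	unsigned int funcs;	/* OR of enum dma_func bits the engine supports */
	int in_use;
};

/* Return the first free channel supporting every requested function, or NULL. */
static struct dma_chan_desc *
dma_chan_request(struct dma_chan_desc *chans, size_t n, unsigned int wanted)
{
	for (size_t i = 0; i < n; i++) {
		if (!chans[i].in_use && (chans[i].funcs & wanted) == wanted) {
			chans[i].in_use = 1;
			return &chans[i];
		}
	}
	return NULL;
}
```

A client that only needs memcpy would pass DMA_FUNC_MEMCPY, while a RAID client would ask for DMA_FUNC_XOR and get NULL on hardware that cannot do it.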

2. Limited availability of hardware supporting I/OAT

DMA engines are fairly common, even though I/OAT might not be yet. They
just haven't had a common infrastructure until now.

For people who might want to play with it, a reference software-based
implementation might be useful.

3. Data copied by I/OAT is not cached

This is an I/OAT device limitation and not a global statement of the
DMA infrastructure. Other platforms might be able to prime caches
with the DMA traffic. Hint flags should be added on either the channel
allocation calls, or per-operation calls, depending on where it makes
sense driver/client wise.

4. Intrusiveness of net stack modifications
5. Compatibility with upcoming VJ net channel architecture 

Both of these are outside my scope, so I won't comment on them at this
time.


I would like to add, for longer term:

   * Userspace interfaces:
Are there any plans yet on how to export some of this to userspace? It
might not make full sense for just memcpy due to overheads, but it makes
sense for more advanced dma/offload engines.


-Olof


[PATCH] bridge: allow full size vlan tagged packets to be bridged

2006-04-20 Thread Stephen Hemminger
The Ethernet bridge code silently drops packets when forwarding a packet
that is too large for the destination interface (as per 802.1d). But it
should allow for VLAN tagged frames.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- bridge.orig/net/bridge/br_forward.c 2006-04-10 16:17:51.0 -0700
+++ bridge/net/bridge/br_forward.c  2006-04-19 13:50:42.0 -0700
@@ -16,6 +16,7 @@
 #include <linux/kernel.h>
 #include <linux/netdevice.h>
 #include <linux/skbuff.h>
+#include <linux/if_vlan.h>
 #include <linux/netfilter_bridge.h>
 #include "br_private.h"
 
@@ -29,10 +30,15 @@
 	return 1;
 }
 
+static inline unsigned packet_length(const struct sk_buff *skb)
+{
+	return skb->len - (skb->protocol == htons(ETH_P_8021Q) ? VLAN_HLEN : 0);
+}
+
 int br_dev_queue_push_xmit(struct sk_buff *skb)
 {
 	/* drop mtu oversized packets except tso */
-	if (skb->len > skb->dev->mtu && !skb_shinfo(skb)->tso_size)
+	if (packet_length(skb) > skb->dev->mtu && !skb_shinfo(skb)->tso_size)
 		kfree_skb(skb);
 	else {
 #ifdef CONFIG_BRIDGE_NETFILTER
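
For readers without a kernel tree at hand, the length check in this patch can be modelled in user space roughly as follows. struct pkt is a simplified stand-in for struct sk_buff, the ethertype is kept in host byte order for brevity, and oversized() is an illustrative name rather than a kernel function.

```c
#include <stdint.h>

#define VLAN_HLEN    4		/* bytes added by an 802.1Q tag */
#define ETH_P_8021Q  0x8100	/* 802.1Q ethertype, host byte order here */

/* Simplified stand-in for struct sk_buff; field names are illustrative. */
struct pkt {
	unsigned int len;	/* payload length, excluding the Ethernet header */
	uint16_t protocol;	/* ethertype, host byte order in this model */
	unsigned int tso_size;
};

/* Length compared against the MTU: the VLAN tag is not counted. */
static unsigned int packet_length(const struct pkt *p)
{
	return p->len - (p->protocol == ETH_P_8021Q ? VLAN_HLEN : 0);
}

/* Nonzero if the bridge would drop the packet for exceeding the MTU. */
static int oversized(const struct pkt *p, unsigned int mtu)
{
	return packet_length(p) > mtu && !p->tso_size;
}
```

With an MTU of 1500, a tagged packet of length 1504 passes, while an untagged packet of the same length is still dropped.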


Re: e1000 breakage in git-netdev-all

2006-04-20 Thread Jeff Garzik

Andrew Morton wrote:

A bunch of e1000 changes just hit Jeff's tree.


Hopefully things are now fixed in git-netdev-all...

Jeff





[git patches] net driver fixes

2006-04-20 Thread Jeff Garzik

Please pull from 'upstream-linus' branch of
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git

to receive the following updates:

 drivers/net/ne.c   |2 
 drivers/net/wireless/Kconfig   |2 
 drivers/net/wireless/airo.c|   46 +++---
 drivers/net/wireless/atmel.c   |   11 ++
 drivers/net/wireless/bcm43xx/Kconfig   |3 
 drivers/net/wireless/bcm43xx/bcm43xx.h |   17 +++
 drivers/net/wireless/bcm43xx/bcm43xx_debugfs.c |8 -
 drivers/net/wireless/bcm43xx/bcm43xx_dma.c |   13 +-
 drivers/net/wireless/bcm43xx/bcm43xx_main.c|2 
 drivers/net/wireless/bcm43xx/bcm43xx_phy.c |1 
 drivers/net/wireless/bcm43xx/bcm43xx_power.c   |  115 ++---
 drivers/net/wireless/bcm43xx/bcm43xx_power.h   |9 +
 drivers/net/wireless/bcm43xx/bcm43xx_sysfs.c   |  115 ++---
 drivers/net/wireless/bcm43xx/bcm43xx_sysfs.h   |   16 ---
 drivers/net/wireless/bcm43xx/bcm43xx_wx.c  |8 -
 drivers/net/wireless/orinoco.c |2 
 include/net/ieee80211softmac.h |3 
 net/core/dev.c |3 
 net/core/wireless.c|8 +
 net/ieee80211/softmac/Kconfig  |1 
 net/ieee80211/softmac/ieee80211softmac_assoc.c |5 -
 net/ieee80211/softmac/ieee80211softmac_event.c |   40 +++-
 net/ieee80211/softmac/ieee80211softmac_io.c|   18 +++
 net/ieee80211/softmac/ieee80211softmac_scan.c  |2 
 net/ieee80211/softmac/ieee80211softmac_wx.c|   10 ++
 25 files changed, 289 insertions(+), 171 deletions(-)

Adrian Bunk:
  bcm43xx: fix dyn tssi2dbm memleak

Dan Williams:
  wireless/airo: clean up WEXT association and scan events
  wireless/atmel: send WEXT scan completion events

Erik Mouw:
  bcm43xx: iw_priv_args names should be 16 characters

Jean Tourrilhes:
  wext: Fix IWENCODEEXT security permissions
  Revert NET_RADIO Kconfig title change
  wext: Fix RtNetlink ENCODE security permissions

Johannes Berg:
  softmac: fix event sending
  softmac: report when scanning has finished

[EMAIL PROTECTED]:
  softmac: return -EAGAIN from getscan while scanning
  softmac: dont send out packets while scanning
  softmac: handle iw_mode properly

Michael Buesch:
  softmac: fix spinlock recursion on reassoc
  bcm43xx: set trans_start on TX to prevent bogus timeouts
  bcm43xx: fix pctl slowclock limit calculation
  bcm43xx: sysfs code cleanup

Pavel Roskin:
  orinoco: fix truncating commsquality RID with the latest Symbol firmware

Randy Dunlap:
  softmac uses Wiress Ext.
  bcm43xx wireless: fix printk format warnings
  bcm43xx: fix config menu alignment

Sergei Shtylyov:
  NEx000: fix RTL8019AS base address for RBTX4938

diff --git a/drivers/net/ne.c b/drivers/net/ne.c
index 08b218c..93c494b 100644
--- a/drivers/net/ne.c
+++ b/drivers/net/ne.c
@@ -226,7 +226,7 @@ struct net_device * __init ne_probe(int 
 	netdev_boot_setup_check(dev);
 
 #ifdef CONFIG_TOSHIBA_RBTX4938
-	dev->base_addr = 0x07f20280;
+	dev->base_addr = RBTX4938_RTL_8019_BASE;
 	dev->irq = RBTX4938_RTL_8019_IRQ;
 #endif
 	err = do_ne_probe(dev);
diff --git a/drivers/net/wireless/Kconfig b/drivers/net/wireless/Kconfig
index bad09eb..e0874cb 100644
--- a/drivers/net/wireless/Kconfig
+++ b/drivers/net/wireless/Kconfig
@@ -6,7 +6,7 @@ menu "Wireless LAN (non-hamradio)"
 	depends on NETDEVICES
 
 config NET_RADIO
-	bool "Wireless LAN drivers (non-hamradio)"
+	bool "Wireless LAN drivers (non-hamradio) & Wireless Extensions"
 	select WIRELESS_EXT
 	---help---
 	  Support for wireless LANs and everything having to do with radio,
diff --git a/drivers/net/wireless/airo.c b/drivers/net/wireless/airo.c
index 108d9fe..00764dd 100644
--- a/drivers/net/wireless/airo.c
+++ b/drivers/net/wireless/airo.c
@@ -3139,6 +3139,7 @@ static irqreturn_t airo_interrupt ( int 
 		}
 		if ( status & EV_LINK ) {
 			union iwreq_data	wrqu;
+			int scan_forceloss = 0;
 			/* The link status has changed, if you want to put a
 			   monitor hook in, do it here.  (Remember that
 			   interrupts are still disabled!)
@@ -3157,7 +3158,8 @@ static irqreturn_t airo_interrupt ( int 
  code) */
 #define AUTHFAIL 0x0300 /* Authentication failure (low byte is reason
   code) */
-#define ASSOCIATED 0x0400 /* Assocatied */
+#define ASSOCIATED 0x0400 /* Associated */
+#define REASSOCIATED 0x0600 /* Reassociated?  Only on firmware >= 5.30.17 */
 #define RC_RESERVED 0 /* Reserved return code */
 #define RC_NOREASON 1 /* Unspecified reason */
 #define RC_AUTHINV 2 /* Previous authentication invalid */
@@ -3174,44 +3176,30 @@ static 

Re: Please pull upstream-fixes branch of wireless-2.6

2006-04-20 Thread Linus Torvalds


On Thu, 20 Apr 2006, Andrew Morton wrote:

 John W. Linville [EMAIL PROTECTED] wrote:
 
  At present, all the branches in wireless-2.6 only pull from linux-2.6.
   I am still pushing (i.e. requesting Jeff's pull) to netdev-2.6,
   if that matters.
  
   Maybe the current wireless-2.6 tree fits into your system better?
 
 Works well, thanks.   I have some patches for you ;)

Well, since Jeff pushed it on to me, if you have patches that fix obvious 
problems and should go in before 2.6.17, you can now push those directly 
to me too ;)

Linus


Open ethernet hardware specs

2006-04-20 Thread Jeff Garzik


I started a specs section on the linux-net wiki:

http://linux-net.osdl.org/index.php?title=Network-Adapters#Hardware_specifications

If you add to this list, please be SURE of the specification's origin. 
We do not want to link to any "fell off the back of a truck" specs of 
questionable origin.


Also, janitors, there are more NIC specs at 
http://gkernel.sourceforge.net/specs/ than are listed on the wiki.  What 
I posted is just a starter list.  If someone were to comb through each 
PDF in the /specs/ sub-directories, and make sure it is linked on the 
wiki, I would be grateful.


Jeff





Re: [PATCH 0/10] [IOAT] I/OAT patches repost

2006-04-20 Thread Olof Johansson
On Thu, Apr 20, 2006 at 03:14:15PM -0700, Andrew Grover wrote:
 Hah, I was just writing an email covering those. I'll incorporate that
 into this response.
 
 On 4/20/06, Olof Johansson [EMAIL PROTECTED] wrote:
  I guess the overall question is, how much of this needs to be addressed
  in the implementation before merge, and how much should be done when
  more drivers (with more features) are merged down the road. It might not
  make sense to implement all of it now if the only available public
  driver lacks the abilities.   But I'm bringing up the points anyway.
 
 Yeah. But I would think maybe this is a reason to merge at least the
 DMA subsystem code, so people with other HW (ARM? I'm still not
 exactly sure) can start trying to write a DMA driver and see where the
 architecture needs to be generalized further.

The interfaces need to evolve as people implement drivers, yes. If it
should be before or after merging can be discussed, but as long as
everyone is on the same page w.r.t. the interfaces being volatile for a
while, merge should be OK.

Having a roadmap of known-todo improvements could be beneficial for
everyone involved, especially if several people start looking at drivers
in parallel. However, so far, (public) activity seems to have been
fairly low.

  I would also prefer to see the series clearly split between the DMA
  framework and first clients (networking) and the I/OAT driver. Right now
  I/OAT and DMA are used interchangeably, especially when describing
  the later patches. It might help you in the perception that this is
  something unique to the Intel chipsets as well.  :-)
 
 I think we have this reasonably well split-out in the patches, but yes
 you're right about how we've been using the terms.

The patches are well split up already; it was mostly that the network
stack changes were marked as I/OAT changes instead of DMA ditto.

  1. Performance improvement may be on too narrow a set of workloads
  Maybe from I/OAT and the current client, but the introduction of the
  DMA infrastructure opens up for other uses that are not yet possible in
  the API. For example, DMA with functions is a very natural extension,
  and something that's very common on various platforms (XOR for RAID use,
  checksums, encryption).
 
 Yes. Does this hardware exist in shipping platforms, so we could use
 actual hw to start evaluating the DMA interfaces?

Freescale has it on several processors that are shipping, as far as I
know. Other embedded families likely have them as well (MIPS, ARM), but
I don't know details. The platform I am working on is not yet shipping;
I've just started looking at drivers.

  For people who might want to play with it, a reference software-based
  implementation might be useful.
 
 Yeah I'll ask if I can post the one we have. Or it would be trivial to write.

I was going to look at it myself, but if you have one to post that's
even more trivial. :-)

  3. Data copied by I/OAT is not cached
 
  This is an I/OAT device limitation and not a global statement of the
  DMA infrastructure. Other platforms might be able to prime caches
  with the DMA traffic. Hint flags should be added on either the channel
  allocation calls, or per-operation calls, depending on where it makes
  sense driver/client wise.
 
 Furthermore in our implementation's defense I would say I think the
 smart prefetching that modern CPUs do is helping here.

Yes. It's also not obvious that warming the cache at copy time is always
a gain; it depends on the receiver and what it does with the data.

 In any case, we
 are seeing performance gains (see benchmarks), which seems to indicate
 this is not an immediate deal-breaker for the technology..

There's always the good old benefit-vs-added-complexity tradeoff, which
I guess is the sore spot right now.

 In
 addition, there may be workloads (file serving? backup?) where we
 could do a skb->page-in-page-cache copy and avoid cache pollution?

Yes, NFS is probably a prime example of where most of the data isn't
looked at; just written to disk. I'm not sure how well-optimized the
receive path is there already w.r.t. avoiding copying though. I don't
remember seeing memcpy and friends being high on the profile when I
looked at SPECsfs last.

  4. Intrusiveness of net stack modifications
  5. Compatibility with upcoming VJ net channel architecture
  Both of these are outside my scope, so I won't comment on them at this
  time.
 
 Yeah I don't have much to say about these except we made the patch as
 unintrusive as we could, and we think there may be ways to use async
 DMA to
 help VJ channels, whenever they arrive.

Not that I know all the tricks they are using, but it seems to me that it
would be hard to both be efficient w.r.t memory use (i.e. more than one
IP packet per page) AND avoid copying once. At least without device-level
flow classification and per-flow (process) buffer rings.



-Olof

Re: e1000_down and tx_timeout worker race cleaning the transmit buffers

2006-04-20 Thread Herbert Xu
On Thu, Apr 20, 2006 at 05:35:00PM +, [EMAIL PROTECTED] wrote:
 
 If the e1000_tx_timeout_task were running concurrently with e1000_down, it 
 seems that they could both attempt to kfree_skb concurrently when running 
 e1000_unmap_and_free_tx_resource.   I googled around to find mention of this 
 anywhere with no luck.  Has this been discussed already?

Yes that's definitely buggy.  There needs to be some form of
synchronisation as the TG3 driver does.  However, to be frank
I'm not too fond of what the TG3 driver does either.  Is there
no better way than an msleep loop?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


RE: [PATCH 2.6.16-rc5] S2io: Receive packet classification and steering mechanisms

2006-04-20 Thread Ravinandan Arakali
Andi,
The driver will be polling (listening to) netlink for
any configuration requests. We could release the user
tools, but we are not sure where (in the tree) they would reside.

Thanks,
Ravi

-Original Message-
From: Andi Kleen [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 19, 2006 5:51 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; netdev@vger.kernel.org
Subject: Re: [PATCH 2.6.16-rc5] S2io: Receive packet classification and
steering mechanisms


On Thursday 20 April 2006 00:45, Ravinandan Arakali wrote:
 Andi,
 We would like to explain that this patch is tier-1 of a two
 tiered approach. It implements all the steering
 functionality at driver-only level, and it is fairly Neterion-specific.

That's fine for experiments, but probably not something
that should be in tree.


 The second upcoming submission will add a generic netlink-based
 interface for channel data flow and configuration(including receive
steering
 parameters) on per-channel basis, that will utilize the lower level
 implementation from the current patch.

Will the driver itself be listening to netlink?

My feeling is that teaching the stack to use this would require
efficient interfaces, and netlink isn't particularly efficient. But if it's
just a glue module outside the driver, that would be reasonable as a first
step, I guess.

Do you also plan to release user tools to use it?


-Andi



Re: [RFC] Netlink and user-space buffer pointers

2006-04-20 Thread Andrew Vasquez
On Thu, 20 Apr 2006, James Smart wrote:

 Note: We've transitioned off topic. If what this means is there isn't a 
 good
 way except by ioctls (which still isn't easily portable) or system calls,
 then that's ok. Then at least we know the limits and can look at other
 implementation alternatives.

This topic has been brought up many times in the past, most recently:

http://thread.gmane.org/gmane.linux.drivers.openib/19525/focus=19525
http://thread.gmane.org/gmane.linux.kernel/387375/focus=387455

where it was suggested that the PathScale folks use some blend of sysfs,
netlink sockets and debugfs:

http://kerneltrap.org/node/4394

 Mike Christie wrote:
 Instead of netlink for scsi commands and transport requests
 
 For scsi commands could we just use sg io, or is there something special
 about the command you want to send? If you can use sg io for scsi
 commands, maybe for transport level requests (in my example iscsi pdu)
 we could modify something like sg/bsg/block layer scsi_ioctl.c to send
 down transport requests to the classes and encapsulate them in some new
 struct transport_requests or use the existing struct request but do that
 thing people keep talking about using the request/request_queue for
 message passing.
 
 Well - there's 2 parts to this answer:
 
 First : IOCTL's are considered dangerous/bad practice and therefore it would
   be nice to find a replacement mechanism that eliminates them. If that
   mechanism has some of the cool features that netlink does, even better.
   Using sg io, in the manner you indicate, wouldn't remove the ioctl use.
   Note: I have OEMs/users that are very confused about the community's 
   statement
   about ioctls. They've heard they are bad, should never be allowed, will no
   longer be supported, but yet they are at the heart of DM and sg io and 
   other
   subsystems. Other than a grandfathered explanation, they don't 
   understand
   why the rules bend for one piece of code but not for another. To them, all
   the features are just as critical regardless of who's providing them.

I believe it to be the same for most hardware-vendor's customers...

 Second: transport level i/o could be done like you suggest, and we've
   prototyped some of this as well. However, there's something very wrong
   about putting block device wrappers and settings around something that
   is not a block device.

Eeww...  no wrappers.  Your netlink prototypes certainly get FC-
transport further along, but it would also be nice if there could be some
subsystem consensus on *the* interface.

I honestly don't know which interface is *best*, but from an HBA
vendor's perspective managing per-request locally allocated memory is
undesirable.

Thanks,
av


Re: e1000_down and tx_timeout worker race cleaning the transmit buffers

2006-04-20 Thread Herbert Xu
On Fri, Apr 21, 2006 at 09:36:31AM +1000, herbert wrote:
 
 Yes that's definitely buggy.  There needs to be some form of
 synchronisation as the TG3 driver does.  However, to be frank
 I'm not too fond of what the TG3 driver does either.  Is there
 no better way than an msleep loop?

Actually TG3 is buggy too.  If the reset task is scheduled but
isn't running yet there is no synchronisation here to prevent the
reset task from running after tg3_close releases the tp lock.

It needs to kill the reset task and make sure it doesn't get
rescheduled by someone else.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: Cannot receive multicast packets

2006-04-20 Thread David Stevens
I've run your test program and it receives fine for me.

I note that the source address is not on the same subnet as
(any of) the receiver's addresses. Are the packets being
routed? The default multicasting TTL is 1, though I don't
know if it'll be checked or dropped on the receiver, seeing
as we aren't forwarding it.
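
If the sender's TTL does turn out to matter, it is set on the sending socket with the standard IP_MULTICAST_TTL option. A minimal sketch, with set_mcast_ttl() and get_mcast_ttl() being illustrative helper names only:

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Set the TTL for outgoing multicast datagrams on a UDP socket.
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
static int set_mcast_ttl(int fd, int ttl)
{
	return setsockopt(fd, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl));
}

/* Read back the current multicast TTL, or -1 on error. */
static int get_mcast_ttl(int fd)
{
	int ttl = 0;
	socklen_t len = sizeof(ttl);

	if (getsockopt(fd, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, &len) < 0)
		return -1;
	return ttl;
}
```

On a freshly created UDP socket get_mcast_ttl() should report the default of 1 mentioned above, so any routed hop would need the sender to raise it.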

Also, you might want to run netstat -s to see if any of the
drop counters are being incremented (e.g., checksum error).

Finally, I'm assuming you don't have any firewall rules that
are matching, right?

+-DLS



Re: e1000_down and tx_timeout worker race cleaning the transmit buffers

2006-04-20 Thread Michael Chan
On Fri, 2006-04-21 at 09:51 +1000, Herbert Xu wrote:

 
 Actually TG3 is buggy too.  If the reset task is scheduled but
 isn't running yet there is no synchronisation here to prevent the
 reset task from running after tg3_close releases the tp lock.
 

If we're in tg3_close() and the reset task isn't running yet, tg3_close()
will proceed. However, when the reset task finally runs, it will see
that netif_running() is zero and will just return.
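
The pattern described here (deferred work re-checking a "running" flag before doing anything) can be modelled in user space roughly like this. It is only a sketch of the idea using a pthread mutex: struct nic, reset_task() and nic_close() are illustrative names and are not taken from the tg3 driver.

```c
#include <pthread.h>
#include <stdbool.h>

/* Illustrative model only; these names are not taken from the tg3 driver. */
struct nic {
	pthread_mutex_t lock;
	bool running;		/* stands in for netif_running() */
	int resets_done;
};

/* Deferred reset work: re-check state under the lock before touching anything. */
static void *reset_task(void *arg)
{
	struct nic *nic = arg;

	pthread_mutex_lock(&nic->lock);
	if (nic->running)
		nic->resets_done++;	/* the real reset would happen here */
	pthread_mutex_unlock(&nic->lock);
	return NULL;
}

/* Close path: clear the running flag while holding the same lock. */
static void nic_close(struct nic *nic)
{
	pthread_mutex_lock(&nic->lock);
	nic->running = false;
	pthread_mutex_unlock(&nic->lock);
}
```

In this model a reset_task() that runs after nic_close() returns finds running cleared and does nothing, which is the behaviour described for the late reset task.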




Re: [PATCH 0/10] [IOAT] I/OAT patches repost

2006-04-20 Thread David S. Miller
From: Andrew Grover [EMAIL PROTECTED]
Date: Thu, 20 Apr 2006 15:14:15 -0700

 First obviously it's a technology for RX CPU improvement so there's no
 benefit on TX workloads. Second it depends on there being buffers to
 copy the data into *before* the data arrives. This happens to be the
 case for benchmarks like netperf and Chariot, but real apps using
 poll/select wouldn't see a benefit.  Just laying the cards out here.
 BUT we are seeing very good CPU savings on some workloads, so for
 those apps (and if select/poll apps could make use of a
 yet-to-be-implemented async net interface) it would be a win.
 
 I don't know what the breakdown is of apps doing blocking reads vs.
 waiting, does anyone know?

All the bandwidth benchmarks tend to block, real world servers (and
most clients to some extent) tend to use non-blocking reads and
poll/select except in some very limited cases and designs doing
something like 1 thread per connection.

This is an issue for the TCP prequeue and as a consequence VJ's net
channel ideas.  We need something to wakeup some context in order to
push channel data.

All the net channel stuff really wants is an execution context to
run the TCP stack outside of software interrupts.  I/O AT wants
something similar.

For net channels, probably the best thing to do is to just queue
to the socket's netchannel, and mark poll state appropriately
and just wait for the thread to get back into recvmsg() to run
the queue.  So I think net channels can be handled in all cases
and application I/O models.

For I/O AT you'd really want to get the DMA engine going as soon
as you had those packets, but I do not see a clean and reliable way
to determine the target pages before the app gets back to recvmsg().

I/O AT really expects a lot of things to be in place in order for it
to function at all.  And sadly, that set of requirements isn't
actually very common outside of benchmarking tools and a few
uncommonly designed servers.  Even a web browser does non-blocking
reads and poll().
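
To make the poll/select case concrete, here is a minimal user-space sketch of the event-driven read pattern discussed above. The helper names set_nonblocking() and poll_then_recv() are illustrative; the point is only that no destination buffer is known to the kernel until after the data has already arrived.

```c
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Event-driven read: wait in poll() with no buffer posted, and hand the
 * kernel a buffer only after data has already arrived.
 * Returns bytes read, 0 on timeout or EOF, -1 on error.
 */
static ssize_t poll_then_recv(int fd, void *buf, size_t len, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ready = poll(&pfd, 1, timeout_ms);

	if (ready <= 0)
		return ready;		/* 0: timeout, -1: poll error */
	return recv(fd, buf, len, 0);	/* may still fail with EAGAIN */
}
```

A benchmark that simply blocks in recv() posts its buffer before the packets arrive, which is the case where an early DMA copy has a known target.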


Re: [PATCH 0/10] [IOAT] I/OAT patches repost

2006-04-20 Thread David S. Miller
From: Olof Johansson [EMAIL PROTECTED]
Date: Thu, 20 Apr 2006 18:33:43 -0500

 On Thu, Apr 20, 2006 at 03:14:15PM -0700, Andrew Grover wrote:
  In
  addition, there may be workloads (file serving? backup?) where we
  could do a skb->page-in-page-cache copy and avoid cache pollution?
 
 Yes, NFS is probably a prime example of where most of the data isn't
 looked at; just written to disk. I'm not sure how well-optimized the
 receive path is there already w.r.t. avoiding copying though. I don't
 remember seeing memcpy and friends being high on the profile when I
 looked at SPECsfs last.

If that makes sense then the cpu copy can be made to use non-temporal
stores.
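
As a rough user-space illustration of a copy with non-temporal stores, assuming x86 with SSE2 intrinsics (the kernel would use its own arch helpers, and copy_nocache() is an illustrative name):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/*
 * Copy len bytes using non-temporal (streaming) stores where SSE2 is
 * available, so the destination does not displace useful cache lines.
 * Falls back to memcpy() elsewhere and for unaligned head/tail bytes.
 */
static void copy_nocache(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

#if defined(__SSE2__)
	/* Streaming stores need a 16-byte-aligned destination. */
	size_t head = (16 - ((uintptr_t)d & 15)) & 15;

	if (head > len)
		head = len;
	memcpy(d, s, head);
	d += head;
	s += head;
	len -= head;

	for (; len >= 16; d += 16, s += 16, len -= 16) {
		__m128i v = _mm_loadu_si128((const __m128i *)s);
		_mm_stream_si128((__m128i *)d, v);
	}
	_mm_sfence();	/* order the streaming stores before later accesses */
#endif
	memcpy(d, s, len);	/* tail, or the whole copy without SSE2 */
}
```

Whether this is a win depends on the receiver never touching the data soon afterwards, as in the NFS write case mentioned above.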


Re: Cannot receive multicast packets

2006-04-20 Thread Andrew Athan


David:

Thank you for taking the time to respond.

The packets are arriving via a switched network composed of Cisco 
devices in PIM dense mode.  The packets pass through several switch 
hops, but no routing hops that have been documented to me.  I did not 
think the source IP was relevant to the matching code in linux, since 
there are no source squelching socket options.


There are no firewall rules active on this machine, and the packets are 
definitely visible at the interface (see tcpdump output in my email).


I am going to try upgrading the kernel, and turning off the multicast 
router kernel options as a next step.  But if you have any other ideas 
at all, I'm all ears.


This seems too much like Mr. Murphy's in the room.

A.

David Stevens wrote:

I've run your test program and it receives fine for me.

I note that the source address is not on the same subnet as
(any of) the receiver's addresses. Are the packets being
routed? The default multicasting TTL is 1, though I don't
know if it'll be checked or dropped on the receiver, seeing
as we aren't forwarding it.

Also, you might want to run netstat -s to see if any of the
drop counters are being incremented (e.g., checksum error).

Finally, I'm assuming you don't have any firewall rules that
are matching, right?

+-DLS



Re: [PATCH 0/10] [IOAT] I/OAT patches repost

2006-04-20 Thread Rick Jones

Unfortunately, many benchmarks just do raw bandwidth tests sending to
a receiver that just doesn't even look at the data.  They just return
from recvmsg() and loop back into it.  This is not what applications
using networking actually do, so it's important to make sure we look
intelligently at any benchmarks done and do not fall into the trap of
saying "even without cache warming it made things faster" when in fact
the tested receiver did not touch the data at all so was a false test.


FWIW, netperf can be configured to access the buffers it gives to send() 
or gets from recv().  A ./configure --enable-dirty in TOT:


http://www.netperf.org/svn/netperf2/trunk

will enable two global options:

 -k dirty,clean # bytes to dirty, bytes to read clean on netperf side

 -K dirty,clean # as above, on netserver side.

And in such a netperf the test banner will include the string "dirty 
data" (alas the default output will not say how much :)


In say a TCP_STREAM test -k will affect what is done with a buffer 
before send() is called, and -K will affect what is done with a buffer 
_before_ recv() is called with that buffer.


-k N will cause the first N bytes of the buffer to be dirtied, and the 
next N bytes to be read clean


-k N, will cause the first N bytes of the buffer to be dirtied

-k ,N will cause the first N bytes of the buffer to be read clean

-k M,N will cause the first M bytes to be dirtied, the next N bytes to 
be read clean
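
A minimal sketch of the -k/-K semantics listed above, assuming the argument is a plain "dirty,clean" pair of decimal byte counts. The names parse_dirty_clean() and touch_buffer() are hypothetical and are not taken from netperf's source; the clamping to the buffer length is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser (not netperf's own): "N" sets both counts,
 * "N," sets only the dirty count, ",N" only the clean count,
 * and "M,N" sets dirty = M, clean = N. */
void parse_dirty_clean(const char *arg, size_t *dirty, size_t *clean)
{
    const char *comma;

    assert(arg != NULL && dirty != NULL && clean != NULL);
    comma = strchr(arg, ',');
    *dirty = 0;
    *clean = 0;
    if (comma == NULL) {            /* "N": same count for both */
        *dirty = *clean = (size_t)strtoul(arg, NULL, 10);
        return;
    }
    if (comma != arg)               /* digits before the comma */
        *dirty = (size_t)strtoul(arg, NULL, 10);
    if (comma[1] != '\0')           /* digits after the comma */
        *clean = (size_t)strtoul(comma + 1, NULL, 10);
}

/* Write to the first `dirty` bytes, then only read the next `clean`
 * bytes. The sum is returned so the reads are not optimised away. */
unsigned long touch_buffer(unsigned char *buf, size_t len,
                           size_t dirty, size_t clean)
{
    unsigned long sum = 0;
    size_t i;

    assert(buf != NULL || len == 0);
    if (dirty > len)
        dirty = len;                /* assumption: clamp to the buffer */
    if (clean > len - dirty)
        clean = len - dirty;

    for (i = 0; i < dirty; i++)
        buf[i]++;                   /* store: dirties the cache line */
    for (i = dirty; i < dirty + clean; i++)
        sum += buf[i];              /* load only: "read clean" */
    return sum;
}
```

With a dirty count of zero the "next N bytes" start at offset 0, which matches the ",N" case above.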


Actually, that brings up a question - presently, and for reasons that 
are lost to me in the mists of time - netperf will access the buffer 
before it calls recv().  I'm wondering if that should be changed to an 
access of the buffer after it calls recv()?


And I suspect related to all this is whether or not one should alter the 
size of the buffer ring being used by netperf, which by default is the 
SO_*BUF size divided by the send_size (or recv_size) plus one buffers - 
the -W option can control that.
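
A rough sketch of the default ring sizing just described, reading the sentence as (socket buffer size / message size) + 1. That reading, and the names default_ring_width() and ring_footprint(), are assumptions for illustration rather than netperf's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed default: one more buffer than fits in the socket buffer. */
size_t default_ring_width(size_t sockbuf_size, size_t msg_size)
{
    assert(msg_size > 0);
    return sockbuf_size / msg_size + 1;
}

/* Total bytes cycled through by the ring. If this exceeds the CPU
 * cache, each buffer is likely cold again by the time it is reused. */
size_t ring_footprint(size_t width, size_t msg_size)
{
    return width * msg_size;
}
```

The footprint is what matters for the cache-warming question: a small ring keeps reusing warm buffers, a large one does not.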


rick jones


Re: [PATCH 0/10] [IOAT] I/OAT patches repost

2006-04-20 Thread Rick Jones

David S. Miller wrote:

From: Andrew Grover [EMAIL PROTECTED]
Date: Thu, 20 Apr 2006 15:14:15 -0700



First obviously it's a technology for RX CPU improvement so there's no
benefit on TX workloads. Second it depends on there being buffers to
copy the data into *before* the data arrives. This happens to be the
case for benchmarks like netperf and Chariot, but real apps using
poll/select wouldn't see a benefit. Just laying the cards out here.
BUT we are seeing very good CPU savings on some workloads, so for
those apps (and if select/poll apps could make use of a
yet-to-be-implemented async net interface) it would be a win.

I don't know what the breakdown is of apps doing blocking reads vs.
waiting, does anyone know?



All the bandwidth benchmarks tend to block, real world servers (and
most clients to some extent) tend to use non-blocking reads and
poll/select except in some very limited cases and designs doing
something like 1 thread per connection.


Another netperf2 option :) (not exported via configure though) if a 
certain define is set - look at recv_tcp_stream() in nettest_bsd.c - 
then netperf will call select() before it calls recv().


rick jones




Re: [PATCH 0/10] [IOAT] I/OAT patches repost

2006-04-20 Thread David S. Miller
From: Rick Jones [EMAIL PROTECTED]
Date: Thu, 20 Apr 2006 18:00:37 -0700

 Actually, that brings-up a question - presently, and for reasons that 
 are lost to me in the mists of time - netperf will access the buffer 
 before it calls recv().  I'm wondering if that should be changed to an 
 access of the buffer after it calls recv()?

Yes, that's what it should do, as this is what a real
application would do.
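
A small sketch of that ordering, where the receiver reads the payload only after recv() returns. The helper name recv_then_touch() is hypothetical, and summing the bytes is just one way to force the loads.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Receive into buf, then read every received byte, as an application
 * consuming the data would. The sum is handed back so the reads
 * cannot be optimised away. */
ssize_t recv_then_touch(int fd, unsigned char *buf, size_t len,
                        unsigned long *sum)
{
    ssize_t n, i;

    assert(buf != NULL && sum != NULL);
    *sum = 0;
    n = recv(fd, buf, len, 0);
    for (i = 0; i < n; i++)     /* skipped when recv() returns <= 0 */
        *sum += buf[i];
    return n;
}
```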


Re: e1000_down and tx_timeout worker race cleaning the transmit buffers

2006-04-20 Thread Herbert Xu
On Thu, Apr 20, 2006 at 03:36:57PM -0700, Michael Chan wrote:

 If we're in tg3_close() and the reset task isn't running yet,
 tg3_close() will proceed. However, when the reset task finally runs,
 it will see that netif_running() is zero and will just return.

Yes you're absolutely right.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: SIOCGIWSCAN wireless event behaviour

2006-04-20 Thread Daniel Drake

Jean Tourrilhes wrote:

The original behaviour was that the event was sent only when a
user did request a scan. At that time, cards did not do background
scanning, so new scan results would be produced only as a result of a
user scan.
After a short discussion with Dan, we agreed to change that:
the driver should send the event whenever a new scan result is available,
regardless of how it happens (background scan or user scan). This
allows smart applications to synchronise on background scans and avoid
generating useless user scans. Minimising the number of user scans
is actually good.


Thanks for all the responses.

I am not sure if the 'extra' SIOCGIWSCAN event is what is causing 
wpa_supplicant's confusion, but the kind of behaviour I am seeing is 
wpa_supplicant associating to the network, immediately disassociating, 
and then associating again before the connection stabilises. This is 
with wpa_supplicant 0.5.2 connecting to an unencrypted network.


I am also seeing that softmac reassociates with a network after 
wpa_supplicant exits.


Johannes posted a softmac patch earlier which may help (related to 
softmac's handling of SIOCGIWAP). I will do some further investigation 
and provide a more complete report if that doesn't fix it.


Thanks,
Daniel


Re: e1000_down and tx_timeout worker race cleaning the transmit buffers

2006-04-20 Thread Herbert Xu
On Fri, Apr 21, 2006 at 11:27:01AM +1000, herbert wrote:
 On Thu, Apr 20, 2006 at 03:36:57PM -0700, Michael Chan wrote:
 
  If we're in tg3_close() and the reset task isn't running yet,
  tg3_close() will proceed. However, when the reset task finally runs,
  it will see that netif_running() is zero and will just return.
 
 Yes you're absolutely right.

Actually, what if the tg3_close is followed by a tg3_open? That could
produce a spurious reset which I suppose isn't that bad.  Also if the
module is unloaded bad things will happen as well.  So I still don't
feel too comfortable about leaving it scheduled after a close.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: Open ethernet hardware specs

2006-04-20 Thread Alexey Dobriyan
On Thu, Apr 20, 2006 at 06:55:58PM -0400, Jeff Garzik wrote:
 Also, janitors, there are more NIC specs at
 http://gkernel.sourceforge.net/specs/ than are listed on the wiki.  What
 I posted is just a starter list.  If someone were to comb through each
 PDF in the /specs/ sub-directories, and make sure it is linked on the
 wiki, I would be grateful.

Almost done.

P.S.:
http://gkernel.sourceforge.net/specs/via/501designguide.pdf.bz2 is
broken.



Re: e1000_down and tx_timeout worker race cleaning the transmit buffers

2006-04-20 Thread Michael Chan
On Fri, 2006-04-21 at 11:33 +1000, Herbert Xu wrote:

 Actually, what if the tg3_close is followed by a tg3_open? That could
 produce a spurious reset which I suppose isn't that bad.

Yes, an extra reset. And yes, it isn't too bad.

 Also if the
 module is unloaded bad things will happen as well.

In tg3_remove_one(), we call flush_scheduled_work() in case the
reset_task is still pending. Here, it is safe to call
flush_scheduled_work() because we're not holding the rtnl. Again, when
it runs, nothing bad will happen because it will see netif_running() ==
0.




Re: [PATCH 0/10] [IOAT] I/OAT patches repost

2006-04-20 Thread Herbert Xu
David S. Miller [EMAIL PROTECTED] wrote:
 
 For I/O AT you'd really want to get the DMA engine going as soon
 as you had those packets, but I do not see a clean and reliable way
 to determine the target pages before the app gets back to recvmsg().

The vmsplice() system call proposed by Linus might be a good fit.

http://www.ussg.iu.edu/hypermail/linux/kernel/0604.2/0854.html
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: e1000_down and tx_timeout worker race cleaning the transmit buffers

2006-04-20 Thread Shaw Vrana
On Thursday 20 April 2006 17:10, Michael Chan wrote:
 In tg3_remove_one(), we call flush_scheduled_work() in case the
 reset_task is still pending. Here, it is safe to call
 flush_scheduled_work() because we're not holding the rtnl. Again, when
 it runs, nothing bad will happen because it will see netif_running() ==
 0.

I'll bite!  Here's a patch to add a call to flush_scheduled_work() in 
e1000_down.  It's against 2.6.16.9.

Shaw
diff -u -uprN -X linux-2.6.16.9/Documentation/dontdiff linux-2.6.16.9/drivers/net/e1000/e1000_main.c linux-2.6.16.9-patch/drivers/net/e1000/e1000_main.c
--- linux-2.6.16.9/drivers/net/e1000/e1000_main.c	2006-04-18 23:10:14.0 -0700
+++ linux-2.6.16.9-patch/drivers/net/e1000/e1000_main.c	2006-04-20 19:36:55.0 -0700
@@ -538,6 +538,7 @@ e1000_down(struct e1000_adapter *adapter
 	del_timer_sync(&adapter->tx_fifo_stall_timer);
 	del_timer_sync(&adapter->watchdog_timer);
 	del_timer_sync(&adapter->phy_info_timer);
+	flush_scheduled_work();	
 
 #ifdef CONFIG_E1000_NAPI
 	netif_poll_disable(netdev);


Re: e1000_down and tx_timeout worker race cleaning the transmit buffers

2006-04-20 Thread Herbert Xu
Michael Chan [EMAIL PROTECTED] wrote:
 
 In tg3_remove_one(), we call flush_scheduled_work() in case the
 reset_task is still pending. Here, it is safe to call

Great.

 flush_scheduled_work() because we're not holding the rtnl. Again, when

Hmm doing a quick grep seems to indicate that quite a number of drivers
do this in netdev->close or other callbacks under RTNL.  This means that
they're all vulnerable to the linkwatch deadlock that you alluded to.

Rather than dealing with this individually in each driver perhaps we should
come up with a more centralised solution?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: Cannot receive multicast packets

2006-04-20 Thread David Stevens
Andrew,

  I did not 
 think the source IP was relevant to the matching code in linux, since 
 there are no source squelching socket options. 
 
 There are no firewall rules active on this machine, and the packets are 
 definitely visible at the interface (see tcpdump output in my email).

The source address is not relevant (other than potentially
for firewall rules), and I understand from your original mail that
they are arriving at the machine. The IP TTL is what I wanted to
know there; but netstat -s will normally tell you why a packet
was dropped, if it's arriving but not making it through the UDP/IP
stack (as is your case).

 I am going to try upgrading the kernel, and turning off the multicast 
 router kernel options as a next step.  But if you have any other ideas 
 at all, I'm all ears.

netstat -s would be a good start. :-) tcpdump receiving a copy
of the packet does not mean UDP or IP won't drop it, but those drops
are counted.

+-DLS



Re: e1000_down and tx_timeout worker race cleaning the transmit buffers

2006-04-20 Thread Michael Chan
On Fri, 2006-04-21 at 12:40 +1000, Herbert Xu wrote:

 One simple solution is to establish a separate queue for RTNL-holding
 users or vice versa for non-RTNL holding networking users.  That
 would allow the drivers to safely flush the non-RTNL queue while
 holding the RTNL.

You mean a separate workqueue for net drivers to use instead of the
keventd_wq? Yeah, I think that'll work. Each driver can also create its
own workqueue but that may be a bit more wasteful.



Re: [PATCH 0/10] [IOAT] I/OAT patches repost

2006-04-20 Thread Olof Johansson
On Thu, Apr 20, 2006 at 05:27:42PM -0700, David S. Miller wrote:
 From: Olof Johansson [EMAIL PROTECTED]
 Date: Thu, 20 Apr 2006 16:33:05 -0500
 
  From the wiki:
  
  3. Data copied by I/OAT is not cached
  
  This is an I/OAT device limitation and not a global statement of the
  DMA infrastructure. Other platforms might be able to prime caches
  with the DMA traffic. Hint flags should be added on either the channel
  allocation calls, or per-operation calls, depending on where it makes
  sense driver/client wise.
 
 This sidesteps the whole question of _which_ cache to warm.  And if
 you choose wrongly, then what?

 Besides the control overhead of the DMA engines, the biggest thing
 lost in my opinion is the perfect cache warming that a cpu based copy
 does from the kernel socket buffer into userspace.

It's definitely the easiest way to always make sure the right caches
are warm for the app, that I agree with.

But, when warming those caches by copying, the data is pulled in through
a potentially cold cache in the first place. So the cache misses are
just moved from the copy loop to userspace with dma offload. Or am I
missing something?

 The first thing an application is going to do is touch that data.  So
 I think it's very important to prewarm the caches and the only
 straightforward way I know of to always warm up the correct cpu's
 caches is copy_to_user().

The other way (assuming the hardware supports cache warming) would be
to pass down affinities (or look them up during receive processing,
I'm not sure that's practical the way things work now), and dispatch
on a DMA channel with the right cache affinity. I've got a feeling that
"straightforward" is not a term to use for describing that solution
though.

 Unfortunately, many benchmarks just do raw bandwidth tests sending to
 a receiver that just doesn't even look at the data.  They just return
 from recvmsg() and loop back into it.  This is not what applications
 using networking actually do, so it's important to make sure we look
 intelligently at any benchmarks done and do not fall into the trap of
 saying "even without cache warming it made things faster" when in fact
 the tested receiver did not touch the data at all so was a false test.

Yes, some real-life-like benchmarking is definitely needed. Unfortunately
I'm not in a position where I can do much (and share numbers) at the
moment myself.


-Olof


Re: e1000_down and tx_timeout worker race cleaning the transmit buffers

2006-04-20 Thread shaw
I've replied to this once before, but haven't seen my last two emails on the 
list, so I'm sending again with different settings.  Sorry for the noise.

On Thursday 20 April 2006 17:10, Michael Chan wrote:
 In tg3_remove_one(), we call flush_scheduled_work() in case the
 reset_task is still pending. Here, it is safe to call
 flush_scheduled_work() because we're not holding the rtnl. Again, when
 it runs, nothing bad will happen because it will see netif_running() ==
 0.

I'll bite!  Here's a patch to add a call to flush_scheduled_work() in 
e1000_down.  It's against 2.6.16.9.

Thanks,
Shaw
diff -u -uprN -X linux-2.6.16.9/Documentation/dontdiff linux-2.6.16.9/drivers/net/e1000/e1000_main.c linux-2.6.16.9-patch/drivers/net/e1000/e1000_main.c
--- linux-2.6.16.9/drivers/net/e1000/e1000_main.c	2006-04-18 23:10:14.0 -0700
+++ linux-2.6.16.9-patch/drivers/net/e1000/e1000_main.c	2006-04-20 19:36:55.0 -0700
@@ -538,6 +538,7 @@ e1000_down(struct e1000_adapter *adapter
 	del_timer_sync(&adapter->tx_fifo_stall_timer);
 	del_timer_sync(&adapter->watchdog_timer);
 	del_timer_sync(&adapter->phy_info_timer);
+	flush_scheduled_work();	
 
 #ifdef CONFIG_E1000_NAPI
 	netif_poll_disable(netdev);


Re: [PATCH 0/10] [IOAT] I/OAT patches repost

2006-04-20 Thread Olof Johansson
On Thu, Apr 20, 2006 at 05:44:38PM -0700, David S. Miller wrote:
 From: Olof Johansson [EMAIL PROTECTED]
 Date: Thu, 20 Apr 2006 18:33:43 -0500
 
  On Thu, Apr 20, 2006 at 03:14:15PM -0700, Andrew Grover wrote:
   In
   addition, there may be workloads (file serving? backup?) where we
   could do a skb-page-in-page-cache copy and avoid cache pollution?
  
  Yes, NFS is probably a prime example of where most of the data isn't
  looked at; just written to disk. I'm not sure how well-optimized the
  receive path is there already w.r.t. avoiding copying though. I don't
  remember seeing memcpy and friends being high on the profile when I
  looked at SPECsfs last.
 
 If that makes sense then the cpu copy can be made to use non-temporal
 stores.

I'm not sure that would buy anything. I didn't mean caching was
necessarily bad, just that lack of it might not hurt as much under that
specific type of workload.

NFS has to look at RPC/NFS headers anyway, so it will benefit from the
cache being warm.


-Olof


Re: e1000_down and tx_timeout worker race cleaning the transmit buffers

2006-04-20 Thread Michael Chan
On Thu, 2006-04-20 at 19:42 -0700, Shaw Vrana wrote:

 I'll bite!  Here's a patch to add a call to flush_scheduled_work() in 
 e1000_down.  It's against 2.6.16.9.
 
You're not following our discussion. It is not safe to call
flush_scheduled_work() in a driver's close() because it is holding the
rtnl and can deadlock with linkwatch_event() if it happens to be on the
workqueue.



[PATCH] Unregister network device before releasing PCMCIA resources

2006-04-20 Thread Pavel Roskin
From: Pavel Roskin [EMAIL PROTECTED]

This is the right thing to do and it prevents kernel BUG on unload.

Some PCMCIA network drivers use link->dev_node as a flag indicating that
the network device has been successfully registered.  Recent code
changes cause this flag to be 0 after PCMCIA resources have been
released.
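
The ordering problem can be shown with a toy userspace model. Everything below (struct toy_link and the toy_* functions) is a hypothetical stand-in for illustration, not the kernel's PCMCIA structures or API; it only assumes, as stated above, that releasing resources leaves the flag at 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a PCMCIA link: dev_node doubles as the
 * "network device was registered" flag. */
struct toy_link {
    void *dev_node;
    bool netdev_registered;
};

/* Models resource release clearing the flag as a side effect. */
void toy_release(struct toy_link *link)
{
    assert(link != NULL);
    link->dev_node = NULL;
}

void toy_unregister(struct toy_link *link)
{
    assert(link != NULL);
    link->netdev_registered = false;
}

/* Old order: the flag is already 0 when it is tested, so the
 * device is never unregistered. */
void toy_detach_release_first(struct toy_link *link)
{
    toy_release(link);
    if (link->dev_node)
        toy_unregister(link);
}

/* Patched order: test the flag while it is still valid. */
void toy_detach_unregister_first(struct toy_link *link)
{
    if (link->dev_node)
        toy_unregister(link);
    toy_release(link);
}
```

In the model, the old order leaves netdev_registered set after detach, which corresponds to freeing a still-registered device in the real drivers.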

Signed-off-by: Pavel Roskin [EMAIL PROTECTED]
---

 drivers/net/wireless/netwave_cs.c  |    4 ++--
 drivers/net/wireless/orinoco_cs.c  |    5 +++--
 drivers/net/wireless/ray_cs.c      |    4 +++-
 drivers/net/wireless/spectrum_cs.c |    5 +++--
 drivers/net/wireless/wavelan_cs.c  |    9 +++++----
 5 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/drivers/net/wireless/netwave_cs.c b/drivers/net/wireless/netwave_cs.c
index 9343d97..5d80db2 100644
--- a/drivers/net/wireless/netwave_cs.c
+++ b/drivers/net/wireless/netwave_cs.c
@@ -445,11 +445,11 @@ static void netwave_detach(struct pcmcia
 
DEBUG(0, "netwave_detach(0x%p)\n", link);
 
-   netwave_release(link);
-
if (link->dev_node)
unregister_netdev(dev);
 
+   netwave_release(link);
+
free_netdev(dev);
 } /* netwave_detach */
 
diff --git a/drivers/net/wireless/orinoco_cs.c b/drivers/net/wireless/orinoco_cs.c
index 434f7d7..5988305 100644
--- a/drivers/net/wireless/orinoco_cs.c
+++ b/drivers/net/wireless/orinoco_cs.c
@@ -147,14 +147,15 @@ static void orinoco_cs_detach(struct pcm
 {
struct net_device *dev = link->priv;
 
-   orinoco_cs_release(link);
-
DEBUG(0, PFX "detach: link=%p link->dev_node=%p\n", link, link->dev_node);
if (link->dev_node) {
DEBUG(0, PFX "About to unregister net device %p\n",
  dev);
unregister_netdev(dev);
}
+
+   orinoco_cs_release(link);
+
free_orinocodev(dev);
 }  /* orinoco_cs_detach */
 
diff --git a/drivers/net/wireless/ray_cs.c b/drivers/net/wireless/ray_cs.c
index 879eb42..fac4f1b 100644
--- a/drivers/net/wireless/ray_cs.c
+++ b/drivers/net/wireless/ray_cs.c
@@ -388,13 +388,15 @@ static void ray_detach(struct pcmcia_dev
 this_device = NULL;
 dev = link->priv;
 
+if (link->dev_node)
+   unregister_netdev(dev);
+
 ray_release(link);
 
 local = (ray_dev_t *)dev->priv;
 del_timer(&local->timer);
 
 if (link->priv) {
-   if (link->dev_node) unregister_netdev(dev);
 free_netdev(dev);
 }
 DEBUG(2,"ray_cs ray_detach ending\n");
diff --git a/drivers/net/wireless/spectrum_cs.c b/drivers/net/wireless/spectrum_cs.c
index f7b77ce..2551938 100644
--- a/drivers/net/wireless/spectrum_cs.c
+++ b/drivers/net/wireless/spectrum_cs.c
@@ -626,14 +626,15 @@ static void spectrum_cs_detach(struct pc
 {
struct net_device *dev = link->priv;
 
-   spectrum_cs_release(link);
-
DEBUG(0, PFX "detach: link=%p link->dev_node=%p\n", link, link->dev_node);
if (link->dev_node) {
DEBUG(0, PFX "About to unregister net device %p\n",
  dev);
unregister_netdev(dev);
}
+
+   spectrum_cs_release(link);
+
free_orinocodev(dev);
 }  /* spectrum_cs_detach */
 
diff --git a/drivers/net/wireless/wavelan_cs.c b/drivers/net/wireless/wavelan_cs.c
index f7724eb..03c2e16 100644
--- a/drivers/net/wireless/wavelan_cs.c
+++ b/drivers/net/wireless/wavelan_cs.c
@@ -4681,6 +4681,11 @@ #ifdef DEBUG_CALLBACK_TRACE
   printk(KERN_DEBUG "-> wavelan_detach(0x%p)\n", link);
 #endif
 
+  /* Remove ourselves from the kernel list of ethernet devices */
+  /* Warning : can't be called from interrupt, timer or wavelan_close() */
+  if (link->dev_node)
+unregister_netdev(dev);
+
   /* Some others haven't done their job : give them another chance */
   wv_pcmcia_release(link);
 
@@ -4689,10 +4694,6 @@ #endif
 {
   struct net_device *  dev = (struct net_device *) link->priv;
 
-  /* Remove ourselves from the kernel list of ethernet devices */
-  /* Warning : can't be called from interrupt, timer or wavelan_close() */
-  if (link->dev_node)
-   unregister_netdev(dev);
   link->dev_node = NULL;
   ((net_local *)netdev_priv(dev))->link = NULL;
   ((net_local *)netdev_priv(dev))->dev = NULL;



Re: [PATCH 0/10] [IOAT] I/OAT patches repost

2006-04-20 Thread David S. Miller
From: Olof Johansson [EMAIL PROTECTED]
Date: Thu, 20 Apr 2006 22:04:26 -0500

 On Thu, Apr 20, 2006 at 05:27:42PM -0700, David S. Miller wrote:
  Besides the control overhead of the DMA engines, the biggest thing
  lost in my opinion is the perfect cache warming that a cpu based copy
  does from the kernel socket buffer into userspace.
 
 It's definitely the easiest way to always make sure the right caches
 are warm for the app, that I agree with.
 
 But, when warming those caches by copying, the data is pulled in through
 a potentially cold cache in the first place. So the cache misses are
 just moved from the copy loop to userspace with dma offload. Or am I
 missing something?

Yes, and it means that the memory bandwidth costs are equivalent
between I/O AT and cpu copy.

In the cpu copy case you eat the read cache miss, but on the write
side you'll prewarm the cache properly.  In the I/O AT case you
eat the same read cost, but the cache will not be prewarmed, so you'll
eat the read cache miss in the application.  It's moving the same
exact cost from one place to another.

The time it takes to get the app to make forward progress (meaning
returned from the recvmsg() system call and back in userspace) must by
definition take at least as long with I/O AT as it does with cpu
copies.  Yet in the I/O AT case, the application must wait that long
and also then take in the delays of the cache misses when it tries to
read the data that the I/O AT engine copied.  Instead of eating the
cache miss cost in the kernel, we eat it in the app because in the I/O
AT case the cpu won't have the user data fresh and loaded into the cpu
cache.

And I say I/O AT must take at least as long as cpu copies because
the same memory copy cost is there, and on top of that I/O AT has to
program the DMA controller and touch a _lot_ of other state to get
things going and then wake the task up.  We're talking non-trivial
overheads like grabbing the page mappings out of the page tables using
get_user_pages().  Evgeniy has posted some very nice performance graphs
showing how poorly that function scales.

This is basically why none of the performance gains add up to me.  I
am thus very concerned that the current non-cache-warming
implementation may fall flat performance wise.


Re: [PATCH 0/10] [IOAT] I/OAT patches repost

2006-04-20 Thread Olof Johansson
On Thu, Apr 20, 2006 at 08:42:00PM -0700, David S. Miller wrote:

 This is basically why none of the performance gains add up to me.  I
 am thus very concerned that the current non-cache-warming
 implementation may fall flat performance wise.

Ok, I buy your arguments. It does seem unlikely that a DMA offload
without cache warmth will be a net gain. More performance data will
definitely be required.

After digging through PDFs, it seems that the Freescale 85xx (at least,
probably earlier models as well) can warm L2 for the DMA destination
data. However, I don't have any hardware with it to play around
with for benchmarking to see what cache warming might bring (back),
performance-wise.

I think there is still use for a common multi-function DMA framework
across platforms and client components, even if net receive doesn't end
up being {a,the first} user.


-Olof


Congestion Avoidance Monitoring Tools

2006-04-20 Thread Piet Delaney
I'm upgrading our 2.6.12 kernel to 2.6.13, which includes significant
congestion avoidance code additions and changes. I was wondering if
there are any tools folks can recommend for testing the kernel to make
sure the congestion avoidance code is operating correctly. For
example, displaying the congestion window as a function of time
while it converges. For causing congestion I could modify
a kernel to discard packets once in a while on a lab gateway and hit 
it with iperf. HP's netperf looks interesting. 

Any suggestions?


-piet

-- 
---
[EMAIL PROTECTED]
