Re: zero-copy TCP

2000-09-12 Thread Jamie Lokier

Todd wrote:
> While I agree with what's going on right now, the recent purchase of
> Alteon by Nortel (primarily for their switch line, not for the
> NICs) leaves quite a bit of doubt in my mind about the future of the card
> and the openness of the firmware in particular.

Why not raise your concerns on the openfw list?  It would be good for
Alteon/Nortel to know the concerns of their user base, and that
programming the Tigon3 will also be interesting.

On the whole I think Alteon have had very positive feedback from opening
up their current firmware.  Like, code improvements, a better compiler
(results not yet incorporated into the current Linux driver AFAIK), bug
reports and good will from users.

I believe the Tigon2 cards will continue to be manufactured for a while.
It seems to be up to the job (only just!).  Btw, if Alteon close off
future firmware updates for the Tigon2, we can always fork from one of
the released firmware versions.

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: zero-copy TCP

2000-09-12 Thread Jes Sorensen

> "Todd" == Todd  <[EMAIL PROTECTED]> writes:

>> Jes Sorensen wrote:
>> > It took me a little while in the beginning to convince Alteon to open
>> > up and provide docs, but since they saw the light they have been
>> > extremely helpful and went much further in their openness than I had
>> > ever expected or dared to hope for.
>>
>> And now it's really showing in their favour.  An amazing number of
>> research groups are writing applications to run on the Alteon cards.

Todd> While I agree with what's going on right now, the recent
Todd> purchase of Alteon by Nortel (primarily for their switch line,
Todd> not for the NICs) leaves quite a bit of doubt in my mind about
Todd> the future of the card and the openness of the firmware in
Todd> particular.

This is kinda running off topic, but let's give them the benefit of the
doubt. The guys at Alteon knew what they did when they opened up.

Jes



Re: zero-copy TCP

2000-09-12 Thread Todd

> > Jes Sorensen wrote:
> > > It took me a little while in the beginning to convince Alteon to open
> > > up and provide docs, but since they saw the light they have been
> > > extremely helpful and went much further in their openness than I had
> > > ever expected or dared to hope for.
> > 
> > And now it's really showing in their favour.  An amazing number of
> > research groups are writing applications to run on the Alteon cards.

While I agree with what's going on right now, the recent purchase of
Alteon by Nortel (primarily for their switch line, not for the
NICs) leaves quite a bit of doubt in my mind about the future of the card
and the openness of the firmware in particular.

todd




Re: zero-copy TCP

2000-09-12 Thread Alan Cox

> Jes Sorensen wrote:
> > It took me a little while in the beginning to convince Alteon to open
> > up and provide docs, but since they saw the light they have been
> > extremely helpful and went much further in their openness than I had
> > ever expected or dared to hope for.
> 
> And now it's really showing in their favour.  An amazing number of
> research groups are writing applications to run on the Alteon cards.

It's also the gigabit card of choice on Linux currently. And non-research people
do use GigE 8)




Re: zero-copy TCP

2000-09-12 Thread Jamie Lokier

Jes Sorensen wrote:
> It took me a little while in the beginning to convince Alteon to open
> up and provide docs, but since they saw the light they have been
> extremely helpful and went much further in their openness than I had
> ever expected or dared to hope for.

And now it's really showing in their favour.  An amazing number of
research groups are writing applications to run on the Alteon cards.

-- Jamie



Re: zero-copy TCP

2000-09-12 Thread Jes Sorensen

> "Jamie" == Jamie Lokier <[EMAIL PROTECTED]> writes:

Jamie> According to group legend here (I wasn't around but will repeat
Jamie> what I was told), we spent about 1 year trying to get docs on
Jamie> Intel's i960 based gigabit card so we could program it.
Jamie> Eventually we gave up and moved to Alteon, who are very
Jamie> helpful.

Heh

It took me a little while in the beginning to convince Alteon to open
up and provide docs, but since they saw the light they have been
extremely helpful and went much further in their openness than I had
ever expected or dared to hope for.

Jes



Re: zero-copy TCP

2000-09-06 Thread Stephen Williams

Alan Cox wrote:
> I've spent over 2 years trying to extract eepro100 server docs out of Intel
> and failed.

[EMAIL PROTECTED] said:
> Sounds familiar :-) 

Very familiar. I have a whole development environment for the i960RP
(we use the processor on boards we make) and by looking at pictures
I was able to figure out that it uses a standard PCI ethernet chipset
on the secondary bus. I asked Intel, and no one said I could not have
the information I wanted, I just got passed around until I got bored.

All I needed out of Intel was a memory map (there was a PLD that I presume
does address decoding) and a few hints on how the prom monitor worked.

If anybody has that stuff, I have a GPL development environment, built
around gcc, that I can quickly (days?) port to the board, as I have
considerable experience with the i960RP/RD processors.

*sigh* It's pointless, isn't it?
-- 
Steve Williams                "The woods are lovely, dark and deep.
[EMAIL PROTECTED]              But I have promises to keep,
[EMAIL PROTECTED]              and lines to code before I sleep,
http://www.picturel.com       And lines to code before I sleep."





Re: zero-copy TCP

2000-09-05 Thread Alan Cox

> > True the i960 based one I didn't think of, however Intel never
> > provided docs for it.
> 
> ??? I find this surprising.  Email [EMAIL PROTECTED] and ask him if
> they will give them to you.  I'm sure they would for Linux.

I've spent over 2 years trying to extract eepro100 server docs out of Intel
and failed. I've been refused info on how to detect the problematic 820/840
boards with the memory translation hub - which means we now flush many Intel
chipset related bug reports into the 'cannot trust hardware' bucket. Intel
have also consistently refused to document the MSRs that allow you to read
the CPU's intended speed and the bus clock, which would allow us to flag and
note overclocked CPUs in /proc/cpuinfo and help bug reporting.

Intel document what they feel like documenting, on their own agenda for their
own purposes.

Alan




Re: zero-copy TCP

2000-09-05 Thread Jes Sorensen

> "Jeff" == Jeff V Merkey <[EMAIL PROTECTED]> writes:

Jeff> IPX is a really good LAN protocol (but totally sucks for
Jeff> internet).  A full blown NCP server in-kernel that's tightly
Jeff> coupled to the page cache running over IPX would make flames
Jeff> shoot out of the back of a Linux server, and make NT look like
Jeff> an old lady hobbling down the street.  There's no need to
Jeff> configure client addresses with it, and for file and print, it's
Jeff> the best.

IPX is WHAT?

I'd recommend you go look at the switches on your network and note how
they look like flashing Christmas trees - broadcast traffic is not good
for any type of network, be it LAN or WAN.

Jes



Re: zero-copy TCP

2000-09-05 Thread Jes Sorensen

> "Jeff" == Jeff V Merkey <[EMAIL PROTECTED]> writes:

Jeff> Intel Nitro Card - i960 processor on the card.  The SMP
Jeff> debugging involved the use of a bus analyser since this card had
Jeff> a piggish memory bus footprint (i960 processors do not have an
Jeff> IO address space, so everything is memory mapped.  The big
Jeff> weakness of early I2O stuff was that the I960 running off on the
Jeff> boards had access to the memory bus of these early systems.
Jeff> Performace problems that were related to passing messages to the
Jeff> embedded OS running on the i960 on the Nitro Card, and it
Jeff> "missing" messages when two processors were talking to it at
Jeff> once.

True the i960 based one I didn't think of, however Intel never
provided docs for it.

Jes



Re: zero-copy TCP

2000-09-05 Thread Jes Sorensen

> "Jeff" == Jeff V Merkey <[EMAIL PROTECTED]> writes:

Jeff> Jes Sorensen wrote:
>>  True the i960 based one I didn't think of, however Intel never
>> provided docs for it.

Jeff> ??? I find this surprising.  Email [EMAIL PROTECTED] and ask
Jeff> him if they will give them to you.  I'm sure they would for
Jeff> Linux.

Well you obviously never had to deal with some of these companies. I
have little interest in writing a driver for a card which I don't have
and which is very old. I know that Don Becker had problems in the past
getting proper specs for some of the other Intel cards. I don't
know whether he tried to get specs on the Nitro, but I would assume so.
From Don's web page about unsupported boards:

  "No board with an on-board processor is supported, because these
  invariably have a proprietary/undocumented interface. (EEPro Server
  and Matrox multiport PCI switch cards fall into this category.)".

Jes



Re: zero-copy TCP

2000-09-05 Thread Jeff V. Merkey



Jes Sorensen wrote:
> 
> > "Jeff" == Jeff V Merkey <[EMAIL PROTECTED]> writes:
> 
> Jeff> Intel Nitro Card - i960 processor on the card.  The SMP
> Jeff> debugging involved the use of a bus analyser since this card had
> Jeff> a piggish memory bus footprint (i960 processors do not have an
> Jeff> IO address space, so everything is memory mapped).  The big
> Jeff> weakness of early I2O stuff was that the i960 running off on the
> Jeff> boards had access to the memory bus of these early systems.
> Jeff> There were performance problems related to passing messages to the
> Jeff> embedded OS running on the i960 on the Nitro Card, and with it
> Jeff> "missing" messages when two processors were talking to it at
> Jeff> once.
> 
> True the i960 based one I didn't think of, however Intel never
> provided docs for it.

??? I find this surprising.  Email [EMAIL PROTECTED] and ask him if
they will give them to you.  I'm sure they would for Linux.

Jeff

> 
> Jes



Re: zero-copy TCP

2000-09-05 Thread Jeff V. Merkey


Dear Stu,

I apologize for bothering you, but how possible would it be to let the
Linux folks get access to the Nitro docs and docs for Intel ethernet
cards in general.  There seems to be interest from folks in getting this
to improve Linux support for Intel products.

Any help would be appreciated.

Thanks,

Jeff Merkey

Alan Cox wrote:
> 
> > > True the i960 based one I didn't think of, however Intel never
> > > provided docs for it.
> >
> > ??? I find this surprising.  Email [EMAIL PROTECTED] and ask him if
> > they will give them to you.  I'm sure they would for Linux.
> 
> I've spent over 2 years trying to extract eepro100 server docs out of Intel
> and failed. I've been refused info on how to detect the problematic 820/840
> boards with the memory translation hub - which means we now flush many Intel
> chipset related bug reports into the 'cannot trust hardware' bucket. Intel
> have also consistently refused to document the MSRs that allow you to read
> the CPU's intended speed and the bus clock, which would allow us to flag and
> note overclocked CPUs in /proc/cpuinfo and help bug reporting.
> 
> Intel document what they feel like documenting, on their own agenda for their
> own purposes.
> 
> Alan



Re: zero-copy TCP

2000-09-05 Thread Dan Hollis

On Tue, 5 Sep 2000, Henning P . Schmiedehausen wrote:
> On Tue, Sep 05, 2000 at 11:25:12AM -0700, Dan Hollis wrote:
> > I think you mean IPX is dead. Netware *could* work over TCP or UDP.
> > IP is definitely king. Even micro$haft gave up on NetBEUI.
> Yep, that's what I meant. Sorry that I was not clearer. But I think
> that, even with NetWare on IP, there are not many new
> installations. There is lots of migration of existing servers and
> keeping existing systems alive, but new rollouts?
> But then again, maybe with MANOS and OpenNetWare, everything will be
> different.

OpenNetWare could reign king, if they would abandon the legacy cruft.

-Dan




Re: zero-copy TCP

2000-09-05 Thread Dan Hollis

On 5 Sep 2000, Henning P. Schmiedehausen wrote:
> [EMAIL PROTECTED] (Jeff V. Merkey) writes:
> >IPX is a really good LAN protocol (but totally sucks for internet).  A
> Jeff, Netware is dead. Please leave it there. IP won. The number of
> new Netware Installations (as compared to existing or just upgrades)
> is close (really close) to nil.

I think you mean IPX is dead. Netware *could* work over TCP or UDP.
IP is definitely king. Even micro$haft gave up on NetBEUI.

-Dan




Re: zero-copy TCP

2000-09-05 Thread Jeff V. Merkey


Only Linux makes the lights flash with IPX RIP/SAP.  NetWare uses NLSP
routing and has since 1993 for IPX/SPX.  I agree if someone is running
NetWare 3 or NetWare 4.1 or earlier there's a lot of RIP/SAP traffic,
but not the NLSP versions -- they do not use RIP/SAP but NLSP.

Jeff

Jes Sorensen wrote:
> 
> > "Jeff" == Jeff V Merkey <[EMAIL PROTECTED]> writes:
> 
> Jeff> IPX is a really good LAN protocol (but totally sucks for
> Jeff> internet).  A full blown NCP server in-kernel that's tightly
> Jeff> coupled to the page cache running over IPX would make flames
> Jeff> shoot out of the back of a Linux server, and make NT look like
> Jeff> an old lady hobbling down the street.  There's no need to
> Jeff> configure client addresses with it, and for file and print, it's
> Jeff> the best.
> 
> IPX is WHAT?
> 
> I'd recommend you go look at the switches on your network and note how
> they look like flashing Christmas trees - broadcast traffic is not good
> for any type of network, be it LAN or WAN.
> 
> Jes



Re: zero-copy TCP

2000-09-05 Thread Jes Sorensen

> "Jeff" == Jeff V Merkey <[EMAIL PROTECTED]> writes:

Jeff> Only Linux makes the lights flash with IPX RIP/SAP.  NetWare
Jeff> uses NLSP routing and has since 1993 for IPX/SPX.  I agree if
Jeff> someone is running NetWare 3 or NetWare 4.1 or earlier there's a
Jeff> lot of RIP/SAP traffic, but not the NLSP versions -- they do not
Jeff> use RIP/SAP but NLSP.

Well, I looked at the switches for NT installations, not Linux boxes; it
was a sad sight.

Jes



Re: zero-copy TCP

2000-09-05 Thread Ingo Molnar


On Wed, 6 Sep 2000, Chris Wedgwood wrote:

> [...] The point is that a write() is only used if some sort of
> dynamic data is generated on the fly.
> 
> There are existing applications out there that use mmap+write
> (caching the maps), it would be nice for the authors of these not to
> have to _require_ non-portable sendfile semantics for the best
> performance.

this is not just an interface question, mmap()+write() is conceptually
inferior to a sendfile(). [if the goal is to send the same data multiple
times.]

Ingo
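Ingo's point can be made concrete with a minimal, Linux-only sketch (not code from this thread): the same file is sent once via mmap()+write() and once via sendfile(2), over a local socketpair so it runs without a network. With mmap()+write() the data goes through a user-space mapping and each write() copies it from user memory into socket buffers; with sendfile() the kernel takes the data from the page cache itself, and whether it then avoids further copies depends on the socket type and NIC. The helper names (write_all, send_mmap_write, send_sendfile, demo_roundtrip) are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all len bytes from buf, retrying after short writes. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n <= 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* mmap()+write(): the file is mapped into user space, and each write()
 * copies the mapped bytes from user memory into the socket's buffers. */
static int send_mmap_write(int sock, int fd, size_t len)
{
    assert(len > 0);  /* mmap() rejects zero-length mappings */
    char *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED)
        return -1;
    int rc = write_all(sock, map, len);
    munmap(map, len);
    return rc;
}

/* sendfile(): the kernel feeds page-cache data to the socket itself,
 * with no trip through a user-space buffer. */
static int send_sendfile(int sock, int fd, size_t len)
{
    off_t off = 0;  /* advanced by sendfile(); the fd's own offset is untouched */
    while ((size_t)off < len) {
        ssize_t n = sendfile(sock, fd, &off, len - (size_t)off);
        if (n <= 0)
            return -1;
    }
    return 0;
}

/* Usage sketch: push the same file both ways through a local socketpair
 * and check that the receiver sees it twice, intact. Returns 0 on success. */
static int demo_roundtrip(void)
{
    static const char msg[] = "same file, sent twice";
    const size_t len = sizeof msg - 1;
    char path[] = "/tmp/sendfile-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);  /* the open fd keeps the file alive */

    int sv[2];
    if (write_all(fd, msg, len) < 0 || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        close(fd);
        return -1;
    }

    int rc = -1;
    char buf[2 * sizeof msg];
    size_t got = 0;
    if (send_mmap_write(sv[0], fd, len) == 0 && send_sendfile(sv[0], fd, len) == 0) {
        while (got < 2 * len) {
            ssize_t n = read(sv[1], buf + got, 2 * len - got);
            if (n <= 0)
                break;
            got += (size_t)n;
        }
        if (got == 2 * len && memcmp(buf, msg, len) == 0 &&
            memcmp(buf + len, msg, len) == 0)
            rc = 0;
    }
    close(sv[0]);
    close(sv[1]);
    close(fd);
    return rc;
}
```

demo_roundtrip() returns 0 when both copies arrive intact. For a server sending the same file many times, the sendfile() path also removes the need to keep cached mappings around at all.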




Re: zero-copy TCP

2000-09-05 Thread dean gaudet

On Mon, 4 Sep 2000, Jamie Lokier wrote:

> Alan Cox wrote:
> > > It's not faster than card->card DMA, which falls out naturally from my
> > > zero-copy proposal :-)
> > 
> > We already support card->card DMA for routing with fastrouting
> 
> ..but not for user space proxies which was the above's context.
> 
> Still, the fastrouting proves card->card DMA actually works.

this is just a comment from the "real world" (for some definition of
real).

my last job was to take a hosted mail system from 3 million mailboxes to
over 25 million (the new design is targeted for 100 million).  part of
this architecture includes a userspace proxy at the front -- all data goes
through the proxy.

from the point of view of scaling and robustness it never made any sense
for us to put more than about 2Mbyte/s through one of these proxy boxes.  
which is pretty easy to handle even if you're doing another copy of all
the data.  (although my proxy code included a userland zero-copy
implementation, used readv/writev and was otherwise optimised.)

the part which actually broke down at that scale was handling the number
of concurrent connections, not the total bandwidth.

and, alan has pointed this out before -- it's not just concurrent
LAN-speed connections which are of interest.  it's concurrent modem users.  
all the current benchmarks are LAN-style and misrepresent the real-world
by a lot.

i know net connections are getting faster -- but are they ahead of the
moore's law curve or behind it?  it may be better (in the long run) for
linux to work on scaling concurrent connections than work on getting the
last couple percentages out of raw LAN transfers.  but i'm biased because
i've worked in the internet service space for too long :)

unfortunately i think scaling to tens of thousands of modem connections is
going to require some different programming paradigms -- the rt signal
stuff is a start.  but there's probably a lot of wins yet to be had from
deliberately delaying servicing of some connections in order to achieve
better cache usage.  (consider the case of the big static-content FTP/HTTP
server and trying to arrange for a few hundred connections to be on
roughly the same page at roughly the same time.  i think there's wins
there, but the math is hard :)
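The "rt signal stuff" here refers to Linux's fcntl(F_SETSIG): with O_ASYNC set, readiness on a descriptor is queued as a real-time signal whose siginfo carries the fd, which the server dequeues with sigwaitinfo() instead of handing every descriptor to poll(). A minimal sketch follows, with a pipe standing in for one client connection; arm_rtsig and demo_rtsig are hypothetical names. A real server must also cope with queue overflow, in which case the kernel falls back to a plain SIGIO and the server has to rescan its descriptors, for example with poll().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

/* Ask the kernel to report readiness on fd as a queued real-time signal
 * whose siginfo carries the descriptor, instead of a bare SIGIO. */
static int arm_rtsig(int fd, int sig)
{
    int flags;
    if (fcntl(fd, F_SETOWN, getpid()) < 0 || fcntl(fd, F_SETSIG, sig) < 0)
        return -1;
    if ((flags = fcntl(fd, F_GETFL)) < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_ASYNC | O_NONBLOCK);
}

/* Usage sketch: returns 0 if the queued signal named the descriptor that
 * became readable. */
static int demo_rtsig(void)
{
    int p[2], ok;
    int sig = SIGRTMIN;
    sigset_t set;
    siginfo_t si;
    char c;

    if (pipe(p) < 0)
        return -1;

    /* Keep the signal blocked so it stays queued for sigwaitinfo(). It is
     * deliberately left blocked afterwards: its default action would
     * terminate the process. */
    sigemptyset(&set);
    sigaddset(&set, sig);
    ok = sigprocmask(SIG_BLOCK, &set, NULL) == 0 && arm_rtsig(p[0], sig) == 0
         && write(p[1], "x", 1) == 1          /* make the read end readable */
         && sigwaitinfo(&set, &si) == sig     /* dequeue one readiness event */
         && si.si_fd == p[0]                  /* it names the ready descriptor */
         && read(si.si_fd, &c, 1) == 1;       /* service just that descriptor */

    close(p[0]);  /* close the O_ASYNC end first */
    close(p[1]);
    return ok ? 0 : -1;
}
```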

-dean




Re: zero-copy TCP

2000-09-05 Thread Henning P. Schmiedehausen

[EMAIL PROTECTED] (Jeff V. Merkey) writes:



>Linus Torvalds wrote:
>> 
>> 
>> 
>> Basically, only TCP and UDP really matter. Decnet, IPX, etc don't really
>> make a big selling point any more.
>> 
>> 

>Linus,

>IPX is a really good LAN protocol (but totally sucks for internet).  A
>full blown NCP server in-kernel that's tightly coupled to the page
>cache running over IPX would make flames shoot out of the back of a
>Linux server, and make NT look like an old lady hobbling down the
>street.  There's no need to configure client addresses with it, and for
>file and print, it's the best.

And it would be a good bit of necrophilia, too.

Jeff, Netware is dead. Please leave it there. IP won. The number of
new Netware Installations (as compared to existing or just upgrades)
is close (really close) to nil.

Regards
Henning
-- 
Dipl.-Inf. (Univ.) Henning P. Schmiedehausen   -- Geschaeftsfuehrer
INTERMETA - Gesellschaft fuer Mehrwertdienste mbH [EMAIL PROTECTED]

Am Schwabachgrund 22  Fon.: 09131 / 50654-0   [EMAIL PROTECTED]
D-91054 Buckenhof Fax.: 09131 / 50654-20   



Re: zero-copy TCP

2000-09-05 Thread Derek Fawcus

On Tue, Sep 05, 2000 at 04:35:43AM -0600, Jeff V. Merkey wrote:
> 
> IPX is a really good LAN protocol (but totally sucks for internet). A
> full blown NCP server in-kernel that's tightly coupled to the page
> cache running over IPX would make flames shoot out of the back of a
> Linux server, and make NT look like an old lady hobbling down the
> street.  There's no need to configure client addresses with it, and for
> file and print, it's the best.

  I'd still prefer it over UDP.  A simple UDP/IP stack should be of a
comparable size to an IPX stack (if one wanted DOS support).

  As for the configuration - you could use BOOTP/DHCP to get addresses
or alternatively use IPv6 and link-local addresses.
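As a sketch of why IPv6 link-local addressing needs no per-client configuration: a host can form its own fe80::/64 address from its MAC using the modified EUI-64 interface identifier (flip the universal/local bit of the first octet and insert ff:fe in the middle). The function name is hypothetical, and some stacks generate interface identifiers differently, for example for privacy.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Build the fe80::/64 link-local address for an interface from its 48-bit
 * MAC via the modified EUI-64 interface identifier, and write its textual
 * form to out. Returns 0 on success. */
static int mac_to_link_local(const uint8_t mac[6], char out[INET6_ADDRSTRLEN])
{
    struct in6_addr a;
    assert(mac != NULL && out != NULL);
    memset(&a, 0, sizeof a);
    a.s6_addr[0] = 0xfe;                        /* fe80::/64 prefix; bytes 2..7 stay 0 */
    a.s6_addr[1] = 0x80;
    a.s6_addr[8] = (uint8_t)(mac[0] ^ 0x02);    /* universal/local bit flipped */
    a.s6_addr[9] = mac[1];
    a.s6_addr[10] = mac[2];
    a.s6_addr[11] = 0xff;                       /* inserted ff:fe */
    a.s6_addr[12] = 0xfe;
    a.s6_addr[13] = mac[3];
    a.s6_addr[14] = mac[4];
    a.s6_addr[15] = mac[5];
    return inet_ntop(AF_INET6, &a, out, INET6_ADDRSTRLEN) != NULL ? 0 : -1;
}
```

For the MAC 00:11:22:33:44:55 this yields fe80::211:22ff:fe33:4455.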

DF



Re: zero-copy TCP

2000-09-05 Thread Jeff V. Merkey



Ingo Molnar wrote:
> 
> On Tue, 5 Sep 2000, Jeff V. Merkey wrote:
> 
> > The origin of this comment was related to a comparison of the
> > MSM/TSM/CSM layer in NetWare and Linux. I've already said that Alan's
> > code handles fast paths well and from what I've seen is comparable to
> > NetWare. [...]
> 
> can we thus take this as a retraction of your below quoted three
> derogatory comments?
> 
> " The entire Linux Network subsystem needs an overhaul. "

To support the performance metrics of NetWare, there are some changes I
will make that will allow Alan's code to beat Native NetWare.  One is
allowing pre-scan protocol stacks to exist.  Another is a WTD
optimization to allow Alan's code to tag pages in the page cache and
post them with a preemptive IO WTD.  Another is moving ALL of the
routing code into the kernel space.  Another is consolidation of bottom
and top halves to allow a single interrupt thread to run all the way into
the router and out without the need to schedule.  Another is moving the
NCP server into the kernel.   Another is enabling "gang" tagging and
release of a single cache page by hundreds or thousands of users at one
time for incoming reads.   The list is very long.

> 
> " In networking, the enemy is LATENCY for fast performance.  That's why
>   NetWare can handle 5000 users and Linux barfs on 100 in similar tests.
>   Copying increases latency, and the long code paths in the Linux Network
>   layer. "
> 
> " Alan, Please.  I'm in your code and there are copies all over the
>   place.  I agree you have a "fast path" for most stuff, but there's all
>   kinds of handles lookups, linear list searching like
> 
>   while (x)
>   {
> x = x->next;
>   }
> 
>   all over the place that increases latency. "
> 
> Ingo

I already said this code is more than suitable, and better yet, it's
something folks are familiar with in Linux.  Alan and I went over some
of this off line.  Sorry you missed it.

Jeff



Re: zero-copy TCP

2000-09-05 Thread Ingo Molnar


On Tue, 5 Sep 2000, Jeff V. Merkey wrote:

> The origin of this comment was related to a comparison of the
> MSM/TSM/CSM layer in NetWare and Linux. I've already said that Alan's
> code handles fast paths well and from what I've seen is comparable to
> NetWare. [...]

can we thus take this as a retraction of your below quoted three
derogatory comments?

" The entire Linux Network subsystem needs an overhaul. "

" In networking, the enemy is LATENCY for fast performance.  That's why
  NetWare can handle 5000 users and Linux barfs on 100 in similar tests.
  Copying increases latency, and the long code paths in the Linux Network
  layer. "


" Alan, Please.  I'm in your code and there are copies all over the
  place.  I agree you have a "fast path" for most stuff, but there's all
  kinds of handles lookups, linear list searching like

  while (x)
  {
      x = x->next;
  }
 
  all over the place that increases latency. "
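The complaint quoted above is about walking a list to find an entry. As a rough userspace sketch (not the kernel's actual code; `struct conn`, `list_find`, `hash_insert` and `hash_find` are hypothetical names), the difference between a plain linear walk and a hashed lookup looks like this. The hashed version still runs the same `x = x->next` style loop; what changes is that it only walks one short bucket chain instead of every entry.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical connection record, keyed on (addr, port). */
struct conn {
    uint32_t addr;
    uint16_t port;
    struct conn *next;   /* next entry in the same list or bucket */
};

/* Linear search: cost grows with the total number of entries. */
struct conn *list_find(struct conn *head, uint32_t addr, uint16_t port)
{
    for (struct conn *c = head; c != NULL; c = c->next)
        if (c->addr == addr && c->port == port)
            return c;
    return NULL;
}

#define NBUCKETS 256u   /* power of two, so masking picks a bucket */

/* Illustrative mixing function, not the kernel's hash. */
static unsigned bucket_of(uint32_t addr, uint16_t port)
{
    uint32_t h = addr ^ ((uint32_t)port << 16) ^ port;
    h ^= h >> 16;
    return h & (NBUCKETS - 1);
}

/* Push c onto the front of its bucket chain. */
void hash_insert(struct conn *table[NBUCKETS], struct conn *c)
{
    unsigned b = bucket_of(c->addr, c->port);
    c->next = table[b];
    table[b] = c;
}

/* Hashed search: the same kind of list walk, but over one short chain. */
struct conn *hash_find(struct conn *table[NBUCKETS], uint32_t addr, uint16_t port)
{
    return list_find(table[bucket_of(addr, port)], addr, port);
}
```

With a reasonable hash and table size, the expected chain length stays near the number of entries divided by `NBUCKETS`, so the cost of the walk no longer grows with the total connection count.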

Ingo




Re: zero-copy TCP

2000-09-05 Thread Jeff V. Merkey


You opened your mouth.  

:-)

Jeff

Ingo Molnar wrote:
> 
> btw., - the maintainers of the 2.4 networking and TCP/IP code are Alexey
> Kuznetsov and David S. Miller - please direct your findings towards them,
> not me :-)
> 
> Ingo



Re: zero-copy TCP

2000-09-05 Thread Jeff V. Merkey


The origin of this comment was related to a comparison of the
MSM/TSM/CSM layer in NetWare and Linux.  I've already said that Alan's
code handles fast paths well and from what I've seen is comparable to
NetWare.  The areas I saw were sideband cases and issues of fragment
re-assembly.  It's as good as what's in NetWare.  

Jeff

Ingo Molnar wrote:
> 
> On Tue, 5 Sep 2000, Jeff V. Merkey wrote:
> 
> > Alright Ingo, you asked for it. I am going through it now and going
> > over ALL my notes. I will catalog ALL of them and post it. Is this
> > what you really want?
> 
> yes, this would be the best indeed, to get those places fixed. But if you
> don't want to spend your time on that then it's enough to just post a
> single incident of such inefficiency and list-walking that impacts latency
> like you claim.
> 
> Ingo



Re: zero-copy TCP

2000-09-05 Thread Ingo Molnar


btw., - the maintainers of the 2.4 networking and TCP/IP code are Alexey
Kuznetsov and David S. Miller - please direct your findings towards them,
not me :-)

Ingo




Re: zero-copy TCP

2000-09-05 Thread Ingo Molnar


On Tue, 5 Sep 2000, Jeff V. Merkey wrote:

> Alright Ingo, you asked for it. I am going through it now and going
> over ALL my notes. I will catalog ALL of them and post it. Is this
> what you really want?

yes, this would be the best indeed, to get those places fixed. But if you
don't want to spend your time on that then it's enough to just post a
single incident of such inefficiency and list-walking that impacts latency
like you claim.

Ingo




Re: zero-copy TCP

2000-09-05 Thread Jeff V. Merkey



Alright Ingo, you asked for it.  I am going through it now and going
over ALL my notes.  I will catalog ALL of them and post it.  Is this
what you really want?  

:-)

Jeff


Ingo Molnar wrote:
> 
> On Tue, 5 Sep 2000, Jeff V. Merkey wrote:
> 
> > > > while (x)
> > > > {
> > > >   x = x->next;
> > > > }
> > > >
> > > > all over the place that increases latency. [...]
> > >
> > > i challenge you to show one such place in the 2.4.0-test8-pre2 kernel. If
> > > it's all over the place and if it increases latency, you certainly can
> > > show at least one such place.
> >
> > When I have time to do this exercise, I will. [...]
> 
> well, your original claim (quoted above) shows that you have identified
> numerous such places already, so you don't have to do any additional
> 'exercise'. The "all over the place" code shouldn't be too hard to find
> again - please just say filename and line number in any kernel version of
> your choice and we'll look into it.
> 
> Ingo



Re: zero-copy TCP

2000-09-05 Thread Jeff V. Merkey



Linus Torvalds wrote:
> 
> 
> 
> Basically, only TCP and UDP really matter. Decnet, IPX, etc don't really
> make a big selling point any more.
> 
> 

Linus,

IPX is a really good LAN protocol (but totally sucks for internet).  A
full blown NCP server in-kernel that's tightly coupled to the page
cache running over IPX would make flames shoot out of the back of a
Linux server, and make NT look like an old lady hobbling down the
street.  There's no need to configure client addresses with it, and for
file and print, it's the best.

Jeff



Re: zero-copy TCP

2000-09-05 Thread Ingo Molnar


On Tue, 5 Sep 2000, Jeff V. Merkey wrote:

> > > while (x)
> > > {
> > >   x = x->next;
> > > }
> > >
> > > all over the place that increases latency. [...]
> > 
> > i challenge you to show one such place in the 2.4.0-test8-pre2 kernel. If
> > it's all over the place and if it increases latency, you certainly can
> > show at least one such place.
> 
> When I have time to do this exercise, I will. [...]

well, your original claim (quoted above) shows that you have identified
numerous such places already, so you don't have to do any additional
'exercise'. The "all over the place" code shouldn't be too hard to find
again - please just say filename and line number in any kernel version of
your choice and we'll look into it.

Ingo




Re: zero-copy TCP

2000-09-05 Thread Henning P. Schmiedehausen

[EMAIL PROTECTED] (Jeff V. Merkey) writes:

>Linus Torvalds wrote:
>> 
>> Basically, only TCP and UDP really matter. Decnet, IPX, etc don't really
>> make a big selling point any more.
>> 
>
>Linus,
>
>IPX is a really good LAN protocol (but totally sucks for internet).  A
>full blown NCP server in-kernel that's tightly coupled to the page
>cache running over IPX would make flames shoot out of the back of a
>Linux server, and make NT look like an old lady hobbling down the
>street.  There's no need to configure client addresses with it, and for
>file and print, it's the best.

And it would be a good bit of necrophilia, too.

Jeff, Netware is dead. Please leave it there. IP won. The number of
new Netware Installations (as compared to existing or just upgrades)
is close (really close) to nil.

Regards
Henning
-- 
Dipl.-Inf. (Univ.) Henning P. Schmiedehausen   -- Geschaeftsfuehrer
INTERMETA - Gesellschaft fuer Mehrwertdienste mbH [EMAIL PROTECTED]

Am Schwabachgrund 22  Fon.: 09131 / 50654-0   [EMAIL PROTECTED]
D-91054 Buckenhof Fax.: 09131 / 50654-20   



Re: zero-copy TCP

2000-09-05 Thread dean gaudet

On Mon, 4 Sep 2000, Jamie Lokier wrote:

> Alan Cox wrote:
> > > It's not faster than card-card DMA, which falls out naturally from my
> > > zero-copy proposal :-)
> > 
> > We already support card-card DMA for routing with fastrouting
> 
> ..but not for user space proxies which was the above's context.
> 
> Still, the fastrouting proves card-card DMA actually works.

this is just a comment from the "real world" (for some definition of
real).

my last job was to take a hosted mail system from 3 million mailboxes to
over 25 million (the new design is targeted for 100 million).  part of
this architecture includes a userspace proxy at the front -- all data goes
through the proxy.

from the point of view of scaling and robustness it never made any sense
for us to put more than about 2Mbyte/s through one of these proxy boxes.  
which is pretty easy to handle even if you're doing another copy of all
the data.  (although my proxy code included a userland zero-copy
implementation, used readv/writev and was otherwise optimised.)
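For readers unfamiliar with the readv/writev approach mentioned above, here is a minimal POSIX sketch (the helper names `writev_all` and `send_header_and_body` are hypothetical, not taken from the proxy described). A header and a body held in separate buffers go out in one `writev()` call, so the proxy never has to copy them into a single contiguous buffer first; the loop handles short writes by advancing through the `iovec` array.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Write everything described by iov[0..iovcnt-1], retrying after short
 * writes. Returns 0 on success, -1 on error (errno from writev()).
 * The iov array is modified as it is consumed. */
int writev_all(int fd, struct iovec *iov, int iovcnt)
{
    while (iovcnt > 0) {
        ssize_t n = writev(fd, iov, iovcnt);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        /* Drop the iovecs that were written completely... */
        while (iovcnt > 0 && (size_t)n >= iov->iov_len) {
            n -= (ssize_t)iov->iov_len;
            iov++;
            iovcnt--;
        }
        /* ...and advance into the one that was only partly written. */
        if (iovcnt > 0) {
            iov->iov_base = (char *)iov->iov_base + n;
            iov->iov_len -= (size_t)n;
        }
    }
    return 0;
}

/* Send a header and a body from separate buffers without first
 * copying them into one contiguous buffer. */
int send_header_and_body(int fd, const void *hdr, size_t hlen,
                         const void *body, size_t blen)
{
    struct iovec iov[2] = {
        { .iov_base = (void *)hdr,  .iov_len = hlen },
        { .iov_base = (void *)body, .iov_len = blen },
    };
    return writev_all(fd, iov, 2);
}
```

On a blocking socket the retry loop rarely runs more than once, but a proxy using non-blocking sockets would need to keep the partially advanced `iovec` array around and resume once the descriptor becomes writable again.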

the part which actually broke down at that scale was handling the number
of concurrent connections, not the total bandwidth.

and, alan has pointed this out before -- it's not just concurrent
LAN-speed connections which are of interest.  it's concurrent modem users.  
all the current benchmarks are LAN-style and misrepresent the real-world
by a lot.

i know net connections are getting faster -- but are they ahead of the
moore's law curve or behind it?  it may be better (in the long run) for
linux to work on scaling concurrent connections than work on getting the
last couple percentages out of raw LAN transfers.  but i'm biased because
i've worked in the internet service space for too long :)

unfortunately i think scaling to tens of thousands of modem connections is
going to require some different programming paradigms -- the rt signal
stuff is a start.  but there's probably a lot of wins yet to be had from
deliberately delaying servicing of some connections in order to achieve
better cache usage.  (consider the case of the big static-content FTP/HTTP
server and trying to arrange for a few hundred connections to be on
roughly the same page at roughly the same time.  i think there's wins
there, but the math is hard :)
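The "rt signal stuff" refers to Linux signal-driven I/O with queued real-time signals: `F_SETOWN`, `O_ASYNC` and the Linux-specific `F_SETSIG` via `fcntl()`. A minimal sketch, assuming Linux with glibc and `_GNU_SOURCE`; `arm_rt_signal` and `wait_ready_fd` are hypothetical helper names. Each readiness event is queued as a separate signal carrying the descriptor in `si_fd`, so a server can learn which of many connections became ready without scanning them all.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

/* Route readiness events for fd to the real-time signal sig.
 * The caller must already have blocked sig so events are queued. */
int arm_rt_signal(int fd, int sig)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    if (fcntl(fd, F_SETOWN, getpid()) < 0)
        return -1;
    if (fcntl(fd, F_SETSIG, sig) < 0)       /* Linux-specific */
        return -1;
    return fcntl(fd, F_SETFL, flags | O_ASYNC | O_NONBLOCK);
}

/* Wait up to timeout_ms for one queued event; return the ready fd,
 * or -1 on timeout or error. */
int wait_ready_fd(int sig, long timeout_ms)
{
    sigset_t set;
    siginfo_t info;
    struct timespec ts = { timeout_ms / 1000, (timeout_ms % 1000) * 1000000L };

    sigemptyset(&set);
    sigaddset(&set, sig);
    if (sigtimedwait(&set, &info, &ts) != sig)
        return -1;
    return info.si_fd;   /* filled in because F_SETSIG was used */
}
```

The signal must be blocked (for example with `sigprocmask()`) before arming the descriptor, so events queue up for `sigtimedwait()` instead of going to a handler. If the real-time signal queue overflows, the kernel falls back to plain `SIGIO`, so a server built this way still needs a recovery path such as a full `poll()` pass.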

-dean




Re: zero-copy TCP

2000-09-05 Thread Ingo Molnar


On Wed, 6 Sep 2000, Chris Wedgwood wrote:

> [...] The point is that a write() is only used if some sort of
> dynamic data is generated on the fly.
> 
> There are existing applications out there that use mmap+write
> (caching the maps), it would be nice for the authors of these not to
> have to _require_ non-portable sendfile semantics for the best
> performance.

this is not just an interface question, mmap()+write() is conceptually
inferior to a sendfile(). [if the goal is to send the same data multiple
times.]
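To make the comparison concrete, here is a hedged Linux sketch of both ways of pushing a file's contents into a socket. `send_path_mmap_write` and `send_path_sendfile` are hypothetical names and error handling is minimal. With mmap()+write() the data is copied from the mapped pages into socket buffers on every write() and the process has to manage the mapping; with sendfile() the kernel moves page-cache data to the socket itself and no user-space buffer or mapping is involved.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Variant 1: map the file, then write() the mapping to the socket. */
ssize_t send_path_mmap_write(int sock, const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    if (len == 0) {
        close(fd);
        return 0;
    }
    char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                       /* the mapping stays valid */
    if (p == MAP_FAILED)
        return -1;
    size_t done = 0;
    while (done < len) {             /* socket layer copies each chunk */
        ssize_t n = write(sock, p + done, len - done);
        if (n < 0)
            break;
        done += (size_t)n;
    }
    munmap(p, len);
    return done == len ? (ssize_t)done : -1;
}

/* Variant 2: let the kernel feed page-cache data to the socket. */
ssize_t send_path_sendfile(int sock, const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    off_t off = 0;                   /* sendfile() advances this */
    while (off < st.st_size) {
        ssize_t n = sendfile(sock, fd, &off, (size_t)(st.st_size - off));
        if (n <= 0)
            break;
    }
    close(fd);
    return off == st.st_size ? (ssize_t)off : -1;
}
```

When the same file is sent many times, the sendfile() variant never touches the data from user space at all, which is the case Ingo's bracketed condition is about.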

Ingo




Re: zero-copy TCP

2000-09-05 Thread Jes Sorensen

> "Jeff" == Jeff V Merkey <[EMAIL PROTECTED]> writes:

Jeff> Only Linux makes the lights flash with IPX RIP/SAP.  NetWare
Jeff> uses NLSP routing and has since 1993 for IPX/SPX.  I agree if
Jeff> someone is running NetWare 3 or NetWare 4.1 or earlier there's a
Jeff> lot of RIP/SAP traffic, but not the NLSP versions -- they do not
Jeff> use RIP/SAP but NLSP.

Well, I looked at the switches for NT installations, not Linux boxes; it
was a sad sight.

Jes



Re: zero-copy TCP

2000-09-05 Thread Jeff V. Merkey


Only Linux makes the lights flash with IPX RIP/SAP.  NetWare uses NLSP
routing and has since 1993 for IPX/SPX.  I agree if someone is running
NetWare 3 or NetWare 4.1 or earlier there's a lot of RIP/SAP traffic,
but not the NLSP versions -- they do not use RIP/SAP but NLSP.

Jeff

Jes Sorensen wrote:
> 
> > "Jeff" == Jeff V Merkey <[EMAIL PROTECTED]> writes:
> 
> Jeff> IPX is a really good LAN protocol (but totally sucks for
> Jeff> internet).  A full blown NCP server in-kernel that's tightly
> Jeff> coupled to the page cache running over IPX would make flames
> Jeff> shoot out of the back of a Linux server, and make NT look like
> Jeff> an old lady hobbling down the street.  There's no need to
> Jeff> configure client addresses with it, and for file and print, it's
> Jeff> the best.
> 
> IPX is WHAT?
> 
> I'd recommend you go look at the switches on your network and note how
> they look like flashing Christmas trees - broadcast traffic is not good
> for any type of network, be it LAN or WAN.
> 
> Jes



Re: zero-copy TCP

2000-09-05 Thread Dan Hollis

On 5 Sep 2000, Henning P. Schmiedehausen wrote:
> [EMAIL PROTECTED] (Jeff V. Merkey) writes:
> >IPX is a really good LAN protocol (but totally sucks for internet).  A
> Jeff, Netware is dead. Please leave it there. IP won. The number of
> new Netware Installations (as compared to existing or just upgrades)
> is close (really close) to nil.

I think you mean IPX is dead. Netware *could* work over TCP or UDP.
IP is definitely king. Even micro$haft gave up on NetBEUI.

-Dan




Re: zero-copy TCP

2000-09-05 Thread Dan Hollis

On Tue, 5 Sep 2000, Henning P. Schmiedehausen wrote:
> On Tue, Sep 05, 2000 at 11:25:12AM -0700, Dan Hollis wrote:
> > I think you mean IPX is dead. Netware *could* work over TCP or UDP.
> > IP is definitely king. Even micro$haft gave up on NetBEUI.
> Yep, that's what I meant. Sorry that I was not clearer. But I think
> that even with NetWare on IP there are not many new
> installations. There is lots of migration of existing servers and
> keeping existing systems alive, but new rollouts?
> But then again, maybe with MANOS and OpenNetWare, everything will be
> different.

OpenNetWare could reign king, if they would abandon the legacy cruft.

-Dan




Re: zero-copy TCP

2000-09-05 Thread Jeff V. Merkey


Dear Stu,

I apologize for bothering you, but how possible would it be to let the
Linux folks get access to the Nitro docs and docs for Intel ethernet
cards in general.  There seems to be interest from folks in getting this
to improve Linux support for Intel products.

Any help would be appreciated.

Thanks,

Jeff Merkey

Alan Cox wrote:
> 
> > > True the i960 based one I didn't think of, however Intel never
> > > provided docs for it.
> >
> > ??? I find this surprising.  Email [EMAIL PROTECTED] and ask him if
> > they will give them to you.  I'm sure they would for Linux.
> 
> I've spent over 2 years trying to extract eepro100 server docs out of Intel
> and failed. I've been refused info on how to detect the problematic 820/840
> boards with the memory translation hub - which means we now flush many intel
> chipset related bug reports into the 'cannot trust hardware' bucket. Intel
> have also consistently refused to document the MSR's that allow you to read
> the CPU intended speed and the bus clock that would allow us to flag and
> note overclocked CPU's on the /proc/cpuinfo and help bug reporting.
> 
> Intel document what they feel like documenting, on their own agenda for their
> own purposes.
> 
> Alan



Re: zero-copy TCP

2000-09-05 Thread Alan Cox

> > True the i960 based one I didn't think of, however Intel never
> > provided docs for it.
> 
> ??? I find this surprising.  Email [EMAIL PROTECTED] and ask him if
> they will give them to you.  I'm sure they would for Linux.

I've spent over 2 years trying to extract eepro100 server docs out of Intel
and failed. I've been refused info on how to detect the problematic 820/840
boards with the memory translation hub - which means we now flush many intel
chipset related bug reports into the 'cannot trust hardware' bucket. Intel
have also consistently refused to document the MSR's that allow you to read
the CPU intended speed and the bus clock that would allow us to flag and
note overclocked CPU's on the /proc/cpuinfo and help bug reporting.

Intel document what they feel like documenting, on their own agenda for their
own purposes.

Alan




Re: zero-copy TCP

2000-09-05 Thread Jes Sorensen

> "Jeff" == Jeff V Merkey <[EMAIL PROTECTED]> writes:

Jeff> IPX is a really good LAN protocol (but totally sucks for
Jeff> internet).  A full blown NCP server in-kernel that's tightly
Jeff> coupled to the page cache running over IPX would make flames
Jeff> shoot out of the back of a Linux server, and make NT look like
Jeff> an old lady hobbling down the street.  There's no need to
Jeff> configure client addresses with it, and for file and print, it's
Jeff> the best.

IPX is WHAT?

I'd recommend you go look at the switches on your network and note how
they look like flashing Christmas trees - broadcast traffic is not good
for any type of network, be it LAN or WAN.

Jes



Re: zero-copy TCP

2000-09-04 Thread Dan Kegel

[EMAIL PROTECTED] wrote:
> A few months ago, I gathered precise data and posted it on lk-ml. In our 
> experiment, I used four 100Base cards; the Web-Bench gained nearly 5% 
> performance with the patch. The CPU load reached over 95%. 
> 
> I want to point to the experiment results, but unfortunately the 
> lk-ml archive, www.kernelnotes.org, seems not to be working now. 

Here's a link to an archive of kumon's post, plus a few other related
posts.  It might be worth looking at these even if you've already
read kumon's quote of his post.

http://www.kegel.com/mindcraft_redux.html#csum_partial_copy_generic

- Dan



Re: zero-copy TCP

2000-09-04 Thread Linus Torvalds



On Mon, 4 Sep 2000, Jamie Lokier wrote:

> Linus Torvalds wrote:
> 
> > Basically any copy <= 4 cache lines is "free" compared to trying to be
> > clever.
> 
> We're obviously interested in larger packets than 128 bytes.

"obviously"?

Take a look at some common traffic. Yes, even in servers. 

Small packets are not unlikely.

And even if you add extra code to _not_ do it for small packets, what
you're basically doing is slowing down the path - and increasing your
icache footprint etc quite noticeably, because suddenly you have two
mostly independent paths.

And the bugs get interesting too.

Also, you ignored the fact that the 128 can be 256 or 512. AND you ignored
the fact that I was being very generous indeed to the page table walking
case. So the 128 can end up being quite close to the page size. Or more.
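One plausible reading of the arithmetic here: four cache lines are 128 bytes with 32-byte lines, 256 bytes with 64-byte lines and 512 bytes with 128-byte lines. The hedged sketch below turns that into a "copy small buffers, consider cleverness only for large ones" rule; the names and the fixed factor of four are illustrative assumptions. Many Linux network drivers apply a similar idea on receive under the name `rx_copybreak`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative assumption: copies of up to this many cache lines are
 * treated as cheaper than setting up any zero-copy machinery. */
#define COPY_LINES 4

/* Byte threshold for a given cache-line size. */
static size_t copy_break(size_t cache_line)
{
    return COPY_LINES * cache_line;
}

/* true: just copy the data; false: large enough that page-based
 * tricks might be worth their extra code path. */
bool should_copy(size_t len, size_t cache_line)
{
    return len <= copy_break(cache_line);
}
```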

> That's why the data is DMA'd to the card immediately, and that _card_
> retains the data at least for the short term.  Long term if it's still
> retained and the card runs out of memory, the card DMAs old buffers back
> to a kernel skbuff.  This is one way to avoid TLBs.

Yeah.

And such a card doesn't actually exist. At least not in copious numbers.

You need megabytes of memory to hold stuff like that for a busy server. A
few hundred connections with a few kB of queued-up, unacked data.
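A back-of-the-envelope version of that estimate, using illustrative numbers chosen here for "a few hundred connections" and "a few kB" each (not figures from the thread):

```latex
\[
  300 \times 4\,\text{KiB} = 1200\,\text{KiB} \approx 1.2\,\text{MB},
  \qquad
  500 \times 8\,\text{KiB} = 4000\,\text{KiB} \approx 3.9\,\text{MiB}
\]
```

Either way the total lands in the megabyte range, and all of it would have to stay on the card for the card-retains-the-data scheme to work.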

Face it, it's not realistic. Sure, such cards exist. How many people have
ever _seen_ one?

> Some people who claim zero-copy is great have done actual measurements
> and it does look good for reasonable size packets.  Even though the raw
> performance doesn't look that much better, the CPU utilisation does so
> you can actually _calculate_ a bit more with your data.
> 
> However, I've not seen any evidence that it's a good idea with the
> standard unix APIs.  I suspect everyone will agree on that :-)

Blame the API's.

Face it, complexity almost _never_ pays off.

Linus




Re: zero-copy TCP

2000-09-04 Thread Jes Sorensen

> "Richard" == Richard Gooch <[EMAIL PROTECTED]> writes:

Richard> I thought you said some of the GigE drivers supported this?
Richard> Or were you just saying that the GigE cards were some of the
Richard> few which supported scatter/gather DMA and IP checksumming?

The latter.

Jes



Re: zero-copy TCP

2000-09-04 Thread Stephen C. Tweedie

Hi,

On Sun, Sep 03, 2000 at 07:29:56PM +0200, Ingo Molnar wrote:
> 
> On Sun, 3 Sep 2000, Andi Kleen wrote:
> 
> > I did the same for fragment RX some months ago (simple fragment lists
> > that were copy-checksummed to user space). Overall it is probably
> > better to use a kiovec, because that can be more easily used in nfsd
> > and sendfile.
> 
> the basic fragment type introduced by the TUX changes is a 'struct
> skb_frag', which has csum, size, *page, page_offset, frag_done, *data and
> *private fields - this is more than normal kiovecs offer. But i think
> kiovecs can be extended to do all this (if Stephen & everybody else
> agrees), i just didn't want to touch it for the time being.

I don't want to extend kiobufs for that sort of thing, since the
entire point of having kiobufs is to have a uniform container with
which to pass information between different kernel components.  If you
need more data, you'd do something like the SGI kiobuf-based block IO
stack does --- use a dedicated struct request, but use a pointer to a
kiobuf as the data location within that request struct.

In principle I'd think it would be a lot easier to add a kiovec
pointer to an skbuff than to extend kiobufs to be suitable for the
networking stack (and we had a BOF on this at OLS --- it seemed quite
feasible).
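Stephen's suggestion, a networking-specific buffer that merely points at a generic page container instead of growing that container, can be sketched roughly as follows. Everything here is hypothetical: `struct page_vec` stands in for a kiobuf-like container and `struct net_buf` for an skbuff, and neither mirrors the real kernel layouts. The point is only that networking-only fields such as the checksum stay in the networking structure while the shared container stays uniform.

```c
#include <stddef.h>

/* Hypothetical kiobuf-like container: pages plus offset and length,
 * with nothing specific to any one subsystem. */
struct page_vec {
    int nr_pages;
    void **pages;        /* each entry points at one page of data */
    size_t offset;       /* byte offset into the first page */
    size_t length;       /* total payload bytes described */
};

/* Hypothetical network buffer: protocol-specific state lives here,
 * bulk payload is referenced through the shared container. */
struct net_buf {
    unsigned char header[64];   /* linear area for protocol headers */
    size_t header_len;
    unsigned int csum;          /* networking-only metadata */
    struct page_vec *payload;   /* may be NULL for header-only packets */
};

/* Total on-the-wire length: headers plus any attached payload. */
size_t net_buf_len(const struct net_buf *nb)
{
    return nb->header_len + (nb->payload ? nb->payload->length : 0);
}
```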

--Stephen



Re: zero-copy TCP

2000-09-04 Thread Richard Gooch

Jes Sorensen writes:
> > "Richard" == Richard Gooch <[EMAIL PROTECTED]> writes:
> 
> Richard> Andrew Morton writes:
> >> All of them except the 3c905 provide hardware Rx and Tx
> >> checksumming of IP, TCP and UDP headers.  No 64 bit addressing
> >> support.
> 
> Richard> And does the driver support it? Has anyone benchmarked the
> Richard> performance difference (if any)?
> 
> There isn't much gain from using it when we can't do zero copy xmits
> in the first place. It might be worth enabling for receive though.

I thought you said some of the GigE drivers supported this? Or were
you just saying that the GigE cards were some of the few which
supported scatter/gather DMA and IP checksumming?

Regards,

Richard
Permanent: [EMAIL PROTECTED]
Current:   [EMAIL PROTECTED]



Re: zero-copy TCP

2000-09-04 Thread Jes Sorensen

> "Ingo" == Ingo Molnar <[EMAIL PROTECTED]> writes:

Ingo> On Mon, 4 Sep 2000 [EMAIL PROTECTED] wrote:

>> The experiment showed the following prefetching could reduce 20-30%
>> of csum_partial_copy_generic() execution time.

Ingo> Please test it and post the numbers. csum_partial_copy_generic()
Ingo> already does prefetching - the real test would be to check
Ingo> lat_tcp and bw_tcp numbers over gigabit, with and without this
Ingo> patch applied. (the same numbers over localhost dont really
Ingo> count.) Eg. we had smart KNI-based memcpy routines as well, and
Ingo> it turned out that bw_tcp over gigabit actually got
Ingo> slower. [testing over 100mbit isn't enough obviously because x86
Ingo> CPUs csum much faster than that.]

That's a surprise to me; I remember better performance when I played
with Doug's KNI patches back in the beginning.
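For context on what csum_partial_copy_generic() does: it copies data and computes the Internet checksum in the same pass, so each byte is loaded from memory only once. The portable C below is a hedged illustration of that idea using the RFC 1071 ones'-complement sum; it is not the kernel's hand-tuned assembly, it does no prefetching, and `copy_and_csum` is a hypothetical name. It returns the folded 16-bit sum; the value placed in a header is its complement.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy len bytes from src to dst and return the folded 16-bit
 * ones'-complement sum of the data (RFC 1071), computed in the same
 * pass. Bytes are paired in network (big-endian) order. */
uint16_t copy_and_csum(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint64_t sum = 0;   /* carries pile up in the high bits */
    size_t i = 0;

    for (; i + 1 < len; i += 2) {
        dst[i] = src[i];
        dst[i + 1] = src[i + 1];
        sum += ((uint64_t)src[i] << 8) | src[i + 1];
    }
    if (i < len) {              /* odd trailing byte, padded with zero */
        dst[i] = src[i];
        sum += (uint64_t)src[i] << 8;
    }
    while (sum >> 16)           /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}
```

Accumulating into a 64-bit variable lets carries collect in the high bits so they only need folding once at the end.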

Jes



Re: zero-copy TCP

2000-09-04 Thread Jes Sorensen

> "Jamie" == Jamie Lokier <[EMAIL PROTECTED]> writes:

Jamie> Nice point!  Only valid for TCP & UDP though.

Jamie> When people want _real_ low latency, they don't use TCP or UDP,
Jamie> and they certainly don't put data checksums at the start.  They
Jamie> still aim for zero copies.  That pass, even over cached data,
Jamie> is still significant.

In this case you really want to do a user space driver implementation
to avoid the cost of context switches. This is how a lot of people use
the Myrinet (afaik) and there were also some of the guys at CERN who
did a similar driver for the AceNIC at the same time I wrote the first
version of the 'normal' driver.

Jes




Re: zero-copy TCP

2000-09-04 Thread Jes Sorensen

> "Ingo" == Ingo Molnar <[EMAIL PROTECTED]> writes:

Ingo> On Sun, 3 Sep 2000, Andi Kleen wrote:

>> I did the same for fragment RX some months ago (simple fragment
>> lists that were copy-checksummed to user space). Overall it is
>> probably better to use a kiovec, because that can be more easily
>> used in nfsd and sendfile.

Ingo> the basic fragment type introduced by the TUX changes is a
Ingo> 'struct skb_frag', which has csum, size, *page, page_offset,
Ingo> frag_done, *data and *private fields - this is more than normal
Ingo> kiovecs offer. But i think kiovecs can be extended to do all
Ingo> this (if Stephen & everybody else agrees), i just didn't want to
Ingo> touch it for the time being.

I'd love to see this transferred to kiobufs, I'd prefer not to see yet
another structure introduced ;-)

At OLS we discussed a design for this; I think the consensus was to
keep the data field in the old skb and allow this to be used by the
old driver (receive path) and for building headers for tx
packets. Then one can either optionally do a linearized skb with
everything in the data field for the old hardware or stick pointers to
data in a kiobuf.
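
A minimal user-space sketch of that layout, for illustration only: a small linear area for headers plus an array of page fragments, with a linearize step for hardware that can't do scatter/gather. The names `toy_skb`, `toy_frag` and `toy_skb_linearize()` are hypothetical, and the fragment fields loosely follow the `skb_frag` members Ingo lists rather than any real kernel definition.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_FRAGS 4

/* Hypothetical fragment descriptor; fields loosely follow the skb_frag
 * members listed above (page, page_offset, size, csum). */
struct toy_frag {
    unsigned char *page;   /* start of the page holding the payload */
    size_t page_offset;    /* offset of the payload within that page */
    size_t size;           /* payload bytes in this fragment */
    uint32_t csum;         /* partial checksum (unused in this sketch) */
};

/* Hypothetical buffer: a linear data area (used by old drivers on receive
 * and for building TX headers) plus optional page fragments. */
struct toy_skb {
    unsigned char data[128];
    size_t data_len;
    struct toy_frag frags[TOY_MAX_FRAGS];
    size_t nr_frags;
};

/* Total length: linear part plus every fragment. */
static size_t toy_skb_len(const struct toy_skb *skb)
{
    size_t len = skb->data_len;
    for (size_t i = 0; i < skb->nr_frags; i++)
        len += skb->frags[i].size;
    return len;
}

/* For hardware without scatter/gather: copy the linear part and all
 * fragments into one flat buffer.  Returns the bytes written, or 0 if
 * out is too small. */
static size_t toy_skb_linearize(const struct toy_skb *skb,
                                unsigned char *out, size_t out_len)
{
    size_t pos = skb->data_len;

    if (toy_skb_len(skb) > out_len)
        return 0;
    memcpy(out, skb->data, skb->data_len);
    for (size_t i = 0; i < skb->nr_frags; i++) {
        const struct toy_frag *f = &skb->frags[i];
        memcpy(out + pos, f->page + f->page_offset, f->size);
        pos += f->size;
    }
    return pos;
}
```

Keeping headers in the linear area means old drivers see the same thing as before; only scatter/gather-capable drivers would walk `frags` directly.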

I set up a mailing list for these discussions at
[EMAIL PROTECTED] ([EMAIL PROTECTED]) to
subscribe. It's been fairly quiet so far, but I'd like to see more
action.

Ok, I'll go read your code next.

Jes



Re: zero-copy TCP

2000-09-04 Thread Jes Sorensen

> "Richard" == Richard Gooch <[EMAIL PROTECTED]> writes:

Richard> Andrew Morton writes:
>> All of them except the 3c905 provide hardware Rx and Tx
>> checksumming of IP, TCP and UDP headers.  No 64 bit addressing
>> support.

Richard> And does the driver support it? Has anyone benchmarked the
Richard> performance difference (if any)?

There isn't much gain from using it when we can't do zero copy xmits
in the first place. It might be worth enabling for receive though.

Jes



Re: zero-copy TCP

2000-09-04 Thread kumon
Ingo Molnar writes:
 > On Mon, 4 Sep 2000 [EMAIL PROTECTED] wrote:
 > > The experiment showed the following prefetching could reduce 20-30% of
 > > csum_partial_copy_generic() execution time.
 > 
 > Please test it and post the numbers. csum_partial_copy_generic() already
 > does prefetching - the real test would be to check lat_tcp and bw_tcp
 > numbers over gigabit, with and without this patch applied. (the same

If you want me to do it, I can't: our environment doesn't have
gigabit Ethernet.  If someone does measure it, please let me know the
results.

 > actually got slower. [testing over 100mbit isn't enough obviously because
 > x86 CPUs csum much faster than that.]

A few months ago, I gathered precise data and posted it on lk-ml.  In
our experiment, using four 100Base cards, Web-Bench gained nearly 5%
performance with the patch. The CPU load reached over 95%.

I wanted to point you to the experimental results, but unfortunately
the lk-ml archive, www.kernelnotes.org, seems not to be working now,
so I am posting the complete mail again.  It's a long text.

> From: [EMAIL PROTECTED]
> Date: Fri, 19 May 2000 22:20:27 +0900
> Message-Id: <[EMAIL PROTECTED]>
> cc: [EMAIL PROTECTED]
> Subject: [PATCH] Fast csum_partial_copy_generic and more
> Sender: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> 
> Hi,
> 
> Here is a patch that speeds up csum_partial_copy_generic() on i686 SMP
> by up to 50%.  I have also added some analysis of further optimization
> from the viewpoint of SMP cache behavior.
> 
> 
> [Patch Summary]
> The attached patch optimizes csum_partial_copy_generic().
> 
> Measured with WEB-BENCH, the time consumed in the function is
> reduced by 33%, i.e. it runs 1.5 times faster than the original.
> 
> [Background of csum_partial_copy_generic]
>  In the i686 version of the function, the long-word transfer is
> unrolled 16 times, so one loop iteration copies exactly 64 bytes,
> apart from the processing at the beginning and of the final remainder.
>  Observing csum_partial_copy_generic's behavior shows that it
> produces lots of cache misses, and this is the main reason for its
> slowness.
> 
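For readers without the i686 assembly at hand, here is a portable C sketch of the same structure: copy 32-bit words while accumulating a ones'-complement sum, 64 bytes per main-loop iteration, then fold to 16 bits. It is not the kernel routine; `csum_copy_sketch()` is a hypothetical name, it assumes `len` is a multiple of 4, and the inner 16-word loop stands in for the hand-unrolled body.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a wide accumulator to 16 bits with end-around carry. */
static uint16_t csum_fold16(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Hypothetical helper: copy len bytes from src to dst and return the
 * ones'-complement sum of the data (not yet inverted).  Assumes len is a
 * multiple of 4; memcpy per word keeps it alignment-safe. */
static uint16_t csum_copy_sketch(const unsigned char *src, unsigned char *dst,
                                 size_t len)
{
    uint64_t sum = 0;
    size_t i = 0;

    /* Main loop: 64 bytes (16 x 32-bit words) per iteration; the inner
     * loop stands in for the 16-way unrolled body. */
    for (; i + 64 <= len; i += 64) {
        for (size_t w = 0; w < 64; w += 4) {
            uint32_t word;
            memcpy(&word, src + i + w, 4);
            memcpy(dst + i + w, &word, 4);
            sum += word;
        }
    }
    /* Tail: remaining whole 32-bit words. */
    for (; i + 4 <= len; i += 4) {
        uint32_t word;
        memcpy(&word, src + i, 4);
        memcpy(dst + i, &word, 4);
        sum += word;
    }
    return csum_fold16(sum);
}
```

Summing 32-bit words and folding at the end gives the same 16-bit result as summing 16-bit words, because 2^16 is congruent to 1 modulo 0xffff; that is what lets the real routine work a long word at a time.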
> [How]
>  To accelerate the function, a dummy read is used as a prefetch. Of
> course, prefetching must be done only within the accessible area.  The
> top word of a cache block is the one to prefetch: on a cache miss the
> CPU fetches the requested word first, and the first word of a block
> is always needed earlier than the other words in it.
> 
> To keep track of the block's top word, a new pointer is added, which
> always points to the first word of the cache block containing the
> 63rd byte of a source block. This pointer never points outside the
> source region, so it is safe to read.  The patch makes no sense
> unless the CPU is superscalar.
> 
> Strictly speaking, this prefetch may read at most 3 bytes past the end
> of the source region. But this never causes trouble, because that
> excess area and the last transferred byte reside in the same cache
> block.
> 
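The pointer rule above can be sketched in C as follows. This is an illustration under assumptions, not the patch: it assumes a 32-byte cache line (P6 family), replaces the checksum work with a plain byte sum, and uses a one-byte volatile read instead of a word read so that, in C terms, it never touches memory outside the source buffer. `prefetch_target()` and `sum_with_prefetch()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32   /* assumed line size (P6-family L1) */

/* Start of the cache line holding p[63] (or the last source byte if fewer
 * than 64 remain), never below p and never at or past end.
 * Caller guarantees p < end. */
static const unsigned char *prefetch_target(const unsigned char *p,
                                            const unsigned char *end)
{
    size_t left = (size_t)(end - p);
    const unsigned char *want = p + (left > 63 ? 63 : left - 1);
    uintptr_t line = (uintptr_t)want & ~(uintptr_t)(CACHE_LINE - 1);

    /* If that line starts before p, p itself lies in the same line. */
    if (line < (uintptr_t)p)
        return p;
    return p + (line - (uintptr_t)p);
}

/* Walk the buffer in 64-byte blocks, issuing a dummy read of the line that
 * holds the end of the current block before working on it, so that on a
 * superscalar CPU the miss can overlap with the work on this block. */
static uint64_t sum_with_prefetch(const unsigned char *src, size_t len)
{
    uint64_t sum = 0;

    for (size_t off = 0; off < len; off += 64) {
        const unsigned char *p = src + off;
        volatile const unsigned char *t = prefetch_target(p, src + len);
        size_t n = len - off < 64 ? len - off : 64;

        (void)*t;                    /* dummy read used as a prefetch */
        for (size_t i = 0; i < n; i++)
            sum += p[i];
    }
    return sum;
}
```

The clamp is what makes the read safe: the target always lies in a cache line that contains at least one source byte, and a cache line never crosses a page boundary, which is the same argument the mail gives for the 3-byte overrun.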
> 
> [Benchmarking Result]
> 
> We used Web-bench as the workload and measured three versions of
> csum_partial_copy_generic.
> 
> Version synopsis:
> 2.3.99-pre8-base: the original kernel, but with SLAB_POISONING disabled
>   to compare performance with older kernels (the older data is not
>   attached).
> 2.3.99-pre8-AS: adds the patch from Artur Skawina <[EMAIL PROTECTED]>.
>   This patch is also attached at the end of the mail.
> 2.3.99-pre8-Pf: adds my prefetch patch.
>   This patch is also included in this mail.
> 
> We obtained the following profile.
> 
> Each number is the average time consumed (in us) to process one
> web transaction.
> 
> The machine is a 4-way SMP Xeon 450MHz (2MB L2 cache) with 2GB of
> memory, but without HIGHMEM, so only 1GB is actually recognized.
> 
>      Pf       AS     base   (2.3.99-pre8-*, us)
>   990.3   1023.8   1019.3   TOTAL(OS)
>    64.9     58.2     54.4   default_idle
>    24.8     21.8     20.8   cpu_idle
> 
>    63.4     94.8     98.3   csum_partial_copy_generic
>    84.2     82.8     82.2   stext_lock
>    76.0     76.8     77.7   boomerang_interrupt
>    36.6     36.5     36.9   boomerang_rx
>    33.4     33.6     34.3   boomerang_start_xmit
>    28.6     29.0     29.4   schedule
>    22.4     24.2     24.1   kmalloc
>    21.5     23.5     22.0   kfree
>    19.5     20.2     19.9   tcp_v4_rcv
>    18.0     18.4     18.3   wait_for_completion
>    17.9     18.3     17.9   __kfree_skb
>    16.7     16.5     15.7   __wake_up
>    11.0     11.3     10.8   do_IRQ
>   rest dropped
> 
> With the Pf patch, csum_partial_copy_generic drops from 98.3us to
> 63.4us, so the patch gives a 1.5 times speedup.
> 
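As a quick cross-check of the profile above (Pf versus base, per transaction), here is plain arithmetic on the quoted figures, not a new measurement:

```latex
\frac{98.3}{63.4} \approx 1.55, \qquad
\frac{98.3 - 63.4}{98.3} = \frac{34.9}{98.3} \approx 35.5\%, \qquad
\frac{1019.3 - 990.3}{1019.3} = \frac{29.0}{1019.3} \approx 2.8\%
```

The first two figures are roughly in line with the "1.5 times" and "33%" stated in the mail; the third is the share of total OS time per transaction saved by the Pf patch.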
> Unfortunately, the AS version does not show a significant gain.  If
> the cache hits, it may show some advantage, but in the current
> execution environment that patch has difficulty hiding cache-miss
> latency.
> 
> A user-land benchmark shows that the new patch also reduces the time
> even when the source operand is aligned on a cache-line boundary.
> 
> With the patch applied, stext_lock reappears at the top of the time
> consumption ranking.  Last time stext_lock

Re: zero-copy TCP

2000-09-04 Thread Jamie Lokier

Linus Torvalds wrote:
> (The "invalidate on write" is the sane way of doing SMP cache coherency,
> which is probably why. Trying to have shared dirty cache-lines is just
> not a viable option in the end).

With DMA from a device -- "snoop and update" still results in only one
owner of the dirty cache-lines: the CPU.  (Even SMP).  But you are right
that it isn't implemented that way.

> Basically any copy <= 4 cache lines is "free" compared to trying to be
> clever.

We're obviously interested in larger packets than 128 bytes.

> If you truly want to do zero-copy from user space and get real UNIX
> semantics for writes(), you'd better protect the page somehow, so that
> if the data hasn't made it out to the network (or needed to be
> re-sent) by the time the system call returns [...]

That's why the data is DMA'd to the card immediately, and that _card_
retains the data at least for the short term.  Long term if it's still
retained and the card runs out of memory, the card DMAs old buffers back
to a kernel skbuff.  This is one way to avoid TLBs.

> That's when you get into TLB invalidates etc.  By which time you're
> talking another few cache invalidates, and possibly some nasty cross-CPU
> calls for SMP TLB coherency. 

If you have to do a TLB invalidate then yes of course it's a loss.
That must be avoided :-)

> People who claim "zero-copy" is a great thing often ignore the costs of
> _not_ copying altogether. 

Some people who claim zero-copy is great have done actual measurements
and it does look good for reasonable size packets.  Even though the raw
performance doesn't look that much better, the CPU utilisation does, so
you can actually _calculate_ a bit more with your data.

However, I've not seen any evidence that it's a good idea with the
standard unix APIs.  I suspect everyone will agree on that :-)

-- Jamie



Re: zero-copy TCP

2000-09-04 Thread Jamie Lokier

Alan Cox wrote:
> > struct page of course).  Note that it doesn't matter if another thread,
> > and this includes truncate/write in another thread, clobbers the page
> > data.  That's just the normal effect of two concurrent writers to the
> > same memory.
> 
> Oh it does matter. You might send out a page of kernel data by mistake, or
> on some machines take a fault

Of course you still have to lock the page (bump the reference count) and
write-fault if it needs unsharing.

What I mean is it doesn't matter if another thread writes to the page,
even from kernel space, as long as it's still the same user space page.
You don't have to prevent _other_ writers from accessing the page, so
you don't need to modify page tables or do any TLB flushes.
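
A minimal C11 sketch of what "bump the reference count" amounts to, assuming a hypothetical `toy_page` that carries only a counter (the real `struct page` has much more, and the write-fault/unsharing step is not shown). It illustrates that pinning is one atomic increment and unpinning one atomic decrement; neither touches page tables, so no TLB flush is involved.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the reference count. */
struct toy_page {
    atomic_int count;
};

/* Pin: one SMP-safe increment.  No page-table entry changes, so no TLB
 * flush; other threads may still write to the page meanwhile. */
static void toy_page_get(struct toy_page *pg)
{
    atomic_fetch_add_explicit(&pg->count, 1, memory_order_relaxed);
}

/* Unpin: returns true if this dropped the last reference, in which case
 * the caller would be responsible for freeing the page. */
static bool toy_page_put(struct toy_page *pg)
{
    return atomic_fetch_sub_explicit(&pg->count, 1,
                                     memory_order_acq_rel) == 1;
}
```

Relaxed ordering on the increment and acquire/release on the decrement is a common reference-count pattern; it is a choice made for this sketch, not something stated in the thread.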

(This applies on x86 at least.  I'm not so sure how DMA cache coherency
works on other architectures.)

-- Jamie



Re: zero-copy TCP

2000-09-04 Thread Ingo Molnar


On Mon, 4 Sep 2000 [EMAIL PROTECTED] wrote:

> The experiment showed the following prefetching could reduce 20-30% of
> csum_partial_copy_generic() execution time.

Please test it and post the numbers. csum_partial_copy_generic() already
does prefetching - the real test would be to check lat_tcp and bw_tcp
numbers over gigabit, with and without this patch applied. (the same
numbers over localhost don't really count.) Eg. we had smart KNI-based
memcpy routines as well, and it turned out that bw_tcp over gigabit
actually got slower. [testing over 100mbit isn't enough obviously because
x86 CPUs csum much faster than that.]

Ingo



Re: zero-copy TCP

2000-09-04 Thread Richard Gooch

Jes Sorensen writes:
> > "Richard" == Richard Gooch <[EMAIL PROTECTED]> writes:
> 
> Richard> Andrew Morton writes:
> >> All of them except the 3c905 provide hardware Rx and Tx
> >> checksumming of IP, TCP and UDP headers.  No 64 bit addressing
> >> support.
> 
> Richard> And does the driver support it? Has anyone benchmarked the
> Richard> performance difference (if any)?
> 
> There isn't much gain from using it when we can't do zero copy xmits
> in the first place. It might be worth enabling for receive though.

I thought you said some of the GigE drivers supported this? Or were
you just saying that the GigE cards were some of the few which
supported scatter/gather DMA and IP checksumming?

Regards,

Richard
Permanent: [EMAIL PROTECTED]
Current:   [EMAIL PROTECTED]



Re: zero-copy TCP

2000-09-04 Thread Stephen C. Tweedie

Hi,

On Sun, Sep 03, 2000 at 07:29:56PM +0200, Ingo Molnar wrote:
> 
> On Sun, 3 Sep 2000, Andi Kleen wrote:
> 
> > I did the same for fragment RX some months ago (simple fragment lists
> > that were copy-checksummed to user space). Overall it is probably
> > better to use a kiovec, because that can be more easily used in nfsd
> > and sendfile.
> 
> the basic fragment type introduced by the TUX changes is a 'struct
> skb_frag', which has csum, size, *page, page_offset, frag_done, *data and
> *private fields - this is more than normal kiovecs offer. But i think
> kiovecs can be extended to do all this (if Stephen & everybody else
> agrees), i just didn't want to touch it for the time being.

I don't want to extend kiobufs for that sort of thing, since the
entire point of having kiobufs is to have a uniform container with
which to pass information between different kernel components.  If you
need more data, you'd do something like the SGI kiobuf-based block IO
stack does --- use a dedicated struct request, but use a pointer to a
kiobuf as the data location within that request struct.

In principle I'd think it would be a lot easier to add a kiovec
pointer to an skbuff than to extend kiobufs to be suitable for the
networking stack (and we had a BOF on this at OLS --- it seemed quite
feasible).

--Stephen



Re: zero-copy TCP

2000-09-04 Thread Jes Sorensen

> "Richard" == Richard Gooch <[EMAIL PROTECTED]> writes:

Richard> I thought you said some of the GigE drivers supported this?
Richard> Or were you just saying that the GigE cards were some of the
Richard> few which supported scatter/gather DMA and IP checksumming?

The latter.

Jes



Re: zero-copy TCP

2000-09-04 Thread Linus Torvalds



On Mon, 4 Sep 2000, Jamie Lokier wrote:

> Linus Torvalds wrote:
> 
> > Basically any copy <= 4 cache lines is "free" compared to trying to be
> > clever.
> 
> We're obviously interested in larger packets than 128 bytes.

"obviously"?

Take a look at some common traffic. Yes, even in servers. 

Small packets are not unlikely.

And even if you add extra code to _not_ do it for small packets, what
you're basically doing is slowing down the path - and increasing your
icache footprint etc quite noticeably, because suddenly you have two
mostly independent paths.

And the bugs get interesting too.

Also, you ignored the fact that the 128 can be 256 or 512. AND you ignored
the fact that I was being very generous indeed to the page table walking
case. So the 128 can end up being quite close to the page size. Or more.

> That's why the data is DMA'd to the card immediately, and that _card_
> retains the data at least for the short term.  Long term if it's still
> retained and the card runs out of memory, the card DMAs old buffers back
> to a kernel skbuff.  This is one way to avoid TLBs.

Yeah.

And such a card doesn't actually exist. At least not in copious numbers.

You need megabytes of memory to hold stuff like that for a busy server. A
few hundred connections with a few kB of queued-up, unacked data.

Face it, it's not realistic. Sure, such cards exist. How many people have
ever _seen_ one?

> Some people who claim zero-copy is great have done actual measurements
> and it does look good for reasonable size packets.  Even though the raw
> performance doesn't look that much better, the CPU utilisation does, so
> you can actually _calculate_ a bit more with your data.
> 
> However, I've not seen any evidence that it's a good idea with the
> standard unix APIs.  I suspect everyone will agree on that :-)

Blame the API's.

Face it, complexity almost _never_ pays off.

Linus



Re: zero-copy TCP

2000-09-04 Thread Dan Kegel

[EMAIL PROTECTED] wrote:
> A few months ago, I gathered precise data and posted it on lk-ml.  In
> our experiment, using four 100Base cards, Web-Bench gained nearly 5%
> performance with the patch. The CPU load reached over 95%.
> 
> I wanted to point you to the experimental results, but unfortunately
> the lk-ml archive, www.kernelnotes.org, seems not to be working now.

Here's a link to an archive of kumon's post, plus a few other related
posts.  It might be worth looking at these even if you've already
read kumon's quote of his post.

http://www.kegel.com/mindcraft_redux.html#csum_partial_copy_generic

- Dan



Re: zero-copy TCP

2000-09-03 Thread Linus Torvalds



On Sun, 3 Sep 2000, Lincoln Dale wrote:
> 
> many people (myself included) have been experimenting with zerocopy 
> infrastructures.
> in my case, i've been working on it as time permits for quite a few months 
> now, and am about on my fourth rewrite.

Heh. 

> i've found exactly what you state about the bad things that occur when you 
> associate zerocopy infrastructure with user-space code.  some of the MM
> tricks required for handling individual pages effectively kill any
> performance gain.
> 
> however, approaching it from the other angle of "buffers pinned in kernel 
> memory" can give you a huge win.

I agree.

The "send data from already pinned buffers" case is different. That is
basically why "sendfile()" exists, and why TUX gets good numbers. Once you
get away from the "zero copy from user space" mentality, and start just
passing kernel buffers around, things look a lot better.
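
For reference, a minimal sketch of the user-visible side of this, using the Linux `sendfile(2)` call to push a file's contents to another descriptor without a user-space buffer. The helper names `send_all_from_file()` and `send_path()` are hypothetical and assume a blocking `out_fd`. Per the current man page, `out_fd` had to be a socket before Linux 2.6.33; later kernels accept any file.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Send count bytes of in_fd, starting at *offset, to out_fd, looping over
 * short transfers.  *offset is advanced; in_fd's file position is not.
 * Returns 0 on success, -1 on error or premature EOF. */
static int send_all_from_file(int out_fd, int in_fd, off_t *offset,
                              size_t count)
{
    while (count > 0) {
        ssize_t n = sendfile(out_fd, in_fd, offset, count);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)              /* in_fd ended early */
            return -1;
        count -= (size_t)n;
    }
    return 0;
}

/* Open path and send its whole contents to out_fd (e.g. a socket). */
static int send_path(int out_fd, const char *path)
{
    struct stat st;
    int rc = -1;
    int in_fd = open(path, O_RDONLY);

    if (in_fd < 0)
        return -1;
    if (fstat(in_fd, &st) == 0) {
        off_t off = 0;
        rc = send_all_from_file(out_fd, in_fd, &off, (size_t)st.st_size);
    }
    close(in_fd);
    return rc;
}
```

The data moves from the page cache to the destination inside the kernel; the process only ever handles descriptors and an offset.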

> for the application which prompted me to begin looking at this problem, 
> where packets typically go network -> RAM -> network, providing a zerocopy 
> infrastructure for (a) viewing incoming packet streams pinned in kernel 
> memory from user-space [a sort-of SIGIO with pointers to the buffers], and 
> (b) hooks for user-space directing the kernel to do things with these 
> buffers [eg. "queue buffer A for output on fd Y"] has provided an immediate 
> 60% performance gain.

You really should look into using the page cache if you can: that way you
have a very natural way of looking at it and possibly changing the stream
in user mode with no extra copies for that side either.

I'm not saying that you should necessarily actually go to a "real file",
but the best way of allowing user-space access to things like this is
through mmap(), and if you make it look like the page cache you'll get a
lot of code for free...
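
On the user-space side, the access pattern Linus describes is ordinary `mmap()`. A minimal POSIX sketch that maps a regular file read-only and scans it in place (here computing a 16-bit ones'-complement sum over big-endian 16-bit words), so the bytes are read straight from the mapping instead of being copied out with `read()`. `map_and_sum()` is a hypothetical name; exposing kernel network buffers this way would need a driver-provided `mmap` handler, which is not shown.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map path read-only and sum it in place as big-endian 16-bit words
 * (an odd trailing byte is padded with zero), folding carries back in.
 * Returns 0 and stores the 16-bit sum in *out, or -1 on error. */
static int map_and_sum(const char *path, uint16_t *out)
{
    struct stat st;
    uint64_t sum = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    if (len > 0) {               /* mmap() rejects a zero length */
        const unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        for (size_t i = 0; i + 1 < len; i += 2)
            sum += (uint32_t)p[i] << 8 | p[i + 1];
        if (len & 1)
            sum += (uint32_t)p[len - 1] << 8;
        munmap((void *)p, len);
    }
    close(fd);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    *out = (uint16_t)sum;
    return 0;
}
```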

Linus



Re: zero-copy TCP

2000-09-03 Thread Lincoln Dale

At 22:53 03/09/00, Linus Torvalds wrote:
> >Ugh.  User space DMA gets complicated quickly.  The performance question
> >is, perhaps, can you do this without a TLB flush (but with locking the
> >struct page of course).  Note that it doesn't matter if another thread,
> >and this includes truncate/write in another thread, clobbers the page
> >data.  That's just the normal effect of two concurrent writers to the
> >same memory.
...
>People who claim "zero-copy" is a great thing often ignore the costs of
>_not_ copying altogether.

many people (myself included) have been experimenting with zerocopy 
infrastructures.
in my case, i've been working on it as time permits for quite a few months 
now, and am about on my fourth rewrite.

i've found exactly what you state about the bad things that occur when you 
associate zerocopy infrastructure with user-space code.  some of the MM
tricks required for handling individual pages effectively kill any
performance gain.

however, approaching it from the other angle of "buffers pinned in kernel
memory" can give you a huge win.

for the application which prompted me to begin looking at this problem,
where packets typically go network -> RAM -> network, providing a zerocopy 
infrastructure for (a) viewing incoming packet streams pinned in kernel 
memory from user-space [a sort-of SIGIO with pointers to the buffers], and 
(b) hooks for user-space directing the kernel to do things with these 
buffers [eg. "queue buffer A for output on fd Y"] has provided an immediate 
60% performance gain.

performance was previously pinned on front-side-bus (or memory) bandwidth.

the interfaces are a bit hacky, and the way one has to queue packets for 
tcp-write is awful right now, but i hope these can be cleaned up over time.

network cards which offload the IP & TCP checksum calculation aren't even
required; provided the incoming checksum is preserved, the original pseudo
TCP header can be "reversed out" without having to re-read the entire
packet payloads again.
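
Lincoln doesn't spell out his method. One standard way to adjust a checksum without re-reading the payload is the incremental update of RFC 1624, HC' = ~(~HC + ~m + m'), applied once per changed 16-bit word (e.g. each half of a rewritten address in the pseudo-header). A minimal C sketch with hypothetical helper names; this is the RFC technique, not necessarily what his code does.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones'-complement add of two 16-bit values with end-around carry. */
static uint16_t oc_add(uint16_t a, uint16_t b)
{
    uint32_t s = (uint32_t)a + b;
    return (uint16_t)((s & 0xffff) + (s >> 16));
}

/* Full Internet checksum (RFC 1071) over big-endian 16-bit words;
 * used here only to check the incremental result. */
static uint16_t inet_csum(const unsigned char *p, size_t len)
{
    uint16_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum = oc_add(sum, (uint16_t)(p[i] << 8 | p[i + 1]));
    if (len & 1)
        sum = oc_add(sum, (uint16_t)(p[len - 1] << 8));
    return (uint16_t)~sum;
}

/* RFC 1624 eqn. 3: new checksum after one 16-bit word changes from
 * old_word to new_word; the rest of the data is never touched. */
static uint16_t csum_update16(uint16_t old_csum, uint16_t old_word,
                              uint16_t new_word)
{
    uint16_t s = oc_add((uint16_t)~old_csum, (uint16_t)~old_word);
    s = oc_add(s, new_word);
    return (uint16_t)~s;
}
```

The cost is a few additions per changed header word, independent of payload size.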


cheers,

lincoln.

--
   Lincoln Dale        Content Services Business Unit
   [EMAIL PROTECTED]   cisco Systems, Inc.
   +1 (408) 525-1274   bldg G, 170 West Tasman
   +61 (3) 9659-4294   San Jose CA 95134



Re: zero-copy TCP

2000-09-03 Thread Linus Torvalds

In article <[EMAIL PROTECTED]>,
Jamie Lokier  <[EMAIL PROTECTED]> wrote:
>Alan Cox wrote:
>> > read/recv block while the NIC DMAs into user space main memory.
>> 
>> That's actually not always a win either. A DMA to user space flushes
>> those pages out of cache, which isn't so ideal if the CPU wants
>> them. Some of the results are surprisingly counter-intuitive like this
>
>Does it flush the CPU cache?  I thought the CPU just snooped the bus and
>updated its cache with new data.

In theory you could do "snoop and update".

In practice I do not know of a single chip that actually does that.
Pretty much _everybody_ does "invalidate on write".

(The "invalidate on write" is the sane way of doing SMP cache coherency,
which is probably why. Trying to have shared dirty cache-lines is just
not a viable option in the end).

>Ugh.  User space DMA gets complicated quickly.  The performance question
>is, perhaps, can you do this without a TLB flush (but with locking the
>struct page of course).  Note that it doesn't matter if another thread,
>and this includes truncate/write in another thread, clobbers the page
>data.  That's just the normal effect of two concurrent writers to the
>same memory.

Simple calculation: to actually even _find_ the physical page, you
usually need to do at least three levels of page table walking. Sure,
some CPU's have "translate" instructions to do it for you in hardware
and use the TLB to help you, but the most common architecture out there
does not do that. So with the 4GB+ option, in order to just _find_ the
physical page (so that you can do DMA to it), you need to do that
complex page table walk.
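
To make the "three levels" concrete, here is a toy C model of a walk using the x86 PAE split of a 32-bit address (2 + 9 + 9 bits of index, 12 bits of offset), assuming that PAE is what the 4GB+ option refers to here. The structures are hypothetical and hold plain pointers and frame numbers instead of real entries with flag bits. What carries over is that each level is a load whose address depends on the previous load, so the misses Linus counts happen one after another.

```c
#include <stdint.h>

/* Hypothetical three-level tables in the PAE shape: 4 top-level entries,
 * 512 entries per lower level.  NULL / 0 means "not present". */
struct toy_pt   { uint64_t pfn[512]; };        /* level 3: frame numbers */
struct toy_pd   { struct toy_pt *pt[512]; };   /* level 2 */
struct toy_pdpt { struct toy_pd *pd[4]; };     /* level 1 */

/* Translate a 32-bit virtual address; returns 0 if not mapped.  Each step
 * needs the result of the previous load before it can even start. */
static uint64_t toy_walk(const struct toy_pdpt *top, uint32_t va)
{
    const struct toy_pd *pd = top->pd[(va >> 30) & 0x3];
    if (!pd)
        return 0;
    const struct toy_pt *pt = pd->pt[(va >> 21) & 0x1ff];
    if (!pt)
        return 0;
    uint64_t pfn = pt->pfn[(va >> 12) & 0x1ff];
    if (!pfn)
        return 0;
    return (pfn << 12) | (va & 0xfff);
}
```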

Let's say that that page table walk is 50 instructions at best case. 
And that pretty much assumes that you did the thing in assembly code and were
very aggressive.

Also, you can pretty much assume that even if the code is in the cache,
the page tables themselves probably aren't. So assume a minimum of three
cache misses right there (plus the code - 50 instructions).

Furthermore, to actually pin a page down, even if you do _nothing_ else,
you'd at least need an SMP-safe increment (and eventual decrement).
That's another 24 cycles just in those two instructions on x86. 
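
The pin/unpin pair being costed is two atomic read-modify-write operations on a reference count. Here is a minimal user-space sketch with C11 atomics. The struct is a stand-in holding only a counter, not the kernel's struct page, and the memory orderings are a reasonable choice rather than anything taken from the kernel.

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in page descriptor: only the reference count matters here. */
struct page {
    atomic_int count;
};

/* Pin: one SMP-safe increment (a LOCK-prefixed add on x86). */
static void pin_page(struct page *p)
{
    atomic_fetch_add_explicit(&p->count, 1, memory_order_relaxed);
}

/* Unpin: the matching SMP-safe decrement. Returns nonzero if this
 * dropped the last reference, i.e. the page could now be freed. */
static int unpin_page(struct page *p)
{
    int old = atomic_fetch_sub_explicit(&p->count, 1, memory_order_acq_rel);
    assert(old > 0);
    return old == 1;
}
```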

End result: you've done the work equivalent to about 4 cache misses.
Just to look up the physical page, and not actually _doing_ anything
with it. Never mind locking it in memory or anything like that. And
that's assuming you got no icache misses on the actual _code_ to do all
of this.

Basically any copy <= 4 cache lines is "free" compared to trying to be
clever.  That's 128 bytes on most machines right now.  And cache-lines
are growing: 64 and 128 byte cache-lines are not that unlikely these
days (I think Athlon has a 64-byte cache-line, for example, just in the
PC space, and alpha and sparc64 do also). 
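
The rule of thumb above reduces to a one-line break-even test. The factor of 4 (cache misses' worth of lookup and pinning work) is Linus's estimate, not a measurement. Charging a copy one miss per cache line touched is a simplifying assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Estimated cost of finding and pinning the page, in cache misses. */
#define LOOKUP_COST_IN_MISSES 4u

/* Largest copy that is "free" next to the page lookup, assuming the
 * copy costs about one miss per cache line. */
static size_t free_copy_bytes(size_t cache_line_bytes)
{
    return LOOKUP_COST_IN_MISSES * cache_line_bytes;
}

static int copy_is_cheaper(size_t len, size_t cache_line_bytes)
{
    return len <= free_copy_bytes(cache_line_bytes);
}
```

With 32-byte lines this gives the 128 bytes quoted. With 64-byte lines (Athlon, per the text) the threshold doubles to 256 bytes.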

So basically the cost of a simple memcpy() isn't necessarily that big.
The above calculations were rather kind towards the "lock the page
down" case, and it's not all that unlikely that the cost of locking down
a page is on the same order of magnitude as just doing a "memcpy()" on
the whole page.

The above gets _much_ worse in real life.  If you truly want to do
zero-copy from user space and get real UNIX semantics for write(),
you'd better protect the page somehow, so that if the data hasn't made
it out to the network (or needed to be re-sent) by the time the system
call returns, the user can't change the user-mode buffer before the data
is out.

That's when you get into TLB invalidates etc.  By which time you're
talking another few cache invalidates, and possibly some nasty cross-CPU
calls for SMP TLB coherency. 
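
"Protecting the page somehow" usually means copy-on-write. Here is a toy model under stated assumptions: a heap buffer stands in for a user page, a flag stands in for the write-protected PTE, and the function user_store() plays the part of the write fault. The real cost Linus is pointing at, the PTE change plus TLB flush, appears only as a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_BYTES 4096

/* Toy mapping of one user buffer page. While a zero-copy send is in
 * flight the page is write-protected, so the data the NIC reads (and may
 * have to re-send) cannot change under it. */
struct mapping {
    unsigned char *page;   /* page currently mapped at the user address */
    bool write_protected;
};

static void start_zero_copy_send(struct mapping *m)
{
    /* Real kernel: change the PTE and flush TLBs (cross-CPU calls on SMP). */
    m->write_protected = true;
}

/* A user store. On a protected page this acts as the copy-on-write fault:
 * the user gets a private copy and the NIC keeps the original page. */
static void user_store(struct mapping *m, size_t off, unsigned char v)
{
    assert(off < PAGE_BYTES);
    if (m->write_protected) {
        unsigned char *copy = malloc(PAGE_BYTES);
        assert(copy != NULL);
        memcpy(copy, m->page, PAGE_BYTES);  /* the copy we tried to avoid */
        m->page = copy;
        m->write_protected = false;
    }
    m->page[off] = v;
}
```

If the user touches the buffer while the send is in flight, this pays for a full-page copy plus the fault on top of the protection cost.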

People who claim "zero-copy" is a great thing often ignore the costs of
_not_ copying altogether. 

(This is the same mistake that people who do complexity analysis often
stumble on.  Sure, "constant time" is perfect.  Except it's not
necessarily unusual that "constant time" is 50 times larger than O(n) in
practice.  Same goes for zero-copy - it's "perfect", but can easily be
slower than just plain old "good"). 

Linus



Re: zero-copy TCP

2000-09-03 Thread Jamie Lokier

Alan Cox wrote:
> > read/recv block while the NIC DMAs into user space main memory.
> 
> That's actually not always a win either. A DMA to user space flushes
> those pages out of cache, which isn't so ideal if the CPU wants
> them. Some of the results are surprisingly counter-intuitive like this.

Does it flush the CPU cache?  I thought the CPU just snooped the bus and
updated its cache with new data.

> > The NIC memory is never accessed directly.  It's a cache of skbuff data
> > payloads, and the only access is by DMA, in those places where the
> > kernel stack normally does copy_to_user/copy_from_user.
> 
> Yep. Unfortunately to make it work you need to do a bit more than that - you
> have to pin pages in memory, flush caches on some cpus, lock those pages
> against being truncated by another process (on the other cpu) and you have
> to deal with queue reclamation when the nic gets short of buffers.

Ugh.  User space DMA gets complicated quickly.  The performance question
is, perhaps, can you do this without a TLB flush (but with locking the
struct page of course).  Note that it doesn't matter if another thread,
and this includes truncate/write in another thread, clobbers the page
data.  That's just the normal effect of two concurrent writers to the
same memory.

-- Jamie



Re: zero-copy TCP

2000-09-03 Thread Jamie Lokier

Alan Cox wrote:
> > It's not faster than card->card DMA, which falls out naturally from my
> > zero-copy proposal :-)
> 
> We already support card->card DMA for routing with fastrouting

...but not for user-space proxies, which was the context above.

Still, the fastrouting proves card->card DMA actually works.

-- Jamie



Re: zero-copy TCP

2000-09-03 Thread Alan Cox

> It's not faster than card->card DMA, which falls out naturally from my
> zero-copy proposal :-)

We already support card->card DMA for routing with fastrouting





Re: zero-copy TCP

2000-09-03 Thread Alan Cox

> read/recv block while the NIC DMAs into user space main memory.

That's actually not always a win either. A DMA to user space flushes those pages
out of cache, which isn't so ideal if the CPU wants them. Some of the results
are surprisingly counter-intuitive like this.

> (Can't DMA earlier because we don't know the buffer address in advance
> when using the standard socket API).

Yes, I realised which API you meant when you said that. This is local storage
of the packet, then handing it to user memory when the IRQ handler names a
target address.

> The NIC memory is never accessed directly.  It's a cache of skbuff data
> payloads, and the only access is by DMA, in those places where the
> kernel stack normally does copy_to_user/copy_from_user.

Yep. Unfortunately to make it work you need to do a bit more than that - you
have to pin pages in memory, flush caches on some cpus, lock those pages
against being truncated by another process (on the other cpu) and you have
to deal with queue reclamation when the nic gets short of buffers.

You can do it, even with some of the PC-world hardware. NeXT did this with
their LANCE driver.






Re: zero-copy TCP

2000-09-03 Thread Jamie Lokier

Ingo Molnar wrote:
> > At CERN we had a bunch of applications where this would be a win, data
> > acquisition servers taking data in from some custom hardware and
> > sending out data over the wire on another card. You never really want
> > to touch data in memory with the CPU but because of the lack of
> > write() zero copy you end up having to do so.
> 
> Yep, I agree - in this case a receivefile() implementation would be handy
> (we are 100% ready in 2.4 to introduce it - from the pagecache and VFS
> point of view, it's just not there yet), thus you could receivefile() your
> data into a temporary file, and sendfile() it to the other card, without
> ever touching data. This is faster than any zero-copy read()/write(),
> because it can do things straight in the pagecache, without having to deal
> with user-space page mappings.

It's not faster than card->card DMA, which falls out naturally from my
zero-copy proposal :-)

-- Jamie



Re: zero-copy TCP

2000-09-03 Thread Jamie Lokier

Linus Torvalds wrote:
>   Proof: the data to be sent out is in RAM.  In fact, often it is cached
>   in the CPU these days. In order to start sending out the packet, the
>   smart card has to move all of the data from RAM/cache over the bus to
>   the card.  It can only start actually sending after that.  Cost: bus
>   speed to copy it over.
> 
>   In contrast, if you do it on the CPU, you can basically start feeding
>   the packet out on the net after doing a CPU checksum that is limited
>   by RAM/cache speeds. Bus speed isn't the limiting factor any more on
>   packet latency, as you can send out the start of the packet on the
>   network before the whole packet has even been copied over the internal
>   bus!

Nice point!  Only valid for TCP & UDP though.

When people want _real_ low latency, they don't use TCP or UDP, and they
certainly don't put data checksums at the start.  They still aim for
zero copies.  That extra pass over the data, even cached data, is still
significant.
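
Linus's latency argument (quoted above) can be put into rough numbers. This is a crude model. It assumes the card must pull the whole packet across the I/O bus before it knows the checksum it has to put in the header, while the CPU path makes one checksum pass at cache/RAM speed and then lets the card start sending. It ignores descriptor overheads and the small header DMA. The 133 MB/s figure is roughly the nominal peak of 32-bit/33 MHz PCI. The 1 GB/s cache/RAM rate is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Time, in nanoseconds, to move `bytes` at `bytes_per_sec`. */
static uint64_t transfer_ns(uint64_t bytes, uint64_t bytes_per_sec)
{
    return bytes * 1000000000ull / bytes_per_sec;
}

struct first_byte_delay {
    uint64_t card_csum_ns;  /* card checksums: whole packet crosses the bus first */
    uint64_t cpu_csum_ns;   /* CPU checksums: one pass at cache/RAM speed first */
};

static struct first_byte_delay compare_first_byte(uint64_t bytes,
                                                  uint64_t bus_bps,
                                                  uint64_t mem_bps)
{
    struct first_byte_delay d = {
        .card_csum_ns = transfer_ns(bytes, bus_bps),
        .cpu_csum_ns  = transfer_ns(bytes, mem_bps),
    };
    return d;
}
```

For a 1500-byte packet under these assumptions, that is about 11.3 us before the card-checksum path can start transmitting, against 1.5 us for the CPU-checksum path.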

> Right now gigabit is heavy-duty enough that it is worth smart cards. 
> The same used to be true about the first generation of 100Mbit cards. 
> The same will be true of 10Gbps cards in another few years.  But
> basically, they'll probably always end up being the exception rather
> than the rule, unless they become so cheap that it doesn't matter.  But
> "cheap" and "pushing the performance envelope" do not tend to go hand in
> hand. 

Fair enough.  Please read my description of a zero-copy scheme that
doesn't require much intelligence on the card though.  I think it's a
neat kernel trick that might just pay off.  Sometimes, maybe.

-- Jamie



Re: zero-copy TCP

2000-09-03 Thread Jamie Lokier

Andi Kleen wrote:
> On Sun, Sep 03, 2000 at 05:22:44AM +0200, Jamie Lokier wrote:
> > I just thought I'd mention that you can do zero copy TCP in and out
> > *without* any page marking schemes.  All you need is a network card with
> > quite a lot of RAM and some intelligence.  An Alteon could do it, with
> > extra RAM or an impressively underloaded network.
> 
> The big problem with that is that you end up with not having a single
> stack, but N + 1 (N being the number of intelligent cards) stacks. 
> Maintenance nightmare.

No, the stack stays in the kernel.  That's the beauty of this scheme.

There are any number of proposed "put the stack on the intelligent card"
schemes out there, and they all suffer from the same problem: it's not
quite the stack you want + maintenance nightmare.

For the scheme I am referring to, all protocol handling is done by the
kernel, using the normal code.  The card does have a little
intelligence: it implements a cache of data payloads, and can do
checksums.  That's all.  (It doesn't know about TCP or even IP for example).

Run time is like this:

1. write/sendto

   user space calls write()
   create skbuffs as usual including data part, but mark as "cached on NIC"
   DMA user space data directly to NIC (needs driver hook)
   schedule if worthwhile
   DMA completes, return

2. read/recvfrom

   user space calls read()
   look for skbuffs as usual
   if skbuff says "data part cached on NIC", DMA it from NIC
   schedule if worthwhile
   DMA completes, return

3. NIC driver xmit function

   if skbuff says "data part cached on NIC", there's simply less to DMA

4. NIC driver receive function

   fetch enough data from the NIC for header processing
   (the NIC may help minimise this, but a constant is good enough)
   mark new skbuff as "data part cached on NIC"
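
The four run-time steps above can be modelled in user space to show where the host-bus traffic goes. Everything here is hypothetical: the slot-based card memory, the dma() helper (a memcpy that counts bytes crossing the host bus), and the constants are illustrative assumptions, not any real driver interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NIC_SLOTS  4
#define SLOT_BYTES 2048
#define HDR_FETCH  64   /* assumed constant fetched for header processing */
#define HDR_LEN    40   /* e.g. IPv4 + TCP headers without options */

struct nic {
    unsigned char mem[NIC_SLOTS][SLOT_BYTES];  /* on-card payload cache */
    bool used[NIC_SLOTS];
    size_t host_bus_bytes;                     /* host<->card DMA traffic */
};

struct skb {
    unsigned char hdr[HDR_FETCH];
    size_t hdr_len;
    bool data_on_nic;   /* the "data part cached on NIC" mark */
    int slot;
    size_t data_len;
};

static int nic_alloc_slot(struct nic *n)
{
    for (int i = 0; i < NIC_SLOTS; i++)
        if (!n->used[i]) {
            n->used[i] = true;
            return i;
        }
    return -1;          /* card memory full: caller falls back to copying */
}

static void dma(struct nic *n, void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
    n->host_bus_bytes += len;
}

/* 1. write(): DMA user data straight into card memory. */
static int zc_write(struct nic *n, struct skb *skb, const void *ubuf, size_t len)
{
    if (len > SLOT_BYTES)
        return -1;
    int s = nic_alloc_slot(n);
    if (s < 0)
        return -1;
    dma(n, n->mem[s], ubuf, len);
    skb->hdr_len = HDR_LEN;     /* headers are still built by the normal stack */
    skb->data_on_nic = true;
    skb->slot = s;
    skb->data_len = len;
    return 0;
}

/* 3. xmit: if the payload is cached on the card, only the header crosses the bus. */
static void nic_xmit(struct nic *n, const struct skb *skb)
{
    n->host_bus_bytes += skb->hdr_len;
    if (!skb->data_on_nic)
        n->host_bus_bytes += skb->data_len;
}

/* 4. receive: the frame stays on the card; fetch a constant amount for the stack. */
static int nic_receive(struct nic *n, struct skb *skb, const void *frame, size_t len)
{
    if (len > SLOT_BYTES)
        return -1;
    int s = nic_alloc_slot(n);
    if (s < 0)
        return -1;
    memcpy(n->mem[s], frame, len);   /* wire -> card: no host bus traffic */
    skb->hdr_len = len < HDR_FETCH ? len : HDR_FETCH;
    dma(n, skb->hdr, n->mem[s], skb->hdr_len);
    skb->data_on_nic = true;
    skb->slot = s;
    skb->data_len = len;
    return 0;
}

/* 2. read(): DMA the payload (from offset `off`) straight into the user buffer. */
static size_t zc_read(struct nic *n, struct skb *skb, void *ubuf, size_t off)
{
    assert(skb->data_on_nic && off <= skb->data_len);
    size_t len = skb->data_len - off;
    dma(n, ubuf, n->mem[skb->slot] + off, len);
    n->used[skb->slot] = false;      /* slot can now be reused */
    skb->data_on_nic = false;
    return len;
}
```

In this model each payload byte crosses the host bus once per direction. The header crosses it once more, and the protocol code never leaves the kernel.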

I wouldn't be surprised to find even some of the older hardware NICs can
be made to work this way.  Relatively little is required of the driver
-- just the ability to manage the card's memory independently of packet
I/O on the wire.

ps. Just to address Alan's point: data is never accessed directly over
the PCI bus.  Perhaps he's thinking of another scheme.

-- Jamie



Re: zero-copy TCP

2000-09-03 Thread Alan Cox

> I think it's a quality of implementation issue. The csum_copy_generic
> thing is a bug. Allowing incorrect checksums to be sent out would be a
> design bug. I think some RFCs even forbid the sending of incorrect
> packets?

You are perfectly at liberty to send invalidly checksummed packets. It's
a design decision. The remote simply discards the frame.
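
The receiver-side check Alan relies on is cheap. Here is a sketch of the RFC 1071 ones'-complement checksum used by IP, TCP and UDP. A real TCP/UDP check also covers a pseudo-header, which is left out here. If the data is summed together with its checksum field, the complemented result is 0 only when nothing was corrupted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over big-endian 16-bit words; an odd
 * trailing byte is padded with zero. Fine for packet-sized inputs. */
static uint16_t inet_csum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    for (; len > 1; p += 2, len -= 2)
        sum += (uint32_t)p[0] << 8 | p[1];
    if (len)
        sum += (uint32_t)p[0] << 8;
    while (sum >> 16)                    /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Receiver: checksum over data plus checksum field must come out 0,
 * otherwise the segment is silently dropped. */
static int receiver_accepts(const uint8_t *seg, size_t len)
{
    return inet_csum(seg, len) == 0;
}
```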

Alan




Re: zero-copy TCP

2000-09-03 Thread Dan Kegel

Ingo Molnar ([EMAIL PROTECTED]) wrote:
> Yep, I agree - in this case a receivefile() implementation would be handy 
> (we are 100% ready in 2.4 to introduce it - from the pagecache and VFS 
> point of view, it's just not there yet), thus you could receivefile() your 
> data into a temporary file, and sendfile() it to the other card, without 
> ever touching data. This is faster than any zero-copy read()/write(), 
> because it can do things straight in the pagecache, without having to deal 
> with user-space page mappings. 

Interface-wise, what would be the difference between receivefile() and sendfile()?
Don't both just transfer data between two arbitrary file descriptors?
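
For reference, here is the sendfile() half as it exists today. receivefile() was only a proposal, and nothing by that name is assumed here. Part of the answer to the question is in the Linux sendfile(2) man page: in_fd must refer to a file supporting mmap-like operations, so it cannot be a socket, and the receive direction therefore needs something else. The sketch is Linux-specific. The helper name is illustrative.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send `len` bytes of the file behind in_fd, from offset 0, to out_fd.
 * The data moves page cache -> socket inside the kernel; this process
 * never copies it. Returns bytes sent, or -1 on error. */
static ssize_t send_whole_file(int out_fd, int in_fd, size_t len)
{
    off_t off = 0;                 /* sendfile advances this, not the fd offset */
    size_t done = 0;
    while (done < len) {
        ssize_t n = sendfile(out_fd, in_fd, &off, len - done);
        if (n < 0)
            return -1;
        if (n == 0)                /* hit end of file early */
            break;
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

A temporary file and a Unix-domain socket pair are enough to exercise it without a network.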
- Dan



Re: zero-copy TCP

2000-09-03 Thread Andi Kleen

On Sun, Sep 03, 2000 at 07:42:53PM +0200, Ingo Molnar wrote:
> 
> On Sun, 3 Sep 2000, Andi Kleen wrote:
> 
> > You can already cause incorrect checksums on the wire just by passing
> > a partly unmapped address (the zero-the-rest exception handler in
> > csum_copy_generic in i386 forgets to add in the carry)
> > 
> > I do not believe it is a big deal, packets with bad checksum are not
> > really a problem (you can usually do other better DoS that do not need
> > it)
> 
> I think it's a quality of implementation issue. The csum_copy_generic
> thing is a bug. Allowing incorrect checksums to be sent out would be a

From brief inspection, at least arm, mips and m68k have similar problems.

[haven't checked the others]

> design bug. I think some RFCs even forbid the sending of incorrect
> packets?

I do not know any that do. Of course you shouldn't, but the receiver
has to handle it anyway.

-Andi



Re: zero-copy TCP

2000-09-03 Thread Werner Almesberger

Ingo Molnar wrote:
> i believe such zero-copy send should only be allowed for drivers which can
> guarantee correct checksums.

Hmm, I think this shouldn't be tied too closely to TCP. E.g. you can
probably play wonderful ALF tricks with raw sockets.

For TCP/UDP, such a restriction may be useful, but then it shouldn't
be visible to the application, or route changes on multi-homed systems
with different hardware on those ports would be great fun.

- Werner

-- 
  _
 / Werner Almesberger, ICA, EPFL, CH   [EMAIL PROTECTED] /
/_IN_N_032__Tel_+41_21_693_6621__Fax_+41_21_693_6610_/



Re: zero-copy TCP

2000-09-03 Thread Ingo Molnar


On Sun, 3 Sep 2000, Andi Kleen wrote:

> You can already cause incorrect checksums on the wire just by passing
> a partly unmapped address (the zero-the-rest exception handler in
> csum_copy_generic in i386 forgets to add in the carry)
> 
> I do not believe it is a big deal, packets with bad checksum are not
> really a problem (you can usually do other better DoS that do not need
> it)

I think it's a quality of implementation issue. The csum_copy_generic
thing is a bug. Allowing incorrect checksums to be sent out would be a
design bug. I think some RFCs even forbid the sending of incorrect
packets?
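
Why a forgotten carry matters: the Internet checksum is a ones'-complement sum, so a carry out of bit 15 must be added back in ("end-around carry"). This is a toy illustration of what dropping it does, not the actual i386 csum_copy_generic assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Add one 16-bit word into a ones'-complement running sum. */
static uint16_t csum_add(uint16_t sum, uint16_t word)
{
    uint32_t s = (uint32_t)sum + word;
    return (uint16_t)((s & 0xffff) + (s >> 16));   /* end-around carry */
}

/* The same addition with the carry dropped: correct only when the
 * addition happens not to overflow 16 bits. */
static uint16_t csum_add_no_carry(uint16_t sum, uint16_t word)
{
    return (uint16_t)(sum + word);
}
```

The two agree until an addition overflows. After that the buggy sum is off by one, so the packet goes out with a checksum the receiver will reject.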

Ingo




Re: zero-copy TCP

2000-09-03 Thread Ingo Molnar


On Sun, 3 Sep 2000, Andi Kleen wrote:

> I did the same for fragment RX some months ago (simple fragment lists
> that were copy-checksummed to user space). Overall it is probably
> better to use a kiovec, because that can be more easily used in nfsd
> and sendfile.

The basic fragment type introduced by the TUX changes is a 'struct
skb_frag', which has csum, size, *page, page_offset, frag_done, *data and
*private fields - this is more than normal kiovecs offer. But I think
kiovecs can be extended to do all this (if Stephen & everybody else
agrees); I just didn't want to touch it for the time being.
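
Here is a guess at the shape of that descriptor, for readers comparing it with a kiovec. The field names come from the mail. The types, the frag_done callback signature, and the frag_within_page() helper are assumptions for illustration, not the TUX code.

```c
#include <assert.h>
#include <stdbool.h>

struct page;   /* opaque in this sketch */

struct skb_frag {
    unsigned int csum;                         /* checksum of this fragment (assumed) */
    unsigned int size;                         /* bytes of data */
    struct page *page;                         /* page holding the data */
    unsigned int page_offset;                  /* where the data starts in the page */
    void (*frag_done)(struct skb_frag *frag);  /* completion callback (assumed) */
    char *data;                                /* kernel address of the data */
    void *private;                             /* owner's cookie */
};

/* Hypothetical sanity check: a fragment must not run past its page. */
static bool frag_within_page(const struct skb_frag *f, unsigned int page_size)
{
    return f->page_offset <= page_size && f->size <= page_size - f->page_offset;
}
```

The per-fragment checksum and completion callback are what a plain kiovec (a list of page pointers plus offset and length) would lack.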

Ingo





Re: zero-copy TCP

2000-09-03 Thread Ingo Molnar


On Sun, 3 Sep 2000 [EMAIL PROTECTED] wrote:

> If we go for a Linux-specific solution anyway, maybe one could add
> another send{,to,msg} flag that makes send*(2)'s buffer access
> non-atomic. That way, the kernel only needs to make sure the pages
> don't disappear, but there's no need for expensive MMU games.
> 
> Of course, this would give applications a way for generating packets
> with an incorrect TCP/UDP checksum, [...]

I believe such zero-copy send should only be allowed for drivers which can
guarantee correct checksums (i.e. cards which do TX checksumming). The other
drivers will still copy. I don't think this is a problem - the number of
cards that can do scatter-gather DMA but cannot do TX checksumming is
rather low (I only know about the Tulip). All modern cards do
TX checksumming and scatter-gather DMA.
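
The policy described here is a simple capability test per device. The flag names and values below are hypothetical stand-ins, not the kernel's real feature bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device capability bits. */
#define DEV_CAP_SG       0x1u   /* scatter-gather DMA */
#define DEV_CAP_TX_CSUM  0x2u   /* hardware transmit checksumming */

/* Send straight from user pages only if the card can both gather the
 * fragments and checksum them itself; otherwise keep copying (and
 * checksumming during the copy) as before. */
static bool can_zero_copy_send(unsigned dev_caps)
{
    const unsigned need = DEV_CAP_SG | DEV_CAP_TX_CSUM;
    return (dev_caps & need) == need;
}
```

A card like the Tulip, as described above (scatter-gather but no TX checksum), would fall back to the copying path.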

Ingo




Re: zero-copy TCP

2000-09-03 Thread Richard Gooch

Andrew Morton writes:
> Jes Sorensen wrote:
> > 
> > I only know of a few 100baseT cards that can do it such as the
> > Adaptec Starfire and the 3C905B
> > (though I am not sure what it provides is sufficient).
> 
> The 3c905, 3c905B, 3c905C and all 3Com Cardbus NICs will do
> scatter and gather of up to 63 fragments per packet with
> byte-granular fragment alignment and length.
> 
> All of them except the 3c905 provide hardware Rx and Tx
> checksumming of IP, TCP and UDP headers.  No 64 bit
> addressing support.

And does the driver support it? Has anyone benchmarked the performance
difference (if any)?

Regards,

Richard
Permanent: [EMAIL PROTECTED]
Current:   [EMAIL PROTECTED]



Re: zero-copy TCP

2000-09-03 Thread Jes Sorensen

> "Andrew" == Andrew Morton <[EMAIL PROTECTED]> writes:

Andrew> Jes Sorensen wrote:
>>  I only know of a few 100baseT cards that can do it such as the
>> Adaptec Starfire and the 3C905B (though I am not sure what it
>> provides is sufficient).

Andrew> The 3c905, 3c905B, 3c905C and all 3Com Cardbus NICs will do
Andrew> scatter and gather of up to 63 fragments per packet with
Andrew> byte-granular fragment alignment and length.

Andrew> All of them except the 3c905 provide hardware Rx and Tx
Andrew> checksumming of IP, TCP and UDP headers.  No 64 bit addressing
Andrew> support.

Read what I wrote above: the 905B was on the list I posted ;-) The
905B is not a 1995 card on the other hand. The 905 doesn't count since
scatter/gather without checksumming makes it pretty useless.

Jes



Re: zero-copy TCP

2000-09-03 Thread Andrew Morton

Jes Sorensen wrote:
> 
> I only know of a few 100baseT cards that can do it such as the
> Adaptec Starfire and the 3C905B
> (though I am not sure what it provides is sufficient).

The 3c905, 3c905B, 3c905C and all 3Com Cardbus NICs will do
scatter and gather of up to 63 fragments per packet with
byte-granular fragment alignment and length.

All of them except the 3c905 provide hardware Rx and Tx
checksumming of IP, TCP and UDP headers.  No 64 bit
addressing support.

Is this sufficient?
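
What a driver has to respect with such a card can be sketched as a descriptor builder. The 63-fragment limit and the lack of 64-bit addressing come from the mail. The descriptor layout is hypothetical, and the real 3c905 download descriptor format is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FRAGS 63   /* per-packet fragment limit quoted above */

/* Hypothetical fragment descriptor with 32-bit DMA addresses only. */
struct frag_desc {
    uint32_t addr;
    uint32_t len;
};

/* Fill descriptors from (address, length) pairs. Byte-granular alignment
 * means no rounding is needed. Returns the fragment count, or -1 if the
 * packet must be linearized (copied) first: too many pieces, or a piece
 * above 4GB that the card cannot address. */
static int build_sg_list(struct frag_desc out[MAX_FRAGS],
                         const uint64_t *addrs, const uint32_t *lens, size_t n)
{
    if (n > MAX_FRAGS)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (addrs[i] > UINT32_MAX)
            return -1;
        out[i].addr = (uint32_t)addrs[i];
        out[i].len = lens[i];
    }
    return (int)n;
}
```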



Re: zero-copy TCP

2000-09-03 Thread almesber

Ingo Molnar wrote:
> On 2 Sep 2000, Jes Sorensen wrote:
> > Besides that you need to do copy-on-write if you want to be able to do
> > zero copy on write() from user space [...]
> 
> I agree that this is hard - I'm not sure whether we want to go through the
> pain of enabling zero-copy for anonymous-buffer write()s.

If we go for a Linux-specific solution anyway, maybe one could add
another send{,to,msg} flag that makes send*(2)'s buffer access
non-atomic. That way, the kernel only needs to make sure the pages
don't disappear, but there's no need for expensive MMU games.

Of course, this would give applications a way for generating packets
with an incorrect TCP/UDP checksum, so maybe that should be protected
by a capability then, maybe CAP_NET_RAW.
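
The proposed gate can be sketched as a check at the top of send(). The flag is hypothetical (no such MSG_* flag exists), and the capability test is reduced to a boolean standing in for a CAP_NET_RAW check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag value: relaxes send-buffer atomicity (pages are
 * pinned but not write-protected), so checksums may not match the data. */
#define MSG_NONATOMIC_HYP 0x40000000u

/* Callers that could emit bad TCP/UDP checksums need raw-network privilege. */
static int check_send_flags(unsigned flags, bool has_cap_net_raw)
{
    if ((flags & MSG_NONATOMIC_HYP) && !has_cap_net_raw)
        return -EPERM;
    return 0;
}
```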

- Werner

-- 
  _
 / Werner Almesberger, ICA, EPFL, CH   [EMAIL PROTECTED] /
/_IN_N_032__Tel_+41_21_693_6621__Fax_+41_21_693_6610_/



Re: zero-copy TCP

2000-09-03 Thread Jes Sorensen

> "Jeff" == Jeff V Merkey <[EMAIL PROTECTED]> writes:

Jeff> There have been a few cards around since about 1995, but I don't
Jeff> remember all of them.  I do remember having to debug SMP code on
Jeff> them though -- yech.

I wouldn't be surprised, but I would prefer names. Doing SMP-aware
device drivers is not hard though.

Jes


