Re: tcp bw in 2.6

2007-10-15 Thread Daniel Schaffrath
On 2007/10/02 , at 18:47, Stephen Hemminger wrote: On Tue, 2 Oct 2007 09:25:34 -0700 [EMAIL PROTECTED] (Larry McVoy) wrote: If the server side is the source of the data, i.e, it's transfer is a write loop, then I get the bad behaviour. ... So is this a bug or intentional? For whatever it

Re: tcp bw in 2.6

2007-10-15 Thread Stephen Hemminger
On Mon, 15 Oct 2007 14:40:25 +0200 Daniel Schaffrath [EMAIL PROTECTED] wrote: On 2007/10/02 , at 18:47, Stephen Hemminger wrote: On Tue, 2 Oct 2007 09:25:34 -0700 [EMAIL PROTECTED] (Larry McVoy) wrote: If the server side is the source of the data, i.e, it's transfer is a write

Re: tcp bw in 2.6

2007-10-03 Thread Bill Fink
Tangential aside: On Tue, 02 Oct 2007, Rick Jones wrote: *) depending on the quantity of CPU around, and the type of test one is running, results can be better/worse depending on the CPU to which you bind the application. Latency tends to be best when running on the same core as takes

Re: tcp bw in 2.6

2007-10-03 Thread David Miller
From: [EMAIL PROTECTED] (Larry McVoy) Date: Tue, 2 Oct 2007 15:36:44 -0700 On Tue, Oct 02, 2007 at 03:32:16PM -0700, David Miller wrote: I'm starting to have a theory about what the bad case might be. A strong sender going to an even stronger receiver which can pull out packets into

Re: tcp bw in 2.6

2007-10-03 Thread Larry McVoy
A few notes to the discussion. I've seen one e1000 bug that ended up being a crappy AMD pre-opteron SMP chipset with a totally useless PCI bus implementation, which limited performance quite a bit-totally depending on what you plugged in and in which slot. 10e milk-and-bread-store 32/33 gige

Re: tcp bw in 2.6

2007-10-03 Thread Pekka Pietikainen
On Tue, Oct 02, 2007 at 02:21:32PM -0700, Larry McVoy wrote: More data, sky2 works fine (really really fine, like 79MB/sec) between Linux dylan.bitmover.com 2.6.18.1 #5 SMP Mon Oct 23 17:36:00 PDT 2006 i686 Linux steele 2.6.20-16-generic #2 SMP Sun Sep 23 18:31:23 UTC 2007 x86_64 So this is

Re: tcp bw in 2.6

2007-10-03 Thread Pekka Pietikainen
On Wed, Oct 03, 2007 at 02:23:58PM -0700, Larry McVoy wrote: A few notes to the discussion. I've seen one e1000 bug that ended up being a crappy AMD pre-opteron SMP chipset with a totally useless PCI bus implementation, which limited performance quite a bit-totally depending on what you

Re: tcp bw in 2.6

2007-10-02 Thread Herbert Xu
Larry McVoy [EMAIL PROTECTED] wrote: One of my clients also has gigabit so I played around with just that one and it (itanium running hpux w/ broadcom gigabit) can push the load as well. One weird thing is that it is dependent on the direction the data is flowing. If the hp is sending then

Re: tcp bw in 2.6

2007-10-02 Thread John Heffner
Larry McVoy wrote: A short summary is can someone please post a test program that sources and sinks data at the wire speed? because apparently I'm too old and clueless to write such a thing. Here's a simple reference tcp source/sink that I've used for years. For example, on a couple

Re: tcp bw in 2.6

2007-10-02 Thread Larry McVoy
On Tue, Oct 02, 2007 at 06:52:54PM +0800, Herbert Xu wrote: One of my clients also has gigabit so I played around with just that one and it (itanium running hpux w/ broadcom gigabit) can push the load as well. One weird thing is that it is dependent on the direction the data is flowing.

Re: tcp bw in 2.6

2007-10-02 Thread Larry McVoy
Interesting data point. My test case is like this:

server
    bind
    listen
    while (newsock = accept...)
        transfer()

client
    connect
    transfer

If the server side is the source of the data, i.e, it's transfer is a write loop, then I get the bad
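For readers following the thread, the test layout described in this message can be sketched roughly as below. This is a minimal illustration over loopback, not Larry's actual test program (which is not shown in the thread); the names `serve_once`, `run_transfer` and the `CHUNK` size are hypothetical. The accepting side is the data source, which is the direction reported as slow.

```python
import socket
import threading

CHUNK = 64 * 1024  # hypothetical write size, not taken from the thread


def serve_once(listener, total_bytes):
    """Accept one connection and act as the data source (the write loop)."""
    conn, _ = listener.accept()
    with conn:
        buf = b"\0" * CHUNK
        sent = 0
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            conn.sendall(buf[:n])
            sent += n


def run_transfer(total_bytes):
    """Server sends total_bytes over loopback; return bytes the client read."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    server = threading.Thread(target=serve_once, args=(listener, total_bytes))
    server.start()

    received = 0
    with socket.create_connection(("127.0.0.1", port)) as client:
        while True:
            data = client.recv(1 << 20)  # read loop on the connecting side
            if not data:  # empty read: the server closed the connection
                break
            received += len(data)

    server.join()
    listener.close()
    return received
```

To test the other direction, swap the roles so that the connecting side runs the write loop and the accepting side reads. Loopback only checks that the logic is correct; it says nothing about the NIC or TSO behaviour discussed in the thread.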

Re: tcp bw in 2.6

2007-10-02 Thread Larry McVoy
If the server side is the source of the data, i.e, it's transfer is a write loop, then I get the bad behaviour. ... So is this a bug or intentional? For whatever it is worth, I believed that we used to get better performance from the same hardware. My guess is that it changed somewhere

Re: tcp bw in 2.6

2007-10-02 Thread Linus Torvalds
On Tue, 2 Oct 2007, Larry McVoy wrote: Interesting data point. My test case is like this: server bind listen while (newsock = accept...) transfer() client connect transfer If the server side is the source of the data, i.e, it's

Re: tcp bw in 2.6

2007-10-02 Thread Stephen Hemminger
On Tue, 2 Oct 2007 09:25:34 -0700 [EMAIL PROTECTED] (Larry McVoy) wrote: If the server side is the source of the data, i.e, it's transfer is a write loop, then I get the bad behaviour. ... So is this a bug or intentional? For whatever it is worth, I believed that we used to get

Re: tcp bw in 2.6

2007-10-02 Thread Larry McVoy
Isn't this something so straightforward that you would have tests for it? This is the basic FTP server loop, doesn't someone have a big machine with 10gig cards and test that sending/recving data doesn't regress? Sounds like a bug to me, modulo the above caveat of making sure that it's not

Re: tcp bw in 2.6

2007-10-02 Thread Larry McVoy
On Tue, Oct 02, 2007 at 09:47:26AM -0700, Stephen Hemminger wrote: On Tue, 2 Oct 2007 09:25:34 -0700 [EMAIL PROTECTED] (Larry McVoy) wrote: If the server side is the source of the data, i.e, it's transfer is a write loop, then I get the bad behaviour. ... So is this a bug or

Re: tcp bw in 2.6

2007-10-02 Thread Ben Greear
Larry McVoy wrote: Interesting data point. My test case is like this: server bind listen while (newsock = accept...) transfer() client connect transfer If the server side is the source of the data, i.e, it's transfer is a write loop,

Re: tcp bw in 2.6

2007-10-02 Thread Larry McVoy
I have a more complex configuration application, but I don't see this problem in my testing. Using e1000 nics and modern hardware I'm using a similar setup, what kernel are you using? I am purposefully setting the socket send/rx buffers, as well as twiddling with the tcp and netdev

Re: tcp bw in 2.6

2007-10-02 Thread Stephen Hemminger
On Tue, 2 Oct 2007 09:49:52 -0700 [EMAIL PROTECTED] (Larry McVoy) wrote: On Tue, Oct 02, 2007 at 09:47:26AM -0700, Stephen Hemminger wrote: On Tue, 2 Oct 2007 09:25:34 -0700 [EMAIL PROTECTED] (Larry McVoy) wrote: If the server side is the source of the data, i.e, it's transfer is a

Re: tcp bw in 2.6

2007-10-02 Thread Rick Jones
Larry McVoy wrote: A short summary is can someone please post a test program that sources and sinks data at the wire speed? because apparently I'm too old and clueless to write such a thing. http://www.netperf.org/svn/netperf2/trunk/ :) WRT the different speeds in each direction talking

Re: tcp bw in 2.6

2007-10-02 Thread Ben Greear
Larry McVoy wrote: I have a more complex configuration application, but I don't see this problem in my testing. Using e1000 nics and modern hardware I'm using a similar setup, what kernel are you using? I'm currently on 2.6.20, and have also tried 10gbe nics on 2.6.23 with good

Re: tcp bw in 2.6

2007-10-02 Thread Larry McVoy
On Tue, Oct 02, 2007 at 10:14:11AM -0700, Rick Jones wrote: Larry McVoy wrote: A short summary is can someone please post a test program that sources and sinks data at the wire speed? because apparently I'm too old and clueless to write such a thing. WRT the different speeds in each

Re: tcp bw in 2.6

2007-10-02 Thread Larry McVoy
I'm currently on 2.6.20, and have also tried 10gbe nics on 2.6.23 with My guess is that it is a bug in the debian 2.6.18 kernel. Have you tried something like ttcp, iperf, or even regular ftp? Yeah, I've factored out the code since BitKeeper, my test program, and John's test program all

Re: tcp bw in 2.6

2007-10-02 Thread Stephen Hemminger
On Tue, 2 Oct 2007 10:21:55 -0700 [EMAIL PROTECTED] (Larry McVoy) wrote: I'm currently on 2.6.20, and have also tried 10gbe nics on 2.6.23 with My guess is that it is a bug in the debian 2.6.18 kernel. Have you tried something like ttcp, iperf, or even regular ftp? Yeah, I've factored

Re: tcp bw in 2.6

2007-10-02 Thread Rick Jones
has anyone already asked whether link-layer flow-control is enabled? rick jones

Re: tcp bw in 2.6

2007-10-02 Thread John Heffner
Larry McVoy wrote: On Tue, Oct 02, 2007 at 06:52:54PM +0800, Herbert Xu wrote: One of my clients also has gigabit so I played around with just that one and it (itanium running hpux w/ broadcom gigabit) can push the load as well. One weird thing is that it is dependent on the direction the data

Re: tcp bw in 2.6

2007-10-02 Thread Larry McVoy
Make sure you don't have slab debugging turned on. It kills performance. It's a stock debian kernel, so unless they turn it on it's off.

Re: tcp bw in 2.6

2007-10-02 Thread Larry McVoy
On Tue, Oct 02, 2007 at 11:01:47AM -0700, Rick Jones wrote: has anyone already asked whether link-layer flow-control is enabled? I doubt it, the same test works fine in one direction and poorly in the other. Wouldn't the flow control squelch either way?

Re: tcp bw in 2.6

2007-10-02 Thread Larry McVoy
Looks like you have TSO enabled. Does it behave differently if it's disabled? It cranks the interrupts/sec up to 8K instead of 5K. No difference in performance other than that. I think Rick Jones is on to something with the HP ack avoidance. I sincerely doubt it. I'm only using the

Re: tcp bw in 2.6

2007-10-02 Thread Linus Torvalds
On Tue, 2 Oct 2007, Larry McVoy wrote: tcpdump is a good idea, take a look at this. The window starts out at 46 and never opens up in my test case, but in the rsh case it starts out the same but does open up. Ideas? I don't think that's an issue, since you only send one way. The window

Re: tcp bw in 2.6

2007-10-02 Thread Linus Torvalds
On Tue, 2 Oct 2007, Larry McVoy wrote: No HP in the mix. It's got nothing to do with hp, nor to do with rsh, it has everything to do with the direction the data is flowing. Can you tcpdump both cases and send snippets (both of steady-state, and the initial connect)?

Re: tcp bw in 2.6

2007-10-02 Thread Larry McVoy
More data, we've conclusively eliminated the card / cpu from the mix. We've got 2 ia64 boxes with e1000 interfaces. One box is running linux 2.6.12 and the other is running hpux 11. I made sure the linux one was running at gigabit and reran the tests from the linux/ia64 = hp/ia64. Same results,

Re: tcp bw in 2.6

2007-10-02 Thread Rick Jones
Larry McVoy wrote: On Tue, Oct 02, 2007 at 11:01:47AM -0700, Rick Jones wrote: has anyone already asked whether link-layer flow-control is enabled? I doubt it, the same test works fine in one direction and poorly in the other. Wouldn't the flow control squelch either way? While I am often

Re: tcp bw in 2.6

2007-10-02 Thread John Heffner
Larry McVoy wrote: More data, we've conclusively eliminated the card / cpu from the mix. We've got 2 ia64 boxes with e1000 interfaces. One box is running linux 2.6.12 and the other is running hpux 11. I made sure the linux one was running at gigabit and reran the tests from the linux/ia64 =

Re: tcp bw in 2.6

2007-10-02 Thread Rick Jones
I also would have expected more ACK's from the HP box. It's been a long time since I did TCP, but I thought the rule was still that you were supposed to ACK at least every other full frame - but the HP box is acking roughly every 16K (and it's *not* always at TSO boundaries: the earlier ACK's

Re: tcp bw in 2.6

2007-10-02 Thread Larry McVoy
I think I'm still missing some basic data here (probably because this thread did not originate on netdev). Let me try to nail down some of the basics. You have a linux ia64 box (running 2.6.12 or 2.6.18?) that sends slowly, and receives faster, but not quite a 1 Gbps? And this is true

Re: tcp bw in 2.6

2007-10-02 Thread David Miller
From: Linus Torvalds [EMAIL PROTECTED] Date: Tue, 2 Oct 2007 12:29:50 -0700 (PDT) On Tue, 2 Oct 2007, Larry McVoy wrote: No HP in the mix. It's got nothing to do with hp, nor to do with rsh, it has everything to do with the direction the data is flowing. Can you tcpdump both cases

Re: tcp bw in 2.6

2007-10-02 Thread David Miller
From: Linus Torvalds [EMAIL PROTECTED] Date: Tue, 2 Oct 2007 12:27:53 -0700 (PDT) We see a single packet containing 16060 bytes, which seems to be because of TSO on the sending side (you did your tcpdump on the sender, no?), so it will actually be broken up into 11 1460-byte regular frames
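The segmentation arithmetic quoted above can be checked with a short sketch. The 16060-byte and 1460-byte figures come from the message; the helper name `split_into_frames` is hypothetical and only models how a TSO-sized send is cut into MSS-sized frames on the wire.

```python
MSS = 1460           # per-frame TCP payload quoted in the message
TSO_PAYLOAD = 16060  # size of the single large packet seen in tcpdump


def split_into_frames(payload_len, mss):
    """Return the payload sizes of the wire frames for one large send."""
    full, rest = divmod(payload_len, mss)
    return [mss] * full + ([rest] if rest else [])


frames = split_into_frames(TSO_PAYLOAD, MSS)
```

Since 11 x 1460 = 16060, the list holds exactly eleven full-sized frames with no short trailing frame, which matches the count given in the message.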

Re: tcp bw in 2.6

2007-10-02 Thread Rick Jones
Alternatively, take your favorite test programs, such as John's, and make a second pair that reverses the direction the data is sent. So one pair is server sends, the other is server receives, try both. That's where we started, BitKeeper, my stripped down test, and John's test all exhibit the

Re: tcp bw in 2.6

2007-10-02 Thread Roland Dreier
It would be really great to see numbers with a more recent kernel than 2.6.18 FWIW Debian has binaries for 2.6.21 in testing and for 2.6.22 in unstable so it should be very easy for Larry to try at least those. - R.

Re: tcp bw in 2.6

2007-10-02 Thread David Miller
From: [EMAIL PROTECTED] (Larry McVoy) Date: Tue, 2 Oct 2007 09:48:58 -0700 Isn't this something so straightforward that you would have tests for it? This is the basic FTP server loop, doesn't someone have a big machine with 10gig cards and test that sending/recving data doesn't regress?

Re: tcp bw in 2.6

2007-10-02 Thread Larry McVoy
We fixed a lot of bugs in TSO last year. It would be really great to see numbers with a more recent kernel than 2.6.18 More data, sky2 works fine (really really fine, like 79MB/sec) between Linux dylan.bitmover.com 2.6.18.1 #5 SMP Mon Oct 23 17:36:00 PDT 2006 i686 Linux steele

Re: tcp bw in 2.6

2007-10-02 Thread Larry McVoy
On Tue, Oct 02, 2007 at 02:16:56PM -0700, David Miller wrote: We absolutely depend upon people like you to report when there are anomalies like this. It's the only thing that scales. Well cool, finally doing something useful :) Is this issue no test setup? Because this does seem like

Re: tcp bw in 2.6

2007-10-02 Thread David Miller
From: [EMAIL PROTECTED] (Larry McVoy) Date: Tue, 2 Oct 2007 14:26:08 -0700 And note that sky2 doesn't have this problem. Does the broadcom do TSO? And sky2 not? I noticed a much higher CPU load for sky2. Yes the broadcoms (the revisions I have) do TSO and it is enabled on both sides. Which

Re: tcp bw in 2.6

2007-10-02 Thread David Miller
From: [EMAIL PROTECTED] (Larry McVoy) Date: Tue, 2 Oct 2007 11:40:32 -0700 I doubt it, the same test works fine in one direction and poorly in the other. Wouldn't the flow control squelch either way? HW controls for these things are typically: 1) Generates flow control frames 2) Listens for

Re: tcp bw in 2.6

2007-10-02 Thread Linus Torvalds
On Tue, 2 Oct 2007, Wayne Scott wrote: The slow set was done like this: on ia64: netcat -l -p /dev/null on work: netcat ia64 /dev/zero That sounds wrong. Larry claims the slow case is when the side that did accept() does the sending, the above has the listener just

Re: tcp bw in 2.6

2007-10-02 Thread Rick Jones
David Miller wrote: From: [EMAIL PROTECTED] (Larry McVoy) Date: Tue, 2 Oct 2007 14:26:08 -0700 And note that sky2 doesn't have this problem. Does the broadcom do TSO? And sky2 not? I noticed a much higher CPU load for sky2. Yes the broadcoms (the revisions I have) do TSO and it is

Re: tcp bw in 2.6

2007-10-02 Thread David Miller
From: Rick Jones [EMAIL PROTECTED] Date: Tue, 02 Oct 2007 15:17:35 -0700 Stranger still, with a mix of a 2.6.23-rc5ish kernel and a net-2.6.24 one (pulled oh middle of last week?) I get link-rate and I see no asymmetry between TCP_STREAM and TCP_MAERTS over an e1000 link with no switch or

Re: tcp bw in 2.6

2007-10-02 Thread Larry McVoy
On Tue, Oct 02, 2007 at 03:32:16PM -0700, David Miller wrote: I'm starting to have a theory about what the bad case might be. A strong sender going to an even stronger receiver which can pull out packets into the process as fast as they arrive. This might be part of what keeps the receive

Re: tcp bw in 2.6

2007-10-02 Thread Rick Jones
Larry McVoy wrote: On Tue, Oct 02, 2007 at 03:32:16PM -0700, David Miller wrote: I'm starting to have a theory about what the bad case might be. A strong sender going to an even stronger receiver which can pull out packets into the process as fast as they arrive. This might be part of what

Re: tcp bw in 2.6

2007-10-01 Thread Larry McVoy
On Sat, Sep 29, 2007 at 11:02:32AM -0700, Linus Torvalds wrote: On Sat, 29 Sep 2007, Larry McVoy wrote: I haven't kept up on switch technology but in the past they were much better than you are thinking. The Kalpana switch that I had modified to support vlans (invented by yours truly), did

Re: tcp bw in 2.6

2007-10-01 Thread Linus Torvalds
On Mon, 1 Oct 2007, Larry McVoy wrote: but the client looks like connect(3, {sa_family=AF_INET, sin_port=htons(31235), sin_addr=inet_addr(10.3.9.1)}, 16) = 0 read(3, \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0..., 1048576) = 2896 read(3,

Re: tcp bw in 2.6

2007-10-01 Thread Larry McVoy
On Mon, Oct 01, 2007 at 07:14:37PM -0700, Linus Torvalds wrote: On Mon, 1 Oct 2007, Larry McVoy wrote: but the client looks like connect(3, {sa_family=AF_INET, sin_port=htons(31235), sin_addr=inet_addr(10.3.9.1)}, 16) = 0 read(3,

Re: tcp bw in 2.6

2007-10-01 Thread David Miller
From: [EMAIL PROTECTED] (Larry McVoy) Date: Mon, 1 Oct 2007 19:20:59 -0700 A short summary is can someone please post a test program that sources and sinks data at the wire speed? because apparently I'm too old and clueless to write such a thing. You're not showing us your test program so

Re: tcp bw in 2.6

2007-10-01 Thread Larry McVoy
On Mon, Oct 01, 2007 at 08:50:50PM -0700, David Miller wrote: From: [EMAIL PROTECTED] (Larry McVoy) Date: Mon, 1 Oct 2007 19:20:59 -0700 A short summary is can someone please post a test program that sources and sinks data at the wire speed? because apparently I'm too old and clueless