Re: quick summary results with ixgbe (was Re: datapoints on 10G throughput with TCP ?

2011-12-08 Thread Daniel Kalchev
On 07.12.11 22:23, Luigi Rizzo wrote: Sorry, forgot to mention that the above is with TSO DISABLED (which is not the default). TSO seems to have a very bad interaction with HWCSUM and non-zero mitigation. I have this on both sender and receiver # ifconfig ix1 ix1:

Re: quick summary results with ixgbe (was Re: datapoints on 10G throughput with TCP ?

2011-12-08 Thread Luigi Rizzo
On Thu, Dec 08, 2011 at 12:06:26PM +0200, Daniel Kalchev wrote: On 07.12.11 22:23, Luigi Rizzo wrote: Sorry, forgot to mention that the above is with TSO DISABLED (which is not the default). TSO seems to have a very bad interaction with HWCSUM and non-zero mitigation. I have this on

Re: quick summary results with ixgbe (was Re: datapoints on 10G throughput with TCP ?

2011-12-08 Thread Lawrence Stewart
On 12/08/11 05:08, Luigi Rizzo wrote: On Wed, Dec 07, 2011 at 11:59:43AM +0100, Andre Oppermann wrote: On 06.12.2011 22:06, Luigi Rizzo wrote: ... Even in my experiments there is a lot of instability in the results. I don't know exactly where the problem is, but the high number of read

Re: datapoints on 10G throughput with TCP ?

2011-12-08 Thread Slawa Olhovchenkov
On Mon, Dec 05, 2011 at 08:27:03PM +0100, Luigi Rizzo wrote: Hi, I am trying to establish the baseline performance for 10G throughput over TCP, and would like to collect some data points. As a testing program I am using nuttcp from ports (as good as anything, I guess -- it is reasonably

Re: quick summary results with ixgbe (was Re: datapoints on 10G throughput with TCP ?

2011-12-08 Thread Luigi Rizzo
On Fri, Dec 09, 2011 at 12:11:50AM +1100, Lawrence Stewart wrote: On 12/08/11 05:08, Luigi Rizzo wrote: ... I ran a bunch of tests on the ixgbe (82599) using RELENG_8 (which seems slightly faster than HEAD) using MTU=1500 and various combinations of card capabilities (hwcsum,tso,lro),

Re: quick summary results with ixgbe (was Re: datapoints on 10G throughput with TCP ?

2011-12-08 Thread Andre Oppermann
On 08.12.2011 14:11, Lawrence Stewart wrote: On 12/08/11 05:08, Luigi Rizzo wrote: On Wed, Dec 07, 2011 at 11:59:43AM +0100, Andre Oppermann wrote: On 06.12.2011 22:06, Luigi Rizzo wrote: ... Even in my experiments there is a lot of instability in the results. I don't know exactly where the

Re: quick summary results with ixgbe (was Re: datapoints on 10G throughput with TCP ?

2011-12-08 Thread Andre Oppermann
On 08.12.2011 16:34, Luigi Rizzo wrote: On Fri, Dec 09, 2011 at 12:11:50AM +1100, Lawrence Stewart wrote: On 12/08/11 05:08, Luigi Rizzo wrote: ... I ran a bunch of tests on the ixgbe (82599) using RELENG_8 (which seems slightly faster than HEAD) using MTU=1500 and various combinations of

Re: quick summary results with ixgbe (was Re: datapoints on 10G throughput with TCP ?

2011-12-08 Thread Luigi Rizzo
On Fri, Dec 09, 2011 at 01:33:04AM +0100, Andre Oppermann wrote: On 08.12.2011 16:34, Luigi Rizzo wrote: On Fri, Dec 09, 2011 at 12:11:50AM +1100, Lawrence Stewart wrote: ... Jeff tested the WIP patch and it *doesn't* fix the issue. I don't have LRO capable hardware setup locally to figure out

Re: datapoints on 10G throughput with TCP ?

2011-12-07 Thread Andre Oppermann
On 06.12.2011 22:06, Luigi Rizzo wrote: On Tue, Dec 06, 2011 at 07:40:21PM +0200, Daniel Kalchev wrote: I see significant difference between number of interrupts on the Intel and the AMD blades. When performing a test between the Intel and AMD blades, the Intel blade generates 20,000-35,000

quick summary results with ixgbe (was Re: datapoints on 10G throughput with TCP ?

2011-12-07 Thread Luigi Rizzo
On Wed, Dec 07, 2011 at 11:59:43AM +0100, Andre Oppermann wrote: On 06.12.2011 22:06, Luigi Rizzo wrote: ... Even in my experiments there is a lot of instability in the results. I don't know exactly where the problem is, but the high number of read syscalls, and the huge impact of setting

Re: quick summary results with ixgbe (was Re: datapoints on 10G throughput with TCP ?

2011-12-07 Thread Daniel Kalchev
On Dec 7, 2011, at 8:08 PM, Luigi Rizzo wrote: Summary: - with default interrupt mitigation, the fastest configuration is with checksums enabled on both sender and receiver, lro enabled on the receiver. This gets about 8.0 Gbit/s I do not observe this. With defaults: # nuttcp -t -T 5

Re: quick summary results with ixgbe (was Re: datapoints on 10G throughput with TCP ?

2011-12-07 Thread Luigi Rizzo
On Wed, Dec 07, 2011 at 09:58:31PM +0200, Daniel Kalchev wrote: On Dec 7, 2011, at 8:08 PM, Luigi Rizzo wrote: Summary: - with default interrupt mitigation, the fastest configuration is with checksums enabled on both sender and receiver, lro enabled on the receiver. This gets

Re: datapoints on 10G throughput with TCP ?

2011-12-06 Thread Daniel Kalchev
Here is what I get, with an existing install, no tuning other than kern.ipc.nmbclusters=512000 Pair of Supermicro blades: FreeBSD 8.2-STABLE #0: Wed Sep 28 11:23:59 EEST 2011 CPU: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz (2403.58-MHz K8-class CPU) real memory = 51539607552 (49152 MB)

Re: datapoints on 10G throughput with TCP ?

2011-12-06 Thread Daniel Kalchev
On 06.12.11 13:18, Daniel Kalchev wrote: [...] second blade: # nuttcp -t -T 5 -w 128 -v 10.2.101.13 nuttcp-t: v6.1.2: socket nuttcp-t: buflen=65536, nstream=1, port=5001 tcp - 10.2.101.13 nuttcp-t: time limit = 5.00 seconds nuttcp-t: connect to 10.2.101.13 with mss=1448, RTT=0.164 ms

Re: datapoints on 10G throughput with TCP ?

2011-12-06 Thread Daniel Kalchev
Some tests with updated FreeBSD to 8-stable as of today, compared with the previous run On 06.12.11 13:18, Daniel Kalchev wrote: FreeBSD 8.2-STABLE #0: Wed Sep 28 11:23:59 EEST 2011 CPU: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz (2403.58-MHz K8-class CPU) real memory = 51539607552

Re: datapoints on 10G throughput with TCP ?

2011-12-06 Thread Jack Vogel
Set the storm threshold to 0; that will disable it. It's going to throttle your performance when it happens. Jack On Tue, Dec 6, 2011 at 6:24 AM, Daniel Kalchev dan...@digsys.bg wrote: Some tests with updated FreeBSD to 8-stable as of today, compared with the previous run On 06.12.11

Re: datapoints on 10G throughput with TCP ?

2011-12-06 Thread Daniel Kalchev
I see significant difference between number of interrupts on the Intel and the AMD blades. When performing a test between the Intel and AMD blades, the Intel blade generates 20,000-35,000 interrupts, while the AMD blade generates under 1,000 interrupts. There is no longer throttling, but the

Re: datapoints on 10G throughput with TCP ?

2011-12-06 Thread Luigi Rizzo
On Tue, Dec 06, 2011 at 07:40:21PM +0200, Daniel Kalchev wrote: I see significant difference between number of interrupts on the Intel and the AMD blades. When performing a test between the Intel and AMD blades, the Intel blade generates 20,000-35,000 interrupts, while the AMD blade generates

Re: datapoints on 10G throughput with TCP ?

2011-12-06 Thread Daniel O'Connor
On 07/12/2011, at 24:54, Daniel Kalchev wrote: It seems performance measurements are more dependent on the server (nuttcp -S) machine. We will have to rule out the interrupt storms first of course, any advice? You can control the storm threshold by setting the hw.intr_storm_threshold

datapoints on 10G throughput with TCP ?

2011-12-05 Thread Luigi Rizzo
Hi, I am trying to establish the baseline performance for 10G throughput over TCP, and would like to collect some data points. As a testing program I am using nuttcp from ports (as good as anything, I guess -- it is reasonably flexible, and if you use it in TCP with relatively large writes, the

Re: datapoints on 10G throughput with TCP ?

2011-12-05 Thread Daniel Kalchev
On Dec 5, 2011, at 9:27 PM, Luigi Rizzo wrote: - have two machines connected by a 10G link - on one run nuttcp -S - on the other one run nuttcp -t -T 5 -w 128 -v the.other.ip Any particular tuning of FreeBSD? Daniel

Re: datapoints on 10G throughput with TCP ?

2011-12-05 Thread Luigi Rizzo
On Mon, Dec 05, 2011 at 11:15:09PM +0200, Daniel Kalchev wrote: On Dec 5, 2011, at 9:27 PM, Luigi Rizzo wrote: - have two machines connected by a 10G link - on one run nuttcp -S - on the other one run nuttcp -t -T 5 -w 128 -v the.other.ip Any particular tuning of FreeBSD?

Re: datapoints on 10G throughput with TCP ?

2011-12-05 Thread Luigi Rizzo
On Mon, Dec 05, 2011 at 03:08:54PM -0800, Jack Vogel wrote: You can't get line rate with ixgbe, in what configuration/hardware? We surely do get line rate in validation here, but it's sensitive to your hardware and config. sources from HEAD as of a week or so, default parameter setting, 82599

Re: datapoints on 10G throughput with TCP ?

2011-12-05 Thread Jack Vogel
You can't get line rate with ixgbe, in what configuration/hardware? We surely do get line rate in validation here, but it's sensitive to your hardware and config. Jack On Mon, Dec 5, 2011 at 2:28 PM, Luigi Rizzo ri...@iet.unipi.it wrote: On Mon, Dec 05, 2011 at 11:15:09PM +0200, Daniel Kalchev