Re: [net-next RFC V5 0/5] Multiqueue virtio-net

2012-07-09 Thread Rick Jones

On 07/08/2012 08:23 PM, Jason Wang wrote:

On 07/07/2012 12:23 AM, Rick Jones wrote:

On 07/06/2012 12:42 AM, Jason Wang wrote:
Which mechanism to address skew error?  The netperf manual describes
more than one:


This mechanism was missing from my test; I will add it to my test scripts.


http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#Using-Netperf-to-Measure-Aggregate-Performance


Personally, my preference these days is to use the demo mode method
of aggregate results as it can be rather faster than (ab)using the
confidence intervals mechanism, which I suspect may not really scale
all that well to large numbers of concurrent netperfs.


During my tests, the confidence interval was hard to achieve in the RR test
even when I pinned vhost/vcpus to processors, so I didn't use it.


When running aggregate netperfs, *something* has to be done to address 
the prospect of skew error.  Otherwise the results are suspect.
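For reference, the two skew-error mechanisms the netperf manual describes map
roughly to the following invocations; this is a minimal sketch assuming a
netperf built with --enable-demo, and the host address, run length, and
intervals are illustrative only:

# Confidence-interval method: iterate each data point (3 to 10 times) until
# netperf reports a 99% confidence interval of +/- 2.5% on the result.
netperf -H 192.168.1.2 -t TCP_RR -i 10,3 -I 99,5 -- -r 1,1

# Demo-mode method: emit interim results every 0.5 seconds and post-process
# only the interval during which all concurrent instances were running.
netperf -H 192.168.1.2 -t TCP_RR -l 60 -D 0.5 -- -r 1,1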


happy benchmarking,

rick jones


Re: [net-next RFC V5 0/5] Multiqueue virtio-net

2012-07-08 Thread Ronen Hod

On 07/05/2012 01:29 PM, Jason Wang wrote:

Hello All:

This series is an updated version of the multiqueue virtio-net driver based on
Krishna Kumar's work, letting virtio-net use multiple rx/tx queues for packet
reception and transmission. Please review and comment.

Test Environment:
- Intel(R) Xeon(R) CPU E5620 @ 2.40GHz, 8 cores, 2 NUMA nodes
- Two directly connected 82599 NICs

Test Summary:

- Highlights: huge improvements on TCP_RR test


Hi Jason,

It might be that the good TCP_RR results are due to the large number of
sessions (50-250). Can you also test with a small number of sessions?


- Lowlights: regression in small-packet transmission and higher cpu utilization
  than single queue; needs further optimization

Analysis of the performance result:

- I counted the number of packets sent/received during the test, and
   multiqueue shows much higher capability in terms of packets per second.

- For the tx regression, multiqueue sends about 1-2 times more packets
   than single queue, and the packet sizes are much smaller. I suspect TCP
   does less batching with multiqueue, so I hacked tcp_write_xmit() to force
   more batching; with that, multiqueue works as well as single queue for
   both small-packet transmission and throughput.


Could it be that since the CPUs are not busy, they are available for immediate
handling of the packets (little batching)? In such a scenario the CPU utilization
is not really interesting. What will happen on a busy machine?

Ronen.



- I didn't include the accelerated RFS support with virtio-net in this series
   as it still needs further shaping; anyone interested in it please see:
   http://www.mail-archive.com/kvm@vger.kernel.org/msg64111.html

Changes from V4:
- Add ability to negotiate the number of queues through control virtqueue
- Ethtool -{L|l} support and default the tx/rx queue number to 1 (see the sketch after this list)
- Expose the API to set irq affinity instead of irq itself
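As a rough illustration of the ethtool knobs mentioned above (a sketch only:
the interface name eth0 and the queue count are placeholders, and the exact
channel parameter the driver accepts may differ):

# Query the channel counts the device supports and currently uses
ethtool -l eth0
# Ask the driver for 2 queue pairs
ethtool -L eth0 combined 2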

Changes from V3:

- Rebase to the net-next
- Let queue 2 be the control virtqueue to obey the spec
- Provides irq affinity
- Choose txq based on processor id

References:

- V4: https://lkml.org/lkml/2012/6/25/120
- V3: http://lwn.net/Articles/467283/

Test result:

1) 1 VM, 2 vCPUs, 1q vs 2q (column suffix 1 = 1q, 2 = 2q), no pinning

- Guest to External Host TCP STREAM
sessions size throughput1 throughput2 % norm1 norm2 %
1 64 650.55 655.61 100% 24.88 24.86 99%
2 64 1446.81 1309.44 90% 30.49 27.16 89%
4 64 1430.52 1305.59 91% 30.78 26.80 87%
8 64 1450.89 1270.82 87% 30.83 25.95 84%
1 256 1699.45 1779.58 104% 56.75 59.08 104%
2 256 4902.71 3446.59 70% 98.53 62.78 63%
4 256 4803.76 2980.76 62% 97.44 54.68 56%
8 256 5128.88 3158.74 61% 104.68 58.61 55%
1 512 2837.98 2838.42 100% 89.76 90.41 100%
2 512 6742.59 5495.83 81% 155.03 99.07 63%
4 512 9193.70 5900.17 64% 202.84 106.44 52%
8 512 9287.51 7107.79 76% 202.18 129.08 63%
1 1024 4166.42 4224.98 101% 128.55 129.86 101%
2 1024 6196.94 7823.08 126% 181.80 168.81 92%
4 1024 9113.62 9219.49 101% 235.15 190.93 81%
8 1024 9324.25 9402.66 100% 239.10 179.99 75%
1 2048 7441.63 6534.04 87% 248.01 215.63 86%
2 2048 7024.61 7414.90 105% 225.79 219.62 97%
4 2048 8971.49 9269.00 103% 278.94 220.84 79%
8 2048 9314.20 9359.96 100% 268.36 192.23 71%
1 4096 8282.60 8990.08 108% 277.45 320.05 115%
2 4096 9194.80 9293.78 101% 317.02 248.76 78%
4 4096 9340.73 9313.19 99% 300.34 230.35 76%
8 4096 9148.23 9347.95 102% 279.49 199.43 71%
1 16384 8787.89 8766.31 99% 312.38 316.53 101%
2 16384 9306.35 9156.14 98% 319.53 279.83 87%
4 16384 9177.81 9307.50 101% 312.69 230.07 73%
8 16384 9035.82 9188.00 101% 298.32 199.17 66%
- TCP RR
sessions size throughput1 throughput2 % norm1 norm2 %
50 1 54695.41 84164.98 153% 1957.33 1901.31 97%
100 1 60141.88 88598.94 147% 2157.90 2000.45 92%
250 1 74763.56 135584.22 181% 2541.94 2628.59 103%
50 64 51628.38 82867.50 160% 1872.55 1812.16 96%
100 64 60367.73 84080.60 139% 2215.69 1867.69 84%
250 64 68502.70 124910.59 182% 2321.43 2495.76 107%
50 128 53477.08 77625.07 145% 1905.10 1870.99 98%
100 128 59697.56 74902.37 125% 2230.66 1751.03 78%
250 128 71248.74 133963.55 188% 2453.12 2711.72 110%
50 256 47663.86 67742.63 142% 1880.45 1735.30 92%
100 256 54051.84 68738.57 127% 2123.03 1778.59 83%
250 256 68250.06 124487.90 182% 2321.89 2598.60 111%
- External Host to Guest TCP STREAM
sessions size throughput1 throughput2 % norm1 norm2 %
1 64 847.71 864.83 102% 57.99 57.93 99%
2 64 1690.82 1544.94 91% 80.13 55.09 68%
4 64 3434.98 3455.53 100% 127.17 89.00 69%
8 64 5890.19 6557.35 111% 194.70 146.52 75%
1 256 2094.04 2109.14 100% 130.73 127.14 97%
2 256 5218.13 3731.97 71% 219.15 114.02 52%
4 256 6734.51 9213.47 136% 227.87 208.31 91%
8 256 6452.86 9402.78 145% 224.83 207.77 92%
1 512 3945.07 4203.68 106% 279.72 273.30 97%
2 512 7878.96 8122.55 103% 278.25 231.71 83%
4 512 7645.89 9402.13 122% 252.10 217.42 86%
8 512 6657.06 9403.71 141% 239.81 214.89 89%
1 1024 5729.06 5111.21 89% 289.38 303.09 104%
2 1024 8097.27 8159.67 100% 269.29 242.97 90%
4 1024 7778.93 

Re: [net-next RFC V5 0/5] Multiqueue virtio-net

2012-07-08 Thread Jason Wang

On 07/07/2012 12:23 AM, Rick Jones wrote:

On 07/06/2012 12:42 AM, Jason Wang wrote:

I'm not a TCP expert, but the changes look reasonable:
- we can do the full-sized TSO check in tcp_tso_should_defer() only for
Westwood, according to TCP Westwood
- run tcp_tso_should_defer() for tso_segs == 1 when TSO is enabled.


I'm sure Eric and David will weigh in on the TCP change.  My initial 
inclination would have been to say: well, if multiqueue is draining 
faster, that means ACKs come back faster, which means the race 
between more data being queued by netperf and the ACKs will go more to 
the ACKs, which means the segments being sent will be smaller - as 
TCP_NODELAY is not set, the Nagle algorithm is in force, which means 
once there is data outstanding on the connection, no more will be sent 
until either the outstanding data is ACKed, or there is an 
accumulation of > MSS worth of data to send.



Also, how are you combining the concurrent netperf results?  Are you
taking sums of what netperf reports, or are you gathering statistics
outside of netperf?



The throughput was just summed from the netperf results, as the netperf
manual suggests. The cpu utilization was measured by mpstat.
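(For reference, the mpstat side of that can be captured with something like
the following sketch; the 1-second interval and 60 samples are illustrative
and should cover the length of the netperf runs:)

# Record per-CPU utilization once a second, 60 samples, in the background
mpstat -P ALL 1 60 > mpstat.log &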


Which mechanism to address skew error?  The netperf manual describes 
more than one:


This mechanism was missing from my test; I will add it to my test scripts.


http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#Using-Netperf-to-Measure-Aggregate-Performance 



Personally, my preference these days is to use the demo mode method 
of aggregate results as it can be rather faster than (ab)using the 
confidence intervals mechanism, which I suspect may not really scale 
all that well to large numbers of concurrent netperfs.


During my tests, the confidence interval was hard to achieve in the RR test 
even when I pinned vhost/vcpus to processors, so I didn't use it.


I also tend to use the --enable-burst configure option to allow me to 
minimize the number of concurrent netperfs in the first place.  Set 
TCP_NODELAY (the test-specific -D option) and then have several 
transactions outstanding at one time (test-specific -b option with a 
number of additional in-flight transactions).
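A minimal sketch of such an invocation (assuming a netperf configured with
--enable-burst; the host address, run length, and burst count of 8 are
illustrative):

# TCP_RR with TCP_NODELAY set and 8 extra transactions kept in flight
netperf -H 192.168.1.2 -t TCP_RR -l 60 -- -D -b 8 -r 1,1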


This is expressed in the runemomniaggdemo.sh script:

http://www.netperf.org/svn/netperf2/trunk/doc/examples/runemomniaggdemo.sh 



which uses the find_max_burst.sh script:

http://www.netperf.org/svn/netperf2/trunk/doc/examples/find_max_burst.sh

to pick the burst size to use in the concurrent netperfs, the results 
of which can be post-processed with:


http://www.netperf.org/svn/netperf2/trunk/doc/examples/post_proc.py

The nice feature of using the demo mode mechanism is that, when it is 
coupled with systems with reasonably synchronized clocks (e.g. NTP), it 
can be used for many-to-many testing in addition to one-to-many 
testing (which cannot be dealt with by the confidence interval method 
of dealing with skew error).




Yes, it looks like demo mode is helpful. I will have a look at these scripts, 
thanks.

A single instance TCP_RR test would help confirm/refute any
non-trivial change in (effective) path length between the two cases.
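(A single-instance run of that sort might look like the following sketch;
the host address and run length are illustrative, and -c/-C report local and
remote CPU utilization so service demand can be compared as well:)

netperf -H 192.168.1.2 -t TCP_RR -l 60 -c -C -- -r 1,1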



Yes, I will test this, thanks.


Excellent.

happy benchmarking,

rick jones



Re: [net-next RFC V5 0/5] Multiqueue virtio-net

2012-07-08 Thread Jason Wang

On 07/08/2012 04:19 PM, Ronen Hod wrote:

On 07/05/2012 01:29 PM, Jason Wang wrote:

Hello All:

This series is an updated version of the multiqueue virtio-net driver 
based on Krishna Kumar's work, letting virtio-net use multiple rx/tx 
queues for packet reception and transmission. Please review and comment.

Test Environment:
- Intel(R) Xeon(R) CPU E5620 @ 2.40GHz, 8 cores, 2 NUMA nodes
- Two directly connected 82599 NICs

Test Summary:

- Highlights: huge improvements on TCP_RR test


Hi Jason,

It might be that the good TCP_RR results are due to the large number 
of sessions (50-250). Can you also test with a small number of sessions?


Sure, I will test them.


- Lowlights: regression in small-packet transmission and higher cpu 
utilization than single queue; needs further optimization

Analysis of the performance result:

- I counted the number of packets sent/received during the test, and
   multiqueue shows much higher capability in terms of packets per second.

- For the tx regression, multiqueue sends about 1-2 times more packets
   than single queue, and the packet sizes are much smaller. I suspect TCP
   does less batching with multiqueue, so I hacked tcp_write_xmit() to force
   more batching; with that, multiqueue works as well as single queue for
   both small-packet transmission and throughput.


Could it be that since the CPUs are not busy, they are available for 
immediate handling of the packets (little batching)? In such a scenario 
the CPU utilization is not really interesting. What will happen on a 
busy machine?




The regression happens when testing guest transmission in the stream test; 
the cpu utilization is 100% in this situation.

Ronen.



- I didn't include the accelerated RFS support with virtio-net in this 
series as it still needs further shaping; anyone interested in it 
please see:
   http://www.mail-archive.com/kvm@vger.kernel.org/msg64111.html

Changes from V4:
- Add ability to negotiate the number of queues through control 
virtqueue

- Ethtool -{L|l} support and default the tx/rx queue number to 1
- Expose the API to set irq affinity instead of irq itself

Changes from V3:

- Rebase to the net-next
- Let queue 2 be the control virtqueue to obey the spec
- Provides irq affinity
- Choose txq based on processor id

References:

- V4: https://lkml.org/lkml/2012/6/25/120
- V3: http://lwn.net/Articles/467283/

Test result:

1) 1 VM, 2 vCPUs, 1q vs 2q (column suffix 1 = 1q, 2 = 2q), no pinning

- Guest to External Host TCP STREAM
sessions size throughput1 throughput2 % norm1 norm2 %
1 64 650.55 655.61 100% 24.88 24.86 99%
2 64 1446.81 1309.44 90% 30.49 27.16 89%
4 64 1430.52 1305.59 91% 30.78 26.80 87%
8 64 1450.89 1270.82 87% 30.83 25.95 84%
1 256 1699.45 1779.58 104% 56.75 59.08 104%
2 256 4902.71 3446.59 70% 98.53 62.78 63%
4 256 4803.76 2980.76 62% 97.44 54.68 56%
8 256 5128.88 3158.74 61% 104.68 58.61 55%
1 512 2837.98 2838.42 100% 89.76 90.41 100%
2 512 6742.59 5495.83 81% 155.03 99.07 63%
4 512 9193.70 5900.17 64% 202.84 106.44 52%
8 512 9287.51 7107.79 76% 202.18 129.08 63%
1 1024 4166.42 4224.98 101% 128.55 129.86 101%
2 1024 6196.94 7823.08 126% 181.80 168.81 92%
4 1024 9113.62 9219.49 101% 235.15 190.93 81%
8 1024 9324.25 9402.66 100% 239.10 179.99 75%
1 2048 7441.63 6534.04 87% 248.01 215.63 86%
2 2048 7024.61 7414.90 105% 225.79 219.62 97%
4 2048 8971.49 9269.00 103% 278.94 220.84 79%
8 2048 9314.20 9359.96 100% 268.36 192.23 71%
1 4096 8282.60 8990.08 108% 277.45 320.05 115%
2 4096 9194.80 9293.78 101% 317.02 248.76 78%
4 4096 9340.73 9313.19 99% 300.34 230.35 76%
8 4096 9148.23 9347.95 102% 279.49 199.43 71%
1 16384 8787.89 8766.31 99% 312.38 316.53 101%
2 16384 9306.35 9156.14 98% 319.53 279.83 87%
4 16384 9177.81 9307.50 101% 312.69 230.07 73%
8 16384 9035.82 9188.00 101% 298.32 199.17 66%
- TCP RR
sessions size throughput1 throughput2 % norm1 norm2 %
50 1 54695.41 84164.98 153% 1957.33 1901.31 97%
100 1 60141.88 88598.94 147% 2157.90 2000.45 92%
250 1 74763.56 135584.22 181% 2541.94 2628.59 103%
50 64 51628.38 82867.50 160% 1872.55 1812.16 96%
100 64 60367.73 84080.60 139% 2215.69 1867.69 84%
250 64 68502.70 124910.59 182% 2321.43 2495.76 107%
50 128 53477.08 77625.07 145% 1905.10 1870.99 98%
100 128 59697.56 74902.37 125% 2230.66 1751.03 78%
250 128 71248.74 133963.55 188% 2453.12 2711.72 110%
50 256 47663.86 67742.63 142% 1880.45 1735.30 92%
100 256 54051.84 68738.57 127% 2123.03 1778.59 83%
250 256 68250.06 124487.90 182% 2321.89 2598.60 111%
- External Host to Guest TCP STREAM
sessions size throughput1 throughput2 % norm1 norm2 %
1 64 847.71 864.83 102% 57.99 57.93 99%
2 64 1690.82 1544.94 91% 80.13 55.09 68%
4 64 3434.98 3455.53 100% 127.17 89.00 69%
8 64 5890.19 6557.35 111% 194.70 146.52 75%
1 256 2094.04 2109.14 100% 130.73 127.14 97%
2 256 5218.13 3731.97 71% 219.15 114.02 52%
4 256 6734.51 9213.47 136% 227.87 208.31 91%
8 256 6452.86 9402.78 145% 224.83 207.77 92%
1 512 3945.07 4203.68 106% 279.72 273.30 97%
2 512 7878.96 8122.55 103% 278.25 231.71 

Re: [net-next RFC V5 0/5] Multiqueue virtio-net

2012-07-06 Thread Jason Wang

On 07/06/2012 01:45 AM, Rick Jones wrote:

On 07/05/2012 03:29 AM, Jason Wang wrote:



Test result:

1) 1 VM, 2 vCPUs, 1q vs 2q (column suffix 1 = 1q, 2 = 2q), no pinning

- Guest to External Host TCP STREAM
sessions size throughput1 throughput2 % norm1 norm2 %
1 64 650.55 655.61 100% 24.88 24.86 99%
2 64 1446.81 1309.44 90% 30.49 27.16 89%
4 64 1430.52 1305.59 91% 30.78 26.80 87%
8 64 1450.89 1270.82 87% 30.83 25.95 84%


Was the -D test-specific option used to set TCP_NODELAY?  I'm guessing 
from your description of how packet sizes were smaller with multiqueue, 
and your need to hack tcp_write_xmit(), that it wasn't, but since we don't 
have the specific netperf command lines (hint hint :) I wanted to make 
certain.

Hi Rick:

I didn't specify -D to disable Nagle. I also collected rx packet counts and 
average packet sizes:


Guest to External Host ( 2vcpu 1q vs 2q )
sessions size tput-sq tput-mq % norm-sq norm-mq % #tx-pkts-sq #tx-pkts-mq % avg-sz-sq avg-sz-mq %

1 64 668.85 671.13 100% 25.80 26.86 104% 629038 627126 99% 1395 1403 100%
2 64 1421.29 1345.40 94% 32.06 27.57 85% 1318498 1246721 94% 1413 1414 100%
4 64 1469.96 1365.42 92% 32.44 27.04 83% 1362542 1277848 93% 1414 1401 99%
8 64 1131.00 1361.58 120% 24.81 26.76 107% 1223700 1280970 104% 1395 1394 99%

1 256 1883.98 1649.87 87% 60.67 58.48 96% 1542775 1465836 95% 1592 1472 92%
2 256 4847.09 3539.74 73% 98.35 64.05 65% 2683346 3074046 114% 2323 1505 64%
4 256 5197.33 3283.48 63% 109.14 62.39 57% 1819814 2929486 160% 3636 1467 40%
8 256 5953.53 3359.22 56% 122.75 64.21 52% 906071 2924148 322% 8282 1502 18%

1 512 3019.70 2646.07 87% 93.89 86.78 92% 2003780 2256077 112% 1949 1532 78%
2 512 7455.83 5861.03 78% 173.79 104.43 60% 1200322 3577142 298% 7831 2114 26%
4 512 8962.28 7062.20 78% 213.08 127.82 59% 468142 2594812 554% 24030 3468 14%
8 512 7849.82 8523.85 108% 175.41 154.19 87% 304923 1662023 545% 38640 6479 16%


When multiqueue is enabled, it does achieve a higher packet rate, but with a 
much smaller packet size. It looks to me that multiqueue is faster and the 
guest TCP has less opportunity to build larger skbs to send, so lots of 
small packets have to be sent, which leads to many more exits and much more 
vhost work. One interesting thing is that if I run tcpdump in the host where 
the guest runs, I see an obvious throughput increase. To verify this 
assumption, I hacked tcp_write_xmit() with the following patch and set 
tcp_tso_win_divisor=1; with that, multiqueue can outperform or at least 
match the throughput of single queue, though it could introduce latency 
that I haven't measured.
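(For reference, the divisor mentioned here is the standard
net.ipv4.tcp_tso_win_divisor sysctl; a sketch of setting it in the guest:)

# With the divisor at 1, tcp_tso_should_defer() only sends immediately once
# a full congestion window's worth of data can go out; the default of 3
# sends as soon as a third of the window is available.
sysctl -w net.ipv4.tcp_tso_win_divisor=1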


I'm not a TCP expert, but the changes look reasonable:
- we can do the full-sized TSO check in tcp_tso_should_defer() only for 
Westwood, according to TCP Westwood

- run tcp_tso_should_defer() for tso_segs == 1 when TSO is enabled.

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index c465d3e..166a888 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1567,7 +1567,7 @@ static bool tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb)
 
         in_flight = tcp_packets_in_flight(tp);
 
-        BUG_ON(tcp_skb_pcount(skb) <= 1 || (tp->snd_cwnd <= in_flight));
+        BUG_ON(tp->snd_cwnd <= in_flight);
 
         send_win = tcp_wnd_end(tp) - TCP_SKB_CB(skb)->seq;
 
@@ -1576,9 +1576,11 @@ static bool tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb)
 
         limit = min(send_win, cong_win);
 
+#if 0
         /* If a full-sized TSO skb can be sent, do it. */
         if (limit >= sk->sk_gso_max_size)
                 goto send_now;
+#endif
 
         /* Middle in queue won't get any more data, full sendable already? */
         if ((skb != tcp_write_queue_tail(sk)) && (limit >= skb->len))
@@ -1795,10 +1797,9 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
                                              (tcp_skb_is_last(sk, skb) ?
                                               nonagle : TCP_NAGLE_PUSH))))
                                 break;
-                } else {
-                        if (!push_one && tcp_tso_should_defer(sk, skb))
-                                break;
                 }
+                if (!push_one && tcp_tso_should_defer(sk, skb))
+                        break;
 
                 limit = mss_now;
                 if (tso_segs > 1 && !tcp_urg_mode(tp))






Instead of calling them throughput1 and throughput2, it might be more 
clear in future to identify them as singlequeue and multiqueue.




Sure.
Also, how are you combining the concurrent netperf results?  Are you 
taking sums of what netperf reports, or are you gathering statistics 
outside of netperf?




The throughput was just summed from the netperf results, as the netperf 
manual suggests. The cpu utilization was measured by mpstat.

- TCP RR
sessions size throughput1 throughput2 % norm1 norm2 %
50 1 54695.41 84164.98 153% 1957.33 1901.31 97%


A single instance TCP_RR test would help confirm/refute any 
non-trivial change in 

Re: [net-next RFC V5 0/5] Multiqueue virtio-net

2012-07-06 Thread Rick Jones

On 07/06/2012 12:42 AM, Jason Wang wrote:

I'm not a TCP expert, but the changes look reasonable:
- we can do the full-sized TSO check in tcp_tso_should_defer() only for
Westwood, according to TCP Westwood
- run tcp_tso_should_defer() for tso_segs == 1 when TSO is enabled.


I'm sure Eric and David will weigh in on the TCP change.  My initial 
inclination would have been to say: well, if multiqueue is draining 
faster, that means ACKs come back faster, which means the race between 
more data being queued by netperf and the ACKs will go more to the ACKs, 
which means the segments being sent will be smaller - as TCP_NODELAY is 
not set, the Nagle algorithm is in force, which means once there is data 
outstanding on the connection, no more will be sent until either the 
outstanding data is ACKed, or there is an accumulation of > MSS worth of 
data to send.



Also, how are you combining the concurrent netperf results?  Are you
taking sums of what netperf reports, or are you gathering statistics
outside of netperf?



The throughput was just summed from the netperf results, as the netperf
manual suggests. The cpu utilization was measured by mpstat.


Which mechanism to address skew error?  The netperf manual describes 
more than one:


http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#Using-Netperf-to-Measure-Aggregate-Performance

Personally, my preference these days is to use the demo mode method of 
aggregate results as it can be rather faster than (ab)using the 
confidence intervals mechanism, which I suspect may not really scale all 
that well to large numbers of concurrent netperfs.


I also tend to use the --enable-burst configure option to allow me to 
minimize the number of concurrent netperfs in the first place.  Set 
TCP_NODELAY (the test-specific -D option) and then have several 
transactions outstanding at one time (test-specific -b option with a 
number of additional in-flight transactions).


This is expressed in the runemomniaggdemo.sh script:

http://www.netperf.org/svn/netperf2/trunk/doc/examples/runemomniaggdemo.sh

which uses the find_max_burst.sh script:

http://www.netperf.org/svn/netperf2/trunk/doc/examples/find_max_burst.sh

to pick the burst size to use in the concurrent netperfs, the results of 
which can be post-processed with:


http://www.netperf.org/svn/netperf2/trunk/doc/examples/post_proc.py

The nice feature of using the demo mode mechanism is that, when it is 
coupled with systems with reasonably synchronized clocks (e.g. NTP), it can 
be used for many-to-many testing in addition to one-to-many testing 
(which cannot be dealt with by the confidence interval method of dealing 
with skew error).



A single instance TCP_RR test would help confirm/refute any
non-trivial change in (effective) path length between the two cases.



Yes, I will test this, thanks.


Excellent.

happy benchmarking,

rick jones
