Hi Hans,
I would like to continue this discussion with a different change. The change is shown below, and I have also attached it as "change.patch" against the FreeBSD HEAD code line.
diff --git a/sys/netinet/tcp_output.c b/sys/netinet/tcp_output.c
index 2043fc9..fa124f1 100644
--- a/sys/netinet/tcp_output.c
+++ b/sys/netinet/tcp_output.c
@@ -938,25 +938,16 @@ send:
* fractional unless the send sockbuf can be
* emptied:
*/
- max_len = (tp->t_maxseg - optlen);
- if ((off + len) < sbavail(&so->so_snd)) {
+ max_len = (tp->t_maxopd - optlen);
+ if (len > (max_len << 1)) {
moff = len % max_len;
if (moff != 0) {
len -= moff;
sendalot = 1;
}
}
-
- /*
- * In case there are too many small fragments
- * don't use TSO:
- */
- if (len <= max_len) {
- len = max_len;
- sendalot = 1;
- tso = 0;
- }
-
+ KASSERT(len > max_len,
+ ("[%s:%d]: len <= max_len", __func__, __LINE__));
/*
* Send the FIN in a separate segment
* after the bulk sending is done.
I think this change avoids the extra loop iterations that send single-MSS
packets. It should also save some CPU cycles, because it reduces the number
of software sends and pushes more data to the offloaded (TSO) sends.
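To illustrate the difference, here is a minimal user-space sketch of the two length-trimming rules from the diff. It is a model under stated assumptions, not the kernel code: the struct and the function names `trim_before` and `trim_after` are hypothetical, `max_len` is passed in directly (the patch derives it from `t_maxopd` instead of `t_maxseg`), and `assert` stands in for `KASSERT`.

```c
#include <assert.h>

/* Hypothetical model of the outcome of the TSO length decision. */
struct tso_len_result {
	long len;	/* bytes handed to this send */
	int sendalot;	/* nonzero: loop again for the remainder */
	int tso;	/* nonzero: this send is offloaded */
};

/*
 * Rule before the patch: trim to a multiple of max_len whenever the
 * send buffer will not be emptied, then fall back to a software send
 * of one full segment if at most one segment is left.
 */
static struct tso_len_result
trim_before(long len, long max_len, long off, long avail)
{
	struct tso_len_result r = { len, 0, 1 };

	if (off + len < avail) {
		long moff = r.len % max_len;

		if (moff != 0) {
			r.len -= moff;
			r.sendalot = 1;
		}
	}
	if (r.len <= max_len) {
		r.len = max_len;
		r.sendalot = 1;
		r.tso = 0;
	}
	return (r);
}

/*
 * Rule after the patch: trim only when more than two segments are
 * pending, so the result always stays above one segment.
 * Precondition (as on the TSO path): len > max_len.
 */
static struct tso_len_result
trim_after(long len, long max_len)
{
	struct tso_len_result r = { len, 0, 1 };

	if (len > (max_len << 1)) {
		long moff = len % max_len;

		if (moff != 0) {
			r.len -= moff;
			r.sendalot = 1;
		}
	}
	assert(r.len > max_len);	/* stands in for the KASSERT */
	return (r);
}
```

For example, with max_len = 1448 and 2000 bytes pending while more data is still queued, `trim_before(2000, 1448, 0, 100000)` ends up as a 1448-byte software send with another loop iteration, whereas `trim_after(2000, 1448)` keeps all 2000 bytes in one offloaded send.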
Here is my test. The iperf command I chose pushes 100 MBytes of data to the
wire, with the default TCP sendspace set to 1 MB and recvspace set to 2 MB.
I tested TCP connection performance on a pair of 10 Gbps FreeBSD 10.2 nodes
(s1 and r1) with a switch in between. Both nodes have TSO and delayed ACK
enabled.
root@s1:~ # ping -c 3 r1
PING r1-link1 (10.1.2.3): 56 data bytes
64 bytes from 10.1.2.3: icmp_seq=0 ttl=64 time=0.045 ms
64 bytes from 10.1.2.3: icmp_seq=1 ttl=64 time=0.037 ms
64 bytes from 10.1.2.3: icmp_seq=2 ttl=64 time=0.038 ms
--- r1-link1 ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.037/0.040/0.045/0.004 ms
1M snd buffer/2M rcv buffer
sysctl -w net.inet.tcp.hostcache.expire=1
sysctl -w net.inet.tcp.sendspace=1048576
sysctl -w net.inet.tcp.recvspace=2097152
iperf -s <== iperf command@receiver
iperf -c r1 -m -n 100M <== iperf command@sender
root@s1:~ # iperf -c r1 -m -n 100M
------------------------------------------------------------
Client connecting to r1, TCP port 5001
TCP window size: 1.00 MByte (default)
------------------------------------------------------------
[ 3] local 10.1.2.2 port 22491 connected with 10.1.2.3 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 0.3 sec 100 MBytes 2.69 Gbits/sec
[ 3] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
root@r1:~ # iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 2.00 MByte (default)
------------------------------------------------------------
[ 4] local 10.1.2.3 port 5001 connected with 10.1.2.2 port 22491
[ ID] Interval Transfer Bandwidth
[ 4] 0.0- 0.3 sec 100 MBytes 2.62 Gbits/sec
Each test sent 100 MBytes of data, and I collected packet traces on both
nodes with tcpdump. I ran the test twice to confirm that the result is
reproducible.
In the trace files from both nodes before my code change, I see a lot of
single-MSS packets. See the attached trace files in "before_change.zip".
For example, in one sender trace file I see 43480 single-MSS packets
(tcp.len==1448) out of 57005 packets that contain data (tcp.len > 0).
That's about 76.3%.
After the change, I ran the same iperf test and gathered trace files again.
This time I did not find many single-MSS packets. See the attached trace
files in "after_change.zip". For example, in one sender trace file I see
zero single-MSS packets (tcp.len==1448) out of 35729 data packets
(tcp.len > 0).
Comparing the receiver traces, I did not see significantly more fractional
packets received after the change.
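The percentages can be reproduced from the packet counts quoted above with a few lines of C. This is only arithmetic on those counts; the function names `single_mss_share` and `packet_count_reduction` are hypothetical helpers, not part of any tool used in the test.

```c
#include <assert.h>

/* Percentage of data packets that carry exactly one MSS of payload. */
static double
single_mss_share(long single_mss_pkts, long data_pkts)
{
	assert(data_pkts > 0);
	return (100.0 * (double)single_mss_pkts / (double)data_pkts);
}

/* Percentage by which the number of data packets dropped. */
static double
packet_count_reduction(long pkts_before, long pkts_after)
{
	assert(pkts_before > 0);
	return (100.0 * (double)(pkts_before - pkts_after) /
	    (double)pkts_before);
}
```

With the sender-trace numbers above, `single_mss_share(43480, 57005)` is about 76.3 before the change and `single_mss_share(0, 35729)` is 0 after it, while `packet_count_reduction(57005, 35729)` is about 37.3, meaning the sender trace holds roughly a third fewer data packets for the same 100 MBytes.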
I also ran tests with netperf, although I did not reach the 95% confidence
level for every snd/rcv buffer size. Attached are my netperf results for
different snd/rcv buffer sizes before and after the change
(netperf_before_change.txt and netperf_after_change.txt), which also look
good.
The netperf command used:
netperf -H s1 -t TCP_STREAM -C -c -l 400 -i 10,3 -I 95,10 -- -s ${LocalSndBuf} -S ${RemoteSndBuf}
Thanks,
--Cheng Cui
NetApp Scale Out Networking
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to r1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf. : histogram : interval : dirty data : demo
Runs from Thu Apr 7 2016 (all times MDT):

Recv     Send     Send                          Utilization      Service Demand
Socket   Socket   Message  Elapsed              Send    Recv     Send    Recv
Size     Size     Size     Time     Throughput  local   remote   local   remote
bytes    bytes    bytes    secs.    10^6bits/s  % C     % C      us/KB   us/KB    Started   Finished
65536    32768    32768    400.01   4670.31     4.87    5.48     2.747   3.091    14:42:21  15:49:02
131072   65536    65536    400.01   4742.45     4.99    5.53     2.759   3.064    15:49:12  16:55:52
262144   131072   131072   400.02   4881.49     5.42    5.53     2.915   2.981    16:56:02  18:02:42
524288   262144   262144   400.02   3734.29     5.01    5.03     3.678   3.671    18:02:52  19:09:33
1048576  524288   524288   400.02   2891.10     4.58    4.64     4.155   4.210    19:09:43  19:43:03
2097152  1048576  1048576  400.04   2984.77     4.78    4.80     4.201   4.221    19:43:13  20:49:54
4194304  2097152  2097152  400.34   2908.97     4.66    4.68     4.196   4.213    20:50:05  21:16:47
8388608  4194304  4194304  400.28   2922.54     3.82    4.69     3.431   4.205    21:16:57  21:57:01

For the following runs netperf printed:
!!! WARNING !!! Desired confidence was not achieved within the specified iterations.
!!! This implies that there was variability in the test environment that
!!! must be investigated before going further.
!!! Confidence intervals:

Recv Socket Size   Throughput   Local CPU util   Remote CPU util
65536              10.783%      6.277%           1.153%
131072             7.347%       11.658%          1.524%
262144             10.212%      12.850%          0.874%
524288             36.686%      12.641%          12.322%
2097152            11.692%      10.800%          7.792%
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to s1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf. : histogram : interval : dirty data : demo
Runs from Sat Apr 9 2016 (all times MDT):

Recv     Send     Send                          Utilization      Service Demand
Socket   Socket   Message  Elapsed              Send    Recv     Send    Recv
Size     Size     Size     Time     Throughput  local   remote   local   remote
bytes    bytes    bytes    secs.    10^6bits/s  % C     % C      us/KB   us/KB    Started   Finished
65536    32768    32768    400.01   4537.24     5.17    5.28     3.039   3.085    09:56:28  11:03:08
131072   65536    65536    400.02   4628.43     5.70    5.42     3.231   3.071    11:03:19  11:49:59
262144   131072   131072   400.01   4551.61     5.30    5.46     3.057   3.148    11:50:09  12:56:49
524288   262144   262144   400.02   4137.23     5.51    5.19     3.587   3.356    12:56:59  14:03:40
1048576  524288   524288   400.02   2952.14     4.75    4.73     4.216   4.196    14:03:50  14:43:50
2097152  1048576  1048576  400.03   3001.44     4.94    4.84     4.310   4.231    14:44:01  15:44:02
4194304  2097152  2097152  400.34   2948.57     4.79    4.79     4.259   4.262    15:44:12  16:10:54
8388608  4194304  4194304  400.34   2940.28     4.28    4.70     3.811   4.194    16:11:04  17:04:30

For the following runs netperf printed:
!!! WARNING !!! Desired confidence was not achieved within the specified iterations.
!!! This implies that there was variability in the test environment that
!!! must be investigated before going further.
!!! Confidence intervals:

Recv Socket Size   Throughput   Local CPU util   Remote CPU util
65536              21.820%      10.174%          14.250%
262144             3.551%       10.216%          2.961%
524288             25.918%      14.030%          11.608%
