Hello, can someone kindly explain again the possible reasons why Rx is so painfully slow over a high-latency (~230 ms) link?
From a user perspective, I wonder if there is any *quick Rx code hack* that could help close the throughput gap (iperf2 = 30 Mb/s vs. rxperf = 800 Kb/s) in the following specific case. We are considering deploying two hosts ~230 ms RTT apart as server and client, and I used iperf2 and rxperf to measure throughput between them. No other connection competes with the test, so this is different from a low-latency, thread- or UDP-buffer-exhaustion scenario.

iperf2's UDP test shows a bandwidth of ~30 Mb/s without packet loss, though some datagrams arrive re-ordered at the receiver. Below 5 Mb/s the receiver sees no re-ordering; above 30 Mb/s it starts seeing packet loss. The result is consistent across multiple runs within 24 hours. The UDP buffer size used by iperf is 208 KB, and the write length is set to 1300 bytes (-l 1300), below the path MTU.

Interestingly, a quick skim through the iperf2 source code suggests that an iperf sender does not wait for the receiver's acks. It simply keeps calling write(mSettings->mSock, mBuf, mSettings->mBufLen) and times the writes to derive the throughput figure. Only at the end does it check whether the receiver complains about packet loss.

rxperf, on the other hand, gets only ~800 Kb/s. What makes it worse is that the result does not seem to depend on the window size (-W 32~255) or the UDP buffer size (-u, default~512*1024). I also recompiled rxperf with #define RXPERF_BUFSIZE (1024 * 1024 * 64) in place of the original (512 * 1024), but saw no throughput improvement from going above -u 512K. Occasionally some packets are retransmitted. If I reduce -W or -u to very small values, I do see a penalty. The kernel's rmem_max and wmem_max are set to 32M on both hosts for the socket buffer size. The Rx max MTU is set with "-m 1344" (i.e., 1400-byte path MTU - 20 IP header - 8 UDP header - 28 Rx header). rxperf is compiled from the 1.8.3 source code.

I noticed some earlier discussions at:
https://lists.openafs.org/pipermail/openafs-info/2010-December/035143.html
https://lists.openafs.org/pipermail/openafs-info/2013-June/039661.html
and most recently at
https://openafs-workshop.org/2019/schedule/faster-wan-volume-operations-with-dpf/
(Very nice work. We look forward to the code being committed and merged to master.)

The theory goes: with a 32-packet recv/send window (Ack Count), a packet size of 1344 bytes, and RTT = 230 ms, I should expect a theoretical upper bound of 32 x 8 x 1344 / 0.23 / 1000000 = 1.5 Mb/s. If the Rx window size implemented in AFS (32) really is the limiting factor, then throughput should increase when I raise the window size (-W) above 32 and configure a sufficiently large kernel socket buffer. Neither prediction held: at -W 32 I measure only ~0.8 Mb/s, well under the ~1.5 Mb/s bound, and raising -W to 255 does not increase throughput at all.
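To make the window arithmetic concrete, here it is as a small stand-alone C program. This is my own sketch of the back-of-the-envelope formula above (window x packet size x 8 / RTT), not code taken from Rx or rxperf:

#include <stdio.h>

/* Window-limited upper bound: at most one full window of packets can be
 * in flight per round trip, i.e. window * packet_size * 8 / RTT bits/s. */
int main(void)
{
    const double rtt = 0.23;      /* seconds, the measured ~230 ms RTT */
    const double psize = 1344;    /* bytes per packet, from "-m 1344" */
    const int windows[] = { 24, 32, 255 };
    int i;

    for (i = 0; i < 3; i++)
        printf("window %3d -> %4.1f Mbit/s upper bound\n",
               windows[i], windows[i] * psize * 8 / rtt / 1e6);
    return 0;
}

This prints roughly 1.1, 1.5, and 11.9 Mbit/s for windows of 24, 32, and 255 packets. So if -W 255 were honored end to end, the bound would be close to 12 Mb/s, yet the measurements below stay flat around 0.8 Mb/s.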
I wonder if some light could be shed on:

1. What else may be the limiting factor in my case?
2. Is there a quick way to increase the recv/send window from 32 to 255 in the Rx code without breaking other parts of AFS?
3. Is there any quick (maybe dirty) way to leverage the iperf2 observation and relax the wait for acks, as long as the received packets arrive in order and without loss? (That is, get me up to 5 Mb/s...) A minimal sketch of this blind-send pattern is appended after the test logs below.

Thank you in advance.

==========================
Ximeng (Simon) Guan, Ph.D.
Director of Device Technology
Reliance Memory
==========================

iperf2 test
===========
*Server Side
[xmsguan@afsdb1 ~]$ iperf -u -s
------------------------------------------------------------
Server listening on UDP port 5001
Receiving 1470 byte datagrams
UDP buffer size: 208 KByte (default)
------------------------------------------------------------

*Client Side
[xmsguan@afsdb3 ~]$ iperf -u -b 30M -l 1300 -i 1 -t 3 -e -c afsdb1
------------------------------------------------------------
Client connecting to *, UDP port 5001 with pid 6381
Sending 1300 byte datagrams, IPG target: 330.61 us (kalman adjust)
UDP buffer size: 208 KByte (default)
------------------------------------------------------------
[ 3] local * port 53558 connected with * port 5001
[ ID] Interval       Transfer     Bandwidth       Write/Err  PPS
[ 3] 0.00-1.00 sec  3.75 MBytes  31.5 Mbits/sec  3026/0  3026 pps
[ 3] 1.00-2.00 sec  3.75 MBytes  31.5 Mbits/sec  3025/0  3025 pps
[ 3] 0.00-3.00 sec  11.3 MBytes  31.5 Mbits/sec  9075/0  3024 pps
[ 3] Sent 9075 datagrams
[ 3] Server Report:
[ 3] 0.0- 3.0 sec  11.3 MBytes  31.2 Mbits/sec  0.852 ms  0/ 9075 (0%)
[ 3] 0.00-3.03 sec  3735 datagrams received out-of-order
[xmsguan@afsdb3 ~]$

rxperf test
===========
24-packet window:

*Server side
[xmsguan@afsdb1 ~]$ ./rxperf server -u 33554432 -W 24

*Client side
./rxperf client -c send -b 1024000 -m 1344 -u 33554432 -W 24 -s afsdb1 -T 3 -D
SEND: threads 1, times 3, bytes 1024000: 32509 msec [756 kbit/s]
rx stats: free packets 179, allocs 2453, alloc-failures(rcv 0/0,send 0/0,ack 0)
  greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers 0, selects 0, sendSelects 0
  packets read: data 3 ack 1680 busy 0 abort 0 ackall 0 challenge 0 response 0 debug 0 params 0 unused 0 unused 0 unused 0 version 0
  other read counters: data 3, ack 1680, dup 0 spurious 0 dally 0
  packets sent: data 2364 ack 4 busy 0 abort 0 ackall 0 challenge 0 response 0 debug 0 params 0 unused 0 unused 0 unused 0 version 0
  other send counters: ack 4, data 2337 (not resends), resends 27, pushed 0, acked&ignored 4715 (these should be small)
  sendFailed 0, fatalErrors 0
  Average rtt is 0.233, with 2024 samples
  Minimum rtt is 0.225, maximum is 0.332
  0 server connections, 1 client connections, 1 peer structs, 1 call structs, 0 free call structs
Peer a0a0a07.7009.
  Rtt 1884, total sent 2364, resent 27
  Packet size 1344
[xmsguan@afsdb3 ~]$

32-packet window:

*Server side
[xmsguan@afsdb1 ~]$ ./rxperf server -u 33554432 -W 32

*Client side
[xmsguan@afsdb3 ~]$ ./rxperf client -c send -b 1024000 -m 1344 -u 33554432 -W 32 -s afsdb1 -T 3 -D
SEND: threads 1, times 3, bytes 1024000: 29755 msec [825.9 kbit/s]
rx stats: free packets 179, allocs 2453, alloc-failures(rcv 0/0,send 0/0,ack 0)
  greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers 0, selects 0, sendSelects 0
  packets read: data 3 ack 1680 busy 0 abort 0 ackall 0 challenge 0 response 0 debug 0 params 0 unused 0 unused 0 unused 0 version 0
  other read counters: data 3, ack 1680, dup 0 spurious 0 dally 0
  packets sent: data 2365 ack 4 busy 0 abort 0 ackall 0 challenge 0 response 0 debug 0 params 0 unused 0 unused 0 unused 0 version 0
  other send counters: ack 4, data 2337 (not resends), resends 28, pushed 0, acked&ignored 4955 (these should be small)
  sendFailed 0, fatalErrors 0
  Average rtt is 0.234, with 2081 samples
  Minimum rtt is 0.224, maximum is 0.333
  0 server connections, 1 client connections, 1 peer structs, 1 call structs, 0 free call structs
Peer a0a0a07.7009.
  Rtt 1840, total sent 2365, resent 28
  Packet size 1344
[xmsguan@afsdb3 ~]$

255-packet window:

*Server side
[xmsguan@afsdb1 ~]$ ./rxperf server -u 33554432 -W 255

*Client side
[xmsguan@afsdb3 ~]$ ./rxperf client -c send -b 1024000 -m 1344 -u 33554432 -W 255 -s afsdb1 -T 3 -D
SEND: threads 1, times 3, bytes 1024000: 32508 msec [756 kbit/s]
rx stats: free packets 638, allocs 2393, alloc-failures(rcv 0/0,send 0/0,ack 0)
  greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers 0, selects 0, sendSelects 0
  packets read: data 3 ack 1670 busy 0 abort 0 ackall 0 challenge 0 response 0 debug 0 params 0 unused 0 unused 0 unused 0 version 0
  other read counters: data 3, ack 1670, dup 0 spurious 0 dally 0
  packets sent: data 2404 ack 4 busy 0 abort 0 ackall 0 challenge 0 response 0 debug 0 params 0 unused 0 unused 0 unused 0 version 0
  other send counters: ack 4, data 2337 (not resends), resends 67, pushed 0, acked&ignored 3969 (these should be small)
  sendFailed 0, fatalErrors 0
  Average rtt is 0.232, with 2054 samples
  Minimum rtt is 0.223, maximum is 0.336
  0 server connections, 1 client connections, 1 peer structs, 1 call structs, 0 free call structs
Peer a0a0a07.7009.
  Rtt 1846, total sent 2404, resent 67
  Packet size 1344
[xmsguan@afsdb3 ~]$
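blind-send sketch (re question 3)
===========
For reference, here is the send pattern I believe the iperf2 UDP client follows, reduced to a minimal stand-alone C program. This is my own illustration under that assumption, not code taken from iperf2 or Rx, and it omits iperf2's pacing of writes toward the -b target (the "IPG target" line in the log above), so as written it sends as fast as the socket accepts. The destination address is a placeholder.

#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    struct sockaddr_in dst;
    char buf[1300] = { 0 };          /* matches the "-l 1300" write length */
    long sent = 0;
    time_t start;
    int sock = socket(AF_INET, SOCK_DGRAM, 0);

    if (sock < 0)
        return 1;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(5001);                       /* iperf2's default port */
    inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);   /* placeholder address */

    start = time(NULL);
    while (time(NULL) - start < 3) {                  /* matches "-t 3" */
        /* No ack is awaited between writes; loss and re-ordering are
         * only reported by the receiver afterwards. */
        if (sendto(sock, buf, sizeof(buf), 0,
                   (struct sockaddr *)&dst, sizeof(dst)) == (ssize_t)sizeof(buf))
            sent++;
    }
    printf("sent %ld datagrams in ~3 s\n", sent);
    close(sock);
    return 0;
}

The point is simply that nothing in the loop waits a round trip; the receiver alone reports loss and re-ordering at the end, which is what lets iperf2 fill the 230 ms pipe.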
