Re: [OpenAFS] iperf vs rxperf in high latency network

Jeffrey E Altman Thu, 08 Aug 2019 20:07:52 -0700

Hi Simon,

response inline ...

On 8/8/2019 2:54 PM, [email protected] wrote:
> To make sure I captured all the explanations correctly, please allow me to 
> summarize my understandings:
> 
> Flow control over a high-latency, potentially congested link is a fundamental 
> challenge that both TCP and UDP+Rx face. Both protocol and implementation can 
> pose a problem. The reason why I did not see an improvement when enlarging 
> the window size in rxperf is that firstly I chose too few data bytes to 
> transfer and secondly that OpenAFS's Rx has some implementation limitations 
> that become a limiting factor before the window size limit kicks in. They are 
> non-trivial to fix, as demonstrated in the 1.5.x throughput "hiccup". But 
> AuriStor fixed a significant amount of it in its proprietary Rx 
> re-implementation. 
> 
> One can borrow ideas and principles from algorithm research in TCP's flow
> control to improve Rx throughput. I am not an expert on this topic, but I
> wonder if the principles in Google's BBR algorithm can help further improve
> Rx throughput, and I wonder if there is anything that makes TCP fundamentally
> superior to UDP in implementing flow control.

There is nothing specific to TCP that makes it better than RX in
implementing flow control other than the fact that TCP has more than
thirty years of active research applied to it and RX does not.

AuriStor continues to invest in RX as we believe that RX can perform as
well as TCP while benefiting from its unique security binding
capabilities.  Reliance Memory's RRAM is targeted at IoT devices.  I
believe that RX should be the network transport of choice for IoT.

One of the requirements for implementing BBR is fine-grained, accurate
measurement of RTT, which is very hard to obtain from within a userland
implementation that relies upon an operating system's UDP sockets.
However, BBR principles can be applied to the Linux kernel's af_rxrpc
implementation and userland implementations built to use Intel's Data
Plane Development Kit (DPDK).  I would be happy to speak with you
off-list about either.
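
To illustrate why RTT accuracy matters, here is a minimal Python sketch of
a BBR-style path model: a windowed maximum of delivery-rate samples, a
windowed minimum of RTT samples, and their product as the bandwidth-delay
product (BDP).  This is a toy under stated assumptions, not the af_rxrpc,
AuriStor, or OpenAFS code; the class and function names, the window size,
and the fixed per-sample RTT error are all hypothetical.

```python
from collections import deque


class PathModel:
    """Toy BBR-style path model: windowed max bandwidth and windowed min RTT."""

    def __init__(self, window=10):
        # Keep only the most recent `window` samples (hypothetical size).
        self.bw_samples = deque(maxlen=window)
        self.rtt_samples = deque(maxlen=window)

    def on_ack(self, delivered_bytes, interval_s, rtt_s):
        """Record one delivery-rate sample and one RTT sample."""
        self.bw_samples.append(delivered_bytes / interval_s)
        self.rtt_samples.append(rtt_s)

    def bdp_bytes(self):
        """Estimated bottleneck rate times minimum observed RTT."""
        return max(self.bw_samples) * min(self.rtt_samples)


def bdp_with_rtt_error(true_rtt_s, rtt_error_s, rate_bytes_per_s, samples=5):
    """BDP estimate when every RTT sample carries the same measurement error.

    The constant error is a simplifying assumption standing in for the
    timestamping inaccuracy of a userland UDP socket.
    """
    model = PathModel()
    for _ in range(samples):
        # Each sample covers a 10 ms interval at the given rate.
        model.on_ack(rate_bytes_per_s * 0.01, 0.01, true_rtt_s + rtt_error_s)
    return model.bdp_bytes()


if __name__ == "__main__":
    exact = bdp_with_rtt_error(0.100, 0.000, 125_000_000)
    skewed = bdp_with_rtt_error(0.100, 0.005, 125_000_000)
    print(f"BDP with exact RTT:  {exact:,.0f} bytes")
    print(f"BDP with 5 ms error: {skewed:,.0f} bytes")
```

On a 100 ms path at 1 Gbit/s (125,000,000 bytes/s), a 5 ms error in every
RTT sample shifts the BDP estimate by roughly 625,000 bytes, which is data
the sender would keep in flight beyond what the path can hold.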

> When it comes to deployment strategy, there may be workarounds to the 
> high-latency limitation. Each of them, of course, has limitations. I can 
> probably use the technique mentioned below to leverage the TCP throughput in 
> RO volume synchronization, 
> https://lists.openafs.org/pipermail/openafs-info/2018-August/042502.html
> and wait until DPF becomes available in vos operations:
> https://openafs-workshop.org/2019/schedule/faster-wan-volume-operations-with-dpf/

As part of AuriStor's SBIR we were funded to research RX/TCP and
implement it if appropriate.  The accepted theory was that RX/TCP would
permit RX based applications to benefit from all of the research and
implementation improvements that TCP benefited from over the decades.
However, we quickly discovered that an RX application that implemented
both RX/TCP and current day RX/UDP could not ensure fairness for the
RX/UDP connections.  The RX/TCP flows would dominate the network at the
expense of RX/UDP flows because RX/UDP could not properly adjust to
network congestion levels.

Some people argued "good riddance, let RX/UDP die" but the reality is
that RX/UDP is where the existing user base is and it was unacceptable
to me that one class of users should be penalized in favor of another.
In order to permit TCP flows to be mixed with RX/UDP flows fairly,
RX/UDP needed fixing; and once RX/UDP was fixed there was little
justification for RX/TCP.

The same fairness issues apply to Sine Nomine Associates' DPF and prior
Out-of-Band TCP proposals.

> I can also adopt a small home volume, distributed subfolder volume strategy 
> that allows home volumes to move with relocated users across WAN, but keep 
> subdirectory volumes at their respective geographic location. Users can pick 
> a subdirectory that is closest to their current location to work with. When 
> combined with a version control system that uses TCP in syncing, project data 
> synching can be alleviated. 

AuriStor has several ideas that would be beneficial to your deployment
scenarios:

 1. floating master read/write replication.

 2. split horizon volume location service.

I would be happy to discuss both topics with you off-list.

> There is a commercial path that we can pursue with AuriStor or other vendors. 
> But I guess that is out of the scope of this mail list. 
> 
> Any other strategies that may help?
> 
> Thank you, Jeff!

You are welcome.

> Simon Guan

Jeffrey Altman

