On 10/26/2012 5:03 PM, Andrew Deason wrote:
> On Wed, 17 Oct 2012 10:45:05 +0200
> To provide a sense of ordering... rxgk standards work will definitely
> precede tcp oob, though rxgk implementation may or may not. After rxgk,
> some smaller/simpler standards docs may go through, but tcp oob may be
> the next 'bigger' one. But the ordering here is unsure; Mike Meffie
> should be clarifying some specifics of the new standards process within
> the next week. I expect that around that time is when we'll discuss the
> priority of which documents to look at; some people may disagree with my
> guessed priorities.
>
> Note that that is my thinking and my guesses for code being in the tree,
> not for a stable release. Release scheduling is such a question mark for
> me right now I can't even begin to guess for that.

I have significant concerns about the design of TCP OOB as it was described at EAKC2012.

http://conferences.inf.ed.ac.uk/eakc2012/slides/201210_eakc_oob.pdf

The argument in favor of a TCP based solution is that RX cannot go fast enough. Andrew's claim is that RX cannot use a window size greater than 43.75K because of the 32 packet window limitation in 1.6. The fact is that this limitation is not a protocol limitation but an implementation limitation. Andrew points to Simon Wilkinson's past talks on RX as a justification for this restriction. Of course, Simon's talks also provide the road map for how to remove the bottlenecks and permit the full window size of 256 packets to be used without performance degradation. When combined with Jumbograms, a window can support up to 2MB of application data per call without any changes to the protocol. In fact, Simon's "AFS Performance" talk, slide 28

http://conferences.inf.ed.ac.uk/eakc2012/slides/AFS_Performance.pdf

showed some of the results of his hard work, with RX UDP throughput quadrupled since 1.4 and more than doubled since 1.6. There is still room for significant improvement beyond the numbers presented at the conference.

As Andrew indicated in his talk, TCP OOB was a compromise driven by the fact that spending the resources to fix the RX implementation was deemed to be too time consuming and expensive. TCP OOB was designed to be a rapid development approach to obtaining higher throughput. Andrew achieved some impressive numbers in a closed environment with no requirement for wire privacy on the OOB channel. However, when we begin to consider standardization and inclusion of an OOB mechanism in OpenAFS, we must provide for wire authentication and privacy at least as strong as that provided by the RX UDP connection, and we must consider the impact that TCP socket allocation will have on the scalability of the file server and on fairness in delivering data to all clients.

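As a rough illustration of the window figures quoted above, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope sketch, not code from any RX implementation. The per-packet sizes are assumptions back-derived from the numbers in this message (43.75K / 32 packets gives 1400 bytes; 2MB / 256 packets gives 8192 bytes), the round-trip times are hypothetical, and the throughput bound is the generic "one window per round trip" limit that applies to any windowed protocol.

```python
# Assumed payload sizes, back-derived from the figures quoted in the message.
PAYLOAD_BYTES = 1400      # assumption: 43.75K / 32 packets
JUMBOGRAM_BYTES = 8192    # assumption: 2MB / 256 packets


def window_bytes(packets, bytes_per_packet):
    """Application data that can be in flight for one call."""
    return packets * bytes_per_packet


def max_throughput(window, rtt_seconds):
    """Upper bound on per-call throughput (bytes/second).

    A windowed sender can have at most one full window outstanding
    per round trip, so throughput cannot exceed window / RTT.
    """
    return window / rtt_seconds


window_32 = window_bytes(32, PAYLOAD_BYTES)        # 32 packet window
window_256 = window_bytes(256, JUMBOGRAM_BYTES)    # 256 packets with jumbograms

if __name__ == "__main__":
    for rtt_ms in (1, 10, 100):  # hypothetical round-trip times
        rtt = rtt_ms / 1000.0
        print(
            f"RTT {rtt_ms:>3} ms: "
            f"32 pkts <= {max_throughput(window_32, rtt) / 1e6:.2f} MB/s, "
            f"256 jumbo pkts <= {max_throughput(window_256, rtt) / 1e6:.2f} MB/s"
        )
```

Under these assumptions the larger window raises the per-call ceiling by the ratio of the two window sizes at any given round-trip time, which is why the window limit matters most on high-latency paths.
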
Between AFS2 and AFS3 a decision was made to switch to UDP because the file servers could not maintain enough open TCP connections to serve all of the clients. While we might believe the days of TCP socket limits are behind us, the number of file descriptors on a system does not scale to the number of clients that may be actively connecting to an AFS file server on public Internet deployments. Adding a TCP connection per active FetchData / StoreData operation can severely restrict the number of clients a file server can communicate with simultaneously.

It might be interesting to the community to note that Microsoft, as part of Server 2012 and Windows 8, has begun rolling out UDP based equivalents to many of the Windows protocols that are frequently used over the public Internet. Some of these new protocols have been back ported to Windows 7 SP1 and are being rolled out via Windows Update starting today. The reason for these new protocol implementations is that research has shown that UDP based protocols can be faster than TCP and perform significantly better over connections with large latencies and packet loss.

While I believe there is a place for OOB transfer protocols in constrained situations, it is my personal belief that OOB transfer protocols are not an appropriate long-term solution for general access to the /afs name space. Improving RX UDP to support high performance data transfers is not just theoretical but has been demonstrated.

In a later reply, Matt asked about RX/TCP and what happened to it. RX/TCP was designed to be a BEEP style protocol which included bidirectional data flows. After considerable effort was spent on implementation, there were significant problems. First, the performance characteristics when managing multiple calls in more than one direction over a single TCP connection were not as impressive as one would want. Research papers have documented the problems caused by multiple layers of flow control and their negative interactions.
Second, without a clean, well-defined RX API that hid the implementation from its callers, it would not be possible to quietly graft RX TCP into the existing cache managers and services. Finally, the file server would need to manage the binding of callback connections to an appropriately secured in-bound TCP connection and manage the cases where one did not exist.

In the end, the resources being spent on the problem did not appear to be worth the benefits when compared to the benefits that improving the RX UDP implementation will provide. As a result, YFSi has continued to devote on-going development resources to profiling, protocol analysis, and performance optimization for the RX UDP protocol. It is our firm belief that for the vast majority of use cases, OOB protocols are simply unnecessary.

That is not to say that standardization work on an OOB solution should not proceed. However, I believe that the community would be better off not looking for a quick fix and should instead focus efforts and resources on a top-to-bottom analysis of the data flows. This is what YFSi has spent the last five years doing, with great success.

Jeffrey Altman
