Ken Hornstein wrote:
I've not compared RX against sunrpc+gssapi, but I've compared it against TCP on higher speed networks. Between two OC-12 connected machines, I can get around 510-520 Mbits. The best I can do with RX between the same machines is around 140 Mbits (I know Hartmut has claimed that he sees full bandwidth at Gig-E speeds, but I've never seen that here).
I did some debugging recently in the RX protocol layer trying to solve performance problems and fixed a few things already:
1. Despite the comment claiming that an ACK is requested on the last packet sent in a chain, this does not always happen; in Murphy's terms, it never happens when it would do any good. In my tests, once the windows had opened reasonably, the last packet in a chain had its flags field set to 0. Therefore the ACK required to release the next batch was sent only after a timeout (of about 0.3 seconds). The protocol continues to work, but slowly.
Fixing this brought the speed from 20-30 MB/s to >110 MB/s (memory to memory, on the LAN). Funnily enough, Hartmut ran my program without any tweaks and got 114 MB/s immediately. The only explanation I have is that he hit some sort of "sweet spot" or "standing wave" on the two identical uni-processor machines he tried it on.
2. When debugging using the rx_debugFile it would be nice if all packets were printed. Alas, when sending a chain of packets, the "dpf" call is placed outside the loop, so it only traces one of them; by Murphy's law, the least interesting one.
I'll submit the patches to those two issues as soon as I've dealt with the last one:
Once the first few ACKs arrive, RX doubles the packet size (the MTU). Now, certain (wide-area) networks appear to support this only "sort of": I have been given hints that Juniper routers start to drop packets heavily in that case, ruining packet re-assembly. I wrote a small test program that runs at 20 kB/s without any special parameters, but at a satisfying 2.4 MB/s simply by calling rx_SetNoJumbo() (over the WAN, of course; that is also the speed you get via TCP). What would be needed is a mechanism to discover that the MTU increase led to a slow-down, and to recover; but I admit I don't quite understand yet how the already existing code is meant to work.
Another one: RX has an implicit limit of 255 packets on the send/receive window. For trans-atlantic traffic that is not very much with a standard MTU of 1416 (or so) bytes. In 1.3.x the window variables are 32-bit numbers, but the offset for the ACKs is still a u_char, and that sits in the packet. This is perhaps fixable by sending several small ACK packets.
BTW:
RX over TCP might be interesting - if it scales! I'm a bit worried about file servers doing poll() (forget select()) on 10000+ TCP connections...
Besides the fact that probably much more thought went into TCP than into RX, routers are also very likely better tuned for TCP.
-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rainer Toebbicke
European Laboratory for Particle Physics (CERN) - Geneva, Switzerland
Phone: +41 22 767 8985    Fax: +41 22 767 7155

_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel
