As a result of these problems Rx was periodically not sending the
anticipated acknowledgment packet which in turn resulted in a timeout
and retransmission. The Rx stack was also frequently running out of
free packets and was forced to block on a global lock while additional
packet structures were allocated from the process's memory pool.
The end result was a performance improvement of more than
9.5% when comparing Rx in 1.4.8 against 1.4.7.
Rough tests show that the 1.4.8 Rx stack is capable of 124 MBytes/second
over a 10Gbit link. There is still a long way to go to fill a 10Gbit
pipe but it is a start. Now we are only off by one order of magnitude.
Having in the past repeatedly dug into the RX code (without spotting
those problems!) I am of course very interested and will try the new
code as soon as possible!
Just a few findings on RX from my previous (vain) attempts to make it
"lightning fast" - perhaps they trigger ideas for whoever is still
working on it or corrections from those who know better:
1. as latency grows when crossing routers or even public networks, the
default window of 32 packets is too small. On the other hand, the
handling of the transmission queue grows with n**2, and even fast
processors are quickly overwhelmed. Here's where "oprofile" is a
valuable tool. Some of this can be reduced with queue hints, wisely
posting retransmit events and trying to avoid scanning the whole queue
in several places;
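The n**2 behaviour in point 1 can be illustrated with a toy model (my
own sketch, not the actual Rx code): if every incoming ACK walks the
transmit queue from the head, the total work per window grows
quadratically, whereas a queue hint that remembers where the last scan
stopped keeps it roughly linear.

```python
# Toy model of transmit-queue scanning cost, NOT actual Rx code.

def scans_full(window):
    """Each of `window` ACKs rescans the whole queue from the head."""
    ops = 0
    for acked in range(1, window + 1):
        ops += window  # walk every entry looking for the acked packet
    return ops

def scans_hinted(window):
    """A queue hint resumes where the previous scan stopped."""
    return window  # each entry is visited roughly once in total

for w in (32, 255):
    print(w, scans_full(w), scans_hinted(w))
```

At the default window of 32 the difference is modest (1024 vs 32
visits), but it grows brutally as the window is widened.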
2. jumbograms are a pain: years ago we had a research network dropping
fragmented packets and spent weeks pinning that down. Currently we
suspect another one. Firewalls choke on them. They also increase
complexity for access lists in routers. And of course the probability
increases of having to retransmit the whole jumbogram because one
fragment got lost. What makes me frown is that it is apparently
faster for the kernels to split and reassemble jumbograms on the fly
than for Rx to do it, even though Rx has much more knowledge about the
state;
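A quick back-of-the-envelope calculation (my own numbers, not from any
measurement) shows why jumbogram retransmits hurt: if any single
fragment is dropped the whole jumbogram goes again, so the effective
loss rate scales with the fragment count.

```python
# Illustrative arithmetic: the whole jumbogram must be retransmitted
# when any one of its fragments is lost.

def jumbogram_loss(p, fragments):
    """Probability that at least one of `fragments` fragments is lost,
    assuming independent per-fragment loss probability p."""
    return 1.0 - (1.0 - p) ** fragments

p = 0.001  # assumed 0.1% per-fragment loss, purely for illustration
for k in (1, 4, 6):
    print(k, round(jumbogram_loss(p, k), 6))
```

Even at a modest per-packet loss rate, a six-fragment jumbogram is
roughly six times as likely to need a (six-times-larger) retransmit.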
3. the path for handling an ACK packet is very long, I measured on the
order of 10 microseconds on average on a modern processor. At over
100 MB/s you'd be handling ~50000 ACKs per second in a non-jumbogram
configuration and have hardly any time left to send out new packets. A
lot is spent on waiting for the call-lock: even when that one is
released quickly (which it isn't in the standard implementation, as
the code leisurely walks around with it for extended periods, but I
experimented with a "release" flag), the detour through the scheduler
slows things down dramatically. The lock structure should probably be
revisited to make contention between ack recv & transmit threads less
likely;
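The arithmetic behind point 3 is easy to check; the ~10 microsecond
per-ACK cost is the figure measured above, and the 50000 ACKs/second
is the rate cited for >100 MB/s without jumbograms:

```python
# CPU budget check for ACK handling, using the figures from the text.

ack_cost_s = 10e-6      # ~10 microseconds spent per ACK
acks_per_sec = 50_000   # non-jumbogram ACK rate at over 100 MB/s
busy = ack_cost_s * acks_per_sec
print(f"fraction of one core spent on ACKs: {busy:.0%}")
```

Half of one core gone just to process acknowledgments, before a single
new packet has been queued for transmission.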
4. slow start is implemented in a state-of-the-art fashion; fast
recovery, however, looks odd to me (actually: "nonexistent", but I may
be fooled by some jumbogram smoke). When it comes to congestion
avoidance, a lot of the research that went into TCP in the last ten
years is obviously missing. I started experimenting with CUBIC in the
hope that it helps reduce retransmits and keep a constant flow; let's
see;
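For reference, the CUBIC window-growth function from RFC 8312 looks
like this; the constants C = 0.4 and beta = 0.7 are the RFC defaults,
nothing Rx-specific:

```python
# CUBIC congestion window growth after a loss event (RFC 8312).
C = 0.4      # scaling constant
BETA = 0.7   # multiplicative-decrease factor

def cubic_window(t, w_max):
    """Congestion window t seconds after a loss, where w_max was the
    window size when the loss occurred."""
    k = (w_max * (1 - BETA) / C) ** (1 / 3)  # time to climb back to w_max
    return C * (t - k) ** 3 + w_max

w_max = 100.0  # window (in packets) before the loss
# the window dips, plateaus near w_max, then probes cautiously beyond:
for t in (0.0, 2.0, 4.5, 6.0):
    print(t, round(cubic_window(t, w_max), 1))
```

The attractive property for long-haul links is the long plateau around
the previous maximum, which should translate into a steadier flow and
fewer retransmits than a sawtooth.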
5. earlier this year we mentioned handling of new calls, which is
again a quadratic problem due to the mixture of service classes. This
makes it impractical to allow for thousands of waiting calls, creating
a problem on a cluster with thousands of nodes.
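One obvious (if naive) way around the quadratic new-call handling
would be a separate FIFO of waiting calls per service class, so that
dispatching the next call for a freed thread is O(1) instead of a scan
over all waiting calls. A sketch, purely illustrative and not how Rx
is actually structured:

```python
# Illustrative per-service-class call queues; names are made up.
from collections import deque

waiting = {}  # service class -> FIFO of waiting call ids

def enqueue(service, call_id):
    """A new call arrives for the given service class."""
    waiting.setdefault(service, deque()).append(call_id)

def dispatch(service):
    """A worker thread for `service` frees up: pop the oldest call."""
    q = waiting.get(service)
    return q.popleft() if q else None

enqueue("fileserver", 1)
enqueue("fileserver", 2)
enqueue("volserver", 3)
print(dispatch("fileserver"))  # -> 1
```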
With those observations... does rx-over-tcp look like a solution? On
the packet-transmission side probably, but the encapsulation very
likely still demands significant processing power. And running a
server with 10000 or 20000 TCP connections does not sound
straightforward either.
Voilà... my 0.02 €. Sorry for being verbose, I couldn't resist.
--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rainer Toebbicke
European Laboratory for Particle Physics (CERN) - Geneva, Switzerland
Phone: +41 22 767 8985 Fax: +41 22 767 7155
_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info