On Jul 16, 2007, at 12:39 AM, David Barrett wrote:
> This looks really impressive!  However, the subversion browser appears
> busted (it's throwing a Python exception).

Oops, sure enough - some change must have broken it recently. Thanks for mentioning it; I'll check it out.

> I've checked out the code and it
> looks really clean. A few questions about the congestion-control code:

> 1) I see you have a few different modules: TCP, VEGAS, DELAY, and
> AGGRESSIVE.  What are DELAY and AGGRESSIVE?

DELAY is an experimental TCP-friendly delay-based scheme; AGGRESSIVE is sort of a "straw man" scheme that doesn't try to be TCP-friendly but instead explores somewhat aggressive behaviors that could still be reasonable in contexts where TCP-friendliness isn't required. Neither of these is particularly well-founded in theory or thoroughly tested, and I definitely wouldn't recommend using either of them as a default scheme without further refinement. (As you can see from the code, plain old TCP congestion control is currently the default; the others should be considered toys for experimentation.)

> 2) In TCP/VEGAS/DELAY mode, on a loss it seems to fall back as follows:
>
>         ssthresh = max( cwnd/2, CWND_MIN )
>
> But the spec (RFC 2581, eq. 3) recommends:
>
>         ssthresh = max( FlightSize/2, 2*SMSS )
>
> Is it intentional to fall back to half of cwnd rather than half of the
> in-flight count, or is this a bug? If it's intentional, what is the
> resulting effect?

It's intentional, but it only makes sense when you also take into account the 'cwndlim' variable and how it's used - grep for 'if (cwndlim) { ...'.

As far as I know, the primary reason for RFC 2581's recommendation of using flightsize/2 instead of cwnd/2 on a loss event is that the standard TCP congestion control algorithm allows the sender to increment the congestion window every round-trip time *even if there wasn't enough data to send during that round trip to fill the existing window*. Thus, over a long semi-idle period where data is "trickling" through the connection at a rate high enough to avoid triggering a new slow-start from scratch, but low enough not to cause any packet losses due to congestion, cwnd can slowly get incremented _way_ above any reasonable value. So when the send rate increases and you do eventually experience a loss event, it's important to use flightsize/2 instead of cwnd/2, because cwnd can be pretty much arbitrarily large and might no longer bear any relationship whatsoever to the actual available bandwidth.

But to me that just seems like a bug in the way the traditional scheme manages cwnd in the first place. It may be a "relatively" harmless bug as long as you use flightsize/2 on a loss event, but even then it's not completely harmless, because after a long semi-idle period it can still cause the sender to "stuff" the connection briefly with a truly unreasonable number of packets, most of which will definitely get lost, perhaps in the process causing a serious burst of congestion throughout the network that tramples on many other flows as well. A better solution seemed to me just to ensure that cwnd never gets incremented to an unreasonably high value in the first place. That's easy to do during congestion avoidance, for example, by incrementing cwnd once per round-trip *only* if data transmission was actually congestion-limited at some point during that round-trip. That's the purpose of the cwndlim flag. (A similar test is made during slow-start, as you can see from the code.) During a long semi-idle or underutilized period, cwnd will get incremented to represent the maximum bandwidth that actually gets used, and stay there; if the sender then subsequently starts sending data as quickly as possible, cwnd starts increasing again from that point.

Since cwnd never gets to be much greater than flightsize with this modification, it seemed safe and reasonable for loss events just to cut cwnd to cwnd/2 directly instead of using flightsize/2, since it's cwnd that we're really trying to cut in half anyway according to the congestion control theory that the algorithm is based on.
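
To make that concrete, here's a rough sketch of the idea in isolation - this is not the actual SST code, and the function names are made up for the example, but cwnd, ssthresh, cwndlim, and CWND_MIN correspond to the variables discussed above:

    #include <algorithm>

    static const unsigned CWND_MIN = 2;

    struct CongState {
        unsigned cwnd = CWND_MIN;        // congestion window, in packets
        unsigned ssthresh = 0xffffffffu; // slow-start threshold
        bool cwndlim = false;            // cwnd-limited this round-trip?
    };

    // The sender calls this whenever it has data ready but the window
    // is full, i.e. transmission really is congestion-limited.
    void markCwndLimited(CongState &cc) { cc.cwndlim = true; }

    // Called once per round-trip.
    void roundTripDone(CongState &cc) {
        if (cc.cwndlim) {               // grow only if the window was
            if (cc.cwnd < cc.ssthresh)  // actually a binding constraint
                cc.cwnd *= 2;           // slow start
            else
                cc.cwnd += 1;           // congestion avoidance
        }
        cc.cwndlim = false;             // re-arm for the next round-trip
    }

    // Called on a loss event: since cwnd now tracks flightsize closely,
    // cutting cwnd/2 directly is equivalent in spirit to RFC 2581's
    // flightsize/2.
    void lossEvent(CongState &cc) {
        cc.ssthresh = std::max(cc.cwnd / 2, CWND_MIN);
        cc.cwnd = cc.ssthresh;
    }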

I'm not a congestion control expert, though, and although this slight modification to the classic algorithm seems safe to me and has proved to behave very well in all of my testing so far, it obviously should be looked at by a real congestion control authority, and of course tested more thoroughly, before the code goes into production use.

> 3) I note you accept any subsequent sequence number for the RTO
> calculation, rather than requiring the exact packet. Very clever; is
> this more stable in a high-loss scenario?

Yes, that was the idea, although I haven't done any specific testing to verify that hunch.
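
The mechanism is roughly the following - again just an illustrative sketch with invented names, not the actual SST code: time one outstanding packet at a time, and accept an acknowledgment of any sequence number at or beyond the timed one as the sample, so losing the timed packet itself doesn't starve the RTT estimator:

    #include <cstdint>

    struct RttTimer {
        bool timing = false;    // is a sample in progress?
        uint64_t timedSeq = 0;  // sequence number whose RTT we're measuring
        double sentAt = 0;      // time the timed packet was sent
    };

    // Call when a packet is transmitted.
    void onSend(RttTimer &t, uint64_t seq, double now) {
        if (!t.timing) {
            t.timing = true;
            t.timedSeq = seq;
            t.sentAt = now;
        }
    }

    // Call when an acknowledgment arrives; returns true and fills in
    // 'rtt' if this ack completes a sample.
    bool onAck(RttTimer &t, uint64_t ackedSeq, double now, double &rtt) {
        if (t.timing && ackedSeq >= t.timedSeq) {  // any later seq counts
            rtt = now - t.sentAt;
            t.timing = false;
            return true;
        }
        return false;
    }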

> 4) What's all that power-computation business in the DELAY
> congestion-control mode? Is this basically using changes in throughput to
> trigger backoff rather than packet loss?  Is this Fast TCP?

The DELAY scheme in the code is inspired by several existing delay-sensitive congestion control schemes, but doesn't correspond exactly to any of them. It seemed to work reasonably well in the limited scenarios I've tested it under, but it's definitely not ready for serious use. In particular, I haven't done any real testing for stability under more adverse conditions such as many concurrent flows or paths with large bandwidth-delay product. The DELAY scheme is currently just a toy I was using to play with delay-based congestion control, and nothing more.
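
The "power" being computed is the usual network-power notion, throughput divided by delay. Purely as an illustration of that general idea (not what the DELAY code actually does), a power-based probe might look like this - grow or shrink the window each round-trip, and reverse direction whenever the last move reduced power:

    #include <algorithm>

    struct DelayProbe {
        double cwnd = 2;        // congestion window, in packets
        double lastPower = 0;   // power measured over the previous round-trip
        double direction = +1;  // +1 = growing the window, -1 = shrinking it
    };

    // Called once per round-trip with the throughput and RTT measured
    // over that round-trip.
    void roundTripDone(DelayProbe &d, double throughput, double rtt) {
        double power = throughput / rtt;    // the "power" metric
        if (power < d.lastPower)
            d.direction = -d.direction;     // the last move hurt; reverse
        d.cwnd = std::max(2.0, d.cwnd + d.direction);
        d.lastPower = power;
    }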

Although the VEGAS scheme "more or less" implements the standard TCP Vegas scheme, which at least has some real research behind it, I wouldn't really recommend it for serious use either. For example, I'm pretty sure it will have problems if the baseline available bandwidth over a connection changes dynamically due to a routing change or something like that - and as far as I can tell that's a fundamental problem with the actual TCP Vegas scheme, not just with my implementation of it. (And I think other researchers have discovered other, perhaps more serious, flaws in classic TCP Vegas since it was first published.)
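
For reference, the core of the standard Vegas test looks roughly like this (a sketch of the published algorithm, not of SST's implementation). The weakness I mentioned is visible here: baseRtt only ever decreases, so a routing change onto a longer path leaves it stale and makes 'diff' look permanently large:

    #include <algorithm>

    struct Vegas {
        double cwnd = 2;        // congestion window, in packets
        double baseRtt = 1e9;   // minimum RTT ever observed (seconds)
    };

    // Called once per round-trip with the RTT measured over that round-trip.
    void roundTripDone(Vegas &v, double rtt) {
        const double alpha = 2, beta = 4;       // thresholds, in queued packets
        v.baseRtt = std::min(v.baseRtt, rtt);
        double expected = v.cwnd / v.baseRtt;   // rate if no queueing
        double actual   = v.cwnd / rtt;         // rate actually achieved
        double diff = (expected - actual) * v.baseRtt;  // est. packets queued
        if (diff < alpha)
            v.cwnd += 1;        // too little data queued in the path: grow
        else if (diff > beta)
            v.cwnd = std::max(2.0, v.cwnd - 1); // too much queued: shrink
    }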

> In your experience, which of the congestion modes gives the best results,
> and under what scenarios?  Have you done much testing on this?

The only CC scheme currently in SST that's quite well-tested and in which I have high confidence is the (more-or-less) classic loss-based TCP scheme. I would like to get a solid delay-based scheme into SST, because delay-sensitive schemes tend to keep the connection's round-trip latency under load much closer to the path's minimum latency, rather than oscillating between that and that plus the maximum buffering at the bottleneck link. For things like VoIP we obviously would like consistently low latency if possible. Both of the toy delay-based schemes in SST do indeed achieve this nice "minimum latency preserving" behavior under the _very_ limited scenarios over which I've tested them, but I don't expect either of them to hold up as implemented over more diverse situations.

And I haven't worked hard to get a "proper" delay-based scheme into SST yet because it's still not clear to me what the "best" delay-based scheme is anyway - the field of delay-based CC still seems to be very much a topic of research and regular upheaval as far as I can tell, and I haven't been able to follow it consistently. TFRC (RFC 3448, RFC 4828) at least seems to be getting some traction in the standards bodies, but even that scheme still seems very ad hoc to me. So I really can't offer any great wisdom regarding congestion control at the moment - for now I'm just sticking to classic TCP congestion control for serious use, and hoping some delay-based scheme will eventually appear that's obviously "the right one", at which point I'll implement that in SST. If that doesn't happen, well, who knows...
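
For anyone unfamiliar with TFRC: rather than reacting to individual loss events directly, it paces the send rate to roughly match the TCP throughput equation from RFC 3448,

        X = s / ( R*sqrt(2*b*p/3) + t_RTO * 3*sqrt(3*b*p/8) * p * (1 + 32*p^2) )

where X is the allowed transmit rate in bytes/second, s the packet size, R the round-trip time, p the loss event rate, b the number of packets acknowledged per TCP ACK, and t_RTO the retransmission timeout.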

Cheers,
Bryan


-david

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:p2p-hackers-
[EMAIL PROTECTED] On Behalf Of Bryan Ford
Sent: Thursday, July 12, 2007 5:20 AM
To: theory and practice of decentralized computer networks
Subject: [p2p-hackers] Structured Stream Transport

[Partly in response to David's note on UDT...]

For anyone needing an application-linkable transport protocol library
that implements congestion control, reliability, security, NAT
traversal, and other cool features - and who isn't afraid of unstable
bleeding-edge code - there's an early user-space prototype
implementation of my "Structured Stream Transport" (SST) available here:

        http://pdos.csail.mit.edu/uia/sst/

The SST protocol itself provides an enhanced transport abstraction
that subsumes traditional streams and datagrams, and is described in
more detail in a paper to appear in this year's SIGCOMM:

        Structured Streams: a New Transport Abstraction
        http://www.brynosaurus.com/pub/net/sst-abs.html

I haven't widely announced the protocol or library "in the wild" yet
because the code isn't quite ready for production use and the
protocol spec is not yet complete or fully consistent with either the
code or the SIGCOMM paper.  So consider yourself warned. :)  I'm
hoping to fix most of the remaining barriers to real use within the
next couple months.  One immediate challenge for potential users of
the code is that it relies on the Qt 4 toolkit - only the QtCore and
QtNet libraries; note that it _doesn't_ need the monstrous ~7MB QtGui
library - but any dependency on Qt means integrating the Qt event
loop with whatever event loop your application uses (at least under
Unix or Mac).  This is obviously trivial if your application happens
to be Qt-based but takes a bit of work (probably not too much)
otherwise.  I'm willing to help anyone seriously interested in using
SST in a real application to smooth over these technical issues.
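
As a rough illustration of what that integration can look like - this is just one standard Qt pattern, nothing SST-specific, with placeholders where your own code and the SST objects would go - you can keep your existing loop and give Qt a slice of time on each iteration:

    #include <QtCore/QCoreApplication>
    #include <QtCore/QEventLoop>

    int main(int argc, char **argv)
    {
        QCoreApplication app(argc, argv);

        // ... create SST/Qt objects here ...

        for (;;) {
            // Let Qt service its own timers and sockets for up to 10 ms,
            // then return to the application's own event handling.
            app.processEvents(QEventLoop::AllEvents, 10 /* ms */);

            // ... run one iteration of your application's own
            //     select()/poll() loop here ...
        }
    }

(The cleaner long-term option is the reverse: let QCoreApplication::exec() own the main loop and hook your application's own file descriptors into it with QSocketNotifier.)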

Cheers,
Bryan


_______________________________________________
p2p-hackers mailing list
[email protected]
http://lists.zooko.com/mailman/listinfo/p2p-hackers
