On Jul 16, 2007, at 12:39 AM, David Barrett wrote:
> This looks really impressive! However, the Subversion browser appears
> busted (it's throwing a Python exception).
Oops, sure enough - some change must have broken it recently. Thanks
for mentioning it; I'll check it out.
> I've checked out the code and it looks really clean. A few questions
> about the congestion-control code:
>
> 1) I see you have a few different modules: TCP, VEGAS, DELAY, and
> AGGRESSIVE. What are DELAY and AGGRESSIVE?
DELAY is an experimental TCP-friendly delay-based scheme; AGGRESSIVE
was sort of a "straw man" scheme that doesn't try to be TCP-friendly
but instead tries to explore somewhat aggressive behaviors that could
still be reasonable in contexts where TCP-friendliness isn't
required. Neither of these is particularly well-founded in theory
or thoroughly tested, and I definitely wouldn't recommend using
either of them as a default scheme without further refinement. (As
you can see from the code, plain old TCP congestion control is
currently the default; the others should be considered toys for
experimentation.)
> 2) In TCP/VEGAS/DELAY mode, on a loss it seems to fall back as
> follows:
>
>     ssthresh = max( cwnd/2, CWND_MIN )
>
> But the spec (RFC 2581, eq. 3) recommends:
>
>     ssthresh = max( FlightSize/2, 2*SMSS )
>
> Is it intentional to fall back to half of cwnd rather than half of
> the in-flight count, or is this a bug? If it's intentional, what is
> the resulting effect?
It's intentional, but it only makes sense when you also take into
account the 'cwndlim' variable and how it's used - grep for 'if
(cwndlim) { ...'.
As far as I know, the primary reason for RFC2581's recommendation of
using flightsize/2 instead of cwnd/2 on a loss event is because the
standard TCP congestion control algorithm allows the sender to
increment the congestion window every round-trip time *even if there
wasn't enough data to send during that round trip to fill the
existing window*. Thus, over a long semi-idle period where data is
"trickling" through the connection at a rate large enough to avoid
triggering a new slow-start from scratch, but slowly enough not to
cause any packet losses due to congestion, the cwnd can slowly get
incremented _way_ above any reasonable value. So when the send rate
increases and you do eventually experience a loss event, it's
important to use flightsize/2 instead of cwnd/2 because cwnd can be
pretty much arbitrarily large and might no longer have any
relationship whatsoever to the actual available bandwidth.
But to me that just seems like a bug in the way the traditional
scheme manages cwnd in the first place. It may be a "relatively"
harmless bug as long as you use flightsize/2 on a loss event, but
even then it's not completely harmless, because after a long semi-
idle period it can still cause the sender to "stuff" the connection
briefly with a truly unreasonable number of packets, most of which
will definitely get lost, perhaps in the process causing a serious
burst of congestion throughout the network that tramples on many
other flows as well. A better solution seemed to me just to ensure
that cwnd never gets incremented to an unreasonably high value in the
first place. That's easy to do during congestion avoidance, for
example, by incrementing cwnd once per round-trip *only* if data
transmission was actually congestion-limited at some point during
that round-trip. That's the purpose of the cwndlim flag. (A similar
test is made during slow-start, as you can see from the code.)
During a long semi-idle or underutilized period, cwnd will get
incremented to represent the maximum bandwidth that actually gets
used, and stay there; if the sender then subsequently starts sending
data as quickly as possible, cwnd starts increasing again from that
point.
Since cwnd never gets to be much greater than flightsize with this
modification, it seemed safe and reasonable for loss events just to
cut cwnd to cwnd/2 directly instead of using flightsize/2, since it's
cwnd that we're really trying to cut in half anyway according to the
congestion control theory that the algorithm is based on.
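
To make that concrete, here's a stripped-down sketch of the idea
(windows counted in packets, details simplified - this is an
illustration of the technique, not the code in the repository; only
the 'cwndlim' flag and the CWND_MIN floor come from the discussion
above):

    // Congestion avoidance gated on a "was the sender actually
    // congestion-limited this round trip?" flag.
    #include <algorithm>
    #include <cstdint>

    static const uint32_t CWND_MIN = 2;

    struct CongCtl {
        uint32_t cwnd = CWND_MIN;       // congestion window (packets)
        uint32_t ssthresh = 65535;      // slow-start threshold
        bool cwndlim = false;           // hit the window limit this RTT?

        // Called whenever the sender wants to transmit but cwnd is full.
        void markCwndLimited() { cwndlim = true; }

        // Called once per round trip.
        void onRoundTrip() {
            if (cwndlim) {
                // Grow only if we actually used the whole window; an
                // idle or trickling sender leaves cwnd where it is.
                if (cwnd < ssthresh)
                    cwnd *= 2;          // slow start
                else
                    cwnd += 1;          // congestion avoidance
            }
            cwndlim = false;
        }

        // Called on a loss event.  Since cwnd can never drift far above
        // the amount of data actually in flight, halving cwnd itself is
        // roughly equivalent to RFC 2581's FlightSize/2.
        void onLoss() {
            ssthresh = std::max(cwnd / 2, CWND_MIN);
            cwnd = ssthresh;
        }
    };

The point is simply that onRoundTrip() refuses to grow cwnd across a
round trip in which the window was never the limiting factor; that's
what keeps cwnd and the flight size close together, and what makes
cutting to cwnd/2 on a loss safe.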
I'm not a congestion control expert, though, and although this slight
modification to the classic algorithm seems safe to me and has proved
to behave very well in all of my testing so far, it obviously should
be looked at by a real congestion control authority, and of course
tested more thoroughly, before the code goes into production use.
> 3) I note you accept any subsequent sequence number for the RTO
> calculation, rather than requiring the exact packet. Very clever; is
> this more stable in a high-loss scenario?
Yes, that was the idea, although I haven't done any specific testing
to verify that hunch.
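
In simplified form, the mechanism looks something like the sketch
below (an illustration, not the actual code; note that SST never
reuses a packet sequence number for a retransmission, so the
retransmission-ambiguity problem that Karn's algorithm addresses
doesn't arise here):

    // RTT estimator that times one packet per round trip and accepts
    // any acknowledgment at or beyond the timed sequence number, with
    // the usual SRTT/RTTVAR smoothing.
    #include <algorithm>
    #include <chrono>
    #include <cmath>
    #include <cstdint>

    using Clock = std::chrono::steady_clock;

    struct RttEstimator {
        bool timing = false;            // sample in progress?
        uint64_t timedSeq = 0;          // sequence number being timed
        Clock::time_point sentAt;       // when it was sent
        double srtt = 0, rttvar = 0;    // smoothed estimates (seconds)

        void onSend(uint64_t seq) {
            if (!timing) {              // time at most one packet per RTT
                timing = true;
                timedSeq = seq;
                sentAt = Clock::now();
            }
        }

        void onAck(uint64_t ackedSeq) {
            // Any ACK covering the timed packet yields a sample, so
            // losing that particular packet or its ACK doesn't stall
            // the estimator.
            if (timing && ackedSeq >= timedSeq) {
                double sample =
                    std::chrono::duration<double>(Clock::now() - sentAt).count();
                if (srtt == 0) {        // first sample
                    srtt = sample;
                    rttvar = sample / 2;
                } else {
                    rttvar = 0.75 * rttvar + 0.25 * std::fabs(srtt - sample);
                    srtt = 0.875 * srtt + 0.125 * sample;
                }
                timing = false;
            }
        }

        double rto() const { return std::max(1.0, srtt + 4 * rttvar); }
    };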
> 4) What's all that power-computation business in the DELAY
> congestion-control mode? Is this basically using changes in
> throughput to trigger backoff rather than packet loss? Is this Fast
> TCP?
The DELAY scheme in the code is inspired by several existing delay-
sensitive congestion control schemes, but doesn't correspond exactly
to any of them. It seemed to work reasonably well in the limited
scenarios under which I've tested it, but it's definitely not
ready for serious use. In particular, I haven't done any real
testing for stability under more adverse conditions such as many
concurrent flows or paths with large bandwidth-delay product. The
DELAY scheme is currently just a toy I was using to play with delay-
based congestion control, and nothing more.
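
The "power" being computed is the classic throughput-over-delay ratio
from the congestion-control literature. Stripped of all the details,
the general flavor of a power-based adjustment is something like the
toy sketch below - take it as an illustration of the idea, not as
what the DELAY module literally does:

    // Grow the window only while adding packets still buys "power"
    // (throughput/delay); back off once power falls, on the theory
    // that a queue is building at the bottleneck.
    #include <cstdint>

    struct PowerProbe {
        double lastPower = 0;
        uint32_t cwnd = 2;

        // Called once per round trip with the measured delivery rate
        // (bytes/sec) and smoothed RTT (seconds) for that round trip.
        void onRoundTrip(double throughput, double rtt) {
            double power = throughput / rtt;
            if (power > lastPower * 1.05)
                cwnd += 1;                          // still gaining: grow
            else if (power < lastPower * 0.95)
                cwnd = cwnd > 2 ? cwnd - 1 : 2;     // power fell: back off
            lastPower = power;
        }
    };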
Although the VEGAS scheme "more or less" implements the standard TCP
Vegas scheme, which at least has some real research behind it, I
wouldn't recommend it for serious use either. For
example, I'm pretty sure it'll have problems if the baseline
available bandwidth over a connection changes dynamically due to a
routing change or something like that - and as far as I can tell
that's a fundamental problem with the actual TCP Vegas scheme, not
just with my implementation of it. (And I think other researchers
have discovered other, perhaps more serious, flaws in classic TCP
Vegas too, since it was first published.)
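
For reference, the textbook Vegas adjustment that the VEGAS module
more or less follows is, once per round trip, roughly this
(simplified sketch, windows in packets):

    // Textbook TCP Vegas congestion-avoidance step.  alpha/beta are
    // the usual thresholds (in packets) on the estimated number of
    // packets sitting queued in the network.
    #include <algorithm>

    struct Vegas {
        double baseRtt = 1e9;       // minimum RTT observed so far (seconds)
        double cwnd = 2;            // congestion window, in packets
        double alpha = 1, beta = 3; // target queue-occupancy bounds

        void onRoundTrip(double rtt) {
            // baseRtt only ever decreases; if a route change raises the
            // path's true minimum RTT, the extra delay is misread as
            // queueing and the scheme backs off too far.
            baseRtt = std::min(baseRtt, rtt);

            double expected = cwnd / baseRtt;   // rate with no queueing
            double actual   = cwnd / rtt;       // rate actually achieved
            double queued   = (expected - actual) * baseRtt;

            if (queued < alpha)
                cwnd += 1;          // too little queueing: push harder
            else if (queued > beta)
                cwnd -= 1;          // too much queueing: ease off
            cwnd = std::max(cwnd, 2.0);
        }
    };

That baseRtt comment is exactly the routing-change fragility I was
referring to above.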
> In your experience, which of the congestion modes gives the best
> results, and under what scenarios? Have you done much testing on
> this?
The only CC scheme currently in SST that's quite well-tested and for
which I have high confidence is the (more-or-less) classic loss-based
TCP scheme. I would like to get a solid delay-based scheme into SST,
because delay-sensitive schemes tend to keep the connection's round-
trip latency under load much closer to the path's minimum latency
rather than oscillating between that minimum and the minimum plus the
full buffer depth at the bottleneck link. For things like VoIP we obviously
would like consistently low latency if possible. Both of the toy
delay-based schemes in SST do indeed achieve this nice "minimum
latency preserving" behavior under the _very_ limited scenarios over
which I've tested them, but I don't expect either of them to hold up
as-implemented over more diverse situations. And I haven't worked
hard to get a "proper" delay-based scheme into SST yet because it's
still not clear to me what the "best" delay-based scheme is, anyway -
the field of delay-based CC still seems to be very much a topic of
research and regular upheaval as far as I can tell, and I haven't
been able to follow it consistently. TFRC (RFC 3448, RFC 4828) at
least seems to be getting some traction in the standards bodies, but
even that scheme still seems very ad hoc to me. So I really can't
offer any great wisdom regarding congestion control at the moment -
for now I'm just sticking to classic TCP congestion control for
serious use, and hoping some delay-based scheme will eventually
appear that's obviously "the right one", at which point I'll
implement that in SST. If that doesn't happen, well, who knows...
Cheers,
Bryan
-david
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:p2p-hackers-
[EMAIL PROTECTED] On Behalf Of Bryan Ford
Sent: Thursday, July 12, 2007 5:20 AM
To: theory and practice of decentralized computer networks
Subject: [p2p-hackers] Structured Stream Transport
[Partly in response to David's note on UDT...]
For anyone needing an application-linkable transport protocol library
that implements congestion control, reliability, security, NAT
traversal, and other cool features - and who isn't afraid of unstable
bleeding-edge code - there's an early user-space prototype
implementation of my "Structured Stream Transport" (SST) available
here:
http://pdos.csail.mit.edu/uia/sst/
The SST protocol itself provides an enhanced transport abstraction
that subsumes traditional streams and datagrams, and is described in
more detail in a paper to appear in this year's SIGCOMM:
Structured Streams: a New Transport Abstraction
http://www.brynosaurus.com/pub/net/sst-abs.html
I haven't widely announced the protocol or library "in the wild" yet
because the code isn't quite ready for production use and the
protocol spec is not yet complete or fully consistent with either the
code or the SIGCOMM paper. So consider yourself warned. :) I'm
hoping to remove most of the remaining barriers to real use within the
next couple of months. One immediate challenge for potential users of
the code is that it relies on the Qt 4 toolkit - only the QtCore and
QtNetwork libraries; it _doesn't_ need the monstrous ~7MB QtGui
library - but any dependency on Qt means integrating the Qt event
loop with whatever event loop your application uses (at least under
Unix or Mac). This is obviously trivial if your application happens
to be Qt-based but takes a bit of work (probably not too much)
otherwise. I'm willing to help anyone seriously interested in using
SST in a real application to smooth over these technical issues.
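
The crudest form of that integration, if your application already has
its own main loop, is simply to pump Qt's pending events from that
loop; the sketch below shows only the Qt side (finer-grained
integration through Qt's event dispatcher is also possible):

    // Minimal illustration: driving Qt's event processing from a
    // non-Qt application's existing main loop by pumping pending
    // events each iteration.  Only QtCore is needed for this.
    #include <QCoreApplication>
    #include <QEventLoop>

    int main(int argc, char **argv)
    {
        QCoreApplication app(argc, argv);   // required once per process

        // ... create SST and other Qt-based objects here ...

        for (;;) {
            // Let Qt dispatch its timers, socket notifiers, and queued
            // signals, spending at most ~10 ms here per iteration.
            QCoreApplication::processEvents(QEventLoop::AllEvents, 10);

            // ... run one iteration of the application's own loop ...
        }
    }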
Cheers,
Bryan
_______________________________________________
p2p-hackers mailing list
[email protected]
http://lists.zooko.com/mailman/listinfo/p2p-hackers