There are two proposals, one in the draft and one in your email. The
general assumption behind both is that a node participates in the
overlay as a full peer and uses one of these solutions to achieve hop-by-hop
reliability.
I do not agree with either the proposal in the draft or the one in your
email. For simplicity, let's refer to these solutions as
Lesser-reliability-over-UDP (LSROU). The reasons, some of which I have
stated in earlier posts, are:
1) LSROU or even TCP-over-UDP is not a universal solution. It is
well known that UDP traversal sometimes requires a TURN server, especially
behind cascaded NATs or NATs with endpoint-dependent filtering.
2) The base draft has tried to incorporate solutions that work in all
scenarios. This is why we have recursive routing. LSROU does not work in
all scenarios, and neither does a combination of LSROU and TCP. Bottom
line: relaying is unavoidable. Clearly, the less relaying the better, and
LSROU is considered one way to reduce it. However, see (3) and (5).
3) Inbound TCP connections through NATs are considered more problematic
than UDP. The available data I am aware of (Characterizing paper, IMC'05)
suggests that for 100% of the common types of deployed NATs, it is possible
to establish direct TCP connections. You noted that this is not as
universal as the paper suggests, but how much less? Can we put a number on
it? Do we have data?
4) We cannot make any assumptions about the size of the data sent over LSROU.
5) LSROU relies only on *timeouts* to recover each loss. TCP recovers
losses using both *TDACK* (triple duplicate ACKs, i.e., fast retransmit)
and *timeouts*. A node that participates in the overlay as a full peer and
uses LSROU to recover losses via *timeouts* is the weakest link in the
routing chain.
6) Let's suppose that inbound TCP connections were much more problematic
than UDP, so that many peers ran LSROU. Imagine that the only way these
LSROU peers recover losses is via timeouts. Can we imagine the poor
routing performance of such a system? Shall we standardize it?
7) The present text in the draft uses TFRC-SP. TFRC-SP mandates a minimum
gap of 10ms between transmissions. Imagine 4 packets traversing 5 hops,
with each transmission delayed by 10ms. Even if there are no losses, the
last packet will leave the fourth hop after 120ms.
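The 120ms figure in the TFRC-SP point can be reproduced with a small model. This is an illustration under stated assumptions, not a description of the draft: each forwarding node receives the whole 4-packet message before re-sending it (store-and-forward), paces its own sends at least 10ms apart as TFRC-SP requires, and propagation and processing delays are ignored. The function `departure_times` and its parameters are hypothetical names.

```python
def departure_times(packets, nodes, gap_ms=10.0, store_and_forward=True):
    """Per-node departure times (ms) of each packet along a chain of nodes.

    Hypothetical model: zero propagation/processing delay, and every node
    paces its sends at least gap_ms apart (TFRC-SP-style minimum interval).
    """
    arrivals = [0.0] * packets  # the source holds every packet at t = 0
    schedule = []
    for _ in range(nodes):
        if store_and_forward:
            # Wait until the whole message has arrived before sending any packet.
            ready = [max(arrivals)] * packets
        else:
            # Forward each packet as soon as it arrives.
            ready = list(arrivals)
        sent, last = [], None
        for t in ready:
            depart = t if last is None else max(t, last + gap_ms)
            sent.append(depart)
            last = depart
        schedule.append(sent)
        arrivals = sent  # the next node receives each packet when it was sent
    return schedule


# Four packets; nodes=4 counts the source as the first sending node.
store_fwd = departure_times(packets=4, nodes=4)
print(store_fwd[-1][-1])  # 120.0
```

Passing `store_and_forward=False` models nodes that forward each packet on arrival; in that model the pacing delays do not accumulate and the last packet leaves the fourth node at 30ms, so the per-hop cost depends on whether each node must reassemble the message before forwarding it.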
My proposal (in rough text) is as follows:
"RELOAD uses TCP to achieve hop-by-hop reliability and relies on
existing techniques to solve the inbound TCP connection problem. When a
direct connection fails, the node either (a) participates only as a client
or (b) participates as a peer and uses a TCP TURN server to achieve a 1-hop
connection with its connection table entries.
Alternatively, a peer can use a TCP-over-UDP protocol to establish
direct connections and to achieve reliability. However, we do not specify
such a protocol."
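The fallback in the proposed text can be sketched as a simple decision procedure. This is a hypothetical illustration: the probe callables (`try_direct_tcp`, `try_tcp_turn`) and the `Role` names are invented here, and trying (b) before (a) is an assumption, since the proposal leaves that choice open. The optional TCP-over-UDP path is omitted because the proposal deliberately does not specify it.

```python
from enum import Enum
from typing import Callable


class Role(Enum):
    """Hypothetical labels for how a node joins the overlay."""
    PEER_DIRECT_TCP = "peer with direct TCP connections"
    PEER_VIA_TCP_TURN = "peer using a TCP TURN server"   # option (b)
    CLIENT_ONLY = "client only"                          # option (a)


def choose_role(try_direct_tcp: Callable[[], bool],
                try_tcp_turn: Callable[[], bool]) -> Role:
    """Pick a role following the proposed fallback (ordering assumed)."""
    if try_direct_tcp():           # existing TCP NAT traversal techniques
        return Role.PEER_DIRECT_TCP
    if try_tcp_turn():             # 1-hop relayed connection to table entries
        return Role.PEER_VIA_TCP_TURN
    return Role.CLIENT_ONLY        # always possible: connect out to a reachable node


# Example: direct TCP fails, but a TCP TURN allocation succeeds.
print(choose_role(lambda: False, lambda: True).name)  # PEER_VIA_TCP_TURN
```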
-s
On Wed, 25 Mar 2009, Bruce Lowekamp wrote:
Salman,
Based on your list, I'm going to assume you agree with the current
proposal. RELOAD supports (and prefers) a TCP overlay link protocol,
and it offers a UDP-based protocol when that doesn't work. I'd fully
support a simple "use RFCXXXX" for a TCP-over-UDP protocol, but since
there isn't one, our goal is something that works, even if it doesn't
reach "ideal" (TCP-like) performance. If you believe more needs to be
done to specify a real TCP-over-UDP protocol, I fully support you
advancing that in TSV.
Bruce
On Thu, Mar 19, 2009 at 12:13 AM, Salman Abdul Baset wrote:
On Wed, 18 Mar 2009, Bruce Lowekamp wrote:
That paper in particular, and the reasons UDP connections are more
reliably formed than TCP connections, have been discussed numerous times
in MMUSIC, and I really don't think we should be repeating the whole
conversation here. But the summary is that it's not nearly as
universal a solution as that paper indicates.
Bruce
Sure. I have already mentioned that this is an IMC'05 paper and that more
recent data, if available, would be helpful and is needed.
There are at least four solutions to the hop-by-hop reliability problem:
(1) Clients
Nodes behind TCP-*un*friendly NATs can always act as clients and establish
TCP connections with one or more reachable nodes. The reachable nodes can
be behind TCP-friendly NATs or have public IP addresses.
(2) Full peer but use relay peer(s)
A node participates as a peer. It establishes TCP connections with reachable
peers, which in turn establish TCP connections with the node's connection
table entries.
(3) Full peer with techniques for direct TCP connection establishment
A node participates as a peer and uses TCP traversal techniques to
establish direct connections (including Dean's upcoming ones :)
(4) Full peer with TCP-over-UDP
Since TCP traversal may fail, design or reuse a reliable, congestion-controlled
protocol over UDP.
Note that:
(1) and (2) always work.
(3) and (4) do not work well behind cascaded NATs. (4) fails behind
UDP-blocking firewalls.
(4) is more feasible than (3), since UDP has a better chance of connection
establishment when NATs are not cascaded. However, UDP-blocking firewalls
need to be factored into this feasibility discussion. Again, any recent
data would be helpful.
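The feasibility notes above can be encoded as a small lookup. This is a deliberately coarse sketch: it treats "does not work well behind cascaded NATs" as "not viable", and the `Environment` fields and `viable_solutions` function are hypothetical names introduced for illustration. Where both direct approaches are viable, it lists (4) ahead of (3), following the note that UDP has a better chance of connection establishment when NATs are not cascaded.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Environment:
    cascaded_nat: bool   # node sits behind more than one level of NAT
    udp_blocked: bool    # a firewall on the path drops UDP


def viable_solutions(env: Environment) -> List[str]:
    """Hop-by-hop reliability options that the notes above expect to work."""
    options = ["(1) client", "(2) peer via relay peers"]  # always work
    if not env.cascaded_nat:
        if not env.udp_blocked:
            options.append("(4) peer with TCP-over-UDP")  # better odds than (3) here
        options.append("(3) peer with TCP traversal")
    return options


print(viable_solutions(Environment(cascaded_nat=False, udp_blocked=True)))
# ['(1) client', '(2) peer via relay peers', '(3) peer with TCP traversal']
```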
For (4), the TCP-over-UDP protocol needs to be well designed and
well implemented. Otherwise, peers doing TCP-over-UDP may be the weakest
link in the routing chain. Approaches that recover every loss via a
timeout may not be the most feasible ones.
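To illustrate the difference between recovering every loss by timeout and recovering losses the way TCP does, the sketch below models standard TCP fast retransmit (RFC 5681): the sender resends a segment after the third duplicate cumulative ACK, roughly one round-trip after the loss, instead of waiting for the retransmission timeout, which RFC 6298 says should be rounded up to at least one second. The function name and the segment-numbered ACKs are simplifications chosen for illustration; a real sender also tracks in-flight data, SACK, and congestion state.

```python
from typing import List


def fast_retransmit_points(acks: List[int], dupthresh: int = 3) -> List[int]:
    """Segments a TCP-like sender would fast-retransmit, given cumulative ACKs.

    Each ACK value names the next segment the receiver expects. The
    dupthresh-th duplicate of the same value triggers one retransmission.
    """
    retransmit: List[int] = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == dupthresh:
                retransmit.append(ack)  # resend without waiting for the RTO
        else:
            last_ack, dup_count = ack, 0  # new data acknowledged
    return retransmit


# Segment 3 of 6 is lost: segments 4, 5 and 6 each produce a duplicate ACK for 3.
print(fast_retransmit_points([2, 3, 3, 3, 3]))  # [3]
# Segment 3 of 4 is lost: only one duplicate ACK arrives, so recovery waits for the RTO.
print(fast_retransmit_points([2, 3, 3]))        # []
```

The second case also shows why the amount of data sent matters: a loss near the end of a short message may not generate enough duplicate ACKs, so even TCP then falls back to its timeout.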
-salman
_______________________________________________
P2PSIP mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/p2psip