[P2PSIP] tsv-dir review of draft-ietf-p2psip-base-18

SCHARF, Michael Mon, 05 Sep 2011 08:36:02 -0700

Hi,

I've reviewed this document as part of the transport area directorate's
ongoing effort to review key IETF documents. These comments were written
primarily for the transport area directors, but are copied to the
document's authors for their information and to allow them to address
any issues raised. The authors should consider this review together with
any other last-call comments they receive. Please always CC
[email protected] if you reply to or forward this review.



This draft is on the right track but has open issues, described in the
review. Given the length and complexity of the document, I only comment
on transport issues. However, for what it is worth, my personal
impression is that other sections not addressed by this review seem
somewhat experimental.

My main concern is the Forwarding and Link Management Layer. It is
designed to run over different transports, most notably TCP and UDP.
Having such a generic transport including built-in NAT traversal support
is very valuable. But I find parts of the specification confusing, as
explained below. Also, building a simple, generic transport mechanism on
top of UDP and TCP (or, DTLS and TLS) is actually something that is
hardly specific to RELOAD.


Main concerns:

Section 1.2.5: "This layer also utilizes a framing header to encapsulate
messages as they are forwarding along each hop.  This header aids
reliability congestion control, flow control, etc.  It has meaning only
in the context of that individual link." (BTW: a comma is missing after
"reliability".)

=> This passage illustrates my concern, which is detailed further below:
If TCP (or SCTP) transport is used, reliability, congestion control, and
flow control are already provided by the transport protocol. Does the
Forwarding and Link Management Layer then duplicate these functions?
This, and the potential interactions, are not well described in the
document.


Section 5.2.1: "Because messages may be lost in transit through the
overlay, RELOAD incorporates an end-to-end reliability mechanism.  When
an originating node transmits a request it MUST set a 3 second timer. If
a response has not been received when the timer fires, the request is
retransmitted with the same transaction identifier. The request MAY be
retransmitted up to 4 times (for a total of 5 messages).  After the
timer for the fifth transmission fires, the message SHALL be considered
to have failed."

=> It may make sense to back off this timer exponentially. Or, at least,
it would be useful to explain why no backoff is used. 3 sec is the
initial retransmission timeout of older TCP stacks, i.e., in worst-case
scenarios even the transfer over a single overlay hop can result in a
3 sec delay.
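
To make the trade-off concrete, here is a small sketch that compares the
fixed 3-second schedule quoted above with a doubling schedule. The
function name and the doubling factor are illustrative assumptions, not
taken from the draft.

```python
def timeout_schedule(initial=3.0, transmissions=5, factor=1.0):
    """Return the timer value armed after each transmission.

    factor=1.0 models the fixed timer in the quoted text;
    factor=2.0 is an assumed exponential backoff (not from the draft).
    """
    return [initial * factor ** i for i in range(transmissions)]


fixed = timeout_schedule()               # five timers of 3 s each
backoff = timeout_schedule(factor=2.0)   # 3, 6, 12, 24, 48 s

# Time from the first transmission until the request is declared failed.
print(sum(fixed), sum(backoff))  # 15.0 93.0
```

With the fixed timer, all five copies are injected within 12 seconds and
the request fails after 15 seconds. With doubling, the same five copies
are spread over 93 seconds, which is gentler on a congested path but
slower to report failure.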


Section 5.5.1.6: "Highest priority is assigned to protocols that offer
well-understood congestion and flow control without head of line
blocking. For example, SCTP without message ordering, DCCP, or those
protocols encapsulated using UDP. [...] Second highest priority is
assigned to protocols that offer well-understood congestion and flow
control but have head of line blocking such as TCP."

=> While SCTP may be better suited for signaling applications than TCP,
I wonder why DCCP should have a higher priority than TCP. AFAIK, DCCP
offers no reliability and is not built for signaling transport. Any
application-level retransmission mechanism on top of DCCP is likely to
be slower than the transport-level mechanisms offered by SCTP and TCP.
Maybe I am missing something?

(As a side note: Research results have shown that head-of-line blocking
only has a significant impact for high packet loss rates (> 1%). In that
case, the link is seriously congested and congestion control will
significantly throttle the data transfer. Head-of-line blocking might
then not be the most important delay component. Personally, I don't
think that avoiding head-of-line blocking is the most important
advantage of SCTP over TCP, and thus I don't think that it has to be
stressed in that section. But that is my personal view and it might not
be shared by the whole research community.)


Section 5.6: "The Framing Header (FH) is used to frame messages and
provide timing when used on a reliable stream-based transport protocol.
Simple Reliability (SR) makes use of the FH to provide congestion
control and semi-reliability when using unreliable message-oriented
transport protocols." 

=> In this and the following sections, the operation on top of a
reliable, congestion-controlled protocol (TCP, possibly SCTP in future)
should be better separated from datagram transport (UDP). The required
functions in the Overlay Link Layer seem to be quite different.


Sections 5.6.1.1, 5.6.1.3, and 5.6.1.4

=> These cases are apparently not covered by this spec. Move to
appendix?


Section 5.6.2: "The same header is used for both reliable and unreliable
transports for simplicity of implementation."

=> True, one can do that. But as both endpoints have to agree on a
transport anyway, it would not be overly complex to use headers that are
better aligned with the features of the underlying transport, right?
I.e., one could use a more complex header if UDP transport is needed.
RELOAD has a similar flexibility concerning data structures in many
other parts of the protocol.


Section 5.6.2: "When the receiver receives a message, it SHOULD
immediately send an ACK message.  The receiver MUST keep track of the 32
most recent sequence numbers received on this association in order to
generate the appropriate ack."

=> Maybe I am completely lost here: Why is this MUST needed for TCP/TLS
links? In that case, messages arrive in order on a link, i.e., acking
the most recent one is sufficient. All previous ones must already have
been received.


Section 5.6.2: " received  A bitmask indicating if each of the previous
32 sequence numbers before this packet has been among the 32 packets
most recently received on this connection."

=> I had a hard time understanding this sentence and its implications.
I guess that this bitmask doesn't ensure full reliability, at least in
corner cases (e.g., 33 messages being lost in sequence). In other words,
this is a partial reliability scheme. I wonder why the draft does not
use a cumulative ack with a bitmask acking out-of-order data, similar to
TCP's SACK. Also, the requirement of only acking the "32 packets most
recently received" may have undesired effects (e.g., what happens in
case of packet duplication?). In fact, instead of this customized
scheme, running a small subset of TCP's mechanisms in user space might
just be fine, and I don't understand why this would be more complex than
implementing this scheme.
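
The following sketch shows how I read the bitmask and contrasts it with
a SACK-like alternative. The bit layout (bit i-1 stands for sequence
number ack_seq - i), the function names, and the assumption that
sequence numbers start at 1 are mine; the draft may define these
differently.

```python
WINDOW = 32
MOD = 2 ** 32


def received_mask(ack_seq, recent):
    """Bitmask carried in the ACK for ack_seq.

    recent: sequence numbers of the most recently received packets.
    Assumed layout: bit i-1 is set when (ack_seq - i) is in `recent`.
    """
    mask = 0
    for i in range(1, WINDOW + 1):
        if (ack_seq - i) % MOD in recent:
            mask |= 1 << (i - 1)
    return mask


def sack_style(received):
    """Hypothetical alternative: cumulative ACK plus out-of-order list.

    Ignores sequence number wrap-around for brevity.
    """
    cum = 0
    while cum + 1 in received:
        cum += 1
    return cum, sorted(s for s in received if s > cum)


got = {1, 2, 3, 5, 6}                 # message 4 is missing
print(bin(received_mask(6, got)))     # 0b11101
print(sack_style(got))                # (3, [5, 6])
print(received_mask(40, {1, 40}))     # 0: seq 1 is outside the window
```

The last line shows the corner case: once a sequence number is more than
32 behind the one being acknowledged, the mask can no longer say
anything about it, whereas a cumulative ack would still cover it.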


Section 5.6.2: "The received field bits in the ACK provide a high degree
of redundancy so that the sender can figure out which packets the
receiver has received and can then estimate packet loss rates. If the
sender also keeps track of the time at which recent sequence numbers
have been sent, the RTT can be estimated."

=> There is a lot of redundancy, indeed. But it would help to have a
short paragraph that explains how the sender actually estimates the
packet loss rate. And note that the wording is slightly misleading: IMHO
it is not the *redundancy* that enables the sender to estimate the
packet loss rate. Other acknowledgement schemes with less redundancy
would work as well.


Section 5.6.3.1: "In general, senders MAY implement any rate control
scheme of their choice, provided that it is REQUIRED to be no more
aggressive then TFRC[RFC5348]. The following section describes a simple,
inefficient scheme that complies with this requirement."

=> Maybe I am again missing something, but as far as I can see the
following sections do not fully explain how TFRC is implemented here.
For example, TFRC requires information about the segment size; this is
not considered here. In general, it is not clear to me why a simple
baseline implementation does not follow the TFRC protocol (RFC 5348)
more closely (or a light-weight user-space TCP-like protocol).
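
For reference, a sketch of the TFRC throughput equation from RFC 5348,
Section 3.1, showing where the segment size enters. The function and
parameter names are my own; b = 1 and t_RTO = 4*R are the values
recommended in RFC 5348.

```python
from math import sqrt


def tfrc_rate(s, rtt, p, b=1, t_rto=None):
    """Allowed sending rate in bytes/second (RFC 5348, Section 3.1).

    s:     segment size in bytes
    rtt:   round-trip time R in seconds
    p:     loss event rate, 0 < p <= 1
    b:     packets acknowledged by a single ACK
    t_rto: retransmission timeout; defaults to 4 * rtt
    """
    if t_rto is None:
        t_rto = 4 * rtt
    denom = (rtt * sqrt(2 * b * p / 3)
             + t_rto * (3 * sqrt(3 * b * p / 8) * p * (1 + 32 * p ** 2)))
    return s / denom
```

The rate is directly proportional to s, so a sender cannot check that it
is "no more aggressive than TFRC" without some notion of segment size.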


Section 5.6.3.1.1: "In each retransmission, the sequence number is
incremented."

=> It would be worth spending a sentence on the rationale and
consequences of this. For instance, whether this implies that a receiver
may process both the original message and the retransmission.


Section 5.6.3.1.1: "Implementations that use a dynamic estimate to
compute the RTO MUST use the algorithm described in RFC 6298[RFC6298],
with the exception that the value of RTO SHOULD NOT be rounded up to the
nearest second but instead rounded up to the nearest millisecond."

=> This sentence doesn't cite RFC 6298 correctly. RFC 6298 doesn't
mandate rounding up to the nearest second. It mandates a minimum RTO of
1 second. And I strongly suggest using such a minimum RTO here as well
(a minimum of 500 ms might also be OK).
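
A minimal sketch of the RFC 6298 computation, to show where the minimum
applies. The class and parameter names are illustrative; the constants
(alpha = 1/8, beta = 1/4, K = 4, initial RTO of 1 second) are those of
RFC 6298, and the 1 ms clock granularity is an assumption.

```python
class RtoEstimator:
    ALPHA, BETA, K = 1 / 8, 1 / 4, 4

    def __init__(self, min_rto=1.0, granularity=0.001):
        self.min_rto = min_rto      # 1 s per RFC 6298; 0.5 s as discussed
        self.g = granularity        # assumed clock granularity G
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0              # initial RTO before any RTT sample

    def on_rtt_sample(self, r):
        """Update SRTT/RTTVAR with RTT sample r (seconds); return RTO."""
        if self.srtt is None:
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated first, using the old SRTT.
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - r))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        rto = self.srtt + max(self.g, self.K * self.rttvar)
        self.rto = max(self.min_rto, rto)   # lower bound, not rounding
        return self.rto
```

For a first sample of 50 ms, the unclamped value is about 150 ms; with
min_rto=1.0 the estimator returns 1 second, and with min_rto=0.5 it
returns 500 ms.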


Section 5.6.3.1.1: "Once an ACK has been received for a message, the
next message can be sent, but the peer SHOULD ensure that there is at
least 10 ms between sending any two messages.  The only time a value
less than 10 ms can be used is when it is known that all nodes are on a
network that can support retransmissions faster than 10 ms with no
congestion issues."

=> This last requirement cannot be achieved in practice, IMHO. It is
impossible to know congestion situations in advance unless the overlay
operates in a fully controlled environment without any risk of link
failures etc. Thus, this paragraph effectively seems to limit the peak
load per link to 100 messages/s. I could imagine that this is not
sufficient in very large overlays. Would it make sense to put this
parameter in the configuration file instead?
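
The 100 messages/s figure follows directly from the 10 ms spacing. The
sketch below makes the bound explicit; the function name is made up, and
the assumption that only one message is outstanding at a time is my
reading of "once an ACK has been received ... the next message can be
sent".

```python
def peak_messages_per_second(min_interval=0.010, rtt=0.0):
    """Upper bound on the message rate of a single link.

    min_interval: minimum spacing between two messages, in seconds
    rtt:          link round-trip time, in seconds; relevant only if
                  each message must be ACKed before the next is sent
                  (assumed reading of the quoted text)
    """
    return 1.0 / max(min_interval, rtt)


print(peak_messages_per_second())            # spacing-limited
print(peak_messages_per_second(rtt=0.050))   # RTT-limited
```

With the default spacing the bound is about 100 messages/s. Under the
one-outstanding-message reading, a 50 ms RTT lowers it further to about
20 messages/s, regardless of the configured spacing.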


Section 5.6.5: "Because the TCP layer's application-level timeout is too
slow to be useful for overlay routing, the Overlay Link implementation
MUST use the framing header to measure the RTT of the connection and
calculate an RTO as specified in Section 2 of [RFC6298]. The resulting
RTO is not used for retransmissions, but as a timeout to indicate when
the link SHOULD be removed from the routing table."

=> Again, instead of such implicit statements, I miss an explicit
statement in the document explaining that reliability, congestion
control, flow control etc. do not have to be provided by the overlay
link protocol when running over TCP.


Editorial issues:

Section 5.3.2.1: "Because the sequence number may in principle wrap,
greater than or less than are interpreted by modulo arithmetic as in
TCP."

=> Maybe better "... as sequence numbers in TCP"?
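
For readers unfamiliar with the TCP convention, a short sketch of a
wrap-safe "less than" comparison. The function name and the default
field width of 32 bits (as in TCP) are assumptions; the width should be
set to that of the actual RELOAD field.

```python
def seq_lt(a, b, bits=32):
    """True if sequence number a precedes b in modular arithmetic.

    a precedes b when the forward distance from a to b is non-zero and
    less than half of the sequence space.
    """
    mod = 1 << bits
    return a != b and (b - a) % mod < mod // 2


print(seq_lt(0xFFFFFFFF, 0))   # True: 0 follows the wrap
print(seq_lt(0, 0xFFFFFFFF))   # False
print(seq_lt(5, 5))            # False
```

Two values exactly half the sequence space apart are not ordered in
either direction by this comparison.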


Section 5.5.1: "A node sends an Attach request when it wishes to
establish a direct TCP or UDP connection to another node for the purpose
of sending RELOAD messages."

=> As correctly noted later in Section 5.6, UDP does not properly have
"connections". The rationale for still using that term should be moved
to Section 5.5.1.


Section 5.5.1.8 and following sections: "An agent MUST skip the
verification procedures in Section 5.1 and 6.1 of ICE."

=> Less confusing would be "... of the ICE specification".


All Sections: Please ensure that concepts and acronyms are explained
when they are first used.


I apologize if I have missed something.

Michael