Hi, I've reviewed this document as part of the transport area directorate's ongoing effort to review key IETF documents. These comments were written primarily for the transport area directors, but are copied to the document's authors for their information and to allow them to address any issues raised. The authors should consider this review together with any other last-call comments they receive. Please always CC [email protected] if you reply to or forward this review.
This draft is on the right track but has open issues, described in the review.

Given the length and complexity of the document, I only comment on transport issues. However, for what it is worth, my personal impression is that other sections not addressed by this review seem to be somewhat experimental.

My main concern is the Forwarding and Link Management Layer. It is designed to run over different transports, most notably TCP and UDP. Having such a generic transport including built-in NAT traversal support is very valuable. But I find parts of the specification confusing, as explained below. Also, building a simple, generic transport mechanism on top of UDP and TCP (or DTLS and TLS) is actually something that is hardly specific to RELOAD.

Main concerns:

Section 1.2.5: "This layer also utilizes a framing header to encapsulate messages as they are forwarding along each hop. This header aids reliability congestion control, flow control, etc. It has meaning only in the context of that individual link." (BTW: Note that a comma is missing after "reliability".)

=> This already illustrates my concern, which is further detailed below: If TCP (or SCTP) transport is used, reliability, congestion control, and flow control are already solved by the transport protocol. Does the Forwarding and Link Management Layer duplicate these functions then? This and potential interactions are not well described in the document.

Section 5.2.1: "Because messages may be lost in transit through the overlay, RELOAD incorporates an end-to-end reliability mechanism. When an originating node transmits a request it MUST set a 3 second timer. If a response has not been received when the timer fires, the request is retransmitted with the same transaction identifier. The request MAY be retransmitted up to 4 times (for a total of 5 messages). After the timer for the fifth transmission fires, the message SHALL be considered to have failed."

=> It may make sense to back off this timer exponentially.
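To illustrate the trade-off, here is a minimal sketch that compares the fixed 3 second timer from Section 5.2.1 with an exponential variant. The function name and the doubling factor are my own assumptions for illustration; the draft specifies neither.

```python
def retransmission_schedule(initial_timeout=3.0, max_transmissions=5, factor=2.0):
    """Return the timer value (in seconds) armed after each transmission.

    initial_timeout and max_transmissions follow Section 5.2.1 of the draft;
    factor is a hypothetical backoff multiplier (1.0 reproduces the fixed timer).
    """
    return [initial_timeout * factor ** i for i in range(max_transmissions)]


fixed = retransmission_schedule(factor=1.0)    # behaviour as currently specified
backoff = retransmission_schedule(factor=2.0)  # assumed exponential backoff

# Time until the request is declared failed = sum of all timer intervals.
print(fixed, sum(fixed))      # [3.0, 3.0, 3.0, 3.0, 3.0] 15.0
print(backoff, sum(backoff))  # [3.0, 6.0, 12.0, 24.0, 48.0] 93.0
```

With these assumed values, backoff reduces the load offered to a congested overlay but delays failure detection from 15 s to 93 s, so a cap on the timer or fewer retransmissions might be needed.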
Or, at least it would be useful to explain why not to back off. 3 sec is the initial retransmission timeout of older TCP stacks, i.e., in worst-case scenarios even the transfer over a single overlay hop can incur a 3 sec delay.

Section 5.5.1.6: "Highest priority is assigned to protocols that offer well-understood congestion and flow control without head of line blocking. For example, SCTP without message ordering, DCCP, or those protocols encapsulated using UDP. [...] Second highest priority is assigned to protocols that offer well-understood congestion and flow control but have head of line blocking such as TCP."

=> While SCTP may be better suited for signaling applications than TCP, I wonder why DCCP should have a higher priority than TCP. AFAIK, DCCP offers no reliability and is not built for signaling transport. Any application-level retransmission mechanism on top of DCCP is likely to be slower than the transport-level mechanisms offered by SCTP and TCP. Maybe I am missing something?

(As a side note: Research results have shown that head-of-line blocking only has a significant impact for high packet loss rates (> 1%). In that case, the link is seriously congested and congestion control will significantly throttle the data transfer. Head-of-line blocking might then not be the most important delay component. Personally, I don't think that avoiding head-of-line blocking is the most important advantage of SCTP over TCP, and thus I don't think that it has to be stressed in that section. But that is my personal view and it might not be shared by the whole research community.)

Section 5.6: "The Framing Header (FH) is used to frame messages and provide timing when used on a reliable stream-based transport protocol. Simple Reliability (SR) makes use of the FH to provide congestion control and semi-reliability when using unreliable message-oriented transport protocols."
=> In this and in the following sections, the operation on top of a reliable congestion-controlled protocol (TCP, possibly SCTP in future) should be better separated from datagram transport (UDP). The required functions in the Overlay Link Layer seem to be quite different.

Sections 5.6.1.1, 5.6.1.3, and 5.6.1.4

=> These cases are apparently not covered by this spec. Move to an appendix?

Section 5.6.2: "The same header is used for both reliable and unreliable transports for simplicity of implementation."

=> True, one can do that. But as both endpoints have to agree on a transport anyway, it would not be overly complex to use headers that are better aligned with the features of the underlying transport, right? I.e., one could use a more complex header only if UDP transport is needed. RELOAD has a similar flexibility concerning data structures in many other parts of the protocol.

Section 5.6.2: "When the receiver receives a message, it SHOULD immediately send an ACK message. The receiver MUST keep track of the 32 most recent sequence numbers received on this association in order to generate the appropriate ack."

=> Maybe I am completely lost here: Why is this MUST needed for TCP/TLS links? In that case, messages arrive in order on a link, i.e., acking the most recent one is sufficient. All previous ones must already have been received.

Section 5.6.2: "received A bitmask indicating if each of the previous 32 sequence numbers before this packet has been among the 32 packets most recently received on this connection."

=> I had a hard time understanding this sentence and its implications. I guess that this bitmask doesn't ensure full reliability, at least in corner cases (e.g., 33 messages being lost in sequence). In other words, this is a partial reliability scheme. I wonder why not to use a cumulative ack with a bitmask acking out-of-order data, similar to TCP's SACK.
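To make the SACK-like alternative concrete, here is a rough receiver-side sketch. Everything in it is my own assumption for illustration (the class name, sequence numbers starting at 1, and ignoring 32-bit wrap-around); it is neither what the draft specifies nor a complete design.

```python
class SackLikeReceiver:
    """Hypothetical receiver state: cumulative ack plus 32-bit out-of-order mask."""

    WINDOW = 32  # number of sequence numbers covered by the bitmask

    def __init__(self):
        self.cum_ack = 0           # every sequence number <= cum_ack was received
        self.out_of_order = set()  # received sequence numbers above cum_ack + 1

    def receive(self, seq):
        """Record one message and return the (cum_ack, bitmask) pair to send."""
        # Duplicates do not change the state; the same ACK is simply repeated.
        if seq > self.cum_ack and seq not in self.out_of_order:
            self.out_of_order.add(seq)
            # Advance the cumulative point over any gap that is now filled.
            while self.cum_ack + 1 in self.out_of_order:
                self.cum_ack += 1
                self.out_of_order.remove(self.cum_ack)
        return self.ack()

    def ack(self):
        """Bit i of the mask is set if sequence number cum_ack + 1 + i was received."""
        mask = 0
        for seq in self.out_of_order:
            offset = seq - self.cum_ack - 1
            if offset < self.WINDOW:
                mask |= 1 << offset
        return self.cum_ack, mask


receiver = SackLikeReceiver()
print(receiver.receive(1))  # (1, 0)
print(receiver.receive(3))  # (1, 2): message 2 is missing, message 3 is held
print(receiver.receive(3))  # (1, 2): duplicate, same ACK
print(receiver.receive(2))  # (3, 0): gap filled, cumulative ack advances
```

In this sketch the cumulative ack never advances past a missing message, so a burst of more than 32 losses cannot be silently skipped, and a duplicate only triggers a repeated ACK.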
Also, the requirement of only acking the "32 packets most recently received" may have undesired effects (e.g., what happens in case of packet duplication?). In fact, instead of this customized scheme, running a small subset of TCP's mechanisms in user space might just be fine, and I don't understand why this would be more complex than implementing this scheme.

Section 5.6.2: "The received field bits in the ACK provide a high degree of redundancy so that the sender can figure out which packets the receiver has received and can then estimate packet loss rates. If the sender also keeps track of the time at which recent sequence numbers have been sent, the RTT can be estimated."

=> There is a lot of redundancy, indeed. But it would help to have a short paragraph that explains how the sender actually estimates the packet loss rate. And note that the wording is slightly misleading: IMHO it is not the *redundancy* that enables the sender to estimate the packet loss rate. Other acknowledgement schemes with less redundancy would work as well.

Section 5.6.3.1: "In general, senders MAY implement any rate control scheme of their choice, provided that it is REQUIRED to be no more aggressive then TFRC[RFC5348]. The following section describes a simple, inefficient scheme that complies with this requirement."

=> Maybe I am again missing something, but as far as I can see the following sections do not fully explain how TFRC is implemented here. For example, TFRC requires information about the segment size; this is not considered here. In general, it is not clear to me why a simple baseline implementation does not follow the TFRC protocol (RFC 5348) more closely (or a light-weight user-space TCP-like protocol).

Section 5.6.3.1.1: "In each retransmission, the sequence number is incremented."

=> It would be worth spending a sentence on the rationale and consequences of this.
For instance, whether this implies that a receiver may process both the original message and the retransmission.

Section 5.6.3.1.1: "Implementations that use a dynamic estimate to compute the RTO MUST use the algorithm described in RFC 6298[RFC6298], with the exception that the value of RTO SHOULD NOT be rounded up to the nearest second but instead rounded up to the nearest millisecond."

=> This sentence doesn't cite RFC 6298 correctly. RFC 6298 doesn't mandate rounding up to the nearest second. It mandates a minimum RTO of 1 second. And I strongly suggest using such a minimum RTO as well (maybe 500 ms as a minimum would be OK, too).

Section 5.6.3.1.1: "Once an ACK has been received for a message, the next message can be sent, but the peer SHOULD ensure that there is at least 10 ms between sending any two messages. The only time a value less than 10 ms can be used is when it is known that all nodes are on a network that can support retransmissions faster than 10 ms with no congestion issues."

=> This last requirement cannot be achieved in practice, IMHO. It is impossible to know congestion situations in advance unless the overlay operates in a fully controlled environment without any risk of link failures etc. Thus, this paragraph effectively seems to limit the peak load per link to 100 messages/s (one message per 10 ms). I could imagine that this is not sufficient in very large overlays. Would it make sense to put this parameter in the configuration file instead?

Section 5.6.5: "Because the TCP layer's application-level timeout is too slow to be useful for overlay routing, the Overlay Link implementation MUST use the framing header to measure the RTT of the connection and calculate an RTO as specified in Section 2 of [RFC6298]. The resulting RTO is not used for retransmissions, but as a timeout to indicate when the link SHOULD be removed from the routing table."
=> Again, instead of such implicit statements, I miss an explicit statement in the document explaining that reliability, congestion control, flow control, etc. do not have to be provided by the overlay link protocol when running over TCP.

Editorial issues:

Section 5.3.2.1: "Because the sequence number may in principle wrap, greater than or less than are interpreted by modulo arithmetic as in TCP."

=> Maybe better "... as sequence numbers in TCP"?

Section 5.5.1: "A node sends an Attach request when it wishes to establish a direct TCP or UDP connection to another node for the purpose of sending RELOAD messages."

=> As correctly noted later in Section 5.6, UDP does not properly have "connections". The rationale for still using that term should be moved to Section 5.5.1.

Section 5.5.1.8 and following sections: "An agent MUST skip the verification procedures in Section 5.1 and 6.1 of ICE."

=> Less confusing would be "... of the ICE specification".

All sections: Please ensure that concepts and acronyms are explained when they are first used.

I apologize if I have missed something.

Michael
