Re: [P2PSIP] tsv-dir review of draft-ietf-p2psip-base-18

Bruce Lowekamp Mon, 05 Sep 2011 12:05:18 -0700

Thanks for the review.  Responses inline

Bruce

On Sun, Sep 4, 2011 at 8:50 AM, SCHARF, Michael
<[email protected]> wrote:
> Hi,
>
> I've reviewed this document as part of the transport area directorate's
> ongoing effort to review key IETF documents. These comments were written
> primarily for the transport area directors, but are copied to the
> document's authors for their information and to allow them to address
> any issues raised. The authors should consider this review together with
> any other last-call comments they receive. Please always CC
> [email protected] if you reply to or forward this review.
>
>
> This draft is on the right track but has open issues, described in the
> review. Given the length and complexity of the document, I only comment
> on transport issues. However, for what it is worth, my personal
> impression is that other sections not addressed by this review seem to
> be somehow experimental.
>
> My main concern is the Forwarding and Link Management Layer. It is
> designed to run over different transports, most notably TCP and UDP.
> Having such a generic transport including built-in NAT traversal support
> is very valuable. But I find parts of the specification confusing, as
> explained below. Also, building a simple, generic transport mechanism on
> top of UDP and TCP (or, DTLS and TLS) is actually something that is
> hardly specific to RELOAD.

Totally agree.  One of the reasons we left room for extension points
for new transports is in the hope that a more general solution will
become mature enough to be dropped in at a later date.   In line with
that idea, we tried to go for overly simple to avoid spending too much
time specifying complexity that we hope would be replaced with a
better solution.

>
>
> Main concerns:
>
> Section 1.2.5: "This layer also utilizes a framing header to encapsulate
> messages as they are forwarding along each hop.  This header aids
> reliability congestion control, flow control, etc.  It has meaning only
> in the context of that individual link." (BTW: Note that a comma is
> missing after reliability.)
>
> => This already well explains my concern that is further detailed below:
> If TCP (or SCTP) transport is used, reliability, congestion control,
> flow control are already solved by the transport protocol. Does the
> Forwarding and Link Management Layer duplicate these functions then?
> This and potential interactions are not well described in the document.

Within the role of the overlay network, TCP/SCTP are used as
link-layer protocols for hop-by-hop connections (part of forwarding
and link management, as you observe).  However, e2e control is still
needed and provided by the Message Transport layer.  While we may get
some benefit from TCP/SCTP over the individual hops, they don't
address the end-to-end transport issues.

>
>
> Section 5.2.1: "Because messages may be lost in transit through the
> overlay, RELOAD incorporates an end-to-end reliability mechanism.  When
> an originating node transmits a request it MUST set a 3 second timer. If
> a response has not been received when the timer fires, the request is
> retransmitted with the same transaction identifier. The request MAY be
> retransmitted up to 4 times (for a total of 5 messages).  After the
> timer for the fifth transmission fires, the message SHALL be considered
> to have failed."
>
> => It may make sense to exponentially backoff this timer. Or, at least
> it would be useful to explain why not to backoff. 3 sec is the initial
> retransmission timeout duration of older TCP stacks, i. e., in worst
> case scenarios even the transfer over a single overlap hop can result in
> 3 sec delay.

I think we had a discussion about this timer at some point.  I would
hope we don't need to worry about such ancient TCP stacks, but a more
realistic problem is the RTT for the multi-hop path across a large
overlay exceeding 3 seconds.  I admit I thought we had turned this
into a configurable parameter, at least.

There might be an argument that this should actually be adaptive,
applying something like 6298's RTO estimator across all e2e RTTs on
the overlay network and using that as an initial timeout for each new
message.  However, given the high variance in RTTs, it's not clear the
effect this would have on performance, and it would add a lot of
complexity to the implementation.  Given that the underlying
hop-by-hop transport has congestion control already, we decided to
leave it as a simple constant as network stability shouldn't be a
factor here.

>
>
> Section 5.5.1.6: "Highest priority is assigned to protocols that offer
> well-understood congestion and flow control without head of line
> blocking. For example, SCTP without message ordering, DCCP, or those
> protocols encapsulated using UDP. [...] Second highest priority is
> assigned to protocols that offer well-understood congestion and flow
> control but have head of line blocking such as TCP."
>
> => While SCTP may be better suited for signaling applications than TCP,
> I wonder why DCCP should have a high priority than TCP. AFAIK, DCCP
> offers no reliability and is not build for signaling transport. Any
> application-level retransmission mechanism on top of DCCP is likely to
> be slower than transport protocol mechanism offered by SCTP and TCP.
> Maybe I miss something?

We don't care about hop-by-hop reliability here---so a datagram-based
protocol with congestion control seems ideal.

>
> (As a side note: Research results have shown that head-of-line blocking
> only has a significant impact for high packet loss rates (> 1%). In that
> case, the link is seriously congested and congestion control will
> significantly throttle the data transfer. Head-of-line blocking might
> then not be the most important delay component. Personally, I don't
> think that avoiding head-of-line blocking is the most important
> advantage of SCTP over TCP, and thus I don't think that it has to be
> stressed in that section. But that is my personal view and it might not
> be shared by the whole research community.)

Without being completely sure what results you're referring to, I'm
going to assume that the analysis is whether HoL blocking affects the
overall throughput of a TCP link, which I agree it shouldn't given
modern SACK/retransmission etc.  However, as we're using TCP here as
the hop-by-hop link layer underlying a higher-level e2e protocol in
the overlay network, the overall throughput of the hop isn't
necessarily as important as eliminating the latency imposed by HoL
blocking, which is there whenever there's a loss that requires
retransmission and a delay in delivering subsequent messages to the
application layer, and would greatly increase the e2e latency (and
variance in latency) of the messages.

>
>
> Section 5.6: "The Framing Header (FH) is used to frame messages and
> provide timing when used on a reliable stream-based transport protocol.
> Simple Reliability (SR) makes use of the FH to provide congestion
> control and semi-reliability when using unreliable message-oriented
> transport protocols."
>
> => In this and in followup sections, the operation on top of a reliable
> congestion-controlled protocol (TCP, possibly SCTP in future) should be
> better separated from datagram transport (UDP). The required functions
> in the Overlay Link Layer seem to be quite different.
>

why?  e2e issues seem the same---even between consecutive TCP hops,
the peers may experience congestion on the outgoing link and be forced
to drop messages.

>
> Sections 5.6.1.1, Section 5.6.1.3, Section 5.6.1.4
>
> => These cases are apparently not covered by this spec. Move to
> appendix?
>

We agreed a long time ago to leave things in 5.6.1.x in as callouts to
future transports that were hoped to be added.  I have no particular
opinion whether it should be there or in an appendix.

>
> Section 5.6.2: "The same header is used for both reliable and unreliable
> transports for simplicity of implementation."
>
> => True, one can do that. But as both endpoints anyway have to agree to
> on a transport, it would not be overly complex to use headers that are
> better aligned with the features of the underlying transport, right? I.
> e., one could use a more complex header if UDP transport is needed.
> RELOAD has a similar flexibility concerning data structures in many
> other parts of the protocol.
>
>
> Section 5.6.2: "When the receiver receives a message, it SHOULD
> immediately send an ACK message.  The receiver MUST keep track of the 32
> most recent sequence numbers received on this association in order to
> generate the appropriate ack."
>
> => Maybe I am completely lost here: Why is this MUST needed for TCP/TLS
> links? In that case, messages arrive in order on a link, i. e., acking
> the most recent one is sufficient. All previous ones must already have
> been received.

Given the already significant length and complexity of the draft and
protocol, I think "simplicity of implementation" is a significant
concern.  I don't see a negative impact here.

>
>
> Section 5.6.2: " received  A bitmask indicating if each of the previous
> 32 sequence numbers before this packet has been among the 32 packets
> most recently received on this connection."
>
> => I had a hard time in understanding this sentence, and it
> implications. I guess that this bitmask doesn't ensure full reliability,
> at least in corner cases (e. g. 33 messages being lost in sequence). In
> other words, this is a partial reliability scheme. I wonder why not to
> use a cumulative ack with a bitmask acking out-of-order data, similar to
> TCP's SACK. Also, the requirement of only acking the "32 packets most
> recently received" may have undesired effects (e. g., what happens in
> case of packet duplication?). In fact, instead of this customized
> scheme, running a small subset of TCP's mechanisms in the user space
> might just be fine, and I don't understand why this would be more
> complex than implementing this scheme.
>

below

>
> Section 5.6.2: "The received field bits in the ACK provide a high degree
> of redundancy so that the sender can figure out which packets the
> receiver has received and can then estimate packet loss rates. If the
> sender also keeps track of the time at which recent sequence numbers
> have been sent, the RTT can be estimated."
>
> => There is a lot of redunancy, indeed. But it would help to have a
> short paragraph that explains how the sender indeed estimates the packet
> loss rate. And note that the wording is slightly misleading: IMHO it is
> not the *redundancy* that enables the sender to estimate the packet loss
> rate. Other acknowlegdement schemes with less redundancy would work as
> well.
>

The discussion about what protocol to use here was one of the most
controversial issues we had to address (both among the authors and
within the wg).  While I personally wanted to include a more tcp-like
layer here, I was outvoted on the issue on the ground that what we
have now is simpler (at least to those not sufficiently familiar with
TCP), safe for network stability, and has sufficient performance for
most overlays.  We hope that more work will come out of TSV on
appropriate user-layer/tunneled over UDP protocols that can be used to
replace what's here in the future, but we decided to leave such work
out of this draft as it was already sufficiently complexed.  (and at
the time I had hoped to spend time working with folks in TSV on such
issues, but have had to curtail my IETF time recently as my current
position doesn't support such work)

>
> Section 5.6.3.1: "In general, senders MAY implement any rate control
> scheme of their choice, provided that it is REQUIRED to be no more
> aggressive then TFRC[RFC5348]. The following section describes a simple,
> inefficient scheme that complies with this requirement."
>
> => Maybe I again miss something, but as far as I can see the following
> sections doesn't fully explain how TFRC is implemented here. For
> example, TFRC requires information about the segment size; this is not
> considered here. In general, it is not clear to me why a simple baseline
> implementation does not follow the TFRC protocol (RFC 5348) more closely
> (or a light-weight user-space TCP-like protocol).
>

again, simplicity won out.  We believe what's there is a simpler
protocol (the simplest we could think of) that follows the requirement
of being less aggressive than TFRC.

>
> Section 5.6.3.1.1: "In each retransmission, the sequence number is
> incremented."
>
> => It would be worth to spend a sentence about the rational and
> consequences of this. For instance, whether this implies that a receiver
> may process the original message and the retransmission twice.
>

Happy to spend a sentence on this, but just to be clear, this is
designed to allow the world's dumbest stop-and-wait protocol.
Specifically, the goal of 5.6.3.1.1 is to specify a ludicrously
simple, totally safe algorithm that is sufficient for most envisioned
deployments of P2PSIP for implementors who don't want to spend the
time implementing TFRC.  I would hope most do spend the time, but it
seemed unnecessary to specify how TFRC works in this document.

As far as the separate sequence numbers, they can be used because
there's no issue with processing here as with TCP sequence numbers
(this is the link-level hop-by-hop protocol), and if you want to do
something fancier than stop-and-wait, it avoids ambiguity in
calculating RTO (i.e. you don't need Karn's algorithm).  Essentially,
the receiver should is designed to be compliant with a much more
intelligent sender than the stop-and-wait specified in this section,
similar to the simplest TCP receivers.

>
> Section 5.6.3.1.1: "Implementations that use a dynamic estimate to
> compute the RTO MUST use the algorithm described in RFC 6298[RFC6298],
> with the exception that the value of RTO SHOULD NOT be rounded up to the
> nearest second but instead rounded up to the nearest millisecond."
>
> => This sentence doesn't cite RFC 6298 correctly. RFC 6298 doesn't
> mandate to round up to the nearest second. It mandates a minimum RTO of
> 1 second. And I strongly suggest to use such a minimum RTO as well
> (maybe 500ms as minimum would be OK as well).
>

I think you refer to where 6298 says:

         Until a round-trip time (RTT) measurement has been made for a
         segment sent between the sender and receiver, the sender SHOULD
         set RTO <- 1 second, though the "backing off" on repeated
         retransmission discussed in (5.5) still applies.

there's no need for that as ICE has already completed, so we have an
RTT measurement.

>
> Section 5.6.3.1.1: "Once an ACK has been received for a message, the
> next message can be sent, but the peer SHOULD ensure that there is at
> least 10 ms between sending any two messages.  The only time a value
> less than 10 ms can be used is when it is known that all nodes are on a
> network that can support retransmissions faster than 10 ms with no
> congestion issues."
>
> => This last requirement cannot be achieved in practice, IMHO. It is
> impossible to know in advance congestion situations unless the overlay
> operates in a fully controlled environment without any risk of link
> failures etc.

The operating environment referred to would be a LAN or similar.  (more below)

> Thus, this paragraph effectively seems to limit the peak
> load per link to 100 messages/s. I could imagine that this is not
> sufficient in very large overlays.

Agreed.  This protocol isn't intended to be operated in such an
environment (would need TFRC, etc for that), but for many of the
envisioned deployments of P2PSIP, it should be fine. (or some future
emerging option)

> Would it make sense to put this
> parameter in the configuration file instead?

I guess it's a valid point that you can't determine if you're in a
network where you can do better without a configuration option.  I
hate to spend the time overthinking this option, given its deliberate
limited applicability.  Maybe we should just remove that.

>
>
> Section 5.6.5: "Because the TCP layer's application-level timeout is too
> slow to be useful for overlay routing, the Overlay Link implementation
> MUST use the framing header to measure the RTT of the connection and
> calculate an RTO as specified in Section 2 of [RFC6298]. The resulting
> RTO is not used for retransmissions, but as a timeout to indicate when
> the link SHOULD be removed from the routing table."
>
> => Again, instead of such implicit statements, I somehow miss in the
> document an explicit statement that explains that relability, congestion
> control, flow control etc. does not have to be provided by the overlay
> link protocol when running over TCP.

Can add something to the beginning of 5.6 to this effect.
_______________________________________________
P2PSIP mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/p2psip

Re: [P2PSIP] tsv-dir review of draft-ietf-p2psip-base-18

Reply via email to