One of the (if not THE) fundamental decisions that make the Internet work is that reliability is end-to-end, not hop-by-hop. Hops are allowed to drop traffic due to congestion, and that drop is taken as implicit feedback that there is congestion in the network.

The proposal I sent out drops traffic in two places. First, each peer maintains a limited queue for fragments and drops excess fragments (the literal analog of drops due to congestion in routers). Second, the overlay link protocol is only semi-reliable (which I debated having at all), with the assumption that on the Internet, loss is due to congestion. So we have two congestion signals to the peer: queue length and link protocol drops. Congestion needs to cause load shedding in a network (overlay or not) or else it will collapse. Arguing that we should do extra work for hop-by-hop reliability makes no sense to me.
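Concretely, the fragment queue behaves something like this (a sketch only; the limit and the names are illustrative, not values from the proposal):

    from collections import deque

    MAX_FRAGMENTS = 64  # illustrative bound, not a number from the proposal

    class FragmentQueue:
        """Per-peer fragment queue with tail drop."""

        def __init__(self, limit=MAX_FRAGMENTS):
            self.queue = deque()
            self.limit = limit

        def enqueue(self, fragment):
            # Tail drop: excess fragments are discarded, exactly as a
            # congested router drops packets. The drop itself is the
            # congestion signal; no error is propagated hop-by-hop.
            if len(self.queue) >= self.limit:
                return False  # dropped -> implicit congestion feedback
            self.queue.append(fragment)
            return True

        def dequeue(self):
            return self.queue.popleft() if self.queue else None

A sender whose fragments stop arriving backs off, exactly as a TCP sender does on loss.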
Bruce

On Tue, Apr 7, 2009 at 10:54 PM, Henning Schulzrinne <[email protected]> wrote:
> The problem is that this will collapse (say) 10 times sooner than in a
> non-P2P network (assuming 10 hops).
>
> Plus, getting the round-trip time estimation right is going to be that much
> harder the more hops there are, leading to spurious retransmissions. After
> all, each hop is likely, on average, to transit a fair chunk of the Internet
> if the P2P network is global. With hop-by-hop RTT estimation, you at least
> have a chance to re-use the same set of "links" reasonably frequently, e.g.,
> to finger table neighbors, so that RTT estimates are at least plausible.
> Each end-to-end path will be one-off, so the RTT estimate can only be
> worst-case, and that has to be significantly larger than the initial TCP
> estimate, given that you're traversing good spans of the Internet several
> (but an unknown number of) times.
>
> Betting that all messages are small seems like a dangerous gamble. We lost
> that one before. (Not just SIP - who would have thought that DNS messages
> could become fairly substantial?)
>
> Dropping messages mid-stream is a great way to design a protocol that
> doesn't meet the reliability requirements that "competition" with
> client-server mechanisms requires, and it makes diagnosis really hard, as
> you could have failures that depend on who is asking the question, and
> other random behavior (e.g., which replica is being asked). P2P systems
> are hard enough to debug without adding more obscure failure modes.
>
> My opinion is either to do it right or not to do it at all. Trying to cut
> corners now, to show how "simple" designing transport is, will just lead to
> regret later. We've been there, and some of us don't care to re-visit that
> place.
>
> In general, we're in great danger of designing a protocol that is far more
> complicated than needed for the simple URL-to-IP-address mapping (you don't
> need "kinds" and customizable policies for that...), yet insufficient to
> handle anything more demanding.
>
> Henning
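(For reference: the per-link estimator Henning describes is the standard SRTT/RTTVAR computation. A minimal sketch, assuming RFC 2988-style constants and one estimator per overlay link; the class name is made up:)

    # SRTT/RTTVAR retransmission-timeout estimator (RFC 2988 style),
    # kept per overlay link so samples to the same neighbor are reused.
    class RttEstimator:
        ALPHA, BETA, K = 1/8, 1/4, 4
        INITIAL_RTO = 3.0  # conservative first guess, in seconds

        def __init__(self):
            self.srtt = None
            self.rttvar = None
            self.rto = self.INITIAL_RTO

        def sample(self, rtt):
            # Feed in one measured round-trip time, in seconds.
            if self.srtt is None:
                self.srtt, self.rttvar = rtt, rtt / 2
            else:
                self.rttvar = ((1 - self.BETA) * self.rttvar
                               + self.BETA * abs(self.srtt - rtt))
                self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
            self.rto = max(1.0, self.srtt + self.K * self.rttvar)

Keyed per neighbor, e.g., per finger-table entry, the estimator sees repeated samples and converges. Keyed per end-to-end overlay path, it rarely sees the same path twice, so the timeout stays pinned near its worst-case initial value, which is Henning's point.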
> On Apr 7, 2009, at 10:09 PM, Bruce Lowekamp wrote:
>
>> I think that if messages are typically only a few (<= 3) fragments, we
>> can safely ignore this issue. If the loss rate is high enough that it
>> matters, the network will collapse anyway. Also remember that what
>> happens is there's a retransmission of all of the fragments, which
>> might waste some traffic, but the receiver can still have the other
>> fragments waiting, so they're not completely useless.
>>
>> OTOH, if messages are typically going to be even larger (or need to
>> deal with higher loss rates), I would favor fixing the problem
>> end-to-end, not through hop-by-hop reliability. I hesitated adding
>> the minimal semi-reliability to the proposal I wrote up. It's just a
>> lot of complexity and latency added for something that I'm not sure is
>> a good addition to the overall stability of the network.
>>
>> It's important to be willing to drop traffic mid-network. That's the
>> only way the network can shed load. The senders will back off.
>> (Currently they back off on a per-transaction basis. It's an
>> interesting question whether they should do more.)
>>
>> Bruce
>>
>> On Mon, Apr 6, 2009 at 11:16 PM, Salman Abdul Baset
>> <[email protected]> wrote:
>>>
>>> Are you saying hop-by-hop reliability is not needed? A received
>>> message is useless unless all fragments are reliably received.
>>>
>>> If indeed hop-by-hop reliability is needed, then the link-layer
>>> analogy is unfortunately not right.
>>>
>>> -s
>>>
>>> On Mon, 6 Apr 2009, Bruce Lowekamp wrote:
>>>
>>>> Fortunately, reliability is not important for the overlay link
>>>> protocol. It's a link layer.
>>>>
>>>> Bruce
>>>>
>>>> On Mon, Apr 6, 2009 at 2:02 PM, Salman Abdul Baset <[email protected]>
>>>> wrote:
>>>>>
>>>>> TFRC is a congestion control protocol. What is intended is a reliable
>>>>> congestion control protocol over UDP, which does not exist as a
>>>>> specification. Reliability cannot merely be sprinkled on top of TFRC
>>>>> or any other congestion control protocol.
>>>>>
>>>>> -s
>>>>>
>>>>> On Mon, 6 Apr 2009, Bruce Lowekamp wrote:
>>>>>
>>>>>> Yes, that's a valid option. Probably the right one at this point. In
>>>>>> fact, the solution space as I last sent it out looks like:
>>>>>>
>>>>>> - stop and wait
>>>>>> - simplified AIMD
>>>>>> - TFRC
>>>>>> - (TCP over UDP might go here if a draft existed)
>>>>>> - TCP
>>>>>>
>>>>>> The argument has mostly been about the simplified AIMD. I kind of
>>>>>> hate to lose it, but you're right, it will be a lot less
>>>>>> controversial if we do. Actually, TFRC is a pretty flexible protocol
>>>>>> itself, so we can probably just do something a bit more within that
>>>>>> framework if we want to have options.
>>>>>>
>>>>>> Stop and wait was always intended as a development-type protocol,
>>>>>> not a real deployable protocol (although it would work OK in a
>>>>>> small-office environment).
>>>>>>
>>>>>> Bruce
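(The "simplified AIMD" in that list is, at bottom, the textbook window rule. A schematic sketch, with no claim that it matches the draft's details:)

    # Schematic additive-increase / multiplicative-decrease window control.
    # Not the draft's mechanism, just the textbook rule it simplifies.
    class SimplifiedAimd:
        def __init__(self):
            self.cwnd = 1.0  # congestion window, in fragments

        def on_ack(self):
            self.cwnd += 1.0 / self.cwnd  # additive increase: ~ +1 per RTT

        def on_loss(self):
            self.cwnd = max(1.0, self.cwnd / 2)  # multiplicative decrease

Stop and wait is then just the degenerate case with the window pinned at one fragment in flight.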
>>>>>> On Mon, Apr 6, 2009 at 9:18 AM, Brian Rosen <[email protected]> wrote:
>>>>>>>
>>>>>>> Is "use TCP when it works, and TFRC when it doesn't" an answer?
>>>>>>> Arguments like "it's too complex" don't work for me when we're
>>>>>>> talking about transport protocols that have to do congestion
>>>>>>> control, etc. Congestion control is complex.
>>>>>>>
>>>>>>> Brian
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: [email protected] [mailto:[email protected]] On Behalf Of Lars Eggert
>>>>>>> Sent: Monday, April 06, 2009 5:20 AM
>>>>>>> To: Bruce Lowekamp
>>>>>>> Cc: Salman Abdul Baset; [email protected]
>>>>>>> Subject: Re: [P2PSIP] Solution space for fragmentation, congestion control and reliability
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 2009-4-6, at 6:07, Bruce Lowekamp wrote:
>>>>>>>>
>>>>>>>> We have the option of simply saying "use TFRC." That will give good
>>>>>>>> enough performance and require relatively little specification,
>>>>>>>> since TSV has already put a lot of work into it. It's also a bit
>>>>>>>> complicated, a lot more complicated than is really needed for most
>>>>>>>> p2psip implementations/deployments.
>>>>>>>>
>>>>>>>> So the motivation for the other options was to provide simpler
>>>>>>>> schemes that still give enough performance for many/most
>>>>>>>> deployments.
>>>>>>>
>>>>>>> I'd strongly urge you to use TFRC rather than rolling your own
>>>>>>> scheme. Don't underestimate the validation effort that is required
>>>>>>> to ensure that a congestion control scheme is safe to deploy. This
>>>>>>> has all been done for TFRC, and it must be done for any new scheme.
>>>>>>>
>>>>>>> Lars
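(For reference, the TFRC sending rate Lars recommends is governed by the TCP throughput equation of RFC 5348. A direct transcription, assuming s is the segment size in bytes, R the round-trip time in seconds, and p the loss event rate:)

    from math import sqrt

    def tfrc_rate(s, R, p, b=1):
        """TCP throughput equation used by TFRC (RFC 5348), bytes/second.

        s: segment size (bytes), R: round-trip time (seconds),
        p: loss event rate (0 < p <= 1), b: packets per ACK (1 by default).
        """
        t_rto = 4 * R  # RFC 5348's recommended simplification
        return s / (R * sqrt(2 * b * p / 3)
                    + t_rto * (3 * sqrt(3 * b * p / 8)) * p * (1 + 32 * p * p))

The point of reusing this rather than a home-grown scheme is exactly the validation argument above: the equation's fairness to TCP has already been analyzed.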
