My BoF report: multipath

Martin Thomson Thu, 22 Oct 2020 17:06:30 -0700

(I put a variation of this comment in the meeting and in Slack, but I wanted to 
expand on it some.  Sorry, but this got long.  Four hours is not enough sleep.)


Multipath seems pretty clearly useful for certain cases.  I think that the 
meeting today answered at least the first two of the BoF questions I posed 
earlier on the list.  So if we are to regard this as a BoF, it met its goals 
(thanks chairs).  There is some uncertainty about the first question about 
having a clear problem to solve, but I am of the view that we could muddle 
through with some combination of either ignoring our differences or working 
around them.  The third question regarding constituency is where I didn't find 
a satisfactory answer.  I want to be clear though, this is no fault of the 
proponents.  At the current time, I am convinced that formally starting work on 
multipath would be unwise.

Multipath aims to improve performance through lower latency, greater 
robustness, or higher throughput.  Application awareness and involvement in 
scheduling seemed to be the key factor in finding the optimal usage pattern or 
scheduling algorithm that allows multipath to deliver on those goals.  
Applications and users are in the best place to balance goals against other 
factors like cost or whatever else matters most.  (For reference, I recall 
Roberto and Christian making this point most clearly, but several others made 
it too.)  Christoph did a good job of showing how this applies to very 
specific use 
cases, and I thought I saw that in the Alibaba presentation also, but we didn't 
quite get enough time to get the necessary detail in either presentation.  One 
potential advantage in this regard is that QUIC implementations are often 
closer to applications, so they might be in a good position to integrate better.

However, many of the cases that were presented involved exactly the sort of 
opaque intermediation that is almost the antithesis of that ideal.  Similarly, 
David's assertion that multipath is orthogonal to MASQUE relies on the 
assumption that application involvement is not that important.  In these 
cases, it's not 
clear that using multipath is strictly good.  

I should unpack that a little.  For those people who are making scheduling 
decisions outside of the endpoint (possible examples being the satellite case 
and the 3GPP case), it's not clear that this is anything endpoints can prevent. 
 An endpoint probably can't stop a network provider from using ECMP either.  
Similarly, it is not clear how an application endpoint could be aware of these 
decisions at a level that would allow them to understand and adapt to this 
treatment.  The result is that these cases have a far more ambiguous value 
proposition.  Improvements come with trade-offs: for instance, the application 
might get better throughput, but it comes at a cost to latency.  So I conclude 
that while these intermediary-based designs might provide an aggregate gain, 
they will probably not realize the full performance gains that come from 
end-to-end awareness and control.

For IETF insiders, see also the BANANA and LOOPS BoFs, which were strictly 
network-based analogues of these.  Many of the same concerns that caused those 
BoFs to fail apply to these use cases.

Maybe we treat the application of the protocol to these questionable ends as 
acceptable collateral if we are able to deploy at the endpoints.  Maybe we 
allow intermediaries to seek marginal improvements, but try to ensure that we 
have a clear path to deploying something better in the long term.  But there is 
a risk that deployment in the network could interact poorly with more-ideal 
end-to-end solutions and even prevent those deployments.

These are systems-level questions that are large in scope and subtle in their 
effect.  I think that it will require considerable energy to resolve them.  Or, 
as seems more likely in my experience, we will spend even more time and effort 
trying to design a protocol while there are fundamental disagreements about 
the nature of the deployment models.

However, this isn't the only factor.  We are not deciding on the merits and 
value of multipath in a vacuum.  It was pretty clear that multipath has 
potential, at least in principle, or in certain cases.  I'm also mostly 
convinced now that we could produce a design.  There's some uncertainty, but it 
seems like we could tolerate that.  QUIC definitely wasn't a sure thing when we 
started out, and I can't expect any large effort to be risk-free.

So, with some uncertainty about use cases, I might still conclude that we have 
satisfactory answers to the first two BoF questions.  My concern here is about 
the third: constituency.

What I think is most important at this point is understanding whether this 
protocol will remain a single, coherent thing, so that we can keep building on 
the "synergies" that Spencer referred to.  No matter the technical merits of the 
protocol (it's great! probably!) that synergy is probably the most important 
feature that this working group has delivered with QUIC.  The details of the 
protocol matter less than the fact that we have a group of people committed to 
building and maintaining that protocol.  This working group needs to be the 
venue where work happens so that this community can continue to build on this 
success.

So for multipath, if we take it on, I'd only like to do so if I were convinced 
that a non-trivial proportion of the active deployments are committed to 
working on it and deploying the new extension or version.  That is, that this 
community wants to do the work.  I see no evidence of that yet, which is why I 
will claim that this fails to satisfactorily answer that third BoF question.

It is very easy for a splinter group to define a new version of QUIC that does 
anything.  draft-deconinck or draft-huitema could be the basis of that sort of 
effort and that could result in the definition of QUIC 84 or 0x0219c81 or 
whatever.  Call it QUICv2 if you really want.  But if that protocol is only 
used in certain narrow contexts, then it doesn't produce any of those 
synergies.  On the contrary, it works to undermine them, so I would prefer to 
avoid that.

So rather than ask whether multipath is doable, I think we need to instead 
decide what the QUIC working group - the group that built the core protocol - 
is doing next for that core protocol and the deployments that depend on it.  
Personally, I don't think that we're ready for another large project.  We need 
deployment experience with the protocol.  We also need to go in and backfill 
those pieces of QUIC we need for the next thing, like version negotiation.  For 
me, that's more than enough.

I've now seen a lot of enthusiasm for the idea of multipath.  There were some 
great presentations with convincing use cases.  There might be too much 
diversity in use cases or a schism in approaches, but we probably could, with 
sufficient energy, overcome that.  However, I have to conclude that this is not 
a good time for starting that work.

I realize that this is likely unsatisfactory to those who want multipath.  I 
also recognize that deferring work when there is such clear demand could result 
in that demand manifesting in a bunch of non-interoperable protocols.  Those 
are risks that we each have to assess for ourselves.

This will change over time.  I don't know how long it will take.  But it's not 
now.

Thanks for reading this far,
Martin