Re: Comments on draft-michel-quic-fec-01

Christian Huitema Mon, 22 Jan 2024 09:18:17 -0800

Thanks for the reply. Comments in line.

On 1/22/2024 3:27 AM, François Michel wrote:

Hi Christian,
Great to hear from you, especially given your expertise in the topic!
Thank you for all your comments. See my answers below.
I am adding Rachel to the loop, who was interested in progressing the draft together and ensuring the design can handle their use case as well.
On 19/01/24 at 07:29, Christian Huitema wrote:
François, Olivier,
I just spent some time studying your draft on QUIC FEC. I like the idea of having an FEC framework independent from the algorithm used to actually compute the FEC data and repair packets. Your draft solves a number of practical problems, such as how to notify peers when FEC helps receive a frame from an otherwise lost packet, or how to identify "symbols" independently of packet numbers using the symbol identifier frame (SID).
The draft is obviously a work in progress.
Yes, the aim of this draft and current papers under submission is to spark interest in the topic again. I've been working on FEC for years now and was part of the earlier QUIC-FEC work at NWCRG where we already wrote interesting drafts. My intent with this one is to propose a short and simple specification that people can wrap their heads around. We can then make it progress together with quicwg folks instead of proposing a first, exhaustive but complex draft that is difficult to grasp.
You propose two alternatives for linking frames to a SID. I wish you picked just one, and I prefer your first alternative, in which your SID frame brackets a list of protected frames.
I prefer the first one as well, but I was unsure whether it fits the design of existing implementations well. If people are okay, I'd be more than happy to only keep alternative 1.
  However, I am not quite sure
how this should be parsed. You give an example as:

   | Pkt(6)[STREAM(2, "xyz"),                                    |
   |        SOURCE_SYMBOL(1, { STREAM(8, "def"),                 |
   |                           DATAGRAM("msg") })]               |

In that example, the frames STREAM(8..) and DATAGRAM() are protected,
while the "STREAM(2)" is not. Fine, but the syntax is described as:

SOURCE_SYMBOL {
   SID (i),
   FEC Protected Payload (..)
}
... and I don't know how to parse that. There is no indication of the length of the "FEC Protected Payload". Do you mean to indicate that the SOURCE_SYMBOL frame extends to the end of the packet, and that all frames following the SID are protected?
Yes, that's right. In the current design, the frame stops at the end of the packet. We can add a length field or a number of protected frames.


OK. It would be simple enough to define packets as:
<non protected frames><SID><protected frames>
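The layout above could be parsed roughly as follows. This is only a sketch under stated assumptions: the frame type value `SOURCE_SYMBOL_TYPE` is a made-up placeholder (the thread does not give the draft's codepoint), the SID is assumed to be a QUIC variable-length integer as the `SID (i)` notation suggests, and the function names are hypothetical. The only standard piece is the varint decoding from RFC 9000, Section 16.

```python
# Hypothetical frame type; the real codepoint is not given in this thread.
SOURCE_SYMBOL_TYPE = 0x32


def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a QUIC variable-length integer (RFC 9000, Section 16).

    Returns (value, position just after the integer).
    """
    first = buf[pos]
    length = 1 << (first >> 6)  # two top bits select 1, 2, 4 or 8 bytes
    if pos + length > len(buf):
        raise ValueError("truncated varint")
    value = first & 0x3F
    for b in buf[pos + 1 : pos + length]:
        value = (value << 8) | b
    return value, pos + length


def parse_source_symbol(payload: bytes, pos: int) -> tuple[int, bytes]:
    """Parse a SOURCE_SYMBOL frame starting at pos.

    Assumes no length field: everything after the SID, up to the end of
    the packet payload, is the FEC Protected Payload.
    """
    frame_type, pos = decode_varint(payload, pos)
    if frame_type != SOURCE_SYMBOL_TYPE:
        raise ValueError("not a SOURCE_SYMBOL frame")
    sid, pos = decode_varint(payload, pos)
    return sid, payload[pos:]
```

The caller would first parse the unprotected frames with the normal QUIC frame parser, then hand the remaining offset to `parse_source_symbol` once it meets the SOURCE_SYMBOL type.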

You define a framework in which client and server negotiate to use FEC, and also to select a FEC scheme. The syntax of your transport parameter seems a bit restrictive: the client proposes to use FEC and a specific scheme, and the server accepts or refuses. Given the experimental nature of FEC, I expect that we will try several algorithms. It would be nice for the client to propose a list, and for the server to pick one -- or zero, if it does not support any of the proposed values. In fact, I think that you could merge the "enable FEC" parameter that negotiates use of FEC with the "decoder FEC scheme" negotiation.
Agree, we could use the transport parameter to propose a list of FEC schemes, that's indeed how most negotiation mechanisms work. The absence of this parameter indicates FEC is not supported and an empty list would announce that FEC SOURCE_SYMBOL frames are parsed but not used. This might cause problems though as the REPAIR frame format depends on the negotiated scheme.

The classic negotiation is "client proposes many, server picks one, and that one defines the REPAIR format." The alternative would be to not have a repair format, and let each scheme define its own.
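A minimal sketch of that "propose many, pick one" selection, assuming scheme identifiers are small integers and that the client's list is in preference order. Both are assumptions for illustration; the function name is hypothetical and the draft assigns no identifiers yet.

```python
from typing import Optional, Sequence


def select_fec_scheme(client_offer: Sequence[int],
                      server_supported: set[int]) -> Optional[int]:
    """Return the first offered scheme the server supports, else None.

    None corresponds to "zero schemes picked": FEC is simply not used.
    """
    for scheme in client_offer:  # assumed to be in client preference order
        if scheme in server_supported:
            return scheme
    return None
```

The selected identifier would then determine how REPAIR frames are parsed for the rest of the connection.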

Your draft does not assign identifiers to existing FEC schemes. To facilitate interop tests, I suggest that you define at least one. In fact, I would suggest a very simple one, in which the REPAIR frame identifies a range of SID, and then carries the XOR of all packets in that range.
Agree. I am not a fan of the XOR code as it performs really badly in many scenarios when losses occur in bursts and it might lead implementers to only implement XOR and I would like to avoid that. We could maybe define xor (e.g. "interop_xor") in another draft especially for interop purpose so that it is clear.


Yes, that was my intent too.
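For interop discussions, the XOR scheme could look like the sketch below. It assumes the REPAIR frame carries a first SID and a count, that shorter symbols are implicitly zero-padded at the end, and that at most one symbol in the range is missing. The function names and the range encoding are hypothetical, not taken from the draft.

```python
def xor_repair(symbols: list[bytes]) -> bytes:
    """XOR all symbols together, treating short ones as zero-padded at the end."""
    repair = bytearray(max(len(s) for s in symbols))
    for s in symbols:
        for i, b in enumerate(s):
            repair[i] ^= b
    return bytes(repair)


def recover_missing(received: dict[int, bytes], first_sid: int, count: int,
                    repair: bytes) -> tuple[int, bytes]:
    """Recover the single missing symbol in [first_sid, first_sid + count)."""
    sids = range(first_sid, first_sid + count)
    missing = [sid for sid in sids if sid not in received]
    if len(missing) != 1:
        raise ValueError("XOR can repair exactly one missing symbol")
    present = [received[sid] for sid in sids if sid in received]
    # XORing the repair payload with every received symbol leaves the lost one.
    return missing[0], xor_repair(present + [repair])
```

The recovered payload has the length of the repair payload, so it may end with zero bytes that were never sent.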

The suggestion above brings a discussion of the relative size of the "FEC Protected Payload" and the REPAIR frames. As in the example above, I would expect REPAIR frames to include a small header followed by a combination of the content of several FEC Protected Payloads, with that combination being at least as long as the longest FEC Protected Payload in the set. That longest size, by default, can be a full packet payload (per PMTU), minus the length of the SID prefix. But that leaves very little room for encoding the prefix of the REPAIR frame, which is likely to require at least the REPAIR frame type (arguably same length as the SOURCE_SYMBOL frame type), an SID identifying the range (same length as the SID parameter of the SOURCE_SYMBOL frame), and an additional parameter indicating the variant of the repair frame according to the selected scheme (arguably the same length as the coding window). Is that the problem that you are discussing in section 4.2.3?
If you refer to the end of section 5.3 and not 4.2.3, yes. The REPAIR frame may contain metadata that may increase its overhead. So I see two ways to cope with this problem:
1) Restrict the maximum size of FEC Protected Payload (what we do now in our implems)
2) Make it possible to "stream" REPAIR frames. This is a bit sad as you may need more packets than repair symbols but on average that could work well.

If you see the need to support streaming of repairs, then it would make sense to define how to do that in the generic "framing" document.

Restricting the length of the protected frames feels much simpler, for example not having to worry about receiving only fractions of REPAIR frames. But you would want that to be a function of the PMTU, otherwise you introduce overhead, which is why I was mentioning "maximum overhead of repair".
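To make the size constraint concrete, here is a small sketch of the arithmetic. It assumes the per-packet frame budget (derived from the PMTU, after QUIC header and AEAD tag), the SOURCE_SYMBOL prefix length, and the scheme's maximum REPAIR overhead are all known in bytes. The parameter names are hypothetical.

```python
def max_protected_payload(packet_payload_budget: int,
                          sid_prefix_len: int,
                          max_repair_overhead: int) -> int:
    """Largest FEC Protected Payload such that both frames fit in one packet.

    A SOURCE_SYMBOL needs sid_prefix_len + payload bytes, and a REPAIR frame
    needs max_repair_overhead + payload bytes, since the repair data is at
    least as long as the longest protected payload.
    """
    limit = packet_payload_budget - max(sid_prefix_len, max_repair_overhead)
    if limit <= 0:
        raise ValueError("no room left for protected payload")
    return limit
```

For instance, with a budget of 1150 bytes, a 2-byte SOURCE_SYMBOL prefix and a 6-byte REPAIR header, the protected payload would be capped at 1144 bytes.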

Should there be some property associated to the FEC scheme, such as the maximum overhead of a REPAIR frame?
If we decide 1) above, yes, that would be really helpful to have the FEC scheme signal the max repair frame overhead, but that may not work with schemes that e.g. explicitly list the IDs of protected symbols.

Make that a "don't care" option, but it requires defining how to split and reassemble repair frames.

(Also, why pad the FEC-protected data at the beginning rather than at the end? Or leave that as a property of the FEC scheme?)
We pad it at the beginning so that padding can be naturally handled at the QUIC layer. The padding can be parsed and handled as classical QUIC padding frames, so the decoder does not have to process the recovered payload before handing it to QUIC.

That does not feel like a very good reason. Padding at the end is "virtual", and does not introduce packet overhead. Yes, this means the final "repaired" frame may contain an arbitrary number of zeroes at the end, but these will be parsed as QUIC padding and ignored by the decoder.

I am not sure that I fully understand how to use the FEC WINDOW frame. You allow it to change, but what if the packet containing that frame is lost? How can the peer know when exactly the use of the new window starts, and which window is associated with a particular SOURCE_SYMBOL or REPAIR frame?
I think our draft is not clear enough about that. The FEC_WINDOW frame announces the maximum number of symbols that can be stored by the FEC decoder. It has a purpose comparable to QUIC MAX_DATA frames. This prevents the sender from sending REPAIR frames that protect more symbols than what the receiver is able to store.
In our view, the actual window protected by a repair symbol should be announced inside the REPAIR frame carrying the repair symbol. Source symbols can be associated with many different windows with different sizes, all being defined by the repair symbols.
If we announce a reduced FEC_WINDOW and if the packet containing it is lost, there will be a period of time where the server may send REPAIR frames that protect more symbols than what the receiver is allowed to store and those REPAIR frames will be useless. In our scenarios, this FEC_WINDOW limit was rarely hit though, as if you send at a limited bitrate (e.g. real-time video), you'll probably protect fewer packets than the maximum buffer size of the FEC Decoder. Same thing if you protect bulk transfers with medium/low BDPs, but this limit will likely be hit with high BDPs or low-memory FEC Decoders.
Reed-Solomon codes are often characterized by two numbers, the length of the coding window and the number of redundant copies -- in our case, the number of REPAIR frames for a given coding window. It seems that in your proposal these two numbers are set arbitrarily by the sender. Should there be some negotiation of maximum values? Or would those maximum values be deduced from the scheme identifier, something like "reed solomon 32 + 8"? Or should the "repair" frame indicate the length of the coding window over which it operates?
I like the idea of the repair symbols being self-contained. However, I understand that there might be good reasons to define limits, especially on constrained devices, as complex codes (e.g. large Reed-Solomon blocks) might be too heavy CPU-wise.

I have used 40+8 in previous implementations. Yes, there is a CPU cost, but it is smaller than or comparable to the cost of encryption. Some implementations might want to do that.

I am also not sure how the update of the coding window works for a convolutional code.
(see my answer in the paragraph below)
One way to understand the coding window is "the number of frames over which a given REPAIR may operate," but we are concerned with correlated losses happening in trains. To protect against that, it is nice to send the repair frames some time after the protected frames, in which case the window would be an indication of how long a copy of a given frame has to be kept. This could be expressed as a number of packets, but if multipath is supported we may want to send the repairs on a different path, and then using number of packets is not natural.
In our implem, the sender stores the maximum window size (announced by FEC_WINDOW frames). When generating a repair symbol, the size of the window protected by this repair symbol is set to min(max_window_size, n_symbols_in_flight). That window size is announced as part of the repair symbol. Concerning the receiver, it keeps track of the source symbol with the highest received SID (highest_sid). It keeps in memory the symbols with SID [highest_sid-max_window_size, highest_sid] and can therefore perform decoding for repair symbols protecting windows sitting inside this interval.
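The bookkeeping described above could be sketched as follows. This is an illustration of the description only, not the actual implementation: the class and method names are hypothetical, and a repair window is assumed to be identified by a first SID and a size.

```python
from typing import Optional


def repair_window_size(max_window_size: int, n_symbols_in_flight: int) -> int:
    """Sender side: window size announced inside the repair symbol."""
    return min(max_window_size, n_symbols_in_flight)


class ReceiverWindow:
    """Receiver side: keeps symbols with SID in [highest_sid - max, highest_sid]."""

    def __init__(self, max_window_size: int) -> None:
        self.max_window_size = max_window_size
        self.highest_sid: Optional[int] = None
        self.symbols: dict[int, bytes] = {}

    def on_source_symbol(self, sid: int, payload: bytes) -> None:
        if self.highest_sid is None or sid > self.highest_sid:
            self.highest_sid = sid
        low = self.highest_sid - self.max_window_size
        if sid >= low:
            self.symbols[sid] = payload
        # Drop symbols that slid out of the interval.
        for old in [s for s in self.symbols if s < low]:
            del self.symbols[old]

    def window_in_buffer(self, first_sid: int, window_size: int) -> bool:
        """True if the repair window sits inside the stored interval."""
        if self.highest_sid is None:
            return False
        low = self.highest_sid - self.max_window_size
        last_sid = first_sid + window_size - 1
        return low <= first_sid and last_sid <= self.highest_sid
```

A repair symbol whose window fails `window_in_buffer` would be one of the "useless" REPAIR frames mentioned earlier in the thread.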

I can see a negotiation between sender and receiver. In my mind, there are two critical numbers, both on the receiver side:

* the maximum number of SID that it will remember, because that requires committing memory.

* the maximum number of SID+REPAIR frames that it is willing to combine, i.e., the size of the "repair matrix", because of both CPU and memory. Arguably, that one depends on the scheme.

If we have a negotiation, then I would expect it at the beginning of the connection. The variable window seems like a secondary optimization, as in "we negotiated a maximum of 16+5, but for now I am sending 8+2 because it seems sufficient".

I would rather encode that in the SID frame than in a separate frame, because we have some room in the SID frame (see length of REPAIR frame issue). Also because that removes the need for handling synchronization of FEC_WINDOW, SID and REPAIR.

So using SIDs instead of packet numbers and number of frames seems more natural to me.
Speaking of multipath: I had hopes at some point to be able to define FEC-dedicated paths, even if the FEC-dedicated path is running on the same network path (i.e. like a "virtual path" concept). Let me explain: frames protected by FEC could be sent in packets over the FEC-dedicated path only, and frames not protected by FEC would be sent on another path. Both paths could actually use the same network path, but this could naturally allow us to decouple FEC-protected frames from others and remove the need for an SID frame (and spare space in packets!) if we ensure the packet numbers are sent contiguously, as we could use packet numbers as SIDs. I know contiguous packet numbers is not how QUIC works, and this idea may not be the ideal solution, but I describe it here as it might spark clever ideas from you or other folks reading this thread.

Why specialize paths? You could spray both the SID and the REPAIR over multiple paths. You don't need much there, but you do need a common SID number space between all paths. And you need to handle out-of-order SID and REPAIR packets, with a reordering buffer the size of the FEC window.

OK, that's a lot of text. Some of that may be because I did not fully understand your intent. I expect things to get clearer with your next draft, or when we start interop testing of different implementations...
Sorry if the draft is still in a bit of a rough form! We currently concentrate on having implems and papers published (publishing FEC work is a *real* effort, as the ideas behind FEC are old, it is difficult to convince reviewers of the novelty... :-) and that's fine but it takes time)
I will certainly integrate your comments in the next version(s), they are all valuable and totally on point! Thank you so much for that.
Waiting to work on that!
Same, after all these years! Maybe Brisbane is too early to host a hackathon table (especially as I don't know how many folks would be interested), but that might be something we could do for the next IETFs. Once I get my work published, I can also release my FEC encoder/decoder lib, which has C bindings that could really be valuable for interop as well.


Will look into that, thanks.

-- Christian Huitema
