Re: Comments on draft-michel-quic-fec-01

François Michel Mon, 22 Jan 2024 03:28:22 -0800

Hi Christian,

Great to hear from you, especially given your expertise in the topic!
Thank you for all your comments. See my answers below.

I am adding Rachel to the loop; she was interested in progressing the draft together and in ensuring that the design can handle their use case as well.



On 19/01/24 at 07:29, Christian Huitema wrote:

François, Olivier,
I just spent some time studying your draft on QUIC FEC. I like the idea of having an FEC framework independent from the algorithm used to actually compute the FEC data and repair packets. Your draft solves a number of practical problems, such as how to notify peers when FEC helps receive a frame from an otherwise lost packet, or how to identify "symbols" independently of packet numbers using the symbol identifier frame (SID).
The draft is obviously a work in progress.

Yes, the aim of this draft and of the papers currently under submission is to spark interest in the topic again. I've been working on FEC for years now and was part of the earlier QUIC-FEC work at NWCRG, where we already wrote interesting drafts. My intent with this one is to propose a short and simple specification that people can wrap their heads around. We can then make it progress together with quicwg folks, instead of proposing a first, exhaustive but complex draft that is difficult to apprehend.

You propose two alternatives for linking frames to a SID. I wish you had picked just one, and I prefer your first alternative, in which your SID frames bracket a list of protected frames.

I prefer the first one as well, but I was unsure whether it fits the design of existing implementations. If people are okay with it, I'd be more than happy to keep only alternative 1.


However, I am not quite sure how this should be parsed. You give an example as:

   | Pkt(6)[STREAM(2, "xyz"),                                    |
   |        SOURCE_SYMBOL(1, { STREAM(8, "def"),                 |
   |                           DATAGRAM("msg") }]                |

In that example, the frames STREAM(8..) and DATAGRAM() are protected,
while "STREAM(2)" is not. Fine, but the syntax is described as:

SOURCE_SYMBOL {
   SID (i),
   FEC Protected Payload (..)
}
... and I don't know how to parse that. There is no indication of the length of the "FEC Protected Payload". Do you mean to indicate that the SOURCE_SYMBOL frame extends to the end of the packet, and that all frames following the SID are protected?

Yes, that's right. In the current design, the frame stops at the end of the packet. We could add a length field or a number of protected frames.
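
To make the two parsing options concrete, here is a minimal Python sketch. It assumes the frame type has already been consumed and that the SID is a QUIC variable-length integer as in RFC 9000; the length-prefixed variant (a Length varint right after the SID) is hypothetical, and neither function is taken from the draft.

```python
def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a QUIC variable-length integer (RFC 9000, Section 16).

    Returns (value, position just after the integer).
    """
    length = 1 << (buf[pos] >> 6)          # 2-bit prefix: 1, 2, 4 or 8 bytes
    if pos + length > len(buf):
        raise ValueError("truncated varint")
    value = buf[pos] & 0x3F                # drop the length prefix
    for b in buf[pos + 1:pos + length]:
        value = (value << 8) | b
    return value, pos + length


def parse_source_symbol_to_end(buf: bytes, pos: int) -> tuple[int, bytes, int]:
    """Current design: the protected payload runs to the end of the packet."""
    sid, pos = decode_varint(buf, pos)
    return sid, buf[pos:], len(buf)


def parse_source_symbol_with_length(buf: bytes, pos: int) -> tuple[int, bytes, int]:
    """Hypothetical variant: an explicit Length varint follows the SID,
    so unprotected frames may follow the SOURCE_SYMBOL frame."""
    sid, pos = decode_varint(buf, pos)
    length, pos = decode_varint(buf, pos)
    end = pos + length
    if end > len(buf):
        raise ValueError("protected payload exceeds packet")
    return sid, buf[pos:end], end
```

With the length field, the returned end offset is where parsing of the next (unprotected) frame resumes; without it, the SOURCE_SYMBOL frame has to be the last frame in the packet.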

You define a framework in which client and server negotiate to use FEC, and also to select a FEC scheme. The syntax of your transport parameter seems a bit restrictive: the client proposes to use FEC and a specific scheme, and the server accepts or refuses. Given the experimental nature of FEC, I expect that we will try several algorithms. It would be nice for the client to propose a list, and for the server to pick one (or zero, if it does not support any of the proposed values). In fact, I think that you could merge the "enable FEC" parameter that negotiates use of FEC with the "decoder FEC scheme" negotiation.

Agreed, we could use the transport parameter to propose a list of FEC schemes; that's indeed how most negotiation mechanisms work. The absence of this parameter would indicate that FEC is not supported, and an empty list would announce that SOURCE_SYMBOL frames are parsed but not used. This might cause problems, though, as the REPAIR frame format depends on the negotiated scheme.
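
A minimal sketch of the list-based negotiation, in Python. The integer scheme identifiers and the "first mutually supported scheme in client preference order" rule are assumptions for illustration; the draft defines neither.

```python
from typing import Optional, Sequence


def select_fec_scheme(client_offer: Optional[Sequence[int]],
                      server_supported: Sequence[int]) -> Optional[int]:
    """Server-side choice of a FEC scheme from the client's offer.

    client_offer is None when the transport parameter is absent (no FEC
    support). An empty list means SOURCE_SYMBOL frames are parsed but no
    scheme is used. In both cases no scheme is selected, so no REPAIR
    frames can be exchanged.
    """
    if not client_offer:
        return None
    for scheme in client_offer:          # offer is in client preference order
        if scheme in server_supported:
            return scheme
    return None                          # no common scheme: the server picks zero
```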

Your draft does not assign identifiers to existing FEC schemes. To facilitate interop tests, I suggest that you define at least one. In fact, I would suggest a very simple one, in which the REPAIR frame identifies a range of SIDs and then carries the XOR of all packets in that range.

Agreed. I am not a fan of the XOR code, as it performs really badly in many scenarios where losses occur in bursts, and it might lead implementers to only implement XOR, which I would like to avoid. We could maybe define XOR (e.g. "interop_xor") in another draft, specifically for interop purposes, so that this is clear.
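
For illustration only, a Python sketch of such an XOR interop scheme. It assumes the REPAIR frame carries a SID range plus the XOR of the payloads in that range, with each payload front-padded with zero bytes (one-byte PADDING frames) to a common length; the function names are hypothetical. It also shows the weakness mentioned above: one repair symbol recovers at most one lost symbol per range, so a burst hitting two symbols of the same range is unrecoverable.

```python
from typing import Optional


def xor_repair(payloads: list[bytes]) -> bytes:
    """Repair symbol for a SID range: XOR of all source payloads, each
    front-padded with zero bytes (QUIC PADDING) to the longest length."""
    size = max(len(p) for p in payloads)
    out = bytearray(size)
    for p in payloads:
        for i, b in enumerate(p.rjust(size, b"\x00")):
            out[i] ^= b
    return bytes(out)


def xor_recover(received: dict[int, bytes], first_sid: int, last_sid: int,
                repair: bytes) -> Optional[tuple[int, bytes]]:
    """Rebuild the single missing source symbol in [first_sid, last_sid]."""
    missing = [s for s in range(first_sid, last_sid + 1) if s not in received]
    if len(missing) != 1:
        return None              # XOR repairs exactly one loss per range
    out = bytearray(repair)
    for sid in range(first_sid, last_sid + 1):
        if sid in received:
            for i, b in enumerate(received[sid].rjust(len(repair), b"\x00")):
                out[i] ^= b
    return missing[0], bytes(out)  # still front-padded with zero bytes
```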

The suggestion above brings up a discussion of the relative size of the "FEC Protected Payload" and the REPAIR frames. As in the example above, I would expect REPAIR frames to include a small header followed by a combination of the content of several FEC Protected Payloads, with that combination being at least as long as the longest FEC Protected Payload in the set. That longest size, by default, can be a full packet payload (per PMTU), minus the length of the SID prefix. But that leaves very little room for encoding the prefix of the REPAIR frame, which is likely to require at least the REPAIR frame type (arguably the same length as the SOURCE_SYMBOL frame type), a SID identifying the range (the same length as the SID parameter of the SOURCE_SYMBOL frame), and an additional parameter indicating the variant of the repair frame according to the selected scheme (arguably the same length as the coding window). Is that the problem that you are discussing in section 4.2.3?

If you are referring to the end of section 5.3 and not 4.2.3, yes. The REPAIR frame may contain metadata that increases its overhead. So I see two ways to cope with this problem:

1) Restrict the maximum size of the FEC Protected Payload (what we do now in our implementations).
2) Make it possible to "stream" REPAIR frames. This is a bit sad, as you may need more packets than repair symbols, but on average that could work well.
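
Option 2 could look like STREAM-frame-style chunking of a repair symbol. This is a hypothetical Python sketch: the (offset, chunk) layout and the function names are assumptions, not part of the draft.

```python
from typing import Optional


def split_repair_symbol(symbol: bytes, max_chunk: int) -> list[tuple[int, bytes]]:
    """Cut a repair symbol into (offset, chunk) pieces that each fit in one
    packet alongside a REPAIR frame header."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    return [(off, symbol[off:off + max_chunk])
            for off in range(0, len(symbol), max_chunk)]


def reassemble(chunks: dict[int, bytes], total_len: int) -> Optional[bytes]:
    """Return the repair symbol once all pieces have arrived, else None."""
    out = bytearray()
    while len(out) < total_len:
        piece = chunks.get(len(out))
        if not piece:
            return None          # a piece is still missing
        out += piece
    return bytes(out[:total_len])
```

The cost mentioned above is visible here: a repair symbol of S bytes cut into chunks of at most C bytes needs ceil(S / C) packets instead of one.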

Should there be some property associated with the FEC scheme, such as the maximum overhead of a REPAIR frame?

If we decide on 1) above, yes, it would be really helpful to have the FEC scheme signal the maximum REPAIR frame overhead, but that may not work with schemes that, e.g., explicitly list the IDs of protected symbols.
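
Under option 1, the sender's budget is simple arithmetic once the scheme bounds its REPAIR overhead. The Python sketch below assumes the REPAIR prefix is the frame type, a SID, a window size (all QUIC varints) plus a fixed-size scheme-specific metadata field; the frame type value 0x31 and the 1200-byte packet payload in the usage line are hypothetical. A scheme that lists protected SIDs explicitly has no fixed metadata size, which is exactly why such a bound may not exist for it.

```python
def varint_len(value: int) -> int:
    """Encoded size of a QUIC variable-length integer (RFC 9000, Section 16)."""
    for size, limit in ((1, 1 << 6), (2, 1 << 14), (4, 1 << 30), (8, 1 << 62)):
        if value < limit:
            return size
    raise ValueError("value too large for a QUIC varint")


def max_protected_payload(max_packet_payload: int, repair_frame_type: int,
                          max_sid: int, max_window: int,
                          scheme_metadata_len: int) -> int:
    """Largest FEC Protected Payload whose repair symbol still fits in one
    packet next to the REPAIR frame prefix (type, SID, window, metadata)."""
    overhead = (varint_len(repair_frame_type) + varint_len(max_sid)
                + varint_len(max_window) + scheme_metadata_len)
    return max_packet_payload - overhead
```

For example, `max_protected_payload(1200, 0x31, 1 << 20, 64, 0)` gives 1193: one byte of type, four bytes of SID and two bytes of window size.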

(Also, why pad the FEC-protected data at the beginning rather than at the end? Or leave that as a property of the FEC scheme?)

We pad it at the beginning so that the padding can be naturally handled at the QUIC layer. The padding can be parsed and handled as classical QUIC PADDING frames, so the decoder does not have to process the recovered payload before handing it to QUIC.
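
A small Python sketch of front padding, with hypothetical function names. One property worth noting (an observation based on RFC 9000 and RFC 9221, not a statement from the draft): a STREAM frame without a Length field (e.g. type 0x08) or a DATAGRAM frame of type 0x30 extends to the end of the packet, so trailing zero bytes could be absorbed as frame data, whereas leading zero bytes are always parsed as one-byte PADDING frames.

```python
PADDING = 0x00  # QUIC PADDING frame type (RFC 9000, Section 19.1)


def pad_front(payload: bytes, symbol_size: int) -> bytes:
    """Prepend one-byte PADDING frames so the symbol reaches symbol_size."""
    if len(payload) > symbol_size:
        raise ValueError("payload larger than symbol size")
    return bytes([PADDING]) * (symbol_size - len(payload)) + payload


def skip_leading_padding(recovered: bytes) -> bytes:
    """What an ordinary QUIC frame parser already does: every 0x00 byte is
    a PADDING frame, consumed until the first non-PADDING frame type."""
    i = 0
    while i < len(recovered) and recovered[i] == PADDING:
        i += 1
    return recovered[i:]
```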

I am not sure that I fully understand how to use the FEC_WINDOW frame. You allow it to change, but what if the packet containing that frame is lost? How can the peer know exactly when the use of the new window starts, and which window is associated with a particular SOURCE_SYMBOL or REPAIR frame?

I think our draft is not clear enough about that. The FEC_WINDOW frame announces the maximum number of symbols that can be stored by the FEC decoder. Its purpose is comparable to that of QUIC MAX_DATA frames: it prevents the sender from sending REPAIR frames that protect more symbols than the receiver is able to store.

In our view, the actual window protected by a repair symbol should be announced inside the REPAIR frame carrying that repair symbol. Source symbols can be associated with many different windows of different sizes, all of them defined by the repair symbols.
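
A Python sketch of a REPAIR frame that carries its own window, so that interpreting it never depends on a separately sent FEC_WINDOW value. The field layout (First SID, Window Size, then the repair symbol) and all names are hypothetical; the frame type and any scheme-specific fields are omitted.

```python
from dataclasses import dataclass


def encode_varint(value: int) -> bytes:
    """QUIC variable-length integer encoding (RFC 9000, Section 16)."""
    for size, prefix in ((1, 0x00), (2, 0x40), (4, 0x80), (8, 0xC0)):
        if value < 1 << (8 * size - 2):
            raw = value.to_bytes(size, "big")
            return bytes([raw[0] | prefix]) + raw[1:]
    raise ValueError("value too large for a QUIC varint")


def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Inverse of encode_varint; returns (value, next position)."""
    length = 1 << (buf[pos] >> 6)
    if pos + length > len(buf):
        raise ValueError("truncated varint")
    value = buf[pos] & 0x3F
    for b in buf[pos + 1:pos + length]:
        value = (value << 8) | b
    return value, pos + length


@dataclass(frozen=True)
class RepairWindow:
    first_sid: int   # SID of the first protected source symbol
    size: int        # number of consecutive SIDs protected

    def protects(self, sid: int) -> bool:
        return self.first_sid <= sid < self.first_sid + self.size


def encode_repair(window: RepairWindow, symbol: bytes) -> bytes:
    """Hypothetical REPAIR body: First SID, Window Size, then the symbol."""
    return encode_varint(window.first_sid) + encode_varint(window.size) + symbol


def decode_repair(buf: bytes) -> tuple[RepairWindow, bytes]:
    first_sid, pos = decode_varint(buf, 0)
    size, pos = decode_varint(buf, pos)
    return RepairWindow(first_sid, size), buf[pos:]
```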

If we announce a reduced FEC_WINDOW and the packet containing it is lost, there will be a period of time during which the server may send REPAIR frames that protect more symbols than the receiver is allowed to store, and those REPAIR frames will be useless. In our scenarios, this FEC_WINDOW limit was rarely hit, though: if you send at a limited bitrate (e.g. real-time video), you will probably protect fewer packets than the maximum buffer size of the FEC decoder. The same holds if you protect bulk transfers with medium/low BDPs, but this limit will likely be hit with high BDPs or low-memory FEC decoders.

Reed-Solomon codes are often characterized by two numbers: the length of the coding window and the number of redundant copies (in our case, the number of REPAIR frames for a given coding window). It seems that in your proposal these two numbers are set arbitrarily by the sender. Should there be some negotiation of maximum values? Or would those maximum values be deduced from the scheme identifier, something like "reed solomon 32 + 8"? Or should the REPAIR frame indicate the length of the coding window over which it operates?

I like the idea of the repair symbols being self-contained. However, I understand that there might be good reasons to define limits, especially on constrained devices, as complex codes (e.g. large Reed-Solomon blocks) might be too heavy CPU-wise.
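
If limits are negotiated (or implied by the scheme identifier) while repair symbols remain self-contained, the receiver-side check could be as small as the Python sketch below. The "32 + 8" style limits, the repair_index field and the function names are assumptions for illustration; the decodability rule is the standard property of MDS codes such as Reed-Solomon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockCodeLimits:
    """Hypothetical per-scheme limits, e.g. "32 + 8" for a Reed-Solomon code."""
    max_source: int   # k: source symbols per coding window
    max_repair: int   # r: repair symbols per coding window


def acceptable(window_size: int, repair_index: int, limits: BlockCodeLimits) -> bool:
    """Receiver check on a self-contained repair symbol: drop it if decoding
    would exceed what this (possibly constrained) device agreed to handle."""
    return (1 <= window_size <= limits.max_source
            and 0 <= repair_index < limits.max_repair)


def can_decode(lost_source: int, received_repair: int) -> bool:
    """For an MDS code such as Reed-Solomon, any k of the k + r symbols
    suffice, so decoding works if and only if every lost source symbol is
    matched by a received repair symbol."""
    return lost_source <= received_repair
```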

I am also not sure how the update of the coding window works for a convolutional code.


(see my answer in the paragraph below)

One way to understand the coding window is "the number of frames over which a given REPAIR may operate," but we are concerned with correlated losses happening in trains. To protect against that, it is nice to send the repair frames some time after the protected frames, in which case the window would be an indication of how long a copy of a given frame has to be kept. This could be expressed as a number of packets, but if multipath is supported, we may want to send the repairs on a different path, and then using a number of packets is not natural.

In our implementation, the sender stores the maximum window size (announced by FEC_WINDOW frames). When generating a repair symbol, the size of the window protected by this repair symbol is set to min(max_window_size, n_symbols_in_flight). That window size is announced as part of the repair symbol. On the receiver side, the receiver keeps track of the source symbol with the highest received SID (highest_sid). It keeps in memory the symbols with SIDs in [highest_sid - max_window_size, highest_sid] and can therefore perform decoding for repair symbols protecting windows that sit inside this interval.
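
The sender and receiver bookkeeping described above can be sketched in Python as follows. The class and method names are hypothetical, and the sketch assumes SIDs are assigned consecutively and that the symbols still in flight are always the most recently sent ones; loss detection and the actual encoding are left out.

```python
from typing import Optional


class FecSender:
    """Sender side: window = min(max_window_size, n_symbols_in_flight)."""

    def __init__(self, max_window_size: int):
        self.max_window_size = max_window_size  # updated by FEC_WINDOW frames
        self.next_sid = 0
        self.in_flight = 0

    def on_source_symbol_sent(self) -> int:
        sid = self.next_sid
        self.next_sid += 1
        self.in_flight += 1
        return sid

    def on_symbols_acked(self, count: int) -> None:
        # Assumption: acknowledgements retire the oldest in-flight symbols.
        self.in_flight = max(0, self.in_flight - count)

    def repair_window(self) -> Optional[tuple[int, int]]:
        """(first_sid, size) to announce in the next repair symbol."""
        size = min(self.max_window_size, self.in_flight)
        if size == 0:
            return None
        return self.next_sid - size, size  # covers the newest `size` SIDs


class FecReceiver:
    """Receiver side: keep SIDs in [highest_sid - max_window_size, highest_sid]."""

    def __init__(self, max_window_size: int):
        self.max_window_size = max_window_size
        self.highest_sid = -1
        self.symbols: dict[int, bytes] = {}

    def on_source_symbol(self, sid: int, payload: bytes) -> None:
        self.symbols[sid] = payload
        self.highest_sid = max(self.highest_sid, sid)
        low = self.highest_sid - self.max_window_size
        for old in [s for s in self.symbols if s < low]:
            del self.symbols[old]  # fell out of the decoding window

    def can_use_repair(self, first_sid: int, size: int) -> bool:
        low = self.highest_sid - self.max_window_size
        return low <= first_sid and first_sid + size - 1 <= self.highest_sid
```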

So using SIDs instead of packet numbers or numbers of frames seems more natural to me.

Speaking of multipath: at some point, I had hoped to be able to define FEC-dedicated paths, even if the FEC-dedicated path is running on the same network path (i.e. like a "virtual path" concept). Let me explain: frames protected by FEC could be sent in packets over the FEC-dedicated path only, and frames not protected by FEC would be sent on another path. Both paths could actually use the same network path, but this would naturally decouple FEC-protected frames from the others and remove the need for a SID frame (and spare space in packets!) if we ensure the packet numbers are sent contiguously, as we could then use packet numbers as SIDs. I know contiguous packet numbers are not how QUIC works, and this idea may not be the ideal solution, but I describe it here as it might spark clever ideas from you or other folks reading this thread.
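
A Python sketch of what "packet numbers as SIDs" would buy on such a FEC-dedicated path, with hypothetical function names. It also shows why contiguity matters: QUIC endpoints may deliberately skip packet numbers (RFC 9000 mentions this as a defense against optimistic ACK attacks), and a skipped number would be indistinguishable from a lost source symbol.

```python
def missing_sids(received_pns: set[int], first_pn: int, size: int) -> list[int]:
    """On a FEC-only path where packet number == SID, the lost source
    symbols of a repair window are simply the gaps in received packet numbers."""
    return [pn for pn in range(first_pn, first_pn + size) if pn not in received_pns]


def contiguous(sent_pns: list[int]) -> bool:
    """The idea only holds if the sender never skips packet numbers on
    that path."""
    return all(b == a + 1 for a, b in zip(sent_pns, sent_pns[1:]))
```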

OK, that's a lot of text. Some of it may be because I did not fully understand your intent. I expect things to get clearer with your next draft, or when we start interop testing of different implementations...

Sorry if the draft is still in a somewhat rough form! We are currently concentrating on getting implementations and papers published (publishing FEC work is a *real* effort: as the ideas behind FEC are old, it is difficult to convince reviewers of the novelty... :-) and that's fine, but it takes time).

I will certainly integrate your comments in the next version(s); they are all valuable and totally on-point! Thank you so much for that.

Looking forward to working on that!

Same, after all these years! Maybe Brisbane is too early to host a hackathon table (especially since I don't know how many folks would be interested), but that might be something we could do at the next IETFs. Once my work is published, I can also release my FEC encoder/decoder library, which has C bindings and could be really valuable for interop as well.


Thank you again!

Cheers,

François


-- Christian Huitema

