Hi Christian,
Great to hear from you, especially given your expertise in the topic!
Thank you for all your comments. See my answers below.
I am adding Rachel to the loop; she was interested in making progress on
the draft together and ensuring the design can handle their use case as well.
On 19/01/24 at 07:29, Christian Huitema wrote:
François, Olivier,
I just spent some time studying your draft on QUIC FEC. I like the idea
of having an FEC framework independent from the algorithm used to
actually compute the FEC data and repair packets. Your draft solves a
number of practical problems, such as how to notify peers when FEC helps
receive a frame from an otherwise lost packet, or how to identify
"symbols" independently of packet numbers using the symbol identifier
frame (SID).
The draft is obviously a work in progress.
Yes, the aim of this draft and current papers under submission is to
spark interest in the topic again. I've been working on FEC for
years now and was part of the earlier QUIC-FEC work at NWCRG where we
already wrote interesting drafts.
My intent with this one is to propose a short and simple
specification that people can wrap their heads around. We can then make
it progress together with quicwg folks instead of proposing a first,
exhaustive but complex draft that is difficult to grasp.
You propose two alternatives for linking frames to a SID. I wish you
picked just one, and I prefer your first alternative, in which your SID
frame brackets a list of protected frames.
I prefer the first one as well, but I was unsure whether it fits the design
of existing implementations. If people are okay with it, I'd be more than
happy to only keep alternative 1.
However, I am not quite sure
how this should be parsed. You give an example as:
| Pkt(6)[STREAM(2, "xyz"), |
| SOURCE_SYMBOL(1, { STREAM(8, "def"), |
| DATAGRAM("msg") }] |
In that example, the frame STREAM(8..) and DATAGRAM() are protected,
while the "STREAM(2)" is not. Fine, but the syntax is described as:
SOURCE_SYMBOL {
SID (i),
FEC Protected Payload (..)
}
... and I don't know how to parse that. There is no indication of the
length of the "FEC Protected Payload". Do you mean to indicate that the
SOURCE_SYMBOL frame extends to the end of the packet, and that all
frames following the SID are protected?
Yes, that's right. In the current design, the frame stops at the end of
the packet. We can add a length field or a number of protected frames.
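
To make the "extends to the end of the packet" rule concrete, here is a minimal sketch of how a receiver could parse the body of a SOURCE_SYMBOL frame under the current design. The function names are hypothetical and not taken from the draft; the only thing borrowed from QUIC itself is the variable-length integer encoding of RFC 9000, which I assume is used for the SID.

```python
from typing import Tuple


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a QUIC variable-length integer (RFC 9000, section 16).

    Returns (value, position just after the integer).
    """
    first = buf[pos]
    length = 1 << (first >> 6)      # top two bits select 1, 2, 4 or 8 bytes
    value = first & 0x3F            # remaining six bits start the value
    for b in buf[pos + 1 : pos + length]:
        value = (value << 8) | b
    return value, pos + length


def parse_source_symbol(payload: bytes, pos: int) -> Tuple[int, bytes, int]:
    """Parse a SOURCE_SYMBOL body starting right after the frame type.

    Assumption: there is no length field, so everything after the SID,
    up to the end of the packet payload, is FEC-protected.
    Returns (sid, protected_payload, next_position).
    """
    sid, pos = decode_varint(payload, pos)
    protected = payload[pos:]
    return sid, protected, len(payload)
```

With an explicit length field (or a count of protected frames), the last function would instead stop at that boundary, and unprotected frames could follow the SOURCE_SYMBOL frame in the same packet.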
You define a framework in which client and server negotiate to use FEC,
and also to select a FEC scheme. The syntax of your transport parameter
seems a bit restrictive: the client proposes to use FEC and a specific
scheme, and the server accepts or refuses. Given the experimental nature
of FEC, I expect that we will try several algorithms. It would be nice
for the client to propose a list, and for the server to pick one -- or
zero, if it does not support any of the proposed values. In fact, I
think that you could merge the "enable FEC" parameter that negotiates
use of FEC with the "decoder FEC scheme" negotiation.
Agreed, we could use the transport parameter to propose a list of FEC
schemes, that's indeed how most negotiation mechanisms work. The absence
of this parameter indicates FEC is not supported and an empty list would
announce that FEC SOURCE_SYMBOL frames are parsed but not used. This
might cause problems though as the REPAIR frame format depends on the
negotiated scheme.
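
A rough sketch of the list-based negotiation discussed here, seen from the server side. The scheme identifiers are plain integers chosen for illustration, and honoring the client's preference order is my assumption; neither comes from the draft.

```python
from typing import Optional, Sequence


def select_fec_scheme(
    offered: Optional[Sequence[int]], supported: Sequence[int]
) -> Optional[int]:
    """Pick the first scheme offered by the client that the server supports.

    offered is None  -> the transport parameter was absent: no FEC support.
    offered is empty -> the peer parses SOURCE_SYMBOL frames but uses no scheme.
    Returns the selected scheme identifier, or None if no scheme is selected.
    """
    if offered is None:
        return None
    for scheme in offered:          # client order is treated as preference order
        if scheme in supported:
            return scheme
    return None
```

For instance, `select_fec_scheme([3, 1], [1, 2])` selects scheme 1, while an absent parameter or an empty list both select nothing. The caller still has to tell those two cases apart, because only the empty list signals that SOURCE_SYMBOL frames are understood.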
Your draft does not assign identifiers to existing FEC schemes. To
facilitate interop tests, I suggest that you define at least one. In
fact, I would suggest a very simple one, in which the REPAIR frame
identifies a range of SID, and then carries the XOR of all packets in
that range.
Agreed. I am not a fan of the XOR code: it performs really badly in
many scenarios where losses occur in bursts, and it might lead
implementers to only implement XOR, which I would like to avoid. We
could maybe define XOR (e.g. "interop_xor") in a separate draft,
specifically for interop purposes, so that this is clear.
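
For reference, the simple XOR scheme suggested above could look roughly like the sketch below. It assumes all source symbols in the SID range have already been padded to the same length, and the function names are hypothetical. As noted, a single XOR repair can only recover one lost symbol per range, which is why it does poorly under burst losses.

```python
from typing import Dict, Optional, Tuple


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings of equal length."""
    if len(a) != len(b):
        raise ValueError("symbols must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_xor_repair(symbols: Dict[int, bytes], first_sid: int, last_sid: int) -> bytes:
    """Build the repair payload: XOR of all symbols in [first_sid, last_sid]."""
    repair = bytes(len(symbols[first_sid]))     # all-zero accumulator
    for sid in range(first_sid, last_sid + 1):
        repair = xor_bytes(repair, symbols[sid])
    return repair


def recover_missing(
    received: Dict[int, bytes], first_sid: int, last_sid: int, repair: bytes
) -> Optional[Tuple[int, bytes]]:
    """Recover the single missing symbol of the range, if exactly one is missing.

    Returns (sid, symbol), or None when zero or several symbols are missing.
    """
    missing = [s for s in range(first_sid, last_sid + 1) if s not in received]
    if len(missing) != 1:
        return None
    out = repair
    for sid in range(first_sid, last_sid + 1):
        if sid in received:
            out = xor_bytes(out, received[sid])
    return missing[0], out
```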
The suggestion above brings a discussion of the relative size of the
"FEC Protected Payload" and the REPAIR frames. As in the example above,
I would expect REPAIR frames to include a small header followed by a
combination of the content of several FEC Protected Payload, with that
combination being at least as long as the longest FEC Protected Payload
in the set. That longest size, by default, can be a full packet payload
(per PMTU), minus the length of the SID prefix. But that leaves very
little room for encoding the prefix of the REPAIR frame, which is likely
to require at least the REPAIR frame type (arguably same length as the
SOURCE_SYMBOL frame type), an SID identifying the range (same length as
the SID parameter of the SOURCE_SYMBOL frame), and an additional
parameter indicating the variant of the repair frame according to the
selected scheme (arguably the same length as the coding window). Is that
the problem that you are discussing in section 4.2.3?
If you refer to the end of section 5.3 and not 4.2.3, yes. The REPAIR frame
may contain metadata that may increase its overhead. So I see two ways
to cope with this problem:
1) Restrict the maximum size of FEC Protected Payload (what we do now in
our implems)
2) Make it possible to "stream" REPAIR frames. This is a bit sad as you
may need more packets than repair symbols but on average that could work
well.
Should there be
some property associated to the FEC scheme, such as the maximum overhead
of a REPAIR frame?
If we decide 1) above, yes, that would be really helpful to have the FEC
scheme signal the max repair frame overhead, but that may not work with
schemes that e.g. explicitly list the IDs of protected symbols.
(Also, why pad the FEC-protected data at the
beginning rather than at the end? Or leave that as a property of the FEC
scheme?)
We pad it at the beginning so that padding can be naturally handled at
the QUIC layer. The padding can be parsed and handled as classical QUIC
PADDING frames, so the decoder does not have to process the recovered
payload before handing it to QUIC.
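
A small illustration of why front padding is convenient, relying on the fact that a QUIC PADDING frame is a single 0x00 byte (RFC 9000). The helper names are hypothetical, and `skip_padding` only mimics what a regular QUIC frame parser already does when it meets PADDING frames.

```python
def pad_front(symbol: bytes, size: int) -> bytes:
    """Left-pad a source symbol with zero bytes up to the common symbol size."""
    if len(symbol) > size:
        raise ValueError("symbol is larger than the symbol size")
    return bytes(size - len(symbol)) + symbol


def skip_padding(payload: bytes) -> bytes:
    """Consume leading PADDING frames (type 0x00), as a QUIC frame parser would."""
    pos = 0
    while pos < len(payload) and payload[pos] == 0x00:
        pos += 1
    return payload[pos:]
```

A recovered symbol such as `pad_front(b"\x08abc", 8)` can be handed to the QUIC frame parser as is: the four leading zero bytes are read as PADDING frames, and parsing resumes at the first real frame type (here the byte 0x08).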
I am not sure that I fully understand how to use the FEC WINDOW frame.
You allow it to change, but what if the packet containing that frame is
lost? How can the peer know when exactly the use of the new window
starts, and which window is associated with a particular SOURCE_SYMBOL
or REPAIR frame?
I think our draft is not clear enough about that. The FEC_WINDOW frame
announces the maximum number of symbols that can be stored by the FEC
decoder. Its purpose is comparable to that of QUIC MAX_DATA frames: it
prevents the sender from sending REPAIR frames that protect more symbols
than the receiver is able to store.
In our view, the actual window protected by a repair symbol should be
announced inside the REPAIR frame carrying the repair symbol. Source
symbols can be associated with many different windows with different
sizes, all being defined by the repair symbols.
If we announce a reduced FEC_WINDOW and the packet containing it is
lost, there will be a period of time during which the server may send
REPAIR frames that protect more symbols than the receiver is allowed to
store, and those REPAIR frames will be useless. In our scenarios, this
FEC_WINDOW limit was rarely hit though: if you send at a limited
bitrate (e.g. real-time video), you'll probably protect fewer packets
than the maximum buffer size of the FEC decoder. The same holds if you
protect bulk transfers with medium/low BDPs, but this limit will likely
be hit with high BDPs or low-memory FEC decoders.
Reed-Solomon codes are often characterized by two numbers, the length of
the coding window and the number of redundant copies -- in our case, the
number of REPAIR frames for a given coding window. It seems that in your
proposal these two numbers are set arbitrarily by the sender. Should
there be some negotiation of maximum values? Or would those maximum
values be deduced from the scheme identifier, something like "reed
solomon 32 + 8"? Or should the "repair" frame indicate the length of the
coding window over which it operates?
I like the idea of the repair symbols being self-contained. However, I
understand that there might be good reasons to define limits, especially
on constrained devices, as complex codes (e.g. large reed solomon
blocks) might be too heavy CPU-wise.
I am also not sure how the update of the coding window works for a
convolutional code
(see my answer in the paragraph below)
One way to understand the coding window is "the number of frames over
which a given REPAIR may operate," but we are concerned with correlated
losses happening in trains. To protect against that, it is nice to send
the repair frames some time after the protected frames, in which case
the window would be an indication of how long a copy of a given frame has
to be kept. This could be expressed as a number of packets, but
if multipath is supported we may want to send the repairs on a different
path, and then using number of packets is not natural.
In our implem, the sender stores the maximum window size (announced by
FEC_WINDOW frames). When generating a repair symbol, the size of the
window protected by this repair symbol is set to min(max_window_size,
n_symbols_in_flight). That window size is announced as part of the
repair symbol. Concerning the receiver, it keeps track of the source
symbol with the highest received SID (highest_sid). It keeps in memory
the symbols with SID [highest_sid-max_window_size, highest_sid] and can
therefore perform decoding for repair symbols protecting windows sitting
inside this interval.
So using SIDs instead of packet numbers and number of frames seems more
natural to me.
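
The behaviour described above can be sketched as follows. This is a simplified illustration rather than our actual implementation: the class and function names are hypothetical, and only the names `max_window_size`, `n_symbols_in_flight` and `highest_sid` come from the description.

```python
from typing import Dict, Optional


def repair_window_size(max_window_size: int, n_symbols_in_flight: int) -> int:
    """Sender side: size of the window protected by a new repair symbol."""
    return min(max_window_size, n_symbols_in_flight)


class ReceiverWindow:
    """Receiver side: keeps symbols with SID in [highest_sid - max_window_size, highest_sid]."""

    def __init__(self, max_window_size: int) -> None:
        self.max_window_size = max_window_size
        self.highest_sid: Optional[int] = None
        self.symbols: Dict[int, bytes] = {}

    def on_source_symbol(self, sid: int, data: bytes) -> None:
        """Store a received source symbol and drop those that left the interval."""
        if self.highest_sid is None or sid > self.highest_sid:
            self.highest_sid = sid
        low = self.highest_sid - self.max_window_size
        if sid >= low:
            self.symbols[sid] = data
        for old in [s for s in self.symbols if s < low]:
            del self.symbols[old]

    def window_in_interval(self, first_sid: int, window_size: int) -> bool:
        """True if the window announced by a repair symbol sits inside the interval."""
        if self.highest_sid is None:
            return False
        low = self.highest_sid - self.max_window_size
        last_sid = first_sid + window_size - 1
        return first_sid >= low and last_sid <= self.highest_sid
```

A repair symbol whose announced window fails `window_in_interval` protects symbols the receiver no longer stores, which is exactly the case where a lost FEC_WINDOW update makes REPAIR frames useless.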
Speaking of multipath: I had hopes at some point to be able to define
FEC-dedicated paths, even if the FEC-dedicated path is running on the
same network path (i.e. like a "virtual path" concept). Let me explain:
frames protected by FEC could be sent on packets over the FEC-dedicated
path only and frames not protected by FEC would be sent on another path.
Both paths could actually use the same network path, but this would
naturally decouple FEC-protected frames from the others and remove
the need for an SID frame (and save space in packets!) if we ensure the
packet numbers are sent contiguously, as we could use packet numbers as
SIDs. I know contiguous packet numbers are not how QUIC works, and this
idea may not be the ideal solution, but I describe it here as it might
spark clever ideas from you or other folks reading this thread.
OK, that's a lot of text. Some of that may be because I did not fully
understand your intent. I expect things to get clearer with your next
draft, or when we start interop testing of different implementations...
Sorry if the draft is still in a bit of a rough form! We are currently
concentrating on getting implems and papers published (publishing FEC work
is a *real* effort: as the ideas behind FEC are old, it is difficult to
convince reviewers of the novelty... :-) and that's fine but it takes time)
I will certainly integrate your comments in the next version(s), they
are all valuable and totally on point! Thank you so much for that.
Waiting to work on that!
Same, after all these years! Maybe Brisbane is too early to host a
hackathon table (especially since I don't know how many folks would be
interested), but that might be something we could do at upcoming IETFs.
Once I get my work published, I can also release my FEC encoder/decoder
lib, which has C bindings and could be really valuable for interop as well.
Thank you again!
Cheers,
François
-- Christian Huitema