François, Olivier,
I just spent some time studying your draft on QUIC FEC. I like the idea
of having an FEC framework independent from the algorithm used to
actually compute the FEC data and repair packets. Your draft solves a
number of practical problems, such as how to notify peers when FEC helps
receive a frame from an otherwise lost packet, or how to identify
"symbols" independently of packet numbers using the symbol identifier
frame (SID).
The draft is obviously a work in progress.
You propose two alternatives for linking frames to a SID. I wish you
picked just one, and I prefer your first alternative, in which the SID
frame brackets a list of protected frames. However, I am not quite sure
how this should be parsed. You give an example as:
| Pkt(6)[STREAM(2, "xyz"), |
| SOURCE_SYMBOL(1, { STREAM(8, "def"), |
| DATAGRAM("msg") }] |
In that example, the frames STREAM(8..) and DATAGRAM() are protected,
while the "STREAM(2)" is not. Fine, but the syntax is described as:
SOURCE_SYMBOL {
SID (i),
FEC Protected Payload (..)
}
... and I don't know how to parse that. There is no indication of the
length of the "FEC Protected Payload". Do you mean to indicate that the
SOURCE_SYMBOL frame extends to the end of the packet, and that all
frames following the SID are protected?
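To make the question concrete, here is a minimal Python sketch of how I would parse the frame if the answer is "yes", i.e. if the protected payload has no length field and simply runs to the end of the packet. The frame type value 0x3E and the function names are placeholders of mine, not taken from the draft; only the variable-length integer decoding follows RFC 9000.

```python
from typing import Tuple

# Hypothetical placeholder value; the draft assigns its own frame type.
SOURCE_SYMBOL_TYPE = 0x3E


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a QUIC variable-length integer (RFC 9000, section 16).

    Returns the decoded value and the position just after it.
    """
    first = buf[pos]
    length = 1 << (first >> 6)  # top two bits select 1, 2, 4 or 8 bytes
    value = first & 0x3F
    for b in buf[pos + 1:pos + length]:
        value = (value << 8) | b
    return value, pos + length


def parse_source_symbol(payload: bytes, pos: int) -> Tuple[int, bytes]:
    """Parse a SOURCE_SYMBOL frame starting at `pos` in a packet payload.

    Assumption: there is no length field, so everything after the SID
    up to the end of the packet is the FEC Protected Payload.
    """
    frame_type, pos = decode_varint(payload, pos)
    if frame_type != SOURCE_SYMBOL_TYPE:
        raise ValueError("not a SOURCE_SYMBOL frame")
    sid, pos = decode_varint(payload, pos)
    return sid, payload[pos:]
```

Under that reading, a sender has to place every unprotected frame before the SOURCE_SYMBOL frame, as STREAM(2) is in your example.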
You define a framework in which client and server negotiate to use FEC,
and also to select a FEC scheme. The syntax of your transport parameter
seems a bit restrictive: the client proposes to use FEC and a specific
scheme, and the server accepts or refuses. Given the experimental nature
of FEC, I expect that we will try several algorithms. It would be nice
for the client to propose a list, and for the server to pick one -- or
zero, if it does not support any of the proposed values. In fact, I
think that you could merge the "enable FEC" parameter that negotiates
use of FEC with the "decoder FEC scheme" negotiation.
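The selection rule I have in mind is nothing more than the following sketch. The function name and the use of small integers as scheme identifiers are assumptions for illustration; the draft would define the actual encoding of the list in the transport parameter.

```python
from typing import Optional, Sequence


def select_fec_scheme(client_offer: Sequence[int],
                      server_supported: Sequence[int]) -> Optional[int]:
    """Pick the first scheme in the client's list that the server supports.

    Returns None when there is no common scheme, which in this sketch
    also means that FEC is not enabled for the connection.
    """
    supported = set(server_supported)
    for scheme in client_offer:  # the client lists schemes by preference
        if scheme in supported:
            return scheme
    return None
```

With this shape, an empty or absent list plays the role of the separate "enable FEC" parameter.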
Your draft does not assign identifiers to existing FEC schemes. To
facilitate interop tests, I suggest that you define at least one. In
fact, I would suggest a very simple one, in which the REPAIR frame
identifies a range of SIDs, and then carries the XOR of all packets in
that range.
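As a rough illustration of that simple scheme, here is a Python sketch. It assumes that the repair payload is the XOR of the protected payloads of all symbols in the SID range, with shorter payloads zero-padded at the end; the function names and the tuple layout of the repair are mine, not the draft's.

```python
from typing import Dict, Optional, Tuple

Repair = Tuple[int, int, bytes]  # (first SID, last SID, XOR payload)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, zero-padding the shorter one at the end."""
    n = max(len(a), len(b))
    a = a.ljust(n, b"\x00")
    b = b.ljust(n, b"\x00")
    return bytes(x ^ y for x, y in zip(a, b))


def make_repair(symbols: Dict[int, bytes], first: int, last: int) -> Repair:
    """Build a repair covering the symbols with SID in [first, last]."""
    payload = b""
    for sid in range(first, last + 1):
        payload = xor_bytes(payload, symbols[sid])
    return first, last, payload


def recover(received: Dict[int, bytes],
            repair: Repair) -> Optional[Tuple[int, bytes]]:
    """Recover the single missing symbol in the range, if exactly one is missing.

    The recovered payload is as long as the longest symbol in the range,
    so it may carry trailing zero bytes.
    """
    first, last, payload = repair
    missing = [sid for sid in range(first, last + 1) if sid not in received]
    if len(missing) != 1:
        return None  # nothing to repair, or too many losses for plain XOR
    for sid in range(first, last + 1):
        if sid in received:
            payload = xor_bytes(payload, received[sid])
    return missing[0], payload
```

A scheme this simple repairs only one loss per range, but that should be enough for a first round of interop tests.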
The suggestion above brings a discussion of the relative size of the
"FEC Protected Payload" and the REPAIR frames. As in the example above,
I would expect REPAIR frames to include a small header followed by a
combination of the content of several FEC Protected Payloads, with that
combination being at least as long as the longest FEC Protected Payload
in the set. That longest size, by default, can be a full packet payload
(per PMTU), minus the length of the SID prefix. But that leaves very
little room for encoding the prefix of the REPAIR frame, which is likely
to require at least the REPAIR frame type (arguably the same length as the
SOURCE_SYMBOL frame type), an SID identifying the range (same length as
the SID parameter of the SOURCE_SYMBOL frame), and an additional
parameter indicating the variant of the repair frame according to the
selected scheme (arguably the same length as the coding window). Is that
the problem that you are discussing in section 4.2.3? Should there be
some property associated with the FEC scheme, such as the maximum overhead
of a REPAIR frame? (Also, why pad the FEC-protected data at the
beginning rather than at the end? Or leave that as a property of the FEC
scheme?)
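The size constraint can be put in numbers with a small sketch. Everything here is an assumption for illustration: the 1200-byte budget, the frame type value, and the idea that the scheme declares a fixed number of extra REPAIR header bytes. Only the variable-length integer sizes come from RFC 9000.

```python
def varint_len(value: int) -> int:
    """Number of bytes needed to encode a QUIC variable-length integer."""
    if value < 1 << 6:
        return 1
    if value < 1 << 14:
        return 2
    if value < 1 << 30:
        return 4
    if value < 1 << 62:
        return 8
    raise ValueError("value too large for a QUIC varint")


def max_protected_payload(packet_budget: int, frame_type: int,
                          sid: int, repair_extra: int) -> int:
    """Largest protected payload such that the REPAIR frame still fits.

    Assumes the REPAIR header is the frame type, one SID, and
    `repair_extra` bytes of scheme-specific parameters, and that the
    repair payload is as long as the longest protected payload.
    """
    source_overhead = varint_len(frame_type) + varint_len(sid)
    repair_overhead = source_overhead + repair_extra
    return packet_budget - max(source_overhead, repair_overhead)
```

For example, with a 1200-byte budget, a one-byte frame type, SID 1000 (two bytes) and two extra REPAIR bytes, the protected payload is capped at 1195 bytes rather than the 1197 that the SOURCE_SYMBOL header alone would allow. A sender needs to know that cap before it builds the first protected packet, which is why I think it belongs with the scheme definition.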
I am not sure that I fully understand how to use the FEC WINDOW frame.
You allow it to change, but what if the packet containing that frame is
lost? How can the peer know when exactly the use of the new window
starts, and which window is associated with a particular SOURCE_SYMBOL
or REPAIR frame?
Reed-Solomon codes are often characterized by two numbers, the length of
the coding window and the number of redundant copies -- in our case, the
number of REPAIR frames for a given coding window. It seems that in your
proposal these two numbers are set arbitrarily by the sender. Should
there be some negotiation of maximum values? Or would those maximum
values be deduced from the scheme identifier, something like "reed
solomon 32 + 8"? Or should the "repair" frame indicate the length of the
coding window over which it operates?
I am also not sure how the update of the coding window works for a
convolutional code.
One way to understand the coding window is "the number of frames over
which a given REPAIR may operate," but we are concerned with correlated
losses happening in trains. To protect against that, it is nice to send
the repair frames some time after the protected frames, in which case
the window would be an indication of how long a copy of a given frame has
to be kept. This could be expressed as a number of packets, but
if multipath is supported we may want to send the repairs on a different
path, and then using number of packets is not natural.
OK, that's a lot of text. Some of that may be because I did not fully
understand your intent. I expect things to get clearer with your next
draft, or when we start interop testing of different implementations...
Waiting to work on that!
-- Christian Huitema