(I've been meaning to write this for a few days, but the IANA review prompted 
me to sit down and work through it.)

I have come to believe that we are reserving too many codepoints for greasing.

The greasing function we currently use is 27+N*31.  That's about 2^57 values.  
148764065110560896 by my estimation.

The waste I can tolerate.  This is a very large number, but it leaves plenty of 
values free for use.  The problem is that the odds of collision are too low.

This problem was highlighted when draft-iyengar-quic-delayed-ack-02 chose a 
greased codepoint.  The failure mode there is terrible: occasionally an 
implementation will generate a greased value that collides with this value.  If 
an implementation of that draft encounters that collision it will fail.  It 
will probably fail when trying to parse the transport parameter, but it could 
instead fail when it receives the frame.  Alternatively, is also plausible that 
an implementation of the draft could include a greased value with the same 
codepoint, which would be similarly bad.  Such failures are virtually 
impossible to reproduce and unless you are lucky enough to have good logging, 
you might have trouble catching the reason for the error.

The odds that an implementation of that draft would encounter a peer who used a 
greased codepoint with the same value is minuscule.  There are around 2^57 
valid greasing values, so even if every connection ever included greased 
values, the odds of a given connection containing that value is tiny. 
(Coincidentally, this is the same as the probability we allow for an attacker 
to attack the packet protection AEAD.)

I think that we want a higher risk of collision than that.  Then mistakes like 
this will result in a higher probability of failure when deployed.  Ideally, 
failures will be discovered in testing.

The reason we have a non-trivial number of reserved codepoints is to avoid 
having implementations that filter out these reserved values by just 
enumerating them.  However, for that we don't need so many reserved values.  
Even as little as 1000 will create a disincentive to tabulate the values.  That 
many would add a large 8k table to a simple implementation.

If we were to select 1000 (or 1024) values in a way that is relatively simple 
to generate from a random number, then that would be better than the current 
situation.

So I'm going to propose that we define a new function that takes a 10-bit 
random value and produces a greasing codepoint.  This can be used to produce a 
greasing codepoint.  I have two requirements:

* It should be a tiny bit more complex to implement a function that tests a 
value for whether it is reserved.  The current scheme is easy to filter.  (X - 
27) % 31 == 0 isn't much code to write.

* The scheme needs to produce values that encode to 1, 2, and 4 bytes in 
varints with reasonable probability.  An even distribution over 2^62 with 1024 
values has a gap of 2^52 between each value, which means that 2 byte and 4 byte 
values aren't likely to be represented.

The function I've been thinking of looks like this:

def grease(n):
    v = 27
    i = 1
    while n > 0:
        v = (v << i) | (n & 1)
        i = i + 1
        n = n >> 1
    return v

This is bit spreader, taking the bits of the random value and adding 
progressively more space between each bit.

When passed 0x3ff, the resulting value is the 60-bit binary value: 
0b110111010010001000010000010000001000000010000000010000000001.  Smaller values 
result in different 1 bits being turned off.  If it is passed 0x1ff, it 
produces the shorter 50-bit value of 
0b11011101001000100001000001000000100000001000000001.

The result is two values that encode to 1 byte, 14 values that encode to 2 
bytes, 48 values that encode to 4 bytes, with the remaining 960 values encoding 
to 8 bytes.

Detecting this is fairly easy, but also probably not worth writing out the code 
to do so.  This could be made considerably harder by flipping more than one bit 
at a time.  For example, `v = (v << i) ^ ((n & 1) * 19)`.

What do people think?  Is this a change we could tolerate?  Or is my logic 
faulty here in some way?

Reply via email to