On Wed, Mar 25, 2026, at 12:16, Kazuho Oku wrote:
> On Wed, Mar 25, 2026 at 6:09, Martin Thomson <[email protected]>:
>> On Tue, Mar 24, 2026, at 21:06, Kazuho Oku wrote:
>> > * it avoids requiring the decoder of every frame type to support trial 
>> > or incremental decoding to handle truncated input;
>> 
>> As I said, while you might simplify, in doing so you introduce performance 
>> penalties that ultimately lead you back to having to handle incremental 
>> decoding.  That this is an option is potentially valuable and an argument in 
>> favor, but I wouldn't weight it too heavily.
>> 
>> However, frame decoding is such a small part of an implementation that I see 
>> this as less of a problem than you are making out.
>> 
>> > * the overhead is identical when sending large data; and
>> 
>> Not entirely correct, but it's very close.  When the length spans more (the 
>> "record" length always includes at least a STREAM frame header), there is a 
>> chance that the varint needs more bytes.  So it will be ever so slightly 
>> higher overhead.
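
As a concrete aside on that overhead point, assuming QUIC-style variable-length 
integers (RFC 9000); the helper below is a hypothetical illustration, not code 
from any QMux draft:

```python
# Illustration: QUIC-style varints use 1, 2, 4, or 8 bytes depending on
# the value.  A "record" length that must also cover a frame header can
# cross an encoding threshold that the payload length alone would not.

def varint_len(v: int) -> int:
    """Bytes needed to encode v as a QUIC variable-length integer."""
    if v < 2**6:
        return 1
    if v < 2**14:
        return 2
    if v < 2**30:
        return 4
    if v < 2**62:
        return 8
    raise ValueError("value too large for a varint")
```

For example, a 16383-byte payload length fits in a 2-byte varint, but adding a 
few bytes of STREAM frame header pushes the record length past 16383, where a 
4-byte varint is required.
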
>> 
>> > * it naturally reduces the risk of blocking caused by frames crossing 
>> > TLS record boundaries.
>> 
>> I don't think this is right.  If you are blocking on the "record" being 
>> complete, any problem will be *worse*, not better if the "record" spans more 
>> bytes.
>
> Could you clarify why?
>
> My point is not that buffering disappears with records. Rather, my 
> point is that introducing records greatly reduces the likelihood of 
> STREAM frames spanning multiple TLS records, [...]

I don't find that reasoning compelling.  If you are only sending STREAM frames, 
the same applies whether the length field is before the frame type or after.

My reasoning is somewhat simpler.

If you send X bytes, that will be split into segments.  That means that you 
will get either a complete TLS record or not, because some segments are missing 
or delayed.  At the TLS layer, it's all or nothing: you can't release bytes 
until the entire record is present.
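
To make the all-or-nothing behaviour concrete, here is a minimal sketch (the 
class and the 5-byte TLS-style header are simplifications, not taken from any 
real implementation) of a receive buffer that releases record bodies only once 
they are complete:

```python
# Sketch: a TLS-like record layer releases plaintext only when the whole
# record body has arrived.  A partial record yields nothing, no matter
# how many of its bytes are already buffered.

class RecordBuffer:
    HEADER_LEN = 5  # TLS record header: type(1) + version(2) + length(2)

    def __init__(self):
        self.buf = bytearray()

    def feed(self, segment: bytes) -> list[bytes]:
        """Append a received segment; return any complete record bodies."""
        self.buf += segment
        out = []
        while len(self.buf) >= self.HEADER_LEN:
            body_len = int.from_bytes(self.buf[3:5], "big")
            total = self.HEADER_LEN + body_len
            if len(self.buf) < total:
                break  # record incomplete: release nothing from it
            out.append(bytes(self.buf[self.HEADER_LEN:total]))
            del self.buf[:total]
        return out
```

Feeding the first half of a record returns nothing; only the segment that 
completes the record causes any bytes to be released.
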

In that case, if you have a 1:1:1:1 write:frame:QMux record:TLS record ratio, 
there is zero difference between the options.  Where things break down is on 
the last pairing.

If QMux records are split across TLS records, then to realize the benefits you 
are touting, you need to wait until the next TLS record is present.  However, 
if you are willing to process incrementally, you can process the partial STREAM 
frame in either model.  The record-based model, which encourages 
implementations to wait for entire records, remains blocked if they take that 
option.

In the version without QMux records, you can also process the WINDOW_UPDATE 
frames or whatever control frames are sent ahead of the record split.  You more 
or less have to without records.  In the record-based version, implementations 
will block, because that's easier to implement, so there is no performance 
advantage.  The bytes are there, but implementations won't bother reading them 
off because that requires incremental frame decoding.
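
For what it's worth, the incremental approach is not much code.  A sketch, with 
a made-up frame layout (1-byte type, 1-byte length; the real QMux encoding will 
differ): every complete frame, including control frames sent ahead of the 
split, comes out immediately, while the truncated frame stays buffered:

```python
# Sketch of incremental frame decoding with a hypothetical frame layout:
# type (1 byte), length (1 byte), payload.  Complete frames are consumed
# as soon as they are available, so a WINDOW_UPDATE ahead of a STREAM
# frame that is split across TLS records can be acted on without waiting
# for the rest of that record.

WINDOW_UPDATE, STREAM = 0x01, 0x02

def decode_frames(buf: bytearray) -> list[tuple[int, bytes]]:
    """Consume complete frames from buf; leave any trailing partial frame."""
    frames = []
    while len(buf) >= 2:
        ftype, flen = buf[0], buf[1]
        if len(buf) < 2 + flen:
            break  # partial frame: keep it buffered, return what we have
        frames.append((ftype, bytes(buf[2:2 + flen])))
        del buf[:2 + flen]
    return frames
```

Given a buffer holding a complete WINDOW_UPDATE followed by a truncated STREAM 
frame, the decoder returns the WINDOW_UPDATE and leaves the partial STREAM 
bytes in place for the next read.
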

I acknowledge that this is based on an assumption that the easier 
implementation option will be taken.  And the cost of that might not be 
realized.  Maybe the happy path is what most people experience.  But this 
benefit applies to the non-record-based version as well.

Of course, one thing you *can* bet on is that performance bugs that can happen, 
will happen.  Murphy has a lot of relevance in that domain.

> This in turn means that whenever a QMux-over-TLS receiver decrypts a 
> TLS record, a complete QMux record becomes available.

That's not a guarantee; it's just a likely outcome.

> As a result, additional latency due to QMux-layer buffering is avoided, 
> even if the receiver does not implement trial or incremental processing.

My point is that the additional latency is all in the QMux record layer, 
because it encourages the easier implementation.  Either model can take 
advantage of the approach you describe equally.  So yes, that argument doesn't 
work for me.
