On Wed, Mar 25, 2026, at 12:16, Kazuho Oku wrote:
> On Wed, Mar 25, 2026 at 6:09, Martin Thomson <[email protected]> wrote:
>> On Tue, Mar 24, 2026, at 21:06, Kazuho Oku wrote:
>> > * it avoids requiring the decoder of every frame type to support trial
>> > or incremental decoding to handle truncated input;
>>
>> As I said, while you might simplify, in doing so you introduce
>> performance penalties that ultimately lead you back to having to handle
>> incremental decoding. That this is an option is potentially valuable and
>> an argument in favor, but I wouldn't weight it too much.
>>
>> However, frame decoding is such a small part of an implementation that I
>> see this as less of a problem than you are making out.
>>
>> > * the overhead is identical when sending large data; and
>>
>> Not entirely correct, but it's very close. When the length covers more
>> bytes (the "record" length always includes at least a STREAM frame
>> header), there is a chance that the varint needs more bytes. So it will
>> be ever so slightly higher overhead.
>>
>> > * it naturally reduces the risk of blocking caused by frames crossing
>> > TLS record boundaries.
>>
>> I don't think this is right. If you are blocking on the "record" being
>> complete, any problem will be *worse*, not better, if the "record" spans
>> more bytes.
>
> Could you clarify why?
>
> My point is not that buffering disappears with records. Rather, my
> point is that introducing records greatly reduces the likelihood of
> STREAM frames spanning multiple TLS records, [...]
I don't find that reasoning compelling. If you are only sending STREAM frames, the same applies whether the length field is before the frame type or after.

My reasoning is somewhat simpler. If you send X bytes, that will be split into segments. That means you will get either a complete TLS record or not, because some segments are missing or delayed. At the TLS layer, it's all or nothing: you can't release bytes until the entire record is present.

In that case, if you have a 1:1:1:1 write:frame:QMux record:TLS record ratio, there is zero difference between the options. Where things break down is in the last pairing. If QMux records are split across TLS records, then to realize the benefits you are touting, you need to wait until the next TLS record is present. However, if you are willing to process incrementally, you can process the partial STREAM frame in either model. The record-based model, which encourages implementations to wait for entire records, remains blocked if they take that option.

In the version without QMux records, you can also process the WINDOW_UPDATE frames, or whatever control frames are sent ahead of the record split. You more or less have to without records. In the record-based version, implementations will block, because that's easier to implement, so there is no performance advantage. The bytes are there, but implementations won't bother reading them, because that requires incremental frame decoding.

I acknowledge that this is based on an assumption that the easier implementation option will be taken, and that the cost of that might not be realized. Maybe the happy path is what most people experience. But that benefit applies to the non-record-based version as well. Of course, one thing you *can* bet on is that performance bugs that can happen, will happen. Murphy has a lot of relevance in that domain.

> This in turn means that whenever a QMux-over-TLS receiver decrypts a
> TLS record, a complete QMux record becomes available.
That's not a guarantee, it's just a likely outcome.

> As a result, additional latency due to QMux-layer buffering is avoided,
> even if the receiver does not implement trial or incremental processing.

My point is that the additional latency is all in the QMux record layer, because it encourages the easier implementation. Either model can take advantage of the approach you describe equally. So yes, that argument doesn't work for me.
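To make the overhead point above concrete: assuming a QUIC-style variable-length integer encoding (an assumption for illustration; the thread doesn't specify QMux's encoding), a record length that also covers a STREAM frame header can cross a varint size boundary that the payload length alone would not. A minimal sketch:

```python
def varint_len(v: int) -> int:
    # Size in bytes of a QUIC-style varint (RFC 9000, Section 16).
    if v < 2**6:
        return 1
    if v < 2**14:
        return 2
    if v < 2**30:
        return 4
    if v < 2**62:
        return 8
    raise ValueError("value too large for a varint")

# Hypothetical numbers: a 2-byte STREAM frame header, and a payload whose
# length alone encodes in 2 bytes but whose length-plus-header needs 4.
header = 2
payload = 2**14 - 1                    # 16383 fits in a 2-byte varint
print(varint_len(payload))             # 2
print(varint_len(payload + header))    # 4: the record length crossed the boundary
```

This only bites near the encoding boundaries, which is why the overhead difference is "very close" to zero rather than exactly zero.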
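The incremental-processing option discussed above can be sketched as follows. The frame layout here (1-byte type, 2-byte big-endian length, then payload) is invented purely for illustration, not QMux's actual wire format; the point is that complete frames ahead of the split point are released immediately, while a partial trailing frame stays buffered until more bytes arrive.

```python
import struct

def drain_frames(buf: bytearray):
    """Pop all complete frames off the front of buf; leave any partial tail."""
    frames = []
    while len(buf) >= 3:
        ftype = buf[0]
        flen = struct.unpack_from("!H", buf, 1)[0]
        if len(buf) < 3 + flen:
            break  # partial frame: keep the tail, but don't block earlier frames
        frames.append((ftype, bytes(buf[3:3 + flen])))
        del buf[:3 + flen]
    return frames

buf = bytearray()
buf += b"\x02\x00\x04WUWU"      # a complete control frame (e.g. a window update)
buf += b"\x01\x00\x0agoodbye"   # a STREAM frame claiming 10 bytes, only 7 present
ready = drain_frames(buf)
print(ready)      # [(2, b'WUWU')]: the control frame is usable right away
print(len(buf))   # 10: the partial STREAM frame stays buffered

buf += b"!!!"                   # the missing bytes arrive in the next TLS record
print(drain_frames(buf))        # [(1, b'goodbye!!!')]
```

Either model can decode this way; the argument in the thread is about which model implementations will bother to do it in.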
