On Thu, Oct 16, 2025 at 10:05:13PM -0700, Christian Huitema wrote:
> By the way, there is another issue than just "the receiver cannot cope".
> Packets for a stream may be received out of order. The stack can only
> deliver them to the application in order. Suppose that the stack has
> increased "max stream data" to allow a thousand packets on the stream. If
> the first packet is lost, the stack may have to buffer 999 packets until it
> receives the correction. So there is a direct relation between "max stream
> data" times number of streams and the max memory that the stack will need.
> On a small device, one has to be cautious. And if you have a memory budget,
> then it makes sense to just enforce it using "max data".
In H2 (which doesn't have the out-of-order issue), I had been working on a complex algorithm to try to determine how much to advertise so as to keep the minimum amount of data between the demux and the application layer. That was particularly complex (measuring the ordering of queue/dequeue operations) and looked a bit like a congestion control algorithm. Then I figured that everything could easily fall apart with bursty parallel streams, and that it was actually much easier to simply allocate a budget for the whole connection and assign a part of it to the streams that are present.

I don't remember the exact details, but basically a large percentage of the budget was shared equally between the streams that were expected to receive data, and a small part was kept for future streams. This allows new streams to work (albeit possibly not fast) when others are already receiving data; but as soon as a new stream also needs to receive data, the other ones see their share reduced and won't refill their rx window until consumption brings it back below that new share. While I found that pretty naive, it happened to work surprisingly well, allowing us to multiply single-POST performance by 30 or so, and to get rid of HOL blocking when merging multiple downloading client connections to the same server over H2.

Finally we managed to port it to QUIC as well (with some adaptations that I don't remember). I only remember that it was harder with QUIC since you cannot benefit from the TCP stack's ability to compensate for your excessive advertisements, but overall it's OK. The key here is to never allocate everything to a given stream, so that new ones still have something to start with, and to let the distribution rebalance by itself as transfers progress.
In the end I simply gave up on the initial design, which would probably only have allowed us to save extra memory in optimal cases; and as you said above, with out-of-order delivery you don't gain anything anymore if you have to buffer the last 999/1000 of the contents while waiting for the first one to arrive before delivering it.

Willy
