On Fri, Oct 17, 2025 at 2:05 PM Christian Huitema <[email protected]> wrote:

>
> On 10/16/2025 8:44 PM, Kazuho Oku wrote:
> > On Thu, Oct 16, 2025 at 11:17 AM Ian Swett <[email protected]> wrote:
> >
> >>
> >> On Tue, Oct 14, 2025 at 9:28 PM Kazuho Oku <[email protected]> wrote:
> >>
> >>>
> >>> On Tue, Oct 14, 2025 at 11:45 PM Ian Swett <[email protected]> wrote:
> >>>
> >>>> Thanks for bringing this up, Kazuho.  My back-of-the-envelope math
> >>>> also indicated that 1/3 was a better value than 1/2 when I looked
> >>>> into it a few years ago, but I never constructed a clean test to
> >>>> prove it with a real congestion controller.  Unfortunately, our
> >>>> congestion control simulator is below our flow control layer.
> >>>>
> >>>> It probably makes sense to test this in the real world and see if
> >>>> reducing it to 1/3 measurably reduces the number of blocked frames
> >>>> we receive on our servers.
> >>>>
> >>> Makes perfect sense. In fact, that was how we noticed the problem —
> >>> someone asked us why the H3 traffic we were serving was slower than H2.
> >>> Looking at the stats, we saw that we were receiving blocked frames, and
> >>> ended up reading the client-side source code to identify the bug.
> >>>
> >>>
> >>>> There are use cases where auto-tuning is nice.  Even for Chrome,
> >>>> there are cases where starting with a smaller stream flow control
> >>>> window would have avoided some bugs where a few streams consume the
> >>>> entire connection flow control window.
> >>>>
> >>> Yeah, it can certainly be useful at times to block the sender’s
> >>> progress so that resources can be utilized elsewhere.
> >>>
> >>> That said, blocking Slow Start from making progress is a different
> >>> matter, especially after spending so much effort developing QUIC based
> >>> on the idea that reducing startup latency by one RTT is worth it.
> >>>
> >>>
> >> I completely agree.  Is there a good heuristic for guessing whether the
> >> peer is still in slow start, particularly when one doesn't know what
> >> congestion controller they're using?
> >>
> > IIUC, the primary intent of auto-tuning is to avoid bufferbloat when the
> > receiving application is slow to read.
> >
> > The intent makes perfect sense, but I’m under the impression that the
> > "old" approach - estimating the sender’s rate and trying to stay
> > slightly ahead of it - is showing its age.
>
> It really depends on what you want to achieve. As you mention later, it
> depends whether the QUIC stack delivers data to the application via a
> queue or via a callback. If using a callback, by definition the stack
> will not keep a queue of packets "received but not delivered". Any
> buffer bloat will happen in the application. If it cannot keep up with
> the rate at which the peer is sending, it will have to either buffer
> unprocessed data or drop it on the floor. And if you don't want that to
> happen, then you need to either do a lot of guesswork and predict how
> fast the application will process the data, or instead just provide an
> API to the application to control that.


I mostly agree. What I am saying is that doing such things is better than
the old approach of trying to stay slightly ahead of the peer’s send rate,
because that approach is fragile at the connection level and does not work
at the stream level.

And the guesswork does not need to be complicated.

One simple way is to calculate a rough estimate of the transfer rate
during the first round trip and set the receive window from that value,
instead of trying to stay slightly ahead of Slow Start. Then, over the
following round trips, adjust the receive window gradually - just as you
would have done in the old approach.
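
To illustrate, here is a minimal sketch of that idea in C. It is not code
from any particular stack; the names and the growth factors are assumptions
picked for illustration:

#include <stdint.h>

typedef struct {
    uint64_t window;         /* advertised MAX_STREAM_DATA allowance */
    uint64_t bytes_this_rtt; /* bytes received in the current round trip */
    int first_rtt_done;
} rx_window_t;

/* Called for every chunk of stream data received. */
static void on_stream_data(rx_window_t *w, uint64_t len)
{
    w->bytes_this_rtt += len;
}

/* Called once per round trip. */
static void on_rtt_elapsed(rx_window_t *w)
{
    if (!w->first_rtt_done) {
        /* First round trip: take the rough rate estimate and size the
         * window with enough headroom not to block Slow Start; the
         * factor 8 is an assumed safety margin. */
        uint64_t estimate = w->bytes_this_rtt * 8;
        if (estimate > w->window)
            w->window = estimate;
        w->first_rtt_done = 1;
    } else if (w->bytes_this_rtt * 2 > w->window) {
        /* Later round trips: grow gradually, staying slightly ahead of
         * the observed rate, as in the old approach. */
        w->window = w->bytes_this_rtt * 2;
    }
    w->bytes_this_rtt = 0;
}

The exact factors matter less than the structure: measure once during the
first round trip, then fall back to gradual adjustment.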


> That's why I ended up
> implementing a per-stream API in picoquic to let the application either
> just process data as it comes (the default), or take control and open
> the "max stream data" parameter as it needs.
>
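
A hypothetical shape for such a per-stream knob (these names are mine, not
picoquic's actual API):

#include <stdint.h>

typedef struct st_stream stream_t;

/* Default: the stack extends "max stream data" on its own as the
 * application consumes data from the receive callback. */
void stream_use_auto_window(stream_t *s);

/* Manual: the application takes control and extends the window
 * explicitly whenever it is ready to accept more data. */
void stream_use_manual_window(stream_t *s);
void stream_extend_max_stream_data(stream_t *s, uint64_t additional);
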
> By the way, there is another issue besides "the receiver cannot cope".
> Packets for a stream may be received out of order, but the stack can
> only deliver them to the application in order. Suppose that the stack
> has increased "max stream data" to allow a thousand packets on the
> stream. If the first packet is lost, the stack may have to buffer 999
> packets until the lost packet is retransmitted. So "max stream data"
> times the number of streams directly bounds the maximum memory that the
> stack will need. On a small device, one has to be cautious. And if you
> have a memory budget, then it makes sense to just enforce it using
> "max data".
>
> -- Christian Huitema
>
>
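
To put rough numbers on that relation: with "max stream data" allowing
1,000 packets of ~1,200 bytes each (sizes assumed just for illustration),
the reassembly buffer can reach ~1.2 MB per stream, so 100 concurrent
streams could require ~120 MB in the worst case; capping "max data" at the
memory budget bounds this directly.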

-- 
Kazuho Oku
