I ran into an unexpected interaction between DLPMTUD and QUIC's idle timeout in my implementation. I've implemented the "Probing using padding data" method described in the DLPMTUD draft: I send a single PING frame and then use PADDING frames to bring the probe packet up to the desired size.
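To make the probe layout concrete, here is a minimal sketch in Go of how such a payload could be assembled. The function name buildProbePayload and the size of 1400 bytes are made up for illustration and are not taken from any particular implementation. The sketch also assumes that size is the frame payload only; a real implementation would have to account for the packet header and AEAD overhead.

```go
package main

import "fmt"

const (
	framePadding byte = 0x00 // QUIC PADDING frame type
	framePing    byte = 0x01 // QUIC PING frame type
)

// buildProbePayload is a hypothetical helper. It returns a frame payload of
// exactly size bytes: one PING frame followed by PADDING frames.
func buildProbePayload(size int) ([]byte, error) {
	if size < 1 {
		return nil, fmt.Errorf("probe payload needs at least 1 byte, got %d", size)
	}
	// make() zero-fills the slice, and 0x00 is the PADDING frame type,
	// so only the leading PING frame has to be written explicitly.
	payload := make([]byte, size)
	payload[0] = framePing
	return payload, nil
}

func main() {
	p, err := buildProbePayload(1400) // 1400 is an arbitrary example size
	if err != nil {
		panic(err)
	}
	fmt.Printf("len=%d first=%#x last=%#x\n", len(p), p[0], p[len(p)-1])
}
```

The PING frame is what makes the packet ack-eliciting; PADDING frames alone would not elicit an acknowledgment.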

Assume that you're sending MTU probe packets on a timer, and that the timer fires a long time after the connection became idle (but before the idle timeout). Endpoint A will therefore send a probe packet to its peer, B. Since the probe packet is ack-eliciting, this resets A's idle timer. Now suppose the probe packet is too large for the path and is dropped. B's idle timer is then not reset, and A and B will have a large disagreement about the start and end of the idle period.

The root cause of this disagreement is that MTU probe packets are treated differently from other ack-eliciting packets from a loss recovery standpoint: they are not retransmitted; instead, their loss is interpreted as a signal that the path doesn't support that particular MTU.

Note that having A not reset its idle timer when sending a probe packet doesn't solve the problem, because there's another case where this fails. Assume this time that the probe packet is received by B, but the ACK for that packet is lost. Now B will have reset its idle timer when it received the probe packet, but A will not have, leading again to a large disagreement about the start and end of the idle period, this time in the other direction.

I can see multiple solutions to this:

1. Don't send MTU probe packets on a timer. Only send them when application data is sent. This avoids sending packets during periods of quiescence (which might be good, since it doesn't wake up the network interface), but it also means that we're not using those periods of quiescence, when plenty of congestion window is available.

2. Retransmit the PING frame from the probe packet in a normal-size packet until it is acknowledged. This is sad, since it causes an additional packet to be sent every time a probe packet is lost.

Any thoughts on how to best deal with this?

Cheers,
Marten
