On 3/3/25 10:19, Martin Morgenstern via dev wrote:
> This is a robustness improvement for a specific case where very long
> ovn-controller iterations (about 20 s) and long JSONRPC message queues
> in the ovsdb client synchronization layer lead to unanswered echo
> requests that in turn lead to connections being dropped.
> 
> In such a case, the echo request is "stuck" in the incoming message
> queue and might not be processed in time, because we process everything
> in small batches.
> 
> Thus, instead of waiting until we can process an incoming echo request,
> we remember our last send activity and preemptively send an echo reply
> when needed.
> 
> Signed-off-by: Martin Morgenstern <martin.morgenst...@cloudandheat.com>
> ---

Hi, Martin.  Thanks for the set!

Regarding this particular change, though, I don't think we should do it.
Generating unsolicited echo replies defeats one of the reasons those probes
exist in the first place.  We need to be able to check that the other
side receives our messages, and if the other side just generates replies
periodically, we lose that ability.  So, if the connection is half-fenced
(packets can go one way, but not the other), we'll be sending echo requests
and will receive echo replies even though the request or any other data
is not able to reach the client.  I've seen such issues in real-world OVN
setups.  This condition must be detectable with the probe.
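
To illustrate, here is a minimal Python sketch of a half-fenced link.  It
is a toy model, not OVS code.  `Channel` and `probe_detects_half_fence` are
hypothetical names, and the messages only loosely follow the OVSDB JSON-RPC
echo format.  It assumes that any message received from the peer counts
as activity for the probe, and that the client sends nothing except echo
replies.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    """One direction of a connection; silently drops messages when down."""
    up: bool = True
    queue: list = field(default_factory=list)

    def send(self, msg: dict) -> None:
        if self.up:
            self.queue.append(msg)

    def drain(self) -> list:
        msgs, self.queue = self.queue, []
        return msgs


def probe_detects_half_fence(unsolicited_replies: bool, probes: int = 3) -> bool:
    """Simulate `probes` probe intervals on a half-fenced link.

    The server->client direction is broken; client->server still works.
    Returns True if the server's inactivity probe declares the link dead.
    Assumption (toy model): any message from the client counts as activity.
    """
    to_client = Channel(up=False)  # requests never reach the client
    to_server = Channel(up=True)   # whatever the client sends does arrive

    for _ in range(probes):
        # Server: nothing heard for a whole interval, so send an echo request.
        to_client.send({"method": "echo", "params": [], "id": "echo"})

        # Client: reply to every echo request it actually received...
        for msg in to_client.drain():
            if msg.get("method") == "echo":
                to_server.send({"result": msg["params"], "error": None,
                                "id": msg["id"]})
        # ...or, as the patch proposes, reply preemptively on a timer.
        if unsolicited_replies:
            to_server.send({"result": [], "error": None, "id": "echo"})

        # Server: silence for a whole interval means the link is dead.
        if not to_server.drain():
            return True
    return False


print(probe_detects_half_fence(unsolicited_replies=False))  # True
print(probe_detects_half_fence(unsolicited_replies=True))   # False
```

With replies sent only in response to received requests, the broken
direction is noticed on the first probe.  With timer-driven replies the
server keeps hearing from the client and never notices that nothing it
sends gets through.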

If the application can't process all the messages in time, the application
should set a higher probe interval, so that it can reply in time.  If it
cannot keep up with the messages and the incoming queue keeps growing, that
needs to be fixed on the application side as well, since the application
will never be up to date and will fall further behind over time.
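
A rough back-of-the-envelope model of this, as a Python sketch.  It assumes
a main loop that takes a fixed time per iteration and processes at most a
fixed number of queued messages per iteration in FIFO order.  The function
names, batch size and backlog numbers are hypothetical.  Only the ~20 s
iteration time comes from the patch description.

```python
import math


def echo_reply_delay(backlog: int, batch_size: int, iteration_s: float) -> float:
    """Seconds until an echo request queued behind `backlog` messages is answered.

    Hypothetical model: each iteration takes `iteration_s` seconds and
    handles at most `batch_size` queued messages in arrival order.
    """
    # +1 accounts for the echo request itself.
    iterations = math.ceil((backlog + 1) / batch_size)
    return iterations * iteration_s


def backlog_after(iterations: int, arrivals_per_iter: int, batch_size: int,
                  backlog: int = 0) -> int:
    """Queue length after `iterations` loops with a steady arrival rate."""
    for _ in range(iterations):
        # New messages arrive, then at most `batch_size` of them are handled.
        backlog = max(0, backlog + arrivals_per_iter - batch_size)
    return backlog


print(echo_reply_delay(backlog=120, batch_size=50, iteration_s=20.0))  # 60.0
print(backlog_after(10, arrivals_per_iter=60, batch_size=50))  # 100
print(backlog_after(10, arrivals_per_iter=40, batch_size=50))  # 0
```

In this model, a probe interval shorter than the reply delay drops the
connection no matter how healthy the link is, so the interval has to be
raised.  If arrivals per iteration exceed the batch size, the backlog (and
with it the reply delay) grows every iteration, and no fixed probe interval
is enough.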

I'll try to take a look at the other patches in the set later this week.

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
