On 1 Nov 2024, at 2:23, Ilya Maximets wrote:

> Multiple versions of Libreswan have an issue where the ipsec --start
> command may get stuck forever.  This issue affects many popular
> versions of Libreswan from 4.5 to 4.15, which are shipped in most
> modern distributions.
>
> When ipsec --start gets stuck, ovs-monitor-ipsec hangs and can't do
> anything else, so not only this tunnel but all other tunnels fail to
> start as well.
>
> Add a timeout to the subprocess call, so we do not wait forever.  The
> recently introduced reconciliation process will clean things up and
> will try to re-add this connection later.
>
> Pluto may take a lot of time to process the --start request.  Notably,
> the time depends on the retransmission timeout, which is 60 seconds by
> default.  However, even at high scale, it doesn't take much more than
> that in tests.  So, a 120-second timeout should be a reasonable
> default value.
>
> Note: it is observed in practice that the process doesn't actually
> terminate for a long time, so we can't afford waiting for it.
> That's the main reason why we're not using subprocess.run() with
> the timeout option here (it would wait).  Also, we would have to
> catch the exception anyway.
>
> Reported-at: https://issues.redhat.com/browse/FDP-846
> Signed-off-by: Ilya Maximets <[email protected]>


Changes look good to me.
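
For anyone following the thread without the diff at hand, here is a
minimal sketch of the approach the commit message describes, as I read
it.  This is not the code from the patch: the helper name
run_with_timeout and the (None, "", "") return convention on timeout
are hypothetical, chosen only for illustration.  The 120-second default
is the value from the commit message.

```python
import subprocess
import sys

# Default taken from the commit message; everything else is illustrative.
START_TIMEOUT = 120  # seconds


def run_with_timeout(args, timeout=START_TIMEOUT):
    """Run args and return (returncode, stdout, stderr).

    Hypothetical helper.  On timeout, returns (None, "", "") so the
    caller can treat the command as failed and retry later.
    """
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    try:
        out, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Request termination, but do NOT wait for the child to exit.
        # subprocess.run(timeout=...) would kill and then wait, which
        # is what we want to avoid if the process lingers.
        proc.kill()
        return None, "", ""
    return proc.returncode, out.decode(), err.decode()


if __name__ == "__main__":
    # A child that sleeps longer than the timeout stands in for a
    # stuck command.
    rc, _, _ = run_with_timeout(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        timeout=0.5)
    print("timed out" if rc is None else "exit code %d" % rc)
```

The point of using Popen.communicate() directly is that the caller
regains control as soon as the timeout fires, regardless of how long
the child takes to go away afterwards.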

Acked-by: Eelco Chaudron <[email protected]>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
