On 31 Oct 2024, at 16:02, Ilya Maximets wrote:

> On 10/31/24 12:02, Eelco Chaudron wrote:
>>
>>
>> On 30 Oct 2024, at 14:50, Ilya Maximets wrote:
>>
>>> Multiple versions of Libreswan have an issue where ipsec --start
>>> command may get stuck forever.  This issue affects many popular
>>> versions of Libreswan from 4.5 to 4.15, which are shipped in most
>>> modern distributions.
>>>
>>> When ipsec --start gets stuck, ovs-monitor-ipsec hangs and can't do
>>> anything else, so not olny this one but all other tunnels are also
>>
>> nit: olny -> only
>
> OK.
>
>>
>>> not being started.
>>>
>>> Add a timeout to the subprocess call, so we do not wait forever.  Just
>>> introduced reconciliation process will clean things up and will try to
>>> re-add this connection later.
>>>
>>> Pluto may take a lot of time to process the --start request.  Notably,
>>> the time depends on the retransmission timeout, which is 60 seconds by
>>> default.  However, even at high scale, it doesn't take much more than
>>> that in tests.  So, 120 second timeout should be a reasonable default
>>> value.
>>>
>>> Note: it is observed in practice that the process doesn't actually
>>> terminate for a long time, so we can't afford waiting for it.
>>> That's the main reason why we're not using the subprocess.run() with
>>> a timeout option here (it would wait).  But also, because we'd had to
>>> catch the exception anyway.
>>>
>>> Reported-at: https://issues.redhat.com/browse/FDP-846
>>> Signed-off-by: Ilya Maximets <[email protected]>
>>> ---
>>>  ipsec/ovs-monitor-ipsec.in | 14 ++++++++++++--
>>>  1 file changed, 12 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/ipsec/ovs-monitor-ipsec.in b/ipsec/ovs-monitor-ipsec.in
>>> index 5d4b77bd2..dba855af5 100755
>>> --- a/ipsec/ovs-monitor-ipsec.in
>>> +++ b/ipsec/ovs-monitor-ipsec.in
>>> @@ -84,6 +84,7 @@ exiting = False
>>>  monitor = None
>>>  xfrm = None
>>>  RECONCILIATION_INTERVAL = 15  # seconds
>>> +TIEMOUT_EXPIRED = 37
>>
>> TIEMOUT -> TIMEOUT
>
> Oops. :)
>
>>
>> Also, 37 seems rather random, any reason why this was chosen?
>
> It is random.  I just needed a non-zero number to return as I do
> not want to re-raise the exception and catch it in the caller.
>
> Maybe 137 would be a good number?  We do kill the process and
> 128 + 9 = 137 would be an exit code for a process terminated with
> SIGKILL in Linux.
>
> WDYT?

Sounds good to me.

>>
>>>
>>>
>>>  def run_command(args, description=None):
>>> @@ -96,7 +97,16 @@ def run_command(args, description=None):
>>
>> Maybe add the timeout as an argument, i.e. timeout=120?
>
> It's never used as an argument, but I can see an argument for
> not using a global value in the function from the next patch,
> so I'll change that.
>
>>
>>>      vlog.dbg("Running %s" % args)
>>>      proc = subprocess.Popen(args, stdout=subprocess.PIPE,
>>>                              stderr=subprocess.PIPE)
>>> -    pout, perr = proc.communicate()
>>> +    try:
>>> +        pout, perr = proc.communicate(timeout=120)
>>> +        ret = proc.returncode
>>> +    except subprocess.TimeoutExpired:
>>> +        vlog.warn("Command timed out trying to %s." % description)
>>
>> Maye add the timeout value to the log also?
>
> Maybe "Command timed out after %s seconds trying to %s." ?
>

Looks good to me.

>>
>>> +        pout, perr = b'', b''
>>> +        # Just kill the process here.  We can't afford waiting for it,
>>> +        # as it may be stuck and may not actually be terminated.
>>> +        proc.kill()
>>> +        ret = TIEMOUT_EXPIRED
>>>
>>>      if proc.returncode or perr:
>>>          vlog.warn("Failed to %s; exit code: %d"
>>> @@ -105,7 +115,7 @@ def run_command(args, description=None):
>>>          vlog.warn("stderr: %s" % perr)
>>>          vlog.warn("stdout: %s" % pout)
>>>
>>> -    return proc.returncode, pout.decode(), perr.decode()
>>> +    return ret, pout.decode(), perr.decode()
>>>
>>>
>>>  class XFRM(object):
>>> -- 
>>> 2.46.0
>>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to