On 30 Oct 2024, at 14:50, Ilya Maximets wrote:

> Until now, functions that needed to call external programs like openssl
> or ipsec commands were using subprocess commands directly.  Most of
> these calls had no failure checks or any logging making it hard to
> understand what is happening inside the daemon when something doesn't
> work as intended.
>
> Some commands also had a chance to not read the command output in full.
> That might sound like not a big problem, but in practice it causes
> ovs-monitor-ipsec to deadlock pluto and itself with certain versions of
> Libreswan (mainly Libreswan 5+).  The order of events is following:
>
>  1. ovs-monitor-ipsec calls ipsec status redirecting the output
>     to a pipe.
>  2. ipsec status calls ipsec whack.
>  3. ipsec whack connects to pluto and asks for status.
>  4. ovs-monitor-ipsec doesn't read the pipe in full.
>  5. ipsec whack blocks on write to the other side of the pipe
>     when it runs out of buffer space.
>  6. pluto blocks on sendmsg to ipsec whack for the same reason.
>  7. ovs-monitor-ipsec calls another ipsec command and blocks
>     on connection to pluto.
>
> In this scenario the running process is at the mercy of garbage
> collector and it doesn't run because we're blocked on calling another
> ipsec command.  All the processes are completely blocked and will not
> do any work until ipsec whack is killed.
>
> With this change we're introducing a new function that will be used
> for all the external process execution commands and will read the full
> output before returning, avoiding the deadlock.  It will also log all
> the failures as warnings and the commands themselves at the debug level.
>
> We'll be adding more logic into this function in later commits as well,
> so it will not stay that simple.
>
> Signed-off-by: Ilya Maximets <[email protected]>

Thanks for the series, I did finish reviewing it and did some basic testing.

Please find comments below and in the successive patches.

//Eelco

> ---
>  ipsec/ovs-monitor-ipsec.in | 290 +++++++++++++++++--------------------
>  1 file changed, 131 insertions(+), 159 deletions(-)
>
> diff --git a/ipsec/ovs-monitor-ipsec.in b/ipsec/ovs-monitor-ipsec.in
> index 37c509ac6..4885e048f 100755
> --- a/ipsec/ovs-monitor-ipsec.in
> +++ b/ipsec/ovs-monitor-ipsec.in
> @@ -84,6 +84,28 @@ monitor = None
>  xfrm = None
>
>
> +def run_command(args, description=None):
> +    """ This function runs the process args[0] with args[1:] arguments
> +    and returns a tuple: return-code, stdout, stderr. """
> +
> +    if not description:
> +        description = args[-1]

If the description is not given, it makes more sense to show the actual command 
instead of the last argument.
Or maybe replace the string with “run”, and in the timeout case just print the 
entire command?

> +
> +    vlog.dbg("Running %s" % args)
> +    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
> +                            stderr=subprocess.PIPE)
> +    pout, perr = proc.communicate()
> +
> +    if proc.returncode or perr:
> +        vlog.warn("Failed to %s; exit code: %d"
> +                  % (description, proc.returncode))
> +        vlog.warn("cmdline: %s" % proc.args)
> +        vlog.warn("stderr: %s" % perr)
> +        vlog.warn("stdout: %s" % pout)
> +
> +    return proc.returncode, pout.decode(), perr.decode()
> +
> +

<SNIP>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to