On 10/31/24 12:01, Eelco Chaudron wrote:
>
>
> On 30 Oct 2024, at 14:50, Ilya Maximets wrote:
>
>> Until now, functions that needed to call external programs like openssl
>> or ipsec commands were using subprocess commands directly. Most of
>> these calls had no failure checks or any logging making it hard to
>> understand what is happening inside the daemon when something doesn't
>> work as intended.
>>
>> Some commands also had a chance to not read the command output in full.
>> That might sound like not a big problem, but in practice it causes
>> ovs-monitor-ipsec to deadlock pluto and itself with certain versions of
>> Libreswan (mainly Libreswan 5+). The order of events is following:
>>
>> 1. ovs-monitor-ipsec calls ipsec status redirecting the output
>> to a pipe.
>> 2. ipsec status calls ipsec whack.
>> 3. ipsec whack connects to pluto and asks for status.
>> 4. ovs-monitor-ipsec doesn't read the pipe in full.
>> 5. ipsec whack blocks on write to the other side of the pipe
>> when it runs out of buffer space.
>> 6. pluto blocks on sendmsg to ipsec whack for the same reason.
>> 7. ovs-monitor-ipsec calls another ipsec command and blocks
>> on connection to pluto.
>>
>> In this scenario the running process is at the mercy of garbage
>> collector and it doesn't run because we're blocked on calling another
>> ipsec command. All the processes are completely blocked and will not
>> do any work until ipsec whack is killed.
>>
>> With this change we're introducing a new function that will be used
>> for all the external process execution commands and will read the full
>> output before returning, avoiding the deadlock. It will also log all
>> the failures as warnings and the commands themselves at the debug level.
>>
>> We'll be adding more logic into this function in later commits as well,
>> so it will not stay that simple.
>>
>> Signed-off-by: Ilya Maximets <[email protected]>
>
> Thanks for the series, I did finish reviewing it and did some basic testing.
>
> Please find comments below and in the successive patches.
>
> //Eelco
>
>> ---
>> ipsec/ovs-monitor-ipsec.in | 290 +++++++++++++++++--------------------
>> 1 file changed, 131 insertions(+), 159 deletions(-)
>>
>> diff --git a/ipsec/ovs-monitor-ipsec.in b/ipsec/ovs-monitor-ipsec.in
>> index 37c509ac6..4885e048f 100755
>> --- a/ipsec/ovs-monitor-ipsec.in
>> +++ b/ipsec/ovs-monitor-ipsec.in
>> @@ -84,6 +84,28 @@ monitor = None
>> xfrm = None
>>
>>
>> +def run_command(args, description=None):
>> + """ This function runs the process args[0] with args[1:] arguments
>> + and returns a tuple: return-code, stdout, stderr. """
>> +
>> + if not description:
>> + description = args[-1]
>
> If the description is not given, it makes more sense to show the actual
> command
> instead of the last argument.
> Or maybe replace the string with “run”, and in the timeout case just print the
> entire command?
We do already print the entire command (proc.args). I guess what Roi
suggested before - printing the first argument - makes sense here, but
we'll need to adjust the text of the message.
Maybe we can do:
if not description:
description = "run %s command" % args[0]
So it will turn into "Failed to run ipsec/openssl command; exit code: ..."
And then printing of the whole cmdline.
What do you think? Roi?
>
>> +
>> + vlog.dbg("Running %s" % args)
>> + proc = subprocess.Popen(args, stdout=subprocess.PIPE,
>> + stderr=subprocess.PIPE)
>> + pout, perr = proc.communicate()
>> +
>> + if proc.returncode or perr:
>> + vlog.warn("Failed to %s; exit code: %d"
>> + % (description, proc.returncode))
>> + vlog.warn("cmdline: %s" % proc.args)
>> + vlog.warn("stderr: %s" % perr)
>> + vlog.warn("stdout: %s" % pout)
>> +
>> + return proc.returncode, pout.decode(), perr.decode()
>> +
>> +
>
> <SNIP>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev