On 10/29/24 13:16, Roi Dayan wrote:
>
>
> On 29/10/2024 12:14, Ilya Maximets wrote:
>> Until now, functions that needed to call external programs like openssl
>> or ipsec commands were using subprocess commands directly. Most of
>> these calls had no failure checks or any logging making it hard to
>> understand what is happening inside the daemon when something doesn't
>> work as intended.
>>
>> Some commands also had a chance to not read the command output in full.
>> That might sound like not a big problem, but in practice it causes
>> ovs-monitor-ipsec to deadlock pluto and itself with certain versions of
>> Libreswan (mainly Libreswan 5+). The order of events is following:
>>
>> 1. ovs-monitor-ipsec calls ipsec status redirecting the output
>> to a pipe.
>> 2. ipsec status calls ipsec whack.
>> 3. ipsec whack connects to pluto and asks for status.
>> 4. ovs-monitor-ipsec doesn't read the pipe in full.
>> 5. ipsec whack blocks on write to the other side of the pipe
>> when it runs out of buffer space.
>> 6. pluto blocks on sendmsg to ipsec whack for the same reason.
>> 7. ovs-monitor-ipsec calls another ipsec command and blocks
>> on connection to pluto.
>>
>> In this scenario the running process is at the mercy of garbage
>> collector and it doesn't run because we're blocked on calling another
>> ipsec command. All the processes are completely blocked and will not
>> do any work until ipsec whack is killed.
>>
>> With this change we're introducing a new function that will be used
>> for all the external process execution commands and will read the full
>> output before returning, avoiding the deadlock. It will also log all
>> the failures as warnings and the commands themselves at the debug level.
>>
>> We'll be adding more logic into this function in later commits as well,
>> so it will not stay that simple.
>>
>> Signed-off-by: Ilya Maximets <[email protected]>
>> ---
>> ipsec/ovs-monitor-ipsec.in | 296 +++++++++++++++++--------------------
>> 1 file changed, 134 insertions(+), 162 deletions(-)
>>
>> diff --git a/ipsec/ovs-monitor-ipsec.in b/ipsec/ovs-monitor-ipsec.in
>> index 37c509ac6..19a401609 100755
>> --- a/ipsec/ovs-monitor-ipsec.in
>> +++ b/ipsec/ovs-monitor-ipsec.in
>> @@ -84,6 +84,28 @@ monitor = None
>> xfrm = None
>>
>>
>> +def run_command(args, description=None):
>> + """ This function runs the process args[0] with args[1:] arguments
>> + and returns a tuple: return-code, stdout, stderr. """
>> +
>> + if not description:
>
> the last arg as a description might not give much or confuse?
> maybe it's better to use the first arg i.e. the command ?
Thanks, Roi, for taking a look!
The first argument in most cases is 'ipsec', so it doesn't give
a lot of context. The last argument is used, just because I was
a little lazy and didn't want to add repeating descriptions to
all the calls like this:
run_command(self.IPSEC_AUTO +
["--config", self.IPSEC_CONF,
"--ctlsocket", self.IPSEC_CTL,
"--add",
"--asynchronous", "prevent_unencrypted_gre"])
These ones have a recongnizeable last argument, which is an actual
command name. Most other types of calls do have their description
specified.
So, we could either keep as is, or add "prevent unencrypted gre"
to the commands like the one above, which seems a little repetitive.
But using the first argument doesn't seem to be very useful.
WDYT?
>
>> + description = args[-1]
>> +
>> + vlog.dbg("Running %s" % args)
>> + proc = subprocess.Popen(args, stdout=subprocess.PIPE,
>> + stderr=subprocess.PIPE)
>> + pout, perr = proc.communicate()
>> +
>
> you can skip call to len() in the if statement.
> pout and perr should always be a string empty or not.
> the if statement will work well on perr itself.
Yeah, good point, I'll change that.
>
>> + if proc.returncode or len(perr):
>> + vlog.warn("Failed to %s; exit code: %d"
>> + % (description, proc.returncode))
>> + vlog.warn("cmdline: %s" % proc.args)
>> + vlog.warn("stderr: %s" % perr)
>> + vlog.warn("stdout: %s" % pout)
>> +
>
> Since pout and perr are always strings you don't need
> the or statement.
>
> this should work the same
>
> return proc.returncode, pout, perr
I guess, there were too many verions of this code and they were not
initialized at some point during timeout exception handling in the
patch #5, but they are now, so I should fix this, agreed.
>
> also every caller that uses pout and/or perr
> will do decode().strip().
> so why not return it like so by default?
>
> return proc.returncode, pout.decode().strip(), perr.decode()
Good point. I'd leave the strip() for the caller, but otherwise
it makes sense to decode here, true.
>
>
>> + return proc.returncode, pout or b'', perr or b''
>> +
>> +
Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev