On 31 Oct 2024, at 12:51, Roi Dayan wrote:
> On 31/10/2024 13:31, Ilya Maximets wrote: >> On 10/31/24 12:01, Eelco Chaudron wrote: >>> >>> >>> On 30 Oct 2024, at 14:50, Ilya Maximets wrote: >>> >>>> Until now, functions that needed to call external programs like openssl >>>> or ipsec commands were using subprocess commands directly. Most of >>>> these calls had no failure checks or any logging making it hard to >>>> understand what is happening inside the daemon when something doesn't >>>> work as intended. >>>> >>>> Some commands also had a chance to not read the command output in full. >>>> That might sound like not a big problem, but in practice it causes >>>> ovs-monitor-ipsec to deadlock pluto and itself with certain versions of >>>> Libreswan (mainly Libreswan 5+). The order of events is following: >>>> >>>> 1. ovs-monitor-ipsec calls ipsec status redirecting the output >>>> to a pipe. >>>> 2. ipsec status calls ipsec whack. >>>> 3. ipsec whack connects to pluto and asks for status. >>>> 4. ovs-monitor-ipsec doesn't read the pipe in full. >>>> 5. ipsec whack blocks on write to the other side of the pipe >>>> when it runs out of buffer space. >>>> 6. pluto blocks on sendmsg to ipsec whack for the same reason. >>>> 7. ovs-monitor-ipsec calls another ipsec command and blocks >>>> on connection to pluto. >>>> >>>> In this scenario the running process is at the mercy of garbage >>>> collector and it doesn't run because we're blocked on calling another >>>> ipsec command. All the processes are completely blocked and will not >>>> do any work until ipsec whack is killed. >>>> >>>> With this change we're introducing a new function that will be used >>>> for all the external process execution commands and will read the full >>>> output before returning, avoiding the deadlock. It will also log all >>>> the failures as warnings and the commands themselves at the debug level. >>>> >>>> We'll be adding more logic into this function in later commits as well, >>>> so it will not stay that simple. >>>> >>>> Signed-off-by: Ilya Maximets <[email protected]> >>> >>> Thanks for the series, I did finish reviewing it and did some basic testing. >>> >>> Please find comments below and in the successive patches. >>> >>> //Eelco >>> >>>> --- >>>> ipsec/ovs-monitor-ipsec.in | 290 +++++++++++++++++-------------------- >>>> 1 file changed, 131 insertions(+), 159 deletions(-) >>>> >>>> diff --git a/ipsec/ovs-monitor-ipsec.in b/ipsec/ovs-monitor-ipsec.in >>>> index 37c509ac6..4885e048f 100755 >>>> --- a/ipsec/ovs-monitor-ipsec.in >>>> +++ b/ipsec/ovs-monitor-ipsec.in >>>> @@ -84,6 +84,28 @@ monitor = None >>>> xfrm = None >>>> >>>> >>>> +def run_command(args, description=None): >>>> + """ This function runs the process args[0] with args[1:] arguments >>>> + and returns a tuple: return-code, stdout, stderr. """ >>>> + >>>> + if not description: >>>> + description = args[-1] >>> >>> If the description is not given, it makes more sense to show the actual >>> command >>> instead of the last argument. >>> Or maybe replace the string with “run”, and in the timeout case just print >>> the >>> entire command? >> >> We do already print the entire command (proc.args). I guess what Roi >> suggested before - printing the first argument - makes sense here, but >> we'll need to adjust the text of the message. >> >> Maybe we can do: >> >> if not description: >> description = "run %s command" % args[0] >> >> So it will turn into "Failed to run ipsec/openssl command; exit code: ..." >> And then printing of the whole cmdline. >> >> What do you think? Roi? > > yes this looks fine. thanks. +1 >> >>> >>>> + >>>> + vlog.dbg("Running %s" % args) >>>> + proc = subprocess.Popen(args, stdout=subprocess.PIPE, >>>> + stderr=subprocess.PIPE) >>>> + pout, perr = proc.communicate() >>>> + >>>> + if proc.returncode or perr: >>>> + vlog.warn("Failed to %s; exit code: %d" >>>> + % (description, proc.returncode)) >>>> + vlog.warn("cmdline: %s" % proc.args) >>>> + vlog.warn("stderr: %s" % perr) >>>> + vlog.warn("stdout: %s" % pout) >>>> + >>>> + return proc.returncode, pout.decode(), perr.decode() >>>> + >>>> + >>> >>> <SNIP> >>> >> _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
