This patch set is the result of debugging various Libreswan issues over the past few weeks, in an attempt to solve the problem where ovs-monitor-ipsec gets stuck forever while calling ipsec commands and cannot make any further progress.
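
For illustration only, the general idea of bounding the wait on an external command can be sketched as below. This is not the code from the patches: the function name run_command, the tuple it returns and the default timeout value are all assumptions made for the example.

```python
import subprocess


def run_command(args, timeout=120):
    """Run 'args' and return (returncode, stdout, stderr).

    Hypothetical helper, not the one added by this set.  On timeout
    the child is killed and the returncode is reported as None, so
    the caller can log the failure and move on instead of blocking.
    """
    try:
        # subprocess.run() kills the child and reaps it before
        # raising TimeoutExpired, so no stuck process is left behind.
        proc = subprocess.run(args, capture_output=True, text=True,
                              timeout=timeout)
    except subprocess.TimeoutExpired:
        return None, "", "timed out after %s seconds" % timeout
    return proc.returncode, proc.stdout, proc.stderr
```

A caller would treat a None return code as a failed attempt and retry the operation on a later pass rather than wait indefinitely.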

The main parts here are the introduction of a reconciliation mechanism for the ipsec connections and the termination of stuck commands on timeout. This set also contains a lot of small changes that ultimately fix compatibility with multiple versions of Libreswan, as well as improve visibility into what the ovs-monitor-ipsec process is doing by adding more verbose logging.

For example, without the first patch in the set, ovs-monitor-ipsec deadlocks both Libreswan and itself with Libreswan 5 pretty easily:
  https://github.com/libreswan/libreswan/issues/1859

More details on the addressed issues are in the commit messages.

The last few patches in the set add a system test that stresses the reconciliation and various failure handling paths inside the monitor, mainly because we do get a lot of failures from Libreswan while running the test. This test is currently actively used by the Libreswan team to find and fix the root causes of multiple issues that triggered the creation of this patch set.

The intention is for this patch set to be backported to at least branch 3.3. But going further down to 3.1 (or even 2.17?) may also be good. Luckily, the code is not that different on older branches.

The set is tested with various versions of Libreswan including 3.32 (from Ubuntu 22.04), 4.5, 4.6, 4.9, 4.12, 4.14, 4.15 and 5.1.

Without the set, only 4.5 and below work well enough, 4.9 - 4.15 get completely stuck with a few dozen connections, and 5.1 deadlocks easily.

With the set: 4.5 and below still work well, 5.1 works well, 4.9 - 4.15 can get into a state with connectivity issues (a Libreswan issue that cannot be worked around externally), but it is much less likely to end up in this state and it affects only a couple of individual connections instead of blocking the daemon as a whole. Also, 4.14 and 4.15 seem noticeably harder to get into that state (but it is still very possible).

Ilya Maximets (9):
  ipsec: Add a helper function to run commands from the monitor.
  ipsec: libreswan: Reconcile missing connections periodically.
  ipsec: libreswan: Try to bring non-active connections up.
  ipsec: libreswan: Fix regexp for connections waiting on child SA.
  ipsec: libreswan: Avoid monitor hanging on stuck ipsec commands.
  ipsec: Make command timeout configurable.
  system-tests: Verbose cleanup of ports and namespaces.
  tests: ipsec: Add NxN + reconciliation test.
  tests: ipsec: Check that nodes can ping each other in the NxN test.

 ipsec/ovs-monitor-ipsec.in    | 483 +++++++++++++++++++---------------
 tests/system-common-macros.at |   7 +-
 tests/system-ipsec.at         | 206 ++++++++++++++-
 3 files changed, 463 insertions(+), 233 deletions(-)

-- 
2.46.0

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
