This patch set is a result of debugging different Libreswan issues
for the past few weeks in an attempt to solve the problem where
ovs-monitor-ipsec gets stuck forever while calling ipsec commands
and cannot progress any further.

Main parts here are the introduction of the reconciliation mechanism
for the ipsec connections and termination of the stuck commands on
timeout.
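
The timeout part can be sketched roughly as below.  This is only an
illustration and not the code from the patches: the helper name
run_with_timeout() and its 5 second default are assumptions made for
the example, and it relies on nothing but the standard subprocess module.

```python
import subprocess
import sys


def run_with_timeout(args, timeout=5.0):
    """Run a command and kill it if it does not finish in `timeout` seconds.

    Returns (returncode, stdout, stderr, timed_out).  Hypothetical helper
    for illustration; it is not the run_command() from the patch set.
    """
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    try:
        out, err = proc.communicate(timeout=timeout)
        timed_out = False
    except subprocess.TimeoutExpired:
        # The command is stuck: kill it and collect whatever it printed.
        proc.kill()
        out, err = proc.communicate()
        timed_out = True
    return (proc.returncode, out.decode(errors="replace"),
            err.decode(errors="replace"), timed_out)
```

A caller can treat timed_out the same way as a non-zero exit code and
leave the retry to the next reconciliation pass instead of blocking.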

This set also contains a lot of small changes that ultimately fix
compatibility with multiple versions of Libreswan as well as improve
visibility into what the ovs-monitor-ipsec process is doing by adding
more verbose logging.
For example, without the first patch in the set, ovs-monitor-ipsec
deadlocks both Libreswan and itself with Libreswan 5 pretty easily:
  https://github.com/libreswan/libreswan/issues/1859
More details on addressed issues are in the commit messages.

The last few patches in the set add a system test that stresses the
reconciliation and various failure handling paths inside the monitor,
mainly because we do get a lot of failures from Libreswan while
running the test.  This test is currently actively used by the
Libreswan team to find and fix the root causes of multiple issues that
triggered creation of this patch set.

The intention is for this patch set to be backported to at least
branch 3.3, but going further down to 3.1 (or even 2.17?) may also be
good.
Luckily, the code is not that different on older branches.

The set is tested with various versions of Libreswan including
3.32 (from Ubuntu 22.04), 4.5, 4.6, 4.9, 4.12, 4.14, 4.15 and 5.1.

Without the set, only 4.5 and below work well enough, 4.9 - 4.15 get
completely stuck with a few dozen connections, and 5.1 deadlocks
easily.

With the set: 4.5 and below still work well, 5.1 works well, 4.9 - 4.15
can get into a state with connectivity issues (a Libreswan issue that
cannot be worked around externally), but it is much less likely to end
up in this state and it affects only a couple of individual connections
instead of blocking the daemon as a whole.  Also, with 4.14 and 4.15 it
seems noticeably harder to get into that state (but still very
possible).


Version 3:
  - Updated description logs in the run_command().
  - Fixed typos and added timeout value to the log.
  - Changed the test to report as skipped after the configuration is checked
    instead of silently skipping the ping test with Libreswan 4.x.
  - Added Acks from Roi to patches 1-4.
  - Added Acks from Eelco to patches 2-4 and 7-8.
  - Added one new patch at the end of the set that greatly reduces
    the chances of hitting bugs in Libreswan (still not enough to
    enable the ping test, but much better than without it).  Can drop
    this one if not desired, but it seems useful in real world setups.

Version 2:
  - Moved the regexp patch earlier in the set to avoid CI failures.
  - Added logic to avoid triggering reconciliation on every wake up
    if there are no configuration changes.  Now it runs only once
    every 15 seconds if there are no config changes.
  - Improved regexp for loaded connections.  Now we match on the
    string starting with a digit (IP address) after the name.
    This solves matching on connections that do not have === in their
    formatting.  No idea why libreswan prints differently sometimes.
  - Addressed comments from Roi: removed unnecessary len() and moved
    stdout/err decoding to the common function.
  - Added grep on pluto's ERRORs to the test, so they are more visible.
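
The rate-limited reconciliation described in the Version 2 notes can be
pictured with the sketch below.  It is an illustration under stated
assumptions, not the code from the patches: the Reconciler class, the
missing_connections() helper and the injectable clock are hypothetical
names, and resetting the timer on a config change is a guess at the
behavior.  Only the 15 second interval comes from the notes above.

```python
import time


class Reconciler:
    """Decide when to re-check loaded connections (hypothetical sketch)."""

    def __init__(self, interval=15.0, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self.last_run = None

    def should_run(self, config_changed):
        # Run right away on a config change, otherwise at most once
        # per `interval` seconds.
        now = self.clock()
        due = (self.last_run is None
               or now - self.last_run >= self.interval)
        if config_changed or due:
            self.last_run = now
            return True
        return False


def missing_connections(desired, loaded):
    """Names that should be loaded but are not."""
    return sorted(set(desired) - set(loaded))
```

On each wake up the monitor would call should_run() and, if it returns
True, compare the desired connection names against the loaded ones and
re-add whatever missing_connections() reports.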


Ilya Maximets (10):
  ipsec: Add a helper function to run commands from the monitor.
  ipsec: libreswan: Fix regexp for connections waiting on child SA.
  ipsec: libreswan: Reconcile missing connections periodically.
  ipsec: libreswan: Try to bring non-active connections up.
  ipsec: libreswan: Avoid monitor hanging on stuck ipsec commands.
  ipsec: Make command timeout configurable.
  system-tests: Verbose cleanup of ports and namespaces.
  tests: ipsec: Add NxN + reconciliation test.
  tests: ipsec: Check that nodes can ping each other in the NxN test.
  ipsec: libreswan: Reduce chances for crossing streams.

 ipsec/ovs-monitor-ipsec.in    | 496 +++++++++++++++++++---------------
 tests/system-common-macros.at |   7 +-
 tests/system-ipsec.at         | 208 +++++++++++++-
 3 files changed, 480 insertions(+), 231 deletions(-)

-- 
2.46.0

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
