On 11/4/24 09:21, Roi Dayan wrote:
> 
> 
> On 03/11/2024 22:26, Ilya Maximets wrote:
>> On 11/1/24 02:23, Ilya Maximets wrote:
>>> This patch set is a result of debugging different Librewan issues
>>> for the past few weeks in an attempt to solve the problem where
>>> ovs-monitor-ipsec gets stuck forever while calling ipsec commands
>>> and cannot progress any further.
>>>
>>> Main parts here are the introduction of the reconciliation mechanism
>>> for the ipsec connections and termination of the stuck commands on
>>> timeout.
>>>
>>> This set also contains a lot of small changes that ultimately fix
>>> compatibility with multiple versions of Libreswan as well as improve
>>> visibility into what the ovs-monitor-ipsec process is doing by adding
>>> more verbose logging.
>>> For example, without the fist patch in the set, ovs-monitor-ipsec
>>> deadlocks both libreswan and itself with Libreswan 5 pretty easily:
>>>   https://github.com/libreswan/libreswan/issues/1859
>>> More details on addressed issues are in the commit messages.
>>>
>>> The last few patches in the set are adding a system test that stresses
>>> the reconciliation and various failure handling paths inside the
>>> monitor.  Mainly because we do get a lot of failures from Libreswan
>>> while running the test.  This test is currently actively used by
>>> Libreswan team to find and fix the root causes of multiple issues that
>>> triggered creation of this patch set.
>>>
>>> The intention for this patch set is to be backported to at least
>>> branch 3.3.  But further down to 3.1 (or even 2.17 ?) may also be good.
>>> Luckily, the code is not that different on older branches.
>>
>> Thanks, Roi and Eelco for review!
>>
>> I applied the set to main.  And since we're still providing releases
>> for 3.1 according to the 'last 4 release branches' policy, backported
>> down to 3.1.  This should be enough for most users.
>>
>> For now, doesn't seem critical enough to backport to 2.17 (old LTS).
>>
>> Best regards, Ilya Maximets.
>>
> 
> ha ok sorry for late review on part of the series :)
> thanks

Sorry, I wanted to get this out of my way and didn't wait long for more
reviews since both of you already reviewed the most important parts of
the set.

Thanks for the rest of the reviews anyways, it's good to have more eyes
on the changes!

Best regards, Ilya Maximets.

> 
>>>
>>> The set is tested with various versions of Libreswan including
>>> 3.32 (from Ubuntu 22.04), 4.5, 4.6, 4.9, 4.12, 4.14, 4.15 and 5.1.
>>>
>>> Without the set, only 4.5 and below work well enough, 4.9 - 4.15 are
>>> getting completely stuck with a few dozens of connections, and 5.1
>>> deadlocks easily.
>>>
>>> With the set: 4.5 and below still work well, 5.1 works well, 4.9 - 4.15
>>> can get into state with connectivity issues (libreswan issue that cannot
>>> be worked around externally), but it is much less likely to end up in
>>> this state and it affects only a couple individual connections instead
>>> of blocking the daemon as a whole.  Also, 4.14 and 4.15 seems noticeably
>>> harder to get into that state (but still very possible).
>>>
>>>
>>> Version 3:
>>>   - Updated description logs in the run_command().
>>>   - Fixed typos and added timeout value to the log.
>>>   - Changed the test to report as skipped after the configuration is checked
>>>     instead of silently skipping the ping test with Libreswan 4.x.
>>>   - Added Acks from Roi to patches 1-4.
>>>   - Added Acks from Eelco to patches 2-4 and 7-8.
>>>   - Added one new patch at the end of the set that greatly reduces
>>>     chances for hitting bugs in Libreswan (still not enough to enable
>>>     the ping test, but much better than without it).  Can drop this
>>>     one if not desired, but seems useful in real world setups.
>>>
>>> Version 2:
>>>   - Moved the regexp patch earlier in the set to avoid CI failures.
>>>   - Added logic to avoid reconciliation triggered on every wake up
>>>     if there are no configuration changes.  Now it runs only once in
>>>     15 seconds, if there are no config changes.
>>>   - Improved regexp for loaded connections.  Now we match on the
>>>     string starting with a digit (IP address) after the name.
>>>     This solves matching on connections that do not have === in their
>>>     formatting.  No idea why libreswan prints differently sometimes.
>>>   - Addressed comments from Roi: removed unnecessary len() and moved
>>>     stdout/err decoding to the common function.
>>>   - Added grep on pluto's ERRORs to the test, so they are more visible.
>>>
>>>
>>> Ilya Maximets (10):
>>>   ipsec: Add a helper function to run commands from the monitor.
>>>   ipsec: libreswan: Fix regexp for connections waiting on child SA.
>>>   ipsec: libreswan: Reconcile missing connections periodically.
>>>   ipsec: libreswan: Try to bring non-active connections up.
>>>   ipsec: libreswan: Avoid monitor hanging on stuck ipsec commands.
>>>   ipsec: Make command timeout configurable.
>>>   system-tests: Verbose cleanup of ports and namespaces.
>>>   tests: ipsec: Add NxN + reconciliation test.
>>>   tests: ipsec: Check that nodes can ping each other in the NxN test.
>>>   ipsec: libreswan: Reduce chances for crossing streams.
>>>
>>>  ipsec/ovs-monitor-ipsec.in    | 496 +++++++++++++++++++---------------
>>>  tests/system-common-macros.at |   7 +-
>>>  tests/system-ipsec.at         | 208 +++++++++++++-
>>>  3 files changed, 480 insertions(+), 231 deletions(-)
>>>
>>
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to