On 4/22/24 10:55, Ales Musil wrote:
> On Mon, Apr 22, 2024 at 9:45 AM Martin Kalcok <[email protected]>
> wrote:
> 
>> wrt the failed ovn-kubernetes tests, they seemed to have passed
>> successfully in my branch [0]. Is it possible that the tests are unstable?
>>
>> [0] https://github.com/mkalcok/ovn/actions/runs/8752915096
>>
> 
> Hi Martin,
> 
> yeah unfortunately the test was broken and AFAIK it should be fixed now [0].
> 
> 
>>
>> On Fri, Apr 19, 2024 at 2:33 PM Martin Kalcok <[email protected]>
>> wrote:
>>
>>> This change fixes bug that breaks ability of machines from external
>>> networks, to communicate with machines in SNATed networks (specifically
>>> when using a Distributed router).
>>>
>>> Currently when a machine (S1) on an external network tries to talk
>>> over TCP with a machine (A1) in a network that has enabled SNAT, the
>>> connection is established successfully. However after the three-way
>>> handshake, any packets that come from the A1 machine will have their
>>> source address translated by the Distributed router, breaking the
>>> communication.
>>>
>>> Existing rule in `build_lrouter_out_snat_flow` that decides which
>>> packets should be SNATed already tries to avoid SNATing packets in
>>> reply direction with `(!ct.trk || !ct.rpl)`. However, previous stages
>>> in the distributed LR egress pipeline do not initiate the CT state.
>>>
>>> Additionally we need to commit new connections that originate from
>>> external networks into CT, so that the packets in the reply direction
>>> (back to the external network) can be properly identified.
>>>
>>> Rationale:
>>>
>>> In my original RFC [0], there were questions about the motivation for
>>> fixing this issue. I'll try to summarize why I think this is a bug
>>> that should be fixed.
>>>
>>> 1. Current implementation for Distributed router already tries to
>>>    avoid SNATing packets in the reply direction, it's just missing
>>>    initialized CT states to make proper decisions.
>>>
>>> 2. This same scenario works with Gateway Router. I tested with
>>>    following setup:
>>>
>>>     foo -- R1 -- join - R3 -- alice
>>>                   |
>>>     bar ----------R2
>>>
>>>     R1 is a Distributed router with SNAT for foo. R2 is a Gateway
>>>     router with SNAT for bar. R3 is a Gateway router with no SNAT.
>>>     Using 'alice1' as a client I was able to talk over TCP with
>>>     'bar1' but connection with 'foo1' failed.
>>>
>>> 3. Regarding security and "leaking" of internal IPs. Reading through
>>>    RFC 4787 [1], 5382 [2] and their update in 7857 [3], the
>>>    specifications do not seem to mandate that SNAT implementations
>>>    must filter incoming traffic destined directly to the internal
>>>    network. Sections like "5. Filtering Behavior" in 4787 and
>>>    "4.3 Externally Initiated Connections" in 5382 describe only
>>>    behavior for traffic destined to external IP/Port associated
>>>    with NAT on the device that performs NAT.
>>>
>>>    Besides, with the current implementation, it's already possible
>>>    to scan the internal network with pings and TCP syn scanning.
>>>
>>> 4. We do have customers/clouds that depend on this functionality.
>>>    This is a scenario that used to work in Openstack with ML2/OVS
>>>    and migrating those clouds to ML2/OVN would break it.
>>>
>>> [0]
>>> https://mail.openvswitch.org/pipermail/ovs-dev/2024-February/411670.html
>>> [1]https://datatracker.ietf.org/doc/html/rfc4787
>>> [2]https://datatracker.ietf.org/doc/html/rfc5382
>>> [3]https://datatracker.ietf.org/doc/html/rfc7857
>>>
>>> Signed-off-by: Martin Kalcok <[email protected]>
>>> ---
>>>  northd/northd.c         | 66 ++++++++++++++++++++++++++++-------
>>>  northd/ovn-northd.8.xml | 28 +++++++++++++++
>>>  tests/ovn-northd.at     | 76 +++++++++++++++++++++++++++++++++++++----
>>>  tests/system-ovn.at     | 68 ++++++++++++++++++++++++++++++++++++
>>>  4 files changed, 219 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/northd/northd.c b/northd/northd.c
>>> index 37f443e70..2c2eee445 100644
>>> --- a/northd/northd.c
>>> +++ b/northd/northd.c
>>> @@ -14425,20 +14425,27 @@ build_lrouter_out_is_dnat_local(struct
>>> lflow_table *lflows,
>>>
>>>  static void
>>>  build_lrouter_out_snat_match(struct lflow_table *lflows,
>>> -                             const struct ovn_datapath *od,
>>> -                             const struct nbrec_nat *nat, struct ds
>>> *match,
>>> -                             bool distributed_nat, int cidr_bits, bool
>>> is_v6,
>>> -                             struct ovn_port *l3dgw_port,
>>> -                             struct lflow_ref *lflow_ref)
>>> +                                         const struct ovn_datapath *od,
>>> +                                         const struct nbrec_nat *nat,
>>> +                                         struct ds *match,
>>> +                                         bool distributed_nat, int
>>> cidr_bits,
>>> +                                         bool is_v6,
>>> +                                         struct ovn_port *l3dgw_port,
>>> +                                         struct lflow_ref *lflow_ref,
>>> +                                         bool is_reverse)

I re-indented this part.

[...]

> 
> The change looks good to me, thanks!
> 
> Acked-by: Ales Musil <[email protected]>
> 
> [0]
> https://github.com/ovn-org/ovn-kubernetes/commit/236e63c665bb60dd48dd5cc7f8ef2324a061d229
> 

Applied to main and backported to 24.03.

Regards,
Dumitru


_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to