On Wed, Mar 6, 2024 at 11:13 AM Dumitru Ceara <[email protected]> wrote:
>
> On 3/5/24 15:56, Numan Siddique wrote:
> > On Mon, Feb 26, 2024 at 7:59 AM Dumitru Ceara <[email protected]> wrote:
> >>
> >> Otherwise, in case there's also a SNAT rule that uses the VIP as
> >> external IP, we break sessions initiated from behind the VIP.
> >>
> >> This partially reverts 832893bdbb42 ("ovn-northd: Skip unsnat flows for
> >> load balancer vips in router ingress pipeline").  That's OK because
> >> commit 384a7c6237da ("northd: Refactor Logical Flows for routers with
> >> DNAT/Load Balancers") addressed the original issue in a better way:
> >>
> >>     In the reply direction, the order of traversal of the tables
> >>     "lr_in_defrag", "lr_in_unsnat" and "lr_in_dnat" adds incorrect
> >>     datapath flows that check ct_state in the wrong conntrack zone.
> >>     This is illustrated below where reply trafic enters the physical host
> >>     port (6) and traverses DNAT zone (14), SNAT zone (default), back to the
> >>     DNAT zone and then on to Logical Switch Port zone (22). The third
> >>     flow is incorrectly checking the state from the SNAT zone instead
> >>     of the DNAT zone.
> >>
> >> We also add a system test to ensure traffic initiated from behind a VIP
> >> + SNAT is not broken.
> >>
> >> Another nice side effect is that the northd I-P is slightly simplified
> >> because we don't need to track NAT external IPs anymore.
> >>
> >> Fixes: 832893bdbb42 ("ovn-northd: Skip unsnat flows for load balancer vips 
> >> in router ingress pipeline")
> >> Reported-at: https://issues.redhat.com/browse/FDP-291
> >> Signed-off-by: Dumitru Ceara <[email protected]>
> >
> >
> > Thanks for the fix.  It also simplified the lr-nat-stateful code.
> >
> > Acked-by: Numan Siddique <[email protected]>
> >
>
> Thanks, Numan!
>
> Applied to main and backported to all branches down to 22.03.
>
> Regards,
> Dumitru
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Hi Dumitru, Numan, Tim and folks,

I noticed that the HW offload of k8s nodePort traffic is broken due to
this change. The reason is that for client to nodePort (LB with VIP
being the node IP) traffic, when the packet is going through the
unSNAT stage in the SNAT CT zone, since the entry is never committed
to the SNAT zone, it will have CT state returned as "new", which
prevents the HW offload to work for such packets.

At the moment I have to revert this change in our downstream. For the
problem that was fixed by this change [0], I think we can avoid it by
separating the port range of SNAT and DNAT. For DNAT, the nodePort
range in k8s is configured by API-server option:

--service-node-port-range <a string in the form 'N1-N2'> Default: 30000-32767

For SNAT, it can be configured in the OVN's NAT table's
external_port_range column, and we can choose something like
10000-30000.

An extra benefit of this is that it reduces a CT recirc.

Alternatively solutions are:
Alternative1: Always commit to both SNAT and DNAT zones. This would
introduce unnecessary cost of CT entries and extra CT recirc.
Alternative2: Use a common zone for SNAT and DNAT. But there are other
issues reported for using the common zone [1]

Could you let me know if other thoughts on this?

[0] https://issues.redhat.com/browse/FDP-291
[1] b8c40e7593 
https://github.com/ovn-org/ovn/commit/b8c40e7593a9fa40a057268c507a912d67b99ec4

Thanks,
Han
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to