On Mon, Mar 13, 2023 at 1:08 PM Simon Horman <[email protected]> wrote:
> On Mon, Mar 13, 2023 at 08:20:32AM +0100, Ales Musil wrote:
> > On Fri, Mar 10, 2023 at 2:25 PM Simon Horman <[email protected]> wrote:
> > > On Thu, Mar 09, 2023 at 07:16:04AM +0100, Ales Musil wrote:
> > > > There are essentially three problems with the current
> > > > combination of DGP + SNAT + LB:
> > > >
> > > > 1) The first packet is SNATed in the common zone due to a problem
> > > > with pinctrl not preserving ct_mark/ct_label. The commit would
> > > > create a SNAT entry in the same zone as the DNAT entry.
> > > >
> > > > 2) The unSNAT for the reply always happened in the common zone
> > > > because of the loopback check, which would be triggered only when
> > > > we loop the packet through the LR. There are two possibilities for
> > > > how the reply packet would be handled:
> > > >
> > > > a) If the SNAT entry in the common zone had not yet timed out, the
> > > > packet would pass through unSNAT in the common zone, which would be
> > > > fine, and continue on. However, the unDNAT wouldn't work due to the
> > > > limitation that CT cannot unSNAT/unDNAT the same packet twice in
> > > > the same zone. So the reply would arrive at the original interface,
> > > > but with the wrong source address.
> > > >
> > > > b) If the SNAT entry had timed out, the packet would loop, do
> > > > unSNAT correctly in a separate zone, and then also unDNAT. This is
> > > > no longer possible after the recent change 8c341b9d ("northd: Drop
> > > > packets destined to router owned NAT IP for DGP"). After that
> > > > change, the reply would be dropped before looping, so the traffic
> > > > would never arrive at the original interface.
> > > >
> > > > 3) The unDNAT happened only if the DGP was the outport, meaning the
> > > > reply traffic was routed out; in the opposite case the unDNAT
> > > > happened only because of the looping, which made outport == inport.
> > > > That's why it worked before the introduction of the explicit drop.
> > > > In order to fix all those issues, make two changes:
> > > >
> > > > 1) Include inport in the unDNAT match, so that both routing
> > > > directions are covered, e.g.
> > > > (inport == "dgp_port" || outport == "dgp_port").
> > > >
> > > > 2) Always use separate zones for SNAT and DNAT. As the common zone
> > > > was needed for HWOL, make it optional via a configuration option
> > > > called "use_common_zone". This option defaults to "false" and can
> > > > be specified per LR. Use of separate zones also eliminates the need
> > > > for the flag propagation in the "lr_out_chk_dnat_local" stage,
> > > > removing the match on ct_mark/ct_label.
> > > >
> > > > Reported-at: https://bugzilla.redhat.com/2161281
> > > > Signed-off-by: Ales Musil <[email protected]>
> > > > ---
> > > > v2: Fix flaky system test.
> > > > v3: Rebase on top of current main.
> > >
> > > I am seeing consistent failure of system-tests with this version :(
> > >
> > > 237: SNAT in separate zone from DNAT -- ovn-northd -- parallelization=yes
> > >      -- ovn_monitor_all=yes FAILED (system-ovn.at:8702)
> > > 238: SNAT in separate zone from DNAT -- ovn-northd -- parallelization=yes
> > >      -- ovn_monitor_all=no FAILED (system-ovn.at:8702)
> > > 239: SNAT in separate zone from DNAT -- ovn-northd -- parallelization=no
> > >      -- ovn_monitor_all=yes FAILED (system-ovn.at:8702)
> > > 240: SNAT in separate zone from DNAT -- ovn-northd -- parallelization=no
> > >      -- ovn_monitor_all=no FAILED (system-ovn.at:8702)
> > >
> > > Link:
> > > https://github.com/horms/ovn/actions/runs/4383139214/jobs/7676416462#step:13:3734
> >
> > I was wondering why it was always green for me; these are the system
> > tests over the userspace datapath. It's nice that we are already
> > catching bugs with that. I will take a look at why it is failing, but
> > IMO it shouldn't fail, as this patch doesn't have anything specific
> > per datapath type.
> It may be that there is some unreliability in the test, unrelated to your
> patch. If so, I wouldn't see it as blocking your patch. But it would be
> nice to get on top of it at some point.

We are actually hitting the recirculation limit in this scenario (6 for
the userspace datapath). I'm not yet sure how to properly solve that.

Thanks,
Ales

-- 
Ales Musil
Senior Software Engineer - OVN Core
Red Hat EMEA <https://www.redhat.com>
[email protected]
IM: amusil
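For anyone wanting to try the patched behavior, a sketch of how the knob
from the commit message would be used (the option name and default come
from the patch description above; the exact `ovn-nbctl` invocation and the
router/port names `lr0`/`dgp_port` are my assumptions for illustration):

```
# Separate SNAT/DNAT zones are the default ("use_common_zone" defaults
# to "false"); to restore the old common-zone behavior (e.g. for HWOL),
# set the option on the Logical_Router:
ovn-nbctl set Logical_Router lr0 options:use_common_zone="true"

# The widened unDNAT match from change 1) covers both routing
# directions, i.e. something of the form:
#   (inport == "dgp_port" || outport == "dgp_port")
```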
