On Tue, Sep 27, 2022 at 3:02 PM Mark Michelson <[email protected]> wrote:
>
> Commit 4deac4509abbedd6ffaecf27eed01ddefccea40a introduced functionality
> in ovn-northd to use a single zone for SNAT and DNAT when possible. It
> accounts for certain situations, such as hairpinned traffic, where we
> still need separate SNAT and DNAT zones.
>
> However, one situation it did not account for is traffic that traverses
> a logical router and is DNATted as a result of a load balancer, then
> needs to be SNATted when it egresses the router. In this situation, we
> try to use the same CT zone for the SNAT as for the load balancer DNAT,
> which does not work.
>
> This commit fixes the issue by setting the DNAT_LOCAL bit in the initial
> stage of the egress pipeline if the packet was DNATted during the
> ingress pipeline. This ensures that when the SNAT stage is reached, a
> separate CT zone is used for SNAT.
>
> Signed-off-by: Mark Michelson <[email protected]>
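For readers skimming the patch: with ct_no_masked_label available, the fix
boils down to one new flow per router that has NATs configured, e.g. (taken
from the updated ovn-northd.at expectations below):

  table=? (lr_out_chk_dnat_local), priority=50, match=(ip && ct_mark.natted == 1), action=(reg9[4] = 1; next;)

Here reg9[4] is REGBIT_DST_NAT_IP_LOCAL; on datapaths without
ct_no_masked_label the match uses ct_label.natted instead.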
Acked-by: Numan Siddique <[email protected]>

Numan

> ---
> v2 -> v3:
>  * Rebased on top of current main branch
>  * Fixed checkpatch issues from v2.
>  * Accounted for the ct_label -> ct_mark change.
>  * All tests are now passing (locally)
> ---
>  northd/northd.c         |  26 ++++++++-
>  northd/ovn-northd.8.xml |  16 ++++++
>  tests/ovn-northd.at     |   7 ++-
>  tests/system-ovn.at     | 117 ++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 163 insertions(+), 3 deletions(-)
>
> diff --git a/northd/northd.c b/northd/northd.c
> index 84440a47f..5eac7fa51 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -13633,7 +13633,8 @@ static void
>  build_lrouter_nat_defrag_and_lb(struct ovn_datapath *od, struct hmap *lflows,
>                                  const struct hmap *ports, struct ds *match,
>                                  struct ds *actions,
> -                                const struct shash *meter_groups)
> +                                const struct shash *meter_groups,
> +                                bool ct_lb_mark)
>  {
>      if (!od->nbr) {
>          return;
>      }
> @@ -13827,6 +13828,26 @@ build_lrouter_nat_defrag_and_lb(struct ovn_datapath *od, struct hmap *lflows,
>          }
>      }
>  
> +    if (od->nbr->n_nat) {
> +        ds_clear(match);
> +        const char *ct_natted = ct_lb_mark ?
> +                                "ct_mark.natted" :
> +                                "ct_label.natted";
> +        ds_put_format(match, "ip && %s == 1", ct_natted);
> +        /* This flow is unique since it is in the egress pipeline but checks
> +         * the value of ct_mark/ct_label.natted, which would have been set in
> +         * the ingress pipeline. If a change is ever introduced that clears
> +         * or otherwise invalidates the ct_mark/ct_label between the ingress
> +         * and egress pipelines, then an alternative will need to be devised.
> +         */
> +        ds_clear(actions);
> +        ds_put_cstr(actions, REGBIT_DST_NAT_IP_LOCAL" = 1; next;");
> +        ovn_lflow_add_with_hint(lflows, od, S_ROUTER_OUT_CHECK_DNAT_LOCAL,
> +                                50, ds_cstr(match), ds_cstr(actions),
> +                                &od->nbr->header_);
> +
> +    }
> +
>      /* Handle force SNAT options set in the gateway router. */
>      if (od->is_gw_router) {
>          if (dnat_force_snat_ip) {
> @@ -13925,7 +13946,8 @@ build_lswitch_and_lrouter_iterate_by_od(struct ovn_datapath *od,
>      build_misc_local_traffic_drop_flows_for_lrouter(od, lsi->lflows);
>      build_lrouter_arp_nd_for_datapath(od, lsi->lflows, lsi->meter_groups);
>      build_lrouter_nat_defrag_and_lb(od, lsi->lflows, lsi->ports, &lsi->match,
> -                                    &lsi->actions, lsi->meter_groups);
> +                                    &lsi->actions, lsi->meter_groups,
> +                                    lsi->features->ct_no_masked_label);
>  }
>  
>  /* Helper function to combine all lflow generation which is iterated by port.
> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> index dae961c87..177b6341d 100644
> --- a/northd/ovn-northd.8.xml
> +++ b/northd/ovn-northd.8.xml
> @@ -4392,6 +4392,22 @@ nd_ns {
>        </li>
>      </ul>
>  
> +    <p>
> +      This table also installs a priority-50 logical flow for each logical
> +      router that has NATs configured on it. The flow has match
> +      <code>ip &amp;&amp; ct_label.natted == 1</code> and action
> +      <code>REGBIT_DST_NAT_IP_LOCAL = 1; next;</code>. This is intended
> +      to ensure that traffic that was DNATted locally will use a separate
> +      conntrack zone for SNAT if SNAT is required later in the egress
> +      pipeline. Note that this flow checks the value of
> +      <code>ct_label.natted</code>, which is set in the ingress pipeline.
> +      This means that ovn-northd assumes that this value is carried over
> +      from the ingress pipeline to the egress pipeline and is not altered
> +      or cleared. If conntrack label values are ever changed to be cleared
> +      between the ingress and egress pipelines, then the match conditions
> +      of this flow will need to be updated accordingly.
> +    </p>
> +
>      <h3>Egress Table 1: UNDNAT</h3>
>  
>      <p>
> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> index 093e01c6d..0fa9272bc 100644
> --- a/tests/ovn-northd.at
> +++ b/tests/ovn-northd.at
> @@ -5028,6 +5028,7 @@ AT_CHECK([grep "lr_in_dnat" lr0flows | sort], [0], [dnl
>  
>  AT_CHECK([grep "lr_out_chk_dnat_local" lr0flows | sed 's/table=./table=?/' | sort], [0], [dnl
>    table=? (lr_out_chk_dnat_local), priority=0    , match=(1), action=(reg9[[4]] = 0; next;)
> +  table=? (lr_out_chk_dnat_local), priority=50   , match=(ip && ct_mark.natted == 1), action=(reg9[[4]] = 1; next;)
>    table=? (lr_out_chk_dnat_local), priority=50   , match=(ip && ip4.dst == 172.168.0.10 && is_chassis_resident("cr-lr0-public")), action=(reg9[[4]] = 1; next;)
>    table=? (lr_out_chk_dnat_local), priority=50   , match=(ip && ip4.dst == 172.168.0.20 && is_chassis_resident("cr-lr0-public")), action=(reg9[[4]] = 1; next;)
>    table=? (lr_out_chk_dnat_local), priority=50   , match=(ip && ip4.dst == 172.168.0.30 && is_chassis_resident("cr-lr0-public")), action=(reg9[[4]] = 1; next;)
> @@ -5102,6 +5103,7 @@ AT_CHECK([grep "lr_in_dnat" lr0flows | sort], [0], [dnl
>  
>  AT_CHECK([grep "lr_out_chk_dnat_local" lr0flows | sed 's/table=./table=?/' | sort], [0], [dnl
>    table=? (lr_out_chk_dnat_local), priority=0    , match=(1), action=(reg9[[4]] = 0; next;)
> +  table=? (lr_out_chk_dnat_local), priority=50   , match=(ip && ct_mark.natted == 1), action=(reg9[[4]] = 1; next;)
>    table=? (lr_out_chk_dnat_local), priority=50   , match=(ip && ip4.dst == 172.168.0.10 && is_chassis_resident("cr-lr0-public")), action=(reg9[[4]] = 1; next;)
>    table=? (lr_out_chk_dnat_local), priority=50   , match=(ip && ip4.dst == 172.168.0.20 && is_chassis_resident("cr-lr0-public")), action=(reg9[[4]] = 1; next;)
>    table=? (lr_out_chk_dnat_local), priority=50   , match=(ip && ip4.dst == 172.168.0.30 && is_chassis_resident("cr-lr0-public")), action=(reg9[[4]] = 1; next;)
> @@ -5170,6 +5172,7 @@ AT_CHECK([grep "lr_in_dnat" lr0flows | sort], [0], [dnl
>  
>  AT_CHECK([grep "lr_out_chk_dnat_local" lr0flows | sed 's/table=./table=?/' | sort], [0], [dnl
>    table=? (lr_out_chk_dnat_local), priority=0    , match=(1), action=(reg9[[4]] = 0; next;)
> +  table=? (lr_out_chk_dnat_local), priority=50   , match=(ip && ct_mark.natted == 1), action=(reg9[[4]] = 1; next;)
>  ])
>  
>  AT_CHECK([grep "lr_out_undnat" lr0flows | sed 's/table=./table=?/' | sort], [0], [dnl
> @@ -5230,6 +5233,7 @@ AT_CHECK([grep "lr_in_dnat" lr0flows | sort], [0], [dnl
>  
>  AT_CHECK([grep "lr_out_chk_dnat_local" lr0flows | sed 's/table=./table=?/' | sort], [0], [dnl
>    table=? (lr_out_chk_dnat_local), priority=0    , match=(1), action=(reg9[[4]] = 0; next;)
> +  table=? (lr_out_chk_dnat_local), priority=50   , match=(ip && ct_mark.natted == 1), action=(reg9[[4]] = 1; next;)
>  ])
>  
>  AT_CHECK([grep "lr_out_undnat" lr0flows | sed 's/table=./table=?/' | sort], [0], [dnl
> @@ -5295,6 +5299,7 @@ AT_CHECK([grep "lr_in_dnat" lr0flows | sort], [0], [dnl
>  
>  AT_CHECK([grep "lr_out_chk_dnat_local" lr0flows | sed 's/table=./table=?/' | sort], [0], [dnl
>    table=? (lr_out_chk_dnat_local), priority=0    , match=(1), action=(reg9[[4]] = 0; next;)
> +  table=? (lr_out_chk_dnat_local), priority=50   , match=(ip && ct_mark.natted == 1), action=(reg9[[4]] = 1; next;)
>  ])
>  
>  AT_CHECK([grep "lr_out_undnat" lr0flows | sed 's/table=./table=?/' | sort], [0], [dnl
> @@ -5373,6 +5378,7 @@ AT_CHECK([grep "lr_in_dnat" lr0flows | sort], [0], [dnl
>  
>  AT_CHECK([grep "lr_out_chk_dnat_local" lr0flows | sed 's/table=./table=?/' | sort], [0], [dnl
>    table=? (lr_out_chk_dnat_local), priority=0    , match=(1), action=(reg9[[4]] = 0; next;)
> +  table=? (lr_out_chk_dnat_local), priority=50   , match=(ip && ct_mark.natted == 1), action=(reg9[[4]] = 1; next;)
>  ])
>  
>  AT_CHECK([grep "lr_out_undnat" lr0flows | sed 's/table=./table=?/' | sort], [0], [dnl
> @@ -6138,7 +6144,6 @@ AT_CHECK([grep -e "(lr_in_ip_routing ).*outport" lr0flows | sed 's/table=../ta
>  ])
>  
>  AT_CLEANUP
> -])
>  
>  OVN_FOR_EACH_NORTHD([
>  AT_SETUP([check exclude-lb-vips-from-garp option])
> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> index 8acfb3e39..e051539fe 100644
> --- a/tests/system-ovn.at
> +++ b/tests/system-ovn.at
> @@ -8343,3 +8343,120 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
>  
>  AT_CLEANUP
>  ])
> +
> +OVN_FOR_EACH_NORTHD([
> +AT_SETUP([SNAT in separate zone from DNAT])
> +
> +CHECK_CONNTRACK()
> +CHECK_CONNTRACK_NAT()
> +ovn_start
> +OVS_TRAFFIC_VSWITCHD_START()
> +ADD_BR([br-int])
> +
> +# Set external-ids in br-int needed for ovn-controller
> +ovs-vsctl \
> +    -- set Open_vSwitch . external-ids:system-id=hv1 \
> +    -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
> +    -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \
> +    -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \
> +    -- set bridge br-int fail-mode=secure other-config:disable-in-band=true
> +
> +# The goal of this test is to ensure that when traffic is first DNATted
> +# (by way of a load balancer), and then SNATted, the SNAT happens in a
> +# separate conntrack zone from the DNAT.
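Roughly, the translations this test expects, using the addresses from the
setup below (the zone names A/B are just for illustration):

  vm1 (173.0.1.2) pings the VIP 30.0.0.1
      r1 ingress: LB DNAT  30.0.0.1  -> 172.16.0.102  (CT zone A)
      r1 egress:  SNAT     173.0.1.2 -> 172.16.0.101  (CT zone B)

The conntrack dumps at the end verify that the DNAT and SNAT entries land
in separate zones.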
> +
> +start_daemon ovn-controller
> +
> +check ovn-nbctl ls-add public
> +
> +check ovn-nbctl lr-add r1
> +check ovn-nbctl lrp-add r1 r1_public 00:de:ad:ff:00:01 172.16.0.1/16
> +check ovn-nbctl lrp-add r1 r1_s1 00:de:ad:fe:00:01 173.0.1.1/24
> +check ovn-nbctl lrp-set-gateway-chassis r1_public hv1
> +
> +check ovn-nbctl lb-add r1_lb 30.0.0.1 172.16.0.102
> +check ovn-nbctl lr-lb-add r1 r1_lb
> +
> +check ovn-nbctl ls-add s1
> +check ovn-nbctl lsp-add s1 s1_r1
> +check ovn-nbctl lsp-set-type s1_r1 router
> +check ovn-nbctl lsp-set-addresses s1_r1 router
> +check ovn-nbctl lsp-set-options s1_r1 router-port=r1_s1
> +
> +check ovn-nbctl lsp-add s1 vm1
> +check ovn-nbctl lsp-set-addresses vm1 "00:de:ad:01:00:01 173.0.1.2"
> +
> +check ovn-nbctl lsp-add public public_r1
> +check ovn-nbctl lsp-set-type public_r1 router
> +check ovn-nbctl lsp-set-addresses public_r1 router
> +check ovn-nbctl lsp-set-options public_r1 router-port=r1_public nat-addresses=router
> +
> +check ovn-nbctl lr-add r2
> +check ovn-nbctl lrp-add r2 r2_public 00:de:ad:ff:00:02 172.16.0.2/16
> +check ovn-nbctl lrp-add r2 r2_s2 00:de:ad:fe:00:02 173.0.2.1/24
> +check ovn-nbctl lr-nat-add r2 dnat_and_snat 172.16.0.102 173.0.2.2
> +check ovn-nbctl lrp-set-gateway-chassis r2_public hv1
> +
> +check ovn-nbctl ls-add s2
> +check ovn-nbctl lsp-add s2 s2_r2
> +check ovn-nbctl lsp-set-type s2_r2 router
> +check ovn-nbctl lsp-set-addresses s2_r2 router
> +check ovn-nbctl lsp-set-options s2_r2 router-port=r2_s2
> +
> +check ovn-nbctl lsp-add s2 vm2
> +check ovn-nbctl lsp-set-addresses vm2 "00:de:ad:01:00:02 173.0.2.2"
> +
> +check ovn-nbctl lsp-add public public_r2
> +check ovn-nbctl lsp-set-type public_r2 router
> +check ovn-nbctl lsp-set-addresses public_r2 router
> +check ovn-nbctl lsp-set-options public_r2 router-port=r2_public nat-addresses=router
> +
> +ADD_NAMESPACES(vm1)
> +ADD_VETH(vm1, vm1, br-int, "173.0.1.2/24", "00:de:ad:01:00:01", \
> +         "173.0.1.1")
> +ADD_NAMESPACES(vm2)
> +ADD_VETH(vm2, vm2, br-int, "173.0.2.2/24", "00:de:ad:01:00:02", \
> +         "173.0.2.1")
> +
> +check ovn-nbctl lr-nat-add r1 dnat_and_snat 172.16.0.101 173.0.1.2 vm1 00:00:00:01:02:03
> +check ovn-nbctl --wait=hv sync
> +
> +# Next, make sure that a ping works as expected
> +NS_CHECK_EXEC([vm1], [ping -q -c 3 -i 0.3 -w 2 30.0.0.1 | FORMAT_PING], \
> +[0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +# Finally, make sure that conntrack shows two separate zones being used for
> +# DNAT and SNAT
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(30.0.0.1) | \
> +sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
> +icmp,orig=(src=173.0.1.2,dst=30.0.0.1,id=<cleared>,type=8,code=0),reply=(src=172.16.0.102,dst=173.0.1.2,id=<cleared>,type=0,code=0),zone=<cleared>,mark=2
> +])
> +
> +# The final two entries appear identical here. That is because FORMAT_CT
> +# scrubs the zone numbers. In actuality, the zone numbers are different,
> +# which is why there are two entries.
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.0.102) | \
> +sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
> +icmp,orig=(src=172.16.0.101,dst=172.16.0.102,id=<cleared>,type=8,code=0),reply=(src=173.0.2.2,dst=172.16.0.101,id=<cleared>,type=0,code=0),zone=<cleared>
> +icmp,orig=(src=173.0.1.2,dst=172.16.0.102,id=<cleared>,type=8,code=0),reply=(src=172.16.0.102,dst=172.16.0.101,id=<cleared>,type=0,code=0),zone=<cleared>
> +icmp,orig=(src=173.0.1.2,dst=172.16.0.102,id=<cleared>,type=8,code=0),reply=(src=172.16.0.102,dst=172.16.0.101,id=<cleared>,type=0,code=0),zone=<cleared>
> +])
> +
> +OVS_APP_EXIT_AND_WAIT([ovn-controller])
> +
> +as ovn-sb
> +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> +
> +as ovn-nb
> +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> +
> +as northd
> +OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE])
> +
> +as
> +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
> +/connection dropped.*/d"])
> +AT_CLEANUP
> +])
> --
> 2.37.2

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
