On 4/12/23 07:56, Ales Musil wrote:
> There are essentially three problems with the current
> combination of DGP + SNAT + LB:
> 
> 1) The first packet is being SNATed in common zone due
> to a problem with pinctrl not preserving ct_mark/ct_label.
> The commit would create a SNAT entry within the same with DNAT
> entry.
> 
> 2) The unSNAT for reply always happened in common zone because of
> the loopback check which would be triggered only when we loop
> the packet through the LR. Now there are two possibilities how
> the reply packet would be handled:
> 
> a) If the entry for SNAT in common zone did not time out yet, the
> packet would be passed through unSNAT in common zone which would
> be fine and continue on. However, the unDNAT wouldn't work due to
> the limitation of CT not capable of doing unSNAT/unDNAT on the same
> packet twice in the same zone. So the reply would arrive to
> the original interface, but with wrong source address.
> 
> b) If the entry for SNAT timed out it would loop and do unSNAT correctly
> in separate zone and then also unDNAT. This is not possible anymore with
> a recent change 8c341b9d (northd: Drop packets destined to router owned NAT 
> IP for DGP).
> The reply would be dropped before looping after that change co the traffic
> would never arrive to the original interface.
> 
> 3) The unDNAT was happening only if the DGP was outport meaning
> the reply traffic was routed out, but in the opposite case
> the unDNAT was happening only because of the looping which made
> outport=inport. That's why it worked before introduction of explicit drop.
> 
> In order to fix all those issues do two changes:
> 
> 1) Include inport in the unDNAT match, so that we have both
> routing directions covered e.g. (inport == "dgp_port" || outport == 
> "dpg_port").
> 
> 2) Always use the separate zone for SNAT and DNAT. As the common
> zone was needed for HWOL make the common zone optional with
> configuration option called "use_common_zone". This option is
> by default "false" and can be specified per LR. Use of separate
> zones also eliminates the need for the flag propagation
> in "lr_out_chk_dnat_local" stage, removing the match on ct_mark/ct_label.
> 
> The "SNAT in separate zone from DNAT" system test is moved to run only
> with kernel datapath. The reason is that this test doesn't work with
> userspace datapath due to recirculation limit, currently set to 6 [0].
> 
> [0]https://github.com/openvswitch/ovs/blob/9d840923d32124fe427de76e8234c49d64e4bb77/lib/dpif-netdev.c#L102
> Reported-at: https://bugzilla.redhat.com/2161281
> Signed-off-by: Ales Musil <[email protected]>
> ---
> v2: Fix flaky system test.
> v3: Rebase on top of current main.
> v4: Rebase on top of current main.
>     Move the system test to system-ovn-kmod
>     to unblock the failures with userspace.
> v5: Rebase on top of current main.
> v6: Rebase on top of current main.
>     Change the config to a global option instead.
> v7: Address comments from Dumitru:
>     Rename the stateless helper to better reflect
>     that it applies only to snat_and_dnat type.
>     Call the common zone and separate zone functions
>     separately based on condition.
> v8: Address comments from Ilya and Dumitru.
>     Change the helper for common zone.
>     Avoid ds_clone.
> v9: Rebase on top of current main.
>     Address comments from Dumitru.
>     Split the function even further for three
>     possible cases: stateless, common zone, separate zone.
> ---
>  northd/northd.c          | 621 +++++++++++++++++++++++----------------
>  northd/ovn-northd.8.xml  |  90 +-----
>  ovn-nb.xml               |   9 +
>  tests/ovn-northd.at      | 237 ++++++++++-----
>  tests/ovn.at             |   3 +
>  tests/system-ovn-kmod.at | 166 +++++++++++
>  tests/system-ovn.at      | 117 --------
>  7 files changed, 721 insertions(+), 522 deletions(-)
> 
> diff --git a/northd/northd.c b/northd/northd.c
> index da3c8cf77..8b8f411df 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -66,6 +66,9 @@ static bool check_lsp_is_up;
>  
>  static bool install_ls_lb_from_router;
>  
> +/* Use common zone for SNAT and DNAT if this option is set to "true". */
> +static bool use_common_zone = false;
> +
>  /* MAC allocated for service monitor usage. Just one mac is allocated
>   * for this purpose and ovn-controller's on each chassis will make use
>   * of this mac when sending out the packets to monitor the services
> @@ -10657,11 +10660,19 @@ struct lrouter_nat_lb_flows_ctx {
>      const struct shash *meter_groups;
>  };
>  
> +static inline bool
> +lrouter_use_common_zone(const struct ovn_datapath *od)
> +{
> +    return !od->is_gw_router && use_common_zone;
> +}
> +
>  static void
>  build_distr_lrouter_nat_flows_for_lb(struct lrouter_nat_lb_flows_ctx *ctx,
>                                       enum lrouter_nat_lb_flow_type type,
>                                       struct ovn_datapath *od)
>  {
> +    struct ovn_port *dgp = od->l3dgw_ports[0];
> +
>      const char *undnat_action;
>  
>      switch (type) {
> @@ -10673,9 +10684,12 @@ build_distr_lrouter_nat_flows_for_lb(struct 
> lrouter_nat_lb_flows_ctx *ctx,
>          break;
>      case LROUTER_NAT_LB_FLOW_NORMAL:
>      case LROUTER_NAT_LB_FLOW_MAX:
> -        undnat_action = od->is_gw_router ? "ct_dnat;" : "ct_dnat_in_czone;";
> +        undnat_action = lrouter_use_common_zone(od)
> +                        ? "ct_dnat_in_czone;"
> +                        : "ct_dnat;";
>          break;
>      }
> +
>      /* Store the match lengths, so we can reuse the ds buffer. */
>      size_t new_match_len = ctx->new_match->length;
>      size_t undnat_match_len = ctx->undnat_match->length;
> @@ -10702,10 +10716,9 @@ build_distr_lrouter_nat_flows_for_lb(struct 
> lrouter_nat_lb_flows_ctx *ctx,
>          return;
>      }
>  
> -    ds_put_format(ctx->undnat_match,
> -                  ") && outport == %s && is_chassis_resident(%s)",
> -                  od->l3dgw_ports[0]->json_key,
> -                  od->l3dgw_ports[0]->cr_port->json_key);
> +    ds_put_format(ctx->undnat_match, ") && (inport == %s || outport == %s)"
> +                  " && is_chassis_resident(%s)", dgp->json_key, 
> dgp->json_key,
> +                  dgp->cr_port->json_key);
>      ovn_lflow_add_with_hint(ctx->lflows, od, S_ROUTER_OUT_UNDNAT, 120,
>                              ds_cstr(ctx->undnat_match), undnat_action,
>                              &ctx->lb->nlb->header_);
> @@ -13649,84 +13662,101 @@ build_lrouter_ipv4_ip_input(struct ovn_port *op,
>  }
>  
>  static void
> -build_lrouter_in_unsnat_flow(struct hmap *lflows, struct ovn_datapath *od,
> -                             const struct nbrec_nat *nat, struct ds *match,
> -                             struct ds *actions, bool distributed, bool 
> is_v6,
> -                             struct ovn_port *l3dgw_port)
> +build_lrouter_in_unsnat_match(struct ovn_datapath *od,
> +                              const struct nbrec_nat *nat, struct ds *match,
> +                              bool distributed, bool is_v6,
> +                              struct ovn_port *l3dgw_port)
>  {
> -    /* Ingress UNSNAT table: It is for already established connections'
> -    * reverse traffic. i.e., SNAT has already been done in egress
> -    * pipeline and now the packet has entered the ingress pipeline as
> -    * part of a reply. We undo the SNAT here.
> -    *
> -    * Undoing SNAT has to happen before DNAT processing.  This is
> -    * because when the packet was DNATed in ingress pipeline, it did
> -    * not know about the possibility of eventual additional SNAT in
> -    * egress pipeline. */
> -    if (strcmp(nat->type, "snat") && strcmp(nat->type, "dnat_and_snat")) {
> -        return;
> -    }
> +    ds_clear(match);
>  
> -    bool stateless = lrouter_dnat_and_snat_is_stateless(nat);
> -    if (od->is_gw_router) {
> -        ds_clear(match);
> -        ds_clear(actions);
> -        ds_put_format(match, "ip && ip%s.dst == %s",
> -                      is_v6 ? "6" : "4", nat->external_ip);
> -        if (stateless) {
> -            ds_put_format(actions, "next;");
> -        } else {
> -            ds_put_cstr(actions, "ct_snat;");
> -        }
> +    ds_put_format(match, "ip && ip%c.dst == %s",
> +                  is_v6 ? '6' : '4', nat->external_ip);
>  
> -        ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT,
> -                                90, ds_cstr(match), ds_cstr(actions),
> -                                &nat->header_);
> -    } else {
> +    if (!od->is_gw_router) {
>          /* Distributed router. */
>  
>          /* Traffic received on l3dgw_port is subject to NAT. */
> -        ds_clear(match);
> -        ds_clear(actions);
> -        ds_put_format(match, "ip && ip%s.dst == %s && inport == %s && "
> -                      "flags.loopback == 0", is_v6 ? "6" : "4",
> -                      nat->external_ip, l3dgw_port->json_key);
> +        ds_put_format(match, " && inport == %s", l3dgw_port->json_key);
> +
>          if (!distributed && od->n_l3dgw_ports) {
>              /* Flows for NAT rules that are centralized are only
> -            * programmed on the gateway chassis. */
> +             * programmed on the gateway chassis. */
>              ds_put_format(match, " && is_chassis_resident(%s)",
>                            l3dgw_port->cr_port->json_key);
>          }
> +    }
> +}
>  
> -        if (stateless) {
> -            ds_put_format(actions, "next;");
> -        } else {
> -            ds_put_cstr(actions, "ct_snat_in_czone;");
> -        }
> +static void
> +build_lrouter_in_unsnat_stateless_flow(struct hmap *lflows,
> +                                       struct ovn_datapath *od,
> +                                       const struct nbrec_nat *nat,
> +                                       struct ds *match,
> +                                       bool distributed, bool is_v6,
> +                                       struct ovn_port *l3dgw_port)
> +{
> +    if (strcmp(nat->type, "snat") && strcmp(nat->type, "dnat_and_snat")) {
> +        return;
> +    }
>  
> -        ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT,
> -                                100, ds_cstr(match), ds_cstr(actions),
> -                                &nat->header_);
> +    uint16_t priority = od->is_gw_router ? 90 : 100;
>  
> -        if (!stateless) {
> -            ds_clear(match);
> -            ds_clear(actions);
> -            ds_put_format(match, "ip && ip%s.dst == %s && inport == %s && "
> -                          "flags.loopback == 1 && flags.use_snat_zone == 1",
> -                          is_v6 ? "6" : "4", nat->external_ip,
> -                          l3dgw_port->json_key);
> -            if (!distributed && od->n_l3dgw_ports) {
> -                /* Flows for NAT rules that are centralized are only
> -                * programmed on the gateway chassis. */
> -                ds_put_format(match, " && is_chassis_resident(%s)",
> -                            l3dgw_port->cr_port->json_key);
> -            }
> -            ds_put_cstr(actions, "ct_snat;");
> -            ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT,
> -                                    100, ds_cstr(match), ds_cstr(actions),
> -                                    &nat->header_);
> -        }
> +    build_lrouter_in_unsnat_match(od, nat, match, distributed, is_v6,
> +                                  l3dgw_port);
> +
> +    ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT,
> +                            priority, ds_cstr(match), "next;",
> +                            &nat->header_);
> +}
> +
> +static void
> +build_lrouter_in_unsnat_in_czone_flow(struct hmap *lflows,
> +                                      struct ovn_datapath *od,
> +                                      const struct nbrec_nat *nat,
> +                                      struct ds *match, bool distributed,
> +                                      bool is_v6, struct ovn_port 
> *l3dgw_port)
> +{
> +    if (strcmp(nat->type, "snat") && strcmp(nat->type, "dnat_and_snat")) {
> +        return;
>      }
> +
> +    build_lrouter_in_unsnat_match(od, nat, match, distributed, is_v6,
> +                                  l3dgw_port);
> +
> +    ds_put_cstr(match, " && flags.loopback == 0");
> +
> +    ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT,
> +                            100, ds_cstr(match), "ct_snat_in_czone;",
> +                            &nat->header_);
> +
> +    ds_chomp(match, '0');

I missed this in v8.  I think this obfuscates a bit the code.  Would you
be ok with the following incremental (I can squash it in when applying
the patch)?

diff --git a/northd/northd.c b/northd/northd.c
index 8d46358338..3854a2852f 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -13740,15 +13740,20 @@ build_lrouter_in_unsnat_in_czone_flow(struct hmap 
*lflows,
     build_lrouter_in_unsnat_match(od, nat, match, distributed, is_v6,
                                   l3dgw_port);
 
-    ds_put_cstr(match, " && flags.loopback == 0");
+    /* We're adding two flows: one matching on "M1 && flags.loopback == 0" and
+     * the second one matching on "M && flags.loopback == 1 && M2".
+     * Reuse the common part of the match string.
+     */
+    size_t common_match_len = match->length;
 
+    ds_put_cstr(match, " && flags.loopback == 0");
     ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT,
                             100, ds_cstr(match), "ct_snat_in_czone;",
                             &nat->header_);
 
-    ds_chomp(match, '0');
+    ds_truncate(match, common_match_len);
     /* Update common zone match for the hairpin traffic. */
-    ds_put_cstr(match, "1 && flags.use_snat_zone == 1");
+    ds_put_cstr(match, " && flags.loopback == 1 && flags.use_snat_zone == 1");
 
     ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT,
                             100, ds_cstr(match), "ct_snat;",
---

The rest looks OK to me.

Regards,
Dumitru

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to