Re: [ovs-dev] [RFC ovn] northd.c: Fix direct access to SNATed networks.

Mark Michelson Wed, 07 Feb 2024 16:22:26 -0800

Hi Martin,

Thanks a bunch for your first OVN contribution. I have comments below.


On 2/7/24 10:56, Martin Kalcok wrote:

When a machine from an external network tries to access machine in the
OVN's internal network with SNAT enabled, using protocol like TCP, it
receives connection reset after first packets are successfully exchanged.

This setup may be a bit unusual and requires that whatever is responsible
for routing in the external network, routes packets for OVN's internal
network via LR's external port IP. However it is something that used to
work in ML2/OVS Openstack deployments and got broken after upgrade to
ML2/OVN.

To demonstrate what the traffic looked like from the point of view of the
external machine, lets say we have:
   * E  (IP of machine in the external network)
   * I  (IP of machine in OVN's internal network)
   * LR (IP of OVN's Logical Router Port connected to external network)

The traffic looks like this:
   * [SYN]      E  -> I
   * [SYN,ACK]  I  -> E
   * [ACK]      E  -> I
   * [PSH,ACK]  E  -> I
   * [ACK]      LR -> E

Connection gets reseted after it receives ACK from an unexpected IP (LR).

Although I'm not completely sure why the first [SYN,ACK] reply got back
without SNAT, root cause of this issue is that traffic initiated from
the external network is not tracked in CT. This causes even traffic that's
in the "reply" direction (i.e. it was initiated from outside the SNAT
network), to be translated by Logical Router.

My proposal is to initiate CT and commit "new" connections destined
to the internal network (with enabled SNAT) on Logical Router. This
will enable SNAT rules to translate only traffic that originates from
internal networks.

Signed-off-by: Martin Kalcok <[email protected]>
---
  northd/northd.c | 70 ++++++++++++++++++++++++++++++++++++++++++-------
  1 file changed, 61 insertions(+), 9 deletions(-)

As this would be my first "real" contribution to the OVN, I'd love to hear
some feeback on my approach while I work on performance measurments and
tests, which I plan on including in final proposal.
I also left some inline XXX comments marking parts that I'm not sure about
and need resolving before final proposal.
I tested this change in my local Openstack deployment with distributed gw
chassis and it seems to solve the issue regardless of the VM's placement,
whether it does or does not have a floating IP. It also doesn't seem to
break outbound connectivity, nor connectivity to VM's floating IPs.
Thank you, for any reviews/sugestions.

Without getting into the details of this particular patch, I have somegeneral suggestions for contributing:

1) Before sending a patch, run utilities/checkpatch.py against allpatches in the series. This can help 0-day Robot not to complain aboutcoding guidelines violations.

2) Run the testsuite locally before posting. If you see test errors,double check if your change is what's causing them.

3) As a sanity check, consider pushing the code to your github fork, andensure that github actions complete without error. If there are anyerrors, double check that they're not the fault of github (e.g. it's notyour fault if the action fails because the machine runs out of diskspace while allocating pods). In this case, The OVS robot links tohttps://github.com/ovsrobot/ovn/actions/runs/7817861729/job/21326755181, where there are some compilation errors reported. I've pointed out theoffending lines in the diff below.


4) Consider adding test cases. In this case, I suggest adding a couple:

* First, you can add one to ovn-northd.at that strictly tests whetherthe generated logical flows for SNAT are as expected. It would also begood to ensure that the flows are updated properly when the SNATconfiguration changes on the logical router.* Second, I would add a test to system-ovn.at that simulates thebroken scenario. It could ensure that packets originated internally areSNATted, and that replies to packets originated externally are not SNATted.


diff --git a/northd/northd.c b/northd/northd.c
index 01eec64ca..b695932fd 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -10772,6 +10772,12 @@ build_ecmp_route_flow(struct lflow_table *lflows, 
struct ovn_datapath *od,
      ds_destroy(&actions);
  }

+static inline bool

+lrouter_use_common_zone(const struct ovn_datapath *od)
+{
+    return !od->is_gw_router && use_common_zone;
+}
+
  static void
  add_route(struct lflow_table *lflows, struct ovn_datapath *od,
            const struct ovn_port *op, const char *lrp_addr_s,
@@ -10796,6 +10802,9 @@ add_route(struct lflow_table *lflows, struct 
ovn_datapath *od,

I'm not sure that add_route() is a good fit for this new code.add_route() is called for every static route, logical router port IP,load balancer VIP, and NAT external address. It doesn't really makesense to be iterating through the router's NATs when adding a staticroute, for instance. Further, this has the potential to attempt to addduplicate flows since a router can (and likely will) have add_route()called multiple times for its addresses.

If you need to add new flows related to the configured NAT rules on arouter, I'd suggest looking for an appropriate place inbuild_lrouter_nat_defrag_and_lb() or the functions it calls. We iterateover the configured NAT rules there, so it's a natural fit for thesetypes of changes.

      build_route_match(op_inport, rtb_id, network_s, plen, is_src_route,
                        is_ipv4, &match, &priority, ofs);

+ bool use_ct = false;

+    struct ds target_net = DS_EMPTY_INITIALIZER;
+    struct ds snat_port_match = DS_EMPTY_INITIALIZER;
      struct ds common_actions = DS_EMPTY_INITIALIZER;
      struct ds actions = DS_EMPTY_INITIALIZER;
      if (is_discard_route) {
@@ -10808,16 +10817,61 @@ add_route(struct lflow_table *lflows, struct 
ovn_datapath *od,
          } else {
              ds_put_format(&common_actions, "ip%s.dst", is_ipv4 ? "4" : "6");
          }
+
+        /* To enable direct connectivity from external networks to Logical IPs
+         * of VMs/Containers in the SNATed networks, we need to track
+         * connections initiated from external networks.  This allows SNAT
+         * rules to avoid SNATing packets in "reply" direction. */
+        if (od->nbr && od->nbr->n_nat) {
+            ds_put_format(&target_net, "%s/%d", network_s, plen);
+            for (int i = 0; i < od->nbr->n_nat; i++) {
+                const struct nbrec_nat *nat = od->nbr->nat[i];
+                if (strcmp(nat->type, "snat") ||
+                    strcmp(nat->logical_ip, ds_cstr(&target_net)) ||

We allow IPv6 addresses for NAT. strcmp isn't a good way to compare IPv6addresses since the same address can have many string representations.

+                    lrouter_use_common_zone(od)) {
+                    continue;
+                }
+
+                /* Initiate connection tracking for traffic destined to
+                 * SNATed network. */
+                use_ct = true;
+
+                /* XXX: Need to figure out what is appropraite priority for 
these rules.
+                        All they require is to be after any existing specific 
rules
+                        and before the default rule.*/

There aren't any hard and fast rules for picking a priority number,other than what you stated in this comment. As long as it gets the jobdone, you can pick any appropriate priority number. It's best to try toleave some room for new priorities to be inserted in case new rulesarise, but other than that, pick whatever you want.

+
+                /* Commit new connections initiated from the outside of
+                 * SNATed network to CT. */
+                ds_put_format(&snat_port_match,
+                              "outport == %s && ct.new",
+                              op->json_key);
+                ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_ARP_REQUEST, 5,
+                                        ds_cstr(&snat_port_match),
+                                        "ct_commit; output;", stage_hint);

I'm not sure I understand the choice of S_ROUTER_IN_ARP_REQUEST forperforming the ct_commit; This doesn't have anything to do with ARPrequests, so it seems like a different table would fit better here.

+                ds_clear(&snat_port_match);
+
+                /* Initiate connection tracking for traffic from SNATed
+                 * network to enable rules in S_ROUTER_OUT_SNAT disregard
+                 * traffic in "reply" direction. */
+                ds_put_format(&snat_port_match, "inport == %s", op->json_key);
+                ovn_lflow_add_with_hint(lflows, od, S_ROUTER_OUT_POST_UNDNAT,
+                                        5, ds_cstr(&snat_port_match),
+                                        "ct_next;", stage_hint);

Your two calls to ovn_lflow_add_with_hint() are missing the finallflow_ref parameter, which is why github actions failed when testing.

+                break;
+            }
+        }
+
          ds_put_format(&common_actions, "; "
                        "%s = %s; "
                        "eth.src = %s; "
                        "outport = %s; "
                        "flags.loopback = 1; "
-                      "next;",
+                      "%s",
                        is_ipv4 ? REG_SRC_IPV4 : REG_SRC_IPV6,
                        lrp_addr_s,
                        op->lrp_networks.ea_s,
-                      op->json_key);
+                      op->json_key,
+                      use_ct ? "ct_next;" : "next;");

I'm a bit confused by this addition. In your scenario, E sends trafficto I, and I sends traffic to E. Traffic is never destined for a LRPaddress, load balancer VIP, or NAT external address. So I don't know whythis flow would be matched in your scenario. Did I miss a detail somewhere?

Even if this is necessary for your scenario, I think this is addingconntrack to situations where it shouldn't be present, such as trafficdestined for static routes.

          ds_put_format(&actions, "ip.ttl--; %s", ds_cstr(&common_actions));
      }

@@ -10833,6 +10887,8 @@ add_route(struct lflow_table *lflows, struct ovn_datapath *od,

                                  ds_cstr(&common_actions),\
                                  stage_hint, lflow_ref);
      }
+    ds_destroy(&snat_port_match);
+    ds_destroy(&target_net);
      ds_destroy(&match);
      ds_destroy(&common_actions);
      ds_destroy(&actions);
@@ -10933,12 +10989,6 @@ struct lrouter_nat_lb_flows_ctx {
      const struct shash *meter_groups;
  };

-static inline bool

-lrouter_use_common_zone(const struct ovn_datapath *od)
-{
-    return !od->is_gw_router && use_common_zone;
-}
-
  static void
  build_distr_lrouter_nat_flows_for_lb(struct lrouter_nat_lb_flows_ctx *ctx,
                                       enum lrouter_nat_lb_flow_type type,
@@ -14627,7 +14677,9 @@ build_lrouter_out_snat_flow(struct lflow_table *lflows,
      }
      ds_put_cstr(match, " && (!ct.trk || !ct.rpl)");

- ds_put_format(actions, "ct_snat(%s", nat->external_ip);

+    /* ct_commit is needed here otherwise replies to the SNATed traffic
+     * look like "new" traffic in S_ROUTER_IN_ARP_REQUEST*/
+    ds_put_format(actions, "ct_commit; ct_snat(%s", nat->external_ip);
      if (nat->external_port_range[0]) {
          ds_put_format(actions, ",%s", nat->external_port_range);
      }

I think the reason why this patch works is because it essentially endsup sending everything to conntrack. The problem is that it's likelysending more than is necessary.

If you look in build_lrouter_out_snat_flow(), you can see that we have "&& (!ct.trk || !ct.rpl)" as part of the match before we attempt to SNAT.Therefore, we at least are attempting not to SNAT reply traffic.

As you noted, the [SYN,ACK] sent from "I" in your scenario is notSNATted. This is because the [SYN] from "E" was committed to conntrackin the S_ROUTER_OUT_POST_UNDNAT stage. Therefore, the [SYN,ACK] wasproperly seen as a reply packet, and it was not SNATted.

However, for some reason the final [ACK] from "I" was not seen as areply packet, and therefore it was SNATted. I'm curious if anythingabout the TCP connection changed between the initial handshake and thesubsequent packets that would make conntrack see the traffic as"related" instead of part of the same connection. If so, then I thinkthe trick here is to ensure that ct.rel traffic is committed toconntrack as well.

Ales Musil added a new action type in OVN 22.12 called "ct_commit_nat" .Its intended purpose was to ensure that in addition to the directconnection, related traffic would also be committed to conntrack.Currently, it's only used for DNAT so that ICMP replies from loadbalancer backends will be unDNATted.

It seems we need something similar here, but it needs to be used whentraffic hits the UNSNAT stage of the router pipeline. Maybe it would beenough to extend the ct_commit_nat command to take an optional "snat" or"dnat" parameter to choose the appropriate conntrack zone to commit to.Then ct.rel traffic in the S_ROUTER_IN_UNSNAT stage could callct_commit_nat(snat) instead of ct_snat to ensure that the relatedtraffic is committed to conntrack. I'm not 100% sure if that will do thetrick, but if it does, it will ensure a more focused amount of trafficwill get committed to conntrack. It would also be a much simpler andsmaller patch.

@Ales Musil, did my suggestion above make sense? Or am I completelyapproaching this incorrectly?


_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [RFC ovn] northd.c: Fix direct access to SNATed networks.

Reply via email to