Thanks for the review Numan.

On 11/16/20 4:19 AM, Numan Siddique wrote:
On Thu, Nov 12, 2020 at 8:26 PM Mark Michelson <[email protected]> wrote:

In certain situations, OVN may coexist with other applications on a
host. Traffic from OVN and the other applications may then go out a
shared gateway. If OVN traffic and the other application traffic use
different conntrack zones for SNAT, then it is possible for the shared
gateway to assign conflicting source IP:port combinations. By sharing
the same conntrack zone, there will be no conflicting assignments.

In this commit, we introduce options:snat-ct-zone for northbound logical
routers. By setting this option, users can explicitly set the conntrack
zone for the logical router so that it will match the zone used by
non-OVN traffic on the host.

The biggest side effects of this patch are:
1) southbound datapath changes now result in recalculating CT zones in
    ovn-controller. This can result in recomputing physical flows in more
    situations than previously.
2) The table 65 flow to transition between datapaths is no longer
    associated with a port binding. This is because the flow refers to
    the peer datapath's CT zones, which can now be updated due to changes
    on that datapath. The flow therefore may need to be updated either
    due to the port binding being changed or the peer datapath being
    changed.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1892311
Signed-off-by: Mark Michelson <[email protected]>

Hi Mark,

Thanks for the patch.

I did some testing with the below logical resources

******
switch 0ff55345-e283-4ed7-8c7f-8475a8a8f968 (sw1)
     port sw1-lr0
         type: router
         addresses: ["00:00:00:00:ff:02"]
         router-port: lr0-sw1
     port sw1-port2
         addresses: ["40:54:00:00:00:04 20.0.0.4"]
switch bddcdd29-9b32-482a-b9a6-0f9fa1a0e25a (sw0)
     port sw0-lr0
         type: router
         addresses: ["00:00:00:00:ff:01"]
         router-port: lr0-sw0
     port sw0-port1
         addresses: ["50:54:00:00:00:03 10.0.0.3"]
     port sw0-lr1
         type: router
         router-port: lr1-sw0
router 9462469c-d799-4d28-bccd-80e9a8c68e24 (lr1)
     port lr1-sw0
         mac: "00:00:00:00:fa:21"
         networks: ["10.0.0.20/24"]
router 96b14269-65ad-4ae9-9fa6-e4ef2a344a54 (lr0)
     port lr0-sw1
         mac: "00:00:00:00:ff:02"
         networks: ["20.0.0.1/24"]
     port lr0-public
         mac: "00:00:20:20:12:13"
         networks: ["172.168.0.100/24"]
         gateway chassis: [chassis-1]
*****


If I set - "ovn-nbctl set logical_router lr0 options:snat-ct-zone=x"
and If 'x' is already allocated for sw0 snat, then the
both sw0 snat and lr0 snat share the same zone id. Is that expected ?

That is not expected. If you request a zone for lr0, then sw0 should have its zone re-assigned to something different. I can add a test case for this and debug it. Looking at the code, it's not obvious to me why it's failing.


If I set the same zone id 'y' for both lr0 and lr1, then I observed
that both lr0 and lr1 have the same zone id 'y' and I also
see the warning in the ovn-controller log.

Later if I change the zone of lr1 to 'z' and then lr0 to 'z', the lr0
is not updated with 'z'. So lr0 will have 'y' and lr1 will have 'z'.
Seems like it's not consistent.

I wrote the code with the expectation that if duplicate zone assignments were detected, the oldest request would be honored and the newest would be ignored. If you first requested 'y' for lr0 and then later requested 'y' for lr1, the expectation is that when you requested 'y' for lr1, you'd see the warning, and lr1 would be auto-assigned a different zone id. I think the inconsistent behavior may be occurring due to the hashing of the datapaths. The order in which datapaths are visited does not correlate with the order that the datapaths were assigned their zone IDs. So if the newer assignment is visited in the datapath traversal before the older assignment, they'll both get assigned the same zone. But if the newer assignment is visited in the datapath traversal after the older assignment, it works as intended. This could be corrected, but the point may be moot based on discussion below.


Can there be a  possibility that there are 2 shared gateways in one
node and both want to share the same zone id ?

i.e lr0 is associated with one shared gateway and lr1 with another
shared gateway (on the same host) and both want to use the zone id 0 ?

Yes, I guess that could very well be possible. I guess there's no harm in having multiple gateways share the same zone assignment if they both were explicitly configured to use that particular zone assigment. That actually would simplify the code a bit.


Thanks
Numan


---
  controller/ovn-controller.c | 89 +++++++++++++++++++++++++++++++-----
  controller/physical.c       |  2 +-
  lib/ovn-util.c              |  7 +++
  lib/ovn-util.h              |  1 +
  northd/ovn-northd.c         | 10 ++++
  ovn-nb.xml                  |  7 +++
  tests/ovn.at                | 91 +++++++++++++++++++++++++++++++++++++
  7 files changed, 194 insertions(+), 13 deletions(-)

diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index a06cae3cc..54c8fd2db 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -531,6 +531,21 @@ update_sb_db(struct ovsdb_idl *ovs_idl, struct ovsdb_idl 
*ovnsb_idl,
      }
  }

+static void
+add_pending_ct_zone_entry(struct shash *pending_ct_zones,
+                          enum ct_zone_pending_state state,
+                          int zone, bool add, const char *name)
+{
+    VLOG_DBG("%s ct zone %"PRId32" for '%s'",
+             add ? "assigning" : "removing", zone, name);
+
+    struct ct_zone_pending_entry *pending = xmalloc(sizeof *pending);
+    pending->state = state; /* Skip flushing zone. */
+    pending->zone = zone;
+    pending->add = add;
+    shash_add(pending_ct_zones, name, pending);
+}
+
  static void
  update_ct_zones(const struct sset *lports, const struct hmap *local_datapaths,
                  struct simap *ct_zones, unsigned long *ct_zone_bitmap,
@@ -540,6 +555,7 @@ update_ct_zones(const struct sset *lports, const struct 
hmap *local_datapaths,
      int scan_start = 1;
      const char *user;
      struct sset all_users = SSET_INITIALIZER(&all_users);
+    struct simap req_snat_zones = SIMAP_INITIALIZER(&req_snat_zones);

      SSET_FOR_EACH(user, lports) {
          sset_add(&all_users, user);
@@ -554,6 +570,25 @@ update_ct_zones(const struct sset *lports, const struct 
hmap *local_datapaths,
          char *snat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "snat");
          sset_add(&all_users, dnat);
          sset_add(&all_users, snat);
+
+        int req_snat_zone = datapath_snat_ct_zone(ld->datapath);
+        if (req_snat_zone >= 0) {
+            struct simap_node *node;
+            bool collision = false;
+            SIMAP_FOR_EACH (node, &req_snat_zones) {
+                if (node->data == req_snat_zone) {
+                    VLOG_WARN("Datapaths %.*s and " UUID_FMT " request SNAT "
+                              "CT zone %d\n", UUID_LEN, node->name,
+                              UUID_ARGS(&ld->datapath->header_.uuid),
+                              req_snat_zone);
+                    collision = true;
+                    break;
+                }
+            }
+            if (!collision) {
+                simap_put(&req_snat_zones, snat, req_snat_zone);
+            }
+        }
          free(dnat);
          free(snat);
      }
@@ -564,17 +599,51 @@ update_ct_zones(const struct sset *lports, const struct 
hmap *local_datapaths,
              VLOG_DBG("removing ct zone %"PRId32" for '%s'",
                       ct_zone->data, ct_zone->name);

-            struct ct_zone_pending_entry *pending = xmalloc(sizeof *pending);
-            pending->state = CT_ZONE_DB_QUEUED; /* Skip flushing zone. */
-            pending->zone = ct_zone->data;
-            pending->add = false;
-            shash_add(pending_ct_zones, ct_zone->name, pending);
+            add_pending_ct_zone_entry(pending_ct_zones, CT_ZONE_DB_QUEUED,
+                                      ct_zone->data, false, ct_zone->name);

              bitmap_set0(ct_zone_bitmap, ct_zone->data);
              simap_delete(ct_zones, ct_zone);
          }
      }

+    /* Prioritize requested CT zones */
+    struct simap_node *snat_req_node;
+    SIMAP_FOR_EACH (snat_req_node, &req_snat_zones) {
+        struct simap_node *node = simap_find(ct_zones, snat_req_node->name);
+        if (node) {
+            if (node->data == snat_req_node->data) {
+                /* Already have this zone reserved */
+                continue;
+            } else {
+                /* Zone has changed for this node. delete old entry */
+                bitmap_set0(ct_zone_bitmap, node->data);
+                simap_delete(ct_zones, node);
+            }
+        } else if (snat_req_node->data > 0 &&
+                   bitmap_is_set(ct_zone_bitmap, snat_req_node->data)) {
+            /* Uh oh. Someone else already has this zone assigned.
+             * We need to find who and remove them from ct_zones so
+             * that they get re-assigned a new zone below
+             */
+            struct simap_node *next;
+            SIMAP_FOR_EACH_SAFE (node, next, ct_zones) {
+                if (node->data == snat_req_node->data) {
+                    simap_delete(ct_zones, node);
+                    break;
+                }
+            }
+        }
+
+        add_pending_ct_zone_entry(pending_ct_zones, CT_ZONE_OF_QUEUED,
+                                  snat_req_node->data, true,
+                                  snat_req_node->name);
+
+        bitmap_set1(ct_zone_bitmap, snat_req_node->data);
+        simap_put(ct_zones, snat_req_node->name, snat_req_node->data);
+    }
+    simap_destroy(&req_snat_zones);
+
      /* xxx This is wasteful to assign a zone to each port--even if no
       * xxx security policy is applied. */

@@ -596,13 +665,8 @@ update_ct_zones(const struct sset *lports, const struct 
hmap *local_datapaths,
          }
          scan_start = zone + 1;

-        VLOG_DBG("assigning ct zone %"PRId32" to '%s'", zone, user);
-
-        struct ct_zone_pending_entry *pending = xmalloc(sizeof *pending);
-        pending->state = CT_ZONE_OF_QUEUED;
-        pending->zone = zone;
-        pending->add = true;
-        shash_add(pending_ct_zones, user, pending);
+        add_pending_ct_zone_entry(pending_ct_zones, CT_ZONE_OF_QUEUED,
+                                  zone, true, user);

          bitmap_set1(ct_zone_bitmap, zone);
          simap_put(ct_zones, user, zone);
@@ -2330,6 +2394,7 @@ main(int argc, char *argv[])

      engine_add_input(&en_ct_zones, &en_ovs_open_vswitch, NULL);
      engine_add_input(&en_ct_zones, &en_ovs_bridge, NULL);
+    engine_add_input(&en_ct_zones, &en_sb_datapath_binding, NULL);
      engine_add_input(&en_ct_zones, &en_runtime_data, NULL);

      engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, NULL);
diff --git a/controller/physical.c b/controller/physical.c
index 1bc2c389b..00c4ca4fd 100644
--- a/controller/physical.c
+++ b/controller/physical.c
@@ -926,7 +926,7 @@ consider_port_binding(struct ovsdb_idl_index 
*sbrec_port_binding_by_name,

          ofctrl_add_flow(flow_table, OFTABLE_LOG_TO_PHY, 100,
                          binding->header_.uuid.parts[0],
-                        &match, ofpacts_p, &binding->header_.uuid);
+                        &match, ofpacts_p, hc_uuid);
          return;
      }

diff --git a/lib/ovn-util.c b/lib/ovn-util.c
index 18aac8da3..f0fb796b4 100644
--- a/lib/ovn-util.c
+++ b/lib/ovn-util.c
@@ -532,6 +532,13 @@ datapath_is_switch(const struct sbrec_datapath_binding 
*ldp)
  {
      return smap_get(&ldp->external_ids, "logical-switch") != NULL;
  }
+
+int
+datapath_snat_ct_zone(const struct sbrec_datapath_binding *dp)
+{
+    return smap_get_int(&dp->external_ids, "snat-ct-zone", -1);
+}
+

  struct tnlid_node {
      struct hmap_node hmap_node;
diff --git a/lib/ovn-util.h b/lib/ovn-util.h
index 3496673b2..ea8226571 100644
--- a/lib/ovn-util.h
+++ b/lib/ovn-util.h
@@ -107,6 +107,7 @@ uint32_t ovn_logical_flow_hash(const struct uuid 
*logical_datapath,
                                 uint16_t priority,
                                 const char *match, const char *actions);
  bool datapath_is_switch(const struct sbrec_datapath_binding *);
+int datapath_snat_ct_zone(const struct sbrec_datapath_binding *ldp);
  void ovn_conn_show(struct unixctl_conn *conn, int argc OVS_UNUSED,
                     const char *argv[] OVS_UNUSED, void *idl_);

diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index 4d4190cb9..8ce756c8a 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -1179,6 +1179,16 @@ ovn_datapath_update_external_ids(struct ovn_datapath *od)
              smap_add(&ids, "interconn-ts", ts);
          }
      }
+
+    /* Set snat-ct-zone */
+    if (od->nbr) {
+        int nat_default_ct = smap_get_int(&od->nbr->options,
+                                           "snat-ct-zone", -1);
+        if (nat_default_ct >= 0) {
+            smap_add_format(&ids, "snat-ct-zone", "%d", nat_default_ct);
+        }
+    }
+
      sbrec_datapath_binding_set_external_ids(od->sb, &ids);
      smap_destroy(&ids);
  }
diff --git a/ovn-nb.xml b/ovn-nb.xml
index 5e6c6c7a3..619c0e299 100644
--- a/ovn-nb.xml
+++ b/ovn-nb.xml
@@ -1961,6 +1961,13 @@
          unique key for each datapath by itself.  However, if it is configured,
          <code>ovn-northd</code> honors the configured value.
        </column>
+      <column name="options" key="snat-ct-zone"
+          type='{"type": "integer", "minInteger": 0, "maxInteger": 65535}'>
+        Use the requested conntrack zone for SNAT with this router. This can be
+        useful if egress traffic from the host running OVN comes from both OVN
+        and other sources. This way, OVN and the other sources can make use of
+        the same conntrack zone.
+      </column>
      </group>

      <group title="Common Columns">
diff --git a/tests/ovn.at b/tests/ovn.at
index 378d11921..4f8c1a3e1 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -22686,3 +22686,94 @@ AT_CHECK([test "$encap_rec_mvtep" == 
"$encap_rec_mvtep1"], [0], [])

  OVN_CLEANUP([hv1])
  AT_CLEANUP
+
+AT_SETUP([ovn -- snat default ct zone])
+ovn_start
+
+net_add n1
+sim_add hv1
+ovs-vsctl add-br br-phys
+as hv1
+ovn_attach n1 br-phys 192.168.0.10
+
+ovn-nbctl ls-add sw0
+ovn-nbctl lsp-add sw0 sw0-p1
+ovn-nbctl lsp-set-addresses sw0-p1 "00:00:00:00:00:02 10.0.0.2"
+
+ovn-nbctl lr-add gw_router
+ovn-nbctl set Logical_Router gw_router options:chassis="hv1"
+
+ovn-nbctl lrp-add gw_router gw_router-sw0 00:00:00:00:00:01 10.0.0.1/24
+ovn-nbctl lsp-add sw0 sw0-gw_router
+ovn-nbctl lsp-set-addresses sw0-gw_router router
+ovn-nbctl set Logical_Switch_Port sw0-gw_router type=router \
+                             options:router-port=gw_router-sw0 \
+
+ovn-nbctl lr-nat-add gw_router snat 192.168.0.1 10.0.0.0/24
+
+as hv1 ovs-vsctl -- add-port br-int hv1-vif1 -- \
+    set interface hv1-vif1 external-ids:iface-id=sw0-p1
+
+ovn-nbctl --wait=hv sync
+
+ro_nb_uuid=$(ovn-nbctl get Logical_Router gw_router _uuid)
+sw_nb_uuid=$(ovn-nbctl get Logical_Switch sw0 _uuid)
+ro_sb_uuid=$(ovn-sbctl --bare --columns=_uuid find Datapath_Binding 
external-ids:logical-router=${ro_nb_uuid})
+sw_sb_uuid=$(ovn-sbctl --bare --columns=_uuid find Datapath_Binding 
external-ids:logical-switch=${sw_nb_uuid})
+ro_meta=$(ovn-sbctl get Datapath_Binding ${ro_sb_uuid} tunnel_key)
+ro_meta=$(printf %#x ${ro_meta})
+sw_meta=$(ovn-sbctl get Datapath_Binding ${sw_sb_uuid} tunnel_key)
+sw_meta=$(printf %#x ${sw_meta})
+
+echo "ro_nb_uuid: ${ro_nb_uuid}"
+echo "sw_nb_uuid: ${sw_nb_uuid}"
+echo "ro_sb_uuid: ${ro_sb_uuid}"
+echo "sw_sb_uuid: ${sw_sb_uuid}"
+echo "ro_meta: ${ro_meta}"
+echo "sw_meta: ${sw_meta}"
+
+as hv1
+ovs-vsctl list bridge br-int
+snat_zone=$(printf %#x $(ovs-vsctl get bridge br-int 
external-ids:ct-zone-${ro_sb_uuid}_snat | tr -d \"))
+
+echo "snat_zone: ${snat_zone}"
+
+as hv1 ovs-ofctl dump-flows br-int > offlows_pre
+AT_CAPTURE_FILE([offlows_pre])
+# We should have a flow in table 33 that transitions from the ingress pipeline
+# to the egress pipeline of gw_router.
+AT_CHECK_UNQUOTED([grep -c 
"table=33.*metadata=${ro_meta}.*load:${snat_zone}->NXM_NX_REG12[]" 
offlows_pre], [0], [dnl
+1
+])
+
+# We should have a flow in table 65 that transitions from the egress pipeline
+# of sw0 to the ingress pipeline of gw_router.
+AT_CHECK_UNQUOTED([grep -c 
"table=65.*metadata=${sw_meta}.*load:${snat_zone}->NXM_NX_REG12[]" 
offlows_pre], [0], [dnl
+1
+])
+
+ovn-nbctl --wait=hv set Logical_Router gw_router options:snat-ct-zone=666
+
+as hv1
+snat_zone=$(ovs-vsctl get bridge br-int external-ids:ct-zone-${ro_sb_uuid}_snat | 
tr -d \")
+
+echo "snat_zone: ${snat_zone}"
+
+AT_CHECK([test "${snat_zone}" = "666"], [0], [])
+
+as hv1 ovs-ofctl dump-flows br-int > offlows_post
+AT_CAPTURE_FILE([offlows_post])
+# We should have a flow in table 33 that transitions from the ingress pipeline
+# to the egress pipeline of gw_router.
+AT_CHECK_UNQUOTED([grep -c 
"table=33.*metadata=${ro_meta}.*load:0x29a->NXM_NX_REG12[]" offlows_post], [0], 
[dnl
+1
+])
+
+# We should have a flow in table 65 that transitions from the egress pipeline
+# of sw0 to the ingress pipeline of gw_router.
+AT_CHECK_UNQUOTED([grep -c 
"table=65.*metadata=${sw_meta}.*load:0x29a->NXM_NX_REG12[]" offlows_post], [0], 
[dnl
+1
+])
+
+OVN_CLEANUP([hv1])
+AT_CLEANUP
--
2.25.4

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev



_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to