On 7/24/20 8:40 AM, Numan Siddique wrote:


On Fri, Jul 24, 2020 at 1:07 AM Mark Michelson <[email protected] <mailto:[email protected]>> wrote:

    When traffic arrives over an ECMP route, there is no guarantee that the
    reply traffic will egress over the same route. Sometimes, the nature of
    the traffic (or the intervening equipment) means that it is important
    for reply traffic to go out the same route it came in.

    This commit introduces optional ECMP symmetric reply behavior. If
    configured, then traffic to or from the ECMP route will be sent to
    conntrack. New incoming traffic over the route will have the source MAC
    address and incoming port saved in the ct_label. Reply traffic then uses
    this saved information to send the packet back out the same way it came
    in.

    To facilitate this, a new table was added to the ingress logical router
    pipeline. The ECMP_STATEFUL table is responsible for committing to
    conntrack and setting the ct_label when it detects new incoming traffic
    from the route.

    Signed-off-by: Mark Michelson <[email protected]
    <mailto:[email protected]>>
    Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1849683


Hi Mark,

There's a checkpatch warning which you want to address:

---
WARNING: Line is 80 characters long (recommended limit is 79)
#228 FILE: northd/ovn-northd.c:7783:
    ds_put_format(&ecmp_reply, "ct.rpl && ct_label.ecmp_reply_port == %" PRId64,

---
Please note that I didn't do a thorough code review.
I tested this patch using ovn-fake-multi node setup and created the OVN resources as it's done in system-ovn.at <http://system-ovn.at>. In my testing, alice1 is claimed by ovn-chassis-2 and bob1 is claimed by ovn-chassis-1.
R2 and R3 are claimed on chassis - ovn-chassis-1.

I see few issues. It's not working as expected. I started nc server listening on port 80 on alice1.

 1. when bob1 connects to alice1 IP, I see that the below logical flow gets hit on ovn-chassis-1 where bob1 resides     table=7 (lr_in_ecmp_stateful), priority=100  , match=(inport == "R1_join" && ip4.dst == 10.0.0.0/24 <http://10.0.0.0/24> && (ct.new && !ct.est)), action=(ct_commit { ct_label.ecmp_reply_eth = eth.src; ct_label.ecmp_reply_port = 2;}; next;)

 2. But for the reply traffic from alice1 to bob1, the below expected logical flow is NOT hit     table=10(lr_in_ip_routing   ), priority=100  , match=(ct.rpl && ct_label.ecmp_reply_port == 2 && ip4.src == 10.0.0.0/24 <http://10.0.0.0/24>), action=(ip.ttl--; flags.loopback = 1; eth.src = 00:00:04:01:02:03; reg1 = 20.0.0.1; outport = "R1_join"; next;)

3. Instead the normal ecmp select flow is hit in ovn-chassis-2 where alice1 resides.

4. #conntrack -L | grep 172.16.0.1 # on ovn-chassis-1
    tcp      6 117 SYN_SENT src=172.16.0.1 dst=10.0.0.2 sport=43644 dport=80 [UNREPLIED] src=10.0.0.2 dst=172.16.0.1 sport=80 dport=43644 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=6 use=1     #ovs-appctl -t ovs-vswitchd dpctl/dump-conntrack | grep 172.16.0.1 # on ovn-chassis-1
tcp,orig=(src=172.16.0.1,dst=10.0.0.2,sport=43660,dport=80),reply=(src=10.0.0.2,dst=172.16.0.1,sport=80,dport=43660),zone=6,labels=0x2000004010204,protoinfo=(state=SYN_SENT)

The conntrack state is in SYN_SENT state.

I think this needs to be handled properly.


Thanks Numan. I'll alter the test to use TCP traffic instead of ICMP and see how that affects it.


Please see below for a few comments.


    ---
      lib/logical-fields.c      |   4 ++
      northd/ovn-northd.8.xml   |  49 ++++++++++---
      northd/ovn-northd.c       | 119 +++++++++++++++++++++++++++----
      ovn-architecture.7.xml    |   7 +-
      ovn-nb.ovsschema          |   5 +-
      ovn-nb.xml                |  14 ++++
      tests/ovn.at <http://ovn.at>              |  28 ++++----
      tests/system-ovn.at <http://system-ovn.at>       | 144
    ++++++++++++++++++++++++++++++++++++++
      utilities/ovn-nbctl.8.xml |  31 ++++++--
      utilities/ovn-nbctl.c     |  18 ++++-
      10 files changed, 367 insertions(+), 52 deletions(-)

    diff --git a/lib/logical-fields.c b/lib/logical-fields.c
    index 8639523ea..4c129814d 100644
    --- a/lib/logical-fields.c
    +++ b/lib/logical-fields.c
    @@ -127,6 +127,10 @@ ovn_init_symtab(struct shash *symtab)

          expr_symtab_add_field(symtab, "ct_label", MFF_CT_LABEL, NULL,
    false);
          expr_symtab_add_subfield(symtab, "ct_label.blocked", NULL,
    "ct_label[0]");
    +    expr_symtab_add_subfield(symtab, "ct_label.ecmp_reply_eth", NULL,
    +                             "ct_label[0..47]");


We already have ct_labal.blocked mapped to ct_label[0]. In the patch which I
proposed here - [2], it is using ct_label.natted to ct_label[1].

Even though ct_label.blocked is used in the logical switch pipeline, I think for clarity it's better to shift ct_label.ecmp_reply_eth to starting from 32 offset maybe ?

Sure, I have no problem with that.



    +    expr_symtab_add_subfield(symtab, "ct_label.ecmp_reply_port", NULL,
    +                             "ct_label[48..63]");


Same here.

[2] - https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/


Thanks
Numan


          expr_symtab_add_field(symtab, "ct_state", MFF_CT_STATE, NULL,
    false);

    diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
    index eb2514f15..cf251e02a 100644
    --- a/northd/ovn-northd.8.xml
    +++ b/northd/ovn-northd.8.xml
    @@ -2120,15 +2120,31 @@ icmp6 {
          <p>
            This is to send packets to connection tracker for tracking and
            defragmentation.  It contains a priority-0 flow that simply
    moves traffic
    -      to the next table.  If load balancing rules with virtual IP
    addresses
    -      (and ports) are configured in <code>OVN_Northbound</code>
    database for a
    -      Gateway router, a priority-100 flow is added for each
    configured virtual
    -      IP address <var>VIP</var>. For IPv4 <var>VIPs</var> the flow
    matches
    -      <code>ip &amp;&amp; ip4.dst == <var>VIP</var></code>.  For IPv6
    -      <var>VIPs</var>, the flow matches <code>ip &amp;&amp; ip6.dst ==
    -      <var>VIP</var></code>.  The flow uses the action
    <code>ct_next;</code>
    -      to send IP packets to the connection tracker for packet
    de-fragmentation
    -      and tracking before sending it to the next table.
    +      to the next table.
    +    </p>
    +
    +    <p>
    +      If load balancing rules with virtual IP addresses (and ports) are
    +      configured in <code>OVN_Northbound</code> database for a
    Gateway router,
    +      a priority-100 flow is added for each configured virtual IP
    address
    +      <var>VIP</var>. For IPv4 <var>VIPs</var> the flow matches
    <code>ip
    +      &amp;&amp; ip4.dst == <var>VIP</var></code>.  For IPv6
    <var>VIPs</var>,
    +      the flow matches <code>ip &amp;&amp; ip6.dst ==
    <var>VIP</var></code>.
    +      The flow uses the action <code>ct_next;</code> to send IP
    packets to the
    +      connection tracker for packet de-fragmentation and tracking
    before
    +      sending it to the next table.
    +    </p>
    +
    +    <p>
    +      If ECMP routes with symmetric reply are configured in the
    +      <code>OVN_Northbound</code> database for a gateway router, a
    priority-100
    +      flow is added for each router port on which symmetric replies are
    +      configured. The matching logic for these ports essentially
    reverses the
    +      configured logic of the ECMP route. So for instance, a route
    with a
    +      destination routing policy will instead match if the source
    IP address
    +      matches the static route's prefix. The flow uses the action
    +      <code>ct_next</code> to send IP packets to the connection
    tracker for
    +      packet de-fragmentation and tracking before sending it to the
    next table.
          </p>

          <h3>Ingress Table 5: UNSNAT</h3>
    @@ -2489,7 +2505,15 @@ output;
            table.  This table, instead, is responsible for determine
    the ECMP
            group id and select a member id within the group based on
    5-tuple
            hashing.  It stores group id in <code>reg8[0..15]</code> and
    member id in
    -      <code>reg8[16..31]</code>.
    +      <code>reg8[16..31]</code>. This step is skipped if the
    traffic going
    +      out the ECMP route is reply traffic, and the ECMP route was
    configured
    +      to use symmetric replies. Instead, the stored
    <code>ct_label</code> value
    +      is used to choose the destination. The least significant 48
    bits of the
    +      <code>ct_label</code> tell the destination MAC address to
    which the
    +      packet should be sent. The next 16 bits tell the logical
    router port on
    +      which the packet should be sent. These values in the
    +      <code>ct_label</code> are set when the initial ingress traffic is
    +      received over the ECMP route.
          </p>

          <p>
    @@ -2639,6 +2663,11 @@ select(reg8[16..31], <var>MID1</var>,
    <var>MID2</var>, ...);
            address and <code>reg1</code> as the source protocol address).
          </p>

    +    <p>
    +      This processing is skipped for reply traffic being sent out
    of an ECMP
    +      route if the route was configured to use symmetric replies.
    +    </p>
    +
          <p>
            This table contains the following logical flows:
          </p>
    diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
    index d10e5ee5d..06881edd3 100644
    --- a/northd/ovn-northd.c
    +++ b/northd/ovn-northd.c
    @@ -172,16 +172,17 @@ enum ovn_stage {
          PIPELINE_STAGE(ROUTER, IN,  DEFRAG,          4,
    "lr_in_defrag")       \
          PIPELINE_STAGE(ROUTER, IN,  UNSNAT,          5,
    "lr_in_unsnat")       \
     PIPELINE_STAGE(ROUTER, IN,  DNAT,            6, "lr_in_dnat")        \
    -    PIPELINE_STAGE(ROUTER, IN,  ND_RA_OPTIONS,   7,
    "lr_in_nd_ra_options") \
    -    PIPELINE_STAGE(ROUTER, IN,  ND_RA_RESPONSE,  8,
    "lr_in_nd_ra_response") \
    -    PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING,      9,
    "lr_in_ip_routing")   \
    -    PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING_ECMP, 10,
    "lr_in_ip_routing_ecmp") \
    -    PIPELINE_STAGE(ROUTER, IN,  POLICY,          11,
    "lr_in_policy")       \
    -    PIPELINE_STAGE(ROUTER, IN,  ARP_RESOLVE,     12,
    "lr_in_arp_resolve")  \
    -    PIPELINE_STAGE(ROUTER, IN,  CHK_PKT_LEN   ,  13,
    "lr_in_chk_pkt_len")   \
-    PIPELINE_STAGE(ROUTER, IN,  LARGER_PKTS,  14,"lr_in_larger_pkts")   \
    -    PIPELINE_STAGE(ROUTER, IN,  GW_REDIRECT,     15,
    "lr_in_gw_redirect")  \
    -    PIPELINE_STAGE(ROUTER, IN,  ARP_REQUEST,     16,
    "lr_in_arp_request")  \
    +    PIPELINE_STAGE(ROUTER, IN,  ECMP_STATEFUL,   7,
    "lr_in_ecmp_stateful") \
    +    PIPELINE_STAGE(ROUTER, IN,  ND_RA_OPTIONS,   8,
    "lr_in_nd_ra_options") \
    +    PIPELINE_STAGE(ROUTER, IN,  ND_RA_RESPONSE,  9,
    "lr_in_nd_ra_response") \
    +    PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING,      10,
    "lr_in_ip_routing")   \
    +    PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING_ECMP, 11,
    "lr_in_ip_routing_ecmp") \
    +    PIPELINE_STAGE(ROUTER, IN,  POLICY,          12,
    "lr_in_policy")       \
    +    PIPELINE_STAGE(ROUTER, IN,  ARP_RESOLVE,     13,
    "lr_in_arp_resolve")  \
    +    PIPELINE_STAGE(ROUTER, IN,  CHK_PKT_LEN   ,  14,
    "lr_in_chk_pkt_len")   \
+    PIPELINE_STAGE(ROUTER, IN,  LARGER_PKTS,  15,"lr_in_larger_pkts")   \
    +    PIPELINE_STAGE(ROUTER, IN,  GW_REDIRECT,     16,
    "lr_in_gw_redirect")  \
    +    PIPELINE_STAGE(ROUTER, IN,  ARP_REQUEST,     17,
    "lr_in_arp_request")  \
   \      /* Logical router egress stages. */    \      PIPELINE_STAGE(ROUTER, OUT, UNDNAT,    0, "lr_out_undnat")     \
    @@ -7312,6 +7313,7 @@ struct parsed_route {
          bool is_src_route;
          uint32_t hash;
          const struct nbrec_logical_router_static_route *route;
    +    bool ecmp_symmetric_reply;
      };

      static uint32_t
    @@ -7373,6 +7375,8 @@ parsed_routes_add(struct ovs_list *routes,
                                                       "src-ip"));
          pr->hash = route_hash(pr);
          pr->route = route;
    +    pr->ecmp_symmetric_reply = smap_get_bool(&route->options,
+  "ecmp_symmetric_reply", false);
          ovs_list_insert(routes, &pr->list_node);
          return pr;
      }
    @@ -7621,18 +7625,95 @@ find_static_route_outport(struct
    ovn_datapath *od, struct hmap *ports,
          return true;
      }

    +static void
    +add_ecmp_symmetric_reply_flows(struct hmap *lflows,
    +                               struct ovn_datapath *od,
    +                               const char *port_ip,
    +                               struct ovn_port *out_port,
    +                               const struct parsed_route *route,
    +                               struct ds *route_match)
    +{
    +    const struct nbrec_logical_router_static_route *st_route =
    route->route;
    +    struct ds match = DS_EMPTY_INITIALIZER;
    +    struct ds actions = DS_EMPTY_INITIALIZER;
    +    struct ds ecmp_reply = DS_EMPTY_INITIALIZER;
    +    char *cidr = normalize_v46_prefix(&route->prefix, route->plen);
    +
    +    /* If symmetric ECMP replies are enabled, then packets that
    arrive over
    +     * an ECMP route need to go through conntrack.
    +     */
    +    ds_put_format(&match, "inport == %s && ip%s.%s == %s",
    +                  out_port->json_key,
    +                  route->prefix.family == AF_INET ? "4" : "6",
    +                  route->is_src_route ? "dst" : "src",
    +                  cidr);
    +    ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_DEFRAG, 100,
    +                            ds_cstr(&match), "ct_next;",
    +                            &st_route->header_);
    +
    +    /* And packets that go out over an ECMP route need conntrack */
    +    ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_DEFRAG, 100,
    +                            ds_cstr(route_match), "ct_next;",
    +                            &st_route->header_);
    +
    +    /* Save src eth and inport in ct_label for packets that arrive over
    +     * an ECMP route.
    +     *
    +     * NOTE: we purposely are not clearing match before this
    +     * ds_put_cstr() call. The previous contents are needed.
    +     */
    +    ds_put_cstr(&match, " && (ct.new && !ct.est)");
    +
    +    ds_put_format(&actions, "ct_commit { ct_label.ecmp_reply_eth =
    eth.src;"
    +                  " ct_label.ecmp_reply_port = %" PRId64 ";}; next;",
    +                  out_port->sb->tunnel_key);
    +    ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_ECMP_STATEFUL, 100,
    +                            ds_cstr(&match), ds_cstr(&actions),
    +                            &st_route->header_);
    +
    +    /* Bypass ECMP selection if we already have ct_label information
    +     * for where to route the packet.
    +     */
    +    ds_put_format(&ecmp_reply, "ct.rpl && ct_label.ecmp_reply_port
    == %" PRId64,
    +                  out_port->sb->tunnel_key);
    +    ds_clear(&match);
    +    ds_put_format(&match, "%s && %s", ds_cstr(&ecmp_reply),
    +                  ds_cstr(route_match));
    +    ds_clear(&actions);
    +    ds_put_format(&actions, "ip.ttl--; flags.loopback = 1; "
    +                  "eth.src = %s; %sreg1 = %s; outport = %s; next;",
    +                  out_port->lrp_networks.ea_s,
    +                  route->prefix.family == AF_INET ? "" : "xx",
    +                  port_ip, out_port->json_key);
    +    ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_IP_ROUTING, 100,
    +                           ds_cstr(&match), ds_cstr(&actions),
    +                           &st_route->header_);
    +
    +    /* Egress reply traffic for symmetric ECMP routes skips router
    policies. */
    +    ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_POLICY, 65535,
    +                            ds_cstr(&ecmp_reply), "next;",
    +                            &st_route->header_);
    +
    +    ds_clear(&actions);
    +    ds_put_cstr(&actions, "eth.dst = ct_label.ecmp_reply_eth; next;");
    +    ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_ARP_RESOLVE,
    +                            200, ds_cstr(&ecmp_reply),
    +                            ds_cstr(&actions), &st_route->header_);
    +}
    +
      static void
      build_ecmp_route_flow(struct hmap *lflows, struct ovn_datapath *od,
                            struct hmap *ports, struct ecmp_groups_node *eg)

      {
          bool is_ipv4 = (eg->prefix.family == AF_INET);
    -    struct ds match = DS_EMPTY_INITIALIZER;
          uint16_t priority;
    +    struct ecmp_route_list_node *er;
    +    struct ds route_match = DS_EMPTY_INITIALIZER;

          char *prefix_s = build_route_prefix_s(&eg->prefix, eg->plen);
          build_route_match(NULL, prefix_s, eg->plen, eg->is_src_route,
    is_ipv4,
    -                      &match, &priority);
    +                      &route_match, &priority);
          free(prefix_s);

          struct ds actions = DS_EMPTY_INITIALIZER;
    @@ -7640,7 +7721,6 @@ build_ecmp_route_flow(struct hmap *lflows,
    struct ovn_datapath *od,
                        "; %s = select(", REG_ECMP_GROUP_ID, eg->id,
                        REG_ECMP_MEMBER_ID);

    -    struct ecmp_route_list_node *er;
          bool is_first = true;
          LIST_FOR_EACH (er, list_node, &eg->route_list) {
              if (is_first) {
    @@ -7654,11 +7734,12 @@ build_ecmp_route_flow(struct hmap *lflows,
    struct ovn_datapath *od,
          ds_put_cstr(&actions, ");");

          ovn_lflow_add(lflows, od, S_ROUTER_IN_IP_ROUTING, priority,
    -                  ds_cstr(&match), ds_cstr(&actions));
    +                  ds_cstr(&route_match), ds_cstr(&actions));

          /* Add per member flow */
    +    struct ds match = DS_EMPTY_INITIALIZER;
    +    struct sset visited_ports = SSET_INITIALIZER(&visited_ports);
          LIST_FOR_EACH (er, list_node, &eg->route_list) {
    -
              const struct parsed_route *route_ = er->route;
              const struct nbrec_logical_router_static_route *route =
    route_->route;
              /* Find the outgoing port. */
    @@ -7668,6 +7749,11 @@ build_ecmp_route_flow(struct hmap *lflows,
    struct ovn_datapath *od,
                                             &out_port)) {
                  continue;
              }
    +        if (route_->ecmp_symmetric_reply && sset_add(&visited_ports,
    +                                                 out_port->key)) {
    +            add_ecmp_symmetric_reply_flows(lflows, od, lrp_addr_s,
    out_port,
    +                                           route_, &route_match);
    +        }
              ds_clear(&match);
              ds_put_format(&match, REG_ECMP_GROUP_ID" == %"PRIu16" && "
                            REG_ECMP_MEMBER_ID" == %"PRIu16,
    @@ -7688,7 +7774,9 @@ build_ecmp_route_flow(struct hmap *lflows,
    struct ovn_datapath *od,
                                      ds_cstr(&match), ds_cstr(&actions),
                                      &route->header_);
          }
    +    sset_destroy(&visited_ports);
          ds_destroy(&match);
    +    ds_destroy(&route_match);
          ds_destroy(&actions);
      }

    @@ -8972,6 +9060,7 @@ build_lrouter_flows(struct hmap *datapaths,
    struct hmap *ports,
              ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 0, "1", "next;");
              ovn_lflow_add(lflows, od, S_ROUTER_OUT_UNDNAT, 0, "1",
    "next;");
              ovn_lflow_add(lflows, od, S_ROUTER_OUT_EGR_LOOP, 0, "1",
    "next;");
    +        ovn_lflow_add(lflows, od, S_ROUTER_IN_ECMP_STATEFUL, 0,
    "1", "next;");

              /* Send the IPv6 NS packets to next table. When ovn-controller
               * generates IPv6 NS (for the action - nd_ns{}), the injected
    diff --git a/ovn-architecture.7.xml b/ovn-architecture.7.xml
    index 246cebc19..b1a462933 100644
    --- a/ovn-architecture.7.xml
    +++ b/ovn-architecture.7.xml
    @@ -1210,11 +1210,12 @@
          <dd>
            Fields that denote the connection tracking zones for
    routers.  These
            values only have local significance and are not meaningful
    between
    -      chassis.  OVN stores the zone information for DNATting in
    Open vSwitch
    +      chassis.  OVN stores the zone information for north to south
    traffic
    +      (for DNATting or ECMP symmetric replies) in Open vSwitch
              <!-- Keep the following in sync with MFF_LOG_DNAT_ZONE and
              MFF_LOG_SNAT_ZONE in ovn/lib/logical-fields.h. -->
    -      extension register number 11 and zone information for SNATing in
    -      Open vSwitch extension register number 12.
    +      extension register number 11 and zone information for south
    to north
    +      traffic (for SNATing) in Open vSwitch extension register
    number 12.
          </dd>

          <dt>logical flow flags</dt>
    diff --git a/ovn-nb.ovsschema b/ovn-nb.ovsschema
    index da9af7157..16f7794f2 100644
    --- a/ovn-nb.ovsschema
    +++ b/ovn-nb.ovsschema
    @@ -1,7 +1,7 @@
      {
          "name": "OVN_Northbound",
          "version": "5.24.0",
    -    "cksum": "1092394564 25961",
    +    "cksum": "679745602 26116",
          "tables": {
              "NB_Global": {
                  "columns": {
    @@ -365,6 +365,9 @@
                                          "min": 0, "max": 1}},
                      "nexthop": {"type": "string"},
                      "output_port": {"type": {"key": "string", "min":
    0, "max": 1}},
    +                "options": {
    +                    "type": {"key": "string", "value": "string",
    +                             "min": 0, "max": "unlimited"}},
                      "external_ids": {
                          "type": {"key": "string", "value": "string",
                                   "min": 0, "max": "unlimited"}}},
    diff --git a/ovn-nb.xml b/ovn-nb.xml
    index db5908cd5..fa804d776 100644
    --- a/ovn-nb.xml
    +++ b/ovn-nb.xml
    @@ -2481,6 +2481,20 @@
            </column>
          </group>

    +    <group title="Common options">
    +      <column name="options">
    +        This column provides general key/value settings. The supported
    +        options are described individually below.
    +      </column>
    +
    +      <column name="options" key="ecmp_symmetric_reply">
    +        It true, then new traffic that arrives over this route will
    have
    +        its reply traffic bypass ECMP route selection and will be
    sent out
    +        this route instead. Note that this option overrides any
    rules set
    +        in the <ref table="Logical_Router_policy" /> table.
    +      </column>
    +    </group>
    +
        </table>

        <table name="Logical_Router_Policy" title="Logical router policies">
    diff --git a/tests/ovn.at <http://ovn.at> b/tests/ovn.at <http://ovn.at>
    index 3afdcca1e..1a5ad67cc 100644
    --- a/tests/ovn.at <http://ovn.at>
    +++ b/tests/ovn.at <http://ovn.at>
    @@ -195,6 +195,8 @@ ct.snat = ct_state[6]
      ct.trk = ct_state[5]
      ct_label = NXM_NX_CT_LABEL
      ct_label.blocked = ct_label[0]
    +ct_label.ecmp_reply_eth = ct_label[0..47]
    +ct_label.ecmp_reply_port = ct_label[48..63]
      ct_mark = NXM_NX_CT_MARK
      ct_state = NXM_NX_CT_STATE
      ]])
    @@ -16056,7 +16058,7 @@ ovn-sbctl dump-flows lr0 | grep
    lr_in_arp_resolve | grep "reg0 == 10.0.0.10" \
      # Since the sw0-vir is not claimed by any chassis, eth.dst should
    be set to
      # zero if the ip4.dst is the virtual ip in the router pipeline.
      AT_CHECK([cat lflows.txt], [0], [dnl
    -  table=12(lr_in_arp_resolve  ), priority=100  , match=(outport ==
    "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst =
    00:00:00:00:00:00; next;)
    +  table=13(lr_in_arp_resolve  ), priority=100  , match=(outport ==
    "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst =
    00:00:00:00:00:00; next;)
      ])

      ip_to_hex() {
    @@ -16107,7 +16109,7 @@ ovn-sbctl dump-flows lr0 | grep
    lr_in_arp_resolve | grep "reg0 == 10.0.0.10" \
      # There should be an arp resolve flow to resolve the virtual_ip
    with the
      # sw0-p1's MAC.
      AT_CHECK([cat lflows.txt], [0], [dnl
    -  table=12(lr_in_arp_resolve  ), priority=100  , match=(outport ==
    "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst =
    50:54:00:00:00:03; next;)
    +  table=13(lr_in_arp_resolve  ), priority=100  , match=(outport ==
    "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst =
    50:54:00:00:00:03; next;)
      ])

      # Forcibly clear virtual_parent. ovn-controller should release the
    binding
    @@ -16148,7 +16150,7 @@ ovn-sbctl dump-flows lr0 | grep
    lr_in_arp_resolve | grep "reg0 == 10.0.0.10" \
      # There should be an arp resolve flow to resolve the virtual_ip
    with the
      # sw0-p2's MAC.
      AT_CHECK([cat lflows.txt], [0], [dnl
    -  table=12(lr_in_arp_resolve  ), priority=100  , match=(outport ==
    "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst =
    50:54:00:00:00:05; next;)
    +  table=13(lr_in_arp_resolve  ), priority=100  , match=(outport ==
    "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst =
    50:54:00:00:00:05; next;)
      ])

      # send the garp from sw0-p2 (in hv2). hv2 should claim sw0-vir
    @@ -16171,7 +16173,7 @@ ovn-sbctl dump-flows lr0 | grep
    lr_in_arp_resolve | grep "reg0 == 10.0.0.10" \
      # There should be an arp resolve flow to resolve the virtual_ip
    with the
      # sw0-p3's MAC.
      AT_CHECK([cat lflows.txt], [0], [dnl
    -  table=12(lr_in_arp_resolve  ), priority=100  , match=(outport ==
    "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst =
    50:54:00:00:00:04; next;)
    +  table=13(lr_in_arp_resolve  ), priority=100  , match=(outport ==
    "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst =
    50:54:00:00:00:04; next;)
      ])

      # Now send arp reply from sw0-p1. hv1 should claim sw0-vir
    @@ -16192,7 +16194,7 @@ ovn-sbctl dump-flows lr0 | grep
    lr_in_arp_resolve | grep "reg0 == 10.0.0.10" \
      > lflows.txt

      AT_CHECK([cat lflows.txt], [0], [dnl
    -  table=12(lr_in_arp_resolve  ), priority=100  , match=(outport ==
    "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst =
    50:54:00:00:00:03; next;)
    +  table=13(lr_in_arp_resolve  ), priority=100  , match=(outport ==
    "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst =
    50:54:00:00:00:03; next;)
      ])

      # Delete hv1-vif1 port. hv1 should release sw0-vir
    @@ -16210,7 +16212,7 @@ ovn-sbctl dump-flows lr0 | grep
    lr_in_arp_resolve | grep "reg0 == 10.0.0.10" \
      > lflows.txt

      AT_CHECK([cat lflows.txt], [0], [dnl
    -  table=12(lr_in_arp_resolve  ), priority=100  , match=(outport ==
    "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst =
    00:00:00:00:00:00; next;)
    +  table=13(lr_in_arp_resolve  ), priority=100  , match=(outport ==
    "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst =
    00:00:00:00:00:00; next;)
      ])

      # Now send arp reply from sw0-p2. hv2 should claim sw0-vir
    @@ -16231,7 +16233,7 @@ ovn-sbctl dump-flows lr0 | grep
    lr_in_arp_resolve | grep "reg0 == 10.0.0.10" \
      > lflows.txt

      AT_CHECK([cat lflows.txt], [0], [dnl
    -  table=12(lr_in_arp_resolve  ), priority=100  , match=(outport ==
    "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst =
    50:54:00:00:00:04; next;)
    +  table=13(lr_in_arp_resolve  ), priority=100  , match=(outport ==
    "lr0-sw0" && reg0 == 10.0.0.10), action=(eth.dst =
    50:54:00:00:00:04; next;)
      ])

      # Delete sw0-p2 logical port
    @@ -20265,22 +20267,22 @@ ovn-nbctl set logical_router_policy $pol5
    options:pkt_mark=5
      ovn-nbctl --wait=hv sync

      OVS_WAIT_UNTIL([
    -    test 1 -eq $(as hv1 ovs-ofctl dump-flows br-int table=19 | \
    +    test 1 -eq $(as hv1 ovs-ofctl dump-flows br-int table=20 | \
          grep "load:0x64->NXM_NX_PKT_MARK" -c)
      ])

      OVS_WAIT_UNTIL([
    -    test 1 -eq $(as hv1 ovs-ofctl dump-flows br-int table=19 | \
    +    test 1 -eq $(as hv1 ovs-ofctl dump-flows br-int table=20 | \
          grep "load:0x3->NXM_NX_PKT_MARK" -c)
      ])

      OVS_WAIT_UNTIL([
    -    test 1 -eq $(as hv1 ovs-ofctl dump-flows br-int table=19 | \
    +    test 1 -eq $(as hv1 ovs-ofctl dump-flows br-int table=20 | \
          grep "load:0x4->NXM_NX_PKT_MARK" -c)
      ])

      OVS_WAIT_UNTIL([
    -    test 1 -eq $(as hv1 ovs-ofctl dump-flows br-int table=19 | \
    +    test 1 -eq $(as hv1 ovs-ofctl dump-flows br-int table=20 | \
          grep "load:0x5->NXM_NX_PKT_MARK" -c)
      ])

    @@ -20371,12 +20373,12 @@ send_ipv4_pkt hv1 hv1-vif1 505400000003
    00000000ff01 \
          $(ip_to_hex 10 0 0 3) $(ip_to_hex 172 168 0 120)

      OVS_WAIT_UNTIL([
    -    test 1 -eq $(as hv1 ovs-ofctl dump-flows br-int table=19 | \
    +    test 1 -eq $(as hv1 ovs-ofctl dump-flows br-int table=20 | \
          grep "load:0x2->NXM_NX_PKT_MARK" -c)
      ])

      AT_CHECK([
    -    test 0 -eq $(as hv1 ovs-ofctl dump-flows br-int table=19 | \
    +    test 0 -eq $(as hv1 ovs-ofctl dump-flows br-int table=20 | \
          grep "load:0x64->NXM_NX_PKT_MARK" -c)
      ])

    diff --git a/tests/system-ovn.at <http://system-ovn.at>
    b/tests/system-ovn.at <http://system-ovn.at>
    index eddc530f9..451955bae 100644
    --- a/tests/system-ovn.at <http://system-ovn.at>
    +++ b/tests/system-ovn.at <http://system-ovn.at>
    @@ -4483,3 +4483,147 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query
    port patch-.*/d
      /connection dropped.*/d"])

      AT_CLEANUP
    +
    +AT_SETUP([ovn -- ECMP symmetric reply])
    +AT_KEYWORDS([ecmp])
    +
    +CHECK_CONNTRACK()
    +ovn_start
    +
    +OVS_TRAFFIC_VSWITCHD_START()
    +ADD_BR([br-int])
    +
    +# Set external-ids in br-int needed for ovn-controller
    +ovs-vsctl \
    +        -- set Open_vSwitch . external-ids:system-id=hv1 \
    +        -- set Open_vSwitch .
    external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
    +        -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \
    +        -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \
    +        -- set bridge br-int fail-mode=secure
    other-config:disable-in-band=true
    +
    +# Start ovn-controller
    +start_daemon ovn-controller
    +
    +# Logical network:
    +# Alice is connected to router R1. R1 is connected to two routers,
    R2 and R3 via
    +# a "join" switch.
    +# Bob is connected to both R2 and R3. R1 contains two ECMP routes,
    one through R2
    +# and one through R3, to Bob.
    +#
    +#     alice -- R1 --join---- R2
    +#                     |         \
    +#                     |           bob
    +#                     |         /
    +#                     +----- R3
    +#
    +# For this test, Bob sends request traffic through R2 to Alice. We
    want to ensure that
    +# all response traffic from Alice is routed through R2 as well.
    +
    +ovn-nbctl create Logical_Router name=R1
    +ovn-nbctl create Logical_Router name=R2 options:chassis=hv1
    +ovn-nbctl create Logical_Router name=R3 options:chassis=hv1
    +
    +ovn-nbctl ls-add alice
    +ovn-nbctl ls-add bob
    +ovn-nbctl ls-add join
    +
    +# connect alice to R1
    +ovn-nbctl lrp-add R1 alice 00:00:01:01:02:03 10.0.0.1/24
    <http://10.0.0.1/24>
    +ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \
    +    type=router options:router-port=alice
    addresses='"00:00:01:01:02:03"'
    +
    +# connect bob to R2
    +ovn-nbctl lrp-add R2 R2_bob 00:00:02:01:02:03 172.16.0.2/16
    <http://172.16.0.2/16>
    +ovn-nbctl lsp-add bob rp2-bob -- set Logical_Switch_Port rp2-bob \
    +    type=router options:router-port=R2_bob
    addresses='"00:00:02:01:02:03"'
    +
    +# connect bob to R3
    +ovn-nbctl lrp-add R3 R3_bob 00:00:02:01:02:04 172.16.0.3/16
    <http://172.16.0.3/16>
    +ovn-nbctl lsp-add bob rp3-bob -- set Logical_Switch_Port rp3-bob \
    +    type=router options:router-port=R3_bob
    addresses='"00:00:02:01:02:04"'
    +
    +# Connect R1 to join
    +ovn-nbctl lrp-add R1 R1_join 00:00:04:01:02:03 20.0.0.1/24
    <http://20.0.0.1/24>
    +ovn-nbctl lsp-add join r1-join -- set Logical_Switch_Port r1-join \
    +    type=router options:router-port=R1_join
    addresses='"00:00:04:01:02:03"'
    +
    +# Connect R2 to join
    +ovn-nbctl lrp-add R2 R2_join 00:00:04:01:02:04 20.0.0.2/24
    <http://20.0.0.2/24>
    +ovn-nbctl lsp-add join r2-join -- set Logical_Switch_Port r2-join \
    +    type=router options:router-port=R2_join
    addresses='"00:00:04:01:02:04"'
    +
    +# Connect R3 to join
    +ovn-nbctl lrp-add R3 R3_join 00:00:04:01:02:05 20.0.0.3/24
    <http://20.0.0.3/24>
    +ovn-nbctl lsp-add join r3-join -- set Logical_Switch_Port r3-join \
    +    type=router options:router-port=R3_join
    addresses='"00:00:04:01:02:05"'
    +
    +# Install ECMP routes for alice.
    +ovn-nbctl --ecmp-symmetric-reply --policy="src-ip" lr-route-add R1
    10.0.0.0/24 <http://10.0.0.0/24> 20.0.0.2
    +ovn-nbctl --ecmp-symmetric-reply --policy="src-ip" lr-route-add R1
    10.0.0.0/24 <http://10.0.0.0/24> 20.0.0.3
    +
    +# Static Routes
    +ovn-nbctl lr-route-add R2 10.0.0.0/24 <http://10.0.0.0/24> 20.0.0.1
    +ovn-nbctl lr-route-add R3 10.0.0.0/24 <http://10.0.0.0/24> 20.0.0.1
    +
    +# Logical port 'alice1' in switch 'alice'.
    +ADD_NAMESPACES(alice1)
    +ADD_VETH(alice1, alice1, br-int, "10.0.0.2/24
    <http://10.0.0.2/24>", "f0:00:00:01:02:04", \
    +         "10.0.0.1")
    +ovn-nbctl lsp-add alice alice1 \
    +-- lsp-set-addresses alice1 "f0:00:00:01:02:04 10.0.0.2"
    +
    +# Logical port 'bob1' in switch 'bob'.
    +ADD_NAMESPACES(bob1)
    +ADD_VETH(bob1, bob1, br-int, "172.16.0.1/16
    <http://172.16.0.1/16>", "f0:00:00:01:02:06", \
    +         "172.16.0.2")
    +ovn-nbctl lsp-add bob bob1 \
    +-- lsp-set-addresses bob1 "f0:00:00:01:02:06 172.16.0.1"
    +
    +# Ensure ovn-controller is caught up
    +ovn-nbctl --wait=hv sync
    +
    +on_exit 'ovs-ofctl dump-flows br-int'
    +
    +# 'bob1' should be able to ping 'alice1' directly.
    +NS_CHECK_EXEC([bob1], [ping -q -c 20 -i 0.3 -w 15 10.0.0.2 |
    FORMAT_PING], \
    +[0], [dnl
    +20 packets transmitted, 20 received, 0% packet loss, time 0ms
    +])
    +
    +# Ensure conntrack entry is present. We should not try to predict
    +# the tunnel key for the output port, so we strip it from the labels
    +# and just ensure that the known ethernet address is present.
    +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.0.1) | \
    +sed -e 's/zone=[[0-9]]*/zone=<cleared>/' |
    +sed -e
    's/labels=0x[[0-9a-f]]*000004010204/labels=0x000004010204/'], [0], [dnl
    
+icmp,orig=(src=172.16.0.1,dst=10.0.0.2,id=<cleared>,type=8,code=0),reply=(src=10.0.0.2,dst=172.16.0.1,id=<cleared>,type=0,code=0),zone=<cleared>,labels=0x000004010204
    +])
    +
    +ovs-appctl dpctl/dump-flows
    +
    +# Ensure datapaths show conntrack states as expected
    +# Like with conntrack entries, we shouldn't try to predict
    +# port binding tunnel keys. So omit them from expected labels.
    +AT_CHECK([ovs-appctl dpctl/dump-flows | grep
    
'ct_state(+new-est-rpl+trk).*ct(.*label=0x.*000004010204/0xffffffffffffffff)'
    -c], [0], [dnl
    +1
    +])
    +AT_CHECK([ovs-appctl dpctl/dump-flows | grep
    'ct_state(-new+est+rpl+trk).*ct_label(0x.*000004010204/0xffffffffffffffff)'
    -c], [0], [dnl
    +1
    +])
    +
    +OVS_APP_EXIT_AND_WAIT([ovn-controller])
    +
    +as ovn-sb
    +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
    +
    +as ovn-nb
    +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
    +
    +as northd
    +OVS_APP_EXIT_AND_WAIT([ovn-northd])
    +
    +as
    +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
    +/connection dropped.*/d"])
    +
    +AT_CLEANUP
    diff --git a/utilities/ovn-nbctl.8.xml b/utilities/ovn-nbctl.8.xml
    index de86b70e6..18bf90e08 100644
    --- a/utilities/ovn-nbctl.8.xml
    +++ b/utilities/ovn-nbctl.8.xml
    @@ -658,7 +658,8 @@

          <dl>
            <dt>[<code>--may-exist</code>]
    [<code>--policy</code>=<var>POLICY</var>]
    -        [<code>--ecmp</code>] <code>lr-route-add</code>
    <var>router</var>
    +        [<code>--ecmp</code>] [<code>--ecmp-symmetric-reply</code>]
    +        <code>lr-route-add</code> <var>router</var>
              <var>prefix</var> <var>nexthop</var> [<var>port</var>]</dt>
            <dd>
              <p>
    @@ -680,15 +681,31 @@
                specified, the default is "dst-ip".
              </p>

    +        <p>
    +          The <code>--ecmp</code> option allows for multiple routes
    with the
    +          same <var>prefix</var> <var>POLICY</var> but different
    +          <var>nexthop</var> and <var>port</var> to be added.
    +        </p>
    +
    +        <p>
    +          The <code>--ecmp-symmetric-reply</code> option makes it
    so that
    +          traffic that arrives over an ECMP route will have its
    reply traffic
    +          sent out over that same route. Setting
    +          <code>--ecmp-symmetric-reply</code> implies
    <code>--ecmp</code> so
    +          it is not necessary to set both.
    +        </p>
    +
              <p>
                It is an error if a route with <var>prefix</var> and
    -          <var>POLICY</var> already exists, unless
    <code>--may-exist</code> or
    -          <code>--ecmp</code> is specified.  If
    <code>--may-exist</code> is
    -          specified but not <code>--ecmp</code>, the existed route
    will be
    -          updated with the new nexthop and port.  If
    <code>--ecmp</code> is
    +          <var>POLICY</var> already exists, unless
    <code>--may-exist</code>,
    +          <code>--ecmp</code>, or
    <code>--ecmp-symmetric-reply</code> is
    +          specified.  If <code>--may-exist</code> is specified but not
    +          <code>--ecmp</code> or
    <code>--ecmp-symmetric-reply</code>, the
    +          existed route will be updated with the new nexthop and
    port.  If
    +          <code>--ecmp</code> or <code>--ecmp-symmetric-reply</code> is
                specified, a new route will be added, regardless of the
    existed
    -          route, which is useful when adding ECMP routes, i.e.
    routes with same
    -          <var>POLICY</var> and <var>prefix</var> but different
    +          route., which is useful when adding ECMP routes, i.e.
    routes with
    +          same <var>POLICY</var> and <var>prefix</var> but different
                <var>nexthop</var> and <var>port</var>.
              </p>
            </dd>
    diff --git a/utilities/ovn-nbctl.c b/utilities/ovn-nbctl.c
    index 0079ad5a6..e6d8dbe63 100644
    --- a/utilities/ovn-nbctl.c
    +++ b/utilities/ovn-nbctl.c
    @@ -687,7 +687,8 @@ Logical router port commands:\n\
                                  ('overlay' or 'bridged')\n\
      \n\
      Route commands:\n\
    -  [--policy=POLICY] [--ecmp] lr-route-add ROUTER PREFIX NEXTHOP
    [PORT]\n\
    +  [--policy=POLICY] [--ecmp] [--ecmp-symmetric-reply] lr-route-add
    ROUTER \n\
    +                            PREFIX NEXTHOP [PORT]\n\
                                  add a route to ROUTER\n\
        [--policy=POLICY] lr-route-del ROUTER [PREFIX [NEXTHOP [PORT]]]\n\
                                  remove routes from ROUTER\n\
    @@ -3855,7 +3856,10 @@ nbctl_lr_route_add(struct ctl_context *ctx)
          }

          bool may_exist = shash_find(&ctx->options, "--may-exist") != NULL;
    -    bool ecmp = shash_find(&ctx->options, "--ecmp") != NULL;
    +    bool ecmp_symmetric_reply = shash_find(&ctx->options,
+  "--ecmp-symmetric-reply") != NULL;
    +    bool ecmp = shash_find(&ctx->options, "--ecmp") != NULL ||
    +                ecmp_symmetric_reply;
          if (!ecmp) {
              for (int i = 0; i < lr->n_static_routes; i++) {
                  const struct nbrec_logical_router_static_route *route
    @@ -3920,6 +3924,13 @@ nbctl_lr_route_add(struct ctl_context *ctx)
              nbrec_logical_router_static_route_set_policy(route, policy);
          }

    +    if (ecmp_symmetric_reply) {
    +        const struct smap options = SMAP_CONST1(&options,
    +                                                "ecmp_symmetric_reply",
    +                                                "true");
    +        nbrec_logical_router_static_route_set_options(route, &options);
    +    }
    +
          nbrec_logical_router_verify_static_routes(lr);
          struct nbrec_logical_router_static_route **new_routes
              = xmalloc(sizeof *new_routes * (lr->n_static_routes + 1));
    @@ -6361,7 +6372,8 @@ static const struct ctl_command_syntax
    nbctl_commands[] = {

          /* logical router route commands. */
          { "lr-route-add", 3, 4, "ROUTER PREFIX NEXTHOP [PORT]", NULL,
    -      nbctl_lr_route_add, NULL, "--may-exist,--ecmp,--policy=", RW },
    +      nbctl_lr_route_add, NULL,
    "--may-exist,--ecmp,--ecmp-symmetric-reply,"
    +      "--policy=", RW },
          { "lr-route-del", 1, 4, "ROUTER [PREFIX [NEXTHOP [PORT]]]", NULL,
            nbctl_lr_route_del, NULL, "--if-exists,--policy=", RW },
          { "lr-route-list", 1, 1, "ROUTER", NULL, nbctl_lr_route_list,
    NULL,
-- 2.25.4

    _______________________________________________
    dev mailing list
    [email protected] <mailto:[email protected]>
    https://mail.openvswitch.org/mailman/listinfo/ovs-dev


_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to