On Wed, Mar 23, 2022 at 9:42 AM Numan Siddique <[email protected]> wrote:
>
> On Tue, Mar 22, 2022 at 2:54 PM Han Zhou <[email protected]> wrote:
> >
> > Some NICs support HW offloading for datapath flows, but masked access to
> > the 128-bit ct_label field may prevent a flow being offloaded due to HW
> > limitations. OVN's use of ct_label currently includes:
> > - ct_label.blocked (1 bit)
> > - ct_label.natted (1 bit)
> > - ct_label.ecmp_reply_port (16 bits)
> > - ct_label.ecmp_reply_eth (48 bits)
> > - ct_label.label (32 bits)
> >
> > This patch moves the bits blocked, natted and ecmp_reply_port to use
> > ct_mark (18 bits in total among the 32-bit ct_mark), and keep the rest
> > of the fields in ct_label:
> > - ct_mark.blocked (1 bit)
> > - ct_mark.natted (1 bit)
> > - ct_mark.ecmp_reply_port (16 bits)
> > - ct_label.ecmp_reply_eth (48 bits)
> > - ct_label.label (32 bits)
> >
> > This would allow HW offloading to work for most of the cases.
> >
> > For ct_label.ecmp_reply_eth, the flow matching it still uses masked
> > access, but it doesn't matter because the flow is for new connections
> > and requires ct_commit in its actions, so it wouldn't be offloaded
> > anyway for those NICs. There is a flow for established connections that
> > would access the masked field in the actions, while in this patch it
> > avoids masked access by using a register xxreg1 to temporarily read the
> > whole ct_label, and then use masked access to xxreg1 to read the actual
> > value.
> >
> > The only exception is for ct_label.label, there is a flow that matches
> > the masked field for ACL logging of reply direction. This patch cannot
> > avoid the masked access to ct_label in this case. This flow is enabled
> > only for the feature "log-related". So offloading may still not work for
> > some NICs when an ACL is configured with a label and with "log-related"
> > enabled.
> >
> > There are no other flows relying on masked ct_label match, but it's
> > worth noting that the LB hairpin related flows using ct_label.natted
> > which were hardcoded directly in ovn-controller are still kept to avoid
> > traffic breaking during upgrading. User will need to set a new option
> > "ovn-check-ct-label-for-lb-hairpin" in OVS config external_ids for each
> > ovn-controller to "false" to remove those flows.
> >
> > The change is backward compatible as long as the ovn-controller (on
> > worker nodes) are upgraded before the ovn-northd (on central nodes).
> >
> > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1957786
> > Signed-off-by: Han Zhou <[email protected]>
> > ---
> >  NEWS                            |   6 +
> >  controller/lflow.c              |  34 ++-
> >  controller/lflow.h              |   1 +
> >  controller/ovn-controller.8.xml |  13 +
> >  controller/ovn-controller.c     |  13 +
> >  include/ovn/logical-fields.h    |   3 +
> >  lib/logical-fields.c            |  17 +-
> >  northd/northd.c                 | 107 ++++---
> >  northd/ovn-northd.8.xml         |  48 +--
> >  tests/ovn-northd.at             | 524 ++++++++++++++++----------------
> >  tests/ovn.at                    | 174 ++++++-----
> >  tests/system-ovn.at             | 178 +++++------
> >  12 files changed, 615 insertions(+), 503 deletions(-)
> >
> > diff --git a/NEWS b/NEWS
> > index 5ba9c3cf4..9e1274c70 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -2,6 +2,12 @@ Post v22.03.0
> >  -------------
> >    - Support IGMP and MLD snooping on transit logical switches that
connect
> >      different OVN Interconnection availability zones.
> > +  - Replaced the usage of masked ct_label by ct_mark in most cases to
work
> > +    better with hardware-offloading. Note that for backward
compatibility
> > +    during upgrading some hairpin related flows are still kept by
default, and
> > +    a new option "ovn-check-ct-label-for-lb-hairpin" for
ovn-controller needs
> > +    to be set to "false" after the central upgrading completion to
remove those
> > +    flows.
> >
> >  OVN v22.03.0 - XX XXX XXXX
> >  --------------------------
> > diff --git a/controller/lflow.c b/controller/lflow.c
> > index e169edef1..1b40cca0c 100644
> > --- a/controller/lflow.c
> > +++ b/controller/lflow.c
> > @@ -1894,6 +1894,7 @@ add_lb_vip_hairpin_flows(struct ovn_controller_lb
*lb,
> >                           struct ovn_lb_vip *lb_vip,
> >                           struct ovn_lb_backend *lb_backend,
> >                           uint8_t lb_proto,
> > +                         bool check_ct_label_for_lb_hairpin,
> >                           struct ovn_desired_flow_table *flow_table)
> >  {
> >      uint64_t stub[1024 / 8];
> > @@ -1985,14 +1986,29 @@ add_lb_vip_hairpin_flows(struct
ovn_controller_lb *lb,
> >       * - the destination protocol and port must be of a valid backend
that
> >       *   has the same IP as ip.dst.
> >       */
> > -    ovs_u128 lb_ct_label = {
> > -        .u64.lo = OVN_CT_NATTED,
> > -    };
> > -    match_set_ct_label_masked(&hairpin_match, lb_ct_label,
lb_ct_label);
> > +    uint32_t lb_ct_mark = OVN_CT_NATTED;
> > +    match_set_ct_mark_masked(&hairpin_match, lb_ct_mark, lb_ct_mark);
> >
> >      ofctrl_add_flow(flow_table, OFTABLE_CHK_LB_HAIRPIN, 100,
> >                      lb->slb->header_.uuid.parts[0], &hairpin_match,
> >                      &ofpacts, &lb->slb->header_.uuid);
> > +
> > +    /* The below flow is identical to the above except that it checks
> > +     * ct_label.natted instead of ct_mark.natted, for backward
compatibility
> > +     * during the upgrade, and can be disabled afterwards by the option
> > +     * check_ct_label_for_lb_hairpin. */
> > +    if (check_ct_label_for_lb_hairpin) {
> > +        match_set_ct_mark_masked(&hairpin_match, 0, 0);
> > +        ovs_u128 lb_ct_label = {
> > +            .u64.lo = OVN_CT_NATTED,
> > +        };
> > +        match_set_ct_label_masked(&hairpin_match, lb_ct_label,
lb_ct_label);
> > +
> > +        ofctrl_add_flow(flow_table, OFTABLE_CHK_LB_HAIRPIN, 100,
> > +                        lb->slb->header_.uuid.parts[0], &hairpin_match,
> > +                        &ofpacts, &lb->slb->header_.uuid);
> > +    }
> > +
> >      ofpbuf_uninit(&ofpacts);
> >  }
> >
> > @@ -2265,6 +2281,7 @@ add_lb_ct_snat_hairpin_flows(struct
ovn_controller_lb *lb,
> >  static void
> >  consider_lb_hairpin_flows(const struct sbrec_load_balancer *sbrec_lb,
> >                            const struct hmap *local_datapaths,
> > +                          bool check_ct_label_for_lb_hairpin,
> >                            struct ovn_desired_flow_table *flow_table,
> >                            struct simap *ids)
> >  {
> > @@ -2304,6 +2321,7 @@ consider_lb_hairpin_flows(const struct
sbrec_load_balancer *sbrec_lb,
> >              struct ovn_lb_backend *lb_backend = &lb_vip->backends[j];
> >
> >              add_lb_vip_hairpin_flows(lb, lb_vip, lb_backend, lb_proto,
> > +                                     check_ct_label_for_lb_hairpin,
> >                                       flow_table);
> >          }
> >      }
> > @@ -2318,6 +2336,7 @@ consider_lb_hairpin_flows(const struct
sbrec_load_balancer *sbrec_lb,
> >  static void
> >  add_lb_hairpin_flows(const struct sbrec_load_balancer_table *lb_table,
> >                       const struct hmap *local_datapaths,
> > +                     bool check_ct_label_for_lb_hairpin,
> >                       struct ovn_desired_flow_table *flow_table,
> >                       struct simap *ids,
> >                       struct id_pool *pool)
> > @@ -2340,7 +2359,9 @@ add_lb_hairpin_flows(const struct
sbrec_load_balancer_table *lb_table,
> >              ovs_assert(id_pool_alloc_id(pool, &id));
> >              simap_put(ids, lb->name, id);
> >          }
> > -        consider_lb_hairpin_flows(lb, local_datapaths, flow_table,
ids);
> > +        consider_lb_hairpin_flows(lb, local_datapaths,
> > +                                  check_ct_label_for_lb_hairpin,
> > +                                  flow_table, ids);
> >      }
> >  }
> >
> > @@ -2446,6 +2467,7 @@ lflow_run(struct lflow_ctx_in *l_ctx_in, struct
lflow_ctx_out *l_ctx_out)
> >                         l_ctx_in->mac_binding_table,
l_ctx_in->local_datapaths,
> >                         l_ctx_out->flow_table);
> >      add_lb_hairpin_flows(l_ctx_in->lb_table, l_ctx_in->local_datapaths,
> > +                         l_ctx_in->check_ct_label_for_lb_hairpin,
> >                           l_ctx_out->flow_table,
> >                           l_ctx_out->hairpin_lb_ids,
> >                           l_ctx_out->hairpin_id_pool);
> > @@ -2581,6 +2603,7 @@ lflow_add_flows_for_datapath(const struct
sbrec_datapath_binding *dp,
> >       * associated. */
> >      for (size_t i = 0; i < n_dp_lbs; i++) {
> >          consider_lb_hairpin_flows(dp_lbs[i], l_ctx_in->local_datapaths,
> > +
 l_ctx_in->check_ct_label_for_lb_hairpin,
> >                                    l_ctx_out->flow_table,
> >                                    l_ctx_out->hairpin_lb_ids);
> >      }
> > @@ -2694,6 +2717,7 @@ lflow_handle_changed_lbs(struct lflow_ctx_in
*l_ctx_in,
> >          VLOG_DBG("Add load balancer hairpin flows for "UUID_FMT,
> >                   UUID_ARGS(&lb->header_.uuid));
> >          consider_lb_hairpin_flows(lb, l_ctx_in->local_datapaths,
> > +
 l_ctx_in->check_ct_label_for_lb_hairpin,
> >                                    l_ctx_out->flow_table,
> >                                    l_ctx_out->hairpin_lb_ids);
> >      }
> > diff --git a/controller/lflow.h b/controller/lflow.h
> > index d61733bc2..3909605f0 100644
> > --- a/controller/lflow.h
> > +++ b/controller/lflow.h
> > @@ -153,6 +153,7 @@ struct lflow_ctx_in {
> >      const struct sset *active_tunnels;
> >      const struct sset *related_lport_ids;
> >      const struct hmap *chassis_tunnels;
> > +    bool check_ct_label_for_lb_hairpin;
> >  };
> >
> >  struct lflow_ctx_out {
> > diff --git a/controller/ovn-controller.8.xml
b/controller/ovn-controller.8.xml
> > index cc9a7d1c2..a4d1acbaf 100644
> > --- a/controller/ovn-controller.8.xml
> > +++ b/controller/ovn-controller.8.xml
> > @@ -311,6 +311,19 @@
> >          heplful to pin source outer IP for the tunnel when multiple
interfaces
> >          are used on the host for overlay traffic.
> >        </dd>
> > +
 <dt><code>external_ids:ovn-check-ct-label-for-lb-hairpin</code></dt>
> > +      <dd>
> > +        The check for ct_label has been replaced by ct_mark to work
better with
> > +        hardware offloading with smart NICs. However, to be backward
compatible
> > +        during upgrading from an older version that uses ct_label to
the
> > +        current version, some LB hairpin related flows using ct_label
are kept.
> > +        This boolean flag can be set to <code>false</code> after
> > +        <code>ovn-northd</code> (central component) upgrading is
complete, to
> > +        tell <code>ovn-controller</code> to remove those ct_label
flows.
> > +        Without this option (or set to <code>true</code>), the related
hairpin
> > +        flows using ct_label are still programmed, which could still
lead to
> > +        undesirable behavior for hardware-offloading on some NICs.
> > +      </dd>
> >      </dl>
> >
>
> Hi Han,
>
> Instead of having this configuration to be defined for each chassis,
> I'd suggest to do the following
>    -  ovn-northd can set an option in SB_Global options column to
> indicate that it support ct_mark flags.
>         like - ovn-sbctl set SB_Global.options:use_ct_mark=true  (just
> as an example)
>
>   - ovn-controllers can check for this value and add the appropriate flow.
>
> I think this would be seamless.  I think we do something similar now,
> but in the opposite direction.
> i.e ovn-controllers indicate the support of a feature in the chassis
> private column and ovn-northd use this information to set the logical
> port up column (I think).
>
> Let me know what you think.
>
> Numan

Hi Numan,

I agree it may be a burden to update options for each chassis. I also
thought about indicating this from ovn-northd, but it requires changes in
ovn-controller I-P to take SB_Global as input and handle it properly
without triggering recompute for unrelated changes. I will try this path
for the next version. Do you think it is better to use the
ovn-internal-version's OVN_INTERNAL_MINOR_VER to handle this without the
need of a new option? I didn't see any official document regarding how
would this version number be used, but it looks like we can rely on it for
such purpose, if the monior_version part  increases only and doesn't
rewind. Assume the version before upgrading is V1, and at V2 we move
ct_label to ct_mark. Ovn-controller checks if the northd minor version >=
V2 it adds the ct_mark flows only, otherwise, add both ct_mark and ct_label
flows. What do you think?

Thanks,
Han
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to