On Wed, Mar 23, 2022 at 2:13 PM Han Zhou <[email protected]> wrote:
>
> On Wed, Mar 23, 2022 at 9:42 AM Numan Siddique <[email protected]> wrote:
> >
> > On Tue, Mar 22, 2022 at 2:54 PM Han Zhou <[email protected]> wrote:
> > >
> > > Some NICs support HW offloading for datapath flows, but masked access to
> > > the 128-bit ct_label field may prevent a flow being offloaded due to HW
> > > limitations. OVN's use of ct_label currently includes:
> > > - ct_label.blocked (1 bit)
> > > - ct_label.natted (1 bit)
> > > - ct_label.ecmp_reply_port (16 bits)
> > > - ct_label.ecmp_reply_eth (48 bits)
> > > - ct_label.label (32 bits)
> > >
> > > This patch moves the bits blocked, natted and ecmp_reply_port to use
> > > ct_mark (18 bits in total among the 32-bit ct_mark), and keep the rest
> > > of the fields in ct_label:
> > > - ct_mark.blocked (1 bit)
> > > - ct_mark.natted (1 bit)
> > > - ct_mark.ecmp_reply_port (16 bits)
> > > - ct_label.ecmp_reply_eth (48 bits)
> > > - ct_label.label (32 bits)
> > >
> > > This would allow HW offloading to work for most of the cases.
> > >
> > > For ct_label.ecmp_reply_eth, the flow matching it still uses masked
> > > access, but it doesn't matter because the flow is for new connections
> > > and requires ct_commit in its actions, so it wouldn't be offloaded
> > > anyway for those NICs. There is a flow for established connections that
> > > would access the masked field in the actions, while in this patch it
> > > avoids masked access by using a register xxreg1 to temporarily read the
> > > whole ct_label, and then use masked access to xxreg1 to read the actual
> > > value.
> > >
> > > The only exception is for ct_label.label, there is a flow that matches
> > > the masked field for ACL logging of reply direction. This patch cannot
> > > avoid the masked access to ct_label in this case. This flow is enabled
> > > only for the feature "log-related". So offloading may still not work for
> > > some NICs when an ACL is configured with a label and with "log-related"
> > > enabled.
> > >
> > > There are no other flows relying on masked ct_label match, but it's
> > > worth noting that the LB hairpin related flows using ct_label.natted
> > > which were hardcoded directly in ovn-controller are still kept to avoid
> > > traffic breaking during upgrading. User will need to set a new option
> > > "ovn-check-ct-label-for-lb-hairpin" in OVS config external_ids for each
> > > ovn-controller to "false" to remove those flows.
> > >
> > > The change is backward compatible as long as the ovn-controller (on
> > > worker nodes) are upgraded before the ovn-northd (on central nodes).
> > >
> > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1957786
> > > Signed-off-by: Han Zhou <[email protected]>
> > > ---
> > >  NEWS                            |   6 +
> > >  controller/lflow.c              |  34 ++-
> > >  controller/lflow.h              |   1 +
> > >  controller/ovn-controller.8.xml |  13 +
> > >  controller/ovn-controller.c     |  13 +
> > >  include/ovn/logical-fields.h    |   3 +
> > >  lib/logical-fields.c            |  17 +-
> > >  northd/northd.c                 | 107 ++++---
> > >  northd/ovn-northd.8.xml         |  48 +--
> > >  tests/ovn-northd.at             | 524 ++++++++++++++++----------------
> > >  tests/ovn.at                    | 174 ++++++-----
> > >  tests/system-ovn.at             | 178 +++++------
> > >  12 files changed, 615 insertions(+), 503 deletions(-)
> > >
> > > diff --git a/NEWS b/NEWS
> > > index 5ba9c3cf4..9e1274c70 100644
> > > --- a/NEWS
> > > +++ b/NEWS
> > > @@ -2,6 +2,12 @@ Post v22.03.0
> > >  -------------
> > >    - Support IGMP and MLD snooping on transit logical switches that
> connect
> > >      different OVN Interconnection availability zones.
> > > +  - Replaced the usage of masked ct_label by ct_mark in most cases to
> work
> > > +    better with hardware-offloading. Note that for backward
> compatibility
> > > +    during upgrading some hairpin related flows are still kept by
> default, and
> > > +    a new option "ovn-check-ct-label-for-lb-hairpin" for
> ovn-controller needs
> > > +    to be set to "false" after the central upgrading completion to
> remove those
> > > +    flows.
> > >
> > >  OVN v22.03.0 - XX XXX XXXX
> > >  --------------------------
> > > diff --git a/controller/lflow.c b/controller/lflow.c
> > > index e169edef1..1b40cca0c 100644
> > > --- a/controller/lflow.c
> > > +++ b/controller/lflow.c
> > > @@ -1894,6 +1894,7 @@ add_lb_vip_hairpin_flows(struct ovn_controller_lb
> *lb,
> > >                           struct ovn_lb_vip *lb_vip,
> > >                           struct ovn_lb_backend *lb_backend,
> > >                           uint8_t lb_proto,
> > > +                         bool check_ct_label_for_lb_hairpin,
> > >                           struct ovn_desired_flow_table *flow_table)
> > >  {
> > >      uint64_t stub[1024 / 8];
> > > @@ -1985,14 +1986,29 @@ add_lb_vip_hairpin_flows(struct
> ovn_controller_lb *lb,
> > >       * - the destination protocol and port must be of a valid backend
> that
> > >       *   has the same IP as ip.dst.
> > >       */
> > > -    ovs_u128 lb_ct_label = {
> > > -        .u64.lo = OVN_CT_NATTED,
> > > -    };
> > > -    match_set_ct_label_masked(&hairpin_match, lb_ct_label,
> lb_ct_label);
> > > +    uint32_t lb_ct_mark = OVN_CT_NATTED;
> > > +    match_set_ct_mark_masked(&hairpin_match, lb_ct_mark, lb_ct_mark);
> > >
> > >      ofctrl_add_flow(flow_table, OFTABLE_CHK_LB_HAIRPIN, 100,
> > >                      lb->slb->header_.uuid.parts[0], &hairpin_match,
> > >                      &ofpacts, &lb->slb->header_.uuid);
> > > +
> > > +    /* The below flow is identical to the above except that it checks
> > > +     * ct_label.natted instead of ct_mark.natted, for backward
> compatibility
> > > +     * during the upgrade, and can be disabled afterwards by the option
> > > +     * check_ct_label_for_lb_hairpin. */
> > > +    if (check_ct_label_for_lb_hairpin) {
> > > +        match_set_ct_mark_masked(&hairpin_match, 0, 0);
> > > +        ovs_u128 lb_ct_label = {
> > > +            .u64.lo = OVN_CT_NATTED,
> > > +        };
> > > +        match_set_ct_label_masked(&hairpin_match, lb_ct_label,
> lb_ct_label);
> > > +
> > > +        ofctrl_add_flow(flow_table, OFTABLE_CHK_LB_HAIRPIN, 100,
> > > +                        lb->slb->header_.uuid.parts[0], &hairpin_match,
> > > +                        &ofpacts, &lb->slb->header_.uuid);
> > > +    }
> > > +
> > >      ofpbuf_uninit(&ofpacts);
> > >  }
> > >
> > > @@ -2265,6 +2281,7 @@ add_lb_ct_snat_hairpin_flows(struct
> ovn_controller_lb *lb,
> > >  static void
> > >  consider_lb_hairpin_flows(const struct sbrec_load_balancer *sbrec_lb,
> > >                            const struct hmap *local_datapaths,
> > > +                          bool check_ct_label_for_lb_hairpin,
> > >                            struct ovn_desired_flow_table *flow_table,
> > >                            struct simap *ids)
> > >  {
> > > @@ -2304,6 +2321,7 @@ consider_lb_hairpin_flows(const struct
> sbrec_load_balancer *sbrec_lb,
> > >              struct ovn_lb_backend *lb_backend = &lb_vip->backends[j];
> > >
> > >              add_lb_vip_hairpin_flows(lb, lb_vip, lb_backend, lb_proto,
> > > +                                     check_ct_label_for_lb_hairpin,
> > >                                       flow_table);
> > >          }
> > >      }
> > > @@ -2318,6 +2336,7 @@ consider_lb_hairpin_flows(const struct
> sbrec_load_balancer *sbrec_lb,
> > >  static void
> > >  add_lb_hairpin_flows(const struct sbrec_load_balancer_table *lb_table,
> > >                       const struct hmap *local_datapaths,
> > > +                     bool check_ct_label_for_lb_hairpin,
> > >                       struct ovn_desired_flow_table *flow_table,
> > >                       struct simap *ids,
> > >                       struct id_pool *pool)
> > > @@ -2340,7 +2359,9 @@ add_lb_hairpin_flows(const struct
> sbrec_load_balancer_table *lb_table,
> > >              ovs_assert(id_pool_alloc_id(pool, &id));
> > >              simap_put(ids, lb->name, id);
> > >          }
> > > -        consider_lb_hairpin_flows(lb, local_datapaths, flow_table,
> ids);
> > > +        consider_lb_hairpin_flows(lb, local_datapaths,
> > > +                                  check_ct_label_for_lb_hairpin,
> > > +                                  flow_table, ids);
> > >      }
> > >  }
> > >
> > > @@ -2446,6 +2467,7 @@ lflow_run(struct lflow_ctx_in *l_ctx_in, struct
> lflow_ctx_out *l_ctx_out)
> > >                         l_ctx_in->mac_binding_table,
> l_ctx_in->local_datapaths,
> > >                         l_ctx_out->flow_table);
> > >      add_lb_hairpin_flows(l_ctx_in->lb_table, l_ctx_in->local_datapaths,
> > > +                         l_ctx_in->check_ct_label_for_lb_hairpin,
> > >                           l_ctx_out->flow_table,
> > >                           l_ctx_out->hairpin_lb_ids,
> > >                           l_ctx_out->hairpin_id_pool);
> > > @@ -2581,6 +2603,7 @@ lflow_add_flows_for_datapath(const struct
> sbrec_datapath_binding *dp,
> > >       * associated. */
> > >      for (size_t i = 0; i < n_dp_lbs; i++) {
> > >          consider_lb_hairpin_flows(dp_lbs[i], l_ctx_in->local_datapaths,
> > > +
>  l_ctx_in->check_ct_label_for_lb_hairpin,
> > >                                    l_ctx_out->flow_table,
> > >                                    l_ctx_out->hairpin_lb_ids);
> > >      }
> > > @@ -2694,6 +2717,7 @@ lflow_handle_changed_lbs(struct lflow_ctx_in
> *l_ctx_in,
> > >          VLOG_DBG("Add load balancer hairpin flows for "UUID_FMT,
> > >                   UUID_ARGS(&lb->header_.uuid));
> > >          consider_lb_hairpin_flows(lb, l_ctx_in->local_datapaths,
> > > +
>  l_ctx_in->check_ct_label_for_lb_hairpin,
> > >                                    l_ctx_out->flow_table,
> > >                                    l_ctx_out->hairpin_lb_ids);
> > >      }
> > > diff --git a/controller/lflow.h b/controller/lflow.h
> > > index d61733bc2..3909605f0 100644
> > > --- a/controller/lflow.h
> > > +++ b/controller/lflow.h
> > > @@ -153,6 +153,7 @@ struct lflow_ctx_in {
> > >      const struct sset *active_tunnels;
> > >      const struct sset *related_lport_ids;
> > >      const struct hmap *chassis_tunnels;
> > > +    bool check_ct_label_for_lb_hairpin;
> > >  };
> > >
> > >  struct lflow_ctx_out {
> > > diff --git a/controller/ovn-controller.8.xml
> b/controller/ovn-controller.8.xml
> > > index cc9a7d1c2..a4d1acbaf 100644
> > > --- a/controller/ovn-controller.8.xml
> > > +++ b/controller/ovn-controller.8.xml
> > > @@ -311,6 +311,19 @@
> > >          heplful to pin source outer IP for the tunnel when multiple
> interfaces
> > >          are used on the host for overlay traffic.
> > >        </dd>
> > > +
>  <dt><code>external_ids:ovn-check-ct-label-for-lb-hairpin</code></dt>
> > > +      <dd>
> > > +        The check for ct_label has been replaced by ct_mark to work
> better with
> > > +        hardware offloading with smart NICs. However, to be backward
> compatible
> > > +        during upgrading from an older version that uses ct_label to
> the
> > > +        current version, some LB hairpin related flows using ct_label
> are kept.
> > > +        This boolean flag can be set to <code>false</code> after
> > > +        <code>ovn-northd</code> (central component) upgrading is
> complete, to
> > > +        tell <code>ovn-controller</code> to remove those ct_label
> flows.
> > > +        Without this option (or set to <code>true</code>), the related
> hairpin
> > > +        flows using ct_label are still programmed, which could still
> lead to
> > > +        undesirable behavior for hardware-offloading on some NICs.
> > > +      </dd>
> > >      </dl>
> > >
> >
> > Hi Han,
> >
> > Instead of having this configuration to be defined for each chassis,
> > I'd suggest to do the following
> >    -  ovn-northd can set an option in SB_Global options column to
> > indicate that it support ct_mark flags.
> >         like - ovn-sbctl set SB_Global.options:use_ct_mark=true  (just
> > as an example)
> >
> >   - ovn-controllers can check for this value and add the appropriate flow.
> >
> > I think this would be seamless.  I think we do something similar now,
> > but in the opposite direction.
> > i.e ovn-controllers indicate the support of a feature in the chassis
> > private column and ovn-northd use this information to set the logical
> > port up column (I think).
> >
> > Let me know what you think.
> >
> > Numan
>
> Hi Numan,
>
> I agree it may be a burden to update options for each chassis. I also
> thought about indicating this from ovn-northd, but it requires changes in
> ovn-controller I-P to take SB_Global as input and handle it properly
> without triggering recompute for unrelated changes. I will try this path
> for the next version. Do you think it is better to use the
> ovn-internal-version's OVN_INTERNAL_MINOR_VER to handle this without the
> need of a new option? I didn't see any official document regarding how
> would this version number be used, but it looks like we can rely on it for
> such purpose, if the monior_version part  increases only and doesn't
> rewind. Assume the version before upgrading is V1, and at V2 we move
> ct_label to ct_mark. Ovn-controller checks if the northd minor version >=
> V2 it adds the ct_mark flows only, otherwise, add both ct_mark and ct_label
> flows. What do you think?

Even I thought about the ovn-internal-version.  It seems fine for me
to use the internal version
for this purpose.  I suppose you need to increase the version in this patch.

But how would you handle the case when ovn-northd gets updated to the
newer version and it
updates the new internal version to SB_Global.  When that happens,  I
suppose we should trigger force recompute ?
I guess the same can be done with the new option too,  since
ovn-controller will come to know both the ovn-internal-version and the
new option (if we chose to
add) via the SB DB update.

I'd leave it to you to choose whichever seems better.

Numan


>
> Thanks,
> Han
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to