On Wed, Mar 23, 2022 at 1:25 PM Numan Siddique <[email protected]> wrote:
>
> On Wed, Mar 23, 2022 at 2:39 PM Han Zhou <[email protected]> wrote:
> >
> > On Wed, Mar 23, 2022 at 11:26 AM Numan Siddique <[email protected]> wrote:
> > >
> > > On Wed, Mar 23, 2022 at 2:13 PM Han Zhou <[email protected]> wrote:
> > > >
> > > > On Wed, Mar 23, 2022 at 9:42 AM Numan Siddique <[email protected]>
wrote:
> > > > >
> > > > > On Tue, Mar 22, 2022 at 2:54 PM Han Zhou <[email protected]> wrote:
> > > > > >
> > > > > > Some NICs support HW offloading for datapath flows, but masked
> > access to
> > > > > > the 128-bit ct_label field may prevent a flow being offloaded
due
> > to HW
> > > > > > limitations. OVN's use of ct_label currently includes:
> > > > > > - ct_label.blocked (1 bit)
> > > > > > - ct_label.natted (1 bit)
> > > > > > - ct_label.ecmp_reply_port (16 bits)
> > > > > > - ct_label.ecmp_reply_eth (48 bits)
> > > > > > - ct_label.label (32 bits)
> > > > > >
> > > > > > This patch moves the bits blocked, natted and ecmp_reply_port
to use
> > > > > > ct_mark (18 bits in total among the 32-bit ct_mark), and keep
the
> > rest
> > > > > > of the fields in ct_label:
> > > > > > - ct_mark.blocked (1 bit)
> > > > > > - ct_mark.natted (1 bit)
> > > > > > - ct_mark.ecmp_reply_port (16 bits)
> > > > > > - ct_label.ecmp_reply_eth (48 bits)
> > > > > > - ct_label.label (32 bits)
> > > > > >
> > > > > > This would allow HW offloading to work for most of the cases.
> > > > > >
> > > > > > For ct_label.ecmp_reply_eth, the flow matching it still uses
masked
> > > > > > access, but it doesn't matter because the flow is for new
> > connections
> > > > > > and requires ct_commit in its actions, so it wouldn't be
offloaded
> > > > > > anyway for those NICs. There is a flow for established
connections
> > that
> > > > > > would access the masked field in the actions, while in this
patch it
> > > > > > avoids masked access by using a register xxreg1 to temporarily
read
> > the
> > > > > > whole ct_label, and then use masked access to xxreg1 to read the
> > actual
> > > > > > value.
> > > > > >
> > > > > > The only exception is for ct_label.label, there is a flow that
> > matches
> > > > > > the masked field for ACL logging of reply direction. This patch
> > cannot
> > > > > > avoid the masked access to ct_label in this case. This flow is
> > enabled
> > > > > > only for the feature "log-related". So offloading may still not
> > work for
> > > > > > some NICs when an ACL is configured with a label and with
> > "log-related"
> > > > > > enabled.
> > > > > >
> > > > > > There are no other flows relying on masked ct_label match, but
it's
> > > > > > worth noting that the LB hairpin related flows using
ct_label.natted
> > > > > > which were hardcoded directly in ovn-controller are still kept
to
> > avoid
> > > > > > traffic breaking during upgrading. User will need to set a new
> > option
> > > > > > "ovn-check-ct-label-for-lb-hairpin" in OVS config external_ids
for
> > each
> > > > > > ovn-controller to "false" to remove those flows.
> > > > > >
> > > > > > The change is backward compatible as long as the ovn-controller
(on
> > > > > > worker nodes) are upgraded before the ovn-northd (on central
nodes).
> > > > > >
> > > > > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1957786
> > > > > > Signed-off-by: Han Zhou <[email protected]>
> > > > > > ---
> > > > > >  NEWS                            |   6 +
> > > > > >  controller/lflow.c              |  34 ++-
> > > > > >  controller/lflow.h              |   1 +
> > > > > >  controller/ovn-controller.8.xml |  13 +
> > > > > >  controller/ovn-controller.c     |  13 +
> > > > > >  include/ovn/logical-fields.h    |   3 +
> > > > > >  lib/logical-fields.c            |  17 +-
> > > > > >  northd/northd.c                 | 107 ++++---
> > > > > >  northd/ovn-northd.8.xml         |  48 +--
> > > > > >  tests/ovn-northd.at             | 524
> > ++++++++++++++++----------------
> > > > > >  tests/ovn.at                    | 174 ++++++-----
> > > > > >  tests/system-ovn.at             | 178 +++++------
> > > > > >  12 files changed, 615 insertions(+), 503 deletions(-)
> > > > > >
> > > > > > diff --git a/NEWS b/NEWS
> > > > > > index 5ba9c3cf4..9e1274c70 100644
> > > > > > --- a/NEWS
> > > > > > +++ b/NEWS
> > > > > > @@ -2,6 +2,12 @@ Post v22.03.0
> > > > > >  -------------
> > > > > >    - Support IGMP and MLD snooping on transit logical switches
that
> > > > connect
> > > > > >      different OVN Interconnection availability zones.
> > > > > > +  - Replaced the usage of masked ct_label by ct_mark in most
cases
> > to
> > > > work
> > > > > > +    better with hardware-offloading. Note that for backward
> > > > compatibility
> > > > > > +    during upgrading some hairpin related flows are still kept
by
> > > > default, and
> > > > > > +    a new option "ovn-check-ct-label-for-lb-hairpin" for
> > > > ovn-controller needs
> > > > > > +    to be set to "false" after the central upgrading
completion to
> > > > remove those
> > > > > > +    flows.
> > > > > >
> > > > > >  OVN v22.03.0 - XX XXX XXXX
> > > > > >  --------------------------
> > > > > > diff --git a/controller/lflow.c b/controller/lflow.c
> > > > > > index e169edef1..1b40cca0c 100644
> > > > > > --- a/controller/lflow.c
> > > > > > +++ b/controller/lflow.c
> > > > > > @@ -1894,6 +1894,7 @@ add_lb_vip_hairpin_flows(struct
> > ovn_controller_lb
> > > > *lb,
> > > > > >                           struct ovn_lb_vip *lb_vip,
> > > > > >                           struct ovn_lb_backend *lb_backend,
> > > > > >                           uint8_t lb_proto,
> > > > > > +                         bool check_ct_label_for_lb_hairpin,
> > > > > >                           struct ovn_desired_flow_table
*flow_table)
> > > > > >  {
> > > > > >      uint64_t stub[1024 / 8];
> > > > > > @@ -1985,14 +1986,29 @@ add_lb_vip_hairpin_flows(struct
> > > > ovn_controller_lb *lb,
> > > > > >       * - the destination protocol and port must be of a valid
> > backend
> > > > that
> > > > > >       *   has the same IP as ip.dst.
> > > > > >       */
> > > > > > -    ovs_u128 lb_ct_label = {
> > > > > > -        .u64.lo = OVN_CT_NATTED,
> > > > > > -    };
> > > > > > -    match_set_ct_label_masked(&hairpin_match, lb_ct_label,
> > > > lb_ct_label);
> > > > > > +    uint32_t lb_ct_mark = OVN_CT_NATTED;
> > > > > > +    match_set_ct_mark_masked(&hairpin_match, lb_ct_mark,
> > lb_ct_mark);
> > > > > >
> > > > > >      ofctrl_add_flow(flow_table, OFTABLE_CHK_LB_HAIRPIN, 100,
> > > > > >                      lb->slb->header_.uuid.parts[0],
&hairpin_match,
> > > > > >                      &ofpacts, &lb->slb->header_.uuid);
> > > > > > +
> > > > > > +    /* The below flow is identical to the above except that it
> > checks
> > > > > > +     * ct_label.natted instead of ct_mark.natted, for backward
> > > > compatibility
> > > > > > +     * during the upgrade, and can be disabled afterwards by
the
> > option
> > > > > > +     * check_ct_label_for_lb_hairpin. */
> > > > > > +    if (check_ct_label_for_lb_hairpin) {
> > > > > > +        match_set_ct_mark_masked(&hairpin_match, 0, 0);
> > > > > > +        ovs_u128 lb_ct_label = {
> > > > > > +            .u64.lo = OVN_CT_NATTED,
> > > > > > +        };
> > > > > > +        match_set_ct_label_masked(&hairpin_match, lb_ct_label,
> > > > lb_ct_label);
> > > > > > +
> > > > > > +        ofctrl_add_flow(flow_table, OFTABLE_CHK_LB_HAIRPIN,
100,
> > > > > > +                        lb->slb->header_.uuid.parts[0],
> > &hairpin_match,
> > > > > > +                        &ofpacts, &lb->slb->header_.uuid);
> > > > > > +    }
> > > > > > +
> > > > > >      ofpbuf_uninit(&ofpacts);
> > > > > >  }
> > > > > >
> > > > > > @@ -2265,6 +2281,7 @@ add_lb_ct_snat_hairpin_flows(struct
> > > > ovn_controller_lb *lb,
> > > > > >  static void
> > > > > >  consider_lb_hairpin_flows(const struct sbrec_load_balancer
> > *sbrec_lb,
> > > > > >                            const struct hmap *local_datapaths,
> > > > > > +                          bool check_ct_label_for_lb_hairpin,
> > > > > >                            struct ovn_desired_flow_table
> > *flow_table,
> > > > > >                            struct simap *ids)
> > > > > >  {
> > > > > > @@ -2304,6 +2321,7 @@ consider_lb_hairpin_flows(const struct
> > > > sbrec_load_balancer *sbrec_lb,
> > > > > >              struct ovn_lb_backend *lb_backend =
> > &lb_vip->backends[j];
> > > > > >
> > > > > >              add_lb_vip_hairpin_flows(lb, lb_vip, lb_backend,
> > lb_proto,
> > > > > > +
check_ct_label_for_lb_hairpin,
> > > > > >                                       flow_table);
> > > > > >          }
> > > > > >      }
> > > > > > @@ -2318,6 +2336,7 @@ consider_lb_hairpin_flows(const struct
> > > > sbrec_load_balancer *sbrec_lb,
> > > > > >  static void
> > > > > >  add_lb_hairpin_flows(const struct sbrec_load_balancer_table
> > *lb_table,
> > > > > >                       const struct hmap *local_datapaths,
> > > > > > +                     bool check_ct_label_for_lb_hairpin,
> > > > > >                       struct ovn_desired_flow_table *flow_table,
> > > > > >                       struct simap *ids,
> > > > > >                       struct id_pool *pool)
> > > > > > @@ -2340,7 +2359,9 @@ add_lb_hairpin_flows(const struct
> > > > sbrec_load_balancer_table *lb_table,
> > > > > >              ovs_assert(id_pool_alloc_id(pool, &id));
> > > > > >              simap_put(ids, lb->name, id);
> > > > > >          }
> > > > > > -        consider_lb_hairpin_flows(lb, local_datapaths,
flow_table,
> > > > ids);
> > > > > > +        consider_lb_hairpin_flows(lb, local_datapaths,
> > > > > > +
 check_ct_label_for_lb_hairpin,
> > > > > > +                                  flow_table, ids);
> > > > > >      }
> > > > > >  }
> > > > > >
> > > > > > @@ -2446,6 +2467,7 @@ lflow_run(struct lflow_ctx_in *l_ctx_in,
> > struct
> > > > lflow_ctx_out *l_ctx_out)
> > > > > >                         l_ctx_in->mac_binding_table,
> > > > l_ctx_in->local_datapaths,
> > > > > >                         l_ctx_out->flow_table);
> > > > > >      add_lb_hairpin_flows(l_ctx_in->lb_table,
> > l_ctx_in->local_datapaths,
> > > > > > +
l_ctx_in->check_ct_label_for_lb_hairpin,
> > > > > >                           l_ctx_out->flow_table,
> > > > > >                           l_ctx_out->hairpin_lb_ids,
> > > > > >                           l_ctx_out->hairpin_id_pool);
> > > > > > @@ -2581,6 +2603,7 @@ lflow_add_flows_for_datapath(const struct
> > > > sbrec_datapath_binding *dp,
> > > > > >       * associated. */
> > > > > >      for (size_t i = 0; i < n_dp_lbs; i++) {
> > > > > >          consider_lb_hairpin_flows(dp_lbs[i],
> > l_ctx_in->local_datapaths,
> > > > > > +
> > > >  l_ctx_in->check_ct_label_for_lb_hairpin,
> > > > > >                                    l_ctx_out->flow_table,
> > > > > >                                    l_ctx_out->hairpin_lb_ids);
> > > > > >      }
> > > > > > @@ -2694,6 +2717,7 @@ lflow_handle_changed_lbs(struct
lflow_ctx_in
> > > > *l_ctx_in,
> > > > > >          VLOG_DBG("Add load balancer hairpin flows for
"UUID_FMT,
> > > > > >                   UUID_ARGS(&lb->header_.uuid));
> > > > > >          consider_lb_hairpin_flows(lb,
l_ctx_in->local_datapaths,
> > > > > > +
> > > >  l_ctx_in->check_ct_label_for_lb_hairpin,
> > > > > >                                    l_ctx_out->flow_table,
> > > > > >                                    l_ctx_out->hairpin_lb_ids);
> > > > > >      }
> > > > > > diff --git a/controller/lflow.h b/controller/lflow.h
> > > > > > index d61733bc2..3909605f0 100644
> > > > > > --- a/controller/lflow.h
> > > > > > +++ b/controller/lflow.h
> > > > > > @@ -153,6 +153,7 @@ struct lflow_ctx_in {
> > > > > >      const struct sset *active_tunnels;
> > > > > >      const struct sset *related_lport_ids;
> > > > > >      const struct hmap *chassis_tunnels;
> > > > > > +    bool check_ct_label_for_lb_hairpin;
> > > > > >  };
> > > > > >
> > > > > >  struct lflow_ctx_out {
> > > > > > diff --git a/controller/ovn-controller.8.xml
> > > > b/controller/ovn-controller.8.xml
> > > > > > index cc9a7d1c2..a4d1acbaf 100644
> > > > > > --- a/controller/ovn-controller.8.xml
> > > > > > +++ b/controller/ovn-controller.8.xml
> > > > > > @@ -311,6 +311,19 @@
> > > > > >          heplful to pin source outer IP for the tunnel when
multiple
> > > > interfaces
> > > > > >          are used on the host for overlay traffic.
> > > > > >        </dd>
> > > > > > +
> > > >
 <dt><code>external_ids:ovn-check-ct-label-for-lb-hairpin</code></dt>
> > > > > > +      <dd>
> > > > > > +        The check for ct_label has been replaced by ct_mark to
work
> > > > better with
> > > > > > +        hardware offloading with smart NICs. However, to be
> > backward
> > > > compatible
> > > > > > +        during upgrading from an older version that uses
ct_label
> > to
> > > > the
> > > > > > +        current version, some LB hairpin related flows using
> > ct_label
> > > > are kept.
> > > > > > +        This boolean flag can be set to <code>false</code>
after
> > > > > > +        <code>ovn-northd</code> (central component) upgrading
is
> > > > complete, to
> > > > > > +        tell <code>ovn-controller</code> to remove those
ct_label
> > > > flows.
> > > > > > +        Without this option (or set to <code>true</code>), the
> > related
> > > > hairpin
> > > > > > +        flows using ct_label are still programmed, which could
> > still
> > > > lead to
> > > > > > +        undesirable behavior for hardware-offloading on some
NICs.
> > > > > > +      </dd>
> > > > > >      </dl>
> > > > > >
> > > > >
> > > > > Hi Han,
> > > > >
> > > > > Instead of having this configuration to be defined for each
chassis,
> > > > > I'd suggest to do the following
> > > > >    -  ovn-northd can set an option in SB_Global options column to
> > > > > indicate that it support ct_mark flags.
> > > > >         like - ovn-sbctl set SB_Global.options:use_ct_mark=true
 (just
> > > > > as an example)
> > > > >
> > > > >   - ovn-controllers can check for this value and add the
appropriate
> > flow.
> > > > >
> > > > > I think this would be seamless.  I think we do something similar
now,
> > > > > but in the opposite direction.
> > > > > i.e ovn-controllers indicate the support of a feature in the
chassis
> > > > > private column and ovn-northd use this information to set the
logical
> > > > > port up column (I think).
> > > > >
> > > > > Let me know what you think.
> > > > >
> > > > > Numan
> > > >
> > > > Hi Numan,
> > > >
> > > > I agree it may be a burden to update options for each chassis. I
also
> > > > thought about indicating this from ovn-northd, but it requires
changes
> > in
> > > > ovn-controller I-P to take SB_Global as input and handle it properly
> > > > without triggering recompute for unrelated changes. I will try this
path
> > > > for the next version. Do you think it is better to use the
> > > > ovn-internal-version's OVN_INTERNAL_MINOR_VER to handle this
without the
> > > > need of a new option? I didn't see any official document regarding
how
> > > > would this version number be used, but it looks like we can rely on
it
> > for
> > > > such purpose, if the monior_version part  increases only and doesn't
> > > > rewind. Assume the version before upgrading is V1, and at V2 we move
> > > > ct_label to ct_mark. Ovn-controller checks if the northd minor
version
> > >=
> > > > V2 it adds the ct_mark flows only, otherwise, add both ct_mark and
> > ct_label
> > > > flows. What do you think?
> > >
> > > Even I thought about the ovn-internal-version.  It seems fine for me
> > > to use the internal version
> > > for this purpose.  I suppose you need to increase the version in this
> > patch.
> > >
> > > But how would you handle the case when ovn-northd gets updated to the
> > > newer version and it
> > > updates the new internal version to SB_Global.  When that happens,  I
> > > suppose we should trigger force recompute ?
> >
> > A recompute needs to be triggered when internal version in SB_Global
> > changes, which is reasonable since it happens only during upgrade. But
we
> > will need to avoid recompute for any other changes to SB_Global.
> >
> > > I guess the same can be done with the new option too,  since
> > > ovn-controller will come to know both the ovn-internal-version and the
> > > new option (if we chose to
> > > add) via the SB DB update.
> > >
> > Yes, same mechanism for both options, but since this is for the one time
> > upgrade only, I think it is better to avoid the need of a new option
> > setting in the SB_Global forever.
>
> Agree.

Hi Numan, please take a look at V4.
https://patchwork.ozlabs.org/project/ovn/list/?series=292245

Thanks,
Han

>
> Numan
>
> >
> > > I'd leave it to you to choose whichever seems better.
> > >
> > > Numan
> > >
> > >
> > > >
> > > > Thanks,
> > > > Han
> > > > _______________________________________________
> > > > dev mailing list
> > > > [email protected]
> > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > > >
> > _______________________________________________
> > dev mailing list
> > [email protected]
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to