On Wed, Mar 23, 2022 at 1:25 PM Numan Siddique <[email protected]> wrote: > > On Wed, Mar 23, 2022 at 2:39 PM Han Zhou <[email protected]> wrote: > > > > On Wed, Mar 23, 2022 at 11:26 AM Numan Siddique <[email protected]> wrote: > > > > > > On Wed, Mar 23, 2022 at 2:13 PM Han Zhou <[email protected]> wrote: > > > > > > > > On Wed, Mar 23, 2022 at 9:42 AM Numan Siddique <[email protected]> wrote: > > > > > > > > > > On Tue, Mar 22, 2022 at 2:54 PM Han Zhou <[email protected]> wrote: > > > > > > > > > > > > Some NICs support HW offloading for datapath flows, but masked > > access to > > > > > > the 128-bit ct_label field may prevent a flow being offloaded due > > to HW > > > > > > limitations. OVN's use of ct_label currently includes: > > > > > > - ct_label.blocked (1 bit) > > > > > > - ct_label.natted (1 bit) > > > > > > - ct_label.ecmp_reply_port (16 bits) > > > > > > - ct_label.ecmp_reply_eth (48 bits) > > > > > > - ct_label.label (32 bits) > > > > > > > > > > > > This patch moves the bits blocked, natted and ecmp_reply_port to use > > > > > > ct_mark (18 bits in total among the 32-bit ct_mark), and keep the > > rest > > > > > > of the fields in ct_label: > > > > > > - ct_mark.blocked (1 bit) > > > > > > - ct_mark.natted (1 bit) > > > > > > - ct_mark.ecmp_reply_port (16 bits) > > > > > > - ct_label.ecmp_reply_eth (48 bits) > > > > > > - ct_label.label (32 bits) > > > > > > > > > > > > This would allow HW offloading to work for most of the cases. > > > > > > > > > > > > For ct_label.ecmp_reply_eth, the flow matching it still uses masked > > > > > > access, but it doesn't matter because the flow is for new > > connections > > > > > > and requires ct_commit in its actions, so it wouldn't be offloaded > > > > > > anyway for those NICs. There is a flow for established connections > > that > > > > > > would access the masked field in the actions, while in this patch it > > > > > > avoids masked access by using a register xxreg1 to temporarily read > > the > > > > > > whole ct_label, and then use masked access to xxreg1 to read the > > actual > > > > > > value. > > > > > > > > > > > > The only exception is for ct_label.label, there is a flow that > > matches > > > > > > the masked field for ACL logging of reply direction. This patch > > cannot > > > > > > avoid the masked access to ct_label in this case. This flow is > > enabled > > > > > > only for the feature "log-related". So offloading may still not > > work for > > > > > > some NICs when an ACL is configured with a label and with > > "log-related" > > > > > > enabled. > > > > > > > > > > > > There are no other flows relying on masked ct_label match, but it's > > > > > > worth noting that the LB hairpin related flows using ct_label.natted > > > > > > which were hardcoded directly in ovn-controller are still kept to > > avoid > > > > > > traffic breaking during upgrading. User will need to set a new > > option > > > > > > "ovn-check-ct-label-for-lb-hairpin" in OVS config external_ids for > > each > > > > > > ovn-controller to "false" to remove those flows. > > > > > > > > > > > > The change is backward compatible as long as the ovn-controller (on > > > > > > worker nodes) are upgraded before the ovn-northd (on central nodes). > > > > > > > > > > > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1957786 > > > > > > Signed-off-by: Han Zhou <[email protected]> > > > > > > --- > > > > > > NEWS | 6 + > > > > > > controller/lflow.c | 34 ++- > > > > > > controller/lflow.h | 1 + > > > > > > controller/ovn-controller.8.xml | 13 + > > > > > > controller/ovn-controller.c | 13 + > > > > > > include/ovn/logical-fields.h | 3 + > > > > > > lib/logical-fields.c | 17 +- > > > > > > northd/northd.c | 107 ++++--- > > > > > > northd/ovn-northd.8.xml | 48 +-- > > > > > > tests/ovn-northd.at | 524 > > ++++++++++++++++---------------- > > > > > > tests/ovn.at | 174 ++++++----- > > > > > > tests/system-ovn.at | 178 +++++------ > > > > > > 12 files changed, 615 insertions(+), 503 deletions(-) > > > > > > > > > > > > diff --git a/NEWS b/NEWS > > > > > > index 5ba9c3cf4..9e1274c70 100644 > > > > > > --- a/NEWS > > > > > > +++ b/NEWS > > > > > > @@ -2,6 +2,12 @@ Post v22.03.0 > > > > > > ------------- > > > > > > - Support IGMP and MLD snooping on transit logical switches that > > > > connect > > > > > > different OVN Interconnection availability zones. > > > > > > + - Replaced the usage of masked ct_label by ct_mark in most cases > > to > > > > work > > > > > > + better with hardware-offloading. Note that for backward > > > > compatibility > > > > > > + during upgrading some hairpin related flows are still kept by > > > > default, and > > > > > > + a new option "ovn-check-ct-label-for-lb-hairpin" for > > > > ovn-controller needs > > > > > > + to be set to "false" after the central upgrading completion to > > > > remove those > > > > > > + flows. > > > > > > > > > > > > OVN v22.03.0 - XX XXX XXXX > > > > > > -------------------------- > > > > > > diff --git a/controller/lflow.c b/controller/lflow.c > > > > > > index e169edef1..1b40cca0c 100644 > > > > > > --- a/controller/lflow.c > > > > > > +++ b/controller/lflow.c > > > > > > @@ -1894,6 +1894,7 @@ add_lb_vip_hairpin_flows(struct > > ovn_controller_lb > > > > *lb, > > > > > > struct ovn_lb_vip *lb_vip, > > > > > > struct ovn_lb_backend *lb_backend, > > > > > > uint8_t lb_proto, > > > > > > + bool check_ct_label_for_lb_hairpin, > > > > > > struct ovn_desired_flow_table *flow_table) > > > > > > { > > > > > > uint64_t stub[1024 / 8]; > > > > > > @@ -1985,14 +1986,29 @@ add_lb_vip_hairpin_flows(struct > > > > ovn_controller_lb *lb, > > > > > > * - the destination protocol and port must be of a valid > > backend > > > > that > > > > > > * has the same IP as ip.dst. > > > > > > */ > > > > > > - ovs_u128 lb_ct_label = { > > > > > > - .u64.lo = OVN_CT_NATTED, > > > > > > - }; > > > > > > - match_set_ct_label_masked(&hairpin_match, lb_ct_label, > > > > lb_ct_label); > > > > > > + uint32_t lb_ct_mark = OVN_CT_NATTED; > > > > > > + match_set_ct_mark_masked(&hairpin_match, lb_ct_mark, > > lb_ct_mark); > > > > > > > > > > > > ofctrl_add_flow(flow_table, OFTABLE_CHK_LB_HAIRPIN, 100, > > > > > > lb->slb->header_.uuid.parts[0], &hairpin_match, > > > > > > &ofpacts, &lb->slb->header_.uuid); > > > > > > + > > > > > > + /* The below flow is identical to the above except that it > > checks > > > > > > + * ct_label.natted instead of ct_mark.natted, for backward > > > > compatibility > > > > > > + * during the upgrade, and can be disabled afterwards by the > > option > > > > > > + * check_ct_label_for_lb_hairpin. */ > > > > > > + if (check_ct_label_for_lb_hairpin) { > > > > > > + match_set_ct_mark_masked(&hairpin_match, 0, 0); > > > > > > + ovs_u128 lb_ct_label = { > > > > > > + .u64.lo = OVN_CT_NATTED, > > > > > > + }; > > > > > > + match_set_ct_label_masked(&hairpin_match, lb_ct_label, > > > > lb_ct_label); > > > > > > + > > > > > > + ofctrl_add_flow(flow_table, OFTABLE_CHK_LB_HAIRPIN, 100, > > > > > > + lb->slb->header_.uuid.parts[0], > > &hairpin_match, > > > > > > + &ofpacts, &lb->slb->header_.uuid); > > > > > > + } > > > > > > + > > > > > > ofpbuf_uninit(&ofpacts); > > > > > > } > > > > > > > > > > > > @@ -2265,6 +2281,7 @@ add_lb_ct_snat_hairpin_flows(struct > > > > ovn_controller_lb *lb, > > > > > > static void > > > > > > consider_lb_hairpin_flows(const struct sbrec_load_balancer > > *sbrec_lb, > > > > > > const struct hmap *local_datapaths, > > > > > > + bool check_ct_label_for_lb_hairpin, > > > > > > struct ovn_desired_flow_table > > *flow_table, > > > > > > struct simap *ids) > > > > > > { > > > > > > @@ -2304,6 +2321,7 @@ consider_lb_hairpin_flows(const struct > > > > sbrec_load_balancer *sbrec_lb, > > > > > > struct ovn_lb_backend *lb_backend = > > &lb_vip->backends[j]; > > > > > > > > > > > > add_lb_vip_hairpin_flows(lb, lb_vip, lb_backend, > > lb_proto, > > > > > > + check_ct_label_for_lb_hairpin, > > > > > > flow_table); > > > > > > } > > > > > > } > > > > > > @@ -2318,6 +2336,7 @@ consider_lb_hairpin_flows(const struct > > > > sbrec_load_balancer *sbrec_lb, > > > > > > static void > > > > > > add_lb_hairpin_flows(const struct sbrec_load_balancer_table > > *lb_table, > > > > > > const struct hmap *local_datapaths, > > > > > > + bool check_ct_label_for_lb_hairpin, > > > > > > struct ovn_desired_flow_table *flow_table, > > > > > > struct simap *ids, > > > > > > struct id_pool *pool) > > > > > > @@ -2340,7 +2359,9 @@ add_lb_hairpin_flows(const struct > > > > sbrec_load_balancer_table *lb_table, > > > > > > ovs_assert(id_pool_alloc_id(pool, &id)); > > > > > > simap_put(ids, lb->name, id); > > > > > > } > > > > > > - consider_lb_hairpin_flows(lb, local_datapaths, flow_table, > > > > ids); > > > > > > + consider_lb_hairpin_flows(lb, local_datapaths, > > > > > > + check_ct_label_for_lb_hairpin, > > > > > > + flow_table, ids); > > > > > > } > > > > > > } > > > > > > > > > > > > @@ -2446,6 +2467,7 @@ lflow_run(struct lflow_ctx_in *l_ctx_in, > > struct > > > > lflow_ctx_out *l_ctx_out) > > > > > > l_ctx_in->mac_binding_table, > > > > l_ctx_in->local_datapaths, > > > > > > l_ctx_out->flow_table); > > > > > > add_lb_hairpin_flows(l_ctx_in->lb_table, > > l_ctx_in->local_datapaths, > > > > > > + l_ctx_in->check_ct_label_for_lb_hairpin, > > > > > > l_ctx_out->flow_table, > > > > > > l_ctx_out->hairpin_lb_ids, > > > > > > l_ctx_out->hairpin_id_pool); > > > > > > @@ -2581,6 +2603,7 @@ lflow_add_flows_for_datapath(const struct > > > > sbrec_datapath_binding *dp, > > > > > > * associated. */ > > > > > > for (size_t i = 0; i < n_dp_lbs; i++) { > > > > > > consider_lb_hairpin_flows(dp_lbs[i], > > l_ctx_in->local_datapaths, > > > > > > + > > > > l_ctx_in->check_ct_label_for_lb_hairpin, > > > > > > l_ctx_out->flow_table, > > > > > > l_ctx_out->hairpin_lb_ids); > > > > > > } > > > > > > @@ -2694,6 +2717,7 @@ lflow_handle_changed_lbs(struct lflow_ctx_in > > > > *l_ctx_in, > > > > > > VLOG_DBG("Add load balancer hairpin flows for "UUID_FMT, > > > > > > UUID_ARGS(&lb->header_.uuid)); > > > > > > consider_lb_hairpin_flows(lb, l_ctx_in->local_datapaths, > > > > > > + > > > > l_ctx_in->check_ct_label_for_lb_hairpin, > > > > > > l_ctx_out->flow_table, > > > > > > l_ctx_out->hairpin_lb_ids); > > > > > > } > > > > > > diff --git a/controller/lflow.h b/controller/lflow.h > > > > > > index d61733bc2..3909605f0 100644 > > > > > > --- a/controller/lflow.h > > > > > > +++ b/controller/lflow.h > > > > > > @@ -153,6 +153,7 @@ struct lflow_ctx_in { > > > > > > const struct sset *active_tunnels; > > > > > > const struct sset *related_lport_ids; > > > > > > const struct hmap *chassis_tunnels; > > > > > > + bool check_ct_label_for_lb_hairpin; > > > > > > }; > > > > > > > > > > > > struct lflow_ctx_out { > > > > > > diff --git a/controller/ovn-controller.8.xml > > > > b/controller/ovn-controller.8.xml > > > > > > index cc9a7d1c2..a4d1acbaf 100644 > > > > > > --- a/controller/ovn-controller.8.xml > > > > > > +++ b/controller/ovn-controller.8.xml > > > > > > @@ -311,6 +311,19 @@ > > > > > > heplful to pin source outer IP for the tunnel when multiple > > > > interfaces > > > > > > are used on the host for overlay traffic. > > > > > > </dd> > > > > > > + > > > > <dt><code>external_ids:ovn-check-ct-label-for-lb-hairpin</code></dt> > > > > > > + <dd> > > > > > > + The check for ct_label has been replaced by ct_mark to work > > > > better with > > > > > > + hardware offloading with smart NICs. However, to be > > backward > > > > compatible > > > > > > + during upgrading from an older version that uses ct_label > > to > > > > the > > > > > > + current version, some LB hairpin related flows using > > ct_label > > > > are kept. > > > > > > + This boolean flag can be set to <code>false</code> after > > > > > > + <code>ovn-northd</code> (central component) upgrading is > > > > complete, to > > > > > > + tell <code>ovn-controller</code> to remove those ct_label > > > > flows. > > > > > > + Without this option (or set to <code>true</code>), the > > related > > > > hairpin > > > > > > + flows using ct_label are still programmed, which could > > still > > > > lead to > > > > > > + undesirable behavior for hardware-offloading on some NICs. > > > > > > + </dd> > > > > > > </dl> > > > > > > > > > > > > > > > > Hi Han, > > > > > > > > > > Instead of having this configuration to be defined for each chassis, > > > > > I'd suggest to do the following > > > > > - ovn-northd can set an option in SB_Global options column to > > > > > indicate that it support ct_mark flags. > > > > > like - ovn-sbctl set SB_Global.options:use_ct_mark=true (just > > > > > as an example) > > > > > > > > > > - ovn-controllers can check for this value and add the appropriate > > flow. > > > > > > > > > > I think this would be seamless. I think we do something similar now, > > > > > but in the opposite direction. > > > > > i.e ovn-controllers indicate the support of a feature in the chassis > > > > > private column and ovn-northd use this information to set the logical > > > > > port up column (I think). > > > > > > > > > > Let me know what you think. > > > > > > > > > > Numan > > > > > > > > Hi Numan, > > > > > > > > I agree it may be a burden to update options for each chassis. I also > > > > thought about indicating this from ovn-northd, but it requires changes > > in > > > > ovn-controller I-P to take SB_Global as input and handle it properly > > > > without triggering recompute for unrelated changes. I will try this path > > > > for the next version. Do you think it is better to use the > > > > ovn-internal-version's OVN_INTERNAL_MINOR_VER to handle this without the > > > > need of a new option? I didn't see any official document regarding how > > > > would this version number be used, but it looks like we can rely on it > > for > > > > such purpose, if the monior_version part increases only and doesn't > > > > rewind. Assume the version before upgrading is V1, and at V2 we move > > > > ct_label to ct_mark. Ovn-controller checks if the northd minor version > > >= > > > > V2 it adds the ct_mark flows only, otherwise, add both ct_mark and > > ct_label > > > > flows. What do you think? > > > > > > Even I thought about the ovn-internal-version. It seems fine for me > > > to use the internal version > > > for this purpose. I suppose you need to increase the version in this > > patch. > > > > > > But how would you handle the case when ovn-northd gets updated to the > > > newer version and it > > > updates the new internal version to SB_Global. When that happens, I > > > suppose we should trigger force recompute ? > > > > A recompute needs to be triggered when internal version in SB_Global > > changes, which is reasonable since it happens only during upgrade. But we > > will need to avoid recompute for any other changes to SB_Global. > > > > > I guess the same can be done with the new option too, since > > > ovn-controller will come to know both the ovn-internal-version and the > > > new option (if we chose to > > > add) via the SB DB update. > > > > > Yes, same mechanism for both options, but since this is for the one time > > upgrade only, I think it is better to avoid the need of a new option > > setting in the SB_Global forever. > > Agree.
Hi Numan, please take a look at V4. https://patchwork.ozlabs.org/project/ovn/list/?series=292245 Thanks, Han > > Numan > > > > > > I'd leave it to you to choose whichever seems better. > > > > > > Numan > > > > > > > > > > > > > > Thanks, > > > > Han > > > > _______________________________________________ > > > > dev mailing list > > > > [email protected] > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > > > > _______________________________________________ > > dev mailing list > > [email protected] > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
