On Wed, Mar 23, 2022 at 9:42 AM Numan Siddique <[email protected]> wrote: > > On Tue, Mar 22, 2022 at 2:54 PM Han Zhou <[email protected]> wrote: > > > > Some NICs support HW offloading for datapath flows, but masked access to > > the 128-bit ct_label field may prevent a flow being offloaded due to HW > > limitations. OVN's use of ct_label currently includes: > > - ct_label.blocked (1 bit) > > - ct_label.natted (1 bit) > > - ct_label.ecmp_reply_port (16 bits) > > - ct_label.ecmp_reply_eth (48 bits) > > - ct_label.label (32 bits) > > > > This patch moves the bits blocked, natted and ecmp_reply_port to use > > ct_mark (18 bits in total among the 32-bit ct_mark), and keep the rest > > of the fields in ct_label: > > - ct_mark.blocked (1 bit) > > - ct_mark.natted (1 bit) > > - ct_mark.ecmp_reply_port (16 bits) > > - ct_label.ecmp_reply_eth (48 bits) > > - ct_label.label (32 bits) > > > > This would allow HW offloading to work for most of the cases. > > > > For ct_label.ecmp_reply_eth, the flow matching it still uses masked > > access, but it doesn't matter because the flow is for new connections > > and requires ct_commit in its actions, so it wouldn't be offloaded > > anyway for those NICs. There is a flow for established connections that > > would access the masked field in the actions, while in this patch it > > avoids masked access by using a register xxreg1 to temporarily read the > > whole ct_label, and then use masked access to xxreg1 to read the actual > > value. > > > > The only exception is for ct_label.label, there is a flow that matches > > the masked field for ACL logging of reply direction. This patch cannot > > avoid the masked access to ct_label in this case. This flow is enabled > > only for the feature "log-related". So offloading may still not work for > > some NICs when an ACL is configured with a label and with "log-related" > > enabled. > > > > There are no other flows relying on masked ct_label match, but it's > > worth noting that the LB hairpin related flows using ct_label.natted > > which were hardcoded directly in ovn-controller are still kept to avoid > > traffic breaking during upgrading. User will need to set a new option > > "ovn-check-ct-label-for-lb-hairpin" in OVS config external_ids for each > > ovn-controller to "false" to remove those flows. > > > > The change is backward compatible as long as the ovn-controller (on > > worker nodes) are upgraded before the ovn-northd (on central nodes). > > > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1957786 > > Signed-off-by: Han Zhou <[email protected]> > > --- > > NEWS | 6 + > > controller/lflow.c | 34 ++- > > controller/lflow.h | 1 + > > controller/ovn-controller.8.xml | 13 + > > controller/ovn-controller.c | 13 + > > include/ovn/logical-fields.h | 3 + > > lib/logical-fields.c | 17 +- > > northd/northd.c | 107 ++++--- > > northd/ovn-northd.8.xml | 48 +-- > > tests/ovn-northd.at | 524 ++++++++++++++++---------------- > > tests/ovn.at | 174 ++++++----- > > tests/system-ovn.at | 178 +++++------ > > 12 files changed, 615 insertions(+), 503 deletions(-) > > > > diff --git a/NEWS b/NEWS > > index 5ba9c3cf4..9e1274c70 100644 > > --- a/NEWS > > +++ b/NEWS > > @@ -2,6 +2,12 @@ Post v22.03.0 > > ------------- > > - Support IGMP and MLD snooping on transit logical switches that connect > > different OVN Interconnection availability zones. > > + - Replaced the usage of masked ct_label by ct_mark in most cases to work > > + better with hardware-offloading. Note that for backward compatibility > > + during upgrading some hairpin related flows are still kept by default, and > > + a new option "ovn-check-ct-label-for-lb-hairpin" for ovn-controller needs > > + to be set to "false" after the central upgrading completion to remove those > > + flows. > > > > OVN v22.03.0 - XX XXX XXXX > > -------------------------- > > diff --git a/controller/lflow.c b/controller/lflow.c > > index e169edef1..1b40cca0c 100644 > > --- a/controller/lflow.c > > +++ b/controller/lflow.c > > @@ -1894,6 +1894,7 @@ add_lb_vip_hairpin_flows(struct ovn_controller_lb *lb, > > struct ovn_lb_vip *lb_vip, > > struct ovn_lb_backend *lb_backend, > > uint8_t lb_proto, > > + bool check_ct_label_for_lb_hairpin, > > struct ovn_desired_flow_table *flow_table) > > { > > uint64_t stub[1024 / 8]; > > @@ -1985,14 +1986,29 @@ add_lb_vip_hairpin_flows(struct ovn_controller_lb *lb, > > * - the destination protocol and port must be of a valid backend that > > * has the same IP as ip.dst. > > */ > > - ovs_u128 lb_ct_label = { > > - .u64.lo = OVN_CT_NATTED, > > - }; > > - match_set_ct_label_masked(&hairpin_match, lb_ct_label, lb_ct_label); > > + uint32_t lb_ct_mark = OVN_CT_NATTED; > > + match_set_ct_mark_masked(&hairpin_match, lb_ct_mark, lb_ct_mark); > > > > ofctrl_add_flow(flow_table, OFTABLE_CHK_LB_HAIRPIN, 100, > > lb->slb->header_.uuid.parts[0], &hairpin_match, > > &ofpacts, &lb->slb->header_.uuid); > > + > > + /* The below flow is identical to the above except that it checks > > + * ct_label.natted instead of ct_mark.natted, for backward compatibility > > + * during the upgrade, and can be disabled afterwards by the option > > + * check_ct_label_for_lb_hairpin. */ > > + if (check_ct_label_for_lb_hairpin) { > > + match_set_ct_mark_masked(&hairpin_match, 0, 0); > > + ovs_u128 lb_ct_label = { > > + .u64.lo = OVN_CT_NATTED, > > + }; > > + match_set_ct_label_masked(&hairpin_match, lb_ct_label, lb_ct_label); > > + > > + ofctrl_add_flow(flow_table, OFTABLE_CHK_LB_HAIRPIN, 100, > > + lb->slb->header_.uuid.parts[0], &hairpin_match, > > + &ofpacts, &lb->slb->header_.uuid); > > + } > > + > > ofpbuf_uninit(&ofpacts); > > } > > > > @@ -2265,6 +2281,7 @@ add_lb_ct_snat_hairpin_flows(struct ovn_controller_lb *lb, > > static void > > consider_lb_hairpin_flows(const struct sbrec_load_balancer *sbrec_lb, > > const struct hmap *local_datapaths, > > + bool check_ct_label_for_lb_hairpin, > > struct ovn_desired_flow_table *flow_table, > > struct simap *ids) > > { > > @@ -2304,6 +2321,7 @@ consider_lb_hairpin_flows(const struct sbrec_load_balancer *sbrec_lb, > > struct ovn_lb_backend *lb_backend = &lb_vip->backends[j]; > > > > add_lb_vip_hairpin_flows(lb, lb_vip, lb_backend, lb_proto, > > + check_ct_label_for_lb_hairpin, > > flow_table); > > } > > } > > @@ -2318,6 +2336,7 @@ consider_lb_hairpin_flows(const struct sbrec_load_balancer *sbrec_lb, > > static void > > add_lb_hairpin_flows(const struct sbrec_load_balancer_table *lb_table, > > const struct hmap *local_datapaths, > > + bool check_ct_label_for_lb_hairpin, > > struct ovn_desired_flow_table *flow_table, > > struct simap *ids, > > struct id_pool *pool) > > @@ -2340,7 +2359,9 @@ add_lb_hairpin_flows(const struct sbrec_load_balancer_table *lb_table, > > ovs_assert(id_pool_alloc_id(pool, &id)); > > simap_put(ids, lb->name, id); > > } > > - consider_lb_hairpin_flows(lb, local_datapaths, flow_table, ids); > > + consider_lb_hairpin_flows(lb, local_datapaths, > > + check_ct_label_for_lb_hairpin, > > + flow_table, ids); > > } > > } > > > > @@ -2446,6 +2467,7 @@ lflow_run(struct lflow_ctx_in *l_ctx_in, struct lflow_ctx_out *l_ctx_out) > > l_ctx_in->mac_binding_table, l_ctx_in->local_datapaths, > > l_ctx_out->flow_table); > > add_lb_hairpin_flows(l_ctx_in->lb_table, l_ctx_in->local_datapaths, > > + l_ctx_in->check_ct_label_for_lb_hairpin, > > l_ctx_out->flow_table, > > l_ctx_out->hairpin_lb_ids, > > l_ctx_out->hairpin_id_pool); > > @@ -2581,6 +2603,7 @@ lflow_add_flows_for_datapath(const struct sbrec_datapath_binding *dp, > > * associated. */ > > for (size_t i = 0; i < n_dp_lbs; i++) { > > consider_lb_hairpin_flows(dp_lbs[i], l_ctx_in->local_datapaths, > > + l_ctx_in->check_ct_label_for_lb_hairpin, > > l_ctx_out->flow_table, > > l_ctx_out->hairpin_lb_ids); > > } > > @@ -2694,6 +2717,7 @@ lflow_handle_changed_lbs(struct lflow_ctx_in *l_ctx_in, > > VLOG_DBG("Add load balancer hairpin flows for "UUID_FMT, > > UUID_ARGS(&lb->header_.uuid)); > > consider_lb_hairpin_flows(lb, l_ctx_in->local_datapaths, > > + l_ctx_in->check_ct_label_for_lb_hairpin, > > l_ctx_out->flow_table, > > l_ctx_out->hairpin_lb_ids); > > } > > diff --git a/controller/lflow.h b/controller/lflow.h > > index d61733bc2..3909605f0 100644 > > --- a/controller/lflow.h > > +++ b/controller/lflow.h > > @@ -153,6 +153,7 @@ struct lflow_ctx_in { > > const struct sset *active_tunnels; > > const struct sset *related_lport_ids; > > const struct hmap *chassis_tunnels; > > + bool check_ct_label_for_lb_hairpin; > > }; > > > > struct lflow_ctx_out { > > diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml > > index cc9a7d1c2..a4d1acbaf 100644 > > --- a/controller/ovn-controller.8.xml > > +++ b/controller/ovn-controller.8.xml > > @@ -311,6 +311,19 @@ > > heplful to pin source outer IP for the tunnel when multiple interfaces > > are used on the host for overlay traffic. > > </dd> > > + <dt><code>external_ids:ovn-check-ct-label-for-lb-hairpin</code></dt> > > + <dd> > > + The check for ct_label has been replaced by ct_mark to work better with > > + hardware offloading with smart NICs. However, to be backward compatible > > + during upgrading from an older version that uses ct_label to the > > + current version, some LB hairpin related flows using ct_label are kept. > > + This boolean flag can be set to <code>false</code> after > > + <code>ovn-northd</code> (central component) upgrading is complete, to > > + tell <code>ovn-controller</code> to remove those ct_label flows. > > + Without this option (or set to <code>true</code>), the related hairpin > > + flows using ct_label are still programmed, which could still lead to > > + undesirable behavior for hardware-offloading on some NICs. > > + </dd> > > </dl> > > > > Hi Han, > > Instead of having this configuration to be defined for each chassis, > I'd suggest to do the following > - ovn-northd can set an option in SB_Global options column to > indicate that it support ct_mark flags. > like - ovn-sbctl set SB_Global.options:use_ct_mark=true (just > as an example) > > - ovn-controllers can check for this value and add the appropriate flow. > > I think this would be seamless. I think we do something similar now, > but in the opposite direction. > i.e ovn-controllers indicate the support of a feature in the chassis > private column and ovn-northd use this information to set the logical > port up column (I think). > > Let me know what you think. > > Numan
Hi Numan, I agree it may be a burden to update options for each chassis. I also thought about indicating this from ovn-northd, but it requires changes in ovn-controller I-P to take SB_Global as input and handle it properly without triggering recompute for unrelated changes. I will try this path for the next version. Do you think it is better to use the ovn-internal-version's OVN_INTERNAL_MINOR_VER to handle this without the need of a new option? I didn't see any official document regarding how would this version number be used, but it looks like we can rely on it for such purpose, if the monior_version part increases only and doesn't rewind. Assume the version before upgrading is V1, and at V2 we move ct_label to ct_mark. Ovn-controller checks if the northd minor version >= V2 it adds the ct_mark flows only, otherwise, add both ct_mark and ct_label flows. What do you think? Thanks, Han _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
