On Mon, May 9, 2022 at 5:36 PM Xavier Simonart <[email protected]> wrote: > > Hi Mark, Han > > Thanks for looking into this. > Han's patch is addressing the issues I initially tried fixing with https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/ > https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/ is definitely not a good solution (incurs some delay and is more a workaround than a fix). > > I have been trying to follow another approach, which would fix a slightly broader issue. > The goal is to try to avoid recomputes when updating Port_Binding in SBDB. > In addition to the case Han (greatly) described, we also hit some recomputes when the same controller binds multiple ports quite quickly consecutively. > > So I have been trying to extend the approach used in if-status (i.e. not write to SBDB if SBDB is read-only, and have the if-status state machine running through the different interface states, update SBDB when possible, and change the states). > I have a patch (almost) ready but had some issues with some test cases, and was still investigating whether those issues were due to my patch.... > I hope to post it tomorrow. > We can then discuss whether we need both approaches.
Thanks Xavier. I am interested in the solution you are working on. I also want to add that my patch here is to provide a generic solution that filters out the changes we don't care about and quits the I-P engine processing early. It can be extended easily in the future if we have new changes to ignore. In many cases this has been achieved by omitting columns in OVSDB IDL monitoring, but the case here is that we care about part of a column that is with k-v pairs and OVSDB IDL can monitor/omit the whole column. So I feel we may need this regardless of the new solution that addresses the consecutively multiple ports binding problem. Anyway, we can wait and evaluate when your new patch is posted. Thanks, Han > > Thanks > Xavier > > > On Mon, May 9, 2022 at 9:32 PM Mark Michelson <[email protected]> wrote: >> >> Based on the explanation and the code, this seems to solve the problem >> of the spurious recomputes. I'd like confirmation from Xavier that this >> is fixing the issue he was addressing in >> https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/ >> , and I'd also like confirmation that this approach doesn't incur the >> delays that Dumitru mentioned on Xavier's patch. >> >> With those confirmations, I'll add my ack to this patch. >> >> On 5/3/22 15:37, Han Zhou wrote: >> > During the process of claiming a VIF port on a chassis, there are >> > multiple SB and local OVSDB updates by ovn-controller, which may trigger >> > ovn-controller recompute, depending on the timing between the completion >> > of SB and local OVSDB updates: >> > >> > After port claiming, ovn-controller sends an update to the local OVSDB >> > to clean the Interface:external_ids:ovn-installed, and also sends an >> > update to the SB DB to update the Port_Binding:chassis. If the local >> > OVSDB update notification comes back before the SB update completes, >> > ovn-controller's SB IDL is still "in transaction" and read-only, but the >> > local Interface update triggers the I-P engine runtime_data node to >> > handle the OVS_interface change, which considers the VIF to be claimed >> > (again) potentially, which requires SB updates while it is not possible >> > at the moment, and so the I-P handler returns failure and the engine >> > aborts, which triggers recompute at the next I-P engine iteration when >> > SB DB is writable. >> > >> > If we are lucky at the above step that the SB update completes before >> > the local OVSDB update is received, ovn-controller would handle the >> > change incrementally (and figures out that the SB Port_Binding chassis >> > is already the current one thus wouldn't reclaim the VIF). But the next >> > step after it completes flow computing for the newly bound VIF it sends >> > another update to the local OVSDB to set the >> > Interface:external_ids:ovn-installed and ovn-installed-ts, while at the >> > same time it also sends an update to the SB DB to set the >> > Port_Binding:up. These two parallel updates would lead to the same >> > situation as the above step and potentially cause ovn-controll >> > recompute. >> > >> > The recompute is very likely to be triggered in production because the >> > SB DB is remote (on central nodes) and has heavier load, which requires >> > longer time to complete the updates than the local OVSDB server. >> > >> > The problem is that the runtime_data's OVS_interface change handler >> > couldn't handle the update of the Interface:external_ids:ovn-installed >> > incrementally, because ovn-controller assumes it could require an SB >> > update that is not possible at the moment. However, the assumption is >> > not true. In fact it shouldn't even care about the change because this >> > external_id:ovn-installed is written by ovn-controller itself and it is >> > write-only (no need to read it as an input for any further processing). >> > >> > So, the idea to fix the problem is to ignore such changes that happen to >> > the "write-only" fields only. OVSDB change tracking provides APIs to >> > check if a specific column is updated, but the difficulty here is that >> > the column external_ids is a k-v map which contains not only the data >> > that ovn-controller can ignore, such as ovn-installed, ovn-installed-ts, >> > but also the data ovn-controller cares about, such as iface-id, >> > iface-id-ver and encap-ip. To solve this problem, we need to know what's >> > changed exactly inside the external_ids map. >> > >> > This patch solves it by introducing a new I-P engine node >> > ovs_interface_shadow to wrap the OVSDB input ovs_interface with an extra >> > copy of the older version of the column external_ids, and replace the >> > input OVS_interface of the runtime_data node. In the change handler of the >> > runtime_data node we evaluates if the change needs to be handled before >> > calling consider_iface_claim(), so in the above scenarios the changes >> > don't need to be handled and won't trigger recompute any more. A test >> > case is also updated to capture this (which is very likely, if not 100%, >> > to fail without the fix). >> > >> > Signed-off-by: Han Zhou <[email protected]> >> > --- >> > v2: Add comments that was missing while committing v1. >> > >> > controller/binding.c | 70 ++++++++++++++++++++++++ >> > controller/binding.h | 1 + >> > controller/ovn-controller.c | 105 +++++++++++++++++++++++++++++++++--- >> > tests/ovn-performance.at | 17 ++++++ >> > 4 files changed, 186 insertions(+), 7 deletions(-) >> > >> > diff --git a/controller/binding.c b/controller/binding.c >> > index 7281b0485..0563d9d1e 100644 >> > --- a/controller/binding.c >> > +++ b/controller/binding.c >> > @@ -1907,6 +1907,71 @@ is_iface_in_int_bridge(const struct ovsrec_interface *iface, >> > return false; >> > } >> > >> > +static bool >> > +is_ext_id_changed(const struct smap *a, const struct smap *b, const char *key) >> > +{ >> > + const char *value_a = smap_get(a, key); >> > + const char *value_b = smap_get(b, key); >> > + if ((value_a && !value_b) >> > + || (!value_a && value_b) >> > + || (value_a && value_b && strcmp(value_a, value_b))) { >> > + return true; >> > + } >> > + return false; >> > +} >> > + >> > +/* Check if the change in 'iface_rec' is something we are interested in from >> > + * port binding perspective. Return true if the change needs to be handled, >> > + * otherwise return false. >> > + * >> > + * The 'iface_rec' must be change tracked, i.e. iterator from >> > + * OVSREC_INTERFACE_TABLE_FOR_EACH_TRACKED. */ >> > +static bool >> > +ovs_interface_change_need_handle(const struct ovsrec_interface *iface_rec, >> > + struct shash *iface_table_external_ids_old) >> > +{ >> > + if (ovsrec_interface_is_updated(iface_rec, >> > + OVSREC_INTERFACE_COL_NAME)) { >> > + return true; >> > + } >> > + if (ovsrec_interface_is_updated(iface_rec, >> > + OVSREC_INTERFACE_COL_OFPORT)) { >> > + return true; >> > + } >> > + if (ovsrec_interface_is_updated(iface_rec, >> > + OVSREC_INTERFACE_COL_TYPE)) { >> > + return true; >> > + } >> > + if (ovsrec_interface_is_updated(iface_rec, >> > + OVSREC_INTERFACE_COL_EXTERNAL_IDS)) { >> > + /* Compare the external_ids that we are interested in with the old >> > + * values: >> > + * - iface-id >> > + * - iface-id-ver >> > + * - encap-ip >> > + * For any other changes, such as ovn-installed, ovn-installed-ts, etc, >> > + * we don't need to handle. */ >> > + struct smap *external_ids_old = >> > + shash_find_data(iface_table_external_ids_old, iface_rec->name); >> > + if (!external_ids_old) { >> > + return true; >> > + } >> > + if (is_ext_id_changed(&iface_rec->external_ids, external_ids_old, >> > + "iface-id")) { >> > + return true; >> > + } >> > + if (is_ext_id_changed(&iface_rec->external_ids, external_ids_old, >> > + "iface-id-ver")) { >> > + return true; >> > + } >> > + if (is_ext_id_changed(&iface_rec->external_ids, external_ids_old, >> > + "encap-ip")) { >> > + return true; >> > + } >> > + } >> > + return false; >> > +} >> > + >> > /* Returns true if the ovs interface changes were handled successfully, >> > * false otherwise. >> > */ >> > @@ -2018,6 +2083,11 @@ binding_handle_ovs_interface_changes(struct binding_ctx_in *b_ctx_in, >> > continue; >> > } >> > >> > + if (!ovs_interface_change_need_handle( >> > + iface_rec, b_ctx_in->iface_table_external_ids_old)) { >> > + continue; >> > + } >> > + >> > const char *iface_id = smap_get(&iface_rec->external_ids, "iface-id"); >> > int64_t ofport = iface_rec->n_ofport ? *iface_rec->ofport : 0; >> > if (iface_id && ofport > 0 && >> > diff --git a/controller/binding.h b/controller/binding.h >> > index 430a8d9b1..e49e1ebb6 100644 >> > --- a/controller/binding.h >> > +++ b/controller/binding.h >> > @@ -55,6 +55,7 @@ struct binding_ctx_in { >> > const struct ovsrec_bridge_table *bridge_table; >> > const struct ovsrec_open_vswitch_table *ovs_table; >> > const struct ovsrec_interface_table *iface_table; >> > + struct shash *iface_table_external_ids_old; >> > }; >> > >> > /* Locally relevant port bindings, e.g., VIFs that might be bound locally, >> > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c >> > index 22b7fa935..649353f7d 100644 >> > --- a/controller/ovn-controller.c >> > +++ b/controller/ovn-controller.c >> > @@ -1042,6 +1042,91 @@ en_ofctrl_is_connected_run(struct engine_node *node, void *data) >> > engine_set_node_state(node, EN_UNCHANGED); >> > } >> > >> > +/* This engine node is to wrap the OVS_interface input and maintain a copy of >> > + * the old version of data for the column external_ids. >> > + * >> > + * There are some special considerations of this engine node: >> > + * 1. It has a single input OVS_interface, and it transparently passes the >> > + * input changes as its own output data to its dependants. So there is no >> > + * processing to OVS_interface changes but simply mark the node status as >> > + * UPDATED (and so the run() and the change handler is the same). >> > + * 2. The iface_table_external_ids_old is computed/updated in the member >> > + * clear_tracked_data(), because that is when the last round of processing >> > + * has completed but the new IDL data is yet to refresh, so we replace the >> > + * old data with the current data. */ >> > +struct ed_type_ovs_interface_shadow { >> > + struct ovsrec_interface_table *iface_table; >> > + struct shash iface_table_external_ids_old; >> > +}; >> > + >> > +static void * >> > +en_ovs_interface_shadow_init(struct engine_node *node OVS_UNUSED, >> > + struct engine_arg *arg OVS_UNUSED) >> > +{ >> > + struct ed_type_ovs_interface_shadow *data = xzalloc(sizeof *data); >> > + data->iface_table = NULL; >> > + shash_init(&data->iface_table_external_ids_old); >> > + >> > + return data; >> > +} >> > + >> > +static void >> > +iface_table_external_ids_old_destroy(struct shash *table_ext_ids) >> > +{ >> > + struct shash_node *node; >> > + SHASH_FOR_EACH_SAFE (node, table_ext_ids) { >> > + struct smap *ext_ids = node->data; >> > + smap_destroy(ext_ids); >> > + } >> > + shash_destroy_free_data(table_ext_ids); >> > +} >> > + >> > +static void >> > +en_ovs_interface_shadow_cleanup(void *data_) >> > +{ >> > + struct ed_type_ovs_interface_shadow *data = data_; >> > + iface_table_external_ids_old_destroy(&data->iface_table_external_ids_old); >> > +} >> > + >> > +static void >> > +en_ovs_interface_shadow_clear_tracked_data(void *data_) >> > +{ >> > + struct ed_type_ovs_interface_shadow *data = data_; >> > + iface_table_external_ids_old_destroy(&data->iface_table_external_ids_old); >> > + shash_init(&data->iface_table_external_ids_old); >> > + >> > + if (!data->iface_table) { >> > + return; >> > + } >> > + >> > + const struct ovsrec_interface *iface_rec; >> > + OVSREC_INTERFACE_TABLE_FOR_EACH (iface_rec, data->iface_table) { >> > + struct smap *external_ids = xmalloc(sizeof *external_ids); >> > + smap_clone(external_ids, &iface_rec->external_ids); >> > + shash_add(&data->iface_table_external_ids_old, iface_rec->name, >> > + external_ids); >> > + } >> > +} >> > + >> > +static void >> > +en_ovs_interface_shadow_run(struct engine_node *node, void *data_) >> > +{ >> > + struct ed_type_ovs_interface_shadow *data = data_; >> > + struct ovsrec_interface_table *iface_table = >> > + (struct ovsrec_interface_table *)EN_OVSDB_GET( >> > + engine_get_input("OVS_interface", node)); >> > + data->iface_table = iface_table; >> > + engine_set_node_state(node, EN_UPDATED); >> > +} >> > + >> > +static bool >> > +ovs_interface_shadow_ovs_interface_handler(struct engine_node *node, >> > + void *data_) >> > +{ >> > + en_ovs_interface_shadow_run(node, data_); >> > + return true; >> > +} >> > + >> > struct ed_type_runtime_data { >> > /* Contains "struct local_datapath" nodes. */ >> > struct hmap local_datapaths; >> > @@ -1214,9 +1299,8 @@ init_binding_ctx(struct engine_node *node, >> > (struct ovsrec_port_table *)EN_OVSDB_GET( >> > engine_get_input("OVS_port", node)); >> > >> > - struct ovsrec_interface_table *iface_table = >> > - (struct ovsrec_interface_table *)EN_OVSDB_GET( >> > - engine_get_input("OVS_interface", node)); >> > + struct ed_type_ovs_interface_shadow *iface_shadow = >> > + engine_get_input_data("ovs_interface_shadow", node); >> > >> > struct ovsrec_qos_table *qos_table = >> > (struct ovsrec_qos_table *)EN_OVSDB_GET( >> > @@ -1249,7 +1333,9 @@ init_binding_ctx(struct engine_node *node, >> > b_ctx_in->sbrec_port_binding_by_datapath = sbrec_port_binding_by_datapath; >> > b_ctx_in->sbrec_port_binding_by_name = sbrec_port_binding_by_name; >> > b_ctx_in->port_table = port_table; >> > - b_ctx_in->iface_table = iface_table; >> > + b_ctx_in->iface_table = iface_shadow->iface_table; >> > + b_ctx_in->iface_table_external_ids_old = >> > + &iface_shadow->iface_table_external_ids_old; >> > b_ctx_in->qos_table = qos_table; >> > b_ctx_in->port_binding_table = pb_table; >> > b_ctx_in->br_int = br_int; >> > @@ -1331,7 +1417,7 @@ en_runtime_data_run(struct engine_node *node, void *data) >> > } >> > >> > static bool >> > -runtime_data_ovs_interface_handler(struct engine_node *node, void *data) >> > +runtime_data_ovs_interface_shadow_handler(struct engine_node *node, void *data) >> > { >> > struct ed_type_runtime_data *rt_data = data; >> > struct binding_ctx_in b_ctx_in; >> > @@ -3338,6 +3424,8 @@ main(int argc, char *argv[]) >> > >> > /* Define inc-proc-engine nodes. */ >> > ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, "ct_zones"); >> > + ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow, >> > + "ovs_interface_shadow"); >> > ENGINE_NODE_WITH_CLEAR_TRACK_DATA(runtime_data, "runtime_data"); >> > ENGINE_NODE(non_vif_data, "non_vif_data"); >> > ENGINE_NODE(mff_ovn_geneve, "mff_ovn_geneve"); >> > @@ -3459,6 +3547,9 @@ main(int argc, char *argv[]) >> > engine_add_input(&en_ct_zones, &en_runtime_data, >> > ct_zones_runtime_data_handler); >> > >> > + engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface, >> > + ovs_interface_shadow_ovs_interface_handler); >> > + >> > engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, NULL); >> > >> > engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, NULL); >> > @@ -3480,8 +3571,8 @@ main(int argc, char *argv[]) >> > */ >> > engine_add_input(&en_runtime_data, &en_ovs_port, >> > engine_noop_handler); >> > - engine_add_input(&en_runtime_data, &en_ovs_interface, >> > - runtime_data_ovs_interface_handler); >> > + engine_add_input(&en_runtime_data, &en_ovs_interface_shadow, >> > + runtime_data_ovs_interface_shadow_handler); >> > >> > engine_add_input(&en_flow_output, &en_lflow_output, >> > flow_output_lflow_output_handler); >> > diff --git a/tests/ovn-performance.at b/tests/ovn-performance.at >> > index bc133f93b..9affca498 100644 >> > --- a/tests/ovn-performance.at >> > +++ b/tests/ovn-performance.at >> > @@ -366,6 +366,23 @@ for i in 1 2; do >> > ovn-nbctl wait-until Logical_Switch_Port $lp 'up=true' && >> > ovn-nbctl --wait=hv sync] >> > ) >> > + >> > + # Delete and recreate $lp to make it unbind and rebind multiple times, and >> > + # ensure no recompute is triggered. >> > + for k in $(seq 10); do >> > + OVN_CONTROLLER_EXPECT_NO_HIT( >> > + [hv$i hv$j], [lflow_run], >> > + [as hv$i ovs-vsctl set Interface $vif external-ids:iface-id=xxxx && >> > + ovn-nbctl wait-until Logical_Switch_Port $lp 'up=false' && >> > + ovn-nbctl --wait=hv sync] >> > + ) >> > + OVN_CONTROLLER_EXPECT_NO_HIT( >> > + [hv$i hv$j], [lflow_run], >> > + [as hv$i ovs-vsctl set Interface $vif external-ids:iface-id=$lp && >> > + ovn-nbctl wait-until Logical_Switch_Port $lp 'up=true' && >> > + ovn-nbctl --wait=hv sync] >> > + ) >> > + done >> > done >> > >> > for i in 1 2; do >> > >> _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
