Hi Han I think that I agree: both patches are somehow orthogonal, even though they got initiated by the same issue. I just submitted the patch preventing recomputes due to Port_Binding.
Thanks Xavier On Thu, May 12, 2022 at 11:00 AM Dumitru Ceara <[email protected]> wrote: > Hi Han, Mark, Xavier, > > On 5/10/22 03:52, Han Zhou wrote: > > On Mon, May 9, 2022 at 5:36 PM Xavier Simonart <[email protected]> > wrote: > >> > >> Hi Mark, Han > >> > >> Thanks for looking into this. > >> Han's patch is addressing the issues I initially tried fixing with > > > https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/ > >> > > > https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/ > > is definitely not a good solution (incurs some delay and is more a > > workaround than a fix). > >> > >> I have been trying to follow another approach, which would fix a > slightly > > broader issue. > >> The goal is to try to avoid recomputes when updating Port_Binding in > SBDB. > >> In addition to the case Han (greatly) described, we also hit some > > recomputes when the same controller binds multiple ports quite quickly > > consecutively. > >> > >> So I have been trying to extend the approach used in if-status (i.e. not > > write to SBDB if SBDB is read-only, and have the if-status state machine > > running through the different interface states, update SBDB when > possible, > > and change the states). > >> I have a patch (almost) ready but had some issues with some test cases, > > and was still investigating whether those issues were due to my patch.... > >> I hope to post it tomorrow. > >> We can then discuss whether we need both approaches. > > > > Thanks Xavier. I am interested in the solution you are working on. > > I also want to add that my patch here is to provide a generic solution > that > > filters out the changes we don't care about and quits the I-P engine > > processing early. It can be extended easily in the future if we have new > > changes to ignore. In many cases this has been achieved by omitting > columns > > in OVSDB IDL monitoring, but the case here is that we care about part of > a > > column that is with k-v pairs and OVSDB IDL can monitor/omit the whole > > column. So I feel we may need this regardless of the new solution that > > addresses the consecutively multiple ports binding problem. > > I think so too. If I understand correctly we need both changes. > > > Anyway, we can wait and evaluate when your new patch is posted. > > > > +1 > > Thanks, > Dumitru > > > Thanks, > > Han > > > >> > >> Thanks > >> Xavier > >> > >> > >> On Mon, May 9, 2022 at 9:32 PM Mark Michelson <[email protected]> > wrote: > >>> > >>> Based on the explanation and the code, this seems to solve the problem > >>> of the spurious recomputes. I'd like confirmation from Xavier that this > >>> is fixing the issue he was addressing in > >>> > > > https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/ > >>> , and I'd also like confirmation that this approach doesn't incur the > >>> delays that Dumitru mentioned on Xavier's patch. > >>> > >>> With those confirmations, I'll add my ack to this patch. > >>> > >>> On 5/3/22 15:37, Han Zhou wrote: > >>>> During the process of claiming a VIF port on a chassis, there are > >>>> multiple SB and local OVSDB updates by ovn-controller, which may > > trigger > >>>> ovn-controller recompute, depending on the timing between the > > completion > >>>> of SB and local OVSDB updates: > >>>> > >>>> After port claiming, ovn-controller sends an update to the local OVSDB > >>>> to clean the Interface:external_ids:ovn-installed, and also sends an > >>>> update to the SB DB to update the Port_Binding:chassis. If the local > >>>> OVSDB update notification comes back before the SB update completes, > >>>> ovn-controller's SB IDL is still "in transaction" and read-only, but > > the > >>>> local Interface update triggers the I-P engine runtime_data node to > >>>> handle the OVS_interface change, which considers the VIF to be claimed > >>>> (again) potentially, which requires SB updates while it is not > possible > >>>> at the moment, and so the I-P handler returns failure and the engine > >>>> aborts, which triggers recompute at the next I-P engine iteration when > >>>> SB DB is writable. > >>>> > >>>> If we are lucky at the above step that the SB update completes before > >>>> the local OVSDB update is received, ovn-controller would handle the > >>>> change incrementally (and figures out that the SB Port_Binding chassis > >>>> is already the current one thus wouldn't reclaim the VIF). But the > next > >>>> step after it completes flow computing for the newly bound VIF it > sends > >>>> another update to the local OVSDB to set the > >>>> Interface:external_ids:ovn-installed and ovn-installed-ts, while at > the > >>>> same time it also sends an update to the SB DB to set the > >>>> Port_Binding:up. These two parallel updates would lead to the same > >>>> situation as the above step and potentially cause ovn-controll > >>>> recompute. > >>>> > >>>> The recompute is very likely to be triggered in production because the > >>>> SB DB is remote (on central nodes) and has heavier load, which > requires > >>>> longer time to complete the updates than the local OVSDB server. > >>>> > >>>> The problem is that the runtime_data's OVS_interface change handler > >>>> couldn't handle the update of the Interface:external_ids:ovn-installed > >>>> incrementally, because ovn-controller assumes it could require an SB > >>>> update that is not possible at the moment. However, the assumption is > >>>> not true. In fact it shouldn't even care about the change because this > >>>> external_id:ovn-installed is written by ovn-controller itself and it > is > >>>> write-only (no need to read it as an input for any further > processing). > >>>> > >>>> So, the idea to fix the problem is to ignore such changes that happen > > to > >>>> the "write-only" fields only. OVSDB change tracking provides APIs to > >>>> check if a specific column is updated, but the difficulty here is that > >>>> the column external_ids is a k-v map which contains not only the data > >>>> that ovn-controller can ignore, such as ovn-installed, > > ovn-installed-ts, > >>>> but also the data ovn-controller cares about, such as iface-id, > >>>> iface-id-ver and encap-ip. To solve this problem, we need to know > > what's > >>>> changed exactly inside the external_ids map. > >>>> > >>>> This patch solves it by introducing a new I-P engine node > >>>> ovs_interface_shadow to wrap the OVSDB input ovs_interface with an > > extra > >>>> copy of the older version of the column external_ids, and replace the > >>>> input OVS_interface of the runtime_data node. In the change handler of > > the > >>>> runtime_data node we evaluates if the change needs to be handled > before > >>>> calling consider_iface_claim(), so in the above scenarios the changes > >>>> don't need to be handled and won't trigger recompute any more. A test > >>>> case is also updated to capture this (which is very likely, if not > > 100%, > >>>> to fail without the fix). > >>>> > >>>> Signed-off-by: Han Zhou <[email protected]> > >>>> --- > >>>> v2: Add comments that was missing while committing v1. > >>>> > >>>> controller/binding.c | 70 ++++++++++++++++++++++++ > >>>> controller/binding.h | 1 + > >>>> controller/ovn-controller.c | 105 > > +++++++++++++++++++++++++++++++++--- > >>>> tests/ovn-performance.at | 17 ++++++ > >>>> 4 files changed, 186 insertions(+), 7 deletions(-) > >>>> > >>>> diff --git a/controller/binding.c b/controller/binding.c > >>>> index 7281b0485..0563d9d1e 100644 > >>>> --- a/controller/binding.c > >>>> +++ b/controller/binding.c > >>>> @@ -1907,6 +1907,71 @@ is_iface_in_int_bridge(const struct > > ovsrec_interface *iface, > >>>> return false; > >>>> } > >>>> > >>>> +static bool > >>>> +is_ext_id_changed(const struct smap *a, const struct smap *b, const > > char *key) > >>>> +{ > >>>> + const char *value_a = smap_get(a, key); > >>>> + const char *value_b = smap_get(b, key); > >>>> + if ((value_a && !value_b) > >>>> + || (!value_a && value_b) > >>>> + || (value_a && value_b && strcmp(value_a, value_b))) { > >>>> + return true; > >>>> + } > >>>> + return false; > >>>> +} > >>>> + > >>>> +/* Check if the change in 'iface_rec' is something we are interested > > in from > >>>> + * port binding perspective. Return true if the change needs to be > > handled, > >>>> + * otherwise return false. > >>>> + * > >>>> + * The 'iface_rec' must be change tracked, i.e. iterator from > >>>> + * OVSREC_INTERFACE_TABLE_FOR_EACH_TRACKED. */ > >>>> +static bool > >>>> +ovs_interface_change_need_handle(const struct ovsrec_interface > > *iface_rec, > >>>> + struct shash > > *iface_table_external_ids_old) > >>>> +{ > >>>> + if (ovsrec_interface_is_updated(iface_rec, > >>>> + OVSREC_INTERFACE_COL_NAME)) { > >>>> + return true; > >>>> + } > >>>> + if (ovsrec_interface_is_updated(iface_rec, > >>>> + OVSREC_INTERFACE_COL_OFPORT)) { > >>>> + return true; > >>>> + } > >>>> + if (ovsrec_interface_is_updated(iface_rec, > >>>> + OVSREC_INTERFACE_COL_TYPE)) { > >>>> + return true; > >>>> + } > >>>> + if (ovsrec_interface_is_updated(iface_rec, > >>>> + > > OVSREC_INTERFACE_COL_EXTERNAL_IDS)) { > >>>> + /* Compare the external_ids that we are interested in with > > the old > >>>> + * values: > >>>> + * - iface-id > >>>> + * - iface-id-ver > >>>> + * - encap-ip > >>>> + * For any other changes, such as ovn-installed, > > ovn-installed-ts, etc, > >>>> + * we don't need to handle. */ > >>>> + struct smap *external_ids_old = > >>>> + shash_find_data(iface_table_external_ids_old, > > iface_rec->name); > >>>> + if (!external_ids_old) { > >>>> + return true; > >>>> + } > >>>> + if (is_ext_id_changed(&iface_rec->external_ids, > > external_ids_old, > >>>> + "iface-id")) { > >>>> + return true; > >>>> + } > >>>> + if (is_ext_id_changed(&iface_rec->external_ids, > > external_ids_old, > >>>> + "iface-id-ver")) { > >>>> + return true; > >>>> + } > >>>> + if (is_ext_id_changed(&iface_rec->external_ids, > > external_ids_old, > >>>> + "encap-ip")) { > >>>> + return true; > >>>> + } > >>>> + } > >>>> + return false; > >>>> +} > >>>> + > >>>> /* Returns true if the ovs interface changes were handled > > successfully, > >>>> * false otherwise. > >>>> */ > >>>> @@ -2018,6 +2083,11 @@ binding_handle_ovs_interface_changes(struct > > binding_ctx_in *b_ctx_in, > >>>> continue; > >>>> } > >>>> > >>>> + if (!ovs_interface_change_need_handle( > >>>> + iface_rec, b_ctx_in->iface_table_external_ids_old)) { > >>>> + continue; > >>>> + } > >>>> + > >>>> const char *iface_id = smap_get(&iface_rec->external_ids, > > "iface-id"); > >>>> int64_t ofport = iface_rec->n_ofport ? *iface_rec->ofport : > > 0; > >>>> if (iface_id && ofport > 0 && > >>>> diff --git a/controller/binding.h b/controller/binding.h > >>>> index 430a8d9b1..e49e1ebb6 100644 > >>>> --- a/controller/binding.h > >>>> +++ b/controller/binding.h > >>>> @@ -55,6 +55,7 @@ struct binding_ctx_in { > >>>> const struct ovsrec_bridge_table *bridge_table; > >>>> const struct ovsrec_open_vswitch_table *ovs_table; > >>>> const struct ovsrec_interface_table *iface_table; > >>>> + struct shash *iface_table_external_ids_old; > >>>> }; > >>>> > >>>> /* Locally relevant port bindings, e.g., VIFs that might be bound > > locally, > >>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > >>>> index 22b7fa935..649353f7d 100644 > >>>> --- a/controller/ovn-controller.c > >>>> +++ b/controller/ovn-controller.c > >>>> @@ -1042,6 +1042,91 @@ en_ofctrl_is_connected_run(struct engine_node > > *node, void *data) > >>>> engine_set_node_state(node, EN_UNCHANGED); > >>>> } > >>>> > >>>> +/* This engine node is to wrap the OVS_interface input and maintain a > > copy of > >>>> + * the old version of data for the column external_ids. > >>>> + * > >>>> + * There are some special considerations of this engine node: > >>>> + * 1. It has a single input OVS_interface, and it transparently > > passes the > >>>> + * input changes as its own output data to its dependants. So > > there is no > >>>> + * processing to OVS_interface changes but simply mark the node > > status as > >>>> + * UPDATED (and so the run() and the change handler is the same). > >>>> + * 2. The iface_table_external_ids_old is computed/updated in the > > member > >>>> + * clear_tracked_data(), because that is when the last round of > > processing > >>>> + * has completed but the new IDL data is yet to refresh, so we > > replace the > >>>> + * old data with the current data. */ > >>>> +struct ed_type_ovs_interface_shadow { > >>>> + struct ovsrec_interface_table *iface_table; > >>>> + struct shash iface_table_external_ids_old; > >>>> +}; > >>>> + > >>>> +static void * > >>>> +en_ovs_interface_shadow_init(struct engine_node *node OVS_UNUSED, > >>>> + struct engine_arg *arg OVS_UNUSED) > >>>> +{ > >>>> + struct ed_type_ovs_interface_shadow *data = xzalloc(sizeof > *data); > >>>> + data->iface_table = NULL; > >>>> + shash_init(&data->iface_table_external_ids_old); > >>>> + > >>>> + return data; > >>>> +} > >>>> + > >>>> +static void > >>>> +iface_table_external_ids_old_destroy(struct shash *table_ext_ids) > >>>> +{ > >>>> + struct shash_node *node; > >>>> + SHASH_FOR_EACH_SAFE (node, table_ext_ids) { > >>>> + struct smap *ext_ids = node->data; > >>>> + smap_destroy(ext_ids); > >>>> + } > >>>> + shash_destroy_free_data(table_ext_ids); > >>>> +} > >>>> + > >>>> +static void > >>>> +en_ovs_interface_shadow_cleanup(void *data_) > >>>> +{ > >>>> + struct ed_type_ovs_interface_shadow *data = data_; > >>>> + > > > iface_table_external_ids_old_destroy(&data->iface_table_external_ids_old); > >>>> +} > >>>> + > >>>> +static void > >>>> +en_ovs_interface_shadow_clear_tracked_data(void *data_) > >>>> +{ > >>>> + struct ed_type_ovs_interface_shadow *data = data_; > >>>> + > > > iface_table_external_ids_old_destroy(&data->iface_table_external_ids_old); > >>>> + shash_init(&data->iface_table_external_ids_old); > >>>> + > >>>> + if (!data->iface_table) { > >>>> + return; > >>>> + } > >>>> + > >>>> + const struct ovsrec_interface *iface_rec; > >>>> + OVSREC_INTERFACE_TABLE_FOR_EACH (iface_rec, data->iface_table) { > >>>> + struct smap *external_ids = xmalloc(sizeof *external_ids); > >>>> + smap_clone(external_ids, &iface_rec->external_ids); > >>>> + shash_add(&data->iface_table_external_ids_old, > > iface_rec->name, > >>>> + external_ids); > >>>> + } > >>>> +} > >>>> + > >>>> +static void > >>>> +en_ovs_interface_shadow_run(struct engine_node *node, void *data_) > >>>> +{ > >>>> + struct ed_type_ovs_interface_shadow *data = data_; > >>>> + struct ovsrec_interface_table *iface_table = > >>>> + (struct ovsrec_interface_table *)EN_OVSDB_GET( > >>>> + engine_get_input("OVS_interface", node)); > >>>> + data->iface_table = iface_table; > >>>> + engine_set_node_state(node, EN_UPDATED); > >>>> +} > >>>> + > >>>> +static bool > >>>> +ovs_interface_shadow_ovs_interface_handler(struct engine_node *node, > >>>> + void *data_) > >>>> +{ > >>>> + en_ovs_interface_shadow_run(node, data_); > >>>> + return true; > >>>> +} > >>>> + > >>>> struct ed_type_runtime_data { > >>>> /* Contains "struct local_datapath" nodes. */ > >>>> struct hmap local_datapaths; > >>>> @@ -1214,9 +1299,8 @@ init_binding_ctx(struct engine_node *node, > >>>> (struct ovsrec_port_table *)EN_OVSDB_GET( > >>>> engine_get_input("OVS_port", node)); > >>>> > >>>> - struct ovsrec_interface_table *iface_table = > >>>> - (struct ovsrec_interface_table *)EN_OVSDB_GET( > >>>> - engine_get_input("OVS_interface", node)); > >>>> + struct ed_type_ovs_interface_shadow *iface_shadow = > >>>> + engine_get_input_data("ovs_interface_shadow", node); > >>>> > >>>> struct ovsrec_qos_table *qos_table = > >>>> (struct ovsrec_qos_table *)EN_OVSDB_GET( > >>>> @@ -1249,7 +1333,9 @@ init_binding_ctx(struct engine_node *node, > >>>> b_ctx_in->sbrec_port_binding_by_datapath = > > sbrec_port_binding_by_datapath; > >>>> b_ctx_in->sbrec_port_binding_by_name = > > sbrec_port_binding_by_name; > >>>> b_ctx_in->port_table = port_table; > >>>> - b_ctx_in->iface_table = iface_table; > >>>> + b_ctx_in->iface_table = iface_shadow->iface_table; > >>>> + b_ctx_in->iface_table_external_ids_old = > >>>> + &iface_shadow->iface_table_external_ids_old; > >>>> b_ctx_in->qos_table = qos_table; > >>>> b_ctx_in->port_binding_table = pb_table; > >>>> b_ctx_in->br_int = br_int; > >>>> @@ -1331,7 +1417,7 @@ en_runtime_data_run(struct engine_node *node, > > void *data) > >>>> } > >>>> > >>>> static bool > >>>> -runtime_data_ovs_interface_handler(struct engine_node *node, void > > *data) > >>>> +runtime_data_ovs_interface_shadow_handler(struct engine_node *node, > > void *data) > >>>> { > >>>> struct ed_type_runtime_data *rt_data = data; > >>>> struct binding_ctx_in b_ctx_in; > >>>> @@ -3338,6 +3424,8 @@ main(int argc, char *argv[]) > >>>> > >>>> /* Define inc-proc-engine nodes. */ > >>>> ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, > "ct_zones"); > >>>> + ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow, > >>>> + "ovs_interface_shadow"); > >>>> ENGINE_NODE_WITH_CLEAR_TRACK_DATA(runtime_data, "runtime_data"); > >>>> ENGINE_NODE(non_vif_data, "non_vif_data"); > >>>> ENGINE_NODE(mff_ovn_geneve, "mff_ovn_geneve"); > >>>> @@ -3459,6 +3547,9 @@ main(int argc, char *argv[]) > >>>> engine_add_input(&en_ct_zones, &en_runtime_data, > >>>> ct_zones_runtime_data_handler); > >>>> > >>>> + engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface, > >>>> + ovs_interface_shadow_ovs_interface_handler); > >>>> + > >>>> engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, > > NULL); > >>>> > >>>> engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, NULL); > >>>> @@ -3480,8 +3571,8 @@ main(int argc, char *argv[]) > >>>> */ > >>>> engine_add_input(&en_runtime_data, &en_ovs_port, > >>>> engine_noop_handler); > >>>> - engine_add_input(&en_runtime_data, &en_ovs_interface, > >>>> - runtime_data_ovs_interface_handler); > >>>> + engine_add_input(&en_runtime_data, &en_ovs_interface_shadow, > >>>> + runtime_data_ovs_interface_shadow_handler); > >>>> > >>>> engine_add_input(&en_flow_output, &en_lflow_output, > >>>> flow_output_lflow_output_handler); > >>>> diff --git a/tests/ovn-performance.at b/tests/ovn-performance.at > >>>> index bc133f93b..9affca498 100644 > >>>> --- a/tests/ovn-performance.at > >>>> +++ b/tests/ovn-performance.at > >>>> @@ -366,6 +366,23 @@ for i in 1 2; do > >>>> ovn-nbctl wait-until Logical_Switch_Port $lp 'up=true' && > >>>> ovn-nbctl --wait=hv sync] > >>>> ) > >>>> + > >>>> + # Delete and recreate $lp to make it unbind and rebind multiple > > times, and > >>>> + # ensure no recompute is triggered. > >>>> + for k in $(seq 10); do > >>>> + OVN_CONTROLLER_EXPECT_NO_HIT( > >>>> + [hv$i hv$j], [lflow_run], > >>>> + [as hv$i ovs-vsctl set Interface $vif > > external-ids:iface-id=xxxx && > >>>> + ovn-nbctl wait-until Logical_Switch_Port $lp 'up=false' > > && > >>>> + ovn-nbctl --wait=hv sync] > >>>> + ) > >>>> + OVN_CONTROLLER_EXPECT_NO_HIT( > >>>> + [hv$i hv$j], [lflow_run], > >>>> + [as hv$i ovs-vsctl set Interface $vif > > external-ids:iface-id=$lp && > >>>> + ovn-nbctl wait-until Logical_Switch_Port $lp 'up=true' > && > >>>> + ovn-nbctl --wait=hv sync] > >>>> + ) > >>>> + done > >>>> done > >>>> > >>>> for i in 1 2; do > >>>> > >>> > > > > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
