On 3/6/23 15:22, Xavier Simonart wrote: > Hi Dumitru > Hi Xavier,
> Thanks for the review and the comments > Thanks for followin up! > On Wed, Feb 15, 2023 at 2:15 PM Dumitru Ceara <[email protected]> wrote: > >> Hi Xavier, >> >> Thanks for this new version and sorry for the time it took to review it! >> I think it mostly looks good; I only have a few minor remarks/questions >> below. >> >> On 1/4/23 09:32, Xavier Simonart wrote: >>> OVN checks whether ovn-installed is already present (in OVS) before >> updating it. >>> This might cause ovn-installed related issues in the following case: >>> - (1) ovn-installed is present >>> - (2) we claim the interface >>> - (3) we update ovs, removing ovn-installed and start installing flows >>> - (4) (next loop), after flows installed, we check if ovn-installed is >> absent, >>> and if yes, we update OVS with ovn-installed. >>> However, in step4, if OVS is still busy from step 3, ovn-installed is >> read as >>> present; hence OVN controller does not update it and move to INSTALLED >> state. >>> >>> ovn-installed was also not properly deleted on port or binding removal. >>> >>> Note that port->up is not properly removed on external_ids:iface-id >> removal when >>> sb is read-only. This will be fixed in a further patch. >>> >>> Fixes: a7c7d4519e50 ("controller: avoid recomputes triggered by SBDB >> Port_Binding updates.") >>> >>> Signed-off-by: Xavier Simonart <[email protected]> >>> >> >> Nit: there should be a "---" here. Like that the comments below don't >> get included in the actual commit. We can probably fix this at apply >> time so there's no need to send a v3 for now. >> >> Oops - thanks, good catch. > >>> v2: - handled Dumitru's comments >>> - rebased on main >>> - updated state machine diagram as commented in v1 commit message >>> - remove ovn-installed on port deletion or external_ids:iface-id >> removal. >>> - added test case >>> --- >>> controller/binding.c | 59 +++++++++- >>> controller/binding.h | 6 + >>> controller/if-status.c | 113 +++++++++++++----- >>> tests/ovn.at | 257 +++++++++++++++++++++++++++++++++++++++++ >>> 4 files changed, 403 insertions(+), 32 deletions(-) >>> >>> diff --git a/controller/binding.c b/controller/binding.c >>> index 5df62baef..d85a17730 100644 >>> --- a/controller/binding.c >>> +++ b/controller/binding.c >>> @@ -746,6 +746,19 @@ local_binding_get_lport_ofport(const struct shash >> *local_bindings, >>> u16_to_ofp(lbinding->iface->ofport[0]) : 0; >>> } >>> >>> +bool >>> +local_binding_is_ovn_installed(struct shash *local_bindings, >>> + const char *pb_name) >>> +{ >>> + struct local_binding *lbinding = >>> + local_binding_find(local_bindings, pb_name); >>> + if (lbinding && lbinding->iface) { >>> + return smap_get_bool(&lbinding->iface->external_ids, >>> + OVN_INSTALLED_EXT_ID, false); >>> + } >>> + return false; >>> +} >>> + >>> bool >>> local_binding_is_up(struct shash *local_bindings, const char *pb_name, >>> const struct sbrec_chassis *chassis_rec) >>> @@ -783,6 +796,7 @@ local_binding_is_down(struct shash *local_bindings, >> const char *pb_name, >>> } else if (b_lport->pb->chassis) { >>> VLOG_DBG("lport %s already claimed by other chassis", >>> b_lport->pb->logical_port); >>> + return true; >>> } >>> } >>> >>> @@ -834,6 +848,33 @@ local_binding_set_up(struct shash *local_bindings, >> const char *pb_name, >>> } >>> } >>> >>> +static void >>> +remove_ovn_installed(struct local_binding *lbinding, const char >> *pb_name) >>> +{ >>> + if (lbinding && lbinding->iface && >>> + smap_get_bool(&lbinding->iface->external_ids, >>> + OVN_INSTALLED_EXT_ID, false)) { >>> + /* If iface has been deleted, do not try to delete a key from >> it */ >>> + if (!ovsrec_interface_is_deleted(lbinding->iface)) { >> >> Not really a problem, it just felt a bit weird to read this knowing that >> the function can be called even when not processing tracked changes. >> Still, it should be fine. >> >> I think this is really due to the fact that, when we handle tracked > changes and a db is ro, we have two choices > - fail as we were doing before, and recompute > - do nothing in the current loop, and have if-status "reconcile", but hence > not based on tracked changes > OK, I guess. >>> + VLOG_INFO("Removing lport %s ovn-installed in OVS", >> pb_name); >>> + ovsrec_interface_update_external_ids_delkey(lbinding->iface, >>> + >> OVN_INSTALLED_EXT_ID); >>> + } >>> + } >>> +} >>> + >>> +void >>> +local_binding_remove_ovn_installed(struct shash *local_bindings, >>> + const char *pb_name, bool >> ovs_readonly) >>> +{ >>> + if (ovs_readonly) { >>> + return; >>> + } >>> + struct local_binding *lbinding = >>> + local_binding_find(local_bindings, pb_name); >>> + remove_ovn_installed(lbinding, pb_name); >>> +} >>> + >>> void >>> local_binding_set_down(struct shash *local_bindings, const char >> *pb_name, >>> const struct sbrec_chassis *chassis_rec, >>> @@ -1239,7 +1280,9 @@ claim_lport(const struct sbrec_port_binding *pb, >>> return false; >>> } >>> } else { >>> - if (pb->n_up && !pb->up[0]) { >>> + if ((pb->n_up && !pb->up[0]) || >>> + !smap_get_bool(&iface_rec->external_ids, >>> + OVN_INSTALLED_EXT_ID, false)) { >>> if_status_mgr_claim_iface(if_mgr, pb, chassis_rec, >>> sb_readonly); >>> } >>> @@ -1464,9 +1507,11 @@ consider_vif_lport_(const struct >> sbrec_port_binding *pb, >>> const char *requested_chassis_option = smap_get( >>> &pb->options, "requested-chassis"); >>> VLOG_INFO_RL(&rl, >>> - "Not claiming lport %s, chassis %s requested-chassis >> %s", >>> + "Not claiming lport %s, chassis %s requested-chassis %s >> " >>> + "pb->chassis %s", >>> pb->logical_port, b_ctx_in->chassis_rec->name, >>> - requested_chassis_option ? requested_chassis_option : >> "[]"); >>> + requested_chassis_option ? requested_chassis_option : >> "[]", >>> + pb->chassis ? pb->chassis->name: ""); >>> } >>> } >>> >>> @@ -2288,6 +2333,9 @@ consider_iface_release(const struct >> ovsrec_interface *iface_rec, >>> return false; >>> } >>> } >>> + if (b_ctx_in->ovs_idl_txn) { >>> + remove_ovn_installed(lbinding, iface_id); >>> + } >> >> Shouldn't this be done through local_binding_remove_ovn_installed() >> instead? > > I think that both are quite similar: local_binding_remove_ovn_installed() > might be more readable, but it requires an additional local_binding_find, > so slightly slower > >> Can't we just move the iface to OIF_REMOVE_OVN_INST state here >> and let if_status_mgr_update_bindings() synchronize ovn_installed? Like >> that we reconcile the OVS interface in a single place, in if-status.c. >> > I do not think that we can easily use the if-status module for this, as > lbinding->iface is cleared(a few lines down) before if-status has a chance > to run and remove ovn-install. Once binding->iface is cleared we can't > (easily) remove ovn-install > Hmm, OK, but what if b_ctx_in->ovs_idl_txn is NULL? We're just not going to remove ovn-installed at all then AFAICT. Would it be costly to lookup the interface in the if-status module? >> >>> >>> } else if (lbinding && b_lport && b_lport->type == LP_LOCALPORT) { >>> /* lbinding is associated with a localport. Remove it from the >>> @@ -2558,6 +2606,7 @@ handle_deleted_lport(const struct >> sbrec_port_binding *pb, >>> if (ld) { >>> remove_pb_from_local_datapath(pb, >>> b_ctx_out, ld); >>> + if_status_mgr_release_iface(b_ctx_out->if_mgr, >> pb->logical_port); >>> return; >>> } >>> >>> @@ -2581,6 +2630,7 @@ handle_deleted_lport(const struct >> sbrec_port_binding *pb, >>> remove_pb_from_local_datapath(pb, b_ctx_out, >>> ld); >>> } >>> + if_status_mgr_release_iface(b_ctx_out->if_mgr, >> pb->logical_port); >>> } >>> } >>> >>> @@ -2627,6 +2677,9 @@ handle_deleted_vif_lport(const struct >> sbrec_port_binding *pb, >>> } >>> >>> handle_deleted_lport(pb, b_ctx_in, b_ctx_out); >>> + if (b_ctx_in->ovs_idl_txn) { >>> + remove_ovn_installed(lbinding, pb->logical_port); >>> + } >> >> Same comment as above applies here I think. >> >>> return true; >>> } >>> >>> diff --git a/controller/binding.h b/controller/binding.h >>> index 6c3a98b02..72b4a35b6 100644 >>> --- a/controller/binding.h >>> +++ b/controller/binding.h >>> @@ -159,6 +159,12 @@ bool local_binding_is_up(struct shash >> *local_bindings, const char *pb_name, >>> bool local_binding_is_down(struct shash *local_bindings, const char >> *pb_name, >>> const struct sbrec_chassis *); >>> >>> +bool local_binding_is_ovn_installed(struct shash *local_bindings, >>> + const char *pb_name); >>> +void local_binding_remove_ovn_installed(struct shash *local_bindings, >>> + const char *pb_name, >>> + bool ovs_readonly); >>> + >>> void local_binding_set_up(struct shash *local_bindings, const char >> *pb_name, >>> const struct sbrec_chassis *chassis_rec, >>> const char *ts_now_str, bool sb_readonly, >>> diff --git a/controller/if-status.c b/controller/if-status.c >>> index d1c14ac30..c008aa79e 100644 >>> --- a/controller/if-status.c >>> +++ b/controller/if-status.c >>> @@ -54,32 +54,37 @@ VLOG_DEFINE_THIS_MODULE(if_status); >>> */ >>> >>> enum if_state { >>> - OIF_CLAIMED, /* Newly claimed interface. pb->chassis update >> not yet >>> - initiated. */ >>> - OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis update >> sent to >>> - * SB (but update notification not confirmed, so >> the >>> - * update may be resent in any of the following >> states) >>> - * and for which flows are still being installed. >>> - */ >>> - OIF_MARK_UP, /* Interface with flows successfully installed >> in OVS >>> - * but not yet marked "up" in the binding module >> (in >>> - * SB and OVS databases). >>> - */ >>> - OIF_MARK_DOWN, /* Released interface but not yet marked "down" >> in the >>> - * binding module (in SB and/or OVS databases). >>> - */ >>> - OIF_INSTALLED, /* Interface flows programmed in OVS and binding >> marked >>> - * "up" in the binding module. >>> - */ >>> + OIF_CLAIMED, /* Newly claimed interface. pb->chassis update >> not yet >>> + initiated. */ >>> + OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis update >> sent to >>> + * SB (but update notification not confirmed, >> so the >>> + * update may be resent in any of the following >>> + * states and for which flows are still being >>> + * installed. >>> + */ >>> + OIF_REMOVE_OVN_INST, /* Interface with flows successfully installed >> in OVS, >> >> This doesn't sound correct to me. Did you mean something like >> "Interface for which flows are still being installed in OVS"? >> > I think it is correct, even though it sounds strange (see the graph below): > this state is used instead of the MARK_UP state > when flows are successfully installed, but ovn-install was present before > we started installing flows, and is still being removed. > While installing flows, we were also removing ovn-install > (local_binding_set_down in if_status_mgr_update_bindings). > In this case, moving directly to MARK_UP might result in a race condition > ending up in ovn-install > not installed. > Maybe the name of the state should be changed ? OIF_REMOVE_OLD_OVN_INST ? > I think I prefer this one, yes. >> >>> + * but with ovn-installed still in OVSDB. >>> + */ >>> + OIF_MARK_UP, /* Interface with flows successfully installed >> in OVS >>> + * but not yet marked "up" in the binding >> module (in >>> + * SB and OVS databases). >>> + */ >>> + OIF_MARK_DOWN, /* Released interface but not yet marked >> "down" in the >>> + * binding module (in SB and/or OVS databases). >>> + */ >>> + OIF_INSTALLED, /* Interface flows programmed in OVS and >> binding >>> + * marked "up" in the binding module. >>> + */ >>> OIF_MAX, >>> }; >>> >>> static const char *if_state_names[] = { >>> - [OIF_CLAIMED] = "CLAIMED", >>> - [OIF_INSTALL_FLOWS] = "INSTALL_FLOWS", >>> - [OIF_MARK_UP] = "MARK_UP", >>> - [OIF_MARK_DOWN] = "MARK_DOWN", >>> - [OIF_INSTALLED] = "INSTALLED", >>> + [OIF_CLAIMED] = "CLAIMED", >>> + [OIF_INSTALL_FLOWS] = "INSTALL_FLOWS", >>> + [OIF_REMOVE_OVN_INST] = "REMOVE_OVN_INST", >>> + [OIF_MARK_UP] = "MARK_UP", >>> + [OIF_MARK_DOWN] = "MARK_DOWN", >>> + [OIF_INSTALLED] = "INSTALLED", >>> }; >>> >>> /* >>> @@ -109,11 +114,26 @@ static const char *if_state_names[] = { >>> * | | | - remove ovn-installed from ovsdb >> | | | >>> * | | | mgr_update() >> | | | >>> * | +----------------------+ - sbrec_update_chassis if needed >> | | | >>> - * | | >> | | | >>> - * | | mgr_run(seqno rcvd) >> | | | >>> - * | | - set port up in sb >> | | | >>> - * | release_iface | - set ovn-installed in ovs >> | | | >>> - * | V >> | | | >>> + * | | | >> | | | >>> + * | | +----------------------------------------+ >> | | | >>> + * | | | >> | | | >>> + * | | mgr_run(seqno rcvd, ovn-installed present) | >> | | | >>> + * | V | >> | | | >>> + * | +--------------------+ | >> | | | >>> + * | | | mgr_run() | >> | | | >>> + * +--- | REMOVE_OVN_INST | - remove ovn-installed in ovs | >> | | | >>> + * | +--------------------+ | >> | | | >>> + * | | | >> | | | >>> + * | | | >> | | | >>> + * | | mgr_update( ovn_installed not present) | >> | | | >>> + * | | | >> | | | >>> + * | | +-------------------------------------------+ >> | | | >>> + * | | | >> | | | >>> + * | | | mgr_run(seqno rcvd, ovn-installed not >> present) | | | >>> + * | | | - set port up in sb >> | | | >>> + * | | | - set ovn-installed in ovs >> | | | >>> + * |release_iface | | >> | | | >>> + * | V V >> | | | >>> * | +----------------------+ >> | | | >>> * | | | mgr_run() >> | | | >>> * +-- | MARK_UP | - set port up in sb >> | | | >>> @@ -238,6 +258,7 @@ if_status_mgr_claim_iface(struct if_status_mgr *mgr, >>> switch (iface->state) { >>> case OIF_CLAIMED: >>> case OIF_INSTALL_FLOWS: >>> + case OIF_REMOVE_OVN_INST: >>> case OIF_MARK_UP: >>> /* Nothing to do here. */ >>> break; >>> @@ -274,6 +295,7 @@ if_status_mgr_release_iface(struct if_status_mgr >> *mgr, const char *iface_id) >>> /* Not yet fully installed interfaces can be safely deleted. */ >>> ovs_iface_destroy(mgr, iface); >>> break; >>> + case OIF_REMOVE_OVN_INST: >>> case OIF_MARK_UP: >>> case OIF_INSTALLED: >>> /* Properly mark interfaces "down" if their flows were already >>> @@ -305,6 +327,7 @@ if_status_mgr_delete_iface(struct if_status_mgr >> *mgr, const char *iface_id) >>> /* Not yet fully installed interfaces can be safely deleted. */ >>> ovs_iface_destroy(mgr, iface); >>> break; >>> + case OIF_REMOVE_OVN_INST: >>> case OIF_MARK_UP: >>> case OIF_INSTALLED: >>> /* Properly mark interfaces "down" if their flows were already >>> @@ -359,6 +382,17 @@ if_status_mgr_update(struct if_status_mgr *mgr, >>> struct shash *bindings = &binding_data->bindings; >>> struct hmapx_node *node; >>> >>> + /* Move all interfaces that have been confirmed without >> ovn-installed, >>> + * from OIF_REMOVE_OVN_INST to OIF_MARK_UP. >>> + */ >>> + HMAPX_FOR_EACH_SAFE (node, >> &mgr->ifaces_per_state[OIF_REMOVE_OVN_INST]) { >>> + struct ovs_iface *iface = node->data; >>> + >>> + if (!local_binding_is_ovn_installed(bindings, iface->id)) { >>> + ovs_iface_set_state(mgr, iface, OIF_MARK_UP); >>> + } >>> + } >>> + >>> /* Interfaces in OIF_MARK_UP/INSTALL_FLOWS state have already set >> their >>> * pb->chassis. However, the update might still be in fly >> (confirmation >>> * not received yet) or pb->chassis was overwitten by another >> chassis. >>> @@ -471,7 +505,19 @@ if_status_mgr_run(struct if_status_mgr *mgr, >>> iface->install_seqno)) { >>> continue; >>> } >>> - ovs_iface_set_state(mgr, iface, OIF_MARK_UP); >>> + /* Wait for ovn-installed to be absent before moving to MARK_UP >> state. >>> + * Most of the times ovn-installed is already absent and hence >> we will >>> + * not have to wait. >>> + * If there is no binding_data, we can't determine if >> ovn-installed is >>> + * present or not; hence also go to the OIF_REMOVE_OVN_INST >> state. >>> + */ >>> + if (!binding_data || >>> + local_binding_is_ovn_installed(&binding_data->bindings, >>> + iface->id)) { >>> + ovs_iface_set_state(mgr, iface, OIF_REMOVE_OVN_INST); > > Will this cause significant/measurable delays in setting ovn-installed? >> I think not but I was wondering if you have some test data for this part >> here. >> > I do not think so either as: > - this case (having ovn-installed present when we start to install flows, > and so having to remove it) should not occur much anymore. We were hitting > this case before because we were not properly removing ovn-installed when > removing bindings. > - we remove ovn-install while installing flows (i.e. in parallel, not in > serie). So, there might be a delay if DB is slow. But before this patch, > the same case was causing ovn-install to be missing > I see, thanks for the explanation! Regards, Dumitru _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
