H Ales Thanks for reviewing the patch and for your comments.
On Wed, Aug 7, 2024 at 1:45 PM Ales Musil <[email protected]> wrote: > > > On Thu, Jul 25, 2024 at 3:16 PM Xavier Simonart <[email protected]> > wrote: > >> There was a potential ovs_assert when exiting ovn-controller: >> controller/if-status.c:261: assertion >> shash_is_empty(&mgr->ovn_uninstall_hash) failed in if_status_mgr_clear() >> >> Fixes: 395eac485b87 ("ovn-controller: fixed ovn-installed not always >> properly added or removed.") >> >> Signed-off-by: Xavier Simonart <[email protected]> >> --- >> > > Hi Xavier, > > thank you for the patch. I have two questions below. > > controller/if-status.c | 24 +++++++++--------------- >> tests/system-ovn.at | 23 +++++++++++++++++++---- >> 2 files changed, 28 insertions(+), 19 deletions(-) >> >> diff --git a/controller/if-status.c b/controller/if-status.c >> index 9a7488057..ada78a18b 100644 >> --- a/controller/if-status.c >> +++ b/controller/if-status.c >> @@ -219,7 +219,8 @@ ovs_iface_create(struct if_status_mgr *, const char >> *iface_id, >> static void add_to_ovn_uninstall_hash(struct if_status_mgr *, const char >> *, >> const struct uuid *); >> static void ovs_iface_destroy(struct if_status_mgr *, struct ovs_iface >> *); >> -static void ovn_uninstall_hash_destroy(struct if_status_mgr *mgr, char >> *name); >> +static void ovn_uninstall_hash_destroy(struct if_status_mgr *mgr, >> + struct shash_node *node); >> static void ovs_iface_set_state(struct if_status_mgr *, struct ovs_iface >> *, >> enum if_state); >> >> @@ -256,7 +257,7 @@ if_status_mgr_clear(struct if_status_mgr *mgr) >> ovs_assert(shash_is_empty(&mgr->ifaces)); >> >> SHASH_FOR_EACH_SAFE (node, &mgr->ovn_uninstall_hash) { >> - ovn_uninstall_hash_destroy(mgr, node->data); >> + ovn_uninstall_hash_destroy(mgr, node); >> } >> ovs_assert(shash_is_empty(&mgr->ovn_uninstall_hash)); >> >> @@ -789,20 +790,13 @@ ovs_iface_destroy(struct if_status_mgr *mgr, struct >> ovs_iface *iface) >> } >> >> static void >> -ovn_uninstall_hash_destroy(struct if_status_mgr *mgr, char *name) >> +ovn_uninstall_hash_destroy(struct if_status_mgr *mgr, struct shash_node >> *node) >> { >> - struct shash_node *node = shash_find(&mgr->ovn_uninstall_hash, name); >> - char *node_name = NULL; >> - if (node) { >> - free(node->data); >> - VLOG_DBG("Interface name %s destroy", name); >> - node_name = shash_steal(&mgr->ovn_uninstall_hash, node); >> - ovn_uninstall_hash_account_mem(name, true); >> - free(node_name); >> - } else { >> - static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); >> - VLOG_WARN_RL(&rl, "Interface name %s not found", name); >> - } >> + free(node->data); >> + VLOG_DBG("Interface name %s destroy", node->name); >> + char *node_name = shash_steal(&mgr->ovn_uninstall_hash, node); >> + ovn_uninstall_hash_account_mem(node_name, true); >> + free(node_name); >> > > The simplification makes sense, however I'm not sure how it could lead to > the node > not being removed from the shash. The name is taken directly from the node > so the > shash_find() should always succeed. Or I'm I missing something? > > The issue was that we were calling ovn_uninstall_hash_destroy(mgr, *node->data*). Then shash_find() did not find anything as shash_find() looks for names, not for data. I guess using ovn_uninstall_hash_destroy(mgr, *node->name*) and leaving ovn_uninstall_hash_destroy intact would have worked as well. ... > } >> >> static void >> diff --git a/tests/system-ovn.at b/tests/system-ovn.at >> index ddb3d14e9..e9d889f43 100644 >> --- a/tests/system-ovn.at >> +++ b/tests/system-ovn.at >> @@ -11325,7 +11325,25 @@ check_ovn_installed >> check_ports_up >> check_ports_bound >> >> -OVS_APP_EXIT_AND_WAIT([ovn-controller]) >> +AS_BOX(["Leave some ovn-installed while closing ovn-controller"]) >> +# Block IDL from ovn-controller to OVSDB >> +stop_ovsdb_controller_updates $TCP_PORT >> +remove_iface_id vif2 >> +ensure_controller_run >> + >> +# OVSDB should now be seen as read-only by ovn-controller >> +remove_iface_id vif1 >> +check ovn-nbctl --wait=hv sync >> + >> +# Stop ovsdb before ovn-controller to ensure it's not updated >> +as >> +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d >> +/connection dropped.*/d"]) >> + >> +# Don't use OVS_APP_EXIT... to use --restart to avoid cleaning up the >> databases. >> +TMPPID=$(cat $OVS_RUNDIR/ovn-controller.pid) >> +check ovs-appctl -t ovn-controller exit --restart >> +OVS_WAIT_WHILE([kill -0 $TMPPID 2>/dev/null]) >> > > Is the test supposed to show the assertion? Because I have tried to run it > in a loop > for a few minutes and it was always ok. > Yeah, it always does :-) Without sanitizer and using gcc, I assert in ovs_assert(shash_is_empty(&mgr->ovn_uninstall_hash)) in controller/if-status.c:261 I also see the following if running w/ clang & address sanitizer: ==681585==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020001a5e80 at pc 0x0000004e2a99 bp 0x7fffaa58b3e0 sp 0x7fffaa58ab90 READ of size 19 at 0x6020001a5e80 thread T0 #0 0x4e2a98 in __interceptor_strlen.part.0 (./controller/ovn-controller+0x4e2a98) #1 0x83e357 in shash_find ./ovs/lib/shash.c:237:35 #2 0x593708 in ovn_uninstall_hash_destroy ./controller/if-status.c:794:31 #3 0x593708 in if_status_mgr_clear ./controller/if-status.c:259:9 Do you see any WARN in the ovn-controller.log such as "Interface name ... not found" ? Do you see "Removing iface vif2 ovn-installed in OVS" (which you should) and "Removing iface vif1 ovn-installed in OVS" (which you should not) ? Which OS/version are you running on ? If running on some older OS w/o nft, I think that stop_ovsdb_controller_updates would not stop anything, and we would not hit the issue (except in some quite rare race conditions). > > >> >> as ovn-sb >> OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >> @@ -11336,9 +11354,6 @@ OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >> as northd >> OVS_APP_EXIT_AND_WAIT([ovn-northd]) >> >> -as >> -OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d >> -/connection dropped.*/d"]) >> AT_CLEANUP >> ]) >> >> -- >> 2.31.1 >> >> _______________________________________________ >> dev mailing list >> [email protected] >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> >> > Thanks, > Ales > > Thanks Xavier > -- > > Ales Musil > > Senior Software Engineer - OVN Core > > Red Hat EMEA <https://www.redhat.com> > > [email protected] > <https://red.ht/sig> > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
