On Thu, Aug 8, 2024 at 10:26 AM Xavier Simonart <[email protected]> wrote:
> H Ales > > Thanks for reviewing the patch and for your comments. > > On Wed, Aug 7, 2024 at 1:45 PM Ales Musil <[email protected]> wrote: > >> >> >> On Thu, Jul 25, 2024 at 3:16 PM Xavier Simonart <[email protected]> >> wrote: >> >>> There was a potential ovs_assert when exiting ovn-controller: >>> controller/if-status.c:261: assertion >>> shash_is_empty(&mgr->ovn_uninstall_hash) failed in if_status_mgr_clear() >>> >>> Fixes: 395eac485b87 ("ovn-controller: fixed ovn-installed not always >>> properly added or removed.") >>> >>> Signed-off-by: Xavier Simonart <[email protected]> >>> --- >>> >> >> Hi Xavier, >> >> thank you for the patch. I have two questions below. >> >> controller/if-status.c | 24 +++++++++--------------- >>> tests/system-ovn.at | 23 +++++++++++++++++++---- >>> 2 files changed, 28 insertions(+), 19 deletions(-) >>> >>> diff --git a/controller/if-status.c b/controller/if-status.c >>> index 9a7488057..ada78a18b 100644 >>> --- a/controller/if-status.c >>> +++ b/controller/if-status.c >>> @@ -219,7 +219,8 @@ ovs_iface_create(struct if_status_mgr *, const char >>> *iface_id, >>> static void add_to_ovn_uninstall_hash(struct if_status_mgr *, const >>> char *, >>> const struct uuid *); >>> static void ovs_iface_destroy(struct if_status_mgr *, struct ovs_iface >>> *); >>> -static void ovn_uninstall_hash_destroy(struct if_status_mgr *mgr, char >>> *name); >>> +static void ovn_uninstall_hash_destroy(struct if_status_mgr *mgr, >>> + struct shash_node *node); >>> static void ovs_iface_set_state(struct if_status_mgr *, struct >>> ovs_iface *, >>> enum if_state); >>> >>> @@ -256,7 +257,7 @@ if_status_mgr_clear(struct if_status_mgr *mgr) >>> ovs_assert(shash_is_empty(&mgr->ifaces)); >>> >>> SHASH_FOR_EACH_SAFE (node, &mgr->ovn_uninstall_hash) { >>> - ovn_uninstall_hash_destroy(mgr, node->data); >>> + ovn_uninstall_hash_destroy(mgr, node); >>> } >>> ovs_assert(shash_is_empty(&mgr->ovn_uninstall_hash)); >>> >>> @@ -789,20 +790,13 @@ ovs_iface_destroy(struct if_status_mgr *mgr, >>> struct ovs_iface *iface) >>> } >>> >>> static void >>> -ovn_uninstall_hash_destroy(struct if_status_mgr *mgr, char *name) >>> +ovn_uninstall_hash_destroy(struct if_status_mgr *mgr, struct shash_node >>> *node) >>> { >>> - struct shash_node *node = shash_find(&mgr->ovn_uninstall_hash, >>> name); >>> - char *node_name = NULL; >>> - if (node) { >>> - free(node->data); >>> - VLOG_DBG("Interface name %s destroy", name); >>> - node_name = shash_steal(&mgr->ovn_uninstall_hash, node); >>> - ovn_uninstall_hash_account_mem(name, true); >>> - free(node_name); >>> - } else { >>> - static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); >>> - VLOG_WARN_RL(&rl, "Interface name %s not found", name); >>> - } >>> + free(node->data); >>> + VLOG_DBG("Interface name %s destroy", node->name); >>> + char *node_name = shash_steal(&mgr->ovn_uninstall_hash, node); >>> + ovn_uninstall_hash_account_mem(node_name, true); >>> + free(node_name); >>> >> >> The simplification makes sense, however I'm not sure how it could lead to >> the node >> not being removed from the shash. The name is taken directly from the >> node so the >> shash_find() should always succeed. Or I'm I missing something? >> >> > The issue was that we were calling ovn_uninstall_hash_destroy(mgr, > *node->data*). Then shash_find() did not find anything as shash_find() > looks for names, not for data. > I guess using ovn_uninstall_hash_destroy(mgr, *node->name*) and leaving > ovn_uninstall_hash_destroy intact would have worked as well. > Ok that makes sense, for some reason I saw node->name being passed. ... > >> } >>> >>> static void >>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at >>> index ddb3d14e9..e9d889f43 100644 >>> --- a/tests/system-ovn.at >>> +++ b/tests/system-ovn.at >>> @@ -11325,7 +11325,25 @@ check_ovn_installed >>> check_ports_up >>> check_ports_bound >>> >>> -OVS_APP_EXIT_AND_WAIT([ovn-controller]) >>> +AS_BOX(["Leave some ovn-installed while closing ovn-controller"]) >>> +# Block IDL from ovn-controller to OVSDB >>> +stop_ovsdb_controller_updates $TCP_PORT >>> +remove_iface_id vif2 >>> +ensure_controller_run >>> + >>> +# OVSDB should now be seen as read-only by ovn-controller >>> +remove_iface_id vif1 >>> +check ovn-nbctl --wait=hv sync >>> + >>> +# Stop ovsdb before ovn-controller to ensure it's not updated >>> +as >>> +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d >>> +/connection dropped.*/d"]) >>> + >>> +# Don't use OVS_APP_EXIT... to use --restart to avoid cleaning up the >>> databases. >>> +TMPPID=$(cat $OVS_RUNDIR/ovn-controller.pid) >>> +check ovs-appctl -t ovn-controller exit --restart >>> +OVS_WAIT_WHILE([kill -0 $TMPPID 2>/dev/null]) >>> >> >> Is the test supposed to show the assertion? Because I have tried to run >> it in a loop >> for a few minutes and it was always ok. >> > Yeah, it always does :-) > Without sanitizer and using gcc, I assert in > ovs_assert(shash_is_empty(&mgr->ovn_uninstall_hash)) in > controller/if-status.c:261 > I also see the following if running w/ clang & address sanitizer: > ==681585==ERROR: AddressSanitizer: heap-buffer-overflow on address > 0x6020001a5e80 at pc 0x0000004e2a99 bp 0x7fffaa58b3e0 sp 0x7fffaa58ab90 > READ of size 19 at 0x6020001a5e80 thread T0 > #0 0x4e2a98 in __interceptor_strlen.part.0 > (./controller/ovn-controller+0x4e2a98) > #1 0x83e357 in shash_find ./ovs/lib/shash.c:237:35 > #2 0x593708 in ovn_uninstall_hash_destroy > ./controller/if-status.c:794:31 > #3 0x593708 in if_status_mgr_clear ./controller/if-status.c:259:9 > > Do you see any WARN in the ovn-controller.log such as "Interface name ... > not found" ? > Do you see "Removing iface vif2 ovn-installed in OVS" (which you should) > and "Removing iface vif1 ovn-installed in OVS" (which you should not) ? > Which OS/version are you running on ? > If running on some older OS w/o nft, I think that > stop_ovsdb_controller_updates would not stop anything, and we would not hit > the issue (except in some quite rare race conditions). > I think I have found the issue, I was missing the nftables package. Maybe we should add skip for that so it is obvious? Also I don't see the nftables package being installed in the containers. Those tests are most likely not running as intended. > >> >>> >>> as ovn-sb >>> OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>> @@ -11336,9 +11354,6 @@ OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>> as northd >>> OVS_APP_EXIT_AND_WAIT([ovn-northd]) >>> >>> -as >>> -OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d >>> -/connection dropped.*/d"]) >>> AT_CLEANUP >>> ]) >>> >>> -- >>> 2.31.1 >>> >>> _______________________________________________ >>> dev mailing list >>> [email protected] >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>> >>> >> Thanks, >> Ales >> >> Thanks > Xavier > >> -- >> >> Ales Musil >> >> Senior Software Engineer - OVN Core >> >> Red Hat EMEA <https://www.redhat.com> >> >> [email protected] >> <https://red.ht/sig> >> > Thanks, Ales -- Ales Musil Senior Software Engineer - OVN Core Red Hat EMEA <https://www.redhat.com> [email protected] <https://red.ht/sig> _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
