H Ales

Thanks for reviewing the patch and for your comments.

On Wed, Aug 7, 2024 at 1:45 PM Ales Musil <[email protected]> wrote:

>
>
> On Thu, Jul 25, 2024 at 3:16 PM Xavier Simonart <[email protected]>
> wrote:
>
>> There was a potential ovs_assert when exiting ovn-controller:
>>     controller/if-status.c:261: assertion
>> shash_is_empty(&mgr->ovn_uninstall_hash) failed in if_status_mgr_clear()
>>
>> Fixes: 395eac485b87 ("ovn-controller: fixed ovn-installed not always
>> properly added or removed.")
>>
>> Signed-off-by: Xavier Simonart <[email protected]>
>> ---
>>
>
> Hi Xavier,
>
> thank you for the patch. I have two questions below.
>
>  controller/if-status.c | 24 +++++++++---------------
>>  tests/system-ovn.at    | 23 +++++++++++++++++++----
>>  2 files changed, 28 insertions(+), 19 deletions(-)
>>
>> diff --git a/controller/if-status.c b/controller/if-status.c
>> index 9a7488057..ada78a18b 100644
>> --- a/controller/if-status.c
>> +++ b/controller/if-status.c
>> @@ -219,7 +219,8 @@ ovs_iface_create(struct if_status_mgr *, const char
>> *iface_id,
>>  static void add_to_ovn_uninstall_hash(struct if_status_mgr *, const char
>> *,
>>                                        const struct uuid *);
>>  static void ovs_iface_destroy(struct if_status_mgr *, struct ovs_iface
>> *);
>> -static void ovn_uninstall_hash_destroy(struct if_status_mgr *mgr, char
>> *name);
>> +static void ovn_uninstall_hash_destroy(struct if_status_mgr *mgr,
>> +                                       struct shash_node *node);
>>  static void ovs_iface_set_state(struct if_status_mgr *, struct ovs_iface
>> *,
>>                                  enum if_state);
>>
>> @@ -256,7 +257,7 @@ if_status_mgr_clear(struct if_status_mgr *mgr)
>>      ovs_assert(shash_is_empty(&mgr->ifaces));
>>
>>      SHASH_FOR_EACH_SAFE (node, &mgr->ovn_uninstall_hash) {
>> -        ovn_uninstall_hash_destroy(mgr, node->data);
>> +        ovn_uninstall_hash_destroy(mgr, node);
>>      }
>>      ovs_assert(shash_is_empty(&mgr->ovn_uninstall_hash));
>>
>> @@ -789,20 +790,13 @@ ovs_iface_destroy(struct if_status_mgr *mgr, struct
>> ovs_iface *iface)
>>  }
>>
>>  static void
>> -ovn_uninstall_hash_destroy(struct if_status_mgr *mgr, char *name)
>> +ovn_uninstall_hash_destroy(struct if_status_mgr *mgr, struct shash_node
>> *node)
>>  {
>> -    struct shash_node *node = shash_find(&mgr->ovn_uninstall_hash, name);
>> -    char *node_name = NULL;
>> -    if (node) {
>> -        free(node->data);
>> -        VLOG_DBG("Interface name %s destroy", name);
>> -        node_name = shash_steal(&mgr->ovn_uninstall_hash, node);
>> -        ovn_uninstall_hash_account_mem(name, true);
>> -        free(node_name);
>> -    } else {
>> -        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
>> -        VLOG_WARN_RL(&rl, "Interface name %s not found", name);
>> -    }
>> +    free(node->data);
>> +    VLOG_DBG("Interface name %s destroy", node->name);
>> +    char *node_name = shash_steal(&mgr->ovn_uninstall_hash, node);
>> +    ovn_uninstall_hash_account_mem(node_name, true);
>> +    free(node_name);
>>
>
> The simplification makes sense, however I'm not sure how it could lead to
> the node
> not being removed from the shash. The name is taken directly from the node
> so the
> shash_find() should always succeed. Or I'm I missing something?
>
>
The issue was that we were calling ovn_uninstall_hash_destroy(mgr,
*node->data*). Then shash_find() did not find anything as shash_find()
looks for names, not for data.
I guess using  ovn_uninstall_hash_destroy(mgr, *node->name*) and leaving
ovn_uninstall_hash_destroy intact would have worked as well.
...

>  }
>>
>>  static void
>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
>> index ddb3d14e9..e9d889f43 100644
>> --- a/tests/system-ovn.at
>> +++ b/tests/system-ovn.at
>> @@ -11325,7 +11325,25 @@ check_ovn_installed
>>  check_ports_up
>>  check_ports_bound
>>
>> -OVS_APP_EXIT_AND_WAIT([ovn-controller])
>> +AS_BOX(["Leave some ovn-installed while closing ovn-controller"])
>> +# Block IDL from ovn-controller to OVSDB
>> +stop_ovsdb_controller_updates $TCP_PORT
>> +remove_iface_id vif2
>> +ensure_controller_run
>> +
>> +# OVSDB should now be seen as read-only by ovn-controller
>> +remove_iface_id vif1
>> +check ovn-nbctl --wait=hv sync
>> +
>> +# Stop ovsdb before ovn-controller to ensure it's not updated
>> +as
>> +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
>> +/connection dropped.*/d"])
>> +
>> +# Don't use OVS_APP_EXIT... to use --restart to avoid cleaning up the
>> databases.
>> +TMPPID=$(cat $OVS_RUNDIR/ovn-controller.pid)
>> +check ovs-appctl -t ovn-controller exit --restart
>> +OVS_WAIT_WHILE([kill -0 $TMPPID 2>/dev/null])
>>
>
> Is the test supposed to show the assertion? Because I have tried to run it
> in a loop
> for a few minutes and it was always ok.
>
Yeah, it always does :-)
Without sanitizer and using gcc, I assert in
ovs_assert(shash_is_empty(&mgr->ovn_uninstall_hash)) in
controller/if-status.c:261
I also see the following if running w/ clang & address sanitizer:
==681585==ERROR: AddressSanitizer: heap-buffer-overflow on address
0x6020001a5e80 at pc 0x0000004e2a99 bp 0x7fffaa58b3e0 sp 0x7fffaa58ab90
READ of size 19 at 0x6020001a5e80 thread T0
    #0 0x4e2a98 in __interceptor_strlen.part.0
(./controller/ovn-controller+0x4e2a98)
    #1 0x83e357 in shash_find ./ovs/lib/shash.c:237:35
    #2 0x593708 in ovn_uninstall_hash_destroy
./controller/if-status.c:794:31
    #3 0x593708 in if_status_mgr_clear ./controller/if-status.c:259:9

Do you see any WARN in the ovn-controller.log such as "Interface name ...
not found" ?
Do you see "Removing iface vif2 ovn-installed in OVS" (which you should)
and "Removing iface vif1 ovn-installed in OVS" (which you should not) ?
Which OS/version are you running on ?
If running on some older OS w/o nft, I think that
stop_ovsdb_controller_updates would not stop anything, and we would not hit
the issue (except in some quite rare race conditions).

>
>
>>
>>  as ovn-sb
>>  OVS_APP_EXIT_AND_WAIT([ovsdb-server])
>> @@ -11336,9 +11354,6 @@ OVS_APP_EXIT_AND_WAIT([ovsdb-server])
>>  as northd
>>  OVS_APP_EXIT_AND_WAIT([ovn-northd])
>>
>> -as
>> -OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
>> -/connection dropped.*/d"])
>>  AT_CLEANUP
>>  ])
>>
>> --
>> 2.31.1
>>
>> _______________________________________________
>> dev mailing list
>> [email protected]
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>>
> Thanks,
> Ales
>
> Thanks
Xavier

> --
>
> Ales Musil
>
> Senior Software Engineer - OVN Core
>
> Red Hat EMEA <https://www.redhat.com>
>
> [email protected]
> <https://red.ht/sig>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to