On Thu, Aug 8, 2024 at 10:26 AM Xavier Simonart <[email protected]> wrote:

> H Ales
>
> Thanks for reviewing the patch and for your comments.
>
> On Wed, Aug 7, 2024 at 1:45 PM Ales Musil <[email protected]> wrote:
>
>>
>>
>> On Thu, Jul 25, 2024 at 3:16 PM Xavier Simonart <[email protected]>
>> wrote:
>>
>>> There was a potential ovs_assert when exiting ovn-controller:
>>>     controller/if-status.c:261: assertion
>>> shash_is_empty(&mgr->ovn_uninstall_hash) failed in if_status_mgr_clear()
>>>
>>> Fixes: 395eac485b87 ("ovn-controller: fixed ovn-installed not always
>>> properly added or removed.")
>>>
>>> Signed-off-by: Xavier Simonart <[email protected]>
>>> ---
>>>
>>
>> Hi Xavier,
>>
>> thank you for the patch. I have two questions below.
>>
>>  controller/if-status.c | 24 +++++++++---------------
>>>  tests/system-ovn.at    | 23 +++++++++++++++++++----
>>>  2 files changed, 28 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/controller/if-status.c b/controller/if-status.c
>>> index 9a7488057..ada78a18b 100644
>>> --- a/controller/if-status.c
>>> +++ b/controller/if-status.c
>>> @@ -219,7 +219,8 @@ ovs_iface_create(struct if_status_mgr *, const char
>>> *iface_id,
>>>  static void add_to_ovn_uninstall_hash(struct if_status_mgr *, const
>>> char *,
>>>                                        const struct uuid *);
>>>  static void ovs_iface_destroy(struct if_status_mgr *, struct ovs_iface
>>> *);
>>> -static void ovn_uninstall_hash_destroy(struct if_status_mgr *mgr, char
>>> *name);
>>> +static void ovn_uninstall_hash_destroy(struct if_status_mgr *mgr,
>>> +                                       struct shash_node *node);
>>>  static void ovs_iface_set_state(struct if_status_mgr *, struct
>>> ovs_iface *,
>>>                                  enum if_state);
>>>
>>> @@ -256,7 +257,7 @@ if_status_mgr_clear(struct if_status_mgr *mgr)
>>>      ovs_assert(shash_is_empty(&mgr->ifaces));
>>>
>>>      SHASH_FOR_EACH_SAFE (node, &mgr->ovn_uninstall_hash) {
>>> -        ovn_uninstall_hash_destroy(mgr, node->data);
>>> +        ovn_uninstall_hash_destroy(mgr, node);
>>>      }
>>>      ovs_assert(shash_is_empty(&mgr->ovn_uninstall_hash));
>>>
>>> @@ -789,20 +790,13 @@ ovs_iface_destroy(struct if_status_mgr *mgr,
>>> struct ovs_iface *iface)
>>>  }
>>>
>>>  static void
>>> -ovn_uninstall_hash_destroy(struct if_status_mgr *mgr, char *name)
>>> +ovn_uninstall_hash_destroy(struct if_status_mgr *mgr, struct shash_node
>>> *node)
>>>  {
>>> -    struct shash_node *node = shash_find(&mgr->ovn_uninstall_hash,
>>> name);
>>> -    char *node_name = NULL;
>>> -    if (node) {
>>> -        free(node->data);
>>> -        VLOG_DBG("Interface name %s destroy", name);
>>> -        node_name = shash_steal(&mgr->ovn_uninstall_hash, node);
>>> -        ovn_uninstall_hash_account_mem(name, true);
>>> -        free(node_name);
>>> -    } else {
>>> -        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
>>> -        VLOG_WARN_RL(&rl, "Interface name %s not found", name);
>>> -    }
>>> +    free(node->data);
>>> +    VLOG_DBG("Interface name %s destroy", node->name);
>>> +    char *node_name = shash_steal(&mgr->ovn_uninstall_hash, node);
>>> +    ovn_uninstall_hash_account_mem(node_name, true);
>>> +    free(node_name);
>>>
>>
>> The simplification makes sense, however I'm not sure how it could lead to
>> the node
>> not being removed from the shash. The name is taken directly from the
>> node so the
>> shash_find() should always succeed. Or I'm I missing something?
>>
>>
> The issue was that we were calling ovn_uninstall_hash_destroy(mgr,
> *node->data*). Then shash_find() did not find anything as shash_find()
> looks for names, not for data.
> I guess using  ovn_uninstall_hash_destroy(mgr, *node->name*) and leaving
> ovn_uninstall_hash_destroy intact would have worked as well.
>

Ok that makes sense, for some reason I saw node->name being passed.

...
>
>>  }
>>>
>>>  static void
>>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
>>> index ddb3d14e9..e9d889f43 100644
>>> --- a/tests/system-ovn.at
>>> +++ b/tests/system-ovn.at
>>> @@ -11325,7 +11325,25 @@ check_ovn_installed
>>>  check_ports_up
>>>  check_ports_bound
>>>
>>> -OVS_APP_EXIT_AND_WAIT([ovn-controller])
>>> +AS_BOX(["Leave some ovn-installed while closing ovn-controller"])
>>> +# Block IDL from ovn-controller to OVSDB
>>> +stop_ovsdb_controller_updates $TCP_PORT
>>> +remove_iface_id vif2
>>> +ensure_controller_run
>>> +
>>> +# OVSDB should now be seen as read-only by ovn-controller
>>> +remove_iface_id vif1
>>> +check ovn-nbctl --wait=hv sync
>>> +
>>> +# Stop ovsdb before ovn-controller to ensure it's not updated
>>> +as
>>> +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
>>> +/connection dropped.*/d"])
>>> +
>>> +# Don't use OVS_APP_EXIT... to use --restart to avoid cleaning up the
>>> databases.
>>> +TMPPID=$(cat $OVS_RUNDIR/ovn-controller.pid)
>>> +check ovs-appctl -t ovn-controller exit --restart
>>> +OVS_WAIT_WHILE([kill -0 $TMPPID 2>/dev/null])
>>>
>>
>> Is the test supposed to show the assertion? Because I have tried to run
>> it in a loop
>> for a few minutes and it was always ok.
>>
> Yeah, it always does :-)
> Without sanitizer and using gcc, I assert in
> ovs_assert(shash_is_empty(&mgr->ovn_uninstall_hash)) in
> controller/if-status.c:261
> I also see the following if running w/ clang & address sanitizer:
> ==681585==ERROR: AddressSanitizer: heap-buffer-overflow on address
> 0x6020001a5e80 at pc 0x0000004e2a99 bp 0x7fffaa58b3e0 sp 0x7fffaa58ab90
> READ of size 19 at 0x6020001a5e80 thread T0
>     #0 0x4e2a98 in __interceptor_strlen.part.0
> (./controller/ovn-controller+0x4e2a98)
>     #1 0x83e357 in shash_find ./ovs/lib/shash.c:237:35
>     #2 0x593708 in ovn_uninstall_hash_destroy
> ./controller/if-status.c:794:31
>     #3 0x593708 in if_status_mgr_clear ./controller/if-status.c:259:9
>
> Do you see any WARN in the ovn-controller.log such as "Interface name ...
> not found" ?
> Do you see "Removing iface vif2 ovn-installed in OVS" (which you should)
> and "Removing iface vif1 ovn-installed in OVS" (which you should not) ?
> Which OS/version are you running on ?
> If running on some older OS w/o nft, I think that
> stop_ovsdb_controller_updates would not stop anything, and we would not hit
> the issue (except in some quite rare race conditions).
>

I think I have found the issue, I was missing the nftables package. Maybe
we should add skip for that so it is obvious?
Also I don't see the nftables package being installed in the containers.
Those tests are most likely not running
as intended.


>
>>
>>>
>>>  as ovn-sb
>>>  OVS_APP_EXIT_AND_WAIT([ovsdb-server])
>>> @@ -11336,9 +11354,6 @@ OVS_APP_EXIT_AND_WAIT([ovsdb-server])
>>>  as northd
>>>  OVS_APP_EXIT_AND_WAIT([ovn-northd])
>>>
>>> -as
>>> -OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
>>> -/connection dropped.*/d"])
>>>  AT_CLEANUP
>>>  ])
>>>
>>> --
>>> 2.31.1
>>>
>>> _______________________________________________
>>> dev mailing list
>>> [email protected]
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>
>>>
>> Thanks,
>> Ales
>>
>> Thanks
> Xavier
>
>> --
>>
>> Ales Musil
>>
>> Senior Software Engineer - OVN Core
>>
>> Red Hat EMEA <https://www.redhat.com>
>>
>> [email protected]
>> <https://red.ht/sig>
>>
>
Thanks,
Ales

-- 

Ales Musil

Senior Software Engineer - OVN Core

Red Hat EMEA <https://www.redhat.com>

[email protected]
<https://red.ht/sig>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to