On 12/3/25 10:59 PM, Mark Michelson via dev wrote:
> If a logical switch or logical router is added and removed very quickly,
> then it is possible for the IDL to place it in both the "new" and
> "deleted" tracked maps.
> 
> This was not being accounted for in the logical switch and logical
> router handlers.
> 
> For logical routers, this was not a huge issue. We would detect the
> logical router as deleted, then not be able to find the in-memory
> unsynced datapath. This would result in a recompute. Adding the check
> for this specific situation allows us to avoid an unnecessary recompute.
> 
> For logical switches, on the other hand, this was a much worse
> situation. The code first checks if the logical switch is new. In this
> case, the added and deleted logical switch matched this check, so we
> treated it as a new logical switch. This "new" logical switch would then
> get propagated to all other engine nodes. If this logical switch
> happened to reference other database rows that also had been deleted,
> this could cause chaos. One place in particular where this was observed
> was in the en_lb_data node. If the logical switch referenced a load
> balancer group that was deleted, then we would try to look up the load
> balancer group in our in-memory map of load balancer groups. However,
> since the actual database row had been deleted, it no longer existed in
> our in-memory map. This resulted in an assertion being triggered and
> crashing ovn-northd.
> 
> Added and deleted logical switches could also trigger a crash in the
> datapath syncing code as well. Since the logical switch was deleted, its
> memory could be freed by the IDL. This could cause a crash at any point
> in its processing.
> 
> The fix here is to detect the situation where the IDL reports the
> logical router or logical switch as both new and deleted. In this case,
> we treat it as a no-op and continue. This avoids both the recompute for
> logical routers, and the crashes with logical switches.
> 
> Reported-at: https://issues.redhat.com/browse/FDP-2780
> Reported-at: https://issues.redhat.com/browse/FDP-2805
> Signed-off-by: Mark Michelson <[email protected]>
> ---

Hi Mark,

Thanks for the fix!

Nit: missing fixes tag.

Fixes: 50a483086d80 ("northd: Add IP for new logical switches in 
en-datapath-logical-switch node.")

>  northd/en-datapath-logical-router.c |  7 +++++++
>  northd/en-datapath-logical-switch.c |  7 +++++++
>  tests/ovn-northd.at                 | 31 +++++++++++++++++++++++++++++
>  3 files changed, 45 insertions(+)
> 
> diff --git a/northd/en-datapath-logical-router.c 
> b/northd/en-datapath-logical-router.c
> index 4625edfd0..56785ebfb 100644
> --- a/northd/en-datapath-logical-router.c
> +++ b/northd/en-datapath-logical-router.c
> @@ -164,6 +164,13 @@ en_datapath_logical_router_logical_router_handler(struct 
> engine_node *node,
>      struct ovn_unsynced_datapath *udp;
>      const struct nbrec_logical_router *nbr;
>      NBREC_LOGICAL_ROUTER_TABLE_FOR_EACH_TRACKED (nbr, nb_lr_table) {
> +        /* If the logical router is added and removed within the same
> +         * transaction, then this is a no-op
> +         */
> +        if (nbrec_logical_router_is_new(nbr) &&
> +            nbrec_logical_router_is_deleted(nbr)) {
> +            continue;
> +        }
>          udp = ovn_unsynced_datapath_find(map, &nbr->header_.uuid);
>  
>          if (nbrec_logical_router_is_deleted(nbr) && !udp) {
> diff --git a/northd/en-datapath-logical-switch.c 
> b/northd/en-datapath-logical-switch.c
> index 0527239ce..12afbb45a 100644
> --- a/northd/en-datapath-logical-switch.c
> +++ b/northd/en-datapath-logical-switch.c
> @@ -136,6 +136,13 @@ datapath_logical_switch_handler(struct engine_node 
> *node, void *data)
>  
>      const struct nbrec_logical_switch *nbs;
>      NBREC_LOGICAL_SWITCH_TABLE_FOR_EACH_TRACKED (nbs, nb_ls_table) {
> +        /* If the logical switch is added and removed within the same
> +         * transaction, then this is a no-op
> +         */
> +        if (nbrec_logical_switch_is_new(nbs) &&
> +            nbrec_logical_switch_is_deleted(nbs)) {
> +            continue;
> +        }
>          struct ovn_unsynced_datapath *udp =
>              ovn_unsynced_datapath_find(map, &nbs->header_.uuid);
>  
> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> index b5674a464..44524a57a 100644
> --- a/tests/ovn-northd.at
> +++ b/tests/ovn-northd.at
> @@ -18069,3 +18069,34 @@ wait_row_count Igmp_Group 0 address=mrouters
>  OVN_CLEANUP_NORTHD
>  AT_CLEANUP
>  ])
> +
> +OVN_FOR_EACH_NORTHD_NO_HV([
> +AT_SETUP([Added and Deleted Logical Switch])
> +ovn_start
> +
> +# For this test, we will pause northd. While northd is paused, we will
> +# add a logical switch, a load balancer, and a load balancer group. The load
> +# balancer will be in the load balancer group, and the logical switch will 
> use
> +# the load balancer group. We will then delete the logical switch, the load
> +# balancer, and the load balancer group. We then will unpause ovn-northd.
> +#
> +# The test only fails if putting northd to sleep or waking it up fails due to
> +# ovn-northd crashing.
> +#
> +# We run this a few times in a loop just to be certain we aren't causing any
> +# catastrophic issues.
> +for (( i=0; i<5; i++ )) ; do

Nit: I'd write this as:

for i in $(seq 5); do

> +    sleep_northd
> +    check ovn-nbctl ls-add ls1
> +    check ovn-nbctl lb-add lb1 192.168.0.1 10.0.0.1
> +    lb_uuid=$(ovn-nbctl --bare --columns=_uuid find load_balancer name=lb1)

Nit: I'd use fetch_column:

lb_uuid=$(fetch_column nb:Load_Balancer _uuid name=lb1

> +    lbg_uuid=$(ovn-nbctl create load_balancer_group name=lbg1 
> load_balancer=$lb_uuid)
> +    check ovn-nbctl set logical_switch ls1 load_balancer_group=$lbg_uuid
> +    check ovn-nbctl ls-del ls1
> +    check ovn-nbctl lb-del lb1
> +    check ovn-nbctl --all destroy load_balancer_group
> +    wake_up_northd
> +done
> +OVN_CLEANUP_NORTHD
> +AT_CLEANUP
> +])

With the tiny nits addressed:

Acked-by: Dumitru Ceara <[email protected]>

Regards,
Dumitru


_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to