On 12/3/25 10:59 PM, Mark Michelson via dev wrote:
> If a logical switch or logical router is added and removed very quickly,
> then it is possible for the IDL to place it in both the "new" and
> "deleted" tracked maps.
>
> This was not being accounted for in the logical switch and logical
> router handlers.
>
> For logical routers, this was not a huge issue. We would detect the
> logical router as deleted, then not be able to find the in-memory
> unsynced datapath. This would result in a recompute. Adding the check
> for this specific situation allows us to avoid an unnecessary recompute.
>
> For logical switches, on the other hand, this was a much worse
> situation. The code first checks if the logical switch is new. In this
> case, the added and deleted logical switch matched this check, so we
> treated it as a new logical switch. This "new" logical switch would then
> get propagated to all other engine nodes. If this logical switch
> happened to reference other database rows that also had been deleted,
> this could cause chaos. One place in particular where this was observed
> was in the en_lb_data node. If the logical switch referenced a load
> balancer group that was deleted, then we would try to look up the load
> balancer group in our in-memory map of load balancer groups. However,
> since the actual database row had been deleted, it no longer existed in
> our in-memory map. This resulted in an assertion being triggered and
> crashing ovn-northd.
>
> Added and deleted logical switches could also trigger a crash in the
> datapath syncing code as well. Since the logical switch was deleted, its
> memory could be freed by the IDL. This could cause a crash at any point
> in its processing.
>
> The fix here is to detect the situation where the IDL reports the
> logical router or logical switch as both new and deleted. In this case,
> we treat it as a no-op and continue. This avoids both the recompute for
> logical routers, and the crashes with logical switches.
>
> Reported-at: https://issues.redhat.com/browse/FDP-2780
> Reported-at: https://issues.redhat.com/browse/FDP-2805
> Signed-off-by: Mark Michelson <[email protected]>
> ---
Hi Mark,
Thanks for the fix!
Nit: missing fixes tag.
Fixes: 50a483086d80 ("northd: Add IP for new logical switches in
en-datapath-logical-switch node.")
> northd/en-datapath-logical-router.c | 7 +++++++
> northd/en-datapath-logical-switch.c | 7 +++++++
> tests/ovn-northd.at | 31 +++++++++++++++++++++++++++++
> 3 files changed, 45 insertions(+)
>
> diff --git a/northd/en-datapath-logical-router.c
> b/northd/en-datapath-logical-router.c
> index 4625edfd0..56785ebfb 100644
> --- a/northd/en-datapath-logical-router.c
> +++ b/northd/en-datapath-logical-router.c
> @@ -164,6 +164,13 @@ en_datapath_logical_router_logical_router_handler(struct
> engine_node *node,
> struct ovn_unsynced_datapath *udp;
> const struct nbrec_logical_router *nbr;
> NBREC_LOGICAL_ROUTER_TABLE_FOR_EACH_TRACKED (nbr, nb_lr_table) {
> + /* If the logical router is added and removed within the same
> + * transaction, then this is a no-op
> + */
> + if (nbrec_logical_router_is_new(nbr) &&
> + nbrec_logical_router_is_deleted(nbr)) {
> + continue;
> + }
> udp = ovn_unsynced_datapath_find(map, &nbr->header_.uuid);
>
> if (nbrec_logical_router_is_deleted(nbr) && !udp) {
> diff --git a/northd/en-datapath-logical-switch.c
> b/northd/en-datapath-logical-switch.c
> index 0527239ce..12afbb45a 100644
> --- a/northd/en-datapath-logical-switch.c
> +++ b/northd/en-datapath-logical-switch.c
> @@ -136,6 +136,13 @@ datapath_logical_switch_handler(struct engine_node
> *node, void *data)
>
> const struct nbrec_logical_switch *nbs;
> NBREC_LOGICAL_SWITCH_TABLE_FOR_EACH_TRACKED (nbs, nb_ls_table) {
> + /* If the logical switch is added and removed within the same
> + * transaction, then this is a no-op
> + */
> + if (nbrec_logical_switch_is_new(nbs) &&
> + nbrec_logical_switch_is_deleted(nbs)) {
> + continue;
> + }
> struct ovn_unsynced_datapath *udp =
> ovn_unsynced_datapath_find(map, &nbs->header_.uuid);
>
> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> index b5674a464..44524a57a 100644
> --- a/tests/ovn-northd.at
> +++ b/tests/ovn-northd.at
> @@ -18069,3 +18069,34 @@ wait_row_count Igmp_Group 0 address=mrouters
> OVN_CLEANUP_NORTHD
> AT_CLEANUP
> ])
> +
> +OVN_FOR_EACH_NORTHD_NO_HV([
> +AT_SETUP([Added and Deleted Logical Switch])
> +ovn_start
> +
> +# For this test, we will pause northd. While northd is paused, we will
> +# add a logical switch, a load balancer, and a load balancer group. The load
> +# balancer will be in the load balancer group, and the logical switch will
> use
> +# the load balancer group. We will then delete the logical switch, the load
> +# balancer, and the load balancer group. We then will unpause ovn-northd.
> +#
> +# The test only fails if putting northd to sleep or waking it up fails due to
> +# ovn-northd crashing.
> +#
> +# We run this a few times in a loop just to be certain we aren't causing any
> +# catastrophic issues.
> +for (( i=0; i<5; i++ )) ; do
Nit: I'd write this as:
for i in $(seq 5); do
> + sleep_northd
> + check ovn-nbctl ls-add ls1
> + check ovn-nbctl lb-add lb1 192.168.0.1 10.0.0.1
> + lb_uuid=$(ovn-nbctl --bare --columns=_uuid find load_balancer name=lb1)
Nit: I'd use fetch_column:
lb_uuid=$(fetch_column nb:Load_Balancer _uuid name=lb1
> + lbg_uuid=$(ovn-nbctl create load_balancer_group name=lbg1
> load_balancer=$lb_uuid)
> + check ovn-nbctl set logical_switch ls1 load_balancer_group=$lbg_uuid
> + check ovn-nbctl ls-del ls1
> + check ovn-nbctl lb-del lb1
> + check ovn-nbctl --all destroy load_balancer_group
> + wake_up_northd
> +done
> +OVN_CLEANUP_NORTHD
> +AT_CLEANUP
> +])
With the tiny nits addressed:
Acked-by: Dumitru Ceara <[email protected]>
Regards,
Dumitru
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev