Thanks for the review, Dumitru. I addressed the nits and pushed the
change to main and branch-25.09.

On Thu, Dec 4, 2025 at 5:56 AM Dumitru Ceara <[email protected]> wrote:
>
> On 12/3/25 10:59 PM, Mark Michelson via dev wrote:
> > If a logical switch or logical router is added and removed very quickly,
> > then it is possible for the IDL to place it in both the "new" and
> > "deleted" tracked maps.
> >
> > This was not being accounted for in the logical switch and logical
> > router handlers.
> >
> > For logical routers, this was not a huge issue. We would detect the
> > logical router as deleted, then not be able to find the in-memory
> > unsynced datapath. This would result in a recompute. Adding the check
> > for this specific situation allows us to avoid an unnecessary recompute.
> >
> > For logical switches, on the other hand, this was a much worse
> > situation. The code first checks if the logical switch is new. In this
> > case, the added and deleted logical switch matched this check, so we
> > treated it as a new logical switch. This "new" logical switch would then
> > get propagated to all other engine nodes. If this logical switch
> > happened to reference other database rows that also had been deleted,
> > this could cause chaos. One place in particular where this was observed
> > was in the en_lb_data node. If the logical switch referenced a load
> > balancer group that was deleted, then we would try to look up the load
> > balancer group in our in-memory map of load balancer groups. However,
> > since the actual database row had been deleted, it no longer existed in
> > our in-memory map. This resulted in an assertion being triggered and
> > crashing ovn-northd.
> >
> > Added and deleted logical switches could also trigger a crash in the
> > datapath syncing code as well. Since the logical switch was deleted, its
> > memory could be freed by the IDL. This could cause a crash at any point
> > in its processing.
> >
> > The fix here is to detect the situation where the IDL reports the
> > logical router or logical switch as both new and deleted. In this case,
> > we treat it as a no-op and continue. This avoids both the recompute for
> > logical routers, and the crashes with logical switches.
> >
> > Reported-at: https://issues.redhat.com/browse/FDP-2780
> > Reported-at: https://issues.redhat.com/browse/FDP-2805
> > Signed-off-by: Mark Michelson <[email protected]>
> > ---
>
> Hi Mark,
>
> Thanks for the fix!
>
> Nit: missing fixes tag.
>
> Fixes: 50a483086d80 ("northd: Add IP for new logical switches in 
> en-datapath-logical-switch node.")
>
> >  northd/en-datapath-logical-router.c |  7 +++++++
> >  northd/en-datapath-logical-switch.c |  7 +++++++
> >  tests/ovn-northd.at                 | 31 +++++++++++++++++++++++++++++
> >  3 files changed, 45 insertions(+)
> >
> > diff --git a/northd/en-datapath-logical-router.c 
> > b/northd/en-datapath-logical-router.c
> > index 4625edfd0..56785ebfb 100644
> > --- a/northd/en-datapath-logical-router.c
> > +++ b/northd/en-datapath-logical-router.c
> > @@ -164,6 +164,13 @@ 
> > en_datapath_logical_router_logical_router_handler(struct engine_node *node,
> >      struct ovn_unsynced_datapath *udp;
> >      const struct nbrec_logical_router *nbr;
> >      NBREC_LOGICAL_ROUTER_TABLE_FOR_EACH_TRACKED (nbr, nb_lr_table) {
> > +        /* If the logical router is added and removed within the same
> > +         * transaction, then this is a no-op
> > +         */
> > +        if (nbrec_logical_router_is_new(nbr) &&
> > +            nbrec_logical_router_is_deleted(nbr)) {
> > +            continue;
> > +        }
> >          udp = ovn_unsynced_datapath_find(map, &nbr->header_.uuid);
> >
> >          if (nbrec_logical_router_is_deleted(nbr) && !udp) {
> > diff --git a/northd/en-datapath-logical-switch.c 
> > b/northd/en-datapath-logical-switch.c
> > index 0527239ce..12afbb45a 100644
> > --- a/northd/en-datapath-logical-switch.c
> > +++ b/northd/en-datapath-logical-switch.c
> > @@ -136,6 +136,13 @@ datapath_logical_switch_handler(struct engine_node 
> > *node, void *data)
> >
> >      const struct nbrec_logical_switch *nbs;
> >      NBREC_LOGICAL_SWITCH_TABLE_FOR_EACH_TRACKED (nbs, nb_ls_table) {
> > +        /* If the logical switch is added and removed within the same
> > +         * transaction, then this is a no-op
> > +         */
> > +        if (nbrec_logical_switch_is_new(nbs) &&
> > +            nbrec_logical_switch_is_deleted(nbs)) {
> > +            continue;
> > +        }
> >          struct ovn_unsynced_datapath *udp =
> >              ovn_unsynced_datapath_find(map, &nbs->header_.uuid);
> >
> > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> > index b5674a464..44524a57a 100644
> > --- a/tests/ovn-northd.at
> > +++ b/tests/ovn-northd.at
> > @@ -18069,3 +18069,34 @@ wait_row_count Igmp_Group 0 address=mrouters
> >  OVN_CLEANUP_NORTHD
> >  AT_CLEANUP
> >  ])
> > +
> > +OVN_FOR_EACH_NORTHD_NO_HV([
> > +AT_SETUP([Added and Deleted Logical Switch])
> > +ovn_start
> > +
> > +# For this test, we will pause northd. While northd is paused, we will
> > +# add a logical switch, a load balancer, and a load balancer group. The 
> > load
> > +# balancer will be in the load balancer group, and the logical switch will 
> > use
> > +# the load balancer group. We will then delete the logical switch, the load
> > +# balancer, and the load balancer group. We then will unpause ovn-northd.
> > +#
> > +# The test only fails if putting northd to sleep or waking it up fails due 
> > to
> > +# ovn-northd crashing.
> > +#
> > +# We run this a few times in a loop just to be certain we aren't causing 
> > any
> > +# catastrophic issues.
> > +for (( i=0; i<5; i++ )) ; do
>
> Nit: I'd write this as:
>
> for i in $(seq 5); do
>
> > +    sleep_northd
> > +    check ovn-nbctl ls-add ls1
> > +    check ovn-nbctl lb-add lb1 192.168.0.1 10.0.0.1
> > +    lb_uuid=$(ovn-nbctl --bare --columns=_uuid find load_balancer name=lb1)
>
> Nit: I'd use fetch_column:
>
> lb_uuid=$(fetch_column nb:Load_Balancer _uuid name=lb1
>
> > +    lbg_uuid=$(ovn-nbctl create load_balancer_group name=lbg1 
> > load_balancer=$lb_uuid)
> > +    check ovn-nbctl set logical_switch ls1 load_balancer_group=$lbg_uuid
> > +    check ovn-nbctl ls-del ls1
> > +    check ovn-nbctl lb-del lb1
> > +    check ovn-nbctl --all destroy load_balancer_group
> > +    wake_up_northd
> > +done
> > +OVN_CLEANUP_NORTHD
> > +AT_CLEANUP
> > +])
>
> With the tiny nits addressed:
>
> Acked-by: Dumitru Ceara <[email protected]>
>
> Regards,
> Dumitru
>
>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to