On Mon, Dec 1, 2025 at 12:15 PM Ales Musil <[email protected]> wrote:
>
>
>
> On Mon, Dec 1, 2025 at 5:40 PM Ales Musil <[email protected]> wrote:
>>
>>
>>
>> On Mon, Dec 1, 2025 at 5:37 PM Mark Michelson <[email protected]> wrote:
>>>
>>> On Mon, Dec 1, 2025 at 10:50 AM Ales Musil via dev
>>> <[email protected]> wrote:
>>> >
>>> > The hmap could be NULL under one circumstance, node depending
>>> > on the northd data would run before the northd node right
>>> > after the start. Make sure we won't crash in that case.
>>>
>>> Can you explain this in a bit more detail? How does a node that
>>> depends on northd data run before the northd node?
>>
>>
>> It also depends on SB nodes. So it will get triggered from the SB update
>> first.
>
>
> Looking through the issue once more you are right, that doesn't make sense.
> It's actually slightly different. We only pass ls_datapaths, so if the
> datapath is lr
> our lr_datapaths are NULL so we would crash there. I can change the commit
> message
> during merge to reflect that better:
>
> northd: Add back check for datapath hmap.
>
> In some cases northd is passing only ls_datapaths/lr_datapaths
> into the ovn_datapath_from_sbrec(), the other hmap being NULL.
> This is problematic when the function runs into the second datapath
> type "unexpectedly" during processing, which would lead to NULL ptr
> deref and crash:
>
> openvswitch/hmap.h:360:40: runtime error: member access within null pointer
> of type 'const struct hmap'
> 0x0000004f5b74 in hmap_first_with_hash openvswitch/hmap.h:360:40
> 0x0000004f1d3a in ovn_datapath_find_ northd/northd.c:601:5
> 0x0000004f21ad in ovn_datapath_from_sbrec northd/northd.c:654:31
> 0x00000060aa50 in build_mcast_groups northd/en-multicast.c:280:14
> 0x000000608c4b in en_multicast_igmp_run northd/en-multicast.c:135:5
> 0x0000006f11ed in engine_recompute lib/inc-proc-eng.c:445:33
> 0x0000006ef315 in engine_run_node lib/inc-proc-eng.c:519:9
> 0x0000006ef315 in engine_run lib/inc-proc-eng.c:576:9
> 0x0000006635fc in inc_proc_northd_run northd/inc-proc-northd.c:608:5
> 0x0000005dc196 in main northd/ovn-northd.c:1109:36
>
> Make sure we check for the hmap being NULL before attempting to
> find the ovn_datapath struct.
>
> WDYT?
That is much more clear than the original commit message, thanks.
However, this seems like a strange problem to have. en-multicast is
providing a SB datapath that it thinks should be a logical switch to
ovn_datapath_from_sbrec(). However, the datapath is actually a logical
router. This seems like something that should be fixed in the IGMP
processing code. If it's valid for IGMP groups to reference logical
routers, then both maps should be provided to
ovn_datapath_from_sbrec(). If IGMP groups cannot reference logical
routers, then there is some bug causing IGMP groups to have incorrect
datapath references.
>>
>>
>>>
>>>
>>> >
>>> > Fixes: 6919992d8781 ("Datapath_Binding: Separate type column and sync
>>> > NB.UUID to SB.")
>>> > Reported-at: https://issues.redhat.com/browse/FDP-2757
>>> > Signed-off-by: Ales Musil <[email protected]>
>>> > ---
>>> > northd/northd.c | 4 ++++
>>> > tests/ovn-northd.at | 29 +++++++++++++++++++++++++++++
>>> > 2 files changed, 33 insertions(+)
>>> >
>>> > diff --git a/northd/northd.c b/northd/northd.c
>>> > index ec219a0c7..3bd1f6849 100644
>>> > --- a/northd/northd.c
>>> > +++ b/northd/northd.c
>>> > @@ -647,6 +647,10 @@ ovn_datapath_from_sbrec(const struct hmap
>>> > *ls_datapaths,
>>> > return NULL;
>>> > }
>>> >
>>> > + if (!dps) {
>>> > + return NULL;
>>> > + }
>>> > +
>>> > struct ovn_datapath *od = ovn_datapath_find_(dps, &key);
>>> > if (od && (od->sdp->sb_dp == sb)) {
>>> > return od;
>>> > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
>>> > index 380c9ff80..0bc2808ac 100644
>>> > --- a/tests/ovn-northd.at
>>> > +++ b/tests/ovn-northd.at
>>> > @@ -18885,3 +18885,32 @@ AT_CHECK([ovn-sbctl lflow-list ls1 | grep
>>> > ls_in_apply_port_sec | ovn_strip_lflow
>>> >
>>> > AT_CLEANUP
>>> > ])
>>> > +
>>> > +OVN_FOR_EACH_NORTHD_NO_HV([
>>> > +AT_SETUP([IGMP northd crash])
>>> > +ovn_start
>>> > +
>>> > +check ovn-nbctl lr-add lr
>>> > +check ovn-sbctl chassis-add hv1 geneve 127.0.0.1
>>> > +
>>> > +# Kill northd and start it paused this time.
>>> > +as northd
>>> > +OVS_APP_EXIT_AND_WAIT([ovn-northd])
>>> > +ovn_start_northd --paused
>>> > +
>>> > +# Create IGMP Group for lr datapath.
>>> > +check_uuid ovn-sbctl create IGMP_Group address=mrouters \
>>> > + datapath=$(fetch_column Datapath_Binding _uuid external_ids:name=lr)
>>> > \
>>> > + chassis=$(fetch_column Chassis _uuid name=hv1) \
>>> > + chassis_name=hv1
>>> > +wait_row_count Igmp_Group 1 address=mrouters
>>> > +
>>> > +# Unpause northd and check if we didn't crash.
>>> > +as northd ovn-appctl -t ovn-northd resume
>>> > +check ovn-nbctl --wait=sb sync
>>> > +
>>> > +wait_row_count Igmp_Group 0 address=mrouters
>>> > +
>>> > +OVN_CLEANUP_NORTHD
>>> > +AT_CLEANUP
>>> > +])
>>> > --
>>> > 2.51.1
>>> >
>>> > _______________________________________________
>>> > dev mailing list
>>> > [email protected]
>>> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>> >
>>>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev