On 2/26/25 3:00 PM, Ales Musil wrote: > On Wed, Feb 26, 2025 at 2:52 PM Ales Musil <amu...@redhat.com> wrote: > >> >> >> On Wed, Feb 26, 2025 at 2:34 PM Felix Huettner >> <felix.huettner@stackit.cloud> wrote: >> >>> On Wed, Feb 26, 2025 at 12:01:30PM +0100, Ales Musil wrote: >>>> We would crash northd with assert when LSP was configured to use >>>> LRP that already had a peer defined. Prevent the crash and add >>>> warning that the configuration is not valid. >>>> >>>> Reported-by: Enrique Llorente <ellor...@redhat.com> >>>> Signed-off-by: Ales Musil <amu...@redhat.com> >>> >>> Hi Ales, >>> >>> thanks for the patch. >>>
Thanks Ales for the patch and Felix for the review! I'll drop some more comments here. >>>> --- >>>> northd/northd.c | 9 +++++++++ >>>> tests/ovn-northd.at | 21 +++++++++++++++++++++ >>>> 2 files changed, 30 insertions(+) >>>> >>>> diff --git a/northd/northd.c b/northd/northd.c >>>> index a91e48ac2..15dd022a4 100644 >>>> --- a/northd/northd.c >>>> +++ b/northd/northd.c >>>> @@ -2452,6 +2452,15 @@ join_logical_ports(const struct >>> sbrec_port_binding_table *sbrec_pb_table, >>>> continue; >>>> } >>>> >>>> + if (peer->nbrp->peer) { >>> >>> I think that this is relying on the lrps peers being handled before the >>> lsp that references it. >>> However i did not find such a guarantee. I have a testcase below that >>> seems >>> to be able to still trigger that issue in some cases. >>> >> >> nbrp->peer is directly from the database. Even if it is some order >> relation >> we shouldn't overwrite the peer and cause assert failure along the way. >> >> >>>> + static struct vlog_rate_limit rl = >>> VLOG_RATE_LIMIT_INIT(1, 5); >>>> + VLOG_WARN_RL(&rl, "Bad configuration: The peer of the >>> switch" >>>> + " port '%s' (LRP peer: '%s') has its own >>> peer" >>>> + " configuration: '%s'", op->key, >>> peer->key, Nit: I'd prefer if the " " is at the end of the previous line. >>>> + peer->nbrp->peer); >>>> + continue; >>>> + } >>>> + >>>> ovn_datapath_add_router_port(op->od, op); >>>> ovn_datapath_add_ls_peer(peer->od, op->od); >>>> peer->peer = op; >>>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at >>>> index c74d57f7e..a7aaa1832 100644 >>>> --- a/tests/ovn-northd.at >>>> +++ b/tests/ovn-northd.at >>>> @@ -16695,3 +16695,24 @@ check_row_count Port_Binding 1 >>> logical_port=lrp2 options:peer{not-in}lrp0 >>>> >>>> AT_CLEANUP >>>> ]) >>>> + >>>> +OVN_FOR_EACH_NORTHD_NO_HV([ >>>> +AT_SETUP([LSP using LRP with peer]) >>>> +ovn_start >>>> + >>>> +check ovn-nbctl \ We're missing a --wait=sb sync here which makes the test a bit unreliable. >>>> + -- lr-add lr \ >>>> + -- ls-add ls \ >>>> + -- lrp-add lr lrp0 00:00:00:01:ff:01 192.168.1.1/24 peer=lrp1 \ >>>> + -- lrp-add lr lrp1 00:00:00:01:ff:01 192.168.1.1/24 peer=lrp0 \ >>>> + -- lsp-add ls ls-lrp1 \ >>>> + -- lsp-set-type ls-lrp1 router \ >>>> + -- lsp-set-options ls-lrp1 router-port=lrp1 \ >>>> + -- lsp-set-addresses ls-lrp1 router >>>> + >>>> +AT_CHECK([grep -c "Bad configuration: The peer of the switch port >>> 'ls-lrp1' (LRP peer: 'lrp1') has its own peer configuration: 'lrp0'" >>> northd/ovn-northd.log], [0], [dnl >>>> +1 >>>> +]) With the --wait=sb sync this would return 2 (one for the NB update and one for the SB update). I'd change it to "check grep -q ...". >>>> + >>>> +AT_CLEANUP >>>> +]) >>> >>> I used the following testcase. >>> >>> +OVN_FOR_EACH_NORTHD_NO_HV([ >>> +AT_SETUP([LSP using LRP with peer]) >>> +ovn_start >>> + >>> +add_lsp() { >>> + check ovn-nbctl \ >>> + -- lsp-add ls $1 \ >>> + -- lsp-set-type $1 router \ >>> + -- lsp-set-options $1 router-port=lrp1 \ >>> + -- lsp-set-addresses $1 router >>> +} >>> + >>> +check ovn-nbctl \ >>> + -- lr-add lr \ >>> + -- ls-add ls \ >>> + -- lrp-add lr lrp0 00:00:00:01:ff:01 192.168.1.1/24 peer=lrp1 \ >>> + -- lrp-add lr lrp1 00:00:00:01:ff:01 192.168.1.1/24 peer=lrp0 >>> + >>> +add_lsp lsp1 >>> +add_lsp lsp2 >>> +add_lsp lsp3 >>> + >>> +check ovn-nbctl --wait=sb sync >>> + >>> +AT_CHECK([grep -qc "Bad configuration: The peer of the switch port >>> 'lsp1' (LRP peer: 'lrp1') has its own peer configuration: 'lrp0'" >>> northd/ovn-northd.log], [0], []) >>> +AT_CHECK([grep -qc "Bad configuration: The peer of the switch port >>> 'lsp2' (LRP peer: 'lrp1') has its own peer configuration: 'lrp0'" >>> northd/ovn-northd.log], [0], []) >>> +AT_CHECK([grep -qc "Bad configuration: The peer of the switch port >>> 'lsp3' (LRP peer: 'lrp1') has its own peer configuration: 'lrp0'" >>> northd/ovn-northd.log], [0], []) >>> + >>> +AT_CLEANUP >>> +]) >>> >>> The test fails at the third grep (at least on my system). >>> >>> From the output it looks like the message is only shown for lsp1 and >>> lsp2, but not for lsp3. I assume it is related to the hash of the >>> ovn_port in the ports hmap and thereby the iteration order. >>> >> >> So I'm not sure I understand, will we assert in the last case? >> > > Tried your test to see what is happening and you can't see the message > because of rate limiting, if you disable rate limit it's there. > Does it make sense to integrate this into the test suite too? Thanks, Dumitru _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev