On Wed, Feb 26, 2025 at 2:52 PM Ales Musil <amu...@redhat.com> wrote:

>
>
> On Wed, Feb 26, 2025 at 2:34 PM Felix Huettner
> <felix.huettner@stackit.cloud> wrote:
>
>> On Wed, Feb 26, 2025 at 12:01:30PM +0100, Ales Musil wrote:
>> > We would crash northd with assert when LSP was configured to use
>> > LRP that already had a peer defined. Prevent the crash and add
>> > warning that the configuration is not valid.
>> >
>> > Reported-by: Enrique Llorente <ellor...@redhat.com>
>> > Signed-off-by: Ales Musil <amu...@redhat.com>
>>
>> Hi Ales,
>>
>> thanks for the patch.
>>
>> > ---
>> >  northd/northd.c     |  9 +++++++++
>> >  tests/ovn-northd.at | 21 +++++++++++++++++++++
>> >  2 files changed, 30 insertions(+)
>> >
>> > diff --git a/northd/northd.c b/northd/northd.c
>> > index a91e48ac2..15dd022a4 100644
>> > --- a/northd/northd.c
>> > +++ b/northd/northd.c
>> > @@ -2452,6 +2452,15 @@ join_logical_ports(const struct
>> sbrec_port_binding_table *sbrec_pb_table,
>> >                  continue;
>> >              }
>> >
>> > +            if (peer->nbrp->peer) {
>>
>> I think that this is relying on the lrps peers being handled before the
>> lsp that references it.
>> However i did not find such a guarantee. I have a testcase below that
>> seems
>> to be able to still trigger that issue in some cases.
>>
>
> nbrp->peer is directly from the database.  Even if it is some order
> relation
> we shouldn't overwrite the peer and cause assert failure along the way.
>
>
>> > +                static struct vlog_rate_limit rl =
>> VLOG_RATE_LIMIT_INIT(1, 5);
>> > +                VLOG_WARN_RL(&rl, "Bad configuration: The peer of the
>> switch"
>> > +                             " port '%s' (LRP peer: '%s') has its own
>> peer"
>> > +                             " configuration: '%s'", op->key,
>> peer->key,
>> > +                             peer->nbrp->peer);
>> > +                continue;
>> > +            }
>> > +
>> >              ovn_datapath_add_router_port(op->od, op);
>> >              ovn_datapath_add_ls_peer(peer->od, op->od);
>> >              peer->peer = op;
>> > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
>> > index c74d57f7e..a7aaa1832 100644
>> > --- a/tests/ovn-northd.at
>> > +++ b/tests/ovn-northd.at
>> > @@ -16695,3 +16695,24 @@ check_row_count Port_Binding 1
>> logical_port=lrp2 options:peer{not-in}lrp0
>> >
>> >  AT_CLEANUP
>> >  ])
>> > +
>> > +OVN_FOR_EACH_NORTHD_NO_HV([
>> > +AT_SETUP([LSP using LRP with peer])
>> > +ovn_start
>> > +
>> > +check ovn-nbctl \
>> > +    -- lr-add lr \
>> > +    -- ls-add ls \
>> > +    -- lrp-add lr lrp0 00:00:00:01:ff:01 192.168.1.1/24 peer=lrp1 \
>> > +    -- lrp-add lr lrp1 00:00:00:01:ff:01 192.168.1.1/24 peer=lrp0 \
>> > +    -- lsp-add ls ls-lrp1 \
>> > +    -- lsp-set-type ls-lrp1 router \
>> > +    -- lsp-set-options ls-lrp1 router-port=lrp1 \
>> > +    -- lsp-set-addresses ls-lrp1 router
>> > +
>> > +AT_CHECK([grep -c "Bad configuration: The peer of the switch port
>> 'ls-lrp1' (LRP peer: 'lrp1') has its own peer configuration: 'lrp0'"
>> northd/ovn-northd.log], [0], [dnl
>> > +1
>> > +])
>> > +
>> > +AT_CLEANUP
>> > +])
>>
>> I used the following testcase.
>>
>> +OVN_FOR_EACH_NORTHD_NO_HV([
>> +AT_SETUP([LSP using LRP with peer])
>> +ovn_start
>> +
>> +add_lsp() {
>> +    check ovn-nbctl \
>> +        -- lsp-add ls $1 \
>> +        -- lsp-set-type $1 router \
>> +        -- lsp-set-options $1 router-port=lrp1 \
>> +        -- lsp-set-addresses $1 router
>> +}
>> +
>> +check ovn-nbctl \
>> +    -- lr-add lr \
>> +    -- ls-add ls \
>> +    -- lrp-add lr lrp0 00:00:00:01:ff:01 192.168.1.1/24 peer=lrp1 \
>> +    -- lrp-add lr lrp1 00:00:00:01:ff:01 192.168.1.1/24 peer=lrp0
>> +
>> +add_lsp lsp1
>> +add_lsp lsp2
>> +add_lsp lsp3
>> +
>> +check ovn-nbctl --wait=sb sync
>> +
>> +AT_CHECK([grep -qc "Bad configuration: The peer of the switch port
>> 'lsp1' (LRP peer: 'lrp1') has its own peer configuration: 'lrp0'"
>> northd/ovn-northd.log], [0], [])
>> +AT_CHECK([grep -qc "Bad configuration: The peer of the switch port
>> 'lsp2' (LRP peer: 'lrp1') has its own peer configuration: 'lrp0'"
>> northd/ovn-northd.log], [0], [])
>> +AT_CHECK([grep -qc "Bad configuration: The peer of the switch port
>> 'lsp3' (LRP peer: 'lrp1') has its own peer configuration: 'lrp0'"
>> northd/ovn-northd.log], [0], [])
>> +
>> +AT_CLEANUP
>> +])
>>
>> The test fails at the third grep (at least on my system).
>>
>> From the output it looks like the message is only shown for lsp1 and
>> lsp2, but not for lsp3. I assume it is related to the hash of the
>> ovn_port in the ports hmap and thereby the iteration order.
>>
>
> So I'm not sure I understand, will we assert in the last case?
>

Tried your test to see what is happening and you can't see the message
because of rate limiting, if you disable rate limit it's there.

>
>
>> Thanks a lot,
>> Felix
>>
>>
>> > --
>> > 2.48.1
>> >
>> > _______________________________________________
>> > dev mailing list
>> > d...@openvswitch.org
>> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>>
> Thanks,
> Ales
>

Thanks,
Ales
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to