On 2/26/25 3:00 PM, Ales Musil wrote:
> On Wed, Feb 26, 2025 at 2:52 PM Ales Musil <amu...@redhat.com> wrote:
> 
>>
>>
>> On Wed, Feb 26, 2025 at 2:34 PM Felix Huettner
>> <felix.huettner@stackit.cloud> wrote:
>>
>>> On Wed, Feb 26, 2025 at 12:01:30PM +0100, Ales Musil wrote:
>>>> We would crash northd with assert when LSP was configured to use
>>>> LRP that already had a peer defined. Prevent the crash and add
>>>> warning that the configuration is not valid.
>>>>
>>>> Reported-by: Enrique Llorente <ellor...@redhat.com>
>>>> Signed-off-by: Ales Musil <amu...@redhat.com>
>>>
>>> Hi Ales,
>>>
>>> thanks for the patch.
>>>

Thanks Ales for the patch and Felix for the review!

I'll drop some more comments here.

>>>> ---
>>>>  northd/northd.c     |  9 +++++++++
>>>>  tests/ovn-northd.at | 21 +++++++++++++++++++++
>>>>  2 files changed, 30 insertions(+)
>>>>
>>>> diff --git a/northd/northd.c b/northd/northd.c
>>>> index a91e48ac2..15dd022a4 100644
>>>> --- a/northd/northd.c
>>>> +++ b/northd/northd.c
>>>> @@ -2452,6 +2452,15 @@ join_logical_ports(const struct
>>> sbrec_port_binding_table *sbrec_pb_table,
>>>>                  continue;
>>>>              }
>>>>
>>>> +            if (peer->nbrp->peer) {
>>>
>>> I think that this is relying on the lrps peers being handled before the
>>> lsp that references it.
>>> However i did not find such a guarantee. I have a testcase below that
>>> seems
>>> to be able to still trigger that issue in some cases.
>>>
>>
>> nbrp->peer is directly from the database.  Even if it is some order
>> relation
>> we shouldn't overwrite the peer and cause assert failure along the way.
>>
>>
>>>> +                static struct vlog_rate_limit rl =
>>> VLOG_RATE_LIMIT_INIT(1, 5);
>>>> +                VLOG_WARN_RL(&rl, "Bad configuration: The peer of the
>>> switch"
>>>> +                             " port '%s' (LRP peer: '%s') has its own
>>> peer"
>>>> +                             " configuration: '%s'", op->key,
>>> peer->key,

Nit: I'd prefer if the " " is at the end of the previous line.

>>>> +                             peer->nbrp->peer);
>>>> +                continue;
>>>> +            }
>>>> +
>>>>              ovn_datapath_add_router_port(op->od, op);
>>>>              ovn_datapath_add_ls_peer(peer->od, op->od);
>>>>              peer->peer = op;
>>>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
>>>> index c74d57f7e..a7aaa1832 100644
>>>> --- a/tests/ovn-northd.at
>>>> +++ b/tests/ovn-northd.at
>>>> @@ -16695,3 +16695,24 @@ check_row_count Port_Binding 1
>>> logical_port=lrp2 options:peer{not-in}lrp0
>>>>
>>>>  AT_CLEANUP
>>>>  ])
>>>> +
>>>> +OVN_FOR_EACH_NORTHD_NO_HV([
>>>> +AT_SETUP([LSP using LRP with peer])
>>>> +ovn_start
>>>> +
>>>> +check ovn-nbctl \

We're missing a --wait=sb sync here which makes the test a bit unreliable.

>>>> +    -- lr-add lr \
>>>> +    -- ls-add ls \
>>>> +    -- lrp-add lr lrp0 00:00:00:01:ff:01 192.168.1.1/24 peer=lrp1 \
>>>> +    -- lrp-add lr lrp1 00:00:00:01:ff:01 192.168.1.1/24 peer=lrp0 \
>>>> +    -- lsp-add ls ls-lrp1 \
>>>> +    -- lsp-set-type ls-lrp1 router \
>>>> +    -- lsp-set-options ls-lrp1 router-port=lrp1 \
>>>> +    -- lsp-set-addresses ls-lrp1 router
>>>> +
>>>> +AT_CHECK([grep -c "Bad configuration: The peer of the switch port
>>> 'ls-lrp1' (LRP peer: 'lrp1') has its own peer configuration: 'lrp0'"
>>> northd/ovn-northd.log], [0], [dnl
>>>> +1
>>>> +])

With the --wait=sb sync this would return 2 (one for the NB update and
one for the SB update).

I'd change it to "check grep -q ...".

>>>> +
>>>> +AT_CLEANUP
>>>> +])
>>>
>>> I used the following testcase.
>>>
>>> +OVN_FOR_EACH_NORTHD_NO_HV([
>>> +AT_SETUP([LSP using LRP with peer])
>>> +ovn_start
>>> +
>>> +add_lsp() {
>>> +    check ovn-nbctl \
>>> +        -- lsp-add ls $1 \
>>> +        -- lsp-set-type $1 router \
>>> +        -- lsp-set-options $1 router-port=lrp1 \
>>> +        -- lsp-set-addresses $1 router
>>> +}
>>> +
>>> +check ovn-nbctl \
>>> +    -- lr-add lr \
>>> +    -- ls-add ls \
>>> +    -- lrp-add lr lrp0 00:00:00:01:ff:01 192.168.1.1/24 peer=lrp1 \
>>> +    -- lrp-add lr lrp1 00:00:00:01:ff:01 192.168.1.1/24 peer=lrp0
>>> +
>>> +add_lsp lsp1
>>> +add_lsp lsp2
>>> +add_lsp lsp3
>>> +
>>> +check ovn-nbctl --wait=sb sync
>>> +
>>> +AT_CHECK([grep -qc "Bad configuration: The peer of the switch port
>>> 'lsp1' (LRP peer: 'lrp1') has its own peer configuration: 'lrp0'"
>>> northd/ovn-northd.log], [0], [])
>>> +AT_CHECK([grep -qc "Bad configuration: The peer of the switch port
>>> 'lsp2' (LRP peer: 'lrp1') has its own peer configuration: 'lrp0'"
>>> northd/ovn-northd.log], [0], [])
>>> +AT_CHECK([grep -qc "Bad configuration: The peer of the switch port
>>> 'lsp3' (LRP peer: 'lrp1') has its own peer configuration: 'lrp0'"
>>> northd/ovn-northd.log], [0], [])
>>> +
>>> +AT_CLEANUP
>>> +])
>>>
>>> The test fails at the third grep (at least on my system).
>>>
>>> From the output it looks like the message is only shown for lsp1 and
>>> lsp2, but not for lsp3. I assume it is related to the hash of the
>>> ovn_port in the ports hmap and thereby the iteration order.
>>>
>>
>> So I'm not sure I understand, will we assert in the last case?
>>
> 
> Tried your test to see what is happening and you can't see the message
> because of rate limiting, if you disable rate limit it's there.
> 

Does it make sense to integrate this into the test suite too?

Thanks,
Dumitru

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to