On Wed, Feb 26, 2025 at 3:08 PM Felix Huettner <felix.huettner@stackit.cloud> wrote:
> On Wed, Feb 26, 2025 at 02:52:45PM +0100, Ales Musil wrote: > > On Wed, Feb 26, 2025 at 2:34 PM Felix Huettner > <felix.huettner@stackit.cloud> > > wrote: > > > > > On Wed, Feb 26, 2025 at 12:01:30PM +0100, Ales Musil wrote: > > > > We would crash northd with assert when LSP was configured to use > > > > LRP that already had a peer defined. Prevent the crash and add > > > > warning that the configuration is not valid. > > > > > > > > Reported-by: Enrique Llorente <ellor...@redhat.com> > > > > Signed-off-by: Ales Musil <amu...@redhat.com> > > > > > > Hi Ales, > > > > > > thanks for the patch. > > Hi Ales, > > > > > > > > --- > > > > northd/northd.c | 9 +++++++++ > > > > tests/ovn-northd.at | 21 +++++++++++++++++++++ > > > > 2 files changed, 30 insertions(+) > > > > > > > > diff --git a/northd/northd.c b/northd/northd.c > > > > index a91e48ac2..15dd022a4 100644 > > > > --- a/northd/northd.c > > > > +++ b/northd/northd.c > > > > @@ -2452,6 +2452,15 @@ join_logical_ports(const struct > > > sbrec_port_binding_table *sbrec_pb_table, > > > > continue; > > > > } > > > > > > > > + if (peer->nbrp->peer) { > > > > > > I think that this is relying on the lrps peers being handled before the > > > lsp that references it. > > > However i did not find such a guarantee. I have a testcase below that > seems > > > to be able to still trigger that issue in some cases. > > > > > > > nbrp->peer is directly from the database. Even if it is some order > relation > > we shouldn't overwrite the peer and cause assert failure along the way. > > > > oh i completely missed that. Sorry for the confusion. > > Acked-by: Felix Huettner <felix.huettner@stackit.cloud> > > > > > > > + static struct vlog_rate_limit rl = > > > VLOG_RATE_LIMIT_INIT(1, 5); > > > > + VLOG_WARN_RL(&rl, "Bad configuration: The peer of > the > > > switch" > > > > + " port '%s' (LRP peer: '%s') has its > own > > > peer" > > > > + " configuration: '%s'", op->key, > peer->key, > > > > + peer->nbrp->peer); > > > > + continue; > > > > + } > > > > + > > > > ovn_datapath_add_router_port(op->od, op); > > > > ovn_datapath_add_ls_peer(peer->od, op->od); > > > > peer->peer = op; > > > > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at > > > > index c74d57f7e..a7aaa1832 100644 > > > > --- a/tests/ovn-northd.at > > > > +++ b/tests/ovn-northd.at > > > > @@ -16695,3 +16695,24 @@ check_row_count Port_Binding 1 > > > logical_port=lrp2 options:peer{not-in}lrp0 > > > > > > > > AT_CLEANUP > > > > ]) > > > > + > > > > +OVN_FOR_EACH_NORTHD_NO_HV([ > > > > +AT_SETUP([LSP using LRP with peer]) > > > > +ovn_start > > > > + > > > > +check ovn-nbctl \ > > > > + -- lr-add lr \ > > > > + -- ls-add ls \ > > > > + -- lrp-add lr lrp0 00:00:00:01:ff:01 192.168.1.1/24 peer=lrp1 \ > > > > + -- lrp-add lr lrp1 00:00:00:01:ff:01 192.168.1.1/24 peer=lrp0 \ > > > > + -- lsp-add ls ls-lrp1 \ > > > > + -- lsp-set-type ls-lrp1 router \ > > > > + -- lsp-set-options ls-lrp1 router-port=lrp1 \ > > > > + -- lsp-set-addresses ls-lrp1 router > > > > + > > > > +AT_CHECK([grep -c "Bad configuration: The peer of the switch port > > > 'ls-lrp1' (LRP peer: 'lrp1') has its own peer configuration: 'lrp0'" > > > northd/ovn-northd.log], [0], [dnl > > > > +1 > > > > +]) > > > > + > > > > +AT_CLEANUP > > > > +]) > > > > > > I used the following testcase. > > > > > > +OVN_FOR_EACH_NORTHD_NO_HV([ > > > +AT_SETUP([LSP using LRP with peer]) > > > +ovn_start > > > + > > > +add_lsp() { > > > + check ovn-nbctl \ > > > + -- lsp-add ls $1 \ > > > + -- lsp-set-type $1 router \ > > > + -- lsp-set-options $1 router-port=lrp1 \ > > > + -- lsp-set-addresses $1 router > > > +} > > > + > > > +check ovn-nbctl \ > > > + -- lr-add lr \ > > > + -- ls-add ls \ > > > + -- lrp-add lr lrp0 00:00:00:01:ff:01 192.168.1.1/24 peer=lrp1 \ > > > + -- lrp-add lr lrp1 00:00:00:01:ff:01 192.168.1.1/24 peer=lrp0 > > > + > > > +add_lsp lsp1 > > > +add_lsp lsp2 > > > +add_lsp lsp3 > > > + > > > +check ovn-nbctl --wait=sb sync > > > + > > > +AT_CHECK([grep -qc "Bad configuration: The peer of the switch port > 'lsp1' > > > (LRP peer: 'lrp1') has its own peer configuration: 'lrp0'" > > > northd/ovn-northd.log], [0], []) > > > +AT_CHECK([grep -qc "Bad configuration: The peer of the switch port > 'lsp2' > > > (LRP peer: 'lrp1') has its own peer configuration: 'lrp0'" > > > northd/ovn-northd.log], [0], []) > > > +AT_CHECK([grep -qc "Bad configuration: The peer of the switch port > 'lsp3' > > > (LRP peer: 'lrp1') has its own peer configuration: 'lrp0'" > > > northd/ovn-northd.log], [0], []) > > > + > > > +AT_CLEANUP > > > +]) > > > > > > The test fails at the third grep (at least on my system). > > > > > > From the output it looks like the message is only shown for lsp1 and > > > lsp2, but not for lsp3. I assume it is related to the hash of the > > > ovn_port in the ports hmap and thereby the iteration order. > > > > > > > So I'm not sure I understand, will we assert in the last case? > > No it will not assert/crash. > > The grep seems to have just failed because of the rate limit on the log > message. > That then caused me to think that there is still some kind of issue > open. > > Again sorry for the confusion. > Yeah no worries, thank you for the review. > Thanks a lot, > Felix > > > > > > > > Thanks a lot, > > > Felix > > > > > > > > > > -- > > > > 2.48.1 > > > > > > > > _______________________________________________ > > > > dev mailing list > > > > d...@openvswitch.org > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > > > > > > Thanks, > > Ales > > _______________________________________________ > > dev mailing list > > d...@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev