On Thu, Aug 28, 2025 at 12:37 AM Dumitru Ceara <dce...@redhat.com> wrote:
> Hi Ales, Ilya, > > On 8/27/25 3:43 PM, Ilya Maximets wrote: > > On 8/27/25 2:47 PM, Ales Musil wrote: > >> > >> > >> On Wed, Aug 27, 2025 at 2:06 PM Ilya Maximets <i.maxim...@ovn.org > <mailto:i.maxim...@ovn.org>> wrote: > >> > >> On 8/26/25 6:26 PM, Ales Musil wrote: > >> > Make sure we remove the stale Static MAC Bindings if there is SB > >> > datapath re-creation in a single transaction. Otherwise, northd > >> > would be stuck in a commit failed loop due to "referential > integrity > >> > violation" as the static mac binding would still reference the old > >> > datapath. > >> > > >> > At the same time unify the check if dynamic/static mac binding is > >> > stale as it was different up until now. Both of them use the same > >> > columns as an indication of logical port and datapath. > >> > > >> > Fixes: b22684d4e37c ("Add a northbound interface to program > MAC_Binding table") > >> > Reported-at: https://issues.redhat.com/browse/FDP-1623 < > https://issues.redhat.com/browse/FDP-1623> > >> > Signed-off-by: Ales Musil <amu...@redhat.com <mailto: > amu...@redhat.com>> > >> > --- > >> > v2: Replace the Fixes tags. > >> > Remove the redundant check for op->nbrp. > >> > Update the commit message with more details. > >> > --- > >> > northd/northd.c | 28 +++++++++++++++++++--------- > >> > tests/ovn-northd.at <http://ovn-northd.at> | 8 ++++++++ > >> > 2 files changed, 27 insertions(+), 9 deletions(-) > >> > > >> > diff --git a/northd/northd.c b/northd/northd.c > >> > index 015f30a35..4b5b6584e 100644 > >> > --- a/northd/northd.c > >> > +++ b/northd/northd.c > >> > @@ -2867,6 +2867,19 @@ common: > >> > } > >> > } > >> > > >> > +static bool > >> > +mac_binding_is_stale(const struct hmap *lr_datapaths, > >> > + const struct hmap *lr_ports, > >> > + const struct sbrec_datapath_binding *dp, > >> > + const char *logical_port) > >> > +{ > >> > + const struct ovn_datapath *od = > >> > + ovn_datapath_from_sbrec(NULL, lr_datapaths, dp); > >> > + > >> > + return !od || ovn_datapath_is_stale(od) || > >> > + !ovn_port_find(lr_ports, logical_port); > >> > +} > >> > + > >> > /* Remove mac_binding entries that refer to logical_ports which > are > >> > * deleted. */ > >> > static void > >> > @@ -2876,11 +2889,8 @@ cleanup_mac_bindings( > >> > { > >> > const struct sbrec_mac_binding *b; > >> > SBREC_MAC_BINDING_TABLE_FOR_EACH_SAFE (b, > sbrec_mac_binding_table) { > >> > - const struct ovn_datapath *od = > >> > - ovn_datapath_from_sbrec(NULL, lr_datapaths, > b->datapath); > >> > - > >> > - if (!od || ovn_datapath_is_stale(od) || > >> > - !ovn_port_find(lr_ports, b->logical_port)) { > >> > + if (mac_binding_is_stale(lr_datapaths, lr_ports, > b->datapath, > >> > + b->logical_port)) { > >> > sbrec_mac_binding_delete(b); > >> > } > >> > } > >> > @@ -18850,7 +18860,7 @@ build_static_mac_binding_table( > >> > struct ovsdb_idl_txn *ovnsb_txn, > >> > const struct nbrec_static_mac_binding_table > *nbrec_static_mb_table, > >> > const struct sbrec_static_mac_binding_table > *sbrec_static_mb_table, > >> > - struct hmap *lr_ports) > >> > + const struct hmap *lr_datapaths, const struct hmap *lr_ports) > >> > { > >> > /* Cleanup SB Static_MAC_Binding entries which do not have > corresponding > >> > * NB Static_MAC_Binding entries, and SB Static_MAC_Binding > entries for > >> > @@ -18866,8 +18876,8 @@ build_static_mac_binding_table( > >> > continue; > >> > } > >> > > >> > - struct ovn_port *op = ovn_port_find(lr_ports, > nb_smb->logical_port); > >> > - if (!op || !op->nbrp || !op->od || !op->od->sdp->sb_dp) { > >> > + if (mac_binding_is_stale(lr_datapaths, lr_ports, > sb_smb->datapath, > >> > + sb_smb->logical_port)) { > >> > sbrec_static_mac_binding_delete(sb_smb); > >> > } > >> > >> FWIW, in addition to obscuring the logic in this function, the > change above > >> also makes the comment at the top of the loop inaccurate, making > the whole > >> thing even more confusing. So, at least that should be fixed. > >> > >> > >> Sure I can update the comment. > >> > >> > >> > >> > >> > >> The fact that the row will be re-created, but will have the same > UUID also > >> doesn't sound particularly great. > >> > >> I'd much prefer a solution like this (not tested): > >> > >> diff --git a/northd/northd.c b/northd/northd.c > >> index 015f30a35..fb494713a 100644 > >> --- a/northd/northd.c > >> +++ b/northd/northd.c > >> @@ -18852,28 +18852,16 @@ build_static_mac_binding_table( > >> const struct sbrec_static_mac_binding_table > *sbrec_static_mb_table, > >> struct hmap *lr_ports) > >> { > >> - /* Cleanup SB Static_MAC_Binding entries which do not have > corresponding > >> - * NB Static_MAC_Binding entries, and SB Static_MAC_Binding > entries for > >> - * which there is not a NB Logical_Router_Port of the same > name. */ > >> - const struct nbrec_static_mac_binding *nb_smb; > >> + struct hmapx stale = HMAPX_INITIALIZER(&stale); > >> const struct sbrec_static_mac_binding *sb_smb; > >> SBREC_STATIC_MAC_BINDING_TABLE_FOR_EACH_SAFE (sb_smb, > >> sbrec_static_mb_table) { > >> - nb_smb = nbrec_static_mac_binding_table_get_for_uuid( > >> - nbrec_static_mb_table, &sb_smb->header_.uuid); > >> - if (!nb_smb) { > >> - sbrec_static_mac_binding_delete(sb_smb); > >> - continue; > >> - } > >> - > >> - struct ovn_port *op = ovn_port_find(lr_ports, > nb_smb->logical_port); > >> - if (!op || !op->nbrp || !op->od || !op->od->sdp->sb_dp) { > >> - sbrec_static_mac_binding_delete(sb_smb); > >> - } > >> + hmapx_add(&stale, CONST_CAST(void *, sb_smb)); > >> } > >> > >> /* Create/Update SB Static_MAC_Binding entries with > corresponding values > >> * from NB Static_MAC_Binding entries. */ > >> + const struct nbrec_static_mac_binding *nb_smb; > >> NBREC_STATIC_MAC_BINDING_TABLE_FOR_EACH ( > >> nb_smb, nbrec_static_mb_table) { > >> struct ovn_port *op = ovn_port_find(lr_ports, > nb_smb->logical_port); > >> @@ -18905,10 +18893,22 @@ build_static_mac_binding_table( > >> > sbrec_static_mac_binding_set_override_dynamic_mac(mb, > >> nb_smb->override_dynamic_mac); > >> } > >> + if (od->sdp->sb_dp != mb->datapath) { > >> + sbrec_static_mac_binding_set_datapath( > >> + mb, od->sdp->sb_dp); > >> + } > >> + hmapx_find_and_delete(&stale, mb); > >> } > >> } > >> } > >> } > >> + > >> + /* Cleanup SB Static_MAC_Binding entries which are no longer > needed. */ > >> + struct hmapx_node *node; > >> + HMAPX_FOR_EACH (node, &stale) { > >> + sbrec_static_mac_binding_delete(node->data); > >> + } > >> + hmapx_destroy(&stale); > >> } > >> > >> static void > >> --- > >> > >> This approach is much more robust, since it avoids logic > duplication and > >> doesn't require a ton of explanation in the comments. > >> > >> > >> Well the logic duplication makes sense here because I don't see any > >> difference how to define staleness for both dynamic and static mac > bindings. > > > > I meant the logic duplication between the insertion and the removal > paths, > > i.e. duplication of checks between the staleness condition and the > conditions > > that lead to insertion of the new entry. With the hmapx approach we only > > need this logic to be spelled out once while adding/updating the entry > > and removals are automatic as we just remove everything that wasn't > updated. > > > >> > >> > >> > >> > >> Reading this code again, I'm pretty sure there are more bugs here > caused > >> by the recent UUID unification. AFAICT, if I change IP address in > NB, > >> it will not be updated in SB. Before the UUID change, lookup was > done > >> with the port+IP pair and so if we found an entry, it had the right > IP, > >> but now we're not updating IP and also not checking it in the > staleness > >> check. May also be a problem if we change the logical port in NB > while > >> the old one still exists, the staleness check will not catch that. > >> So, we need to add more things into the 'update' branch. > >> > >> > >> Yeah there is a bug in the update, I will post a separate patch for > that. > > > > If we're going to add updates for the port and IP, we may as well add an > > update for the datapath. The logic that change of the datapath makes the > > entry stale, but the change of the port/IP doesn't isn't very sound, IMO. > > > > I stand behind my statement that, in my opinion, SB.Static_Mac_Binding > entries that have incorrect "datapath" are stale. > > Another reason why I think that way is because the SB 'datapath' record > is only written by northd and has no corresponding direct field in the > NB.Static_Mac_Binding. If we detect a SB entry with an invalid datapath > it's clear that the entry was created for a router that does not exist > anymore (even if at the moment we have a new router that happens to have > the same name as the one that was deleted). > > Regardless, I'm OK with both approaches (the one that Ales proposes and > the one Ilya suggested in the diff above). This is a serious bug and > both approaches are technically correct. It would be great if we could > get this fixed sooner rather than later. > > I'm acking Ales' patch because it's technically correct and fixes the > problem. Personally I like that it unifies the "stale" condition for > static and dynamic mac bindings: > > Acked-by: Dumitru Ceara <dce...@redhat.com> > > NOTE: I'd also ack the patch that Ilya suggested as an alternative as I > like its simplicity and it seems technically correct too. So if we > decide to go with the alternative, please keep my ack. > > But, as stated above, I'd really like it if we could get this bug fixed > soon. > > Thank you both again for the good discussion! > > Best regards, > Dumitru > > Thank you Dumitru and Ilya, I have adjusted the comment, merged this into main and backported all the way down to 24.03. Regards, Ales _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev