Hi Ales, Ilya, On 8/27/25 3:43 PM, Ilya Maximets wrote: > On 8/27/25 2:47 PM, Ales Musil wrote: >> >> >> On Wed, Aug 27, 2025 at 2:06 PM Ilya Maximets <i.maxim...@ovn.org >> <mailto:i.maxim...@ovn.org>> wrote: >> >> On 8/26/25 6:26 PM, Ales Musil wrote: >> > Make sure we remove the stale Static MAC Bindings if there is SB >> > datapath re-creation in a single transaction. Otherwise, northd >> > would be stuck in a commit failed loop due to "referential integrity >> > violation" as the static mac binding would still reference the old >> > datapath. >> > >> > At the same time unify the check if dynamic/static mac binding is >> > stale as it was different up until now. Both of them use the same >> > columns as an indication of logical port and datapath. >> > >> > Fixes: b22684d4e37c ("Add a northbound interface to program >> MAC_Binding table") >> > Reported-at: https://issues.redhat.com/browse/FDP-1623 >> <https://issues.redhat.com/browse/FDP-1623> >> > Signed-off-by: Ales Musil <amu...@redhat.com >> <mailto:amu...@redhat.com>> >> > --- >> > v2: Replace the Fixes tags. >> > Remove the redundant check for op->nbrp. >> > Update the commit message with more details. >> > --- >> > northd/northd.c | 28 +++++++++++++++++++--------- >> > tests/ovn-northd.at <http://ovn-northd.at> | 8 ++++++++ >> > 2 files changed, 27 insertions(+), 9 deletions(-) >> > >> > diff --git a/northd/northd.c b/northd/northd.c >> > index 015f30a35..4b5b6584e 100644 >> > --- a/northd/northd.c >> > +++ b/northd/northd.c >> > @@ -2867,6 +2867,19 @@ common: >> > } >> > } >> > >> > +static bool >> > +mac_binding_is_stale(const struct hmap *lr_datapaths, >> > + const struct hmap *lr_ports, >> > + const struct sbrec_datapath_binding *dp, >> > + const char *logical_port) >> > +{ >> > + const struct ovn_datapath *od = >> > + ovn_datapath_from_sbrec(NULL, lr_datapaths, dp); >> > + >> > + return !od || ovn_datapath_is_stale(od) || >> > + !ovn_port_find(lr_ports, logical_port); >> > +} >> > + >> > /* Remove mac_binding entries that refer to logical_ports which are >> > * deleted. */ >> > static void >> > @@ -2876,11 +2889,8 @@ cleanup_mac_bindings( >> > { >> > const struct sbrec_mac_binding *b; >> > SBREC_MAC_BINDING_TABLE_FOR_EACH_SAFE (b, >> sbrec_mac_binding_table) { >> > - const struct ovn_datapath *od = >> > - ovn_datapath_from_sbrec(NULL, lr_datapaths, b->datapath); >> > - >> > - if (!od || ovn_datapath_is_stale(od) || >> > - !ovn_port_find(lr_ports, b->logical_port)) { >> > + if (mac_binding_is_stale(lr_datapaths, lr_ports, b->datapath, >> > + b->logical_port)) { >> > sbrec_mac_binding_delete(b); >> > } >> > } >> > @@ -18850,7 +18860,7 @@ build_static_mac_binding_table( >> > struct ovsdb_idl_txn *ovnsb_txn, >> > const struct nbrec_static_mac_binding_table >> *nbrec_static_mb_table, >> > const struct sbrec_static_mac_binding_table >> *sbrec_static_mb_table, >> > - struct hmap *lr_ports) >> > + const struct hmap *lr_datapaths, const struct hmap *lr_ports) >> > { >> > /* Cleanup SB Static_MAC_Binding entries which do not have >> corresponding >> > * NB Static_MAC_Binding entries, and SB Static_MAC_Binding >> entries for >> > @@ -18866,8 +18876,8 @@ build_static_mac_binding_table( >> > continue; >> > } >> > >> > - struct ovn_port *op = ovn_port_find(lr_ports, >> nb_smb->logical_port); >> > - if (!op || !op->nbrp || !op->od || !op->od->sdp->sb_dp) { >> > + if (mac_binding_is_stale(lr_datapaths, lr_ports, >> sb_smb->datapath, >> > + sb_smb->logical_port)) { >> > sbrec_static_mac_binding_delete(sb_smb); >> > } >> >> FWIW, in addition to obscuring the logic in this function, the change >> above >> also makes the comment at the top of the loop inaccurate, making the >> whole >> thing even more confusing. So, at least that should be fixed. >> >> >> Sure I can update the comment. >> >> >> >> >> >> The fact that the row will be re-created, but will have the same UUID >> also >> doesn't sound particularly great. >> >> I'd much prefer a solution like this (not tested): >> >> diff --git a/northd/northd.c b/northd/northd.c >> index 015f30a35..fb494713a 100644 >> --- a/northd/northd.c >> +++ b/northd/northd.c >> @@ -18852,28 +18852,16 @@ build_static_mac_binding_table( >> const struct sbrec_static_mac_binding_table *sbrec_static_mb_table, >> struct hmap *lr_ports) >> { >> - /* Cleanup SB Static_MAC_Binding entries which do not have >> corresponding >> - * NB Static_MAC_Binding entries, and SB Static_MAC_Binding entries >> for >> - * which there is not a NB Logical_Router_Port of the same name. */ >> - const struct nbrec_static_mac_binding *nb_smb; >> + struct hmapx stale = HMAPX_INITIALIZER(&stale); >> const struct sbrec_static_mac_binding *sb_smb; >> SBREC_STATIC_MAC_BINDING_TABLE_FOR_EACH_SAFE (sb_smb, >> sbrec_static_mb_table) { >> - nb_smb = nbrec_static_mac_binding_table_get_for_uuid( >> - nbrec_static_mb_table, &sb_smb->header_.uuid); >> - if (!nb_smb) { >> - sbrec_static_mac_binding_delete(sb_smb); >> - continue; >> - } >> - >> - struct ovn_port *op = ovn_port_find(lr_ports, >> nb_smb->logical_port); >> - if (!op || !op->nbrp || !op->od || !op->od->sdp->sb_dp) { >> - sbrec_static_mac_binding_delete(sb_smb); >> - } >> + hmapx_add(&stale, CONST_CAST(void *, sb_smb)); >> } >> >> /* Create/Update SB Static_MAC_Binding entries with corresponding >> values >> * from NB Static_MAC_Binding entries. */ >> + const struct nbrec_static_mac_binding *nb_smb; >> NBREC_STATIC_MAC_BINDING_TABLE_FOR_EACH ( >> nb_smb, nbrec_static_mb_table) { >> struct ovn_port *op = ovn_port_find(lr_ports, >> nb_smb->logical_port); >> @@ -18905,10 +18893,22 @@ build_static_mac_binding_table( >> >> sbrec_static_mac_binding_set_override_dynamic_mac(mb, >> nb_smb->override_dynamic_mac); >> } >> + if (od->sdp->sb_dp != mb->datapath) { >> + sbrec_static_mac_binding_set_datapath( >> + mb, od->sdp->sb_dp); >> + } >> + hmapx_find_and_delete(&stale, mb); >> } >> } >> } >> } >> + >> + /* Cleanup SB Static_MAC_Binding entries which are no longer >> needed. */ >> + struct hmapx_node *node; >> + HMAPX_FOR_EACH (node, &stale) { >> + sbrec_static_mac_binding_delete(node->data); >> + } >> + hmapx_destroy(&stale); >> } >> >> static void >> --- >> >> This approach is much more robust, since it avoids logic duplication and >> doesn't require a ton of explanation in the comments. >> >> >> Well the logic duplication makes sense here because I don't see any >> difference how to define staleness for both dynamic and static mac bindings. > > I meant the logic duplication between the insertion and the removal paths, > i.e. duplication of checks between the staleness condition and the conditions > that lead to insertion of the new entry. With the hmapx approach we only > need this logic to be spelled out once while adding/updating the entry > and removals are automatic as we just remove everything that wasn't updated. > >> >> >> >> >> Reading this code again, I'm pretty sure there are more bugs here caused >> by the recent UUID unification. AFAICT, if I change IP address in NB, >> it will not be updated in SB. Before the UUID change, lookup was done >> with the port+IP pair and so if we found an entry, it had the right IP, >> but now we're not updating IP and also not checking it in the staleness >> check. May also be a problem if we change the logical port in NB while >> the old one still exists, the staleness check will not catch that. >> So, we need to add more things into the 'update' branch. >> >> >> Yeah there is a bug in the update, I will post a separate patch for that. > > If we're going to add updates for the port and IP, we may as well add an > update for the datapath. The logic that change of the datapath makes the > entry stale, but the change of the port/IP doesn't isn't very sound, IMO. >
I stand behind my statement that, in my opinion, SB.Static_Mac_Binding entries that have incorrect "datapath" are stale. Another reason why I think that way is because the SB 'datapath' record is only written by northd and has no corresponding direct field in the NB.Static_Mac_Binding. If we detect a SB entry with an invalid datapath it's clear that the entry was created for a router that does not exist anymore (even if at the moment we have a new router that happens to have the same name as the one that was deleted). Regardless, I'm OK with both approaches (the one that Ales proposes and the one Ilya suggested in the diff above). This is a serious bug and both approaches are technically correct. It would be great if we could get this fixed sooner rather than later. I'm acking Ales' patch because it's technically correct and fixes the problem. Personally I like that it unifies the "stale" condition for static and dynamic mac bindings: Acked-by: Dumitru Ceara <dce...@redhat.com> NOTE: I'd also ack the patch that Ilya suggested as an alternative as I like its simplicity and it seems technically correct too. So if we decide to go with the alternative, please keep my ack. But, as stated above, I'd really like it if we could get this bug fixed soon. Thank you both again for the good discussion! Best regards, Dumitru _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev