On 9/2/25 3:25 PM, Mark Michelson wrote: > On 9/1/25 10:48 AM, Dumitru Ceara wrote: >> On 9/1/25 4:25 PM, Ilya Maximets wrote: >>> On 9/1/25 4:19 PM, Dumitru Ceara wrote: >>>> On 9/1/25 4:17 PM, Dumitru Ceara wrote: >>>>> On 9/1/25 4:12 PM, Ilya Maximets wrote: >>>>>> On 9/1/25 3:33 PM, Dumitru Ceara wrote: >>>>>>> Hi Ilya, >>>>>>> >>>>>>> Thanks for the review! >>>>>>> >>>>>>> On 9/1/25 3:23 PM, Ilya Maximets wrote: >>>>>>>> On 9/1/25 1:22 PM, Dumitru Ceara wrote: >>>>>>>>> Since 6919992d8781 ("Datapath_Binding: Separate type column and >>>>>>>>> sync >>>>>>>>> NB.UUID to SB."), a new type column is added to SB datapaths to >>>>>>>>> differentiate between datapath types (switch vs router). The same >>>>>>>>> commit also changed northd such that newly created SB datapaths >>>>>>>>> have as >>>>>>>>> UUID the same value as the NB switch/router UUID. All SB.Datapath >>>>>>>>> records were also updated and their 'type' field populated >>>>>>>>> >>>>>>>>> It also added a helper function, >>>>>>>>> datapath_get_nb_uuid_and_type(), to >>>>>>>>> abstract out the extraction of a SB datapath's corresponding NB >>>>>>>>> UUID and >>>>>>>>> type in the following way: >>>>>>>>> - if the SB.Datapath.type field has a value, the helper assumed >>>>>>>>> that the >>>>>>>>> SB.Datapath.UUID is equal to the NB.Logical_Switch/ >>>>>>>>> Router.UUID so it >>>>>>>>> returned the 'type' value and the SB UUID >>>>>>>>> - else (for "old style") datapaths it extracted the NB UUID and >>>>>>>>> type >>>>>>>>> from the SB.Datapath.external_ids (as used to be done before >>>>>>>>> the >>>>>>>>> commit). >>>>>>>>> >>>>>>>>> This creates an upgrade issue thoughi: older (already existing >>>>>>>>> before >>>>>>>>> upgrade) SB datapaths are not immediately recreated and their >>>>>>>>> 'type' >>>>>>>>> is set; so the NB UUID value returned by >>>>>>>>> datapath_get_nb_uuid_and_type() >>>>>>>>> for these records is incorrect. >>>>>>>>> >>>>>>>>> That stays like that until datapaths are finally recreated (due >>>>>>>>> to other >>>>>>>>> events, e.g., full recompute of the datapath-sync nodes). All >>>>>>>>> that time >>>>>>>>> the lflow manager will incorrectly determine that the existing >>>>>>>>> lflows >>>>>>>>> that are stale. That's because the lflow manager uses >>>>>>>>> ovn_datapath_from_sbrec() which in turn relied on >>>>>>>>> datapath_get_nb_uuid_and_type() to determine the in-memory >>>>>>>>> ovn_datapath >>>>>>>>> corresponding to the logical flow. >>>>>>>>> >>>>>>>>> There's no need to avoid re-creating SB.Datapath bindings as >>>>>>>>> soon as >>>>>>>>> ovn-northd determines that they're not using the new scheme. >>>>>>>>> In order >>>>>>>>> to achieve that we add two sets of helpers: >>>>>>>>> a. one to be used by ovn-northd, >>>>>>>>> ovn_datapath_get_nb_uuid_and_type() >>>>>>>>> which only parses the new type of datapaths (everything >>>>>>>>> else, i.e. >>>>>>>>> old-style datapaths, will be considered stale) >>>>>>>>> b. one to be used by all other readers of SB datapaths (e.g., >>>>>>>>> ovn-controller, ovn-ic), >>>>>>>>> datapath_get_nb_uuid_and_type_legacy() >>>>>>>>> which relies on parsing external IDs (even in the new type of >>>>>>>>> datapaths external IDs are still populated with the NB UUID >>>>>>>>> in order >>>>>>>>> to maintain backwards compatibility). >>>>>>>> >>>>>>>> Can we ever switch all users to the non-leagacy? I'm not sure... >>>>>>>> Should probably be a TODO entry for this. >>>>>>>> >>>>>>> >>>>>>> We can, it should happen in the next LTS (26.03) though. And it >>>>>>> should >>>>>>> be the plan. >>>>>> >>>>>> How the transition is done? i.e. is that guaranteed that the >>>>>> database >>>>>> is fully translated to the new format after the next upgrade? Or >>>>>> was it >>>>>> decided that the full database rewrite on upgrade is not an issue >>>>>> and it >>>>>> works this way? >>>>>> >>>>> >>>>> Yes, it's guaranteed that after upgrade to 25.09.0 (or more recent) >>>>> the >>>>> database is fully translated. It was decided that it's not a >>>>> concern to >>>>> do that. Discussed here: >>>>> >>>>> https://mail.openvswitch.org/pipermail/ovs-dev/2025-August/425327.html >>> >>> OK. FWIW, in comparison with the schema upgrade situation, this >>> datapath >>> binding re-creation business may cause actual temporary traffic >>> disruption >>> as we'll delete all the dynamic MAC bindings. Is that the case? I >>> didn't >>> read it very carefully, but I don't see this being discussed in the >>> linked >>> thread. >>> >> >> You're right, dynamic MAC bindings have not been discussed. I guess we >> have a few options: >> a. leave it as is, dynamic MAC bindings will be recreated with the >> correct datapaths when needed, traffic will be buffered (for a bit) by >> ovn-controller >> b. try to update existing dynamic MAC bindings when their datapath is >> not accurate anymore (similar to what we do for static mac bindings) >> c. revert all the Datpath_Binding schema changes (type and UUID syncing) > > Sorry for the late reply (four day weekend for me). >
Hi Mark, No worries. > A fourth option would be to go with what I had done prior to v16 of the > datapath refactor series. Have a look at v15's patch here: https:// > patchwork.ozlabs.org/project/ovn/patch/20250811171752.3916543-2- > mmich...@redhat.com/ > > In that version, instead of syncing UUIDs from NB to SB, the SB datapath > binding has an "nb_uuid" column that contains the NB logical datapath > UUID. The datapath_get_nb_*() functions can distinguish "legacy" > southbound datapaths from new ones based on whether the nb_uuid column > is NULL or not. If it's NULL, then it's a legacy datapath binding, which > means we need to get the UUID and type from external-ids. If the column > is non-NULL, then we can guarantee that both the NB UUID and type are > populated and use those. > We'd still have to ensure the new nb_uuid column is optional to avoid active-backup upgrade issues (see patch 1/4 of this series). > Going with this option means that any SB records (logical flows, > mac_bindings, etc.) that point to datapath bindings do not need to be > rewritten during an upgrade, since the actual SB UUIDs will not change. > You also still have a simple way to link the SB datapath binding to the > NB logical datapath using the "nb_uuid" column. There's an extra > debugging step to go from the SB datapath binding UUID to the NB logical > datpath UUID, but IMO, it's better and more consistent than what we have > prior to 25.09 with the external-ids. > I think I'm OK with this approach. The only worry I have is how much of code we'd have to change given that the release is currently scheduled to happen this Friday, September 5th. I guess it would help if we had an implementation of "option d" you describe above in order to allow us to compare against "option b" I have implemented here https://github.com/dceara/ovn/commits/fix-northd-loop-v3/. What I don't like about "option b" is that it does rewrite most of the SB database on upgrade. >> >> Other SB records created by ovn-controller and referring to >> Datapath_bindings are: >> >> - IGMP Group - ovn-controller will recreate this from memory immediately >> after the datapath binding update so there might be some disruption but >> minimal >> - Learned_Route - ovn-controller will recreate them too but we don't >> really expect any customer deployments pre 25.09 with BGP enabled. >> >>>>> >>>>>>> Good point about adding a TODO. I can post a v3 or if >>>>>>> this is the only change that needs to happen maybe whoever merges >>>>>>> this >>>>>>> series can add the following incremental: >>>>>>> >>>>>>> diff --git a/TODO.rst b/TODO.rst >>>>>>> index a9fe3ec4e8..4c7aa0c772 100644 >>>>>>> --- a/TODO.rst >>>>>>> +++ b/TODO.rst >>>>>>> @@ -99,6 +99,12 @@ OVN To-do List >>>>>>> * Implement I-P for datapath groups. >>>>>>> * Implement I-P for route exchange relevant ports. >>>>>>> +* ovn-northd Incremental processing >>>>>>> + >>>>>>> + * Remove datapath_get_nb_uuid_and_type_legacy() in the next >>>>>>> LTS (26.03) as >>>> >>>> On second thought, this should be "after the next LTS (in 26.09)". >>> >>> Yep. >>> >>>> >>>>>>> + all deployments should be using the new >>>>>>> SB.Datapath_Binding.type column >>>>>>> + at that point. >>>>>>> + >>>>>>> * ovn-northd parallel logical flow processing >>>>>>> * Multi-threaded logical flow computation was optimized for >>>>>>> the case >>>>>>> --- >>>>>>> >>>>>>>> If not, then we should just drop the "new" way and stop duplicating >>>>>>>> the same information in two places. >>>>>>>> >>>>>>>> Best regards, Ilya Maximets. >>>>>>>> >>>>>>> >>>>>>> Regards, >>>>>>> Dumitru >>>>>>> >>>>>> >>>> >>> >> > Regards, Dumitru _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev