On 9/1/25 4:48 PM, Dumitru Ceara wrote: > On 9/1/25 4:25 PM, Ilya Maximets wrote: >> On 9/1/25 4:19 PM, Dumitru Ceara wrote: >>> On 9/1/25 4:17 PM, Dumitru Ceara wrote: >>>> On 9/1/25 4:12 PM, Ilya Maximets wrote: >>>>> On 9/1/25 3:33 PM, Dumitru Ceara wrote: >>>>>> Hi Ilya, >>>>>> >>>>>> Thanks for the review! >>>>>> >>>>>> On 9/1/25 3:23 PM, Ilya Maximets wrote: >>>>>>> On 9/1/25 1:22 PM, Dumitru Ceara wrote: >>>>>>>> Since 6919992d8781 ("Datapath_Binding: Separate type column and sync >>>>>>>> NB.UUID to SB."), a new type column is added to SB datapaths to >>>>>>>> differentiate between datapath types (switch vs router). The same >>>>>>>> commit also changed northd such that newly created SB datapaths have as >>>>>>>> UUID the same value as the NB switch/router UUID. All SB.Datapath >>>>>>>> records were also updated and their 'type' field populated >>>>>>>> >>>>>>>> It also added a helper function, datapath_get_nb_uuid_and_type(), to >>>>>>>> abstract out the extraction of a SB datapath's corresponding NB UUID >>>>>>>> and >>>>>>>> type in the following way: >>>>>>>> - if the SB.Datapath.type field has a value, the helper assumed that >>>>>>>> the >>>>>>>> SB.Datapath.UUID is equal to the NB.Logical_Switch/Router.UUID so it >>>>>>>> returned the 'type' value and the SB UUID >>>>>>>> - else (for "old style") datapaths it extracted the NB UUID and type >>>>>>>> from the SB.Datapath.external_ids (as used to be done before the >>>>>>>> commit). >>>>>>>> >>>>>>>> This creates an upgrade issue thoughi: older (already existing before >>>>>>>> upgrade) SB datapaths are not immediately recreated and their 'type' >>>>>>>> is set; so the NB UUID value returned by >>>>>>>> datapath_get_nb_uuid_and_type() >>>>>>>> for these records is incorrect. >>>>>>>> >>>>>>>> That stays like that until datapaths are finally recreated (due to >>>>>>>> other >>>>>>>> events, e.g., full recompute of the datapath-sync nodes). All that >>>>>>>> time >>>>>>>> the lflow manager will incorrectly determine that the existing lflows >>>>>>>> that are stale. That's because the lflow manager uses >>>>>>>> ovn_datapath_from_sbrec() which in turn relied on >>>>>>>> datapath_get_nb_uuid_and_type() to determine the in-memory ovn_datapath >>>>>>>> corresponding to the logical flow. >>>>>>>> >>>>>>>> There's no need to avoid re-creating SB.Datapath bindings as soon as >>>>>>>> ovn-northd determines that they're not using the new scheme. In order >>>>>>>> to achieve that we add two sets of helpers: >>>>>>>> a. one to be used by ovn-northd, ovn_datapath_get_nb_uuid_and_type() >>>>>>>> which only parses the new type of datapaths (everything else, i.e. >>>>>>>> old-style datapaths, will be considered stale) >>>>>>>> b. one to be used by all other readers of SB datapaths (e.g., >>>>>>>> ovn-controller, ovn-ic), datapath_get_nb_uuid_and_type_legacy() >>>>>>>> which relies on parsing external IDs (even in the new type of >>>>>>>> datapaths external IDs are still populated with the NB UUID in order >>>>>>>> to maintain backwards compatibility). >>>>>>> >>>>>>> Can we ever switch all users to the non-leagacy? I'm not sure... >>>>>>> Should probably be a TODO entry for this. >>>>>>> >>>>>> >>>>>> We can, it should happen in the next LTS (26.03) though. And it should >>>>>> be the plan. >>>>> >>>>> How the transition is done? i.e. is that guaranteed that the database >>>>> is fully translated to the new format after the next upgrade? Or was it >>>>> decided that the full database rewrite on upgrade is not an issue and it >>>>> works this way? >>>>> >>>> >>>> Yes, it's guaranteed that after upgrade to 25.09.0 (or more recent) the >>>> database is fully translated. It was decided that it's not a concern to >>>> do that. Discussed here: >>>> >>>> https://mail.openvswitch.org/pipermail/ovs-dev/2025-August/425327.html >> >> OK. FWIW, in comparison with the schema upgrade situation, this datapath >> binding re-creation business may cause actual temporary traffic disruption >> as we'll delete all the dynamic MAC bindings. Is that the case? I didn't >> read it very carefully, but I don't see this being discussed in the linked >> thread. >> > > You're right, dynamic MAC bindings have not been discussed. I guess we > have a few options: > a. leave it as is, dynamic MAC bindings will be recreated with the > correct datapaths when needed, traffic will be buffered (for a bit) by > ovn-controller
The main problem here is that all that traffic is going to ovn-controller and, as practice shows (we had an issue like this earlier this year), CoPP starts dropping a fair share of this traffic causing a noticeable amount of lost packets and visible delays on TCP connections. > b. try to update existing dynamic MAC bindings when their datapath is > not accurate anymore (similar to what we do for static mac bindings) Sounds risky in way that it's not that simple to track. We may have a chance of moving entries to a wrong datapath. Do you have ideas on how to do the transition cleanly in northd? > c. revert all the Datpath_Binding schema changes (type and UUID syncing) Maybe the safest option, but I'm not sure how much reverting this involves. > > Other SB records created by ovn-controller and referring to > Datapath_bindings are: > > - IGMP Group - ovn-controller will recreate this from memory immediately > after the datapath binding update so there might be some disruption but > minimal We should also remember that ovn-controller is handling the whole database rewrite here as well. So, it may take a bit longer than usual. > - Learned_Route - ovn-controller will recreate them too but we don't > really expect any customer deployments pre 25.09 with BGP enabled. The feature was there since 25.03, so we may have some early users, I guess. > >>>> >>>>>> Good point about adding a TODO. I can post a v3 or if >>>>>> this is the only change that needs to happen maybe whoever merges this >>>>>> series can add the following incremental: >>>>>> >>>>>> diff --git a/TODO.rst b/TODO.rst >>>>>> index a9fe3ec4e8..4c7aa0c772 100644 >>>>>> --- a/TODO.rst >>>>>> +++ b/TODO.rst >>>>>> @@ -99,6 +99,12 @@ OVN To-do List >>>>>> * Implement I-P for datapath groups. >>>>>> * Implement I-P for route exchange relevant ports. >>>>>> >>>>>> +* ovn-northd Incremental processing >>>>>> + >>>>>> + * Remove datapath_get_nb_uuid_and_type_legacy() in the next LTS >>>>>> (26.03) as >>> >>> On second thought, this should be "after the next LTS (in 26.09)". >> >> Yep. >> >>> >>>>>> + all deployments should be using the new SB.Datapath_Binding.type >>>>>> column >>>>>> + at that point. >>>>>> + >>>>>> * ovn-northd parallel logical flow processing >>>>>> >>>>>> * Multi-threaded logical flow computation was optimized for the case >>>>>> --- >>>>>> >>>>>>> If not, then we should just drop the "new" way and stop duplicating >>>>>>> the same information in two places. >>>>>>> >>>>>>> Best regards, Ilya Maximets. >>>>>>> >>>>>> >>>>>> Regards, >>>>>> Dumitru >>>>>> >>>>> >>> >> > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev