On 9/1/25 4:48 PM, Dumitru Ceara wrote:
> On 9/1/25 4:25 PM, Ilya Maximets wrote:
>> On 9/1/25 4:19 PM, Dumitru Ceara wrote:
>>> On 9/1/25 4:17 PM, Dumitru Ceara wrote:
>>>> On 9/1/25 4:12 PM, Ilya Maximets wrote:
>>>>> On 9/1/25 3:33 PM, Dumitru Ceara wrote:
>>>>>> Hi Ilya,
>>>>>>
>>>>>> Thanks for the review!
>>>>>>
>>>>>> On 9/1/25 3:23 PM, Ilya Maximets wrote:
>>>>>>> On 9/1/25 1:22 PM, Dumitru Ceara wrote:
>>>>>>>> Since 6919992d8781 ("Datapath_Binding: Separate type column and sync
>>>>>>>> NB.UUID to SB."), a new type column is added to SB datapaths to
>>>>>>>> differentiate between datapath types (switch vs router).  The same
>>>>>>>> commit also changed northd such that newly created SB datapaths have as
>>>>>>>> UUID the same value as the NB switch/router UUID.  All SB.Datapath
>>>>>>>> records were also updated and their 'type' field populated
>>>>>>>>
>>>>>>>> It also added a helper function, datapath_get_nb_uuid_and_type(), to
>>>>>>>> abstract out the extraction of a SB datapath's corresponding NB UUID 
>>>>>>>> and
>>>>>>>> type in the following way:
>>>>>>>> - if the SB.Datapath.type field has a value, the helper assumed that 
>>>>>>>> the
>>>>>>>>   SB.Datapath.UUID is equal to the NB.Logical_Switch/Router.UUID so it
>>>>>>>>   returned the 'type' value and the SB UUID
>>>>>>>> - else (for "old style") datapaths it extracted the NB UUID and type
>>>>>>>>   from the SB.Datapath.external_ids (as used to be done before the
>>>>>>>>   commit).
>>>>>>>>
>>>>>>>> This creates an upgrade issue thoughi: older (already existing before
>>>>>>>> upgrade) SB datapaths are not immediately recreated and their 'type'
>>>>>>>> is set; so the NB UUID value returned by 
>>>>>>>> datapath_get_nb_uuid_and_type()
>>>>>>>> for these records is incorrect.
>>>>>>>>
>>>>>>>> That stays like that until datapaths are finally recreated (due to 
>>>>>>>> other
>>>>>>>> events, e.g., full recompute of the datapath-sync nodes).  All that 
>>>>>>>> time
>>>>>>>> the lflow manager will incorrectly determine that the existing lflows
>>>>>>>> that are stale.  That's because the lflow manager uses
>>>>>>>> ovn_datapath_from_sbrec() which in turn relied on
>>>>>>>> datapath_get_nb_uuid_and_type() to determine the in-memory ovn_datapath
>>>>>>>> corresponding to the logical flow.
>>>>>>>>
>>>>>>>> There's no need to avoid re-creating SB.Datapath bindings as soon as
>>>>>>>> ovn-northd determines that they're not using the new scheme.  In order
>>>>>>>> to achieve that we add two sets of helpers:
>>>>>>>> a. one to be used by ovn-northd, ovn_datapath_get_nb_uuid_and_type()
>>>>>>>>    which only parses the new type of datapaths (everything else, i.e.
>>>>>>>>    old-style datapaths, will be considered stale)
>>>>>>>> b. one to be used by all other readers of SB datapaths (e.g.,
>>>>>>>>    ovn-controller, ovn-ic), datapath_get_nb_uuid_and_type_legacy()
>>>>>>>>    which relies on parsing external IDs (even in the new type of
>>>>>>>>    datapaths external IDs are still populated with the NB UUID in order
>>>>>>>>    to maintain backwards compatibility).
>>>>>>>
>>>>>>> Can we ever switch all users to the non-leagacy?  I'm not sure...
>>>>>>> Should probably be a TODO entry for this.
>>>>>>>
>>>>>>
>>>>>> We can, it should happen in the next LTS (26.03) though.  And it should
>>>>>> be the plan.
>>>>>
>>>>> How the transition is done?  i.e. is that guaranteed that the database
>>>>> is fully translated to the new format after the next upgrade?  Or was it
>>>>> decided that the full database rewrite on upgrade is not an issue and it
>>>>> works this way?
>>>>>
>>>>
>>>> Yes, it's guaranteed that after upgrade to 25.09.0 (or more recent) the
>>>> database is fully translated.  It was decided that it's not a concern to
>>>> do that.  Discussed here:
>>>>
>>>> https://mail.openvswitch.org/pipermail/ovs-dev/2025-August/425327.html
>>
>> OK.  FWIW, in comparison with the schema upgrade situation, this datapath
>> binding re-creation business may cause actual temporary traffic disruption
>> as we'll delete all the dynamic MAC bindings.  Is that the case?  I didn't
>> read it very carefully, but I don't see this being discussed in the linked
>> thread.
>>
> 
> You're right, dynamic MAC bindings have not been discussed.  I guess we
> have a few options:
> a. leave it as is, dynamic MAC bindings will be recreated with the
> correct datapaths when needed, traffic will be buffered (for a bit) by
> ovn-controller

The main problem here is that all that traffic is going to ovn-controller
and, as practice shows (we had an issue like this earlier this year),
CoPP starts dropping a fair share of this traffic causing a noticeable
amount of lost packets and visible delays on TCP connections.

> b. try to update existing dynamic MAC bindings when their datapath is
> not accurate anymore (similar to what we do for static mac bindings)

Sounds risky in way that it's not that simple to track.  We may have a
chance of moving entries to a wrong datapath.  Do you have ideas on how
to do the transition cleanly in northd?

> c. revert all the Datpath_Binding schema changes (type and UUID syncing)

Maybe the safest option, but I'm not sure how much reverting this involves.

> 
> Other SB records created by ovn-controller and referring to
> Datapath_bindings are:
> 
> - IGMP Group - ovn-controller will recreate this from memory immediately
> after the datapath binding update so there might be some disruption but
> minimal

We should also remember that ovn-controller is handling the whole database
rewrite here as well.  So, it may take a bit longer than usual.

> - Learned_Route - ovn-controller will recreate them too but we don't
> really expect any customer deployments pre 25.09 with BGP enabled.

The feature was there since 25.03, so we may have some early users, I guess.

> 
>>>>
>>>>>> Good point about adding a TODO.  I can post a v3 or if
>>>>>> this is the only change that needs to happen maybe whoever merges this
>>>>>> series can add the following incremental:
>>>>>>
>>>>>> diff --git a/TODO.rst b/TODO.rst
>>>>>> index a9fe3ec4e8..4c7aa0c772 100644
>>>>>> --- a/TODO.rst
>>>>>> +++ b/TODO.rst
>>>>>> @@ -99,6 +99,12 @@ OVN To-do List
>>>>>>    * Implement I-P for datapath groups.
>>>>>>    * Implement I-P for route exchange relevant ports.
>>>>>>  
>>>>>> +* ovn-northd Incremental processing
>>>>>> +
>>>>>> +  * Remove datapath_get_nb_uuid_and_type_legacy() in the next LTS 
>>>>>> (26.03) as
>>>
>>> On second thought, this should be "after the next LTS (in 26.09)".
>>
>> Yep.
>>
>>>
>>>>>> +    all deployments should be using the new SB.Datapath_Binding.type 
>>>>>> column
>>>>>> +    at that point.
>>>>>> +
>>>>>>  * ovn-northd parallel logical flow processing
>>>>>>  
>>>>>>    * Multi-threaded logical flow computation was optimized for the case
>>>>>> ---
>>>>>>
>>>>>>> If not, then we should just drop the "new" way and stop duplicating
>>>>>>> the same information in two places.
>>>>>>>
>>>>>>> Best regards, Ilya Maximets.
>>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Dumitru
>>>>>>
>>>>>
>>>
>>
> 

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to