On 9/2/25 3:25 PM, Mark Michelson wrote:
> On 9/1/25 10:48 AM, Dumitru Ceara wrote:
>> On 9/1/25 4:25 PM, Ilya Maximets wrote:
>>> On 9/1/25 4:19 PM, Dumitru Ceara wrote:
>>>> On 9/1/25 4:17 PM, Dumitru Ceara wrote:
>>>>> On 9/1/25 4:12 PM, Ilya Maximets wrote:
>>>>>> On 9/1/25 3:33 PM, Dumitru Ceara wrote:
>>>>>>> Hi Ilya,
>>>>>>>
>>>>>>> Thanks for the review!
>>>>>>>
>>>>>>> On 9/1/25 3:23 PM, Ilya Maximets wrote:
>>>>>>>> On 9/1/25 1:22 PM, Dumitru Ceara wrote:
>>>>>>>>> Since 6919992d8781 ("Datapath_Binding: Separate type column and
>>>>>>>>> sync
>>>>>>>>> NB.UUID to SB."), a new type column is added to SB datapaths to
>>>>>>>>> differentiate between datapath types (switch vs router).  The same
>>>>>>>>> commit also changed northd such that newly created SB datapaths
>>>>>>>>> have as
>>>>>>>>> UUID the same value as the NB switch/router UUID.  All SB.Datapath
>>>>>>>>> records were also updated and their 'type' field populated
>>>>>>>>>
>>>>>>>>> It also added a helper function,
>>>>>>>>> datapath_get_nb_uuid_and_type(), to
>>>>>>>>> abstract out the extraction of a SB datapath's corresponding NB
>>>>>>>>> UUID and
>>>>>>>>> type in the following way:
>>>>>>>>> - if the SB.Datapath.type field has a value, the helper assumed
>>>>>>>>> that the
>>>>>>>>>    SB.Datapath.UUID is equal to the NB.Logical_Switch/
>>>>>>>>> Router.UUID so it
>>>>>>>>>    returned the 'type' value and the SB UUID
>>>>>>>>> - else (for "old style") datapaths it extracted the NB UUID and
>>>>>>>>> type
>>>>>>>>>    from the SB.Datapath.external_ids (as used to be done before
>>>>>>>>> the
>>>>>>>>>    commit).
>>>>>>>>>
>>>>>>>>> This creates an upgrade issue thoughi: older (already existing
>>>>>>>>> before
>>>>>>>>> upgrade) SB datapaths are not immediately recreated and their
>>>>>>>>> 'type'
>>>>>>>>> is set; so the NB UUID value returned by
>>>>>>>>> datapath_get_nb_uuid_and_type()
>>>>>>>>> for these records is incorrect.
>>>>>>>>>
>>>>>>>>> That stays like that until datapaths are finally recreated (due
>>>>>>>>> to other
>>>>>>>>> events, e.g., full recompute of the datapath-sync nodes).  All
>>>>>>>>> that time
>>>>>>>>> the lflow manager will incorrectly determine that the existing
>>>>>>>>> lflows
>>>>>>>>> that are stale.  That's because the lflow manager uses
>>>>>>>>> ovn_datapath_from_sbrec() which in turn relied on
>>>>>>>>> datapath_get_nb_uuid_and_type() to determine the in-memory
>>>>>>>>> ovn_datapath
>>>>>>>>> corresponding to the logical flow.
>>>>>>>>>
>>>>>>>>> There's no need to avoid re-creating SB.Datapath bindings as
>>>>>>>>> soon as
>>>>>>>>> ovn-northd determines that they're not using the new scheme. 
>>>>>>>>> In order
>>>>>>>>> to achieve that we add two sets of helpers:
>>>>>>>>> a. one to be used by ovn-northd,
>>>>>>>>> ovn_datapath_get_nb_uuid_and_type()
>>>>>>>>>     which only parses the new type of datapaths (everything
>>>>>>>>> else, i.e.
>>>>>>>>>     old-style datapaths, will be considered stale)
>>>>>>>>> b. one to be used by all other readers of SB datapaths (e.g.,
>>>>>>>>>     ovn-controller, ovn-ic),
>>>>>>>>> datapath_get_nb_uuid_and_type_legacy()
>>>>>>>>>     which relies on parsing external IDs (even in the new type of
>>>>>>>>>     datapaths external IDs are still populated with the NB UUID
>>>>>>>>> in order
>>>>>>>>>     to maintain backwards compatibility).
>>>>>>>>
>>>>>>>> Can we ever switch all users to the non-leagacy?  I'm not sure...
>>>>>>>> Should probably be a TODO entry for this.
>>>>>>>>
>>>>>>>
>>>>>>> We can, it should happen in the next LTS (26.03) though.  And it
>>>>>>> should
>>>>>>> be the plan.
>>>>>>
>>>>>> How the transition is done?  i.e. is that guaranteed that the
>>>>>> database
>>>>>> is fully translated to the new format after the next upgrade?  Or
>>>>>> was it
>>>>>> decided that the full database rewrite on upgrade is not an issue
>>>>>> and it
>>>>>> works this way?
>>>>>>
>>>>>
>>>>> Yes, it's guaranteed that after upgrade to 25.09.0 (or more recent)
>>>>> the
>>>>> database is fully translated.  It was decided that it's not a
>>>>> concern to
>>>>> do that.  Discussed here:
>>>>>
>>>>> https://mail.openvswitch.org/pipermail/ovs-dev/2025-August/425327.html
>>>
>>> OK.  FWIW, in comparison with the schema upgrade situation, this
>>> datapath
>>> binding re-creation business may cause actual temporary traffic
>>> disruption
>>> as we'll delete all the dynamic MAC bindings.  Is that the case?  I
>>> didn't
>>> read it very carefully, but I don't see this being discussed in the
>>> linked
>>> thread.
>>>
>>
>> You're right, dynamic MAC bindings have not been discussed.  I guess we
>> have a few options:
>> a. leave it as is, dynamic MAC bindings will be recreated with the
>> correct datapaths when needed, traffic will be buffered (for a bit) by
>> ovn-controller
>> b. try to update existing dynamic MAC bindings when their datapath is
>> not accurate anymore (similar to what we do for static mac bindings)
>> c. revert all the Datpath_Binding schema changes (type and UUID syncing)
> 
> Sorry for the late reply (four day weekend for me).
> 

Hi Mark,

No worries.

> A fourth option would be to go with what I had done prior to v16 of the
> datapath refactor series. Have a look at v15's patch here: https://
> patchwork.ozlabs.org/project/ovn/patch/20250811171752.3916543-2-
> mmich...@redhat.com/
> 
> In that version, instead of syncing UUIDs from NB to SB, the SB datapath
> binding has an "nb_uuid" column that contains the NB logical datapath
> UUID. The datapath_get_nb_*() functions can distinguish "legacy"
> southbound datapaths from new ones based on whether the nb_uuid column
> is NULL or not. If it's NULL, then it's a legacy datapath binding, which
> means we need to get the UUID and type from external-ids. If the column
> is non-NULL, then we can guarantee that both the NB UUID and type are
> populated and use those.
> 

We'd still have to ensure the new nb_uuid column is optional to avoid
active-backup upgrade issues (see patch 1/4 of this series).

> Going with this option means that any SB records (logical flows,
> mac_bindings, etc.) that point to datapath bindings do not need to be
> rewritten during an upgrade, since the actual SB UUIDs will not change.
> You also still have a simple way to link the SB datapath binding to the
> NB logical datapath using the "nb_uuid" column. There's an extra
> debugging step to go from the SB datapath binding UUID to the NB logical
> datpath UUID, but IMO, it's better and more consistent than what we have
> prior to 25.09 with the external-ids.
> 

I think I'm OK with this approach.  The only worry I have is how much of
code we'd have to change given that the release is currently scheduled
to happen this Friday, September 5th.

I guess it would help if we had an implementation of "option d" you
describe above in order to allow us to compare against "option b" I have
implemented here https://github.com/dceara/ovn/commits/fix-northd-loop-v3/.

What I don't like about "option b" is that it does rewrite most of the
SB database on upgrade.

>>
>> Other SB records created by ovn-controller and referring to
>> Datapath_bindings are:
>>
>> - IGMP Group - ovn-controller will recreate this from memory immediately
>> after the datapath binding update so there might be some disruption but
>> minimal
>> - Learned_Route - ovn-controller will recreate them too but we don't
>> really expect any customer deployments pre 25.09 with BGP enabled.
>>
>>>>>
>>>>>>> Good point about adding a TODO.  I can post a v3 or if
>>>>>>> this is the only change that needs to happen maybe whoever merges
>>>>>>> this
>>>>>>> series can add the following incremental:
>>>>>>>
>>>>>>> diff --git a/TODO.rst b/TODO.rst
>>>>>>> index a9fe3ec4e8..4c7aa0c772 100644
>>>>>>> --- a/TODO.rst
>>>>>>> +++ b/TODO.rst
>>>>>>> @@ -99,6 +99,12 @@ OVN To-do List
>>>>>>>     * Implement I-P for datapath groups.
>>>>>>>     * Implement I-P for route exchange relevant ports.
>>>>>>>   +* ovn-northd Incremental processing
>>>>>>> +
>>>>>>> +  * Remove datapath_get_nb_uuid_and_type_legacy() in the next
>>>>>>> LTS (26.03) as
>>>>
>>>> On second thought, this should be "after the next LTS (in 26.09)".
>>>
>>> Yep.
>>>
>>>>
>>>>>>> +    all deployments should be using the new
>>>>>>> SB.Datapath_Binding.type column
>>>>>>> +    at that point.
>>>>>>> +
>>>>>>>   * ovn-northd parallel logical flow processing
>>>>>>>       * Multi-threaded logical flow computation was optimized for
>>>>>>> the case
>>>>>>> ---
>>>>>>>
>>>>>>>> If not, then we should just drop the "new" way and stop duplicating
>>>>>>>> the same information in two places.
>>>>>>>>
>>>>>>>> Best regards, Ilya Maximets.
>>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Dumitru
>>>>>>>
>>>>>>
>>>>
>>>
>>
> 

Regards,
Dumitru

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to