On 9/1/25 10:48 AM, Dumitru Ceara wrote:
On 9/1/25 4:25 PM, Ilya Maximets wrote:
On 9/1/25 4:19 PM, Dumitru Ceara wrote:
On 9/1/25 4:17 PM, Dumitru Ceara wrote:
On 9/1/25 4:12 PM, Ilya Maximets wrote:
On 9/1/25 3:33 PM, Dumitru Ceara wrote:
Hi Ilya,

Thanks for the review!

On 9/1/25 3:23 PM, Ilya Maximets wrote:
On 9/1/25 1:22 PM, Dumitru Ceara wrote:
Since 6919992d8781 ("Datapath_Binding: Separate type column and sync
NB.UUID to SB."), a new type column is added to SB datapaths to
differentiate between datapath types (switch vs router).  The same
commit also changed northd such that newly created SB datapaths have as
UUID the same value as the NB switch/router UUID.  All SB.Datapath
records were also updated and their 'type' field populated

It also added a helper function, datapath_get_nb_uuid_and_type(), to
abstract out the extraction of a SB datapath's corresponding NB UUID and
type in the following way:
- if the SB.Datapath.type field has a value, the helper assumed that the
   SB.Datapath.UUID is equal to the NB.Logical_Switch/Router.UUID so it
   returned the 'type' value and the SB UUID
- else (for "old style") datapaths it extracted the NB UUID and type
   from the SB.Datapath.external_ids (as used to be done before the
   commit).

This creates an upgrade issue thoughi: older (already existing before
upgrade) SB datapaths are not immediately recreated and their 'type'
is set; so the NB UUID value returned by datapath_get_nb_uuid_and_type()
for these records is incorrect.

That stays like that until datapaths are finally recreated (due to other
events, e.g., full recompute of the datapath-sync nodes).  All that time
the lflow manager will incorrectly determine that the existing lflows
that are stale.  That's because the lflow manager uses
ovn_datapath_from_sbrec() which in turn relied on
datapath_get_nb_uuid_and_type() to determine the in-memory ovn_datapath
corresponding to the logical flow.

There's no need to avoid re-creating SB.Datapath bindings as soon as
ovn-northd determines that they're not using the new scheme.  In order
to achieve that we add two sets of helpers:
a. one to be used by ovn-northd, ovn_datapath_get_nb_uuid_and_type()
    which only parses the new type of datapaths (everything else, i.e.
    old-style datapaths, will be considered stale)
b. one to be used by all other readers of SB datapaths (e.g.,
    ovn-controller, ovn-ic), datapath_get_nb_uuid_and_type_legacy()
    which relies on parsing external IDs (even in the new type of
    datapaths external IDs are still populated with the NB UUID in order
    to maintain backwards compatibility).

Can we ever switch all users to the non-leagacy?  I'm not sure...
Should probably be a TODO entry for this.


We can, it should happen in the next LTS (26.03) though.  And it should
be the plan.

How the transition is done?  i.e. is that guaranteed that the database
is fully translated to the new format after the next upgrade?  Or was it
decided that the full database rewrite on upgrade is not an issue and it
works this way?


Yes, it's guaranteed that after upgrade to 25.09.0 (or more recent) the
database is fully translated.  It was decided that it's not a concern to
do that.  Discussed here:

https://mail.openvswitch.org/pipermail/ovs-dev/2025-August/425327.html

OK.  FWIW, in comparison with the schema upgrade situation, this datapath
binding re-creation business may cause actual temporary traffic disruption
as we'll delete all the dynamic MAC bindings.  Is that the case?  I didn't
read it very carefully, but I don't see this being discussed in the linked
thread.


You're right, dynamic MAC bindings have not been discussed.  I guess we
have a few options:
a. leave it as is, dynamic MAC bindings will be recreated with the
correct datapaths when needed, traffic will be buffered (for a bit) by
ovn-controller
b. try to update existing dynamic MAC bindings when their datapath is
not accurate anymore (similar to what we do for static mac bindings)
c. revert all the Datpath_Binding schema changes (type and UUID syncing)

Sorry for the late reply (four day weekend for me).

A fourth option would be to go with what I had done prior to v16 of the datapath refactor series. Have a look at v15's patch here: https://patchwork.ozlabs.org/project/ovn/patch/20250811171752.3916543-2-mmich...@redhat.com/

In that version, instead of syncing UUIDs from NB to SB, the SB datapath binding has an "nb_uuid" column that contains the NB logical datapath UUID. The datapath_get_nb_*() functions can distinguish "legacy" southbound datapaths from new ones based on whether the nb_uuid column is NULL or not. If it's NULL, then it's a legacy datapath binding, which means we need to get the UUID and type from external-ids. If the column is non-NULL, then we can guarantee that both the NB UUID and type are populated and use those.

Going with this option means that any SB records (logical flows, mac_bindings, etc.) that point to datapath bindings do not need to be rewritten during an upgrade, since the actual SB UUIDs will not change. You also still have a simple way to link the SB datapath binding to the NB logical datapath using the "nb_uuid" column. There's an extra debugging step to go from the SB datapath binding UUID to the NB logical datpath UUID, but IMO, it's better and more consistent than what we have prior to 25.09 with the external-ids.


Other SB records created by ovn-controller and referring to
Datapath_bindings are:

- IGMP Group - ovn-controller will recreate this from memory immediately
after the datapath binding update so there might be some disruption but
minimal
- Learned_Route - ovn-controller will recreate them too but we don't
really expect any customer deployments pre 25.09 with BGP enabled.


Good point about adding a TODO.  I can post a v3 or if
this is the only change that needs to happen maybe whoever merges this
series can add the following incremental:

diff --git a/TODO.rst b/TODO.rst
index a9fe3ec4e8..4c7aa0c772 100644
--- a/TODO.rst
+++ b/TODO.rst
@@ -99,6 +99,12 @@ OVN To-do List
    * Implement I-P for datapath groups.
    * Implement I-P for route exchange relevant ports.
+* ovn-northd Incremental processing
+
+  * Remove datapath_get_nb_uuid_and_type_legacy() in the next LTS (26.03) as

On second thought, this should be "after the next LTS (in 26.09)".

Yep.


+    all deployments should be using the new SB.Datapath_Binding.type column
+    at that point.
+
  * ovn-northd parallel logical flow processing
* Multi-threaded logical flow computation was optimized for the case
---

If not, then we should just drop the "new" way and stop duplicating
the same information in two places.

Best regards, Ilya Maximets.


Regards,
Dumitru






_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to