On 8/29/25 12:22 PM, Ilya Maximets wrote:
> On 8/29/25 9:27 AM, Dumitru Ceara wrote:
>> On 8/28/25 11:55 PM, Ilya Maximets wrote:
>>> On 8/28/25 11:41 PM, Ilya Maximets wrote:
>>>> On 8/13/25 7:20 PM, Mark Michelson via dev wrote:
>>>>> Prior to this commit, if you wanted to find the corresponding northbound
>>>>> UUID for a southbound Logical Datapath, you could find it in one of two
>>>>> places:
>>>>>
>>>>> For logical switches, it was in external-ids:logical-switch.
>>>>> For logical routers, it was in external-ids:logical-router.
>>>>>
>>>>> With this commit, we are separating the type and UUID into separate
>>>>> fields. This way, no matter the type of the datapath, you can find the
>>>>> UUID. And you can find the type of the datapath without having to
>>>>> potentially check multiple external-ids keys to do so.
>>>>>
>>>>> These fields are going to be used pretty heavily by northd in upcoming
>>>>> patches, so instead of making them external-ids, these are now
>>>>> full-fledged columns on the southbound Datapath_Binding. The UUID of the
>>>>> northbound logical datapath is in a column called "nb_uuid" and the type
>>>>> of the logical datapath is stored in an enumerated column called "type".
>>>>>
>>>>> Note that there is a seemingly-unrelated change in the check packet
>>>>> length test in tests/ovn.at. This test started failing because its use
>>>>> of `grep` was picking up the new "nb_uuid" column accidentally. The
>>>>> change to the test uses `fetch_column` since it is more precise.
>>>>>
>>>>> Signed-off-by: Mark Michelson <mmich...@redhat.com>
>>>>> Acked-by: Ales Musil <amu...@redhat.com>
>>>>> ---
>>>>> V17:
>>>>> - cherry picked from v15
>>>>> ---
>>>>>  controller/local_data.c |  2 +-
>>>>>  ic/ovn-ic.c             |  5 +++--
>>>>>  lib/ovn-util.c          | 45 +++++++++++++++++++++++++++++++++++++++++
>>>>>  lib/ovn-util.h          |  8 ++++++++
>>>>>  northd/northd.c         | 38 +++++++++++++++++-----------------
>>>>>  northd/northd.h         |  5 +++--
>>>>>  ovn-sb.ovsschema        | 11 ++++++++--
>>>>>  ovn-sb.xml              | 21 ++++++++++---------
>>>>>  tests/ovn-controller.at |  4 ++++
>>>>>  tests/ovn-northd.at     |  6 +++---
>>>>>  tests/ovn.at            |  6 ++----
>>>>>  utilities/ovn-sbctl.c   |  4 ++--
>>>>>  utilities/ovn-trace.c   |  3 +--
>>>>>  13 files changed, 112 insertions(+), 46 deletions(-)
>>>>
>>>> <snip>
>>>>
>>>>>  /* Returns an "enum ovn_stage" built from the arguments.
>>>>> diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema
>>>>> index 4c24f5b51..8bc7bbba3 100644
>>>>> --- a/ovn-sb.ovsschema
>>>>> +++ b/ovn-sb.ovsschema
>>>>> @@ -1,7 +1,7 @@
>>>>>  {
>>>>>      "name": "OVN_Southbound",
>>>>> -    "version": "21.2.0",
>>>>> -    "cksum": "29145795 34859",
>>>>> +    "version": "21.3.0",
>>>>> +    "cksum": "3179330761 35286",
>>>>>      "tables": {
>>>>>          "SB_Global": {
>>>>>              "columns": {
>>>>> @@ -196,6 +196,13 @@
>>>>>                  "load_balancers": {"type": {"key": {"type": "uuid"},
>>>>>                                              "min": 0,
>>>>>                                              "max": "unlimited"}},
>>>>> +                "type": {"type": {"key": {"type": "string",
>>>>> +                                          "enum": ["set",
>>>>> +                                                      ["logical-switch",
>>>>> +                                                       
>>>>> "logical-router"]]}}},
>>>>
>>>> Hi, Mark, Dumitru, Ales.
>>>>
>>
>> Hi Ilya,
>>
>>>> This change breaks upgrades, at least in active-backup database 
>>>> configuration.
>>>>
>>>> When the old database is converted to the new schema, the default value for
>>>> the column is an empty string.  But this is against the schema definition.
>>>> This is not a problem for the database itself, as it doesn't check these
>>>> constraints, but if we have the backup database connected, it will check 
>>>> the
>>>> constraints upon receiving the new data and will fail:
>>>>
>>>> |replication|INFO|OVN_Southbound: Monitor reply received. Resetting the 
>>>> database.
>>>> |00045|ovsdb_error|ERR|unexpected ovsdb error: constraint violation: "" is 
>>>> not
>>>>                        one of the allowed values ([logical-router, 
>>>> logical-switch])
>>>>
>>>> At this point the backup server will disconnect and turn into active server
>>>> as ti thinks that the origin is broken.  We probably need to allow this
>>>> column to not have a value specified.
>>>>
>>>> It's fairly easy to reporoduce - get an older database and start a sandbox
>>>> with it.  The sb2 will error out and disconnect.
>>>>
>>
>> Nice catch!  I managed to reproduce the problem as following your
>> suggestion locally: I started a 25.03 sandbox, ran ./ovn-setup.sh, saved
>> the NB/SB and then started a sandbox on main with the saved DBs.
>>
>>>> Also, northd and ovn-controller should be able to handle this case where
>>>> the column is not yet set after upgrade.  I didn't check if that works or
>>>> not.
>>
>> Yes, that part is handled correctly, see datapath_get_nb_uuid_and_type().
>>
>> Are you planning to post a patch to fix this?  I assume we just need to
>> make the column value optional in the SB schema.
> 
> I'll probably not have time for this today, so feel free to work on it.
> Otherwise, I can find some time early next week.
> 

No worries, I posted a patch here:
https://patchwork.ozlabs.org/project/ovn/patch/20250829140040.39933-1-dce...@redhat.com/

>>
>>>
>>> And it seems like northd is in the infinite lop of re-writing entire sb.
>>> Not sure if it is related or some other issue.
>>>
>>
>> Could it be that you didn't have the Static_Mac_Binding fix in your
>> local main branch?  I don't see the infinite loop in my reproducer.
>>
>> https://github.com/ovn-org/ovn/commit/e8a0ade
> 
> I saw this on the latest main:
> https://github.com/ovn-org/ovn/commit/b6ee938b9
> 
> So, both of the static MAC binding changes are there.
> 

Upon further debugging there is a problem with static MAC bindings but
that's not what is causing the loop.  Ales just posted a separate patch
for that.

> It feels like northd removes datapaths with all the acssociated resources
> and then re-adds them on the next iteration.  E.g. I see the same static
> mac binding deleted in one transaction and added back in the next and
> so on in the loop, same with the logical flows and some other stuff.
> 
>>
>> If not, it might be good to share the DBs if possible.
> 
> I saw this in one of the OpenShift DBs from OVN 24.03.  I can share more
> details on this particular database privately.
> 

Thanks for the databases, I managed to reproduce this locally.
Debugging further, I'll update once I know more.

Regards,
Dumitru

>>
>>>>
>>>> Best regards, Ilya Maximets.
>>>
>>
>> Regards,
>> Dumitru
>>
> 

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to