On 8/29/25 12:22 PM, Ilya Maximets wrote: > On 8/29/25 9:27 AM, Dumitru Ceara wrote: >> On 8/28/25 11:55 PM, Ilya Maximets wrote: >>> On 8/28/25 11:41 PM, Ilya Maximets wrote: >>>> On 8/13/25 7:20 PM, Mark Michelson via dev wrote: >>>>> Prior to this commit, if you wanted to find the corresponding northbound >>>>> UUID for a southbound Logical Datapath, you could find it in one of two >>>>> places: >>>>> >>>>> For logical switches, it was in external-ids:logical-switch. >>>>> For logical routers, it was in external-ids:logical-router. >>>>> >>>>> With this commit, we are separating the type and UUID into separate >>>>> fields. This way, no matter the type of the datapath, you can find the >>>>> UUID. And you can find the type of the datapath without having to >>>>> potentially check multiple external-ids keys to do so. >>>>> >>>>> These fields are going to be used pretty heavily by northd in upcoming >>>>> patches, so instead of making them external-ids, these are now >>>>> full-fledged columns on the southbound Datapath_Binding. The UUID of the >>>>> northbound logical datapath is in a column called "nb_uuid" and the type >>>>> of the logical datapath is stored in an enumerated column called "type". >>>>> >>>>> Note that there is a seemingly-unrelated change in the check packet >>>>> length test in tests/ovn.at. This test started failing because its use >>>>> of `grep` was picking up the new "nb_uuid" column accidentally. The >>>>> change to the test uses `fetch_column` since it is more precise. >>>>> >>>>> Signed-off-by: Mark Michelson <mmich...@redhat.com> >>>>> Acked-by: Ales Musil <amu...@redhat.com> >>>>> --- >>>>> V17: >>>>> - cherry picked from v15 >>>>> --- >>>>> controller/local_data.c | 2 +- >>>>> ic/ovn-ic.c | 5 +++-- >>>>> lib/ovn-util.c | 45 +++++++++++++++++++++++++++++++++++++++++ >>>>> lib/ovn-util.h | 8 ++++++++ >>>>> northd/northd.c | 38 +++++++++++++++++----------------- >>>>> northd/northd.h | 5 +++-- >>>>> ovn-sb.ovsschema | 11 ++++++++-- >>>>> ovn-sb.xml | 21 ++++++++++--------- >>>>> tests/ovn-controller.at | 4 ++++ >>>>> tests/ovn-northd.at | 6 +++--- >>>>> tests/ovn.at | 6 ++---- >>>>> utilities/ovn-sbctl.c | 4 ++-- >>>>> utilities/ovn-trace.c | 3 +-- >>>>> 13 files changed, 112 insertions(+), 46 deletions(-) >>>> >>>> <snip> >>>> >>>>> /* Returns an "enum ovn_stage" built from the arguments. >>>>> diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema >>>>> index 4c24f5b51..8bc7bbba3 100644 >>>>> --- a/ovn-sb.ovsschema >>>>> +++ b/ovn-sb.ovsschema >>>>> @@ -1,7 +1,7 @@ >>>>> { >>>>> "name": "OVN_Southbound", >>>>> - "version": "21.2.0", >>>>> - "cksum": "29145795 34859", >>>>> + "version": "21.3.0", >>>>> + "cksum": "3179330761 35286", >>>>> "tables": { >>>>> "SB_Global": { >>>>> "columns": { >>>>> @@ -196,6 +196,13 @@ >>>>> "load_balancers": {"type": {"key": {"type": "uuid"}, >>>>> "min": 0, >>>>> "max": "unlimited"}}, >>>>> + "type": {"type": {"key": {"type": "string", >>>>> + "enum": ["set", >>>>> + ["logical-switch", >>>>> + >>>>> "logical-router"]]}}}, >>>> >>>> Hi, Mark, Dumitru, Ales. >>>> >> >> Hi Ilya, >> >>>> This change breaks upgrades, at least in active-backup database >>>> configuration. >>>> >>>> When the old database is converted to the new schema, the default value for >>>> the column is an empty string. But this is against the schema definition. >>>> This is not a problem for the database itself, as it doesn't check these >>>> constraints, but if we have the backup database connected, it will check >>>> the >>>> constraints upon receiving the new data and will fail: >>>> >>>> |replication|INFO|OVN_Southbound: Monitor reply received. Resetting the >>>> database. >>>> |00045|ovsdb_error|ERR|unexpected ovsdb error: constraint violation: "" is >>>> not >>>> one of the allowed values ([logical-router, >>>> logical-switch]) >>>> >>>> At this point the backup server will disconnect and turn into active server >>>> as ti thinks that the origin is broken. We probably need to allow this >>>> column to not have a value specified. >>>> >>>> It's fairly easy to reporoduce - get an older database and start a sandbox >>>> with it. The sb2 will error out and disconnect. >>>> >> >> Nice catch! I managed to reproduce the problem as following your >> suggestion locally: I started a 25.03 sandbox, ran ./ovn-setup.sh, saved >> the NB/SB and then started a sandbox on main with the saved DBs. >> >>>> Also, northd and ovn-controller should be able to handle this case where >>>> the column is not yet set after upgrade. I didn't check if that works or >>>> not. >> >> Yes, that part is handled correctly, see datapath_get_nb_uuid_and_type(). >> >> Are you planning to post a patch to fix this? I assume we just need to >> make the column value optional in the SB schema. > > I'll probably not have time for this today, so feel free to work on it. > Otherwise, I can find some time early next week. >
No worries, I posted a patch here: https://patchwork.ozlabs.org/project/ovn/patch/20250829140040.39933-1-dce...@redhat.com/ >> >>> >>> And it seems like northd is in the infinite lop of re-writing entire sb. >>> Not sure if it is related or some other issue. >>> >> >> Could it be that you didn't have the Static_Mac_Binding fix in your >> local main branch? I don't see the infinite loop in my reproducer. >> >> https://github.com/ovn-org/ovn/commit/e8a0ade > > I saw this on the latest main: > https://github.com/ovn-org/ovn/commit/b6ee938b9 > > So, both of the static MAC binding changes are there. > Upon further debugging there is a problem with static MAC bindings but that's not what is causing the loop. Ales just posted a separate patch for that. > It feels like northd removes datapaths with all the acssociated resources > and then re-adds them on the next iteration. E.g. I see the same static > mac binding deleted in one transaction and added back in the next and > so on in the loop, same with the logical flows and some other stuff. > >> >> If not, it might be good to share the DBs if possible. > > I saw this in one of the OpenShift DBs from OVN 24.03. I can share more > details on this particular database privately. > Thanks for the databases, I managed to reproduce this locally. Debugging further, I'll update once I know more. Regards, Dumitru >> >>>> >>>> Best regards, Ilya Maximets. >>> >> >> Regards, >> Dumitru >> > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev