> On 22 Jul 2025, at 09:03, Felix Huettner <felix.huettner@stackit.cloud> wrote:
> 
> On Mon, Jul 21, 2025 at 04:03:13PM +0000, Odintsov Vladislav wrote:
>> Hello guys,
>> 
>> The ovn-ic route origin behavior was introduced by me, and the origin type 
>> (connected vs static) has a special meaning for learned routes.
>> 
>> Let me explain the logic. Consider a 2-AZ topology, where logical router LR1 
>> in az1 is connected to local router LR2 in az2 via a transit switch. These 
>> routers should provide a “distributed cross-az LR” abstraction: all networks 
>> connected to either of them should be reachable out of the box, and all user 
>> static routes should be advertised and reachable via the OVN-IC mechanism.
>> 
>> Similar to how “local” (non-IC) routers treat connected vs static 
>> routes, we should provide unified behavior between AZs. If we have 
>> 192.168.0.0/24 as a local network on an LRP and a static route for the 
>> same prefix in the same LR, the connected route should take effect, as in 
>> classic networking with administrative distance. This behavior was 
>> extended to ovn-ic-interconnected routers: connected routes should take 
>> precedence over static ones regardless of their location, remote or 
>> local.
>> 
>> So, it is incorrect to have interconnected LRs with overlapping LRP networks 
>> (you can, but this will harm routing to this prefix). The learning mechanism 
>> can be adjusted here, if needed, to not learn the same connected route.
>> 
>> If a user installs static routes with the same prefix in interconnected LRs 
>> in different AZs, each of these routes will be installed only locally: the 
>> route from the remote AZ will not be learned, in favor of the local static 
>> route.
>> 
>> If a user installs a static route whose prefix matches a connected (LRP 
>> network) route (even one from a remote AZ), this static route should have 
>> lower priority than the remote “connected” route. It will get into the LR 
>> pipeline, but will not affect traffic because it will never be hit.
>> 
>> This is why we need the “origin” field in IC routes and how we handle them.
>> 
>> This is a summary of the OVN docs and of “how it was designed to work”. Maybe 
>> some of this was broken over time in recent releases. I hope this 
>> clarifies the idea.
> 
> Hi Vladislav,
> 
> just to summarize if i got it right. Your proposal would be to have the
> following routing priority:
> 
> 1. ROUTE_SOURCE_CONNECTED
> 2. ROUTE_SOURCE_IC_CONNECTED
> 3. ROUTE_SOURCE_STATIC
> 4. ROUTE_SOURCE_IC_STATIC
> 
> where the ROUTE_SOURCE_IC.* are found based on the "origin" field.
> 
> Is that understanding correct?

Hi Felix,

In my understanding the current priorities are fine (connected and normal static 
routes have 2 different priorities regardless of their locality).

Things I see that could be fixed:
1. Skip learning a route with origin=connected if the same prefix already 
exists in the local AZ (as an LRP network).
2. Allow learning remote origin=connected routes in an ECMP manner (same as 
remote origin=static routes).
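
A minimal sketch of what item 1 could look like, in C. This is not ovn-ic 
code: the types and the names prefix_equal() and should_learn() are 
hypothetical, and only IPv4 prefixes are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only; not the real ovn-ic structures. */
enum route_origin { ORIGIN_CONNECTED, ORIGIN_STATIC };

struct prefix {
    uint32_t addr;      /* IPv4 network address, host byte order. */
    unsigned int plen;  /* Prefix length, 0..32. */
};

struct ic_route {
    struct prefix prefix;
    enum route_origin origin;
};

static bool
prefix_equal(const struct prefix *a, const struct prefix *b)
{
    assert(a->plen <= 32 && b->plen <= 32);
    return a->plen == b->plen && a->addr == b->addr;
}

/* Returns false for a remote origin=connected route whose prefix is already
 * configured on a local LRP, true otherwise.  Remote origin=static routes
 * are not filtered by this check. */
static bool
should_learn(const struct ic_route *remote,
             const struct prefix *lrp_networks, size_t n_lrp_networks)
{
    if (remote->origin != ORIGIN_CONNECTED) {
        return true;
    }
    for (size_t i = 0; i < n_lrp_networks; i++) {
        if (prefix_equal(&remote->prefix, &lrp_networks[i])) {
            return false;
        }
    }
    return true;
}
```

With 192.168.0.0/24 configured on a local LRP, a remote origin=connected 
route for 192.168.0.0/24 would be skipped, while a remote origin=connected 
route for 192.168.1.0/24 would still be learned.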

@Ilya, you were concerned about this kind of routing topology, but the same 
approach can be used to distribute traffic across multiple OVN-IC L3 Gateways 
(interconn): interconnect LRs from different AZs with multiple transit 
switches and use ECMP routes for traffic load distribution. Different LRPs can 
be scheduled on different L3 Gateways, so the traffic load between 2 LRs from 
different AZs is distributed across multiple nodes. This applies to both 
static routes and directly connected networks.
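
To illustrate the grouping rule this implies, here is a rough C sketch. It is 
a simplified model, not the northd implementation: struct route, the 
"learned" flag, route_is_groupable() and ecmp_group_size() are all 
hypothetical names. Local connected routes stay unique, while learned routes 
with the same prefix and source may form one ECMP group.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model; not the northd data structures. */
enum route_source { SRC_CONNECTED, SRC_STATIC };

struct route {
    uint32_t prefix;           /* IPv4 network, host byte order. */
    unsigned int plen;         /* Prefix length. */
    uint32_t nexthop;          /* Next hop, host byte order. */
    enum route_source source;
    bool learned;              /* True if the route was learned via ovn-ic. */
};

/* Local connected routes stay unique; everything else may be grouped. */
static bool
route_is_groupable(const struct route *r)
{
    return r->source != SRC_CONNECTED || r->learned;
}

/* Number of routes that would share an ECMP group with routes[idx],
 * including routes[idx] itself. */
static size_t
ecmp_group_size(const struct route *routes, size_t n, size_t idx)
{
    const struct route *r = &routes[idx];
    size_t size = 0;

    if (!route_is_groupable(r)) {
        return 1;
    }
    for (size_t i = 0; i < n; i++) {
        const struct route *o = &routes[i];

        if (route_is_groupable(o) && o->prefix == r->prefix
            && o->plen == r->plen && o->source == r->source) {
            size++;
        }
    }
    return size;
}
```

In this model, two origin=connected routes for 192.168.0.0/24 learned through 
two transit switches end up in one group of size 2. The sketch keys groups on 
prefix and source only, so it takes no position on whether learned and local 
static routes should share a group.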

> 
> Thanks,
> Felix
> 
>> 
>> regards,
>> Vladislav Odintsov
>> 
>>>> On 21 Jul 2025, at 16:51, Ilya Maximets <i.maxim...@ovn.org> wrote:
>>> On 7/21/25 2:41 PM, Felix Huettner wrote:
>>>> On Thu, Jul 17, 2025 at 01:50:56PM +0000, Smirnov Aleksandr (K2 Cloud) 
>>>> wrote:
>>>>> In case you decide the fix is not desired, ovn-nbctl must be fixed,
>>>>> because the current report is confusing: it says routes are ecmp while
>>>>> in fact they are not.
>>>> 
>>>> Hi everyone,
>>>> 
>>>> i just took a further look and discovered more chaos in the
>>>> "origin=connected" topic.
>>>> 
>>>> Currently treating it as ROUTE_SOURCE_CONNECTED also means that we will
>>>> prioritize these as normal connected routes and higher than
>>>> ROUTE_SOURCE_STATIC.
>>>> 
>>>> So if we assume the following routes:
>>>> * 192.168.0.0/24 via 10.0.10.10 as normal Logical_Router_Static_Route
>>>> * 192.168.0.0/24 via 10.0.11.11 as Logical_Router_Static_Route via
>>>> ovn-ic with origin=connected
>>>> * 192.168.0.0/24 via 10.0.12.12 as Logical_Router_Static_Route via
>>>> ovn-ic with origin=static
>>>> 
>>>> This currently results in the following effective routes (if i
>>>> understood it correctly):
>>>> * High priority: 192.168.0.0/24 via 10.0.11.11
>>>> * Low priority: 192.168.0.0/24 via ecmp of 10.0.10.10 and 10.0.12.12
>>>> 
>>>> This is honestly quite confusing from my perspective:
>>>> 1. why should an ovn-ic route be of higher priority than a local route
>>>> 2. why should ecmp work between ovn-ic and non-ic routes
>>>> 
>>>> From my view (of never using ovn-ic) i would have expected that the
>>>> ovn-ic routes are always of lower priority than non-ic routes.
>>>> 
>>>> What are your opinions on that?
>>> 
>>> Yeah, I'd say that IC routes should have lower general priority and
>>> should definitely not be added to ecmp groups together with the local
>>> routes as the cost is definitely not equal at this point.
>>> 
>>> I suppose, as you previously suggested, we need a set of different
>>> origin types, or a separate option that marks routes as learned
>>> through IC, so northd can adjust priorities accordingly.
>>> 
>>> Best regards, Ilya Maximets.
>>> 
>>>> 
>>>> Thanks a lot,
>>>> Felix
>>>> 
>>>>> On 7/17/25 4:16 PM, Ilya Maximets wrote:
>>>>>> Hrm, adding Felix back.
>>>>>> On 7/17/25 3:14 PM, Ilya Maximets wrote:
>>>>>>> On 7/17/25 11:56 AM, Felix Huettner wrote:
>>>>>>>> On Thu, Jul 17, 2025 at 11:29:24AM +0200, Ilya Maximets wrote:
>>>>>>>>> On 7/16/25 9:05 AM, Smirnov Aleksandr (K2 Cloud) wrote:
>>>>>>>>>> Hello,
>>>>>>>>>> I noticed a big difference in the flow generated by northd between
>>>>>>>>>> releases 24.09 and 25.03
>>>>>>>>>> In 25.03, northd fails to find similar routes and form an ecmp group.
>>>>>>>>>> I append following information:
>>>>>>>>>> 1. Testcase scenario that can be easily copy-pasted to ovn-ic.at
>>>>>>>>>> 2. Test output if ran in 24.09
>>>>>>>>>> 3. Test output if ran in 25.03
>>>>>>>>>> Could you please clarify whether this is a real issue?
>>>>>>>>> It looks like Felix made a change to never group "connected" routes,
>>>>>>>>> i.e. the learned routes, in commit:
>>>>>>>>>  f8924740f26e ("northd: Move connected routes to route engine.")
>>>>>>>>> The code that makes all such routes never be considered for grouping
>>>>>>>>> is the following:
>>>>>>>>> northd/en-group-ecmp-route.c:
>>>>>>>>> static void
>>>>>>>>> add_route(struct group_ecmp_datapath *gn, const struct parsed_route *pr)
>>>>>>>>> {
>>>>>>>>>    if (pr->source == ROUTE_SOURCE_CONNECTED) {
>>>>>>>>>        unique_routes_add(gn, pr);
>>>>>>>>>        return;
>>>>>>>>>    }
>>>>>>>>> ...
>>>>>>>>> All the routes learned from the other router through the transit switch
>>>>>>>>> have ROUTE_SOURCE_CONNECTED as their source and are not considered for
>>>>>>>>> ecmp grouping.  There is also a comment in the removal part:
>>>>>>>>>        if (pr->source == ROUTE_SOURCE_CONNECTED) {
>>>>>>>>>            /* Connected routes are never part of an ecmp group.
>>>>>>>>>             * We should recompute. */
>>>>>>>>>            return false;
>>>>>>>>>        }
>>>>>>>>> This makes me think that the change was intentional.
>>>>>>>> Hi Ilya, Hi Smirnov,
>>>>>>>> so i implemented it this way because i assumed that
>>>>>>>> ROUTE_SOURCE_CONNECTED means that this route is directly connected to
>>>>>>>> the local LR, i.e. that the LR has an interface that really has IPs out
>>>>>>>> of that network. In that case i never saw a way how one LR would have
>>>>>>>> multiple LRPs with the same network range. That just seemed like an
>>>>>>>> unrealistic case. So i decided to skip the ecmp grouping checks because
>>>>>>>> i thought this will just never happen.
>>>>>>>> However i just now saw that ROUTE_SOURCE_CONNECTED is actually also set
>>>>>>>> for the ic routes, where it seems to be used more for route
>>>>>>>> prioritization. So the guarantee that there can be no duplicate IPs
>>>>>>>> no longer holds.
>>>>>>>> Would it make sense to create ROUTE_SOURCE_ORIGIN_CONNECTED and
>>>>>>>> ROUTE_SOURCE_ORIGIN_STATIC and map the "origin" values to them? Then
>>>>>>>> grouping should work as expected. The ROUTE_SOURCE_ORIGIN_* values could
>>>>>>>> then also be covered by route_source_to_offset to prioritize them
>>>>>>>> correctly.
>>>>>>>>> But also, I'm not sure what is the end goal of this kind of setup.
>>>>>>>>> The underlying traffic through both transit switches will go through
>>>>>>>>> the same tunnels in the end, with just a slightly different metadata,
>>>>>>>>> so there is no real high-availability in this setup.  Or am I missing
>>>>>>>>> some other use case here?
>>>>>>>> You could also do ecmp to different destinations if you have 3 ovn
>>>>>>>> clusters. But i honestly see the point even less :)
>>>>>>>>> At the same time it seems a little arbitrary that learned routes can't
>>>>>>>>> form ecmp groups though.  Not sure why we have this seemingly
>>>>>>>>> artificial restriction.
>>>>>>>> For me it was just that i thought there is never a reason to group
>>>>>>>> them, so i just wanted to skip unnecessary further processing. But it
>>>>>>>> seems like that assumption no longer holds.
>>>>>>>> I hope that helps clarify it.
>>>>>>> Ack, thanks!  It seems like the issue only appears when ovn-ic copies
>>>>>>> "connected" routes from the other zone.  And unless we have multiple
>>>>>>> ports with the same subnet on the same router, we can only get these
>>>>>>> multiple routes when we learn the same route through multiple transit
>>>>>>> switches.  Which is a questionable topology.  So, I'm not sure if we
>>>>>>> actually need to fix that or not.
>>>>>>> Aleksandr, do you have a practical use case for this kind of topology?
>>>>>>>> Thanks a lot,
>>>>>>>> Felix
>>>>>>>>> What happens if we learn an actual ecmp route from the other router?
>>>>>>>>> i.e. if we have a real ecmp route to something external configured on
>>>>>>>>> one of the routers connected through a transit switch, will it be
>>>>>>>>> learned properly?  It sounds like it wouldn't...
>>>>>>> This is not really an issue: if it's a real statically configured ecmp
>>>>>>> route, then it will not be "connected" in the first place and will be
>>>>>>> properly grouped after learning it in the other zone, because ovn-ic just
>>>>>>> copies the "origin".  So, this is not a problem and the only questionable
>>>>>>> case is the actual learning of "connected" routes through different
>>>>>>> interconnects.
>>>>>>>>> Felix, do you have some comments on this one?
>>>>>>>>> Best regards, Ilya Maximets.
>>> 
>>> _______________________________________________
>>> dev mailing list
>>> d...@openvswitch.org
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev