If you decide the fix is not desired, ovn-nbctl must still be fixed,
because its current report is confusing: it says the routes are ecmp when
in fact they are not.

On 7/17/25 4:16 PM, Ilya Maximets wrote:
> Hrm, adding Felix back.
>
> On 7/17/25 3:14 PM, Ilya Maximets wrote:
>> On 7/17/25 11:56 AM, Felix Huettner wrote:
>>> On Thu, Jul 17, 2025 at 11:29:24AM +0200, Ilya Maximets wrote:
>>>> On 7/16/25 9:05 AM, Smirnov Aleksandr (K2 Cloud) wrote:
>>>>> Hello,
>>>>>
>>>>> I noticed a big difference in the flow generated by northd between
>>>>> releases 24.09 and 25.03
>>>>>
>>>>> In 25.03, northd fails to find similar routes and form an ecmp group.
>>>>>
>>>>> I am attaching the following information:
>>>>>
>>>>> 1. A test case scenario that can easily be copy-pasted into ovn-ic.at
>>>>>
>>>>> 2. Test output when run on 24.09
>>>>>
>>>>> 3. Test output when run on 25.03
>>>>>
>>>>> Could you please clarify whether this is a real issue?
>>>> It looks like Felix made a change to never group "connected" routes,
>>>> i.e. the learned routes, in commit:
>>>>    f8924740f26e ("northd: Move connected routes to route engine.")
>>>>
>>>> The code that excludes all such routes from grouping is
>>>> the following:
>>>>
>>>> northd/en-group-ecmp-route.c:
>>>> static void
>>>> add_route(struct group_ecmp_datapath *gn, const struct parsed_route *pr)
>>>> {
>>>>      if (pr->source == ROUTE_SOURCE_CONNECTED) {
>>>>          unique_routes_add(gn, pr);
>>>>          return;
>>>>      }
>>>> ...
>>>>
>>>> All the routes learned from the other router through the transit switch
>>>> have ROUTE_SOURCE_CONNECTED as their source and are not considered for
>>>> ecmp grouping.  There is also a comment in the removal part:
>>>>
>>>>          if (pr->source == ROUTE_SOURCE_CONNECTED) {
>>>>              /* Connected routes are never part of an ecmp group.
>>>>               * We should recompute. */
>>>>              return false;
>>>>          }
>>>>
>>>> This makes me think that the change was intentional.
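To make the effect of that early return concrete, here is a minimal sketch of the two behaviours. It is a simplified model, not northd code: the `sketch_*` names and types are hypothetical, and it assumes grouping keys only on prefix and prefix length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the real northd types. */
enum sketch_source {
    SKETCH_CONNECTED,
    SKETCH_STATIC,
};

struct sketch_route {
    uint32_t prefix;            /* IPv4 prefix in host byte order. */
    unsigned int plen;          /* Prefix length. */
    enum sketch_source source;
};

/* Returns true if 'r' takes part in grouping under the given policy. */
static bool
sketch_is_candidate(const struct sketch_route *r, bool skip_connected)
{
    return !(skip_connected && r->source == SKETCH_CONNECTED);
}

/* Counts routes that share prefix and length with at least one other
 * candidate route, i.e. routes that would land in an ecmp group. */
static size_t
sketch_count_grouped(const struct sketch_route *routes, size_t n,
                     bool skip_connected)
{
    size_t grouped = 0;

    assert(routes || !n);
    for (size_t i = 0; i < n; i++) {
        if (!sketch_is_candidate(&routes[i], skip_connected)) {
            continue;   /* Analogous to the early return in add_route(). */
        }
        for (size_t j = 0; j < n; j++) {
            if (j != i
                && sketch_is_candidate(&routes[j], skip_connected)
                && routes[j].prefix == routes[i].prefix
                && routes[j].plen == routes[i].plen) {
                grouped++;
                break;
            }
        }
    }
    return grouped;
}
```

With two "connected" routes for the same prefix (the case of one route learned through two transit switches), `skip_connected = true` reports no grouped routes, while `skip_connected = false` reports both as grouped.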
>>> Hi Ilya, Hi Smirnov,
>>>
>>> I implemented it this way because I assumed that
>>> ROUTE_SOURCE_CONNECTED means the route is directly connected to
>>> the local LR, i.e. the LR has an interface that really has IPs out of
>>> that network. In that case I never saw a way for one LR to have
>>> multiple LRPs with the same network range. That just seemed like an
>>> unrealistic case. So I decided to skip the ecmp grouping checks because
>>> I thought this would just never happen.
>>>
>>> However, I just now saw that ROUTE_SOURCE_CONNECTED is actually also set
>>> for the ic routes, where it seems to be used more for route
>>> prioritization. So the guarantee that there can be no duplicate IPs
>>> no longer holds.
>>>
>>> Would it make sense to create ROUTE_SOURCE_ORIGIN_CONNECTED and
>>> ROUTE_SOURCE_ORIGIN_STATIC and map the "origin" values to them?  Then
>>> grouping should work as expected, and the ROUTE_SOURCE_ORIGIN_* values
>>> could also be covered by route_source_to_offset to prioritize them
>>> correctly.
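A rough sketch of what that proposal could look like, under stated assumptions: the `sketch_*` names are hypothetical, only the two ROUTE_SOURCE_ORIGIN_* ideas come from the message above, and falling back to "static" for an unknown or missing origin is purely an assumption made for illustration. Priorities (route_source_to_offset) are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical enum for illustration; not the real northd route_source. */
enum sketch_route_source {
    SKETCH_SOURCE_CONNECTED,         /* Directly connected to the local LR. */
    SKETCH_SOURCE_STATIC,            /* Locally configured static route. */
    SKETCH_SOURCE_ORIGIN_CONNECTED,  /* Learned via ovn-ic, origin "connected". */
    SKETCH_SOURCE_ORIGIN_STATIC,     /* Learned via ovn-ic, origin "static". */
};

/* Maps the "origin" value of a learned route to a dedicated source.
 * Assumption: anything else is treated as a plain static route. */
static enum sketch_route_source
sketch_source_from_origin(const char *origin)
{
    if (origin && !strcmp(origin, "connected")) {
        return SKETCH_SOURCE_ORIGIN_CONNECTED;
    }
    if (origin && !strcmp(origin, "static")) {
        return SKETCH_SOURCE_ORIGIN_STATIC;
    }
    return SKETCH_SOURCE_STATIC;
}

/* Only truly local connected routes skip the ecmp grouping checks. */
static bool
sketch_may_group(enum sketch_route_source source)
{
    return source != SKETCH_SOURCE_CONNECTED;
}
```

In this model a route learned with origin "connected" gets its own source value, so the shortcut for local connected routes stays in place while learned routes go through the normal grouping path.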
>>>
>>>> But also, I'm not sure what is the end goal of this kind of setup.
>>>> The underlying traffic through both transit switches will go through
>>>> the same tunnels in the end, with just a slightly different metadata,
>>>> so there is no real high-availability in this setup.  Or am I missing
>>>> some other use case here?
>>> You could also do ecmp to different destinations if you have 3 ovn
>>> clusters. But honestly, I see the point of that even less :)
>>>
>>>> At the same time it seems a little arbitrary that learned routes can't
>>>> form ecmp groups though.  Not sure why we have this seemingly artificial
>>>> restriction.
>>> For me it was just that I thought there would never be a reason to group
>>> them, so I wanted to skip unnecessary further processing. But it seems
>>> like that assumption no longer holds.
>>>
>>> I hope that helps clarify it.
>> Ack, thanks!  It seems like the issue only appears when ovn-ic copies
>> "connected" routes from the other zone.  And unless we have multiple
>> ports with the same subnet on the same router, we can only get these
>> multiple routes when we learn the same route through multiple transit
>> switches.  Which is a questionable topology.  So, I'm not sure if we
>> actually need to fix that or not.
>>
>> Aleksandr, do you have a practical use case for this kind of topology?
>>
>>> Thanks a lot,
>>> Felix
>>>
>>>> What happens if we learn an actual ecmp route from the other router?  i.e.
>>>> if we have a real ecmp route to something external configured on one of
>>>> the routers connected through a transit switch, will it be learned
>>>> properly?  It sounds like it wouldn't...
>> This is not really an issue: if it's a real statically configured ecmp route,
>> then it will not be "connected" in the first place and will be properly
>> grouped after learning it in the other zone, because ovn-ic just copies
>> the "origin".  So, this is not a problem and the only questionable case is
>> the actual learning of "connected" routes through different interconnects.
>>
>>>> Felix, do you have some comments on this one?
>>>>
>>>> Best regards, Ilya Maximets.


_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
