On Thu, Jul 17, 2025 at 01:50:56PM +0000, Smirnov Aleksandr (K2 Cloud) wrote:
> In case you decide the fix is not desired, ovn-nbctl must be fixed
> because the current report is confusing: it says the routes are ecmp
> while in fact they are not.

Hi everyone,

I just took a further look and discovered more chaos around the
"origin=connected" topic.

Treating it as ROUTE_SOURCE_CONNECTED currently also means that we
prioritize these routes like normal connected routes, i.e. higher than
ROUTE_SOURCE_STATIC.

So if we assume the following routes:
* 192.168.0.0/24 via 10.0.10.10 as normal Logical_Router_Static_Route
* 192.168.0.0/24 via 10.0.11.11 as Logical_Router_Static_Route via
  ovn-ic with origin=connected
* 192.168.0.0/24 via 10.0.12.12 as Logical_Router_Static_Route via
  ovn-ic with origin=static

This currently results in the following effective routes (if I
understood it correctly):
* High priority: 192.168.0.0/24 via 10.0.11.11
* Low priority: 192.168.0.0/24 via ecmp of 10.0.10.10 and 10.0.12.12
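
To make the example above concrete, here is a minimal standalone C
sketch of how I read the current behaviour. It is not northd code: the
sketch_* names and the priority numbers are made up for illustration.
The only parts taken from this thread are that connected routes are
never grouped (the early return in add_route()) and that connected is
prioritized above static.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for illustration only, not the real northd types. */
enum sketch_source { SKETCH_SRC_STATIC, SKETCH_SRC_CONNECTED };

struct sketch_route {
    const char *prefix;
    const char *nexthop;
    enum sketch_source source;
};

/* Assumption: connected outranks static.  The numbers are arbitrary. */
static int
sketch_priority(const struct sketch_route *r)
{
    assert(r != NULL);
    return r->source == SKETCH_SRC_CONNECTED ? 2 : 1;
}

/* Mirrors the early return in add_route(): a connected route is always
 * kept as a unique route and never joins an ecmp group. */
static bool
sketch_may_group(const struct sketch_route *a, const struct sketch_route *b)
{
    if (a->source == SKETCH_SRC_CONNECTED
        || b->source == SKETCH_SRC_CONNECTED) {
        return false;
    }
    return a->source == b->source && !strcmp(a->prefix, b->prefix);
}

/* The three routes from the example above. */
static const struct sketch_route sketch_routes[] = {
    /* Local Logical_Router_Static_Route. */
    { "192.168.0.0/24", "10.0.10.10", SKETCH_SRC_STATIC },
    /* Via ovn-ic with origin=connected. */
    { "192.168.0.0/24", "10.0.11.11", SKETCH_SRC_CONNECTED },
    /* Via ovn-ic with origin=static. */
    { "192.168.0.0/24", "10.0.12.12", SKETCH_SRC_STATIC },
};
```

In this model the route via 10.0.11.11 gets the higher priority and
stays alone, while the local route and the ovn-ic origin=static route
share a priority and are allowed to form one ecmp group.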

This is honestly quite confusing from my perspective:
1. Why should an ovn-ic route be of higher priority than a local route?
2. Why should ecmp work between ovn-ic and non-ic routes?

From my view (as someone who has never used ovn-ic), I would have
expected that ovn-ic routes are always of lower priority than non-ic
routes.
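
As a rough sketch of that expectation: if the ovn-ic "origin" values
were mapped to their own route sources, the ordering could look like
the following. Everything here is hypothetical. The ALT_SRC_* names and
the offsets do not exist in OVN and are only meant to show the ordering
and the grouping rule I would have expected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical route sources; none of these names exist in OVN. */
enum alt_source {
    ALT_SRC_IC_STATIC,      /* Learned via ovn-ic, origin=static. */
    ALT_SRC_IC_CONNECTED,   /* Learned via ovn-ic, origin=connected. */
    ALT_SRC_STATIC,         /* Local Logical_Router_Static_Route. */
    ALT_SRC_CONNECTED,      /* Network directly connected to an LRP. */
};

/* Made-up offsets: every local source outranks every ovn-ic source. */
static int
alt_offset(enum alt_source s)
{
    switch (s) {
    case ALT_SRC_CONNECTED:
        return 3;
    case ALT_SRC_STATIC:
        return 2;
    case ALT_SRC_IC_CONNECTED:
        return 1;
    case ALT_SRC_IC_STATIC:
        return 0;
    }
    assert(false);
    return -1;
}

/* Only routes of the same source may form an ecmp group, so ovn-ic and
 * non-ic routes never mix.  Directly connected routes stay ungrouped,
 * as before. */
static bool
alt_may_group(enum alt_source a, enum alt_source b)
{
    return a == b && a != ALT_SRC_CONNECTED;
}
```

With this ordering the local static route via 10.0.10.10 would win, and
two ovn-ic routes with origin=connected could still be grouped with
each other.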

What are your opinions on that?

Thanks a lot,
Felix

> 
> On 7/17/25 4:16 PM, Ilya Maximets wrote:
> > Hrm, adding Felix back.
> >
> > On 7/17/25 3:14 PM, Ilya Maximets wrote:
> >> On 7/17/25 11:56 AM, Felix Huettner wrote:
> >>> On Thu, Jul 17, 2025 at 11:29:24AM +0200, Ilya Maximets wrote:
> >>>> On 7/16/25 9:05 AM, Smirnov Aleksandr (K2 Cloud) wrote:
> >>>>> Hello,
> >>>>>
> >>>>> I noticed a big difference in the flow generated by northd between
> >>>>> releases 24.09 and 25.03
> >>>>>
> >>>>> In 25.03, northd fails to find similar routes and form an ecmp group.
> >>>>>
> >>>>> I append following information:
> >>>>>
> >>>>> 1. Testcase scenario that can be easily copy-pasted to ovn-ic.at
> >>>>>
> >>>>> 2. Test output if ran in 24.09
> >>>>>
> >>>>> 3. Test output if ran in 25.03
> >>>>>
> >>>>> Could you please clarify whether this is a real issue?
> >>>> It looks like Felix made a change to never group "connected" routes,
> >>>> i.e. the learned routes, in commit:
> >>>>    f8924740f26e ("northd: Move connected routes to route engine.")
> >>>>
> >>>> The code that keeps all such routes from being considered for
> >>>> grouping is the following:
> >>>>
> >>>> northd/en-group-ecmp-route.c:
> >>>> static void
> >>>> add_route(struct group_ecmp_datapath *gn, const struct parsed_route *pr)
> >>>> {
> >>>>      if (pr->source == ROUTE_SOURCE_CONNECTED) {
> >>>>          unique_routes_add(gn, pr);
> >>>>          return;
> >>>>      }
> >>>> ...
> >>>>
> >>>> All the routes learned from the other router through the transit switch
> >>>> have ROUTE_SOURCE_CONNECTED as their source and are not considered for
> >>>> ecmp grouping.  There is also a comment in the removal part:
> >>>>
> >>>>          if (pr->source == ROUTE_SOURCE_CONNECTED) {
> >>>>              /* Connected routes are never part of an ecmp group.
> >>>>               * We should recompute. */
> >>>>              return false;
> >>>>          }
> >>>>
> >>>> This makes me think that the change was intentional.
> >>> Hi Ilya, Hi Smirnov,
> >>>
> >>> so i implemented it this way because i assumed that
> >>> ROUTE_SOURCE_CONNECTED means that this route is directly connected to
> >>> the local LR. So that the LR has an interface that really has IPs out of
> >>> that network. In that case i never saw a way how one LR would have
> >>> multiple LRPs with the same network range. That just seemed like a
> >>> unrealistic case. So i decided to skip the ecmp grouping checks because
> >>> i thought this will just never happen.
> >>>
> >>> However, I just now saw that ROUTE_SOURCE_CONNECTED is actually also set
> >>> for the ic routes. There it seems to be used more for route
> >>> prioritization, so the guarantee that there can be no duplicate IPs
> >>> no longer holds.
> >>>
> >>> Would it make sense to create ROUTE_SOURCE_ORIGIN_CONNECTED and
> >>> ROUTE_SOURCE_ORIGIN_STATIC and map the "origin" values to them? Then
> >>> grouping should work as expected. The ROUTE_SOURCE_ORIGIN_* values could
> >>> then also be covered by route_source_to_offset to prioritize them
> >>> correctly.
> >>>
> >>>> But also, I'm not sure what is the end goal of this kind of setup.
> >>>> The underlying traffic through both transit switches will go through
> >>>> the same tunnels in the end, with just a slightly different metadata,
> >>>> so there is no real high-availability in this setup.  Or am I missing
> >>>> some other use case here?
> >>> You could also do ecmp to different destinations if you have 3 ovn
> >>> clusters. But i honestly see the point even less :)
> >>>
> >>>> At the same time it seems a little arbitrary that learned routes can't
> >>>> form ecmp groups though.  Not sure why we have this seemingly artificial
> >>>> restriction.
> >>> For me it was just that i thought there is never a reason to group them,
> >>> so i just wanted to skip unnecessary further processing. But it seems
> >>> like that assumption no longer holds.
> >>>
> >>> I hope that helps clarifying it.
> >> Ack, thanks!  It seems like the issue only appears when ovn-ic copies
> >> "connected" routes from the other zone.  And unless we have multiple
> >> ports with the same subnet on the same router, we can only get these
> >> multiple routes when we learn the same route through multiple transit
> >> switches.  Which is a questionable topology.  So, I'm not sure if we
> >> actually need to fix that or not.
> >>
> >> Aleksandr, do you have a practical use case for this kind of topology?
> >>
> >>> Thanks a lot,
> >>> Felix
> >>>
> >>>> What happens if we learn an actual ecmp route from the other router?  i.e.
> >>>> if we have a real ecmp route to something external configured on one of
> >>>> the routers connected through a transit switch, will it be learned
> >>>> properly?  It sounds like it wouldn't...
> >> This is not really a case, if it's a real statically configured ecmp route,
> >> then it will not be "connected" in the first place and will be properly
> >> grouped after learning it in the other zone, because ovn-ic just copies
> >> the "origin".  So, this is not a problem and the only questionable case is
> >> the actual learning of "connected" routes through different interconnects.
> >>
> >>>> Felix, do you have some comments on this one?
> >>>>
> >>>> Best regards, Ilya Maximets.
> 
> 
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
