On Thu, Jul 17, 2025 at 11:29:24AM +0200, Ilya Maximets wrote:
> On 7/16/25 9:05 AM, Smirnov Aleksandr (K2 Cloud) wrote:
> > Hello,
> > 
> > I noticed a big difference in the flow generated by northd between 
> > releases 24.09 and 25.03
> > 
> > In 25.03, northd fails to find similar routes and form an ecmp group.
> > 
> > I append the following information:
> > 
> > 1. Testcase scenario that can be easily copy-pasted to ovn-ic.at
> > 
> > 2. Test output if ran in 24.09
> > 
> > 3. Test output if ran in 25.03
> > 
> > Could you please clarify whether this is a real issue?
> 
> It looks like Felix made a change to never group "connected" routes,
> i.e. the learned routes, in commit:
>   f8924740f26e ("northd: Move connected routes to route engine.")
> 
> The code that prevents all such routes from being considered for grouping
> is the following:
> 
> northd/en-group-ecmp-route.c:
> static void
> add_route(struct group_ecmp_datapath *gn, const struct parsed_route *pr)
> {
>     if (pr->source == ROUTE_SOURCE_CONNECTED) {
>         unique_routes_add(gn, pr);
>         return;
>     }
> ...
> 
> All the routes learned from the other router through the transit switch
> have ROUTE_SOURCE_CONNECTED as their source and are not considered for
> ecmp grouping.  There is also a comment in the removal part:
> 
>         if (pr->source == ROUTE_SOURCE_CONNECTED) {
>             /* Connected routes are never part of an ecmp group.
>              * We should recompute. */
>             return false;
>         }
> 
> This makes me think that the change was intentional.

Hi Ilya, Hi Smirnov,

I implemented it this way because I assumed that
ROUTE_SOURCE_CONNECTED means the route is directly connected to the
local LR, i.e. the LR has an interface that really has IPs out of that
network. In that case I never saw how one LR could have multiple LRPs
with the same network range; that just seemed like an unrealistic case.
So I decided to skip the ecmp grouping checks because I thought this
would just never happen.

However, I just now saw that ROUTE_SOURCE_CONNECTED is also set for
the ic routes, where it seems to be used mainly for route
prioritization. So the guarantee that there can be no duplicate IPs no
longer holds.
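
To make the effect concrete, here is a toy model of the situation from
the test case. It is only a sketch and not the northd code: the toy_*
names, the struct layout and the skip_connected switch are invented for
illustration. It shows that two routes for the same prefix never form a
group once routes marked as connected bypass the grouping step.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the real northd types. */
enum toy_source { TOY_SOURCE_CONNECTED, TOY_SOURCE_STATIC };

struct toy_route {
    const char *prefix;
    const char *nexthop;
    enum toy_source source;
};

/* Returns the number of members the ecmp group for 'prefix' would have,
 * or 0 if no group is formed.  With 'skip_connected' set, routes whose
 * source is TOY_SOURCE_CONNECTED bypass grouping, mirroring the early
 * return quoted from add_route(). */
size_t
toy_ecmp_members(const struct toy_route *routes, size_t n,
                 const char *prefix, bool skip_connected)
{
    size_t candidates = 0;

    for (size_t i = 0; i < n; i++) {
        if (strcmp(routes[i].prefix, prefix) != 0) {
            continue;
        }
        if (skip_connected && routes[i].source == TOY_SOURCE_CONNECTED) {
            /* Treated as a unique route, never considered for a group. */
            continue;
        }
        candidates++;
    }

    /* A single candidate stays a unique route; a group needs two. */
    return candidates >= 2 ? candidates : 0;
}
```

Fed with the two learned routes for 172.31.3.0/24 (nexthops
169.254.100.2 and 169.254.100.34), both marked connected, the model
forms no group when skip_connected is true and a two-member group when
it is false.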

Would it make sense to create ROUTE_SOURCE_ORIGIN_CONNECTED and
ROUTE_SOURCE_ORIGIN_STATIC and map the "origin" values to them? Then
grouping should work as expected. The ROUTE_SOURCE_ORIGIN_* values could
also be covered by route_source_to_offset to prioritize them correctly.
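
Roughly, the idea could look like the sketch below. This is an
illustration under assumptions, not a patch: the two
ROUTE_SOURCE_ORIGIN_* values are only the proposed names, the helper
functions sketch_source_from_origin(), sketch_source_may_group() and
sketch_source_to_offset() are hypothetical, and the offset numbers are
placeholders that only demonstrate a relative ordering, not the values
northd uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Reduced, hypothetical version of the route source enum. */
enum route_source {
    ROUTE_SOURCE_CONNECTED,         /* Network of a local LRP. */
    ROUTE_SOURCE_STATIC,            /* Plain static route. */
    ROUTE_SOURCE_ORIGIN_CONNECTED,  /* Proposed: options:origin=connected. */
    ROUTE_SOURCE_ORIGIN_STATIC,     /* Proposed: options:origin=static. */
};

/* Maps the "origin" option of a static route to a source.  Routes
 * without a recognized origin are treated as plain static routes. */
enum route_source
sketch_source_from_origin(const char *origin)
{
    if (origin && !strcmp(origin, "connected")) {
        return ROUTE_SOURCE_ORIGIN_CONNECTED;
    }
    if (origin && !strcmp(origin, "static")) {
        return ROUTE_SOURCE_ORIGIN_STATIC;
    }
    return ROUTE_SOURCE_STATIC;
}

/* Only routes for networks of local LRPs skip the ecmp grouping. */
bool
sketch_source_may_group(enum route_source source)
{
    return source != ROUTE_SOURCE_CONNECTED;
}

/* Placeholder priority offsets: higher wins for equal prefix length. */
int
sketch_source_to_offset(enum route_source source)
{
    switch (source) {
    case ROUTE_SOURCE_CONNECTED:
        return 4;
    case ROUTE_SOURCE_ORIGIN_CONNECTED:
        return 3;
    case ROUTE_SOURCE_STATIC:
        return 2;
    case ROUTE_SOURCE_ORIGIN_STATIC:
        return 1;
    }
    assert(!"unknown route source");
    return 0;
}
```

With such a split, the early return in add_route() would keep applying
to ROUTE_SOURCE_CONNECTED only, while the ic routes would go through
the normal grouping path.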

> 
> But also, I'm not sure what is the end goal of this kind of setup.
> The underlying traffic through both transit switches will go through
> the same tunnels in the end, with just a slightly different metadata,
> so there is no real high-availability in this setup.  Or am I missing
> some other use case here?

You could also do ecmp to different destinations if you have 3 ovn
clusters. But honestly I see the point of that even less :)

> 
> At the same time it seems a little arbitrary that learned routes can't
> form ecmp groups though.  Not sure why we have this seemingly artificial
> restriction.

For me it was just that I thought there would never be a reason to
group them, so I wanted to skip unnecessary further processing. But it
seems that assumption no longer holds.

I hope that helps clarify it.

Thanks a lot,
Felix

> 
> What happens if we learn an actual ecmp route from the other router?  i.e.
> if we have a real ecmp route to something external configured on one of
> the routers connected through a transit switch, will it be learned
> properly?  It sounds like it wouldn't...
> 
> Felix, do you have some comments on this one?
> 
> Best regards, Ilya Maximets.
> 
> > 
> > Thank you,
> > 
> > Alexander
> > 
> > =============== TEST =====================
> > 
> > OVN_FOR_EACH_NORTHD([
> > AT_SETUP([ovn-ic -- east-west - 2az])
> > 
> > ovn_init_ic_db
> > ovn-ic-nbctl ts-add rtb-1
> > ovn-ic-nbctl ts-add rtb-2
> > 
> > ovn_start az1
> > ovn_as az1
> > 
> > ovn_as az1 check ovn-ic-nbctl --wait=sb sync
> > ovn_as az1 check ovn-nbctl set nb_global . options:ic-route-learn=true
> > ovn_as az1 check ovn-nbctl set nb_global . options:ic-route-adv=true
> > 
> > ovn_as az1 check ovn-nbctl ls-add subnet-A
> > ovn_as az1 check ovn-nbctl lsp-add subnet-A subnet-A-up -- lsp-set-type subnet-A-up router -- lsp-set-addresses subnet-A-up router -- lsp-set-options subnet-A-up router-port=subnet-A
> > ovn_as az1 check ovn-nbctl lsp-add subnet-A client -- lsp-set-addresses client "0a:00:43:1e:92:20 172.31.0.4"
> > 
> > ovn_as az1 check ovn-nbctl lsp-add rtb-1 rtb-1-down1 -- lsp-set-type rtb-1-down1 router -- lsp-set-addresses rtb-1-down1 router -- lsp-set-options rtb-1-down1 router-port=rtb-1
> > ovn_as az1 check ovn-nbctl lsp-add rtb-2 rtb-2-down1 -- lsp-set-type rtb-2-down1 router -- lsp-set-addresses rtb-2-down1 router -- lsp-set-options rtb-2-down1 router-port=rtb-2
> > 
> > ovn_as az1 check ovn-nbctl lr-add vpc
> > ovn_as az1 check ovn-nbctl lrp-add vpc subnet-A "d0:fe:00:00:00:14" "172.31.0.1/24" -- lrp-set-options subnet-A route_table=table1
> > ovn_as az1 check ovn-nbctl lrp-add vpc rtb-1 "00:00:a0:9e:9d:40" "169.254.100.1/27" -- lrp-set-options rtb-1 route_table=table1
> > ovn_as az1 check ovn-nbctl lrp-add vpc rtb-2 "00:00:60:15:b8:20" "169.254.100.33/27" -- lrp-set-options rtb-2 route_table=table2
> > 
> > ovn_start az2
> > ovn_as az2
> > 
> > ovn_as az2 check ovn-ic-nbctl --wait=sb sync
> > ovn_as az2 check ovn-nbctl set nb_global . options:ic-route-learn=true
> > ovn_as az2 check ovn-nbctl set nb_global . options:ic-route-adv=true
> > 
> > ovn_as az2 check ovn-nbctl ls-add subnet-D
> > ovn_as az2 check ovn-nbctl lsp-add subnet-D subnet-D-up -- lsp-set-type subnet-D-up router -- lsp-set-addresses subnet-D-up router -- lsp-set-options subnet-D-up router-port=subnet-D
> > ovn_as az2 check ovn-nbctl lsp-add subnet-D filter2 -- lsp-set-addresses filter2 "0a:01:39:eb:b1:41 172.31.3.4"
> > 
> > ovn_as az2 check ovn-nbctl lsp-add rtb-1 rtb-1-down2 -- lsp-set-type rtb-1-down2 router -- lsp-set-addresses rtb-1-down2 router -- lsp-set-options rtb-1-down2 router-port=rtb-1
> > ovn_as az2 check ovn-nbctl lsp-add rtb-2 rtb-2-down2 -- lsp-set-type rtb-2-down2 router -- lsp-set-addresses rtb-2-down2 router -- lsp-set-options rtb-2-down2 router-port=rtb-2
> > 
> > ovn_as az2 check ovn-nbctl lr-add vpc
> > ovn_as az2 check ovn-nbctl lrp-add vpc subnet-D "d0:fe:00:00:00:17" "172.31.3.1/24" -- lrp-set-options subnet-D route_table=table2
> > ovn_as az2 check ovn-nbctl lrp-add vpc rtb-1 "00:01:a0:9e:9d:40" "169.254.100.2/27" -- lrp-set-options rtb-1 route_table=table1
> > ovn_as az2 check ovn-nbctl lrp-add vpc rtb-2 "00:01:60:15:b8:20" "169.254.100.34/27" -- lrp-set-options rtb-2 route_table=table2
> > 
> > check ovn-ic-nbctl --wait=sb sync
> > 
> > ovn_as az1 ovn-nbctl list logical_router_static
> > ovn_as az1 ovn-nbctl lr-route-list vpc
> > ovn_as az1 ovn-sbctl lflow-list vpc | grep lr_in_ip_routing
> > 
> > OVN_CLEANUP_IC([az1], [az2])
> > 
> > AT_CLEANUP
> > ])
> > 
> > =============== CUT =====================
> > 
> > 
> > =============== OUTPUT 24.09.02 =====================
> > 
> > _uuid               : ca00ec23-dbb7-4fe1-87ab-04765efa5d7f
> > bfd                 : []
> > external_ids        : 
> > {ic-learned-route="cde67ffc-bcfa-4470-bd8e-93ac59401344"}
> > ip_prefix           : "172.31.3.1/24"
> > nexthop             : "169.254.100.2"
> > options             : {origin=connected}
> > output_port         : []
> > policy              : []
> > route_table         : ""
> > 
> > _uuid               : 421525df-80d0-4a92-ad8d-e00dde47d60d
> > bfd                 : []
> > external_ids        : 
> > {ic-learned-route="fa63cc12-5fdf-44df-9b07-e8006641d7b4"}
> > ip_prefix           : "172.31.3.1/24"
> > nexthop             : "169.254.100.34"
> > options             : {origin=connected}
> > output_port         : []
> > policy              : []
> > route_table         : ""
> > 
> > IPv4 Routes
> > Route Table <main>:
> >              172.31.3.0/24             169.254.100.2 dst-ip (learned) ecmp
> >              172.31.3.0/24            169.254.100.34 dst-ip (learned) ecmp
> > 
> >    table=13(lr_in_ip_routing_pre), priority=100  , match=(inport == 
> > "rtb-1"), action=(reg7 = 2; next;)
> >    table=13(lr_in_ip_routing_pre), priority=100  , match=(inport == 
> > "rtb-2"), action=(reg7 = 1; next;)
> >    table=13(lr_in_ip_routing_pre), priority=100  , match=(inport == 
> > "subnet-A"), action=(reg7 = 2; next;)
> >    table=13(lr_in_ip_routing_pre), priority=0    , match=(1), 
> > action=(reg7 = 0; next;)
> >    table=14(lr_in_ip_routing   ), priority=10550, match=(nd_rs || 
> > nd_ra), action=(drop;)
> >    table=14(lr_in_ip_routing   ), priority=194  , match=(inport == 
> > "rtb-1" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[0..15] = 0; 
> > xxreg0 = ip6.dst; xxreg1 = fe80::200:a0ff:fe9e:9d40; eth.src = 
> > 00:00:a0:9e:9d:40; outport = "rtb-1"; flags.loopback = 1; next;)
> >    table=14(lr_in_ip_routing   ), priority=194  , match=(inport == 
> > "rtb-2" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[0..15] = 0; 
> > xxreg0 = ip6.dst; xxreg1 = fe80::200:60ff:fe15:b820; eth.src = 
> > 00:00:60:15:b8:20; outport = "rtb-2"; flags.loopback = 1; next;)
> >    table=14(lr_in_ip_routing   ), priority=194  , match=(inport == 
> > "subnet-A" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[0..15] = 0; 
> > xxreg0 = ip6.dst; xxreg1 = fe80::d2fe:ff:fe00:14; eth.src = 
> > d0:fe:00:00:00:14; outport = "subnet-A"; flags.loopback = 1; next;)
> >    table=14(lr_in_ip_routing   ), priority=83   , match=(ip4.dst == 
> > 169.254.100.0/27), action=(ip.ttl--; reg8[0..15] = 0; reg0 = ip4.dst; 
> > reg1 = 169.254.100.1; eth.src = 00:00:a0:9e:9d:40; outport = "rtb-1"; 
> > flags.loopback = 1; next;)
> >    table=14(lr_in_ip_routing   ), priority=83   , match=(ip4.dst == 
> > 169.254.100.32/27), action=(ip.ttl--; reg8[0..15] = 0; reg0 = ip4.dst; 
> > reg1 = 169.254.100.33; eth.src = 00:00:60:15:b8:20; outport = "rtb-2"; 
> > flags.loopback = 1; next;)
> >    table=14(lr_in_ip_routing   ), priority=74   , match=(ip4.dst == 
> > 172.31.0.0/24), action=(ip.ttl--; reg8[0..15] = 0; reg0 = ip4.dst; reg1 
> > = 172.31.0.1; eth.src = d0:fe:00:00:00:14; outport = "subnet-A"; 
> > flags.loopback = 1; next;)
> >    table=14(lr_in_ip_routing   ), priority=74   , match=(ip4.dst == 
> > 172.31.3.0/24), action=(ip.ttl--; flags.loopback = 1; reg8[0..15] = 1; 
> > reg8[16..31] = select(1, 2);)
> >    table=14(lr_in_ip_routing   ), priority=0    , match=(1), action=(drop;)
> >    table=15(lr_in_ip_routing_ecmp), priority=150  , match=(reg8[0..15] 
> > == 0), action=(next;)
> >    table=15(lr_in_ip_routing_ecmp), priority=100  , match=(reg8[0..15] 
> > == 1 && reg8[16..31] == 1), action=(reg0 = 169.254.100.34; reg1 = 
> > 169.254.100.33; eth.src = 00:00:60:15:b8:20; outport = "rtb-2"; next;)
> >    table=15(lr_in_ip_routing_ecmp), priority=100  , match=(reg8[0..15] 
> > == 1 && reg8[16..31] == 2), action=(reg0 = 169.254.100.2; reg1 = 
> > 169.254.100.1; eth.src = 00:00:a0:9e:9d:40; outport = "rtb-1"; next;)
> >    table=15(lr_in_ip_routing_ecmp), priority=0    , match=(1), 
> > action=(drop;)
> > 
> > =============== CUT =====================
> > 
> > 
> > =============== OUTPUT 25.03.00 =====================
> > 
> > _uuid               : 18e0b3b1-4d5b-4e8f-937b-d5d7b33d3f9c
> > bfd                 : []
> > external_ids        : 
> > {ic-learned-route="974ae0e0-6b21-4e5f-9a61-5ad329a7b3af"}
> > ip_prefix           : "172.31.3.1/24"
> > nexthop             : "169.254.100.34"
> > options             : {origin=connected}
> > output_port         : []
> > policy              : []
> > route_table         : ""
> > selection_fields    : []
> > 
> > _uuid               : a00be0c2-48e1-406d-945d-e0084888e10d
> > bfd                 : []
> > external_ids        : 
> > {ic-learned-route="613ac324-26cb-4b78-b0bc-3a91be53fe12"}
> > ip_prefix           : "172.31.3.1/24"
> > nexthop             : "169.254.100.2"
> > options             : {origin=connected}
> > output_port         : []
> > policy              : []
> > route_table         : ""
> > selection_fields    : []
> > IPv4 Routes
> > 
> > Route Table <main>:
> >              172.31.3.0/24             169.254.100.2 dst-ip (learned) ecmp
> >              172.31.3.0/24            169.254.100.34 dst-ip (learned) ecmp
> > 
> >    table=14(lr_in_ip_routing_pre), priority=100  , match=(inport == 
> > "rtb-1"), action=(reg7 = 1; next;)
> >    table=14(lr_in_ip_routing_pre), priority=100  , match=(inport == 
> > "rtb-2"), action=(reg7 = 2; next;)
> >    table=14(lr_in_ip_routing_pre), priority=100  , match=(inport == 
> > "subnet-A"), action=(reg7 = 1; next;)
> >    table=14(lr_in_ip_routing_pre), priority=0    , match=(1), 
> > action=(reg7 = 0; next;)
> >    table=15(lr_in_ip_routing   ), priority=10550, match=(nd_rs || 
> > nd_ra), action=(drop;)
> >    table=15(lr_in_ip_routing   ), priority=518  , match=(inport == 
> > "rtb-1" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[0..15] = 0; 
> > xxreg0 = ip6.dst; xxreg1 = fe80::200:a0ff:fe9e:9d40; eth.src = 
> > 00:00:a0:9e:9d:40; outport = "rtb-1"; flags.loopback = 1; reg9[9] = 0; 
> > next;)
> >    table=15(lr_in_ip_routing   ), priority=518  , match=(inport == 
> > "rtb-2" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[0..15] = 0; 
> > xxreg0 = ip6.dst; xxreg1 = fe80::200:60ff:fe15:b820; eth.src = 
> > 00:00:60:15:b8:20; outport = "rtb-2"; flags.loopback = 1; reg9[9] = 0; 
> > next;)
> >    table=15(lr_in_ip_routing   ), priority=518  , match=(inport == 
> > "subnet-A" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[0..15] = 0; 
> > xxreg0 = ip6.dst; xxreg1 = fe80::d2fe:ff:fe00:14; eth.src = 
> > d0:fe:00:00:00:14; outport = "subnet-A"; flags.loopback = 1; reg9[9] = 
> > 0; next;)
> >    table=15(lr_in_ip_routing   ), priority=222  , match=(ip4.dst == 
> > 169.254.100.0/27), action=(ip.ttl--; reg8[0..15] = 0; reg0 = ip4.dst; 
> > reg5 = 169.254.100.1; eth.src = 00:00:a0:9e:9d:40; outport = "rtb-1"; 
> > flags.loopback = 1; reg9[9] = 1; next;)
> >    table=15(lr_in_ip_routing   ), priority=222  , match=(ip4.dst == 
> > 169.254.100.32/27), action=(ip.ttl--; reg8[0..15] = 0; reg0 = ip4.dst; 
> > reg5 = 169.254.100.33; eth.src = 00:00:60:15:b8:20; outport = "rtb-2"; 
> > flags.loopback = 1; reg9[9] = 1; next;)
> >    table=15(lr_in_ip_routing   ), priority=198  , match=(ip4.dst == 
> > 172.31.0.0/24), action=(ip.ttl--; reg8[0..15] = 0; reg0 = ip4.dst; reg5 
> > = 172.31.0.1; eth.src = d0:fe:00:00:00:14; outport = "subnet-A"; 
> > flags.loopback = 1; reg9[9] = 1; next;)
> >    table=15(lr_in_ip_routing   ), priority=198  , match=(ip4.dst == 
> > 172.31.3.0/24), action=(ip.ttl--; reg8[0..15] = 0; reg0 = 169.254.100.2; 
> > reg5 = 169.254.100.1; eth.src = 00:00:a0:9e:9d:40; outport = "rtb-1"; 
> > flags.loopback = 1; reg9[9] = 1; next;)
> >    table=15(lr_in_ip_routing   ), priority=198  , match=(ip4.dst == 
> > 172.31.3.0/24), action=(ip.ttl--; reg8[0..15] = 0; reg0 = 
> > 169.254.100.34; reg5 = 169.254.100.33; eth.src = 00:00:60:15:b8:20; 
> > outport = "rtb-2"; flags.loopback = 1; reg9[9] = 1; next;)
> >    table=15(lr_in_ip_routing   ), priority=0    , match=(1), action=(drop;)
> >    table=16(lr_in_ip_routing_ecmp), priority=150  , match=(reg8[0..15] 
> > == 0), action=(next;)
> >    table=16(lr_in_ip_routing_ecmp), priority=0    , match=(1), 
> > action=(drop;)
> > 
> > =============== CUT =====================
> > 
> > 
> > _______________________________________________
> > dev mailing list
> > d...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 