In 712fca55b3b1 ("controller: Prioritize host routes.") and later in cd4ad2f56179 ("northd: Redistribution of NAT/LB routes.") support was added for advertising routes for objects (logical port / NAT / LB IPs) that are bound to a single chassis (e.g., distributed NAT) with a better metric on the chassis where they're bound. On all other chassis, however, the route was still advertised but only with a worse metric.
While this works fine in deployments as described in 712fca55b3b1 ("controller: Prioritize host routes."), this behavior actually makes the dynamic routing feature unusable in cases when all inter-node traffic is forwarded through the L3 fabric, e.g. spine-leaf topologies with iBGP between leaves and spines and eBGP between OVN compute nodes and fabric leafs: that's due to the fact that eBGP routes are usually preferred over iBGP routes. Consider the following example: +------+ +------+ |Spine1| |Spine2| +------+ +------+ | \ / | | \ / | | \ / | | X | | / \ | | / \ | | / \ | +-----+ +----+ |Leaf1| |Leaf2| +-----+ +----+ | | | | +----------+ +----------+ | Chassis1 | | Chassis2 | +----------+ +----------+ An OVN distributed NAT, e.g., 42.42.42.42 "bound" to Chassis1 would be advertised with a metric of 100 on the eBGP Chassis1 <-> Leaf1 connection and with a metric of 1000 (worse) on the eBGP Chasssi2 <-> Leaf2 connection. Leaf2 will also learn an iBGP route (through Spine1 and Spine2) for the same prefix (towards Chassis1) but because eBGP administrative distance is better than the iBGP one, Leaf2 will always prefer the metric 1000 route. That means Leaf2 will always forward traffic destined to 42.42.42.42 via Chassis2 which is sub-optimal. The main reason for advertising the (NAT) IP on both chassis was likely to provide redundancy in case traffic hits the OVN cluster on a node that doesn't host the NAT. But with topologies as the one depicted above the redundancy is handled by the fabric. OVN didn't have a way to disable worse metric route advertisements. This commit adds one, through a new logical router / logical router port option, "dynamic-routing-redistribute-local-only" which, if enabled, informs ovn-controller to not advertise routes for chassis bound IPs (Sb.Advertised_Route.tracked_port set) on chassis where the tracked port is not bound. By default this option is disabled. The option is propagated by ovn-northd to the SB.Port_Binding corresponding to the logical router port (or all router ports if configured on the router). Fixes: 712fca55b3b1 ("controller: Prioritize host routes.") Fixes: cd4ad2f56179 ("northd: Redistribution of NAT/LB routes.") Reported-at: https://issues.redhat.com/browse/FDP-1464 Tested-by: Jakub Libosvar <jlibo...@redhat.com> Signed-off-by: Dumitru Ceara <dce...@redhat.com> --- NOTE: while this change adds a new configuration option, I see this as a bug fix because there's no (not overly complex) way of using the OVN dynamic routing feature in 25.03 with topologies as above (which are likely common). This change is backport-safe because the new option is opt-in (disabled by default) so the default behavior stays untouched. Also, there's no restriction on update order as older ovn-northd or ovn-controllers just ignore the new option and there's no database schema change. Changes in v3: - Addressed Felix's review, fixed tracked_port in tracked_ports_remote when skipping not locally bound routes. - Added Jakub's tested-by. Changes in v2: - Addressed issue with Advertised_Routes that have logical_port set to a non chassis-redirect port (for distributed routers with DGP). - Added system test for the case above. --- NEWS | 5 + controller/route.c | 9 ++ northd/northd.c | 44 ++++--- ovn-nb.xml | 39 ++++++ ovn-sb.xml | 18 +++ tests/system-ovn.at | 281 ++++++++++++++++++++++++++++++++++++++++++++ 6 files changed, 381 insertions(+), 15 deletions(-) diff --git a/NEWS b/NEWS index 0cce1790db..54d676be87 100644 --- a/NEWS +++ b/NEWS @@ -41,6 +41,11 @@ Post v25.03.0 - Added support for running tests from the 'check-kernel' system test target under retis by setting OVS_TEST_WITH_RETIS=yes. See the 'Testing' section of the documentation for more details. + - Dynamic Routing: + * Add the option "dynamic-routing-redistribute-local-only" to Logical + Routers and Logical Router Ports which refines the way in which + chassis-specific Advertised_Routes (e.g., for NAT and LB IPs) are + advertised. OVN v25.03.0 - 07 Mar 2025 -------------------------- diff --git a/controller/route.c b/controller/route.c index 7615f3f593..841370ab2d 100644 --- a/controller/route.c +++ b/controller/route.c @@ -253,6 +253,10 @@ route_run(struct route_ctx_in *r_ctx_in, unsigned int priority = PRIORITY_DEFAULT; if (route->tracked_port) { + bool redistribute_local_bound_only = + smap_get_bool(&route->logical_port->options, + "dynamic-routing-redistribute-local-only", + false); if (lport_is_local(r_ctx_in->sbrec_port_binding_by_name, r_ctx_in->chassis, route->tracked_port->logical_port)) { @@ -262,6 +266,11 @@ route_run(struct route_ctx_in *r_ctx_in, } else { sset_add(r_ctx_out->tracked_ports_remote, route->tracked_port->logical_port); + if (redistribute_local_bound_only) { + /* We're not advertising routes whose 'tracked_port' is + * not local, skip this route. */ + continue; + } } } diff --git a/northd/northd.c b/northd/northd.c index d027d5c66d..2cb69f9aa8 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -3893,21 +3893,35 @@ sync_pb_for_lrp(struct ovn_port *op, } } - if ((is_cr_port(op) || chassis_name) && op->od->dynamic_routing) { - smap_add(&new, "dynamic-routing", "true"); - if (smap_get_bool(&op->nbrp->options, - "dynamic-routing-maintain-vrf", false)) { - smap_add(&new, "dynamic-routing-maintain-vrf", "true"); - } - const char *vrfname = smap_get(&op->od->nbr->options, - "dynamic-routing-vrf-name"); - if (vrfname) { - smap_add(&new, "dynamic-routing-vrf-name", vrfname); - } - const char *portname = smap_get(&op->nbrp->options, - "dynamic-routing-port-name"); - if (portname) { - smap_add(&new, "dynamic-routing-port-name", portname); + if (op->od->dynamic_routing) { + if (is_cr_port(op) || chassis_name) { + smap_add(&new, "dynamic-routing", "true"); + if (smap_get_bool(&op->nbrp->options, + "dynamic-routing-maintain-vrf", false)) { + smap_add(&new, "dynamic-routing-maintain-vrf", "true"); + } + const char *vrfname = smap_get(&op->od->nbr->options, + "dynamic-routing-vrf-name"); + if (vrfname) { + smap_add(&new, "dynamic-routing-vrf-name", vrfname); + } + const char *portname = smap_get(&op->nbrp->options, + "dynamic-routing-port-name"); + if (portname) { + smap_add(&new, "dynamic-routing-port-name", portname); + } + } + + const char *redistribute_local_only_name = + "dynamic-routing-redistribute-local-only"; + bool redistribute_local_only_val = + smap_get_bool(&op->nbrp->options, + redistribute_local_only_name, + smap_get_bool(&op->od->nbr->options, + redistribute_local_only_name, + false)); + if (redistribute_local_only_val) { + smap_add(&new, redistribute_local_only_name, "true"); } } diff --git a/ovn-nb.xml b/ovn-nb.xml index 4a75818075..cbe9c40bbe 100644 --- a/ovn-nb.xml +++ b/ovn-nb.xml @@ -3192,6 +3192,22 @@ or </p> </column> + <column name="options" + key="dynamic-routing-redistribute-local-only" + type='{"type": "boolean"}'> + <p> + Only relevant if <ref column="options" key="dynamic-routing"/> + is set to <code>true</code>. + </p> + + <p> + This controls whether <code>ovn-controller</code> will advertise + <ref table="Advertised_Route" db="OVN_Southbound"/> records + only on the chassis where their <code>tracked_port</code> is + bound. Default: <code>false</code>. + </p> + </column> + <column name="options" key="dynamic-routing-vrf-name" type='{"type": "string"}'> <p> @@ -4166,6 +4182,29 @@ or </column> + <column name="options" + key="dynamic-routing-redistribute-local-only" + type='{"type": "boolean"}'> + <p> + Only relevant if <ref column="options" key="dynamic-routing"/> + is set to <code>true</code>. + </p> + + <p> + This controls whether <code>ovn-controller</code> will advertise + <ref table="Advertised_Route" db="OVN_Southbound"/> records + only on the chassis where their <code>tracked_port</code> is + bound. + </p> + + <p> + If not set the value from <ref column="options" + key="dynamic-routing-redistribute-local-only" + table="Logical_Router"/> on the <ref table="Logical_Router"/> will + be used. + </p> + </column> + <column name="options" key="dynamic-routing-maintain-vrf" type='{"type": "boolean"}'> <p> diff --git a/ovn-sb.xml b/ovn-sb.xml index db5faac661..395deae83d 100644 --- a/ovn-sb.xml +++ b/ovn-sb.xml @@ -3908,6 +3908,24 @@ tcp.flags = RST; </column> </group> + <group title="Dynamic Routing"> + <column name="options" + key="dynamic-routing-redistribute-local-only" + type='{"type": "boolean"}'> + <p> + Only relevant if <ref column="options" key="dynamic-routing"/> + is set to <code>true</code>. + </p> + + <p> + This controls whether <code>ovn-controller</code> will advertise + <ref table="Advertised_Route" db="OVN_Southbound"/> records + only on the chassis where their <code>tracked_port</code> is + bound. Default: <code>false</code>. + </p> + </column> + </group> + <group title="Nested Containers"> <p> These columns support containers nested within a VM. Specifically, diff --git a/tests/system-ovn.at b/tests/system-ovn.at index e0407383af..5a5476af75 100644 --- a/tests/system-ovn.at +++ b/tests/system-ovn.at @@ -16759,6 +16759,29 @@ OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl blackhole 10.42.10.10 proto ovn metric 1000 blackhole 172.16.1.150 proto ovn metric 1000]) +# Set LR/LRP.options.dynamic-routing-redistribute-local-only=true +# and verify that lower priority routes are not advertised anymore. +check ovn-nbctl --wait=hv set logical_router_port r1-join \ + options:dynamic-routing-redistribute-local-only=true + +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl +blackhole 172.16.1.150 proto ovn metric 1000]) + +check ovn-nbctl --wait=hv remove logical_router_port r1-join \ + options dynamic-routing-redistribute-local-only \ + -- set logical_router R1 \ + options:dynamic-routing-redistribute-local-only=true + +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl +blackhole 172.16.1.150 proto ovn metric 1000]) + +check ovn-nbctl --wait=hv remove logical_router R1 \ + options dynamic-routing-redistribute-local-only + +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl +blackhole 10.42.10.10 proto ovn metric 1000 +blackhole 172.16.1.150 proto ovn metric 1000]) + # Before cleanup of hv1 ovn-controller, trigger a recompute # to cleanup the local datapaths. Otherwise, the test will fail. # This is because we don't remove a datapath from @@ -16904,6 +16927,29 @@ OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl blackhole 2001:db8:1001::150 dev lo proto ovn metric 1000 pref medium blackhole 2001:db8:3001::150 dev lo proto ovn metric 1000 pref medium]) +# Set LR/LRP.options.dynamic-routing-redistribute-local-only=true +# and verify that lower priority routes are not advertised anymore. +check ovn-nbctl --wait=hv set logical_router_port r1-join \ + options:dynamic-routing-redistribute-local-only=true + +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl +blackhole 2001:db8:1001::150 dev lo proto ovn metric 1000 pref medium]) + +check ovn-nbctl --wait=hv remove logical_router_port r1-join \ + options dynamic-routing-redistribute-local-only \ + -- set logical_router R1 \ + options:dynamic-routing-redistribute-local-only=true + +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl +blackhole 2001:db8:1001::150 dev lo proto ovn metric 1000 pref medium]) + +check ovn-nbctl --wait=hv remove logical_router R1 \ + options dynamic-routing-redistribute-local-only + +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl +blackhole 2001:db8:1001::150 dev lo proto ovn metric 1000 pref medium +blackhole 2001:db8:3001::150 dev lo proto ovn metric 1000 pref medium]) + # Before cleanup of hv1 ovn-controller, trigger a recompute # to cleanup the local datapaths. Otherwise, the test will fail. # This is because we don't remove a datapath from @@ -17069,6 +17115,35 @@ blackhole 10.42.10.13 proto ovn metric 1000 blackhole 172.16.1.10 proto ovn metric 1000 blackhole 172.16.1.11 proto ovn metric 1000]) +# Set LR/LRP.options.dynamic-routing-redistribute-local-only=true +# and verify that lower priority routes are not advertised anymore. +check ovn-nbctl --wait=hv set logical_router_port r1-join \ + options:dynamic-routing-redistribute-local-only=true + +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl +blackhole 172.16.1.10 proto ovn metric 1000 +blackhole 172.16.1.11 proto ovn metric 1000]) + +check ovn-nbctl --wait=hv remove logical_router_port r1-join \ + options dynamic-routing-redistribute-local-only \ + -- set logical_router R1 \ + options:dynamic-routing-redistribute-local-only=true + +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl +blackhole 172.16.1.10 proto ovn metric 1000 +blackhole 172.16.1.11 proto ovn metric 1000]) + +check ovn-nbctl --wait=hv remove logical_router R1 \ + options dynamic-routing-redistribute-local-only + +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl +blackhole 10.42.10.10 proto ovn metric 1000 +blackhole 10.42.10.11 proto ovn metric 1000 +blackhole 10.42.10.12 proto ovn metric 1000 +blackhole 10.42.10.13 proto ovn metric 1000 +blackhole 172.16.1.10 proto ovn metric 1000 +blackhole 172.16.1.11 proto ovn metric 1000]) + # Add "guest" LS connected the distributed router R2 and one "VM" called # guest1. # Also, connect R2 to ls-join via another DGW. @@ -17283,6 +17358,35 @@ blackhole 2001:db8:1004::151 dev lo proto ovn metric 1000 pref medium blackhole 2001:db8:1004::152 dev lo proto ovn metric 1000 pref medium blackhole 2001:db8:1004::153 dev lo proto ovn metric 1000 pref medium]) +# Set LR/LRP.options.dynamic-routing-redistribute-local-only=true +# and verify that lower priority routes are not advertised anymore. +check ovn-nbctl --wait=hv set logical_router_port r1-join \ + options:dynamic-routing-redistribute-local-only=true + +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl +blackhole 2001:db8:1003::150 dev lo proto ovn metric 1000 pref medium +blackhole 2001:db8:1003::151 dev lo proto ovn metric 1000 pref medium]) + +check ovn-nbctl --wait=hv remove logical_router_port r1-join \ + options dynamic-routing-redistribute-local-only \ + -- set logical_router R1 \ + options:dynamic-routing-redistribute-local-only=true + +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl +blackhole 2001:db8:1003::150 dev lo proto ovn metric 1000 pref medium +blackhole 2001:db8:1003::151 dev lo proto ovn metric 1000 pref medium]) + +check ovn-nbctl --wait=hv remove logical_router R1 \ + options dynamic-routing-redistribute-local-only + +OVN_ROUTE_V6_EQUAL([ovnvrf$vrf], [dnl +blackhole 2001:db8:1003::150 dev lo proto ovn metric 1000 pref medium +blackhole 2001:db8:1003::151 dev lo proto ovn metric 1000 pref medium +blackhole 2001:db8:1004::150 dev lo proto ovn metric 1000 pref medium +blackhole 2001:db8:1004::151 dev lo proto ovn metric 1000 pref medium +blackhole 2001:db8:1004::152 dev lo proto ovn metric 1000 pref medium +blackhole 2001:db8:1004::153 dev lo proto ovn metric 1000 pref medium]) + # Add "guest" LS connected the distributed router R2 and one "VM" called # guest1. # Also, connect R2 to ls-join via another DGW. @@ -17356,6 +17460,183 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d AT_CLEANUP ]) +OVN_FOR_EACH_NORTHD([ +AT_SETUP([dynamic-routing - NAT IPs on DGP]) +AT_KEYWORDS([dynamic-routing]) + +CHECK_VRF() +CHECK_CONNTRACK() +CHECK_CONNTRACK_NAT() + +vrf=1000 +VRF_RESERVE([$vrf]) +ovn_start +OVS_TRAFFIC_VSWITCHD_START() +ADD_BR([br-int]) +ADD_BR([br-ext], [set Bridge br-ext fail-mode=standalone]) + +dnl Set external-ids in br-int needed for ovn-controller. +check ovs-vsctl \ + -- set Open_vSwitch . external-ids:system-id=hv1 \ + -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ + -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \ + -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \ + -- set Open_vSwitch . external-ids:ovn-bridge-mappings=phynet:br-ext \ + -- set bridge br-int fail-mode=secure other-config:disable-in-band=true + +dnl Start ovn-controller. +start_daemon ovn-controller + +OVS_WAIT_WHILE([ip link | grep -q ovnvrf$vrf:.*UP]) + + +dnl Logical topology: +dnl +dnl +----+ +----+ +dnl |vif1| (chassis=hv1) |vif2| (chassis=hv2) +dnl +-+--+ +-+--+ +dnl | | +dnl +-----------------------+ +dnl | +dnl +--+---+ +dnl |LS sw0| +dnl +--+---+ +dnl | +dnl +--+--+ +dnl | LR | (distributed DNAT_AND_SNAT for vif1 and vif2) +dnl | R1 | +dnl +--+--+ +dnl | +dnl +---+------+ +dnl |LS public1| +dnl +---+------+ +dnl | +dnl +--+--+ +dnl | LR | (dynamic-routing=true, dynamic-routing-redistribute=nat,lb) +dnl | R2 | +dnl +--+--+ +dnl | +dnl +---+------+ +dnl |LS public2| +dnl +----------+ + +ADD_NAMESPACES(vif1) +ADD_VETH(vif1, vif1, br-int, "192.168.1.100/24", "f0:00:00:00:01:00", "192.168.1.1") + +check ovn-nbctl \ + -- lr-add R1 \ + -- lrp-add R1 R1-S0 00:00:01:01:02:03 192.168.1.1/24 \ + -- lrp-add R1 R1-PUB1 00:00:02:01:02:03 172.16.1.1/24 \ + -- lrp-set-gateway-chassis R1-PUB1 hv1 + +check ovn-nbctl \ + -- lr-add R2 \ + -- set Logical_Router R2 \ + options:requested-tnl-key=$vrf \ + options:dynamic-routing=true \ + options:dynamic-routing-redistribute=lb,nat \ + -- lrp-add R2 R2-PUB1 00:02:02:01:02:03 172.16.1.2/24 \ + -- lrp-add R2 R2-PUB2 00:03:02:01:02:03 42.42.42.1/24 \ + -- lrp-set-options R2-PUB2 \ + dynamic-routing-maintain-vrf=true \ + -- lrp-set-gateway-chassis R2-PUB2 hv1 + +check ovn-nbctl \ + -- ls-add S0 \ + -- lsp-add S0 S0-R1 \ + -- set Logical_Switch_Port S0-R1 type=router \ + options:router-port=R1-S0 \ + -- lsp-set-addresses S0-R1 router \ + -- lsp-add S0 vif1 \ + -- lsp-add S0 vif2 + +check ovn-nbctl \ + -- ls-add PUB1 \ + -- lsp-add PUB1 PUB1-R1 \ + -- set Logical_Switch_Port PUB1-R1 type=router \ + options:router-port=R1-PUB1 \ + -- lsp-set-addresses PUB1-R1 router \ + -- lsp-add PUB1 PUB1-R2 \ + -- set Logical_Switch_Port PUB1-R2 type=router \ + options:router-port=R2-PUB1 \ + -- lsp-set-addresses PUB1-R2 router \ + -- lsp-add PUB1 ln1 \ + -- lsp-set-addresses ln1 unknown \ + -- lsp-set-type ln1 localnet \ + -- lsp-set-options ln1 network_name=phynet + +check ovn-nbctl \ + -- ls-add PUB2 \ + -- lsp-add PUB2 PUB2-R2 \ + -- set Logical_Switch_Port PUB2-R2 type=router \ + options:router-port=R2-PUB2 \ + -- lsp-set-addresses PUB2-R2 router \ + -- lsp-add PUB2 ln2 \ + -- lsp-set-addresses ln2 unknown \ + -- lsp-set-type ln2 localnet \ + -- lsp-set-options ln2 network_name=phynet + +OVN_POPULATE_ARP +check ovn-nbctl --wait=hv sync +wait_for_ports_up vif1 + +AT_CHECK([test `ip route show table $vrf | wc -l` -eq 1], [1]) +AT_CHECK([ip link | grep -q ovnvrf$vrf:.*UP]) + +dnl Create distributed NATs and associate them with vif1 and vif2. +check ovn-nbctl --wait=hv \ + -- lr-nat-add R1 dnat_and_snat 172.16.1.100 192.168.1.100 vif1 00:00:00:00:00:01 \ + -- lr-nat-add R1 dnat_and_snat 172.16.1.200 192.168.1.200 vif2 00:00:00:00:00:02 + +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl +blackhole 172.16.1.100 proto ovn metric 100 +blackhole 172.16.1.200 proto ovn metric 1000]) + +dnl Set LR/LRP.options.dynamic-routing-redistribute-local-only=true +dnl and verify that lower priority routes are not advertised anymore. +check ovn-nbctl --wait=hv set logical_router_port R2-PUB1 \ + options:dynamic-routing-redistribute-local-only=true + +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl +blackhole 172.16.1.100 proto ovn metric 100]) + +check ovn-nbctl --wait=hv remove logical_router_port R2-PUB1 \ + options dynamic-routing-redistribute-local-only \ + -- set logical_router R2 \ + options:dynamic-routing-redistribute-local-only=true + +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl +blackhole 172.16.1.100 proto ovn metric 100]) + +check ovn-nbctl --wait=hv remove logical_router R2 \ + options dynamic-routing-redistribute-local-only + +OVN_ROUTE_EQUAL([ovnvrf$vrf], [dnl +blackhole 172.16.1.100 proto ovn metric 100 +blackhole 172.16.1.200 proto ovn metric 1000]) + +OVN_CLEANUP_CONTROLLER([hv1]) + +dnl Ensure system resources are cleaned up. +AT_CHECK([ip link | grep -q ovnvrf$vrf:.*UP], [1]) +AT_CHECK([test `ip route show table $vrf | wc -l` -eq 1], [1]) + +as ovn-sb +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) + +as ovn-nb +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) + +as northd +OVS_APP_EXIT_AND_WAIT([ovn-northd]) + +as +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d +/Failed to acquire.*/d +/connection dropped.*/d"]) +AT_CLEANUP +]) + OVN_FOR_EACH_NORTHD([ AT_SETUP([Mac binding aging - Probing]) AT_KEYWORDS([mac_binding_probing]) -- 2.50.1 _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev