On 1/28/25 1:13 PM, Felix Huettner wrote: > On Thu, Jan 16, 2025 at 01:05:42PM +0100, Dumitru Ceara wrote: >> Hi Felix, > > Hi Dumitru, > > thanks for the review. >
Hi Felix, >> >> On 1/2/25 4:19 PM, Felix Huettner via dev wrote: >>> Previously we just assumed that if a LR had multiple LRPs bound to the >>> local chassis that all routes where valid for all of these LRPs. >>> This commit handles a previous option in ovn-nb that allowes the user to >>> specify the interface name that a route needs to use to be acceptable >>> for a given LRP. >>> The users can then have a 1:1 relationship between LRPs and interfaces >>> to allow ovn-controller to determine which route belongs to which LRP. >> >> I'm a bit confused by the dynamic-routing-ifname option. If I >> understand correctly, this is used to filter what routes we inject in >> the SB for a given LRP, i.e., only if the LRP.dynamic-routing-ifname >> matches the interface the route was learnt in the VRF where the routing >> control plane (FRR) runs. >> >> But, in which case will FRR learn routes on more than one interface (the >> one on which we redirected BGP control traffic)? >> >> I can imagine a scenario in which FRR gets its control plane traffic >> (e.g., BGP packets) in a different way (e.g., before that traffic gets >> sent to br-int), potentially on different interfaces but I thought >> that's not the use case we're targeting in the OVN BGP integration. >> >> It might be a good idea to add some examples to the NB documentation in >> the patch where we introduce this option. > > I did add more docs here > https://patchwork.ozlabs.org/project/ovn/patch/0236ff31f79abbf95b07a6a247d42fcfc348cad3.1737472971.git.felix.huettner@stackit.cloud/ > So i hope this helps. I'll also update the commit message to make this > clearer. > > The setup i had in mind was a LR with 5 LRPs. > LRP 1 goes on to some internal LS with VIFs or so. > LRP 2-5 are bound to the same LS that has a localnet port. > LRP 2-3 are bound to one chassis. > LRP 4-5 are bound to another chassis. > Each of LRP 2-5 represents a single uplink from the chassis it is bound > on. So each of the two chassis has two separate uplinks. > On these separate uplinks you can then run separate BGP sessions using > routing-protocol-redirect. Having different MACs and IPs ensures that > this actually works, allthough it looks really hacky. > Oh, so in the LR ovnvrf you'll have 2 different interfaces that are bound to two different LSPs that correspond to LRP2,3 (or LRP 4,5) routing-protocol-redirect values. OK, I guess that would work. Do we have a system test for this kind of deployment? (even if we only cover one chassis). > > When sending now traffic from something behind LRP 1 to the outside > there is a 1:4 chance for each of LRP 2-5 behing used. > > The benefit of learning e.g. a default route individually on each of the > LRPs is if e.g. one of the uplinks now fails. > If e.g. LRP 2 fails then the route gets removed and we then have a 1:3 > chance for each of LRP 3-5 for egress traffic. That means that one > chassis will receive half of the traffic than the other one for > forwarding. > This exactly matches the available uplinks. > > If we would bond the two uplinks on each chassis, or if we would just > learn the routes on both LRPs then traffic would still hit both chassis > in the same ratio. As one chassis actually has less bandwidth available > we would utilize it more than we actually should. > > I hope that clarifies what i had in mind. > I think it does, thanks! > >> >>> >>> Signed-off-by: Felix Huettner <felix.huettner@stackit.cloud> >>> --- >>> controller/route-exchange-netlink.c | 1 + >>> controller/route-exchange-netlink.h | 2 ++ >>> controller/route-exchange.c | 21 +++++++++++++++------ >>> controller/route.c | 12 +++++++++--- >>> controller/route.h | 6 ++++-- >>> tests/system-ovn.at | 25 +++++++++++++++++++++++-- >>> 6 files changed, 54 insertions(+), 13 deletions(-) >>> >>> diff --git a/controller/route-exchange-netlink.c >>> b/controller/route-exchange-netlink.c >>> index c2a7551f3..9fea0d955 100644 >>> --- a/controller/route-exchange-netlink.c >>> +++ b/controller/route-exchange-netlink.c >>> @@ -222,6 +222,7 @@ handle_route_msg_delete_routes(const struct >>> route_table_msg *msg, void *data) >>> rr->addr = rd->rta_dst; >>> rr->plen = rd->rtm_dst_len; >>> rr->nexthop = nexthop->addr; >>> + memcpy(rr->ifname, nexthop->ifname, IFNAMSIZ); >> >> Is it guaranteed that nexthop->ifname is null-terminated? We use >> strcmp() on it in sb_sync_learned_routes() so it needs to be >> null-terminated. > > Ok, so there is no documentation that backs this up, so i went hunting > through code to find out: > * This ifname comes from route-table.c which calls if_indextoname > * if_indextoname uses the ioctl SIOCGIFNAME > * This ends up in the kernel function netdev_get_name > * That function uses strcpy to copy the string > > As strcpy relies on the string being null terminated it should be here > as well if the size matches. The size in the kernel (struct net_device) > is defined with IFNAMSIZ and as we use that here as well we should be > safe. > We could be extra careful, allocate IFNAMSIZ + 1 bytes and make sure we explicitly set (at least) the last one to 0. >> >>> } >>> return; >>> } >>> diff --git a/controller/route-exchange-netlink.h >>> b/controller/route-exchange-netlink.h >>> index 566b38fde..fca2429e6 100644 >>> --- a/controller/route-exchange-netlink.h >>> +++ b/controller/route-exchange-netlink.h >>> @@ -18,6 +18,7 @@ >>> #include <stdint.h> >>> #include "openvswitch/hmap.h" >>> #include <netinet/in.h> >>> +#include <net/if.h> >>> >>> /* This value is arbitrary but currently unused. >>> * See >>> https://github.com/iproute2/iproute2/blob/main/etc/iproute2/rt_protos >>> @@ -31,6 +32,7 @@ struct re_nl_received_route_node { >>> struct in6_addr addr; >>> unsigned int plen; >>> struct in6_addr nexthop; >>> + char ifname[IFNAMSIZ]; >>> }; >>> >>> int re_nl_create_vrf(const char *ifname, uint32_t table_id); >>> diff --git a/controller/route-exchange.c b/controller/route-exchange.c >>> index febe13a8e..77d5ce51d 100644 >>> --- a/controller/route-exchange.c >>> +++ b/controller/route-exchange.c >>> @@ -101,7 +101,7 @@ route_erase_entry(struct route_entry *route_e) >>> static void >>> sb_sync_learned_routes(const struct sbrec_datapath_binding *datapath, >>> const struct hmap *learned_routes, >>> - const struct sset *bound_ports, >>> + const struct smap *bound_ports, >>> struct ovsdb_idl *ovnsb_idl, >>> struct ovsdb_idl_txn *ovnsb_idl_txn, >>> struct ovsdb_idl_index >>> *sbrec_learned_route_by_datapath, >>> @@ -121,8 +121,9 @@ sb_sync_learned_routes(const struct >>> sbrec_datapath_binding *datapath, >>> SBREC_LEARNED_ROUTE_FOR_EACH_EQUAL (sb_route, filter, >>> sbrec_learned_route_by_datapath) { >>> /* If the port is not local we don't care about it. >>> - * Some other ovn-controller will handle it. */ >>> - if (!sset_contains(bound_ports, >>> + * Some other ovn-controller will handle it. >>> + * We may not use smap_get since the value might be validly NULL. >>> */ >>> + if (!smap_get_node(bound_ports, >>> sb_route->logical_port->logical_port)) { >>> continue; >>> } >>> @@ -142,14 +143,22 @@ sb_sync_learned_routes(const struct >>> sbrec_datapath_binding *datapath, >>> learned_route->plen); >>> char *nexthop = normalize_v46(&learned_route->nexthop); >>> >>> - const char *logical_port_name; >>> - SSET_FOR_EACH (logical_port_name, bound_ports) { >>> + struct smap_node *port_node; >>> + SMAP_FOR_EACH (port_node, bound_ports) { >>> + /* The user specified an ifname, but we learned it on a >>> different >>> + * port. */ >>> + if (port_node->value && strcmp(port_node->value, >>> + learned_route->ifname)) { >>> + continue; >>> + } >>> + >>> const struct sbrec_port_binding *logical_port = >>> lport_lookup_by_name(sbrec_port_binding_by_name, >>> - logical_port_name); >>> + port_node->key); >>> if (!logical_port) { >>> continue; >>> } >>> + >> >> Nit: unrelated, it should probably be part of an earlier patch. >> >>> route_e = route_lookup_or_add(&sync_routes, >>> datapath, >>> logical_port, ip_prefix, nexthop); >>> diff --git a/controller/route.c b/controller/route.c >>> index b3ff77b83..036a574d8 100644 >>> --- a/controller/route.c >>> +++ b/controller/route.c >>> @@ -83,7 +83,7 @@ advertise_datapath_cleanup(struct >>> advertise_datapath_entry *ad) >>> free(ar); >>> } >>> hmap_destroy(&ad->routes); >>> - sset_destroy(&ad->bound_ports); >>> + smap_destroy(&ad->bound_ports); >>> free(ad); >>> } >>> >>> @@ -91,6 +91,8 @@ void >>> route_run(struct route_ctx_in *r_ctx_in, >>> struct route_ctx_out *r_ctx_out) >>> { >>> + tracked_datapaths_destroy(r_ctx_out->tracked_re_datapaths); >>> + >> >> This doesn't look correct. It will destroy the >> r_ctx_out->tracked_re_datapaths map completely. But then below we just >> add stuff to it again (and the hmap is not properly initialized anymore). >> >> Shouldn't we just clear the tracked_route_datapaths hmap in >> en_route_run() before we call route_run()? Do we need a new >> tracked_datapaths_clear() function? >> >> Also, I guess, that should not be part of this patch but should be >> instead in the patch that adds the routes I-P node. > > Thanks a lot. Will fix and move it. > >> >>> const struct local_datapath *ld; >>> HMAP_FOR_EACH (ld, hmap_node, r_ctx_in->local_datapaths) { >>> if (!ld->n_peer_ports || ld->is_switch) { >>> @@ -102,7 +104,7 @@ route_run(struct route_ctx_in *r_ctx_in, >>> ad->key = ld->datapath->tunnel_key; >>> ad->db = ld->datapath; >>> hmap_init(&ad->routes); >>> - sset_init(&ad->bound_ports); >>> + smap_init(&ad->bound_ports); >>> >>> /* This is a LR datapath, find LRPs with route exchange options >>> * that are bound locally. */ >>> @@ -122,8 +124,12 @@ route_run(struct route_ctx_in *r_ctx_in, >>> "maintain-vrf", false); >>> ad->use_netns |= smap_get_bool(&repb->options, >>> "use-netns", false); >>> + char *ifname = nullable_xstrdup( >>> + smap_get(&repb->options, >>> + "dynamic-routing-ifname")); >>> relevant_datapath = true; >>> - sset_add(&ad->bound_ports, local_peer->logical_port); >>> + smap_add_nocopy(&ad->bound_ports, >>> + xstrdup(local_peer->logical_port), ifname); >>> } >>> >>> if (!relevant_datapath) { >>> diff --git a/controller/route.h b/controller/route.h >>> index 2a54cf3e3..4b71fa88a 100644 >>> --- a/controller/route.h >>> +++ b/controller/route.h >>> @@ -19,6 +19,7 @@ >>> #include <netinet/in.h> >>> #include "openvswitch/hmap.h" >>> #include "sset.h" >>> +#include "smap.h" >>> >>> struct hmap; >>> struct ovsdb_idl_index; >>> @@ -51,8 +52,9 @@ struct advertise_datapath_entry { >>> bool use_netns; >>> struct hmap routes; >>> /* the name of the port bindings locally bound for this datapath and >>> - * running route exchange logic. */ >>> - struct sset bound_ports; >>> + * running route exchange logic. >>> + * The key is the port name and the value is the ifname if set. */ >>> + struct smap bound_ports; >>> }; >>> >>> struct advertise_route_entry { >>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at >>> index 02d5ce7e1..7ad666f20 100644 >>> --- a/tests/system-ovn.at >>> +++ b/tests/system-ovn.at >>> @@ -14920,6 +14920,17 @@ AT_CHECK_UNQUOTED([ovn-sbctl --columns >>> ip_prefix,nexthop,logical_port --bare fin >>> $lp >>> ]) >>> >>> +# by setting a learning interface filter we will now forget about this >>> route >> >> Nit: comments should be sentences (for all comments in this file). >> >>> +check ovn-nbctl --wait=hv set Logical_Router_Port internet-phys \ >>> + options:dynamic-routing-ifname=thisdoesnotexist >>> +check_row_count Learned_Route 0 >>> + >>> +# chaning it to "lo" will allow us to learn the route again >> >> Typo: chaning >> >>> +check ovn-nbctl --wait=hv set Logical_Router_Port internet-phys \ >>> + options:dynamic-routing-ifname=lo >>> +check_row_count Learned_Route 1 >>> + >>> + >>> OVS_APP_EXIT_AND_WAIT([ovn-controller]) >>> >>> as ovn-sb >>> @@ -15130,14 +15141,24 @@ check ip route add 233.252.0.0/24 via >>> 192.168.10.10 dev lo onlink vrf ovnvrf1337 >>> # for now we trigger a recompute as route watching is not yet implemented >>> check ovn-appctl -t ovn-controller inc-engine/recompute >>> check ovn-nbctl --wait=hv sync >>> -check_row_count Learned_Route 2 >>> +check_row_count Learned_Route 1 >>> lp=$(ovn-sbctl --bare --columns _uuid list port_binding internet-phys) >>> -AT_CHECK_UNQUOTED([ovn-sbctl --columns ip_prefix,nexthop,logical_port >>> --bare find Learned_Route logical_port=$lp], [0], [dnl >>> +AT_CHECK_UNQUOTED([ovn-sbctl --columns ip_prefix,nexthop,logical_port >>> --bare find Learned_Route], [0], [dnl >>> 233.252.0.0/24 >>> 192.168.10.10 >>> $lp >>> ]) >>> >>> +# by setting a learning interface filter we will now forget about this >>> route >>> +check ovn-nbctl --wait=hv set Logical_Router_Port internet-phys \ >>> + options:dynamic-routing-ifname=thisdoesnotexist >>> +check_row_count Learned_Route 0 >>> + >>> +# chaning it to "lo" will allow us to learn the route again >> >> Typo: chaning >> >>> +check ovn-nbctl --wait=hv set Logical_Router_Port internet-phys \ >>> + options:dynamic-routing-ifname=lo >>> +check_row_count Learned_Route 1 >>> + >>> as ovn-sb >>> OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>> > > > Thanks a lot, > Felix Regards, Dumitru _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev