On Tue, Jan 28, 2025 at 01:42:01PM +0100, Dumitru Ceara wrote: > On 1/28/25 1:13 PM, Felix Huettner wrote: > > On Thu, Jan 16, 2025 at 01:05:42PM +0100, Dumitru Ceara wrote: > >> Hi Felix, > > > > Hi Dumitru, > > > > thanks for the review. > > > > Hi Felix,
Hi Dumitru, > > >> > >> On 1/2/25 4:19 PM, Felix Huettner via dev wrote: > >>> Previously we just assumed that if a LR had multiple LRPs bound to the > >>> local chassis that all routes where valid for all of these LRPs. > >>> This commit handles a previous option in ovn-nb that allowes the user to > >>> specify the interface name that a route needs to use to be acceptable > >>> for a given LRP. > >>> The users can then have a 1:1 relationship between LRPs and interfaces > >>> to allow ovn-controller to determine which route belongs to which LRP. > >> > >> I'm a bit confused by the dynamic-routing-ifname option. If I > >> understand correctly, this is used to filter what routes we inject in > >> the SB for a given LRP, i.e., only if the LRP.dynamic-routing-ifname > >> matches the interface the route was learnt in the VRF where the routing > >> control plane (FRR) runs. > >> > >> But, in which case will FRR learn routes on more than one interface (the > >> one on which we redirected BGP control traffic)? > >> > >> I can imagine a scenario in which FRR gets its control plane traffic > >> (e.g., BGP packets) in a different way (e.g., before that traffic gets > >> sent to br-int), potentially on different interfaces but I thought > >> that's not the use case we're targeting in the OVN BGP integration. > >> > >> It might be a good idea to add some examples to the NB documentation in > >> the patch where we introduce this option. > > > > I did add more docs here > > https://patchwork.ozlabs.org/project/ovn/patch/0236ff31f79abbf95b07a6a247d42fcfc348cad3.1737472971.git.felix.huettner@stackit.cloud/ > > So i hope this helps. I'll also update the commit message to make this > > clearer. > > > > The setup i had in mind was a LR with 5 LRPs. > > LRP 1 goes on to some internal LS with VIFs or so. > > LRP 2-5 are bound to the same LS that has a localnet port. > > LRP 2-3 are bound to one chassis. > > LRP 4-5 are bound to another chassis. > > Each of LRP 2-5 represents a single uplink from the chassis it is bound > > on. So each of the two chassis has two separate uplinks. > > On these separate uplinks you can then run separate BGP sessions using > > routing-protocol-redirect. Having different MACs and IPs ensures that > > this actually works, allthough it looks really hacky. > > > > Oh, so in the LR ovnvrf you'll have 2 different interfaces that are > bound to two different LSPs that correspond to LRP2,3 (or LRP 4,5) > routing-protocol-redirect values. > > OK, I guess that would work. > > Do we have a system test for this kind of deployment? (even if we only > cover one chassis). No not yet. I would add it to the multinode tests that i still need to create. I think there it fits better than here in the ovn-system. > > > > > When sending now traffic from something behind LRP 1 to the outside > > there is a 1:4 chance for each of LRP 2-5 behing used. > > > > The benefit of learning e.g. a default route individually on each of the > > LRPs is if e.g. one of the uplinks now fails. > > If e.g. LRP 2 fails then the route gets removed and we then have a 1:3 > > chance for each of LRP 3-5 for egress traffic. That means that one > > chassis will receive half of the traffic than the other one for > > forwarding. > > This exactly matches the available uplinks. > > > > If we would bond the two uplinks on each chassis, or if we would just > > learn the routes on both LRPs then traffic would still hit both chassis > > in the same ratio. As one chassis actually has less bandwidth available > > we would utilize it more than we actually should. > > > > I hope that clarifies what i had in mind. > > > > I think it does, thanks! > > > > >> > >>> > >>> Signed-off-by: Felix Huettner <felix.huettner@stackit.cloud> > >>> --- > >>> controller/route-exchange-netlink.c | 1 + > >>> controller/route-exchange-netlink.h | 2 ++ > >>> controller/route-exchange.c | 21 +++++++++++++++------ > >>> controller/route.c | 12 +++++++++--- > >>> controller/route.h | 6 ++++-- > >>> tests/system-ovn.at | 25 +++++++++++++++++++++++-- > >>> 6 files changed, 54 insertions(+), 13 deletions(-) > >>> > >>> diff --git a/controller/route-exchange-netlink.c > >>> b/controller/route-exchange-netlink.c > >>> index c2a7551f3..9fea0d955 100644 > >>> --- a/controller/route-exchange-netlink.c > >>> +++ b/controller/route-exchange-netlink.c > >>> @@ -222,6 +222,7 @@ handle_route_msg_delete_routes(const struct > >>> route_table_msg *msg, void *data) > >>> rr->addr = rd->rta_dst; > >>> rr->plen = rd->rtm_dst_len; > >>> rr->nexthop = nexthop->addr; > >>> + memcpy(rr->ifname, nexthop->ifname, IFNAMSIZ); > >> > >> Is it guaranteed that nexthop->ifname is null-terminated? We use > >> strcmp() on it in sb_sync_learned_routes() so it needs to be > >> null-terminated. > > > > Ok, so there is no documentation that backs this up, so i went hunting > > through code to find out: > > * This ifname comes from route-table.c which calls if_indextoname > > * if_indextoname uses the ioctl SIOCGIFNAME > > * This ends up in the kernel function netdev_get_name > > * That function uses strcpy to copy the string > > > > As strcpy relies on the string being null terminated it should be here > > as well if the size matches. The size in the kernel (struct net_device) > > is defined with IFNAMSIZ and as we use that here as well we should be > > safe. > > > > We could be extra careful, allocate IFNAMSIZ + 1 bytes and make sure we > explicitly set (at least) the last one to 0. I'll add +1 to the ifname size. The xzalloc and limitet memcpy should ensure we always have the last byte as zero. Thanks a lot, Felix > > >> > >>> } > >>> return; > >>> } > >>> diff --git a/controller/route-exchange-netlink.h > >>> b/controller/route-exchange-netlink.h > >>> index 566b38fde..fca2429e6 100644 > >>> --- a/controller/route-exchange-netlink.h > >>> +++ b/controller/route-exchange-netlink.h > >>> @@ -18,6 +18,7 @@ > >>> #include <stdint.h> > >>> #include "openvswitch/hmap.h" > >>> #include <netinet/in.h> > >>> +#include <net/if.h> > >>> > >>> /* This value is arbitrary but currently unused. > >>> * See > >>> https://github.com/iproute2/iproute2/blob/main/etc/iproute2/rt_protos > >>> @@ -31,6 +32,7 @@ struct re_nl_received_route_node { > >>> struct in6_addr addr; > >>> unsigned int plen; > >>> struct in6_addr nexthop; > >>> + char ifname[IFNAMSIZ]; > >>> }; > >>> > >>> int re_nl_create_vrf(const char *ifname, uint32_t table_id); > >>> diff --git a/controller/route-exchange.c b/controller/route-exchange.c > >>> index febe13a8e..77d5ce51d 100644 > >>> --- a/controller/route-exchange.c > >>> +++ b/controller/route-exchange.c > >>> @@ -101,7 +101,7 @@ route_erase_entry(struct route_entry *route_e) > >>> static void > >>> sb_sync_learned_routes(const struct sbrec_datapath_binding *datapath, > >>> const struct hmap *learned_routes, > >>> - const struct sset *bound_ports, > >>> + const struct smap *bound_ports, > >>> struct ovsdb_idl *ovnsb_idl, > >>> struct ovsdb_idl_txn *ovnsb_idl_txn, > >>> struct ovsdb_idl_index > >>> *sbrec_learned_route_by_datapath, > >>> @@ -121,8 +121,9 @@ sb_sync_learned_routes(const struct > >>> sbrec_datapath_binding *datapath, > >>> SBREC_LEARNED_ROUTE_FOR_EACH_EQUAL (sb_route, filter, > >>> sbrec_learned_route_by_datapath) > >>> { > >>> /* If the port is not local we don't care about it. > >>> - * Some other ovn-controller will handle it. */ > >>> - if (!sset_contains(bound_ports, > >>> + * Some other ovn-controller will handle it. > >>> + * We may not use smap_get since the value might be validly > >>> NULL. */ > >>> + if (!smap_get_node(bound_ports, > >>> sb_route->logical_port->logical_port)) { > >>> continue; > >>> } > >>> @@ -142,14 +143,22 @@ sb_sync_learned_routes(const struct > >>> sbrec_datapath_binding *datapath, > >>> learned_route->plen); > >>> char *nexthop = normalize_v46(&learned_route->nexthop); > >>> > >>> - const char *logical_port_name; > >>> - SSET_FOR_EACH (logical_port_name, bound_ports) { > >>> + struct smap_node *port_node; > >>> + SMAP_FOR_EACH (port_node, bound_ports) { > >>> + /* The user specified an ifname, but we learned it on a > >>> different > >>> + * port. */ > >>> + if (port_node->value && strcmp(port_node->value, > >>> + learned_route->ifname)) { > >>> + continue; > >>> + } > >>> + > >>> const struct sbrec_port_binding *logical_port = > >>> lport_lookup_by_name(sbrec_port_binding_by_name, > >>> - logical_port_name); > >>> + port_node->key); > >>> if (!logical_port) { > >>> continue; > >>> } > >>> + > >> > >> Nit: unrelated, it should probably be part of an earlier patch. > >> > >>> route_e = route_lookup_or_add(&sync_routes, > >>> datapath, > >>> logical_port, ip_prefix, nexthop); > >>> diff --git a/controller/route.c b/controller/route.c > >>> index b3ff77b83..036a574d8 100644 > >>> --- a/controller/route.c > >>> +++ b/controller/route.c > >>> @@ -83,7 +83,7 @@ advertise_datapath_cleanup(struct > >>> advertise_datapath_entry *ad) > >>> free(ar); > >>> } > >>> hmap_destroy(&ad->routes); > >>> - sset_destroy(&ad->bound_ports); > >>> + smap_destroy(&ad->bound_ports); > >>> free(ad); > >>> } > >>> > >>> @@ -91,6 +91,8 @@ void > >>> route_run(struct route_ctx_in *r_ctx_in, > >>> struct route_ctx_out *r_ctx_out) > >>> { > >>> + tracked_datapaths_destroy(r_ctx_out->tracked_re_datapaths); > >>> + > >> > >> This doesn't look correct. It will destroy the > >> r_ctx_out->tracked_re_datapaths map completely. But then below we just > >> add stuff to it again (and the hmap is not properly initialized anymore). > >> > >> Shouldn't we just clear the tracked_route_datapaths hmap in > >> en_route_run() before we call route_run()? Do we need a new > >> tracked_datapaths_clear() function? > >> > >> Also, I guess, that should not be part of this patch but should be > >> instead in the patch that adds the routes I-P node. > > > > Thanks a lot. Will fix and move it. > > > >> > >>> const struct local_datapath *ld; > >>> HMAP_FOR_EACH (ld, hmap_node, r_ctx_in->local_datapaths) { > >>> if (!ld->n_peer_ports || ld->is_switch) { > >>> @@ -102,7 +104,7 @@ route_run(struct route_ctx_in *r_ctx_in, > >>> ad->key = ld->datapath->tunnel_key; > >>> ad->db = ld->datapath; > >>> hmap_init(&ad->routes); > >>> - sset_init(&ad->bound_ports); > >>> + smap_init(&ad->bound_ports); > >>> > >>> /* This is a LR datapath, find LRPs with route exchange options > >>> * that are bound locally. */ > >>> @@ -122,8 +124,12 @@ route_run(struct route_ctx_in *r_ctx_in, > >>> "maintain-vrf", false); > >>> ad->use_netns |= smap_get_bool(&repb->options, > >>> "use-netns", false); > >>> + char *ifname = nullable_xstrdup( > >>> + smap_get(&repb->options, > >>> + "dynamic-routing-ifname")); > >>> relevant_datapath = true; > >>> - sset_add(&ad->bound_ports, local_peer->logical_port); > >>> + smap_add_nocopy(&ad->bound_ports, > >>> + xstrdup(local_peer->logical_port), ifname); > >>> } > >>> > >>> if (!relevant_datapath) { > >>> diff --git a/controller/route.h b/controller/route.h > >>> index 2a54cf3e3..4b71fa88a 100644 > >>> --- a/controller/route.h > >>> +++ b/controller/route.h > >>> @@ -19,6 +19,7 @@ > >>> #include <netinet/in.h> > >>> #include "openvswitch/hmap.h" > >>> #include "sset.h" > >>> +#include "smap.h" > >>> > >>> struct hmap; > >>> struct ovsdb_idl_index; > >>> @@ -51,8 +52,9 @@ struct advertise_datapath_entry { > >>> bool use_netns; > >>> struct hmap routes; > >>> /* the name of the port bindings locally bound for this datapath and > >>> - * running route exchange logic. */ > >>> - struct sset bound_ports; > >>> + * running route exchange logic. > >>> + * The key is the port name and the value is the ifname if set. */ > >>> + struct smap bound_ports; > >>> }; > >>> > >>> struct advertise_route_entry { > >>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at > >>> index 02d5ce7e1..7ad666f20 100644 > >>> --- a/tests/system-ovn.at > >>> +++ b/tests/system-ovn.at > >>> @@ -14920,6 +14920,17 @@ AT_CHECK_UNQUOTED([ovn-sbctl --columns > >>> ip_prefix,nexthop,logical_port --bare fin > >>> $lp > >>> ]) > >>> > >>> +# by setting a learning interface filter we will now forget about this > >>> route > >> > >> Nit: comments should be sentences (for all comments in this file). > >> > >>> +check ovn-nbctl --wait=hv set Logical_Router_Port internet-phys \ > >>> + options:dynamic-routing-ifname=thisdoesnotexist > >>> +check_row_count Learned_Route 0 > >>> + > >>> +# chaning it to "lo" will allow us to learn the route again > >> > >> Typo: chaning > >> > >>> +check ovn-nbctl --wait=hv set Logical_Router_Port internet-phys \ > >>> + options:dynamic-routing-ifname=lo > >>> +check_row_count Learned_Route 1 > >>> + > >>> + > >>> OVS_APP_EXIT_AND_WAIT([ovn-controller]) > >>> > >>> as ovn-sb > >>> @@ -15130,14 +15141,24 @@ check ip route add 233.252.0.0/24 via > >>> 192.168.10.10 dev lo onlink vrf ovnvrf1337 > >>> # for now we trigger a recompute as route watching is not yet implemented > >>> check ovn-appctl -t ovn-controller inc-engine/recompute > >>> check ovn-nbctl --wait=hv sync > >>> -check_row_count Learned_Route 2 > >>> +check_row_count Learned_Route 1 > >>> lp=$(ovn-sbctl --bare --columns _uuid list port_binding internet-phys) > >>> -AT_CHECK_UNQUOTED([ovn-sbctl --columns ip_prefix,nexthop,logical_port > >>> --bare find Learned_Route logical_port=$lp], [0], [dnl > >>> +AT_CHECK_UNQUOTED([ovn-sbctl --columns ip_prefix,nexthop,logical_port > >>> --bare find Learned_Route], [0], [dnl > >>> 233.252.0.0/24 > >>> 192.168.10.10 > >>> $lp > >>> ]) > >>> > >>> +# by setting a learning interface filter we will now forget about this > >>> route > >>> +check ovn-nbctl --wait=hv set Logical_Router_Port internet-phys \ > >>> + options:dynamic-routing-ifname=thisdoesnotexist > >>> +check_row_count Learned_Route 0 > >>> + > >>> +# chaning it to "lo" will allow us to learn the route again > >> > >> Typo: chaning > >> > >>> +check ovn-nbctl --wait=hv set Logical_Router_Port internet-phys \ > >>> + options:dynamic-routing-ifname=lo > >>> +check_row_count Learned_Route 1 > >>> + > >>> as ovn-sb > >>> OVS_APP_EXIT_AND_WAIT([ovsdb-server]) > >>> > > > > > > Thanks a lot, > > Felix > > Regards, > Dumitru > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev