On 2/6/25 11:38 AM, Frode Nordahl wrote: > On Thu, Feb 6, 2025 at 11:22 AM Felix Huettner > <[email protected]> wrote: >> >> On Thu, Feb 06, 2025 at 10:51:29AM +0100, Dumitru Ceara wrote: >>> On 2/6/25 10:31 AM, Frode Nordahl wrote: >>>> On Tue, Feb 4, 2025 at 3:04 PM Felix Huettner via dev >>>> <[email protected]> wrote: >>>>> >>>>> Previously we just assumed that if a LR had multiple LRPs bound to the >>>>> local chassis that all routes where valid for all of these LRPs. >>>>> This commit handles a previous option in ovn-nb that allowes the user to >>>>> specify the interface name that a route needs to use to be acceptable >>>>> for a given LRP. >>>>> The users can then have a 1:1 relationship between LRPs and interfaces >>>>> to allow ovn-controller to determine which route belongs to which LRP. >>>>> >>>>> This can e.g. be used if a chassis has multiple uplinks. Each of these >>>>> uplinks could be assigned to a different LRP of the same LR connected to >>>>> the same localnet. >>>>> The option allows then to map e.g. a learned default route to each >>>>> individual LRP. If one of the links would go down just one of the >>>>> learned routes would be removed, causing ECMP across multiple chassis to >>>>> be correctly weighted. >>>>> >>>>> Acked-by: Dumitru Ceara <[email protected]> >>>>> Signed-off-by: Felix Huettner <[email protected]> >>>>> --- >> >> Hi Frode, Hi Dumitru, >> >>>> >>>> Thank you for the patch, this is indeed an important use case, I >>>> wonder a bit about the approach to it though. >>>> >>>> Most constructs and options currently in OVN refer to logical >>>> constructs that reside either in OVN or in OVS. Your proposal >>>> introduces an option that refers to system details, which may or may >>>> not be the same across chassis. >>>> >>>> Putting my feet into the shoes of a CMS developer, how would I know >>>> how to map between these two worlds using only OVN APIs? >>>> >>>> I assume that you are using this feature together with the LRP >>>> options:routing-protocol-redirect feature, which takes the name of an >>>> LSP. >>>> >>>> Could we instead either use that value, or have this new option take a >>>> LSP name which you then can map to a system interface on each chassis >>>> as needed in the ovn-controller code? >>>> >>> >>> Ah, good point, you're right Frode. dynamic-routing-ifname is the name >>> of a system interface and there's no guarantee that will be the same on >>> all chassis. Using a LSP name instead might be the best way forward >>> indeed. We could use >>> local_binding_is_up(LRP.options.dynamic-routing-ifname) to determine if >>> that associated LSP is bound locally or not. >> >> I don't think we can directly map from an lsp name to the interface >> name. >> Allthough the following argumentation comes from the time where >> the patchseries had netns support i think this would still be valid. >> In the netns usecase i would assume that the routing-protocol-redirect >> LSP is locally bound to a veth pair. One part of the pair is plugged >> into OVS and mapped to the LSP. >> The other end of the pair would be in a netns. This would also be the >> interface that would actually learn the routes. >> >> So the interface that is mapped to the LSP would be different from the >> one we learn routes on. >> In that case i assumed that it is easier for users to ensure that the >> interface name across chassis is identical based on a value they are >> free to choose. > > Dealing with the many ways a VIF can be attached in a system is indeed > non-trivial. Thank you for highlighting the veth pair case, I think > that is indeed the most common one. It is useful also without > namespaces, as it would allow OVS to be supervisor of one side and the > VRF to be supervisor of the peer. > >> What might be an option is to use the LSP path you describe above with >> an optional overwrite on the local ovsdb. >> >> What do you think of that? > > For the veth pair case, from cursory view the peer ifindex is visible > in `ethtool -S` for both sides, even across namespaces. > > However, we don't know how people will actually wire this, so perhaps > your suggestion of adding some option in the local ovsdb is the best > course of action. > > external_id in the Open vSwitch Interfaces table? >
I'm trying to figure out how trivial incremental processing would be in ovn-controller for new OVS.Interface.external_ids. Felix, would route_runtime_data_handler() be called if such a new external-id changes value? If it would then I assume it would return "false" (we don't yet support I-P for that case) and trigger a route reconciliation. So that'd be fine. Or am I missing something? Thanks, Dumitru > -- > Frode Nordahl > >> Thanks a lot, >> Felix >> >>> >>> Regards, >>> Dumitru >>> >>>> -- >>>> Frode Nordahl >>>> >>>>> v5->v6: >>>>> * addressed review comments >>>>> v3->v4: >>>>> - addressed review comments. >>>>> - updated commit message to be more descriptive. >>>>> >>>>> controller/route-exchange-netlink.c | 2 ++ >>>>> controller/route-exchange-netlink.h | 3 +++ >>>>> controller/route-exchange.c | 20 ++++++++++++++------ >>>>> controller/route.c | 12 ++++++++---- >>>>> controller/route.h | 6 ++++-- >>>>> tests/system-ovn.at | 23 ++++++++++++++++++++++- >>>>> 6 files changed, 53 insertions(+), 13 deletions(-) >>>>> >>>>> diff --git a/controller/route-exchange-netlink.c >>>>> b/controller/route-exchange-netlink.c >>>>> index 0969f15a1..82b87ae4e 100644 >>>>> --- a/controller/route-exchange-netlink.c >>>>> +++ b/controller/route-exchange-netlink.c >>>>> @@ -230,6 +230,8 @@ handle_route_msg(const struct route_table_msg *msg, >>>>> void *data) >>>>> rr->prefix = rd->rta_dst; >>>>> rr->plen = rd->rtm_dst_len; >>>>> rr->nexthop = nexthop->addr; >>>>> + memcpy(rr->ifname, nexthop->ifname, IFNAMSIZ); >>>>> + rr->ifname[IFNAMSIZ] = 0; >>>>> ovs_list_push_back(handle_data->learned_routes, >>>>> &rr->list_node); >>>>> } >>>>> return; >>>>> diff --git a/controller/route-exchange-netlink.h >>>>> b/controller/route-exchange-netlink.h >>>>> index 9ea2fb8cb..dee651971 100644 >>>>> --- a/controller/route-exchange-netlink.h >>>>> +++ b/controller/route-exchange-netlink.h >>>>> @@ -21,6 +21,7 @@ >>>>> #include <stdint.h> >>>>> #include "openvswitch/list.h" >>>>> #include <netinet/in.h> >>>>> +#include <net/if.h> >>>>> >>>>> /* This value is arbitrary but currently unused. >>>>> * See >>>>> https://github.com/iproute2/iproute2/blob/main/etc/iproute2/rt_protos */ >>>>> @@ -35,6 +36,8 @@ struct re_nl_received_route_node { >>>>> struct in6_addr prefix; >>>>> unsigned int plen; >>>>> struct in6_addr nexthop; >>>>> + /* Adding 1 to this to be sure we actually have a terminating '\0' */ >>>>> + char ifname[IFNAMSIZ + 1]; >>>>> }; >>>>> >>>>> int re_nl_create_vrf(const char *ifname, uint32_t table_id); >>>>> diff --git a/controller/route-exchange.c b/controller/route-exchange.c >>>>> index 850316465..a1976234b 100644 >>>>> --- a/controller/route-exchange.c >>>>> +++ b/controller/route-exchange.c >>>>> @@ -95,7 +95,7 @@ route_lookup(struct hmap *route_map, >>>>> static void >>>>> sb_sync_learned_routes(const struct ovs_list *learned_routes, >>>>> const struct sbrec_datapath_binding *datapath, >>>>> - const struct sset *bound_ports, >>>>> + const struct smap *bound_ports, >>>>> struct ovsdb_idl_txn *ovnsb_idl_txn, >>>>> struct ovsdb_idl_index >>>>> *sbrec_port_binding_by_name, >>>>> struct ovsdb_idl_index >>>>> *sbrec_learned_route_by_datapath) >>>>> @@ -110,8 +110,9 @@ sb_sync_learned_routes(const struct ovs_list >>>>> *learned_routes, >>>>> SBREC_LEARNED_ROUTE_FOR_EACH_EQUAL (sb_route, filter, >>>>> sbrec_learned_route_by_datapath) >>>>> { >>>>> /* If the port is not local we don't care about it. >>>>> - * Some other ovn-controller will handle it. */ >>>>> - if (!sset_contains(bound_ports, >>>>> + * Some other ovn-controller will handle it. >>>>> + * We may not use smap_get since the value might be validly >>>>> NULL. */ >>>>> + if (!smap_get_node(bound_ports, >>>>> sb_route->logical_port->logical_port)) { >>>>> continue; >>>>> } >>>>> @@ -125,11 +126,18 @@ sb_sync_learned_routes(const struct ovs_list >>>>> *learned_routes, >>>>> learned_route->plen); >>>>> char *nexthop = normalize_v46(&learned_route->nexthop); >>>>> >>>>> - const char *logical_port_name; >>>>> - SSET_FOR_EACH (logical_port_name, bound_ports) { >>>>> + struct smap_node *port_node; >>>>> + SMAP_FOR_EACH (port_node, bound_ports) { >>>>> + /* The user specified an ifname, but we learned it on a >>>>> different >>>>> + * port. */ >>>>> + if (port_node->value && strcmp(port_node->value, >>>>> + learned_route->ifname)) { >>>>> + continue; >>>>> + } >>>>> + >>>>> const struct sbrec_port_binding *logical_port = >>>>> lport_lookup_by_name(sbrec_port_binding_by_name, >>>>> - logical_port_name); >>>>> + port_node->key); >>>>> if (!logical_port) { >>>>> continue; >>>>> } >>>>> diff --git a/controller/route.c b/controller/route.c >>>>> index 108435bad..7b8eed289 100644 >>>>> --- a/controller/route.c >>>>> +++ b/controller/route.c >>>>> @@ -82,7 +82,7 @@ advertise_datapath_cleanup(struct >>>>> advertise_datapath_entry *ad) >>>>> free(ar); >>>>> } >>>>> hmap_destroy(&ad->routes); >>>>> - sset_destroy(&ad->bound_ports); >>>>> + smap_destroy(&ad->bound_ports); >>>>> free(ad); >>>>> } >>>>> >>>>> @@ -114,7 +114,7 @@ route_run(struct route_ctx_in *r_ctx_in, >>>>> ad = xzalloc(sizeof(*ad)); >>>>> ad->db = ld->datapath; >>>>> hmap_init(&ad->routes); >>>>> - sset_init(&ad->bound_ports); >>>>> + smap_init(&ad->bound_ports); >>>>> >>>>> /* This is a LR datapath, find LRPs with route exchange options >>>>> * that are bound locally. */ >>>>> @@ -132,10 +132,14 @@ route_run(struct route_ctx_in *r_ctx_in, >>>>> >>>>> ad->maintain_vrf |= smap_get_bool( >>>>> &repb->options, "dynamic-routing-maintain-vrf", false); >>>>> - sset_add(&ad->bound_ports, local_peer->logical_port); >>>>> + char *ifname = nullable_xstrdup( >>>>> + smap_get(&repb->options, >>>>> + "dynamic-routing-ifname")); >>>>> + smap_add_nocopy(&ad->bound_ports, >>>>> + xstrdup(local_peer->logical_port), ifname); >>>>> } >>>>> >>>>> - if (sset_is_empty(&ad->bound_ports)) { >>>>> + if (smap_is_empty(&ad->bound_ports)) { >>>>> advertise_datapath_cleanup(ad); >>>>> continue; >>>>> } >>>>> diff --git a/controller/route.h b/controller/route.h >>>>> index 10b9434fc..c53134559 100644 >>>>> --- a/controller/route.h >>>>> +++ b/controller/route.h >>>>> @@ -22,6 +22,7 @@ >>>>> #include <netinet/in.h> >>>>> #include "openvswitch/hmap.h" >>>>> #include "sset.h" >>>>> +#include "smap.h" >>>>> >>>>> struct hmap; >>>>> struct ovsdb_idl_index; >>>>> @@ -50,8 +51,9 @@ struct advertise_datapath_entry { >>>>> struct hmap routes; >>>>> >>>>> /* The name of the port bindings locally bound for this datapath and >>>>> - * running route exchange logic. */ >>>>> - struct sset bound_ports; >>>>> + * running route exchange logic. >>>>> + * The key is the port name and the value is the ifname if set. */ >>>>> + struct smap bound_ports; >>>>> }; >>>>> >>>>> struct advertise_route_entry { >>>>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at >>>>> index afe0ee902..f41a23115 100644 >>>>> --- a/tests/system-ovn.at >>>>> +++ b/tests/system-ovn.at >>>>> @@ -15660,6 +15660,17 @@ check_row_count Learned_Route 1 >>>>> lp=$(fetch_column port_binding _uuid logical_port=internet-phys) >>>>> check_row_count Learned_Route 1 logical_port=$lp >>>>> ip_prefix=233.252.0.0/24 nexthop=192.168.10.10 >>>>> >>>>> +# By setting a learning interface filter we will now forget about this >>>>> route. >>>>> +check ovn-nbctl --wait=hv set Logical_Router_Port internet-phys \ >>>>> + options:dynamic-routing-ifname=thisdoesnotexist >>>>> +check_row_count Learned_Route 0 >>>>> + >>>>> +# Changing it to "lo" will allow us to learn the route again. >>>>> +check ovn-nbctl --wait=hv set Logical_Router_Port internet-phys \ >>>>> + options:dynamic-routing-ifname=lo >>>>> +check_row_count Learned_Route 1 >>>>> + >>>>> + >>>>> OVS_APP_EXIT_AND_WAIT([ovn-controller]) >>>>> >>>>> as ovn-sb >>>>> @@ -15869,10 +15880,20 @@ check ip route add 233.252.0.0/24 via >>>>> 192.168.10.10 dev lo onlink vrf ovnvrf1337 >>>>> # For now we trigger a recompute as route watching is not yet >>>>> implemented. >>>>> check ovn-appctl -t ovn-controller inc-engine/recompute >>>>> check ovn-nbctl --wait=hv sync >>>>> -check_row_count Learned_Route 2 >>>>> +check_row_count Learned_Route 1 >>>>> lp=$(fetch_column port_binding _uuid logical_port=internet-phys) >>>>> check_row_count Learned_Route 1 logical_port=$lp >>>>> ip_prefix=233.252.0.0/24 nexthop=192.168.10.10 >>>>> >>>>> +# By setting a learning interface filter we will now forget about this >>>>> route. >>>>> +check ovn-nbctl --wait=hv set Logical_Router_Port internet-phys \ >>>>> + options:dynamic-routing-ifname=thisdoesnotexist >>>>> +check_row_count Learned_Route 0 >>>>> + >>>>> +# Changing it to "lo" will allow us to learn the route again. >>>>> +check ovn-nbctl --wait=hv set Logical_Router_Port internet-phys \ >>>>> + options:dynamic-routing-ifname=lo >>>>> +check_row_count Learned_Route 1 >>>>> + >>>>> as ovn-sb >>>>> OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>>>> >>>>> -- >>>>> 2.47.1 >>>>> >>>>> >>>>> _______________________________________________ >>>>> dev mailing list >>>>> [email protected] >>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>> >>> > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
