On 2/6/25 1:08 PM, Felix Huettner wrote: > On Thu, Feb 06, 2025 at 11:48:45AM +0100, Dumitru Ceara wrote: >> On 2/6/25 11:38 AM, Frode Nordahl wrote: >>> On Thu, Feb 6, 2025 at 11:22 AM Felix Huettner >>> <[email protected]> wrote: >>>> >>>> On Thu, Feb 06, 2025 at 10:51:29AM +0100, Dumitru Ceara wrote: >>>>> On 2/6/25 10:31 AM, Frode Nordahl wrote: >>>>>> On Tue, Feb 4, 2025 at 3:04 PM Felix Huettner via dev >>>>>> <[email protected]> wrote: >>>>>>> >>>>>>> Previously we just assumed that if a LR had multiple LRPs bound to the >>>>>>> local chassis that all routes where valid for all of these LRPs. >>>>>>> This commit handles a previous option in ovn-nb that allowes the user to >>>>>>> specify the interface name that a route needs to use to be acceptable >>>>>>> for a given LRP. >>>>>>> The users can then have a 1:1 relationship between LRPs and interfaces >>>>>>> to allow ovn-controller to determine which route belongs to which LRP. >>>>>>> >>>>>>> This can e.g. be used if a chassis has multiple uplinks. Each of these >>>>>>> uplinks could be assigned to a different LRP of the same LR connected to >>>>>>> the same localnet. >>>>>>> The option allows then to map e.g. a learned default route to each >>>>>>> individual LRP. If one of the links would go down just one of the >>>>>>> learned routes would be removed, causing ECMP across multiple chassis to >>>>>>> be correctly weighted. >>>>>>> >>>>>>> Acked-by: Dumitru Ceara <[email protected]> >>>>>>> Signed-off-by: Felix Huettner <[email protected]> >>>>>>> --- >>>> >>>> Hi Frode, Hi Dumitru, >>>> >>>>>> >>>>>> Thank you for the patch, this is indeed an important use case, I >>>>>> wonder a bit about the approach to it though. >>>>>> >>>>>> Most constructs and options currently in OVN refer to logical >>>>>> constructs that reside either in OVN or in OVS. Your proposal >>>>>> introduces an option that refers to system details, which may or may >>>>>> not be the same across chassis. >>>>>> >>>>>> Putting my feet into the shoes of a CMS developer, how would I know >>>>>> how to map between these two worlds using only OVN APIs? >>>>>> >>>>>> I assume that you are using this feature together with the LRP >>>>>> options:routing-protocol-redirect feature, which takes the name of an >>>>>> LSP. >>>>>> >>>>>> Could we instead either use that value, or have this new option take a >>>>>> LSP name which you then can map to a system interface on each chassis >>>>>> as needed in the ovn-controller code? >>>>>> >>>>> >>>>> Ah, good point, you're right Frode. dynamic-routing-ifname is the name >>>>> of a system interface and there's no guarantee that will be the same on >>>>> all chassis. Using a LSP name instead might be the best way forward >>>>> indeed. We could use >>>>> local_binding_is_up(LRP.options.dynamic-routing-ifname) to determine if >>>>> that associated LSP is bound locally or not. >>>> >>>> I don't think we can directly map from an lsp name to the interface >>>> name. >>>> Allthough the following argumentation comes from the time where >>>> the patchseries had netns support i think this would still be valid. >>>> In the netns usecase i would assume that the routing-protocol-redirect >>>> LSP is locally bound to a veth pair. One part of the pair is plugged >>>> into OVS and mapped to the LSP. >>>> The other end of the pair would be in a netns. This would also be the >>>> interface that would actually learn the routes. >>>> >>>> So the interface that is mapped to the LSP would be different from the >>>> one we learn routes on. >>>> In that case i assumed that it is easier for users to ensure that the >>>> interface name across chassis is identical based on a value they are >>>> free to choose. > > Hi Frode, Hi Dumitru, > >>> >>> Dealing with the many ways a VIF can be attached in a system is indeed >>> non-trivial. Thank you for highlighting the veth pair case, I think >>> that is indeed the most common one. It is useful also without >>> namespaces, as it would allow OVS to be supervisor of one side and the >>> VRF to be supervisor of the peer. >>> >>>> What might be an option is to use the LSP path you describe above with >>>> an optional overwrite on the local ovsdb. >>>> >>>> What do you think of that? >>> >>> For the veth pair case, from cursory view the peer ifindex is visible >>> in `ethtool -S` for both sides, even across namespaces. >>> >>> However, we don't know how people will actually wire this, so perhaps >>> your suggestion of adding some option in the local ovsdb is the best >>> course of action. >>> >>> external_id in the Open vSwitch Interfaces table? >>> >> >> I'm trying to figure out how trivial incremental processing would be in >> ovn-controller for new OVS.Interface.external_ids. >> >> Felix, would route_runtime_data_handler() be called if such a new >> external-id changes value? >> >> If it would then I assume it would return "false" (we don't yet support >> I-P for that case) and trigger a route reconciliation. So that'd be >> fine. Or am I missing something? > > I would assume the same, but honestly i would only be sure once i tried > it out. > > Another thing that came to my mind: Going this way prevents the user > from using an interface filter if there is no LSP mapping to the > interface. > I think that something like this could happen if you integrate routing > with some custom agent. Especially if such a agent does not do in-band > BGP but rather uses some outside logic to learn routes. > > In such cases it might make more sense to use the external_id column in > the OpenVswitch table. > Per default we would lookup the interface based on the locally bound > LSP. > However if we find that LSP name in the external_ids of the OpenVswitch > table then we use the interface name in there, independent if that LSP > is bound locally or even exists. > > What are your opinions on that? > > > I would try such an implementation in the next hour or so. > If i don't get it to a nice state by then i would send out v7 with the > current state of this patch. > If needed this patch could be removed from the series and would probably > not affect the others too much. >
This sounds OK to me, if Frode agrees. > Thanks a lot, > Felix > > >> >> Thanks, >> Dumitru >> >>> -- >>> Frode Nordahl >>> >>>> Thanks a lot, >>>> Felix >>>> >>>>> >>>>> Regards, >>>>> Dumitru >>>>> >>>>>> -- >>>>>> Frode Nordahl >>>>>> >>>>>>> v5->v6: >>>>>>> * addressed review comments >>>>>>> v3->v4: >>>>>>> - addressed review comments. >>>>>>> - updated commit message to be more descriptive. >>>>>>> >>>>>>> controller/route-exchange-netlink.c | 2 ++ >>>>>>> controller/route-exchange-netlink.h | 3 +++ >>>>>>> controller/route-exchange.c | 20 ++++++++++++++------ >>>>>>> controller/route.c | 12 ++++++++---- >>>>>>> controller/route.h | 6 ++++-- >>>>>>> tests/system-ovn.at | 23 ++++++++++++++++++++++- >>>>>>> 6 files changed, 53 insertions(+), 13 deletions(-) >>>>>>> >>>>>>> diff --git a/controller/route-exchange-netlink.c >>>>>>> b/controller/route-exchange-netlink.c >>>>>>> index 0969f15a1..82b87ae4e 100644 >>>>>>> --- a/controller/route-exchange-netlink.c >>>>>>> +++ b/controller/route-exchange-netlink.c >>>>>>> @@ -230,6 +230,8 @@ handle_route_msg(const struct route_table_msg *msg, >>>>>>> void *data) >>>>>>> rr->prefix = rd->rta_dst; >>>>>>> rr->plen = rd->rtm_dst_len; >>>>>>> rr->nexthop = nexthop->addr; >>>>>>> + memcpy(rr->ifname, nexthop->ifname, IFNAMSIZ); >>>>>>> + rr->ifname[IFNAMSIZ] = 0; >>>>>>> ovs_list_push_back(handle_data->learned_routes, >>>>>>> &rr->list_node); >>>>>>> } >>>>>>> return; >>>>>>> diff --git a/controller/route-exchange-netlink.h >>>>>>> b/controller/route-exchange-netlink.h >>>>>>> index 9ea2fb8cb..dee651971 100644 >>>>>>> --- a/controller/route-exchange-netlink.h >>>>>>> +++ b/controller/route-exchange-netlink.h >>>>>>> @@ -21,6 +21,7 @@ >>>>>>> #include <stdint.h> >>>>>>> #include "openvswitch/list.h" >>>>>>> #include <netinet/in.h> >>>>>>> +#include <net/if.h> >>>>>>> >>>>>>> /* This value is arbitrary but currently unused. >>>>>>> * See >>>>>>> https://github.com/iproute2/iproute2/blob/main/etc/iproute2/rt_protos */ >>>>>>> @@ -35,6 +36,8 @@ struct re_nl_received_route_node { >>>>>>> struct in6_addr prefix; >>>>>>> unsigned int plen; >>>>>>> struct in6_addr nexthop; >>>>>>> + /* Adding 1 to this to be sure we actually have a terminating '\0' >>>>>>> */ >>>>>>> + char ifname[IFNAMSIZ + 1]; >>>>>>> }; >>>>>>> >>>>>>> int re_nl_create_vrf(const char *ifname, uint32_t table_id); >>>>>>> diff --git a/controller/route-exchange.c b/controller/route-exchange.c >>>>>>> index 850316465..a1976234b 100644 >>>>>>> --- a/controller/route-exchange.c >>>>>>> +++ b/controller/route-exchange.c >>>>>>> @@ -95,7 +95,7 @@ route_lookup(struct hmap *route_map, >>>>>>> static void >>>>>>> sb_sync_learned_routes(const struct ovs_list *learned_routes, >>>>>>> const struct sbrec_datapath_binding *datapath, >>>>>>> - const struct sset *bound_ports, >>>>>>> + const struct smap *bound_ports, >>>>>>> struct ovsdb_idl_txn *ovnsb_idl_txn, >>>>>>> struct ovsdb_idl_index >>>>>>> *sbrec_port_binding_by_name, >>>>>>> struct ovsdb_idl_index >>>>>>> *sbrec_learned_route_by_datapath) >>>>>>> @@ -110,8 +110,9 @@ sb_sync_learned_routes(const struct ovs_list >>>>>>> *learned_routes, >>>>>>> SBREC_LEARNED_ROUTE_FOR_EACH_EQUAL (sb_route, filter, >>>>>>> >>>>>>> sbrec_learned_route_by_datapath) { >>>>>>> /* If the port is not local we don't care about it. >>>>>>> - * Some other ovn-controller will handle it. */ >>>>>>> - if (!sset_contains(bound_ports, >>>>>>> + * Some other ovn-controller will handle it. >>>>>>> + * We may not use smap_get since the value might be validly >>>>>>> NULL. */ >>>>>>> + if (!smap_get_node(bound_ports, >>>>>>> sb_route->logical_port->logical_port)) { >>>>>>> continue; >>>>>>> } >>>>>>> @@ -125,11 +126,18 @@ sb_sync_learned_routes(const struct ovs_list >>>>>>> *learned_routes, >>>>>>> learned_route->plen); >>>>>>> char *nexthop = normalize_v46(&learned_route->nexthop); >>>>>>> >>>>>>> - const char *logical_port_name; >>>>>>> - SSET_FOR_EACH (logical_port_name, bound_ports) { >>>>>>> + struct smap_node *port_node; >>>>>>> + SMAP_FOR_EACH (port_node, bound_ports) { >>>>>>> + /* The user specified an ifname, but we learned it on a >>>>>>> different >>>>>>> + * port. */ >>>>>>> + if (port_node->value && strcmp(port_node->value, >>>>>>> + learned_route->ifname)) { >>>>>>> + continue; >>>>>>> + } >>>>>>> + >>>>>>> const struct sbrec_port_binding *logical_port = >>>>>>> lport_lookup_by_name(sbrec_port_binding_by_name, >>>>>>> - logical_port_name); >>>>>>> + port_node->key); >>>>>>> if (!logical_port) { >>>>>>> continue; >>>>>>> } >>>>>>> diff --git a/controller/route.c b/controller/route.c >>>>>>> index 108435bad..7b8eed289 100644 >>>>>>> --- a/controller/route.c >>>>>>> +++ b/controller/route.c >>>>>>> @@ -82,7 +82,7 @@ advertise_datapath_cleanup(struct >>>>>>> advertise_datapath_entry *ad) >>>>>>> free(ar); >>>>>>> } >>>>>>> hmap_destroy(&ad->routes); >>>>>>> - sset_destroy(&ad->bound_ports); >>>>>>> + smap_destroy(&ad->bound_ports); >>>>>>> free(ad); >>>>>>> } >>>>>>> >>>>>>> @@ -114,7 +114,7 @@ route_run(struct route_ctx_in *r_ctx_in, >>>>>>> ad = xzalloc(sizeof(*ad)); >>>>>>> ad->db = ld->datapath; >>>>>>> hmap_init(&ad->routes); >>>>>>> - sset_init(&ad->bound_ports); >>>>>>> + smap_init(&ad->bound_ports); >>>>>>> >>>>>>> /* This is a LR datapath, find LRPs with route exchange options >>>>>>> * that are bound locally. */ >>>>>>> @@ -132,10 +132,14 @@ route_run(struct route_ctx_in *r_ctx_in, >>>>>>> >>>>>>> ad->maintain_vrf |= smap_get_bool( >>>>>>> &repb->options, "dynamic-routing-maintain-vrf", false); >>>>>>> - sset_add(&ad->bound_ports, local_peer->logical_port); >>>>>>> + char *ifname = nullable_xstrdup( >>>>>>> + smap_get(&repb->options, >>>>>>> + >>>>>>> "dynamic-routing-ifname")); >>>>>>> + smap_add_nocopy(&ad->bound_ports, >>>>>>> + xstrdup(local_peer->logical_port), ifname); >>>>>>> } >>>>>>> >>>>>>> - if (sset_is_empty(&ad->bound_ports)) { >>>>>>> + if (smap_is_empty(&ad->bound_ports)) { >>>>>>> advertise_datapath_cleanup(ad); >>>>>>> continue; >>>>>>> } >>>>>>> diff --git a/controller/route.h b/controller/route.h >>>>>>> index 10b9434fc..c53134559 100644 >>>>>>> --- a/controller/route.h >>>>>>> +++ b/controller/route.h >>>>>>> @@ -22,6 +22,7 @@ >>>>>>> #include <netinet/in.h> >>>>>>> #include "openvswitch/hmap.h" >>>>>>> #include "sset.h" >>>>>>> +#include "smap.h" >>>>>>> >>>>>>> struct hmap; >>>>>>> struct ovsdb_idl_index; >>>>>>> @@ -50,8 +51,9 @@ struct advertise_datapath_entry { >>>>>>> struct hmap routes; >>>>>>> >>>>>>> /* The name of the port bindings locally bound for this datapath >>>>>>> and >>>>>>> - * running route exchange logic. */ >>>>>>> - struct sset bound_ports; >>>>>>> + * running route exchange logic. >>>>>>> + * The key is the port name and the value is the ifname if set. */ >>>>>>> + struct smap bound_ports; >>>>>>> }; >>>>>>> >>>>>>> struct advertise_route_entry { >>>>>>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at >>>>>>> index afe0ee902..f41a23115 100644 >>>>>>> --- a/tests/system-ovn.at >>>>>>> +++ b/tests/system-ovn.at >>>>>>> @@ -15660,6 +15660,17 @@ check_row_count Learned_Route 1 >>>>>>> lp=$(fetch_column port_binding _uuid logical_port=internet-phys) >>>>>>> check_row_count Learned_Route 1 logical_port=$lp >>>>>>> ip_prefix=233.252.0.0/24 nexthop=192.168.10.10 >>>>>>> >>>>>>> +# By setting a learning interface filter we will now forget about this >>>>>>> route. >>>>>>> +check ovn-nbctl --wait=hv set Logical_Router_Port internet-phys \ >>>>>>> + options:dynamic-routing-ifname=thisdoesnotexist >>>>>>> +check_row_count Learned_Route 0 >>>>>>> + >>>>>>> +# Changing it to "lo" will allow us to learn the route again. >>>>>>> +check ovn-nbctl --wait=hv set Logical_Router_Port internet-phys \ >>>>>>> + options:dynamic-routing-ifname=lo >>>>>>> +check_row_count Learned_Route 1 >>>>>>> + >>>>>>> + >>>>>>> OVS_APP_EXIT_AND_WAIT([ovn-controller]) >>>>>>> >>>>>>> as ovn-sb >>>>>>> @@ -15869,10 +15880,20 @@ check ip route add 233.252.0.0/24 via >>>>>>> 192.168.10.10 dev lo onlink vrf ovnvrf1337 >>>>>>> # For now we trigger a recompute as route watching is not yet >>>>>>> implemented. >>>>>>> check ovn-appctl -t ovn-controller inc-engine/recompute >>>>>>> check ovn-nbctl --wait=hv sync >>>>>>> -check_row_count Learned_Route 2 >>>>>>> +check_row_count Learned_Route 1 >>>>>>> lp=$(fetch_column port_binding _uuid logical_port=internet-phys) >>>>>>> check_row_count Learned_Route 1 logical_port=$lp >>>>>>> ip_prefix=233.252.0.0/24 nexthop=192.168.10.10 >>>>>>> >>>>>>> +# By setting a learning interface filter we will now forget about this >>>>>>> route. >>>>>>> +check ovn-nbctl --wait=hv set Logical_Router_Port internet-phys \ >>>>>>> + options:dynamic-routing-ifname=thisdoesnotexist >>>>>>> +check_row_count Learned_Route 0 >>>>>>> + >>>>>>> +# Changing it to "lo" will allow us to learn the route again. >>>>>>> +check ovn-nbctl --wait=hv set Logical_Router_Port internet-phys \ >>>>>>> + options:dynamic-routing-ifname=lo >>>>>>> +check_row_count Learned_Route 1 >>>>>>> + >>>>>>> as ovn-sb >>>>>>> OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>>>>>> >>>>>>> -- >>>>>>> 2.47.1 >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> dev mailing list >>>>>>> [email protected] >>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>>>> >>>>> >>> >> > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
