On Thu, Feb 06, 2025 at 11:48:45AM +0100, Dumitru Ceara wrote:
> On 2/6/25 11:38 AM, Frode Nordahl wrote:
> > On Thu, Feb 6, 2025 at 11:22 AM Felix Huettner
> > <[email protected]> wrote:
> >>
> >> On Thu, Feb 06, 2025 at 10:51:29AM +0100, Dumitru Ceara wrote:
> >>> On 2/6/25 10:31 AM, Frode Nordahl wrote:
> >>>> On Tue, Feb 4, 2025 at 3:04 PM Felix Huettner via dev
> >>>> <[email protected]> wrote:
> >>>>>
> >>>>> Previously we just assumed that if a LR had multiple LRPs bound to the
> >>>>> local chassis that all routes where valid for all of these LRPs.
> >>>>> This commit handles a previous option in ovn-nb that allowes the user to
> >>>>> specify the interface name that a route needs to use to be acceptable
> >>>>> for a given LRP.
> >>>>> The users can then have a 1:1 relationship between LRPs and interfaces
> >>>>> to allow ovn-controller to determine which route belongs to which LRP.
> >>>>>
> >>>>> This can e.g. be used if a chassis has multiple uplinks. Each of these
> >>>>> uplinks could be assigned to a different LRP of the same LR connected to
> >>>>> the same localnet.
> >>>>> The option allows then to map e.g. a learned default route to each
> >>>>> individual LRP. If one of the links would go down just one of the
> >>>>> learned routes would be removed, causing ECMP across multiple chassis to
> >>>>> be correctly weighted.
> >>>>>
> >>>>> Acked-by: Dumitru Ceara <[email protected]>
> >>>>> Signed-off-by: Felix Huettner <[email protected]>
> >>>>> ---
> >>
> >> Hi Frode, Hi Dumitru,
> >>
> >>>>
> >>>> Thank you for the patch, this is indeed an important use case, I
> >>>> wonder a bit about the approach to it though.
> >>>>
> >>>> Most constructs and options currently in OVN refer to logical
> >>>> constructs that reside either in OVN or in OVS. Your proposal
> >>>> introduces an option that refers to system details, which may or may
> >>>> not be the same across chassis.
> >>>>
> >>>> Putting my feet into the shoes of a CMS developer, how would I know
> >>>> how to map between these two worlds using only OVN APIs?
> >>>>
> >>>> I assume that you are using this feature together with the LRP
> >>>> options:routing-protocol-redirect feature, which takes the name of an
> >>>> LSP.
> >>>>
> >>>> Could we instead either use that value, or have this new option take a
> >>>> LSP name which you then can map to a system interface on each chassis
> >>>> as needed in the ovn-controller code?
> >>>>
> >>>
> >>> Ah, good point, you're right Frode. dynamic-routing-ifname is the name
> >>> of a system interface and there's no guarantee that will be the same on
> >>> all chassis.  Using a LSP name instead might be the best way forward
> >>> indeed.  We could use
> >>> local_binding_is_up(LRP.options.dynamic-routing-ifname) to determine if
> >>> that associated LSP is bound locally or not.
> >>
> >> I don't think we can directly map from an lsp name to the interface
> >> name.
> >> Allthough the following argumentation comes from the time where
> >> the patchseries had netns support i think this would still be valid.
> >> In the netns usecase i would assume that the routing-protocol-redirect
> >> LSP is locally bound to a veth pair. One part of the pair is plugged
> >> into OVS and mapped to the LSP.
> >> The other end of the pair would be in a netns. This would also be the
> >> interface that would actually learn the routes.
> >>
> >> So the interface that is mapped to the LSP would be different from the
> >> one we learn routes on.
> >> In that case i assumed that it is easier for users to ensure that the
> >> interface name across chassis is identical based on a value they are
> >> free to choose.

Hi Frode, Hi Dumitru,

> > 
> > Dealing with the many ways a VIF can be attached in a system is indeed
> > non-trivial. Thank you for highlighting the veth pair case, I think
> > that is indeed the most common one. It is useful also without
> > namespaces, as it would allow OVS to be supervisor of one side and the
> > VRF to be supervisor of the peer.
> > 
> >> What might be an option is to use the LSP path you describe above with
> >> an optional overwrite on the local ovsdb.
> >>
> >> What do you think of that?
> > 
> > For the veth pair case, from cursory view the peer ifindex is visible
> > in `ethtool -S` for both sides, even across namespaces.
> > 
> > However, we don't know how people will actually wire this, so perhaps
> > your suggestion of adding some option in the local ovsdb is the best
> > course of action.
> > 
> > external_id in the Open vSwitch Interfaces table?
> > 
> 
> I'm trying to figure out how trivial incremental processing would be in
> ovn-controller for new OVS.Interface.external_ids.
> 
> Felix, would route_runtime_data_handler() be called if such a new
> external-id changes value?
> 
> If it would then I assume it would return "false" (we don't yet support
> I-P for that case) and trigger a route reconciliation.  So that'd be
> fine.  Or am I missing something?

I would assume the same, but honestly i would only be sure once i tried
it out.

Another thing that came to my mind: Going this way prevents the user
from using an interface filter if there is no LSP mapping to the
interface.
I think that something like this could happen if you integrate routing
with some custom agent. Especially if such a agent does not do in-band
BGP but rather uses some outside logic to learn routes.

In such cases it might make more sense to use the external_id column in
the OpenVswitch table.
Per default we would lookup the interface based on the locally bound
LSP.
However if we find that LSP name in the external_ids of the OpenVswitch
table then we use the interface name in there, independent if that LSP
is bound locally or even exists.

What are your opinions on that?


I would try such an implementation in the next hour or so.
If i don't get it to a nice state by then i would send out v7 with the
current state of this patch.
If needed this patch could be removed from the series and would probably
not affect the others too much.

Thanks a lot,
Felix


> 
> Thanks,
> Dumitru
> 
> > --
> > Frode Nordahl
> > 
> >> Thanks a lot,
> >> Felix
> >>
> >>>
> >>> Regards,
> >>> Dumitru
> >>>
> >>>> --
> >>>> Frode Nordahl
> >>>>
> >>>>> v5->v6:
> >>>>>   * addressed review comments
> >>>>> v3->v4:
> >>>>>   - addressed review comments.
> >>>>>   - updated commit message to be more descriptive.
> >>>>>
> >>>>>  controller/route-exchange-netlink.c |  2 ++
> >>>>>  controller/route-exchange-netlink.h |  3 +++
> >>>>>  controller/route-exchange.c         | 20 ++++++++++++++------
> >>>>>  controller/route.c                  | 12 ++++++++----
> >>>>>  controller/route.h                  |  6 ++++--
> >>>>>  tests/system-ovn.at                 | 23 ++++++++++++++++++++++-
> >>>>>  6 files changed, 53 insertions(+), 13 deletions(-)
> >>>>>
> >>>>> diff --git a/controller/route-exchange-netlink.c 
> >>>>> b/controller/route-exchange-netlink.c
> >>>>> index 0969f15a1..82b87ae4e 100644
> >>>>> --- a/controller/route-exchange-netlink.c
> >>>>> +++ b/controller/route-exchange-netlink.c
> >>>>> @@ -230,6 +230,8 @@ handle_route_msg(const struct route_table_msg *msg, 
> >>>>> void *data)
> >>>>>              rr->prefix = rd->rta_dst;
> >>>>>              rr->plen = rd->rtm_dst_len;
> >>>>>              rr->nexthop = nexthop->addr;
> >>>>> +            memcpy(rr->ifname, nexthop->ifname, IFNAMSIZ);
> >>>>> +            rr->ifname[IFNAMSIZ] = 0;
> >>>>>              ovs_list_push_back(handle_data->learned_routes, 
> >>>>> &rr->list_node);
> >>>>>          }
> >>>>>          return;
> >>>>> diff --git a/controller/route-exchange-netlink.h 
> >>>>> b/controller/route-exchange-netlink.h
> >>>>> index 9ea2fb8cb..dee651971 100644
> >>>>> --- a/controller/route-exchange-netlink.h
> >>>>> +++ b/controller/route-exchange-netlink.h
> >>>>> @@ -21,6 +21,7 @@
> >>>>>  #include <stdint.h>
> >>>>>  #include "openvswitch/list.h"
> >>>>>  #include <netinet/in.h>
> >>>>> +#include <net/if.h>
> >>>>>
> >>>>>  /* This value is arbitrary but currently unused.
> >>>>>   * See 
> >>>>> https://github.com/iproute2/iproute2/blob/main/etc/iproute2/rt_protos */
> >>>>> @@ -35,6 +36,8 @@ struct re_nl_received_route_node {
> >>>>>      struct in6_addr prefix;
> >>>>>      unsigned int plen;
> >>>>>      struct in6_addr nexthop;
> >>>>> +    /* Adding 1 to this to be sure we actually have a terminating '\0' 
> >>>>> */
> >>>>> +    char ifname[IFNAMSIZ + 1];
> >>>>>  };
> >>>>>
> >>>>>  int re_nl_create_vrf(const char *ifname, uint32_t table_id);
> >>>>> diff --git a/controller/route-exchange.c b/controller/route-exchange.c
> >>>>> index 850316465..a1976234b 100644
> >>>>> --- a/controller/route-exchange.c
> >>>>> +++ b/controller/route-exchange.c
> >>>>> @@ -95,7 +95,7 @@ route_lookup(struct hmap *route_map,
> >>>>>  static void
> >>>>>  sb_sync_learned_routes(const struct ovs_list *learned_routes,
> >>>>>                         const struct sbrec_datapath_binding *datapath,
> >>>>> -                       const struct sset *bound_ports,
> >>>>> +                       const struct smap *bound_ports,
> >>>>>                         struct ovsdb_idl_txn *ovnsb_idl_txn,
> >>>>>                         struct ovsdb_idl_index 
> >>>>> *sbrec_port_binding_by_name,
> >>>>>                         struct ovsdb_idl_index 
> >>>>> *sbrec_learned_route_by_datapath)
> >>>>> @@ -110,8 +110,9 @@ sb_sync_learned_routes(const struct ovs_list 
> >>>>> *learned_routes,
> >>>>>      SBREC_LEARNED_ROUTE_FOR_EACH_EQUAL (sb_route, filter,
> >>>>>                                          
> >>>>> sbrec_learned_route_by_datapath) {
> >>>>>          /* If the port is not local we don't care about it.
> >>>>> -         * Some other ovn-controller will handle it. */
> >>>>> -        if (!sset_contains(bound_ports,
> >>>>> +         * Some other ovn-controller will handle it.
> >>>>> +         * We may not use smap_get since the value might be validly 
> >>>>> NULL. */
> >>>>> +        if (!smap_get_node(bound_ports,
> >>>>>                             sb_route->logical_port->logical_port)) {
> >>>>>              continue;
> >>>>>          }
> >>>>> @@ -125,11 +126,18 @@ sb_sync_learned_routes(const struct ovs_list 
> >>>>> *learned_routes,
> >>>>>                                                 learned_route->plen);
> >>>>>          char *nexthop = normalize_v46(&learned_route->nexthop);
> >>>>>
> >>>>> -        const char *logical_port_name;
> >>>>> -        SSET_FOR_EACH (logical_port_name, bound_ports) {
> >>>>> +        struct smap_node *port_node;
> >>>>> +        SMAP_FOR_EACH (port_node, bound_ports) {
> >>>>> +            /* The user specified an ifname, but we learned it on a 
> >>>>> different
> >>>>> +             * port. */
> >>>>> +            if (port_node->value && strcmp(port_node->value,
> >>>>> +                                           learned_route->ifname)) {
> >>>>> +                continue;
> >>>>> +            }
> >>>>> +
> >>>>>              const struct sbrec_port_binding *logical_port =
> >>>>>                  lport_lookup_by_name(sbrec_port_binding_by_name,
> >>>>> -                                     logical_port_name);
> >>>>> +                                     port_node->key);
> >>>>>              if (!logical_port) {
> >>>>>                  continue;
> >>>>>              }
> >>>>> diff --git a/controller/route.c b/controller/route.c
> >>>>> index 108435bad..7b8eed289 100644
> >>>>> --- a/controller/route.c
> >>>>> +++ b/controller/route.c
> >>>>> @@ -82,7 +82,7 @@ advertise_datapath_cleanup(struct 
> >>>>> advertise_datapath_entry *ad)
> >>>>>          free(ar);
> >>>>>      }
> >>>>>      hmap_destroy(&ad->routes);
> >>>>> -    sset_destroy(&ad->bound_ports);
> >>>>> +    smap_destroy(&ad->bound_ports);
> >>>>>      free(ad);
> >>>>>  }
> >>>>>
> >>>>> @@ -114,7 +114,7 @@ route_run(struct route_ctx_in *r_ctx_in,
> >>>>>          ad = xzalloc(sizeof(*ad));
> >>>>>          ad->db = ld->datapath;
> >>>>>          hmap_init(&ad->routes);
> >>>>> -        sset_init(&ad->bound_ports);
> >>>>> +        smap_init(&ad->bound_ports);
> >>>>>
> >>>>>          /* This is a LR datapath, find LRPs with route exchange options
> >>>>>           * that are bound locally. */
> >>>>> @@ -132,10 +132,14 @@ route_run(struct route_ctx_in *r_ctx_in,
> >>>>>
> >>>>>              ad->maintain_vrf |= smap_get_bool(
> >>>>>                  &repb->options, "dynamic-routing-maintain-vrf", false);
> >>>>> -            sset_add(&ad->bound_ports, local_peer->logical_port);
> >>>>> +            char *ifname = nullable_xstrdup(
> >>>>> +                                    smap_get(&repb->options,
> >>>>> +                                             
> >>>>> "dynamic-routing-ifname"));
> >>>>> +            smap_add_nocopy(&ad->bound_ports,
> >>>>> +                            xstrdup(local_peer->logical_port), ifname);
> >>>>>          }
> >>>>>
> >>>>> -        if (sset_is_empty(&ad->bound_ports)) {
> >>>>> +        if (smap_is_empty(&ad->bound_ports)) {
> >>>>>              advertise_datapath_cleanup(ad);
> >>>>>              continue;
> >>>>>          }
> >>>>> diff --git a/controller/route.h b/controller/route.h
> >>>>> index 10b9434fc..c53134559 100644
> >>>>> --- a/controller/route.h
> >>>>> +++ b/controller/route.h
> >>>>> @@ -22,6 +22,7 @@
> >>>>>  #include <netinet/in.h>
> >>>>>  #include "openvswitch/hmap.h"
> >>>>>  #include "sset.h"
> >>>>> +#include "smap.h"
> >>>>>
> >>>>>  struct hmap;
> >>>>>  struct ovsdb_idl_index;
> >>>>> @@ -50,8 +51,9 @@ struct advertise_datapath_entry {
> >>>>>      struct hmap routes;
> >>>>>
> >>>>>      /* The name of the port bindings locally bound for this datapath 
> >>>>> and
> >>>>> -     * running route exchange logic. */
> >>>>> -    struct sset bound_ports;
> >>>>> +     * running route exchange logic.
> >>>>> +     * The key is the port name and the value is the ifname if set. */
> >>>>> +    struct smap bound_ports;
> >>>>>  };
> >>>>>
> >>>>>  struct advertise_route_entry {
> >>>>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> >>>>> index afe0ee902..f41a23115 100644
> >>>>> --- a/tests/system-ovn.at
> >>>>> +++ b/tests/system-ovn.at
> >>>>> @@ -15660,6 +15660,17 @@ check_row_count Learned_Route 1
> >>>>>  lp=$(fetch_column port_binding _uuid logical_port=internet-phys)
> >>>>>  check_row_count Learned_Route 1 logical_port=$lp 
> >>>>> ip_prefix=233.252.0.0/24 nexthop=192.168.10.10
> >>>>>
> >>>>> +# By setting a learning interface filter we will now forget about this 
> >>>>> route.
> >>>>> +check ovn-nbctl --wait=hv set Logical_Router_Port internet-phys \
> >>>>> +      options:dynamic-routing-ifname=thisdoesnotexist
> >>>>> +check_row_count Learned_Route 0
> >>>>> +
> >>>>> +# Changing it to "lo" will allow us to learn the route again.
> >>>>> +check ovn-nbctl --wait=hv set Logical_Router_Port internet-phys \
> >>>>> +      options:dynamic-routing-ifname=lo
> >>>>> +check_row_count Learned_Route 1
> >>>>> +
> >>>>> +
> >>>>>  OVS_APP_EXIT_AND_WAIT([ovn-controller])
> >>>>>
> >>>>>  as ovn-sb
> >>>>> @@ -15869,10 +15880,20 @@ check ip route add 233.252.0.0/24 via 
> >>>>> 192.168.10.10 dev lo onlink vrf ovnvrf1337
> >>>>>  # For now we trigger a recompute as route watching is not yet 
> >>>>> implemented.
> >>>>>  check ovn-appctl -t ovn-controller inc-engine/recompute
> >>>>>  check ovn-nbctl --wait=hv sync
> >>>>> -check_row_count Learned_Route 2
> >>>>> +check_row_count Learned_Route 1
> >>>>>  lp=$(fetch_column port_binding _uuid logical_port=internet-phys)
> >>>>>  check_row_count Learned_Route 1 logical_port=$lp 
> >>>>> ip_prefix=233.252.0.0/24 nexthop=192.168.10.10
> >>>>>
> >>>>> +# By setting a learning interface filter we will now forget about this 
> >>>>> route.
> >>>>> +check ovn-nbctl --wait=hv set Logical_Router_Port internet-phys \
> >>>>> +      options:dynamic-routing-ifname=thisdoesnotexist
> >>>>> +check_row_count Learned_Route 0
> >>>>> +
> >>>>> +# Changing it to "lo" will allow us to learn the route again.
> >>>>> +check ovn-nbctl --wait=hv set Logical_Router_Port internet-phys \
> >>>>> +      options:dynamic-routing-ifname=lo
> >>>>> +check_row_count Learned_Route 1
> >>>>> +
> >>>>>  as ovn-sb
> >>>>>  OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> >>>>>
> >>>>> --
> >>>>> 2.47.1
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> dev mailing list
> >>>>> [email protected]
> >>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>>>
> >>>
> > 
> 
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to