On Tue, Jan 28, 2025 at 01:42:01PM +0100, Dumitru Ceara wrote:
> On 1/28/25 1:13 PM, Felix Huettner wrote:
> > On Thu, Jan 16, 2025 at 01:05:42PM +0100, Dumitru Ceara wrote:
> >> Hi Felix,
> > 
> > Hi Dumitru,
> > 
> > thanks for the review.
> > 
> 
> Hi Felix,

Hi Dumitru,

> 
> >>
> >> On 1/2/25 4:19 PM, Felix Huettner via dev wrote:
> >>> Previously we just assumed that if a LR had multiple LRPs bound to the
> >>> local chassis that all routes where valid for all of these LRPs.
> >>> This commit handles a previous option in ovn-nb that allowes the user to
> >>> specify the interface name that a route needs to use to be acceptable
> >>> for a given LRP.
> >>> The users can then have a 1:1 relationship between LRPs and interfaces
> >>> to allow ovn-controller to determine which route belongs to which LRP.
> >>
> >> I'm a bit confused by the dynamic-routing-ifname option.  If I
> >> understand correctly, this is used to filter what routes we inject in
> >> the SB for a given LRP, i.e., only if the LRP.dynamic-routing-ifname
> >> matches the interface the route was learnt in the VRF where the routing
> >> control plane (FRR) runs.
> >>
> >> But, in which case will FRR learn routes on more than one interface (the
> >> one on which we redirected BGP control traffic)?
> >>
> >> I can imagine a scenario in which FRR gets its control plane traffic
> >> (e.g., BGP packets) in a different way (e.g., before that traffic gets
> >> sent to br-int), potentially on different interfaces but I thought
> >> that's not the use case we're targeting in the OVN BGP integration.
> >>
> >> It might be a good idea to add some examples to the NB documentation in
> >> the patch where we introduce this option.
> > 
> > I did add more docs here 
> > https://patchwork.ozlabs.org/project/ovn/patch/0236ff31f79abbf95b07a6a247d42fcfc348cad3.1737472971.git.felix.huettner@stackit.cloud/
> > So i hope this helps. I'll also update the commit message to make this
> > clearer.
> > 
> > The setup i had in mind was a LR with 5 LRPs.
> > LRP 1 goes on to some internal LS with VIFs or so.
> > LRP 2-5 are bound to the same LS that has a localnet port.
> > LRP 2-3 are bound to one chassis.
> > LRP 4-5 are bound to another chassis.
> > Each of LRP 2-5 represents a single uplink from the chassis it is bound
> > on. So each of the two chassis has two separate uplinks.
> > On these separate uplinks you can then run separate BGP sessions using
> > routing-protocol-redirect. Having different MACs and IPs ensures that
> > this actually works, allthough it looks really hacky.
> > 
> 
> Oh, so in the LR ovnvrf you'll have 2 different interfaces that are
> bound to two different LSPs that correspond to LRP2,3 (or LRP 4,5)
> routing-protocol-redirect values.
> 
> OK, I guess that would work.
> 
> Do we have a system test for this kind of deployment?  (even if we only
> cover one chassis).

No not yet. I would add it to the multinode tests that i still need to
create. I think there it fits better than here in the ovn-system.

> 
> > 
> > When sending now traffic from something behind LRP 1 to the outside
> > there is a 1:4 chance for each of LRP 2-5 behing used.
> > 
> > The benefit of learning e.g. a default route individually on each of the
> > LRPs is if e.g. one of the uplinks now fails.
> > If e.g. LRP 2 fails then the route gets removed and we then have a 1:3
> > chance for each of LRP 3-5 for egress traffic. That means that one
> > chassis will receive half of the traffic than the other one for
> > forwarding.
> > This exactly matches the available uplinks.
> > 
> > If we would bond the two uplinks on each chassis, or if we would just
> > learn the routes on both LRPs then traffic would still hit both chassis
> > in the same ratio. As one chassis actually has less bandwidth available
> > we would utilize it more than we actually should.
> > 
> > I hope that clarifies what i had in mind.
> > 
> 
> I think it does, thanks!
> 
> > 
> >>
> >>>
> >>> Signed-off-by: Felix Huettner <felix.huettner@stackit.cloud>
> >>> ---
> >>>  controller/route-exchange-netlink.c |  1 +
> >>>  controller/route-exchange-netlink.h |  2 ++
> >>>  controller/route-exchange.c         | 21 +++++++++++++++------
> >>>  controller/route.c                  | 12 +++++++++---
> >>>  controller/route.h                  |  6 ++++--
> >>>  tests/system-ovn.at                 | 25 +++++++++++++++++++++++--
> >>>  6 files changed, 54 insertions(+), 13 deletions(-)
> >>>
> >>> diff --git a/controller/route-exchange-netlink.c 
> >>> b/controller/route-exchange-netlink.c
> >>> index c2a7551f3..9fea0d955 100644
> >>> --- a/controller/route-exchange-netlink.c
> >>> +++ b/controller/route-exchange-netlink.c
> >>> @@ -222,6 +222,7 @@ handle_route_msg_delete_routes(const struct 
> >>> route_table_msg *msg, void *data)
> >>>              rr->addr = rd->rta_dst;
> >>>              rr->plen = rd->rtm_dst_len;
> >>>              rr->nexthop = nexthop->addr;
> >>> +            memcpy(rr->ifname, nexthop->ifname, IFNAMSIZ);
> >>
> >> Is it guaranteed that nexthop->ifname is null-terminated?  We use
> >> strcmp() on it in sb_sync_learned_routes() so it needs to be
> >> null-terminated.
> > 
> > Ok, so there is no documentation that backs this up, so i went hunting
> > through code to find out:
> > * This ifname comes from route-table.c which calls if_indextoname
> > * if_indextoname uses the ioctl SIOCGIFNAME
> > * This ends up in the kernel function netdev_get_name
> > * That function uses strcpy to copy the string
> > 
> > As strcpy relies on the string being null terminated it should be here
> > as well if the size matches. The size in the kernel (struct net_device)
> > is defined with IFNAMSIZ and as we use that here as well we should be
> > safe.
> > 
> 
> We could be extra careful, allocate IFNAMSIZ + 1 bytes and make sure we
> explicitly set (at least) the last one to 0.

I'll add +1 to the ifname size. The xzalloc and limitet memcpy should
ensure we always have the last byte as zero.

Thanks a lot,
Felix

> 
> >>
> >>>          }
> >>>          return;
> >>>      }
> >>> diff --git a/controller/route-exchange-netlink.h 
> >>> b/controller/route-exchange-netlink.h
> >>> index 566b38fde..fca2429e6 100644
> >>> --- a/controller/route-exchange-netlink.h
> >>> +++ b/controller/route-exchange-netlink.h
> >>> @@ -18,6 +18,7 @@
> >>>  #include <stdint.h>
> >>>  #include "openvswitch/hmap.h"
> >>>  #include <netinet/in.h>
> >>> +#include <net/if.h>
> >>>  
> >>>  /* This value is arbitrary but currently unused.
> >>>   * See 
> >>> https://github.com/iproute2/iproute2/blob/main/etc/iproute2/rt_protos
> >>> @@ -31,6 +32,7 @@ struct re_nl_received_route_node {
> >>>      struct in6_addr addr;
> >>>      unsigned int plen;
> >>>      struct in6_addr nexthop;
> >>> +    char ifname[IFNAMSIZ];
> >>>  };
> >>>  
> >>>  int re_nl_create_vrf(const char *ifname, uint32_t table_id);
> >>> diff --git a/controller/route-exchange.c b/controller/route-exchange.c
> >>> index febe13a8e..77d5ce51d 100644
> >>> --- a/controller/route-exchange.c
> >>> +++ b/controller/route-exchange.c
> >>> @@ -101,7 +101,7 @@ route_erase_entry(struct route_entry *route_e)
> >>>  static void
> >>>  sb_sync_learned_routes(const struct sbrec_datapath_binding *datapath,
> >>>                         const struct hmap *learned_routes,
> >>> -                       const struct sset *bound_ports,
> >>> +                       const struct smap *bound_ports,
> >>>                         struct ovsdb_idl *ovnsb_idl,
> >>>                         struct ovsdb_idl_txn *ovnsb_idl_txn,
> >>>                         struct ovsdb_idl_index 
> >>> *sbrec_learned_route_by_datapath,
> >>> @@ -121,8 +121,9 @@ sb_sync_learned_routes(const struct 
> >>> sbrec_datapath_binding *datapath,
> >>>      SBREC_LEARNED_ROUTE_FOR_EACH_EQUAL (sb_route, filter,
> >>>                                          sbrec_learned_route_by_datapath) 
> >>> {
> >>>          /* If the port is not local we don't care about it.
> >>> -         * Some other ovn-controller will handle it. */
> >>> -        if (!sset_contains(bound_ports,
> >>> +         * Some other ovn-controller will handle it.
> >>> +         * We may not use smap_get since the value might be validly 
> >>> NULL. */
> >>> +        if (!smap_get_node(bound_ports,
> >>>                             sb_route->logical_port->logical_port)) {
> >>>              continue;
> >>>          }
> >>> @@ -142,14 +143,22 @@ sb_sync_learned_routes(const struct 
> >>> sbrec_datapath_binding *datapath,
> >>>                                                 learned_route->plen);
> >>>          char *nexthop = normalize_v46(&learned_route->nexthop);
> >>>  
> >>> -        const char *logical_port_name;
> >>> -        SSET_FOR_EACH (logical_port_name, bound_ports) {
> >>> +        struct smap_node *port_node;
> >>> +        SMAP_FOR_EACH (port_node, bound_ports) {
> >>> +            /* The user specified an ifname, but we learned it on a 
> >>> different
> >>> +             * port. */
> >>> +            if (port_node->value && strcmp(port_node->value,
> >>> +                                           learned_route->ifname)) {
> >>> +                continue;
> >>> +            }
> >>> +
> >>>              const struct sbrec_port_binding *logical_port =
> >>>                  lport_lookup_by_name(sbrec_port_binding_by_name,
> >>> -                                     logical_port_name);
> >>> +                                     port_node->key);
> >>>              if (!logical_port) {
> >>>                  continue;
> >>>              }
> >>> +
> >>
> >> Nit: unrelated, it should probably be part of an earlier patch.
> >>
> >>>              route_e = route_lookup_or_add(&sync_routes,
> >>>                  datapath,
> >>>                  logical_port, ip_prefix, nexthop);
> >>> diff --git a/controller/route.c b/controller/route.c
> >>> index b3ff77b83..036a574d8 100644
> >>> --- a/controller/route.c
> >>> +++ b/controller/route.c
> >>> @@ -83,7 +83,7 @@ advertise_datapath_cleanup(struct 
> >>> advertise_datapath_entry *ad)
> >>>          free(ar);
> >>>      }
> >>>      hmap_destroy(&ad->routes);
> >>> -    sset_destroy(&ad->bound_ports);
> >>> +    smap_destroy(&ad->bound_ports);
> >>>      free(ad);
> >>>  }
> >>>  
> >>> @@ -91,6 +91,8 @@ void
> >>>  route_run(struct route_ctx_in *r_ctx_in,
> >>>            struct route_ctx_out *r_ctx_out)
> >>>  {
> >>> +    tracked_datapaths_destroy(r_ctx_out->tracked_re_datapaths);
> >>> +
> >>
> >> This doesn't look correct.  It will destroy the
> >> r_ctx_out->tracked_re_datapaths map completely.  But then below we just
> >> add stuff to it again (and the hmap is not properly initialized anymore).
> >>
> >> Shouldn't we just clear the tracked_route_datapaths hmap in
> >> en_route_run() before we call route_run()?  Do we need a new
> >> tracked_datapaths_clear() function?
> >>
> >> Also, I guess, that should not be part of this patch but should be
> >> instead in the patch that adds the routes I-P node.
> > 
> > Thanks a lot. Will fix and move it.
> > 
> >>
> >>>      const struct local_datapath *ld;
> >>>      HMAP_FOR_EACH (ld, hmap_node, r_ctx_in->local_datapaths) {
> >>>          if (!ld->n_peer_ports || ld->is_switch) {
> >>> @@ -102,7 +104,7 @@ route_run(struct route_ctx_in *r_ctx_in,
> >>>          ad->key = ld->datapath->tunnel_key;
> >>>          ad->db = ld->datapath;
> >>>          hmap_init(&ad->routes);
> >>> -        sset_init(&ad->bound_ports);
> >>> +        smap_init(&ad->bound_ports);
> >>>  
> >>>          /* This is a LR datapath, find LRPs with route exchange options
> >>>           * that are bound locally. */
> >>> @@ -122,8 +124,12 @@ route_run(struct route_ctx_in *r_ctx_in,
> >>>                                            "maintain-vrf", false);
> >>>              ad->use_netns |= smap_get_bool(&repb->options,
> >>>                                         "use-netns", false);
> >>> +            char *ifname = nullable_xstrdup(
> >>> +                                    smap_get(&repb->options,
> >>> +                                             "dynamic-routing-ifname"));
> >>>              relevant_datapath = true;
> >>> -            sset_add(&ad->bound_ports, local_peer->logical_port);
> >>> +            smap_add_nocopy(&ad->bound_ports,
> >>> +                            xstrdup(local_peer->logical_port), ifname);
> >>>          }
> >>>  
> >>>          if (!relevant_datapath) {
> >>> diff --git a/controller/route.h b/controller/route.h
> >>> index 2a54cf3e3..4b71fa88a 100644
> >>> --- a/controller/route.h
> >>> +++ b/controller/route.h
> >>> @@ -19,6 +19,7 @@
> >>>  #include <netinet/in.h>
> >>>  #include "openvswitch/hmap.h"
> >>>  #include "sset.h"
> >>> +#include "smap.h"
> >>>  
> >>>  struct hmap;
> >>>  struct ovsdb_idl_index;
> >>> @@ -51,8 +52,9 @@ struct advertise_datapath_entry {
> >>>      bool use_netns;
> >>>      struct hmap routes;
> >>>      /* the name of the port bindings locally bound for this datapath and
> >>> -     * running route exchange logic. */
> >>> -    struct sset bound_ports;
> >>> +     * running route exchange logic.
> >>> +     * The key is the port name and the value is the ifname if set. */
> >>> +    struct smap bound_ports;
> >>>  };
> >>>  
> >>>  struct advertise_route_entry {
> >>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> >>> index 02d5ce7e1..7ad666f20 100644
> >>> --- a/tests/system-ovn.at
> >>> +++ b/tests/system-ovn.at
> >>> @@ -14920,6 +14920,17 @@ AT_CHECK_UNQUOTED([ovn-sbctl --columns 
> >>> ip_prefix,nexthop,logical_port --bare fin
> >>>  $lp
> >>>  ])
> >>>  
> >>> +# by setting a learning interface filter we will now forget about this 
> >>> route
> >>
> >> Nit: comments should be sentences (for all comments in this file).
> >>
> >>> +check ovn-nbctl --wait=hv set Logical_Router_Port internet-phys \
> >>> +      options:dynamic-routing-ifname=thisdoesnotexist
> >>> +check_row_count Learned_Route 0
> >>> +
> >>> +# chaning it to "lo" will allow us to learn the route again
> >>
> >> Typo: chaning
> >>
> >>> +check ovn-nbctl --wait=hv set Logical_Router_Port internet-phys \
> >>> +      options:dynamic-routing-ifname=lo
> >>> +check_row_count Learned_Route 1
> >>> +
> >>> +
> >>>  OVS_APP_EXIT_AND_WAIT([ovn-controller])
> >>>  
> >>>  as ovn-sb
> >>> @@ -15130,14 +15141,24 @@ check ip route add 233.252.0.0/24 via 
> >>> 192.168.10.10 dev lo onlink vrf ovnvrf1337
> >>>  # for now we trigger a recompute as route watching is not yet implemented
> >>>  check ovn-appctl -t ovn-controller inc-engine/recompute
> >>>  check ovn-nbctl --wait=hv sync
> >>> -check_row_count Learned_Route 2
> >>> +check_row_count Learned_Route 1
> >>>  lp=$(ovn-sbctl --bare --columns _uuid list port_binding internet-phys)
> >>> -AT_CHECK_UNQUOTED([ovn-sbctl --columns ip_prefix,nexthop,logical_port 
> >>> --bare find Learned_Route logical_port=$lp], [0], [dnl
> >>> +AT_CHECK_UNQUOTED([ovn-sbctl --columns ip_prefix,nexthop,logical_port 
> >>> --bare find Learned_Route], [0], [dnl
> >>>  233.252.0.0/24
> >>>  192.168.10.10
> >>>  $lp
> >>>  ])
> >>>  
> >>> +# by setting a learning interface filter we will now forget about this 
> >>> route
> >>> +check ovn-nbctl --wait=hv set Logical_Router_Port internet-phys \
> >>> +      options:dynamic-routing-ifname=thisdoesnotexist
> >>> +check_row_count Learned_Route 0
> >>> +
> >>> +# chaning it to "lo" will allow us to learn the route again
> >>
> >> Typo: chaning
> >>
> >>> +check ovn-nbctl --wait=hv set Logical_Router_Port internet-phys \
> >>> +      options:dynamic-routing-ifname=lo
> >>> +check_row_count Learned_Route 1
> >>> +
> >>>  as ovn-sb
> >>>  OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> >>>  
> > 
> > 
> > Thanks a lot,
> > Felix
> 
> Regards,
> Dumitru
> 
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to