On 1/28/25 1:13 PM, Felix Huettner wrote:
> On Thu, Jan 16, 2025 at 01:05:42PM +0100, Dumitru Ceara wrote:
>> Hi Felix,
> 
> Hi Dumitru,
> 
> thanks for the review.
> 

Hi Felix,

>>
>> On 1/2/25 4:19 PM, Felix Huettner via dev wrote:
>>> Previously we just assumed that if a LR had multiple LRPs bound to the
>>> local chassis that all routes where valid for all of these LRPs.
>>> This commit handles a previous option in ovn-nb that allowes the user to
>>> specify the interface name that a route needs to use to be acceptable
>>> for a given LRP.
>>> The users can then have a 1:1 relationship between LRPs and interfaces
>>> to allow ovn-controller to determine which route belongs to which LRP.
>>
>> I'm a bit confused by the dynamic-routing-ifname option.  If I
>> understand correctly, this is used to filter what routes we inject in
>> the SB for a given LRP, i.e., only if the LRP.dynamic-routing-ifname
>> matches the interface the route was learnt in the VRF where the routing
>> control plane (FRR) runs.
>>
>> But, in which case will FRR learn routes on more than one interface (the
>> one on which we redirected BGP control traffic)?
>>
>> I can imagine a scenario in which FRR gets its control plane traffic
>> (e.g., BGP packets) in a different way (e.g., before that traffic gets
>> sent to br-int), potentially on different interfaces but I thought
>> that's not the use case we're targeting in the OVN BGP integration.
>>
>> It might be a good idea to add some examples to the NB documentation in
>> the patch where we introduce this option.
> 
> I did add more docs here 
> https://patchwork.ozlabs.org/project/ovn/patch/0236ff31f79abbf95b07a6a247d42fcfc348cad3.1737472971.git.felix.huettner@stackit.cloud/
> So i hope this helps. I'll also update the commit message to make this
> clearer.
> 
> The setup i had in mind was a LR with 5 LRPs.
> LRP 1 goes on to some internal LS with VIFs or so.
> LRP 2-5 are bound to the same LS that has a localnet port.
> LRP 2-3 are bound to one chassis.
> LRP 4-5 are bound to another chassis.
> Each of LRP 2-5 represents a single uplink from the chassis it is bound
> on. So each of the two chassis has two separate uplinks.
> On these separate uplinks you can then run separate BGP sessions using
> routing-protocol-redirect. Having different MACs and IPs ensures that
> this actually works, allthough it looks really hacky.
> 

Oh, so in the LR ovnvrf you'll have 2 different interfaces that are
bound to two different LSPs that correspond to LRP2,3 (or LRP 4,5)
routing-protocol-redirect values.

OK, I guess that would work.

Do we have a system test for this kind of deployment?  (even if we only
cover one chassis).

> 
> When sending now traffic from something behind LRP 1 to the outside
> there is a 1:4 chance for each of LRP 2-5 behing used.
> 
> The benefit of learning e.g. a default route individually on each of the
> LRPs is if e.g. one of the uplinks now fails.
> If e.g. LRP 2 fails then the route gets removed and we then have a 1:3
> chance for each of LRP 3-5 for egress traffic. That means that one
> chassis will receive half of the traffic than the other one for
> forwarding.
> This exactly matches the available uplinks.
> 
> If we would bond the two uplinks on each chassis, or if we would just
> learn the routes on both LRPs then traffic would still hit both chassis
> in the same ratio. As one chassis actually has less bandwidth available
> we would utilize it more than we actually should.
> 
> I hope that clarifies what i had in mind.
> 

I think it does, thanks!

> 
>>
>>>
>>> Signed-off-by: Felix Huettner <felix.huettner@stackit.cloud>
>>> ---
>>>  controller/route-exchange-netlink.c |  1 +
>>>  controller/route-exchange-netlink.h |  2 ++
>>>  controller/route-exchange.c         | 21 +++++++++++++++------
>>>  controller/route.c                  | 12 +++++++++---
>>>  controller/route.h                  |  6 ++++--
>>>  tests/system-ovn.at                 | 25 +++++++++++++++++++++++--
>>>  6 files changed, 54 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/controller/route-exchange-netlink.c 
>>> b/controller/route-exchange-netlink.c
>>> index c2a7551f3..9fea0d955 100644
>>> --- a/controller/route-exchange-netlink.c
>>> +++ b/controller/route-exchange-netlink.c
>>> @@ -222,6 +222,7 @@ handle_route_msg_delete_routes(const struct 
>>> route_table_msg *msg, void *data)
>>>              rr->addr = rd->rta_dst;
>>>              rr->plen = rd->rtm_dst_len;
>>>              rr->nexthop = nexthop->addr;
>>> +            memcpy(rr->ifname, nexthop->ifname, IFNAMSIZ);
>>
>> Is it guaranteed that nexthop->ifname is null-terminated?  We use
>> strcmp() on it in sb_sync_learned_routes() so it needs to be
>> null-terminated.
> 
> Ok, so there is no documentation that backs this up, so i went hunting
> through code to find out:
> * This ifname comes from route-table.c which calls if_indextoname
> * if_indextoname uses the ioctl SIOCGIFNAME
> * This ends up in the kernel function netdev_get_name
> * That function uses strcpy to copy the string
> 
> As strcpy relies on the string being null terminated it should be here
> as well if the size matches. The size in the kernel (struct net_device)
> is defined with IFNAMSIZ and as we use that here as well we should be
> safe.
> 

We could be extra careful, allocate IFNAMSIZ + 1 bytes and make sure we
explicitly set (at least) the last one to 0.

>>
>>>          }
>>>          return;
>>>      }
>>> diff --git a/controller/route-exchange-netlink.h 
>>> b/controller/route-exchange-netlink.h
>>> index 566b38fde..fca2429e6 100644
>>> --- a/controller/route-exchange-netlink.h
>>> +++ b/controller/route-exchange-netlink.h
>>> @@ -18,6 +18,7 @@
>>>  #include <stdint.h>
>>>  #include "openvswitch/hmap.h"
>>>  #include <netinet/in.h>
>>> +#include <net/if.h>
>>>  
>>>  /* This value is arbitrary but currently unused.
>>>   * See 
>>> https://github.com/iproute2/iproute2/blob/main/etc/iproute2/rt_protos
>>> @@ -31,6 +32,7 @@ struct re_nl_received_route_node {
>>>      struct in6_addr addr;
>>>      unsigned int plen;
>>>      struct in6_addr nexthop;
>>> +    char ifname[IFNAMSIZ];
>>>  };
>>>  
>>>  int re_nl_create_vrf(const char *ifname, uint32_t table_id);
>>> diff --git a/controller/route-exchange.c b/controller/route-exchange.c
>>> index febe13a8e..77d5ce51d 100644
>>> --- a/controller/route-exchange.c
>>> +++ b/controller/route-exchange.c
>>> @@ -101,7 +101,7 @@ route_erase_entry(struct route_entry *route_e)
>>>  static void
>>>  sb_sync_learned_routes(const struct sbrec_datapath_binding *datapath,
>>>                         const struct hmap *learned_routes,
>>> -                       const struct sset *bound_ports,
>>> +                       const struct smap *bound_ports,
>>>                         struct ovsdb_idl *ovnsb_idl,
>>>                         struct ovsdb_idl_txn *ovnsb_idl_txn,
>>>                         struct ovsdb_idl_index 
>>> *sbrec_learned_route_by_datapath,
>>> @@ -121,8 +121,9 @@ sb_sync_learned_routes(const struct 
>>> sbrec_datapath_binding *datapath,
>>>      SBREC_LEARNED_ROUTE_FOR_EACH_EQUAL (sb_route, filter,
>>>                                          sbrec_learned_route_by_datapath) {
>>>          /* If the port is not local we don't care about it.
>>> -         * Some other ovn-controller will handle it. */
>>> -        if (!sset_contains(bound_ports,
>>> +         * Some other ovn-controller will handle it.
>>> +         * We may not use smap_get since the value might be validly NULL. 
>>> */
>>> +        if (!smap_get_node(bound_ports,
>>>                             sb_route->logical_port->logical_port)) {
>>>              continue;
>>>          }
>>> @@ -142,14 +143,22 @@ sb_sync_learned_routes(const struct 
>>> sbrec_datapath_binding *datapath,
>>>                                                 learned_route->plen);
>>>          char *nexthop = normalize_v46(&learned_route->nexthop);
>>>  
>>> -        const char *logical_port_name;
>>> -        SSET_FOR_EACH (logical_port_name, bound_ports) {
>>> +        struct smap_node *port_node;
>>> +        SMAP_FOR_EACH (port_node, bound_ports) {
>>> +            /* The user specified an ifname, but we learned it on a 
>>> different
>>> +             * port. */
>>> +            if (port_node->value && strcmp(port_node->value,
>>> +                                           learned_route->ifname)) {
>>> +                continue;
>>> +            }
>>> +
>>>              const struct sbrec_port_binding *logical_port =
>>>                  lport_lookup_by_name(sbrec_port_binding_by_name,
>>> -                                     logical_port_name);
>>> +                                     port_node->key);
>>>              if (!logical_port) {
>>>                  continue;
>>>              }
>>> +
>>
>> Nit: unrelated, it should probably be part of an earlier patch.
>>
>>>              route_e = route_lookup_or_add(&sync_routes,
>>>                  datapath,
>>>                  logical_port, ip_prefix, nexthop);
>>> diff --git a/controller/route.c b/controller/route.c
>>> index b3ff77b83..036a574d8 100644
>>> --- a/controller/route.c
>>> +++ b/controller/route.c
>>> @@ -83,7 +83,7 @@ advertise_datapath_cleanup(struct 
>>> advertise_datapath_entry *ad)
>>>          free(ar);
>>>      }
>>>      hmap_destroy(&ad->routes);
>>> -    sset_destroy(&ad->bound_ports);
>>> +    smap_destroy(&ad->bound_ports);
>>>      free(ad);
>>>  }
>>>  
>>> @@ -91,6 +91,8 @@ void
>>>  route_run(struct route_ctx_in *r_ctx_in,
>>>            struct route_ctx_out *r_ctx_out)
>>>  {
>>> +    tracked_datapaths_destroy(r_ctx_out->tracked_re_datapaths);
>>> +
>>
>> This doesn't look correct.  It will destroy the
>> r_ctx_out->tracked_re_datapaths map completely.  But then below we just
>> add stuff to it again (and the hmap is not properly initialized anymore).
>>
>> Shouldn't we just clear the tracked_route_datapaths hmap in
>> en_route_run() before we call route_run()?  Do we need a new
>> tracked_datapaths_clear() function?
>>
>> Also, I guess, that should not be part of this patch but should be
>> instead in the patch that adds the routes I-P node.
> 
> Thanks a lot. Will fix and move it.
> 
>>
>>>      const struct local_datapath *ld;
>>>      HMAP_FOR_EACH (ld, hmap_node, r_ctx_in->local_datapaths) {
>>>          if (!ld->n_peer_ports || ld->is_switch) {
>>> @@ -102,7 +104,7 @@ route_run(struct route_ctx_in *r_ctx_in,
>>>          ad->key = ld->datapath->tunnel_key;
>>>          ad->db = ld->datapath;
>>>          hmap_init(&ad->routes);
>>> -        sset_init(&ad->bound_ports);
>>> +        smap_init(&ad->bound_ports);
>>>  
>>>          /* This is a LR datapath, find LRPs with route exchange options
>>>           * that are bound locally. */
>>> @@ -122,8 +124,12 @@ route_run(struct route_ctx_in *r_ctx_in,
>>>                                            "maintain-vrf", false);
>>>              ad->use_netns |= smap_get_bool(&repb->options,
>>>                                         "use-netns", false);
>>> +            char *ifname = nullable_xstrdup(
>>> +                                    smap_get(&repb->options,
>>> +                                             "dynamic-routing-ifname"));
>>>              relevant_datapath = true;
>>> -            sset_add(&ad->bound_ports, local_peer->logical_port);
>>> +            smap_add_nocopy(&ad->bound_ports,
>>> +                            xstrdup(local_peer->logical_port), ifname);
>>>          }
>>>  
>>>          if (!relevant_datapath) {
>>> diff --git a/controller/route.h b/controller/route.h
>>> index 2a54cf3e3..4b71fa88a 100644
>>> --- a/controller/route.h
>>> +++ b/controller/route.h
>>> @@ -19,6 +19,7 @@
>>>  #include <netinet/in.h>
>>>  #include "openvswitch/hmap.h"
>>>  #include "sset.h"
>>> +#include "smap.h"
>>>  
>>>  struct hmap;
>>>  struct ovsdb_idl_index;
>>> @@ -51,8 +52,9 @@ struct advertise_datapath_entry {
>>>      bool use_netns;
>>>      struct hmap routes;
>>>      /* the name of the port bindings locally bound for this datapath and
>>> -     * running route exchange logic. */
>>> -    struct sset bound_ports;
>>> +     * running route exchange logic.
>>> +     * The key is the port name and the value is the ifname if set. */
>>> +    struct smap bound_ports;
>>>  };
>>>  
>>>  struct advertise_route_entry {
>>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
>>> index 02d5ce7e1..7ad666f20 100644
>>> --- a/tests/system-ovn.at
>>> +++ b/tests/system-ovn.at
>>> @@ -14920,6 +14920,17 @@ AT_CHECK_UNQUOTED([ovn-sbctl --columns 
>>> ip_prefix,nexthop,logical_port --bare fin
>>>  $lp
>>>  ])
>>>  
>>> +# by setting a learning interface filter we will now forget about this 
>>> route
>>
>> Nit: comments should be sentences (for all comments in this file).
>>
>>> +check ovn-nbctl --wait=hv set Logical_Router_Port internet-phys \
>>> +      options:dynamic-routing-ifname=thisdoesnotexist
>>> +check_row_count Learned_Route 0
>>> +
>>> +# chaning it to "lo" will allow us to learn the route again
>>
>> Typo: chaning
>>
>>> +check ovn-nbctl --wait=hv set Logical_Router_Port internet-phys \
>>> +      options:dynamic-routing-ifname=lo
>>> +check_row_count Learned_Route 1
>>> +
>>> +
>>>  OVS_APP_EXIT_AND_WAIT([ovn-controller])
>>>  
>>>  as ovn-sb
>>> @@ -15130,14 +15141,24 @@ check ip route add 233.252.0.0/24 via 
>>> 192.168.10.10 dev lo onlink vrf ovnvrf1337
>>>  # for now we trigger a recompute as route watching is not yet implemented
>>>  check ovn-appctl -t ovn-controller inc-engine/recompute
>>>  check ovn-nbctl --wait=hv sync
>>> -check_row_count Learned_Route 2
>>> +check_row_count Learned_Route 1
>>>  lp=$(ovn-sbctl --bare --columns _uuid list port_binding internet-phys)
>>> -AT_CHECK_UNQUOTED([ovn-sbctl --columns ip_prefix,nexthop,logical_port 
>>> --bare find Learned_Route logical_port=$lp], [0], [dnl
>>> +AT_CHECK_UNQUOTED([ovn-sbctl --columns ip_prefix,nexthop,logical_port 
>>> --bare find Learned_Route], [0], [dnl
>>>  233.252.0.0/24
>>>  192.168.10.10
>>>  $lp
>>>  ])
>>>  
>>> +# by setting a learning interface filter we will now forget about this 
>>> route
>>> +check ovn-nbctl --wait=hv set Logical_Router_Port internet-phys \
>>> +      options:dynamic-routing-ifname=thisdoesnotexist
>>> +check_row_count Learned_Route 0
>>> +
>>> +# chaning it to "lo" will allow us to learn the route again
>>
>> Typo: chaning
>>
>>> +check ovn-nbctl --wait=hv set Logical_Router_Port internet-phys \
>>> +      options:dynamic-routing-ifname=lo
>>> +check_row_count Learned_Route 1
>>> +
>>>  as ovn-sb
>>>  OVS_APP_EXIT_AND_WAIT([ovsdb-server])
>>>  
> 
> 
> Thanks a lot,
> Felix

Regards,
Dumitru

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to