On 1/27/25 3:22 PM, Felix Huettner wrote:
> On Wed, Jan 15, 2025 at 11:42:47AM +0100, Dumitru Ceara wrote:
>> Hi Felix, Frode,
> 
> Hi Dumitru,
> 

Hi Felix,

> thanks for the review.
> As always i'll just comment on the major things.
> 
>>
>> On 1/2/25 4:19 PM, Felix Huettner via dev wrote:
>>> Introduce route-exchange-netlink module which implements interface
>>> for maintaining VRFs [0] and routes through Netlink.
>>>
>>> There is a desire to do this without having to (re-)implement
>>> routing protocol state machines in OVN, and to accomplish this we
>>> make use of Netlink.
>>>
>>> Netlink was chosen because:
>>> * Its ubiquitous nature with availability on any Linux system as
>>>   as well other platforms.
>>> * Presence of a very good Netlink library implementation in our
>>>   sibling project and library, Open vSwitch.
>>> * Popular routing protocol software conveniently already have
>>>   support for redistributing routes to/from Netlink.
>>> * Support for interacting with Virtual Routing and Forwarding
>>>   domains [0], allowing full isolation between virtual network
>>>   resources defined within OVN and the hosting system while
>>>   retaining access to all system network interfaces.
>>>
>>> It is important to note that the purpose of this integration is
>>> generic exchange of control plane information, while allowing to
>>> keep the datapath in OVS/OVN, enabling users to leverage its full
>>> range of user-, kernel- and mixed- space datapath implementations.
>>>
>>> 0: https://docs.kernel.org/networking/vrf.html
>>>
>>> This was orignally built by Frode Nordahl <fnord...@ubuntu.com>
>>
>> Should this have Frode as author then?  Also, Frode should be at least
>> co-author.
> 
> I am in contact with Frode and this will be fixed in v4.
> 

Cool, thanks!

>>
>>>
>>> Signed-off-by: Felix Huettner <felix.huettner@stackit.cloud>
>>> ---
>>
>> The code looks mostly ok to me but I left some comments and suggestions
>> below.
>>
>>>  configure.ac                        |   2 +
>>>  controller/automake.mk              |   6 +
>>>  controller/route-exchange-netlink.c | 249 ++++++++++++++++++++++++++++
>>>  controller/route-exchange-netlink.h |  40 +++++
>>>  m4/ovn.m4                           |  25 +++
>>>  tests/automake.mk                   |   5 +
>>>  tests/system-common-macros.at       |  12 ++
>>>  7 files changed, 339 insertions(+)
>>>  create mode 100644 controller/route-exchange-netlink.c
>>>  create mode 100644 controller/route-exchange-netlink.h
>>>
>>> diff --git a/configure.ac b/configure.ac
>>> index 4a63b84d4..cf9b4a6fd 100644
>>> --- a/configure.ac
>>> +++ b/configure.ac
>>> @@ -87,6 +87,8 @@ OVS_CHECK_WIN32
>>>  OVS_CHECK_VISUAL_STUDIO_DDK
>>>  OVN_CHECK_COVERAGE
>>>  OVS_CHECK_NDEBUG
>>> +OVS_CHECK_NETLINK
>>> +OVS_CHECK_LINUX_NETLINK
>>>  OVS_CHECK_OPENSSL
>>>  OVN_CHECK_LOGDIR
>>>  OVN_CHECK_PYTHON3
>>> diff --git a/controller/automake.mk b/controller/automake.mk
>>> index a6a2c517a..39deeb029 100644
>>> --- a/controller/automake.mk
>>> +++ b/controller/automake.mk
>>> @@ -55,6 +55,12 @@ controller_ovn_controller_SOURCES = \
>>>     controller/route.h \
>>>     controller/route.c
>>>  
>>> +if HAVE_NETLINK
>>> +controller_ovn_controller_SOURCES += \
>>> +   controller/route-exchange-netlink.h \
>>> +   controller/route-exchange-netlink.c
>>> +endif
>>> +
>>>  controller_ovn_controller_LDADD = lib/libovn.la 
>>> $(OVS_LIBDIR)/libopenvswitch.la
>>>  man_MANS += controller/ovn-controller.8
>>>  EXTRA_DIST += controller/ovn-controller.8.xml
>>> diff --git a/controller/route-exchange-netlink.c 
>>> b/controller/route-exchange-netlink.c
>>> new file mode 100644
>>> index 000000000..e065c49c1
>>> --- /dev/null
>>> +++ b/controller/route-exchange-netlink.c
>>> @@ -0,0 +1,249 @@
>>> +/*
>>
>> Missing copyright.
>>
>>> + * Licensed under the Apache License, Version 2.0 (the "License");
>>> + * you may not use this file except in compliance with the License.
>>> + * You may obtain a copy of the License at:
>>> + *
>>> + *     http://www.apache.org/licenses/LICENSE-2.0
>>> + *
>>> + * Unless required by applicable law or agreed to in writing, software
>>> + * distributed under the License is distributed on an "AS IS" BASIS,
>>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>> + * See the License for the specific language governing permissions and
>>> + * limitations under the License.
>>> + */
>>> +
>>> +#include <config.h>
>>> +
>>> +#include <errno.h>
>>> +#include <inttypes.h>
>>> +#include <linux/rtnetlink.h>
>>> +#include <net/if.h>
>>> +#include <netinet/in.h>
>>> +
>>> +#include "netlink-socket.h"
>>> +#include "netlink.h"
>>> +#include "openvswitch/hmap.h"
>>> +#include "openvswitch/ofpbuf.h"
>>> +#include "openvswitch/vlog.h"
>>> +#include "packets.h"
>>> +#include "route-table.h"
>>> +#include "route.h"
>>> +
>>> +#include "route-exchange-netlink.h"
>>> +
>>> +VLOG_DEFINE_THIS_MODULE(route_exchange_netlink);
>>> +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
>>> +
>>> +#define TABLE_ID_VALID(table_id) (table_id != RT_TABLE_UNSPEC &&           
>>>    \
>>> +                                  table_id != RT_TABLE_COMPAT &&           
>>>    \
>>> +                                  table_id != RT_TABLE_DEFAULT &&          
>>>    \
>>> +                                  table_id != RT_TABLE_MAIN &&             
>>>    \
>>> +                                  table_id != RT_TABLE_LOCAL &&            
>>>    \
>>> +                                  table_id != RT_TABLE_MAX)
>>> +
>>> +static int
>>> +modify_vrf(uint32_t type, uint32_t flags_arg,
>>> +           const char *ifname, uint32_t table_id)
>>> +{
>>> +    uint32_t flags = NLM_F_REQUEST | NLM_F_ACK;
>>> +    size_t linkinfo_off, infodata_off;
>>> +    struct ifinfomsg *ifinfo;
>>> +    struct ofpbuf request;
>>> +    int err;
>>> +
>>> +    flags |= flags_arg;
>>> +
>>> +    ofpbuf_init(&request, 0);
>>> +    nl_msg_put_nlmsghdr(&request, 0, type, flags);
>>> +    ifinfo = ofpbuf_put_zeros(&request, sizeof *ifinfo);
>>> +    nl_msg_put_string(&request, IFLA_IFNAME, ifname);
>>> +    if (type == RTM_DELLINK) {
>>
>> I find this a bit unfortunate.  modify_vrf() seems generic at first
>> glance but then at a closer look I realized we have this piece of code
>> we skip for deletion.
>>
>> I don't have a very strong preference but I think I'd just duplicate the
>> (small) boilerplace section (init + transact + uninit) directly in the
>> re_nl_create_vrf() and re_nl_delete_vrf() functions.
> 
> Yes, looks nicer now.
> 
>>
>>
>>> +        goto out;
>>> +    }
>>> +
>>> +    ifinfo->ifi_change = ifinfo->ifi_flags = IFF_UP;
>>> +    linkinfo_off = nl_msg_start_nested(&request, IFLA_LINKINFO);
>>> +    nl_msg_put_string(&request, IFLA_INFO_KIND, "vrf");
>>> +    infodata_off = nl_msg_start_nested(&request, IFLA_INFO_DATA);
>>> +    nl_msg_put_u32(&request, IFLA_VRF_TABLE, table_id);
>>> +    nl_msg_end_nested(&request, infodata_off);
>>> +    nl_msg_end_nested(&request, linkinfo_off);
>>> +
>>> +out:
>>> +    err = nl_transact(NETLINK_ROUTE, &request, NULL);
>>> +
>>> +    ofpbuf_uninit(&request);
>>> +
>>> +    return err;
>>> +}
>>> +
>>> +int
>>> +re_nl_create_vrf(const char *ifname, uint32_t table_id)
>>> +{
>>> +    uint32_t flags = NLM_F_CREATE | NLM_F_EXCL;
>>> +    uint32_t type = RTM_NEWLINK;
>>> +
>>> +    if (!TABLE_ID_VALID(table_id)) {
>>> +        VLOG_WARN_RL(&rl,
>>> +                     "attempt to create VRF using invalid table id 
>>> %"PRIu32,
>>> +                     table_id);
>>> +        return EINVAL;
>>> +    }
>>> +
>>> +    return modify_vrf(type, flags, ifname, table_id);
>>
>> We could avoid the "type" and "flags" variables and pass the actual
>> constant values here.
>>
>>> +}
>>> +
>>> +int
>>> +re_nl_delete_vrf(const char *ifname)
>>> +{
>>> +    return modify_vrf(RTM_DELLINK, 0, ifname, 0);
>>> +}
>>> +
>>> +static int
>>> +modify_route(uint32_t type, uint32_t flags_arg, uint32_t table_id,
>>> +             const struct in6_addr *dst, unsigned int plen)
>>> +{
>>> +    uint32_t flags = NLM_F_REQUEST | NLM_F_ACK;
>>> +    bool is_ipv4 = IN6_IS_ADDR_V4MAPPED(dst);
>>> +    struct ofpbuf request;
>>> +    struct rtmsg *rt;
>>> +    int err;
>>> +
>>> +    flags |= flags_arg;
>>> +
>>> +    ofpbuf_init(&request, 0);
>>> +    nl_msg_put_nlmsghdr(&request, 0, type, flags);
>>> +    rt = ofpbuf_put_zeros(&request, sizeof *rt);
>>> +    rt->rtm_family = is_ipv4 ? AF_INET : AF_INET6;
>>> +    rt->rtm_table = RT_TABLE_UNSPEC; /* RTA_TABLE attribute allows id > 
>>> 256 */
>>> +    /* Manage only OVN routes */
>>> +    rt->rtm_protocol = RTPROT_OVN;
>>> +    rt->rtm_type = RTN_BLACKHOLE;
>>> +    if (type == RTM_DELROUTE) {
>>> +        rt->rtm_scope = RT_SCOPE_NOWHERE;
>>> +    } else {
>>> +        rt->rtm_scope = RT_SCOPE_UNIVERSE;
>>> +    }
>>> +    rt->rtm_dst_len = plen;
>>> +
>>> +    nl_msg_put_u32(&request, RTA_TABLE, table_id);
>>> +
>>> +    if (is_ipv4) {
>>> +        nl_msg_put_be32(&request, RTA_DST, in6_addr_get_mapped_ipv4(dst));
>>> +    } else {
>>> +        nl_msg_put_in6_addr(&request, RTA_DST, dst);
>>> +    }
>>> +
>>> +    err = nl_transact(NETLINK_ROUTE, &request, NULL);
>>> +    ofpbuf_uninit(&request);
>>> +
>>> +    return err;
>>> +}
>>> +
>>> +int
>>> +re_nl_add_route(uint32_t table_id, const struct in6_addr *dst,
>>> +                unsigned int plen)
>>> +{
>>> +    uint32_t flags = NLM_F_CREATE | NLM_F_EXCL;
>>> +    uint32_t type = RTM_NEWROUTE;
>>> +
>>> +    if (!TABLE_ID_VALID(table_id)) {
>>> +        VLOG_WARN_RL(&rl,
>>> +                     "attempt to add route using invalid table id %"PRIu32,
>>> +                     table_id);
>>> +        return EINVAL;
>>> +    }
>>> +
>>> +    return modify_route(type, flags, table_id, dst, plen);
>>
>> We could avoid the "type" and "flags" variables and pass the actual
>> constant values here.
>>
>>> +}
>>> +
>>> +int
>>> +re_nl_delete_route(uint32_t table_id, const struct in6_addr *dst,
>>> +                   unsigned int plen)
>>> +{
>>> +    if (!TABLE_ID_VALID(table_id)) {
>>> +        VLOG_WARN_RL(&rl,
>>> +                     "attempt to delete route using invalid table id 
>>> %"PRIu32,
>>> +                     table_id);
>>> +        return EINVAL;
>>> +    }
>>> +
>>> +    return modify_route(RTM_DELROUTE, 0, table_id, dst, plen);
>>> +}
>>> +
>>> +struct route_msg_handle_data {
>>> +    const struct hmap *routes;
>>> +};
>>> +
>>> +static void
>>> +handle_route_msg_delete_routes(const struct route_table_msg *msg, void 
>>> *data)
>>> +{
>>> +    const struct route_data *rd = &msg->rd;
>>> +    struct route_msg_handle_data *handle_data = data;
>>> +    const struct hmap *routes = handle_data->routes;
>>> +    struct advertise_route_entry *ar;
>>> +    int err;
>>> +
>>> +    /* This route is not from us, we should not touch it. */
>>> +    if (rd->rtm_protocol != RTPROT_OVN) {
>>> +        return;
>>> +    }
>>> +
>>> +    uint32_t arhash = advertise_route_hash(&rd->rta_dst, rd->rtm_dst_len);
>>> +    HMAP_FOR_EACH_WITH_HASH (ar, node, arhash, routes) {
>>> +        if (ipv6_addr_equals(&ar->addr, &rd->rta_dst)
>>> +                && ar->plen == rd->rtm_dst_len) {
>>> +            ar->installed = true;
>>> +            return;
>>> +        }
>>> +    }
>>> +    err = re_nl_delete_route(rd->rta_table_id, &rd->rta_dst,
>>> +                             rd->rtm_dst_len);
>>> +    if (err) {
>>> +        char addr_s[INET6_ADDRSTRLEN + 1];
>>> +        VLOG_WARN_RL(&rl, "Delete route table_id=%"PRIu32" dst=%s plen=%d: 
>>> %s",
>>> +                     rd->rta_table_id,
>>> +                     ipv6_string_mapped(
>>> +                         addr_s, &rd->rta_dst) ? addr_s : "(invalid)",
>>> +                     rd->rtm_dst_len,
>>> +                     ovs_strerror(err));
>>> +    }
>>> +}
>>> +
>>> +void
>>> +re_nl_sync_routes(uint32_t table_id,
>>> +                  const struct hmap *routes)
>>
>> Fits on one line.
>>
>>> +{
>>> +    struct advertise_route_entry *ar;
>>> +    HMAP_FOR_EACH (ar, node, routes) {
>>> +        ar->installed = false;
>>
>> Do we really really need ar->installed?  Can't we pass a hmapx as
>> additional argument to the dump callback and populate that one with
>> everything that needs to be installed.  The callback would remove all ar
>> pointers that correspond to already installed routes.
> 
> Yes, reworked for this.
> 
>>
>>> +    }
>>> +
>>> +    /* Remove routes from the system that are not in the host_routes hmap 
>>> and
>>
>> In this whole comment probably: s/host_routes/routes
>>
>>> +     * remove entries from host_routes hmap that match routes already 
>>> installed
>>> +     * in the system. */
>>> +    struct route_msg_handle_data data = {
>>> +        .routes = routes,
>>> +    };
>>> +    route_table_dump_one_table(table_id, handle_route_msg_delete_routes,
>>> +                               &data);
>>> +
>>> +    /* Add any remaining routes in the host_routes hmap to the system 
>>> routing
>>> +     * table. */
>>> +    HMAP_FOR_EACH (ar, node, routes) {\
>>
>> Then here we'd just iterate on the remaining hmapx pointers - those
>> correspond to all not-yet-installed entries.
>>
>>> +        if (ar->installed) {
>>> +            continue;
>>> +        }
>>> +        int err = re_nl_add_route(table_id, &ar->addr, ar->plen);
>>> +        if (err) {
>>> +            char addr_s[INET6_ADDRSTRLEN + 1];
>>> +            VLOG_WARN_RL(&rl, "Add route table_id=%"PRIu32" dst=%s "
>>> +                         "plen=%d: %s",
>>> +                         table_id,
>>> +                         ipv6_string_mapped(
>>> +                             addr_s, &ar->addr) ? addr_s : "(invalid)",
>>> +                         ar->plen,
>>> +                         ovs_strerror(err));
>>> +        }
>>> +    }
>>> +}
>>> diff --git a/controller/route-exchange-netlink.h 
>>> b/controller/route-exchange-netlink.h
>>> new file mode 100644
>>> index 000000000..f87ebd75d
>>> --- /dev/null
>>> +++ b/controller/route-exchange-netlink.h
>>> @@ -0,0 +1,40 @@
>>> +/*
>>
>> Missing copyright.
>>
>>> + * Licensed under the Apache License, Version 2.0 (the "License");
>>> + * you may not use this file except in compliance with the License.
>>> + * You may obtain a copy of the License at:
>>> + *
>>> + *     http://www.apache.org/licenses/LICENSE-2.0
>>> + *
>>> + * Unless required by applicable law or agreed to in writing, software
>>> + * distributed under the License is distributed on an "AS IS" BASIS,
>>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>> + * See the License for the specific language governing permissions and
>>> + * limitations under the License.
>>> + */
>>> +
>>> +#ifndef ROUTE_EXCHANGE_NETLINK_H
>>> +#define ROUTE_EXCHANGE_NETLINK_H 1
>>> +
>>> +#include <stdint.h>
>>> +
>>> +/* This value is arbitrary but currently unused.
>>
>> :)
>>
>>> + * See 
>>> https://github.com/iproute2/iproute2/blob/main/etc/iproute2/rt_protos
>>
>> Should we try to change this in the future?
> 
> I wanted to first get a general agreement here. I would assume that the
> rt_protos list in iproute2 is not complete so i had the hope that if
> there is a conflict someone will notice it here :)
> 
> If we have an agreement for the value i can send a patch there to get
> the OVN value registered in iproute2.
> 

OK.  Let's wait for the OVN series (well, all of the BGP series) to get
in first.

>>
>>> +#define RTPROT_OVN 84
>>> +
>>> +struct in6_addr;
>>> +struct hmap;
>>> +
>>> +int re_nl_create_vrf(const char *ifname, uint32_t table_id);
>>> +int re_nl_delete_vrf(const char *ifname);
>>> +
>>> +int re_nl_add_route(uint32_t table_id, const struct in6_addr *dst,
>>> +                    unsigned int plen);
>>> +int re_nl_delete_route(uint32_t table_id, const struct in6_addr *dst,
>>> +                       unsigned int plen);
>>> +
>>> +void re_nl_dump(uint32_t table_id);
>>> +
>>> +void re_nl_sync_routes(uint32_t table_id,
>>> +                       const struct hmap *host_routes);
>>
>> s/host_routes/routes/ ?
>>
>> Also, this fits on one line.
>>
>>> +
>>> +#endif /* route-exchange-netlink.h */
>>> diff --git a/m4/ovn.m4 b/m4/ovn.m4
>>> index ebe4c9612..e8f30e0ac 100644
>>> --- a/m4/ovn.m4
>>> +++ b/m4/ovn.m4
>>> @@ -576,3 +576,28 @@ AC_DEFUN([OVN_CHECK_UNBOUND],
>>>     fi
>>>     AM_CONDITIONAL([HAVE_UNBOUND], [test "$HAVE_UNBOUND" = yes])
>>>     AC_SUBST([HAVE_UNBOUND])])
>>> +
>>> +dnl Checks for Netlink support.
>>> +AC_DEFUN([OVS_CHECK_NETLINK],
>>> +  [AC_CHECK_HEADER([linux/netlink.h],
>>> +                   [HAVE_NETLINK=yes],
>>> +                   [HAVE_NETLINK=no],
>>> +                   [#include <sys/socket.h>
>>> +   ])
>>> +   AM_CONDITIONAL([HAVE_NETLINK], [test "$HAVE_NETLINK" = yes])
>>> +   if test "$HAVE_NETLINK" = yes; then
>>> +      AC_DEFINE([HAVE_NETLINK], [1],
>>> +                [Define to 1 if Netlink protocol is available.])
>>> +   fi])
>>> +
>>> +dnl OVS_CHECK_LINUX_NETLINK
>>> +dnl
>>> +dnl Configure Linux netlink compat.
>>> +AC_DEFUN([OVS_CHECK_LINUX_NETLINK], [
>>> +  AC_COMPILE_IFELSE([
>>> +    AC_LANG_PROGRAM([#include <linux/netlink.h>], [
>>> +        struct nla_bitfield32 x =  { 0 };
>>> +    ])],
>>> +    [AC_DEFINE([HAVE_NLA_BITFIELD32], [1],
>>
>> Do we use HAVE_NLA_BITFIELD32 anywhere in any of BGP related the series?
>>  I failed to find any reference to it.  I'm guessing this is a leftover
>> from the time when we tried duplicating the netlink support from OVS, right?
> 
> We actually need to keep it.
> The reason is that route-exchange-netlink.c includes
> "<linux/rtnetlink.h>". This is resolved to "ovs/linux/rtnetlink.h" which
> includes "ovs/linux/netlink.h".
> This file actually uses HAVE_NLA_BITFIELD32 to define "struct
> nla_bitfield32" if it does not exist.
> It then continues to include the official kernel "<linux/netlink.h>".
> 
> So while we do not use the "nla_bitfield32" struct at all we otherwise
> have a confliciting definition as we indirectly include this headerfile.
> 

I see, OK, let's keep it as is then.

>>
>>> +    [Define to 1 if struct nla_bitfield32 is available.])])
>>> +])
>>> diff --git a/tests/automake.mk b/tests/automake.mk
>>> index 9244532fa..11696f159 100644
>>> --- a/tests/automake.mk
>>> +++ b/tests/automake.mk
>>> @@ -308,6 +308,11 @@ tests_ovstest_LDADD = $(OVS_LIBDIR)/daemon.lo \
>>>     controller/route.$(OBJEXT) \
>>>     northd/ipam.$(OBJEXT)
>>>  
>>> +if HAVE_NETLINK
>>> +tests_ovstest_LDADD += \
>>> +   controller/route-exchange-netlink.$(OBJEXT)
>>> +endif
>>> +
>>>  # Python tests.
>>>  CHECK_PYFILES = \
>>>     tests/test-l7.py \
>>> diff --git a/tests/system-common-macros.at b/tests/system-common-macros.at
>>> index c59556173..0ed5bc567 100644
>>> --- a/tests/system-common-macros.at
>>> +++ b/tests/system-common-macros.at
>>> @@ -530,3 +530,15 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d
>>>  /failed to query port patch-.*/d
>>>  /.*terminating with signal 15.*/d"])
>>>  ]))
>>> +
>>> +# CHECK_VRF()
>>> +#
>>> +# Perform a requirements check for running VRF tests.
>>> +#
>>> +m4_define([CHECK_VRF],
>>> +[
>>> +    rc=0
>>> +    modprobe vrf || rc=$?
>>> +    AT_SKIP_IF([test $rc -ne 0])
>>> +    on_exit 'modprobe -r vrf'
>>> +])
> 
> Thanks a lot,
> Felix
> 

Thanks,
Dumitru

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to