On Wed, Jan 15, 2025 at 11:42:47AM +0100, Dumitru Ceara wrote: > Hi Felix, Frode,
Hi Dumitru, thanks for the review. As always i'll just comment on the major things. > > On 1/2/25 4:19 PM, Felix Huettner via dev wrote: > > Introduce route-exchange-netlink module which implements interface > > for maintaining VRFs [0] and routes through Netlink. > > > > There is a desire to do this without having to (re-)implement > > routing protocol state machines in OVN, and to accomplish this we > > make use of Netlink. > > > > Netlink was chosen because: > > * Its ubiquitous nature with availability on any Linux system as > > as well other platforms. > > * Presence of a very good Netlink library implementation in our > > sibling project and library, Open vSwitch. > > * Popular routing protocol software conveniently already have > > support for redistributing routes to/from Netlink. > > * Support for interacting with Virtual Routing and Forwarding > > domains [0], allowing full isolation between virtual network > > resources defined within OVN and the hosting system while > > retaining access to all system network interfaces. > > > > It is important to note that the purpose of this integration is > > generic exchange of control plane information, while allowing to > > keep the datapath in OVS/OVN, enabling users to leverage its full > > range of user-, kernel- and mixed- space datapath implementations. > > > > 0: https://docs.kernel.org/networking/vrf.html > > > > This was orignally built by Frode Nordahl <fnord...@ubuntu.com> > > Should this have Frode as author then? Also, Frode should be at least > co-author. I am in contact with Frode and this will be fixed in v4. > > > > > Signed-off-by: Felix Huettner <felix.huettner@stackit.cloud> > > --- > > The code looks mostly ok to me but I left some comments and suggestions > below. > > > configure.ac | 2 + > > controller/automake.mk | 6 + > > controller/route-exchange-netlink.c | 249 ++++++++++++++++++++++++++++ > > controller/route-exchange-netlink.h | 40 +++++ > > m4/ovn.m4 | 25 +++ > > tests/automake.mk | 5 + > > tests/system-common-macros.at | 12 ++ > > 7 files changed, 339 insertions(+) > > create mode 100644 controller/route-exchange-netlink.c > > create mode 100644 controller/route-exchange-netlink.h > > > > diff --git a/configure.ac b/configure.ac > > index 4a63b84d4..cf9b4a6fd 100644 > > --- a/configure.ac > > +++ b/configure.ac > > @@ -87,6 +87,8 @@ OVS_CHECK_WIN32 > > OVS_CHECK_VISUAL_STUDIO_DDK > > OVN_CHECK_COVERAGE > > OVS_CHECK_NDEBUG > > +OVS_CHECK_NETLINK > > +OVS_CHECK_LINUX_NETLINK > > OVS_CHECK_OPENSSL > > OVN_CHECK_LOGDIR > > OVN_CHECK_PYTHON3 > > diff --git a/controller/automake.mk b/controller/automake.mk > > index a6a2c517a..39deeb029 100644 > > --- a/controller/automake.mk > > +++ b/controller/automake.mk > > @@ -55,6 +55,12 @@ controller_ovn_controller_SOURCES = \ > > controller/route.h \ > > controller/route.c > > > > +if HAVE_NETLINK > > +controller_ovn_controller_SOURCES += \ > > + controller/route-exchange-netlink.h \ > > + controller/route-exchange-netlink.c > > +endif > > + > > controller_ovn_controller_LDADD = lib/libovn.la > > $(OVS_LIBDIR)/libopenvswitch.la > > man_MANS += controller/ovn-controller.8 > > EXTRA_DIST += controller/ovn-controller.8.xml > > diff --git a/controller/route-exchange-netlink.c > > b/controller/route-exchange-netlink.c > > new file mode 100644 > > index 000000000..e065c49c1 > > --- /dev/null > > +++ b/controller/route-exchange-netlink.c > > @@ -0,0 +1,249 @@ > > +/* > > Missing copyright. > > > + * Licensed under the Apache License, Version 2.0 (the "License"); > > + * you may not use this file except in compliance with the License. > > + * You may obtain a copy of the License at: > > + * > > + * http://www.apache.org/licenses/LICENSE-2.0 > > + * > > + * Unless required by applicable law or agreed to in writing, software > > + * distributed under the License is distributed on an "AS IS" BASIS, > > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > > + * See the License for the specific language governing permissions and > > + * limitations under the License. > > + */ > > + > > +#include <config.h> > > + > > +#include <errno.h> > > +#include <inttypes.h> > > +#include <linux/rtnetlink.h> > > +#include <net/if.h> > > +#include <netinet/in.h> > > + > > +#include "netlink-socket.h" > > +#include "netlink.h" > > +#include "openvswitch/hmap.h" > > +#include "openvswitch/ofpbuf.h" > > +#include "openvswitch/vlog.h" > > +#include "packets.h" > > +#include "route-table.h" > > +#include "route.h" > > + > > +#include "route-exchange-netlink.h" > > + > > +VLOG_DEFINE_THIS_MODULE(route_exchange_netlink); > > +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20); > > + > > +#define TABLE_ID_VALID(table_id) (table_id != RT_TABLE_UNSPEC && > > \ > > + table_id != RT_TABLE_COMPAT && > > \ > > + table_id != RT_TABLE_DEFAULT && > > \ > > + table_id != RT_TABLE_MAIN && > > \ > > + table_id != RT_TABLE_LOCAL && > > \ > > + table_id != RT_TABLE_MAX) > > + > > +static int > > +modify_vrf(uint32_t type, uint32_t flags_arg, > > + const char *ifname, uint32_t table_id) > > +{ > > + uint32_t flags = NLM_F_REQUEST | NLM_F_ACK; > > + size_t linkinfo_off, infodata_off; > > + struct ifinfomsg *ifinfo; > > + struct ofpbuf request; > > + int err; > > + > > + flags |= flags_arg; > > + > > + ofpbuf_init(&request, 0); > > + nl_msg_put_nlmsghdr(&request, 0, type, flags); > > + ifinfo = ofpbuf_put_zeros(&request, sizeof *ifinfo); > > + nl_msg_put_string(&request, IFLA_IFNAME, ifname); > > + if (type == RTM_DELLINK) { > > I find this a bit unfortunate. modify_vrf() seems generic at first > glance but then at a closer look I realized we have this piece of code > we skip for deletion. > > I don't have a very strong preference but I think I'd just duplicate the > (small) boilerplace section (init + transact + uninit) directly in the > re_nl_create_vrf() and re_nl_delete_vrf() functions. Yes, looks nicer now. > > > > + goto out; > > + } > > + > > + ifinfo->ifi_change = ifinfo->ifi_flags = IFF_UP; > > + linkinfo_off = nl_msg_start_nested(&request, IFLA_LINKINFO); > > + nl_msg_put_string(&request, IFLA_INFO_KIND, "vrf"); > > + infodata_off = nl_msg_start_nested(&request, IFLA_INFO_DATA); > > + nl_msg_put_u32(&request, IFLA_VRF_TABLE, table_id); > > + nl_msg_end_nested(&request, infodata_off); > > + nl_msg_end_nested(&request, linkinfo_off); > > + > > +out: > > + err = nl_transact(NETLINK_ROUTE, &request, NULL); > > + > > + ofpbuf_uninit(&request); > > + > > + return err; > > +} > > + > > +int > > +re_nl_create_vrf(const char *ifname, uint32_t table_id) > > +{ > > + uint32_t flags = NLM_F_CREATE | NLM_F_EXCL; > > + uint32_t type = RTM_NEWLINK; > > + > > + if (!TABLE_ID_VALID(table_id)) { > > + VLOG_WARN_RL(&rl, > > + "attempt to create VRF using invalid table id > > %"PRIu32, > > + table_id); > > + return EINVAL; > > + } > > + > > + return modify_vrf(type, flags, ifname, table_id); > > We could avoid the "type" and "flags" variables and pass the actual > constant values here. > > > +} > > + > > +int > > +re_nl_delete_vrf(const char *ifname) > > +{ > > + return modify_vrf(RTM_DELLINK, 0, ifname, 0); > > +} > > + > > +static int > > +modify_route(uint32_t type, uint32_t flags_arg, uint32_t table_id, > > + const struct in6_addr *dst, unsigned int plen) > > +{ > > + uint32_t flags = NLM_F_REQUEST | NLM_F_ACK; > > + bool is_ipv4 = IN6_IS_ADDR_V4MAPPED(dst); > > + struct ofpbuf request; > > + struct rtmsg *rt; > > + int err; > > + > > + flags |= flags_arg; > > + > > + ofpbuf_init(&request, 0); > > + nl_msg_put_nlmsghdr(&request, 0, type, flags); > > + rt = ofpbuf_put_zeros(&request, sizeof *rt); > > + rt->rtm_family = is_ipv4 ? AF_INET : AF_INET6; > > + rt->rtm_table = RT_TABLE_UNSPEC; /* RTA_TABLE attribute allows id > > > 256 */ > > + /* Manage only OVN routes */ > > + rt->rtm_protocol = RTPROT_OVN; > > + rt->rtm_type = RTN_BLACKHOLE; > > + if (type == RTM_DELROUTE) { > > + rt->rtm_scope = RT_SCOPE_NOWHERE; > > + } else { > > + rt->rtm_scope = RT_SCOPE_UNIVERSE; > > + } > > + rt->rtm_dst_len = plen; > > + > > + nl_msg_put_u32(&request, RTA_TABLE, table_id); > > + > > + if (is_ipv4) { > > + nl_msg_put_be32(&request, RTA_DST, in6_addr_get_mapped_ipv4(dst)); > > + } else { > > + nl_msg_put_in6_addr(&request, RTA_DST, dst); > > + } > > + > > + err = nl_transact(NETLINK_ROUTE, &request, NULL); > > + ofpbuf_uninit(&request); > > + > > + return err; > > +} > > + > > +int > > +re_nl_add_route(uint32_t table_id, const struct in6_addr *dst, > > + unsigned int plen) > > +{ > > + uint32_t flags = NLM_F_CREATE | NLM_F_EXCL; > > + uint32_t type = RTM_NEWROUTE; > > + > > + if (!TABLE_ID_VALID(table_id)) { > > + VLOG_WARN_RL(&rl, > > + "attempt to add route using invalid table id %"PRIu32, > > + table_id); > > + return EINVAL; > > + } > > + > > + return modify_route(type, flags, table_id, dst, plen); > > We could avoid the "type" and "flags" variables and pass the actual > constant values here. > > > +} > > + > > +int > > +re_nl_delete_route(uint32_t table_id, const struct in6_addr *dst, > > + unsigned int plen) > > +{ > > + if (!TABLE_ID_VALID(table_id)) { > > + VLOG_WARN_RL(&rl, > > + "attempt to delete route using invalid table id > > %"PRIu32, > > + table_id); > > + return EINVAL; > > + } > > + > > + return modify_route(RTM_DELROUTE, 0, table_id, dst, plen); > > +} > > + > > +struct route_msg_handle_data { > > + const struct hmap *routes; > > +}; > > + > > +static void > > +handle_route_msg_delete_routes(const struct route_table_msg *msg, void > > *data) > > +{ > > + const struct route_data *rd = &msg->rd; > > + struct route_msg_handle_data *handle_data = data; > > + const struct hmap *routes = handle_data->routes; > > + struct advertise_route_entry *ar; > > + int err; > > + > > + /* This route is not from us, we should not touch it. */ > > + if (rd->rtm_protocol != RTPROT_OVN) { > > + return; > > + } > > + > > + uint32_t arhash = advertise_route_hash(&rd->rta_dst, rd->rtm_dst_len); > > + HMAP_FOR_EACH_WITH_HASH (ar, node, arhash, routes) { > > + if (ipv6_addr_equals(&ar->addr, &rd->rta_dst) > > + && ar->plen == rd->rtm_dst_len) { > > + ar->installed = true; > > + return; > > + } > > + } > > + err = re_nl_delete_route(rd->rta_table_id, &rd->rta_dst, > > + rd->rtm_dst_len); > > + if (err) { > > + char addr_s[INET6_ADDRSTRLEN + 1]; > > + VLOG_WARN_RL(&rl, "Delete route table_id=%"PRIu32" dst=%s plen=%d: > > %s", > > + rd->rta_table_id, > > + ipv6_string_mapped( > > + addr_s, &rd->rta_dst) ? addr_s : "(invalid)", > > + rd->rtm_dst_len, > > + ovs_strerror(err)); > > + } > > +} > > + > > +void > > +re_nl_sync_routes(uint32_t table_id, > > + const struct hmap *routes) > > Fits on one line. > > > +{ > > + struct advertise_route_entry *ar; > > + HMAP_FOR_EACH (ar, node, routes) { > > + ar->installed = false; > > Do we really really need ar->installed? Can't we pass a hmapx as > additional argument to the dump callback and populate that one with > everything that needs to be installed. The callback would remove all ar > pointers that correspond to already installed routes. Yes, reworked for this. > > > + } > > + > > + /* Remove routes from the system that are not in the host_routes hmap > > and > > In this whole comment probably: s/host_routes/routes > > > + * remove entries from host_routes hmap that match routes already > > installed > > + * in the system. */ > > + struct route_msg_handle_data data = { > > + .routes = routes, > > + }; > > + route_table_dump_one_table(table_id, handle_route_msg_delete_routes, > > + &data); > > + > > + /* Add any remaining routes in the host_routes hmap to the system > > routing > > + * table. */ > > + HMAP_FOR_EACH (ar, node, routes) {\ > > Then here we'd just iterate on the remaining hmapx pointers - those > correspond to all not-yet-installed entries. > > > + if (ar->installed) { > > + continue; > > + } > > + int err = re_nl_add_route(table_id, &ar->addr, ar->plen); > > + if (err) { > > + char addr_s[INET6_ADDRSTRLEN + 1]; > > + VLOG_WARN_RL(&rl, "Add route table_id=%"PRIu32" dst=%s " > > + "plen=%d: %s", > > + table_id, > > + ipv6_string_mapped( > > + addr_s, &ar->addr) ? addr_s : "(invalid)", > > + ar->plen, > > + ovs_strerror(err)); > > + } > > + } > > +} > > diff --git a/controller/route-exchange-netlink.h > > b/controller/route-exchange-netlink.h > > new file mode 100644 > > index 000000000..f87ebd75d > > --- /dev/null > > +++ b/controller/route-exchange-netlink.h > > @@ -0,0 +1,40 @@ > > +/* > > Missing copyright. > > > + * Licensed under the Apache License, Version 2.0 (the "License"); > > + * you may not use this file except in compliance with the License. > > + * You may obtain a copy of the License at: > > + * > > + * http://www.apache.org/licenses/LICENSE-2.0 > > + * > > + * Unless required by applicable law or agreed to in writing, software > > + * distributed under the License is distributed on an "AS IS" BASIS, > > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > > + * See the License for the specific language governing permissions and > > + * limitations under the License. > > + */ > > + > > +#ifndef ROUTE_EXCHANGE_NETLINK_H > > +#define ROUTE_EXCHANGE_NETLINK_H 1 > > + > > +#include <stdint.h> > > + > > +/* This value is arbitrary but currently unused. > > :) > > > + * See > > https://github.com/iproute2/iproute2/blob/main/etc/iproute2/rt_protos > > Should we try to change this in the future? I wanted to first get a general agreement here. I would assume that the rt_protos list in iproute2 is not complete so i had the hope that if there is a conflict someone will notice it here :) If we have an agreement for the value i can send a patch there to get the OVN value registered in iproute2. > > > +#define RTPROT_OVN 84 > > + > > +struct in6_addr; > > +struct hmap; > > + > > +int re_nl_create_vrf(const char *ifname, uint32_t table_id); > > +int re_nl_delete_vrf(const char *ifname); > > + > > +int re_nl_add_route(uint32_t table_id, const struct in6_addr *dst, > > + unsigned int plen); > > +int re_nl_delete_route(uint32_t table_id, const struct in6_addr *dst, > > + unsigned int plen); > > + > > +void re_nl_dump(uint32_t table_id); > > + > > +void re_nl_sync_routes(uint32_t table_id, > > + const struct hmap *host_routes); > > s/host_routes/routes/ ? > > Also, this fits on one line. > > > + > > +#endif /* route-exchange-netlink.h */ > > diff --git a/m4/ovn.m4 b/m4/ovn.m4 > > index ebe4c9612..e8f30e0ac 100644 > > --- a/m4/ovn.m4 > > +++ b/m4/ovn.m4 > > @@ -576,3 +576,28 @@ AC_DEFUN([OVN_CHECK_UNBOUND], > > fi > > AM_CONDITIONAL([HAVE_UNBOUND], [test "$HAVE_UNBOUND" = yes]) > > AC_SUBST([HAVE_UNBOUND])]) > > + > > +dnl Checks for Netlink support. > > +AC_DEFUN([OVS_CHECK_NETLINK], > > + [AC_CHECK_HEADER([linux/netlink.h], > > + [HAVE_NETLINK=yes], > > + [HAVE_NETLINK=no], > > + [#include <sys/socket.h> > > + ]) > > + AM_CONDITIONAL([HAVE_NETLINK], [test "$HAVE_NETLINK" = yes]) > > + if test "$HAVE_NETLINK" = yes; then > > + AC_DEFINE([HAVE_NETLINK], [1], > > + [Define to 1 if Netlink protocol is available.]) > > + fi]) > > + > > +dnl OVS_CHECK_LINUX_NETLINK > > +dnl > > +dnl Configure Linux netlink compat. > > +AC_DEFUN([OVS_CHECK_LINUX_NETLINK], [ > > + AC_COMPILE_IFELSE([ > > + AC_LANG_PROGRAM([#include <linux/netlink.h>], [ > > + struct nla_bitfield32 x = { 0 }; > > + ])], > > + [AC_DEFINE([HAVE_NLA_BITFIELD32], [1], > > Do we use HAVE_NLA_BITFIELD32 anywhere in any of BGP related the series? > I failed to find any reference to it. I'm guessing this is a leftover > from the time when we tried duplicating the netlink support from OVS, right? We actually need to keep it. The reason is that route-exchange-netlink.c includes "<linux/rtnetlink.h>". This is resolved to "ovs/linux/rtnetlink.h" which includes "ovs/linux/netlink.h". This file actually uses HAVE_NLA_BITFIELD32 to define "struct nla_bitfield32" if it does not exist. It then continues to include the official kernel "<linux/netlink.h>". So while we do not use the "nla_bitfield32" struct at all we otherwise have a confliciting definition as we indirectly include this headerfile. > > > + [Define to 1 if struct nla_bitfield32 is available.])]) > > +]) > > diff --git a/tests/automake.mk b/tests/automake.mk > > index 9244532fa..11696f159 100644 > > --- a/tests/automake.mk > > +++ b/tests/automake.mk > > @@ -308,6 +308,11 @@ tests_ovstest_LDADD = $(OVS_LIBDIR)/daemon.lo \ > > controller/route.$(OBJEXT) \ > > northd/ipam.$(OBJEXT) > > > > +if HAVE_NETLINK > > +tests_ovstest_LDADD += \ > > + controller/route-exchange-netlink.$(OBJEXT) > > +endif > > + > > # Python tests. > > CHECK_PYFILES = \ > > tests/test-l7.py \ > > diff --git a/tests/system-common-macros.at b/tests/system-common-macros.at > > index c59556173..0ed5bc567 100644 > > --- a/tests/system-common-macros.at > > +++ b/tests/system-common-macros.at > > @@ -530,3 +530,15 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d > > /failed to query port patch-.*/d > > /.*terminating with signal 15.*/d"]) > > ])) > > + > > +# CHECK_VRF() > > +# > > +# Perform a requirements check for running VRF tests. > > +# > > +m4_define([CHECK_VRF], > > +[ > > + rc=0 > > + modprobe vrf || rc=$? > > + AT_SKIP_IF([test $rc -ne 0]) > > + on_exit 'modprobe -r vrf' > > +]) Thanks a lot, Felix > > Thanks, > Dumitru > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev