This patch introduces experimental AF_XDP support for OVS netdev. AF_XDP is a new address family that works together with eBPF/XDP. A socket of the AF_XDP family can receive and send raw packets via an eBPF/XDP program attached to the netdev. For a detailed introduction and configuration instructions, see Documentation/intro/install/afxdp.rst.
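As a rough illustration of the socket family described above (a sketch that is not part of this patch), the following C snippet probes whether the running kernel lets the caller create an AF_XDP socket. The helper name afxdp_probe is hypothetical, and the fallback value 44 for AF_XDP mirrors the fallback define in lib/netdev-afxdp.c. Creating the socket typically requires CAP_NET_RAW, so an unprivileged run is expected to report a permission error rather than success.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef AF_XDP
#define AF_XDP 44   /* Same fallback value the patch defines. */
#endif

/* Hypothetical helper: returns 0 if an AF_XDP socket could be created,
 * otherwise the positive errno value reported by socket(2), for example
 * EAFNOSUPPORT on kernels without CONFIG_XDP_SOCKETS or EPERM when the
 * caller lacks the needed capability. */
int
afxdp_probe(void)
{
    int fd = socket(AF_XDP, SOCK_RAW, 0);

    if (fd < 0) {
        return errno;
    }

    /* The socket is only a probe; a real user would go on to register
     * a umem and bind to a device queue. */
    close(fd);
    return 0;
}
```

A caller can pass a non-zero result to strerror() to print why AF_XDP is unavailable before attempting the full OVS setup.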
Signed-off-by: William Tu <u9012...@gmail.com> Co-authored-by: Yi-Hung Wei <yihung....@gmail.com> Cc: Tim Rozet <tro...@redhat.com> Cc: Eelco Chaudron <echau...@redhat.com> --- Documentation/automake.mk | 1 + Documentation/index.rst | 1 + Documentation/intro/install/afxdp.rst | 299 +++++++++++ Documentation/intro/install/index.rst | 1 + acinclude.m4 | 23 + configure.ac | 1 + lib/automake.mk | 7 +- lib/dp-packet.c | 13 + lib/dp-packet.h | 32 +- lib/dpif-netdev-perf.h | 13 + lib/netdev-afxdp.c | 592 ++++++++++++++++++++ lib/netdev-afxdp.h | 46 ++ lib/netdev-linux.c | 89 +++- lib/netdev-linux.h | 1 + lib/netdev-provider.h | 1 + lib/netdev.c | 1 + lib/xdpsock.c | 211 ++++++++ lib/xdpsock.h | 133 +++++ tests/automake.mk | 17 + tests/system-afxdp-macros.at | 153 ++++++ tests/system-afxdp-testsuite.at | 26 + tests/system-afxdp-traffic.at | 978 ++++++++++++++++++++++++++++++++++ 22 files changed, 2633 insertions(+), 6 deletions(-) create mode 100644 Documentation/intro/install/afxdp.rst create mode 100644 lib/netdev-afxdp.c create mode 100644 lib/netdev-afxdp.h create mode 100644 lib/xdpsock.c create mode 100644 lib/xdpsock.h create mode 100644 tests/system-afxdp-macros.at create mode 100644 tests/system-afxdp-testsuite.at create mode 100644 tests/system-afxdp-traffic.at diff --git a/Documentation/automake.mk b/Documentation/automake.mk index 082438e09a33..11cc59efc881 100644 --- a/Documentation/automake.mk +++ b/Documentation/automake.mk @@ -10,6 +10,7 @@ DOC_SOURCE = \ Documentation/intro/why-ovs.rst \ Documentation/intro/install/index.rst \ Documentation/intro/install/bash-completion.rst \ + Documentation/intro/install/afxdp.rst \ Documentation/intro/install/debian.rst \ Documentation/intro/install/documentation.rst \ Documentation/intro/install/distributions.rst \ diff --git a/Documentation/index.rst b/Documentation/index.rst index 46261235c732..aa9e7c49f179 100644 --- a/Documentation/index.rst +++ b/Documentation/index.rst @@ -59,6 +59,7 @@ vSwitch? Start here. 
:doc:`intro/install/windows` | :doc:`intro/install/xenserver` | :doc:`intro/install/dpdk` | + :doc:`intro/install/afxdp` | :doc:`Installation FAQs <faq/releases>` - **Tutorials:** :doc:`tutorials/faucet` | diff --git a/Documentation/intro/install/afxdp.rst b/Documentation/intro/install/afxdp.rst new file mode 100644 index 000000000000..db55d1bc4dd7 --- /dev/null +++ b/Documentation/intro/install/afxdp.rst @@ -0,0 +1,299 @@ +.. + Licensed under the Apache License, Version 2.0 (the "License"); you may + not use this file except in compliance with the License. You may obtain + a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + License for the specific language governing permissions and limitations + under the License. + + Convention for heading levels in Open vSwitch documentation: + + ======= Heading 0 (reserved for the title in a document) + ------- Heading 1 + ~~~~~~~ Heading 2 + +++++++ Heading 3 + ''''''' Heading 4 + + Avoid deeper levels because they do not render well. + + +======================== +Open vSwitch with AF_XDP +======================== + +This document describes how to build and install Open vSwitch using +AF_XDP netdev. + +.. warning:: + The AF_XDP support of Open vSwitch is considered 'experimental' and + is not compiled in by default. + +Introduction +------------ +AF_XDP, Address Family eXpress Data Path, is a new address family +working together with eBPF/XDP. An AF_XDP socket can receive and +send packets via an eBPF/XDP program attached to the netdev and +has much better performance than AF_PACKET. +For more details about AF_XDP, please see the Linux kernel's +Documentation/networking/af_xdp.rst + +AF_XDP Netdev +------------- +OVS has several netdev types, e.g., system, tap, or +internal. 
The AF_XDP feature adds a new netdev type called +"afxdp", and implements its configuration, packet reception, +and transmit functions. Since the AF_XDP socket, xsk, +operates in userspace, once ovs-vswitchd receives packets +from xsk, the proposed architecture re-uses the existing +userspace dpif-netdev datapath. As a result, most of +the packet processing happens in userspace instead of +in the Linux kernel. + +:: + + | +-------------------+ + | | ovs-vswitchd |<-->ovsdb-server + | +-------------------+ + | | ofproto |<-->OpenFlow controllers + | +--------+-+--------+ + | | netdev | |ofproto-| + userspace | +--------+ | dpif | + | | afxdp | +--------+ + | | netdev | | dpif | + | +---||---+ +--------+ + | || | dpif- | + | || | netdev | + |_ || +--------+ + || + _ +---||-----+--------+ + | | AF_XDP prog + | + kernel | | xsk_map | + |_ +--------||---------+ + || + physical + NIC + + +Build requirements +------------------ + +In addition to the requirements described in :doc:`general`, building Open +vSwitch with AF_XDP will require the following: + +- libbpf from kernel source tree (kernel 5.0.0 or later) + +- Linux kernel XDP support, with the following options + ``_CONFIG_BPF=y`` + + ``_CONFIG_BPF_SYSCALL=y`` + + ``_CONFIG_XDP_SOCKETS=y`` + + +- The following optional Kconfig options are also recommended, but not required: + + ``_CONFIG_BPF_JIT=y`` + + ``_CONFIG_HAVE_BPF_JIT=y`` + + ``_CONFIG_XDP_SOCKETS_DIAG=y`` + +- If possible, run **./xdpsock -r -N -z -i <your device>** under linux/samples/bpf. + This is an OVS-independent benchmark tool for AF_XDP. It makes sure your basic + kernel requirements are met for AF_XDP. + + +Installing +---------- +For OVS to use AF_XDP netdev, it has to be configured with LIBBPF support. 
+First, clone a recent version of the Linux bpf-next tree:: + + git clone git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git + +Second, go into the Linux source directory and build libbpf in the tools directory:: + + cd bpf-next/ + cd tools/lib/bpf/ + make && make install + make install_headers + +.. note:: + Make sure xsk.h is installed in the system's include path + +Make sure the libbpf.so is installed correctly:: + + ldconfig -p | grep libbpf + + +Third, ensure the standard OVS requirements are installed and bootstrap/configure the package:: + + ./boot.sh && ./configure --enable-afxdp + +Finally, build and install OVS:: + + make && make install + +To kick start end-to-end autotesting:: + + make check-afxdp + +If a test case fails, check the log at:: + + cat tests/system-afxdp-testsuite.dir/<failed number>/system-afxdp-testsuite.log + + +Setup AF_XDP netdev +------------------- +Before running OVS with AF_XDP, make sure libbpf and libelf are set up correctly:: + + ldd vswitchd/ovs-vswitchd + +Open vSwitch should be started using the userspace datapath as described in :doc:`general`:: + + ovs-vswitchd --disable-system + ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev + +.. note:: + OVS AF_XDP netdev uses the userspace datapath, the same datapath + as used by OVS-DPDK. So it requires --disable-system for ovs-vswitchd + and datapath_type=netdev when adding a new bridge. + +Make sure your device supports AF_XDP. To use 1 PMD (on core 4) +on a device with 1 queue (queue 0), configure these options: **pmd-cpu-mask, +pmd-rxq-affinity, and n_rxq**. The **xdpmode** can be "drv" or "skb":: + + ethtool -L enp2s0 combined 1 + ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10 + ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \ + options:n_rxq=1 options:xdpmode=drv other_config:pmd-rxq-affinity="0:4" + +Or, use 4 pmds/cores and 4 queues by doing:: + + ethtool -L enp2s0 combined 4 + ovs-vsctl set Open_vSwitch . 
other_config:pmd-cpu-mask=0x1e + ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \ + options:n_rxq=4 options:xdpmode=drv other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4" + +To validate that the bridge has been instantiated successfully, use:: + + ovs-vsctl show + +The output should show something like:: + + Port "ens802f0" + Interface "ens802f0" + type: afxdp + options: {n_rxq="1", xdpmode=drv} + +Otherwise, enable debug logging with:: + + ovs-appctl vlog/set netdev_afxdp::dbg + + +References +---------- +Most of the design details are described in the paper presented at +Linux Plumbers Conference 2018, "Bringing the Power of eBPF to Open vSwitch"[1], +section 4, and slides[2][4]. +"The Path to DPDK Speeds for AF XDP"[3] gives a very good introduction +to current and future AF_XDP work. + + +[1] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-afxdp.pdf + +[2] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-lpc18-presentation.pdf + +[3] http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdf + +[4] https://ovsfall2018.sched.com/event/IO7p/fast-userspace-ovs-with-afxdp + + +Performance Tuning +------------------ +The name of the game is to keep your CPU running in userspace, allowing PMD to +keep polling the AF_XDP queues without any interference from the kernel. + +#. Make sure everything is in the same NUMA node (memory used by AF_XDP, pmd + running cores, device plug-in slot) + +#. Isolate your CPUs with the isolcpus kernel parameter in the GRUB configuration. + +#. IRQs should not be assigned to cores running PMD threads. + +#. The Spectre and Meltdown fixes increase the overhead of system calls. 
+ +Debugging performance issues +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +While running traffic, use the Linux perf tool to see where your CPU spends its cycles:: + + cd bpf-next/tools/perf + make + ./perf record -p `pidof ovs-vswitchd` sleep 10 + ./perf report + +Or, use the OVS pmd tool:: + + ovs-appctl dpif-netdev/pmd-stats-show + + +Example Script +-------------- + +Below is a script using namespaces and veth peers:: + + #!/bin/bash + ovs-vswitchd --no-chdir --pidfile -vvconn -vofproto_dpif -vunixctl --disable-system --detach + ovs-vsctl -- add-br br0 -- set Bridge br0 \ + protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14,OpenFlow15 \ + fail-mode=secure datapath_type=netdev + + ip netns add at_ns0 + ovs-appctl vlog/set netdev_afxdp::dbg + + ip link add p0 type veth peer name afxdp-p0 + ip link set p0 netns at_ns0 + ip link set dev afxdp-p0 up + ovs-vsctl add-port br0 afxdp-p0 -- \ + set interface afxdp-p0 external-ids:iface-id="p0" type="afxdp" + + ip netns exec at_ns0 sh << NS_EXEC_HEREDOC + ip addr add "10.1.1.1/24" dev p0 + ip link set dev p0 up + NS_EXEC_HEREDOC + + ip netns add at_ns1 + ip link add p1 type veth peer name afxdp-p1 + ip link set p1 netns at_ns1 + ip link set dev afxdp-p1 up + + ovs-vsctl add-port br0 afxdp-p1 -- \ + set interface afxdp-p1 external-ids:iface-id="p1" type="afxdp" + ip netns exec at_ns1 sh << NS_EXEC_HEREDOC + ip addr add "10.1.1.2/24" dev p1 + ip link set dev p1 up + NS_EXEC_HEREDOC + + ip netns exec at_ns0 ping -i .2 10.1.1.2 + + +Limitations/Known Issues +------------------------ +#. The device's NUMA ID is always reported as 0; a way to find the NUMA ID from a netdev is needed. +#. No QoS support because AF_XDP netdev bypasses the Linux TC layer. A possible + work-around is to use OpenFlow meter action. +#. Adding an AF_XDP device to a bridge, removing it, and adding it again fails. +#. Most of the tests are done using i40e single port. Multiple ports and + the ixgbe driver still need to be tested. 
+#. No latency test result (TODO items) + + +Bug Reporting +------------- + +Please report problems to d...@openvswitch.org. diff --git a/Documentation/intro/install/index.rst b/Documentation/intro/install/index.rst index 3193c736cf17..c27a9c9d16ff 100644 --- a/Documentation/intro/install/index.rst +++ b/Documentation/intro/install/index.rst @@ -45,6 +45,7 @@ Installation from Source xenserver userspace dpdk + afxdp Installation from Packages -------------------------- diff --git a/acinclude.m4 b/acinclude.m4 index 1607d5f4b1d9..9ff981e28c32 100644 --- a/acinclude.m4 +++ b/acinclude.m4 @@ -221,6 +221,29 @@ AC_DEFUN([OVS_FIND_DEPENDENCY], [ ]) ]) +dnl OVS_CHECK_LINUX_AF_XDP +dnl +dnl Check both Linux kernel AF_XDP and libbpf support +AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [ + AC_MSG_CHECKING([whether AF_XDP is supported]) + AC_ARG_ENABLE([afxdp], + [AC_HELP_STRING([--enable-afxdp], [Enable AF-XDP support])], + [], [enable_afxdp=no]) + AC_CHECK_HEADER([bpf/libbpf.h], + [HAVE_LIBBPF=yes], + [HAVE_LIBBPF=no]) + AC_CHECK_HEADER([linux/if_xdp.h], + [HAVE_IF_XDP=yes], + [HAVE_IF_XDP=no]) + AM_CONDITIONAL([SUPPORT_AF_XDP], + [test "$enable_afxdp" = yes && test "$HAVE_LIBBPF" = yes && test "$HAVE_IF_XDP" = yes]) + AM_COND_IF([SUPPORT_AF_XDP], [ + AC_DEFINE([HAVE_AF_XDP], [1], [Define to 1 if AF-XDP support is available and enabled.]) + LIBBPF_LDADD=" -lbpf -lelf" + AC_SUBST([LIBBPF_LDADD]) + ]) +]) + dnl OVS_CHECK_DPDK dnl dnl Configure DPDK source tree diff --git a/configure.ac b/configure.ac index 505e3d041e93..29c90b73f836 100644 --- a/configure.ac +++ b/configure.ac @@ -99,6 +99,7 @@ OVS_CHECK_SPHINX OVS_CHECK_DOT OVS_CHECK_IF_DL OVS_CHECK_STRTOK_R +OVS_CHECK_LINUX_AF_XDP AC_CHECK_DECLS([sys_siglist], [], [], [[#include <signal.h>]]) AC_CHECK_MEMBERS([struct stat.st_mtim.tv_nsec, struct stat.st_mtimensec], [], [], [[#include <sys/stat.h>]]) diff --git a/lib/automake.mk b/lib/automake.mk index cc5dccf39d6b..8b9df5635bbe 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ 
-9,6 +9,7 @@ lib_LTLIBRARIES += lib/libopenvswitch.la lib_libopenvswitch_la_LIBADD = $(SSL_LIBS) lib_libopenvswitch_la_LIBADD += $(CAPNG_LDADD) +lib_libopenvswitch_la_LIBADD += $(LIBBPF_LDADD) if WIN32 lib_libopenvswitch_la_LIBADD += ${PTHREAD_LIBS} @@ -327,7 +328,11 @@ lib_libopenvswitch_la_SOURCES = \ lib/lldp/lldpd.c \ lib/lldp/lldpd.h \ lib/lldp/lldpd-structs.c \ - lib/lldp/lldpd-structs.h + lib/lldp/lldpd-structs.h \ + lib/xdpsock.c \ + lib/xdpsock.h \ + lib/netdev-afxdp.c \ + lib/netdev-afxdp.h if WIN32 lib_libopenvswitch_la_SOURCES += \ diff --git a/lib/dp-packet.c b/lib/dp-packet.c index 0976a35e758b..5db1aa4b67c0 100644 --- a/lib/dp-packet.c +++ b/lib/dp-packet.c @@ -122,6 +122,16 @@ dp_packet_uninit(struct dp_packet *b) * created as a dp_packet */ free_dpdk_buf((struct dp_packet*) b); #endif + } else if (b->source == DPBUF_AFXDP) { +#ifdef HAVE_AF_XDP + struct dp_packet_afxdp *xpacket; + + xpacket = dp_packet_cast_afxdp(b); + if (xpacket->mpool) { + umem_elem_push(xpacket->mpool, dp_packet_base(b)); + } +#endif + return; } } } @@ -248,6 +258,8 @@ dp_packet_resize__(struct dp_packet *b, size_t new_headroom, size_t new_tailroom case DPBUF_STACK: OVS_NOT_REACHED(); + case DPBUF_AFXDP: + OVS_NOT_REACHED(); case DPBUF_STUB: b->source = DPBUF_MALLOC; new_base = xmalloc(new_allocated); @@ -433,6 +445,7 @@ dp_packet_steal_data(struct dp_packet *b) { void *p; ovs_assert(b->source != DPBUF_DPDK); + ovs_assert(b->source != DPBUF_AFXDP); if (b->source == DPBUF_MALLOC && dp_packet_data(b) == dp_packet_base(b)) { p = dp_packet_data(b); diff --git a/lib/dp-packet.h b/lib/dp-packet.h index a5e9ade1244a..78f92f0be583 100644 --- a/lib/dp-packet.h +++ b/lib/dp-packet.h @@ -30,6 +30,7 @@ #include "packets.h" #include "util.h" #include "flow.h" +#include "lib/xdpsock.h" #ifdef __cplusplus extern "C" { @@ -42,6 +43,7 @@ enum OVS_PACKED_ENUM dp_packet_source { DPBUF_DPDK, /* buffer data is from DPDK allocated memory. * ref to dp_packet_init_dpdk() in dp-packet.c. 
*/ + DPBUF_AFXDP, /* buffer data from XDP frame */ }; #define DP_PACKET_CONTEXT_SIZE 64 @@ -89,6 +91,20 @@ struct dp_packet { }; }; +struct dp_packet_afxdp { + struct umem_pool *mpool; + struct dp_packet packet; +}; + +#if HAVE_AF_XDP +static struct dp_packet_afxdp * +dp_packet_cast_afxdp(const struct dp_packet *d OVS_UNUSED) +{ + ovs_assert(d->source == DPBUF_AFXDP); + return CONTAINER_OF(d, struct dp_packet_afxdp, packet); +} +#endif + static inline void *dp_packet_data(const struct dp_packet *); static inline void dp_packet_set_data(struct dp_packet *, void *); static inline void *dp_packet_base(const struct dp_packet *); @@ -183,7 +199,21 @@ dp_packet_delete(struct dp_packet *b) free_dpdk_buf((struct dp_packet*) b); return; } - + if (b->source == DPBUF_AFXDP) { +#ifdef HAVE_AF_XDP + struct dp_packet_afxdp *xpacket; + + /* if a packet is received from afxdp port, + * and tx to a system port. Then we need to + * push the rx umem back here + */ + xpacket = dp_packet_cast_afxdp(b); + if (xpacket->mpool) { + umem_elem_push(xpacket->mpool, dp_packet_base(b)); + } +#endif + return; + } dp_packet_uninit(b); free(b); } diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h index 859c05613ddf..e47cf73bf3c9 100644 --- a/lib/dpif-netdev-perf.h +++ b/lib/dpif-netdev-perf.h @@ -198,6 +198,19 @@ cycles_counter_update(struct pmd_perf_stats *s) { #ifdef DPDK_NETDEV return s->last_tsc = rte_get_tsc_cycles(); +#elif HAVE_AF_XDP + union { + uint64_t tsc_64; + struct { + uint32_t lo_32; + uint32_t hi_32; + }; + } tsc; + asm volatile("rdtsc" : + "=a" (tsc.lo_32), + "=d" (tsc.hi_32)); + + return s->last_tsc = tsc.tsc_64; #else return s->last_tsc = 0; #endif diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c new file mode 100644 index 000000000000..56f313606190 --- /dev/null +++ b/lib/netdev-afxdp.c @@ -0,0 +1,592 @@ +/* + * Copyright (c) 2018, 2019 Nicira, Inc. 
+ * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include <config.h> +#include "netdev-linux.h" +#include <errno.h> +#include <fcntl.h> +#include <sys/types.h> +#include <netinet/in.h> +#include <arpa/inet.h> +#include <inttypes.h> +#include <linux/filter.h> +#include <linux/gen_stats.h> +#include <linux/if_ether.h> +#include <linux/if_tun.h> +#include <linux/types.h> +#include <linux/ethtool.h> +#include <linux/mii.h> +#include <linux/rtnetlink.h> +#include <linux/sockios.h> +#include <linux/if_xdp.h> +#include <sys/ioctl.h> +#include <sys/socket.h> +#include <sys/utsname.h> +#include <netpacket/packet.h> +#include <net/if.h> +#include <net/if_arp.h> +#include <net/route.h> +#include <poll.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> + +#include "coverage.h" +#include "dp-packet.h" +#include "dpif-netlink.h" +#include "dpif-netdev.h" +#include "openvswitch/dynamic-string.h" +#include "fatal-signal.h" +#include "hash.h" +#include "openvswitch/hmap.h" +#include "netdev-provider.h" +#include "netdev-tc-offloads.h" +#include "netdev-vport.h" +#include "netlink-notifier.h" +#include "netlink-socket.h" +#include "netlink.h" +#include "netnsid.h" +#include "openvswitch/ofpbuf.h" +#include "openflow/openflow.h" +#include "ovs-atomic.h" +#include "packets.h" +#include "openvswitch/poll-loop.h" +#include "rtnetlink.h" +#include "openvswitch/shash.h" +#include "socket-util.h" +#include "sset.h" +#include "tc.h" +#include "timer.h" 
+#include "unaligned.h" +#include "openvswitch/vlog.h" +#include "util.h" +#include "xdpsock.h" +#include "netdev-afxdp.h" + +#ifdef HAVE_AF_XDP +#ifndef SOL_XDP +#define SOL_XDP 283 +#endif +#ifndef AF_XDP +#define AF_XDP 44 +#endif +#ifndef PF_XDP +#define PF_XDP AF_XDP +#endif + +VLOG_DEFINE_THIS_MODULE(netdev_afxdp); +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20); + +#define UMEM2DESC(elem, base) ((uint64_t)((char *)elem - (char *)base)) +#define UMEM2XPKT(base, i) \ + (struct dp_packet_afxdp *)((char *)base + i * sizeof(struct dp_packet_afxdp)) + +static uint32_t opt_xdp_bind_flags = XDP_COPY; +static uint32_t opt_xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE; +#if 0 +static uint32_t opt_xdp_bind_flags = XDP_ZEROCOPY; +static uint32_t opt_xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE; +#endif +static uint32_t prog_id; + +static struct xsk_umem_info *xsk_configure_umem(void *buffer, uint64_t size) +{ + struct xsk_umem_info *umem; + int ret; + int i; + + umem = xcalloc(1, sizeof(*umem)); + if (!umem) { + VLOG_FATAL("xsk config umem failed (%s)", ovs_strerror(errno)); + } + + ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq, + NULL); + + if (ret) { + VLOG_FATAL("xsk umem create failed (%s) mode: %s", + ovs_strerror(errno), + opt_xdp_bind_flags == XDP_COPY ? 
"SKB": "DRV"); + } + + umem->buffer = buffer; + + /* set-up umem pool */ + umem_pool_init(&umem->mpool, NUM_FRAMES); + + for (i = NUM_FRAMES - 1; i >= 0; i--) { + struct umem_elem *elem; + + elem = (struct umem_elem *)((char *)umem->buffer + + i * FRAME_SIZE); + umem_elem_push(&umem->mpool, elem); + } + + /* set-up metadata */ + xpacket_pool_init(&umem->xpool, NUM_FRAMES); + + VLOG_DBG("%s xpacket pool from %p to %p", __func__, + umem->xpool.array, + (char *)umem->xpool.array + + NUM_FRAMES * sizeof(struct dp_packet_afxdp)); + + for (i = NUM_FRAMES - 1; i >= 0; i--) { + struct dp_packet_afxdp *xpacket; + struct dp_packet *packet; + + xpacket = UMEM2XPKT(umem->xpool.array, i); + xpacket->mpool = &umem->mpool; + + packet = &xpacket->packet; + packet->source = DPBUF_AFXDP; + } + + return umem; +} + +static struct xsk_socket_info * +xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex, + uint32_t queue_id) +{ + struct xsk_socket_config cfg; + struct xsk_socket_info *xsk; + char devname[IF_NAMESIZE]; + uint32_t idx; + int ret; + int i; + + xsk = xcalloc(1, sizeof(*xsk)); + if (!xsk) { + VLOG_FATAL("xsk calloc failed (%s)", ovs_strerror(errno)); + } + + xsk->umem = umem; + cfg.rx_size = CONS_NUM_DESCS; + cfg.tx_size = PROD_NUM_DESCS; + cfg.libbpf_flags = 0; + cfg.xdp_flags = opt_xdp_flags; + cfg.bind_flags = opt_xdp_bind_flags; + + if (if_indextoname(ifindex, devname) == NULL) { + VLOG_FATAL("ifindex %d devname failed (%s)", + ifindex, ovs_strerror(errno)); + } + + ret = xsk_socket__create(&xsk->xsk, devname, queue_id, umem->umem, + &xsk->rx, &xsk->tx, &cfg); + if (ret) { + VLOG_FATAL("xsk_socket_create failed (%s) mode: %s qid: %d", + ovs_strerror(errno), + opt_xdp_bind_flags == XDP_COPY ? 
"SKB": "DRV", + queue_id); + } + + /* make sure the XDP program is there */ + ret = bpf_get_link_xdp_id(ifindex, &prog_id, opt_xdp_flags); + if (ret) { + VLOG_FATAL("get XDP prog ID failed (%s)", ovs_strerror(errno)); + } + + ret = xsk_ring_prod__reserve(&xsk->umem->fq, + PROD_NUM_DESCS, + &idx); + if (ret != PROD_NUM_DESCS) { + VLOG_FATAL("fq set-up failed (%s)", ovs_strerror(errno)); + } + + for (i = 0; + i < PROD_NUM_DESCS * FRAME_SIZE; + i += FRAME_SIZE) { + struct umem_elem *elem; + uint64_t addr; + + elem = umem_elem_pop(&xsk->umem->mpool); + addr = UMEM2DESC(elem, xsk->umem->buffer); + + *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx++) = addr; + } + + xsk_ring_prod__submit(&xsk->umem->fq, + PROD_NUM_DESCS); + return xsk; +} + +struct xsk_socket_info * +xsk_configure(int ifindex, int xdp_queue_id) +{ + struct xsk_socket_info *xsk; + struct xsk_umem_info *umem; + void *bufs; + int ret; + + ret = posix_memalign(&bufs, getpagesize(), + NUM_FRAMES * FRAME_SIZE); + ovs_assert(!ret); + + /* Create sockets... 
*/ + umem = xsk_configure_umem(bufs, + NUM_FRAMES * FRAME_SIZE); + xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id); + return xsk; +} + +static void OVS_UNUSED vlog_hex_dump(const void *buf, size_t count) +{ + struct ds ds = DS_EMPTY_INITIALIZER; + ds_put_hex_dump(&ds, buf, count, 0, false); + VLOG_DBG_RL(&rl, "%s", ds_cstr(&ds)); + ds_destroy(&ds); +} + +void +xsk_destroy(struct xsk_socket_info *xsk) +{ + struct xsk_umem *umem; + + if (!xsk) { + return; + } + + umem = xsk->umem->umem; + xsk_socket__delete(xsk->xsk); + (void)xsk_umem__delete(umem); + + /* cleanup umem pool */ + umem_pool_cleanup(&xsk->umem->mpool); + + /* cleanup metadata pool */ + xpacket_pool_cleanup(&xsk->umem->xpool); + + return; +} + +static inline void +print_xsk_stat(struct xsk_socket_info *xsk OVS_UNUSED) { + struct xdp_statistics stat; + socklen_t optlen; + + optlen = sizeof(stat); + ovs_assert(getsockopt(xsk_socket__fd(xsk->xsk), SOL_XDP, XDP_STATISTICS, + &stat, &optlen) == 0); + + VLOG_DBG_RL(&rl, "rx dropped %llu, rx_invalid %llu, tx_invalid %llu", + stat.rx_dropped, stat.rx_invalid_descs, stat.tx_invalid_descs); + return; +} + +int +netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args, + char **errp OVS_UNUSED) +{ + const char *xdpmode; + int new_n_rxq; + + /* TODO: add mutex lock */ + new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1); + + if (netdev->n_rxq != new_n_rxq) { + + if (new_n_rxq > MAX_XSKQ) { + VLOG_WARN("set n_rxq %d too large", new_n_rxq); + goto out; + } + + netdev->n_rxq = new_n_rxq; + VLOG_INFO("set AF_XDP device %s to %d n_rxq", netdev->name, new_n_rxq); + netdev_request_reconfigure(netdev); + } + + xdpmode = smap_get(args, "xdpmode"); + if (xdpmode && strncmp(xdpmode, "drv", 3) == 0) { + if (opt_xdp_bind_flags != XDP_ZEROCOPY) { + opt_xdp_bind_flags = XDP_ZEROCOPY; + opt_xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE; + } + VLOG_INFO("AF_XDP device %s in ZC driver mode", netdev->name); + } else { + opt_xdp_bind_flags = 
XDP_COPY; + opt_xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE; + VLOG_INFO("AF_XDP device %s in SKB mode", netdev->name); + } + +out: + return 0; +} + +int +netdev_afxdp_get_config(const struct netdev *netdev, struct smap *args) +{ + /* TODO: add mutex lock */ + smap_add_format(args, "n_rxq", "%d", netdev->n_rxq); + smap_add_format(args, "xdpmode", "%s", + opt_xdp_bind_flags == XDP_ZEROCOPY ? "drv" : "skb"); + + return 0; +} + +int +netdev_afxdp_get_numa_id(const struct netdev *netdev) +{ + /* FIXME: Get netdev's PCIe device ID, then find + * its NUMA node id. + */ + VLOG_INFO("FIXME: Device %s always use numa id 0", netdev->name); + return 0; +} + +void +xsk_remove_xdp_program(uint32_t ifindex) +{ + uint32_t curr_prog_id = 0; + + /* remove_xdp_program() */ + if (bpf_get_link_xdp_id(ifindex, &curr_prog_id, opt_xdp_flags)) { + bpf_set_link_xdp_fd(ifindex, -1, opt_xdp_flags); + } + if (prog_id == curr_prog_id) { + bpf_set_link_xdp_fd(ifindex, -1, opt_xdp_flags); + } else if (!curr_prog_id) { + VLOG_WARN("couldn't find a prog id on a given interface"); + } else { + VLOG_WARN("program on interface changed, not removing"); + } + + return; +} + +/* Receive packet from AF_XDP socket */ +int +netdev_linux_rxq_xsk(struct xsk_socket_info *xsk, + struct dp_packet_batch *batch) +{ + unsigned int rcvd, i; + uint32_t idx_rx = 0, idx_fq = 0; + int ret = 0; + + /* See if there is any packet on RX queue, + * if yes, idx_rx is the index having the packet. 
+ */ + rcvd = xsk_ring_cons__peek(&xsk->rx, BATCH_SIZE, &idx_rx); + if (!rcvd) { + return 0; + } + + /* Form a dp_packet batch from descriptor in RX queue */ + for (i = 0; i < rcvd; i++) { + uint64_t addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr; + uint32_t len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->len; + char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr); + uint64_t index; + + struct dp_packet_afxdp *xpacket; + struct dp_packet *packet; + + index = addr >> FRAME_SHIFT; + xpacket = UMEM2XPKT(xsk->umem->xpool.array, index); + + packet = &xpacket->packet; + xpacket->mpool = &xsk->umem->mpool; + + if (packet->source != DPBUF_AFXDP) { + /* FIXME: might be a bug */ + continue; + } + + /* Initialize the struct dp_packet */ + if (opt_xdp_bind_flags == XDP_ZEROCOPY) { + dp_packet_set_base(packet, pkt - FRAME_HEADROOM); + } else { + /* SKB mode */ + dp_packet_set_base(packet, pkt); + } + dp_packet_set_data(packet, pkt); + dp_packet_set_size(packet, len); + + /* Add packet into batch, increase batch->count */ + dp_packet_batch_add(batch, packet); + + idx_rx++; + } + + /* We've consume rcvd packets in RX, now re-fill the + * same number back to FILL queue. + */ + for (i = 0; i < rcvd; i++) { + uint64_t index; + struct umem_elem *elem; + + ret = xsk_ring_prod__reserve(&xsk->umem->fq, 1, &idx_fq); + while (ret == 0) { + /* The FILL queue is full, so retry. (or skip)? 
*/ + ret = xsk_ring_prod__reserve(&xsk->umem->fq, 1, &idx_fq); + } + + /* Get one free umem, program it into FILL queue */ + elem = umem_elem_pop(&xsk->umem->mpool); + index = (uint64_t)((char *)elem - (char *)xsk->umem->buffer); + ovs_assert((index & FRAME_SHIFT_MASK) == 0); + *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq) = index; + + idx_fq++; + } + xsk_ring_prod__submit(&xsk->umem->fq, rcvd); + + /* Release the RX queue */ + xsk_ring_cons__release(&xsk->rx, rcvd); + xsk->rx_npkts += rcvd; + +#ifdef AFXDP_DEBUG + print_xsk_stat(xsk); +#endif + return 0; +} + +static void kick_tx(struct xsk_socket_info *xsk) +{ + int ret; + + ret = sendto(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, 0); + if (ret >= 0 || errno == ENOBUFS || errno == EAGAIN || errno == EBUSY) { + return; + } +} + +/* + * A dp_packet might come from + * 1) AFXDP buffer + * 2) non-AFXDP buffer, ex: send from tap device + */ +int +netdev_linux_afxdp_batch_send(struct xsk_socket_info *xsk, + struct dp_packet_batch *batch) +{ + uint32_t tx_done, idx_cq = 0; + struct dp_packet *packet; + uint32_t idx; + int j; + + /* Make sure we have enough TX descs */ + if (xsk_ring_prod__reserve(&xsk->tx, batch->count, &idx) == 0) { + return -EAGAIN; + } + + DP_PACKET_BATCH_FOR_EACH (i, packet, batch) { + struct dp_packet_afxdp *xpacket; + struct umem_elem *elem; + uint64_t index; + + elem = umem_elem_pop(&xsk->umem->mpool); + if (!elem) { + return -EAGAIN; + } + + memcpy(elem, dp_packet_data(packet), dp_packet_size(packet)); + + index = (uint64_t)((char *)elem - (char *)xsk->umem->buffer); + xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->addr = index; + xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->len + = dp_packet_size(packet); + + if (packet->source == DPBUF_AFXDP) { + xpacket = dp_packet_cast_afxdp(packet); + umem_elem_push(xpacket->mpool, dp_packet_base(packet)); + /* Avoid freeing it twice at dp_packet_uninit */ + xpacket->mpool = NULL; + } + } + xsk_ring_prod__submit(&xsk->tx, batch->count); + 
xsk->outstanding_tx += batch->count; + +retry: + kick_tx(xsk); + + /* Process CQ */ + tx_done = xsk_ring_cons__peek(&xsk->umem->cq, batch->count, &idx_cq); + if (tx_done > 0) { + xsk->outstanding_tx -= tx_done; + xsk->tx_npkts += tx_done; + } + + /* Recycle back to umem pool */ + for (j = 0; j < tx_done; j++) { + struct umem_elem *elem; + uint64_t addr; + + addr = *xsk_ring_cons__comp_addr(&xsk->umem->cq, idx_cq++); + + elem = (struct umem_elem *)((char *)xsk->umem->buffer + addr); + umem_elem_push(&xsk->umem->mpool, elem); + } + xsk_ring_cons__release(&xsk->umem->cq, tx_done); + + if (xsk->outstanding_tx > PROD_NUM_DESCS - (PROD_NUM_DESCS >> 2)) { + /* If there are still a lot not transmitted, + * try harder. + */ + goto retry; + } + + return 0; +} + +#else +struct xsk_socket_info * +xsk_configure(int ifindex OVS_UNUSED, int xdp_queue_id OVS_UNUSED) +{ + return NULL; +} + +void +xsk_destroy(struct xsk_socket_info *xsk OVS_UNUSED) +{ + return; +} + +int +netdev_linux_rxq_xsk(struct xsk_socket_info *xsk OVS_UNUSED, + struct dp_packet_batch *batch OVS_UNUSED) +{ + return 0; +} + +int +netdev_linux_afxdp_batch_send(struct xsk_socket_info *xsk OVS_UNUSED, + struct dp_packet_batch *batch OVS_UNUSED) +{ + return 0; +} + +int +netdev_afxdp_set_config(struct netdev *netdev OVS_UNUSED, + const struct smap *args OVS_UNUSED, + char **errp OVS_UNUSED) +{ + return 0; +} + +int +netdev_afxdp_get_config(const struct netdev *netdev OVS_UNUSED, + struct smap *args OVS_UNUSED) +{ + return 0; +} + +int +netdev_afxdp_get_numa_id(const struct netdev *netdev OVS_UNUSED) +{ + return 0; +} +#endif diff --git a/lib/netdev-afxdp.h b/lib/netdev-afxdp.h new file mode 100644 index 000000000000..dd78135c77dc --- /dev/null +++ b/lib/netdev-afxdp.h @@ -0,0 +1,46 @@ +/* + * Copyright (c) 2018 Nicira, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef NETDEV_AFXDP_H +#define NETDEV_AFXDP_H 1 + +#include <stdint.h> +#include <stdbool.h> + +/* These functions are Linux AF_XDP specific, so they should be used directly + * only by Linux-specific code. */ +#define MAX_XSKQ 16 +struct netdev; +struct xsk_socket_info; +struct xdp_umem; +struct dp_packet_batch; + +struct xsk_socket_info *xsk_configure(int ifindex, int xdp_queue_id); +void xsk_destroy(struct xsk_socket_info *xsk); + +int netdev_linux_rxq_xsk(struct xsk_socket_info *xsk, + struct dp_packet_batch *batch); + +int netdev_linux_afxdp_batch_send(struct xsk_socket_info *xsk, + struct dp_packet_batch *batch); + +void xsk_remove_xdp_program(uint32_t ifindex); +int netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args, + char **errp); +int netdev_afxdp_get_config(const struct netdev *netdev, struct smap *args); +int netdev_afxdp_get_numa_id(const struct netdev *netdev); + +#endif /* netdev-afxdp.h */ diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index 84e2f52dc7b8..d4058d3b1d18 100644 --- a/lib/netdev-linux.c +++ b/lib/netdev-linux.c @@ -75,6 +75,7 @@ #include "unaligned.h" #include "openvswitch/vlog.h" #include "util.h" +#include "netdev-afxdp.h" VLOG_DEFINE_THIS_MODULE(netdev_linux); @@ -531,6 +532,7 @@ struct netdev_linux { /* LAG information. */ bool is_lag_master; /* True if the netdev is a LAG master. 
*/ + struct xsk_socket_info *xsk[MAX_XSKQ]; /* af_xdp socket */ }; struct netdev_rxq_linux { @@ -580,12 +582,18 @@ is_netdev_linux_class(const struct netdev_class *netdev_class) } static bool +is_afxdp_netdev(const struct netdev *netdev) +{ + return netdev_get_class(netdev) == &netdev_afxdp_class; +} + +static bool is_tap_netdev(const struct netdev *netdev) { return netdev_get_class(netdev) == &netdev_tap_class; } -static struct netdev_linux * +struct netdev_linux * netdev_linux_cast(const struct netdev *netdev) { ovs_assert(is_netdev_linux_class(netdev_get_class(netdev))); @@ -1084,6 +1092,25 @@ netdev_linux_destruct(struct netdev *netdev_) atomic_count_dec(&miimon_cnt); } +#if HAVE_AF_XDP + if (is_afxdp_netdev(netdev_)) { + int ifindex; + int ret, i; + + ret = get_ifindex(netdev_, &ifindex); + if (ret) { + VLOG_ERR("get ifindex error"); + } else { + for (i = 0; i < MAX_XSKQ; i++) { + if (netdev->xsk[i]) { + VLOG_INFO("destroy xsk[%d]", i); + xsk_destroy(netdev->xsk[i]); + } + } + xsk_remove_xdp_program(ifindex); + } + } +#endif ovs_mutex_destroy(&netdev->mutex); } @@ -1113,6 +1140,32 @@ netdev_linux_rxq_construct(struct netdev_rxq *rxq_) rx->is_tap = is_tap_netdev(netdev_); if (rx->is_tap) { rx->fd = netdev->tap_fd; + } else if (is_afxdp_netdev(netdev_)) { +#if HAVE_AF_XDP + struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY}; + int ifindex; + int xdp_queue_id = rxq_->queue_id; + struct xsk_socket_info *xsk; + + if (setrlimit(RLIMIT_MEMLOCK, &r)) { + VLOG_ERR("ERROR: setrlimit(RLIMIT_MEMLOCK) \"%s\"\n", + ovs_strerror(errno)); + ovs_assert(0); + } + + VLOG_DBG("%s: %s: queue=%d configuring xdp sock", + __func__, netdev_->name, xdp_queue_id); + + /* Get ethernet device index. 
*/ + error = get_ifindex(&netdev->up, &ifindex); + if (error) { + goto error; + } + + xsk = xsk_configure(ifindex, xdp_queue_id); + netdev->xsk[xdp_queue_id] = xsk; + rx->fd = xsk_socket__fd(xsk->xsk); /* for netdev layer to poll */ +#endif } else { struct sockaddr_ll sll; int ifindex, val; @@ -1318,9 +1371,16 @@ netdev_linux_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch, { struct netdev_rxq_linux *rx = netdev_rxq_linux_cast(rxq_); struct netdev *netdev = rx->up.netdev; - struct dp_packet *buffer; + struct dp_packet *buffer = NULL; ssize_t retval; int mtu; + struct netdev_linux *netdev_ = netdev_linux_cast(netdev); + + if (is_afxdp_netdev(netdev)) { + int qid = rxq_->queue_id; + + return netdev_linux_rxq_xsk(netdev_->xsk[qid], batch); + } if (netdev_linux_get_mtu__(netdev_linux_cast(netdev), &mtu)) { mtu = ETH_PAYLOAD_MAX; @@ -1329,6 +1389,7 @@ netdev_linux_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch, /* Assume Ethernet port. No need to set packet_type. */ buffer = dp_packet_new_with_headroom(VLAN_ETH_HEADER_LEN + mtu, DP_NETDEV_HEADROOM); + retval = (rx->is_tap ? netdev_linux_rxq_recv_tap(rx->fd, buffer) : netdev_linux_rxq_recv_sock(rx->fd, buffer)); @@ -1473,14 +1534,15 @@ netdev_linux_tap_batch_send(struct netdev *netdev_, * The kernel maintains a packet transmission queue, so the caller is not * expected to do additional queuing of packets. 
*/ static int -netdev_linux_send(struct netdev *netdev_, int qid OVS_UNUSED, +netdev_linux_send(struct netdev *netdev_, int qid, struct dp_packet_batch *batch, bool concurrent_txq OVS_UNUSED) { int error = 0; int sock = 0; - if (!is_tap_netdev(netdev_)) { + if (!is_tap_netdev(netdev_) && + !is_afxdp_netdev(netdev_)) { if (netdev_linux_netnsid_is_remote(netdev_linux_cast(netdev_))) { error = EOPNOTSUPP; goto free_batch; @@ -1499,6 +1561,10 @@ netdev_linux_send(struct netdev *netdev_, int qid OVS_UNUSED, } error = netdev_linux_sock_batch_send(sock, ifindex, batch); + } else if (is_afxdp_netdev(netdev_)) { + struct netdev_linux *netdev = netdev_linux_cast(netdev_); + + error = netdev_linux_afxdp_batch_send(netdev->xsk[qid], batch); } else { error = netdev_linux_tap_batch_send(netdev_, batch); } @@ -3322,6 +3388,7 @@ const struct netdev_class netdev_linux_class = { NETDEV_LINUX_CLASS_COMMON, LINUX_FLOW_OFFLOAD_API, .type = "system", + .is_pmd = false, .construct = netdev_linux_construct, .get_stats = netdev_linux_get_stats, .get_features = netdev_linux_get_features, @@ -3332,6 +3399,7 @@ const struct netdev_class netdev_linux_class = { const struct netdev_class netdev_tap_class = { NETDEV_LINUX_CLASS_COMMON, .type = "tap", + .is_pmd = false, .construct = netdev_linux_construct_tap, .get_stats = netdev_tap_get_stats, .get_features = netdev_linux_get_features, @@ -3342,10 +3410,23 @@ const struct netdev_class netdev_internal_class = { NETDEV_LINUX_CLASS_COMMON, LINUX_FLOW_OFFLOAD_API, .type = "internal", + .is_pmd = false, .construct = netdev_linux_construct, .get_stats = netdev_internal_get_stats, .get_status = netdev_internal_get_status, }; + +const struct netdev_class netdev_afxdp_class = { + NETDEV_LINUX_CLASS_COMMON, + .type = "afxdp", + .is_pmd = true, + .construct = netdev_linux_construct, + .get_stats = netdev_linux_get_stats, + .get_status = netdev_linux_get_status, + .set_config = netdev_afxdp_set_config, + .get_config = netdev_afxdp_get_config, + .get_numa_id 
= netdev_afxdp_get_numa_id, +}; #define CODEL_N_QUEUES 0x0000 diff --git a/lib/netdev-linux.h b/lib/netdev-linux.h index 17ca9120168a..afcb20ee8d0a 100644 --- a/lib/netdev-linux.h +++ b/lib/netdev-linux.h @@ -28,6 +28,7 @@ struct netdev; int netdev_linux_ethtool_set_flag(struct netdev *netdev, uint32_t flag, const char *flag_name, bool enable); int linux_get_ifindex(const char *netdev_name); +struct netdev_linux *netdev_linux_cast(const struct netdev *netdev); #define LINUX_FLOW_OFFLOAD_API \ .flow_flush = netdev_tc_flow_flush, \ diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h index fb0c27e6e8e8..5bf041316503 100644 --- a/lib/netdev-provider.h +++ b/lib/netdev-provider.h @@ -902,6 +902,7 @@ extern const struct netdev_class netdev_linux_class; #endif extern const struct netdev_class netdev_internal_class; extern const struct netdev_class netdev_tap_class; +extern const struct netdev_class netdev_afxdp_class; #ifdef __cplusplus } diff --git a/lib/netdev.c b/lib/netdev.c index 7d7ecf6f0946..c30016b34033 100644 --- a/lib/netdev.c +++ b/lib/netdev.c @@ -145,6 +145,7 @@ netdev_initialize(void) netdev_register_provider(&netdev_linux_class); netdev_register_provider(&netdev_internal_class); netdev_register_provider(&netdev_tap_class); + netdev_register_provider(&netdev_afxdp_class); netdev_vport_tunnel_register(); #endif #if defined(__FreeBSD__) || defined(__NetBSD__) diff --git a/lib/xdpsock.c b/lib/xdpsock.c new file mode 100644 index 000000000000..9bd574e61774 --- /dev/null +++ b/lib/xdpsock.c @@ -0,0 +1,211 @@ +/* + * Copyright (c) 2018, 2019 Nicira, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+#include <config.h>
+#include <ctype.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <stdarg.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <syslog.h>
+#include <time.h>
+#include <unistd.h>
+#include "openvswitch/vlog.h"
+#include "async-append.h"
+#include "coverage.h"
+#include "dirs.h"
+#include "openvswitch/dynamic-string.h"
+#include "openvswitch/ofpbuf.h"
+#include "ovs-thread.h"
+#include "sat-math.h"
+#include "socket-util.h"
+#include "svec.h"
+#include "syslog-direct.h"
+#include "syslog-libc.h"
+#include "syslog-provider.h"
+#include "timeval.h"
+#include "unixctl.h"
+#include "util.h"
+#include "ovs-atomic.h"
+#include "xdpsock.h"
+#include "openvswitch/compiler.h"
+#include "dp-packet.h"
+
+#ifdef HAVE_AF_XDP
+static inline void ovs_spinlock_init(ovs_spinlock_t *sl)
+{
+    sl->locked = 0;
+}
+
+static inline void ovs_spin_lock(ovs_spinlock_t *sl)
+{
+    int exp = 0;
+
+    while (!__atomic_compare_exchange_n(&sl->locked, &exp, 1, 0,
+                __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+        while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED)) {
+            ;
+        }
+        exp = 0;
+    }
+}
+
+static inline void ovs_spin_unlock(ovs_spinlock_t *sl)
+{
+    __atomic_store_n(&sl->locked, 0, __ATOMIC_RELEASE);
+}
+
+static inline int ovs_spin_trylock(ovs_spinlock_t *sl)
+{
+    int exp = 0;
+    return __atomic_compare_exchange_n(&sl->locked, &exp, 1,
+                0, /* disallow spurious failure */
+                __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
+}
+
+void
+__umem_elem_push_n(struct umem_pool *umemp, void **addrs, int n)
+{
+    void *ptr;
+
+    if (OVS_UNLIKELY(umemp->index + n > umemp->size)) {
+        OVS_NOT_REACHED();
+    }
+
+    ptr = &umemp->array[umemp->index];
+    memcpy(ptr, addrs, n * sizeof(void *));
+    umemp->index += n;
+}
+
+inline void
+__umem_elem_push(struct umem_pool *umemp, void *addr)
+{
+    umemp->array[umemp->index++] = addr;
+}
+
+void
+umem_elem_push(struct umem_pool *umemp, void *addr)
+{
+
+    if (OVS_UNLIKELY(umemp->index >= umemp->size)) {
+        /* The stack is full. */
+        /* One umem element may have been pushed twice, perhaps
+         * because actions=1,2,3... outputs to multiple ports.
+         */
+        OVS_NOT_REACHED();
+    }
+
+    ovs_assert(((uint64_t)addr & FRAME_SHIFT_MASK) == 0);
+
+    ovs_spin_lock(&umemp->mutex);
+    __umem_elem_push(umemp, addr);
+    ovs_spin_unlock(&umemp->mutex);
+}
+
+void
+__umem_elem_pop_n(struct umem_pool *umemp, void **addrs, int n)
+{
+    void *ptr;
+
+    umemp->index -= n;
+
+    if (OVS_UNLIKELY(umemp->index < 0)) {
+        OVS_NOT_REACHED();
+    }
+
+    ptr = &umemp->array[umemp->index];
+    memcpy(addrs, ptr, n * sizeof(void *));
+}
+
+inline void *
+__umem_elem_pop(struct umem_pool *umemp)
+{
+    return umemp->index > 0 ? umemp->array[--umemp->index] : NULL;
+}
+
+void *
+umem_elem_pop(struct umem_pool *umemp)
+{
+    void *ptr;
+
+    ovs_spin_lock(&umemp->mutex);
+    ptr = __umem_elem_pop(umemp);
+    ovs_spin_unlock(&umemp->mutex);
+
+    return ptr;
+}
+
+void **
+__umem_pool_alloc(unsigned int size)
+{
+    void *bufs;
+
+    ovs_assert(posix_memalign(&bufs, getpagesize(),
+                              size * sizeof(void *)) == 0);
+    memset(bufs, 0, size * sizeof(void *));
+    return (void **)bufs;
+}
+
+unsigned int
+umem_elem_count(struct umem_pool *mpool)
+{
+    return mpool->index;
+}
+
+int
+umem_pool_init(struct umem_pool *umemp, unsigned int size)
+{
+    umemp->array = __umem_pool_alloc(size);
+    if (!umemp->array) {
+        OVS_NOT_REACHED();
+    }
+
+    umemp->size = size;
+    umemp->index = 0;
+    ovs_spinlock_init(&umemp->mutex);
+    return 0;
+}
+
+void
+umem_pool_cleanup(struct umem_pool *umemp)
+{
+    free(umemp->array);
+}
+
+/* AF_XDP metadata init/destroy */
+int
+xpacket_pool_init(struct xpacket_pool *xp, unsigned int size)
+{
+    void *bufs;
+
+    ovs_assert(posix_memalign(&bufs, getpagesize(),
+                              size * sizeof(struct dp_packet_afxdp)) == 0);
+    memset(bufs, 0, size * sizeof(struct dp_packet_afxdp));
+
+    xp->array = bufs;
+    xp->size = size;
+    return 0;
+}
+
+void
+xpacket_pool_cleanup(struct xpacket_pool *xp)
+{
+    free(xp->array);
+}
+#else /* !HAVE_AF_XDP below */
+#endif
diff --git a/lib/xdpsock.h b/lib/xdpsock.h
new file mode 100644
index 000000000000..cb64befe7dba
--- /dev/null
+++ b/lib/xdpsock.h
@@ -0,0 +1,133 @@
+/*
+ * Copyright (c) 2018, 2019 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+#ifndef XDPSOCK_H
+#define XDPSOCK_H 1
+#include <errno.h>
+#include <getopt.h>
+#include <libgen.h>
+#include <linux/bpf.h>
+#include <linux/if_link.h>
+#include <linux/if_xdp.h>
+#include <linux/if_ether.h>
+#include <net/if.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <net/ethernet.h>
+#include <sys/resource.h>
+#include <sys/socket.h>
+#include <sys/mman.h>
+#include <time.h>
+#include <unistd.h>
+#include <pthread.h>
+#include <locale.h>
+#include <sys/types.h>
+#include <poll.h>
+#include <bpf/libbpf.h>
+
+#include "ovs-atomic.h"
+#include "openvswitch/thread.h"
+
+/* bpf/xsk.h uses the following macros not defined in OVS,
+ * so re-define them before include.
+ */
+#define unlikely OVS_UNLIKELY
+#define likely OVS_LIKELY
+#define barrier() __asm__ __volatile__("": : :"memory")
+#define smp_rmb() barrier()
+#define smp_wmb() barrier()
+#include <bpf/xsk.h>
+
+#define FRAME_HEADROOM XDP_PACKET_HEADROOM
+#define FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE
+#define BATCH_SIZE NETDEV_MAX_BURST
+#define FRAME_SHIFT XSK_UMEM__DEFAULT_FRAME_SHIFT
+#define FRAME_SHIFT_MASK ((1<<FRAME_SHIFT)-1)
+
+#define NUM_FRAMES 1024
+#ifdef USE_XSK_DEFAULT
+#define PROD_NUM_DESCS XSK_RING_PROD__DEFAULT_NUM_DESCS
+#define CONS_NUM_DESCS XSK_RING_CONS__DEFAULT_NUM_DESCS
+#else
+#define PROD_NUM_DESCS 128
+#define CONS_NUM_DESCS 128
+#endif
+
+typedef struct {
+    volatile int locked;
+} ovs_spinlock_t;
+
+/* LIFO ptr_array */
+struct umem_pool {
+    int index;      /* top of stack: index of the next free slot */
+    unsigned int size;
+    ovs_spinlock_t mutex;
+    void **array;   /* a pointer array */
+};
+
+/* array-based dp_packet_afxdp */
+struct xpacket_pool {
+    unsigned int size;
+    struct dp_packet_afxdp **array;
+};
+
+struct xsk_umem_info {
+    struct umem_pool mpool;
+    struct xpacket_pool xpool;
+    struct xsk_ring_prod fq;
+    struct xsk_ring_cons cq;
+    struct xsk_umem *umem;
+    void *buffer;
+};
+
+struct xsk_socket_info {
+    struct xsk_ring_cons rx;
+    struct xsk_ring_prod tx;
+    struct xsk_umem_info *umem;
+    struct xsk_socket *xsk;
+    unsigned long rx_npkts;
+    unsigned long tx_npkts;
+    unsigned long prev_rx_npkts;
+    unsigned long prev_tx_npkts;
+    uint32_t outstanding_tx;
+};
+
+struct umem_elem_head {
+    unsigned int index;
+    struct ovs_mutex mutex;
+    uint32_t n;
+};
+
+struct umem_elem {
+    struct umem_elem *next;
+};
+
+void __umem_elem_push(struct umem_pool *umemp, void *addr);
+void umem_elem_push(struct umem_pool *umemp, void *addr);
+void *__umem_elem_pop(struct umem_pool *umemp);
+void *umem_elem_pop(struct umem_pool *umemp);
+void **__umem_pool_alloc(unsigned int size);
+int umem_pool_init(struct umem_pool *umemp, unsigned int size);
+void umem_pool_cleanup(struct umem_pool *umemp);
+unsigned int umem_elem_count(struct umem_pool *mpool); +void __umem_elem_pop_n(struct umem_pool *umemp, void **addrs, int n); +void __umem_elem_push_n(struct umem_pool *umemp, void **addrs, int n); +int xpacket_pool_init(struct xpacket_pool *xp, unsigned int size); +void xpacket_pool_cleanup(struct xpacket_pool *xp); + +#endif diff --git a/tests/automake.mk b/tests/automake.mk index 017d2d416156..b591405c1a21 100644 --- a/tests/automake.mk +++ b/tests/automake.mk @@ -4,12 +4,14 @@ EXTRA_DIST += \ $(SYSTEM_TESTSUITE_AT) \ $(SYSTEM_KMOD_TESTSUITE_AT) \ $(SYSTEM_USERSPACE_TESTSUITE_AT) \ + $(SYSTEM_AFXDP_TESTSUITE_AT) \ $(SYSTEM_OFFLOADS_TESTSUITE_AT) \ $(SYSTEM_DPDK_TESTSUITE_AT) \ $(OVSDB_CLUSTER_TESTSUITE_AT) \ $(TESTSUITE) \ $(SYSTEM_KMOD_TESTSUITE) \ $(SYSTEM_USERSPACE_TESTSUITE) \ + $(SYSTEM_AFXDP_TESTSUITE) \ $(SYSTEM_OFFLOADS_TESTSUITE) \ $(SYSTEM_DPDK_TESTSUITE) \ $(OVSDB_CLUSTER_TESTSUITE) \ @@ -157,6 +159,11 @@ SYSTEM_USERSPACE_TESTSUITE_AT = \ tests/system-userspace-macros.at \ tests/system-userspace-packet-type-aware.at +SYSTEM_AFXDP_TESTSUITE_AT = \ + tests/system-afxdp-testsuite.at \ + tests/system-afxdp-traffic.at \ + tests/system-afxdp-macros.at + SYSTEM_TESTSUITE_AT = \ tests/system-common-macros.at \ tests/system-ovn.at \ @@ -181,6 +188,7 @@ TESTSUITE = $(srcdir)/tests/testsuite TESTSUITE_PATCH = $(srcdir)/tests/testsuite.patch SYSTEM_KMOD_TESTSUITE = $(srcdir)/tests/system-kmod-testsuite SYSTEM_USERSPACE_TESTSUITE = $(srcdir)/tests/system-userspace-testsuite +SYSTEM_AFXDP_TESTSUITE = $(srcdir)/tests/system-afxdp-testsuite SYSTEM_OFFLOADS_TESTSUITE = $(srcdir)/tests/system-offloads-testsuite SYSTEM_DPDK_TESTSUITE = $(srcdir)/tests/system-dpdk-testsuite OVSDB_CLUSTER_TESTSUITE = $(srcdir)/tests/ovsdb-cluster-testsuite @@ -314,6 +322,11 @@ check-system-userspace: all set $(SHELL) '$(SYSTEM_USERSPACE_TESTSUITE)' -C tests AUTOTEST_PATH='$(AUTOTEST_PATH)'; \ "$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@" --recheck) +check-afxdp: all 
+ $(MAKE) install + set $(SHELL) '$(SYSTEM_AFXDP_TESTSUITE)' -C tests AUTOTEST_PATH='$(AUTOTEST_PATH)' $(TESTSUITEFLAGS) -j1; \ + "$$@" || (test X'$(RECHECK)' = Xyes && "$$@" --recheck) + check-offloads: all set $(SHELL) '$(SYSTEM_OFFLOADS_TESTSUITE)' -C tests AUTOTEST_PATH='$(AUTOTEST_PATH)'; \ "$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@" --recheck) @@ -351,6 +364,10 @@ $(SYSTEM_USERSPACE_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) $(SYSTEM_USERSP $(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at $(AM_V_at)mv $@.tmp $@ +$(SYSTEM_AFXDP_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) $(SYSTEM_AFXDP_TESTSUITE_AT) $(COMMON_MACROS_AT) + $(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at + $(AM_V_at)mv $@.tmp $@ + $(SYSTEM_OFFLOADS_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) $(SYSTEM_OFFLOADS_TESTSUITE_AT) $(COMMON_MACROS_AT) $(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at $(AM_V_at)mv $@.tmp $@ diff --git a/tests/system-afxdp-macros.at b/tests/system-afxdp-macros.at new file mode 100644 index 000000000000..2c58c2d6554b --- /dev/null +++ b/tests/system-afxdp-macros.at @@ -0,0 +1,153 @@ +# _ADD_BR([name]) +# +# Expands into the proper ovs-vsctl commands to create a bridge with the +# appropriate type and properties +m4_define([_ADD_BR], [[add-br $1 -- set Bridge $1 datapath_type=netdev protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14,OpenFlow15 fail-mode=secure ]]) + +# OVS_TRAFFIC_VSWITCHD_START([vsctl-args], [vsctl-output], [=override]) +# +# Creates a database and starts ovsdb-server, starts ovs-vswitchd +# connected to that database, calls ovs-vsctl to create a bridge named +# br0 with predictable settings, passing 'vsctl-args' as additional +# commands to ovs-vsctl. If 'vsctl-args' causes ovs-vsctl to provide +# output (e.g. because it includes "create" commands) then 'vsctl-output' +# specifies the expected output after filtering through uuidfilt. 
+m4_define([OVS_TRAFFIC_VSWITCHD_START],
+  [
+   export OVS_PKGDATADIR=$(pwd)
+   _OVS_VSWITCHD_START([--disable-system])
+   AT_CHECK([ovs-vsctl -- _ADD_BR([br0]) -- $1 m4_if([$2], [], [], [| uuidfilt])], [0], [$2])
+])
+
+# OVS_TRAFFIC_VSWITCHD_STOP([WHITELIST], [extra_cmds])
+#
+# Gracefully stops ovs-vswitchd and ovsdb-server, checking their log files
+# for messages with severity WARN or higher and signaling an error if any
+# is present. The optional WHITELIST may contain shell-quoted "sed"
+# commands to delete any warnings that are actually expected, e.g.:
+#
+#   OVS_TRAFFIC_VSWITCHD_STOP(["/expected error/d"])
+#
+# 'extra_cmds' are shell commands to be executed after OVS_VSWITCHD_STOP() is
+# invoked. They can be used to perform additional cleanups such as namespace
+# removal.
+m4_define([OVS_TRAFFIC_VSWITCHD_STOP],
+  [OVS_VSWITCHD_STOP([dnl
+$1";/netdev_linux.*obtaining netdev stats via vport failed/d
+/dpif_netlink.*Generic Netlink family 'ovs_datapath' does not exist. The Open vSwitch kernel module is probably not loaded./d
+/dpif_netdev(revalidator.*)|ERR|internal error parsing flow key/d
+/dpif(revalidator.*)|WARN|netdev@ovs-netdev: failed to put/d
+"])
+   AT_CHECK([:; $2])
+  ])
+
+m4_define([ADD_VETH_AFXDP],
+    [ AT_CHECK([ip link add $1 type veth peer name ovs-$1 || return 77])
+      CONFIGURE_AFXDP_VETH_OFFLOADS([$1])
+      AT_CHECK([ip link set $1 netns $2])
+      AT_CHECK([ip link set dev ovs-$1 up])
+      AT_CHECK([ovs-vsctl add-port $3 ovs-$1 -- \
+                set interface ovs-$1 external-ids:iface-id="$1" type="afxdp"])
+      NS_CHECK_EXEC([$2], [ip addr add $4 dev $1 $7])
+      NS_CHECK_EXEC([$2], [ip link set dev $1 up])
+      if test -n "$5"; then
+        NS_CHECK_EXEC([$2], [ip link set dev $1 address $5])
+      fi
+      if test -n "$6"; then
+        NS_CHECK_EXEC([$2], [ip route add default via $6])
+      fi
+      on_exit 'ip link del ovs-$1'
+    ]
+)
+
+# CONFIGURE_AFXDP_VETH_OFFLOADS([VETH])
+#
+# Disable TX offloads and VLAN offloads for veths used in AF_XDP.
+m4_define([CONFIGURE_AFXDP_VETH_OFFLOADS],
+    [AT_CHECK([ethtool -K $1 tx off], [0], [ignore], [ignore])
+     AT_CHECK([ethtool -K $1 rxvlan off], [0], [ignore], [ignore])
+     AT_CHECK([ethtool -K $1 txvlan off], [0], [ignore], [ignore])
+    ]
+)
+
+# CONFIGURE_VETH_OFFLOADS([VETH])
+#
+# Disable TX offloads for veths. The userspace datapath uses the AF_PACKET
+# socket to receive packets for veths. Unfortunately, the AF_PACKET socket
+# doesn't play well with offloads:
+# 1. GSO packets are received without segmentation and therefore discarded.
+# 2. Packets with offloaded partial checksum are received with the wrong
+#    checksum, therefore discarded by the receiver.
+#
+# By disabling tx offloads in the non-OVS side of the veth peer we make sure
+# that the AF_PACKET socket will not receive bad packets.
+#
+# This is a workaround, and should be removed when offloads are properly
+# supported in netdev-linux.
+m4_define([CONFIGURE_VETH_OFFLOADS],
+    [AT_CHECK([ethtool -K $1 tx off], [0], [ignore], [ignore])]
+)
+
+# CHECK_CONNTRACK()
+#
+# Perform requirements checks for running conntrack tests.
+#
+m4_define([CHECK_CONNTRACK],
+    [AT_SKIP_IF([test $HAVE_PYTHON = no])]
+)
+
+# CHECK_CONNTRACK_ALG()
+#
+# Perform requirements checks for running conntrack ALG tests. The userspace
+# datapath supports FTP and TFTP.
+#
+m4_define([CHECK_CONNTRACK_ALG])
+
+# CHECK_CONNTRACK_FRAG()
+#
+# Perform requirements checks for running conntrack fragmentation tests.
+# The userspace doesn't support fragmentation yet, so skip the tests.
+m4_define([CHECK_CONNTRACK_FRAG],
+[
+    AT_SKIP_IF([:])
+])
+
+# CHECK_CONNTRACK_LOCAL_STACK()
+#
+# Perform requirements checks for running conntrack tests with local stack.
+# While the kernel connection tracker automatically passes all the connection
+# tracking state from an internal port to the Open vSwitch kernel module, there
+# is simply no way of doing that with the userspace, so skip the tests.
+m4_define([CHECK_CONNTRACK_LOCAL_STACK],
+[
+    AT_SKIP_IF([:])
+])
+
+# CHECK_CONNTRACK_NAT()
+#
+# Perform requirements checks for running conntrack NAT tests. The userspace
+# datapath supports NAT.
+#
+m4_define([CHECK_CONNTRACK_NAT])
+
+# CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE()
+#
+# Perform requirements checks for running ovs-dpctl flush-conntrack by
+# conntrack 5-tuple test. The userspace datapath does not support
+# this feature yet.
+m4_define([CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE],
+[
+    AT_SKIP_IF([:])
+])
+
+# CHECK_CT_DPIF_SET_GET_MAXCONNS()
+#
+# Perform requirements checks for running ovs-dpctl ct-set-maxconns or
+# ovs-dpctl ct-get-maxconns. The userspace datapath does support this feature.
+m4_define([CHECK_CT_DPIF_SET_GET_MAXCONNS])
+
+# CHECK_CT_DPIF_GET_NCONNS()
+#
+# Perform requirements checks for running ovs-dpctl ct-get-nconns. The
+# userspace datapath does support this feature.
+m4_define([CHECK_CT_DPIF_GET_NCONNS])
diff --git a/tests/system-afxdp-testsuite.at b/tests/system-afxdp-testsuite.at
new file mode 100644
index 000000000000..538c0d15d556
--- /dev/null
+++ b/tests/system-afxdp-testsuite.at
@@ -0,0 +1,26 @@
+AT_INIT
+
+AT_COPYRIGHT([Copyright (c) 2018 Nicira, Inc.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at:
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.])
+
+m4_ifdef([AT_COLOR_TESTS], [AT_COLOR_TESTS])
+
+m4_include([tests/ovs-macros.at])
+m4_include([tests/ovsdb-macros.at])
+m4_include([tests/ofproto-macros.at])
+m4_include([tests/system-afxdp-macros.at])
+m4_include([tests/system-common-macros.at])
+
+m4_include([tests/system-afxdp-traffic.at])
+m4_include([tests/system-ovn.at])
diff --git a/tests/system-afxdp-traffic.at b/tests/system-afxdp-traffic.at
new file mode 100644
index 000000000000..26f72acf48ef
--- /dev/null
+++ b/tests/system-afxdp-traffic.at
@@ -0,0 +1,978 @@
+AT_BANNER([AF_XDP netdev datapath-sanity])
+
+AT_SETUP([datapath - ping between two ports])
+OVS_TRAFFIC_VSWITCHD_START()
+
+ulimit -l unlimited
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping between two ports on vlan])
+OVS_TRAFFIC_VSWITCHD_START()
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+ADD_VLAN(p0, at_ns0, 100, "10.2.2.1/24")
+ADD_VLAN(p1, at_ns1, 100, "10.2.2.2/24")
+
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.2.2.2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping6 between two ports])
+OVS_TRAFFIC_VSWITCHD_START()
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96")
+
+dnl Linux seems to take a little time to get its IPv6 stack in order. Without
+dnl waiting, we get occasional failures due to the following error:
+dnl "connect: Cannot assign requested address"
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2])
+
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 6 fc00::2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping6 between two ports on vlan])
+OVS_TRAFFIC_VSWITCHD_START()
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96")
+
+ADD_VLAN(p0, at_ns0, 100, "fc00:1::1/96")
+ADD_VLAN(p1, at_ns1, 100, "fc00:1::2/96")
+
+dnl Linux seems to take a little time to get its IPv6 stack in order. Without
+dnl waiting, we get occasional failures due to the following error:
+dnl "connect: Cannot assign requested address"
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00:1::2])
+
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:1::2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping6 -s 1600 -q -c 3 -i 0.3 -w 2 fc00:1::2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping6 -s 3200 -q -c 3 -i 0.3 -w 2 fc00:1::2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over vxlan tunnel])
+OVS_CHECK_VXLAN()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
+AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
+AT_CHECK([ip link set dev br-underlay up])
+
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with a native
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL([vxlan], [br0], [at_vxlan0], [172.31.1.1], [10.1.1.100/24])
+ADD_NATIVE_TUNNEL([vxlan], [at_vxlan1], [at_ns0], [172.31.1.100], [10.1.1.1/24],
+                  [id 0 dstport 4789])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0], [OK
+])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over vxlan6 tunnel])
+OVS_CHECK_VXLAN_UDP6ZEROCSUM()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00::1/64", [], [], "nodad")
+AT_CHECK([ip addr add dev br-underlay "fc00::100/64" nodad])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with a native
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL6([vxlan], [br0], [at_vxlan0], [fc00::1], [10.1.1.100/24])
+ADD_NATIVE_TUNNEL6([vxlan], [at_vxlan1], [at_ns0], [fc00::100], [10.1.1.1/24],
+                   [id 0 dstport 4789 udp6zerocsumtx udp6zerocsumrx])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add fc00::100/64 br-underlay], [0], [OK
+])
+
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::100])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over gre tunnel])
+OVS_CHECK_GRE()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24") +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"]) +AT_CHECK([ip link set dev br-underlay up]) + +dnl Set up tunnel endpoints on OVS outside the namespace and with a native +dnl linux device inside the namespace. +ADD_OVS_TUNNEL([gre], [br0], [at_gre0], [172.31.1.1], [10.1.1.100/24]) +ADD_NATIVE_TUNNEL([gretap], [ns_gre0], [at_ns0], [172.31.1.100], [10.1.1.1/24]) + +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK +]) +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0], [OK +]) + +dnl First, check the underlay +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +dnl Okay, now check the overlay with different packet sizes +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - ping over erspan v1 tunnel]) +OVS_CHECK_GRE() +OVS_CHECK_ERSPAN() + +OVS_TRAFFIC_VSWITCHD_START() +ADD_BR([br-underlay]) + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"]) + +ADD_NAMESPACES(at_ns0) + +dnl Set up underlay link from host into the namespace using veth pair. +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24") +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"]) +AT_CHECK([ip link set dev br-underlay up]) + +dnl Set up tunnel endpoints on OVS outside the namespace and with a native +dnl linux device inside the namespace. 
+ADD_OVS_TUNNEL([erspan], [br0], [at_erspan0], [172.31.1.1], [10.1.1.100/24], [options:key=1 options:erspan_ver=1 options:erspan_idx=7]) +ADD_NATIVE_TUNNEL([erspan], [ns_erspan0], [at_ns0], [172.31.1.100], [10.1.1.1/24], [seq key 1 erspan_ver 1 erspan 7]) + +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK +]) +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0], [OK +]) + +dnl First, check the underlay +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +dnl Okay, now check the overlay with different packet sizes +dnl NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl +NS_CHECK_EXEC([at_ns0], [ping -s 1200 -i 0.3 -c 3 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - ping over erspan v2 tunnel]) +OVS_CHECK_GRE() +OVS_CHECK_ERSPAN() + +OVS_TRAFFIC_VSWITCHD_START() +ADD_BR([br-underlay]) + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"]) + +ADD_NAMESPACES(at_ns0) + +dnl Set up underlay link from host into the namespace using veth pair. +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24") +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"]) +AT_CHECK([ip link set dev br-underlay up]) + +dnl Set up tunnel endpoints on OVS outside the namespace and with a native +dnl linux device inside the namespace. 
+ADD_OVS_TUNNEL([erspan], [br0], [at_erspan0], [172.31.1.1], [10.1.1.100/24], [options:key=1 options:erspan_ver=2 options:erspan_dir=1 options:erspan_hwid=0x7]) +ADD_NATIVE_TUNNEL([erspan], [ns_erspan0], [at_ns0], [172.31.1.100], [10.1.1.1/24], [seq key 1 erspan_ver 2 erspan_dir egress erspan_hwid 7]) + +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK +]) +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0], [OK +]) + +dnl First, check the underlay +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +dnl Okay, now check the overlay with different packet sizes +dnl NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl +NS_CHECK_EXEC([at_ns0], [ping -s 1200 -i 0.3 -c 3 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - ping over ip6erspan v1 tunnel]) +OVS_CHECK_GRE() +OVS_CHECK_ERSPAN() + +OVS_TRAFFIC_VSWITCHD_START() +ADD_BR([br-underlay]) + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"]) + +ADD_NAMESPACES(at_ns0) + +dnl Set up underlay link from host into the namespace using veth pair. +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00:100::1/96", [], [], nodad) +AT_CHECK([ip addr add dev br-underlay "fc00:100::100/96" nodad]) +AT_CHECK([ip link set dev br-underlay up]) + +dnl Set up tunnel endpoints on OVS outside the namespace and with a native +dnl linux device inside the namespace. 
+ADD_OVS_TUNNEL6([ip6erspan], [br0], [at_erspan0], [fc00:100::1], [10.1.1.100/24], + [options:key=123 options:erspan_ver=1 options:erspan_idx=0x7]) +ADD_NATIVE_TUNNEL6([ip6erspan], [ns_erspan0], [at_ns0], [fc00:100::100], + [10.1.1.1/24], [local fc00:100::1 seq key 123 erspan_ver 1 erspan 7]) + +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK +]) +AT_CHECK([ovs-appctl ovs/route/add fc00:100::1/96 br-underlay], [0], [OK +]) + +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 2 fc00:100::100]) + +dnl First, check the underlay +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:100::100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +dnl Okay, now check the overlay with different packet sizes +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - ping over ip6erspan v2 tunnel]) +OVS_CHECK_GRE() +OVS_CHECK_ERSPAN() + +OVS_TRAFFIC_VSWITCHD_START() +ADD_BR([br-underlay]) + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"]) + +ADD_NAMESPACES(at_ns0) + +dnl Set up underlay link from host into the namespace using veth pair. +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00:100::1/96", [], [], nodad) +AT_CHECK([ip addr add dev br-underlay "fc00:100::100/96" nodad]) +AT_CHECK([ip link set dev br-underlay up]) + +dnl Set up tunnel endpoints on OVS outside the namespace and with a native +dnl linux device inside the namespace. 
+ADD_OVS_TUNNEL6([ip6erspan], [br0], [at_erspan0], [fc00:100::1], [10.1.1.100/24], + [options:key=121 options:erspan_ver=2 options:erspan_dir=0 options:erspan_hwid=0x7]) +ADD_NATIVE_TUNNEL6([ip6erspan], [ns_erspan0], [at_ns0], [fc00:100::100], + [10.1.1.1/24], + [local fc00:100::1 seq key 121 erspan_ver 2 erspan_dir ingress erspan_hwid 0x7]) + +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK +]) +AT_CHECK([ovs-appctl ovs/route/add fc00:100::1/96 br-underlay], [0], [OK +]) + +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 2 fc00:100::100]) + +dnl First, check the underlay +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:100::100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +dnl Okay, now check the overlay with different packet sizes +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - ping over geneve tunnel]) +OVS_CHECK_GENEVE() + +OVS_TRAFFIC_VSWITCHD_START() +ADD_BR([br-underlay]) + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"]) + +ADD_NAMESPACES(at_ns0) + +dnl Set up underlay link from host into the namespace using veth pair. +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24") +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"]) +AT_CHECK([ip link set dev br-underlay up]) + +dnl Set up tunnel endpoints on OVS outside the namespace and with a native +dnl linux device inside the namespace. 
+ADD_OVS_TUNNEL([geneve], [br0], [at_gnv0], [172.31.1.1], [10.1.1.100/24]) +ADD_NATIVE_TUNNEL([geneve], [ns_gnv0], [at_ns0], [172.31.1.100], [10.1.1.1/24], + [vni 0]) + +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK +]) +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.100/24 br-underlay], [0], [OK +]) + +dnl First, check the underlay +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +dnl Okay, now check the overlay with different packet sizes +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - ping over geneve6 tunnel]) +OVS_CHECK_GENEVE_UDP6ZEROCSUM() + +OVS_TRAFFIC_VSWITCHD_START() +ADD_BR([br-underlay]) + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"]) + +ADD_NAMESPACES(at_ns0) + +dnl Set up underlay link from host into the namespace using veth pair. +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00::1/64", [], [], "nodad") +AT_CHECK([ip addr add dev br-underlay "fc00::100/64" nodad]) +AT_CHECK([ip link set dev br-underlay up]) + +dnl Set up tunnel endpoints on OVS outside the namespace and with a native +dnl linux device inside the namespace. 
+ADD_OVS_TUNNEL6([geneve], [br0], [at_gnv0], [fc00::1], [10.1.1.100/24]) +ADD_NATIVE_TUNNEL6([geneve], [ns_gnv0], [at_ns0], [fc00::100], [10.1.1.1/24], + [vni 0 udp6zerocsumtx udp6zerocsumrx]) + +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK +]) +AT_CHECK([ovs-appctl ovs/route/add fc00::100/64 br-underlay], [0], [OK +]) + +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::100]) + +dnl First, check the underlay +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +dnl Okay, now check the overlay with different packet sizes +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - clone action]) +OVS_TRAFFIC_VSWITCHD_START() + +ADD_NAMESPACES(at_ns0, at_ns1, at_ns2) + +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24") + +AT_CHECK([ovs-vsctl -- set interface ovs-p0 ofport_request=1 \ + -- set interface ovs-p1 ofport_request=2]) + +AT_DATA([flows.txt], [dnl +priority=1 actions=NORMAL +priority=10 in_port=1,ip,actions=clone(mod_dl_dst(50:54:00:00:00:0a),set_field:192.168.3.3->ip_dst), output:2 +priority=10 in_port=2,ip,actions=clone(mod_dl_src(ae:c6:7e:54:8d:4d),mod_dl_dst(50:54:00:00:00:0b),set_field:192.168.4.4->ip_dst, controller), output:1 +]) +AT_CHECK([ovs-ofctl add-flows br0 flows.txt]) + +AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir --pidfile 2> ofctl_monitor.log]) +NS_CHECK_EXEC([at_ns0], [ping -q 
-c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +AT_CHECK([cat ofctl_monitor.log | STRIP_MONITOR_CSUM], [0], [dnl +icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0 icmp_csum: <skip> +icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0 icmp_csum: <skip> +icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0 icmp_csum: <skip> +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - basic truncate action]) +AT_SKIP_IF([test $HAVE_NC = no]) +OVS_TRAFFIC_VSWITCHD_START() +AT_CHECK([ovs-ofctl del-flows br0]) + +dnl Create p0 and ovs-p0(1) +ADD_NAMESPACES(at_ns0) +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24") +NS_CHECK_EXEC([at_ns0], [ip link set dev p0 address e6:66:c1:11:11:11]) +NS_CHECK_EXEC([at_ns0], [arp -s 10.1.1.2 e6:66:c1:22:22:22]) + +dnl Create p1(3) and ovs-p1(2), packets received from ovs-p1 will appear in p1 +AT_CHECK([ip link add p1 type veth peer name ovs-p1]) +on_exit 'ip link del ovs-p1' +AT_CHECK([ip link set dev ovs-p1 up]) +AT_CHECK([ip link set dev p1 up]) +AT_CHECK([ovs-vsctl add-port br0 ovs-p1 -- set interface ovs-p1 ofport_request=2]) +dnl Use p1 to check the truncated packet +AT_CHECK([ovs-vsctl add-port br0 p1 -- set interface p1 ofport_request=3]) + +dnl Create p2(5) and ovs-p2(4) +AT_CHECK([ip link add p2 type veth peer name ovs-p2]) +on_exit 'ip link del ovs-p2' +AT_CHECK([ip link set dev ovs-p2 up]) +AT_CHECK([ip link set dev p2 up]) +AT_CHECK([ovs-vsctl add-port br0 ovs-p2 -- set interface ovs-p2 ofport_request=4]) +dnl Use p2 to check the truncated packet +AT_CHECK([ovs-vsctl add-port br0 p2 -- set interface p2 ofport_request=5]) + +dnl basic 
test +AT_CHECK([ovs-ofctl del-flows br0]) +AT_DATA([flows.txt], [dnl +in_port=3 dl_dst=e6:66:c1:22:22:22 actions=drop +in_port=5 dl_dst=e6:66:c1:22:22:22 actions=drop +in_port=1 dl_dst=e6:66:c1:22:22:22 actions=output(port=2,max_len=100),output:4 +]) +AT_CHECK([ovs-ofctl add-flows br0 flows.txt]) + +dnl use this file as payload file for ncat +AT_CHECK([dd if=/dev/urandom of=payload200.bin bs=200 count=1 2> /dev/null]) +on_exit 'rm -f payload200.bin' +NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 < payload200.bin]) + +dnl packet with truncated size +AT_CHECK([ovs-appctl revalidator/purge], [0]) +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl +n_bytes=100 +]) +dnl packet with original size +AT_CHECK([ovs-appctl revalidator/purge], [0]) +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl +n_bytes=242 +]) + +dnl more complicated output actions +AT_CHECK([ovs-ofctl del-flows br0]) +AT_DATA([flows.txt], [dnl +in_port=3 dl_dst=e6:66:c1:22:22:22 actions=drop +in_port=5 dl_dst=e6:66:c1:22:22:22 actions=drop +in_port=1 dl_dst=e6:66:c1:22:22:22 actions=output(port=2,max_len=100),output:4,output(port=2,max_len=100),output(port=4,max_len=100),output:2,output(port=4,max_len=200),output(port=2,max_len=65535) +]) +AT_CHECK([ovs-ofctl add-flows br0 flows.txt]) + +NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 < payload200.bin]) + +dnl 100 + 100 + 242 + min(65535,242) = 684 +AT_CHECK([ovs-appctl revalidator/purge], [0]) +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl +n_bytes=684 +]) +dnl 242 + 100 + min(242,200) = 542 +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl +n_bytes=542 +]) + +dnl SLOW_ACTION: disable kernel datapath truncate support +dnl Repeat the test above, but exercise the 
SLOW_ACTION code path +AT_CHECK([ovs-appctl dpif/set-dp-features br0 trunc false], [0]) + +dnl SLOW_ACTION test1: check datapath actions +AT_CHECK([ovs-ofctl del-flows br0]) +AT_CHECK([ovs-ofctl add-flows br0 flows.txt]) + +AT_CHECK([ovs-appctl ofproto/trace br0 "in_port=1,dl_type=0x800,dl_src=e6:66:c1:11:11:11,dl_dst=e6:66:c1:22:22:22,nw_src=192.168.0.1,nw_dst=192.168.0.2,nw_proto=6,tp_src=8,tp_dst=9"], [0], [stdout]) +AT_CHECK([tail -3 stdout], [0], +[Datapath actions: trunc(100),3,5,trunc(100),3,trunc(100),5,3,trunc(200),5,trunc(65535),3 +This flow is handled by the userspace slow path because it: + - Uses action(s) not supported by datapath. +]) + +dnl SLOW_ACTION test2: check actual packet truncate +AT_CHECK([ovs-ofctl del-flows br0]) +AT_CHECK([ovs-ofctl add-flows br0 flows.txt]) +NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 < payload200.bin]) + +dnl 100 + 100 + 242 + min(65535,242) = 684 +AT_CHECK([ovs-appctl revalidator/purge], [0]) +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl +n_bytes=684 +]) + +dnl 242 + 100 + min(242,200) = 542 +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl +n_bytes=542 +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + + +AT_BANNER([conntrack]) + +AT_SETUP([conntrack - controller]) +CHECK_CONNTRACK() +OVS_TRAFFIC_VSWITCHD_START() +AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg ofproto_dpif_upcall:dbg]) + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24") + +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0. 
+AT_DATA([flows.txt], [dnl +priority=1,action=drop +priority=10,arp,action=normal +priority=100,in_port=1,udp,action=ct(commit),controller +priority=100,in_port=2,ct_state=-trk,udp,action=ct(table=0) +priority=100,in_port=2,ct_state=+trk+est,udp,action=controller +]) + +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt]) + +AT_CAPTURE_FILE([ofctl_monitor.log]) +AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir --pidfile 2> ofctl_monitor.log]) + +dnl Send an unsolicited reply from port 2. This should be dropped. +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 ct\(table=0\) '50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000']) + +dnl OK, now start a new connection from port 1. +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 1 ct\(commit\),controller '50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000']) + +dnl Now try a reply from port 2. +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 ct\(table=0\) '50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000']) + +dnl Check this output. We only see the latter two packets, not the first. 
+AT_CHECK([cat ofctl_monitor.log], [0], [dnl +NXT_PACKET_IN2 (xid=0x0): total_len=42 in_port=1 (via action) data_len=42 (unbuffered) +udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=1,tp_dst=2 udp_csum:0 +NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=42 ct_state=est|rpl|trk,ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=17,ct_tp_src=1,ct_tp_dst=2,ip,in_port=2 (via action) data_len=42 (unbuffered) +udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=2,tp_dst=1 udp_csum:0 +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([conntrack - force commit]) +CHECK_CONNTRACK() +OVS_TRAFFIC_VSWITCHD_START() +AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg ofproto_dpif_upcall:dbg]) + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24") + +AT_DATA([flows.txt], [dnl +priority=1,action=drop +priority=10,arp,action=normal +priority=100,in_port=1,udp,action=ct(force,commit),controller +priority=100,in_port=2,ct_state=-trk,udp,action=ct(table=0) +priority=100,in_port=2,ct_state=+trk+est,udp,action=ct(force,commit,table=1) +table=1,in_port=2,ct_state=+trk,udp,action=controller +]) + +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt]) + +AT_CAPTURE_FILE([ofctl_monitor.log]) +AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir --pidfile 2> ofctl_monitor.log]) + +dnl Send an unsolicited reply from port 2. This should be dropped. +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000 actions=resubmit(,0)"]) + +dnl OK, now start a new connection from port 1. 
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000 actions=resubmit(,0)"]) + +dnl Now try a reply from port 2. +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000 actions=resubmit(,0)"]) + +AT_CHECK([ovs-appctl revalidator/purge], [0]) + +dnl Check this output. We only see the latter two packets, not the first. +AT_CHECK([cat ofctl_monitor.log], [0], [dnl +NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=42 in_port=1 (via action) data_len=42 (unbuffered) +udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=1,tp_dst=2 udp_csum:0 +NXT_PACKET_IN2 (xid=0x0): table_id=1 cookie=0x0 total_len=42 ct_state=new|trk,ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=17,ct_tp_src=2,ct_tp_dst=1,ip,in_port=2 (via action) data_len=42 (unbuffered) +udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=2,tp_dst=1 udp_csum:0 +]) + +dnl +dnl Check that the directionality has been changed by force commit. 
+dnl +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.2,"], [], [dnl +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2) +]) + +dnl OK, now send another packet from port 1 and see that it switches again +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000 actions=resubmit(,0)"]) +AT_CHECK([ovs-appctl revalidator/purge], [0]) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.1,"], [], [dnl +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1) +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([conntrack - ct flush by 5-tuple]) +CHECK_CONNTRACK() +OVS_TRAFFIC_VSWITCHD_START() + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24") + +AT_DATA([flows.txt], [dnl +priority=1,action=drop +priority=10,arp,action=normal +priority=100,in_port=1,udp,action=ct(commit),2 +priority=100,in_port=2,udp,action=ct(zone=5,commit),1 +priority=100,in_port=1,icmp,action=ct(commit),2 +priority=100,in_port=2,icmp,action=ct(zone=5,commit),1 +]) + +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt]) + +dnl Test UDP from port 1 +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000 actions=resubmit(,0)"]) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.1,"], [], [dnl +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1) +]) + +AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=17,ct_tp_src=2,ct_tp_dst=1']) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.1,"], [1], [dnl +]) + +dnl Test UDP from port 2 +AT_CHECK([ovs-ofctl -O 
OpenFlow13 packet-out br0 "in_port=2 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000 actions=resubmit(,0)"]) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.2,"], [0], [dnl +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5 +]) + +AT_CHECK([ovs-appctl dpctl/flush-conntrack zone=5 'ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=17,ct_tp_src=1,ct_tp_dst=2']) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl +]) + +dnl Test ICMP traffic +NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.2,"], [0], [stdout]) +AT_CHECK([cat stdout | FORMAT_CT(10.1.1.1)], [0],[dnl +icmp,orig=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=8,code=0),reply=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=0,code=0),zone=5 +]) + +ICMP_ID=`cat stdout | cut -d ',' -f4 | cut -d '=' -f2` +ICMP_TUPLE=ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=1,icmp_id=$ICMP_ID,icmp_type=8,icmp_code=0 +AT_CHECK([ovs-appctl dpctl/flush-conntrack zone=5 $ICMP_TUPLE]) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.2,"], [1], [dnl +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([conntrack - IPv4 ping]) +CHECK_CONNTRACK() +OVS_TRAFFIC_VSWITCHD_START() + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24") + +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0. 
+AT_DATA([flows.txt], [dnl +priority=1,action=drop +priority=10,arp,action=normal +priority=100,in_port=1,icmp,action=ct(commit),2 +priority=100,in_port=2,icmp,ct_state=-trk,action=ct(table=0) +priority=100,in_port=2,icmp,ct_state=+trk+est,action=1 +]) + +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt]) + +dnl Pings from ns0->ns1 should work fine. +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl +icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=0,code=0) +]) + +AT_CHECK([ovs-appctl dpctl/flush-conntrack]) + +dnl Pings from ns1->ns0 should fail. +NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 | FORMAT_PING], [0], [dnl +7 packets transmitted, 0 received, 100% packet loss, time 0ms +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([conntrack - get_nconns and get/set_maxconns]) +CHECK_CONNTRACK() +CHECK_CT_DPIF_SET_GET_MAXCONNS() +CHECK_CT_DPIF_GET_NCONNS() +OVS_TRAFFIC_VSWITCHD_START() + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24") + +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0. +AT_DATA([flows.txt], [dnl +priority=1,action=drop +priority=10,arp,action=normal +priority=100,in_port=1,icmp,action=ct(commit),2 +priority=100,in_port=2,icmp,ct_state=-trk,action=ct(table=0) +priority=100,in_port=2,icmp,ct_state=+trk+est,action=1 +]) + +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt]) + +dnl Pings from ns0->ns1 should work fine. 
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl +icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=0,code=0) +]) + +AT_CHECK([ovs-appctl dpctl/ct-set-maxconns one-bad-dp], [2], [], [dnl +ovs-vswitchd: maxconns missing or malformed (Invalid argument) +ovs-appctl: ovs-vswitchd: server returned an error +]) + +AT_CHECK([ovs-appctl dpctl/ct-set-maxconns a], [2], [], [dnl +ovs-vswitchd: maxconns missing or malformed (Invalid argument) +ovs-appctl: ovs-vswitchd: server returned an error +]) + +AT_CHECK([ovs-appctl dpctl/ct-set-maxconns one-bad-dp 10], [2], [], [dnl +ovs-vswitchd: datapath not found (Invalid argument) +ovs-appctl: ovs-vswitchd: server returned an error +]) + +AT_CHECK([ovs-appctl dpctl/ct-get-maxconns one-bad-dp], [2], [], [dnl +ovs-vswitchd: datapath not found (Invalid argument) +ovs-appctl: ovs-vswitchd: server returned an error +]) + +AT_CHECK([ovs-appctl dpctl/ct-get-nconns one-bad-dp], [2], [], [dnl +ovs-vswitchd: datapath not found (Invalid argument) +ovs-appctl: ovs-vswitchd: server returned an error +]) + +AT_CHECK([ovs-appctl dpctl/ct-get-nconns], [], [dnl +1 +]) + +AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl +3000000 +]) + +AT_CHECK([ovs-appctl dpctl/ct-set-maxconns 10], [], [dnl +setting maxconns successful +]) + +AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl +10 +]) + +AT_CHECK([ovs-appctl dpctl/flush-conntrack]) + +AT_CHECK([ovs-appctl dpctl/ct-get-nconns], [], [dnl +0 +]) + +AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl +10 +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([conntrack - IPv6 ping]) +CHECK_CONNTRACK() +OVS_TRAFFIC_VSWITCHD_START() + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96") +ADD_VETH_AFXDP(p1, at_ns1, br0, 
"fc00::2/96") + +AT_DATA([flows.txt], [dnl + +dnl ICMPv6 echo request and reply go to table 1. The rest of the traffic goes +dnl through normal action. +table=0,priority=10,icmp6,icmp_type=128,action=goto_table:1 +table=0,priority=10,icmp6,icmp_type=129,action=goto_table:1 +table=0,priority=1,action=normal + +dnl Allow everything from ns0->ns1. Only allow return traffic from ns1->ns0. +table=1,priority=100,in_port=1,icmp6,action=ct(commit),2 +table=1,priority=100,in_port=2,icmp6,ct_state=-trk,action=ct(table=0) +table=1,priority=100,in_port=2,icmp6,ct_state=+trk+est,action=1 +table=1,priority=1,action=drop +]) + +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt]) + +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2]) + +dnl The above ping creates state in the connection tracker. We're not +dnl interested in that state. +AT_CHECK([ovs-appctl dpctl/flush-conntrack]) + +dnl Pings from ns1->ns0 should fail. +NS_CHECK_EXEC([at_ns1], [ping6 -q -c 3 -i 0.3 -w 2 fc00::1 | FORMAT_PING], [0], [dnl +7 packets transmitted, 0 received, 100% packet loss, time 0ms +]) + +dnl Pings from ns0->ns1 should work fine. +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::2 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fc00::2)], [0], [dnl +icmpv6,orig=(src=fc00::1,dst=fc00::2,id=<cleared>,type=128,code=0),reply=(src=fc00::2,dst=fc00::1,id=<cleared>,type=129,code=0) +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP -- 2.7.4
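Note on the "basic truncate action" test: the expected n_bytes values (242, 684, 542) follow from the dnl arithmetic comments in that test. The sketch below is not part of the patch; it only re-derives those numbers, assuming the 200-byte UDP payload travels in an untagged Ethernet frame with an IPv4 header without options (14 + 20 + 8 bytes of headers). The helper name expected_n_bytes and the (port, max_len) tuple encoding are hypothetical, chosen for illustration.

```python
# Re-derive the n_bytes values expected by the truncate test.
# Assumption: untagged Ethernet (14) + IPv4 without options (20) + UDP (8).
ETH_HDR, IPV4_HDR, UDP_HDR = 14, 20, 8
PAYLOAD = 200  # size of payload200.bin in the test
FRAME_LEN = ETH_HDR + IPV4_HDR + UDP_HDR + PAYLOAD

# The "more complicated output actions" flow, as (port, max_len) pairs.
# max_len=None stands for a plain output:N with no truncation.
ACTIONS = [
    (2, 100), (4, None), (2, 100), (4, 100),
    (2, None), (4, 200), (2, 65535),
]


def expected_n_bytes(actions, port, frame_len=FRAME_LEN):
    """Sum the bytes sent to `port`; each copy is capped at its max_len."""
    total = 0
    for out_port, max_len in actions:
        if out_port != port:
            continue
        total += frame_len if max_len is None else min(frame_len, max_len)
    return total


# Port 2 (ovs-p1) feeds p1, which the in_port=3 drop flow counts.
port2_bytes = expected_n_bytes(ACTIONS, 2)
# Port 4 (ovs-p2) feeds p2, which the in_port=5 drop flow counts.
port4_bytes = expected_n_bytes(ACTIONS, 4)

print(FRAME_LEN, port2_bytes, port4_bytes)
```

Under these assumptions the three values come out as the test's 242, 684 and 542. A different encapsulation on the veth pair (for example a VLAN tag) would change FRAME_LEN and therefore the untruncated terms.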