Hi William,
First a list of issues I found during some basic testing...
- When I restart or stop OVS (using the systemctl interface as found in
RHEL), it does not clean up the BPF program, causing the restart to fail:
2019-05-10T09:12:11.384Z|00042|netdev_afxdp|ERR|AF_XDP device eno1 reconfig fails
2019-05-10T09:12:11.384Z|00043|dpif_netdev|ERR|Failed to set interface eno1 new configuration
I need to manually run "ip link set dev eno1 xdp off" to make it
recover.
- When I remove a bridge, I get an EMER log message from the revalidator:
2019-05-10T09:40:34.401Z|00045|netdev_afxdp|INFO|remove xdp program
2019-05-10T09:40:34.652Z|00001|util(revalidator49)|EMER|lib/poll-loop.c:111: assertion !fd != !wevent failed in poll_create_node()
Easy to replicate with this:
$ ovs-vsctl add-br ovs_pvp_br0 -- set bridge ovs_pvp_br0 \
    datapath_type=netdev
$ ovs-vsctl add-port ovs_pvp_br0 eno1 -- set interface eno1 \
    type="afxdp" options:xdpmode=drv
$ ovs-vsctl del-br ovs_pvp_br0
- High PMD usage in the statistics, even with no packets. Is this
expected?
$ ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 0 core_id 1:
isolated : false
port: dpdk0 queue-id: 0 pmd usage: 0 %
port: eno1 queue-id: 0 pmd usage: 49 %
It goes up slowly and gets stuck at 49%
- When doing the PVP testing I noticed that the physical port has odd/no
tx statistics:
$ ovs-ofctl dump-ports ovs_pvp_br0
OFPST_PORT reply (xid=0x2): 3 ports
port LOCAL: rx pkts=0, bytes=0, drop=0, errs=0, frame=0, over=0, crc=0
            tx pkts=0, bytes=0, drop=0, errs=0, coll=0
port eno1: rx pkts=103256197, bytes=6195630508, drop=0, errs=0, frame=0, over=0, crc=0
            tx pkts=0, bytes=19789272440056, drop=0, errs=0, coll=0
port tapVM: rx pkts=4043, bytes=501278, drop=0, errs=0, frame=0, over=0, crc=0
            tx pkts=4058, bytes=502504, drop=0, errs=0, coll=0
- Packets larger than 1028 bytes are dropped. I guess this needs to be
fixed, or we need to state that jumbo frames are not supported. Are you
planning on adding this?
Currently I can find no mention of an MTU limitation in the
documentation, nor any code that prevents the MTU from being set above
the supported limit.
- ovs-vswitchd still crashes or stops forwarding packets when doing PVP
testing with QEMU that has a TAP interface doing XDP, while running
packets at wire speed to the 10G interface.
At lower packet rates it seems to work: at a 1% traffic rate
(148,771 pps) it forwards packets without any problems. If I go to 10%,
the first couple of packets pass, then it stops forwarding. When it does
not crash, I still see packets being received by the eno1 flow rules,
but no packets make it to the VM.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00000000009b2505 in netdev_linux_afxdp_batch_send (xsk=0x0,
batch=batch@entry=0x7fc928005570) at lib/netdev-afxdp.c:654
654 ret = umem_elem_pop_n(&xsk->umem->mpool, batch->count,
(void **)elems_pop);
[Current thread is 1 (Thread 0x7fc95e734700 (LWP 3926))]
Missing separate debuginfos, use: dnf debuginfo-install
openvswitch2.11-2.11.0-0.20190129gitd3a10db.el8fdb.x86_64
(gdb) bt
#0 0x00000000009b2505 in netdev_linux_afxdp_batch_send (xsk=0x0,
batch=batch@entry=0x7fc928005570) at lib/netdev-afxdp.c:654
#1 0x00000000009a1850 in netdev_linux_send (netdev_=0x2f7f540,
qid=<optimized out>, batch=0x7fc928005570, concurrent_txq=<optimized
out>) at lib/netdev-linux.c:1486
#2 0x0000000000906051 in netdev_send (netdev=<optimized out>,
qid=qid@entry=0, batch=batch@entry=0x7fc928005570,
concurrent_txq=concurrent_txq@entry=true)
at lib/netdev.c:797
#3 0x00000000008d2c94 in dp_netdev_pmd_flush_output_on_port
(pmd=pmd@entry=0x7fc95e735010, p=p@entry=0x7fc928005540) at
lib/dpif-netdev.c:4185
#4 0x00000000008d2faf in dp_netdev_pmd_flush_output_packets
(pmd=pmd@entry=0x7fc95e735010, force=force@entry=false) at
lib/dpif-netdev.c:4225
#5 0x00000000008db317 in dp_netdev_pmd_flush_output_packets
(force=false, pmd=0x7fc95e735010) at lib/dpif-netdev.c:4280
#6 dp_netdev_process_rxq_port (pmd=pmd@entry=0x7fc95e735010,
rxq=0x2f36c50, port_no=1) at lib/dpif-netdev.c:4280
#7 0x00000000008db67d in pmd_thread_main (f_=<optimized out>) at
lib/dpif-netdev.c:5446
#8 0x000000000095c96d in ovsthread_wrapper (aux_=<optimized out>)
at lib/ovs-thread.c:352
#9 0x00007fc9789d62de in start_thread () from
/lib64/libpthread.so.0
#10 0x00007fc97817ba63 in clone () from /lib64/libc.so.6
- make check-afxdp is failing for me; however, make check-kernel works
fine.
I did not dive into it too much, but it fails here for all test cases.
This is the same build I use for testing.
./system-afxdp-traffic.at:4: ovs-vsctl -- add-br br0 -- set Bridge
br0 datapath_type=netdev
protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14,OpenFlow15
fail-mode=secure --
--- /dev/null 2019-05-16 09:09:33.445562692 -0400
+++
/root/home/OVS_master_DPDK_v18.11/ovs_github/tests/system-afxdp-testsuite.dir/at-groups/1/stderr 2019-05-17
05:46:20.506814939 -0400
@@ -0,0 +1,2 @@
+ovs-vsctl: Error detected while setting up 'br0'. See ovs-vswitchd
log for details.
+ovs-vsctl: The default log directory is
"/root/home/OVS_master_DPDK_v18.11/ovs_github/tests/system-afxdp-testsuite.dir/01".
ovsdb-server.log:
> 2019-05-17T09:46:20.437Z|00001|vlog|INFO|opened log file
/root/home/OVS_master_DPDK_v18.11/ovs_github/tests/system-afxdp-testsuite.dir/01/ovsdb-server.log
> 2019-05-17T09:46:20.441Z|00002|ovsdb_server|INFO|ovsdb-server (Open
vSwitch) 2.11.90
ovs-vswitchd.log:
> 2019-05-17T09:46:20.461Z|00001|vlog|INFO|opened log file
/root/home/OVS_master_DPDK_v18.11/ovs_github/tests/system-afxdp-testsuite.dir/01/ovs-vswitchd.log
> 2019-05-17T09:46:20.462Z|00002|ovs_numa|INFO|Discovered 28 CPU cores
on NUMA node 0
> 2019-05-17T09:46:20.462Z|00003|ovs_numa|INFO|Discovered 1 NUMA nodes
and 28 CPU cores
>
2019-05-17T09:46:20.462Z|00004|reconnect|INFO|unix:/root/home/OVS_master_DPDK_v18.11/ovs_github/tests/system-afxdp-testsuite.dir/01/db.sock:
connecting...
>
2019-05-17T09:46:20.462Z|00005|reconnect|INFO|unix:/root/home/OVS_master_DPDK_v18.11/ovs_github/tests/system-afxdp-testsuite.dir/01/db.sock:
connected
> 2019-05-17T09:46:20.465Z|00006|bridge|INFO|ovs-vswitchd (Open
vSwitch) 2.11.90
> 2019-05-17T09:46:20.505Z|00007|netdev_linux|WARN|ovs-netdev:
creating tap device failed: Device or resource busy
> 2019-05-17T09:46:20.508Z|00008|dpif|WARN|datapath ovs-netdev already
exists but cannot be opened: No such device
> 2019-05-17T09:46:20.508Z|00009|ofproto_dpif|ERR|failed to open
datapath of type netdev: No such device
> 2019-05-17T09:46:20.508Z|00010|ofproto|ERR|failed to open datapath
br0: No such device
> 2019-05-17T09:46:20.508Z|00011|bridge|ERR|failed to create bridge
br0: No such device
1. system-afxdp-traffic.at:3: FAILED (system-afxdp-traffic.at:4)
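Coming back to the MTU item in the list above: a check along the following lines in the set-MTU path would at least reject values the umem frames cannot hold. This is only a sketch under assumptions. The function name, the idea of passing frame size and headroom as parameters, and the VLAN allowance are mine and are not taken from the patch.

```c
#include <errno.h>

#define EX_ETH_HEADER_LEN  14   /* Ethernet header without VLAN tag. */
#define EX_VLAN_HEADER_LEN 4    /* Allow for one 802.1Q tag. */

/* Hypothetical helper: returns 0 if a full-sized frame for 'mtu' fits in
 * the usable part of one umem frame ('frame_size' minus 'headroom'),
 * otherwise EINVAL so the caller can refuse the MTU change. */
static int
example_afxdp_check_mtu(int mtu, int frame_size, int headroom)
{
    int usable = frame_size - headroom;

    if (mtu <= 0 || usable <= 0) {
        return EINVAL;
    }
    if (mtu + EX_ETH_HEADER_LEN + EX_VLAN_HEADER_LEN > usable) {
        return EINVAL;
    }
    return 0;
}
```

With, say, 2048-byte frames and 256 bytes of headroom (illustrative numbers, not the patch's), an MTU of 1500 passes and 9000 is rejected with EINVAL.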
The following might be useful when combining DPDK and AF_XDP:
Currently, DPDK and AF_XDP polling can be combined on a single PMD
thread. It might be nice to have an option to not do this, i.e. to have
separate PMD threads for each type. I know we can do this by assigning
specific PMDs to queues, but this will disable auto-balancing. This will
also help later if we add poll() mode support for AF_XDP.
For other review comments, see inline below. I reviewed the code, not
the unit tests or automake changes.
Cheers,
Eelco
On 10 May 2019, at 1:54, William Tu wrote:
The patch introduces experimental AF_XDP support for OVS netdev.
AF_XDP, Address Family of the eXpress Data Path, is a new Linux socket type
built upon the eBPF and XDP technology. It aims to have comparable
performance to DPDK but cooperate better with existing kernel's networking
stack. An AF_XDP socket receives and sends packets from an eBPF/XDP program
attached to the netdev, by-passing a couple of Linux kernel's subsystems.
As a result, AF_XDP socket shows much better performance than AF_PACKET.
For more details about AF_XDP, please see linux kernel's
Documentation/networking/af_xdp.rst. Note that by default, this is not
compiled in.
Signed-off-by: William Tu <[email protected]>
---
v1->v2:
- add a list to maintain unused umem elements
- remove copy from rx umem to ovs internal buffer
- use hugetlb to reduce misses (not much difference)
- use pmd mode netdev in OVS (huge performance improve)
- remove malloc dp_packet, instead put dp_packet in umem
v2->v3:
- rebase on the OVS master, 7ab4b0653784
("configure: Check for more specific function to pull in pthread
library.")
- remove the dependency on libbpf and dpif-bpf.
instead, use the built-in XDP_ATTACH feature.
- data structure optimizations for better performance, see[1]
- more test cases support
v3:
https://mail.openvswitch.org/pipermail/ovs-dev/2018-November/354179.html
v3->v4:
- Use AF_XDP API provided by libbpf
- Remove the dependency on XDP_ATTACH kernel patch set
- Add documentation, bpf.rst
v4->v5:
- rebase to master
- remove rfc, squash all into a single patch
- add --enable-afxdp, so by default, AF_XDP is not compiled
- add options: xdpmode=drv,skb
- add multiple queue and multiple PMD support, with options: n_rxq
- improve documentation, rename bpf.rst to af_xdp.rst
v5->v6
- rebase to master, commit 0cdd5b13de91b98
- address errors from sparse and clang
- pass travis-ci test
- address feedback from Ben
- fix issues reported by 0-day robot
- improved documentation
v6-v7
- rebase to master, commit abf11558c1515bf3b1
- address feedbacks from Ilya, Ben, and Eelco, see:
https://www.mail-archive.com/[email protected]/msg32357.html
- add XDP mode change, implement get/set_config, reconfigure
- Fix reconfiguration/crash issue caused by libbpf, see patch:
[PATCH bpf 0/2] libbpf: fixes for AF_XDP teardown
- perf optimization for batching umem_push/pop
- perf optimization for batching kick_tx
- test build with dpdk
- fix/refactor atomic operation
- make AF_XDP x86 specific, otherwise fail at build time
- lots of code refactoring
- add PVP setup in documentation
v7-v8:
- Address feedback from Ilya at:
https://patchwork.ozlabs.org/patch/1095019/
- add netdev-linux-private.h
- fix afxdp reconfigure issue
- sort include headers
- remove unnecessary OVS_UNUSED
- coding style fixes
- error case handling and memory leak
---
Documentation/automake.mk | 1 +
Documentation/index.rst | 1 +
Documentation/intro/install/afxdp.rst | 479 +++++++++++++++++
Documentation/intro/install/index.rst | 1 +
acinclude.m4 | 32 ++
configure.ac | 1 +
lib/automake.mk | 13 +
lib/dp-packet.c | 33 ++
lib/dp-packet.h | 22 +-
lib/dpif-netdev-perf.h | 14 +
lib/netdev-afxdp.c | 727 +++++++++++++++++++++++++
lib/netdev-afxdp.h | 53 ++
lib/netdev-linux-private.h | 124 +++++
lib/netdev-linux.c | 137 +++--
lib/netdev-linux.h | 14 +
lib/netdev-provider.h | 4 +-
lib/netdev.c | 3 +
lib/xdpsock.c | 239 +++++++++
lib/xdpsock.h | 123 +++++
tests/automake.mk | 17 +
tests/system-afxdp-macros.at | 153 ++++++
tests/system-afxdp-testsuite.at | 26 +
tests/system-afxdp-traffic.at | 978 ++++++++++++++++++++++++++++++++++
23 files changed, 3137 insertions(+), 58 deletions(-)
create mode 100644 Documentation/intro/install/afxdp.rst
create mode 100644 lib/netdev-afxdp.c
create mode 100644 lib/netdev-afxdp.h
create mode 100644 lib/netdev-linux-private.h
create mode 100644 lib/xdpsock.c
create mode 100644 lib/xdpsock.h
create mode 100644 tests/system-afxdp-macros.at
create mode 100644 tests/system-afxdp-testsuite.at
create mode 100644 tests/system-afxdp-traffic.at
diff --git a/Documentation/automake.mk b/Documentation/automake.mk
index 082438e09a33..11cc59efc881 100644
--- a/Documentation/automake.mk
+++ b/Documentation/automake.mk
@@ -10,6 +10,7 @@ DOC_SOURCE = \
Documentation/intro/why-ovs.rst \
Documentation/intro/install/index.rst \
Documentation/intro/install/bash-completion.rst \
+ Documentation/intro/install/afxdp.rst \
Documentation/intro/install/debian.rst \
Documentation/intro/install/documentation.rst \
Documentation/intro/install/distributions.rst \
diff --git a/Documentation/index.rst b/Documentation/index.rst
index 46261235c732..aa9e7c49f179 100644
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -59,6 +59,7 @@ vSwitch? Start here.
:doc:`intro/install/windows` |
:doc:`intro/install/xenserver` |
:doc:`intro/install/dpdk` |
+ :doc:`intro/install/afxdp` |
:doc:`Installation FAQs <faq/releases>`
- **Tutorials:** :doc:`tutorials/faucet` |
diff --git a/Documentation/intro/install/afxdp.rst
b/Documentation/intro/install/afxdp.rst
new file mode 100644
index 000000000000..1222b433dbbb
--- /dev/null
+++ b/Documentation/intro/install/afxdp.rst
@@ -0,0 +1,479 @@
+..
+ Licensed under the Apache License, Version 2.0 (the "License"); you may
+ not use this file except in compliance with the License. You may obtain
+ a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ License for the specific language governing permissions and limitations
+ under the License.
+
+ Convention for heading levels in Open vSwitch documentation:
+
+ ======= Heading 0 (reserved for the title in a document)
+ ------- Heading 1
+ ~~~~~~~ Heading 2
+ +++++++ Heading 3
+ ''''''' Heading 4
+
+ Avoid deeper levels because they do not render well.
+
+
+========================
+Open vSwitch with AF_XDP
+========================
+
+This document describes how to build and install Open vSwitch using
+AF_XDP netdev.
+
+.. warning::
+ The AF_XDP support of Open vSwitch is considered 'experimental',
+ and it is not compiled in by default.
+
+Introduction
+------------
+AF_XDP, Address Family of the eXpress Data Path, is a new Linux socket type
+built upon the eBPF and XDP technology. It aims to have comparable
+performance to DPDK but cooperate better with existing kernel's networking
+stack. An AF_XDP socket receives and sends packets from an eBPF/XDP program
+attached to the netdev, by-passing a couple of Linux kernel's subsystems.
+As a result, AF_XDP socket shows much better performance than AF_PACKET.
+For more details about AF_XDP, please see linux kernel's
+Documentation/networking/af_xdp.rst
+
+
+AF_XDP Netdev
+-------------
+OVS has a couple of netdev types, i.e., system, tap, or
+internal. The AF_XDP feature adds a new netdev types called
+"afxdp", and implement its configuration, packet reception,
+and transmit functions. Since the AF_XDP socket, xsk,
+operates in userspace, once ovs-vswitchd receives packets
+from xsk, the proposed architecture re-uses the existing
+userspace dpif-netdev datapath. As a result, most of
+the packet processing happens at the userspace instead of
+linux kernel.
+
+::
+
+ | +-------------------+
+ | | ovs-vswitchd |<-->ovsdb-server
+ | +-------------------+
+ | | ofproto |<-->OpenFlow controllers
+ | +--------+-+--------+
+ | | netdev | |ofproto-|
+ userspace | +--------+ | dpif |
+ | | afxdp | +--------+
+ | | netdev | | dpif |
+ | +---||---+ +--------+
+ | || | dpif- |
+ | || | netdev |
+ |_ || +--------+
+ ||
+ _ +---||-----+--------+
+ | | AF_XDP prog + |
+ kernel | | xsk_map |
+ |_ +--------||---------+
+ ||
+ physical
+ NIC
+
+
+Build requirements
+------------------
+
+In addition to the requirements described in :doc:`general`, building Open
+vSwitch with AF_XDP will require the following:
+
+- libbpf from kernel source tree (kernel 5.0.0 or later)
+
+- Linux kernel XDP support, with the following options (required)
+
+ * CONFIG_BPF=y
+
+ * CONFIG_BPF_SYSCALL=y
+
+ * CONFIG_XDP_SOCKETS=y
+
+
+- The following optional Kconfig options are also recommended, but not
+ required:
+
+ * CONFIG_BPF_JIT=y (Performance)
+
+ * CONFIG_HAVE_BPF_JIT=y (Performance)
+
+ * CONFIG_XDP_SOCKETS_DIAG=y (Debugging)
+
+- If possible, run **./xdpsock -r -N -z -i <your device>** under
+ linux/samples/bpf. This is an OVS-independent benchmark tool for AF_XDP.
+ It makes sure your basic kernel requirements are met for AF_XDP.
+
+
+Installing
+----------
+For OVS to use AF_XDP netdev, it has to be configured with LIBBPF support.
+First, clone a recent version of Linux bpf-next tree::
+
+ git clone git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
+
+Second, go into the Linux source directory and build libbpf in the tools
+directory::
+
+ cd bpf-next/
+ cd tools/lib/bpf/
+ make && make install
+ make install_headers
+
+.. note::
+ Make sure xsk.h and bpf.h are installed in system's library path,
+ e.g. /usr/local/include/bpf/ or /usr/include/bpf/
+
+Make sure the libbpf.so is installed correctly::
+
+ ldconfig
+ ldconfig -p | grep libbpf
+
+
+Third, ensure the standard OVS requirements are installed and
+bootstrap/configure the package::
+
+ ./boot.sh && ./configure --enable-afxdp
+
+Finally, build and install OVS::
+
+ make && make install
+
+To kick start end-to-end autotesting::
+
+ uname -a # make sure having 5.0+ kernel
+ make check-afxdp
+
+if a test case fails, check the log at::
+
+ cat tests/system-afxdp-testsuite.dir/<number>/system-afxdp-testsuite.log
+
+
+Setup AF_XDP netdev
+-------------------
+Before running OVS with AF_XDP, make sure the libbpf and libelf are
+set-up right::
+
+ ldd vswitchd/ovs-vswitchd
+
+Open vSwitch should be started using userspace datapath as described
+in :doc:`general`::
+
+ ovs-vswitchd --disable-system
+ ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
+
+.. note::
+ OVS AF_XDP netdev is using the userspace datapath, the same datapath
+ as used by OVS-DPDK. So it requires --disable-system for ovs-vswitchd
+ and datapath_type=netdev when adding a new bridge.
As mentioned earlier offline, I think --disable-system can be removed, as
the kernel and userspace datapaths can run at the same time.
+
+Make sure your device driver supports AF_XDP, and to use 1 PMD (on core 4)
+on 1 queue (queue 0) device, configure these options: **pmd-cpu-mask,
+pmd-rxq-affinity, and n_rxq**. The **xdpmode** can be "drv" or "skb"::
How should options:xdpmode behave when it is not specified?
I would prefer that, if the option is not specified, it tries drv first
and falls back to skb if that fails.
We also need to add these new options to the vswitch.xml file.
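To make the fallback I have in mind concrete, here is a rough sketch. All names in it (ex_xdp_mode, ex_attach_fn, ex_attach_with_fallback, and the stub) are made up for illustration and come from neither the patch nor libbpf. The real attach call would sit behind the callback.

```c
#include <errno.h>

/* Hypothetical mode values; UNSPEC means "xdpmode was not configured". */
enum ex_xdp_mode { EX_XDP_MODE_UNSPEC, EX_XDP_MODE_DRV, EX_XDP_MODE_SKB };

/* Hypothetical attach callback: returns 0 on success or a positive errno. */
typedef int (*ex_attach_fn)(const char *ifname, enum ex_xdp_mode mode);

/* If the user asked for a specific mode, honour it and report failure.
 * If nothing was configured, try native (drv) mode first and fall back
 * to generic (skb) mode.  On success '*used' holds the mode in effect. */
static int
ex_attach_with_fallback(const char *ifname, enum ex_xdp_mode requested,
                        ex_attach_fn attach, enum ex_xdp_mode *used)
{
    int error;

    if (requested != EX_XDP_MODE_UNSPEC) {
        error = attach(ifname, requested);
        if (!error) {
            *used = requested;
        }
        return error;
    }

    error = attach(ifname, EX_XDP_MODE_DRV);
    if (!error) {
        *used = EX_XDP_MODE_DRV;
        return 0;
    }

    error = attach(ifname, EX_XDP_MODE_SKB);
    if (!error) {
        *used = EX_XDP_MODE_SKB;
    }
    return error;
}

/* Stub standing in for a driver without native XDP support. */
static int
ex_stub_attach_skb_only(const char *ifname, enum ex_xdp_mode mode)
{
    (void) ifname;
    return mode == EX_XDP_MODE_SKB ? 0 : EOPNOTSUPP;
}
```

With the stub above, an unspecified mode ends up in skb mode, while an explicit request for drv fails with EOPNOTSUPP rather than silently degrading.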
+
+ ethtool -L enp2s0 combined 1
+ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
+ ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
+ options:n_rxq=1 options:xdpmode=drv \
+ other_config:pmd-rxq-affinity="0:4"
+
+Or, use 4 pmds/cores and 4 queues by doing::
+
+ ethtool -L enp2s0 combined 4
+ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x36
+ ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
+ options:n_rxq=4 options:xdpmode=drv \
+ other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"
+
Add some text stating that pmd-rxq-affinity is not a requirement; the
system will (re)assign queues automatically.
Also, note that cores used by pmd-rxq-affinity are not shared with or
used by floating PMDs.
+To validate that the bridge has successfully instantiated, you can use the::
+
+ ovs-vsctl show
+
+should show something like::
+
+ Port "ens802f0"
+ Interface "ens802f0"
+ type: afxdp
+ options: {n_rxq="1", xdpmode=drv}
+
+Otherwise, enable debug by::
+
+ ovs-appctl vlog/set netdev_afxdp::dbg
+
+References
+----------
+Most of the design details are described in the paper presented at
+Linux Plumber 2018, "Bringing the Power of eBPF to Open vSwitch"[1],
+section 4, and slides[2][4].
+"The Path to DPDK Speeds for AF XDP"[3] gives a very good
introduction
+about AF_XDP current and future work.
+
+
+[1] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-afxdp.pdf
+
+[2] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-lpc18-presentation.pdf
+
+[3] http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdf
+
+[4] https://ovsfall2018.sched.com/event/IO7p/fast-userspace-ovs-with-afxdp
+
+
+Performance Tuning
+------------------
+The name of the game is to keep your CPU running in userspace, allowing PMD
+to keep polling the AF_XDP queues without any interferences from kernel.
+
+#. Make sure everything is in the same NUMA node (memory used by AF_XDP, pmd
+ running cores, device plug-in slot)
How can you do this? The code does not take NUMA into account, and memory
is allocated with posix_memalign, so there is no way to know which NUMA
node it ends up on.
+#. Isolate your CPU by doing isolcpu at grub configure.
+
+#. IRQ should not set to pmd running core.
+
+#. The Spectre and Meltdown fixes increase the overhead of system calls.
+
Maybe be more consistent: use either one or two newlines before a heading?
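Coming back to the NUMA point in the tuning list above: for PCI NICs the kernel exposes the device's node in sysfs, so a first step could be to read it rather than assume node 0. A minimal sketch follows. The helper name is hypothetical and nothing like it exists in the patch, and it only finds the node. Placing the umem on that node is a separate problem.

```c
#include <stdio.h>

/* Hypothetical helper: returns the NUMA node the kernel reports for the
 * PCI device behind 'ifname', or -1 if it is unknown.  Virtual devices
 * such as veth have no "device" directory, so they also yield -1. */
static int
ex_netdev_numa_node(const char *ifname)
{
    char path[128];
    FILE *stream;
    int node = -1;
    int n;

    n = snprintf(path, sizeof path,
                 "/sys/class/net/%s/device/numa_node", ifname);
    if (n < 0 || (size_t) n >= sizeof path) {
        return -1;                  /* Name too long for the buffer. */
    }

    stream = fopen(path, "r");
    if (!stream) {
        return -1;                  /* No such device or attribute. */
    }
    if (fscanf(stream, "%d", &node) != 1) {
        node = -1;
    }
    fclose(stream);
    return node;
}
```

Note that sysfs itself reports -1 on systems without NUMA information, so callers have to treat -1 as "unknown" in either case.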
+Debugging performance issue
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+While running the traffic, use linux perf tool to see where your cpu
+spends its cycle::
+
+ cd bpf-next/tools/perf
+ make
+ ./perf record -p `pidof ovs-vswitchd` sleep 10
+ ./perf report
+
+Measure your system call rate by doing::
+
+ pstree -p `pidof ovs-vswitchd`
+ strace -c -p <your pmd's PID>
+
+Or, use OVS pmd tool::
+
+ ovs-appctl dpif-netdev/pmd-stats-show
+
+
+Example Script
+--------------
+
+Below is a script using namespaces and veth peer::
+
+ #!/bin/bash
+ ovs-vswitchd --no-chdir --pidfile -vvconn -vofproto_dpif -vunixctl
\
+ --disable-system --detach \
+ ovs-vsctl -- add-br br0 -- set Bridge br0 \
+ protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14
\
+ fail-mode=secure datapath_type=netdev
+ ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
+
+ ip netns add at_ns0
+ ovs-appctl vlog/set netdev_afxdp::dbg
+
+ ip link add p0 type veth peer name afxdp-p0
+ ip link set p0 netns at_ns0
+ ip link set dev afxdp-p0 up
+ ovs-vsctl add-port br0 afxdp-p0 -- \
+ set interface afxdp-p0 external-ids:iface-id="p0" type="afxdp"
+
+ ip netns exec at_ns0 sh << NS_EXEC_HEREDOC
+ ip addr add "10.1.1.1/24" dev p0
+ ip link set dev p0 up
+ NS_EXEC_HEREDOC
+
+ ip netns add at_ns1
+ ip link add p1 type veth peer name afxdp-p1
+ ip link set p1 netns at_ns1
+ ip link set dev afxdp-p1 up
+
+ ovs-vsctl add-port br0 afxdp-p1 -- \
+ set interface afxdp-p1 external-ids:iface-id="p1" type="afxdp"
+ ip netns exec at_ns1 sh << NS_EXEC_HEREDOC
+ ip addr add "10.1.1.2/24" dev p1
+ ip link set dev p1 up
+ NS_EXEC_HEREDOC
+
+ ip netns exec at_ns0 ping -i .2 10.1.1.2
+
+
+Limitations/Known Issues
+------------------------
+#. Device's numa ID is always 0, need a way to find numa id from a netdev.
+#. No QoS support because AF_XDP netdev by-pass the Linux TC layer. A possible
+ work-around is to use OpenFlow meter action.
+#. AF_XDP device added to bridge, remove, and added again will fail.
+#. Most of the tests are done using i40e single port. Multiple ports and
+ also ixgbe driver also needs to be tested.
+#. No latency test result (TODO items)
+
+
+make check-afxdp
+----------------
+When executing 'make check-afxdp', OVS creates namespaces, sets up AF_XDP on
+veth devices and kicks start the testing. So far we have the following test
+cases::
+
+ AF_XDP netdev datapath-sanity
+
+ 1: datapath - ping between two ports ok
+ 2: datapath - ping between two ports on vlan ok
+ 3: datapath - ping6 between two ports ok
+ 4: datapath - ping6 between two ports on vlan ok
+ 5: datapath - ping over vxlan tunnel ok
+ 6: datapath - ping over vxlan6 tunnel ok
+ 7: datapath - ping over gre tunnel ok
+ 8: datapath - ping over erspan v1 tunnel ok
+ 9: datapath - ping over erspan v2 tunnel ok
+ 10: datapath - ping over ip6erspan v1 tunnel ok
+ 11: datapath - ping over ip6erspan v2 tunnel ok
+ 12: datapath - ping over geneve tunnel ok
+ 13: datapath - ping over geneve6 tunnel ok
+ 14: datapath - clone action ok
+ 15: datapath - basic truncate action ok
+
+ conntrack
+
+ 16: conntrack - controller ok
+ 17: conntrack - force commit ok
+ 18: conntrack - ct flush by 5-tuple ok
+ 19: conntrack - IPv4 ping ok
+ 20: conntrack - get_nconns and get/set_maxconns ok
+ 21: conntrack - IPv6 ping ok
+
+ system-ovn
+
+ 22: ovn -- 2 LRs connected via LS, gateway router, SNAT and DNAT ok
+ 23: ovn -- 2 LRs connected via LS, gateway router, easy SNAT ok
+ 24: ovn -- multiple gateway routers, SNAT and DNAT ok
+ 25: ovn -- load-balancing ok
+ 26: ovn -- load-balancing - same subnet. ok
+ 27: ovn -- load balancing in gateway router ok
+ 28: ovn -- multiple gateway routers, load-balancing ok
+ 29: ovn -- load balancing in router with gateway router port ok
+ 30: ovn -- DNAT and SNAT on distributed router - N/S ok
+ 31: ovn -- DNAT and SNAT on distributed router - E/W ok
+
+PVP using tap device
+--------------------
+Assume you have enp2s0 as physical nic, and a tap device connected to VM.
+First, start OVS, then add physical port::
+
+ ethtool -L enp2s0 combined 1
+ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
+ ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
+ options:n_rxq=1 options:xdpmode=drv \
+ other_config:pmd-rxq-affinity="0:4"
+
+Start a VM with virtio and tap device::
+
+ qemu-system-x86_64 -hda ubuntu1810.qcow \
+ -m 4096 \
+ -cpu host,+x2apic -enable-kvm \
+ -device virtio-net-pci,mac=00:02:00:00:00:01,netdev=net0,mq=on,\
+ vectors=10,mrg_rxbuf=on,rx_queue_size=1024 \
+ -netdev type=tap,id=net0,vhost=on,queues=8 \
+ -object memory-backend-file,id=mem,size=4096M,\
+ mem-path=/dev/hugepages,share=on \
+ -numa node,memdev=mem -mem-prealloc -smp 2
+
+Create OpenFlow rules::
+
+ ovs-vsctl add-port br0 tap0
Maybe add the tap port with type afxdp as well, or else it will be an
AF_PACKET interface polled in the main thread.
+ ovs-ofctl del-flows br0
+ ovs-ofctl add-flow br0 "in_port=enp2s0, actions=output:tap0"
+ ovs-ofctl add-flow br0 "in_port=tap0, actions=output:enp2s0"
+
+Inside the VM, use xdp_rxq_info to bounce back the traffic::
+
+ ./xdp_rxq_info --dev ens3 --action XDP_TX
+
+The performance number I got is around 700Kpps.
+This is due to using the kernel's tap interface, which requires copying
+packet into kernel from the umem buffer in userspace.
+
+PVP using vhostuser device
+--------------------------
+First, build OVS with DPDK and AFXDP::
+
+ ./configure --enable-afxdp --with-dpdk=<dpdk path>
+ make -j4 && make install
+
+Create a vhost-user port from OVS::
+
+ ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
+ ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev \
+ other_config:pmd-cpu-mask=0xfff
+ ovs-vsctl add-port br0 vhost-user-1 \
+ -- set Interface vhost-user-1 type=dpdkvhostuser
+
+Start VM using vhost-user mode::
+
+ qemu-system-x86_64 -hda ubuntu1810.qcow \
+ -m 4096 \
+ -cpu host,+x2apic -enable-kvm \
+ -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1 \
+ -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce,queues=4 \
+ -device virtio-net-pci,mac=00:00:00:00:00:01,\
+ netdev=mynet1,mq=on,vectors=10 \
+ -object memory-backend-file,id=mem,size=4096M,\
+ mem-path=/dev/hugepages,share=on \
+ -numa node,memdev=mem -mem-prealloc -smp 2
+
+Setup the OpenFlow rules::
+
+ ovs-ofctl del-flows br0
+ ovs-ofctl add-flow br0 "in_port=enp2s0, actions=output:vhost-user-1"
+ ovs-ofctl add-flow br0 "in_port=vhost-user-1, actions=output:enp2s0"
+
+Inside the VM, use xdp_rxq_info to drop or bounce back the traffic::
+
+ ./xdp_rxq_info --dev ens3 --action XDP_DROP
+ ./xdp_rxq_info --dev ens3 --action XDP_TX
+
+Performance: for RX_DROP: 6.6Mpps, TX: 2.3Mpps
+
+PCP container using veth
+------------------------
+Create namespace and veth peer devices::
+
+ ip netns add at_ns0
+ ip link add p0 type veth peer name afxdp-p0
+ ip link set p0 netns at_ns0
+ ip link set dev afxdp-p0 up
+ ip netns exec at_ns0 ip link set dev p0 up
+
+Attach the veth port to br0 (linux kernel mode)::
+
+ ovs-vsctl add-port br0 afxdp-p0 -- \
+ set interface afxdp-p0 options:n_rxq=1 options:xdpmode=skb
+
Remove the xdpmode=skb above... Also, see my earlier comment on the
PF_PACKET interface being polled in bridge_run(). I would advise against
using this, and you might want to remove it.
+
+Or, use AF_XDP with skb mode::
+
+ ovs-vsctl add-port br0 afxdp-p0 -- \
+ set interface afxdp-p0 type="afxdp" options:n_rxq=1 options:xdpmode=skb
+
+Setup the OpenFlow rules::
+
+ ovs-ofctl del-flows br0
+ ovs-ofctl add-flow br0 "in_port=enp2s0, actions=output:afxdp-p0"
+ ovs-ofctl add-flow br0 "in_port=afxdp-p0, actions=output:enp2s0"
+
+In the namespace, run drop or bounce back the packet::
+
+ ip netns exec at_ns0 ./xdp_rxq_info --dev p0 --action XDP_DROP
+ ip netns exec at_ns0 ./xdp_rxq_info --dev p0 --action XDP_TX
+
+Performance: for RX_DROP: 800Kpps, TX: 700Kpps
+
+Bug Reporting
+-------------
+
+Please report problems to [email protected].
diff --git a/Documentation/intro/install/index.rst
b/Documentation/intro/install/index.rst
index 3193c736cf17..c27a9c9d16ff 100644
--- a/Documentation/intro/install/index.rst
+++ b/Documentation/intro/install/index.rst
@@ -45,6 +45,7 @@ Installation from Source
xenserver
userspace
dpdk
+ afxdp
Installation from Packages
--------------------------
diff --git a/acinclude.m4 b/acinclude.m4
index b532a4579266..5782f7e4bc2e 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -221,6 +221,38 @@ AC_DEFUN([OVS_FIND_DEPENDENCY], [
])
])
+dnl OVS_CHECK_LINUX_AF_XDP
+dnl
+dnl Check both Linux kernel AF_XDP and libbpf support
+AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [
+ AC_ARG_ENABLE([afxdp],
+ [AC_HELP_STRING([--enable-afxdp], [Enable AF-XDP support])],
+ [], [enable_afxdp=no])
+ AC_MSG_CHECKING([whether AF_XDP is enabled])
+ if test "$enable_afxdp" != yes; then
+ AC_MSG_RESULT([no])
+ AF_XDP_ENABLE=false
+ else
+ AC_MSG_RESULT([yes])
+ AF_XDP_ENABLE=true
+
+ AC_CHECK_HEADER([bpf/libbpf.h], [],
+ [AC_MSG_ERROR([unable to find bpf/libbpf.h for AF_XDP support])])
+
+ AC_CHECK_HEADER([linux/if_xdp.h], [],
+ [AC_MSG_ERROR([unable to find linux/if_xdp.h for AF_XDP support])])
+
+ AC_CHECK_HEADER([bpf/xsk.h], [],
+ [AC_MSG_ERROR([unable to find bpf/xsk.h for AF_XDP support])])
+
+ AC_DEFINE([HAVE_AF_XDP], [1],
+ [Define to 1 if AF_XDP support is available and enabled.])
+ LIBBPF_LDADD=" -lbpf -lelf"
+ AC_SUBST([LIBBPF_LDADD])
+ fi
+ AM_CONDITIONAL([HAVE_AF_XDP], test "$AF_XDP_ENABLE" = true)
+])
+
dnl OVS_CHECK_DPDK
dnl
dnl Configure DPDK source tree
diff --git a/configure.ac b/configure.ac
index 505e3d041e93..29c90b73f836 100644
--- a/configure.ac
+++ b/configure.ac
@@ -99,6 +99,7 @@ OVS_CHECK_SPHINX
OVS_CHECK_DOT
OVS_CHECK_IF_DL
OVS_CHECK_STRTOK_R
+OVS_CHECK_LINUX_AF_XDP
AC_CHECK_DECLS([sys_siglist], [], [], [[#include <signal.h>]])
AC_CHECK_MEMBERS([struct stat.st_mtim.tv_nsec, struct
stat.st_mtimensec],
[], [], [[#include <sys/stat.h>]])
diff --git a/lib/automake.mk b/lib/automake.mk
index cc5dccf39d6b..686e57f8c472 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -14,6 +14,10 @@ if WIN32
lib_libopenvswitch_la_LIBADD += ${PTHREAD_LIBS}
endif
+if HAVE_AF_XDP
+lib_libopenvswitch_la_LIBADD += $(LIBBPF_LDADD)
+endif
+
lib_libopenvswitch_la_LDFLAGS = \
$(OVS_LTINFO) \
-Wl,--version-script=$(top_builddir)/lib/libopenvswitch.sym \
@@ -392,6 +396,7 @@ lib_libopenvswitch_la_SOURCES += \
lib/if-notifier.h \
lib/netdev-linux.c \
lib/netdev-linux.h \
+ lib/netdev-linux-private.h \
lib/netdev-tc-offloads.c \
lib/netdev-tc-offloads.h \
lib/netlink-conntrack.c \
@@ -409,6 +414,14 @@ lib_libopenvswitch_la_SOURCES += \
lib/tc.h
endif
+if HAVE_AF_XDP
+lib_libopenvswitch_la_SOURCES += \
+ lib/xdpsock.c \
+ lib/xdpsock.h \
+ lib/netdev-afxdp.c \
+ lib/netdev-afxdp.h
+endif
+
if DPDK_NETDEV
lib_libopenvswitch_la_SOURCES += \
lib/dpdk.c \
diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 0976a35e758b..7d086dc5e860 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -22,6 +22,9 @@
#include "netdev-dpdk.h"
#include "openvswitch/dynamic-string.h"
#include "util.h"
+#ifdef HAVE_AF_XDP
+#include "netdev-afxdp.h"
+#endif
Why the protection above? You do not do this in netdev-linux.c.
Maybe you should move the #ifdef HAVE_AF_XDP inside the include file?
static void
dp_packet_init__(struct dp_packet *b, size_t allocated, enum
dp_packet_source source)
@@ -59,6 +62,27 @@ dp_packet_use(struct dp_packet *b, void *base,
size_t allocated)
dp_packet_use__(b, base, allocated, DPBUF_MALLOC);
}
+#if HAVE_AF_XDP
+/* Initialize 'b' as an empty dp_packet that contains
+ * memory starting at AF_XDP umem base.
+ */
+void
+dp_packet_use_afxdp(struct dp_packet *b, void *base, size_t
allocated)
+{
+ dp_packet_set_base(b, base);
+ dp_packet_set_data(b, base);
+ dp_packet_set_size(b, 0);
+
+ dp_packet_set_allocated(b, allocated);
+ b->source = DPBUF_AFXDP;
+ dp_packet_reset_offsets(b);
+ pkt_metadata_init(&b->md, 0);
+ dp_packet_reset_cutlen(b);
+ dp_packet_reset_offload(b);
+ b->packet_type = htonl(PT_ETH);
+}
+#endif
Guess the above ifdef saves some bytes if not built with AF_XDP, but we
do not seem to do it for other functions either, like
dp_packet_init_dpdk().
/* Initializes 'b' as an empty dp_packet that contains the
'allocated' bytes of
* memory starting at 'base'. 'base' should point to a buffer on the
stack.
* (Nothing actually relies on 'base' being allocated on the stack.
It could
@@ -122,6 +146,11 @@ dp_packet_uninit(struct dp_packet *b)
* created as a dp_packet */
free_dpdk_buf((struct dp_packet*) b);
#endif
+ } else if (b->source == DPBUF_AFXDP) {
+#ifdef HAVE_AF_XDP
+ free_afxdp_buf(b);
+#endif
If you move the #ifdef HAVE_AF_XDP check to the include file (see
comment above), you can use the DPDK inline trick and remove the #ifdef
above.
See lib/netdev-dpdk.h
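To make the suggestion concrete, here is a minimal sketch of the pattern,
loosely modelled on the stub that lib/netdev-dpdk.h provides for builds
without DPDK. The enum, the struct layout and the packet_release() helper
are simplified stand-ins made up for illustration, not the real OVS
definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the real OVS types (hypothetical). */
enum dp_packet_source { DPBUF_MALLOC, DPBUF_AFXDP };
struct dp_packet { enum dp_packet_source source; };

/* --- What netdev-afxdp.h could look like. --- */
#ifdef HAVE_AF_XDP
/* Real implementation lives in netdev-afxdp.c. */
void free_afxdp_buf(struct dp_packet *p);
#else
/* Stub for builds without AF_XDP, so callers need no #ifdef. */
static inline void
free_afxdp_buf(struct dp_packet *p)
{
    (void) p;   /* Nothing to release. */
}
#endif

/* --- Caller side, e.g. in dp_packet_uninit(): no #ifdef needed. --- */
static int
packet_release(struct dp_packet *b)
{
    if (b->source == DPBUF_AFXDP) {
        free_afxdp_buf(b);
        return 1;   /* Handled by the AF_XDP path. */
    }
    return 0;       /* Caller frees the buffer the usual way. */
}
```

With the guard living in the header, dp-packet.c and dp-packet.h can
include netdev-afxdp.h unconditionally.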
+ return;
}
}
}
@@ -248,6 +277,9 @@ dp_packet_resize__(struct dp_packet *b, size_t
new_headroom, size_t new_tailroom
case DPBUF_STACK:
OVS_NOT_REACHED();
+ case DPBUF_AFXDP:
+ OVS_NOT_REACHED();
+
case DPBUF_STUB:
b->source = DPBUF_MALLOC;
new_base = xmalloc(new_allocated);
@@ -433,6 +465,7 @@ dp_packet_steal_data(struct dp_packet *b)
{
void *p;
ovs_assert(b->source != DPBUF_DPDK);
+ ovs_assert(b->source != DPBUF_AFXDP);
if (b->source == DPBUF_MALLOC && dp_packet_data(b) ==
dp_packet_base(b)) {
p = dp_packet_data(b);
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index a5e9ade1244a..0f533201f956 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -25,6 +25,10 @@
#include <rte_mbuf.h>
#endif
+#ifdef HAVE_AF_XDP
+#include "netdev-afxdp.h"
+#endif
+
See comment in dp-packet.c, if done all #ifdef HAVE_AF_XDP in this file
can be removed.
#include "netdev-dpdk.h"
#include "openvswitch/list.h"
#include "packets.h"
@@ -42,6 +46,7 @@ enum OVS_PACKED_ENUM dp_packet_source {
DPBUF_DPDK, /* buffer data is from DPDK allocated
memory.
* ref to dp_packet_init_dpdk() in
dp-packet.c.
*/
+ DPBUF_AFXDP, /* buffer data from XDP frame */
};
#define DP_PACKET_CONTEXT_SIZE 64
@@ -89,6 +94,13 @@ struct dp_packet {
};
};
+#if HAVE_AF_XDP
+struct dp_packet_afxdp {
+ struct umem_pool *mpool;
+ struct dp_packet packet;
+};
+#endif
+
static inline void *dp_packet_data(const struct dp_packet *);
static inline void dp_packet_set_data(struct dp_packet *, void *);
static inline void *dp_packet_base(const struct dp_packet *);
@@ -122,7 +134,9 @@ static inline const void
*dp_packet_get_nd_payload(const struct dp_packet *);
void dp_packet_use(struct dp_packet *, void *, size_t);
void dp_packet_use_stub(struct dp_packet *, void *, size_t);
void dp_packet_use_const(struct dp_packet *, const void *, size_t);
-
+#if HAVE_AF_XDP
+void dp_packet_use_afxdp(struct dp_packet *, void *, size_t);
+#endif
void dp_packet_init_dpdk(struct dp_packet *);
void dp_packet_init(struct dp_packet *, size_t);
@@ -184,6 +198,12 @@ dp_packet_delete(struct dp_packet *b)
return;
}
+#ifdef HAVE_AF_XDP
+ if (b->source == DPBUF_AFXDP) {
+ free_afxdp_buf(b);
+ return;
+ }
+#endif
dp_packet_uninit(b);
free(b);
}
diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
index 859c05613ddf..cc91720fad6e 100644
--- a/lib/dpif-netdev-perf.h
+++ b/lib/dpif-netdev-perf.h
@@ -198,6 +198,20 @@ cycles_counter_update(struct pmd_perf_stats *s)
{
#ifdef DPDK_NETDEV
return s->last_tsc = rte_get_tsc_cycles();
+#elif HAVE_AF_XDP
We need to add support for at least ARM and PPC; not sure how to do this
nicely.
Also, this code looks like a quick cut/paste from DPDK. Is the license
compatible?
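One possible direction, as a rough sketch only: dispatch on the
architecture and fall back to clock_gettime() elsewhere. The function name
read_cycle_counter() is made up for this example. Note that the aarch64
branch reads the generic timer (cntvct_el0), which is a fixed-frequency
counter rather than a true cycle counter, and the fallback returns
nanoseconds, so the units differ per platform:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Returns a monotonically increasing counter value.  The unit depends
 * on the platform (TSC ticks, timer ticks, timebase ticks or ns). */
static inline uint64_t
read_cycle_counter(void)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t lo, hi;

    __asm__ volatile("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t) hi << 32) | lo;
#elif defined(__aarch64__)
    uint64_t cnt;

    /* Virtual counter of the ARM generic timer. */
    __asm__ volatile("mrs %0, cntvct_el0" : "=r" (cnt));
    return cnt;
#elif defined(__powerpc64__)
    /* GCC builtin that reads the PowerPC time base register. */
    return __builtin_ppc_get_timebase();
#else
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t) ts.tv_sec * 1000000000ULL + (uint64_t) ts.tv_nsec;
#endif
}
```

Something like this would also avoid copying the DPDK implementation
verbatim.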
+ /* This is x86-specific instructions. */
+ union {
+ uint64_t tsc_64;
+ struct {
+ uint32_t lo_32;
+ uint32_t hi_32;
+ };
+ } tsc;
+ asm volatile("rdtsc" :
+ "=a" (tsc.lo_32),
+ "=d" (tsc.hi_32));
+
+ return s->last_tsc = tsc.tsc_64;
#else
return s->last_tsc = 0;
#endif
diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
new file mode 100644
index 000000000000..cd1b9ca8be77
--- /dev/null
+++ b/lib/netdev-afxdp.c
@@ -0,0 +1,727 @@
+/*
+ * Copyright (c) 2018, 2019 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied.
+ * See the License for the specific language governing permissions
and
+ * limitations under the License.
+ */
+
+#if !defined(__i386__) && !defined(__x86_64__)
+#error AF_XDP supported only for Linux on x86 or x86_64
Any reason why we do not support PPC and ARM?
+#endif
+
+#include <config.h>
+
+#include "netdev-linux-private.h"
+#include "netdev-linux.h"
Swap the two above, see comment in netdev-linux-private.h
+#include "netdev-afxdp.h"
+
+#include <arpa/inet.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <linux/if_ether.h>
+#include <linux/if_tun.h>
+#include <linux/types.h>
+#include <linux/ethtool.h>
+#include <linux/mii.h>
+#include <linux/rtnetlink.h>
+#include <linux/sockios.h>
+#include <linux/if_xdp.h>
+#include <net/if.h>
+#include <net/if_arp.h>
+#include <net/route.h>
+#include <netinet/in.h>
+#include <netpacket/packet.h>
+#include <poll.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/ioctl.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/utsname.h>
+#include <unistd.h>
+
Some of these includes are included by netdev-linux(-private).h already
so why not remove them?
+#include "coverage.h"
+#include "dp-packet.h"
+#include "dpif-netlink.h"
+#include "dpif-netdev.h"
+#include "fatal-signal.h"
+#include "hash.h"
+#include "netdev-provider.h"
+#include "netdev-tc-offloads.h"
+#include "netdev-vport.h"
+#include "netlink-notifier.h"
+#include "netlink-socket.h"
+#include "netlink.h"
+#include "netnsid.h"
+#include "openflow/openflow.h"
+#include "openvswitch/dynamic-string.h"
+#include "openvswitch/hmap.h"
+#include "openvswitch/ofpbuf.h"
+#include "openvswitch/poll-loop.h"
+#include "openvswitch/vlog.h"
+#include "openvswitch/shash.h"
+#include "ovs-atomic.h"
+#include "packets.h"
+#include "rtnetlink.h"
+#include "socket-util.h"
+#include "sset.h"
+#include "tc.h"
+#include "timer.h"
+#include "unaligned.h"
+#include "util.h"
+#include "xdpsock.h"
+
+#ifndef SOL_XDP
+#define SOL_XDP 283
+#endif
+#ifndef AF_XDP
+#define AF_XDP 44
+#endif
+#ifndef PF_XDP
+#define PF_XDP AF_XDP
+#endif
Do we really need to define the above here? Or should we update the
install instructions to pick them up from the kernel headers?
+
+VLOG_DEFINE_THIS_MODULE(netdev_afxdp);
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
+
+#define UMEM2DESC(elem, base) ((uint64_t)((char *)elem - (char
*)base))
+#define UMEM2XPKT(base, i) \
+ ALIGNED_CAST(struct dp_packet_afxdp *, (char *)base
+ \
+ i * sizeof(struct dp_packet_afxdp))
+
+static uint32_t prog_id;
+static struct xsk_socket_info *xsk_configure(int ifindex, int
xdp_queue_id,
+ int mode);
+static void xsk_remove_xdp_program(uint32_t ifindex, int xdpmode);
+static void xsk_destroy(struct xsk_socket_info *xsk);
+
+static struct xsk_umem_info *xsk_configure_umem(void *buffer,
uint64_t size,
+ int xdpmode)
+{
+ struct xsk_umem_info *umem;
+ int ret;
+ int i;
+
+ umem = xcalloc(1, sizeof(*umem));
+ ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq,
&umem->cq,
+ NULL);
Here you pass no configuration (the last argument is NULL), so this call
will allocate XSK_RING_PROD__DEFAULT_NUM_DESCS and
XSK_RING_CONS__DEFAULT_NUM_DESCS entries, not the values you define in
xdpsock.h
+
+ if (ret) {
+ VLOG_ERR("xsk umem create failed (%s) mode: %s",
+ ovs_strerror(errno),
+ xdpmode == XDP_COPY ? "SKB": "DRV");
+ free(umem);
+ return NULL;
+ }
+
+ umem->buffer = buffer;
+
+ /* set-up umem pool */
+ umem_pool_init(&umem->mpool, NUM_FRAMES);
Here we should check the return value, see also the note in xdpsock.c
+
+ for (i = NUM_FRAMES - 1; i >= 0; i--) {
+ struct umem_elem *elem;
+
+ elem = ALIGNED_CAST(struct umem_elem *,
+ (char *)umem->buffer + i * FRAME_SIZE);
+ umem_elem_push(&umem->mpool, elem);
+ }
+
+ /* set-up metadata */
+ xpacket_pool_init(&umem->xpool, NUM_FRAMES);
Check return value and cleanup/return NULL on error, see
xpacket_pool_init()
+
+ VLOG_DBG("%s xpacket pool from %p to %p", __func__,
+ umem->xpool.array,
+ (char *)umem->xpool.array +
+ NUM_FRAMES * sizeof(struct dp_packet_afxdp));
+
+ for (i = NUM_FRAMES - 1; i >= 0; i--) {
+ struct dp_packet_afxdp *xpacket;
+ struct dp_packet *packet;
+
+ xpacket = UMEM2XPKT(umem->xpool.array, i);
+ xpacket->mpool = &umem->mpool;
+
+ packet = &xpacket->packet;
+ packet->source = DPBUF_AFXDP;
+ }
+
+ return umem;
+}
+
+static struct xsk_socket_info *
+xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
+ uint32_t queue_id, int xdpmode)
+{
+ struct xsk_socket_config cfg;
+ struct xsk_socket_info *xsk;
+ char devname[IF_NAMESIZE];
+ uint32_t idx = 0;
+ int ret;
+ int i;
+
+ xsk = xcalloc(1, sizeof(*xsk));
+ xsk->umem = umem;
+ cfg.rx_size = CONS_NUM_DESCS;
+ cfg.tx_size = PROD_NUM_DESCS;
+ cfg.libbpf_flags = 0;
+
+ if (xdpmode == XDP_ZEROCOPY) {
+ cfg.bind_flags = XDP_ZEROCOPY;
+ cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST |
XDP_FLAGS_DRV_MODE;
+ } else {
+ cfg.bind_flags = XDP_COPY;
+ cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST |
XDP_FLAGS_SKB_MODE;
+ }
+
+ if (if_indextoname(ifindex, devname) == NULL) {
+ VLOG_ERR("ifindex %d to devname failed (%s)",
+ ifindex, ovs_strerror(errno));
+ free(xsk);
+ return NULL;
+ }
+
+ ret = xsk_socket__create(&xsk->xsk, devname, queue_id,
umem->umem,
+ &xsk->rx, &xsk->tx, &cfg);
+ if (ret) {
+ VLOG_ERR("xsk_socket_create failed (%s) mode: %s qid: %d",
+ ovs_strerror(errno),
+ xdpmode == XDP_COPY ? "SKB": "DRV",
+ queue_id);
+ free(xsk);
+ return NULL;
+ }
+
+ /* Make sure the built-in AF_XDP program is loaded */
+ ret = bpf_get_link_xdp_id(ifindex, &prog_id, cfg.xdp_flags);
+ if (ret) {
+ VLOG_ERR("get XDP prog ID failed (%s)", ovs_strerror(errno));
+ xsk_socket__delete(xsk->xsk);
+ free(xsk);
+ return NULL;
+ }
+
+ xsk_ring_prod__reserve(&xsk->umem->fq, PROD_NUM_DESCS, &idx);
+
We should check if we got the entries we requested
+ for (i = 0;
+ i < PROD_NUM_DESCS * FRAME_SIZE;
+ i += FRAME_SIZE) {
+ struct umem_elem *elem;
+ uint64_t addr;
+
+ elem = umem_elem_pop(&xsk->umem->mpool);
Error check?
+ addr = UMEM2DESC(elem, xsk->umem->buffer);
+
+ *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx++) = addr;
+ }
+
+ xsk_ring_prod__submit(&xsk->umem->fq,
+ PROD_NUM_DESCS);
+ return xsk;
+}
+
+static struct xsk_socket_info *
+xsk_configure(int ifindex, int xdp_queue_id, int xdpmode)
+{
+ struct xsk_socket_info *xsk;
+ struct xsk_umem_info *umem;
+ void *bufs;
+ int ret;
+
+ /* umem memory region */
+ ret = posix_memalign(&bufs, get_page_size(),
+ NUM_FRAMES * FRAME_SIZE);
+ memset(bufs, 0, NUM_FRAMES * FRAME_SIZE);
+ ovs_assert(!ret);
We should not assert, just report out of memory and return NULL.
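Also note that the memset() above runs before the return value is
checked, so on failure it writes through an uninitialized pointer. A
minimal sketch of what I have in mind; umem_region_alloc() is a made-up
helper name, and sysconf() stands in for OVS's get_page_size(). The
caller would log the failure and return NULL:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Allocates a zeroed, page-aligned region holding 'n_frames' frames of
 * 'frame_size' bytes each.  Returns NULL on failure instead of
 * asserting; the memory is only touched after the allocation is known
 * to have succeeded. */
static void *
umem_region_alloc(size_t n_frames, size_t frame_size)
{
    long page_size = sysconf(_SC_PAGESIZE);
    void *bufs = NULL;
    size_t len;

    if (page_size <= 0 || !n_frames || !frame_size
        || n_frames > SIZE_MAX / frame_size) {
        return NULL;    /* Bad arguments or size overflow. */
    }
    len = n_frames * frame_size;

    if (posix_memalign(&bufs, (size_t) page_size, len)) {
        return NULL;    /* Out of memory. */
    }
    memset(bufs, 0, len);
    return bufs;
}
```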
+
+ /* create AF_XDP socket */
+ umem = xsk_configure_umem(bufs,
+ NUM_FRAMES * FRAME_SIZE,
+ xdpmode);
+ if (!umem) {
+ free(bufs);
+ return NULL;
+ }
+
+ xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id, xdpmode);
+ if (!xsk) {
+ /* clean up umem and xpacket pool */
+ (void)xsk_umem__delete(umem->umem);
+ free(bufs);
+ umem_pool_cleanup(&umem->mpool);
+ xpacket_pool_cleanup(&umem->xpool);
+ free(umem);
+ }
+ return xsk;
+}
+
+int
+xsk_configure_all(struct netdev *netdev)
+{
+ struct netdev_linux *dev = netdev_linux_cast(netdev);
+ struct xsk_socket_info *xsk;
+ int i, ifindex;
+
+ ifindex = linux_get_ifindex(netdev_get_name(netdev));
+
+ /* configure each queue */
+ for (i = 0; i < netdev->n_rxq; i++) {
+ VLOG_INFO("%s configure queue %d mode %s", __func__, i,
+ dev->xdpmode == XDP_COPY ? "SKB" : "DRV");
+ xsk = xsk_configure(ifindex, i, dev->xdpmode);
+ if (!xsk) {
+ VLOG_ERR("failed to create AF_XDP socket on queue %d",
i);
+ goto err;
+ }
+ dev->xsk[i] = xsk;
+ }
+
+ return 0;
+
+err:
+ xsk_destroy_all(netdev);
+ return EINVAL;
+}
+
+static void OVS_UNUSED vlog_hex_dump(const void *buf, size_t count)
+{
+ struct ds ds = DS_EMPTY_INITIALIZER;
+ ds_put_hex_dump(&ds, buf, count, 0, false);
+ VLOG_DBG_RL(&rl, "%s", ds_cstr(&ds));
+ ds_destroy(&ds);
+}
+
+static void
+xsk_destroy(struct xsk_socket_info *xsk)
+{
+ struct xsk_umem *umem;
+
+ if (!xsk) {
+ return;
+ }
+
+ umem = xsk->umem->umem;
+ xsk_socket__delete(xsk->xsk);
+ (void)xsk_umem__delete(umem);
I would log any errors here, especially if we ever support sharing of
umem.
+
+ /* free the packet buffer */
+ free(xsk->umem->buffer);
+
+ /* cleanup umem pool */
+ umem_pool_cleanup(&xsk->umem->mpool);
+
+ /* cleanup metadata pool */
+ xpacket_pool_cleanup(&xsk->umem->xpool);
+
+ free(xsk->umem);
+ free(xsk);
+}
+
+void
+xsk_destroy_all(struct netdev *netdev)
+{
+ struct netdev_linux *dev = netdev_linux_cast(netdev);
+ int i, ifindex;
+
+ ifindex = linux_get_ifindex(netdev_get_name(netdev));
+
+ for (i = 0; i < MAX_XSKQ; i++) {
+ if (dev->xsk[i]) {
+ VLOG_INFO("destroy xsk[%d]", i);
+ xsk_destroy(dev->xsk[i]);
+ dev->xsk[i] = NULL;
+ }
+ }
+ VLOG_INFO("remove xdp program");
+ xsk_remove_xdp_program(ifindex, dev->xdpmode);
+}
+
+static inline void OVS_UNUSED
+print_xsk_stat(struct xsk_socket_info *xsk OVS_UNUSED) {
+ struct xdp_statistics stat;
+ socklen_t optlen;
+
+ optlen = sizeof stat;
+ ovs_assert(getsockopt(xsk_socket__fd(xsk->xsk), SOL_XDP,
XDP_STATISTICS,
+ &stat, &optlen) == 0);
+
+ VLOG_DBG_RL(&rl, "rx dropped %llu, rx_invalid %llu, tx_invalid
%llu",
+ stat.rx_dropped,
+ stat.rx_invalid_descs,
+ stat.tx_invalid_descs);
+}
Do we want to move this to some specific statistics dump, like
"ovs-vsctl get Interface eno1 statistics"?
If you want to keep it, maybe rename it to log_xsk_stats().
+
+int
+netdev_afxdp_set_config(struct netdev *netdev, const struct smap
*args,
+ char **errp OVS_UNUSED)
+{
+ struct netdev_linux *dev = netdev_linux_cast(netdev);
+ const char *xdpmode;
+ int new_n_rxq;
+
+ ovs_mutex_lock(&dev->mutex);
+
+ new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1);
+ if (new_n_rxq > MAX_XSKQ) {
+ ovs_mutex_unlock(&dev->mutex);
+ return EINVAL;
+ }
+
+ if (new_n_rxq != netdev->n_rxq) {
+ dev->requested_n_rxq = new_n_rxq;
+ netdev_request_reconfigure(netdev);
+ }
+
+ xdpmode = smap_get(args, "xdpmode");
+ if (xdpmode && strncmp(xdpmode, "drv", 3) == 0) {
+ dev->requested_xdpmode = XDP_ZEROCOPY;
+ if (dev->xdpmode != dev->requested_xdpmode) {
+ netdev_request_reconfigure(netdev);
+ }
+ } else {
+ dev->requested_xdpmode = XDP_COPY;
+ if (dev->xdpmode != dev->requested_xdpmode) {
+ netdev_request_reconfigure(netdev);
+ }
+ }
+ ovs_mutex_unlock(&dev->mutex);
+ return 0;
+}
+
+int
+netdev_afxdp_get_config(const struct netdev *netdev, struct smap
*args)
+{
+ struct netdev_linux *dev = netdev_linux_cast(netdev);
+
+ ovs_mutex_lock(&dev->mutex);
+ smap_add_format(args, "n_rxq", "%d", netdev->n_rxq);
+ smap_add_format(args, "xdpmode", "%s",
+ dev->xdp_bind_flags == XDP_ZEROCOPY ? "drv" : "skb");
+ ovs_mutex_unlock(&dev->mutex);
+ return 0;
+}
+
+int
+netdev_afxdp_reconfigure(struct netdev *netdev)
+{
+ struct netdev_linux *dev = netdev_linux_cast(netdev);
+ struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
+ int err = 0;
+
+ ovs_mutex_lock(&dev->mutex);
+
+ if (netdev->n_rxq == dev->requested_n_rxq
+ && dev->xdpmode == dev->requested_xdpmode) {
+ goto out;
+ }
+
+ xsk_destroy_all(netdev);
+ netdev->n_rxq = dev->requested_n_rxq;
+
+ if (dev->requested_xdpmode == XDP_ZEROCOPY) {
+ VLOG_INFO("AF_XDP device %s in DRV mode",
netdev_get_name(netdev));
+ /* From SKB mode to DRV mode */
+ dev->xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST |
XDP_FLAGS_DRV_MODE;
+ dev->xdp_bind_flags = XDP_ZEROCOPY;
+ dev->xdpmode = XDP_ZEROCOPY;
+
+ if (setrlimit(RLIMIT_MEMLOCK, &r)) {
+ VLOG_ERR("ERROR: setrlimit(RLIMIT_MEMLOCK): %s",
+ ovs_strerror(errno));
+ }
+ } else {
+ VLOG_INFO("AF_XDP device %s in SKB mode",
netdev_get_name(netdev));
+ /* From DRV mode to SKB mode */
+ dev->xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST |
XDP_FLAGS_SKB_MODE;
+ dev->xdp_bind_flags = XDP_COPY;
+ dev->xdpmode = XDP_COPY;
+ /* TODO: set rlimit back to previous value
+ * when no device is in DRV mode.
+ */
+ }
+
+ err = xsk_configure_all(netdev);
+ if (err) {
+ VLOG_ERR("AF_XDP device %s reconfig fails",
netdev_get_name(netdev));
+ }
+ netdev_change_seq_changed(netdev);
+out:
+ ovs_mutex_unlock(&dev->mutex);
+ return err;
+}
+
+int
+netdev_afxdp_get_numa_id(const struct netdev *netdev)
+{
+ /* FIXME: Get netdev's PCIe device ID, then find
+ * its NUMA node id.
+ */
+ VLOG_INFO("FIXME: Device %s always use numa id 0",
+ netdev_get_name(netdev));
+ return 0;
+}
+
+void
+xsk_remove_xdp_program(uint32_t ifindex, int xdpmode)
+{
+ uint32_t curr_prog_id = 0;
+ uint32_t flags;
+
+ /* remove_xdp_program() */
+ if (xdpmode == XDP_COPY) {
+ flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE;
+ } else {
+ flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE;
+ }
+
+ if (bpf_get_link_xdp_id(ifindex, &curr_prog_id, flags)) {
+ bpf_set_link_xdp_fd(ifindex, -1, flags);
+ }
+ if (prog_id == curr_prog_id) {
+ bpf_set_link_xdp_fd(ifindex, -1, flags);
+ } else if (!curr_prog_id) {
+ VLOG_INFO("couldn't find a prog id on a given interface");
+ } else {
+ VLOG_INFO("program on interface changed, not removing");
+ }
+}
+
+struct dp_packet_afxdp *
+dp_packet_cast_afxdp(const struct dp_packet *d)
+{
+ ovs_assert(d->source == DPBUF_AFXDP);
+ return CONTAINER_OF(d, struct dp_packet_afxdp, packet);
+}
+
+void
+free_afxdp_buf(struct dp_packet *p)
+{
+ struct dp_packet_afxdp *xpacket;
+ unsigned long addr;
+
+ xpacket = dp_packet_cast_afxdp(p);
+ if (xpacket->mpool) {
+ void *base = dp_packet_base(p);
+
+ addr = (unsigned long)base & (~FRAME_SHIFT_MASK);
+ umem_elem_push(xpacket->mpool, (void *)addr);
+ }
+}
+
+void
+free_afxdp_buf_batch(struct dp_packet_batch *batch)
+{
+ struct dp_packet_afxdp *xpacket = NULL;
+ struct dp_packet *packet;
+ void *elems[BATCH_SIZE];
+ unsigned long addr;
+
+ /* all packets are AF_XDP, so handles its own delete in batch
*/
+ DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
+ xpacket = dp_packet_cast_afxdp(packet);
+ if (xpacket->mpool) {
+ void *base = dp_packet_base(packet);
+
+ addr = (unsigned long)base & (~FRAME_SHIFT_MASK);
+ elems[i] = (void *)addr;
+ }
+ }
+ umem_elem_push_n(xpacket->mpool, batch->count, elems);
+ dp_packet_batch_init(batch);
+}
+
+/* Receive packet from AF_XDP socket */
+int
+netdev_linux_rxq_xsk(struct xsk_socket_info *xsk,
+ struct dp_packet_batch *batch)
+{
+ struct umem_elem *elems[BATCH_SIZE];
+ uint32_t idx_rx = 0, idx_fq = 0;
+ unsigned int rcvd, i;
+ int ret = 0;
+
+ /* See if there is any packet on RX queue,
+ * if yes, idx_rx is the index having the packet.
+ */
+ rcvd = xsk_ring_cons__peek(&xsk->rx, BATCH_SIZE, &idx_rx);
+ if (!rcvd) {
+ return 0;
+ }
+
+ /* Form a dp_packet batch from descriptor in RX queue */
+ for (i = 0; i < rcvd; i++) {
+ uint64_t addr = xsk_ring_cons__rx_desc(&xsk->rx,
idx_rx)->addr;
+ uint32_t len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->len;
+ char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
+ uint64_t index;
+
+ struct dp_packet_afxdp *xpacket;
+ struct dp_packet *packet;
+
+ index = addr >> FRAME_SHIFT;
+ xpacket = UMEM2XPKT(xsk->umem->xpool.array, index);
+
+ packet = &xpacket->packet;
+ xpacket->mpool = &xsk->umem->mpool;
Do we need to set this up again? This should be static and set up in
xsk_configure_umem()
+
+ /* Initialize the struct dp_packet */
+ dp_packet_use_afxdp(packet, pkt, FRAME_SIZE -
FRAME_HEADROOM);
+ dp_packet_set_size(packet, len);
+
+ /* Add packet into batch, increase batch->count */
+ dp_packet_batch_add(batch, packet);
+
+ idx_rx++;
+ }
+
+ /* We've consume rcvd packets in RX, now re-fill the
+ * same number back to FILL queue.
+ */
+ ret = umem_elem_pop_n(&xsk->umem->mpool, rcvd, (void **)elems);
+ if (OVS_UNLIKELY(ret)) {
+ return -ENOMEM;
+ }
+
I saw Ilya's comments on this section also, but should we not continue
to process the batch even if we can't stock the kernel with new buffers?
Maybe other PMDs have a bunch of packets pending (send and receive), so
we are only temporarily out of buffers.
Maybe we can re-stock later...
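As a rough illustration of the "re-stock later" idea, here is a toy model
with plain counters instead of the real rings. The struct, its fields and
rxq_refill() are all hypothetical; the point is only that a shortfall is
remembered as debt and paid back on a later call, instead of failing the
whole receive:

```c
#include <assert.h>

/* Toy model of one rx queue (hypothetical, counters only). */
struct rxq_sim {
    unsigned int pool_free;   /* Free frames in the umem pool. */
    unsigned int fq_entries;  /* Frames handed to the kernel (FILL ring). */
    unsigned int fill_debt;   /* Refills we still owe the kernel. */
};

/* Tries to give the kernel 'rcvd' new frames plus any earlier debt.
 * Whatever the pool cannot provide right now is carried over to the
 * next call.  Returns the number of frames actually refilled. */
static unsigned int
rxq_refill(struct rxq_sim *q, unsigned int rcvd)
{
    unsigned int want = rcvd + q->fill_debt;
    unsigned int got = want < q->pool_free ? want : q->pool_free;

    q->pool_free -= got;
    q->fq_entries += got;
    q->fill_debt = want - got;
    return got;
}
```

The received batch would be passed up in both cases; only the refill is
deferred.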
+ for (i = 0; i < rcvd; i++) {
+ uint64_t index;
+ struct umem_elem *elem;
+
+ ret = xsk_ring_prod__reserve(&xsk->umem->fq, 1, &idx_fq);
+ while (OVS_UNLIKELY(ret == 0)) {
+ /* The FILL queue is full, so retry. (or skip)? */
+ ret = xsk_ring_prod__reserve(&xsk->umem->fq, 1, &idx_fq);
+ }
+
+ /* Get one free umem, program it into FILL queue */
+ elem = elems[i];
+ index = (uint64_t)((char *)elem - (char *)xsk->umem->buffer);
+ ovs_assert((index & FRAME_SHIFT_MASK) == 0);
+ *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq) = index;
+
+ idx_fq++;
+ }
+ xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
+
+ /* Release the RX queue */
+ xsk_ring_cons__release(&xsk->rx, rcvd);
We should move this further up, so the entries are available for the
kernel to fill...
+ xsk->rx_npkts += rcvd;
+
+#ifdef AFXDP_DEBUG
+ print_xsk_stat(xsk);
+#endif
+ return 0;
+}
+
+static inline int kick_tx(struct xsk_socket_info *xsk)
+{
+ int ret;
+
+ /* This causes system call into kernel's xsk_sendmsg, and
+ * xsk_generic_xmit (skb mode) or xsk_async_xmit (driver mode).
+ */
+ ret = sendto(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT,
NULL, 0);
+ if (OVS_UNLIKELY(ret < 0)) {
+ if (errno == ENXIO || errno == ENOBUFS || errno ==
EOPNOTSUPP) {
+ return errno;
+ }
+ }
+ /* no error, or EBUSY or EAGAIN */
+ return 0;
+}
+
+int
+netdev_linux_afxdp_batch_send(struct xsk_socket_info *xsk,
+ struct dp_packet_batch *batch)
+{
See Ilya's comment on thread safety on the ring APIs.
+ struct umem_elem *elems_pop[BATCH_SIZE];
+ struct umem_elem *elems_push[BATCH_SIZE];
+ uint32_t tx_done, idx_cq = 0;
+ struct dp_packet *packet;
+ uint32_t idx = 0;
+ int j, ret, retry_count = 0;
+ const int max_retry = 4;
+
+ ret = umem_elem_pop_n(&xsk->umem->mpool, batch->count, (void
**)elems_pop);
+ if (OVS_UNLIKELY(ret)) {
+ return EAGAIN;
+ }
+
+ /* Make sure we have enough TX descs */
+ ret = xsk_ring_prod__reserve(&xsk->tx, batch->count, &idx);
+ if (OVS_UNLIKELY(ret == 0)) {
+ umem_elem_push_n(&xsk->umem->mpool, batch->count, (void
**)elems_pop);
+ return EAGAIN;
+ }
+
+ DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
+ struct umem_elem *elem;
+ uint64_t index;
+
+ elem = elems_pop[i];
+ /* Copy the packet to the umem we just pop from umem pool.
+ * We can avoid this copy if the packet and the pop umem
+ * are located in the same umem.
+ */
The comment mentions the copy can be avoided, but it's not implemented
in the code, is this correct or was something removed?
+ memcpy(elem, dp_packet_data(packet), dp_packet_size(packet));
+
+ index = (uint64_t)((char *)elem - (char *)xsk->umem->buffer);
+ xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->addr = index;
+ xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->len
+ = dp_packet_size(packet);
+ }
+ xsk_ring_prod__submit(&xsk->tx, batch->count);
+ xsk->outstanding_tx += batch->count;
+
+ ret = kick_tx(xsk);
+ if (OVS_UNLIKELY(ret)) {
+ umem_elem_push_n(&xsk->umem->mpool, batch->count, (void
**)elems_pop);
+ VLOG_WARN_RL(&rl, "error sending AF_XDP packet: %s",
+ ovs_strerror(ret));
+ return ret;
I think we should still try to recover the CQ below, even on failure.
+ }
+
+retry:
+ /* Process CQ */
+ tx_done = xsk_ring_cons__peek(&xsk->umem->cq, batch->count,
&idx_cq);
+ if (tx_done > 0) {
+ xsk->outstanding_tx -= tx_done;
+ xsk->tx_npkts += tx_done;
+ }
+
+ /* Recycle back to umem pool */
+ for (j = 0; j < tx_done; j++) {
+ struct umem_elem *elem;
+ uint64_t addr;
+
+ addr = *xsk_ring_cons__comp_addr(&xsk->umem->cq, idx_cq++);
+
+ elem = ALIGNED_CAST(struct umem_elem *,
+ (char *)xsk->umem->buffer + addr);
+ elems_push[j] = elem;
+ }
+
+ ret = umem_elem_push_n(&xsk->umem->mpool, tx_done, (void
**)elems_push);
+ ovs_assert(ret == 0);
+
+ xsk_ring_cons__release(&xsk->umem->cq, tx_done);
+
+ if (xsk->outstanding_tx > PROD_NUM_DESCS - (PROD_NUM_DESCS >> 2))
{
+ /* If there are still a lot not transmitted, try harder. */
+ if (retry_count++ > max_retry) {
+ return 0;
+ }
+ goto retry;
+ }
+
I think the code above is causing the lockup at wire speed that I
mentioned above...
I guess the retry_count expires on every transmit when sending packets
to the TAP interface.
Now all buffers are in use... This causes the umem_elem_pop_n() at the
beginning to fail, hence the buffers are never returned!
Guess we might need some reclaim at the beginning, or maybe even in the
rx loop?
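To show the suspected failure mode, here is a toy model of the send path
using counters only. It does not touch the real rings; all names are made
up, and it assumes my reading of the code is right. Once the pool is
empty, the early EAGAIN return happens before the completion ring is ever
looked at, so completed frames are never recycled:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model of the TX side (hypothetical, counters only). */
struct tx_sim {
    unsigned int pool_free;    /* Free frames in the umem pool. */
    unsigned int outstanding;  /* Submitted to TX, not yet completed. */
    unsigned int completed;    /* In the completion ring, not reaped. */
};

/* Kernel side: finishes up to 'n' outstanding transmits. */
static void
sim_kernel_complete(struct tx_sim *s, unsigned int n)
{
    if (n > s->outstanding) {
        n = s->outstanding;
    }
    s->outstanding -= n;
    s->completed += n;
}

/* Reaps the completion ring and returns the frames to the pool. */
static void
sim_reclaim(struct tx_sim *s)
{
    s->pool_free += s->completed;
    s->completed = 0;
}

static int
sim_batch_send(struct tx_sim *s, unsigned int count, bool reclaim_first)
{
    if (reclaim_first) {
        sim_reclaim(s);         /* Proposed: reap before popping. */
    }
    if (s->pool_free < count) {
        return EAGAIN;          /* As in the patch: CQ not looked at. */
    }
    s->pool_free -= count;
    s->outstanding += count;
    sim_reclaim(s);             /* Reaps only what is already done. */
    return 0;
}
```

With a pool of 4 frames, one send of 4 followed by the kernel completing
them leaves sim_batch_send(..., false) returning EAGAIN forever, while
sim_batch_send(..., true) recovers.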
+ return 0;
+}
diff --git a/lib/netdev-afxdp.h b/lib/netdev-afxdp.h
new file mode 100644
index 000000000000..6518d8fca0b5
--- /dev/null
+++ b/lib/netdev-afxdp.h
@@ -0,0 +1,53 @@
+/*
+ * Copyright (c) 2018 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied.
+ * See the License for the specific language governing permissions
and
+ * limitations under the License.
+ */
+
+#ifndef NETDEV_AFXDP_H
+#define NETDEV_AFXDP_H 1
+
+#include <stdint.h>
+#include <stdbool.h>
+
+/* These functions are Linux AF_XDP specific, so they should be used
directly
+ * only by Linux-specific code. */
Extra enter?
+#define MAX_XSKQ 16
Extra enter?
+struct netdev;
+struct xsk_socket_info;
+struct xdp_umem;
+struct dp_packet_batch;
+struct smap;
+struct dp_packet;
+
+struct dp_packet_afxdp * dp_packet_cast_afxdp(const struct dp_packet
*d);
+
+int xsk_configure_all(struct netdev *netdev);
+
+void xsk_destroy_all(struct netdev *netdev);
+
+int netdev_linux_rxq_xsk(struct xsk_socket_info *xsk,
+ struct dp_packet_batch *batch);
+
+int netdev_linux_afxdp_batch_send(struct xsk_socket_info *xsk,
+ struct dp_packet_batch *batch);
+
+int netdev_afxdp_set_config(struct netdev *netdev, const struct smap
*args,
+ char **errp);
+int netdev_afxdp_get_config(const struct netdev *netdev, struct smap
*args);
+int netdev_afxdp_get_numa_id(const struct netdev *netdev);
+
+void free_afxdp_buf(struct dp_packet *p);
+void free_afxdp_buf_batch(struct dp_packet_batch *batch);
+int netdev_afxdp_reconfigure(struct netdev *netdev);
+#endif /* netdev-afxdp.h */
diff --git a/lib/netdev-linux-private.h b/lib/netdev-linux-private.h
new file mode 100644
index 000000000000..3dd3d902b3c4
--- /dev/null
+++ b/lib/netdev-linux-private.h
@@ -0,0 +1,124 @@
+/*
+ * Copyright (c) 2019 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied.
+ * See the License for the specific language governing permissions
and
+ * limitations under the License.
+ */
+
+#ifndef NETDEV_LINUX_PRIVATE_H
+#define NETDEV_LINUX_PRIVATE_H 1
+
+#include <config.h>
+
+#include <linux/filter.h>
+#include <linux/gen_stats.h>
+#include <linux/if_ether.h>
+#include <linux/if_tun.h>
+#include <linux/types.h>
+#include <linux/ethtool.h>
+#include <linux/mii.h>
+#include <stdint.h>
+#include <stdbool.h>
+
+#include "netdev-provider.h"
+#include "netdev-tc-offloads.h"
+#include "netdev-vport.h"
+#include "openvswitch/thread.h"
+#include "ovs-atomic.h"
+#include "timer.h"
Why include all the above? They were just added to netdev-linux.h, so
if you make sure you include netdev-linux.h before -private it should
work out.
+
+#if HAVE_AF_XDP
+#include "netdev-afxdp.h"
+#endif
See earlier comment
+
+/* These functions are Linux specific, so they should be used
directly only by
+ * Linux-specific code. */
+
+struct netdev;
+
+int netdev_linux_ethtool_set_flag(struct netdev *netdev, uint32_t
flag,
+ const char *flag_name, bool
enable);
+int linux_get_ifindex(const char *netdev_name);
+
These functions are now declared in both netdev-linux.h and
netdev-linux-private.h
+#define LINUX_FLOW_OFFLOAD_API \
+ .flow_flush = netdev_tc_flow_flush, \
+ .flow_dump_create = netdev_tc_flow_dump_create, \
+ .flow_dump_destroy = netdev_tc_flow_dump_destroy, \
+ .flow_dump_next = netdev_tc_flow_dump_next, \
+ .flow_put = netdev_tc_flow_put, \
+ .flow_get = netdev_tc_flow_get, \
+ .flow_del = netdev_tc_flow_del, \
+ .init_flow_api = netdev_tc_init_flow_api
+
Same here, this define is in both include files.
+struct netdev_linux {
+ struct netdev up;
+
+ /* Protects all members below. */
+ struct ovs_mutex mutex;
+
+ unsigned int cache_valid;
+
+ bool miimon; /* Link status of last poll. */
+ long long int miimon_interval; /* Miimon Poll rate. Disabled if
<= 0. */
+ struct timer miimon_timer;
+
+ int netnsid; /* Network namespace ID. */
+ /* The following are figured out "on demand" only. They are only
valid
+ * when the corresponding VALID_* bit in 'cache_valid' is set. */
+ int ifindex;
+ struct eth_addr etheraddr;
+ int mtu;
+ unsigned int ifi_flags;
+ long long int carrier_resets;
+ uint32_t kbits_rate; /* Policing data. */
+ uint32_t kbits_burst;
+ int vport_stats_error; /* Cached error code from
vport_get_stats().
+ 0 or an errno value. */
+ int netdev_mtu_error; /* Cached error code from SIOCGIFMTU
+ * or SIOCSIFMTU.
+ */
+ int ether_addr_error; /* Cached error code from set/get
etheraddr. */
+ int netdev_policing_error; /* Cached error code from set
policing. */
+ int get_features_error; /* Cached error code from
ETHTOOL_GSET. */
+ int get_ifindex_error; /* Cached error code from
SIOCGIFINDEX. */
+
+ enum netdev_features current; /* Cached from ETHTOOL_GSET. */
+ enum netdev_features advertised; /* Cached from ETHTOOL_GSET. */
+ enum netdev_features supported; /* Cached from ETHTOOL_GSET. */
+
+ struct ethtool_drvinfo drvinfo; /* Cached from ETHTOOL_GDRVINFO.
*/
+ struct tc *tc;
+
+ /* For devices of class netdev_tap_class only. */
+ int tap_fd;
+ bool present; /* If the device is present in the
namespace */
+ uint64_t tx_dropped; /* tap device can drop if the iface
is down */
+
+ /* LAG information. */
+ bool is_lag_master; /* True if the netdev is a LAG
master. */
+
+ /* AF_XDP information */
+#ifdef HAVE_AF_XDP
+ struct xsk_socket_info *xsk[MAX_XSKQ];
+ int requested_n_rxq;
+ int xdpmode, requested_xdpmode; /* detect mode changed */
+ int xdp_flags, xdp_bind_flags;
+#endif
+};
+
+static struct netdev_linux *
+netdev_linux_cast(const struct netdev *netdev)
+{
In the original definition there was an assert() here, was it removed by
accident?
+ return CONTAINER_OF(netdev, struct netdev_linux, up);
+}
+
+#endif /* netdev-linux-private.h */
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index f75d73fd39f8..1f190406d145 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -17,6 +17,7 @@
#include <config.h>
#include "netdev-linux.h"
+#include "netdev-linux-private.h"
#include <errno.h>
#include <fcntl.h>
@@ -54,6 +55,7 @@
#include "fatal-signal.h"
#include "hash.h"
#include "openvswitch/hmap.h"
+#include "netdev-afxdp.h"
#include "netdev-provider.h"
#include "netdev-tc-offloads.h"
#include "netdev-vport.h"
@@ -487,51 +489,6 @@ static int tc_calc_cell_log(unsigned int mtu);
static void tc_fill_rate(struct tc_ratespec *rate, uint64_t bps, int
mtu);
static int tc_calc_buffer(unsigned int Bps, int mtu, uint64_t
burst_bytes);
-struct netdev_linux {
- struct netdev up;
-
- /* Protects all members below. */
- struct ovs_mutex mutex;
-
- unsigned int cache_valid;
-
- bool miimon; /* Link status of last poll. */
- long long int miimon_interval; /* Miimon Poll rate. Disabled if
<= 0. */
- struct timer miimon_timer;
-
- int netnsid; /* Network namespace ID. */
- /* The following are figured out "on demand" only. They are only
valid
- * when the corresponding VALID_* bit in 'cache_valid' is set. */
- int ifindex;
- struct eth_addr etheraddr;
- int mtu;
- unsigned int ifi_flags;
- long long int carrier_resets;
- uint32_t kbits_rate; /* Policing data. */
- uint32_t kbits_burst;
- int vport_stats_error; /* Cached error code from
vport_get_stats().
- 0 or an errno value. */
- int netdev_mtu_error; /* Cached error code from SIOCGIFMTU
or SIOCSIFMTU. */
- int ether_addr_error; /* Cached error code from set/get
etheraddr. */
- int netdev_policing_error; /* Cached error code from set
policing. */
- int get_features_error; /* Cached error code from
ETHTOOL_GSET. */
- int get_ifindex_error; /* Cached error code from
SIOCGIFINDEX. */
-
- enum netdev_features current; /* Cached from ETHTOOL_GSET. */
- enum netdev_features advertised; /* Cached from ETHTOOL_GSET. */
- enum netdev_features supported; /* Cached from ETHTOOL_GSET. */
-
- struct ethtool_drvinfo drvinfo; /* Cached from ETHTOOL_GDRVINFO.
*/
- struct tc *tc;
-
- /* For devices of class netdev_tap_class only. */
- int tap_fd;
- bool present; /* If the device is present in the
namespace */
- uint64_t tx_dropped; /* tap device can drop if the iface
is down */
-
- /* LAG information. */
- bool is_lag_master; /* True if the netdev is a LAG
master. */
-};
struct netdev_rxq_linux {
struct netdev_rxq up;
@@ -579,18 +536,23 @@ is_netdev_linux_class(const struct netdev_class
*netdev_class)
return netdev_class->run == netdev_linux_run;
}
+#if HAVE_AF_XDP
static bool
-is_tap_netdev(const struct netdev *netdev)
+is_afxdp_netdev(const struct netdev *netdev)
{
- return netdev_get_class(netdev) == &netdev_tap_class;
+ return netdev_get_class(netdev) == &netdev_afxdp_class;
}
-
-static struct netdev_linux *
-netdev_linux_cast(const struct netdev *netdev)
+#else
+static bool
+is_afxdp_netdev(const struct netdev *netdev OVS_UNUSED)
{
- ovs_assert(is_netdev_linux_class(netdev_get_class(netdev)));
-
- return CONTAINER_OF(netdev, struct netdev_linux, up);
+ return false;
+}
+#endif
+static bool
+is_tap_netdev(const struct netdev *netdev)
+{
+ return netdev_get_class(netdev) == &netdev_tap_class;
}
static struct netdev_rxq_linux *
@@ -1084,6 +1046,11 @@ netdev_linux_destruct(struct netdev *netdev_)
atomic_count_dec(&miimon_cnt);
}
+#if HAVE_AF_XDP
+ if (is_afxdp_netdev(netdev_)) {
+ xsk_destroy_all(netdev_);
+ }
+#endif
I think you can remove the #if HAVE_AF_XDP here, as you do not use it
below either.
ovs_mutex_destroy(&netdev->mutex);
}
@@ -1113,7 +1080,7 @@ netdev_linux_rxq_construct(struct netdev_rxq
*rxq_)
rx->is_tap = is_tap_netdev(netdev_);
if (rx->is_tap) {
rx->fd = netdev->tap_fd;
- } else {
+ } else if (!is_afxdp_netdev(netdev_)) {
struct sockaddr_ll sll;
int ifindex, val;
/* Result of tcpdump -dd inbound */
@@ -1318,10 +1285,18 @@ netdev_linux_rxq_recv(struct netdev_rxq *rxq_,
struct dp_packet_batch *batch,
{
struct netdev_rxq_linux *rx = netdev_rxq_linux_cast(rxq_);
struct netdev *netdev = rx->up.netdev;
- struct dp_packet *buffer;
+ struct dp_packet *buffer = NULL;
ssize_t retval;
int mtu;
+#if HAVE_AF_XDP
I think this #if HAVE_AF_XDP can be removed, as the compiler should
optimize out the if (false).
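A minimal sketch of the pattern, with hypothetical _sketch names in
place of the real OVS ones: the predicate has a stub that returns a
constant false when AF_XDP is not built in, so the call site needs no
preprocessor guard. One caveat: the dead branch still has to compile,
so anything it references must be declared in both configurations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct netdev; only the class tag matters. */
struct netdev_sketch { int class_id; };
enum { CLASS_TAP_SKETCH = 1, CLASS_AFXDP_SKETCH = 2 };

#ifdef HAVE_AF_XDP
static bool
is_afxdp_sketch(const struct netdev_sketch *netdev)
{
    return netdev->class_id == CLASS_AFXDP_SKETCH;
}
#else
static bool
is_afxdp_sketch(const struct netdev_sketch *netdev)
{
    (void) netdev;
    return false;       /* Constant, so the caller's branch can fold away. */
}
#endif

/* Returns 1 when the AF_XDP receive path would be taken, 0 otherwise. */
static int
rxq_recv_sketch(const struct netdev_sketch *netdev)
{
    assert(netdev != NULL);

    if (is_afxdp_sketch(netdev)) {      /* No #if needed at the call site. */
        return 1;
    }
    return 0;
}
```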
+ if (is_afxdp_netdev(netdev)) {
+ struct netdev_linux *dev = netdev_linux_cast(netdev);
+ int qid = rxq_->queue_id;
+
+ return netdev_linux_rxq_xsk(dev->xsk[qid], batch);
+ }
+#endif
if (netdev_linux_get_mtu__(netdev_linux_cast(netdev), &mtu)) {
mtu = ETH_PAYLOAD_MAX;
}
@@ -1329,6 +1304,7 @@ netdev_linux_rxq_recv(struct netdev_rxq *rxq_,
struct dp_packet_batch *batch,
/* Assume Ethernet port. No need to set packet_type. */
buffer = dp_packet_new_with_headroom(VLAN_ETH_HEADER_LEN + mtu,
DP_NETDEV_HEADROOM);
+
retval = (rx->is_tap
? netdev_linux_rxq_recv_tap(rx->fd, buffer)
: netdev_linux_rxq_recv_sock(rx->fd, buffer));
@@ -1480,7 +1456,8 @@ netdev_linux_send(struct netdev *netdev_, int
qid OVS_UNUSED,
int error = 0;
int sock = 0;
- if (!is_tap_netdev(netdev_)) {
+ if (!is_tap_netdev(netdev_) &&
+ !is_afxdp_netdev(netdev_)) {
if
(netdev_linux_netnsid_is_remote(netdev_linux_cast(netdev_))) {
error = EOPNOTSUPP;
goto free_batch;
@@ -1499,6 +1476,36 @@ netdev_linux_send(struct netdev *netdev_, int
qid OVS_UNUSED,
}
error = netdev_linux_sock_batch_send(sock, ifindex, batch);
+#if HAVE_AF_XDP
Same here, remove the #if HAVE_AF_XDP.
+ } else if (is_afxdp_netdev(netdev_)) {
+ struct netdev_linux *dev = netdev_linux_cast(netdev_);
+ struct dp_packet_afxdp *xpacket;
+ struct umem_pool *first_mpool;
+ struct dp_packet *packet;
+
+ error = netdev_linux_afxdp_batch_send(dev->xsk[qid], batch);
+
+ /* all packets must come frome the same umem pool
+ * and has DPBUF_AFXDP type, otherwise free on-by-one
+ */
+ DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
+ if (packet->source != DPBUF_AFXDP) {
+ goto free_batch;
+ }
+
+ xpacket = dp_packet_cast_afxdp(packet);
+ if (i == 0) {
+ first_mpool = xpacket->mpool;
+ continue;
+ }
+ if (xpacket->mpool != first_mpool) {
+ goto free_batch;
+ }
+ }
Why don't we move all the packet type checks into
free_afxdp_buf_batch()?
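As a rough sketch of what that could look like, the check could become
one predicate that the free routine calls first. The types and names
below are hypothetical simplifications, not the dp_packet structures
from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the dp_packet types. */
enum source_sketch { SRC_MALLOC_SKETCH, SRC_AFXDP_SKETCH };
struct packet_sketch {
    enum source_sketch source;
    const void *mpool;          /* umem pool the buffer came from. */
};

/* Returns true if every packet is an AF_XDP buffer from a single umem
 * pool, i.e. the whole batch can be returned to that pool at once. */
static bool
batch_is_single_pool_sketch(const struct packet_sketch *pkts, size_t n)
{
    const void *first_mpool = NULL;

    assert(pkts != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (pkts[i].source != SRC_AFXDP_SKETCH) {
            return false;
        }
        if (i == 0) {
            first_mpool = pkts[i].mpool;
        } else if (pkts[i].mpool != first_mpool) {
            return false;
        }
    }
    return true;
}
```

The send path would then shrink to a single call, with the fallback to
freeing packets one by one decided inside the helper.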
+ /* free in batch */
+ free_afxdp_buf_batch(batch);
+ return error;
+#endif
} else {
error = netdev_linux_tap_batch_send(netdev_, batch);
}
@@ -3323,6 +3330,7 @@ const struct netdev_class netdev_linux_class = {
NETDEV_LINUX_CLASS_COMMON,
LINUX_FLOW_OFFLOAD_API,
.type = "system",
+ .is_pmd = false,
.construct = netdev_linux_construct,
.get_stats = netdev_linux_get_stats,
.get_features = netdev_linux_get_features,
@@ -3333,6 +3341,7 @@ const struct netdev_class netdev_linux_class = {
const struct netdev_class netdev_tap_class = {
NETDEV_LINUX_CLASS_COMMON,
.type = "tap",
+ .is_pmd = false,
.construct = netdev_linux_construct_tap,
.get_stats = netdev_tap_get_stats,
.get_features = netdev_linux_get_features,
@@ -3343,10 +3352,26 @@ const struct netdev_class
netdev_internal_class = {
NETDEV_LINUX_CLASS_COMMON,
LINUX_FLOW_OFFLOAD_API,
.type = "internal",
+ .is_pmd = false,
.construct = netdev_linux_construct,
.get_stats = netdev_internal_get_stats,
.get_status = netdev_internal_get_status,
};
+
+#ifdef HAVE_AF_XDP
+const struct netdev_class netdev_afxdp_class = {
+ NETDEV_LINUX_CLASS_COMMON,
+ .type = "afxdp",
+ .is_pmd = true,
+ .construct = netdev_linux_construct,
+ .get_stats = netdev_linux_get_stats,
+ .get_status = netdev_linux_get_status,
+ .set_config = netdev_afxdp_set_config,
+ .get_config = netdev_afxdp_get_config,
+ .reconfigure = netdev_afxdp_reconfigure,
+ .get_numa_id = netdev_afxdp_get_numa_id,
+};
+#endif
#define CODEL_N_QUEUES 0x0000
diff --git a/lib/netdev-linux.h b/lib/netdev-linux.h
index 17ca9120168a..b812e64cb078 100644
--- a/lib/netdev-linux.h
+++ b/lib/netdev-linux.h
@@ -19,6 +19,20 @@
#include <stdint.h>
#include <stdbool.h>
+#include <linux/filter.h>
+#include <linux/gen_stats.h>
+#include <linux/if_ether.h>
+#include <linux/if_tun.h>
+#include <linux/types.h>
+#include <linux/ethtool.h>
+#include <linux/mii.h>
+
+#include "netdev-provider.h"
+#include "netdev-tc-offloads.h"
+#include "netdev-vport.h"
+#include "openvswitch/thread.h"
+#include "ovs-atomic.h"
+#include "timer.h"
Is there a reason why you move all these includes here? If there is you
might as well remove the duplicates from .c files that include
netdev-linux.h, for example, netdev-linux.c
/* These functions are Linux specific, so they should be used
directly only by
* Linux-specific code. */
diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
index fb0c27e6e8e8..d433818f7064 100644
--- a/lib/netdev-provider.h
+++ b/lib/netdev-provider.h
@@ -902,7 +902,9 @@ extern const struct netdev_class
netdev_linux_class;
#endif
extern const struct netdev_class netdev_internal_class;
extern const struct netdev_class netdev_tap_class;
-
+#if HAVE_AF_XDP
+extern const struct netdev_class netdev_afxdp_class;
+#endif
#ifdef __cplusplus
}
#endif
diff --git a/lib/netdev.c b/lib/netdev.c
index 7d7ecf6f0946..e2fae37d5a5e 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -146,6 +146,9 @@ netdev_initialize(void)
netdev_register_provider(&netdev_internal_class);
netdev_register_provider(&netdev_tap_class);
netdev_vport_tunnel_register();
+#ifdef HAVE_AF_XDP
+ netdev_register_provider(&netdev_afxdp_class);
+#endif
#endif
#if defined(__FreeBSD__) || defined(__NetBSD__)
netdev_register_provider(&netdev_tap_class);
diff --git a/lib/xdpsock.c b/lib/xdpsock.c
new file mode 100644
index 000000000000..2d80e74d69e4
--- /dev/null
+++ b/lib/xdpsock.c
@@ -0,0 +1,239 @@
+/*
+ * Copyright (c) 2018, 2019 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied.
+ * See the License for the specific language governing permissions
and
+ * limitations under the License.
+ */
+#include <config.h>
+
+#include "xdpsock.h"
+
+#include <ctype.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <stdarg.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <syslog.h>
+#include <time.h>
+#include <unistd.h>
+
+#include "async-append.h"
+#include "coverage.h"
+#include "dirs.h"
+#include "dp-packet.h"
+#include "openvswitch/compiler.h"
+#include "openvswitch/vlog.h"
+#include "ovs-atomic.h"
+#include "ovs-thread.h"
+#include "sat-math.h"
+#include "socket-util.h"
+#include "svec.h"
+#include "syslog-direct.h"
+#include "syslog-libc.h"
+#include "syslog-provider.h"
+#include "timeval.h"
+#include "unixctl.h"
+#include "util.h"
+
+static inline void
+ovs_spinlock_init(ovs_spinlock_t *sl)
+{
+ atomic_init(&sl->locked, 0);
+}
+
+static inline void
+ovs_spin_lock(ovs_spinlock_t *sl)
+{
+ int exp = 0, locked = 0;
+
+ while (!atomic_compare_exchange_strong_explicit(&sl->locked,
&exp, 1,
+ memory_order_acquire,
+ memory_order_relaxed)) {
+ locked = 1;
+ while (locked) {
+ atomic_read_relaxed(&sl->locked, &locked);
+ }
+ exp = 0;
+ }
+}
+
+static inline void
+ovs_spin_unlock(ovs_spinlock_t *sl)
+{
+ atomic_store_explicit(&sl->locked, 0, memory_order_release);
+}
+
+static inline int OVS_UNUSED
+ovs_spin_trylock(ovs_spinlock_t *sl)
+{
+ int exp = 0;
+ return atomic_compare_exchange_strong_explicit(&sl->locked, &exp,
1,
+ memory_order_acquire,
+ memory_order_relaxed);
+}
Move the spinlock functions out to a common file.
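For illustration, a self-contained version of such a shared helper
could look roughly like this. It is only a sketch: it uses plain C11
<stdatomic.h> instead of the ovs-atomic.h wrappers, and every name
ending in _sketch is made up.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names; a test-and-test-and-set lock on C11 atomics. */
struct spinlock_sketch { atomic_int locked; };

static inline void
spinlock_sketch_init(struct spinlock_sketch *sl)
{
    atomic_init(&sl->locked, 0);
}

static inline bool
spinlock_sketch_trylock(struct spinlock_sketch *sl)
{
    int expected = 0;

    return atomic_compare_exchange_strong_explicit(
        &sl->locked, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

static inline void
spinlock_sketch_lock(struct spinlock_sketch *sl)
{
    while (!spinlock_sketch_trylock(sl)) {
        /* Wait on a plain load so that waiters do not keep writing to
         * the lock word. */
        while (atomic_load_explicit(&sl->locked, memory_order_relaxed)) {
            continue;
        }
    }
}

static inline void
spinlock_sketch_unlock(struct spinlock_sketch *sl)
{
    /* Debug check: unlocking a lock that is not held is a caller bug. */
    assert(atomic_load_explicit(&sl->locked, memory_order_relaxed) == 1);
    atomic_store_explicit(&sl->locked, 0, memory_order_release);
}
```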
+
+inline int
+__umem_elem_push_n(struct umem_pool *umemp, int n, void **addrs)
+{
+ void *ptr;
+
+ if (OVS_UNLIKELY(umemp->index + n > umemp->size)) {
This is a stack overflow
+ return -ENOMEM;
+ }
+
+ ptr = &umemp->array[umemp->index];
+ memcpy(ptr, addrs, n * sizeof(void *));
+ umemp->index += n;
+
+ return 0;
+}
+
+int umem_elem_push_n(struct umem_pool *umemp, int n, void **addrs)
+{
+ int ret;
+
+ ovs_spin_lock(&umemp->mutex);
+ ret = __umem_elem_push_n(umemp, n, addrs);
+ ovs_spin_unlock(&umemp->mutex);
+
+ return ret;
+}
+
+inline void
+__umem_elem_push(struct umem_pool *umemp, void *addr)
+{
+ umemp->array[umemp->index++] = addr;
+}
+
+void
+umem_elem_push(struct umem_pool *umemp, void *addr)
+{
+
+ if (OVS_UNLIKELY(umemp->index >= umemp->size)) {
+ /* stack is overflow, this should not happen */
+ OVS_NOT_REACHED();
+ }
Should this not be moved after the spinlock, i.e. to __umem_elem_push()?
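The reason is that index can change between an unlocked test and the
locked store. A minimal sketch with the test inside the critical
section follows; the pool type and the atomic_flag lock are hypothetical
simplifications, and it returns an error code rather than calling
OVS_NOT_REACHED(), which is a design variation, not what the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified LIFO pool guarded by a tiny spin lock. */
struct pool_sketch {
    atomic_flag lock;
    unsigned int index;     /* Number of stored elements. */
    unsigned int size;      /* Capacity of 'array'. */
    void **array;
};

/* Pushes 'addr'; returns 0 on success or ENOSPC when the pool is full. */
static int
pool_sketch_push(struct pool_sketch *p, void *addr)
{
    int error = 0;

    assert(addr != NULL);

    while (atomic_flag_test_and_set_explicit(&p->lock,
                                             memory_order_acquire)) {
        continue;
    }
    /* 'index' is only stable while the lock is held, so test it here. */
    if (p->index >= p->size) {
        error = ENOSPC;
    } else {
        p->array[p->index++] = addr;
    }
    atomic_flag_clear_explicit(&p->lock, memory_order_release);

    return error;
}
```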
+
+ ovs_assert(((uint64_t)addr & FRAME_SHIFT_MASK) == 0);
+
+ ovs_spin_lock(&umemp->mutex);
+ __umem_elem_push(umemp, addr);
+ ovs_spin_unlock(&umemp->mutex);
+}
+
+inline int
+__umem_elem_pop_n(struct umem_pool *umemp, int n, void **addrs)
+{
+ void *ptr;
+
+ if (OVS_UNLIKELY(umemp->index - n < 0)) {
+ return -ENOMEM;
+ }
+
+ umemp->index -= n;
+ ptr = &umemp->array[umemp->index];
+ memcpy(addrs, ptr, n * sizeof(void *));
+
+ return 0;
+}
+
+int
+umem_elem_pop_n(struct umem_pool *umemp, int n, void **addrs)
+{
+ int ret;
+
+ ovs_spin_lock(&umemp->mutex);
+ ret = __umem_elem_pop_n(umemp, n, addrs);
+ ovs_spin_unlock(&umemp->mutex);
+
+ return ret;
+}
+
+inline void *
+__umem_elem_pop(struct umem_pool *umemp)
+{
There is no check here to see if there are actually any elements left,
like there is for pop_n, so we could corrupt memory/umem_pool.
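A minimal sketch of a guarded pop, assuming the caller already holds
the pool lock and that NULL is an acceptable "empty" result. The
lifo_sketch type is a hypothetical stand-in for struct umem_pool.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct umem_pool, without the lock. */
struct lifo_sketch {
    int index;              /* Number of stored elements. */
    unsigned int size;      /* Capacity of 'array'. */
    void **array;
};

/* Pops the top element, or returns NULL when the pool is empty.
 * The caller is assumed to hold the pool lock. */
static void *
lifo_sketch_pop_unlocked(struct lifo_sketch *p)
{
    assert(p->index <= (int) p->size);

    if (p->index <= 0) {
        return NULL;        /* Nothing left; do not touch array[-1]. */
    }
    return p->array[--p->index];
}
```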
+ return umemp->array[--umemp->index];
+}
+
+void *
+umem_elem_pop(struct umem_pool *umemp)
+{
+ void *ptr;
+
+ ovs_spin_lock(&umemp->mutex);
+ ptr = __umem_elem_pop(umemp);
+ ovs_spin_unlock(&umemp->mutex);
+
+ return ptr;
+}
+
+void **
+__umem_pool_alloc(unsigned int size)
+{
+ void *bufs;
+
+ ovs_assert(posix_memalign(&bufs, getpagesize(),
+ size * sizeof(void *)) == 0);
We should not assert, just return NULL here.
+ memset(bufs, 0, size * sizeof(void *));
+ return (void **)bufs;
+}
+
+unsigned int
+umem_elem_count(struct umem_pool *mpool)
+{
+ return mpool->index;
+}
+
+int
+umem_pool_init(struct umem_pool *umemp, unsigned int size)
+{
+ umemp->array = __umem_pool_alloc(size);
+ if (!umemp->array) {
+ OVS_NOT_REACHED();
If NULL is returned, return ENOMEM.
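Putting this comment and the one on __umem_pool_alloc() together, the
error path could be sketched as below. The names are hypothetical, and
sysconf(_SC_PAGESIZE) is used in place of getpagesize(); allocation
failure is reported to the caller, while assert() is kept only for a
genuine programming error.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L     /* For posix_memalign(). */
#endif

#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for struct umem_pool, without the lock. */
struct umem_pool_sketch {
    int index;
    unsigned int size;
    void **array;
};

/* Returns a zeroed, page-aligned pointer array, or NULL on failure. */
static void **
pool_array_alloc_sketch(unsigned int size)
{
    long page = sysconf(_SC_PAGESIZE);
    void *bufs;

    if (size == 0 || page <= 0
        || posix_memalign(&bufs, (size_t) page, size * sizeof(void *))) {
        return NULL;
    }
    memset(bufs, 0, size * sizeof(void *));
    return bufs;
}

/* Returns 0 on success or ENOMEM if the array cannot be allocated. */
static int
pool_init_sketch(struct umem_pool_sketch *p, unsigned int size)
{
    assert(p != NULL);      /* Caller bug, not a runtime condition. */

    p->array = pool_array_alloc_sketch(size);
    if (!p->array) {
        return ENOMEM;
    }
    p->size = size;
    p->index = 0;
    return 0;
}
```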
+ }
+
+ umemp->size = size;
+ umemp->index = 0;
+ ovs_spinlock_init(&umemp->mutex);
+ return 0;
+}
+
+void
+umem_pool_cleanup(struct umem_pool *umemp)
+{
+ free(umemp->array);
umemp->array = NULL;
+}
+
+/* AF_XDP metadata init/destroy */
+int
+xpacket_pool_init(struct xpacket_pool *xp, unsigned int size)
+{
+ void *bufs;
+
+ /* TODO: check HAVE_POSIX_MEMALIGN */
I guess the above needs to be done.
+ ovs_assert(posix_memalign(&bufs, getpagesize(),
+ size * sizeof(struct dp_packet_afxdp))
== 0);
We should not assert, just return false
+ memset(bufs, 0, size * sizeof(struct dp_packet_afxdp));
+
+ xp->array = bufs;
+ xp->size = size;
+ return 0;
+}
+
+void
+xpacket_pool_cleanup(struct xpacket_pool *xp)
+{
+ free(xp->array);
xp->array = NULL;
+}
diff --git a/lib/xdpsock.h b/lib/xdpsock.h
new file mode 100644
index 000000000000..aabaa8e5df24
--- /dev/null
+++ b/lib/xdpsock.h
@@ -0,0 +1,123 @@
+/*
+ * Copyright (c) 2018, 2019 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied.
+ * See the License for the specific language governing permissions
and
+ * limitations under the License.
+ */
+
+#ifndef XDPSOCK_H
+#define XDPSOCK_H 1
+
+#include <bpf/libbpf.h>
+#include <bpf/xsk.h>
+#include <errno.h>
+#include <getopt.h>
+#include <libgen.h>
+#include <linux/bpf.h>
+#include <linux/if_link.h>
+#include <linux/if_xdp.h>
+#include <linux/if_ether.h>
+#include <locale.h>
+#include <net/if.h>
+#include <poll.h>
+#include <pthread.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/resource.h>
+#include <sys/socket.h>
+#include <sys/types.h>
+#include <sys/mman.h>
+#include <time.h>
+#include <unistd.h>
+
+#include "openvswitch/thread.h"
+#include "ovs-atomic.h"
+
+#define FRAME_HEADROOM XDP_PACKET_HEADROOM
+#define FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE
+#define BATCH_SIZE NETDEV_MAX_BURST
Move this item to the bottom, so you have the FRAME-specific defines
first.
+#define FRAME_SHIFT XSK_UMEM__DEFAULT_FRAME_SHIFT
+#define FRAME_SHIFT_MASK ((1 << FRAME_SHIFT) - 1)
+
+#define NUM_FRAMES 4096
Should we add a note/check to make sure this value is a power of 2?
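If the value does have to be a power of 2, a compile-time check is
cheap. The sketch below uses C11 static_assert and made-up _SKETCH
names; inside the tree, OVS's own build-assert macros would be the
more natural choice.

```c
#include <assert.h>

#define NUM_FRAMES_SKETCH 4096

/* True for a positive value with exactly one bit set. */
#define IS_POW2_SKETCH(x) ((x) > 0 && (((x) & ((x) - 1)) == 0))

/* Fails the build, not the run, if someone picks e.g. 5000. */
static_assert(IS_POW2_SKETCH(NUM_FRAMES_SKETCH),
              "NUM_FRAMES must be a power of 2");

/* Hypothetical helper showing why it matters: with a power-of-2 count,
 * wrapping an index reduces to a mask. */
static unsigned int
frame_index_wrap_sketch(unsigned int i)
{
    return i & (NUM_FRAMES_SKETCH - 1);
}
```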
+#define PROD_NUM_DESCS 512
+#define CONS_NUM_DESCS 512
+
+#ifdef USE_XSK_DEFAULT
+#define PROD_NUM_DESCS XSK_RING_PROD__DEFAULT_NUM_DESCS
+#define CONS_NUM_DESCS XSK_RING_CONS__DEFAULT_NUM_DESCS
+#endif
Any reason for having this? Should we use the default values? They are
4x larger than you have, did it make any difference in performance
results?
We could make it configurable like for DPDK, using the
n_txq_desc/n_rxq_desc option.
+
+typedef struct {
+ atomic_int locked;
+} ovs_spinlock_t;
+
Think we should move the ovs_spinlock code and includes to some global
place, maybe util or thread
+/* LIFO ptr_array */
+struct umem_pool {
+ int index; /* point to top */
+ unsigned int size;
+ ovs_spinlock_t mutex;
+ void **array; /* a pointer array, point to umem buf */
+};
+
+/* array-based dp_packet_afxdp */
+struct xpacket_pool {
+ unsigned int size;
+ struct dp_packet_afxdp **array;
+};
+
+struct xsk_umem_info {
+ struct umem_pool mpool;
+ struct xpacket_pool xpool;
+ struct xsk_ring_prod fq;
+ struct xsk_ring_cons cq;
+ struct xsk_umem *umem;
+ void *buffer;
+};
+
+struct xsk_socket_info {
+ struct xsk_ring_cons rx;
+ struct xsk_ring_prod tx;
+ struct xsk_umem_info *umem;
+ struct xsk_socket *xsk;
+ unsigned long rx_npkts;
+ unsigned long tx_npkts;
+ unsigned long prev_rx_npkts;
+ unsigned long prev_tx_npkts;
+ uint32_t outstanding_tx;
+};
+
+struct umem_elem {
+ struct umem_elem *next;
+};
+
+void __umem_elem_push(struct umem_pool *umemp, void *addr);
+void umem_elem_push(struct umem_pool *umemp, void *addr);
+int __umem_elem_push_n(struct umem_pool *umemp, int n, void **addrs);
+int umem_elem_push_n(struct umem_pool *umemp, int n, void **addrs);
+
+void *__umem_elem_pop(struct umem_pool *umemp);
+void *umem_elem_pop(struct umem_pool *umemp);
+int __umem_elem_pop_n(struct umem_pool *umemp, int n, void **addrs);
+int umem_elem_pop_n(struct umem_pool *umemp, int n, void **addrs);
+
+void **__umem_pool_alloc(unsigned int size);
+int umem_pool_init(struct umem_pool *umemp, unsigned int size);
+void umem_pool_cleanup(struct umem_pool *umemp);
+unsigned int umem_elem_count(struct umem_pool *mpool);
+int xpacket_pool_init(struct xpacket_pool *xp, unsigned int size);
+void xpacket_pool_cleanup(struct xpacket_pool *xp);
+
I think all the __umem_* functions are only used internally, so they
should become static and be removed here.
+#endif
diff --git a/tests/automake.mk b/tests/automake.mk
index ea16532dd2a0..715cef9a6b3b 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -4,12 +4,14 @@ EXTRA_DIST += \
$(SYSTEM_TESTSUITE_AT) \
$(SYSTEM_KMOD_TESTSUITE_AT) \
$(SYSTEM_USERSPACE_TESTSUITE_AT) \
+ $(SYSTEM_AFXDP_TESTSUITE_AT) \
$(SYSTEM_OFFLOADS_TESTSUITE_AT) \
$(SYSTEM_DPDK_TESTSUITE_AT) \
$(OVSDB_CLUSTER_TESTSUITE_AT) \
$(TESTSUITE) \
$(SYSTEM_KMOD_TESTSUITE) \
$(SYSTEM_USERSPACE_TESTSUITE) \
+ $(SYSTEM_AFXDP_TESTSUITE) \
$(SYSTEM_OFFLOADS_TESTSUITE) \
$(SYSTEM_DPDK_TESTSUITE) \
$(OVSDB_CLUSTER_TESTSUITE) \
@@ -158,6 +160,11 @@ SYSTEM_USERSPACE_TESTSUITE_AT = \
tests/system-userspace-macros.at \
tests/system-userspace-packet-type-aware.at
+SYSTEM_AFXDP_TESTSUITE_AT = \
+ tests/system-afxdp-testsuite.at \
+ tests/system-afxdp-traffic.at \
+ tests/system-afxdp-macros.at
+
SYSTEM_TESTSUITE_AT = \
tests/system-common-macros.at \
tests/system-ovn.at \
@@ -182,6 +189,7 @@ TESTSUITE = $(srcdir)/tests/testsuite
TESTSUITE_PATCH = $(srcdir)/tests/testsuite.patch
SYSTEM_KMOD_TESTSUITE = $(srcdir)/tests/system-kmod-testsuite
SYSTEM_USERSPACE_TESTSUITE =
$(srcdir)/tests/system-userspace-testsuite
+SYSTEM_AFXDP_TESTSUITE = $(srcdir)/tests/system-afxdp-testsuite
SYSTEM_OFFLOADS_TESTSUITE = $(srcdir)/tests/system-offloads-testsuite
SYSTEM_DPDK_TESTSUITE = $(srcdir)/tests/system-dpdk-testsuite
OVSDB_CLUSTER_TESTSUITE = $(srcdir)/tests/ovsdb-cluster-testsuite
@@ -315,6 +323,11 @@ check-system-userspace: all
set $(SHELL) '$(SYSTEM_USERSPACE_TESTSUITE)' -C tests
AUTOTEST_PATH='$(AUTOTEST_PATH)'; \
"$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@"
--recheck)
+check-afxdp: all
+ $(MAKE) install
+ set $(SHELL) '$(SYSTEM_AFXDP_TESTSUITE)' -C tests
AUTOTEST_PATH='$(AUTOTEST_PATH)' $(TESTSUITEFLAGS) -j1; \
+ "$$@" || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
+
check-offloads: all
set $(SHELL) '$(SYSTEM_OFFLOADS_TESTSUITE)' -C tests
AUTOTEST_PATH='$(AUTOTEST_PATH)'; \
"$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@"
--recheck)
@@ -352,6 +365,10 @@ $(SYSTEM_USERSPACE_TESTSUITE): package.m4
$(SYSTEM_TESTSUITE_AT) $(SYSTEM_USERSP
$(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
$(AM_V_at)mv $@.tmp $@
+$(SYSTEM_AFXDP_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT)
$(SYSTEM_AFXDP_TESTSUITE_AT) $(COMMON_MACROS_AT)
+ $(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
+ $(AM_V_at)mv $@.tmp $@
+
$(SYSTEM_OFFLOADS_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT)
$(SYSTEM_OFFLOADS_TESTSUITE_AT) $(COMMON_MACROS_AT)
$(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
$(AM_V_at)mv $@.tmp $@
diff --git a/tests/system-afxdp-macros.at
b/tests/system-afxdp-macros.at
new file mode 100644
index 000000000000..2c58c2d6554b
--- /dev/null
+++ b/tests/system-afxdp-macros.at
@@ -0,0 +1,153 @@
+# _ADD_BR([name])
+#
+# Expands into the proper ovs-vsctl commands to create a bridge with
the
+# appropriate type and properties
+m4_define([_ADD_BR], [[add-br $1 -- set Bridge $1
datapath_type=netdev
protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14,OpenFlow15
fail-mode=secure ]])
+
+# OVS_TRAFFIC_VSWITCHD_START([vsctl-args], [vsctl-output],
[=override])
+#
+# Creates a database and starts ovsdb-server, starts ovs-vswitchd
+# connected to that database, calls ovs-vsctl to create a bridge
named
+# br0 with predictable settings, passing 'vsctl-args' as additional
+# commands to ovs-vsctl. If 'vsctl-args' causes ovs-vsctl to provide
+# output (e.g. because it includes "create" commands) then
'vsctl-output'
+# specifies the expected output after filtering through uuidfilt.
+m4_define([OVS_TRAFFIC_VSWITCHD_START],
+ [
+ export OVS_PKGDATADIR=$(`pwd`)
+ _OVS_VSWITCHD_START([--disable-system])
+ AT_CHECK([ovs-vsctl -- _ADD_BR([br0]) -- $1 m4_if([$2], [], [], [|
uuidfilt])], [0], [$2])
+])
+
+# OVS_TRAFFIC_VSWITCHD_STOP([WHITELIST], [extra_cmds])
+#
+# Gracefully stops ovs-vswitchd and ovsdb-server, checking their log
files
+# for messages with severity WARN or higher and signaling an error if
any
+# is present. The optional WHITELIST may contain shell-quoted "sed"
+# commands to delete any warnings that are actually expected, e.g.:
+#
+# OVS_TRAFFIC_VSWITCHD_STOP(["/expected error/d"])
+#
+# 'extra_cmds' are shell commands to be executed afte
OVS_VSWITCHD_STOP() is
+# invoked. They can be used to perform additional cleanups such as
name space
+# removal.
+m4_define([OVS_TRAFFIC_VSWITCHD_STOP],
+ [OVS_VSWITCHD_STOP([dnl
+$1";/netdev_linux.*obtaining netdev stats via vport failed/d
+/dpif_netlink.*Generic Netlink family 'ovs_datapath' does not exist.
The Open vSwitch kernel module is probably not loaded./d
+/dpif_netdev(revalidator.*)|ERR|internal error parsing flow key/d
+/dpif(revalidator.*)|WARN|netdev@ovs-netdev: failed to put/d
+"])
+ AT_CHECK([:; $2])
+ ])
+
+m4_define([ADD_VETH_AFXDP],
+ [ AT_CHECK([ip link add $1 type veth peer name ovs-$1 || return
77])
+ CONFIGURE_AFXDP_VETH_OFFLOADS([$1])
+ AT_CHECK([ip link set $1 netns $2])
+ AT_CHECK([ip link set dev ovs-$1 up])
+ AT_CHECK([ovs-vsctl add-port $3 ovs-$1 -- \
+ set interface ovs-$1 external-ids:iface-id="$1"
type="afxdp"])
+ NS_CHECK_EXEC([$2], [ip addr add $4 dev $1 $7])
+ NS_CHECK_EXEC([$2], [ip link set dev $1 up])
+ if test -n "$5"; then
+ NS_CHECK_EXEC([$2], [ip link set dev $1 address $5])
+ fi
+ if test -n "$6"; then
+ NS_CHECK_EXEC([$2], [ip route add default via $6])
+ fi
+ on_exit 'ip link del ovs-$1'
+ ]
+)
+
+# CONFIGURE_AFXDP_VETH_OFFLOADS([VETH])
+#
+# Disable TX offloads and VLAN offloads for veths used in AF_XDP.
+m4_define([CONFIGURE_AFXDP_VETH_OFFLOADS],
+ [AT_CHECK([ethtool -K $1 tx off], [0], [ignore], [ignore])
+ AT_CHECK([ethtool -K $1 rxvlan off], [0], [ignore], [ignore])
+ AT_CHECK([ethtool -K $1 txvlan off], [0], [ignore], [ignore])
+ ]
+)
+
+# CONFIGURE_VETH_OFFLOADS([VETH])
+#
+# Disable TX offloads for veths. The userspace datapath uses the
AF_PACKET
+# socket to receive packets for veths. Unfortunately, the AF_PACKET
socket
+# doesn't play well with offloads:
+# 1. GSO packets are received without segmentation and therefore
discarded.
+# 2. Packets with offloaded partial checksum are received with the
wrong
+# checksum, therefore discarded by the receiver.
+#
+# By disabling tx offloads in the non-OVS side of the veth peer we
make sure
+# that the AF_PACKET socket will not receive bad packets.
+#
+# This is a workaround, and should be removed when offloads are
properly
+# supported in netdev-linux.
+m4_define([CONFIGURE_VETH_OFFLOADS],
+ [AT_CHECK([ethtool -K $1 tx off], [0], [ignore], [ignore])]
+)
+
+# CHECK_CONNTRACK()
+#
+# Perform requirements checks for running conntrack tests.
+#
+m4_define([CHECK_CONNTRACK],
+ [AT_SKIP_IF([test $HAVE_PYTHON = no])]
+)
+
+# CHECK_CONNTRACK_ALG()
+#
+# Perform requirements checks for running conntrack ALG tests. The
userspace
+# supports FTP and TFTP.
+#
+m4_define([CHECK_CONNTRACK_ALG])
+
+# CHECK_CONNTRACK_FRAG()
+#
+# Perform requirements checks for running conntrack fragmentations
tests.
+# The userspace doesn't support fragmentation yet, so skip the tests.
+m4_define([CHECK_CONNTRACK_FRAG],
+[
+ AT_SKIP_IF([:])
+])
+
+# CHECK_CONNTRACK_LOCAL_STACK()
+#
+# Perform requirements checks for running conntrack tests with local
stack.
+# While the kernel connection tracker automatically passes all the
connection
+# tracking state from an internal port to the OpenvSwitch kernel
module, there
+# is simply no way of doing that with the userspace, so skip the
tests.
+m4_define([CHECK_CONNTRACK_LOCAL_STACK],
+[
+ AT_SKIP_IF([:])
+])
+
+# CHECK_CONNTRACK_NAT()
+#
+# Perform requirements checks for running conntrack NAT tests. The
userspace
+# datapath supports NAT.
+#
+m4_define([CHECK_CONNTRACK_NAT])
+
+# CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE()
+#
+# Perform requirements checks for running ovs-dpctl flush-conntrack
by
+# conntrack 5-tuple test. The userspace datapath does not support
+# this feature yet.
+m4_define([CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE],
+[
+ AT_SKIP_IF([:])
+])
+
+# CHECK_CT_DPIF_SET_GET_MAXCONNS()
+#
+# Perform requirements checks for running ovs-dpctl ct-set-maxconns
or
+# ovs-dpctl ct-get-maxconns. The userspace datapath does support this
feature.
+m4_define([CHECK_CT_DPIF_SET_GET_MAXCONNS])
+
+# CHECK_CT_DPIF_GET_NCONNS()
+#
+# Perform requirements checks for running ovs-dpctl ct-get-nconns.
The
+# userspace datapath does support this feature.
+m4_define([CHECK_CT_DPIF_GET_NCONNS])
diff --git a/tests/system-afxdp-testsuite.at
b/tests/system-afxdp-testsuite.at
new file mode 100644
index 000000000000..538c0d15d556
--- /dev/null
+++ b/tests/system-afxdp-testsuite.at
@@ -0,0 +1,26 @@
+AT_INIT
+
+AT_COPYRIGHT([Copyright (c) 2018 Nicira, Inc.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at:
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied.
+See the License for the specific language governing permissions and
+limitations under the License.])
+
+m4_ifdef([AT_COLOR_TESTS], [AT_COLOR_TESTS])
+
+m4_include([tests/ovs-macros.at])
+m4_include([tests/ovsdb-macros.at])
+m4_include([tests/ofproto-macros.at])
+m4_include([tests/system-afxdp-macros.at])
+m4_include([tests/system-common-macros.at])
+
+m4_include([tests/system-afxdp-traffic.at])
+m4_include([tests/system-ovn.at])
diff --git a/tests/system-afxdp-traffic.at
b/tests/system-afxdp-traffic.at
new file mode 100644
index 000000000000..26f72acf48ef
--- /dev/null
+++ b/tests/system-afxdp-traffic.at
@@ -0,0 +1,978 @@
+AT_BANNER([AF_XDP netdev datapath-sanity])
+
+AT_SETUP([datapath - ping between two ports])
+OVS_TRAFFIC_VSWITCHD_START()
+
+ulimit -l unlimited
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 |
FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping between two ports on vlan])
+OVS_TRAFFIC_VSWITCHD_START()
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+ADD_VLAN(p0, at_ns0, 100, "10.2.2.1/24")
+ADD_VLAN(p1, at_ns1, 100, "10.2.2.2/24")
+
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.2.2.2 |
FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping6 between two ports])
+OVS_TRAFFIC_VSWITCHD_START()
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96")
+
+dnl Linux seems to take a little time to get its IPv6 stack in order.
Without
+dnl waiting, we get occasional failures due to the following error:
+dnl "connect: Cannot assign requested address"
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2])
+
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 6 fc00::2 |
FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping6 between two ports on vlan])
+OVS_TRAFFIC_VSWITCHD_START()
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96")
+
+ADD_VLAN(p0, at_ns0, 100, "fc00:1::1/96")
+ADD_VLAN(p1, at_ns1, 100, "fc00:1::2/96")
+
+dnl Linux seems to take a little time to get its IPv6 stack in order.
Without
+dnl waiting, we get occasional failures due to the following error:
+dnl "connect: Cannot assign requested address"
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00:1::2])
+
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:1::2 |
FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping6 -s 1600 -q -c 3 -i 0.3 -w 2 fc00:1::2
| FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping6 -s 3200 -q -c 3 -i 0.3 -w 2 fc00:1::2
| FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over vxlan tunnel])
+OVS_CHECK_VXLAN()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth
pair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
+AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
+AT_CHECK([ip link set dev br-underlay up])
+
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with a
native
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL([vxlan], [br0], [at_vxlan0], [172.31.1.1],
[10.1.1.100/24])
+ADD_NATIVE_TUNNEL([vxlan], [at_vxlan1], [at_ns0], [172.31.1.100],
[10.1.1.1/24],
+ [id 0 dstport 4789])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0],
[OK
+])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 |
FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |
FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100
| FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100
| FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over vxlan6 tunnel])
+OVS_CHECK_VXLAN_UDP6ZEROCSUM()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00::1/64", [], [],
"nodad")
+AT_CHECK([ip addr add dev br-underlay "fc00::100/64" nodad])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with a native
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL6([vxlan], [br0], [at_vxlan0], [fc00::1],
[10.1.1.100/24])
+ADD_NATIVE_TUNNEL6([vxlan], [at_vxlan1], [at_ns0], [fc00::100],
[10.1.1.1/24],
+ [id 0 dstport 4789 udp6zerocsumtx udp6zerocsumrx])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add fc00::100/64 br-underlay], [0],
[OK
+])
+
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::100])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::100 |
FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |
FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100
| FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100
| FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over gre tunnel])
+OVS_CHECK_GRE()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
+AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with a native
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL([gre], [br0], [at_gre0], [172.31.1.1],
[10.1.1.100/24])
+ADD_NATIVE_TUNNEL([gretap], [ns_gre0], [at_ns0], [172.31.1.100],
[10.1.1.1/24])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0],
[OK
+])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 |
FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |
FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100
| FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100
| FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over erspan v1 tunnel])
+OVS_CHECK_GRE()
+OVS_CHECK_ERSPAN()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
+AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with a native
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL([erspan], [br0], [at_erspan0], [172.31.1.1],
[10.1.1.100/24], [options:key=1 options:erspan_ver=1
options:erspan_idx=7])
+ADD_NATIVE_TUNNEL([erspan], [ns_erspan0], [at_ns0], [172.31.1.100],
[10.1.1.1/24], [seq key 1 erspan_ver 1 erspan 7])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0],
[OK
+])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 |
FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+dnl NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+NS_CHECK_EXEC([at_ns0], [ping -s 1200 -i 0.3 -c 3 10.1.1.100 |
FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over erspan v2 tunnel])
+OVS_CHECK_GRE()
+OVS_CHECK_ERSPAN()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
+AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with a native
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL([erspan], [br0], [at_erspan0], [172.31.1.1],
[10.1.1.100/24], [options:key=1 options:erspan_ver=2
options:erspan_dir=1 options:erspan_hwid=0x7])
+ADD_NATIVE_TUNNEL([erspan], [ns_erspan0], [at_ns0], [172.31.1.100],
[10.1.1.1/24], [seq key 1 erspan_ver 2 erspan_dir egress erspan_hwid
7])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0],
[OK
+])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 |
FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+dnl NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+NS_CHECK_EXEC([at_ns0], [ping -s 1200 -i 0.3 -c 3 10.1.1.100 |
FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over ip6erspan v1 tunnel])
+OVS_CHECK_GRE()
+OVS_CHECK_ERSPAN()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00:100::1/96", [], [],
nodad)
+AT_CHECK([ip addr add dev br-underlay "fc00:100::100/96" nodad])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with a native
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL6([ip6erspan], [br0], [at_erspan0], [fc00:100::1],
[10.1.1.100/24],
+ [options:key=123 options:erspan_ver=1
options:erspan_idx=0x7])
+ADD_NATIVE_TUNNEL6([ip6erspan], [ns_erspan0], [at_ns0],
[fc00:100::100],
+ [10.1.1.1/24], [local fc00:100::1 seq key 123
erspan_ver 1 erspan 7])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add fc00:100::1/96 br-underlay], [0],
[OK
+])
+
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 2 fc00:100::100])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:100::100 |
FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |
FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over ip6erspan v2 tunnel])
+OVS_CHECK_GRE()
+OVS_CHECK_ERSPAN()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00:100::1/96", [], [],
nodad)
+AT_CHECK([ip addr add dev br-underlay "fc00:100::100/96" nodad])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with a native
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL6([ip6erspan], [br0], [at_erspan0], [fc00:100::1],
[10.1.1.100/24],
+ [options:key=121 options:erspan_ver=2
options:erspan_dir=0 options:erspan_hwid=0x7])
+ADD_NATIVE_TUNNEL6([ip6erspan], [ns_erspan0], [at_ns0],
[fc00:100::100],
+ [10.1.1.1/24],
+ [local fc00:100::1 seq key 121 erspan_ver 2
erspan_dir ingress erspan_hwid 0x7])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add fc00:100::1/96 br-underlay], [0],
[OK
+])
+
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 2 fc00:100::100])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:100::100 |
FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |
FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over geneve tunnel])
+OVS_CHECK_GENEVE()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
+AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with a native
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL([geneve], [br0], [at_gnv0], [172.31.1.1],
[10.1.1.100/24])
+ADD_NATIVE_TUNNEL([geneve], [ns_gnv0], [at_ns0], [172.31.1.100],
[10.1.1.1/24],
+ [vni 0])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add 172.31.1.100/24 br-underlay], [0],
[OK
+])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 |
FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |
FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100
| FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100
| FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over geneve6 tunnel])
+OVS_CHECK_GENEVE_UDP6ZEROCSUM()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00::1/64", [], [],
"nodad")
+AT_CHECK([ip addr add dev br-underlay "fc00::100/64" nodad])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with a native
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL6([geneve], [br0], [at_gnv0], [fc00::1],
[10.1.1.100/24])
+ADD_NATIVE_TUNNEL6([geneve], [ns_gnv0], [at_ns0], [fc00::100],
[10.1.1.1/24],
+ [vni 0 udp6zerocsumtx udp6zerocsumrx])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add fc00::100/64 br-underlay], [0],
[OK
+])
+
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::100])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::100 |
FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |
FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100
| FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100
| FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - clone action])
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1, at_ns2)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+AT_CHECK([ovs-vsctl -- set interface ovs-p0 ofport_request=1 \
+ -- set interface ovs-p1 ofport_request=2])
+
+AT_DATA([flows.txt], [dnl
+priority=1 actions=NORMAL
+priority=10 in_port=1,ip,actions=clone(mod_dl_dst(50:54:00:00:00:0a),set_field:192.168.3.3->ip_dst), output:2
+priority=10 in_port=2,ip,actions=clone(mod_dl_src(ae:c6:7e:54:8d:4d),mod_dl_dst(50:54:00:00:00:0b),set_field:192.168.4.4->ip_dst, controller), output:1
+])
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+
+AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir
--pidfile 2> ofctl_monitor.log])
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 |
FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+AT_CHECK([cat ofctl_monitor.log | STRIP_MONITOR_CSUM], [0], [dnl
+icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0 icmp_csum: <skip>
+icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0 icmp_csum: <skip>
+icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0 icmp_csum: <skip>
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - basic truncate action])
+AT_SKIP_IF([test $HAVE_NC = no])
+OVS_TRAFFIC_VSWITCHD_START()
+AT_CHECK([ovs-ofctl del-flows br0])
+
+dnl Create p0 and ovs-p0(1)
+ADD_NAMESPACES(at_ns0)
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+NS_CHECK_EXEC([at_ns0], [ip link set dev p0 address
e6:66:c1:11:11:11])
+NS_CHECK_EXEC([at_ns0], [arp -s 10.1.1.2 e6:66:c1:22:22:22])
+
+dnl Create p1(3) and ovs-p1(2), packets received from ovs-p1 will appear in p1
+AT_CHECK([ip link add p1 type veth peer name ovs-p1])
+on_exit 'ip link del ovs-p1'
+AT_CHECK([ip link set dev ovs-p1 up])
+AT_CHECK([ip link set dev p1 up])
+AT_CHECK([ovs-vsctl add-port br0 ovs-p1 -- set interface ovs-p1
ofport_request=2])
+dnl Use p1 to check the truncated packet
+AT_CHECK([ovs-vsctl add-port br0 p1 -- set interface p1
ofport_request=3])
+
+dnl Create p2(5) and ovs-p2(4)
+AT_CHECK([ip link add p2 type veth peer name ovs-p2])
+on_exit 'ip link del ovs-p2'
+AT_CHECK([ip link set dev ovs-p2 up])
+AT_CHECK([ip link set dev p2 up])
+AT_CHECK([ovs-vsctl add-port br0 ovs-p2 -- set interface ovs-p2
ofport_request=4])
+dnl Use p2 to check the truncated packet
+AT_CHECK([ovs-vsctl add-port br0 p2 -- set interface p2
ofport_request=5])
+
+dnl basic test
+AT_CHECK([ovs-ofctl del-flows br0])
+AT_DATA([flows.txt], [dnl
+in_port=3 dl_dst=e6:66:c1:22:22:22 actions=drop
+in_port=5 dl_dst=e6:66:c1:22:22:22 actions=drop
+in_port=1 dl_dst=e6:66:c1:22:22:22 actions=output(port=2,max_len=100),output:4
+])
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+
+dnl use this file as payload file for ncat
+AT_CHECK([dd if=/dev/urandom of=payload200.bin bs=200 count=1 2>
/dev/null])
+on_exit 'rm -f payload200.bin'
+NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 <
payload200.bin])
+
+dnl packet with truncated size
+AT_CHECK([ovs-appctl revalidator/purge], [0])
+AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed
-n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
+n_bytes=100
+])
+dnl packet with original size
+AT_CHECK([ovs-appctl revalidator/purge], [0])
+AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed
-n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
+n_bytes=242
+])
+
+dnl more complicated output actions
+AT_CHECK([ovs-ofctl del-flows br0])
+AT_DATA([flows.txt], [dnl
+in_port=3 dl_dst=e6:66:c1:22:22:22 actions=drop
+in_port=5 dl_dst=e6:66:c1:22:22:22 actions=drop
+in_port=1 dl_dst=e6:66:c1:22:22:22 actions=output(port=2,max_len=100),output:4,output(port=2,max_len=100),output(port=4,max_len=100),output:2,output(port=4,max_len=200),output(port=2,max_len=65535)
+])
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+
+NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 <
payload200.bin])
+
+dnl 100 + 100 + 242 + min(65535,242) = 684
+AT_CHECK([ovs-appctl revalidator/purge], [0])
+AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed
-n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
+n_bytes=684
+])
+dnl 242 + 100 + min(242,200) = 542
+AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed
-n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
+n_bytes=542
+])
+
+dnl SLOW_ACTION: disable kernel datapath truncate support
+dnl Repeat the test above, but exercise the SLOW_ACTION code path
+AT_CHECK([ovs-appctl dpif/set-dp-features br0 trunc false], [0])
+
+dnl SLOW_ACTION test1: check datapath actions
+AT_CHECK([ovs-ofctl del-flows br0])
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+
+AT_CHECK([ovs-appctl ofproto/trace br0
"in_port=1,dl_type=0x800,dl_src=e6:66:c1:11:11:11,dl_dst=e6:66:c1:22:22:22,nw_src=192.168.0.1,nw_dst=192.168.0.2,nw_proto=6,tp_src=8,tp_dst=9"],
[0], [stdout])
+AT_CHECK([tail -3 stdout], [0],
+[Datapath actions: trunc(100),3,5,trunc(100),3,trunc(100),5,3,trunc(200),5,trunc(65535),3
+This flow is handled by the userspace slow path because it:
+ - Uses action(s) not supported by datapath.
+])
+
+dnl SLOW_ACTION test2: check actual packet truncate
+AT_CHECK([ovs-ofctl del-flows br0])
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 <
payload200.bin])
+
+dnl 100 + 100 + 242 + min(65535,242) = 684
+AT_CHECK([ovs-appctl revalidator/purge], [0])
+AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed
-n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
+n_bytes=684
+])
+
+dnl 242 + 100 + min(242,200) = 542
+AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed
-n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
+n_bytes=542
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
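
For reference, the n_bytes values the truncate test above expects follow from simple arithmetic, sketched here in Python. This is only an illustration: PAYLOAD, FRAME_LEN and expected_bytes are made-up names, and the frame size is assumed to be the 200-byte payload plus UDP (8), IPv4 (20) and Ethernet (14) headers, which agrees with the n_bytes=242 checked in the basic test.

```python
# Hypothetical helper for illustration; not part of the OVS test suite.
PAYLOAD = 200
FRAME_LEN = PAYLOAD + 8 + 20 + 14  # UDP + IPv4 + Ethernet headers = 242


def expected_bytes(max_lens):
    """Sum the bytes delivered to one port.

    Each entry is the max_len of one output action; None stands for a
    plain, untruncated output.
    """
    return sum(FRAME_LEN if m is None else min(FRAME_LEN, m) for m in max_lens)


# Outputs to port 2 (counted by the in_port=3 flow) and to port 4 (counted
# by the in_port=5 flow), in the order they appear in the
# "more complicated output actions" flow.
port2 = expected_bytes([100, 100, None, 65535])
port4 = expected_bytes([None, 100, 200])
print(port2, port4)  # 684 542
```

The same totals are checked twice in the test: once with datapath truncate support and once through the SLOW_ACTION path.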
+
+
+AT_BANNER([conntrack])
+
+AT_SETUP([conntrack - controller])
+CHECK_CONNTRACK()
+OVS_TRAFFIC_VSWITCHD_START()
+AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg
ofproto_dpif_upcall:dbg])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
+AT_DATA([flows.txt], [dnl
+priority=1,action=drop
+priority=10,arp,action=normal
+priority=100,in_port=1,udp,action=ct(commit),controller
+priority=100,in_port=2,ct_state=-trk,udp,action=ct(table=0)
+priority=100,in_port=2,ct_state=+trk+est,udp,action=controller
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+AT_CAPTURE_FILE([ofctl_monitor.log])
+AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir
--pidfile 2> ofctl_monitor.log])
+
+dnl Send an unsolicited reply from port 2. This should be dropped.
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 ct\(table=0\)
'50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000'])
+
+dnl OK, now start a new connection from port 1.
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 1
ct\(commit\),controller
'50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000'])
+
+dnl Now try a reply from port 2.
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 ct\(table=0\)
'50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000'])
+
+dnl Check this output. We only see the latter two packets, not the first.
+AT_CHECK([cat ofctl_monitor.log], [0], [dnl
+NXT_PACKET_IN2 (xid=0x0): total_len=42 in_port=1 (via action) data_len=42 (unbuffered)
+udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=1,tp_dst=2 udp_csum:0
+NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=42 ct_state=est|rpl|trk,ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=17,ct_tp_src=1,ct_tp_dst=2,ip,in_port=2 (via action) data_len=42 (unbuffered)
+udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=2,tp_dst=1 udp_csum:0
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
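
As a cross-check of the expected monitor output above, the hex string passed to packet-out for the "new connection from port 1" step can be decoded by hand. The following is a minimal Python sketch, assuming a plain Ethernet + IPv4 (no options) + UDP layout; decode() and PKT are names made up for this note.

```python
import socket
import struct

# Packet-out payload from the "start a new connection from port 1" step,
# split into Ethernet addresses, EtherType, IPv4 header and UDP header.
PKT = bytes.fromhex(
    "50540000000a505400000009" "0800"
    "4500001c000000000011a4cd0a0101010a010102"
    "0001000200080000"
)


def mac(raw):
    """Format 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def decode(pkt):
    """Decode Ethernet/IPv4/UDP headers (hypothetical helper, no IP options)."""
    dl_dst, dl_src, eth_type = struct.unpack("!6s6sH", pkt[:14])
    (_ver_ihl, _tos, _total_len, _ident, _frag, ttl, proto, _csum,
     src, dst) = struct.unpack("!BBHHHBBH4s4s", pkt[14:34])
    sport, dport, _udp_len, udp_csum = struct.unpack("!HHHH", pkt[34:42])
    return {
        "total_len": len(pkt),
        "dl_src": mac(dl_src),
        "dl_dst": mac(dl_dst),
        "eth_type": eth_type,
        "nw_src": socket.inet_ntoa(src),
        "nw_dst": socket.inet_ntoa(dst),
        "nw_ttl": ttl,
        "nw_proto": proto,
        "tp_src": sport,
        "tp_dst": dport,
        "udp_csum": udp_csum,
    }


fields = decode(PKT)
print(fields)
```

The decoded fields line up with the first NXT_PACKET_IN2 entry: total_len=42, nw_ttl=0, tp_src=1, tp_dst=2 and udp_csum:0.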
+
+AT_SETUP([conntrack - force commit])
+CHECK_CONNTRACK()
+OVS_TRAFFIC_VSWITCHD_START()
+AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg
ofproto_dpif_upcall:dbg])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+AT_DATA([flows.txt], [dnl
+priority=1,action=drop
+priority=10,arp,action=normal
+priority=100,in_port=1,udp,action=ct(force,commit),controller
+priority=100,in_port=2,ct_state=-trk,udp,action=ct(table=0)
+priority=100,in_port=2,ct_state=+trk+est,udp,action=ct(force,commit,table=1)
+table=1,in_port=2,ct_state=+trk,udp,action=controller
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+AT_CAPTURE_FILE([ofctl_monitor.log])
+AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir
--pidfile 2> ofctl_monitor.log])
+
+dnl Send an unsolicited reply from port 2. This should be dropped.
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2
packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000
actions=resubmit(,0)"])
+
+dnl OK, now start a new connection from port 1.
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1
packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000
actions=resubmit(,0)"])
+
+dnl Now try a reply from port 2.
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2
packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000
actions=resubmit(,0)"])
+
+AT_CHECK([ovs-appctl revalidator/purge], [0])
+
+dnl Check this output. We only see the latter two packets, not the first.
+AT_CHECK([cat ofctl_monitor.log], [0], [dnl
+NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=42 in_port=1 (via action) data_len=42 (unbuffered)
+udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=1,tp_dst=2 udp_csum:0
+NXT_PACKET_IN2 (xid=0x0): table_id=1 cookie=0x0 total_len=42 ct_state=new|trk,ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=17,ct_tp_src=2,ct_tp_dst=1,ip,in_port=2 (via action) data_len=42 (unbuffered)
+udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=2,tp_dst=1 udp_csum:0
+])
+
+dnl
+dnl Check that the directionality has been changed by force commit.
+dnl
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep
"orig=.src=10\.1\.1\.2,"], [], [dnl
+udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2)
+])
+
+dnl OK, now send another packet from port 1 and see that it switches again
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1
packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000
actions=resubmit(,0)"])
+AT_CHECK([ovs-appctl revalidator/purge], [0])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep
"orig=.src=10\.1\.1\.1,"], [], [dnl
+udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([conntrack - ct flush by 5-tuple])
+CHECK_CONNTRACK()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+AT_DATA([flows.txt], [dnl
+priority=1,action=drop
+priority=10,arp,action=normal
+priority=100,in_port=1,udp,action=ct(commit),2
+priority=100,in_port=2,udp,action=ct(zone=5,commit),1
+priority=100,in_port=1,icmp,action=ct(commit),2
+priority=100,in_port=2,icmp,action=ct(zone=5,commit),1
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+dnl Test UDP from port 1
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1
packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000
actions=resubmit(,0)"])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep
"orig=.src=10\.1\.1\.1,"], [], [dnl
+udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
+])
+
+AT_CHECK([ovs-appctl dpctl/flush-conntrack
'ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=17,ct_tp_src=2,ct_tp_dst=1'])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep
"orig=.src=10\.1\.1\.1,"], [1], [dnl
+])
+
+dnl Test UDP from port 2
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2
packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000
actions=resubmit(,0)"])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep
"orig=.src=10\.1\.1\.2,"], [0], [dnl
+udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
+])
+
+AT_CHECK([ovs-appctl dpctl/flush-conntrack zone=5
'ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=17,ct_tp_src=1,ct_tp_dst=2'])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)],
[0], [dnl
+])
+
+dnl Test ICMP traffic
+NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 |
FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep
"orig=.src=10\.1\.1\.2,"], [0], [stdout])
+AT_CHECK([cat stdout | FORMAT_CT(10.1.1.1)], [0],[dnl
+icmp,orig=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=8,code=0),reply=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=0,code=0),zone=5
+])
+
+ICMP_ID=`cat stdout | cut -d ',' -f4 | cut -d '=' -f2`
+ICMP_TUPLE=ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=1,icmp_id=$ICMP_ID,icmp_type=8,icmp_code=0
+AT_CHECK([ovs-appctl dpctl/flush-conntrack zone=5 $ICMP_TUPLE])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep
"orig=.src=10\.1\.1\.2,"], [1], [dnl
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([conntrack - IPv4 ping])
+CHECK_CONNTRACK()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
+AT_DATA([flows.txt], [dnl
+priority=1,action=drop
+priority=10,arp,action=normal
+priority=100,in_port=1,icmp,action=ct(commit),2
+priority=100,in_port=2,icmp,ct_state=-trk,action=ct(table=0)
+priority=100,in_port=2,icmp,ct_state=+trk+est,action=1
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+dnl Pings from ns0->ns1 should work fine.
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 |
FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)],
[0], [dnl
+icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=0,code=0)
+])
+
+AT_CHECK([ovs-appctl dpctl/flush-conntrack])
+
+dnl Pings from ns1->ns0 should fail.
+NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 |
FORMAT_PING], [0], [dnl
+7 packets transmitted, 0 received, 100% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
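
A side note on the "7 packets transmitted" line in the failing-ping check above: with no replies arriving, "-c 3" never ends the run early, so "-w 2" bounds it, and one probe goes out every 0.3 s. The sketch below is one plausible reading of that arithmetic, not a statement about ping internals; probes_sent is a made-up helper and the model (probes at t = 0, interval, 2*interval, ... strictly before the deadline) is an assumption.

```python
def probes_sent(deadline_ms, interval_ms):
    """Count probes sent at t = 0, interval, 2*interval, ... before the deadline.

    Assumes no replies arrive, so only the deadline stops the run.
    Integer milliseconds avoid floating-point rounding on 0.3.
    """
    return (deadline_ms - 1) // interval_ms + 1


# -w 2 (2000 ms deadline) with -i 0.3 (300 ms interval)
count = probes_sent(2000, 300)
print(count)  # 7
```

Under this model the probes leave at 0, 0.3, ..., 1.8 s, which gives the 7 transmissions the test expects.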
+
+AT_SETUP([conntrack - get_nconns and get/set_maxconns])
+CHECK_CONNTRACK()
+CHECK_CT_DPIF_SET_GET_MAXCONNS()
+CHECK_CT_DPIF_GET_NCONNS()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
+AT_DATA([flows.txt], [dnl
+priority=1,action=drop
+priority=10,arp,action=normal
+priority=100,in_port=1,icmp,action=ct(commit),2
+priority=100,in_port=2,icmp,ct_state=-trk,action=ct(table=0)
+priority=100,in_port=2,icmp,ct_state=+trk+est,action=1
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+dnl Pings from ns0->ns1 should work fine.
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 |
FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)],
[0], [dnl
+icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=0,code=0)
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-set-maxconns one-bad-dp], [2], [], [dnl
+ovs-vswitchd: maxconns missing or malformed (Invalid argument)
+ovs-appctl: ovs-vswitchd: server returned an error
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-set-maxconns a], [2], [], [dnl
+ovs-vswitchd: maxconns missing or malformed (Invalid argument)
+ovs-appctl: ovs-vswitchd: server returned an error
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-set-maxconns one-bad-dp 10], [2], [],
[dnl
+ovs-vswitchd: datapath not found (Invalid argument)
+ovs-appctl: ovs-vswitchd: server returned an error
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-get-maxconns one-bad-dp], [2], [], [dnl
+ovs-vswitchd: datapath not found (Invalid argument)
+ovs-appctl: ovs-vswitchd: server returned an error
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-get-nconns one-bad-dp], [2], [], [dnl
+ovs-vswitchd: datapath not found (Invalid argument)
+ovs-appctl: ovs-vswitchd: server returned an error
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-get-nconns], [], [dnl
+1
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl
+3000000
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-set-maxconns 10], [], [dnl
+setting maxconns successful
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl
+10
+])
+
+AT_CHECK([ovs-appctl dpctl/flush-conntrack])
+
+AT_CHECK([ovs-appctl dpctl/ct-get-nconns], [], [dnl
+0
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl
+10
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([conntrack - IPv6 ping])
+CHECK_CONNTRACK()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96")
+
+AT_DATA([flows.txt], [dnl
+
+dnl ICMPv6 echo request and reply go to table 1. The rest of the traffic goes
+dnl through normal action.
+table=0,priority=10,icmp6,icmp_type=128,action=goto_table:1
+table=0,priority=10,icmp6,icmp_type=129,action=goto_table:1
+table=0,priority=1,action=normal
+
+dnl Allow everything from ns0->ns1. Only allow return traffic from ns1->ns0.
+table=1,priority=100,in_port=1,icmp6,action=ct(commit),2
+table=1,priority=100,in_port=2,icmp6,ct_state=-trk,action=ct(table=0)
+table=1,priority=100,in_port=2,icmp6,ct_state=+trk+est,action=1
+table=1,priority=1,action=drop
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2])
+
+dnl The above ping creates state in the connection tracker. We're not
+dnl interested in that state.
+AT_CHECK([ovs-appctl dpctl/flush-conntrack])
+
+dnl Pings from ns1->ns0 should fail.
+NS_CHECK_EXEC([at_ns1], [ping6 -q -c 3 -i 0.3 -w 2 fc00::1 | FORMAT_PING], [0], [dnl
+7 packets transmitted, 0 received, 100% packet loss, time 0ms
+])
+
+dnl Pings from ns0->ns1 should work fine.
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fc00::2)], [0], [dnl
+icmpv6,orig=(src=fc00::1,dst=fc00::2,id=<cleared>,type=128,code=0),reply=(src=fc00::2,dst=fc00::1,id=<cleared>,type=129,code=0)
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
--
2.7.4
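
A side note on the "conntrack - IPv6 ping" test above, since the expected
"7 packets transmitted" for the failing direction can look like a typo next
to "-c 3". As far as I understand iputils ping, when both -c and -w are
given it does not stop after sending count probes; it keeps sending every
interval until either count replies have arrived or the deadline expires.
The sketch below is only my own arithmetic for that behaviour (the function
name and parameters are made up for illustration and are not part of OVS or
ping), and the exact number can shift with timing on a loaded machine:

```python
import math


def expected_transmitted(count, interval, deadline, replies_arrive):
    """Rough estimate of the 'packets transmitted' figure printed by ping.

    Hypothetical helper, assuming iputils semantics for -c/-i/-w:
    probes go out at t = 0, interval, 2*interval, ... while t < deadline,
    and ping exits early once `count` replies have been received.
    """
    # Number of send slots that fit strictly before the deadline.
    sends_before_deadline = math.ceil(deadline / interval)
    if replies_arrive:
        # Every probe is answered, so ping stops after `count` probes.
        return min(count, sends_before_deadline)
    # No replies: ping keeps sending until the deadline cuts it off.
    return sends_before_deadline


# ns1 -> ns0 is dropped by the flows: ping6 -c 3 -i 0.3 -w 2
print(expected_transmitted(3, 0.3, 2, replies_arrive=False))
# ns0 -> ns1 is allowed: same options, all probes answered
print(expected_transmitted(3, 0.3, 2, replies_arrive=True))
```

With the options used in the test this gives 7 for the blocked direction and
3 for the allowed one, which matches the expected output in the patch.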