Re: [ovs-dev] [PATCHv8] netdev-afxdp: add new netdev type for AF_XDP.

Eelco Chaudron Fri, 17 May 2019 03:24:25 -0700

Hi William,

First a list of issues I found during some basic testing...

- When I restart or stop OVS (using the systemctl interface as found inRHEL) it does not clean up the BFP program causing the restart to fail:

2019-05-10T09:12:11.384Z|00042|netdev_afxdp|ERR|AF_XDP device eno1reconfig fails2019-05-10T09:12:11.384Z|00043|dpif_netdev|ERR|Failed to setinterface eno1 new configuration

I need to manually run "ip link set dev eno1 xdp off" to make itrecover.



- When I remove a bridge, I get an emer in the revalidator:

  2019-05-10T09:40:34.401Z|00045|netdev_afxdp|INFO|remove xdp program

2019-05-10T09:40:34.652Z|00001|util(revalidator49)|EMER|lib/poll-loop.c:111:assertion !fd != !wevent failed in poll_create_node()


  Easy to replicate with this:

$ ovs-vsctl add-br ovs_pvp_br0 -- set bridge ovs_pvp_br0datapath_type=netdev$ ovs-vsctl add-port ovs_pvp_br0 eno1 -- set interface eno1type="afxdp" options:xdpmode=drv

    $ ovs-vsctl del-br ovs_pvp_br0

- High pmd usage on the statistics, even with no packets is thisexpected?


  $ ovs-appctl dpif-netdev/pmd-rxq-show
  pmd thread numa_id 0 core_id 1:
    isolated : false
    port: dpdk0             queue-id:  0  pmd usage:  0 %
    port: eno1              queue-id:  0  pmd usage: 49 %

  It goes up slowly and gets stuck at 49%


- When doing the PVP testing I noticed that the physical port has odd/no
  tx statistics:

  $ ovs-ofctl dump-ports ovs_pvp_br0
  OFPST_PORT reply (xid=0x2): 3 ports

port LOCAL: rx pkts=0, bytes=0, drop=0, errs=0, frame=0, over=0,crc=0

             tx pkts=0, bytes=0, drop=0, errs=0, coll=0

port eno1: rx pkts=103256197, bytes=6195630508, drop=0, errs=0,frame=0, over=0, crc=0

             tx pkts=0, bytes=19789272440056, drop=0, errs=0, coll=0

port tapVM: rx pkts=4043, bytes=501278, drop=0, errs=0, frame=0,over=0, crc=0

             tx pkts=4058, bytes=502504, drop=0, errs=0, coll=0

- Packets larger than 1028 bytes are dropped. Guess this needs to befixed, and we need to state that jumbo frames are not supported. Are youplanning on adding this?

Currently I can find not mentioning of MTU limitation in thedocumentation, or any code to prevent it from being changed above thesupported limit.

- ovs-vswitchd is still crashing or stops forwarding packets when tryingto doPVP testing with Qemu that has a TAP interface doing XDP and runningpackets

  at wire speed to the 10G interface.

When trying with lower volume packets it seems to work, so with 1%trafficrate, it forwards packets without any problems (148,771 pps). If I goto10% the first couple of packet pass, then it stops forwarding. Ifit's notcrashing I still see packets being received by eno1 flow rules, butno

  packets make it to the VM.

    Program terminated with signal SIGSEGV, Segmentation fault.

#0 0x00000000009b2505 in netdev_linux_afxdp_batch_send (xsk=0x0,batch=batch@entry=0x7fc928005570) at lib/netdev-afxdp.c:654654 ret = umem_elem_pop_n(&xsk->umem->mpool, batch->count,(void **)elems_pop);

    [Current thread is 1 (Thread 0x7fc95e734700 (LWP 3926))]

Missing separate debuginfos, use: dnf debuginfo-installopenvswitch2.11-2.11.0-0.20190129gitd3a10db.el8fdb.x86_64

    (gdb) bt

#0 0x00000000009b2505 in netdev_linux_afxdp_batch_send (xsk=0x0,batch=batch@entry=0x7fc928005570) at lib/netdev-afxdp.c:654#1 0x00000000009a1850 in netdev_linux_send (netdev_=0x2f7f540,qid=<optimized out>, batch=0x7fc928005570, concurrent_txq=<optimizedout>) at lib/netdev-linux.c:1486#2 0x0000000000906051 in netdev_send (netdev=<optimized out>,qid=qid@entry=0, batch=batch@entry=0x7fc928005570,concurrent_txq=concurrent_txq@entry=true)

        at lib/netdev.c:797

#3 0x00000000008d2c94 in dp_netdev_pmd_flush_output_on_port(pmd=pmd@entry=0x7fc95e735010, p=p@entry=0x7fc928005540) atlib/dpif-netdev.c:4185#4 0x00000000008d2faf in dp_netdev_pmd_flush_output_packets(pmd=pmd@entry=0x7fc95e735010, force=force@entry=false) atlib/dpif-netdev.c:4225#5 0x00000000008db317 in dp_netdev_pmd_flush_output_packets(force=false, pmd=0x7fc95e735010) at lib/dpif-netdev.c:4280#6 dp_netdev_process_rxq_port (pmd=pmd@entry=0x7fc95e735010,rxq=0x2f36c50, port_no=1) at lib/dpif-netdev.c:4280#7 0x00000000008db67d in pmd_thread_main (f_=<optimized out>) atlib/dpif-netdev.c:5446#8 0x000000000095c96d in ovsthread_wrapper (aux_=<optimized out>)at lib/ovs-thread.c:352#9 0x00007fc9789d62de in start_thread () from/lib64/libpthread.so.0

    #10 0x00007fc97817ba63 in clone () from /lib64/libc.so.6

- make check-afxpd is failing for me, however, make check-kernel worksfine.Did not dive into it too much, but it fails here for all test cases,this is the same build I use for testing.

./system-afxdp-traffic.at:4: ovs-vsctl -- add-br br0 -- set Bridgebr0 datapath_type=netdevprotocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14,OpenFlow15fail-mode=secure --

  --- /dev/null 2019-05-16 09:09:33.445562692 -0400

+++/root/home/OVS_master_DPDK_v18.11/ovs_github/tests/system-afxdp-testsuite.dir/at-groups/1/stderr 2019-05-1705:46:20.506814939 -0400

  @@ -0,0 +1,2 @@

+ovs-vsctl: Error detected while setting up 'br0'. See ovs-vswitchdlog for details.+ovs-vsctl: The default log directory is"/root/home/OVS_master_DPDK_v18.11/ovs_github/tests/system-afxdp-testsuite.dir/01".

  ovsdb-server.log:

  ovs-vswitchd.log:

  1. system-afxdp-traffic.at:3:  FAILED (system-afxdp-traffic.at:4)




The following might be useful when combining DPDK and AF_XDP:

Currently, DPDK and AF_XDP polling can be combined on a single PMDthread, itmight be nice to have an option to not do this, i.e. have separatePMDthreads for each type. I know we can do this with assigning specificPMDs toqueues, but this will disable auto-balancing. This will also helplater if

  we would add poll() mode support for AF_XDP.

Other review comments see inline below. I reviewed the code, not theunit tests or automake changes.


Cheers,


Eelco


On 10 May 2019, at 1:54, William Tu wrote:

The patch introduces experimental AF_XDP support for OVS netdev.

AF_XDP, Address Family of the eXpress Data Path, is a new Linux sockettype

built upon the eBPF and XDP technology.  It is aims to have comparable

performance to DPDK but cooperate better with existing kernel'snetworkingstack. An AF_XDP socket receives and sends packets from an eBPF/XDPprogramattached to the netdev, by-passing a couple of Linux kernel'ssubsystemsAs a result, AF_XDP socket shows much better performance thanAF_PACKET

For more details about AF_XDP, please see linux kernel's
Documentation/networking/af_xdp.rst. Note that by default, this is not
compiled in.

Signed-off-by: William Tu <[email protected]>

---
v1->v2:
- add a list to maintain unused umem elements
- remove copy from rx umem to ovs internal buffer
- use hugetlb to reduce misses (not much difference)
- use pmd mode netdev in OVS (huge performance improve)
- remove malloc dp_packet, instead put dp_packet in umem

v2->v3:
- rebase on the OVS master, 7ab4b0653784

("configure: Check for more specific function to pull in pthreadlibrary.")

- remove the dependency on libbpf and dpif-bpf.
  instead, use the built-in XDP_ATTACH feature.
- data structure optimizations for better performance, see[1]
- more test cases support

v3:https://mail.openvswitch.org/pipermail/ovs-dev/2018-November/354179.html


v3->v4:
- Use AF_XDP API provided by libbpf
- Remove the dependency on XDP_ATTACH kernel patch set
- Add documentation, bpf.rst

v4->v5:
- rebase to master
- remove rfc, squash all into a single patch
- add --enable-afxdp, so by default, AF_XDP is not compiled
- add options: xdpmode=drv,skb
- add multiple queue and multiple PMD support, with options: n_rxq
- improve documentation, rename bpf.rst to af_xdp.rst

v5->v6
- rebase to master, commit 0cdd5b13de91b98
- address errors from sparse and clang
- pass travis-ci test
- address feedback from Ben
- fix issues reported by 0-day robot
- improved documentation

v6-v7
- rebase to master, commit abf11558c1515bf3b1
- address feedbacks from Ilya, Ben, and Eelco, see:
  https://www.mail-archive.com/[email protected]/msg32357.html
- add XDP mode change, implement get/set_config, reconfigure
- Fix reconfiguration/crash issue caused by libbpf, see patch:
  [PATCH bpf 0/2] libbpf: fixes for AF_XDP teardown
- perf optimization for batching umem_push/pop
- perf optimization for batching kick_tx
- test build with dpdk
- fix/refactor atomic operation
- make AF_XDP x86 specific, otherwise fail at build time
- lots of code refactoring
- add PVP setup in documentation

v7-v8:
- Address feedback from Ilya at:
  https://patchwork.ozlabs.org/patch/1095019/
- add netdev-linux-private.h
- fix afxdp reconfigure issue
- sort include headers
- remove unnecessary OVS_UNUSED
- coding style fixes
- error case handling and memory leak
---
 Documentation/automake.mk             |   1 +
 Documentation/index.rst               |   1 +
 Documentation/intro/install/afxdp.rst | 479 +++++++++++++++++
 Documentation/intro/install/index.rst |   1 +
 acinclude.m4                          |  32 ++
 configure.ac                          |   1 +
 lib/automake.mk                       |  13 +
 lib/dp-packet.c                       |  33 ++
 lib/dp-packet.h                       |  22 +-
 lib/dpif-netdev-perf.h                |  14 +
 lib/netdev-afxdp.c                    | 727 +++++++++++++++++++++++++
 lib/netdev-afxdp.h                    |  53 ++
 lib/netdev-linux-private.h            | 124 +++++
 lib/netdev-linux.c                    | 137 +++--
 lib/netdev-linux.h                    |  14 +
 lib/netdev-provider.h                 |   4 +-
 lib/netdev.c                          |   3 +
 lib/xdpsock.c                         | 239 +++++++++
 lib/xdpsock.h                         | 123 +++++
 tests/automake.mk                     |  17 +
 tests/system-afxdp-macros.at          | 153 ++++++
 tests/system-afxdp-testsuite.at       |  26 +

tests/system-afxdp-traffic.at | 978++++++++++++++++++++++++++++++++++

 23 files changed, 3137 insertions(+), 58 deletions(-)
 create mode 100644 Documentation/intro/install/afxdp.rst
 create mode 100644 lib/netdev-afxdp.c
 create mode 100644 lib/netdev-afxdp.h
 create mode 100644 lib/netdev-linux-private.h
 create mode 100644 lib/xdpsock.c
 create mode 100644 lib/xdpsock.h
 create mode 100644 tests/system-afxdp-macros.at
 create mode 100644 tests/system-afxdp-testsuite.at
 create mode 100644 tests/system-afxdp-traffic.at

diff --git a/Documentation/automake.mk b/Documentation/automake.mk
index 082438e09a33..11cc59efc881 100644
--- a/Documentation/automake.mk
+++ b/Documentation/automake.mk
@@ -10,6 +10,7 @@ DOC_SOURCE = \
        Documentation/intro/why-ovs.rst \
        Documentation/intro/install/index.rst \
        Documentation/intro/install/bash-completion.rst \
+       Documentation/intro/install/afxdp.rst \
        Documentation/intro/install/debian.rst \
        Documentation/intro/install/documentation.rst \
        Documentation/intro/install/distributions.rst \
diff --git a/Documentation/index.rst b/Documentation/index.rst
index 46261235c732..aa9e7c49f179 100644
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -59,6 +59,7 @@ vSwitch? Start here.
   :doc:`intro/install/windows` |
   :doc:`intro/install/xenserver` |
   :doc:`intro/install/dpdk` |
+  :doc:`intro/install/afxdp` |
   :doc:`Installation FAQs <faq/releases>`

 - **Tutorials:** :doc:`tutorials/faucet` |

diff --git a/Documentation/intro/install/afxdp.rstb/Documentation/intro/install/afxdp.rst

new file mode 100644
index 000000000000..1222b433dbbb
--- /dev/null
+++ b/Documentation/intro/install/afxdp.rst
@@ -0,0 +1,479 @@
+..

+ Licensed under the Apache License, Version 2.0 (the "License");you may+ not use this file except in compliance with the License. Youmay obtain

+      a copy of the License at
+
+          http://www.apache.org/licenses/LICENSE-2.0
+

+ Unless required by applicable law or agreed to in writing,software+ distributed under the License is distributed on an "AS IS"BASIS, WITHOUT+ WARRANTIES OR CONDITIONS OF ANY KIND, either express orimplied. See the+ License for the specific language governing permissions andlimitations

+      under the License.
+
+      Convention for heading levels in Open vSwitch documentation:
+
+      =======  Heading 0 (reserved for the title in a document)
+      -------  Heading 1
+      ~~~~~~~  Heading 2
+      +++++++  Heading 3
+      '''''''  Heading 4
+
+      Avoid deeper levels because they do not render well.
+
+
+========================
+Open vSwitch with AF_XDP
+========================
+
+This document describes how to build and install Open vSwitch using
+AF_XDP netdev.
+
+.. warning::
+  The AF_XDP support of Open vSwitch is considered 'experimental',
+  and it is not compiled in by default.
+
+Introduction
+------------

+AF_XDP, Address Family of the eXpress Data Path, is a new Linuxsocket type+built upon the eBPF and XDP technology. It is aims to havecomparable+performance to DPDK but cooperate better with existing kernel'snetworking+stack. An AF_XDP socket receives and sends packets from an eBPF/XDPprogram+attached to the netdev, by-passing a couple of Linux kernel'ssubsystems.+As a result, AF_XDP socket shows much better performance thanAF_PACKET.

+For more details about AF_XDP, please see linux kernel's
+Documentation/networking/af_xdp.rst
+
+
+AF_XDP Netdev
+-------------
+OVS has a couple of netdev types, i.e., system, tap, or
+internal.  The AF_XDP feature adds a new netdev types called
+"afxdp", and implement its configuration, packet reception,
+and transmit functions.  Since the AF_XDP socket, xsk,
+operates in userspace, once ovs-vswitchd receives packets
+from xsk, the proposed architecture re-uses the existing
+userspace dpif-netdev datapath.  As a result, most of
+the packet processing happens at the userspace instead of
+linux kernel.
+
+::
+
+              |   +-------------------+
+              |   |    ovs-vswitchd   |<-->ovsdb-server
+              |   +-------------------+
+              |   |      ofproto      |<-->OpenFlow controllers
+              |   +--------+-+--------+
+              |   | netdev | |ofproto-|
+    userspace |   +--------+ |  dpif  |
+              |   | afxdp  | +--------+
+              |   | netdev | |  dpif  |
+              |   +---||---+ +--------+
+              |       ||     |  dpif- |
+              |       ||     | netdev |
+              |_      ||     +--------+
+                      ||
+               _  +---||-----+--------+
+              |   | AF_XDP prog +     |
+       kernel |   |   xsk_map         |
+              |_  +--------||---------+
+                           ||
+                        physical
+                           NIC
+
+
+Build requirements
+------------------
+

+In addition to the requirements described in :doc:`general`, buildingOpen

+vSwitch with AF_XDP will require the following:
+
+- libbpf from kernel source tree (kernel 5.0.0 or later)
+
+- Linux kernel XDP support, with the following options (required)
+
+  * CONFIG_BPF=y
+
+  * CONFIG_BPF_SYSCALL=y
+
+  * CONFIG_XDP_SOCKETS=y
+
+

+- The following optional Kconfig options are also recommended, butnot

+  required:
+
+  * CONFIG_BPF_JIT=y (Performance)
+
+  * CONFIG_HAVE_BPF_JIT=y (Performance)
+
+  * CONFIG_XDP_SOCKETS_DIAG=y (Debugging)
+
+- If possible, run **./xdpsock -r -N -z -i <your device>** under

+ linux/samples/bpf. This is the OVS indepedent benchmark tools forAF_XDP.

+  It makes sure your basic kernel requirements are met for AF_XDP.
+
+
+Installing
+----------

+For OVS to use AF_XDP netdev, it has to be configured with LIBBPFsupport.

+Frist, clone a recent version of Linux bpf-next tree::
+

+ git clonegit://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git

+Second, go into the Linux source directory and build libbpf in thetools

+directory::
+
+  cd bpf-next/
+  cd tools/lib/bpf/
+  make && make install
+  make install_headers
+
+.. note::
+   Make sure xsk.h and bpf.h are installed in system's library path,
+   e.g. /usr/local/include/bpf/ or /usr/include/bpf/
+
+Make sure the libbpf.so is installed correctly::
+
+  ldconfig
+  ldconfig -p | grep libbpf
+
+
+Third, ensure the standard OVS requirements are installed and
+bootstrap/configure the package::
+
+  ./boot.sh && ./configure --enable-afxdp
+
+Finally, build and install OVS::
+
+  make && make install
+
+To kick start end-to-end autotesting::
+
+  uname -a # make sure having 5.0+ kernel
+  make check-afxdp
+
+if a test case fails, check the log at::
+

+ cattests/system-afxdp-testsuite.dir/<number>/system-afxdp-testsuite.log

+
+
+Setup AF_XDP netdev
+-------------------
+Before running OVS with AF_XDP, make sure the libbpf and libelf are
+set-up right::
+
+  ldd vswitchd/ovs-vswitchd
+
+Open vSwitch should be started using userspace datapath as described
+in :doc:`general`::
+
+  ovs-vswitchd --disable-system
+  ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
+
+.. note::

+ OVS AF_XDP netdev is using the userspace datapath, the samedatapath+ as used by OVS-DPDK. So it requires --disable-system forovs-vswitchd

+   and datapath_type=netdev when adding a new bridge.

As mentioned earlier offline I think --disable-system can be removed asthe Kernel and userspace datapath can be run at the same time.

+
+Make sure your device driver support AF_XDP, and to use 1 PMD (oncore 4)
+on 1 queue (queue 0) device, configure these options: **pmd-cpu-mask,
+pmd-rxq-affinity, and n_rxq**. The **xdpmode** can be "drv" or"skb"::

Wondering how options:xdpmode should operate without it being specified?I would prefer that if the option is not specified it would try drv, andif it fails fallback to skb.


We need to add these new options to the vswitch.xml file

+
+  ethtool -L enp2s0 combined 1
+  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10

+ ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp"\

+    options:n_rxq=1 options:xdpmode=drv \
+    other_config:pmd-rxq-affinity="0:4"
+
+Or, use 4 pmds/cores and 4 queues by doing::
+
+  ethtool -L enp2s0 combined 4
+  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x36

+ ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp"\

+    options:n_rxq=4 options:xdpmode=drv \
+    other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"
+

Add some text that pmd-rxq-affinity is not a requirement, the systemwill auto (re)assign.Also, note that cores used by pmd-rxq-affinity are not shared/used byfloating PMDs.

+To validate that the bridge has successfully instantiated, you canuse the::
+
+  ovs-vsctl show
+
+should show something like::
+
+  Port "ens802f0"
+   Interface "ens802f0"
+      type: afxdp
+      options: {n_rxq="1", xdpmode=drv}
+
+Otherwise, enable debug by::
+
+  ovs-appctl vlog/set netdev_afxdp::dbg
+
+References
+----------
+Most of the design details are described in the paper presented at
+Linux Plumber 2018, "Bringing the Power of eBPF to Open vSwitch"[1],
+section 4, and slides[2][4].
+"The Path to DPDK Speeds for AF XDP"[3] gives a very goodintroduction
+about AF_XDP current and future work.
+
+
+[1] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-afxdp.pdf
+
+[2]http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-lpc18-presentation.pdf
+
+[3]http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdf
+
+[4]https://ovsfall2018.sched.com/event/IO7p/fast-userspace-ovs-with-afxdp
+
+
+Performance Tuning
+------------------
+The name of the game is to keep your CPU running in userspace,allowing PMD+to keep polling the AF_XDP queues without any interferences fromkernel.
+
+#. Make sure everything is in the same NUMA node (memory used byAF_XDP, pmd
+   running cores, device plug-in slot)

How can you do this? The code is not taking care of NUMA, and memory isallocated with posix_memalign so no idea which NUMA node it getsallocated.

+#. Isolate your CPU by doing isolcpu at grub configure.
+
+#. IRQ should not set to pmd running core.
+
+#. The Spectre and Meltdown fixes increase the overhead of systemcalls.
+


Maybe be more consistent, either one or two newlines before a heading?

+Debugging performance issue
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+While running the traffic, use linux perf tool to see where your cpu
+spends its cycle::
+
+  cd bpf-next/tools/perf
+  make
+  ./perf record -p `pidof ovs-vswitchd` sleep 10
+  ./perf report
+
+Measure your system call rate by doing::
+
+  pstree -p `pidof ovs-vswitchd`
+  strace -c -p <your pmd's PID>
+
+Or, use OVS pmd tool::
+
+  ovs-appctl dpif-netdev/pmd-stats-show
+
+
+Example Script
+--------------
+
+Below is a script using namespaces and veth peer::
+
+  #!/bin/bash

+ ovs-vswitchd --no-chdir --pidfile -vvconn -vofproto_dpif -vunixctl\

+    --disable-system --detach \
+  ovs-vsctl -- add-br br0 -- set Bridge br0 \

+ protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14\

+    fail-mode=secure datapath_type=netdev
+  ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
+
+  ip netns add at_ns0
+  ovs-appctl vlog/set netdev_afxdp::dbg
+
+  ip link add p0 type veth peer name afxdp-p0
+  ip link set p0 netns at_ns0
+  ip link set dev afxdp-p0 up
+  ovs-vsctl add-port br0 afxdp-p0 -- \
+    set interface afxdp-p0 external-ids:iface-id="p0" type="afxdp"
+
+  ip netns exec at_ns0 sh << NS_EXEC_HEREDOC
+  ip addr add "10.1.1.1/24" dev p0
+  ip link set dev p0 up
+  NS_EXEC_HEREDOC
+
+  ip netns add at_ns1
+  ip link add p1 type veth peer name afxdp-p1
+  ip link set p1 netns at_ns1
+  ip link set dev afxdp-p1 up
+
+  ovs-vsctl add-port br0 afxdp-p1 -- \
+    set interface afxdp-p1 external-ids:iface-id="p1" type="afxdp"
+  ip netns exec at_ns1 sh << NS_EXEC_HEREDOC
+  ip addr add "10.1.1.2/24" dev p1
+  ip link set dev p1 up
+  NS_EXEC_HEREDOC
+
+  ip netns exec at_ns0 ping -i .2 10.1.1.2
+
+
+Limitations/Known Issues
+------------------------

+#. Device's numa ID is always 0, need a way to find numa id from anetdev.+#. No QoS support because AF_XDP netdev by-pass the Linux TC layer. Apossible

+   work-around is to use OpenFlow meter action.
+#. AF_XDP device added to bridge, remove, and added again will fail.

+#. Most of the tests are done using i40e single port. Multiple portsand

+   also ixgbe driver also needs to be tested.
+#. No latency test result (TODO items)
+
+
+make check-afxdp
+----------------

+When executing 'make check-afxdp', OVS creates namespaces, sets upAF_XDP on+veth devices and kicks start the testing. So far we have thefollowing test

+cases::
+
+ AF_XDP netdev datapath-sanity
+
+  1: datapath - ping between two ports               ok
+  2: datapath - ping between two ports on vlan       ok
+  3: datapath - ping6 between two ports              ok
+  4: datapath - ping6 between two ports on vlan      ok
+  5: datapath - ping over vxlan tunnel               ok
+  6: datapath - ping over vxlan6 tunnel              ok
+  7: datapath - ping over gre tunnel                 ok
+  8: datapath - ping over erspan v1 tunnel           ok
+  9: datapath - ping over erspan v2 tunnel           ok
+ 10: datapath - ping over ip6erspan v1 tunnel        ok
+ 11: datapath - ping over ip6erspan v2 tunnel        ok
+ 12: datapath - ping over geneve tunnel              ok
+ 13: datapath - ping over geneve6 tunnel             ok
+ 14: datapath - clone action                         ok
+ 15: datapath - basic truncate action                ok
+
+ conntrack
+
+ 16: conntrack - controller                          ok
+ 17: conntrack - force commit                        ok
+ 18: conntrack - ct flush by 5-tuple                 ok
+ 19: conntrack - IPv4 ping                           ok
+ 20: conntrack - get_nconns and get/set_maxconns     ok
+ 21: conntrack - IPv6 ping                           ok
+
+ system-ovn
+
+ 22: ovn -- 2 LRs connected via LS, gateway router, SNAT and DNAT ok
+ 23: ovn -- 2 LRs connected via LS, gateway router, easy SNAT ok
+ 24: ovn -- multiple gateway routers, SNAT and DNAT  ok
+ 25: ovn -- load-balancing                           ok
+ 26: ovn -- load-balancing - same subnet.            ok
+ 27: ovn -- load balancing in gateway router         ok
+ 28: ovn -- multiple gateway routers, load-balancing ok
+ 29: ovn -- load balancing in router with gateway router port ok
+ 30: ovn -- DNAT and SNAT on distributed router - N/S ok
+ 31: ovn -- DNAT and SNAT on distributed router - E/W ok
+
+PVP using tap device
+--------------------

+Assume you have enp2s0 as physical nic, and a tap device connected toVM.

+First, start OVS, then add physical port::
+
+  ethtool -L enp2s0 combined 1
+  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10

+ ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp"\

+    options:n_rxq=1 options:xdpmode=drv \
+    other_config:pmd-rxq-affinity="0:4"
+
+Start a VM with virtio and tap device::
+
+  qemu-system-x86_64 -hda ubuntu1810.qcow \
+    -m 4096 \
+    -cpu host,+x2apic -enable-kvm \
+    -device virtio-net-pci,mac=00:02:00:00:00:01,netdev=net0,mq=on,\
+      vectors=10,mrg_rxbuf=on,rx_queue_size=1024 \
+    -netdev type=tap,id=net0,vhost=on,queues=8 \
+    -object memory-backend-file,id=mem,size=4096M,\
+      mem-path=/dev/hugepages,share=on \
+    -numa node,memdev=mem -mem-prealloc -smp 2
+
+Create OpenFlow rules::
+
+  ovs-vsctl add-port br0 tap0

Maybe add tap as XDP or else it will be an AF_PACKET interface pollingin the main thread.

+  ovs-ofctl del-flows br0
+  ovs-ofctl add-flow br0 "in_port=enp2s0, actions=output:tap0"
+  ovs-ofctl add-flow br0 "in_port=tap0, actions=output:enp2s0"
+
+Inside the VM, use xdp_rxq_info to bounce back the traffic::
+
+  ./xdp_rxq_info --dev ens3 --action XDP_TX
+
+The performance number I got is around 700Kpps.

+This is due to using the kernel's tap interface, which requirescopying

+packet into kernel from the umem buffer in userspace.
+
+PVP using vhostuser device
+--------------------------
+First, build OVS with DPDK and AFXDP::
+
+  ./configure  --enable-afxdp --with-dpdk=<dpdk path>
+  make -j4 && make install
+
+Create a vhost-user port from OVS::
+
+  ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
+  ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev \
+    other_config:pmd-cpu-mask=0xfff
+  ovs-vsctl add-port br0 vhost-user-1 \
+    -- set Interface vhost-user-1 type=dpdkvhostuser
+
+Start VM using vhost-user mode::
+
+  qemu-system-x86_64 -hda ubuntu1810.qcow \
+   -m 4096 \
+   -cpu host,+x2apic -enable-kvm \

+ -chardevsocket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1 \+ -netdevtype=vhost-user,id=mynet1,chardev=char1,vhostforce,queues=4 \

+   -device virtio-net-pci,mac=00:00:00:00:00:01,\
+      netdev=mynet1,mq=on,vectors=10 \
+   -object memory-backend-file,id=mem,size=4096M,\
+      mem-path=/dev/hugepages,share=on \
+   -numa node,memdev=mem -mem-prealloc -smp 2
+
+Setup the OpenFlow ruls::
+
+  ovs-ofctl del-flows br0

+ ovs-ofctl add-flow br0 "in_port=enp2s0,actions=output:vhost-user-1"+ ovs-ofctl add-flow br0 "in_port=vhost-user-1,actions=output:enp2s0"

+
+Inside the VM, use xdp_rxq_info to drop or bounce back the traffic::
+
+  ./xdp_rxq_info --dev ens3 --action XDP_DROP
+  ./xdp_rxq_info --dev ens3 --action XDP_TX
+
+Performance: for RX_DROP: 6.6Mpps, TX: 2.3Mpps
+
+PCP container using veth
+------------------------
+Create namespace and veth peer devices::
+
+  ip netns add at_ns0
+  ip link add p0 type veth peer name afxdp-p0
+  ip link set p0 netns at_ns0
+  ip link set dev afxdp-p0 up
+  ip netns exec at_ns0 ip link set dev p0 up
+
+Attach the veth port to br0 (linux kernel mode)::
+
+  ovs-vsctl add-port br0 afxdp-p0 -- \
+    set interface afxdp-p0 options:n_rxq=1 options:xdpmode=skb
+

Remove the xdpmode=skb above... Also, see above on the PF_PACKETinterface in the bridge_run(),

I would advise against using this, and you might want to remove it.

+
+Or, use AF_XDP with skb mode::
+
+  ovs-vsctl add-port br0 afxdp-p0 -- \

+ set interface afxdp-p0 type="afxdp" options:n_rxq=1options:xdpmode=skb

+
+Setup the OpenFlow rules::
+
+  ovs-ofctl del-flows br0
+  ovs-ofctl add-flow br0 "in_port=enp2s0, actions=output:afxdp-p0"
+  ovs-ofctl add-flow br0 "in_port=afxdp-p0, actions=output:enp2s0"
+
+In the namespace, run drop or bounce back the packet::
+
+  ip netns exec at_ns0 ./xdp_rxq_info --dev p0 --action XDP_DROP
+  ip netns exec at_ns0 ./xdp_rxq_info --dev p0 --action XDP_TX
+
+Performace: for RX_DROP: 800Kpps, TX: 700Kpps
+
+Bug Reporting
+-------------
+
+Please report problems to [email protected].

diff --git a/Documentation/intro/install/index.rstb/Documentation/intro/install/index.rst

index 3193c736cf17..c27a9c9d16ff 100644
--- a/Documentation/intro/install/index.rst
+++ b/Documentation/intro/install/index.rst
@@ -45,6 +45,7 @@ Installation from Source
    xenserver
    userspace
    dpdk
+   afxdp

 Installation from Packages
 --------------------------
diff --git a/acinclude.m4 b/acinclude.m4
index b532a4579266..5782f7e4bc2e 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -221,6 +221,38 @@ AC_DEFUN([OVS_FIND_DEPENDENCY], [
   ])
 ])

+dnl OVS_CHECK_LINUX_AF_XDP
+dnl
+dnl Check both Linux kernel AF_XDP and libbpf support
+AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [
+  AC_ARG_ENABLE([afxdp],

+ [AC_HELP_STRING([--enable-afxdp], [Enable AF-XDPsupport])],

+                [], [enable_afxdp=no])
+  AC_MSG_CHECKING([whether AF_XDP is enabled])
+  if test "$enable_afxdp" != yes; then
+    AC_MSG_RESULT([no])
+    AF_XDP_ENABLE=false
+  else
+    AC_MSG_RESULT([yes])
+    AF_XDP_ENABLE=true
+
+    AC_CHECK_HEADER([bpf/libbpf.h], [],

+ [AC_MSG_ERROR([unable to find bpf/libbpf.h for AF_XDPsupport])])

+
+    AC_CHECK_HEADER([linux/if_xdp.h], [],

+ [AC_MSG_ERROR([unable to find linux/if_xdp.h for AF_XDPsupport])])

+
+    AC_CHECK_HEADER([bpf/xsk.h], [],
+      [AC_MSG_ERROR([unable to find bpf/xsk.h for AF_XDP support])])
+
+    AC_DEFINE([HAVE_AF_XDP], [1],

+ [Define to 1 if AF_XDP support is available andenabled.])

+    LIBBPF_LDADD=" -lbpf -lelf"
+    AC_SUBST([LIBBPF_LDADD])
+  fi
+  AM_CONDITIONAL([HAVE_AF_XDP], test "$AF_XDP_ENABLE" = true)
+])
+
 dnl OVS_CHECK_DPDK
 dnl
 dnl Configure DPDK source tree
diff --git a/configure.ac b/configure.ac
index 505e3d041e93..29c90b73f836 100644
--- a/configure.ac
+++ b/configure.ac
@@ -99,6 +99,7 @@ OVS_CHECK_SPHINX
 OVS_CHECK_DOT
 OVS_CHECK_IF_DL
 OVS_CHECK_STRTOK_R
+OVS_CHECK_LINUX_AF_XDP
 AC_CHECK_DECLS([sys_siglist], [], [], [[#include <signal.h>]])

AC_CHECK_MEMBERS([struct stat.st_mtim.tv_nsec, structstat.st_mtimensec],

   [], [], [[#include <sys/stat.h>]])
diff --git a/lib/automake.mk b/lib/automake.mk
index cc5dccf39d6b..686e57f8c472 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -14,6 +14,10 @@ if WIN32
 lib_libopenvswitch_la_LIBADD += ${PTHREAD_LIBS}
 endif

+if HAVE_AF_XDP
+lib_libopenvswitch_la_LIBADD += $(LIBBPF_LDADD)
+endif
+
 lib_libopenvswitch_la_LDFLAGS = \
         $(OVS_LTINFO) \
         -Wl,--version-script=$(top_builddir)/lib/libopenvswitch.sym \
@@ -392,6 +396,7 @@ lib_libopenvswitch_la_SOURCES += \
        lib/if-notifier.h \
        lib/netdev-linux.c \
        lib/netdev-linux.h \
+       lib/netdev-linux-private.h \
        lib/netdev-tc-offloads.c \
        lib/netdev-tc-offloads.h \
        lib/netlink-conntrack.c \
@@ -409,6 +414,14 @@ lib_libopenvswitch_la_SOURCES += \
        lib/tc.h
 endif

+if HAVE_AF_XDP
+lib_libopenvswitch_la_SOURCES += \
+       lib/xdpsock.c \
+       lib/xdpsock.h \
+       lib/netdev-afxdp.c \
+       lib/netdev-afxdp.h
+endif
+
 if DPDK_NETDEV
 lib_libopenvswitch_la_SOURCES += \
        lib/dpdk.c \
diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 0976a35e758b..7d086dc5e860 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -22,6 +22,9 @@
 #include "netdev-dpdk.h"
 #include "openvswitch/dynamic-string.h"
 #include "util.h"
+#ifdef HAVE_AF_XDP
+#include "netdev-afxdp.h"
+#endif


Why the protection above? You do not do this in netdev-linux.c.
Maybe you should move the #ifdef HAVE_AF_XDP inside the include file?

 static void

dp_packet_init__(struct dp_packet *b, size_t allocated, enumdp_packet_source source)@@ -59,6 +62,27 @@ dp_packet_use(struct dp_packet *b, void *base,size_t allocated)

     dp_packet_use__(b, base, allocated, DPBUF_MALLOC);
 }

+#if HAVE_AF_XDP
+/* Initialize 'b' as an empty dp_packet that contains
+ * memory starting at AF_XDP umem base.
+ */
+void

+dp_packet_use_afxdp(struct dp_packet *b, void *base, size_tallocated)

+{
+    dp_packet_set_base(b, base);
+    dp_packet_set_data(b, base);
+    dp_packet_set_size(b, 0);
+
+    dp_packet_set_allocated(b, allocated);
+    b->source = DPBUF_AFXDP;
+    dp_packet_reset_offsets(b);
+    pkt_metadata_init(&b->md, 0);
+    dp_packet_reset_cutlen(b);
+    dp_packet_reset_offload(b);
+    b->packet_type = htonl(PT_ETH);
+}
+#endif

Guess the above ifdef saves some bytes if not build with AF_XDP, but wedo not seem to do it for other functions either, likedp_packet_init_dpdk().

/* Initializes 'b' as an empty dp_packet that contains the'allocated' bytes of* memory starting at 'base'. 'base' should point to a buffer on thestack.* (Nothing actually relies on 'base' being allocated on the stack.It could
@@ -122,6 +146,11 @@ dp_packet_uninit(struct dp_packet *b)
              * created as a dp_packet */
             free_dpdk_buf((struct dp_packet*) b);
 #endif
+        } else if (b->source == DPBUF_AFXDP) {
+#ifdef HAVE_AF_XDP
+            free_afxdp_buf(b);
+#endif

If you move the #ifdef HAVE_AF_XDP check to the include file (seecomment above), you can use the DPDK inline trick and remove the #ifdefabove.

See lib/netdev-dpdk.h

+            return;
         }
     }
 }

@@ -248,6 +277,9 @@ dp_packet_resize__(struct dp_packet *b, size_tnew_headroom, size_t new_tailroom

     case DPBUF_STACK:
         OVS_NOT_REACHED();

+    case DPBUF_AFXDP:
+        OVS_NOT_REACHED();
+
     case DPBUF_STUB:
         b->source = DPBUF_MALLOC;
         new_base = xmalloc(new_allocated);
@@ -433,6 +465,7 @@ dp_packet_steal_data(struct dp_packet *b)
 {
     void *p;
     ovs_assert(b->source != DPBUF_DPDK);
+    ovs_assert(b->source != DPBUF_AFXDP);

if (b->source == DPBUF_MALLOC && dp_packet_data(b) ==dp_packet_base(b)) {

         p = dp_packet_data(b);
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index a5e9ade1244a..0f533201f956 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -25,6 +25,10 @@
 #include <rte_mbuf.h>
 #endif

+#ifdef HAVE_AF_XDP
+#include "netdev-afxdp.h"
+#endif
+

See comment in dp-packet.c, if done all #ifdef HAVE_AF_XDP in this filecan be removed.

 #include "netdev-dpdk.h"
 #include "openvswitch/list.h"
 #include "packets.h"
@@ -42,6 +46,7 @@ enum OVS_PACKED_ENUM dp_packet_source {

DPBUF_DPDK, /* buffer data is from DPDK allocatedmemory.* ref to dp_packet_init_dpdk() indp-packet.c.

                                 */
+    DPBUF_AFXDP,               /* buffer data from XDP frame */
 };

 #define DP_PACKET_CONTEXT_SIZE 64
@@ -89,6 +94,13 @@ struct dp_packet {
     };
 };

+#if HAVE_AF_XDP
+struct dp_packet_afxdp {
+    struct umem_pool *mpool;
+    struct dp_packet packet;
+};
+#endif
+
 static inline void *dp_packet_data(const struct dp_packet *);
 static inline void dp_packet_set_data(struct dp_packet *, void *);
 static inline void *dp_packet_base(const struct dp_packet *);

@@ -122,7 +134,9 @@ static inline const void*dp_packet_get_nd_payload(const struct dp_packet *);

 void dp_packet_use(struct dp_packet *, void *, size_t);
 void dp_packet_use_stub(struct dp_packet *, void *, size_t);
 void dp_packet_use_const(struct dp_packet *, const void *, size_t);
-
+#if HAVE_AF_XDP
+void dp_packet_use_afxdp(struct dp_packet *, void *, size_t);
+#endif
 void dp_packet_init_dpdk(struct dp_packet *);

 void dp_packet_init(struct dp_packet *, size_t);
@@ -184,6 +198,12 @@ dp_packet_delete(struct dp_packet *b)
             return;
         }

+#ifdef HAVE_AF_XDP
+        if (b->source == DPBUF_AFXDP) {
+            free_afxdp_buf(b);
+            return;
+        }
+#endif
         dp_packet_uninit(b);
         free(b);
     }
diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
index 859c05613ddf..cc91720fad6e 100644
--- a/lib/dpif-netdev-perf.h
+++ b/lib/dpif-netdev-perf.h
@@ -198,6 +198,20 @@ cycles_counter_update(struct pmd_perf_stats *s)
 {
 #ifdef DPDK_NETDEV
     return s->last_tsc = rte_get_tsc_cycles();
+#elif HAVE_AF_XDP

We need to add support for at least ARM and PPC, not sure how to do thisnicely.

This code is already a quick cut/paste from DPDK, license?

+    /* This is x86-specific instructions. */
+    union {
+        uint64_t tsc_64;
+        struct {
+            uint32_t lo_32;
+            uint32_t hi_32;
+        };
+    } tsc;
+    asm volatile("rdtsc" :
+             "=a" (tsc.lo_32),
+             "=d" (tsc.hi_32));
+
+    return s->last_tsc = tsc.tsc_64;
 #else
     return s->last_tsc = 0;
 #endif
diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
new file mode 100644
index 000000000000..cd1b9ca8be77
--- /dev/null
+++ b/lib/netdev-afxdp.c
@@ -0,0 +1,727 @@
+/*
+ * Copyright (c) 2018, 2019 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *

+ * Unless required by applicable law or agreed to in writing,software

+ * distributed under the License is distributed on an "AS IS" BASIS,

+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express orimplied.+ * See the License for the specific language governing permissionsand

+ * limitations under the License.
+ */
+
+#if !defined(__i386__) && !defined(__x86_64__)
+#error AF_XDP supported only for Linux on x86 or x86_64


Any reason why we do not support PPC and ARM?

+#endif
+
+#include <config.h>
+
+#include "netdev-linux-private.h"
+#include "netdev-linux.h"


Swap the two above, see comment in netdev-linux-private.h

+#include "netdev-afxdp.h"
+
+#include <arpa/inet.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <linux/if_ether.h>
+#include <linux/if_tun.h>
+#include <linux/types.h>
+#include <linux/ethtool.h>
+#include <linux/mii.h>
+#include <linux/rtnetlink.h>
+#include <linux/sockios.h>
+#include <linux/if_xdp.h>
+#include <net/if.h>
+#include <net/if_arp.h>
+#include <net/route.h>
+#include <netinet/in.h>
+#include <netpacket/packet.h>
+#include <poll.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/ioctl.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/utsname.h>
+#include <unistd.h>
+

Some of these includes are included by netdev-linux(-private).h alreadyso why not remove them?

+#include "coverage.h"
+#include "dp-packet.h"
+#include "dpif-netlink.h"
+#include "dpif-netdev.h"
+#include "fatal-signal.h"
+#include "hash.h"
+#include "netdev-provider.h"
+#include "netdev-tc-offloads.h"
+#include "netdev-vport.h"
+#include "netlink-notifier.h"
+#include "netlink-socket.h"
+#include "netlink.h"
+#include "netnsid.h"
+#include "openflow/openflow.h"
+#include "openvswitch/dynamic-string.h"
+#include "openvswitch/hmap.h"
+#include "openvswitch/ofpbuf.h"
+#include "openvswitch/poll-loop.h"
+#include "openvswitch/vlog.h"
+#include "openvswitch/shash.h"
+#include "ovs-atomic.h"
+#include "packets.h"
+#include "rtnetlink.h"
+#include "socket-util.h"
+#include "sset.h"
+#include "tc.h"
+#include "timer.h"
+#include "unaligned.h"
+#include "util.h"
+#include "xdpsock.h"
+
+#ifndef SOL_XDP
+#define SOL_XDP 283
+#endif
+#ifndef AF_XDP
+#define AF_XDP 44
+#endif
+#ifndef PF_XDP
+#define PF_XDP AF_XDP
+#endif

Do we really need to include the above? Or should we update the installinstruction to move them over from the kernel headers?

+
+VLOG_DEFINE_THIS_MODULE(netdev_afxdp);
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
+
+#define UMEM2DESC(elem, base) ((uint64_t)((char *)elem - (char*)base))
+#define UMEM2XPKT(base, i) \
+ ALIGNED_CAST(struct dp_packet_afxdp *, (char *)base+ \
+                               i * sizeof(struct dp_packet_afxdp))
+
+static uint32_t prog_id;
+static struct xsk_socket_info *xsk_configure(int ifindex, intxdp_queue_id,
+                                             int mode);
+static void xsk_remove_xdp_program(uint32_t ifindex, int xdpmode);
+static void xsk_destroy(struct xsk_socket_info *xsk);
+
+static struct xsk_umem_info *xsk_configure_umem(void *buffer,uint64_t size,
+                                                int xdpmode)
+{
+    struct xsk_umem_info *umem;
+    int ret;
+    int i;
+
+    umem = xcalloc(1, sizeof(*umem));
+ ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq,&umem->cq,
+                           NULL);

Here you pass no user data, so this call will allocateXSK_RING_PROD__DEFAULT_NUM_DESCS and XSK_RING_CONS__DEFAULT_NUM_DESCS,not the values you define in xdpsock.h

+
+    if (ret) {
+        VLOG_ERR("xsk umem create failed (%s) mode: %s",
+                 ovs_strerror(errno),
+                 xdpmode == XDP_COPY ? "SKB": "DRV");
+        free(umem);
+        return NULL;
+    }
+
+    umem->buffer = buffer;
+
+    /* set-up umem pool */
+    umem_pool_init(&umem->mpool, NUM_FRAMES);


Here we should check for return value, see also note in xdpsock.c

+
+    for (i = NUM_FRAMES - 1; i >= 0; i--) {
+        struct umem_elem *elem;
+
+        elem = ALIGNED_CAST(struct umem_elem *,
+                            (char *)umem->buffer + i * FRAME_SIZE);
+        umem_elem_push(&umem->mpool, elem);
+    }
+
+    /* set-up metadata */
+    xpacket_pool_init(&umem->xpool, NUM_FRAMES);

Check return value and cleanup/return NULL on error, seexpacket_pool_init()

+
+    VLOG_DBG("%s xpacket pool from %p to %p", __func__,
+              umem->xpool.array,
+              (char *)umem->xpool.array +
+              NUM_FRAMES * sizeof(struct dp_packet_afxdp));
+
+    for (i = NUM_FRAMES - 1; i >= 0; i--) {
+        struct dp_packet_afxdp *xpacket;
+        struct dp_packet *packet;
+
+        xpacket = UMEM2XPKT(umem->xpool.array, i);
+        xpacket->mpool = &umem->mpool;
+
+        packet = &xpacket->packet;
+        packet->source = DPBUF_AFXDP;
+    }
+
+    return umem;
+}
+
+static struct xsk_socket_info *
+xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
+                     uint32_t queue_id, int xdpmode)
+{
+    struct xsk_socket_config cfg;
+    struct xsk_socket_info *xsk;
+    char devname[IF_NAMESIZE];
+    uint32_t idx = 0;
+    int ret;
+    int i;
+
+    xsk = xcalloc(1, sizeof(*xsk));
+    xsk->umem = umem;
+    cfg.rx_size = CONS_NUM_DESCS;
+    cfg.tx_size = PROD_NUM_DESCS;
+    cfg.libbpf_flags = 0;
+
+    if (xdpmode == XDP_ZEROCOPY) {
+        cfg.bind_flags = XDP_ZEROCOPY;

+ cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST |XDP_FLAGS_DRV_MODE;

+    } else {
+        cfg.bind_flags = XDP_COPY;

+ cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST |XDP_FLAGS_SKB_MODE;

+    }
+
+    if (if_indextoname(ifindex, devname) == NULL) {
+        VLOG_ERR("ifindex %d to devname failed (%s)",
+                 ifindex, ovs_strerror(errno));
+        free(xsk);
+        return NULL;
+    }
+

+ ret = xsk_socket__create(&xsk->xsk, devname, queue_id,umem->umem,

+                             &xsk->rx, &xsk->tx, &cfg);
+    if (ret) {
+        VLOG_ERR("xsk_socket_create failed (%s) mode: %s qid: %d",
+                 ovs_strerror(errno),
+                 xdpmode == XDP_COPY ? "SKB": "DRV",
+                 queue_id);
+        free(xsk);
+        return NULL;
+    }
+
+    /* Make sure the built-in AF_XDP program is loaded */
+    ret = bpf_get_link_xdp_id(ifindex, &prog_id, cfg.xdp_flags);
+    if (ret) {
+        VLOG_ERR("get XDP prog ID failed (%s)", ovs_strerror(errno));
+        xsk_socket__delete(xsk->xsk);
+        free(xsk);
+        return NULL;
+    }
+
+    xsk_ring_prod__reserve(&xsk->umem->fq, PROD_NUM_DESCS, &idx);
+


We should check if we got the entries we requested

+    for (i = 0;
+         i < PROD_NUM_DESCS * FRAME_SIZE;
+         i += FRAME_SIZE) {
+        struct umem_elem *elem;
+        uint64_t addr;
+
+        elem = umem_elem_pop(&xsk->umem->mpool);


Error check?

+        addr = UMEM2DESC(elem, xsk->umem->buffer);
+
+        *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx++) = addr;
+    }
+
+    xsk_ring_prod__submit(&xsk->umem->fq,
+                          PROD_NUM_DESCS);
+    return xsk;
+}
+
+static struct xsk_socket_info *
+xsk_configure(int ifindex, int xdp_queue_id, int xdpmode)
+{
+    struct xsk_socket_info *xsk;
+    struct xsk_umem_info *umem;
+    void *bufs;
+    int ret;
+
+    /* umem memory region */
+    ret = posix_memalign(&bufs, get_page_size(),
+                         NUM_FRAMES * FRAME_SIZE);
+    memset(bufs, 0, NUM_FRAMES * FRAME_SIZE);
+    ovs_assert(!ret);


We should not assert, just report out of memory and return NULL.

+
+    /* create AF_XDP socket */
+    umem = xsk_configure_umem(bufs,
+                              NUM_FRAMES * FRAME_SIZE,
+                              xdpmode);
+    if (!umem) {
+        free(bufs);
+        return NULL;
+    }
+
+    xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id, xdpmode);
+    if (!xsk) {
+        /* clean up umem and xpacket pool */
+        (void)xsk_umem__delete(umem->umem);
+        free(bufs);
+        umem_pool_cleanup(&umem->mpool);
+        xpacket_pool_cleanup(&umem->xpool);
+        free(umem);
+    }
+    return xsk;
+}
+
+int
+xsk_configure_all(struct netdev *netdev)
+{
+    struct netdev_linux *dev = netdev_linux_cast(netdev);
+    struct xsk_socket_info *xsk;
+    int i, ifindex;
+
+    ifindex = linux_get_ifindex(netdev_get_name(netdev));
+
+    /* configure each queue */
+    for (i = 0; i < netdev->n_rxq; i++) {
+        VLOG_INFO("%s configure queue %d mode %s", __func__, i,
+                  dev->xdpmode == XDP_COPY ? "SKB" : "DRV");
+        xsk = xsk_configure(ifindex, i, dev->xdpmode);
+        if (!xsk) {

+ VLOG_ERR("failed to create AF_XDP socket on queue %d",i);

+            goto err;
+        }
+        dev->xsk[i] = xsk;
+    }
+
+    return 0;
+
+err:
+    xsk_destroy_all(netdev);
+    return EINVAL;
+}
+
+static void OVS_UNUSED vlog_hex_dump(const void *buf, size_t count)
+{
+    struct ds ds = DS_EMPTY_INITIALIZER;
+    ds_put_hex_dump(&ds, buf, count, 0, false);
+    VLOG_DBG_RL(&rl, "%s", ds_cstr(&ds));
+    ds_destroy(&ds);
+}
+
+static void
+xsk_destroy(struct xsk_socket_info *xsk)
+{
+    struct xsk_umem *umem;
+
+    if (!xsk) {
+        return;
+    }
+
+    umem = xsk->umem->umem;
+    xsk_socket__delete(xsk->xsk);
+    (void)xsk_umem__delete(umem);

I would log any errors here, specially if we ever support sharing ofumem.

+
+    /* free the packet buffer */
+    free(xsk->umem->buffer);
+
+    /* cleanup umem pool */
+    umem_pool_cleanup(&xsk->umem->mpool);
+
+    /* cleanup metadata pool */
+    xpacket_pool_cleanup(&xsk->umem->xpool);
+
+    free(xsk->umem);
+    free(xsk);
+}
+
+void
+xsk_destroy_all(struct netdev *netdev)
+{
+    struct netdev_linux *dev = netdev_linux_cast(netdev);
+    int i, ifindex;
+
+    ifindex = linux_get_ifindex(netdev_get_name(netdev));
+
+    for (i = 0; i < MAX_XSKQ; i++) {
+        if (dev->xsk[i]) {
+            VLOG_INFO("destroy xsk[%d]", i);
+            xsk_destroy(dev->xsk[i]);
+            dev->xsk[i] = NULL;
+        }
+    }
+    VLOG_INFO("remove xdp program");
+    xsk_remove_xdp_program(ifindex, dev->xdpmode);
+}
+
+static inline void OVS_UNUSED
+print_xsk_stat(struct xsk_socket_info *xsk OVS_UNUSED) {
+    struct xdp_statistics stat;
+    socklen_t optlen;
+
+    optlen = sizeof stat;

+ ovs_assert(getsockopt(xsk_socket__fd(xsk->xsk), SOL_XDP,XDP_STATISTICS,

+               &stat, &optlen) == 0);
+

+ VLOG_DBG_RL(&rl, "rx dropped %llu, rx_invalid %llu, tx_invalid%llu",

+                stat.rx_dropped,
+                stat.rx_invalid_descs,
+                stat.tx_invalid_descs);
+}

Do we want to move this to some specific statistics dump, like"ovs-vsctl get Interface eno1 statistics"

If you want to keep it, maybe rename it to log_xsk_stats()

+
+int

+netdev_afxdp_set_config(struct netdev *netdev, const struct smap*args,

+                        char **errp OVS_UNUSED)
+{
+    struct netdev_linux *dev = netdev_linux_cast(netdev);
+    const char *xdpmode;
+    int new_n_rxq;
+
+    ovs_mutex_lock(&dev->mutex);
+
+    new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1);
+    if (new_n_rxq > MAX_XSKQ) {
+        ovs_mutex_unlock(&dev->mutex);
+        return EINVAL;
+    }
+
+    if (new_n_rxq != netdev->n_rxq) {
+        dev->requested_n_rxq = new_n_rxq;
+        netdev_request_reconfigure(netdev);
+    }
+
+    xdpmode = smap_get(args, "xdpmode");
+    if (xdpmode && strncmp(xdpmode, "drv", 3) == 0) {
+        dev->requested_xdpmode = XDP_ZEROCOPY;
+        if (dev->xdpmode != dev->requested_xdpmode) {
+            netdev_request_reconfigure(netdev);
+        }
+    } else {
+        dev->requested_xdpmode = XDP_COPY;
+        if (dev->xdpmode != dev->requested_xdpmode) {
+            netdev_request_reconfigure(netdev);
+        }
+    }
+    ovs_mutex_unlock(&dev->mutex);
+    return 0;
+}
+
+int

+netdev_afxdp_get_config(const struct netdev *netdev, struct smap*args)

+{
+    struct netdev_linux *dev = netdev_linux_cast(netdev);
+
+    ovs_mutex_lock(&dev->mutex);
+    smap_add_format(args, "n_rxq", "%d", netdev->n_rxq);
+    smap_add_format(args, "xdpmode", "%s",
+        dev->xdp_bind_flags == XDP_ZEROCOPY ? "drv" : "skb");
+    ovs_mutex_unlock(&dev->mutex);
+    return 0;
+}
+
+int
+netdev_afxdp_reconfigure(struct netdev *netdev)
+{
+    struct netdev_linux *dev = netdev_linux_cast(netdev);
+    struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
+    int err = 0;
+
+    ovs_mutex_lock(&dev->mutex);
+
+    if (netdev->n_rxq == dev->requested_n_rxq
+        && dev->xdpmode == dev->requested_xdpmode) {
+        goto out;
+    }
+
+    xsk_destroy_all(netdev);
+    netdev->n_rxq = dev->requested_n_rxq;
+
+    if (dev->requested_xdpmode == XDP_ZEROCOPY) {

+ VLOG_INFO("AF_XDP device %s in DRV mode",netdev_get_name(netdev));

+        /* From SKB mode to DRV mode */

+ dev->xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST |XDP_FLAGS_DRV_MODE;

+        dev->xdp_bind_flags = XDP_ZEROCOPY;
+        dev->xdpmode = XDP_ZEROCOPY;
+
+        if (setrlimit(RLIMIT_MEMLOCK, &r)) {
+            VLOG_ERR("ERROR: setrlimit(RLIMIT_MEMLOCK): %s",
+                      ovs_strerror(errno));
+        }
+    } else {

+ VLOG_INFO("AF_XDP device %s in SKB mode",netdev_get_name(netdev));

+        /* From DRV mode to SKB mode */

+ dev->xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST |XDP_FLAGS_SKB_MODE;

+        dev->xdp_bind_flags = XDP_COPY;
+        dev->xdpmode = XDP_COPY;
+        /* TODO: set rlimit back to previous value
+         * when no device is in DRV mode.
+         */
+    }
+
+    err = xsk_configure_all(netdev);
+    if (err) {

+ VLOG_ERR("AF_XDP device %s reconfig fails",netdev_get_name(netdev));

+    }
+    netdev_change_seq_changed(netdev);
+out:
+    ovs_mutex_unlock(&dev->mutex);
+    return err;
+}
+
+int
+netdev_afxdp_get_numa_id(const struct netdev *netdev)
+{
+    /* FIXME: Get netdev's PCIe device ID, then find
+     * its NUMA node id.
+     */
+    VLOG_INFO("FIXME: Device %s always use numa id 0",
+              netdev_get_name(netdev));
+    return 0;
+}
+
+void
+xsk_remove_xdp_program(uint32_t ifindex, int xdpmode)
+{
+    uint32_t curr_prog_id = 0;
+    uint32_t flags;
+
+    /* remove_xdp_program() */
+    if (xdpmode == XDP_COPY) {
+        flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE;
+    } else {
+        flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE;
+    }
+
+    if (bpf_get_link_xdp_id(ifindex, &curr_prog_id, flags)) {
+        bpf_set_link_xdp_fd(ifindex, -1, flags);
+    }
+    if (prog_id == curr_prog_id) {
+        bpf_set_link_xdp_fd(ifindex, -1, flags);
+    } else if (!curr_prog_id) {
+        VLOG_INFO("couldn't find a prog id on a given interface");
+    } else {
+        VLOG_INFO("program on interface changed, not removing");
+    }
+}
+
+struct dp_packet_afxdp *
+dp_packet_cast_afxdp(const struct dp_packet *d)
+{
+    ovs_assert(d->source == DPBUF_AFXDP);
+    return CONTAINER_OF(d, struct dp_packet_afxdp, packet);
+}
+
+void
+free_afxdp_buf(struct dp_packet *p)
+{
+    struct dp_packet_afxdp *xpacket;
+    unsigned long addr;
+
+    xpacket = dp_packet_cast_afxdp(p);
+    if (xpacket->mpool) {
+        void *base = dp_packet_base(p);
+
+        addr = (unsigned long)base & (~FRAME_SHIFT_MASK);
+        umem_elem_push(xpacket->mpool, (void *)addr);
+    }
+}
+
+void
+free_afxdp_buf_batch(struct dp_packet_batch *batch)
+{
+        struct dp_packet_afxdp *xpacket = NULL;
+        struct dp_packet *packet;
+        void *elems[BATCH_SIZE];
+        unsigned long addr;
+

+ /* all packets are AF_XDP, so handles its own delete in batch*/

+        DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
+            xpacket = dp_packet_cast_afxdp(packet);
+            if (xpacket->mpool) {
+                void *base = dp_packet_base(packet);
+
+                addr = (unsigned long)base & (~FRAME_SHIFT_MASK);
+                elems[i] = (void *)addr;
+            }
+        }
+        umem_elem_push_n(xpacket->mpool, batch->count, elems);
+        dp_packet_batch_init(batch);
+}
+
+/* Receive packet from AF_XDP socket */
+int
+netdev_linux_rxq_xsk(struct xsk_socket_info *xsk,
+                     struct dp_packet_batch *batch)
+{
+    struct umem_elem *elems[BATCH_SIZE];
+    uint32_t idx_rx = 0, idx_fq = 0;
+    unsigned int rcvd, i;
+    int ret = 0;
+
+    /* See if there is any packet on RX queue,
+     * if yes, idx_rx is the index having the packet.
+     */
+    rcvd = xsk_ring_cons__peek(&xsk->rx, BATCH_SIZE, &idx_rx);
+    if (!rcvd) {
+        return 0;
+    }
+
+    /* Form a dp_packet batch from descriptor in RX queue */
+    for (i = 0; i < rcvd; i++) {

+ uint64_t addr = xsk_ring_cons__rx_desc(&xsk->rx,idx_rx)->addr;

+        uint32_t len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->len;
+        char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
+        uint64_t index;
+
+        struct dp_packet_afxdp *xpacket;
+        struct dp_packet *packet;
+
+        index = addr >> FRAME_SHIFT;
+        xpacket = UMEM2XPKT(xsk->umem->xpool.array, index);
+
+        packet = &xpacket->packet;
+        xpacket->mpool = &xsk->umem->mpool;

Do we need to set this up again? This should be static and setup inxsk_configure_umem()

+
+        /* Initialize the struct dp_packet */

+ dp_packet_use_afxdp(packet, pkt, FRAME_SIZE -FRAME_HEADROOM);

+        dp_packet_set_size(packet, len);
+
+        /* Add packet into batch, increase batch->count */
+        dp_packet_batch_add(batch, packet);
+
+        idx_rx++;
+    }
+
+    /* We've consume rcvd packets in RX, now re-fill the
+     * same number back to FILL queue.
+     */
+    ret = umem_elem_pop_n(&xsk->umem->mpool, rcvd, (void **)elems);
+    if (OVS_UNLIKELY(ret)) {
+        return -ENOMEM;
+    }
+

I saw Ilya's comments on this section also, but should we not continueto process the batch even if we can't stock the kernel with new buffers?Maybe other PMDs have a bunch of packets pending (send and receive) soif we are temporarily out of buffers.

Maybe we can re-stock later...

+    for (i = 0; i < rcvd; i++) {
+        uint64_t index;
+        struct umem_elem *elem;
+
+        ret = xsk_ring_prod__reserve(&xsk->umem->fq, 1, &idx_fq);
+        while (OVS_UNLIKELY(ret == 0)) {
+            /* The FILL queue is full, so retry. (or skip)? */
+            ret = xsk_ring_prod__reserve(&xsk->umem->fq, 1, &idx_fq);
+        }
+
+        /* Get one free umem, program it into FILL queue */
+        elem = elems[i];
+        index = (uint64_t)((char *)elem - (char *)xsk->umem->buffer);
+        ovs_assert((index & FRAME_SHIFT_MASK) == 0);
+        *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq) = index;
+
+        idx_fq++;
+    }
+    xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
+
+    /* Release the RX queue */
+    xsk_ring_cons__release(&xsk->rx, rcvd);

We should move this more up, so the entries are available for the kernelto fill...

+    xsk->rx_npkts += rcvd;
+
+#ifdef AFXDP_DEBUG
+    print_xsk_stat(xsk);
+#endif
+    return 0;
+}
+
+static inline int kick_tx(struct xsk_socket_info *xsk)
+{
+    int ret;
+
+    /* This causes system call into kernel's xsk_sendmsg, and
+     * xsk_generic_xmit (skb mode) or xsk_async_xmit (driver mode).
+     */

+ ret = sendto(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT,NULL, 0);

+    if (OVS_UNLIKELY(ret < 0)) {

+ if (errno == ENXIO || errno == ENOBUFS || errno ==EOPNOTSUPP) {

+            return errno;
+        }
+    }
+    /* no error, or EBUSY or EAGAIN */
+    return 0;
+}
+
+int
+netdev_linux_afxdp_batch_send(struct xsk_socket_info *xsk,
+                              struct dp_packet_batch *batch)
+{


See Ilya's comment on thread safety on the ring APIs.

+    struct umem_elem *elems_pop[BATCH_SIZE];
+    struct umem_elem *elems_push[BATCH_SIZE];
+    uint32_t tx_done, idx_cq = 0;
+    struct dp_packet *packet;
+    uint32_t idx = 0;
+    int j, ret, retry_count = 0;
+    const int max_retry = 4;
+

+ ret = umem_elem_pop_n(&xsk->umem->mpool, batch->count, (void**)elems_pop);

+    if (OVS_UNLIKELY(ret)) {
+        return EAGAIN;
+    }
+
+    /* Make sure we have enough TX descs */
+    ret = xsk_ring_prod__reserve(&xsk->tx, batch->count, &idx);
+    if (OVS_UNLIKELY(ret == 0)) {

+ umem_elem_push_n(&xsk->umem->mpool, batch->count, (void**)elems_pop);

+        return EAGAIN;
+    }
+
+    DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
+        struct umem_elem *elem;
+        uint64_t index;
+
+        elem = elems_pop[i];
+        /* Copy the packet to the umem we just pop from umem pool.
+         * We can avoid this copy if the packet and the pop umem
+         * are located in the same umem.
+         */

The comment mentions the copy can be avoided, but it's not implementedin the code, is this correct or was something removed?

+        memcpy(elem, dp_packet_data(packet), dp_packet_size(packet));
+
+        index = (uint64_t)((char *)elem - (char *)xsk->umem->buffer);
+        xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->addr = index;
+        xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->len
+            = dp_packet_size(packet);
+    }
+    xsk_ring_prod__submit(&xsk->tx, batch->count);
+    xsk->outstanding_tx += batch->count;
+
+    ret = kick_tx(xsk);
+    if (OVS_UNLIKELY(ret)) {

+ umem_elem_push_n(&xsk->umem->mpool, batch->count, (void**)elems_pop);

+        VLOG_WARN_RL(&rl, "error sending AF_XDP packet: %s",
+                     ovs_strerror(ret));
+        return ret;


I think we should still try to recover the CQ below, even on failure.

+    }
+
+retry:
+    /* Process CQ */

+ tx_done = xsk_ring_cons__peek(&xsk->umem->cq, batch->count,&idx_cq);

+    if (tx_done > 0) {
+        xsk->outstanding_tx -= tx_done;
+        xsk->tx_npkts += tx_done;
+    }
+
+    /* Recycle back to umem pool */
+    for (j = 0; j < tx_done; j++) {
+        struct umem_elem *elem;
+        uint64_t addr;
+
+        addr = *xsk_ring_cons__comp_addr(&xsk->umem->cq, idx_cq++);
+
+        elem = ALIGNED_CAST(struct umem_elem *,
+                            (char *)xsk->umem->buffer + addr);
+        elems_push[j] = elem;
+    }
+

+ ret = umem_elem_push_n(&xsk->umem->mpool, tx_done, (void**)elems_push);

+    ovs_assert(ret == 0);
+
+    xsk_ring_cons__release(&xsk->umem->cq, tx_done);
+

+ if (xsk->outstanding_tx > PROD_NUM_DESCS - (PROD_NUM_DESCS >> 2)){

+        /* If there are still a lot not transmitted, try harder. */
+        if (retry_count++ > max_retry) {
+            return 0;
+        }
+        goto retry;
+    }
+

I think the code above is causing my lockup at wire speed mentionedabove...I guess the retry_count expires every transmit sending packets to theTAP interface.No all buffers are used... This is causing the umem_elem_pop_n() in thebeginning to fail, hence the buffers are never returned!

Guess we might need some reclaim in the beginning, or maybe even in therx loop?

+    return 0;
+}
diff --git a/lib/netdev-afxdp.h b/lib/netdev-afxdp.h
new file mode 100644
index 000000000000..6518d8fca0b5
--- /dev/null
+++ b/lib/netdev-afxdp.h
@@ -0,0 +1,53 @@
+/*
+ * Copyright (c) 2018 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *

+ * Unless required by applicable law or agreed to in writing,software

+ * distributed under the License is distributed on an "AS IS" BASIS,

+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express orimplied.+ * See the License for the specific language governing permissionsand

+ * limitations under the License.
+ */
+
+#ifndef NETDEV_AFXDP_H
+#define NETDEV_AFXDP_H 1
+
+#include <stdint.h>
+#include <stdbool.h>
+

+/* These functions are Linux AF_XDP specific, so they should be useddirectly

+ * only by Linux-specific code. */


Extra enter?

+#define MAX_XSKQ 16


Extra enter?

+struct netdev;
+struct xsk_socket_info;
+struct xdp_umem;
+struct dp_packet_batch;
+struct smap;
+struct dp_packet;
+

+struct dp_packet_afxdp * dp_packet_cast_afxdp(const struct dp_packet*d);

+
+int xsk_configure_all(struct netdev *netdev);
+
+void xsk_destroy_all(struct netdev *netdev);
+
+int netdev_linux_rxq_xsk(struct xsk_socket_info *xsk,
+                         struct dp_packet_batch *batch);
+
+int netdev_linux_afxdp_batch_send(struct xsk_socket_info *xsk,
+                                  struct dp_packet_batch *batch);
+

+int netdev_afxdp_set_config(struct netdev *netdev, const struct smap*args,

+                            char **errp);

+int netdev_afxdp_get_config(const struct netdev *netdev, struct smap*args);

+int netdev_afxdp_get_numa_id(const struct netdev *netdev);
+
+void free_afxdp_buf(struct dp_packet *p);
+void free_afxdp_buf_batch(struct dp_packet_batch *batch);
+int netdev_afxdp_reconfigure(struct netdev *netdev);
+#endif /* netdev-afxdp.h */
diff --git a/lib/netdev-linux-private.h b/lib/netdev-linux-private.h
new file mode 100644
index 000000000000..3dd3d902b3c4
--- /dev/null
+++ b/lib/netdev-linux-private.h
@@ -0,0 +1,124 @@
+/*
+ * Copyright (c) 2019 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *

+ * Unless required by applicable law or agreed to in writing,software

+ * distributed under the License is distributed on an "AS IS" BASIS,

+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express orimplied.+ * See the License for the specific language governing permissionsand

+ * limitations under the License.
+ */
+
+#ifndef NETDEV_LINUX_PRIVATE_H
+#define NETDEV_LINUX_PRIVATE_H 1
+
+#include <config.h>
+
+#include <linux/filter.h>
+#include <linux/gen_stats.h>
+#include <linux/if_ether.h>
+#include <linux/if_tun.h>
+#include <linux/types.h>
+#include <linux/ethtool.h>
+#include <linux/mii.h>
+#include <stdint.h>
+#include <stdbool.h>
+
+#include "netdev-provider.h"
+#include "netdev-tc-offloads.h"
+#include "netdev-vport.h"
+#include "openvswitch/thread.h"
+#include "ovs-atomic.h"
+#include "timer.h"

Why include all the above? They where just added to netdev-linux.h, soif you make sure you include netdev-lunux.h before -private it shouldwork out.

+
+#if HAVE_AF_XDP
+#include "netdev-afxdp.h"
+#endif


See earlier comment

+
+/* These functions are Linux specific, so they should be useddirectly only by
+ * Linux-specific code. */
+
+struct netdev;
+
+int netdev_linux_ethtool_set_flag(struct netdev *netdev, uint32_tflag,+ const char *flag_name, boolenable);
+int linux_get_ifindex(const char *netdev_name);
+

These functions are now both specified in netdev-linux.h andnetdev-linux-private.h

+#define LINUX_FLOW_OFFLOAD_API                          \
+   .flow_flush = netdev_tc_flow_flush,                  \
+   .flow_dump_create = netdev_tc_flow_dump_create,      \
+   .flow_dump_destroy = netdev_tc_flow_dump_destroy,    \
+   .flow_dump_next = netdev_tc_flow_dump_next,          \
+   .flow_put = netdev_tc_flow_put,                      \
+   .flow_get = netdev_tc_flow_get,                      \
+   .flow_del = netdev_tc_flow_del,                      \
+   .init_flow_api = netdev_tc_init_flow_api
+


Same here, this define is in both include files.

+struct netdev_linux {
+    struct netdev up;
+
+    /* Protects all members below. */
+    struct ovs_mutex mutex;
+
+    unsigned int cache_valid;
+
+    bool miimon;                    /* Link status of last poll. */
+ long long int miimon_interval; /* Miimon Poll rate. Disabled if<= 0. */
+    struct timer miimon_timer;
+
+    int netnsid;                    /* Network namespace ID. */
+ /* The following are figured out "on demand" only. They are onlyvalid
+     * when the corresponding VALID_* bit in 'cache_valid' is set. */
+    int ifindex;
+    struct eth_addr etheraddr;
+    int mtu;
+    unsigned int ifi_flags;
+    long long int carrier_resets;
+    uint32_t kbits_rate;        /* Policing data. */
+    uint32_t kbits_burst;
+ int vport_stats_error; /* Cached error code fromvport_get_stats().
+                                   0 or an errno value. */
+    int netdev_mtu_error;       /* Cached error code from SIOCGIFMTU
+                                 * or SIOCSIFMTU.
+                                 */
+ int ether_addr_error; /* Cached error code from set/getetheraddr. */+ int netdev_policing_error; /* Cached error code from setpolicing. */+ int get_features_error; /* Cached error code fromETHTOOL_GSET. */+ int get_ifindex_error; /* Cached error code fromSIOCGIFINDEX. */
+
+    enum netdev_features current;    /* Cached from ETHTOOL_GSET. */
+    enum netdev_features advertised; /* Cached from ETHTOOL_GSET. */
+    enum netdev_features supported;  /* Cached from ETHTOOL_GSET. */
+
+ struct ethtool_drvinfo drvinfo; /* Cached from ETHTOOL_GDRVINFO.*/
+    struct tc *tc;
+
+    /* For devices of class netdev_tap_class only. */
+    int tap_fd;
+ bool present; /* If the device is present in thenamespace */+ uint64_t tx_dropped; /* tap device can drop if the ifaceis down */
+
+    /* LAG information. */
+ bool is_lag_master; /* True if the netdev is a LAGmaster. */
+
+    /* AF_XDP information */
+#ifdef HAVE_AF_XDP
+    struct xsk_socket_info *xsk[MAX_XSKQ];
+    int requested_n_rxq;
+    int xdpmode, requested_xdpmode; /* detect mode changed */
+    int xdp_flags, xdp_bind_flags;
+#endif
+};
+
+static struct netdev_linux *
+netdev_linux_cast(const struct netdev *netdev)
+{

In the original definition there was an assert() here, was it removed byaccident?

netdev_linux_rxq_xsk

+    return CONTAINER_OF(netdev, struct netdev_linux, up);
+}
+
+#endif /* netdev-linux-private.h */
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index f75d73fd39f8..1f190406d145 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -17,6 +17,7 @@
 #include <config.h>

 #include "netdev-linux.h"
+#include "netdev-linux-private.h"

 #include <errno.h>
 #include <fcntl.h>
@@ -54,6 +55,7 @@
 #include "fatal-signal.h"
 #include "hash.h"
 #include "openvswitch/hmap.h"
+#include "netdev-afxdp.h"
 #include "netdev-provider.h"
 #include "netdev-tc-offloads.h"
 #include "netdev-vport.h"
@@ -487,51 +489,6 @@ static int tc_calc_cell_log(unsigned int mtu);

static void tc_fill_rate(struct tc_ratespec *rate, uint64_t bps, intmtu);static int tc_calc_buffer(unsigned int Bps, int mtu, uint64_tburst_bytes);

 
-struct netdev_linux {
-    struct netdev up;
-
-    /* Protects all members below. */
-    struct ovs_mutex mutex;
-
-    unsigned int cache_valid;
-
-    bool miimon;                    /* Link status of last poll. */

- long long int miimon_interval; /* Miimon Poll rate. Disabled if<= 0. */

-    struct timer miimon_timer;
-
-    int netnsid;                    /* Network namespace ID. */

- /* The following are figured out "on demand" only. They are onlyvalid

-     * when the corresponding VALID_* bit in 'cache_valid' is set. */
-    int ifindex;
-    struct eth_addr etheraddr;
-    int mtu;
-    unsigned int ifi_flags;
-    long long int carrier_resets;
-    uint32_t kbits_rate;        /* Policing data. */
-    uint32_t kbits_burst;

- int vport_stats_error; /* Cached error code fromvport_get_stats().

-                                   0 or an errno value. */

- int netdev_mtu_error; /* Cached error code from SIOCGIFMTUor SIOCSIFMTU. */- int ether_addr_error; /* Cached error code from set/getetheraddr. */- int netdev_policing_error; /* Cached error code from setpolicing. */- int get_features_error; /* Cached error code fromETHTOOL_GSET. */- int get_ifindex_error; /* Cached error code fromSIOCGIFINDEX. */

-
-    enum netdev_features current;    /* Cached from ETHTOOL_GSET. */
-    enum netdev_features advertised; /* Cached from ETHTOOL_GSET. */
-    enum netdev_features supported;  /* Cached from ETHTOOL_GSET. */
-

- struct ethtool_drvinfo drvinfo; /* Cached from ETHTOOL_GDRVINFO.*/

-    struct tc *tc;
-
-    /* For devices of class netdev_tap_class only. */
-    int tap_fd;

- bool present; /* If the device is present in thenamespace */- uint64_t tx_dropped; /* tap device can drop if the ifaceis down */

-
-    /* LAG information. */

- bool is_lag_master; /* True if the netdev is a LAGmaster. */

-};

 struct netdev_rxq_linux {
     struct netdev_rxq up;

@@ -579,18 +536,23 @@ is_netdev_linux_class(const struct netdev_class*netdev_class)

     return netdev_class->run == netdev_linux_run;
 }

+#if HAVE_AF_XDP
 static bool
-is_tap_netdev(const struct netdev *netdev)
+is_afxdp_netdev(const struct netdev *netdev)
 {
-    return netdev_get_class(netdev) == &netdev_tap_class;
+    return netdev_get_class(netdev) == &netdev_afxdp_class;
 }
-
-static struct netdev_linux *
-netdev_linux_cast(const struct netdev *netdev)
+#else
+static bool
+is_afxdp_netdev(const struct netdev *netdev OVS_UNUSED)
 {
-    ovs_assert(is_netdev_linux_class(netdev_get_class(netdev)));
-
-    return CONTAINER_OF(netdev, struct netdev_linux, up);
+    return false;
+}
+#endif
+static bool
+is_tap_netdev(const struct netdev *netdev)
+{
+    return netdev_get_class(netdev) == &netdev_tap_class;
 }

 static struct netdev_rxq_linux *
@@ -1084,6 +1046,11 @@ netdev_linux_destruct(struct netdev *netdev_)
         atomic_count_dec(&miimon_cnt);
     }

+#if HAVE_AF_XDP
+    if (is_afxdp_netdev(netdev_)) {
+        xsk_destroy_all(netdev_);
+    }
+#endif

Think you can remove the HAVE_AF_XDP here, as you do not use it beloweither.

     ovs_mutex_destroy(&netdev->mutex);
 }

@@ -1113,7 +1080,7 @@ netdev_linux_rxq_construct(struct netdev_rxq*rxq_)

     rx->is_tap = is_tap_netdev(netdev_);
     if (rx->is_tap) {
         rx->fd = netdev->tap_fd;
-    } else {
+    } else if (!is_afxdp_netdev(netdev_)) {
         struct sockaddr_ll sll;
         int ifindex, val;
         /* Result of tcpdump -dd inbound */

@@ -1318,10 +1285,18 @@ netdev_linux_rxq_recv(struct netdev_rxq *rxq_,struct dp_packet_batch *batch,

 {
     struct netdev_rxq_linux *rx = netdev_rxq_linux_cast(rxq_);
     struct netdev *netdev = rx->up.netdev;
-    struct dp_packet *buffer;
+    struct dp_packet *buffer = NULL;
     ssize_t retval;
     int mtu;

+#if HAVE_AF_XDP

Think this #if HAVE_AF_XDP can be removed as the compiler shouldoptimize out the if (false).

+    if (is_afxdp_netdev(netdev)) {
+        struct netdev_linux *dev = netdev_linux_cast(netdev);
+        int qid = rxq_->queue_id;
+
+        return netdev_linux_rxq_xsk(dev->xsk[qid], batch);
+    }
+#endif
     if (netdev_linux_get_mtu__(netdev_linux_cast(netdev), &mtu)) {
         mtu = ETH_PAYLOAD_MAX;
     }

@@ -1329,6 +1304,7 @@ netdev_linux_rxq_recv(struct netdev_rxq *rxq_,struct dp_packet_batch *batch,

     /* Assume Ethernet port. No need to set packet_type. */
     buffer = dp_packet_new_with_headroom(VLAN_ETH_HEADER_LEN + mtu,
                                            DP_NETDEV_HEADROOM);
+
     retval = (rx->is_tap
               ? netdev_linux_rxq_recv_tap(rx->fd, buffer)
               : netdev_linux_rxq_recv_sock(rx->fd, buffer));

@@ -1480,7 +1456,8 @@ netdev_linux_send(struct netdev *netdev_, intqid OVS_UNUSED,

     int error = 0;
     int sock = 0;

-    if (!is_tap_netdev(netdev_)) {
+    if (!is_tap_netdev(netdev_) &&
+        !is_afxdp_netdev(netdev_)) {

if(netdev_linux_netnsid_is_remote(netdev_linux_cast(netdev_))) {

             error = EOPNOTSUPP;
             goto free_batch;

@@ -1499,6 +1476,36 @@ netdev_linux_send(struct netdev *netdev_, intqid OVS_UNUSED,

         }

         error = netdev_linux_sock_batch_send(sock, ifindex, batch);
+#if HAVE_AF_XDP


Same here remove the #if HAVE_AF_XDP

+    } else if (is_afxdp_netdev(netdev_)) {
+        struct netdev_linux *dev = netdev_linux_cast(netdev_);
+        struct dp_packet_afxdp *xpacket;
+        struct umem_pool *first_mpool;
+        struct dp_packet *packet;
+
+        error = netdev_linux_afxdp_batch_send(dev->xsk[qid], batch);
+
+        /* all packets must come frome the same umem pool
+         * and has DPBUF_AFXDP type, otherwise free on-by-one
+         */
+        DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
+            if (packet->source != DPBUF_AFXDP) {
+                goto free_batch;
+            }
+
+            xpacket = dp_packet_cast_afxdp(packet);
+            if (i == 0) {
+                first_mpool = xpacket->mpool;
+                continue;
+            }
+            if (xpacket->mpool != first_mpool) {
+                goto free_batch;
+            }
+        }

Why do not we not move all the packet type checks tofree_afxdp_buf_batch()?

+        /* free in batch */
+        free_afxdp_buf_batch(batch);
+        return error;
+#endif
     } else {
         error = netdev_linux_tap_batch_send(netdev_, batch);
     }
@@ -3323,6 +3330,7 @@ const struct netdev_class netdev_linux_class = {
     NETDEV_LINUX_CLASS_COMMON,
     LINUX_FLOW_OFFLOAD_API,
     .type = "system",
+    .is_pmd = false,
     .construct = netdev_linux_construct,
     .get_stats = netdev_linux_get_stats,
     .get_features = netdev_linux_get_features,
@@ -3333,6 +3341,7 @@ const struct netdev_class netdev_linux_class = {
 const struct netdev_class netdev_tap_class = {
     NETDEV_LINUX_CLASS_COMMON,
     .type = "tap",
+    .is_pmd = false,
     .construct = netdev_linux_construct_tap,
     .get_stats = netdev_tap_get_stats,
     .get_features = netdev_linux_get_features,

@@ -3343,10 +3352,26 @@ const struct netdev_classnetdev_internal_class = {

     NETDEV_LINUX_CLASS_COMMON,
     LINUX_FLOW_OFFLOAD_API,
     .type = "internal",
+    .is_pmd = false,
     .construct = netdev_linux_construct,
     .get_stats = netdev_internal_get_stats,
     .get_status = netdev_internal_get_status,
 };
+
+#ifdef HAVE_AF_XDP
+const struct netdev_class netdev_afxdp_class = {
+    NETDEV_LINUX_CLASS_COMMON,
+    .type = "afxdp",
+    .is_pmd = true,
+    .construct = netdev_linux_construct,
+    .get_stats = netdev_linux_get_stats,
+    .get_status = netdev_linux_get_status,
+    .set_config = netdev_afxdp_set_config,
+    .get_config = netdev_afxdp_get_config,
+    .reconfigure = netdev_afxdp_reconfigure,
+    .get_numa_id = netdev_afxdp_get_numa_id,
+};
+#endif
 

 #define CODEL_N_QUEUES 0x0000
diff --git a/lib/netdev-linux.h b/lib/netdev-linux.h
index 17ca9120168a..b812e64cb078 100644
--- a/lib/netdev-linux.h
+++ b/lib/netdev-linux.h
@@ -19,6 +19,20 @@

 #include <stdint.h>
 #include <stdbool.h>
+#include <linux/filter.h>
+#include <linux/gen_stats.h>
+#include <linux/if_ether.h>
+#include <linux/if_tun.h>
+#include <linux/types.h>
+#include <linux/ethtool.h>
+#include <linux/mii.h>
+
+#include "netdev-provider.h"
+#include "netdev-tc-offloads.h"
+#include "netdev-vport.h"
+#include "openvswitch/thread.h"
+#include "ovs-atomic.h"
+#include "timer.h"

Is there a reason why you move all these includes here? If there is youmight as well remove the duplicates from .c files that includenetdev-linux.h, for example, netdev-linux.c

/* These functions are Linux specific, so they should be useddirectly only by

  * Linux-specific code. */
diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
index fb0c27e6e8e8..d433818f7064 100644
--- a/lib/netdev-provider.h
+++ b/lib/netdev-provider.h

@@ -902,7 +902,9 @@ extern const struct netdev_classnetdev_linux_class;

 #endif
 extern const struct netdev_class netdev_internal_class;
 extern const struct netdev_class netdev_tap_class;
-
+#if HAVE_AF_XDP
+extern const struct netdev_class netdev_afxdp_class;
+#endif
 #ifdef  __cplusplus
 }
 #endif
diff --git a/lib/netdev.c b/lib/netdev.c
index 7d7ecf6f0946..e2fae37d5a5e 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -146,6 +146,9 @@ netdev_initialize(void)
         netdev_register_provider(&netdev_internal_class);
         netdev_register_provider(&netdev_tap_class);
         netdev_vport_tunnel_register();
+#ifdef HAVE_AF_XDP
+        netdev_register_provider(&netdev_afxdp_class);
+#endif
 #endif
 #if defined(__FreeBSD__) || defined(__NetBSD__)
         netdev_register_provider(&netdev_tap_class);
diff --git a/lib/xdpsock.c b/lib/xdpsock.c
new file mode 100644
index 000000000000..2d80e74d69e4
--- /dev/null
+++ b/lib/xdpsock.c
@@ -0,0 +1,239 @@
+/*
+ * Copyright (c) 2018, 2019 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *

+ * Unless required by applicable law or agreed to in writing,software

+ * distributed under the License is distributed on an "AS IS" BASIS,

+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express orimplied.+ * See the License for the specific language governing permissionsand

+ * limitations under the License.
+ */
+#include <config.h>
+
+#include "xdpsock.h"
+
+#include <ctype.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <stdarg.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <syslog.h>
+#include <time.h>
+#include <unistd.h>
+
+#include "async-append.h"
+#include "coverage.h"
+#include "dirs.h"
+#include "dp-packet.h"
+#include "openvswitch/compiler.h"
+#include "openvswitch/vlog.h"
+#include "ovs-atomic.h"
+#include "ovs-thread.h"
+#include "sat-math.h"
+#include "socket-util.h"
+#include "svec.h"
+#include "syslog-direct.h"
+#include "syslog-libc.h"
+#include "syslog-provider.h"
+#include "timeval.h"
+#include "unixctl.h"
+#include "util.h"
+
+static inline void
+ovs_spinlock_init(ovs_spinlock_t *sl)
+{
+    atomic_init(&sl->locked, 0);
+}
+
+static inline void
+ovs_spin_lock(ovs_spinlock_t *sl)
+{
+    int exp = 0, locked = 0;
+

+ while (!atomic_compare_exchange_strong_explicit(&sl->locked,&exp, 1,

+                memory_order_acquire,
+                memory_order_relaxed)) {
+        locked = 1;
+        while (locked) {
+            atomic_read_relaxed(&sl->locked, &locked);
+        }
+        exp = 0;
+    }
+}
+
+static inline void
+ovs_spin_unlock(ovs_spinlock_t *sl)
+{
+    atomic_store_explicit(&sl->locked, 0, memory_order_release);
+}
+
+static inline int OVS_UNUSED
+ovs_spin_trylock(ovs_spinlock_t *sl)
+{
+    int exp = 0;

+ return atomic_compare_exchange_strong_explicit(&sl->locked, &exp,1,

+                memory_order_acquire,
+                memory_order_relaxed);
+}


Move spinlock function out to a common file

+
+inline int
+__umem_elem_push_n(struct umem_pool *umemp, int n, void **addrs)
+{
+    void *ptr;
+
+    if (OVS_UNLIKELY(umemp->index + n > umemp->size)) {


This is a stack overflow

+        return -ENOMEM;
+    }
+
+    ptr = &umemp->array[umemp->index];
+    memcpy(ptr, addrs, n * sizeof(void *));
+    umemp->index += n;
+
+    return 0;
+}
+
+int umem_elem_push_n(struct umem_pool *umemp, int n, void **addrs)
+{
+    int ret;
+
+    ovs_spin_lock(&umemp->mutex);
+    ret = __umem_elem_push_n(umemp, n, addrs);
+    ovs_spin_unlock(&umemp->mutex);
+
+    return ret;
+}
+
+inline void
+__umem_elem_push(struct umem_pool *umemp, void *addr)
+{
+    umemp->array[umemp->index++] = addr;
+}
+
+void
+umem_elem_push(struct umem_pool *umemp, void *addr)
+{
+
+    if (OVS_UNLIKELY(umemp->index >= umemp->size)) {
+        /* stack is overflow, this should not happen */
+        OVS_NOT_REACHED();
+    }


Should this not be moved after the spinlock, i.e. to __umem_elem_push

+
+    ovs_assert(((uint64_t)addr & FRAME_SHIFT_MASK) == 0);
+
+    ovs_spin_lock(&umemp->mutex);
+    __umem_elem_push(umemp, addr);
+    ovs_spin_unlock(&umemp->mutex);
+}
+
+inline int
+__umem_elem_pop_n(struct umem_pool *umemp, int n, void **addrs)
+{
+    void *ptr;
+
+    if (OVS_UNLIKELY(umemp->index - n < 0)) {
+        return -ENOMEM;
+    }
+
+    umemp->index -= n;
+    ptr = &umemp->array[umemp->index];
+    memcpy(addrs, ptr, n * sizeof(void *));
+
+    return 0;
+}
+
+int
+umem_elem_pop_n(struct umem_pool *umemp, int n, void **addrs)
+{
+    int ret;
+
+    ovs_spin_lock(&umemp->mutex);
+    ret = __umem_elem_pop_n(umemp, n, addrs);
+    ovs_spin_unlock(&umemp->mutex);
+
+    return ret;
+}
+
+inline void *
+__umem_elem_pop(struct umem_pool *umemp)
+{

There is no check here to see if there are actual any elements left,like there is for pop_n,

so we could corrupt memory/umem_pool

+    return umemp->array[--umemp->index];
+}
+
+void *
+umem_elem_pop(struct umem_pool *umemp)
+{
+    void *ptr;
+
+    ovs_spin_lock(&umemp->mutex);
+    ptr = __umem_elem_pop(umemp);
+    ovs_spin_unlock(&umemp->mutex);
+
+    return ptr;
+}
+
+void **
+__umem_pool_alloc(unsigned int size)
+{
+    void *bufs;
+
+    ovs_assert(posix_memalign(&bufs, getpagesize(),
+                              size * sizeof(void *)) == 0);


We should not assert, just return NULL here.

+    memset(bufs, 0, size * sizeof(void *));
+    return (void **)bufs;
+}
+
+unsigned int
+umem_elem_count(struct umem_pool *mpool)
+{
+    return mpool->index;
+}
+
+int
+umem_pool_init(struct umem_pool *umemp, unsigned int size)
+{
+    umemp->array = __umem_pool_alloc(size);
+    if (!umemp->array) {
+        OVS_NOT_REACHED();


If NULL is returned return ENOMEM

+    }
+
+    umemp->size = size;
+    umemp->index = 0;
+    ovs_spinlock_init(&umemp->mutex);
+    return 0;
+}
+
+void
+umem_pool_cleanup(struct umem_pool *umemp)
+{
+    free(umemp->array);

       umemp->array = NULL;

+}
+
+/* AF_XDP metadata init/destroy */
+int
+xpacket_pool_init(struct xpacket_pool *xp, unsigned int size)
+{
+    void *bufs;
+
+    /* TODO: check HAVE_POSIX_MEMALIGN  */


Guess the above needs to be done

+    ovs_assert(posix_memalign(&bufs, getpagesize(),
+ size * sizeof(struct dp_packet_afxdp))== 0);


We should not assert, just return false

+    memset(bufs, 0, size * sizeof(struct dp_packet_afxdp));
+
+    xp->array = bufs;
+    xp->size = size;
+    return 0;
+}
+
+void
+xpacket_pool_cleanup(struct xpacket_pool *xp)
+{
+    free(xp->array);

       xp->array = NULL;

+}
diff --git a/lib/xdpsock.h b/lib/xdpsock.h
new file mode 100644
index 000000000000..aabaa8e5df24
--- /dev/null
+++ b/lib/xdpsock.h
@@ -0,0 +1,123 @@
+/*
+ * Copyright (c) 2018, 2019 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *

+ * Unless required by applicable law or agreed to in writing,software

+ * distributed under the License is distributed on an "AS IS" BASIS,

+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express orimplied.+ * See the License for the specific language governing permissionsand

+ * limitations under the License.
+ */
+
+#ifndef XDPSOCK_H
+#define XDPSOCK_H 1
+
+#include <bpf/libbpf.h>
+#include <bpf/xsk.h>
+#include <errno.h>
+#include <getopt.h>
+#include <libgen.h>
+#include <linux/bpf.h>
+#include <linux/if_link.h>
+#include <linux/if_xdp.h>
+#include <linux/if_ether.h>
+#include <locale.h>
+#include <net/if.h>
+#include <poll.h>
+#include <pthread.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/resource.h>
+#include <sys/socket.h>
+#include <sys/types.h>
+#include <sys/mman.h>
+#include <time.h>
+#include <unistd.h>
+
+#include "openvswitch/thread.h"
+#include "ovs-atomic.h"
+
+#define FRAME_HEADROOM  XDP_PACKET_HEADROOM
+#define FRAME_SIZE      XSK_UMEM__DEFAULT_FRAME_SIZE
+#define BATCH_SIZE      NETDEV_MAX_BURST


Move this item to the bottom, so you have FRAME specific define's first

+#define FRAME_SHIFT     XSK_UMEM__DEFAULT_FRAME_SHIFT
+#define FRAME_SHIFT_MASK    ((1 << FRAME_SHIFT) - 1)
+
+#define NUM_FRAMES      4096


Should we add a note/check to make sure this value is a power of 2?

+#define PROD_NUM_DESCS  512
+#define CONS_NUM_DESCS  512
+
+#ifdef USE_XSK_DEFAULT
+#define PROD_NUM_DESCS XSK_RING_PROD__DEFAULT_NUM_DESCS
+#define CONS_NUM_DESCS XSK_RING_CONS__DEFAULT_NUM_DESCS
+#endif

Any reason for having this? Should we use the default values? They are4x larger than you have, did it make any difference in performanceresults?We could make it configurable like for DPDK, using then_txq_desc/n_rxq_desc option.

+
+typedef struct {
+    atomic_int locked;
+} ovs_spinlock_t;
+

Think we should move the ovs_spinlock code and includes to some globalplace, maybe util or thread

+/* LIFO ptr_array */
+struct umem_pool {
+    int index;      /* point to top */
+    unsigned int size;
+    ovs_spinlock_t mutex;
+    void **array;   /* a pointer array, point to umem buf */
+};
+
+/* array-based dp_packet_afxdp */
+struct xpacket_pool {
+    unsigned int size;
+    struct dp_packet_afxdp **array;
+};
+
+struct xsk_umem_info {
+    struct umem_pool mpool;
+    struct xpacket_pool xpool;
+    struct xsk_ring_prod fq;
+    struct xsk_ring_cons cq;
+    struct xsk_umem *umem;
+    void *buffer;
+};
+
+struct xsk_socket_info {
+    struct xsk_ring_cons rx;
+    struct xsk_ring_prod tx;
+    struct xsk_umem_info *umem;
+    struct xsk_socket *xsk;
+    unsigned long rx_npkts;
+    unsigned long tx_npkts;
+    unsigned long prev_rx_npkts;
+    unsigned long prev_tx_npkts;
+    uint32_t outstanding_tx;
+};
+
+struct umem_elem {
+    struct umem_elem *next;
+};
+
+void __umem_elem_push(struct umem_pool *umemp, void *addr);
+void umem_elem_push(struct umem_pool *umemp, void *addr);
+int __umem_elem_push_n(struct umem_pool *umemp, int n, void **addrs);
+int umem_elem_push_n(struct umem_pool *umemp, int n, void **addrs);
+
+void *__umem_elem_pop(struct umem_pool *umemp);
+void *umem_elem_pop(struct umem_pool *umemp);
+int __umem_elem_pop_n(struct umem_pool *umemp, int n, void **addrs);
+int umem_elem_pop_n(struct umem_pool *umemp, int n, void **addrs);
+
+void **__umem_pool_alloc(unsigned int size);
+int umem_pool_init(struct umem_pool *umemp, unsigned int size);
+void umem_pool_cleanup(struct umem_pool *umemp);
+unsigned int umem_elem_count(struct umem_pool *mpool);
+int xpacket_pool_init(struct xpacket_pool *xp, unsigned int size);
+void xpacket_pool_cleanup(struct xpacket_pool *xp);
+

Think all the __umem_* function are only used internally so they shouldbe come static and be removed here.

+#endif
diff --git a/tests/automake.mk b/tests/automake.mk
index ea16532dd2a0..715cef9a6b3b 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -4,12 +4,14 @@ EXTRA_DIST += \
        $(SYSTEM_TESTSUITE_AT) \
        $(SYSTEM_KMOD_TESTSUITE_AT) \
        $(SYSTEM_USERSPACE_TESTSUITE_AT) \
+       $(SYSTEM_AFXDP_TESTSUITE_AT) \
        $(SYSTEM_OFFLOADS_TESTSUITE_AT) \
        $(SYSTEM_DPDK_TESTSUITE_AT) \
        $(OVSDB_CLUSTER_TESTSUITE_AT) \
        $(TESTSUITE) \
        $(SYSTEM_KMOD_TESTSUITE) \
        $(SYSTEM_USERSPACE_TESTSUITE) \
+       $(SYSTEM_AFXDP_TESTSUITE) \
        $(SYSTEM_OFFLOADS_TESTSUITE) \
        $(SYSTEM_DPDK_TESTSUITE) \
        $(OVSDB_CLUSTER_TESTSUITE) \
@@ -158,6 +160,11 @@ SYSTEM_USERSPACE_TESTSUITE_AT = \
        tests/system-userspace-macros.at \
        tests/system-userspace-packet-type-aware.at

+SYSTEM_AFXDP_TESTSUITE_AT = \
+       tests/system-afxdp-testsuite.at \
+       tests/system-afxdp-traffic.at \
+       tests/system-afxdp-macros.at
+
 SYSTEM_TESTSUITE_AT = \
        tests/system-common-macros.at \
        tests/system-ovn.at \
@@ -182,6 +189,7 @@ TESTSUITE = $(srcdir)/tests/testsuite
 TESTSUITE_PATCH = $(srcdir)/tests/testsuite.patch
 SYSTEM_KMOD_TESTSUITE = $(srcdir)/tests/system-kmod-testsuite
SYSTEM_USERSPACE_TESTSUITE =$(srcdir)/tests/system-userspace-testsuite
+SYSTEM_AFXDP_TESTSUITE = $(srcdir)/tests/system-afxdp-testsuite
 SYSTEM_OFFLOADS_TESTSUITE = $(srcdir)/tests/system-offloads-testsuite
 SYSTEM_DPDK_TESTSUITE = $(srcdir)/tests/system-dpdk-testsuite
 OVSDB_CLUSTER_TESTSUITE = $(srcdir)/tests/ovsdb-cluster-testsuite
@@ -315,6 +323,11 @@ check-system-userspace: all
set $(SHELL) '$(SYSTEM_USERSPACE_TESTSUITE)' -C testsAUTOTEST_PATH='$(AUTOTEST_PATH)'; \"$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@"--recheck)
+check-afxdp: all
+       $(MAKE) install
+ set $(SHELL) '$(SYSTEM_AFXDP_TESTSUITE)' -C testsAUTOTEST_PATH='$(AUTOTEST_PATH)' $(TESTSUITEFLAGS) -j1; \
+       "$$@" || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
+
 check-offloads: all
set $(SHELL) '$(SYSTEM_OFFLOADS_TESTSUITE)' -C testsAUTOTEST_PATH='$(AUTOTEST_PATH)'; \"$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@"--recheck)@@ -352,6 +365,10 @@ $(SYSTEM_USERSPACE_TESTSUITE): package.m4$(SYSTEM_TESTSUITE_AT) $(SYSTEM_USERSP
        $(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o [email protected] [email protected]
        $(AM_V_at)mv [email protected] $@
+$(SYSTEM_AFXDP_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT)$(SYSTEM_AFXDP_TESTSUITE_AT) $(COMMON_MACROS_AT)
+       $(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o [email protected] [email protected]
+       $(AM_V_at)mv [email protected] $@
+
$(SYSTEM_OFFLOADS_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT)$(SYSTEM_OFFLOADS_TESTSUITE_AT) $(COMMON_MACROS_AT)
        $(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o [email protected] [email protected]
        $(AM_V_at)mv [email protected] $@
diff --git a/tests/system-afxdp-macros.atb/tests/system-afxdp-macros.at
new file mode 100644
index 000000000000..2c58c2d6554b
--- /dev/null
+++ b/tests/system-afxdp-macros.at
@@ -0,0 +1,153 @@
+# _ADD_BR([name])
+#
+# Expands into the proper ovs-vsctl commands to create a bridge withthe
+# appropriate type and properties
+m4_define([_ADD_BR], [[add-br $1 -- set Bridge $1datapath_type=netdevprotocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14,OpenFlow15fail-mode=secure ]])
+
+# OVS_TRAFFIC_VSWITCHD_START([vsctl-args], [vsctl-output],[=override])
+#
+# Creates a database and starts ovsdb-server, starts ovs-vswitchd
+# connected to that database, calls ovs-vsctl to create a bridgenamed
+# br0 with predictable settings, passing 'vsctl-args' as additional
+# commands to ovs-vsctl.  If 'vsctl-args' causes ovs-vsctl to provide
+# output (e.g. because it includes "create" commands) then'vsctl-output'
+# specifies the expected output after filtering through uuidfilt.
+m4_define([OVS_TRAFFIC_VSWITCHD_START],
+  [
+   export OVS_PKGDATADIR=$(`pwd`)
+   _OVS_VSWITCHD_START([--disable-system])
+ AT_CHECK([ovs-vsctl -- _ADD_BR([br0]) -- $1 m4_if([$2], [], [], [|uuidfilt])], [0], [$2])
+])
+
+# OVS_TRAFFIC_VSWITCHD_STOP([WHITELIST], [extra_cmds])
+#
+# Gracefully stops ovs-vswitchd and ovsdb-server, checking their logfiles+# for messages with severity WARN or higher and signaling an error ifany
+# is present.  The optional WHITELIST may contain shell-quoted "sed"
+# commands to delete any warnings that are actually expected, e.g.:
+#
+#   OVS_TRAFFIC_VSWITCHD_STOP(["/expected error/d"])
+#
+# 'extra_cmds' are shell commands to be executed afteOVS_VSWITCHD_STOP() is+# invoked. They can be used to perform additional cleanups such asname space
+# removal.
+m4_define([OVS_TRAFFIC_VSWITCHD_STOP],
+  [OVS_VSWITCHD_STOP([dnl
+$1";/netdev_linux.*obtaining netdev stats via vport failed/d
+/dpif_netlink.*Generic Netlink family 'ovs_datapath' does not exist.The Open vSwitch kernel module is probably not loaded./d
+/dpif_netdev(revalidator.*)|ERR|internal error parsing flow key/d
+/dpif(revalidator.*)|WARN|netdev@ovs-netdev: failed to put/d
+"])
+   AT_CHECK([:; $2])
+  ])
+
+m4_define([ADD_VETH_AFXDP],
+ [ AT_CHECK([ip link add $1 type veth peer name ovs-$1 || return77])
+      CONFIGURE_AFXDP_VETH_OFFLOADS([$1])
+      AT_CHECK([ip link set $1 netns $2])
+      AT_CHECK([ip link set dev ovs-$1 up])
+      AT_CHECK([ovs-vsctl add-port $3 ovs-$1 -- \
+ set interface ovs-$1 external-ids:iface-id="$1"type="afxdp"])
+      NS_CHECK_EXEC([$2], [ip addr add $4 dev $1 $7])
+      NS_CHECK_EXEC([$2], [ip link set dev $1 up])
+      if test -n "$5"; then
+        NS_CHECK_EXEC([$2], [ip link set dev $1 address $5])
+      fi
+      if test -n "$6"; then
+        NS_CHECK_EXEC([$2], [ip route add default via $6])
+      fi
+      on_exit 'ip link del ovs-$1'
+    ]
+)
+
+# CONFIGURE_AFXDP_VETH_OFFLOADS([VETH])
+#
+# Disable TX offloads and VLAN offloads for veths used in AF_XDP.
+m4_define([CONFIGURE_AFXDP_VETH_OFFLOADS],
+    [AT_CHECK([ethtool -K $1 tx off], [0], [ignore], [ignore])
+     AT_CHECK([ethtool -K $1 rxvlan off], [0], [ignore], [ignore])
+     AT_CHECK([ethtool -K $1 txvlan off], [0], [ignore], [ignore])
+    ]
+)
+
+# CONFIGURE_VETH_OFFLOADS([VETH])
+#
+# Disable TX offloads for veths. The userspace datapath uses theAF_PACKET+# socket to receive packets for veths. Unfortunately, the AF_PACKETsocket
+# doesn't play well with offloads:
+# 1. GSO packets are received without segmentation and thereforediscarded.+# 2. Packets with offloaded partial checksum are received with thewrong
+#    checksum, therefore discarded by the receiver.
+#
+# By disabling tx offloads in the non-OVS side of the veth peer wemake sure
+# that the AF_PACKET socket will not receive bad packets.
+#
+# This is a workaround, and should be removed when offloads areproperly
+# supported in netdev-linux.
+m4_define([CONFIGURE_VETH_OFFLOADS],
+    [AT_CHECK([ethtool -K $1 tx off], [0], [ignore], [ignore])]
+)
+
+# CHECK_CONNTRACK()
+#
+# Perform requirements checks for running conntrack tests.
+#
+m4_define([CHECK_CONNTRACK],
+    [AT_SKIP_IF([test $HAVE_PYTHON = no])]
+)
+
+# CHECK_CONNTRACK_ALG()
+#
+# Perform requirements checks for running conntrack ALG tests. Theuserspace
+# supports FTP and TFTP.
+#
+m4_define([CHECK_CONNTRACK_ALG])
+
+# CHECK_CONNTRACK_FRAG()
+#
+# Perform requirements checks for running conntrack fragmentationstests.
+# The userspace doesn't support fragmentation yet, so skip the tests.
+m4_define([CHECK_CONNTRACK_FRAG],
+[
+    AT_SKIP_IF([:])
+])
+
+# CHECK_CONNTRACK_LOCAL_STACK()
+#
+# Perform requirements checks for running conntrack tests with localstack.+# While the kernel connection tracker automatically passes all theconnection+# tracking state from an internal port to the OpenvSwitch kernelmodule, there+# is simply no way of doing that with the userspace, so skip thetests.
+m4_define([CHECK_CONNTRACK_LOCAL_STACK],
+[
+    AT_SKIP_IF([:])
+])
+
+# CHECK_CONNTRACK_NAT()
+#
+# Perform requirements checks for running conntrack NAT tests. Theuserspace
+# datapath supports NAT.
+#
+m4_define([CHECK_CONNTRACK_NAT])
+
+# CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE()
+#
+# Perform requirements checks for running ovs-dpctl flush-conntrackby
+# conntrack 5-tuple test. The userspace datapath does not support
+# this feature yet.
+m4_define([CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE],
+[
+    AT_SKIP_IF([:])
+])
+
+# CHECK_CT_DPIF_SET_GET_MAXCONNS()
+#
+# Perform requirements checks for running ovs-dpctl ct-set-maxconnsor+# ovs-dpctl ct-get-maxconns. The userspace datapath does support thisfeature.
+m4_define([CHECK_CT_DPIF_SET_GET_MAXCONNS])
+
+# CHECK_CT_DPIF_GET_NCONNS()
+#
+# Perform requirements checks for running ovs-dpctl ct-get-nconns.The
+# userspace datapath does support this feature.
+m4_define([CHECK_CT_DPIF_GET_NCONNS])
diff --git a/tests/system-afxdp-testsuite.atb/tests/system-afxdp-testsuite.at
new file mode 100644
index 000000000000..538c0d15d556
--- /dev/null
+++ b/tests/system-afxdp-testsuite.at
@@ -0,0 +1,26 @@
+AT_INIT
+
+AT_COPYRIGHT([Copyright (c) 2018 Nicira, Inc.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at:
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express orimplied.
+See the License for the specific language governing permissions and
+limitations under the License.])
+
+m4_ifdef([AT_COLOR_TESTS], [AT_COLOR_TESTS])
+
+m4_include([tests/ovs-macros.at])
+m4_include([tests/ovsdb-macros.at])
+m4_include([tests/ofproto-macros.at])
+m4_include([tests/system-afxdp-macros.at])
+m4_include([tests/system-common-macros.at])
+
+m4_include([tests/system-afxdp-traffic.at])
+m4_include([tests/system-ovn.at])
diff --git a/tests/system-afxdp-traffic.atb/tests/system-afxdp-traffic.at
new file mode 100644
index 000000000000..26f72acf48ef
--- /dev/null
+++ b/tests/system-afxdp-traffic.at
@@ -0,0 +1,978 @@
+AT_BANNER([AF_XDP netdev datapath-sanity])
+
+AT_SETUP([datapath - ping between two ports])
+OVS_TRAFFIC_VSWITCHD_START()
+
+ulimit -l unlimited
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 |FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping between two ports on vlan])
+OVS_TRAFFIC_VSWITCHD_START()
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+ADD_VLAN(p0, at_ns0, 100, "10.2.2.1/24")
+ADD_VLAN(p1, at_ns1, 100, "10.2.2.2/24")
+
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.2.2.2 |FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping6 between two ports])
+OVS_TRAFFIC_VSWITCHD_START()
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96")
+
+dnl Linux seems to take a little time to get its IPv6 stack in order.Without
+dnl waiting, we get occasional failures due to the following error:
+dnl "connect: Cannot assign requested address"
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2])
+
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 6 fc00::2 |FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping6 between two ports on vlan])
+OVS_TRAFFIC_VSWITCHD_START()
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96")
+
+ADD_VLAN(p0, at_ns0, 100, "fc00:1::1/96")
+ADD_VLAN(p1, at_ns1, 100, "fc00:1::2/96")
+
+dnl Linux seems to take a little time to get its IPv6 stack in order.Without
+dnl waiting, we get occasional failures due to the following error:
+dnl "connect: Cannot assign requested address"
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00:1::2])
+
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:1::2 |FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping6 -s 1600 -q -c 3 -i 0.3 -w 2 fc00:1::2| FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping6 -s 3200 -q -c 3 -i 0.3 -w 2 fc00:1::2| FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over vxlan tunnel])
+OVS_CHECK_VXLAN()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using vethpair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
+AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
+AT_CHECK([ip link set dev br-underlay up])
+
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with anative
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL([vxlan], [br0], [at_vxlan0], [172.31.1.1],[10.1.1.100/24])+ADD_NATIVE_TUNNEL([vxlan], [at_vxlan1], [at_ns0], [172.31.1.100],[10.1.1.1/24],
+                  [id 0 dstport 4789])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0],[OK
+])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 |FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100| FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100| FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over vxlan6 tunnel])
+OVS_CHECK_VXLAN_UDP6ZEROCSUM()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using vethpair.+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00::1/64", [], [],"nodad")
+AT_CHECK([ip addr add dev br-underlay "fc00::100/64" nodad])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with anative
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL6([vxlan], [br0], [at_vxlan0], [fc00::1],[10.1.1.100/24])+ADD_NATIVE_TUNNEL6([vxlan], [at_vxlan1], [at_ns0], [fc00::100],[10.1.1.1/24],
+                   [id 0 dstport 4789 udp6zerocsumtx udp6zerocsumrx])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add fc00::100/64 br-underlay], [0],[OK
+])
+
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::100])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::100 |FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100| FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100| FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over gre tunnel])
+OVS_CHECK_GRE()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using vethpair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
+AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with anative
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL([gre], [br0], [at_gre0], [172.31.1.1],[10.1.1.100/24])+ADD_NATIVE_TUNNEL([gretap], [ns_gre0], [at_ns0], [172.31.1.100],[10.1.1.1/24])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0],[OK
+])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 |FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100| FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100| FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over erspan v1 tunnel])
+OVS_CHECK_GRE()
+OVS_CHECK_ERSPAN()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using vethpair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
+AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with anative
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL([erspan], [br0], [at_erspan0], [172.31.1.1],[10.1.1.100/24], [options:key=1 options:erspan_ver=1options:erspan_idx=7])+ADD_NATIVE_TUNNEL([erspan], [ns_erspan0], [at_ns0], [172.31.1.100],[10.1.1.1/24], [seq key 1 erspan_ver 1 erspan 7])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0],[OK
+])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 |FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+dnl NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |FORMAT_PING], [0], [dnl+NS_CHECK_EXEC([at_ns0], [ping -s 1200 -i 0.3 -c 3 10.1.1.100 |FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over erspan v2 tunnel])
+OVS_CHECK_GRE()
+OVS_CHECK_ERSPAN()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using vethpair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
+AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with anative
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL([erspan], [br0], [at_erspan0], [172.31.1.1],[10.1.1.100/24], [options:key=1 options:erspan_ver=2options:erspan_dir=1 options:erspan_hwid=0x7])+ADD_NATIVE_TUNNEL([erspan], [ns_erspan0], [at_ns0], [172.31.1.100],[10.1.1.1/24], [seq key 1 erspan_ver 2 erspan_dir egress erspan_hwid7])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0],[OK
+])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 |FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+dnl NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |FORMAT_PING], [0], [dnl+NS_CHECK_EXEC([at_ns0], [ping -s 1200 -i 0.3 -c 3 10.1.1.100 |FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over ip6erspan v1 tunnel])
+OVS_CHECK_GRE()
+OVS_CHECK_ERSPAN()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using vethpair.+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00:100::1/96", [], [],nodad)
+AT_CHECK([ip addr add dev br-underlay "fc00:100::100/96" nodad])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with anative
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL6([ip6erspan], [br0], [at_erspan0], [fc00:100::1],[10.1.1.100/24],+ [options:key=123 options:erspan_ver=1options:erspan_idx=0x7])+ADD_NATIVE_TUNNEL6([ip6erspan], [ns_erspan0], [at_ns0],[fc00:100::100],+ [10.1.1.1/24], [local fc00:100::1 seq key 123erspan_ver 1 erspan 7])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add fc00:100::1/96 br-underlay], [0],[OK
+])
+
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 2 fc00:100::100])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:100::100 |FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over ip6erspan v2 tunnel])
+OVS_CHECK_GRE()
+OVS_CHECK_ERSPAN()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using vethpair.+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00:100::1/96", [], [],nodad)
+AT_CHECK([ip addr add dev br-underlay "fc00:100::100/96" nodad])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with anative
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL6([ip6erspan], [br0], [at_erspan0], [fc00:100::1],[10.1.1.100/24],+ [options:key=121 options:erspan_ver=2options:erspan_dir=0 options:erspan_hwid=0x7])+ADD_NATIVE_TUNNEL6([ip6erspan], [ns_erspan0], [at_ns0],[fc00:100::100],
+                   [10.1.1.1/24],
+ [local fc00:100::1 seq key 121 erspan_ver 2erspan_dir ingress erspan_hwid 0x7])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add fc00:100::1/96 br-underlay], [0],[OK
+])
+
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 2 fc00:100::100])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:100::100 |FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over geneve tunnel])
+OVS_CHECK_GENEVE()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using vethpair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
+AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with anative
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL([geneve], [br0], [at_gnv0], [172.31.1.1],[10.1.1.100/24])+ADD_NATIVE_TUNNEL([geneve], [ns_gnv0], [at_ns0], [172.31.1.100],[10.1.1.1/24],
+                  [vni 0])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add 172.31.1.100/24 br-underlay], [0],[OK
+])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 |FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100| FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100| FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over geneve6 tunnel])
+OVS_CHECK_GENEVE_UDP6ZEROCSUM()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using vethpair.+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00::1/64", [], [],"nodad")
+AT_CHECK([ip addr add dev br-underlay "fc00::100/64" nodad])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with anative
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL6([geneve], [br0], [at_gnv0], [fc00::1],[10.1.1.100/24])+ADD_NATIVE_TUNNEL6([geneve], [ns_gnv0], [at_ns0], [fc00::100],[10.1.1.1/24],
+                   [vni 0 udp6zerocsumtx udp6zerocsumrx])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add fc00::100/64 br-underlay], [0],[OK
+])
+
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::100])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::100 |FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100| FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100| FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - clone action])
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1, at_ns2)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+AT_CHECK([ovs-vsctl -- set interface ovs-p0 ofport_request=1 \
+                    -- set interface ovs-p1 ofport_request=2])
+
+AT_DATA([flows.txt], [dnl
+priority=1 actions=NORMAL
+priority=10in_port=1,ip,actions=clone(mod_dl_dst(50:54:00:00:00:0a),set_field:192.168.3.3->ip_dst),output:2+priority=10in_port=2,ip,actions=clone(mod_dl_src(ae:c6:7e:54:8d:4d),mod_dl_dst(50:54:00:00:00:0b),set_field:192.168.4.4->ip_dst,controller), output:1
+])
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+
+AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir--pidfile 2> ofctl_monitor.log])+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 |FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+AT_CHECK([cat ofctl_monitor.log | STRIP_MONITOR_CSUM], [0], [dnl
+icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0icmp_csum: <skip>+icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0icmp_csum: <skip>+icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0icmp_csum: <skip>
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - basic truncate action])
+AT_SKIP_IF([test $HAVE_NC = no])
+OVS_TRAFFIC_VSWITCHD_START()
+AT_CHECK([ovs-ofctl del-flows br0])
+
+dnl Create p0 and ovs-p0(1)
+ADD_NAMESPACES(at_ns0)
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+NS_CHECK_EXEC([at_ns0], [ip link set dev p0 addresse6:66:c1:11:11:11])
+NS_CHECK_EXEC([at_ns0], [arp -s 10.1.1.2 e6:66:c1:22:22:22])
+
+dnl Create p1(3) and ovs-p1(2), packets received from ovs-p1 willappear in p1
+AT_CHECK([ip link add p1 type veth peer name ovs-p1])
+on_exit 'ip link del ovs-p1'
+AT_CHECK([ip link set dev ovs-p1 up])
+AT_CHECK([ip link set dev p1 up])
+AT_CHECK([ovs-vsctl add-port br0 ovs-p1 -- set interface ovs-p1ofport_request=2])
+dnl Use p1 to check the truncated packet
+AT_CHECK([ovs-vsctl add-port br0 p1 -- set interface p1ofport_request=3])
+
+dnl Create p2(5) and ovs-p2(4)
+AT_CHECK([ip link add p2 type veth peer name ovs-p2])
+on_exit 'ip link del ovs-p2'
+AT_CHECK([ip link set dev ovs-p2 up])
+AT_CHECK([ip link set dev p2 up])
+AT_CHECK([ovs-vsctl add-port br0 ovs-p2 -- set interface ovs-p2ofport_request=4])
+dnl Use p2 to check the truncated packet
+AT_CHECK([ovs-vsctl add-port br0 p2 -- set interface p2ofport_request=5])
+
+dnl basic test
+AT_CHECK([ovs-ofctl del-flows br0])
+AT_DATA([flows.txt], [dnl
+in_port=3 dl_dst=e6:66:c1:22:22:22 actions=drop
+in_port=5 dl_dst=e6:66:c1:22:22:22 actions=drop
+in_port=1 dl_dst=e6:66:c1:22:22:22actions=output(port=2,max_len=100),output:4
+])
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+
+dnl use this file as payload file for ncat
+AT_CHECK([dd if=/dev/urandom of=payload200.bin bs=200 count=1 2>/dev/null])
+on_exit 'rm -f payload200.bin'
+NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 <payload200.bin])
+
+dnl packet with truncated size
+AT_CHECK([ovs-appctl revalidator/purge], [0])
+AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed-n 's/.*$n\_bytes=[[0-9]]*$.*/\1/p'], [0], [dnl
+n_bytes=100
+])
+dnl packet with original size
+AT_CHECK([ovs-appctl revalidator/purge], [0])
+AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed-n 's/.*$n\_bytes=[[0-9]]*$.*/\1/p'], [0], [dnl
+n_bytes=242
+])
+
+dnl more complicated output actions
+AT_CHECK([ovs-ofctl del-flows br0])
+AT_DATA([flows.txt], [dnl
+in_port=3 dl_dst=e6:66:c1:22:22:22 actions=drop
+in_port=5 dl_dst=e6:66:c1:22:22:22 actions=drop
+in_port=1 dl_dst=e6:66:c1:22:22:22actions=output(port=2,max_len=100),output:4,output(port=2,max_len=100),output(port=4,max_len=100),output:2,output(port=4,max_len=200),output(port=2,max_len=65535)
+])
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+
+NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 <payload200.bin])
+
+dnl 100 + 100 + 242 + min(65535,242) = 684
+AT_CHECK([ovs-appctl revalidator/purge], [0])
+AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed-n 's/.*$n\_bytes=[[0-9]]*$.*/\1/p'], [0], [dnl
+n_bytes=684
+])
+dnl 242 + 100 + min(242,200) = 542
+AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed-n 's/.*$n\_bytes=[[0-9]]*$.*/\1/p'], [0], [dnl
+n_bytes=542
+])
+
+dnl SLOW_ACTION: disable kernel datapath truncate support
+dnl Repeat the test above, but exercise the SLOW_ACTION code path
+AT_CHECK([ovs-appctl dpif/set-dp-features br0 trunc false], [0])
+
+dnl SLOW_ACTION test1: check datapatch actions
+AT_CHECK([ovs-ofctl del-flows br0])
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+
+AT_CHECK([ovs-appctl ofproto/trace br0"in_port=1,dl_type=0x800,dl_src=e6:66:c1:11:11:11,dl_dst=e6:66:c1:22:22:22,nw_src=192.168.0.1,nw_dst=192.168.0.2,nw_proto=6,tp_src=8,tp_dst=9"],[0], [stdout])
+AT_CHECK([tail -3 stdout], [0],
+[Datapath actions:trunc(100),3,5,trunc(100),3,trunc(100),5,3,trunc(200),5,trunc(65535),3
+This flow is handled by the userspace slow path because it:
+  - Uses action(s) not supported by datapath.
+])
+
+dnl SLOW_ACTION test2: check actual packet truncate
+AT_CHECK([ovs-ofctl del-flows br0])
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 <payload200.bin])
+
+dnl 100 + 100 + 242 + min(65535,242) = 684
+AT_CHECK([ovs-appctl revalidator/purge], [0])
+AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed-n 's/.*$n\_bytes=[[0-9]]*$.*/\1/p'], [0], [dnl
+n_bytes=684
+])
+
+dnl 242 + 100 + min(242,200) = 542
+AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed-n 's/.*$n\_bytes=[[0-9]]*$.*/\1/p'], [0], [dnl
+n_bytes=542
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+
+AT_BANNER([conntrack])
+
+AT_SETUP([conntrack - controller])
+CHECK_CONNTRACK()
+OVS_TRAFFIC_VSWITCHD_START()
+AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbgofproto_dpif_upcall:dbg])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+dnl Allow any traffic from ns0->ns1. Only allow nd, return trafficfrom ns1->ns0.
+AT_DATA([flows.txt], [dnl
+priority=1,action=drop
+priority=10,arp,action=normal
+priority=100,in_port=1,udp,action=ct(commit),controller
+priority=100,in_port=2,ct_state=-trk,udp,action=ct(table=0)
+priority=100,in_port=2,ct_state=+trk+est,udp,action=controller
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+AT_CAPTURE_FILE([ofctl_monitor.log])
+AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir--pidfile 2> ofctl_monitor.log])
+
+dnl Send an unsolicited reply from port 2. This should be dropped.
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 ct$table=0$'50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000'])
+
+dnl OK, now start a new connection from port 1.
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 1ct$commit$,controller'50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000'])
+
+dnl Now try a reply from port 2.
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 ct$table=0$'50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000'])
+
+dnl Check this output. We only see the latter two packets, not thefirst.
+AT_CHECK([cat ofctl_monitor.log], [0], [dnl
+NXT_PACKET_IN2 (xid=0x0): total_len=42 in_port=1 (via action)data_len=42 (unbuffered)+udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=1,tp_dst=2udp_csum:0+NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=42ct_state=est|rpl|trk,ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=17,ct_tp_src=1,ct_tp_dst=2,ip,in_port=2(via action) data_len=42 (unbuffered)+udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=2,tp_dst=1udp_csum:0
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([conntrack - force commit])
+CHECK_CONNTRACK()
+OVS_TRAFFIC_VSWITCHD_START()
+AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbgofproto_dpif_upcall:dbg])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+AT_DATA([flows.txt], [dnl
+priority=1,action=drop
+priority=10,arp,action=normal
+priority=100,in_port=1,udp,action=ct(force,commit),controller
+priority=100,in_port=2,ct_state=-trk,udp,action=ct(table=0)
+priority=100,in_port=2,ct_state=+trk+est,udp,action=ct(force,commit,table=1)
+table=1,in_port=2,ct_state=+trk,udp,action=controller
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+AT_CAPTURE_FILE([ofctl_monitor.log])
+AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir--pidfile 2> ofctl_monitor.log])
+
+dnl Send an unsolicited reply from port 2. This should be dropped.
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000actions=resubmit(,0)"])
+
+dnl OK, now start a new connection from port 1.
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000actions=resubmit(,0)"])
+
+dnl Now try a reply from port 2.
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000actions=resubmit(,0)"])
+
+AT_CHECK([ovs-appctl revalidator/purge], [0])
+
+dnl Check this output. We only see the latter two packets, not thefirst.
+AT_CHECK([cat ofctl_monitor.log], [0], [dnl
+NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=42 in_port=1 (viaaction) data_len=42 (unbuffered)+udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=1,tp_dst=2udp_csum:0+NXT_PACKET_IN2 (xid=0x0): table_id=1 cookie=0x0 total_len=42ct_state=new|trk,ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=17,ct_tp_src=2,ct_tp_dst=1,ip,in_port=2(via action) data_len=42 (unbuffered)+udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=2,tp_dst=1udp_csum:0
+])
+
+dnl
+dnl Check that the directionality has been changed by force commit.
+dnl
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep"orig=.src=10\.1\.1\.2,"], [], [dnl
+udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2)
+])
+
+dnl OK, now send another packet from port 1 and see that it switchesagain+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000actions=resubmit(,0)"])
+AT_CHECK([ovs-appctl revalidator/purge], [0])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep"orig=.src=10\.1\.1\.1,"], [], [dnl
+udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([conntrack - ct flush by 5-tuple])
+CHECK_CONNTRACK()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+AT_DATA([flows.txt], [dnl
+priority=1,action=drop
+priority=10,arp,action=normal
+priority=100,in_port=1,udp,action=ct(commit),2
+priority=100,in_port=2,udp,action=ct(zone=5,commit),1
+priority=100,in_port=1,icmp,action=ct(commit),2
+priority=100,in_port=2,icmp,action=ct(zone=5,commit),1
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+dnl Test UDP from port 1
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000actions=resubmit(,0)"])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep"orig=.src=10\.1\.1\.1,"], [], [dnl
+udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
+])
+
+AT_CHECK([ovs-appctl dpctl/flush-conntrack'ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=17,ct_tp_src=2,ct_tp_dst=1'])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep"orig=.src=10\.1\.1\.1,"], [1], [dnl
+])
+
+dnl Test UDP from port 2
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000actions=resubmit(,0)"])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep"orig=.src=10\.1\.1\.2,"], [0], [dnl
+udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
+])
+
+AT_CHECK([ovs-appctl dpctl/flush-conntrack zone=5'ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=17,ct_tp_src=1,ct_tp_dst=2'])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)],[0], [dnl
+])
+
+dnl Test ICMP traffic
+NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 |FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep"orig=.src=10\.1\.1\.2,"], [0], [stdout])
+AT_CHECK([cat stdout | FORMAT_CT(10.1.1.1)], [0],[dnl
+icmp,orig=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=8,code=0),reply=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=0,code=0),zone=5
+])
+
+ICMP_ID=`cat stdout | cut -d ',' -f4 | cut -d '=' -f2`
+ICMP_TUPLE=ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=1,icmp_id=$ICMP_ID,icmp_type=8,icmp_code=0
+AT_CHECK([ovs-appctl dpctl/flush-conntrack zone=5 $ICMP_TUPLE])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep"orig=.src=10\.1\.1\.2,"], [1], [dnl
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([conntrack - IPv4 ping])
+CHECK_CONNTRACK()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+dnl Allow any traffic from ns0->ns1. Only allow nd, return trafficfrom ns1->ns0.
+AT_DATA([flows.txt], [dnl
+priority=1,action=drop
+priority=10,arp,action=normal
+priority=100,in_port=1,icmp,action=ct(commit),2
+priority=100,in_port=2,icmp,ct_state=-trk,action=ct(table=0)
+priority=100,in_port=2,icmp,ct_state=+trk+est,action=1
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+dnl Pings from ns0->ns1 should work fine.
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 |FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)],[0], [dnl
+icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=0,code=0)
+])
+
+AT_CHECK([ovs-appctl dpctl/flush-conntrack])
+
+dnl Pings from ns1->ns0 should fail.
+NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 |FORMAT_PING], [0], [dnl
+7 packets transmitted, 0 received, 100% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([conntrack - get_nconns and get/set_maxconns])
+CHECK_CONNTRACK()
+CHECK_CT_DPIF_SET_GET_MAXCONNS()
+CHECK_CT_DPIF_GET_NCONNS()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+dnl Allow any traffic from ns0->ns1. Only allow nd, return trafficfrom ns1->ns0.
+AT_DATA([flows.txt], [dnl
+priority=1,action=drop
+priority=10,arp,action=normal
+priority=100,in_port=1,icmp,action=ct(commit),2
+priority=100,in_port=2,icmp,ct_state=-trk,action=ct(table=0)
+priority=100,in_port=2,icmp,ct_state=+trk+est,action=1
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+dnl Pings from ns0->ns1 should work fine.
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 |FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)],[0], [dnl
+icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=0,code=0)
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-set-maxconns one-bad-dp], [2], [], [dnl
+ovs-vswitchd: maxconns missing or malformed (Invalid argument)
+ovs-appctl: ovs-vswitchd: server returned an error
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-set-maxconns a], [2], [], [dnl
+ovs-vswitchd: maxconns missing or malformed (Invalid argument)
+ovs-appctl: ovs-vswitchd: server returned an error
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-set-maxconns one-bad-dp 10], [2], [],[dnl
+ovs-vswitchd: datapath not found (Invalid argument)
+ovs-appctl: ovs-vswitchd: server returned an error
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-get-maxconns one-bad-dp], [2], [], [dnl
+ovs-vswitchd: datapath not found (Invalid argument)
+ovs-appctl: ovs-vswitchd: server returned an error
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-get-nconns one-bad-dp], [2], [], [dnl
+ovs-vswitchd: datapath not found (Invalid argument)
+ovs-appctl: ovs-vswitchd: server returned an error
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-get-nconns], [], [dnl
+1
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl
+3000000
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-set-maxconns 10], [], [dnl
+setting maxconns successful
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl
+10
+])
+
+AT_CHECK([ovs-appctl dpctl/flush-conntrack])
+
+AT_CHECK([ovs-appctl dpctl/ct-get-nconns], [], [dnl
+0
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl
+10
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([conntrack - IPv6 ping])
+CHECK_CONNTRACK()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96")
+
+AT_DATA([flows.txt], [dnl
+
+dnl ICMPv6 echo request and reply go to table 1. The rest of thetraffic goes
+dnl through normal action.
+table=0,priority=10,icmp6,icmp_type=128,action=goto_table:1
+table=0,priority=10,icmp6,icmp_type=129,action=goto_table:1
+table=0,priority=1,action=normal
+
+dnl Allow everything from ns0->ns1. Only allow return traffic fromns1->ns0.
+table=1,priority=100,in_port=1,icmp6,action=ct(commit),2
+table=1,priority=100,in_port=2,icmp6,ct_state=-trk,action=ct(table=0)
+table=1,priority=100,in_port=2,icmp6,ct_state=+trk+est,action=1
+table=1,priority=1,action=drop
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2])
+
+dnl The above ping creates state in the connection tracker. We'renot
+dnl interested in that state.
+AT_CHECK([ovs-appctl dpctl/flush-conntrack])
+
+dnl Pings from ns1->ns0 should fail.
+NS_CHECK_EXEC([at_ns1], [ping6 -q -c 3 -i 0.3 -w 2 fc00::1 |FORMAT_PING], [0], [dnl
+7 packets transmitted, 0 received, 100% packet loss, time 0ms
+])
+
+dnl Pings from ns0->ns1 should work fine.
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::2 |FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fc00::2)], [0],[dnl
+icmpv6,orig=(src=fc00::1,dst=fc00::2,id=<cleared>,type=128,code=0),reply=(src=fc00::2,dst=fc00::1,id=<cleared>,type=129,code=0)
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
--
2.7.4

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCHv8] netdev-afxdp: add new netdev type for AF_XDP.

Reply via email to