On Tue, Apr 21, 2020 at 11:47:00PM +0900, Toshiaki Makita wrote:
> This patch adds an XDP-based flow cache using the OVS netdev-offload
> flow API provider.  When an OVS device has XDP offload enabled,
> packets are first processed in the XDP flow cache (with parsing and
> table lookup implemented in eBPF), and on a hit the action processing
> is also done in the XDP context, which has minimal overhead.
> 
> This provider is based on top of William's recently posted patch for
> custom XDP load.  When a custom XDP program is loaded, the provider
> detects whether the program supports the classifier, and if so it starts
> offloading flows to the XDP program.
> 
> The patches are derived from xdp_flow[1], which is a mechanism similar to
> this but implemented in kernel.
> 
> 
> * Motivation
> 
> While the userspace datapath using netdev-afxdp or netdev-dpdk shows good
> performance, there are use cases where packets are better processed in
> the kernel, for example TCP/IP connections or container-to-container
> connections.  The current solution is to use a tap device or af_packet,
> with extra kernel-to/from-userspace overhead.  With XDP, a better
> solution is to steer packets earlier, in the XDP program, which decides
> whether to send them to the userspace datapath or keep them in the kernel.
> 
> One problem with the current netdev-afxdp is that it forwards all packets
> to userspace.  The first patch from William (netdev-afxdp: Enable loading
> XDP program.) only provides the interface to load an XDP program; however,
> users usually don't know how to write their own XDP program.
> 
> XDP also supports HW-offload so it may be possible to offload flows to
> HW through this provider in the future, although not currently.
> The reason is that map-in-map is required for our program to support
> classifier with subtables in XDP, but map-in-map is not offloadable.
> If map-in-map becomes offloadable, HW-offload of our program will also
> be doable.
> 
> 
> * How to use
> 
> 1. Install clang/llvm >= 9, libbpf >= 0.0.4, and kernel >= 5.3.
> 
> 2. Configure with --enable-afxdp --enable-bpf, then run make
> --enable-bpf will generate XDP program "bpf/flowtable_afxdp.o".  Note that
> the BPF object will not be installed anywhere by "make install" at this
> point.

When running configure, a missing include causes an error due to __u64:
checking bpf/bpf_helpers.h usability... no
checking bpf/bpf_helpers.h presence... yes
configure: WARNING: bpf/bpf_helpers.h: present but cannot be compiled
configure: WARNING: bpf/bpf_helpers.h:     check for missing prerequisite headers?

configure:18876: gcc -c -g -O2  conftest.c >&5
In file included from /usr/local/include/bpf/bpf_helpers.h:5:0,
                 from conftest.c:73:
/usr/local/include/bpf/bpf_helper_defs.h:55:82: error: unknown type name '__u64'
 static int (*bpf_map_update_elem)(void *map, const void *key, const void *value, __u64 flags) = (void *) 2;
                                                                                  ^
/usr/local/include/bpf/bpf_helper_defs.h:79:41: error: unknown type name '__u32'
 static int (*bpf_probe_read)(void *dst, __u32 size, const void *unsafe_ptr) = (void *) 4;

I applied this to fix it:
diff --git a/acinclude.m4 b/acinclude.m4
index 5eeab6feb9cc..39dfce565182 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -326,7 +326,8 @@ AC_DEFUN([OVS_CHECK_LINUX_BPF], [
       [AC_MSG_ERROR([unable to find llc to compile BPF program])])
 
     AC_CHECK_HEADER([bpf/bpf_helpers.h], [],
-      [AC_MSG_ERROR([unable to find bpf/bpf_helpers.h to compile BPF program])])
+      [AC_MSG_ERROR([unable to find bpf/bpf_helpers.h to compile BPF program])],
+        [#include <linux/types.h>])
 
     AC_CHECK_HEADER([linux/bpf.h], [],
       [AC_MSG_ERROR([unable to find linux/bpf.h to compile BPF program])])

> 
> 3. Load custom XDP program
> E.g.
> $ ovs-vsctl add-port ovsbr0 veth0 -- set int veth0 options:xdp-mode=native \
>   options:xdp-obj="path/to/ovs/bpf/flowtable_afxdp.o"
> $ ovs-vsctl add-port ovsbr0 veth1 -- set int veth1 options:xdp-mode=native \
>   options:xdp-obj="path/to/ovs/bpf/flowtable_afxdp.o"
> 
> 4. Enable XDP_REDIRECT
> If you use veth devices, make sure to load some (possibly dummy) programs
> on the peers of veth devices.
> 
> 5. Enable hw-offload 
> $ ovs-vsctl set Open_vSwitch . other_config:offload-driver=linux_xdp
> $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
> This will start offloading flows to the XDP program.
> 
Thanks, I got it working successfully.
When applying your patch on current master, I hit a bug.
I will send a patch to fix it.

diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 40d0cc1105ea..b52071e92ec7 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -3588,6 +3588,7 @@ const struct netdev_class netdev_internal_class = {
 
 #ifdef HAVE_AF_XDP
 #define NETDEV_AFXDP_CLASS_COMMON                               \
+    .init = netdev_afxdp_init,                                  \
     .construct = netdev_afxdp_construct,                        \
     .destruct = netdev_afxdp_destruct,                          \
     .get_stats = netdev_afxdp_get_stats,                        \


The other parts work fine.
I'm planning to experiment with more rules and measure performance.

William

> You should be able to see some maps installed, including "debug_stats".
> $ bpftool map
> 
> If packets are successfully redirected by the XDP program,
> debug_stats[2] will be counted.
> $ bpftool map dump id <ID of debug_stats>
> 
> Currently only very limited keys and output actions are supported.
> For example NORMAL action entry and IP based matching work with current
> key support.
> 
> 
> * Performance
> 
> Two cases were tested: 1) i40e to veth, 2) i40e to i40e.
> Test 1 measured the drop rate at the veth interface with a redirect action
> from a physical interface (i40e 25G NIC, XXV 710) to veth. The CPU is a
> Xeon Silver 4114 (2.20 GHz).
>                                                                XDP_DROP
>                     +------+                      +-------+    +-------+
>  pktgen -- wire --> | eth0 | -- NORMAL ACTION --> | veth0 |----| veth2 |
>                     +------+                      +-------+    +-------+
> 
> Test 2 used i40e instead of veth, and measured the tx packet rate at the
> output device.
> 
> Single-flow performance test results:
> 
> 1) i40e-veth
> 
>   a) no-zerocopy in i40e
> 
>     - xdp   3.7 Mpps
>     - afxdp 820 kpps
> 
>   b) zerocopy in i40e (veth does not have zc)
> 
>     - xdp   1.8 Mpps
>     - afxdp 800 kpps
> 
> 2) i40e-i40e
> 
>   a) no-zerocopy
> 
>     - xdp   3.0 Mpps
>     - afxdp 1.1 Mpps
> 
>   b) zerocopy
> 
>     - xdp   1.7 Mpps
>     - afxdp 4.0 Mpps
> 
> ** xdp is better when zc is disabled. The reason for the poor performance
>    with zc is that xdp_frame requires packet memory allocation and memcpy
>    on XDP_REDIRECT to other devices iff zc is enabled.
> 
> ** afxdp with zc is better than xdp without zc, but afxdp uses 2 cores
>    in this case: one for the pmd and the other for softirq. When the pmd
>    and softirq ran on the same core, performance was extremely poor
>    because the pmd consumes the cpu.
>    When offloading to xdp, xdp only uses softirq while the pmd is still
>    consuming 100% cpu.  This means we probably need only one pmd for xdp
>    even when we want to use more cores for multi-flow.
>    I'll also test afxdp-nonpmd when it is applied.
> 
> 
> This patch set is based on top of commit 82b7e6d19 ("compat: Fix broken
> partial backport of extack op parameter").
> 
> [1] https://lwn.net/Articles/802653/
> 
> v2:
> - Add uninit callback of netdev-offload-xdp.
> - Introduce "offload-driver" other_config to specify offload driver.
> - Add --enable-bpf (HAVE_BPF) config option to build bpf programs.
> - Workaround incorrect UINTPTR_MAX in x64 clang bpf build.
> - Fix boot.sh autoconf warning.
> 
> TODO:
> - CI fails due to missing function "bpf_program__get_type" which is not
>   provided by libbpf from linux 5.3. Although we can use linux >= 5.5 to
>   fix it, maybe it's time to switch to using libbpf standalone repository?
> - Fix a crash bug in patch 1 which has been reported by Eelco Chaudron.
> - Add test for XDP offload driver.
> - Add documentation.
> - Implement more actions like vlan push/pop.
> 
> Toshiaki Makita (3):
>   netdev-offload: Add "offload-driver" other_config to specify offload
>     driver
>   netdev-offload: Add xdp flow api provider
>   bpf: Add reference XDP program implementation for netdev-offload-xdp
> 
> William Tu (1):
>   netdev-afxdp: Enable loading XDP program.
> 
>  Documentation/intro/install/afxdp.rst |   59 ++
>  Makefile.am                           |    9 +-
>  NEWS                                  |    2 +
>  acinclude.m4                          |   56 ++
>  bpf/.gitignore                        |    4 +
>  bpf/Makefile.am                       |   59 ++
>  bpf/bpf_miniflow.h                    |  179 ++++
>  bpf/bpf_netlink.h                     |   34 +
>  bpf/bpf_workaround.h                  |   28 +
>  bpf/flowtable_afxdp.c                 |  515 +++++++++++
>  configure.ac                          |    2 +
>  lib/automake.mk                       |    6 +-
>  lib/bpf-util.c                        |   38 +
>  lib/bpf-util.h                        |   22 +
>  lib/netdev-afxdp.c                    |  342 +++++++-
>  lib/netdev-afxdp.h                    |    3 +
>  lib/netdev-linux-private.h            |    5 +
>  lib/netdev-offload-provider.h         |    6 +
>  lib/netdev-offload-xdp.c              | 1143 +++++++++++++++++++++++++
>  lib/netdev-offload-xdp.h              |   49 ++
>  lib/netdev-offload.c                  |   40 +-
>  21 files changed, 2582 insertions(+), 19 deletions(-)
>  create mode 100644 bpf/.gitignore
>  create mode 100644 bpf/Makefile.am
>  create mode 100644 bpf/bpf_miniflow.h
>  create mode 100644 bpf/bpf_netlink.h
>  create mode 100644 bpf/bpf_workaround.h
>  create mode 100644 bpf/flowtable_afxdp.c
>  create mode 100644 lib/bpf-util.c
>  create mode 100644 lib/bpf-util.h
>  create mode 100644 lib/netdev-offload-xdp.c
>  create mode 100644 lib/netdev-offload-xdp.h
> 
> -- 
> 2.25.1
> 