On 17 May 2019, at 14:39, Ilya Maximets wrote:

Hi.
Just a few comments to the issues you're listed.

Best regards, Ilya Maximets.

On 17.05.2019 13:23, Eelco Chaudron wrote:
Hi William,

First a list of issues I found during some basic testing...

- When I restart or stop OVS (using the systemctl interface as found in RHEL) it does not clean up the BFP program causing the restart to fail:

  2019-05-10T09:12:11.384Z|00042|netdev_afxdp|ERR|AF_XDP device eno1 reconfig fails   2019-05-10T09:12:11.384Z|00043|dpif_netdev|ERR|Failed to set interface eno1 new configuration

  I need to manually run "ip link set dev eno1 xdp off" to make it recover.

Userspace datapath requires '--cleanup' option passed to 'ovs-appctl exit' to clean up allocated resources. Otherwise datapath will not be destroyed,
i.e. netdev will not be destroyed --> no xdp program unloading.

Maybe we should try to reload/cleanup at startup?


- When I remove a bridge, I get an emer in the revalidator:

  2019-05-10T09:40:34.401Z|00045|netdev_afxdp|INFO|remove xdp program   2019-05-10T09:40:34.652Z|00001|util(revalidator49)|EMER|lib/poll-loop.c:111: assertion !fd != !wevent failed in poll_create_node()


This actually should never happen. Looks like a memory corruption.

  Easy to replicate with this:

    $ ovs-vsctl add-br ovs_pvp_br0 -- set bridge ovs_pvp_br0 datapath_type=netdev     $ ovs-vsctl add-port ovs_pvp_br0 eno1 -- set interface eno1 type="afxdp" options:xdpmode=drv
    $ ovs-vsctl del-br ovs_pvp_br0


- High pmd usage on the statistics, even with no packets is this expected?

  $ ovs-appctl dpif-netdev/pmd-rxq-show
  pmd thread numa_id 0 core_id 1:
    isolated : false
    port: dpdk0             queue-id:  0  pmd usage:  0 %     port: eno1              queue-id:  0  pmd usage: 49 %

  It goes up slowly and gets stuck at 49%


- When doing the PVP testing I noticed that the physical port has odd/no
  tx statistics:

  $ ovs-ofctl dump-ports ovs_pvp_br0
  OFPST_PORT reply (xid=0x2): 3 ports
    port LOCAL: rx pkts=0, bytes=0, drop=0, errs=0, frame=0, over=0, crc=0
             tx pkts=0, bytes=0, drop=0, errs=0, coll=0
    port  eno1: rx pkts=103256197, bytes=6195630508, drop=0, errs=0, frame=0, over=0, crc=0              tx pkts=0, bytes=19789272440056, drop=0, errs=0, coll=0     port  tapVM: rx pkts=4043, bytes=501278, drop=0, errs=0, frame=0, over=0, crc=0              tx pkts=4058, bytes=502504, drop=0, errs=0, coll=0


- Packets larger than 1028 bytes are dropped. Guess this needs to be fixed, and we need to state that jumbo frames are not supported. Are you planning on adding this?

  Currently I can find not mentioning of MTU limitation in the documentation, or any code to prevent it from being changed above the supported limit.

Actually Jumbo frames are supported, but yes, the packet size
is limited by the page size. So, jumbo frames up to ~3.5K should
be supported without issues.
We'll need to determine the upper limit and reject requested mtu
if it's larger.

Currently, none-jumbo frames are not even working, and I think a jumbo check should be added as we allocate chunks of 2048.



- ovs-vswitchd is still crashing or stops forwarding packets when trying to do   PVP testing with Qemu that has a TAP interface doing XDP and running packets
  at wire speed to the 10G interface.

Actually, there are a lot of places in current version where rings/umems could be corrupted leading to unpredictable memory corruptions/crashes/time wasting
trying to allocate exhausted resources.

ACK :)


  When trying with lower volume packets it seems to work, so with 1% traffic   rate, it forwards packets without any problems (148,771 pps). If I go to   10% the first couple of packet pass, then it stops forwarding. If it's not   crashing I still see packets being received by eno1 flow rules, but no
  packets make it to the VM.

<snip>




The following might be useful when combining DPDK and AF_XDP:

  Currently, DPDK and AF_XDP polling can be combined on a single PMD thread, it   might be nice to have an option to not do this, i.e. have separate PMD   threads for each type. I know we can do this with assigning specific PMDs to   queues, but this will disable auto-balancing. This will also help later if
  we would add poll() mode support for AF_XDP.

This might make some sense, but certainly not on this stage of development. I don't think that we should expect any production level performance or fine grained
solution that must work perfectly in all corner cases.
For now it's better to focus on making it reliable. At least to handle all the control sequences (stop/start/restart/reconfigure) and ability to forward traffic without
hangs/crashes/memory corruptions.

Agreed, this was just something to take into consideration for further development…


<snip>

diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
index 859c05613ddf..cc91720fad6e 100644
--- a/lib/dpif-netdev-perf.h
+++ b/lib/dpif-netdev-perf.h
@@ -198,6 +198,20 @@ cycles_counter_update(struct pmd_perf_stats *s)
 {
 #ifdef DPDK_NETDEV
     return s->last_tsc = rte_get_tsc_cycles();
+#elif HAVE_AF_XDP

We need to add support for at least ARM and PPC, not sure how to do this nicely.

IMHO, it's not required for the experimental feature. But yes,
we'll need to add support later. I'm thinking about fallback to
CLOCK_MONOTONIC_RAW (not that portable too) and further to just
time_usec().

This might add some system calls and this function is called quite often.
Maybe just return 0 in these cases (if we do not divide by it :).

This code is already a quick cut/paste from DPDK, license?

Good question.

I guess copying is fine as long as we give them credits. I was more hinting that if we do it Intel 64 bit we might as well do it for ARM and PPC and experimental support for those…

If you worried, I could gift a version written from scratch (I swear):

---
    uint32_t h, l;

    asm volatile("rdtsc" : "=a" (l), "=d" (h));

    return s->last_tsc = ((uint64_t) h << 32) | l;
---


+    /* This is x86-specific instructions. */
+    union {
+        uint64_t tsc_64;
+        struct {
+            uint32_t lo_32;
+            uint32_t hi_32;
+        };
+    } tsc;
+    asm volatile("rdtsc" :
+             "=a" (tsc.lo_32),
+             "=d" (tsc.hi_32));
+
+    return s->last_tsc = tsc.tsc_64;
 #else
     return s->last_tsc = 0;
 #endif
diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
new file mode 100644
index 000000000000..cd1b9ca8be77
--- /dev/null
+++ b/lib/netdev-afxdp.c
@@ -0,0 +1,727 @@
+/*
+ * Copyright (c) 2018, 2019 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#if !defined(__i386__) && !defined(__x86_64__)
+#error AF_XDP supported only for Linux on x86 or x86_64

Any reason why we do not support PPC and ARM?

Simple: rdtsc.

Actually, right now we need to restrict support to only x86_64, because
above rdtsc is in 64bit form and will not work for 32bit cpu.

If this is all, why not copy the rest from DPDK and support all…

<snip>

diff --git a/lib/netdev-linux-private.h b/lib/netdev-linux-private.h
new file mode 100644
index 000000000000..3dd3d902b3c4
--- /dev/null
+++ b/lib/netdev-linux-private.h
@@ -0,0 +1,124 @@
+/*
+ * Copyright (c) 2019 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef NETDEV_LINUX_PRIVATE_H
+#define NETDEV_LINUX_PRIVATE_H 1
+
+#include <config.h>
+
+#include <linux/filter.h>
+#include <linux/gen_stats.h>
+#include <linux/if_ether.h>
+#include <linux/if_tun.h>
+#include <linux/types.h>
+#include <linux/ethtool.h>
+#include <linux/mii.h>
+#include <stdint.h>
+#include <stdbool.h>
+
+#include "netdev-provider.h"
+#include "netdev-tc-offloads.h"
+#include "netdev-vport.h"
+#include "openvswitch/thread.h"
+#include "ovs-atomic.h"
+#include "timer.h"

Why include all the above? They where just added to netdev-linux.h, so if you make sure you include netdev-lunux.h before -private it should work out.

This doesn't look right. File should include everything it uses.
If something from above headers used in this file, headers should
stay. But if the header is not used in this file, it should be
not included here. Otherwise we'll mess up all the includes.


ACK, guess just some checking/cleanup is needed.

<snip>

@@ -1318,10 +1285,18 @@ netdev_linux_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch,
 {
     struct netdev_rxq_linux *rx = netdev_rxq_linux_cast(rxq_);
     struct netdev *netdev = rx->up.netdev;
-    struct dp_packet *buffer;
+    struct dp_packet *buffer = NULL;
     ssize_t retval;
     int mtu;

+#if HAVE_AF_XDP

Think this #if HAVE_AF_XDP can be removed as the compiler should optimize out the if (false).

I guess this will cause build failure with -O0.

Guess you are right, looking at it again, it will fail at:

return netdev_linux_rxq_xsk(dev->xsk[qid], batch);

as the dev structure does not have dev->xsk…


<snip>

diff --git a/lib/xdpsock.h b/lib/xdpsock.h
new file mode 100644
index 000000000000..aabaa8e5df24
--- /dev/null
+++ b/lib/xdpsock.h
@@ -0,0 +1,123 @@
+/*
+ * Copyright (c) 2018, 2019 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef XDPSOCK_H
+#define XDPSOCK_H 1
+
+#include <bpf/libbpf.h>
+#include <bpf/xsk.h>
+#include <errno.h>
+#include <getopt.h>
+#include <libgen.h>
+#include <linux/bpf.h>
+#include <linux/if_link.h>
+#include <linux/if_xdp.h>
+#include <linux/if_ether.h>
+#include <locale.h>
+#include <net/if.h>
+#include <poll.h>
+#include <pthread.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/resource.h>
+#include <sys/socket.h>
+#include <sys/types.h>
+#include <sys/mman.h>
+#include <time.h>
+#include <unistd.h>
+
+#include "openvswitch/thread.h"
+#include "ovs-atomic.h"
+
+#define FRAME_HEADROOM  XDP_PACKET_HEADROOM
+#define FRAME_SIZE      XSK_UMEM__DEFAULT_FRAME_SIZE
+#define BATCH_SIZE      NETDEV_MAX_BURST

Move this item to the bottom, so you have FRAME specific define's first

+#define FRAME_SHIFT     XSK_UMEM__DEFAULT_FRAME_SHIFT
+#define FRAME_SHIFT_MASK    ((1 << FRAME_SHIFT) - 1)
+
+#define NUM_FRAMES      4096

Should we add a note/check to make sure this value is a power of 2?

+#define PROD_NUM_DESCS  512
+#define CONS_NUM_DESCS  512
+
+#ifdef USE_XSK_DEFAULT
+#define PROD_NUM_DESCS XSK_RING_PROD__DEFAULT_NUM_DESCS
+#define CONS_NUM_DESCS XSK_RING_CONS__DEFAULT_NUM_DESCS
+#endif

Any reason for having this? Should we use the default values? They are 4x larger than you have, did it make any difference in performance results? We could make it configurable like for DPDK, using the n_txq_desc/n_rxq_desc option.

I think, this is not necessary right now and could be done later.

Guess we can do this later, but the vswitch.xml document needs some clarification as all options are under the “PMD (Poll Mode Driver) Options” section, which pure technically AF_XDP is not. However, we support the pmd-rxq-affinity/n_rxq options.
<snip>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to