On 17 May 2019, at 14:39, Ilya Maximets wrote:
Hi.
Just a few comments to the issues you're listed.
Best regards, Ilya Maximets.
On 17.05.2019 13:23, Eelco Chaudron wrote:
Hi William,
First a list of issues I found during some basic testing...
- When I restart or stop OVS (using the systemctl interface as found
in RHEL) it does not clean up the BFP program causing the restart to
fail:
2019-05-10T09:12:11.384Z|00042|netdev_afxdp|ERR|AF_XDP device eno1
reconfig fails
2019-05-10T09:12:11.384Z|00043|dpif_netdev|ERR|Failed to set
interface eno1 new configuration
I need to manually run "ip link set dev eno1 xdp off" to make it
recover.
Userspace datapath requires '--cleanup' option passed to 'ovs-appctl
exit'
to clean up allocated resources. Otherwise datapath will not be
destroyed,
i.e. netdev will not be destroyed --> no xdp program unloading.
Maybe we should try to reload/cleanup at startup?
- When I remove a bridge, I get an emer in the revalidator:
2019-05-10T09:40:34.401Z|00045|netdev_afxdp|INFO|remove xdp
program
2019-05-10T09:40:34.652Z|00001|util(revalidator49)|EMER|lib/poll-loop.c:111:
assertion !fd != !wevent failed in poll_create_node()
This actually should never happen. Looks like a memory corruption.
Easy to replicate with this:
$ ovs-vsctl add-br ovs_pvp_br0 -- set bridge ovs_pvp_br0
datapath_type=netdev
$ ovs-vsctl add-port ovs_pvp_br0 eno1 -- set interface eno1
type="afxdp" options:xdpmode=drv
$ ovs-vsctl del-br ovs_pvp_br0
- High pmd usage on the statistics, even with no packets is this
expected?
$ ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 0 core_id 1:
isolated : false
port: dpdk0 queue-id: 0 pmd
usage: 0 %
port: eno1 queue-id: 0 pmd
usage: 49 %
It goes up slowly and gets stuck at 49%
- When doing the PVP testing I noticed that the physical port has
odd/no
tx statistics:
$ ovs-ofctl dump-ports ovs_pvp_br0
OFPST_PORT reply (xid=0x2): 3 ports
port LOCAL: rx pkts=0, bytes=0, drop=0, errs=0, frame=0,
over=0, crc=0
tx pkts=0, bytes=0, drop=0, errs=0, coll=0
port eno1: rx pkts=103256197, bytes=6195630508, drop=0,
errs=0, frame=0, over=0, crc=0
tx pkts=0, bytes=19789272440056, drop=0,
errs=0, coll=0
port tapVM: rx pkts=4043, bytes=501278, drop=0, errs=0,
frame=0, over=0, crc=0
tx pkts=4058, bytes=502504, drop=0, errs=0,
coll=0
- Packets larger than 1028 bytes are dropped. Guess this needs to be
fixed, and we need to state that jumbo frames are not supported. Are
you planning on adding this?
Currently I can find not mentioning of MTU limitation in the
documentation, or any code to prevent it from being changed above the
supported limit.
Actually Jumbo frames are supported, but yes, the packet size
is limited by the page size. So, jumbo frames up to ~3.5K should
be supported without issues.
We'll need to determine the upper limit and reject requested mtu
if it's larger.
Currently, none-jumbo frames are not even working, and I think a jumbo
check should be added as we allocate chunks of 2048.
- ovs-vswitchd is still crashing or stops forwarding packets when
trying to do
PVP testing with Qemu that has a TAP interface doing XDP and
running packets
at wire speed to the 10G interface.
Actually, there are a lot of places in current version where
rings/umems could
be corrupted leading to unpredictable memory corruptions/crashes/time
wasting
trying to allocate exhausted resources.
ACK :)
When trying with lower volume packets it seems to work, so with 1%
traffic
rate, it forwards packets without any problems (148,771 pps). If I
go to
10% the first couple of packet pass, then it stops forwarding. If
it's not
crashing I still see packets being received by eno1 flow rules,
but no
packets make it to the VM.
<snip>
The following might be useful when combining DPDK and AF_XDP:
Currently, DPDK and AF_XDP polling can be combined on a single PMD
thread, it
might be nice to have an option to not do this, i.e. have separate
PMD
threads for each type. I know we can do this with assigning
specific PMDs to
queues, but this will disable auto-balancing. This will also help
later if
we would add poll() mode support for AF_XDP.
This might make some sense, but certainly not on this stage of
development.
I don't think that we should expect any production level performance
or fine grained
solution that must work perfectly in all corner cases.
For now it's better to focus on making it reliable. At least to handle
all the control
sequences (stop/start/restart/reconfigure) and ability to forward
traffic without
hangs/crashes/memory corruptions.
Agreed, this was just something to take into consideration for further
development…
<snip>
diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
index 859c05613ddf..cc91720fad6e 100644
--- a/lib/dpif-netdev-perf.h
+++ b/lib/dpif-netdev-perf.h
@@ -198,6 +198,20 @@ cycles_counter_update(struct pmd_perf_stats *s)
{
#ifdef DPDK_NETDEV
return s->last_tsc = rte_get_tsc_cycles();
+#elif HAVE_AF_XDP
We need to add support for at least ARM and PPC, not sure how to do
this nicely.
IMHO, it's not required for the experimental feature. But yes,
we'll need to add support later. I'm thinking about fallback to
CLOCK_MONOTONIC_RAW (not that portable too) and further to just
time_usec().
This might add some system calls and this function is called quite
often.
Maybe just return 0 in these cases (if we do not divide by it :).
This code is already a quick cut/paste from DPDK, license?
Good question.
I guess copying is fine as long as we give them credits. I was more
hinting that if we do it Intel 64 bit we might as well do it for ARM and
PPC and experimental support for those…
If you worried, I could gift a version written from scratch (I swear):
---
uint32_t h, l;
asm volatile("rdtsc" : "=a" (l), "=d" (h));
return s->last_tsc = ((uint64_t) h << 32) | l;
---
+ /* This is x86-specific instructions. */
+ union {
+ uint64_t tsc_64;
+ struct {
+ uint32_t lo_32;
+ uint32_t hi_32;
+ };
+ } tsc;
+ asm volatile("rdtsc" :
+ "=a" (tsc.lo_32),
+ "=d" (tsc.hi_32));
+
+ return s->last_tsc = tsc.tsc_64;
#else
return s->last_tsc = 0;
#endif
diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
new file mode 100644
index 000000000000..cd1b9ca8be77
--- /dev/null
+++ b/lib/netdev-afxdp.c
@@ -0,0 +1,727 @@
+/*
+ * Copyright (c) 2018, 2019 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
software
+ * distributed under the License is distributed on an "AS IS"
BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied.
+ * See the License for the specific language governing permissions
and
+ * limitations under the License.
+ */
+
+#if !defined(__i386__) && !defined(__x86_64__)
+#error AF_XDP supported only for Linux on x86 or x86_64
Any reason why we do not support PPC and ARM?
Simple: rdtsc.
Actually, right now we need to restrict support to only x86_64,
because
above rdtsc is in 64bit form and will not work for 32bit cpu.
If this is all, why not copy the rest from DPDK and support all…
<snip>
diff --git a/lib/netdev-linux-private.h b/lib/netdev-linux-private.h
new file mode 100644
index 000000000000..3dd3d902b3c4
--- /dev/null
+++ b/lib/netdev-linux-private.h
@@ -0,0 +1,124 @@
+/*
+ * Copyright (c) 2019 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
software
+ * distributed under the License is distributed on an "AS IS"
BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied.
+ * See the License for the specific language governing permissions
and
+ * limitations under the License.
+ */
+
+#ifndef NETDEV_LINUX_PRIVATE_H
+#define NETDEV_LINUX_PRIVATE_H 1
+
+#include <config.h>
+
+#include <linux/filter.h>
+#include <linux/gen_stats.h>
+#include <linux/if_ether.h>
+#include <linux/if_tun.h>
+#include <linux/types.h>
+#include <linux/ethtool.h>
+#include <linux/mii.h>
+#include <stdint.h>
+#include <stdbool.h>
+
+#include "netdev-provider.h"
+#include "netdev-tc-offloads.h"
+#include "netdev-vport.h"
+#include "openvswitch/thread.h"
+#include "ovs-atomic.h"
+#include "timer.h"
Why include all the above? They where just added to netdev-linux.h,
so if you make sure you include netdev-lunux.h before -private it
should work out.
This doesn't look right. File should include everything it uses.
If something from above headers used in this file, headers should
stay. But if the header is not used in this file, it should be
not included here. Otherwise we'll mess up all the includes.
ACK, guess just some checking/cleanup is needed.
<snip>
@@ -1318,10 +1285,18 @@ netdev_linux_rxq_recv(struct netdev_rxq
*rxq_, struct dp_packet_batch *batch,
{
struct netdev_rxq_linux *rx = netdev_rxq_linux_cast(rxq_);
struct netdev *netdev = rx->up.netdev;
- struct dp_packet *buffer;
+ struct dp_packet *buffer = NULL;
ssize_t retval;
int mtu;
+#if HAVE_AF_XDP
Think this #if HAVE_AF_XDP can be removed as the compiler should
optimize out the if (false).
I guess this will cause build failure with -O0.
Guess you are right, looking at it again, it will fail at:
return netdev_linux_rxq_xsk(dev->xsk[qid], batch);
as the dev structure does not have dev->xsk…
<snip>
diff --git a/lib/xdpsock.h b/lib/xdpsock.h
new file mode 100644
index 000000000000..aabaa8e5df24
--- /dev/null
+++ b/lib/xdpsock.h
@@ -0,0 +1,123 @@
+/*
+ * Copyright (c) 2018, 2019 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
software
+ * distributed under the License is distributed on an "AS IS"
BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied.
+ * See the License for the specific language governing permissions
and
+ * limitations under the License.
+ */
+
+#ifndef XDPSOCK_H
+#define XDPSOCK_H 1
+
+#include <bpf/libbpf.h>
+#include <bpf/xsk.h>
+#include <errno.h>
+#include <getopt.h>
+#include <libgen.h>
+#include <linux/bpf.h>
+#include <linux/if_link.h>
+#include <linux/if_xdp.h>
+#include <linux/if_ether.h>
+#include <locale.h>
+#include <net/if.h>
+#include <poll.h>
+#include <pthread.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/resource.h>
+#include <sys/socket.h>
+#include <sys/types.h>
+#include <sys/mman.h>
+#include <time.h>
+#include <unistd.h>
+
+#include "openvswitch/thread.h"
+#include "ovs-atomic.h"
+
+#define FRAME_HEADROOM XDP_PACKET_HEADROOM
+#define FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE
+#define BATCH_SIZE NETDEV_MAX_BURST
Move this item to the bottom, so you have FRAME specific define's
first
+#define FRAME_SHIFT XSK_UMEM__DEFAULT_FRAME_SHIFT
+#define FRAME_SHIFT_MASK ((1 << FRAME_SHIFT) - 1)
+
+#define NUM_FRAMES 4096
Should we add a note/check to make sure this value is a power of 2?
+#define PROD_NUM_DESCS 512
+#define CONS_NUM_DESCS 512
+
+#ifdef USE_XSK_DEFAULT
+#define PROD_NUM_DESCS XSK_RING_PROD__DEFAULT_NUM_DESCS
+#define CONS_NUM_DESCS XSK_RING_CONS__DEFAULT_NUM_DESCS
+#endif
Any reason for having this? Should we use the default values? They
are 4x larger than you have, did it make any difference in
performance results?
We could make it configurable like for DPDK, using the
n_txq_desc/n_rxq_desc option.
I think, this is not necessary right now and could be done later.
Guess we can do this later, but the vswitch.xml document needs some
clarification as all options are under the “PMD (Poll Mode Driver)
Options” section, which pure technically AF_XDP is not. However, we
support the pmd-rxq-affinity/n_rxq options.
<snip>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev