Hi William,
See below parts of an offline email discussion I had with Magnus earlier,
plus some research I did in the end; it explains why, by design, you
might not get all the descriptors ready.
Hope this helps change your design…
In addition, the Point-to-Point test is working with your change;
however, the PVP test is still failing due to buffer starvation (see my
comments on the v8 patch for a possible cause).
Also, on OVS restart the system crashes in the following place:
#0 netdev_afxdp_rxq_recv (rxq_=0x173c080, batch=0x7fe1397f80d0,
qfill=0x0) at lib/netdev-afxdp.c:583
#1 0x0000000000907f21 in netdev_rxq_recv (rx=<optimized out>,
batch=batch@entry=0x7fe1397f80d0, qfill=<optimized out>) at
lib/netdev.c:710
#2 0x00000000008dd1c3 in dp_netdev_process_rxq_port
(pmd=pmd@entry=0x175d990, rxq=0x175a460, port_no=2) at
lib/dpif-netdev.c:4257
#3 0x00000000008dd63d in pmd_thread_main (f_=<optimized out>) at
lib/dpif-netdev.c:5449
#4 0x000000000095e94d in ovsthread_wrapper (aux_=<optimized out>) at
lib/ovs-thread.c:352
#5 0x00007fe1633872de in start_thread () from /lib64/libpthread.so.0
#6 0x00007fe162b2ca63 in clone () from /lib64/libc.so.6
Cheers,
Eelco
On 2019-05-22 15:20, Eelco Chaudron wrote:
Hi Magnus, et al.,
I was working on the AF_XDP tutorial example and even with a single
RX_BATCH_SIZE I was not getting woken up every packet, so I decided
to get to the bottom of this :)
Looking at the kernel RX side, this is what happens:
  xsk_generic_rcv()
      xskq_peek_addr(xs->umem->fq, &addr);
      ...
      xskq_discard_addr(xs->umem->fq);
Look at:
148 static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
149 {
150         if (q->cons_tail == q->cons_head) {
151                 smp_mb(); /* D, matches A */
152                 WRITE_ONCE(q->ring->consumer, q->cons_tail);
153                 q->cons_head = q->cons_tail + xskq_nb_avail(q, RX_BATCH_SIZE);
154
155                 /* Order consumer and data */
156                 smp_rmb();
157         }
158
159         return xskq_validate_addr(q, addr);
160 }
So ring->consumer only gets updated here, when we need some additional
buffers (we take 16 at a time). Once we are done processing a single
packet we do not update ring->consumer, only the tail:
162 static inline void xskq_discard_addr(struct xsk_queue *q)
163 {
164         q->cons_tail++;
165 }
Which means we only free up the consumer's slots every 16th packet…
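To make the userland side of that explicit: the libbpf producer helpers
decide whether a fill-ring slot is free purely from the consumer pointer
the kernel publishes in the ring, so the per-packet cons_tail bump above
is invisible to them. Roughly like this (a paraphrased sketch, ignoring
libbpf's caching of the producer index; fq_free_slots() is just an
illustrative name, not a real helper):

static inline __u32 fq_free_slots(const struct xsk_ring_prod *fq)
{
        /* *fq->consumer only moves at line 152 above, i.e. once the
         * kernel has consumed RX_BATCH_SIZE (16) fill entries. */
        return *fq->consumer + fq->size - *fq->producer;
}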
Now looking at the xdpsock_user.c code:
static void rx_drop(struct xsk_socket_info *xsk)
{
        unsigned int rcvd, i;
        u32 idx_rx = 0, idx_fq = 0;
        int ret;

        rcvd = xsk_ring_cons__peek(&xsk->rx, BATCH_SIZE, &idx_rx);
        if (!rcvd)
                return;

        ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
        while (ret != rcvd) {
                if (ret < 0)
                        exit_with_error(-ret);
                ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
If we “send a single ping only”, only one packet gets received, but
because the kernel does not update the consumer pointer for every
received packet, we end up waiting here for line 152 in
xskq_peek_addr() above to be executed.
        }

        for (i = 0; i < rcvd; i++) {
                u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
                u32 len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++)->len;
                char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);

                hex_dump(pkt, len, addr);
                *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = addr;
        }

        xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
        xsk_ring_cons__release(&xsk->rx, rcvd);
        xsk->rx_npkts += rcvd;
}
Assuming that updating the consumer only every RX_BATCH_SIZE packets was
a design decision, the example app needs some fixing.
What do you guys think?
Ok, picking up this thread!
Yes, the kernel only updates the *consumer* pointer of the fill ring
every RX_BATCH_SIZE packets, but the Rx ring *producer* pointer is
bumped every time.
This means that if you rely on (the naive :-)) code in the sample
application, you can end up in a situation where you can receive from
the Rx ring, but not post to the fill ring.
So, the reason for the 16-packet hiccup is as follows:
1. Userland: The fill ring is completely filled.
2. Kernel: One packet is received, one entry is picked from the fill
ring, but the consumer pointer is not bumped, and the packet is placed
on the Rx ring.
3. Userland: One packet is picked from the Rx ring.
4. Userland: Tries to put an entry on the fill ring. The fill ring is
full, so userland spins.
5. Kernel: When 16 packets have been picked from the fill ring, the
consumer pointer is released.
6. Userland: Exits the while loop.
We could make the sample more robust, but then again, it's a sample!
:-)
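To put rough numbers on the above (assume the fill ring starts out
completely full, as the sample sets it up):
1. Start: producer == consumer + ring size, so userland sees 0 free
fill slots.
2. Packet 1: the kernel takes one fill entry (cons_tail = 1) and puts
the packet on the Rx ring, but ring->consumer stays put, so userland
still sees 0 free slots and spins in the reserve loop.
3. Packets 2..16: same story, cons_tail creeps up to 16 while
ring->consumer is untouched.
4. Only when the kernel grabs its next batch of fill entries does it
write ring->consumer = 16; userland suddenly sees 16 free slots and the
reserve loop exits.
On a busy link this is just a small hiccup; with a single ping on an
otherwise quiet interface that last write may never come, and the loop
looks like a hang.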
Just wanted to get to the bottom of this, as Magnus was not able to
reproduce it in his setup.
Guess his network is noisier and gets more than a single packet in…
I’m working on an AF_XDP example for the xdp-tutorial project and
will fix it there. It’s just that people seem to blindly copy
examples…
Also, it might be useful to have an API to know how many slots are
available, like xsk_ring_prod__free(struct xsk_ring_prod *prod); that
way we could see how many descriptors we can add to top off the
queue. There is xsk_prod_nb_free() in the file, but it does not seem
like an official API (or are we ok to use it)?
Let me know what you think, and I can send a patch for
xsk_ring_prod__free().
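Something along these lines is what I have in mind; just a rough sketch
on top of the existing inline, not a finished patch (the
xsk_ring_prod__free() name is my proposal, it simply wraps the
xsk_prod_nb_free() helper mentioned above):

static inline __u32 xsk_ring_prod__free(struct xsk_ring_prod *prod)
{
        /* Asking for the whole ring makes xsk_prod_nb_free() re-read
         * the consumer pointer unless the ring is already completely
         * free, so the count it returns is current. */
        return xsk_prod_nb_free(prod, prod->size);
}

With a helper like that, the refill code could top off the fill ring
with whatever is actually free and carry any remainder over to the next
call, instead of spinning until the full amount fits.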
I'd say use it; it's part of the xsk.h file, but inlined, so it's not
versioned...
Cheers,
Björn
On 15 Jun 2019, at 15:33, William Tu wrote:
Hi Eelco,
I think it's either a bug in the kernel or my misunderstanding of how
to process the xsk cq ring. I posted the issue here:
https://marc.info/?l=xdp-newbies&m=156055471727857&w=2
Applying the change below on top of v11 fixes my crash.
Do you mind testing it again on your system?
Thanks for your time and all the trial and error.
diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
index a6543e8f5126..800c047b71f9 100644
--- a/lib/netdev-afxdp.c
+++ b/lib/netdev-afxdp.c
@@ -705,6 +705,7 @@ afxdp_complete_tx(struct xsk_socket_info *xsk)
     struct umem_elem *elems_push[BATCH_SIZE];
     uint32_t idx_cq = 0;
     int tx_done, j, ret;
+    int tx = 0;

     if (!xsk->outstanding_tx) {
         return;
@@ -717,23 +718,29 @@ afxdp_complete_tx(struct xsk_socket_info *xsk)
     }

     tx_done = xsk_ring_cons__peek(&xsk->umem->cq, BATCH_SIZE, &idx_cq);
-    if (tx_done > 0) {
-        xsk_ring_cons__release(&xsk->umem->cq, tx_done);
-        xsk->outstanding_tx -= tx_done;
-    }

     /* Recycle back to umem pool */
     for (j = 0; j < tx_done; j++) {
         struct umem_elem *elem;
-        uint64_t addr;
+        uint64_t *addr;

-        addr = *xsk_ring_cons__comp_addr(&xsk->umem->cq, idx_cq++);
+        addr = xsk_ring_cons__comp_addr(&xsk->umem->cq, idx_cq++);
+        if (*addr == 0) {
+            continue;
+        }
         elem = ALIGNED_CAST(struct umem_elem *,
-                            (char *)xsk->umem->buffer + addr);
+                            (char *)xsk->umem->buffer + *addr);
         elems_push[j] = elem;
+        *addr = 0;
+        tx++;
     }

-    umem_elem_push_n(&xsk->umem->mpool, tx_done, (void **)elems_push);
+    umem_elem_push_n(&xsk->umem->mpool, tx, (void **)elems_push);
+
+    if (tx_done > 0) {
+        xsk_ring_cons__release(&xsk->umem->cq, tx_done);
+        xsk->outstanding_tx -= tx_done;
+    }
 }

 int
On Thu, Jun 13, 2019 at 12:17 AM Eelco Chaudron <[email protected]>
wrote:
On 13 Jun 2019, at 2:37, William Tu wrote:
Hi Eelco,
#0 0x00007fbc6a78193f in raise () from /lib64/libc.so.6
#1 0x00007fbc6a76bc95 in abort () from /lib64/libc.so.6
#2 0x00000000004ed1a1 in __umem_elem_push_n (addrs=0x7fbc40f2ec50,
n=32, umemp=0x24cc790) at lib/xdpsock.c:32
#3 umem_elem_push_n (umemp=0x24cc790, n=32,
I've found that it's due to freeing the AF_XDP buffer twice.
The free_afxdp_buf() should be called once per dp_packet, but somehow
it gets called twice.
Applying this on v11 fixes the issue
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -145,7 +145,7 @@ dp_packet_uninit(struct dp_packet *b)
             free_dpdk_buf((struct dp_packet*) b);
 #endif
         } else if (b->source == DPBUF_AFXDP) {
-            free_afxdp_buf(b);
+            ;
         }
     }
 }
It’s still crashing for me, even with the change. I’m using a
single nic, same port in as out:
@@ -122,6 +144,8 @@ dp_packet_uninit(struct dp_packet *b)
              * created as a dp_packet */
             free_dpdk_buf((struct dp_packet*) b);
 #endif
+        } else if (b->source == DPBUF_AFXDP) {
+//            free_afxdp_buf(b);
         }
     }
 }
Core was generated by `ovs-vswitchd unix:/var/run/openvswitch/db.sock
-vconsole:emer -vsyslog:err -vfi'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f62ef71193f in raise () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7f62bdf33700 (LWP 24737))]
Missing separate debuginfos, use: dnf debuginfo-install
elfutils-libelf-0.174-6.el8.x86_64 glibc-2.28-42.el8_0.1.x86_64
libatomic-8.2.1-3.5.el8.x86_64 libcap-ng-0.7.9-4.el8.x86_64
numactl-libs-2.0.12-2.el8.x86_64 openssl-libs-1.1.1-8.el8.x86_64
zlib-1.2.11-10.el8.x86_64
(gdb) bt
#0 0x00007f62ef71193f in raise () from /lib64/libc.so.6
#1 0x00007f62ef6fbc95 in abort () from /lib64/libc.so.6
#2 0x00000000004ed1a1 in __umem_elem_push_n (addrs=0x7f62bdf30c50,
n=32, umemp=0x2378660) at lib/xdpsock.c:32
#3 umem_elem_push_n (umemp=0x2378660, n=32,
addrs=addrs@entry=0x7f62bdf30ea0) at lib/xdpsock.c:43
#4 0x00000000009b4f41 in afxdp_complete_tx (xsk=0x2378250) at
lib/netdev-afxdp.c:736
#5 netdev_afxdp_batch_send (netdev=<optimized out>, qid=0,
batch=0x7f62a4004e80, concurrent_txq=<optimized out>) at
lib/netdev-afxdp.c:763
#6 0x0000000000908031 in netdev_send (netdev=<optimized out>,
qid=qid@entry=0, batch=batch@entry=0x7f62a4004e80,
concurrent_txq=concurrent_txq@entry=true) at lib/netdev.c:800
#7 0x00000000008d4c24 in dp_netdev_pmd_flush_output_on_port
(pmd=pmd@entry=0x7f62bdf34010, p=p@entry=0x7f62a4004e50) at
lib/dpif-netdev.c:4187
#8 0x00000000008d4f3f in dp_netdev_pmd_flush_output_packets
(pmd=pmd@entry=0x7f62bdf34010, force=force@entry=false) at
lib/dpif-netdev.c:4227
#9 0x00000000008dd2d7 in dp_netdev_pmd_flush_output_packets
(force=false, pmd=0x7f62bdf34010) at lib/dpif-netdev.c:4282
#10 dp_netdev_process_rxq_port (pmd=pmd@entry=0x7f62bdf34010,
rxq=0x237e650, port_no=1) at lib/dpif-netdev.c:4282
#11 0x00000000008dd63d in pmd_thread_main (f_=<optimized out>) at
lib/dpif-netdev.c:5449
#12 0x000000000095e94d in ovsthread_wrapper (aux_=<optimized out>) at
lib/ovs-thread.c:352
#13 0x00007f62f00312de in start_thread () from /lib64/libpthread.so.0
#14 0x00007f62ef7d6a63 in clone () from /lib64/libc.so.6
Core was generated by `ovs-vswitchd unix:/var/run/openvswitch/db.sock
-vconsole:emer -vsyslog:err -vfi'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 netdev_afxdp_rxq_recv (rxq_=0x2bd75d0, batch=0x7f399f7fc0d0,
qfill=0x0) at lib/netdev-afxdp.c:583
583 rx->fd = xsk_socket__fd(xsk->xsk);
[Current thread is 1 (Thread 0x7f399f7fe700 (LWP 24597))]
Missing separate debuginfos, use: dnf debuginfo-install
elfutils-libelf-0.174-6.el8.x86_64 glibc-2.28-42.el8_0.1.x86_64
libatomic-8.2.1-3.5.el8.x86_64 libcap-ng-0.7.9-4.el8.x86_64
numactl-libs-2.0.12-2.el8.x86_64 openssl-libs-1.1.1-8.el8.x86_64
zlib-1.2.11-10.el8.x86_64
(gdb) bt
#0 netdev_afxdp_rxq_recv (rxq_=0x2bd75d0, batch=0x7f399f7fc0d0,
qfill=0x0) at lib/netdev-afxdp.c:583
#1 0x0000000000907f21 in netdev_rxq_recv (rx=<optimized out>,
batch=batch@entry=0x7f399f7fc0d0, qfill=<optimized out>) at
lib/netdev.c:710
#2 0x00000000008dd1c3 in dp_netdev_process_rxq_port
(pmd=pmd@entry=0x2bf80c0, rxq=0x2bd63e0, port_no=2) at
lib/dpif-netdev.c:4257
#3 0x00000000008dd63d in pmd_thread_main (f_=<optimized out>) at
lib/dpif-netdev.c:5449
#4 0x000000000095e94d in ovsthread_wrapper (aux_=<optimized out>) at
lib/ovs-thread.c:352
#5 0x00007f39d86de2de in start_thread () from /lib64/libpthread.so.0
#6 0x00007f39d7e83a63 in clone () from /lib64/libc.so.6
(gdb) p xsk->xsk
Cannot access memory at address 0x316f6ebd
(gdb) p xsk
$1 = (struct xsk_socket_info *) 0x316f6e65
I will work on the next version.
Thank you
William
<SNIP>