Hi William,

See below parts of an offline email discussion I had with Magnus a while back, plus some research I did in the end, which explains why, by design, you might not see all the descriptors become ready right away.
Hope this helps change your design…

In addition, the Point-to-Point test is working with your change; however, the PVP test is still failing due to buffer starvation (see my comments on the v8 patch for a possible cause).

Also, on OVS restart the system crashes in the following place:

#0  netdev_afxdp_rxq_recv (rxq_=0x173c080, batch=0x7fe1397f80d0, qfill=0x0) at lib/netdev-afxdp.c:583
#1  0x0000000000907f21 in netdev_rxq_recv (rx=<optimized out>, batch=batch@entry=0x7fe1397f80d0, qfill=<optimized out>) at lib/netdev.c:710
#2  0x00000000008dd1c3 in dp_netdev_process_rxq_port (pmd=pmd@entry=0x175d990, rxq=0x175a460, port_no=2) at lib/dpif-netdev.c:4257
#3  0x00000000008dd63d in pmd_thread_main (f_=<optimized out>) at lib/dpif-netdev.c:5449
#4  0x000000000095e94d in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:352
#5  0x00007fe1633872de in start_thread () from /lib64/libpthread.so.0
#6  0x00007fe162b2ca63 in clone () from /lib64/libc.so.6


Cheers,

Eelco


On 2019-05-22 15:20, Eelco Chaudron wrote:
Hi Magnus, et al.,

I was working on the AF_XDP tutorial example, and even with a single RX_BATCH_SIZE I was not getting woken up for every packet, so I decided to get to the bottom of this :)

Looking at the kernel RX side, this is what happens:

  xsk_generic_rcv();
     xskq_peek_addr(xs->umem->fq, &addr);
     ...
     xskq_discard_addr(xs->umem->fq);


Look at:

148 static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
149  {
150      if (q->cons_tail == q->cons_head) {
151          smp_mb(); /* D, matches A */
152          WRITE_ONCE(q->ring->consumer, q->cons_tail);
153          q->cons_head = q->cons_tail + xskq_nb_avail(q, RX_BATCH_SIZE);
154
155          /* Order consumer and data */
156          smp_rmb();
157      }
158
159      return xskq_validate_addr(q, addr);
160  }

So ring->consumer only gets updated here, when we need some additional buffers (we take up to RX_BATCH_SIZE, i.e. 16, at a time).

Once we are done processing a single packet, we do not update ring->consumer, only the tail:

162  static inline void xskq_discard_addr(struct xsk_queue *q)
163  {
164      q->cons_tail++;
165  }

Which means we only release the consumer's slots every 16th packet…
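
To make this concrete, here is an illustrative walkthrough (my own numbers, assuming RX_BATCH_SIZE == 16, a fill ring that userland keeps topped off, and packets arriving one at a time):

    pkt 1:  xskq_peek_addr()    cons_tail == cons_head, so it publishes
                                ring->consumer = 0 and sets cons_head = 16
            xskq_discard_addr() cons_tail = 1
    pkt 2:  xskq_discard_addr() cons_tail = 2   (ring->consumer still 0)
    ...
    pkt 16: xskq_discard_addr() cons_tail = 16  (ring->consumer still 0)
    pkt 17: cons_tail == cons_head again, so xskq_peek_addr() finally
            publishes ring->consumer = 16, releasing 16 slots at once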

Now looking at the xdpsock_user.c code:

503  static void rx_drop(struct xsk_socket_info *xsk)
504  {
505      unsigned int rcvd, i;
506      u32 idx_rx = 0, idx_fq = 0;
507      int ret;
508
509      rcvd = xsk_ring_cons__peek(&xsk->rx, BATCH_SIZE, &idx_rx);
510      if (!rcvd)
511          return;
512
513      ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
514      while (ret != rcvd) {
515          if (ret < 0)
516              exit_with_error(-ret);
517          ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);

If we “send a single ping only”, just one packet gets received, but because the kernel does not update the consumer pointer on every received packet, the fill ring still appears full, and we end up spinning here until line 152 above in xskq_peek_addr() finally publishes the consumer pointer.

518      }
519
520      for (i = 0; i < rcvd; i++) {
521          u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
522          u32 len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++)->len;
523          char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
524
525          hex_dump(pkt, len, addr);
526          *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = addr;
527      }
528
529      xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
530      xsk_ring_cons__release(&xsk->rx, rcvd);
531      xsk->rx_npkts += rcvd;
532  }

Assuming that updating the consumer pointer only every RX_BATCH_SIZE packets was a design decision, this means the example app needs some fixing.
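
For example, something along the following lines would avoid blocking on the kernel (a rough, untested sketch in the style of the sample; free_list_push()/free_list_pop()/free_list_count() are hypothetical helpers that park umem addresses locally until the fill ring has room):

    static void rx_drop_robust(struct xsk_socket_info *xsk)
    {
        unsigned int rcvd, fill, i;
        u32 idx_rx = 0, idx_fq = 0;

        rcvd = xsk_ring_cons__peek(&xsk->rx, BATCH_SIZE, &idx_rx);
        if (!rcvd)
            return;

        for (i = 0; i < rcvd; i++) {
            u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
            u32 len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++)->len;
            char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);

            hex_dump(pkt, len, addr);
            free_list_push(xsk, addr);      /* hypothetical helper */
        }
        xsk_ring_cons__release(&xsk->rx, rcvd);

        /* Only post as many fill entries as there is room for right
         * now; leftovers stay on the local free list until the kernel
         * releases the fill ring's consumer pointer. */
        fill = free_list_count(xsk);        /* hypothetical helper */
        if (xsk_prod_nb_free(&xsk->umem->fq, fill) < fill)
            fill = xsk_prod_nb_free(&xsk->umem->fq, fill);

        if (fill &&
            xsk_ring_prod__reserve(&xsk->umem->fq, fill, &idx_fq) == fill) {
            for (i = 0; i < fill; i++)
                *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) =
                    free_list_pop(xsk);     /* hypothetical helper */
            xsk_ring_prod__submit(&xsk->umem->fq, fill);
        }

        xsk->rx_npkts += rcvd;
    }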


What do you guys think?


Ok, picking up this thread!

Yes, the kernel only updates the *consumer* pointer of the fill ring
every RX_BATCH_SIZE packets, but the Rx ring's *producer* pointer is bumped
every time.

This means that if you rely on the (naive :-)) code in the sample
application, you can end up in a situation where you can receive from the
Rx ring, but not post to the fill ring.

So, the reason for the 16-packet hiccup is as follows:

1. Userland: The fill ring is completely filled.
2. Kernel: One packet is received, one entry is picked from the fill ring,
   but the consumer pointer is not bumped, and the packet is placed on the
   Rx ring.
3. Userland: One packet is picked from the Rx ring.
4. Userland: Tries to put an entry on the fill ring. The fill ring is full,
   so userland spins.
5. Kernel: When 16 packets have been picked from the fill ring, the
   consumer pointer is released.
6. Userland: Exits the while loop.

We could make the sample more robust, but then again, it's a sample! :-)

Just wanted to get to the bottom of this, as Magnus was not able to reproduce it in his setup.
I guess his network is noisier and gets more than a single packet in…

I’m working on an AF_XDP example for the xdp-tutorial project and will fix it there. It’s just that people seem to blindly copy examples…

Also, it might be useful to have an API to find out how many slots are available, e.g. xsk_ring_prod__free(struct xsk_ring_prod *prod); this way we could see how many descriptors we can add to top off the queue. There is xsk_prod_nb_free() in the file, but it does not seem to be an official API (or are we OK to use it)?

Let me know what you think, and I can send a patch for xsk_ring_prod__free().
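
For illustration, the wrapper I have in mind would be little more than this (a sketch, assuming xsk_prod_nb_free() keeps its current behaviour of refreshing the cached consumer pointer when more entries are requested than are cached):

    static inline __u32 xsk_ring_prod__free(struct xsk_ring_prod *prod)
    {
        /* Asking for the full ring size forces xsk_prod_nb_free() to
         * re-read the real consumer pointer, giving an up-to-date count
         * of how many descriptors we can still post. */
        return xsk_prod_nb_free(prod, prod->size);
    }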


I'd say use it; it's part of the xsk.h file, but inlined, so it's not
versioned...

Cheers,
Björn



On 15 Jun 2019, at 15:33, William Tu wrote:

Hi Eelco,

I think it's either a bug in the kernel or my misunderstanding of how to
process the xsk cq ring. I posted the issue here:
https://marc.info/?l=xdp-newbies&m=156055471727857&w=2

Applying this on top of v11 fixes my crash.
Do you mind testing it again on your system?
Thanks for your time and all the trial and error.

diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
index a6543e8f5126..800c047b71f9 100644
--- a/lib/netdev-afxdp.c
+++ b/lib/netdev-afxdp.c
@@ -705,6 +705,7 @@ afxdp_complete_tx(struct xsk_socket_info *xsk)
     struct umem_elem *elems_push[BATCH_SIZE];
     uint32_t idx_cq = 0;
     int tx_done, j, ret;
+    int tx = 0;

     if (!xsk->outstanding_tx) {
         return;
@@ -717,23 +718,29 @@ afxdp_complete_tx(struct xsk_socket_info *xsk)
     }

     tx_done = xsk_ring_cons__peek(&xsk->umem->cq, BATCH_SIZE, &idx_cq);
-    if (tx_done > 0) {
-        xsk_ring_cons__release(&xsk->umem->cq, tx_done);
-        xsk->outstanding_tx -= tx_done;
-    }

     /* Recycle back to umem pool */
     for (j = 0; j < tx_done; j++) {
         struct umem_elem *elem;
-        uint64_t addr;
+        uint64_t *addr;

-        addr = *xsk_ring_cons__comp_addr(&xsk->umem->cq, idx_cq++);
+        addr = xsk_ring_cons__comp_addr(&xsk->umem->cq, idx_cq++);
+        if (*addr == 0) {
+            continue;
+        }
         elem = ALIGNED_CAST(struct umem_elem *,
-                            (char *)xsk->umem->buffer + addr);
+                            (char *)xsk->umem->buffer + *addr);
-        elems_push[j] = elem;
+        elems_push[tx] = elem;
+        *addr = 0;
+        tx++;
     }

-    umem_elem_push_n(&xsk->umem->mpool, tx_done, (void **)elems_push);
+    umem_elem_push_n(&xsk->umem->mpool, tx, (void **)elems_push);
+
+    if (tx_done > 0) {
+        xsk_ring_cons__release(&xsk->umem->cq, tx_done);
+        xsk->outstanding_tx -= tx_done;
+    }
 }

 int

On Thu, Jun 13, 2019 at 12:17 AM Eelco Chaudron <[email protected]> wrote:

On 13 Jun 2019, at 2:37, William Tu wrote:

Hi Eelco,

#0  0x00007fbc6a78193f in raise () from /lib64/libc.so.6
#1  0x00007fbc6a76bc95 in abort () from /lib64/libc.so.6
#2  0x00000000004ed1a1 in __umem_elem_push_n (addrs=0x7fbc40f2ec50, n=32, umemp=0x24cc790) at lib/xdpsock.c:32
#3  umem_elem_push_n (umemp=0x24cc790, n=32,

I've found that it's due to freeing the AF_XDP buffer twice.
free_afxdp_buf() should be called once per dp_packet, but somehow
it gets called twice.
Applying this on top of v11 fixes the issue:
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -145,7 +145,7 @@ dp_packet_uninit(struct dp_packet *b)
             free_dpdk_buf((struct dp_packet*) b);
 #endif
         } else if (b->source == DPBUF_AFXDP) {
-            free_afxdp_buf(b);
+            ;
         }
     }
 }
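
(If we end up keeping the free, a more defensive variant might also be possible — just a sketch, using the existing dp_packet_base()/dp_packet_set_base() accessors; the NULL-base guard itself is hypothetical:)

    } else if (b->source == DPBUF_AFXDP) {
        /* Hypothetical guard: clear the base pointer on the first
         * release so that a second dp_packet_uninit() is a no-op. */
        if (dp_packet_base(b)) {
            free_afxdp_buf(b);
            dp_packet_set_base(b, NULL);
        }
    }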

It’s still crashing for me, even with the change. I’m using a single NIC, with the same port in as out:

@@ -122,6 +144,8 @@ dp_packet_uninit(struct dp_packet *b)
              * created as a dp_packet */
             free_dpdk_buf((struct dp_packet*) b);
 #endif
+        } else if (b->source == DPBUF_AFXDP) {
+//            free_afxdp_buf(b);
         }
     }
 }

Core was generated by `ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfi'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f62ef71193f in raise () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7f62bdf33700 (LWP 24737))]
Missing separate debuginfos, use: dnf debuginfo-install elfutils-libelf-0.174-6.el8.x86_64 glibc-2.28-42.el8_0.1.x86_64 libatomic-8.2.1-3.5.el8.x86_64 libcap-ng-0.7.9-4.el8.x86_64 numactl-libs-2.0.12-2.el8.x86_64 openssl-libs-1.1.1-8.el8.x86_64 zlib-1.2.11-10.el8.x86_64
(gdb) bt
#0  0x00007f62ef71193f in raise () from /lib64/libc.so.6
#1  0x00007f62ef6fbc95 in abort () from /lib64/libc.so.6
#2  0x00000000004ed1a1 in __umem_elem_push_n (addrs=0x7f62bdf30c50, n=32, umemp=0x2378660) at lib/xdpsock.c:32
#3  umem_elem_push_n (umemp=0x2378660, n=32, addrs=addrs@entry=0x7f62bdf30ea0) at lib/xdpsock.c:43
#4  0x00000000009b4f41 in afxdp_complete_tx (xsk=0x2378250) at lib/netdev-afxdp.c:736
#5  netdev_afxdp_batch_send (netdev=<optimized out>, qid=0, batch=0x7f62a4004e80, concurrent_txq=<optimized out>) at lib/netdev-afxdp.c:763
#6  0x0000000000908031 in netdev_send (netdev=<optimized out>, qid=qid@entry=0, batch=batch@entry=0x7f62a4004e80, concurrent_txq=concurrent_txq@entry=true) at lib/netdev.c:800
#7  0x00000000008d4c24 in dp_netdev_pmd_flush_output_on_port (pmd=pmd@entry=0x7f62bdf34010, p=p@entry=0x7f62a4004e50) at lib/dpif-netdev.c:4187
#8  0x00000000008d4f3f in dp_netdev_pmd_flush_output_packets (pmd=pmd@entry=0x7f62bdf34010, force=force@entry=false) at lib/dpif-netdev.c:4227
#9  0x00000000008dd2d7 in dp_netdev_pmd_flush_output_packets (force=false, pmd=0x7f62bdf34010) at lib/dpif-netdev.c:4282
#10 dp_netdev_process_rxq_port (pmd=pmd@entry=0x7f62bdf34010, rxq=0x237e650, port_no=1) at lib/dpif-netdev.c:4282
#11 0x00000000008dd63d in pmd_thread_main (f_=<optimized out>) at lib/dpif-netdev.c:5449
#12 0x000000000095e94d in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:352
#13 0x00007f62f00312de in start_thread () from /lib64/libpthread.so.0
#14 0x00007f62ef7d6a63 in clone () from /lib64/libc.so.6

Core was generated by `ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfi'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 netdev_afxdp_rxq_recv (rxq_=0x2bd75d0, batch=0x7f399f7fc0d0, qfill=0x0) at lib/netdev-afxdp.c:583
583        rx->fd = xsk_socket__fd(xsk->xsk);
[Current thread is 1 (Thread 0x7f399f7fe700 (LWP 24597))]
Missing separate debuginfos, use: dnf debuginfo-install elfutils-libelf-0.174-6.el8.x86_64 glibc-2.28-42.el8_0.1.x86_64 libatomic-8.2.1-3.5.el8.x86_64 libcap-ng-0.7.9-4.el8.x86_64 numactl-libs-2.0.12-2.el8.x86_64 openssl-libs-1.1.1-8.el8.x86_64 zlib-1.2.11-10.el8.x86_64
(gdb) bt
#0  netdev_afxdp_rxq_recv (rxq_=0x2bd75d0, batch=0x7f399f7fc0d0, qfill=0x0) at lib/netdev-afxdp.c:583
#1  0x0000000000907f21 in netdev_rxq_recv (rx=<optimized out>, batch=batch@entry=0x7f399f7fc0d0, qfill=<optimized out>) at lib/netdev.c:710
#2  0x00000000008dd1c3 in dp_netdev_process_rxq_port (pmd=pmd@entry=0x2bf80c0, rxq=0x2bd63e0, port_no=2) at lib/dpif-netdev.c:4257
#3  0x00000000008dd63d in pmd_thread_main (f_=<optimized out>) at lib/dpif-netdev.c:5449
#4  0x000000000095e94d in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:352
#5  0x00007f39d86de2de in start_thread () from /lib64/libpthread.so.0
#6  0x00007f39d7e83a63 in clone () from /lib64/libc.so.6
(gdb) p xsk->xsk
Cannot access memory at address 0x316f6ebd
(gdb) p xsk
$1 = (struct xsk_socket_info *) 0x316f6e65

I will work on the next version.
Thank you,
William

<SNIP>
