On 18.06.2019 12:45, Eelco Chaudron wrote:
> 
> 
> On 17 Jun 2019, at 22:32, William Tu wrote:
> 
>> On Mon, Jun 17, 2019 at 11:23 AM William Tu <u9012...@gmail.com> wrote:
>>>
>>> Hi Eelco,
>>>
>>> On Mon, Jun 17, 2019 at 3:12 AM Eelco Chaudron <echau...@redhat.com> wrote:
>>>>
>>>> Hi William,
>>>>
>>>> See below parts of an offline email discussion I had with Magnus before,
>>>> and some research I did in the end, which explains that by design you
>>>> might not get all the descriptors ready.
>>>
>>> I think these are different issues. The behavior you described is a hiccup
>>> waiting for 16 rx packets to be queued. Here, in afxdp_complete_tx,
>>> xsk_ring_cons__peek returns descs that have already been released, causing
>>> OVS to push more elems and thus crash.
>>>
>>>> Hope this helps change your design…
>>>>
>>>> In addition, the Point to Point test is working with your change,
>>>> however, the PVP test is still failing due to buffer starvation (see my
>>>> comments in Patchv8 for a possible cause).
>>>>
>>> Thanks, looking back at v8
>>> https://patchwork.ozlabs.org/patch/1097740/
>>> Hopefully next version will fix this issue.
>>>
>>>> Also on OVS restart system crashes in the following part:
>>>>
>>>> #0  netdev_afxdp_rxq_recv (rxq_=0x173c080, batch=0x7fe1397f80d0,
>>>> qfill=0x0) at lib/netdev-afxdp.c:583
>>>> #1  0x0000000000907f21 in netdev_rxq_recv (rx=<optimized out>,
>>>> batch=batch@entry=0x7fe1397f80d0, qfill=<optimized out>) at
>>>> lib/netdev.c:710
>>>> #2  0x00000000008dd1c3 in dp_netdev_process_rxq_port
>>>> (pmd=pmd@entry=0x175d990, rxq=0x175a460, port_no=2) at
>>>> lib/dpif-netdev.c:4257
>>>> #3  0x00000000008dd63d in pmd_thread_main (f_=<optimized out>) at
>>>> lib/dpif-netdev.c:5449
>>>> #4  0x000000000095e94d in ovsthread_wrapper (aux_=<optimized out>) at
>>>> lib/ovs-thread.c:352
>>>> #5  0x00007fe1633872de in start_thread () from /lib64/libpthread.so.0
>>>> #6  0x00007fe162b2ca63 in clone () from /lib64/libc.so.6
>>>>
>>> How do you restart the system? I have two afxdp ports:
>>>         Port "eth3"
>>>             Interface "eth3"
>>>                 type: afxdp
>>>                 options: {n_rxq="1", xdpmode=drv}
>>>         Port "eth5"
>>>             Interface "eth5"
>>>                 type: afxdp
>>>                 options: {n_rxq="1", xdpmode=drv}
>>>
>>> I tested using
>>> # ovs-vsctl del-port eth3
>>> # ovs-vsctl del-port eth5
>>> # ovs-vsctl del-br br0
>>> # ovs-appctl -t ovs-vswitchd exit
>>> Looks ok.
>>>
>>> <snip>
>>>
>>>>> This means that if you rely on (the naive :-)) code in the sample
>>>>> application, you can end up in a situation where you can receive from
>>>>> the Rx ring, but not post to the fill ring.
>>>>>
>>>>> So, the reason for the 16 packet hiccup is as follows:
>>>>>
>>>>> 1. Userland: The fill ring is completely filled.
>>>>> 2. Kernel: One packet is received, one entry is picked from the fill
>>>>>    ring, but the consumer pointer is not bumped, and the packet is
>>>>>    placed on the Rx ring.
>>>>> 3. Userland: One packet is picked from the Rx ring.
>>>>> 4. Userland: Tries to put an entry on the fill ring. The fill ring is
>>>>>    full, so userland spins.
>>>>> 5. Kernel: When 16 packets have been picked from the fill ring the
>>>>>    consumer ptr is released.
>>>>> 6. Userland: Exits the while loop.
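
The six steps above can be sketched as a toy model. This is only an
illustration under stated assumptions: the struct, the ring size of 32 and
all function names are hypothetical, and the batch of 16 is taken from the
description above rather than from the kernel source.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 32u      /* hypothetical fill ring size */
#define RELEASE_BATCH 16u  /* assumed: consumer index is published every 16 entries */

/* Toy model of the fill ring indices; not the real kernel or libbpf layout. */
struct fill_ring_model {
    uint32_t producer;        /* advanced by userland */
    uint32_t consumer;        /* consumer index as visible to userland */
    uint32_t kernel_consumed; /* entries the kernel has actually taken */
};

/* Free slots as userland sees them. */
static uint32_t
model_free_slots(const struct fill_ring_model *r)
{
    return RING_SIZE - (r->producer - r->consumer);
}

/* The kernel takes one entry, but only publishes its consumer index
 * once a whole batch has been taken (steps 2 and 5). */
static void
model_kernel_take(struct fill_ring_model *r)
{
    r->kernel_consumed++;
    if (r->kernel_consumed - r->consumer >= RELEASE_BATCH) {
        r->consumer = r->kernel_consumed;
    }
}

/* Starting from a completely full ring (step 1), returns how many packets
 * must be received before userland sees a free slot (step 6). */
static uint32_t
model_packets_until_free(void)
{
    struct fill_ring_model r = {
        .producer = RING_SIZE, .consumer = 0, .kernel_consumed = 0
    };
    uint32_t packets = 0;

    while (model_free_slots(&r) == 0) {
        model_kernel_take(&r);
        packets++;
    }
    return packets;
}
```

In this model a userland loop that spins on a full fill ring only gets out
once a whole batch of packets has arrived, which matches the hiccup
described above.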
>>>
>>> Based on the above, there is no starvation problem here if there are more
>>> than 16 packets, correct? And at step 4, we can skip spinning and try to
>>> process more rx ring.
>>>
>>> For the next version, I will first check the fill ring using
>>> xsk_prod_nb_free(), to avoid step 4.
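
A minimal sketch of the "check free space first, do not spin" idea, using a
toy ring instead of the real libbpf structures. Everything here is an
assumption for illustration: toy_fill_ring, toy_free_slots() and
toy_refill() are hypothetical names, and toy_free_slots() only plays the
role that xsk_prod_nb_free() would play; it does not mirror its signature.

```c
#include <assert.h>
#include <stdint.h>

#define FILL_RING_SIZE 8u  /* hypothetical size, must be a power of two */

/* Toy single-producer ring standing in for the AF_XDP fill ring. */
struct toy_fill_ring {
    uint32_t producer;              /* advanced by userland */
    uint32_t consumer;              /* as last published by the consumer */
    uint64_t addrs[FILL_RING_SIZE]; /* buffer addresses posted to the ring */
};

/* Number of free slots visible to the producer. */
static uint32_t
toy_free_slots(const struct toy_fill_ring *r)
{
    return FILL_RING_SIZE - (r->producer - r->consumer);
}

/* Posts up to n buffer addresses without spinning.  Returns the number
 * actually posted; the caller keeps the rest and retries on a later pass,
 * after it has processed more of the rx ring. */
static uint32_t
toy_refill(struct toy_fill_ring *r, const uint64_t *addrs, uint32_t n)
{
    uint32_t room = toy_free_slots(r);
    uint32_t todo = n < room ? n : room;

    for (uint32_t i = 0; i < todo; i++) {
        r->addrs[(r->producer + i) & (FILL_RING_SIZE - 1)] = addrs[i];
    }
    r->producer += todo;
    return todo;
}
```

With this shape the receive path never blocks on the fill ring: a short
return value just means the leftover buffers stay in the caller's pool until
the consumer index moves.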
>>>
>>> Thanks
>>> William
>>
>> Hi Eelco,
>>
>> I have some fixes with commit "prepare for v12" at
>> https://github.com/williamtu/ovs-ebpf/commits/afxdp-v11
>>
>> I tested PVP and it works ok (using tap and also veth namespaces)
>> Can you give it a try?
> 
> The PVP test seems to work fine however after a while it stops forwarding:
> 
> $ ovs-ofctl dump-flows ovs_pvp_br0
>  cookie=0x0, duration=8.510s, table=0, n_packets=1, n_bytes=1020, 
> in_port=eno1 actions=output:tapVM
>  cookie=0x0, duration=8.504s, table=0, n_packets=1, n_bytes=252, 
> in_port=tapVM actions=output:eno1
> 
> Results:
> 
> "Physical port, ""eno1"", speed 10 Gbit/s, traffic rate 100%"
> "Physical to Virtual to Physical test, L3 flows[port redirect]"
> ,Packet size
> Number of flows,64,256,1024
> 10,13448,131687,0
> 100,596,0,0
> 1000,596,0,0
> 
> This is rather low compared to the kernel results below. Note that the
> above is using a single queue:
> 
> "Physical port, ""eno1"", speed 10 Gbit/s, traffic rate 100%"
> "Physical to Virtual to Physical test, L3 flows[port redirect]"
> ,Packet size
> Number of flows,64,256,1024
> 10,502411,451579,421558
> 100,525439,440637,422051
> 1000,463875,419996,402010
> 
> However, I cannot restart OVS (see the other email on how I restart). Even
> if I clear the XDP programs before a restart, it fails and dumps core.
> The only way to recover is to reboot the box and start from scratch:
> 
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x00007f455919a9b5 in xsk_clear_bpf_maps (xsk=0x21) at xsk.c:462
> 462        bpf_map_update_elem(xsk->qidconf_map_fd, &xsk->queue_id, &qid, 0);
> [Current thread is 1 (Thread 0x7f4559f1c000 (LWP 4898))]
> Missing separate debuginfos, use: dnf debuginfo-install 
> elfutils-libelf-0.174-6.el8.x86_64 glibc-2.28-42.el8_0.1.x86_64 
> libatomic-8.2.1-3.5.el8.x86_64 libcap-ng-0.7.9-4.el8.x86_64 
> numactl-libs-2.0.12-2.el8.x86_64 openssl-libs-1.1.1-8.el8.x86_64 
> zlib-1.2.11-10.el8.x86_64
> (gdb) bt
> #0  0x00007f455919a9b5 in xsk_clear_bpf_maps (xsk=0x21) at xsk.c:462
> #1  0x00007f455919b278 in xsk_socket__delete (xsk=0x21) at xsk.c:711
> #2  0x00000000009b3af1 in xsk_destroy (xsk_info=<optimized out>) at 
> lib/netdev-afxdp.c:313
> #3  xsk_destroy_all (netdev=0x1df49a0) at lib/netdev-afxdp.c:313
> #4  0x00000000009b4fe9 in netdev_afxdp_destruct (netdev_=0x1df49a0) at 
> lib/netdev-afxdp.c:845
> #5  0x0000000000906e53 in netdev_unref (dev=0x1df49a0) at lib/netdev.c:573
> #6  0x00000000008739b1 in iface_do_create (errp=0x7ffe4fc5b588, 
> netdevp=0x7ffe4fc5b580, ofp_portp=0x7ffe4fc5b578, iface_cfg=0x1cde5d0, 
> br=0x1ce1690) at vswitchd/bridge.c:1825
> #7  iface_create (port_cfg=0x1cb3690, iface_cfg=0x1cde5d0, br=0x1ce1690) at 
> vswitchd/bridge.c:1848
> #8  bridge_add_ports__ (br=br@entry=0x1ce1690, 
> wanted_ports=wanted_ports@entry=0x1ce1770, 
> with_requested_port=with_requested_port@entry=false) at vswitchd/bridge.c:936
> #9  0x0000000000875ef7 in bridge_add_ports (wanted_ports=0x1ce1770, 
> br=0x1ce1690) at vswitchd/bridge.c:952
> #10 bridge_reconfigure (ovs_cfg=ovs_cfg@entry=0x1cb4b90) at 
> vswitchd/bridge.c:666
> #11 0x0000000000879521 in bridge_run () at vswitchd/bridge.c:3043
> #12 0x00000000004ef545 in main (argc=<optimized out>, argv=<optimized out>) 
> at vswitchd/ovs-vswitchd.c:127
> 
> Jun 18 03:52:06 wsfd-netdev76.ntdv.lab.eng.bos.redhat.com ovs-vswitchd[5861]: 
> ovs|00051|netdev_afxdp|ERR|xsk_socket__create failed (Device or resource 
> busy) mode: SKB qid: 0
> Jun 18 03:52:06 wsfd-netdev76.ntdv.lab.eng.bos.redhat.com ovs-vswitchd[5861]: 
> ovs|00052|netdev_afxdp|ERR|failed to create AF_XDP socket on queue 0
> Jun 18 03:52:06 wsfd-netdev76.ntdv.lab.eng.bos.redhat.com ovs-vswitchd[5861]: 
> ovs|00055|netdev_afxdp|ERR|AF_XDP device tapVM reconfig fails
> Jun 18 03:52:06 wsfd-netdev76.ntdv.lab.eng.bos.redhat.com ovs-vswitchd[5861]: 
> ovs|00056|dpif_netdev|ERR|Failed to set interface tapVM new configuration
> Jun 18 03:52:06 wsfd-netdev76.ntdv.lab.eng.bos.redhat.com ovs-vswitchd[5861]: 
> ovs|00062|netdev_afxdp|ERR|xsk_socket__create failed (Device or resource 
> busy) mode: DRV qid: 0
> Jun 18 03:52:06 wsfd-netdev76.ntdv.lab.eng.bos.redhat.com ovs-vswitchd[5861]: 
> ovs|00063|netdev_afxdp|ERR|failed to create AF_XDP socket on queue 0
> Jun 18 03:52:06 wsfd-netdev76.ntdv.lab.eng.bos.redhat.com ovs-vswitchd[5861]: 
> ovs|00066|netdev_afxdp|ERR|AF_XDP device eno1 reconfig fails
> Jun 18 03:52:06 wsfd-netdev76.ntdv.lab.eng.bos.redhat.com ovs-vswitchd[5861]: 
> ovs|00067|dpif_netdev|ERR|Failed to set interface eno1 new configuration
> Jun 18 03:52:06 wsfd-netdev76.ntdv.lab.eng.bos.redhat.com kernel: 
> ovs-vswitchd[5861]: segfault at 123 ip 00000000009b3afd sp 00007ffff954a770 
> error 4 in ovs-vswitchd[400000+899000]
> 

I guess this crash is caused by trying to destroy an unallocated queue.

The following change could help:
---
diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
index a6543e8f5..6e1431dce 100644
--- a/lib/netdev-afxdp.c
+++ b/lib/netdev-afxdp.c
@@ -249,7 +249,7 @@ xsk_configure_all(struct netdev *netdev)
     ifindex = linux_get_ifindex(netdev_get_name(netdev));
 
     n_rxq = netdev_n_rxq(netdev);
-    dev->xsks = xmalloc(n_rxq * sizeof(struct xsk_socket_info *));
+    dev->xsks = xzalloc(n_rxq * sizeof(struct xsk_socket_info *));
 
     /* configure each queue */
     for (i = 0; i < n_rxq; i++) {
---

This should prevent OVS from crashing; however, I don't know why socket
creation fails in your case.
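
To illustrate why zeroing the array matters, here is a minimal
self-contained sketch. It is not the OVS code: struct sock_info,
configure_all() and destroy_all() are hypothetical stand-ins, calloc() plays
the role of xzalloc(), and the sketch assumes the destroy path skips NULL
entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct xsk_socket_info. */
struct sock_info {
    int qid;
};

/* Allocates a zeroed array of n socket pointers and sets up queues in
 * order.  Setup stops at queue 'fail_at', simulating a failing socket
 * creation, so slots from 'fail_at' onward stay NULL. */
static struct sock_info **
configure_all(size_t n, size_t fail_at)
{
    struct sock_info **socks = calloc(n, sizeof *socks);

    if (!socks) {
        abort();
    }
    for (size_t i = 0; i < n; i++) {
        if (i == fail_at) {
            break;              /* simulated creation failure */
        }
        socks[i] = malloc(sizeof *socks[i]);
        if (!socks[i]) {
            abort();
        }
        socks[i]->qid = (int) i;
    }
    return socks;
}

/* Destroys only the slots that were really set up and returns how many
 * that was.  With a non-zeroed array the NULL check would be useless,
 * because unset slots would hold garbage pointers. */
static size_t
destroy_all(struct sock_info **socks, size_t n)
{
    size_t destroyed = 0;

    for (size_t i = 0; i < n; i++) {
        if (socks[i]) {
            free(socks[i]);
            socks[i] = NULL;
            destroyed++;
        }
    }
    free(socks);
    return destroyed;
}
```

A garbage pointer such as the xsk=0x21 seen in the backtrace is what an
uninitialized slot can look like when the cleanup loop reaches it.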

Best regards, Ilya Maximets.