On 20.08.2019 14:19, Eelco Chaudron wrote:
>
>
> On 20 Aug 2019, at 12:10, Ilya Maximets wrote:
>
>> On 14.08.2019 19:16, William Tu wrote:
>>> On Wed, Aug 14, 2019 at 7:58 AM William Tu <u9012...@gmail.com> wrote:
>>>>
>>>> On Wed, Aug 14, 2019 at 5:09 AM Eelco Chaudron <echau...@redhat.com> wrote:
>>>>>
>>>>> On 8 Aug 2019, at 17:38, Ilya Maximets wrote:
>>>>>
>>>>> <SNIP>
>>>>>
>>>>>>>>> I see a rather high number of afxdp_cq_skip, which should, to my
>>>>>>>>> knowledge, never happen?
>>>>>>>>
>>>>>>>> I tried to investigate this previously, but didn't find anything
>>>>>>>> suspicious, so to my knowledge this should never happen either.
>>>>>>>> However, I only looked at the code without actually running it,
>>>>>>>> because I had no HW available for testing.
>>>>>>>>
>>>>>>>> While investigating and stress-testing virtual ports I found a few
>>>>>>>> issues with missing locking inside the kernel, so I have little
>>>>>>>> trust in the kernel part of the XDP implementation. I suspect there
>>>>>>>> are other bugs in the kernel/libbpf that can only be reproduced in
>>>>>>>> driver mode.
>>>>>>>>
>>>>>>>> This never happens for virtual ports in SKB mode, so I never saw
>>>>>>>> this coverage counter being non-zero.
>>>>>>>
>>>>>>> Did some quick debugging, as something else has come up that needs
>>>>>>> my attention :)
>>>>>>>
>>>>>>> But once I’m in a faulty state and send a single packet, causing
>>>>>>> afxdp_complete_tx() to be called, it tells me 2048 descriptors are
>>>>>>> ready, which is XSK_RING_PROD__DEFAULT_NUM_DESCS. So I guess there
>>>>>>> might be some ring management bug. Maybe consumer and producer are
>>>>>>> equal, meaning 0 buffers, but it returns the max? I did not look at
>>>>>>> the kernel code, so this is just a wild guess :)
>>>>>>>
>>>>>>> (gdb) p tx_done
>>>>>>> $3 = 2048
>>>>>>>
>>>>>>> (gdb) p umem->cq
>>>>>>> $4 = {cached_prod = 3830466864, cached_cons = 3578066899, mask = 2047,
>>>>>>>       size = 2048, producer = 0x7f08486b8000, consumer = 0x7f08486b8040,
>>>>>>>       ring = 0x7f08486b8080}
>>>>>>
>>>>>> Thanks for debugging!
>>>>>>
>>>>>> xsk_ring_cons__peek() just returns the difference between cached_prod
>>>>>> and cached_cons, but these values are too far apart:
>>>>>>
>>>>>>     3830466864 - 3578066899 = 252399965
>>>>>>
>>>>>> Since this value is larger than the requested count, it returns the
>>>>>> requested number (2048).
>>>>>>
>>>>>> So the ring is broken, or at least its 'cached' part is. It would be
>>>>>> good to look at the *consumer and *producer values to verify the
>>>>>> state of the actual ring.
>>>>>
>>>>> I’ll try to find some more time next week to debug further.
>>>>>
>>>>> William, I noticed your email on the xdp-newbies list where you
>>>>> mention this problem of getting the wrong pointers. Did you ever
>>>>> follow up, or do further troubleshooting on the above?
>>>>
>>>> Yes, I posted here:
>>>> https://www.spinics.net/lists/xdp-newbies/msg00956.html
>>>> "Question/Bug about AF_XDP idx_cq from xsk_ring_cons__peek?"
>>>>
>>>> At that time I was thinking about reproducing the problem with the
>>>> xdpsock sample code from the kernel, but it turned out that my
>>>> reproduction code was not correct, so it could not show the case we
>>>> hit here in OVS.
>>>>
>>>> I then ported more of the OVS code logic into xdpsock, but the problem
>>>> did not show up. As a result, I worked around it by marking the addr
>>>> as "*addr == UINT64_MAX".
>>>>
>>>> I will debug again this week once I get my testbed back.
>>>>
>>> Just to refresh my memory, the original issue is the following. When
>>> calling:
>>>
>>>     tx_done = xsk_ring_cons__peek(&umem->cq, CONS_NUM_DESCS, &idx_cq);
>>>     xsk_ring_cons__release(&umem->cq, tx_done);
>>>
>>> I expect there to be 'tx_done' elems on the CQ to recycle back to the
>>> memory pool. However, when I inspect these elems, I find some that were
>>> 'already' reported complete the last time I called xsk_ring_cons__peek.
>>> In other words, some elems show up on the CQ twice, and this overflows
>>> the mempool.
>>>
>>> Thus, I mark the elems on the CQ as UINT64_MAX to indicate that we have
>>> already seen them.
>>
>> William, Eelco, which HW NIC are you using? Which kernel driver?
>
> I’m using the below on the latest bpf-next driver:
>
> 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+
> Network Connection (rev 01)
> 01:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+
> Network Connection (rev 01)
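For reference, the arithmetic that xsk_ring_cons__peek() performs on the
cached indices can be modeled roughly as below. This is only a simplified
sketch of the behaviour described above, not the actual libbpf code (which
also refreshes cached_prod from the shared producer pointer and issues the
necessary memory barriers); peek_available() is a made-up name:

    #include <stdint.h>
    #include <stdio.h>

    /* Number of entries the consumer sees: the unsigned difference between
     * the cached producer and consumer indices, capped to the requested
     * batch size. */
    static uint32_t
    peek_available(uint32_t cached_prod, uint32_t cached_cons,
                   uint32_t requested)
    {
        uint32_t entries = cached_prod - cached_cons;   /* wraps mod 2^32 */

        return entries > requested ? requested : entries;
    }

    int
    main(void)
    {
        /* Values from the gdb dump above: the difference is 252399965,
         * far more than the ring size of 2048, so peek reports the full
         * requested batch even though the cached state is corrupted. */
        printf("%u\n", peek_available(3830466864u, 3578066899u, 2048));
        return 0;
    }

So whenever the cached indices get out of sync, peek keeps reporting full
batches, which would match the duplicate CQ entries William describes above.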
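And this is roughly what the UINT64_MAX workaround on the completion queue
looks like. It's a sketch loosely based on afxdp_complete_tx() in
netdev-afxdp.c; umem_elem_push() is only a placeholder for the OVS umem
pool helper, and the way the element pointer is derived from the completed
address is simplified compared to the real code:

    #include <stdint.h>
    #include <bpf/xsk.h>

    #define CONS_NUM_DESCS 2048

    /* Placeholder standing in for the OVS umem element pool push. */
    void umem_elem_push(void *mpool, void *elem);

    /* Recycle completed TX descriptors from the CQ back to the memory
     * pool, skipping entries that were already handed back on a previous
     * peek.  Such duplicates are what the afxdp_cq_skip counter counts. */
    unsigned int
    recycle_completed_tx(struct xsk_ring_cons *cq, char *umem_buffer,
                         void *mpool)
    {
        uint32_t idx_cq = 0;
        unsigned int tx_done, i, freed = 0;

        tx_done = xsk_ring_cons__peek(cq, CONS_NUM_DESCS, &idx_cq);

        for (i = 0; i < tx_done; i++) {
            /* Cast away const so the slot can be marked as consumed. */
            __u64 *addr = (__u64 *) xsk_ring_cons__comp_addr(cq, idx_cq++);

            if (*addr == UINT64_MAX) {
                /* Already recycled on a previous peek: skip it instead of
                 * pushing the same element into the mempool twice. */
                continue;
            }

            umem_elem_push(mpool, umem_buffer + *addr);
            *addr = UINT64_MAX;        /* Mark as recycled. */
            freed++;
        }

        xsk_ring_cons__release(cq, tx_done);
        return freed;
    }

This hides the symptom (the mempool no longer overflows), but the duplicate
completions themselves still point at a ring state problem.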
Thanks for the information. I found one suspicious place inside the ixgbe
driver that could break the completion queue ring and prepared a patch:

    https://patchwork.ozlabs.org/patch/1150244/

It would be good if you could test it.

Best regards, Ilya Maximets.