Hi,

Sorry for the late response.

The missed packets are counted by an internal NIC counter attached to the Rx queues. mlx5 can use two different approaches to create an RxQ - either via the rdma-core API (also known as "Verbs"), or via direct FW calls (through kernel thunks, also known as "DevX"). Which way the PMD chooses depends mostly on the dv_flow_en devarg: with dv_flow_en=0, the legacy "Verbs" way is engaged. When an RxQ is created with Verbs, the kernel can automatically attach the internal "out-of-buffer" counter to the queue being created. With the DevX approach, the PMD must take explicit care of this - allocate the counter and attach the queues to it.

If mlx5_devx_cmd_queue_counter_query() (used for DevX) works for you, try to force this way with dv_flow_en=1 (IIRC, this is the default since 20.02). I'm not sure whether the kernel requires the RAWIO permission for this call. As for the sysfs entry access (used for Verbs) - it is unlikely to be changed.

With best regards,
Slava

> -----Original Message-----
> From: Daniel Östman <[email protected]>
> Sent: Wednesday, November 8, 2023 2:56 PM
> To: Maxime Coquelin <[email protected]>; Erez Ferber
> <[email protected]>; Slava Ovsiienko <[email protected]>
> Cc: [email protected]; Matan Azrad <[email protected]>;
> [email protected]
> Subject: RE: mlx5: imissed / out_of_buffer counter always 0
>
> Hi,
>
> Any input from Nvidia on this? Matan perhaps?
> The question here is if it's expected to require capability SYS_RAWIO just to
> get the out of buffer counter?
> If so, any plans on changing that?
>
> Best regards,
> Daniel
>
> > -----Original Message-----
> > From: Maxime Coquelin <[email protected]>
> > Sent: Wednesday, 4 October 2023 15:49
> > To: Daniel Östman <[email protected]>; Erez Ferber
> > <[email protected]>; Slava Ovsiienko <[email protected]>
> > Cc: [email protected]; Matan Azrad <[email protected]>;
> > [email protected]
> > Subject: Re: mlx5: imissed / out_of_buffer counter always 0
> >
> > Hi Daniel, Erez & Slava,
> >
> > My time to be sorry, I missed this email when coming back from vacation.
> >
> > On 8/18/23 14:04, Daniel Östman wrote:
> > > Hi Maxime,
> > >
> > > Sorry for the late reply, I've been on vacation.
> > > Please see my answer below.
> > >
> > > / Daniel
> > >
> > >> -----Original Message-----
> > >> From: Maxime Coquelin <[email protected]>
> > >> Sent: Thursday, 22 June 2023 17:48
> > >> To: Daniel Östman <[email protected]>; Erez Ferber
> > >> <[email protected]>; Slava Ovsiienko <[email protected]>
> > >> Cc: [email protected]; Matan Azrad <[email protected]>;
> > >> [email protected]
> > >> Subject: Re: mlx5: imissed / out_of_buffer counter always 0
> > >>
> > >> Hi,
> > >>
> > >> On 6/21/23 22:22, Maxime Coquelin wrote:
> > >>> Hi Daniel, all,
> > >>>
> > >>> On 6/5/23 16:00, Daniel Östman wrote:
> > >>>> Hi Slava and Erez and thanks for your answers,
> > >>>>
> > >>>> Regarding the firmware, I've also deployed in a different
> > >>>> OpenShift cluster where I see the exact same issue but with a
> > >>>> different Mellanox NIC:
> > >>>>
> > >>>> Mellanox Technologies MT2892 Family - ConnectX-6 DX 2-port 100GbE
> > >>>> QSFP56 PCIe Adapter
> > >>>>
> > >>>> driver: mlx5_core
> > >>>> version: 5.0-0
> > >>>> firmware-version: 22.36.1010 (DEL0000000027)
> > >>>>
> > >>>> From what I can see the firmware is relatively new on that one?
> >
> > >>> With below configuration:
> > >>> - ConnectX-6 Dx MT2892
> > >>> - Kernel: 6.4.0-rc6
> > >>> - FW version: 22.35.1012 (MT_0000000528)
> > >>>
> > >>> The out-of-buffer counter is fetched via
> > >>> mlx5_devx_cmd_queue_counter_query():
> > >>>
> > >>> [pid 2942] ioctl(17, RDMA_VERBS_IOCTL, 0x7ffcb15bcd10) = 0
> > >>> [pid 2942] write(1, "\n  ######################## NIC "..., 80) = 80
> > >>> [pid 2942] write(1, "  RX-packets: 630997736  RX-miss"..., 70) = 70
> > >>> [pid 2942] write(1, "  RX-errors: 0\n", 15) = 15
> > >>> [pid 2942] write(1, "  RX-nombuf: 0 \n", 25) = 25
> > >>> [pid 2942] write(1, "  TX-packets: 0  TX-erro"..., 60) = 60
> > >>> [pid 2942] write(1, "\n", 1) = 1
> > >>> [pid 2942] write(1, "  Throughput (since last show)\n", 31) = 31
> > >>> [pid 2942] write(1, "  Rx-pps: 0 "..., 106) = 106
> > >>> [pid 2942] write(1, "  ##############################"..., 79) = 79
> > >>>
> > >>> It looks like we may miss some mlx5 kernel patches so that we can
> > >>> use mlx5_devx_cmd_queue_counter_query() with RHEL?
> > >>>
> > >>> Erez, Slava, any idea on the patches that could be missing?
> > >>
> > >> Above test was on baremetal as root, I get the same "working"
> > >> behaviour on RHEL as root.
> > >>
> > >> We managed to reproduce Daniel's issue by running the same within a
> > >> container, enabling debug logs we have this warning:
> > >>
> > >> mlx5_common: DevX create q counter set failed errno=121 status=0x2
> > >> syndrome=0x8975f1
> > >> mlx5_net: Port 0 queue counter object cannot be created by DevX -
> > >> fall-back to use the kernel driver global queue counter.
> > >>
> > >> Running the container as privileged solves the issue, and so does
> > >> adding the SYS_RAWIO capability to the container.
> > >>
> > >> Erez, Slava, is that expected to require SYS_RAWIO just to get a
> > >> stat counter?
> >
> > Erez & Slava, could it be possible to get the stats counters via devx
> > without requiring SYS_RAWIO?
> >
> > >>
> > >> Daniel, could you try adding SYS_RAWIO to your pod to confirm you
> > >> face the same issue?
> > >
> > > Yes I can confirm what you are seeing when running in a cluster with
> > > Openshift 4.12 (RHEL 8.6) and with SYS_RAWIO or running as privileged.
> > > But with privileged container I also need to run with UID 0 for it
> > > to work, is that what you are doing as well?
> >
> > I don't have an OCP setup at hand right now to test it, but IIRC yes
> > we ran it with UID 0.
> >
> > > In both these cases the counter can be successfully retrieved
> > > through the DevX interface.
> >
> > Ok.
> >
> > > However, when running in a cluster with Openshift 4.10 (RHEL 8.4) I
> > > can not get it to work with any of these two approaches.
> >
> > I'm not sure this is Kernel related, as I tested on both RHEL-8.4.0
> > and latest RHEL-8.4 and I can get the q counters via ioctl().
> >
> > Maxime
> >
> > >> Thanks in advance,
> > >> Maxime
> > >>
> > >>> Regards,
> > >>> Maxime
> > >>>
> > >>>>
> > >>>> I tried setting dv_flow_en=0 (and saw that it was propagated to
> > >>>> config->dv_flow_en) but it didn't seem to help.
> > >>>>
> > >>>> Erez, I'm not sure what you mean by shared or non-shared mode in
> > >>>> this case, however it seems it could be related to the fact that
> > >>>> the container is running in a separate network namespace. Because
> > >>>> the hw_counter directory is available on the host (cluster node),
> > >>>> but not in the pod container.
> > >>>>
> > >>>> Best regards,
> > >>>>
> > >>>> Daniel
> > >>>>
> > >>>> *From:* Erez Ferber <[email protected]>
> > >>>> *Sent:* Monday, 5 June 2023 12:29
> > >>>> *To:* Slava Ovsiienko <[email protected]>
> > >>>> *Cc:* Daniel Östman <[email protected]>; [email protected];
> > >>>> Matan Azrad <[email protected]>; [email protected];
> > >>>> [email protected]
> > >>>> *Subject:* Re: mlx5: imissed / out_of_buffer counter always 0
> > >>>>
> > >>>> Hi Daniel,
> > >>>>
> > >>>> is the container running in shared or non-shared mode ?
> > >>>>
> > >>>> For shared mode, I assume the kernel sysfs counters which DPDK
> > >>>> relies on for imissed/out_of_buffer are not exposed.
> > >>>>
> > >>>> Best regards,
> > >>>>
> > >>>> Erez
> > >>>>
> > >>>> On Fri, 2 Jun 2023 at 18:07, Slava Ovsiienko
> > >>>> <[email protected] <mailto:[email protected]>> wrote:
> > >>>>
> > >>>>     Hi, Daniel
> > >>>>
> > >>>>     I would recommend to take the following action:
> > >>>>
> > >>>>     - update the firmware, 16.33.xxxx looks to be outdated a little bit.
> > >>>>       Please, try 16.35.1012 or later.
> > >>>>       mlx5_glue->devx_obj_create might succeed with the newer FW.
> > >>>>
> > >>>>     - try to specify the dv_flow_en=0 devarg, it forces mlx5 PMD to use
> > >>>>       the rdma_core library for queue management, and the kernel driver
> > >>>>       will be aware of the Rx queues being created and attach them to
> > >>>>       the kernel counter set
> > >>>>
> > >>>>     With best regards,
> > >>>>     Slava
> > >>>>
> > >>>> *From:* Daniel Östman <[email protected] <mailto:[email protected]>>
> > >>>> *Sent:* Friday, June 2, 2023 3:59 PM
> > >>>> *To:* [email protected] <mailto:[email protected]>
> > >>>> *Cc:* Matan Azrad <[email protected] <mailto:[email protected]>>;
> > >>>> Slava Ovsiienko <[email protected] <mailto:[email protected]>>;
> > >>>> [email protected] <mailto:[email protected]>;
> > >>>> [email protected] <mailto:[email protected]>
> > >>>> *Subject:* mlx5: imissed / out_of_buffer counter always 0
> > >>>>
> > >>>> Hi,
> > >>>>
> > >>>> I'm deploying a containerized DPDK application in an OpenShift
> > >>>> Kubernetes environment using DPDK 21.11.3.
> > >>>>
> > >>>> The application uses a Mellanox ConnectX-5 100G NIC through VFs.
> > >>>>
> > >>>> The problem I have is that the ETH stats counter imissed (which
> > >>>> seems to be mapped to "out_of_buffer" internally in the mlx5 PMD
> > >>>> driver) is 0 when I don't expect it to be, i.e. when the application
> > >>>> doesn't read the packets fast enough.
> > >>>>
> > >>>> Using GDB I can see that it tries to access the counter through
> > >>>> /sys/class/infiniband/mlx5_99/ports/1/hw_counters/out_of_buffer
> > >>>> but the hw_counters directory is missing so it will just return
> > >>>> a zero value. I don't know why it is missing.
> > >>>>
> > >>>> When looking at mlx5_os_read_dev_stat() I can see that there is an
> > >>>> alternative way of reading the counter, through
> > >>>> mlx5_devx_cmd_queue_counter_query(), but under the condition that
> > >>>> priv->q_counters are set.
> > >>>>
> > >>>> It doesn't get set in my case because mlx5_glue->devx_obj_create()
> > >>>> fails (errno 22) in mlx5_devx_cmd_queue_counter_alloc().
> > >>>>
> > >>>> Have I missed something?
> > >>>>
> > >>>> NIC info:
> > >>>>
> > >>>> Mellanox Technologies MT27800 Family [ConnectX-5] - 100Gb 2-port
> > >>>> QSFP28 MCX516A-CCHT
> > >>>> driver: mlx5_core
> > >>>> version: 5.0-0
> > >>>> firmware-version: 16.33.1048 (MT_0000000417)
> > >>>>
> > >>>> Please let me know if I need to provide more information.
> > >>>>
> > >>>> Best regards,
> > >>>>
> > >>>> Daniel
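[Editor's note: for readers following this thread, the dv_flow_en devarg discussed above is passed per device via the EAL allow-list. A minimal sketch; the PCI address 0000:3b:00.0 and the device name mlx5_0 are placeholders for your own VF and IB device:

```shell
# Force the DevX path (dv_flow_en=1, the default since DPDK 20.02)
# so the PMD allocates its own queue counter object:
dpdk-testpmd -a 0000:3b:00.0,dv_flow_en=1 -- --stats-period 1

# Or force the legacy Verbs path, where the kernel attaches the
# out-of-buffer counter to the Rx queues automatically:
dpdk-testpmd -a 0000:3b:00.0,dv_flow_en=0 -- --stats-period 1

# With the Verbs path, the PMD reads the counter from sysfs
# (this is the entry Daniel found missing inside his container):
cat /sys/class/infiniband/mlx5_0/ports/1/hw_counters/out_of_buffer
```

With dv_flow_en=1 the counter is instead fetched through mlx5_devx_cmd_queue_counter_query(), which is the path that failed without SYS_RAWIO in the container runs above.]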

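[Editor's note: for anyone hitting the same DevX failure in Kubernetes, the workaround found in this thread (SYS_RAWIO plus UID 0, rather than a fully privileged pod) can be expressed in the pod spec. A sketch only; the pod, container, and image names are placeholders:

```shell
# Grant just the capability the DevX q-counter path needed in this thread,
# instead of running the whole pod privileged.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: dpdk-app                          # placeholder
spec:
  containers:
  - name: dpdk                            # placeholder
    image: registry.example.com/dpdk-app  # placeholder
    securityContext:
      runAsUser: 0                        # thread reports UID 0 was also required
      capabilities:
        add: ["SYS_RAWIO"]
EOF
```

Whether SYS_RAWIO will remain a hard requirement for reading this counter via DevX is exactly the open question of the thread.]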