Hi Slava and Erez, and thanks for your answers,

Regarding the firmware, I’ve also deployed in a different OpenShift cluster 
where I see the exact same issue, but with a different Mellanox NIC:

Mellanox Technologies MT2892 Family - ConnectX-6 DX 2-port 100GbE QSFP56 PCIe 
Adapter
driver: mlx5_core
version: 5.0-0
firmware-version: 22.36.1010 (DEL0000000027)

From what I can see the firmware is relatively new on that one?

I tried setting dv_flow_en=0 (and saw that it was propagated to 
config->dv_flow_en) but it didn’t seem to help.

Erez, I’m not sure what you mean by shared or non-shared mode in this case. 
However, it seems it could be related to the fact that the container is running 
in a separate network namespace: the hw_counters directory is available on the 
host (cluster node), but not in the pod container.
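
To make the host/pod comparison repeatable, here is a minimal sketch that 
lists which RDMA ports expose hw_counters/out_of_buffer. It assumes the usual 
/sys/class/infiniband/<device>/ports/<port>/ layout; the function name is only 
illustrative. Running it both on the node and inside the pod should show the 
difference.

```python
from pathlib import Path


def find_out_of_buffer(sysfs_root="/sys/class/infiniband"):
    """Map 'device/port' to the out_of_buffer value, or None if not visible."""
    results = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return results  # no RDMA devices visible from this namespace
    for dev in sorted(root.iterdir()):
        ports = dev / "ports"
        if not ports.is_dir():
            continue
        for port in sorted(ports.iterdir()):
            counter = port / "hw_counters" / "out_of_buffer"
            key = f"{dev.name}/{port.name}"
            try:
                results[key] = int(counter.read_text().strip())
            except (OSError, ValueError):
                # hw_counters directory or counter file is missing/unreadable
                results[key] = None
    return results


if __name__ == "__main__":
    for key, value in find_out_of_buffer().items():
        print(key, "missing" if value is None else value)
```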

Best regards,
Daniel

From: Erez Ferber <[email protected]>
Sent: Monday, 5 June 2023 12:29
To: Slava Ovsiienko <[email protected]>
Cc: Daniel Östman <[email protected]>; [email protected]; Matan Azrad 
<[email protected]>; [email protected]; [email protected]
Subject: Re: mlx5: imissed / out_of_buffer counter always 0

Hi Daniel,

Is the container running in shared or non-shared mode?
For shared mode, I assume the kernel sysfs counters which DPDK relies on for 
imissed/out_of_buffer are not exposed.

Best regards,
Erez

On Fri, 2 Jun 2023 at 18:07, Slava Ovsiienko <[email protected]> wrote:
Hi, Daniel

I would recommend the following actions:
- Update the firmware; 16.33.xxxx looks a little outdated. Please try 
  16.35.1012 or later. mlx5_glue->devx_obj_create might succeed with the 
  newer FW.
- Try specifying the dv_flow_en=0 devarg. It forces the mlx5 PMD to use the 
  rdma-core library for queue management, so the kernel driver will be aware 
  of the Rx queues being created and attach them to the kernel counter set.
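
For reference, a devarg is appended to the device's PCI address in the EAL -a 
(allow) option. The sketch below only builds that argument string; the PCI 
address is a hypothetical placeholder and the helper name is illustrative, so 
adapt both to the actual setup.

```python
def mlx5_allow_arg(pci_addr, **devargs):
    """Build the EAL '-a' option for one device with its devargs appended."""
    parts = [pci_addr] + [f"{key}={value}" for key, value in devargs.items()]
    return ["-a", ",".join(parts)]


PCI_ADDR = "0000:3b:00.2"  # hypothetical VF address, replace with the real one

cmd = ["dpdk-testpmd"] + mlx5_allow_arg(PCI_ADDR, dv_flow_en=0) + ["--", "-i"]
print(" ".join(cmd))
# prints: dpdk-testpmd -a 0000:3b:00.2,dv_flow_en=0 -- -i
```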

With best regards,
Slava

From: Daniel Östman <[email protected]>
Sent: Friday, June 2, 2023 3:59 PM
To: [email protected]
Cc: Matan Azrad <[email protected]>; Slava Ovsiienko <[email protected]>; 
[email protected]; [email protected]
Subject: mlx5: imissed / out_of_buffer counter always 0

Hi,

I’m deploying a containerized DPDK application in an OpenShift Kubernetes 
environment using DPDK 21.11.3.
The application uses a Mellanox ConnectX-5 100G NIC through VFs.

The problem I have is that the ETH stats counter imissed (which seems to be 
mapped to “out_of_buffer” internally in the mlx5 PMD) is 0 when I don’t 
expect it to be, i.e. when the application doesn’t read the packets fast enough.

Using GDB I can see that it tries to access the counter through 
/sys/class/infiniband/mlx5_99/ports/1/hw_counters/out_of_buffer, but the 
hw_counters directory is missing, so it just returns a zero value. I don’t 
know why it is missing.
When looking at mlx5_os_read_dev_stat() I can see that there is an alternative 
way of reading the counter, through mlx5_devx_cmd_queue_counter_query(), but 
only under the condition that priv->q_counters is set.
It doesn’t get set in my case because mlx5_glue->devx_obj_create() fails (errno 
22, EINVAL) in mlx5_devx_cmd_queue_counter_alloc().
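
To summarize the behaviour I am describing, here is a simplified model of the 
two read paths. It is a sketch based only on what I observed in GDB, not the 
actual PMD code; the function and parameter names are made up for 
illustration.

```python
from pathlib import Path


def read_out_of_buffer(q_counter_query, sysfs_counter_path):
    """Simplified model of the two read paths described above.

    q_counter_query: callable returning the counter value, or None when no
                     queue counter object was allocated (priv->q_counters
                     not set because the DevX object creation failed).
    sysfs_counter_path: path to .../hw_counters/out_of_buffer.
    """
    if q_counter_query is not None:
        return q_counter_query()  # DevX queue counter path
    try:
        return int(Path(sysfs_counter_path).read_text().strip())  # sysfs path
    except (OSError, ValueError):
        return 0  # file missing or unreadable: the stat silently reads as 0
```

With both paths unavailable, as in my pod, the model returns 0 regardless of 
how many packets were actually dropped.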

Have I missed something?

NIC info:
Mellanox Technologies MT27800 Family [ConnectX-5] - 100Gb 2-port QSFP28 
MCX516A-CCHT
driver: mlx5_core
version: 5.0-0
firmware-version: 16.33.1048 (MT_0000000417)

Please let me know if I need to provide more information.

Best regards,
Daniel
