Hi Daniel, all,

On 6/5/23 16:00, Daniel Östman wrote:
Hi Slava and Erez and thanks for your answers,

Regarding the firmware, I’ve also deployed in a different OpenShift cluster where I see the exact same issue, but with a different Mellanox NIC:

Mellanox Technologies MT2892 Family - ConnectX-6 DX 2-port 100GbE QSFP56 PCIe Adapter
driver: mlx5_core
version: 5.0-0
firmware-version: 22.36.1010 (DEL0000000027)

From what I can see, the firmware is relatively new on that one?

With the configuration below:
- ConnectX-6 Dx MT2892
- Kernel: 6.4.0-rc6
- FW version: 22.35.1012 (MT_0000000528)

The out-of-buffer counter is fetched via mlx5_devx_cmd_queue_counter_query():

[pid  2942] ioctl(17, RDMA_VERBS_IOCTL, 0x7ffcb15bcd10) = 0
[pid  2942] write(1, "\n  ######################## NIC "..., 80) = 80
[pid  2942] write(1, "  RX-packets: 630997736  RX-miss"..., 70) = 70
[pid  2942] write(1, "  RX-errors: 0\n", 15) = 15
[pid  2942] write(1, "  RX-nombuf:  0         \n", 25) = 25
[pid  2942] write(1, "  TX-packets: 0          TX-erro"..., 60) = 60
[pid  2942] write(1, "\n", 1)           = 1
[pid  2942] write(1, "  Throughput (since last show)\n", 31) = 31
[pid  2942] write(1, "  Rx-pps:            0          "..., 106) = 106
[pid  2942] write(1, "  ##############################"..., 79) = 79

It looks like we may be missing some mlx5 kernel patches that are needed to use mlx5_devx_cmd_queue_counter_query() with RHEL?

Erez, Slava, any idea which patches could be missing?

Regards,
Maxime


I tried setting dv_flow_en=0 (and saw that it was propagated to config->dv_flow_en) but it didn’t seem to help.
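
For reference, a device argument such as dv_flow_en is appended to the device's entry in the EAL allow list (-a). Below is a minimal sketch of building that argument string; the helper name format_allow_arg and the PCI address used further down are hypothetical and only for illustration.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format "<pci_addr>,dv_flow_en=<value>" for use as
 * the argument of the EAL -a (allow) option.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int format_allow_arg(char *buf, size_t len,
                            const char *pci_addr, int dv_flow_en)
{
    int n = snprintf(buf, len, "%s,dv_flow_en=%d", pci_addr, dv_flow_en);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With a placeholder address of 0000:03:00.0, the resulting string would be passed as, for example, dpdk-testpmd -a 0000:03:00.0,dv_flow_en=0 -- -i.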

Erez, I’m not sure what you mean by shared or non-shared mode in this case. However, it seems it could be related to the fact that the container is running in a separate network namespace: the hw_counters directory is available on the host (cluster node), but not in the pod container.
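
To compare host and pod, it can help to read the counter file directly and tell "file missing" apart from "counter is really 0". This is a minimal standalone sketch, not the PMD's code; the function names read_hw_counter and hw_counter_path are hypothetical, and the path layout is the one seen in GDB in my first mail.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Build the per-port hw_counters path for an IB device, e.g.
 * /sys/class/infiniband/mlx5_99/ports/1/hw_counters/out_of_buffer
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int hw_counter_path(char *buf, size_t len, const char *ibdev,
                           unsigned int port, const char *name)
{
    int n = snprintf(buf, len,
                     "/sys/class/infiniband/%s/ports/%u/hw_counters/%s",
                     ibdev, port, name);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Read one unsigned decimal counter from a sysfs-style text file.
 * Returns 0 and stores the value on success. Returns -1 with errno set
 * otherwise: errno comes from fopen() when the file cannot be opened
 * (ENOENT if the directory or file does not exist), and is set to
 * EINVAL when the content is not a number.
 */
static int read_hw_counter(const char *path, uint64_t *value)
{
    FILE *f = fopen(path, "r");
    int n;

    if (f == NULL)
        return -1;
    n = fscanf(f, "%" SCNu64, value);
    fclose(f);
    if (n != 1) {
        errno = EINVAL;
        return -1;
    }
    return 0;
}
```

Run on the host and inside the pod with the same device name, a return of -1 with errno == ENOENT in the pod only would confirm that the directory is hidden from the container rather than the counter being 0.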

Best regards,

Daniel

*From:* Erez Ferber <[email protected]>
*Sent:* Monday, 5 June 2023 12:29
*To:* Slava Ovsiienko <[email protected]>
*Cc:* Daniel Östman <[email protected]>; [email protected]; Matan Azrad <[email protected]>; [email protected]; [email protected]
*Subject:* Re: mlx5: imissed / out_of_buffer counter always 0

Hi Daniel,

Is the container running in shared or non-shared mode?

For shared mode, I assume the kernel sysfs counters which DPDK relies on for imissed/out_of_buffer are not exposed.

Best regards,

Erez

On Fri, 2 Jun 2023 at 18:07, Slava Ovsiienko <[email protected]> wrote:

    Hi, Daniel

    I would recommend the following actions:

    - Update the firmware; 16.33.xxxx looks a little outdated.
      Please try 16.35.1012 or later.
      mlx5_glue->devx_obj_create might succeed with the newer FW.

    - Try specifying the dv_flow_en=0 devarg. It forces the mlx5 PMD to
      use the rdma_core library for queue management, so the kernel
      driver will be aware of the Rx queues being created and attach
      them to the kernel counter set.

    With best regards,
    Slava

    *From:* Daniel Östman <[email protected]>
    *Sent:* Friday, June 2, 2023 3:59 PM
    *To:* [email protected]
    *Cc:* Matan Azrad <[email protected]>; Slava Ovsiienko <[email protected]>; [email protected]; [email protected]
    *Subject:* mlx5: imissed / out_of_buffer counter always 0

    Hi,

    I’m deploying a containerized DPDK application in an OpenShift
    Kubernetes environment using DPDK 21.11.3.

    The application uses a Mellanox ConnectX-5 100G NIC through VFs.

    The problem I have is that the ETH stats counter imissed (which
    seems to be mapped to “out_of_buffer” internally in mlx5 PMD driver)
    is 0 when I don’t expect it to be, i.e. when the application doesn’t
    read the packets fast enough.

    Using GDB I can see that it tries to access the counter through
    /sys/class/infiniband/mlx5_99/ports/1/hw_counters/out_of_buffer but
    the hw_counters directory is missing so it will just return a zero
    value. I don’t know why it is missing.

    When looking at mlx5_os_read_dev_stat() I can see that there is an
    alternative way of reading the counter, through
    mlx5_devx_cmd_queue_counter_query() but under the condition that
    priv->q_counters are set.

    It doesn’t get set in my case because mlx5_glue->devx_obj_create()
    fails (errno 22) in mlx5_devx_cmd_queue_counter_alloc().

    Have I missed something?

    NIC info:

    Mellanox Technologies MT27800 Family [ConnectX-5] - 100Gb 2-port
    QSFP28 MCX516A-CCHT
    driver: mlx5_core
    version: 5.0-0
    firmware-version: 16.33.1048 (MT_0000000417)

    Please let me know if I need to provide more information.

    Best regards,

    Daniel

