Hello, I am having a problem with the HPE Ethernet 100Gb 2-port 841QSFP28 Adapter, which is a Mellanox adapter for 100G networks.
The DPDK driver generates a lot of error report files, such as dpdk_mlx5_port_0_rxq_0_2459159054, and loses traffic (IMHO because it has to reset the card). The first lines of the error report files are as follows:

Unexpected CQE error syndrome 0x22 CQN = 1030 RQN = 12582977 wqe_counter = 10040 rq_ci = 494774062 cq_ci = 3586794130
Unexpected CQE error syndrome 0x22 CQN = 1030 RQN = 12582977 wqe_counter = 27509 rq_ci = 808774458 cq_ci = 1527072213
Unexpected CQE error syndrome 0x0e CQN = 1030 RQN = 12582977 wqe_counter = 0 rq_ci = 32768 cq_ci = 2413356687
Unexpected CQE error syndrome 0xd4 CQN = 1030 RQN = 12582977 wqe_counter = 0 rq_ci = 32768 cq_ci = 1527072220
Unexpected CQE error syndrome 0x22 CQN = 1030 RQN = 12582977 wqe_counter = 60345 rq_ci = 242051992 cq_ci = 1769091515
Unexpected CQE error syndrome 0x22 CQN = 1030 RQN = 12582977 wqe_counter = 1138 rq_ci = 619349053 cq_ci = 3152294540
Unexpected CQE error syndrome 0xa0 CQN = 1030 RQN = 12582977 wqe_counter = 0 rq_ci = 32768 cq_ci = 897769578
Unexpected CQE error syndrome 0xf1 CQN = 1030 RQN = 12582977 wqe_counter = 0 rq_ci = 32768 cq_ci = 1769091529
Unexpected CQE error syndrome 0x75 CQN = 1030 RQN = 12582977 wqe_counter = 0 rq_ci = 32768 cq_ci = 3152294549
Unexpected CQE error syndrome 0x22 CQN = 1030 RQN = 12582977 wqe_counter = 64529 rq_ci = 763919355 cq_ci = 2532978162
Unexpected CQE error syndrome 0x22 CQN = 1030 RQN = 12582977 wqe_counter = 5267 rq_ci = 678728828 cq_ci = 3092052802
Unexpected CQE error syndrome 0x22 CQN = 1030 RQN = 12582977 wqe_counter = 46035 rq_ci = 3556062128 cq_ci = 2413356673
Unexpected CQE error syndrome 0x73 CQN = 1030 RQN = 12582977 wqe_counter = 0 rq_ci = 32768 cq_ci = 2532978172
Unexpected CQE error syndrome 0x40 CQN = 1030 RQN = 12582977 wqe_counter = 0 rq_ci = 32768 cq_ci = 3092052808

I have tried the latest HP firmware for the card and enabling/disabling CQE compression in the mlx5 DPDK driver with rxq_cqe_comp_en=0/1, but there was no improvement.
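In case it helps with triage, here is a small sketch for tallying the syndromes across many report files. It only assumes the first-line format quoted above; the function names (parse_cqe_line, tally_syndromes) are my own and not part of DPDK, and the sample holds just a few of the lines from this post.

```python
import re
from collections import Counter

# Matches the first line of a report file, in the format quoted in this post.
CQE_LINE = re.compile(
    r"Unexpected CQE error syndrome (0x[0-9a-fA-F]+) "
    r"CQN = (\d+) RQN = (\d+) wqe_counter = (\d+) "
    r"rq_ci = (\d+) cq_ci = (\d+)"
)

FIELDS = ("syndrome", "cqn", "rqn", "wqe_counter", "rq_ci", "cq_ci")


def parse_cqe_line(line):
    """Return the fields of one error line as a dict of ints, or None."""
    m = CQE_LINE.search(line)
    if m is None:
        return None
    values = [int(m.group(1), 16)] + [int(g) for g in m.groups()[1:]]
    return dict(zip(FIELDS, values))


def tally_syndromes(lines):
    """Count how often each syndrome value occurs in the given lines."""
    counts = Counter()
    for line in lines:
        rec = parse_cqe_line(line)
        if rec is not None:
            counts[rec["syndrome"]] += 1
    return counts


# A few of the lines from this post, used as sample input.
SAMPLE = [
    "Unexpected CQE error syndrome 0x22 CQN = 1030 RQN = 12582977 "
    "wqe_counter = 10040 rq_ci = 494774062 cq_ci = 3586794130",
    "Unexpected CQE error syndrome 0x22 CQN = 1030 RQN = 12582977 "
    "wqe_counter = 27509 rq_ci = 808774458 cq_ci = 1527072213",
    "Unexpected CQE error syndrome 0x0e CQN = 1030 RQN = 12582977 "
    "wqe_counter = 0 rq_ci = 32768 cq_ci = 2413356687",
]

if __name__ == "__main__":
    for syndrome, n in tally_syndromes(SAMPLE).most_common():
        print(f"syndrome 0x{syndrome:02x}: {n}")
```

Of the 14 lines quoted above, 7 carry syndrome 0x22, and every other syndrome appears with wqe_counter = 0 and rq_ci = 32768. I do not know whether that pattern is significant.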
Does anybody know what the problem could be and how to mitigate it?

Thanks,
Pavel Krauz
