Hello,
I am having a problem with the HPE Ethernet 100Gb 2-port 841QSFP28 Adapter,
which is a Mellanox adapter for 100G networks.

The DPDK driver generates a lot of error report files with names like
dpdk_mlx5_port_0_rxq_0_2459159054, and it loses traffic (IMHO because it has
to reset the card).

The first lines of the error report files are as follows:

Unexpected CQE error syndrome 0x22 CQN = 1030 RQN = 12582977 wqe_counter = 10040 rq_ci = 494774062 cq_ci = 3586794130
Unexpected CQE error syndrome 0x22 CQN = 1030 RQN = 12582977 wqe_counter = 27509 rq_ci = 808774458 cq_ci = 1527072213
Unexpected CQE error syndrome 0x0e CQN = 1030 RQN = 12582977 wqe_counter = 0 rq_ci = 32768 cq_ci = 2413356687
Unexpected CQE error syndrome 0xd4 CQN = 1030 RQN = 12582977 wqe_counter = 0 rq_ci = 32768 cq_ci = 1527072220
Unexpected CQE error syndrome 0x22 CQN = 1030 RQN = 12582977 wqe_counter = 60345 rq_ci = 242051992 cq_ci = 1769091515
Unexpected CQE error syndrome 0x22 CQN = 1030 RQN = 12582977 wqe_counter = 1138 rq_ci = 619349053 cq_ci = 3152294540
Unexpected CQE error syndrome 0xa0 CQN = 1030 RQN = 12582977 wqe_counter = 0 rq_ci = 32768 cq_ci = 897769578
Unexpected CQE error syndrome 0xf1 CQN = 1030 RQN = 12582977 wqe_counter = 0 rq_ci = 32768 cq_ci = 1769091529
Unexpected CQE error syndrome 0x75 CQN = 1030 RQN = 12582977 wqe_counter = 0 rq_ci = 32768 cq_ci = 3152294549
Unexpected CQE error syndrome 0x22 CQN = 1030 RQN = 12582977 wqe_counter = 64529 rq_ci = 763919355 cq_ci = 2532978162
Unexpected CQE error syndrome 0x22 CQN = 1030 RQN = 12582977 wqe_counter = 5267 rq_ci = 678728828 cq_ci = 3092052802
Unexpected CQE error syndrome 0x22 CQN = 1030 RQN = 12582977 wqe_counter = 46035 rq_ci = 3556062128 cq_ci = 2413356673
Unexpected CQE error syndrome 0x73 CQN = 1030 RQN = 12582977 wqe_counter = 0 rq_ci = 32768 cq_ci = 2532978172
Unexpected CQE error syndrome 0x40 CQN = 1030 RQN = 12582977 wqe_counter = 0 rq_ci = 32768 cq_ci = 3092052808
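
In case it helps anyone compare reports, here is a minimal Python sketch that
tallies the entries above by syndrome. The field layout is taken only from the
lines quoted in this message; the names parse_cqe_errors and
syndrome_histogram are hypothetical helpers, not part of DPDK, and SAMPLE is
just three of the entries above.

```python
import re
from collections import Counter

# Field layout copied from the report lines quoted in this message.
# Whitespace is matched with \s so entries wrapped by a mail client still parse.
CQE_ERROR_RE = re.compile(
    r"Unexpected\s+CQE\s+error\s+syndrome\s+(0x[0-9a-fA-F]+)\s+"
    r"CQN\s*=\s*(\d+)\s+RQN\s*=\s*(\d+)\s+wqe_counter\s*=\s*(\d+)\s+"
    r"rq_ci\s*=\s*(\d+)\s+cq_ci\s*=\s*(\d+)"
)
FIELDS = ("cqn", "rqn", "wqe_counter", "rq_ci", "cq_ci")

# Three of the entries quoted above, including the mail line wraps.
SAMPLE = """\
Unexpected CQE error syndrome 0x22 CQN = 1030 RQN = 12582977 wqe_counter =
10040 rq_ci = 494774062 cq_ci = 3586794130
Unexpected CQE error syndrome 0x0e CQN = 1030 RQN = 12582977 wqe_counter = 0
rq_ci = 32768 cq_ci = 2413356687
Unexpected CQE error syndrome 0x22 CQN = 1030 RQN = 12582977 wqe_counter = 1138
rq_ci = 619349053 cq_ci = 3152294540
"""


def parse_cqe_errors(text):
    """Return one dict per CQE error entry found in text (hypothetical helper)."""
    entries = []
    for match in CQE_ERROR_RE.finditer(text):
        entry = {"syndrome": match.group(1).lower()}
        for name, value in zip(FIELDS, match.groups()[1:]):
            entry[name] = int(value)
        entries.append(entry)
    return entries


def syndrome_histogram(entries):
    """Count how many entries carry each syndrome value."""
    return Counter(entry["syndrome"] for entry in entries)


if __name__ == "__main__":
    parsed = parse_cqe_errors(SAMPLE)
    for syndrome, count in syndrome_histogram(parsed).most_common():
        print(syndrome, count)
```

Feeding it the first line of every report file (instead of SAMPLE) should show
which syndromes dominate. In the entries quoted above, every syndrome other
than 0x22 comes with wqe_counter = 0 and rq_ci = 32768.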

I have tried the latest HP firmware for the card, and I have tried both
enabling and disabling CQE compression in the mlx5 DPDK driver
(rxq_cqe_comp_en=0/1), but neither brought any improvement.

Does anybody know what the problem could be and how to mitigate it?

Thanks
Pavel Krauz
