SRU request submitted:
https://lists.ubuntu.com/archives/kernel-team/2018-October/096376.html
** Also affects: linux (Ubuntu Cosmic)
Importance: Undecided
Status: New
** Changed in: linux (Ubuntu Cosmic)
Status: New => In Progress
** Changed in: linux (Ubuntu Cosmic)
Importance: Undecided => Critical
** Changed in: linux (Ubuntu Cosmic)
Assignee: (unassigned) => Joseph Salisbury (jsalisbury)
** Description changed:
+
+ == SRU Justification ==
+ The requested commit fixes a regression introduced by mainline commit
+ 3a2f70331226 in v4.18-rc1. The commit is only needed in Cosmic. Due to
+ the regression, a Mellanox CX5 stops pinging with rx_wqe_err (mlx5_core).
+
+ == Fix ==
+ 37fdffb217a4 ("net/mlx5: WQ, fixes for fragmented WQ buffers API")
+
+ == Regression Potential ==
+ Low. This commit has been cc'd to stable, so it has had additional
+ upstream review.
+
+ == Test Case ==
+ A test kernel was built with this patch and tested by the original bug
+ reporter.
+ The bug reporter states the test kernel resolved the bug.
+
+
+
== Comment: #0 - Michael Ranweiler - 2018-10-18 11:34:40 ==
+ ---Problem Description---
+ On the system, if you run
+ ethtool -S enP48p1s0f0 | grep wqe_err
+ rx_wqe_err: 1
+ rx0_wqe_err: 0
+ rx1_wqe_err: 0
+ rx2_wqe_err: 0
+ rx3_wqe_err: 1
+ rx4_wqe_err: 0
+ rx5_wqe_err: 0
+ rx6_wqe_err: 0
+ rx7_wqe_err: 0
+ rx8_wqe_err: 0
+ rx9_wqe_err: 0
+ rx10_wqe_err: 0
+ rx11_wqe_err: 0
+ rx12_wqe_err: 0
+ rx13_wqe_err: 0
+ rx14_wqe_err: 0
+ rx15_wqe_err: 0
- ---Problem Description---
- On the system, if you run
- ethtool -S enP48p1s0f0 | grep wqe_err
- rx_wqe_err: 1
- rx0_wqe_err: 0
- rx1_wqe_err: 0
- rx2_wqe_err: 0
- rx3_wqe_err: 1
- rx4_wqe_err: 0
- rx5_wqe_err: 0
- rx6_wqe_err: 0
- rx7_wqe_err: 0
- rx8_wqe_err: 0
- rx9_wqe_err: 0
- rx10_wqe_err: 0
- rx11_wqe_err: 0
- rx12_wqe_err: 0
- rx13_wqe_err: 0
- rx14_wqe_err: 0
- rx15_wqe_err: 0
+ You will see that the rx side is hitting the issue.
- You will see that the rx side is hitting the issue.
-
-
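The check above comes down to looking for any non-zero rx*_wqe_err line in the ethtool -S output. Below is a minimal sketch of that check. It is not part of the original report and assumes the "name: value" line format shown above. The helper names read_ethtool_stats and nonzero_wqe_errors are hypothetical.

```python
import re
import subprocess

# Hypothetical helper, not from the original report. It matches lines such as
# "rx_wqe_err: 1" (aggregate) or "rx3_wqe_err: 1" (per ring) from `ethtool -S`.
WQE_ERR_RE = re.compile(r"^\s*(rx\d*_wqe_err):\s*(\d+)\s*$")


def read_ethtool_stats(iface: str) -> str:
    """Run `ethtool -S <iface>` and return its text output."""
    result = subprocess.run(
        ["ethtool", "-S", iface], capture_output=True, text=True, check=True
    )
    return result.stdout


def nonzero_wqe_errors(stats: str) -> dict:
    """Return {counter_name: value} for every non-zero rx*_wqe_err counter."""
    hits = {}
    for line in stats.splitlines():
        m = WQE_ERR_RE.match(line)
        if m and int(m.group(2)) > 0:
            hits[m.group(1)] = int(m.group(2))
    return hits


if __name__ == "__main__":
    # Sample values copied from the counters quoted in this report.
    sample = "rx_wqe_err: 1\nrx0_wqe_err: 0\nrx3_wqe_err: 1\nrx4_wqe_err: 0\n"
    print(nonzero_wqe_errors(sample))  # {'rx_wqe_err': 1, 'rx3_wqe_err': 1}
```

On a live system, nonzero_wqe_errors(read_ethtool_stats("enP48p1s0f0")) returning a non-empty dict would correspond to the symptom described here. The per-ring entry (rx3 in this report) shows which receive queue hit the error.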
---Additional Hardware Info---
Mellanox CX5 Ethernet 100G
lspci
0030:01:00.0 Ethernet controller: Mellanox Technologies MT28800 Family
[ConnectX-5 Ex]
0030:01:00.1 Ethernet controller: Mellanox Technologies MT28800 Family
[ConnectX-5 Ex]
-
-
- Machine Type = P9
-
+
+ Machine Type = P9
+
---Debugger---
A debugger is not configured
-
+
---Steps to Reproduce---
- Using a CX5 Ethernet 100G card
+ Using a CX5 Ethernet 100G card
lspci
0030:01:00.0 Ethernet controller: Mellanox Technologies MT28800 Family
[ConnectX-5 Ex]
0030:01:00.1 Ethernet controller: Mellanox Technologies MT28800 Family
[ConnectX-5 Ex]
- Just configure an IP:
+ Just configure an IP:
ifconfig enP48p1s0f0 33.33.33.33 netmask 255.255.255.0 up
Then configure an IP on the partner system and run ping -f:
ping -f 33.33.33.33
PING 33.33.33.33 (33.33.33.33) 56(84) bytes of data.
........................................^C
--- 33.33.33.33 ping statistics ---
5413 packets transmitted, 5373 received, 0% packet loss, time 934ms
rtt min/avg/max/mdev = 0.015/0.019/0.669/0.010 ms, ipg/ewma 0.172/0.020 ms
# ping 33.33.33.33
PING 33.33.33.33 (33.33.33.33) 56(84) bytes of data.
^C
--- 33.33.33.33 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1071ms
- Then, on the receiving system, run:
+ Then, on the receiving system, run:
ethtool -S enP48p1s0f0 | grep wqe_err
- rx_wqe_err: 1
- rx0_wqe_err: 0
- rx1_wqe_err: 0
- rx2_wqe_err: 0
- rx3_wqe_err: 1
- rx4_wqe_err: 0
- rx5_wqe_err: 0
- rx6_wqe_err: 0
- rx7_wqe_err: 0
- rx8_wqe_err: 0
- rx9_wqe_err: 0
- rx10_wqe_err: 0
- rx11_wqe_err: 0
- rx12_wqe_err: 0
- rx13_wqe_err: 0
- rx14_wqe_err: 0
- rx15_wqe_err: 0
- You will see rx_wqe_err with a non-zero counter.
+ rx_wqe_err: 1
+ rx0_wqe_err: 0
+ rx1_wqe_err: 0
+ rx2_wqe_err: 0
+ rx3_wqe_err: 1
+ rx4_wqe_err: 0
+ rx5_wqe_err: 0
+ rx6_wqe_err: 0
+ rx7_wqe_err: 0
+ rx8_wqe_err: 0
+ rx9_wqe_err: 0
+ rx10_wqe_err: 0
+ rx11_wqe_err: 0
+ rx12_wqe_err: 0
+ rx13_wqe_err: 0
+ rx14_wqe_err: 0
+ rx15_wqe_err: 0
+ You will see rx_wqe_err with a non-zero counter.
This is fixed by this patch:
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=37fdffb217a45609edccbb8b407d031143f551c0
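The two ping runs in the reproduction steps can be compared from their summary lines. In the flood run, 40 of 5413 packets were lost (under 1%, printed as 0%). In the follow-up run, nothing was received at all. Below is a minimal sketch for extracting those numbers. It assumes the "N packets transmitted, M received" summary format shown above. The helper name ping_loss is hypothetical.

```python
import re

# Hypothetical helper, not from the original report. It matches the ping
# summary, e.g. "5413 packets transmitted, 5373 received, 0% packet loss".
STATS_RE = re.compile(r"(\d+) packets transmitted, (\d+) received")


def ping_loss(summary: str) -> tuple:
    """Return (transmitted, received, lost) parsed from a ping summary line."""
    m = STATS_RE.search(summary)
    if m is None:
        raise ValueError("no ping statistics found")
    sent, received = int(m.group(1)), int(m.group(2))
    return sent, received, sent - received


if __name__ == "__main__":
    print(ping_loss("5413 packets transmitted, 5373 received, 0% packet loss, time 934ms"))
    # (5413, 5373, 40)
    print(ping_loss("2 packets transmitted, 0 received, 100% packet loss, time 1071ms"))
    # (2, 0, 2)
```

A received count of 0 after a flood run that already showed losses matches the "stops pinging" symptom. That pattern is meant to be checked alongside the wqe_err counters, not instead of them.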
== Comment: #1 - Carol L. Soto - 2018-10-18 11:46:00 ==
- I cloned the cosmic git tree and loaded the kernel on a system.
+ I cloned the cosmic git tree and loaded the kernel on a system.
With kernel 4.18.12, I can recreate it.
lspci | grep Mell | grep ConnectX-5
0000:01:00.0 Ethernet controller: Mellanox Technologies MT28800 Family
[ConnectX-5 Ex]
0000:01:00.1 Ethernet controller: Mellanox Technologies MT28800 Family
[ConnectX-5 Ex]
0030:01:00.0 Infiniband controller: Mellanox Technologies MT28800 Family
[ConnectX-5 Ex]
0030:01:00.1 Infiniband controller: Mellanox Technologies MT28800 Family
[ConnectX-5 Ex]
:~# ethtool -S enp1s0f0 | grep wqe_err
- rx_wqe_err: 2
- rx0_wqe_err: 1
- rx1_wqe_err: 1
- rx2_wqe_err: 0
- rx3_wqe_err: 0
- rx4_wqe_err: 0
- rx5_wqe_err: 0
- rx6_wqe_err: 0
- rx7_wqe_err: 0
- rx8_wqe_err: 0
- rx9_wqe_err: 0
- rx10_wqe_err: 0
+ rx_wqe_err: 2
+ rx0_wqe_err: 1
+ rx1_wqe_err: 1
+ rx2_wqe_err: 0
+ rx3_wqe_err: 0
+ rx4_wqe_err: 0
+ rx5_wqe_err: 0
+ rx6_wqe_err: 0
+ rx7_wqe_err: 0
+ rx8_wqe_err: 0
+ rx9_wqe_err: 0
+ rx10_wqe_err: 0
...
-
Let me check whether the proposed patch needs a backport.
== Comment: #3 - Carol L. Soto - 2018-10-18 13:34:46 ==
- I was able to apply the proposed patch as is to the cosmic git tree with no
- issue (no need to backport), using kernel 4.18.12+.
+ I was able to apply the proposed patch as is to the cosmic git tree with no
+ issue (no need to backport), using kernel 4.18.12+.
- With the proposed patch, I do not see any wqe_err errors and ping does not stop.
+ With the proposed patch, I do not see any wqe_err errors and ping does not stop.
ethtool -S enp1s0f0 | grep wqe_err
- rx_wqe_err: 0
- rx0_wqe_err: 0
- rx1_wqe_err: 0
- rx2_wqe_err: 0
- rx3_wqe_err: 0
- rx4_wqe_err: 0
- rx5_wqe_err: 0
- rx6_wqe_err: 0
- rx7_wqe_err: 0
- rx8_wqe_err: 0
- rx9_wqe_err: 0
- rx10_wqe_err: 0
+ rx_wqe_err: 0
+ rx0_wqe_err: 0
+ rx1_wqe_err: 0
+ rx2_wqe_err: 0
+ rx3_wqe_err: 0
+ rx4_wqe_err: 0
+ rx5_wqe_err: 0
+ rx6_wqe_err: 0
+ rx7_wqe_err: 0
+ rx8_wqe_err: 0
+ rx9_wqe_err: 0
+ rx10_wqe_err: 0
...
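To confirm a fix like the one verified above, compare two ethtool -S snapshots taken before and after a ping -f run. This avoids relying on a single reading. Below is a minimal sketch. It assumes the rx*_wqe_err counters are cumulative and are not reset between the two snapshots. The helper names parse_counters and wqe_err_increase are hypothetical.

```python
import re

# Hypothetical helper, not from the original report. It matches lines such as
# "rx_wqe_err: 2" or "rx0_wqe_err: 1" from `ethtool -S <iface>` output.
COUNTER_RE = re.compile(r"^\s*(rx\d*_wqe_err):\s*(\d+)\s*$")


def parse_counters(text: str) -> dict:
    """Map counter name -> value for every rx*_wqe_err line in text."""
    counters = {}
    for line in text.splitlines():
        m = COUNTER_RE.match(line)
        if m:
            counters[m.group(1)] = int(m.group(2))
    return counters


def wqe_err_increase(before: str, after: str) -> dict:
    """Return {name: delta} for counters that grew between two snapshots."""
    old, new = parse_counters(before), parse_counters(after)
    return {
        name: value - old.get(name, 0)
        for name, value in new.items()
        if value > old.get(name, 0)
    }


if __name__ == "__main__":
    clean = "rx_wqe_err: 0\nrx0_wqe_err: 0\nrx1_wqe_err: 0\n"
    broken = "rx_wqe_err: 2\nrx0_wqe_err: 1\nrx1_wqe_err: 1\n"
    print(wqe_err_increase(clean, broken))  # {'rx_wqe_err': 2, 'rx0_wqe_err': 1, 'rx1_wqe_err': 1}
    print(wqe_err_increase(clean, clean))   # {}
```

With the patched kernel, an empty result after a ping -f run would be consistent with the all-zero counters reported above. Checking the delta rather than the absolute value keeps the test meaningful even if errors were recorded before the run started.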
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1799393
Title:
Mellanox CX5 stops pinging with rx_wqe_err (mlx5_core)
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1799393/+subscriptions