** Description changed:

+ [Impact]
+ 
+ The bnxt_en_bpo driver hit TX timeouts, causing network stalls on the
+ system and failures to send data and heartbeat packets.
+ 
  The following 25Gb Broadcom NIC error was seen once on Xenial
  running the 4.4.0-141-generic kernel on an amd64 host
  under moderate-to-heavy network traffic:
  
  * The bnxt_en_bpo driver froze on a "TX timed out" error
-   and triggered the Netdev Watchdog timer under load. 
+   and triggered the Netdev Watchdog timer under load.
  
  * From kernel log:
-   "NETDEV WATCHDOG: eno2d1 (bnxt_en_bpo): transmit queue 0 timed out"
-   See attached kern.log excerpt file for full excerpt of error log.
+   "NETDEV WATCHDOG: eno2d1 (bnxt_en_bpo): transmit queue 0 timed out"
+   See attached kern.log excerpt file for full excerpt of error log.
  
- * Release = Xenial 
-   Kernel = 4.4.0-141-generic #167
-   eno2d1 = Product Name: Broadcom Adv. Dual 25Gb Ethernet
-   
+ * Release = Xenial
+   Kernel = 4.4.0-141-generic #167
+   eno2d1 = Product Name: Broadcom Adv. Dual 25Gb Ethernet
+ 
  * This caused the driver to reset in order to recover:
-   
-   "bnxt_en_bpo 0000:19:00.1 eno2d1: TX timeout detected, starting reset task!"
-  
-   driver: bnxt_en_bpo
-   version: 1.8.1
-   source: ubuntu/bnxt/bnxt.c: bnxt_tx_timeout()
+ 
+   "bnxt_en_bpo 0000:19:00.1 eno2d1: TX timeout detected, starting reset task!"
+ 
+   driver: bnxt_en_bpo
+   version: 1.8.1
+   source: ubuntu/bnxt/bnxt.c: bnxt_tx_timeout()
  
  * The loss of connectivity and softirq stall caused other failures
-   on the system. 
+   on the system.
  
  * The bnxt_en_bpo driver is the imported Broadcom driver
-   pulled in to support newer Broadcom HW (specific boards)
-   while the bnx_en module continues to support the older
-   HW. The current Linux upstream driver does not compile
-   easily with the 4.4 kernel (too many changes). 
+   pulled in to support newer Broadcom HW (specific boards)
+   while the bnxt_en module continues to support the older
+   HW. The current Linux upstream driver does not compile
+   easily with the 4.4 kernel (too many changes).
  
  * This upstream bnxt_en driver fix is a likely solution:
-    "bnxt_en: Fix TX timeout during netpoll"
-    commit: 73f21c653f930f438d53eed29b5e4c65c8a0f906
-   
-   This fix has not been applied to the bnxt_en_po driver
-   version, but review of the code indicates that it is 
-   susceptible to the bug, and the fix would be reasonable. 
+    "bnxt_en: Fix TX timeout during netpoll"
+    commit: 73f21c653f930f438d53eed29b5e4c65c8a0f906
  
- * No easy way to reproduce this
+   This fix has not been applied to the bnxt_en_bpo driver,
+   but a review of its code indicates that it is susceptible
+   to the same bug, so backporting the fix would be reasonable.
+ 
+ [Test Case]
+ 
+ * Unfortunately, this is not easy to reproduce. Also, it is only seen on
+ 4.4 kernels with newer Broadcom NICs supported by the bnxt_en_bpo
+ driver.
+ 
+ [Regression Potential]
+ 
+ * The patch is restricted to the bpo driver, so its scope is very
+ constrained: only the newest Broadcom NICs used with the Xenial 4.4
+ kernel (as opposed to the hwe 4.15 etc. kernels, which would have the
+ fixed in-tree driver).
+ 
+ * The patch is very small, and the backport is fairly minimal and simple.
+ 
+ * The fix has been running on the in-tree driver in upstream mainline as
+ well as the Ubuntu Linux in-tree driver. Although the Broadcom driver
+ has a lot of lower-level code that is different, this piece is still the
+ same.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1814095

Title:
  bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1814095/+subscriptions

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
