[Kernel-packages] [Bug 1814095] Re: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

2020-07-14 Thread Guilherme G. Piccoli
** Changed in: linux (Ubuntu)
   Status: Confirmed => Fix Released

** Changed in: linux (Ubuntu)
 Assignee: (unassigned) => Nivedita Singhvi (niveditasinghvi)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1814095

Title:
  bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released

Bug description:
  [Impact]

  The bnxt_en_bpo driver experienced tx timeouts causing the system to
  experience network stalls and fail to send data and heartbeat packets.

  The following 25Gb Broadcom NIC error was seen on Xenial
  running the 4.4.0-141-generic kernel on an amd64 host
  seeing moderate-heavy network traffic (just once):

  * The bnxt_en_bpo driver froze on a "TX timed out" error
    and triggered the Netdev Watchdog timer under load.

  * From kernel log:
    "NETDEV WATCHDOG: eno2d1 (bnxt_en_bpo): transmit queue 0 timed out"
    See attached kern.log excerpt file for full excerpt of error log.

  * Release = Xenial
    Kernel = 4.4.0-141-generic #167
    eno2d1 = Product Name: Broadcom Adv. Dual 25Gb Ethernet

  * This caused the driver to reset in order to recover:

    "bnxt_en_bpo :19:00.1 eno2d1: TX timeout detected, starting reset task!"

    driver: bnxt_en_bpo
    version: 1.8.1
    source: ubuntu/bnxt/bnxt.c: bnxt_tx_timeout()

  * The loss of connectivity and softirq stall caused other failures
    on the system.

  * The bnxt_en_bpo driver is the imported Broadcom driver
    pulled in to support newer Broadcom HW (specific boards)
    while the bnxt_en module continues to support the older
    HW. The current Linux upstream driver does not compile
    easily with the 4.4 kernel (too many changes).

  * This upstream bnxt_en driver fix is a likely solution:
     "bnxt_en: Fix TX timeout during netpoll"
     commit: 73f21c653f930f438d53eed29b5e4c65c8a0f906

    This fix has not been applied to the bnxt_en_bpo driver
    version, but review of the code indicates that it is
    susceptible to the bug, and the fix would be reasonable.
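
  To check a host for the symptom described above, the watchdog message
  can be counted in the kernel log. This is a minimal sketch and not part
  of the original report: the function name count_tx_timeouts is
  hypothetical, and the pattern is modeled only on the single log line
  quoted above, so other kernels may word the message differently.

```shell
#!/bin/sh
# Count NETDEV WATCHDOG transmit-queue timeouts for one driver in a log file.
# Usage: count_tx_timeouts LOGFILE [DRIVER]
# (hypothetical helper; DRIVER defaults to bnxt_en_bpo)
count_tx_timeouts() {
    logfile=$1
    driver=${2:-bnxt_en_bpo}
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the function's exit status at 0.
    grep -c "NETDEV WATCHDOG: .* (${driver}): transmit queue [0-9]* timed out" "$logfile" || true
}
```

  For example, count_tx_timeouts /var/log/kern.log prints the number of
  matching lines, assuming the kernel log is written to that path.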

  [Test Case]

  * Unfortunately, this is not easy to reproduce. Also, it is only seen
    on 4.4 kernels with newer Broadcom NICs supported by the bnxt_en_bpo
    driver.

  [Regression Potential]

  * The patch is restricted to the bpo driver, with very constrained
    scope - just the newest Broadcom NICs being used by the Xenial 4.4
    kernel (as opposed to the hwe 4.15 etc. kernels, which would have the
    in-tree fixed driver).

  * The patch is very small and backport is fairly minimal and simple.

  * The fix has been running on the in-tree driver in upstream mainline
    as well as the Ubuntu Linux in-tree driver. Although the Broadcom
    driver has a lot of lower-level code that is different, this piece
    is still the same.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1814095/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1814095] Re: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

2019-07-24 Thread Brad Figg
** Tags added: cscc

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Xenial:
  Fix Released


[Kernel-packages] [Bug 1814095] Re: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

2019-05-22 Thread Nivedita Singhvi
** Tags added: sts

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Xenial:
  Fix Released


[Kernel-packages] [Bug 1814095] Re: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

2019-04-02 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 4.4.0-145.171

---
linux (4.4.0-145.171) xenial; urgency=medium

  * linux: 4.4.0-145.171 -proposed tracker (LP: #1821724)

  * linux-generic should depend on linux-base >=4.1 (LP: #1820419)
- [Packaging] Fix linux-base dependency

linux (4.4.0-144.170) xenial; urgency=medium

  * linux: 4.4.0-144.170 -proposed tracker (LP: #1819660)

  * Packaging resync (LP: #1786013)
- [Packaging] resync getabis
- [Packaging] update helper scripts
- [Packaging] resync retpoline extraction

  * C++ demangling support missing from perf (LP: #1396654)
- [Packaging] fix a mistype

  * CVE-2019-9213
- mm: enforce min addr even if capable() in expand_downwards()

  * CVE-2019-3460
- Bluetooth: Check L2CAP option sizes returned from l2cap_get_conf_opt

  * Xenial update: 4.4.176 upstream stable release (LP: #1818815)
- net: fix IPv6 prefix route residue
- vsock: cope with memory allocation failure at socket creation time
- hwmon: (lm80) Fix missing unlock on error in set_fan_div()
- net: Fix for_each_netdev_feature on Big endian
- net: Add header for usage of fls64()
- tcp: tcp_v4_err() should be more careful
- net: Do not allocate page fragments that are not skb aligned
- tcp: clear icsk_backoff in tcp_write_queue_purge()
- vxlan: test dev->flags & IFF_UP before calling netif_rx()
- net: stmmac: Fix a race in EEE enable callback
- net: ipv4: use a dedicated counter for icmp_v4 redirect packets
- x86: livepatch: Treat R_X86_64_PLT32 as R_X86_64_PC32
- mfd: as3722: Handle interrupts on suspend
- mfd: as3722: Mark PM functions as __maybe_unused
- net/x25: do not hold the cpu too long in x25_new_lci()
- mISDN: fix a race in dev_expire_timer()
- ax25: fix possible use-after-free
- Linux 4.4.176

  * sky2 ethernet card don't work after returning from suspension
(LP: #1798921) // Xenial update: 4.4.176 upstream stable release
(LP: #1818815)
- sky2: Increase D3 delay again

  * Xenial update: 4.4.175 upstream stable release (LP: #1818813)
- drm/bufs: Fix Spectre v1 vulnerability
- staging: iio: adc: ad7280a: handle error from __ad7280_read32()
- ASoC: Intel: mrfld: fix uninitialized variable access
- scsi: lpfc: Correct LCB RJT handling
- ARM: 8808/1: kexec:offline panic_smp_self_stop CPU
- dlm: Don't swamp the CPU with callbacks queued during recovery
- x86/PCI: Fix Broadcom CNB20LE unintended sign extension (redux)
- powerpc/pseries: add of_node_put() in dlpar_detach_node()
- serial: fsl_lpuart: clear parity enable bit when disable parity
- ptp: check gettime64 return code in PTP_SYS_OFFSET ioctl
- staging:iio:ad2s90: Make probe handle spi_setup failure
- staging: iio: ad7780: update voltage on read
- ARM: OMAP2+: hwmod: Fix some section annotations
- modpost: validate symbol names also in find_elf_symbol
- perf tools: Add Hygon Dhyana support
- soc/tegra: Don't leak device tree node reference
- f2fs: move dir data flush to write checkpoint process
- f2fs: fix wrong return value of f2fs_acl_create
- sunvdc: Do not spin in an infinite loop when vio_ldc_send() returns EAGAIN
- nfsd4: fix crash on writing v4_end_grace before nfsd startup
- arm64: ftrace: don't adjust the LR value
- ARM: dts: mmp2: fix TWSI2
- x86/fpu: Add might_fault() to user_insn()
- media: DaVinci-VPBE: fix error handling in vpbe_initialize()
- smack: fix access permissions for keyring
- usb: hub: delay hub autosuspend if USB3 port is still link training
- timekeeping: Use proper seqcount initializer
- ARM: dts: Fix OMAP4430 SDP Ethernet startup
- mips: bpf: fix encoding bug for mm_srlv32_op
- iommu/arm-smmu-v3: Use explicit mb() when moving cons pointer
- sata_rcar: fix deferred probing
- clk: imx6sl: ensure MMDC CH0 handshake is bypassed
- cpuidle: big.LITTLE: fix refcount leak
- i2c-axxia: check for error conditions first
- udf: Fix BUG on corrupted inode
- ARM: pxa: avoid section mismatch warning
- ASoC: fsl: Fix SND_SOC_EUKREA_TLV320 build error on i.MX8M
- memstick: Prevent memstick host from getting runtime suspended during card
  detection
- tty: serial: samsung: Properly set flags in autoCTS mode
- arm64: KVM: Skip MMIO insn after emulation
- powerpc/uaccess: fix warning/error with access_ok()
- mac80211: fix radiotap vendor presence bitmap handling
- xfrm6_tunnel: Fix spi check in __xfrm6_tunnel_alloc_spi
- Bluetooth: Fix unnecessary error message for HCI request completion
- cw1200: Fix concurrency use-after-free bugs in cw1200_hw_scan()
- drbd: narrow rcu_read_lock in drbd_sync_handshake
- drbd: disconnect, if the wrong UUIDs are attached on a connected peer
- drbd: skip spurious timeout (ping-timeo) when failing promote
- drbd: Avoid Clang warning about pointless switch statment
- video: 

[Kernel-packages] [Bug 1814095] Re: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

2019-03-25 Thread Nivedita Singhvi
I am not sure we can provoke the issue deterministically.
At the very least, to ensure no other regression was
introduced, I would run it under heavy network load.

The environment that saw the issue had network load,
contention for CPUs, and several other issues occurring.

The basic environment is:

1. For any 25Gb NIC/chipset that requires the 4.4 bnxt_en_bpo
   driver, set its 2 ports/interfaces up in bonding mode
   as follows:

   bond-lacp-rate fast
   bond-master bond0
   bond-miimon 100
   bond-mode 802.3ad
   bond-xmit-hash-policy layer3+4
   mtu 9000

2. Run any heavy TCP network load test over the systems
   (e.g. iperf, netperf, file transfer, etc.)

3. In theory, a smaller number of tx ring descriptors should
   make the issue more likely to hit (this was not successfully
   proven by testing here), so you can lower it and see if that
   helps:

   # ethtool -G eno49 tx 128   (for example)
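
Steps 2 and 3 above can be sketched as a small script. This is only an
illustration under stated assumptions, not a verified reproducer: the
helper names run and stress_tx_ring, the peer address, the 8 parallel
streams and the 600-second duration are all hypothetical choices, and
the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print (default) or execute one command. Set DRY_RUN=0 to really run it;
# that needs root and an iperf server already listening on the peer.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: stress_tx_ring IFACE PEER [TX_RING_SIZE] [SECONDS]
# (hypothetical helper; defaults of 128 descriptors and 600 s are assumptions)
stress_tx_ring() {
    iface=$1
    peer=$2
    ring=${3:-128}
    secs=${4:-600}
    run ethtool -g "$iface"               # show the current ring sizes
    run ethtool -G "$iface" tx "$ring"    # shrink the TX ring (step 3)
    run iperf -c "$peer" -t "$secs" -P 8  # heavy TCP load, 8 streams (step 2)
}
```

For example, stress_tx_ring eno49 192.0.2.10 prints the three commands
for review; 192.0.2.10 is a placeholder documentation address.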


I am not sure if that helps, Scott. I'll try to come up
with more specific steps, but I cannot guarantee you will
see the issue.

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Xenial:
  Fix Committed


[Kernel-packages] [Bug 1814095] Re: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

2019-03-25 Thread Scott Smith
Are there repro steps that can be passed along to test?

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Xenial:
  Fix Committed


[Kernel-packages] [Bug 1814095] Re: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

2019-03-22 Thread Nivedita Singhvi
Just briefly wanted to say that this is one we've discussed at 
length -- we may not be able to get someone who has the right
NIC to test with it in time. 

I'm sanity checking the kernel, but that is not exercising the 
key change here. 

We may need to assume verification-done for our purposes here.

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Xenial:
  Fix Committed


[Kernel-packages] [Bug 1814095] Re: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

2019-03-15 Thread Brad Figg
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-xenial' to 'verification-done-xenial'. If the
problem still exists, change the tag 'verification-needed-xenial' to
'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: verification-needed-xenial

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Xenial:
  Fix Committed


[Kernel-packages] [Bug 1814095] Re: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

2019-03-03 Thread Khaled El Mously
** Changed in: linux (Ubuntu Xenial)
   Status: In Progress => Fix Committed

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Xenial:
  Fix Committed

Bug description:
  [Impact]

  The bnxt_en_bpo driver experienced tx timeouts causing the system to
  experience network stalls and fail to send data and heartbeat packets.

  The following 25Gb Broadcom NIC error was seen on Xenial
  running the 4.4.0-141-generic kernel on an amd64 host
  seeing moderate-heavy network traffic (just once):

  * The bnxt_en_po driver froze on a "TX timed out" error
    and triggered the Netdev Watchdog timer under load.

  * From kernel log:
    "NETDEV WATCHDOG: eno2d1 (bnxt_en_bpo): transmit queue 0 timed out"
    See attached kern.log excerpt file for full excerpt of error log.

  * Release = Xenial
    Kernel = 4.4.0-141-generic #167
    eno2d1 = Product Name: Broadcom Adv. Dual 25Gb Ethernet

  * This caused the driver to reset in order to recover:

    "bnxt_en_bpo :19:00.1 eno2d1: TX timeout detected, starting
  reset task!"

    driver: bnxt_en_bpo
    version: 1.8.1
    source: ubuntu/bnxt/bnxt.c: bnxt_tx_timeout()

  * The loss of connectivity and softirq stall caused other failures
    on the system.

  * The bnxt_en_po driver is the imported Broadcom driver
    pulled in to support newer Broadcom HW (specific boards)
    while the bnx_en module continues to support the older
    HW. The current Linux upstream driver does not compile
    easily with the 4.4 kernel (too many changes).

  * This upstream and bnxt_en driver fix is a likely solution:
     "bnxt_en: Fix TX timeout during netpoll"
     commit: 73f21c653f930f438d53eed29b5e4c65c8a0f906

    This fix has not been applied to the bnxt_en_po driver
    version, but review of the code indicates that it is
    susceptible to the bug, and the fix would be reasonable.

  [Test Case]

  * Unfortunately, this is not easy to reproduce. Also, it is only seen
  on 4.4 kernels with newer Broadcom NICs supported by the bnxt_en_bpo
  driver.

  [Regression Potential]

  * The patch is restricted to the bpo driver, with very constrained
  scope - just the newest Broadcom NICs being used by the Xenial 4.4
  kernel (as opposed to the hwe 4.15 etc. kernels, which would have the
  in-tree fixed driver).

  * The patch is very small and backport is fairly minimal and simple.

  * The fix has been running on the in-tree driver in upstream mainline
  as well as the Ubuntu Linux in-tree driver. Although the Broadcom
  driver has a lot of lower level code that is different, this piece is
  still the same.
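
  As an illustration only (not part of the original report), the
  watchdog message quoted under [Impact] could be picked out of a
  kernel log with a small script along these lines. The regular
  expression is derived from the single quoted message, and the
  "kernel: " prefix in the sample lines is an assumption; real
  kern.log lines carry timestamps and host names that vary by system.

```python
import re

# Pattern modeled on the one message quoted in the bug description:
# "NETDEV WATCHDOG: eno2d1 (bnxt_en_bpo): transmit queue 0 timed out"
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_watchdog_timeouts(lines):
    """Return (interface, driver, queue) for every watchdog timeout line."""
    hits = []
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            hits.append(
                (match.group("iface"), match.group("driver"), int(match.group("queue")))
            )
    return hits


# Hypothetical sample lines; the "kernel: " prefix is made up for the sketch.
sample = [
    "kernel: NETDEV WATCHDOG: eno2d1 (bnxt_en_bpo): transmit queue 0 timed out",
    "kernel: an unrelated line",
]
print(find_watchdog_timeouts(sample))
```

  To scan a real log, the same function can be given an open file
  object instead of the sample list.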

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1814095/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1814095] Re: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

2019-03-03 Thread Nivedita Singhvi
Terry,

We've had a lot of discussion over this bug. It does not have
a reliable reproducer, and I have not yet received any acks
on testing of the above. 

Our thinking was that it was still better to patch it, since
the same issue has been seen with the mainline driver as well and
we'd like to avoid a recurrence of the situation.

The need is to have the fix be available in the Xenial official
bits, for sure (rather than providing a temporary test kernel
via our ppa or something, for instance). 

FWIW, here are the boards in question:
enum board_idx {
	BCM57301,
	BCM57417_NPAR,
	BCM58700,
	BCM57311,
	BCM57312,
	BCM57402,
	BCM57402_NPAR,
	BCM57407,
	BCM57412,
	BCM57414,
	BCM57416,
	BCM57417,
	BCM57412_NPAR,
	BCM57314,
	BCM57417_SFP,
	BCM57416_SFP,
	BCM57404_NPAR,
	BCM57406_NPAR,
	BCM57407_SFP,
	BCM57407_NPAR,
	BCM57414_NPAR,
	BCM57416_NPAR,
	BCM57452,
	BCM57454,
	NETXTREME_E_VF,
	NETXTREME_C_VF,
};

Per conversation with Brad and Jay, it was agreed that patching
only the bnxt_en_bpo driver with this fix was the way to go,
despite the lack of a reproducer, rather than pulling in an
entirely new driver from Broadcom, which was also considered.


The FW version the issue was hit on: 
firmware-version: 20.8.163/1.8.4 pkg 20.08.04.03

But it might be best to test with latest available
firmware (214.0.166/1.9.2 pkg 21.40.16.6 or later).

Not sure if that helps? Let me know if I can address anything
else.
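
For anyone checking whether a test system is at or beyond a given
firmware level, a rough sketch of comparing the "pkg" field of two
firmware-version strings follows. It assumes that the pkg field
orders numerically component by component; that is an assumption of
this sketch, not something Broadcom documentation is quoted for here.
The helper names are hypothetical.

```python
import re


def pkg_version(fw_string):
    """Extract the 'pkg' field of a firmware-version string as an int tuple."""
    match = re.search(r"pkg\s+(\d+(?:\.\d+)*)", fw_string)
    if match is None:
        raise ValueError("no pkg field in %r" % fw_string)
    return tuple(int(part) for part in match.group(1).split("."))


def at_least(fw_string, minimum_fw_string):
    """True if fw_string's pkg version is >= minimum_fw_string's pkg version.

    Assumes component-wise numeric ordering of the pkg field.
    """
    return pkg_version(fw_string) >= pkg_version(minimum_fw_string)


# The two firmware-version strings quoted in this thread.
old_fw = "20.8.163/1.8.4 pkg 20.08.04.03"
new_fw = "214.0.166/1.9.2 pkg 21.40.16.6"
print(at_least(new_fw, old_fw))
```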

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1814095

Title:
  bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Xenial:
  In Progress



[Kernel-packages] [Bug 1814095] Re: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

2019-03-03 Thread Nivedita Singhvi
** Changed in: linux (Ubuntu Xenial)
   Status: Confirmed => In Progress

** Changed in: linux (Ubuntu Xenial)
 Assignee: (unassigned) => Nivedita Singhvi (niveditasinghvi)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1814095

Title:
  bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Xenial:
  In Progress



[Kernel-packages] [Bug 1814095] Re: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

2019-02-26 Thread Terry Rudd
Nivedita, per the request to test this patch, determining the correct FW
version seems an open issue.  There is also the issue of having hardware
available to properly test.

Have you been able to determine a reproducer for this bug?  
Do you know if anyone has been able to test the backport?  
Can you confirm if the request is to actually get the patch merged to xenial at 
this time?

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1814095

Title:
  bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Xenial:
  Confirmed



[Kernel-packages] [Bug 1814095] Re: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

2019-02-22 Thread Nivedita Singhvi
** Changed in: linux (Ubuntu Xenial)
   Status: New => Confirmed

** Changed in: linux (Ubuntu Xenial)
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1814095

Title:
  bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Xenial:
  Confirmed



[Kernel-packages] [Bug 1814095] Re: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

2019-02-21 Thread Terry Rudd
** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1814095

Title:
  bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Xenial:
  New



[Kernel-packages] [Bug 1814095] Re: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

2019-02-21 Thread Nivedita Singhvi
** Description changed:

+ [Impact]
+ 
+ The bnxt_en_bpo driver experienced tx timeouts causing the system to
+ experience network stalls and fail to send data and heartbeat packets.
+ 
  The following 25Gb Broadcom NIC error was seen on Xenial
  running the 4.4.0-141-generic kernel on an amd64 host
  seeing moderate-heavy network traffic (just once):
  
  * The bnxt_en_po driver froze on a "TX timed out" error
-   and triggered the Netdev Watchdog timer under load. 
+   and triggered the Netdev Watchdog timer under load.
  
  * From kernel log:
-   "NETDEV WATCHDOG: eno2d1 (bnxt_en_bpo): transmit queue 0 timed out"
-   See attached kern.log excerpt file for full excerpt of error log.
+   "NETDEV WATCHDOG: eno2d1 (bnxt_en_bpo): transmit queue 0 timed out"
+   See attached kern.log excerpt file for full excerpt of error log.
  
- * Release = Xenial 
-   Kernel = 4.4.0-141-generic #167
-   eno2d1 = Product Name: Broadcom Adv. Dual 25Gb Ethernet
-   
+ * Release = Xenial
+   Kernel = 4.4.0-141-generic #167
+   eno2d1 = Product Name: Broadcom Adv. Dual 25Gb Ethernet
+ 
  * This caused the driver to reset in order to recover:
-   
-   "bnxt_en_bpo :19:00.1 eno2d1: TX timeout detected, starting reset task!"
-  
-   driver: bnxt_en_bpo
-   version: 1.8.1
-   source: ubuntu/bnxt/bnxt.c: bnxt_tx_timeout()
+ 
+   "bnxt_en_bpo :19:00.1 eno2d1: TX timeout detected, starting reset
+ task!"
+ 
+   driver: bnxt_en_bpo
+   version: 1.8.1
+   source: ubuntu/bnxt/bnxt.c: bnxt_tx_timeout()
  
  * The loss of connectivity and softirq stall caused other failures
-   on the system. 
+   on the system.
  
  * The bnxt_en_po driver is the imported Broadcom driver
-   pulled in to support newer Broadcom HW (specific boards)
-   while the bnx_en module continues to support the older
-   HW. The current Linux upstream driver does not compile
-   easily with the 4.4 kernel (too many changes). 
+   pulled in to support newer Broadcom HW (specific boards)
+   while the bnx_en module continues to support the older
+   HW. The current Linux upstream driver does not compile
+   easily with the 4.4 kernel (too many changes).
  
  * This upstream and bnxt_en driver fix is a likely solution:
-"bnxt_en: Fix TX timeout during netpoll"
-commit: 73f21c653f930f438d53eed29b5e4c65c8a0f906
-   
-   This fix has not been applied to the bnxt_en_po driver
-   version, but review of the code indicates that it is 
-   susceptible to the bug, and the fix would be reasonable. 
+    "bnxt_en: Fix TX timeout during netpoll"
+    commit: 73f21c653f930f438d53eed29b5e4c65c8a0f906
  
- * No easy way to reproduce this
+   This fix has not been applied to the bnxt_en_po driver
+   version, but review of the code indicates that it is
+   susceptible to the bug, and the fix would be reasonable.
+ 
+ [Test Case]
+ 
+ * Unfortunately, this is not easy to reproduce. Also, it is only seen on
+ 4.4 kernels with newer Broadcom NICs supported by the bnxt_en_bpo
+ driver.
+ 
+ [Regression Potential]
+ 
+ * The patch is restricted to the bpo driver, with very constrained scope
+ - just the newest Broadcom NICs being used by the Xenial 4.4 kernel (as
+ opposed to the hwe 4.15 etc. kernels, which would have the in-tree fixed
+ driver).
+ 
+ * The patch is very small and backport is fairly minimal and simple.
+ 
+ * The fix has been running on the in-tree driver in upstream mainline as
+ well as the Ubuntu Linux in-tree driver, although the Broadcom driver
+ has a lot of lower level code that is different, this piece is still the
+ same.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1814095

Title:
  bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

Status in linux package in Ubuntu:
  Confirmed


[Kernel-packages] [Bug 1814095] Re: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

2019-02-21 Thread Nivedita Singhvi
If anyone is interested and willing to test a 4.4 kernel 
patched with the fix "bnxt_en: Fix TX timeout during netpoll"
backported to the bnxt_en_bpo driver, please find the packages
here:

http://people.canonical.com/~nivedita/bpo/

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1814095

Title:
  bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  The following 25Gb Broadcom NIC error was seen on Xenial
  running the 4.4.0-141-generic kernel on an amd64 host
  seeing moderate-heavy network traffic (just once):

  * The bnxt_en_bpo driver froze on a "TX timed out" error
    and triggered the Netdev Watchdog timer under load.

  * From kernel log:
    "NETDEV WATCHDOG: eno2d1 (bnxt_en_bpo): transmit queue 0 timed out"
    See attached kern.log excerpt file for full excerpt of error log.

  * Release = Xenial
    Kernel = 4.4.0-141-generic #167
    eno2d1 = Product Name: Broadcom Adv. Dual 25Gb Ethernet

  * This caused the driver to reset in order to recover:

    "bnxt_en_bpo :19:00.1 eno2d1: TX timeout detected, starting reset task!"

    driver: bnxt_en_bpo
    version: 1.8.1
    source: ubuntu/bnxt/bnxt.c: bnxt_tx_timeout()

  * The loss of connectivity and softirq stall caused other failures
    on the system.

  * The bnxt_en_bpo driver is the imported Broadcom driver
    pulled in to support newer Broadcom HW (specific boards)
    while the bnxt_en module continues to support the older
    HW. The current Linux upstream driver does not compile
    easily with the 4.4 kernel (too many changes).

  * This upstream and bnxt_en driver fix is a likely solution:
     "bnxt_en: Fix TX timeout during netpoll"
     commit: 73f21c653f930f438d53eed29b5e4c65c8a0f906

    This fix has not been applied to the bnxt_en_bpo driver
    version, but review of the code indicates that it is
    susceptible to the bug, and the fix would be reasonable.

  * No easy way to reproduce this



[Kernel-packages] [Bug 1814095] Re: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

2019-01-31 Thread Nivedita Singhvi
** Changed in: linux (Ubuntu)
   Status: Incomplete => Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1814095

Title:
  bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

Status in linux package in Ubuntu:
  Confirmed



[Kernel-packages] [Bug 1814095] Re: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

2019-01-31 Thread Nivedita Singhvi
Because of earlier NIC flapping observed on systems with the
25Gb Broadcom NIC, which originally had the following config,
the firmware was upgraded to avoid a known FW bug:

$ cat ethtool_-i_enp59s0f1d1
driver: bnxt_en_bpo
version: 1.8.1
firmware-version: 20.8.163/1.8.4 pkg 20.08.04.03
expansion-rom-version:
bus-info: :3b:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: no
supports-priv-flags: no 

The FW was upgraded on affected systems to:

$ cat ethtool_-i_eno2d1
driver: bnxt_en_bpo
version: 1.8.1
firmware-version: 214.0.166/1.9.2 pkg 21.40.16.6
expansion-rom-version: 
bus-info: :19:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: no
supports-priv-flags: no

Unfortunately, it's not quite clear which FW version the
current bug happened on (I believe the newer but can't 
confirm -- happened in the midst of several reboots)
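
For reference, a minimal sketch of pulling the driver and firmware
fields out of saved "ethtool -i" output such as the two files above.
The function name is hypothetical, and the sample text is copied from
the second listing; the sketch only assumes the "key: value" layout
shown there.

```python
def parse_ethtool_info(text):
    """Parse `ethtool -i` style "key: value" lines into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as the
        # bus-info field keep their own colons.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


# Sample taken from the upgraded-firmware listing in this message.
sample_output = """driver: bnxt_en_bpo
version: 1.8.1
firmware-version: 214.0.166/1.9.2 pkg 21.40.16.6
expansion-rom-version:
bus-info: :19:00.1
supports-statistics: yes
"""

info = parse_ethtool_info(sample_output)
print(info["driver"], info["firmware-version"])
```

Recording this alongside each kern.log excerpt would make it easier
to tell which firmware a given timeout occurred on.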

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1814095

Title:
  bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

Status in linux package in Ubuntu:
  Incomplete



[Kernel-packages] [Bug 1814095] Re: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

2019-01-31 Thread Nivedita Singhvi
** Attachment added: "kern.log.excerpt-netdev-watchdog-timeout.txt"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1814095/+attachment/5234643/+files/kern.log.excerpt-netdev-watchdog-timeout.txt

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1814095

Title:
  bnxt_en_po: TX timed out triggering Netdev Watchdog Timer

Status in linux package in Ubuntu:
  New
