[Kernel-packages] [Bug 1815033] Re: qlcnic: Firmware aborts/hangs in QLogic NIC

2020-07-14 Thread Guilherme G. Piccoli
** Changed in: linux (Ubuntu)
   Status: Confirmed => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1815033

Title:
  qlcnic: Firmware aborts/hangs in QLogic NIC

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  Fix Released

Bug description:
  [Impact]

  * In multi-queue configurations of the qlcnic driver, there is a corner case
in which TX queue zero is used at the same time for regular data transmission
by one CPU while another CPU uses the same queue descriptor for MAC
configuration.

  * When this race happens, it can corrupt TX queue zero, which in turn
triggers firmware aborts/hangs with no other apparent cause. The following
kernel log messages were collected during the corruption event:

qlcnic :01:00.0: Pause control frames disabled on all ports
qlcnic :01:00.0: firmware hang detected
qlcnic :01:00.0: Dumping hw/fw registers
PEG_HALT_STATUS1: 0x40001502, PEG_HALT_STATUS2: 0x3de7a0,
PEG_NET_0_PC: 0x6d268, PEG_NET_1_PC: 0x6d2ac,
PEG_NET_2_PC: 0x149, PEG_NET_3_PC: 0x6e105,
PEG_NET_4_PC: 0x1e00b
[...]
qlcnic :01:00.0: Detected state change from DEV_NEED_RESET, skipping ack check

  * The following device is known to suffer from the issue (lspci output),
although the whole class of devices (the vendor's 82XX series) is
susceptible to it:
01:00.0 Ethernet controller [0200]: QLogic Corp. cLOM8214 1/10GbE Controller [1077:8020]

  * The fix is the following patch, present in the mainline kernel as well as
in supported stable branches:
c333fa0c4f22 ("qlcnic: fix Tx descriptor corruption on 82xx devices").
Link to the patch in Linus' tree: http://git.kernel.org/linus/c333fa0c4f22
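
The race described above can be illustrated with a small toy model. This is
only a sketch under stated assumptions, not the driver's code: the TxRing
class, the transmit() helper, the policy names and the per-ring lock are all
hypothetical stand-ins chosen for illustration. It assumes each TX ring is
protected by its own lock, held by whichever CPU is transmitting on that ring,
and it counts a post as unsafe when the ring being written is not locked.

```python
import threading


class TxRing:
    """Toy stand-in for one TX descriptor ring (hypothetical, not driver code)."""

    def __init__(self, index):
        self.index = index
        self.lock = threading.Lock()  # models a per-queue TX lock (assumption)
        self.descriptors = []
        self.unsafe_posts = 0  # posts made while this ring's lock was not held

    def post(self, descriptor):
        # In this model a post is safe only if this ring's lock is held.
        if not self.lock.locked():
            self.unsafe_posts += 1
        self.descriptors.append(descriptor)


def transmit(rings, queue, frame, mac_ring_policy):
    """Send one frame on `queue`, posting a MAC configuration request first.

    mac_ring_policy:
      "ring_zero" - always post the MAC request on ring 0 (the racy pattern)
      "same_ring" - post it on the ring already locked for this transmission
    """
    ring = rings[queue]
    with ring.lock:
        mac_ring = rings[0] if mac_ring_policy == "ring_zero" else ring
        mac_ring.post(("mac_cfg", frame["src"]))
        ring.post(("data", frame["payload"]))


frame = {"src": "00:11:22:33:44:55", "payload": b"hello"}

racy = [TxRing(i) for i in range(4)]
transmit(racy, 2, frame, "ring_zero")  # ring 0 is written without its lock

restricted = [TxRing(i) for i in range(4)]
transmit(restricted, 2, frame, "same_ring")  # only the locked ring is written
```

In the "ring_zero" run, ring 0 receives a descriptor while only ring 2 is
locked, so a concurrent transmitter on ring 0 could interleave with it. In the
"same_ring" run every post lands on the ring whose lock is already held, which
is one way to read "a restriction to specific tx_ring when setting filters".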

  [Test Case]

  * Unfortunately this is not easy to reproduce, but we have a user report of
the issue with a fairly reliable reproducer: the user is running an NFS
workload on top of the above PCI adapter. The problem goes away with
the patch proposed here for SRU. It happens in both kernels 4.4
and 4.15, and the patch fixes it for both of them.
(Note that this is a Bionic-only SRU, since the Ubuntu 4.4 kernel got the
patch from Greg's supported stable branch).
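
When running a long soak test like the one above, it can help to scan the
kernel log for the hang message automatically. The following is a minimal
sketch, assuming the log text has already been captured into a string (for
example from dmesg output); the function name and the regular expression are
illustrative choices, and only the "firmware hang detected" message text comes
from the log excerpt in this report.

```python
import re

# Matches lines such as "qlcnic :01:00.0: firmware hang detected" and captures
# the device address field that follows the driver name.
HANG_PATTERN = re.compile(r"qlcnic\s+(\S*):\s+firmware hang detected")


def find_firmware_hangs(log_text):
    """Return the device address from every qlcnic firmware-hang line."""
    hits = []
    for line in log_text.splitlines():
        match = HANG_PATTERN.search(line)
        if match:
            hits.append(match.group(1))
    return hits


sample_log = (
    "qlcnic :01:00.0: Pause control frames disabled on all ports\n"
    "qlcnic :01:00.0: firmware hang detected\n"
    "qlcnic :01:00.0: Dumping hw/fw registers\n"
)
hangs = find_firmware_hangs(sample_log)
```

An empty result over the whole test window is consistent with the fix working,
although it does not prove the race can no longer occur.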

  [Regression Potential]

  * The patch scope is restricted to a single driver, and the code itself
is self-contained: basically, a restriction to a specific tx_ring when
setting filters. There is potential for regressions in this driver path,
which could cause different firmware issues, for example,
but the user testing showed good reliability: without the patch the
issue happens ~6h after machine boot; with the patch the machine ran
for more than 8 days without issues.

  * Also, the patch is present in the mainline kernel as well as in supported
stable branches, and is already present in the Ubuntu 4.4 kernel.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1815033/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1815033] Re: qlcnic: Firmware aborts/hangs in QLogic NIC

2019-07-24 Thread Brad Figg
** Tags added: cscc


Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Bionic:
  Fix Released



[Kernel-packages] [Bug 1815033] Re: qlcnic: Firmware aborts/hangs in QLogic NIC

2019-04-02 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 4.15.0-47.50

---
linux (4.15.0-47.50) bionic; urgency=medium

  * linux: 4.15.0-47.50 -proposed tracker (LP: #1819716)

  * Packaging resync (LP: #1786013)
- [Packaging] resync getabis
- [Packaging] update helper scripts
- [Packaging] resync retpoline extraction

  * C++ demangling support missing from perf (LP: #1396654)
- [Packaging] fix a mistype

  * arm-smmu-v3 arm-smmu-v3.3.auto: CMD_SYNC timeout (LP: #1818162)
- iommu/arm-smmu-v3: Fix unexpected CMD_SYNC timeout

  * Crash in nvme_irq_check() when using threaded interrupts (LP: #1818747)
- nvme-pci: fix out of bounds access in nvme_cqe_pending

  * CVE-2019-9213
- mm: enforce min addr even if capable() in expand_downwards()

  * CVE-2019-3460
- Bluetooth: Check L2CAP option sizes returned from l2cap_get_conf_opt

  * amdgpu with mst WARNING on blanking (LP: #1814308)
- drm/amd/display: Don't use dc_link in link_encoder
- drm/amd/display: Move wait for hpd ready out from edp power control.
- drm/amd/display: eDP sequence BL off first then DP blank.
- drm/amd/display: Fix unused variable compilation error
- drm/amd/display: Fix warning about misaligned code
- drm/amd/display: Fix MST dp_blank REG_WAIT timeout

  * tun/tap: unable to manage carrier state from userland (LP: #1806392)
- tun: implement carrier change

  * CVE-2019-8980
- exec: Fix mem leak in kernel_read_file

  * raw_skew in timer from the ubuntu_kernel_selftests failed on Bionic
(LP: #1811194)
- selftest: timers: Tweak raw_skew to SKIP when ADJ_OFFSET/other clock
  adjustments are in progress

  * [Packaging] Allow overlay of config annotations (LP: #1752072)
- [Packaging] config-check: Add an include directive

  * CVE-2019-7308
- bpf: move {prev_,}insn_idx into verifier env
- bpf: move tmp variable into ax register in interpreter
- bpf: enable access to ax register also from verifier rewrite
- bpf: restrict map value pointer arithmetic for unprivileged
- bpf: restrict stack pointer arithmetic for unprivileged
- bpf: restrict unknown scalars of mixed signed bounds for unprivileged
- bpf: fix check_map_access smin_value test when pointer contains offset
- bpf: prevent out of bounds speculation on pointer arithmetic
- bpf: fix sanitation of alu op with pointer / scalar type from different
  paths
- bpf: add various test cases to selftests

  * CVE-2017-5753
- bpf: properly enforce index mask to prevent out-of-bounds speculation
- bpf: fix inner map masking to prevent oob under speculation

  * BPF: kernel pointer leak to unprivileged userspace (LP: #1815259)
- bpf/verifier: disallow pointer subtraction

  * squashfs hardening (LP: #1816756)
- squashfs: more metadata hardening
- squashfs metadata 2: electric boogaloo
- squashfs: more metadata hardening
- Squashfs: Compute expected length from inode size rather than block length

  * efi/arm/arm64: Allow SetVirtualAddressMap() to be omitted (LP: #1814982)
- efi/arm/arm64: Allow SetVirtualAddressMap() to be omitted

  * Update ENA driver to version 2.0.3K (LP: #1816806)
- net: ena: update driver version from 2.0.2 to 2.0.3
- net: ena: fix race between link up and device initalization
- net: ena: fix crash during failed resume from hibernation

  * ipset kernel error: 4.15.0-43-generic (LP: #1811394)
- netfilter: ipset: Fix wraparound in hash:*net* types

  * Silent "Unknown key" message when pressing keyboard backlight hotkey
(LP: #1817063)
- platform/x86: dell-wmi: Ignore new keyboard backlight change event

  * CVE-2018-18021
- arm64: KVM: Tighten guest core register access from userspace
- KVM: arm/arm64: Introduce vcpu_el1_is_32bit
- arm64: KVM: Sanitize PSTATE.M when being set from userspace

  * CVE-2018-14678
- x86/entry/64: Remove %ebx handling from error_entry/exit

  * CVE-2018-19824
- ALSA: usb-audio: Fix UAF decrement if card has no live interfaces in 
card.c

  * CVE-2019-3459
- Bluetooth: Verify that l2cap_get_conf_opt provides large enough buffer

  * Bionic update: upstream stable patchset 2019-02-08 (LP: #1815234)
- fork: unconditionally clear stack on fork
- spi: spi-s3c64xx: Fix system resume support
- Input: elan_i2c - add ACPI ID for lenovo ideapad 330
- Input: i8042 - add Lenovo LaVie Z to the i8042 reset list
- Input: elan_i2c - add another ACPI ID for Lenovo Ideapad 330-15AST
- kvm, mm: account shadow page tables to kmemcg
- delayacct: fix crash in delayacct_blkio_end() after delayacct init failure
- tracing: Fix double free of event_trigger_data
- tracing: Fix possible double free in event_enable_trigger_func()
- kthread, tracing: Don't expose half-written comm when creating kthreads
- tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure
- tracing: Quiet gcc warning about maybe unused link 

[Kernel-packages] [Bug 1815033] Re: qlcnic: Firmware aborts/hangs in QLogic NIC

2019-03-27 Thread Guilherme G. Piccoli
The kernel was validated by the user who reported the issue: it ran for more
than 72h with no problems.

** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic


Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Bionic:
  Fix Committed



[Kernel-packages] [Bug 1815033] Re: qlcnic: Firmware aborts/hangs in QLogic NIC

2019-03-15 Thread Brad Figg
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-bionic' to 'verification-done-bionic'. If the
problem still exists, change the tag 'verification-needed-bionic' to
'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed. Thank you!


** Tags added: verification-needed-bionic


Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Bionic:
  Fix Committed



[Kernel-packages] [Bug 1815033] Re: qlcnic: Firmware aborts/hangs in QLogic NIC

2019-02-18 Thread Khaled El Mously
** Changed in: linux (Ubuntu Bionic)
   Status: In Progress => Fix Committed


Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Bionic:
  Fix Committed



[Kernel-packages] [Bug 1815033] Re: qlcnic: Firmware aborts/hangs in QLogic NIC

2019-02-07 Thread Guilherme G. Piccoli
** Changed in: linux (Ubuntu Bionic)
 Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)


Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Bionic:
  In Progress



[Kernel-packages] [Bug 1815033] Re: qlcnic: Firmware aborts/hangs in QLogic NIC

2019-02-07 Thread Stefan Bader
** Also affects: linux (Ubuntu Bionic)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu Bionic)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Bionic)
   Status: New => In Progress


Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Bionic:
  In Progress



[Kernel-packages] [Bug 1815033] Re: qlcnic: Firmware aborts/hangs in QLogic NIC

2019-02-07 Thread Guilherme G. Piccoli
The patch was posted to the mailing list for the SRU process:
https://lists.ubuntu.com/archives/kernel-team/2019-February/098380.html


Status in linux package in Ubuntu:
  Confirmed
