[Kernel-packages] [Bug 1810781] Re: mpt3sas - driver using the wrong register to update a queue index in FW

2019-02-04 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 4.18.0-14.15

---
linux (4.18.0-14.15) cosmic; urgency=medium

  * linux: 4.18.0-14.15 -proposed tracker (LP: #1811406)

  * CPU hard lockup with rigorous writes to NVMe drive (LP: #1810998)
- blk-wbt: Avoid lock contention and thundering herd issue in wbt_wait
- blk-wbt: move disable check into get_limit()
- blk-wbt: use wq_has_sleeper() for wq active check
- blk-wbt: fix has-sleeper queueing check
- blk-wbt: abstract out end IO completion handler
- blk-wbt: improve waking of tasks

  * To reduce the Realtek USB cardreader power consumption (LP: #1811337)
- mmc: core: Introduce MMC_CAP_SYNC_RUNTIME_PM
- mmc: rtsx_usb_sdmmc: Don't runtime resume the device while changing led
- mmc: rtsx_usb_sdmmc: Re-work runtime PM support
- mmc: rtsx_usb_sdmmc: Re-work card detection/removal support
- memstick: rtsx_usb_ms: Add missing pm_runtime_disable() in probe function
- misc: rtsx_usb: Use USB remote wakeup signaling for card insertion
  detection
- memstick: Prevent memstick host from getting runtime suspended during card
  detection
- memstick: rtsx_usb_ms: Use ms_dev() helper
- memstick: rtsx_usb_ms: Support runtime power management

  * Support non-strict iommu mode on arm64 (LP: #1806488)
- iommu/io-pgtable-arm: Fix race handling in split_blk_unmap()
- iommu/arm-smmu-v3: Implement flush_iotlb_all hook
- iommu/dma: Add support for non-strict mode
- iommu: Add "iommu.strict" command line option
- iommu/io-pgtable-arm: Add support for non-strict mode
- iommu/arm-smmu-v3: Add support for non-strict mode
- iommu/io-pgtable-arm-v7s: Add support for non-strict mode
- iommu/arm-smmu: Support non-strict mode

  * [Regression] crashkernel fails on HiSilicon D05 (LP: #1806766)
- efi: honour memory reservations passed via a linux specific config table
- efi/arm: libstub: add a root memreserve config table
- efi: add API to reserve memory persistently across kexec reboot
- irqchip/gic-v3-its: Change initialization ordering for LPIs
- irqchip/gic-v3-its: Simplify LPI_PENDBASE_SZ usage
- irqchip/gic-v3-its: Split property table clearing from allocation
- irqchip/gic-v3-its: Move pending table allocation to init time
- irqchip/gic-v3-its: Keep track of property table's PA and VA
- irqchip/gic-v3-its: Allow use of pre-programmed LPI tables
- irqchip/gic-v3-its: Use pre-programmed redistributor tables with kdump
  kernels
- irqchip/gic-v3-its: Check that all RDs have the same property table
- irqchip/gic-v3-its: Register LPI tables with EFI config table
- irqchip/gic-v3-its: Allow use of LPI tables in reserved memory
- arm64: memblock: don't permit memblock resizing until linear mapping is up
- efi/arm: Defer persistent reservations until after paging_init()
- efi: Permit calling efi_mem_reserve_persistent() from atomic context
- efi: Prevent GICv3 WARN() by mapping the memreserve table before first use

  * ELAN900C:00 04F3:2844 touchscreen doesn't work (LP: #1811335)
- pinctrl: cannonlake: Fix community ordering for H variant
- pinctrl: cannonlake: Fix HOSTSW_OWN register offset of H variant

  * Add Cavium ThunderX2 SoC UNCORE PMU driver (LP: #1811200)
- Documentation: perf: Add documentation for ThunderX2 PMU uncore driver
- drivers/perf: Add Cavium ThunderX2 SoC UNCORE PMU driver
- [Config] New config CONFIG_THUNDERX2_PMU=m

  * iptables connlimit allows more connections than the limit when using
    multiple CPUs (LP: #1811094)
- netfilter: nf_conncount: don't skip eviction when age is negative

  * CVE-2018-16882
- KVM: Fix UAF in nested posted interrupt processing

  * Cannot initialize ATA disk if IDENTIFY command fails (LP: #1809046)
- scsi: libsas: check the ata device status by ata_dev_enabled()

  * scsi: libsas: fix a race condition when smp task timeout (LP: #1808912)
- scsi: libsas: fix a race condition when smp task timeout

  * CVE-2018-14625
- vhost/vsock: fix use-after-free in network stack callers

  * Fix an issue that LG I2C touchscreen stops working after reboot
    (LP: #1805085)
- HID: i2c-hid: Disable runtime PM for LG touchscreen

  * Drivers: hv: vmbus: Offload the handling of channels to two workqueues
    (LP: #1807757)
- Drivers: hv: vmbus: check the creation_status in vmbus_establish_gpadl()
- Drivers: hv: vmbus: Offload the handling of channels to two workqueues

  * Disable LPM for Raydium Touchscreens (LP: #1802248)
- USB: quirks: Add no-lpm quirk for Raydium touchscreens

  * Power leakage at S5 with Qualcomm Atheros QCA9377 802.11ac Wireless Network
    Adapter (LP: #1805607)
- SAUCE: ath10k: provide reset function for QCA9377 chip

  * CVE-2018-19407
- KVM: X86: Fix scan ioapic use-before-initialization

  * Fix USB2 device wrongly detected as USB1 (LP: #1806534)
- xhci: Add quirk to workaround the 

[Kernel-packages] [Bug 1810781] Re: mpt3sas - driver using the wrong register to update a queue index in FW

2019-01-28 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 4.15.0-44.47

---
linux (4.15.0-44.47) bionic; urgency=medium

  * linux: 4.15.0-44.47 -proposed tracker (LP: #1811419)

  * Packaging resync (LP: #1786013)
- [Packaging] update helper scripts

  * CPU hard lockup with rigorous writes to NVMe drive (LP: #1810998)
- blk-wbt: pass in enum wbt_flags to get_rq_wait()
- blk-wbt: Avoid lock contention and thundering herd issue in wbt_wait
- blk-wbt: move disable check into get_limit()
- blk-wbt: use wq_has_sleeper() for wq active check
- blk-wbt: fix has-sleeper queueing check
- blk-wbt: abstract out end IO completion handler
- blk-wbt: improve waking of tasks

  * To reduce the Realtek USB cardreader power consumption (LP: #1811337)
- mmc: sdhci: Disable 1.8v modes (HS200/HS400/UHS) if controller can't
  support 1.8v
- mmc: core: Introduce MMC_CAP_SYNC_RUNTIME_PM
- mmc: rtsx_usb_sdmmc: Don't runtime resume the device while changing led
- mmc: rtsx_usb: Use MMC_CAP2_NO_SDIO
- mmc: rtsx_usb: Enable MMC_CAP_ERASE to allow erase/discard/trim requests
- mmc: rtsx_usb_sdmmc: Re-work runtime PM support
- mmc: rtsx_usb_sdmmc: Re-work card detection/removal support
- memstick: rtsx_usb_ms: Add missing pm_runtime_disable() in probe function
- misc: rtsx_usb: Use USB remote wakeup signaling for card insertion
  detection
- memstick: Prevent memstick host from getting runtime suspended during card
  detection
- memstick: rtsx_usb_ms: Use ms_dev() helper
- memstick: rtsx_usb_ms: Support runtime power management

  * Support non-strict iommu mode on arm64 (LP: #1806488)
- iommu/io-pgtable-arm: Fix race handling in split_blk_unmap()
- iommu/arm-smmu-v3: Implement flush_iotlb_all hook
- iommu/dma: Add support for non-strict mode
- iommu: Add "iommu.strict" command line option
- iommu/io-pgtable-arm: Add support for non-strict mode
- iommu/arm-smmu-v3: Add support for non-strict mode
- iommu/io-pgtable-arm-v7s: Add support for non-strict mode
- iommu/arm-smmu: Support non-strict mode

  * ELAN900C:00 04F3:2844 touchscreen doesn't work (LP: #1811335)
- pinctrl: cannonlake: Fix community ordering for H variant
- pinctrl: cannonlake: Fix HOSTSW_OWN register offset of H variant

  * Add Cavium ThunderX2 SoC UNCORE PMU driver (LP: #1811200)
- perf: Export perf_event_update_userpage
- Documentation: perf: Add documentation for ThunderX2 PMU uncore driver
- drivers/perf: Add Cavium ThunderX2 SoC UNCORE PMU driver
- [Config] New config CONFIG_THUNDERX2_PMU=m

  * Update hisilicon SoC-specific drivers (LP: #1810457)
- SAUCE: Revert "net: hns3: Updates RX packet info fetch in case of multi
  BD"
- Revert "UBUNTU: SAUCE: {topost} net: hns3: separate roce from nic when
  resetting"
- Revert "UBUNTU: SAUCE: {topost} net: hns3: Use roce handle when calling
  roce callback function"
- Revert "UBUNTU: SAUCE: {topost} net: hns3: Add calling roce callback
  function when link status change"
- Revert "UBUNTU: SAUCE: {topost} net: hns3: optimize the process of
  notifying roce client"
- Revert "UBUNTU: SAUCE: {topost} net: hns3: Add pf reset for hip08 RoCE"
- scsi: hisi_sas: Remove depends on HAS_DMA in case of platform dependency
- ethernet: hisilicon: hns: hns_dsaf_mac: Use generic eth_broadcast_addr
- scsi: hisi_sas: consolidate command check in hisi_sas_get_ata_protocol()
- scsi: hisi_sas: remove some unneeded structure members
- scsi: hisi_sas: Introduce hisi_sas_phy_set_linkrate()
- net: hns: Fix the process of adding broadcast addresses to tcam
- net: hns3: remove redundant variable 'protocol'
- scsi: hisi_sas: Drop hisi_sas_slot_abort()
- net: hns: Make many functions static
- net: hns: make hns_dsaf_roce_reset non static
- net: hisilicon: hns: Replace mdelay() with msleep()
- net: hns3: fix return value error while hclge_cmd_csq_clean failed
- net: hns: remove redundant variables 'max_frm' and 'tmp_mac_key'
- net: hns: Mark expected switch fall-through
- net: hns3: Mark expected switch fall-through
- net: hns3: Remove tx ring BD len register in hns3_enet
- net: hns: modify variable type in hns_nic_reuse_page
- net: hns: use eth_get_headlen interface instead of hns_nic_get_headlen
- net: hns3: modify variable type in hns3_nic_reuse_page
- net: hns3: Fix for vf vlan delete failed problem
- net: hns3: Fix for multicast failure
- net: hns3: Fix error of checking used vlan id
- net: hns3: Implement shutdown ops in hns3 pci driver
- net: hns3: Fix for loopback selftest failed problem
- net: hns3: Fix ping exited problem when doing lp selftest
- net: hns3: Preserve vlan 0 in hardware table
- net: hns3: Only update mac configuation when necessary
- net: hns3: Change the dst mac addr of loopback packet
- net: hns3: Remove redundant codes of query 

[Kernel-packages] [Bug 1810781] Re: mpt3sas - driver using the wrong register to update a queue index in FW

2019-01-18 Thread Mauricio Faria de Oliveira
Verification done on Bionic, with the HWE kernel in Xenial 
(i.e., 4.15.0-44.47~16.04.1 per the original reporter's environment)

The mpt3sas driver is running correctly: the sosreport shows that the
previous kernel logged mpt3sas fault_state errors repeatedly, at
intervals of less than 10 minutes, while the current kernel has logged
none in ~76 minutes (~4600 seconds uptime in kern.log).

$ grep -e 'Linux version' -e 'mpt3sas.*fault_state' sosreport-/var/log/kern.log
...
Jan 18 04:32:36  kernel: [18225729.321846] mpt3sas_cm0: fault_state(0x2100)!
Jan 18 04:41:08  kernel: [18226240.928889] mpt3sas_cm0: fault_state(0x2100)!
Jan 18 04:48:47  kernel: [18226700.312831] mpt3sas_cm0: fault_state(0x2100)!
Jan 18 04:57:29  kernel: [18227222.159601] mpt3sas_cm0: fault_state(0x2100)!
Jan 18 05:05:46  kernel: [18227719.430826] mpt3sas_cm0: fault_state(0x2100)!
Jan 18 05:12:52  kernel: [18228145.023317] mpt3sas_cm0: fault_state(0x2100)!
Jan 18 05:17:22  kernel: [18228414.970544] mpt3sas_cm0: fault_state(0x2100)!
Jan 18 05:22:22  kernel: [18228714.613254] mpt3sas_cm0: fault_state(0x2100)!
Jan 18 05:26:57  kernel: [18228989.680424] mpt3sas_cm0: fault_state(0x2100)!
Jan 18 05:36:14  kernel: [ 0.00] Linux version 4.15.0-44-generic (buildd@lcy01-amd64-025) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10)) #47~16.04.1-Ubuntu SMP Mon Jan 14 20:50:30 UTC 2019 (Ubuntu 4.15.0-44.47~16.04.1-generic 4.15.18)

$ tail -n1 sosreport-/var/log/kern.log
Jan 18 06:52:36  kernel: [ 4613.908291] perf: interrupt took too long (3958 > 3952), lowering kernel.perf_event_max_sample_rate to 50500
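
The same check can be scripted when the log is long. The following is only a sketch, not part of the original verification: it assumes each boot is marked by a "Linux version" banner and that faults appear as "mpt3sas_cmN: fault_state(0x...)" lines, as in the excerpt above. The function name faults_per_boot and the sample list are made up for illustration.

```python
import re

# Patterns assumed from the log excerpt above; adjust them if your log differs.
BOOT_RE = re.compile(r"Linux version (\S+)")
FAULT_RE = re.compile(r"mpt3sas\S*: fault_state\(0x[0-9a-fA-F]+\)")


def faults_per_boot(lines):
    """Return a list of [kernel_version, fault_count], one entry per boot.

    Lines seen before the first boot banner are counted under a
    placeholder entry, because the kernel version is not known for them.
    """
    boots = [["unknown (before first boot banner)", 0]]
    for line in lines:
        boot = BOOT_RE.search(line)
        if boot:
            # A new boot starts here; open a fresh counter for it.
            boots.append([boot.group(1), 0])
        elif FAULT_RE.search(line):
            boots[-1][1] += 1
    return boots


# Abbreviated sample built from the lines quoted above.
sample = [
    "Jan 18 05:22:22  kernel: [18228714.613254] mpt3sas_cm0: fault_state(0x2100)!",
    "Jan 18 05:26:57  kernel: [18228989.680424] mpt3sas_cm0: fault_state(0x2100)!",
    "Jan 18 05:36:14  kernel: [ 0.00] Linux version 4.15.0-44-generic (buildd@lcy01-amd64-025)",
    "Jan 18 06:52:36  kernel: [ 4613.908291] perf: interrupt took too long (3958 > 3952)",
]

for version, count in faults_per_boot(sample):
    print(f"{version}: {count} fault_state message(s)")
```

To run it against a real log, pass an open file object (for example, open("kern.log")) instead of the sample list.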

** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1810781

Title:
  mpt3sas - driver using the wrong register to update a queue index in
  FW

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Won't Fix
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Fix Committed
Status in linux source package in Disco:
  Fix Released

Bug description:
  [Impact]

  * Adapter resets periodically during high-load activity.

  * I/O stalls until reset/reinit is complete (latency) and I/O performance
  degrades across the cluster (e.g., low throughput from data spread over
  nodes).

  * The mpt3sas driver relies on a FW queue (Reply Post Descriptor
  Queue) in the I/O completion path; there is an MMIO register, called
  Reply Post Host Index, that the driver uses to flag an empty entry in
  that queue. This value is updated during the driver interrupt routine
  [in the _base_interrupt() function].

  * There are 2 registers representing the Reply Post Host Index,
  depending on the type of the adapter. They are differentiated in
  the driver through the "ioc->combined_reply_queue" check. By the MPI
  specification (vendor spec), the driver should use the combined reply
  queue according to the maximum number of MSI-X vectors that the
  adapter exposes and the spec version (SAS 3.0 vs SAS 3.5).

  * Currently, this check is wrong for a class of adapters, which was
  fixed in the upstream kernel commit 2b48be65685a [0]. Without this
  commit, we can observe spontaneous resets in the driver due to queue
  overflow (FW is not aware that there are free entries in the Reply
  Post Descriptor Queue). The dmesg log will show the following output
  in case of this error:

    mpt3sas_cm0: fault_state(0x2100)!
    mpt3sas_cm0: sending diag reset !!
    mpt3sas_cm0: diag reset: SUCCESS
  [followed by a lot of driver messages as a result of the reset procedure]

  * During these resets, I/O is stalled so it may affect performance.
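
  The overflow mechanism described above can be illustrated with a toy model. This is only a sketch under stated assumptions and is not the driver or firmware code: the register names "legacy" and "combined", the ToyFirmware class, the run() helper and the ring-full rule are all hypothetical. It shows that when the host index is written to a register the FW does not read, the FW never learns that entries were freed and eventually considers the queue full.

```python
class ToyFirmware:
    """Toy model of a FW that posts replies into a ring of `depth` entries."""

    def __init__(self, depth, watched_register):
        self.depth = depth
        self.watched = watched_register  # the register this FW actually reads
        self.registers = {"legacy": 0, "combined": 0}  # hypothetical names
        self.post_index = 0
        self.faulted = False

    def post_reply(self):
        """Post one reply, or fault if the ring looks full to the FW."""
        host_index = self.registers[self.watched]
        next_index = (self.post_index + 1) % self.depth
        if next_index == host_index:
            # As far as the FW can tell, the host has freed no entry.
            self.faulted = True
            return False
        self.post_index = next_index
        return True


def run(depth, watched, driver_writes_to, replies):
    """Post `replies` replies; the driver consumes each one immediately."""
    fw = ToyFirmware(depth, watched)
    consumed = 0
    for _ in range(replies):
        if not fw.post_reply():
            break
        # Driver interrupt routine: consume the entry, then publish the new
        # host index to the register chosen by the (possibly wrong) check.
        consumed += 1
        fw.registers[driver_writes_to] = fw.post_index
    return fw.faulted, consumed


# Wrong register: the FW faults once the ring wraps.
print(run(8, "combined", "legacy", 100))
# Right register: all replies are processed.
print(run(8, "combined", "combined", 100))
```

  In the real driver the choice between the two registers is made by the "ioc->combined_reply_queue" check; the model only captures why a wrong choice ends in a fault under load.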

  [Test Case]

  * It's not trivial to test the problem, but given a machine with an
  affected device, an I/O benchmark like FIO could be used to exercise
  the I/O path in a heavy way and trigger the issue.

  * We have reports that the adapter "LSI Logic / Symbios Logic Device
  [1000:00ac]" is affected by the issue, and that this commit resolved
  the problem.
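
  To check whether a machine has the adapter named above, the output of "lspci -nn" can be searched for its PCI ID. The following is a minimal sketch: the AFFECTED_IDS set contains only the one ID reported in this bug (other affected models may exist), and the function name and the sample output lines are made up for illustration.

```python
import re

# Only the ID named in this report; other affected models may exist.
AFFECTED_IDS = {("1000", "00ac")}

# Matches the [vendor:device] pair printed by `lspci -nn`.
ID_RE = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]", re.IGNORECASE)


def affected_devices(lspci_nn_output):
    """Return the lines of `lspci -nn` output that list an affected ID."""
    hits = []
    for line in lspci_nn_output.splitlines():
        for vendor, device in ID_RE.findall(line):
            if (vendor.lower(), device.lower()) in AFFECTED_IDS:
                hits.append(line.strip())
                break
    return hits


# Hypothetical sample output, for illustration only.
sample_output = """\
00:1f.2 SATA controller [0106]: Intel Corporation Device [8086:a102]
03:00.0 Serial Attached SCSI controller [0107]: LSI Logic / Symbios Logic Device [1000:00ac] (rev 01)
"""

for line in affected_devices(sample_output):
    print("possibly affected:", line)
```

  On a live system the input could be obtained with subprocess.run(["lspci", "-nn"], capture_output=True, text=True).stdout.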

  [Regression Potential]

  * This is a long-standing issue in the mpt3sas driver, affecting only a
  class of adapters from this vendor. Since it is clearly a bug, the fix is
  necessary. The potential for regressions is unknown, but likely low:
  the patch changes the register used for the index updates only for
  adapters with a given set of characteristics (according to the spec.),
  which restricts its scope even further.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1810781/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1810781] Re: mpt3sas - driver using the wrong register to update a queue index in FW

2019-01-17 Thread Mauricio Faria de Oliveira
Verification done on Cosmic, checking for regressions on an older adapter
model: I/O stress (iozone) finishes successfully, with no errors seen in
dmesg.

Waiting for verification on Bionic by the reporter.

root@dixie:~# fdisk /dev/sdb # create one partition
root@dixie:~# mkfs.ext4 /dev/sdb1
root@dixie:~# mount /dev/sdb1 /test/
root@dixie:~# cd /test
root@dixie:/test# iozone -R -s 2G -r 1m -S 2048 -i 0 -i 2 -i 8 -G -c -o -l 128 -u 128 -t 128
root@dixie:/test# dmesg | tail
<...>
[  693.674243] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)


** Tags removed: verification-needed-cosmic
** Tags added: verification-done-cosmic



[Kernel-packages] [Bug 1810781] Re: mpt3sas - driver using the wrong register to update a queue index in FW

2019-01-15 Thread Brad Figg
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-cosmic' to 'verification-done-cosmic'. If the
problem still exists, change the tag 'verification-needed-cosmic' to
'verification-failed-cosmic'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how
to enable and use -proposed. Thank you!


** Tags added: verification-needed-cosmic

** Tags added: verification-needed-bionic



[Kernel-packages] [Bug 1810781] Re: mpt3sas - driver using the wrong register to update a queue index in FW

2019-01-15 Thread Brad Figg
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-bionic' to 'verification-done-bionic'. If the
problem still exists, change the tag 'verification-needed-bionic' to
'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how
to enable and use -proposed. Thank you!



[Kernel-packages] [Bug 1810781] Re: mpt3sas - driver using the wrong register to update a queue index in FW

2019-01-07 Thread Khaled El Mously
** Changed in: linux (Ubuntu Bionic)
   Status: Confirmed => Fix Committed

** Changed in: linux (Ubuntu Cosmic)
   Status: Confirmed => Fix Committed



[Kernel-packages] [Bug 1810781] Re: mpt3sas - driver using the wrong register to update a queue index in FW

2019-01-07 Thread Mauricio Faria de Oliveira
Patch submitted to kernel-team mailing list, got 2 ACKs.

https://lists.ubuntu.com/archives/kernel-team/2019-January/097471.html

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Won't Fix
Status in linux source package in Bionic:
  Confirmed
Status in linux source package in Cosmic:
  Confirmed
Status in linux source package in Disco:
  Fix Released

Bug description:
  [Impact]

  * Adapter resets periodically during high-load activity.

  * I/O stalls until reset/reinit is complete (latency) and I/O performance
  degrades across cluster (e.g., low throughput from data spread over nodes).

  * The mpt3sas driver relies on a FW queue (Reply Post Descriptor
  Queue) in the I/O completion path; there is an MMIO register, called
  Reply Post Host Index, that the driver uses to flag an empty entry in
  that queue. This value is updated during the driver interrupt routine
  [in the _base_interrupt() function].

  * There are 2 registers representing the Reply Post Host Index,
  depending on the type of the adapter. They are differentiated in the
  driver through the "ioc->combined_reply_queue" check. By the MPI
  specification (vendor spec), whether the driver should use the
  combined reply queue depends on the maximum number of MSI-X vectors
  that the adapter exposes and the spec version (SAS 3.0 vs SAS 3.5).

  * Currently, this check is wrong for a class of adapters; this was
  fixed in the upstream kernel commit 2b48be65685a [0]. Without this
  commit, we can observe spontaneous resets in the driver due to queue
  overflow (FW is not aware that there are free entries in the Reply
  Post Descriptor Queue). The dmesg log will show the following output
  in case of this error:

    mpt3sas_cm0: fault_state(0x2100)!
    mpt3sas_cm0: sending diag reset !!
    mpt3sas_cm0: diag reset: SUCCESS
  [followed by a lot of driver messages as a result of the reset procedure]

  * During these resets, I/O is stalled so it may affect performance.
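
  The overflow mechanism described above can be illustrated with a toy
  model. This is only a sketch, not driver or firmware code: the class,
  the register names "reg_a"/"reg_b" and the queue depth are all
  hypothetical. It shows that when the host index is written to a
  register the firmware does not read, the firmware never sees freed
  entries and eventually believes the queue is full.

```python
class ToyFirmware:
    """Toy model: posts replies into a ring and faults when it sees no free entry."""

    def __init__(self, depth, watched_register):
        self.depth = depth
        # Register the (hypothetical) firmware actually reads the host index from.
        self.watched_register = watched_register
        # Two hypothetical host-index registers; names are illustrative only.
        self.registers = {"reg_a": 0, "reg_b": 0}
        self.post_index = 0
        self.faulted = False

    def post_reply(self):
        host_index = self.registers[self.watched_register]
        next_index = (self.post_index + 1) % self.depth
        if next_index == host_index:
            # From the firmware's point of view the ring is full.
            self.faulted = True
            return False
        self.post_index = next_index
        return True


def run(driver_register, completions=100, depth=8):
    """Return True if the toy firmware faults while the driver keeps up with replies."""
    fw = ToyFirmware(depth, watched_register="reg_a")
    host_index = 0
    for _ in range(completions):
        if not fw.post_reply():
            break
        # The driver consumes the entry and publishes its new host index.
        host_index = (host_index + 1) % depth
        fw.registers[driver_register] = host_index
    return fw.faulted
```

  In this model, run("reg_a") completes without a fault, while
  run("reg_b") faults even though the driver consumed every entry.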

  [Test Case]

  * It's not trivial to test the problem, but given a machine with an
  affected device, an I/O benchmark like FIO could be used to exercise
  the I/O path heavily and trigger the issue.

  * We have reports that the adapter "LSI Logic / Symbios Logic Device
  [1000:00ac]" is affected by the issue, and that this commit resolved
  the problem.
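
  While the benchmark runs, the kernel log can be checked for the reset
  signature quoted in [Impact]. A minimal sketch, assuming the log text
  (for example, saved dmesg output) is already available as a string;
  the function name and the regular expression are illustrative and only
  cover the three messages shown in this bug description.

```python
import re

# Matches the three messages quoted in the bug description, for any
# mpt3sas_cm<N> controller instance and any hexadecimal fault code.
RESET_RE = re.compile(
    r"(mpt3sas_cm\d+): "
    r"(fault_state\(0x[0-9a-fA-F]+\)!|sending diag reset !!|diag reset: SUCCESS)"
)


def find_reset_events(log_text):
    """Return (controller, message) pairs for each reset-related log line."""
    events = []
    for line in log_text.splitlines():
        match = RESET_RE.search(line)
        if match:
            events.append((match.group(1), match.group(2)))
    return events
```

  An empty result after a long, heavy run suggests (but does not prove)
  that the adapter did not hit the fault during that run.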

  [Regression Potential]

  * This is a long-standing issue in the mpt3sas driver, affecting only a
  class of adapters of this vendor. Since it's clearly a bug, the fix is
  necessary. The potential for regressions is unknown, but likely low:
  the fix changes the register used for the index updates only given some
  set of characteristics of the adapter (according to the spec.), which
  restricts the scope of this patch even more.



[Kernel-packages] [Bug 1810781] Re: mpt3sas - driver using the wrong register to update a queue index in FW

2019-01-07 Thread Mauricio Faria de Oliveira
** Description changed:

  [Impact]
  
  * Adapter resets periodically during high-load activity.
  
  * I/O stalls until reset/reinit is complete (latency) and I/O performance
  degrades across cluster (e.g., low throughput from data spread over nodes).
  
  * The mpt3sas driver relies in a FW queue (Reply Post Descriptor Queue)
  in the I/O completion path; there's a MMIO register that driver uses to
  flag an empty entry in such queue, called Reply Post Host Index. This
  value is updated during the driver interrupt routine [in
  _base_interrupt() function].
  
  * Happens that there are 2 registers representing the Reply Post Host
  Index according to the type of the adapter. They are differentiated in
  the driver through the "ioc->combined_reply_queue" check. By the MPI
  specification (vendor spec), driver should use this combined reply queue
  according to the number of maximum MSI-X vectors that the adapter
  exposes and the spec version (SAS 3.0 vs SAS 3.5).
  
  * Currently, this is wrong checked for a class of adapters, which was
  fixed in the upstream kernel commit 2b48be65685a [0]. Without this
  commit, we can observe spontaneous resets in the driver due to queue
  overflow (FW is not aware that there are free entries in the Reply
  Post Descriptor Queue). The dmesg log will show the following output
  in case of this error:
  
    mpt3sas_cm0: fault_state(0x2100)!
    mpt3sas_cm0: sending diag reset !!
    mpt3sas_cm0: diag reset: SUCCESS
  [followed by a lot of driver messages as result of the reset procedure]
  
  * During these resets, I/O is stalled so it may affect performance.
  
  [Test Case]
  
  * It's not trivial to test the problem, but given a machine with an
  affected device, an I/O benchmark like FIO could be used to exercise the
- I/O path in a heavy way and trigger the issue. We have reports that the
- adapter "LSI Logic / Symbios Logic Device [1000:00ac]" is affected by
- the issue.
+ I/O path in a heavy way and trigger the issue.
+ 
+ * We have reports that the adapter "LSI Logic / Symbios Logic Device
+ [1000:00ac]" is affected by the issue.  And this commit resolved the
+ problem.
  
  [Regression Potential]
  
  * This is a long-term issue from the mpt3sas driver, affecting only a
- class of adapters of this vendor. Since it's a clear bug, the fix is
+ class of adapters of this vendor. Since it's a clearly bug, the fix is
  necessary. The potential of regressions is unknown, but likely low - it
  changes the register used for the index updates given some set of
  characteristics of the adapter (according to the spec.), which restricts
  even more the scope of this patch.


[Kernel-packages] [Bug 1810781] Re: mpt3sas - driver using the wrong register to update a queue index in FW

2019-01-07 Thread Mauricio Faria de Oliveira
** Description changed:

  [Impact]
+ 
+ * Adapter resets periodically during high-load activity.
+ 
+ * I/O stalls until reset/reinit is complete (latency) and I/O performance
+ degrades across cluster (e.g., low throughput from data spread over nodes).
  
  * The mpt3sas driver relies in a FW queue (Reply Post Descriptor Queue)
  in the I/O completion path; there's a MMIO register that driver uses to
  flag an empty entry in such queue, called Reply Post Host Index. This
  value is updated during the driver interrupt routine [in
  _base_interrupt() function].
  
  * Happens that there are 2 registers representing the Reply Post Host
  Index according to the type of the adapter. They are differentiated in
  the driver through the "ioc->combined_reply_queue" check. By the MPI
  specification (vendor spec), driver should use this combined reply queue
  according to the number of maximum MSI-X vectors that the adapter
  exposes and the spec version (SAS 3.0 vs SAS 3.5).
  
  * Currently, this is wrong checked for a class of adapters, which was
  fixed in the upstream kernel commit 2b48be65685a [0]. Without this
  commit, we can observe spontaneous resets in the driver due to queue
  overflow (FW is not aware that there are free entries in the Reply
  Post Descriptor Queue). The dmesg log will show the following output
  in case of this error:
  
-   mpt3sas_cm0: fault_state(0x2100)! 
-   mpt3sas_cm0: sending diag reset !! 
-   mpt3sas_cm0: diag reset: SUCCESS 
+   mpt3sas_cm0: fault_state(0x2100)!
+   mpt3sas_cm0: sending diag reset !!
+   mpt3sas_cm0: diag reset: SUCCESS
  [followed by a lot of driver messages as result of the reset procedure]
  
  * During these resets, I/O is stalled so it may affect performance.
- 
  
  [Test Case]
  
  * It's not trivial to test the problem, but given a machine with an
  affected device, an I/O benchmark like FIO could be used to exercise the
  I/O path in a heavy way and trigger the issue. We have reports that the
  adapter "LSI Logic / Symbios Logic Device [1000:00ac]" is affected by
  the issue.
  
- 
  [Regression Potential]
  
  * This is a long-term issue from the mpt3sas driver, affecting only a
  class of adapters of this vendor. Since it's a clear bug, the fix is
  necessary. The potential of regressions is unknown, but likely low - it
  changes the register used for the index updates given some set of
  characteristics of the adapter (according to the spec.), which restricts
  even more the scope of this patch.


[Kernel-packages] [Bug 1810781] Re: mpt3sas - driver using the wrong register to update a queue index in FW

2019-01-07 Thread Guilherme G. Piccoli
Xenial has no support for the SAS 3.5 class, so we won't backport the
patch; it's only needed in the Bionic (4.15 / Xenial HWE) and Cosmic
(4.18) kernels.

** Changed in: linux (Ubuntu Xenial)
   Status: Confirmed => Won't Fix

** Changed in: linux (Ubuntu Xenial)
   Importance: Critical => Medium

** Changed in: linux (Ubuntu Disco)
   Importance: Critical => Medium

** Changed in: linux (Ubuntu Xenial)
 Assignee: Guilherme G. Piccoli (gpiccoli) => Mauricio Faria de Oliveira (mfo)

** Changed in: linux (Ubuntu Bionic)
 Assignee: Guilherme G. Piccoli (gpiccoli) => Mauricio Faria de Oliveira (mfo)

** Changed in: linux (Ubuntu Cosmic)
 Assignee: Guilherme G. Piccoli (gpiccoli) => Mauricio Faria de Oliveira (mfo)

** Changed in: linux (Ubuntu Disco)
 Assignee: Guilherme G. Piccoli (gpiccoli) => Mauricio Faria de Oliveira (mfo)



[Kernel-packages] [Bug 1810781] Re: mpt3sas - driver using the wrong register to update a queue index in FW

2019-01-07 Thread Guilherme G. Piccoli
** Changed in: linux (Ubuntu Bionic)
   Importance: Undecided => Critical

** Changed in: linux (Ubuntu Cosmic)
   Importance: Undecided => Critical

** Changed in: linux (Ubuntu Xenial)
   Importance: Undecided => Critical

** Changed in: linux (Ubuntu Xenial)
   Status: New => Confirmed

** Changed in: linux (Ubuntu Bionic)
   Status: New => Confirmed

** Changed in: linux (Ubuntu Cosmic)
   Status: New => Confirmed

** Changed in: linux (Ubuntu Disco)
   Status: Confirmed => Fix Released

** Changed in: linux (Ubuntu Cosmic)
 Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)

** Changed in: linux (Ubuntu Bionic)
 Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)

** Changed in: linux (Ubuntu Xenial)
 Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)



[Kernel-packages] [Bug 1810781] Re: mpt3sas - driver using the wrong register to update a queue index in FW

2019-01-07 Thread Dan Streetman
** Also affects: linux (Ubuntu Bionic)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Cosmic)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Disco)
   Importance: Critical
 Assignee: Guilherme G. Piccoli (gpiccoli)
   Status: Confirmed

** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
   Status: New



[Kernel-packages] [Bug 1810781] Re: mpt3sas - driver using the wrong register to update a queue index in FW

2019-01-07 Thread Guilherme G. Piccoli
** Attachment added: "dmesg snippet showing the error"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1810781/+attachment/5227414/+files/dmesg-error
