[Kernel-packages] [Bug 1810781] Re: mpt3sas - driver using the wrong register to update a queue index in FW
This bug was fixed in the package linux - 4.18.0-14.15

---
linux (4.18.0-14.15) cosmic; urgency=medium

  * linux: 4.18.0-14.15 -proposed tracker (LP: #1811406)

  * CPU hard lockup with rigorous writes to NVMe drive (LP: #1810998)
    - blk-wbt: Avoid lock contention and thundering herd issue in wbt_wait
    - blk-wbt: move disable check into get_limit()
    - blk-wbt: use wq_has_sleeper() for wq active check
    - blk-wbt: fix has-sleeper queueing check
    - blk-wbt: abstract out end IO completion handler
    - blk-wbt: improve waking of tasks

  * To reduce the Realtek USB cardreader power consumption (LP: #1811337)
    - mmc: core: Introduce MMC_CAP_SYNC_RUNTIME_PM
    - mmc: rtsx_usb_sdmmc: Don't runtime resume the device while changing led
    - mmc: rtsx_usb_sdmmc: Re-work runtime PM support
    - mmc: rtsx_usb_sdmmc: Re-work card detection/removal support
    - memstick: rtsx_usb_ms: Add missing pm_runtime_disable() in probe function
    - misc: rtsx_usb: Use USB remote wakeup signaling for card insertion detection
    - memstick: Prevent memstick host from getting runtime suspended during card detection
    - memstick: rtsx_usb_ms: Use ms_dev() helper
    - memstick: rtsx_usb_ms: Support runtime power management

  * Support non-strict iommu mode on arm64 (LP: #1806488)
    - iommu/io-pgtable-arm: Fix race handling in split_blk_unmap()
    - iommu/arm-smmu-v3: Implement flush_iotlb_all hook
    - iommu/dma: Add support for non-strict mode
    - iommu: Add "iommu.strict" command line option
    - iommu/io-pgtable-arm: Add support for non-strict mode
    - iommu/arm-smmu-v3: Add support for non-strict mode
    - iommu/io-pgtable-arm-v7s: Add support for non-strict mode
    - iommu/arm-smmu: Support non-strict mode

  * [Regression] crashkernel fails on HiSilicon D05 (LP: #1806766)
    - efi: honour memory reservations passed via a linux specific config table
    - efi/arm: libstub: add a root memreserve config table
    - efi: add API to reserve memory persistently across kexec reboot
    - irqchip/gic-v3-its: Change initialization ordering for LPIs
    - irqchip/gic-v3-its: Simplify LPI_PENDBASE_SZ usage
    - irqchip/gic-v3-its: Split property table clearing from allocation
    - irqchip/gic-v3-its: Move pending table allocation to init time
    - irqchip/gic-v3-its: Keep track of property table's PA and VA
    - irqchip/gic-v3-its: Allow use of pre-programmed LPI tables
    - irqchip/gic-v3-its: Use pre-programmed redistributor tables with kdump kernels
    - irqchip/gic-v3-its: Check that all RDs have the same property table
    - irqchip/gic-v3-its: Register LPI tables with EFI config table
    - irqchip/gic-v3-its: Allow use of LPI tables in reserved memory
    - arm64: memblock: don't permit memblock resizing until linear mapping is up
    - efi/arm: Defer persistent reservations until after paging_init()
    - efi: Permit calling efi_mem_reserve_persistent() from atomic context
    - efi: Prevent GICv3 WARN() by mapping the memreserve table before first use

  * ELAN900C:00 04F3:2844 touchscreen doesn't work (LP: #1811335)
    - pinctrl: cannonlake: Fix community ordering for H variant
    - pinctrl: cannonlake: Fix HOSTSW_OWN register offset of H variant

  * Add Cavium ThunderX2 SoC UNCORE PMU driver (LP: #1811200)
    - Documentation: perf: Add documentation for ThunderX2 PMU uncore driver
    - drivers/perf: Add Cavium ThunderX2 SoC UNCORE PMU driver
    - [Config] New config CONFIG_THUNDERX2_PMU=m

  * iptables connlimit allows more connections than the limit when using multiple CPUs (LP: #1811094)
    - netfilter: nf_conncount: don't skip eviction when age is negative

  * CVE-2018-16882
    - KVM: Fix UAF in nested posted interrupt processing

  * Cannot initialize ATA disk if IDENTIFY command fails (LP: #1809046)
    - scsi: libsas: check the ata device status by ata_dev_enabled()

  * scsi: libsas: fix a race condition when smp task timeout (LP: #1808912)
    - scsi: libsas: fix a race condition when smp task timeout

  * CVE-2018-14625
    - vhost/vsock: fix use-after-free in network stack callers

  * Fix and issue that LG I2C touchscreen stops working after reboot (LP: #1805085)
    - HID: i2c-hid: Disable runtime PM for LG touchscreen

  * Drivers: hv: vmbus: Offload the handling of channels to two workqueues (LP: #1807757)
    - Drivers: hv: vmbus: check the creation_status in vmbus_establish_gpadl()
    - Drivers: hv: vmbus: Offload the handling of channels to two workqueues

  * Disable LPM for Raydium Touchscreens (LP: #1802248)
    - USB: quirks: Add no-lpm quirk for Raydium touchscreens

  * Power leakage at S5 with Qualcomm Atheros QCA9377 802.11ac Wireless Network Adapter (LP: #1805607)
    - SAUCE: ath10k: provide reset function for QCA9377 chip

  * CVE-2018-19407
    - KVM: X86: Fix scan ioapic use-before-initialization

  * Fix USB2 device wrongly detected as USB1 (LP: #1806534)
    - xhci: Add quirk to workaround the
[Kernel-packages] [Bug 1810781] Re: mpt3sas - driver using the wrong register to update a queue index in FW
This bug was fixed in the package linux - 4.15.0-44.47

---
linux (4.15.0-44.47) bionic; urgency=medium

  * linux: 4.15.0-44.47 -proposed tracker (LP: #1811419)

  * Packaging resync (LP: #1786013)
    - [Packaging] update helper scripts

  * CPU hard lockup with rigorous writes to NVMe drive (LP: #1810998)
    - blk-wbt: pass in enum wbt_flags to get_rq_wait()
    - blk-wbt: Avoid lock contention and thundering herd issue in wbt_wait
    - blk-wbt: move disable check into get_limit()
    - blk-wbt: use wq_has_sleeper() for wq active check
    - blk-wbt: fix has-sleeper queueing check
    - blk-wbt: abstract out end IO completion handler
    - blk-wbt: improve waking of tasks

  * To reduce the Realtek USB cardreader power consumption (LP: #1811337)
    - mmc: sdhci: Disable 1.8v modes (HS200/HS400/UHS) if controller can't support 1.8v
    - mmc: core: Introduce MMC_CAP_SYNC_RUNTIME_PM
    - mmc: rtsx_usb_sdmmc: Don't runtime resume the device while changing led
    - mmc: rtsx_usb: Use MMC_CAP2_NO_SDIO
    - mmc: rtsx_usb: Enable MMC_CAP_ERASE to allow erase/discard/trim requests
    - mmc: rtsx_usb_sdmmc: Re-work runtime PM support
    - mmc: rtsx_usb_sdmmc: Re-work card detection/removal support
    - memstick: rtsx_usb_ms: Add missing pm_runtime_disable() in probe function
    - misc: rtsx_usb: Use USB remote wakeup signaling for card insertion detection
    - memstick: Prevent memstick host from getting runtime suspended during card detection
    - memstick: rtsx_usb_ms: Use ms_dev() helper
    - memstick: rtsx_usb_ms: Support runtime power management

  * Support non-strict iommu mode on arm64 (LP: #1806488)
    - iommu/io-pgtable-arm: Fix race handling in split_blk_unmap()
    - iommu/arm-smmu-v3: Implement flush_iotlb_all hook
    - iommu/dma: Add support for non-strict mode
    - iommu: Add "iommu.strict" command line option
    - iommu/io-pgtable-arm: Add support for non-strict mode
    - iommu/arm-smmu-v3: Add support for non-strict mode
    - iommu/io-pgtable-arm-v7s: Add support for non-strict mode
    - iommu/arm-smmu: Support non-strict mode

  * ELAN900C:00 04F3:2844 touchscreen doesn't work (LP: #1811335)
    - pinctrl: cannonlake: Fix community ordering for H variant
    - pinctrl: cannonlake: Fix HOSTSW_OWN register offset of H variant

  * Add Cavium ThunderX2 SoC UNCORE PMU driver (LP: #1811200)
    - perf: Export perf_event_update_userpage
    - Documentation: perf: Add documentation for ThunderX2 PMU uncore driver
    - drivers/perf: Add Cavium ThunderX2 SoC UNCORE PMU driver
    - [Config] New config CONFIG_THUNDERX2_PMU=m

  * Update hisilicon SoC-specific drivers (LP: #1810457)
    - SAUCE: Revert "net: hns3: Updates RX packet info fetch in case of multi BD"
    - Revert "UBUNTU: SAUCE: {topost} net: hns3: separate roce from nic when resetting"
    - Revert "UBUNTU: SAUCE: {topost} net: hns3: Use roce handle when calling roce callback function"
    - Revert "UBUNTU: SAUCE: {topost} net: hns3: Add calling roce callback function when link status change"
    - Revert "UBUNTU: SAUCE: {topost} net: hns3: optimize the process of notifying roce client"
    - Revert "UBUNTU: SAUCE: {topost} net: hns3: Add pf reset for hip08 RoCE"
    - scsi: hisi_sas: Remove depends on HAS_DMA in case of platform dependency
    - ethernet: hisilicon: hns: hns_dsaf_mac: Use generic eth_broadcast_addr
    - scsi: hisi_sas: consolidate command check in hisi_sas_get_ata_protocol()
    - scsi: hisi_sas: remove some unneeded structure members
    - scsi: hisi_sas: Introduce hisi_sas_phy_set_linkrate()
    - net: hns: Fix the process of adding broadcast addresses to tcam
    - net: hns3: remove redundant variable 'protocol'
    - scsi: hisi_sas: Drop hisi_sas_slot_abort()
    - net: hns: Make many functions static
    - net: hns: make hns_dsaf_roce_reset non static
    - net: hisilicon: hns: Replace mdelay() with msleep()
    - net: hns3: fix return value error while hclge_cmd_csq_clean failed
    - net: hns: remove redundant variables 'max_frm' and 'tmp_mac_key'
    - net: hns: Mark expected switch fall-through
    - net: hns3: Mark expected switch fall-through
    - net: hns3: Remove tx ring BD len register in hns3_enet
    - net: hns: modify variable type in hns_nic_reuse_page
    - net: hns: use eth_get_headlen interface instead of hns_nic_get_headlen
    - net: hns3: modify variable type in hns3_nic_reuse_page
    - net: hns3: Fix for vf vlan delete failed problem
    - net: hns3: Fix for multicast failure
    - net: hns3: Fix error of checking used vlan id
    - net: hns3: Implement shutdown ops in hns3 pci driver
    - net: hns3: Fix for loopback selftest failed problem
    - net: hns3: Fix ping exited problem when doing lp selftest
    - net: hns3: Preserve vlan 0 in hardware table
    - net: hns3: Only update mac configuation when necessary
    - net: hns3: Change the dst mac addr of loopback packet
    - net: hns3: Remove redundant codes of query
[Kernel-packages] [Bug 1810781] Re: mpt3sas - driver using the wrong register to update a queue index in FW
Verification done on Bionic, with the HWE kernel in Xenial (i.e., 4.15.0-44.47~16.04.1, per the original reporter's environment).

The mpt3sas driver is running correctly: the sosreport shows that the previous kernel logged mpt3sas fault_state error messages repeatedly, at intervals of less than 10 minutes, while the current kernel has logged none in ~76 minutes (~4600 seconds of uptime in kern.log).

$ grep -e 'Linux version' -e 'mpt3sas.*fault_state' sosreport-/var/log/kern.log
...
Jan 18 04:32:36 kernel: [18225729.321846] mpt3sas_cm0: fault_state(0x2100)!
Jan 18 04:41:08 kernel: [18226240.928889] mpt3sas_cm0: fault_state(0x2100)!
Jan 18 04:48:47 kernel: [18226700.312831] mpt3sas_cm0: fault_state(0x2100)!
Jan 18 04:57:29 kernel: [18227222.159601] mpt3sas_cm0: fault_state(0x2100)!
Jan 18 05:05:46 kernel: [18227719.430826] mpt3sas_cm0: fault_state(0x2100)!
Jan 18 05:12:52 kernel: [18228145.023317] mpt3sas_cm0: fault_state(0x2100)!
Jan 18 05:17:22 kernel: [18228414.970544] mpt3sas_cm0: fault_state(0x2100)!
Jan 18 05:22:22 kernel: [18228714.613254] mpt3sas_cm0: fault_state(0x2100)!
Jan 18 05:26:57 kernel: [18228989.680424] mpt3sas_cm0: fault_state(0x2100)!
Jan 18 05:36:14 kernel: [ 0.00] Linux version 4.15.0-44-generic (buildd@lcy01-amd64-025) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10)) #47~16.04.1-Ubuntu SMP Mon Jan 14 20:50:30 UTC 2019 (Ubuntu 4.15.0-44.47~16.04.1-generic 4.15.18)

$ tail -n1 sosreport-/var/log/kern.log
Jan 18 06:52:36 kernel: [ 4613.908291] perf: interrupt took too long (3958 > 3952), lowering kernel.perf_event_max_sample_rate to 50500

** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1810781

Title:
  mpt3sas - driver using the wrong register to update a queue index in FW

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Won't Fix
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Fix Committed
Status in linux source package in Disco:
  Fix Released

Bug description:
  [Impact]

  * Adapter resets periodically during high-load activity.

  * I/O stalls until reset/reinit is complete (latency) and I/O performance degrades across the cluster (e.g., low throughput from data spread over nodes).

  * The mpt3sas driver relies on a FW queue (Reply Post Descriptor Queue) in the I/O completion path; there is an MMIO register, called Reply Post Host Index, that the driver uses to flag an empty entry in that queue. This value is updated during the driver interrupt routine [in the _base_interrupt() function].

  * There are 2 registers representing the Reply Post Host Index, depending on the type of the adapter. They are differentiated in the driver through the "ioc->combined_reply_queue" check. According to the MPI specification (vendor spec), whether the driver should use the combined reply queue depends on the maximum number of MSI-X vectors that the adapter exposes and on the spec version (SAS 3.0 vs SAS 3.5).

  * Currently, this is checked incorrectly for a class of adapters, which was fixed in the upstream kernel commit 2b48be65685a [0]. Without this commit, we can observe spontaneous resets in the driver due to queue overflow (FW is not aware that there are free entries in the Reply Post Descriptor Queue). The dmesg log will show the following output in case of this error:

  mpt3sas_cm0: fault_state(0x2100)!
  mpt3sas_cm0: sending diag reset !!
  mpt3sas_cm0: diag reset: SUCCESS
  [followed by a lot of driver messages as result of the reset procedure]

  * During these resets, I/O is stalled, so performance may be affected.

  [Test Case]

  * It's not trivial to test the problem, but given a machine with an affected device, an I/O benchmark like FIO could be used to exercise the I/O path heavily and trigger the issue.

  * We have reports that the adapter "LSI Logic / Symbios Logic Device [1000:00ac]" is affected by the issue, and that this commit resolved the problem.

  [Regression Potential]

  * This is a long-standing issue in the mpt3sas driver, affecting only one class of adapters from this vendor. Since it is clearly a bug, the fix is necessary. The potential for regressions is unknown, but likely low: the patch changes the register used for the index updates only for adapters with a certain set of characteristics (according to the spec), which further restricts the scope of this patch.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1810781/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
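The before/after check in the verification message above (grep for 'Linux version' and 'fault_state' in kern.log) can also be scripted. The following is a minimal sketch, assuming kern.log lines in the format quoted in that message; the function name count_faults_per_boot and the abbreviated sample lines are illustrative only and are not part of any sosreport or driver tooling.

```python
import re

# Patterns taken from the log lines quoted in the verification message.
FAULT_RE = re.compile(r"mpt3sas_cm\d+: fault_state\(0x[0-9a-fA-F]+\)")
BOOT_RE = re.compile(r"Linux version (\S+)")


def count_faults_per_boot(lines):
    """Return a list of (kernel_version, fault_count), one entry per boot.

    Fault lines seen before the first 'Linux version' line belong to a
    boot whose version is unknown, so that entry uses None as its version.
    """
    boots = [[None, 0]]
    for line in lines:
        boot = BOOT_RE.search(line)
        if boot:
            # A new boot starts here; begin a fresh counter.
            boots.append([boot.group(1), 0])
        elif FAULT_RE.search(line):
            boots[-1][1] += 1
    if boots[0] == [None, 0]:
        # The log started exactly at a boot line; drop the empty entry.
        boots.pop(0)
    return [tuple(entry) for entry in boots]


# Sample lines abbreviated from the kern.log excerpt quoted above.
sample = [
    "Jan 18 05:22:22 kernel: [18228714.613254] mpt3sas_cm0: fault_state(0x2100)!",
    "Jan 18 05:26:57 kernel: [18228989.680424] mpt3sas_cm0: fault_state(0x2100)!",
    "Jan 18 05:36:14 kernel: [ 0.00] Linux version 4.15.0-44-generic (buildd@lcy01-amd64-025)",
    "Jan 18 06:52:36 kernel: [ 4613.908291] perf: interrupt took too long (3958 > 3952)",
]
result = count_faults_per_boot(sample)
print(result)
```

On a real log, the lines would come from iterating over the opened kern.log file instead of the sample list. A boot with a zero count only suggests the fix is working if the uptime and I/O load are comparable to the earlier boot.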
[Kernel-packages] [Bug 1810781] Re: mpt3sas - driver using the wrong register to update a queue index in FW
Verification done on Cosmic for regression on an older adapter model, I/O stress (iozone) finishes successfully, no errors seen in dmesg.
Waiting for verification on Bionic by the reporter.

root@dixie:~# fdisk /dev/sdb # create one partition
root@dixie:~# mkfs.ext4 /dev/sdb1
root@dixie:~# mount /dev/sdb1 /test/
root@dixie:~# cd /test
root@dixie:/test# iozone -R -s 2G -r 1m -S 2048 -i 0 -i 2 -i 8 -G -c -o -l 128 -u 128 -t 128
root@dixie:/test# dmesg | tail
<...>
[ 693.674243] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)

** Tags removed: verification-needed-cosmic
** Tags added: verification-done-cosmic
[Kernel-packages] [Bug 1810781] Re: mpt3sas - driver using the wrong register to update a queue index in FW
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-cosmic' to 'verification-done-cosmic'. If the problem still exists, change the tag 'verification-needed-cosmic' to 'verification-failed-cosmic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

** Tags added: verification-needed-cosmic
** Tags added: verification-needed-bionic
[Kernel-packages] [Bug 1810781] Re: mpt3sas - driver using the wrong register to update a queue index in FW
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!
[Kernel-packages] [Bug 1810781] Re: mpt3sas - driver using the wrong register to update a queue index in FW
** Changed in: linux (Ubuntu Bionic)
       Status: Confirmed => Fix Committed

** Changed in: linux (Ubuntu Cosmic)
       Status: Confirmed => Fix Committed
[Kernel-packages] [Bug 1810781] Re: mpt3sas - driver using the wrong register to update a queue index in FW
Patch submitted to kernel-team mailing list, got 2 ACKs.
https://lists.ubuntu.com/archives/kernel-team/2019-January/097471.html
[Kernel-packages] [Bug 1810781] Re: mpt3sas - driver using the wrong register to update a queue index in FW
** Description changed:

  [Impact]

  * Adapter resets periodically during high-load activity.

  * I/O stalls until reset/reinit is complete (latency) and I/O performance
  degrades across cluster (e.g., low throughput from data spread over nodes).

  * The mpt3sas driver relies in a FW queue (Reply Post Descriptor Queue)
  in the I/O completion path; there's a MMIO register that driver uses to
  flag an empty entry in such queue, called Reply Post Host Index. This
  value is updated during the driver interrupt routine [in
  _base_interrupt() function].

  * Happens that there are 2 registers representing the Reply Post Host
  Index according to the type of the adapter. They are differentiated in
  the driver through the "ioc->combined_reply_queue" check. By the MPI
  specification (vendor spec), driver should use this combined reply queue
  according to the number of maximum MSI-X vectors that the adapter
  exposes and the spec version (SAS 3.0 vs SAS 3.5).

  * Currently, this is wrong checked for a class of adapters, which was
  fixed in the upstream kernel commit 2b48be65685a [0]. Without this
  commit, we can observe spontaneous resets in the driver due to queue
  overflow (FW is not aware that there are free entries in the Reply Post
  Descriptor Queue). The dmesg log will show the following output in case
  of this error:

  mpt3sas_cm0: fault_state(0x2100)!
  mpt3sas_cm0: sending diag reset !!
  mpt3sas_cm0: diag reset: SUCCESS
  [followed by a lot of driver messages as result of the reset procedure]

  * During these resets, I/O is stalled so it may affect performance.

  [Test Case]

  * It's not trivial to test the problem, but given a machine with an
  affected device, an I/O benchmark like FIO could be used to exercise the
- I/O path in a heavy way and trigger the issue. We have reports that the
- adapter "LSI Logic / Symbios Logic Device [1000:00ac]" is affected by
- the issue.
+ I/O path in a heavy way and trigger the issue.
+
+ * We have reports that the adapter "LSI Logic / Symbios Logic Device
+ [1000:00ac]" is affected by the issue. And this commit resolved the
+ problem.

  [Regression Potential]

  * This is a long-term issue from the mpt3sas driver, affecting only a
- class of adapters of this vendor. Since it's a clear bug, the fix is
+ class of adapters of this vendor. Since it's a clearly bug, the fix is
  necessary. The potential of regressions is unknown, but likely low - it
  changes the register used for the index updates given some set of
  characteristics of the adapter (according to the spec.), which restricts
  even more the scope of this patch.
[Kernel-packages] [Bug 1810781] Re: mpt3sas - driver using the wrong register to update a queue index in FW
** Description changed:

  [Impact]
+
+ * Adapter resets periodically during high-load activity.
+
+ * I/O stalls until reset/reinit is complete (latency) and I/O performance
+ degrades across cluster (e.g., low throughput from data spread over nodes).

  * The mpt3sas driver relies in a FW queue (Reply Post Descriptor Queue)
  in the I/O completion path; there's a MMIO register that driver uses to
  flag an empty entry in such queue, called Reply Post Host Index. This
  value is updated during the driver interrupt routine [in
  _base_interrupt() function].

  * Happens that there are 2 registers representing the Reply Post Host
  Index according to the type of the adapter. They are differentiated in
  the driver through the "ioc->combined_reply_queue" check. By the MPI
  specification (vendor spec), driver should use this combined reply queue
  according to the number of maximum MSI-X vectors that the adapter
  exposes and the spec version (SAS 3.0 vs SAS 3.5).

  * Currently, this is wrong checked for a class of adapters, which was
  fixed in the upstream kernel commit 2b48be65685a [0]. Without this
  commit, we can observe spontaneous resets in the driver due to queue
  overflow (FW is not aware that there are free entries in the Reply Post
  Descriptor Queue). The dmesg log will show the following output in case
  of this error:

- mpt3sas_cm0: fault_state(0x2100)!
- mpt3sas_cm0: sending diag reset !!
- mpt3sas_cm0: diag reset: SUCCESS
+ mpt3sas_cm0: fault_state(0x2100)!
+ mpt3sas_cm0: sending diag reset !!
+ mpt3sas_cm0: diag reset: SUCCESS
  [followed by a lot of driver messages as result of the reset procedure]

  * During these resets, I/O is stalled so it may affect performance.

-
  [Test Case]

  * It's not trivial to test the problem, but given a machine with an
  affected device, an I/O benchmark like FIO could be used to exercise the
  I/O path in a heavy way and trigger the issue. We have reports that the
  adapter "LSI Logic / Symbios Logic Device [1000:00ac]" is affected by
  the issue.

-
  [Regression Potential]

  * This is a long-term issue from the mpt3sas driver, affecting only a
  class of adapters of this vendor. Since it's a clear bug, the fix is
  necessary. The potential of regressions is unknown, but likely low - it
  changes the register used for the index updates given some set of
  characteristics of the adapter (according to the spec.), which restricts
  even more the scope of this patch.
[Kernel-packages] [Bug 1810781] Re: mpt3sas - driver using the wrong register to update a queue index in FW
Xenial has no support for the SAS 3.5 class, so we won't backport the
patch - it's only needed in Bionic (4.15 / Xenial HWE) and Cosmic kernel
(4.18).

** Changed in: linux (Ubuntu Xenial)
       Status: Confirmed => Won't Fix

** Changed in: linux (Ubuntu Xenial)
   Importance: Critical => Medium

** Changed in: linux (Ubuntu Disco)
   Importance: Critical => Medium

** Changed in: linux (Ubuntu Xenial)
     Assignee: Guilherme G. Piccoli (gpiccoli) => Mauricio Faria de Oliveira (mfo)

** Changed in: linux (Ubuntu Bionic)
     Assignee: Guilherme G. Piccoli (gpiccoli) => Mauricio Faria de Oliveira (mfo)

** Changed in: linux (Ubuntu Cosmic)
     Assignee: Guilherme G. Piccoli (gpiccoli) => Mauricio Faria de Oliveira (mfo)

** Changed in: linux (Ubuntu Disco)
     Assignee: Guilherme G. Piccoli (gpiccoli) => Mauricio Faria de Oliveira (mfo)
[Kernel-packages] [Bug 1810781] Re: mpt3sas - driver using the wrong register to update a queue index in FW
** Changed in: linux (Ubuntu Bionic)
   Importance: Undecided => Critical

** Changed in: linux (Ubuntu Cosmic)
   Importance: Undecided => Critical

** Changed in: linux (Ubuntu Xenial)
   Importance: Undecided => Critical

** Changed in: linux (Ubuntu Xenial)
       Status: New => Confirmed

** Changed in: linux (Ubuntu Bionic)
       Status: New => Confirmed

** Changed in: linux (Ubuntu Cosmic)
       Status: New => Confirmed

** Changed in: linux (Ubuntu Disco)
       Status: Confirmed => Fix Released

** Changed in: linux (Ubuntu Cosmic)
     Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)

** Changed in: linux (Ubuntu Bionic)
     Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)

** Changed in: linux (Ubuntu Xenial)
     Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)
[Kernel-packages] [Bug 1810781] Re: mpt3sas - driver using the wrong register to update a queue index in FW
** Also affects: linux (Ubuntu Bionic)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Cosmic)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Disco)
   Importance: Critical
     Assignee: Guilherme G. Piccoli (gpiccoli)
       Status: Confirmed

** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
       Status: New
[Kernel-packages] [Bug 1810781] Re: mpt3sas - driver using the wrong register to update a queue index in FW
** Attachment added: "dmesg snippet showing the error"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1810781/+attachment/5227414/+files/dmesg-error