[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations
Hi Evan,

The SRU cycle has completed, and all kernels containing the Raid10 block discard performance patches have now been released to -updates. Note that the versions are different from the kernels in -proposed, because the kernel team needed to do a last-minute respin to fix two sets of CVEs, one for Broadcom wifi chipsets and the other for bpf, hence the kernels being released a day later than usual.

The released kernels are:

Hirsute: 5.11.0-22-generic
Groovy: 5.8.0-59-generic
Focal: 5.4.0-77-generic
Bionic: 4.15.0-147-generic

The HWE equivalents have also been released to -updates. You may now install these kernels on your systems and enjoy fast block discard for your Raid10 arrays.

Thanks,
Matthew

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1896578

Title:
  raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

Status in linux package in Ubuntu: Fix Released
Status in linux source package in Bionic: Fix Released
Status in linux source package in Focal: Fix Released
Status in linux source package in Groovy: Fix Released
Status in linux source package in Hirsute: Fix Released

Bug description:
BugLink: https://bugs.launchpad.net/bugs/1896578

[Impact]

Block discard is very slow on Raid10, which causes common use cases which invoke block discard, such as mkfs and fstrim operations, to take a very long time.

For example, on an i3.8xlarge instance on AWS, which has 4x 1.9TB NVMe devices that support block discard, a mkfs.xfs operation on Raid10 takes between 8 and 11 minutes, whereas the same mkfs.xfs operation on Raid0 takes 4 seconds. The bigger the devices, the longer it takes.

The cause is that Raid10 currently uses a 512k chunk size, and uses this for the discard_max_bytes value.
If we need to discard 1.9TB, the kernel splits the request into millions of 512k bio requests, even if the underlying device supports larger requests. For example, the NVMe devices on i3.8xlarge support 2.2TB of discard at once:

$ cat /sys/block/nvme0n1/queue/discard_max_bytes
2199023255040
$ cat /sys/block/nvme0n1/queue/discard_max_hw_bytes
2199023255040

whereas the Raid10 md device only supports 512k:

$ cat /sys/block/md0/queue/discard_max_bytes
524288
$ cat /sys/block/md0/queue/discard_max_hw_bytes
524288

If we perform a mkfs.xfs operation on the /dev/md array, it takes over 11 minutes, and if we examine the stack, it is stuck in blkdev_issue_discard():

$ sudo cat /proc/1626/stack
[<0>] wait_barrier+0x14c/0x230 [raid10]
[<0>] regular_request_wait+0x39/0x150 [raid10]
[<0>] raid10_write_request+0x11e/0x850 [raid10]
[<0>] raid10_make_request+0xd7/0x150 [raid10]
[<0>] md_handle_request+0x123/0x1a0
[<0>] md_submit_bio+0xda/0x120
[<0>] __submit_bio_noacct+0xde/0x320
[<0>] submit_bio_noacct+0x4d/0x90
[<0>] submit_bio+0x4f/0x1b0
[<0>] __blkdev_issue_discard+0x154/0x290
[<0>] blkdev_issue_discard+0x5d/0xc0
[<0>] blk_ioctl_discard+0xc4/0x110
[<0>] blkdev_common_ioctl+0x56c/0x840
[<0>] blkdev_ioctl+0xeb/0x270
[<0>] block_ioctl+0x3d/0x50
[<0>] __x64_sys_ioctl+0x91/0xc0
[<0>] do_syscall_64+0x38/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[Fix]

Xiao Ni has developed a patchset which resolves the block discard performance problems. These commits have now landed in 5.13-rc1.
commit cf78408f937a67f59f5e90ee8e6cadeed7c128a8
Author: Xiao Ni
Date: Thu Feb 4 15:50:43 2021 +0800
Subject: md: add md_submit_discard_bio() for submitting discard bio
Link: https://github.com/torvalds/linux/commit/cf78408f937a67f59f5e90ee8e6cadeed7c128a8

commit c2968285925adb97b9aa4ede94c1f1ab61ce0925
Author: Xiao Ni
Date: Thu Feb 4 15:50:44 2021 +0800
Subject: md/raid10: extend r10bio devs to raid disks
Link: https://github.com/torvalds/linux/commit/c2968285925adb97b9aa4ede94c1f1ab61ce0925

commit f2e7e269a7525317752d472bb48a549780e87d22
Author: Xiao Ni
Date: Thu Feb 4 15:50:45 2021 +0800
Subject: md/raid10: pull the code that wait for blocked dev into one function
Link: https://github.com/torvalds/linux/commit/f2e7e269a7525317752d472bb48a549780e87d22

commit d30588b2731fb01e1616cf16c3fe79a1443e29aa
Author: Xiao Ni
Date: Thu Feb 4 15:50:46 2021 +0800
Subject: md/raid10: improve raid10 discard request
Link: https://github.com/torvalds/linux/commit/d30588b2731fb01e1616cf16c3fe79a1443e29aa

commit 254c271da0712ea8914f187588e0f81f7678ee2f
Author: Xiao Ni
Date: Thu Feb 4 15:50:47 2021 +0800
Subject: md/raid10: improve discard request for far layout
Link: https://github.com/torvalds/linux/commit/254c271da0712ea8914f187588e0f81f7678ee2f

There is also an additional commit which is required, and was merged after "md/raid10: improve raid10 discard request" was merged. The following commit enables Raid10 to use large discards, instead of splitting into many bios, since the technical hurdles have now been removed.
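To put the Impact numbers above in perspective, here is a quick back-of-the-envelope calculation using the sysfs values quoted earlier (a sketch only; the 1.9TB device size is approximate):

```shell
# How many discard bios must md issue for one ~1.9 TB member device
# when each request is capped at the 512 KiB raid10 discard_max_bytes,
# versus the NVMe hardware limit of ~2.2 TB per request?
DEVICE_BYTES=1900000000000        # ~1.9 TB member device (approximate)
MD_MAX=524288                     # md0 discard_max_bytes (512 KiB)
NVME_MAX=2199023255040            # nvme0n1 discard_max_bytes (~2.2 TB)

# Ceiling division: (a + b - 1) / b
echo "bios through raid10: $(( (DEVICE_BYTES + MD_MAX - 1) / MD_MAX ))"
echo "bios sent directly:  $(( (DEVICE_BYTES + NVME_MAX - 1) / NVME_MAX ))"
```

The raid10 path turns one discard into roughly 3.6 million 512k bios, while the NVMe hardware could take the entire range in a single request, which is why mkfs takes minutes instead of seconds.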
[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations
This bug was fixed in the package linux - 4.15.0-147.151

---
linux (4.15.0-147.151) bionic; urgency=medium

  * CVE-2021-3444
    - bpf: Fix truncation handling for mod32 dst reg wrt zero
  * CVE-2021-3600
    - SAUCE: bpf: Do not use ax register in interpreter on div/mod
    - bpf: fix subprog verifier bypass by div/mod by 0 exception
    - SAUCE: bpf: Fix 32-bit register truncation on div/mod instruction

linux (4.15.0-146.150) bionic; urgency=medium

  * UAF on CAN BCM bcm_rx_handler (LP: #1931855)
    - SAUCE: can: bcm: delay release of struct bcm_op after synchronize_rcu

linux (4.15.0-145.149) bionic; urgency=medium

  * bionic/linux: 4.15.0-145.149 -proposed tracker (LP: #1929967)
  * Packaging resync (LP: #1786013)
    - update dkms package versions
  * raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations (LP: #1896578)
    - md: add md_submit_discard_bio() for submitting discard bio
    - md/raid10: extend r10bio devs to raid disks
    - md/raid10: pull the code that wait for blocked dev into one function
    - md/raid10: improve raid10 discard request
    - md/raid10: improve discard request for far layout
  * CVE-2021-23133
    - sctp: delay auto_asconf init until binding the first addr
  * Bionic update: upstream stable patchset 2021-05-25 (LP: #1929603)
    - Input: nspire-keypad - enable interrupts only when opened
    - dmaengine: dw: Make it dependent to HAS_IOMEM
    - ARM: dts: Fix moving mmc devices with aliases for omap4 & 5
    - arc: kernel: Return -EFAULT if copy_to_user() fails
    - neighbour: Disregard DEAD dst in neigh_update
    - ARM: keystone: fix integer overflow warning
    - ASoC: fsl_esai: Fix TDM slot setup for I2S mode
    - scsi: scsi_transport_srp: Don't block target in SRP_PORT_LOST state
    - net: ieee802154: stop dump llsec keys for monitors
    - net: ieee802154: stop dump llsec devs for monitors
    - net: ieee802154: forbid monitor for add llsec dev
    - net: ieee802154: stop dump llsec devkeys for monitors
    - net: ieee802154: forbid monitor for add llsec devkey
    - net: ieee802154: stop dump llsec seclevels for monitors
    - net: ieee802154: forbid monitor for add llsec seclevel
    - pcnet32: Use pci_resource_len to validate PCI resource
    - mac80211: clear sta->fast_rx when STA removed from 4-addr VLAN
    - Input: i8042 - fix Pegatron C15B ID entry
    - HID: wacom: set EV_KEY and EV_ABS only for non-HID_GENERIC type of devices
    - readdir: make sure to verify directory entry for legacy interfaces too
    - arm64: fix inline asm in load_unaligned_zeropad()
    - arm64: alternatives: Move length validation in alternative_{insn, endif}
    - scsi: libsas: Reset num_scatter if libata marks qc as NODATA
    - netfilter: conntrack: do not print icmpv6 as unknown via /proc
    - netfilter: nft_limit: avoid possible divide error in nft_limit_init
    - net: davicom: Fix regulator not turned off on failed probe
    - net: sit: Unregister catch-all devices
    - i40e: fix the panic when running bpf in xdpdrv mode
    - ibmvnic: avoid calling napi_disable() twice
    - ibmvnic: remove duplicate napi_schedule call in do_reset function
    - ibmvnic: remove duplicate napi_schedule call in open function
    - ARM: footbridge: fix PCI interrupt mapping
    - ARM: 9071/1: uprobes: Don't hook on thumb instructions
    - pinctrl: lewisburg: Update number of pins in community
    - HID: wacom: Assign boolean values to a bool variable
    - ARM: dts: Fix swapped mmc order for omap3
    - net: geneve: check skb is large enough for IPv4/IPv6 header
    - s390/entry: save the caller of psw_idle
    - xen-netback: Check for hotplug-status existence before watching
    - cavium/liquidio: Fix duplicate argument
    - ia64: fix discontig.c section mismatches
    - ia64: tools: remove duplicate definition of ia64_mf() on ia64
    - x86/crash: Fix crash_setup_memmap_entries() out-of-bounds access
    - net: hso: fix NULL-deref on disconnect regression
    - USB: CDC-ACM: fix poison/unpoison imbalance
    - lockdep: Add a missing initialization hint to the "INFO: Trying to register non-static key" message
    - drm/msm: Fix a5xx/a6xx timestamps
    - Input: s6sy761 - fix coordinate read bit shift
    - net: ip6_tunnel: Unregister catch-all devices
    - ACPI: tables: x86: Reserve memory occupied by ACPI tables
    - ACPI: x86: Call acpi_boot_table_init() after acpi_table_upgrade()
    - net: usb: ax88179_178a: initialize local variables before use
    - iwlwifi: Fix softirq/hardirq disabling in iwl_pcie_enqueue_hcmd()
    - mips: Do not include hi and lo in clobber list for R6
    - bpf: Fix masking negation logic upon negative dst register
    - iwlwifi: Fix softirq/hardirq disabling in iwl_pcie_gen2_enqueue_hcmd()
    - ALSA: usb-audio: Add MIDI quirk for Vox ToneLab EX
    - USB: Add reset-resume quirk for WD19's Realtek Hub
    - platform/x86: thinkpad_acpi: Correct thermal sensor allocation
  * r8152 tx
[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations
This bug was fixed in the package linux - 5.8.0-59.66

---
linux (5.8.0-59.66) groovy; urgency=medium

  * UAF on CAN J1939 j1939_can_recv (LP: #1932209)
    - SAUCE: can: j1939: delay release of j1939_priv after synchronize_rcu
  * UAF on CAN BCM bcm_rx_handler (LP: #1931855)
    - SAUCE: can: bcm: delay release of struct bcm_op after synchronize_rcu

linux (5.8.0-57.64) groovy; urgency=medium

  * groovy/linux: 5.8.0-57.64 -proposed tracker (LP: #1932047)
  * pmtu.sh from selftests.net in linux ADT test failure with linux/5.8.0-56.63 (LP: #1931731)
    - net: geneve: modify IP header check in geneve6_xmit_skb and geneve_xmit_skb

linux (5.8.0-56.63) groovy; urgency=medium

  * groovy/linux: 5.8.0-56.63 -proposed tracker (LP: #1930052)
  * Packaging resync (LP: #1786013)
    - update dkms package versions
  * scsi: storvsc: Parameterize number hardware queues (LP: #1930626)
    - scsi: storvsc: Parameterize number hardware queues
  * CVE-2021-33200
    - bpf: Wrap aux data inside bpf_sanitize_info container
    - bpf: Fix mask direction swap upon off reg sign change
    - bpf: No need to simulate speculative domain for immediates
  * CVE-2021-3490
    - SAUCE: Revert "UBUNTU: SAUCE: bpf: verifier: fix ALU32 bounds tracking with bitwise ops"
    - gpf: Fix alu32 const subreg bound tracking on bitwise operations
  * CVE-2021-3489
    - SAUCE: Revert "UBUNTU: SAUCE: bpf: prevent writable memory-mapping of read-only ringbuf pages"
    - bpf: Prevent writable memory-mapping of read-only ringbuf pages
  * Realtek USB hubs in Dell WD19SC/DC/TB fail to work after exiting s2idle (LP: #1928242)
    - USB: Verify the port status when timeout happens during port suspend
  * CVE-2020-26145
    - ath10k: drop fragments with multicast DA for SDIO
    - ath10k: add CCMP PN replay protection for fragmented frames for PCIe
    - ath10k: drop fragments with multicast DA for PCIe
  * CVE-2020-26141
    - ath10k: Fix TKIP Michael MIC verification for PCIe
  * CVE-2020-24587
    - ath11k: Clear the fragment cache during key install
  * CVE-2020-24588
    - mac80211: properly handle A-MSDUs that start with an RFC 1042 header
    - cfg80211: mitigate A-MSDU aggregation attacks
    - mac80211: drop A-MSDUs on old ciphers
    - ath10k: drop MPDU which has discard flag set by firmware for SDIO
  * CVE-2020-26139
    - mac80211: do not accept/forward invalid EAPOL frames
  * CVE-2020-24586 // CVE-2020-24587 // CVE-2020-24587 for such cases.
    - mac80211: extend protection against mixed key and fragment cache attacks
  * CVE-2020-24586 // CVE-2020-24587
    - mac80211: prevent mixed key and fragment cache attacks
    - mac80211: add fragment cache to sta_info
    - mac80211: check defrag PN against current frame
    - mac80211: prevent attacks on TKIP/WEP as well
  * CVE-2020-26147
    - mac80211: assure all fragments are encrypted
  * raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations (LP: #1896578)
    - md: add md_submit_discard_bio() for submitting discard bio
    - md/raid10: extend r10bio devs to raid disks
    - md/raid10: pull the code that wait for blocked dev into one function
    - md/raid10: improve raid10 discard request
    - md/raid10: improve discard request for far layout
    - dm raid: remove unnecessary discard limits for raid0 and raid10
  * [SRU] mpt3sas: only one vSES is handy even IOC has multi vSES (LP: #1926517)
    - scsi: mpt3sas: Only one vSES is present even when IOC has multi vSES
  * CVE-2021-23133
    - sctp: delay auto_asconf init until binding the first addr
  * kvm: properly tear down PV features on hibernate (LP: #1920944)
    - x86/kvm: Fix pr_info() for async PF setup/teardown
    - x86/kvm: Teardown PV features on boot CPU as well
    - x86/kvm: Disable kvmclock on all CPUs on shutdown
    - x86/kvm: Disable all PV features on crash
    - x86/kvm: Unify kvm_pv_guest_cpu_reboot() with kvm_guest_cpu_offline()
  * CVE-2021-31440
    - bpf: Fix propagation of 32 bit unsigned bounds from 64 bit bounds
  * Can't detect intel wifi 6235 (LP: #1920180)
    - SAUCE: iwlwifi: add new pci id for 6235
  * [SRU] Patch for flicker and glitching on common LCD display panels, intel framebuffer (LP: #1925685)
    - drm/i915: Try to use fast+narrow link on eDP again and fall back to the old max strategy on failure
    - drm/i915/dp: Use slow and wide link training for everything
  * pmtu.sh from net in ubuntu_kernel_selftests failed with no error message (LP: #1887661)
    - selftests: pmtu.sh: use $ksft_skip for skipped return code
  * IR Remote Keys Repeat Many Times Starting with Kernel 5.8.0-49 (LP: #1926030)
    - SAUCE: Revert "media: rc: ite-cir: fix min_timeout calculation"
    - SAUCE: Revert "media: rc: fix timeout handling after switch to microsecond durations"
  * Groovy update: upstream stable patchset 2021-05-20 (LP: #1929132)
    - Input: nspire-keypad - enable
[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations
This bug was fixed in the package linux - 5.11.0-22.23

---
linux (5.11.0-22.23) hirsute; urgency=medium

  * UAF on CAN J1939 j1939_can_recv (LP: #1932209)
    - SAUCE: can: j1939: delay release of j1939_priv after synchronize_rcu
  * UAF on CAN BCM bcm_rx_handler (LP: #1931855)
    - SAUCE: can: bcm: delay release of struct bcm_op after synchronize_rcu

linux (5.11.0-20.21) hirsute; urgency=medium

  * hirsute/linux: 5.11.0-20.21 -proposed tracker (LP: #1930854)
  * ath11k WIFI not working in proposed kernel 5.11.0-19-generic (LP: #1930637)
    - bus: mhi: core: Download AMSS image from appropriate function

linux (5.11.0-19.20) hirsute; urgency=medium

  * hirsute/linux: 5.11.0-19.20 -proposed tracker (LP: #1930075)
  * Packaging resync (LP: #1786013)
    - update dkms package versions
  * CVE-2021-33200
    - bpf: Wrap aux data inside bpf_sanitize_info container
    - bpf: Fix mask direction swap upon off reg sign change
    - bpf: No need to simulate speculative domain for immediates
  * AX201 BT will cause system could not enter S0i3 (LP: #1928047)
    - SAUCE: drm/i915: Tweaked Wa_14010685332 for all PCHs
  * CVE-2021-3490
    - SAUCE: Revert "UBUNTU: SAUCE: bpf: verifier: fix ALU32 bounds tracking with bitwise ops"
    - gpf: Fix alu32 const subreg bound tracking on bitwise operations
  * CVE-2021-3489
    - SAUCE: Revert "UBUNTU: SAUCE: bpf: prevent writable memory-mapping of read-only ringbuf pages"
    - bpf: Prevent writable memory-mapping of read-only ringbuf pages
  * Select correct boot VGA when BIOS doesn't do it properly (LP: #1929217)
    - vgaarb: Use ACPI HID name to find integrated GPU
  * Realtek USB hubs in Dell WD19SC/DC/TB fail to work after exiting s2idle (LP: #1928242)
    - USB: Verify the port status when timeout happens during port suspend
  * CVE-2020-26145
    - ath10k: drop fragments with multicast DA for SDIO
    - ath10k: add CCMP PN replay protection for fragmented frames for PCIe
    - ath10k: drop fragments with multicast DA for PCIe
  * CVE-2020-26141
    - ath10k: Fix TKIP Michael MIC verification for PCIe
  * CVE-2020-24587
    - ath11k: Clear the fragment cache during key install
  * CVE-2020-24588
    - mac80211: properly handle A-MSDUs that start with an RFC 1042 header
    - cfg80211: mitigate A-MSDU aggregation attacks
    - mac80211: drop A-MSDUs on old ciphers
    - ath10k: drop MPDU which has discard flag set by firmware for SDIO
  * CVE-2020-26139
    - mac80211: do not accept/forward invalid EAPOL frames
  * CVE-2020-24586 // CVE-2020-24587 // CVE-2020-24587 for such cases.
    - mac80211: extend protection against mixed key and fragment cache attacks
  * CVE-2020-24586 // CVE-2020-24587
    - mac80211: prevent mixed key and fragment cache attacks
    - mac80211: add fragment cache to sta_info
    - mac80211: check defrag PN against current frame
    - mac80211: prevent attacks on TKIP/WEP as well
  * CVE-2020-26147
    - mac80211: assure all fragments are encrypted
  * raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations (LP: #1896578)
    - md: add md_submit_discard_bio() for submitting discard bio
    - md/raid10: extend r10bio devs to raid disks
    - md/raid10: pull the code that wait for blocked dev into one function
    - md/raid10: improve raid10 discard request
    - md/raid10: improve discard request for far layout
    - dm raid: remove unnecessary discard limits for raid0 and raid10
  * [SRU][OEM-5.10/H] Fix typec output on AMD Cezanne GPU (LP: #1929646)
    - drm/amd/display: use max lb for latency hiding
  * kvm: properly tear down PV features on hibernate (LP: #1920944)
    - x86/kvm: Fix pr_info() for async PF setup/teardown
    - x86/kvm: Teardown PV features on boot CPU as well
    - x86/kvm: Disable kvmclock on all CPUs on shutdown
    - x86/kvm: Disable all PV features on crash
    - x86/kvm: Unify kvm_pv_guest_cpu_reboot() with kvm_guest_cpu_offline()
  * Add support for AMD wireless button (LP: #1928820)
    - platform/x86: hp-wireless: add AMD's hardware id to the supported list
  * Can't detect intel wifi 6235 (LP: #1920180)
    - SAUCE: iwlwifi: add new pci id for 6235
  * Speed up resume time on HP laptops (LP: #1929048)
    - platform/x86: hp_accel: Avoid invoking _INI to speed up resume
  * Fix kernel panic on Intel Bluetooth (LP: #1928838)
    - Bluetooth: Shutdown controller after workqueues are flushed or cancelled
  * build module CONFIG_SND_SOC_INTEL_SOUNDWIRE_SOF_MACH=m for 5.11, 5.13-rc2 and later (LP: #1921632)
    - [Config] enable soundwire audio mach driver
  * [SRU] Patch for flicker and glitching on common LCD display panels, intel framebuffer (LP: #1925685)
    - drm/i915: Try to use fast+narrow link on eDP again and fall back to the old max strategy on failure
    - drm/i915/dp: Use slow and wide link training for everything
  * Fix screen flickering when two 4K 60Hz monitors are connected to AMD Oland
[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations
This bug was fixed in the package linux - 5.4.0-77.86

---
linux (5.4.0-77.86) focal; urgency=medium

  * UAF on CAN J1939 j1939_can_recv (LP: #1932209)
    - SAUCE: can: j1939: delay release of j1939_priv after synchronize_rcu
  * UAF on CAN BCM bcm_rx_handler (LP: #1931855)
    - SAUCE: can: bcm: delay release of struct bcm_op after synchronize_rcu

linux (5.4.0-76.85) focal; urgency=medium

  * focal/linux: 5.4.0-76.85 -proposed tracker (LP: #1932123)
  * Upstream v5.9 introduced 'module' patches that removed exported symbols (LP: #1932065)
    - SAUCE: Revert "modules: inherit TAINT_PROPRIETARY_MODULE"
    - SAUCE: Revert "modules: return licensing information from find_symbol"
    - SAUCE: Revert "modules: rename the licence field in struct symsearch to license"
    - SAUCE: Revert "modules: unexport __module_address"
    - SAUCE: Revert "modules: unexport __module_text_address"
    - SAUCE: Revert "modules: mark each_symbol_section static"
    - SAUCE: Revert "modules: mark find_symbol static"
    - SAUCE: Revert "modules: mark ref_module static"

linux (5.4.0-75.84) focal; urgency=medium

  * focal/linux: 5.4.0-75.84 -proposed tracker (LP: #1930032)
  * Packaging resync (LP: #1786013)
    - update dkms package versions
  * CVE-2021-33200
    - bpf: Wrap aux data inside bpf_sanitize_info container
    - bpf: Fix mask direction swap upon off reg sign change
    - bpf: No need to simulate speculative domain for immediates
  * Realtek USB hubs in Dell WD19SC/DC/TB fail to work after exiting s2idle (LP: #1928242)
    - USB: Verify the port status when timeout happens during port suspend
  * CVE-2020-26145
    - ath10k: drop fragments with multicast DA for SDIO
    - ath10k: add CCMP PN replay protection for fragmented frames for PCIe
    - ath10k: drop fragments with multicast DA for PCIe
  * CVE-2020-26141
    - ath10k: Fix TKIP Michael MIC verification for PCIe
  * CVE-2020-24588
    - mac80211: properly handle A-MSDUs that start with an RFC 1042 header
    - cfg80211: mitigate A-MSDU aggregation attacks
    - mac80211: drop A-MSDUs on old ciphers
    - ath10k: drop MPDU which has discard flag set by firmware for SDIO
  * CVE-2020-26139
    - mac80211: do not accept/forward invalid EAPOL frames
  * CVE-2020-24586 // CVE-2020-24587 // CVE-2020-24587 for such cases.
    - mac80211: extend protection against mixed key and fragment cache attacks
  * CVE-2020-24586 // CVE-2020-24587
    - mac80211: prevent mixed key and fragment cache attacks
    - mac80211: add fragment cache to sta_info
    - mac80211: check defrag PN against current frame
    - mac80211: prevent attacks on TKIP/WEP as well
  * CVE-2020-26147
    - mac80211: assure all fragments are encrypted
  * raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations (LP: #1896578)
    - md: add md_submit_discard_bio() for submitting discard bio
    - md/raid10: extend r10bio devs to raid disks
    - md/raid10: pull the code that wait for blocked dev into one function
    - md/raid10: improve raid10 discard request
    - md/raid10: improve discard request for far layout
    - dm raid: remove unnecessary discard limits for raid0 and raid10
  * [SRU] mpt3sas: only one vSES is handy even IOC has multi vSES (LP: #1926517)
    - scsi: mpt3sas: Only one vSES is present even when IOC has multi vSES
  * kvm: properly tear down PV features on hibernate (LP: #1920944)
    - x86/kvm: Fix pr_info() for async PF setup/teardown
    - x86/kvm: Teardown PV features on boot CPU as well
    - x86/kvm: Disable kvmclock on all CPUs on shutdown
    - x86/kvm: Disable all PV features on crash
    - x86/kvm: Unify kvm_pv_guest_cpu_reboot() with kvm_guest_cpu_offline()
  * Focal update: v5.4.119 upstream stable release (LP: #1929615)
    - Bluetooth: verify AMP hci_chan before amp_destroy
    - hsr: use netdev_err() instead of WARN_ONCE()
    - bluetooth: eliminate the potential race condition when removing the HCI controller
    - net/nfc: fix use-after-free llcp_sock_bind/connect
    - Revert "USB: cdc-acm: fix rounding error in TIOCSSERIAL"
    - tty: moxa: fix TIOCSSERIAL jiffies conversions
    - tty: amiserial: fix TIOCSSERIAL permission check
    - USB: serial: usb_wwan: fix TIOCSSERIAL jiffies conversions
    - staging: greybus: uart: fix TIOCSSERIAL jiffies conversions
    - USB: serial: ti_usb_3410_5052: fix TIOCSSERIAL permission check
    - staging: fwserial: fix TIOCSSERIAL jiffies conversions
    - tty: moxa: fix TIOCSSERIAL permission check
    - staging: fwserial: fix TIOCSSERIAL permission check
    - usb: typec: tcpm: Address incorrect values of tcpm psy for fixed supply
    - usb: typec: tcpm: Address incorrect values of tcpm psy for pps supply
    - usb: typec: tcpm: update power supply once partner accepts
    - usb: xhci-mtk: remove or operator for setting schedule parameters
    - usb: xhci-mtk: improve bandwidth scheduling with TT
    - ASoC: samsung: tm2_wm5110: check
[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations
This bug was fixed in the package linux - 5.11.0-20.21+21.10.1

---
linux (5.11.0-20.21+21.10.1) impish; urgency=medium

  * impish/linux: 5.11.0-20.21+21.10.1 -proposed tracker (LP: #1930056)
  * Packaging resync (LP: #1786013)
    - update dkms package versions

  [ Ubuntu: 5.11.0-20.21 ]

  * hirsute/linux: 5.11.0-20.21 -proposed tracker (LP: #1930854)
  * ath11k WIFI not working in proposed kernel 5.11.0-19-generic (LP: #1930637)
    - bus: mhi: core: Download AMSS image from appropriate function

  [ Ubuntu: 5.11.0-19.20 ]

  * hirsute/linux: 5.11.0-19.20 -proposed tracker (LP: #1930075)
  * Packaging resync (LP: #1786013)
    - update dkms package versions
  * CVE-2021-33200
    - bpf: Wrap aux data inside bpf_sanitize_info container
    - bpf: Fix mask direction swap upon off reg sign change
    - bpf: No need to simulate speculative domain for immediates
  * AX201 BT will cause system could not enter S0i3 (LP: #1928047)
    - SAUCE: drm/i915: Tweaked Wa_14010685332 for all PCHs
  * CVE-2021-3490
    - SAUCE: Revert "UBUNTU: SAUCE: bpf: verifier: fix ALU32 bounds tracking with bitwise ops"
    - gpf: Fix alu32 const subreg bound tracking on bitwise operations
  * CVE-2021-3489
    - SAUCE: Revert "UBUNTU: SAUCE: bpf: prevent writable memory-mapping of read-only ringbuf pages"
    - bpf: Prevent writable memory-mapping of read-only ringbuf pages
  * Select correct boot VGA when BIOS doesn't do it properly (LP: #1929217)
    - vgaarb: Use ACPI HID name to find integrated GPU
  * Realtek USB hubs in Dell WD19SC/DC/TB fail to work after exiting s2idle (LP: #1928242)
    - USB: Verify the port status when timeout happens during port suspend
  * CVE-2020-26145
    - ath10k: drop fragments with multicast DA for SDIO
    - ath10k: add CCMP PN replay protection for fragmented frames for PCIe
    - ath10k: drop fragments with multicast DA for PCIe
  * CVE-2020-26141
    - ath10k: Fix TKIP Michael MIC verification for PCIe
  * CVE-2020-24587
    - ath11k: Clear the fragment cache during key install
  * CVE-2020-24588
    - mac80211: properly handle A-MSDUs that start with an RFC 1042 header
    - cfg80211: mitigate A-MSDU aggregation attacks
    - mac80211: drop A-MSDUs on old ciphers
    - ath10k: drop MPDU which has discard flag set by firmware for SDIO
  * CVE-2020-26139
    - mac80211: do not accept/forward invalid EAPOL frames
  * CVE-2020-24586 // CVE-2020-24587 // CVE-2020-24587 for such cases.
    - mac80211: extend protection against mixed key and fragment cache attacks
  * CVE-2020-24586 // CVE-2020-24587
    - mac80211: prevent mixed key and fragment cache attacks
    - mac80211: add fragment cache to sta_info
    - mac80211: check defrag PN against current frame
    - mac80211: prevent attacks on TKIP/WEP as well
  * CVE-2020-26147
    - mac80211: assure all fragments are encrypted
  * raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations (LP: #1896578)
    - md: add md_submit_discard_bio() for submitting discard bio
    - md/raid10: extend r10bio devs to raid disks
    - md/raid10: pull the code that wait for blocked dev into one function
    - md/raid10: improve raid10 discard request
    - md/raid10: improve discard request for far layout
    - dm raid: remove unnecessary discard limits for raid0 and raid10
  * [SRU][OEM-5.10/H] Fix typec output on AMD Cezanne GPU (LP: #1929646)
    - drm/amd/display: use max lb for latency hiding
  * kvm: properly tear down PV features on hibernate (LP: #1920944)
    - x86/kvm: Fix pr_info() for async PF setup/teardown
    - x86/kvm: Teardown PV features on boot CPU as well
    - x86/kvm: Disable kvmclock on all CPUs on shutdown
    - x86/kvm: Disable all PV features on crash
    - x86/kvm: Unify kvm_pv_guest_cpu_reboot() with kvm_guest_cpu_offline()
  * Add support for AMD wireless button (LP: #1928820)
    - platform/x86: hp-wireless: add AMD's hardware id to the supported list
  * Can't detect intel wifi 6235 (LP: #1920180)
    - SAUCE: iwlwifi: add new pci id for 6235
  * Speed up resume time on HP laptops (LP: #1929048)
    - platform/x86: hp_accel: Avoid invoking _INI to speed up resume
  * Fix kernel panic on Intel Bluetooth (LP: #1928838)
    - Bluetooth: Shutdown controller after workqueues are flushed or cancelled
  * build module CONFIG_SND_SOC_INTEL_SOUNDWIRE_SOF_MACH=m for 5.11, 5.13-rc2 and later (LP: #1921632)
    - [Config] enable soundwire audio mach driver
  * [SRU] Patch for flicker and glitching on common LCD display panels, intel framebuffer (LP: #1925685)
    - drm/i915: Try to use fast+narrow link on eDP again and fall back to the old max strategy on failure
    - drm/i915/dp: Use slow and wide link training for everything
  * Fix screen flickering when two 4K 60Hz monitors are connected to AMD Oland GFX (LP: #1928361)
    - drm/radeon/dpm: Disable sclk switching on Oland when two 4K 60Hz monitors are connected
  * Display abnormal on the
[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations
Hi Evan, Just checking in. Are you still running 5.4.0-75-generic on your server? Is everything nice and stable? Is your data fully intact, and no signs of corruption at all? My server has been running for two weeks now, and it does a fstrim every 30 minutes, and everything appears to be stable, and I don't have any corruption when I fsck my disks. If things keep looking good, the SRU cycle will complete early next week, and the kernel will be released to -updates around the 21st of June, give or take a few days if any CVEs turn up. Let me know how things are going. Thanks, Matthew -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1896578 Title: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations Status in linux package in Ubuntu: In Progress Status in linux source package in Bionic: Fix Committed Status in linux source package in Focal: Fix Committed Status in linux source package in Groovy: Fix Committed Status in linux source package in Hirsute: Fix Committed Bug description: BugLink: https://bugs.launchpad.net/bugs/1896578 [Impact] Block discard is very slow on Raid10, which causes common use cases which invoke block discard, such as mkfs and fstrim operations, to take a very long time. For example, on a i3.8xlarge instance on AWS, which has 4x 1.9TB NVMe devices which support block discard, a mkfs.xfs operation on Raid 10 takes between 8 to 11 minutes, where the same mkfs.xfs operation on Raid 0, takes 4 seconds. The bigger the devices, the longer it takes. The cause is that Raid10 currently uses a 512k chunk size, and uses this for the discard_max_bytes value. If we need to discard 1.9TB, the kernel splits the request into millions of 512k bio requests, even if the underlying device supports larger requests. 
For example, the NVMe devices on i3.8xlarge support 2.2TB of discard at once: $ cat /sys/block/nvme0n1/queue/discard_max_bytes 2199023255040 $ cat /sys/block/nvme0n1/queue/discard_max_hw_bytes 2199023255040 Where the Raid10 md device only supports 512k: $ cat /sys/block/md0/queue/discard_max_bytes 524288 $ cat /sys/block/md0/queue/discard_max_hw_bytes 524288 If we perform a mkfs.xfs operation on the /dev/md array, it takes over 11 minutes and if we examine the stack, it is stuck in blkdev_issue_discard() $ sudo cat /proc/1626/stack [<0>] wait_barrier+0x14c/0x230 [raid10] [<0>] regular_request_wait+0x39/0x150 [raid10] [<0>] raid10_write_request+0x11e/0x850 [raid10] [<0>] raid10_make_request+0xd7/0x150 [raid10] [<0>] md_handle_request+0x123/0x1a0 [<0>] md_submit_bio+0xda/0x120 [<0>] __submit_bio_noacct+0xde/0x320 [<0>] submit_bio_noacct+0x4d/0x90 [<0>] submit_bio+0x4f/0x1b0 [<0>] __blkdev_issue_discard+0x154/0x290 [<0>] blkdev_issue_discard+0x5d/0xc0 [<0>] blk_ioctl_discard+0xc4/0x110 [<0>] blkdev_common_ioctl+0x56c/0x840 [<0>] blkdev_ioctl+0xeb/0x270 [<0>] block_ioctl+0x3d/0x50 [<0>] __x64_sys_ioctl+0x91/0xc0 [<0>] do_syscall_64+0x38/0x90 [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [Fix] Xiao Ni has developed a patchset which resolves the block discard performance problems. These commits have now landed in 5.13-rc1. 
commit cf78408f937a67f59f5e90ee8e6cadeed7c128a8
Author: Xiao Ni
Date: Thu Feb 4 15:50:43 2021 +0800
Subject: md: add md_submit_discard_bio() for submitting discard bio
Link: https://github.com/torvalds/linux/commit/cf78408f937a67f59f5e90ee8e6cadeed7c128a8

commit c2968285925adb97b9aa4ede94c1f1ab61ce0925
Author: Xiao Ni
Date: Thu Feb 4 15:50:44 2021 +0800
Subject: md/raid10: extend r10bio devs to raid disks
Link: https://github.com/torvalds/linux/commit/c2968285925adb97b9aa4ede94c1f1ab61ce0925

commit f2e7e269a7525317752d472bb48a549780e87d22
Author: Xiao Ni
Date: Thu Feb 4 15:50:45 2021 +0800
Subject: md/raid10: pull the code that wait for blocked dev into one function
Link: https://github.com/torvalds/linux/commit/f2e7e269a7525317752d472bb48a549780e87d22

commit d30588b2731fb01e1616cf16c3fe79a1443e29aa
Author: Xiao Ni
Date: Thu Feb 4 15:50:46 2021 +0800
Subject: md/raid10: improve raid10 discard request
Link: https://github.com/torvalds/linux/commit/d30588b2731fb01e1616cf16c3fe79a1443e29aa

commit 254c271da0712ea8914f187588e0f81f7678ee2f
Author: Xiao Ni
Date: Thu Feb 4 15:50:47 2021 +0800
Subject: md/raid10: improve discard request for far layout
Link: https://github.com/torvalds/linux/commit/254c271da0712ea8914f187588e0f81f7678ee2f

There is also an additional commit which is required, and was merged after "md/raid10: improve raid10 discard request" was merged. The following commit enables Raid10 to use large discards, instead of splitting into many bios, since the technical hurdles have now been removed.

commit
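Once a kernel with the patchset is running, the effect should be visible directly in sysfs: the md array's discard_max_bytes should reflect the member devices' limits rather than the 512k chunk size quoted in the bug description. A minimal sketch of such a check (the md0/nvme0n1 names are assumptions; it prints a skip message on machines without the array):

```shell
# Compare the md array's discard limit against a member device.
# Device names are illustrative; adjust md0/nvme0n1 for your system.
if [ -r /sys/block/md0/queue/discard_max_bytes ]; then
    echo "md0 discard_max_bytes:     $(cat /sys/block/md0/queue/discard_max_bytes)"
    echo "nvme0n1 discard_max_bytes: $(cat /sys/block/nvme0n1/queue/discard_max_bytes)"
else
    echo "skipping: /dev/md0 not present on this system"
fi
```

On a fixed kernel the first number should be far larger than the old 524288 limit.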
Hi Evan,

Great to hear things are looking good for you and that the block discard performance is there. If possible, keep running the kernel from -proposed for a bit longer, just to make sure nothing comes up on longer runs.

I spent some time today performing verification on all the kernels in -proposed, testing block discard performance [1], and also running through the regression testcase from LP #1907262 [2].

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578
[2] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262

All kernels performed as expected, with block discard on 4x 1.9TB NVMe disks on an i3.8xlarge AWS instance taking 3-4 seconds, and the consistency checks performed returned clean disks, with no filesystem or data corruption. I have documented my tests in my verification messages:

Hirsute: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578/comments/26
Groovy: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578/comments/27
Focal: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578/comments/28
Bionic: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578/comments/29

I have marked the launchpad bug as verified for all releases. I'm still running my own testing, with my /home directory on a Raid10 array on a Google Cloud instance, and it has no issues. If things keep going well, we should see a release to -updates around the 21st of June, give or take a few days if any CVEs turn up.

Thanks,
Matthew
Performing verification for Bionic.

I'm going to do three rounds of verification. The first is the testcase from this bug, showing block discard performance. The second is running through the regression reproducer from bug 1907262. The third will be results from my testing with my /home directory on a cloud instance with Raid10-backed disks, 3x customer testing and 2x community user testing. This will be in a separate comment closer to the release date, once I have collected results.

Starting with the testcase for this bug. I started an i3.8xlarge instance on AWS, enabled -proposed and installed 4.15.0-145-generic. From there, I ran through the testcase of making a Raid10 array and formatting it with xfs, and block discard performance was excellent: https://paste.ubuntu.com/p/Sr8tR9yhRd/ It took 3.2 seconds to format the array with xfs, and 1.0 seconds for a fstrim, as opposed to the 20 minutes it took beforehand.

Moving onto the second testcase, the regression reproducer from bug 1907262. I started a n1-standard-2 VM on Google Cloud, and attached 2x NVMe scratch disks. I enabled -proposed and installed 4.15.0-145-generic. I ran through the testcase of making a Raid10 array, doing consistency checks, ensuring that the mismatch count is 0, creating a file, deleting it, performing a fstrim, doing more consistency checks, then taking the raid array down, bringing up one disk at a time, and performing a fsck.ext4. All disks came back clean: https://paste.ubuntu.com/p/h8gTd4JQ8Y/

Since the block discard performance is there, and there is no apparent data corruption after a fstrim, I will mark this verified for Bionic.

** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic
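For reference, the performance half of the verification above can be sketched as a short script. This is a sketch only, not the exact commands from the paste: the device names, array name, and mount point are assumptions, it destroys data on the listed devices, and it is guarded so it does nothing on a machine without the test hardware.

```shell
# Sketch of the block discard performance testcase (assumed device
# names; DESTROYS data on the listed devices). Skips when the test
# hardware is not present.
DEVICES="/dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1"
if [ -b /dev/nvme0n1 ]; then
    sudo mdadm --create /dev/md0 --level=10 --raid-devices=4 $DEVICES
    time sudo mkfs.xfs -f /dev/md0    # minutes before the fix, seconds after
    sudo mkdir -p /mnt/raid
    sudo mount /dev/md0 /mnt/raid
    time sudo fstrim /mnt/raid        # likewise should finish in seconds
else
    echo "skipping: test devices not present"
fi
```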
Performing verification for Focal.

I'm going to do three rounds of verification. The first is the testcase from this bug, showing block discard performance. The second is running through the regression reproducer from bug 1907262. The third will be results from my testing with my /home directory on a cloud instance with Raid10-backed disks, 3x customer testing and 2x community user testing. This will be in a separate comment closer to the release date, once I have collected results.

Starting with the testcase for this bug. I started an i3.8xlarge instance on AWS, enabled -proposed and installed 5.4.0-75-generic. From there, I ran through the testcase of making a Raid10 array and formatting it with xfs, and block discard performance was excellent: https://paste.ubuntu.com/p/mdQ6Wjr4yK/ It took 6.6 seconds to format the array with xfs, and 3.8 seconds for a fstrim, as opposed to the 20 minutes it took beforehand.

Moving onto the second testcase, the regression reproducer from bug 1907262. I started a n1-standard-2 VM on Google Cloud, and attached 2x NVMe scratch disks. I enabled -proposed and installed 5.4.0-75-generic. I ran through the testcase of making a Raid10 array, doing consistency checks, ensuring that the mismatch count is 0, creating a file, deleting it, performing a fstrim, doing more consistency checks, then taking the raid array down, bringing up one disk at a time, and performing a fsck.ext4. All disks came back clean: https://paste.ubuntu.com/p/jFHW26kcCK/

Since the block discard performance is there, and there is no apparent data corruption after a fstrim, I will mark this verified for Focal.

** Tags removed: verification-needed-focal
** Tags added: verification-done-focal
Performing verification for Hirsute.

I'm going to do three rounds of verification. The first is the testcase from this bug, showing block discard performance. The second is running through the regression reproducer from bug 1907262. The third will be results from my testing with my /home directory on a cloud instance with Raid10-backed disks, 3x customer testing and 2x community user testing. This will be in a separate comment closer to the release date, once I have collected results.

Starting with the testcase for this bug. I started an i3.8xlarge instance on AWS, enabled -proposed and installed 5.11.0-20-generic. From there, I ran through the testcase of making a Raid10 array and formatting it with xfs, and block discard performance was excellent: https://paste.ubuntu.com/p/X5sdCGT78Y/ It took 4.6 seconds to format the array with xfs, and 2.8 seconds for a fstrim, as opposed to the 20 minutes it took beforehand.

Moving onto the second testcase, the regression reproducer from bug 1907262. I started a n1-standard-2 VM on Google Cloud, and attached 2x NVMe scratch disks. I enabled -proposed and installed 5.11.0-20-generic. I ran through the testcase of making a Raid10 array, doing consistency checks, ensuring that the mismatch count is 0, creating a file, deleting it, performing a fstrim, doing more consistency checks, then taking the raid array down, bringing up one disk at a time, and performing a fsck.ext4. All disks came back clean: https://paste.ubuntu.com/p/Xy6CPCQXZN/

Since the block discard performance is there, and there is no apparent data corruption after a fstrim, I will mark this verified for Hirsute.

** Tags removed: verification-needed-hirsute
** Tags added: verification-done-hirsute
Performing verification for Groovy.

I'm going to do three rounds of verification. The first is the testcase from this bug, showing block discard performance. The second is running through the regression reproducer from bug 1907262. The third will be results from my testing with my /home directory on a cloud instance with Raid10-backed disks, 3x customer testing and 2x community user testing. This will be in a separate comment closer to the release date, once I have collected results.

Starting with the testcase for this bug. I started an i3.8xlarge instance on AWS, enabled -proposed and installed 5.8.0-56-generic. From there, I ran through the testcase of making a Raid10 array and formatting it with xfs, and block discard performance was excellent: https://paste.ubuntu.com/p/GGXfjCHfDR/ It took 5.7 seconds to format the array with xfs, and 2.3 seconds for a fstrim, as opposed to the 20 minutes it took beforehand.

Moving onto the second testcase, the regression reproducer from bug 1907262. I started a n1-standard-2 VM on Google Cloud, and attached 2x NVMe scratch disks. I enabled -proposed and installed 5.8.0-56-generic. I ran through the testcase of making a Raid10 array, doing consistency checks, ensuring that the mismatch count is 0, creating a file, deleting it, performing a fstrim, doing more consistency checks, then taking the raid array down, bringing up one disk at a time, and performing a fsck.ext4. All disks came back clean: https://paste.ubuntu.com/p/75xWd4Z3NZ/

Since the block discard performance is there, and there is no apparent data corruption after a fstrim, I will mark this verified for Groovy.

** Tags removed: verification-needed-groovy
** Tags added: verification-done-groovy
Thanks Matt. I have it installed on one machine so far and it looks good (in the past 10 minutes). fstrim of a ~30 TB RAID 10 took 73 seconds instead of multiple hours.

# uname -a
Linux xxx 5.4.0-75-generic #84-Ubuntu SMP Fri May 28 16:28:37 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

# df -h /dev/md0
Filesystem      Size  Used Avail Use% Mounted on
/dev/md0         30T  212G   29T   1% /opt/raid

# cat /proc/mdstat
Personalities : [raid10] [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4]
md0 : active raid10 nvme7n1[7] nvme2n1[2] nvme6n1[6] nvme4n1[4] nvme1n1[1] nvme3n1[3] nvme0n1[0] nvme5n1[5]
      31255576576 blocks super 1.2 512K chunks 2 near-copies [8/8] []
      bitmap: 11/233 pages [44KB], 65536KB chunk

unused devices:

# time fstrim /opt/raid
real    1m13.162s
user    0m0.004s
sys     0m0.351s
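The speedup reported here lines up with the request-count arithmetic from the bug description: splitting a discard at 512 KiB produces millions of bios per device, while the hardware limit covers the same range in a single request. A small illustration, using the sizes quoted in the description above:

```python
# Rough illustration: number of discard bios needed to trim one device,
# before (512 KiB md limit) and after (hardware limit) the patchset.
CHUNK = 512 * 1024                # old discard_max_bytes on the md device
HW_LIMIT = 2_199_023_255_040     # discard_max_hw_bytes reported by the NVMe devices

def bios_needed(size_bytes, max_bytes):
    """Count of bio requests when a discard is split at max_bytes."""
    return -(-size_bytes // max_bytes)  # ceiling division

disk = int(1.9e12)                    # one 1.9 TB NVMe device
print(bios_needed(disk, CHUNK))       # millions of 512 KiB requests
print(bios_needed(disk, HW_LIMIT))    # a single large request
```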
Hi Evan,

The kernel team have built all of the kernels for this SRU cycle, and have placed them into -proposed for verification. We now need to do some thorough testing to make sure that Raid10 arrays function with good performance, ensure data integrity, and make sure we won't be introducing any regressions when these kernels are released in two weeks' time. I would really appreciate it if you could help test and verify that these kernels function as intended.

Instructions to install:

1) cat << EOF | sudo tee /etc/apt/sources.list.d/ubuntu-$(lsb_release -cs)-proposed.list
# Enable Ubuntu proposed archive
deb http://archive.ubuntu.com/ubuntu/ $(lsb_release -cs)-proposed main universe
EOF

2) sudo apt update

For 21.04 / Hirsute:
3) sudo apt install linux-image-5.11.0-20-generic linux-modules-5.11.0-20-generic \
   linux-modules-extra-5.11.0-20-generic linux-headers-5.11.0-20-generic

For 20.10 / Groovy:
3) sudo apt install linux-image-5.8.0-56-generic linux-modules-5.8.0-56-generic \
   linux-modules-extra-5.8.0-56-generic linux-headers-5.8.0-56-generic

For 20.04 / Focal:
3) sudo apt install linux-image-5.4.0-75-generic linux-modules-5.4.0-75-generic \
   linux-modules-extra-5.4.0-75-generic linux-headers-5.4.0-75-generic

For 18.04 / Bionic:

For the 5.4 Bionic HWE kernel:
3) sudo apt install linux-image-5.4.0-75-generic linux-modules-5.4.0-75-generic \
   linux-modules-extra-5.4.0-75-generic linux-headers-5.4.0-75-generic

For the 4.15 Bionic GA kernel:
3) sudo apt install linux-image-4.15.0-145-generic linux-modules-4.15.0-145-generic \
   linux-modules-extra-4.15.0-145-generic linux-headers-4.15.0-145-generic

4) sudo reboot

5) uname -rv

You may need to modify your grub configuration to boot the correct kernel. If you need help, read these instructions: https://paste.ubuntu.com/p/XrTzWPPnWJ/

I am running the -proposed kernel on my cloud instance with my /home directory on a Raid10 array made up of 4x NVMe devices, and things are looking okay.
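After rebooting, it is worth confirming that the running kernel really is the one installed from -proposed before drawing any conclusions from testing. A small sketch of that check (the version string is the Focal example from the instructions; substitute the one for your series):

```shell
# Verify the running kernel matches the one installed from -proposed.
# The expected version below is illustrative (Focal's 5.4.0-75).
expected="5.4.0-75-generic"
running=$(uname -r)
if [ "$running" = "$expected" ]; then
    echo "running the proposed kernel: $running"
else
    echo "running $running, expected $expected (check your grub default)"
fi
```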
I will be performing my detailed regression testing against these kernels tomorrow, and I will write back with the results then. Please help test these kernels in -proposed, and let me know how they go.

Thanks,
Matthew

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1896578

Title:
  raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

Status in linux package in Ubuntu: In Progress
Status in linux source package in Bionic: Fix Committed
Status in linux source package in Focal: Fix Committed
Status in linux source package in Groovy: Fix Committed
Status in linux source package in Hirsute: Fix Committed

Bug description:
BugLink: https://bugs.launchpad.net/bugs/1896578

[Impact]

Block discard is very slow on Raid10, which causes common use cases that invoke block discard, such as mkfs and fstrim operations, to take a very long time.

For example, on an i3.8xlarge instance on AWS, which has 4x 1.9TB NVMe devices that support block discard, a mkfs.xfs operation on Raid10 takes between 8 and 11 minutes, where the same mkfs.xfs operation on Raid0 takes 4 seconds. The bigger the devices, the longer it takes.

The cause is that Raid10 currently uses a 512k chunk size, and uses this for the discard_max_bytes value. If we need to discard 1.9TB, the kernel splits the request into millions of 512k bio requests, even if the underlying device supports larger requests.
For example, the NVMe devices on i3.8xlarge support 2.2TB of discard at once:

$ cat /sys/block/nvme0n1/queue/discard_max_bytes
2199023255040
$ cat /sys/block/nvme0n1/queue/discard_max_hw_bytes
2199023255040

Where the Raid10 md device only supports 512k:

$ cat /sys/block/md0/queue/discard_max_bytes
524288
$ cat /sys/block/md0/queue/discard_max_hw_bytes
524288

If we perform a mkfs.xfs operation on the /dev/md array, it takes over 11 minutes, and if we examine the stack, it is stuck in blkdev_issue_discard():

$ sudo cat /proc/1626/stack
[<0>] wait_barrier+0x14c/0x230 [raid10]
[<0>] regular_request_wait+0x39/0x150 [raid10]
[<0>] raid10_write_request+0x11e/0x850 [raid10]
[<0>] raid10_make_request+0xd7/0x150 [raid10]
[<0>] md_handle_request+0x123/0x1a0
[<0>] md_submit_bio+0xda/0x120
[<0>] __submit_bio_noacct+0xde/0x320
[<0>] submit_bio_noacct+0x4d/0x90
[<0>] submit_bio+0x4f/0x1b0
[<0>] __blkdev_issue_discard+0x154/0x290
[<0>] blkdev_issue_discard+0x5d/0xc0
[<0>] blk_ioctl_discard+0xc4/0x110
[<0>] blkdev_common_ioctl+0x56c/0x840
[<0>] blkdev_ioctl+0xeb/0x270
[<0>] block_ioctl+0x3d/0x50
[<0>] __x64_sys_ioctl+0x91/0xc0
[<0>] do_syscall_64+0x38/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[Fix]

Xiao Ni has developed a patchset which resolves the block discard performance problems. These commits have now landed in 5.13-rc1.
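The scale of the splitting described above can be sanity-checked with a few lines of arithmetic. This is a sketch using the sysfs values quoted in this report; the 1.9TB device size is approximate:

```python
import math

# Values quoted in this bug report (sysfs discard limits, in bytes).
device_size = 1_900_000_000_000        # ~1.9 TB per NVMe device
raid10_discard_max = 524_288           # md0 discard_max_bytes (512 KiB chunk)
nvme_discard_max = 2_199_023_255_040   # nvme0n1 discard_max_bytes (~2.2 TB)

# Number of discard bios the block layer must issue for a full-device discard.
bios_via_raid10 = math.ceil(device_size / raid10_discard_max)
bios_via_nvme = math.ceil(device_size / nvme_discard_max)

print(f"bios through raid10: {bios_via_raid10:,}")  # ~3.6 million requests
print(f"bios straight to NVMe: {bios_via_nvme}")    # 1 request
```

Each of those ~3.6 million bios takes a trip through raid10's barrier machinery (the wait_barrier frame in the stack trace above), which is where the minutes go.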
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-groovy' to 'verification-done-groovy'. If the problem still exists, change the tag 'verification-needed-groovy' to 'verification-failed-groovy'. If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

** Tags added: verification-needed-groovy
commit cf78408f937a67f59f5e90ee8e6cadeed7c128a8
Author: Xiao Ni
Date: Thu Feb 4 15:50:43 2021 +0800
Subject: md: add md_submit_discard_bio() for submitting discard bio
Link: https://github.com/torvalds/linux/commit/cf78408f937a67f59f5e90ee8e6cadeed7c128a8

commit c2968285925adb97b9aa4ede94c1f1ab61ce0925
Author: Xiao Ni
Date: Thu Feb 4 15:50:44 2021 +0800
Subject: md/raid10: extend r10bio devs to raid disks
Link: https://github.com/torvalds/linux/commit/c2968285925adb97b9aa4ede94c1f1ab61ce0925

commit f2e7e269a7525317752d472bb48a549780e87d22
Author: Xiao Ni
Date: Thu Feb 4 15:50:45 2021 +0800
Subject: md/raid10: pull the code that wait for blocked dev into one function
Link: https://github.com/torvalds/linux/commit/f2e7e269a7525317752d472bb48a549780e87d22

commit d30588b2731fb01e1616cf16c3fe79a1443e29aa
Author: Xiao Ni
Date: Thu Feb 4 15:50:46 2021 +0800
Subject: md/raid10: improve raid10 discard request
Link: https://github.com/torvalds/linux/commit/d30588b2731fb01e1616cf16c3fe79a1443e29aa

commit 254c271da0712ea8914f187588e0f81f7678ee2f
Author: Xiao Ni
Date: Thu Feb 4 15:50:47 2021 +0800
Subject: md/raid10: improve discard request for far layout
Link: https://github.com/torvalds/linux/commit/254c271da0712ea8914f187588e0f81f7678ee2f

There is also an additional commit which is required, and was merged after "md/raid10: improve raid10 discard request" was merged. The following commit enables Raid10 to use large discards, instead of splitting into many bios, since the technical hurdles have now been removed.

commit ca4a4e9a55beeb138bb06e3867f5e486da896d44
Author: Mike Snitzer
Date: Fri Apr 30 14:38:37 2021 -0400
Subject: dm raid: remove unnecessary discard limits for raid0 and raid10
Link: https://github.com/torvalds/linux/commit/ca4a4e9a55beeb138bb06e3867f5e486da896d44
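A quick way to tell whether a running kernel carries these patches is to inspect the md device's advertised discard limit: on a patched kernel it should reflect the member devices rather than the 512k chunk size. The following is a sketch; the /sys/block/md0/queue path assumes an array named md0.

```shell
# check_discard_limit: report whether a queue directory advertises a
# discard limit above raid10's old 512 KiB cap (524288 bytes).
check_discard_limit() {
    queue=$1
    if [ -r "$queue/discard_max_bytes" ]; then
        max=$(cat "$queue/discard_max_bytes")
        if [ "$max" -gt 524288 ]; then
            echo "patched: discard_max_bytes=$max"
        else
            echo "unpatched: discard_max_bytes=$max"
        fi
    else
        echo "missing: $queue"
    fi
}

# On a real system (md0 is an assumed array name):
check_discard_limit /sys/block/md0/queue
```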
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'. If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

** Tags added: verification-needed-bionic
** Tags added: verification-needed-focal
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'. If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-hirsute' to 'verification-done-hirsute'. If the problem still exists, change the tag 'verification-needed-hirsute' to 'verification-failed-hirsute'. If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

** Tags added: verification-needed-hirsute
Hi Evan,

As I mentioned in my previous message, I submitted the patches to the Ubuntu kernel mailing list for SRU. These patches have now received two acks [1][2] from senior kernel team members, and have been applied [3] to the 4.15, 5.4, 5.8 and 5.11 kernels.

[1] https://lists.ubuntu.com/archives/kernel-team/2021-May/120475.html
[2] https://lists.ubuntu.com/archives/kernel-team/2021-May/120799.html
[3] https://lists.ubuntu.com/archives/kernel-team/2021-May/120800.html

This is what is going to happen next. Next week, between the 31st of May and the 4th of June, the kernel team will build the next kernel update and place it in -proposed for testing. As soon as these kernels enter -proposed, we need to install them and test Raid10 as much as possible. The testing and verification window is between the 7th and 18th of June. If all goes well, we can mark the launchpad bug as verified, and we will see a release to -updates around the 21st of June, give or take a few days if any CVEs turn up. The schedule is on https://kernel.ubuntu.com/ if anything were to change.

I will write back once the next kernel update is in -proposed, likely early to mid next week. I would really appreciate it if you could help test the kernels when they arrive in -proposed, as I don't want to introduce any more regressions.

Thanks,
Matthew
** Changed in: linux (Ubuntu Bionic)
   Status: In Progress => Fix Committed

** Changed in: linux (Ubuntu Focal)
   Status: In Progress => Fix Committed

** Changed in: linux (Ubuntu Groovy)
   Status: In Progress => Fix Committed

** Changed in: linux (Ubuntu Hirsute)
   Status: In Progress => Fix Committed
For example, the NVMe devices on i3.8xlarge support 2.2TB of discard at once: $ cat /sys/block/nvme0n1/queue/discard_max_bytes 2199023255040 $ cat /sys/block/nvme0n1/queue/discard_max_hw_bytes 2199023255040 Where the Raid10 md device only supports 512k: $ cat /sys/block/md0/queue/discard_max_bytes 524288 $ cat /sys/block/md0/queue/discard_max_hw_bytes 524288 If we perform a mkfs.xfs operation on the /dev/md array, it takes over 11 minutes and if we examine the stack, it is stuck in blkdev_issue_discard() $ sudo cat /proc/1626/stack [<0>] wait_barrier+0x14c/0x230 [raid10] [<0>] regular_request_wait+0x39/0x150 [raid10] [<0>] raid10_write_request+0x11e/0x850 [raid10] [<0>] raid10_make_request+0xd7/0x150 [raid10] [<0>] md_handle_request+0x123/0x1a0 [<0>] md_submit_bio+0xda/0x120 [<0>] __submit_bio_noacct+0xde/0x320 [<0>] submit_bio_noacct+0x4d/0x90 [<0>] submit_bio+0x4f/0x1b0 [<0>] __blkdev_issue_discard+0x154/0x290 [<0>] blkdev_issue_discard+0x5d/0xc0 [<0>] blk_ioctl_discard+0xc4/0x110 [<0>] blkdev_common_ioctl+0x56c/0x840 [<0>] blkdev_ioctl+0xeb/0x270 [<0>] block_ioctl+0x3d/0x50 [<0>] __x64_sys_ioctl+0x91/0xc0 [<0>] do_syscall_64+0x38/0x90 [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [Fix] Xiao Ni has developed a patchset which resolves the block discard performance problems. These commits have now landed in 5.13-rc1. 
commit cf78408f937a67f59f5e90ee8e6cadeed7c128a8 Author: Xiao Ni Date: Thu Feb 4 15:50:43 2021 +0800 Subject: md: add md_submit_discard_bio() for submitting discard bio Link: https://github.com/torvalds/linux/commit/cf78408f937a67f59f5e90ee8e6cadeed7c128a8 commit c2968285925adb97b9aa4ede94c1f1ab61ce0925 Author: Xiao Ni Date: Thu Feb 4 15:50:44 2021 +0800 Subject: md/raid10: extend r10bio devs to raid disks Link: https://github.com/torvalds/linux/commit/c2968285925adb97b9aa4ede94c1f1ab61ce0925 commit f2e7e269a7525317752d472bb48a549780e87d22 Author: Xiao Ni Date: Thu Feb 4 15:50:45 2021 +0800 Subject: md/raid10: pull the code that wait for blocked dev into one function Link: https://github.com/torvalds/linux/commit/f2e7e269a7525317752d472bb48a549780e87d22 commit d30588b2731fb01e1616cf16c3fe79a1443e29aa Author: Xiao Ni Date: Thu Feb 4 15:50:46 2021 +0800 Subject: md/raid10: improve raid10 discard request Link: https://github.com/torvalds/linux/commit/d30588b2731fb01e1616cf16c3fe79a1443e29aa commit 254c271da0712ea8914f187588e0f81f7678ee2f Author: Xiao Ni Date: Thu Feb 4 15:50:47 2021 +0800 Subject: md/raid10: improve discard request for far layout Link: https://github.com/torvalds/linux/commit/254c271da0712ea8914f187588e0f81f7678ee2f There is also an additional commit which is required, and was merged after "md/raid10: improve raid10 discard request" was merged. The following commit enables Radid10 to use large discards, instead of splitting into many bios, since the technical hurdles have now been removed. commit ca4a4e9a55beeb138bb06e3867f5e486da896d44 Author: Mike Snitzer Date: Fri Apr 30 14:38:37 2021 -0400 Subject: dm raid: remove unnecessary discard limits for raid0 and raid10 Link: https://github.com/torvalds/linux/commit/ca4a4e9a55beeb138bb06e3867f5e486da896d44 The
I have it running on two machines now that needed big RAID 10s:

# uname -rv
5.4.0-72-generic #80+TEST1896578v20210504b1-Ubuntu SMP Tue May 4 00:30:36 UTC 202

# df -h /opt/raid
Filesystem      Size  Used Avail Use% Mounted on
/dev/md0         30T  208G   29T   1% /opt/raid

# cat /proc/mdstat
Personalities : [raid10]
md0 : active raid10 nvme7n1[7] nvme6n1[6] nvme5n1[5] nvme4n1[4] nvme3n1[3] nvme2n1[2] nvme1n1[1] nvme0n1[0]
      31255576576 blocks super 1.2 512K chunks 2 near-copies [8/8] [] [>] resync = 0.1% (48777600/31255576576) finish=2514.8min speed=206813K/sec
      bitmap: 233/233 pages [932KB], 65536KB chunk

FWIW the mkfs.xfs took ~1 minute across 8x8TB NVMe disks with this patched kernel.
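As a cross-check on the mdstat output above, the finish= estimate follows from the other figures in the same line (a small sketch, assuming mdstat's block counts and speed are both in KiB as usual):

```shell
# Cross-check the finish= estimate printed by /proc/mdstat:
# remaining blocks divided by the resync speed, converted to minutes.
TOTAL_KIB=31255576576    # array size in 1 KiB blocks (from mdstat)
DONE_KIB=48777600        # blocks resynced so far
SPEED_KIB=206813         # resync speed in K/sec

echo "estimated minutes remaining: $(( (TOTAL_KIB - DONE_KIB) / SPEED_KIB / 60 ))"
```

This yields about 2514 minutes, agreeing with the finish=2514.8min that mdstat reported.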
Hi Evan,

The patches have been submitted for SRU to the Ubuntu kernel mailing list, for the 4.15, 5.4, 5.8 and 5.11 kernels:

[0] https://lists.ubuntu.com/archives/kernel-team/2021-May/119935.html
[1] https://lists.ubuntu.com/archives/kernel-team/2021-May/119936.html
[2] https://lists.ubuntu.com/archives/kernel-team/2021-May/119937.html
[3] https://lists.ubuntu.com/archives/kernel-team/2021-May/119938.html
[4] https://lists.ubuntu.com/archives/kernel-team/2021-May/119939.html
[5] https://lists.ubuntu.com/archives/kernel-team/2021-May/119941.html
[6] https://lists.ubuntu.com/archives/kernel-team/2021-May/119940.html
[7] https://lists.ubuntu.com/archives/kernel-team/2021-May/119942.html
[8] https://lists.ubuntu.com/archives/kernel-team/2021-May/119943.html
[9] https://lists.ubuntu.com/archives/kernel-team/2021-May/119944.html
[10] https://lists.ubuntu.com/archives/kernel-team/2021-May/119945.html
[11] https://lists.ubuntu.com/archives/kernel-team/2021-May/119946.html

The kernel team have reviewed the patches, but they are in no hurry to ACK the patchset [12], and they also haven't outright rejected it.

[12] https://lists.ubuntu.com/archives/kernel-team/2021-May/120051.html

The current status is that the kernel team have requested more testing to be performed, and that the patches will not make the current SRU cycle. They will instead be submitted for consideration in the next SRU cycle. You can look at https://kernel.ubuntu.com/ for dates of various SRU cycles. If the patches are accepted for the 2021.05.31 SRU cycle, then you could expect a supported kernel to be available in late June.

If you want to help, then please consider installing the test kernels in comment #14 and helping test Raid10 on ssds / NVMe drives that support block discard. I am currently using a cloud instance with 4x NVMe disks in Raid10 as my /home directory, and things seem okay.

I'll keep you updated on the progress of this patchset via this bug.
Thanks,
Matthew
Is there any ETA on a supported kernel with this patch?
I have completed most of my regression testing, and things are still looking good. The performance of the block discard is there, and I haven't seen any data corruption.

In particular, I have been testing against the testcase for the regression that occurred with the previous revision of the patches, back in December. The testcase is covered in bug 1907262 [1].

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262

For each of the 5.11, 5.8, 5.4 and 4.15 kernels, the problem does not reproduce, as the values of /sys/block/md0/md/mismatch_cnt are always 0, and mounting each disk individually and performing a full deep fsck shows no data corruption. Test results for each kernel are below:

5.11.0-16-generic #17+TEST1896578v20210503b1-Ubuntu https://paste.ubuntu.com/p/Dp3sR9mNdY/
5.8.0-50-generic #56+TEST1896578v20210504b1-Ubuntu https://paste.ubuntu.com/p/tXmtmd5Jys/
5.4.0-72-generic #80+TEST1896578v20210504b1-Ubuntu https://paste.ubuntu.com/p/VzX2mXcKbF/
4.15.0-142-generic #146+TEST1896578v20210504b1-Ubuntu https://paste.ubuntu.com/p/HpMcX3N9fD/

I think I will look into some longer running tests as well, more info on that later.
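For anyone wanting to repeat the mismatch_cnt part of this testing, a minimal sketch using md's standard sync_action/mismatch_cnt sysfs interface is below. The /sys/block/md0/md path is an assumption (adjust for your array), writing sync_action requires root, and the function is guarded so it is a no-op on machines without the array:

```shell
# Sketch of the raid10 consistency check: trigger md's "check" pass and
# read back mismatch_cnt, which should remain 0 on a healthy array.
check_raid() {
    md="$1"                                         # e.g. /sys/block/md0/md
    if [ -d "$md" ]; then
        echo check > "$md/sync_action"              # start a read-and-compare pass
        while [ "$(cat "$md/sync_action")" != "idle" ]; do
            sleep 10                                # poll until the pass completes
        done
        echo "mismatch_cnt: $(cat "$md/mismatch_cnt")"
    else
        echo "no md array at $md; nothing to check"
    fi
}

check_raid /sys/block/md0/md
```

A non-zero mismatch_cnt after the check pass would indicate the kind of inconsistency seen in the earlier regression (bug 1907262).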
** Description changed:

- performance problems. These commits have now landed in 5.10-rc1.
+ performance problems. These commits have now landed in 5.13-rc1.

- following commits enable Radid10 to use large discards, instead of
+ following commit enables Radid10 to use large discards, instead of
If anyone is interested in testing, there are new re-spins of the test kernels available in the following ppa:

https://launchpad.net/~mruffell/+archive/ubuntu/lp1896578-test

The patches used are the ones I will be submitting for SRU, and are more or less identical to the patches in the previous test kernels I supplied in February. Please go ahead and do some testing, and let me know if you find any problems.

Please note this package is NOT SUPPORTED by Canonical, and is for TESTING PURPOSES ONLY. ONLY install in a dedicated test environment.

Instructions to install:

1) sudo add-apt-repository ppa:mruffell/lp1896578-test
2) sudo apt update

For 21.04 / Hirsute:
3) sudo apt install linux-image-unsigned-5.11.0-16-generic linux-modules-5.11.0-16-generic \
   linux-modules-extra-5.11.0-16-generic linux-headers-5.11.0-16-generic

For 20.10 / Groovy:
3) sudo apt install linux-image-unsigned-5.8.0-50-generic linux-modules-5.8.0-50-generic \
   linux-modules-extra-5.8.0-50-generic linux-headers-5.8.0-50-generic

For 20.04 / Focal:
3) sudo apt install linux-image-unsigned-5.4.0-72-generic linux-modules-5.4.0-72-generic \
   linux-modules-extra-5.4.0-72-generic linux-headers-5.4.0-72-generic

For 18.04 / Bionic:

For the 5.4 Bionic HWE kernel:
3) sudo apt install linux-image-unsigned-5.4.0-72-generic linux-modules-5.4.0-72-generic \
   linux-modules-extra-5.4.0-72-generic linux-headers-5.4.0-72-generic

For the 4.15 Bionic GA kernel:
3) sudo apt install linux-image-unsigned-4.15.0-142-generic linux-modules-4.15.0-142-generic \
   linux-modules-extra-4.15.0-142-generic linux-headers-4.15.0-142-generic

4) sudo reboot
5) uname -rv

Make sure the string "+TEST1896578v20210504b1" is present in the output of uname -rv. You may need to modify your grub configuration to boot the correct kernel. If you need help, read these instructions: https://paste.ubuntu.com/p/XrTzWPPnWJ/
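The step-5 marker check can be scripted with a tiny helper (a sketch; the sample version strings below are illustrative, not exact package versions):

```shell
# Decide whether a `uname -rv` string belongs to one of the TEST1896578
# test builds by looking for the version marker.
is_test_kernel() {
    case "$1" in
        *+TEST1896578*) echo "test kernel" ;;
        *)              echo "stock kernel" ;;
    esac
}

is_test_kernel "5.4.0-72-generic #80+TEST1896578v20210504b1-Ubuntu"   # -> test kernel
is_test_kernel "$(uname -rv)"   # result depends on the running kernel
```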
** Description changed: BugLink: https://bugs.launchpad.net/bugs/1896578 [Impact] Block discard is very slow on Raid10, which causes common use cases which invoke block discard, such as mkfs and fstrim operations, to take a very long time. For example, on a i3.8xlarge instance on AWS, which has 4x 1.9TB NVMe devices which support block discard, a mkfs.xfs operation on Raid 10 takes between 8 to 11 minutes, where the same mkfs.xfs operation on Raid 0, takes 4 seconds. The bigger the devices, the longer it takes. The cause is that Raid10 currently uses a 512k chunk size, and uses this for the discard_max_bytes value. If we need to discard 1.9TB, the kernel splits the request into millions of 512k bio requests, even if the underlying device supports larger requests. For example, the NVMe devices on i3.8xlarge support 2.2TB of discard at once: $ cat /sys/block/nvme0n1/queue/discard_max_bytes 2199023255040 $ cat /sys/block/nvme0n1/queue/discard_max_hw_bytes 2199023255040 Where the Raid10 md device only supports 512k: $ cat /sys/block/md0/queue/discard_max_bytes 524288 $ cat /sys/block/md0/queue/discard_max_hw_bytes 524288 If we perform a mkfs.xfs operation on the /dev/md array, it takes over 11 minutes and if we examine the stack, it is stuck in blkdev_issue_discard() $ sudo cat /proc/1626/stack [<0>] wait_barrier+0x14c/0x230 [raid10] [<0>] regular_request_wait+0x39/0x150 [raid10] [<0>] raid10_write_request+0x11e/0x850 [raid10] [<0>] raid10_make_request+0xd7/0x150 [raid10] [<0>] md_handle_request+0x123/0x1a0 [<0>] md_submit_bio+0xda/0x120 [<0>] __submit_bio_noacct+0xde/0x320 [<0>] submit_bio_noacct+0x4d/0x90 [<0>] submit_bio+0x4f/0x1b0 [<0>] __blkdev_issue_discard+0x154/0x290 [<0>] blkdev_issue_discard+0x5d/0xc0 [<0>] blk_ioctl_discard+0xc4/0x110 [<0>] blkdev_common_ioctl+0x56c/0x840 [<0>] blkdev_ioctl+0xeb/0x270 [<0>] block_ioctl+0x3d/0x50 [<0>] __x64_sys_ioctl+0x91/0xc0 [<0>] do_syscall_64+0x38/0x90 [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [Fix] Xiao Ni has 
developed a patchset which resolves the block discard performance problems. These commits have now landed in 5.10-rc1. commit cf78408f937a67f59f5e90ee8e6cadeed7c128a8 Author: Xiao Ni Date: Thu Feb 4 15:50:43 2021 +0800 Subject: md: add md_submit_discard_bio() for submitting discard bio Link: https://github.com/torvalds/linux/commit/cf78408f937a67f59f5e90ee8e6cadeed7c128a8 commit c2968285925adb97b9aa4ede94c1f1ab61ce0925 Author: Xiao Ni Date: Thu Feb 4 15:50:44 2021 +0800 Subject: md/raid10: extend r10bio devs to raid disks Link: https://github.com/torvalds/linux/commit/c2968285925adb97b9aa4ede94c1f1ab61ce0925 commit f2e7e269a7525317752d472bb48a549780e87d22 Author: Xiao Ni Date: Thu Feb 4 15:50:45 2021 +0800 Subject: md/raid10: pull the code that wait for blocked dev into one function Link: https://github.com/torvalds/linux/commit/f2e7e269a7525317752d472bb48a549780e87d22 commit d30588b2731fb01e1616cf16c3fe79a1443e29aa Author: Xiao Ni Date: Thu Feb 4 15:50:46 2021 +0800 Subject: md/raid10: improve raid10 discard request Link: https://github.com/torvalds/linux/commit/d30588b2731fb01e1616cf16c3fe79a1443e29aa commit 254c271da0712ea8914f187588e0f81f7678ee2f Author: Xiao Ni Date: Thu Feb 4 15:50:47 2021 +0800 Subject: md/raid10: improve discard request for far layout Link: https://github.com/torvalds/linux/commit/254c271da0712ea8914f187588e0f81f7678ee2f There is also an additional commit which is required, and was merged after "md/raid10: improve raid10 discard request" was merged. The following commits enable Radid10 to use large discards, instead of splitting into many bios, since the technical hurdles have now been removed. 
commit ca4a4e9a55beeb138bb06e3867f5e486da896d44
Author: Mike Snitzer
Date: Fri Apr 30 14:38:37 2021 -0400
Subject: dm raid: remove unnecessary discard limits for raid0 and raid10
Link: https://github.com/torvalds/linux/commit/ca4a4e9a55beeb138bb06e3867f5e486da896d44

The commits more or less cherry pick to the 5.11, 5.8, 5.4 and 4.15 kernels, with the following minor backports:

1) submit_bio_noacct() needed to be renamed to generic_make_request() since it was recently changed in:

commit ed00aabd5eb9fb44d6aff1173234a2e911b9fead
Author: Christoph Hellwig
Date: Wed Jul 1 10:59:44 2020 +0200
Subject: block: rename generic_make_request to submit_bio_noacct
Link: https://github.com/torvalds/linux/commit/ed00aabd5eb9fb44d6aff1173234a2e911b9fead

- 2) bio_split(), mempool_alloc(), bio_clone_fast() all needed their
+ 2) In the 4.15, 5.4 and 5.8 kernels, trace_block_bio_remap() needs to
+ have its request_queue argument put back in place. It was recently
+ removed in:
+
+ commit 1c02fca620f7273b597591065d366e2cca948d8f
+ Author: Christoph Hellwig
+ Date: Thu
[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations
** Also affects: linux (Ubuntu Hirsute) Importance: Undecided Status: New ** Changed in: linux (Ubuntu Hirsute) Status: New => In Progress ** Changed in: linux (Ubuntu Hirsute) Importance: Undecided => Medium ** Changed in: linux (Ubuntu Hirsute) Assignee: (unassigned) => Matthew Ruffell (mruffell) -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1896578 Title: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations Status in linux package in Ubuntu: In Progress Status in linux source package in Bionic: In Progress Status in linux source package in Focal: In Progress Status in linux source package in Groovy: In Progress Status in linux source package in Hirsute: In Progress Bug description: BugLink: https://bugs.launchpad.net/bugs/1896578 [Impact] Block discard is very slow on Raid10, which causes common use cases which invoke block discard, such as mkfs and fstrim operations, to take a very long time. For example, on a i3.8xlarge instance on AWS, which has 4x 1.9TB NVMe devices which support block discard, a mkfs.xfs operation on Raid 10 takes between 8 to 11 minutes, where the same mkfs.xfs operation on Raid 0, takes 4 seconds. The bigger the devices, the longer it takes. The cause is that Raid10 currently uses a 512k chunk size, and uses this for the discard_max_bytes value. If we need to discard 1.9TB, the kernel splits the request into millions of 512k bio requests, even if the underlying device supports larger requests. 
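To put the request splitting described above into numbers, here is a quick illustrative sketch. The sizes are the ones quoted in this bug (~1.9TB per device, a 512k discard_max_bytes on the md array, and the NVMe hardware limit); the script itself is mine, not part of the report.

```shell
#!/bin/sh
# Rough illustration of the bio splitting described in this bug.
DISCARD_BYTES=1900000000000   # ~1.9TB per NVMe device
MD_LIMIT=524288               # /sys/block/md0/queue/discard_max_bytes
NVME_LIMIT=2199023255040      # /sys/block/nvme0n1/queue/discard_max_bytes

# Ceiling division: number of bios blkdev_issue_discard() must submit.
md_bios=$(( (DISCARD_BYTES + MD_LIMIT - 1) / MD_LIMIT ))
nvme_bios=$(( (DISCARD_BYTES + NVME_LIMIT - 1) / NVME_LIMIT ))

echo "bios with 512k md limit:  $md_bios"    # millions of requests
echo "bios with NVMe hw limit:  $nvme_bios"  # a single request
```

This is where the 8-11 minute mkfs times come from: every one of those millions of bios has to pass through the raid10 write path shown in the stack trace above.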
[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations
** Description changed:

BugLink: https://bugs.launchpad.net/bugs/1896578

[Impact]

Block discard is very slow on Raid10, which causes common use cases which invoke block discard, such as mkfs and fstrim operations, to take a very long time.

For example, on an i3.8xlarge instance on AWS, which has 4x 1.9TB NVMe devices which support block discard, a mkfs.xfs operation on Raid 10 takes between 8 and 11 minutes, where the same mkfs.xfs operation on Raid 0 takes 4 seconds. The bigger the devices, the longer it takes.

The cause is that Raid10 currently uses a 512k chunk size, and uses this for the discard_max_bytes value. If we need to discard 1.9TB, the kernel splits the request into millions of 512k bio requests, even if the underlying device supports larger requests.

For example, the NVMe devices on i3.8xlarge support 2.2TB of discard at once:

$ cat /sys/block/nvme0n1/queue/discard_max_bytes
2199023255040
$ cat /sys/block/nvme0n1/queue/discard_max_hw_bytes
2199023255040

Where the Raid10 md device only supports 512k:

$ cat /sys/block/md0/queue/discard_max_bytes
524288
$ cat /sys/block/md0/queue/discard_max_hw_bytes
524288

If we perform a mkfs.xfs operation on the /dev/md array, it takes over 11 minutes, and if we examine the stack, it is stuck in blkdev_issue_discard():

$ sudo cat /proc/1626/stack
[<0>] wait_barrier+0x14c/0x230 [raid10]
[<0>] regular_request_wait+0x39/0x150 [raid10]
[<0>] raid10_write_request+0x11e/0x850 [raid10]
[<0>] raid10_make_request+0xd7/0x150 [raid10]
[<0>] md_handle_request+0x123/0x1a0
[<0>] md_submit_bio+0xda/0x120
[<0>] __submit_bio_noacct+0xde/0x320
[<0>] submit_bio_noacct+0x4d/0x90
[<0>] submit_bio+0x4f/0x1b0
[<0>] __blkdev_issue_discard+0x154/0x290
[<0>] blkdev_issue_discard+0x5d/0xc0
[<0>] blk_ioctl_discard+0xc4/0x110
[<0>] blkdev_common_ioctl+0x56c/0x840
[<0>] blkdev_ioctl+0xeb/0x270
[<0>] block_ioctl+0x3d/0x50
[<0>] __x64_sys_ioctl+0x91/0xc0
[<0>] do_syscall_64+0x38/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[Fix]

Xiao Ni has
developed a patchset which resolves the block discard performance problems. These commits have now landed in 5.10-rc1.

- commit 2628089b74d5a64bd0bcb5d247a18f78d7b6f4d0
- Author: Xiao Ni
- Date: Tue Aug 25 13:42:59 2020 +0800
+ commit cf78408f937a67f59f5e90ee8e6cadeed7c128a8
+ Author: Xiao Ni
+ Date: Thu Feb 4 15:50:43 2021 +0800
  Subject: md: add md_submit_discard_bio() for submitting discard bio
- Link: https://github.com/torvalds/linux/commit/2628089b74d5a64bd0bcb5d247a18f78d7b6f4d0
-
- commit 8650a889017cb1f6ea6813ccf83a2e9f6fa49dd3
- Author: Xiao Ni
- Date: Tue Aug 25 13:43:00 2020 +0800
+ Link: https://github.com/torvalds/linux/commit/cf78408f937a67f59f5e90ee8e6cadeed7c128a8
+
+ commit c2968285925adb97b9aa4ede94c1f1ab61ce0925
+ Author: Xiao Ni
+ Date: Thu Feb 4 15:50:44 2021 +0800
  Subject: md/raid10: extend r10bio devs to raid disks
- Link: https://github.com/torvalds/linux/commit/8650a889017cb1f6ea6813ccf83a2e9f6fa49dd3
-
- commit f046f5d0d79cdb968f219ce249e497fd1accf484
- Author: Xiao Ni
- Date: Tue Aug 25 13:43:01 2020 +0800
- Subject: md/raid10: pull codes that wait for blocked dev into one function
- Link: https://github.com/torvalds/linux/commit/f046f5d0d79cdb968f219ce249e497fd1accf484
-
- commit bcc90d280465ebd51ab8688be86e1f00c62dccf9
- Author: Xiao Ni
- Date: Wed Sep 2 20:00:22 2020 +0800
+ Link: https://github.com/torvalds/linux/commit/c2968285925adb97b9aa4ede94c1f1ab61ce0925
+
+ commit f2e7e269a7525317752d472bb48a549780e87d22
+ Author: Xiao Ni
+ Date: Thu Feb 4 15:50:45 2021 +0800
+ Subject: md/raid10: pull the code that wait for blocked dev into one function
+ Link: https://github.com/torvalds/linux/commit/f2e7e269a7525317752d472bb48a549780e87d22
+
+ commit d30588b2731fb01e1616cf16c3fe79a1443e29aa
+ Author: Xiao Ni
+ Date: Thu Feb 4 15:50:46 2021 +0800
  Subject: md/raid10: improve raid10 discard request
- Link: https://github.com/torvalds/linux/commit/bcc90d280465ebd51ab8688be86e1f00c62dccf9
-
- commit d3ee2d8415a6256c1c41e1be36e80e640c3e6359
- Author: Xiao Ni
- Date: Wed Sep 2 20:00:23 2020 +0800
+ Link: https://github.com/torvalds/linux/commit/d30588b2731fb01e1616cf16c3fe79a1443e29aa
+
+ commit 254c271da0712ea8914f187588e0f81f7678ee2f
+ Author: Xiao Ni
+ Date: Thu Feb 4 15:50:47 2021 +0800
  Subject: md/raid10: improve discard request for far layout
- Link: https://github.com/torvalds/linux/commit/d3ee2d8415a6256c1c41e1be36e80e640c3e6359
+ Link: https://github.com/torvalds/linux/commit/254c271da0712ea8914f187588e0f81f7678ee2f

There is also an additional commit which is required, and was merged after "md/raid10: improve raid10 discard request" was merged. The following commits enable Raid10 to use large discards, instead of splitting into many bios, since the technical hurdles have now been removed.

- commit
[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations
Hi everyone,

The original patch author, Xiao Ni, has sent a V2 patchset to the linux-raid mailing list for feedback. This new patchset fixes the problems the previous version had, namely, properly calculating the discard offset for the second and onward disks, and correctly calculating the stripe size in far layouts.

The patches are:

https://www.spinics.net/lists/raid/msg67208.html
https://www.spinics.net/lists/raid/msg67212.html
https://www.spinics.net/lists/raid/msg67213.html
https://www.spinics.net/lists/raid/msg67209.html
https://www.spinics.net/lists/raid/msg67210.html
https://www.spinics.net/lists/raid/msg67211.html

We now need to thoroughly test and provide feedback to Xiao and the Raid subsystem maintainer before these patches can get merged into mainline again. We really need to make sure that these patches don't cause any data corruption.

I have backported the patchset to the 4.15, 5.4 and 5.8 kernels.

Backports for the 5.4 and 5.8 kernels:

https://paste.ubuntu.com/p/vPFFPMjhbv/
https://paste.ubuntu.com/p/MCGH8v7Rqk/
https://paste.ubuntu.com/p/rppy39Qgkz/
https://paste.ubuntu.com/p/Dsqy4PQNzJ/
https://paste.ubuntu.com/p/mZ9VDBD8d5/
https://paste.ubuntu.com/p/vJNYZyGTWH/
https://paste.ubuntu.com/p/M4sMwhgWTj/

Backports for the 4.15 kernel:

https://paste.ubuntu.com/p/X9rRHT59qf/
https://paste.ubuntu.com/p/VWwW9JbBHy/
https://paste.ubuntu.com/p/pFY3YbBW6t/
https://paste.ubuntu.com/p/JKg4KcHwPB/
https://paste.ubuntu.com/p/C4sf2r9jS4/

I have built test kernels for bionic, bionic HWE, focal and groovy. Performance testing confirms that the test case of formatting a Raid10 array on NVMe disks drops from 8.5 minutes to about 6 seconds on an AWS i3.8xlarge, due to the speedup in block discard.

https://paste.ubuntu.com/p/NNGqP3xdsc/

I have also run through the data corruption regression reproducer from bug 1907262, and throughout the process /sys/block/md0/md/mismatch_cnt was always 0, and all deep fsck checks came back clean for individual disks.
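For anyone repeating the consistency checks above, here is a minimal sketch of the mismatch_cnt verification. The sysfs paths are the standard md ones used in this bug; the helper function name is my own.

```shell
#!/bin/sh
# Minimal sketch of the consistency check described above: after an
# md 'check' pass finishes, the array's mismatch_cnt should read 0.
# The helper takes the sysfs file as an argument so it can be pointed
# at any array (e.g. /sys/block/md0/md/mismatch_cnt).
mismatch_ok() {
    [ "$(cat "$1")" -eq 0 ]
}

# On a real system you would first trigger the check and wait for it:
#   echo check | sudo tee /sys/block/md0/md/sync_action
#   # ...wait until /sys/block/md0/md/sync_action reads 'idle'...
#   mismatch_ok /sys/block/md0/md/mismatch_cnt && echo "array consistent"
```

Any non-zero mismatch_cnt after a check pass would be grounds to stop testing and report back immediately.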
https://paste.ubuntu.com/p/5DK57TzdFH/

I am happy with these results, and it's time to get some wider testing on these patches. If you are interested in helping to test, please use dedicated test servers, and not production systems. These patches have caused data corruption before, so only place data on the Raid10 array that you have copies of elsewhere, and assume that total data loss could happen at any time.

Please note, these test kernels are NOT SUPPORTED by Canonical, and are for TEST PURPOSES ONLY. ONLY install them in a dedicated test environment.

Instructions to install (on a Bionic, Focal or Groovy system):

1) sudo add-apt-repository ppa:mruffell/lp1896578-test
2) sudo apt update

For Bionic:
3) sudo apt install linux-image-unsigned-4.15.0-136-generic linux-modules-4.15.0-136-generic linux-modules-extra-4.15.0-136-generic linux-headers-4.15.0-136-generic

For Bionic HWE 5.4 or Focal:
3) sudo apt install linux-image-unsigned-5.4.0-66-generic linux-modules-5.4.0-66-generic linux-modules-extra-5.4.0-66-generic linux-headers-5.4.0-66-generic

For Groovy:
3) sudo apt install linux-image-unsigned-5.8.0-44-generic linux-modules-5.8.0-44-generic linux-modules-extra-5.8.0-44-generic linux-headers-5.8.0-44-generic

4) sudo reboot
5) uname -rv

Bionic: 4.15.0-136-generic #140+TEST1896578v20210212b1-Ubuntu SMP Fri Feb 12 03:00:17 UTC 2
Bionic HWE: 5.4.0-66-generic #74~18.04.2+TEST1896578v20210212b1-Ubuntu SMP Fri Feb 12 02:55:4
Focal: 5.4.0-66-generic #74+TEST1896578v20210212b1-Ubuntu SMP Fri Feb 12 06:30:51 UTC 20
Groovy: 5.8.0-44-generic #50+TEST1896578v20210212b1-Ubuntu SMP Fri Feb 12 06:19:49 UTC 20

Make sure the uname matches one of the above strings before you start testing and formatting Raid10 arrays.

We want to test formatting the arrays with xfs, ext4, and general usage over time with regular consistency checks and fstrims. We want to make sure that mismatch counts are 0, all fsck -f runs are clean, and no data corruption happens.
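Since formatting a Raid10 array on the wrong kernel risks data loss, it may be worth guarding any test script on the uname check described above. A sketch, using the `TEST1896578v20210212b1` tag common to the version strings listed in this comment:

```shell
#!/bin/sh
# Sketch: refuse to run destructive Raid10 tests unless the running
# kernel is one of the test builds listed above.
is_test_kernel() {
    # $1: output of `uname -rv`
    case "$1" in
        *TEST1896578v20210212b1*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example guard at the top of a test script:
#   is_test_kernel "$(uname -rv)" || { echo "not a test kernel" >&2; exit 1; }
```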
If you have any problems whatsoever, please let me know. Thanks, Matthew -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1896578 Title: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations Status in linux package in Ubuntu: In Progress Status in linux source package in Bionic: In Progress Status in linux source package in Focal: In Progress Status in linux source package in Groovy: In Progress Bug description: BugLink: https://bugs.launchpad.net/bugs/1896578 [Impact] Block discard is very slow on Raid10, which causes common use cases which invoke block discard, such as mkfs and fstrim operations, to take a very long time. For example, on a i3.8xlarge instance on AWS, which has 4x 1.9TB NVMe devices which support block discard, a mkfs.xfs operation on Raid 10 takes between 8 to 11 minutes, where the same mkfs.xfs operation on
[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations
** Description changed:

BugLink: https://bugs.launchpad.net/bugs/1896578

[Impact]

Block discard is very slow on Raid10, which causes common use cases which invoke block discard, such as mkfs and fstrim operations, to take a very long time.

For example, on an i3.8xlarge instance on AWS, which has 4x 1.9TB NVMe devices which support block discard, a mkfs.xfs operation on Raid 10 takes between 8 and 11 minutes, where the same mkfs.xfs operation on Raid 0 takes 4 seconds. The bigger the devices, the longer it takes.

The cause is that Raid10 currently uses a 512k chunk size, and uses this for the discard_max_bytes value. If we need to discard 1.9TB, the kernel splits the request into millions of 512k bio requests, even if the underlying device supports larger requests.

For example, the NVMe devices on i3.8xlarge support 2.2TB of discard at once:

$ cat /sys/block/nvme0n1/queue/discard_max_bytes
2199023255040
$ cat /sys/block/nvme0n1/queue/discard_max_hw_bytes
2199023255040

Where the Raid10 md device only supports 512k:

$ cat /sys/block/md0/queue/discard_max_bytes
524288
$ cat /sys/block/md0/queue/discard_max_hw_bytes
524288

If we perform a mkfs.xfs operation on the /dev/md array, it takes over 11 minutes, and if we examine the stack, it is stuck in blkdev_issue_discard():

$ sudo cat /proc/1626/stack
[<0>] wait_barrier+0x14c/0x230 [raid10]
[<0>] regular_request_wait+0x39/0x150 [raid10]
[<0>] raid10_write_request+0x11e/0x850 [raid10]
[<0>] raid10_make_request+0xd7/0x150 [raid10]
[<0>] md_handle_request+0x123/0x1a0
[<0>] md_submit_bio+0xda/0x120
[<0>] __submit_bio_noacct+0xde/0x320
[<0>] submit_bio_noacct+0x4d/0x90
[<0>] submit_bio+0x4f/0x1b0
[<0>] __blkdev_issue_discard+0x154/0x290
[<0>] blkdev_issue_discard+0x5d/0xc0
[<0>] blk_ioctl_discard+0xc4/0x110
[<0>] blkdev_common_ioctl+0x56c/0x840
[<0>] blkdev_ioctl+0xeb/0x270
[<0>] block_ioctl+0x3d/0x50
[<0>] __x64_sys_ioctl+0x91/0xc0
[<0>] do_syscall_64+0x38/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[Fix]

Xiao Ni has
developed a patchset which resolves the block discard performance problems. These commits have now landed in 5.10-rc1.

commit 2628089b74d5a64bd0bcb5d247a18f78d7b6f4d0
Author: Xiao Ni
Date: Tue Aug 25 13:42:59 2020 +0800
Subject: md: add md_submit_discard_bio() for submitting discard bio
Link: https://github.com/torvalds/linux/commit/2628089b74d5a64bd0bcb5d247a18f78d7b6f4d0

commit 8650a889017cb1f6ea6813ccf83a2e9f6fa49dd3
Author: Xiao Ni
Date: Tue Aug 25 13:43:00 2020 +0800
Subject: md/raid10: extend r10bio devs to raid disks
Link: https://github.com/torvalds/linux/commit/8650a889017cb1f6ea6813ccf83a2e9f6fa49dd3

commit f046f5d0d79cdb968f219ce249e497fd1accf484
Author: Xiao Ni
Date: Tue Aug 25 13:43:01 2020 +0800
Subject: md/raid10: pull codes that wait for blocked dev into one function
Link: https://github.com/torvalds/linux/commit/f046f5d0d79cdb968f219ce249e497fd1accf484

commit bcc90d280465ebd51ab8688be86e1f00c62dccf9
Author: Xiao Ni
Date: Wed Sep 2 20:00:22 2020 +0800
Subject: md/raid10: improve raid10 discard request
Link: https://github.com/torvalds/linux/commit/bcc90d280465ebd51ab8688be86e1f00c62dccf9

commit d3ee2d8415a6256c1c41e1be36e80e640c3e6359
Author: Xiao Ni
Date: Wed Sep 2 20:00:23 2020 +0800
Subject: md/raid10: improve discard request for far layout
Link: https://github.com/torvalds/linux/commit/d3ee2d8415a6256c1c41e1be36e80e640c3e6359

There is also an additional commit which is required, and was merged after "md/raid10: improve raid10 discard request" was merged. The following commits enable Raid10 to use large discards, instead of splitting into many bios, since the technical hurdles have now been removed.
commit e0910c8e4f87bb9f767e61a778b0d9271c4dc512
Author: Mike Snitzer
Date: Thu Sep 24 13:14:52 2020 -0400
Subject: dm raid: fix discard limits for raid1 and raid10
Link: https://github.com/torvalds/linux/commit/e0910c8e4f87bb9f767e61a778b0d9271c4dc512

commit f0e90b6c663a7e3b4736cb318c6c7c589f152c28
Author: Mike Snitzer
Date: Thu Sep 24 16:40:12 2020 -0400
Subject: dm raid: remove unnecessary discard limits for raid10
Link: https://github.com/torvalds/linux/commit/f0e90b6c663a7e3b4736cb318c6c7c589f152c28

All the commits mentioned follow a similar strategy which was implemented in Raid0 in the below commit, which was merged in 4.12-rc2, and which fixed block discard performance issues in Raid0:

commit 29efc390b9462582ae95eb9a0b8cd17ab956afc0
Author: Shaohua Li
Date: Sun May 7 17:36:24 2017 -0700
Subject: md/md0: optimize raid0 discard handling
Link: https://github.com/torvalds/linux/commit/29efc390b9462582ae95eb9a0b8cd17ab956afc0

The commits more or less cherry pick to the 5.8, 5.4 and 4.15 kernels, with the following minor fixups:

1)
[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations
** Changed in: linux (Ubuntu) Status: Fix Released => In Progress ** Changed in: linux (Ubuntu Bionic) Status: Fix Released => In Progress ** Changed in: linux (Ubuntu Focal) Status: Fix Released => In Progress ** Changed in: linux (Ubuntu Groovy) Status: Fix Released => In Progress -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1896578 Title: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations Status in linux package in Ubuntu: In Progress Status in linux source package in Bionic: In Progress Status in linux source package in Focal: In Progress Status in linux source package in Groovy: In Progress Bug description: BugLink: https://bugs.launchpad.net/bugs/1896578 [Impact] Block discard is very slow on Raid10, which causes common use cases which invoke block discard, such as mkfs and fstrim operations, to take a very long time. For example, on a i3.8xlarge instance on AWS, which has 4x 1.9TB NVMe devices which support block discard, a mkfs.xfs operation on Raid 10 takes between 8 to 11 minutes, where the same mkfs.xfs operation on Raid 0, takes 4 seconds. The bigger the devices, the longer it takes. The cause is that Raid10 currently uses a 512k chunk size, and uses this for the discard_max_bytes value. If we need to discard 1.9TB, the kernel splits the request into millions of 512k bio requests, even if the underlying device supports larger requests. 
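As a convenience when reproducing the before/after limits quoted in this bug, a small helper along these lines can replace the manual `cat` commands (the device names in the example are the ones from this report):

```shell
#!/bin/sh
# Sketch: print a device's discard limit from sysfs, as done by hand
# with `cat` in this bug report. Falls back to 0 if the file is absent.
discard_limit() {
    # $1: device name, e.g. md0 or nvme0n1
    # $2: limit file, e.g. discard_max_bytes or discard_max_hw_bytes
    f="/sys/block/$1/queue/$2"
    [ -r "$f" ] && cat "$f" || echo 0
}

# Example on a real system:
#   discard_limit md0 discard_max_bytes        # 524288 before the fix
#   discard_limit nvme0n1 discard_max_bytes    # 2199023255040
```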
[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations
This bug was fixed in the package linux - 5.8.0-36.40+21.04.1

---
linux (5.8.0-36.40+21.04.1) hirsute; urgency=medium

* Packaging resync (LP: #1786013)
  - update dkms package versions

[ Ubuntu: 5.8.0-36.40 ]
* debian/scripts/file-downloader does not handle positive failures correctly (LP: #1878897)
  - [Packaging] file-downloader not handling positive failures correctly

[ Ubuntu: 5.8.0-35.39 ]
* Packaging resync (LP: #1786013)
  - update dkms package versions
* CVE-2021-1052 // CVE-2021-1053
  - [Packaging] NVIDIA -- Add the NVIDIA 460 driver

-- Kleber Sacilotto de Souza Thu, 07 Jan 2021 11:57:30 +0100

** Changed in: linux (Ubuntu) Status: In Progress => Fix Released

** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2021-1052
** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2021-1053

-- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1896578 Title: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations Status in linux package in Ubuntu: Fix Released Status in linux source package in Bionic: Fix Released Status in linux source package in Focal: Fix Released Status in linux source package in Groovy: Fix Released Bug description: BugLink: https://bugs.launchpad.net/bugs/1896578 [Impact] Block discard is very slow on Raid10, which causes common use cases which invoke block discard, such as mkfs and fstrim operations, to take a very long time. For example, on an i3.8xlarge instance on AWS, which has 4x 1.9TB NVMe devices which support block discard, a mkfs.xfs operation on Raid 10 takes between 8 and 11 minutes, where the same mkfs.xfs operation on Raid 0 takes 4 seconds. The bigger the devices, the longer it takes. The cause is that Raid10 currently uses a 512k chunk size, and uses this for the discard_max_bytes value.
[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations
Hi Markus, I am deeply sorry for causing the regression. We are aware, and tracking the issue in bug 1907262. The kernel team have started an emergency revert and you can expect fixed kernels to be released in the next day or so. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1896578 Title: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations Status in linux package in Ubuntu: In Progress Status in linux source package in Bionic: Fix Released Status in linux source package in Focal: Fix Released Status in linux source package in Groovy: Fix Released Bug description: BugLink: https://bugs.launchpad.net/bugs/1896578 [Impact] Block discard is very slow on Raid10, which causes common use cases which invoke block discard, such as mkfs and fstrim operations, to take a very long time. For example, on a i3.8xlarge instance on AWS, which has 4x 1.9TB NVMe devices which support block discard, a mkfs.xfs operation on Raid 10 takes between 8 to 11 minutes, where the same mkfs.xfs operation on Raid 0, takes 4 seconds. The bigger the devices, the longer it takes. The cause is that Raid10 currently uses a 512k chunk size, and uses this for the discard_max_bytes value. If we need to discard 1.9TB, the kernel splits the request into millions of 512k bio requests, even if the underlying device supports larger requests. 
For example, the NVMe devices on i3.8xlarge support 2.2TB of discard at once: $ cat /sys/block/nvme0n1/queue/discard_max_bytes 2199023255040 $ cat /sys/block/nvme0n1/queue/discard_max_hw_bytes 2199023255040 Where the Raid10 md device only supports 512k: $ cat /sys/block/md0/queue/discard_max_bytes 524288 $ cat /sys/block/md0/queue/discard_max_hw_bytes 524288 If we perform a mkfs.xfs operation on the /dev/md array, it takes over 11 minutes and if we examine the stack, it is stuck in blkdev_issue_discard() $ sudo cat /proc/1626/stack [<0>] wait_barrier+0x14c/0x230 [raid10] [<0>] regular_request_wait+0x39/0x150 [raid10] [<0>] raid10_write_request+0x11e/0x850 [raid10] [<0>] raid10_make_request+0xd7/0x150 [raid10] [<0>] md_handle_request+0x123/0x1a0 [<0>] md_submit_bio+0xda/0x120 [<0>] __submit_bio_noacct+0xde/0x320 [<0>] submit_bio_noacct+0x4d/0x90 [<0>] submit_bio+0x4f/0x1b0 [<0>] __blkdev_issue_discard+0x154/0x290 [<0>] blkdev_issue_discard+0x5d/0xc0 [<0>] blk_ioctl_discard+0xc4/0x110 [<0>] blkdev_common_ioctl+0x56c/0x840 [<0>] blkdev_ioctl+0xeb/0x270 [<0>] block_ioctl+0x3d/0x50 [<0>] __x64_sys_ioctl+0x91/0xc0 [<0>] do_syscall_64+0x38/0x90 [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [Fix] Xiao Ni has developed a patchset which resolves the block discard performance problems. These commits have now landed in 5.10-rc1. 
commit 2628089b74d5a64bd0bcb5d247a18f78d7b6f4d0
Author: Xiao Ni
Date: Tue Aug 25 13:42:59 2020 +0800
Subject: md: add md_submit_discard_bio() for submitting discard bio
Link: https://github.com/torvalds/linux/commit/2628089b74d5a64bd0bcb5d247a18f78d7b6f4d0

commit 8650a889017cb1f6ea6813ccf83a2e9f6fa49dd3
Author: Xiao Ni
Date: Tue Aug 25 13:43:00 2020 +0800
Subject: md/raid10: extend r10bio devs to raid disks
Link: https://github.com/torvalds/linux/commit/8650a889017cb1f6ea6813ccf83a2e9f6fa49dd3

commit f046f5d0d79cdb968f219ce249e497fd1accf484
Author: Xiao Ni
Date: Tue Aug 25 13:43:01 2020 +0800
Subject: md/raid10: pull codes that wait for blocked dev into one function
Link: https://github.com/torvalds/linux/commit/f046f5d0d79cdb968f219ce249e497fd1accf484

commit bcc90d280465ebd51ab8688be86e1f00c62dccf9
Author: Xiao Ni
Date: Wed Sep 2 20:00:22 2020 +0800
Subject: md/raid10: improve raid10 discard request
Link: https://github.com/torvalds/linux/commit/bcc90d280465ebd51ab8688be86e1f00c62dccf9

commit d3ee2d8415a6256c1c41e1be36e80e640c3e6359
Author: Xiao Ni
Date: Wed Sep 2 20:00:23 2020 +0800
Subject: md/raid10: improve discard request for far layout
Link: https://github.com/torvalds/linux/commit/d3ee2d8415a6256c1c41e1be36e80e640c3e6359

There is also an additional commit which is required, and was merged after "md/raid10: improve raid10 discard request" was merged. The following commits enable Raid10 to use large discards, instead of splitting them into many bios, since the technical hurdles have now been removed.

commit e0910c8e4f87bb9f767e61a778b0d9271c4dc512
Author: Mike Snitzer
Date: Thu Sep 24 13:14:52 2020 -0400
Subject: dm raid: fix discard limits for raid1 and raid10
Link: https://github.com/torvalds/linux/commit/e0910c8e4f87bb9f767e61a778b0d9271c4dc512

commit f0e90b6c663a7e3b4736cb318c6c7c589f152c28
Author: Mike Snitzer
Date: Thu Sep 24 16:40:12 2020 -0400
Subject: dm raid: remove unnecessary discard limits for raid10
Link:
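A quick way to tell whether a given md device is still subject to the old 512k limit is to classify the discard_max_bytes value quoted earlier in this bug. A minimal sketch; the classify_discard_limit helper is an illustration, not part of any tool mentioned in this thread:

```shell
# Hypothetical helper: classify a discard_max_bytes value read from sysfs.
# 524288 (512k) is the old RAID10 chunk-size limit described in this bug;
# patched kernels report a much larger value for the array.
classify_discard_limit() {
    if [ "$1" -le 524288 ]; then
        echo "limited"    # discards will be split into 512k bios
    else
        echo "large"      # large discards pass straight through
    fi
}

# In practice, feed it the sysfs value for the array:
#   classify_discard_limit "$(cat /sys/block/md0/queue/discard_max_bytes)"
```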
[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations
On focal (5.4.0-56-generic) we are starting to see massive file system corruptions on systems updated to this kernel version. These systems are using LVM with discards and thin provisioning on 6 or 8 NVMe drives in a RAID10 near configuration. We are currently downgrading all systems back to 5.4.0-54-generic and hope we can provide a simple reproducer.
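For anyone deciding whether an installed kernel falls in the affected window, the versions come from this comment: 5.4.0-56 shipped the discard patches on focal, and 5.4.0-54 was the last release before them. A minimal sketch using sort -V; the version_lt helper is an illustration, not an official tool:

```shell
# Hypothetical helper: true when version $1 sorts strictly before $2
# (GNU coreutils sort -V understands dotted/dashed version strings).
version_lt() {
    [ "$1" != "$2" ] && \
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Example: a focal kernel older than 5.4.0-56 predates the regressed release.
if version_lt "5.4.0-54" "5.4.0-56"; then
    echo "5.4.0-54 predates the regressed release"
fi
```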
[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations
This bug was fixed in the package linux - 4.15.0-126.129

---
linux (4.15.0-126.129) bionic; urgency=medium

  * bionic/linux: 4.15.0-126.129 -proposed tracker (LP: #1905305)

  * CVE-2020-4788
    - SAUCE: powerpc/64s: Define MASKABLE_RELON_EXCEPTION_PSERIES_OOL
    - SAUCE: powerpc/64s: move some exception handlers out of line
    - powerpc/64s: flush L1D on kernel entry
    - SAUCE: powerpc: Add a framework for user access tracking
    - powerpc: Implement user_access_begin and friends
    - powerpc: Fix __clear_user() with KUAP enabled
    - powerpc/uaccess: Evaluate macro arguments once, before user access is allowed
    - powerpc/64s: flush L1D after user accesses

linux (4.15.0-125.128) bionic; urgency=medium

  * bionic/linux: 4.15.0-125.128 -proposed tracker (LP: #1903137)

  * Update kernel packaging to support forward porting kernels (LP: #1902957)
    - [Debian] Update for leader included in BACKPORT_SUFFIX

  * Avoid double newline when running insertchanges (LP: #1903293)
    - [Packaging] insertchanges: avoid double newline

  * EFI: Fails when BootCurrent entry does not exist (LP: #183)
    - efivarfs: Replace invalid slashes with exclamation marks in dentries.

  * CVE-2020-14351
    - perf/core: Fix race in the perf_mmap_close() function

  * raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations (LP: #1896578)
    - md: add md_submit_discard_bio() for submitting discard bio
    - md/raid10: extend r10bio devs to raid disks
    - md/raid10: pull codes that wait for blocked dev into one function
    - md/raid10: improve raid10 discard request
    - md/raid10: improve discard request for far layout

  * Bionic: btrfs: kernel BUG at /build/linux-eTBZpZ/linux-4.15.0/fs/btrfs/ctree.c:3233! (LP: #1902254)
    - btrfs: use offset_in_page instead of open-coding it
    - btrfs: use BUG() instead of BUG_ON(1)
    - btrfs: drop unnecessary offset_in_page in extent buffer helpers
    - btrfs: extent_io: do extra check for extent buffer read write functions
    - btrfs: extent-tree: kill BUG_ON() in __btrfs_free_extent()
    - btrfs: extent-tree: kill the BUG_ON() in insert_inline_extent_backref()
    - btrfs: ctree: check key order before merging tree blocks

  * Bionic update: upstream stable patchset 2020-11-04 (LP: #1902943)
    - USB: gadget: f_ncm: Fix NDP16 datagram validation
    - gpio: tc35894: fix up tc35894 interrupt configuration
    - vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock
    - vsock/virtio: stop workers during the .remove()
    - vsock/virtio: add transport parameter to the virtio_transport_reset_no_sock()
    - net: virtio_vsock: Enhance connection semantics
    - Input: i8042 - add nopnp quirk for Acer Aspire 5 A515
    - ftrace: Move RCU is watching check after recursion check
    - drm/amdgpu: restore proper ref count in amdgpu_display_crtc_set_config
    - drivers/net/wan/hdlc_fr: Add needed_headroom for PVC devices
    - drm/sun4i: mixer: Extend regmap max_register
    - net: dec: de2104x: Increase receive ring size for Tulip
    - rndis_host: increase sleep time in the query-response loop
    - nvme-core: get/put ctrl and transport module in nvme_dev_open/release()
    - drivers/net/wan/lapbether: Make skb->protocol consistent with the header
    - drivers/net/wan/hdlc: Set skb->protocol before transmitting
    - mac80211: do not allow bigger VHT MPDUs than the hardware supports
    - spi: fsl-espi: Only process interrupts for expected events
    - nvme-fc: fail new connections to a deleted host or remote port
    - pinctrl: mvebu: Fix i2c sda definition for 98DX3236
    - nfs: Fix security label length not being reset
    - clk: samsung: exynos4: mark 'chipid' clock as CLK_IGNORE_UNUSED
    - iommu/exynos: add missing put_device() call in exynos_iommu_of_xlate()
    - i2c: cpm: Fix i2c_ram structure
    - Input: trackpoint - enable Synaptics trackpoints
    - random32: Restore __latent_entropy attribute on net_rand_state
    - epoll: do not insert into poll queues until all sanity checks are done
    - epoll: replace ->visited/visited_list with generation count
    - epoll: EPOLL_CTL_ADD: close the race in decision to take fast path
    - ep_create_wakeup_source(): dentry name can change under you...
    - netfilter: ctnetlink: add a range check for l3/l4 protonum
    - drm/syncobj: Fix drm_syncobj_handle_to_fd refcount leak
    - fbdev, newport_con: Move FONT_EXTRA_WORDS macros into linux/font.h
    - Fonts: Support FONT_EXTRA_WORDS macros for built-in fonts
    - Revert "ravb: Fixed to be able to unload modules"
    - fbcon: Fix global-out-of-bounds read in fbcon_get_font()
    - net: wireless: nl80211: fix out-of-bounds access in nl80211_del_key()
    - usermodehelper: reset umask to default before executing user process
    - platform/x86: thinkpad_acpi: initialize tp_nvram_state variable
    - platform/x86: thinkpad_acpi: re-initialize ACPI buffer size when reuse
    - driver
[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations
This bug was fixed in the package linux - 5.8.0-31.33

---
linux (5.8.0-31.33) groovy; urgency=medium

  * groovy/linux: 5.8.0-31.33 -proposed tracker (LP: #1905299)

  * Groovy 5.8 kernel hangs on boot on CPUs with eLLC (LP: #1903397)
    - drm/i915: Mark ininitial fb obj as WT on eLLC machines to avoid rcu lockup during fbdev init

  * CVE-2020-4788
    - selftests/powerpc: rfi_flush: disable entry flush if present
    - powerpc/64s: flush L1D on kernel entry
    - powerpc/64s: flush L1D after user accesses
    - selftests/powerpc: entry flush test

linux (5.8.0-30.32) groovy; urgency=medium

  * groovy/linux: 5.8.0-30.32 -proposed tracker (LP: #1903194)

  * Update kernel packaging to support forward porting kernels (LP: #1902957)
    - [Debian] Update for leader included in BACKPORT_SUFFIX

  * Avoid double newline when running insertchanges (LP: #1903293)
    - [Packaging] insertchanges: avoid double newline

  * EFI: Fails when BootCurrent entry does not exist (LP: #183)
    - efivarfs: Replace invalid slashes with exclamation marks in dentries.

  * raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations (LP: #1896578)
    - md: add md_submit_discard_bio() for submitting discard bio
    - md/raid10: extend r10bio devs to raid disks
    - md/raid10: pull codes that wait for blocked dev into one function
    - md/raid10: improve raid10 discard request
    - md/raid10: improve discard request for far layout
    - dm raid: fix discard limits for raid1 and raid10
    - dm raid: remove unnecessary discard limits for raid10

  * Bionic: btrfs: kernel BUG at /build/linux-eTBZpZ/linux-4.15.0/fs/btrfs/ctree.c:3233! (LP: #1902254)
    - btrfs: extent_io: do extra check for extent buffer read write functions
    - btrfs: extent-tree: kill BUG_ON() in __btrfs_free_extent()
    - btrfs: extent-tree: kill the BUG_ON() in insert_inline_extent_backref()
    - btrfs: ctree: check key order before merging tree blocks

  * Tiger Lake PMC core driver fixes (LP: #1899883)
    - platform/x86: intel_pmc_core: update TGL's LPM0 reg bit map name
    - platform/x86: intel_pmc_core: fix bound check in pmc_core_mphy_pg_show()
    - platform/x86: pmc_core: Use descriptive names for LPM registers
    - platform/x86: intel_pmc_core: Fix TigerLake power gating status map
    - platform/x86: intel_pmc_core: Fix the slp_s0 counter displayed value

  * drm/i915/dp_mst - System would hang during the boot up. (LP: #1902469)
    - Revert "UBUNTU: SAUCE: drm/i915/display: Fix null deref in intel_psr_atomic_check()"
    - drm/i915: Fix encoder lookup during PSR atomic check

  * Undetected Data corruption in MPI workloads that use VSX for reductions on POWER9 DD2.1 systems (LP: #1902694)
    - powerpc: Fix undetected data corruption with P9N DD2.1 VSX CI load emulation
    - selftests/powerpc: Make alignment handler test P9N DD2.1 vector CI load workaround

  * [20.04 FEAT] Support/enhancement of NVMe IPL (LP: #1902179)
    - s390/ipl: support NVMe IPL kernel parameters

  * uvcvideo: add mapping for HEVC payloads (LP: #1895803)
    - media: uvcvideo: Add mapping for HEVC payloads

  * risc-v 5.8 kernel oops on ftrace tests (LP: #1894613)
    - stop_machine, rcu: Mark functions as notrace

  * Groovy update: v5.8.17 upstream stable release (LP: #1902137)
    - cxgb4: handle 4-tuple PEDIT to NAT mode translation
    - ibmveth: Switch order of ibmveth_helper calls.
    - ibmveth: Identify ingress large send packets.
    - ipv4: Restore flowi4_oif update before call to xfrm_lookup_route
    - mlx4: handle non-napi callers to napi_poll
    - net: dsa: microchip: fix race condition
    - net: fec: Fix phy_device lookup for phy_reset_after_clk_enable()
    - net: fec: Fix PHY init after phy_reset_after_clk_enable()
    - net: fix pos incrementment in ipv6_route_seq_next
    - net: ipa: skip suspend/resume activities if not set up
    - net: mptcp: make DACK4/DACK8 usage consistent among all subflows
    - net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
    - net/smc: fix use-after-free of delayed events
    - net/smc: fix valid DMBE buffer sizes
    - net/tls: sendfile fails with ktls offload
    - net: usb: qmi_wwan: add Cellient MPL200 card
    - tipc: fix the skb_unshare() in tipc_buf_append()
    - socket: fix option SO_TIMESTAMPING_NEW
    - socket: don't clear SOCK_TSTAMP_NEW when SO_TIMESTAMPNS is disabled
    - can: m_can_platform: don't call m_can_class_suspend in runtime suspend
    - can: j1939: j1939_tp_tx_dat_new(): fix missing initialization of skbcnt
    - net: j1939: j1939_session_fresh_new(): fix missing initialization of skbcnt
    - net/ipv4: always honour route mtu during forwarding
    - net_sched: remove a redundant goto chain check
    - r8169: fix data corruption issue on RTL8402
    - binder: fix UAF when releasing todo list
    - ALSA: bebob: potential info leak in hwdep_read()
    - ALSA: hda/hdmi: fix incorrect
[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations
This bug was fixed in the package linux - 5.4.0-56.62

---
linux (5.4.0-56.62) focal; urgency=medium

  * focal/linux: 5.4.0-56.62 -proposed tracker (LP: #1905300)

  * CVE-2020-4788
    - selftests/powerpc: rfi_flush: disable entry flush if present
    - powerpc/64s: flush L1D on kernel entry
    - powerpc/64s: flush L1D after user accesses
    - selftests/powerpc: entry flush test

linux (5.4.0-55.61) focal; urgency=medium

  * focal/linux: 5.4.0-55.61 -proposed tracker (LP: #1903175)

  * Update kernel packaging to support forward porting kernels (LP: #1902957)
    - [Debian] Update for leader included in BACKPORT_SUFFIX

  * Avoid double newline when running insertchanges (LP: #1903293)
    - [Packaging] insertchanges: avoid double newline

  * EFI: Fails when BootCurrent entry does not exist (LP: #183)
    - efivarfs: Replace invalid slashes with exclamation marks in dentries.

  * CVE-2020-14351
    - perf/core: Fix race in the perf_mmap_close() function

  * raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations (LP: #1896578)
    - md: add md_submit_discard_bio() for submitting discard bio
    - md/raid10: extend r10bio devs to raid disks
    - md/raid10: pull codes that wait for blocked dev into one function
    - md/raid10: improve raid10 discard request
    - md/raid10: improve discard request for far layout
    - dm raid: fix discard limits for raid1 and raid10
    - dm raid: remove unnecessary discard limits for raid10

  * Bionic: btrfs: kernel BUG at /build/linux-eTBZpZ/linux-4.15.0/fs/btrfs/ctree.c:3233! (LP: #1902254)
    - btrfs: drop unnecessary offset_in_page in extent buffer helpers
    - btrfs: extent_io: do extra check for extent buffer read write functions
    - btrfs: extent-tree: kill BUG_ON() in __btrfs_free_extent()
    - btrfs: extent-tree: kill the BUG_ON() in insert_inline_extent_backref()
    - btrfs: ctree: check key order before merging tree blocks

  * Ethernet no link lights after reboot (Intel i225-v 2.5G) (LP: #1902578)
    - igc: Add PHY power management control

  * Undetected Data corruption in MPI workloads that use VSX for reductions on POWER9 DD2.1 systems (LP: #1902694)
    - powerpc: Fix undetected data corruption with P9N DD2.1 VSX CI load emulation
    - selftests/powerpc: Make alignment handler test P9N DD2.1 vector CI load workaround

  * [20.04 FEAT] Support/enhancement of NVMe IPL (LP: #1902179)
    - s390: nvme ipl
    - s390: nvme reipl
    - s390/ipl: support NVMe IPL kernel parameters

  * uvcvideo: add mapping for HEVC payloads (LP: #1895803)
    - media: uvcvideo: Add mapping for HEVC payloads

  * Focal update: v5.4.73 upstream stable release (LP: #1902115)
    - ibmveth: Switch order of ibmveth_helper calls.
    - ibmveth: Identify ingress large send packets.
    - ipv4: Restore flowi4_oif update before call to xfrm_lookup_route
    - mlx4: handle non-napi callers to napi_poll
    - net: fec: Fix phy_device lookup for phy_reset_after_clk_enable()
    - net: fec: Fix PHY init after phy_reset_after_clk_enable()
    - net: fix pos incrementment in ipv6_route_seq_next
    - net/smc: fix valid DMBE buffer sizes
    - net/tls: sendfile fails with ktls offload
    - net: usb: qmi_wwan: add Cellient MPL200 card
    - tipc: fix the skb_unshare() in tipc_buf_append()
    - socket: fix option SO_TIMESTAMPING_NEW
    - can: m_can_platform: don't call m_can_class_suspend in runtime suspend
    - can: j1939: j1939_tp_tx_dat_new(): fix missing initialization of skbcnt
    - net: j1939: j1939_session_fresh_new(): fix missing initialization of skbcnt
    - net/ipv4: always honour route mtu during forwarding
    - net_sched: remove a redundant goto chain check
    - r8169: fix data corruption issue on RTL8402
    - cxgb4: handle 4-tuple PEDIT to NAT mode translation
    - binder: fix UAF when releasing todo list
    - ALSA: bebob: potential info leak in hwdep_read()
    - ALSA: hda/hdmi: fix incorrect locking in hdmi_pcm_close
    - nvme-pci: disable the write zeros command for Intel 600P/P3100
    - chelsio/chtls: fix socket lock
    - chelsio/chtls: correct netdevice for vlan interface
    - chelsio/chtls: correct function return and return type
    - ibmvnic: save changed mac address to adapter->mac_addr
    - net: ftgmac100: Fix Aspeed ast2600 TX hang issue
    - net: hdlc: In hdlc_rcv, check to make sure dev is an HDLC device
    - net: hdlc_raw_eth: Clear the IFF_TX_SKB_SHARING flag after calling ether_setup
    - net: Properly typecast int values to set sk_max_pacing_rate
    - net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN tunnels
    - nexthop: Fix performance regression in nexthop deletion
    - nfc: Ensure presence of NFC_ATTR_FIRMWARE_NAME attribute in nfc_genl_fw_download()
    - r8169: fix operation under forced interrupt threading
    - selftests: forwarding: Add missing 'rp_filter' configuration
    - tcp: fix to update snd_wl1 in bulk receiver fast
[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations
Performing verification for Bionic. I enabled -proposed and installed 4.15.0-125-generic on an i3.8xlarge AWS instance. From there, I followed the testcase steps:

$ uname -rv
4.15.0-125-generic #128-Ubuntu SMP Mon Nov 9 20:51:00 UTC 2020

$ lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
xvda    202:0    0    8G  0 disk
└─xvda1 202:1    0    8G  0 part /
nvme0n1 259:0    0  1.7T  0 disk
nvme1n1 259:1    0  1.7T  0 disk
nvme2n1 259:2    0  1.7T  0 disk
nvme3n1 259:3    0  1.7T  0 disk

$ sudo mdadm --create --verbose /dev/md0 --level=10 --raid-devices=4 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
mdadm: layout defaults to n2
mdadm: layout defaults to n2
mdadm: chunk size defaults to 512K
mdadm: size set to 1855336448K
mdadm: automatically enabling write-intent bitmap on large array
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.

$ time sudo mkfs.xfs /dev/md0
meta-data=/dev/md0               isize=512    agcount=32, agsize=28989568 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
data     =                       bsize=4096   blocks=927666176, imaxpct=5
         =                       sunit=128    swidth=256 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=452968, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

real    0m3.615s
user    0m0.002s
sys     0m0.179s

$ sudo mkdir /mnt/disk
$ sudo mount /dev/md0 /mnt/disk
$ time sudo fstrim /mnt/disk

real    0m1.898s
user    0m0.002s
sys     0m0.015s

We can see that mkfs.xfs took 3.6 seconds, and fstrim only 2 seconds. This is a significant improvement over the current 11 minutes. I started up a c5.large instance, and attached 4x EBS drives, which do not support block discard, and went through the testcase steps. Everything worked fine, and the changes have not caused any regressions to disks which do not support block discard. I also started another i3.8xlarge instance and tested raid0, to check for regressions around the refactoring. raid0 deployed fine, and was as performant as usual.
The 4.15.0-125-generic kernel in -proposed fixes the issue, and I am happy to mark as verified.

** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic
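The timings quoted in these verification comments (e.g. real 0m3.615s against the 11-minute baseline) can be compared programmatically. A minimal sketch, assuming `time` output in the minutes/seconds format shown above; the to_seconds helper is an illustration, not part of the official test plan:

```shell
# Hypothetical helper: convert a `time` value like "0m3.615s" into seconds.
to_seconds() {
    local t="${1%s}"     # drop trailing "s"  -> "0m3.615"
    local m="${t%%m*}"   # minutes part       -> "0"
    local s="${t#*m}"    # seconds part       -> "3.615"
    awk -v m="$m" -v s="$s" 'BEGIN { printf "%.3f\n", m * 60 + s }'
}

# The patched mkfs.xfs run versus an 11-minute unpatched run:
to_seconds "0m3.615s"   # -> 3.615
to_seconds "11m0.000s"  # -> 660.000
```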
[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations
Performing verification for Focal. I enabled -proposed and installed 5.4.0-55-generic on an i3.8xlarge AWS instance. From there, I followed the testcase steps:

$ uname -rv
5.4.0-55-generic #61-Ubuntu SMP Mon Nov 9 20:49:56 UTC 2020

$ lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
xvda    202:0    0    8G  0 disk
└─xvda1 202:1    0    8G  0 part /
nvme0n1 259:0    0  1.7T  0 disk
nvme1n1 259:1    0  1.7T  0 disk
nvme3n1 259:2    0  1.7T  0 disk
nvme2n1 259:3    0  1.7T  0 disk

$ sudo mdadm --create --verbose /dev/md0 --level=10 --raid-devices=4 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
mdadm: layout defaults to n2
mdadm: layout defaults to n2
mdadm: chunk size defaults to 512K
mdadm: size set to 1855336448K
mdadm: automatically enabling write-intent bitmap on large array
mdadm: Fail create md0 when using /sys/module/md_mod/parameters/new_array
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.

$ time sudo mkfs.xfs /dev/md0
log stripe unit (524288 bytes) is too large (maximum is 256KiB)
log stripe unit adjusted to 32KiB
meta-data=/dev/md0               isize=512    agcount=32, agsize=28989568 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=927666176, imaxpct=5
         =                       sunit=128    swidth=256 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=452968, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

real    0m5.350s
user    0m0.022s
sys     0m0.179s

$ sudo mkdir /mnt/disk
$ sudo mount /dev/md0 /mnt/disk
$ time sudo fstrim /mnt/disk

real    0m2.944s
user    0m0.006s
sys     0m0.013s

We can see that mkfs.xfs took 5.3 seconds, and fstrim only 3 seconds. This is a significant improvement over the current 11 minutes. I started up a c5.large instance, and attached 4x EBS drives, which do not support block discard, and went through the testcase steps. Everything worked fine, and the changes have not caused any regressions to disks which do not support block discard.
I also started another i3.8xlarge instance and tested raid0, to check for regressions around the refactoring. raid0 deployed fine, and was as performant as usual. The 5.4.0-55-generic kernel in -proposed fixes the issue, and I am happy to mark as verified.

** Tags removed: verification-needed-focal
** Tags added: verification-done-focal
[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations
Performing verification for Groovy. I enabled -proposed and installed 5.8.0-30-generic on an i3.8xlarge AWS instance. From there, I followed the testcase steps:

$ uname -rv
5.8.0-30-generic #32-Ubuntu SMP Mon Nov 9 21:03:15 UTC 2020

$ lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
xvda    202:0    0    8G  0 disk
└─xvda1 202:1    0    8G  0 part /
nvme0n1 259:0    0  1.7T  0 disk
nvme1n1 259:1    0  1.7T  0 disk
nvme3n1 259:2    0  1.7T  0 disk
nvme2n1 259:3    0  1.7T  0 disk

$ sudo mdadm --create --verbose /dev/md0 --level=10 --raid-devices=4 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
mdadm: layout defaults to n2
mdadm: layout defaults to n2
mdadm: chunk size defaults to 512K
mdadm: size set to 1855336448K
mdadm: automatically enabling write-intent bitmap on large array
mdadm: Fail create md0 when using /sys/module/md_mod/parameters/new_array
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.

$ time sudo mkfs.xfs /dev/md0
log stripe unit (524288 bytes) is too large (maximum is 256KiB)
log stripe unit adjusted to 32KiB
meta-data=/dev/md0               isize=512    agcount=32, agsize=28989568 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=927666176, imaxpct=5
         =                       sunit=128    swidth=256 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=452968, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Discarding blocks...Done.

real    0m4.413s
user    0m0.022s
sys     0m0.245s

$ sudo mkdir /mnt/disk
$ sudo mount /dev/md0 /mnt/disk
$ time sudo fstrim /mnt/disk

real    0m1.973s
user    0m0.000s
sys     0m0.037s

We can see that mkfs.xfs took 4.4 seconds, and fstrim only 2 seconds. This is a significant improvement over the current 11 minutes. I started up a c5.large instance, and attached 4x EBS drives, which do not support block discard, and went through the testcase steps. Everything worked fine, and the changes have not caused any regressions to disks which do not support block discard.
I also started another i3.8xlarge instance and tested raid0, to check for regressions around the refactoring. raid0 deployed fine, and was as performant as usual.

The 5.8.0-30-generic kernel in -proposed fixes the issue, and I am happy to mark as verified.

** Tags removed: verification-needed-groovy
** Tags added: verification-done-groovy
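To put the verification timings in perspective, a quick back-of-the-envelope comparison (assuming the ~11 minutes quoted for the unpatched kernel and the 4.413 s measured in the verification above):

```python
# Approximate mkfs.xfs speedup on the patched kernel, using the figures
# from this verification (~11 minutes before the fix, 4.413 s after).
before_s = 11 * 60
after_s = 4.413
speedup = before_s / after_s
print(f"mkfs.xfs: {before_s} s -> {after_s} s (~{speedup:.0f}x faster)")
```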
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'. If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

** Tags added: verification-needed-bionic
** Tags added: verification-needed-focal
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-groovy' to 'verification-done-groovy'. If the problem still exists, change the tag 'verification-needed-groovy' to 'verification-failed-groovy'. If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'. If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

** Tags added: verification-needed-groovy
** Changed in: linux (Ubuntu Bionic)
       Status: In Progress => Fix Committed
** Changed in: linux (Ubuntu Focal)
       Status: In Progress => Fix Committed
** Changed in: linux (Ubuntu Groovy)
       Status: In Progress => Fix Committed
** Description changed:

BugLink: https://bugs.launchpad.net/bugs/1896578

[Impact]

Block discard is very slow on Raid10, which causes common use cases which invoke block discard, such as mkfs and fstrim operations, to take a very long time.

For example, on an i3.8xlarge instance on AWS, which has 4x 1.9TB NVMe devices which support block discard, a mkfs.xfs operation on Raid 10 takes between 8 and 11 minutes, where the same mkfs.xfs operation on Raid 0 takes 4 seconds. The bigger the devices, the longer it takes.

The cause is that Raid10 currently uses a 512k chunk size, and uses this for the discard_max_bytes value. If we need to discard 1.9TB, the kernel splits the request into millions of 512k bio requests, even if the underlying device supports larger requests.

For example, the NVMe devices on i3.8xlarge support 2.2TB of discard at once:

$ cat /sys/block/nvme0n1/queue/discard_max_bytes
2199023255040
$ cat /sys/block/nvme0n1/queue/discard_max_hw_bytes
2199023255040

Where the Raid10 md device only supports 512k:

$ cat /sys/block/md0/queue/discard_max_bytes
524288
$ cat /sys/block/md0/queue/discard_max_hw_bytes
524288

If we perform a mkfs.xfs operation on the /dev/md array, it takes over 11 minutes, and if we examine the stack, it is stuck in blkdev_issue_discard():

$ sudo cat /proc/1626/stack
[<0>] wait_barrier+0x14c/0x230 [raid10]
[<0>] regular_request_wait+0x39/0x150 [raid10]
[<0>] raid10_write_request+0x11e/0x850 [raid10]
[<0>] raid10_make_request+0xd7/0x150 [raid10]
[<0>] md_handle_request+0x123/0x1a0
[<0>] md_submit_bio+0xda/0x120
[<0>] __submit_bio_noacct+0xde/0x320
[<0>] submit_bio_noacct+0x4d/0x90
[<0>] submit_bio+0x4f/0x1b0
[<0>] __blkdev_issue_discard+0x154/0x290
[<0>] blkdev_issue_discard+0x5d/0xc0
[<0>] blk_ioctl_discard+0xc4/0x110
[<0>] blkdev_common_ioctl+0x56c/0x840
[<0>] blkdev_ioctl+0xeb/0x270
[<0>] block_ioctl+0x3d/0x50
[<0>] __x64_sys_ioctl+0x91/0xc0
[<0>] do_syscall_64+0x38/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[Fix]

Xiao Ni has
developed a patchset which resolves the block discard performance problems. These commits have now landed in 5.10-rc1. commit 2628089b74d5a64bd0bcb5d247a18f78d7b6f4d0 Author: Xiao Ni Date: Tue Aug 25 13:42:59 2020 +0800 Subject: md: add md_submit_discard_bio() for submitting discard bio Link: https://github.com/torvalds/linux/commit/2628089b74d5a64bd0bcb5d247a18f78d7b6f4d0 commit 8650a889017cb1f6ea6813ccf83a2e9f6fa49dd3 Author: Xiao Ni Date: Tue Aug 25 13:43:00 2020 +0800 Subject: md/raid10: extend r10bio devs to raid disks Link: https://github.com/torvalds/linux/commit/8650a889017cb1f6ea6813ccf83a2e9f6fa49dd3 commit f046f5d0d79cdb968f219ce249e497fd1accf484 Author: Xiao Ni Date: Tue Aug 25 13:43:01 2020 +0800 Subject: md/raid10: pull codes that wait for blocked dev into one function Link: https://github.com/torvalds/linux/commit/f046f5d0d79cdb968f219ce249e497fd1accf484 commit bcc90d280465ebd51ab8688be86e1f00c62dccf9 Author: Xiao Ni Date: Wed Sep 2 20:00:22 2020 +0800 Subject: md/raid10: improve raid10 discard request Link: https://github.com/torvalds/linux/commit/bcc90d280465ebd51ab8688be86e1f00c62dccf9 commit d3ee2d8415a6256c1c41e1be36e80e640c3e6359 Author: Xiao Ni Date: Wed Sep 2 20:00:23 2020 +0800 Subject: md/raid10: improve discard request for far layout Link: https://github.com/torvalds/linux/commit/d3ee2d8415a6256c1c41e1be36e80e640c3e6359 There is also an additional commit which is required, and was merged after "md/raid10: improve raid10 discard request" was merged. The following commits enable Radid10 to use large discards, instead of splitting into many bios, since the technical hurdles have now been removed. 
commit e0910c8e4f87bb9f767e61a778b0d9271c4dc512 Author: Mike Snitzer Date: Thu Sep 24 13:14:52 2020 -0400 Subject: dm raid: fix discard limits for raid1 and raid10 Link: https://github.com/torvalds/linux/commit/e0910c8e4f87bb9f767e61a778b0d9271c4dc512 commit f0e90b6c663a7e3b4736cb318c6c7c589f152c28 Author: Mike Snitzer Date: Thu Sep 24 16:40:12 2020 -0400 Subject: dm raid: remove unnecessary discard limits for raid10 Link: https://github.com/torvalds/linux/commit/f0e90b6c663a7e3b4736cb318c6c7c589f152c28 All the commits mentioned follow a similar strategy which was implemented in Raid0 in the below commit, which was merged in 4.12-rc2, which fixed block discard performance issues in Raid0: commit 29efc390b9462582ae95eb9a0b8cd17ab956afc0 Author: Shaohua Li Date: Sun May 7 17:36:24 2017 -0700 Subject: md/md0: optimize raid0 discard handling Link: https://github.com/torvalds/linux/commit/29efc390b9462582ae95eb9a0b8cd17ab956afc0 The commits more or less cherry pick to the 5.8, 5.4 and 4.15 kernels, with the following minor fixups: 1)
[Testcase]

You will need a machine with at least 4x NVMe drives which support block discard. I use an i3.8xlarge instance on AWS, since it has all of these things.

$ lsblk
xvda    202:0   0    8G  0 disk
└─xvda1 202:1   0    8G  0 part /
nvme0n1 259:2   0  1.7T  0 disk
nvme1n1 259:0   0  1.7T  0 disk
nvme2n1 259:1   0  1.7T  0 disk
nvme3n1 259:3   0  1.7T  0 disk

Create a Raid10 array:

$ sudo mdadm --create --verbose /dev/md0 --level=10 --raid-devices=4 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1

Format the array with XFS:

$ time sudo mkfs.xfs /dev/md0
real 11m14.734s

$ sudo mkdir /mnt/disk
$ sudo mount /dev/md0
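Once a fixed kernel is installed, a quick sanity check is whether the md device advertises a discard limit larger than the old 512k chunk. The helper below is my own sketch, not part of the original testcase; the exact post-fix value depends on the array geometry, and the /sys/block/md0 path assumes the array created above:

```shell
#!/bin/sh
# Illustrative helper (not from the original report): classify a
# discard_max_bytes value against the old 512k Raid10 per-bio limit.
check_discard_limit() {
    if [ "$1" -gt 524288 ]; then
        echo "patched"       # large discards pass through in few bios
    else
        echo "unpatched"     # discards still split into 512k bios
    fi
}

# On a real system, feed in the sysfs value for the array:
# check_discard_limit "$(cat /sys/block/md0/queue/discard_max_bytes)"
check_discard_limit 524288   # prints "unpatched"
```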