[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-06-22 Thread Matthew Ruffell
Hi Evan,

The SRU cycle has completed, and all kernels containing the Raid10 block
discard performance patches have now been released to -updates.

Note that the versions are different from the kernels in -proposed: the
kernel team needed to do a last-minute respin to fix two sets of CVEs,
one for Broadcom wifi chipsets and the other for bpf, so the kernels
were released a day later than usual.

The released kernels are:

Hirsute: 5.11.0-22-generic
Groovy:  5.8.0-59-generic
Focal:   5.4.0-77-generic
Bionic:  4.15.0-147-generic

The HWE equivalents have also been released to -updates.

You may now install these kernels to your systems and enjoy fast block
discard for your Raid10 arrays.
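
For reference, picking these up is just a normal package upgrade and
reboot, then confirming the running version (a sketch; on Focal, for
example, uname should report the version listed above):

$ sudo apt update && sudo apt upgrade
$ sudo reboot
$ uname -r
5.4.0-77-generic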

Thanks,
Matthew

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1896578

Title:
  raid10: Block discard is very slow, causing severe delays for mkfs and
  fstrim operations

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  Fix Released
Status in linux source package in Focal:
  Fix Released
Status in linux source package in Groovy:
  Fix Released
Status in linux source package in Hirsute:
  Fix Released

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/1896578

  [Impact]

  Block discard is very slow on Raid10, which causes common use cases
  that invoke block discard, such as mkfs and fstrim operations, to
  take a very long time.

  For example, on an i3.8xlarge instance on AWS, which has 4x 1.9TB NVMe
  devices that support block discard, a mkfs.xfs operation on Raid 10
  takes between 8 and 11 minutes, whereas the same mkfs.xfs operation
  on Raid 0 takes 4 seconds.

  The bigger the devices, the longer it takes.

  The cause is that Raid10 currently uses a 512k chunk size, and uses
  this for the discard_max_bytes value. If we need to discard 1.9TB, the
  kernel splits the request into millions of 512k bio requests, even if
  the underlying device supports larger requests.

  For example, the NVMe devices on i3.8xlarge support 2.2TB of discard
  at once:

  $ cat /sys/block/nvme0n1/queue/discard_max_bytes
  2199023255040
  $ cat /sys/block/nvme0n1/queue/discard_max_hw_bytes
  2199023255040

  By contrast, the Raid10 md device only supports 512k:

  $ cat /sys/block/md0/queue/discard_max_bytes
  524288
  $ cat /sys/block/md0/queue/discard_max_hw_bytes
  524288
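
  As a rough back-of-the-envelope illustration (not a figure from the
  original report), discarding 1.9TB in 512k chunks works out to

  $ echo $((1900000000000 / 524288))
  3623962

  bio requests, i.e. the "millions of 512k bio requests" described
  above.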

  If we perform a mkfs.xfs operation on the /dev/md array, it takes over
  11 minutes, and if we examine the stack, we can see it is stuck in
  blkdev_issue_discard():

  $ sudo cat /proc/1626/stack
  [<0>] wait_barrier+0x14c/0x230 [raid10]
  [<0>] regular_request_wait+0x39/0x150 [raid10]
  [<0>] raid10_write_request+0x11e/0x850 [raid10]
  [<0>] raid10_make_request+0xd7/0x150 [raid10]
  [<0>] md_handle_request+0x123/0x1a0
  [<0>] md_submit_bio+0xda/0x120
  [<0>] __submit_bio_noacct+0xde/0x320
  [<0>] submit_bio_noacct+0x4d/0x90
  [<0>] submit_bio+0x4f/0x1b0
  [<0>] __blkdev_issue_discard+0x154/0x290
  [<0>] blkdev_issue_discard+0x5d/0xc0
  [<0>] blk_ioctl_discard+0xc4/0x110
  [<0>] blkdev_common_ioctl+0x56c/0x840
  [<0>] blkdev_ioctl+0xeb/0x270
  [<0>] block_ioctl+0x3d/0x50
  [<0>] __x64_sys_ioctl+0x91/0xc0
  [<0>] do_syscall_64+0x38/0x90
  [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

  [Fix]

  Xiao Ni has developed a patchset which resolves the block discard
  performance problems. These commits have now landed in 5.13-rc1.

  commit cf78408f937a67f59f5e90ee8e6cadeed7c128a8
  Author: Xiao Ni 
  Date:   Thu Feb 4 15:50:43 2021 +0800
  Subject: md: add md_submit_discard_bio() for submitting discard bio
  Link: https://github.com/torvalds/linux/commit/cf78408f937a67f59f5e90ee8e6cadeed7c128a8

  commit c2968285925adb97b9aa4ede94c1f1ab61ce0925
  Author: Xiao Ni 
  Date:   Thu Feb 4 15:50:44 2021 +0800
  Subject: md/raid10: extend r10bio devs to raid disks
  Link: https://github.com/torvalds/linux/commit/c2968285925adb97b9aa4ede94c1f1ab61ce0925

  commit f2e7e269a7525317752d472bb48a549780e87d22
  Author: Xiao Ni 
  Date:   Thu Feb 4 15:50:45 2021 +0800
  Subject: md/raid10: pull the code that wait for blocked dev into one function
  Link: https://github.com/torvalds/linux/commit/f2e7e269a7525317752d472bb48a549780e87d22

  commit d30588b2731fb01e1616cf16c3fe79a1443e29aa
  Author: Xiao Ni 
  Date:   Thu Feb 4 15:50:46 2021 +0800
  Subject: md/raid10: improve raid10 discard request
  Link: https://github.com/torvalds/linux/commit/d30588b2731fb01e1616cf16c3fe79a1443e29aa

  commit 254c271da0712ea8914f187588e0f81f7678ee2f
  Author: Xiao Ni 
  Date:   Thu Feb 4 15:50:47 2021 +0800
  Subject: md/raid10: improve discard request for far layout
  Link: https://github.com/torvalds/linux/commit/254c271da0712ea8914f187588e0f81f7678ee2f

  There is also an additional commit which is required, and was merged
  after "md/raid10: improve raid10 discard request" was merged. The
  following commit enables Raid10 to use large discards, instead of
  splitting into many bios, since the technical hurdles have now been
  removed.
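
  (The changelog entries below list this as "dm raid: remove
  unnecessary discard limits for raid0 and raid10".)

  After installing a kernel with these patches, one quick sanity check
  is to re-read the md device's advertised discard limit, which should
  now track the underlying devices rather than the 512k chunk size (a
  hedged check based on the sysfs values quoted above; the exact value
  depends on the array):

  $ cat /sys/block/md0/queue/discard_max_bytes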

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-06-22 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 4.15.0-147.151

---
linux (4.15.0-147.151) bionic; urgency=medium

  * CVE-2021-3444
- bpf: Fix truncation handling for mod32 dst reg wrt zero

  * CVE-2021-3600
- SAUCE: bpf: Do not use ax register in interpreter on div/mod
- bpf: fix subprog verifier bypass by div/mod by 0 exception
- SAUCE: bpf: Fix 32-bit register truncation on div/mod instruction

linux (4.15.0-146.150) bionic; urgency=medium

  * UAF on CAN BCM bcm_rx_handler (LP: #1931855)
- SAUCE: can: bcm: delay release of struct bcm_op after synchronize_rcu

linux (4.15.0-145.149) bionic; urgency=medium

  * bionic/linux: 4.15.0-145.149 -proposed tracker (LP: #1929967)

  * Packaging resync (LP: #1786013)
- update dkms package versions

  * raid10: Block discard is very slow, causing severe delays for mkfs and
fstrim operations (LP: #1896578)
- md: add md_submit_discard_bio() for submitting discard bio
- md/raid10: extend r10bio devs to raid disks
- md/raid10: pull the code that wait for blocked dev into one function
- md/raid10: improve raid10 discard request
- md/raid10: improve discard request for far layout

  * CVE-2021-23133
- sctp: delay auto_asconf init until binding the first addr

  * Bionic update: upstream stable patchset 2021-05-25 (LP: #1929603)
- Input: nspire-keypad - enable interrupts only when opened
- dmaengine: dw: Make it dependent to HAS_IOMEM
- ARM: dts: Fix moving mmc devices with aliases for omap4 & 5
- arc: kernel: Return -EFAULT if copy_to_user() fails
- neighbour: Disregard DEAD dst in neigh_update
- ARM: keystone: fix integer overflow warning
- ASoC: fsl_esai: Fix TDM slot setup for I2S mode
- scsi: scsi_transport_srp: Don't block target in SRP_PORT_LOST state
- net: ieee802154: stop dump llsec keys for monitors
- net: ieee802154: stop dump llsec devs for monitors
- net: ieee802154: forbid monitor for add llsec dev
- net: ieee802154: stop dump llsec devkeys for monitors
- net: ieee802154: forbid monitor for add llsec devkey
- net: ieee802154: stop dump llsec seclevels for monitors
- net: ieee802154: forbid monitor for add llsec seclevel
- pcnet32: Use pci_resource_len to validate PCI resource
- mac80211: clear sta->fast_rx when STA removed from 4-addr VLAN
- Input: i8042 - fix Pegatron C15B ID entry
- HID: wacom: set EV_KEY and EV_ABS only for non-HID_GENERIC type of devices
- readdir: make sure to verify directory entry for legacy interfaces too
- arm64: fix inline asm in load_unaligned_zeropad()
- arm64: alternatives: Move length validation in alternative_{insn, endif}
- scsi: libsas: Reset num_scatter if libata marks qc as NODATA
- netfilter: conntrack: do not print icmpv6 as unknown via /proc
- netfilter: nft_limit: avoid possible divide error in nft_limit_init
- net: davicom: Fix regulator not turned off on failed probe
- net: sit: Unregister catch-all devices
- i40e: fix the panic when running bpf in xdpdrv mode
- ibmvnic: avoid calling napi_disable() twice
- ibmvnic: remove duplicate napi_schedule call in do_reset function
- ibmvnic: remove duplicate napi_schedule call in open function
- ARM: footbridge: fix PCI interrupt mapping
- ARM: 9071/1: uprobes: Don't hook on thumb instructions
- pinctrl: lewisburg: Update number of pins in community
- HID: wacom: Assign boolean values to a bool variable
- ARM: dts: Fix swapped mmc order for omap3
- net: geneve: check skb is large enough for IPv4/IPv6 header
- s390/entry: save the caller of psw_idle
- xen-netback: Check for hotplug-status existence before watching
- cavium/liquidio: Fix duplicate argument
- ia64: fix discontig.c section mismatches
- ia64: tools: remove duplicate definition of ia64_mf() on ia64
- x86/crash: Fix crash_setup_memmap_entries() out-of-bounds access
- net: hso: fix NULL-deref on disconnect regression
- USB: CDC-ACM: fix poison/unpoison imbalance
- lockdep: Add a missing initialization hint to the "INFO: Trying to
  register non-static key" message
- drm/msm: Fix a5xx/a6xx timestamps
- Input: s6sy761 - fix coordinate read bit shift
- net: ip6_tunnel: Unregister catch-all devices
- ACPI: tables: x86: Reserve memory occupied by ACPI tables
- ACPI: x86: Call acpi_boot_table_init() after acpi_table_upgrade()
- net: usb: ax88179_178a: initialize local variables before use
- iwlwifi: Fix softirq/hardirq disabling in iwl_pcie_enqueue_hcmd()
- mips: Do not include hi and lo in clobber list for R6
- bpf: Fix masking negation logic upon negative dst register
- iwlwifi: Fix softirq/hardirq disabling in iwl_pcie_gen2_enqueue_hcmd()
- ALSA: usb-audio: Add MIDI quirk for Vox ToneLab EX
- USB: Add reset-resume quirk for WD19's Realtek Hub
- platform/x86: thinkpad_acpi: Correct thermal sensor allocation

  * r8152 tx 

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-06-22 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 5.8.0-59.66

---
linux (5.8.0-59.66) groovy; urgency=medium

  * UAF on CAN J1939 j1939_can_recv (LP: #1932209)
- SAUCE: can: j1939: delay release of j1939_priv after synchronize_rcu

  * UAF on CAN BCM bcm_rx_handler (LP: #1931855)
- SAUCE: can: bcm: delay release of struct bcm_op after synchronize_rcu

linux (5.8.0-57.64) groovy; urgency=medium

  * groovy/linux: 5.8.0-57.64 -proposed tracker (LP: #1932047)

  * pmtu.sh from selftests.net in linux ADT test failure with linux/5.8.0-56.63
(LP: #1931731)
- net: geneve: modify IP header check in geneve6_xmit_skb and
  geneve_xmit_skb

linux (5.8.0-56.63) groovy; urgency=medium

  * groovy/linux: 5.8.0-56.63 -proposed tracker (LP: #1930052)

  * Packaging resync (LP: #1786013)
- update dkms package versions

  * scsi: storvsc: Parameterize number hardware queues (LP: #1930626)
- scsi: storvsc: Parameterize number hardware queues

  * CVE-2021-33200
- bpf: Wrap aux data inside bpf_sanitize_info container
- bpf: Fix mask direction swap upon off reg sign change
- bpf: No need to simulate speculative domain for immediates

  * CVE-2021-3490
- SAUCE: Revert "UBUNTU: SAUCE: bpf: verifier: fix ALU32 bounds tracking with
  bitwise ops"
- bpf: Fix alu32 const subreg bound tracking on bitwise operations

  * CVE-2021-3489
- SAUCE: Revert "UBUNTU: SAUCE: bpf: prevent writable memory-mapping of
  read-only ringbuf pages"
- bpf: Prevent writable memory-mapping of read-only ringbuf pages

  * Realtek USB hubs in Dell WD19SC/DC/TB fail to work after exiting s2idle
(LP: #1928242)
- USB: Verify the port status when timeout happens during port suspend

  * CVE-2020-26145
- ath10k: drop fragments with multicast DA for SDIO
- ath10k: add CCMP PN replay protection for fragmented frames for PCIe
- ath10k: drop fragments with multicast DA for PCIe

  * CVE-2020-26141
- ath10k: Fix TKIP Michael MIC verification for PCIe

  * CVE-2020-24587
- ath11k: Clear the fragment cache during key install

  * CVE-2020-24588
- mac80211: properly handle A-MSDUs that start with an RFC 1042 header
- cfg80211: mitigate A-MSDU aggregation attacks
- mac80211: drop A-MSDUs on old ciphers
- ath10k: drop MPDU which has discard flag set by firmware for SDIO

  * CVE-2020-26139
- mac80211: do not accept/forward invalid EAPOL frames

  * CVE-2020-24586 // CVE-2020-24587 // CVE-2020-24587 for such cases.
- mac80211: extend protection against mixed key and fragment cache attacks

  * CVE-2020-24586 // CVE-2020-24587
- mac80211: prevent mixed key and fragment cache attacks
- mac80211: add fragment cache to sta_info
- mac80211: check defrag PN against current frame
- mac80211: prevent attacks on TKIP/WEP as well

  * CVE-2020-26147
- mac80211: assure all fragments are encrypted

  * raid10: Block discard is very slow, causing severe delays for mkfs and
fstrim operations (LP: #1896578)
- md: add md_submit_discard_bio() for submitting discard bio
- md/raid10: extend r10bio devs to raid disks
- md/raid10: pull the code that wait for blocked dev into one function
- md/raid10: improve raid10 discard request
- md/raid10: improve discard request for far layout
- dm raid: remove unnecessary discard limits for raid0 and raid10

  * [SRU] mpt3sas: only one vSES is handy even IOC has multi vSES (LP: #1926517)
- scsi: mpt3sas: Only one vSES is present even when IOC has multi vSES

  * CVE-2021-23133
- sctp: delay auto_asconf init until binding the first addr

  * kvm: properly tear down PV features on hibernate (LP: #1920944)
- x86/kvm: Fix pr_info() for async PF setup/teardown
- x86/kvm: Teardown PV features on boot CPU as well
- x86/kvm: Disable kvmclock on all CPUs on shutdown
- x86/kvm: Disable all PV features on crash
- x86/kvm: Unify kvm_pv_guest_cpu_reboot() with kvm_guest_cpu_offline()

  * CVE-2021-31440
- bpf: Fix propagation of 32 bit unsigned bounds from 64 bit bounds

  * Can't detect intel wifi 6235 (LP: #1920180)
- SAUCE: iwlwifi: add new pci id for 6235

  * [SRU] Patch for flicker and glitching on common LCD display panels, intel
framebuffer (LP: #1925685)
- drm/i915: Try to use fast+narrow link on eDP again and fall back to the
  old max strategy on failure
- drm/i915/dp: Use slow and wide link training for everything

  * pmtu.sh from net in ubuntu_kernel_selftests failed with no error message
(LP: #1887661)
- selftests: pmtu.sh: use $ksft_skip for skipped return code

  * IR Remote Keys Repeat Many Times Starting with Kernel 5.8.0-49
(LP: #1926030)
- SAUCE: Revert "media: rc: ite-cir: fix min_timeout calculation"
- SAUCE: Revert "media: rc: fix timeout handling after switch to microsecond
  durations"

  * Groovy update: upstream stable patchset 2021-05-20 (LP: #1929132)
- Input: nspire-keypad - enable interrupts only when opened

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-06-22 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 5.11.0-22.23

---
linux (5.11.0-22.23) hirsute; urgency=medium

  * UAF on CAN J1939 j1939_can_recv (LP: #1932209)
- SAUCE: can: j1939: delay release of j1939_priv after synchronize_rcu

  * UAF on CAN BCM bcm_rx_handler (LP: #1931855)
- SAUCE: can: bcm: delay release of struct bcm_op after synchronize_rcu

linux (5.11.0-20.21) hirsute; urgency=medium

  * hirsute/linux: 5.11.0-20.21 -proposed tracker (LP: #1930854)

  * ath11k WIFI not working in proposed kernel 5.11.0-19-generic (LP: #1930637)
- bus: mhi: core: Download AMSS image from appropriate function

linux (5.11.0-19.20) hirsute; urgency=medium

  * hirsute/linux: 5.11.0-19.20 -proposed tracker (LP: #1930075)

  * Packaging resync (LP: #1786013)
- update dkms package versions

  * CVE-2021-33200
- bpf: Wrap aux data inside bpf_sanitize_info container
- bpf: Fix mask direction swap upon off reg sign change
- bpf: No need to simulate speculative domain for immediates

  * AX201 BT will cause system could not enter S0i3 (LP: #1928047)
- SAUCE: drm/i915: Tweaked Wa_14010685332 for all PCHs

  * CVE-2021-3490
- SAUCE: Revert "UBUNTU: SAUCE: bpf: verifier: fix ALU32 bounds tracking with
  bitwise ops"
- bpf: Fix alu32 const subreg bound tracking on bitwise operations

  * CVE-2021-3489
- SAUCE: Revert "UBUNTU: SAUCE: bpf: prevent writable memory-mapping of
  read-only ringbuf pages"
- bpf: Prevent writable memory-mapping of read-only ringbuf pages

  * Select correct boot VGA when BIOS doesn't do it properly (LP: #1929217)
- vgaarb: Use ACPI HID name to find integrated GPU

  * Realtek USB hubs in Dell WD19SC/DC/TB fail to work after exiting s2idle
(LP: #1928242)
- USB: Verify the port status when timeout happens during port suspend

  * CVE-2020-26145
- ath10k: drop fragments with multicast DA for SDIO
- ath10k: add CCMP PN replay protection for fragmented frames for PCIe
- ath10k: drop fragments with multicast DA for PCIe

  * CVE-2020-26141
- ath10k: Fix TKIP Michael MIC verification for PCIe

  * CVE-2020-24587
- ath11k: Clear the fragment cache during key install

  * CVE-2020-24588
- mac80211: properly handle A-MSDUs that start with an RFC 1042 header
- cfg80211: mitigate A-MSDU aggregation attacks
- mac80211: drop A-MSDUs on old ciphers
- ath10k: drop MPDU which has discard flag set by firmware for SDIO

  * CVE-2020-26139
- mac80211: do not accept/forward invalid EAPOL frames

  * CVE-2020-24586 // CVE-2020-24587 // CVE-2020-24587 for such cases.
- mac80211: extend protection against mixed key and fragment cache attacks

  * CVE-2020-24586 // CVE-2020-24587
- mac80211: prevent mixed key and fragment cache attacks
- mac80211: add fragment cache to sta_info
- mac80211: check defrag PN against current frame
- mac80211: prevent attacks on TKIP/WEP as well

  * CVE-2020-26147
- mac80211: assure all fragments are encrypted

  * raid10: Block discard is very slow, causing severe delays for mkfs and
fstrim operations (LP: #1896578)
- md: add md_submit_discard_bio() for submitting discard bio
- md/raid10: extend r10bio devs to raid disks
- md/raid10: pull the code that wait for blocked dev into one function
- md/raid10: improve raid10 discard request
- md/raid10: improve discard request for far layout
- dm raid: remove unnecessary discard limits for raid0 and raid10

  * [SRU][OEM-5.10/H] Fix typec output on AMD Cezanne GPU (LP: #1929646)
- drm/amd/display: use max lb for latency hiding

  * kvm: properly tear down PV features on hibernate (LP: #1920944)
- x86/kvm: Fix pr_info() for async PF setup/teardown
- x86/kvm: Teardown PV features on boot CPU as well
- x86/kvm: Disable kvmclock on all CPUs on shutdown
- x86/kvm: Disable all PV features on crash
- x86/kvm: Unify kvm_pv_guest_cpu_reboot() with kvm_guest_cpu_offline()

  * Add support for AMD wireless button (LP: #1928820)
- platform/x86: hp-wireless: add AMD's hardware id to the supported list

  * Can't detect intel wifi 6235 (LP: #1920180)
- SAUCE: iwlwifi: add new pci id for 6235

  * Speed up resume time on HP laptops (LP: #1929048)
- platform/x86: hp_accel: Avoid invoking _INI to speed up resume

  * Fix kernel panic on Intel Bluetooth (LP: #1928838)
- Bluetooth: Shutdown controller after workqueues are flushed or cancelled

  * build module CONFIG_SND_SOC_INTEL_SOUNDWIRE_SOF_MACH=m for 5.11,  5.13-rc2
and later (LP: #1921632)
- [Config] enable soundwire audio mach driver

  * [SRU] Patch for flicker and glitching on common LCD display panels, intel
framebuffer (LP: #1925685)
- drm/i915: Try to use fast+narrow link on eDP again and fall back to the
  old max strategy on failure
- drm/i915/dp: Use slow and wide link training for everything

  * Fix screen flickering when two 4K 60Hz monitors are connected to AMD Oland
    GFX (LP: #1928361)

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-06-22 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 5.4.0-77.86

---
linux (5.4.0-77.86) focal; urgency=medium

  * UAF on CAN J1939 j1939_can_recv (LP: #1932209)
- SAUCE: can: j1939: delay release of j1939_priv after synchronize_rcu

  * UAF on CAN BCM bcm_rx_handler (LP: #1931855)
- SAUCE: can: bcm: delay release of struct bcm_op after synchronize_rcu

linux (5.4.0-76.85) focal; urgency=medium

  * focal/linux: 5.4.0-76.85 -proposed tracker (LP: #1932123)

  * Upstream v5.9 introduced 'module' patches that removed exported symbols
(LP: #1932065)
- SAUCE: Revert "modules: inherit TAINT_PROPRIETARY_MODULE"
- SAUCE: Revert "modules: return licensing information from find_symbol"
- SAUCE: Revert "modules: rename the licence field in struct symsearch to
  license"
- SAUCE: Revert "modules: unexport __module_address"
- SAUCE: Revert "modules: unexport __module_text_address"
- SAUCE: Revert "modules: mark each_symbol_section static"
- SAUCE: Revert "modules: mark find_symbol static"
- SAUCE: Revert "modules: mark ref_module static"

linux (5.4.0-75.84) focal; urgency=medium

  * focal/linux: 5.4.0-75.84 -proposed tracker (LP: #1930032)

  * Packaging resync (LP: #1786013)
- update dkms package versions

  * CVE-2021-33200
- bpf: Wrap aux data inside bpf_sanitize_info container
- bpf: Fix mask direction swap upon off reg sign change
- bpf: No need to simulate speculative domain for immediates

  * Realtek USB hubs in Dell WD19SC/DC/TB fail to work after exiting s2idle
(LP: #1928242)
- USB: Verify the port status when timeout happens during port suspend

  * CVE-2020-26145
- ath10k: drop fragments with multicast DA for SDIO
- ath10k: add CCMP PN replay protection for fragmented frames for PCIe
- ath10k: drop fragments with multicast DA for PCIe

  * CVE-2020-26141
- ath10k: Fix TKIP Michael MIC verification for PCIe

  * CVE-2020-24588
- mac80211: properly handle A-MSDUs that start with an RFC 1042 header
- cfg80211: mitigate A-MSDU aggregation attacks
- mac80211: drop A-MSDUs on old ciphers
- ath10k: drop MPDU which has discard flag set by firmware for SDIO

  * CVE-2020-26139
- mac80211: do not accept/forward invalid EAPOL frames

  * CVE-2020-24586 // CVE-2020-24587 // CVE-2020-24587 for such cases.
- mac80211: extend protection against mixed key and fragment cache attacks

  * CVE-2020-24586 // CVE-2020-24587
- mac80211: prevent mixed key and fragment cache attacks
- mac80211: add fragment cache to sta_info
- mac80211: check defrag PN against current frame
- mac80211: prevent attacks on TKIP/WEP as well

  * CVE-2020-26147
- mac80211: assure all fragments are encrypted

  * raid10: Block discard is very slow, causing severe delays for mkfs and
fstrim operations (LP: #1896578)
- md: add md_submit_discard_bio() for submitting discard bio
- md/raid10: extend r10bio devs to raid disks
- md/raid10: pull the code that wait for blocked dev into one function
- md/raid10: improve raid10 discard request
- md/raid10: improve discard request for far layout
- dm raid: remove unnecessary discard limits for raid0 and raid10

  * [SRU] mpt3sas: only one vSES is handy even IOC has multi vSES (LP: #1926517)
- scsi: mpt3sas: Only one vSES is present even when IOC has multi vSES

  * kvm: properly tear down PV features on hibernate (LP: #1920944)
- x86/kvm: Fix pr_info() for async PF setup/teardown
- x86/kvm: Teardown PV features on boot CPU as well
- x86/kvm: Disable kvmclock on all CPUs on shutdown
- x86/kvm: Disable all PV features on crash
- x86/kvm: Unify kvm_pv_guest_cpu_reboot() with kvm_guest_cpu_offline()

  * Focal update: v5.4.119 upstream stable release (LP: #1929615)
- Bluetooth: verify AMP hci_chan before amp_destroy
- hsr: use netdev_err() instead of WARN_ONCE()
- bluetooth: eliminate the potential race condition when removing the HCI
  controller
- net/nfc: fix use-after-free llcp_sock_bind/connect
- Revert "USB: cdc-acm: fix rounding error in TIOCSSERIAL"
- tty: moxa: fix TIOCSSERIAL jiffies conversions
- tty: amiserial: fix TIOCSSERIAL permission check
- USB: serial: usb_wwan: fix TIOCSSERIAL jiffies conversions
- staging: greybus: uart: fix TIOCSSERIAL jiffies conversions
- USB: serial: ti_usb_3410_5052: fix TIOCSSERIAL permission check
- staging: fwserial: fix TIOCSSERIAL jiffies conversions
- tty: moxa: fix TIOCSSERIAL permission check
- staging: fwserial: fix TIOCSSERIAL permission check
- usb: typec: tcpm: Address incorrect values of tcpm psy for fixed supply
- usb: typec: tcpm: Address incorrect values of tcpm psy for pps supply
- usb: typec: tcpm: update power supply once partner accepts
- usb: xhci-mtk: remove or operator for setting schedule parameters
- usb: xhci-mtk: improve bandwidth scheduling with TT
- ASoC: samsung: tm2_wm5110: check 

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-06-21 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 5.11.0-20.21+21.10.1

---
linux (5.11.0-20.21+21.10.1) impish; urgency=medium

  * impish/linux: 5.11.0-20.21+21.10.1 -proposed tracker (LP: #1930056)

  * Packaging resync (LP: #1786013)
- update dkms package versions

  [ Ubuntu: 5.11.0-20.21 ]

  * hirsute/linux: 5.11.0-20.21 -proposed tracker (LP: #1930854)
  * ath11k WIFI not working in proposed kernel 5.11.0-19-generic (LP: #1930637)
- bus: mhi: core: Download AMSS image from appropriate function

  [ Ubuntu: 5.11.0-19.20 ]

  * hirsute/linux: 5.11.0-19.20 -proposed tracker (LP: #1930075)
  * Packaging resync (LP: #1786013)
- update dkms package versions
  * CVE-2021-33200
- bpf: Wrap aux data inside bpf_sanitize_info container
- bpf: Fix mask direction swap upon off reg sign change
- bpf: No need to simulate speculative domain for immediates
  * AX201 BT will cause system could not enter S0i3 (LP: #1928047)
- SAUCE: drm/i915: Tweaked Wa_14010685332 for all PCHs
  * CVE-2021-3490
- SAUCE: Revert "UBUNTU: SAUCE: bpf: verifier: fix ALU32 bounds tracking with
  bitwise ops"
- bpf: Fix alu32 const subreg bound tracking on bitwise operations
  * CVE-2021-3489
- SAUCE: Revert "UBUNTU: SAUCE: bpf: prevent writable memory-mapping of
  read-only ringbuf pages"
- bpf: Prevent writable memory-mapping of read-only ringbuf pages
  * Select correct boot VGA when BIOS doesn't do it properly (LP: #1929217)
- vgaarb: Use ACPI HID name to find integrated GPU
  * Realtek USB hubs in Dell WD19SC/DC/TB fail to work after exiting s2idle
(LP: #1928242)
- USB: Verify the port status when timeout happens during port suspend
  * CVE-2020-26145
- ath10k: drop fragments with multicast DA for SDIO
- ath10k: add CCMP PN replay protection for fragmented frames for PCIe
- ath10k: drop fragments with multicast DA for PCIe
  * CVE-2020-26141
- ath10k: Fix TKIP Michael MIC verification for PCIe
  * CVE-2020-24587
- ath11k: Clear the fragment cache during key install
  * CVE-2020-24588
- mac80211: properly handle A-MSDUs that start with an RFC 1042 header
- cfg80211: mitigate A-MSDU aggregation attacks
- mac80211: drop A-MSDUs on old ciphers
- ath10k: drop MPDU which has discard flag set by firmware for SDIO
  * CVE-2020-26139
- mac80211: do not accept/forward invalid EAPOL frames
  * CVE-2020-24586 // CVE-2020-24587 // CVE-2020-24587 for such cases.
- mac80211: extend protection against mixed key and fragment cache attacks
  * CVE-2020-24586 // CVE-2020-24587
- mac80211: prevent mixed key and fragment cache attacks
- mac80211: add fragment cache to sta_info
- mac80211: check defrag PN against current frame
- mac80211: prevent attacks on TKIP/WEP as well
  * CVE-2020-26147
- mac80211: assure all fragments are encrypted
  * raid10: Block discard is very slow, causing severe delays for mkfs and
fstrim operations (LP: #1896578)
- md: add md_submit_discard_bio() for submitting discard bio
- md/raid10: extend r10bio devs to raid disks
- md/raid10: pull the code that wait for blocked dev into one function
- md/raid10: improve raid10 discard request
- md/raid10: improve discard request for far layout
- dm raid: remove unnecessary discard limits for raid0 and raid10
  * [SRU][OEM-5.10/H] Fix typec output on AMD Cezanne GPU (LP: #1929646)
- drm/amd/display: use max lb for latency hiding
  * kvm: properly tear down PV features on hibernate (LP: #1920944)
- x86/kvm: Fix pr_info() for async PF setup/teardown
- x86/kvm: Teardown PV features on boot CPU as well
- x86/kvm: Disable kvmclock on all CPUs on shutdown
- x86/kvm: Disable all PV features on crash
- x86/kvm: Unify kvm_pv_guest_cpu_reboot() with kvm_guest_cpu_offline()
  * Add support for AMD wireless button (LP: #1928820)
- platform/x86: hp-wireless: add AMD's hardware id to the supported list
  * Can't detect intel wifi 6235 (LP: #1920180)
- SAUCE: iwlwifi: add new pci id for 6235
  * Speed up resume time on HP laptops (LP: #1929048)
- platform/x86: hp_accel: Avoid invoking _INI to speed up resume
  * Fix kernel panic on Intel Bluetooth (LP: #1928838)
- Bluetooth: Shutdown controller after workqueues are flushed or cancelled
  * build module CONFIG_SND_SOC_INTEL_SOUNDWIRE_SOF_MACH=m for 5.11,  5.13-rc2
and later (LP: #1921632)
- [Config] enable soundwire audio mach driver
  * [SRU] Patch for flicker and glitching on common LCD display panels, intel
framebuffer (LP: #1925685)
- drm/i915: Try to use fast+narrow link on eDP again and fall back to the
  old max strategy on failure
- drm/i915/dp: Use slow and wide link training for everything
  * Fix screen flickering when two 4K 60Hz monitors are connected to AMD Oland
GFX (LP: #1928361)
- drm/radeon/dpm: Disable sclk switching on Oland when two 4K 60Hz monitors
  are connected
  * Display abnormal on the 

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-06-17 Thread Matthew Ruffell
Hi Evan,

Just checking in. Are you still running 5.4.0-75-generic on your server?

Is everything nice and stable? Is your data fully intact, with no
signs of corruption at all?

My server has been running for two weeks now, doing a fstrim every
30 minutes; everything appears to be stable, and I don't see any
corruption when I fsck my disks.
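
For anyone wanting to reproduce that cadence, a crontab entry along
these lines does the trick (the fstrim binary path and mount point are
assumptions; adjust for your system):

*/30 * * * * /usr/sbin/fstrim /home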

If things keep looking good, the SRU cycle will complete early next
week, and the kernel will be released to -updates around the 21st of
June, give or take a few days if any CVEs turn up.

Let me know how things are going.

Thanks,
Matthew

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-06-10 Thread Matthew Ruffell
Hi Evan,

Great to hear things are looking good for you and that the block discard
performance is there. If possible, keep running the kernel from
-proposed for a bit longer, just to make sure nothing comes up on longer
runs.

I spent some time today performing verification on all the kernels in
-proposed, testing block discard performance [1], and also running
through the regression testcase from LP #1907262 [2].

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578
[2] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262

All kernels performed as expected, with block discard on 4x 1.9TB NVMe
disks on an i3.8xlarge AWS instance taking 3-4 seconds, and the
consistency checks performed returned clean disks, with no filesystem or
data corruption.

I have documented my tests in my verification messages:

Hirsute: 
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578/comments/26

Groovy:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578/comments/27

Focal:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578/comments/28

Bionic:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578/comments/29

I have marked the launchpad bug as verified for all releases.

I'm still running my own testing, with my /home directory on a Raid10
array on a Google Cloud instance, and it has no issues.

If things keep going well, we should see a release to -updates around
the 21st of June, give or take a few days if any CVEs turn up.

Thanks,
Matthew

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-06-10 Thread Matthew Ruffell
Performing verification for Bionic.

I'm going to do three rounds of verification.

The first is the testcase from this bug, showing block discard
performance.

The second is running through the regression reproducer from bug
1907262.

The third will be results from my testing with my /home directory on a
cloud instance with Raid10-backed disks, plus 3x customer testing and
2x community user testing. This will be in a separate comment closer
to the release date, once I have collected results.

Starting with the testcase for this bug.

I started an i3.8xlarge instance on AWS, enabled -proposed and installed
4.15.0-145-generic. From there, I ran through the testcase of making a
Raid10 array, and formatting it with xfs, and block discard performance
was excellent:

https://paste.ubuntu.com/p/Sr8tR9yhRd/
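
In outline, the testcase is roughly the following (a sketch assuming
the four NVMe devices are nvme0n1 through nvme3n1; the paste above has
the full transcript):

$ sudo mdadm --create /dev/md0 --level=10 --raid-devices=4 \
    /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
$ time sudo mkfs.xfs /dev/md0
$ sudo mount /dev/md0 /mnt
$ time sudo fstrim /mnt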

It took 3.2 seconds to format the array with xfs, and 1.0 seconds for a
fstrim, as opposed to the 20 minutes it took beforehand.

Performance with block discard is excellent.

Moving onto the second testcase, the regression reproducer from bug
1907262.

I started an n1-standard-2 VM on Google Cloud, and attached 2x NVMe
scratch disks. I enabled -proposed and installed 4.15.0-145-generic. I
ran through the testcase of making a Raid10 array, doing consistency
checks, ensuring that mismatch count is 0, creating a file, deleting it,
performing a fstrim, and more consistency checks, then taking the raid
array down and bringing up one disk at a time, and performing a
fsck.ext4. All disks came back clean:

https://paste.ubuntu.com/p/h8gTd4JQ8Y/
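
The consistency-check portion looks roughly like this (a sketch with
assumed device names; the paste above has the full transcript):

$ echo check | sudo tee /sys/block/md0/md/sync_action
$ cat /sys/block/md0/md/mismatch_cnt   # expect 0 once the check finishes
$ sudo fstrim /mnt
$ echo check | sudo tee /sys/block/md0/md/sync_action
$ cat /sys/block/md0/md/mismatch_cnt   # expect 0 again
$ sudo umount /mnt && sudo mdadm --stop /dev/md0
$ sudo mdadm --assemble --run /dev/md0 /dev/nvme0n1   # one member at a time
$ sudo fsck.ext4 -n -f /dev/md0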

Since the block discard performance is there, and there is no apparent
data corruption going on after a fstrim, I will mark this verified for
Bionic.

** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-06-10 Thread Matthew Ruffell
Performing verification for Focal.

I'm going to do three rounds of verification.

The first is the testcase from this bug, showing block discard
performance.

The second is running through the regression reproducer from bug
1907262.

The third will be results from my testing with my /home directory on a
cloud instance with Raid10-backed disks, plus 3x customer testing and
2x community user testing. This will be in a separate comment closer
to the release date, once I have collected results.

Starting with the testcase for this bug.

I started an i3.8xlarge instance on AWS, enabled -proposed and installed
5.4.0-75-generic. From there, I ran through the testcase of making a
Raid10 array, and formatting it with xfs, and block discard performance
was excellent:

https://paste.ubuntu.com/p/mdQ6Wjr4yK/

It took 6.6 seconds to format the array with xfs, and 3.8 seconds for a
fstrim, as opposed to the 20 minutes it took beforehand.

Performance with block discard is excellent.

Moving onto the second testcase, the regression reproducer from bug
1907262.

I started an n1-standard-2 VM on Google Cloud, and attached 2x NVMe
scratch disks. I enabled -proposed and installed 5.4.0-75-generic. I ran
through the testcase of making a Raid10 array, doing consistency checks,
ensuring that mismatch count is 0, creating a file, deleting it,
performing a fstrim, and more consistency checks, then taking the raid
array down and bringing up one disk at a time, and performing a
fsck.ext4. All disks came back clean:

https://paste.ubuntu.com/p/jFHW26kcCK/

Since the block discard performance is there, and there is no apparent
data corruption going on after a fstrim, I will mark this verified for
Focal.

** Tags removed: verification-needed-focal
** Tags added: verification-done-focal

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-06-10 Thread Matthew Ruffell
Performing verification for Hirsute.

I'm going to do three rounds of verification.

The first is the testcase from this bug, showing block discard
performance.

The second is running through the regression reproducer from bug
1907262.

The third will be results from my testing with my /home directory on a
cloud instance with Raid10-backed disks, plus 3x customer testing and
2x community user testing. This will be in a separate comment closer
to the release date, once I have collected results.

Starting with the testcase for this bug.

I started an i3.8xlarge instance on AWS, enabled -proposed and installed
5.11.0-20-generic. From there, I ran through the testcase of making a
Raid10 array, and formatting it with xfs, and block discard performance
was excellent:

https://paste.ubuntu.com/p/X5sdCGT78Y/

It took 4.6 seconds to format the array with xfs, and 2.8 seconds for a fstrim,
as opposed to the 20 minutes it took beforehand.

Performance with block discard is excellent.

Moving onto the second testcase, the regression reproducer from bug
1907262.

I started an n1-standard-2 VM on Google Cloud, and attached 2x NVMe
scratch disks. I enabled -proposed and installed 5.11.0-20-generic. I
ran through the testcase of making a Raid10 array, doing consistency
checks, ensuring that mismatch count is 0, creating a file, deleting it,
performing a fstrim, and more consistency checks, then taking the raid
array down and bringing up one disk at a time, and performing a
fsck.ext4. All disks came back clean:

https://paste.ubuntu.com/p/Xy6CPCQXZN/

Since the block discard performance is there, and there is no apparent
data corruption going on after a fstrim, I will mark this verified for
Hirsute.

** Tags removed: verification-needed-hirsute
** Tags added: verification-done-hirsute

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-06-10 Thread Matthew Ruffell
Performing verification for Groovy.

I'm going to do three rounds of verification.

The first is the testcase from this bug, showing block discard
performance.

The second is running through the regression reproducer from bug
1907262.

The third will be results from my testing with my /home directory on a
cloud instance with Raid10-backed disks, plus 3x customer testing and
2x community user testing. This will be in a separate comment closer
to the release date, once I have collected results.

Starting with the testcase for this bug.

I started an i3.8xlarge instance on AWS, enabled -proposed and installed
5.8.0-56-generic. From there, I ran through the testcase of making a
Raid10 array, and formatting it with xfs, and block discard performance
was excellent:

https://paste.ubuntu.com/p/GGXfjCHfDR/

It took 5.7 seconds to format the array with xfs, and 2.3 seconds for a fstrim,
as opposed to the 20 minutes it took beforehand.

Performance with block discard is excellent.

Moving onto the second testcase, the regression reproducer from bug
1907262.

I started an n1-standard-2 VM on Google Cloud, and attached 2x NVMe
scratch disks. I enabled -proposed and installed 5.8.0-56-generic. I ran
through the testcase of making a Raid10 array, doing consistency checks,
ensuring that mismatch count is 0, creating a file, deleting it,
performing a fstrim, and more consistency checks, then taking the raid
array down and bringing up one disk at a time, and performing a
fsck.ext4. All disks came back clean:

https://paste.ubuntu.com/p/75xWd4Z3NZ/
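
In outline, that reproducer looks something like this (a sketch only;
the exact script is in the paste above, and the two scratch disk names
are assumptions):

$ sudo mdadm --create /dev/md0 --level=10 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
$ sudo mkfs.ext4 /dev/md0
$ sudo mount /dev/md0 /mnt
$ echo check | sudo tee /sys/block/md0/md/sync_action   # start a consistency check
$ cat /sys/block/md0/md/mismatch_cnt                    # expect 0 once the check finishes
$ dd if=/dev/urandom of=/mnt/testfile bs=1M count=1024  # create a file, then delete it
$ rm /mnt/testfile
$ sudo fstrim /mnt
$ echo check | sudo tee /sys/block/md0/md/sync_action   # re-check after the discard
$ cat /sys/block/md0/md/mismatch_cnt                    # expect 0 again
$ sudo umount /mnt && sudo mdadm --stop /dev/md0
$ sudo mdadm --assemble /dev/md0 /dev/nvme0n1 --run     # bring up the first disk alone
$ sudo fsck.ext4 -f /dev/md0
$ sudo mdadm --stop /dev/md0
$ sudo mdadm --assemble /dev/md0 /dev/nvme1n1 --run     # bring up the second disk alone
$ sudo fsck.ext4 -f /dev/md0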

Since the block discard performance is there, and there is no apparent
data corruption after an fstrim, I will mark this verified for Groovy.

** Tags removed: verification-needed-groovy
** Tags added: verification-done-groovy

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-06-08 Thread Evan Hoffman
Thanks Matt. I have it installed on one machine so far and it looks good
(in the past 10 minutes). An fstrim of a ~30 TB RAID 10 took 73 seconds
instead of multiple hours.

# uname -a
Linux xxx 5.4.0-75-generic #84-Ubuntu SMP Fri May 28 16:28:37 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
# df -h /dev/md0
Filesystem  Size  Used Avail Use% Mounted on
/dev/md0     30T  212G   29T   1% /opt/raid
# cat /proc/mdstat
Personalities : [raid10] [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4]
md0 : active raid10 nvme7n1[7] nvme2n1[2] nvme6n1[6] nvme4n1[4] nvme1n1[1] nvme3n1[3] nvme0n1[0] nvme5n1[5]
  31255576576 blocks super 1.2 512K chunks 2 near-copies [8/8] [UUUUUUUU]
  bitmap: 11/233 pages [44KB], 65536KB chunk

unused devices: <none>

# time fstrim /opt/raid

real    1m13.162s
user    0m0.004s
sys     0m0.351s
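
A quick sanity check that the patched kernel is in effect is to compare
the md device's discard limit against one of its members; on a fixed
kernel, md0 should report a large value matching the underlying devices
rather than the old 524288 chunk-size limit:

$ cat /sys/block/md0/queue/discard_max_bytes
$ cat /sys/block/nvme0n1/queue/discard_max_bytes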

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-06-07 Thread Matthew Ruffell
Hi Evan,

The kernel team have built all of the kernels for this SRU cycle, and
have placed them into -proposed for verification.

We now need to do some thorough testing to make sure that Raid10 arrays
perform well, that data integrity is maintained, and that we won't be
introducing any regressions when these kernels are released in two
weeks' time.

I would really appreciate it if you could help test and verify these
kernels function as intended.

Instructions to Install:

1) cat << EOF | sudo tee /etc/apt/sources.list.d/ubuntu-$(lsb_release -cs)-proposed.list
# Enable Ubuntu proposed archive
deb http://archive.ubuntu.com/ubuntu/ $(lsb_release -cs)-proposed main universe
EOF

2) sudo apt update

For 21.04 / Hirsute:

3) sudo apt install linux-image-5.11.0-20-generic linux-modules-5.11.0-20-generic \
   linux-modules-extra-5.11.0-20-generic linux-headers-5.11.0-20-generic

For 20.10 / Groovy:

3) sudo apt install linux-image-5.8.0-56-generic linux-modules-5.8.0-56-generic \
   linux-modules-extra-5.8.0-56-generic linux-headers-5.8.0-56-generic

For 20.04 / Focal:

3) sudo apt install linux-image-5.4.0-75-generic linux-modules-5.4.0-75-generic \
   linux-modules-extra-5.4.0-75-generic linux-headers-5.4.0-75-generic

For 18.04 / Bionic:

 For the 5.4 Bionic HWE kernel:

 3) sudo apt install linux-image-5.4.0-75-generic linux-modules-5.4.0-75-generic \
    linux-modules-extra-5.4.0-75-generic linux-headers-5.4.0-75-generic

 For the 4.15 Bionic GA kernel:

 3) sudo apt install linux-image-4.15.0-145-generic linux-modules-4.15.0-145-generic \
    linux-modules-extra-4.15.0-145-generic linux-headers-4.15.0-145-generic

4) sudo reboot
5) uname -rv

You may need to modify your grub configuration to boot the correct
kernel. If you need help, read these instructions:
https://paste.ubuntu.com/p/XrTzWPPnWJ/
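
If that paste is unavailable, one common approach is to pin the kernel
via /etc/default/grub (a sketch only; the menu entry string and kernel
version are illustrative and must match what is in your grub.cfg):

$ grep -E "submenu |menuentry " /boot/grub/grub.cfg | cut -d "'" -f2
$ sudo sed -i 's/^GRUB_DEFAULT=.*/GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 5.8.0-56-generic"/' /etc/default/grub
$ sudo update-grub
$ sudo reboot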

I am running the -proposed kernel on my cloud instance with my /home directory 
on a Raid10 array made up of 4x NVMe devices, and things are looking okay.
I will be performing my detailed regression testing against these kernels 
tomorrow, and I will write back with the results then.

Please help test these kernels in -proposed, and let me know how they
go.

Thanks,
Matthew

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-06-05 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-groovy' to 'verification-done-groovy'. If the
problem still exists, change the tag 'verification-needed-groovy' to
'verification-failed-groovy'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how
to enable and use -proposed. Thank you!


** Tags added: verification-needed-groovy

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-06-02 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-bionic' to 'verification-done-bionic'. If the
problem still exists, change the tag 'verification-needed-bionic' to
'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how
to enable and use -proposed. Thank you!


** Tags added: verification-needed-bionic

** Tags added: verification-needed-focal

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-06-02 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-focal' to 'verification-done-focal'. If the
problem still exists, change the tag 'verification-needed-focal' to
'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how
to enable and use -proposed. Thank you!

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-06-02 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-hirsute' to 'verification-done-hirsute'. If the
problem still exists, change the tag 'verification-needed-hirsute' to
'verification-failed-hirsute'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how
to enable and use -proposed. Thank you!


** Tags added: verification-needed-hirsute

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-05-27 Thread Matthew Ruffell
Hi Evan,

As I mentioned in my previous message, I submitted the patches to the
Ubuntu kernel mailing list for SRU.

These patches have now received two acks [1][2] from senior kernel team
members, and the patches have been applied [3] to the 4.15, 5.4, 5.8
and 5.11 kernels.

[1] https://lists.ubuntu.com/archives/kernel-team/2021-May/120475.html
[2] https://lists.ubuntu.com/archives/kernel-team/2021-May/120799.html
[3] https://lists.ubuntu.com/archives/kernel-team/2021-May/120800.html

This is what is going to happen next. Next week, between the 31st of May
and 4th of June, the kernel team will build the next kernel update, and
place it in -proposed for testing.

As soon as these kernels enter -proposed, we need to install them and
test Raid10 as much as possible. The testing and verification window is
between the 7th and 18th of June.

If all goes well, we can mark the launchpad bug as verified, and we will
see a release to -updates around the 21st of June, give or take a few
days if any CVEs turn up.

The schedule is published on https://kernel.ubuntu.com/ in case
anything changes.

I will write back once the next kernel update is in -proposed, likely
early to mid next week. I would really, really appreciate it if you
could help test the kernels when they arrive in -proposed, as I really
don't want to introduce any more regressions.

Thanks,
Matthew

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-05-27 Thread Kleber Sacilotto de Souza
** Changed in: linux (Ubuntu Bionic)
   Status: In Progress => Fix Committed

** Changed in: linux (Ubuntu Focal)
   Status: In Progress => Fix Committed

** Changed in: linux (Ubuntu Groovy)
   Status: In Progress => Fix Committed

** Changed in: linux (Ubuntu Hirsute)
   Status: In Progress => Fix Committed

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-05-24 Thread Evan Hoffman
I have it running on two machines now that needed big RAID 10s:

# uname -rv
5.4.0-72-generic #80+TEST1896578v20210504b1-Ubuntu SMP Tue May 4 00:30:36 UTC 2021
# df -h /opt/raid
Filesystem  Size  Used Avail Use% Mounted on
/dev/md0     30T  208G   29T   1% /opt/raid
# cat /proc/mdstat
Personalities : [raid10]
md0 : active raid10 nvme7n1[7] nvme6n1[6] nvme5n1[5] nvme4n1[4] nvme3n1[3] nvme2n1[2] nvme1n1[1] nvme0n1[0]
  31255576576 blocks super 1.2 512K chunks 2 near-copies [8/8] [UUUUUUUU]
  [>....................]  resync =  0.1% (48777600/31255576576) finish=2514.8min speed=206813K/sec
  bitmap: 233/233 pages [932KB], 65536KB chunk

FWIW the mkfs.xfs took ~1 minute across 8x8TB NVMe disks with this
patched kernel.

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-05-19 Thread Matthew Ruffell
Hi Evan,

The patches have been submitted for SRU to the Ubuntu kernel mailing
list, for the 4.15, 5.4, 5.8 and 5.11 kernels:

[0] https://lists.ubuntu.com/archives/kernel-team/2021-May/119935.html
[1] https://lists.ubuntu.com/archives/kernel-team/2021-May/119936.html
[2] https://lists.ubuntu.com/archives/kernel-team/2021-May/119937.html
[3] https://lists.ubuntu.com/archives/kernel-team/2021-May/119938.html
[4] https://lists.ubuntu.com/archives/kernel-team/2021-May/119939.html
[5] https://lists.ubuntu.com/archives/kernel-team/2021-May/119941.html
[6] https://lists.ubuntu.com/archives/kernel-team/2021-May/119940.html
[7] https://lists.ubuntu.com/archives/kernel-team/2021-May/119942.html
[8] https://lists.ubuntu.com/archives/kernel-team/2021-May/119943.html
[9] https://lists.ubuntu.com/archives/kernel-team/2021-May/119944.html
[10] https://lists.ubuntu.com/archives/kernel-team/2021-May/119945.html
[11] https://lists.ubuntu.com/archives/kernel-team/2021-May/119946.html

The kernel team have reviewed the patches, but they are in no hurry to
ACK the patchset [12], and they also haven't outright rejected it.

[12] https://lists.ubuntu.com/archives/kernel-team/2021-May/120051.html

The current status is that the kernel team have requested more testing
to be performed, and that the patches will not make the current SRU
cycle. They will instead be submitted for consideration in the next SRU
cycle.

You can look at https://kernel.ubuntu.com/ for dates of various SRU
cycles. If the patches are accepted for the 2021.05.31 SRU cycle, then
you could expect a supported kernel to be available in late June.

If you want to help, then please consider installing the test kernels in
comment #14 and helping test Raid10 on SSDs / NVMe drives that support
block discard.
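
If you are unsure whether a drive supports block discard, lsblk can
report its discard limits; non-zero DISC-GRAN / DISC-MAX values mean
discard is supported (the output below is illustrative):

$ lsblk --discard /dev/nvme0n1
NAME    DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
nvme0n1        0      512B       2T         0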

I am currently using a cloud instance with 4x NVMe disks in Raid10 as my
/home directory, and things seem okay.

I'll keep you updated on the progress of this patchset via this bug.

Thanks,
Matthew

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-05-19 Thread Evan Hoffman
Is there any ETA on a supported kernel with this patch?

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1896578

Title:
  raid10: Block discard is very slow, causing severe delays for mkfs and
  fstrim operations

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  In Progress
Status in linux source package in Focal:
  In Progress
Status in linux source package in Groovy:
  In Progress
Status in linux source package in Hirsute:
  In Progress

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/1896578

  [Impact]

  Block discard is very slow on Raid10, which causes common use cases
  which invoke block discard, such as mkfs and fstrim operations, to
  take a very long time.

  For example, on a i3.8xlarge instance on AWS, which has 4x 1.9TB NVMe
  devices which support block discard, a mkfs.xfs operation on Raid 10
  takes between 8 to 11 minutes, where the same mkfs.xfs operation on
  Raid 0, takes 4 seconds.

  The bigger the devices, the longer it takes.

  The cause is that Raid10 currently uses a 512k chunk size, and uses
  this for the discard_max_bytes value. If we need to discard 1.9TB, the
  kernel splits the request into millions of 512k bio requests, even if
  the underlying device supports larger requests.

  For example, the NVMe devices on i3.8xlarge support 2.2TB of discard
  at once:

  $ cat /sys/block/nvme0n1/queue/discard_max_bytes
  2199023255040
  $ cat /sys/block/nvme0n1/queue/discard_max_hw_bytes
  2199023255040

  Where the Raid10 md device only supports 512k:

  $ cat /sys/block/md0/queue/discard_max_bytes
  524288
  $ cat /sys/block/md0/queue/discard_max_hw_bytes
  524288

  If we perform a mkfs.xfs operation on the /dev/md array, it takes over
  11 minutes, and if we examine the stack, we see it is stuck in
  blkdev_issue_discard():

  $ sudo cat /proc/1626/stack
  [<0>] wait_barrier+0x14c/0x230 [raid10]
  [<0>] regular_request_wait+0x39/0x150 [raid10]
  [<0>] raid10_write_request+0x11e/0x850 [raid10]
  [<0>] raid10_make_request+0xd7/0x150 [raid10]
  [<0>] md_handle_request+0x123/0x1a0
  [<0>] md_submit_bio+0xda/0x120
  [<0>] __submit_bio_noacct+0xde/0x320
  [<0>] submit_bio_noacct+0x4d/0x90
  [<0>] submit_bio+0x4f/0x1b0
  [<0>] __blkdev_issue_discard+0x154/0x290
  [<0>] blkdev_issue_discard+0x5d/0xc0
  [<0>] blk_ioctl_discard+0xc4/0x110
  [<0>] blkdev_common_ioctl+0x56c/0x840
  [<0>] blkdev_ioctl+0xeb/0x270
  [<0>] block_ioctl+0x3d/0x50
  [<0>] __x64_sys_ioctl+0x91/0xc0
  [<0>] do_syscall_64+0x38/0x90
  [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

  [Fix]

  Xiao Ni has developed a patchset which resolves the block discard
  performance problems. These commits have now landed in 5.13-rc1.

  commit cf78408f937a67f59f5e90ee8e6cadeed7c128a8
  Author: Xiao Ni 
  Date:   Thu Feb 4 15:50:43 2021 +0800
  Subject: md: add md_submit_discard_bio() for submitting discard bio
  Link: 
https://github.com/torvalds/linux/commit/cf78408f937a67f59f5e90ee8e6cadeed7c128a8

  commit c2968285925adb97b9aa4ede94c1f1ab61ce0925
  Author: Xiao Ni 
  Date:   Thu Feb 4 15:50:44 2021 +0800
  Subject: md/raid10: extend r10bio devs to raid disks
  Link: 
https://github.com/torvalds/linux/commit/c2968285925adb97b9aa4ede94c1f1ab61ce0925

  commit f2e7e269a7525317752d472bb48a549780e87d22
  Author: Xiao Ni 
  Date:   Thu Feb 4 15:50:45 2021 +0800
  Subject: md/raid10: pull the code that wait for blocked dev into one function
  Link: 
https://github.com/torvalds/linux/commit/f2e7e269a7525317752d472bb48a549780e87d22

  commit d30588b2731fb01e1616cf16c3fe79a1443e29aa
  Author: Xiao Ni 
  Date:   Thu Feb 4 15:50:46 2021 +0800
  Subject: md/raid10: improve raid10 discard request
  Link: 
https://github.com/torvalds/linux/commit/d30588b2731fb01e1616cf16c3fe79a1443e29aa

  commit 254c271da0712ea8914f187588e0f81f7678ee2f
  Author: Xiao Ni 
  Date:   Thu Feb 4 15:50:47 2021 +0800
  Subject: md/raid10: improve discard request for far layout
  Link: 
https://github.com/torvalds/linux/commit/254c271da0712ea8914f187588e0f81f7678ee2f

  There is also an additional commit which is required, and was merged
  after "md/raid10: improve raid10 discard request". The following commit
  enables Raid10 to use large discards, instead of splitting into many
  bios, since the technical hurdles have now been removed.

  commit ca4a4e9a55beeb138bb06e3867f5e486da896d44
  Author: Mike Snitzer 
  Date:   Fri Apr 30 14:38:37 2021 -0400
  Subject: dm raid: remove unnecessary discard limits for raid0 and raid10
  Link: 
https://github.com/torvalds/linux/commit/ca4a4e9a55beeb138bb06e3867f5e486da896d44

  The commits cherry-pick more or less cleanly onto the 5.11, 5.8, 5.4 and
  4.15 kernels, with the following minor backports:

  1) submit_bio_noacct() needed to be renamed back to generic_make_request(),
  since the function was only recently renamed in:

  commit ed00aabd5eb9fb44d6aff1173234a2e911b9fead
  

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-05-07 Thread Matthew Ruffell
I have completed most of my regression testing, and things are still looking
good. The block discard performance improvement is there, and I haven't seen
any data corruption.

In particular, I have been testing against the testcase for the regression that
occurred with the previous revision of the patches, back in December. The
testcase is covered in bug 1907262 [1].

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262

For each of the 5.11, 5.8, 5.4 and 4.15 kernels, the problem does not
reproduce: the values of /sys/block/md0/md/mismatch_cnt are always 0, and
mounting each disk individually and performing a full deep fsck shows no
data corruption.
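
For reference, the consistency check after each round is essentially the
following (a sketch: the device names are illustrative, ext4 is assumed, and
mounting members individually relies on the bug 1907262 testcase layout
rather than a general raid10 array):

$ echo check | sudo tee /sys/block/md0/md/sync_action
$ cat /proc/mdstat                       # wait for the check to finish
$ cat /sys/block/md0/md/mismatch_cnt     # expect 0
$ sudo mdadm --stop /dev/md0
$ sudo fsck.ext4 -f -n /dev/nvme0n1      # repeat for each member disk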

Test results for each kernel are below:

5.11.0-16-generic #17+TEST1896578v20210503b1-Ubuntu
https://paste.ubuntu.com/p/Dp3sR9mNdY/

5.8.0-50-generic #56+TEST1896578v20210504b1-Ubuntu
https://paste.ubuntu.com/p/tXmtmd5Jys/

5.4.0-72-generic #80+TEST1896578v20210504b1-Ubuntu
https://paste.ubuntu.com/p/VzX2mXcKbF/

4.15.0-142-generic #146+TEST1896578v20210504b1-Ubuntu
https://paste.ubuntu.com/p/HpMcX3N9fD/

I will also look into some longer-running tests; more info on that later.

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-05-07 Thread Matthew Ruffell
** Description changed:

  BugLink: https://bugs.launchpad.net/bugs/1896578
  
  [Impact]
  
  Block discard is very slow on Raid10, which causes common use cases
  which invoke block discard, such as mkfs and fstrim operations, to take
  a very long time.
  
  For example, on a i3.8xlarge instance on AWS, which has 4x 1.9TB NVMe
  devices which support block discard, a mkfs.xfs operation on Raid 10
  takes between 8 to 11 minutes, where the same mkfs.xfs operation on Raid
  0, takes 4 seconds.
  
  The bigger the devices, the longer it takes.
  
  The cause is that Raid10 currently uses a 512k chunk size, and uses this
  for the discard_max_bytes value. If we need to discard 1.9TB, the kernel
  splits the request into millions of 512k bio requests, even if the
  underlying device supports larger requests.
  
  For example, the NVMe devices on i3.8xlarge support 2.2TB of discard at
  once:
  
  $ cat /sys/block/nvme0n1/queue/discard_max_bytes
  2199023255040
  $ cat /sys/block/nvme0n1/queue/discard_max_hw_bytes
  2199023255040
  
  Where the Raid10 md device only supports 512k:
  
  $ cat /sys/block/md0/queue/discard_max_bytes
  524288
  $ cat /sys/block/md0/queue/discard_max_hw_bytes
  524288
  
  If we perform a mkfs.xfs operation on the /dev/md array, it takes over
  11 minutes and if we examine the stack, it is stuck in
  blkdev_issue_discard()
  
  $ sudo cat /proc/1626/stack
  [<0>] wait_barrier+0x14c/0x230 [raid10]
  [<0>] regular_request_wait+0x39/0x150 [raid10]
  [<0>] raid10_write_request+0x11e/0x850 [raid10]
  [<0>] raid10_make_request+0xd7/0x150 [raid10]
  [<0>] md_handle_request+0x123/0x1a0
  [<0>] md_submit_bio+0xda/0x120
  [<0>] __submit_bio_noacct+0xde/0x320
  [<0>] submit_bio_noacct+0x4d/0x90
  [<0>] submit_bio+0x4f/0x1b0
  [<0>] __blkdev_issue_discard+0x154/0x290
  [<0>] blkdev_issue_discard+0x5d/0xc0
  [<0>] blk_ioctl_discard+0xc4/0x110
  [<0>] blkdev_common_ioctl+0x56c/0x840
  [<0>] blkdev_ioctl+0xeb/0x270
  [<0>] block_ioctl+0x3d/0x50
  [<0>] __x64_sys_ioctl+0x91/0xc0
  [<0>] do_syscall_64+0x38/0x90
  [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
  
  [Fix]
  
  Xiao Ni has developed a patchset which resolves the block discard
- performance problems. These commits have now landed in 5.10-rc1.
+ performance problems. These commits have now landed in 5.13-rc1.
  
  commit cf78408f937a67f59f5e90ee8e6cadeed7c128a8
  Author: Xiao Ni 
  Date:   Thu Feb 4 15:50:43 2021 +0800
  Subject: md: add md_submit_discard_bio() for submitting discard bio
  Link: 
https://github.com/torvalds/linux/commit/cf78408f937a67f59f5e90ee8e6cadeed7c128a8
  
  commit c2968285925adb97b9aa4ede94c1f1ab61ce0925
  Author: Xiao Ni 
  Date:   Thu Feb 4 15:50:44 2021 +0800
  Subject: md/raid10: extend r10bio devs to raid disks
  Link: 
https://github.com/torvalds/linux/commit/c2968285925adb97b9aa4ede94c1f1ab61ce0925
  
  commit f2e7e269a7525317752d472bb48a549780e87d22
  Author: Xiao Ni 
  Date:   Thu Feb 4 15:50:45 2021 +0800
  Subject: md/raid10: pull the code that wait for blocked dev into one function
  Link: 
https://github.com/torvalds/linux/commit/f2e7e269a7525317752d472bb48a549780e87d22
  
  commit d30588b2731fb01e1616cf16c3fe79a1443e29aa
  Author: Xiao Ni 
  Date:   Thu Feb 4 15:50:46 2021 +0800
  Subject: md/raid10: improve raid10 discard request
  Link: 
https://github.com/torvalds/linux/commit/d30588b2731fb01e1616cf16c3fe79a1443e29aa
  
  commit 254c271da0712ea8914f187588e0f81f7678ee2f
  Author: Xiao Ni 
  Date:   Thu Feb 4 15:50:47 2021 +0800
  Subject: md/raid10: improve discard request for far layout
  Link: 
https://github.com/torvalds/linux/commit/254c271da0712ea8914f187588e0f81f7678ee2f
  
  There is also an additional commit which is required, and was merged
  after "md/raid10: improve raid10 discard request" was merged. The
- following commits enable Raid10 to use large discards, instead of
+ following commit enables Raid10 to use large discards, instead of
  splitting into many bios, since the technical hurdles have now been
  removed.
  
  commit ca4a4e9a55beeb138bb06e3867f5e486da896d44
  Author: Mike Snitzer 
  Date:   Fri Apr 30 14:38:37 2021 -0400
  Subject: dm raid: remove unnecessary discard limits for raid0 and raid10
  Link: 
https://github.com/torvalds/linux/commit/ca4a4e9a55beeb138bb06e3867f5e486da896d44
  
  The commits more or less cherry pick to the 5.11, 5.8, 5.4 and 4.15
  kernels, with the following minor backports:
  
  1) submit_bio_noacct() needed to be renamed to generic_make_request()
  since it was recently changed in:
  
  commit ed00aabd5eb9fb44d6aff1173234a2e911b9fead
  Author: Christoph Hellwig 
  Date:   Wed Jul 1 10:59:44 2020 +0200
  Subject: block: rename generic_make_request to submit_bio_noacct
  Link: 
https://github.com/torvalds/linux/commit/ed00aabd5eb9fb44d6aff1173234a2e911b9fead
  
  2) In the 4.15, 5.4 and 5.8 kernels, trace_block_bio_remap() needs to
  have its request_queue argument put back in place. It was recently
  removed in:
  
  commit 

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-05-07 Thread Matthew Ruffell
If anyone is interested in testing, there are new re-spins of the test
kernels available in the following PPA:

https://launchpad.net/~mruffell/+archive/ubuntu/lp1896578-test

The patches used are the ones I will be submitting for SRU, and are more
or less identical to the patches in the previous test kernels I supplied
in February.

Please go ahead and do some testing, and let me know if you find any
problems.

Please note these packages are NOT SUPPORTED by Canonical, and are for
TESTING PURPOSES ONLY. ONLY install in a dedicated test environment.

Instructions to install:
1) sudo add-apt-repository ppa:mruffell/lp1896578-test
2) sudo apt update

For 21.04 / Hirsute:

3) sudo apt install linux-image-unsigned-5.11.0-16-generic \
   linux-modules-5.11.0-16-generic \
   linux-modules-extra-5.11.0-16-generic linux-headers-5.11.0-16-generic

For 20.10 / Groovy:

3) sudo apt install linux-image-unsigned-5.8.0-50-generic \
   linux-modules-5.8.0-50-generic \
   linux-modules-extra-5.8.0-50-generic linux-headers-5.8.0-50-generic

For 20.04 / Focal:

3) sudo apt install linux-image-unsigned-5.4.0-72-generic \
   linux-modules-5.4.0-72-generic \
   linux-modules-extra-5.4.0-72-generic linux-headers-5.4.0-72-generic

For 18.04 / Bionic:
For the 5.4 Bionic HWE kernel:

3) sudo apt install linux-image-unsigned-5.4.0-72-generic \
   linux-modules-5.4.0-72-generic \
   linux-modules-extra-5.4.0-72-generic linux-headers-5.4.0-72-generic

For the 4.15 Bionic GA kernel:

3) sudo apt install linux-image-unsigned-4.15.0-142-generic \
   linux-modules-4.15.0-142-generic \
   linux-modules-extra-4.15.0-142-generic linux-headers-4.15.0-142-generic

4) sudo reboot
5) uname -rv
Make sure the string "+TEST1896578v20210504b1" is present in the uname -rv.
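
To confirm the fix itself is active, you can also check the discard limit
advertised by your array; assuming the patches lift the 524288 byte cap
described in this bug, it should now report a much larger value:

$ cat /sys/block/md0/queue/discard_max_bytes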

You may need to modify your grub configuration to boot the correct
kernel. If you need help, read these instructions:
https://paste.ubuntu.com/p/XrTzWPPnWJ/
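
As a rough sketch (the exact menu entry title varies between systems), one
way to boot a specific kernel for the next boot only is:

$ sudo grub-reboot "Advanced options for Ubuntu>Ubuntu, with Linux 5.4.0-72-generic"
$ sudo reboot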

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-05-03 Thread Matthew Ruffell
** Description changed:

  BugLink: https://bugs.launchpad.net/bugs/1896578
  
  [Impact]
  
  Block discard is very slow on Raid10, which causes common use cases
  which invoke block discard, such as mkfs and fstrim operations, to take
  a very long time.
  
  For example, on a i3.8xlarge instance on AWS, which has 4x 1.9TB NVMe
  devices which support block discard, a mkfs.xfs operation on Raid 10
  takes between 8 to 11 minutes, where the same mkfs.xfs operation on Raid
  0, takes 4 seconds.
  
  The bigger the devices, the longer it takes.
  
  The cause is that Raid10 currently uses a 512k chunk size, and uses this
  for the discard_max_bytes value. If we need to discard 1.9TB, the kernel
  splits the request into millions of 512k bio requests, even if the
  underlying device supports larger requests.
  
  For example, the NVMe devices on i3.8xlarge support 2.2TB of discard at
  once:
  
  $ cat /sys/block/nvme0n1/queue/discard_max_bytes
  2199023255040
  $ cat /sys/block/nvme0n1/queue/discard_max_hw_bytes
  2199023255040
  
  Where the Raid10 md device only supports 512k:
  
  $ cat /sys/block/md0/queue/discard_max_bytes
  524288
  $ cat /sys/block/md0/queue/discard_max_hw_bytes
  524288
  
  If we perform a mkfs.xfs operation on the /dev/md array, it takes over
  11 minutes and if we examine the stack, it is stuck in
  blkdev_issue_discard()
  
  $ sudo cat /proc/1626/stack
  [<0>] wait_barrier+0x14c/0x230 [raid10]
  [<0>] regular_request_wait+0x39/0x150 [raid10]
  [<0>] raid10_write_request+0x11e/0x850 [raid10]
  [<0>] raid10_make_request+0xd7/0x150 [raid10]
  [<0>] md_handle_request+0x123/0x1a0
  [<0>] md_submit_bio+0xda/0x120
  [<0>] __submit_bio_noacct+0xde/0x320
  [<0>] submit_bio_noacct+0x4d/0x90
  [<0>] submit_bio+0x4f/0x1b0
  [<0>] __blkdev_issue_discard+0x154/0x290
  [<0>] blkdev_issue_discard+0x5d/0xc0
  [<0>] blk_ioctl_discard+0xc4/0x110
  [<0>] blkdev_common_ioctl+0x56c/0x840
  [<0>] blkdev_ioctl+0xeb/0x270
  [<0>] block_ioctl+0x3d/0x50
  [<0>] __x64_sys_ioctl+0x91/0xc0
  [<0>] do_syscall_64+0x38/0x90
  [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
  
  [Fix]
  
  Xiao Ni has developed a patchset which resolves the block discard
  performance problems. These commits have now landed in 5.10-rc1.
  
  commit cf78408f937a67f59f5e90ee8e6cadeed7c128a8
  Author: Xiao Ni 
  Date:   Thu Feb 4 15:50:43 2021 +0800
  Subject: md: add md_submit_discard_bio() for submitting discard bio
  Link: 
https://github.com/torvalds/linux/commit/cf78408f937a67f59f5e90ee8e6cadeed7c128a8
  
  commit c2968285925adb97b9aa4ede94c1f1ab61ce0925
  Author: Xiao Ni 
  Date:   Thu Feb 4 15:50:44 2021 +0800
  Subject: md/raid10: extend r10bio devs to raid disks
  Link: 
https://github.com/torvalds/linux/commit/c2968285925adb97b9aa4ede94c1f1ab61ce0925
  
  commit f2e7e269a7525317752d472bb48a549780e87d22
  Author: Xiao Ni 
  Date:   Thu Feb 4 15:50:45 2021 +0800
  Subject: md/raid10: pull the code that wait for blocked dev into one function
  Link: 
https://github.com/torvalds/linux/commit/f2e7e269a7525317752d472bb48a549780e87d22
  
  commit d30588b2731fb01e1616cf16c3fe79a1443e29aa
  Author: Xiao Ni 
  Date:   Thu Feb 4 15:50:46 2021 +0800
  Subject: md/raid10: improve raid10 discard request
  Link: 
https://github.com/torvalds/linux/commit/d30588b2731fb01e1616cf16c3fe79a1443e29aa
  
  commit 254c271da0712ea8914f187588e0f81f7678ee2f
  Author: Xiao Ni 
  Date:   Thu Feb 4 15:50:47 2021 +0800
  Subject: md/raid10: improve discard request for far layout
  Link: 
https://github.com/torvalds/linux/commit/254c271da0712ea8914f187588e0f81f7678ee2f
  
  There is also an additional commit which is required, and was merged
  after "md/raid10: improve raid10 discard request" was merged. The
  following commits enable Raid10 to use large discards, instead of
  splitting into many bios, since the technical hurdles have now been
  removed.
  
  commit ca4a4e9a55beeb138bb06e3867f5e486da896d44
  Author: Mike Snitzer 
  Date:   Fri Apr 30 14:38:37 2021 -0400
  Subject: dm raid: remove unnecessary discard limits for raid0 and raid10
  Link: 
https://github.com/torvalds/linux/commit/ca4a4e9a55beeb138bb06e3867f5e486da896d44
  
  The commits more or less cherry pick to the 5.11, 5.8, 5.4 and 4.15
  kernels, with the following minor backports:
  
  1) submit_bio_noacct() needed to be renamed to generic_make_request()
  since it was recently changed in:
  
  commit ed00aabd5eb9fb44d6aff1173234a2e911b9fead
  Author: Christoph Hellwig 
  Date:   Wed Jul 1 10:59:44 2020 +0200
  Subject: block: rename generic_make_request to submit_bio_noacct
  Link: 
https://github.com/torvalds/linux/commit/ed00aabd5eb9fb44d6aff1173234a2e911b9fead
  
- 2) bio_split(), mempool_alloc(), bio_clone_fast() all needed their
+ 2) In the 4.15, 5.4 and 5.8 kernels, trace_block_bio_remap() needs to
+ have its request_queue argument put back in place. It was recently
+ removed in:
+ 
+ commit 1c02fca620f7273b597591065d366e2cca948d8f
+ Author: Christoph Hellwig 
+ Date:   Thu 

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-05-02 Thread Matthew Ruffell
** Also affects: linux (Ubuntu Hirsute)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu Hirsute)
   Status: New => In Progress

** Changed in: linux (Ubuntu Hirsute)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Hirsute)
 Assignee: (unassigned) => Matthew Ruffell (mruffell)

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-05-02 Thread Matthew Ruffell
** Description changed:

  BugLink: https://bugs.launchpad.net/bugs/1896578
  
  [Impact]
  
  Block discard is very slow on Raid10, which causes common use cases
  which invoke block discard, such as mkfs and fstrim operations, to take
  a very long time.
  
  For example, on a i3.8xlarge instance on AWS, which has 4x 1.9TB NVMe
  devices which support block discard, a mkfs.xfs operation on Raid 10
  takes between 8 to 11 minutes, where the same mkfs.xfs operation on Raid
  0, takes 4 seconds.
  
  The bigger the devices, the longer it takes.
  
  The cause is that Raid10 currently uses a 512k chunk size, and uses this
  for the discard_max_bytes value. If we need to discard 1.9TB, the kernel
  splits the request into millions of 512k bio requests, even if the
  underlying device supports larger requests.
  
  For example, the NVMe devices on i3.8xlarge support 2.2TB of discard at
  once:
  
  $ cat /sys/block/nvme0n1/queue/discard_max_bytes
  2199023255040
  $ cat /sys/block/nvme0n1/queue/discard_max_hw_bytes
  2199023255040
  
  Where the Raid10 md device only supports 512k:
  
  $ cat /sys/block/md0/queue/discard_max_bytes
  524288
  $ cat /sys/block/md0/queue/discard_max_hw_bytes
  524288
  
  If we perform a mkfs.xfs operation on the /dev/md array, it takes over
  11 minutes and if we examine the stack, it is stuck in
  blkdev_issue_discard()
  
  $ sudo cat /proc/1626/stack
  [<0>] wait_barrier+0x14c/0x230 [raid10]
  [<0>] regular_request_wait+0x39/0x150 [raid10]
  [<0>] raid10_write_request+0x11e/0x850 [raid10]
  [<0>] raid10_make_request+0xd7/0x150 [raid10]
  [<0>] md_handle_request+0x123/0x1a0
  [<0>] md_submit_bio+0xda/0x120
  [<0>] __submit_bio_noacct+0xde/0x320
  [<0>] submit_bio_noacct+0x4d/0x90
  [<0>] submit_bio+0x4f/0x1b0
  [<0>] __blkdev_issue_discard+0x154/0x290
  [<0>] blkdev_issue_discard+0x5d/0xc0
  [<0>] blk_ioctl_discard+0xc4/0x110
  [<0>] blkdev_common_ioctl+0x56c/0x840
  [<0>] blkdev_ioctl+0xeb/0x270
  [<0>] block_ioctl+0x3d/0x50
  [<0>] __x64_sys_ioctl+0x91/0xc0
  [<0>] do_syscall_64+0x38/0x90
  [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
  
  [Fix]
  
  Xiao Ni has developed a patchset which resolves the block discard
  performance problems. These commits have now landed in 5.10-rc1.
  
- commit 2628089b74d5a64bd0bcb5d247a18f78d7b6f4d0
- Author: Xiao Ni 
- Date: Tue Aug 25 13:42:59 2020 +0800
+ commit cf78408f937a67f59f5e90ee8e6cadeed7c128a8
+ Author: Xiao Ni 
+ Date:   Thu Feb 4 15:50:43 2021 +0800
  Subject: md: add md_submit_discard_bio() for submitting discard bio
- Link: 
https://github.com/torvalds/linux/commit/2628089b74d5a64bd0bcb5d247a18f78d7b6f4d0
- 
- commit 8650a889017cb1f6ea6813ccf83a2e9f6fa49dd3
- Author: Xiao Ni 
- Date: Tue Aug 25 13:43:00 2020 +0800
+ Link: 
https://github.com/torvalds/linux/commit/cf78408f937a67f59f5e90ee8e6cadeed7c128a8
+ 
+ commit c2968285925adb97b9aa4ede94c1f1ab61ce0925
+ Author: Xiao Ni 
+ Date:   Thu Feb 4 15:50:44 2021 +0800
  Subject: md/raid10: extend r10bio devs to raid disks
- Link: 
https://github.com/torvalds/linux/commit/8650a889017cb1f6ea6813ccf83a2e9f6fa49dd3
- 
- commit f046f5d0d79cdb968f219ce249e497fd1accf484
- Author: Xiao Ni 
- Date: Tue Aug 25 13:43:01 2020 +0800
- Subject: md/raid10: pull codes that wait for blocked dev into one function
- Link: 
https://github.com/torvalds/linux/commit/f046f5d0d79cdb968f219ce249e497fd1accf484
- 
- commit bcc90d280465ebd51ab8688be86e1f00c62dccf9
- Author: Xiao Ni 
- Date: Wed Sep 2 20:00:22 2020 +0800
+ Link: 
https://github.com/torvalds/linux/commit/c2968285925adb97b9aa4ede94c1f1ab61ce0925
+ 
+ commit f2e7e269a7525317752d472bb48a549780e87d22
+ Author: Xiao Ni 
+ Date:   Thu Feb 4 15:50:45 2021 +0800
+ Subject: md/raid10: pull the code that wait for blocked dev into one function
+ Link: 
https://github.com/torvalds/linux/commit/f2e7e269a7525317752d472bb48a549780e87d22
+ 
+ commit d30588b2731fb01e1616cf16c3fe79a1443e29aa
+ Author: Xiao Ni 
+ Date:   Thu Feb 4 15:50:46 2021 +0800
  Subject: md/raid10: improve raid10 discard request
- Link: 
https://github.com/torvalds/linux/commit/bcc90d280465ebd51ab8688be86e1f00c62dccf9
- 
- commit d3ee2d8415a6256c1c41e1be36e80e640c3e6359
- Author: Xiao Ni 
- Date: Wed Sep 2 20:00:23 2020 +0800
+ Link: 
https://github.com/torvalds/linux/commit/d30588b2731fb01e1616cf16c3fe79a1443e29aa
+ 
+ commit 254c271da0712ea8914f187588e0f81f7678ee2f
+ Author: Xiao Ni 
+ Date:   Thu Feb 4 15:50:47 2021 +0800
  Subject: md/raid10: improve discard request for far layout
- Link: 
https://github.com/torvalds/linux/commit/d3ee2d8415a6256c1c41e1be36e80e640c3e6359
+ Link: 
https://github.com/torvalds/linux/commit/254c271da0712ea8914f187588e0f81f7678ee2f
  
  There is also an additional commit which is required, and was merged
  after "md/raid10: improve raid10 discard request" was merged. The
  following commits enable Raid10 to use large discards, instead of
  splitting into many bios, since the technical hurdles have now been
  removed.
  
- commit 

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-02-14 Thread Matthew Ruffell
Hi everyone,

The original patch author, Xiao Ni, has sent a V2 patchset to the
linux-raid mailing list for feedback. This new patchset fixes the problems
the previous version had, namely properly calculating the discard offset
for the second and subsequent disks, and correctly calculating the stripe
size in far layouts.

The patches are:

https://www.spinics.net/lists/raid/msg67208.html
https://www.spinics.net/lists/raid/msg67212.html
https://www.spinics.net/lists/raid/msg67213.html
https://www.spinics.net/lists/raid/msg67209.html
https://www.spinics.net/lists/raid/msg67210.html
https://www.spinics.net/lists/raid/msg67211.html

We now need to thoroughly test and provide feedback to Xiao and the Raid
subsystem maintainer before these patches can get merged into mainline
again. We really need to make sure that these patches don't cause any
data corruption.

I have backported the patchset to the 4.15, 5.4 and 5.8 kernels.

Backports for 5.4 and 5.8 kernels:

https://paste.ubuntu.com/p/vPFFPMjhbv/
https://paste.ubuntu.com/p/MCGH8v7Rqk/
https://paste.ubuntu.com/p/rppy39Qgkz/
https://paste.ubuntu.com/p/Dsqy4PQNzJ/
https://paste.ubuntu.com/p/mZ9VDBD8d5/
https://paste.ubuntu.com/p/vJNYZyGTWH/
https://paste.ubuntu.com/p/M4sMwhgWTj/

Backports for the 4.15 kernel:

https://paste.ubuntu.com/p/X9rRHT59qf/
https://paste.ubuntu.com/p/VWwW9JbBHy/
https://paste.ubuntu.com/p/pFY3YbBW6t/
https://paste.ubuntu.com/p/JKg4KcHwPB/
https://paste.ubuntu.com/p/C4sf2r9jS4/

I have built test kernels for bionic, bionic HWE, focal and groovy.

Performance testing confirms that the time to format a Raid10 array on NVMe
disks drops from 8.5 minutes to about 6 seconds on an AWS i3.8xlarge, due to
the block discard speedup.

https://paste.ubuntu.com/p/NNGqP3xdsc/
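
The testcase itself is simple to reproduce (a sketch; the device names are
illustrative for an i3.8xlarge):

$ sudo mdadm --create /dev/md0 --level=10 --raid-devices=4 \
    /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
$ time sudo mkfs.xfs /dev/md0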

I have also run through the data corruption regression reproducer from
bug 1907262, and throughout the process, the
/sys/block/md0/md/mismatch_cnt was always 0, and all deep fsck checks
came back clean for individual disks.

https://paste.ubuntu.com/p/5DK57TzdFH/

I am happy with these results, and it's time to get some wider testing on
these patches.

If you are interested in helping to test, please use dedicated test
servers, and not production systems. These patches have caused data
corruption before, so only place data on the Raid10 array that you have
copies of elsewhere, and assume that total data loss could happen
anytime.

Please note, these test kernels are NOT SUPPORTED by Canonical, and are
for TEST PURPOSES ONLY. ONLY install in a dedicated test environment.

Instructions to Install (on a Bionic or Focal or Groovy system):
1) sudo add-apt-repository ppa:mruffell/lp1896578-test
2) sudo apt update

For Bionic:
3) sudo apt install linux-image-unsigned-4.15.0-136-generic \
   linux-modules-4.15.0-136-generic linux-modules-extra-4.15.0-136-generic \
   linux-headers-4.15.0-136-generic

For Bionic HWE 5.4 or Focal:
3) sudo apt install linux-image-unsigned-5.4.0-66-generic \
   linux-modules-5.4.0-66-generic linux-modules-extra-5.4.0-66-generic \
   linux-headers-5.4.0-66-generic

For Groovy:
3) sudo apt install linux-image-unsigned-5.8.0-44-generic \
   linux-modules-5.8.0-44-generic linux-modules-extra-5.8.0-44-generic \
   linux-headers-5.8.0-44-generic

4) sudo reboot
5) uname -rv

Bionic:
4.15.0-136-generic #140+TEST1896578v20210212b1-Ubuntu SMP Fri Feb 12 03:00:17 UTC 2
Bionic HWE:
5.4.0-66-generic #74~18.04.2+TEST1896578v20210212b1-Ubuntu SMP Fri Feb 12 02:55:4
Focal:
5.4.0-66-generic #74+TEST1896578v20210212b1-Ubuntu SMP Fri Feb 12 06:30:51 UTC 20
Groovy:
5.8.0-44-generic #50+TEST1896578v20210212b1-Ubuntu SMP Fri Feb 12 06:19:49 UTC 20

Make sure the uname matches one of the above strings before you start
testing and formatting Raid10 arrays.

We want to test formatting the arrays with xfs and ext4, plus general usage
over time with regular consistency checks and fstrims. We want to make sure
that mismatch counts stay 0, that all fsck -f runs come back clean, and that
no data corruption happens.
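
A typical cycle for that kind of testing looks something like this (a
sketch; the filesystem and mount point are illustrative):

$ sudo mkfs.ext4 /dev/md0 && sudo mount /dev/md0 /mnt
$ sudo fstrim -v /mnt
$ echo check | sudo tee /sys/block/md0/md/sync_action
$ cat /sys/block/md0/md/mismatch_cnt     # should stay 0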

If you have any problems whatsoever, please let me know.

Thanks,
Matthew

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-02-10 Thread Matthew Ruffell
** Description changed:

  BugLink: https://bugs.launchpad.net/bugs/1896578
  
  [Impact]
  
  Block discard is very slow on Raid10, which causes common use cases
  which invoke block discard, such as mkfs and fstrim operations, to take
  a very long time.
  
  For example, on a i3.8xlarge instance on AWS, which has 4x 1.9TB NVMe
  devices which support block discard, a mkfs.xfs operation on Raid 10
  takes between 8 to 11 minutes, where the same mkfs.xfs operation on Raid
  0, takes 4 seconds.
  
  The bigger the devices, the longer it takes.
  
  The cause is that Raid10 currently uses a 512k chunk size, and uses this
  for the discard_max_bytes value. If we need to discard 1.9TB, the kernel
  splits the request into millions of 512k bio requests, even if the
  underlying device supports larger requests.
  
  For example, the NVMe devices on i3.8xlarge support 2.2TB of discard at
  once:
  
  $ cat /sys/block/nvme0n1/queue/discard_max_bytes
  2199023255040
  $ cat /sys/block/nvme0n1/queue/discard_max_hw_bytes
  2199023255040
  
  Where the Raid10 md device only supports 512k:
  
  $ cat /sys/block/md0/queue/discard_max_bytes
  524288
  $ cat /sys/block/md0/queue/discard_max_hw_bytes
  524288
  
  If we perform a mkfs.xfs operation on the /dev/md array, it takes over
  11 minutes and if we examine the stack, it is stuck in
  blkdev_issue_discard()
  
  $ sudo cat /proc/1626/stack
  [<0>] wait_barrier+0x14c/0x230 [raid10]
  [<0>] regular_request_wait+0x39/0x150 [raid10]
  [<0>] raid10_write_request+0x11e/0x850 [raid10]
  [<0>] raid10_make_request+0xd7/0x150 [raid10]
  [<0>] md_handle_request+0x123/0x1a0
  [<0>] md_submit_bio+0xda/0x120
  [<0>] __submit_bio_noacct+0xde/0x320
  [<0>] submit_bio_noacct+0x4d/0x90
  [<0>] submit_bio+0x4f/0x1b0
  [<0>] __blkdev_issue_discard+0x154/0x290
  [<0>] blkdev_issue_discard+0x5d/0xc0
  [<0>] blk_ioctl_discard+0xc4/0x110
  [<0>] blkdev_common_ioctl+0x56c/0x840
  [<0>] blkdev_ioctl+0xeb/0x270
  [<0>] block_ioctl+0x3d/0x50
  [<0>] __x64_sys_ioctl+0x91/0xc0
  [<0>] do_syscall_64+0x38/0x90
  [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
  
  [Fix]
  
  Xiao Ni has developed a patchset which resolves the block discard
  performance problems. These commits have now landed in 5.10-rc1.
  
  commit 2628089b74d5a64bd0bcb5d247a18f78d7b6f4d0
  Author: Xiao Ni 
  Date: Tue Aug 25 13:42:59 2020 +0800
  Subject: md: add md_submit_discard_bio() for submitting discard bio
  Link: 
https://github.com/torvalds/linux/commit/2628089b74d5a64bd0bcb5d247a18f78d7b6f4d0
  
  commit 8650a889017cb1f6ea6813ccf83a2e9f6fa49dd3
  Author: Xiao Ni 
  Date: Tue Aug 25 13:43:00 2020 +0800
  Subject: md/raid10: extend r10bio devs to raid disks
  Link: 
https://github.com/torvalds/linux/commit/8650a889017cb1f6ea6813ccf83a2e9f6fa49dd3
  
  commit f046f5d0d79cdb968f219ce249e497fd1accf484
  Author: Xiao Ni 
  Date: Tue Aug 25 13:43:01 2020 +0800
  Subject: md/raid10: pull codes that wait for blocked dev into one function
  Link: 
https://github.com/torvalds/linux/commit/f046f5d0d79cdb968f219ce249e497fd1accf484
  
  commit bcc90d280465ebd51ab8688be86e1f00c62dccf9
  Author: Xiao Ni 
  Date: Wed Sep 2 20:00:22 2020 +0800
  Subject: md/raid10: improve raid10 discard request
  Link: 
https://github.com/torvalds/linux/commit/bcc90d280465ebd51ab8688be86e1f00c62dccf9
  
  commit d3ee2d8415a6256c1c41e1be36e80e640c3e6359
  Author: Xiao Ni 
  Date: Wed Sep 2 20:00:23 2020 +0800
  Subject: md/raid10: improve discard request for far layout
  Link: 
https://github.com/torvalds/linux/commit/d3ee2d8415a6256c1c41e1be36e80e640c3e6359
  
  There is also an additional commit which is required, and was merged
  after "md/raid10: improve raid10 discard request" was merged. The
  following commits enable Raid10 to use large discards, instead of
  splitting into many bios, since the technical hurdles have now been
  removed.
  
  commit e0910c8e4f87bb9f767e61a778b0d9271c4dc512
  Author: Mike Snitzer 
  Date: Thu Sep 24 13:14:52 2020 -0400
  Subject: dm raid: fix discard limits for raid1 and raid10
  Link: 
https://github.com/torvalds/linux/commit/e0910c8e4f87bb9f767e61a778b0d9271c4dc512
  
  commit f0e90b6c663a7e3b4736cb318c6c7c589f152c28
  Author: Mike Snitzer 
  Date:   Thu Sep 24 16:40:12 2020 -0400
  Subject: dm raid: remove unnecessary discard limits for raid10
  Link: 
https://github.com/torvalds/linux/commit/f0e90b6c663a7e3b4736cb318c6c7c589f152c28
  
  All the commits mentioned follow a similar strategy which was
  implemented in Raid0 in the below commit, which was merged in 4.12-rc2,
  which fixed block discard performance issues in Raid0:
  
  commit 29efc390b9462582ae95eb9a0b8cd17ab956afc0
  Author: Shaohua Li 
  Date: Sun May 7 17:36:24 2017 -0700
  Subject: md/md0: optimize raid0 discard handling
  Link: 
https://github.com/torvalds/linux/commit/29efc390b9462582ae95eb9a0b8cd17ab956afc0
  
  The commits more or less cherry pick to the 5.8, 5.4 and 4.15 kernels,
  with the following minor fixups:
  
  1) 

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-01-11 Thread Matthew Ruffell
** Changed in: linux (Ubuntu)
   Status: Fix Released => In Progress

** Changed in: linux (Ubuntu Bionic)
   Status: Fix Released => In Progress

** Changed in: linux (Ubuntu Focal)
   Status: Fix Released => In Progress

** Changed in: linux (Ubuntu Groovy)
   Status: Fix Released => In Progress

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2021-01-11 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 5.8.0-36.40+21.04.1

---
linux (5.8.0-36.40+21.04.1) hirsute; urgency=medium

  * Packaging resync (LP: #1786013)
- update dkms package versions

  [ Ubuntu: 5.8.0-36.40 ]

  * debian/scripts/file-downloader does not handle positive failures correctly
(LP: #1878897)
- [Packaging] file-downloader not handling positive failures correctly

  [ Ubuntu: 5.8.0-35.39 ]

  * Packaging resync (LP: #1786013)
- update dkms package versions
  * CVE-2021-1052 // CVE-2021-1053
- [Packaging] NVIDIA -- Add the NVIDIA 460 driver

 -- Kleber Sacilotto de Souza   Thu, 07 Jan 2021 11:57:30 +0100

** Changed in: linux (Ubuntu)
   Status: In Progress => Fix Released

** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2021-1052

** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2021-1053

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2020-12-09 Thread Matthew Ruffell
Hi Markus,

I am deeply sorry for causing the regression. We are aware, and tracking
the issue in bug 1907262.

The kernel team have started an emergency revert and you can expect
fixed kernels to be released in the next day or so.

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2020-12-09 Thread Markus Schade
On focal (5.4.0-56-generic) we are starting to see massive file system
corruption on systems updated to this kernel version.
These systems use LVM with discards and thin provisioning on 6 or 8 NVMe
drives in a RAID10 near configuration. We are currently downgrading all
systems back to 5.4.0-54-generic and hope we can provide a simple
reproducer.
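For reference, a minimal sketch of the kind of stack described above
(device names, device count, pool and volume sizes are illustrative,
not taken from the affected systems):

$ sudo mdadm --create /dev/md0 --level=10 --layout=n2 --raid-devices=6 /dev/nvme[0-5]n1
$ sudo pvcreate /dev/md0
$ sudo vgcreate vg0 /dev/md0
$ sudo lvcreate -l 90%FREE --thinpool tpool vg0
$ sudo lvchange --discards passdown vg0/tpool
$ sudo lvcreate -V 500G --thin -n data vg0/tpool

With --discards passdown, discards issued against the thin volume are
passed down through the pool to the raid10 device, which is where the
affected code path is exercised.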

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2020-12-01 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 4.15.0-126.129

---
linux (4.15.0-126.129) bionic; urgency=medium

  * bionic/linux: 4.15.0-126.129 -proposed tracker (LP: #1905305)

  * CVE-2020-4788
- SAUCE: powerpc/64s: Define MASKABLE_RELON_EXCEPTION_PSERIES_OOL
- SAUCE: powerpc/64s: move some exception handlers out of line
- powerpc/64s: flush L1D on kernel entry
- SAUCE: powerpc: Add a framework for user access tracking
- powerpc: Implement user_access_begin and friends
- powerpc: Fix __clear_user() with KUAP enabled
- powerpc/uaccess: Evaluate macro arguments once, before user access is
  allowed
- powerpc/64s: flush L1D after user accesses

linux (4.15.0-125.128) bionic; urgency=medium

  * bionic/linux: 4.15.0-125.128 -proposed tracker (LP: #1903137)

  * Update kernel packaging to support forward porting kernels (LP: #1902957)
- [Debian] Update for leader included in BACKPORT_SUFFIX

  * Avoid double newline when running insertchanges (LP: #1903293)
- [Packaging] insertchanges: avoid double newline

  * EFI: Fails when BootCurrent entry does not exist (LP: #183)
- efivarfs: Replace invalid slashes with exclamation marks in dentries.

  * CVE-2020-14351
- perf/core: Fix race in the perf_mmap_close() function

  * raid10: Block discard is very slow, causing severe delays for mkfs and
fstrim operations (LP: #1896578)
- md: add md_submit_discard_bio() for submitting discard bio
- md/raid10: extend r10bio devs to raid disks
- md/raid10: pull codes that wait for blocked dev into one function
- md/raid10: improve raid10 discard request
- md/raid10: improve discard request for far layout

  * Bionic: btrfs: kernel BUG at /build/linux-
eTBZpZ/linux-4.15.0/fs/btrfs/ctree.c:3233! (LP: #1902254)
- btrfs: use offset_in_page instead of open-coding it
- btrfs: use BUG() instead of BUG_ON(1)
- btrfs: drop unnecessary offset_in_page in extent buffer helpers
- btrfs: extent_io: do extra check for extent buffer read write functions
- btrfs: extent-tree: kill BUG_ON() in __btrfs_free_extent()
- btrfs: extent-tree: kill the BUG_ON() in insert_inline_extent_backref()
- btrfs: ctree: check key order before merging tree blocks

  * Bionic update: upstream stable patchset 2020-11-04 (LP: #1902943)
- USB: gadget: f_ncm: Fix NDP16 datagram validation
- gpio: tc35894: fix up tc35894 interrupt configuration
- vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock
- vsock/virtio: stop workers during the .remove()
- vsock/virtio: add transport parameter to the
  virtio_transport_reset_no_sock()
- net: virtio_vsock: Enhance connection semantics
- Input: i8042 - add nopnp quirk for Acer Aspire 5 A515
- ftrace: Move RCU is watching check after recursion check
- drm/amdgpu: restore proper ref count in amdgpu_display_crtc_set_config
- drivers/net/wan/hdlc_fr: Add needed_headroom for PVC devices
- drm/sun4i: mixer: Extend regmap max_register
- net: dec: de2104x: Increase receive ring size for Tulip
- rndis_host: increase sleep time in the query-response loop
- nvme-core: get/put ctrl and transport module in nvme_dev_open/release()
- drivers/net/wan/lapbether: Make skb->protocol consistent with the header
- drivers/net/wan/hdlc: Set skb->protocol before transmitting
- mac80211: do not allow bigger VHT MPDUs than the hardware supports
- spi: fsl-espi: Only process interrupts for expected events
- nvme-fc: fail new connections to a deleted host or remote port
- pinctrl: mvebu: Fix i2c sda definition for 98DX3236
- nfs: Fix security label length not being reset
- clk: samsung: exynos4: mark 'chipid' clock as CLK_IGNORE_UNUSED
- iommu/exynos: add missing put_device() call in exynos_iommu_of_xlate()
- i2c: cpm: Fix i2c_ram structure
- Input: trackpoint - enable Synaptics trackpoints
- random32: Restore __latent_entropy attribute on net_rand_state
- epoll: do not insert into poll queues until all sanity checks are done
- epoll: replace ->visited/visited_list with generation count
- epoll: EPOLL_CTL_ADD: close the race in decision to take fast path
- ep_create_wakeup_source(): dentry name can change under you...
- netfilter: ctnetlink: add a range check for l3/l4 protonum
- drm/syncobj: Fix drm_syncobj_handle_to_fd refcount leak
- fbdev, newport_con: Move FONT_EXTRA_WORDS macros into linux/font.h
- Fonts: Support FONT_EXTRA_WORDS macros for built-in fonts
- Revert "ravb: Fixed to be able to unload modules"
- fbcon: Fix global-out-of-bounds read in fbcon_get_font()
- net: wireless: nl80211: fix out-of-bounds access in nl80211_del_key()
- usermodehelper: reset umask to default before executing user process
- platform/x86: thinkpad_acpi: initialize tp_nvram_state variable
- platform/x86: thinkpad_acpi: re-initialize ACPI buffer size when reuse
- driver 

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2020-12-01 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 5.8.0-31.33

---
linux (5.8.0-31.33) groovy; urgency=medium

  * groovy/linux: 5.8.0-31.33 -proposed tracker (LP: #1905299)

  * Groovy 5.8 kernel hangs on boot on CPUs with eLLC (LP: #1903397)
- drm/i915: Mark ininitial fb obj as WT on eLLC machines to avoid rcu lockup
  during fbdev init

  * CVE-2020-4788
- selftests/powerpc: rfi_flush: disable entry flush if present
- powerpc/64s: flush L1D on kernel entry
- powerpc/64s: flush L1D after user accesses
- selftests/powerpc: entry flush test

linux (5.8.0-30.32) groovy; urgency=medium

  * groovy/linux: 5.8.0-30.32 -proposed tracker (LP: #1903194)

  * Update kernel packaging to support forward porting kernels (LP: #1902957)
- [Debian] Update for leader included in BACKPORT_SUFFIX

  * Avoid double newline when running insertchanges (LP: #1903293)
- [Packaging] insertchanges: avoid double newline

  * EFI: Fails when BootCurrent entry does not exist (LP: #183)
- efivarfs: Replace invalid slashes with exclamation marks in dentries.

  * raid10: Block discard is very slow, causing severe delays for mkfs and
fstrim operations (LP: #1896578)
- md: add md_submit_discard_bio() for submitting discard bio
- md/raid10: extend r10bio devs to raid disks
- md/raid10: pull codes that wait for blocked dev into one function
- md/raid10: improve raid10 discard request
- md/raid10: improve discard request for far layout
- dm raid: fix discard limits for raid1 and raid10
- dm raid: remove unnecessary discard limits for raid10

  * Bionic: btrfs: kernel BUG at /build/linux-
eTBZpZ/linux-4.15.0/fs/btrfs/ctree.c:3233! (LP: #1902254)
- btrfs: extent_io: do extra check for extent buffer read write functions
- btrfs: extent-tree: kill BUG_ON() in __btrfs_free_extent()
- btrfs: extent-tree: kill the BUG_ON() in insert_inline_extent_backref()
- btrfs: ctree: check key order before merging tree blocks

  * Tiger Lake PMC core driver fixes (LP: #1899883)
- platform/x86: intel_pmc_core: update TGL's LPM0 reg bit map name
- platform/x86: intel_pmc_core: fix bound check in pmc_core_mphy_pg_show()
- platform/x86: pmc_core: Use descriptive names for LPM registers
- platform/x86: intel_pmc_core: Fix TigerLake power gating status map
- platform/x86: intel_pmc_core: Fix the slp_s0 counter displayed value

  * drm/i915/dp_mst - System would hang during the boot up. (LP: #1902469)
- Revert "UBUNTU: SAUCE: drm/i915/display: Fix null deref in
  intel_psr_atomic_check()"
- drm/i915: Fix encoder lookup during PSR atomic check

  * Undetected Data corruption in MPI workloads that use VSX for reductions on
POWER9 DD2.1 systems (LP: #1902694)
- powerpc: Fix undetected data corruption with P9N DD2.1 VSX CI load 
emulation
- selftests/powerpc: Make alignment handler test P9N DD2.1 vector CI load
  workaround

  * [20.04 FEAT] Support/enhancement of NVMe IPL (LP: #1902179)
- s390/ipl: support NVMe IPL kernel parameters

  * uvcvideo: add mapping for HEVC payloads (LP: #1895803)
- media: uvcvideo: Add mapping for HEVC payloads

  * risc-v 5.8 kernel oops on ftrace tests (LP: #1894613)
- stop_machine, rcu: Mark functions as notrace

  * Groovy update: v5.8.17 upstream stable release (LP: #1902137)
- xgb4: handle 4-tuple PEDIT to NAT mode translation
- ibmveth: Switch order of ibmveth_helper calls.
- ibmveth: Identify ingress large send packets.
- ipv4: Restore flowi4_oif update before call to xfrm_lookup_route
- mlx4: handle non-napi callers to napi_poll
- net: dsa: microchip: fix race condition
- net: fec: Fix phy_device lookup for phy_reset_after_clk_enable()
- net: fec: Fix PHY init after phy_reset_after_clk_enable()
- net: fix pos incrementment in ipv6_route_seq_next
- net: ipa: skip suspend/resume activities if not set up
- net: mptcp: make DACK4/DACK8 usage consistent among all subflows
- net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
- net/smc: fix use-after-free of delayed events
- net/smc: fix valid DMBE buffer sizes
- net/tls: sendfile fails with ktls offload
- net: usb: qmi_wwan: add Cellient MPL200 card
- tipc: fix the skb_unshare() in tipc_buf_append()
- socket: fix option SO_TIMESTAMPING_NEW
- socket: don't clear SOCK_TSTAMP_NEW when SO_TIMESTAMPNS is disabled
- can: m_can_platform: don't call m_can_class_suspend in runtime suspend
- can: j1935: j1939_tp_tx_dat_new(): fix missing initialization of skbcnt
- net: j1939: j1939_session_fresh_new(): fix missing initialization of 
skbcnt
- net/ipv4: always honour route mtu during forwarding
- net_sched: remove a redundant goto chain check
- r8169: fix data corruption issue on RTL8402
- binder: fix UAF when releasing todo list
- ALSA: bebob: potential info leak in hwdep_read()
- ALSA: hda/hdmi: fix incorrect 

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2020-11-30 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 5.4.0-56.62

---
linux (5.4.0-56.62) focal; urgency=medium

  * focal/linux: 5.4.0-56.62 -proposed tracker (LP: #1905300)

  * CVE-2020-4788
- selftests/powerpc: rfi_flush: disable entry flush if present
- powerpc/64s: flush L1D on kernel entry
- powerpc/64s: flush L1D after user accesses
- selftests/powerpc: entry flush test

linux (5.4.0-55.61) focal; urgency=medium

  * focal/linux: 5.4.0-55.61 -proposed tracker (LP: #1903175)

  * Update kernel packaging to support forward porting kernels (LP: #1902957)
- [Debian] Update for leader included in BACKPORT_SUFFIX

  * Avoid double newline when running insertchanges (LP: #1903293)
- [Packaging] insertchanges: avoid double newline

  * EFI: Fails when BootCurrent entry does not exist (LP: #183)
- efivarfs: Replace invalid slashes with exclamation marks in dentries.

  * CVE-2020-14351
- perf/core: Fix race in the perf_mmap_close() function

  * raid10: Block discard is very slow, causing severe delays for mkfs and
fstrim operations (LP: #1896578)
- md: add md_submit_discard_bio() for submitting discard bio
- md/raid10: extend r10bio devs to raid disks
- md/raid10: pull codes that wait for blocked dev into one function
- md/raid10: improve raid10 discard request
- md/raid10: improve discard request for far layout
- dm raid: fix discard limits for raid1 and raid10
- dm raid: remove unnecessary discard limits for raid10

  * Bionic: btrfs: kernel BUG at /build/linux-
eTBZpZ/linux-4.15.0/fs/btrfs/ctree.c:3233! (LP: #1902254)
- btrfs: drop unnecessary offset_in_page in extent buffer helpers
- btrfs: extent_io: do extra check for extent buffer read write functions
- btrfs: extent-tree: kill BUG_ON() in __btrfs_free_extent()
- btrfs: extent-tree: kill the BUG_ON() in insert_inline_extent_backref()
- btrfs: ctree: check key order before merging tree blocks

  * Ethernet no link lights after reboot (Intel i225-v 2.5G) (LP: #1902578)
- igc: Add PHY power management control

  * Undetected Data corruption in MPI workloads that use VSX for reductions on
POWER9 DD2.1 systems (LP: #1902694)
- powerpc: Fix undetected data corruption with P9N DD2.1 VSX CI load 
emulation
- selftests/powerpc: Make alignment handler test P9N DD2.1 vector CI load
  workaround

  * [20.04 FEAT] Support/enhancement of NVMe IPL (LP: #1902179)
- s390: nvme ipl
- s390: nvme reipl
- s390/ipl: support NVMe IPL kernel parameters

  * uvcvideo: add mapping for HEVC payloads (LP: #1895803)
- media: uvcvideo: Add mapping for HEVC payloads

  * Focal update: v5.4.73 upstream stable release (LP: #1902115)
- ibmveth: Switch order of ibmveth_helper calls.
- ibmveth: Identify ingress large send packets.
- ipv4: Restore flowi4_oif update before call to xfrm_lookup_route
- mlx4: handle non-napi callers to napi_poll
- net: fec: Fix phy_device lookup for phy_reset_after_clk_enable()
- net: fec: Fix PHY init after phy_reset_after_clk_enable()
- net: fix pos incrementment in ipv6_route_seq_next
- net/smc: fix valid DMBE buffer sizes
- net/tls: sendfile fails with ktls offload
- net: usb: qmi_wwan: add Cellient MPL200 card
- tipc: fix the skb_unshare() in tipc_buf_append()
- socket: fix option SO_TIMESTAMPING_NEW
- can: m_can_platform: don't call m_can_class_suspend in runtime suspend
- can: j1935: j1939_tp_tx_dat_new(): fix missing initialization of skbcnt
- net: j1939: j1939_session_fresh_new(): fix missing initialization of 
skbcnt
- net/ipv4: always honour route mtu during forwarding
- net_sched: remove a redundant goto chain check
- r8169: fix data corruption issue on RTL8402
- cxgb4: handle 4-tuple PEDIT to NAT mode translation
- binder: fix UAF when releasing todo list
- ALSA: bebob: potential info leak in hwdep_read()
- ALSA: hda/hdmi: fix incorrect locking in hdmi_pcm_close
- nvme-pci: disable the write zeros command for Intel 600P/P3100
- chelsio/chtls: fix socket lock
- chelsio/chtls: correct netdevice for vlan interface
- chelsio/chtls: correct function return and return type
- ibmvnic: save changed mac address to adapter->mac_addr
- net: ftgmac100: Fix Aspeed ast2600 TX hang issue
- net: hdlc: In hdlc_rcv, check to make sure dev is an HDLC device
- net: hdlc_raw_eth: Clear the IFF_TX_SKB_SHARING flag after calling
  ether_setup
- net: Properly typecast int values to set sk_max_pacing_rate
- net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN tunnels
- nexthop: Fix performance regression in nexthop deletion
- nfc: Ensure presence of NFC_ATTR_FIRMWARE_NAME attribute in
  nfc_genl_fw_download()
- r8169: fix operation under forced interrupt threading
- selftests: forwarding: Add missing 'rp_filter' configuration
- tcp: fix to update snd_wl1 in bulk receiver fast 

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2020-11-17 Thread Matthew Ruffell
Performing verification for Bionic.

I enabled -proposed and installed 4.15.0-125-generic on an i3.8xlarge
AWS instance.

From there, I followed the testcase steps:

$ uname -rv
4.15.0-125-generic #128-Ubuntu SMP Mon Nov 9 20:51:00 UTC 2020
$ lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
xvda    202:0    0    8G  0 disk
└─xvda1 202:1    0    8G  0 part /
nvme0n1 259:0    0  1.7T  0 disk
nvme1n1 259:1    0  1.7T  0 disk
nvme2n1 259:2    0  1.7T  0 disk
nvme3n1 259:3    0  1.7T  0 disk
$ sudo mdadm --create --verbose /dev/md0 --level=10 --raid-devices=4 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
mdadm: layout defaults to n2
mdadm: layout defaults to n2
mdadm: chunk size defaults to 512K
mdadm: size set to 1855336448K
mdadm: automatically enabling write-intent bitmap on large array
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
$ time sudo mkfs.xfs /dev/md0
meta-data=/dev/md0               isize=512    agcount=32, agsize=28989568 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
data     =                       bsize=4096   blocks=927666176, imaxpct=5
         =                       sunit=128    swidth=256 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=452968, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

real    0m3.615s
user    0m0.002s
sys     0m0.179s
$ sudo mkdir /mnt/disk
$ sudo mount /dev/md0 /mnt/disk
$ time sudo fstrim /mnt/disk

real    0m1.898s
user    0m0.002s
sys     0m0.015s

We can see that mkfs.xfs took 3.6 seconds, and fstrim only 2 seconds.
This is a significant improvement over the current 11 minutes.

I started up a c5.large instance, and attached 4x EBS drives, which do
not support block discard, and went through the testcase steps.
Everything worked fine, and the changes have not caused any regressions
to disks which do not support block discard.

I also started another i3.8xlarge instance and tested raid0, to check
for regressions around the refactoring. raid0 deployed fine, and was as
performant as usual.

The 4.15.0-125-generic kernel in -proposed fixes the issue, and I am
happy to mark as verified.

** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2020-11-17 Thread Matthew Ruffell
Performing verification for Focal.

I enabled -proposed and installed 5.4.0-55-generic on an i3.8xlarge
AWS instance.

From there, I followed the testcase steps:

$ uname -rv
5.4.0-55-generic #61-Ubuntu SMP Mon Nov 9 20:49:56 UTC 2020
$ lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
xvda    202:0    0    8G  0 disk
└─xvda1 202:1    0    8G  0 part /
nvme0n1 259:0    0  1.7T  0 disk
nvme1n1 259:1    0  1.7T  0 disk
nvme3n1 259:2    0  1.7T  0 disk
nvme2n1 259:3    0  1.7T  0 disk
$ sudo mdadm --create --verbose /dev/md0 --level=10 --raid-devices=4 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
mdadm: layout defaults to n2
mdadm: layout defaults to n2
mdadm: chunk size defaults to 512K
mdadm: size set to 1855336448K
mdadm: automatically enabling write-intent bitmap on large array
mdadm: Fail create md0 when using /sys/module/md_mod/parameters/new_array
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
$ time sudo mkfs.xfs /dev/md0
log stripe unit (524288 bytes) is too large (maximum is 256KiB)
log stripe unit adjusted to 32KiB
meta-data=/dev/md0               isize=512    agcount=32, agsize=28989568 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=927666176, imaxpct=5
         =                       sunit=128    swidth=256 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=452968, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

real    0m5.350s
user    0m0.022s
sys     0m0.179s
$ sudo mkdir /mnt/disk
$ sudo mount /dev/md0 /mnt/disk
$ time sudo fstrim /mnt/disk

real    0m2.944s
user    0m0.006s
sys     0m0.013s

We can see that mkfs.xfs took 5.3 seconds, and fstrim only 3 seconds.
This is a significant improvement over the current 11 minutes.

I started up a c5.large instance, and attached 4x EBS drives, which do
not support block discard, and went through the testcase steps.
Everything worked fine, and the changes have not caused any regressions
to disks which do not support block discard.

I also started another i3.8xlarge instance and tested raid0, to check
for regressions around the refactoring. raid0 deployed fine, and was as
performant as usual.

The 5.4.0-55-generic kernel in -proposed fixes the issue, and I am happy
to mark as verified.

** Tags removed: verification-needed-focal
** Tags added: verification-done-focal

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2020-11-17 Thread Matthew Ruffell
Performing verification for Groovy.

I enabled -proposed and installed 5.8.0-30-generic on an i3.8xlarge
AWS instance.

From there, I followed the testcase steps:

$ uname -rv
5.8.0-30-generic #32-Ubuntu SMP Mon Nov 9 21:03:15 UTC 2020
$ lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
xvda    202:0    0    8G  0 disk
└─xvda1 202:1    0    8G  0 part /
nvme0n1 259:0    0  1.7T  0 disk
nvme1n1 259:1    0  1.7T  0 disk
nvme3n1 259:2    0  1.7T  0 disk
nvme2n1 259:3    0  1.7T  0 disk
$ sudo mdadm --create --verbose /dev/md0 --level=10 --raid-devices=4 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
mdadm: layout defaults to n2
mdadm: layout defaults to n2
mdadm: chunk size defaults to 512K
mdadm: size set to 1855336448K
mdadm: automatically enabling write-intent bitmap on large array
mdadm: Fail create md0 when using /sys/module/md_mod/parameters/new_array
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
$ time sudo mkfs.xfs /dev/md0
log stripe unit (524288 bytes) is too large (maximum is 256KiB)
log stripe unit adjusted to 32KiB
meta-data=/dev/md0               isize=512    agcount=32, agsize=28989568 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=927666176, imaxpct=5
         =                       sunit=128    swidth=256 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=452968, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Discarding blocks...Done.

real    0m4.413s
user    0m0.022s
sys     0m0.245s
$ sudo mkdir /mnt/disk
$ sudo mount /dev/md0 /mnt/disk
$ time sudo fstrim /mnt/disk

real    0m1.973s
user    0m0.000s
sys     0m0.037s

We can see that mkfs.xfs took 4.4 seconds, and fstrim only 2 seconds.
This is a significant improvement over the current 11 minutes.

I started up a c5.large instance, and attached 4x EBS drives, which do
not support block discard, and went through the testcase steps.
Everything worked fine, and the changes have not caused any regressions
to disks which do not support block discard.

I also started another i3.8xlarge instance and tested raid0, to check
for regressions around the refactoring. raid0 deployed fine, and was as
performant as usual.

The 5.8.0-30-generic kernel in -proposed fixes the issue, and I am happy
to mark as verified.

** Tags removed: verification-needed-groovy
** Tags added: verification-done-groovy

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2020-11-17 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-bionic' to 'verification-done-bionic'. If the
problem still exists, change the tag 'verification-needed-bionic' to
'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how
to enable and use -proposed. Thank you!
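As a rough sketch, enabling -proposed and installing the kernel on a
bionic system could look like the following (the pocket name, component
list and kernel version should be adjusted for your release; see the
wiki page above for the authoritative steps):

$ echo "deb http://archive.ubuntu.com/ubuntu bionic-proposed restricted main multiverse universe" | sudo tee /etc/apt/sources.list.d/ubuntu-proposed.list
$ sudo apt update
$ sudo apt install linux-image-4.15.0-125-generic linux-modules-extra-4.15.0-125-generic
$ sudo reboot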


** Tags added: verification-needed-bionic

** Tags added: verification-needed-focal

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2020-11-17 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-groovy' to 'verification-done-groovy'. If the
problem still exists, change the tag 'verification-needed-groovy' to
'verification-failed-groovy'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how
to enable and use -proposed. Thank you!

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2020-11-17 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-focal' to 'verification-done-focal'. If the
problem still exists, change the tag 'verification-needed-focal' to
'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how
to enable and use -proposed. Thank you!


** Tags added: verification-needed-groovy

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2020-11-05 Thread Ian
** Changed in: linux (Ubuntu Bionic)
   Status: In Progress => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1896578

Title:
  raid10: Block discard is very slow, causing severe delays for mkfs and
  fstrim operations

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Focal:
  Fix Committed
Status in linux source package in Groovy:
  Fix Committed

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/1896578

  [Impact]

  Block discard is very slow on Raid10, which causes common use cases
  which invoke block discard, such as mkfs and fstrim operations, to
  take a very long time.

  For example, on a i3.8xlarge instance on AWS, which has 4x 1.9TB NVMe
  devices which support block discard, a mkfs.xfs operation on Raid 10
  takes between 8 to 11 minutes, where the same mkfs.xfs operation on
  Raid 0, takes 4 seconds.

  The bigger the devices, the longer it takes.

  The cause is that Raid10 currently uses a 512k chunk size, and uses
  this for the discard_max_bytes value. If we need to discard 1.9TB, the
  kernel splits the request into millions of 512k bio requests, even if
  the underlying device supports larger requests.

  For example, the NVMe devices on i3.8xlarge support 2.2TB of discard
  at once:

  $ cat /sys/block/nvme0n1/queue/discard_max_bytes
  2199023255040
  $ cat /sys/block/nvme0n1/queue/discard_max_hw_bytes
  2199023255040

  Where the Raid10 md device only supports 512k:

  $ cat /sys/block/md0/queue/discard_max_bytes
  524288
  $ cat /sys/block/md0/queue/discard_max_hw_bytes
  524288

  If we perform a mkfs.xfs operation on the /dev/md array, it takes over
  11 minutes and if we examine the stack, it is stuck in
  blkdev_issue_discard()

  $ sudo cat /proc/1626/stack
  [<0>] wait_barrier+0x14c/0x230 [raid10]
  [<0>] regular_request_wait+0x39/0x150 [raid10]
  [<0>] raid10_write_request+0x11e/0x850 [raid10]
  [<0>] raid10_make_request+0xd7/0x150 [raid10]
  [<0>] md_handle_request+0x123/0x1a0
  [<0>] md_submit_bio+0xda/0x120
  [<0>] __submit_bio_noacct+0xde/0x320
  [<0>] submit_bio_noacct+0x4d/0x90
  [<0>] submit_bio+0x4f/0x1b0
  [<0>] __blkdev_issue_discard+0x154/0x290
  [<0>] blkdev_issue_discard+0x5d/0xc0
  [<0>] blk_ioctl_discard+0xc4/0x110
  [<0>] blkdev_common_ioctl+0x56c/0x840
  [<0>] blkdev_ioctl+0xeb/0x270
  [<0>] block_ioctl+0x3d/0x50
  [<0>] __x64_sys_ioctl+0x91/0xc0
  [<0>] do_syscall_64+0x38/0x90
  [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

  [Fix]

  Xiao Ni has developed a patchset which resolves the block discard
  performance problems. These commits have now landed in 5.10-rc1.

  commit 2628089b74d5a64bd0bcb5d247a18f78d7b6f4d0
  Author: Xiao Ni 
  Date: Tue Aug 25 13:42:59 2020 +0800
  Subject: md: add md_submit_discard_bio() for submitting discard bio
  Link: 
https://github.com/torvalds/linux/commit/2628089b74d5a64bd0bcb5d247a18f78d7b6f4d0

  commit 8650a889017cb1f6ea6813ccf83a2e9f6fa49dd3
  Author: Xiao Ni 
  Date: Tue Aug 25 13:43:00 2020 +0800
  Subject: md/raid10: extend r10bio devs to raid disks
  Link: 
https://github.com/torvalds/linux/commit/8650a889017cb1f6ea6813ccf83a2e9f6fa49dd3

  commit f046f5d0d79cdb968f219ce249e497fd1accf484
  Author: Xiao Ni 
  Date: Tue Aug 25 13:43:01 2020 +0800
  Subject: md/raid10: pull codes that wait for blocked dev into one function
  Link: 
https://github.com/torvalds/linux/commit/f046f5d0d79cdb968f219ce249e497fd1accf484

  commit bcc90d280465ebd51ab8688be86e1f00c62dccf9
  Author: Xiao Ni 
  Date: Wed Sep 2 20:00:22 2020 +0800
  Subject: md/raid10: improve raid10 discard request
  Link: 
https://github.com/torvalds/linux/commit/bcc90d280465ebd51ab8688be86e1f00c62dccf9

  commit d3ee2d8415a6256c1c41e1be36e80e640c3e6359
  Author: Xiao Ni 
  Date: Wed Sep 2 20:00:23 2020 +0800
  Subject: md/raid10: improve discard request for far layout
  Link: 
https://github.com/torvalds/linux/commit/d3ee2d8415a6256c1c41e1be36e80e640c3e6359

  There are also two additional commits which are required; they were
  merged after "md/raid10: improve raid10 discard request". These
  commits enable Raid10 to use large discards, instead of splitting into
  many bios, since the technical hurdles have now been removed.

  commit e0910c8e4f87bb9f767e61a778b0d9271c4dc512
  Author: Mike Snitzer 
  Date: Thu Sep 24 13:14:52 2020 -0400
  Subject: dm raid: fix discard limits for raid1 and raid10
  Link: 
https://github.com/torvalds/linux/commit/e0910c8e4f87bb9f767e61a778b0d9271c4dc512

  commit f0e90b6c663a7e3b4736cb318c6c7c589f152c28
  Author: Mike Snitzer 
  Date:   Thu Sep 24 16:40:12 2020 -0400
  Subject: dm raid: remove unnecessary discard limits for raid10
  Link: 
https://github.com/torvalds/linux/commit/f0e90b6c663a7e3b4736cb318c6c7c589f152c28
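
  As a rough sketch (assuming a mainline git checkout), the presence of
  the patchset in a tree can be checked, and the commits carried onto a
  stable branch, with standard git commands; on older branches the
  cherry-picks may need minor fixups:

  $ git log --oneline v5.10-rc1 -- drivers/md/raid10.c | grep -i discard
  $ git cherry-pick 2628089b74d5 8650a889017c f046f5d0d79c bcc90d280465 d3ee2d8415a6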

  All the commits mentioned follow a strategy similar to the one
  implemented for Raid0 in commit 29efc390b9462582ae95eb9a0b8cd17ab956afc0
  ("md/md0: optimize raid0 discard handling"), which was merged in
  4.12-rc2 and fixed the equivalent block discard performance issues in
  Raid0.

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2020-11-05 Thread Ian
** Changed in: linux (Ubuntu Focal)
   Status: In Progress => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1896578

Title:
  raid10: Block discard is very slow, causing severe delays for mkfs and
  fstrim operations

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  In Progress
Status in linux source package in Focal:
  Fix Committed
Status in linux source package in Groovy:
  Fix Committed


[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2020-11-05 Thread Ian
** Changed in: linux (Ubuntu Groovy)
   Status: In Progress => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1896578

Title:
  raid10: Block discard is very slow, causing severe delays for mkfs and
  fstrim operations

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  In Progress
Status in linux source package in Focal:
  In Progress
Status in linux source package in Groovy:
  Fix Committed


[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2020-10-28 Thread Matthew Ruffell
** Description changed:

  BugLink: https://bugs.launchpad.net/bugs/1896578
  
  [Impact]
  
  Block discard is very slow on Raid10, which causes common use cases
  which invoke block discard, such as mkfs and fstrim operations, to take
  a very long time.
  
  For example, on an i3.8xlarge instance on AWS, which has 4x 1.9TB NVMe
  devices which support block discard, a mkfs.xfs operation on Raid10
  takes between 8 and 11 minutes, whereas the same mkfs.xfs operation on
  Raid0 takes 4 seconds.
  
  The bigger the devices, the longer it takes.
  
  The cause is that Raid10 currently uses a 512k chunk size, and uses this
  for the discard_max_bytes value. If we need to discard 1.9TB, the kernel
  splits the request into millions of 512k bio requests, even if the
  underlying device supports larger requests.
  
  For example, the NVMe devices on i3.8xlarge support 2.2TB of discard at
  once:
  
  $ cat /sys/block/nvme0n1/queue/discard_max_bytes
  2199023255040
  $ cat /sys/block/nvme0n1/queue/discard_max_hw_bytes
  2199023255040
  
  Whereas the Raid10 md device only supports 512k:
  
  $ cat /sys/block/md0/queue/discard_max_bytes
  524288
  $ cat /sys/block/md0/queue/discard_max_hw_bytes
  524288
  
  If we perform a mkfs.xfs operation on the /dev/md array, it takes over
  11 minutes and if we examine the stack, it is stuck in
  blkdev_issue_discard()
  
  $ sudo cat /proc/1626/stack
  [<0>] wait_barrier+0x14c/0x230 [raid10]
  [<0>] regular_request_wait+0x39/0x150 [raid10]
  [<0>] raid10_write_request+0x11e/0x850 [raid10]
  [<0>] raid10_make_request+0xd7/0x150 [raid10]
  [<0>] md_handle_request+0x123/0x1a0
  [<0>] md_submit_bio+0xda/0x120
  [<0>] __submit_bio_noacct+0xde/0x320
  [<0>] submit_bio_noacct+0x4d/0x90
  [<0>] submit_bio+0x4f/0x1b0
  [<0>] __blkdev_issue_discard+0x154/0x290
  [<0>] blkdev_issue_discard+0x5d/0xc0
  [<0>] blk_ioctl_discard+0xc4/0x110
  [<0>] blkdev_common_ioctl+0x56c/0x840
  [<0>] blkdev_ioctl+0xeb/0x270
  [<0>] block_ioctl+0x3d/0x50
  [<0>] __x64_sys_ioctl+0x91/0xc0
  [<0>] do_syscall_64+0x38/0x90
  [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
  
  [Fix]
  
  Xiao Ni has developed a patchset which resolves the block discard
  performance problems. These commits have now landed in 5.10-rc1.
  
  commit 2628089b74d5a64bd0bcb5d247a18f78d7b6f4d0
  Author: Xiao Ni 
  Date: Tue Aug 25 13:42:59 2020 +0800
  Subject: md: add md_submit_discard_bio() for submitting discard bio
  Link: 
https://github.com/torvalds/linux/commit/2628089b74d5a64bd0bcb5d247a18f78d7b6f4d0
  
  commit 8650a889017cb1f6ea6813ccf83a2e9f6fa49dd3
  Author: Xiao Ni 
  Date: Tue Aug 25 13:43:00 2020 +0800
  Subject: md/raid10: extend r10bio devs to raid disks
  Link: 
https://github.com/torvalds/linux/commit/8650a889017cb1f6ea6813ccf83a2e9f6fa49dd3
  
  commit f046f5d0d79cdb968f219ce249e497fd1accf484
  Author: Xiao Ni 
  Date: Tue Aug 25 13:43:01 2020 +0800
  Subject: md/raid10: pull codes that wait for blocked dev into one function
  Link: 
https://github.com/torvalds/linux/commit/f046f5d0d79cdb968f219ce249e497fd1accf484
  
  commit bcc90d280465ebd51ab8688be86e1f00c62dccf9
  Author: Xiao Ni 
  Date: Wed Sep 2 20:00:22 2020 +0800
  Subject: md/raid10: improve raid10 discard request
  Link: 
https://github.com/torvalds/linux/commit/bcc90d280465ebd51ab8688be86e1f00c62dccf9
  
  commit d3ee2d8415a6256c1c41e1be36e80e640c3e6359
  Author: Xiao Ni 
  Date: Wed Sep 2 20:00:23 2020 +0800
  Subject: md/raid10: improve discard request for far layout
  Link: 
https://github.com/torvalds/linux/commit/d3ee2d8415a6256c1c41e1be36e80e640c3e6359
  
  There are also two additional commits which are required; they were
  merged after "md/raid10: improve raid10 discard request". These
  commits enable Raid10 to use large discards, instead of splitting into
  many bios, since the technical hurdles have now been removed.
  
  commit e0910c8e4f87bb9f767e61a778b0d9271c4dc512
  Author: Mike Snitzer 
  Date: Thu Sep 24 13:14:52 2020 -0400
  Subject: dm raid: fix discard limits for raid1 and raid10
  Link: 
https://github.com/torvalds/linux/commit/e0910c8e4f87bb9f767e61a778b0d9271c4dc512
  
  commit f0e90b6c663a7e3b4736cb318c6c7c589f152c28
  Author: Mike Snitzer 
  Date:   Thu Sep 24 16:40:12 2020 -0400
  Subject: dm raid: remove unnecessary discard limits for raid10
  Link: 
https://github.com/torvalds/linux/commit/f0e90b6c663a7e3b4736cb318c6c7c589f152c28
  
  All the commits mentioned follow a strategy similar to the one
  implemented for Raid0 in the commit below, which was merged in
  4.12-rc2 and fixed the equivalent block discard performance issues in
  Raid0:
  
  commit 29efc390b9462582ae95eb9a0b8cd17ab956afc0
  Author: Shaohua Li 
  Date: Sun May 7 17:36:24 2017 -0700
  Subject: md/md0: optimize raid0 discard handling
  Link: 
https://github.com/torvalds/linux/commit/29efc390b9462582ae95eb9a0b8cd17ab956afc0
  
  The commits more or less cherry pick to the 5.8, 5.4 and 4.15 kernels,
  with the following minor fixups:
  
  1) 

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2020-10-27 Thread Matthew Ruffell
** Description changed:


[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2020-10-25 Thread Matthew Ruffell
** Description changed:

  BugLink: https://bugs.launchpad.net/bugs/1896578
  
  [Impact]
  
  Block discard is very slow on Raid10, which causes common use cases
  which invoke block discard, such as mkfs and fstrim operations, to take
  a very long time.
  
  For example, on an i3.8xlarge instance on AWS, which has 4x 1.9TB NVMe
  devices which support block discard, a mkfs.xfs operation on Raid10
  takes between 8 and 11 minutes, whereas the same mkfs.xfs operation on
  Raid0 takes 4 seconds.
  
  The bigger the devices, the longer it takes.
  
  The cause is that Raid10 currently uses a 512k chunk size, and uses this
  for the discard_max_bytes value. If we need to discard 1.9TB, the kernel
  splits the request into millions of 512k bio requests, even if the
  underlying device supports larger requests.
  
  For example, the NVMe devices on i3.8xlarge support 2.2TB of discard at
  once:
  
  $ cat /sys/block/nvme0n1/queue/discard_max_bytes
  2199023255040
  $ cat /sys/block/nvme0n1/queue/discard_max_hw_bytes
  2199023255040
  
  Whereas the Raid10 md device only supports 512k:
  
  $ cat /sys/block/md0/queue/discard_max_bytes
  524288
  $ cat /sys/block/md0/queue/discard_max_hw_bytes
  524288
  
  If we perform a mkfs.xfs operation on the /dev/md array, it takes over
  11 minutes and if we examine the stack, it is stuck in
  blkdev_issue_discard()
  
  $ sudo cat /proc/1626/stack
  [<0>] wait_barrier+0x14c/0x230 [raid10]
  [<0>] regular_request_wait+0x39/0x150 [raid10]
  [<0>] raid10_write_request+0x11e/0x850 [raid10]
  [<0>] raid10_make_request+0xd7/0x150 [raid10]
  [<0>] md_handle_request+0x123/0x1a0
  [<0>] md_submit_bio+0xda/0x120
  [<0>] __submit_bio_noacct+0xde/0x320
  [<0>] submit_bio_noacct+0x4d/0x90
  [<0>] submit_bio+0x4f/0x1b0
  [<0>] __blkdev_issue_discard+0x154/0x290
  [<0>] blkdev_issue_discard+0x5d/0xc0
  [<0>] blk_ioctl_discard+0xc4/0x110
  [<0>] blkdev_common_ioctl+0x56c/0x840
  [<0>] blkdev_ioctl+0xeb/0x270
  [<0>] block_ioctl+0x3d/0x50
  [<0>] __x64_sys_ioctl+0x91/0xc0
  [<0>] do_syscall_64+0x38/0x90
  [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
  
  [Fix]
  
  Xiao Ni has developed a patchset which resolves the block discard
  performance problems. These commits have now landed in 5.10-rc1.
  
  commit 2628089b74d5a64bd0bcb5d247a18f78d7b6f4d0
  Author: Xiao Ni 
  Date: Tue Aug 25 13:42:59 2020 +0800
  Subject: md: add md_submit_discard_bio() for submitting discard bio
  Link: 
https://github.com/torvalds/linux/commit/2628089b74d5a64bd0bcb5d247a18f78d7b6f4d0
  
  commit 8650a889017cb1f6ea6813ccf83a2e9f6fa49dd3
  Author: Xiao Ni 
  Date: Tue Aug 25 13:43:00 2020 +0800
  Subject: md/raid10: extend r10bio devs to raid disks
  Link: 
https://github.com/torvalds/linux/commit/8650a889017cb1f6ea6813ccf83a2e9f6fa49dd3
  
  commit f046f5d0d79cdb968f219ce249e497fd1accf484
  Author: Xiao Ni 
  Date: Tue Aug 25 13:43:01 2020 +0800
  Subject: md/raid10: pull codes that wait for blocked dev into one function
  Link: 
https://github.com/torvalds/linux/commit/f046f5d0d79cdb968f219ce249e497fd1accf484
  
  commit bcc90d280465ebd51ab8688be86e1f00c62dccf9
  Author: Xiao Ni 
  Date: Wed Sep 2 20:00:22 2020 +0800
  Subject: md/raid10: improve raid10 discard request
  Link: 
https://github.com/torvalds/linux/commit/bcc90d280465ebd51ab8688be86e1f00c62dccf9
  
  commit d3ee2d8415a6256c1c41e1be36e80e640c3e6359
  Author: Xiao Ni 
  Date: Wed Sep 2 20:00:23 2020 +0800
  Subject: md/raid10: improve discard request for far layout
  Link: 
https://github.com/torvalds/linux/commit/d3ee2d8415a6256c1c41e1be36e80e640c3e6359
  
  There are also two additional commits which are required; they were
  merged after "md/raid10: improve raid10 discard request". These
  commits enable Raid10 to use large discards, instead of splitting into
  many bios, since the technical hurdles have now been removed.
  
  commit e0910c8e4f87bb9f767e61a778b0d9271c4dc512
  Author: Mike Snitzer 
  Date: Thu Sep 24 13:14:52 2020 -0400
  Subject: dm raid: fix discard limits for raid1 and raid10
  Link: 
https://github.com/torvalds/linux/commit/e0910c8e4f87bb9f767e61a778b0d9271c4dc512
  
  commit f0e90b6c663a7e3b4736cb318c6c7c589f152c28
  Author: Mike Snitzer 
  Date:   Thu Sep 24 16:40:12 2020 -0400
  Subject: dm raid: remove unnecessary discard limits for raid10
  Link: 
https://github.com/torvalds/linux/commit/f0e90b6c663a7e3b4736cb318c6c7c589f152c28
  
  All the commits mentioned follow a strategy similar to the one
  implemented for Raid0 in the commit below, which was merged in
  4.12-rc2 and fixed the equivalent block discard performance issues in
  Raid0:
  
  commit 29efc390b9462582ae95eb9a0b8cd17ab956afc0
  Author: Shaohua Li 
  Date: Sun May 7 17:36:24 2017 -0700
  Subject: md/md0: optimize raid0 discard handling
  Link: 
https://github.com/torvalds/linux/commit/29efc390b9462582ae95eb9a0b8cd17ab956afc0
  
+ The commits more or less cherry pick to the 5.8, 5.4 and 4.15 kernels,
+ with the following minor fixups:
+ 
+ 1) 

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2020-10-24 Thread Matthew Ruffell
** Description changed:

  BugLink: https://bugs.launchpad.net/bugs/1896578
  
  [Impact]
  
  Block discard is very slow on Raid10, which causes common use cases
  which invoke block discard, such as mkfs and fstrim operations, to take
  a very long time.
  
  For example, on an i3.8xlarge instance on AWS, which has 4x 1.9TB NVMe
  devices which support block discard, a mkfs.xfs operation on Raid10
  takes between 8 and 11 minutes, whereas the same mkfs.xfs operation on
  Raid0 takes 4 seconds.
  
  The bigger the devices, the longer it takes.
  
  The cause is that Raid10 currently uses a 512k chunk size, and uses this
  for the discard_max_bytes value. If we need to discard 1.9TB, the kernel
  splits the request into millions of 512k bio requests, even if the
  underlying device supports larger requests.
  
  For example, the NVMe devices on i3.8xlarge support 2.2TB of discard at
  once:
  
  $ cat /sys/block/nvme0n1/queue/discard_max_bytes
  2199023255040
  $ cat /sys/block/nvme0n1/queue/discard_max_hw_bytes
  2199023255040
  
  Whereas the Raid10 md device only supports 512k:
  
  $ cat /sys/block/md0/queue/discard_max_bytes
  524288
  $ cat /sys/block/md0/queue/discard_max_hw_bytes
  524288
  
  If we perform a mkfs.xfs operation on the /dev/md array, it takes over
  11 minutes and if we examine the stack, it is stuck in
  blkdev_issue_discard()
  
  $ sudo cat /proc/1626/stack
  [<0>] wait_barrier+0x14c/0x230 [raid10]
  [<0>] regular_request_wait+0x39/0x150 [raid10]
  [<0>] raid10_write_request+0x11e/0x850 [raid10]
  [<0>] raid10_make_request+0xd7/0x150 [raid10]
  [<0>] md_handle_request+0x123/0x1a0
  [<0>] md_submit_bio+0xda/0x120
  [<0>] __submit_bio_noacct+0xde/0x320
  [<0>] submit_bio_noacct+0x4d/0x90
  [<0>] submit_bio+0x4f/0x1b0
  [<0>] __blkdev_issue_discard+0x154/0x290
  [<0>] blkdev_issue_discard+0x5d/0xc0
  [<0>] blk_ioctl_discard+0xc4/0x110
  [<0>] blkdev_common_ioctl+0x56c/0x840
  [<0>] blkdev_ioctl+0xeb/0x270
  [<0>] block_ioctl+0x3d/0x50
  [<0>] __x64_sys_ioctl+0x91/0xc0
  [<0>] do_syscall_64+0x38/0x90
  [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
  
  [Fix]
  
  Xiao Ni has developed a patchset which resolves the block discard
  performance problems. These commits have now landed in 5.10-rc1.
  
+ commit 2628089b74d5a64bd0bcb5d247a18f78d7b6f4d0
+ Author: Xiao Ni 
+ Date: Tue Aug 25 13:42:59 2020 +0800
+ Subject: md: add md_submit_discard_bio() for submitting discard bio
+ Link: 
https://github.com/torvalds/linux/commit/2628089b74d5a64bd0bcb5d247a18f78d7b6f4d0
+ 
+ commit 8650a889017cb1f6ea6813ccf83a2e9f6fa49dd3
+ Author: Xiao Ni 
+ Date: Tue Aug 25 13:43:00 2020 +0800
+ Subject: md/raid10: extend r10bio devs to raid disks
+ Link: 
https://github.com/torvalds/linux/commit/8650a889017cb1f6ea6813ccf83a2e9f6fa49dd3
+ 
+ commit f046f5d0d79cdb968f219ce249e497fd1accf484
+ Author: Xiao Ni 
+ Date: Tue Aug 25 13:43:01 2020 +0800
+ Subject: md/raid10: pull codes that wait for blocked dev into one function
+ Link: 
https://github.com/torvalds/linux/commit/f046f5d0d79cdb968f219ce249e497fd1accf484
+ 
+ commit bcc90d280465ebd51ab8688be86e1f00c62dccf9
+ Author: Xiao Ni 
+ Date: Wed Sep 2 20:00:22 2020 +0800
+ Subject: md/raid10: improve raid10 discard request
+ Link: 
https://github.com/torvalds/linux/commit/bcc90d280465ebd51ab8688be86e1f00c62dccf9
+ 
  commit d3ee2d8415a6256c1c41e1be36e80e640c3e6359
  Author: Xiao Ni 
- Date:   Wed Sep 2 20:00:23 2020 +0800
+ Date: Wed Sep 2 20:00:23 2020 +0800
  Subject: md/raid10: improve discard request for far layout
  Link: 
https://github.com/torvalds/linux/commit/d3ee2d8415a6256c1c41e1be36e80e640c3e6359
- 
- commit bcc90d280465ebd51ab8688be86e1f00c62dccf9
- Author: Xiao Ni 
- Date:   Wed Sep 2 20:00:22 2020 +0800
- Subject: md/raid10: improve raid10 discard request
- Link: 
https://github.com/torvalds/linux/commit/bcc90d280465ebd51ab8688be86e1f00c62dccf9
- 
- commit f046f5d0d79cdb968f219ce249e497fd1accf484
- Author: Xiao Ni 
- Date:   Tue Aug 25 13:43:01 2020 +0800
- Subject: md/raid10: pull codes that wait for blocked dev into one function
- Link: 
https://github.com/torvalds/linux/commit/f046f5d0d79cdb968f219ce249e497fd1accf484
- 
- commit 8650a889017cb1f6ea6813ccf83a2e9f6fa49dd3
- Author: Xiao Ni 
- Date:   Tue Aug 25 13:43:00 2020 +0800
- Subject: md/raid10: extend r10bio devs to raid disks
- Link: 
https://github.com/torvalds/linux/commit/8650a889017cb1f6ea6813ccf83a2e9f6fa49dd3
- 
- commit 2628089b74d5a64bd0bcb5d247a18f78d7b6f4d0
- Author: Xiao Ni 
- Date:   Tue Aug 25 13:42:59 2020 +0800
- Subject: md: add md_submit_discard_bio() for submitting discard bio
- Link: 
https://github.com/torvalds/linux/commit/2628089b74d5a64bd0bcb5d247a18f78d7b6f4d0
  
  There are also two additional commits which are required; they were
  merged after "md/raid10: improve raid10 discard request". These
  commits enable Raid10 to use large discards, instead of splitting into
  many bios, since the technical hurdles have now been removed.
  
  commit 

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2020-10-24 Thread Matthew Ruffell
** Description changed:

  BugLink: https://bugs.launchpad.net/bugs/1896578
  
  [Impact]
  
  Block discard is very slow on Raid10, which causes common use cases
  which invoke block discard, such as mkfs and fstrim operations, to take
  a very long time.
  
  For example, on an i3.8xlarge instance on AWS, which has 4x 1.9TB NVMe
  devices which support block discard, a mkfs.xfs operation on Raid10
  takes between 8 and 11 minutes, whereas the same mkfs.xfs operation on
  Raid0 takes 4 seconds.
  
  The bigger the devices, the longer it takes.
  
  The cause is that Raid10 currently uses a 512k chunk size, and uses this
  for the discard_max_bytes value. If we need to discard 1.9TB, the kernel
  splits the request into millions of 512k bio requests, even if the
  underlying device supports larger requests.
  
  For example, the NVMe devices on i3.8xlarge support 2.2TB of discard at
  once:
  
  $ cat /sys/block/nvme0n1/queue/discard_max_bytes
  2199023255040
  $ cat /sys/block/nvme0n1/queue/discard_max_hw_bytes
  2199023255040
  
  Whereas the Raid10 md device only supports 512k:
  
  $ cat /sys/block/md0/queue/discard_max_bytes
  524288
  $ cat /sys/block/md0/queue/discard_max_hw_bytes
  524288
  
  If we perform a mkfs.xfs operation on the /dev/md array, it takes over
  11 minutes and if we examine the stack, it is stuck in
  blkdev_issue_discard()
  
  $ sudo cat /proc/1626/stack
  [<0>] wait_barrier+0x14c/0x230 [raid10]
  [<0>] regular_request_wait+0x39/0x150 [raid10]
  [<0>] raid10_write_request+0x11e/0x850 [raid10]
  [<0>] raid10_make_request+0xd7/0x150 [raid10]
  [<0>] md_handle_request+0x123/0x1a0
  [<0>] md_submit_bio+0xda/0x120
  [<0>] __submit_bio_noacct+0xde/0x320
  [<0>] submit_bio_noacct+0x4d/0x90
  [<0>] submit_bio+0x4f/0x1b0
  [<0>] __blkdev_issue_discard+0x154/0x290
  [<0>] blkdev_issue_discard+0x5d/0xc0
  [<0>] blk_ioctl_discard+0xc4/0x110
  [<0>] blkdev_common_ioctl+0x56c/0x840
  [<0>] blkdev_ioctl+0xeb/0x270
  [<0>] block_ioctl+0x3d/0x50
  [<0>] __x64_sys_ioctl+0x91/0xc0
  [<0>] do_syscall_64+0x38/0x90
  [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
  
  [Fix]
  
  Xiao Ni has developed a patchset which resolves the block discard
  performance problems. These commits have now landed in 5.10-rc1.
  
  commit d3ee2d8415a6256c1c41e1be36e80e640c3e6359
  Author: Xiao Ni 
  Date:   Wed Sep 2 20:00:23 2020 +0800
  Subject: md/raid10: improve discard request for far layout
  Link: 
https://github.com/torvalds/linux/commit/d3ee2d8415a6256c1c41e1be36e80e640c3e6359
  
  commit bcc90d280465ebd51ab8688be86e1f00c62dccf9
  Author: Xiao Ni 
  Date:   Wed Sep 2 20:00:22 2020 +0800
  Subject: md/raid10: improve raid10 discard request
  Link: 
https://github.com/torvalds/linux/commit/bcc90d280465ebd51ab8688be86e1f00c62dccf9
  
  commit f046f5d0d79cdb968f219ce249e497fd1accf484
  Author: Xiao Ni 
  Date:   Tue Aug 25 13:43:01 2020 +0800
  Subject: md/raid10: pull codes that wait for blocked dev into one function
  Link: 
https://github.com/torvalds/linux/commit/f046f5d0d79cdb968f219ce249e497fd1accf484
  
  commit 8650a889017cb1f6ea6813ccf83a2e9f6fa49dd3
  Author: Xiao Ni 
  Date:   Tue Aug 25 13:43:00 2020 +0800
  Subject: md/raid10: extend r10bio devs to raid disks
  Link: 
https://github.com/torvalds/linux/commit/8650a889017cb1f6ea6813ccf83a2e9f6fa49dd3
  
  commit 2628089b74d5a64bd0bcb5d247a18f78d7b6f4d0
  Author: Xiao Ni 
  Date:   Tue Aug 25 13:42:59 2020 +0800
  Subject: md: add md_submit_discard_bio() for submitting discard bio
  Link: 
https://github.com/torvalds/linux/commit/2628089b74d5a64bd0bcb5d247a18f78d7b6f4d0
  
  There is also an additional commit which is required, and was merged
  after "md/raid10: improve raid10 discard request" was merged. The
- following commit enables Raid10 to use large discards, instead of
+ following commits enable Raid10 to use large discards, instead of
  splitting into many bios, since the technical hurdles have now been
  removed.
+ 
+ commit e0910c8e4f87bb9f767e61a778b0d9271c4dc512
+ Author: Mike Snitzer 
+ Date: Thu Sep 24 13:14:52 2020 -0400
+ Subject: dm raid: fix discard limits for raid1 and raid10
+ Link: 
https://github.com/torvalds/linux/commit/e0910c8e4f87bb9f767e61a778b0d9271c4dc512
  
  commit f0e90b6c663a7e3b4736cb318c6c7c589f152c28
  Author: Mike Snitzer 
  Date:   Thu Sep 24 16:40:12 2020 -0400
  Subject: dm raid: remove unnecessary discard limits for raid10
  Link: 
https://github.com/torvalds/linux/commit/f0e90b6c663a7e3b4736cb318c6c7c589f152c28
  
  All the commits mentioned follow a strategy similar to the one
  implemented for Raid0 in the commit below, which was merged in
  4.12-rc2 and fixed the equivalent block discard performance issues in
  Raid0:
  
  commit 29efc390b9462582ae95eb9a0b8cd17ab956afc0
  Author: Shaohua Li 
  Date: Sun May 7 17:36:24 2017 -0700
  Subject: md/md0: optimize raid0 discard handling
  Link: 
https://github.com/torvalds/linux/commit/29efc390b9462582ae95eb9a0b8cd17ab956afc0
  
  [Testcase]
  
  You will need a machine with at least 4x NVMe drives which support
  block discard.

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2020-10-21 Thread Matthew Ruffell
** Description changed:

  BugLink: https://bugs.launchpad.net/bugs/1896578
  
  [Impact]
  
  Block discard is very slow on Raid10, which causes common use cases
  which invoke block discard, such as mkfs and fstrim operations, to take
  a very long time.
  
  For example, on an i3.8xlarge instance on AWS, which has 4x 1.9TB NVMe
  devices which support block discard, a mkfs.xfs operation on Raid10
  takes between 8 and 11 minutes, whereas the same mkfs.xfs operation on
  Raid0 takes 4 seconds.
  
  The bigger the devices, the longer it takes.
  
  The cause is that Raid10 currently uses a 512k chunk size, and uses this
  for the discard_max_bytes value. If we need to discard 1.9TB, the kernel
  splits the request into millions of 512k bio requests, even if the
  underlying device supports larger requests.
  
  For example, the NVMe devices on i3.8xlarge support 2.2TB of discard at
  once:
  
  $ cat /sys/block/nvme0n1/queue/discard_max_bytes
  2199023255040
  $ cat /sys/block/nvme0n1/queue/discard_max_hw_bytes
  2199023255040
  
  Whereas the Raid10 md device only supports 512k:
  
  $ cat /sys/block/md0/queue/discard_max_bytes
  524288
  $ cat /sys/block/md0/queue/discard_max_hw_bytes
  524288
  
  If we perform a mkfs.xfs operation on the /dev/md array, it takes over
  11 minutes and if we examine the stack, it is stuck in
  blkdev_issue_discard()
  
  $ sudo cat /proc/1626/stack
  [<0>] wait_barrier+0x14c/0x230 [raid10]
  [<0>] regular_request_wait+0x39/0x150 [raid10]
  [<0>] raid10_write_request+0x11e/0x850 [raid10]
  [<0>] raid10_make_request+0xd7/0x150 [raid10]
  [<0>] md_handle_request+0x123/0x1a0
  [<0>] md_submit_bio+0xda/0x120
  [<0>] __submit_bio_noacct+0xde/0x320
  [<0>] submit_bio_noacct+0x4d/0x90
  [<0>] submit_bio+0x4f/0x1b0
  [<0>] __blkdev_issue_discard+0x154/0x290
  [<0>] blkdev_issue_discard+0x5d/0xc0
  [<0>] blk_ioctl_discard+0xc4/0x110
  [<0>] blkdev_common_ioctl+0x56c/0x840
  [<0>] blkdev_ioctl+0xeb/0x270
  [<0>] block_ioctl+0x3d/0x50
  [<0>] __x64_sys_ioctl+0x91/0xc0
  [<0>] do_syscall_64+0x38/0x90
  [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
  
  [Fix]
  
  Xiao Ni has developed a patchset which resolves the block discard
- performance problems. It is currently in the md-next tree [1], and I am
- expecting the commits to be merged during the 5.10 merge window.
+ performance problems. These commits have now landed in 5.10-rc1.
  
- [1] https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/log/?h
- =md-next
+ commit d3ee2d8415a6256c1c41e1be36e80e640c3e6359
+ Author: Xiao Ni 
+ Date:   Wed Sep 2 20:00:23 2020 +0800
+ Subject: md/raid10: improve discard request for far layout
+ Link: 
https://github.com/torvalds/linux/commit/d3ee2d8415a6256c1c41e1be36e80e640c3e6359
  
- commit 5b2374a6c221f28c74913d208bb5376a7ee3bf70
+ commit bcc90d280465ebd51ab8688be86e1f00c62dccf9
  Author: Xiao Ni 
- Date: Wed Sep 2 20:00:23 2020 +0800
- Subject: md/raid10: improve discard request for far layout
- Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=5b2374a6c221f28c74913d208bb5376a7ee3bf70
+ Date:   Wed Sep 2 20:00:22 2020 +0800
+ Subject: md/raid10: improve raid10 discard request
+ Link: 
https://github.com/torvalds/linux/commit/bcc90d280465ebd51ab8688be86e1f00c62dccf9
  
- commit 8f694215ae4c7abf1e6c985803a1aad0db748d07
+ commit f046f5d0d79cdb968f219ce249e497fd1accf484
  Author: Xiao Ni 
- Date: Wed Sep 2 20:00:22 2020 +0800
- Subject: md/raid10: improve raid10 discard request
- Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=8f694215ae4c7abf1e6c985803a1aad0db748d07
+ Date:   Tue Aug 25 13:43:01 2020 +0800
+ Subject: md/raid10: pull codes that wait for blocked dev into one function
+ Link: 
https://github.com/torvalds/linux/commit/f046f5d0d79cdb968f219ce249e497fd1accf484
  
- commit 6fcfa8732a8cfea7828a9444c855691c481ee557
+ commit 8650a889017cb1f6ea6813ccf83a2e9f6fa49dd3
  Author: Xiao Ni 
- Date: Tue Aug 25 13:43:01 2020 +0800
- Subject: md/raid10: pull codes that wait for blocked dev into one function
- Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=6fcfa8732a8cfea7828a9444c855691c481ee557
+ Date:   Tue Aug 25 13:43:00 2020 +0800
+ Subject: md/raid10: extend r10bio devs to raid disks
+ Link: 
https://github.com/torvalds/linux/commit/8650a889017cb1f6ea6813ccf83a2e9f6fa49dd3
  
- commit 6f4fed152a5e483af2227156ce7b6263aeeb5c84
+ commit 2628089b74d5a64bd0bcb5d247a18f78d7b6f4d0
  Author: Xiao Ni 
- Date: Tue Aug 25 13:43:00 2020 +0800
- Subject: md/raid10: extend r10bio devs to raid disks
- Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=6f4fed152a5e483af2227156ce7b6263aeeb5c84
+ Date:   Tue Aug 25 13:42:59 2020 +0800
+ Subject: md: add md_submit_discard_bio() for submitting discard bio
+ Link: 
https://github.com/torvalds/linux/commit/2628089b74d5a64bd0bcb5d247a18f78d7b6f4d0
  
- commit 7197f1a616caf85508d81c7f5c9f065ffaebf027
- Author: Xiao Ni 
- 

[Kernel-packages] [Bug 1896578] Re: raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

2020-09-22 Thread Matthew Ruffell
** Description changed:

  BugLink: https://bugs.launchpad.net/bugs/1896578
  
  [Impact]
  
  Block discard is very slow on Raid10, which causes common use cases
  which invoke block discard, such as mkfs and fstrim operations, to take
  a very long time.
  
  For example, on an i3.8xlarge instance on AWS, which has 4x 1.9TB NVMe
  devices which support block discard, a mkfs.xfs operation on Raid10
  takes between 8 and 11 minutes, whereas the same mkfs.xfs operation on
  Raid0 takes 4 seconds.
  
  The bigger the devices, the longer it takes.
  
  The cause is that Raid10 currently uses a 512k chunk size, and uses this
  for the discard_max_bytes value. If we need to discard 1.9TB, the kernel
  splits the request into millions of 512k bio requests, even if the
  underlying device supports larger requests.
  
  For example, the NVMe devices on i3.8xlarge support 2.2TB of discard at
  once:
  
  $ cat /sys/block/nvme0n1/queue/discard_max_bytes
  2199023255040
  $ cat /sys/block/nvme0n1/queue/discard_max_hw_bytes
  2199023255040
  
  Whereas the Raid10 md device only supports 512k:
  
  $ cat /sys/block/md0/queue/discard_max_bytes
  524288
  $ cat /sys/block/md0/queue/discard_max_hw_bytes
  524288
  
  If we perform a mkfs.xfs operation on the /dev/md array, it takes over
  11 minutes and if we examine the stack, it is stuck in
  blkdev_issue_discard()
  
  $ sudo cat /proc/1626/stack
  [<0>] wait_barrier+0x14c/0x230 [raid10]
  [<0>] regular_request_wait+0x39/0x150 [raid10]
  [<0>] raid10_write_request+0x11e/0x850 [raid10]
  [<0>] raid10_make_request+0xd7/0x150 [raid10]
  [<0>] md_handle_request+0x123/0x1a0
  [<0>] md_submit_bio+0xda/0x120
  [<0>] __submit_bio_noacct+0xde/0x320
  [<0>] submit_bio_noacct+0x4d/0x90
  [<0>] submit_bio+0x4f/0x1b0
  [<0>] __blkdev_issue_discard+0x154/0x290
  [<0>] blkdev_issue_discard+0x5d/0xc0
  [<0>] blk_ioctl_discard+0xc4/0x110
  [<0>] blkdev_common_ioctl+0x56c/0x840
  [<0>] blkdev_ioctl+0xeb/0x270
  [<0>] block_ioctl+0x3d/0x50
  [<0>] __x64_sys_ioctl+0x91/0xc0
  [<0>] do_syscall_64+0x38/0x90
  [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
  
  [Fix]
  
  Xiao Ni has developed a patchset which resolves the block discard
  performance problems. It is currently in the md-next tree [1], and I am
  expecting the commits to be merged during the 5.10 merge window.
  
  [1] https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/log/?h=md-next
  
  commit 5b2374a6c221f28c74913d208bb5376a7ee3bf70
  Author: Xiao Ni 
  Date: Wed Sep 2 20:00:23 2020 +0800
  Subject: md/raid10: improve discard request for far layout
  Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=5b2374a6c221f28c74913d208bb5376a7ee3bf70
  
  commit 8f694215ae4c7abf1e6c985803a1aad0db748d07
  Author: Xiao Ni 
  Date: Wed Sep 2 20:00:22 2020 +0800
  Subject: md/raid10: improve raid10 discard request
  Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=8f694215ae4c7abf1e6c985803a1aad0db748d07
  
  commit 6fcfa8732a8cfea7828a9444c855691c481ee557
  Author: Xiao Ni 
  Date: Tue Aug 25 13:43:01 2020 +0800
  Subject: md/raid10: pull codes that wait for blocked dev into one function
  Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=6fcfa8732a8cfea7828a9444c855691c481ee557
  
  commit 6f4fed152a5e483af2227156ce7b6263aeeb5c84
  Author: Xiao Ni 
  Date: Tue Aug 25 13:43:00 2020 +0800
  Subject: md/raid10: extend r10bio devs to raid disks
  Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=6f4fed152a5e483af2227156ce7b6263aeeb5c84
  
  commit 7197f1a616caf85508d81c7f5c9f065ffaebf027
  Author: Xiao Ni 
  Date: Tue Aug 25 13:42:59 2020 +0800
  Subject: md: add md_submit_discard_bio() for submitting discard bio
  Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=7197f1a616caf85508d81c7f5c9f065ffaebf027
  
  It follows a strategy similar to the one implemented for Raid0 in the
  commit below, which was merged in 4.12-rc2:
  
  commit 29efc390b9462582ae95eb9a0b8cd17ab956afc0
  Author: Shaohua Li 
  Date: Sun May 7 17:36:24 2017 -0700
  Subject: md/md0: optimize raid0 discard handling
  Link: 
https://github.com/torvalds/linux/commit/29efc390b9462582ae95eb9a0b8cd17ab956afc0
  
  [Testcase]
  
  You will need a machine with at least 4x NVMe drives which support
  block discard. I use an i3.8xlarge instance on AWS, since it has all
  of these things.
  
  $ lsblk
  xvda    202:0    0    8G  0 disk
  └─xvda1 202:1    0    8G  0 part /
  nvme0n1 259:2    0  1.7T  0 disk
  nvme1n1 259:0    0  1.7T  0 disk
  nvme2n1 259:1    0  1.7T  0 disk
  nvme3n1 259:3    0  1.7T  0 disk
  
  Create a Raid10 array:
  
  $ sudo mdadm --create --verbose /dev/md0 --level=10 --raid-devices=4
  /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
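
  Before formatting, it is worth confirming the limit the new array
  advertises (a small sanity check, assuming the array is /dev/md0 as
  above); an unpatched kernel will report 524288 here:

  $ cat /sys/block/md0/queue/discard_max_bytes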
  
  Format the array with XFS:
  
  $ time sudo mkfs.xfs /dev/md0
  real 11m14.734s
  
  $ sudo mkdir /mnt/disk
  $ sudo mount /dev/md0