Hi everyone,

The original patch author, Xiao Ni, has sent a V2 patchset to the
linux-raid mailing list for feedback. This new patchset fixes the
problems the previous version had: it properly calculates the discard
offset for the second and subsequent disks, and correctly calculates the
stripe size in far layouts.

The patches are:

https://www.spinics.net/lists/raid/msg67208.html
https://www.spinics.net/lists/raid/msg67212.html
https://www.spinics.net/lists/raid/msg67213.html
https://www.spinics.net/lists/raid/msg67209.html
https://www.spinics.net/lists/raid/msg67210.html
https://www.spinics.net/lists/raid/msg67211.html

We now need to thoroughly test and provide feedback to Xiao and the Raid
subsystem maintainer before these patches can get merged into mainline
again. We really need to make sure that these patches don't cause any
data corruption.

I have backported the patchset to the 4.15, 5.4 and 5.8 kernels.

Backports for 5.4 and 5.8 kernels:

https://paste.ubuntu.com/p/vPFFPMjhbv/
https://paste.ubuntu.com/p/MCGH8v7Rqk/
https://paste.ubuntu.com/p/rppy39Qgkz/
https://paste.ubuntu.com/p/Dsqy4PQNzJ/
https://paste.ubuntu.com/p/mZ9VDBD8d5/
https://paste.ubuntu.com/p/vJNYZyGTWH/
https://paste.ubuntu.com/p/M4sMwhgWTj/

Backports for the 4.15 kernel:

https://paste.ubuntu.com/p/X9rRHT59qf/
https://paste.ubuntu.com/p/VWwW9JbBHy/
https://paste.ubuntu.com/p/pFY3YbBW6t/
https://paste.ubuntu.com/p/JKg4KcHwPB/
https://paste.ubuntu.com/p/C4sf2r9jS4/

I have built test kernels for bionic, bionic HWE, focal and groovy.

Performance testing confirms that the testcase of formatting a Raid10
array on NVMe disks, on an AWS i3.8xlarge instance, drops from 8.5
minutes to about 6 seconds, due to the speedup in block discard.

https://paste.ubuntu.com/p/NNGqP3xdsc/

I have also run through the data corruption regression reproducer from
bug 1907262, and throughout the process, the
/sys/block/md0/md/mismatch_cnt was always 0, and all deep fsck checks
came back clean for individual disks.

https://paste.ubuntu.com/p/5DK57TzdFH/

I am happy with these results, and it's time to get some wider testing
on these patches.

If you are interested in helping to test, please use dedicated test
servers, and not production systems. These patches have caused data
corruption before, so only place data on the Raid10 array that you have
copies of elsewhere, and assume that total data loss could happen
anytime.

Please note, these test kernels are NOT SUPPORTED by Canonical, and are
for TEST PURPOSES ONLY. ONLY install in a dedicated test environment.

Instructions to Install (on a Bionic or Focal or Groovy system):
1) sudo add-apt-repository ppa:mruffell/lp1896578-test
2) sudo apt update

For Bionic:
3) sudo apt install linux-image-unsigned-4.15.0-136-generic \
   linux-modules-4.15.0-136-generic linux-modules-extra-4.15.0-136-generic \
   linux-headers-4.15.0-136-generic

For Bionic HWE 5.4 or Focal:
3) sudo apt install linux-image-unsigned-5.4.0-66-generic \
   linux-modules-5.4.0-66-generic linux-modules-extra-5.4.0-66-generic \
   linux-headers-5.4.0-66-generic

For Groovy:
3) sudo apt install linux-image-unsigned-5.8.0-44-generic \
   linux-modules-5.8.0-44-generic linux-modules-extra-5.8.0-44-generic \
   linux-headers-5.8.0-44-generic

4) sudo reboot
5) uname -rv

Bionic:
4.15.0-136-generic #140+TEST1896578v20210212b1-Ubuntu SMP Fri Feb 12 03:00:17 UTC 2
Bionic HWE:
5.4.0-66-generic #74~18.04.2+TEST1896578v20210212b1-Ubuntu SMP Fri Feb 12 02:55:4
Focal:
5.4.0-66-generic #74+TEST1896578v20210212b1-Ubuntu SMP Fri Feb 12 06:30:51 UTC 20
Groovy:
5.8.0-44-generic #50+TEST1896578v20210212b1-Ubuntu SMP Fri Feb 12 06:19:49 UTC 20

Make sure the uname matches one of the above strings before you start
testing and formatting Raid10 arrays.
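
If you would rather script the uname check than compare by eye, here
is a minimal sketch. The function name is_test_kernel is hypothetical
and not part of any package. It only looks for the
+TEST1896578v20210212b1-Ubuntu tag that appears in all four strings
above, so it does not verify the rest of the version string.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the given "uname -rv" string
# carries the test build tag shown in the expected strings above.
is_test_kernel() {
    case "$1" in
        *+TEST1896578v20210212b1-Ubuntu*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report on the currently running kernel.
if is_test_kernel "$(uname -rv)"; then
    echo "Test kernel is running."
else
    echo "NOT the test kernel, do not start testing."
fi
```

Only go on to create and format Raid10 arrays if this reports that the
test kernel is running.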

We want to test formatting the arrays with xfs and ext4, as well as
general usage over time with regular consistency checks and fstrims. We
want to make sure that mismatch counts stay at 0, all fsck -f runs are
clean, and no data corruption happens.
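
A consistency check and mismatch count readout could be scripted
roughly as below. This is a sketch under assumptions: the array is
assumed to be md0 (override with MD_DIR), the helper names
mismatch_is_zero and run_md_check are hypothetical, and run_md_check
needs root because it writes to the md sync_action file.

```shell
#!/bin/sh
# Assumption: the array under test is md0. Override MD_DIR otherwise.
MD_DIR="${MD_DIR:-/sys/block/md0/md}"

# Succeeds only if the given file contains exactly "0".
mismatch_is_zero() {
    [ "$(cat "$1" 2>/dev/null)" = "0" ]
}

# Start a consistency check and wait until the array is idle again.
# Must be run as root. Not called automatically by this script.
run_md_check() {
    echo check > "$MD_DIR/sync_action" || return 1
    while [ "$(cat "$MD_DIR/sync_action")" != "idle" ]; do
        sleep 10
    done
}

# Report the current mismatch count, if the array exists.
if [ -r "$MD_DIR/mismatch_cnt" ]; then
    if mismatch_is_zero "$MD_DIR/mismatch_cnt"; then
        echo "mismatch_cnt is 0"
    else
        echo "mismatch_cnt is NOT 0: $(cat "$MD_DIR/mismatch_cnt")"
    fi
fi
```

Calling run_md_check as root and then re-reading mismatch_cnt after
each round of fstrim and general usage gives a simple pass or fail
signal to report back.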

If you have any problems whatsoever, please let me know.

Thanks,
Matthew

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1896578

Title:
  raid10: Block discard is very slow, causing severe delays for mkfs and
  fstrim operations

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578/+subscriptions
