[Bug 1907262] Re: raid10: discard leads to corrupted file system
Hi Matthew,

sorry for the late reply. Today I triggered another fstrim with the linux-image-5.4.0-75-generic kernel and made a final check on the RAID; so far no trouble has occurred. Thank you for pursuing this topic so persistently and finally providing the patches to the Ubuntu kernel.

Best regards,
Thimo

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1907262

Title:
  raid10: discard leads to corrupted file system

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1907262] Re: raid10: discard leads to corrupted file system
Hi Thimo,

The SRU cycle has completed, and all kernels containing the Raid10 block discard performance patches have now been released to -updates.

Note that the versions differ from the kernels in -proposed, because the kernel team needed to do a last-minute respin to fix two sets of CVEs, one for Broadcom wifi chipsets and the other for bpf. This is also why the kernels were released a day later than usual.

The released kernels are:

Hirsute: 5.11.0-22-generic
Groovy: 5.8.0-59-generic
Focal: 5.4.0-77-generic
Bionic: 4.15.0-147-generic

The HWE equivalents have also been released to -updates.

You may now install these kernels on your systems and enjoy fast block discard for your Raid10 arrays.

All of our testing has concluded that these patches are stable, but if you run into any issues whatsoever as you roll this out to more systems, please let us know, and we will investigate accordingly.

I wish you a trouble-free rollout of these kernels to your systems.

Thanks,
Matthew
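The release list above can be turned into a quick check. The following is a minimal sketch, not an official tool: the function names `min_abi_for_series` and `has_released_discard_patches` are hypothetical, and the ABI thresholds are taken only from the four versions named in the message. It says nothing about HWE or other kernel flavours, and it does not distinguish the kernels from before the December 2020 revert, which carried the earlier, faulty revision of the patches.

```shell
#!/bin/sh
# Sketch only: thresholds come from the release list in the message above.

# Print the ABI number of the first -updates kernel listed for a series.
min_abi_for_series() {
    case "$1" in
        5.11.0) echo 22 ;;   # Hirsute
        5.8.0)  echo 59 ;;   # Groovy
        5.4.0)  echo 77 ;;   # Focal
        4.15.0) echo 147 ;;  # Bionic
        *)      return 1 ;;  # series not mentioned in the message
    esac
}

# Return 0 if the release string (as printed by `uname -r`) is at or above
# the listed release for its series, 1 if it is older, 2 if it is unknown.
has_released_discard_patches() {
    local release="$1"             # e.g. "5.4.0-77-generic"
    local series="${release%%-*}"  # "5.4.0"
    local rest="${release#*-}"     # "77-generic"
    local abi="${rest%%-*}"        # "77"
    local min
    min="$(min_abi_for_series "$series")" || return 2
    case "$abi" in
        ''|*[!0-9]*) return 2 ;;   # ABI field is not a plain number
    esac
    [ "$abi" -ge "$min" ]
}
```

On a running system it could be called as `has_released_discard_patches "$(uname -r)" && echo "at or above the listed release"`.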
[Bug 1907262] Re: raid10: discard leads to corrupted file system
Hi Thimo,

Just checking in. Are you still running 5.4.0-75-generic on your server? Is everything nice and stable? Is your data fully intact, with no signs of corruption at all?

My server has been running for two weeks now. It does an fstrim every 30 minutes, everything appears to be stable, and I don't see any corruption when I fsck my disks.

If things keep looking good, the SRU cycle will complete early next week, and the kernel will be released to -updates around the 21st of June, give or take a few days if any CVEs turn up.

Let me know how things are going.

Thanks,
Matthew
[Bug 1907262] Re: raid10: discard leads to corrupted file system
Hi Thimo,

Thanks for letting me know, and great to hear that things are working as expected. I'll check in with you in one week's time, to double check things are still going okay.

I spent some time today performing verification on all the kernels in -proposed, testing block discard performance [1], and also running through the regression testcase from LP #1907262 [2].

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578
[2] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262

All kernels performed as expected, with block discard on 4x 1.9TB NVMe disks on an i3.8xlarge AWS instance taking 3-4 seconds, and the consistency checks performed returned clean disks, with no filesystem or data corruption.

I have documented my tests in my verification messages:

Hirsute: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578/comments/26
Groovy: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578/comments/27
Focal: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578/comments/28
Bionic: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578/comments/29

I have marked the launchpad bug as verified for all releases.

I'm still running my own testing, with my /home directory being on a Raid10 array on a Google Cloud instance, and it has no issues.

If things keep going well, we should see a release to -updates around the 21st of June, give or take a few days if any CVEs turn up.

Thanks,
Matthew
[Bug 1907262] Re: raid10: discard leads to corrupted file system
Hi Matthew,

Thanks for your effort to add this feature to the Ubuntu kernels. I installed linux-image-5.4.0-75-generic on 2021-06-08. No problems so far, neither during normal work nor during a manual fstrim.

Best regards,
Thimo
[Bug 1907262] Re: raid10: discard leads to corrupted file system
Hi Thimo,

The kernel team have built all of the kernels for this SRU cycle, and have placed them into -proposed for verification.

We now need to do some thorough testing and make sure that Raid10 arrays function with good performance, ensure data integrity and make sure we won't be introducing any regressions when these kernels are released in two weeks' time.

I would really appreciate it if you could help test and verify these kernels function as intended.

Instructions to Install:

1) cat << EOF | sudo tee /etc/apt/sources.list.d/ubuntu-$(lsb_release -cs)-proposed.list
# Enable Ubuntu proposed archive
deb http://archive.ubuntu.com/ubuntu/ $(lsb_release -cs)-proposed main universe
EOF
2) sudo apt update

For 21.04 / Hirsute:

3) sudo apt install linux-image-5.11.0-20-generic linux-modules-5.11.0-20-generic \
linux-modules-extra-5.11.0-20-generic linux-headers-5.11.0-20-generic

For 20.10 / Groovy:

3) sudo apt install linux-image-5.8.0-56-generic linux-modules-5.8.0-56-generic \
linux-modules-extra-5.8.0-56-generic linux-headers-5.8.0-56-generic

For 20.04 / Focal:

3) sudo apt install linux-image-5.4.0-75-generic linux-modules-5.4.0-75-generic \
linux-modules-extra-5.4.0-75-generic linux-headers-5.4.0-75-generic

For 18.04 / Bionic:

For the 5.4 Bionic HWE Kernel:

3) sudo apt install linux-image-5.4.0-75-generic linux-modules-5.4.0-75-generic \
linux-modules-extra-5.4.0-75-generic linux-headers-5.4.0-75-generic

For the 4.15 Bionic GA Kernel:

3) sudo apt install linux-image-4.15.0-145-generic linux-modules-4.15.0-145-generic \
linux-modules-extra-4.15.0-145-generic linux-headers-4.15.0-145-generic

4) sudo reboot
5) uname -rv

You may need to modify your grub configuration to boot the correct kernel. If you need help, read these instructions: https://paste.ubuntu.com/p/XrTzWPPnWJ/

I am running the -proposed kernel on my cloud instance with my /home directory on a Raid10 array made up of 4x NVMe devices, and things are looking okay.

I will be performing my detailed regression testing against these kernels tomorrow, and I will write back with the results then.

Please help test these kernels in -proposed, and let me know how they go.

Thanks,
Matthew
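Every install step above names the same four packages for one kernel version. As a minimal sketch of that pattern, the hypothetical helper `kernel_packages` below prints the package list for a given version so it can be passed to `apt install`. The optional second argument is an assumption added to cover the PPA test builds elsewhere in this thread, which use `linux-image-unsigned` instead of `linux-image`.

```shell
#!/bin/sh
# Sketch only: builds the package names used in the install steps above.

# Print the image, modules, modules-extra and headers package names for a
# generic kernel version such as "5.4.0-75".
kernel_packages() {
    local version="$1"               # e.g. "5.4.0-75"
    local image="${2:-linux-image}"  # image package prefix (assumed default)
    printf '%s-%s-generic linux-modules-%s-generic linux-modules-extra-%s-generic linux-headers-%s-generic\n' \
        "$image" "$version" "$version" "$version" "$version"
}
```

For example, `sudo apt install $(kernel_packages 5.4.0-75)` would expand to the Focal command shown in step 3.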
[Bug 1907262] Re: raid10: discard leads to corrupted file system
Hi Thimo,

As I mentioned in my previous message, I submitted the patches to the Ubuntu kernel mailing list for SRU.

These patches have now gotten 2 acks [1][2] from senior kernel team members, and the patches have now been applied [3] to the 4.15, 5.4, 5.8 and 5.11 kernels.

[1] https://lists.ubuntu.com/archives/kernel-team/2021-May/120475.html
[2] https://lists.ubuntu.com/archives/kernel-team/2021-May/120799.html
[3] https://lists.ubuntu.com/archives/kernel-team/2021-May/120800.html

This is what is going to happen next. Next week, between the 31st of May and 4th of June, the kernel team will build the next kernel update, and place it in -proposed for testing.

As soon as these kernels enter -proposed, we need to install and test Raid10 in these new kernels as much as possible. The testing and verification window is between the 7th and 18th of June.

If all goes well, we can mark the launchpad bug as verified, and we will see a release to -updates around the 21st of June, give or take a few days if any CVEs turn up.

The schedule is on https://kernel.ubuntu.com/ if anything were to change.

I will write back once the next kernel update is in -proposed, likely early to mid next week. I would really, really appreciate it if you could help test the kernels when they arrive in -proposed, as I really don't want to introduce any more regressions.

Thanks,
Matthew
[Bug 1907262] Re: raid10: discard leads to corrupted file system
Hi Thimo,

Thanks for helping test! I really appreciate it. It is great to hear that you haven't had any trouble with the test kernel.

Just a quick update on the state of the Raid10 patchset. I submitted the patches for SRU for the current cycle, and the kernel team wrote back to me asking for more testing to be done before they make a decision to include them in the Ubuntu kernels.

I am currently looking into longer running tests. At the moment, I am using a cloud instance as my personal computer with 4x scratch NVMe disks built as a Raid10 array with the same 5.4 test kernel, and I put my /home directory on the raid array. Everything is okay so far.

I am planning to submit the patches for SRU in the next kernel SRU cycle, so hopefully we can get them reviewed and accepted then.

I hope things are still running nice and stable on your side. I'll let you know how I get on with my /home on a Raid10 array, and when I next submit the patches for SRU.

Thanks,
Matthew
[Bug 1907262] Re: raid10: discard leads to corrupted file system
Hi Matthew,

thank you for your continuous effort. I have tested your 5.4.0-72-generic #80+TEST1896578v20210504b1-Ubuntu kernel until now without trouble. I also ran fstrim manually on a machine which had not done so for some time because its fstrim service was disabled.

Regards,
Thimo
[Bug 1907262] Re: raid10: discard leads to corrupted file system
Hi Thimo,

I have been doing quite a bit of regression testing, and so far everything is looking good. The performance of the block discard is there, and I haven't come across any data corruption.

I have also spent some time running through the testcase you created for this bug, and I have the results of those tests below.

For each of the 5.11, 5.8, 5.4 and 4.15 kernels, the problem does not reproduce, as the values of /sys/block/md0/md/mismatch_cnt are always 0, and mounting each disk individually and performing a full deep fsck shows no data corruption.

Test results for each kernel are below:

5.11.0-16-generic #17+TEST1896578v20210503b1-Ubuntu
https://paste.ubuntu.com/p/Dp3sR9mNdY/

5.8.0-50-generic #56+TEST1896578v20210504b1-Ubuntu
https://paste.ubuntu.com/p/tXmtmd5Jys/

5.4.0-72-generic #80+TEST1896578v20210504b1-Ubuntu
https://paste.ubuntu.com/p/VzX2mXcKbF/

4.15.0-142-generic #146+TEST1896578v20210504b1-Ubuntu
https://paste.ubuntu.com/p/HpMcX3N9fD/

I'm going to look into some longer running test cases as well; so far I have been focusing on short-term (less than six hour) test cases.

Otherwise, I have submitted the patches to the Ubuntu kernel mailing list for SRU. Now, these patches will still be subject to review by senior members of the kernel team, and their approval is required before they get applied to the official Ubuntu kernels. I will let you know if they get approval or not.

In the meantime, please test the test kernels, and if you find any issues at all with the test kernels, please let me know.

Thanks,
Matthew
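The mismatch check described above can be scripted. This is a minimal sketch, assuming the array is `md0` as in the message; the function name `check_mismatch_cnt` is hypothetical, and the exact commands used in the tests are in the linked pastes, not reproduced here. The count is only meaningful after a consistency check has finished; on md arrays one is normally started by writing `check` to `/sys/block/md0/md/sync_action` as root.

```shell
#!/bin/sh
# Sketch only: reads an md mismatch counter and reports whether it is zero.

# Print "clean" and return 0 if the counter file holds 0, print the count and
# return 1 if it holds anything else, and return 2 if it cannot be read.
check_mismatch_cnt() {
    local path="${1:-/sys/block/md0/md/mismatch_cnt}"  # default from the message
    local count
    [ -r "$path" ] || { echo "cannot read $path" >&2; return 2; }
    count="$(cat "$path")"
    if [ "$count" = "0" ]; then
        echo "clean"
        return 0
    fi
    echo "mismatches: $count"
    return 1
}
```

Taking the path as an argument keeps the function usable for arrays other than `md0` and lets it be exercised against an ordinary file.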
[Bug 1907262] Re: raid10: discard leads to corrupted file system
Hi Matthew,

thank you for providing the test-kernel and instructions. I will give it a try.

Regards,
Thimo
[Bug 1907262] Re: raid10: discard leads to corrupted file system
Hi Thimo,

As promised yesterday, the new re-spins of the test kernels have finished building and are now available in the following ppa:

https://launchpad.net/~mruffell/+archive/ubuntu/lp1896578-test

The patches used are the ones I will be submitting for SRU, and are more or less identical to the patches in the previous test kernels I supplied in February.

Please go ahead and do some testing, and let me know if you find any problems.

Please note this package is NOT SUPPORTED by Canonical, and is for TESTING PURPOSES ONLY. ONLY Install in a dedicated test environment.

Instructions to install:

1) sudo add-apt-repository ppa:mruffell/lp1896578-test
2) sudo apt update

For 21.04 / Hirsute:

3) sudo apt install linux-image-unsigned-5.11.0-16-generic linux-modules-5.11.0-16-generic \
linux-modules-extra-5.11.0-16-generic linux-headers-5.11.0-16-generic

For 20.10 / Groovy:

3) sudo apt install linux-image-unsigned-5.8.0-50-generic linux-modules-5.8.0-50-generic \
linux-modules-extra-5.8.0-50-generic linux-headers-5.8.0-50-generic

For 20.04 / Focal:

3) sudo apt install linux-image-unsigned-5.4.0-72-generic linux-modules-5.4.0-72-generic \
linux-modules-extra-5.4.0-72-generic linux-headers-5.4.0-72-generic

For 18.04 / Bionic:

For the 5.4 Bionic HWE kernel:

3) sudo apt install linux-image-unsigned-5.4.0-72-generic linux-modules-5.4.0-72-generic \
linux-modules-extra-5.4.0-72-generic linux-headers-5.4.0-72-generic

For the 4.15 Bionic GA kernel:

3) sudo apt install linux-image-unsigned-4.15.0-142-generic linux-modules-4.15.0-142-generic \
linux-modules-extra-4.15.0-142-generic linux-headers-4.15.0-142-generic

4) sudo reboot
5) uname -rv

Make sure the string "+TEST1896578v20210504b1" is present in the uname -rv output.

You may need to modify your grub configuration to boot the correct kernel. If you need help, read these instructions: https://paste.ubuntu.com/p/XrTzWPPnWJ/

I'm still doing final regression testing, but things are looking okay so far.

The deadline for patch submission to the next SRU cycle is tomorrow. I'm still planning on submitting the patches tomorrow, but if I think we need more time for testing, worst case it will slip to the SRU cycle after, which is 3 weeks away.

I will write back tomorrow with the results of my regression testing and whether I have submitted the patches for SRU.

Thanks,
Matthew
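The check in step 5 can be automated rather than read by eye. A minimal sketch, assuming the tag string quoted in the message; the function name `is_test_kernel` is hypothetical.

```shell
#!/bin/sh
# Sketch only: looks for the test-build tag in a kernel identification string.

# Return 0 if the string (as printed by `uname -rv`) contains the tag.
is_test_kernel() {
    local ident="$1"                           # output of `uname -rv`
    local tag="${2:-+TEST1896578v20210504b1}"  # tag named in the message
    case "$ident" in
        *"$tag"*) return 0 ;;  # quoted, so the tag is matched literally
        *)        return 1 ;;
    esac
}
```

After rebooting, `is_test_kernel "$(uname -rv)" || echo "not running the test kernel"` would flag a machine that booted a different kernel, for example because grub picked another entry.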
[Bug 1907262] Re: raid10: discard leads to corrupted file system
Hi Thimo,

Thanks for writing back, great timing!

So, the new revision of the patches that we have been testing since February has just been merged into mainline. The md/raid10 patches got merged on Friday, and the dm/raid patches got merged on Saturday, and will be tagged into 5.13-rc1.

A few of us have been testing them, and we haven't seen any regressions that cause data loss or disk corruption. Things are looking okay.

If you are interested, you can see a list of the new commits on bug 1896578.

We are still planning to SRU the new revision into the Ubuntu kernels, and I have spent the day backporting the official mainline commits to the Ubuntu 5.11, 5.8, 5.4 and 4.15 kernels.

I'm currently building re-spins of the test kernels, based on more recently released Ubuntu kernels, with these official mainline patches instead of the patches from the development mailing list that I used in my previous set of test kernels.

I'm expecting these kernels to finish building overnight, and I will make sure to write back tomorrow morning with instructions on how to install these test kernels.

It would be great if you could give them a test before they get built into the next Ubuntu kernel update. Even when they are built into the next kernel update, I'll let you know how you can test them when they are in -proposed, before they are officially released to -updates.

I'll write back tomorrow morning with instructions on how to install the fresh test kernels.

Thanks,
Matthew
[Bug 1907262] Re: raid10: discard leads to corrupted file system
Hi Matthew,

are these tests still relevant for you?

BR,
Thimo
[Bug 1907262] Re: raid10: discard leads to corrupted file system
** Changed in: linux (Ubuntu Groovy)
     Assignee: Sinclair Willis (yousure1222) => (unassigned)
[Bug 1907262] Re: raid10: discard leads to corrupted file system
** Changed in: linux (Ubuntu Groovy)
     Assignee: (unassigned) => Sinclair Willis (yousure1222)
[Bug 1907262] Re: raid10: discard leads to corrupted file system
Hi Thimo,

Recently, Xiao Ni, the original author of the Raid10 block discard patchset, has posted a new revision of the patchset to the linux-raid mailing list for feedback.

Xiao has fixed the two bugs that caused the regression. The first was incorrectly calculating the start offset for block discard for the second and extra disks. The second bug was an incorrect stripe size for far layouts.

The new patches are:

https://www.spinics.net/lists/raid/msg67208.html
https://www.spinics.net/lists/raid/msg67212.html
https://www.spinics.net/lists/raid/msg67213.html
https://www.spinics.net/lists/raid/msg67209.html
https://www.spinics.net/lists/raid/msg67210.html
https://www.spinics.net/lists/raid/msg67211.html

Now, at some point in the future I do want to try and SRU these patches to the Ubuntu kernel, but only when they are ready.

I was wondering if you would be interested in helping to test these new patches, since you have a lot of experience with Raid10. If you have some time, and a dedicated spare server, read comment 13 in the below bug which contains instructions to install test kernels I have built.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578/comments/13

This is entirely optional, and don't feel that you are obligated to test. We just want to get more eyes on the patches and some wider testing done, and to give feedback to Xiao, the author, and to Song Liu, the Raid subsystem maintainer, about the performance and safety of these patches.

I have tested the test kernels with the regression reproducer from this bug, and the mismatch count is always 0, and all fsck -f runs come back clean for all disks.

If you have some spare time and a spare server, I would really appreciate help testing these kernels.

Thanks!
Matthew
[Bug 1907262] Re: raid10: discard leads to corrupted file system
This bug was fixed in the package linux - 5.8.0-36.40+21.04.1

---
linux (5.8.0-36.40+21.04.1) hirsute; urgency=medium

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  [ Ubuntu: 5.8.0-36.40 ]

  * debian/scripts/file-downloader does not handle positive failures correctly
    (LP: #1878897)
    - [Packaging] file-downloader not handling positive failures correctly

  [ Ubuntu: 5.8.0-35.39 ]

  * Packaging resync (LP: #1786013)
    - update dkms package versions
  * CVE-2021-1052 // CVE-2021-1053
    - [Packaging] NVIDIA -- Add the NVIDIA 460 driver

 -- Kleber Sacilotto de Souza  Thu, 07 Jan 2021 11:57:30 +0100

** Changed in: linux (Ubuntu)
       Status: Confirmed => Fix Released

** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2021-1052

** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2021-1053
[Bug 1907262] Re: raid10: discard leads to corrupted file system
This bug was fixed in the package linux - 4.15.0-128.131

---
linux (4.15.0-128.131) bionic; urgency=medium

  * bionic/linux: 4.15.0-128.131 -proposed tracker (LP: #1907354)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * raid10: discard leads to corrupted file system (LP: #1907262)
    - Revert "md/raid10: improve discard request for far layout"
    - Revert "md/raid10: improve raid10 discard request"
    - Revert "md/raid10: pull codes that wait for blocked dev into one function"
    - Revert "md/raid10: extend r10bio devs to raid disks"
    - Revert "md: add md_submit_discard_bio() for submitting discard bio"

 -- Khalid Elmously  Wed, 09 Dec 2020 01:27:33 -0500

** Changed in: linux (Ubuntu Bionic)
       Status: Fix Committed => Fix Released
[Bug 1907262] Re: raid10: discard leads to corrupted file system
This bug was fixed in the package linux - 5.4.0-58.64

---
linux (5.4.0-58.64) focal; urgency=medium

  * focal/linux: 5.4.0-58.64 -proposed tracker (LP: #1907390)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * raid10: discard leads to corrupted file system (LP: #1907262)
    - Revert "dm raid: remove unnecessary discard limits for raid10"
    - Revert "dm raid: fix discard limits for raid1 and raid10"
    - Revert "md/raid10: improve discard request for far layout"
    - Revert "md/raid10: improve raid10 discard request"
    - Revert "md/raid10: pull codes that wait for blocked dev into one function"
    - Revert "md/raid10: extend r10bio devs to raid disks"
    - Revert "md: add md_submit_discard_bio() for submitting discard bio"

 -- Khalid Elmously  Wed, 09 Dec 2020 02:10:30 -0500

** Changed in: linux (Ubuntu Focal)
       Status: Fix Committed => Fix Released
[Bug 1907262] Re: raid10: discard leads to corrupted file system
This bug was fixed in the package linux - 5.8.0-33.36

---
linux (5.8.0-33.36) groovy; urgency=medium

  * groovy/linux: 5.8.0-33.36 -proposed tracker (LP: #1907408)

  * raid10: discard leads to corrupted file system (LP: #1907262)
    - Revert "dm raid: remove unnecessary discard limits for raid10"
    - Revert "dm raid: fix discard limits for raid1 and raid10"
    - Revert "md/raid10: improve discard request for far layout"
    - Revert "md/raid10: improve raid10 discard request"
    - Revert "md/raid10: pull codes that wait for blocked dev into one function"
    - Revert "md/raid10: extend r10bio devs to raid disks"
    - Revert "md: add md_submit_discard_bio() for submitting discard bio"

 -- Khalid Elmously  Wed, 09 Dec 2020 03:56:47 -0500

** Changed in: linux (Ubuntu Groovy)
       Status: Fix Committed => Fix Released
[Bug 1907262] Re: raid10: discard leads to corrupted file system
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-groovy' to 'verification-done-groovy'. If the problem still exists, change the tag 'verification-needed-groovy' to 'verification-failed-groovy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

** Tags added: verification-needed-groovy
[Bug 1907262] Re: raid10: discard leads to corrupted file system
Performing verification for Focal.

I spun up an m5d.4xlarge instance on AWS, to utilise the 2x 300GB NVMe drives that support block discard.

I enabled -proposed, and installed the 5.4.0-58-generic kernel.

The following is the repro session running through the full testcase:

https://paste.ubuntu.com/p/Zr4C2pMbrk/

A 2 disk Raid10 array was created, an LVM volume was created on it and formatted ext4. I let the consistency checks finish, then created and deleted a file. I ran another consistency check, then performed an fstrim. After another consistency check, we unmount and perform a fsck on each individual disk.

root@ip-172-31-1-147:/home/ubuntu# fsck.ext4 -n -f /dev/VolGroup/root
e2fsck 1.45.5 (07-Jan-2020)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/VolGroup/root: 11/6553600 files (0.0% non-contiguous), 557848/26214400 blocks

root@ip-172-31-1-147:/home/ubuntu# fsck.ext4 -n -f /dev/VolGroup/root
e2fsck 1.45.5 (07-Jan-2020)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/VolGroup/root: 11/6553600 files (0.0% non-contiguous), 557848/26214400 blocks

Both of them pass; there is no corruption to the filesystem.

5.4.0-58-generic fixes the problem; the revert is effective.

Marking bug as verified for Focal.

** Tags removed: verification-needed-focal
** Tags added: verification-done-focal
[Bug 1907262] Re: raid10: discard leads to corrupted file system
Performing verification for Bionic.

I spun up a m5d.4xlarge instance on AWS, to utilise the 2x 300GB NVMe drives that support block discard. I enabled -proposed, and installed the 4.15.0-128-generic kernel.

The following is the repro session running through the full testcase:

https://paste.ubuntu.com/p/VpwjbRRcy6/

A 2 disk Raid10 array was created, LVM created and formatted ext4. I let the consistency checks finish, and created, then deleted a file. Did another consistency check, then performed a fstrim. After another consistency check, we unmount and perform a fsck on each individual disk.

root@ip-172-31-10-77:~# fsck.ext4 -n -f /dev/VolGroup/root
e2fsck 1.44.1 (24-Mar-2018)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/VolGroup/root: 11/6553600 files (0.0% non-contiguous), 557848/26214400 blocks

root@ip-172-31-10-77:~# fsck.ext4 -n -f /dev/VolGroup/root
e2fsck 1.44.1 (24-Mar-2018)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/VolGroup/root: 11/6553600 files (0.0% non-contiguous), 557848/26214400 blocks

Both of them pass, there is no corruption to the filesystem.

4.15.0-128-generic fixes the problem, the revert is effective. Marking bug as verified for Bionic.

** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic
[Bug 1907262] Re: raid10: discard leads to corrupted file system
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'. If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!
[Bug 1907262] Re: raid10: discard leads to corrupted file system
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'. If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

** Tags added: verification-needed-bionic
** Tags added: verification-needed-focal
[Bug 1907262] Re: raid10: discard leads to corrupted file system
For Trusty and Xenial, fstrim is scheduled via cron[0] to run weekly, every Sunday at 06:47[1].
For Bionic onward, fstrim is scheduled via a systemd timer and also runs weekly[2].

Impacted users may want to take action before the next scheduled run by downgrading the running kernel or disabling the fstrim job.

[Trusty and Xenial]

By default, an /etc/cron.weekly/fstrim job is installed, but this may be supplanted by local modifications. Check if you are running a cron job which might invoke fstrim:

$ sudo grep -r fstrim /etc/cron*

If an fstrim job is found in the results of the above command, edit the appropriate file and comment out the command with a “#” at the beginning of the line to disable the execution of fstrim. For the default Ubuntu configuration, the command in the /etc/cron.weekly/fstrim file starts with “/sbin/fstrim” or “exec fstrim-all” and is the last line of the file.

[Bionic or later]

$ sudo systemctl disable --now fstrim.timer
$ sudo systemctl mask fstrim.service

[0] - /etc/cron.weekly/fstrim
[1] - grep -i weekly /etc/crontab
[2] - systemctl status fstrim.timer | grep "Trigger:"
[Bug 1907262] Re: raid10: discard leads to corrupted file system
For Trusty and Xenial, fstrim is scheduled via cron[0] to run weekly, every Sunday at 06:47[1].
For Bionic onward, fstrim is scheduled via a systemd timer and also runs weekly[2].

Impacted users may want to take action before the next scheduled run by downgrading the running kernel or temporarily disabling the fstrim job.

[Trusty and Xenial]

By default, an /etc/cron.weekly/fstrim job is installed, but this may be supplanted by local modifications. Check if you are running a cron job which might invoke fstrim:

$ sudo grep -r fstrim /etc/cron*

If an fstrim job is found in the results of the above command, edit the appropriate file and comment out the command with a “#” at the beginning of the line to disable the execution of fstrim. For the default Ubuntu configuration, the command in the /etc/cron.weekly/fstrim file starts with “/sbin/fstrim” or “exec fstrim-all” and is the last line of the file.

[Bionic or later]

$ sudo systemctl disable --now fstrim.timer
$ sudo systemctl mask fstrim.service

[0] - /etc/cron.weekly/fstrim
[1] - grep -i weekly /etc/crontab
[2] - systemctl status fstrim.timer | grep "Trigger:"
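After commenting out a cron entry, the plain grep above still matches the commented line, so it may help to list only the entries that are still active. The following is a minimal sketch, not part of the original advice; the function name active_fstrim_jobs is made up for illustration, and it assumes the paths passed in contain no colons.

```shell
#!/bin/sh
# List lines that still invoke fstrim, skipping lines that are commented out.
# Arguments: files or directories to search, e.g. /etc/cron*
# (active_fstrim_jobs is a hypothetical helper name, not an Ubuntu tool.)
active_fstrim_jobs() {
    # -r recurse, -H always print the file name, -n print the line number,
    # giving "file:line:content". The second grep drops every hit whose
    # content starts with optional whitespace followed by "#".
    grep -rHn 'fstrim' "$@" 2>/dev/null |
        grep -v '^[^:]*:[0-9]*:[[:space:]]*#'
}
```

Running `sudo sh -c '. ./helper.sh; active_fstrim_jobs /etc/cron*'` (with the function saved in a file of your choosing) should print nothing and return a non-zero status once every fstrim invocation has been commented out.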
[Bug 1907262] Re: raid10: discard leads to corrupted file system
@voidlily, I would assume you are running a HWE kernel (v4.15) on Xenial. If that's the case, fixing the Bionic kernel will generate a new HWE (4.15) kernel for Xenial.

** Changed in: linux (Ubuntu Xenial) Status: New => Invalid
** Changed in: linux (Ubuntu Trusty) Status: New => Invalid
[Bug 1907262] Re: raid10: discard leads to corrupted file system
** Also affects: linux (Ubuntu Xenial) Importance: Undecided Status: New
** Also affects: linux (Ubuntu Trusty) Importance: Undecided Status: New
[Bug 1907262] Re: raid10: discard leads to corrupted file system
This issue is also affecting xenial, or at least the package was pulled from xenial as well. When I try to click the "add distribution" button in launchpad I'm getting an oops error, so posting a comment about xenial being affected in the meantime.
[Bug 1907262] Re: raid10: discard leads to corrupted file system
This is just the procedure with the least damage I found. Data loss may still happen (and actually happened to some of our systems). Re-adding the second component to the RAID first (after zeroing it) and then running fsck probably leads to the exact same result, but I wanted to keep the second component as a fall-back until I could see the results of fsck.
[Bug 1907262] Re: raid10: discard leads to corrupted file system
Thimo, Thanks for the update; just to clarify, for your "procedure to recover," are you saying that that procedure will always resolve the damage, or that even after that procedure, there may be corruption?
[Bug 1907262] Re: raid10: discard leads to corrupted file system
Hi Matthew and all,

thank you for taking action immediately. I really appreciate your effort. After investigating the issue further I have to add that the mount option discard seems to trigger the issue, too.

@Trent
The general problem here is that RAID10 can balance single read streams to all disks (which is probably the major advantage over RAID1, effectively providing you RAID0 read speed; RAID1 needs parallel reads to achieve this). That said, it is no big surprise that several machines at our site went to readonly mode after *some time* (probably reading some filesystem relevant data from the "bad disk"). Unfortunately the "clean first disk" only happens if you act immediately, otherwise you might have some data corruption. I verified this on one system where the root partition was affected using the debsums tool (just run debsums -xa) after fixing FS errors.

My procedure to recover was:

Assembly of the RAID:
mdadm --assemble /dev/md127 /dev/nvme0n1p2
mdadm --run /dev/md127

Filesystem check on all partitions (note the -f parameter, some FS "think" they are clean):
fsck.ext4 -f /dev/VolGroup/...

Re-add the second component:
mdadm --zero-superblock /dev/nvme1n1p2
mdadm --add /dev/md127 /dev/nvme1n1p2

Best regards
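The order of the steps above matters: the second component is only wiped after the filesystems on the degraded array have been checked. A rough sketch of that order as a script follows. It only prints the commands unless DRY_RUN=0 is set; the function names run and recover_raid10 and the DRY_RUN variable are made up for illustration, the device names have to be supplied per system, and it does not guarantee that data is recovered.

```shell
#!/bin/sh
# Print a command instead of running it unless DRY_RUN=0 is set.
# (run, recover_raid10 and DRY_RUN are hypothetical names for this sketch.)
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: recover_raid10 MD_DEVICE FIRST_COMPONENT SECOND_COMPONENT FS_DEVICE...
recover_raid10() {
    md="$1"; first="$2"; second="$3"
    shift 3
    # 1. Start the array degraded, from the first component only.
    run mdadm --assemble "$md" "$first"
    run mdadm --run "$md"
    # 2. Force a check of every filesystem (-f checks even if marked clean).
    #    e2fsck exit codes below 4 mean "clean" or "errors corrected";
    #    4 and above mean something was left uncorrected, so stop there.
    for fs in "$@"; do
        run fsck.ext4 -f "$fs" || [ $? -lt 4 ] || return 1
    done
    # 3. Only now wipe the second component and let md resync onto it.
    run mdadm --zero-superblock "$second"
    run mdadm --add "$md" "$second"
}
```

Called with the devices from the report (`recover_raid10 /dev/md127 /dev/nvme0n1p2 /dev/nvme1n1p2 /dev/VolGroup/root`, where the logical volume name is just an example), the default dry run lists the five commands in the order they would be executed.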
[Bug 1907262] Re: raid10: discard leads to corrupted file system
** Changed in: linux (Ubuntu Focal) Status: In Progress => Fix Committed
** Changed in: linux (Ubuntu Groovy) Status: In Progress => Fix Committed
[Bug 1907262] Re: raid10: discard leads to corrupted file system
** Attachment added: "blktrace-lp1907262.tar.gz" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262/+attachment/5442212/+files/blktrace-lp1907262.tar.gz
[Bug 1907262] Re: raid10: discard leads to corrupted file system
I can reproduce this on a Google Cloud n1-standard-16 using 2x Local NVMe disks. Partition nvme0n1 and nvme0n2 with only an 8GB partition each, then format directly with ext4 (skip LVM).

In this setup each 'check' takes <1 min, which speeds up testing considerably. Example details follow; the pre-emptible instance cost for this seems to be $0.292/hour / $7/day.

gcloud compute instances create raid10-test --project=juju2-157804 \
    --zone=us-west1-b \
    --machine-type=n1-standard-16 \
    --subnet=default \
    --network-tier=STANDARD \
    --no-restart-on-failure \
    --maintenance-policy=TERMINATE \
    --preemptible \
    --boot-disk-size=32GB \
    --boot-disk-type=pd-ssd \
    --image=ubuntu-1804-bionic-v20201116 --image-project=ubuntu-os-cloud \
    --local-ssd=interface=NVME --local-ssd=interface=NVME

# apt install linux-image-virtual
# apt-get remove linux-image-gcp linux-image-5.4.0-1029-gcp linux-image-unsigned-5.4.0-1029-gcp --purge
# reboot

sgdisk -n 0:0:+8G /dev/nvme0n1
sgdisk -n 0:0:+8G /dev/nvme0n2
mdadm -C -v -l10 -n2 -N "lv-raid" -R /dev/md0 /dev/nvme0n1p2 /dev/nvme1n1p2
mkfs.ext4 /dev/md0
mount /dev/md0 /mnt
dd if=/dev/zero of=/mnt/data.raw bs=4K count=1M; sync; rm /mnt/data.raw
echo check >/sys/block/md0/md/sync_action; watch 'grep . /proc/mdstat /sys/block/md0/md/mismatch_cnt' # no mismatch
fstrim -v /mnt
echo check >/sys/block/md0/md/sync_action; watch 'grep . /proc/mdstat /sys/block/md0/md/mismatch_cnt' # mismatch=256

I ran blktrace /dev/md0 /dev/nvme0n1 /dev/nvme0n2 and will upload the results. I haven't had time to try to understand the results yet.

Some thoughts:
- It was asserted that the first disk 'appears' fine.
- So I wondered whether we can reliably repair by asking mdadm to do a 'repair' or 'resync'.
- It seems that reads are at least sometimes balanced (maybe by PID) to different disks since this post: https://www.spinics.net/lists/raid/msg62762.html - it is unclear if the same selection impacts writes (not that it would help performance).
- So it's unclear whether we can reliably say only a 'passive mirror' is being corrupted; it's possible application reads may or may not be corrupted. More testing/understanding of the code is required.
- This area of RAID10 and RAID1 seems quite under-documented; "man md" doesn't talk much about how or which disk is used to repair the other if there is a mismatch (unlike RAID5, where the parity gives us some assurances as to which data is wrong).
- We should try writes from different PIDs, with known different data, and compare the data on both disks with the known data to see if we can knowingly get the wrong data on both disks or only one. And try that with 4 disks instead of 2.
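For scripted runs of the reproducer, watching /proc/mdstat by hand can be replaced by polling sync_action until the check has finished and then reading mismatch_cnt. This is a small sketch under that assumption; the function name md_mismatches_when_idle and its two parameters are made up for illustration, and it does not start the check itself.

```shell
#!/bin/sh
# Wait until the md array is idle again, then print its mismatch count.
# $1: md sysfs directory (default /sys/block/md0/md, as in the steps above)
# $2: polling interval in seconds (default 5)
# (md_mismatches_when_idle is a hypothetical helper name.)
md_mismatches_when_idle() {
    mddir="${1:-/sys/block/md0/md}"
    interval="${2:-5}"
    # Bail out instead of looping forever if the array does not exist.
    [ -r "$mddir/sync_action" ] || return 1
    while [ "$(cat "$mddir/sync_action")" != "idle" ]; do
        sleep "$interval"
    done
    cat "$mddir/mismatch_cnt"
}
```

A run would then look like `echo check >/sys/block/md0/md/sync_action; md_mismatches_when_idle`, once before and once after the fstrim.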
[Bug 1907262] Re: raid10: discard leads to corrupted file system
** Changed in: linux (Ubuntu Bionic) Status: In Progress => Fix Committed
[Bug 1907262] Re: raid10: discard leads to corrupted file system
Hi Thimo,

Firstly, thank you for your bug report, we really, really appreciate it.

You are correct, the recent raid10 patches appear to cause filesystem corruption on raid10 arrays.

I have spent the day reproducing, and I can confirm that the 4.15.0-126-generic, 5.4.0-56-generic and 5.8.0-31-generic kernels are affected.

The kernel team are aware of the situation, and we have begun an emergency revert of the patches, and we should have new kernels available in the next few hours / day or so.

The current mainline kernel is affected, so I have written to the raid subsystem maintainer, and the original author of the raid10 block discard patches, to aid with debugging and fixing the problem.

You can follow the upstream thread here:

https://www.spinics.net/lists/kernel/msg3765302.html

As for the data corruption on your servers, I am deeply sorry for causing this regression. When I was testing the raid10 block discard patches on the Ubuntu stable kernels, I did not think to fsck each of the disks in the array; instead, I was content with the speed of creating new arrays, writing a basic dataset to the disks, and rebooting the server to ensure the array came up again with those same files.

Since the first disk seems to be okay, there is at least a small window of opportunity for you to restore any data that you have not backed up.

I will keep you informed of getting the patches reverted, and getting the root cause fixed upstream. If you have any questions, feel free to ask, and if you have any more details from your own debugging, feel free to share in this bug, or on the upstream mailing list discussion.

Thanks,
Matthew
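To see quickly whether a machine runs one of the three kernels confirmed as affected in this comment, the running version can be compared against that list. This is only a sketch: the function name is_confirmed_affected is made up, and the list covers just the generic kernels named above, so a non-match does not prove that another flavour (HWE, cloud) is safe.

```shell
#!/bin/sh
# Succeeds if the given kernel release (default: the running kernel) is one
# of the versions confirmed as affected in this comment.
# (is_confirmed_affected is a hypothetical helper name.)
is_confirmed_affected() {
    case "${1:-$(uname -r)}" in
        4.15.0-126-generic|5.4.0-56-generic|5.8.0-31-generic) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `is_confirmed_affected && echo "disable fstrim or change kernel"` prints the warning only on one of the listed kernels.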
[Bug 1907262] Re: raid10: discard leads to corrupted file system
** Also affects: linux (Ubuntu Focal) Importance: Undecided Status: New
** Also affects: linux (Ubuntu Groovy) Importance: Undecided Status: New
** Also affects: linux (Ubuntu Bionic) Importance: Undecided Status: New
** Changed in: linux (Ubuntu Bionic) Status: New => In Progress
** Changed in: linux (Ubuntu Focal) Status: New => In Progress
** Changed in: linux (Ubuntu Groovy) Status: New => In Progress
** Changed in: linux (Ubuntu Bionic) Importance: Undecided => High
** Changed in: linux (Ubuntu Focal) Importance: Undecided => High
** Changed in: linux (Ubuntu Groovy) Importance: Undecided => High
[Bug 1907262] Re: raid10: discard leads to corrupted file system
** Tags added: sts
[Bug 1907262] Re: raid10: discard leads to corrupted file system
Hi Thimo, Thank you for the very detailed bug report. I will start investigating this immediately. Thanks, Matthew
[Bug 1907262] Re: raid10: discard leads to corrupted file system
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: linux (Ubuntu) Status: New => Confirmed