[Bug 1907262] Re: raid10: discard leads to corrupted file system

2021-06-23 Thread Thimo E
Hi Matthew,

sorry for the late reply.
Today I triggered another fstrim with the linux-image-5.4.0-75-generic kernel
and made a final check on the RAID. So far I have not seen any trouble.
Thank you for pursuing this topic so persistently and for finally getting the
patches into the Ubuntu kernel.

Best regards,
 Thimo

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1907262

Title:
  raid10: discard leads to corrupted file system

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2021-06-22 Thread Matthew Ruffell
Hi Thimo,

The SRU cycle has completed, and all kernels containing the Raid10 block
discard performance patches have now been released to -updates.

Note that the versions are different than the kernels in -proposed, due
to the kernel team needing to do a last minute respin to fix two sets of
CVEs, one for broadcom wifi chipsets and the other for bpf, hence the
kernels being released a day later than usual.

The released kernels are:

Hirsute: 5.11.0-22-generic
Groovy:  5.8.0-59-generic
Focal:   5.4.0-77-generic
Bionic:  4.15.0-147-generic

The HWE equivalents have also been released to -updates.

You may now install these kernels to your systems and enjoy fast block
discard for your Raid10 arrays.
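
A minimal sketch for checking whether a machine already runs one of the released kernels. The helper name kernel_at_least is hypothetical, and the comparison relies on GNU `sort -V`, which is an assumption about the environment. It is only meaningful when both versions belong to the same kernel series.

```shell
#!/bin/sh
# kernel_at_least CURRENT MINIMUM
# Succeeds when CURRENT sorts at or above MINIMUM in version order.
# Hypothetical helper; compare only within one series (e.g. 5.4.0-xx).
kernel_at_least() {
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
  [ "$lowest" = "$2" ]
}

# Example using the Focal version from the list above.
if kernel_at_least "$(uname -r)" "5.4.0-77-generic"; then
  echo "running kernel is at or above 5.4.0-77-generic"
else
  echo "running kernel is older than 5.4.0-77-generic"
fi
```

Substitute the version for your release (and your series) in the example call.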

All of our testing has concluded that these patches are stable, but if
you run into any issues whatsoever as you roll this out to more systems,
please let us know, and we will investigate accordingly.

I wish you a trouble free rollout of these kernels to your systems.

Thanks,
Matthew

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2021-06-17 Thread Matthew Ruffell
Hi Thimo,

Just checking in. Are you still running 5.4.0-75-generic on your server?

Is everything nice and stable? Is your data fully intact, and no signs
of corruption at all?

My server has been running for two weeks now, and it does a fstrim every
30 minutes, and everything appears to be stable, and I don't have any
corruption when I fsck my disks.

If things keep looking good, the SRU cycle will complete early next
week, and the kernel will be released to -updates around the 21st of
June, give or take a few days if any CVEs turn up.

Let me know how things are going.

Thanks,
Matthew

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2021-06-10 Thread Matthew Ruffell
Hi Thimo,

Thanks for letting me know, and great to hear that things are working as
expected. I'll check in with you in one week's time, to double check things are
still going okay.

I spent some time today performing verification on all the kernels in -proposed,
testing block discard performance [1], and also running through the regression
testcase from LP #1907262 [2].

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578
[2] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262

All kernels performed as expected, with block discard on 4x 1.9TB NVMe disks
on an i3.8xlarge AWS instance taking 3-4 seconds, and the consistency checks
performed returned clean disks, with no filesystem or data corruption.
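
A rough sketch of how a discard run could be timed. This is not the script used for the verification above; the helper name time_cmd and the mount point in the usage comment are assumptions for illustration.

```shell
#!/bin/sh
# time_cmd COMMAND [ARGS...]
# Runs the command, sends its output to stderr, and prints the elapsed
# whole seconds on stdout. Hypothetical helper, one-second resolution.
time_cmd() {
  t_start=$(date +%s)
  "$@" >&2 || return 1
  t_end=$(date +%s)
  echo $((t_end - t_start))
}

# Hypothetical usage on a test machine (the mount point is an assumption):
#   time_cmd sudo fstrim -v /mnt/raid10
```

A result of a few seconds would be in line with the numbers quoted above; a run lasting many minutes would suggest the discard is still being split into small requests.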

I have documented my tests in my verification messages:

Hirsute: 
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578/comments/26

Groovy:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578/comments/27

Focal:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578/comments/28

Bionic:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578/comments/29

I have marked the launchpad bug as verified for all releases.

I'm still running my own testing, with my /home directory being on a Raid10
array on a Google Cloud instance, and it has no issues.

If things keep going well, we should see a release to -updates around the 21st
of June, give or take a few days if any CVEs turn up.

Thanks,
Matthew

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2021-06-10 Thread Thimo E
Hi Matthew,

Thanks for your effort to add this feature to the Ubuntu kernels.

I installed linux-image-5.4.0-75-generic on 2021-06-08.
I have had no problems so far, neither during normal work nor during a
manual fstrim.


Best regards,
 Thimo

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2021-06-07 Thread Matthew Ruffell
Hi Thimo,

The kernel team have built all of the kernels for this SRU cycle, and
have placed them into -proposed for verification.

We now need to do some thorough testing and make sure that Raid10 arrays
function with good performance, ensure data integrity and make sure we
won't be introducing any regressions when these kernels are released in
two weeks' time.

I would really appreciate it if you could help test and verify these
kernels function as intended.

Instructions to Install:

1) cat << EOF | sudo tee /etc/apt/sources.list.d/ubuntu-$(lsb_release -cs)-proposed.list
# Enable Ubuntu proposed archive
deb http://archive.ubuntu.com/ubuntu/ $(lsb_release -cs)-proposed main universe
EOF
2) sudo apt update

For 21.04 / Hirsute:

3) sudo apt install linux-image-5.11.0-20-generic linux-modules-5.11.0-20-generic \
linux-modules-extra-5.11.0-20-generic linux-headers-5.11.0-20-generic

For 20.10 / Groovy:

3) sudo apt install linux-image-5.8.0-56-generic linux-modules-5.8.0-56-generic \
linux-modules-extra-5.8.0-56-generic linux-headers-5.8.0-56-generic

For 20.04 / Focal:

3) sudo apt install linux-image-5.4.0-75-generic linux-modules-5.4.0-75-generic \
linux-modules-extra-5.4.0-75-generic linux-headers-5.4.0-75-generic

For 18.04 / Bionic:
 For the 5.4 Bionic HWE Kernel:

 3) sudo apt install linux-image-5.4.0-75-generic linux-modules-5.4.0-75-generic \
linux-modules-extra-5.4.0-75-generic linux-headers-5.4.0-75-generic

 For the 4.15 Bionic GA Kernel:

 3) sudo apt install linux-image-4.15.0-145-generic linux-modules-4.15.0-145-generic \
linux-modules-extra-4.15.0-145-generic linux-headers-4.15.0-145-generic


4) sudo reboot
5) uname -rv

You may need to modify your grub configuration to boot the correct
kernel. If you need help, read these instructions:
https://paste.ubuntu.com/p/XrTzWPPnWJ/

I am running the -proposed kernel on my cloud instance with my /home directory 
on a Raid10 array made up of 4x NVMe devices, and things are looking okay.
I will be performing my detailed regression testing against these kernels 
tomorrow, and I will write back with the results then.

Please help test these kernels in -proposed, and let me know how they
go.

Thanks,
Matthew

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2021-05-27 Thread Matthew Ruffell
Hi Thimo,

As I mentioned in my previous message, I submitted the patches to the
Ubuntu kernel mailing list for SRU.

These patches have now gotten 2 acks [1][2] from senior kernel team
members, and the patches have now been applied [3] to the 4.15, 5.4, 5.8
and 5.11 kernels.

[1] https://lists.ubuntu.com/archives/kernel-team/2021-May/120475.html
[2] https://lists.ubuntu.com/archives/kernel-team/2021-May/120799.html
[3] https://lists.ubuntu.com/archives/kernel-team/2021-May/120800.html

This is what is going to happen next. Next week, between the 31st of May
and 4th of June, the kernel team will build the next kernel update, and
place it in -proposed for testing.

As soon as these kernels enter -proposed, we need to install and test
Raid10 in these new kernels as much as possible. The testing and
verification window is between the 7th and 18th of June.

If all goes well, we can mark the launchpad bug as verified, and we will
see a release to -updates around the 21st of June, give or take a few
days if any CVEs turn up.

The schedule is on https://kernel.ubuntu.com/ if anything were to
change.

I will write back once the next kernel update is in -proposed, likely
early to mid next week. I would really, really appreciate it if you
could help test the kernels when they arrive in -proposed, as I really
don't want to introduce any more regressions.

Thanks,
Matthew

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2021-05-19 Thread Matthew Ruffell
Hi Thimo,

Thanks for helping test! I really appreciate it. It is great to hear
that you haven't had any trouble with the test kernel.

Just a quick update on the state of the Raid10 patchset. I submitted
them for SRU for the current cycle, and the kernel team wrote back to me
asking for more testing to be done before they make a decision to
include them in the Ubuntu kernels.

I am currently looking into longer running tests.

At the moment, I am using a cloud instance as my personal computer with
4x scratch NVMe disks built as a Raid10 array with the same 5.4 test
kernel, and I put my /home directory on the raid array. Everything is
okay so far.

I am planning to submit the patches for SRU to the next kernel SRU cycle,
so hopefully we can get them reviewed and accepted then.

I hope things are still running nice and stable on your side. I'll let
you know how I get on with my /home on a Raid10 array, and when I next
submit the patches for SRU.

Thanks,
Matthew

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2021-05-10 Thread Thimo E
Hi Matthew,

Thank you for your continued effort. I have been testing your 5.4.0-72-generic
#80+TEST1896578v20210504b1-Ubuntu kernel and have had no trouble so far.
I also ran fstrim manually on a machine that had not run it for some time
because the fstrim service was disabled.

Regards,
 Thimo

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2021-05-06 Thread Matthew Ruffell
Hi Thimo,

I have been doing quite a bit of regression testing, and so far everything is
looking good. The performance of the block discard is there, and I haven't
come across any data corruption.

I have also spent some time running through the testcase you created for this
bug, and I have the results of those tests below.

For each of the 5.11, 5.8, 5.4 and 4.15 kernels, the problem does not reproduce,
as the values of /sys/block/md0/md/mismatch_cnt are always 0, and mounting each
disk individually and performing a full deep fsck shows no data corruption.
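
As a minimal sketch of the mismatch_cnt check described above (not the exact commands from the test runs): the helper name mismatch_is_zero is hypothetical, and md0 is assumed as the array name, as in the path quoted above.

```shell
#!/bin/sh
# mismatch_is_zero [FILE]
# Succeeds when FILE contains exactly 0. Defaults to md0's mismatch_cnt
# in sysfs. Returns 2 when the file cannot be read.
# Hypothetical helper for illustration.
mismatch_is_zero() {
  cnt_file="${1:-/sys/block/md0/md/mismatch_cnt}"
  [ -r "$cnt_file" ] || return 2
  cnt=$(cat "$cnt_file")
  [ "$cnt" = "0" ]
}

# Hypothetical usage after a consistency check has finished:
#   mismatch_is_zero && echo "md0 is consistent"
```

The counter is only meaningful once a consistency check has run to completion, so read it after the check finishes.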

Test results for each kernel are below:

5.11.0-16-generic #17+TEST1896578v20210503b1-Ubuntu
https://paste.ubuntu.com/p/Dp3sR9mNdY/

5.8.0-50-generic #56+TEST1896578v20210504b1-Ubuntu
https://paste.ubuntu.com/p/tXmtmd5Jys/

5.4.0-72-generic #80+TEST1896578v20210504b1-Ubuntu
https://paste.ubuntu.com/p/VzX2mXcKbF/

4.15.0-142-generic #146+TEST1896578v20210504b1-Ubuntu
https://paste.ubuntu.com/p/HpMcX3N9fD/

I'm going to look into some longer running test cases as well; so far I have
been focusing on short term (less than six hour) test cases.

Otherwise, I have submitted the patches to the Ubuntu kernel mailing list for
SRU. Now, these patches will still be subject to review by senior members of the
kernel team, and their approval is required before they get applied to the
official Ubuntu kernels. I will let you know if they get approval or not.

In the meantime, please test the test kernels, and if you find any issues at
all with the test kernels, please let me know.

Thanks,
Matthew

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2021-05-05 Thread Thimo E
Hi Matthew,

thank you for providing the test-kernel and instructions. I will give it
a try.

Regards,
 Thimo

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2021-05-05 Thread Matthew Ruffell
Hi Thimo,

As promised yesterday, the new re-spins of the test kernels have
finished building and are now available in the following ppa:

https://launchpad.net/~mruffell/+archive/ubuntu/lp1896578-test

The patches used are the ones I will be submitting for SRU, and are more
or less identical to the patches in the previous test kernels I supplied
in February.

Please go ahead and do some testing, and let me know if you find any
problems.

Please note this package is NOT SUPPORTED by Canonical, and is for
TESTING PURPOSES ONLY. ONLY Install in a dedicated test environment.

Instructions to install:
1) sudo add-apt-repository ppa:mruffell/lp1896578-test
2) sudo apt update

For 21.04 / Hirsute:

3) sudo apt install linux-image-unsigned-5.11.0-16-generic linux-modules-5.11.0-16-generic \
linux-modules-extra-5.11.0-16-generic linux-headers-5.11.0-16-generic

For 20.10 / Groovy:

3) sudo apt install linux-image-unsigned-5.8.0-50-generic linux-modules-5.8.0-50-generic \
linux-modules-extra-5.8.0-50-generic linux-headers-5.8.0-50-generic

For 20.04 / Focal:

3) sudo apt install linux-image-unsigned-5.4.0-72-generic linux-modules-5.4.0-72-generic \
linux-modules-extra-5.4.0-72-generic linux-headers-5.4.0-72-generic

For 18.04 / Bionic:
For the 5.4 Bionic HWE kernel:

3) sudo apt install linux-image-unsigned-5.4.0-72-generic linux-modules-5.4.0-72-generic \
linux-modules-extra-5.4.0-72-generic linux-headers-5.4.0-72-generic

For the 4.15 Bionic GA kernel:

3) sudo apt install linux-image-unsigned-4.15.0-142-generic linux-modules-4.15.0-142-generic \
linux-modules-extra-4.15.0-142-generic linux-headers-4.15.0-142-generic

4) sudo reboot
5) uname -rv
Make sure the string "+TEST1896578v20210504b1" is present in the uname -rv.
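
The check in step 5 can be sketched as a small script. The helper name has_test_tag is hypothetical; the tag string is the one given above.

```shell
#!/bin/sh
# has_test_tag UNAME_STRING TAG
# Succeeds when TAG appears literally anywhere in UNAME_STRING.
# Hypothetical helper for illustration.
has_test_tag() {
  case "$1" in
    *"$2"*) return 0 ;;
    *) return 1 ;;
  esac
}

TAG="+TEST1896578v20210504b1"
if has_test_tag "$(uname -rv)" "$TAG"; then
  echo "test kernel is running"
else
  echo "test kernel is NOT running; check which kernel grub booted"
fi
```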

You may need to modify your grub configuration to boot the correct
kernel. If you need help, read these instructions:
https://paste.ubuntu.com/p/XrTzWPPnWJ/

I'm still doing final regression testing, but things are looking okay so
far. The deadline for patch submission to the next SRU cycle is
tomorrow. I'm still planning on submitting the patches for tomorrow, but
if I think we need more time for testing, worst case it will slip to the
SRU cycle after, which is 3 weeks away.

I will write back tomorrow with the results of my regression testing and
if I have submitted the patches for SRU.

Thanks,
Matthew

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2021-05-04 Thread Matthew Ruffell
Hi Thimo,

Thanks for writing back, great timing!

So, the new revision of the patches that we have been testing since
February has just been merged into mainline. The md/raid10 patches got
merged on Friday, and the dm/raid patches got merged on Saturday, and
will be tagged into 5.13-rc1. There have been a few of us testing them, and
we haven't seen any regressions that cause data loss or disk corruption.
Things are looking okay.

If you are interested, you can see a list of the new commits on bug
1896578.

We are still planning to SRU the new revision into the Ubuntu kernels,
and I have spent the day backporting the official mainline commits to
the Ubuntu 5.11, 5.8, 5.4 and 4.15 kernels.

I'm currently building re-spins of the test kernels, based on more
recently released Ubuntu kernels, with these official mainline patches,
instead of the patches I got from the development mailing list I used in
my previous set of test kernels.

I'm expecting these kernels to finish building overnight, and I will
make sure to write back tomorrow morning with instructions on how to
install these test kernels.

It would be great if you could give them a test before they get built
into the next Ubuntu kernel update. Even when they are built into the
next kernel update, I'll let you know how you can test them when they
are in -proposed, before they are officially released to -updates.

I'll write back tomorrow morning with instructions on how to install the
fresh test kernels.

Thanks,
Matthew

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2021-05-03 Thread Thimo E
Hi Matthew,

are these tests still relevant for you?

BR,
 Thimo

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2021-04-29 Thread Matthew Ruffell
** Changed in: linux (Ubuntu Groovy)
 Assignee: Sinclair Willis (yousure1222) => (unassigned)

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2021-04-29 Thread Sinclair Willis
** Changed in: linux (Ubuntu Groovy)
 Assignee: (unassigned) => Sinclair Willis (yousure1222)

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2021-02-14 Thread Matthew Ruffell
Hi Thimo,

Recently, Xiao Ni, the original author of the Raid10 block discard
patchset, has posted a new revision of the patchset to the linux-raid
mailing list for feedback.

Xiao has fixed the two bugs that caused the regression. The first was
incorrectly calculating the start offset for block discard for the
second and extra disks. The second bug was an incorrect stripe size for
far layouts.

The new patches are:

https://www.spinics.net/lists/raid/msg67208.html
https://www.spinics.net/lists/raid/msg67212.html
https://www.spinics.net/lists/raid/msg67213.html
https://www.spinics.net/lists/raid/msg67209.html
https://www.spinics.net/lists/raid/msg67210.html
https://www.spinics.net/lists/raid/msg67211.html

Now, at some point in the future I do want to try and SRU these patches
to the Ubuntu kernel, but only when they are ready.

I was wondering if you would be interested in helping to test these new
patches, since you have a lot of experience with Raid10.

If you have some time, and a dedicated spare server, read comment 13 in
the below bug which contains instructions to install test kernels I have
built.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578/comments/13

This is entirely optional, and don't feel that you are obligated to
test. We just want to get more eyes on the patches and some wider
testing done, and to give feedback back to Xiao, the author, and to Song
Liu, the Raid subsystem maintainer about the performance and safety of
these patches.

I have tested the test kernels with the regression reproducer from this
bug, and the mismatch count is always 0, and all fsck -f comes back
clean for all disks.

If you have some spare time and a spare server, I would really
appreciate help testing these kernels.

Thanks!
Matthew

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2021-01-11 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 5.8.0-36.40+21.04.1

---
linux (5.8.0-36.40+21.04.1) hirsute; urgency=medium

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  [ Ubuntu: 5.8.0-36.40 ]

  * debian/scripts/file-downloader does not handle positive failures correctly
    (LP: #1878897)
    - [Packaging] file-downloader not handling positive failures correctly

  [ Ubuntu: 5.8.0-35.39 ]

  * Packaging resync (LP: #1786013)
    - update dkms package versions
  * CVE-2021-1052 // CVE-2021-1053
    - [Packaging] NVIDIA -- Add the NVIDIA 460 driver

 -- Kleber Sacilotto de Souza  Thu, 07 Jan 2021 11:57:30 +0100

** Changed in: linux (Ubuntu)
   Status: Confirmed => Fix Released

** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2021-1052

** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2021-1053

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2020-12-11 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 4.15.0-128.131

---
linux (4.15.0-128.131) bionic; urgency=medium

  * bionic/linux: 4.15.0-128.131 -proposed tracker (LP: #1907354)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * raid10: discard leads to corrupted file system (LP: #1907262)
    - Revert "md/raid10: improve discard request for far layout"
    - Revert "md/raid10: improve raid10 discard request"
    - Revert "md/raid10: pull codes that wait for blocked dev into one function"
    - Revert "md/raid10: extend r10bio devs to raid disks"
    - Revert "md: add md_submit_discard_bio() for submitting discard bio"

 -- Khalid Elmously  Wed, 09 Dec 2020 01:27:33 -0500

** Changed in: linux (Ubuntu Bionic)
   Status: Fix Committed => Fix Released

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2020-12-11 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 5.4.0-58.64

---
linux (5.4.0-58.64) focal; urgency=medium

  * focal/linux: 5.4.0-58.64 -proposed tracker (LP: #1907390)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * raid10: discard leads to corrupted file system (LP: #1907262)
    - Revert "dm raid: remove unnecessary discard limits for raid10"
    - Revert "dm raid: fix discard limits for raid1 and raid10"
    - Revert "md/raid10: improve discard request for far layout"
    - Revert "md/raid10: improve raid10 discard request"
    - Revert "md/raid10: pull codes that wait for blocked dev into one function"
    - Revert "md/raid10: extend r10bio devs to raid disks"
    - Revert "md: add md_submit_discard_bio() for submitting discard bio"

 -- Khalid Elmously  Wed, 09 Dec 2020 02:10:30 -0500

** Changed in: linux (Ubuntu Focal)
   Status: Fix Committed => Fix Released

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2020-12-11 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 5.8.0-33.36

---
linux (5.8.0-33.36) groovy; urgency=medium

  * groovy/linux: 5.8.0-33.36 -proposed tracker (LP: #1907408)

  * raid10: discard leads to corrupted file system (LP: #1907262)
    - Revert "dm raid: remove unnecessary discard limits for raid10"
    - Revert "dm raid: fix discard limits for raid1 and raid10"
    - Revert "md/raid10: improve discard request for far layout"
    - Revert "md/raid10: improve raid10 discard request"
    - Revert "md/raid10: pull codes that wait for blocked dev into one function"
    - Revert "md/raid10: extend r10bio devs to raid disks"
    - Revert "md: add md_submit_discard_bio() for submitting discard bio"

 -- Khalid Elmously  Wed, 09 Dec 2020 03:56:47 -0500

** Changed in: linux (Ubuntu Groovy)
   Status: Fix Committed => Fix Released

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2020-12-11 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-groovy' to 'verification-done-groovy'. If the
problem still exists, change the tag 'verification-needed-groovy' to
'verification-failed-groovy'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how
to enable and use -proposed. Thank you!


** Tags added: verification-needed-groovy

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2020-12-10 Thread Matthew Ruffell
Performing verification for Focal.

I spun up a m5d.4xlarge instance on AWS, to utilise the 2x 300GB NVMe
drives that support block discard.

I enabled -proposed, and installed the 5.4.0-58-generic kernel.

The following is the repro session running through the full testcase:

https://paste.ubuntu.com/p/Zr4C2pMbrk/

I created a 2-disk RAID10 array, set up LVM on it and formatted the
volume as ext4. I let the initial consistency check finish, then created
and deleted a file. I ran another consistency check, then performed an
fstrim. After a final consistency check, I unmounted the filesystem and
ran fsck on each individual disk.

root@ip-172-31-1-147:/home/ubuntu# fsck.ext4 -n -f /dev/VolGroup/root
e2fsck 1.45.5 (07-Jan-2020)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/VolGroup/root: 11/6553600 files (0.0% non-contiguous), 557848/26214400 blocks

root@ip-172-31-1-147:/home/ubuntu# fsck.ext4 -n -f /dev/VolGroup/root
e2fsck 1.45.5 (07-Jan-2020)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/VolGroup/root: 11/6553600 files (0.0% non-contiguous), 557848/26214400 blocks

Both checks pass; there is no corruption of the filesystem.

5.4.0-58-generic fixes the problem; the revert is effective.
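
For anyone scripting the same check: with -n, e2fsck opens the filesystem read-only and repairs nothing, so the result is best read from its exit status, which the e2fsck man page documents as a sum of bit flags. The helper below is a minimal sketch of decoding that status; the function name describe_fsck_status is hypothetical and not part of e2fsprogs.

```shell
#!/bin/sh
# Hypothetical helper: translate an e2fsck exit status into text.
# The bit values are the ones documented in the e2fsck(8) man page.
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    if [ $(( status & 1 )) -ne 0 ]; then echo "errors corrected"; fi
    if [ $(( status & 2 )) -ne 0 ]; then echo "errors corrected, reboot advised"; fi
    if [ $(( status & 4 )) -ne 0 ]; then echo "errors left uncorrected"; fi
    if [ $(( status & 8 )) -ne 0 ]; then echo "operational error"; fi
    if [ $(( status & 16 )) -ne 0 ]; then echo "usage or syntax error"; fi
    if [ $(( status & 32 )) -ne 0 ]; then echo "cancelled by user"; fi
    if [ $(( status & 128 )) -ne 0 ]; then echo "shared library error"; fi
    return 0
}
```

A possible use, assuming the same volume name as above: run `fsck.ext4 -n -f /dev/VolGroup/root`, then `describe_fsck_status $?`. A read-only check of a damaged filesystem would normally report "errors left uncorrected".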

Marking bug as verified for Focal.

** Tags removed: verification-needed-focal
** Tags added: verification-done-focal

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2020-12-10 Thread Matthew Ruffell
Performing verification for Bionic.

I spun up a m5d.4xlarge instance on AWS, to utilise the 2x 300GB NVMe
drives that support block discard.

I enabled -proposed, and installed the 4.15.0-128-generic kernel.

The following is the repro session running through the full testcase:

https://paste.ubuntu.com/p/VpwjbRRcy6/

I created a 2-disk RAID10 array, set up LVM on it and formatted the
volume as ext4. I let the initial consistency check finish, then created
and deleted a file. I ran another consistency check, then performed an
fstrim. After a final consistency check, I unmounted the filesystem and
ran fsck on each individual disk.

root@ip-172-31-10-77:~# fsck.ext4 -n -f /dev/VolGroup/root
e2fsck 1.44.1 (24-Mar-2018)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/VolGroup/root: 11/6553600 files (0.0% non-contiguous), 557848/26214400 blocks

root@ip-172-31-10-77:~# fsck.ext4 -n -f /dev/VolGroup/root
e2fsck 1.44.1 (24-Mar-2018)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/VolGroup/root: 11/6553600 files (0.0% non-contiguous), 557848/26214400 blocks

Both checks pass; there is no corruption of the filesystem.

4.15.0-128-generic fixes the problem; the revert is effective.

Marking bug as verified for Bionic.

** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2020-12-10 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-focal' to 'verification-done-focal'. If the problem
still exists, change the tag 'verification-needed-focal' to
'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2020-12-10 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-bionic' to 'verification-done-bionic'. If the
problem still exists, change the tag 'verification-needed-bionic' to
'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: verification-needed-bionic

** Tags added: verification-needed-focal

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2020-12-09 Thread Eric Desrochers
For Trusty and Xenial, fstrim is scheduled via cron[0] to run weekly, every
Sunday at 06:47[1].
For Bionic onward, fstrim is scheduled via a systemd timer and also runs weekly[2].

Impacted users may want to take action before the next scheduled run by
downgrading the running kernel or disabling the fstrim job.

[Trusty and Xenial]
By default, an /etc/cron.weekly/fstrim job is installed, but this may be
supplanted by local modifications.

Check if you are running a cron job which might invoke fstrim:
$ sudo grep -r fstrim /etc/cron*

If an fstrim job is found in the results of the above command, edit the
appropriate file and comment out the command with a “#” at the beginning
of the line to disable the execution of fstrim.

For the default Ubuntu configuration, the command in the
/etc/cron.weekly/fstrim file starts with “/sbin/fstrim” or
“exec fstrim-all” and is the last line of the file.


[Bionic or later]
$ sudo systemctl disable --now fstrim.timer
$ sudo systemctl mask fstrim.service
 

[0] - /etc/cron.weekly/fstrim
[1] - grep -i weekly /etc/crontab
[2] - systemctl status fstrim.timer | grep "Trigger:"
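
The cron check described above can be sketched as a small script that lists only those files in which fstrim is still active, that is, mentioned on a line that is not commented out. The function name list_active_fstrim_jobs is hypothetical, and the directories to scan are passed in by the caller rather than assumed.

```shell
#!/bin/sh
# Hypothetical helper: print every file under the given directories that
# mentions fstrim on a line that is not commented out with "#".
list_active_fstrim_jobs() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        find "$dir" -type f | sort | while IFS= read -r file; do
            # awk exits 0 only if an uncommented line contains "fstrim"
            if awk '!/^[ \t]*#/ && /fstrim/ { found = 1; exit }
                    END { exit !found }' "$file"; then
                printf '%s\n' "$file"
            fi
        done
    done
}
```

For example, `list_active_fstrim_jobs /etc/cron.weekly /etc/cron.d` prints nothing once every fstrim line has been commented out. On Bionic and later, `systemctl is-enabled fstrim.timer` answers the corresponding question for the systemd timer.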

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2020-12-09 Thread Eric Desrochers
For Trusty and Xenial, fstrim is scheduled via cron[0] to run weekly, every
Sunday at 06:47[1].
For Bionic onward, fstrim is scheduled via a systemd timer and also runs weekly[2].

Impacted users may want to take action before the next scheduled run by
downgrading the running kernel or temporarily disabling the fstrim job.

[Trusty and Xenial]
By default, an /etc/cron.weekly/fstrim job is installed, but this may be 
supplanted by local modifications.

Check if you are running a cron job which might invoke fstrim:
$ sudo grep -r fstrim /etc/cron*

If an fstrim job is found in the results of the above command, edit the
appropriate file and comment out the command with a “#” at the beginning
of the line to disable the execution of fstrim.

For the default Ubuntu configuration, the command in the
/etc/cron.weekly/fstrim file starts with “/sbin/fstrim” or
“exec fstrim-all” and is the last line of the file.

[Bionic or later]
$ sudo systemctl disable --now fstrim.timer
$ sudo systemctl mask fstrim.service

[0] - /etc/cron.weekly/fstrim
[1] - grep -i weekly /etc/crontab
[2] - systemctl status fstrim.timer | grep "Trigger:"

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2020-12-09 Thread Eric Desrochers
@voidlily,

I would assume you are running an HWE kernel (v4.15) on Xenial.

If that's the case, fixing the Bionic kernel will also produce a new HWE
(4.15) kernel for Xenial.


** Changed in: linux (Ubuntu Xenial)
   Status: New => Invalid

** Changed in: linux (Ubuntu Trusty)
   Status: New => Invalid

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2020-12-09 Thread Eric Desrochers
** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Trusty)
   Importance: Undecided
   Status: New

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2020-12-09 Thread voidlily
This issue is also affecting xenial, or at least the package was pulled
from xenial as well. When I try to click the "add distribution" button
in launchpad I'm getting an oops error, so posting a comment about
xenial being affected in the meantime.

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2020-12-09 Thread Thimo E
This is just the procedure with the least damage that I found.
Data loss may still happen (and actually happened on some of our systems).

Probably first re-adding (after zeroing) the second component to the
RAID and then fsck-ing leads to the exact same result but I wanted to
keep the second component as fall-back until I could see the results of
fsck.

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2020-12-09 Thread Jay Vosburgh
Thimo,

Thanks for the update; just to clarify, for your "procedure to recover,"
are you saying that that procedure will always resolve the damage, or
that even after that procedure, there may be corruption?

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2020-12-09 Thread Thimo E
Hi Matthew and all,

thank you for taking action immediately. I really appreciate your
effort.

After investigating the issue further I have to add that the mount
option discard seems to trigger the issue, too.
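
To see whether any filesystem is currently mounted with the discard option, the options column of /proc/mounts can be inspected. The following is only a sketch; the function name mounts_with_discard is made up for illustration, and it matches the exact option "discard" (so "nodiscard" is not reported).

```shell
#!/bin/sh
# Hypothetical helper: print "device mountpoint" for every entry whose
# mount options (4th column) include the exact option "discard".
# Reads /proc/mounts unless another file in the same format is given.
mounts_with_discard() {
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "discard") { print $1, $2; break }
    }' "${1:-/proc/mounts}"
}
```

Called without arguments it inspects the running system; any RAID10-backed filesystem it lists would be a candidate for remounting without discard until a fixed kernel is installed.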

@Trent
The general problem here is that RAID10 can balance a single read stream across
all disks (which is probably its major advantage over RAID1, effectively giving
you RAID0 read speed; RAID1 needs parallel reads to achieve this).

That said, it is no big surprise that several machines at our site went into
read-only mode after *some time* (probably after reading some filesystem-relevant
data from the "bad disk"). Unfortunately, the "clean first disk" only holds if
you act immediately; otherwise you might have some data corruption.
I verified this on one system where the root partition was affected, using the
debsums tool (just run debsums -xa) after fixing the FS errors.

My procedure to recover was:
Assembly of the RAID:
mdadm --assemble /dev/md127 /dev/nvme0n1p2
mdadm --run /dev/md127

Filesystem check on all partitions (note the -f parameter; some filesystems
"think" they are clean):
fsck.ext4 -f /dev/VolGroup/...

Re-add the second component:
mdadm --zero-superblock /dev/nvme1n1p2
mdadm --add /dev/md127 /dev/nvme1n1p2

Best regards

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2020-12-09 Thread Khaled El Mously
** Changed in: linux (Ubuntu Focal)
   Status: In Progress => Fix Committed

** Changed in: linux (Ubuntu Groovy)
   Status: In Progress => Fix Committed

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2020-12-09 Thread Trent Lloyd
** Attachment added: "blktrace-lp1907262.tar.gz"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262/+attachment/5442212/+files/blktrace-lp1907262.tar.gz

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2020-12-09 Thread Trent Lloyd
I can reproduce this on a Google Cloud n1-standard-16 using 2x local
NVMe disks: partition nvme0n1 and nvme0n2 with only an 8GB partition
each, then format the array directly with ext4 (skip LVM).

In this setup each 'check' takes <1 min, which speeds up testing
considerably. Example details follow; the preemptible instance cost for
this seems to be $0.292/hour, or about $7/day.

gcloud compute instances create raid10-test --project=juju2-157804 \
--zone=us-west1-b \
--machine-type=n1-standard-16 \
--subnet=default \
--network-tier=STANDARD \
--no-restart-on-failure \
--maintenance-policy=TERMINATE \
--preemptible \
--boot-disk-size=32GB \
--boot-disk-type=pd-ssd \
--image=ubuntu-1804-bionic-v20201116 --image-project=ubuntu-os-cloud \
--local-ssd=interface=NVME  --local-ssd=interface=NVME

# apt install linux-image-virtual
# apt-get remove linux-image-gcp linux-image-5.4.0-1029-gcp \
    linux-image-unsigned-5.4.0-1029-gcp --purge
# reboot

sgdisk -n 0:0:+8G /dev/nvme0n1
sgdisk -n 0:0:+8G /dev/nvme0n2
mdadm -C -v -l10 -n2 -N "lv-raid" -R /dev/md0 /dev/nvme0n1p1 /dev/nvme0n2p1
mkfs.ext4 /dev/md0
mount /dev/md0 /mnt
dd if=/dev/zero of=/mnt/data.raw bs=4K count=1M; sync; rm /mnt/data.raw
echo check >/sys/block/md0/md/sync_action; watch 'grep . /proc/mdstat /sys/block/md0/md/mismatch_cnt' # no mismatch
fstrim -v /mnt
echo check >/sys/block/md0/md/sync_action; watch 'grep . /proc/mdstat /sys/block/md0/md/mismatch_cnt' # mismatch=256
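
The mismatch check in the commands above can be wrapped in a small helper for scripted runs. This is a sketch under some assumptions: the function name report_mismatch is made up for illustration, the array is assumed to be md0 unless another path is passed, and the counter is only meaningful once the 'check' has finished.

```shell
#!/bin/sh
# Hypothetical helper: read an md mismatch counter and report the result.
# Returns 0 if the count is zero, 1 if it is non-zero, and 2 if the
# counter file cannot be read.
report_mismatch() {
    file=${1:-/sys/block/md0/md/mismatch_cnt}
    count=$(cat "$file") || return 2
    if [ "$count" -eq 0 ]; then
        echo "consistent (mismatch_cnt=0)"
        return 0
    fi
    echo "MISMATCH (mismatch_cnt=$count)"
    return 1
}
```

In the reproduction above, the first check would be expected to report a consistent array and the check after fstrim a mismatch of 256.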

I ran blktrace /dev/md0 /dev/nvme0n1 /dev/nvme0n2 and will upload the
results. I haven't had time to try to understand them yet.

Some thoughts:
 - It was asserted that the first disk 'appears' fine.
 - So I wondered whether we can reliably repair by asking mdadm to do a
   'repair' or 'resync'.
 - It seems that reads are at least sometimes balanced (maybe by PID) to
   different disks since this post:
   https://www.spinics.net/lists/raid/msg62762.html - it is unclear if the
   same selection impacts writes (not that it would help performance).
 - So it's unclear whether we can reliably say only a 'passive mirror' is
   being corrupted; application reads may or may not be corrupted. More
   testing/understanding of the code is required.
 - This area of RAID10 and RAID1 seems quite under-documented; "man md"
   doesn't talk much about how or which disk is used to repair the other
   if there is a mismatch (unlike RAID5, where the parity gives us some
   assurances as to which data is wrong).
 - We should try writes from different PIDs, with known different data,
   and compare the data on both disks with the known data to see if we
   can knowingly get the wrong data on both disks or only one. And try
   that with 4 disks instead of 2.

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2020-12-08 Thread Khaled El Mously
** Changed in: linux (Ubuntu Bionic)
   Status: In Progress => Fix Committed

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2020-12-08 Thread Matthew Ruffell
Hi Thimo,

Firstly, thank you for your bug report, we really, really appreciate it.

You are correct, the recent raid10 patches appear to cause filesystem
corruption on raid10 arrays.

I have spent the day reproducing, and I can confirm that the
4.15.0-126-generic, 5.4.0-56-generic and 5.8.0-31-generic kernels are
affected.
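
As a quick illustration, the three releases named above can be turned into a small shell check. This is only a sketch: the function name is_affected_kernel is hypothetical, and the list covers just the builds confirmed in this message, so other flavours carrying the same patches may be affected without being matched here.

```shell
#!/bin/sh
# Hypothetical helper: succeed (return 0) if the given kernel release is
# one of the three builds confirmed as affected in this message.
is_affected_kernel() {
    case "$1" in
        4.15.0-126-generic|5.4.0-56-generic|5.8.0-31-generic) return 0 ;;
        *) return 1 ;;
    esac
}
```

A typical call would be `is_affected_kernel "$(uname -r)" && echo "affected kernel running"`.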

The kernel team are aware of the situation, and we have begun an
emergency revert of the patches, and we should have new kernels
available in the next few hours / day or so.

The current mainline kernel is affected, so I have written to the raid
subsystem maintainer, and the original author of the raid10 block
discard patches, to aid with debugging and fixing the problem.

You can follow the upstream thread here:

https://www.spinics.net/lists/kernel/msg3765302.html

As for the data corruption on your servers, I am deeply sorry for
causing this regression.

When I was testing the raid10 block discard patches on the Ubuntu stable
kernels, I did not think to fsck each of the disks in the array.
Instead, I was content with the speed of creating new arrays, writing a
basic dataset to the disks, and rebooting the server to ensure the array
came up again with those same files.

Since the first disk seems to be okay, there is at least a small window
of opportunity for you to restore any data that you have not backed up.

I will keep you informed of getting the patches reverted, and getting
the root cause fixed upstream. If you have any questions, feel free to
ask, and if you have any more details from your own debugging, feel free
to share in this bug, or on the upstream mailing list discussion.

Thanks,
Matthew

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2020-12-08 Thread Matthew Ruffell
** Also affects: linux (Ubuntu Focal)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Groovy)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Bionic)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu Bionic)
   Status: New => In Progress

** Changed in: linux (Ubuntu Focal)
   Status: New => In Progress

** Changed in: linux (Ubuntu Groovy)
   Status: New => In Progress

** Changed in: linux (Ubuntu Bionic)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Focal)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Groovy)
   Importance: Undecided => High

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2020-12-08 Thread Nivedita Singhvi
** Tags added: sts

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2020-12-08 Thread Matthew Ruffell
Hi Thimo,

Thank you for the very detailed bug report. I will start investigating this
immediately.

Thanks,
Matthew

[Bug 1907262] Re: raid10: discard leads to corrupted file system

2020-12-08 Thread Launchpad Bug Tracker
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: linux (Ubuntu)
   Status: New => Confirmed
