[Sts-sponsors] [Bug 1847924] Re: Introduce broken state parsing to mdadm
This bug was fixed in the package mdadm - 4.1-2ubuntu3.2

---------------
mdadm (4.1-2ubuntu3.2) eoan; urgency=medium

  * Introduce "broken" state for RAID0/Linear in mdadm (LP: #1847924)
    - d/p/lp1847924-Introduce-new-array-state-broken-for-raid0.patch
  * Disable patches from proposed version 4.1-2ubuntu3.1 due to issues
    with the kernel

 -- gpicc...@canonical.com (Guilherme G. Piccoli)  Tue, 14 Jan 2020 17:09:06 -0300

** Changed in: mdadm (Ubuntu Eoan)
       Status: Fix Committed => Fix Released

** Changed in: mdadm (Ubuntu Bionic)
       Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of STS
Sponsors, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1847924

Title:
  Introduce broken state parsing to mdadm

Status in mdadm package in Ubuntu: Fix Released
Status in mdadm source package in Bionic: Fix Released
Status in mdadm source package in Disco: Won't Fix
Status in mdadm source package in Eoan: Fix Released
Status in mdadm source package in Focal: Fix Released
Status in mdadm package in Debian: Fix Released

Bug description:

  [Impact]

  * Currently, mounted raid0/md-linear arrays give no indication or
    warning when one or more members are removed or suffer some
    non-recoverable error condition; the mdadm tool shows the "clean"
    state even if a member was removed.

  * The patch proposed in this SRU addresses the issue by introducing a
    new state, "broken", which is analogous to "clean" but indicates that
    the array is not in a good/correct state. The commit, available
    upstream as 43ebc910 ("mdadm: Introduce new array state 'broken' for
    raid0/linear") [0], was extensively discussed and received a good
    amount of review/analysis from both the current mdadm maintainer and
    a former maintainer.

  * One important note: this patch requires a counterpart in the kernel
    to be fully functional, which was SRUed in LP: #1847773. It works
    fine/transparently without the kernel counterpart, though.

  * We had reports of users testing scenarios of failed raid0 arrays;
    seeing 'clean' in mdadm caused confusion and did not help them notice
    that something had gone wrong with the arrays.

  * The situation this patch (with its kernel counterpart) addresses is:
    a user has a mounted raid0/linear array, and one member fails and
    gets removed (either physically, e.g. a power or firmware issue, or
    in software, e.g. a driver-induced removal after a detected failure).
    Without this patch (and its kernel counterpart), there is nothing to
    let the user know it failed, except filesystem errors in dmesg. Also,
    non-direct writes to the filesystem will succeed, due to how the
    page-cache/writeback mechanism works; even a 'sync' command will
    succeed.

  * The case described in the above bullet was tested, and the writes to
    the failed device succeeded - after a reboot, the files written were
    present in the array, but corrupted. A user would not notice that
    unless the writes were direct or some checksum was performed on the
    files. With this patch (and its kernel counterpart), writes to such a
    failed raid0/linear array are fast-failed and the filesystem goes
    read-only quickly.

  [Test case]

  * Create a raid0 or linear md array on Linux using mdadm, like:
    "mdadm --create md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1";

  * Format the array using a filesystem of your choice (for example
    ext4) and mount the array;

  * Remove one member of the array, for example via the sysfs interface
    (for nvme: echo 1 > /sys/block/nvme0n1/device/device/remove, for
    scsi: echo 1 > /sys/block/sdX/device/delete);

  * Without this patch, the array state shown by "mdadm --detail" is
    "clean", regardless of the missing/failed member.

  [Regression potential]

  * There are mainly two potential regressions here: the first is the
    user-visible change introduced by this mdadm patch; the second is the
    possibility that the patch itself has some unnoticed bug.

  * For the first type of potential regression: this patch changes how
    the array state is displayed in "mdadm --detail" output for
    raid0/linear arrays *only*. Currently, the tool shows just 2 states,
    "clean" or "active". With the patch being SRUed here, raid0/linear
    arrays instead report the sysfs array state, so for example we could
    read "readonly" here if the user (or some tool) changes the array to
    that state. This only affects raid0/linear; the output for other
    levels did not change at all.

  * Regarding potential unnoticed issues in the code, we changed mainly
    structs and the "detail" command: structs gained the new "broken"
    state, and the detail output was changed for raid0/linear as
    discussed in the previous bullet.

  * Note that we *proactively* skipped the Xenial SRU here, in order to
    prevent potential regressions - the Xenial mdadm tool lacks code
    infrastructure used by this patch, so the decision was for
    safety/stability, SRUing only the Bionic / Disco / Eoan mdadm
    versions.

  [0] https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/commit/?id=43ebc910
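The detection described above boils down to reading /sys/block/<md>/md/array_state and treating "broken" as the failure case. As a rough illustration (not part of the mdadm patch; the device name, sysfs root, and helper names are mine), a minimal Python monitoring sketch could look like:

```python
from pathlib import Path

def array_state(md="md0", sysfs_root="/sys/block"):
    """Read the md array state directly from sysfs.

    With the kernel counterpart of this SRU (LP: #1847773), a raid0/linear
    array that lost a member reports "broken" here; without it, the state
    stays "clean" even though writes silently hit a missing device.
    """
    return Path(sysfs_root, md, "md", "array_state").read_text().strip()

def is_broken(state: str) -> bool:
    """True when the array is in the new 'broken' state."""
    return state == "broken"

if __name__ == "__main__":
    try:
        state = array_state("md0")
        alert = "  <-- array lost a member!" if is_broken(state) else ""
        print(f"md0: {state}{alert}")
    except OSError:
        # No md0 on this machine (or sysfs unavailable).
        print("md0 not present on this system")
```

A check like this could be run periodically (e.g. from cron) to get alerted, instead of waiting for filesystem errors to show up in dmesg.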
** Tags removed: sts-sponsor-ddstreet
** Tags removed: sts-sponsor
Thanks Dan and Łukasz - I've tested both the Bionic (4.1~rc1-3~ubuntu18.04.4) and Eoan (4.1-2ubuntu3.2) versions from -proposed, following the test procedure suggested in the description - both versions passed.

Cheers,

Guilherme

** Tags removed: block-proposed-bionic block-proposed-disco block-proposed-eoan verification-needed verification-needed-bionic verification-needed-eoan
** Tags added: verification-done verification-done-bionic verification-done-eoan
Hello Guilherme, or anyone else affected,

Accepted mdadm into eoan-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/mdadm/4.1-2ubuntu3.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed-eoan to verification-done-eoan. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-eoan. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

** Changed in: mdadm (Ubuntu Eoan)
       Status: In Progress => Fix Committed

** Tags added: verification-needed verification-needed-eoan

** Changed in: mdadm (Ubuntu Bionic)
       Status: In Progress => Fix Committed

** Tags added: verification-needed-bionic
Re-uploaded the V3 to Bionic and Eoan. @rbasak @sil2100, does this change address your concerns? As this removes (comments out) the patches from bug 1850540, it no longer needs to be blocked until a new kernel is released, as @gpiccoli explains in comment 21.
Hi @ddstreet, thanks for the last upload and all the effort in this case!

So, we had a situation in which LP #1850540 (the raid0 layout issue) caused the mdadm package to get stuck in -proposed due to issues with the kernel. Those are still being figured out, so we (Dann, Stephan from the kernel team, and I) managed to reach a consensus: let's focus on this LP for now, while the kernel raid0 layout patches are worked on.

The next steps here are:
(a) Drop the current mdadm packages from -proposed, for all releases;
(b) Upload this one and get it released to -proposed;
(c) Test the -proposed version and get this package released, in order to free the queue for the raid0 layout fixes.

I'm re-uploading here the debdiffs for Bionic and Eoan (Disco won't get fixed; it goes EOL next week). These debdiffs are based on the bionic-updates / eoan-release versions of mdadm (the previous debdiffs were based on Dann's -proposed version). Please let me know in case you have any questions or concerns.

Cheers,

Guilherme

** Patch added: "Bionic debdiff V3"
   https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/1847924/+attachment/5320500/+files/lp-1847924-bionic_V3.debdiff

** Changed in: mdadm (Ubuntu Disco)
       Status: In Progress => Won't Fix

** Changed in: mdadm (Ubuntu Disco)
     Assignee: Guilherme G. Piccoli (gpiccoli) => (unassigned)

** Changed in: mdadm (Ubuntu Disco)
   Importance: Medium => Undecided
[Sts-sponsors] [Bug 1847924] Re: Introduce broken state parsing to mdadm
Uploaded mdadm with V2 patches to B/D/E. However, please note that, as I just updated this bug description to explain, all further updates to mdadm are temporarily blocked: the mdadm changes from bug 1850540 require corresponding kernel patches that are not yet released. I've added the block-proposed-* tags to this bug to prevent release to -updates. Please see bug 1850540 comment 11 for details.

** Description changed:

[Impact]

* Currently, mounted raid0/md-linear arrays give no indication/warning when one or more members are removed or suffer a non-recoverable error condition. The mdadm tool shows the "clean" state regardless of whether a member was removed.

* The patch proposed in this SRU addresses the issue by introducing a new state, "broken", which is analogous to "clean" but indicates that the array is not in a good/correct state. The commit, available upstream as 43ebc910 ("mdadm: Introduce new array state 'broken' for raid0/linear") [0], was extensively discussed and received a good amount of review/analysis from both the current mdadm maintainer and a former maintainer.

* One important note: this patch requires a counterpart in the kernel to be fully functional, which was SRUed in LP: #1847773. It works fine/transparently without the kernel counterpart, though.

* We had reports of users testing scenarios with failed raid0 arrays; getting 'clean' from mdadm proved confusing and did not help them notice that something had gone wrong with the arrays.

* The situation this patch (with its kernel counterpart) addresses is: a user has a mounted raid0/linear array. If one member fails and is removed (either physically, e.g. a power or firmware issue, or in software, e.g. a driver-induced removal after a detected failure), _without_ this patch (and its kernel counterpart) there is nothing to let the user know it failed, except filesystem errors in dmesg. Also, non-direct writes to the filesystem will succeed, due to how page-cache/writeback work; even a 'sync' command will succeed.

* The case described in the above bullet was tested, and writes to failed devices succeeded - after a reboot, the files written were present in the array, but corrupted. A user would not notice that unless the writes were direct or some checksum was performed on the files. With this patch (and its kernel counterpart), writes to such a failed raid0/linear array are fast-failed and the filesystem goes read-only quickly.

[Test case]

* To test this patch, create a raid0 or linear md array on Linux using mdadm, e.g.: "mdadm --create md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1";

* Format the array using a filesystem of your choice (for example ext4) and mount the array;

* Remove one member of the array, for example via the sysfs interface (for nvme: echo 1 > /sys/block/nvme0n1/device/device/remove; for scsi: echo 1 > /sys/block/sdX/device/delete);

* Without this patch, the array state shown by "mdadm --detail" is "clean", regardless of a member being missing/failed.

[Regression potential]

* There are mainly two potential regressions here: the first is the user-visible changes introduced by this mdadm patch; the second is the patch itself having some unnoticed bug.

* For the first type of potential regression: this patch changes how the array state is displayed in "mdadm --detail" output for raid0/linear arrays *only*. Currently, the tool shows just 2 states, "clean" or "active". With the patch being SRUed here, for raid0/linear arrays the tool instead reads the sysfs array state. So, for example, we could read the "readonly" state here for raid0/linear if the user (or some tool) changes the array to that state. This only affects raid0/linear; the output for other levels did not change at all.

* Regarding potential unnoticed issues in the code, we changed mainly structs and the "detail" command. Structs were extended with the new "broken" state, and the detail output was changed for raid0/linear as discussed in the previous bullet.

* Note that we *proactively* skipped a Xenial SRU here, in order to prevent potential regressions - the Xenial mdadm tool lacks code infrastructure used by this patch, so the decision was for safety/stability, SRUing only the Bionic / Disco / Eoan mdadm versions.

[0] https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/commit/?id=43ebc910

[other info]

- As mdadm for focal (20.04) hasn't been merged yet, this will need to be
- added there during or after merge.
+ The last mdadm upload for bug 1850540 added changes that depend on as-
+ yet unreleased kernel changes, and thus blocks any further release of
+ mdadm until the next Bionic point release; see bug 1850540 comment 11.
+ So, this bug (and all future mdadm bugs for Bionic, until the next point
+ release) must include
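The test-case steps from the description can be sketched as a shell script. This is a deliberate dry-run: the `run` helper only prints each command (swap the `echo` for direct execution on a disposable test machine), and the device names are examples taken from the description - do not point this at disks holding data.

```shell
# Dry-run sketch of the [Test case] steps; prints each command instead of running it.
# Device names (/dev/nvme0n1, /dev/nvme1n1) are examples - use disposable disks.
run() { echo "+ $*"; }

# 1. Create a raid0 array from two members.
run mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
# 2. Format and mount it.
run mkfs.ext4 /dev/md0
run mount /dev/md0 /mnt
# 3. Remove one member via sysfs (nvme shown; for scsi: echo 1 > /sys/block/sdX/device/delete).
run sh -c 'echo 1 > /sys/block/nvme0n1/device/device/remove'
# 4. Check the state: unpatched mdadm still shows "clean"; patched shows "broken".
run mdadm --detail /dev/md0
```

The final `mdadm --detail` call is the actual check: without the patch its State field reads "clean" even though a member is gone.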
[Sts-sponsors] [Bug 1847924] Re: Introduce broken state parsing to mdadm
Hi Robie and Łukasz, thanks again for your feedback and valid concerns. I've just updated the LP bug with new debdiffs for the modified SRU patch - this one won't change mdadm behavior more than strictly necessary. Basically, the code chunk added by this debdiff (in B/D/E) is:

++ /* This is to keep output compatibility
++  * in the SRU for B/D/E - we only change
++  * the state output in case it's "broken"
++  */
++ if (strcmp(arrayst, "broken"))
++     arrayst = "clean";

Summarizing the change: before the 'broken' state patch, "mdadm --detail" would show only 2 states for an array, either 'clean' or 'active'. The patch SRUed here changed that: for RAID0/Linear arrays, the state field of "mdadm --detail" now shows the content of /sys/block/mdX/md/array_state. This is how the upstream patch and the merged Focal version behave. But given the concerns about behavior changes in already-released versions (especially the LTS one, Bionic), we added the small chunk above, so for those releases the behavior is: the state field of "mdadm --detail" output has only 3 alternatives - the old 'clean' and 'active', and the new 'broken'. Hence, the only behavior change is for broken arrays, which mdadm will now report through the state field.

I've tested the version with the modified chunk and it's working as expected - I've also documented this behavior difference in the patch description. The debdiffs were created against the -proposed versions of the packages (I expect those to reach -updates soon, given the importance of the layout patch present there). Thanks, and please let me know about any further questions or concerns.

Cheers,

Guilherme
[Sts-sponsors] [Bug 1847924] Re: Introduce broken state parsing to mdadm
** Patch added: "lp-1847924-disco_V2.debdiff"
   https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/1847924/+attachment/5311772/+files/lp-1847924-disco_V2.debdiff
[Sts-sponsors] [Bug 1847924] Re: Introduce broken state parsing to mdadm
** Patch added: "lp-1847924-bionic_V2.debdiff"
   https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/1847924/+attachment/5311771/+files/lp-1847924-bionic_V2.debdiff
[Sts-sponsors] [Bug 1847924] Re: Introduce broken state parsing to mdadm
** Patch added: "lp-1847924-eoan_V2.debdiff"
   https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/1847924/+attachment/5311773/+files/lp-1847924-eoan_V2.debdiff
[Sts-sponsors] [Bug 1847924] Re: Introduce broken state parsing to mdadm
This bug was fixed in the package mdadm - 4.1-4ubuntu1

---
mdadm (4.1-4ubuntu1) focal; urgency=medium

  [ dann frazier ]
  * Merge from Debian unstable. Remaining changes:
    - Ship finalrd hook.
    - Do not install mdadm-shutdown.service on Ubuntu.
    - Drop broken and unused init scripts, which can cause failure to
      reconfigure the mdadm package under certain confinement types, in
      favor of native systemd units.
    - Drop /etc/cron.d/mdadm and migrate to systemd mdcheck_start|continue
      timer units.
    - Drop /etc/cron.daily/mdadm and migrate to the systemd
      mdmonitor-oneshot timer unit.
    - mdcheck_start.timer schedules the mdcheck on the first Sunday of the
      month, with a randomized start delay of up to 24h, and runs for at
      most 6h. mdcheck_continue.timer kicks off daily, with a randomized
      start delay of up to 12h, and continues the mdcheck for at most 6h.
    - mdmonitor-oneshot.timer runs daily, with a randomized start delay of
      up to 24h.
    - One can use systemd drop-ins to change the .timer units' timings, set
      environment variables to decrease/increase the length of checking, or
      start the checks by hand. The previously used checkarray is still
      available, albeit not used by the timer units.
    - The above ensures that the previous daily / monthly checks are still
      performed, but randomized, so that performance is not impacted as
      much across a cluster of machines.
  * Honor the debconf daily autoscan setting in the systemd timer.

  [ Guilherme G. Piccoli ]
  * Introduce "broken" state for RAID0/Linear in mdadm (LP: #1847924)

 -- dann frazier  Wed, 04 Dec 2019 07:05:07 -0700

** Changed in: mdadm (Ubuntu Focal)
   Status: In Progress => Fix Released
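As a hypothetical illustration of the drop-in mechanism the changelog mentions: a drop-in file overrides individual directives of a timer unit without editing the shipped unit. The directive names below are standard systemd ones; the schedule values are invented for the example and the stock timer's settings may differ.

```ini
# /etc/systemd/system/mdcheck_start.timer.d/override.conf
# Hypothetical drop-in: move the monthly check and shorten the random delay.
[Timer]
# An empty assignment clears the previously configured schedule,
# then the next line sets a new one (the 15th of each month, 01:00).
OnCalendar=
OnCalendar=*-*-15 01:00:00
RandomizedDelaySec=2h
```

After creating the drop-in, `systemctl daemon-reload` is needed for it to take effect; `systemctl cat mdcheck_start.timer` shows the merged result.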
[Sts-sponsors] [Bug 1847924] Re: Introduce broken state parsing to mdadm
In the meantime, I'll reject the current uploads, to keep them from confusing the SRU folks.
[Sts-sponsors] [Bug 1847924] Re: Introduce broken state parsing to mdadm
Thanks a lot Robie and Łukasz - reasonable points! I'll work on the new diff for B/D/E, and in parallel I'm attempting the merge for Focal.

Cheers,

Guilherme
If one member fails and gets removed (either physically, like a power or firmware issue, or in software, like a driver-induced removal due to detected failure), _without_ this patch (and its kernel counter-part) there's nothing to let user know it failed, except filesystem errors in dmesg. Also, non-direct writes to the filesystem will succeed, due to how page-cache/writeback work; even a 'sync' command run will succeed. * The case described in above bullet was tested and the writes to failed devices succeeded - after a reboot, the files written were present in the array, but corrupted. An user wouldn't noticed that unless if the writes were directed or some checksum was performed in the files. With this patch (and its kernel counter-part), the writes to such failed raid0/linear array are fast-failed and the filesystem goes read-only quickly. [Test case] * To test this patch, create a raid0 or linear md array on Linux using mdadm, like: "mdadm --create md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1"; * Format the array using a FS of your choice (for example ext4) and mount the array; * Remove one member of the array, for example using sysfs interface (for nvme: echo 1 > /sys/block/nvme0n1/device/device/remove, for scsi: echo 1 > /sys/block/sdX/device/delete); * Without this patch, the array state shown by "mdadm --detail" is "clean", regardless a member is missing/failed. [Regression potential] * There are mainly two potential regressions here; the first is user- visible changes introduced by this mdadm patch. The second is if the patch itself has some unnoticed bug. * For the first type of potential regression: this patch introduces a change in how the array state is displayed in "mdadm --detail " output for raid0/linear arrays *only*. Currently, the tool shows just 2 states, "clean" or "active". In the patch being SRUed here, this changes for raid0/linear arrays to read the sysfs array state instead. 
So for example, we could read "readonly" state here for raid0/linear if the user (or some tool) changes the array to such state. This only affects raid0/linear, the output for other levels didn't change at all. * Regarding potential unnoticed issues in the code, we changed mainly structs and the "detail" command. Structs were incremented with the new "broken" state and the detail output was changed for raid0/linear as discussed in the previous bullet. * Note that we *proactively* skipped Xenial SRU here, in order to prevent potential regressions - Xenial mdadm tool lacks code infrastructure used by this patch, so the decision was for safety/stability, by only SRUing Bionic / Disco / Eoan mdadm versions. [0] https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/commit/?id=43ebc910 [other info] As mdadm for focal (20.04) hasn't been merged yet, this will need to be added there during or after merge. To manage notifications about this bug go to:
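Since the patch makes "mdadm --detail" report the sysfs array state for raid0/linear, the same state is visible directly in sysfs. Below is a minimal sketch of checking it; a mock sysfs tree is built so the sketch runs anywhere (on a real system with the patched kernel from LP: #1847773 you would read from /sys instead, and the device name md0 is just an example):

```shell
#!/bin/sh
# Sketch: read the md array state the way the patched mdadm does for
# raid0/linear, flagging the new "broken" state.  A mock sysfs tree is
# used here so this runs without real md hardware; on a live system,
# set SYSFS_ROOT=/sys.
SYSFS_ROOT=$(mktemp -d)
mkdir -p "$SYSFS_ROOT/block/md0/md"
echo broken > "$SYSFS_ROOT/block/md0/md/array_state"

check_array() {
    # Each md device exposes <sysfs>/block/<dev>/md/array_state
    state=$(cat "$SYSFS_ROOT/block/$1/md/array_state")
    case "$state" in
        broken)       echo "$1: BROKEN - one or more members missing/failed" ;;
        clean|active) echo "$1: ok ($state)" ;;
        *)            echo "$1: state '$state'" ;;
    esac
}

check_array md0
```

This mirrors what a monitoring script could do today without waiting for the mdadm change, since the "broken" state comes from the kernel side.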
[Sts-sponsors] [Bug 1847924] Re: Introduce broken state parsing to mdadm
Robie asked me for a second opinion on this, so I had a look at it right now. I must say that I'm a bit undecided here, similarly to Robie. But with my SRU hat on, first things first: we should not introduce behavior changes that can cause regressions of existing tools without very solid reasons. So in this case, I only see (b) as an option, but this also depends on how the SRU would look if we added our additional changes on top.

All this seems a bit 'a lot' for a medium-impact bug, especially for an important tool like mdadm. Also, this does fall a bit into the 'feature' category, which is not necessarily disallowed for an SRU, but not something we'd prefer doing without a lot of people affected/interested.

So my opinion is: let's think about how much this change is needed. If we decide there are more than a few users who would profit from this change, the next step would be to see the final diff with the additional conditional and see if the change is still manageable.
[Sts-sponsors] [Bug 1847924] Re: Introduce broken state parsing to mdadm
Just a clarification comment - we're still discussing this (with Robie and the server/foundations folks), so it's not really ready for upload, since we may need a change in the patches for the current releases. Thanks, Guilherme
[Sts-sponsors] [Bug 1847924] Re: Introduce broken state parsing to mdadm
Robie, I've just updated the description with more information on the user impact (and why we should have this patch), and also on the regression potential.

I wasn't completely right in my last comment, sorry - for RAID0/Linear arrays, the output of the array state in "mdadm --detail" may change after the patch _even for healthy arrays_. One example is the "readonly" state: we can write this state to the sysfs "array_state" file for a raid0/linear array, and currently "mdadm --detail" will keep showing the state as "clean" (arguably odd behavior, but it is how the state output works right now). With this patch, the state will show as "readonly".

I think we have two options here:

(a) We can backport the patch as is, hence risking breaking users running monitoring tools or any kind of scripts that parse "mdadm --detail" output for raid0/linear arrays.

(b) We can introduce a conditional check for Bionic/Disco/Eoan in the "mdadm --detail" output to keep "clean" as the array state unless it's "active" or "broken" (the new state for failed arrays). Then, Focal will get a merge from a more recent version of mdadm with the original patch and diverge from this behavior, being aligned with upstream/other distros.

Basically, the discussion is about "when" to introduce the user-visible change: during the current releases, or between releases (Focal being the first to change). There's a hypothetical 3rd alternative that I personally dislike and would avoid, which is to introduce such a conditional check for all Ubuntu releases from now on and diverge forever from upstream/other distros - I don't see a point in doing this.

Let me know your considerations Robie, and thanks for pointing out valid concerns! Cheers, Guilherme
[Sts-sponsors] [Bug 1847924] Re: Introduce broken state parsing to mdadm
** Description changed:

    [Impact]

    * Currently, mounted raid0/md-linear arrays have no indication/warning
    when one or more members are removed or suffer from some non-recoverable
    error condition. The mdadm tool shows "clean" state regardless if a
    member was removed.

    * The patch proposed in this SRU addresses this issue by introducing a
    new state "broken", which is analog to "clean" but indicates that array
    is not in a good/correct state. The commit, available upstream as
    43ebc910 ("mdadm: Introduce new array state 'broken' for raid0/linear")
    [0], was extensively discussed and received a good amount of
    reviews/analysis by both the current mdadm maintainer as well as an old
    maintainer.

    * One important note here is that this patch requires a counter-part in
    the kernel to be fully functional, which was SRUed in LP: #1847773. It
    works fine/transparently without this kernel counter-part though.
  +
  + * We had reports of users testing a scenario of failed raid0 arrays, and
  + getting 'clean' in mdadm proved to cause confusion and doesn't help on
  + noticing something went wrong with the arrays.
  +
  + * The potential situation this patch (with its kernel counter-part)
  + addresses is: an user has raid0/linear array, and it's mounted. If one
  + member fails and gets removed (either physically, like a power or
  + firmware issue, or in software, like a driver-induced removal due to
  + detected failure), _without_ this patch (and its kernel counter-part)
  + there's nothing to let user know it failed, except filesystem errors in
  + dmesg. Also, non-direct writes to the filesystem will succeed, due to
  + how page-cache/writeback work; even a 'sync' command run will succeed.
  +
  + * The case described in above bullet was tested and the writes to failed
  + devices succeeded - after a reboot, the files written were present in
  + the array, but corrupted. An user wouldn't noticed that unless if the
  + writes were directed or some checksum was performed in the files. With
  + this patch (and its kernel counter-part), the writes to such failed
  + raid0/linear array are fast-failed and the filesystem goes read-only
  + quickly.

    [Test case]

    * To test this patch, create a raid0 or linear md array on Linux using
    mdadm, like: "mdadm --create md0 --level=0 --raid-devices=2
    /dev/nvme0n1 /dev/nvme1n1";

    * Format the array using a FS of your choice (for example ext4) and
    mount the array;

    * Remove one member of the array, for example using sysfs interface
    (for nvme: echo 1 > /sys/block/nvme0n1/device/device/remove, for scsi:
    echo 1 > /sys/block/sdX/device/delete);

    * Without this patch, the array state shown by "mdadm --detail" is
    "clean", regardless a member is missing/failed.

    [Regression potential]

  - * There's not much potential regression here; we just exhibit arrays'
  - state as "broken" if they have one or more missing/failed members; we
  - believe the most common "issue" that could be reported from this patch
  - is if an userspace tool rely on the array status as being always "clean"
  - even for broken devices, then such tool may behave differently with this
  - patch.
  + * There are mainly two potential regressions here; the first is user-
  + visible changes introduced by this mdadm patch. The second is if the
  + patch itself has some unnoticed bug.
  +
  + * For the first type of potential regression: this patch introduces a
  + change in how the array state is displayed in "mdadm --detail "
  + output for raid0/linear arrays *only*. Currently, the tool shows just 2
  + states, "clean" or "active". In the patch being SRUed here, this changes
  + for raid0/linear arrays to read the sysfs array state instead. So for
  + example, we could read "readonly" state here for raid0/linear if the
  + user (or some tool) changes the array to such state. This only affects
  + raid0/linear, the output for other levels didn't change at all.
  +
  + * Regarding potential unnoticed issues in the code, we changed mainly
  + structs and the "detail" command. Structs were incremented with the new
  + "broken" state and the detail output was changed for raid0/linear as
  + discussed in the previous bullet.

    * Note that we *proactively* skipped Xenial SRU here, in order to
    prevent potential regressions - Xenial mdadm tool lacks code
    infrastructure used by this patch, so the decision was for
    safety/stability, by only SRUing Bionic / Disco / Eoan mdadm versions.

    [0] https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/commit/?id=43ebc910

    [other info]

  - As mdadm for focal hasn't been merged yet, this will need to be added
  - there during or after merge.
  + As mdadm for focal (20.04) hasn't been merged yet, this will need to be
  + added there during or after merge.
[Sts-sponsors] [Bug 1847924] Re: Introduce broken state parsing to mdadm
Hi Robie, thanks for your concerns! They are valid and make sense. Let me first give you some context on why this is needed in stable releases: the mdadm patch is the counterpart of a kernel modification which does 2 things: (a) a 'broken' state when raid0/linear arrays have a member missing; (b) refusing write I/O to such broken arrays.

Without this kernel change, we can continue writing to a broken md device until the FS notices some error, which may take a while. And due to the writeback/page-cache mechanism, this would be transparent to the regular user, only being noticed by users looking at logs and disk-write stats. The kernel page cache leads to a situation in which commands like 'dd' or even 'sync' succeed on a broken device, leading to a partially-written/corrupted file. So, given we are in the process of SRUing the kernel change, the mdadm counterpart is highly desirable due to part (a) above. Otherwise, we get incomplete functionality.

Regarding your points:

1) Agreed! The change was well-tested though, and validated by 3 maintainers (2 from the kernel md subsystem, one from mdadm itself). Also, although the patch is a bit large, it basically adds this 'broken' state to the places where it's needed, like arrays and defines (hence its size is not so small), and restricts the change clearly to raid0/linear, not affecting any other levels at all.

2) You're partially right here: mdadm keeps working *exactly* the same for everything _except_ the raid0/linear status in the '--detail' option. For those 2 levels (raid0/linear), we now read the state of the array from sysfs. So it's a behavior change, but a correct (or at least well-accepted) one. And it's concise/scope-reduced, as it doesn't change any other level or any other functionality in raid0/linear, except the state query.

Feel free to comment in case you have more questions, or ping me on IRC too (gpiccoli on freenode). Now, I have one question for you: do you prefer we do a merge in Focal before accepting this SRU? In order to sync Focal with upstream, I mean. Thanks, Guilherme
[Sts-sponsors] [Bug 1847924] Re: Introduce broken state parsing to mdadm
I think I understand the issue, but don't see a documented explanation of why this needs to be added to stable Ubuntu releases. What is the actual impact to users in not having mdadm able to pass through this "broken" state information, please, and how would adding this feature to stable releases resolve this impact?

My two concerns are:

1) It looks like a medium complexity patch to mdadm and that risks introducing a regression. A regression in mdadm could be very severe, such as one that results in data loss.

2) As you mention, userspace tooling might break in unexpected ways. Is mdadm intended to behave *exactly* the same after this patch is applied for users with no arrays in the new "broken" state (excluding help/usage messages etc)? In other words, is any tooling breakage limited to when an array breaks?
To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/1847924/+subscriptions -- Mailing list: https://launchpad.net/~sts-sponsors Post to : sts-sponsors@lists.launchpad.net Unsubscribe : https://launchpad.net/~sts-sponsors More help : https://help.launchpad.net/ListHelp
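The conditional check discussed as option (b) earlier in the thread — keep reporting "clean" for raid0/linear unless the sysfs state is "active" or the new "broken" — boils down to a small mapping. A sketch of just that logic (the actual change would live in mdadm's C detail code; this shell version is only illustrative):

```shell
#!/bin/sh
# Sketch of the backward-compatible mapping from option (b): for
# raid0/linear, pass through the sysfs state only when it is "active"
# or the new "broken"; map everything else (readonly, read-auto, ...)
# back to "clean" so existing parsers of "mdadm --detail" output are
# unaffected on healthy arrays.
map_state() {
    case "$1" in
        active|broken) echo "$1" ;;
        *)             echo "clean" ;;
    esac
}

map_state broken     # -> broken (new failure indication is kept)
map_state readonly   # -> clean  (hidden, as before the patch)
```

The trade-off debated in the thread is exactly this: the mapping avoids breaking existing tooling in stable releases, at the cost of diverging from upstream behavior until Focal merges the unmodified patch.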
[Sts-sponsors] [Bug 1847924] Re: Introduce broken state parsing to mdadm
** Changed in: mdadm (Debian)
       Status: Unknown => New
[Sts-sponsors] [Bug 1847924] Re: Introduce broken state parsing to mdadm
uploaded to b/d/e, and I'll look at merging f next week and including
this patch.

** Tags added: sts-sponsor-ddstreet
[Sts-sponsors] [Bug 1847924] Re: Introduce broken state parsing to mdadm
** Description changed:

  [...]

  [0] https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/commit/?id=43ebc910
+ 
+ [other info]
+ 
+ As mdadm for focal hasn't been merged yet, this will need to be added
+ there during or after merge.
[Sts-sponsors] [Bug 1847924] Re: Introduce broken state parsing to mdadm
** Bug watch added: Debian Bug tracker #943520
   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=943520

** Also affects: mdadm (Debian) via
   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=943520
   Importance: Unknown
       Status: Unknown
[Sts-sponsors] [Bug 1847924] Re: Introduce broken state parsing to mdadm
** Changed in: mdadm (Ubuntu Focal)
       Status: Confirmed => In Progress
[Sts-sponsors] [Bug 1847924] Re: Introduce broken state parsing to mdadm
** Also affects: mdadm (Ubuntu Ff-series)
   Importance: Undecided
       Status: New

** Changed in: mdadm (Ubuntu Ff-series)
       Status: New => Confirmed

** Changed in: mdadm (Ubuntu Ff-series)
     Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)

** Changed in: mdadm (Ubuntu Ff-series)
   Importance: Undecided => Medium
[Sts-sponsors] [Bug 1847924] Re: Introduce broken state parsing to mdadm
Thanks @gpiccoli; I can't sponsor this now, since Eoan is in final
freeze, but once it's released I'll be happy to sponsor.

** Changed in: mdadm (Ubuntu Eoan)
       Status: Confirmed => In Progress

** Changed in: mdadm (Ubuntu Disco)
       Status: Confirmed => In Progress

** Changed in: mdadm (Ubuntu Bionic)
       Status: Confirmed => In Progress