** Description changed:

  [Impact]

   * Currently, mounted raid0/md-linear arrays have no indication/warning
  when one or more members are removed or suffer from some
  non-recoverable error condition. The mdadm tool shows the "clean" state
  regardless of whether a member was removed.

   * The patch proposed in this SRU addresses this issue by introducing a
  new state, "broken", which is analogous to "clean" but indicates that
  the array is not in a good/correct state. The commit, available
  upstream as 43ebc910 ("mdadm: Introduce new array state 'broken' for
  raid0/linear") [0], was extensively discussed and received a good
  amount of review/analysis by both the current mdadm maintainer and a
  former maintainer.

   * One important note here is that this patch requires a counterpart in
  the kernel to be fully functional, which was SRUed in LP: #1847773. It
  works fine/transparently without this kernel counterpart, though.

+  * We had reports of users testing the failed-raid0 scenario, and
+ getting 'clean' from mdadm proved to cause confusion and doesn't help
+ in noticing that something went wrong with the arrays.
+
+  * The potential situation this patch (with its kernel counterpart)
+ addresses is: a user has a mounted raid0/linear array. If one member
+ fails and gets removed (either physically, like a power or firmware
+ issue, or in software, like a driver-induced removal due to a detected
+ failure), _without_ this patch (and its kernel counterpart) there's
+ nothing to let the user know it failed, except filesystem errors in
+ dmesg. Also, non-direct writes to the filesystem will succeed, due to
+ how page-cache/writeback works; even a 'sync' command will succeed.
+
+  * The case described in the above bullet was tested, and the writes to
+ failed devices succeeded - after a reboot, the files written were
+ present in the array, but corrupted. A user wouldn't notice that unless
+ the writes were direct or some checksum was performed on the files.
+ With this patch (and its kernel counterpart), writes to such a failed
+ raid0/linear array are fast-failed and the filesystem goes read-only
+ quickly.

  [Test case]

   * To test this patch, create a raid0 or linear md array on Linux
  using mdadm, like: "mdadm --create md0 --level=0 --raid-devices=2
  /dev/nvme0n1 /dev/nvme1n1";

   * Format the array using a filesystem of your choice (for example
  ext4) and mount the array;

   * Remove one member of the array, for example using the sysfs
  interface (for nvme: echo 1 > /sys/block/nvme0n1/device/device/remove;
  for scsi: echo 1 > /sys/block/sdX/device/delete);

   * Without this patch, the array state shown by "mdadm --detail" is
  "clean", regardless of whether a member is missing/failed.

  [Regression potential]

-  * There's not much potential regression here; we just exhibit arrays'
- state as "broken" if they have one or more missing/failed members; we
- believe the most common "issue" that could be reported from this patch
- is if an userspace tool rely on the array status as being always "clean"
- even for broken devices, then such tool may behave differently with this
- patch.
+  * There are mainly two potential regressions here; the first is the
+ user-visible change introduced by this mdadm patch. The second is if
+ the patch itself has some unnoticed bug.
+
+  * For the first type of potential regression: this patch changes how
+ the array state is displayed in the "mdadm --detail <array>" output for
+ raid0/linear arrays *only*. Currently, the tool shows just two states,
+ "clean" or "active". With the patch being SRUed here, raid0/linear
+ arrays instead read the sysfs array state, so we could, for example,
+ see the "readonly" state for raid0/linear if the user (or some tool)
+ changes the array to that state. This only affects raid0/linear; the
+ output for other levels didn't change at all.
+
+  * Regarding potential unnoticed issues in the code, we changed mainly
+ structs and the "detail" command. Structs were extended with the new
+ "broken" state, and the detail output was changed for raid0/linear as
+ discussed in the previous bullet.

   * Note that we *proactively* skipped the Xenial SRU here, in order to
  prevent potential regressions - the Xenial mdadm tool lacks code
  infrastructure used by this patch, so the decision was made for
  safety/stability, only SRUing the Bionic / Disco / Eoan mdadm
  versions.

  [0] https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/commit/?id=43ebc910

  [other info]

- As mdadm for focal hasn't been merged yet, this will need to be added
- there during or after merge.
+ As mdadm for focal (20.04) hasn't been merged yet, this will need to be
+ added there during or after merge.
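The user-visible change described under [Regression potential] mainly concerns tools that parse "mdadm --detail" output and assume the state is always "clean" or "active". As a minimal sketch of how such a tool could cope with the new state, the Python snippet below extracts the "State :" line and treats "broken" as unhealthy. Note the sample output strings are fabricated for illustration, not captured from a real array:

```python
import re

def array_state(detail_output: str) -> str:
    """Extract the 'State :' value from `mdadm --detail` output."""
    m = re.search(r"^\s*State\s*:\s*(.+?)\s*$", detail_output, re.MULTILINE)
    if not m:
        raise ValueError("no State line found in mdadm output")
    return m.group(1)

def is_healthy(state: str) -> bool:
    # Before this patch, raid0/linear only ever reported "clean" or
    # "active"; with it, an array with a failed member reports "broken".
    return "broken" not in state.split(", ")

# Fabricated fragments of `mdadm --detail` output, for illustration only:
old_output = "       Raid Level : raid0\n            State : clean\n"
new_output = "       Raid Level : raid0\n            State : broken\n"

print(array_state(old_output), is_healthy(array_state(old_output)))
print(array_state(new_output), is_healthy(array_state(new_output)))
```

A monitoring script written this way keeps working on pre-patch mdadm (where "broken" simply never appears) while flagging failed raid0/linear arrays on patched versions.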
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1847924

Title:
  Introduce broken state parsing to mdadm

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/1847924/+subscriptions