** Description changed:
[Impact]
* Currently, mounted raid0/md-linear arrays have no indication/warning
when one or more members are removed or suffer some non-recoverable
error condition. The mdadm tool shows the "clean" state even if a
member was removed.
* The patch proposed in this SRU addresses this issue by introducing a
new state, "broken", which is analogous to "clean" but indicates that
the array is not in a good/correct state. The commit, available upstream
as 43ebc910 ("mdadm: Introduce new array state 'broken' for raid0/linear")
[0], was extensively discussed and received a good amount of
review/analysis from both the current mdadm maintainer and a previous
maintainer.
* One important note here is that this patch requires a counterpart in
the kernel to be fully functional, which was SRUed in LP: #1847773.
It works fine/transparently without this kernel counterpart, though.
+
+ * We had reports of users testing failure scenarios with raid0 arrays,
+ and seeing 'clean' in mdadm proved confusing and did not help them
+ notice that something had gone wrong with the arrays.
+
+ * The situation this patch (with its kernel counterpart) addresses is:
+ a user has a mounted raid0/linear array. If one member fails and gets
+ removed (either physically, like a power or firmware issue, or in
+ software, like a driver-induced removal due to a detected failure),
+ _without_ this patch (and its kernel counterpart) there's nothing to
+ let the user know it failed, except filesystem errors in dmesg. Also,
+ non-direct writes to the filesystem will succeed, due to how the page
+ cache/writeback works; even running 'sync' will succeed.
+
+ * The case described in the above bullet was tested and the writes to
+ the failed devices succeeded - after a reboot, the files written were
+ present in the array, but corrupted. A user wouldn't notice that unless
+ the writes were direct (O_DIRECT) or the files were checksummed. With
+ this patch (and its kernel counterpart), writes to such a failed
+ raid0/linear array are fast-failed and the filesystem goes read-only
+ quickly, as the illustrative session below shows.
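+
+ An illustrative session for the scenario above (device, mount point and
+ file names are just examples):
+
+   # a member was already removed from a mounted raid0 array
+   dd if=/dev/zero of=/mnt/md0/testfile bs=1M count=100  # succeeds (page cache)
+   sync                                                  # also returns success
+   # with this patch and its kernel counterpart, the same write is
+   # expected to fast-fail and the filesystem to go read-only quickly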
[Test case]
* To test this patch, create a raid0 or linear md array on Linux using
mdadm, like: "mdadm --create md0 --level=0 --raid-devices=2 /dev/nvme0n1
/dev/nvme1n1";
* Format the array using a FS of your choice (for example ext4) and
mount the array;
* Remove one member of the array, for example using sysfs interface (for
nvme: echo 1 > /sys/block/nvme0n1/device/device/remove, for scsi: echo 1
> /sys/block/sdX/device/delete);
* Without this patch, the array state shown by "mdadm --detail" is
"clean", even though a member is missing/failed. With this patch, the
state is reported as "broken" instead; see the example session below.
[Regression potential]
- * There's not much potential regression here; we just exhibit arrays'
- state as "broken" if they have one or more missing/failed members; we
- believe the most common "issue" that could be reported from this patch
- is if an userspace tool rely on the array status as being always "clean"
- even for broken devices, then such tool may behave differently with this
- patch.
+ * There are mainly two potential regressions here: the first is the
+ user-visible change introduced by this mdadm patch; the second is the
+ possibility that the patch itself has some unnoticed bug.
+
+ * For the first type of potential regression: this patch changes how
+ the array state is displayed in the "mdadm --detail <array>" output for
+ raid0/linear arrays *only*. Currently, the tool shows just two states,
+ "clean" or "active". With the patch being SRUed here, raid0/linear
+ arrays instead report the array state read from sysfs. So, for example,
+ we could see a "readonly" state for raid0/linear if the user (or some
+ tool) has changed the array to that state (see the example below). This
+ only affects raid0/linear; the output for other levels doesn't change
+ at all.
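+
+ For example (array name illustrative), the sysfs state that "mdadm
+ --detail" now reports for raid0/linear can be inspected or changed
+ directly:
+
+   cat /sys/block/md0/md/array_state    # e.g. "clean", "active", "readonly"
+   echo readonly > /sys/block/md0/md/array_state
+   # with the kernel counterpart from LP: #1847773, "broken" can also
+   # appear here for a raid0/linear array with a failed member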
+
+ * Regarding potential unnoticed issues in the code, the changes are
+ mainly to structs and to the "detail" command: the structs were
+ extended with the new "broken" state, and the detail output was changed
+ for raid0/linear as discussed in the previous bullet.
* Note that we *proactively* skipped the Xenial SRU here in order to
prevent potential regressions - the Xenial mdadm tool lacks code
infrastructure used by this patch, so for safety/stability the decision
was to SRU only the Bionic / Disco / Eoan mdadm versions.
[0]
https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/commit/?id=43ebc910
[Other info]
- As mdadm for focal hasn't been merged yet, this will need to be added
- there during or after merge.
+ As mdadm for focal (20.04) hasn't been merged yet, this will need to be
+ added there during or after the merge.