Uploaded mdadm with V2 patches to B/D/E. However, please note that, as
I just updated this bug description to explain, all future updates to
mdadm are temporarily blocked: the mdadm changes from bug 1850540
require corresponding kernel patches that are not yet released. I've
added the block-proposed-* tags to this bug to prevent release to
-updates. Please see bug 1850540 comment 11 for details.

** Description changed:

  [Impact]
  
  * Currently, mounted raid0/md-linear arrays give no indication or
  warning when one or more members are removed or suffer a
  non-recoverable error condition. The mdadm tool reports the "clean"
  state even if a member was removed.
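
  A brief illustration of that behavior (output abridged; the exact
  formatting may vary by mdadm version, and /dev/md0 is a placeholder):

    mdadm --detail /dev/md0 | grep -i state
             State : clean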
  
  * The patch proposed in this SRU addresses the issue by introducing a
  new state, "broken", which is analogous to "clean" but indicates that
  the array is not in a good/correct state. The commit, available
  upstream as 43ebc910 ("mdadm: Introduce new array state 'broken' for
  raid0/linear") [0], was extensively discussed and received a good
  amount of review/analysis from both the current mdadm maintainer and
  a former maintainer.
  
  * One important note here is that this patch requires a counterpart
  in the kernel to be fully functional, which was SRUed in LP: #1847773.
  It works fine/transparently without this kernel counterpart, though.
  
  * We had reports from users testing failed raid0 array scenarios;
  seeing 'clean' from mdadm caused confusion and did not help them
  notice that something had gone wrong with the arrays.
  
  * The situation this patch (with its kernel counterpart) addresses
  is: a user has a mounted raid0/linear array. If one member fails and
  gets removed (either physically, like a power or firmware issue, or
  in software, like a driver-induced removal due to a detected failure),
  _without_ this patch (and its kernel counterpart) there's nothing to
  let the user know it failed, except filesystem errors in dmesg. Also,
  non-direct writes to the filesystem will succeed, due to how
  page-cache/writeback work; even a 'sync' command will succeed.
  
  * The case described in the above bullet was tested, and the writes
  to failed devices succeeded; after a reboot, the files written were
  present in the array, but corrupted. A user wouldn't notice that
  unless the writes were direct or some checksum was performed on the
  files. With this patch (and its kernel counterpart), writes to such a
  failed raid0/linear array are fast-failed and the filesystem goes
  read-only quickly (see the sketch below).
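
  As a rough illustration of the buffered-versus-direct distinction,
  the commands below could be run against the mounted array after a
  member is removed; the mount point /mnt/md0 and the file names are
  placeholders for this sketch:

    # Buffered write: the data lands in the page cache first, so this
    # may appear to succeed even though an array member is gone.
    dd if=/dev/zero of=/mnt/md0/buffered.img bs=1M count=4
    sync

    # Direct write (bypasses the page cache): with the patch and its
    # kernel counterpart, this should fail quickly on a broken array.
    dd if=/dev/zero of=/mnt/md0/direct.img bs=1M count=4 oflag=direct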
  
  [Test case]
  
  * To test this patch, create a raid0 or linear md array on Linux using
  mdadm, like: "mdadm --create md0 --level=0 --raid-devices=2 /dev/nvme0n1
  /dev/nvme1n1";
  
  * Format the array using a FS of your choice (for example ext4) and
  mount the array;
  
  * Remove one member of the array, for example using sysfs interface (for
  nvme: echo 1 > /sys/block/nvme0n1/device/device/remove, for scsi: echo 1
  > /sys/block/sdX/device/delete);
  
  * Without this patch, the array state shown by "mdadm --detail" is
  "clean", even though a member is missing/failed. (A consolidated
  command sketch of these steps follows.)
  
  [Regression potential]
  
  * There are mainly two potential regressions here: the first is the
  user-visible change introduced by this mdadm patch; the second is the
  possibility that the patch itself has some unnoticed bug.
  
  * For the first type of potential regression: this patch changes how
  the array state is displayed in "mdadm --detail <array>" output for
  raid0/linear arrays *only*. Currently, the tool shows just two
  states, "clean" or "active". With the patch being SRUed here, mdadm
  instead reads the sysfs array state for raid0/linear arrays. So, for
  example, it could report the "readonly" state for raid0/linear if the
  user (or some tool) changes the array to that state. This only
  affects raid0/linear; the output for other levels doesn't change at
  all.
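
  For reference, the sysfs attribute that patched mdadm consults can
  also be read directly (md0 is a placeholder):

    cat /sys/block/md0/md/array_state
    # e.g. "clean", "active", "readonly", or, with the kernel
    # counterpart in place, "broken"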
  
  * Regarding potential unnoticed issues in the code: the changes are
  mainly to structs and the "detail" command. Structs were extended
  with the new "broken" state, and the detail output was changed for
  raid0/linear as discussed in the previous bullet.
  
  * Note that we *proactively* skipped the Xenial SRU here, in order to
  prevent potential regressions - the Xenial mdadm tool lacks code
  infrastructure used by this patch, so the decision, for
  safety/stability, was to SRU only the Bionic / Disco / Eoan mdadm
  versions.
  
  [0]
  https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/commit/?id=43ebc910
  
  [Other info]
  
- As mdadm for focal (20.04) hasn't been merged yet, this will need to be
- added there during or after merge.
+ The last mdadm upload, for bug 1850540, added changes that depend on
+ as-yet-unreleased kernel changes, and this blocks any further release
+ of mdadm until the next Bionic point release; see bug 1850540 comment
+ 11. So this bug (and all future mdadm bugs for Bionic, until the next
+ point release) must carry the block-proposed-RELEASE tag(s) until the
+ kernel is released with the changes required by bug 1850540.

** Tags added: block-proposed-bionic block-proposed-disco
block-proposed-eoan

https://bugs.launchpad.net/bugs/1847924

Title:
  Introduce broken state parsing to mdadm
