** Description changed:

- [Impact]
- After installing the 4.15.0-67.76 kernel from bionic-proposed, our Nvidia DGX2 system is no longer bootable.
+ Users of RAID0 arrays are susceptible to a corruption issue if both:
+  - The members of the RAID array are not all the same size[*]
+  - Data has been written to the array both by kernels older than 3.14 and by kernels 3.14 or newer.
  
- [Test Case]
- [Fix]
- [Regression Risk]
+ Upstream is dealing with this by adding a versioned layout in v5.4, and
+ backporting that via stable. Version 1 is the pre-3.14 layout, Version 2
+ is the layout used by 3.14 and later. However, unless a
+ layout-version-aware kernel *created* the array, there's no way for the
+ kernel to know which version was used to write the existing data. This
+ undefined mode is considered "Version 0", and the kernel will now refuse
+ to start these arrays without user intervention.
+ 
+ These changes are now coming into our kernels via stable backports of
+ the following commit, which describes the problem in the commit message:
+ 
+ https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9
+ 
+ The user experience is pretty awful here. A user upgrades to the next
+ SRU and all of a sudden their system stops at an (initramfs) prompt. A
+ clueful user can spot the following in dmesg although, as you can see
+ from the log in Comment #1, it is hidden in a ton of other messages:
+ 
+ [ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting
+ [ 72.728149] md/raid0: please set raid.default_layout to 1 or 2
+ [ 72.733979] md: pers->run() failed ...
+ mdadm: failed to start array /dev/md0: Unknown error 524
+ 
+ 
+ What that is trying to say is that you should determine whether your data -
+ specifically the data toward the end of your array - was most likely written
+ with a pre-3.14 kernel or with a 3.14-or-later kernel. Based on that, reboot
+ with raid0.default_layout=1 (pre-3.14) or raid0.default_layout=2 (3.14 and
+ later) on the kernel command line. Note that the parameter is
+ *raid0.default_layout*, not *raid.default_layout* as the message says; a fix
+ for that message is now queued for stable:
+ 
+ https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571
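
For anyone triaging an affected machine, here is a rough Python sketch of the checks described above: find the refusal message in dmesg output, build the correctly spelled parameter, and see whether a kernel command line already sets it. The helper names (find_blocked_arrays, layout_parameter, cmdline_layout) are made up for illustration and are not part of mdadm or the kernel. The regular expressions assume the message wording quoted in this report.

```python
import re

# Assumption: the refusal message has the wording quoted in this bug report.
# The md device name (md0 here) varies from system to system.
FAILURE = re.compile(r"md/raid0:(\S+): cannot assemble multi-zone RAID0")

# Matches the correctly spelled parameter only (raid0, not raid).
PARAM = re.compile(r"(?:^|\s)raid0\.default_layout=(\d+)(?:\s|$)")


def find_blocked_arrays(dmesg_text):
    """Return the sorted names of md devices the kernel refused to assemble."""
    return sorted(set(FAILURE.findall(dmesg_text)))


def layout_parameter(version):
    """Build the kernel command-line parameter for layout version 1 or 2."""
    if version not in (1, 2):
        raise ValueError("layout must be 1 (pre-3.14) or 2 (3.14 and later)")
    return f"raid0.default_layout={version}"


def cmdline_layout(cmdline):
    """Return the layout version set on a kernel command line, or None."""
    match = PARAM.search(cmdline)
    return int(match.group(1)) if match else None
```

On a running system the inputs would typically be the output of dmesg and the contents of /proc/cmdline. Choosing between 1 and 2 still needs a human judgement about which kernel most likely wrote the data.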
+ 
+ [*] Surprisingly, that is not the case in this bug - the user here had a
+ raid0 of 8 identically-sized devices. I suspect there's a bug in the
+ detection code somewhere.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1849682

Title:
  [REGRESSION]  md/raid0: cannot assemble multi-zone RAID0 with
  default_layout setting

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1849682/+subscriptions

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
