** Description changed:

- [Impact]
- After installing the 4.15.0-67.76 kernel from bionic-proposed, our Nvidia DGX2 system is no longer bootable.
+ Users of RAID0 arrays are susceptible to a corruption issue if:
+  - The members of the RAID array are not all the same size[*]
+  - Data has been written to the array both while running kernels < 3.14
+    and while running kernels >= 3.14.

- [Test Case]
- [Fix]
- [Regression Risk]
+ Upstream is dealing with this by adding a versioned layout in v5.4, and
+ backporting that via stable. Version 1 is the pre-3.14 layout, Version 2
+ is the layout used from 3.14 onward. However, unless a
+ layout-version-aware kernel *created* the array, there's no way for the
+ kernel to know which version was used to write the existing data. This
+ undefined mode is considered "Version 0", and the kernel will now refuse
+ to start these arrays without user intervention.
+
+ These changes are now coming into our kernels via stable backports of
+ the following commit, which describes the problem in the commit message:
+
+ https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9
+
+ The user experience is pretty awful here. A user upgrades to the next
+ SRU and all of a sudden their system stops at an (initramfs) prompt. A
+ clueful user can spot the cause in dmesg.
+
+ Here's the message which, as you can see from the log in Comment #1, is
+ hidden in a ton of other messages:
+
+ [ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting
+ [ 72.728149] md/raid0: please set raid.default_layout to 1 or 2
+ [ 72.733979] md: pers->run() failed ...
+ mdadm: failed to start array /dev/md0: Unknown error 524
+
+ What that is trying to say is that you should determine if your data -
+ specifically the data toward the end of your array - was most likely
+ written with a pre-3.14 or post-3.14 kernel. Based on that, reboot with
+ raid0.default_layout=1 or raid0.default_layout=2 on the kernel command
+ line. Note that the parameter is *raid0.default_layout*, not
+ *raid.default_layout* as the message says - a fix for that message is
+ now queued for stable:
+
+ https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571
+
+ [*] Which surprisingly is not the case reported in this bug - the user
+ here had a raid0 of 8 identically-sized devices. I suspect there's a bug
+ in the detection code somewhere.
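
For anyone who has to dig this message out of a long boot log, here is a minimal Python sketch of the triage step described above. It only matches the exact message text quoted in this description; the function names (find_unstartable_raid0, suggested_parameter) are made up for illustration and are not part of mdadm or any kernel tooling. Deciding between layout 1 and layout 2 still has to be done by the admin, based on which kernels wrote the data.

```python
import re

# Matches the kernel message quoted in this bug description, e.g.
# "md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting"
_MULTIZONE_RE = re.compile(
    r"md/raid0:(?P<array>\w+): cannot assemble multi-zone RAID0 "
    r"with default_layout setting"
)


def find_unstartable_raid0(dmesg_text):
    """Return the sorted names of arrays the kernel refused to start.

    `dmesg_text` is the boot log as a single string (hypothetical helper).
    """
    return sorted({m.group("array") for m in _MULTIZONE_RE.finditer(dmesg_text)})


def suggested_parameter(layout_version):
    """Build the kernel command-line parameter for a chosen layout.

    1 = pre-3.14 layout, 2 = layout used from 3.14 onward. Note the
    parameter is raid0.default_layout, not raid.default_layout.
    """
    if layout_version not in (1, 2):
        raise ValueError("layout version must be 1 or 2")
    return f"raid0.default_layout={layout_version}"
```

Feeding the dmesg excerpt above to find_unstartable_raid0() should report md0; suggested_parameter(2) then gives the string to append to the kernel command line if the data was most likely written by a 3.14 or later kernel.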
--
https://bugs.launchpad.net/bugs/1849682

Title:
  [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting
