[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting
** No longer affects: ubuntu-release-notes

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1849682

Title:
  [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting

Status in linux package in Ubuntu: Fix Released
Status in linux source package in Bionic: Fix Released
Status in linux source package in Disco: Fix Released
Status in linux source package in Eoan: Fix Released
Status in linux source package in Focal: Fix Released

Bug description:
  This bug tracks the temporary revert of the upstream fix for a corruption issue. Bug 1850540 tracks the re-application of that fix once we have a full solution.

  Users of RAID0 arrays are susceptible to a corruption issue if:
   - The members of the RAID array are not all the same size[*]
   - Data has been written to the array while running kernels < 3.14 *and* >= 3.14.

  This is because of a change in v3.14 that accidentally changed how data was written - as described in the upstream commit message:
  https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9

  To summarize, upstream is dealing with this by adding a versioned layout in v5.4, and that is being backported to stable kernels - which is why we're now seeing it. Layout version 1 is the pre-3.14 layout, version 2 is post 3.14. Mixing version 1 & version 2 layouts can cause corruption. However, unless a layout-version-aware kernel *created* the array, there's no way for the kernel to know which version(s) was used to write the existing data. This undefined mode is considered "Version 0", and the kernel will now refuse to start these arrays w/o user intervention.

  The user experience is pretty awful here. A user upgrades to the next SRU and all of a sudden their system stops at an (initramfs) prompt. A clueful user can spot the following message in dmesg, although, as you can see from the log in Comment #1, it is hidden in a ton of other messages:

  [ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting
  [ 72.728149] md/raid0: please set raid.default_layout to 1 or 2
  [ 72.733979] md: pers->run() failed ...
  mdadm: failed to start array /dev/md0: Unknown error 524

  What that is trying to say is that you should determine if your data - specifically the data toward the end of your array - was most likely written with a pre-3.14 or post-3.14 kernel. Based on that, reboot with the kernel parameter raid0.default_layout=1 or raid0.default_layout=2 on the kernel command line. And note it should be *raid0.default_layout* not *raid.default_layout* as the message says - a fix for that message is now queued for stable:
  https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571

  IMHO, we should work with upstream to create a web page that clearly walks the user through this process, and update the error message to point to that page. I'd also like to see if we can detect this problem *before* the user reboots (debconf?) and help the user fix things, e.g. show "We detected that you have RAID0 arrays that may be susceptible to a corruption problem", guide the user to choosing a layout, and update the mdadm initramfs hook to poke the answer in via sysfs before starting the array on reboot.

  Note that it also seems like we should investigate backporting this to < 3.14 kernels. Imagine a user switching between the trusty HWE kernel and the GA kernel.

  References from users of other distros:
  https://blog.icod.de/2019/10/10/caution-kernel-5-3-4-and-raid0-default_layout/
  https://www.linuxquestions.org/questions/linux-general-1/raid-arrays-not-assembling-4175662774/

  [*] Which surprisingly is not the case reported in this bug - the user here had a raid0 of 8 identically-sized devices. I suspect there's a bug in the detection code somewhere.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1849682/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
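Since the bug description notes that the refusal message is easy to miss among other kernel output, here is a minimal Python sketch of how one might pick it out of a saved dmesg log and build the corrected parameter. The function names `affected_arrays` and `cmdline_hint` are hypothetical helpers invented for this illustration; only the message text and the `raid0.default_layout` parameter name come from the report above. It does not decide which layout is right for a given array; that still depends on which kernels wrote the data.

```python
import re

# Matches the refusal line quoted in the bug description, e.g.
# "[ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting"
REFUSAL = re.compile(
    r"md/raid0:(?P<array>\w+): cannot assemble multi-zone RAID0 "
    r"with default_layout setting"
)


def affected_arrays(log_text):
    """Return md array names the raid0 driver refused to start (no repeats, log order)."""
    seen = []
    for match in REFUSAL.finditer(log_text):
        name = match.group("array")
        if name not in seen:
            seen.append(name)
    return seen


def cmdline_hint(layout):
    """Build the kernel command-line parameter for a chosen layout version."""
    if layout not in (1, 2):
        raise ValueError("layout must be 1 (pre-3.14) or 2 (post-3.14)")
    # The parameter is raid0.default_layout, not raid.default_layout
    # as the kernel message quoted in the report printed.
    return f"raid0.default_layout={layout}"


if __name__ == "__main__":
    sample = (
        "[ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 "
        "with default_layout setting\n"
        "[ 72.728149] md/raid0: please set raid.default_layout to 1 or 2\n"
        "[ 72.733979] md: pers->run() failed ...\n"
    )
    for array in affected_arrays(sample):
        print(f"{array}: refused; boot with e.g. {cmdline_hint(2)} "
              "once the layout is known")
```

The layout number is left as an explicit argument because, per the description, the kernel cannot infer it for arrays it did not create.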
[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting
This bug was fixed in the package linux - 5.3.0-24.26

---
linux (5.3.0-24.26) eoan; urgency=medium

  * eoan/linux: 5.3.0-24.26 -proposed tracker (LP: #1852232)

  * Eoan update: 5.3.9 upstream stable release (LP: #1851550)
    - io_uring: fix up O_NONBLOCK handling for sockets
    - dm snapshot: introduce account_start_copy() and account_end_copy()
    - dm snapshot: rework COW throttling to fix deadlock
    - Btrfs: fix inode cache block reserve leak on failure to allocate data space
    - btrfs: qgroup: Always free PREALLOC META reserve in btrfs_delalloc_release_extents()
    - iio: adc: meson_saradc: Fix memory allocation order
    - iio: fix center temperature of bmc150-accel-core
    - libsubcmd: Make _FORTIFY_SOURCE defines dependent on the feature
    - perf tests: Avoid raising SEGV using an obvious NULL dereference
    - perf map: Fix overlapped map handling
    - perf script brstackinsn: Fix recovery from LBR/binary mismatch
    - perf jevents: Fix period for Intel fixed counters
    - perf tools: Propagate get_cpuid() error
    - perf annotate: Propagate perf_env__arch() error
    - perf annotate: Fix the signedness of failure returns
    - perf annotate: Propagate the symbol__annotate() error return
    - perf annotate: Fix arch specific ->init() failure errors
    - perf annotate: Return appropriate error code for allocation failures
    - perf annotate: Don't return -1 for error when doing BPF disassembly
    - staging: rtl8188eu: fix null dereference when kzalloc fails
    - RDMA/siw: Fix serialization issue in write_space()
    - RDMA/hfi1: Prevent memory leak in sdma_init
    - RDMA/iw_cxgb4: fix SRQ access from dump_qp()
    - RDMA/iwcm: Fix a lock inversion issue
    - HID: hyperv: Use in-place iterator API in the channel callback
    - kselftest: exclude failed TARGETS from runlist
    - selftests/kselftest/runner.sh: Add 45 second timeout per test
    - nfs: Fix nfsi->nrequests count error on nfs_inode_remove_request
    - arm64: cpufeature: Effectively expose FRINT capability to userspace
    - arm64: Fix incorrect irqflag restore for priority masking for compat
    - arm64: ftrace: Ensure synchronisation in PLT setup for Neoverse-N1 #1542419
    - tty: serial: owl: Fix the link time qualifier of 'owl_uart_exit()'
    - tty: serial: rda: Fix the link time qualifier of 'rda_uart_exit()'
    - serial/sifive: select SERIAL_EARLYCON
    - tty: n_hdlc: fix build on SPARC
    - misc: fastrpc: prevent memory leak in fastrpc_dma_buf_attach
    - RDMA/core: Fix an error handling path in 'res_get_common_doit()'
    - RDMA/cm: Fix memory leak in cm_add/remove_one
    - RDMA/nldev: Reshuffle the code to avoid need to rebind QP in error path
    - RDMA/mlx5: Do not allow rereg of a ODP MR
    - RDMA/mlx5: Order num_pending_prefetch properly with synchronize_srcu
    - RDMA/mlx5: Add missing synchronize_srcu() for MW cases
    - gpio: max77620: Use correct unit for debounce times
    - fs: cifs: mute -Wunused-const-variable message
    - arm64: vdso32: Fix broken compat vDSO build warnings
    - arm64: vdso32: Detect binutils support for dmb ishld
    - serial: mctrl_gpio: Check for NULL pointer
    - serial: 8250_omap: Fix gpio check for auto RTS/CTS
    - arm64: Default to building compat vDSO with clang when CONFIG_CC_IS_CLANG
    - arm64: vdso32: Don't use KBUILD_CPPFLAGS unconditionally
    - efi/cper: Fix endianness of PCIe class code
    - efi/x86: Do not clean dummy variable in kexec path
    - MIPS: include: Mark __cmpxchg as __always_inline
    - riscv: avoid kernel hangs when trapped in BUG()
    - riscv: avoid sending a SIGTRAP to a user thread trapped in WARN()
    - riscv: Correct the handling of unexpected ebreak in do_trap_break()
    - x86/xen: Return from panic notifier
    - ocfs2: clear zero in unaligned direct IO
    - fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry()
    - fs: ocfs2: fix a possible null-pointer dereference in ocfs2_write_end_nolock()
    - fs: ocfs2: fix a possible null-pointer dereference in ocfs2_info_scan_inode_alloc()
    - btrfs: silence maybe-uninitialized warning in clone_range
    - arm64: armv8_deprecated: Checking return value for memory allocation
    - sched/fair: Scale bandwidth quota and period without losing quota/period ratio precision
    - sched/vtime: Fix guest/system mis-accounting on task switch
    - perf/core: Rework memory accounting in perf_mmap()
    - perf/core: Fix corner case in perf_rotate_context()
    - perf/x86/amd: Change/fix NMI latency mitigation to use a timestamp
    - drm/amdgpu: fix memory leak
    - iio: imu: adis16400: release allocated memory on failure
    - iio: imu: adis16400: fix memory leak
    - iio: imu: st_lsm6dsx: fix waitime for st_lsm6dsx i2c controller
    - MIPS: include: Mark __xchg as __always_inline
    - MIPS: fw: sni: Fix out of bounds init of o32 stack
    - s390/cio: fix virtio-ccw DMA without PV
    - virt: vbox: fix memory
[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting
Aha - found the curtin installation log - this proves the theory in the previous comment:

  Oct 24 00:03:33 akis cloud-init[2796]: mdadm detail scan after assemble:
  Oct 24 00:03:33 akis cloud-init[2796]: ARRAY /dev/md/akis:0 level=raid0 num-devices=2 metadata=1.2 name=akis:0 UUID=01fd64cc:287c6b15:429f460f:3d53364b
  Oct 24 00:03:33 akis cloud-init[2796]:devices=/dev/nvme0n1p2,/dev/nvme1n1
  Oct 24 00:03:33 akis cloud-init[2796]: ARRAY /dev/md/akis:1 level=raid0 num-devices=8 metadata=1.2 name=akis:1 UUID=567cc4bb:9ab3f3ac:5d876c61:320a20ad
  Oct 24 00:03:33 akis cloud-init[2796]: devices=/dev/nvme2n1,/dev/nvme3n1,/dev/nvme4n1,/dev/nvme5n1,/dev/nvme6n1,/dev/nvme7n1,/dev/nvme8n1,/dev/nvme9n1

-- 
Status in Release Notes for Ubuntu: New
Status in linux package in Ubuntu: Incomplete
Status in linux source package in Bionic: Fix Released
Status in linux source package in Disco: Fix Released
Status in linux source package in Eoan: Fix Released
Status in linux source package in Focal: Incomplete
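The curtin log in this comment can be read mechanically. The following Python sketch parses `ARRAY` / `devices=` lines of the shape shown in that log and flags raid0 arrays that mix a partition with a whole disk, which is the situation the comment points to. The helper names are hypothetical, and the partition check is only a naming heuristic for NVMe devices (`nvmeXnYpZ`); it is an assumption for illustration, not how mdadm or the kernel decides anything.

```python
import re

# Optional syslog-style prefix such as "Oct 24 00:03:33 akis cloud-init[2796]:"
PREFIX = re.compile(r"^.*?cloud-init\[\d+\]:\s*")
# Heuristic (assumption): NVMe partitions are named like nvme0n1p2.
NVME_PARTITION = re.compile(r"nvme\d+n\d+p\d+$")


def parse_mdadm_scan(text):
    """Parse ARRAY/devices= lines into a list of dicts, one per array."""
    arrays = []
    for raw in text.splitlines():
        line = PREFIX.sub("", raw).strip()
        if line.startswith("ARRAY "):
            fields = line.split()
            entry = {"path": fields[1], "devices": []}
            for field in fields[2:]:
                key, _, value = field.partition("=")
                entry[key] = value
            arrays.append(entry)
        elif line.startswith("devices=") and arrays:
            # The device list belongs to the most recent ARRAY line.
            arrays[-1]["devices"] = line[len("devices="):].split(",")
    return arrays


def mixes_partition_and_disk(array):
    """True if a raid0 array combines partitions with whole NVMe devices."""
    if array.get("level") != "raid0":
        return False
    kinds = {bool(NVME_PARTITION.search(dev)) for dev in array["devices"]}
    return len(kinds) > 1
```

Run against the log above, the two-device array `/dev/md/akis:0` would be flagged and the eight-device array would not, which matches the observation that the complaint concerns the smaller array.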
[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting
In Comment #3, I noted that it was mysterious that we were seeing this at all on the reported system - but, after staring at the log, I think I have an explanation for that now.

This system is supposed to be configured to have 8 identical NVMe drives in a raid0 mounted at /raid. There are also 2 other NVMes in this system which are supposed to have partitions configured in a raid1 for /. At least, that is what was *supposed* to be the case. A filtered version of the log in comment #1 shows:

  [ 16.757165] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting
  [ 16.757165] md/raid0: please set raid.default_layout to 1 or 2
  [ 16.757166] md: pers->run() failed ...
  [ 19.051379] md1: detected capacity change from 0 to 30724962910208
  [ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting
  [ 72.728149] md/raid0: please set raid.default_layout to 1 or 2
  [ 72.733979] md: pers->run() failed ...

While not explicit, we see that md1 is ginormous - matching the capacity we'd expect for the 8 drive raid0 that's supposed to be mounted at /raid. However, the md/raid0 driver is actually complaining about md*0*. I'm guessing that md0 is the array of 2 partitions that was supposed to be a raid1 mounted at /, but was misconfigured as a raid0. And it therefore makes sense that it is a multi-zone array, as we see that only one NVMe seems to be partitioned:

  [ 16.541847] nvme1n1: p1 p2

Presumably nvme1n1p1 is used as the EFI System Partition, and presumably nvme1n1p2 was combined with the full nvme0 block device to form a heterogeneous raid0, which would therefore be multi-zone.

-- 
Status in Release Notes for Ubuntu: New
Status in linux package in Ubuntu: Incomplete
Status in linux source package in Bionic: Fix Released
Status in linux source package in Disco: Fix Released
Status in linux source package in Eoan: Fix Released
Status in linux source package in Focal: Incomplete
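To illustrate why a raid0 built from a partition and a whole device ends up multi-zone, as the comment in this message argues, here is a simplified Python model of how member sizes split into striping zones. It is a sketch under stated assumptions: the function names are hypothetical, sizes are in arbitrary units, and chunk-size rounding is ignored, so it is not the kernel's actual zone-building code.

```python
def raid0_zones(member_sizes):
    """Split a raid0 with the given member sizes into striping zones.

    Returns a list of (devices_in_zone, size_per_device) tuples. The first
    zone stripes across every member up to the smallest member's size; each
    later zone stripes only across members that still have unused space.
    Simplified model: ignores chunk-size rounding.
    """
    zones = []
    consumed = 0
    for size in sorted(set(member_sizes)):
        # Members at least this large still contribute to this zone.
        ndev = sum(1 for s in member_sizes if s >= size)
        zones.append((ndev, size - consumed))
        consumed = size
    return zones


def is_multi_zone(member_sizes):
    """An array is multi-zone when its members are not all the same size."""
    return len(raid0_zones(member_sizes)) > 1


if __name__ == "__main__":
    # Eight identical members: a single zone, so no layout ambiguity.
    print(raid0_zones([1000] * 8))
    # A small partition plus a larger whole device: two zones.
    print(raid0_zones([100, 250]))
```

In this model the eight identical drives form one zone, while a 100-unit partition paired with a 250-unit device yields a two-device zone followed by a one-device zone, which is the multi-zone case the driver refuses to assemble without a layout setting.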
[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting
** Also affects: ubuntu-release-notes
   Importance: Undecided
       Status: New

** Bug watch added: Debian Bug tracker #944676
   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=944676

** Also affects: mdadm (Debian) via
   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=944676
   Importance: Unknown
       Status: Unknown

** No longer affects: mdadm (Debian)

-- 
Status in Release Notes for Ubuntu: New
Status in linux package in Ubuntu: Incomplete
Status in linux source package in Bionic: Fix Released
Status in linux source package in Disco: Fix Released
Status in linux source package in Eoan: Fix Released
Status in linux source package in Focal: Incomplete
[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting
This bug was fixed in the package linux - 4.15.0-69.78

---
linux (4.15.0-69.78) bionic; urgency=medium

  * KVM NULL pointer deref (LP: #1851205)
    - KVM: nVMX: handle page fault in vmread fix

  * CVE-2018-12207
    - KVM: MMU: drop vcpu param in gpte_access
    - kvm: Convert kvm_lock to a mutex
    - kvm: x86: Do not release the page inside mmu_set_spte()
    - KVM: x86: make FNAME(fetch) and __direct_map more similar
    - KVM: x86: remove now unneeded hugepage gfn adjustment
    - KVM: x86: change kvm_mmu_page_get_gfn BUG_ON to WARN_ON
    - KVM: x86: add tracepoints around __direct_map and FNAME(fetch)
    - kvm: x86, powerpc: do not allow clearing largepages debugfs entry
    - SAUCE: KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active
    - SAUCE: x86: Add ITLB_MULTIHIT bug infrastructure
    - SAUCE: kvm: mmu: ITLB_MULTIHIT mitigation
    - SAUCE: kvm: Add helper function for creating VM worker threads
    - SAUCE: kvm: x86: mmu: Recovery of shattered NX large pages
    - SAUCE: cpu/speculation: Uninline and export CPU mitigations helpers
    - SAUCE: kvm: x86: mmu: Apply global mitigations knob to ITLB_MULTIHIT

  * CVE-2019-11135
    - KVM: x86: use Intel speculation bugs and features as derived in generic x86 code
    - x86/msr: Add the IA32_TSX_CTRL MSR
    - x86/cpu: Add a helper function x86_read_arch_cap_msr()
    - x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default
    - x86/speculation/taa: Add mitigation for TSX Async Abort
    - x86/speculation/taa: Add sysfs reporting for TSX Async Abort
    - kvm/x86: Export MDS_NO=0 to guests when TSX is enabled
    - x86/tsx: Add "auto" option to the tsx= cmdline parameter
    - x86/speculation/taa: Add documentation for TSX Async Abort
    - x86/tsx: Add config options to set tsx=on|off|auto
    - SAUCE: x86/speculation/taa: Call tsx_init()
    - SAUCE: x86/cpu: Include cpu header from bugs.c
    - [Config] Disable TSX by default when possible

  * CVE-2019-0154
    - SAUCE: drm/i915: Lower RM timeout to avoid DSI hard hangs
    - SAUCE: drm/i915/gen8+: Add RC6 CTX corruption WA

  * CVE-2019-0155
    - drm/i915/gtt: Add read only pages to gen8_pte_encode
    - drm/i915/gtt: Read-only pages for insert_entries on bdw+
    - drm/i915/gtt: Disable read-only support under GVT
    - drm/i915: Prevent writing into a read-only object via a GGTT mmap
    - drm/i915/cmdparser: Check reg_table_count before derefencing.
    - drm/i915/cmdparser: Do not check past the cmd length.
    - drm/i915: Silence smatch for cmdparser
    - drm/i915: Move engine->needs_cmd_parser to engine->flags
    - SAUCE: drm/i915: Rename gen7 cmdparser tables
    - SAUCE: drm/i915: Disable Secure Batches for gen6+
    - SAUCE: drm/i915: Remove Master tables from cmdparser
    - SAUCE: drm/i915: Add support for mandatory cmdparsing
    - SAUCE: drm/i915: Support ro ppgtt mapped cmdparser shadow buffers
    - SAUCE: drm/i915: Allow parsing of unsized batches
    - SAUCE: drm/i915: Add gen9 BCS cmdparsing
    - SAUCE: drm/i915/cmdparser: Use explicit goto for error paths
    - SAUCE: drm/i915/cmdparser: Add support for backward jumps
    - SAUCE: drm/i915/cmdparser: Ignore Length operands during command matching

linux (4.15.0-68.77) bionic; urgency=medium

  * bionic/linux: 4.15.0-68.77 -proposed tracker (LP: #1849855)

  * [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting (LP: #1849682)
    - Revert "md/raid0: avoid RAID0 data corruption due to layout confusion."

linux (4.15.0-67.76) bionic; urgency=medium

  * bionic/linux: 4.15.0-67.76 -proposed tracker (LP: #1849035)

  * Unexpected CFS throttling (LP: #1832151)
    - sched/fair: Add lsub_positive() and use it consistently
    - sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices
    - sched/fair: Fix -Wunused-but-set-variable warnings

  * [CML] New device IDs for CML-U (LP: #1843774)
    - i2c: i801: Add support for Intel Comet Lake
    - spi: pxa2xx: Add support for Intel Comet Lake

  * CVE-2019-17666
    - SAUCE: rtlwifi: rtl8822b: Fix potential overflow on P2P code
    - SAUCE: rtlwifi: Fix potential overflow on P2P code

  * md raid0/linear doesn't show error state if an array member is removed and allows successful writes (LP: #1847773)
    - md raid0/linear: Mark array as 'broken' and fail BIOs if a member is gone

  * Change Config Option CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE for s390x from yes to no (LP: #1848492)
    - [Config] Change Config Option CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE for s390x from yes to no

  * [Packaging] Support building Flattened Image Tree (FIT) kernels (LP: #1847969)
    - [Packaging] add rules to build FIT image
    - [Packaging] force creation of headers directory

  * bcache: Performance degradation when querying priority_stats (LP: #1840043)
    - bcache: add cond_resched() in __bch_cache_cmp()

  * Add installer support
[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting
This bug was fixed in the package linux - 5.3.0-22.24

---
linux (5.3.0-22.24) eoan; urgency=medium

  * [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting (LP: #1849682)
    - Revert "md/raid0: avoid RAID0 data corruption due to layout confusion."

  * refcount underflow and type confusion in shiftfs (LP: #1850867) // CVE-2019-15793
    - SAUCE: shiftfs: Correct id translation for lower fs operations
    - SAUCE: shiftfs: prevent type confusion
    - SAUCE: shiftfs: Fix refcount underflow in btrfs ioctl handling

  * CVE-2018-12207
    - kvm: x86, powerpc: do not allow clearing largepages debugfs entry
    - SAUCE: KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active
    - SAUCE: x86: Add ITLB_MULTIHIT bug infrastructure
    - SAUCE: kvm: mmu: ITLB_MULTIHIT mitigation
    - SAUCE: kvm: Add helper function for creating VM worker threads
    - SAUCE: kvm: x86: mmu: Recovery of shattered NX large pages
    - SAUCE: cpu/speculation: Uninline and export CPU mitigations helpers
    - SAUCE: kvm: x86: mmu: Apply global mitigations knob to ITLB_MULTIHIT

  * CVE-2019-11135
    - x86/msr: Add the IA32_TSX_CTRL MSR
    - x86/cpu: Add a helper function x86_read_arch_cap_msr()
    - x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default
    - x86/speculation/taa: Add mitigation for TSX Async Abort
    - x86/speculation/taa: Add sysfs reporting for TSX Async Abort
    - kvm/x86: Export MDS_NO=0 to guests when TSX is enabled
    - x86/tsx: Add "auto" option to the tsx= cmdline parameter
    - x86/speculation/taa: Add documentation for TSX Async Abort
    - x86/tsx: Add config options to set tsx=on|off|auto
    - [Config] Disable TSX by default when possible

  * CVE-2019-0154
    - SAUCE: drm/i915: Lower RM timeout to avoid DSI hard hangs
    - SAUCE: drm/i915/gen8+: Add RC6 CTX corruption WA

  * CVE-2019-0155
    - SAUCE: drm/i915: Rename gen7 cmdparser tables
    - SAUCE: drm/i915: Disable Secure Batches for gen6+
    - SAUCE: drm/i915: Remove Master tables from cmdparser
    - SAUCE: drm/i915: Add support for mandatory cmdparsing
    - SAUCE: drm/i915: Support ro ppgtt mapped cmdparser shadow buffers
    - SAUCE: drm/i915: Allow parsing of unsized batches
    - SAUCE: drm/i915: Add gen9 BCS cmdparsing
    - SAUCE: drm/i915/cmdparser: Use explicit goto for error paths
    - SAUCE: drm/i915/cmdparser: Add support for backward jumps
    - SAUCE: drm/i915/cmdparser: Ignore Length operands during command matching

linux (5.3.0-21.22) eoan; urgency=medium

  * eoan/linux: 5.3.0-21.22 -proposed tracker (LP: #1850486)

  * Fix signing of staging modules in eoan (LP: #1850234)
    - [Packaging] Leave unsigned modules unsigned after adding .gnu_debuglink

linux (5.3.0-20.21) eoan; urgency=medium

  * eoan/linux: 5.3.0-20.21 -proposed tracker (LP: #1849064)

  * eoan: alsa/sof: Enable SOF_HDA link and codec (LP: #1848490)
    - [Config] Enable SOF_HDA link and codec

  * Eoan update: 5.3.7 upstream stable release (LP: #1848750)
    - panic: ensure preemption is disabled during panic()
    - [Config] updateconfigs for USB_RIO500
    - USB: rio500: Remove Rio 500 kernel driver
    - USB: yurex: Don't retry on unexpected errors
    - USB: yurex: fix NULL-derefs on disconnect
    - USB: usb-skeleton: fix runtime PM after driver unbind
    - USB: usb-skeleton: fix NULL-deref on disconnect
    - xhci: Fix false warning message about wrong bounce buffer write length
    - xhci: Prevent device initiated U1/U2 link pm if exit latency is too long
    - xhci: Check all endpoints for LPM timeout
    - xhci: Fix USB 3.1 capability detection on early xHCI 1.1 spec based hosts
    - usb: xhci: wait for CNR controller not ready bit in xhci resume
    - xhci: Prevent deadlock when xhci adapter breaks during init
    - xhci: Fix NULL pointer dereference in xhci_clear_tt_buffer_complete()
    - USB: adutux: fix use-after-free on disconnect
    - USB: adutux: fix NULL-derefs on disconnect
    - USB: adutux: fix use-after-free on release
    - USB: iowarrior: fix use-after-free on disconnect
    - USB: iowarrior: fix use-after-free on release
    - USB: iowarrior: fix use-after-free after driver unbind
    - USB: usblp: fix runtime PM after driver unbind
    - USB: chaoskey: fix use-after-free on release
    - USB: ldusb: fix NULL-derefs on driver unbind
    - serial: uartlite: fix exit path null pointer
    - serial: uartps: Fix uartps_major handling
    - USB: serial: keyspan: fix NULL-derefs on open() and write()
    - USB: serial: ftdi_sio: add device IDs for Sienna and Echelon PL-20
    - USB: serial: option: add Telit FN980 compositions
    - USB: serial: option: add support for Cinterion CLS8 devices
    - USB: serial: fix runtime PM after driver unbind
    - USB: usblcd: fix I/O after disconnect
    - USB: microtek: fix info-leak at probe
    - USB: dummy-hcd: fix power budget for SuperSpeed mode
    - usb: renesas_usbhs: gadget: Do not discard queues
[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting
This bug was fixed in the package linux - 5.0.0-35.38

---------------
linux (5.0.0-35.38) disco; urgency=medium

  * [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting (LP: #1849682)
    - SAUCE: Fix revert "md/raid0: avoid RAID0 data corruption due to layout confusion."

  * refcount underflow and type confusion in shiftfs (LP: #1850867) // CVE-2019-15793
    - SAUCE: shiftfs: Correct id translation for lower fs operations
    - SAUCE: shiftfs: prevent type confusion
    - SAUCE: shiftfs: Fix refcount underflow in btrfs ioctl handling

  * CVE-2018-12207
    - kvm: Convert kvm_lock to a mutex
    - kvm: x86: Do not release the page inside mmu_set_spte()
    - KVM: x86: make FNAME(fetch) and __direct_map more similar
    - KVM: x86: remove now unneeded hugepage gfn adjustment
    - KVM: x86: change kvm_mmu_page_get_gfn BUG_ON to WARN_ON
    - KVM: x86: add tracepoints around __direct_map and FNAME(fetch)
    - kvm: x86, powerpc: do not allow clearing largepages debugfs entry
    - SAUCE: KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active
    - SAUCE: x86: Add ITLB_MULTIHIT bug infrastructure
    - SAUCE: kvm: mmu: ITLB_MULTIHIT mitigation
    - SAUCE: kvm: Add helper function for creating VM worker threads
    - SAUCE: kvm: x86: mmu: Recovery of shattered NX large pages
    - SAUCE: cpu/speculation: Uninline and export CPU mitigations helpers
    - SAUCE: kvm: x86: mmu: Apply global mitigations knob to ITLB_MULTIHIT

  * CVE-2019-11135
    - KVM: x86: use Intel speculation bugs and features as derived in generic x86 code
    - x86/msr: Add the IA32_TSX_CTRL MSR
    - x86/cpu: Add a helper function x86_read_arch_cap_msr()
    - x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default
    - x86/speculation/taa: Add mitigation for TSX Async Abort
    - x86/speculation/taa: Add sysfs reporting for TSX Async Abort
    - kvm/x86: Export MDS_NO=0 to guests when TSX is enabled
    - x86/tsx: Add "auto" option to the tsx= cmdline parameter
    - x86/speculation/taa: Add documentation for TSX Async Abort
    - x86/tsx: Add config options to set tsx=on|off|auto
    - SAUCE: x86/speculation/taa: Call tsx_init()
    - [Config] Disable TSX by default when possible

  * CVE-2019-0154
    - SAUCE: drm/i915: Lower RM timeout to avoid DSI hard hangs
    - SAUCE: drm/i915/gen8+: Add RC6 CTX corruption WA

  * CVE-2019-0155
    - SAUCE: drm/i915: Rename gen7 cmdparser tables
    - SAUCE: drm/i915: Disable Secure Batches for gen6+
    - SAUCE: drm/i915: Remove Master tables from cmdparser
    - SAUCE: drm/i915: Add support for mandatory cmdparsing
    - SAUCE: drm/i915: Support ro ppgtt mapped cmdparser shadow buffers
    - SAUCE: drm/i915: Allow parsing of unsized batches
    - SAUCE: drm/i915: Add gen9 BCS cmdparsing
    - SAUCE: drm/i915/cmdparser: Use explicit goto for error paths
    - SAUCE: drm/i915/cmdparser: Add support for backward jumps
    - SAUCE: drm/i915/cmdparser: Ignore Length operands during command matching

linux (5.0.0-34.36) disco; urgency=medium

  * disco/linux: -proposed tracker (LP: #1850574)

  * [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting (LP: #1849682)
    - Revert "md/raid0: avoid RAID0 data corruption due to layout confusion."

linux (5.0.0-33.35) disco; urgency=medium

  * disco/linux: 5.0.0-33.35 -proposed tracker (LP: #1849003)

  * Disco update: upstream stable patchset 2019-10-18 (LP: #1848817)
    - tpm: use tpm_try_get_ops() in tpm-sysfs.c.
    - drm/bridge: tc358767: Increase AUX transfer length limit
    - drm/panel: simple: fix AUO g185han01 horizontal blanking
    - video: ssd1307fb: Start page range at page_offset
    - drm/stm: attach gem fence to atomic state
    - drm/panel: check failure cases in the probe func
    - drm/rockchip: Check for fast link training before enabling psr
    - drm/radeon: Fix EEH during kexec
    - gpu: drm: radeon: Fix a possible null-pointer dereference in radeon_connector_set_property()
    - PCI: rpaphp: Avoid a sometimes-uninitialized warning
    - ipmi_si: Only schedule continuously in the thread in maintenance mode
    - clk: qoriq: Fix -Wunused-const-variable
    - clk: sunxi-ng: v3s: add missing clock slices for MMC2 module clocks
    - drm/amd/display: fix issue where 252-255 values are clipped
    - drm/amd/display: reprogram VM config when system resume
    - powerpc/powernv/ioda2: Allocate TCE table levels on demand for default DMA window
    - clk: actions: Don't reference clk_init_data after registration
    - clk: sirf: Don't reference clk_init_data after registration
    - clk: sprd: Don't reference clk_init_data after registration
    - clk: zx296718: Don't reference clk_init_data after registration
    - powerpc/xmon: Check for HV mode when dumping XIVE info from OPAL
    - powerpc/rtas: use device model APIs and serialization during LPM
    - powerpc/futex: Fix warning: 'oldval' may be used
[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting
** Changed in: linux (Ubuntu Disco)
       Status: Incomplete => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1849682

Title:
  [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting

Status in linux package in Ubuntu:
  Incomplete
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Disco:
  Fix Committed
Status in linux source package in Eoan:
  Incomplete
Status in linux source package in Focal:
  Incomplete

Bug description:
  This bug tracks the temporary revert of the upstream fix for a
  corruption issue. Bug 1850540 tracks the re-application of that fix
  once we have a full solution.

  Users of RAID0 arrays are susceptible to a corruption issue if:
   - The members of the RAID array are not all the same size[*]
   - Data has been written to the array while running kernels < 3.14 *and* >= 3.14.

  This is because of a change in v3.14 that accidentally changed how
  data was written - as described in the upstream commit message:
  https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9

  To summarize, upstream is dealing with this by adding a versioned
  layout in v5.4, and that is being backported to stable kernels - which
  is why we're now seeing it. Layout version 1 is the pre-3.14 layout,
  version 2 is post 3.14. Mixing version 1 & version 2 layouts can cause
  corruption. However, unless a layout-version-aware kernel *created*
  the array, there's no way for the kernel to know which version(s) was
  used to write the existing data. This undefined mode is considered
  "Version 0", and the kernel will now refuse to start these arrays w/o
  user intervention.

  The user experience is pretty awful here. A user upgrades to the next
  SRU and all of a sudden their system stops at an (initramfs) prompt.
  A clueful user can spot something like the following in dmesg.

  Here's the message which, as you can see from the log in Comment #1,
  is hidden in a ton of other messages:

  [ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting
  [ 72.728149] md/raid0: please set raid.default_layout to 1 or 2
  [ 72.733979] md: pers->run() failed ...
  mdadm: failed to start array /dev/md0: Unknown error 524

  What that is trying to say is that you should determine if your data -
  specifically the data toward the end of your array - was most likely
  written with a pre-3.14 or post-3.14 kernel. Based on that, reboot
  with the kernel parameter raid0.default_layout=1 or
  raid0.default_layout=2 on the kernel command line. (And note it should
  be *raid0.default_layout* not *raid.default_layout* as the message
  says - a fix for that message is now queued for stable:
  https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571)

  IMHO, we should work with upstream to create a web page that clearly
  walks the user through this process, and update the error message to
  point to that page. I'd also like to see if we can detect this problem
  *before* the user reboots (debconf?) and help the user fix things,
  e.g. "We detected that you have RAID0 arrays that may be susceptible
  to a corruption problem", guide the user to choosing a layout, and
  update the mdadm initramfs hook to poke the answer in via sysfs before
  starting the array on reboot.

  Note that it also seems like we should investigate backporting this to
  < 3.14 kernels. Imagine a user switching between the trusty HWE kernel
  and the GA kernel.

  References from users of other distros:
  https://blog.icod.de/2019/10/10/caution-kernel-5-3-4-and-raid0-default_layout/
  https://www.linuxquestions.org/questions/linux-general-1/raid-arrays-not-assembling-4175662774/

  [*] Which surprisingly is not the case reported in this bug - the user
  here had a raid0 of 8 identically-sized devices. I suspect there's a
  bug in the detection code somewhere.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1849682/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
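Since the bug description points out that the refusal message is easy to miss among the other boot messages, here is a minimal sketch in Python of picking it out of saved dmesg output and listing the two candidate kernel parameters. The helper names find_layout_failures and candidate_parameters are hypothetical, made up for illustration; they are not part of mdadm or the kernel. The regular expression assumes the exact message wording quoted in the description.

```python
import re

# Matches the refusal line quoted in the bug description and captures the
# md device name (md0 in the report). The wording is an assumption taken
# from that quoted log; other kernels may phrase it differently.
LAYOUT_ERROR = re.compile(
    r"md/raid0:(?P<array>\w+): cannot assemble multi-zone RAID0 "
    r"with default_layout setting"
)

# Sample log lines copied from the bug description.
SAMPLE = """\
[ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting
[ 72.728149] md/raid0: please set raid.default_layout to 1 or 2
[ 72.733979] md: pers->run() failed ...
"""


def find_layout_failures(dmesg_text):
    """Return the md device names that hit the default_layout refusal."""
    return [m.group("array") for m in LAYOUT_ERROR.finditer(dmesg_text)]


def candidate_parameters():
    """Return the two kernel command-line options named in the description.

    Note the module name is raid0, not raid as the kernel message says.
    """
    return ["raid0.default_layout=1", "raid0.default_layout=2"]


if __name__ == "__main__":
    for array in find_layout_failures(SAMPLE):
        print(array, "needs one of:", ", ".join(candidate_parameters()))
```

Which of the two values is right still depends on which kernel wrote most of the data, as described above; the sketch only locates the message.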
[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting
I can 100% reproduce this on 5.0.0-34, but not at all on 5.0.0-32, so
marking verification-failed.

** Tags removed: verification-needed-disco
** Tags added: verification-failed-disco

Status in linux package in Ubuntu:
  Incomplete
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Disco:
  Incomplete
Status in linux source package in Eoan:
  Incomplete
Status in linux source package in Focal:
  Incomplete
[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting
The disco story is not so pretty:

[0.00] Linux version 5.0.0-34-generic (buildd@lgw01-amd64-051) (gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)) #36~18.04.1-Ubuntu SMP Wed Oct 30 08:08:56 UTC 2019 (Ubuntu 5.0.0-34.36~18.04.1-generic 5.0.21)
[0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-5.0.0-34-generic root=UUID=e3626652-82d8-4e95-a6d7-e4920d7941b6 ro console=tty1 console=ttyS0
[0.00] KERNEL supported cpus:
[0.00] Intel GenuineIntel
[0.00] AMD AuthenticAMD
[0.00] Hygon HygonGenuine
[0.00] Centaur CentaurHauls
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009fbff] usable
[0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000f-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0x1ffd7fff] usable
[0.00] BIOS-e820: [mem 0x1ffd8000-0x1fff] reserved
[0.00] BIOS-e820: [mem 0xfeffc000-0xfeff] reserved
[0.00] BIOS-e820: [mem 0xfffc-0x] reserved
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.8 present.
[0.00] DMI: OpenStack Foundation OpenStack Nova, BIOS 1.10.2-1ubuntu1 04/01/2014
[0.00] Hypervisor detected: KVM
[0.00] kvm-clock: Using msrs 4b564d01 and 4b564d00
[0.00] kvm-clock: cpu 0, msr e601001, primary cpu clock
[0.00] kvm-clock: using sched offset of 79214193145 cycles
[0.02] clocksource: kvm-clock: mask: 0x max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[0.04] tsc: Detected 2298.666 MHz processor
[0.002336] last_pfn = 0x1ffd8 max_arch_pfn = 0x4
[0.002446] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
[0.008265] found SMP MP-table at [mem 0x000f6a60-0x000f6a6f]
[0.008362] check: Scanning 1 areas for low memory corruption
[0.008413] Using GB pages for direct mapping
[0.008561] RAMDISK: [mem 0x1d8fb000-0x1ecacfff]
[0.008579] ACPI: Early table checksum verification disabled
[0.008637] ACPI: RSDP 0x000F6860 14 (v00 BOCHS )
[0.008640] ACPI: RSDT 0x1FFE1505 2C (v01 BOCHS BXPCRSDT 0001 BXPC 0001)
[0.008646] ACPI: FACP 0x1FFE1419 74 (v01 BOCHS BXPCFACP 0001 BXPC 0001)
[0.008654] ACPI: DSDT 0x1FFE0040 0013D9 (v01 BOCHS BXPCDSDT 0001 BXPC 0001)
[0.008657] ACPI: FACS 0x1FFE 40
[0.008660] ACPI: APIC 0x1FFE148D 78 (v01 BOCHS BXPCAPIC 0001 BXPC 0001)
[0.009155] No NUMA configuration found
[0.009157] Faking a node at [mem 0x-0x1ffd7fff]
[0.009167] NODE_DATA(0) allocated [mem 0x1ffad000-0x1ffd7fff]
[0.009417] Zone ranges:
[0.009418] DMA [mem 0x1000-0x00ff]
[0.009420] DMA32 [mem 0x0100-0x1ffd7fff]
[0.009421] Normal empty
[0.009422] Device empty
[0.009423] Movable zone start for each node
[0.009427] Early memory node ranges
[0.009428] node 0: [mem 0x1000-0x0009efff]
[0.009429] node 0: [mem 0x0010-0x1ffd7fff]
[0.009433] Zeroed struct page in unavailable ranges: 98 pages
[0.009434] Initmem setup node 0 [mem 0x1000-0x1ffd7fff]
[0.013701] ACPI: PM-Timer IO Port: 0x608
[0.013719] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[0.013776] IOAPIC[0]: apic_id 0, version 17, address 0xfec0, GSI 0-23
[0.013778] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[0.013780] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[0.013781] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[0.013783] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[0.013784] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[0.013789] Using ACPI (MADT) for SMP configuration information
[0.013792] smpboot: Allowing 1 CPUs, 0 hotplug CPUs
[0.013814] PM: Registered nosave memory: [mem 0x-0x0fff]
[0.013815] PM: Registered nosave memory: [mem 0x0009f000-0x0009]
[0.013816] PM: Registered nosave memory: [mem 0x000a-0x000e]
[0.013817] PM: Registered nosave memory: [mem 0x000f-0x000f]
[0.013819] [mem 0x2000-0xfeffbfff] available for PCI devices
[0.013820] Booting paravirtualized
[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting
= Verification =
I see 2 pieces to this:

1) The original report, in Comment #1, where the offending patch caused
an issue on a system where it shouldn't have - i.e., a raid0 w/
homogeneous member sizes. We were never able to reproduce this in
subsequent tests w/ the patch applied. I know sfeole was able to perform
the same MAAS install/upgrade w/ the current -proposed kernel (I saw a
test report from it), so I think we can confidently say it is not
reproducible in that build either.

2) In configs where this patch *should* prevent a raid0 from assembling
(heterogeneous sizes), I've verified that if I create such an array on
an older kernel, then upgrade to the current -proposed kernel, it starts
automatically. Now, of course, I continue to be susceptible to
corruption, but that's known and tracked in bug 1850540.

$ cat /proc/version
Linux version 4.15.0-66-generic (buildd@lgw01-amd64-044) (gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)) #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019
$ sudo mdadm --create /dev/md0 --run --metadata=default --homehost=akis --level=0 --raid-devices=2 /dev/vdb1 /dev/vdc1
mdadm: /dev/vdb1 appears to be part of a raid array:
       level=raid0 devices=2 ctime=Thu Oct 31 21:53:40 2019
mdadm: /dev/vdc1 appears to be part of a raid array:
       level=raid0 devices=2 ctime=Thu Oct 31 21:53:40 2019
mdadm: array /dev/md0 started.
$ sudo reboot

$ cat /proc/version
Linux version 4.15.0-68-generic (buildd@lgw01-amd64-037) (gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)) #77-Ubuntu SMP Sun Oct 27 06:02:23 UTC 2019
$ cat /proc/mdstat
Personalities : [raid0] [linear] [multipath] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : active raid0 vdc1[1] vdb1[0]
      1567744 blocks super 1.2 512k chunks

unused devices: <none>

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1849682

Title:
  [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting

Status in linux package in Ubuntu:
  Incomplete
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Disco:
  Incomplete
Status in linux source package in Eoan:
  Incomplete
Status in linux source package in Focal:
  Incomplete
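The verification comment above distinguishes arrays with equal-sized members from arrays with unequal-sized ("multi-zone") members. A minimal sketch in Python of that distinction follows. The helpers count_zones and is_multi_zone are hypothetical names for illustration, the member sizes in the usage lines are made-up numbers, and rounding each member down to a whole number of chunks before comparing is an assumption about how md treats member sizes; the bug description itself only says "not all the same size".

```python
def count_zones(member_sectors, chunk_sectors):
    """Count distinct member sizes after rounding down to whole chunks.

    member_sectors: sizes of the RAID0 members, in 512-byte sectors.
    chunk_sectors: chunk size in sectors (a 512k chunk is 1024 sectors).
    The rounding step is an assumption, not taken from the bug report.
    """
    if chunk_sectors <= 0:
        raise ValueError("chunk_sectors must be positive")
    rounded = {(size // chunk_sectors) * chunk_sectors for size in member_sectors}
    return len(rounded)


def is_multi_zone(member_sectors, chunk_sectors):
    """True when the members fall into more than one size class."""
    return count_zones(member_sectors, chunk_sectors) > 1


if __name__ == "__main__":
    # Hypothetical sizes: one member twice as large as the other.
    print(is_multi_zone([2048000, 1024000], 1024))
    # Hypothetical sizes: three identical members.
    print(is_multi_zone([4096, 4096, 4096], 1024))
```

A pre-reboot check of the kind suggested in the bug description could use a test like this to decide whether to ask the user for a layout at all; arrays whose members all land in one size class would not need the question.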
[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting
@dannf: Ping for verification :)

** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic

Status in linux package in Ubuntu:
  Incomplete
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Disco:
  Incomplete
Status in linux source package in Eoan:
  Incomplete
Status in linux source package in Focal:
  Incomplete
[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag 'verification-needed-disco'
to 'verification-done-disco'. If the problem still exists, change the
tag 'verification-needed-disco' to 'verification-failed-disco'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how
to enable and use -proposed. Thank you!

** Tags added: verification-needed-disco

Status in linux package in Ubuntu:
  Incomplete
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Disco:
  Incomplete
Status in linux source package in Eoan:
  Incomplete
Status in linux source package in Focal:
  Incomplete
[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting
** Description changed: + This bug tracks the temporary revert of the upstream fix for a + corruption issue. Bug 1850540 tracks the re-application of that fix once + we have a full solution. + Users of RAID0 arrays are susceptible to a corruption issue if: - The members of the RAID array are not all the same size[*] - Data has been written to the array while running kernels < 3.14 *and* >= 3.14. This is because of an change in v3.14 that accidentally changed how data was written - as described in the upstream commit message: https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9 To summarize, upstream is dealing with this by adding a versioned layout in v5.4, and that is being backported to stable kernels - which is why we're now seeing it. Layout version 1 is the pre-3.14 layout, version 2 is post 3.14. Mixing version 1 & version 2 layouts can cause corruption. However, unless a layout-version-aware kernel *created* the array, there's no way for the kernel to know which version(s) was used to write the existing data. This undefined mode is considered "Version 0", and the kernel will now refuse to start these arrays w/o user intervention. The user experience is pretty awful here. A user upgrades to the next SRU and all of a sudden their system stops at an (initramfs) prompt. A clueful user can spot something like the following in dmesg: Here's the message which , as you can see from the log in Comment #1, is hidden in a ton of other messages: [ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting [ 72.728149] md/raid0: please set raid.default_layout to 1 or 2 [ 72.733979] md: pers->run() failed ... mdadm: failed to start array /dev/md0: Unknown error 524 What that is trying to say is that you should determine if your data - specifically the data toward the end of your array - was most likely written with a pre-3.14 or post-3.14 kernel. 
Based on that, reboot with the kernel parameter raid0.default_layout=1 or raid0.default_layout=2 on the kernel command line. And note it should be *raid0.default_layout* not *raid.default_layout* as the message says - a fix for that message is now queued for stable: https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571) IMHO, we should work with upstream to create a web page that clearly walks the user through this process, and update the error message to point to that page. I'd also like to see if we can detect this problem *before* the user reboots (debconf?) and help the user fix things. e.g. "We detected that you have RAID0 arrays that maybe susceptible to a corruption problem", guide the user to choosing a layout, and update the mdadm initramfs hook to poke the answer in via sysfs before starting the array on reboot. Note that it also seems like we should investigate backporting this to < 3.14 kernels. Imagine a user switching between the trusty HWE kernel and the GA kernel. References from users of other distros: https://blog.icod.de/2019/10/10/caution-kernel-5-3-4-and-raid0-default_layout/ https://www.linuxquestions.org/questions/linux-general-1/raid-arrays-not-assembling-4175662774/ [*] Which surprisingly is not the case reported in this bug - the user here had a raid0 of 8 identically-sized devices. I suspect there's a bug in the detection code somewhere. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. 
https://bugs.launchpad.net/bugs/1849682 Title: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting Status in linux package in Ubuntu: Incomplete Status in linux source package in Bionic: Fix Committed Status in linux source package in Disco: Incomplete Status in linux source package in Eoan: Incomplete Status in linux source package in Focal: Incomplete
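The description's idea of having the mdadm initramfs hook "poke the answer in via sysfs" could look roughly like the Python sketch below. This is an illustration only. The path /sys/module/raid0/parameters/default_layout is assumed from the usual sysfs convention for module parameters, the function name is hypothetical, and a real hook would more likely be a shell script. Writing the value needs root, the raid0 module loaded, and a parameter that is writable at runtime.

```python
from pathlib import Path

# Assumed location, following the usual sysfs layout for module parameters.
DEFAULT_LAYOUT_PARAM = Path("/sys/module/raid0/parameters/default_layout")

# Meanings as given in the bug description.
VALID_LAYOUTS = {1: "pre-3.14 layout", 2: "post-3.14 layout"}


def set_default_layout(layout: int, param_path: Path = DEFAULT_LAYOUT_PARAM) -> None:
    """Write the chosen layout version to the raid0 module parameter.

    Only 1 and 2 are accepted; 0 is the undefined "Version 0" state that
    makes the kernel refuse to start multi-zone arrays.
    """
    if layout not in VALID_LAYOUTS:
        raise ValueError(f"layout must be 1 or 2, got {layout!r}")
    param_path.write_text(f"{layout}\n")


if __name__ == "__main__":
    # Dry run against a scratch file instead of the real sysfs node.
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        scratch = Path(tmp) / "default_layout"
        set_default_layout(2, scratch)
        print(scratch.read_text().strip(), "=", VALID_LAYOUTS[2])
```

Taking the path as an argument keeps the sketch testable without touching the running system. The layout would have to be set before the array is assembled.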
[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'. If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you! ** Tags added: verification-needed-bionic -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1849682 Title: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting Status in linux package in Ubuntu: Incomplete Status in linux source package in Bionic: Fix Committed Status in linux source package in Disco: Incomplete Status in linux source package in Eoan: Incomplete Status in linux source package in Focal: Incomplete
[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting
** Changed in: linux (Ubuntu Bionic) Status: Confirmed => Fix Committed -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1849682 Title: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting Status in linux package in Ubuntu: Incomplete Status in linux source package in Bionic: Fix Committed Status in linux source package in Disco: Incomplete Status in linux source package in Eoan: Incomplete Status in linux source package in Focal: Incomplete
[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting
I've proposed changing the error message to point to a webpage w/ a better explanation: https://marc.info/?l=linux-raid&m=157196348406853&w=2 -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1849682 Title: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting Status in linux package in Ubuntu: Incomplete Status in linux source package in Bionic: Confirmed Status in linux source package in Disco: Incomplete Status in linux source package in Eoan: Incomplete Status in linux source package in Focal: Incomplete
[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting
** Description changed: Users of RAID0 arrays are susceptible to a corruption issue if: - The members of the RAID array are not all the same size[*] - Data has been written to the array while running kernels < 3.14 *and* >= 3.14. - Upstream is dealing with this by adding a versioned layout in v5.4, and - backporting that via stable. Version 1 is the pre-3.14 layout, Version 2 + This is because of an change in v3.14 that accidentally changed how data was written - as described in the upstream commit message: + https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9 + + To summarize, upstream is dealing with this by adding a versioned layout + in v5.4, and that is being backported to stable kernels - which is why + we're now seeing it. Layout version 1 is the pre-3.14 layout, version 2 is post 3.14. Mixing version 1 & version 2 layouts can cause corruption. However, unless a layout-version-aware kernel *created* the array, there's no way for the kernel to know which version(s) was used to write the existing data. This undefined mode is considered "Version 0", and the kernel will now refuse to start these arrays w/o user intervention. - - These changes are now coming into our kernels via stable backports of - the following commit, which describes the problem in the commit message: - - https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9 The user experience is pretty awful here. A user upgrades to the next SRU and all of a sudden their system stops at an (initramfs) prompt. A clueful user can spot something like the following in dmesg: Here's the message which , as you can see from the log in Comment #1, is hidden in a ton of other messages: [ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting [ 72.728149] md/raid0: please set raid.default_layout to 1 or 2 [ 72.733979] md: pers->run() failed ... 
mdadm: failed to start array /dev/md0: Unknown error 524 What that is trying to say is that you should determine if your data - specifically the data toward the end of your array - was most likely written with a pre-3.14 or post-3.14 kernel. Based on that, reboot with the kernel parameter raid0.default_layout=1 or raid0.default_layout=2 on the kernel command line. And note it should be *raid0.default_layout* not *raid.default_layout* as the message says - a fix for that message is now queued for stable: https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571) IMHO, we should work with upstream to create a web page that clearly walks the user through this process, and update the error message to point to that page. I'd also like to see if we can detect this problem *before* the user reboots (debconf?) and help the user fix things. e.g. "We detected that you have RAID0 arrays that maybe susceptible to a corruption problem", guide the user to choosing a layout, and update the mdadm initramfs hook to poke the answer in via sysfs before starting the array on reboot. + Note that it also seems like we should investigate backporting this to < + 3.14 kernels. Imagine a user switching between the trusty HWE kernel and + the GA kernel. + References from users of other distros: https://blog.icod.de/2019/10/10/caution-kernel-5-3-4-and-raid0-default_layout/ https://www.linuxquestions.org/questions/linux-general-1/raid-arrays-not-assembling-4175662774/ [*] Which surprisingly is not the case reported in this bug - the user here had a raid0 of 8 identically-sized devices. I suspect there's a bug in the detection code somewhere. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. 
https://bugs.launchpad.net/bugs/1849682 Title: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting Status in linux package in Ubuntu: Incomplete Status in linux source package in Bionic: Confirmed Status in linux source package in Disco: Incomplete Status in linux source package in Eoan: Incomplete Status in linux source package in Focal: Incomplete
[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting
** Description changed: Users of RAID0 arrays are susceptible to a corruption issue if: - - The members of the RAID array are not all the same size[*] - - Data has been written to the array while running kernels < 3.14 and >= 3.14. + - The members of the RAID array are not all the same size[*] + - Data has been written to the array while running kernels < 3.14 and >= 3.14. Upstream is dealing with this by adding a versioned layout in v5.4, and backporting that via stable. Version 1 is the pre-3.14 layout, Version 2 - is post 3.14. However, unless a layout-version-aware kernel *created* - the array, there's no way for the kernel to know which version was used - to write the existing data. This undefined mode is considered "Version - 0", and the kernel will now refuse to start these arrays w/o user - intervention. + is post 3.14. Mixing version 1 & version 2 layouts can cause corruption. + However, unless a layout-version-aware kernel *created* the array, + there's no way for the kernel to know which version(s) was used to write + the existing data. This undefined mode is considered "Version 0", and + the kernel will now refuse to start these arrays w/o user intervention. These changes are now coming into our kernels via stable backports of the following commit, which describes the problem in the commit message: https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9 The user experience is pretty awful here. A user upgrades to the next SRU and all of a sudden their system stops at an (initramfs) prompt. A clueful user can spot something like the following in dmesg: Here's the message which , as you can see from the log in Comment #1, is hidden in a ton of other messages: [ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting [ 72.728149] md/raid0: please set raid.default_layout to 1 or 2 [ 72.733979] md: pers->run() failed ... 
mdadm: failed to start array /dev/md0: Unknown error 524 - - What that is trying to say is that you should determine if your data - specifically the data toward the end of your array - was most likely written with a pre-3.14 or post-3.14 kernel. Based on that, reboot with the kernel parameter raid0.default_layout=1 or raid0.default_layout=2 on the kernel command line. And note it should be *raid0.default_layout* not *raid.default_layout* as the message says - a fix for that message is now queued for stable. + What that is trying to say is that you should determine if your data - + specifically the data toward the end of your array - was most likely + written with a pre-3.14 or post-3.14 kernel. Based on that, reboot with + the kernel parameter raid0.default_layout=1 or raid0.default_layout=2 on + the kernel command line. And note it should be *raid0.default_layout* + not *raid.default_layout* as the message says - a fix for that message + is now queued for stable: https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571) + + IMHO, we should work with upstream to create a web page that clearly + walks the user through this process, and update the error message to + point to that page. I'd also like to see if we can detect this problem + *before* the user reboots (debconf?) and help the user fix things. e.g. + "We detected that you have RAID0 arrays that maybe susceptible to a + corruption problem", guide the user to choosing a layout, and update the + mdadm initramfs hook to poke the answer in via sysfs before starting the + array on reboot. + [*] Which surprisingly is not the case reported in this bug - the user here had a raid0 of 8 identically-sized devices. I suspect there's a bug in the detection code somewhere. 
** Description changed: Users of RAID0 arrays are susceptible to a corruption issue if: - The members of the RAID array are not all the same size[*] - - Data has been written to the array while running kernels < 3.14 and >= 3.14. + - Data has been written to the array while running kernels < 3.14 *and* >= 3.14. Upstream is dealing with this by adding a versioned layout in v5.4, and backporting that via stable. Version 1 is the pre-3.14 layout, Version 2 is post 3.14. Mixing version 1 & version 2 layouts can cause corruption. However, unless a layout-version-aware kernel *created* the array, there's no way for the kernel to know which version(s) was used to write the existing data. This undefined mode is considered "Version 0", and the kernel will now refuse to start these arrays w/o user intervention. These changes are now coming into our kernels via stable backports of the following commit, which describes the problem in the commit message: https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9 The user experience is pretty awful here. A user upgrades to the next SRU and all of a sudden their system stops at an (initramfs) prompt. A
[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting
** Description changed: - [Impact] - After installing the 4.15.0-67.76 kernel from bionic-proposed, our Nvidia DGX2 system is no longer bootable. + Users of RAID0 arrays are susceptible to a corruption issue if: + - The members of the RAID array are not all the same size[*] + - Data has been written to the array while running kernels < 3.14 and >= 3.14. - [Test Case] - [Fix] - [Regression Risk] + Upstream is dealing with this by adding a versioned layout in v5.4, and + backporting that via stable. Version 1 is the pre-3.14 layout, Version 2 + is post 3.14. However, unless a layout-version-aware kernel *created* + the array, there's no way for the kernel to know which version was used + to write the existing data. This undefined mode is considered "Version + 0", and the kernel will now refuse to start these arrays w/o user + intervention. + + These changes are now coming into our kernels via stable backports of + the following commit, which describes the problem in the commit message: + + https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9 + + The user experience is pretty awful here. A user upgrades to the next + SRU and all of a sudden their system stops at an (initramfs) prompt. A + clueful user can spot something like the following in dmesg: + + Here's the message which , as you can see from the log in Comment #1, is + hidden in a ton of other messages: + + [ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting + [ 72.728149] md/raid0: please set raid.default_layout to 1 or 2 + [ 72.733979] md: pers->run() failed ... + mdadm: failed to start array /dev/md0: Unknown error 524 + + + What that is trying to say is that you should determine if your data - specifically the data toward the end of your array - was most likely written with a pre-3.14 or post-3.14 kernel. Based on that, reboot with the kernel parameter raid0.default_layout=1 or raid0.default_layout=2 on the kernel command line. 
And note it should be *raid0.default_layout* not *raid.default_layout* as the message says - a fix for that message is now queued for stable. + + https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571) + + [*] Which surprisingly is not the case reported in this bug - the user + here had a raid0 of 8 identically-sized devices. I suspect there's a bug + in the detection code somewhere. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1849682 Title: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting Status in linux package in Ubuntu: Incomplete Status in linux source package in Bionic: Confirmed Status in linux source package in Disco: Incomplete Status in linux source package in Eoan: Incomplete Status in linux source package in Focal: Incomplete
[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting
OK - this is a messy one. It is due to the backport of this:

https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9

Reverting that is probably not the right answer because the point of it is to avoid corruption. But this is a pretty serious usability issue. It is not at all clear from the message that a user needs to do *something* - and what that *something* is is even less clear.

Here's the message, buried in a ton of other messages:

[ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting
[ 72.728149] md/raid0: please set raid.default_layout to 1 or 2
[ 72.733979] md: pers->run() failed ...
mdadm: failed to start array /dev/md0: Unknown error 524

So if you understand from that that you need to pass a kernel parameter, you're more intuitive than I am. And if you understand from that *why*, and *to which one* - well, you probably wrote the patch. And even then, you probably didn't realize the parameter name in the message is actually incorrect (HINT: we should backport this as well: https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571).

IMO, the error message should include a URL to a page with clear steps on how to proceed, which I think would be something along the lines of: use mdadm to figure out when your array was created, figure out what kernel you were running back then (ideally with a mapping to Ubuntu release), and then how to fix it.

That said, it isn't clear to me why we saw this issue on this specific machine. This issue is supposedly restricted to multi-zone RAID0 configs, which should only happen if not all members are the same size. But I happen to know that all members on this system *are* the same size! I've tried to reproduce it but, after redeploying the system with MAAS, it upgrades and reboots w/o error :(
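Since the message is so easy to miss, here is a minimal sketch (Python, not part of mdadm or any Ubuntu tooling) of how one might pick it out of dmesg output and print the option with the corrected name. The function names are hypothetical, and the pattern is taken only from the log excerpt quoted in this thread, so other kernel versions may word it differently.

```python
import re

# Pattern based on the log excerpt quoted in this thread (assumption:
# other kernels may phrase the message differently).
_FAIL_RE = re.compile(
    r"md/raid0:(?P<dev>\w+): cannot assemble multi-zone RAID0 "
    r"with default_layout setting"
)


def find_blocked_raid0_arrays(dmesg_text):
    """Return md device names (e.g. 'md0') the kernel refused to start."""
    seen = []
    for match in _FAIL_RE.finditer(dmesg_text):
        dev = match.group("dev")
        if dev not in seen:  # report each array once, in log order
            seen.append(dev)
    return seen


def suggest_kernel_parameter(layout):
    """Build the command-line option for layout 1 (pre-3.14) or 2 (post-3.14)."""
    if layout not in (1, 2):
        raise ValueError("layout must be 1 or 2")
    # 'raid0.', not 'raid.' as the uncorrected kernel message prints.
    return f"raid0.default_layout={layout}"


SAMPLE_LOG = """\
[   72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting
[   72.728149] md/raid0: please set raid.default_layout to 1 or 2
[   72.733979] md: pers->run() failed ...
"""

if __name__ == "__main__":
    for dev in find_blocked_raid0_arrays(SAMPLE_LOG):
        print(f"/dev/{dev}: boot with {suggest_kernel_parameter(1)} "
              f"or {suggest_kernel_parameter(2)}")
```

Which of the two values is right still depends on which kernel most likely wrote the data; the sketch only finds the affected arrays and spells the option correctly.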