[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting

2019-12-06 Thread dann frazier
** No longer affects: ubuntu-release-notes

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1849682

Title:
  [REGRESSION]  md/raid0: cannot assemble multi-zone RAID0 with
  default_layout setting

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  Fix Released
Status in linux source package in Disco:
  Fix Released
Status in linux source package in Eoan:
  Fix Released
Status in linux source package in Focal:
  Fix Released

Bug description:
  This bug tracks the temporary revert of the upstream fix for a
  corruption issue. Bug 1850540 tracks the re-application of that fix
  once we have a full solution.

  Users of RAID0 arrays are susceptible to a corruption issue if:
   - The members of the RAID array are not all the same size[*]
   - Data has been written to the array while running kernels < 3.14 *and* >= 3.14.

  This is because of a change in v3.14 that accidentally changed how data
  was written - as described in the upstream commit message:
  https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9

  To summarize, upstream is dealing with this by adding a versioned
  layout in v5.4, and that is being backported to stable kernels - which
  is why we're now seeing it. Layout version 1 is the pre-3.14 layout,
  version 2 is post-3.14. Mixing version 1 & version 2 layouts can cause
  corruption. However, unless a layout-version-aware kernel *created* the
  array, there's no way for the kernel to know which version(s) were used
  to write the existing data. This undefined mode is considered
  "Version 0", and the kernel will now refuse to start these arrays w/o
  user intervention.
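
  To make that concrete, here is roughly what the relevant knobs look like
  on a layout-aware kernel (a sketch - the module parameter is the one
  added by the upstream commit above, and md0 is just an example device):

  # What layout will the kernel assume for arrays with no recorded version?
  $ cat /sys/module/raid0/parameters/default_layout
  0
  # 0 means undefined ("Version 0") - multi-zone arrays will be refused.
  # An array created by a layout-aware kernel records its version:
  $ cat /sys/block/md0/md/layout
  2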

  The user experience is pretty awful here. A user upgrades to the next
  SRU and all of a sudden their system stops at an (initramfs) prompt. A
  clueful user can spot something like the following in dmesg:

  Here's the message which, as you can see from the log in Comment #1, is
  hidden in a ton of other messages:

  [ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting
  [ 72.728149] md/raid0: please set raid.default_layout to 1 or 2
  [ 72.733979] md: pers->run() failed ...
  mdadm: failed to start array /dev/md0: Unknown error 524

  What that is trying to say is that you should determine if your data -
  specifically the data toward the end of your array - was most likely
  written with a pre-3.14 or post-3.14 kernel. ("Unknown error 524" is the
  kernel's ENOTSUPP leaking out to mdadm.) Based on that, reboot with the
  kernel parameter raid0.default_layout=1 or raid0.default_layout=2 on the
  kernel command line - see the sketch below. And note it should be
  *raid0.default_layout* not *raid.default_layout* as the message says - a
  fix for that message is now queued for stable:

  https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571
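
  For example, to pin the post-3.14 layout persistently (a sketch - this
  assumes GRUB, and that you've determined the data on the array was only
  ever written by kernels >= 3.14):

  $ sudo sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="/&raid0.default_layout=2 /' /etc/default/grub
  $ sudo update-grub
  $ sudo reboot
  # For a one-off test, instead edit the kernel line from the GRUB menu
  # ('e') and append raid0.default_layout=2 before booting.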

  IMHO, we should work with upstream to create a web page that clearly
  walks the user through this process, and update the error message to
  point to that page. I'd also like to see if we can detect this problem
  *before* the user reboots (debconf?) and help the user fix things.
  e.g. "We detected that you have RAID0 arrays that maybe susceptible to
  a corruption problem", guide the user to choosing a layout, and update
  the mdadm initramfs hook to poke the answer in via sysfs before
  starting the array on reboot.
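
  Something like the following in the mdadm initramfs script is what I
  have in mind (a sketch - LAYOUT_ANSWER stands in for a hypothetical
  stored debconf answer; the sysfs path is the raid0 module parameter):

  # Before arrays are assembled in the initramfs:
  if [ -w /sys/module/raid0/parameters/default_layout ]; then
      echo "$LAYOUT_ANSWER" > /sys/module/raid0/parameters/default_layout
  fi
  mdadm --assemble --scan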

  Note that it also seems like we should investigate backporting this to
  < 3.14 kernels. Imagine a user switching between the trusty HWE kernel
  and the GA kernel.

  References from users of other distros:
  https://blog.icod.de/2019/10/10/caution-kernel-5-3-4-and-raid0-default_layout/
  https://www.linuxquestions.org/questions/linux-general-1/raid-arrays-not-assembling-4175662774/

  [*] Which surprisingly is not the case reported in this bug - the user
  here had a raid0 of 8 identically-sized devices. I suspect there's a
  bug in the detection code somewhere.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1849682/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting

2019-12-06 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 5.3.0-24.26

---
linux (5.3.0-24.26) eoan; urgency=medium

  * eoan/linux: 5.3.0-24.26 -proposed tracker (LP: #1852232)

  * Eoan update: 5.3.9 upstream stable release (LP: #1851550)
- io_uring: fix up O_NONBLOCK handling for sockets
- dm snapshot: introduce account_start_copy() and account_end_copy()
- dm snapshot: rework COW throttling to fix deadlock
- Btrfs: fix inode cache block reserve leak on failure to allocate data space
- btrfs: qgroup: Always free PREALLOC META reserve in
  btrfs_delalloc_release_extents()
- iio: adc: meson_saradc: Fix memory allocation order
- iio: fix center temperature of bmc150-accel-core
- libsubcmd: Make _FORTIFY_SOURCE defines dependent on the feature
- perf tests: Avoid raising SEGV using an obvious NULL dereference
- perf map: Fix overlapped map handling
- perf script brstackinsn: Fix recovery from LBR/binary mismatch
- perf jevents: Fix period for Intel fixed counters
- perf tools: Propagate get_cpuid() error
- perf annotate: Propagate perf_env__arch() error
- perf annotate: Fix the signedness of failure returns
- perf annotate: Propagate the symbol__annotate() error return
- perf annotate: Fix arch specific ->init() failure errors
- perf annotate: Return appropriate error code for allocation failures
- perf annotate: Don't return -1 for error when doing BPF disassembly
- staging: rtl8188eu: fix null dereference when kzalloc fails
- RDMA/siw: Fix serialization issue in write_space()
- RDMA/hfi1: Prevent memory leak in sdma_init
- RDMA/iw_cxgb4: fix SRQ access from dump_qp()
- RDMA/iwcm: Fix a lock inversion issue
- HID: hyperv: Use in-place iterator API in the channel callback
- kselftest: exclude failed TARGETS from runlist
- selftests/kselftest/runner.sh: Add 45 second timeout per test
- nfs: Fix nfsi->nrequests count error on nfs_inode_remove_request
- arm64: cpufeature: Effectively expose FRINT capability to userspace
- arm64: Fix incorrect irqflag restore for priority masking for compat
- arm64: ftrace: Ensure synchronisation in PLT setup for Neoverse-N1 #1542419
- tty: serial: owl: Fix the link time qualifier of 'owl_uart_exit()'
- tty: serial: rda: Fix the link time qualifier of 'rda_uart_exit()'
- serial/sifive: select SERIAL_EARLYCON
- tty: n_hdlc: fix build on SPARC
- misc: fastrpc: prevent memory leak in fastrpc_dma_buf_attach
- RDMA/core: Fix an error handling path in 'res_get_common_doit()'
- RDMA/cm: Fix memory leak in cm_add/remove_one
- RDMA/nldev: Reshuffle the code to avoid need to rebind QP in error path
- RDMA/mlx5: Do not allow rereg of a ODP MR
- RDMA/mlx5: Order num_pending_prefetch properly with synchronize_srcu
- RDMA/mlx5: Add missing synchronize_srcu() for MW cases
- gpio: max77620: Use correct unit for debounce times
- fs: cifs: mute -Wunused-const-variable message
- arm64: vdso32: Fix broken compat vDSO build warnings
- arm64: vdso32: Detect binutils support for dmb ishld
- serial: mctrl_gpio: Check for NULL pointer
- serial: 8250_omap: Fix gpio check for auto RTS/CTS
- arm64: Default to building compat vDSO with clang when CONFIG_CC_IS_CLANG
- arm64: vdso32: Don't use KBUILD_CPPFLAGS unconditionally
- efi/cper: Fix endianness of PCIe class code
- efi/x86: Do not clean dummy variable in kexec path
- MIPS: include: Mark __cmpxchg as __always_inline
- riscv: avoid kernel hangs when trapped in BUG()
- riscv: avoid sending a SIGTRAP to a user thread trapped in WARN()
- riscv: Correct the handling of unexpected ebreak in do_trap_break()
- x86/xen: Return from panic notifier
- ocfs2: clear zero in unaligned direct IO
- fs: ocfs2: fix possible null-pointer dereferences in
  ocfs2_xa_prepare_entry()
- fs: ocfs2: fix a possible null-pointer dereference in
  ocfs2_write_end_nolock()
- fs: ocfs2: fix a possible null-pointer dereference in
  ocfs2_info_scan_inode_alloc()
- btrfs: silence maybe-uninitialized warning in clone_range
- arm64: armv8_deprecated: Checking return value for memory allocation
- sched/fair: Scale bandwidth quota and period without losing quota/period
  ratio precision
- sched/vtime: Fix guest/system mis-accounting on task switch
- perf/core: Rework memory accounting in perf_mmap()
- perf/core: Fix corner case in perf_rotate_context()
- perf/x86/amd: Change/fix NMI latency mitigation to use a timestamp
- drm/amdgpu: fix memory leak
- iio: imu: adis16400: release allocated memory on failure
- iio: imu: adis16400: fix memory leak
- iio: imu: st_lsm6dsx: fix waitime for st_lsm6dsx i2c controller
- MIPS: include: Mark __xchg as __always_inline
- MIPS: fw: sni: Fix out of bounds init of o32 stack
- s390/cio: fix virtio-ccw DMA without PV
- virt: vbox: fix memory 

[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting

2019-11-13 Thread dann frazier
Aha - found the curtin installation log - this proves the theory in the
previous comment:

Oct 24 00:03:33 akis cloud-init[2796]: mdadm detail scan after assemble:
Oct 24 00:03:33 akis cloud-init[2796]: ARRAY /dev/md/akis:0 level=raid0 num-devices=2 metadata=1.2 name=akis:0 UUID=01fd64cc:287c6b15:429f460f:3d53364b
Oct 24 00:03:33 akis cloud-init[2796]:    devices=/dev/nvme0n1p2,/dev/nvme1n1
Oct 24 00:03:33 akis cloud-init[2796]: ARRAY /dev/md/akis:1 level=raid0 num-devices=8 metadata=1.2 name=akis:1 UUID=567cc4bb:9ab3f3ac:5d876c61:320a20ad
Oct 24 00:03:33 akis cloud-init[2796]:    devices=/dev/nvme2n1,/dev/nvme3n1,/dev/nvme4n1,/dev/nvme5n1,/dev/nvme6n1,/dev/nvme7n1,/dev/nvme8n1,/dev/nvme9n1


[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting

2019-11-13 Thread dann frazier
In Comment #3, I noted that it was mysterious that we were seeing this
at all on the reported system - but, after staring at the log, I think
I've an explanation for that now.

This system is supposed to be configured to have 8 identical NVMe drives
in a raid0 mounted at /raid. There are also 2 other NVMes in this system
which are supposed to have partitions configured in a raid1 for /. At
least, that is what was *supposed* to be the case.

A filtered version of the log in comment #1 shows:
[   16.757165] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting
[   16.757165] md/raid0: please set raid.default_layout to 1 or 2
[   16.757166] md: pers->run() failed ...
[   19.051379] md1: detected capacity change from 0 to 30724962910208
[   72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting
[   72.728149] md/raid0: please set raid.default_layout to 1 or 2
[   72.733979] md: pers->run() failed ...

While not explicit, we see that md1 is ginormous - matching the capacity
we'd expect for the 8 drive raid0 that's supposed to be mounted at
/raid. However, the md/raid0 driver is actually complaining about md*0*.
I'm guessing that md0 is the array of 2 partitions that was supposed to
be a raid1 mounted at /, but was misconfigured as a raid0. And it
therefore makes sense that it is a multi-zone array, as we see that only
one NVMe seems to be partitioned:

[   16.541847]  nvme1n1: p1 p2

Presumably nvme1n1p1 is used as the EFI System Partition, and presumably
nvme1n1p2 was combined with the full nvme0 block device to form a
heterogeneous raid0, which would therefore be multi-zone.
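
A quick way to test that theory would be to compare the member sizes,
since md/raid0 only builds multiple zones when they differ (a sketch -
the device names are the guesses from above):

$ lsblk -b -o NAME,SIZE /dev/nvme0n1 /dev/nvme1n1p2
# If the sizes differ, raid0 lays out two zones: zone 0 stripes across
# both members, and zone 1 is the leftover tail of the larger member -
# the region whose mapping changed in v3.14.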


[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting

2019-11-13 Thread dann frazier
** Also affects: ubuntu-release-notes
   Importance: Undecided
   Status: New

** Bug watch added: Debian Bug tracker #944676
   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=944676

** Also affects: mdadm (Debian) via
   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=944676
   Importance: Unknown
   Status: Unknown

** No longer affects: mdadm (Debian)


[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting

2019-11-12 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 4.15.0-69.78

---
linux (4.15.0-69.78) bionic; urgency=medium

  * KVM NULL pointer deref (LP: #1851205)
- KVM: nVMX: handle page fault in vmread fix

  * CVE-2018-12207
- KVM: MMU: drop vcpu param in gpte_access
- kvm: Convert kvm_lock to a mutex
- kvm: x86: Do not release the page inside mmu_set_spte()
- KVM: x86: make FNAME(fetch) and __direct_map more similar
- KVM: x86: remove now unneeded hugepage gfn adjustment
- KVM: x86: change kvm_mmu_page_get_gfn BUG_ON to WARN_ON
- KVM: x86: add tracepoints around __direct_map and FNAME(fetch)
- kvm: x86, powerpc: do not allow clearing largepages debugfs entry
- SAUCE: KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is
  active
- SAUCE: x86: Add ITLB_MULTIHIT bug infrastructure
- SAUCE: kvm: mmu: ITLB_MULTIHIT mitigation
- SAUCE: kvm: Add helper function for creating VM worker threads
- SAUCE: kvm: x86: mmu: Recovery of shattered NX large pages
- SAUCE: cpu/speculation: Uninline and export CPU mitigations helpers
- SAUCE: kvm: x86: mmu: Apply global mitigations knob to ITLB_MULTIHIT

  * CVE-2019-11135
- KVM: x86: use Intel speculation bugs and features as derived in generic x86 code
- x86/msr: Add the IA32_TSX_CTRL MSR
- x86/cpu: Add a helper function x86_read_arch_cap_msr()
- x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default
- x86/speculation/taa: Add mitigation for TSX Async Abort
- x86/speculation/taa: Add sysfs reporting for TSX Async Abort
- kvm/x86: Export MDS_NO=0 to guests when TSX is enabled
- x86/tsx: Add "auto" option to the tsx= cmdline parameter
- x86/speculation/taa: Add documentation for TSX Async Abort
- x86/tsx: Add config options to set tsx=on|off|auto
- SAUCE: x86/speculation/taa: Call tsx_init()
- SAUCE: x86/cpu: Include cpu header from bugs.c
- [Config] Disable TSX by default when possible

  * CVE-2019-0154
- SAUCE: drm/i915: Lower RM timeout to avoid DSI hard hangs
- SAUCE: drm/i915/gen8+: Add RC6 CTX corruption WA

  * CVE-2019-0155
- drm/i915/gtt: Add read only pages to gen8_pte_encode
- drm/i915/gtt: Read-only pages for insert_entries on bdw+
- drm/i915/gtt: Disable read-only support under GVT
- drm/i915: Prevent writing into a read-only object via a GGTT mmap
- drm/i915/cmdparser: Check reg_table_count before derefencing.
- drm/i915/cmdparser: Do not check past the cmd length.
- drm/i915: Silence smatch for cmdparser
- drm/i915: Move engine->needs_cmd_parser to engine->flags
- SAUCE: drm/i915: Rename gen7 cmdparser tables
- SAUCE: drm/i915: Disable Secure Batches for gen6+
- SAUCE: drm/i915: Remove Master tables from cmdparser
- SAUCE: drm/i915: Add support for mandatory cmdparsing
- SAUCE: drm/i915: Support ro ppgtt mapped cmdparser shadow buffers
- SAUCE: drm/i915: Allow parsing of unsized batches
- SAUCE: drm/i915: Add gen9 BCS cmdparsing
- SAUCE: drm/i915/cmdparser: Use explicit goto for error paths
- SAUCE: drm/i915/cmdparser: Add support for backward jumps
- SAUCE: drm/i915/cmdparser: Ignore Length operands during command matching

linux (4.15.0-68.77) bionic; urgency=medium

  * bionic/linux: 4.15.0-68.77 -proposed tracker (LP: #1849855)

  * [REGRESSION]  md/raid0: cannot assemble multi-zone RAID0 with default_layout setting (LP: #1849682)
- Revert "md/raid0: avoid RAID0 data corruption due to layout confusion."

linux (4.15.0-67.76) bionic; urgency=medium

  * bionic/linux: 4.15.0-67.76 -proposed tracker (LP: #1849035)

  * Unexpected CFS throttling  (LP: #1832151)
- sched/fair: Add lsub_positive() and use it consistently
- sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices
- sched/fair: Fix -Wunused-but-set-variable warnings

  * [CML] New device IDs for CML-U (LP: #1843774)
- i2c: i801: Add support for Intel Comet Lake
- spi: pxa2xx: Add support for Intel Comet Lake

  * CVE-2019-17666
- SAUCE: rtlwifi: rtl8822b: Fix potential overflow on P2P code
- SAUCE: rtlwifi: Fix potential overflow on P2P code

  * md raid0/linear doesn't show error state if an array member is removed and allows successful writes (LP: #1847773)
- md raid0/linear: Mark array as 'broken' and fail BIOs if a member is gone

  * Change Config Option CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE for s390x from yes to no (LP: #1848492)
- [Config] Change Config Option CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE for s390x from yes to no

  * [Packaging] Support building Flattened Image Tree (FIT) kernels (LP: #1847969)
- [Packaging] add rules to build FIT image
- [Packaging] force creation of headers directory

  * bcache: Performance degradation when querying priority_stats (LP: #1840043)
- bcache: add cond_resched() in __bch_cache_cmp()

  * Add installer support 

[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting

2019-11-12 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 5.3.0-22.24

---
linux (5.3.0-22.24) eoan; urgency=medium

  * [REGRESSION]  md/raid0: cannot assemble multi-zone RAID0 with default_layout setting (LP: #1849682)
- Revert "md/raid0: avoid RAID0 data corruption due to layout confusion."

  * refcount underflow and type confusion in shiftfs (LP: #1850867) // CVE-2019-15793
- SAUCE: shiftfs: Correct id translation for lower fs operations
- SAUCE: shiftfs: prevent type confusion
- SAUCE: shiftfs: Fix refcount underflow in btrfs ioctl handling

  * CVE-2018-12207
- kvm: x86, powerpc: do not allow clearing largepages debugfs entry
- SAUCE: KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is
  active
- SAUCE: x86: Add ITLB_MULTIHIT bug infrastructure
- SAUCE: kvm: mmu: ITLB_MULTIHIT mitigation
- SAUCE: kvm: Add helper function for creating VM worker threads
- SAUCE: kvm: x86: mmu: Recovery of shattered NX large pages
- SAUCE: cpu/speculation: Uninline and export CPU mitigations helpers
- SAUCE: kvm: x86: mmu: Apply global mitigations knob to ITLB_MULTIHIT

  * CVE-2019-11135
- x86/msr: Add the IA32_TSX_CTRL MSR
- x86/cpu: Add a helper function x86_read_arch_cap_msr()
- x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default
- x86/speculation/taa: Add mitigation for TSX Async Abort
- x86/speculation/taa: Add sysfs reporting for TSX Async Abort
- kvm/x86: Export MDS_NO=0 to guests when TSX is enabled
- x86/tsx: Add "auto" option to the tsx= cmdline parameter
- x86/speculation/taa: Add documentation for TSX Async Abort
- x86/tsx: Add config options to set tsx=on|off|auto
- [Config] Disable TSX by default when possible

  * CVE-2019-0154
- SAUCE: drm/i915: Lower RM timeout to avoid DSI hard hangs
- SAUCE: drm/i915/gen8+: Add RC6 CTX corruption WA

  * CVE-2019-0155
- SAUCE: drm/i915: Rename gen7 cmdparser tables
- SAUCE: drm/i915: Disable Secure Batches for gen6+
- SAUCE: drm/i915: Remove Master tables from cmdparser
- SAUCE: drm/i915: Add support for mandatory cmdparsing
- SAUCE: drm/i915: Support ro ppgtt mapped cmdparser shadow buffers
- SAUCE: drm/i915: Allow parsing of unsized batches
- SAUCE: drm/i915: Add gen9 BCS cmdparsing
- SAUCE: drm/i915/cmdparser: Use explicit goto for error paths
- SAUCE: drm/i915/cmdparser: Add support for backward jumps
- SAUCE: drm/i915/cmdparser: Ignore Length operands during command matching

linux (5.3.0-21.22) eoan; urgency=medium

  * eoan/linux: 5.3.0-21.22 -proposed tracker (LP: #1850486)

  * Fix signing of staging modules in eoan (LP: #1850234)
- [Packaging] Leave unsigned modules unsigned after adding .gnu_debuglink

linux (5.3.0-20.21) eoan; urgency=medium

  * eoan/linux: 5.3.0-20.21 -proposed tracker (LP: #1849064)

  * eoan: alsa/sof: Enable SOF_HDA link and codec (LP: #1848490)
- [Config] Enable SOF_HDA link and codec

  * Eoan update: 5.3.7 upstream stable release (LP: #1848750)
- panic: ensure preemption is disabled during panic()
- [Config] updateconfigs for USB_RIO500
- USB: rio500: Remove Rio 500 kernel driver
- USB: yurex: Don't retry on unexpected errors
- USB: yurex: fix NULL-derefs on disconnect
- USB: usb-skeleton: fix runtime PM after driver unbind
- USB: usb-skeleton: fix NULL-deref on disconnect
- xhci: Fix false warning message about wrong bounce buffer write length
- xhci: Prevent device initiated U1/U2 link pm if exit latency is too long
- xhci: Check all endpoints for LPM timeout
- xhci: Fix USB 3.1 capability detection on early xHCI 1.1 spec based hosts
- usb: xhci: wait for CNR controller not ready bit in xhci resume
- xhci: Prevent deadlock when xhci adapter breaks during init
- xhci: Fix NULL pointer dereference in xhci_clear_tt_buffer_complete()
- USB: adutux: fix use-after-free on disconnect
- USB: adutux: fix NULL-derefs on disconnect
- USB: adutux: fix use-after-free on release
- USB: iowarrior: fix use-after-free on disconnect
- USB: iowarrior: fix use-after-free on release
- USB: iowarrior: fix use-after-free after driver unbind
- USB: usblp: fix runtime PM after driver unbind
- USB: chaoskey: fix use-after-free on release
- USB: ldusb: fix NULL-derefs on driver unbind
- serial: uartlite: fix exit path null pointer
- serial: uartps: Fix uartps_major handling
- USB: serial: keyspan: fix NULL-derefs on open() and write()
- USB: serial: ftdi_sio: add device IDs for Sienna and Echelon PL-20
- USB: serial: option: add Telit FN980 compositions
- USB: serial: option: add support for Cinterion CLS8 devices
- USB: serial: fix runtime PM after driver unbind
- USB: usblcd: fix I/O after disconnect
- USB: microtek: fix info-leak at probe
- USB: dummy-hcd: fix power budget for SuperSpeed mode
- usb: renesas_usbhs: gadget: Do not discard queues 

[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting

2019-11-12 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 5.0.0-35.38

---
linux (5.0.0-35.38) disco; urgency=medium

  * [REGRESSION]  md/raid0: cannot assemble multi-zone RAID0 with default_layout setting (LP: #1849682)
- SAUCE: Fix revert "md/raid0: avoid RAID0 data corruption due to layout
  confusion."

  * refcount underflow and type confusion in shiftfs (LP: #1850867) // CVE-2019-15793
- SAUCE: shiftfs: Correct id translation for lower fs operations
- SAUCE: shiftfs: prevent type confusion
- SAUCE: shiftfs: Fix refcount underflow in btrfs ioctl handling

  * CVE-2018-12207
- kvm: Convert kvm_lock to a mutex
- kvm: x86: Do not release the page inside mmu_set_spte()
- KVM: x86: make FNAME(fetch) and __direct_map more similar
- KVM: x86: remove now unneeded hugepage gfn adjustment
- KVM: x86: change kvm_mmu_page_get_gfn BUG_ON to WARN_ON
- KVM: x86: add tracepoints around __direct_map and FNAME(fetch)
- kvm: x86, powerpc: do not allow clearing largepages debugfs entry
- SAUCE: KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is
  active
- SAUCE: x86: Add ITLB_MULTIHIT bug infrastructure
- SAUCE: kvm: mmu: ITLB_MULTIHIT mitigation
- SAUCE: kvm: Add helper function for creating VM worker threads
- SAUCE: kvm: x86: mmu: Recovery of shattered NX large pages
- SAUCE: cpu/speculation: Uninline and export CPU mitigations helpers
- SAUCE: kvm: x86: mmu: Apply global mitigations knob to ITLB_MULTIHIT

  * CVE-2019-11135
- KVM: x86: use Intel speculation bugs and features as derived in generic x86 code
- x86/msr: Add the IA32_TSX_CTRL MSR
- x86/cpu: Add a helper function x86_read_arch_cap_msr()
- x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default
- x86/speculation/taa: Add mitigation for TSX Async Abort
- x86/speculation/taa: Add sysfs reporting for TSX Async Abort
- kvm/x86: Export MDS_NO=0 to guests when TSX is enabled
- x86/tsx: Add "auto" option to the tsx= cmdline parameter
- x86/speculation/taa: Add documentation for TSX Async Abort
- x86/tsx: Add config options to set tsx=on|off|auto
- SAUCE: x86/speculation/taa: Call tsx_init()
- [Config] Disable TSX by default when possible

  * CVE-2019-0154
- SAUCE: drm/i915: Lower RM timeout to avoid DSI hard hangs
- SAUCE: drm/i915/gen8+: Add RC6 CTX corruption WA

  * CVE-2019-0155
- SAUCE: drm/i915: Rename gen7 cmdparser tables
- SAUCE: drm/i915: Disable Secure Batches for gen6+
- SAUCE: drm/i915: Remove Master tables from cmdparser
- SAUCE: drm/i915: Add support for mandatory cmdparsing
- SAUCE: drm/i915: Support ro ppgtt mapped cmdparser shadow buffers
- SAUCE: drm/i915: Allow parsing of unsized batches
- SAUCE: drm/i915: Add gen9 BCS cmdparsing
- SAUCE: drm/i915/cmdparser: Use explicit goto for error paths
- SAUCE: drm/i915/cmdparser: Add support for backward jumps
- SAUCE: drm/i915/cmdparser: Ignore Length operands during command matching

linux (5.0.0-34.36) disco; urgency=medium

  * disco/linux:  -proposed tracker (LP: #1850574)

  * [REGRESSION]  md/raid0: cannot assemble multi-zone RAID0 with default_layout setting (LP: #1849682)
- Revert "md/raid0: avoid RAID0 data corruption due to layout confusion."

linux (5.0.0-33.35) disco; urgency=medium

  * disco/linux: 5.0.0-33.35 -proposed tracker (LP: #1849003)

  * Disco update: upstream stable patchset 2019-10-18 (LP: #1848817)
- tpm: use tpm_try_get_ops() in tpm-sysfs.c.
- drm/bridge: tc358767: Increase AUX transfer length limit
- drm/panel: simple: fix AUO g185han01 horizontal blanking
- video: ssd1307fb: Start page range at page_offset
- drm/stm: attach gem fence to atomic state
- drm/panel: check failure cases in the probe func
- drm/rockchip: Check for fast link training before enabling psr
- drm/radeon: Fix EEH during kexec
- gpu: drm: radeon: Fix a possible null-pointer dereference in
  radeon_connector_set_property()
- PCI: rpaphp: Avoid a sometimes-uninitialized warning
- ipmi_si: Only schedule continuously in the thread in maintenance mode
- clk: qoriq: Fix -Wunused-const-variable
- clk: sunxi-ng: v3s: add missing clock slices for MMC2 module clocks
- drm/amd/display: fix issue where 252-255 values are clipped
- drm/amd/display: reprogram VM config when system resume
- powerpc/powernv/ioda2: Allocate TCE table levels on demand for default DMA
  window
- clk: actions: Don't reference clk_init_data after registration
- clk: sirf: Don't reference clk_init_data after registration
- clk: sprd: Don't reference clk_init_data after registration
- clk: zx296718: Don't reference clk_init_data after registration
- powerpc/xmon: Check for HV mode when dumping XIVE info from OPAL
- powerpc/rtas: use device model APIs and serialization during LPM
- powerpc/futex: Fix warning: 'oldval' may be used 

[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting

2019-11-08 Thread Khaled El Mously
** Changed in: linux (Ubuntu Disco)
   Status: Incomplete => Fix Committed


[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting

2019-11-08 Thread dann frazier
I can 100% reproduce this on 5.0.0-34, but not at all on 5.0.0-32, so
marking verification-failed.

** Tags removed: verification-needed-disco
** Tags added: verification-failed-disco


[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting

2019-11-08 Thread dann frazier
The disco story is not so pretty:

[0.00] Linux version 5.0.0-34-generic (buildd@lgw01-amd64-051) (gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)) #36~18.04.1-Ubuntu SMP Wed Oct 30 08:08:56 UTC 2019 (Ubuntu 5.0.0-34.36~18.04.1-generic 5.0.21)
[0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-5.0.0-34-generic root=UUID=e3626652-82d8-4e95-a6d7-e4920d7941b6 ro console=tty1 console=ttyS0
[0.00] KERNEL supported cpus:
[0.00]   Intel GenuineIntel
[0.00]   AMD AuthenticAMD
[0.00]   Hygon HygonGenuine
[0.00]   Centaur CentaurHauls
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009fbff] usable
[0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000f-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0x1ffd7fff] usable
[0.00] BIOS-e820: [mem 0x1ffd8000-0x1fff] reserved
[0.00] BIOS-e820: [mem 0xfeffc000-0xfeff] reserved
[0.00] BIOS-e820: [mem 0xfffc-0x] reserved
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.8 present.
[0.00] DMI: OpenStack Foundation OpenStack Nova, BIOS 1.10.2-1ubuntu1 04/01/2014
[0.00] Hypervisor detected: KVM
[0.00] kvm-clock: Using msrs 4b564d01 and 4b564d00
[0.00] kvm-clock: cpu 0, msr e601001, primary cpu clock
[0.00] kvm-clock: using sched offset of 79214193145 cycles
[0.02] clocksource: kvm-clock: mask: 0x max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[0.04] tsc: Detected 2298.666 MHz processor
[0.002336] last_pfn = 0x1ffd8 max_arch_pfn = 0x4
[0.002446] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT  
[0.008265] found SMP MP-table at [mem 0x000f6a60-0x000f6a6f]
[0.008362] check: Scanning 1 areas for low memory corruption
[0.008413] Using GB pages for direct mapping
[0.008561] RAMDISK: [mem 0x1d8fb000-0x1ecacfff]
[0.008579] ACPI: Early table checksum verification disabled
[0.008637] ACPI: RSDP 0x000F6860 14 (v00 BOCHS )
[0.008640] ACPI: RSDT 0x1FFE1505 2C (v01 BOCHS  BXPCRSDT 0001 BXPC 0001)
[0.008646] ACPI: FACP 0x1FFE1419 74 (v01 BOCHS  BXPCFACP 0001 BXPC 0001)
[0.008654] ACPI: DSDT 0x1FFE0040 0013D9 (v01 BOCHS  BXPCDSDT 0001 BXPC 0001)
[0.008657] ACPI: FACS 0x1FFE 40
[0.008660] ACPI: APIC 0x1FFE148D 78 (v01 BOCHS  BXPCAPIC 0001 BXPC 0001)
[0.009155] No NUMA configuration found
[0.009157] Faking a node at [mem 0x-0x1ffd7fff]
[0.009167] NODE_DATA(0) allocated [mem 0x1ffad000-0x1ffd7fff]
[0.009417] Zone ranges:
[0.009418]   DMA  [mem 0x1000-0x00ff]
[0.009420]   DMA32[mem 0x0100-0x1ffd7fff]
[0.009421]   Normal   empty
[0.009422]   Device   empty
[0.009423] Movable zone start for each node
[0.009427] Early memory node ranges
[0.009428]   node   0: [mem 0x1000-0x0009efff]
[0.009429]   node   0: [mem 0x0010-0x1ffd7fff]
[0.009433] Zeroed struct page in unavailable ranges: 98 pages
[0.009434] Initmem setup node 0 [mem 0x1000-0x1ffd7fff]
[0.013701] ACPI: PM-Timer IO Port: 0x608
[0.013719] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[0.013776] IOAPIC[0]: apic_id 0, version 17, address 0xfec0, GSI 0-23
[0.013778] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[0.013780] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[0.013781] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[0.013783] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[0.013784] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[0.013789] Using ACPI (MADT) for SMP configuration information
[0.013792] smpboot: Allowing 1 CPUs, 0 hotplug CPUs
[0.013814] PM: Registered nosave memory: [mem 0x-0x0fff]
[0.013815] PM: Registered nosave memory: [mem 0x0009f000-0x0009]
[0.013816] PM: Registered nosave memory: [mem 0x000a-0x000e]
[0.013817] PM: Registered nosave memory: [mem 0x000f-0x000f]
[0.013819] [mem 0x2000-0xfeffbfff] available for PCI devices
[0.013820] Booting paravirtualized 

[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting

2019-11-08 Thread dann frazier
= Verification =
I see 2 pieces to this:
 1) The original report, in Comment #1, where the offending patch caused an
issue on a system where it shouldn't have - i.e., a raid0 w/ homogeneous
member sizes. We were never able to reproduce this in subsequent tests w/
the patch applied. I know sfeole was able to perform the same MAAS
install/upgrade w/ the current -proposed kernel (I saw a test report from
it), so I think we can confidently say it is not reproducible in that
build either.

 2) In configs where this patch *should* prevent a raid0 from assembling
(heterogeneous sizes), I've verified that if I create such an array on an
older kernel, then upgrade to the current -proposed kernel, it starts
automatically. Now, of course, I continue to be susceptible to corruption,
but that's known and tracked in bug 1850540.

$ cat /proc/version
Linux version 4.15.0-66-generic (buildd@lgw01-amd64-044) (gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)) #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019
$ sudo mdadm --create /dev/md0 --run --metadata=default --homehost=akis --level=0 --raid-devices=2 /dev/vdb1 /dev/vdc1
mdadm: /dev/vdb1 appears to be part of a raid array:
   level=raid0 devices=2 ctime=Thu Oct 31 21:53:40 2019
mdadm: /dev/vdc1 appears to be part of a raid array:
   level=raid0 devices=2 ctime=Thu Oct 31 21:53:40 2019
mdadm: array /dev/md0 started.
$ sudo reboot

$ cat /proc/version
Linux version 4.15.0-68-generic (buildd@lgw01-amd64-037) (gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)) #77-Ubuntu SMP Sun Oct 27 06:02:23 UTC 2019
$ cat /proc/mdstat
Personalities : [raid0] [linear] [multipath] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : active raid0 vdc1[1] vdb1[0]
  1567744 blocks super 1.2 512k chunks

unused devices: <none>
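
For anyone who wants to recreate the heterogeneous case without spare
disks, something like this should work (a sketch using loop devices -
the sizes are arbitrary, the point is only that they differ):

$ truncate -s 512M /tmp/small.img
$ truncate -s 1G /tmp/big.img
$ sudo losetup -f --show /tmp/small.img   # e.g. /dev/loop0
$ sudo losetup -f --show /tmp/big.img     # e.g. /dev/loop1
$ sudo mdadm --create /dev/md0 --run --level=0 --raid-devices=2 /dev/loop0 /dev/loop1
# On an affected kernel, stopping and re-assembling this array fails with
# "cannot assemble multi-zone RAID0 with default_layout setting" unless
# raid0.default_layout is set on the command line or via sysfs.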


[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting

2019-11-05 Thread Khaled El Mously
@dannf: Ping for verification :)

** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic


[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting

2019-10-30 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag 'verification-needed-
disco' to 'verification-done-disco'. If the problem still exists, change
the tag 'verification-needed-disco' to 'verification-failed-disco'.

If verification is not done within 5 working days from today, this fix
will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!
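
As a minimal sketch of one way to do that for disco - assuming the
primary archive and the generic kernel flavour; adjust both to match
your system:

  $ echo "deb http://archive.ubuntu.com/ubuntu disco-proposed main universe" | \
      sudo tee /etc/apt/sources.list.d/proposed.list
  $ sudo apt update
  $ sudo apt install -t disco-proposed linux-image-generic
  $ sudo reboot

The wiki page above also documents the recommended apt pinning so that
-proposed does not upgrade unrelated packages.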


** Tags added: verification-needed-disco


[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting

2019-10-29 Thread dann frazier
** Description changed:

+ This bug tracks the temporary revert of the upstream fix for a
+ corruption issue. Bug 1850540 tracks the re-application of that fix once
+ we have a full solution.
+ 
  Users of RAID0 arrays are susceptible to a corruption issue if:
   - The members of the RAID array are not all the same size[*]
   - Data has been written to the array while running kernels < 3.14 *and* >= 
3.14.
  
  This is because of an change in v3.14 that accidentally changed how data was 
written - as described in the upstream commit message:
  
https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9
  
  To summarize, upstream is dealing with this by adding a versioned layout
  in v5.4, and that is being backported to stable kernels - which is why
  we're now seeing it. Layout version 1 is the pre-3.14 layout, version 2
  is post 3.14. Mixing version 1 & version 2 layouts can cause corruption.
  However, unless a layout-version-aware kernel *created* the array,
  there's no way for the kernel to know which version(s) was used to write
  the existing data. This undefined mode is considered "Version 0", and
  the kernel will now refuse to start these arrays w/o user intervention.
  
  The user experience is pretty awful here. A user upgrades to the next
  SRU and all of a sudden their system stops at an (initramfs) prompt. A
  clueful user can spot something like the following in dmesg:
  
  Here's the message which , as you can see from the log in Comment #1, is
  hidden in a ton of other messages:
  
  [ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with 
default_layout setting
  [ 72.728149] md/raid0: please set raid.default_layout to 1 or 2
  [ 72.733979] md: pers->run() failed ...
  mdadm: failed to start array /dev/md0: Unknown error 524
  
  What that is trying to say is that you should determine if your data -
  specifically the data toward the end of your array - was most likely
  written with a pre-3.14 or post-3.14 kernel. Based on that, reboot with
  the kernel parameter raid0.default_layout=1 or raid0.default_layout=2 on
  the kernel command line. And note it should be *raid0.default_layout*
  not *raid.default_layout* as the message says - a fix for that message
  is now queued for stable:
  
  
https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571)
  
  IMHO, we should work with upstream to create a web page that clearly
  walks the user through this process, and update the error message to
  point to that page. I'd also like to see if we can detect this problem
  *before* the user reboots (debconf?) and help the user fix things. e.g.
  "We detected that you have RAID0 arrays that maybe susceptible to a
  corruption problem", guide the user to choosing a layout, and update the
  mdadm initramfs hook to poke the answer in via sysfs before starting the
  array on reboot.
  
  Note that it also seems like we should investigate backporting this to <
  3.14 kernels. Imagine a user switching between the trusty HWE kernel and
  the GA kernel.
  
  References from users of other distros:
  https://blog.icod.de/2019/10/10/caution-kernel-5-3-4-and-raid0-default_layout/
  
https://www.linuxquestions.org/questions/linux-general-1/raid-arrays-not-assembling-4175662774/
  
  [*] Which surprisingly is not the case reported in this bug - the user
  here had a raid0 of 8 identically-sized devices. I suspect there's a bug
  in the detection code somewhere.


[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting

2019-10-29 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag 'verification-needed-
bionic' to 'verification-done-bionic'. If the problem still exists,
change the tag 'verification-needed-bionic' to 'verification-failed-
bionic'.

If verification is not done within 5 working days from today, this fix
will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: verification-needed-bionic


[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting

2019-10-27 Thread Khaled El Mously
** Changed in: linux (Ubuntu Bionic)
   Status: Confirmed => Fix Committed


[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting

2019-10-24 Thread dann frazier
I've proposed changing the error message to point to a webpage w/ a better 
explanation:
  https://marc.info/?l=linux-raid&m=157196348406853&w=2


[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting

2019-10-24 Thread dann frazier
** Description changed:

  Users of RAID0 arrays are susceptible to a corruption issue if:
   - The members of the RAID array are not all the same size[*]
   - Data has been written to the array while running kernels < 3.14 *and* >= 
3.14.
  
- Upstream is dealing with this by adding a versioned layout in v5.4, and
- backporting that via stable. Version 1 is the pre-3.14 layout, Version 2
+ This is because of an change in v3.14 that accidentally changed how data was 
written - as described in the upstream commit message:
+ 
https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9
+ 
+ To summarize, upstream is dealing with this by adding a versioned layout
+ in v5.4, and that is being backported to stable kernels - which is why
+ we're now seeing it. Layout version 1 is the pre-3.14 layout, version 2
  is post 3.14. Mixing version 1 & version 2 layouts can cause corruption.
  However, unless a layout-version-aware kernel *created* the array,
  there's no way for the kernel to know which version(s) was used to write
  the existing data. This undefined mode is considered "Version 0", and
  the kernel will now refuse to start these arrays w/o user intervention.
- 
- These changes are now coming into our kernels via stable backports of
- the following commit, which describes the problem in the commit message:
- 
- 
https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9
  
  The user experience is pretty awful here. A user upgrades to the next
  SRU and all of a sudden their system stops at an (initramfs) prompt. A
  clueful user can spot something like the following in dmesg:
  
  Here's the message which , as you can see from the log in Comment #1, is
  hidden in a ton of other messages:
  
  [ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with 
default_layout setting
  [ 72.728149] md/raid0: please set raid.default_layout to 1 or 2
  [ 72.733979] md: pers->run() failed ...
  mdadm: failed to start array /dev/md0: Unknown error 524
  
  What that is trying to say is that you should determine if your data -
  specifically the data toward the end of your array - was most likely
  written with a pre-3.14 or post-3.14 kernel. Based on that, reboot with
  the kernel parameter raid0.default_layout=1 or raid0.default_layout=2 on
  the kernel command line. And note it should be *raid0.default_layout*
  not *raid.default_layout* as the message says - a fix for that message
  is now queued for stable:
  
  
https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571)
  
  IMHO, we should work with upstream to create a web page that clearly
  walks the user through this process, and update the error message to
  point to that page. I'd also like to see if we can detect this problem
  *before* the user reboots (debconf?) and help the user fix things. e.g.
  "We detected that you have RAID0 arrays that maybe susceptible to a
  corruption problem", guide the user to choosing a layout, and update the
  mdadm initramfs hook to poke the answer in via sysfs before starting the
  array on reboot.
  
+ Note that it also seems like we should investigate backporting this to <
+ 3.14 kernels. Imagine a user switching between the trusty HWE kernel and
+ the GA kernel.
+ 
  References from users of other distros:
  https://blog.icod.de/2019/10/10/caution-kernel-5-3-4-and-raid0-default_layout/
  
https://www.linuxquestions.org/questions/linux-general-1/raid-arrays-not-assembling-4175662774/
  
  [*] Which surprisingly is not the case reported in this bug - the user
  here had a raid0 of 8 identically-sized devices. I suspect there's a bug
  in the detection code somewhere.


[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting

2019-10-24 Thread dann frazier
** Description changed:

  Users of RAID0 arrays are susceptible to a corruption issue if:
-  - The members of the RAID array are not all the same size[*]
-  - Data has been written to the array while running kernels < 3.14 and >= 
3.14.
+  - The members of the RAID array are not all the same size[*]
+  - Data has been written to the array while running kernels < 3.14 and >= 
3.14.
  
  Upstream is dealing with this by adding a versioned layout in v5.4, and
  backporting that via stable. Version 1 is the pre-3.14 layout, Version 2
- is post 3.14. However, unless a layout-version-aware kernel *created*
- the array, there's no way for the kernel to know which version was used
- to write the existing data. This undefined mode is considered "Version
- 0", and the kernel will now refuse to start these arrays w/o user
- intervention.
+ is post 3.14. Mixing version 1 & version 2 layouts can cause corruption.
+ However, unless a layout-version-aware kernel *created* the array,
+ there's no way for the kernel to know which version(s) was used to write
+ the existing data. This undefined mode is considered "Version 0", and
+ the kernel will now refuse to start these arrays w/o user intervention.
  
  These changes are now coming into our kernels via stable backports of
  the following commit, which describes the problem in the commit message:
  
  
https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9
  
  The user experience is pretty awful here. A user upgrades to the next
  SRU and all of a sudden their system stops at an (initramfs) prompt. A
  clueful user can spot something like the following in dmesg:
  
  Here's the message which , as you can see from the log in Comment #1, is
  hidden in a ton of other messages:
  
  [ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with 
default_layout setting
  [ 72.728149] md/raid0: please set raid.default_layout to 1 or 2
  [ 72.733979] md: pers->run() failed ...
  mdadm: failed to start array /dev/md0: Unknown error 524
  
- 
- What that is trying to say is that you should determine if your data - 
specifically the data toward the end of your array - was most likely written 
with a pre-3.14 or post-3.14 kernel. Based on that, reboot with the kernel 
parameter raid0.default_layout=1 or raid0.default_layout=2 on the kernel 
command line. And note it should be *raid0.default_layout* not 
*raid.default_layout* as the message says - a fix for that message is now 
queued for stable.
+ What that is trying to say is that you should determine if your data -
+ specifically the data toward the end of your array - was most likely
+ written with a pre-3.14 or post-3.14 kernel. Based on that, reboot with
+ the kernel parameter raid0.default_layout=1 or raid0.default_layout=2 on
+ the kernel command line. And note it should be *raid0.default_layout*
+ not *raid.default_layout* as the message says - a fix for that message
+ is now queued for stable:
  
  
https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571)
+ 
+ IMHO, we should work with upstream to create a web page that clearly
+ walks the user through this process, and update the error message to
+ point to that page. I'd also like to see if we can detect this problem
+ *before* the user reboots (debconf?) and help the user fix things. e.g.
+ "We detected that you have RAID0 arrays that maybe susceptible to a
+ corruption problem", guide the user to choosing a layout, and update the
+ mdadm initramfs hook to poke the answer in via sysfs before starting the
+ array on reboot.
+ 
  
  [*] Which surprisingly is not the case reported in this bug - the user
  here had a raid0 of 8 identically-sized devices. I suspect there's a bug
  in the detection code somewhere.

** Description changed:

  Users of RAID0 arrays are susceptible to a corruption issue if:
   - The members of the RAID array are not all the same size[*]
-  - Data has been written to the array while running kernels < 3.14 and >= 
3.14.
+  - Data has been written to the array while running kernels < 3.14 *and* >= 
3.14.
  
  Upstream is dealing with this by adding a versioned layout in v5.4, and
  backporting that via stable. Version 1 is the pre-3.14 layout, Version 2
  is post 3.14. Mixing version 1 & version 2 layouts can cause corruption.
  However, unless a layout-version-aware kernel *created* the array,
  there's no way for the kernel to know which version(s) was used to write
  the existing data. This undefined mode is considered "Version 0", and
  the kernel will now refuse to start these arrays w/o user intervention.
  
  These changes are now coming into our kernels via stable backports of
  the following commit, which describes the problem in the commit message:
  
  
https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9
  
  The user experience is pretty awful here. A user upgrades to the next
  SRU and all of a sudden their system stops at an (initramfs) prompt. A
 

[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting

2019-10-24 Thread dann frazier
** Description changed:

- [Impact]
- After installing the 4.15.0-67.76 kernel from bionic-proposed, our Nvidia 
DGX2 system is no longer bootable.
+ Users of RAID0 arrays are susceptible to a corruption issue if:
+  - The members of the RAID array are not all the same size[*]
+  - Data has been written to the array while running kernels < 3.14 and >= 
3.14.
  
- [Test Case]
- [Fix]
- [Regression Risk]
+ Upstream is dealing with this by adding a versioned layout in v5.4, and
+ backporting that via stable. Version 1 is the pre-3.14 layout, Version 2
+ is post 3.14. However, unless a layout-version-aware kernel *created*
+ the array, there's no way for the kernel to know which version was used
+ to write the existing data. This undefined mode is considered "Version
+ 0", and the kernel will now refuse to start these arrays w/o user
+ intervention.
+ 
+ These changes are now coming into our kernels via stable backports of
+ the following commit, which describes the problem in the commit message:
+ 
+ 
https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9
+ 
+ The user experience is pretty awful here. A user upgrades to the next
+ SRU and all of a sudden their system stops at an (initramfs) prompt. A
+ clueful user can spot something like the following in dmesg:
+ 
+ Here's the message which , as you can see from the log in Comment #1, is
+ hidden in a ton of other messages:
+ 
+ [ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with 
default_layout setting
+ [ 72.728149] md/raid0: please set raid.default_layout to 1 or 2
+ [ 72.733979] md: pers->run() failed ...
+ mdadm: failed to start array /dev/md0: Unknown error 524
+ 
+ 
+ What that is trying to say is that you should determine if your data - 
specifically the data toward the end of your array - was most likely written 
with a pre-3.14 or post-3.14 kernel. Based on that, reboot with the kernel 
parameter raid0.default_layout=1 or raid0.default_layout=2 on the kernel 
command line. And note it should be *raid0.default_layout* not 
*raid.default_layout* as the message says - a fix for that message is now 
queued for stable.
+ 
+ 
https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571)
+ 
+ [*] Which surprisingly is not the case reported in this bug - the user
+ here had a raid0 of 8 identically-sized devices. I suspect there's a bug
+ in the detection code somewhere.


[Kernel-packages] [Bug 1849682] Re: [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout setting

2019-10-24 Thread dann frazier
OK - this is a messy one. It is due to the backport of this:
https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9

Reverting that is probably not the right answer, because the point of it
is to avoid corruption. But this is a pretty serious usability issue. It
is not at all clear from the message that a user needs to do *something*
- and what that *something* is, is even less clear:

Here's the message, buried in a ton of other messages:
[   72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with 
default_layout setting
[   72.728149] md/raid0: please set raid.default_layout to 1 or 2
[   72.733979] md: pers->run() failed ...
mdadm: failed to start array /dev/md0: Unknown error 524

So if you understand from that that you need to pass a kernel parameter,
you're more intuitive than I am. And if you understand *why*, and *which
one* to pass - well, you probably wrote the patch. And even then, you
probably didn't realize that the parameter named in the message is
actually incorrect (HINT: we should backport this as well:
https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571).

IMO, the error message should include a URL to a page with clear steps
on how to proceed, which I think is something along the lines of: "Use
mdadm to figure out when your array was created, figure out what kernel
you were running back then (ideally with a mapping to Ubuntu releases),
and then how to fix it."
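
As a sketch of that first step - the device names below are only
placeholders; point mdadm at your own array and its members:

  # When was the array created, i.e. which kernel era first wrote to it?
  $ sudo mdadm --detail /dev/md0
  # What do the member superblocks record, and are the members
  # different sizes (the multi-zone case)?
  $ sudo mdadm --examine /dev/vdb1 /dev/vdc1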

That said, it isn't clear to me why we saw this issue on this specific
machine. This issue is supposedly restricted to multi-zone RAID0
configs, which should only happen if not all members are the same size.
But I happen to know that all members on this system *are* the same
size! I've tried to reproduce it but, after redeploying the system with
MAAS, it upgrades and reboots w/o error :(
