Public bug reported:

[Impact]
MAAS deploys to the wrong NVMe device

[Description]
Commit [0] introduced NVMe multipath and changed the way a namespace's
"identity" is calculated. As a result, NVMe device names can "mismatch" the
devices their namespaces actually belong to, similar to the situation
below:

lrwxrwxrwx 1 root root 0 Jul 18 13:25 /sys/block/nvme0n1/device -> ../../nvme1
lrwxrwxrwx 1 root root 0 Jul 18 13:25 /sys/block/nvme1n1/device -> ../../nvme0

This can cause MAAS/curtin to deploy to the wrong NVMe device, as it
currently uses device names that are subject to change between reboots.
The problem can be alleviated with the nvme_core.multipath=0 kernel
parameter, but ideally MAAS/curtin should not rely on the device numbers.
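
To check whether a system is in the state shown above, the sysfs links can
be compared against the device names. The following is a minimal sketch,
not code from MAAS or curtin: the function name find_mismatched_namespaces
is hypothetical, and it assumes the /sys/block/<name>/device layout shown
in the listing above.

```python
import os
import re

# Matches namespace block device names such as "nvme0n1" or "nvme12n3".
NAMESPACE_RE = re.compile(r"^nvme(\d+)n(\d+)$")


def find_mismatched_namespaces(sys_block="/sys/block"):
    """Return (block_device, linked_device) pairs whose numbers disagree.

    For each nvmeXnY entry, the "device" symlink is read and its last path
    component is compared with the expected name "nvmeX".
    """
    mismatches = []
    for name in sorted(os.listdir(sys_block)):
        match = NAMESPACE_RE.match(name)
        if not match:
            continue  # not an NVMe namespace block device
        device_link = os.path.join(sys_block, name, "device")
        if not os.path.islink(device_link):
            continue  # layout differs from the one assumed here
        linked = os.path.basename(os.readlink(device_link))
        expected = "nvme" + match.group(1)
        if linked != expected:
            mismatches.append((name, linked))
    return mismatches
```

Called with no arguments on the system from the listing above, this would
report nvme0n1 paired with nvme1 and nvme1n1 paired with nvme0. An empty
result only means the names happen to line up on the current boot.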

A possible solution for this is to ensure that NVMe devices are referred
to by their device ID, as that should keep things consistent between
reboots.
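
As an illustration of referring to a drive by ID rather than by name, the
sketch below looks up the persistent symlinks that point at a given block
device. It is not how MAAS or curtin implement this: the function name
stable_links_for is hypothetical, and the sketch assumes udev populates
/dev/disk/by-id on the target system.

```python
import os


def stable_links_for(device_path, by_id_dir="/dev/disk/by-id"):
    """Return the by-id symlinks that resolve to the given block device.

    The returned paths are tied to the drive's identifiers rather than to
    its enumeration order, so they are a better fit for storage
    configuration than names such as /dev/nvme0n1.
    """
    target = os.path.realpath(device_path)
    links = []
    for name in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, name)
        # Resolve each symlink and keep the ones pointing at our device.
        if os.path.realpath(link) == target:
            links.append(link)
    return links
```

For example, stable_links_for("/dev/nvme0n1") would return whatever by-id
links currently resolve to that device. Recording one of those links
instead of the device name keeps the reference on the same drive even if
the names swap on the next boot.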

[0] ed754e5dee ("nvme: track shared namespaces")
https://git.kernel.org/linus/ed754e5dee

[Test Case]
On a system with multiple NVMe devices, deploy a custom OS image with MAAS. As
the change to device names is not completely deterministic, below are some
reported symptoms:
- the deployed OS ends up on the wrong drive
- disk management presents Disk 0 as uninitialized and Disk 1 with the OS installed
- the OS fails to boot if only the primary drive is listed in the boot order

[Regression Potential]
The regression potential for this change should be low, considering that
MAAS/curtin already have the necessary support for referring to storage devices
by their ID. A regression could cause deployments to fail consistently if the
NVMe devices end up being indexed by the wrong IDs.

** Affects: curtin
     Importance: Undecided
         Status: New

** Affects: maas
     Importance: Undecided
         Status: New

** Affects: maas (Ubuntu)
     Importance: Undecided
         Status: Incomplete


** Tags: sts

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1849320

Title:
  MAAS assigns wrong multipath NVMe device

To manage notifications about this bug go to:
https://bugs.launchpad.net/curtin/+bug/1849320/+subscriptions
