** Description changed: + [Impact] + Systems with disks that have long spin-up times or otherwise take a while to be detected may be affected by a failure to boot due to the drives underlying multipath devices not being available. + + [Test case] + This issue is difficult to reproduce. + - Boot a system with the boot device on multipath. + This may be limited to POWER LPARs. See description below. + + [Regression Potential] + Given that a new initramfs script is introduced to add a udev trigger with a timeout of 2 minutes (121 seconds), users may notice a delay of up to two minutes in booting if devices take 2 minutes or more to be brought up or detected by udev. + + --- + == Comment: #0 - Manjunatha H R <[email protected]> - 2015-09-25 11:05:36 == Booting of Ubuntu15.10 lpar fails and control falls to initramfs. uname -a -------------- Linux (none) 4.2.0-10-generic #12-Ubuntu SMP Tue Sep 15 19:46:04 UTC 2015 ppc64le GNU/Linux - Boot log: ------------- - Booting a command list + Booting a command list Loading Linux 4.2.0-10-generic ... Loading initial ramdisk ... OF stdout device is: /vdevice/vty@30000000 Preparing to boot Linux version 4.2.0-10-generic (buildd@fisher04) (gcc version 5.2.1 20150911 (Ubuntu 5.2.1-17ubuntu4) ) #12-Ubuntu SMP Tue Sep 15 19:46:04 UTC 2015 (Ubuntu 4.2.0-10.12-generic 4.2.0) Detected machine type: 0000000000000101 Max number of cores passed to firmware: 256 (NR_CPUS = 2048) Calling ibm,client-architecture-support... done command line: BOOT_IMAGE=/boot/vmlinux-4.2.0-10-generic root=UUID=822dd709-5b69-45a9-aba5-63cb55768ffb ro splash quiet topology_updates=off memory layout at init: - memory_limit : 0000000000000000 (16 MB aligned) - alloc_bottom : 000000000bf80000 - alloc_top : 0000000010000000 - alloc_top_hi : 0000000010000000 - rmo_top : 0000000010000000 - ram_top : 0000000010000000 + memory_limit : 0000000000000000 (16 MB aligned) + alloc_bottom : 000000000bf80000 + alloc_top : 0000000010000000 + alloc_top_hi : 0000000010000000 + rmo_top : 0000000010000000 + ram_top : 0000000010000000 found display : /pci@80000002000002c/display@0, opening... done instantiating rtas at 0x000000000eb60000... done prom_hold_cpus: skipped copying OF device tree... Building dt strings... Building dt structure... Device tree strings 0x000000000bf90000 -> 0x000000000bf91965 Device tree struct 0x000000000bfa0000 -> 0x000000000bfe0000 Quiescing Open Firmware ... Booting Linux via __start() ... - -> smp_release_cpus() + -> smp_release_cpus() spinning_secondaries = 199 - <- smp_release_cpus() - <- setup_system() + <- smp_release_cpus() + <- setup_system() [ 2.868103] [drm:radeon_device_init [radeon]] *ERROR* Unable to find PCI I/O BAR [ 3.074553] [drm:radeon_atombios_init [radeon]] *ERROR* Unable to find PCI I/O BAR; using MMIO for ATOM IIO [ 5.060785] lpfc 0002:90:00.0: 0:1303 Link Up Event x1 received Data: x1 x0 x80 x0 x0 x0 0 Scanning for Btrfs filesystems fsck from util-linux 2.26.2 /dev/sdn2 is in use. e2fsck: Cannot continue, aborting. fsck exited with status code 8 [ 36.233086] rport-0:0-9: blocked FC remote port time out: removing rport mount: mounting /dev/sdn2 on /root failed: Device or resource busy Target filesystem doesn't have requested /sbin/init. mount: mounting /dev on /root/dev failed: No such file or directory No init found. Try passing init= bootarg. BusyBox v1.22.1 (Ubuntu 1:1.22.0-15ubuntu1) built-in shell (ash) Enter 'help' for a list of built-in commands. (initramfs) ------------------------- - This lpar is having multipath disks and boot disk is on a multipath disk. Boot passes only whenever fsck tries to scan boot disk via : /dev/dm OR /dev/mapper/mpath Boot Pass scenarios: ---------------------------- 1. Boot passed when fsck tried scanning "/dev/mapper/mpathb" fsck from util-linux 2.26.2 /dev/mapper/mpathb-part2: clean, 81802/3139584 files, 1040598/12558080 blocks 2. Boot passed when fsck tried scanning "/dev/dm-3" Scanning for Btrfs filesystems fsck from util-linux 2.26.2 /dev/dm-3: clean, 81802/3139584 files, 1040605/12558080 blocks - - Boot fails, whenever fsck is called on /dev/sd + Boot fails, whenever fsck is called on /dev/sd Boot fail scenario: Boot failed when fsck is called on "/dev/sdn" ------------------------- Scanning for Btrfs filesystems fsck from util-linux 2.26.2 /dev/sdn2 is in use. e2fsck: Cannot continue, aborting. fsck exited with status code 8 [ 36.108653] rport-0:0-9: blocked FC remote port time out: removing rport mount: mounting /dev/sdn2 on /root failed: Device or resource busy Target filesystem doesn't have requested /sbin/init. mount: mounting /dev on /root/dev failed: No such file or directory No init found. Try passing init= bootarg. BusyBox v1.22.1 (Ubuntu 1:1.22.0-15ubuntu1) built-in shell (ash) Enter 'help' for a list of built-in commands. (initramfs) ------------------------- Contact info: ---------------- Manju ([email protected]) A.P. ([email protected]) == Comment: #12 - Mauricio Faria De Oliveira <[email protected]> - 2015-10-02 13:51:29 == Hi Manju and Alton, I could not reproduce this bug in 2 attempts. The LPAR booted successfully, using the root=UUID= parameter. By looking at this message from the description: > mount: mounting /dev/sdn2 on /root failed: Device or resource busy It should have happened because multipath udev rules failed to update the /dev/disk/by-id/<uuid> symlink from /dev/sdn to /dev/dm-X, but multipathing the path was successful (so it got locked/in-use). If you can reproduce it again, please leave the LPAR in the failing state (in the initramfs), reopen this bug and ping me. I'd be happy to debug it. Thanks! == Comment: #15 - Mauricio Faria De Oliveira <[email protected]> - 2015-10-05 19:59:56 == This is probably a race between the resolve_device() call in mountroot() and the multipath discovery triggered by udev rules. If resolve_device() runs before the root device is multipathed, $ROOT is set to an individual path (eg, /dev/sdf2) rather than its multipah device (eg, /dev/mapper/mpathb-part2), because the /dev/disk/by-uuid/<UUID> symlink is not updated yet. The multipath discovery finishes after $ROOT is set, so the individual path becomes locked, and afterwards the root mount will be attempted on it -- this fails. The LPAR is now patched w/ a test fix that is supposed to ensure resolve_device() only starts after udev rules are finished. Can you try to recreate the issue, please? Thanks! == Comment: #16 - Mauricio Faria De Oliveira <[email protected]> - 2015-10-05 20:05:37 == Console messages - (!) Note: the local-premount messages (Running ... & done.) - occur around SCSI device scan/discovery time. - - ... - Begin: Mounting root file system ... Begin: Running /scripts/local-top ... Begin: Loading multipath modules ... [ 5.113397] device-mapper: multipath: version 1.9.0 loaded - Success: loaded module dm-multipath. - Failure: failed to load module dm-emc. - done. - Begin: Discovering multipaths ... done. - done. - Begin: Running /scripts/local-premount ... [ 5.187071] scsi 0:0:0:0: Direct-Access IBM 2107900 .850 PQ: 0 ANSI: 5 - [ 5.195814] sd 0:0:0:0: Attached scsi generic sg0 type 0 - [ 5.203289] sd 0:0:0:0: [sda] 62914560 512-byte logical blocks: (32.2 GB/30.0 GiB) - ... - [ 5.878046] sd 0:0:3:3: [sdp] Attached SCSI disk - ... - <10-20 SCSI disks via FC> - ... - [ 5.923577] device-mapper: multipath round-robin: version 1.0.0 loaded - ... - done. - ... - - If resolve_device() runs before the multipath udev rules + (!) Note: the local-premount messages (Running ... & done.) occur + around SCSI device scan/discovery time. + + ... + Begin: Mounting root file system ... Begin: Running /scripts/local-top ... Begin: Loading multipath modules ... [ 5.113397] device-mapper: multipath: version 1.9.0 loaded + Success: loaded module dm-multipath. + Failure: failed to load module dm-emc. + done. + Begin: Discovering multipaths ... done. + done. + Begin: Running /scripts/local-premount ... [ 5.187071] scsi 0:0:0:0: Direct-Access IBM 2107900 .850 PQ: 0 ANSI: 5 + [ 5.195814] sd 0:0:0:0: Attached scsi generic sg0 type 0 + [ 5.203289] sd 0:0:0:0: [sda] 62914560 512-byte logical blocks: (32.2 GB/30.0 GiB) + ... + [ 5.878046] sd 0:0:3:3: [sdp] Attached SCSI disk + ... + <10-20 SCSI disks via FC> + ... + [ 5.923577] device-mapper: multipath round-robin: version 1.0.0 loaded + ... + done. + ... + + If resolve_device() runs before the multipath udev rules (the rules multipath the root device and update the /dev/disk/by-uuid symlink of $ROOT) this happens: - Begin: Checking root file system ... fsck from util-linux 2.26.2 - /dev/sdf2 is in use. - e2fsck: Cannot continue, aborting. - - fsck exited with status code 8 - done. - Warning: File system check failed but did not detect errors - - mount: mounting /dev/sdf2 on /root failed: Device or resource busy - done. - Target filesystem doesn't have requested /sbin/init. - Begin: Running /scripts/local-bottom ... done. - Begin: Running /scripts/init-bottom ... - ... - mount: mounting /dev on /root/dev failed: No such file or directory - done. - No init found. Try passing init= bootarg. - ... - (initramfs) + Begin: Checking root file system ... fsck from util-linux 2.26.2 + /dev/sdf2 is in use. + e2fsck: Cannot continue, aborting. + + fsck exited with status code 8 + done. + Warning: File system check failed but did not detect errors + + mount: mounting /dev/sdf2 on /root failed: Device or resource busy + done. + Target filesystem doesn't have requested /sbin/init. + Begin: Running /scripts/local-bottom ... done. + Begin: Running /scripts/init-bottom ... + ... + mount: mounting /dev on /root/dev failed: No such file or directory + done. + No init found. Try passing init= bootarg. + ... + (initramfs) So, /dev/sdf2 is in use ... and hits Device or resource busy. This comes from $ROOT. However, the root=UUID= symlink points to the multipath device: - (initramfs) echo $ROOT - /dev/sdf2 - - (initramfs) cat /proc/cmdline - BOOT_IMAGE=/boot/vmlinux-4.2.0-12-generic root=UUID=44bd8a6e-8613-431a-9335-879d8cf5d0e4 ro - - (initramfs) ls -l /dev/disk/by-uuid/44bd8a6e-8613-431a-9335-879d8cf5d0e4 - lrwxrwxrwx 1 11 /dev/disk/by-uuid/44bd8a6e-8613-431a-9335-879d8cf5d0e4 -> ../../dm-19 - - - (initramfs) ls -l /dev/sdf - brw------- 1 8, 80 /dev/sdf - - (initramfs) dmsetup table | grep 8:80 - mpathb: 0 104857600 multipath 0 0 1 1 round-robin 0 4 1 8:80 1 8:16 1 8:144 1 8:208 1 + (initramfs) echo $ROOT + /dev/sdf2 + + (initramfs) cat /proc/cmdline + BOOT_IMAGE=/boot/vmlinux-4.2.0-12-generic root=UUID=44bd8a6e-8613-431a-9335-879d8cf5d0e4 ro + + (initramfs) ls -l /dev/disk/by-uuid/44bd8a6e-8613-431a-9335-879d8cf5d0e4 + lrwxrwxrwx 1 11 /dev/disk/by-uuid/44bd8a6e-8613-431a-9335-879d8cf5d0e4 -> ../../dm-19 + + (initramfs) ls -l /dev/sdf + brw------- 1 8, 80 /dev/sdf + + (initramfs) dmsetup table | grep 8:80 + mpathb: 0 104857600 multipath 0 0 1 1 round-robin 0 4 1 8:80 1 8:16 1 8:144 1 8:208 1 It's probably because resolve_device() was racing w/ the multipath discoveries from udev rules. resolve_device() finished before the /dev/disk/by-uuid/ symlink was updated by multipath discovery. Code: initramfs :: /init - log_begin_msg "Mounting root file system" - ... - mountroot - log_end_msg - - (!) Note: message "Mounting root fs" and call to mountroot() + log_begin_msg "Mounting root file system" + ... + mountroot + log_end_msg + + (!) Note: message "Mounting root fs" and call to mountroot() initramfs :: /scripts/local - mountroot() - { - local_mount_root - } - - local_mount_root() - { - ... - local_premount - - ROOT=$(resolve_device "$ROOT") - ... - } - - (!) Note: local_premount() is the last call before - resolve_device() is called - - local_premount() - { - ... - [ "$quiet" != "y" ] && log_begin_msg "Running /scripts/local-premount" - run_scripts /scripts/local-premount - [ "$quiet" != "y" ] && log_end_msg - ... - } - - - So we're testing a call to 'udevadm settle' in /scripts/local-premount/multipath script. + mountroot() + { + local_mount_root + } + + local_mount_root() + { + ... + local_premount + + ROOT=$(resolve_device "$ROOT") + ... + } + + (!) Note: local_premount() is the last call before resolve_device() is + called + + local_premount() + { + ... + [ "$quiet" != "y" ] && log_begin_msg "Running /scripts/local-premount" + run_scripts /scripts/local-premount + [ "$quiet" != "y" ] && log_end_msg + ... + } + + So we're testing a call to 'udevadm settle' in /scripts/local- + premount/multipath script. == Comment: #17 - Mauricio Faria De Oliveira <[email protected]> - 2015-10-05 20:08:26 == - == Comment: #18 - Manjunatha H R <[email protected]> - 2015-10-06 06:05:30 == Thank you Mauricio for a quick fix!! Lpar is booting up properly without seeing device or resource busy errors. == Comment: #20 - Mauricio Faria De Oliveira <[email protected]> - 2015-10-06 09:04:50 == Confirmed w/ Manju the # of tests. - 10:00:41 AM: Manjunatha H R: Hi Mauricio, I tried around 10 boots.. + 10:00:41 AM: Manjunatha H R: Hi Mauricio, I tried around 10 boots.. 10:01:07 AM: Manjunatha H R: all times it booted up.. Sounds good. I'll be sending a patch/mirroring.
-- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1503286 Title: ISST-LTE: Boot of Ubuntu15.10 lpar fails: "mounting /dev/sdn2 on /root failed: Device or resource busy" [multipath] To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1503286/+subscriptions -- ubuntu-bugs mailing list [email protected] https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
