------- Comment From [email protected] 2016-11-14 19:02 EDT-------
(In reply to comment #15)
> On 14 November 2016 at 11:49, bugproxy <[email protected]> wrote:
> > (In reply to comment #7)
> >> > (In reply to comment #1)
> >> > The installation was on FCP SCSI SAN volumes, each with two active
> >> > paths. Multipath was involved. The system IPLed fine up to the point
> >> > that we expanded the root filesystem to span volumes. At boot time,
> >> > the system was unable to locate the second segment of the root
> >> > filesystem. The error message indicated this was due to lvmetad not
> >> > being active.
> >>
> >> For the zfcp case, did you use the chzdev tool to activate the paths of
> >> your new additional LVM physical volume (PV)?
> >
> > Initially, the paths to the second LUNs were brought online manually
> > with "echo 0x4000400f00000000 > unit_add". Then I followed up by
> > running the "chzdev zfcp-lun -e --online" command and verified they were
>
> After chzdev, one must run $ update-initramfs -u, so that the generated
> udev rules below are copied into the initramfs.
>
> To recover this system: the Ubuntu initramfs should drop you into a
> busybox shell; navigate sysfs to online the required device and execute
> vgscan. At that point it should be sufficient to exit the busybox
> shell, and the initramfs should continue the boot as normal.
>
> Once the boot is complete, run $ sudo update-initramfs -u, and reboot.
> It should boot fine from now on.

When the system enters BusyBox, none of the PV, VG, and LV commands are
available, since the root file system was not mounted, so I was unable to
issue the vgscan as instructed. However, I was able to manually repeat the
boot process to get the system to the point where it recognized the root
LVM. At that point I was able to issue the update-initramfs.

Here are the steps I took to recover the system, which worked:

/scripts
(initramfs) cd init-premount
(initramfs) ./lvm2
ln: /tmp/mountroot-fail-hooks.d/20-lvm2: File exists
(initramfs) ./mdadm
ln: /tmp/mountroot-fail-hooks.d/10-mdadm: File exists
(initramfs) cd ..
(initramfs) cd local-top
(initramfs) ./iscsi
(initramfs) ./lvm2
lvmetad is not active yet, using direct activation during sysinit
(initramfs) cd ..
(initramfs) cd local-premount
(initramfs) ./btrfs
Scanning for Btrfs filesystems
(initramfs) cd ..
(initramfs) cd local-block
(initramfs) ./lvm2
(initramfs) exit

(At this point the system was able to complete booting. The root file
system was mounted.)

Begin: Will now check root file system ... fsck from util-linux 2.27.1
[/sbin/fsck.ext4 (1) -- /dev/mapper/ub01--vg-root] fsck.ext4 -a -C0 /dev/mapper/ub01--vg-root
/dev/mapper/ub01--vg-root: clean, 121359/6686224 files, 2860629/26895360 blocks
done.

Once the boot completed, I issued update-initramfs -u and rebooted the
system. Everything looked good.

So is that the intended way to do it? Thank you for the help.
> > online and persistent with the lszdev command.
> >
> > Below are the rules files for the 0.0.e100 and 0.0.e300 paths. Below
> > that is the output of the lszdev command. The date on these files is
> > 10/26/2016. The output from the lszdev command is also from 10/26/2016.
> >
> > cat 41-zfcp-lun-0.0.e100.rules
> > # Generated by chzdev
> > ACTION=="add", SUBSYSTEMS=="ccw", KERNELS=="0.0.e100", GOTO="start_zfcp_lun_0.0.e100"
> > GOTO="end_zfcp_lun_0.0.e100"
> >
> > LABEL="start_zfcp_lun_0.0.e100"
> > SUBSYSTEM=="fc_remote_ports", ATTR{port_name}=="0x5005076306135700", GOTO="cfg_fc_0.0.e100_0x5005076306135700"
> > SUBSYSTEM=="scsi", ENV{DEVTYPE}=="scsi_device", KERNEL=="*:1074675712", KERNELS=="rport-*", ATTRS{fc_remote_ports/$id/port_name}=="0x5005076306135700", GOTO="cfg_scsi_0.0.e100_0x5005076306135700_0x4000400e00000000"
> > SUBSYSTEM=="scsi", ENV{DEVTYPE}=="scsi_device", KERNEL=="*:1074741248", KERNELS=="rport-*", ATTRS{fc_remote_ports/$id/port_name}=="0x5005076306135700", GOTO="cfg_scsi_0.0.e100_0x5005076306135700_0x4000400f00000000"
> > GOTO="end_zfcp_lun_0.0.e100"
> >
> > LABEL="cfg_fc_0.0.e100_0x5005076306135700"
> > ATTR{[ccw/0.0.e100]0x5005076306135700/unit_add}="0x4000400e00000000"
> > ATTR{[ccw/0.0.e100]0x5005076306135700/unit_add}="0x4000400f00000000"
> > ATTR{[ccw/0.0.e100]0x5005076306135700/unit_add}="0x4000401200000000"
> > ATTR{[ccw/0.0.e100]0x5005076306135700/unit_add}="0x4001400d00000000"
> > ATTR{[ccw/0.0.e100]0x5005076306135700/unit_add}="0x4001401100000000"
> > GOTO="end_zfcp_lun_0.0.e100"
> >
> > LABEL="cfg_scsi_0.0.e100_0x5005076306135700_0x4000400e00000000"
> > ATTR{queue_depth}="32"
> > GOTO="end_zfcp_lun_0.0.e100"
> >
> > LABEL="cfg_scsi_0.0.e100_0x5005076306135700_0x4000400f00000000"
> > ATTR{queue_depth}="32"
> > GOTO="end_zfcp_lun_0.0.e100"
> >
> > LABEL="end_zfcp_lun_0.0.e100"
> >
> > ----------------------------------------------------------------------
> >
> > cat 41-zfcp-lun-0.0.e300.rules
> > # Generated by chzdev
> > ACTION=="add", SUBSYSTEMS=="ccw", KERNELS=="0.0.e300", GOTO="start_zfcp_lun_0.0.e300"
> > GOTO="end_zfcp_lun_0.0.e300"
> >
> > LABEL="start_zfcp_lun_0.0.e300"
> > SUBSYSTEM=="fc_remote_ports", ATTR{port_name}=="0x500507630618d700", GOTO="cfg_fc_0.0.e300_0x500507630618d700"
> > SUBSYSTEM=="scsi", ENV{DEVTYPE}=="scsi_device", KERNEL=="*:1074675712", KERNELS=="rport-*", ATTRS{fc_remote_ports/$id/port_name}=="0x500507630618d700", GOTO="cfg_scsi_0.0.e300_0x500507630618d700_0x4000400e00000000"
> > SUBSYSTEM=="scsi", ENV{DEVTYPE}=="scsi_device", KERNEL=="*:1074741248", KERNELS=="rport-*", ATTRS{fc_remote_ports/$id/port_name}=="0x500507630618d700", GOTO="cfg_scsi_0.0.e300_0x500507630618d700_0x4000400f00000000"
> > GOTO="end_zfcp_lun_0.0.e300"
> >
> > LABEL="cfg_fc_0.0.e300_0x500507630618d700"
> > ATTR{[ccw/0.0.e300]0x500507630618d700/unit_add}="0x4000400e00000000"
> > ATTR{[ccw/0.0.e300]0x500507630618d700/unit_add}="0x4000400f00000000"
> > ATTR{[ccw/0.0.e300]0x500507630618d700/unit_add}="0x4000401200000000"
> > ATTR{[ccw/0.0.e300]0x500507630618d700/unit_add}="0x4001400d00000000"
> > ATTR{[ccw/0.0.e300]0x500507630618d700/unit_add}="0x4001401100000000"
> > GOTO="end_zfcp_lun_0.0.e300"
> >
> > LABEL="cfg_scsi_0.0.e300_0x500507630618d700_0x4000400e00000000"
> > ATTR{queue_depth}="32"
> > GOTO="end_zfcp_lun_0.0.e300"
> >
> > LABEL="cfg_scsi_0.0.e300_0x500507630618d700_0x4000400f00000000"
> > ATTR{queue_depth}="32"
> > GOTO="end_zfcp_lun_0.0.e300"
> >
> > LABEL="end_zfcp_lun_0.0.e300"
> >
> > ----------------------------------------------------------------------
> >
> > Output from lszdev:
> > zfcp-lun 0.0.e100:0x5005076306135700:0x4000400e00000000 yes yes sda sg0
> > zfcp-lun 0.0.e100:0x5005076306135700:0x4000400f00000000 yes yes sdc sg2
> > zfcp-lun 0.0.e300:0x500507630618d700:0x4000400e00000000 yes yes sdb sg1
> > zfcp-lun 0.0.e300:0x500507630618d700:0x4000400f00000000 yes yes sdd sg3
> > zfcp-lun 0.0.e500:0x500507630633d700:0x4000401300000000 yes yes sde sg4
> > zfcp-lun 0.0.e500:0x500507630633d700:0x4000401400000000 yes yes sdf sg5
> > zfcp-lun 0.0.e500:0x500507630633d700:0x4000401500000000 yes yes sdg sg6
> > zfcp-lun 0.0.e500:0x500507630633d700:0x4000401600000000 yes yes sdh sg7
> > zfcp-lun 0.0.e500:0x500507630633d700:0x4000401700000000 yes yes sdi sg8
> > zfcp-lun 0.0.e500:0x500507630633d700:0x4000401800000000 yes yes sdj sg9
> > zfcp-lun 0.0.e500:0x500507630633d700:0x4000401900000000 yes yes sdk sg10
> > zfcp-lun 0.0.e500:0x500507630633d700:0x4000401a00000000 yes yes sdl sg11
> > zfcp-lun 0.0.e500:0x500507630633d700:0x4001401200000000 yes yes sdm sg12
> > zfcp-lun 0.0.e500:0x500507630633d700:0x4001401300000000 yes yes sdn sg13
> > zfcp-lun 0.0.e500:0x500507630633d700:0x4001401400000000 yes yes sdo sg14
> > zfcp-lun 0.0.e500:0x500507630633d700:0x4001401500000000 yes yes sdp sg15
> > zfcp-lun 0.0.e500:0x500507630633d700:0x4001401600000000 yes yes sdq sg16
> > zfcp-lun 0.0.e500:0x500507630633d700:0x4001401700000000 yes yes sdr sg17
> > zfcp-lun 0.0.e700:0x5005076306389700:0x4000401300000000 yes yes sds sg18
> > zfcp-lun 0.0.e700:0x5005076306389700:0x4000401400000000 yes yes sdt sg19
> > zfcp-lun 0.0.e700:0x5005076306389700:0x4000401500000000 yes yes sdu sg20
> > zfcp-lun 0.0.e700:0x5005076306389700:0x4000401600000000 yes yes sdv sg21
> > zfcp-lun 0.0.e700:0x5005076306389700:0x4000401700000000 yes yes sdw sg22
> > zfcp-lun 0.0.e700:0x5005076306389700:0x4000401800000000 yes yes sdx sg23
> > zfcp-lun 0.0.e700:0x5005076306389700:0x4000401900000000 yes yes sdy sg24
> > zfcp-lun 0.0.e700:0x5005076306389700:0x4000401a00000000 yes yes sdz sg25
> > zfcp-lun 0.0.e700:0x5005076306389700:0x4001401200000000 yes yes sdaa sg26
> > zfcp-lun 0.0.e700:0x5005076306389700:0x4001401300000000 yes yes sdab sg27
> > zfcp-lun 0.0.e700:0x5005076306389700:0x4001401400000000 yes yes sdac sg28
> > zfcp-lun 0.0.e700:0x5005076306389700:0x4001401500000000 yes yes sdad sg29
> > zfcp-lun 0.0.e700:0x5005076306389700:0x4001401600000000 yes yes sdae sg30
> > zfcp-lun 0.0.e700:0x5005076306389700:0x4001401700000000 yes yes sdaf sg31
> >
> >> This is the only supported post-install method to (dynamically and)
> >> persistently activate zfcp-attached FCP LUNs. See also
> >> http://www.ibm.com/support/knowledgecenter/linuxonibm/com.ibm.linux.z.ludd/ludd_t_fcp_wrk_addu.html.
> >>
> >> > PV Volume information:
> >> > physical_volumes {
> >> >
> >> > pv0 {
> >> > device = "/dev/sdb5" # Hint only
> >> >
> >> > pv1 {
> >> > device = "/dev/sda" # Hint only
> >>
> >> This does not look very good, having single-path SCSI disk devices
> >> mentioned by LVM. With zfcp-attached SCSI disks, LVM must be on top of
> >> multipathing. Could you please double-check whether your installation
> >> with LVM and multipathing does the correct layering? If not, this would
> >> be an independent bug. See also [1, slide 28 "Multipathing for Disks -
> >> LVM on Top"].
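As an aside to the layering question above: one way to check the stacking on
a running system is sketched below. These are standard commands rather than
anything taken from this bug; the expectation is that the LVM PVs sit on
multipath map devices (dm-*/mpath*) rather than on single-path /dev/sdX nodes.

$ lsblk -s /dev/mapper/ub01--vg-root   # walk from the root LV down to its parent devices
$ sudo multipath -ll                   # list the multipath maps and their path groups
$ sudo pvs -o pv_name,vg_name          # PVs should be the multipath maps, not /dev/sdX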
> >> > Additional testing has been done with CKD volumes and we see the same
> >> > behavior. Because of this behavior, I do not believe the problem is
> >> > related to SAN disk or multipath. I think it is due to the system not
> >> > being able to read the UUID on any PV in the VG other than the IPL disk.
> >>
> >> For any disk device type, the initrd must contain all the information
> >> needed to enable/activate all paths of the entire block device dependency
> >> tree required to mount the root file system. An example of a dependency
> >> tree is in [1, slide 37]; that example is independent of any particular
> >> Linux distribution.
> >> I don't know how much automatic dependency tracking Ubuntu does for the
> >> user, especially regarding additional z-specific device activation steps
> >> ("setting online" as for DASD or zFCP). Potentially the user must take
> >> care of the dependency tree himself and ensure the necessary information
> >> lands in the initrd.
> >>
> >> Once the dependency tree of the root-fs has changed (such as adding a PV
> >> to an LVM containing the root-fs, as in your case), you must re-create
> >> the initrd with the following command before any reboot:
> >> $ update-initramfs -u
> >
> > The "update-initramfs -u" command was never explicitly run after the
> > system was built.
> > The second PV was added to the VG on 10/26/2016. However, it was not
> > until early November that the root FS was extended.
> >
> > Between 10/16/2016 and the date the root fs was extended, the second PV
> > was always online and active in a VG and LV display after every reboot.
> > I have a note in my runlog with the following from 10/26/2016:
> > >>>> Rebooted the system and all is working. Both disks are there and
> > >>>> everything is online.
> > lsscsi
> > [0:0:0:1074675712]  disk  IBM  2107900  1.69  /dev/sdb  <----- This would be 0x400E4000
> > [0:0:0:1074741248]  disk  IBM  2107900  1.69  /dev/sdd  <----- This would be 0x400F4000
> > [1:0:0:1074675712]  disk  IBM  2107900  1.69  /dev/sda
> > [1:0:0:1074741248]  disk  IBM  2107900  1.69  /dev/sdc
> >
> >> On z Systems, this also contains the necessary step to re-write the boot
> >> record (using the zipl bootloader management tool) so that it correctly
> >> points to the new initrd.
> >> See also
> >> http://www.ibm.com/support/knowledgecenter/linuxonibm/com.ibm.linux.z.ludd/ludd_t_fcp_wrk_on.html.
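To make that concrete for this system, the missing post-change steps would
have looked roughly like the sketch below. It is only an outline: the chzdev
device IDs are the two paths of the second LUN as shown in the lszdev output
above, and whether zipl must be called explicitly depends on whether the
distribution's initramfs/kernel hooks already re-write the boot record.

$ sudo chzdev zfcp-lun 0.0.e100:0x5005076306135700:0x4000400f00000000 -e   # persist path 1
$ sudo chzdev zfcp-lun 0.0.e300:0x500507630618d700:0x4000400f00000000 -e   # persist path 2
$ sudo update-initramfs -u   # pull the generated udev rules and LVM layout into the initrd
$ sudo zipl                  # ensure the boot record points at the new initrd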
> >> In your case, on reboot it only activated 2 paths to FCP LUN
> >> 0x4000400e00000000 (I cannot determine the target port WWPN(s) from the
> >> output below because it does not convey this info) from two different
> >> FCP devices, 0.0.e300 and 0.0.e100.
> >> From attachment 113696 [details]:
> >> [    6.666977] scsi host0: zfcp
> >> [    6.671670] random: nonblocking pool is initialized
> >> [    6.672622] qdio: 0.0.e300 ZFCP on SC 2cc5 using AI:1 QEBSM:0 PRI:1 TDD:1 SIGA: W AP
> >> [    6.722312] scsi host1: zfcp
> >> [    6.724547] scsi 0:0:0:1074675712: Direct-Access     IBM      2107900          1.69 PQ: 0 ANSI: 5
> >> [    6.725159] sd 0:0:0:1074675712: alua: supports implicit TPGS
> >> [    6.725164] sd 0:0:0:1074675712: alua: device naa.6005076306ffd700000000000000000e port group 0 rel port 303
> >> [    6.725287] sd 0:0:0:1074675712: Attached scsi generic sg0 type 0
> >> [    6.728234] qdio: 0.0.e100 ZFCP on SC 2c85 using AI:1 QEBSM:0 PRI:1 TDD:1 SIGA: W AP
> >> [    6.747662] sd 0:0:0:1074675712: alua: transition timeout set to 60 seconds
> >> [    6.747667] sd 0:0:0:1074675712: alua: port group 00 state A preferred supports tolusnA
> >> [    6.747801] sd 0:0:0:1074675712: [sda] 209715200 512-byte logical blocks: (107 GB/100 GiB)
> >> [    6.748652] sd 0:0:0:1074675712: [sda] Write Protect is off
> >> [    6.749024] sd 0:0:0:1074675712: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> >> [    6.752076] sda: sda1 sda2 < sda5 >
> >> [    6.754107] sd 0:0:0:1074675712: [sda] Attached SCSI disk
> >> [    6.760935] scsi 1:0:0:1074675712: Direct-Access     IBM      2107900          1.69 PQ: 0 ANSI: 5
> >> [    6.761444] sd 1:0:0:1074675712: alua: supports implicit TPGS
> >> [    6.761448] sd 1:0:0:1074675712: alua: device naa.6005076306ffd700000000000000000e port group 0 rel port 231
> >> [    6.761514] sd 1:0:0:1074675712: Attached scsi generic sg1 type 0
> >> [    6.787710] sd 1:0:0:1074675712: [sdb] 209715200 512-byte logical blocks: (107 GB/100 GiB)
> >> [    6.787770] sd 1:0:0:1074675712: alua: port group 00 state A preferred supports tolusnA
> >> [    6.788464] sd 1:0:0:1074675712: [sdb] Write Protect is off
> >> [    6.788728] sd 1:0:0:1074675712: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> >> [    6.790829] sdb: sdb1 sdb2 < sdb5 >
> >> [    6.792535] sd 1:0:0:1074675712: [sdb] Attached SCSI disk
> >
> > I see what you are saying: only 1074675712 (0x400E4000) is coming
> > online at boot. 1074741248 (0x400F4000) does not come online at boot.
> > The second device must be coming online after boot has completed, and
> > that is why lsscsi shows it online. And since the boot partition is on
> > the first segment, the system can read the initrd and start the boot.
> > But when it goes to mount root, it is not aware of the second segment.
> > Do I have this right?
> >
> > If so, that brings me to the next question. If this is the case, do
> > you have a procedure where I could bring up a rescue system, bring
> > volumes 1074675712 (0x400E4000) & 1074741248 (0x400F4000) online, chroot,
> > and then update the initrd with the second volume? Or do I need to
> > rebuild the system from scratch?
> >
> >>
> >> REFERENCE
> >>
> >> [1] http://www-05.ibm.com/de/events/linux-on-z/pdf/day2/4_Steffen_Maier_zfcp-best-practices-2015.pdf
> >
> > --
> > You received this bug notification because you are a bug assignee.
> > Matching subscriptions: s390x
> > https://bugs.launchpad.net/bugs/1641078
> >
> > Title:
> >   System cannot be booted up when root filesystem is on an LVM on two
> >   disks
> >
> > To manage notifications about this bug go to:
> > https://bugs.launchpad.net/ubuntu-z-systems/+bug/1641078/+subscriptions
>
> --
> Regards,
>
> Dimitri.
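On the rescue question quoted above: rather than rebuilding from scratch, a
rescue-system path would look roughly like the following sketch. It is an
outline only, not a procedure from this bug report: the zfcp IDs and the
ub01-vg/root LV name are taken from earlier in this report, how the rescue
system is IPLed depends on the environment, and if /boot is a separate
partition it has to be mounted inside the chroot as well.

# from the rescue system, online all paths to both LUNs of the root VG
# (the FCP devices 0.0.e100/0.0.e300 must already be online, e.g. via chccwdev -e)
echo 0x4000400e00000000 > /sys/bus/ccw/drivers/zfcp/0.0.e100/0x5005076306135700/unit_add
echo 0x4000400f00000000 > /sys/bus/ccw/drivers/zfcp/0.0.e100/0x5005076306135700/unit_add
# (repeat both for 0.0.e300 / 0x500507630618d700)
vgchange -ay ub01-vg                      # activate the root VG
mount /dev/ub01-vg/root /mnt
for fs in dev proc sys; do mount --bind /$fs /mnt/$fs; done
chroot /mnt update-initramfs -u           # rebuild the initrd with the second PV's paths
chroot /mnt zipl                          # re-write the boot record if the hook did not
# then unmount everything and re-IPL from disk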
