------- Comment From [email protected] 2016-11-14 07:39 EDT-------

------- Comment From [email protected] 2016-11-14 07:49 EDT-------

------- Comment From [email protected] 2016-11-14 08:35
EDT-------

------- Comment From [email protected] 2016-11-14 19:02 EDT-------
(In reply to comment #15)
> On 14 November 2016 at 11:49, bugproxy <[email protected]> wrote:
> > (In reply to comment #7)
> >> > (In reply to comment #1)
> >> > The installation was on FCP SCSI SAN volumes, each with two active
> >> > paths; multipath was involved. The system IPLed fine up to the
> >> > point that we expanded the root filesystem to span volumes. At boot
> >> > time, the system was unable to locate the second segment of the
> >> > root filesystem. The error message indicated this was due to
> >> > lvmetad not being active.
> >> For the zfcp case, did you use the chzdev tool to activate the paths of 
> >> your
> >> new additional LVM physical volume (PV)?
> >
> > Initially, the paths to the second LUNs were brought online manually
> > with "echo 0x4000400f00000000 > unit_add". Then I followed up by
> > running the "chzdev zfcp-lun -e --online" command and verified they were
>
> After chzdev, one must run $ update-initramfs -u, so that the
> generated udev rules shown below are copied into the initramfs.
>
> To recover this system: the Ubuntu initramfs should drop you into a
> busybox shell; navigate sysfs to online the required device and
> execute vgscan. At that point it should be sufficient to exit the
> busybox shell, and the initramfs should continue the boot as normal.
>
> Once the boot is complete, run $ sudo update-initramfs -u, and reboot.
> It should boot fine from now on.
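
(For reference, that advice amounts to something like the following in
the busybox shell; the sysfs path below is the standard zfcp unit_add
attribute, with the FCP device, WWPN, and LUN values from this system:

$ echo 0x4000400f00000000 > \
    /sys/bus/ccw/drivers/zfcp/0.0.e100/0x5005076306135700/unit_add
$ vgscan        # rescan for volume groups
$ vgchange -ay  # activate the logical volumes
$ exit          # the initramfs then continues the boot

and likewise via 0.0.e300 for the second path. Whether the LVM tools
are available inside the initramfs depends on how it was built.)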

When the system enters BusyBox, none of the PV, VG, or LV commands are
available, since the root file system is not mounted, so I was unable to
issue vgscan as instructed. However, I was able to manually re-run the
boot scripts to get the system to the point that it recognized the root
LVM, and at that point I was able to issue update-initramfs. Here are
the steps that worked to recover the system:

/scripts
(initramfs) cd init-premount
(initramfs) ./lvm2
ln: /tmp/mountroot-fail-hooks.d/20-lvm2: File exists
(initramfs) ./mdadm
ln: /tmp/mountroot-fail-hooks.d/10-mdadm: File exists
(initramfs) cd ..
(initramfs) cd local-top
(initramfs) ./iscsi
(initramfs) ./lvm2
lvmetad is not active yet, using direct activation during sysinit
(initramfs) cd ..
(initramfs) cd local-premount
(initramfs) ./btrfs
Scanning for Btrfs filesystems
(initramfs) cd ..
(initramfs) cd local-block
(initramfs) ./lvm2
(initramfs) exit    (at this point the system was able to complete
                     booting; the root file system was mounted)
Begin: Will now check root file system ... fsck from util-linux 2.27.1
[/sbin/fsck.ext4 (1) -- /dev/mapper/ub01--vg-root] fsck.ext4 -a -C0 /dev/mapper/ub01--vg-root
/dev/mapper/ub01--vg-root: clean, 121359/6686224 files, 2860629/26895360 blocks
done.

Once boot completed, I issued update-initramfs -u and rebooted the
system. Everything looked good. Is that the intended way to do this?
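
(One way to double-check that the rebuilt initrd carries the chzdev
rules, assuming the standard initramfs-tools layout:

$ lsinitramfs /boot/initrd.img-$(uname -r) | grep zfcp-lun

This should list the 41-zfcp-lun-0.0.e100.rules and
41-zfcp-lun-0.0.e300.rules files quoted below.)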

Thank you for the help.

>
> > online and persistent with the lszdev command.
> >
> > Below are the rules files for the 0.0.e100 and 0.0.e300 paths,
> > followed by the output of the lszdev command; both the files and the
> > lszdev output are from 10/26/2016.
> >
> > cat 41-zfcp-lun-0.0.e100.rules
> > # Generated by chzdev
> > ACTION=="add", SUBSYSTEMS=="ccw", KERNELS=="0.0.e100", GOTO="start_zfcp_lun_0.0.e100"
> > GOTO="end_zfcp_lun_0.0.e100"
> >
> > LABEL="start_zfcp_lun_0.0.e100"
> > SUBSYSTEM=="fc_remote_ports", ATTR{port_name}=="0x5005076306135700", GOTO="cfg_fc_0.0.e100_0x5005076306135700"
> > SUBSYSTEM=="scsi", ENV{DEVTYPE}=="scsi_device", KERNEL=="*:1074675712", KERNELS=="rport-*", ATTRS{fc_remote_ports/$id/port_name}=="0x5005076306135700", GOTO="cfg_scsi_0.0.e100_0x5005076306135700_0x4000400e00000000"
> > SUBSYSTEM=="scsi", ENV{DEVTYPE}=="scsi_device", KERNEL=="*:1074741248", KERNELS=="rport-*", ATTRS{fc_remote_ports/$id/port_name}=="0x5005076306135700", GOTO="cfg_scsi_0.0.e100_0x5005076306135700_0x4000400f00000000"
> > GOTO="end_zfcp_lun_0.0.e100"
> >
> > LABEL="cfg_fc_0.0.e100_0x5005076306135700"
> > ATTR{[ccw/0.0.e100]0x5005076306135700/unit_add}="0x4000400e00000000"
> > ATTR{[ccw/0.0.e100]0x5005076306135700/unit_add}="0x4000400f00000000"
> > ATTR{[ccw/0.0.e100]0x5005076306135700/unit_add}="0x4000401200000000"
> > ATTR{[ccw/0.0.e100]0x5005076306135700/unit_add}="0x4001400d00000000"
> > ATTR{[ccw/0.0.e100]0x5005076306135700/unit_add}="0x4001401100000000"
> > GOTO="end_zfcp_lun_0.0.e100"
> >
> > LABEL="cfg_scsi_0.0.e100_0x5005076306135700_0x4000400e00000000"
> > ATTR{queue_depth}="32"
> > GOTO="end_zfcp_lun_0.0.e100"
> >
> > LABEL="cfg_scsi_0.0.e100_0x5005076306135700_0x4000400f00000000"
> > ATTR{queue_depth}="32"
> > GOTO="end_zfcp_lun_0.0.e100"
> >
> > LABEL="end_zfcp_lun_0.0.e100"
> >
> > ----------------------------------------------------------------------
> >
> > cat 41-zfcp-lun-0.0.e300.rules
> > # Generated by chzdev
> > ACTION=="add", SUBSYSTEMS=="ccw", KERNELS=="0.0.e300", GOTO="start_zfcp_lun_0.0.e300"
> > GOTO="end_zfcp_lun_0.0.e300"
> >
> > LABEL="start_zfcp_lun_0.0.e300"
> > SUBSYSTEM=="fc_remote_ports", ATTR{port_name}=="0x500507630618d700", GOTO="cfg_fc_0.0.e300_0x500507630618d700"
> > SUBSYSTEM=="scsi", ENV{DEVTYPE}=="scsi_device", KERNEL=="*:1074675712", KERNELS=="rport-*", ATTRS{fc_remote_ports/$id/port_name}=="0x500507630618d700", GOTO="cfg_scsi_0.0.e300_0x500507630618d700_0x4000400e00000000"
> > SUBSYSTEM=="scsi", ENV{DEVTYPE}=="scsi_device", KERNEL=="*:1074741248", KERNELS=="rport-*", ATTRS{fc_remote_ports/$id/port_name}=="0x500507630618d700", GOTO="cfg_scsi_0.0.e300_0x500507630618d700_0x4000400f00000000"
> > GOTO="end_zfcp_lun_0.0.e300"
> >
> > LABEL="cfg_fc_0.0.e300_0x500507630618d700"
> > ATTR{[ccw/0.0.e300]0x500507630618d700/unit_add}="0x4000400e00000000"
> > ATTR{[ccw/0.0.e300]0x500507630618d700/unit_add}="0x4000400f00000000"
> > ATTR{[ccw/0.0.e300]0x500507630618d700/unit_add}="0x4000401200000000"
> > ATTR{[ccw/0.0.e300]0x500507630618d700/unit_add}="0x4001400d00000000"
> > ATTR{[ccw/0.0.e300]0x500507630618d700/unit_add}="0x4001401100000000"
> > GOTO="end_zfcp_lun_0.0.e300"
> >
> > LABEL="cfg_scsi_0.0.e300_0x500507630618d700_0x4000400e00000000"
> > ATTR{queue_depth}="32"
> > GOTO="end_zfcp_lun_0.0.e300"
> >
> > LABEL="cfg_scsi_0.0.e300_0x500507630618d700_0x4000400f00000000"
> > ATTR{queue_depth}="32"
> > GOTO="end_zfcp_lun_0.0.e300"
> >
> > LABEL="end_zfcp_lun_0.0.e300"
> >
> > ----------------------------------------------------------------------
> >
> > Output from lszdev
> > zfcp-lun 0.0.e100:0x5005076306135700:0x4000400e00000000 yes yes sda sg0
> > zfcp-lun 0.0.e100:0x5005076306135700:0x4000400f00000000 yes yes sdc sg2
> > zfcp-lun 0.0.e300:0x500507630618d700:0x4000400e00000000 yes yes sdb sg1
> > zfcp-lun 0.0.e300:0x500507630618d700:0x4000400f00000000 yes yes sdd sg3
> > zfcp-lun 0.0.e500:0x500507630633d700:0x4000401300000000 yes yes sde sg4
> > zfcp-lun 0.0.e500:0x500507630633d700:0x4000401400000000 yes yes sdf sg5
> > zfcp-lun 0.0.e500:0x500507630633d700:0x4000401500000000 yes yes sdg sg6
> > zfcp-lun 0.0.e500:0x500507630633d700:0x4000401600000000 yes yes sdh sg7
> > zfcp-lun 0.0.e500:0x500507630633d700:0x4000401700000000 yes yes sdi sg8
> > zfcp-lun 0.0.e500:0x500507630633d700:0x4000401800000000 yes yes sdj sg9
> > zfcp-lun 0.0.e500:0x500507630633d700:0x4000401900000000 yes yes sdk sg10
> > zfcp-lun 0.0.e500:0x500507630633d700:0x4000401a00000000 yes yes sdl sg11
> > zfcp-lun 0.0.e500:0x500507630633d700:0x4001401200000000 yes yes sdm sg12
> > zfcp-lun 0.0.e500:0x500507630633d700:0x4001401300000000 yes yes sdn sg13
> > zfcp-lun 0.0.e500:0x500507630633d700:0x4001401400000000 yes yes sdo sg14
> > zfcp-lun 0.0.e500:0x500507630633d700:0x4001401500000000 yes yes sdp sg15
> > zfcp-lun 0.0.e500:0x500507630633d700:0x4001401600000000 yes yes sdq sg16
> > zfcp-lun 0.0.e500:0x500507630633d700:0x4001401700000000 yes yes sdr sg17
> > zfcp-lun 0.0.e700:0x5005076306389700:0x4000401300000000 yes yes sds sg18
> > zfcp-lun 0.0.e700:0x5005076306389700:0x4000401400000000 yes yes sdt sg19
> > zfcp-lun 0.0.e700:0x5005076306389700:0x4000401500000000 yes yes sdu sg20
> > zfcp-lun 0.0.e700:0x5005076306389700:0x4000401600000000 yes yes sdv sg21
> > zfcp-lun 0.0.e700:0x5005076306389700:0x4000401700000000 yes yes sdw sg22
> > zfcp-lun 0.0.e700:0x5005076306389700:0x4000401800000000 yes yes sdx sg23
> > zfcp-lun 0.0.e700:0x5005076306389700:0x4000401900000000 yes yes sdy sg24
> > zfcp-lun 0.0.e700:0x5005076306389700:0x4000401a00000000 yes yes sdz sg25
> > zfcp-lun 0.0.e700:0x5005076306389700:0x4001401200000000 yes yes sdaa sg26
> > zfcp-lun 0.0.e700:0x5005076306389700:0x4001401300000000 yes yes sdab sg27
> > zfcp-lun 0.0.e700:0x5005076306389700:0x4001401400000000 yes yes sdac sg28
> > zfcp-lun 0.0.e700:0x5005076306389700:0x4001401500000000 yes yes sdad sg29
> > zfcp-lun 0.0.e700:0x5005076306389700:0x4001401600000000 yes yes sdae sg30
> > zfcp-lun 0.0.e700:0x5005076306389700:0x4001401700000000 yes yes sdaf sg31
> >
> >> This is the only supported post-install method to (dynamically and)
> >> persistently activate zfcp-attached FCP LUNs. See also
> >> http://www.ibm.com/support/knowledgecenter/linuxonibm/com.ibm.linux.z.ludd/ludd_t_fcp_wrk_addu.html.
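
(As a sketch, persistently activating one of this system's LUN paths
with that method would be:

$ sudo chzdev -e zfcp-lun 0.0.e100:0x5005076306135700:0x4000400f00000000

which generates 41-zfcp-lun-*.rules udev rules like the ones quoted
above, followed by update-initramfs -u if the LUN is part of the root
file system.)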
> >>
> >> > PV Volume information:
> >> > physical_volumes {
> >> >
> >> >                pv0 {
> >> >                        device = "/dev/sdb5"        # Hint only
> >>
> >> >                pv1 {
> >> >                        device = "/dev/sda"        # Hint only
> >>
> >> This does not look good: LVM is referencing single-path SCSI disk
> >> devices. With zfcp-attached SCSI disks, LVM must be on top of
> >> multipathing. Could you please double-check that your installation
> >> layers LVM and multipathing correctly? If not, this would be an
> >> independent bug. See also [1, slide 28 "Multipathing for Disks -
> >> LVM on Top"].
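
(A quick sanity check of the layering: with LVM correctly on top of
multipathing,

$ sudo pvs -o pv_name,vg_name

should report /dev/mapper/... multipath devices as PVs, not single-path
/dev/sdX devices.)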
> >>
> >> > Additional testing has been done with CKD volumes, and we see the
> >> > same behavior. Because of this, I do not believe the problem is
> >> > related to SAN disk or multipath. I think it is due to the system
> >> > not being able to read the UUID on any PV in the VG other than the
> >> > IPL disk.
> >>
> >> For any disk device type, the initrd must contain all the
> >> information needed to enable/activate all paths of the entire block
> >> device dependency tree required to mount the root file system. An
> >> example of such a dependency tree is in [1, slide 37], and it is
> >> independent of any particular Linux distribution.
> >> I don't know how much automatic dependency tracking Ubuntu does for
> >> the user, especially regarding the additional z-specific device
> >> activation steps ("setting online", as for DASD or zFCP).
> >> Potentially the user must track the dependency tree himself and
> >> ensure the necessary information lands in the initrd.
> >>
> >> Once the dependency tree of the root-fs has changed (such as adding a PV to
> >> an LVM containing the root-fs as in your case), you must re-create the
> >> initrd with the following command before any reboot:
> >> $ update-initramfs -u
> >
> > The "update-initramfs -u" command was never explicitly run after the system 
> > was built.
> > The second PV volume was added to VG on 10/26/2016.  However,  it was not 
> > until early November that the root FS was extended.
> >
> > Between 10/16/2016 and the date the root fs was extended,  the second PV 
> > was always online and and active in a VG and LV display after every Reboot.
> > I have a note in my runlog with the following from 10/26/2016
> >>>>Rebooted the system and all is working. Both disks are there and 
> >>>>everything is online.
> > lsscsi
> > [0:0:0:1074675712]disk IBM 2107900 1.69 /dev/sdb    <----- This would be 0x400E4000
> > [0:0:0:1074741248]disk IBM 2107900 1.69 /dev/sdd    <----- This would be 0x400F4000
> > [1:0:0:1074675712]disk IBM 2107900 1.69 /dev/sda
> > [1:0:0:1074741248]disk IBM 2107900 1.69 /dev/sdc
> >
> >>
> >> On z Systems, this also contains the necessary step to re-write the boot
> >> record (using the zipl bootloader management tool) so it correctly points 
> >> to
> >> the new initrd.
> >> See also
> >> http://www.ibm.com/support/knowledgecenter/linuxonibm/com.ibm.linux.z.ludd/ludd_t_fcp_wrk_on.html.
> >>
> >>
> >> In your case, on reboot it only activated the two paths to FCP LUN
> >> 0x4000400e00000000 (I cannot determine the target port WWPN(s) from
> >> the output below because it does not convey this info), one from
> >> each of the two FCP devices 0.0.e300 and 0.0.e100.
> >> From attachment 113696:
> >> [    6.666977] scsi host0: zfcp
> >> [    6.671670] random: nonblocking pool is initialized
> >> [    6.672622] qdio: 0.0.e300 ZFCP on SC 2cc5 using AI:1 QEBSM:0 PRI:1 
> >> TDD:1
> >> SIGA: W AP
> >> [    6.722312] scsi host1: zfcp
> >> [    6.724547] scsi 0:0:0:1074675712: Direct-Access     IBM      2107900
> >> 1.69 PQ: 0 ANSI: 5
> >> [    6.725159] sd 0:0:0:1074675712: alua: supports implicit TPGS
> >> [    6.725164] sd 0:0:0:1074675712: alua: device
> >> naa.6005076306ffd700000000000000000e port group 0 rel port 303
> >> [    6.725287] sd 0:0:0:1074675712: Attached scsi generic sg0 type 0
> >> [    6.728234] qdio: 0.0.e100 ZFCP on SC 2c85 using AI:1 QEBSM:0 PRI:1 
> >> TDD:1
> >> SIGA: W AP
> >> [    6.747662] sd 0:0:0:1074675712: alua: transition timeout set to 60
> >> seconds
> >> [    6.747667] sd 0:0:0:1074675712: alua: port group 00 state A preferred
> >> supports tolusnA
> >> [    6.747801] sd 0:0:0:1074675712: [sda] 209715200 512-byte logical 
> >> blocks:
> >> (107 GB/100 GiB)
> >> [    6.748652] sd 0:0:0:1074675712: [sda] Write Protect is off
> >> [    6.749024] sd 0:0:0:1074675712: [sda] Write cache: enabled, read cache:
> >> enabled, doesn't support DPO or FUA
> >> [    6.752076]  sda: sda1 sda2 < sda5 >
> >> [    6.754107] sd 0:0:0:1074675712: [sda] Attached SCSI disk
> >> [    6.760935] scsi 1:0:0:1074675712: Direct-Access     IBM      2107900
> >> 1.69 PQ: 0 ANSI: 5
> >> [    6.761444] sd 1:0:0:1074675712: alua: supports implicit TPGS
> >> [    6.761448] sd 1:0:0:1074675712: alua: device
> >> naa.6005076306ffd700000000000000000e port group 0 rel port 231
> >> [    6.761514] sd 1:0:0:1074675712: Attached scsi generic sg1 type 0
> >> [    6.787710] sd 1:0:0:1074675712: [sdb] 209715200 512-byte logical 
> >> blocks:
> >> (107 GB/100 GiB)
> >> [    6.787770] sd 1:0:0:1074675712: alua: port group 00 state A preferred
> >> supports tolusnA
> >> [    6.788464] sd 1:0:0:1074675712: [sdb] Write Protect is off
> >> [    6.788728] sd 1:0:0:1074675712: [sdb] Write cache: enabled, read cache:
> >> enabled, doesn't support DPO or FUA
> >> [    6.790829]  sdb: sdb1 sdb2 < sdb5 >
> >> [    6.792535] sd 1:0:0:1074675712: [sdb] Attached SCSI disk
> >
> > I see what you are saying: only 1074675712 (0x400E4000) is coming
> > online at boot; 1074741248 (0x400f4000) does not come online at boot.
> > The second device must be coming online after boot has completed, and
> > that is why lsscsi shows it online. And since the boot partition is
> > on the first segment, the system can read the initrd and start the
> > boot, but when it goes to mount root, it is not aware of the second
> > segment. Do I have this right?
> >
> > If so, that brings me to the next question. If this is the case, do
> > you have a procedure where I could bring up a rescue system, bring
> > volumes 1074675712 (0x400E4000) and 1074741248 (0x400f4000) online,
> > chroot, and then update the initrd with the second volume? Or do I
> > need to rebuild the system from scratch?
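
(In outline, such a rescue would presumably look like the following;
this is untested here, and the mount points, the /boot device, and the
zipl re-run via the kernel hooks are assumptions:

# in the rescue system: online both paths of each volume, e.g.
$ echo 0x4000400f00000000 > \
    /sys/bus/ccw/drivers/zfcp/0.0.e100/0x5005076306135700/unit_add
$ vgchange -ay                          # activate the root VG
$ mount /dev/mapper/ub01--vg-root /mnt  # LV name from the fsck output above
$ mount /dev/sda1 /mnt/boot             # if /boot is a separate partition
$ for d in dev proc sys; do mount --bind /$d /mnt/$d; done
$ chroot /mnt update-initramfs -u       # should also re-run zipl on s390x
$ reboot)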
> >
> >>
> >>
> >> REFERENCE
> >>
> >> [1]
> >> http://www-05.ibm.com/de/events/linux-on-z/pdf/day2/4_Steffen_Maier_zfcp-best-practices-2015.pdf
> >
>
> --
> Regards,
>
> Dimitri.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1641078

Title:
  System cannot be booted up when root filesystem is on an LVM on two
  disks

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1641078/+subscriptions
