We were finally able to run another test with the Cirrus DMS device and I was able to capture the trace when it failed. One point, the root file system is on ECKD disk, the problem we are having is accessing the LUNs that contain our other file systems. Here is the trace, the only thing I can see is that the second path is coming up late.
[ 1.201641] SCSI subsystem initialized [ 1.208476] qeth.87067b: loading core functions [ 1.226248] NET: Registered protocol family 10 [ 1.228215] qeth.933eb7: register layer 2 discipline [ 1.231415] qdio: 0.0.0602 OSA on SC e using AI:1 QEBSM:0 PRI:1 TDD:1 SIGA:RW A [ 1.235242] scsi0 : zfcp [ 1.235309] scsi_eh_0: sleeping [ 1.237109] qdio: 0.0.0400 ZFCP on SC 0 using AI:1 QEBSM:1 PRI:1 TDD:1 SIGA: W A [ 1.248175] netif_napi_add() called with weight 128 on device eth%d [ 1.248737] qeth.cc0c57: 0.0.0600: MAC address 02:00:0b:00:00:12 successfully registered on device eth0 [ 1.248748] qeth.736dae: 0.0.0600: Device is a Guest LAN QDIO card (level: V642) [ 1.248751] with link type GuestLAN QDIO (portname: whatever) Mounting debugfs at /sys/kernel/debug..done No efivars filesystem driver available.Starting udevd: ..done Loading drivers, configuring devices: [ 2.261765] scsi1 : zfcp [ 2.261768] scsi_eh_1: sleeping [ 2.262638] qdio: 0.0.0401 ZFCP on SC 1 using AI:1 QEBSM:1 PRI:1 TDD:1 SIGA: W A [ 2.263937] scsi 0:0:0:0: scsi scan: INQUIRY pass 1 length 36 [ 3.304488] scsi 1:0:0:0: scsi scan: INQUIRY pass 1 length 36 ..done [ 3.318574] Adding 259956k swap on /dev/dasdb1. Priority:-1 extents:1 across:259956k [ 3.319113] Adding 519924k swap on /dev/dasdc1. Priority:-2 extents:1 across:519924k [ 3.352258] device-mapper: uevent: version 1.0.3 [ 3.352314] device-mapper: ioctl: 4.25.0-ioctl (2012-07-25) initialised: dm-de...@redhat.com [ 3.386135] device-mapper: multipath: version 1.6.0 loaded Activating swap-devices in /etc/fstab... ..doneSetting up the System Clock..done ..done Activating device mapper... ..done Loading required kernel modules ..doneCreating multipath targets: (waiting for udev)..done Starting MD RAID ..unused [ 3.475733] scsi scan: INQUIRY successful with code 0x0 [ 3.475741] scsi 0:0:0:0: scsi scan: INQUIRY pass 2 length 149 [ 3.475978] scsi scan: INQUIRY successful with code 0x0 [ 3.475985] scsi 0:0:0:0: Direct-Access EMC SYMMETRIX 5874 PQ: 0 ANSI: 5 [ 3.476061] scsi scan: Sending REPORT LUNS to host 0 channel 0 id 0 (try 0) [ 3.476513] scsi scan: REPORT LUNS successful (try 0) result 0x0 [ 3.476519] scsi 0:0:0:0: scsi scan: REPORT LUN scan [ 3.476524] scsi scan: device exists on 0:0:0:0 [ 3.477382] scsi 0:0:0:1: scsi scan: INQUIRY pass 1 length 36 [ 3.477872] scsi scan: INQUIRY successful with code 0x0 [ 3.477880] scsi 0:0:0:1: scsi scan: INQUIRY pass 2 length 149 [ 3.478311] scsi scan: INQUIRY successful with code 0x0 [ 3.478331] scsi 0:0:0:1: Direct-Access EMC SYMMETRIX 5874 PQ: 0 ANSI: 5 [ 3.486130] sd 0:0:0:0: Done: SUCCESS [ 3.486135] sd 0:0:0:0: Result: hostbyte=DID_OK driverbyte=DRIVER_OK [ 3.486140] sd 0:0:0:0: CDB: Test Unit Ready: 00 00 00 00 00 00 [ 3.486158] sd 0:0:0:0: Sense Key : Unit Attention [current] [ 3.486177] sd 0:0:0:0: Add. Sense: I_T nexus loss occurred [ 3.486616] sd 0:0:0:1: Done: SUCCESS [ 3.486622] sd 0:0:0:1: Result: hostbyte=DID_OK driverbyte=DRIVER_OK [ 3.486629] sd 0:0:0:1: CDB: Test Unit Ready: 00 00 00 00 00 00 [ 3.486644] sd 0:0:0:1: Sense Key : Unit Attention [current] [ 3.486652] sd 0:0:0:1: Add. Sense: I_T nexus loss occurred [ 3.487964] sd 0:0:0:1: [sdb] 141434880 512-byte logical blocks: (72.4 GB/67.4 GiB) [ 3.488130] sd 0:0:0:0: [sda] 141434880 512-byte logical blocks: (72.4 GB/67.4 GiB) [ 3.491496] sd 0:0:0:1: [sdb] Write Protect is off [ 3.491504] sd 0:0:0:1: [sdb] Mode Sense: 8b 00 00 08 [ 3.491893] sd 0:0:0:0: [sda] Write Protect is off [ 3.491917] sd 0:0:0:0: [sda] Mode Sense: 8b 00 00 08 [ 3.492273] sd 0:0:0:1: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA [ 3.492750] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA [ 3.499550] sdb: sdb1 [ 3.500401] sda: sda1 [ 3.504773] sg_alloc: dev=0 [ 3.504795] sd 0:0:0:0: Attached scsi generic sg0 type 0 [ 3.504802] sg_alloc: dev=1 [ 3.504818] sd 0:0:0:1: Attached scsi generic sg1 type 0 [ 3.504831] sd 0:0:0:1: [sdb] Attached SCSI disk [ 3.510375] sd 0:0:0:0: [sda] Attached SCSI disk Waiting for udev to settle... Scanning for LVM volume groups... Reading all physical volumes. This may take a while... Found volume group "system-vg" using metadata type lvm2 Activating LVM volume groups... PARTIAL MODE. Incomplete logical volumes will be processed. 5 logical volume(s) in volume group "system-vg" now active ..done Waiting for /dev/user-vg/app-lv /dev/user-vg/ibmitm-lv /dev/user-vg/vantage-lv no more events Checking file systems... fsck from util-linux 2.19.1 Checking all file systems. /dev/user-vg/app-lv: nonexistent device ("nofail" fstab option may be used to skip this device) /dev/user-vg/ibmitm-lv: nonexistent device ("nofail" fstab option may be used to skip this device) /dev/user-vg/vantage-lv: nonexistent device ("nofail" fstab option may be used to skip this device) [/sbin/fsck.ext3 (1) -- /home] fsck.ext3 -a -C0 /dev/mapper/system--vg-home--lv /dev/mapper/system--vg-home--lv: clean, 310/131072 files, 30559/524288 blocks [/sbin/fsck.ext3 (1) -- /opt] fsck.ext3 -a -C0 /dev/mapper/system--vg-opt--lv /dev/mapper/system--vg-opt--lv: clean, 174/131072 files, 38932/524288 blocks [/sbin/fsck.ext3 (1) -- /tmp] fsck.ext3 -a -C0 /dev/mapper/system--vg-tmp--lv /dev/mapper/system--vg-tmp--lv: clean, 87/131072 files, 25736/524288 blocks [/sbin/fsck.ext3 (1) -- /usr] fsck.ext3 -a -C0 /dev/mapper/system--vg-usr--lv /dev/mapper/system--vg-usr--lv: clean, 59550/212992 files, 491523/837632 blocks [/sbin/fsck.ext3 (1) -- /var] fsck.ext3 -a -C0 /dev/mapper/system--vg-var--lv Give root password for maintenance (or type Control-D to continue): blogd: no message logging because /var file system is not accessible /dev/mapper/system--vg-var--lv: clean, 3529/131072 files, 187603/524288 blocks [/sbin/fsck.ext3 (1) -- /app] fsck.ext3 -a -C0 /dev/user-vg/app-lv fsck.ext3: No such file or directory while trying to open /dev/user-vg/app-lv /dev/user-vg/app-lv: The superblock could not be read or does not describe a correct ext2 filesystem. If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate superblock: e2fsck -b 8193 <device> [/sbin/fsck.ext3 (1) -- /app/IBM/ITM] fsck.ext3 -a -C0 /dev/user-vg/ibmitm-lv fsck.ext3: No such file or directory while trying to open /dev/user-vg/ibmitm-lv /dev/user-vg/ibmitm-lv: The superblock could not be read or does not describe a correct ext2 filesystem. If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate sup erblock: e2fsck -b 8193 <device> [/sbin/fsck.ext3 (1) -- /app/ecotools] fsck.ext3 -a -C0 /dev/user-vg/vantage-lv fsck.ext3: No such file or directory while trying to open /dev/user-vg/vantage-lv /dev/user-vg/vantage-lv: The superblock could not be read or does not describe a correct ext2 filesystem. If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate superblock: e2fsck -b 8193 <device> ..failed fsck failed for at least one filesystem (not /). Please repair manually and reboot. The root file system is already mounted read-write. Attention: Only CONTROL-D will reboot the system in this maintanance mode. shutdown or reboot will not work. [ 5.021680] scsi scan: INQUIRY successful with code 0x0 [ 5.021688] scsi 1:0:0:0: scsi scan: INQUIRY pass 2 length 149 [ 5.021899] scsi scan: INQUIRY successful with code 0x0 [ 5.021906] scsi 1:0:0:0: Direct-Access EMC SYMMETRIX 5874 PQ: 0 ANSI: 5 [ 5.021985] sg_alloc: dev=2 [ 5.022009] sd 1:0:0:0: Attached scsi generic sg2 type 0 [ 5.022033] scsi scan: Sending REPORT LUNS to host 1 channel 0 id 0 (try 0) [ 5.022419] scsi scan: REPORT LUNS successful (try 0) result 0x0 [ 5.022424] sd 1:0:0:0: scsi scan: REPORT LUN scan [ 5.022428] scsi scan: device exists on 1:0:0:0 [ 5.023321] scsi 1:0:0:1: scsi scan: INQUIRY pass 1 length 36 [ 5.023508] sd 1:0:0:0: Done: SUCCESS [ 5.023514] sd 1:0:0:0: Result: hostbyte=DID_OK driverbyte=DRIVER_OK [ 5.023522] sd 1:0:0:0: CDB: Test Unit Ready: 00 00 00 00 00 00 [ 5.023542] sd 1:0:0:0: Sense Key : Unit Attention [current] [ 5.023551] sd 1:0:0:0: Add. Sense: I_T nexus loss occurred [ 5.024722] scsi scan: INQUIRY successful with code 0x0 [ 5.024750] scsi 1:0:0:1: scsi scan: INQUIRY pass 2 length 149 [ 5.024865] sd 1:0:0:0: [sdc] 141434880 512-byte logical blocks: (72.4 GB/67.4 GiB) [ 5.024954] scsi scan: INQUIRY successful with code 0x0 [ 5.024959] scsi 1:0:0:1: Direct-Access EMC SYMMETRIX 5874 PQ: 0 ANSI: 5 [ 5.025016] sg_alloc: dev=3 [ 5.025042] sd 1:0:0:1: Attached scsi generic sg3 type 0 [ 5.026523] sd 1:0:0:1: Done: SUCCESS [ 5.026529] sd 1:0:0:1: Result: hostbyte=DID_OK driverbyte=DRIVER_OK [ 5.026535] sd 1:0:0:1: CDB: Test Unit Ready: 00 00 00 00 00 00 [ 5.026550] sd 1:0:0:1: Sense Key : Unit Attention [current] [ 5.026558] sd 1:0:0:1: Add. Sense: I_T nexus loss occurred [ 5.027998] sd 1:0:0:1: [sdd] 141434880 512-byte logical blocks: (72.4 GB/67.4 GiB) [ 5.028987] sd 1:0:0:0: [sdc] Write Protect is off [ 5.028999] sd 1:0:0:0: [sdc] Mode Sense: 8b 00 00 08 [ 5.029462] sd 1:0:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA [ 5.030860] sd 1:0:0:1: [sdd] Write Protect is off [ 5.030872] sd 1:0:0:1: [sdd] Mode Sense: 8b 00 00 08 [ 5.032427] sd 1:0:0:1: [sdd] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA [ 5.034279] sdc: sdc1 [ 5.037147] sdd: sdd1 [ 5.041471] sd 1:0:0:0: [sdc] Attached SCSI disk [ 5.043020] sd 1:0:0:1: [sdd] Attached SCSI disk Chris Will Enterprise Linux/UNIX (ELU) (313) 549-9729 Cell cw...@bcbsm.com -----Original Message----- From: Linux on 390 Port <LINUX-390@VM.MARIST.EDU> On Behalf Of Benjamin Block Sent: Wednesday, February 20, 2019 5:06 AM To: LINUX-390@VM.MARIST.EDU Subject: Re: Question on auto lun scan Hey Alan, On Tue, Feb 19, 2019 at 01:06:17PM -0500, Alan Altmark wrote: > On Tuesday, 02/19/2019 at 01:49 GMT, Benjamin Block <bbl...@linux.ibm.com> > wrote: > > udev_timeout=30 > > > > The default is 30s, you can add it to the kernel command line > > (/etc/zipl.conf), and increase it (e.g.: udev_timeout=90). > > > > This will increase the timeout for for both udevadm and the step where > > it waits for the root-volume to appear. > > Does that mean ALL of the needed needed LUNs must be brought online within > 30 seconds? Or that REPORT LUNS (LUN 0) or INQUIRY (LUN 0 and/or all > others) must EACH respond within 30 seconds? > It regulates two things at once. First, during the early phase of the INITRD, the init process will start 'udev', and then use 'udevadm settle --timeout=${udev_timeout}' to wait till it udev processed all the events happening in the kernel (such as adding devices, executing rules upon it saw specific devices being added [such as loading drivers and so on], and so on). This timeout is deceptive though, it will only wait up to this amount OR if there is no events in the queue anymore.. if an event happens 1s after it already stopped waiting, bad luck - although, because udev keeps running as daemon, it should sill process the event if I am not mistaken, its just that init will continue doing other things meanwhile (such as:). The second mechanism it controls is the late phase during run of the INITRD, where init will wait till it sees the root-volume appear. In this phase it will wait up to ${udev_timeout} seconds till it sees it, and after that it will fall back to ask the user for guidance (which is what I think is happening here, although strictly spoken, this is only a guess). That is as far as I have read the code - mind you, I didn't write it, so it may well be that I have miss-interpreted something. But this all changes with SLES12, because SLES12 uses a different kind of INITRD (dracut) and the code there probably doesn't even know this parameter. So this is strictly SLES11 only. -- With Best Regards, Benjamin Block / Linux on IBM Z Kernel Development IBM Systems & Technology Group / IBM Deutschland Research & Development GmbH Vorsitz. AufsR.: Matthias Hartmann / Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294 ---------------------------------------------------------------------- For LINUX-390 subscribe / signoff / archive access instructions, send email to lists...@vm.marist.edu with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390 The information contained in this communication is highly confidential and is intended solely for the use of the individual(s) to whom this communication is directed. If you are not the intended recipient, you are hereby notified that any viewing, copying, disclosure or distribution of this information is prohibited. Please notify the sender, by electronic mail or telephone, of any unintended receipt and delete the original message without making any copies. Blue Cross Blue Shield of Michigan and Blue Care Network of Michigan are nonprofit corporations and independent licensees of the Blue Cross and Blue Shield Association. ---------------------------------------------------------------------- For LINUX-390 subscribe / signoff / archive access instructions, send email to lists...@vm.marist.edu with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390