The problem described below is not specific to moving the disk from the
S-ATA port to a USB port.  It appears to be related to how ZFS imports
pools, and the same problem occurs when moving the drive from one USB port
to another.  I have now, after about 20 reboots, identified a workaround
that works consistently (a consolidated example follows the steps below).

1. Export the non-root pools before shutting down.  If this was not done,
the additional recovery steps described further down are needed.

2. Shut down, move the disk to another port.

3. On bootup, the normal boot will fail.

4. From grub, move to the failsafe boot option and press e to edit it.

On the next screen go to the kernel line and add -a to the unix (boot)
options.

5. Then boot (unix -s -a).

When prompted, accept the default [etc/system] for the kernel config file.
Also accept the default for the retire store [etc/devices/retire_store].

When asked, enter y to confirm mounting rpool read/write.

6. Reboot (nothing else is needed).

7. Select the normal grub boot entry.
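
Pulled together, a minimal sketch of the above looks roughly like this
(SHARED is just the name of my second pool, and the exact kernel path in
the failsafe grub entry varies by release - the important part is appending
-a after the existing options):

# zpool export SHARED                 (step 1, before shutting down)

(edited failsafe kernel line in grub, roughly)
kernel /boot/platform/i86pc/kernel/unix -s -a

(accept the defaults for etc/system and etc/devices/retire_store, answer y
to mount rpool read/write, then reboot and pick the normal grub entry)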

If additional ZFS pools exist on the disk, these can be imported by doing
the following (a short example follows the steps):

Shut down.

At grub, modify the kernel line of the boot option to add -m milestone=none,
e.g.
unix -B $ZFS_BOOT -m milestone=none

Log in as root on the console session, and remove the file
/etc/zfs/zpool.cache

Reboot and boot normally.

Run zpool import on the pool.
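
For example (SHARED is the name of my second pool; substitute your own),
the recovery sequence is roughly:

(in grub, append to the kernel line)
-m milestone=none

# rm /etc/zfs/zpool.cache
# reboot

(after the normal boot)
# zpool import                        (lists pools available for import)
# zpool import SHARED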

Now I have another question:  Lars suggested that this is not a bug, but
rather an RFE.  No offense to Lars, but I would like a few more votes on
that, please; I cannot make up my mind.  Suggestions about the
category/sub-category would also be handy.

The main reason I think this may be a bug is that a user may expect that
USB ZFS-formatted disks do not need to be exported prior to shutting down,
regardless of whether ZFS root is used and whether the disk is in fact the
boot disk.

On Thu, Jul 31, 2008 at 1:49 PM, Johan Hartzenberg <[EMAIL PROTECTED]> wrote:

> I have seemingly found a workaround to my problem.
>
>
> On Thu, Jul 31, 2008 at 1:34 PM, Johan Hartzenberg <[EMAIL PROTECTED]> wrote:
>
>> Hi Lars,
>>
>> Thank you for the response - it has prompted me to do some more thinking
>> and investigation.
>>
>> Note: zfs root file systems are not mentioned in /etc/vfstab - in fact the
>> file is not even included in the boot archive.
>>
>> Also, the file /etc/path_to_inst is used to bind driver instance numbers to
>> physical device paths, and does not specify the boot device - the boot
>> device is specified by its full, real device path - nothing is trying to
>> look up driver instances just yet.
>>
>> I need to go re-read the x86 boot process docs, as it has changed a lot.
>> But the stage this is at:
>> grub loads.
>> findroot actually FINDS the boot archive (thus the real device is known).
>> Solaris starts to load, so the boot archive has been decompressed and
>> un-mounted...
>> Something in there refers to the boot device, overwriting what grub
>> found/thought was the boot disk, so I try to look for it:
>>
>> / $ bootadm list-archive|xargs file|grep text|cut -d: -f1|xargs grep -i
>> "[EMAIL PROTECTED]/[EMAIL PROTECTED],0"
>> etc/path_to_inst:"/[EMAIL PROTECTED],0/[EMAIL PROTECTED],2/[EMAIL 
>> PROTECTED]/[EMAIL PROTECTED],0" 0 "cmdk"
>>
>> That is uninteresting.  Let's look for the complement:
>>
>> / $ bootadm list-archive|xargs file|egrep -v
>> "text|directory"
>> etc/devices/devid_cache:    data
>> etc/devices/mdi_scsi_vhci_cache:    data
>>
>> We can eliminate vhci at the outset as I'm not multi-pathing (or can we?
>> Regardless... devid_cache looks interesting.)
>>
>> / $ strings etc/devices/devid_cache
>> /[EMAIL PROTECTED],0/[EMAIL PROTECTED],2/[EMAIL PROTECTED]/[EMAIL 
>> PROTECTED],0
>> devid
>> &cmdkTOSHIBA MK1032GSX=           27BQFEJTS
>> /[EMAIL PROTECTED],0/pci1179,[EMAIL PROTECTED],7/[EMAIL PROTECTED]/[EMAIL 
>> PROTECTED],0
>> devid
>> bG'$2
>> /[EMAIL PROTECTED],0/pci1179,[EMAIL PROTECTED],7/[EMAIL PROTECTED]/[EMAIL 
>> PROTECTED],0
>> devid
>> bG'$2
>>
>>
>> This looks very promising:
>> / $ man -k devid_cache
>> devid_cache    devices (4)    - device configuration information
>>
>> The man page however does not help me much, so I'm stumped.
>>
>> Experimenting with this is also tedious, made worse by my paranoia about
>> anti-static precautions.
>>
>>
> I decided that I had to give this one more try, particularly to try out
> boot arguments -a and safe mode.  The problem "went away" but I'm not sure
> why.  What I did was:
>
> 1. Shut down and moved disk to the USB enclosure.
> 2. Tried to boot with -v -m verbose -a
> I accepted the default for /etc/system, and /dev/null for the device file.
>
> This resulted in a reset.  I should have added -k.
>
> 3.  When the grub menu appeared, I decided to just give failsafe mode a
> try.  Here I added the options -v -a -k
> Again I accepted the default for /etc/system (The only thing in there is
> the nfs domain setting) and /dev/null for the device file.
>
> This gave me a prompt to mount rpool on /a - I said yes and it mounted
> without any issues.  I looked around it and noticed everything looks as
> it should.
>
> Ctrl-D resulted in a "No OS installed" message.  I wanted to still try the
> first option with -k, so I rebooted.
>
> *The system came up*, and I have:
>
> # *format*
> Searching for disks...done
>
>
> AVAILABLE DISK SELECTIONS:
>        0. c4t0d0 <DEFAULT cyl 2688 alt 2 hd 255 sec 63>
>           /[EMAIL PROTECTED],0/pci1179,[EMAIL PROTECTED],7/[EMAIL 
> PROTECTED]/[EMAIL PROTECTED],0
> Specify disk (enter its number): ^C
> # *zpool list rpool*
> NAME    SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
> rpool  20.5G  12.3G  8.20G    60%  ONLINE  -
> # *zpool status rpool*
>   pool: rpool
>  state: ONLINE
> status: The pool is formatted using an older on-disk format.  The pool can
>     still be used, but some features are unavailable.
> action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
>     pool will no longer be accessible on older software versions.
>  scrub: none requested
> config:
>
>     NAME        STATE     READ WRITE CKSUM
>     rpool       ONLINE       0     0     0
>       c4t0d0s0  ONLINE       0     0     0
>
> errors: No known data errors
>
>
> One problem that remains is that the second ZFS pool is not importing, e.g.:
>
> # zpool list
> NAME     SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
> SHARED      -      -      -      -  FAULTED  -
> rpool   20.5G  12.3G  8.20G    60%  ONLINE  -
> # zpool status SHARED
>   pool: SHARED
>  state: UNAVAIL
> status: One or more devices could not be opened.  There are insufficient
>     replicas for the pool to continue functioning.
> action: Attach the missing device and online it using 'zpool online'.
>    see: http://www.sun.com/msg/ZFS-8000-3C
>  scrub: none requested
> config:
>
>     NAME        STATE     READ WRITE CKSUM
>     SHARED      UNAVAIL      0     0     0  insufficient replicas
>       c0d0p3    UNAVAIL      0     0     0  cannot open
>
>
> Clearly because the pool was not exported, it expects the device to remain
> where it was last seen. I seem to recall that there is a zpool device cache
> somewhere....
>
> ls -l /etc/zfs ... Google for a solution... I will update.
>
>


-- 
Any sufficiently advanced technology is indistinguishable from magic.
Arthur C. Clarke