[zfs-discuss] Recovering corrupted root pool

2008-07-11 Thread Rainer Orth
Yesterday evening, I used Live Upgrade on a Sun Fire V60x with ZFS root
(a mirrored root pool called root) to go from SX:CE 90 to SX:CE 93.  The
LU itself ran without problems, but before rebooting the machine, I
wanted to add to the root pool some space that had previously been in
use by a UFS BE.

Both disks (c0t0d0 and c0t1d0) were partitioned as follows:

Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm       1 - 18810       25.91GB    (18810/0/0) 54342090
  1 unassigned    wm   18811 - 24618        8.00GB    (5808/0/0)  16779312
  2     backup    wm       0 - 24618       33.91GB    (24619/0/0) 71124291
  3 unassigned    wu       0                0         (0/0/0)            0
  4 unassigned    wu       0                0         (0/0/0)            0
  5 unassigned    wu       0                0         (0/0/0)            0
  6 unassigned    wu       0                0         (0/0/0)            0
  7 unassigned    wu       0                0         (0/0/0)            0
  8       boot    wu       0 -     0        1.41MB    (1/0/0)         2889
  9 unassigned    wu       0                0         (0/0/0)            0
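
(The plan, then: grow slice 0 from cylinders 1 - 18810 to 1 - 24618,
absorbing the 5808 cylinders of slice 1 (18810 + 5808 = 24618), i.e.
25.91GB + 8.00GB = 33.91GB in total.)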

Slice 0 is used by the root pool, slice 1 was used by the UFS BE.  To
reclaim that space, I ludeleted the now unused UFS BE and used

# NOINUSE_CHECK=1 format

to extend slice 0 by the size of slice 1, deleting the latter afterwards
(roughly as sketched below).
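
From memory, the session went something like this on each disk; the
prompts and defaults are approximate:

# NOINUSE_CHECK=1 format
format> partition                        (after selecting the disk)
partition> 0
Enter partition id tag[root]:
Enter partition permission flags[wm]:
Enter new starting cyl[1]:
Enter partition size[...]: 24618e        (grow slice 0 to end at cyl 24618)
partition> 1
Enter partition id tag[unassigned]:
Enter partition permission flags[wm]: wu
Enter new starting cyl[18811]: 0
Enter partition size[...]: 0             (delete slice 1)
partition> label
partition> quit
format> quit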
I'm pretty sure that I've done this successfully before, even on a live
system, but this time something went wrong: I remember an FMA message
about one side of the root pool mirror being broken (something about an
inconsistent label; unfortunately, I didn't write down the exact
message).  Nonetheless, I rebooted the machine after running luactivate
sol_nv_93 (the new ZFS BE), but it didn't come up:

SunOS Release 5.11 Version snv_93 32-bit
Copyright 1983-2008 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
NOTICE:
spa_import_rootpool: error 22


panic[cpu0]/thread=fec1cfe0: cannot mount root path /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0:a /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0:a

fec351ac genunix:rootconf+10b (c0f040, 1, fec1c750)
fec351d0 genunix:vfs_mountroot+54 (fe800010, fec30fd8,)
fec351e4 genunix:main+b4 ()

panic: entering debugger (no dump device, continue to reboot)
skipping system dump - no dump device configured
rebooting...
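
Error 22 is EINVAL, i.e. the kernel apparently rejected the pool
configuration it found as invalid; easy to double-check:

# grep -w EINVAL /usr/include/sys/errno.h
#define EINVAL  22      /* invalid argument */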

I've managed a failsafe boot (from the same pool), and zpool import reveals

  pool: root
    id: 14475053522795106129
 state: UNAVAIL
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        root          UNAVAIL  insufficient replicas
          mirror      UNAVAIL  corrupted data
            c0t1d0s0  ONLINE
            c0t0d0s0  ONLINE
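
The "last accessed by another system" status is presumably just the
failsafe miniroot having its own hostid (the labels below record
hostid=336880771 for 'erebus'); the usual next step for ZFS-8000-EY is a
forced import, though given the UNAVAIL mirror I wouldn't expect much
from it:

# zpool import -f root
# zpool import -f 14475053522795106129    (by id, in case the name clashes)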

Even restoring slice 1 on both disks to its old size and shrinking slice 0
accordingly doesn't help.  I'm sure I've done this correctly since I could
boot from the old sol_nv_b90_ufs BE, which was still on c0t0d0s1.
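
In hindsight, saving the VTOC before editing would have made restoring
it less error-prone; the usual idiom is something like:

# prtvtoc /dev/rdsk/c0t0d0s2 > /var/tmp/vtoc.c0t0d0     (save)
# fmthard -s /var/tmp/vtoc.c0t0d0 /dev/rdsk/c0t0d0s2    (restore)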

I didn't have much success finding out what's going on here: I tried
removing each of the disks in turn, in case the two sides of the mirror
were inconsistent, but to no avail.  I didn't have much luck with zdb
either.  Here's the output of zdb -l /dev/rdsk/c0t0d0s0 and
/dev/rdsk/c0t1d0s0:

c0t0d0s0:

--------------------------------------------
LABEL 0
--------------------------------------------
    version=10
    name='root'
    state=0
    txg=14643945
    pool_guid=14475053522795106129
    hostid=336880771
    hostname='erebus'
    top_guid=17627503873514720747
    guid=6121143629633742955
    vdev_tree
        type='mirror'
        id=0
        guid=17627503873514720747
        whole_disk=0
        metaslab_array=13
        metaslab_shift=28
        ashift=9
        asize=36409180160
        is_log=0
        children[0]
                type='disk'
                id=0
                guid=1526746004928780410
                path='/dev/dsk/c0t1d0s0'
                devid='id1,[EMAIL PROTECTED]/a'
                phys_path='/[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0:a'
                whole_disk=0
                DTL=160
        children[1]
                type='disk'
                id=1
                guid=6121143629633742955
                path='/dev/dsk/c0t0d0s0'
                devid='id1,[EMAIL PROTECTED]/a'
                phys_path='/[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0:a'
                whole_disk=0
                DTL=272

--------------------------------------------
LABEL 1
--------------------------------------------
    version=10
    name='root'
    state=0