Hi,

I've been running a ZFS pool inside a NexentaOS guest under VMware, on a
single partition of a real disk, for a few months in order to store some
backups.

Today I noticed that some directories were missing from two separate
filesystems, which I found strange. Going by the backup logs (also stored
inside the pool), at least one of the directories (/pool/backup/var) went
missing yesterday *while* the backup was still writing into it (it was
working inside /pool/backup/var/log at the time).

The backup process is a few simple rsyncs from the VMware host (running
Linux) to the VMware guest (running Nexenta), followed by snapshot creation.
The filesystems were *not* NFS mounted - I had an rsync server process
running on the ZFS box.
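
In essence each run looks something like the following (the rsync module
name, hostname and snapshot name here are only illustrative, not the exact
ones I use):

    # on the Linux host, pushing into the rsync daemon on the Nexenta guest
    rsync -a --delete /var/ nexenta::backup/var/

    # on the Nexenta guest, once all the rsyncs have finished
    zfs snapshot pool/backup@2006-12-14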

I tried to look into the snapshots, but doing a 'zfs set snapdir=visible 
pool/backup' didn't make the .zfs dir appear. I did a 'zpool status' and it 
didn't report any errors or checksum failures whatsoever.
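
In hindsight I should also have checked whether the snapshots still existed
at all before doing anything else, e.g. with something as simple as this
(read-only):

    zfs list -t snapshot -r pool/backup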

I assumed there was probably corrupted memory in the running kernel instance, 
so I rebooted.

Now I can't even mount the pool!

[EMAIL PROTECTED]:~# zpool status -x
  pool: pool
 state: FAULTED
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool        UNAVAIL      0     0     0  insufficient replicas
          c2t0d0s1  UNAVAIL      0     0     0  cannot open
[EMAIL PROTECTED]:~# zdb -l /dev/dsk/c2t0d0s1
--------------------------------------------
LABEL 0
--------------------------------------------
    version=3
    name='pool'
    state=0
    txg=143567
    pool_guid=3667491715056107646
    top_guid=8396736522625936678
    guid=8396736522625936678
    vdev_tree
        type='disk'
        id=0
        guid=8396736522625936678
        path='/dev/dsk/c2t0d0s1'
        devid='id1,[EMAIL PROTECTED]/b'
        whole_disk=0
        metaslab_array=13
        metaslab_shift=30
        ashift=9
        asize=117896380416
        DTL=22
--------------------------------------------
LABEL 1
--------------------------------------------
    version=3
    name='pool'
    state=0
    txg=143567
    pool_guid=3667491715056107646
    top_guid=8396736522625936678
    guid=8396736522625936678
    vdev_tree
        type='disk'
        id=0
        guid=8396736522625936678
        path='/dev/dsk/c2t0d0s1'
        devid='id1,[EMAIL PROTECTED]/b'
        whole_disk=0
        metaslab_array=13
        metaslab_shift=30
        ashift=9
        asize=117896380416
        DTL=22
--------------------------------------------
LABEL 2
--------------------------------------------
    version=3
    name='pool'
    state=0
    txg=143567
    pool_guid=3667491715056107646
    top_guid=8396736522625936678
    guid=8396736522625936678
    vdev_tree
        type='disk'
        id=0
        guid=8396736522625936678
        path='/dev/dsk/c2t0d0s1'
        devid='id1,[EMAIL PROTECTED]/b'
        whole_disk=0
        metaslab_array=13
        metaslab_shift=30
        ashift=9
        asize=117896380416
        DTL=22
--------------------------------------------
LABEL 3
--------------------------------------------
    version=3
    name='pool'
    state=0
    txg=143567
    pool_guid=3667491715056107646
    top_guid=8396736522625936678
    guid=8396736522625936678
    vdev_tree
        type='disk'
        id=0
        guid=8396736522625936678
        path='/dev/dsk/c2t0d0s1'
        devid='id1,[EMAIL PROTECTED]/b'
        whole_disk=0
        metaslab_array=13
        metaslab_shift=30
        ashift=9
        asize=117896380416
        DTL=22

I don't know much about Solaris partitions, but here's how I did it (I needed 
to store swap on this disk):

partition> print
Current partition table (original):
Total disk cylinders available: 14466 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       swap    wm       1 -   131        1.00GB    (131/0/0)     2104515
  1   reserved    wm     132 - 14465      109.80GB    (14334/0/0) 230275710
  2     backup    wu       0 - 14465      110.82GB    (14466/0/0) 232396290
  3 unassigned    wm       0                0         (0/0/0)             0
  4 unassigned    wm       0                0         (0/0/0)             0
  5 unassigned    wm       0                0         (0/0/0)             0
  6 unassigned    wm       0                0         (0/0/0)             0
  7 unassigned    wm       0                0         (0/0/0)             0
  8       boot    wu       0 -     0        7.84MB    (1/0/0)         16065
  9 unassigned    wm       0                0         (0/0/0)             0
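
One thing that does check out (my own back-of-the-envelope arithmetic, not
tool output): slice 1's size in the table above is consistent with the asize
that zdb shows in the labels:

    230275710 blocks * 512 bytes = 117901163520 bytes   (slice 1)
    asize from the labels        = 117896380416 bytes

The ~4.5 MB difference is presumably just what ZFS reserves for its own
labels, plus rounding, so the labels at least agree with the partition
layout.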

However, I found this rather strange:

[EMAIL PROTECTED]:~# stat -L /dev/dsk/c2t0d0s1
  File: `/dev/dsk/c2t0d0s1'
  Size: 9223372036854775807     Blocks: 0          IO Block: 8192   block special file
Device: 4380000h/70778880d      Inode: 26214405    Links: 1     Device type: 32,1
Access: (0640/brw-r-----)  Uid: (    0/    root)   Gid: (    3/     sys)
Access: 2006-12-15 05:11:56.000000000 +0000
Modify: 2006-12-15 05:11:56.000000000 +0000
Change: 2006-12-15 05:11:56.000000000 +0000

(The -L option to GNU stat tells it to follow the symlink.)

Notice the size!

stat -L /dev/dsk/c2t0d0s0 reports a size of 1077511680.
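
The s0 number makes sense: slice 0 is 2104515 blocks, and 2104515 * 512 =
1077511680 bytes. The s1 number, on the other hand, is 2^63 - 1, which looks
more like an "unknown size" sentinel than a real size. Two checks I'm
thinking of running next (both read-only, on the raw devices; the dd bs/count
values are arbitrary):

    # does the kernel still see a sane VTOC for this disk?
    prtvtoc /dev/rdsk/c2t0d0s2

    # can the slice actually be read?
    dd if=/dev/rdsk/c2t0d0s1 of=/dev/null bs=1048576 count=16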

Interestingly, I never had any problems until now; I even ran weekly scrubs,
and they *never* reported any errors or checksum failures.

I've tried shutting down Nexenta, deleting the disk in VMware, re-adding it,
booting, and running 'devfsadm -C'. None of that helped.
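
The only other idea I have - and I'm not at all sure it is safe on a faulted
pool, so I haven't tried it yet - is to export the pool and let 'zpool
import' rediscover the device by scanning the labels, roughly:

    zpool export pool
    zpool import -d /dev/dsk pool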

Is there any way to recover the data? Beyond the above, I don't really know
how to go about diagnosing or fixing this, and I don't understand why stat
reports that strange size.

Thanks.
