On Sat, Nov 5, 2011 at 2:35 PM, Myers Carpenter <my...@maski.org> wrote:

> I would like to pick the brains of the ZFS experts on this list: What
> would you do next to try and recover this zfs pool?
>

I hate running across threads where someone asks a question and then never
comes back to say what they eventually did, so...

To summarize: In late October I had two drives fail in a raidz1 pool.  I
was able to recover all the data from one drive, but the other could not be
seen by the controller.  Trying to zpool import was not working.  I had 3
of the 4 drives, so why couldn't I import the pool?
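(For anyone trying the same thing, the kind of import attempt I mean looks
roughly like this; -d just points zpool at a directory of devices or device
links, and the pool name here is only a placeholder:)

# with no arguments, zpool import scans /dev and lists what it can find
zpool import

# point the scan at a specific directory of devices/images instead
# ("tank" is a placeholder pool name)
zpool import -d /bank4/hd/devs tank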

I read up on every option in zdb and tried the ones that might tell me
something more about what was on this recovered drive.  I eventually hit on:

zdb -p devs -vvvve -lu /bank4/hd/devs/loop0

where /bank4/hd/devs/loop0 was a symlink back to /dev/loop0, where I had
set up the disk image of the recovered drive.
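
(For anyone wanting to reproduce that setup, it was along these lines; the
image path below is only an example:)

# attach the recovered disk image to a loop device (image path is an example)
losetup /dev/loop0 /bank4/hd/recovered-drive.img

# zdb's -p option wants a directory of device nodes/links, so give it one
mkdir -p /bank4/hd/devs
ln -s /dev/loop0 /bank4/hd/devs/loop0

# then, from /bank4/hd, run the zdb command above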

This showed the uberblocks which looked like this:

Uberblock[1]
        magic = 0000000000bab10c
        version = 26
        txg = 23128193
        guid_sum = 13396147021153418877
        timestamp = 1316987376 UTC = Sun Sep 25 17:49:36 2011
        rootbp = DVA[0]=<0:2981f336c00:400> DVA[1]=<0:1e8dcc01400:400>
DVA[2]=<0:3b16a3dd400:400> [L0 DMU objset] fletcher4 lzjb LE contiguous
unique triple size=800L/200P birth=23128193L/23128193P fill=255
cksum=136175e0a4:79b27ae49c7:1857d594ca833:34ec76b965ae40

Then it all became clear: this drive had encountered errors a month before
the other drive failed, and ZFS had stopped writing to it.
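
(The giveaway is the txg and timestamp: a device that dropped out of the
pool is stuck at an older transaction group than its healthy siblings.  A
rough way to compare them, with example device names:)

# device names are examples; substitute your own disks/images
for d in /dev/loop0 /dev/sdb /dev/sdc; do
    echo "== $d =="
    # highest transaction group recorded in the labels/uberblocks on this device
    zdb -lu $d | awk '/txg =/ { if ($3 + 0 > max) max = $3 + 0 } END { print "newest txg:", max }'
done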

So the lesson here: Don't be a dumbass like me.  Set up Nagios or some
other monitoring system to alert you when a pool becomes degraded.  ZFS
works so well with one drive out of the array that you probably aren't
going to notice problems unless you are proactively looking for them.
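
(Even a dumb cron job around zpool status -x would have saved me here.  A
rough sketch, with a placeholder mail address; adjust for whatever alerting
you actually use:)

#!/bin/sh
# alert if any pool is not healthy; the address is a placeholder
status=`zpool status -x`
if [ "$status" != "all pools are healthy" ]; then
    echo "$status" | mailx -s "ZFS pool problem on `hostname`" admin@example.com
fi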

myers