On Sat, Nov 5, 2011 at 2:35 PM, Myers Carpenter <[email protected]> wrote:
> I would like to pick the brains of the ZFS experts on this list: What
> would you do next to try and recover this zfs pool?
>
I hate running across threads where someone asks a question and then never
comes back to say what they eventually did, so...
To summarize: In late October I had two drives fail in a raidz1 pool. I
was able to recover all the data from one drive, but the other could not be
seen by the controller. zpool import was not working, and I could not see
why: I had 3 of the 4 drives, so why couldn't I mount this?
I read about every option in zdb and tried ones that might tell me
something more about what was on this recovered drive. I eventually hit on
zdb -p devs -vvvve -lu /bank4/hd/devs/loop0
where /bank4/hd/devs/loop0 was a symlink back to /dev/loop0, where I had
set up the disk image of the recovered drive.
This showed the uberblocks which looked like this:
    Uberblock[1]
        magic = 0000000000bab10c
        version = 26
        txg = 23128193
        guid_sum = 13396147021153418877
        timestamp = 1316987376 UTC = Sun Sep 25 17:49:36 2011
        rootbp = DVA[0]=<0:2981f336c00:400> DVA[1]=<0:1e8dcc01400:400>
            DVA[2]=<0:3b16a3dd400:400> [L0 DMU objset] fletcher4 lzjb LE
            contiguous unique triple size=800L/200P
            birth=23128193L/23128193P fill=255
            cksum=136175e0a4:79b27ae49c7:1857d594ca833:34ec76b965ae40
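Then it all became clear: the uberblock timestamp was from late September.
This drive had encountered errors one month before the other drive failed,
and zfs had stopped writing to it. So I did not really have 3 of 4 drives:
I had 2 current drives plus one that was a month stale, and raidz1 can only
survive the loss of one.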
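
For anyone in the same spot, here is a rough sketch of how the comparison
could be automated. It only parses text shaped like the zdb output above and
reports the newest txg and its epoch timestamp. The function names, and the
idea of feeding it one saved dump per device, are my own illustration and
not anything zdb provides. Only the integer after "timestamp =" is parsed;
the human-readable date that follows it is ignored.

```python
import re
from datetime import datetime, timezone

# Uberblock fields copied from the zdb output quoted in this message.
SAMPLE = """\
Uberblock[1]
magic = 0000000000bab10c
version = 26
txg = 23128193
guid_sum = 13396147021153418877
timestamp = 1316987376 UTC = Sun Sep 25 17:49:36 2011
"""

UBERBLOCK_RE = re.compile(r"^Uberblock\[(\d+)\]")
FIELD_RE = re.compile(r"^(txg|timestamp)\s*=\s*(\d+)")


def parse_uberblocks(zdb_text):
    """Return one dict per uberblock with its index, txg and timestamp."""
    blocks = []
    current = None
    for raw in zdb_text.splitlines():
        line = raw.strip()  # real zdb output is indented
        header = UBERBLOCK_RE.match(line)
        if header:
            current = {"index": int(header.group(1))}
            blocks.append(current)
            continue
        field = FIELD_RE.match(line)
        if field and current is not None:
            current[field.group(1)] = int(field.group(2))
    return blocks


def newest_uberblock(zdb_text):
    """Return the uberblock with the highest txg, or None if none parsed."""
    blocks = [b for b in parse_uberblocks(zdb_text) if "txg" in b]
    return max(blocks, key=lambda b: b["txg"]) if blocks else None


def last_write_time(zdb_text):
    """Return the newest uberblock's timestamp as an aware UTC datetime."""
    best = newest_uberblock(zdb_text)
    if best is None or "timestamp" not in best:
        return None
    return datetime.fromtimestamp(best["timestamp"], tz=timezone.utc)


def stale_devices(newest_txg_by_device):
    """Given {device: newest txg}, list devices behind the highest txg."""
    if not newest_txg_by_device:
        return []
    top = max(newest_txg_by_device.values())
    return sorted(d for d, txg in newest_txg_by_device.items() if txg < top)
```

Running newest_uberblock() over a dump from each drive and passing the
results to stale_devices() would have flagged the recovered drive right
away, since its newest txg lagged the rest of the pool.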
So the lesson here: Don't be a dumbass like me. Set up Nagios or some
other system to alert you when a pool has become degraded. ZFS works very
well with one drive out of the array, so you probably won't notice
problems unless you are proactively looking for them.
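
As a starting point, a check along these lines could be wired into Nagios.
This is a minimal sketch, not a tested plugin: it assumes zpool is on the
PATH and that "zpool list -H -o name,health" prints one pool per line with
the name and health separated by whitespace. The function names are
hypothetical. It treats anything other than ONLINE as critical and uses the
usual Nagios exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN).

```python
import subprocess

OK, CRITICAL, UNKNOWN = 0, 2, 3  # standard Nagios plugin exit codes


def parse_health(listing):
    """Turn `zpool list -H -o name,health` output into {pool: health}."""
    pools = {}
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            pools[parts[0]] = parts[1]
    return pools


def evaluate(pools):
    """Map pool health to a (status code, message) pair."""
    if not pools:
        return UNKNOWN, "UNKNOWN: no pools found"
    bad = {name: h for name, h in pools.items() if h != "ONLINE"}
    if bad:
        detail = ", ".join(f"{n}={h}" for n, h in sorted(bad.items()))
        return CRITICAL, f"CRITICAL: {detail}"
    return OK, f"OK: {len(pools)} pool(s) ONLINE"


def main():
    """Run zpool, print a one-line status, and return the exit code."""
    try:
        result = subprocess.run(
            ["zpool", "list", "-H", "-o", "name,health"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"UNKNOWN: could not run zpool: {exc}")
        return UNKNOWN
    status, message = evaluate(parse_health(result.stdout))
    print(message)
    return status

# To use this as a plugin script, finish with:
#     import sys; sys.exit(main())
```

Had something like this been running, the DEGRADED state would have paged
me in September instead of going unnoticed until the second drive died.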
myers
_______________________________________________
zfs-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss