On Thu, Dec 22, 2011 at 10:00 AM, Myers Carpenter <my...@maski.org> wrote:

> On Sat, Nov 5, 2011 at 2:35 PM, Myers Carpenter <my...@maski.org> wrote:
>> I would like to pick the brains of the ZFS experts on this list: What
>> would you do next to try and recover this zfs pool?
> I hate running across threads where someone asks a question and never
> comes back to say what they eventually did, so...
> To summarize: In late October I had two drives fail in a raidz1 pool.  I
> was able to recover all the data from one drive, but the other could not be
> seen by the controller.  zpool import was not working, and I could not see
> why: I had 3 of the 4 drives, so why couldn't I import the pool?
> I read about every option in zdb and tried ones that might tell me
> something more about what was on this recovered drive.  I eventually hit on
> zdb -p devs -vvvve -lu /bank4/hd/devs/loop0
> where /bank4/hd/devs/loop0 was a symlink back to /dev/loop0, where I had
> set up the disk image of the recovered drive.
> This showed the uberblocks which looked like this:
> Uberblock[1]
>         magic = 0000000000bab10c
>         version = 26
>         txg = 23128193
>         guid_sum = 13396147021153418877
>         timestamp = 1316987376 UTC = Sun Sep 25 17:49:36 2011
>         rootbp = DVA[0]=<0:2981f336c00:400> DVA[1]=<0:1e8dcc01400:400>
> DVA[2]=<0:3b16a3dd400:400> [L0 DMU objset] fletcher4 lzjb LE contiguous
> unique triple size=800L/200P birth=23128193L/23128193P fill=255
> cksum=136175e0a4:79b27ae49c7:1857d594ca833:34ec76b965ae40
> Then it all became clear: This drive had encountered errors one month
> before the other drive failed, and zfs had stopped writing to it.
> So the lesson here: Don't be a dumbass like me.  Set up nagios or some
> other system to alert you when a pool has become degraded.  ZFS works very
> well with one drive out of the array, so you probably aren't going to
> notice problems unless you are proactively looking for them.
> myers

Or, if you aren't scrubbing on a regular basis, just change your zpool
failmode property.  Had you set it to wait or panic, it would've been very
clear, very quickly that something was wrong.
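
For anyone who wants the alerting route, here is a minimal sketch of a Nagios-style check, not a tested plugin. It assumes the zpool binary is on the PATH and relies on `zpool list -H -o name,health`, which prints one tab-separated "name health" line per pool. The function names and the DEGRADED-is-WARNING / anything-else-is-CRITICAL mapping are my own assumptions, so adjust to taste.

```python
import subprocess

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_health(listing):
    """Parse `zpool list -H -o name,health` output into {pool: health}."""
    pools = {}
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) == 2:
            pools[fields[0]] = fields[1]
    return pools


def evaluate(pools):
    """Map pool health to an (exit_code, message) pair.

    Assumption: DEGRADED alone is a WARNING; any other non-ONLINE
    state (FAULTED, UNAVAIL, ...) is CRITICAL.
    """
    if not pools:
        return UNKNOWN, "UNKNOWN: no pools found"
    bad = {name: health for name, health in pools.items() if health != "ONLINE"}
    if not bad:
        return OK, "OK: all pools ONLINE"
    detail = ", ".join(f"{name}={health}" for name, health in sorted(bad.items()))
    if all(health == "DEGRADED" for health in bad.values()):
        return WARNING, "WARNING: " + detail
    return CRITICAL, "CRITICAL: " + detail


def main():
    """Run zpool, print a one-line status, and return the exit code."""
    try:
        result = subprocess.run(
            ["zpool", "list", "-H", "-o", "name,health"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"UNKNOWN: could not run zpool: {exc}")
        return UNKNOWN
    code, message = evaluate(parse_health(result.stdout))
    print(message)
    return code
```

To use it as a check command, a wrapper would end with `sys.exit(main())` so that Nagios (or cron plus mail) sees the exit code. Keeping the parsing separate from the subprocess call means the logic can be exercised on a machine without ZFS.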

zfs-discuss mailing list
