On Thu, Feb 18, 2010 at 12:42:58PM -0500, Ethan wrote:
> On Thu, Feb 18, 2010 at 04:14, Daniel Carosone <d...@geek.com.au> wrote:
> Although I do notice that right now, it imports just fine using the p0
> devices using just `zpool import q`, no longer having to use import -d
> with the directory of symlinks to p0 devices. I guess this has to do
> with having repaired the labels and such? Or whatever it's repaired
> having successfully imported and scrubbed.
It's the zpool.cache file at work, storing extra copies of labels with
corrected device paths.  For curiosity's sake, what happens when you
remove (rename) your dir with the symlinks?

> After the scrub finished, this is the state of my pool:
>     /export/home/ethan/qdsk/c9t1d0p0  DEGRADED     4     0    60
>     too many errors

Ick.  Note that there are device errors as well as content (checksum)
errors, which means it can't only be correctly-copied damage from your
original pool that was causing problems.

zpool clear and rescrub, for starters, and see if the errors continue
(command sketches for this and the later steps are in the P.S. below).
I suggest also:

 - carefully checking and reseating cables, etc
 - taking backups now of anything you really wanted out of the pool,
   while it's still available
 - choosing that disk as the first to replace, and scrubbing again
   after replacing onto it, perhaps twice
 - doing a dd to overwrite that entire disk with random data and let it
   remap bad sectors, before the replace (not just zeros, and not just
   the sectors a zfs resilver would hit; openssl enc of /dev/zero with a
   lightweight cipher and whatever key; for extra caution read back and
   compare with a second openssl stream using the same key)
 - being generally very watchful and suspicious of that disk in
   particular; look at error logs for clues, etc
 - being very happy that zfs deals so well with all this abuse, and
   that you know your data is ok

> I have no idea what happened to the one disk, but "No known data
> errors" is what makes me happy. I'm not sure if I should be concerned
> about the physical disk itself

Given that it has reported disk errors as well as damaged content, yes.

> or just assume that some data got screwed up with all this mess. I
> guess maybe I'll see how the disk behaves during the replace
> operations (restoring to it and then restoring from it four times
> seems like a pretty good test of it), and if it continues to error,
> replace the physical drive and if necessary restore from the original
> truecrypt volumes.

Good plan; note the extra scrubs at key points in the process above.

> So, current plan:
> - export the pool.

Shouldn't be needed; zpool offline <dev> would be enough.

> - format c9t1d0 to have one slice being the entire disk.

Might not have been needed, but given Victor's comments about reserved
space, you may need to do this manually, yes.  Be sure to use EFI
labels.  Pick the suspect disk first.

> - import. should be degraded, missing c9t1d0p0.

No need if you didn't export.

> - replace missing c9t1d0p0 with c9t1d0

Yup, or if you've manually partitioned you may need to mention the
slice number to prevent it repartitioning with the default reserved
space again.  You may even need to use some other slice (s5 or
whatever), but I don't think so.

> - wait for resilver.
> - repeat with the other four disks.

 - tell us how it went
 - drink beer

--
Dan.
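P.S. A few command sketches for the steps above.  Treat these as
illustrative rather than copy-paste: the pool name q and the device
names come from your earlier mails, the rest is from memory, so check
the man pages before running anything.  First, clearing the counters
and rescrubbing:

    # reset the READ/WRITE/CKSUM counters, scrub again, and see whether
    # c9t1d0p0 starts accumulating errors again
    zpool clear q
    zpool scrub q
    zpool status -v q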
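For the take-backups-now point, a recursive snapshot plus zfs send to
somewhere off the pool is the simplest thing; the destination path here
is only a placeholder for wherever you have spare space:

    # snapshot everything, then stream it to a file on some other disk
    zfs snapshot -r q@before-replace
    zfs send -R q@before-replace > /path/on/another/disk/q-before-replace.zfs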
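For the overwrite-and-verify pass on the suspect disk, something along
these lines; the cipher and passphrase are arbitrary (anything cheap
and deterministic will do), and this writes over the whole of c9t1d0,
so only run it once the disk is out of service in the pool:

    # deterministic pseudo-random stream, written over the entire disk
    openssl enc -rc4 -nosalt -pass pass:scrubkey < /dev/zero \
        | dd of=/dev/rdsk/c9t1d0p0 bs=1024k
    # regenerate the same stream and compare it with what the disk
    # reads back; cmp stops with an EOF message at the end of the
    # device, and any difference before that means the disk is
    # silently corrupting data
    openssl enc -rc4 -nosalt -pass pass:scrubkey < /dev/zero \
        | cmp - /dev/rdsk/c9t1d0p0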
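And the replace itself for the first (suspect) disk; s0 here is just my
guess at the slice you'll create in format, use whichever one you
actually make:

    # take the suspect disk out of service; use the device name exactly
    # as zpool status prints it
    zpool offline q c9t1d0p0
    # relabel the disk with format -e (EFI label, one slice covering
    # the whole disk), then put the new slice in its place
    zpool replace q c9t1d0p0 c9t1d0s0
    # watch the resilver
    zpool status -v q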