On Thu, Feb 18, 2010 at 12:42:58PM -0500, Ethan wrote:
> On Thu, Feb 18, 2010 at 04:14, Daniel Carosone <d...@geek.com.au> wrote:
> Although I do notice that right now, it imports just fine using the p0
> devices using just `zpool import q`, no longer having to use import -d
> with the directory of symlinks to p0 devices. I guess this has to do
> with having repaired the labels and such? Or whatever it's repaired
> having successfully imported and scrubbed.
It's the zpool.cache file at work, storing extra copies of labels with
corrected device paths.  For curiosity's sake, what happens when you
remove (rename) your dir with the symlinks?

> After the scrub finished, this is the state of my pool:
>     /export/home/ethan/qdsk/c9t1d0p0  DEGRADED     4     0    60
>     too many errors

Ick.  Note that there are device errors as well as content (checksum)
errors, which means it can't only be correctly-copied damage from your
original pool that was causing problems.

zpool clear and rescrub, for starters, and see if the errors continue
(command sketches for this and the later steps are in the P.S. below).
I suggest also:

 - carefully checking and reseating cables, etc
 - taking backups now of anything you really wanted out of the pool,
   while it's still available
 - choosing that disk as the first to replace, and scrubbing again
   after replacing onto it, perhaps twice
 - doing a dd to overwrite that entire disk with random data and let it
   remap bad sectors, before the replace (not just zeros, and not just
   the sectors a zfs resilver would hit; openssl enc of /dev/zero with a
   lightweight cipher and whatever key; for extra caution read back and
   compare with a second openssl stream using the same key)
 - being generally very watchful and suspicious of that disk in
   particular; look at error logs for clues, etc
 - being very happy that zfs deals so well with all this abuse, and
   that you know your data is ok

> I have no idea what happened to the one disk, but "No known data
> errors" is what makes me happy. I'm not sure if I should be concerned
> about the physical disk itself

Given that it has reported disk errors as well as damaged content, yes.

> or just assume that some data got screwed up with all this mess. I
> guess maybe I'll see how the disk behaves during the replace
> operations (restoring to it and then restoring from it four times
> seems like a pretty good test of it), and if it continues to error,
> replace the physical drive and if necessary restore from the original
> truecrypt volumes.

Good plan; note the extra scrubs at key points in the process above.

> So, current plan:
> - export the pool.

Shouldn't be needed; zpool offline <dev> would be enough.

> - format c9t1d0 to have one slice being the entire disk.

Might not have been needed, but given Victor's comments about reserved
space, you may need to do this manually, yes.  Be sure to use EFI
labels.  Pick the suspect disk first.

> - import. should be degraded, missing c9t1d0p0.

No need if you didn't export.

> - replace missing c9t1d0p0 with c9t1d0

Yup, or if you've manually partitioned you may need to mention the
slice number to prevent it repartitioning with the default reserved
space again.  You may even need to use some other slice (s5 or
whatever), but I don't think so.

> - wait for resilver.
> - repeat with the other four disks.

 - tell us how it went
 - drink beer

--
Dan.
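P.S. A few command sketches for the steps above.  Treat these as
illustrative rather than copy-paste: the pool name q and the device
names come from your earlier mails, the rest is from memory, so check
the man pages before running anything.  First, clearing the counters
and rescrubbing:

    # reset the READ/WRITE/CKSUM counters, scrub again, and see whether
    # c9t1d0p0 starts accumulating errors again
    zpool clear q
    zpool scrub q
    zpool status -v q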
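For the take-backups-now point, a recursive snapshot plus zfs send to
somewhere off the pool is the simplest thing; the destination path here
is only a placeholder for wherever you have spare space:

    # snapshot everything, then stream it to a file on some other disk
    zfs snapshot -r q@before-replace
    zfs send -R q@before-replace > /path/on/another/disk/q-before-replace.zfs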
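For the overwrite-and-verify pass on the suspect disk, something along
these lines; the cipher and passphrase are arbitrary (anything cheap
and deterministic will do), and this writes over the whole of c9t1d0,
so only run it once the disk is out of service in the pool:

    # deterministic pseudo-random stream, written over the entire disk
    openssl enc -rc4 -nosalt -pass pass:scrubkey < /dev/zero \
        | dd of=/dev/rdsk/c9t1d0p0 bs=1024k
    # regenerate the same stream and compare it with what the disk
    # reads back; cmp stops with an EOF message at the end of the
    # device, and any difference before that means the disk is
    # silently corrupting data
    openssl enc -rc4 -nosalt -pass pass:scrubkey < /dev/zero \
        | cmp - /dev/rdsk/c9t1d0p0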
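And the replace itself for the first (suspect) disk; s0 here is just my
guess at the slice you'll create in format, use whichever one you
actually make:

    # take the suspect disk out of service; use the device name exactly
    # as zpool status prints it
    zpool offline q c9t1d0p0
    # relabel the disk with format -e (EFI label, one slice covering
    # the whole disk), then put the new slice in its place
    zpool replace q c9t1d0p0 c9t1d0s0
    # watch the resilver
    zpool status -v q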