On Mon, Apr 23, 2012 at 05:48:16AM +0200, Manuel Ryan wrote:
> After a reboot of the machine, I have no more write errors on disk 2 (only
> 4 checksum, not growing), I was able to access data which I previously
> couldn't and now only the checksum errors on disk 5 are growing.

Well, that's good, but what changed?  If it was just a reboot and
perhaps a power-cycle of the disks, I don't think you've solved much
in the long term...

> Fortunately, I was able to recover all important data in those conditions
> (yeah !),

... though that's clearly the most important thing!

If you're down to just checksum errors now, then run a scrub and see
whether they can all be repaired before replacing the disk (a sketch
follows below).  If you haven't been able to get a scrub to complete,
then either:
 * delete unimportant / rescued data until none of the problem
   sectors are referenced any longer, or
 * "replace" the disk like I suggested last time, with a copy made
   under ZFS's nose and then a switch.
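
For the scrub route, a minimal sketch, assuming your pool is named
"tank" (substitute your real pool name):

  zpool scrub tank
  zpool status -v tank

zpool status -v lists any files affected by permanent errors, which
tells you exactly what to delete or restore if the scrub can't
repair everything.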

> And since I can live with losing the pool now, I'll gamble away and
> replace drive 5 tomorrow, and if that fails I'll just destroy the pool,
> replace the 2 physical disks and build a new one (maybe raidz2 this time :))

You know what?  If you're prepared to do that in the worst of
circumstances, it would be a very good idea to do that under the best
of circumstances.  If you can, just rebuild it raidz2 and be happier
next time something flaky happens with this hardware.
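
Once the data is safely copied off, the rebuild itself is quick (a
sketch; "tank" and the c0tXd0 device names are placeholders for your
pool and disks):

  zpool destroy tank
  zpool create tank raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0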
 
> I'll try to leave all 6 original disks in the machine while replacing,
> maybe zfs will be smart enough to use the 6 drives to build the replacement
> disk?

I don't think it will... Others who know the code, feel free to
comment otherwise.

If you've got the physical space for the extra disk, why not keep it
there and build the pool raidz2 with the same capacity? 
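
The capacity math works out, too: raidz1 over 6 disks gives 6 - 1 = 5
disks of usable space, and raidz2 over 7 disks gives 7 - 2 = 5, so the
seventh disk buys you the second parity at no cost in usable space.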

> It's a miracle that zpool still shows disk 5 as "ONLINE", here's a SMART
> dump of disk 5 (1265 Current_Pending_Sector, ouch) 

That's all indicative of read errors. Note that your reallocated
sector count on that disk is still low, so most of those will probably
clear when overwritten and given a chance to re-map.
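
If you want to watch that happen, keep an eye on those two attributes
(smartmontools syntax; the device path is a placeholder, and on
Solaris smartctl may need a -d option for your controller):

  smartctl -A /dev/rdsk/c0t5d0 | egrep 'Reallocated_Sector|Current_Pending'

Reallocated_Sector_Ct should rise a little and Current_Pending_Sector
should fall toward zero as the bad spots are rewritten.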

If these all appeared suddenly, clearly the disk has developed a
problem. Normally, they appear gradually as head sensitivity
diminishes. 

How often did you normally run a scrub before this happened?  It's
possible they were accumulating for a while but went undetected for
lack of read attempts to the disk.  Scrub more often!
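
Something like a monthly scrub from root's crontab is cheap insurance
(pool name assumed; pick a quiet hour):

  0 3 1 * * /usr/sbin/zpool scrub tank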

--
Dan.
