Torrey McMahon wrote:
Richard Elling wrote:
Good question. If you consider that mechanical wear-out is what ultimately
causes many failure modes, then the argument can be made that a spun down
disk should last longer. The problem is that there are failure modes which
are triggered by a spin up.  I've never seen field data showing the difference
between the two.

Often, the spare is up and running, but for whatever reason you'll have a bad block on it and you'll die during the reconstruct. Periodically checking the spare means reading from and writing to it over time in order to make sure it's still OK. (You take the spare out of the trunk, you look at it, you check the tire pressure, etc.) The issue I see coming down the road is that we'll start getting into a "Golden Gate paint job", where it takes so long to check the spare that we'll just keep the process going constantly. That is not as much wear and tear as real I/O, but the spare will still be up and running the entire time and you won't be able to spin it down.
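
To put rough numbers on the "Golden Gate paint job" effect, here is a minimal back-of-the-envelope sketch. The capacity, the sustained check rate, and the check interval are all hypothetical inputs, not figures from this thread; the function names are made up for illustration.

```python
def full_pass_hours(capacity_gb: float, rate_mb_s: float) -> float:
    """Hours needed to read (or rewrite) every block once at a sustained rate."""
    seconds = (capacity_gb * 1000.0) / rate_mb_s  # decimal GB -> MB, then MB / (MB/s)
    return seconds / 3600.0


def duty_cycle(capacity_gb: float, rate_mb_s: float, interval_hours: float) -> float:
    """Fraction of each check interval the spare must stay spun up (capped at 1.0)."""
    return min(1.0, full_pass_hours(capacity_gb, rate_mb_s) / interval_hours)


if __name__ == "__main__":
    # Hypothetical 500 GB spare checked once a week (168 hours),
    # at a full-speed rate and at two throttled background rates.
    for rate in (50.0, 1.0, 0.5):
        print(rate, "MB/s ->", round(duty_cycle(500.0, rate, 168.0), 3))
```

Once the duty cycle reaches 1.0, the check never finishes before the next one is due, which is the point at which the spare can no longer be spun down. Under these assumptions, a bigger disk or a more heavily throttled check gets there sooner.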

In my experience, checking the spare tire leads to getting a flat and needing
the spare about a week later :-)  It has happened to me twice in the past
few years... I suspect a conspiracy... :-)

Back to the topic, I'd believe that some combination of hot, warm, and
cold spares would be optimal.

Anton B. Rang wrote:
> Shouldn't SCSI/ATA block sparing handle this?  Reconstruction should be
> purely a matter of writing, so "bit rot" shouldn't be an issue; or are
> there cases I'm not thinking of? (Yes, I know there are a limited number of
> spare blocks, but I wouldn't expect a spare which is turned off to develop
> severe media problems...am I wrong?)

In the disk, at the disk block level, there is fairly substantial ECC.
Yet, we still see data loss.  There are many mechanisms at work here.  One
that we have studied in some detail is superparamagnetic decay -- the medium
wishes to decay to a lower-energy state, losing information in the process.
One way to "prevent" this is to rewrite the data -- basically resetting the
decay clock.  The study we did on this says that rewriting your data once
per year is reasonable.  Note that ZFS is COW, and scrubbing is currently a
read operation which will only write when data needs to be reconstructed.
I look at this as: rewrite-style scrubbing is preventative, read and verify
style scrubbing is prescriptive.  Either is better than neither.
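
The difference between the two styles of scrubbing can be sketched with a toy model. This is only an illustration, not how ZFS implements scrubbing: the Block class, the in-memory mirror list, the SHA-256 checksum, and both function names are assumptions made for the example. The one-year interval is the figure mentioned above.

```python
import hashlib
from dataclasses import dataclass

YEAR_S = 365 * 24 * 3600  # rewrite interval of roughly one year, in seconds


@dataclass
class Block:
    data: bytes
    checksum: str      # digest recorded when the block was written
    written_at: float  # time (seconds) the medium was last written


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def read_verify_scrub(blocks, mirror, now):
    """Read every block; write only where the checksum no longer matches."""
    repaired = 0
    for i, blk in enumerate(blocks):
        if digest(blk.data) != blk.checksum:
            good = mirror[i]  # assumed-good redundant copy (hypothetical)
            blocks[i] = Block(good, digest(good), now)
            repaired += 1
    return repaired


def rewrite_scrub(blocks, now, max_age=YEAR_S):
    """Rewrite every block older than max_age, resetting its decay clock."""
    rewritten = 0
    for blk in blocks:
        if now - blk.written_at >= max_age:
            blk.written_at = now  # stands in for physically rewriting the data
            rewritten += 1
    return rewritten
```

In this model the read-and-verify pass leaves healthy blocks at their original write time, so their decay clock keeps running; only the rewrite pass resets it.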

In short, use spares and scrub.
 -- richard
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
