On Wed, Feb 21, 2007 at 04:20:58PM -0800, Eric Schrock wrote:
> Seems like there are two pieces you're suggesting here:
> 
> 1. Some sort of background process to proactively find errors on disks
>    in use by ZFS.  This will be accomplished by a background scrubbing
>    option, dependent on the block-rewriting work Matt and Mark are
>    working on.  This will allow something like "zpool set scrub=2weeks",
>    which will tell ZFS to "scrub my data at an interval such that all
>    data is touched over a 2 week period".  This will test reading from
>    every block and verifying checksums.  Stress-testing the write path
>    is a little more difficult.

I got the impression that testing free disk space was also desired; a
scrub as described only reads allocated blocks.
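
For concreteness, here is a back-of-the-envelope sketch of what the
proposed pacing might look like.  This is purely illustrative C; the
function and argument names are my own, not anything in the ZFS source:

    #include <stdint.h>

    /*
     * Hypothetical pacing for "zpool set scrub=2weeks": spread reads
     * of every allocated block evenly across the interval.  Names and
     * units are illustrative, not from the actual implementation.
     */
    static uint64_t
    scrub_bytes_per_second(uint64_t allocated_bytes, uint64_t interval_secs)
    {
            if (interval_secs == 0)
                    return (allocated_bytes);       /* scrub flat out */

            /* Round up so a full pass finishes within the interval. */
            return ((allocated_bytes + interval_secs - 1) / interval_secs);
    }

For a 2 TB pool over two weeks that works out to roughly 1.7 MB/s of
background reads, which is what makes the approach cheap.  Note that it
still says nothing about exercising writes or the free space.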

> 2. Distinguish "slow" drives from "normal" drives and proactively mark
>    them faulted.  This shouldn't require an explicit "zpool dft", as
>    we should be watching the response times of the various drives and
>    keep this as a statistic.  We want to incorporate this information
>    to allow better allocation amongst slower and faster drives.
>    Determining that a drive is "abnormally slow" is much more difficult,
>    though it could theoretically be done if we had some basis - either
>    historical performance for the same drive or comparison to identical
>    drives (manufacturer/model) within the pool.  While we've thought
>    about these same issues, there is currently no active effort to keep
>    track of these statistics or do anything with them.
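
A cheap way to keep that statistic would be an exponentially weighted
moving average of per-drive service time, compared against the mean of
identical drives in the pool.  A sketch, with the structure, smoothing
factor, and threshold all assumed for illustration:

    #include <stdint.h>

    /* Illustrative per-drive latency statistic; not actual ZFS code. */
    typedef struct vdev_latstat {
            double  vl_avg_us;  /* EWMA of I/O service time, microseconds */
    } vdev_latstat_t;

    #define LAT_ALPHA       0.05    /* smoothing weight for new samples */
    #define LAT_SLOW_FACTOR 4.0     /* "abnormally slow" cutoff (assumed) */

    /* Fold one completed I/O's service time into the running average. */
    static void
    vdev_lat_update(vdev_latstat_t *vl, uint64_t service_us)
    {
            vl->vl_avg_us += LAT_ALPHA * ((double)service_us - vl->vl_avg_us);
    }

    /* Flag a drive far slower than the mean of its identical peers. */
    static int
    vdev_lat_is_slow(const vdev_latstat_t *vl, double peer_avg_us)
    {
            return (vl->vl_avg_us > LAT_SLOW_FACTOR * peer_avg_us);
    }

Comparing against peers sidesteps the need for per-model performance
baselines, though it goes blind when every drive in the pool degrades
together.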

I would imagine that "slow" as in "long average seek times" should be
relatively easy to detect, whereas "slow" as in "low bandwidth" might be
harder (since I/O bandwidth might depend on characteristics of the
device path and how saturated it is).  Are long average seek times an
indication of trouble?
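
One way to get at seek time specifically would be to time each I/O from
issue to completion and subtract the pure transfer component; what is
left is dominated by seek plus rotational latency.  Again just a
sketch, with peak_bps (the drive's sequential media rate) and all the
names being assumptions:

    #include <stdint.h>

    /*
     * Estimate seek + rotational latency by removing the transfer time
     * (bytes / peak_bps) from the observed service time.  Illustrative
     * only; a real version would also need to subtract queuing delay.
     */
    static uint64_t
    seek_estimate_us(uint64_t service_us, uint64_t bytes, uint64_t peak_bps)
    {
            uint64_t xfer_us;

            if (peak_bps == 0)
                    return (service_us);    /* unknown rate: no correction */
            xfer_us = (bytes * 1000000) / peak_bps;
            return (service_us > xfer_us ? service_us - xfer_us : 0);
    }

Bandwidth, by contrast, can only be measured end to end, so the path
and saturation effects above are hard to factor out.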

Nico