On May 15, 2011, at 8:01 PM, Edward Ned Harvey 
<opensolarisisdeadlongliveopensola...@nedharvey.com> wrote:

>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Jim Klimov
>> 
>> On one hand, I've read that as current drives get larger (while their
>> random IOPS/MBPS don't grow nearly as fast with new generations), it is
>> becoming more and more reasonable to use RAIDZ3 with 3 redundancy drives,
>> at least for vdevs made of many disks - a dozen or so. When a drive fails,
>> you still have two redundant parities, and with a resilver window expected
>> to be in hours if not days range, I would want that airbag, to say the least.
> 
> This is both an underestimation of the time required, and a sort of backward
> logic...
> 
> In all of the following, I'm assuming you're creating a pool whose primary
> storage is hard drives, not SSDs or similar.
> 
> The resilver time scales linearly with the number of slabs (blocks) in the
> degraded vdev, and depends on your usage patterns, which determine how
> randomly your data got scattered throughout the vdev upon writes.  

In all of my studies of resilvering, I have never seen a linear correlation
nor a correlation to the number of blocks. Can you share your data, or is
this another guess?
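
For concreteness, the model being asserted above is easy to write down: if
every live block in the degraded vdev costs roughly one random I/O to
resilver, total time is proportional to block count. A minimal sketch of that
arithmetic, in Python, with purely illustrative numbers for capacity, block
size, and IOPS (none of these are measurements):

# Back-of-envelope version of "resilver time scales with block count",
# assuming (hypothetically) that the rebuild is bound by random reads on
# the surviving disks rather than by sequential bandwidth.

def resilver_hours(used_bytes, avg_block_bytes, disk_random_iops):
    """Estimate resilver time if every live block costs one random I/O."""
    blocks = used_bytes / float(avg_block_bytes)
    seconds = blocks / float(disk_random_iops)
    return seconds / 3600.0

# Illustrative inputs only: 700 GB of live data, 64 KB average block size,
# a disk that sustains ~150 random IOPS.
print(resilver_hours(700e9, 64 * 1024, 150))   # roughly 20 hours

Whether real resilvers actually behave like this is exactly the question;
measured data beats models.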

> I assume
> your choice of raid type will not determine your usage patterns.  So if you
> create a big vdev (raidz3) as opposed to a bunch of smaller ones (mirrors)
> the resilver time is longer for the large vdev. 
> 
> Also, even in the best case scenario (mirrors) assuming you have a pool
> that's reasonably full (say, 50% to 70%) the resilver time is likely to take
> several times longer than a complete sequential read/write of the entire
> disk.

Several times isn't significant wrt data protection. 10x to 100x is significant.

>  In one of my systems, I have 1TB mirrors, 70% full, which can be
> sequentially completely read/written in 2 hrs.  But the resilver took 12
> hours of idle time.  Supposing you had a 70% full pool of raidz3, 2TB disks,
> using 10 disks + 3 parity, and a usage pattern similar to mine, your
> resilver time would have been minimum 10 days,

bollix

> likely approaching 20 or 30
> days.  (Because you wouldn't get 2-3 weeks of consecutive idle time, and the
> random access time for a raidz approaches 2x the random access time of a
> mirror.)

totally untrue
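
For readers following the arithmetic, the "minimum 10 days" figure above
appears to come from scaling the 12-hour mirror anecdote linearly by the
amount of live data in the vdev. The sketch below only restates that
assumption so it is clear what is being disputed; it is not a measurement:

# Reconstruction of the arithmetic that seems to be behind "minimum 10
# days": resilver time is assumed to scale with the live data in the vdev.
# This only restates the assumption; it does not validate it.

mirror_data_tb = 1.0 * 0.70        # 1 TB disk, 70% full
mirror_hours   = 12.0              # the 12-hour anecdote quoted above

raidz3_data_tb = 10 * 2.0 * 0.70   # 10 data disks x 2 TB, 70% full

scaled_hours = mirror_hours * (raidz3_data_tb / mirror_data_tb)
print(scaled_hours / 24.0)         # -> 10.0 days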

> BTW, the reason I chose 10+3 disks above was just because it makes
> calculation easy.  It's easy to multiply by 10.  I'm not suggesting using
> that configuration.  You may notice that I don't recommend raidz for most
> situations.  I endorse mirrors because they minimize resilver time (and
> maximize performance in general).  Resilver time is a problem for ZFS, which
> they may fix someday.

Resilver time is not a significant problem with ZFS. Resilver time is a much
bigger problem with traditional RAID systems. In any case, it is bad systems
engineering to optimize a system for best resilver time.
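
To put the systems-engineering point in concrete terms: resilver time (MTTR)
is only one input to data-loss risk. A back-of-envelope MTTDL comparison,
using the classic simplified model with an assumed drive MTBF and ignoring
unrecoverable read errors, is sketched below; every input is an assumption
chosen for illustration:

# Simplified MTTDL (mean time to data loss) models for a 2-way mirror and
# a 13-disk raidz3.  MTBF and MTTR values are assumptions for illustration;
# unrecoverable read errors and operator error are ignored.

MTBF_H = 1.0e6                      # assumed drive MTBF, in hours

def mttdl_mirror(mttr_h, n=2):
    # survives one failure: MTBF^2 / (n * (n-1) * MTTR)
    return MTBF_H ** 2 / (n * (n - 1) * mttr_h)

def mttdl_raidz3(mttr_h, n=13):
    # survives three failures: MTBF^4 / (n * (n-1) * (n-2) * (n-3) * MTTR^3)
    return MTBF_H ** 4 / (n * (n - 1) * (n - 2) * (n - 3) * mttr_h ** 3)

HOURS_PER_YEAR = 24 * 365
print(mttdl_mirror(12.0) / HOURS_PER_YEAR)    # mirror, 12-hour resilver
print(mttdl_raidz3(240.0) / HOURS_PER_YEAR)   # raidz3, 10-day resilver

Even with a resilver twenty times longer, the triple-parity vdev in this toy
model still comes out roughly two orders of magnitude ahead of the 2-way
mirror on MTTDL, which is why resilver time alone is the wrong objective.
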
 -- richard


