On May 15, 2011, at 8:01 PM, Edward Ned Harvey <opensolarisisdeadlongliveopensola...@nedharvey.com> wrote:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Jim Klimov
>>
>> On one hand, I've read that as current drives get larger (while their random
>> IOPS/MBPS don't grow nearly as fast with new generations), it is becoming
>> more and more reasonable to use RAIDZ3 with 3 redundancy drives, at least
>> for vdevs made of many disks - a dozen or so. When a drive fails, you still
>> have two redundant parities, and with a resilver window expected to be in
>> hours if not days range, I would want that airbag, to say the least.
>
> This is both an underestimation of the time required, and a sort of backward
> logic...
>
> In all of the following, I'm assuming you're creating a pool whose primary
> storage is hard drives, not SSDs or similar.
>
> The resilver time scales linearly with the number of slabs (blocks) in the
> degraded vdev, and depends on your usage patterns, which determine how
> randomly your data got scattered throughout the vdev upon writes.

In all of my studies of resilvering, I have never seen a linear correlation
nor a correlation to the number of blocks. Can you share your data, or is this
another guess?

> I assume
> your choice of raid type will not determine your usage patterns. So if you
> create a big vdev (raidz3) as opposed to a bunch of smaller ones (mirrors)
> the resilver time is longer for the large vdev.
>
> Also, even in the best case scenario (mirrors) assuming you have a pool
> that's reasonably full (say, 50% to 70%) the resilver time is likely to take
> several times longer than a complete sequential read/write of the entire
> disk.

Several times isn't significant wrt data protection. 10x to 100x is significant.

> In one of my systems, I have 1TB mirrors, 70% full, which can be
> sequentially completely read/written in 2 hrs. But the resilver took 12
> hours of idle time.
> Supposing you had a 70% full pool of raidz3, 2TB disks,
> using 10 disks + 3 parity, and a usage pattern similar to mine, your
> resilver time would have been minimum 10 days,

bollix

> likely approaching 20 or 30
> days. (Because you wouldn't get 2-3 weeks of consecutive idle time, and the
> random access time for a raidz approaches 2x the random access time of a
> mirror.)

totally untrue

> BTW, the reason I chose 10+3 disks above was just because it makes
> calculation easy. It's easy to multiply by 10. I'm not suggesting using
> that configuration. You may notice that I don't recommend raidz for most
> situations. I endorse mirrors because they minimize resilver time (and
> maximize performance in general). Resilver time is a problem for ZFS, which
> they may fix someday.

Resilver time is not a significant problem with ZFS. Resilver time is a much
bigger problem with traditional RAID systems. In any case, it is bad systems
engineering to optimize a system for best resilver time.
 -- richard

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss