comment far below...

On Aug 21, 2009, at 6:17 PM, Tim Cook wrote:
On Fri, Aug 21, 2009 at 8:04 PM, Richard Elling <richard.ell...@gmail.com > wrote:
On Aug 21, 2009, at 5:55 PM, Tim Cook wrote:
On Fri, Aug 21, 2009 at 7:41 PM, Richard Elling <richard.ell...@gmail.com > wrote:

My vote is with Ross. KISS wins :-)
Disclaimer: I'm also a member of BAARF.


My point is, RAIDZx+1 SHOULD be simple. I don't entirely understand why it hasn't been implemented. I can only imagine that, like so many other things, it's because there hasn't been significant customer demand. Unfortunate, if it's as simple to implement as I believe it is. (No, don't ask me to do it, I put in my time programming in college and have no desire to do it again :))

You can get in the same ballpark with at least two top-level raidz2 vdevs and copies=2. If you have three or more top-level raidz2 vdevs, then you can do even better with copies=3 ;-)

Note that I do not have a model for that because it would require separate failure rate data for whole-disk failures and for all other (non-whole-disk) failures. The latter is not available in data sheets. The closest I can get with published data is the MTTDL[2] model, which considers the published unrecoverable read error rate. In other words, the model would be easy, but the data to feed the model is not available :-( Suffice it to say, 2 top-level raidz2 vdevs of similar size with copies=2 should offer very nearly the same protection as raidz2+1.
 -- richard


You sure about that? Say I have a SAS controller shit the bed (pardon the French), and take one of the JBODs out entirely. Even with copies=2, isn't the entire pool going tits up and offline when it loses an entire vdev?

Yes. But you need to understand that the probability of a SAS controller failing is much, much smaller than that of a disk failing. So in order to properly model the system, you can't treat them as having the same failure rate (the difference is an order of magnitude for HDDs). Depending on the repair policy, the probability of losing a SAS controller is expected to be less than the probability of losing 3 disks in a raidz2. Since SAS is relatively easy to make redundant, a really paranoid person would have two SAS controllers, and the probability of losing two highly-reliable SAS controllers at the same time is way small :-)
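
For anyone who wants to play with that comparison, here is a rough sketch in Python. It is not the MTTDL[2] model and not anything from a data sheet: it simply assumes independent, exponentially distributed failures and no repair inside the chosen window, and every MTBF or window value you feed it is your own placeholder. The function names are made up for this illustration. Which side comes out larger depends on the rates and on the window you pick, which is the "depending on the repair policy" part.

```python
import math


def p_fail(mtbf_hours, window_hours):
    """Probability that one component fails within the window,
    assuming an exponential (constant failure rate) model."""
    return 1.0 - math.exp(-window_hours / mtbf_hours)


def p_at_least_k_of_n(p, n, k):
    """Probability that at least k of n independent components fail,
    each with failure probability p (binomial tail)."""
    return sum(
        math.comb(n, i) * p**i * (1.0 - p) ** (n - i)
        for i in range(k, n + 1)
    )


def compare(disk_mtbf_hours, ctrl_mtbf_hours, n_disks, window_hours):
    """Compare losing a raidz2 vdev (3 or more of its disks failing
    within the window) against losing a single controller.
    All inputs are caller-supplied assumptions, not measured data."""
    p_disk = p_fail(disk_mtbf_hours, window_hours)
    return {
        "raidz2_loss": p_at_least_k_of_n(p_disk, n_disks, 3),
        "controller_loss": p_fail(ctrl_mtbf_hours, window_hours),
    }
```

Call it as, say, compare(disk_mtbf_hours=..., ctrl_mtbf_hours=..., n_disks=..., window_hours=...) with your own guesses, and try both a short window (fast repair) and a long one (no repair) to see how sensitive the answer is.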

It would seem to me copies=2 is only applicable when you have both an entire disk loss and corrupt data on the "good disks". But feel free to enlighten :) That scenario seems far less likely than having a controller go bad, but that's just going by my own anecdotal experience.

As the Kinks sing, "paranoia will destroy ya!" :-)
 -- richard

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss