On Fri, 3 Nov 2006, Richard Elling - PAE wrote:

> ozan s. yigit wrote:
> > for s10u2, documentation recommends 3 to 9 devices in raidz. what is the
> > basis for this recommendation? i assume it is performance and not failure
> > resilience, but i am just guessing... [i know, recommendation was intended
> > for people who know their raid cold, so it needed no further explanation]
>
> Both actually.
> The small, random read performance will approximate that of a single disk.
> The probability of data loss increases as you add disks to a RAID-5/6/Z/Z2
> volumes.
>
> For example, suppose you have 12 disks and insist on RAID-Z.
> Given
>    1. small, random read iops for a single disk is 141 (eg. 2.5" SAS
>       10k rpm drive)
>    2. MTBF = 1.4M hours (0.63% AFR) (so says the disk vendor)
>    3. no spares
>    4. service time = 24 hours, resync rate 100 GBytes/hr, 50% space
>       utilization
>    5. infinite service life
>
> Scenario 1: 12-way RAID-Z
>    performance = 141 iops
>    MTTDL[1] = 68,530 years
>    space = 11 * disk size
>
> Scenario 2: 2x 6-way RAID-Z+0
>    performance = 282 iops
>    MTTDL[1] = 150,767 years
>    space = 10 * disk size
>
> [1] Using MTTDL = MTBF^2 / (N * (N-1) * MTTR)
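
For anyone who wants to check the arithmetic, here is a rough sketch of formula [1] in Python. Two things in it are my assumptions, not figures from the quoted message: a disk size of 146 GB (the message gives none, and this value happens to land close to the quoted results) and treating the 2x 6-way pool as two independent sets, so that the pool MTTDL is half the per-set MTTDL. MTTR is taken as the service time plus the time to resync the used half of one disk. The function name `mttdl_years` is just an illustrative helper.

```python
HOURS_PER_YEAR = 24 * 365


def mttdl_years(mtbf_hours, n_disks, mttr_hours, n_sets=1):
    """MTTDL = MTBF^2 / (N * (N-1) * MTTR), in years.

    n_sets > 1 models a stripe over independent RAID-Z sets by dividing
    the per-set MTTDL by the number of sets (an assumption, see above).
    """
    per_set_hours = mtbf_hours ** 2 / (n_disks * (n_disks - 1) * mttr_hours)
    return per_set_hours / n_sets / HOURS_PER_YEAR


MTBF_HOURS = 1.4e6        # vendor figure from the quoted message
SERVICE_HOURS = 24        # from the quoted message
RESYNC_GB_PER_HOUR = 100  # from the quoted message
UTILIZATION = 0.5         # from the quoted message
DISK_GB = 146             # ASSUMPTION: not stated in the quoted message

# Repair time = wait for service + resync of the used part of one disk.
MTTR_HOURS = SERVICE_HOURS + DISK_GB * UTILIZATION / RESYNC_GB_PER_HOUR

raidz_12 = mttdl_years(MTBF_HOURS, 12, MTTR_HOURS)
raidz_2x6 = mttdl_years(MTBF_HOURS, 6, MTTR_HOURS, n_sets=2)

print(f"MTTR            = {MTTR_HOURS:.2f} hours")
print(f"12-way RAID-Z   = {raidz_12:,.0f} years")
print(f"2x 6-way RAID-Z = {raidz_2x6:,.0f} years")
```

Under those assumptions both results come out within a fraction of a percent of the quoted 68,530 and 150,767 years. Since MTTR sits in the denominator, the result scales inversely with repair time, and the formula treats disk failures as independent events.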

But ... I'm not sure I buy into your numbers, given the probability that
more than one disk will fail inside the service window when the disks are
identical. Or ... a disk failure occurs at 5:01 PM (quitting time) on a
Friday and the disk won't be replaced until 8:00 AM on Monday morning.

Does the failure data you have access to support my hypothesis that
failures of identical mechanical systems tend to occur in small clusters
within a relatively small window of time?

Call me paranoid, but I'd prefer to see a product like thumper configured
with 50% of the disks manufactured by vendor A and the other 50%
manufactured by someone else.

This paranoia is based on a personal experience, many years ago (before we
had smart fans etc), where we had a rack full of expensive custom equipment
cooled by (what we thought was) a highly redundant group of 5 fans. One
fan suffered infant mortality and its failure went unnoticed, leaving 4
fans running. Two of those fans then died on the same extended weekend
(public holiday). It was an expensive and embarrassing disaster.

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
           Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
OpenSolaris Governing Board (OGB) Member - Feb 2006