On Fri, 3 Nov 2006, Richard Elling - PAE wrote:

> ozan s. yigit wrote:
> > for s10u2, documentation recommends 3 to 9 devices in raidz. what is the
> > basis for this recommendation? i assume it is performance and not failure
> > resilience, but i am just guessing... [i know, recommendation was intended
> > for people who know their raid cold, so it needed no further explanation]
>
> Both actually.
> The small, random read performance will approximate that of a single disk.
> The probability of data loss increases as you add disks to a RAID-5/6/Z/Z2
> volume.
>
> For example, suppose you have 12 disks and insist on RAID-Z.
> Given
>       1. small, random read iops for a single disk is 141 (e.g. 2.5" SAS
>          10k rpm drive)
>       2. MTBF = 1.4M hours (0.63% AFR) (so says the disk vendor)
>       3. no spares
>       4. service time = 24 hours, resync rate 100 GBytes/hr, 50% space
>          utilization
>       5. infinite service life
>
> Scenario 1: 12-way RAID-Z
>       performance = 141 iops
>       MTTDL[1] = 68,530 years
>       space = 11 * disk size
>
> Scenario 2: 2x 6-way RAID-Z+0
>       performance = 282 iops
>       MTTDL[1] = 150,767 years
>       space = 10 * disk size
>
> [1] Using MTTDL = MTBF^2 / (N * (N-1) * MTTR)
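
For anyone who wants to check the arithmetic in the quoted scenarios, here is a rough Python sketch of formula [1]. Several things in it are assumptions and are not stated in the original mail: the 146 GB disk size (picked because it makes the resync time come out close to the quoted results), the 8760-hour year, and the idea that striping across several RAID-Z groups divides MTTDL by the number of groups (inferred from the 2.2x ratio between the two quoted figures). The function names are made up for illustration.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: 8760-hour year


def afr(mtbf_hours):
    """Approximate annualized failure rate for a given MTBF."""
    return HOURS_PER_YEAR / mtbf_hours


def mttdl_years(mtbf_hours, disks_per_group, mttr_hours, groups=1):
    """Footnote [1]: MTTDL = MTBF^2 / (N * (N-1) * MTTR), per RAID-Z group.

    Assumption: a stripe of several groups loses data when any one group
    does, so the per-group figure is divided by the number of groups.
    """
    n = disks_per_group
    per_group_hours = mtbf_hours ** 2 / (n * (n - 1) * mttr_hours)
    return per_group_hours / groups / HOURS_PER_YEAR


def small_read_iops(disk_iops, groups):
    """Small random reads: each RAID-Z group behaves like one disk."""
    return disk_iops * groups


def usable_disks(disks_per_group, groups):
    """Each RAID-Z group gives up one disk's worth of space to parity."""
    return groups * (disks_per_group - 1)


MTBF = 1.4e6    # hours, from the quoted mail
DISK_GB = 146   # ASSUMPTION: disk size is not given in the mail
# service time (24 h) plus resync of 50% of the disk at 100 GBytes/hr
MTTR = 24 + 0.5 * DISK_GB / 100

one_group = mttdl_years(MTBF, 12, MTTR)             # Scenario 1
two_groups = mttdl_years(MTBF, 6, MTTR, groups=2)   # Scenario 2

if __name__ == "__main__":
    print(f"AFR        = {afr(MTBF):.2%}")
    print(f"Scenario 1 = {one_group:,.0f} years, "
          f"{small_read_iops(141, 1)} iops, {usable_disks(12, 1)} disks")
    print(f"Scenario 2 = {two_groups:,.0f} years, "
          f"{small_read_iops(141, 2)} iops, {usable_disks(6, 2)} disks")
```

With these assumptions the results land within about 0.1% of the quoted MTTDL values. Note that the formula treats disk failures as independent events.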

But ... I'm not sure I buy into your numbers, given the probability that
more than one disk will fail inside the service window when the disks are
identical.  Or suppose a disk fails at 5:01 PM (quitting time) on a Friday
and isn't replaced until 8:00 AM on Monday morning.
Does the failure data you have access to support my hypothesis that
failures of identical mechanical systems tend to occur in small clusters
within a relatively small window of time?
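
To put a number on the Friday-evening case: the MTTDL formula in the quoted mail is inversely proportional to MTTR, so a longer service window shrinks MTTDL by the same factor. A rough sketch, assuming the failure happens on the Friday this mail was sent (an arbitrary example date), ignoring resync time, and still treating failures as independent, which is exactly the assumption being questioned here:

```python
from datetime import datetime

# Hypothetical example dates: Friday 3 Nov 2006, 5:01 PM, to Monday 8:00 AM.
failed = datetime(2006, 11, 3, 17, 1)
replaced = datetime(2006, 11, 6, 8, 0)

# Length of the weekend service window in hours.
window_hours = (replaced - failed).total_seconds() / 3600

# The quoted scenarios assume a 24-hour service time.
baseline_hours = 24.0

# MTTDL is proportional to 1/MTTR, so this is the factor by which
# MTTDL shrinks (resync time ignored for simplicity).
shrink_factor = window_hours / baseline_hours

if __name__ == "__main__":
    print(f"service window: {window_hours:.1f} hours")
    print(f"MTTDL shrinks by a factor of about {shrink_factor:.1f}")
```

So the weekend alone costs a factor of roughly 2.6, before any clustering of failures is taken into account.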

Call me paranoid, but I'd prefer to see a product like thumper configured
with 50% of the disks manufactured by vendor A and the other 50%
manufactured by someone else.

This paranoia is based on a personal experience, many years ago (before we
had smart fans etc.), where we had a rack full of expensive custom
equipment cooled by (what we thought was) a highly redundant group of 5
fans.  One fan suffered infant mortality and its failure went unnoticed,
leaving 4 fans running.  Two of the remaining fans died on the same extended
weekend (public holiday).  It was an expensive and embarrassing disaster.

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
           Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
             OpenSolaris Governing Board (OGB) Member - Feb 2006
