Yes, I did mean 6+2.  Thank you for fixing the typo.

I'm actually leaning more towards running a simple 7+1 RAIDZ1.
Running this with 1TB disks is not a problem, but I wanted to
investigate at what disk size the "scales would tip".  I understand
RAIDZ2 protects against a second failure during the rebuild process.
Currently, my RAIDZ1 takes 24 hours to rebuild a failed disk, so with
2TB disks, assuming a worst case of 2 days, that is my 'exposure'
time.

For example, I would confidently guess that a 7+1 RAIDZ1 with 6TB
drives wouldn't be a smart idea.  I'm just trying to extrapolate down
from there.
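
To put some rough numbers on that extrapolation, here is a minimal
back-of-the-envelope sketch.  It assumes an unrecoverable read error
(URE) spec of 1 per 1e14 bits read, which is a figure I'm assuming as
typical for consumer drives rather than taking from any datasheet, and
it assumes the resilver has to read the full capacity of every
surviving disk:

import math

# Chance of hitting at least one unrecoverable read error (URE) while
# resilvering a 7+1 RAIDZ1, i.e. while reading all 7 surviving disks.
# ASSUMED: URE rate of 1e-14 per bit read; full-capacity resilver.
URE_PER_BIT = 1e-14
DATA_DISKS = 7

for tb in (1, 2, 4, 6):
    bits_read = DATA_DISKS * tb * 1e12 * 8      # bits read during resilver
    p_ure = 1 - math.exp(-URE_PER_BIT * bits_read)
    print(f"{tb}TB disks: ~{p_ure:.0%} chance of at least one URE")

Under those assumptions the 2TB case already works out to roughly a
two-in-three chance of at least one read error during the resilver,
which is really just another way of saying the exposure scales with
how many bits have to be read back cleanly.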

I will be running a hot (or maybe cold) spare, so I don't need to
factor in the time it takes for the manufacturer to replace a failed
drive.



On Mon, Feb 7, 2011 at 2:48 PM, Edward Ned Harvey
<opensolarisisdeadlongliveopensola...@nedharvey.com> wrote:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Matthew Angelo
>>
>> My question is, how do I determine which of the following zpool and
>> vdev configuration I should run to maximize space whilst mitigating
>> rebuild failure risk?
>>
>> 1. 2x RAIDZ(3+1) vdev
>> 2. 1x RAIDZ(7+1) vdev
>> 3. 1x RAIDZ2(6+2) vdev
>>
>> I just want to prove I shouldn't run a plain old RAID5 (RAIDZ) with 8x
>> 2TB disks.
>
> (Corrected typo, 6+2 for you).
> Sounds like you made up your mind already.  Nothing wrong with that.  You
> are apparently uncomfortable running with only 1 disk worth of redundancy.
> There is nothing fundamentally wrong with the raidz1 configuration, but the
> probability of failure is obviously higher.
>
> Question is how do you calculate the probability?  Because if we're talking
> about 5e-21 versus 3e-19 then you probably don't care about the difference...
> They're both essentially zero probability...  Well...  There's no good
> answer to that.
>
> The cited bit error rate only represents the probability of a bit error.
> It does not represent the probability of a failed drive, nor the
> probability of a drive failure within a specified time window.  What you
> really care about is the probability of two (or three) drives failing
> concurrently, in which case you need to model the probability of any one
> drive failing within a specified time window.  And even if you model that
> probability, in reality it's not linear.  The probability of a drive
> failing between 1yr and 1yr+3hrs is smaller than the probability of it
> failing between 3yr and 3yr+3hrs, because after 3 years the failure rate
> will be higher.  So after 3 years, the probability of multiple
> simultaneous failures is higher.
>
> I recently saw some Seagate data sheets which specified the annual disk
> failure rate to be 0.3%.  Again, this is a linear model, representing a
> nonlinear reality.
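
To put rough numbers on that 0.3% figure against my worst-case 2-day
resilver window, here is a quick sketch.  It assumes a constant
failure rate, which is exactly the linear-model caveat above, and it
ignores the read-error side entirely:

import math

# Probability that one of the 7 surviving disks fails during the
# resilver window, assuming a constant failure rate (the linear model).
# INPUTS: 0.3% annual failure rate (cited above), 48h worst-case resilver.
AFR = 0.003
WINDOW_HOURS = 48
SURVIVORS = 7

rate_per_hour = -math.log(1 - AFR) / 8760            # exponential model
p_one = 1 - math.exp(-rate_per_hour * WINDOW_HOURS)  # one given disk fails
p_any = 1 - (1 - p_one) ** SURVIVORS                 # any of the 7 survivors
print(f"P(second failure during {WINDOW_HOURS}h resilver) ~ {p_any:.1e}")

That comes out to roughly 1e-4 per rebuild with those inputs, with the
obvious caveat that the real rate will be higher once the drives are a
few years old.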
>
> Suppose one disk fails...  How many weeks does it take to get a replacement
> onsite under the 3yr limited mail-in warranty?
>
> But then again after 3 years, you're probably considering this your antique
> hardware, and all the stuff you care about is on a newer server.  Etc.
>
> There's no good answer to your question.
>
> You are obviously uncomfortable with a single disk worth of redundancy.  Go
> with your gut.  Sleep well at night.  It only costs you $100.  You probably
> have a cell phone with no backups worth more than that in your pocket right
> now.
>
>
