Ahhhh, I see. But I think your math is a bit out:

62.5e6 IOPS @ 100 IOPS
= 625,000 seconds
≈ 10,417 minutes
≈ 173.6 hours
≈ 7d 6h.

So 7 days & 6 hours. That's long, but I can live with it. This isn't for an
enterprise environment. While the length of time is a worry in terms of
increasing the chance that another drive will fail, in my mind that is
mitigated by the fact that the drives won't be under major stress during that
time. It's a workable solution.

On Thu, Sep 9, 2010 at 3:03 PM, Erik Trimble <erik.trim...@oracle.com> wrote:

>  On 9/9/2010 5:49 AM, hatish wrote:
>
>> Very interesting...
>>
>> Well, let's see if we can do the numbers for my setup.
>>
>> From a previous post of mine:
>>
>> [i]This is my exact breakdown (cheap disks on cheap bus :P) :
>>
>>
>> PCI-E 8X 4-port ESata Raid Controller.
>> 4 x ESata to 5Sata Port multipliers (each connected to a ESata port on the
>> controller).
>> 20 x Samsung 1TB HDD's. (each connected to a Port Multiplier).
>>
>> The PCIE 8x port gives me 4GBps, which is 32Gbps. No problem there. Each
>> ESata port guarantees 3Gbps, therefore 12Gbps limit on the controller. Each
>> PM can give up to 3Gbps, which is shared amongst 5 drives. According to
>> Samsung's site, max read speed is 250MBps, which translates to 2Gbps.
>> Multiply by 5 drives gives you 10Gbps. Which is 333% of the PM's capability.
>> So the drives aren't likely to hit max read speed for long lengths of time,
>> especially during rebuild time.
>>
>> So the bus is going to be quite a bottleneck. Let's assume that the drives
>> are 80% full. That's 800GB that needs to be read on each drive, which is
>> (800 x 9) 7.2TB.
>> Best case scenario, we can read 7.2TB at 3Gbps
>> = 57.6 Tb at 3Gbps
>> = 57600 Gb at 3Gbps
>> = 19200 seconds
>> = 320 minutes
>> = 5 Hours 20 minutes.
>>
>> Even if it takes twice that amount of time, I'm happy.
>>
>> Initially I had been thinking 2 PMs for each vdev. But now I'm thinking
>> maybe I should split it as wide as I can ([2 data disks per PM] x 2,
>> [2 data disks & 1 parity disk per PM] x 2) for each vdev. It'll give the
>> best possible speed, but still won't max out the HDDs.
>>
>> I've never actually sat down and done the math before. Hope it's decently
>> accurate :)[/i]
>>
>> My scenario, as from Erik's post:
>> Scenario: I have 10 1TB disks in a raidz2, and I have 128k
>> slab sizes. Thus, I have 16k of data for each slab written to each
>> disk. (8x16k data + 32k parity for a 128k slab size). So, each IOPS
>> gets to reconstruct 16k of data on the failed drive. It thus takes
>> about 1TB/16k = 62.5e6 IOPS to reconstruct the full 1TB drive.
>>
>> Let's assume the drives are at 95% capacity, which is a pretty bad
>> scenario. So that's 7600GB, which is 60800Gb. There will be no other IO while
>> a rebuild is going.
>> Best Case: I'll read at 12Gbps, & write at 3Gbps (4:1). I read 128K for
>> every 16K I write (8:1). Hence the read bandwidth will be the bottleneck. So
>> 60800Gb @ 12Gbps is 5066s which is 84m27s (Never gonna happen). A more
>> realistic read of 1.5Gbps gives me 40533s, which is 675m33s, which is
>> 11h15m33s. Which is a more realistic time to read 7.6TB.
>>
>
>
> Actually, your biggest bottleneck will be the IOPS limits of the drives.  A
> 7200RPM SATA drive tops out at 100 IOPS.  Yup. That's it.
>
> So, if you need to do 62.5e6 IOPS, and the rebuild drive can do just 100
> IOPS, that means you will finish (best case) in 62.5e4 seconds.  Which is
> over 173 hours. Or, about 7.25 WEEKS.
>
>
> --
> Erik Trimble
> Java System Support
> Mailstop:  usca22-123
> Phone:  x17195
> Santa Clara, CA
> Timezone: US/Pacific (GMT-0800)
>
>
