Re: [zfs-discuss] repost - high read iops

Tim Cook Tue, 29 Dec 2009 10:32:21 -0800

On Tue, Dec 29, 2009 at 12:07 PM, Richard Elling
<richard.ell...@gmail.com> wrote:


> On Dec 29, 2009, at 9:16 AM, Brad wrote:
>
>> @eric
>>
>> "As a general rule of thumb, each vdev has the random performance
>> roughly the same as a single member of that vdev. Having six RAIDZ
>> vdevs in a pool should give roughly the performance as a stripe of six
>> bare drives, for random IO."
>>
>
> This model begins to break down with raidz2 and further breaks down
> with raidz3.  Since I wrote about this simple model here:
>
> http://blogs.sun.com/relling/entry/zfs_raid_recommendations_space_performance
> we've refined it a bit, to take into account the number of parity devices.
>
> For small, random read IOPS the performance of a single, top-level vdev is
>        performance = performance of a disk * (N / (N - P))
>
> where,
>        N = number of disks in the vdev
>        P = number of parity devices in the vdev
>
> For example, using 5 disks @ 100 IOPS we get something like:
>        2-disk mirror: 200 IOPS
>        4+1 raidz: 125 IOPS
>        3+2 raidz2: 167 IOPS
>        2+3 raidz3:  250 IOPS
>
> Once again, it is clear that mirroring will offer the best small, random
> read IOPS.
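
A minimal sketch of the quoted model, for anyone who wants to plug in their own numbers. The function name and argument names are made up for illustration. The mirror row is treated as N=2, P=1, which is an assumption, though it reproduces the 200 IOPS figure quoted above; the raidz rows use 5 disks each. This is only the rule of thumb from the thread, not a measurement.

```python
def vdev_random_read_iops(disk_iops, n_disks, n_parity):
    """Estimate small random-read IOPS for one top-level vdev.

    Implements the quoted rule of thumb:
        performance = performance of a disk * (N / (N - P))
    """
    if n_parity < 0 or n_disks <= n_parity:
        raise ValueError("need 0 <= P < N")
    return disk_iops * n_disks / (n_disks - n_parity)


# Layouts from the quoted example, at 100 IOPS per disk.
# (name, N, P); the mirror as N=2, P=1 is an assumption.
layouts = [
    ("2-disk mirror", 2, 1),
    ("4+1 raidz", 5, 1),
    ("3+2 raidz2", 5, 2),
    ("2+3 raidz3", 5, 3),
]

for name, n, p in layouts:
    print(f"{name}: {vdev_random_read_iops(100, n, p):.0f} IOPS")
```

Rounded to whole IOPS, this gives the same four figures as the list above.
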
>
>
>> It sounds like we'll need 16 vdevs striped in a pool to at least get the
>> performance of 15 drives plus another 16 mirrored for redundancy.
>>
>> If we are bounded in iops by the vdev, would it make sense to go with the
>> bare minimum of drives (3) per vdev?
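
A rough way to explore the vdev-width question, assuming two things taken from this thread: each vdev follows the N / (N - P) rule of thumb, and pool random-read IOPS scale linearly with the number of top-level vdevs. The 48-disk budget (16 vdevs of 3 drives), the 100 IOPS per disk, and the function name are hypothetical values chosen for illustration.

```python
def pool_random_read_iops(total_disks, disks_per_vdev, n_parity, disk_iops=100):
    """Return (vdev_count, estimated pool random-read IOPS).

    Assumes per-vdev IOPS = disk_iops * N / (N - P) and that the pool
    simply sums the IOPS of its top-level vdevs.
    """
    if n_parity < 0 or disks_per_vdev <= n_parity:
        raise ValueError("need 0 <= P < N")
    vdevs = total_disks // disks_per_vdev  # leftover disks are ignored
    per_vdev = disk_iops * disks_per_vdev / (disks_per_vdev - n_parity)
    return vdevs, vdevs * per_vdev


# Hypothetical 48-disk budget: (label, disks per vdev, parity devices).
for label, width, parity in [
    ("2-way mirror", 2, 1),
    ("3-disk raidz", 3, 1),
    ("6-disk raidz", 6, 1),
    ("12-disk raidz", 12, 1),
]:
    vdevs, iops = pool_random_read_iops(48, width, parity)
    print(f"{label}: {vdevs} vdevs, ~{iops:.0f} IOPS")
```

Under these assumptions, narrower vdevs give more random-read IOPS for the same disk count, at the cost of usable capacity.
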
>>
>> "This winds up looking similar to RAID10 in layout, in that you're
>> striping across a lot of disks that each consists of a mirror, though
>> the checksumming rules are different. Performance should also be
>> similar, though it's possible RAID10 may give slightly better random
>> read performance at the expense of some data quality guarantees, since
>> I don't believe RAID10 normally validates checksums on returned data
>> if the device didn't return an error. In normal practice, RAID10 and
>> a pool of mirrored vdevs should benchmark against each other within
>> your margin of error."
>>
>> It's interesting to learn that ZFS's implementation of RAID10 doesn't
>> have checksumming built in.
>>
>
> ZFS always checksums everything unless you explicitly disable
> checksumming for data. Metadata is always checksummed.
>  -- richard
>
>
>
I imagine he's referring to the fact that it cannot fix any checksum errors
it finds.  <flamesuit>Let me open the can of worms by saying this is nearly
as bad as not doing checksumming at all.  Knowing the data is bad when you
can't do anything to fix it doesn't really help if you have no way to
regenerate it. </flamesuit>


-- 
--Tim

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
