Re: [zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison

2010-07-25 Thread Ross Walker
On Jul 23, 2010, at 10:14 PM, Edward Ned Harvey sh...@nedharvey.com wrote:

 From: Arne Jansen [mailto:sensi...@gmx.net]
 
 Can anyone else confirm or deny the correctness of this statement?
 
 As I understand it that's the whole point of raidz. Each block is its own
 stripe.
 
 Nope, that doesn't count as confirmation.  It is at least theoretically
 possible to implement raidz using techniques that would (a) unintelligently
 stripe all blocks (even small ones) across multiple disks, thus hurting
 performance on small operations, or (b) stripe blocks differently for small
 operations (keeping the data, plus parity, on fewer disks).  So the
 confirmation I'm looking for would be from somebody who knows the actual source
 code, and the actual architecture that was chosen to implement raidz in this
 case.

Maybe this helps?

http://blogs.sun.com/ahl/entry/what_is_raid_z

-Ross



Re: [zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison

2010-07-23 Thread Edward Ned Harvey
 From: Robert Milkowski [mailto:mi...@task.gda.pl]

 [In raidz] The issue is that each zfs filesystem block is basically
 spread across n-1 devices.
 So every time you want to read back a single fs block you need to wait
 for all n-1 devices to provide their part of it - and keep in mind that
 in zfs you can't get a partial block even if that's what you are asking
 for, as zfs has to verify the checksum of the entire fs block.

Can anyone else confirm or deny the correctness of this statement?

If you read a small file from a raidz volume, do you have to wait for every
single disk to return a small chunk of the blocksize?  I know this is true
for large files which require more than one block, obviously, but even a
small file gets spread out across multiple disks?

This may be the way it's currently implemented, but it's not a mathematical
requirement.  It is possible, if desired, to implement raid parity and still
allow small files to be written entirely on a single disk, without losing
redundancy.  That would provide the redundancy and the large file performance
(both of which are already present in raidz), while also optimizing small file
random operations, which may not already be optimized in raidz.



Re: [zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison

2010-07-23 Thread Arne Jansen

Edward Ned Harvey wrote:

From: Robert Milkowski [mailto:mi...@task.gda.pl]

[In raidz] The issue is that each zfs filesystem block is basically
spread across n-1 devices.
So every time you want to read back a single fs block you need to wait
for all n-1 devices to provide their part of it - and keep in mind that
in zfs you can't get a partial block even if that's what you are asking
for, as zfs has to verify the checksum of the entire fs block.


Can anyone else confirm or deny the correctness of this statement?

If you read a small file from a raidz volume, do you have to wait for every
single disk to return a small chunk of the blocksize?  I know this is true
for large files which require more than one block, obviously, but even a
small file gets spread out across multiple disks?

This may be the way it's currently implemented, but it's not a mathematical
requirement.  It is possible, if desired, to implement raid parity and still
allow small files to be written entirely on a single disk, without losing
redundancy.  That would provide the redundancy and the large file performance
(both of which are already present in raidz), while also optimizing small file
random operations, which may not already be optimized in raidz.


As I understand it that's the whole point of raidz. Each block is its own
stripe. If necessary the block gets broken down into 512-byte chunks to spread
it as wide as possible. Each block gets its own parity added. So if the array
is too wide for the block to be spread to all disks, you also lose space because
the stripe is not full and parity gets added to that small stripe. That means
if you only write 512-byte blocks, each write writes 3 blocks to disk, so the
net capacity goes down to one third, regardless of how many disks you have in
your raid group.
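
To make the space overhead concrete, here is a rough Python sketch (an
illustration only, not the ZFS allocator) that estimates how many 512-byte
sectors a raidz vdev might consume for one logical block, assuming one parity
sector per data row and padding of the allocation to a multiple of
(parity + 1); the real on-disk accounting can differ in the details:

import math

def raidz_alloc_sectors(block_bytes, ndisks, nparity, sector=512):
    data = math.ceil(block_bytes / sector)           # data sectors needed
    rows = math.ceil(data / (ndisks - nparity))      # stripe rows across the vdev
    total = data + rows * nparity                    # one parity sector per row
    rounding = nparity + 1
    return math.ceil(total / rounding) * rounding    # pad the allocation

for nparity in (1, 2):
    for block in (512, 4096, 131072):
        used = raidz_alloc_sectors(block, ndisks=6, nparity=nparity)
        print("raidz%d %6dB block: %3d sectors on disk (%.0f%% is data)"
              % (nparity, block, used, 100.0 * block / (used * 512)))

The smaller the block, the larger the fraction of raw space that goes to
parity and padding, no matter how wide the raid group is.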



Re: [zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison

2010-07-23 Thread Edward Ned Harvey
 From: Arne Jansen [mailto:sensi...@gmx.net]
 
  Can anyone else confirm or deny the correctness of this statement?
 
 As I understand it that's the whole point of raidz. Each block is its own
 stripe.

Nope, that doesn't count as confirmation.  It is at least theoretically
possible to implement raidz using techniques that would (a) unintelligently
stripe all blocks (even small ones) across multiple disks, thus hurting
performance on small operations, or (b) stripe blocks differently for small
operations (keeping the data, plus parity, on fewer disks).  So the
confirmation I'm looking for would be from somebody who knows the actual source
code, and the actual architecture that was chosen to implement raidz in this
case.



Re: [zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison

2010-07-22 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Robert Milkowski
 
 I had a quick look at your results a moment ago.
 The problem is that you used a server with 4GB of RAM + a raid card
 with 256MB of cache.
 Then your filesize for iozone was set to 4GB - so random or not you
 probably had a relatively good cache hit ratio for random reads. And

Look again in the raw_results.  I ran it with 4G, and also with 12G.  There
was no significant difference between the two, so I only compiled the 4G
results into a spreadsheet PDF.


 even then a random read from 8 threads gave you only about 40% more IOPS
 for a RAID-Z made out of 5 disks than for a single drive. The poor
 result for HW-R5 is surprising, though it might be that the stripe size
 was not matched to the ZFS recordsize and iozone block size in this case.

I think what you're saying is "With 5 disks performing well, you should
expect 4x higher iops than a single disk," and the measured result was only
40% higher, which is a poor result.

I agree.  I guess the 128k recordsize used in iozone is probably large
enough that it frequently causes blocks to span disks?  I don't know.


 The issue with raid-z and random reads is that as cache hit ratio goes
 down to 0 the IOPS approaches IOPS of a single drive. For a little bit
 more information see http://blogs.sun.com/roch/entry/when_to_and_not_to

I don't think that's correct, unless you're using a single thread.  As long
as multiple threads are issuing random reads on raidz, and those reads are
small enough that each one is entirely written on a single disk, then you
should be able to get n-1 disks operating simultaneously, to achieve (n-1)x
performance of a single disk.

Even if blocks are large enough to span disks, you should be able to get
(n-1)x performance of a single disk for large sequential operations.  



Re: [zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison

2010-07-22 Thread Robert Milkowski

On 22/07/2010 03:25, Edward Ned Harvey wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Robert Milkowski
 
   

I had a quick look at your results a moment ago.
The problem is that you used a server with 4GB of RAM + a raid card
with 256MB of cache.
Then your filesize for iozone was set to 4GB - so random or not you
probably had a relatively good cache hit ratio for random reads. And
 

Look again in the raw_results.  I ran it with 4G, and also with 12G.  There
was no significant difference between the two, so I only compiled the 4G
results into a spreadsheet PDF.


   


The only tests with 12GB file size in raw files are a mirror and a 
single disk configuration.

There are no results for raid-z there.



even then a random read from 8 threads gave you only about 40% more IOPS
for a RAID-Z made out of 5 disks than for a single drive. The poor
result for HW-R5 is surprising, though it might be that the stripe size
was not matched to the ZFS recordsize and iozone block size in this case.


I think what you're saying is "With 5 disks performing well, you should
expect 4x higher iops than a single disk," and the measured result was only
40% higher, which is a poor result.

I agree.  I guess the 128k recordsize used in iozone is probably large
enough that it frequently causes blocks to span disks?  I don't know.

   


Probably - but it would also depend on how you configured hw-r5 (mainly
its stripe size).
The other thing is that you might have had some bottleneck somewhere
else, as your results for N-way mirrors aren't that good either.


   

The issue with raid-z and random reads is that as cache hit ratio goes
down to 0 the IOPS approaches IOPS of a single drive. For a little bit
more information see http://blogs.sun.com/roch/entry/when_to_and_not_to
 

I don't think that's correct, unless you're using a single thread.  As long
as multiple threads are issuing random reads on raidz, and those reads are
small enough that each one is entirely written on a single disk, then you
should be able to get n-1 disks operating simultaneously, to achieve (n-1)x
performance of a single disk.

Even if blocks are large enough to span disks, you should be able to get
(n-1)x performance of a single disk for large sequential operations.
   


While it is true to some degree for hw raid-5, raid-z doesn't work that way.
The issue is that each zfs filesystem block is basically spread across
n-1 devices.
So every time you want to read back a single fs block you need to wait
for all n-1 devices to provide their part of it - and keep in mind that
in zfs you can't get a partial block even if that's what you are asking
for, as zfs has to verify the checksum of the entire fs block.
Now multiple readers actually make it worse for raid-z (assuming a very
poor cache hit ratio) - because each read from each reader involves all
the disk drives, the others basically can't read anything until it is done.
It gets really bad for random reads. With HW raid-5, if your stripe size
matches the block you are reading back, then for random reads it is probable
that while reader-X1 is reading from disk-Y1, reader-X2 is reading from
disk-Y2, so you should end up with all the disk drives (-1) contributing to
better overall iops.
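
A back-of-the-envelope model of that difference (purely illustrative; the
per-disk IOPS figure and the one-small-read-per-disk rule for hw raid-5 are
assumptions, not measurements):

DISK_IOPS = 150          # assumed random-read IOPS of one drive
N_DISKS = 6              # disks in the group, one of them parity

def raidz_random_read_iops(readers):
    # every block read touches all n-1 data disks, so the vdev completes
    # roughly one block read per disk service time no matter how many
    # readers are queued up
    return DISK_IOPS

def raid5_random_read_iops(readers):
    # each small read can be served by a single disk, so concurrent
    # readers spread out across the spindles
    return min(readers, N_DISKS - 1) * DISK_IOPS

for readers in (1, 2, 4, 8):
    print("%d readers: raidz ~%d IOPS, raid-5 ~%d IOPS"
          % (readers, raidz_random_read_iops(readers),
             raid5_random_read_iops(readers)))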


Read Roch's blog entry carefully for more information.

btw: even in your results, 6 disks in raid-z provided over 3x fewer IOPS
than the zfs raid-10 configuration for random reads. That is a big difference
if one needs performance.


--
Robert Milkowski
http://milek.blogspot.com


Re: [zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison

2010-07-21 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of v
 
 for zfs raidz1, I know for random io, iops of a raidz1 vdev equal to
 one physical disk iops, since raidz1 is like raid5, so does raid5 have
 the same performance as raidz1? i.e. random iops equal to one physical
 disk's iops.

I tested this extensively about 6 months ago.  Please see
http://www.nedharvey.com for more details.  I disagree with the assumptions
you've made above, and I'll say this instead:

Look at
http://nedharvey.com/iozone_weezer/bobs%20method/iozone%20results%20summary.pdf
Go down to the 2nd section, "Compared to a single disk"
Look at single-disk and raidz-5disks and raid5-5disks-hardware

You'll see that both raidz and raid5 are significantly faster than a single
disk in all types of operations.  In all cases, raidz is approximately equal
to, or significantly faster than hardware raid5.

Furthermore, I later went on to test performance using nonvolatile devices
(such as SSD) as a dedicated ZIL log device, and in those situations, the
performance of ZFS with a dedicated log device beat hardware writeback caching
easily.  So put simply: ZFS raid is faster than the fastest hardware raid,
because ZFS has knowledge of the filesystem and blocks, while hardware raid
only has knowledge of the blocks.  So ZFS is able to be more intelligent in
the techniques it uses for acceleration.



Re: [zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison

2010-07-21 Thread Robert Milkowski

On 21/07/2010 15:40, Edward Ned Harvey wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of v

for zfs raidz1, I know for random io, iops of a raidz1 vdev equal to
one physical disk iops, since raidz1 is like raid5, so does raid5 have
the same performance as raidz1? i.e. random iops equal to one physical
disk's iops.
 

I tested this extensively about 6 months ago.  Please see
http://www.nedharvey.com for more details.  I disagree with the assumptions
you've made above, and I'll say this instead:

Look at
http://nedharvey.com/iozone_weezer/bobs%20method/iozone%20results%20summary.pdf
Go down to the 2nd section, "Compared to a single disk"
Look at single-disk and raidz-5disks and raid5-5disks-hardware

You'll see that both raidz and raid5 are significantly faster than a single
disk in all types of operations.  In all cases, raidz is approximately equal
to, or significantly faster than hardware raid5.

   

I had a quick look at your results a moment ago.
The problem is that you used a server with 4GB of RAM + a raid card with
256MB of cache.
Then your filesize for iozone was set to 4GB - so random or not you
probably had a relatively good cache hit ratio for random reads. And
even then a random read from 8 threads gave you only about 40% more IOPS
for a RAID-Z made out of 5 disks than for a single drive. The poor
result for HW-R5 is surprising, though it might be that the stripe size
was not matched to the ZFS recordsize and iozone block size in this case.


The issue with raid-z and random reads is that as cache hit ratio goes 
down to 0 the IOPS approaches IOPS of a single drive. For a little bit 
more information see http://blogs.sun.com/roch/entry/when_to_and_not_to


--
Robert Milkowski
http://milek.blogspot.com




Re: [zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison

2010-07-21 Thread Ulrich Graef

There is a common misconception about the comparison between
mirror and raidz.

You get the same performance when you use the same number of disks.
But the resulting filesystem has a different size, therefore a comparison
is not applicable.

Example: you have 8 disks

   Compare a zpool with one raidz vdev of 8 disks
   with a zpool containing 4 mirrors of 2 disks each.

   Then the read IOs spread over 8 disks in each case.
   Therefore the number of IOs is comparable.

   But you compared apples and oranges, because the net size
   is 7 disks in the first case and 4 disks in the second case.

   A valid comparison would be a comparison of a zpool with
   one raidz vdev containing 5 disks with the mirrored zpool
   containing 4 mirrors of 2 disks each.
   Because then the size of the zpool is the same.
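
The same comparison as simple arithmetic (a sketch only; the 1 TB disk size
and the reads-spread-over-all-disks rule are assumptions for illustration):

DISK_TB = 1.0

def raidz1(ndisks):
    # usable space is n-1 disks; reads can spread over all n spindles
    return {"usable_tb": (ndisks - 1) * DISK_TB, "read_spindles": ndisks}

def mirrors(nmirrors):
    # usable space is one disk per mirror; reads can use both sides
    return {"usable_tb": nmirrors * DISK_TB, "read_spindles": 2 * nmirrors}

print("raidz1 of 8 disks:  ", raidz1(8))    # 7 TB usable, 8 spindles
print("4 x 2-disk mirrors: ", mirrors(4))   # 4 TB usable, 8 spindles
print("raidz1 of 5 disks:  ", raidz1(5))    # 4 TB usable, 5 spindles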

Regards,

   Ulrich



Roy Sigurd Karlsbakk wrote:

- Original Message -
  

On Jul 20, 2010, at 6:12 AM, v victor_zh...@hotmail.com wrote:



Hi,
for zfs raidz1, I know for random io, iops of a raidz1 vdev equal to
one physical disk iops, since raidz1 is like raid5, so does raid5 have
the same performance as raidz1? i.e. random iops equal to one physical
disk's iops.
  

On reads, no, any part of the stripe width can be read without reading
the whole stripe width, giving performance equal to raid0 of
non-parity disks.



Are you sure this is true? I know it is, in theory, but some testing with 
bonnie++ showed me I didn't get so large a gain. Perhaps my tests were done 
wrong?

Vennlige hilsener / Best regards

roy
  



--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of idioms of 
foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
  



--
Ulrich Graef / Senior SE / Hardware Presales / Phone: + 49 6103 752 359
ORACLE Deutschland B.V. & Co. KG / Amperestr. 6 / 63225 Langen
http://www.oracle.com

ORACLE Deutschland B.V. & Co. KG
Hauptverwaltung: Riesstr. 25, D-80992 Muenchen
Registergericht: Amtsgericht Muenchen, HRA 95603

Komplementaerin: ORACLE Deutschland Verwaltung B.V.
Rijnzathe 6, 3454PV De Meern, Niederlande
Handelsregister der Handelskammer Midden-Niederlande, Nr. 30143697
Geschaeftsfuehrer: Juergen Kunz, Marcel van de Molen, Alexander van der Ven



Re: [zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison

2010-07-20 Thread Roy Sigurd Karlsbakk
- Original Message -
 Hi,
 for zfs raidz1, I know for random io, iops of a raidz1 vdev equal to
 one physical disk iops, since raidz1 is like raid5, so does raid5 have
 the same performance as raidz1? i.e. random iops equal to one physical
 disk's iops.

Mostly, yes. Traditional RAID-5 is likely to be faster than ZFS because of ZFS 
doing checksumming, having the ZIL etc, but then, trad raid5 won't have the 
safety offered by ZFS.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of idioms of 
foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.


Re: [zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison

2010-07-20 Thread Ross Walker
On Jul 20, 2010, at 6:12 AM, v victor_zh...@hotmail.com wrote:

 Hi,
 for zfs raidz1, I know for random io, iops of a raidz1 vdev equal to one 
 physical disk iops, since raidz1 is like raid5, so does raid5 have the same 
 performance as raidz1? i.e. random iops equal to one physical disk's iops.

On reads, no, any part of the stripe width can be read without reading the 
whole stripe width, giving performance equal to raid0 of non-parity disks.

On writes it could be worse than raidz1 depending on whether whole stripe 
widths are being written (same performance) or partial stripe widths are being 
written (worse performance). If it's a partial stripe width then the remaining 
data needs to be read off disk, which doubles the IOs.
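
Roughly, in disk I/Os per RAID-5 write (a sketch of the classic
read-modify-write scheme only; real controllers and caches vary):

def raid5_write_ios(n_disks, full_stripe):
    if full_stripe:
        # write every data chunk plus the new parity; nothing needs to be
        # read first because the whole stripe's data is in hand
        return {"reads": 0, "writes": n_disks}
    # partial stripe (read-modify-write): read the old data chunk and the
    # old parity, then write the new data chunk and the new parity
    return {"reads": 2, "writes": 2}

print("full-stripe write, 5 disks:", raid5_write_ios(5, full_stripe=True))
print("partial-stripe write:      ", raid5_write_ios(5, full_stripe=False))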

-Ross



Re: [zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison

2010-07-20 Thread Roy Sigurd Karlsbakk
- Original Message -
 On Jul 20, 2010, at 6:12 AM, v victor_zh...@hotmail.com wrote:
 
  Hi,
  for zfs raidz1, I know for random io, iops of a raidz1 vdev equal to
  one physical disk iops, since raidz1 is like raid5, so does raid5 have
  the same performance as raidz1? i.e. random iops equal to one physical
  disk's iops.
 
 On reads, no, any part of the stripe width can be read without reading
 the whole stripe width, giving performance equal to raid0 of
 non-parity disks.

Are you sure this is true? I know it is, in theory, but some testing with 
bonnie++ showed me I didn't get so large a gain. Perhaps my tests were done 
wrong?

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of idioms of 
foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.


Re: [zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison

2010-07-20 Thread Richard Elling
On Jul 20, 2010, at 3:46 AM, Roy Sigurd Karlsbakk wrote:
 - Original Message -
 Hi,
 for zfs raidz1, I know for random io, iops of a raidz1 vdev equal to
 one physical disk iops, since raidz1 is like raid5, so does raid5 have
 the same performance as raidz1? i.e. random iops equal to one physical
 disk's iops.
 
 Mostly, yes. Traditional RAID-5 is likely to be faster than ZFS because of ZFS 
 doing checksumming, having the ZIL etc, but then, trad raid5 won't have the 
 safety offered by ZFS

Disagree. ZIL has nothing to do with RAIDness. Traditional RAID-5 suffers
from a read-modify-write sequence if the I/O is not perfectly matched to the
stripe width -- a 3x latency hit.  In raidz, the writes are always full stripe,
so there is only a 1x latency hit.

OTOH, for reads, some RAID-5 implementations will read only a single 
portion of a stripe, if the I/O is small enough to fit. In this case, the small,
random read performance can approach RAID-0. raidz will always read
the full block, even though the full block might not be spread across all
of the disks. ZFS does this to verify the checksum of the data.
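
One way to summarize that tradeoff is to count disk I/Os per small operation
(illustrative only; the raidz read count assumes the block really is spread
across all the data disks, and implementations vary):

def ios_per_small_op(n_disks):
    return {
        "raid5 small read":  1,            # just the chunk holding the data
        "raidz small read":  n_disks - 1,  # whole block read to verify the checksum
        "raid5 small write": 4,            # read old data+parity, write new data+parity
        "raidz small write": 2,            # data plus parity, a (short) full stripe
    }

for op, ios in ios_per_small_op(6).items():
    print("%-17s ~%d disk I/Os" % (op, ios))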

This is the classic tradeoff -- space, performance, dependability: pick two.
 -- richard

-- 
Richard Elling
rich...@nexenta.com   +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com


