Re: [zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison
On Jul 23, 2010, at 10:14 PM, Edward Ned Harvey sh...@nedharvey.com wrote:

> From: Arne Jansen [mailto:sensi...@gmx.net]
>
>>> Can anyone else confirm or deny the correctness of this statement?
>>
>> As I understand it that's the whole point of raidz. Each block is its own stripe.
>
> Nope, that doesn't count as confirmation. It is at least theoretically possible to implement raidz using techniques that would (a) unintelligently stripe all blocks (even small ones) across multiple disks, thus hurting performance on small operations, or (b) implement raidz such that striping of blocks behaves differently for small operations (plus parity). So the confirmation I'm looking for would be from somebody who knows the actual source code, and the actual architecture that was chosen to implement raidz in this case.

Maybe this helps? http://blogs.sun.com/ahl/entry/what_is_raid_z

-Ross
Re: [zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison
> From: Robert Milkowski [mailto:mi...@task.gda.pl]
>
> [In raidz] The issue is that each zfs filesystem block is basically spread across n-1 devices. So every time you want to read back a single fs block you need to wait for all n-1 devices to provide you with a part of it - and keep in mind that in zfs you can't get a partial block even if that's what you are asking for, as zfs has to check the checksum of the entire fs block.

Can anyone else confirm or deny the correctness of this statement? If you read a small file from a raidz volume, do you have to wait for every single disk to return a small chunk of the blocksize? I know this is true for large files which require more than one block, obviously, but does even a small file get spread out across multiple disks?

This may be the way it's currently implemented, but it's not a mathematical requirement. It is possible, if desired, to implement raid parity and still allow small files to be written entirely on a single disk, without losing redundancy - thus providing the redundancy and the large-file performance (both of which are already present in raidz) while also optimizing small-file random operations, which may not already be optimized in raidz.
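[Editor's note: a minimal sketch of the behavior Robert describes, in Python, with hypothetical per-disk service times and an illustrative function name. If one fs block is spread across n-1 devices and the whole block must be checksummed, the read completes only when the slowest of those devices responds, so per-read latency is the maximum, not the average.]

    import random

    def raidz_block_read_ms(per_disk_latency_ms):
        # A full-block read needs a chunk from every data disk before the
        # block checksum can be verified, so it finishes with the slowest one.
        return max(per_disk_latency_ms)

    random.seed(42)
    # Assumed service times for the 4 data disks of a 5-disk raidz1 (ms)
    latencies = [random.uniform(4.0, 12.0) for _ in range(4)]
    print("per-disk:", [f"{t:.1f}" for t in latencies])
    print(f"block read completes in {raidz_block_read_ms(latencies):.1f} ms")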
Re: [zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison
Edward Ned Harvey wrote:
>> From: Robert Milkowski [mailto:mi...@task.gda.pl]
>>
>> [In raidz] The issue is that each zfs filesystem block is basically spread across n-1 devices. So every time you want to read back a single fs block you need to wait for all n-1 devices to provide you with a part of it - and keep in mind that in zfs you can't get a partial block even if that's what you are asking for, as zfs has to check the checksum of the entire fs block.
>
> Can anyone else confirm or deny the correctness of this statement? If you read a small file from a raidz volume, do you have to wait for every single disk to return a small chunk of the blocksize? I know this is true for large files which require more than one block, obviously, but does even a small file get spread out across multiple disks?
>
> This may be the way it's currently implemented, but it's not a mathematical requirement. It is possible, if desired, to implement raid parity and still allow small files to be written entirely on a single disk, without losing redundancy - thus providing the redundancy and the large-file performance (both of which are already present in raidz) while also optimizing small-file random operations, which may not already be optimized in raidz.

As I understand it that's the whole point of raidz. Each block is its own stripe. If necessary, the block gets broken down into 512-byte chunks to spread it as wide as possible, and each block gets its own parity added. So if the array is too wide for the block to be spread to all disks, you also lose space, because the stripe is not full and parity still gets added to that small stripe. That means that if you only write 512-byte blocks, each write puts 3 blocks on disk, so the net capacity goes down to one third, regardless of how many disks you have in your raid group.
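[Editor's note: to put rough numbers on Arne's space point, here is an illustrative sketch. This is an assumption-laden model, not ZFS's actual allocator: it assumes 512-byte sectors and one parity sector per stripe row, and it ignores the skip/padding sectors real raidz can add, so actual overhead for tiny blocks may be higher still.]

    import math

    def raidz1_sectors(block_bytes, vdev_width, sector=512):
        # Data sectors, plus one parity sector per stripe row of up to
        # (vdev_width - 1) data sectors. Skip/padding sectors are ignored.
        data = math.ceil(block_bytes / sector)
        parity = math.ceil(data / (vdev_width - 1))
        return data, parity

    for bs in (512, 4096, 131072):
        d, p = raidz1_sectors(bs, vdev_width=5)
        print(f"{bs:>6}-byte block: {d} data + {p} parity sectors "
              f"({d / (d + p):.0%} space efficiency)")

Even in this simplified model, a 512-byte block burns one parity sector per data sector, so space efficiency collapses for small blocks no matter how wide the raid group is.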
Re: [zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison
> From: Arne Jansen [mailto:sensi...@gmx.net]
>
>> Can anyone else confirm or deny the correctness of this statement?
>
> As I understand it that's the whole point of raidz. Each block is its own stripe.

Nope, that doesn't count as confirmation. It is at least theoretically possible to implement raidz using techniques that would (a) unintelligently stripe all blocks (even small ones) across multiple disks, thus hurting performance on small operations, or (b) implement raidz such that striping of blocks behaves differently for small operations (plus parity). So the confirmation I'm looking for would be from somebody who knows the actual source code, and the actual architecture that was chosen to implement raidz in this case.
Re: [zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Robert Milkowski
>
> I had a quick look at your results a moment ago. The problem is that you used a server with 4GB of RAM + a raid card with 256MB of cache. Then your filesize for iozone was set to 4GB - so random or not you probably had a relatively good cache hit ratio for random reads.

Look again in the raw_results. I ran it with 4G, and also with 12G. There was no significant difference between the two, so I only compiled the 4G results into a spreadsheet PDF.

> And even then a random read from 8 threads gave you only about 40% more IOPS for a RAID-Z made out of 5 disks than for a single drive. The poor result for HW-R5 is surprising though, but it might be that the stripe size was not matched to the ZFS recordsize and iozone block size in this case.

I think what you're saying is: "With 5 disks performing well, you should expect 4x higher IOPS than a single disk, and the measured result was only 40% higher, which is a poor result." I agree. I guess the 128k recordsize used in iozone is probably large enough that it frequently causes blocks to span disks? I don't know.

> The issue with raid-z and random reads is that as the cache hit ratio goes down to 0, the IOPS approach the IOPS of a single drive. For a little bit more information see http://blogs.sun.com/roch/entry/when_to_and_not_to

I don't think that's correct, unless you're using a single thread. As long as multiple threads are issuing random reads on raidz, and those reads are small enough that each one is entirely written on a single disk, then you should be able to get n-1 disks operating simultaneously, to achieve (n-1)x the performance of a single disk. Even if blocks are large enough to span disks, you should be able to get (n-1)x the performance of a single disk for large sequential operations.
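[Editor's note: Edward's sequential claim is easy to model. A toy sketch with an assumed per-spindle throughput, ignoring controller and bus limits: a large sequential read streams from every data column at once, so bandwidth scales with the non-parity disks even though per-block latency does not improve.]

    def raidz_sequential_mb_s(n_disks, per_disk_mb_s, parity=1):
        # Large sequential I/O keeps all data columns streaming in
        # parallel, so throughput scales with the non-parity disks.
        return (n_disks - parity) * per_disk_mb_s

    # Assumed: 5-disk raidz1, ~100 MB/s per spindle
    print(f"{raidz_sequential_mb_s(5, 100)} MB/s aggregate, vs 100 MB/s for one disk")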
Re: [zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison
On 22/07/2010 03:25, Edward Ned Harvey wrote:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Robert Milkowski
>>
>> I had a quick look at your results a moment ago. The problem is that you used a server with 4GB of RAM + a raid card with 256MB of cache. Then your filesize for iozone was set to 4GB - so random or not you probably had a relatively good cache hit ratio for random reads.
>
> Look again in the raw_results. I ran it with 4G, and also with 12G. There was no significant difference between the two, so I only compiled the 4G results into a spreadsheet PDF.

The only tests with a 12GB file size in the raw files are a mirror and a single-disk configuration. There are no results for raid-z there.

>> And even then a random read from 8 threads gave you only about 40% more IOPS for a RAID-Z made out of 5 disks than for a single drive. The poor result for HW-R5 is surprising though, but it might be that the stripe size was not matched to the ZFS recordsize and iozone block size in this case.
>
> I think what you're saying is: "With 5 disks performing well, you should expect 4x higher IOPS than a single disk, and the measured result was only 40% higher, which is a poor result." I agree. I guess the 128k recordsize used in iozone is probably large enough that it frequently causes blocks to span disks? I don't know.

Probably - but it would also depend on how you configured hw-r5 (mainly its stripe size). The other thing is that you might have had some bottleneck somewhere else, as your results for N-way mirrors aren't that good either.

>> The issue with raid-z and random reads is that as the cache hit ratio goes down to 0, the IOPS approach the IOPS of a single drive. For a little bit more information see http://blogs.sun.com/roch/entry/when_to_and_not_to
>
> I don't think that's correct, unless you're using a single thread. As long as multiple threads are issuing random reads on raidz, and those reads are small enough that each one is entirely written on a single disk, then you should be able to get n-1 disks operating simultaneously, to achieve (n-1)x the performance of a single disk. Even if blocks are large enough to span disks, you should be able to get (n-1)x the performance of a single disk for large sequential operations.

While that is true to some degree for hw raid-5, raid-z doesn't work that way. The issue is that each zfs filesystem block is basically spread across n-1 devices, so every time you want to read back a single fs block you need to wait for all n-1 devices to provide you with a part of it - and keep in mind that in zfs you can't get a partial block even if that's what you are asking for, as zfs has to check the checksum of the entire fs block.

Now multiple readers actually make it worse for raid-z (assuming a very poor cache hit ratio), because each read from each reader involves all disk drives, so the others basically can't read anything until it is done. It gets really bad for random reads. With hw raid-5, if your stripe size matches the block you are reading back, then for random reads it is probable that while reader-X1 is reading from disk-Y1, reader-X2 is reading from disk-Y2, so you should end up with all disk drives (-1) contributing to better overall iops. Read Roch's blog entry carefully for more information.

btw: even in your results, a raid-z of 6 disks provided over 3x fewer IOPS than the zfs raid-10 configuration for random reads. That is a big difference if one needs performance.
-- Robert Milkowski http://milek.blogspot.com
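[Editor's note: the disagreement between Edward and Robert comes down to how many disk operations one logical read consumes. A toy throughput model, with every number assumed (five drives at ~150 random IOPS each, zero cache hits, enough concurrent readers to keep every spindle busy) and an illustrative helper function, shows why raidz random-read IOPS stay close to a single drive while a stripe-matched raid5 can scale.]

    def aggregate_iops(n_disks, per_disk_iops, disk_ops_per_read):
        # With every spindle saturated, logical IOPS = total disk-op
        # capacity divided by the disk ops each logical read consumes.
        return n_disks * per_disk_iops / disk_ops_per_read

    N, PER_DISK = 5, 150  # assumed drive count and per-drive random-read IOPS

    print(f"single disk:      {PER_DISK} IOPS")
    # raidz: each fs block spans the n-1 data columns, so one logical
    # read consumes n-1 disk ops -- only a modest gain over one drive.
    print(f"raidz1, {N} disks:  {aggregate_iops(N, PER_DISK, N - 1):.0f} IOPS")
    # raid5 with stripe size matched to the read: one disk op per read,
    # so concurrent readers land on different spindles and IOPS scale.
    print(f"raid5, {N} disks:   {aggregate_iops(N, PER_DISK, 1):.0f} IOPS")

In this toy model the raidz figure comes out only modestly above a single drive, in the same ballpark as the small gain measured above, while the stripe-matched raid5 readers spread across all spindles.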
Re: [zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of v
>
> for zfs raidz1, I know that for random io, the iops of a raidz1 vdev equal the iops of one physical disk. Since raidz1 is like raid5, does raid5 have the same performance as raidz1? i.e. random iops equal to one physical disk's iops.

I tested this extensively about 6 months ago. Please see http://www.nedharvey.com for more details. I disagree with the assumptions you've made above, and I'll say this instead:

Look at http://nedharvey.com/iozone_weezer/bobs%20method/iozone%20results%20summary.pdf

Go down to the 2nd section, "Compared to a single disk", and look at "single-disk", "raidz-5disks", and "raid5-5disks-hardware". You'll see that both raidz and raid5 are significantly faster than a single disk in all types of operations, and in all cases, raidz is approximately equal to, or significantly faster than, hardware raid5.

Furthermore, I later went on to test performance using nonvolatile devices (such as SSD) as a dedicated ZIL log device, and in those situations, the performance of ZFS with a dedicated log device easily beat hardware writeback caching.

So put simply: ZFS raid is faster than the fastest hardware raid, because ZFS has knowledge of both the filesystem and the blocks, while hardware raid only has knowledge of the blocks, so ZFS is able to be more intelligent in the techniques it uses for acceleration.
Re: [zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison
On 21/07/2010 15:40, Edward Ned Harvey wrote:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of v
>>
>> for zfs raidz1, I know that for random io, the iops of a raidz1 vdev equal the iops of one physical disk. Since raidz1 is like raid5, does raid5 have the same performance as raidz1? i.e. random iops equal to one physical disk's iops.
>
> I tested this extensively about 6 months ago. Please see http://www.nedharvey.com for more details. I disagree with the assumptions you've made above, and I'll say this instead:
>
> Look at http://nedharvey.com/iozone_weezer/bobs%20method/iozone%20results%20summary.pdf
>
> Go down to the 2nd section, "Compared to a single disk", and look at "single-disk", "raidz-5disks", and "raid5-5disks-hardware". You'll see that both raidz and raid5 are significantly faster than a single disk in all types of operations, and in all cases, raidz is approximately equal to, or significantly faster than, hardware raid5.

I had a quick look at your results a moment ago. The problem is that you used a server with 4GB of RAM + a raid card with 256MB of cache. Then your filesize for iozone was set to 4GB - so random or not you probably had a relatively good cache hit ratio for random reads. And even then a random read from 8 threads gave you only about 40% more IOPS for a RAID-Z made out of 5 disks than for a single drive. The poor result for HW-R5 is surprising though, but it might be that the stripe size was not matched to the ZFS recordsize and iozone block size in this case.

The issue with raid-z and random reads is that as the cache hit ratio goes down to 0, the IOPS approach the IOPS of a single drive. For a little bit more information see http://blogs.sun.com/roch/entry/when_to_and_not_to

-- Robert Milkowski http://milek.blogspot.com
Re: [zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison
There is a common misconception about the comparison between mirror and raidz. You get the same performance when you use the same number of disks, but the resulting filesystem has a different size, therefore the comparison is not applicable.

Example: you have 8 disks. Compare a zpool with one raidz vdev of 8 disks against a zpool containing 4 mirrors of 2 disks each. The read IOs spread over 8 disks in each case, therefore the number of IOs is comparable. But you have compared apples and oranges, because the net size is 7 disks in the first case and 4 disks in the second. A valid comparison would be a zpool with one raidz vdev containing 5 disks against the mirrored zpool containing 4 mirrors of 2 disks each, because then the size of the zpool is the same.

Regards, Ulrich

Roy Sigurd Karlsbakk wrote:
> - Original Message -
>> On Jul 20, 2010, at 6:12 AM, v victor_zh...@hotmail.com wrote:
>>> Hi, for zfs raidz1, I know that for random io, the iops of a raidz1 vdev equal the iops of one physical disk. Since raidz1 is like raid5, does raid5 have the same performance as raidz1? i.e. random iops equal to one physical disk's iops.
>>
>> On reads, no: any part of the stripe width can be read without reading the whole stripe width, giving performance equal to a raid0 of the non-parity disks.
>
> Are you sure this is true? I know it is, in theory, but some testing with bonnie++ showed me that I didn't get so large a gain. Perhaps my tests were done wrong?
>
> Vennlige hilsener / Best regards
> roy
> --
> Roy Sigurd Karlsbakk (+47) 97542685 r...@karlsbakk.net http://blogg.karlsbakk.net/

-- Ulrich Graef / Senior SE / Hardware Presales / Phone: + 49 6103 752 359 ORACLE Deutschland B.V. Co. KG / Amperestr. 6 / 63225 Langen http://www.oracle.com
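[Editor's note: a minimal sketch of Ulrich's accounting, assuming reads can be served by either side of a two-way mirror and ignoring everything except disk counts; the helper function and labels are illustrative.]

    def layout(kind, n):
        # Returns (usable capacity in disks, spindles reads can hit).
        if kind == "raidz1":      # one vdev of n disks, one disk of parity
            return n - 1, n
        if kind == "mirror":      # n two-way mirror vdevs (2n disks total)
            return n, 2 * n
        raise ValueError(kind)

    for kind, n in [("raidz1", 8), ("mirror", 4), ("raidz1", 5)]:
        cap, spindles = layout(kind, n)
        print(f"{kind} x{n}: capacity of {cap} disks, reads spread over {spindles}")

Both 8-disk layouts spread reads over 8 spindles, but only the 5-disk raidz1 matches the 4x2 mirror's 4 disks of capacity, which is Ulrich's point about what a fair comparison looks like.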
Re: [zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison
- Original Message -
> Hi, for zfs raidz1, I know that for random io, the iops of a raidz1 vdev equal the iops of one physical disk. Since raidz1 is like raid5, does raid5 have the same performance as raidz1? i.e. random iops equal to one physical disk's iops.

Mostly, yes. Traditional RAID-5 is likely to be faster than ZFS because of ZFS doing checksumming, having the ZIL etc., but then, trad raid5 won't have the safety offered by ZFS.

Vennlige hilsener / Best regards
roy
--
Roy Sigurd Karlsbakk (+47) 97542685 r...@karlsbakk.net http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
Re: [zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison
On Jul 20, 2010, at 6:12 AM, v victor_zh...@hotmail.com wrote:
> Hi, for zfs raidz1, I know that for random io, the iops of a raidz1 vdev equal the iops of one physical disk. Since raidz1 is like raid5, does raid5 have the same performance as raidz1? i.e. random iops equal to one physical disk's iops.

On reads, no: any part of the stripe width can be read without reading the whole stripe width, giving performance equal to a raid0 of the non-parity disks. On writes it could be worse than raidz1, depending on whether whole stripe widths are being written (same performance) or partial stripe widths are being written (worse performance). If it's a partial stripe width, then the remaining data needs to be read off disk, which doubles the IOs.

-Ross
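[Editor's note: Ross's write-path point as a sketch. This is toy accounting for a hypothetical 4+1 raid5 set; real controllers batch and cache writes, so treat the counts as illustrative only.]

    def raid5_write_ios(chunks_written, width=5):
        data_cols = width - 1
        if chunks_written == data_cols:
            # Full-stripe write: parity is computed from the new data
            # alone, so just write all data chunks plus the parity chunk.
            return width
        # Partial-stripe update (read-modify-write): read the old data
        # and old parity first, then write new data and new parity --
        # doubling the I/Os versus the two writes alone.
        return 2 * (chunks_written + 1)

    print("full-stripe write of 4 chunks:", raid5_write_ios(4), "disk I/Os")
    print("partial write of 1 chunk:     ", raid5_write_ios(1), "disk I/Os")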
Re: [zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison
- Original Message -
> On Jul 20, 2010, at 6:12 AM, v victor_zh...@hotmail.com wrote:
>> Hi, for zfs raidz1, I know that for random io, the iops of a raidz1 vdev equal the iops of one physical disk. Since raidz1 is like raid5, does raid5 have the same performance as raidz1? i.e. random iops equal to one physical disk's iops.
>
> On reads, no: any part of the stripe width can be read without reading the whole stripe width, giving performance equal to a raid0 of the non-parity disks.

Are you sure this is true? I know it is, in theory, but some testing with bonnie++ showed me that I didn't get so large a gain. Perhaps my tests were done wrong?

Vennlige hilsener / Best regards
roy
--
Roy Sigurd Karlsbakk (+47) 97542685 r...@karlsbakk.net http://blogg.karlsbakk.net/
Re: [zfs-discuss] zfs raidz1 and traditional raid 5 performance comparison
On Jul 20, 2010, at 3:46 AM, Roy Sigurd Karlsbakk wrote:
> - Original Message -
>> Hi, for zfs raidz1, I know that for random io, the iops of a raidz1 vdev equal the iops of one physical disk. Since raidz1 is like raid5, does raid5 have the same performance as raidz1? i.e. random iops equal to one physical disk's iops.
>
> Mostly, yes. Traditional RAID-5 is likely to be faster than ZFS because of ZFS doing checksumming, having the ZIL etc., but then, trad raid5 won't have the safety offered by ZFS.

Disagree. The ZIL has nothing to do with RAIDness. Traditional RAID-5 suffers from a read-modify-write sequence if the I/O is not perfectly matched to the stripe width -- a 3x latency hit. In raidz, the writes are always full stripe, so there is only a 1x latency hit.

OTOH, for reads, some RAID-5 implementations will read only a single portion of a stripe if the I/O is small enough to fit. In this case, the small, random read performance can approach RAID-0. raidz will always read the full block, even though the full block might not be spread across all of the disks; ZFS does this to verify the checksum of the data.

This is the classic tradeoff -- space, performance, dependability: pick two.
-- richard

--
Richard Elling rich...@nexenta.com +1-760-896-4422
Enterprise class storage for everyone www.nexenta.com
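[Editor's note: Richard's last point -- that raidz fetches the whole block purely so the checksum can be verified -- sketched in miniature. Hypothetical code: ZFS keeps checksums in block pointers and supports several algorithms (fletcher, sha256); this toy layout just illustrates the consequence for small reads.]

    import hashlib

    def read_small_range(block_on_disk, stored_checksum, offset, length):
        # Even for a 512-byte request, the entire block must be fetched
        # and verified before any part of it can safely be returned.
        if hashlib.sha256(block_on_disk).digest() != stored_checksum:
            raise IOError("checksum mismatch: block is corrupt")
        return block_on_disk[offset:offset + length]

    block = bytes(128 * 1024)                  # one 128K filesystem block
    csum = hashlib.sha256(block).digest()
    data = read_small_range(block, csum, offset=4096, length=512)
    print(f"returned {len(data)} bytes, but read and verified {len(block)}")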