Bob Friesenhahn wrote:
> Your idea to stripe two disks per LUN should work. Make sure to use
> raidz2 rather than plain raidz for the extra reliability. This
> solution is optimized for high data throughput from one user.
Striping two disks per LUN (RAID0 on 2 disks) and then adding a ZFS form of redundancy (either mirror or raidz[2]) would be an efficient use of space. There would be no additional space overhead caused by running that way. Note, however, that if you do this, ZFS must resilver the larger LUN in the event of a single disk failure on the backend. This means a longer time to rebuild, and a lot of "extra" work on the other (non-failed) half of the RAID0 stripe.

> An alternative is to create individual "RAID 0" LUNs which actually
> only contain a single disk.

This is certainly preferable, since the unit of failure at the hardware level corresponds to the unit of resilvering at the ZFS level. And at least on my Nexsan SATAboy(2f) this configuration is possible.

> Then implement the pool as two raidz2s
> with six LUNs each, and two hot spares. That would be my own
> preference. Due to ZFS's load share this should provide better
> performance (perhaps 2X) for multi-user loads. Some testing may be
> required to make sure that your hardware is happy with this.

I disagree with this suggestion. With this config, you only get 8 disks' worth of storage out of the 14, which is a ~43% space overhead. In order to lose data in this scenario, 3 disks would have to fail out of a single 6-disk group before ZFS is able to resilver any of them to the hot spares. That seems (to me) a lot more redundancy than is needed.

As far as workload goes, any time you use RAIDZ[2], ZFS must read the entire stripe (across all of the disks) in order to verify the checksum for that data block. This means that a 128k read (the default ZFS blocksize) requires a 32k read from each of the 6 disks, which may include a relatively slow seek to the relevant part of the spinning rust. So for random I/O, even though the data is striped across all the disks, you will see only a single disk's worth of throughput. For sequential I/O, you'll see the full RAID set's worth of throughput. If you are expecting a non-sequential workload, you would be better off taking the 50% storage overhead and doing ZFS mirroring.

> Avoid RAID5 if you can because it is not as reliable with today's
> large disks and the resulting huge LUN size can take a long time to
> resilver if the RAID5 should fail (or be considered to have failed).

Here's a place where ZFS shines: it doesn't resilver the whole disk, just the allocated data blocks. Since it doesn't have to read the full array to rebuild a failed disk, it's less likely to trigger a second failure during the rebuild.

My $.02.

--Joe
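P.S. In case it helps to see the two layouts being debated side by side, here's roughly what they'd look like as zpool commands. This is only a sketch: the pool name and the cNtNdN device names are made up, so substitute whatever your 14 LUNs actually show up as on your controller.

    # Bob's suggestion: two 6-LUN raidz2 groups plus two hot spares
    # (4 data disks per group, so 8 of the 14 disks end up as usable space)
    zpool create tank \
        raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 \
        raidz2 c2t6d0 c2t7d0 c2t8d0 c2t9d0 c2t10d0 c2t11d0 \
        spare  c2t12d0 c2t13d0

    # The mirrored alternative I'd lean toward for random I/O:
    # seven 2-way mirrors (7 of 14 disks usable, i.e. the 50% overhead
    # mentioned above); swap one pair for "spare" if you want hot spares
    zpool create tank \
        mirror c2t0d0  c2t1d0  mirror c2t2d0  c2t3d0 \
        mirror c2t4d0  c2t5d0  mirror c2t6d0  c2t7d0 \
        mirror c2t8d0  c2t9d0  mirror c2t10d0 c2t11d0 \
        mirror c2t12d0 c2t13d0

A quick "zpool status" afterwards shows how the vdevs got grouped, which makes it easy to sanity-check the layout before loading any data.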