On Sun, 17 Oct 2010, Edward Ned Harvey wrote:


The default blocksize is 128K.  If you are using mirrors, then each block on disk will be 128K whenever possible.  But if you're using raidzN with a capacity of M disks (M disks useful capacity + N disks redundancy) then the block size on each individual disk will be 128K / M.  Right?  This is one of the reasons the raidzN resilver code is inefficient: you end up waiting for the slowest seek time of any one disk in the vdev, and when that completes, the amount of data you were able to process is at most 128K.  Rinse and repeat.
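For concreteness, the arithmetic above can be sketched as follows (the M=6 case, e.g. an 8-disk raidz2, is a hypothetical example, not taken from the post):

```python
RECORDSIZE = 128 * 1024          # default zfs recordsize, in bytes

def per_disk_segment(recordsize, m_data_disks):
    """Bytes each data disk holds for one zfs block (parity disks excluded)."""
    return recordsize // m_data_disks

# Mirror: every disk holds the whole 128K block.
print(per_disk_segment(RECORDSIZE, 1))   # 131072

# raidz2 across 8 disks (M=6 data + 2 parity): each data disk
# holds only ~21K of the 128K block, so one worst-case seek
# per disk yields very little data.
print(per_disk_segment(RECORDSIZE, 6))   # 21845
```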

Your idea of what it means for "code" to be inefficient is clearly very different from my own. Regardless, the physical layout issues (impacting IOPS requirements) are a reality.

Would it not be wise, when creating raidzN vdevs, to increase the blocksize to 128K * M?  Then the on-disk blocksize for each disk could be the same as the mirror on-disk blocksize of 128K.  It still won't resilver as fast as a mirror, but the raidzN resilver would be accelerated by as much as M times.  Right?
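The claimed speedup can be sketched like this (a back-of-the-envelope model under the post's own assumption that each seek services one whole zfs block; M=6 is again a hypothetical example):

```python
def payload_per_seek(recordsize):
    """Useful bytes the vdev processes per worst-case stripe seek,
    assuming one seek per zfs block across all member disks."""
    return recordsize

M = 6  # hypothetical number of data disks

default_rate  = payload_per_seek(128 * 1024)       # 128K per seek
proposed_rate = payload_per_seek(128 * 1024 * M)   # 128K on *each* data disk

# Under this simplified model, resilver throughput improves by a factor of M.
print(proposed_rate // default_rate)   # 6
```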

This might work for HPC applications with huge files and huge sequential streaming data rate requirements. It would be detrimental for the case of small files, or applications which issue many small writes, and particularly bad for many random synchronous writes.

The only disadvantage that I know of would be wasted space.  Every 4K file in a mirror can waste up to 124K of disk space, right?  And in the above described scenario, every 4K file in the raidzN can waste up to 128K * M of disk space, right?  Also, if you have a lot of these sparse 4K blocks, then the resilver time doesn't actually improve either, because you perform one seek and, regardless of whether you fetch 128K or 128K*M, you still paid one maximum seek time to fetch 4K of useful data.
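Taking the post's worst-case assumption at face value (a tiny file still occupying one full block), the waste works out as below; M=6 is a hypothetical example, and in practice zfs uses smaller blocks for small files, so this is an upper bound:

```python
def worst_case_waste(blocksize, filesize=4096):
    """Bytes lost if a 4K file nevertheless occupies one full block."""
    return blocksize - filesize

# Mirror with the default 128K block: ~124K wasted per 4K file.
print(worst_case_waste(128 * 1024))        # 126976

# Proposed recordsize of 128K * M with M=6: ~764K wasted per 4K file.
print(worst_case_waste(128 * 1024 * 6))    # 782336
```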

The tally of disadvantages is quite long. Note that zfs must write each zfs "block" as a unit, so you would dramatically increase the level of write amplification. Zfs also checksums each whole block, and that checksum adds to the latency. The risk of block corruption increases as well. 128K is already quite large for a block.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
