On Jan 17, 2013, at 8:35 AM, Jim Klimov <jimkli...@cos.ru> wrote:
> On 2013-01-17 16:04, Bob Friesenhahn wrote:
>> If almost all of the I/Os are 4K, maybe your ZVOLs should use a
>> volblocksize of 4K? This seems like the most obvious improvement.
>> Matching the volume block size to what the clients are actually using
>> (due to their filesystem configuration) should improve performance
>> during normal operations and should reduce the number of blocks which
>> need to be sent in the backup by reducing write amplification due to
>> "overlap" blocks.
> Also, it would make sense while you are at it to verify that the
> clients (i.e. VMs' filesystems) do their IOs 4KB-aligned, i.e. that
> their partitions start at a 512-byte sector offset divisible by
> 8 inside the virtual HDDs, and the FS headers also align to that
> so the first cluster is 4KB-aligned.
This is the classical expectation. So I added an alignment check into
nfssvrtop and iscsisvrtop. I've looked at a *ton* of NFS workloads from
ESX and, believe it or not, alignment doesn't matter at all, at least for
the data I've collected. I'll let NetApp wallow in the mire of misalignment
while I blissfully dream of other things :-)
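
For anyone who still wants to check the "divisible by 8" condition quoted
above, here is a minimal sketch in Python. It assumes 512-byte logical
sectors and a 4KB block, as in the quoted text; the function names are
made up for illustration and are not part of nfssvrtop or iscsisvrtop.

```python
SECTOR_BYTES = 512   # legacy logical sector size assumed in the quoted text
BLOCK_BYTES = 4096   # the 4KB cluster / block size under discussion


def misalignment_bytes(start_sector: int) -> int:
    """Byte distance from the previous 4KB boundary to the partition start."""
    return (start_sector * SECTOR_BYTES) % BLOCK_BYTES


def is_4k_aligned(start_sector: int) -> bool:
    """True if a partition starting at this sector begins on a 4KB boundary."""
    return misalignment_bytes(start_sector) == 0


if __name__ == "__main__":
    # 63 is the classic MBR first-partition start; 2048 is a common 1MiB start.
    for lba in (63, 2048):
        print(lba, is_4k_aligned(lba), misalignment_bytes(lba))
```

A start sector of 63 lands 3584 bytes past a 4KB boundary, while 2048
lands exactly on one. Whether that difference shows up in a real workload
is a separate question, as noted above.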
> Classic MSDOS MBR did not guarantee such a partition start, since it
> used 63 sectors as the track size and offset factor. Newer OSes don't
> use the classic layout, as any config is allowable; and GPT is well
> aligned too.
> Overall, a single IO in the VM guest changing a 4KB cluster in its
> FS should translate to one 4KB IO in your backend storage changing
> the dataset's userdata (without reading a bigger block and modifying
> it with COW), plus some avalanche of metadata updates (likely with
> the COW) for ZFS's own bookkeeping.
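
The expectation in the quoted paragraph can be sketched as a small
Python model. This is a simplification under stated assumptions: it only
counts whole volume blocks touched by one guest write and flags whether a
partial block would need a read-modify-write. It ignores metadata,
compression, caching, and write aggregation, and the function names are
hypothetical.

```python
def blocks_touched(offset: int, length: int, volblocksize: int) -> int:
    """Number of volume blocks overlapped by a write of `length` bytes."""
    if length <= 0 or volblocksize <= 0 or offset < 0:
        raise ValueError("offset must be >= 0; length and volblocksize > 0")
    first = offset // volblocksize
    last = (offset + length - 1) // volblocksize
    return last - first + 1


def rewrite_cost(offset: int, length: int, volblocksize: int):
    """Return (bytes_rewritten, needs_read) for one guest write.

    needs_read is True when the write covers some block only partially,
    so the old block contents are needed before the new block is written.
    """
    n = blocks_touched(offset, length, volblocksize)
    partial = offset % volblocksize != 0 or (offset + length) % volblocksize != 0
    return n * volblocksize, partial


if __name__ == "__main__":
    print(rewrite_cost(0, 4096, 4096))     # aligned 4K write, 4K blocks
    print(rewrite_cost(0, 4096, 8192))     # aligned 4K write, 8K blocks
    print(rewrite_cost(3584, 4096, 4096))  # misaligned 4K write, 4K blocks
```

In this model only the first case is the 1:1 translation described above;
the other two rewrite 8192 bytes and need the old data first.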
I've never seen a 1:1 correlation from the VM guest to the workload
on the wire. To wit, I did a bunch of VDI and VDI-like (small, random
writes) testing on XenServer and while the clients were chugging
away doing 4K random I/Os, on the wire I was seeing 1MB NFS
writes. In part, this observation led to my cars-and-trains analysis.
In some VMware configurations, over the wire you could see a 16k
read for every 4k random write. Go figure. Fortunately, those 16k
reads find their way into the MFU side of the ARC :-)
Bottom line: use tools like iscsisvrtop and dtrace to get an idea of
what is really happening over the wire.
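
Once you have per-I/O sizes from whatever tool you use, a power-of-two
histogram (similar in spirit to DTrace's quantize() aggregation) makes the
size distribution easy to read. A minimal Python sketch follows; the sample
sizes in it are made-up placeholders, not measurements, and the function
names are hypothetical.

```python
from collections import Counter


def pow2_bucket(size: int) -> int:
    """Largest power of two that is <= size (size must be positive)."""
    if size <= 0:
        raise ValueError("size must be positive")
    return 1 << (size.bit_length() - 1)


def size_histogram(sizes):
    """Map each power-of-two bucket to the number of I/Os that fall in it."""
    counts = Counter(pow2_bucket(s) for s in sizes)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Placeholder sizes in bytes, for illustration only.
    sample = [4096, 4096, 5000, 16384, 1048576]
    for bucket, count in size_histogram(sample).items():
        print(f"{bucket:>8} {count}")
```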
zfs-discuss mailing list