I’m wondering if any of the ZIL gurus could examine the following and point out
anywhere my logic is going wrong.
For small backend systems (e.g. 24x 10k SAS in RAID 10) I’m expecting an absolute
maximum backend write throughput of 10000 sequential IOPS** and more realistically
2000-5000. With small (4kB) blocksizes*, 10000 IOPS is only ~400MB over 10s, so we
don’t need much ZIL space or throughput. What we do need is the ability to absorb
the IOPS at low latency and keep absorbing them at least as fast as the backend
storage can commit them.
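The sizing arithmetic above can be sketched as follows (figures taken from the text; the 10s retention window is the assumption stated above, not a ZFS constant):

```python
# Back-of-envelope ZIL sizing for the worst case described above.
iops = 10_000      # absolute-maximum backend write IOPS (from the text)
block = 4 * 1024   # 4kB block size, in bytes
window_s = 10      # seconds of log to retain (figure assumed in the text)

bytes_needed = iops * block * window_s
print(bytes_needed / 10**6)  # → 409.6 (MB), i.e. roughly 400MB of log
```

Even at the unrealistic 10k IOPS ceiling, a few GB of log device is far more space than the workload can ever use; latency, not capacity, is the constraint.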
ZIL OPTIONS: Obviously a DDRDrive is the ideal (36k 4k random IOPS***), but
for the same budget I can get 2x Vertex 2 EX 50GB drives and put each behind
its own P410 512MB BBWC controller. Assuming the SSDs can do 6300 4k random
IOPS*** and that the controller cache acknowledges those writes with the same
latency as the DDRDrive (both PCIe-attached RAM?****), then we should have
DDRDrive-type latency up to 6300 sustained IOPS. Also, under bursty traffic,
we should be able to absorb up to 512MB of data (~3.5s of 36000 4k IOPS) at
much higher IOPS / lower latency, as long as the average stays at or below
6300 (i.e. the SSD can empty the cache before it fills).
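A quick sketch of that burst arithmetic, assuming the SSD drains the cache at its sustained 6300 IOPS while the burst arrives (and treating the 512MB cache as decimal megabytes):

```python
cache = 512 * 10**6    # P410 BBWC capacity in bytes (decimal MB assumed)
block = 4 * 1024       # 4kB writes
burst_iops = 36_000    # DDRDrive-class inflow (figure from the presentation)
drain_iops = 6_300     # sustained SSD rate behind the cache

fill_rate = burst_iops * block                # bytes/s arriving at the cache
net_rate = (burst_iops - drain_iops) * block  # bytes/s of net cache growth

print(cache / fill_rate)  # → ~3.47s to fill if the SSD absorbed nothing
print(cache / net_rate)   # → ~4.21s with the SSD draining concurrently
```

So the cache buys roughly 3.5-4s of full-rate burst before write latency falls back to what the SSD alone can sustain.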
So what are the issues with using this approach for low-budget builds that want
mirrored ZILs and don’t require >6300 sustained write IOPS (due to backend disk
limitations)? Obviously there are a lot of assumptions here, but I wanted to
get my theory straight before I start ordering things to test.
* For NTFS 4kB clusters on VMware / NFS, I believe a 4kB zfs recordsize will
provide the best performance (avoiding partial writes). Thoughts welcome on that too.
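The partial-write concern can be made concrete. With a recordsize larger than the guest’s 4kB cluster, each sub-record write forces ZFS to read, modify, and rewrite the whole record. A simplified sketch (assumes writes exactly match NTFS cluster size and ignores caching and compression):

```python
cluster = 4 * 1024  # NTFS cluster size in bytes

def write_amplification(recordsize, write_size=cluster):
    # A write smaller than the record means a read-modify-write of the
    # full record: recordsize bytes read, recordsize bytes written back.
    if write_size >= recordsize:
        return 1.0
    return recordsize / write_size

print(write_amplification(128 * 1024))  # → 32.0 at the default 128k recordsize
print(write_amplification(4 * 1024))    # → 1.0 with a matched 4k recordsize
```

This is why matching recordsize to the guest cluster size is attractive for this workload, at the cost of worse sequential throughput and metadata overhead for large files.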
** Assumes each 10k SAS disk can do a maximum of 900 sequential writes, striped
across 12 mirrors and rounded down (900 based on Tom’s Hardware HDD streaming
write benchmarks). Also assumes ZFS can take completely random writes and turn
them into completely sequential write IOPS on the underlying disks, and that no
reads, >32k writes, etc. are hitting the disks at the same time. Realistically,
2000-5000 is probably a more likely maximum.
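The backend ceiling above works out as (per-disk figure from the Tom’s Hardware benchmark cited; the parallelism model is the assumption stated above):

```python
per_disk_seq_writes = 900  # 10k SAS streaming-write IOPS (benchmark figure)
mirrors = 12               # 24 disks arranged as 12 RAID-10 mirror pairs

# Writes are striped across all mirror pairs in parallel; each pair
# commits at the speed of one disk (both members write the same data).
print(per_disk_seq_writes * mirrors)  # → 10800, rounded down to ~10000 above
```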
*** Figures from the excellent DDRDrive presentation. NB: if the BBWC can
sequentialise writes to the SSD, it may get closer to 10000 IOPS.
**** I’m assuming the P410 BBWC and the DDRDrive have a similar IOPS/latency
profile – the DDRDrive may do something fancy with striping across RAM to
improve IO?
http://opensolaris.org/jive/thread.jspa?messageID=460871 - except with normal
disks instead of an SSD behind the cache (so the cache would fill).
http://firstname.lastname@example.org/msg39729.html - same
This message posted from opensolaris.org
zfs-discuss mailing list