I’m wondering if any of the ZIL gurus could examine the following and point out anywhere my logic is going wrong.
For small backend systems (e.g. 24x 10k SAS RAID 10) I'm expecting an absolute maximum backend write throughput of 10000 sequential IOPS** and, more realistically, 2000-5000. With small (4kB) block sizes*, 10000 IOPS is only ~40MB/s, or ~400MB over a 10s txg window, so we don't need much ZIL space or bandwidth. What we do need is the ability to absorb the IOPS at low latency, and to keep absorbing them at least as fast as the backend storage can commit them.

ZIL OPTIONS: Obviously a DDRDrive is the ideal (36000 4kB random write IOPS***), but for the same budget I can get 2x Vertex 2 EX 50GB drives and put each behind its own P410 512MB BBWC controller. Assuming the SSDs can do 6300 4kB random write IOPS*** and that the controller cache acknowledges those writes with the same latency as the DDRDrive (both are PCIe-attached RAM?****), then we should get DDRDrive-type latency up to 6300 sustained IOPS. Also, under bursty traffic, we should be able to absorb up to 512MB of data (~3.5s of 36000 4kB IOPS) at much higher IOPS / lower latency, as long as the average stays at or below 6300 (i.e. the SSD can empty the cache before it fills).

So what are the issues with using this approach for low-budget builds that want mirrored ZILs and don't need >6300 sustained write IOPS (due to backend disk limitations)? Obviously there are a lot of assumptions here, but I wanted to get my theory straight before I start ordering things to test.

Thanks all.

James

* For NTFS 4kB clusters on VMware / NFS, I believe a 4kB zfs recordsize will give the best performance (avoids partial writes). Thoughts welcome on that too.

** Assumes each 10k SAS disk can do a maximum of 900 sequential writes/s, striped across 12 mirrors and rounded down (900 is based on Tom's Hardware HDD streaming write benchmarks). Also assumes ZFS can take completely random writes and turn them into completely sequential write IOPS on the underlying disks, and that no reads, >32kB writes etc. are hitting the disks at the same time. Realistically, 2000-5000 is probably the likely maximum.

*** Figures from the excellent DDRDrive presentation.
NB: if the BBWC can sequentialise writes to the SSD, it may get closer to 10000 IOPS.

**** I'm assuming that the P410 BBWC and the DDRDrive have similar IOPS/latency profiles - the DDRDrive may do something fancy like striping across RAM to improve IO?

Similar posts:
http://opensolaris.org/jive/thread.jspa?messageID=460871 - except with normal disks instead of SSDs behind the cache (so the cache would fill).
http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg39729.html - same again.

--
This message posted from opensolaris.org
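P.S. To sanity-check the arithmetic above, here's a quick back-of-envelope sketch. All the figures are the assumptions from this post (vendor/benchmark numbers, not my measurements), so treat the outputs accordingly:

```python
# Back-of-envelope ZIL sizing, using the figures assumed in the post:
#   - backend: 24x 10k SAS as 12 mirrored vdevs, ~900 seq write IOPS each
#   - workload: 4 KiB synchronous writes
#   - SSD: ~6300 sustained 4 KiB random write IOPS (Vertex 2 EX figure)
#   - burst source: ~36000 4 KiB random write IOPS (DDRDrive figure)
#   - controller cache: 512 MiB (P410 BBWC)

BLOCK = 4096                       # 4 KiB write size, in bytes
MB = 1000 * 1000

backend_iops = 12 * 900            # ~10800; round down to ~10000
backend_bw = 10000 * BLOCK / MB    # ~41 MB/s of 4 KiB sync writes
txg_window = 10                    # seconds of in-flight data the ZIL covers
zil_size_mb = backend_bw * txg_window  # ~410 MB -> capacity isn't the constraint

# Burst absorption: how long can a 512 MiB BBWC soak up 36000 IOPS?
cache = 512 * 2**20                # controller cache, bytes
burst_in = 36000 * BLOCK           # bytes/s arriving during a burst
ssd_drain = 6300 * BLOCK           # bytes/s the SSD destages behind the cache

t_fill_gross = cache / burst_in                # ~3.6s if the SSD drained nothing
t_fill_net = cache / (burst_in - ssd_drain)    # ~4.4s with the SSD draining

print(f"backend ~{backend_bw:.0f} MB/s, ZIL needs ~{zil_size_mb:.0f} MB")
print(f"cache fills in ~{t_fill_gross:.1f}s gross, ~{t_fill_net:.1f}s net")
```

So even crediting the SSD's drain rate, a sustained 36000-IOPS burst only buys a second or so beyond the ~3.5s quoted above before the cache is full and latency falls back to raw SSD speed.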