I found something similar happening when writing over NFS, at significantly
lower throughput than the system can manage locally. Effectively all data,
even asynchronous writes, was being written to the ZIL. I eventually traced
this (with help from Richard Elling and others on this list) at least
partially to the Linux NFS client issuing commit requests before ZFS wanted
to write the asynchronous data out in a txg. I tried fiddling with
zfs_write_limit_override to get more data onto the normal vdevs faster, but
this reduced performance (perhaps a tunable that stops ZFS from throttling
writes when it hits the write limit could fix that), and it didn't go
significantly easier on the ZIL devices. I decided to live with the default
behavior, since my main bottleneck is Ethernet anyway, and the projected
lifespan of the ZIL devices was fairly long given our workload.
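
For anyone wanting to do the same lifespan projection, here is a rough sketch of the arithmetic. It assumes the vendor quotes endurance as total terabytes written and that the write rate to the log device is roughly steady; the function name and the example figures are made up for illustration and are not measurements from my system.

```python
def log_device_lifespan_years(endurance_tb_written, avg_write_mb_per_s):
    """Estimate years until a log device's rated write endurance is used up.

    endurance_tb_written: rated endurance in decimal terabytes (assumed
        to come from the vendor's data sheet).
    avg_write_mb_per_s: long-term average write rate to the device in
        decimal megabytes per second (for example, averaged zilstat output).
    """
    if avg_write_mb_per_s <= 0:
        raise ValueError("average write rate must be positive")
    seconds_per_year = 365 * 24 * 3600
    # Convert MB written per year into TB written per year.
    tb_written_per_year = avg_write_mb_per_s * seconds_per_year / 1_000_000
    return endurance_tb_written / tb_written_per_year


if __name__ == "__main__":
    # Hypothetical device: 1000 TB rated endurance, 10 MB/s average ZIL traffic.
    years = log_device_lifespan_years(1000, 10)
    print(f"projected lifespan: {years:.1f} years")
```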

I did find that setting logbias=throughput on a zfs filesystem caused it to
act as though the ZIL devices weren't there, which actually reduced commit
times under continuous streaming writes. That is mostly due to having more
throughput available for the same amount of data to commit, in large chunks.
However, the zilstat script also reported less writing to the ZIL blocks
under this condition (ZIL blocks are allocated from the normal vdevs when
there is no ZIL device, or with logbias=throughput), so perhaps there is more
to the story. If you have different workloads for different datasets, this
could help, since it isn't a poolwide setting. Obviously, small synchronous
writes to that zfs filesystem will take a large hit from this.
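
Since logbias is set per dataset, a small script can apply a different value to each filesystem. The sketch below only builds and (by default) prints the "zfs set" commands; the dataset names are hypothetical, and you would want to check the commands before running them for real.

```python
import subprocess

# The two values the logbias property accepts.
VALID_LOGBIAS = ("latency", "throughput")


def logbias_command(dataset, value):
    """Build the argv for `zfs set logbias=<value> <dataset>`."""
    if value not in VALID_LOGBIAS:
        raise ValueError(f"logbias must be one of {VALID_LOGBIAS}, got {value!r}")
    return ["zfs", "set", f"logbias={value}", dataset]


def apply_logbias(plan, dry_run=True):
    """Apply a {dataset: logbias} plan; with dry_run=True only print the commands."""
    commands = [logbias_command(dataset, value) for dataset, value in plan.items()]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical datasets: streaming writes bypass the log devices,
    # while the dataset with small synchronous writes keeps the default.
    apply_logbias({
        "tank/streaming": "throughput",
        "tank/database": "latency",
    })
```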

It would be nice if there were a feature in ZFS that could direct small
commits to ZIL blocks on log devices, but behave like logbias=throughput
for large commits. It would probably need manual tuning, but it would
treat SSD log devices more gently, and increase performance for large
commits.
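
To make the idea concrete, here is a toy model of the policy I have in mind. It is purely hypothetical and is not how ZFS works today: the threshold, the function names and the target labels are all invented, and the threshold is the part that would need manual tuning.

```python
def choose_zil_target(commit_bytes, large_commit_threshold=64 * 1024):
    """Hypothetical policy: small commits go to the log device, large ones
    are handled as with logbias=throughput (written to the main pool)."""
    if commit_bytes < 0:
        raise ValueError("commit size cannot be negative")
    if commit_bytes < large_commit_threshold:
        return "log_device"
    return "main_pool"


def split_commit_bytes(commit_sizes, large_commit_threshold=64 * 1024):
    """Total the bytes each target would receive for a list of commit sizes."""
    totals = {"log_device": 0, "main_pool": 0}
    for size in commit_sizes:
        totals[choose_zil_target(size, large_commit_threshold)] += size
    return totals


if __name__ == "__main__":
    # Made-up workload: a few small synchronous commits and one large one.
    workload = [4096, 8192, 1024 * 1024]
    print(split_commit_bytes(workload))
```

With a workload like this, only the small commits would ever touch the SSD, which is where the endurance saving would come from.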

If you can't configure ZFS to write less data to the ZIL, I think a
RAM-based ZIL device would be a good way to get throughput up higher (with
fewer worries about flash endurance, etc).

On Wed, Oct 3, 2012 at 1:28 PM, Schweiss, Chip <c...@innovates.com> wrote:
> I'm in the planning stages of a rather large ZFS system to house
> approximately 1 PB of data.
>
> I have only one system with SSDs for L2ARC and ZIL. The ZIL seems to be
> the bottleneck for large bursts of data being written. I can't confirm
> this for sure, but when I throw enough data at my storage pool and the
> write latency starts rising, the ZIL write speed hangs close to the max
> sustained throughput I've measured on the SSD (~200 MB/s).
>
> The pool, when empty and without L2ARC or ZIL, was tested with Bonnie++ and
> showed ~1300 MB/s serial read and ~800 MB/s serial write speed.
>
> How can I determine for sure that my ZIL is my bottleneck? If it is the
> bottleneck, is it possible to keep adding mirrored pairs of SSDs to the ZIL
> to make it faster? Or should I be looking for a DDR drive, ZeusRAM, etc.?
>
> Thanks for any input,
> zfs-discuss mailing list