On 10/04/12 05:30, Schweiss, Chip wrote:
Thanks for all the input. It seems information on the
performance of the ZIL is sparse and scattered. I've spent
significant time researching this the past day. I'll summarize
what I've found. Please correct me if I'm wrong.
- The ZIL can have any number of SSDs attached, either mirrored
or individually. ZFS will stripe across these in a raid0 or
raid10 fashion depending on how you configure them.
The ZIL code chains blocks together, and these are allocated
round-robin among the slogs or, if none exist, among the main
pool devices.
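As a rough illustration of the round-robin allocation described
above, here is a toy Python model. It is not ZFS code: the device
names and the helper allocate_log_blocks are made up for this
example, and the real allocator takes more into account than a
simple rotation.

```python
from itertools import cycle


def allocate_log_blocks(num_blocks, slogs, pool_devices):
    """Toy model: assign chained log blocks round-robin to devices.

    Uses the slogs if any exist, otherwise falls back to the main
    pool devices. Hypothetical helper, not a ZFS interface.
    """
    targets = slogs if slogs else pool_devices
    if not targets:
        raise ValueError("no devices available")
    device_iter = cycle(targets)
    # Each block in the chain goes to the next device in rotation.
    return [(block, next(device_iter)) for block in range(num_blocks)]


if __name__ == "__main__":
    # Made-up device names, two slogs present.
    for block, dev in allocate_log_blocks(4, ["slog0", "slog1"], ["disk0", "disk1"]):
        print(f"log block {block} -> {dev}")
```

With an empty slog list the same call rotates over the pool
devices instead, which is the fallback case mentioned above.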
- To determine the true maximum streaming performance of the
ZIL, setting sync=disabled will only use the in-RAM ZIL. This
gives up power protection for synchronous writes.
There is no RAM ZIL. If sync=disabled, then all writes are
asynchronous and are written as part of the periodic ZFS
transaction group (txg) commit that occurs every 5 seconds.
- Many SSDs do not help protect against power failure because
they have their own RAM cache for writes. This effectively
makes the SSD useless for this purpose and potentially
introduces a false sense of security. (These SSDs are fine
The ZIL code issues a write cache flush to all devices it has
written to before returning from the system call. I've heard
that not all devices obey the flush, but we consider those
broken hardware. I don't have a list of devices to avoid.
- Mirroring SSDs is only helpful if one SSD fails at the time
of a power failure. This leaves several unanswered questions.
How good is ZFS at detecting that an SSD is no longer a
reliable write target? The chance of silent data corruption
is well documented for spinning disks. What chance of data
corruption does this introduce with up to 10 seconds of data
written on SSD? Does ZFS read the ZIL during a scrub to
determine if our SSD is returning what we write to it?
If the ZIL code gets a block write failure, it will force the txg
to commit before returning. How hard it tries to write the block
depends on the drivers and the I/O subsystem.
- Zpool versions 19 and higher should be able to survive a ZIL
failure, only losing the uncommitted data. However, I
haven't seen good enough information that I would necessarily
trust this yet.
This has been available for quite a while, and I haven't heard of
any bugs in this area.
Several threads seem to suggest a ZIL throughput limit of
1Gb/s with SSDs. I'm not sure if that is current, but I
can't find any reports of better performance. I would
suspect that a DDRdrive or ZeusRAM as ZIL would push past
that.
1GB/s seems very high, but I don't have any numbers to share.
Anyone care to post their performance numbers on current
hardware with E5 processors and RAM-based ZIL solutions?
Thanks to everyone who has responded and contacted me directly
on this issue.
On Thu, Oct 4, 2012 at 3:03 AM, Andrew
Noting of course that this means that in the case of an
unexpected system outage or loss of connectivity to the disks,
synchronous writes since the last txg commit will be lost,
even though the applications will believe they are secured to
disk. (ZFS filesystem won't be corrupted, but it will look
like it's been wound back by up to 30 seconds when you
Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
Behalf Of Schweiss, Chip
How can I determine for sure that my ZIL is my bottleneck? If it
is the bottleneck, is it possible to keep adding mirrored pairs
of SSDs to the ZIL to make it faster? Or should I be looking for
a DDRdrive, ZeusRAM, etc.?
Temporarily set sync=disabled
Or, depending on your application, leave it that way
permanently. I know, for the work I do, most systems I
support at most locations have sync=disabled. It all
depends on the workload.
This is fine for some workloads, such as those where you would
start again with fresh data and those which can look closely
at the data to see how far they got before being rudely
interrupted, but not for those which rely on the POSIX
semantics of synchronous writes/syncs meaning data is secured
on non-volatile storage when the function returns.
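
For anyone wanting to try the sync=disabled comparison suggested
earlier in the thread, a small timing script along these lines
might help. This is only a sketch under stated assumptions: the
helper time_writes and its defaults are made up for illustration,
the target directory is a placeholder that should be replaced with
a path on the dataset under test, and a real benchmark would use
the actual application's I/O pattern. The idea is to run it once
with the dataset's normal sync setting and once with sync=disabled,
then compare the fsync'd figure between the two runs.

```python
import os
import tempfile
import time


def time_writes(directory, count=200, size=4096, sync=True):
    """Return the average seconds per write of `size` bytes.

    When sync is True, each write is followed by fsync(), so the
    call does not return until the OS reports the data as stable.
    Hypothetical helper for illustration only.
    """
    payload = b"\0" * size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, payload)
            if sync:
                os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        # Always clean up the scratch file.
        os.close(fd)
        os.unlink(path)
    return elapsed / count


if __name__ == "__main__":
    # Assumption: replace with a directory on the ZFS dataset being tested.
    target = tempfile.gettempdir()
    print(f"fsync'd writes:  {time_writes(target, sync=True) * 1e6:.1f} us each")
    print(f"buffered writes: {time_writes(target, sync=False) * 1e6:.1f} us each")
```

If the fsync'd figure drops sharply once sync=disabled is set, that
suggests the synchronous write path was the limiting factor. It
does not by itself say whether more slog devices or a faster one
would help.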
zfs-discuss mailing list