> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Schweiss, Chip
> . The ZIL can have any number of SSDs attached, either mirrored or
> individually. ZFS will stripe across these in a raid0 or raid10 fashion
> depending on how you configure.
I'm regurgitating something somebody else said - but I don't know where. I
believe multiple ZIL devices don't get striped; they get round-robined. This
means your ZIL can absolutely become a bottleneck if you're doing sustained
high-throughput (not high-IOPS) sync writes. But the way to prevent that
bottleneck is by tuning the ... I don't know the names of the parameters. Some
parameters that indicate "a sync write larger than X should skip the ZIL and go
directly to the pool."
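For what it's worth, on Illumos/Solaris-derived systems the knobs in question
are probably the zfs_immediate_write_sz tunable and the per-dataset logbias
property - verify against your own platform's docs before relying on either.
A sketch ("tank/mydata" is a placeholder dataset name):

```shell
# zfs_immediate_write_sz: sync writes larger than this are written directly
# to the pool instead of the separate log device. Set in /etc/system on
# Illumos/Solaris; takes effect at next boot:
#   set zfs:zfs_immediate_write_sz = 0x8000
#
# Per-dataset alternative: logbias=throughput steers large sync writes away
# from the slog entirely, favoring pool bandwidth over latency:
zfs set logbias=throughput tank/mydata
zfs get logbias tank/mydata   # confirm the setting
```

The /etc/system line is commented out above on purpose - it's a global,
reboot-persistent tunable, so test the per-dataset property first.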
> . To determine the true maximum streaming performance of the ZIL, setting
> sync=disabled will only use the in-RAM ZIL. This gives up power protection
> for synchronous writes.
There is no RAM ZIL. The basic idea behind the ZIL is this: some
applications simply tell the system to "write," and the system will buffer
these writes in memory while the application continues processing. But some
applications do not want the OS to buffer writes, so they issue writes in
"sync" mode. These applications issue the write command, and they block
there until the OS says the data is on nonvolatile storage. In ZFS, this
means the transaction gets written to the ZIL, and then it gets put into the
memory buffer just like any other write. Upon reboot, while the filesystem
is mounting, ZFS will always look in the ZIL to see if there are any
transactions that have not yet been replayed to disk.
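If you want to see those intent-log records for yourself, zdb can dump them
(read-only; zdb is a debugging tool, so output format varies by platform and
version - "tank" is a placeholder pool name):

```shell
# Dump intent log (ZIL) entries for the pool's datasets:
zdb -i tank
```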
So, when you set sync=disabled, you're just bypassing that step. You're lying
to the applications: when they say "I want to know when this is written to
disk," you immediately answer "Yup, it's done" unconditionally. This is the
highest-performance thing you could possibly do - but depending on your
workload, it could put you at risk of data loss.
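Concretely, that benchmark toggle looks like this ("tank/scratch" is a
placeholder dataset - and remember to switch it back when you're done
measuring):

```shell
# Disable sync semantics on one dataset for a streaming benchmark:
zfs set sync=disabled tank/scratch
zfs get sync tank/scratch        # confirm it took effect

# Restore the default, honest behavior afterward:
zfs set sync=standard tank/scratch
```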
> . Mirroring SSDs is only helpful if one SSD fails at the time of a power
> failure. This leaves several unanswered questions. How good is ZFS at
> detecting that an SSD is no longer a reliable write target? The chance of
> silent data corruption is well documented for spinning disks. What chance
> of data corruption does this introduce, with up to 10 seconds of data
> written on the SSD? Does ZFS read the ZIL during a scrub to determine if
> our SSD is returning what we write to it?
Not just power loss -- any ungraceful crash.
ZFS doesn't have any way to scrub ZIL devices, so it's not very good at
detecting failed ZIL devices. There is definitely the possibility for an SSD
to enter a failure mode where you write to it, it doesn't complain, but you
wouldn't be able to read the data back if you tried. Also, upon an ungraceful
crash, even if you try to read that data and fail to get it back, there's no
way to know that you should have expected something. So you still don't
detect the failure.
If you want to maintain your SSD periodically, you should do something like
this: remove it as a ZIL device, create a new pool with just that disk in it,
write a bunch of random data to the new junk pool, scrub the pool, then
destroy the junk pool and return the disk as a ZIL device to the main pool.
This does not guarantee anything - but then, nothing anywhere guarantees
anything. It is a good practice, and it definitely puts you into a territory
of reliability better than the competing alternatives.
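The procedure above, as a sketch - "tank" and "c0t2d0" are placeholder pool
and device names, and the dd sizing is arbitrary; adapt both to your system:

```shell
zpool remove tank c0t2d0             # detach the SSD from log duty
zpool create junk c0t2d0             # temporary single-disk test pool
# Fill it with random data (4 GB here; size to taste):
dd if=/dev/urandom of=/junk/fill bs=1024k count=4096
zpool scrub junk                     # verify everything reads back
zpool status junk                    # check for checksum errors
zpool destroy junk
zpool add tank log c0t2d0            # return it as a log device
```

Note that this exercises only the blocks dd happened to touch, which is part
of why it guarantees nothing.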
> . Zpool versions 19 and higher should be able to survive a ZIL failure,
> only losing the uncommitted data. However, I haven't seen good enough
> information that I would necessarily trust this yet.
That was a very long time ago. (What, 2-3 years?) It's very solid now.
> . Several threads seem to suggest a ZIL throughput limit of 1Gb/s with
> SSDs. I'm not sure if that is current, but I can't find any reports of
> higher performance. I would suspect that a DDRdrive or ZeusRAM as ZIL
> would push past this.
Whenever I measure the sustainable throughput of an SSD, HDD, DDRdrive, or
anything else ... very few devices can actually sustain faster than 1Gb/s,
for use as a ZIL or anything else. Published specs are often higher, but not
sustainable in practice.
If you are ZIL bandwidth limited, you should consider tuning the size
threshold above which sync writes skip the ZIL and go directly to the pool.
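One way to check whether the log device really is the bottleneck is to watch
per-vdev bandwidth while the sync-heavy workload runs ("tank" is a
placeholder pool name):

```shell
# -v breaks throughput out per vdev; the "logs" section shows how much
# write bandwidth the slog is absorbing, sampled at 1-second intervals:
zpool iostat -v tank 1
```

If the slog sits pinned near its rated write bandwidth while the data vdevs
are idle, the threshold tuning above is worth trying.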