> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Schweiss, Chip
> . The ZIL can have any number of SSDs attached either mirror or
> individually.   ZFS will stripe across these in a raid0 or raid10 fashion
> depending on how you configure.

I'm regurgitating something somebody else said - but I don't know where.  I 
believe multiple ZIL devices don't get striped.  They get round-robin'd.  This 
means your ZIL can absolutely become a bottleneck, if you're doing sustained 
high throughput (not high IOPS) sync writes.  The way to prevent that 
bottleneck is to tune the parameters (I don't know their names) that say 
"a sync write larger than X should skip the ZIL and go directly to pool."
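
To make the round-robin point concrete, here is a toy model in Python.  It is 
only a sketch of the behavior as I described it above, not ZFS code: the 
device names, the write sizes and the direct_threshold value are all made up, 
and direct_threshold is a hypothetical stand-in for the real tunable, whose 
name I don't have.

```python
from itertools import cycle


def route_sync_writes(write_sizes, log_devices, direct_threshold):
    """Toy model: send each sync write to one log device or straight to the pool.

    direct_threshold is a hypothetical stand-in for the real tunable; it is
    not an actual ZFS parameter name.
    """
    next_device = cycle(log_devices)              # round-robin, not striping
    per_device = {dev: 0 for dev in log_devices}
    direct_to_pool = 0
    for size in write_sizes:
        if size > direct_threshold:
            direct_to_pool += size                # large write skips the log devices
        else:
            per_device[next(next_device)] += size  # whole write lands on one device
    return per_device, direct_to_pool


per_device, direct = route_sync_writes(
    write_sizes=[4096, 8192, 1048576, 4096],
    log_devices=["ssd0", "ssd1"],
    direct_threshold=32768,
)
print(per_device, direct)
```

In this model a single write is never split across devices, so one large 
stream of small sync writes is limited by how fast each individual device can 
absorb its turn.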

> . To determine the true maximum streaming performance of the ZIL setting
> sync=disabled will only use the in RAM ZIL.   This gives up power protection to
> synchronous writes.

There is no RAM ZIL.  The basic idea behind ZIL is like this:  Some 
applications simply tell the system to "write" and the system will buffer these 
writes in memory, and the application will continue processing.  But some 
applications do not want the OS to buffer writes, so they issue writes in 
"sync" mode.  These applications will issue the write command, and they will 
block there, until the OS says it's written to nonvolatile storage.  In ZFS, 
this means the transaction gets written to the ZIL, and then it gets put into 
the memory buffer just like any other write.  Upon reboot, when the filesystem 
is mounting, ZFS will always look in the ZIL to see if there are any 
transactions that have not yet been played to disk.
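
From the application side, the difference between the two kinds of write 
looks roughly like this.  A minimal Python sketch, assuming a POSIX-like 
system; the function names and the temporary file are just for illustration.  
The fsync call is the point where the application blocks until the OS reports 
the data is on stable storage.

```python
import os
import tempfile


def buffered_write(path, data):
    """Hand the data to the OS and return; it may still be only in memory."""
    with open(path, "wb") as f:
        f.write(data)


def sync_write(path, data):
    """Return only after the OS reports the data is on stable storage."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()               # push Python's own buffer down to the OS
        os.fsync(f.fileno())    # block until the OS says it is on stable storage


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    buffered_write(path, b"may still be in RAM when this returns\n")
    sync_write(path, b"reported as on stable storage\n")
    with open(path, "rb") as f:
        result = f.read()
print(result)
```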

So, when you set sync=disabled, you're just bypassing that step.  You're lying 
to the applications, if they say "I want to know when this is written to disk," 
and you just immediately say "Yup, it's done" unconditionally.  This is the 
highest performance thing you could possibly do - but depending on your system 
workload, could put you at risk for data loss.

> . Mirroring SSDs is only helpful if one SSD fails at the time of a power
> failure.  This leave several unanswered questions.  How good is ZFS at
> detecting that an SSD is no longer a reliable write target?   The chance of
> silent data corruption is well documented about spinning disks.  What chance
> of data corruption does this introduce with up to 10 seconds of data written
> on SSD.  Does ZFS read the ZIL during a scrub to determine if our SSD is
> returning what we write to it?

Not just power loss -- any ungraceful crash.  

ZFS doesn't have any way to scrub ZIL devices, so it's not very good at 
detecting failed ZIL devices.  There is definitely the possibility for an SSD 
to enter a failure mode where you write to it, it doesn't complain, but you 
wouldn't be able to read it back if you tried.  Also, upon ungraceful crash, 
even if you try to read that data, and fail to get it back, there's no way to 
know that you should have expected something.  So you still don't detect the 
failure.

If you want to maintain your SSD periodically, you should do something like:  
Remove it as a ZIL device, create a new pool with just this disk in it, write a 
bunch of random data to the new junk pool, scrub the pool, then destroy the 
junk pool and return it as a ZIL device to the main pool.  This does not 
guarantee anything - but then - nothing anywhere guarantees anything.  This is 
a good practice, and it definitely puts you into a territory of reliability 
better than the competing alternatives.
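
The sequence above could be sketched as the following command plan.  The 
Python below only builds and prints the commands, it does not run anything.  
The pool name "tank", the device name, the junk pool name and the amount of 
data are all hypothetical, and I'm assuming the junk pool mounts at its 
default location of /<poolname>.  Check every command against your own system 
before running any of it.

```python
def log_device_check_plan(main_pool, device, junk_pool="junkpool", fill_mib=1024):
    """Return the commands for the remove / test / re-add cycle as argv lists.

    All names are placeholders. Nothing is executed here.
    """
    return [
        ["zpool", "remove", main_pool, device],       # detach the log device
        ["zpool", "create", junk_pool, device],       # throwaway pool on that device
        # Fill with random data; assumes the pool is mounted at /<junk_pool>.
        ["dd", "if=/dev/urandom", f"of=/{junk_pool}/fill",
         "bs=1024k", f"count={fill_mib}"],
        ["zpool", "scrub", junk_pool],                # returns immediately
        # Repeat this until the scrub is reported finished, and read the error counts.
        ["zpool", "status", "-v", junk_pool],
        ["zpool", "destroy", junk_pool],
        ["zpool", "add", main_pool, "log", device],   # put it back as a log device
    ]


plan = log_device_check_plan("tank", "c1t0d0")
for argv in plan:
    print(" ".join(argv))
```

Note that the main pool runs without its dedicated log device while the test 
is in progress, so sync writes will be slower during that window.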

> . Zpool versions 19 and higher should be able to survive a ZIL failure only
> loosing the uncommitted data.   However, I haven't seen good enough
> information that I would necessarily trust this yet.

That was a very long time ago.  (What, 2-3 years?)  It's very solid now.

> . Several threads seem to suggest a ZIL throughput limit of 1Gb/s with
> SSDs.   I'm not sure if that is current, but I can't find any reports of better
> performance.   I would suspect that DDR drive or Zeus RAM as ZIL would push
> past this.

Whenever I measure the sustainable throughput of an SSD, HDD, DDRDrive, or 
anything else, very few devices can actually sustain faster than 1Gb/s, for 
use as a ZIL or anything else.  Published specs are often higher, but not 

If you are ZIL bandwidth limited, you should consider tuning the size of stuff 
that goes to ZIL.
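
If you want to measure sustained sync write throughput yourself, something 
like the following Python sketch is a starting point.  It is not the tool I 
used; the function name, block size and total size are assumptions picked for 
illustration.  The demo at the bottom writes only 4 MiB to a temporary 
directory to show that it runs.  For a meaningful number, point it at a file 
on the dataset you care about and write far more data than any cache on the 
device can hold.

```python
import os
import tempfile
import time


def sync_write_throughput(path, block_size=128 * 1024, total_bytes=64 * 1024 * 1024):
    """Write total_bytes in block_size chunks, fsync after each; return bytes/second."""
    block = os.urandom(block_size)       # random data, so it does not compress away
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            f.write(block)
            f.flush()
            os.fsync(f.fileno())         # every block is a synchronous write
            written += block_size
    elapsed = time.perf_counter() - start
    return written / elapsed


with tempfile.TemporaryDirectory() as tmp:
    rate = sync_write_throughput(os.path.join(tmp, "probe.bin"),
                                 total_bytes=4 * 1024 * 1024)
print(f"{rate / 1e6:.1f} MB/s")
```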
