Hi Nathan,

You've misunderstood how the ZIL works and why it reduces write latency for 
synchronous writes. 

Since you've partitioned a single SSD into two slices, one as pool storage and 
one as a ZIL for that pool, all sync writes will be 2X amplified. There's no 
way around it. ZFS writes to the ZIL while simultaneously (or with up to a 
couple of seconds' delay) writing the same data to the slice you're using to 
persistently store pool data.  This doesn't happen when you expose the raw 
partition to the VM because those writes don't go through the ZIL...hence no 
write amplification.
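
If you want to confirm that's what's happening, one quick (and strictly 
temporary) experiment is to turn off sync semantics on the zvol and watch 
per-device traffic while the guest repeats its 100MB test. The pool/zvol names 
below are just placeholders for whatever you actually called them:

  # watch per-device writes while the guest does its test
  iostat -xn 1

  # temporarily drop sync semantics on the zvol (testing only -- this
  # discards the sync guarantee, so don't leave it set)
  zfs set sync=disabled tank/vbox-vol

  # rerun the guest write test; the doubled writes should disappear

  # put it back when you're done
  zfs set sync=standard tank/vbox-vol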

Since you've put the ZIL physically on the same device as the pool storage, 
the ZIL serves no purpose other than to slow things down.  The purpose of a 
ZIL is to acknowledge sync writes as fast as possible even if they haven't hit 
the actual pool storage (usually slow HDDs) yet; the write is acknowledged 
once it has hit the ZIL, and ZFS then has a moment (up to 30 seconds, IIRC) to 
bundle multiple IOs before committing them to persistent pool storage.
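
If you do end up wanting a separate log device later, it needs to be on a 
different physical device from the pool storage. Something like this (the 
pool and device names are just examples):

  # add a dedicated log (slog) vdev on a separate SSD
  zpool add tank log c3t0d0

  # it should show up under its own "logs" section
  zpool status tank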

Remove the log (ZIL) slice from the pool where you've carved out this zvol and 
test again. Your writes will be faster. They likely won't be as fast as your 
async writes (150MB/sec), but they will certainly be faster than the 15MB/sec 
you're getting now, where you're unintentionally doing synchronous writes to 
the ZIL slice and async writes to the pool storage slice simultaneously.  I'd 
bet the zvol solution will approach the speed of using a raw partition. 
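
Roughly, with placeholder names (zpool status will show your real layout):

  # see whether the pool has a separate "logs" vdev
  zpool status tank

  # if so, remove it -- log vdevs can be removed from a live pool
  zpool remove tank c2d0p1

  # then rerun the sync-write test against the zvol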

-Pete 

P.S. Be careful using the term write amplification when talking about 
SSDs...people usually use that to refer to what happens inside the SSD itself. 
Specifically, before a write (especially a small one) can be committed, nearby 
data may have to be read back so that an entire flash block can be rewritten.
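For example (numbers purely illustrative): if the controller ends up 
programming 400MB of flash to satisfy 100MB of host writes, the drive's 
internal write amplification factor is 400/100 = 4 -- a different thing from 
the 2X doubling you're seeing at the ZFS layer.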
http://en.wikipedia.org/wiki/Write_amplification 

On Nov 20, 2012, at 8:29 AM, Nathan Kroenert wrote:

> Hi folks,  (Long time no post...)
> 
> Only starting to get into this one, so apologies if I'm light on detail, 
> but...
> 
> I have a shiny SSD I'm using to help make some VirtualBox stuff I'm doing go 
> fast.
> 
> I have a 240GB Intel 520 series jobbie. Nice.
> 
> I chopped it into a few slices - p0 (partition table), p1 128GB, p2 60GB.
> 
> As part of my work, I have used it both as a RAW device (cxtxdxp1) and 
> wrapped partition 1 with a VirtualBox-created VMDK linkage, and it works like 
> a champ. :) Very happy with that.
> 
> I then tried creating a new zpool using partition 2 of the disk (zpool create 
> c2d0p2) and then carved a zvol out of that (30GB), and wrapped *that* in a 
> vmdk.
> 
> Still works OK and speed is good(ish) - but there are a couple of things in 
> particular that disturb me:
> - Sync writes are pretty slow - only about 1/10th of what I thought I might 
> get (about 15MB/s). Async writes are fast - up to 150MB/s or more.
> - More worryingly, it seems that writes are amplified by 2X, in that if I write 
> 100MB at the guest level, the underlying bare-metal ZFS writes 200MB, as 
> observed by iostat. This doesn't happen on the VMs that are using RAW slices.
> 
> Anyone have any thoughts on what might be happening here?
> 
> I can appreciate that if everything comes through as a sync write, it goes to 
> the ZIL first, then to its final resting place - but it seems a little over 
> the top that it really is double.
> 
> I have also had a play with sync=, primarycache settings and a few other 
> things, but it doesn't seem to change the behaviour.
> 
> Again - I'm looking for thoughts here - as I have only really just started 
> looking into this. Should I happen across anything interesting, I'll followup 
> this post.
> 
> Cheers,
> 
> Nathan. :)
