Re: Options for zfs inside a VM backed by zfs on the host

2015-08-28 Thread Chad J. Milios

 On Aug 27, 2015, at 7:47 PM, Tenzin Lhakhang tenzin.lhakh...@gmail.com 
 wrote:
 
 On Thu, Aug 27, 2015 at 3:53 PM, Chad J. Milios mil...@ccsys.com 
 mailto:mil...@ccsys.com wrote:
 
 Whether we are talking ffs, ntfs or zpool atop zvol, unfortunately there are 
 really no simple answers. You must consider your use case, the host and vm 
 hardware/software configuration, perform meaningful benchmarks and, if you 
 care about data integrity, thorough tests of the likely failure modes (all 
 far more easily said than done). I’m curious to hear more about your use 
 case(s) and setups so as to offer better insight on what alternatives may 
 make more/less sense for you. Performance needs? Are you striving for lower 
 individual latency or higher combined throughput? How critical are integrity 
 and availability? How do you prefer your backup routine? Do you handle that 
 in guest or host? Want features like dedup and/or L2ARC up in the mix? (Then 
 everything bears reconsideration, just about triple your research and testing 
 efforts.)
 
 Sorry, I’m really not trying to scare anyone away from ZFS. It is awesome and 
 capable of providing amazing solutions with very reliable and sensible 
 behavior if handled with due respect, fear, monitoring and upkeep. :)
 
 There are cases to be made for caching [meta-]data in the child, in the 
 parent, checksumming in the child/parent/both, compressing in the 
 child/parent. I believe `gstat` along with your custom-made benchmark or test 
 load will greatly help guide you.
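 
 For instance (just a minimal sketch; the provider-name filter below is only an 
 example), you can watch per-provider latency and busy percentage while your 
 test load runs:
 
     gstat -I 1s -f 'zvol|ada|da'
 
 Comparing ms/w and %busy on the zvols against the underlying disks quickly 
 shows where queuing or write amplification is happening.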
 
 ZFS on ZFS seems to be a hardly studied, seldom reported, never documented, 
 tedious exercise. Prepare for accelerated greying and balding of your hair. 
 The parent's volblocksize, the child's ashift, alignment, and interactions 
 involving raidz stripes (if used) can lead to problems ranging from slightly 
 decreased performance and storage efficiency to pathological write 
 amplification within ZFS, with performance and responsiveness crashing and 
 sinking to the bottom of the ocean. Some datasets can become veritable black 
 holes to vfs system calls. You may see ZFS reporting elusive errors, 
 deadlocking or panicking in the child or parent altogether. With diligence 
 though, stable and performant setups can be discovered for many production 
 situations.
 
 For example, for a zpool (whether used by a VM or not, locally, thru iscsi, 
 ggate[cd], or whatever) atop a zvol which sits on a parent zpool with no 
 redundancy, I would set primarycache=metadata checksum=off compression=off 
 for the zvol(s) on the host(s) and for the most part just use the same zpool 
 settings and sysctl tunings in the VM (or child zpool, whatever role it may 
 conduct) that I would otherwise use on bare cpu and bare drives (defaults + 
 compression=lz4 atime=off). However, that simple case is likely not yours.
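 
 Concretely, that might look like this on the host (the pool and zvol names 
 here are made up, and older zfs(8) takes one property per invocation):
 
     zfs set primarycache=metadata tank/vols/guest0
     zfs set checksum=off tank/vols/guest0
     zfs set compression=off tank/vols/guest0
 
 and inside the guest, on its own pool:
 
     zfs set compression=lz4 zroot
     zfs set atime=off zroot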
 
 With ufs/ffs/ntfs/ext4 and most other filesystems atop a zvol I use checksums 
 on the parent zvol, and compression too if the child doesn’t support it (as 
 ntfs can), but I still cache only metadata on the host and let the child 
 vm/fs cache real data.
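 
 A rough sketch of that case (the names and size are hypothetical): the 
 backing zvol keeps the integrity and compression duties but caches only 
 metadata:
 
     zfs create -V 20G tank/vols/guest1-ntfs
     zfs set checksum=on tank/vols/guest1-ntfs
     zfs set compression=lz4 tank/vols/guest1-ntfs
     zfs set primarycache=metadata tank/vols/guest1-ntfs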
 
 My use case involves charging customers for their memory use so admittedly 
 that is one motivating factor, LOL. Plus, I certainly don’t want one rude VM 
 marching through the host ARC, unfairly evicting and starving the other, 
 polite neighbors.
 
 A VM’s swap space becomes another consideration. I treat it like any other 
 ‘dumb’ filesystem, with compression and checksumming done by the parent, but 
 recent versions of many operating systems may page out only already-compressed 
 data, so investigate your guest OS. I’ve found lz4’s claims of an 
 almost-no-penalty early abort to be vastly overstated when dealing with 
 zvols, small block sizes and high throughput, so if you can be certain you’ll 
 be dealing with only compressed data then turn it off. For the virtual memory 
 pagers in most current-day OSes, though, set compression on the swap’s backing 
 zvol to lz4.
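 
 A sketch of a swap zvol along those lines (the name and size are arbitrary):
 
     zfs create -V 2G -o compression=lz4 -o primarycache=metadata tank/vols/guest0-swap
 
 If you know the guest pager already compresses pages before writing them out, 
 set compression=off on that zvol instead and spare the host the wasted 
 early-abort work.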
 
 Another factor is the ZIL. One VM can hoard your synchronous write 
 performance. Solutions are beyond the scope of this already-too-long email :) 
 but I’d be happy to elaborate if queried.
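 
 (For the impatient, two of the usual knobs, shown only as a sketch with 
 made-up device and dataset names, each with its own trade-offs: a dedicated 
 log vdev for the parent pool, and per-dataset sync policy.
 
     zpool add tank log mirror gpt/slog0 gpt/slog1
     zfs set sync=disabled tank/vols/guest-scratch
 
 Note the second one trades away crash consistency for that guest's data.)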
 
 And then there’s always netbooting guests from NFS mounts served by the host 
 and giving the guest no virtual disks; don’t forget to consider that option.
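 
 A bare-bones host-side sketch of that option (the dataset name, network and 
 rc knobs are assumptions, and you still need dhcpd/pxeboot or a loader 
 pointing the guest's root at the export):
 
     zfs create tank/guests/guest0-root
     zfs set sharenfs="-maproot=root -network 10.0.0.0/24" tank/guests/guest0-root
     sysrc nfs_server_enable=YES mountd_enable=YES rpcbind_enable=YES
     service mountd start
     service nfsd start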
 
 Hope this provokes some fruitful ideas for you. Glad to philosophize about 
 ZFS setups with y’all :)
 
 -chad

 That was a really awesome read!  The idea of caching only metadata at the 
 backend zpool and full data in the VM was interesting; I will give that a 
 try. Can you please elaborate more on the ZIL and synchronous writes by 
 VMs? That seems like a great topic.

 I am right now exploring the question: are SSD ZILs necessary in an all-SSD 
 pool? And then there is the question of NVMe SSD ZILs on top of an all-SSD 
 pool. My guess at the moment is that SSD ZILs are not necessary at all in an 
 SSD pool during intensive IO. I've been told that ZILs are 

Re: Options for zfs inside a VM backed by zfs on the host

2015-08-27 Thread Allan Jude
On 2015-08-27 02:10, Marcus Reid wrote:
 On Wed, Aug 26, 2015 at 05:25:52PM -0400, Vick Khera wrote:
 I'm running FreeBSD inside a VM that is providing the virtual disks backed
 by several ZFS zvols on the host. I want to run ZFS on the VM itself too
 for simplified management and backup purposes.

 The question I have is: on the VM guest, do I really need to run a raid-z or
 mirror, or can I just use a single virtual disk (or even a stripe)? Given
 that the underlying storage for the virtual disk is a zvol on a raid-z
 there should not really be too much worry about data corruption, I would
 think. It would be equivalent to using a hardware raid for each component
 of my zfs pool.

 Opinions? Preferably well-reasoned ones. :)
 
 This is a frustrating situation, because none of the options that I can
 think of look particularly appealing.  Single-vdev pools would be the
 best option, your redundancy is already taken care of by the host's
 pool.  The overhead of checksumming, etc. twice is probably not super
 bad.  However, having the ARC eating up lots of memory twice seems
 pretty bletcherous.  You can probably do some tuning to reduce that, but
 I never liked tuning the ARC much.
 
 All the nice features ZFS brings to the table are hard to give up once
 you get used to having them around, so I understand your quandary.
 
 Marcus
 

You can just:

zfs set primarycache=metadata poolname

And it will only cache metadata in the ARC inside the VM, and avoid
caching data blocks, which will be cached outside the VM. You could even
turn the primarycache off entirely.
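
For example, inside the VM, assuming the guest pool is called zroot (the
setting is inherited by child datasets):

zfs set primarycache=none zroot
zfs get -r primarycache zroot

The second command confirms what each dataset ended up with.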

-- 
Allan Jude





Re: Options for zfs inside a VM backed by zfs on the host

2015-08-27 Thread Paul Vixie
Let me ask a related question: I'm using FFS in the guest, zvol on the
host. Should I be telling my guest kernel to not bother with an FFS
buffer cache at all, or to use a smaller one, or what?


Re: Options for zfs inside a VM backed by zfs on the host

2015-08-27 Thread Vick Khera
On Thu, Aug 27, 2015 at 6:10 AM, Marie mariehelen...@gmail.com wrote:

 I've tried this in the past, and found the worst performance penalty was
 with the ARC disabled in the guest. I tried with the ARC enabled on host and
 guest, only on the host, and only on the guest. There was a significant
 performance penalty with either ARC disabled.

 I'd still recommend experimenting with it on your own to see whether the hit
 is acceptable or not.


Thanks for all the replies. I'm going with a small-ish ARC on the VMs
(about ¼ of the allocated RAM as the max, and a very small amount for the
min) and letting the host have its substantial ARC.

Since I'm running with compression=lz4 on the guest, I ended up setting
compression=off on the host for the backing volumes. After some testing I
found I was getting no compression on the backing volumes anyway, so why
waste the CPU cycles trying.
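
For anyone following along, the concrete form of that (the numbers and names
below are purely illustrative) is a guest /boot/loader.conf along the lines
of:

vfs.zfs.arc_max="1G"
vfs.zfs.arc_min="64M"

plus compression=lz4 on the guest pool and compression=off on the host-side
backing zvols.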

RE: Options for zfs inside a VM backed by zfs on the host

2015-08-27 Thread Matt Churchyard via freebsd-virtualization
 On Wed, Aug 26, 2015 at 11:10:44PM -0700, Marcus Reid wrote:
 On Wed, Aug 26, 2015 at 05:25:52PM -0400, Vick Khera wrote:
   Opinions? Preferably well-reasoned ones. :)
   
  However, having the ARC eating up lots of memory twice seems pretty 
  bletcherous.  You can probably do some tuning to reduce that, but I 
  never liked tuning the ARC much.

 I just realized that you can turn primarycache off per-dataset. Does it make 
 more sense to set primarycache=none on the zvol on the host, or on the 
 datasets in the VM? I'm thinking on the host, but it might be worth 
 experimenting.

I'd be very wary of disabling the ARC on the main host; it can have pretty serious 
side effects. It could possibly be useful in the guest though: as the data should 
already be cached by the ARC on the host, you're just going through the extra step 
of reading through the virtual disk driver and into the host ARC instead of 
directly from guest memory. It would need testing to know what the performance is 
like and whether there are any side effects.

I do agree that it doesn't seem necessary to have any redundancy in the guest 
if the host pool is redundant. Save for any glaring bugs in the virtual disk 
emulation, you wouldn't expect to get errors on the guest pool if the host pool 
is already checksumming the data.

It's also worth testing with the guest ARC enabled but limited to a fairly 
small size, so you're not disabling it entirely but doing as little 
double-caching as possible.

ZFS's features seem perfect for virtual hosts, although it's not ideal that you 
have to give up a big chunk of host RAM for the ARC. You may also find that you 
need to limit the host ARC and then only use MAX_RAM - MY_ARC_LIMIT for guests. 
Otherwise you'll have ZFS and the VMs fighting for memory, and enough of us have 
seen what shouldn't, but unfortunately does, happen in that situation.
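
As a worked example (all numbers hypothetical): on a 64 GB host you might cap
the ARC at 16 GB in /boot/loader.conf,

vfs.zfs.arc_max="16G"

and then plan on roughly 64 - 16 = 48 GB, minus a few GB for the host OS and
hypervisor overhead, to hand out to guests.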

Matt
-

 Marcus


Re: Options for zfs inside a VM backed by zfs on the host

2015-08-27 Thread Marie
On Thu, Aug 27, 2015 at 11:42 AM Matt Churchyard via freebsd-virtualization
freebsd-virtualization@freebsd.org wrote:

  On Wed, Aug 26, 2015 at 11:10:44PM -0700, Marcus Reid wrote:
  On Wed, Aug 26, 2015 at 05:25:52PM -0400, Vick Khera wrote:
Opinions? Preferably well-reasoned ones. :)
   
   However, having the ARC eating up lots of memory twice seems pretty
   bletcherous.  You can probably do some tuning to reduce that, but I
   never liked tuning the ARC much.

  I just realized that you can turn primarycache off per-dataset. Does it
 make more sense to set primarycache=none on the zvol on the host, or on
 the datasets in the VM? I'm thinking on the host, but it might be worth
 experimenting.

 I'd be very wary of disabling the ARC on the main host; it can have pretty
 serious side effects. It could possibly be useful in the guest though: as
 the data should already be cached by the ARC on the host, you're just going
 through the extra step of reading through the virtual disk driver and into
 the host ARC instead of directly from guest memory. It would need testing to
 know what the performance is like and whether there are any side effects.

 I do agree that it doesn't seem necessary to have any redundancy in the
 guest if the host pool is redundant. Save for any glaring bugs in the
 virtual disk emulation, you wouldn't expect to get errors on the guest pool
 if the host pool is already checksumming the data.

 It's also worth testing with the guest ARC enabled but limited to a fairly
 small size, so you're not disabling it entirely but doing as little
 double-caching as possible.

 ZFS's features seem perfect for virtual hosts, although it's not ideal that
 you have to give up a big chunk of host RAM for the ARC. You may also find
 that you need to limit the host ARC and then only use MAX_RAM - MY_ARC_LIMIT
 for guests. Otherwise you'll have ZFS and the VMs fighting for memory, and
 enough of us have seen what shouldn't, but unfortunately does, happen in that
 situation.

 Matt
 -

  Marcus


I've tried this in the past, and found the worst performance penalty was
with the ARC disabled in the guest. I tried with the ARC enabled on host and
guest, only on the host, and only on the guest. There was a significant
performance penalty with either ARC disabled.

I'd still recommend experimenting with it on your own to see whether the hit
is acceptable or not.

Shameless plug: I'm working on a project (tunnelfs.io) which should be
useful for this use case. :) Unfortunately, there is no ETA on usable code
yet.

--
Marie Helene Kvello-Aune


Re: Options for zfs inside a VM backed by zfs on the host

2015-08-27 Thread Marcus Reid
On Wed, Aug 26, 2015 at 05:25:52PM -0400, Vick Khera wrote:
 I'm running FreeBSD inside a VM that is providing the virtual disks backed
 by several ZFS zvols on the host. I want to run ZFS on the VM itself too
 for simplified management and backup purposes.
 
 The question I have is: on the VM guest, do I really need to run a raid-z or
 mirror, or can I just use a single virtual disk (or even a stripe)? Given
 that the underlying storage for the virtual disk is a zvol on a raid-z
 there should not really be too much worry about data corruption, I would
 think. It would be equivalent to using a hardware raid for each component
 of my zfs pool.
 
 Opinions? Preferably well-reasoned ones. :)

This is a frustrating situation, because none of the options that I can
think of look particularly appealing.  Single-vdev pools would be the
best option, your redundancy is already taken care of by the host's
pool.  The overhead of checksumming, etc. twice is probably not super
bad.  However, having the ARC eating up lots of memory twice seems
pretty bletcherous.  You can probably do some tuning to reduce that, but
I never liked tuning the ARC much.

All the nice features ZFS brings to the table are hard to give up once
you get used to having them around, so I understand your quandary.

Marcus


Re: Options for zfs inside a VM backed by zfs on the host

2015-08-27 Thread Tenzin Lhakhang
That was a really awesome read!  The idea of caching only metadata at the
backend zpool and full data in the VM was interesting; I will give that a
try. Can you please elaborate more on the ZIL and synchronous writes by
VMs? That seems like a great topic.
-
I am right now exploring the question: are SSD ZILs necessary in an all-SSD
pool? And then there is the question of NVMe SSD ZILs on top of an all-SSD
pool. My guess at the moment is that SSD ZILs are not necessary at all in an
SSD pool during intensive IO. I've been told that ZILs are always there to
help you, but when your pool's aggregate IOPS is greater than that of the ZIL
device, it doesn't seem to make sense... Or is it the latency of writing to a
single disk vs. striping across your fast vdevs?
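
A rough way to test this (the directory and parameters are arbitrary, fio is
from ports) would be to run a small synchronous random-write job with and
without the separate log device attached and compare latencies:

fio --name=synctest --directory=/tank/test --rw=randwrite --bs=4k \
    --iodepth=1 --numjobs=1 --size=1g --runtime=60 --time_based --fsync=1

while watching zpool iostat -v 1 to see which vdevs absorb the writes.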

Thanks,
Tenzin

On Thu, Aug 27, 2015 at 3:53 PM, Chad J. Milios mil...@ccsys.com wrote:

  On Aug 27, 2015, at 10:46 AM, Allan Jude allanj...@freebsd.org wrote:
 
  On 2015-08-27 02:10, Marcus Reid wrote:
  On Wed, Aug 26, 2015 at 05:25:52PM -0400, Vick Khera wrote:
  I'm running FreeBSD inside a VM that is providing the virtual disks backed
  by several ZFS zvols on the host. I want to run ZFS on the VM itself too
  for simplified management and backup purposes.
 
  The question I have is: on the VM guest, do I really need to run a raid-z
  or mirror, or can I just use a single virtual disk (or even a stripe)?
  Given that the underlying storage for the virtual disk is a zvol on a
  raid-z there should not really be too much worry about data corruption, I
  would think. It would be equivalent to using a hardware raid for each
  component of my zfs pool.
 
  Opinions? Preferably well-reasoned ones. :)
 
  This is a frustrating situation, because none of the options that I can
  think of look particularly appealing.  Single-vdev pools would be the
  best option, your redundancy is already taken care of by the host's
  pool.  The overhead of checksumming, etc. twice is probably not super
  bad.  However, having the ARC eating up lots of memory twice seems
  pretty bletcherous.  You can probably do some tuning to reduce that, but
  I never liked tuning the ARC much.
 
  All the nice features ZFS brings to the table are hard to give up once
  you get used to having them around, so I understand your quandary.
 
  Marcus
 
  You can just:
 
  zfs set primarycache=metadata poolname
 
  And it will only cache metadata in the ARC inside the VM, and avoid
  caching data blocks, which will be cached outside the VM. You could even
  turn the primarycache off entirely.
 
  --
  Allan Jude

  On Aug 27, 2015, at 1:20 PM, Paul Vixie p...@redbarn.org wrote:
 
  Let me ask a related question: I'm using FFS in the guest, zvol on the
  host. Should I be telling my guest kernel to not bother with an FFS
  buffer cache at all, or to use a smaller one, or what?


 Whether we are talking ffs, ntfs or zpool atop zvol, unfortunately there
 are really no simple answers. You must consider your use case, the host and
 vm hardware/software configuration, perform meaningful benchmarks and, if
 you care about data integrity, thorough tests of the likely failure modes
 (all far more easily said than done). I’m curious to hear more about your
 use case(s) and setups so as to offer better insight on what alternatives
 may make more/less sense for you. Performance needs? Are you striving for
 lower individual latency or higher combined throughput? How critical are
 integrity and availability? How do you prefer your backup routine? Do you
 handle that in guest or host? Want features like dedup and/or L2ARC up in
 the mix? (Then everything bears reconsideration, just about triple your
 research and testing efforts.)

 Sorry, I’m really not trying to scare anyone away from ZFS. It is awesome
 and capable of providing amazing solutions with very reliable and sensible
 behavior if handled with due respect, fear, monitoring and upkeep. :)

 There are cases to be made for caching [meta-]data in the child, in the
 parent, checksumming in the child/parent/both, compressing in the
 child/parent. I believe `gstat` along with your custom-made benchmark or
 test load will greatly help guide you.

  ZFS on ZFS seems to be a hardly studied, seldom reported, never
  documented, tedious exercise. Prepare for accelerated greying and balding
  of your hair. The parent's volblocksize, the child's ashift, alignment, and
  interactions involving raidz stripes (if used) can lead to problems ranging
  from slightly decreased performance and storage efficiency to pathological
  write amplification within ZFS, with performance and responsiveness crashing
  and sinking to the bottom of the ocean. Some datasets can become veritable
  black holes to vfs system calls. You may see ZFS reporting elusive errors,
  deadlocking or panicking in the child or parent altogether. With diligence
  though, stable and performant setups can be discovered for many production
  situations.

 For example, for a zpool (whether used by a VM or not, locally, thru
 iscsi, ggate[cd], or whatever) 

Options for zfs inside a VM backed by zfs on the host

2015-08-26 Thread Vick Khera
I'm running FreeBSD inside a VM that is providing the virtual disks backed
by several ZFS zvols on the host. I want to run ZFS on the VM itself too
for simplified management and backup purposes.

The question I have is: on the VM guest, do I really need to run a raid-z or
mirror, or can I just use a single virtual disk (or even a stripe)? Given
that the underlying storage for the virtual disk is a zvol on a raid-z
there should not really be too much worry about data corruption, I would
think. It would be equivalent to using a hardware raid for each component
of my zfs pool.

Opinions? Preferably well-reasoned ones. :)