Re: Storage overhead on zvols

2017-12-05 Thread Dustin Wenz

> On Dec 5, 2017, at 10:41 AM, Rodney W. Grimes 
> <freebsd-...@pdx.rh.cn85.dnsmgr.net> wrote:
> 
>> 
>> 
>> Dustin Wenz wrote:
>>> I'm not using ZFS in my VMs for data integrity (the host already
>>> provides that); it's mainly for the easy creation and management of
>>> filesystems, and the ability to do snapshots for rollback and
>>> replication.
>> 
>> snapshot and replication works fine on the host, acting on the zvol.
> 
> I suspect he is snapshotting and doing send/recvs of something
> much less than the zvol, probably some datasets, maybe boot
> environments. A snapshot of the whole zvol is ok if you're managing
> data at the VM level, not so good if you've got lots of stuff going
> on inside the VM.

Exactly, it's useful to have control of each filesystem discretely.
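
As a small sketch of what I mean by discrete control (the pool and dataset
names here are made up for illustration), inside a guest I can do things like:

    zfs allow -u dbadmin snapshot,send,hold data/postgres
    zfs snapshot -r data/postgres@nightly
    zfs send -i @lastnight data/postgres@nightly | ssh backuphost zfs recv backup/postgres

None of that per-dataset granularity exists if the host only snapshots the
zvol as a whole.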

>>> Some of my deployments have hundreds of filesystems in
>>> an organized hierarchy, with delegated permissions and automated
>>> snapshots, send/recvs, and clones for various operations.
>> 
>> What kind of zpool do you use in the guest, to avoid unwanted additional 
>> redundancy?
> 
> Just a simple stripe of 1 device would be my guess, though you're
> still going to have metadata redundancy.

Also correct; just using the zvol virtual device as a single-disk pool.
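
For completeness, the in-guest setup is about as simple as it gets; assuming
the zvol shows up as the second virtio-blk device (the device and pool names
below are just examples), it's a one-liner:

    zpool create data vtbd1

All redundancy and scrubbing happen on the host pool underneath.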


>> 
>> Did you benchmark the space or time efficiency of ZFS vs. UFS?
>> 
>> In some BSD-related meeting this year I asked Allan Jude for a bhyve-
>> level null mount, so that we could access some subtree of the host at /
>> inside the guest, and avoid block devices and file systems
>> altogether. Right now I have to use NFS for that, which is irritating.
> 
> This is not as simple as it seems; remember bhyve is just presenting
> a hardware environment, and hardware environments don't have a file system
> concept per se, unlike jails, which provide a software environment.
> 
> In effect what you're asking for is what NFS does, so use NFS and get
> over the fact that this is the way to get what you want.  Sure, you
> could implement a virt-vfs, but I wonder how close the spec of that
> would be to the spec of NFS.
> 
> Or maybe that's the answer: implement virt-vfs as a more efficient way
> to transport NFS calls in and out of the guest.


I've not done any deliberate comparisons for latency or throughput. What I've 
decided to virtualize does not have any exceptional performance requirements. 
If I need the best possible IO, I would lean toward using jails instead of a 
hypervisor.
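
For what it's worth, the NFS arrangement mentioned above is not much setup on
FreeBSD. A rough sketch (paths and addresses are placeholders):

    # host: /etc/rc.conf
    nfs_server_enable="YES"
    mountd_enable="YES"
    rpcbind_enable="YES"

    # host: /etc/exports
    /vm/shared -alldirs -maproot=root 10.0.0.51

    # guest:
    mount -t nfs 10.0.0.1:/vm/shared /mnt/shared

It works, but it's easy to see why a bhyve-level null mount or a virt-vfs
transport would be more pleasant.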

- .Dustin






Re: Storage overhead on zvols

2017-12-05 Thread Dustin Wenz
Thanks for linking that resource. The purpose of my posting was to increase the 
body of knowledge available to people who are running bhyve on ZFS. It's a 
versatile way to deploy guests, but I haven't seen much practical advice about 
doing it efficiently.

Allan's explanation yesterday of how allocations are padded is exactly the sort 
of breakdown I could have used when I first started provisioning VMs. I'm sure 
other people will find this conversation useful as well.
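
To make the padding argument concrete (this is my own back-of-the-envelope
math, assuming the pool's ashift is 12, i.e. 4 KiB sectors, which the thread
doesn't actually state): with volblocksize=512, every 512-byte block the guest
writes still occupies at least one full 4 KiB data sector plus a 4 KiB parity
sector on raidz1, so on the order of

    (4 KiB + 4 KiB) / 512 B  =  16x

raw allocation per block, which is in the same ballpark as the 11.7x I
measured once metadata and space accounting are factored in. With larger
volblocksizes the parity and padding are amortized over many sectors per
block, which is why the numbers fall back toward 1x. The Delphix post Adam
linked walks through the general case.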

- .Dustin

> On Dec 4, 2017, at 9:37 PM, Adam Vande More <amvandem...@gmail.com> wrote:
> 
> On Mon, Dec 4, 2017 at 5:19 PM, Dustin Wenz <dustinw...@ebureau.com> wrote:
> I'm starting a new thread based on the previous discussion in "bhyve uses all 
> available memory during IO-intensive operations" relating to size inflation 
> of bhyve data stored on zvols. I've done some experimenting with this, and I 
> think it will be useful for others.
> 
> The zvols listed here were created with this command:
> 
> zfs create -o volmode=dev -o volblocksize=Xk -V 30g 
> vm00/chyves/guests/myguest/diskY
> 
> The zvols were created on a raidz1 pool of four disks. For each zvol, I 
> created a basic zfs filesystem in the guest using all default tuning (128k 
> recordsize, etc). I then copied the same 8.2GB dataset to each filesystem.
> 
> volblocksize    size amplification
> 
> 512B            11.7x
> 4k              1.45x
> 8k              1.45x
> 16k             1.5x
> 32k             1.65x
> 64k             1x
> 128k            1x
> 
> The worst case is with a 512B volblocksize, where the space used is more than 
> 11 times the size of the data stored within the guest. The size-efficiency 
> gains are non-linear as the block size doubles from 4k upward, with 32k 
> blocks being the second worst. The amount of wasted space was minimized by 
> using 64k and 128k blocks.
> 
> It would appear that 64k is a good choice for volblocksize if you are using a 
> zvol to back your VM, and the VM is using the virtual device for a zpool. 
> Incidentally, I believe this is the default when creating VMs in FreeNAS.
> 
> I'm not sure what your purpose is behind the posting, but if it's simply a 
> "why this behavior" question, you can find more detail here, as well as some 
> of the calculation legwork:
> 
> https://www.delphix.com/blog/delphix-engineering/zfs-raidz-stripe-width-or-how-i-learned-stop-worrying-and-love-raidz
> 
> -- 
> Adam





Storage overhead on zvols

2017-12-04 Thread Dustin Wenz
I'm starting a new thread based on the previous discussion in "bhyve uses all 
available memory during IO-intensive operations" relating to size inflation of 
bhyve data stored on zvols. I've done some experimenting with this, and I think 
it will be useful for others.

The zvols listed here were created with this command:

zfs create -o volmode=dev -o volblocksize=Xk -V 30g 
vm00/chyves/guests/myguest/diskY

The zvols were created on a raidz1 pool of four disks. For each zvol, I created 
a basic zfs filesystem in the guest using all default tuning (128k recordsize, 
etc). I then copied the same 8.2GB dataset to each filesystem.

volblocksize    size amplification

512B            11.7x
4k              1.45x
8k              1.45x
16k             1.5x
32k             1.65x
64k             1x
128k            1x

The worst case is with a 512B volblocksize, where the space used is more than 
11 times the size of the data stored within the guest. The size-efficiency 
gains are non-linear as the block size doubles from 4k upward, with 32k blocks 
being the second worst. The amount of wasted space was minimized by using 64k 
and 128k blocks.
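
For anyone who wants to reproduce the comparison, the host-side numbers for
each zvol can be read directly (dataset path taken from the creation command
above; adjust to taste):

    zfs get volblocksize,volsize,used,referenced,compressratio \
        vm00/chyves/guests/myguest/diskY

and compared against what the guest reports for the copied data.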

It would appear that 64k is a good choice for volblocksize if you are using a 
zvol to back your VM, and the VM is using the virtual device for a zpool. 
Incidentally, I believe this is the default when creating VMs in FreeNAS.
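
In other words, if I were provisioning these again, the creation step would be
the same command as above with only the block size changed (volblocksize is
fixed at creation time and can't be changed afterward):

    zfs create -o volmode=dev -o volblocksize=64k -V 30g \
        vm00/chyves/guests/myguest/diskY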

- .Dustin





Re: bhyve uses all available memory during IO-intensive operations

2017-12-01 Thread Dustin Wenz
I have noticed significant storage amplification for my zvols; that could very 
well be the reason. I would like to know more about why it happens. 

Since the volblocksize is 512 bytes, I certainly expect extra CPU overhead (and 
maybe an extra 1k or so worth of checksums for each 128k block in the VM), but 
how do you get a 10x expansion in stored data?

What is the recommended zvol block size for a FreeBSD/ZFS guest? Perhaps 4k, to 
match the most common mass storage sector size?
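
(For reference, the block size an existing chyves volume was created with can
be checked from the host; the dataset path below is just an example of where
chyves keeps them:

    zfs get volblocksize vm00/chyves/guests/myguest/disk0

which will show whether it was created with the 512B default.)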

- .Dustin

> On Dec 1, 2017, at 9:18 PM, K. Macy <km...@freebsd.org> wrote:
> 
> One thing to watch out for with chyves if your virtual disk is more
> than 20G is the fact that it uses 512 byte blocks for the zvols it
> creates. I ended up using up 1.4TB only half filling up a 250G zvol.
> Chyves is quick and easy, but it's not exactly production ready.
> 
> -M
> 
> 
> 
>> On Thu, Nov 30, 2017 at 3:15 PM, Dustin Wenz <dustinw...@ebureau.com> wrote:
>> I'm using chyves on FreeBSD 11.1 RELEASE to manage a few VMs (guest OS is 
>> also FreeBSD 11.1). Their sole purpose is to house some medium-sized 
>> Postgres databases (100-200GB). The host system has 64GB of real memory and 
>> 112GB of swap. I have configured each guest to only use 16GB of memory, yet 
>> while doing my initial database imports in the VMs, bhyve will quickly grow 
>> to use all available system memory and then be killed by the kernel:
>> 
>>kernel: swap_pager: I/O error - pageout failed; blkno 1735,size 4096, 
>> error 12
>>kernel: swap_pager: I/O error - pageout failed; blkno 1610,size 4096, 
>> error 12
>>kernel: swap_pager: I/O error - pageout failed; blkno 1763,size 4096, 
>> error 12
>>kernel: pid 41123 (bhyve), uid 0, was killed: out of swap space
>> 
>> The OOM condition seems related to doing moderate IO within the VM, though 
>> nothing within the VM itself shows high memory usage. This is the chyves 
>> config for one of them:
>> 
>>bargs  -A -H -P -S
>>bhyve_disk_typevirtio-blk
>>bhyve_net_type virtio-net
>>bhyveload_flags
>>chyves_guest_version   0300
>>cpu4
>>creation   Created on Mon Oct 23 16:17:04 CDT 2017 by 
>> chyves v0.2.0 2016/09/11 using __create()
>>loader bhyveload
>>net_ifaces tap51
>>os default
>>ram16G
>>rcboot 0
>>revert_to_snapshot
>>revert_to_snapshot_method  off
>>serial nmdm51
>>template   no
>>uuid   8495a130-b837-11e7-b092-0025909a8b56
>> 
>> 
>> I've also tried using different bhyve_disk_types, with no improvement. How 
>> is it that bhyve can use far more memory than I'm specifying?
>> 
>>- .Dustin


Re: bhyve uses all available memory during IO-intensive operations

2017-12-01 Thread Dustin Wenz
I've been running a database stress test on my VMs for the last few hours 
without issue, and I've noticed no unexpected memory usage. Prior to changing 
the wired option, this would never have run as long. I haven't limited the ARC 
size yet, but I probably will since it sounds like best practice for a bhyve 
host.
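
(For anyone wanting to do the same, the usual way to cap the ARC on a FreeBSD
host is a loader tunable; the value below is just an example and should be
sized against how much RAM the guests will use:

    # /boot/loader.conf
    vfs.zfs.arc_max="16G"

It takes effect on the next boot.)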

The commit history shows that chyves defaults to -S if you are hosting from 
FreeBSD 10.3 or later. I'm sure they had a reason for doing that, but I don't 
know what that would be. It seems to be an inefficient use of main memory if you 
need to run a lot of VMs.

Thanks everyone for helping to nail this down!

- .Dustin


> On Dec 1, 2017, at 12:09 PM, Dustin Wenz <dustinw...@ebureau.com> wrote:
> 
> Yep, and that's also why bhyve is getting killed instead of paging out. For 
> some inexplicable reason, chyves defaulted to setting -S on new VMs. That has 
> the effect of wiring in the max amount of memory for each guest at startup.
> 
> I changed the bargs option to "-A -H -P" instead of "-A -H -P -S". Memory 
> pressure is greatly alleviated upon restart. I'm going to do more testing, 
> but I suspect this will fix my problem. Take this as a PSA for chyves users.
> 
>   - .Dustin
> 
>> On Dec 1, 2017, at 11:56 AM, Peter Grehan <gre...@freebsd.org> wrote:
>> 
>> The -S flag to bhyve wires guest memory so it won't be swapped out.
>> 
>> later,
>> 
>> Peter.
> 





Re: bhyve uses all available memory during IO-intensive operations

2017-12-01 Thread Dustin Wenz
Yep, and that's also why bhyve is getting killed instead of paging out. For 
some inexplicable reason, chyves defaulted to setting -S on new VMs. That has 
the effect of wiring in the max amount of memory for each guest at startup.

I changed the bargs option to "-A -H -P" instead of "-A -H -P -S". Memory 
pressure is greatly alleviated upon restart. I'm going to do more testing, but 
I suspect this will fix my problem. Take this as a PSA for chyves users.
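
For anyone running bhyve by hand rather than through chyves, the equivalent
invocation without -S looks roughly like the following; the slot numbers,
device paths, guest name and nmdm device are illustrative, the point is only
that -S is absent so guest memory is not wired:

    bhyveload -m 16G -d /dev/zvol/vm00/chyves/guests/myguest/disk0 myguest
    bhyve -c 4 -m 16G -A -H -P \
        -s 0:0,hostbridge -s 1:0,lpc \
        -s 2:0,virtio-blk,/dev/zvol/vm00/chyves/guests/myguest/disk0 \
        -s 3:0,virtio-net,tap51 \
        -l com1,/dev/nmdm51A \
        myguest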

- .Dustin

> On Dec 1, 2017, at 11:56 AM, Peter Grehan  wrote:
> 
> The -S flag to bhyve wires guest memory so it won't be swapped out.
> 
> later,
> 
> Peter.





Re: bhyve uses all available memory during IO-intensive operations

2017-12-01 Thread Dustin Wenz
Here's the top -uS output from a test this morning:

last pid: 57375;  load averages:  8.29,  7.02,  4.05   up 38+22:19:14  11:28:25
68 processes:  2 running, 65 sleeping, 1 waiting
CPU:  0.1% user,  0.0% nice, 40.4% system,  0.4% interrupt, 59.1% idle
Mem: 2188K Active, 4K Inact, 62G Wired, 449M Free
ARC: 7947M Total, 58M MFU, 3364M MRU, 1000M Anon, 2620M Header, 904M Other
     4070M Compressed, 4658M Uncompressed, 1.14:1 Ratio
Swap: 112G Total, 78M Used, 112G Free, 4K In, 12K Out

  PID UID  THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
   11   0   24 155 ki31     0K   384K RUN     0    ??? 1446.82% idle
    0   0  644 -16    -     0K 10304K swapin 21 554:59  492.45% kernel
57333   0   30  20    0 17445M  1325M kqread  9  16:38  357.42% bhyve
   15   0   10  -8    -     0K   192K arc_re 20  80:54   81.55% zfskern
    5   0    6 -16    -     0K    96K -       5  12:35   11.50% cam
   12   0   53 -60    -     0K   848K WAIT   21  74:35    9.40% intr
41094   0   30  20    0 17445M 14587M kqread 17 301:29    0.39% bhyve

Dec  1 11:29:31  service014 kernel: pid 57333 (bhyve), uid 0, was 
killed: out of swap space
Dec  1 11:29:31  service014 kernel: pid 69549 (bhyve), uid 0, was 
killed: out of swap space
Dec  1 11:29:31  service014 kernel: pid 41094 (bhyve), uid 0, was 
killed: out of swap space


This was with three VMs running, but only one of them was doing any IO. Note 
that the whole machine hung for about 60 seconds before the VMs were shut 
down and memory recovered. That's why the top output is over a minute older 
than the kill messages (top had stopped refreshing).

What I'm suspicious of is that almost all of the physical memory is wired. If 
that is bhyve memory, why did it not page out?


- .Dustin


> On Nov 30, 2017, at 5:15 PM, Dustin Wenz <dustinw...@ebureau.com> wrote:
> 
> I'm using chyves on FreeBSD 11.1 RELEASE to manage a few VMs (guest OS is 
> also FreeBSD 11.1). Their sole purpose is to house some medium-sized Postgres 
> databases (100-200GB). The host system has 64GB of real memory and 112GB of 
> swap. I have configured each guest to only use 16GB of memory, yet while 
> doing my initial database imports in the VMs, bhyve will quickly grow to use 
> all available system memory and then be killed by the kernel:
> 
>   kernel: swap_pager: I/O error - pageout failed; blkno 1735,size 4096, 
> error 12
>   kernel: swap_pager: I/O error - pageout failed; blkno 1610,size 4096, 
> error 12
>   kernel: swap_pager: I/O error - pageout failed; blkno 1763,size 4096, 
> error 12
>   kernel: pid 41123 (bhyve), uid 0, was killed: out of swap space
> 
> The OOM condition seems related to doing moderate IO within the VM, though 
> nothing within the VM itself shows high memory usage. This is the chyves 
> config for one of them:
> 
>   bargs  -A -H -P -S
>   bhyve_disk_typevirtio-blk
>   bhyve_net_type virtio-net
>   bhyveload_flags
>   chyves_guest_version   0300
>   cpu4
>   creation   Created on Mon Oct 23 16:17:04 CDT 2017 by 
> chyves v0.2.0 2016/09/11 using __create()
>   loader bhyveload
>   net_ifaces tap51
>   os default
>   ram16G
>   rcboot 0
>   revert_to_snapshot
>   revert_to_snapshot_method  off
>   serial nmdm51
>   template   no
>   uuid   8495a130-b837-11e7-b092-0025909a8b56
> 
> 
> I've also tried using different bhyve_disk_types, with no improvement. How is 
> it that bhyve can use far more memory than I'm specifying?
> 
>   - .Dustin





Re: bhyve uses all available memory during IO-intensive operations

2017-11-30 Thread Dustin Wenz
I am using a zvol as the storage for the VM, and I do not have any ARC limits 
set. However, the bhyve process itself ends up grabbing the vast majority of 
memory. 

I’ll run a test tomorrow to get the exact output from top.
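
(In the meantime, a quick way to separate ARC usage from bhyve's own on the
host is:

    sysctl kstat.zfs.misc.arcstats.size vfs.zfs.arc_max

alongside the RES column for the bhyve processes in top.)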

   - .Dustin

> On Nov 30, 2017, at 5:28 PM, Allan Jude <allanj...@freebsd.org> wrote:
> 
>> On 11/30/2017 18:15, Dustin Wenz wrote:
>> I'm using chyves on FreeBSD 11.1 RELEASE to manage a few VMs (guest OS is 
>> also FreeBSD 11.1). Their sole purpose is to house some medium-sized 
>> Postgres databases (100-200GB). The host system has 64GB of real memory and 
>> 112GB of swap. I have configured each guest to only use 16GB of memory, yet 
>> while doing my initial database imports in the VMs, bhyve will quickly grow 
>> to use all available system memory and then be killed by the kernel:
>> 
>>kernel: swap_pager: I/O error - pageout failed; blkno 1735,size 4096, 
>> error 12
>>kernel: swap_pager: I/O error - pageout failed; blkno 1610,size 4096, 
>> error 12
>>kernel: swap_pager: I/O error - pageout failed; blkno 1763,size 4096, 
>> error 12
>>kernel: pid 41123 (bhyve), uid 0, was killed: out of swap space
>> 
>> The OOM condition seems related to doing moderate IO within the VM, though 
>> nothing within the VM itself shows high memory usage. This is the chyves 
>> config for one of them:
>> 
>>bargs  -A -H -P -S
>>bhyve_disk_typevirtio-blk
>>bhyve_net_type virtio-net
>>bhyveload_flags
>>chyves_guest_version   0300
>>cpu4
>>creation   Created on Mon Oct 23 16:17:04 CDT 2017 by 
>> chyves v0.2.0 2016/09/11 using __create()
>>loader bhyveload
>>net_ifaces tap51
>>os default
>>ram16G
>>rcboot 0
>>revert_to_snapshot
>>revert_to_snapshot_method  off
>>serial nmdm51
>>template   no
>>uuid   8495a130-b837-11e7-b092-0025909a8b56
>> 
>> 
>> I've also tried using different bhyve_disk_types, with no improvement. How 
>> is it that bhyve can use far more memory than I'm specifying?
>> 
>>- .Dustin
>> 
> 
> Can you show 'top' output? What makes you think bhyve is using the
> memory? Are you using ZFS? Have you limited the vfs.zfs.arc_max to leave
> some free RAM for the bhyve instances?
> 
> -- 
> Allan Jude


bhyve uses all available memory during IO-intensive operations

2017-11-30 Thread Dustin Wenz
I'm using chyves on FreeBSD 11.1 RELEASE to manage a few VMs (guest OS is also 
FreeBSD 11.1). Their sole purpose is to house some medium-sized Postgres 
databases (100-200GB). The host system has 64GB of real memory and 112GB of 
swap. I have configured each guest to only use 16GB of memory, yet while doing 
my initial database imports in the VMs, bhyve will quickly grow to use all 
available system memory and then be killed by the kernel:

kernel: swap_pager: I/O error - pageout failed; blkno 1735,size 4096, 
error 12
kernel: swap_pager: I/O error - pageout failed; blkno 1610,size 4096, 
error 12
kernel: swap_pager: I/O error - pageout failed; blkno 1763,size 4096, 
error 12
kernel: pid 41123 (bhyve), uid 0, was killed: out of swap space

The OOM condition seems related to doing moderate IO within the VM, though 
nothing within the VM itself shows high memory usage. This is the chyves config 
for one of them:

bargs  -A -H -P -S
bhyve_disk_typevirtio-blk
bhyve_net_type virtio-net
bhyveload_flags
chyves_guest_version   0300
cpu4
creation   Created on Mon Oct 23 16:17:04 CDT 2017 by 
chyves v0.2.0 2016/09/11 using __create()
loader bhyveload
net_ifaces tap51
os default
ram16G
rcboot 0
revert_to_snapshot
revert_to_snapshot_method  off
serial nmdm51
template   no
uuid   8495a130-b837-11e7-b092-0025909a8b56


I've also tried using different bhyve_disk_types, with no improvement. How is 
it that bhyve can use far more memory than I'm specifying?

- .Dustin
