Re: Storage overhead on zvols
> On Dec 5, 2017, at 10:41 AM, Rodney W. Grimes <freebsd-...@pdx.rh.cn85.dnsmgr.net> wrote:
>
>> Dustin Wenz wrote:
>>> I'm not using ZFS in my VMs for data integrity (the host already
>>> provides that); it's mainly for the easy creation and management of
>>> filesystems, and the ability to do snapshots for rollback and
>>> replication.
>>
>> snapshot and replication works fine on the host, acting on the zvol.
>
> I suspect he is snapshotting and doing send/recvs of something
> much less than the zvol, probably some datasets, maybe boot
> environments. A snapshot of the whole zvol is fine if you're managing
> data at the VM level, but not so good if you've got lots of stuff
> going on inside the VM.

Exactly; it's useful to have control of each filesystem discretely.

>>> Some of my deployments have hundreds of filesystems in
>>> an organized hierarchy, with delegated permissions and automated
>>> snapshots, send/recvs, and clones for various operations.
>>
>> what kind of zpool do you use in the guest, to avoid unwanted additional
>> redundancy?
>
> Just a simple stripe of 1 device would be my guess, though you're
> still gonna have metadata redundancy.

Also correct; I'm just using the zvol virtual device as a single-disk pool.

>> did you benchmark the space or time efficiency of ZFS vs. UFS?
>>
>> in some bsd related meeting this year i asked allan jude for a bhyve
>> level null mount, so that we could access at / inside the guest some
>> subtree of the host, and avoid block devices and file systems
>> altogether. right now i have to use nfs for that, which is irritating.
>
> This is not as simple as it seems; remember, bhyve is just presenting
> a hardware environment, and hardware environments don't have a file
> system concept per se, unlike jails, which provide a software
> environment.
>
> In effect, what you're asking for is what NFS does, so use NFS and
> accept that this is the way to get what you want. Sure, you could
> implement a virt-vfs, but I wonder how close the spec of that would
> be to the spec of NFS.
>
> Or maybe that's the answer: implement virt-vfs as a more efficient
> way to transport NFS calls in and out of the guest.

I've not done any deliberate comparisons for latency or throughput. What I've decided to virtualize does not have any exceptional performance requirements. If I needed the best possible IO, I would lean toward using jails instead of a hypervisor.

- .Dustin
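The host-level vs. guest-level granularity being discussed can be made concrete with a sketch. All pool, guest, and host names here are hypothetical (loosely following the vm00/chyves layout that appears elsewhere in the thread); this is not from the original messages:

```shell
# Host side: one snapshot covers the guest's entire virtual disk.
# Coarse-grained, but needs no cooperation from the guest.
zfs snapshot vm00/chyves/guests/myguest/disk0@pre-upgrade
zfs send vm00/chyves/guests/myguest/disk0@pre-upgrade | \
    ssh backuphost zfs recv tank/vm-images/myguest-disk0

# Guest side: the VM runs its own pool on the zvol, so each filesystem
# can be snapshotted, sent, and delegated independently (run in the guest).
zfs snapshot -r guestpool/postgres@nightly
zfs send -i @lastnight guestpool/postgres/db01@nightly | \
    ssh backuphost zfs recv tank/db01
```

The trade-off is exactly as described above: the host-side snapshot is atomic for the whole disk image, while the guest-side approach gives per-dataset control at the cost of running ZFS inside the VM.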
Re: Storage overhead on zvols
Thanks for linking that resource. The purpose of my posting was to increase the body of knowledge available to people who are running bhyve on ZFS. It's a versatile way to deploy guests, but I haven't seen much practical advice about doing it efficiently. Allan's explanation yesterday of how allocations are padded is exactly the sort of breakdown I could have used when I first started provisioning VMs. I'm sure other people will find this conversation useful as well.

- .Dustin

> On Dec 4, 2017, at 9:37 PM, Adam Vande More <amvandem...@gmail.com> wrote:
>
> On Mon, Dec 4, 2017 at 5:19 PM, Dustin Wenz <dustinw...@ebureau.com> wrote:
>> I'm starting a new thread based on the previous discussion in "bhyve uses
>> all available memory during IO-intensive operations" relating to size
>> inflation of bhyve data stored on zvols. I've done some experimenting
>> with this, and I think it will be useful for others.
>>
>> The zvols listed here were created with this command:
>>
>>     zfs create -o volmode=dev -o volblocksize=Xk -V 30g vm00/chyves/guests/myguest/diskY
>>
>> The zvols were created on a raidz1 pool of four disks. For each zvol, I
>> created a basic ZFS filesystem in the guest using all default tuning
>> (128k recordsize, etc). I then copied the same 8.2GB dataset to each
>> filesystem.
>>
>>     volblocksize    size amplification
>>     512B            11.7x
>>     4k              1.45x
>>     8k              1.45x
>>     16k             1.5x
>>     32k             1.65x
>>     64k             1x
>>     128k            1x
>>
>> The worst case is with a 512B volblocksize, where the space used is more
>> than 11 times the size of the data stored within the guest. The size
>> efficiency gains are non-linear as I continue from 4k and double the
>> block sizes, with 32k blocks being the second-worst. The amount of
>> wasted space was minimized by using 64k and 128k blocks.
>>
>> It would appear that 64k is a good choice for volblocksize if you are
>> using a zvol to back your VM, and the VM is using the virtual device for
>> a zpool. Incidentally, I believe this is the default when creating VMs
>> in FreeNAS.
>
> I'm not sure what your purpose is behind the posting, but if it's simply
> a "why this behavior," you can find more detail here, as well as some
> calculation leg work:
>
> https://www.delphix.com/blog/delphix-engineering/zfs-raidz-stripe-width-or-how-i-learned-stop-worrying-and-love-raidz
>
> --
> Adam
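The padding explanation referenced above can be turned into a rough back-of-the-envelope model. This is a sketch only, under assumptions not stated in the thread: ashift=12 (4 KiB sectors) and the experiment's 4-disk raidz1. It reproduces the extremes of the measured table (512B comes out around 11.6x relative to the 128k baseline, vs. 11.7x measured; 4k and 8k come out 1.45x), while the mid-range sizes diverge because guest recordsize and metadata also matter there:

```shell
#!/bin/sh
# Model of raidz1 allocation overhead per logical block.
# Assumed (not from the thread): ashift=12, 4-disk raidz1.
SECTOR=4096 NDISKS=4 NPARITY=1

# Print bytes physically allocated for one logical block of $1 bytes.
raidz1_alloc() {
    vbs=$1
    data=$(( (vbs + SECTOR - 1) / SECTOR ))
    # one parity sector per stripe of up to (NDISKS - NPARITY) data sectors
    stripes=$(( (data + NDISKS - NPARITY - 1) / (NDISKS - NPARITY) ))
    total=$(( data + stripes * NPARITY ))
    # raidz pads each allocation up to a multiple of (NPARITY + 1) sectors
    # so that freed space remains allocatable -- this is the padding cost
    total=$(( (total + NPARITY) / (NPARITY + 1) * (NPARITY + 1) ))
    echo $(( total * SECTOR ))
}

for vbs in 512 4096 8192 16384 32768 65536 131072; do
    alloc=$(raidz1_alloc $vbs)
    printf '%7sB volblock -> %8sB allocated (%d.%02dx raw)\n' \
        "$vbs" "$alloc" $(( alloc / vbs )) $(( alloc * 100 / vbs % 100 ))
done
```

For a 512B block the model allocates one 4 KiB data sector plus one 4 KiB parity sector (16x raw), while a 128k block needs 44 sectors for 32 sectors of data (1.375x raw); the ratio of the two is the ~11.6x figure.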
Storage overhead on zvols
I'm starting a new thread based on the previous discussion in "bhyve uses all available memory during IO-intensive operations" relating to size inflation of bhyve data stored on zvols. I've done some experimenting with this, and I think it will be useful for others.

The zvols listed here were created with this command:

    zfs create -o volmode=dev -o volblocksize=Xk -V 30g vm00/chyves/guests/myguest/diskY

The zvols were created on a raidz1 pool of four disks. For each zvol, I created a basic ZFS filesystem in the guest using all default tuning (128k recordsize, etc). I then copied the same 8.2GB dataset to each filesystem.

    volblocksize    size amplification
    512B            11.7x
    4k              1.45x
    8k              1.45x
    16k             1.5x
    32k             1.65x
    64k             1x
    128k            1x

The worst case is with a 512B volblocksize, where the space used is more than 11 times the size of the data stored within the guest. The size efficiency gains are non-linear as I continue from 4k and double the block sizes, with 32k blocks being the second-worst. The amount of wasted space was minimized by using 64k and 128k blocks.

It would appear that 64k is a good choice for volblocksize if you are using a zvol to back your VM, and the VM is using the virtual device for a zpool. Incidentally, I believe this is the default when creating VMs in FreeNAS.

- .Dustin
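The experiment above can be scripted for anyone wanting to repeat the measurement. A sketch with hypothetical dataset names (only the `zfs create` line is from the original post); the amplification figure falls out of comparing physically referenced space against logically referenced space on the host:

```shell
# Create one 30G test zvol per block size under test (names hypothetical)
for vbs in 512 4k 8k 16k 32k 64k 128k; do
    zfs create -o volmode=dev -o volblocksize=$vbs -V 30g vm00/test/disk-$vbs
done

# ...install a guest on each zvol, copy the same dataset into each guest,
# then compare physical vs. logical space consumption from the host:
zfs list -o name,volblocksize,refer,logicalreferenced -r vm00/test
```

`refer` shows space actually allocated on the raidz1 pool for each zvol's data, while `logicalreferenced` shows what the guest thinks it wrote; their ratio is the per-volblocksize amplification reported in the table.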
Re: bhyve uses all available memory during IO-intensive operations
I have noticed significant storage amplification for my zvols; that could very well be the reason. I would like to know more about why it happens. Since the volblocksize is 512 bytes, I certainly expect extra CPU overhead (and maybe an extra 1k or so worth of checksums for each 128k block in the VM), but how do you get a 10X expansion in stored data?

What is the recommended zvol block size for a FreeBSD/ZFS guest? Perhaps 4k, to match the most common mass storage sector size?

- .Dustin

> On Dec 1, 2017, at 9:18 PM, K. Macy <km...@freebsd.org> wrote:
>
> One thing to watch out for with chyves if your virtual disk is more
> than 20G is the fact that it uses 512 byte blocks for the zvols it
> creates. I ended up using up 1.4TB only half filling up a 250G zvol.
> Chyves is quick and easy, but it's not exactly production ready.
>
> -M
>
>> On Thu, Nov 30, 2017 at 3:15 PM, Dustin Wenz <dustinw...@ebureau.com> wrote:
>> I'm using chyves on FreeBSD 11.1-RELEASE to manage a few VMs (guest OS
>> is also FreeBSD 11.1). Their sole purpose is to house some medium-sized
>> Postgres databases (100-200GB). The host system has 64GB of real memory
>> and 112GB of swap. I have configured each guest to only use 16GB of
>> memory, yet while doing my initial database imports in the VMs, bhyve
>> will quickly grow to use all available system memory and then be killed
>> by the kernel:
>>
>>     kernel: swap_pager: I/O error - pageout failed; blkno 1735, size 4096, error 12
>>     kernel: swap_pager: I/O error - pageout failed; blkno 1610, size 4096, error 12
>>     kernel: swap_pager: I/O error - pageout failed; blkno 1763, size 4096, error 12
>>     kernel: pid 41123 (bhyve), uid 0, was killed: out of swap space
>>
>> The OOM condition seems related to doing moderate IO within the VM,
>> though nothing within the VM itself shows high memory usage. This is
>> the chyves config for one of them:
>>
>>     bargs                      -A -H -P -S
>>     bhyve_disk_type            virtio-blk
>>     bhyve_net_type             virtio-net
>>     bhyveload_flags
>>     chyves_guest_version       0300
>>     cpu                        4
>>     creation                   Created on Mon Oct 23 16:17:04 CDT 2017 by chyves v0.2.0 2016/09/11 using __create()
>>     loader                     bhyveload
>>     net_ifaces                 tap51
>>     os                         default
>>     ram                        16G
>>     rcboot                     0
>>     revert_to_snapshot
>>     revert_to_snapshot_method  off
>>     serial                     nmdm51
>>     template                   no
>>     uuid                       8495a130-b837-11e7-b092-0025909a8b56
>>
>> I've also tried using different bhyve_disk_types, with no improvement.
>> How is it that bhyve can use far more memory than I'm specifying?
>>
>> - .Dustin

___
freebsd-virtualization@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-virtualization
To unsubscribe, send any mail to "freebsd-virtualization-unsubscr...@freebsd.org"
Re: bhyve uses all available memory during IO-intensive operations
I've been running a database stress test on my VMs for the last few hours without issue, and I've noticed no unexpected memory usage. Prior to changing the wired option, this would never have run this long. I haven't limited the ARC size yet, but I probably will, since it sounds like best practice for a bhyve host.

The commit history shows that chyves defaults to -S if you are hosting from FreeBSD 10.3 or later. I'm sure they had a reason for doing that, but I don't know what it would be. It seems to be an inefficient use of main memory if you need to run a lot of VMs.

Thanks everyone for helping to nail this down!

- .Dustin

> On Dec 1, 2017, at 12:09 PM, Dustin Wenz <dustinw...@ebureau.com> wrote:
>
> Yep, and that's also why bhyve is getting killed instead of paging out.
> For some inexplicable reason, chyves defaulted to setting -S on new VMs.
> That has the effect of wiring in the max amount of memory for each guest
> at startup.
>
> I changed the bargs option to "-A -H -P" instead of "-A -H -P -S".
> Memory pressure is greatly alleviated upon restart. I'm going to do more
> testing, but I suspect this will fix my problem. Take this as a PSA for
> chyves users.
>
> - .Dustin
>
>> On Dec 1, 2017, at 11:56 AM, Peter Grehan <gre...@freebsd.org> wrote:
>>
>> The -S flag to bhyve wires guest memory so it won't be swapped out.
>>
>> later,
>>
>> Peter.
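Limiting ARC size as described above is a loader tunable on FreeBSD. A sketch; the 8G value is a judgment call for the 64GB host in this thread (leaving headroom for several 16GB guests plus the host itself), not a figure from the discussion:

```shell
# /boot/loader.conf on the bhyve host -- cap the ZFS ARC so it cannot
# compete with guest memory; takes effect at the next boot
vfs.zfs.arc_max="8G"
```

The ARC will normally shrink under memory pressure, but with wired guest memory and heavy IO a hard cap is the predictable option, which is why it is commonly recommended for bhyve hosts running on ZFS.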
Re: bhyve uses all available memory during IO-intensive operations
Yep, and that's also why bhyve is getting killed instead of paging out. For some inexplicable reason, chyves defaulted to setting -S on new VMs. That has the effect of wiring in the max amount of memory for each guest at startup.

I changed the bargs option to "-A -H -P" instead of "-A -H -P -S". Memory pressure is greatly alleviated upon restart. I'm going to do more testing, but I suspect this will fix my problem. Take this as a PSA for chyves users.

- .Dustin

> On Dec 1, 2017, at 11:56 AM, Peter Grehan <gre...@freebsd.org> wrote:
>
> The -S flag to bhyve wires guest memory so it won't be swapped out.
>
> later,
>
> Peter.
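For anyone running bhyve without a manager like chyves, the same fix simply means not passing -S. A sketch of an equivalent invocation; the slot layout is illustrative, with the disk path, tap, and nmdm names borrowed from the config posted earlier in this thread:

```shell
# Start the guest with its 16G reserved but NOT wired (no -S), so the
# host can page it out under memory pressure instead of invoking the OOM killer
bhyve -c 4 -m 16G -A -H -P \
    -s 0,hostbridge \
    -s 1,lpc \
    -s 2,virtio-blk,/dev/zvol/vm00/chyves/guests/myguest/disk0 \
    -s 3,virtio-net,tap51 \
    -l com1,/dev/nmdm51A \
    myguest
```

Wiring (-S) is still the right choice for latency-sensitive guests or PCI passthrough; the problem in this thread was that it was on by default for every VM on an oversubscribed host.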
Re: bhyve uses all available memory during IO-intensive operations
Here's the top -uS output from a test this morning:

    last pid: 57375;  load averages: 8.29, 7.02, 4.05   up 38+22:19:14  11:28:25
    68 processes: 2 running, 65 sleeping, 1 waiting
    CPU:  0.1% user, 0.0% nice, 40.4% system, 0.4% interrupt, 59.1% idle
    Mem: 2188K Active, 4K Inact, 62G Wired, 449M Free
    ARC: 7947M Total, 58M MFU, 3364M MRU, 1000M Anon, 2620M Header, 904M Other
         4070M Compressed, 4658M Uncompressed, 1.14:1 Ratio
    Swap: 112G Total, 78M Used, 112G Free, 4K In, 12K Out

      PID  UID  THR PRI NICE   SIZE    RES STATE   C   TIME     WCPU COMMAND
       11    0   24 155 ki31     0K   384K RUN     0    ???  1446.82% idle
        0    0  644 -16    -     0K 10304K swapin 21 554:59  492.45% kernel
    57333    0   30  20    0 17445M  1325M kqread  9  16:38  357.42% bhyve
       15    0   10  -8    -     0K   192K arc_re 20  80:54   81.55% zfskern
        5    0    6 -16    -     0K    96K -       5  12:35   11.50% cam
       12    0   53 -60    -     0K   848K WAIT   21  74:35    9.40% intr
    41094    0   30  20    0 17445M 14587M kqread 17 301:29    0.39% bhyve

    Dec  1 11:29:31 service014 kernel: pid 57333 (bhyve), uid 0, was killed: out of swap space
    Dec  1 11:29:31 service014 kernel: pid 69549 (bhyve), uid 0, was killed: out of swap space
    Dec  1 11:29:31 service014 kernel: pid 41094 (bhyve), uid 0, was killed: out of swap space

This was with three VMs running, but only one of them was doing any IO. Note that the whole machine hung for about 60 seconds before the VMs were shut down and memory was recovered. That's why the top output is over a minute older than the kill messages (top had stopped refreshing).

What I'm suspicious of is that almost all of the physical memory is wired. If that is bhyve memory, why did it not page out?

- .Dustin

> On Nov 30, 2017, at 5:15 PM, Dustin Wenz <dustinw...@ebureau.com> wrote:
>
> I'm using chyves on FreeBSD 11.1-RELEASE to manage a few VMs (guest OS
> is also FreeBSD 11.1). Their sole purpose is to house some medium-sized
> Postgres databases (100-200GB). The host system has 64GB of real memory
> and 112GB of swap. I have configured each guest to only use 16GB of
> memory, yet while doing my initial database imports in the VMs, bhyve
> will quickly grow to use all available system memory and then be killed
> by the kernel:
>
>     kernel: swap_pager: I/O error - pageout failed; blkno 1735, size 4096, error 12
>     kernel: swap_pager: I/O error - pageout failed; blkno 1610, size 4096, error 12
>     kernel: swap_pager: I/O error - pageout failed; blkno 1763, size 4096, error 12
>     kernel: pid 41123 (bhyve), uid 0, was killed: out of swap space
>
> The OOM condition seems related to doing moderate IO within the VM,
> though nothing within the VM itself shows high memory usage. This is
> the chyves config for one of them:
>
>     bargs                      -A -H -P -S
>     bhyve_disk_type            virtio-blk
>     bhyve_net_type             virtio-net
>     bhyveload_flags
>     chyves_guest_version       0300
>     cpu                        4
>     creation                   Created on Mon Oct 23 16:17:04 CDT 2017 by chyves v0.2.0 2016/09/11 using __create()
>     loader                     bhyveload
>     net_ifaces                 tap51
>     os                         default
>     ram                        16G
>     rcboot                     0
>     revert_to_snapshot
>     revert_to_snapshot_method  off
>     serial                     nmdm51
>     template                   no
>     uuid                       8495a130-b837-11e7-b092-0025909a8b56
>
> I've also tried using different bhyve_disk_types, with no improvement.
> How is it that bhyve can use far more memory than I'm specifying?
>
> - .Dustin
Re: bhyve uses all available memory during IO-intensive operations
I am using a zvol as the storage for the VM, and I do not have any ARC limits set. However, the bhyve process itself ends up grabbing the vast majority of memory. I'll run a test tomorrow to get the exact output from top.

- .Dustin

> On Nov 30, 2017, at 5:28 PM, Allan Jude <allanj...@freebsd.org> wrote:
>
>> On 11/30/2017 18:15, Dustin Wenz wrote:
>> I'm using chyves on FreeBSD 11.1-RELEASE to manage a few VMs (guest OS
>> is also FreeBSD 11.1). Their sole purpose is to house some medium-sized
>> Postgres databases (100-200GB). The host system has 64GB of real memory
>> and 112GB of swap. I have configured each guest to only use 16GB of
>> memory, yet while doing my initial database imports in the VMs, bhyve
>> will quickly grow to use all available system memory and then be killed
>> by the kernel:
>>
>>     kernel: swap_pager: I/O error - pageout failed; blkno 1735, size 4096, error 12
>>     kernel: swap_pager: I/O error - pageout failed; blkno 1610, size 4096, error 12
>>     kernel: swap_pager: I/O error - pageout failed; blkno 1763, size 4096, error 12
>>     kernel: pid 41123 (bhyve), uid 0, was killed: out of swap space
>>
>> The OOM condition seems related to doing moderate IO within the VM,
>> though nothing within the VM itself shows high memory usage. This is
>> the chyves config for one of them:
>>
>>     bargs                      -A -H -P -S
>>     bhyve_disk_type            virtio-blk
>>     bhyve_net_type             virtio-net
>>     bhyveload_flags
>>     chyves_guest_version       0300
>>     cpu                        4
>>     creation                   Created on Mon Oct 23 16:17:04 CDT 2017 by chyves v0.2.0 2016/09/11 using __create()
>>     loader                     bhyveload
>>     net_ifaces                 tap51
>>     os                         default
>>     ram                        16G
>>     rcboot                     0
>>     revert_to_snapshot
>>     revert_to_snapshot_method  off
>>     serial                     nmdm51
>>     template                   no
>>     uuid                       8495a130-b837-11e7-b092-0025909a8b56
>>
>> I've also tried using different bhyve_disk_types, with no improvement.
>> How is it that bhyve can use far more memory than I'm specifying?
>>
>> - .Dustin
>
> Can you show 'top' output? What makes you think bhyve is using the
> memory? Are you using ZFS? Have you limited vfs.zfs.arc_max to leave
> some free RAM for the bhyve instances?
>
> --
> Allan Jude
bhyve uses all available memory during IO-intensive operations
I'm using chyves on FreeBSD 11.1-RELEASE to manage a few VMs (guest OS is also FreeBSD 11.1). Their sole purpose is to house some medium-sized Postgres databases (100-200GB). The host system has 64GB of real memory and 112GB of swap. I have configured each guest to only use 16GB of memory, yet while doing my initial database imports in the VMs, bhyve will quickly grow to use all available system memory and then be killed by the kernel:

    kernel: swap_pager: I/O error - pageout failed; blkno 1735, size 4096, error 12
    kernel: swap_pager: I/O error - pageout failed; blkno 1610, size 4096, error 12
    kernel: swap_pager: I/O error - pageout failed; blkno 1763, size 4096, error 12
    kernel: pid 41123 (bhyve), uid 0, was killed: out of swap space

The OOM condition seems related to doing moderate IO within the VM, though nothing within the VM itself shows high memory usage. This is the chyves config for one of them:

    bargs                      -A -H -P -S
    bhyve_disk_type            virtio-blk
    bhyve_net_type             virtio-net
    bhyveload_flags
    chyves_guest_version       0300
    cpu                        4
    creation                   Created on Mon Oct 23 16:17:04 CDT 2017 by chyves v0.2.0 2016/09/11 using __create()
    loader                     bhyveload
    net_ifaces                 tap51
    os                         default
    ram                        16G
    rcboot                     0
    revert_to_snapshot
    revert_to_snapshot_method  off
    serial                     nmdm51
    template                   no
    uuid                       8495a130-b837-11e7-b092-0025909a8b56

I've also tried using different bhyve_disk_types, with no improvement. How is it that bhyve can use far more memory than I'm specifying?

- .Dustin