On 20.09.20 15:25, Maciej S. Szmigiero wrote:
> From: "Maciej S. Szmigiero" <maciej.szmigi...@oracle.com>
>
> This series adds a Hyper-V Dynamic Memory Protocol driver (hv-balloon)
> and its protocol definitions.
> Also included is a driver providing backing devices for memory hot-add
> protocols ("haprots").
>
> A haprot device works like a virtual DIMM stick: it allows inserting
> extra RAM into the guest at run time.
>
> The main differences from ACPI-based PC DIMM hotplug are:
> * Notifying the guest about the new memory range is not done via ACPI but
> via a protocol handler that registers with the haprot framework.
> This means that the ACPI DIMM slot limit does not apply.
>
> * A protocol handler can prevent removal of a haprot device while it is
> still in use by setting its "busy" field.
>
> * A protocol handler can also register an "unplug" callback, so it gets
> notified when a user decides to remove the haprot device.
> This way the protocol handler can inform the guest about this fact and/or
> do its own cleanup.
>
> The hv-balloon driver is like virtio-balloon on steroids: it allows both
> changing the guest memory allocation via ballooning and inserting extra
> RAM into it by adding haprot virtual DIMM sticks.
> One of the advantages over ACPI-based PC DIMM hotplug is that such
> memory can be hotplugged at a much finer granularity, because the ACPI
> DIMM slot limit does not apply.
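As a rough mental model of the haprot framework described above, a
protocol handler could plausibly register callbacks along the lines of
the C declarations below. All names here are invented for illustration;
this is a sketch of the concept, not the API actually defined by the
patches.

    /* Illustration only - invented names, NOT the API from this series. */
    #include <stdbool.h>
    #include <stdint.h>

    typedef struct HAProtDevice HAProtDevice;  /* one virtual DIMM stick */

    typedef struct HAProtHandler {
        /* Announce a newly plugged memory range to the guest, e.g. via
         * the Hyper-V Dynamic Memory protocol instead of ACPI. */
        void (*plug)(HAProtDevice *dev, uint64_t addr, uint64_t size);
        /* Called when the user issues device_del, so the protocol can
         * notify the guest and/or do its own cleanup first. */
        void (*unplug)(HAProtDevice *dev);
    } HAProtHandler;

    /* While "busy" is set by the handler, removing the device is refused
     * (hypothetical accessor for the "busy" field mentioned above). */
    void haprot_device_set_busy(HAProtDevice *dev, bool busy);

    /* Hypothetical registration entry point for a protocol handler. */
    void haprot_register_handler(HAProtHandler *handler);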
Reading further below, it's essentially DIMM-based memory hotplug +
virtio-balloon - minus the 256MB DIMM limit. But reading below, I don't
see how you want to avoid the KVM memory slot limit, which is of a
similar size (I recall 256*2 due to 2 address spaces). Or how to avoid
VMA limits when wanting to grow a VM large in very tiny steps over time
(e.g., adding 64MB at a time).

> In contrast with ACPI DIMM hotplug, where one can only request to unplug
> a whole DIMM stick, this driver allows removing memory from the guest in
> single-page (4k) units via ballooning.
> Then, once the guest has released the whole memory backed by a haprot
> virtual DIMM stick, such a device is marked "unused" and can be removed
> from the VM, if one wants to.
> A "HV_BALLOON_HAPROT_UNUSED" QMP event is emitted in this case so the
> software controlling QEMU knows that this operation is now possible.
>
> The haprot devices are also marked unused after a VM reboot (with a
> corresponding "HV_BALLOON_HAPROT_UNUSED" QMP event).
> They are automatically reinserted (if still present) after the guest
> reconnects to this protocol (a "HV_BALLOON_HAPROT_INUSE" QMP event is
> then emitted).
>
> For performance reasons, the guest-released memory is tracked in a few
> range trees, as a series of (start, count) ranges.
> Each time a new page range is inserted into such a tree, its neighbors
> are checked as candidates for possible merging with it.
>
> Besides performance reasons, the Dynamic Memory protocol itself uses page
> ranges as the data structure in its messages, so relevant pages need to
> be merged into such ranges anyway.
>
> One has to be careful when tracking the guest-released pages, since the
> guest can maliciously report returning pages outside its current address
> space, which could later clash with the address range of newly added
> memory.
> Similarly, the guest can report freeing the same page twice.
>
> The above design results in much better ballooning performance than when
> using virtio-balloon with the same guest: 230 GB / minute with this
> driver versus 70 GB / minute with virtio-balloon.

I assume these numbers apply to Windows guests only. IIRC, Linux's
hv_balloon does not support page migration/compaction, while
virtio-balloon does. So you might end up with quite fragmented memory
with hv_balloon in Linux guests - of course, usually only in corner
cases.

> During a ballooning operation, most of the time is spent waiting for the
> guest to come up with newly freed page ranges; processing the received
> ranges on the host side (in QEMU / KVM) is nearly instantaneous.
>
> The unballoon operation is also pretty much instantaneous:
> thanks to the merging of the ballooned-out page ranges, 200 GB of memory
> can be returned to the guest in about 1 second.
> With virtio-balloon this operation takes about 2.5 minutes.
>
> These tests were done against a Windows Server 2019 guest running on a
> Xeon E5-2699, after dirtying the whole memory inside the guest before
> each balloon operation.
>
> Using a range tree instead of a bitmap to track the removed memory also
> means that the solution scales well with the guest size: even a 1 TB
> range takes just a few bytes of memory.
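The merge-on-insert bookkeeping described above is easy to picture; here
is a rough, self-contained C sketch of the idea (my own illustration, a
sorted list rather than the series' range trees, with error handling and
the duplicate/out-of-range checks reduced to comments):

    #include <stdint.h>
    #include <stdlib.h>

    struct range {
        uint64_t start, count;   /* range of guest pages, in 4k units */
        struct range *next;      /* list kept sorted by start */
    };

    /* Insert [start, start + count) into the sorted list, merging with
     * the left and/or right neighbor whenever the ranges touch.  A real
     * implementation must first reject pages the guest already reported
     * (double free) or pages outside its current address space. */
    static void range_insert(struct range **head, uint64_t start,
                             uint64_t count)
    {
        struct range **pp = head, *left = NULL, *right;

        /* Skip over all ranges lying entirely left of the new one. */
        while (*pp && (*pp)->start + (*pp)->count <= start) {
            left = *pp;
            pp = &(*pp)->next;
        }
        right = *pp;

        if (left && left->start + left->count == start) {
            left->count += count;            /* merge with left neighbor */
            if (right && left->start + left->count == right->start) {
                left->count += right->count; /* new range bridged the gap */
                left->next = right->next;
                free(right);
            }
            return;
        }
        if (right && start + count == right->start) {
            right->start = start;            /* merge with right neighbor */
            right->count += count;
            return;
        }

        struct range *r = malloc(sizeof(*r)); /* no mergeable neighbor;  */
        r->start = start;                     /* allocation check omitted */
        r->count = count;
        r->next = right;
        *pp = r;
    }

Keeping ranges maximally merged like this is also why the unballoon path
can be nearly instantaneous: 200 GB of ballooned-out memory collapses
into a handful of (start, count) pairs that can be returned in one go.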
> Example usage:
> * Add "-device vmbus-bridge,id=vmbus-bridge -device hv-balloon,id=hvb"
> to the QEMU command line and set the "maxmem" value to something large,
> like 1T.
>
> * Use QEMU monitor commands to add a haprot virtual DIMM stick, together
> with its memory backend:
>     object_add memory-backend-ram,id=mem1,size=200G
>     device_add mem-haprot,id=ha1,memdev=mem1
> The first command is actually the same as for ACPI-based DIMM hotplug.
>
> * Use the ballooning interface monitor commands to force the guest to
> give up as much memory as possible:
>     balloon 1

At least under virtio-balloon with Linux, that is pretty sure to trigger
a guest crash. Is something like that expected to work reasonably well
with Windows guests?

> The ballooning interface monitor commands can also be used to resize
> the guest up and down appropriately.
>
> * One can check the current guest size by issuing an "info balloon"
> command.
> This is useful for knowing what is happening, since large ballooning or
> unballooning operations take some time to complete.

So, every time you want to add more memory to a guest (after the balloon
was deflated), you have to plug a new mem-haprot device, correct? So your
QEMU user has to be well aware of how to balance "balloon" and
"object_add/device_add/object_del/device_del" commands to achieve the
desired guest size.

> * Once the guest releases the whole memory backed by a haprot device
> (or is restarted), a "HV_BALLOON_HAPROT_UNUSED" QMP event will be
> generated.
> The haprot device can then be removed, together with its memory backend:
>     device_del ha1
>     object_del mem1

So, you rely on some external entity to properly shrink a guest again
(e.g., during reboot).

> Future directions:
> * Allow sharing the ballooning QEMU interface between the hv-balloon and
> virtio-balloon drivers.
> Currently, only one of them can be added to the VM at the same time.

Yeah, that makes sense. Only one at a time.

> * Allow new haprot devices to reuse the same address range as ones that
> were previously deleted via the device_del monitor command, without
> having to restart the VM.
>
> * Add vmstate / live migration support to the hv-balloon driver.
>
> * Use a haprot device to also add memory via the virtio interface (this
> requires defining a new operation in the virtio-balloon protocol and
> appropriate support from the client virtio-balloon driver in the Linux
> kernel).

Most probably not the direction we are going to take. We have virtio-mem
for clean, fine-grained, NUMA-aware, paravirtualized memory hot(un)plug
now, and we are well aware of various issues with (base-page-size based)
memory ballooning that are fairly impossible to solve (especially in the
context of vfio).

-- 
Thanks,

David / dhildenb