On Thu, 19 Nov 2020 16:39:10 +0100 David Hildenbrand <da...@redhat.com> wrote:
> We have some special RAM memory regions (managed by virtio-mem), whereby
> the guest agreed to only use selected memory ranges. "unused" parts are
> discarded so they won't consume memory - to logically unplug these memory
> ranges. Before the VM is allowed to use such logically unplugged memory
> again, coordination with the hypervisor is required.
>
> This results in "sparse" mmaps/RAMBlocks/memory regions, whereby only
> coordinated parts are valid to be used/accessed by the VM.
>
> In most cases, we don't care about that - e.g., in KVM, we simply have a
> single KVM memory slot. However, in case of vfio, registering the
> whole region with the kernel results in all pages getting pinned, and
> therefore an unexpected high memory consumption - discarding of RAM in
> that context is broken.
>
> Let's introduce a way to coordinate discarding/populating memory within a
> RAM memory region with such special consumers of RAM memory regions: they
> can register as listeners and get updates on memory getting discarded and
> populated. Using this machinery, vfio will be able to map only the
> currently populated parts, resulting in discarded parts not getting pinned
> and not consuming memory.
>
> A RamDiscardMgr has to be set for a memory region before it is getting
> mapped, and cannot change while the memory region is mapped.
>
> Note: At some point, we might want to let RAMBlock users (esp. vfio used
> for nvme://) consume this interface as well. We'll need RAMBlock notifier
> calls when a RAMBlock is getting mapped/unmapped (via the corresponding
> memory region), so we can properly register a listener there as well.
>
> Cc: Paolo Bonzini <pbonz...@redhat.com>
> Cc: "Michael S. Tsirkin" <m...@redhat.com>
> Cc: Alex Williamson <alex.william...@redhat.com>
> Cc: Dr. David Alan Gilbert <dgilb...@redhat.com>
> Cc: Igor Mammedov <imamm...@redhat.com>
> Cc: Pankaj Gupta <pankaj.gupta.li...@gmail.com>
> Cc: Peter Xu <pet...@redhat.com>
> Cc: Auger Eric <eric.au...@redhat.com>
> Cc: Wei Yang <richard.weiy...@linux.alibaba.com>
> Cc: teawater <teawat...@linux.alibaba.com>
> Cc: Marek Kedzierski <mkedz...@redhat.com>
> Signed-off-by: David Hildenbrand <da...@redhat.com>
> ---
>  include/exec/memory.h | 225 ++++++++++++++++++++++++++++++++++++++++++
>  softmmu/memory.c      |  22 +++++
>  2 files changed, 247 insertions(+)
>
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 0f3e6bcd5e..468cbb53a4 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
...
> @@ -425,6 +501,120 @@ struct IOMMUMemoryRegionClass {
>                                       Error **errp);
>  };
>
> +/*
> + * RamDiscardMgrClass:
> + *
> + * A #RamDiscardMgr coordinates which parts of specific RAM #MemoryRegion
> + * regions are currently populated to be used/accessed by the VM, notifying
> + * after parts were discarded (freeing up memory) and before parts will be
> + * populated (consuming memory), to be used/accessed by the VM.
> + *
> + * A #RamDiscardMgr can only be set for a RAM #MemoryRegion while the
> + * #MemoryRegion isn't mapped yet; it cannot change while the #MemoryRegion is
> + * mapped.
> + *
> + * The #RamDiscardMgr is intended to be used by technologies that are
> + * incompatible with discarding of RAM (e.g., VFIO, which may pin all
> + * memory inside a #MemoryRegion), and require proper coordination to only
> + * map the currently populated parts, to hinder parts that are expected to
> + * remain discarded from silently getting populated and consuming memory.
> + * Technologies that support discarding of RAM don't have to bother and can
> + * simply map the whole #MemoryRegion.
> + *
> + * An example #RamDiscardMgr is virtio-mem, which logically (un)plugs
> + * memory within an assigned RAM #MemoryRegion, coordinated with the VM.
> + * Logically unplugging memory consists of discarding RAM. The VM agreed to not
> + * access unplugged (discarded) memory - especially via DMA. virtio-mem will
> + * properly coordinate with listeners before memory is plugged (populated),
> + * and after memory is unplugged (discarded).
> + *
> + * Listeners are called in multiples of the minimum granularity and changes are
> + * aligned to the minimum granularity within the #MemoryRegion. Listeners have
> + * to prepare for memory becoming discarded in a different granularity than it
> + * was populated and the other way around.
> + */
> +struct RamDiscardMgrClass {
> +    /* private */
> +    InterfaceClass parent_class;
> +
> +    /* public */
> +
> +    /**
> +     * @get_min_granularity:
> +     *
> +     * Get the minimum granularity in which listeners will get notified
> +     * about changes within the #MemoryRegion via the #RamDiscardMgr.
> +     *
> +     * @rdm: the #RamDiscardMgr
> +     * @mr: the #MemoryRegion
> +     *
> +     * Returns the minimum granularity.
> +     */
> +    uint64_t (*get_min_granularity)(const RamDiscardMgr *rdm,
> +                                    const MemoryRegion *mr);
> +
> +    /**
> +     * @is_populated:
> +     *
> +     * Check whether the given range within the #MemoryRegion is completely
> +     * populated (i.e., no parts are currently discarded). There are no
> +     * alignment requirements for the range.
> +     *
> +     * @rdm: the #RamDiscardMgr
> +     * @mr: the #MemoryRegion
> +     * @offset: offset into the #MemoryRegion
> +     * @size: size in the #MemoryRegion
> +     *
> +     * Returns the minimum granularity.

I think the return description got copied from above; this returns bool.

...
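To make the bool return concrete, here is a minimal sketch of the semantics the @is_populated comment describes, assuming a hypothetical manager that keeps one populated/discarded flag per block of the minimum granularity. ToyDiscardMgr and toy_is_populated are made-up names for illustration, not part of this patch or of QEMU. Since the check is a yes/no answer, the doc could say something along the lines of "Returns true if the whole range is populated, false otherwise".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical toy manager, NOT QEMU's RamDiscardMgr: one flag per block
 * of the minimum granularity, true = populated, false = discarded.
 */
typedef struct ToyDiscardMgr {
    uint64_t granularity;  /* minimum granularity in bytes, > 0 */
    uint64_t region_size;  /* region size, a multiple of granularity */
    bool *populated;       /* region_size / granularity entries */
} ToyDiscardMgr;

/*
 * Returns true iff every block overlapping [offset, offset + size) is
 * populated. Unaligned ranges are accepted: any block the range touches
 * must be populated. Ranges reaching outside the region report false.
 */
static bool toy_is_populated(const ToyDiscardMgr *mgr, uint64_t offset,
                             uint64_t size)
{
    assert(mgr->granularity > 0);
    if (size == 0) {
        return true; /* an empty range has no discarded parts */
    }
    /* Written this way to avoid overflowing offset + size. */
    if (offset >= mgr->region_size || size > mgr->region_size - offset) {
        return false;
    }
    uint64_t first = offset / mgr->granularity;
    uint64_t last = (offset + size - 1) / mgr->granularity;
    for (uint64_t i = first; i <= last; i++) {
        if (!mgr->populated[i]) {
            return false;
        }
    }
    return true;
}
```

Because the range doesn't have to be aligned, a block the range merely touches still counts: with a 4 KiB granularity, the range [4096, 8193) also covers the third block.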
> diff --git a/softmmu/memory.c b/softmmu/memory.c
> index aa393f1bb0..fbdc50fa8b 100644
> --- a/softmmu/memory.c
> +++ b/softmmu/memory.c
> @@ -2013,6 +2013,21 @@ int memory_region_iommu_num_indexes(IOMMUMemoryRegion *iommu_mr)
>      return imrc->num_indexes(iommu_mr);
>  }
>
> +RamDiscardMgr *memory_region_get_ram_discard_mgr(MemoryRegion *mr)
> +{
> +    if (!memory_region_is_mapped(mr) || !memory_region_is_ram(mr)) {
> +        return false;

s/false/NULL/?

> +    }
> +    return mr->rdm;
> +}
> +
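On the s/false/NULL/ point: with the C99/C11 <stdbool.h>, false expands to the integer constant 0, which C accepts as a null pointer constant, so the current code compiles and still returns a null pointer; NULL simply matches the pointer return type. A minimal sketch of the intended shape, using hypothetical stand-in types (ToyRegion, ToyRdm, toy_get_rdm) in place of MemoryRegion, RamDiscardMgr and the memory_region_is_mapped()/memory_region_is_ram() checks:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT QEMU types: just enough to show the getter. */
typedef struct ToyRdm {
    int id;
} ToyRdm;

typedef struct ToyRegion {
    bool mapped;  /* stands in for memory_region_is_mapped() */
    bool ram;     /* stands in for memory_region_is_ram() */
    ToyRdm *rdm;  /* manager set before mapping, may be NULL */
} ToyRegion;

/* Returns the region's manager, or NULL if it is unmapped or not RAM. */
static ToyRdm *toy_get_rdm(const ToyRegion *mr)
{
    assert(mr != NULL);
    if (!mr->mapped || !mr->ram) {
        return NULL; /* pointer return type: NULL rather than false */
    }
    return mr->rdm;
}
```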