On Wed, Dec 05, 2018 at 08:59:06AM +0100, David Hildenbrand wrote: > On 05.12.18 06:06, David Gibson wrote: > > Currently, virtio-balloon uses madvise() with MADV_DONTNEED to actually > > discard RAM pages inserted into the balloon. This is basically a Linux > > only interface (MADV_DONTNEED exists on some other platforms, but doesn't > > always have the same semantics). It also doesn't work on hugepages and has > > some other limitations. > > > > It turns out that postcopy also needs to discard chunks of memory, and uses > > a better interface for it: ram_block_discard_range(). It doesn't cover > > every case, but it covers more than going direct to madvise() and this > > gives us a single place to update for more possibilities in future. > > > > There are some subtleties here to maintain the current balloon behaviour: > > > > * For now, we just ignore requests to balloon in a hugepage backed region. > > That matches current behaviour, because MADV_DONTNEED on a hugepage would > > simply fail, and we ignore the error. > > > > * If host page size is > BALLOON_PAGE_SIZE we can frequently call this on > > non-host-page-aligned addresses. These would also fail in madvise(), > > which we then ignored. ram_block_discard_range() error_report()s calls > > on unaligned addresses, so we explicitly check that case to avoid > > spamming the logs. > > > > * We now call ram_block_discard_range() with the *host* page size, whereas > > we previously called madvise() with BALLOON_PAGE_SIZE. Surprisingly, > > this also matches existing behaviour. Although the kernel fails madvise > > on unaligned addresses, it will round unaligned sizes *up* to the host > > page size. Yes, this means that if BALLOON_PAGE_SIZE < guest page size > > we can incorrectly discard more memory than the guest asked us to. I'm > > planning to address that soon. > > > > Errors other than the ones discussed above, will now be reported by > > ram_block_discard_range(), rather than silently ignored, which means we > > have a much better chance of seeing when something is going wrong. > > > > Signed-off-by: David Gibson <da...@gibson.dropbear.id.au> > > Reviewed-by: Michael S. Tsirkin <m...@redhat.com> > > --- > > hw/virtio/virtio-balloon.c | 23 ++++++++++++++++++++++- > > 1 file changed, 22 insertions(+), 1 deletion(-) > > > > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c > > index c3a19aa27d..4435905c87 100644 > > --- a/hw/virtio/virtio-balloon.c > > +++ b/hw/virtio/virtio-balloon.c > > @@ -37,8 +37,29 @@ static void balloon_inflate_page(VirtIOBalloon *balloon, > > MemoryRegion *mr, hwaddr offset) > > { > > void *addr = memory_region_get_ram_ptr(mr) + offset; > > + RAMBlock *rb; > > + size_t rb_page_size; > > + ram_addr_t ram_offset; > > > > - qemu_madvise(addr, BALLOON_PAGE_SIZE, QEMU_MADV_DONTNEED); > > + /* XXX is there a better way to get to the RAMBlock than via a > > + * host address? */ > > We have qemu_get_ram_block(). That one should work as long as we know > that it is a valid guest ram address. (would have to make it !static) > > Then we would only have to pass to this function the original ram_addr_t > handed over by the guest (which looks somewhat cleaner to me than going > via memory regions)
So, I didn't use that because it's a hwaddr, not a ram_addr_t that the guest gives us. I think they have the same value for guest RAM addresses, but I wasn't sure if it was safe to rely on that. > > > + rb = qemu_ram_block_from_host(addr, false, &ram_offset); > > + rb_page_size = qemu_ram_pagesize(rb); > > + > > + /* Silently ignore hugepage RAM blocks */ > > + if (rb_page_size != getpagesize()) { > > + return; > > + } > > + > > + /* Silently ignore unaligned requests */ > > + if (ram_offset & (rb_page_size - 1)) { > > + return; > > + } > > + > > + ram_block_discard_range(rb, ram_offset, rb_page_size); > > + /* We ignore errors from ram_block_discard_range(), because it has > > + * already reported them, and failing to discard a balloon page is > > + * not fatal */ > > } > > > > static const char *balloon_stat_names[] = { > > > > -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
signature.asc
Description: PGP signature