* Christian Borntraeger (borntrae...@de.ibm.com) wrote:
>
>
> On 03/01/2018 01:35 PM, Christian Borntraeger wrote:
> >
> >
> > On 03/01/2018 01:28 PM, Dr. David Alan Gilbert wrote:
> >> * Christian Borntraeger (borntrae...@de.ibm.com) wrote:
> >>>
> >>>
> >>> On 03/01/2018 12:45 PM, Dr. David Alan Gilbert wrote:
> >>>> * Christian Borntraeger (borntrae...@de.ibm.com) wrote:
> >>>>>
> >>>>>
> >>>>> On 03/01/2018 10:24 AM, Dr. David Alan Gilbert wrote:
> >>>>>> * Thomas Huth (th...@redhat.com) wrote:
> >>>>>>> On 28.02.2018 20:53, Christian Borntraeger wrote:
> >>>>>>>> When a guests reboots with diagnose 308 subcode 3 it requests the memory
> >>>>>>>> to be cleared. We did not do it so far. This does not only violate the
> >>>>>>>> architecture, it also misses the chance to free up that memory on
> >>>>>>>> reboot, which would help on host memory over commitment. By using
> >>>>>>>> ram_block_discard_range we can cover both cases.
> >>>>>>>
> >>>>>>> Sounds like a good idea. I wonder whether that release_all_ram()
> >>>>>>> function should maybe rather reside in exec.c, so that other machines
> >>>>>>> that want to clear all RAM at reset time can use it, too?
> >>>>>>>
> >>>>>>>> Signed-off-by: Christian Borntraeger <borntrae...@de.ibm.com>
> >>>>>>>> ---
> >>>>>>>>  target/s390x/kvm.c | 19 +++++++++++++++++++
> >>>>>>>>  1 file changed, 19 insertions(+)
> >>>>>>>>
> >>>>>>>> diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
> >>>>>>>> index 8f3a422288..2e145ad5c3 100644
> >>>>>>>> --- a/target/s390x/kvm.c
> >>>>>>>> +++ b/target/s390x/kvm.c
> >>>>>>>> @@ -34,6 +34,8 @@
> >>>>>>>>  #include "qapi/error.h"
> >>>>>>>>  #include "qemu/error-report.h"
> >>>>>>>>  #include "qemu/timer.h"
> >>>>>>>> +#include "qemu/rcu_queue.h"
> >>>>>>>> +#include "sysemu/cpus.h"
> >>>>>>>>  #include "sysemu/sysemu.h"
> >>>>>>>>  #include "sysemu/hw_accel.h"
> >>>>>>>>  #include "hw/boards.h"
> >>>>>>>> @@ -41,6 +43,7 @@
> >>>>>>>>  #include "sysemu/device_tree.h"
> >>>>>>>>  #include "exec/gdbstub.h"
> >>>>>>>>  #include "exec/address-spaces.h"
> >>>>>>>> +#include "exec/ram_addr.h"
> >>>>>>>>  #include "trace.h"
> >>>>>>>>  #include "qapi-event.h"
> >>>>>>>>  #include "hw/s390x/s390-pci-inst.h"
> >>>>>>>> @@ -1841,6 +1844,14 @@ static int kvm_arch_handle_debug_exit(S390CPU *cpu)
> >>>>>>>>      return ret;
> >>>>>>>>  }
> >>>>>>>>
> >>>>>>>> +static void release_all_rams(void)
> >>>>>>>
> >>>>>>> s/rams/ram/ maybe?
> >>>>>>>
> >>>>>>>> +{
> >>>>>>>> +    struct RAMBlock *rb;
> >>>>>>>> +
> >>>>>>>> +    QLIST_FOREACH_RCU(rb, &ram_list.blocks, next)
> >>>>>>>> +        ram_block_discard_range(rb, 0, rb->used_length);
> >>>>>>>
> >>>>>>> From a coding style point of view, I think there should be curly braces
> >>>>>>> around ram_block_discard_range() ?
> >>>>>>
> >>>>>> I think this might break if it happens during a postcopy migrate.
> >>>>>> The destination CPU is running, so it can do a reboot at just the wrong
> >>>>>> time; and then the pages (that are protected by userfaultfd) would get
> >>>>>> deallocated and trigger userfaultfd requests if accessed.
> >>>>>
> >>>>> Yes, userfaultd/postcopy is really fragile and relies on things that are not
> >>>>> necessarily true (e.g. virito-balloon can also invalidate pages).
> >>>>
> >>>> That's why we use qemu_balloon_inhibit around postcopy to stop
> >>>> ballooning; I'm not aware of anything else that does the same.
> >>>
> >>> we also have at least the pte_unused thing in mm/rmap.c that clearly
> >>> predates userfaultfd. We might need to look into this as well....
> >>
> >> I've not come across that; what does that do?
> >
> > It can drop a page on page out if the page is no longer of value. It is used by
> > the CMMA (guest page hinting) code of s390x.
> >
> > see kernel mm/rmap.c
> >
> >
> > static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> >                      unsigned long address, void *arg)
> > {
> > [...]
> >                 } else if (pte_unused(pteval)) {
> >                         /*
> >                          * The guest indicated that the page content is of no
> >                          * interest anymore. Simply discard the pte, vmscan
> >                          * will take care of the rest.
> >                          */
> >                         dec_mm_counter(mm, mm_counter(page));
> >                         /* We have to invalidate as we cleared the pte */
> >                         mmu_notifier_invalidate_range(mm, address,
> >                                                       address + PAGE_SIZE);
> >                 } else if (IS_ENABLED(CONFIG_MIGRATION) &&
> >                                 (flags & (TTU_MIGRATION|TTU_SPLIT_FREEZE))) {
> > [...]
> >
> >
>
> Maybe something like this in the kernel
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 47db27f8049e..9bdf4d448987 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1483,7 +1483,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>                                 set_pte_at(mm, address, pvmw.pte, pteval);
>                         }
>
> -               } else if (pte_unused(pteval)) {
> +               } else if (pte_unused(pteval) && !vma->vm_userfaultfd_ctx.ctx) {
>                         /*
>                          * The guest indicated that the page content is of no
>                          * interest anymore. Simply discard the pte, vmscan
>
>
> could help?

I guess so, but please check with aarcange; I don't know the mm code.

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK