(Copies in Dave Hildenbrand) * Peter Maydell (peter.mayd...@linaro.org) wrote: > On Sat, 18 Jul 2020 at 14:21, David CARLIER <devne...@gmail.com> wrote: > > > > From a9e3cced279ae55a59847ba232f7828bc2479367 Mon Sep 17 00:00:00 2001 > > From: David Carlier <devne...@gmail.com> > > Date: Sat, 18 Jul 2020 13:29:44 +0100 > > Subject: [PATCH 2/3] exec: posix_madvise usage on SunOS. > > > > with _XOPEN_SOURCE set, the older mman.h API based on caddr_t handling > > is not accessible thus using posix_madvise here. > > > > Signed-off-by: David Carlier <devne...@gmail.com> > > --- > > exec.c | 8 ++++++++ > > 1 file changed, 8 insertions(+) > > > > diff --git a/exec.c b/exec.c > > index 6f381f98e2..0466a75b89 100644 > > --- a/exec.c > > +++ b/exec.c > > @@ -3964,7 +3964,15 @@ int ram_block_discard_range(RAMBlock *rb, > > uint64_t start, size_t length) > > * fallocate'd away). > > */ > > #if defined(CONFIG_MADVISE) > > +#if !defined(CONFIG_SOLARIS) > > ret = madvise(host_startaddr, length, MADV_DONTNEED); > > +#else > > + /* > > + * mmap and its caddr_t based api is not accessible > > + * with _XOPEN_SOURCE set on illumos > > + */ > > + ret = posix_madvise(host_startaddr, length, > > POSIX_MADV_DONTNEED); > > +#endif > > Hi. I'm not sure this patch will do the right thing, because > I don't think that Solaris's POSIX_MADV_DONTNEED provides > the semantics that this QEMU function says it needs. The > comment at the top of the function says: > > * Unmap pages of memory from start to start+length such that > * they a) read as 0, b) Trigger whatever fault mechanism > * the OS provides for postcopy. > * The pages must be unmapped by the end of the function.
This code has moved around a bit over it's life; joining the case needed by balloon and the case needed by postcopy. > (Aside: the use of 'unmap' in this comment is a bit confusing, > because it clearly doesn't mean 'unmap' if it wants read-as-0. > And the reference to faults on postcopy is incomprehensible > to me: if memory is read-as-0 it isn't going to fault.) I think because internally to Linux the behaviour is the same; this causes the mapping to disappear from the TLB so it faults; normally when reading the kernel resolves the fault and puts a read-as-zero page there, except if userfault was enabled for postcopy, in which case it gives us a kick and we service it. > Linux's madvise(MADV_DONTNEED) does guarantee us this > read-as-zero behaviour. (It's a silly API choice that Linux > put this behaviour behind madvise, which is supposed to be > merely advisory, but that's how it is.) Yes, I don't think there's any equivalent to madvise that guarantees anything. > The Solaris > posix_madvise() manpage says it is merely advisory and > doesn't affect the behaviour of accesses to the memory. > > If posix_madvise() behaviour was OK in this function, the > right way to fix this would be to use qemu_madvise() > instead, which already provides this "if host has > madvise(), use it, otherwise use posix_madvise()" logic. > But I suspect that the direct madvise() here is deliberate. Yes, but I can't remember the semantics fully - I think it was because we needed the guarantee at this point (and even Linux's posix madvise did something different??) I've got a note saying we didn't want to use qemu_madvise because we wanted to be sure we didn't get posix_madvise. > Side note: not sure the current code is correct for the > BSDs either -- they have madvise() but don't provide > Linux's really-read-as-zero guarantee for MADV_DONTNEED. > So we should probably be doing something else there, and > whatever that something-else is is probably also what > Solaris wants. > > We use ram_block_discard_range() only in migration and > in virtio-balloon and virtio-mem; I've cc'd some people > who hopefully understand what the requirements on this > function are and might have a view on what the not-Linux > implementation should look like. (David Gilbert: git > blame says you wrote this code :-)) Dave > > thanks > -- PMM > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK