Re: [Qemu-devel] [RFC 23/29] vub+postcopy: madvises

Alexey Perevalov Wed, 09 Aug 2017 04:03:06 -0700

On 08/08/2017 08:06 PM, Dr. David Alan Gilbert wrote:

* Alexey Perevalov (a.pereva...@samsung.com) wrote:

On 06/28/2017 10:00 PM, Dr. David Alan Gilbert (git) wrote:

From: "Dr. David Alan Gilbert" <dgilb...@redhat.com>


Clear the area and turn off THP.

Signed-off-by: Dr. David Alan Gilbert <dgilb...@redhat.com>
---
   contrib/libvhost-user/libvhost-user.c | 32 ++++++++++++++++++++++++++++++--
   1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/contrib/libvhost-user/libvhost-user.c 
b/contrib/libvhost-user/libvhost-user.c
index 0658b6e847..ceddeac74f 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -451,11 +451,39 @@ vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
           }
           if (dev->postcopy_listening) {
+            int ret;
               /* We should already have an open ufd need to mark each memory
                * range as ufd.
-             * Note: Do we need any madvises? Well it's not been accessed
-             * yet, still probably need no THP to be safe, discard to be safe?
                */
+
+            /* Discard any mapping we have here; note I can't use MADV_REMOVE
+             * or fallocate to make the hole since I don't want to lose
+             * data that's already arrived in the shared process.
+             * TODO: How to do hugepage
+             */

Hi, David, frankly saying, I stuck with my solution, and I have also another
issues,
but here I could suggest solution for hugepages. I think we could transmit a
received pages
bitmap in VHOST_USER_SET_MEM_TABLE (VhostUserMemoryRegion), but it will
raise a compatibility issue,
or introduce special message type for that and send it before
VHOST_USER_SET_MEM_TABLE.
So it will be  possible to do fallocate on received bitmap basis, just skip
already copied pages.
If you wish, I could send patches, rebased on yours, for doing it.

What we found works is that actually we don't need to do a discard -
since we've only just done the mmap of the arena, nothing will be
occupying it on the shared client, so we don't need to discard.

Looks like yes, I checked on kernel from Andrea's git,
there is any more EEXIST error in case when client doesn't
fallocate.


We've had a postcopy migrate work now, with a few hacks we're still
cleaning up, both on vhost-user-bridge and dpdk; so I'll get this
updated and reposted.

In you patch series vring is disabling in case of VHOST_USER_GET_VRING_BASE.
It's being called when vhost-user server want's to stop vring.

QEMU is enabling vring as soon as virtual machine is started, so Ididn't see

explicit vring disabling for migrating VRING.

So migrating VRING is protected just by uffd_register, isn't it? And PMDthread (anyvhost-user thread which accessing migrating VRING) will wait pagecopying in this case,

right?


Dave

+            ret = madvise((void *)dev_region->mmap_addr,
+                          dev_region->size + dev_region->mmap_offset,
+                          MADV_DONTNEED);
+            if (ret) {
+                fprintf(stderr,
+                        "%s: Failed to madvise(DONTNEED) region %d: %s\n",
+                        __func__, i, strerror(errno));
+            }
+            /* Turn off transparent hugepages so we dont get lose wakeups
+             * in neighbouring pages.
+             * TODO: Turn this backon later.
+             */
+            ret = madvise((void *)dev_region->mmap_addr,
+                          dev_region->size + dev_region->mmap_offset,
+                          MADV_NOHUGEPAGE);
+            if (ret) {
+                /* Note: This can happen legally on kernels that are configured
+                 * without madvise'able hugepages
+                 */
+                fprintf(stderr,
+                        "%s: Failed to madvise(NOHUGEPAGE) region %d: %s\n",
+                        __func__, i, strerror(errno));
+            }
               struct uffdio_register reg_struct;
               /* Note: We might need to go back to using mmap_addr and
                * len + mmap_offset for * huge pages, but then we do hope not to


--
Best regards,
Alexey Perevalov

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


--
Best regards,
Alexey Perevalov

Re: [Qemu-devel] [RFC 23/29] vub+postcopy: madvises

Reply via email to