On 24.07.2020 03:08, Peter Xu wrote:
On Wed, Jul 22, 2020 at 11:11:32AM +0300, Denis Plotnikov wrote:
+/**
+ * ram_copy_page: make a page copy
+ *
+ * Used in the background snapshot to make a copy of a memeory page.
+ * Ensures that the memeory page is copied only once.
+ * When a page copy is done, restores read/write access to the memory
+ * page.
+ * If a page is being copied by another thread, wait until the copying
+ * ends and exit.
+ *
+ * Returns:
+ *   -1 - on error
+ *    0 - the page wasn't copied by the function call
+ *    1 - the page has been copied
+ *
+ * @block:     RAM block to use
+ * @page_nr:   the page number to copy
+ * @page_copy: the pointer to return a page copy
+ *
+ */
+int ram_copy_page(RAMBlock *block, unsigned long page_nr,
+                          void **page_copy)
+{
+    void *host_page;
+    int res = 0;
+
+    atomic_inc(&ram_state->page_copier_cnt);
+
+    if (test_and_set_bit_atomic(page_nr, block->touched_map)) {
+        while (!test_bit_atomic(page_nr, block->copied_map)) {
+            /* the page is being copied -- wait for the end of the copying */
+            qemu_event_wait(&ram_state->page_copying_done);
+        }
+        goto out;
+    }
+
+    *page_copy = ram_page_buffer_get();
+    if (!*page_copy) {
+        res = -1;
+        goto out;
+    }
+
+    host_page = block->host + (page_nr << TARGET_PAGE_BITS);
+    memcpy(*page_copy, host_page, TARGET_PAGE_SIZE);
+
+    if (ram_set_rw(host_page, TARGET_PAGE_SIZE)) {
+        ram_page_buffer_free(*page_copy);
+        *page_copy = NULL;
+        res = -1;
+        goto out;
+    }
+
+    set_bit_atomic(page_nr, block->copied_map);
+    qemu_event_set(&ram_state->page_copying_done);
+    qemu_event_reset(&ram_state->page_copying_done);
+
+    res = 1;
+out:
+    atomic_dec(&ram_state->page_copier_cnt);
+    return res;
+}
Is ram_set_rw() be called on the page only if a page fault triggered?
Shouldn't we also do that even in the background thread when we proactively
copying the pages?
Yes, we should. It's a bug.

Besides current solution, do you think we can make it simpler by only deliver
the fault request to the background thread?  We can let the background thread
to do all the rests and IIUC we can drop all the complicated sync bitmaps and
so on by doing so.  The process can look like:

   - background thread runs the general precopy migration, and,

     - it only does the ram_bulk_stage, which is the first loop, because for
       snapshot no reason to send a page twice..

     - After copy one page, do ram_set_rw() always, so accessing of this page
       will never trigger write-protect page fault again,

     - take requests from the unqueue_page() just like what's done in this
       series, but instead of copying the page, the page request should look
       exactly like the postcopy one.  We don't need copy_page because the page
       won't be changed before we unprotect it, so it shiould be safe.  These
       pages will still be with high priority because when queued it means vcpu
       writed to this protected page and fault in userfaultfd.  We need to
       migrate these pages first to unblock them.

   - the fault handler thread only needs to do:

     - when get a uffd-wp message, translate into a postcopy-like request
       (calculate ramblock and offset), then queue it.  That's all.

I believe we can avoid the copy_page parameter that was passed around, and we
can also save the two extra bitmaps and the complicated synchronizations.

Do you think this would work?

Yes, it would. This scheme is much simpler. I like it, in general.

I use such a complicated approach to reduce all possible vCPU delays:
if the storage where the snapshot is being saved is quite slow, it could lead
to vCPU freezing until the page is fully written to the storage.
So, with the current approach, if not take into account a number of page copies limitation,
the worst case is all VM's ram is copied and then written to the storage.
Other words, the current scheme provides minimal vCPU delays and thus minimal VM cpu
performance slowdown with the cost of host's memory consumption.
The new scheme is simple, doesn't consume extra host memory but can freeze vCPUs for
longer time r because:
* usually memory page coping is faster then memory page writing to a storage (think of HDD disk)
* writing page to a disk depends on disk performance and current disk load

So it seems that we have two different strategies:
1. lower CPU delays
2. lower memory usage

To be honest I would start from the yours scheme as it much simler and the other if needed in the future.

What do you think?

Denis

Besides, have we disabled dirty tracking of memslots?  IIUC that's not needed
for background snapshot too, so neither do we need dirty tracking nor do we
need to sync the dirty bitmap from outside us (kvm/vhost/...).



Reply via email to