Hi, Peter
On 2023/2/25 11:32 AM, Peter Xu wrote:
On Thu, Feb 23, 2023 at 11:28:46AM +0800, Chuang Xu wrote:
Hi, Peter
Hi, Chuang,
On 2023/2/22 δΈε11:57, Peter Xu wrote:
I don't see why it's wrong, and that's exactly what I wanted to guarantee:
if memory_region_update_pending==true when updating ioeventfds, we may
have some serious problem.
For this, I split my patch into two parts and put this change into the
last patch. See the attachment. If you worry about this, you can e.g. try
applying patch 1 only later, but I still don't see how it could trigger.
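(For context, a sketch of what that assertion looks like in
address_space_update_ioeventfds(); everything else in the function is
elided:)

  static void address_space_update_ioeventfds(AddressSpace *as)
  {
      /* Updating ioeventfds requires the BQL, and must never happen
       * while a memory region update is still pending. */
      assert(qemu_mutex_iothread_locked() && !memory_region_update_pending);
      /* ... recompute ioeventfds for this address space ... */
  }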
Sorry, I made some mistakes in my previous understanding of the code. However,
if this assertion is added, it will indeed trigger a coredump in qtest with
my patches. Here is the coredump (this is the only one I found):
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007fa5e7b17535 in __GI_abort () at abort.c:79
#2 0x00007fa5e7b1740f in __assert_fail_base (fmt=0x7fa5e7c78ef0 "%s%s%s:%u:
%s%sAssertion `%s' failed.\n%n",
assertion=0x56305fc02d60 "qemu_mutex_iothread_locked() && !memory_region_update_pending",
file=0x56305fc028cb "../softmmu/memory.c", line=855, function=<optimized out>) at assert.c:92
#3 0x00007fa5e7b251a2 in __GI___assert_fail (assertion=assertion@entry=0x56305fc02d60
"qemu_mutex_iothread_locked() && !memory_region_update_pending",
file=file@entry=0x56305fc028cb "../softmmu/memory.c", line=line@entry=855,
function=function@entry=0x56305fc03cc0 <__PRETTY_FUNCTION__.38596>
"address_space_update_ioeventfds")
at assert.c:101
#4 0x000056305f8a9194 in address_space_update_ioeventfds
(as=as@entry=0x563061293648) at ../softmmu/memory.c:855
#5 0x000056305f8afe4f in address_space_init (as=as@entry=0x563061293648,
root=root@entry=0x5630612936a0, name=name@entry=0x5630612934f0
"virtio-net-pci") at ../softmmu/memory.c:3172
#6 0x000056305f686e43 in do_pci_register_device (errp=0x7fa5e4f39850, devfn=<optimized
out>, name=0x563061011c40 "virtio-net-pci", pci_dev=0x563061293410) at
../hw/pci/pci.c:1145
#7 pci_qdev_realize (qdev=0x563061293410, errp=0x7fa5e4f39850) at
../hw/pci/pci.c:2036
#8 0x000056305f939daf in device_set_realized (obj=<optimized out>, value=true,
errp=0x7fa5e4f39ae0) at ../hw/core/qdev.c:510
#9 0x000056305f93d156 in property_set_bool (obj=0x563061293410, v=<optimized out>,
name=<optimized out>, opaque=0x5630610221d0, errp=0x7fa5e4f39ae0) at
../qom/object.c:2285
#10 0x000056305f93f403 in object_property_set (obj=obj@entry=0x563061293410,
name=name@entry=0x56305fba6bc3 "realized", v=v@entry=0x563061e52a00,
errp=errp@entry=0x7fa5e4f39ae0)
at ../qom/object.c:1420
#11 0x000056305f94247f in object_property_set_qobject (obj=obj@entry=0x563061293410,
name=name@entry=0x56305fba6bc3 "realized", value=value@entry=0x563061e61cb0,
errp=errp@entry=0x7fa5e4f39ae0) at ../qom/qom-qobject.c:28
#12 0x000056305f93f674 in object_property_set_bool (obj=0x563061293410,
name=name@entry=0x56305fba6bc3 "realized", value=value@entry=true,
errp=errp@entry=0x7fa5e4f39ae0)
at ../qom/object.c:1489
#13 0x000056305f93959c in qdev_realize (dev=<optimized out>,
bus=bus@entry=0x563061c9c400, errp=errp@entry=0x7fa5e4f39ae0) at ../hw/core/qdev.c:292
#14 0x000056305f7244a0 in qdev_device_add_from_qdict (opts=0x563061e64c00,
from_json=<optimized out>, errp=<optimized out>, errp@entry=0x7fa5e4f39ae0)
at /data00/migration/qemu-open/include/hw/qdev-core.h:17
#15 0x000056305f846c75 in failover_add_primary (errp=0x7fa5e4f39ad8,
n=0x563062043530) at ../hw/net/virtio-net.c:933
#16 virtio_net_set_features (vdev=<optimized out>,
features=4611687122246533156) at ../hw/net/virtio-net.c:1004
#17 0x000056305f872238 in virtio_set_features_nocheck
(vdev=vdev@entry=0x563062043530, val=val@entry=4611687122246533156) at
../hw/virtio/virtio.c:2851
#18 0x000056305f877e9e in virtio_load (vdev=0x563062043530, f=0x56306125bde0,
version_id=11) at ../hw/virtio/virtio.c:3027
#19 0x000056305f73c601 in vmstate_load_state (f=f@entry=0x56306125bde0,
vmsd=0x56305fef16c0 <vmstate_virtio_net>, opaque=0x563062043530, version_id=11)
at ../migration/vmstate.c:137
#20 0x000056305f757672 in vmstate_load (f=0x56306125bde0, se=0x563062176700) at
../migration/savevm.c:919
#21 0x000056305f757905 in qemu_loadvm_section_start_full
(f=f@entry=0x56306125bde0, mis=0x56306101d3e0) at ../migration/savevm.c:2503
#22 0x000056305f75aca8 in qemu_loadvm_state_main (f=f@entry=0x56306125bde0,
mis=mis@entry=0x56306101d3e0) at ../migration/savevm.c:2719
#23 0x000056305f75c17a in qemu_loadvm_state (f=0x56306125bde0) at
../migration/savevm.c:2809
#24 0x000056305f74980e in process_incoming_migration_co (opaque=<optimized
out>) at ../migration/migration.c:606
#25 0x000056305fab25cb in coroutine_trampoline (i0=<optimized out>, i1=<optimized
out>) at ../util/coroutine-ucontext.c:177
I really think changing depth is a hack... :(
Here IMHO the problem is we have other missing calls to
address_space_to_flatview() during commit(), and that caused the issue.
There aren't a lot of those, and sorry for missing yet another one.
So let me try one more time on this (with patch 1; I think I've got two
places missed in the previous attempt). Let's see how it goes.
We may want to add a tracepoint or have some way to know when an enforced
commit() is triggered during the vm load phase. I just noticed your worry
that having enforced commit() too often defeats the original purpose, and
I think yes, that's something to worry about.
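E.g. a rough sketch of what such a tracepoint could look like (the event
name and format string are just examples, not something in the series):

  # in softmmu/trace-events (hypothetical event):
  memory_region_transaction_do_commit(unsigned depth) "enforced commit, depth %u"

  /* in softmmu/memory.c, at the top of memory_region_transaction_do_commit(): */
  trace_memory_region_transaction_do_commit(memory_region_transaction_depth);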
I still believe the AHCI condition is rare (since e.g. you've passed all of
Juan's tests even before we had this "do_commit" stuff), but in short: I
think it would still be interesting if you can capture all the users of
enforced commit() like the AHCI case you quoted before, and list them in
the cover letter of your next post (along with new perf measurements, to
make sure your worry, that too many enforced commits will invalidate the
whole idea, doesn't come true).
When I was digging this out, I also found an RCU issue when using
address_space_to_flatview() (when the BQL is not taken), only in the
memory_region_clear_dirty_bitmap() path. I hope the new assertion
(rcu_read_is_locked()) won't already trigger on some of your use cases,
but anyway feel free to ignore this whole paragraph for now unless you
see some assert(rcu_read_is_locked()) being triggered. I plan to post an
RFC for fixing the RCU usage and I'll have you copied just for reference
(it may be a separate issue from what you're working on).
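As an aside, satisfying the new assertion from a BQL-less caller is a
matter of holding the RCU read lock around the flatview access, e.g.
(the surrounding context here is illustrative only):

  /* without the BQL, take the RCU read lock around any flatview access */
  RCU_READ_LOCK_GUARD();
  FlatView *view = address_space_to_flatview(as);
  /* ... use view; it stays valid until the guard goes out of scope ... */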
Thanks,
Unfortunately, there is another case of stack overflow...
#8 memory_region_transaction_do_commit () at ../softmmu/memory.c:1145
#9 0x00005610e96d3665 in address_space_to_flatview (as=0x5610e9ece820
<address_space_memory>) at
/data00/migration/qemu-open/include/exec/memory.h:1119
#10 address_space_get_flatview (as=0x5610e9ece820 <address_space_memory>) at
../softmmu/memory.c:818
#11 0x00005610e96dfa14 in address_space_cache_init (cache=cache@entry=0x56111cdee410,
as=<optimized out>, addr=addr@entry=1048576, len=len@entry=4096, is_write=false)
at ../softmmu/physmem.c:3350
#12 0x00005610e96a0928 in virtio_init_region_cache
(vdev=vdev@entry=0x5610eb72bf10, n=n@entry=0) at ../hw/virtio/virtio.c:247
#13 0x00005610e96a0db4 in virtio_memory_listener_commit
(listener=0x5610eb72bff8) at ../hw/virtio/virtio.c:3592
#14 0x00005610e96d2e5e in address_space_update_flatviews_all () at
../softmmu/memory.c:1126
#15 memory_region_transaction_do_commit () at ../softmmu/memory.c:1145
Fortunately, this is probably the last one (at least according to the qtest
results).
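To illustrate the cycle above, here is a small self-contained sketch (not
QEMU code; all names are made up) of how a re-entrancy guard can stop the
commit() -> listener -> address_space_to_flatview() -> commit() chain from
recursing without bound:

  #include <stdbool.h>
  #include <stdio.h>

  static bool commit_in_progress;

  static void listener_commit(void);

  static void transaction_do_commit(void)
  {
      if (commit_in_progress) {
          return; /* already committing: break the recursion */
      }
      commit_in_progress = true;
      listener_commit(); /* may re-enter via address_space_to_flatview() */
      commit_in_progress = false;
  }

  static void listener_commit(void)
  {
      /* in QEMU this is where virtio_memory_listener_commit() ends up
       * calling address_space_to_flatview(), which re-enters commit */
      transaction_do_commit();
  }

  int main(void)
  {
      transaction_do_commit();
      printf("committed once, recursion suppressed\n");
      return 0;
  }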
BTW, perhaps you can define do_commit as a non-static function? I'll use
it in address_space_to_flatview() later.
My v6 patches are attached. If necessary, you can apply them and run some
tests.
Thanks!
From 44ec663c04ed54284cdee4d4340bbc421cde82c6 Mon Sep 17 00:00:00 2001
From: Chuang Xu <xuchuangxc...@bytedance.com>
Date: Thu, 22 Dec 2022 22:37:50 +0800
Subject: [PATCH v6 1/3] rcu: introduce rcu_read_is_locked()
Add rcu_read_is_locked() to detect whether the RCU read lock is held.
Signed-off-by: Chuang Xu <xuchuangxc...@bytedance.com>
---
include/qemu/rcu.h | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/include/qemu/rcu.h b/include/qemu/rcu.h
index b063c6fde8..719916d9d3 100644
--- a/include/qemu/rcu.h
+++ b/include/qemu/rcu.h
@@ -119,6 +119,13 @@ static inline void rcu_read_unlock(void)
}
}
+static inline bool rcu_read_is_locked(void)
+{
+    struct rcu_reader_data *p_rcu_reader = get_ptr_rcu_reader();
+
+    return p_rcu_reader->depth > 0;
+}
+
extern void synchronize_rcu(void);
/*
--
2.20.1
From 0963385fb6ff9106eec6886d311601bbe0e917ab Mon Sep 17 00:00:00 2001
From: Chuang Xu <xuchuangxc...@bytedance.com>
Date: Mon, 12 Dec 2022 11:24:25 +0800
Subject: [PATCH v6 2/3] memory: add sanity check in address_space_to_flatview
Before using any flatview, sanity check that either the BQL or the RCU read
lock is held. And if we're in the middle of a memory region transaction,
update the mappings immediately, or the map the caller sees can be stale.
Signed-off-by: Chuang Xu <xuchuangxc...@bytedance.com>
---
include/exec/memory.h | 23 +++++++++++++++++++++++
softmmu/memory.c | 5 +++++
2 files changed, 28 insertions(+)
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 2e602a2fad..84b531c6ff 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -27,6 +27,7 @@
#include "qemu/notify.h"
#include "qom/object.h"
#include "qemu/rcu.h"
+#include "qemu/main-loop.h"
#define RAM_ADDR_INVALID (~(ram_addr_t)0)
@@ -1095,8 +1096,30 @@ struct FlatView {
MemoryRegion *root;
};
+bool memory_region_transaction_in_progress(void);
+
+void memory_region_transaction_do_commit(void);
+
static inline FlatView *address_space_to_flatview(AddressSpace *as)
{
+    if (qemu_mutex_iothread_locked()) {
+        /* We exclusively own the flatview now. */
+        if (memory_region_transaction_in_progress()) {
+            /*
+             * Fetching the flatview while a transaction is in progress
+             * means current_map may not be the latest; we need to commit
+             * immediately so that the caller won't see an obsolete
+             * mapping.
+             */
+            memory_region_transaction_do_commit();
+        }
+
+        /* No further protection needed to access current_map */
+        return as->current_map;
+    }
+
+    /* Otherwise we must hold the RCU lock, or something went wrong */
+    assert(rcu_read_is_locked());
return qatomic_rcu_read(&as->current_map);
}
diff --git a/softmmu/memory.c b/softmmu/memory.c
index e538f2fe57..5a84f608ed 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1146,6 +1146,11 @@ void memory_region_transaction_commit(void)
}
}
+bool memory_region_transaction_in_progress(void)
+{
+    return memory_region_transaction_depth != 0;
+}
+
static void memory_region_destructor_none(MemoryRegion *mr)
{
}
--
2.20.1
From 1769661203c450941d2e43ebfd38b2d5ad6f5346 Mon Sep 17 00:00:00 2001
From: Chuang Xu <xuchuangxc...@bytedance.com>
Date: Thu, 17 Nov 2022 17:53:19 +0800
Subject: [PATCH v6 3/3] migration: reduce time of loading non-iterable vmstate
The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (the interval between the timestamp of the source qemu
stopping and the timestamp of the target qemu starting). Most of this time
is spent committing memory region changes repeatedly.
This patch packs all the memory region changes made while loading
non-iterable vmstate into a single memory transaction. As the number of
devices grows, this patch will greatly improve performance.
Here are the test1 results:

test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs, 128GB RAM
  - 8 16-queue vhost-net devices
  - 16 4-queue vhost-user-blk devices

         time of loading non-iterable vmstate   downtime
before   about 150 ms                           740+ ms
after    about 30 ms                            630+ ms
In test2, we keep the number of devices the same as in test1 and reduce the
number of queues per device. Here are the test2 results:

test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs, 128GB RAM
  - 8 1-queue vhost-net devices
  - 16 1-queue vhost-user-blk devices

         time of loading non-iterable vmstate   downtime
before   about 90 ms                            about 250 ms
after    about 25 ms                            about 160 ms
In test3, we keep the number of queues per device the same as in test1 and
reduce the number of devices. Here are the test3 results:

test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs, 128GB RAM
  - 1 16-queue vhost-net device
  - 1 4-queue vhost-user-blk device

         time of loading non-iterable vmstate   downtime
before   about 20 ms                            about 70 ms
after    about 11 ms                            about 60 ms
As we can see from the test results above, both the number of queues and
the number of devices have a great impact on the time of loading non-iterable
vmstate. More devices and queues lead to more memory region commits, and the
time spent on flatview reconstruction grows accordingly.
Signed-off-by: Chuang Xu <xuchuangxc...@bytedance.com>
---
migration/savevm.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/migration/savevm.c b/migration/savevm.c
index b5e6962bb6..3dd9daabd8 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2770,6 +2770,7 @@ out:
goto retry;
}
}
+
return ret;
}
@@ -2795,7 +2796,25 @@ int qemu_loadvm_state(QEMUFile *f)
cpu_synchronize_all_pre_loadvm();
+    /*
+     * Call memory_region_transaction_begin() before loading vmstate.
+     * This call is paired with memory_region_transaction_commit() right
+     * after qemu_loadvm_state_main() returns, in order to pack all the
+     * memory region changes made while loading non-iterable vmstate
+     * into a single memory transaction.
+     * This reduces the time spent loading non-iterable vmstate.
+     */
+    memory_region_transaction_begin();
+
ret = qemu_loadvm_state_main(f, mis);
+
+    /*
+     * Call memory_region_transaction_commit() after loading vmstate.
+     * At this point, qemu actually commits all the memory region
+     * changes accumulated above.
+     */
+    memory_region_transaction_commit();
+
qemu_event_set(&mis->main_thread_load_event);
trace_qemu_loadvm_state_post_main(ret);
--
2.20.1