Re: CPR/liveupdate: test results using prior bug fix
OK, acknowledged. Thanks, All.

- Michael

On 5/16/24 13:07, Steven Sistare wrote:

On 5/16/2024 1:24 PM, Michael Galaxy wrote:

On 5/14/24 08:54, Michael Tokarev wrote:

On 5/14/24 16:39, Michael Galaxy wrote:

Steve,

OK, so it does not look like this bugfix you wrote was included in 8.2.4 (which was released yesterday). Unfortunately, that means that anyone using CPR in that release will still (eventually) encounter the bug like I did.

8.2.4 is basically a "bugfix" release for 8.2.3, which I somewhat screwed up (in a minor way), plus a few changes that were queued up at the time. 8.2.3 was a big release, though.

I would recommend that y'all consider cherry-picking the relevant commits for a possible 8.2.5.

Please Cc changes which are relevant for -stable to, well, qemu-sta...@nongnu.org :)

Which changes need to be picked up? Steve, can you comment here, please?

At a minimum, we have this one:

[PULL 20/25] migration: stop vm for cpr

But that pull came with a handful of other changes that are also not in QEMU v8, so I suspect I'm missing some other changes that might matter for a stable release?

- Michael

Hi Michael,

I sent the full list of commits to this distribution yesterday, and I see it in my Sent email folder. Copying verbatim:

----

Michael Galaxy,

I'm afraid you are out of luck with respect to qemu 8.2.
It has some of the cpr reboot commits, but is missing the following:

87a2848  migration: massage cpr-reboot documentation
cbdafc1  migration: options incompatible with cpr
ce5db1c  migration: update cpr-reboot description
9867d4d  migration: stop vm for cpr
4af667f  migration: notifier error checking
bf78a04  migration: refactor migrate_fd_connect failures
6835f5a  migration: per-mode notifiers
5663dd3  migration: MigrationNotifyFunc
c763a23e migration: remove postcopy_after_devices
9d9babf  migration: MigrationEvent for notifiers
3e77573  migration: convert to NotifierWithReturn
d91f33c  migration: remove error from notifier data
be19d83  notify: pass error to notifier with return
b12635f  migration: fix coverity migrate_mode finding
2b58a8b  tests/qtest: postcopy migration with suspend
b1fdd21  tests/qtest: precopy migration with suspend
5014478  tests/qtest: option to suspend during migration
f064975  tests/qtest: migration events
49a5020  migration: preserve suspended for bg_migration
58b1057  migration: preserve suspended for snapshot
b4e9ddc  migration: preserve suspended runstate
d3c86c99 migration: propagate suspended runstate
9ff5e79  cpus: vm_resume
0f1db06  cpus: check running not RUN_STATE_RUNNING
b9ae473  cpus: stop vm in suspended runstate
f06f316  cpus: vm_was_suspended

All of those landed in qemu 9.0.

----

- Steve
Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling
These are very compelling results, no? (40gbps cards, right? Are the cards active/active, or active/standby?)

- Michael

On 5/14/24 10:19, Yu Zhang wrote:

Hello Peter and all,

I did a comparison of the VM live-migration speeds between RDMA and TCP/IP on our servers and plotted the results to get an initial impression. Unfortunately, the Ethernet NICs are not recent ones, so it may not tell us much. I can redo it on servers with more recent Ethernet NICs and keep you updated. It seems that the benefits of RDMA become obvious when the VM has a lot of memory and is running a memory-intensive workload.

Best regards,
Yu Zhang @ IONOS Cloud

On Thu, May 9, 2024 at 4:14 PM Peter Xu wrote:

On Thu, May 09, 2024 at 04:58:34PM +0800, Zheng Chuan via wrote:

That's good news about the socket abstraction for RDMA! When I developed the series above, the biggest pain was that RDMA migration had no QIOChannel abstraction, and I needed a 'fake channel' for it, which was awkward in the code. So, as far as I know, we can do this as follows:

i. first, evaluate whether rsocket is good enough to satisfy our fundamental QIOChannel abstraction;
ii. if it works, see whether it gives us the opportunity to hide the details of the rdma protocol inside rsocket, by removing most of the code in rdma.c along with some hacks in the migration main process;
iii. implement advanced features like multifd and multi-URI for rdma migration.

Since I am not familiar with rsocket, I need some time to look at it and do some quick verification of rdma migration based on rsocket. But, yes, I am willing to be involved in this refactoring work and see if we can make this migration feature better :)

Based on what we have now, it looks like we'd better halt the deprecation process a bit, so I think we shouldn't need to rush it, at least not in 9.1, and we'll need to see how the refactoring goes.
It'll be perfect if rsocket works; otherwise, supporting multifd with little overhead / exported APIs would also be a good thing in general, with whatever approach. And obviously all of this is based on the assumption that we can get resources from companies to support this feature first.

Note that so far nobody has compared rdma vs. nic performance, so I hope those of us who can provide some test results will do so. Many people say RDMA is better, but I have yet to see any numbers comparing it with modern TCP networks. I don't want old impressions floating around even if things might have changed.

When we have consolidated results, we should share them out and also reflect them in QEMU's migration docs when an rdma document page is ready.

Chuan, please check the whole thread discussion; it may help to understand what we are looking for in rdma migrations [1]. Meanwhile, please feel free to sync with Jinpu's team and see how to move forward with such a project.

[1] https://lore.kernel.org/qemu-devel/87frwatp7n@suse.de/

Thanks,

--
Peter Xu
Re: CPR/liveupdate: test results using prior bug fix
On 5/14/24 08:54, Michael Tokarev wrote:

On 5/14/24 16:39, Michael Galaxy wrote:

Steve,

OK, so it does not look like this bugfix you wrote was included in 8.2.4 (which was released yesterday). Unfortunately, that means that anyone using CPR in that release will still (eventually) encounter the bug like I did.

8.2.4 is basically a "bugfix" release for 8.2.3, which I somewhat screwed up (in a minor way), plus a few changes that were queued up at the time. 8.2.3 was a big release, though.

I would recommend that y'all consider cherry-picking the relevant commits for a possible 8.2.5.

Please Cc changes which are relevant for -stable to, well, qemu-sta...@nongnu.org :)

Which changes need to be picked up? Steve, can you comment here, please?

At a minimum, we have this one:

[PULL 20/25] migration: stop vm for cpr

But that pull came with a handful of other changes that are also not in QEMU v8, so I suspect I'm missing some other changes that might matter for a stable release?

- Michael

Thanks,
/mjt
Re: CPR/liveupdate: test results using prior bug fix
Steve,

OK, so it does not look like this bugfix you wrote was included in 8.2.4 (which was released yesterday). Unfortunately, that means that anyone using CPR in that release will still (eventually) encounter the bug like I did.

I would recommend that y'all consider cherry-picking the relevant commits for a possible 8.2.5.

- Michael

On 5/13/24 20:15, Michael Galaxy wrote:

Hi Steve,

Thanks for the response. It looks like literally *just today* 8.2.4 was released. I'll go check it out.

- Michael

On 5/13/24 15:10, Steven Sistare wrote:

Hi Michael,

No surprise here; I did see some of the same failure messages, and they prompted me to submit the fix. They are all symptoms of "the possibility of ram and device state being out of sync" as mentioned in the commit.

I am not familiar with the process for maintaining old releases of qemu. Perhaps someone on this list can comment on 8.2.3.

- Steve

On 5/13/2024 2:22 PM, Michael Galaxy wrote:

Hi Steve,

We found that this specific change ("migration: stop vm for cpr") fixes a bug that we identified while testing back-to-back live updates in a lab environment. More specifically, *without* this change (which is not available in 8.2.2 but *is* available in 9.0.0), the metadata save file becomes corrupted when doing live updates one after another. Typically we see a corrupted save file somewhere between 20 and 30 live updates, and while doing a git bisect, we found that this change makes the problem go away.

Were you aware? Is there any plan in place to cherry-pick this for 8.2.3, or perhaps a plan to release 8.2.3 at some point?
Here are some examples of how the bug manifests in different locations of the QEMU metadata save file:

2024-04-26T13:28:53Z qemu-system-x86_64: Failed to load mtrr_var:base
2024-04-26T13:28:53Z qemu-system-x86_64: Failed to load cpu:env.mtrr_var
2024-04-26T13:28:53Z qemu-system-x86_64: error while loading state for instance 0x1b of device 'cpu'
2024-04-26T13:28:53Z qemu-system-x86_64: load of migration failed: Input/output error

And another:

2024-04-17T16:09:47Z qemu-system-x86_64: check_section_footer: Read section footer failed: -5
2024-04-17T16:09:47Z qemu-system-x86_64: load of migration failed: Invalid argument

And another:

2024-04-30T21:53:29Z qemu-system-x86_64: Unable to read ID string for section 163
2024-04-30T21:53:29Z qemu-system-x86_64: load of migration failed: Invalid argument

And another:

2024-05-01T16:01:44Z qemu-system-x86_64: Unable to read ID string for section 164
2024-05-01T16:01:44Z qemu-system-x86_64: load of migration failed: Invalid argument

As you can see, they occur quite randomly, but generally it takes at least 20-30+ live updates before the problem occurs.

- Michael

On 2/27/24 23:13, pet...@redhat.com wrote:

From: Steve Sistare

When migration for cpr is initiated, stop the vm and set state RUN_STATE_FINISH_MIGRATE before ram is saved. This eliminates the possibility of ram and device state being out of sync, and guarantees that a guest in the suspended state remains suspended, because qmp_cont rejects a cont command in the RUN_STATE_FINISH_MIGRATE state.
Signed-off-by: Steve Sistare
Reviewed-by: Peter Xu
Link: https://lore.kernel.org/r/1708622920-68779-11-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu
---
 include/migration/misc.h |  1 +
 migration/migration.h    |  2 --
 migration/migration.c    | 51
 3 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index e4933b815b..5d1aa593ed 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -60,6 +60,7 @@ void migration_object_init(void);
 void migration_shutdown(void);
 bool migration_is_idle(void);
 bool migration_is_active(MigrationState *);
+bool migrate_mode_is_cpr(MigrationState *);

 typedef enum MigrationEventType {
     MIG_EVENT_PRECOPY_SETUP,

diff --git a/migration/migration.h b/migration/migration.h
index aef8afbe1f..65c0b61cbd 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -541,6 +541,4 @@ int migration_rp_wait(MigrationState *s);
  */
 void migration_rp_kick(MigrationState *s);

-int migration_stop_vm(RunState state);
-
 #endif

diff --git a/migration/migration.c b/migration/migration.c
index 37c836b0b0..90a90947fb 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -167,11 +167,19 @@ static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
     return (a > b) - (a < b);
 }

-int migration_stop_vm(RunState state)
+static int migration_stop_vm(MigrationState *s, RunState state)
 {
-    int ret = vm_stop_force_state(state);
+    int ret;
+
+    migration_downtime_start(s);
+
+    s->vm_old_state = runstate_get();
Re: CPR/liveupdate: test results using prior bug fix
Hi Steve,

Thanks for the response. It looks like literally *just today* 8.2.4 was released. I'll go check it out.

- Michael

On 5/13/24 15:10, Steven Sistare wrote:

Hi Michael,

No surprise here; I did see some of the same failure messages, and they prompted me to submit the fix. They are all symptoms of "the possibility of ram and device state being out of sync" as mentioned in the commit.

I am not familiar with the process for maintaining old releases of qemu. Perhaps someone on this list can comment on 8.2.3.

- Steve

On 5/13/2024 2:22 PM, Michael Galaxy wrote:

Hi Steve,

We found that this specific change ("migration: stop vm for cpr") fixes a bug that we identified while testing back-to-back live updates in a lab environment. More specifically, *without* this change (which is not available in 8.2.2 but *is* available in 9.0.0), the metadata save file becomes corrupted when doing live updates one after another. Typically we see a corrupted save file somewhere between 20 and 30 live updates, and while doing a git bisect, we found that this change makes the problem go away.

Were you aware? Is there any plan in place to cherry-pick this for 8.2.3, or perhaps a plan to release 8.2.3 at some point?
Here are some examples of how the bug manifests in different locations of the QEMU metadata save file:

2024-04-26T13:28:53Z qemu-system-x86_64: Failed to load mtrr_var:base
2024-04-26T13:28:53Z qemu-system-x86_64: Failed to load cpu:env.mtrr_var
2024-04-26T13:28:53Z qemu-system-x86_64: error while loading state for instance 0x1b of device 'cpu'
2024-04-26T13:28:53Z qemu-system-x86_64: load of migration failed: Input/output error

And another:

2024-04-17T16:09:47Z qemu-system-x86_64: check_section_footer: Read section footer failed: -5
2024-04-17T16:09:47Z qemu-system-x86_64: load of migration failed: Invalid argument

And another:

2024-04-30T21:53:29Z qemu-system-x86_64: Unable to read ID string for section 163
2024-04-30T21:53:29Z qemu-system-x86_64: load of migration failed: Invalid argument

And another:

2024-05-01T16:01:44Z qemu-system-x86_64: Unable to read ID string for section 164
2024-05-01T16:01:44Z qemu-system-x86_64: load of migration failed: Invalid argument

As you can see, they occur quite randomly, but generally it takes at least 20-30+ live updates before the problem occurs.

- Michael

On 2/27/24 23:13, pet...@redhat.com wrote:

From: Steve Sistare

When migration for cpr is initiated, stop the vm and set state RUN_STATE_FINISH_MIGRATE before ram is saved. This eliminates the possibility of ram and device state being out of sync, and guarantees that a guest in the suspended state remains suspended, because qmp_cont rejects a cont command in the RUN_STATE_FINISH_MIGRATE state.
Signed-off-by: Steve Sistare
Reviewed-by: Peter Xu
Link: https://lore.kernel.org/r/1708622920-68779-11-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu
---
 include/migration/misc.h |  1 +
 migration/migration.h    |  2 --
 migration/migration.c    | 51
 3 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index e4933b815b..5d1aa593ed 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -60,6 +60,7 @@ void migration_object_init(void);
 void migration_shutdown(void);
 bool migration_is_idle(void);
 bool migration_is_active(MigrationState *);
+bool migrate_mode_is_cpr(MigrationState *);

 typedef enum MigrationEventType {
     MIG_EVENT_PRECOPY_SETUP,

diff --git a/migration/migration.h b/migration/migration.h
index aef8afbe1f..65c0b61cbd 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -541,6 +541,4 @@ int migration_rp_wait(MigrationState *s);
  */
 void migration_rp_kick(MigrationState *s);

-int migration_stop_vm(RunState state);
-
 #endif

diff --git a/migration/migration.c b/migration/migration.c
index 37c836b0b0..90a90947fb 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -167,11 +167,19 @@ static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
     return (a > b) - (a < b);
 }

-int migration_stop_vm(RunState state)
+static int migration_stop_vm(MigrationState *s, RunState state)
 {
-    int ret = vm_stop_force_state(state);
+    int ret;
+
+    migration_downtime_start(s);
+
+    s->vm_old_state = runstate_get();
+    global_state_store();
+
+    ret = vm_stop_force_state(state);
     trace_vmstate_downtime_checkpoint("src-vm-stopped");
+    trace_migration_completion_vm_stop(ret);

     return ret;
 }

@@ -1602,6 +1610,11 @@ bool migration_is_active(MigrationState *s)
             s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
 }

+bool migrate_mode_is_cpr(MigrationState *s)
Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling
One thing to keep in mind here (despite me not having any hardware to test with) is that one of the original goals of the RDMA implementation was not simply raw throughput or raw latency, but the lack of kernel-space CPU utilization due to the offload. While it is entirely possible that newer hardware with TCP might compete, the significant reductions in CPU usage in the TCP/IP stack were a big win at the time.

Just something to consider while you're doing the testing.

- Michael

On 5/9/24 03:58, Zheng Chuan wrote:

Hi, Peter, Lei, Jinpu.

On 2024/5/8 0:28, Peter Xu wrote:

On Tue, May 07, 2024 at 01:50:43AM +0000, Gonglei (Arei) wrote:

Hello,

-----Original Message-----
From: Peter Xu [mailto:pet...@redhat.com]
Sent: Monday, May 6, 2024 11:18 PM
To: Gonglei (Arei)
Cc: Daniel P. Berrangé; Markus Armbruster; Michael Galaxy; Yu Zhang; Zhijian Li (Fujitsu); Jinpu Wang; Elmar Gerdes; qemu-devel@nongnu.org; Yuval Shaia; Kevin Wolf; Prasanna Kumar Kalever; Cornelia Huck; Michael Roth; integrat...@gluster.org; Paolo Bonzini; qemu-bl...@nongnu.org; de...@lists.libvirt.org; Hanna Reitz; Michael S. Tsirkin; Thomas Huth; Eric Blake; Song Gao; Marc-André Lureau; Alex Bennée; Wainer dos Santos Moschetta; Beraldo Leal; Pannengyuan; Xiexiangyou
Subject: Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

On Mon, May 06, 2024 at 02:06:28AM +0000, Gonglei (Arei) wrote:

Hi, Peter

Hey, Lei,

Happy to see you around again after years.

Haha, me too.

RDMA features high bandwidth, low latency (in a non-blocking lossless network), and direct remote memory access that bypasses the CPU (as you know, CPU resources are expensive for cloud vendors, which is one of the reasons why we introduced offload cards), which TCP does not have.

It's another cost to use offload cards, vs. preparing more cpu resources?
Software and hardware offload converged architecture is the way to go for all cloud vendors (including comprehensive benefits in terms of performance, cost, security, and innovation speed); it's not just a matter of adding the resource of a DPU card. It is very useful in scenarios where fast live migration is needed (extremely short interruption duration and migration duration). To this end, we have also developed RDMA support for multifd.

Will any of you upstream that work? I'm curious how intrusive it would be when adding it to multifd; if it can keep only 5 exported functions like what rdma.h does right now, it'll be pretty nice. We also want to make sure it works with arbitrarily sized loads and buffers, e.g. vfio is considering adding IO loads to multifd channels too.

In fact, we sent the patchset to the community in 2021. Please see:

https://lore.kernel.org/all/20210203185906.GT2950@work-vm/T/

Yes, I sent the patchset of multifd support for rdma migration, taking over from my colleague, and I am sorry for not continuing this work at that time due to some reasons. I also strongly agree with Lei that the RDMA protocol has some special advantages over TCP in some scenarios, and we do indeed use it in our product.

I wasn't aware of that for sure in the past. Multifd has changed quite a bit in the last 9.0 release; that may not apply anymore.
One thing to mention is please look at Dan's comment on possible use of rsocket.h:

https://lore.kernel.org/all/zjjm6rcqs5eho...@redhat.com/

And Jinpu did help provide an initial test result over the library:

https://lore.kernel.org/qemu-devel/camgffek8wiknqmouyxcathgtiem2dwocf_w7t0vmcd-i30t...@mail.gmail.com/

It looks like we have a chance to apply that in QEMU.

One thing to note is that the question here is not about a pure performance comparison between rdma and nics only. It's about helping us make a decision on whether to drop rdma; iow, even if rdma performs well, the community still has the right to drop it if nobody can actively work on and maintain it. It's just that if nics can perform as well, that is more of a reason to drop it, unless companies can help to provide good support and work together.

We are happy to provide the necessary review and maintenance work for RDMA if the community needs it.

CC'ing Chuan Zheng.

I'm not sure whether you and Jinpu's team would like to work together and provide a final solution for rdma over multifd. It could be much simpler than the original 2021 proposal if the rsocket API works out.

Thanks,

That's good news about the socket abstraction for RDMA! When I developed the series above, the biggest pain was that RDMA migration had no QIOChannel
CPR/liveupdate: test results using prior bug fix
Hi Steve,

We found that this specific change ("migration: stop vm for cpr") fixes a bug that we identified while testing back-to-back live updates in a lab environment. More specifically, *without* this change (which is not available in 8.2.2 but *is* available in 9.0.0), the metadata save file becomes corrupted when doing live updates one after another. Typically we see a corrupted save file somewhere between 20 and 30 live updates, and while doing a git bisect, we found that this change makes the problem go away.

Were you aware? Is there any plan in place to cherry-pick this for 8.2.3, or perhaps a plan to release 8.2.3 at some point?

Here are some examples of how the bug manifests in different locations of the QEMU metadata save file:

2024-04-26T13:28:53Z qemu-system-x86_64: Failed to load mtrr_var:base
2024-04-26T13:28:53Z qemu-system-x86_64: Failed to load cpu:env.mtrr_var
2024-04-26T13:28:53Z qemu-system-x86_64: error while loading state for instance 0x1b of device 'cpu'
2024-04-26T13:28:53Z qemu-system-x86_64: load of migration failed: Input/output error

And another:

2024-04-17T16:09:47Z qemu-system-x86_64: check_section_footer: Read section footer failed: -5
2024-04-17T16:09:47Z qemu-system-x86_64: load of migration failed: Invalid argument

And another:

2024-04-30T21:53:29Z qemu-system-x86_64: Unable to read ID string for section 163
2024-04-30T21:53:29Z qemu-system-x86_64: load of migration failed: Invalid argument

And another:

2024-05-01T16:01:44Z qemu-system-x86_64: Unable to read ID string for section 164
2024-05-01T16:01:44Z qemu-system-x86_64: load of migration failed: Invalid argument

As you can see, they occur quite randomly, but generally it takes at least 20-30+ live updates before the problem occurs.

- Michael

On 2/27/24 23:13, pet...@redhat.com wrote:

From: Steve Sistare

When migration for cpr is initiated, stop the vm and set state RUN_STATE_FINISH_MIGRATE before ram is saved.
This eliminates the possibility of ram and device state being out of sync, and guarantees that a guest in the suspended state remains suspended, because qmp_cont rejects a cont command in the RUN_STATE_FINISH_MIGRATE state.

Signed-off-by: Steve Sistare
Reviewed-by: Peter Xu
Link: https://lore.kernel.org/r/1708622920-68779-11-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu
---
 include/migration/misc.h |  1 +
 migration/migration.h    |  2 --
 migration/migration.c    | 51
 3 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index e4933b815b..5d1aa593ed 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -60,6 +60,7 @@ void migration_object_init(void);
 void migration_shutdown(void);
 bool migration_is_idle(void);
 bool migration_is_active(MigrationState *);
+bool migrate_mode_is_cpr(MigrationState *);

 typedef enum MigrationEventType {
     MIG_EVENT_PRECOPY_SETUP,

diff --git a/migration/migration.h b/migration/migration.h
index aef8afbe1f..65c0b61cbd 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -541,6 +541,4 @@ int migration_rp_wait(MigrationState *s);
  */
 void migration_rp_kick(MigrationState *s);

-int migration_stop_vm(RunState state);
-
 #endif

diff --git a/migration/migration.c b/migration/migration.c
index 37c836b0b0..90a90947fb 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -167,11 +167,19 @@ static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
     return (a > b) - (a < b);
 }

-int migration_stop_vm(RunState state)
+static int migration_stop_vm(MigrationState *s, RunState state)
 {
-    int ret = vm_stop_force_state(state);
+    int ret;
+
+    migration_downtime_start(s);
+
+    s->vm_old_state = runstate_get();
+    global_state_store();
+
+    ret = vm_stop_force_state(state);
     trace_vmstate_downtime_checkpoint("src-vm-stopped");
+    trace_migration_completion_vm_stop(ret);

     return ret;
 }

@@ -1602,6 +1610,11 @@ bool migration_is_active(MigrationState *s)
             s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
 }

+bool migrate_mode_is_cpr(MigrationState *s)
+{
+    return s->parameters.mode == MIG_MODE_CPR_REBOOT;
+}
+
 int migrate_init(MigrationState *s, Error **errp)
 {
     int ret;

@@ -2454,10 +2467,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
     bql_lock();
     trace_postcopy_start_set_run();

-    migration_downtime_start(ms);
-
-    global_state_store();
-    ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE);
+    ret = migration_stop_vm(ms, RUN_STATE_FINISH_MIGRATE);
     if (ret < 0) {
         goto fail;
     }

@@ -2652,15 +2662,12 @@ static int migration_completion_precopy(MigrationState *s,
     int ret;
Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling
Yu Zhang / Jinpu,

Any possibility (at your leisure, and within the disclosure rules of your company, IONOS) that you could share some of your performance information to educate the group? NICs have indeed changed, but not everybody has 100GbE Mellanox cards at their disposal. Some people don't.

- Michael

On 5/1/24 11:16, Peter Xu wrote:

On Wed, May 01, 2024 at 04:59:38PM +0100, Daniel P. Berrangé wrote:

On Wed, May 01, 2024 at 11:31:13AM -0400, Peter Xu wrote:

What I worry about more is whether this is really what we want in order to keep rdma in qemu, and that's also why I was trying to request some serious performance measurements comparing rdma vs. nics. And here when I said "we" I mean both the QEMU community and any company that will support keeping rdma around.

The problem is, if NICs now are fast enough to perform at least equally against rdma, and if it has a lower overall maintenance cost, does it mean that rdma migration will only be used by whoever wants to keep it in products that already exist? In that case we should simply ask new users to stick with tcp, and rdma users should only decrease, not increase.

It seems also destined that most new migration features will not support rdma: see how many old features we are dropping in migration now (which rdma _might_ still leverage, but maybe not), and how much of what we add is mostly multifd-relevant, which will probably not apply to rdma at all. So in general what I am worried about is a lose-lose condition, where the company might find it easier either to stick with an old qemu (depending on whether other new features are requested besides RDMA alone) or to do periodic rebases with RDMA downstream-only.

I don't know much about the origins of RDMA support in QEMU and why this particular design was taken. It is indeed a huge maintenance burden to have a completely different code flow for RDMA, with 4000+ lines of custom protocol signalling which is barely understandable.
I would note that /usr/include/rdma/rsocket.h provides a higher-level API that is a 1-1 match of the normal kernel 'sockets' API. If we had leveraged that, then the QIOChannelSocket class and the QAPI SocketAddress type could almost[1] trivially have supported RDMA. There would have been almost no RDMA code required in the migration subsystem, and all the modern features like compression, multifd, post-copy, etc. would "just work".

I guess the 'rsocket.h' shim may well limit some of the possible performance gains, but it might still have been a better tradeoff to have not quite so good peak performance, but with massively less maintenance burden.

My understanding so far is that RDMA is solely for performance and nothing else; then it's a question of whether existing rdma users would keep it if it ran slower. Jinpu mentioned the explicit usages of ib verbs, but I am mostly just quoting that word as I don't really know such details:

https://lore.kernel.org/qemu-devel/camgffem2twjxopcnqtq1sjytf5395dbztcmyikrqfxdzjws...@mail.gmail.com/

So I'm not sure whether that applies here too, in that having a qiochannel wrapper may not allow direct access to those ib verbs.

Thanks,

With regards,
Daniel

[1] "almost" trivially, because the poll() integration for rsockets requires a bit more magic sauce, since rsockets FDs are not really FDs from the kernel's POV. Still, QIOChannel can likely abstract that problem.
Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling
Reviewed-by: Michael Galaxy

Thanks Yu Zhang and Peter.

- Michael

On 4/29/24 15:45, Yu Zhang wrote:

Hello Michael and Peter,

We are very glad at your quick and kind reply about our plan to take over the maintenance of your code. This message presents our plan for working together. If we were able to obtain the maintainer's role, our plan is:

1. Create the necessary unit-test cases and get them integrated into the current QEMU GitLab CI pipeline
2. Review and test code changes by other developers, to ensure that nothing is broken before being merged by the community
3. Based on our current practice and application scenario, look for possible improvements when necessary

Besides that, a patch is attached to announce this change in the community. With your generous support, we hope that the development community will make a positive decision for us.

Kind regards,
Yu Zhang @ IONOS Cloud

On Mon, Apr 29, 2024 at 4:57 PM Peter Xu wrote:

On Mon, Apr 29, 2024 at 08:08:10AM -0500, Michael Galaxy wrote:

Hi All (and Peter),

Hi, Michael,

My name is Michael Galaxy (formerly Hines). Yes, I changed my last name (highly irregular for a male), and yes, that's my real last name: https://www.linkedin.com/in/mrgalaxy/

I'm the original author of the RDMA implementation. I've been discussing with Yu Zhang for a little bit about potentially handing over maintainership of the codebase to his team. I simply have zero access to RoCE or Infiniband hardware at all, unfortunately, so I've never been able to run tests or use what I wrote at work, and as all of you know, if you don't have a way to test something, then you can't maintain it.

Yu Zhang put a (very kind) proposal forward to me to ask the community if they feel comfortable training his team to maintain the codebase (and run tests) while they learn about it.
The "while learning" part is fine at least to me. IMHO the "ownership" to the code, or say, taking over the responsibility, may or may not need 100% mastering the code base first. There should still be some fundamental confidence to work on the code though as a starting point, then it's about serious use case to back this up, and careful testings while getting more familiar with it. If you don't mind, I'd like to let him send over his (very detailed) proposal, Yes please, it's exactly the time to share the plan. The hope is we try to reach a consensus before or around the middle of this release (9.1). Normally QEMU has a 3~4 months window for each release and 9.1 schedule is not yet out, but I think it means we make a decision before or around middle of June. Thanks, - Michael On 4/11/24 11:36, Yu Zhang wrote: 1) Either a CI test covering at least the major RDMA paths, or at least periodically tests for each QEMU release will be needed. We use a batch of regression test cases for the stack, which covers the test for QEMU. I did such test for most of the QEMU releases planned as candidates for rollout. The migration test needs a pair of (either physical or virtual) servers with InfiniBand network, which makes it difficult to do on a single server. The nested VM could be a possible approach, for which we may need virtual InfiniBand network. Is SoftRoCE [1] a choice? I will try it and let you know. [1] https://urldefense.com/v3/__https://enterprise-support.nvidia.com/s/article/howto-configure-soft-roce__;!!GjvTz_vk!VEqNfg3Kdf58Oh1FkGL6ErDLfvUXZXPwMTaXizuIQeIgJiywPzuwbqx8wM0KUsyopw_EYQxWvGHE3ig$ Thanks and best regards! On Thu, Apr 11, 2024 at 4:20 PM Peter Xu wrote: On Wed, Apr 10, 2024 at 09:49:15AM -0400, Peter Xu wrote: On Wed, Apr 10, 2024 at 02:28:59AM +, Zhijian Li (Fujitsu) via wrote: on 4/10/2024 3:46 AM, Peter Xu wrote: Is there document/link about the unittest/CI for migration tests, Why are those tests missing? 
Is it hard or very special to set up an environment for that? Maybe we can help in this regard.

See tests/qtest/migration-test.c. We put most of our migration tests there and that's covered in CI. I think one major issue is CI systems don't normally have RDMA devices.

Can an RDMA migration test be carried out without real hardware?

Yeah, RXE, aka SOFT-RoCE, is able to emulate RDMA. For example:

$ sudo rdma link add rxe_eth0 type rxe netdev eth0 # on host

then we get a new RDMA interface, "rxe_eth0". This new RDMA interface is able to do the QEMU RDMA migration. Also, the loopback (lo) device is able to emulate the RDMA interface "rxe_lo"; however, when I tried (years ago) to do RDMA migration over this interface (rdma:127.0.0.1:) it got something wrong, so I gave up enabling the RDMA migration qtest at that time.

Thanks, Zhijian. I'm not sure adding an emu-link for RDMA is doable for CI systems, though. Maybe someone more familiar with how CI works can chime in.
Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling
Hi All (and Peter),

My name is Michael Galaxy (formerly Hines). Yes, I changed my last name (highly irregular for a male) and yes, that's my real last name: https://www.linkedin.com/in/mrgalaxy/

I'm the original author of the RDMA implementation. I've been discussing with Yu Zhang for a little bit about potentially handing over maintainership of the codebase to his team. I simply have zero access to RoCE or InfiniBand hardware at all, unfortunately, so I've never been able to run tests or use what I wrote at work, and as all of you know, if you don't have a way to test something, then you can't maintain it. Yu Zhang put a (very kind) proposal forward to me to ask the community if they feel comfortable training his team to maintain the codebase (and run tests) while they learn about it. If you don't mind, I'd like to let him send over his (very detailed) proposal,

- Michael

On 4/11/24 11:36, Yu Zhang wrote:
1) Either a CI test covering at least the major RDMA paths, or at least periodic tests for each QEMU release will be needed.

We use a batch of regression test cases for the stack, which covers the test for QEMU. I did such tests for most of the QEMU releases planned as candidates for rollout. The migration test needs a pair of (either physical or virtual) servers with an InfiniBand network, which makes it difficult to do on a single server. A nested VM could be a possible approach, for which we may need a virtual InfiniBand network. Is SoftRoCE [1] a choice? I will try it and let you know.

[1] https://enterprise-support.nvidia.com/s/article/howto-configure-soft-roce

Thanks and best regards!
On Thu, Apr 11, 2024 at 4:20 PM Peter Xu wrote:
On Wed, Apr 10, 2024 at 09:49:15AM -0400, Peter Xu wrote:
On Wed, Apr 10, 2024 at 02:28:59AM +0000, Zhijian Li (Fujitsu) via wrote:
On 4/10/2024 3:46 AM, Peter Xu wrote:
Is there a document/link about the unittest/CI for migration tests? Why are those tests missing? Is it hard or very special to set up an environment for that? Maybe we can help in this regard.

See tests/qtest/migration-test.c. We put most of our migration tests there and that's covered in CI. I think one major issue is CI systems don't normally have RDMA devices.

Can an RDMA migration test be carried out without real hardware?

Yeah, RXE, aka SOFT-RoCE, is able to emulate RDMA. For example:

$ sudo rdma link add rxe_eth0 type rxe netdev eth0 # on host

then we get a new RDMA interface, "rxe_eth0". This new RDMA interface is able to do the QEMU RDMA migration. Also, the loopback (lo) device is able to emulate the RDMA interface "rxe_lo"; however, when I tried (years ago) to do RDMA migration over this interface (rdma:127.0.0.1:) it got something wrong, so I gave up enabling the RDMA migration qtest at that time.

Thanks, Zhijian. I'm not sure adding an emu-link for RDMA is doable for CI systems, though. Maybe someone more familiar with how CI works can chime in.

Some people got dropped from the cc list for unknown reasons; I'm adding them back (Fabiano, Peter Maydell, Phil). Let's make sure nobody is dropped by accident.

I'll try to summarize what is still missing, and I think these will be greatly helpful if we don't want to deprecate RDMA migration:

1) Either a CI test covering at least the major RDMA paths, or at least periodic tests for each QEMU release will be needed.

2) Some performance tests between modern RDMA and NIC devices are welcomed. The current knowledge is that a modern NIC can work similarly to RDMA in performance; if so, it's debatable why we still maintain so much RDMA-specific code.
3) No need for solid patch sets for this one, but some plan to improve the RDMA migration code so that it is not almost isolated from the rest of the protocols.

4) Someone to look after this code for real.

For 2) and 3) more info is here:
https://lore.kernel.org/r/ZhWa0YeAb9ySVKD1@x1n

Here 4) can be the most important, as Markus pointed out. We just didn't get there yet in the discussions, but maybe Markus is right that we should talk about that first.

Thanks,

-- Peter Xu
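The CI gap described above — runners without RDMA hardware, with Soft-RoCE (rxe) as a possible stand-in — implies a skip-or-run decision. Below is a minimal, hypothetical sketch (not from the thread, and not QEMU code) of how a test harness might detect RDMA devices via the kernel's standard sysfs path before attempting an RDMA migration qtest; both function names are illustrative.

```python
import os

def rdma_devices(sysfs_root="/sys/class/infiniband"):
    """Return the RDMA devices visible to the kernel (e.g. a Soft-RoCE
    link such as 'rxe_eth0' created with 'rdma link add'), or an empty
    list when the host has none -- the common case on CI runners."""
    try:
        return sorted(os.listdir(sysfs_root))
    except FileNotFoundError:
        return []

def should_skip_rdma_qtest():
    """A harness could skip the RDMA migration test when no device
    (real or rxe-emulated) is present, instead of failing outright."""
    return len(rdma_devices()) == 0
```

On a CI runner this would typically report no devices and skip; after the `rdma link add rxe_eth0 type rxe netdev eth0` command quoted above, the rxe link would appear in the list.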
Re: [PATCH V3] migration: simplify notifiers
On 6/7/23 09:42, Steve Sistare wrote:
Pass the callback function to add_migration_state_change_notifier so that migration can initialize the notifier on add and clear it on delete, which simplifies the call sites. Shorten the function names so the extra arg can be added more legibly. Hide the global notifier list in a new function migration_call_notifiers, and make it externally visible so future live update code can call it.

Tested-by: Michael Galaxy
Reviewed-by: Michael Galaxy

No functional change.

Signed-off-by: Steve Sistare
---
 hw/net/virtio-net.c      |  6 +++---
 hw/vfio/migration.c      |  8 ++++----
 include/migration/misc.h |  6 ++++--
 migration/migration.c    | 22 ++++++++++++++++------
 net/vhost-vdpa.c         |  7 ++++---
 ui/spice-core.c          |  3 +--
 6 files changed, 32 insertions(+), 20 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 6df6b73..c4dc795 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3605,8 +3605,8 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
         n->primary_listener.hide_device = failover_hide_primary_device;
         qatomic_set(&n->failover_primary_hidden, true);
         device_listener_register(&n->primary_listener);
-        n->migration_state.notify = virtio_net_migration_state_notifier;
-        add_migration_state_change_notifier(&n->migration_state);
+        migration_add_notifier(&n->migration_state,
+                               virtio_net_migration_state_notifier);
         n->host_features |= (1ULL << VIRTIO_NET_F_STANDBY);
     }
@@ -3769,7 +3769,7 @@ static void virtio_net_device_unrealize(DeviceState *dev)
     if (n->failover) {
         qobject_unref(n->primary_opts);
         device_listener_unregister(&n->primary_listener);
-        remove_migration_state_change_notifier(&n->migration_state);
+        migration_remove_notifier(&n->migration_state);
     } else {
         assert(n->primary_opts == NULL);
     }
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index c4656bb..8af0294 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -619,9 +619,9 @@ static int vfio_migration_init(VFIODevice *vbasedev)
     migration->vm_state = qdev_add_vm_change_state_handler(vbasedev->dev,
                                                            vfio_vmstate_change,
                                                            vbasedev);
-    migration->migration_state.notify = vfio_migration_state_notifier;
-    add_migration_state_change_notifier(&migration->migration_state);
-
+    migration_add_notifier(&migration->migration_state,
+                           vfio_migration_state_notifier);
+
     return 0;
 }
@@ -670,7 +670,7 @@ void vfio_migration_exit(VFIODevice *vbasedev)
     if (vbasedev->migration) {
         VFIOMigration *migration = vbasedev->migration;
-        remove_migration_state_change_notifier(&migration->migration_state);
+        migration_remove_notifier(&migration->migration_state);
         qemu_del_vm_change_state_handler(migration->vm_state);
         unregister_savevm(VMSTATE_IF(vbasedev->dev), "vfio", vbasedev);
         vfio_migration_free(vbasedev);
diff --git a/include/migration/misc.h b/include/migration/misc.h
index 5ebe13b..0987eb1 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -59,8 +59,10 @@ void migration_object_init(void);
 void migration_shutdown(void);
 bool migration_is_idle(void);
 bool migration_is_active(MigrationState *);
-void add_migration_state_change_notifier(Notifier *notify);
-void remove_migration_state_change_notifier(Notifier *notify);
+void migration_add_notifier(Notifier *notify,
+                            void (*func)(Notifier *notifier, void *data));
+void migration_remove_notifier(Notifier *notify);
+void migration_call_notifiers(MigrationState *s);
 bool migration_in_setup(MigrationState *);
 bool migration_has_finished(MigrationState *);
 bool migration_has_failed(MigrationState *);
diff --git a/migration/migration.c b/migration/migration.c
index 5103e2f..17b4b47 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1178,7 +1178,7 @@ static void migrate_fd_cleanup(MigrationState *s)
         /* It is used on info migrate.  We can't free it */
         error_report_err(error_copy(s->error));
     }
-    notifier_list_notify(&migration_state_notifiers, s);
+    migration_call_notifiers(s);
     block_cleanup_parameters();
     yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }
@@ -1273,14 +1273,24 @@ static void migrate_fd_cancel(MigrationState *s)
     }
 }
-void add_migration_state_change_notifier(Notifier *notify)
+void migration_add_notifier(Notifier *notify,
+                            void (*func)(Notifier *notifier, void *data))
 {
+    notify->notify = func;
     notifier_list_add(&migration_state_notifiers, notify);
 }
-void remove_migration_state_change_notifier(Notifier *notify)
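To illustrate the API change under review, here is a hypothetical Python model (not QEMU code) of the before/after contract the commit message describes: the callback is installed by migration_add_notifier() itself and cleared on removal, so call sites never assign notify.notify by hand.

```python
# Stand-in for the (now hidden) NotifierList in migration.c.
migration_state_notifiers = []

class Notifier:
    """Models QEMU's Notifier: a struct holding one callback pointer."""
    def __init__(self):
        self.notify = None

def migration_add_notifier(notifier, func):
    notifier.notify = func                      # initialized on add
    migration_state_notifiers.append(notifier)

def migration_remove_notifier(notifier):
    migration_state_notifiers.remove(notifier)
    notifier.notify = None                      # cleared on delete

def migration_call_notifiers(state):
    """Models the new externally visible migration_call_notifiers()."""
    for n in list(migration_state_notifiers):
        n.notify(n, state)
```

A call site shrinks from two statements (assign the callback, then register) to one registration call, which is exactly the simplification the virtio-net and vfio hunks show.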
Re: [PATCH V4 0/2] migration file URI
Tested-by: Michael Galaxy
Reviewed-by: Michael Galaxy

On 6/30/23 09:25, Steve Sistare wrote:
Add the migration URI "file:filename[,offset=offset]". Fabiano Rosas has submitted the unit tests in the series "migration: Test the new "file:" migration".

Steve Sistare (2):
  migration: file URI
  migration: file URI offset

 migration/file.c       | 103 +
 migration/file.h       |  14 +++
 migration/meson.build  |   1 +
 migration/migration.c  |   5 +++
 migration/trace-events |   4 ++
 qemu-options.hx        |   7 +++-
 6 files changed, 133 insertions(+), 1 deletion(-)
 create mode 100644 migration/file.c
 create mode 100644 migration/file.h
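As a sketch of the URI syntax in the cover letter — "file:filename[,offset=offset]" — the following hypothetical parser shows one plausible interpretation. QEMU's real parser lives in migration/file.c and may handle corner cases (such as a comma in the filename) differently.

```python
def parse_file_uri(uri):
    """Split 'file:filename[,offset=offset]' into (filename, offset).
    Accepts decimal or 0x-prefixed hex offsets; offset defaults to 0."""
    if not uri.startswith("file:"):
        raise ValueError("not a file: migration URI")
    rest = uri[len("file:"):]
    filename, sep, opt = rest.partition(",offset=")
    offset = int(opt, 0) if sep else 0   # base 0 -> '0x...' parsed as hex
    return filename, offset
```

For example, "file:/tmp/vm.state,offset=0x1000" would yield the filename and a 4096-byte offset, letting the caller embed the migration stream after a header region in the file.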
Re: [PATCH V4] migration: simplify blockers
On 7/7/23 15:20, Steve Sistare wrote:
Modify migrate_add_blocker and migrate_del_blocker to take an Error ** reason. This allows migration to own the Error object, so that if an error occurs, migration code can free the Error and clear the client handle, simplifying client code. This is also a pre-requisite for future patches that will add a mode argument to migration requests to support live update, and will maintain a list of blockers for each mode. A blocker may apply to a single mode or to multiple modes, and passing Error ** will allow one Error object to be registered for multiple modes.

No functional change.

Tested-by: Michael Galaxy
Reviewed-by: Michael Galaxy
Signed-off-by: Steve Sistare
---
 backends/tpm/tpm_emulator.c  | 10 ++
 block/parallels.c            |  6 ++
 block/qcow.c                 |  6 ++
 block/vdi.c                  |  6 ++
 block/vhdx.c                 |  6 ++
 block/vmdk.c                 |  6 ++
 block/vpc.c                  |  6 ++
 block/vvfat.c                |  6 ++
 dump/dump.c                  |  4 ++--
 hw/9pfs/9p.c                 | 10 ++
 hw/display/virtio-gpu-base.c |  8 ++--
 hw/intc/arm_gic_kvm.c        |  3 +--
 hw/intc/arm_gicv3_its_kvm.c  |  3 +--
 hw/intc/arm_gicv3_kvm.c      |  3 +--
 hw/misc/ivshmem.c            |  8 ++--
 hw/ppc/pef.c                 |  2 +-
 hw/ppc/spapr.c               |  2 +-
 hw/ppc/spapr_events.c        |  2 +-
 hw/ppc/spapr_rtas.c          |  2 +-
 hw/remote/proxy.c            |  7 ++-
 hw/s390x/s390-virtio-ccw.c   |  9 +++--
 hw/scsi/vhost-scsi.c         |  8 +++-
 hw/vfio/common.c             | 26 --
 hw/vfio/migration.c          | 16 ++--
 hw/virtio/vhost.c            |  8 ++--
 include/migration/blocker.h  | 24 +---
 migration/migration.c        | 22 ++
 stubs/migr-blocker.c         |  4 ++--
 target/i386/kvm/kvm.c        |  8
 target/i386/nvmm/nvmm-all.c  |  3 +--
 target/i386/sev.c            |  2 +-
 target/i386/whpx/whpx-all.c  |  3 +--
 ui/vdagent.c                 |  5 ++---
 33 files changed, 89 insertions(+), 155 deletions(-)

diff --git a/backends/tpm/tpm_emulator.c b/backends/tpm/tpm_emulator.c
index 402a2d6..bf1a90f 100644
--- a/backends/tpm/tpm_emulator.c
+++ b/backends/tpm/tpm_emulator.c
@@ -534,11 +534,8 @@ static int tpm_emulator_block_migration(TPMEmulator *tpm_emu)
         error_setg(&tpm_emu->migration_blocker,
                    "Migration disabled: TPM emulator does not support "
                    "migration");
-        if (migrate_add_blocker(tpm_emu->migration_blocker, &err) < 0) {
+        if (migrate_add_blocker(&tpm_emu->migration_blocker, &err) < 0) {
             error_report_err(err);
-            error_free(tpm_emu->migration_blocker);
-            tpm_emu->migration_blocker = NULL;
-
             return -1;
         }
     }
@@ -1016,10 +1013,7 @@ static void tpm_emulator_inst_finalize(Object *obj)
     qapi_free_TPMEmulatorOptions(tpm_emu->options);
-    if (tpm_emu->migration_blocker) {
-        migrate_del_blocker(tpm_emu->migration_blocker);
-        error_free(tpm_emu->migration_blocker);
-    }
+    migrate_del_blocker(&tpm_emu->migration_blocker);
     tpm_sized_buffer_reset(&state_blobs->volatil);
     tpm_sized_buffer_reset(&state_blobs->permanent);
diff --git a/block/parallels.c b/block/parallels.c
index 18e34ae..c2d92c4 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -960,9 +960,8 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
     error_setg(&s->migration_blocker, "The Parallels format used by node '%s' "
                "does not support live migration",
                bdrv_get_device_or_node_name(bs));
-    ret = migrate_add_blocker(s->migration_blocker, errp);
+    ret = migrate_add_blocker(&s->migration_blocker, errp);
     if (ret < 0) {
-        error_free(s->migration_blocker);
         goto fail;
     }
     qemu_co_mutex_init(&s->lock);
@@ -994,8 +993,7 @@ static void parallels_close(BlockDriverState *bs)
     g_free(s->bat_dirty_bmap);
     qemu_vfree(s->header);
-    migrate_del_blocker(s->migration_blocker);
-    error_free(s->migration_blocker);
+    migrate_del_blocker(&s->migration_blocker);
 }
 static BlockDriver bdrv_parallels = {
diff --git a/block/qcow.c b/block/qcow.c
index 577bd70..feedad5 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -304,9 +304,8 @@ static int qcow_open(BlockDriverState *bs, QDict *options, int flags,
     error_setg(&s->migration_blocker, "The qcow format used by node '%s' "
                "does not support live migration",
                bdrv_get_device_or_node_name(bs));
-    ret = migrate_add_blocker(s->migration_blocker, errp);
+    ret = migrate_add_blocker(&s->migration_blocker, errp);
Re: [PATCH V9 00/46] Live Update
Good morning, On 7/10/23 10:10, Steven Sistare wrote: On 6/12/2023 10:59 AM, Michael Galaxy wrote: Hi Steve, On 6/7/23 12:37, Steven Sistare wrote: On 6/7/2023 11:55 AM, Michael Galaxy wrote: Another option could be to expose "-migrate-mode-disable" (instead of enable) and just enable all 3 modes by default, since we are already required to switch from "normal" mode to a CPR-specific mode when it is time to do a live update, if the intention is to preserve the capability to completely prevent a running QEMU from using these modes before the VM starts up. - Michael On 6/6/23 17:15, Michael Galaxy wrote: Hi Steve, In the current design you have, we have to specify both the command line parameter "-migrate-mode-enable cpr-reboot" *and* issue the monitor command "migrate_set_parameter mode cpr-${mode}". Is it possible to opt-in to the CPR mode just once over the monitor instead of having to specify it twice on the command line? This would also match the live migration model: You do not need to necessarily "opt in" to live migration mode through a command line parameter, you simply request it when you need to. Can CPR behave the same way? This would also make switching over to a CPR-capable version of QEMU much simpler and would even make it work for existing libvirt-managed guests as their command line parameters would no longer need to change. This would allow us to simply power-off and power-on existing VMs to make them CPR-capable and then work on a libvirt patch later when we're ready to do so. Comments? Hi Michael, Requiring -migrate-enable-mode allows qemu to initialize objects differently, if necessary, so that migration for a mode is not blocked. See callers of migrate_mode_enabled. There is only one so far, in ram_block_add. If the mode is cpr-exec, then it creates anonymous ram blocks using memfd_create, else using MAP_ANON. In the V7 series, this was controlled by a '-machine memfd-alloc=on' option. migrate-enable-mode is more future proof for the user. 
If something new must initialize differently to support cpr, then it adds a call to migrate_mode_enabled, and the command line remains the same. However, I could be persuaded to go either way. OK, so it is cpr-exec that needs this option (because of ram block allocation), not really cpr-reboot. Could the option then be made to only be required for cpr-exec and not cpr-reboot, then, since cpr-reboot doesn't require that consideration? In a different forum Juan said this is a memory issue, so it should be expressed as a memory related option. So, I will delete -migrate-enable-mode and revert back to -machine memfd-alloc, as defined in the V7 patch series. Acknowledged. I'm going to try to get my reviewed-by's in soon. Sorry I haven't done it sooner. We've finished testing these patches on our systems and are moving forward. A secondary reason for -migrate-enable-mode is to support the only-cpr-capable option. It needs to know which mode will be used, in order to check a mode-specific blocker list. Still, only-cpr-capable is also optional. If and only if one needs this option, the mode could be specified as part of the option itself, rather than requiring an extra command line parameter, no? Yes, I will make that change. - Steve Acknowledged.
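The initialization decision Steve describes — ram_block_add() choosing its allocation strategy based on which modes were enabled at startup — can be sketched as follows. This is a hypothetical Python model, not QEMU code; migrate_mode_enabled and the mode names come from the discussion above.

```python
# Filled from the command line at startup, e.g. '-migrate-mode-enable cpr-exec'.
enabled_modes = set()

def migrate_mode_enabled(mode):
    return mode in enabled_modes

def choose_ram_allocation():
    """cpr-exec must be able to re-mmap guest RAM after exec(), so it
    needs a memfd that stays open across the exec; plain anonymous
    mappings (MAP_ANON) suffice otherwise -- including for cpr-reboot,
    which persists RAM via a shared memory-backend instead."""
    return "memfd_create" if migrate_mode_enabled("cpr-exec") else "MAP_ANON"
```

The sketch also illustrates Michael's point: only cpr-exec changes the answer, so a QEMU that never enables cpr-exec behaves identically with or without the option.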
Re: [PATCH V9 00/46] Live Update
Hi Steve, On 6/7/23 12:37, Steven Sistare wrote: On 6/7/2023 11:55 AM, Michael Galaxy wrote: Another option could be to expose "-migrate-mode-disable" (instead of enable) and just enable all 3 modes by default, since we are already required to switch from "normal" mode to a CPR-specific mode when it is time to do a live update, if the intention is to preserve the capability to completely prevent a running QEMU from using these modes before the VM starts up. - Michael On 6/6/23 17:15, Michael Galaxy wrote: Hi Steve, In the current design you have, we have to specify both the command line parameter "-migrate-mode-enable cpr-reboot" *and* issue the monitor command "migrate_set_parameter mode cpr-${mode}". Is it possible to opt-in to the CPR mode just once over the monitor instead of having to specify it twice on the command line? This would also match the live migration model: You do not need to necessarily "opt in" to live migration mode through a command line parameter, you simply request it when you need to. Can CPR behave the same way? This would also make switching over to a CPR-capable version of QEMU much simpler and would even make it work for existing libvirt-managed guests as their command line parameters would no longer need to change. This would allow us to simply power-off and power-on existing VMs to make them CPR-capable and then work on a libvirt patch later when we're ready to do so. Comments? Hi Michael, Requiring -migrate-enable-mode allows qemu to initialize objects differently, if necessary, so that migration for a mode is not blocked. See callers of migrate_mode_enabled. There is only one so far, in ram_block_add. If the mode is cpr-exec, then it creates anonymous ram blocks using memfd_create, else using MAP_ANON. In the V7 series, this was controlled by a '-machine memfd-alloc=on' option. migrate-enable-mode is more future proof for the user. 
If something new must initialize differently to support cpr, then it adds a call to migrate_mode_enabled, and the command line remains the same. However, I could be persuaded to go either way. OK, so it is cpr-exec that needs this option (because of ram block allocation), not really cpr-reboot. Could the option then be made to only be required for cpr-exec and not cpr-reboot, then, since cpr-reboot doesn't require that consideration? A secondary reason for -migrate-enable-mode is to support the only-cpr-capable option. It needs to know which mode will be used, in order to check a mode-specific blocker list. Still, only-cpr-capable is also optional. If and only if one needs this option, the mode could be specified as part of the option itself, rather than requiring an extra command line parameter, no? Further, in many clouds (including ours), our VM-management is generational with the development of the software versioning, so we *always* run tests and know whether or not a VM is CPR-capable. If it is not CPR-capable, we would never run the VM in the first place, which means we would never really use that option at all. I do see the appeal of the option, but could we simplify it, since it is opt-in? - Michael - Steve
Re: [PATCH V9 00/46] Live Update
Another option could be to expose "-migrate-mode-disable" (instead of enable) and just enable all 3 modes by default, since we are already required to switch from "normal" mode to a CPR-specific mode when it is time to do a live update, if the intention is to preserve the capability to completely prevent a running QEMU from using these modes before the VM starts up. - Michael On 6/6/23 17:15, Michael Galaxy wrote: Hi Steve, In the current design you have, we have to specify both the command line parameter "-migrate-mode-enable cpr-reboot" *and* issue the monitor command "migrate_set_parameter mode cpr-${mode}". Is it possible to opt-in to the CPR mode just once over the monitor instead of having to specify it twice on the command line? This would also match the live migration model: You do not need to necessarily "opt in" to live migration mode through a command line parameter, you simply request it when you need to. Can CPR behave the same way? This would also make switching over to a CPR-capable version of QEMU much simpler and would even make it work for existing libvirt-managed guests as their command line parameters would no longer need to change. This would allow us to simply power-off and power-on existing VMs to make them CPR-capable and then work on a libvirt patch later when we're ready to do so. Comments? - Michael On 12/7/22 09:48, Steven Sistare wrote: This series desperately needs review in its intersection with live migration. The code in other areas has been reviewed and revised multiple times -- thank you! David, Juan, can you spare some time to review this? I have done my best to order the patches logically (see the labelled groups in this email), and to provide complete and clear cover letter and commit messages. Can I do anything to facilitate, like doing a code walk through via zoom? And of course, I welcome anyone's feedback. Here is the original posting. 
https://lore.kernel.org/qemu-devel/1658851843-236870-1-git-send-email-steven.sist...@oracle.com/

- Steve

On 7/26/2022 12:09 PM, Steve Sistare wrote:
This version of the live update patch series integrates live update into the live migration framework. The new interfaces are:

* mode (migration parameter)
* cpr-exec-args (migration parameter)
* file (migration URI)
* migrate-mode-enable (command-line argument)
* only-cpr-capable (command-line argument)

Provide the cpr-exec and cpr-reboot migration modes for live update. These save and restore VM state, with minimal guest pause time, so that qemu may be updated to a new version in between. The caller sets the mode parameter before invoking the migrate or migrate-incoming commands.

In cpr-reboot mode, the migrate command saves state to a file, allowing one to quit qemu, reboot to an updated kernel, start an updated version of qemu, and resume via the migrate-incoming command. The caller must specify a migration URI that writes to and reads from a file. Unlike normal mode, the use of certain local storage options does not block the migration, but the caller must not modify guest block devices between the quit and restart. The guest RAM memory-backend must be shared, and the @x-ignore-shared migration capability must be set, to avoid saving it to the file. Guest RAM must be non-volatile across reboot, which can be achieved by backing it with a dax device, or /dev/shm PKRAM as proposed in https://lore.kernel.org/lkml/1617140178-8773-1-git-send-email-anthony.yznaga@oracle.com but this is not enforced. The restarted qemu arguments must match those used to initially start qemu, plus the -incoming option.
The reboot mode supports vfio devices if the caller first suspends the guest, such as by issuing guest-suspend-ram to the qemu guest agent. The guest drivers' suspend methods flush outstanding requests and re-initialize the devices, and thus there is no device state to save and restore. After issuing migrate-incoming, the caller must issue a system_wakeup command to resume. In cpr-exec mode, the migrate command saves state to a file and directly exec's a new version of qemu on the same host, replacing the original process while retaining its PID. The caller must specify a migration URI that writes to and reads from a file, and resumes execution via the migrate-incoming command. Arguments for the new qemu process are taken from the cpr-exec-args migration parameter, and must include the -incoming option. Guest RAM must be backed by a memory backend with share=on, but cannot be memory-backend-ram. The memory is re-mmap'd in the updated process, so guest ram is efficiently preserved in place, albeit with new virtual
Re: [PATCH V9 00/46] Live Update
Hi Steve,

In the current design you have, we have to specify both the command line parameter "-migrate-mode-enable cpr-reboot" *and* issue the monitor command "migrate_set_parameter mode cpr-${mode}". Is it possible to opt in to the CPR mode just once over the monitor instead of having to specify it twice on the command line? This would also match the live migration model: you do not necessarily need to "opt in" to live migration mode through a command line parameter, you simply request it when you need to. Can CPR behave the same way?

This would also make switching over to a CPR-capable version of QEMU much simpler and would even make it work for existing libvirt-managed guests, as their command line parameters would no longer need to change. This would allow us to simply power off and power on existing VMs to make them CPR-capable and then work on a libvirt patch later when we're ready to do so. Comments?

- Michael

On 12/7/22 09:48, Steven Sistare wrote:
This series desperately needs review in its intersection with live migration. The code in other areas has been reviewed and revised multiple times -- thank you! David, Juan, can you spare some time to review this? I have done my best to order the patches logically (see the labelled groups in this email), and to provide a complete and clear cover letter and commit messages. Can I do anything to facilitate, like doing a code walk-through via Zoom? And of course, I welcome anyone's feedback. Here is the original posting.

https://lore.kernel.org/qemu-devel/1658851843-236870-1-git-send-email-steven.sist...@oracle.com/

- Steve

On 7/26/2022 12:09 PM, Steve Sistare wrote:
This version of the live update patch series integrates live update into the live migration framework.
The new interfaces are:

* mode (migration parameter)
* cpr-exec-args (migration parameter)
* file (migration URI)
* migrate-mode-enable (command-line argument)
* only-cpr-capable (command-line argument)

Provide the cpr-exec and cpr-reboot migration modes for live update. These save and restore VM state, with minimal guest pause time, so that qemu may be updated to a new version in between. The caller sets the mode parameter before invoking the migrate or migrate-incoming commands.

In cpr-reboot mode, the migrate command saves state to a file, allowing one to quit qemu, reboot to an updated kernel, start an updated version of qemu, and resume via the migrate-incoming command. The caller must specify a migration URI that writes to and reads from a file. Unlike normal mode, the use of certain local storage options does not block the migration, but the caller must not modify guest block devices between the quit and restart. The guest RAM memory-backend must be shared, and the @x-ignore-shared migration capability must be set, to avoid saving it to the file. Guest RAM must be non-volatile across reboot, which can be achieved by backing it with a dax device, or /dev/shm PKRAM as proposed in https://lore.kernel.org/lkml/1617140178-8773-1-git-send-email-anthony.yznaga@oracle.com but this is not enforced. The restarted qemu arguments must match those used to initially start qemu, plus the -incoming option.

The reboot mode supports vfio devices if the caller first suspends the guest, such as by issuing guest-suspend-ram to the qemu guest agent. The guest drivers' suspend methods flush outstanding requests and re-initialize the devices, and thus there is no device state to save and restore. After issuing migrate-incoming, the caller must issue a system_wakeup command to resume.
In cpr-exec mode, the migrate command saves state to a file and directly exec's a new version of qemu on the same host, replacing the original process while retaining its PID. The caller must specify a migration URI that writes to and reads from a file, and resumes execution via the migrate-incoming command. Arguments for the new qemu process are taken from the cpr-exec-args migration parameter, and must include the -incoming option. Guest RAM must be backed by a memory backend with share=on, but cannot be memory-backend-ram. The memory is re-mmap'd in the updated process, so guest ram is efficiently preserved in place, albeit with new virtual addresses. In addition, the '-migrate-mode-enable cpr-exec' option is required. This causes secondary guest ram blocks (those not specified on the command line) to be allocated by mmap'ing a memfd. The memfds are kept open across exec, their values are saved in special cpr state which is retrieved after exec, and they are re-mmap'd. Since guest RAM is not copied, and storage blocks are not migrated, the caller must disable all capabilities related to page and block
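The cpr-reboot flow described in the cover letter (set the mode parameter, then migrate to a file URI) maps onto a short QMP command sequence. The sketch below builds those commands as JSON; it is a hypothetical illustration — the thread shows the HMP form "migrate_set_parameter mode cpr-reboot", and the exact QMP argument names in a given QEMU release may differ.

```python
import json

def cpr_reboot_commands(state_file):
    """Build the QMP commands for the cpr-reboot save step: select the
    cpr-reboot migration mode, then migrate to a file: URI.  On the
    restarted qemu, the caller resumes with migrate-incoming and, for
    the suspended-vfio case, a final system_wakeup."""
    return [
        json.dumps({"execute": "migrate-set-parameters",
                    "arguments": {"mode": "cpr-reboot"}}),
        json.dumps({"execute": "migrate",
                    "arguments": {"uri": "file:" + state_file}}),
    ]
```

A management tool would send these over the QMP socket, quit qemu, reboot into the updated kernel, and restart qemu with matching arguments plus -incoming, exactly as the cover letter describes.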
Re: [PATCH V9 00/46] Live Update
Greetings,

For what it's worth, our team has been aggressively testing this patch series and to date has not found any deficiencies or bottlenecks whatsoever. Overall, with a very well-tuned system (and linux kernel), we are getting live update downtimes just above 15 seconds, and the vast majority of that downtime is wasted during kexec (particularly in PCI device probing, if anybody has any tips on that) and has nothing to do with the QEMU side of things. (We use this patchset in a PMEM-based configuration, which has been working out extremely well so far.)

We also tested "back-to-back" live updates as you would expect to do in production. Overall we've done thousands of live updates over the past few months on many different types of hardware without failure. Steven has been really responsive in answering some of our usability questions, and we were able to fix those issues. We will continue our testing throughout the year with more heavily-loaded workloads, but all in all we would very much be interested in seeing further reviews on this patch series from others.

Tested-by: Michael Galaxy

On 12/7/22 09:48, Steven Sistare wrote:
This series desperately needs review in its intersection with live migration. The code in other areas has been reviewed and revised multiple times -- thank you! David, Juan, can you spare some time to review this? I have done my best to order the patches logically (see the labelled groups in this email), and to provide complete and clear cover letter and commit messages. Can I do anything to facilitate, like doing a code walk through via zoom? And of course, I welcome anyone's feedback. Here is the original posting.
https://lore.kernel.org/qemu-devel/1658851843-236870-1-git-send-email-steven.sist...@oracle.com/

- Steve

On 7/26/2022 12:09 PM, Steve Sistare wrote:
This version of the live update patch series integrates live update into the live migration framework. [...]
Re: [PATCH V9 00/46] Live Update
Hey Steven,

Have you done any "back-to-back" live update testing before? I am still doing extensive testing on this myself, and I am running into a bug that I have not yet diagnosed. It involves the following:

1. Perform a live update (I'm using kexec + PMEM-based live updates). => VM comes back just fine. All is well.
2. Using the *same* QEMU instance, try to perform another live update again. This results in the following error:

monitor> migrate_incoming file:qemu.sav
qemu-system-x86_64: check_section_footer: Read section footer failed: -5
qemu-system-x86_64: load of migration failed: Invalid argument

I'm going to start diving into the code soon, but just thought I would ask first.

- Michael

On 12/7/22 09:48, Steven Sistare wrote: [...]
[...] Since guest RAM is not copied, and storage blocks are not migrated, the caller must disable all capabilities related to page and block copy. The implementation ignores all related parameters. The exec mode supports vfio devices by preserving the vfio container, group, device, and event descriptors across the qemu re-exec, and by updating DMA mapping virtual addresses