Re: CPR/liveupdate: test results using prior bug fix

2024-05-17 Thread Michael Galaxy

OK, acknowledged. Thanks, All.

- Michael

On 5/16/24 13:07, Steven Sistare wrote:

On 5/16/2024 1:24 PM, Michael Galaxy wrote:

On 5/14/24 08:54, Michael Tokarev wrote:

On 5/14/24 16:39, Michael Galaxy wrote:

Steve,

OK, so it does not look like this bugfix you wrote was included in 
8.2.4 (which was released yesterday). Unfortunately, that means 
that anyone using CPR in that release will still (eventually) 
encounter the bug like I did.


8.2.4 is basically a "bugfix" release for 8.2.3 which I somewhat
screwed up (in a minor way), plus a few currently (at the time)
queued up changes.   8.2.3 was a big release though.

I would recommend that y'all consider cherry-picking, perhaps, the 
relevant commits for a possible 8.2.5 ?


Please Cc changes which are relevant for -stable to, well,
qemu-stable@nongnu.org :)

Which changes need to be picked up?

Steve, can you comment here, please? At a minimum, we have this one: 
[PULL 20/25] migration: stop vm for cpr


But that pull came with a handful of other changes that are also not 
in QEMU v8, so I suspect I'm missing other commits that would be 
important for a stable release?


- Michael


Hi Michael, I sent the full list of commits to this distribution list 
yesterday, and I see it in my Sent email folder.  Copying verbatim:

----
Michael Galaxy, I'm afraid you are out of luck with respect to qemu 8.2.
It has some of the cpr reboot commits, but is missing the following:

87a2848 migration: massage cpr-reboot documentation
cbdafc1 migration: options incompatible with cpr
ce5db1c migration: update cpr-reboot description
9867d4d migration: stop vm for cpr
4af667f migration: notifier error checking
bf78a04 migration: refactor migrate_fd_connect failures
6835f5a migration: per-mode notifiers
5663dd3 migration: MigrationNotifyFunc
c763a23e migration: remove postcopy_after_devices
9d9babf migration: MigrationEvent for notifiers
3e77573 migration: convert to NotifierWithReturn
d91f33c migration: remove error from notifier data
be19d83 notify: pass error to notifier with return
b12635f migration: fix coverity migrate_mode finding
2b58a8b tests/qtest: postcopy migration with suspend
b1fdd21 tests/qtest: precopy migration with suspend
5014478 tests/qtest: option to suspend during migration
f064975 tests/qtest: migration events
49a5020 migration: preserve suspended for bg_migration
58b1057 migration: preserve suspended for snapshot
b4e9ddc migration: preserve suspended runstate
d3c86c99 migration: propagate suspended runstate
9ff5e79 cpus: vm_resume
0f1db06 cpus: check running not RUN_STATE_RUNNING
b9ae473 cpus: stop vm in suspended runstate
f06f316 cpus: vm_was_suspended

All of those landed in qemu 9.0.
---

- Steve
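
For a hypothetical 8.2.5, the mechanical part of such a backport would look
roughly like the following sketch (branch and output names are illustrative,
and the order assumes the commit list above is newest-first, so the oldest
commits are applied first):

  git checkout -b cpr-backport-8.2.5 v8.2.4
  git cherry-pick -x f06f316 b9ae473 0f1db06 9ff5e79 d3c86c99 b4e9ddc 58b1057 \
      49a5020 f064975 5014478 b1fdd21 2b58a8b b12635f be19d83 d91f33c 3e77573 \
      9d9babf c763a23e 5663dd3 6835f5a bf78a04 4af667f 9867d4d ce5db1c cbdafc1 \
      87a2848
  # Post the result, Cc'ing the stable list as requested above:
  git format-patch -o outgoing/ v8.2.4..
  git send-email --to=qemu-devel@nongnu.org --cc=qemu-stable@nongnu.org outgoing/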




Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-05-16 Thread Michael Galaxy

These are very compelling results, no?

(40gbps cards, right? Are the cards active/active? or active/standby?)

- Michael

On 5/14/24 10:19, Yu Zhang wrote:

Hello Peter and all,

I did a comparison of the VM live-migration speeds between RDMA and
TCP/IP on our servers and plotted the results to get an initial
impression. Unfortunately, the Ethernet NICs are not recent ones, so
the numbers may not mean much. I can redo it on servers with more
recent Ethernet NICs and keep you updated.

It seems that the benefit of RDMA becomes obvious when the VM has a
large amount of memory and is running a memory-intensive workload.

Best regards,
Yu Zhang @ IONOS Cloud

On Thu, May 9, 2024 at 4:14 PM Peter Xu  wrote:

On Thu, May 09, 2024 at 04:58:34PM +0800, Zheng Chuan via wrote:

That's good news about the socket abstraction for RDMA!
When I developed the series above, the biggest pain was that the RDMA migration
had no QIOChannel abstraction and I needed to use a 'fake channel'
for it, which was awkward in the code implementation.
So, as far as I can tell, we can do this by:
i. first evaluating whether rsocket is good enough to
satisfy our fundamental QIOChannel abstraction;
ii. if that works, continuing to see whether it gives us the
opportunity to hide the details of the rdma protocol
 inside rsocket by removing most of the code in rdma.c and also some hacks in
the migration main process;
iii. implementing the advanced features like multi-fd and multi-uri for rdma
migration.

Since I am not familiar with rsocket, I need some time to look at it and do
some quick verification of rdma migration based on rsocket.
But, yes, I am willing to be involved in this refactoring work and to see if we can
make this migration feature better :)

Based on what we have now, it looks like we'd better pause the deprecation
process a bit, so I don't think we need to rush it, at least not for 9.1,
and we'll see how the refactoring goes.

It'll be perfect if rsocket works; otherwise, supporting multifd with little
overhead / few exported APIs would also be a good thing in general, whatever
the approach.  And obviously all of this rests on the fact that we can get
resources from companies to support this feature in the first place.

Note that so far nobody has compared RDMA vs. NIC performance, so if any of
us can provide some test results, please do.  Many people are saying RDMA is
better, but I haven't yet seen any numbers comparing it with modern TCP
networks.  I don't want old impressions floating around when things might
have changed.  When we have consolidated results, we should share them and
also reflect that in QEMU's migration docs once an rdma documentation page
is ready.
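
For what it's worth, one simple way to collect comparable numbers is to
migrate the same loaded guest twice, once per transport, and read off the
statistics QEMU already reports; the addresses and ports below are
illustrative:

  # destination, run 1 (plain TCP) and run 2 (RDMA):
  qemu-system-x86_64 ... -incoming tcp:0.0.0.0:4444
  qemu-system-x86_64 ... -incoming rdma:0.0.0.0:4444

  # source side, from the HMP monitor, with the same workload running;
  # optionally raise max-bandwidth first so the limiter is not what is measured
  (qemu) migrate -d tcp:192.0.2.10:4444     # or: migrate -d rdma:192.0.2.10:4444
  (qemu) info migrate                       # compare total time, downtime, throughput

The total time, downtime and throughput fields from "info migrate", plus host
CPU usage during the transfer, should be enough for a first comparison.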

Chuan, please check the whole thread discussion, it may help to understand
what we are looking for on rdma migrations [1].  Meanwhile please feel free
to sync with Jinpu's team and see how to move forward with such a project.

[1] 
https://lore.kernel.org/qemu-devel/87frwatp7n@suse.de/

Thanks,

--
Peter Xu





Re: CPR/liveupdate: test results using prior bug fix

2024-05-16 Thread Michael Galaxy



On 5/14/24 08:54, Michael Tokarev wrote:

On 5/14/24 16:39, Michael Galaxy wrote:

Steve,

OK, so it does not look like this bugfix you wrote was included in 
8.2.4 (which was released yesterday). Unfortunately, that means that 
anyone using CPR in that release will still (eventually) encounter 
the bug like I did.


8.2.4 is basically a "bugfix" release for 8.2.3 which I somewhat
screwed up (in a minor way), plus a few currently (at the time)
queued up changes.   8.2.3 was a big release though.

I would recommend that y'all consider cherry-picking, perhaps, the 
relevant commits for a possible 8.2.5 ?


Please Cc changes which are relevant for -stable to, well,
qemu-stable@nongnu.org :)

Which changes need to be picked up?

Steve, can you comment here, please? At a minimum, we have this one: 
[PULL 20/25] migration: stop vm for cpr


But that pull came with a handful of other changes that are also not in 
QEMU v8, so I suspect I'm missing other commits that would be important 
for a stable release?


- Michael



Thanks,

/mjt




Re: CPR/liveupdate: test results using prior bug fix

2024-05-14 Thread Michael Galaxy

Steve,

OK, so it does not look like this bugfix you wrote was included in 8.2.4 
(which was released yesterday). Unfortunately, that means that anyone 
using CPR in that release will still (eventually) encounter the bug like 
I did.


I would recommend that y'all consider cherry-picking, perhaps, the 
relevant commits for a possible 8.2.5 ?


- Michael

On 5/13/24 20:15, Michael Galaxy wrote:

Hi Steve,

Thanks for the response.

It looks like literally *just today* 8.2.4 was released. I'll go check 
it out.


- Michael

On 5/13/24 15:10, Steven Sistare wrote:

Hi Michael,
  No surprise here, I did see some of the same failure messages and they
prompted me to submit the fix.  They are all symptoms of "the possibility
of ram and device state being out of sync" as mentioned in the commit.

I am not familiar with the process for maintaining old releases for qemu.
Perhaps someone on this list can comment on 8.2.3.

- Steve

On 5/13/2024 2:22 PM, Michael Galaxy wrote:

Hi Steve,

We found that this specific change in particular ("migration: stop 
vm for cpr") fixes a bug that we've identified in testing 
back-to-back live updates in a lab environment.


More specifically, without this change (which is not available in 
8.2.2, but *is* available in 9.0.0), the metadata save file becomes 
corrupted when doing live updates one after another. Typically we 
see a corrupted save file somewhere between 20 and 30 live updates, 
and while doing a git bisect we found that this change makes the 
problem go away.


Were you aware? Is there any plan in place to cherry pick this for 
8.2.3, perhaps or a plan to release 8.2.3 at some point?


Here are some examples of how the bug manifests in different 
locations of the QEMU metadata save file:


2024-04-26T13:28:53Z qemu-system-x86_64: Failed to load mtrr_var:base
2024-04-26T13:28:53Z qemu-system-x86_64: Failed to load 
cpu:env.mtrr_var
2024-04-26T13:28:53Z qemu-system-x86_64: error while loading state 
for instance 0x1b of device 'cpu'
2024-04-26T13:28:53Z qemu-system-x86_64: load of migration failed: 
Input/output error


And another:

2024-04-17T16:09:47Z qemu-system-x86_64: check_section_footer: Read 
section footer failed: -5
2024-04-17T16:09:47Z qemu-system-x86_64: load of migration failed: 
Invalid argument


And another:

2024-04-30T21:53:29Z qemu-system-x86_64: Unable to read ID string 
for section 163
2024-04-30T21:53:29Z qemu-system-x86_64: load of migration failed: 
Invalid argument


And another:

2024-05-01T16:01:44Z qemu-system-x86_64: Unable to read ID string 
for section 164
2024-05-01T16:01:44Z qemu-system-x86_64: load of migration failed: 
Invalid argument


As you can see, they occur quite randomly, but generally it takes at 
least 20-30+ live updates before the problem occurs.


- Michael

On 2/27/24 23:13, pet...@redhat.com wrote:

From: Steve Sistare

When migration for cpr is initiated, stop the vm and set state
RUN_STATE_FINISH_MIGRATE before ram is saved.  This eliminates the
possibility of ram and device state being out of sync, and guarantees
that a guest in the suspended state remains suspended, because 
qmp_cont

rejects a cont command in the RUN_STATE_FINISH_MIGRATE state.

Signed-off-by: Steve Sistare
Reviewed-by: Peter Xu
Link: https://lore.kernel.org/r/1708622920-68779-11-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu

---
  include/migration/misc.h |  1 +
  migration/migration.h    |  2 --
  migration/migration.c    | 51 


  3 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index e4933b815b..5d1aa593ed 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -60,6 +60,7 @@ void migration_object_init(void);
  void migration_shutdown(void);
  bool migration_is_idle(void);
  bool migration_is_active(MigrationState *);
+bool migrate_mode_is_cpr(MigrationState *);
    typedef enum MigrationEventType {
  MIG_EVENT_PRECOPY_SETUP,
diff --git a/migration/migration.h b/migration/migration.h
index aef8afbe1f..65c0b61cbd 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -541,6 +541,4 @@ int migration_rp_wait(MigrationState *s);
   */
  void migration_rp_kick(MigrationState *s);
  -int migration_stop_vm(RunState state);
-
  #endif
diff --git a/migration/migration.c b/migration/migration.c
index 37c836b0b0..90a90947fb 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -167,11 +167,19 @@ static gint 
page_request_addr_cmp(gconstpointer ap, gconstpointer bp)

  return (a > b) - (a < b);
  }
  -int migration_stop_vm(RunState state)
+static int migration_stop_vm(MigrationState *s, RunState state)
  {
-    int ret = vm_stop_force_state(state);
+    int ret;
+
+    migration_downtime_start(s);
+
+    s-&g

Re: CPR/liveupdate: test results using prior bug fix

2024-05-13 Thread Michael Galaxy

Hi Steve,

Thanks for the response.

It looks like literally *just today* 8.2.4 was released. I'll go check 
it out.


- Michael

On 5/13/24 15:10, Steven Sistare wrote:

Hi Michael,
  No surprise here, I did see some of the same failure messages and they
prompted me to submit the fix.  They are all symptoms of "the possibility
of ram and device state being out of sync" as mentioned in the commit.

I am not familiar with the process for maintaining old releases for qemu.
Perhaps someone on this list can comment on 8.2.3.

- Steve

On 5/13/2024 2:22 PM, Michael Galaxy wrote:

Hi Steve,

We found that this specific change in particular ("migration: stop vm 
for cpr") fixes a bug that we've identified in testing back-to-back 
live updates in a lab environment.


More specifically, without this change (which is not available in 
8.2.2, but *is* available in 9.0.0), the metadata save file becomes 
corrupted when doing live updates one after another. Typically we see 
a corrupted save file somewhere between 20 and 30 live updates, and 
while doing a git bisect we found that this change makes the problem 
go away.


Were you aware? Is there any plan in place to cherry pick this for 
8.2.3, perhaps or a plan to release 8.2.3 at some point?


Here are some examples of how the bug manifests in different 
locations of the QEMU metadata save file:


2024-04-26T13:28:53Z qemu-system-x86_64: Failed to load mtrr_var:base
2024-04-26T13:28:53Z qemu-system-x86_64: Failed to load cpu:env.mtrr_var
2024-04-26T13:28:53Z qemu-system-x86_64: error while loading state 
for instance 0x1b of device 'cpu'
2024-04-26T13:28:53Z qemu-system-x86_64: load of migration failed: 
Input/output error


And another:

2024-04-17T16:09:47Z qemu-system-x86_64: check_section_footer: Read 
section footer failed: -5
2024-04-17T16:09:47Z qemu-system-x86_64: load of migration failed: 
Invalid argument


And another:

2024-04-30T21:53:29Z qemu-system-x86_64: Unable to read ID string for 
section 163
2024-04-30T21:53:29Z qemu-system-x86_64: load of migration failed: 
Invalid argument


And another:

2024-05-01T16:01:44Z qemu-system-x86_64: Unable to read ID string for 
section 164
2024-05-01T16:01:44Z qemu-system-x86_64: load of migration failed: 
Invalid argument


As you can see, they occur quite randomly, but generally it takes at 
least 20-30+ live updates before the problem occurs.


- Michael

On 2/27/24 23:13, pet...@redhat.com wrote:

From: Steve Sistare

When migration for cpr is initiated, stop the vm and set state
RUN_STATE_FINISH_MIGRATE before ram is saved.  This eliminates the
possibility of ram and device state being out of sync, and guarantees
that a guest in the suspended state remains suspended, because qmp_cont
rejects a cont command in the RUN_STATE_FINISH_MIGRATE state.

Signed-off-by: Steve Sistare
Reviewed-by: Peter Xu
Link: https://lore.kernel.org/r/1708622920-68779-11-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu

---
  include/migration/misc.h |  1 +
  migration/migration.h    |  2 --
  migration/migration.c    | 51 


  3 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index e4933b815b..5d1aa593ed 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -60,6 +60,7 @@ void migration_object_init(void);
  void migration_shutdown(void);
  bool migration_is_idle(void);
  bool migration_is_active(MigrationState *);
+bool migrate_mode_is_cpr(MigrationState *);
    typedef enum MigrationEventType {
  MIG_EVENT_PRECOPY_SETUP,
diff --git a/migration/migration.h b/migration/migration.h
index aef8afbe1f..65c0b61cbd 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -541,6 +541,4 @@ int migration_rp_wait(MigrationState *s);
   */
  void migration_rp_kick(MigrationState *s);
  -int migration_stop_vm(RunState state);
-
  #endif
diff --git a/migration/migration.c b/migration/migration.c
index 37c836b0b0..90a90947fb 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -167,11 +167,19 @@ static gint 
page_request_addr_cmp(gconstpointer ap, gconstpointer bp)

  return (a > b) - (a < b);
  }
  -int migration_stop_vm(RunState state)
+static int migration_stop_vm(MigrationState *s, RunState state)
  {
-    int ret = vm_stop_force_state(state);
+    int ret;
+
+    migration_downtime_start(s);
+
+    s->vm_old_state = runstate_get();
+    global_state_store();
+
+    ret = vm_stop_force_state(state);
    trace_vmstate_downtime_checkpoint("src-vm-stopped");
+    trace_migration_completion_vm_stop(ret);
    return ret;
  }
@@ -1602,6 +1610,11 @@ bool migration_is_active(MigrationState *s)
  s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
  }
  +bool migrate_mode_is_cpr(MigrationS

Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-05-13 Thread Michael Galaxy
One thing to keep in mind here (despite me not having any hardware to 
test) is that one of the original goals of the RDMA implementation was 
not simply raw throughput or raw latency, but reduced CPU utilization 
in kernel space thanks to the offload. While it is entirely possible 
that newer hardware with TCP might compete, the significant reduction 
in CPU usage compared to the TCP/IP stack was a big win at the time.

Just something to consider while you're doing the testing.

- Michael

On 5/9/24 03:58, Zheng Chuan wrote:

Hi, Peter,Lei,Jinpu.

On 2024/5/8 0:28, Peter Xu wrote:

On Tue, May 07, 2024 at 01:50:43AM +, Gonglei (Arei) wrote:

Hello,


-Original Message-
From: Peter Xu [mailto:pet...@redhat.com]
Sent: Monday, May 6, 2024 11:18 PM
To: Gonglei (Arei) 
Cc: Daniel P. Berrangé ; Markus Armbruster
; Michael Galaxy ; Yu Zhang
; Zhijian Li (Fujitsu) ; Jinpu Wang
; Elmar Gerdes ;
qemu-devel@nongnu.org; Yuval Shaia ; Kevin Wolf
; Prasanna Kumar Kalever
; Cornelia Huck ;
Michael Roth ; Prasanna Kumar Kalever
; integrat...@gluster.org; Paolo Bonzini
; qemu-bl...@nongnu.org; de...@lists.libvirt.org;
Hanna Reitz ; Michael S. Tsirkin ;
Thomas Huth ; Eric Blake ; Song
Gao ; Marc-André Lureau
; Alex Bennée ;
Wainer dos Santos Moschetta ; Beraldo Leal
; Pannengyuan ;
Xiexiangyou 
Subject: Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

On Mon, May 06, 2024 at 02:06:28AM +, Gonglei (Arei) wrote:

Hi, Peter

Hey, Lei,

Happy to see you around again after years.


Haha, me too.


RDMA features high bandwidth, low latency (in a non-blocking lossless
network), and direct remote memory access that bypasses the CPU (as you
know, CPU resources are expensive for cloud vendors, which is one of the
reasons we introduced offload cards), which TCP does not have.

It's another cost to use offload cards, v.s. preparing more cpu resources?


A converged software and hardware offload architecture is the way to go for
all cloud vendors (considering the combined benefits in performance, cost,
security, and innovation speed); it's not just a matter of adding the
resources of a DPU card.


In some scenarios, fast live migration (extremely short interruption and
migration duration) is very useful. To this end, we have also developed
RDMA support for multifd.

Will any of you upstream that work?  I'm curious how intrusive it would be
to add it to multifd; if it can keep only 5 exported functions like rdma.h
does right now, that would be pretty nice.  We also want to make sure it
works with arbitrarily sized loads and buffers, e.g. vfio is considering
adding IO loads to multifd channels too.


In fact, we sent the patchset to the community in 2021. Pls see:
https://lore.kernel.org/all/20210203185906.GT2950@work-vm/T/

Yes, I sent the patchset adding multifd support for rdma migration, taking 
it over from my colleague, and I am sorry for not keeping up this work at 
the time for various reasons.
I also strongly agree with Lei that the RDMA protocol has some special 
advantages over TCP in some scenarios, and we do indeed use it in our 
product.


I wasn't aware of that in the past, for sure.

Multifd has changed quite a bit in the 9.0 release, so that patchset may not
apply anymore.  One thing to mention is: please look at Dan's comment on the
possible use of rsocket.h:

https://lore.kernel.org/all/zjjm6rcqs5eho...@redhat.com/

And Jinpu did help provide an initial test result over the library:

https://lore.kernel.org/qemu-devel/camgffek8wiknqmouyxcathgtiem2dwocf_w7t0vmcd-i30t...@mail.gmail.com/

It looks like we have a chance to apply that in QEMU.




One thing to note is that the question here is not purely about a
performance comparison between rdma and nics.  It's about helping us make a
decision on whether to drop rdma; iow, even if rdma performs well, the
community still has the right to drop it if nobody can actively work on and
maintain it.  It's just that if nics can perform as well, that is one more
reason to drop it, unless companies can help to provide good support and
work together.


We are happy to provide the necessary review and maintenance work for RDMA
if the community needs it.

CC'ing Chuan Zheng.

I'm not sure whether you and Jinpu's team would like to work together and
provide a final solution for rdma over multifd.  It could be much simpler
than the original 2021 proposal if the rsocket API will work out.

Thanks,


That's good news about the socket abstraction for RDMA!
When I developed the series above, the biggest pain was that the RDMA migration had 
no QIOChannel

CPR/liveupdate: test results using prior bug fix

2024-05-13 Thread Michael Galaxy

Hi Steve,

We found that this specific change in particular ("migration: stop vm 
for cpr") fixes a bug that we've identified in testing back-to-back live 
updates in a lab environment.


More specifically, without this change (which is not available in 
8.2.2, but *is* available in 9.0.0), the metadata save file becomes 
corrupted when doing live updates one after another. Typically we see a 
corrupted save file somewhere between 20 and 30 live updates, and while 
doing a git bisect we found that this change makes the problem go 
away.


Were you aware? Is there any plan in place to cherry pick this for 
8.2.3, perhaps or a plan to release 8.2.3 at some point?


Here are some examples of how the bug manifests in different locations 
of the QEMU metadata save file:


2024-04-26T13:28:53Z qemu-system-x86_64: Failed to load mtrr_var:base
2024-04-26T13:28:53Z qemu-system-x86_64: Failed to load cpu:env.mtrr_var
2024-04-26T13:28:53Z qemu-system-x86_64: error while loading state for instance 
0x1b of device 'cpu'
2024-04-26T13:28:53Z qemu-system-x86_64: load of migration failed: Input/output 
error

And another:

2024-04-17T16:09:47Z qemu-system-x86_64: check_section_footer: Read section 
footer failed: -5
2024-04-17T16:09:47Z qemu-system-x86_64: load of migration failed: Invalid 
argument

And another:

2024-04-30T21:53:29Z qemu-system-x86_64: Unable to read ID string for section 
163
2024-04-30T21:53:29Z qemu-system-x86_64: load of migration failed: Invalid 
argument

And another:

2024-05-01T16:01:44Z qemu-system-x86_64: Unable to read ID string for section 
164
2024-05-01T16:01:44Z qemu-system-x86_64: load of migration failed: Invalid 
argument
 

As you can see, they occur quite randomly, but generally it takes at 
least 20-30+ live updates before the problem occurs.


- Michael

On 2/27/24 23:13, pet...@redhat.com wrote:

From: Steve Sistare

When migration for cpr is initiated, stop the vm and set state
RUN_STATE_FINISH_MIGRATE before ram is saved.  This eliminates the
possibility of ram and device state being out of sync, and guarantees
that a guest in the suspended state remains suspended, because qmp_cont
rejects a cont command in the RUN_STATE_FINISH_MIGRATE state.

Signed-off-by: Steve Sistare
Reviewed-by: Peter Xu
Link: https://lore.kernel.org/r/1708622920-68779-11-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu

---
  include/migration/misc.h |  1 +
  migration/migration.h|  2 --
  migration/migration.c| 51 
  3 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index e4933b815b..5d1aa593ed 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -60,6 +60,7 @@ void migration_object_init(void);
  void migration_shutdown(void);
  bool migration_is_idle(void);
  bool migration_is_active(MigrationState *);
+bool migrate_mode_is_cpr(MigrationState *);
  
  typedef enum MigrationEventType {

  MIG_EVENT_PRECOPY_SETUP,
diff --git a/migration/migration.h b/migration/migration.h
index aef8afbe1f..65c0b61cbd 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -541,6 +541,4 @@ int migration_rp_wait(MigrationState *s);
   */
  void migration_rp_kick(MigrationState *s);
  
-int migration_stop_vm(RunState state);

-
  #endif
diff --git a/migration/migration.c b/migration/migration.c
index 37c836b0b0..90a90947fb 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -167,11 +167,19 @@ static gint page_request_addr_cmp(gconstpointer ap, 
gconstpointer bp)
  return (a > b) - (a < b);
  }
  
-int migration_stop_vm(RunState state)

+static int migration_stop_vm(MigrationState *s, RunState state)
  {
-int ret = vm_stop_force_state(state);
+int ret;
+
+migration_downtime_start(s);
+
+s->vm_old_state = runstate_get();
+global_state_store();
+
+ret = vm_stop_force_state(state);
  
  trace_vmstate_downtime_checkpoint("src-vm-stopped");

+trace_migration_completion_vm_stop(ret);
  
  return ret;

  }
@@ -1602,6 +1610,11 @@ bool migration_is_active(MigrationState *s)
  s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
  }
  
+bool migrate_mode_is_cpr(MigrationState *s)

+{
+return s->parameters.mode == MIG_MODE_CPR_REBOOT;
+}
+
  int migrate_init(MigrationState *s, Error **errp)
  {
  int ret;
@@ -2454,10 +2467,7 @@ static int postcopy_start(MigrationState *ms, Error 
**errp)
  bql_lock();
  trace_postcopy_start_set_run();
  
-migration_downtime_start(ms);

-
-global_state_store();
-ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE);
+ret = migration_stop_vm(ms, RUN_STATE_FINISH_MIGRATE);
  if (ret < 0) {
  goto fail;
  }
@@ -2652,15 +2662,12 @@ static int migration_completion_precopy(MigrationState 
*s,
  int ret;
  
  

Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-05-02 Thread Michael Galaxy

Yu Zhang / Jinpu,

Any possibility (at your leisure, and within the disclosure rules of 
your company, IONOS) that you could share some of your performance 
information to educate the group?


NICs have indeed changed, but not everybody has 100GbE Mellanox cards at 
their disposal. Some people don't.


- Michael

On 5/1/24 11:16, Peter Xu wrote:

On Wed, May 01, 2024 at 04:59:38PM +0100, Daniel P. Berrangé wrote:

On Wed, May 01, 2024 at 11:31:13AM -0400, Peter Xu wrote:

What I worry about more is whether this is really how we want to keep rdma
in qemu, and that's also why I was trying to request some serious
performance measurements comparing rdma vs. nics.  And here when I said
"we" I mean both the QEMU community and any company that will support
keeping rdma around.

The problem is: if NICs are now fast enough to perform at least on par with
rdma, and if they have a lower overall maintenance cost, does it mean that
rdma migration will only be used by those who already ship it in their
products?  In that case we should simply ask new users to stick with tcp,
and the number of rdma users should only drop, not increase.

It also seems destined that most new migration features will not support
rdma: see how many old features we are dropping in migration now (which rdma
_might_ still leverage, but maybe not), and how much of what we add is
mostly multifd-relevant and will probably not apply to rdma at all.  So in
general what I am worried about is a lose-lose situation, where a company
might find it easier to either stick with an old qemu (depending on whether
other new features are wanted besides RDMA alone), or do a periodic rebase
with RDMA kept downstream only.

I don't know much about the origins of RDMA support in QEMU and why
this particular design was taken. It is indeed a huge maintenance burden to
have a completely different code flow for RDMA with 4000+ lines of
custom protocol signalling which is barely understandable.

I would note that /usr/include/rdma/rsocket.h provides a higher level
API that is a 1-1 match of the normal kernel 'sockets' API. If we had
leveraged that, then QIOChannelSocket class and the QAPI SocketAddress
type could almost[1] trivially have supported RDMA. There would have
been almost no RDMA code required in the migration subsystem, and all
the modern features like compression, multifd, post-copy, etc would
"just work".

I guess the 'rsocket.h' shim may well limit some of the possible
performance gains, but it might still have been a better tradeoff
to have not quite so good peak performance, but with massively
less maint burden.

My understanding so far is that RDMA is solely for performance and nothing
else, so it's a question of whether existing rdma users would still want it
if it ran slower.

Jinpu mentioned the explicit use of ib verbs, but I am mostly just quoting
that term as I don't really know such details:

https://lore.kernel.org/qemu-devel/camgffem2twjxopcnqtq1sjytf5395dbztcmyikrqfxdzjws...@mail.gmail.com/

So not sure whether that applies here too, in that having qiochannel
wrapper may not allow direct access to those ib verbs.

Thanks,


With regards,
Daniel

[1] "almost" trivially, because the poll() integration for rsockets
 requires a bit more magic sauce since rsockets FDs are not
 really FDs from the kernel's POV. Still, QIOCHannel likely can
 abstract that probme.
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|





Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-04-29 Thread Michael Galaxy

Reviewed-by: Michael Galaxy 

Thanks Yu Zhang and Peter.

- Michael

On 4/29/24 15:45, Yu Zhang wrote:

Hello Michael and Peter,

We are very glad at your quick and kind reply about our plan to take
over the maintenance of your code. The message is for presenting our
plan and working together.
If we were able to obtain the maintainer's role, our plan is:

1. Create the necessary unit-test cases and get them integrated into
the current QEMU GitLab-CI pipeline
2. Review and test the code changes by other developers to ensure that
nothing is broken in the changed code before being merged by the
community
3. Based on our current practice and application scenario, look for
possible improvements when necessary

Besides that, a patch is attached to announce this change in the community.

With your generous support, we hope that the development community
will make a positive decision for us.

Kind regards,
Yu Zhang@ IONOS Cloud

On Mon, Apr 29, 2024 at 4:57 PM Peter Xu  wrote:

On Mon, Apr 29, 2024 at 08:08:10AM -0500, Michael Galaxy wrote:

Hi All (and Peter),

Hi, Michael,


My name is Michael Galaxy (formerly Hines). Yes, I changed my last name
(highly irregular for a male) and yes, that's my real last name:
https://www.linkedin.com/in/mrgalaxy/)

I'm the original author of the RDMA implementation. I've been discussing
with Yu Zhang for a little bit about potentially handing over maintainership
of the codebase to his team.

I simply have zero access to RoCE or Infiniband hardware at all,
unfortunately, so I've never been able to run tests or use what I wrote at
work, and as all of you know, if you don't have a way to test something,
then you can't maintain it.

Yu Zhang put a (very kind) proposal forward to me to ask the community if
they feel comfortable training his team to maintain the codebase (and run
tests) while they learn about it.

The "while learning" part is fine at least to me.  IMHO the "ownership" to
the code, or say, taking over the responsibility, may or may not need 100%
mastering the code base first.  There should still be some fundamental
confidence to work on the code though as a starting point, then it's about
serious use case to back this up, and careful testings while getting more
familiar with it.


If you don't mind, I'd like to let him send over his (very detailed)
proposal,

Yes please, it's exactly the time to share the plan.  The hope is we try to
reach a consensus before or around the middle of this release (9.1).
Normally QEMU has a 3~4 months window for each release and 9.1 schedule is
not yet out, but I think it means we make a decision before or around
middle of June.

Thanks,


- Michael

On 4/11/24 11:36, Yu Zhang wrote:

1) Either a CI test covering at least the major RDMA paths, or at least
  periodically tests for each QEMU release will be needed.

We use a batch of regression test cases for the stack, which covers the
test for QEMU. I did such test for most of the QEMU releases planned as
candidates for rollout.

The migration test needs a pair of (either physical or virtual) servers with
InfiniBand network, which makes it difficult to do on a single server. The
nested VM could be a possible approach, for which we may need virtual
InfiniBand network. Is SoftRoCE [1] a choice? I will try it and let you know.

[1]  
https://enterprise-support.nvidia.com/s/article/howto-configure-soft-roce

Thanks and best regards!

On Thu, Apr 11, 2024 at 4:20 PM Peter Xu  wrote:

On Wed, Apr 10, 2024 at 09:49:15AM -0400, Peter Xu wrote:

On Wed, Apr 10, 2024 at 02:28:59AM +, Zhijian Li (Fujitsu) via wrote:

on 4/10/2024 3:46 AM, Peter Xu wrote:


Is there document/link about the unittest/CI for migration tests, Why
are those tests missing?
Is it hard or very special to set up an environment for that? maybe we
can help in this regards.

See tests/qtest/migration-test.c.  We put most of our migration tests
there and that's covered in CI.
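
For reference, those qtests can also be run by hand outside CI; the paths
below assume a standard build directory and are illustrative:

  cd build
  QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test
  # or let the build system drive the whole set:
  make check-qtest-x86_64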

I think one major issue is CI systems don't normally have rdma devices.
Can rdma migration test be carried out without a real hardware?

Yeah, RXE, aka SOFT-RoCE, is able to emulate RDMA. For example:
$ sudo rdma link add rxe_eth0 type rxe netdev eth0  # on host
gives us a new RDMA interface "rxe_eth0", and this new RDMA interface is
able to do the QEMU RDMA migration.

Also, the loopback (lo) device is able to emulate the RDMA interface
"rxe_lo"; however, when I tried (years ago) to do RDMA migration over this
interface (rdma:127.0.0.1:), something went wrong, so I gave up enabling
the RDMA migration qtest at that time.
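
Putting the rxe suggestion together with the migration side, a two-host (or
two-VM) smoke test could look roughly like this; the hostname, port and
netdev are illustrative:

  # on both endpoints: create a soft-RoCE (rxe) device on top of the NIC
  sudo rdma link add rxe_eth0 type rxe netdev eth0

  # destination QEMU listens on the RDMA migration URI
  qemu-system-x86_64 ... -incoming rdma:0.0.0.0:4444

  # source QEMU, from the HMP monitor
  (qemu) migrate -d rdma:dst-host:4444
  (qemu) info migrate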

Thanks, Zhijian.

I'm not sure adding an emu-link for rdma is doable for CI systems, though.
Maybe someone more familiar with how CI wor

Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-04-29 Thread Michael Galaxy

Hi All (and Peter),

My name is Michael Galaxy (formerly Hines). Yes, I changed my last name 
(highly irregular for a male) and yes, that's my real last name: 
https://www.linkedin.com/in/mrgalaxy/)


I'm the original author of the RDMA implementation. I've been discussing 
with Yu Zhang for a little bit about potentially handing over 
maintainership of the codebase to his team.


I simply have zero access to RoCE or Infiniband hardware at all, 
unfortunately, so I've never been able to run tests or use what I wrote 
at work, and as all of you know, if you don't have a way to test 
something, then you can't maintain it.


Yu Zhang put a (very kind) proposal forward to me to ask the community 
if they feel comfortable training his team to maintain the codebase (and 
run tests) while they learn about it.


If you don't mind, I'd like to let him send over his (very detailed) 
proposal,


- Michael

On 4/11/24 11:36, Yu Zhang wrote:

1) Either a CI test covering at least the major RDMA paths, or at least
 periodically tests for each QEMU release will be needed.

We use a batch of regression test cases for the stack, which covers the
test for QEMU. I did such test for most of the QEMU releases planned as
candidates for rollout.

The migration test needs a pair of (either physical or virtual) servers with
InfiniBand network, which makes it difficult to do on a single server. The
nested VM could be a possible approach, for which we may need virtual
InfiniBand network. Is SoftRoCE [1] a choice? I will try it and let you know.

[1]  
https://enterprise-support.nvidia.com/s/article/howto-configure-soft-roce

Thanks and best regards!

On Thu, Apr 11, 2024 at 4:20 PM Peter Xu  wrote:

On Wed, Apr 10, 2024 at 09:49:15AM -0400, Peter Xu wrote:

On Wed, Apr 10, 2024 at 02:28:59AM +, Zhijian Li (Fujitsu) via wrote:


on 4/10/2024 3:46 AM, Peter Xu wrote:


Is there document/link about the unittest/CI for migration tests, Why
are those tests missing?
Is it hard or very special to set up an environment for that? maybe we
can help in this regards.

See tests/qtest/migration-test.c.  We put most of our migration tests
there and that's covered in CI.

I think one major issue is CI systems don't normally have rdma devices.
Can rdma migration test be carried out without a real hardware?

Yeah, RXE, aka SOFT-RoCE, is able to emulate RDMA. For example:
$ sudo rdma link add rxe_eth0 type rxe netdev eth0  # on host
gives us a new RDMA interface "rxe_eth0", and this new RDMA interface is
able to do the QEMU RDMA migration.

Also, the loopback (lo) device is able to emulate the RDMA interface
"rxe_lo"; however, when I tried (years ago) to do RDMA migration over this
interface (rdma:127.0.0.1:), something went wrong, so I gave up enabling
the RDMA migration qtest at that time.

Thanks, Zhijian.

I'm not sure adding an emu-link for rdma is doable for CI systems, though.
Maybe someone more familiar with how CI works can chim in.

Some people got dropped on the cc list for unknown reason, I'm adding them
back (Fabiano, Peter Maydell, Phil).  Let's make sure nobody is dropped by
accident.

I'll try to summarize what is still missing, and I think these will be
greatly helpful if we don't want to deprecate rdma migration:

   1) Either a CI test covering at least the major RDMA paths, or at least
  periodically tests for each QEMU release will be needed.

   2) Some performance tests between modern RDMA and NIC devices are
  welcomed.  The current knowledge is modern NIC can work similarly to
  RDMA in performance, then it's debatable why we still maintain so much
  rdma specific code.

   3) No need for solid patchsets for this one, but some plan to improve
  the RDMA migration code so that it is not almost isolated from the rest
  of the protocols.

   4) Someone to look after this code for real.

For 2) and 3) more info is here:

https://lore.kernel.org/r/ZhWa0YeAb9ySVKD1@x1n

Here 4) can be the most important as Markus pointed out.  We just didn't
get there yet on the discussions, but maybe Markus is right that we should
talk that first.

Thanks,

--
Peter Xu





Re: [PATCH V3] migration: simplify notifiers

2023-07-13 Thread Michael Galaxy



On 6/7/23 09:42, Steve Sistare wrote:

Pass the callback function to add_migration_state_change_notifier so
that migration can initialize the notifier on add and clear it on
delete, which simplifies the call sites.  Shorten the function names
so the extra arg can be added more legibly.  Hide the global notifier
list in a new function migration_call_notifiers, and make it externally
visible so future live update code can call it.

Tested-by: Michael Galaxy 
Reviewed-by: Michael Galaxy 

No functional change.

Signed-off-by: Steve Sistare 
---
  hw/net/virtio-net.c  |  6 +++---
  hw/vfio/migration.c  |  8 
  include/migration/misc.h |  6 --
  migration/migration.c| 22 --
  net/vhost-vdpa.c |  7 ---
  ui/spice-core.c  |  3 +--
  6 files changed, 32 insertions(+), 20 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 6df6b73..c4dc795 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3605,8 +3605,8 @@ static void virtio_net_device_realize(DeviceState *dev, 
Error **errp)
  n->primary_listener.hide_device = failover_hide_primary_device;
  qatomic_set(&n->failover_primary_hidden, true);
  device_listener_register(&n->primary_listener);
-n->migration_state.notify = virtio_net_migration_state_notifier;
-add_migration_state_change_notifier(&n->migration_state);
+migration_add_notifier(&n->migration_state,
+   virtio_net_migration_state_notifier);
  n->host_features |= (1ULL << VIRTIO_NET_F_STANDBY);
  }
  
@@ -3769,7 +3769,7 @@ static void virtio_net_device_unrealize(DeviceState *dev)

  if (n->failover) {
  qobject_unref(n->primary_opts);
  device_listener_unregister(>primary_listener);
-remove_migration_state_change_notifier(>migration_state);
+migration_remove_notifier(>migration_state);
  } else {
  assert(n->primary_opts == NULL);
  }
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index c4656bb..8af0294 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -619,9 +619,9 @@ static int vfio_migration_init(VFIODevice *vbasedev)
  migration->vm_state = qdev_add_vm_change_state_handler(vbasedev->dev,
 
vfio_vmstate_change,
 vbasedev);
-migration->migration_state.notify = vfio_migration_state_notifier;
-add_migration_state_change_notifier(&migration->migration_state);
-
+migration_add_notifier(&migration->migration_state,
+   vfio_migration_state_notifier);
+
  return 0;
  }
  
@@ -670,7 +670,7 @@ void vfio_migration_exit(VFIODevice *vbasedev)

  if (vbasedev->migration) {
  VFIOMigration *migration = vbasedev->migration;
  
-remove_migration_state_change_notifier(&migration->migration_state);

+migration_remove_notifier(&migration->migration_state);
  qemu_del_vm_change_state_handler(migration->vm_state);
  unregister_savevm(VMSTATE_IF(vbasedev->dev), "vfio", vbasedev);
  vfio_migration_free(vbasedev);
diff --git a/include/migration/misc.h b/include/migration/misc.h
index 5ebe13b..0987eb1 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -59,8 +59,10 @@ void migration_object_init(void);
  void migration_shutdown(void);
  bool migration_is_idle(void);
  bool migration_is_active(MigrationState *);
-void add_migration_state_change_notifier(Notifier *notify);
-void remove_migration_state_change_notifier(Notifier *notify);
+void migration_add_notifier(Notifier *notify,
+void (*func)(Notifier *notifier, void *data));
+void migration_remove_notifier(Notifier *notify);
+void migration_call_notifiers(MigrationState *s);
  bool migration_in_setup(MigrationState *);
  bool migration_has_finished(MigrationState *);
  bool migration_has_failed(MigrationState *);
diff --git a/migration/migration.c b/migration/migration.c
index 5103e2f..17b4b47 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1178,7 +1178,7 @@ static void migrate_fd_cleanup(MigrationState *s)
  /* It is used on info migrate.  We can't free it */
  error_report_err(error_copy(s->error));
  }
-notifier_list_notify(&migration_state_notifiers, s);
+migration_call_notifiers(s);
  block_cleanup_parameters();
  yank_unregister_instance(MIGRATION_YANK_INSTANCE);
  }
@@ -1273,14 +1273,24 @@ static void migrate_fd_cancel(MigrationState *s)
  }
  }
  
-void add_migration_state_change_notifier(Notifier *notify)

+void migration_add_notifier(Notifier *notify,
+void (*func)(Notifier *notifier, void *data))
  {
+notify->notify = func;
  notifier_list_add(&migration_state_notifiers, notify);
  }
  
-void remove_migration_state_change_notifier(Notifi

Re: [PATCH V4 0/2] migration file URI

2023-07-13 Thread Michael Galaxy

Tested-by: Michael Galaxy 
Reviewed-by: Michael Galaxy 

On 6/30/23 09:25, Steve Sistare wrote:

Add the migration URI "file:filename[,offset=offset]".

Fabiano Rosas has submitted the unit tests in the series
   migration: Test the new "file:" migration

Steve Sistare (2):
   migration: file URI
   migration: file URI offset

  migration/file.c   | 103 +
  migration/file.h   |  14 +++
  migration/meson.build  |   1 +
  migration/migration.c  |   5 +++
  migration/trace-events |   4 ++
  qemu-options.hx|   7 +++-
  6 files changed, 133 insertions(+), 1 deletion(-)
  create mode 100644 migration/file.c
  create mode 100644 migration/file.h
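
Usage-wise, the new URI slots into the existing migrate commands; a small
sketch (the file path is illustrative):

  # save the VM state to a file from the HMP monitor:
  (qemu) migrate file:/var/tmp/vm-state.bin
  # or start writing at a given byte offset into an existing file:
  (qemu) migrate file:/var/tmp/vm-state.bin,offset=4096
  # restore on the incoming side:
  qemu-system-x86_64 ... -incoming file:/var/tmp/vm-state.bin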





Re: [PATCH V4] migration: simplify blockers

2023-07-13 Thread Michael Galaxy



On 7/7/23 15:20, Steve Sistare wrote:

Modify migrate_add_blocker and migrate_del_blocker to take an Error **
reason.  This allows migration to own the Error object, so that if
an error occurs, migration code can free the Error and clear the client
handle, simplifying client code.

This is also a pre-requisite for future patches that will add a mode
argument to migration requests to support live update, and will maintain
a list of blockers for each mode.  A blocker may apply to a single mode
or to multiple modes, and passing Error** will allow one Error object to
be registered for multiple modes.

No functional change.


Tested-by: Michael Galaxy 
Reviewed-by: Michael Galaxy 



Signed-off-by: Steve Sistare 
---
  backends/tpm/tpm_emulator.c  | 10 ++
  block/parallels.c|  6 ++
  block/qcow.c |  6 ++
  block/vdi.c  |  6 ++
  block/vhdx.c |  6 ++
  block/vmdk.c |  6 ++
  block/vpc.c  |  6 ++
  block/vvfat.c|  6 ++
  dump/dump.c  |  4 ++--
  hw/9pfs/9p.c | 10 ++
  hw/display/virtio-gpu-base.c |  8 ++--
  hw/intc/arm_gic_kvm.c|  3 +--
  hw/intc/arm_gicv3_its_kvm.c  |  3 +--
  hw/intc/arm_gicv3_kvm.c  |  3 +--
  hw/misc/ivshmem.c|  8 ++--
  hw/ppc/pef.c |  2 +-
  hw/ppc/spapr.c   |  2 +-
  hw/ppc/spapr_events.c|  2 +-
  hw/ppc/spapr_rtas.c  |  2 +-
  hw/remote/proxy.c|  7 ++-
  hw/s390x/s390-virtio-ccw.c   |  9 +++--
  hw/scsi/vhost-scsi.c |  8 +++-
  hw/vfio/common.c | 26 --
  hw/vfio/migration.c  | 16 ++--
  hw/virtio/vhost.c|  8 ++--
  include/migration/blocker.h  | 24 +---
  migration/migration.c| 22 ++
  stubs/migr-blocker.c |  4 ++--
  target/i386/kvm/kvm.c|  8 
  target/i386/nvmm/nvmm-all.c  |  3 +--
  target/i386/sev.c|  2 +-
  target/i386/whpx/whpx-all.c  |  3 +--
  ui/vdagent.c |  5 ++---
  33 files changed, 89 insertions(+), 155 deletions(-)

diff --git a/backends/tpm/tpm_emulator.c b/backends/tpm/tpm_emulator.c
index 402a2d6..bf1a90f 100644
--- a/backends/tpm/tpm_emulator.c
+++ b/backends/tpm/tpm_emulator.c
@@ -534,11 +534,8 @@ static int tpm_emulator_block_migration(TPMEmulator 
*tpm_emu)
  error_setg(&tpm_emu->migration_blocker,
 "Migration disabled: TPM emulator does not support "
 "migration");
-if (migrate_add_blocker(tpm_emu->migration_blocker, &err) < 0) {
+if (migrate_add_blocker(&tpm_emu->migration_blocker, &err) < 0) {
  error_report_err(err);
-error_free(tpm_emu->migration_blocker);
-tpm_emu->migration_blocker = NULL;
-
  return -1;
  }
  }
@@ -1016,10 +1013,7 @@ static void tpm_emulator_inst_finalize(Object *obj)
  
  qapi_free_TPMEmulatorOptions(tpm_emu->options);
  
-if (tpm_emu->migration_blocker) {

-migrate_del_blocker(tpm_emu->migration_blocker);
-error_free(tpm_emu->migration_blocker);
-}
+migrate_del_blocker(&tpm_emu->migration_blocker);
  
  tpm_sized_buffer_reset(&state_blobs->volatil);

  tpm_sized_buffer_reset(&state_blobs->permanent);
diff --git a/block/parallels.c b/block/parallels.c
index 18e34ae..c2d92c4 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -960,9 +960,8 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
  error_setg(>migration_blocker, "The Parallels format used by node '%s' 
"
 "does not support live migration",
 bdrv_get_device_or_node_name(bs));
-ret = migrate_add_blocker(s->migration_blocker, errp);
+ret = migrate_add_blocker(&s->migration_blocker, errp);
  if (ret < 0) {
-error_free(s->migration_blocker);
  goto fail;
  }
  qemu_co_mutex_init(&s->lock);
@@ -994,8 +993,7 @@ static void parallels_close(BlockDriverState *bs)
  g_free(s->bat_dirty_bmap);
  qemu_vfree(s->header);
  
-migrate_del_blocker(s->migration_blocker);

-error_free(s->migration_blocker);
+migrate_del_blocker(&s->migration_blocker);
  }
  
  static BlockDriver bdrv_parallels = {

diff --git a/block/qcow.c b/block/qcow.c
index 577bd70..feedad5 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -304,9 +304,8 @@ static int qcow_open(BlockDriverState *bs, QDict *options, 
int flags,
  error_setg(>migration_blocker, "The qcow format used by node '%s' "
 "does not support live migration",
 bdrv_get_device_or_node_name(bs));
-ret = migrate_add_blocker(s->migration_blocker, errp);
+ret = migrate_add_blocke

Re: [PATCH V9 00/46] Live Update

2023-07-13 Thread Michael Galaxy

 Good morning,

On 7/10/23 10:10, Steven Sistare wrote:

On 6/12/2023 10:59 AM, Michael Galaxy wrote:

Hi Steve,
On 6/7/23 12:37, Steven Sistare wrote:

On 6/7/2023 11:55 AM, Michael Galaxy wrote:

Another option could be to expose "-migrate-mode-disable" (instead of enable) 
and just enable all 3 modes by default,
since we are already required to switch from "normal" mode to a CPR-specific 
mode when it is time to do a live update,
if the intention is to preserve the capability to completely prevent a running 
QEMU from using these modes
before the VM starts up.

- Michael

On 6/6/23 17:15, Michael Galaxy wrote:

Hi Steve,

In the current design you have, we have to specify both the command line parameter 
"-migrate-mode-enable cpr-reboot"
*and* issue the monitor command "migrate_set_parameter mode cpr-${mode}".

Is it possible to opt-in to the CPR mode just once over the monitor instead of 
having to specify it twice on the command line?
This would also match the live migration model: You do not need to necessarily "opt 
in" to live migration mode through
a command line parameter, you simply request it when you need to. Can CPR 
behave the same way?

This would also make switching over to a CPR-capable version of QEMU much 
simpler and would even make it work for
existing libvirt-managed guests as their command line parameters would no 
longer need to change. This would allow us to
simply power-off and power-on existing VMs to make them CPR-capable and then 
work on a libvirt patch later when
we're ready to do so.


Comments?

Hi Michael,
    Requiring -migrate-enable-mode allows qemu to initialize objects
differently, if necessary, so that migration for a mode is not blocked.
See callers of migrate_mode_enabled.  There is only one so far, in
ram_block_add.  If the mode is cpr-exec, then it creates anonymous ram
blocks using memfd_create, else using MAP_ANON.  In the V7 series, this
was controlled by a '-machine memfd-alloc=on' option.

migrate-enable-mode is more future proof for the user.  If something new must
initialize differently to support cpr, then it adds a call to 
migrate_mode_enabled,
and the command line remains the same.  However, I could be persuaded to go 
either way.

OK, so it is cpr-exec that needs this option (because of ram block allocation), 
not really cpr-reboot.
Could the option then be made to only be required for cpr-exec and not 
cpr-reboot, then,
since cpr-reboot doesn't require that consideration?

In a different forum Juan said this is a memory issue, so it should be expressed
as a memory related option.  So, I will delete -migrate-enable-mode and revert 
back
to -machine memfd-alloc, as defined in the V7 patch series.


Acknowledged. I'm going to try to get my reviewed-by's in soon. Sorry I 
haven't done it sooner.


We've finished testing these patches on our systems and are moving forward.



A secondary reason for -migrate-enable-mode is to support the only-cpr-capable
option.  It needs to know which mode will be used, in order to check a
mode-specific blocker list.

Still, only-cpr-capable is also optional. If and only if one needs this option, 
the mode could be
specified as part of the option itself, rather than requiring an extra command 
line parameter, no?

Yes, I will make that change.

- Steve


Acknowledged.





Re: [PATCH V9 00/46] Live Update

2023-06-12 Thread Michael Galaxy

Hi Steve,


On 6/7/23 12:37, Steven Sistare wrote:

On 6/7/2023 11:55 AM, Michael Galaxy wrote:

Another option could be to expose "-migrate-mode-disable" (instead of enable) 
and just enable all 3 modes by default,
since we are already required to switch from "normal" mode to a CPR-specific 
mode when it is time to do a live update,
if the intention is to preserve the capability to completely prevent a running 
QEMU from using these modes
before the VM starts up.

- Michael

On 6/6/23 17:15, Michael Galaxy wrote:

Hi Steve,

In the current design you have, we have to specify both the command line parameter 
"-migrate-mode-enable cpr-reboot"
*and* issue the monitor command "migrate_set_parameter mode cpr-${mode}".

Is it possible to opt-in to the CPR mode just once over the monitor instead of 
having to specify it twice on the command line?
This would also match the live migration model: You do not need to necessarily "opt 
in" to live migration mode through
a command line parameter, you simply request it when you need to. Can CPR 
behave the same way?

This would also make switching over to a CPR-capable version of QEMU much 
simpler and would even make it work for
existing libvirt-managed guests as their command line parameters would no 
longer need to change. This would allow us to
simply power-off and power-on existing VMs to make them CPR-capable and then 
work on a libvirt patch later when
we're ready to do so.


Comments?

Hi Michael,
   Requiring -migrate-enable-mode allows qemu to initialize objects
differently, if necessary, so that migration for a mode is not blocked.
See callers of migrate_mode_enabled.  There is only one so far, in
ram_block_add.  If the mode is cpr-exec, then it creates anonymous ram
blocks using memfd_create, else using MAP_ANON.  In the V7 series, this
was controlled by a '-machine memfd-alloc=on' option.

migrate-enable-mode is more future proof for the user.  If something new must
initialize differently to support cpr, then it adds a call to 
migrate_mode_enabled,
and the command line remains the same.  However, I could be persuaded to go 
either way.



OK, so it is cpr-exec that needs this option (because of ram block 
allocation), not really cpr-reboot.
Could the option then be made to only be required for cpr-exec and not 
cpr-reboot, then,

since cpr-reboot doesn't require that consideration?



A secondary reason for -migrate-enable-mode is to support the only-cpr-capable
option.  It needs to know which mode will be used, in order to check a
mode-specific blocker list.


Still, only-cpr-capable is also optional. If and only if one needs this 
option, the mode could be
specified as part of the option itself, rather than requiring an extra 
command line parameter, no?


Further, in many clouds (including ours), VM management is generational, 
following the development of the software versioning, so we *always* run 
tests and know whether or not a VM is CPR-capable.


If it is not CPR-capable, we would never run the VM in the first place,
which means we would never really use that option at all.

I do see the appeal of the option, but could we simplify it, since it is 
opt-in?


- Michael




- Steve




Re: [PATCH V9 00/46] Live Update

2023-06-07 Thread Michael Galaxy
Another option could be to expose "-migrate-mode-disable" (instead of enable)
and just enable all 3 modes by default, since we are already required to
switch from "normal" mode to a CPR-specific mode when it is time to do a live
update. That would still preserve the ability to completely prevent a running
QEMU from using these modes before the VM starts up, if that is the intention.

- Michael

On 6/6/23 17:15, Michael Galaxy wrote:

Hi Steve,

In the current design you have, we have to specify both the command 
line parameter "-migrate-mode-enable cpr-reboot"

*and* issue the monitor command "migrate_set_parameter mode cpr-${mode}".

Is it possible to opt-in to the CPR mode just once over the monitor 
instead of having to specify it twice on the command line?
This would also match the live migration model: You do not need to 
necessarily "opt in" to live migration mode through
a command line parameter, you simply request it when you need to. Can 
CPR behave the same way?


This would also make switching over to a CPR-capable version of QEMU 
much simpler and would even make it work for
existing libvirt-managed guests as their command line parameters would 
no longer need to change. This would allow us to
simply power-off and power-on existing VMs to make them CPR-capable 
and then work on a libvirt patch later when

we're ready to do so.


Comments?

- Michael


On 12/7/22 09:48, Steven Sistare wrote:

This series desperately needs review in its intersection with live migration.
The code in other areas has been reviewed and revised multiple times -- thank 
you!

David, Juan, can you spare some time to review this?  I have done my best to 
order
the patches logically (see the labelled groups in this email), and to provide
complete and clear cover letter and commit messages. Can I do anything to 
facilitate,
like doing a code walk through via zoom?

And of course, I welcome anyone's feedback.

Here is the original posting.

https://lore.kernel.org/qemu-devel/1658851843-236870-1-git-send-email-steven.sist...@oracle.com/


- Steve

On 7/26/2022 12:09 PM, Steve Sistare wrote:

This version of the live update patch series integrates live update into the
live migration framework.  The new interfaces are:
   * mode (migration parameter)
   * cpr-exec-args (migration parameter)
   * file (migration URI)
   * migrate-mode-enable (command-line argument)
   * only-cpr-capable (command-line argument)

Provide the cpr-exec and cpr-reboot migration modes for live update.  These
save and restore VM state, with minimal guest pause time, so that qemu may be
updated to a new version in between.  The caller sets the mode parameter
before invoking the migrate or migrate-incoming commands.

In cpr-reboot mode, the migrate command saves state to a file, allowing
one to quit qemu, reboot to an updated kernel, start an updated version of
qemu, and resume via the migrate-incoming command.  The caller must specify
a migration URI that writes to and reads from a file.  Unlike normal mode,
the use of certain local storage options does not block the migration, but
the caller must not modify guest block devices between the quit and restart.
The guest RAM memory-backend must be shared, and the @x-ignore-shared
migration capability must be set, to avoid saving it to the file.  Guest RAM
must be non-volatile across reboot, which can be achieved by backing it with
a dax device, or /dev/shm PKRAM as proposed in
https://lore.kernel.org/lkml/1617140178-8773-1-git-send-email-anthony.yznaga@oracle.com
but this is not enforced.  The restarted qemu arguments must match those used

to initially start qemu, plus the -incoming option.
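
For concreteness, a rough sketch of that cycle (the file name and exact HMP
spellings are illustrative, not part of this posting):

monitor> migrate_set_capability x-ignore-shared on
monitor> migrate_set_parameter mode cpr-reboot
monitor> migrate -d file:vm.sav
# quit qemu, reboot (or kexec) into the updated kernel, then start the
# updated qemu with the same arguments plus:
qemu-system-x86_64 <original args> -incoming file:vm.sav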

The reboot mode supports vfio devices if the caller first suspends the guest,
such as by issuing guest-suspend-ram to the qemu guest agent.  The guest
drivers' suspend methods flush outstanding requests and re-initialize the
devices, and thus there is no device state to save and restore.  After
issuing migrate-incoming, the caller must issue a system_wakeup command to
resume.
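
Roughly, for such a guest (the guest-agent transport details are
illustrative):

{"execute": "guest-suspend-ram"}    # sent to the qemu guest agent first
# ...save state, reboot, restart qemu with -incoming as above, then:
monitor> system_wakeup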

In cpr-exec mode, the migrate command saves state to a file and directly
exec's a new version of qemu on the same host, replacing the original process
while retaining its PID.  The caller must specify a migration URI that writes
to and reads from a file, and resumes execution via the migrate-incoming
command.  Arguments for the new qemu process are taken from the cpr-exec-args
migration parameter, and must include the -incoming option.
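
A rough sketch of that sequence (the exact HMP spelling for setting
cpr-exec-args is my assumption; paths are illustrative):

monitor> migrate_set_parameter mode cpr-exec
monitor> migrate_set_parameter cpr-exec-args "qemu-system-x86_64 <new args> -incoming file:vm.sav"
monitor> migrate -d file:vm.sav
# the original process exec's the new binary (keeping its PID) and restores
# state via the -incoming option given in cpr-exec-args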

Guest RAM must be backed by a memory backend with share=on, but cannot be
memory-backend-ram.  The memory is re-mmap'd in the updated process, so guest
ram is efficiently preserved in place, albeit with new virtual addresses.

Re: [PATCH V9 00/46] Live Update

2023-06-06 Thread Michael Galaxy

Hi Steve,

In the current design you have, we have to specify both the command line 
parameter "-migrate-mode-enable cpr-reboot"

*and* issue the monitor command "migrate_set_parameter mode cpr-${mode}".

Is it possible to opt-in to the CPR mode just once over the monitor 
instead of having to specify it twice on the command line?
This would also match the live migration model: You do not need to 
necessarily "opt in" to live migration mode through
a command line parameter, you simply request it when you need to. Can 
CPR behave the same way?


This would also make switching over to a CPR-capable version of QEMU 
much simpler and would even make it work for
existing libvirt-managed guests as their command line parameters would 
no longer need to change. This would allow us to
simply power-off and power-on existing VMs to make them CPR-capable and 
then work on a libvirt patch later when

we're ready to do so.


Comments?

- Michael


On 12/7/22 09:48, Steven Sistare wrote:

This series desperately needs review in its intersection with live migration.
The code in other areas has been reviewed and revised multiple times -- thank 
you!

David, Juan, can you spare some time to review this?  I have done my best to 
order
the patches logically (see the labelled groups in this email), and to provide
complete and clear cover letter and commit messages. Can I do anything to 
facilitate,
like doing a code walk through via zoom?

And of course, I welcome anyone's feedback.

Here is the original posting.

https://lore.kernel.org/qemu-devel/1658851843-236870-1-git-send-email-steven.sist...@oracle.com/


- Steve

On 7/26/2022 12:09 PM, Steve Sistare wrote:

This version of the live update patch series integrates live update into the
live migration framework.  The new interfaces are:
   * mode (migration parameter)
   * cpr-exec-args (migration parameter)
   * file (migration URI)
   * migrate-mode-enable (command-line argument)
   * only-cpr-capable (command-line argument)

Provide the cpr-exec and cpr-reboot migration modes for live update.  These
save and restore VM state, with minimal guest pause time, so that qemu may be
updated to a new version in between.  The caller sets the mode parameter
before invoking the migrate or migrate-incoming commands.

In cpr-reboot mode, the migrate command saves state to a file, allowing
one to quit qemu, reboot to an updated kernel, start an updated version of
qemu, and resume via the migrate-incoming command.  The caller must specify
a migration URI that writes to and reads from a file.  Unlike normal mode,
the use of certain local storage options does not block the migration, but
the caller must not modify guest block devices between the quit and restart.
The guest RAM memory-backend must be shared, and the @x-ignore-shared
migration capability must be set, to avoid saving it to the file.  Guest RAM
must be non-volatile across reboot, which can be achieved by backing it with
a dax device, or /dev/shm PKRAM as proposed in
https://lore.kernel.org/lkml/1617140178-8773-1-git-send-email-anthony.yznaga@oracle.com
but this is not enforced.  The restarted qemu arguments must match those used

to initially start qemu, plus the -incoming option.

The reboot mode supports vfio devices if the caller first suspends the guest,
such as by issuing guest-suspend-ram to the qemu guest agent.  The guest
drivers' suspend methods flush outstanding requests and re-initialize the
devices, and thus there is no device state to save and restore.  After
issuing migrate-incoming, the caller must issue a system_wakeup command to
resume.

In cpr-exec mode, the migrate command saves state to a file and directly
exec's a new version of qemu on the same host, replacing the original process
while retaining its PID.  The caller must specify a migration URI that writes
to and reads from a file, and resumes execution via the migrate-incoming
command.  Arguments for the new qemu process are taken from the cpr-exec-args
migration parameter, and must include the -incoming option.

Guest RAM must be backed by a memory backend with share=on, but cannot be
memory-backend-ram.  The memory is re-mmap'd in the updated process, so guest
ram is efficiently preserved in place, albeit with new virtual addresses.
In addition, the '-migrate-mode-enable cpr-exec' option is required.  This
causes secondary guest ram blocks (those not specified on the command line)
to be allocated by mmap'ing a memfd.  The memfds are kept open across exec,
their values are saved in special cpr state which is retrieved after exec,
and they are re-mmap'd.  Since guest RAM is not copied, and storage blocks
are not migrated, the caller must disable all capabilities related to page
and block copy.  The implementation ignores all related parameters.

Re: [PATCH V9 00/46] Live Update

2023-04-14 Thread Michael Galaxy

Greetings,

For what it's worth, our team has been aggressively testing this patch
series and to date has not found any deficiencies or bottlenecks
whatsoever.


Overall, with a very well-tuned system (and linux kernel), we are getting
live update downtimes just above 15 seconds, and the vast majority of that
downtime is spent in kexec itself (particularly in PCI device probing, if
anybody has any tips on that) and has nothing to do with the QEMU side of
things. (We use this patchset in a PMEM-based configuration which has been
working out extremely well so far.)


We also tested "back-to-back" live updates as you would expect to do in 
production.


Overall we've done thousands of live updates over the past few months on 
many different types of hardware without failure.


Steven has been really responsive in answering some of our usability 
questions and we were able to fix those issues.


We will continue our testing throughout the year with more 
heavily-loaded workloads, but all in all we would very much be 
interested in seeing further reviews on this patch series from others.

Tested-by: Michael Galaxy 

On 12/7/22 09:48, Steven Sistare wrote:

This series desperately needs review in its intersection with live migration.
The code in other areas has been reviewed and revised multiple times -- thank 
you!

David, Juan, can you spare some time to review this?  I have done my best to 
order
the patches logically (see the labelled groups in this email), and to provide
complete and clear cover letter and commit messages. Can I do anything to 
facilitate,
like doing a code walk through via zoom?

And of course, I welcome anyone's feedback.

Here is the original posting.

https://lore.kernel.org/qemu-devel/1658851843-236870-1-git-send-email-steven.sist...@oracle.com/


- Steve

On 7/26/2022 12:09 PM, Steve Sistare wrote:

This version of the live update patch series integrates live update into the
live migration framework.  The new interfaces are:
   * mode (migration parameter)
   * cpr-exec-args (migration parameter)
   * file (migration URI)
   * migrate-mode-enable (command-line argument)
   * only-cpr-capable (command-line argument)

Provide the cpr-exec and cpr-reboot migration modes for live update.  These
save and restore VM state, with minimal guest pause time, so that qemu may be
updated to a new version in between.  The caller sets the mode parameter
before invoking the migrate or migrate-incoming commands.

In cpr-reboot mode, the migrate command saves state to a file, allowing
one to quit qemu, reboot to an updated kernel, start an updated version of
qemu, and resume via the migrate-incoming command.  The caller must specify
a migration URI that writes to and reads from a file.  Unlike normal mode,
the use of certain local storage options does not block the migration, but
the caller must not modify guest block devices between the quit and restart.
The guest RAM memory-backend must be shared, and the @x-ignore-shared
migration capability must be set, to avoid saving it to the file.  Guest RAM
must be non-volatile across reboot, which can be achieved by backing it with
a dax device, or /dev/shm PKRAM as proposed in
https://lore.kernel.org/lkml/1617140178-8773-1-git-send-email-anthony.yznaga@oracle.com
but this is not enforced.  The restarted qemu arguments must match those used

to initially start qemu, plus the -incoming option.

The reboot mode supports vfio devices if the caller first suspends the guest,
such as by issuing guest-suspend-ram to the qemu guest agent.  The guest
drivers' suspend methods flush outstanding requests and re-initialize the
devices, and thus there is no device state to save and restore.  After
issuing migrate-incoming, the caller must issue a system_wakeup command to
resume.

In cpr-exec mode, the migrate command saves state to a file and directly
exec's a new version of qemu on the same host, replacing the original process
while retaining its PID.  The caller must specify a migration URI that writes
to and reads from a file, and resumes execution via the migrate-incoming
command.  Arguments for the new qemu process are taken from the cpr-exec-args
migration parameter, and must include the -incoming option.

Guest RAM must be backed by a memory backend with share=on, but cannot be
memory-backend-ram.  The memory is re-mmap'd in the updated process, so guest
ram is efficiently preserved in place, albeit with new virtual addresses.
In addition, the '-migrate-mode-enable cpr-exec' option is required.  This
causes secondary guest ram blocks (those not specified on the command line)
to be allocated by mmap'ing a memfd.  The memfds are kept open across exec,
their values are saved in special cpr state which is retrieved after exec,
and they are re-mmap'd.  Since guest RAM is not copied, and storage blocks
are not migrated, the caller must disable all capabilities related to page
and block copy.  The implementation ignores all related parameters.

Re: [PATCH V9 00/46] Live Update

2023-04-07 Thread Michael Galaxy

Hey Steven,

Have you done any "back-to-back" live update testing before?

I am still doing extensive testing on this myself. I am running into a 
bug that I have not yet diagnosed.


It involves the following:

1. Perform a live update (I'm using kexec + PMEM-based live updates). => 
VM comes back just fine. All is well.
2. Using the *same* QEMU instance: try to perform another live update again.
This results in the following error:

monitor> migrate_incoming file:qemu.sav
qemu-system-x86_64: check_section_footer: Read section footer failed: -5
qemu-system-x86_64: load of migration failed: Invalid argument

I'm going to start diving into the code soon, but just thought I would 
ask first.


- Michael

On 12/7/22 09:48, Steven Sistare wrote:


This series desperately needs review in its intersection with live migration.
The code in other areas has been reviewed and revised multiple times -- thank 
you!

David, Juan, can you spare some time to review this?  I have done my best to 
order
the patches logically (see the labelled groups in this email), and to provide
complete and clear cover letter and commit messages. Can I do anything to 
facilitate,
like doing a code walk through via zoom?

And of course, I welcome anyone's feedback.

Here is the original posting.

https://lore.kernel.org/qemu-devel/1658851843-236870-1-git-send-email-steven.sist...@oracle.com/

- Steve

On 7/26/2022 12:09 PM, Steve Sistare wrote:

This version of the live update patch series integrates live update into the
live migration framework.  The new interfaces are:
   * mode (migration parameter)
   * cpr-exec-args (migration parameter)
   * file (migration URI)
   * migrate-mode-enable (command-line argument)
   * only-cpr-capable (command-line argument)

Provide the cpr-exec and cpr-reboot migration modes for live update.  These
save and restore VM state, with minimal guest pause time, so that qemu may be
updated to a new version in between.  The caller sets the mode parameter
before invoking the migrate or migrate-incoming commands.

In cpr-reboot mode, the migrate command saves state to a file, allowing
one to quit qemu, reboot to an updated kernel, start an updated version of
qemu, and resume via the migrate-incoming command.  The caller must specify
a migration URI that writes to and reads from a file.  Unlike normal mode,
the use of certain local storage options does not block the migration, but
the caller must not modify guest block devices between the quit and restart.
The guest RAM memory-backend must be shared, and the @x-ignore-shared
migration capability must be set, to avoid saving it to the file.  Guest RAM
must be non-volatile across reboot, which can be achieved by backing it with
a dax device, or /dev/shm PKRAM as proposed in
https://lore.kernel.org/lkml/1617140178-8773-1-git-send-email-anthony.yznaga@oracle.com
but this is not enforced.  The restarted qemu arguments must match those used
to initially start qemu, plus the -incoming option.

The reboot mode supports vfio devices if the caller first suspends the guest,
such as by issuing guest-suspend-ram to the qemu guest agent.  The guest
drivers' suspend methods flush outstanding requests and re-initialize the
devices, and thus there is no device state to save and restore.  After
issuing migrate-incoming, the caller must issue a system_wakeup command to
resume.

In cpr-exec mode, the migrate command saves state to a file and directly
exec's a new version of qemu on the same host, replacing the original process
while retaining its PID.  The caller must specify a migration URI that writes
to and reads from a file, and resumes execution via the migrate-incoming
command.  Arguments for the new qemu process are taken from the cpr-exec-args
migration parameter, and must include the -incoming option.

Guest RAM must be backed by a memory backend with share=on, but cannot be
memory-backend-ram.  The memory is re-mmap'd in the updated process, so guest
ram is efficiently preserved in place, albeit with new virtual addresses.
In addition, the '-migrate-mode-enable cpr-exec' option is required.  This
causes secondary guest ram blocks (those not specified on the command line)
to be allocated by mmap'ing a memfd.  The memfds are kept open across exec,
their values are saved in special cpr state which is retrieved after exec,
and they are re-mmap'd.  Since guest RAM is not copied, and storage blocks
are not migrated, the caller must disable all capabilities related to page
and block copy.  The implementation ignores all related parameters.

The exec mode supports vfio devices by preserving the vfio container, group,
device, and event descriptors across the qemu re-exec, and by updating DMA
mapping virtual addresses