Re: [PATCH V1 05/26] migration: precreate vmstate

2024-05-28 Thread Steven Sistare via

On 5/27/2024 2:16 PM, Peter Xu wrote:

On Mon, Apr 29, 2024 at 08:55:14AM -0700, Steve Sistare wrote:

Provide the VMStateDescription precreate field to mark objects that must
be loaded on the incoming side before devices have been created, because
they provide properties that will be needed at creation time.  They will
be saved to and loaded from their own QEMUFile, via
qemu_savevm_precreate_save and qemu_savevm_precreate_load, but these
functions are not yet called in this patch.  Allow them to be called
before or after normal migration is active, when current_migration and
current_incoming are not valid.

Signed-off-by: Steve Sistare 
---
  include/migration/vmstate.h |  6 
  migration/savevm.c  | 69 +
  migration/savevm.h  |  3 ++
  3 files changed, 73 insertions(+), 5 deletions(-)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 294d2d8..4691334 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -198,6 +198,12 @@ struct VMStateDescription {
   * a QEMU_VM_SECTION_START section.
   */
  bool early_setup;
+
+/*
+ * Send/receive this object in the precreate migration stream.
+ */
+bool precreate;
+
  int version_id;
  int minimum_version_id;
  MigrationPriority priority;
diff --git a/migration/savevm.c b/migration/savevm.c
index 9789823..a30bcd9 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -239,6 +239,7 @@ static SaveState savevm_state = {
  
  #define SAVEVM_FOREACH(se, entry)\

  QTAILQ_FOREACH(se, _state.handlers, entry)\
+if (!se->vmsd || !se->vmsd->precreate)
  
  #define SAVEVM_FOREACH_ALL(se, entry)\

  QTAILQ_FOREACH(se, _state.handlers, entry)
@@ -1006,13 +1007,19 @@ static void save_section_header(QEMUFile *f, 
SaveStateEntry *se,
  }
  }
  
+static bool send_section_footer(SaveStateEntry *se)

+{
+return (se->vmsd && se->vmsd->precreate) ||
+   migrate_get_current()->send_section_footer;
+}


Does the precreate vmsd "require" the footer?  Or it should also work?
IMHO it's less optimal to bind features without good reasons.


It is not required.  However, IMO we should not treat send-section-footer as
a fungible feature.  It is strictly an improvement, as was added to catch
misformated sections.  It is only registered as a feature for backwards
compatibility with qemu 2.3 and xen.

For a brand new data stream such as precreate, where we are not constrained
by backwards compatibility, we should unconditionally use the better protocol,
and always send the footer.

- Steve




Re: [PATCH V1 00/26] Live update: cpr-exec

2024-05-28 Thread Steven Sistare via

On 5/27/2024 1:45 PM, Peter Xu wrote:

On Tue, May 21, 2024 at 07:46:12AM -0400, Steven Sistare wrote:

I understand, thanks.  If I can help with any of your todo list,
just ask - steve


Thanks for offering the help, Steve.  Started looking at this today, then I
found that I miss something high-level.  Let me ask here, and let me
apologize already for starting to throw multiple questions..

IIUC the whole idea of this patchset is to allow efficient QEMU upgrade, in
this case not host kernel but QEMU-only, and/or upper.

Is there any justification on why the complexity is needed here?  It looks
to me this one is more involved than cpr-reboot, so I'm thinking how much
we can get from the complexity, and whether it's worthwhile.  1000+ LOC is
the min support, and if we even expect more to come, that's really
important, IMHO.

For example, what's the major motivation of this whole work?  Is that more
on performance, or is it more for supporting the special devices like VFIO
which we used to not support, or something else?  I can't find them in
whatever cover letter I can find, including this one.

Firstly, regarding performance, IMHO it'll be always nice to share even
some very fundamental downtime measurement comparisons using the new exec
mode v.s. the old migration ways to upgrade QEMU binary.  Do you perhaps
have some number on hand when you started working on this feature years
ago?  Or maybe some old links on the list would help too, as I didn't
follow this work since the start.

On VFIO, IIUC you started out this project without VFIO migration being
there.  Now we have VFIO migration so not sure how much it would work for
the upgrade use case. Even with current VFIO migration, we may not want to
migrate device states for a local upgrade I suppose, as that can be a lot
depending on the type of device assigned.  However it'll be nice to discuss
this too if this is the major purpose of the series.

I think one other challenge on QEMU upgrade with VFIO devices is that the
dest QEMU won't be able to open the VFIO device when the src QEMU is still
using it as the owner.  IIUC this is a similar condition where QEMU wants
to have proper ownership transfer of a shared block device, and AFAIR right
now we resolved that issue using some form of file lock on the image file.
In this case it won't easily apply to a VFIO dev fd, but maybe we still
have other approaches, not sure whether you investigated any.  E.g. could
the VFIO handle be passed over using unix scm rights?  I think this might
remove one dependency of using exec which can cause quite some difference
v.s. a generic migration (from which regard, cpr-reboot is still a pretty
generic migration).

You also mentioned vhost/tap, is that also a major goal of this series in
the follow up patchsets?  Is this a problem only because this solution will
do exec?  Can it work if either the exec()ed qemu or dst qemu create the
vhost/tap fds when boot?

Meanwhile, could you elaborate a bit on the implication on chardevs?  From
what I read in the doc update it looks like a major part of work in the
future, but I don't yet understand the issue..  Is it also relevant to the
exec() approach?

In all cases, some of such discussion would be really appreciated.  And if
you used to consider other approaches to solve this problem it'll be great
to mention how you chose this way.  Considering this work contains too many
things, it'll be nice if such discussion can start with the fundamentals,
e.g. on why exec() is a must.


The main goal of cpr-exec is providing a fast and reliable way to update
qemu. cpr-reboot is not fast enough or general enough.  It requires the
guest to support suspend and resume for all devices, and that takes seconds.
If one actually reboots the host, that adds more seconds, depending on
system services.  cpr-exec takes 0.1 secs, and works every time, unlike
like migration which can fail to converge on a busy system.  Live migration
also consumes more system and network resources.  cpr-exec seamlessly
preserves client connections by preserving chardevs, and overall provides
a much nicer user experience.

chardev's are preserved by keeping their fd open across the exec, and
remembering the value of the fd in precreate vmstate so that new qemu
can associate the fd with the chardev rather than opening a new one.

The approach of preserving open file descriptors is very general and applicable
to all kinds of devices, regardless of whether they support live migration
in hardware.  Device fd's are preserved using the same mechanism as for
chardevs.

Devices that support live migration in hardware do not like to live migrate
in place to the same node.  It is not what they are designed for, and some
implementations will flat out fail because the source and target interfaces
are the same.

For vhost/tap, sometimes the management layer opens the dev and passes an
fd to qemu, and sometimes qemu opens the dev.  The upcoming vhost/tap support
allows both

Re: [PATCH V1 07/26] migration: VMStateId

2024-05-28 Thread Steven Sistare via

On 5/27/2024 2:20 PM, Peter Xu wrote:

On Mon, Apr 29, 2024 at 08:55:16AM -0700, Steve Sistare wrote:

Define a type for the 256 byte id string to guarantee the same length is
used and enforced everywhere.

Signed-off-by: Steve Sistare 
---
  include/exec/ramblock.h | 3 ++-
  include/migration/vmstate.h | 2 ++
  migration/savevm.c  | 8 
  migration/vmstate.c | 3 ++-
  4 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 0babd10..61deefe 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -23,6 +23,7 @@
  #include "cpu-common.h"
  #include "qemu/rcu.h"
  #include "exec/ramlist.h"
+#include "migration/vmstate.h"
  
  struct RAMBlock {

  struct rcu_head rcu;
@@ -35,7 +36,7 @@ struct RAMBlock {
  void (*resized)(const char*, uint64_t length, void *host);
  uint32_t flags;
  /* Protected by the BQL.  */
-char idstr[256];
+VMStateId idstr;
  /* RCU-enabled, writes protected by the ramlist lock */
  QLIST_ENTRY(RAMBlock) next;
  QLIST_HEAD(, RAMBlockNotifier) ramblock_notifiers;


Hmm.. Don't look like a good idea to include a migration header in
ramblock.h?  Is this ramblock change needed for this work?


Well, entities that are migrated include migration headers, and now that
includes RAMBlock.  There is precedent:

0 include/exec/ramblock.h   26 #include "migration/vmstate.h"
1 include/hw/acpi/ich9_tco. 14 #include "migration/vmstate.h"
2 include/hw/display/ramfb.  4 #include "migration/vmstate.h"
3 include/hw/hyperv/vmbus.h 16 #include "migration/vmstate.h"
4 include/hw/input/pl050.h  14 #include "migration/vmstate.h"
5 include/hw/pci/shpc.h  7 #include "migration/vmstate.h"
6 include/hw/virtio/virtio. 20 #include "migration/vmstate.h"
7 include/migration/cpu.h8 #include "migration/vmstate.h"

Granted, only some of the C files that include ramblock.h need all of vmstate.h.
I could define VMStateId in a smaller file such as migration/misc.h, or a
new file migration/vmstateid.h, and include that in ramblock.h.
Any preference?

- Steve



Re: [PATCH V1 08/26] migration: vmstate_info_void_ptr

2024-05-28 Thread Steven Sistare via

On 5/27/2024 2:31 PM, Peter Xu wrote:

On Mon, Apr 29, 2024 at 08:55:17AM -0700, Steve Sistare wrote:

Define VMSTATE_VOID_PTR so the value of a pointer (but not its target)
can be saved in the migration stream.  This will be needed for CPR.

Signed-off-by: Steve Sistare 


This is really tricky.

 From a first glance, I don't think migrating a VA is valid at all for
migration even if with exec.. and looks insane to me for a cross-process
migration, which seems to be allowed to use as a generic VMSD helper.. as
VA is the address space barrier for different processes and I think it
normally even apply to generic execve(), and we're trying to jailbreak for
some reason..

It definitely won't work for any generic migration as sizeof(void*) can be
different afaict between hosts, e.g. 32bit -> 64bit migrations.

Some description would be really helpful in this commit message,
e.g. explain the users and why.  Do we need to poison that for generic VMSD
use (perhaps with prefixed underscores)?  I think I'll need to read on the
rest to tell..


Short answer: we never dereference the void* in the new process.  And must not.

Longer answer:

During CPR for vfio, each mapped DMA region is re-registered in the new
process using the new VA.  The ioctl to re-register identifies the mapping
by IOVA and length.

The same requirement holds for CPR of iommufd devices.  However, in the
iommufd framework, IOVA does not uniquely identify a dma mapping, and we
need to use the old VA as the unique identifier.  The new process
re-registers each mapping, passing the old VA and new VA to the kernel.
The old VA is never dereferenced in the new process, we just need its value.

I suspected that the void* which must not be dereferenced might make people
uncomfortable.  I have an older version of my code which adds a uint64_t
field to RAMBlock for recording and migrating the old VA.  The saving and
loading code is slightly less elegant, but no big deal.  Would you prefer
that?

- Steve



Re: [PATCH V1 23/26] migration: misc cpr-exec blockers

2024-05-27 Thread Steven Sistare via

On 5/24/2024 8:40 AM, Fabiano Rosas wrote:

Steve Sistare  writes:


Add blockers for cpr-exec migration mode for devices and options that do
not support it.

Signed-off-by: Steve Sistare 
---
  accel/xen/xen-all.c|  5 +
  backends/hostmem-epc.c | 12 ++--
  hw/vfio/migration.c|  3 ++-
  replay/replay.c|  6 ++
  4 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/accel/xen/xen-all.c b/accel/xen/xen-all.c
index 0bdefce..9a7ed0f 100644
--- a/accel/xen/xen-all.c
+++ b/accel/xen/xen-all.c


This file is missing the migration/blocker.h include.


Good eyes, will fix - steve



Re: [PATCH V1 20/26] migration: cpr-exec mode

2024-05-27 Thread Steven Sistare via

On 5/24/2024 10:58 AM, Fabiano Rosas wrote:

Steve Sistare  writes:


Add the cpr-exec migration mode.  Usage:
   qemu-system-$arch -machine memfd-alloc=on ...
   migrate_set_parameter mode cpr-exec
   migrate_set_parameter cpr-exec-args \
   ... -incoming 
   migrate -d 

The migrate command stops the VM, saves state to the URI,
directly exec's a new version of QEMU on the same host,
replacing the original process while retaining its PID, and
loads state from the URI.  Guest RAM is preserved in place,
albeit with new virtual addresses.

Arguments for the new QEMU process are taken from the
@cpr-exec-args parameter.  The first argument should be the
path of a new QEMU binary, or a prefix command that exec's the
new QEMU binary.

Because old QEMU terminates when new QEMU starts, one cannot
stream data between the two, so the URI must be a type, such as
a file, that reads all data before old QEMU exits.

Memory backend objects must have the share=on attribute, and
must be mmap'able in the new QEMU process.  For example,
memory-backend-file is acceptable, but memory-backend-ram is
not.

The VM must be started with the '-machine memfd-alloc=on'
option.  This causes implicit ram blocks (those not explicitly
described by a memory-backend object) to be allocated by
mmap'ing a memfd.  Examples include VGA, ROM, and even guest
RAM when it is specified without a memory-backend object.

The implementation saves precreate vmstate at the end of normal
migration in migrate_fd_cleanup, and tells the main loop to call
cpr_exec.  Incoming qemu loads preceate state early, before objects
are created.  The memfds are kept open across exec by clearing the
close-on-exec flag, their values are saved in precreate vmstate,
and they are mmap'd in new qemu.

Note that the memfd-alloc option is not related to memory-backend-memfd.
Later patches add support for memory-backend-memfd, and for additional
devices, including vfio, chardev, and more.

Signed-off-by: Steve Sistare 
---
  include/migration/cpr.h  |  14 +
  include/migration/misc.h |   3 ++
  migration/cpr.c  | 131 +++
  migration/meson.build|   1 +
  migration/migration.c|  21 
  migration/migration.h|   5 +-
  migration/ram.c  |   1 +
  qapi/migration.json  |  30 ++-
  system/physmem.c |   2 +
  system/vl.c  |   4 ++
  10 files changed, 210 insertions(+), 2 deletions(-)
  create mode 100644 include/migration/cpr.h
  create mode 100644 migration/cpr.c
[...]
diff --git a/migration/migration.c b/migration/migration.c
index b5af6b5..0d91531 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -239,6 +239,7 @@ void migration_object_init(void)
  blk_mig_init();
  ram_mig_init();
  dirty_bitmap_mig_init();
+cpr_mig_init();
  }
  
  typedef struct {

@@ -1395,6 +1396,15 @@ static void migrate_fd_cleanup(MigrationState *s)
  qemu_fclose(tmp);
  }
  
+if (migrate_mode() == MIG_MODE_CPR_EXEC) {

+Error *err = NULL;
+if (migration_precreate_save()) {
+migrate_set_error(s, err);
+error_report_err(err);


There's an error_report_err() call already a few lines down.


+migrate_set_state(>state, s->state, MIGRATION_STATUS_FAILED);
+}
+}


Not a fan of saving state in the middle of the cleanup function. This
adds extra restrictions to migrate_fd_cleanup() which already tends to
be the source of a bunch of bugs.

Can this be done either entirely before or after migrate_fd_cleanup()?
There's only one callsite from which you actually want to do cpr-exec,
migration_iteration_finish(). It's no big deal if we have to play a bit
with the notifier call placement.

static void migration_iteration_finish(MigrationState *s)
{
...
 migration_bh_schedule(migrate_cpr_exec_bh, s);
 migration_bh_schedule(migrate_fd_cleanup_bh, s);
 bql_unlock();
}

IIUC, the BQL ensures the ordering here, but if that doesn't work we
could just call the cpr function right at the end of
migrate_fd_cleanup(). That would already be better than interleaving.

static void migrate_cpr_exec_bh(void *opaque)
{
 MigrationState *s = opaque;
 Error *err = NULL;

 if (migration_has_failed(s) || migrate_mode() != MIG_MODE_CPR_EXEC) {
 return;
 }

 assert(s->state == MIGRATION_STATUS_COMPLETED);

 if (migration_precreate_save()) {
 migrate_set_error(s, err);
 error_report_err(err);
 migrate_set_state(>state, s->state, MIGRATION_STATUS_FAILED);
 migration_call_notifiers(s, MIG_EVENT_PRECOPY_FAILED, NULL);

 return;
 }

 qemu_system_exec_request(cpr_exec, s->parameters.cpr_exec_args);
}


No problem, I can hoist the cpr exec logic out of migrate_fd_cleanup.
I'll call migration_precreate_save prior, and I'll register a notifier
for MIG_EVENT_PRECOPY_DONE that calls qemu_system_exec_request.

BTW the following does not work 

Re: [PATCH V1 00/26] Live update: cpr-exec

2024-05-24 Thread Steven Sistare

On 5/24/2024 9:02 AM, Fabiano Rosas wrote:

Steve Sistare  writes:


This patch series adds the live migration cpr-exec mode.  In this mode, QEMU
stops the VM, writes VM state to the migration URI, and directly exec's a
new version of QEMU on the same host, replacing the original process while
retaining its PID.  Guest RAM is preserved in place, albeit with new virtual
addresses.  The user completes the migration by specifying the -incoming
option, and by issuing the migrate-incoming command if necessary.  This
saves and restores VM state, with minimal guest pause time, so that QEMU may
be updated to a new version in between.

The new interfaces are:
   * cpr-exec (MigMode migration parameter)
   * cpr-exec-args (migration parameter)
   * memfd-alloc=on (command-line option for -machine)
   * only-migratable-modes (command-line argument)

The caller sets the mode parameter before invoking the migrate command.

Arguments for the new QEMU process are taken from the cpr-exec-args parameter.
The first argument should be the path of a new QEMU binary, or a prefix
command that exec's the new QEMU binary, and the arguments should include
the -incoming option.

Memory backend objects must have the share=on attribute, and must be mmap'able
in the new QEMU process.  For example, memory-backend-file is acceptable,
but memory-backend-ram is not.

QEMU must be started with the '-machine memfd-alloc=on' option.  This causes
implicit RAM blocks (those not explicitly described by a memory-backend
object) to be allocated by mmap'ing a memfd.  Examples include VGA, ROM,
and even guest RAM when it is specified without without reference to a
memory-backend object.   The memfds are kept open across exec, their values
are saved in vmstate which is retrieved after exec, and they are re-mmap'd.

The '-only-migratable-modes cpr-exec' option guarantees that the
configuration supports cpr-exec.  QEMU will exit at start time if not.

Example:

In this example, we simply restart the same version of QEMU, but in
a real scenario one would set a new QEMU binary path in cpr-exec-args.

   # qemu-kvm -monitor stdio -object
   memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on
   -m 4G -machine memfd-alloc=on ...

   QEMU 9.1.50 monitor - type 'help' for more information
   (qemu) info status
   VM status: running
   (qemu) migrate_set_parameter mode cpr-exec
   (qemu) migrate_set_parameter cpr-exec-args qemu-kvm ... -incoming 
file:vm.state
   (qemu) migrate -d file:vm.state
   (qemu) QEMU 9.1.50 monitor - type 'help' for more information
   (qemu) info status
   VM status: running

cpr-exec mode preserves attributes of outgoing devices that must be known
before the device is created on the incoming side, such as the memfd descriptor
number, but currently the migration stream is read after all devices are
created.  To solve this problem, I add two VMStateDescription options:
precreate and factory.  precreate objects are saved to their own migration
stream, distinct from the main stream, and are read early by incoming QEMU,
before devices are created.  Factory objects are allocated on demand, without
relying on a pre-registered object's opaque address, which is necessary
because the devices to which the state will apply have not been created yet
and hence have not registered an opaque address to receive the state.

This patch series implements a minimal version of cpr-exec.  Future series
will add support for:
   * vfio
   * chardev's without loss of connectivity
   * vhost
   * fine-grained seccomp controls
   * hostmem-memfd
   * cpr-exec migration test


Steve Sistare (26):
   oslib: qemu_clear_cloexec
   vl: helper to request re-exec
   migration: SAVEVM_FOREACH
   migration: delete unused parameter mis
   migration: precreate vmstate
   migration: precreate vmstate for exec
   migration: VMStateId
   migration: vmstate_info_void_ptr
   migration: vmstate_register_named
   migration: vmstate_unregister_named
   migration: vmstate_register at init time
   migration: vmstate factory object
   physmem: ram_block_create
   physmem: hoist guest_memfd creation
   physmem: hoist host memory allocation
   physmem: set ram block idstr earlier
   machine: memfd-alloc option
   migration: cpr-exec-args parameter
   physmem: preserve ram blocks for cpr
   migration: cpr-exec mode
   migration: migrate_add_blocker_mode
   migration: ram block cpr-exec blockers
   migration: misc cpr-exec blockers
   seccomp: cpr-exec blocker
   migration: fix mismatched GPAs during cpr-exec
   migration: only-migratable-modes

  accel/xen/xen-all.c|   5 +
  backends/hostmem-epc.c |  12 +-
  hmp-commands.hx|   2 +-
  hw/core/machine.c  |  22 +++
  hw/core/qdev.c |   1 +
  hw/intc/apic_common.c  |   2 +-
  hw/vfio/migration.c|   3 +-
  include/exec/cpu-common.h  |   3 +-
  include/exec/memory.h  |  15 ++
  include/exec/ramblock.h|  10 +-
  

Re: [PATCH V1 00/26] Live update: cpr-exec

2024-05-21 Thread Steven Sistare

I understand, thanks.  If I can help with any of your todo list,
just ask - steve

On 5/20/2024 10:31 PM, Peter Xu wrote:
Conference back then pto until today, so tomorrow will be my first working day 
after those. Sorry Steve, will try my best to read it before next week. I didn't 
dare to read too much my inbox yet.  A bit scared but need to face it tomorrow.


On Mon, May 20, 2024, 6:28 p.m. Fabiano Rosas <mailto:faro...@suse.de>> wrote:


    Steven Sistare mailto:steven.sist...@oracle.com>> writes:

 > Hi Peter, Hi Fabiano,
 >    Will you have time to review the migration guts of this series any
time soon?
 > In particular:
 >
 > [PATCH V1 05/26] migration: precreate vmstate
 > [PATCH V1 06/26] migration: precreate vmstate for exec
 > [PATCH V1 12/26] migration: vmstate factory object
 > [PATCH V1 18/26] migration: cpr-exec-args parameter
 > [PATCH V1 20/26] migration: cpr-exec mode
 >

I'll get to them this week. I'm trying to make some progress with my own
code before I forget how to program. I'm also trying to find some time
to implement the device options in the migration tests so we can stop
these virtio-* breakages that have been popping up.





Re: [PATCH V1 00/26] Live update: cpr-exec

2024-05-20 Thread Steven Sistare

Hi Peter, Hi Fabiano,
  Will you have time to review the migration guts of this series any time soon?
In particular:

[PATCH V1 05/26] migration: precreate vmstate
[PATCH V1 06/26] migration: precreate vmstate for exec
[PATCH V1 12/26] migration: vmstate factory object
[PATCH V1 18/26] migration: cpr-exec-args parameter
[PATCH V1 20/26] migration: cpr-exec mode

- Steve

On 4/29/2024 11:55 AM, Steve Sistare wrote:

This patch series adds the live migration cpr-exec mode.  In this mode, QEMU
stops the VM, writes VM state to the migration URI, and directly exec's a
new version of QEMU on the same host, replacing the original process while
retaining its PID.  Guest RAM is preserved in place, albeit with new virtual
addresses.  The user completes the migration by specifying the -incoming
option, and by issuing the migrate-incoming command if necessary.  This
saves and restores VM state, with minimal guest pause time, so that QEMU may
be updated to a new version in between.

The new interfaces are:
   * cpr-exec (MigMode migration parameter)
   * cpr-exec-args (migration parameter)
   * memfd-alloc=on (command-line option for -machine)
   * only-migratable-modes (command-line argument)

The caller sets the mode parameter before invoking the migrate command.

Arguments for the new QEMU process are taken from the cpr-exec-args parameter.
The first argument should be the path of a new QEMU binary, or a prefix
command that exec's the new QEMU binary, and the arguments should include
the -incoming option.

Memory backend objects must have the share=on attribute, and must be mmap'able
in the new QEMU process.  For example, memory-backend-file is acceptable,
but memory-backend-ram is not.

QEMU must be started with the '-machine memfd-alloc=on' option.  This causes
implicit RAM blocks (those not explicitly described by a memory-backend
object) to be allocated by mmap'ing a memfd.  Examples include VGA, ROM,
and even guest RAM when it is specified without without reference to a
memory-backend object.   The memfds are kept open across exec, their values
are saved in vmstate which is retrieved after exec, and they are re-mmap'd.

The '-only-migratable-modes cpr-exec' option guarantees that the
configuration supports cpr-exec.  QEMU will exit at start time if not.

Example:

In this example, we simply restart the same version of QEMU, but in
a real scenario one would set a new QEMU binary path in cpr-exec-args.

   # qemu-kvm -monitor stdio -object
   memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on
   -m 4G -machine memfd-alloc=on ...

   QEMU 9.1.50 monitor - type 'help' for more information
   (qemu) info status
   VM status: running
   (qemu) migrate_set_parameter mode cpr-exec
   (qemu) migrate_set_parameter cpr-exec-args qemu-kvm ... -incoming 
file:vm.state
   (qemu) migrate -d file:vm.state
   (qemu) QEMU 9.1.50 monitor - type 'help' for more information
   (qemu) info status
   VM status: running

cpr-exec mode preserves attributes of outgoing devices that must be known
before the device is created on the incoming side, such as the memfd descriptor
number, but currently the migration stream is read after all devices are
created.  To solve this problem, I add two VMStateDescription options:
precreate and factory.  precreate objects are saved to their own migration
stream, distinct from the main stream, and are read early by incoming QEMU,
before devices are created.  Factory objects are allocated on demand, without
relying on a pre-registered object's opaque address, which is necessary
because the devices to which the state will apply have not been created yet
and hence have not registered an opaque address to receive the state.

This patch series implements a minimal version of cpr-exec.  Future series
will add support for:
   * vfio
   * chardev's without loss of connectivity
   * vhost
   * fine-grained seccomp controls
   * hostmem-memfd
   * cpr-exec migration test


Steve Sistare (26):
   oslib: qemu_clear_cloexec
   vl: helper to request re-exec
   migration: SAVEVM_FOREACH
   migration: delete unused parameter mis
   migration: precreate vmstate
   migration: precreate vmstate for exec
   migration: VMStateId
   migration: vmstate_info_void_ptr
   migration: vmstate_register_named
   migration: vmstate_unregister_named
   migration: vmstate_register at init time
   migration: vmstate factory object
   physmem: ram_block_create
   physmem: hoist guest_memfd creation
   physmem: hoist host memory allocation
   physmem: set ram block idstr earlier
   machine: memfd-alloc option
   migration: cpr-exec-args parameter
   physmem: preserve ram blocks for cpr
   migration: cpr-exec mode
   migration: migrate_add_blocker_mode
   migration: ram block cpr-exec blockers
   migration: misc cpr-exec blockers
   seccomp: cpr-exec blocker
   migration: fix mismatched GPAs during cpr-exec
   migration: only-migratable-modes

  accel/xen/xen-all.c|   5 +
  backends/hostmem-epc.c |  

Re: CPR/liveupdate: test results using prior bug fix

2024-05-16 Thread Steven Sistare

On 5/16/2024 1:24 PM, Michael Galaxy wrote:

On 5/14/24 08:54, Michael Tokarev wrote:

On 5/14/24 16:39, Michael Galaxy wrote:

Steve,

OK, so it does not look like this bugfix you wrote was included in 8.2.4 
(which was released yesterday). Unfortunately, that means that anyone using 
CPR in that release will still (eventually) encounter the bug like I did.


8.2.4 is basically a "bugfix" release for 8.2.3 which I somewhat
screwed up (in a minor way), plus a few currently (at the time)
queued up changes.   8.2.3 was a big release though.

I would recommend that y'all consider cherry-picking, perhaps, the relevant 
commits for a possible 8.2.5 ?


Please Cc changes which are relevant for -stable to, well,
qemu-sta...@nongnu.org :)

Which changes needs to be picked up?

Steve, can you comment here, please? At a minimum, we have this one: [PULL 
20/25] migration: stop vm for cpr


But that pull came with a handful of other changes that are also not in QEMU v8, 
so I suspect I'm missing some other important changes that might be important 
for a stable release?


- Michael


Hi Michael, I sent the full list of commits to this distribution yesterday, and
I see it in my Sent email folder.  Copying verbatim:


Michael Galaxy, I'm afraid you are out of luck with respect to qemu 8.2.
It has some of the cpr reboot commits, but is missing the following:

87a2848 migration: massage cpr-reboot documentation
cbdafc1 migration: options incompatible with cpr
ce5db1c migration: update cpr-reboot description
9867d4d migration: stop vm for cpr
4af667f migration: notifier error checking
bf78a04 migration: refactor migrate_fd_connect failures
6835f5a migration: per-mode notifiers
5663dd3 migration: MigrationNotifyFunc
c763a23e migration: remove postcopy_after_devices
9d9babf migration: MigrationEvent for notifiers
3e77573 migration: convert to NotifierWithReturn
d91f33c migration: remove error from notifier data
be19d83 notify: pass error to notifier with return
b12635f migration: fix coverity migrate_mode finding
2b58a8b tests/qtest: postcopy migration with suspend
b1fdd21 tests/qtest: precopy migration with suspend
5014478 tests/qtest: option to suspend during migration
f064975 tests/qtest: migration events
49a5020 migration: preserve suspended for bg_migration
58b1057 migration: preserve suspended for snapshot
b4e9ddc migration: preserve suspended runstate
d3c86c99 migration: propagate suspended runstate
9ff5e79 cpus: vm_resume
0f1db06 cpus: check running not RUN_STATE_RUNNING
b9ae473 cpus: stop vm in suspended runstate
f06f316 cpus: vm_was_suspended

All of those landed in qemu 9.0.
---

- Steve



Re: CPR/liveupdate: test results using prior bug fix

2024-05-15 Thread Steven Sistare

On 5/14/2024 11:33 AM, Michael Tokarev wrote:

14.05.2024 16:54, Michael Tokarev пишет:

On 5/14/24 16:39, Michael Galaxy wrote:

Steve,

OK, so it does not look like this bugfix you wrote was included in 8.2.4 
(which was released yesterday). Unfortunately, that means that anyone using 
CPR in that release will still (eventually) encounter the bug like I did.


8.2.4 is basically a "bugfix" release for 8.2.3 which I somewhat
screwed up (in a minor way), plus a few currently (at the time)
queued up changes.   8.2.3 was a big release though.

I would recommend that y'all consider cherry-picking, perhaps, the relevant 
commits for a possible 8.2.5 ?


Please Cc changes which are relevant for -stable to, well,
qemu-sta...@nongnu.org :)

Which changes needs to be picked up?

Please also note this particular change does not apply cleanly to
stable-8.2 branch due to other changes in this area between 8.2
and 9.0, in particular, in postcopy_start().  So if this is to be
picked up for stable-8.2, I need help from someone who better
understands this code to select changes to pick.

Thanks,

/mjt


Michael Galaxy, I'm afraid you are out of luck with respect to qemu 8.2.
It has some of the cpr reboot commits, but is missing the following:

87a2848 migration: massage cpr-reboot documentation
cbdafc1 migration: options incompatible with cpr
ce5db1c migration: update cpr-reboot description
9867d4d migration: stop vm for cpr
4af667f migration: notifier error checking
bf78a04 migration: refactor migrate_fd_connect failures
6835f5a migration: per-mode notifiers
5663dd3 migration: MigrationNotifyFunc
c763a23e migration: remove postcopy_after_devices
9d9babf migration: MigrationEvent for notifiers
3e77573 migration: convert to NotifierWithReturn
d91f33c migration: remove error from notifier data
be19d83 notify: pass error to notifier with return
b12635f migration: fix coverity migrate_mode finding
2b58a8b tests/qtest: postcopy migration with suspend
b1fdd21 tests/qtest: precopy migration with suspend
5014478 tests/qtest: option to suspend during migration
f064975 tests/qtest: migration events
49a5020 migration: preserve suspended for bg_migration
58b1057 migration: preserve suspended for snapshot
b4e9ddc migration: preserve suspended runstate
d3c86c99 migration: propagate suspended runstate
9ff5e79 cpus: vm_resume
0f1db06 cpus: check running not RUN_STATE_RUNNING
b9ae473 cpus: stop vm in suspended runstate
f06f316 cpus: vm_was_suspended

All of those landed in qemu 9.0.

- Steve



Re: CPR/liveupdate: test results using prior bug fix

2024-05-13 Thread Steven Sistare

Hi Michael,
  No surprise here, I did see some of the same failure messages and they
prompted me to submit the fix.  They are all symptoms of "the possibility of
ram and device state being out of sync" as mentioned in the commit.

I am not familiar with the process for maintaining old releases for qemu.
Perhaps someone on this list can comment on 8.2.3.

- Steve

On 5/13/2024 2:22 PM, Michael Galaxy wrote:

Hi Steve,

We found that this specific change in particular ("migration: stop vm for cpr") 
fixes a bug that we've identified in testing back-to-back live updates in a lab 
environment.


More specifically, *without* this change (which is not available in 8.2.2, but 
*is* available in 9.0.0) causes the metadata save file to be corrupted when 
doing live updates one after another. Typically we see a corrupted save file 
somewhere in between 20 and 30 live updates and while doing a git bisect, we 
found that this change makes the problem go away.


Were you aware? Is there any plan in place to cherry pick this for 8.2.3, 
perhaps or a plan to release 8.2.3 at some point?


Here are some examples of how the bug manifests in different locations of the 
QEMU metadata save file:


2024-04-26T13:28:53Z qemu-system-x86_64: Failed to load mtrr_var:base
2024-04-26T13:28:53Z qemu-system-x86_64: Failed to load cpu:env.mtrr_var
2024-04-26T13:28:53Z qemu-system-x86_64: error while loading state for instance 
0x1b of device 'cpu'
2024-04-26T13:28:53Z qemu-system-x86_64: load of migration failed: Input/output 
error

And another:

2024-04-17T16:09:47Z qemu-system-x86_64: check_section_footer: Read section 
footer failed: -5
2024-04-17T16:09:47Z qemu-system-x86_64: load of migration failed: Invalid 
argument

And another:

2024-04-30T21:53:29Z qemu-system-x86_64: Unable to read ID string for section 
163
2024-04-30T21:53:29Z qemu-system-x86_64: load of migration failed: Invalid 
argument

And another:

2024-05-01T16:01:44Z qemu-system-x86_64: Unable to read ID string for section 
164
2024-05-01T16:01:44Z qemu-system-x86_64: load of migration failed: Invalid 
argument
  

As you can see, they occur quite randomly, but generally it takes at least 
20-30+ live updates before the problem occurs.


- Michael

On 2/27/24 23:13, pet...@redhat.com wrote:

From: Steve Sistare

When migration for cpr is initiated, stop the vm and set state
RUN_STATE_FINISH_MIGRATE before ram is saved.  This eliminates the
possibility of ram and device state being out of sync, and guarantees
that a guest in the suspended state remains suspended, because qmp_cont
rejects a cont command in the RUN_STATE_FINISH_MIGRATE state.

Signed-off-by: Steve Sistare
Reviewed-by: Peter Xu
Link:https://urldefense.com/v3/__https://lore.kernel.org/r/1708622920-68779-11-git-send-email-steven.sistare@oracle.com__;!!GjvTz_vk!QLsFOCX-x2U9bzAo98SdidKlomHrmf_t0UmQKtgudoIcaDVoAJOPm39ZqaNP_nT5I8QqVfSgwhDZmg$  
Signed-off-by: Peter Xu

---
  include/migration/misc.h |  1 +
  migration/migration.h|  2 --
  migration/migration.c| 51 
  3 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index e4933b815b..5d1aa593ed 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -60,6 +60,7 @@ void migration_object_init(void);
  void migration_shutdown(void);
  bool migration_is_idle(void);
  bool migration_is_active(MigrationState *);
+bool migrate_mode_is_cpr(MigrationState *);
  
  typedef enum MigrationEventType {

  MIG_EVENT_PRECOPY_SETUP,
diff --git a/migration/migration.h b/migration/migration.h
index aef8afbe1f..65c0b61cbd 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -541,6 +541,4 @@ int migration_rp_wait(MigrationState *s);
   */
  void migration_rp_kick(MigrationState *s);
  
-int migration_stop_vm(RunState state);

-
  #endif
diff --git a/migration/migration.c b/migration/migration.c
index 37c836b0b0..90a90947fb 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -167,11 +167,19 @@ static gint page_request_addr_cmp(gconstpointer ap, 
gconstpointer bp)
  return (a > b) - (a < b);
  }
  
-int migration_stop_vm(RunState state)

+static int migration_stop_vm(MigrationState *s, RunState state)
  {
-int ret = vm_stop_force_state(state);
+int ret;
+
+migration_downtime_start(s);
+
+s->vm_old_state = runstate_get();
+global_state_store();
+
+ret = vm_stop_force_state(state);
  
  trace_vmstate_downtime_checkpoint("src-vm-stopped");

+trace_migration_completion_vm_stop(ret);
  
  return ret;

  }
@@ -1602,6 +1610,11 @@ bool migration_is_active(MigrationState *s)
  s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
  }
  
+bool migrate_mode_is_cpr(MigrationState *s)

+{
+return s->parameters.mode == MIG_MODE_CPR_REBOOT;
+}
+
  int migrate_init(MigrationState *s, Error **errp)
  {
  int ret;
@@ -2454,10 +2467,7 @@ static int postcopy_start(MigrationState *ms, 

Re: [PATCH V1 26/26] migration: only-migratable-modes

2024-05-13 Thread Steven Sistare

On 5/9/2024 3:14 PM, Fabiano Rosas wrote:

Steve Sistare  writes:


Add the only-migratable-modes option as a generalization of only-migratable.
Only devices that support all requested modes are allowed.

Signed-off-by: Steve Sistare 
---
  include/migration/misc.h   |  3 +++
  include/sysemu/sysemu.h|  1 -
  migration/migration-hmp-cmds.c | 26 +-
  migration/migration.c  | 22 +-
  migration/savevm.c |  2 +-
  qemu-options.hx| 16 ++--
  system/globals.c   |  1 -
  system/vl.c| 13 -
  target/s390x/cpu_models.c  |  4 +++-
  9 files changed, 75 insertions(+), 13 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 5b963ba..3ad2cd9 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -119,6 +119,9 @@ bool migration_incoming_postcopy_advised(void);
  /* True if background snapshot is active */
  bool migration_in_bg_snapshot(void);
  
+void migration_set_required_mode(MigMode mode);

+bool migration_mode_required(MigMode mode);
+
  /* migration/block-dirty-bitmap.c */
  void dirty_bitmap_mig_init(void);
  
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h

index 5b4397e..0a9c4b4 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -8,7 +8,6 @@
  
  /* vl.c */
  
-extern int only_migratable;

  extern const char *qemu_name;
  extern QemuUUID qemu_uuid;
  extern bool qemu_uuid_set;
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 414c7e8..ca913b7 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -16,6 +16,7 @@
  #include "qemu/osdep.h"
  #include "block/qapi.h"
  #include "migration/snapshot.h"
+#include "migration/misc.h"
  #include "monitor/hmp.h"
  #include "monitor/monitor.h"
  #include "qapi/error.h"
@@ -33,6 +34,28 @@
  #include "options.h"
  #include "migration.h"
  
+static void migration_dump_modes(Monitor *mon)

+{
+int mode, n = 0;
+
+monitor_printf(mon, "only-migratable-modes: ");
+
+for (mode = 0; mode < MIG_MODE__MAX; mode++) {
+if (migration_mode_required(mode)) {
+if (n++) {
+monitor_printf(mon, ",");
+}
+monitor_printf(mon, "%s", MigMode_str(mode));
+}
+}
+
+if (!n) {
+monitor_printf(mon, "none\n");
+} else {
+monitor_printf(mon, "\n");
+}
+}
+
  static void migration_global_dump(Monitor *mon)
  {
  MigrationState *ms = migrate_get_current();
@@ -41,7 +64,7 @@ static void migration_global_dump(Monitor *mon)
  monitor_printf(mon, "store-global-state: %s\n",
 ms->store_global_state ? "on" : "off");
  monitor_printf(mon, "only-migratable: %s\n",
-   only_migratable ? "on" : "off");
+   migration_mode_required(MIG_MODE_NORMAL) ? "on" : "off");
  monitor_printf(mon, "send-configuration: %s\n",
 ms->send_configuration ? "on" : "off");
  monitor_printf(mon, "send-section-footer: %s\n",
@@ -50,6 +73,7 @@ static void migration_global_dump(Monitor *mon)
 ms->decompress_error_check ? "on" : "off");
  monitor_printf(mon, "clear-bitmap-shift: %u\n",
 ms->clear_bitmap_shift);
+migration_dump_modes(mon);
  }
  
  void hmp_info_migrate(Monitor *mon, const QDict *qdict)

diff --git a/migration/migration.c b/migration/migration.c
index 4984dee..5535b84 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1719,17 +1719,29 @@ static bool is_busy(Error **reasonp, Error **errp)
  return false;
  }
  
-static bool is_only_migratable(Error **reasonp, Error **errp, int modes)

+static int migration_modes_required;
+
+void migration_set_required_mode(MigMode mode)
+{
+migration_modes_required |= BIT(mode);
+}
+
+bool migration_mode_required(MigMode mode)
+{
+return !!(migration_modes_required & BIT(mode));
+}
+
+static bool modes_are_required(Error **reasonp, Error **errp, int modes)
  {
  ERRP_GUARD();
  
-if (only_migratable && (modes & BIT(MIG_MODE_NORMAL))) {

+if (migration_modes_required & modes) {
  error_propagate_prepend(errp, *reasonp,
-"disallowing migration blocker "
-"(--only-migratable) for: ");
+"-only-migratable{-modes}  specified, but: ");


extra space before 'specified'


Will fix, thanks.


  *reasonp = NULL;
  return true;
  }
+
  return false;
  }
  
@@ -1783,7 +1795,7 @@ int migrate_add_blocker_modes(Error **reasonp, Error **errp, MigMode mode, ...)

  modes = get_modes(mode, ap);
  va_end(ap);
  
-if (is_only_migratable(reasonp, errp, modes)) {

+if (modes_are_required(reasonp, errp, modes)) {
  return -EACCES;
  } else if (is_busy(reasonp, errp)) {
  

Re: [PATCH V1 13/26] physmem: ram_block_create

2024-05-13 Thread Steven Sistare

On 5/13/2024 2:37 PM, Fabiano Rosas wrote:

Steve Sistare  writes:


Create a common subroutine to allocate a RAMBlock, de-duping the code to
populate its common fields.  Add a trace point for good measure.
No functional change.

Signed-off-by: Steve Sistare 
---
  system/physmem.c| 47 ++-
  system/trace-events |  3 +++
  2 files changed, 29 insertions(+), 21 deletions(-)

diff --git a/system/physmem.c b/system/physmem.c
index c3d04ca..6216b14 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -52,6 +52,7 @@
  #include "sysemu/hw_accel.h"
  #include "sysemu/xen-mapcache.h"
  #include "trace/trace-root.h"
+#include "trace.h"
  
  #ifdef CONFIG_FALLOCATE_PUNCH_HOLE

  #include 
@@ -1918,11 +1919,29 @@ out_free:
  }
  }
  
+static RAMBlock *ram_block_create(MemoryRegion *mr, ram_addr_t size,

+  ram_addr_t max_size, uint32_t ram_flags)
+{
+RAMBlock *rb = g_malloc0(sizeof(*rb));
+
+rb->used_length = size;
+rb->max_length = max_size;
+rb->fd = -1;
+rb->flags = ram_flags;
+rb->page_size = qemu_real_host_page_size();
+rb->mr = mr;
+rb->guest_memfd = -1;
+trace_ram_block_create(rb->idstr, rb->flags, rb->fd, rb->used_length,


There's no idstr at this point, is there? I think this needs to be
memory_region_name(mr).


Thanks, will fix. That is a bug in my patch factoring.  I add the call to
qemu_ram_set_idstr in patch "physmem: set ram block idstr earlier".

- Steve


+   rb->max_length, mr->align);
+return rb;
+}
+
  #ifdef CONFIG_POSIX
  RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
   uint32_t ram_flags, int fd, off_t offset,
   Error **errp)
  {
+void *host;
  RAMBlock *new_block;
  Error *local_err = NULL;
  int64_t file_size, file_align;
@@ -1962,19 +1981,14 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, 
MemoryRegion *mr,
  return NULL;
  }
  
-new_block = g_malloc0(sizeof(*new_block));

-new_block->mr = mr;
-new_block->used_length = size;
-new_block->max_length = size;
-new_block->flags = ram_flags;
-new_block->guest_memfd = -1;
-new_block->host = file_ram_alloc(new_block, size, fd, !file_size, offset,
- errp);
-if (!new_block->host) {
+new_block = ram_block_create(mr, size, size, ram_flags);
+host = file_ram_alloc(new_block, size, fd, !file_size, offset, errp);
+if (!host) {
  g_free(new_block);
  return NULL;
  }
  
+new_block->host = host;

  ram_block_add(new_block, _err);
  if (local_err) {
  g_free(new_block);
@@ -1982,7 +1996,6 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, 
MemoryRegion *mr,
  return NULL;
  }
  return new_block;
-
  }
  
  
@@ -2054,18 +2067,10 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,

  align = MAX(align, TARGET_PAGE_SIZE);
  size = ROUND_UP(size, align);
  max_size = ROUND_UP(max_size, align);
-
-new_block = g_malloc0(sizeof(*new_block));
-new_block->mr = mr;
-new_block->resized = resized;
-new_block->used_length = size;
-new_block->max_length = max_size;
  assert(max_size >= size);
-new_block->fd = -1;
-new_block->guest_memfd = -1;
-new_block->page_size = qemu_real_host_page_size();
-new_block->host = host;
-new_block->flags = ram_flags;
+new_block = ram_block_create(mr, size, max_size, ram_flags);
+new_block->resized = resized;
+
  ram_block_add(new_block, _err);
  if (local_err) {
  g_free(new_block);
diff --git a/system/trace-events b/system/trace-events
index 69c9044..f0a80ba 100644
--- a/system/trace-events
+++ b/system/trace-events
@@ -38,3 +38,6 @@ dirtylimit_state_finalize(void)
  dirtylimit_throttle_pct(int cpu_index, uint64_t pct, int64_t time_us) "CPU[%d] throttle percent: 
%" PRIu64 ", throttle adjust time %"PRIi64 " us"
  dirtylimit_set_vcpu(int cpu_index, uint64_t quota) "CPU[%d] set dirty page rate 
limit %"PRIu64
  dirtylimit_vcpu_execute(int cpu_index, int64_t sleep_time_us) "CPU[%d] sleep %"PRIi64 
" us"
+
+# physmem.c
+ram_block_create(const char *name, uint32_t flags, int fd, size_t used_length, size_t 
max_length, size_t align) "%s, flags %u, fd %d, len %lu, maxlen %lu, align %lu"




Re: [PATCH V1 24/26] seccomp: cpr-exec blocker

2024-05-13 Thread Steven Sistare

On 5/10/2024 3:54 AM, Daniel P. Berrangé wrote:

On Mon, Apr 29, 2024 at 08:55:33AM -0700, Steve Sistare wrote:

cpr-exec mode needs permission to exec.  Block it if permission is denied.

Signed-off-by: Steve Sistare 
---
  include/sysemu/seccomp.h |  1 +
  system/qemu-seccomp.c| 10 --
  system/vl.c  |  6 ++
  3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/sysemu/seccomp.h b/include/sysemu/seccomp.h
index fe85989..023c0a1 100644
--- a/include/sysemu/seccomp.h
+++ b/include/sysemu/seccomp.h
@@ -22,5 +22,6 @@
  #define QEMU_SECCOMP_SET_RESOURCECTL (1 << 4)
  
  int parse_sandbox(void *opaque, QemuOpts *opts, Error **errp);

+uint32_t qemu_seccomp_get_opts(void);
  
  #endif

diff --git a/system/qemu-seccomp.c b/system/qemu-seccomp.c
index 5c20ac0..0d2a561 100644
--- a/system/qemu-seccomp.c
+++ b/system/qemu-seccomp.c
@@ -360,12 +360,18 @@ static int seccomp_start(uint32_t seccomp_opts, Error 
**errp)
  return rc < 0 ? -1 : 0;
  }
  
+static uint32_t seccomp_opts;

+
+uint32_t qemu_seccomp_get_opts(void)
+{
+return seccomp_opts;
+}
+
  int parse_sandbox(void *opaque, QemuOpts *opts, Error **errp)
  {
  if (qemu_opt_get_bool(opts, "enable", false)) {
-uint32_t seccomp_opts = QEMU_SECCOMP_SET_DEFAULT
-| QEMU_SECCOMP_SET_OBSOLETE;
  const char *value = NULL;
+seccomp_opts = QEMU_SECCOMP_SET_DEFAULT | QEMU_SECCOMP_SET_OBSOLETE;
  
  value = qemu_opt_get(opts, "obsolete");

  if (value) {
diff --git a/system/vl.c b/system/vl.c
index 7252100..b76881e 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -76,6 +76,7 @@
  #include "hw/block/block.h"
  #include "hw/i386/x86.h"
  #include "hw/i386/pc.h"
+#include "migration/blocker.h"
  #include "migration/cpr.h"
  #include "migration/misc.h"
  #include "migration/snapshot.h"
@@ -2493,6 +2494,11 @@ static void qemu_process_early_options(void)
  QemuOptsList *olist = qemu_find_opts_err("sandbox", NULL);
  if (olist) {
  qemu_opts_foreach(olist, parse_sandbox, NULL, _fatal);
+if (qemu_seccomp_get_opts() & QEMU_SECCOMP_SET_SPAWN) {
+Error *blocker = NULL;
+error_setg(, "-sandbox denies exec for cpr-exec");
+migrate_add_blocker_mode(, MIG_MODE_CPR_EXEC, 
_fatal);
+}
  }
  #endi


There are a whole pile of features that get blocked wehn -sandbox is
used. I'm not convinced we should be adding code to check for specific
blocked features, as such a list will always be incomplete at best, and
incorrectly block things at worst.

I view this primarily as a documentation task for the cpr-exec command.


For cpr and live migration, we do our best to prevent breaking the guest
for cases we know will fail.  Independently, a clear error message here
will reduce error reports for this new cpr feature.

Would it be more palatable if I move this blocker's creation to cpr_mig_init?

- Steve



Re: [PATCH V1 22/26] migration: ram block cpr-exec blockers

2024-05-13 Thread Steven Sistare

On 5/9/2024 2:01 PM, Fabiano Rosas wrote:

Steve Sistare  writes:


Unlike cpr-reboot mode, cpr-exec mode cannot save volatile ram blocks in the
migration stream file and recreate them later, because the physical memory for
the blocks is pinned and registered for vfio.  Add an exec-mode blocker for
volatile ram blocks.

Also add a blocker for RAM_GUEST_MEMFD.  Preserving guest_memfd may be
sufficient for cpr-exec, but it has not been tested yet.

- Steve


extra text here


Will fix, thanks - steve


Signed-off-by: Steve Sistare 


Reviewed-by: Fabiano Rosas 





Re: [PATCH V1 09/26] migration: vmstate_register_named

2024-05-13 Thread Steven Sistare

On 5/9/2024 10:32 AM, Fabiano Rosas wrote:

Fabiano Rosas  writes:


Steve Sistare  writes:


Define vmstate_register_named which takes the instance name as its first
parameter, instead of generating the name from VMStateIf of the Object.
This will be needed to register objects that are not Objects.  Pass the
new name parameter to vmstate_register_with_alias_id.

Signed-off-by: Steve Sistare 


Reviewed-by: Fabiano Rosas 


Actually, can't we define a wrapper type just for this purpose? For
example, looking at dbus-vmstate.c:


One would need to provide a separate wrapper for each struct to be registered
as vmstate.  This patch set only has RAMBlock, but there are more coming in
my next patch sets.  vmstate_register_named avoids adding such boilerplate,
and makes it easier to add more cpr state in the future.

- Steve


static void dbus_vmstate_class_init(ObjectClass *oc, void *data)
{
...
 VMStateIfClass *vc = VMSTATE_IF_CLASS(oc);

 vc->get_id = dbus_vmstate_get_id;
...
}

static const TypeInfo dbus_vmstate_info = {
 .name = TYPE_DBUS_VMSTATE,
 .parent = TYPE_OBJECT,
 .instance_size = sizeof(DBusVMState),
 .instance_finalize = dbus_vmstate_finalize,
 .class_init = dbus_vmstate_class_init,
 .interfaces = (InterfaceInfo[]) {
 { TYPE_USER_CREATABLE },   // without this one
 { TYPE_VMSTATE_IF },
 { }
 }
};

static void register_types(void)
{
 type_register_static(_vmstate_info);
}
type_init(register_types);




Re: [PATCH V1 05/26] migration: precreate vmstate

2024-05-13 Thread Steven Sistare

On 5/7/2024 5:02 PM, Fabiano Rosas wrote:

Steve Sistare  writes:


Provide the VMStateDescription precreate field to mark objects that must
be loaded on the incoming side before devices have been created, because
they provide properties that will be needed at creation time.  They will
be saved to and loaded from their own QEMUFile, via


It's not obvious to me what the reason is to have a separate
QEMUFile. Could you expand on this?


The migration stream is read in the calling sequence at B below, but precreate
state is needed at A before chardev and memory backends are created.

main()
  qemu_init()
A:
qemu_create_early_backends()
qemu_create_late_backends()
migration_object_init()
qmp_x_exit_preconfig()
  qmp_migrate_incoming()

  qemu_default_main()
qemu_main_loop()
  fd_accept_incoming_migration()
migration_channel_process_incoming()
  migration_ioc_process_incoming()
migration_incoming_process()
  process_incoming_migration_co()
B:
qemu_loadvm_state()

precreate objects could be emitted first in the existing migration stream and
read at A, but this requires untangling numerous ordering dependencies amongst
migration_object_init, qemu_create_machine, configure_accelerators, monitor
init, and the main loop.

- Steve



Re: [PATCH V1 06/26] migration: precreate vmstate for exec

2024-05-13 Thread Steven Sistare

On 5/6/2024 7:34 PM, Fabiano Rosas wrote:

Steve Sistare  writes:


Provide migration_precreate_save for saving precreate vmstate across exec.
Create a memfd, save its value in the environment, and serialize state
to it.  Reverse the process in migration_precreate_load.

Signed-off-by: Steve Sistare 
---
  include/migration/misc.h |   5 ++
  migration/meson.build|   1 +
  migration/precreate.c| 139 +++
  3 files changed, 145 insertions(+)
  create mode 100644 migration/precreate.c

diff --git a/include/migration/misc.h b/include/migration/misc.h
index c9e200f..cf30351 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -56,6 +56,11 @@ AnnounceParameters *migrate_announce_params(void);
  
  void dump_vmstate_json_to_file(FILE *out_fp);
  
+/* migration/precreate.c */

+int migration_precreate_save(Error **errp);
+void migration_precreate_unsave(void);
+int migration_precreate_load(Error **errp);
+
  /* migration/migration.c */
  void migration_object_init(void);
  void migration_shutdown(void);
diff --git a/migration/meson.build b/migration/meson.build
index f76b1ba..50e7cb2 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -26,6 +26,7 @@ system_ss.add(files(
'ram-compress.c',
'options.c',
'postcopy-ram.c',
+  'precreate.c',
'savevm.c',
'socket.c',
'tls.c',
diff --git a/migration/precreate.c b/migration/precreate.c
new file mode 100644
index 000..0bf5e1f
--- /dev/null
+++ b/migration/precreate.c
@@ -0,0 +1,139 @@
+/*
+ * Copyright (c) 2022, 2024 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/cutils.h"
+#include "qemu/memfd.h"
+#include "qapi/error.h"
+#include "io/channel-file.h"
+#include "migration/misc.h"
+#include "migration/qemu-file.h"
+#include "migration/savevm.h"
+
+#define PRECREATE_STATE_NAME "QEMU_PRECREATE_STATE"
+
+static QEMUFile *qemu_file_new_fd_input(int fd, const char *name)
+{
+g_autoptr(QIOChannelFile) fioc = qio_channel_file_new_fd(fd);
+QIOChannel *ioc = QIO_CHANNEL(fioc);
+qio_channel_set_name(ioc, name);
+return qemu_file_new_input(ioc);
+}
+
+static QEMUFile *qemu_file_new_fd_output(int fd, const char *name)
+{
+g_autoptr(QIOChannelFile) fioc = qio_channel_file_new_fd(fd);
+QIOChannel *ioc = QIO_CHANNEL(fioc);
+qio_channel_set_name(ioc, name);
+return qemu_file_new_output(ioc);
+}
+
+static int memfd_create_named(const char *name, Error **errp)
+{
+int mfd;
+char val[16];
+
+mfd = memfd_create(name, 0);
+if (mfd < 0) {
+error_setg_errno(errp, errno, "memfd_create failed");
+return -1;
+}
+
+/* Remember mfd in environment for post-exec load */
+qemu_clear_cloexec(mfd);
+snprintf(val, sizeof(val), "%d", mfd);
+g_setenv(name, val, 1);
+
+return mfd;
+}
+
+static int memfd_find_named(const char *name, int *mfd_p, Error **errp)
+{
+const char *val = g_getenv(name);
+
+if (!val) {
+*mfd_p = -1;
+return 0;   /* No memfd was created, not an error */
+}
+g_unsetenv(name);
+if (qemu_strtoi(val, NULL, 10, mfd_p)) {
+error_setg(errp, "Bad %s env value %s", PRECREATE_STATE_NAME, val);
+return -1;
+}
+lseek(*mfd_p, 0, SEEK_SET);
+return 0;
+}
+
+static void memfd_delete_named(const char *name)
+{
+int mfd;
+const char *val = g_getenv(name);
+
+if (val) {
+g_unsetenv(name);
+if (!qemu_strtoi(val, NULL, 10, )) {
+close(mfd);
+}
+}
+}
+
+static QEMUFile *qemu_file_new_memfd_output(const char *name, Error **errp)
+{
+int mfd = memfd_create_named(name, errp);
+
+if (mfd < 0) {
+return NULL;
+}
+
+return qemu_file_new_fd_output(mfd, name);
+}
+
+static QEMUFile *qemu_file_new_memfd_input(const char *name, Error **errp)
+{
+int ret, mfd;
+
+ret = memfd_find_named(name, , errp);
+if (ret || mfd < 0) {
+return NULL;
+}
+
+return qemu_file_new_fd_input(mfd, name);
+}
+
+int migration_precreate_save(Error **errp)
+{
+QEMUFile *f = qemu_file_new_memfd_output(PRECREATE_STATE_NAME, errp);
+
+if (!f) {
+return -1;
+} else if (qemu_savevm_precreate_save(f, errp)) {
+memfd_delete_named(PRECREATE_STATE_NAME);
+return -1;
+} else {
+/* Do not close f, as mfd must remain open. */
+return 0;
+}
+}
+
+void migration_precreate_unsave(void)
+{
+memfd_delete_named(PRECREATE_STATE_NAME);
+}
+
+int migration_precreate_load(Error **errp)
+{
+int ret;
+QEMUFile *f = qemu_file_new_memfd_input(PRECREATE_STATE_NAME, errp);


Can we avoid the QEMUFile? I don't see it being exported from this file.


It is not exported, but within this file, it is the basis for all read and
write operations, via the existing functions 

Re: [PATCH V1 03/26] migration: SAVEVM_FOREACH

2024-05-13 Thread Steven Sistare

On 5/6/2024 7:17 PM, Fabiano Rosas wrote:

Steve Sistare  writes:


Define an abstraction SAVEVM_FOREACH to loop over all savevm state
handlers, and replace QTAILQ_FOREACH.  Define variants for ALL so
we can loop over all handlers vs a subset of handlers in a subsequent
patch, but at this time there is no distinction between the two.
No functional change.

Signed-off-by: Steve Sistare 
---
  migration/savevm.c | 55 +++---
  1 file changed, 32 insertions(+), 23 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 4509482..6829ba3 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -237,6 +237,15 @@ static SaveState savevm_state = {
  .global_section_id = 0,
  };
  
+#define SAVEVM_FOREACH(se, entry)\

+QTAILQ_FOREACH(se, _state.handlers, entry)\
+
+#define SAVEVM_FOREACH_ALL(se, entry)\
+QTAILQ_FOREACH(se, _state.handlers, entry)


This feels worse than SAVEVM_FOREACH_NOT_PRECREATED. We'll have to keep
coming back to the definition to figure out which FOREACH is the real
deal.


I take your point, but the majority of the loops do not care about precreated
objects, so it seems backwards to make them more verbose with 
SAVEVM_FOREACH_NOT_PRECREATE.  I can go either way, but we need

Peter's opinion also.


+
+#define SAVEVM_FOREACH_SAFE_ALL(se, entry, new_se)   \
+QTAILQ_FOREACH_SAFE(se, _state.handlers, entry, new_se)
+
  static SaveStateEntry *find_se(const char *idstr, uint32_t instance_id);
  
  static bool should_validate_capability(int capability)

@@ -674,7 +683,7 @@ static uint32_t calculate_new_instance_id(const char *idstr)
  SaveStateEntry *se;
  uint32_t instance_id = 0;
  
-QTAILQ_FOREACH(se, _state.handlers, entry) {

+SAVEVM_FOREACH_ALL(se, entry) {


In this patch we can't have both instances...


  if (strcmp(idstr, se->idstr) == 0
  && instance_id <= se->instance_id) {
  instance_id = se->instance_id + 1;
@@ -690,7 +699,7 @@ static int calculate_compat_instance_id(const char *idstr)
  SaveStateEntry *se;
  int instance_id = 0;
  
-QTAILQ_FOREACH(se, _state.handlers, entry) {

+SAVEVM_FOREACH(se, entry) {


...otherwise one of the two changes will go undocumented because the
actual reason for it will only be described in the next patch.


Sure, I'll move this to the precreate patch.

- Steve


  if (!se->compat) {
  continue;
  }
@@ -816,7 +825,7 @@ void unregister_savevm(VMStateIf *obj, const char *idstr, 
void *opaque)
  }
  pstrcat(id, sizeof(id), idstr);
  
-QTAILQ_FOREACH_SAFE(se, _state.handlers, entry, new_se) {

+SAVEVM_FOREACH_SAFE_ALL(se, entry, new_se) {
  if (strcmp(se->idstr, id) == 0 && se->opaque == opaque) {
  savevm_state_handler_remove(se);
  g_free(se->compat);
@@ -939,7 +948,7 @@ void vmstate_unregister(VMStateIf *obj, const 
VMStateDescription *vmsd,
  {
  SaveStateEntry *se, *new_se;
  
-QTAILQ_FOREACH_SAFE(se, _state.handlers, entry, new_se) {

+SAVEVM_FOREACH_SAFE_ALL(se, entry, new_se) {
  if (se->vmsd == vmsd && se->opaque == opaque) {
  savevm_state_handler_remove(se);
  g_free(se->compat);
@@ -1223,7 +1232,7 @@ bool qemu_savevm_state_blocked(Error **errp)
  {
  SaveStateEntry *se;
  
-QTAILQ_FOREACH(se, _state.handlers, entry) {

+SAVEVM_FOREACH(se, entry) {
  if (se->vmsd && se->vmsd->unmigratable) {
  error_setg(errp, "State blocked by non-migratable device '%s'",
 se->idstr);
@@ -1237,7 +1246,7 @@ void qemu_savevm_non_migratable_list(strList **reasons)
  {
  SaveStateEntry *se;
  
-QTAILQ_FOREACH(se, _state.handlers, entry) {

+SAVEVM_FOREACH(se, entry) {
  if (se->vmsd && se->vmsd->unmigratable) {
  QAPI_LIST_PREPEND(*reasons,
g_strdup_printf("non-migratable device: %s",
@@ -1276,7 +1285,7 @@ bool qemu_savevm_state_guest_unplug_pending(void)
  {
  SaveStateEntry *se;
  
-QTAILQ_FOREACH(se, _state.handlers, entry) {

+SAVEVM_FOREACH(se, entry) {
  if (se->vmsd && se->vmsd->dev_unplug_pending &&
  se->vmsd->dev_unplug_pending(se->opaque)) {
  return true;
@@ -1291,7 +1300,7 @@ int qemu_savevm_state_prepare(Error **errp)
  SaveStateEntry *se;
  int ret;
  
-QTAILQ_FOREACH(se, _state.handlers, entry) {

+SAVEVM_FOREACH(se, entry) {
  if (!se->ops || !se->ops->save_prepare) {
  continue;
  }
@@ -1321,7 +1330,7 @@ int qemu_savevm_state_setup(QEMUFile *f, Error **errp)
  json_writer_start_array(ms->vmdesc, "devices");
  
  trace_savevm_state_setup();

-QTAILQ_FOREACH(se, _state.handlers, entry) {
+SAVEVM_FOREACH(se, entry) {
  if (se->vmsd && se->vmsd->early_setup) {
   

cpr-exec doc (was Re: [PATCH V1 00/26] Live update: cpr-exec)

2024-05-02 Thread Steven Sistare

On 4/29/2024 11:55 AM, Steve Sistare wrote:

This patch series adds the live migration cpr-exec mode.


Here is the text I plan to add to docs/devel/migration/CPR.rst.  It is
premature for me to submit this as a patch, because it includes all
the functionality I plan to add in this and future series, but it may
help you while reviewing this series.

- Steve



cpr-exec mode
---

In this mode, QEMU stops the VM, writes VM state to the migration
URI, and directly exec's a new version of QEMU on the same host,
replacing the original process while retaining its PID.  Guest RAM is
preserved in place, albeit with new virtual addresses.  The user
completes the migration by specifying the ``-incoming`` option, and
by issuing the ``migrate-incoming`` command if necessary; see details
below.

This mode supports vfio devices by preserving device descriptors and
hence kernel state across the exec, even for devices that do not
support live migration, and preserves tap and vhost descriptors.

cpr-exec also preserves descriptors for a subset of chardevs,
including socket, file, parallel, pipe, serial, pty, stdio, and null.
chardevs that support cpr-exec have the QEMU_CHAR_FEATURE_CPR set in
the Chardev object.  The client side of a preserved chardev sees no
loss of connectivity during cpr-exec.  More chardevs could be
preserved with additional developement.

All chardevs have a ``reopen-on-cpr`` option which causes the chardev
to be closed and reopened during cpr-exec.  This can be set to allow
cpr-exec when the configuration includes a chardev (such as vc) that
does not have QEMU_CHAR_FEATURE_CPR.

Because the old and new QEMU instances are not active concurrently,
the URI cannot be a type that streams data from one instance to the
other.

Usage
^

Arguments for the new QEMU process are taken from the
@cpr-exec-args parameter.  The first argument should be the
path of a new QEMU binary, or a prefix command that exec's the
new QEMU binary, and the arguments should include the ''-incoming''
option.

Memory backend objects must have the ``share=on`` attribute, and
must be mmap'able in the new QEMU process.  For example,
memory-backend-file is acceptable, but memory-backend-ram is
not.

The VM must be started with the ``-machine memfd-alloc=on``
option.  This causes implicit RAM blocks (those not explicitly
described by a memory-backend object) to be allocated by
mmap'ing a memfd.  Examples include VGA, ROM, and even guest
RAM when it is specified without without reference to a
memory-backend object.

Add the ``-only-migratable-modes cpr-exec`` option to guarantee that
the configuration supports cpr-exec.  QEMU will exit at start time
if not.

Outgoing:
  * Set the migration mode parameter to ``cpr-exec``.
  * Set the ``cpr-exec-args`` parameter.
  * Issue the ``migrate`` command.  It is recommended the the URI be
a ``file`` type, but one can use other types such as ``exec``,
provided the command captures all the data from the outgoing side,
and provides all the data to the incoming side.

Incoming:
  * You do not need to explicitly start new QEMU.  It is started as
a side effect of the migrate command above.
  * If the VM was running when the outgoing ``migrate`` command was
issued, then QEMU automatically resumes VM execution.

Example 1: incoming URI
^^^

In these examples, we simply restart the same version of QEMU, but in
a real scenario one would set a new QEMU binary path in cpr-exec-args.

::

  # qemu-kvm -monitor stdio
  -object 
memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on -m 4G

  -machine memfd-alloc=on
  ...

  QEMU 9.1.50 monitor - type 'help' for more information
  (qemu) info status
  VM status: running
  (qemu) migrate_set_parameter mode cpr-exec
  (qemu) migrate_set_parameter cpr-exec-args qemu-kvm ... -incoming 
file:vm.state

  (qemu) migrate -d file:vm.state
  (qemu) QEMU 9.1.50 monitor - type 'help' for more information
  (qemu) info status
  VM status: running

Example 2: incoming defer
^
::

  # qemu-kvm -monitor stdio
  -object 
memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on -m 4G

  -machine memfd-alloc=on
  ...

  QEMU 9.1.50 monitor - type 'help' for more information
  (qemu) info status
  VM status: running
  (qemu) migrate_set_parameter mode cpr-exec
  (qemu) migrate_set_parameter cpr-exec-args qemu-kvm ... -incoming defer
  (qemu) migrate -d file:vm.state
  (qemu) QEMU 9.1.50 monitor - type 'help' for more information
  (qemu) info status
  status: paused (inmigrate)
  (qemu) migrate_incoming file:vm.state
  (qemu) info status
  VM status: running


Caveats
^^^

cpr-exec mode may not be used with postcopy, background-snapshot,
or COLO.

cpr-exec mode requires permission to use the exec system call, which
is denied by certain sandbox options, such as spawn.  Use finer
grained 

Re: [PATCH V1 20/26] migration: cpr-exec mode

2024-05-02 Thread Steven Sistare

On 5/2/2024 8:23 AM, Markus Armbruster wrote:

Steve Sistare  writes:


Add the cpr-exec migration mode.  Usage:
   qemu-system-$arch -machine memfd-alloc=on ...
   migrate_set_parameter mode cpr-exec
   migrate_set_parameter cpr-exec-args \
   ... -incoming 
   migrate -d 

The migrate command stops the VM, saves state to the URI,
directly exec's a new version of QEMU on the same host,
replacing the original process while retaining its PID, and
loads state from the URI.  Guest RAM is preserved in place,
albeit with new virtual addresses.

Arguments for the new QEMU process are taken from the
@cpr-exec-args parameter.  The first argument should be the
path of a new QEMU binary, or a prefix command that exec's the
new QEMU binary.

Because old QEMU terminates when new QEMU starts, one cannot
stream data between the two, so the URI must be a type, such as
a file, that reads all data before old QEMU exits.

Memory backend objects must have the share=on attribute, and
must be mmap'able in the new QEMU process.  For example,
memory-backend-file is acceptable, but memory-backend-ram is
not.

The VM must be started with the '-machine memfd-alloc=on'
option.  This causes implicit ram blocks (those not explicitly
described by a memory-backend object) to be allocated by
mmap'ing a memfd.  Examples include VGA, ROM, and even guest
RAM when it is specified without a memory-backend object.

The implementation saves precreate vmstate at the end of normal
migration in migrate_fd_cleanup, and tells the main loop to call
cpr_exec.  Incoming qemu loads preceate state early, before objects
are created.  The memfds are kept open across exec by clearing the
close-on-exec flag, their values are saved in precreate vmstate,
and they are mmap'd in new qemu.

Note that the memfd-alloc option is not related to memory-backend-memfd.
Later patches add support for memory-backend-memfd, and for additional
devices, including vfio, chardev, and more.

Signed-off-by: Steve Sistare 


[...]


diff --git a/qapi/migration.json b/qapi/migration.json
index 49710e7..7c5f45f 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -665,9 +665,37 @@
  # or COLO.
  #
  # (since 8.2)
+#
+# @cpr-exec: The migrate command stops the VM, saves state to the URI,
+# directly exec's a new version of QEMU on the same host,
+# replacing the original process while retaining its PID, and
+# loads state from the URI.  Guest RAM is preserved in place,
+# albeit with new virtual addresses.


Do you mean the virtual addresses of guest RAM may differ betwen old and
new QEMU process?


The VA at which a guest RAM segment is mapped in the QEMU process
changes.  The end user would not notice or care, so I'll drop that
detail here.


+#
+# Arguments for the new QEMU process are taken from the
+# @cpr-exec-args parameter.  The first argument should be the
+# path of a new QEMU binary, or a prefix command that exec's the
+# new QEMU binary.


What's a "prefix command"?  A wrapper script, perhaps?


A prefix command is any command of the form:
  command1 command1-args command2 command2-args
where command1 performs some set up before exec'ing command2.
However, I will drop the word "prefix", it adds no meaning here.


+#
+# Because old QEMU terminates when new QEMU starts, one cannot
+# stream data between the two, so the URI must be a type, such as
+# a file, that reads all data before old QEMU exits.


What happens when you specify a URI that doesn't?


Old QEMU will quietly block indefinitely writing to the URI.


+#
+# Memory backend objects must have the share=on attribute, and
+# must be mmap'able in the new QEMU process.  For example,
+# memory-backend-file is acceptable, but memory-backend-ram is
+# not.
+#
+# The VM must be started with the '-machine memfd-alloc=on'


What happens when you don't?


If '-only-migratable-modes cpr-exec' is specified, then QEMU will fail
to start, and print a clear error message.

Otherwise, a blocker is registered and any attempt to cpr-exec will fail
with a clear error message.

- Steve


+# option.  This causes implicit ram blocks -- those not explicitly
+# described by a memory-backend object -- to be allocated by
+# mmap'ing a memfd.  Examples include VGA, ROM, and even guest
+# RAM when it is specified without a memory-backend object.
+#
+# (since 9.1)
  ##
  { 'enum': 'MigMode',
-  'data': [ 'normal', 'cpr-reboot' ] }
+  'data': [ 'normal', 'cpr-reboot', 'cpr-exec' ] }
  
  ##

  # @ZeroPageDetection:


[...]





Re: [PATCH V1 18/26] migration: cpr-exec-args parameter

2024-05-02 Thread Steven Sistare

On 5/2/2024 8:23 AM, Markus Armbruster wrote:

Steve Sistare  writes:


Create the cpr-exec-args migration parameter, defined as a list of
strings.  It will be used for cpr-exec migration mode in a subsequent
patch.

No functional change, except that cpr-exec-args is shown by the
'info migrate' command.

Signed-off-by: Steve Sistare 


[...]


diff --git a/qapi/migration.json b/qapi/migration.json
index 8c65b90..49710e7 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -914,6 +914,9 @@
  # See description in @ZeroPageDetection.  Default is 'multifd'.
  # (since 9.0)
  #
+# @cpr-exec-args: Arguments passed to new QEMU for @cpr-exec mode.
+#See @cpr-exec for details.  (Since 9.1)
+#


You mean migration mode @cpr-exec, don't you?

If yes, dangling reference until PATCH 20 adds it.  Okay, but worth a
mention in the commit message.

Suggest "See MigMode @cpr-exec for details."


Yes to all.  Will update as you suggest.

- Steve


  # Features:
  #
  # @deprecated: Member @block-incremental is deprecated.  Use
@@ -948,7 +951,8 @@
 { 'name': 'x-vcpu-dirty-limit-period', 'features': ['unstable'] },
 'vcpu-dirty-limit',
 'mode',
-   'zero-page-detection'] }
+   'zero-page-detection',
+   'cpr-exec-args'] }
  
  ##

  # @MigrateSetParameters:
@@ -1122,6 +1126,9 @@
  # See description in @ZeroPageDetection.  Default is 'multifd'.
  # (since 9.0)
  #
+# @cpr-exec-args: Arguments passed to new QEMU for @cpr-exec mode.
+#See @cpr-exec for details.  (Since 9.1)
+#
  # Features:
  #
  # @deprecated: Member @block-incremental is deprecated.  Use
@@ -1176,7 +1183,8 @@
  'features': [ 'unstable' ] },
  '*vcpu-dirty-limit': 'uint64',
  '*mode': 'MigMode',
-'*zero-page-detection': 'ZeroPageDetection'} }
+'*zero-page-detection': 'ZeroPageDetection',
+'*cpr-exec-args': [ 'str' ]} }
  
  ##

  # @migrate-set-parameters:
@@ -1354,6 +1362,9 @@
  # See description in @ZeroPageDetection.  Default is 'multifd'.
  # (since 9.0)
  #
+# @cpr-exec-args: Arguments passed to new QEMU for @cpr-exec mode.
+#See @cpr-exec for details.  (Since 9.1)
+#
  # Features:
  #
  # @deprecated: Member @block-incremental is deprecated.  Use
@@ -1405,7 +1416,8 @@
  'features': [ 'unstable' ] },
  '*vcpu-dirty-limit': 'uint64',
  '*mode': 'MigMode',
-'*zero-page-detection': 'ZeroPageDetection'} }
+'*zero-page-detection': 'ZeroPageDetection',
+'*cpr-exec-args': [ 'str' ]} }
  
  ##

  # @query-migrate-parameters:






Re: [PATCH V4 10/14] migration: stop vm for cpr

2024-03-13 Thread Steven Sistare

On 2/29/2024 8:28 PM, Peter Xu wrote:

On Thu, Feb 29, 2024 at 10:21:14AM -0500, Steven Sistare wrote:

On 2/25/2024 9:08 PM, Peter Xu wrote:

On Thu, Feb 22, 2024 at 09:28:36AM -0800, Steve Sistare wrote:

When migration for cpr is initiated, stop the vm and set state
RUN_STATE_FINISH_MIGRATE before ram is saved.  This eliminates the
possibility of ram and device state being out of sync, and guarantees
that a guest in the suspended state remains suspended, because qmp_cont
rejects a cont command in the RUN_STATE_FINISH_MIGRATE state.

Signed-off-by: Steve Sistare 


Reviewed-by: Peter Xu 

cpr-reboot mode keeps changing behavior.

Could we declare it "experimental" until it's solid?  Maybe a patch to
document this?

Normally IMHO we shouldn't merge a feature if it's not complete, however
cpr-reboot is so special that the mode itself is already merged in 8.2
before I started to merge patches, and it keeps changing things.  I don't
know what else we can do here besides declaring it experimental and not
declare it a stable feature.


Hi Peter, the planned/committed functionality for cpr-reboot changed only once, 
in:
 migration: stop vm for cpr

Suspension to support vfio is an enhancement which adds to the basic 
functionality,
it does not change it.  This was planned all along, but submitted as a separate


If VFIO used to migrate without suspension and now it won't, it's a
behavior change?


VFIO could not cpr-reboot migrate without suspension.  The existing vfio 
migration blockers applied to all modes:

  Error: :3a:10.0: VFIO migration is not supported in kernel

Now, with suspension, it will.  An addition, not a change.


series to manage complexity, as I outlined in my qemu community presentation,
which I emailed you at the time.

Other "changes" that arose during review were just clarifications and 
explanations.

So, I don't think cpr-reboot deserves to be condemned to experimental limbo.


IMHO it's not about a feature being condemned, it's about a kindful
heads-up to the users that one needs to take risk on using it until it
becomes stable, it also makes developers easier because of no limitation on
behavior change.  If all the changes are landing, then there's no need for
such a patch.

If so, please propose the planned complete document patch. I had a feeling
that cpr is still not fully understood by even many developers on the list.
It'll be great that such document will contain all the details one needs to
know on the feature, etc. meaning of the name cpr-reboot (what is "cpr"),
when to use it, how to use it, how it differs from "normal" mode
(etc. lifted limitations on some devices that used to block migration?),
what is enforced (vm stop, suspension, etc.) and what is optionally offered
(VFIO, shared-mem, etc.), the expected behaviors, etc.

When send it, please copy relevant people (Alex & Cedric for VFIO, while
Markus could also be a good candidate considering the QMP involvement).


Submitted.

- Steve



Re: [PATCH V2 00/11] privatize migration.h

2024-03-11 Thread Steven Sistare

On 3/11/2024 4:28 PM, Peter Xu wrote:

On Mon, Mar 11, 2024 at 04:24:14PM -0400, Steven Sistare wrote:

On 3/11/2024 3:45 PM, Steven Sistare wrote:

On 3/11/2024 3:30 PM, Peter Xu wrote:

Steve,

On Mon, Mar 11, 2024 at 10:48:47AM -0700, Steve Sistare wrote:

Changes in V2:
    * rebase to migration-next, add RB


Not apply even to master branch.  Note that there're >=1 PULLs sent and
merged since my last reply..  Perhaps you rebased to the "old" next?


I pulled from branch migration-next in https://gitlab.com/peterx/qemu a
few hours ago, but I must have screwed up somewhere.  I'll figure it out
and post a V4.


My pull was a fiew hours old, but my patches still apply cleanly to the
most recent tip:
   a1bb5dd169f4 ("migration: Fix format in error message")

I can sent that as V3, but ...
Note that you must apply "migration: export fewer options" before
"privatize migration.h".  If that does not help, I will send V3.


Ouch, I forgot that dependency... Sorry.

Yeah it works now.  No need to resend for now.


Great! - steve



Re: [PATCH V2 00/11] privatize migration.h

2024-03-11 Thread Steven Sistare

On 3/11/2024 3:45 PM, Steven Sistare wrote:

On 3/11/2024 3:30 PM, Peter Xu wrote:

Steve,

On Mon, Mar 11, 2024 at 10:48:47AM -0700, Steve Sistare wrote:

Changes in V2:
   * rebase to migration-next, add RB


Not apply even to master branch.  Note that there're >=1 PULLs sent and
merged since my last reply..  Perhaps you rebased to the "old" next?


I pulled from branch migration-next in https://gitlab.com/peterx/qemu a 
few hours ago, but I must have screwed up somewhere.  I'll figure it out

and post a V4.


My pull was a fiew hours old, but my patches still apply cleanly to the
most recent tip:
  a1bb5dd169f4 ("migration: Fix format in error message")

I can sent that as V3, but ...
Note that you must apply "migration: export fewer options" before
"privatize migration.h".  If that does not help, I will send V3.

- Steve



Re: [PATCH V2 00/11] privatize migration.h

2024-03-11 Thread Steven Sistare

On 3/11/2024 3:30 PM, Peter Xu wrote:

Steve,

On Mon, Mar 11, 2024 at 10:48:47AM -0700, Steve Sistare wrote:

Changes in V2:
   * rebase to migration-next, add RB


Not apply even to master branch.  Note that there're >=1 PULLs sent and
merged since my last reply..  Perhaps you rebased to the "old" next?


I pulled from branch migration-next in https://gitlab.com/peterx/qemu a 
few hours ago, but I must have screwed up somewhere.  I'll figure it out

and post a V4.

- Steve



Re: [PATCH V2] migration: export fewer options

2024-03-11 Thread Steven Sistare

On 3/1/2024 2:47 AM, Peter Xu wrote:

On Thu, Feb 29, 2024 at 07:03:36AM +0100, Markus Armbruster wrote:

Steven Sistare  writes:


Just a reminder, after our further discussion in the V1 thread,
this patch is still what I propose, no updates needed.

Markus, I think Peter is looking for your blessing on the new
file name: include/migration/client-options.h.


Not my preference, but no objection.


There's yet one alternative, which is to put these exported option
functions into misc.h directly.  After all that's not so much, and misc.h
already hold random stuff from elsewhere.

Steve, would you repost this patch (with/without my above comment taken)
along with your other series with a rebase to migration-next?  It doesn't
apply there.  Re the other series: one nitpick comment on the last patch,
where you may consider splitting the removal of the unused 2 functions into
a standalone patch.  Other than that it looks good to me.

https://gitlab.com/peterx/qemu/-/tree/migration-next


Both are rebased and reposted, thanks - steve



Re: [PATCH V4 10/14] migration: stop vm for cpr

2024-02-29 Thread Steven Sistare
On 2/25/2024 9:08 PM, Peter Xu wrote:
> On Thu, Feb 22, 2024 at 09:28:36AM -0800, Steve Sistare wrote:
>> When migration for cpr is initiated, stop the vm and set state
>> RUN_STATE_FINISH_MIGRATE before ram is saved.  This eliminates the
>> possibility of ram and device state being out of sync, and guarantees
>> that a guest in the suspended state remains suspended, because qmp_cont
>> rejects a cont command in the RUN_STATE_FINISH_MIGRATE state.
>>
>> Signed-off-by: Steve Sistare 
> 
> Reviewed-by: Peter Xu 
> 
> cpr-reboot mode keeps changing behavior.
> 
> Could we declare it "experimental" until it's solid?  Maybe a patch to
> document this?
> 
> Normally IMHO we shouldn't merge a feature if it's not complete, however
> cpr-reboot is so special that the mode itself is already merged in 8.2
> before I started to merge patches, and it keeps changing things.  I don't
> know what else we can do here besides declaring it experimental and not
> declare it a stable feature.

Hi Peter, the planned/committed functionality for cpr-reboot changed only once, 
in:
migration: stop vm for cpr

Suspension to support vfio is an enhancement which adds to the basic 
functionality,
it does not change it.  This was planned all along, but submitted as a separate 
series to manage complexity, as I outlined in my qemu community presentation,
which I emailed you at the time.

Other "changes" that arose during review were just clarifications and 
explanations.

So, I don't think cpr-reboot deserves to be condemned to experimental limbo.

- Steve




Re: [PATCH V4 14/14] migration: options incompatible with cpr

2024-02-29 Thread Steven Sistare
On 2/29/2024 12:40 AM, Peter Xu wrote:
> On Thu, Feb 29, 2024 at 06:31:26AM +0100, Markus Armbruster wrote:
>> Hmm, perhaps Peter can still squash in the corrections before posting
>> his PR.  Peter?
> 
> The PR was sent yesterday, it's already in PeterM's -staging.  I worry it's
> a bit late to call a stop now.
> 
> https://lore.kernel.org/qemu-devel/20240228051315.400759-23-pet...@redhat.com
> 
> Steve, could you provide a standalone patch for the update?  Then I'll do
> the best accordingly (e.g. if the PR failed to apply I'll squash when
> resend v2; or I'll pick it up for the next).

I sent the patch.  I also re-wrapped all cpr-reboot paragraphs to 70 columns
and addressed Markus' other comments on "migration: update cpr-reboot 
description".

Peter, if you squash it, the last sentence "cpr-reboot may not be used ..." 
squashes into
   migration: options incompatible with cpr
and everything else squashes into
   migration: update cpr-reboot description

- Steve



Re: [PATCH] migration: re-format cpr-reboot documentation

2024-02-29 Thread Steven Sistare
Please ignore this, I will send a V2 that incorporates additional comments from
Markus that I missed in my inbox.

- Steve

On 2/29/2024 9:15 AM, Steve Sistare wrote:
> Re-wrap the cpr-reboot documentation to 70 columns, use '@' for
> cpr-reboot references, and capitalize COLO.
> 
> Suggested-by: Markus Armbruster 
> Signed-off-by: Steve Sistare 
> ---
>  qapi/migration.json | 36 +++-
>  1 file changed, 19 insertions(+), 17 deletions(-)
> 
> diff --git a/qapi/migration.json b/qapi/migration.json
> index c6bfe2e..b69f62a 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -636,28 +636,30 @@
>  #
>  # @normal: the original form of migration. (since 8.2)
>  #
> -# @cpr-reboot: The migrate command stops the VM and saves state to the URI.
> -# After quitting qemu, the user resumes by running qemu -incoming.
> +# @cpr-reboot: The migrate command stops the VM and saves state to
> +# the URI.  After quitting qemu, the user resumes by running
> +# qemu -incoming.
>  #
> -# This mode allows the user to quit qemu, and restart an updated version
> -# of qemu.  The user may even update and reboot the OS before restarting,
> -# as long as the URI persists across a reboot.
> +# This mode allows the user to quit qemu and restart an updated
> +# version of qemu.  The user may even update and reboot the OS
> +# before restarting, as long as the URI persists across a reboot.
>  #
> -# Unlike normal mode, the use of certain local storage options does not
> -# block the migration, but the user must not modify guest block devices
> -# between the quit and restart.
> +# Unlike normal mode, the use of certain local storage options
> +# does not block the migration, but the user must not modify guest
> +# block devices between the quit and restart.
>  #
> -# This mode supports vfio devices provided the user first puts the guest
> -# in the suspended runstate, such as by issuing guest-suspend-ram to the
> -# qemu guest agent.
> +# This mode supports vfio devices provided the user first puts
> +# the guest in the suspended runstate, such as by issuing
> +# guest-suspend-ram to the qemu guest agent.
>  #
> -# Best performance is achieved when the memory backend is shared and the
> -# @x-ignore-shared migration capability is set, but this is not required.
> -# Further, if the user reboots before restarting such a configuration, 
> the
> -# shared backend must be be non-volatile across reboot, such as by 
> backing
> -# it with a dax device.
> +# Best performance is achieved when the memory backend is shared
> +# and the @x-ignore-shared migration capability is set, but this
> +# is not required.  Further, if the user reboots before restarting
> +# such a configuration, the shared backend must be be non-volatile
> +# across reboot, such as by backing it with a dax device.
>  #
> -# cpr-reboot may not be used with postcopy, colo, or background-snapshot.
> +# @cpr-reboot may not be used with postcopy, background-snapshot,
> +# or COLO.
>  #
>  # (since 8.2)
>  ##



Re: [PATCH V4 11/14] vfio: register container for cpr

2024-02-29 Thread Steven Sistare



On 2/29/2024 3:35 AM, Cédric Le Goater wrote:
> Hello Steve,
> 
> On 2/22/24 18:28, Steve Sistare wrote:
>> Define entry points to perform per-container cpr-specific initialization
>> and teardown.
>>
>> Signed-off-by: Steve Sistare 
>> ---
>>   hw/vfio/container.c   | 11 ++-
>>   hw/vfio/cpr.c | 19 +++
>>   hw/vfio/iommufd.c |  6 ++
>>   hw/vfio/meson.build   |  1 +
>>   include/hw/vfio/vfio-common.h |  3 +++
>>   5 files changed, 39 insertions(+), 1 deletion(-)
>>   create mode 100644 hw/vfio/cpr.c
>>
>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>> index bd25b9f..096d77e 100644
>> --- a/hw/vfio/container.c
>> +++ b/hw/vfio/container.c
>> @@ -621,10 +621,15 @@ static int vfio_connect_container(VFIOGroup *group, 
>> AddressSpace *as,
>>   goto free_container_exit;
>>   }
>>   +    ret = vfio_cpr_register_container(bcontainer, errp);
>> +    if (ret) {
>> +    goto free_container_exit;
>> +    }
>> +
>>   ret = vfio_ram_block_discard_disable(container, true);
>>   if (ret) {
>>   error_setg_errno(errp, -ret, "Cannot set discarding of RAM 
>> broken");
>> -    goto free_container_exit;
>> +    goto unregister_container_exit;
>>   }
>>     assert(bcontainer->ops->setup);
>> @@ -667,6 +672,9 @@ listener_release_exit:
>>   enable_discards_exit:
>>   vfio_ram_block_discard_disable(container, false);
>>   +unregister_container_exit:
>> +    vfio_cpr_unregister_container(bcontainer);
>> +
>>   free_container_exit:
>>   g_free(container);
>>   @@ -710,6 +718,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
>>   vfio_container_destroy(bcontainer);
>>     trace_vfio_disconnect_container(container->fd);
>> +    vfio_cpr_unregister_container(bcontainer);
>>   close(container->fd);
>>   g_free(container);
>>   diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
>> new file mode 100644
>> index 000..3bede54
>> --- /dev/null
>> +++ b/hw/vfio/cpr.c
>> @@ -0,0 +1,19 @@
>> +/*
>> + * Copyright (c) 2021-2024 Oracle and/or its affiliates.
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "hw/vfio/vfio-common.h"
>> +#include "qapi/error.h"
>> +
>> +int vfio_cpr_register_container(VFIOContainerBase *bcontainer, Error **errp)
>> +{
>> +    return 0;
>> +}
>> +
>> +void vfio_cpr_unregister_container(VFIOContainerBase *bcontainer)
>> +{
>> +}
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index 9bfddc1..e1be224 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -411,6 +411,11 @@ found_container:
>>   goto err_listener_register;
>>   }
>>   +    ret = vfio_cpr_register_container(bcontainer, errp);
>> +    if (ret) {
>> +    goto err_listener_register;
>> +    }
>> +
>>   /*
>>    * TODO: examine RAM_BLOCK_DISCARD stuff, should we do group level
>>    * for discarding incompatibility check as well?
>> @@ -461,6 +466,7 @@ static void iommufd_cdev_detach(VFIODevice *vbasedev)
>>   iommufd_cdev_ram_block_discard_disable(false);
>>   }
>>   +    vfio_cpr_unregister_container(bcontainer);
>>   iommufd_cdev_detach_container(vbasedev, container);
>>   iommufd_cdev_container_destroy(container);
>>   vfio_put_address_space(space);
>> diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
>> index bb98493..bba776f 100644
>> --- a/hw/vfio/meson.build
>> +++ b/hw/vfio/meson.build
>> @@ -5,6 +5,7 @@ vfio_ss.add(files(
>>     'container-base.c',
>>     'container.c',
>>     'migration.c',
>> +  'cpr.c',
>>   ))
>>   vfio_ss.add(when: 'CONFIG_PSERIES', if_true: files('spapr.c'))
>>   vfio_ss.add(when: 'CONFIG_IOMMUFD', if_true: files(
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index 4a6c262..b9da6c0 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -205,6 +205,9 @@ void vfio_detach_device(VFIODevice *vbasedev);
>>   int vfio_kvm_device_add_fd(int fd, Error **errp);
>>   int vfio_kvm_device_del_fd(int fd, Error **errp);
>>   +int vfio_cpr_register_container(VFIOContainerBase *bcontainer, Error 
>> **errp);
> 
> Should we return bool since we have an errp ? the returned value
> is not an errno AFAICT.
> 
> Anyhow,
> 
> Reviewed-by: Cédric Le Goater 

Hi Cedric,  vfio_cpr_register_container sometimes returns the value returned 
from 
migrate_add_blocker_modes, which is an errno.

Thanks for reviewing these!

- Steve
 
>> +void vfio_cpr_unregister_container(VFIOContainerBase *bcontainer);
>> +
>>   extern const MemoryRegionOps vfio_region_ops;
>>   typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
>>   typedef QLIST_HEAD(VFIODeviceList, VFIODevice) VFIODeviceList;
> 



Re: [PATCH V2] migration: export fewer options

2024-02-28 Thread Steven Sistare
Just a reminder, after our further discussion in the V1 thread, 
this patch is still what I propose, no updates needed.

Markus, I think Peter is looking for your blessing on the new
file name: include/migration/client-options.h.

- Steve

On 2/26/2024 5:16 PM, Steve Sistare wrote:
> A small number of migration options are accessed by migration clients,
> but to see them clients must include all of options.h, which is mostly
> for migration core code.  migrate_mode() in particular will be needed by
> multiple clients.
> 
> Refactor the option declarations so clients can see the necessary few via
> misc.h, which already exports a portion of the client API.
> 
> Signed-off-by: Steve Sistare 
> ---
> Changes in V2:
>   * renamed options-pub.h to client-options.h
> ---
> ---
>  hw/vfio/migration.c|  1 -
>  hw/virtio/virtio-balloon.c |  1 -
>  include/migration/client-options.h | 24 
>  include/migration/misc.h   |  1 +
>  migration/options.h|  6 +-
>  system/dirtylimit.c|  1 -
>  6 files changed, 26 insertions(+), 8 deletions(-)
>  create mode 100644 include/migration/client-options.h
> 
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 50140ed..5d4a23c 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -18,7 +18,6 @@
>  #include "sysemu/runstate.h"
>  #include "hw/vfio/vfio-common.h"
>  #include "migration/migration.h"
> -#include "migration/options.h"
>  #include "migration/savevm.h"
>  #include "migration/vmstate.h"
>  #include "migration/qemu-file.h"
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index 89f853f..a59ff17 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -32,7 +32,6 @@
>  #include "qemu/error-report.h"
>  #include "migration/misc.h"
>  #include "migration/migration.h"
> -#include "migration/options.h"
>  
>  #include "hw/virtio/virtio-bus.h"
>  #include "hw/virtio/virtio-access.h"
> diff --git a/include/migration/client-options.h 
> b/include/migration/client-options.h
> new file mode 100644
> index 000..887fea1
> --- /dev/null
> +++ b/include/migration/client-options.h
> @@ -0,0 +1,24 @@
> +/*
> + * QEMU public migration capabilities
> + *
> + * Copyright (c) 2012-2023 Red Hat Inc
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef QEMU_MIGRATION_CLIENT_OPTIONS_H
> +#define QEMU_MIGRATION_CLIENT_OPTIONS_H
> +
> +/* capabilities */
> +
> +bool migrate_background_snapshot(void);
> +bool migrate_dirty_limit(void);
> +bool migrate_postcopy_ram(void);
> +bool migrate_switchover_ack(void);
> +
> +/* parameters */
> +
> +MigMode migrate_mode(void);
> +
> +#endif
> diff --git a/include/migration/misc.h b/include/migration/misc.h
> index 5d1aa59..4c226a4 100644
> --- a/include/migration/misc.h
> +++ b/include/migration/misc.h
> @@ -17,6 +17,7 @@
>  #include "qemu/notify.h"
>  #include "qapi/qapi-types-migration.h"
>  #include "qapi/qapi-types-net.h"
> +#include "migration/client-options.h"
>  
>  /* migration/ram.c */
>  
> diff --git a/migration/options.h b/migration/options.h
> index 246c160..964ebdd 100644
> --- a/migration/options.h
> +++ b/migration/options.h
> @@ -16,6 +16,7 @@
>  
>  #include "hw/qdev-properties.h"
>  #include "hw/qdev-properties-system.h"
> +#include "migration/client-options.h"
>  
>  /* migration properties */
>  
> @@ -24,12 +25,10 @@ extern Property migration_properties[];
>  /* capabilities */
>  
>  bool migrate_auto_converge(void);
> -bool migrate_background_snapshot(void);
>  bool migrate_block(void);
>  bool migrate_colo(void);
>  bool migrate_compress(void);
>  bool migrate_dirty_bitmaps(void);
> -bool migrate_dirty_limit(void);
>  bool migrate_events(void);
>  bool migrate_ignore_shared(void);
>  bool migrate_late_block_activate(void);
> @@ -37,11 +36,9 @@ bool migrate_multifd(void);
>  bool migrate_pause_before_switchover(void);
>  bool migrate_postcopy_blocktime(void);
>  bool migrate_postcopy_preempt(void);
> -bool migrate_postcopy_ram(void);
>  bool migrate_rdma_pin_all(void);
>  bool migrate_release_ram(void);
>  bool migrate_return_path(void);
> -bool migrate_switchover_ack(void);
>  bool migrate_validate_uuid(void);
>  bool migrate_xbzrle(void);
>  bool migrate_zero_blocks(void);
> @@ -83,7 +80,6 @@ uint8_t migrate_max_cpu_throttle(void);
>  uint64_t migrate_max_bandwidth(void);
>  uint64_t migrate_avail_switchover_bandwidth(void);
>  uint64_t migrate_max_postcopy_bandwidth(void);
> -MigMode migrate_mode(void);
>  int migrate_multifd_channels(void);
>  MultiFDCompression migrate_multifd_compression(void);
>  int migrate_multifd_zlib_level(void);
> diff --git a/system/dirtylimit.c b/system/dirtylimit.c
> index b5607eb..774ff44 100644
> --- a/system/dirtylimit.c
> +++ b/system/dirtylimit.c
> @@ -26,7 +26,6 @@
>  #include "trace.h"
>  #include "migration/misc.h"
>  

Re: [PATCH V4 14/14] migration: options incompatible with cpr

2024-02-28 Thread Steven Sistare
On 2/28/2024 11:05 AM, Markus Armbruster wrote:
> Steven Sistare  writes:
> 
>> On 2/28/2024 2:21 AM, Markus Armbruster wrote:
>>> Steve Sistare  writes:
>>>
>>>> Fail the migration request if options are set that are incompatible
>>>> with cpr.
>>>>
>>>> Signed-off-by: Steve Sistare 
> 
> [...]
> 
>>>> diff --git a/qapi/migration.json b/qapi/migration.json
>>>> index 0990297..c6bfe2e 100644
>>>> --- a/qapi/migration.json
>>>> +++ b/qapi/migration.json
>>>> @@ -657,6 +657,8 @@
>>>>  # shared backend must be be non-volatile across reboot, such as by 
>>>> backing
>>>>  # it with a dax device.
>>>>  #
>>>> +# cpr-reboot may not be used with postcopy, colo, or 
>>>> background-snapshot.
>>>> +#
>>>
>>> @cpr-reboot
>>>
>>> COLO
>>>
>>> Wrap the line:
>>>
>>># @cpr-reboot may not be used with postcopy, COLO, or
>>># background-snapshot.
>>>
>>> This doesn't tell the reader what settings exactly do not work with
>>> @cpr-reboot.
>>>
>>> For instance "background-snapshot" is about enabling migration
>>> capability @background-snapshot.  We could write something like "is
>>> incompatible with enabling migration capability @background-snapshot".
>>>
>>> Same for the other two.  Worthwhile?
>>
>> I initially was more precise exactly as you suggest, but I realized that
>> postcopy encompasses multiple capabilities, and I did not want to enumerate
>> them, nor require new capabilities related to these 3 to be listed here 
>> if/when they are created, so I generalized.  A keyword search in this file 
>> gives unambiguous matches.
>>
>> Peter pushed the patch a few hours before you sent this.
> 
> Okay.
> 
> A followup patch to correct @cpr-reboot, COLO and line wrapping would be
> nice.

Will do - steve 



Re: [PATCH V4 14/14] migration: options incompatible with cpr

2024-02-28 Thread Steven Sistare
On 2/28/2024 2:21 AM, Markus Armbruster wrote:
> Steve Sistare  writes:
> 
>> Fail the migration request if options are set that are incompatible
>> with cpr.
>>
>> Signed-off-by: Steve Sistare 
>> ---
>>  migration/migration.c | 17 +
>>  qapi/migration.json   |  2 ++
>>  2 files changed, 19 insertions(+)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 90a9094..7652fd4 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -1953,6 +1953,23 @@ static bool migrate_prepare(MigrationState *s, bool 
>> blk, bool blk_inc,
>>  return false;
>>  }
>>  
>> +if (migrate_mode_is_cpr(s)) {
>> +const char *conflict = NULL;
>> +
>> +if (migrate_postcopy()) {
>> +conflict = "postcopy";
>> +} else if (migrate_background_snapshot()) {
>> +conflict = "background snapshot";
>> +} else if (migrate_colo()) {
>> +conflict = "COLO";
>> +}
>> +
>> +if (conflict) {
>> +error_setg(errp, "Cannot use %s with CPR", conflict);
>> +return false;
>> +}
>> +}
>> +
>>  if (blk || blk_inc) {
>>  if (migrate_colo()) {
>>  error_setg(errp, "No disk migration is required in COLO mode");
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index 0990297..c6bfe2e 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -657,6 +657,8 @@
>>  # shared backend must be be non-volatile across reboot, such as by 
>> backing
>>  # it with a dax device.
>>  #
>> +# cpr-reboot may not be used with postcopy, colo, or 
>> background-snapshot.
>> +#
> 
> @cpr-reboot
> 
> COLO
> 
> Wrap the line:
> 
># @cpr-reboot may not be used with postcopy, COLO, or
># background-snapshot.
> 
> This doesn't tell the reader what settings exactly do not work with
> @cpr-reboot.
> 
> For instance "background-snapshot" is about enabling migration
> capability @background-snapshot.  We could write something like "is
> incompatible with enabling migration capability @background-snapshot".
> 
> Same for the other two.  Worthwhile?

I initially was more precise exactly as you suggest, but I realized that
postcopy encompasses multiple capabilities, and I did not want to enumerate
them, nor require new capabilities related to these 3 to be listed here 
if/when they are created, so I generalized.  A keyword search in this file 
gives unambiguous matches.

Peter pushed the patch a few hours before you sent this.

- Steve



Re: [PATCH V1 00/10] privatize migration.h

2024-02-27 Thread Steven Sistare
Please ignore this solo message, I accidentally sent it alone - steve

On 2/27/2024 12:42 PM, Steve Sistare wrote:
> migration/migration.h is the private interface for code in the migration
> sub-directory, but many other clients include it because they need accessors
> that are not exported by the publc interface in include/migration/misc.h.
> Fix that by refactoring accessors and defining new ones as needed.
> 
> After these fixes, no code outside of migration includes migration.h,
> and no code outside of migration uses MigrationState.
> 
> This series depends on the following:
>   * migration patches in the series "allow cpr-reboot for vfio"
>   * singleton patch "migration: export fewer options"
> 
> Steve Sistare (10):
>   migration: remove migration.h references
>   migration: export migration_is_setup_or_active
>   migration: export migration_is_active
>   migration: export migration_is_running
>   migration: export vcpu_dirty_limit_period
>   migration: migration_thread_is_self
>   migration: migration_is_device
>   migration: migration_file_set_error
>   migration: privatize colo interfaces
>   migration: purge MigrationState from public interface
> 
>  hw/vfio/common.c   | 17 +++---
>  hw/vfio/container.c|  1 -
>  hw/vfio/migration.c| 11 ++-
>  hw/virtio/vhost-user.c |  1 -
>  hw/virtio/virtio-balloon.c |  1 -
>  include/migration/client-options.h |  1 +
>  include/migration/misc.h   | 17 +-
>  migration/colo.c   | 17 ++
>  migration/migration.c  | 67 
> --
>  migration/migration.h  |  7 ++--
>  migration/options.c| 11 +--
>  migration/ram.c|  5 ++-
>  migration/savevm.c |  2 +-
>  net/colo-compare.c |  3 +-
>  net/vhost-vdpa.c   |  3 +-
>  stubs/colo.c   |  1 -
>  system/dirtylimit.c| 12 +++
>  system/qdev-monitor.c  |  1 -
>  target/loongarch/kvm/kvm.c |  1 -
>  target/riscv/kvm/kvm-cpu.c |  4 +--
>  tests/unit/test-vmstate.c  |  1 -
>  21 files changed, 96 insertions(+), 88 deletions(-)
> 



Re: [PATCH v7 0/3] string list functions

2024-02-27 Thread Steven Sistare
All the changes look good - steve

On 2/27/2024 10:33 AM, Markus Armbruster wrote:
> This is Steve's work (v1 to v5), tweaked by Philippe (v6), and now by
> me.
> 
> v7:
> * Old PATCH 1 dropped
>   The proposed str_split() is a composition of g_strsplit() and a
>   conversion from char ** to strList *.  I'm willing to accept a
>   conversion function str_list_from_strv() next to
>   strv_from_str_list().  I doubt having a function for the composition
>   is worth the trouble.
> * Old PATCH 4 dropped
>   The tests mostly test g_strsplit().  I'm willing to accept focused
>   tests for strv_from_str_list() and str_list_from_strv().
> * PATCH 1-3: Commit messages tweaked
> * PATCH 2: strv_from_strList() renamed strv_from_str_list(), and moved
>   to qapi/qapi-type-helpers.c.  Function comment tweaked.  Local
>   variable @argv renamed to @strv.
> * PATCH 3: Adjust for the rename.
> 
> Steve Sistare (3):
>   qapi: New QAPI_LIST_LENGTH()
>   qapi: New strv_from_str_list()
>   migration: simplify exec migration functions
> 
>  include/qapi/type-helpers.h |  8 ++
>  include/qapi/util.h | 13 +
>  migration/exec.c| 57 ++---
>  qapi/qapi-type-helpers.c| 14 +
>  4 files changed, 43 insertions(+), 49 deletions(-)
> 



Re: [PATCH v6 0/5] string list functions

2024-02-27 Thread Steven Sistare
On 2/27/2024 10:28 AM, Markus Armbruster wrote:
> Philippe Mathieu-Daudé  writes:
> 
>> Hi Markus,
>>
>> Here are the patches I queued until you told me you'd
>> object to the CamelCase filename strList.[ch].
>>
>> Steve, please take over ;)
> 
> I'm going to post the part of the series I'm ready to queue as v7, with
> minor modifications:
> 
> * Rename strv_from_strList() to strv_from_str_list(), and put it into
>   qapi/qapi-type-helpers.c.
> 
> * Tweak its function comment.
> 
> * Rename its local variable @argv to @strv.
> 
> * Cosmetic commit message tweaks.
> 
> We can then talk about the remainder.

Thanks Markus, that saves some time and email.

- Steve



Re: [PATCH V1] migration: export fewer options

2024-02-27 Thread Steven Sistare
On 2/27/2024 3:08 AM, Peter Xu wrote:
> On Mon, Feb 26, 2024 at 09:41:15AM -0500, Steven Sistare wrote:
>> On 2/26/2024 2:40 AM, Markus Armbruster wrote:
>>> Steve Sistare  writes:
>>>
>>>> A small number of migration options are accessed by migration clients,
>>>> but to see them clients must include all of options.h, which is mostly
>>>> for migration core code.  migrate_mode() in particular will be needed by
>>>> multiple clients.
>>>>
>>>> Refactor the option declarations so clients can see the necessary few via
>>>> misc.h, which already exports a portion of the client API.
>>>>
>>>> Signed-off-by: Steve Sistare 
>>>> ---
>>>> I suggest that eventually we should define a single file migration/client.h
>>>> which exports everything needed by the simpler clients: blockers, 
>>>> notifiers,
>>>> options, cpr, and state accessors.
>>>> ---
>>>> ---
>>>>  hw/vfio/migration.c |  1 -
>>>>  hw/virtio/virtio-balloon.c  |  1 -
>>>>  include/migration/misc.h|  1 +
>>>>  include/migration/options-pub.h | 24 
>>>>  migration/options.h |  6 +-
>>>
>>> Unusual naming.  We have zero headers named -pub.h or -public.h, and
>>> dozens named like -int.h or -internal.h.  Please stick to the existing
>>> convention.
>>
>> In the spirit of minimizing changes, I went that route to avoid renaming the 
>> existing migration/options.h and its references:
>>
>> 0 migration/block-dirty-bit 82 #include "options.h"
>> 1 migration/block.c 32 #include "options.h"
>> 2 migration/colo.c  37 #include "options.h"
>> 3 migration/migration-hmp-c 35 #include "options.h"
>> 4 migration/migration.c 68 #include "options.h"
>> 5 migration/multifd-zlib.c  21 #include "options.h"
>> 6 migration/multifd-zstd.c  21 #include "options.h"
>> 7 migration/multifd.c   29 #include "options.h"
>> 8 migration/options.c   30 #include "options.h"
>> 9 migration/postcopy-ram.c  40 #include "options.h"
>> a migration/qemu-file.c 33 #include "options.h"
>> b migration/ram-compress.c  37 #include "options.h"
>> c migration/ram.c   63 #include "options.h"
>> d migration/rdma.c  40 #include "options.h"
>> e migration/savevm.c71 #include "options.h"
>> f migration/socket.c30 #include "options.h"
>> g migration/tls.c   25 #include "options.h"
>>
>> But I take your point.
>>
>> Peter, which do you prefer?
> 
> From statistics, "-internal.h" wins "-int.h":
> 
> $ git grep "\-internal.h" | wc -l
> 135
> $ git grep "\-int.h" | wc -l
> 3

Yes, I did the same search to choose the name for option A below.
But in option B, I keep the existing name for the private file,
and choose a new name for the public file.  There is no suffix
in use in qemu to denote a public file; we just use names indicating
the functionality, so I chose client-options.h.

Let's use client-options.h and move on.  I am preparing another cleanup
series that I think you will like.

- Steve

>>   A. rename: migration/options.h -> migration/options-internal.h 
>>  rename: include/migration/options-pub.h -> include/migration/options.h
>>
>>   B. rename: include/migration/options.h -> 
>> include/migration/client-options.h
>>
>> I prefer B. If you prefer B, but want a different file name, please choose 
>> the
>> final name.
> 
> Personally I don't have a strong opinion on the name.  I'll see whether
> Markus has any comment.
> 
> [and of course, I removed this patch from -staging queue to keep the
>  discussion going..]
> 



Re: [PATCH V4 00/14] allow cpr-reboot for vfio

2024-02-26 Thread Steven Sistare
On 2/26/2024 3:21 PM, Steven Sistare wrote:
> On 2/26/2024 4:01 AM, Peter Xu wrote:
>> On Mon, Feb 26, 2024 at 09:49:46AM +0100, Cédric Le Goater wrote:
>>> Go ahead. It will help me for the changes I am doing on error reporting
>>> for VFIO migration. I will rebase on top.
>>
>> Thanks for confirming.  I queued the migration patches then, but leave the
>> two vfio one for further discussion.
> 
> Very good, thanks.  I am always happy to move the ball a few yards closer to
> the goal line :)

Peter, beware that patch 3 needs an edit before being queued.
This hunk snuck in and should be deleted:

[PATCH V4 03/14] migration: convert to NotifierWithReturn
diff --git a/roms/seabios-hppa b/roms/seabios-hppa
index 03774ed..e4eac85 16
--- a/roms/seabios-hppa
+++ b/roms/seabios-hppa
@@ -1 +1 @@
-Subproject commit 03774edaad3bfae090ac96ca5450353c641637d1
+Subproject commit e4eac85880e8677f96d8b9e94de9f2eec9c0751f


- Steve



Re: [PATCH V4 00/14] allow cpr-reboot for vfio

2024-02-26 Thread Steven Sistare
On 2/26/2024 4:01 AM, Peter Xu wrote:
> On Mon, Feb 26, 2024 at 09:49:46AM +0100, Cédric Le Goater wrote:
>> Go ahead. It will help me for the changes I am doing on error reporting
>> for VFIO migration. I will rebase on top.
> 
> Thanks for confirming.  I queued the migration patches then, but leave the
> two vfio one for further discussion.

Very good, thanks.  I am always happy to move the ball a few yards closer to
the goal line :)

- Steve




Re: [PATCH V1] migration: export fewer options

2024-02-26 Thread Steven Sistare
On 2/25/2024 9:40 PM, Peter Xu wrote:
> On Fri, Feb 23, 2024 at 09:13:24AM -0800, Steve Sistare wrote:
>> A small number of migration options are accessed by migration clients,
>> but to see them clients must include all of options.h, which is mostly
>> for migration core code.  migrate_mode() in particular will be needed by
>> multiple clients.
>>
>> Refactor the option declarations so clients can see the necessary few via
>> misc.h, which already exports a portion of the client API.
>>
>> Signed-off-by: Steve Sistare 
> 
> Sounds reasonable, queued, thanks.
> 
>> ---
>> I suggest that eventually we should define a single file migration/client.h
>> which exports everything needed by the simpler clients: blockers, notifiers,
>> options, cpr, and state accessors.
> 
> What's the difference v.s. current migration/misc.h?

This file would be sufficient for most clients:

diff --git a/include/migration/client.h b/include/migration/client.h
new file mode 100644
index 000..a55e504
--- /dev/null
+++ b/include/migration/client.h
@@ -0,0 +1,6 @@
+#ifndef MIGRATION_CLIENT_H
+#define MIGRATION_CLIENT_H
+#include "migration/misc.h"
+#include "migration/blocker.h"
+#include "migration/client-options.h"
+#endif

Or, we could rename misc.h -> client.h and include blocker.h and 
client-options.h in it.

I just like the idea that most clients could include a single, obviously named 
file to
use the most-common exported API.  "misc.h" is not obvious, and not complete.

- Steve



Re: [PATCH v6 0/5] string list functions

2024-02-26 Thread Steven Sistare
Thanks for trying a V6 Philippe.  I'll take it from here.  It helps to know
that someone besides me thinks these functions are worth having.

- Steve

On 2/26/2024 9:11 AM, Philippe Mathieu-Daudé wrote:
> Hi Markus,
> 
> Here are the patches I queued until you told me you'd
> object to the CamelCase filename strList.[ch].
> 
> Steve, please take over ;)
> 
> Since v5:
> - Cover files in MAINTAINERS
> - Complete @docstring mentioning g_auto.
> 
> v5: 
> https://lore.kernel.org/qemu-devel/1708638470-114846-3-git-send-email-steven.sist...@oracle.com/
> 
> Steve Sistare (5):
>   util: str_split
>   qapi: QAPI_LIST_LENGTH
>   util: strv_from_strList
>   util: strList unit tests
>   migration: simplify exec migration functions
> 
>  MAINTAINERS   |  2 +
>  include/monitor/hmp.h |  1 -
>  include/qapi/util.h   | 13 +++
>  include/qemu/strList.h| 33 
>  migration/exec.c  | 57 
>  monitor/hmp-cmds.c| 19 --
>  net/net-hmp-cmds.c|  3 +-
>  stats/stats-hmp-cmds.c|  3 +-
>  tests/unit/test-strList.c | 80 +++
>  util/strList.c| 38 +++
>  tests/unit/meson.build|  1 +
>  util/meson.build  |  1 +
>  12 files changed, 180 insertions(+), 71 deletions(-)
>  create mode 100644 include/qemu/strList.h
>  create mode 100644 tests/unit/test-strList.c
>  create mode 100644 util/strList.c
> 



Re: [PATCH V1] migration: export fewer options

2024-02-26 Thread Steven Sistare
On 2/26/2024 2:40 AM, Markus Armbruster wrote:
> Steve Sistare  writes:
> 
>> A small number of migration options are accessed by migration clients,
>> but to see them clients must include all of options.h, which is mostly
>> for migration core code.  migrate_mode() in particular will be needed by
>> multiple clients.
>>
>> Refactor the option declarations so clients can see the necessary few via
>> misc.h, which already exports a portion of the client API.
>>
>> Signed-off-by: Steve Sistare 
>> ---
>> I suggest that eventually we should define a single file migration/client.h
>> which exports everything needed by the simpler clients: blockers, notifiers,
>> options, cpr, and state accessors.
>> ---
>> ---
>>  hw/vfio/migration.c |  1 -
>>  hw/virtio/virtio-balloon.c  |  1 -
>>  include/migration/misc.h|  1 +
>>  include/migration/options-pub.h | 24 
>>  migration/options.h |  6 +-
> 
> Unusual naming.  We have zero headers named -pub.h or -public.h, and
> dozens named like -int.h or -internal.h.  Please stick to the existing
> convention.

In the spirit of minimizing changes, I went that route to avoid renaming the 
existing migration/options.h and its references:

0 migration/block-dirty-bit 82 #include "options.h"
1 migration/block.c 32 #include "options.h"
2 migration/colo.c  37 #include "options.h"
3 migration/migration-hmp-c 35 #include "options.h"
4 migration/migration.c 68 #include "options.h"
5 migration/multifd-zlib.c  21 #include "options.h"
6 migration/multifd-zstd.c  21 #include "options.h"
7 migration/multifd.c   29 #include "options.h"
8 migration/options.c   30 #include "options.h"
9 migration/postcopy-ram.c  40 #include "options.h"
a migration/qemu-file.c 33 #include "options.h"
b migration/ram-compress.c  37 #include "options.h"
c migration/ram.c   63 #include "options.h"
d migration/rdma.c  40 #include "options.h"
e migration/savevm.c71 #include "options.h"
f migration/socket.c30 #include "options.h"
g migration/tls.c   25 #include "options.h"

But I take your point.

Peter, which do you prefer?

  A. rename: migration/options.h -> migration/options-internal.h 
 rename: include/migration/options-pub.h -> include/migration/options.h

  B. rename: include/migration/options.h -> include/migration/client-options.h

I prefer B. If you prefer B, but want a different file name, please choose the
final name.

- Steve



Re: [PATCH V5 1/5] util: str_split

2024-02-23 Thread Steven Sistare
On 2/23/2024 12:41 PM, Philippe Mathieu-Daudé wrote:
> On 23/2/24 15:01, Steven Sistare wrote:
>> On 2/23/2024 1:01 AM, Philippe Mathieu-Daudé wrote:
>>> On 22/2/24 22:47, Steve Sistare wrote:
>>>> Generalize hmp_split_at_comma() to take any delimiter string, rename
>>>> as str_split(), and move it to util/strList.c.
>>>>
>>>> No functional change.
>>>>
>>>> Signed-off-by: Steve Sistare 
>>>> ---
>>>>    include/monitor/hmp.h  |  1 -
>>>>    include/qemu/strList.h | 24 
>>>>    monitor/hmp-cmds.c | 19 ---
>>>>    net/net-hmp-cmds.c |  3 ++-
>>>>    stats/stats-hmp-cmds.c |  3 ++-
>>>>    util/meson.build   |  1 +
>>>>    util/strList.c | 24 
>>>>    7 files changed, 53 insertions(+), 22 deletions(-)
>>>>    create mode 100644 include/qemu/strList.h
>>>>    create mode 100644 util/strList.c
>>>
>>>
>>>> +#include "qapi/qapi-builtin-types.h"
>>>> +
>>>> +/*
>>>> + * Split @str into a strList using the delimiter string @delim.
>>>> + * The delimiter is not included in the result.
>>>> + * Return NULL if @str is NULL or an empty string.
>>>> + * A leading, trailing, or consecutive delimiter produces an
>>>> + * empty string at that position in the output.
>>>> + * All strings are g_strdup'd, and the result can be freed
>>>> + * using qapi_free_strList.
>>>
>>> Note "qapi/qapi-builtin-types.h" defines:
>>>
>>>    G_DEFINE_AUTOPTR_CLEANUP_FUNC(strList, qapi_free_strList)
>>>
>>> Maybe mention we can also use:
>>>
>>>    g_autoptr(strList)
>>
>> Thanks Philippe.  If we get to V6 for this series, I will mention this,
>> and also mention g_autoptr(GStrv) in the header comment for 
>> strv_from_strlist.
> 
> If there is no need for v6, do you mind sharing here what would be
> the resulting comment? Maybe Markus can update it directly...
> (assuming he takes your series).

Sure - steve


diff --git a/include/qemu/strList.h b/include/qemu/strList.h
index c1eb1dd..b13bd53 100644
--- a/include/qemu/strList.h
+++ b/include/qemu/strList.h
@@ -17,13 +17,16 @@
  * A leading, trailing, or consecutive delimiter produces an
  * empty string at that position in the output.
  * All strings are g_strdup'd, and the result can be freed
- * using qapi_free_strList.
+ * using qapi_free_strList, or by declaring a local variable
+ * with g_autoptr(strList).
  */
 strList *str_split(const char *str, const char *delim);

 /*
  * Produce and return a NULL-terminated array of strings from @list.
- * The result is g_malloc'd and all strings are g_strdup'd.
+ * The result is g_malloc'd and all strings are g_strdup'd.  The result
+ * can be freed using g_strfreev, or by declaring a local variable with
+ * g_auto(GStrv).
  */
 char **strv_from_strList(const strList *list);





Re: [PATCH V5 1/5] util: str_split

2024-02-23 Thread Steven Sistare
On 2/23/2024 1:01 AM, Philippe Mathieu-Daudé wrote:
> On 22/2/24 22:47, Steve Sistare wrote:
>> Generalize hmp_split_at_comma() to take any delimiter string, rename
>> as str_split(), and move it to util/strList.c.
>>
>> No functional change.
>>
>> Signed-off-by: Steve Sistare 
>> ---
>>   include/monitor/hmp.h  |  1 -
>>   include/qemu/strList.h | 24 
>>   monitor/hmp-cmds.c | 19 ---
>>   net/net-hmp-cmds.c |  3 ++-
>>   stats/stats-hmp-cmds.c |  3 ++-
>>   util/meson.build   |  1 +
>>   util/strList.c | 24 
>>   7 files changed, 53 insertions(+), 22 deletions(-)
>>   create mode 100644 include/qemu/strList.h
>>   create mode 100644 util/strList.c
> 
> 
>> +#include "qapi/qapi-builtin-types.h"
>> +
>> +/*
>> + * Split @str into a strList using the delimiter string @delim.
>> + * The delimiter is not included in the result.
>> + * Return NULL if @str is NULL or an empty string.
>> + * A leading, trailing, or consecutive delimiter produces an
>> + * empty string at that position in the output.
>> + * All strings are g_strdup'd, and the result can be freed
>> + * using qapi_free_strList.
> 
> Note "qapi/qapi-builtin-types.h" defines:
> 
>   G_DEFINE_AUTOPTR_CLEANUP_FUNC(strList, qapi_free_strList)
> 
> Maybe mention we can also use:
> 
>   g_autoptr(strList)

Thanks Philippe.  If we get to V6 for this series, I will mention this,
and also mention g_autoptr(GStrv) in the header comment for strv_from_strlist.

- Steve

>> + */
>> +strList *str_split(const char *str, const char *delim);
>> +
>> +#endif
> 



Re: [PATCH V4 1/5] util: strList_from_string

2024-02-22 Thread Steven Sistare
On 2/21/2024 12:01 PM, Steven Sistare wrote:
> On 2/21/2024 8:29 AM, Markus Armbruster wrote:
>> I apologize for the lateness of my review.
> 
> No problem.  Thanks for the review.
> 
>> Steve Sistare  writes:
>>
>>> Generalize hmp_split_at_comma() to take any delimiter string, rename
>>> as strList_from_string(), and move it to util/strList.c.
>>>
>>> No functional change.
>>>
>>> Signed-off-by: Steve Sistare 
>>
>> I can't see an actual use of generalized delimiters outside tests in
>> this series.  Do you have uses?
> 
> In this series, it is called from hmp_announce_self and stats_filter; 
> those were formerly calls to hmp_split_at_comma.
> 
> In my live update cpr-exec series, there is an additional call site, with a
> space delimiter instead of comma.  Live update V9 is posted but is old and 
> obsolete.  
> I will post V10 soon, but I am hoping you can pull this series first, so I 
> can 
> whittle down my pending patches and omit these from V10.
> 
>>> ---
>>>  include/monitor/hmp.h  |  1 -
>>>  include/qemu/strList.h | 24 
>>>  monitor/hmp-cmds.c | 19 ---
>>>  net/net-hmp-cmds.c |  3 ++-
>>>  stats/stats-hmp-cmds.c |  3 ++-
>>>  util/meson.build   |  1 +
>>>  util/strList.c | 24 
>>>  7 files changed, 53 insertions(+), 22 deletions(-)
>>>  create mode 100644 include/qemu/strList.h
>>>  create mode 100644 util/strList.c
>>>
>>> diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
>>> index 13f9a2d..2df661e 100644
>>> --- a/include/monitor/hmp.h
>>> +++ b/include/monitor/hmp.h
>>> @@ -19,7 +19,6 @@
>>>  
>>>  bool hmp_handle_error(Monitor *mon, Error *err);
>>>  void hmp_help_cmd(Monitor *mon, const char *name);
>>> -strList *hmp_split_at_comma(const char *str);
>>>  
>>>  void hmp_info_name(Monitor *mon, const QDict *qdict);
>>>  void hmp_info_version(Monitor *mon, const QDict *qdict);
>>> diff --git a/include/qemu/strList.h b/include/qemu/strList.h
>>> new file mode 100644
>>> index 000..010237f
>>> --- /dev/null
>>> +++ b/include/qemu/strList.h
>>> @@ -0,0 +1,24 @@
>>> +/*
>>> + * Copyright (c) 2022 - 2024 Oracle and/or its affiliates.
>>> + *
>>> + * This work is licensed under the terms of the GNU GPL, version 2 or 
>>> later.
>>> + * See the COPYING file in the top-level directory.
>>> + */
>>> +
>>> +#ifndef QEMU_STR_LIST_H
>>> +#define QEMU_STR_LIST_H
>>> +
>>> +#include "qapi/qapi-builtin-types.h"
>>> +
>>> +/*
>>> + * Break @in into a strList using the delimiter string @delim.
>>> + * The delimiter is not included in the result.
>>> + * Return NULL if @in is NULL or an empty string.
>>> + * A leading, trailing, or consecutive delimiter produces an
>>> + * empty string at that position in the output.
>>> + * All strings are g_strdup'd, and the result can be freed
>>> + * using qapi_free_strList.
>>> + */
>>> +strList *strList_from_string(const char *in, const char *delim);
>>
>> The function name no longer tells us explicitly what the function does:
>> splitting the string.
> 
> The first sentence does not say it?
>   "Break @in into a strList using the delimiter string @delim"
> 
> Would you prefer this?
>   "Split string @in into a strList using the delimiter string @delim"

Sorry, I read your comment too quickly.  You want a different function name.
I propose:
  strList *str_split(const char *str, const char *delim);

- Steve



Re: [PATCH V4 00/14] allow cpr-reboot for vfio

2024-02-22 Thread Steven Sistare
Peter (and David if interested): these patches still need RB:
  migration: notifier error checking
  migration: stop vm for cpr
  migration: update cpr-reboot description
  migration: options incompatible with cpr

Alex, these patches still need RB:
  vfio: register container for cpr
  vfio: allow cpr-reboot migration if suspended

Thanks!

- Steve

On 2/22/2024 12:28 PM, Steve Sistare wrote:
> Allow cpr-reboot for vfio if the guest is in the suspended runstate.  The
> guest drivers' suspend methods flush outstanding requests and re-initialize
> the devices, and thus there is no device state to save and restore.  The
> user is responsible for suspending the guest before initiating cpr, such as
> by issuing guest-suspend-ram to the qemu guest agent.
> 
> Most of the patches in this series enhance migration notifiers so they can
> return an error status and message.  The last few patches register a notifier
> for vfio that returns an error if the guest is not suspended.
> 
> Changes in V3:
>   * update to tip, add RB's
>   * replace MigrationStatus with new enum MigrationEventType
>   * simplify migrate_fd_connect error recovery
>   * support vfio iommufd containers
>   * add patches:
>   migration: stop vm for cpr
>   migration: update cpr-reboot description
> 
> Changes in V4:
>   * rebase to tip, add RB's
>   * add patch to prevent options such as precopy from being used with cpr.
>   migration: options incompatible with cpr
>   * refactor subroutines in "stop vm for cpr"
>   * simplify "notifier error checking" patch by restricting that only
> notifiers for MIG_EVENT_PRECOPY_SETUP may return an error.
>   * doc that a fail event may be sent without a prior setup event
> 
> Steve Sistare (14):
>   notify: pass error to notifier with return
>   migration: remove error from notifier data
>   migration: convert to NotifierWithReturn
>   migration: MigrationEvent for notifiers
>   migration: remove postcopy_after_devices
>   migration: MigrationNotifyFunc
>   migration: per-mode notifiers
>   migration: refactor migrate_fd_connect failures
>   migration: notifier error checking
>   migration: stop vm for cpr
>   vfio: register container for cpr
>   vfio: allow cpr-reboot migration if suspended
>   migration: update cpr-reboot description
>   migration: options incompatible with cpr
> 
>  hw/net/virtio-net.c   |  13 +--
>  hw/vfio/common.c  |   2 +-
>  hw/vfio/container.c   |  11 ++-
>  hw/vfio/cpr.c |  39 +
>  hw/vfio/iommufd.c |   6 ++
>  hw/vfio/meson.build   |   1 +
>  hw/vfio/migration.c   |  15 ++--
>  hw/vfio/trace-events  |   2 +-
>  hw/virtio/vhost-user.c|  10 +--
>  hw/virtio/virtio-balloon.c|   3 +-
>  include/hw/vfio/vfio-common.h |   5 +-
>  include/hw/vfio/vfio-container-base.h |   1 +
>  include/hw/virtio/virtio-net.h|   2 +-
>  include/migration/misc.h  |  47 +--
>  include/qemu/notify.h |   8 +-
>  migration/migration.c | 148 
> +++---
>  migration/migration.h |   4 -
>  migration/postcopy-ram.c  |   3 +-
>  migration/postcopy-ram.h  |   1 -
>  migration/ram.c   |   3 +-
>  net/vhost-vdpa.c  |  14 ++--
>  qapi/migration.json   |  37 ++---
>  roms/seabios-hppa |   2 +-
>  ui/spice-core.c   |  17 ++--
>  util/notify.c |   5 +-
>  25 files changed, 275 insertions(+), 124 deletions(-)
>  create mode 100644 hw/vfio/cpr.c
> 



Re: [PATCH V3 10/13] migration: stop vm for cpr

2024-02-22 Thread Steven Sistare
On 2/22/2024 4:30 AM, Peter Xu wrote:
> On Thu, Feb 22, 2024 at 05:12:53PM +0800, Peter Xu wrote:
>> On Wed, Feb 21, 2024 at 04:20:07PM -0500, Steven Sistare wrote:
>>> On 2/20/2024 2:33 AM, Peter Xu wrote:
>>>> On Thu, Feb 08, 2024 at 10:54:03AM -0800, Steve Sistare wrote:
>>>>> When migration for cpr is initiated, stop the vm and set state
>>>>> RUN_STATE_FINISH_MIGRATE before ram is saved.  This eliminates the
>>>>> possibility of ram and device state being out of sync, and guarantees
>>>>> that a guest in the suspended state remains suspended, because qmp_cont
>>>>> rejects a cont command in the RUN_STATE_FINISH_MIGRATE state.
>>>>>
>>>>> Signed-off-by: Steve Sistare 
>>>>> ---
>>>>>  include/migration/misc.h |  1 +
>>>>>  migration/migration.c| 32 +---
>>>>>  2 files changed, 26 insertions(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/include/migration/misc.h b/include/migration/misc.h
>>>>> index 6dc234b..54c99a3 100644
>>>>> --- a/include/migration/misc.h
>>>>> +++ b/include/migration/misc.h
>>>>> @@ -60,6 +60,7 @@ void migration_object_init(void);
>>>>>  void migration_shutdown(void);
>>>>>  bool migration_is_idle(void);
>>>>>  bool migration_is_active(MigrationState *);
>>>>> +bool migrate_mode_is_cpr(MigrationState *);
>>>>>  
>>>>>  typedef enum MigrationEventType {
>>>>>  MIG_EVENT_PRECOPY_SETUP,
>>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>>> index d1fce9e..fc5c587 100644
>>>>> --- a/migration/migration.c
>>>>> +++ b/migration/migration.c
>>>>> @@ -1603,6 +1603,11 @@ bool migration_is_active(MigrationState *s)
>>>>>  s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
>>>>>  }
>>>>>  
>>>>> +bool migrate_mode_is_cpr(MigrationState *s)
>>>>> +{
>>>>> +return s->parameters.mode == MIG_MODE_CPR_REBOOT;
>>>>> +}
>>>>> +
>>>>>  int migrate_init(MigrationState *s, Error **errp)
>>>>>  {
>>>>>  int ret;
>>>>> @@ -2651,13 +2656,14 @@ static int 
>>>>> migration_completion_precopy(MigrationState *s,
>>>>>  bql_lock();
>>>>>  migration_downtime_start(s);
>>>>>  
>>>>> -s->vm_old_state = runstate_get();
>>>>> -global_state_store();
>>>>> -
>>>>> -ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE);
>>>>> -trace_migration_completion_vm_stop(ret);
>>>>> -if (ret < 0) {
>>>>> -goto out_unlock;
>>>>> +if (!migrate_mode_is_cpr(s)) {
>>>>> +s->vm_old_state = runstate_get();
>>>>> +global_state_store();
>>>>> +ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE);
>>>>> +trace_migration_completion_vm_stop(ret);
>>>>> +if (ret < 0) {
>>>>> +goto out_unlock;
>>>>> +}
>>>>>  }
>>>>>  
>>>>>  ret = migration_maybe_pause(s, current_active_state,
>>>>> @@ -3576,6 +3582,7 @@ void migrate_fd_connect(MigrationState *s, Error 
>>>>> *error_in)
>>>>>  Error *local_err = NULL;
>>>>>  uint64_t rate_limit;
>>>>>  bool resume = s->state == MIGRATION_STATUS_POSTCOPY_PAUSED;
>>>>> +int ret;
>>>>>  
>>>>>  /*
>>>>>   * If there's a previous error, free it and prepare for another one.
>>>>> @@ -3651,6 +3658,17 @@ void migrate_fd_connect(MigrationState *s, Error 
>>>>> *error_in)
>>>>>  goto fail;
>>>>>  }
>>>>>  
>>>>> +if (migrate_mode_is_cpr(s)) {
>>>>> +s->vm_old_state = runstate_get();
>>>>> +global_state_store();
>>>>> +ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE);
>>>>> +trace_migration_completion_vm_stop(ret);
>>>>> +if (ret < 0) {
>>>>> +error_setg(_err, "migration_stop_vm failed, error %d", 
>>>>> -ret);
>>>>> + 

Re: [PATCH V3 10/13] migration: stop vm for cpr

2024-02-22 Thread Steven Sistare
On 2/22/2024 4:03 AM, Peter Xu wrote:
> On Wed, Feb 21, 2024 at 04:23:07PM -0500, Steven Sistare wrote:
>>> How about postcopy?  I know it's nonsense to enable postcopy for cpr.. but
>>> iiuc we don't yet forbid an user doing so.  Maybe we should?
>>
>> How about this?
>>
>> ---
>> @@ -3600,6 +3600,11 @@ void migrate_fd_connect(MigrationState *s, Error 
>> *error_in)
>>  return;
>>  }
>>
>> +if (migrate_mode_is_cpr(s) && migrate_postcopy()) {
>> +error_setg(_err, "cannot mix postcopy and cpr");
>> +goto fail;
>> +}
>> +
>>  if (resume) {
>>  /* This is a resumed migration */
>>  rate_limit = migrate_max_postcopy_bandwidth();
>> 
> 
> migrate_fd_connect() will be a bit late, the error won't be able to be
> attached in the "migrate" request.  Perhaps, migrate_prepare()?

Thank you, that is better - steve




Re: [PATCH V3 10/13] migration: stop vm for cpr

2024-02-21 Thread Steven Sistare
On 2/20/2024 2:33 AM, Peter Xu wrote:
> On Thu, Feb 08, 2024 at 10:54:03AM -0800, Steve Sistare wrote:
>> When migration for cpr is initiated, stop the vm and set state
>> RUN_STATE_FINISH_MIGRATE before ram is saved.  This eliminates the
>> possibility of ram and device state being out of sync, and guarantees
>> that a guest in the suspended state remains suspended, because qmp_cont
>> rejects a cont command in the RUN_STATE_FINISH_MIGRATE state.
>>
>> Signed-off-by: Steve Sistare 
>> ---
>>  include/migration/misc.h |  1 +
>>  migration/migration.c| 32 +---
>>  2 files changed, 26 insertions(+), 7 deletions(-)
>>
>> diff --git a/include/migration/misc.h b/include/migration/misc.h
>> index 6dc234b..54c99a3 100644
>> --- a/include/migration/misc.h
>> +++ b/include/migration/misc.h
>> @@ -60,6 +60,7 @@ void migration_object_init(void);
>>  void migration_shutdown(void);
>>  bool migration_is_idle(void);
>>  bool migration_is_active(MigrationState *);
>> +bool migrate_mode_is_cpr(MigrationState *);
>>  
>>  typedef enum MigrationEventType {
>>  MIG_EVENT_PRECOPY_SETUP,
>> diff --git a/migration/migration.c b/migration/migration.c
>> index d1fce9e..fc5c587 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -1603,6 +1603,11 @@ bool migration_is_active(MigrationState *s)
>>  s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
>>  }
>>  
>> +bool migrate_mode_is_cpr(MigrationState *s)
>> +{
>> +return s->parameters.mode == MIG_MODE_CPR_REBOOT;
>> +}
>> +
>>  int migrate_init(MigrationState *s, Error **errp)
>>  {
>>  int ret;
>> @@ -2651,13 +2656,14 @@ static int 
>> migration_completion_precopy(MigrationState *s,
>>  bql_lock();
>>  migration_downtime_start(s);
>>  
>> -s->vm_old_state = runstate_get();
>> -global_state_store();
>> -
>> -ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE);
>> -trace_migration_completion_vm_stop(ret);
>> -if (ret < 0) {
>> -goto out_unlock;
>> +if (!migrate_mode_is_cpr(s)) {
>> +s->vm_old_state = runstate_get();
>> +global_state_store();
>> +ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE);
>> +trace_migration_completion_vm_stop(ret);
>> +if (ret < 0) {
>> +goto out_unlock;
>> +}
>>  }
>>  
>>  ret = migration_maybe_pause(s, current_active_state,
>> @@ -3576,6 +3582,7 @@ void migrate_fd_connect(MigrationState *s, Error 
>> *error_in)
>>  Error *local_err = NULL;
>>  uint64_t rate_limit;
>>  bool resume = s->state == MIGRATION_STATUS_POSTCOPY_PAUSED;
>> +int ret;
>>  
>>  /*
>>   * If there's a previous error, free it and prepare for another one.
>> @@ -3651,6 +3658,17 @@ void migrate_fd_connect(MigrationState *s, Error 
>> *error_in)
>>  goto fail;
>>  }
>>  
>> +if (migrate_mode_is_cpr(s)) {
>> +s->vm_old_state = runstate_get();
>> +global_state_store();
>> +ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE);
>> +trace_migration_completion_vm_stop(ret);
>> +if (ret < 0) {
>> +error_setg(_err, "migration_stop_vm failed, error %d", 
>> -ret);
>> +goto fail;
>> +}
>> +}
> 
> Could we have a helper function for the shared codes?
> 
> How about postcopy?  I know it's nonsense to enable postcopy for cpr.. but
> iiuc we don't yet forbid an user doing so.  Maybe we should?

How about this?

---
@@ -3600,6 +3600,11 @@ void migrate_fd_connect(MigrationState *s, Error 
*error_in)
 return;
 }

+if (migrate_mode_is_cpr(s) && migrate_postcopy()) {
+error_setg(_err, "cannot mix postcopy and cpr");
+goto fail;
+}
+
 if (resume) {
 /* This is a resumed migration */
 rate_limit = migrate_max_postcopy_bandwidth();


- Steve



Re: [PATCH V3 10/13] migration: stop vm for cpr

2024-02-21 Thread Steven Sistare
On 2/20/2024 2:33 AM, Peter Xu wrote:
> On Thu, Feb 08, 2024 at 10:54:03AM -0800, Steve Sistare wrote:
>> When migration for cpr is initiated, stop the vm and set state
>> RUN_STATE_FINISH_MIGRATE before ram is saved.  This eliminates the
>> possibility of ram and device state being out of sync, and guarantees
>> that a guest in the suspended state remains suspended, because qmp_cont
>> rejects a cont command in the RUN_STATE_FINISH_MIGRATE state.
>>
>> Signed-off-by: Steve Sistare 
>> ---
>>  include/migration/misc.h |  1 +
>>  migration/migration.c| 32 +---
>>  2 files changed, 26 insertions(+), 7 deletions(-)
>>
>> diff --git a/include/migration/misc.h b/include/migration/misc.h
>> index 6dc234b..54c99a3 100644
>> --- a/include/migration/misc.h
>> +++ b/include/migration/misc.h
>> @@ -60,6 +60,7 @@ void migration_object_init(void);
>>  void migration_shutdown(void);
>>  bool migration_is_idle(void);
>>  bool migration_is_active(MigrationState *);
>> +bool migrate_mode_is_cpr(MigrationState *);
>>  
>>  typedef enum MigrationEventType {
>>  MIG_EVENT_PRECOPY_SETUP,
>> diff --git a/migration/migration.c b/migration/migration.c
>> index d1fce9e..fc5c587 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -1603,6 +1603,11 @@ bool migration_is_active(MigrationState *s)
>>  s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
>>  }
>>  
>> +bool migrate_mode_is_cpr(MigrationState *s)
>> +{
>> +return s->parameters.mode == MIG_MODE_CPR_REBOOT;
>> +}
>> +
>>  int migrate_init(MigrationState *s, Error **errp)
>>  {
>>  int ret;
>> @@ -2651,13 +2656,14 @@ static int 
>> migration_completion_precopy(MigrationState *s,
>>  bql_lock();
>>  migration_downtime_start(s);
>>  
>> -s->vm_old_state = runstate_get();
>> -global_state_store();
>> -
>> -ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE);
>> -trace_migration_completion_vm_stop(ret);
>> -if (ret < 0) {
>> -goto out_unlock;
>> +if (!migrate_mode_is_cpr(s)) {
>> +s->vm_old_state = runstate_get();
>> +global_state_store();
>> +ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE);
>> +trace_migration_completion_vm_stop(ret);
>> +if (ret < 0) {
>> +goto out_unlock;
>> +}
>>  }
>>  
>>  ret = migration_maybe_pause(s, current_active_state,
>> @@ -3576,6 +3582,7 @@ void migrate_fd_connect(MigrationState *s, Error 
>> *error_in)
>>  Error *local_err = NULL;
>>  uint64_t rate_limit;
>>  bool resume = s->state == MIGRATION_STATUS_POSTCOPY_PAUSED;
>> +int ret;
>>  
>>  /*
>>   * If there's a previous error, free it and prepare for another one.
>> @@ -3651,6 +3658,17 @@ void migrate_fd_connect(MigrationState *s, Error 
>> *error_in)
>>  goto fail;
>>  }
>>  
>> +if (migrate_mode_is_cpr(s)) {
>> +s->vm_old_state = runstate_get();
>> +global_state_store();
>> +ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE);
>> +trace_migration_completion_vm_stop(ret);
>> +if (ret < 0) {
>> +error_setg(_err, "migration_stop_vm failed, error %d", 
>> -ret);
>> +goto fail;
>> +}
>> +}
> 
> Could we have a helper function for the shared codes?

I propose to add code to migration_stop_vm to make it the helper.  Some call 
sites emit
more traces (via migration_stop_vm) as a result of my refactoring, and postcopy 
start sets 
vm_old_state, which is not used thereafter in that path.  Those changes seem 
harmless to me.
Tell me what you think:

---
diff --git a/migration/migration.c b/migration/migration.c
index fc5c587..30d2b08 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -107,12 +107,6 @@ static int migration_maybe_pause(MigrationState *s,
 static void migrate_fd_cancel(MigrationState *s);
 static bool close_return_path_on_source(MigrationState *s);

-static void migration_downtime_start(MigrationState *s)
-{
-trace_vmstate_downtime_checkpoint("src-downtime-start");
-s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
-}
-
 static void migration_downtime_end(MigrationState *s)
 {
 int64_t now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
@@ -161,11 +155,20 @@ static gint page_request_addr_cmp(gconstpointer ap, 
gconstpo
 return (a > b) - (a < b);
 }

-int migration_stop_vm(RunState state)
+static int migration_stop_vm(MigrationState *s, RunState state)
 {
-int ret = vm_stop_force_state(state);
+int ret;
+
+trace_vmstate_downtime_checkpoint("src-downtime-start");
+s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
+s->vm_old_state = runstate_get();
+global_state_store();
+
+ret = vm_stop_force_state(state);

 trace_vmstate_downtime_checkpoint("src-vm-stopped");
+trace_migration_completion_vm_stop(ret);

 return ret;
 }
@@ -2454,10 +2457,7 @@ static int 

Re: [PATCH V3 12/13] vfio: allow cpr-reboot migration if suspended

2024-02-21 Thread Steven Sistare
Hi Alex, any comments or RB on this or patch 11?  The last few changes I am 
making for Peter will not change this patch.

- Steve

On 2/8/2024 1:54 PM, Steve Sistare wrote:
> Allow cpr-reboot for vfio if the guest is in the suspended runstate.  The
> guest drivers' suspend methods flush outstanding requests and re-initialize
> the devices, and thus there is no device state to save and restore.  The
> user is responsible for suspending the guest before initiating cpr, such as
> by issuing guest-suspend-ram to the qemu guest agent.
> 
> Relax the vfio blocker so it does not apply to cpr, and add a notifier that
> verifies the guest is suspended.
> 
> Signed-off-by: Steve Sistare 
> ---
>  hw/vfio/common.c  |  2 +-
>  hw/vfio/cpr.c | 20 
>  hw/vfio/migration.c   |  2 +-
>  include/hw/vfio/vfio-container-base.h |  1 +
>  4 files changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 059bfdc..ff88c3f 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -128,7 +128,7 @@ int vfio_block_multiple_devices_migration(VFIODevice 
> *vbasedev, Error **errp)
>  error_setg(_devices_migration_blocker,
> "Multiple VFIO devices migration is supported only if all of "
> "them support P2P migration");
> -ret = migrate_add_blocker(_devices_migration_blocker, errp);
> +ret = migrate_add_blocker_normal(_devices_migration_blocker, 
> errp);
>  
>  return ret;
>  }
> diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
> index 3bede54..392c2dd 100644
> --- a/hw/vfio/cpr.c
> +++ b/hw/vfio/cpr.c
> @@ -7,13 +7,33 @@
>  
>  #include "qemu/osdep.h"
>  #include "hw/vfio/vfio-common.h"
> +#include "migration/misc.h"
>  #include "qapi/error.h"
> +#include "sysemu/runstate.h"
> +
> +static int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier,
> +MigrationEvent *e, Error **errp)
> +{
> +if (e->type == MIG_EVENT_PRECOPY_SETUP &&
> +!runstate_check(RUN_STATE_SUSPENDED) && !vm_get_suspended()) {
> +
> +error_setg(errp,
> +"VFIO device only supports cpr-reboot for runstate suspended");
> +
> +return -1;
> +}
> +return 0;
> +}
>  
>  int vfio_cpr_register_container(VFIOContainerBase *bcontainer, Error **errp)
>  {
> +migration_add_notifier_mode(>cpr_reboot_notifier,
> +vfio_cpr_reboot_notifier,
> +MIG_MODE_CPR_REBOOT);
>  return 0;
>  }
>  
>  void vfio_cpr_unregister_container(VFIOContainerBase *bcontainer)
>  {
> +migration_remove_notifier(>cpr_reboot_notifier);
>  }
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 50140ed..2050ac8 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -889,7 +889,7 @@ static int vfio_block_migration(VFIODevice *vbasedev, 
> Error *err, Error **errp)
>  vbasedev->migration_blocker = error_copy(err);
>  error_free(err);
>  
> -return migrate_add_blocker(>migration_blocker, errp);
> +return migrate_add_blocker_normal(>migration_blocker, errp);
>  }
>  
>  /* -- */
> diff --git a/include/hw/vfio/vfio-container-base.h 
> b/include/hw/vfio/vfio-container-base.h
> index b2813b0..3582d5f 100644
> --- a/include/hw/vfio/vfio-container-base.h
> +++ b/include/hw/vfio/vfio-container-base.h
> @@ -49,6 +49,7 @@ typedef struct VFIOContainerBase {
>  QLIST_ENTRY(VFIOContainerBase) next;
>  QLIST_HEAD(, VFIODevice) device_list;
>  GList *iova_ranges;
> +NotifierWithReturn cpr_reboot_notifier;
>  } VFIOContainerBase;
>  
>  typedef struct VFIOGuestIOMMU {



Re: [PATCH V4 2/5] qapi: QAPI_LIST_LENGTH

2024-02-21 Thread Steven Sistare
On 2/21/2024 8:29 AM, Markus Armbruster wrote:
> Steve Sistare  writes:
> 
>> Signed-off-by: Steve Sistare 
>> Reviewed-by: Marc-André Lureau 
>> ---
>>  include/qapi/util.h | 13 +
>>  1 file changed, 13 insertions(+)
>>
>> diff --git a/include/qapi/util.h b/include/qapi/util.h
>> index 81a2b13..e1b8b1d 100644
>> --- a/include/qapi/util.h
>> +++ b/include/qapi/util.h
>> @@ -56,4 +56,17 @@ int parse_qapi_name(const char *name, bool complete);
>>  (tail) = &(*(tail))->next; \
>>  } while (0)
>>  
>> +/*
>> + * For any GenericList @list, return its length.
>> + */
>> +#define QAPI_LIST_LENGTH(list) \
>> +({ \
>> +int len = 0; \
> 
> size_t

ok.

>> +typeof(list) elem; \
> 
> Name this @tail, please.

ok.

>> +for (elem = list; elem != NULL; elem = elem->next) { \
>> +len++; \
>> +} \
>> +len; \
>> +})
>> +
>>  #endif
> 
> This is a macro instead of a function so users don't have to cast their
> FooList * to GenericList *.
> 
> The only user outside tests is strv_from_strList().  I'd be tempted to
> open-code it there and call it a day.  Or do you have more users in
> mind?

That's the only use.  If I make it private, I would still define it as
a static subroutine in util/strList.c, because it simplifies strv_from_strList.
IMO providing a public list length function or macro is pretty basic, but
I am not wedded to it.  Your call.

- Steve

> If we keep the macro, please align the backslashes like this:
> 
>#define QAPI_LIST_LENGTH(list)  \
>({  \
>int len = 0;\
>typeof(list) elem;  \
>for (elem = list; elem != NULL; elem = elem->next) {\
>len++;  \
>}   \
>len;\
>})
> 



Re: [PATCH V4 3/5] util: strv_from_strList

2024-02-21 Thread Steven Sistare
On 2/21/2024 8:14 AM, Markus Armbruster wrote:
> Steve Sistare  writes:
> 
>> Signed-off-by: Steve Sistare 
>> Reviewed-by: Marc-André Lureau 
>> ---
>>  include/qemu/strList.h |  6 ++
>>  util/strList.c | 14 ++
>>  2 files changed, 20 insertions(+)
>>
>> diff --git a/include/qemu/strList.h b/include/qemu/strList.h
>> index 010237f..4b86aa6 100644
>> --- a/include/qemu/strList.h
>> +++ b/include/qemu/strList.h
>> @@ -21,4 +21,10 @@
>>   */
>>  strList *strList_from_string(const char *in, const char *delim);
>>  
>> +/*
>> + * Produce and return a NULL-terminated array of strings from @args.
>> + * The result is g_malloc'd and all strings are g_strdup'd.
>> + */
>> +GStrv strv_from_strList(const strList *args);
>> +
>>  #endif
>> diff --git a/util/strList.c b/util/strList.c
>> index 7991de3..bad4187 100644
>> --- a/util/strList.c
>> +++ b/util/strList.c
>> @@ -22,3 +22,17 @@ strList *strList_from_string(const char *str, const char 
>> *delim)
>>  
>>  return res;
>>  }
>> +
>> +GStrv strv_from_strList(const strList *args)
> 
> Suggest to name the argument @list.
> 
>> +{
>> +const strList *arg;
> 
> Suggest to name this @tail.

ok.

>> +int i = 0;
>> +GStrv argv = g_new(char *, QAPI_LIST_LENGTH(args) + 1);
>> +
>> +for (arg = args; arg != NULL; arg = arg->next) {
>> +argv[i++] = g_strdup(arg->value);
>> +}
>> +argv[i] = NULL;
>> +
>> +return argv;
>> +}
> 
> Can we use char ** instread of GStrv?  I'd find that clearer.  For what
> it's worth, GLib documentation of functions like g_strsplit() doesn't
> use the GStrv typedef, either.

ok.

- Steve



Re: [PATCH V4 4/5] util: strList unit tests

2024-02-21 Thread Steven Sistare
On 2/21/2024 8:19 AM, Markus Armbruster wrote:
> Steve Sistare  writes:
> 
>> Signed-off-by: Steve Sistare 
>> Reviewed-by: Marc-André Lureau 
>> ---
>>  tests/unit/meson.build|  1 +
>>  tests/unit/test-strList.c | 80 
>> +++
>>  2 files changed, 81 insertions(+)
>>  create mode 100644 tests/unit/test-strList.c
>>
>> diff --git a/tests/unit/meson.build b/tests/unit/meson.build
>> index 69f9c05..113d12e 100644
>> --- a/tests/unit/meson.build
>> +++ b/tests/unit/meson.build
>> @@ -35,6 +35,7 @@ tests = {
>>'test-rcu-simpleq': [],
>>'test-rcu-tailq': [],
>>'test-rcu-slist': [],
>> +  'test-strList': [],
>>'test-qdist': [],
>>'test-qht': [],
>>'test-qtree': [],
>> diff --git a/tests/unit/test-strList.c b/tests/unit/test-strList.c
>> new file mode 100644
>> index 000..49a1cfd
>> --- /dev/null
>> +++ b/tests/unit/test-strList.c
>> @@ -0,0 +1,80 @@
>> +/*
>> + * Copyright (c) 2022 - 2024 Oracle and/or its affiliates.
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "qemu/strList.h"
>> +
>> +static strList *make_list(int length)
>> +{
>> +strList *head = 0, *list, **prev = 
>> +
>> +while (length--) {
>> +list = *prev = g_new0(strList, 1);
>> +list->value = g_strdup("aaa");
>> +prev = >next;
>> +}
>> +return head;
>> +}
>> +
>> +static void test_length(void)
>> +{
>> +strList *list;
>> +int i;
>> +
>> +for (i = 0; i < 5; i++) {
>> +list = make_list(i);
>> +g_assert_cmpint(i, ==, QAPI_LIST_LENGTH(list));
>> +qapi_free_strList(list);
>> +}
>> +}
>> +
>> +struct {
>> +const char *string;
>> +const char *delim;
>> +const char *args[5];
>> +} list_data[] = {
>> +{ 0, ",", { 0 } },
>> +{ "", ",", { 0 } },
>> +{ "a", ",", { "a", 0 } },
>> +{ "a,b", ",", { "a", "b", 0 } },
>> +{ "a,b,c", ",", { "a", "b", "c", 0 } },
>> +{ "first last", " ", { "first", "last", 0 } },
>> +{ "a:", ":", { "a", "", 0 } },
>> +{ "a::b", ":", { "a", "", "b", 0 } },
>> +{ ":", ":", { "", "", 0 } },
>> +{ ":a", ":", { "", "a", 0 } },
>> +{ "::a", ":", { "", "", "a", 0 } },
>> +};
> 
> NULL instead of 0, please.

ok.

>> +
>> +static void test_strv(void)
>> +{
>> +int i, j;
>> +const char **expect;
>> +strList *list;
>> +GStrv args;
> 
> I'd prefer char **argv;

ok.

>> +
>> +for (i = 0; i < ARRAY_SIZE(list_data); i++) {
>> +expect = list_data[i].args;
>> +list = strList_from_string(list_data[i].string, list_data[i].delim);
>> +args = strv_from_strList(list);
>> +qapi_free_strList(list);
>> +for (j = 0; expect[j] && args[j]; j++) {
> 
> Loop stops when either array element is null.  That's wrong, we need to
> exhaust both arrays: || instead of &&.

|| is not safe.  After one array is exhausted, [j] will be out of bounds for
the other.  The g_assert_null calls guarantee the arrays are the same length
and all elements have been compared.

- Steve

>> +g_assert_cmpstr(expect[j], ==, args[j]);
>> +}
>> +g_assert_null(expect[j]);
>> +g_assert_null(args[j]);
>> +g_strfreev(args);
>> +}
>> +}
>> +
>> +int main(int argc, char **argv)
>> +{
>> +g_test_init(, , NULL);
>> +g_test_add_func("/test-string/length", test_length);
>> +g_test_add_func("/test-string/strv", test_strv);
>> +return g_test_run();
>> +}
> 



Re: [PATCH V4 1/5] util: strList_from_string

2024-02-21 Thread Steven Sistare
On 2/21/2024 8:29 AM, Markus Armbruster wrote:
> I apologize for the lateness of my review.

No problem.  Thanks for the review.

> Steve Sistare  writes:
> 
>> Generalize hmp_split_at_comma() to take any delimiter string, rename
>> as strList_from_string(), and move it to util/strList.c.
>>
>> No functional change.
>>
>> Signed-off-by: Steve Sistare 
> 
> I can't see an actual use of generalized delimiters outside tests in
> this series.  Do you have uses?

In this series, it is called from hmp_announce_self and stats_filter; 
those were formerly calls to hmp_split_at_comma.

In my live update cpr-exec series, there is an additional call site, with a
space delimiter instead of comma.  Live update V9 is posted but is old and 
obsolete.  
I will post V10 soon, but I am hoping you can pull this series first, so I can 
whittle down my pending patches and omit these from V10.

>> ---
>>  include/monitor/hmp.h  |  1 -
>>  include/qemu/strList.h | 24 
>>  monitor/hmp-cmds.c | 19 ---
>>  net/net-hmp-cmds.c |  3 ++-
>>  stats/stats-hmp-cmds.c |  3 ++-
>>  util/meson.build   |  1 +
>>  util/strList.c | 24 
>>  7 files changed, 53 insertions(+), 22 deletions(-)
>>  create mode 100644 include/qemu/strList.h
>>  create mode 100644 util/strList.c
>>
>> diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
>> index 13f9a2d..2df661e 100644
>> --- a/include/monitor/hmp.h
>> +++ b/include/monitor/hmp.h
>> @@ -19,7 +19,6 @@
>>  
>>  bool hmp_handle_error(Monitor *mon, Error *err);
>>  void hmp_help_cmd(Monitor *mon, const char *name);
>> -strList *hmp_split_at_comma(const char *str);
>>  
>>  void hmp_info_name(Monitor *mon, const QDict *qdict);
>>  void hmp_info_version(Monitor *mon, const QDict *qdict);
>> diff --git a/include/qemu/strList.h b/include/qemu/strList.h
>> new file mode 100644
>> index 000..010237f
>> --- /dev/null
>> +++ b/include/qemu/strList.h
>> @@ -0,0 +1,24 @@
>> +/*
>> + * Copyright (c) 2022 - 2024 Oracle and/or its affiliates.
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + */
>> +
>> +#ifndef QEMU_STR_LIST_H
>> +#define QEMU_STR_LIST_H
>> +
>> +#include "qapi/qapi-builtin-types.h"
>> +
>> +/*
>> + * Break @in into a strList using the delimiter string @delim.
>> + * The delimiter is not included in the result.
>> + * Return NULL if @in is NULL or an empty string.
>> + * A leading, trailing, or consecutive delimiter produces an
>> + * empty string at that position in the output.
>> + * All strings are g_strdup'd, and the result can be freed
>> + * using qapi_free_strList.
>> + */
>> +strList *strList_from_string(const char *in, const char *delim);
> 
> The function name no longer tells us explicitly what the function does:
> splitting the string.

The first sentence does not say it?
  "Break @in into a strList using the delimiter string @delim"

Would you prefer this?
  "Split string @in into a strList using the delimiter string @delim"

- Steve

>> +#endif
>> diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
>> index 871898a..66b68a0 100644
>> --- a/monitor/hmp-cmds.c
>> +++ b/monitor/hmp-cmds.c
>> @@ -38,25 +38,6 @@ bool hmp_handle_error(Monitor *mon, Error *err)
>>  return false;
>>  }
>>  
>> -/*
>> - * Split @str at comma.
>> - * A null @str defaults to "".
>> - */
>> -strList *hmp_split_at_comma(const char *str)
>> -{
>> -char **split = g_strsplit(str ?: "", ",", -1);
>> -strList *res = NULL;
>> -strList **tail = 
>> -int i;
>> -
>> -for (i = 0; split[i]; i++) {
>> -QAPI_LIST_APPEND(tail, split[i]);
>> -}
>> -
>> -g_free(split);
>> -return res;
>> -}
>> -
>>  void hmp_info_name(Monitor *mon, const QDict *qdict)
>>  {
>>  NameInfo *info;
>> diff --git a/net/net-hmp-cmds.c b/net/net-hmp-cmds.c
>> index 41d326b..e893801 100644
>> --- a/net/net-hmp-cmds.c
>> +++ b/net/net-hmp-cmds.c
>> @@ -26,6 +26,7 @@
>>  #include "qemu/config-file.h"
>>  #include "qemu/help_option.h"
>>  #include "qemu/option.h"
>> +#include "qemu/strList.h"
>>  
>>  void hmp_info_network(Monitor *mon, const QDict *qdict)
>>  {
>> @@ -72,7 +73,7 @@ void hmp_announce_self(Monitor *mon, const QDict *qdict)
>>  migrate_announce_params());
>>  
>>  qapi_free_strList(params->interfaces);
>> -params->interfaces = hmp_split_at_comma(interfaces_str);
>> +params->interfaces = strList_from_string(interfaces_str, ",");
>>  params->has_interfaces = params->interfaces != NULL;
>>  params->id = g_strdup(id);
>>  qmp_announce_self(params, NULL);
>> diff --git a/stats/stats-hmp-cmds.c b/stats/stats-hmp-cmds.c
>> index 1f91bf8..428c0e6 100644
>> --- a/stats/stats-hmp-cmds.c
>> +++ b/stats/stats-hmp-cmds.c
>> @@ -10,6 +10,7 @@
>>  #include "monitor/hmp.h"
>>  #include "monitor/monitor.h"
>>  #include "qemu/cutils.h"
>> +#include 

Re: [PATCH V4 5/5] migration: simplify exec migration functions

2024-02-21 Thread Steven Sistare
On 2/21/2024 10:54 AM, Fabiano Rosas wrote:
> Fabiano Rosas  writes:
> 
>> Steve Sistare  writes:
>>
>>> Simplify the exec migration code by using list utility functions.
>>>
>>> As a side effect, this also fixes a minor memory leak.  On function return,
>>> "g_auto(GStrv) argv" frees argv and each element, which is wrong, because
>>> the function does not own the individual elements.  To compensate, the code
>>> uses g_steal_pointer which NULLs argv and prevents the destructor from
>>> running, but argv is leaked.
>>>
>>> Fixes: cbab4face57b ("migration: convert exec backend ...")
>>> Signed-off-by: Steve Sistare 
>>
>> Reviewed-by: Fabiano Rosas 
> 
> You'll have to reintroduce the qemu/cutils.h include:
> 
> ../migration/exec.c: In function 'exec_get_cmd_path':
> ../migration/exec.c:37:5: error: implicit declaration of function 'pstrcat'; 
> did you mean 'strcat'? [-Werror=implicit-function-declaration]
>37 | pstrcat(detected_path, MAX_PATH, "\\cmd.exe");
>   | ^~~
>   | strcat
> ../migration/exec.c:37:5: error: nested extern declaration of 'pstrcat' 
> [-Werror=nested-externs]

Thanks, I will rebase to the tip and verify all is well before I post V5.

- Steve



Re: [PATCH V3 00/13] allow cpr-reboot for vfio

2024-02-20 Thread Steven Sistare
On 2/20/2024 2:49 AM, Peter Xu wrote:
> On Thu, Feb 08, 2024 at 10:53:53AM -0800, Steve Sistare wrote:
>> Allow cpr-reboot for vfio if the guest is in the suspended runstate.  The
>> guest drivers' suspend methods flush outstanding requests and re-initialize
>> the devices, and thus there is no device state to save and restore.  The
>> user is responsible for suspending the guest before initiating cpr, such as
>> by issuing guest-suspend-ram to the qemu guest agent.
>>
>> Most of the patches in this series enhance migration notifiers so they can
>> return an error status and message.  The last few patches register a notifier
>> for vfio that returns an error if the guest is not suspended.
>>
>> Changes in V3:
>>   * update to tip, add RB's
>>   * replace MigrationStatus with new enum MigrationEventType
>>   * simplify migrate_fd_connect error recovery
>>   * support vfio iommufd containers
>>   * add patches:
>>   migration: stop vm for cpr
>>   migration: update cpr-reboot description
> 
> This doesn't apply to master anymore, please rebase when repost, thanks.

Will do.  Before I do, any comments on "migration: update cpr-reboot 
description"?
After we converge on that short description, I will submit a longer treatment in
docs/devel/migration, which I see you have recently populated.

- Steve



Re: [PATCH V3 10/13] migration: stop vm for cpr

2024-02-20 Thread Steven Sistare
On 2/20/2024 2:33 AM, Peter Xu wrote:
> On Thu, Feb 08, 2024 at 10:54:03AM -0800, Steve Sistare wrote:
>> When migration for cpr is initiated, stop the vm and set state
>> RUN_STATE_FINISH_MIGRATE before ram is saved.  This eliminates the
>> possibility of ram and device state being out of sync, and guarantees
>> that a guest in the suspended state remains suspended, because qmp_cont
>> rejects a cont command in the RUN_STATE_FINISH_MIGRATE state.
>>
>> Signed-off-by: Steve Sistare 
>> ---
>>  include/migration/misc.h |  1 +
>>  migration/migration.c| 32 +---
>>  2 files changed, 26 insertions(+), 7 deletions(-)
>>
>> diff --git a/include/migration/misc.h b/include/migration/misc.h
>> index 6dc234b..54c99a3 100644
>> --- a/include/migration/misc.h
>> +++ b/include/migration/misc.h
>> @@ -60,6 +60,7 @@ void migration_object_init(void);
>>  void migration_shutdown(void);
>>  bool migration_is_idle(void);
>>  bool migration_is_active(MigrationState *);
>> +bool migrate_mode_is_cpr(MigrationState *);
>>  
>>  typedef enum MigrationEventType {
>>  MIG_EVENT_PRECOPY_SETUP,
>> diff --git a/migration/migration.c b/migration/migration.c
>> index d1fce9e..fc5c587 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -1603,6 +1603,11 @@ bool migration_is_active(MigrationState *s)
>>  s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
>>  }
>>  
>> +bool migrate_mode_is_cpr(MigrationState *s)
>> +{
>> +return s->parameters.mode == MIG_MODE_CPR_REBOOT;
>> +}
>> +
>>  int migrate_init(MigrationState *s, Error **errp)
>>  {
>>  int ret;
>> @@ -2651,13 +2656,14 @@ static int 
>> migration_completion_precopy(MigrationState *s,
>>  bql_lock();
>>  migration_downtime_start(s);
>>  
>> -s->vm_old_state = runstate_get();
>> -global_state_store();
>> -
>> -ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE);
>> -trace_migration_completion_vm_stop(ret);
>> -if (ret < 0) {
>> -goto out_unlock;
>> +if (!migrate_mode_is_cpr(s)) {
>> +s->vm_old_state = runstate_get();
>> +global_state_store();
>> +ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE);
>> +trace_migration_completion_vm_stop(ret);
>> +if (ret < 0) {
>> +goto out_unlock;
>> +}
>>  }
>>  
>>  ret = migration_maybe_pause(s, current_active_state,
>> @@ -3576,6 +3582,7 @@ void migrate_fd_connect(MigrationState *s, Error 
>> *error_in)
>>  Error *local_err = NULL;
>>  uint64_t rate_limit;
>>  bool resume = s->state == MIGRATION_STATUS_POSTCOPY_PAUSED;
>> +int ret;
>>  
>>  /*
>>   * If there's a previous error, free it and prepare for another one.
>> @@ -3651,6 +3658,17 @@ void migrate_fd_connect(MigrationState *s, Error 
>> *error_in)
>>  goto fail;
>>  }
>>  
>> +if (migrate_mode_is_cpr(s)) {
>> +s->vm_old_state = runstate_get();
>> +global_state_store();
>> +ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE);
>> +trace_migration_completion_vm_stop(ret);
>> +if (ret < 0) {
>> +error_setg(_err, "migration_stop_vm failed, error %d", 
>> -ret);
>> +goto fail;
>> +}
>> +}
> 
> Could we have a helper function for the shared codes?

Will do.

> How about postcopy?  I know it's nonsense to enable postcopy for cpr.. but
> iiuc we don't yet forbid an user doing so.  Maybe we should?

I will assert that mode != cpr in the postcopy path instead, to prevent any 
nonsense.

- Steve

>> +
>>  if (migrate_background_snapshot()) {
>>  qemu_thread_create(>thread, "bg_snapshot",
>>  bg_migration_thread, s, QEMU_THREAD_JOINABLE);
>> -- 
>> 1.8.3.1
>>
> 



Re: [PATCH V3 09/13] migration: notifier error checking

2024-02-20 Thread Steven Sistare
On 2/20/2024 2:12 AM, Peter Xu wrote:
> On Thu, Feb 08, 2024 at 10:54:02AM -0800, Steve Sistare wrote:
>> Check the status returned by migration notifiers and report errors.
>> If notifiers fail, call the notifiers again so they can clean up.
>> None of the notifiers return an error status at this time.
>>
>> Signed-off-by: Steve Sistare 
>> ---
>>  include/migration/misc.h |  3 ++-
>>  migration/migration.c| 40 +---
>>  2 files changed, 31 insertions(+), 12 deletions(-)
>>
>> diff --git a/include/migration/misc.h b/include/migration/misc.h
>> index 0ea1902..6dc234b 100644
>> --- a/include/migration/misc.h
>> +++ b/include/migration/misc.h
>> @@ -82,7 +82,8 @@ void migration_add_notifier(NotifierWithReturn *notify,
>>  void migration_add_notifier_mode(NotifierWithReturn *notify,
>>   MigrationNotifyFunc func, MigMode mode);
>>  void migration_remove_notifier(NotifierWithReturn *notify);
>> -void migration_call_notifiers(MigrationState *s, MigrationEventType type);
>> +int migration_call_notifiers(MigrationState *s, MigrationEventType type,
>> + Error **errp);
>>  bool migration_in_setup(MigrationState *);
>>  bool migration_has_finished(MigrationState *);
>>  bool migration_has_failed(MigrationState *);
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 01d8867..d1fce9e 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -1318,6 +1318,8 @@ void migrate_set_state(int *state, int old_state, int 
>> new_state)
>>  
>>  static void migrate_fd_cleanup(MigrationState *s)
>>  {
>> +Error *local_err = NULL;
>> +
>>  g_free(s->hostname);
>>  s->hostname = NULL;
>>  json_writer_free(s->vmdesc);
>> @@ -1362,13 +1364,23 @@ static void migrate_fd_cleanup(MigrationState *s)
>>MIGRATION_STATUS_CANCELLED);
>>  }
>>  
>> +if (!migration_has_failed(s) &&
>> +migration_call_notifiers(s, MIG_EVENT_PRECOPY_DONE, _err)) {
>> +
>> +migrate_set_state(>state, s->state, MIGRATION_STATUS_FAILED);
>> +migrate_set_error(s, local_err);
>> +error_free(local_err);
>> +}
>> +
>>  if (s->error) {
>>  /* It is used on info migrate.  We can't free it */
>>  error_report_err(error_copy(s->error));
>>  }
>> -migration_call_notifiers(s, s->state == MIGRATION_STATUS_COMPLETED ?
>> - MIG_EVENT_PRECOPY_DONE :
>> - MIG_EVENT_PRECOPY_FAILED);
>> +
>> +if (migration_has_failed(s)) {
>> +migration_call_notifiers(s, MIG_EVENT_PRECOPY_FAILED, NULL);
>> +}
> 
> AFAIU, the whole point of such split is, allowing DONE notifies to fail too
> and then if that happens we can invoke FAIL notifiers again.

Correct.

> 
> Perhaps we can avoid that complexity, but rather document only SETUP
> notifiers can fail?
> 
> The problem is that failing a notifier at this stage (if migration already
> finished) can already be too late; dest QEMU can already have started
> running, so no way to roll back.  We can document that, check and assert
> for !SETUP cases to make sure error is never hit?

Makes sense. I will modify the patch as you suggest.

- Steve

>> +
>>  block_cleanup_parameters();
>>  yank_unregister_instance(MIGRATION_YANK_INSTANCE);
>>  }
>> @@ -1481,13 +1493,15 @@ void migration_remove_notifier(NotifierWithReturn 
>> *notify)
>>  }
>>  }
>>  
>> -void migration_call_notifiers(MigrationState *s, MigrationEventType type)
>> +int migration_call_notifiers(MigrationState *s, MigrationEventType type,
>> + Error **errp)
>>  {
>>  MigMode mode = s->parameters.mode;
>>  MigrationEvent e;
>>  
>>  e.type = type;
>> -notifier_with_return_list_notify(_state_notifiers[mode], , 
>> 0);
>> +return 
>> notifier_with_return_list_notify(_state_notifiers[mode],
>> +, errp);
>>  }
>>  
>>  bool migration_in_setup(MigrationState *s)
>> @@ -2535,7 +2549,9 @@ static int postcopy_start(MigrationState *ms, Error 
>> **errp)
>>   * at the transition to postcopy and after the device state; in 
>> particular
>>   * spice needs to trigger a transition now
>>   */
>> -migration_call_notifiers(ms, MIG_EVENT_PRECOPY_DONE);
>> +if (migration_call_notifiers(ms, MIG_EVENT_PRECOPY_DONE, errp)) {
>> +goto fail;
>> +}
>>  
>>  migration_downtime_end(ms);
>>  
>> @@ -2555,11 +2571,10 @@ static int postcopy_start(MigrationState *ms, Error 
>> **errp)
>>  
>>  ret = qemu_file_get_error(ms->to_dst_file);
>>  if (ret) {
>> -error_setg(errp, "postcopy_start: Migration stream errored");
>> -migrate_set_state(>state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
>> -  MIGRATION_STATUS_FAILED);
>> +error_setg_errno(errp, -ret, "postcopy_start: Migration stream 
>> error");
>> +

Re: [PATCH V3 09/13] migration: notifier error checking

2024-02-12 Thread Steven Sistare
On 2/12/2024 4:24 AM, David Hildenbrand wrote:
> On 08.02.24 19:54, Steve Sistare wrote:
>> Check the status returned by migration notifiers and report errors.
>> If notifiers fail, call the notifiers again so they can clean up.
> 
> IIUC, if any of the notifiers will actually start to fail, say, during 
> MIG_EVENT_PRECOPY_SETUP, you will call MIG_EVENT_PRECOPY_FAILED on all 
> notifiers.
> 
> That will include notifiers that have never seen a MIG_EVENT_PRECOPY_SETUP 
> call.

Correct.

> Is that what we expect notifiers to be able to deal with? Can we document 
> that?

The notifiers have always needed to handle failure without knowing the previous 
migration
states, and are robust about unwinding their own internal state.  I will 
document it.

Thanks for all the RBs.

- Steve



Re: [PATCH V2 11/11] vfio: allow cpr-reboot migration if suspended

2024-01-17 Thread Steven Sistare
On 1/17/2024 2:12 AM, Peter Xu wrote:
> On Tue, Jan 16, 2024 at 03:37:33PM -0500, Steven Sistare wrote:
>> On 1/15/2024 2:33 AM, Peter Xu wrote:
>>> On Fri, Jan 12, 2024 at 07:05:10AM -0800, Steve Sistare wrote:
>>>> Allow cpr-reboot for vfio if the guest is in the suspended runstate.  The
>>>> guest drivers' suspend methods flush outstanding requests and re-initialize
>>>> the devices, and thus there is no device state to save and restore.  The
>>>> user is responsible for suspending the guest before initiating cpr, such as
>>>> by issuing guest-suspend-ram to the qemu guest agent.
>>>>
>>>> Relax the vfio blocker so it does not apply to cpr, and add a notifier that
>>>> verifies the guest is suspended.  Skip dirty page tracking, which is N/A 
>>>> for
>>>> cpr, to avoid ioctl errors.
>>>>
>>>> Signed-off-by: Steve Sistare 
>>>> ---
>>>>  hw/vfio/common.c  |  2 +-
>>>>  hw/vfio/cpr.c | 20 
>>>>  hw/vfio/migration.c   |  2 +-
>>>>  include/hw/vfio/vfio-common.h |  1 +
>>>>  migration/ram.c   |  9 +
>>>>  5 files changed, 28 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>>> index 0b3352f..09af934 100644
>>>> --- a/hw/vfio/common.c
>>>> +++ b/hw/vfio/common.c
>>>> @@ -128,7 +128,7 @@ int vfio_block_multiple_devices_migration(VFIODevice 
>>>> *vbasedev, Error **errp)
>>>>  error_setg(_devices_migration_blocker,
>>>> "Multiple VFIO devices migration is supported only if all 
>>>> of "
>>>> "them support P2P migration");
>>>> -ret = migrate_add_blocker(_devices_migration_blocker, errp);
>>>> +ret = migrate_add_blocker_normal(_devices_migration_blocker, 
>>>> errp);
>>>>  
>>>>  return ret;
>>>>  }
>>>> diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
>>>> index bbd1c7a..9f4b1fe 100644
>>>> --- a/hw/vfio/cpr.c
>>>> +++ b/hw/vfio/cpr.c
>>>> @@ -7,13 +7,33 @@
>>>>  
>>>>  #include "qemu/osdep.h"
>>>>  #include "hw/vfio/vfio-common.h"
>>>> +#include "migration/misc.h"
>>>>  #include "qapi/error.h"
>>>> +#include "sysemu/runstate.h"
>>>> +
>>>> +static int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier,
>>>> +MigrationEvent *e, Error **errp)
>>>> +{
>>>> +if (e->state == MIGRATION_STATUS_SETUP &&
>>>> +!runstate_check(RUN_STATE_SUSPENDED)) {
>>>> +
>>>> +error_setg(errp,
>>>> +"VFIO device only supports cpr-reboot for runstate 
>>>> suspended");
>>>> +
>>>> +return -1;
>>>> +}
>>>
>>> What happens if the guest is suspended during SETUP, but then quickly waked
>>> up before CPR migration completes?
>>
>> That is a management layer bug -- we told them the VM must be suspended.
>> However, I will reject a wakeup request if migration is active and mode is 
>> cpr.
> 
> Yes it needs to be enforced if it is required by cpr-reboot.  QEMU
> hopefully should never rely on mgmt app behavior.
> 
>>
>>>> +return 0;
>>>> +}
>>>>  
>>>>  int vfio_cpr_register_container(VFIOContainer *container, Error **errp)
>>>>  {
>>>> +migration_add_notifier_mode(>cpr_reboot_notifier,
>>>> +vfio_cpr_reboot_notifier,
>>>> +MIG_MODE_CPR_REBOOT);
>>>>  return 0;
>>>>  }
>>>>  
>>>>  void vfio_cpr_unregister_container(VFIOContainer *container)
>>>>  {
>>>> +migration_remove_notifier(>cpr_reboot_notifier);
>>>>  }
>>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>>> index 534fddf..488905d 100644
>>>> --- a/hw/vfio/migration.c
>>>> +++ b/hw/vfio/migration.c
>>>> @@ -895,7 +895,7 @@ static int vfio_block_migration(VFIODevice *vbasedev, 
>>>> Error *err, Error **errp)
>>>>  vbasedev->migration_blocker = error_copy(err);
>>>>  error

Re: [PATCH V2 11/11] vfio: allow cpr-reboot migration if suspended

2024-01-16 Thread Steven Sistare



On 1/16/2024 3:37 PM, Steven Sistare wrote:
> On 1/15/2024 2:33 AM, Peter Xu wrote:
>> On Fri, Jan 12, 2024 at 07:05:10AM -0800, Steve Sistare wrote:
>>> Allow cpr-reboot for vfio if the guest is in the suspended runstate.  The
>>> guest drivers' suspend methods flush outstanding requests and re-initialize
>>> the devices, and thus there is no device state to save and restore.  The
>>> user is responsible for suspending the guest before initiating cpr, such as
>>> by issuing guest-suspend-ram to the qemu guest agent.
>>>
>>> Relax the vfio blocker so it does not apply to cpr, and add a notifier that
>>> verifies the guest is suspended.  Skip dirty page tracking, which is N/A for
>>> cpr, to avoid ioctl errors.
>>>
>>> Signed-off-by: Steve Sistare 
>>> ---
>>>  hw/vfio/common.c  |  2 +-
>>>  hw/vfio/cpr.c | 20 
>>>  hw/vfio/migration.c   |  2 +-
>>>  include/hw/vfio/vfio-common.h |  1 +
>>>  migration/ram.c   |  9 +
>>>  5 files changed, 28 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>> index 0b3352f..09af934 100644
>>> --- a/hw/vfio/common.c
>>> +++ b/hw/vfio/common.c
>>> @@ -128,7 +128,7 @@ int vfio_block_multiple_devices_migration(VFIODevice 
>>> *vbasedev, Error **errp)
>>>  error_setg(_devices_migration_blocker,
>>> "Multiple VFIO devices migration is supported only if all 
>>> of "
>>> "them support P2P migration");
>>> -ret = migrate_add_blocker(_devices_migration_blocker, errp);
>>> +ret = migrate_add_blocker_normal(_devices_migration_blocker, 
>>> errp);
>>>  
>>>  return ret;
>>>  }
>>> diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
>>> index bbd1c7a..9f4b1fe 100644
>>> --- a/hw/vfio/cpr.c
>>> +++ b/hw/vfio/cpr.c
>>> @@ -7,13 +7,33 @@
>>>  
>>>  #include "qemu/osdep.h"
>>>  #include "hw/vfio/vfio-common.h"
>>> +#include "migration/misc.h"
>>>  #include "qapi/error.h"
>>> +#include "sysemu/runstate.h"
>>> +
>>> +static int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier,
>>> +MigrationEvent *e, Error **errp)
>>> +{
>>> +if (e->state == MIGRATION_STATUS_SETUP &&
>>> +!runstate_check(RUN_STATE_SUSPENDED)) {
>>> +
>>> +error_setg(errp,
>>> +"VFIO device only supports cpr-reboot for runstate suspended");
>>> +
>>> +return -1;
>>> +}
>>
>> What happens if the guest is suspended during SETUP, but then quickly waked
>> up before CPR migration completes?
> 
> That is a management layer bug -- we told them the VM must be suspended.
> However, I will reject a wakeup request if migration is active and mode is 
> cpr.
> 
>>> +return 0;
>>> +}
>>>  
>>>  int vfio_cpr_register_container(VFIOContainer *container, Error **errp)
>>>  {
>>> +migration_add_notifier_mode(>cpr_reboot_notifier,
>>> +vfio_cpr_reboot_notifier,
>>> +MIG_MODE_CPR_REBOOT);
>>>  return 0;
>>>  }
>>>  
>>>  void vfio_cpr_unregister_container(VFIOContainer *container)
>>>  {
>>> +migration_remove_notifier(>cpr_reboot_notifier);
>>>  }
>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>> index 534fddf..488905d 100644
>>> --- a/hw/vfio/migration.c
>>> +++ b/hw/vfio/migration.c
>>> @@ -895,7 +895,7 @@ static int vfio_block_migration(VFIODevice *vbasedev, 
>>> Error *err, Error **errp)
>>>  vbasedev->migration_blocker = error_copy(err);
>>>  error_free(err);
>>>  
>>> -return migrate_add_blocker(>migration_blocker, errp);
>>> +return migrate_add_blocker_normal(>migration_blocker, errp);
>>>  }
>>>  
>>>  /* -- 
>>> */
>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>> index 1add5b7..7a46e24 100644
>>> --- a/include/hw/vfio/vfio-common.h
>>> +++ b/include/hw/vfio/vfio-common.h
>>> @@ -78,6 +78,7 @@ struct VFIOGroup

Re: [PATCH V2 11/11] vfio: allow cpr-reboot migration if suspended

2024-01-16 Thread Steven Sistare
On 1/15/2024 2:33 AM, Peter Xu wrote:
> On Fri, Jan 12, 2024 at 07:05:10AM -0800, Steve Sistare wrote:
>> Allow cpr-reboot for vfio if the guest is in the suspended runstate.  The
>> guest drivers' suspend methods flush outstanding requests and re-initialize
>> the devices, and thus there is no device state to save and restore.  The
>> user is responsible for suspending the guest before initiating cpr, such as
>> by issuing guest-suspend-ram to the qemu guest agent.
>>
>> Relax the vfio blocker so it does not apply to cpr, and add a notifier that
>> verifies the guest is suspended.  Skip dirty page tracking, which is N/A for
>> cpr, to avoid ioctl errors.
>>
>> Signed-off-by: Steve Sistare 
>> ---
>>  hw/vfio/common.c  |  2 +-
>>  hw/vfio/cpr.c | 20 
>>  hw/vfio/migration.c   |  2 +-
>>  include/hw/vfio/vfio-common.h |  1 +
>>  migration/ram.c   |  9 +
>>  5 files changed, 28 insertions(+), 6 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 0b3352f..09af934 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -128,7 +128,7 @@ int vfio_block_multiple_devices_migration(VFIODevice 
>> *vbasedev, Error **errp)
>>  error_setg(_devices_migration_blocker,
>> "Multiple VFIO devices migration is supported only if all of 
>> "
>> "them support P2P migration");
>> -ret = migrate_add_blocker(_devices_migration_blocker, errp);
>> +ret = migrate_add_blocker_normal(_devices_migration_blocker, 
>> errp);
>>  
>>  return ret;
>>  }
>> diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
>> index bbd1c7a..9f4b1fe 100644
>> --- a/hw/vfio/cpr.c
>> +++ b/hw/vfio/cpr.c
>> @@ -7,13 +7,33 @@
>>  
>>  #include "qemu/osdep.h"
>>  #include "hw/vfio/vfio-common.h"
>> +#include "migration/misc.h"
>>  #include "qapi/error.h"
>> +#include "sysemu/runstate.h"
>> +
>> +static int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier,
>> +MigrationEvent *e, Error **errp)
>> +{
>> +if (e->state == MIGRATION_STATUS_SETUP &&
>> +!runstate_check(RUN_STATE_SUSPENDED)) {
>> +
>> +error_setg(errp,
>> +"VFIO device only supports cpr-reboot for runstate suspended");
>> +
>> +return -1;
>> +}
> 
> What happens if the guest is suspended during SETUP, but then quickly waked
> up before CPR migration completes?

That is a management layer bug -- we told them the VM must be suspended.
However, I will reject a wakeup request if migration is active and mode is cpr.

>> +return 0;
>> +}
>>  
>>  int vfio_cpr_register_container(VFIOContainer *container, Error **errp)
>>  {
>> +migration_add_notifier_mode(>cpr_reboot_notifier,
>> +vfio_cpr_reboot_notifier,
>> +MIG_MODE_CPR_REBOOT);
>>  return 0;
>>  }
>>  
>>  void vfio_cpr_unregister_container(VFIOContainer *container)
>>  {
>> +migration_remove_notifier(>cpr_reboot_notifier);
>>  }
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index 534fddf..488905d 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -895,7 +895,7 @@ static int vfio_block_migration(VFIODevice *vbasedev, 
>> Error *err, Error **errp)
>>  vbasedev->migration_blocker = error_copy(err);
>>  error_free(err);
>>  
>> -return migrate_add_blocker(>migration_blocker, errp);
>> +return migrate_add_blocker_normal(>migration_blocker, errp);
>>  }
>>  
>>  /* -- */
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index 1add5b7..7a46e24 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -78,6 +78,7 @@ struct VFIOGroup;
>>  typedef struct VFIOContainer {
>>  VFIOContainerBase bcontainer;
>>  int fd; /* /dev/vfio/vfio, empowered by the attached groups */
>> +NotifierWithReturn cpr_reboot_notifier;
>>  unsigned iommu_type;
>>  QLIST_HEAD(, VFIOGroup) group_list;
>>  } VFIOContainer;
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 1923366..44ad324 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -2392,8 +2392,8 @@ static void ram_save_cleanup(void *opaque)
>>  RAMState **rsp = opaque;
>>  RAMBlock *block;
>>  
>> -/* We don't use dirty log with background snapshots */
>> -if (!migrate_background_snapshot()) {
>> +/* We don't use dirty log with background snapshots or cpr */
>> +if (!migrate_background_snapshot() && migrate_mode() == 
>> MIG_MODE_NORMAL) {
> 
> Same question here, on what happens if the user resumes the VM before
> migration completes?  IIUC shared-ram is not required, then it means if
> that happens the cpr migration image can contain corrupted data, and that
> may be a problem.
> 
> Background snapshot is special in that it relies on totally different
> tracking facilities 

Re: [PATCH V2 04/11] migration: remove migration_in_postcopy parameter

2024-01-16 Thread Steven Sistare
On 1/15/2024 1:48 AM, Peter Xu wrote:
> On Fri, Jan 12, 2024 at 07:05:03AM -0800, Steve Sistare wrote:
>>  bool migration_in_incoming_postcopy(void)
>> diff --git a/ui/spice-core.c b/ui/spice-core.c
>> index b3cd229..e43a93f 100644
>> --- a/ui/spice-core.c
>> +++ b/ui/spice-core.c
>> @@ -580,7 +580,7 @@ static int migration_state_notifier(NotifierWithReturn 
>> *notifier,
>>  if (migration_in_setup(s)) {
>>  spice_server_migrate_start(spice_server);
>>  } else if (migration_has_finished(s) ||
>> -   migration_in_postcopy_after_devices(s)) {
>> +   migration_in_postcopy_after_devices()) {
> 
> This can be a reply also to your other email: my previous suggestion of
> using PRECOPY_DONE should apply here, where we can convert this chunk into:
> 
>   } else if (event == MIG_EVENT_PRECOPY_DONE) {...}
> 
> Because PRECOPY_DONE should also cover the notification from
> postcopy_start(), then we can drop migration_in_postcopy_after_devices()
> completely, I think.

Yes, that works.  I will define and use MIG_EVENT_PRECOPY_DONE and friends in 
"MigrationEvent for notifiers".

- Steve



Re: [PATCH V2 00/11] allow cpr-reboot for vfio

2024-01-16 Thread Steven Sistare
On 1/12/2024 4:38 PM, Alex Williamson wrote:
> On Fri, 12 Jan 2024 07:04:59 -0800
> Steve Sistare  wrote:
> 
>> Allow cpr-reboot for vfio if the guest is in the suspended runstate.  The
>> guest drivers' suspend methods flush outstanding requests and re-initialize
>> the devices, and thus there is no device state to save and restore.  The
>> user is responsible for suspending the guest before initiating cpr, such as
>> by issuing guest-suspend-ram to the qemu guest agent.
>>
>> Most of the patches in this series enhance migration notifiers so they can
>> return an error status and message.  The last two patches register a notifier
>> for vfio that returns an error if the guest is not suspended.
> 
> Hi Steve,
> 
> This appears to only support legacy container mode and not cdev/iommufd
> mode.  Shouldn't this support both?  Thanks,

My bad, thanks! I'll move cpr_reboot_notifier to the base container, and also 
call
cpr register/unregister from iommufd_cdev_attach/iommufd_cdev_detach.

- Steve

>> Steve Sistare (11):
>>   notify: pass error to notifier with return
>>   migration: remove error from notifier data
>>   migration: convert to NotifierWithReturn
>>   migration: remove migration_in_postcopy parameter
>>   migration: MigrationEvent for notifiers
>>   migration: MigrationNotifyFunc
>>   migration: per-mode notifiers
>>   migration: refactor migrate_fd_connect failures
>>   migration: notifier error checking
>>   vfio: register container for cpr
>>   vfio: allow cpr-reboot migration if suspended
>>
>>  hw/net/virtio-net.c|  14 ++---
>>  hw/vfio/common.c   |   2 +-
>>  hw/vfio/container.c|  11 +++-
>>  hw/vfio/cpr.c  |  39 ++
>>  hw/vfio/meson.build|   1 +
>>  hw/vfio/migration.c|  13 +++--
>>  hw/virtio/vhost-user.c |  10 ++--
>>  hw/virtio/virtio-balloon.c |   3 +-
>>  include/hw/vfio/vfio-common.h  |   6 ++-
>>  include/hw/virtio/virtio-net.h |   2 +-
>>  include/migration/misc.h   |  21 +---
>>  include/qemu/notify.h  |   7 ++-
>>  migration/migration.c  | 117 
>> +++--
>>  migration/postcopy-ram.c   |   3 +-
>>  migration/postcopy-ram.h   |   1 -
>>  migration/ram.c|  12 ++---
>>  net/vhost-vdpa.c   |  15 +++---
>>  ui/spice-core.c|  19 +++
>>  util/notify.c  |   5 +-
>>  19 files changed, 206 insertions(+), 95 deletions(-)
>>  create mode 100644 hw/vfio/cpr.c
>>
> 



Re: [PATCH V2 03/11] migration: convert to NotifierWithReturn

2024-01-16 Thread Steven Sistare
On 1/15/2024 1:44 AM, Peter Xu wrote:
> On Fri, Jan 12, 2024 at 07:05:02AM -0800, Steve Sistare wrote:
>> Change all migration notifiers to type NotifierWithReturn, so notifiers
>> can return an error status in a future patch.  For now, pass NULL for the
>> notifier error parameter, and do not check the return value.
>>
>> Signed-off-by: Steve Sistare 
>> ---
>>  hw/net/virtio-net.c|  4 +++-
>>  hw/vfio/migration.c|  4 +++-
>>  include/hw/vfio/vfio-common.h  |  2 +-
>>  include/hw/virtio/virtio-net.h |  2 +-
>>  include/migration/misc.h   |  6 +++---
>>  migration/migration.c  | 16 
>>  net/vhost-vdpa.c   |  6 --
>>  ui/spice-core.c|  8 +---
>>  8 files changed, 28 insertions(+), 20 deletions(-)
>>
>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>> index 7a2846f..9570353 100644
>> --- a/hw/net/virtio-net.c
>> +++ b/hw/net/virtio-net.c
>> @@ -3529,11 +3529,13 @@ static void 
>> virtio_net_handle_migration_primary(VirtIONet *n, MigrationState *s)
>>  }
>>  }
>>  
>> -static void virtio_net_migration_state_notifier(Notifier *notifier, void 
>> *data)
>> +static int virtio_net_migration_state_notifier(NotifierWithReturn *notifier,
>> +   void *data, Error **errp)
>>  {
>>  MigrationState *s = data;
>>  VirtIONet *n = container_of(notifier, VirtIONet, migration_state);
>>  virtio_net_handle_migration_primary(n, s);
>> +return 0;
>>  }
> 
> I should have mentioned this earlier.. we have a trend recently to modify
> retval to boolean when Error** existed, e.g.:
> 
> https://lore.kernel.org/all/20231017202633.296756-5-pet...@redhat.com/
> 
> Let's start using boolean too here?  Previous patches may need touch-ups
> too for this.
> 
> Other than that it all looks good here.  Thanks,

Boolean makes sense when there is only one way to recover from failure.
However, when the notifier may fail in different ways, and recovery differs
for each, then the function should return an int errno.  NotifierWithReturn
could have future uses that need multiple return values and multiple recovery 
paths above the notifier_with_return_list_notify level, so IMO the function 
should continue to return int for generality.

- Steve



Re: [PATCH V2 08/11] migration: refactor migrate_fd_connect failures

2024-01-16 Thread Steven Sistare
On 1/15/2024 2:37 AM, Peter Xu wrote:
> On Fri, Jan 12, 2024 at 07:05:07AM -0800, Steve Sistare wrote:
>> Move common code for the error path in migrate_fd_connect to a shared
>> fail label.  No functional change.
>>
>> Signed-off-by: Steve Sistare 
> 
> Reviewed-by: Peter Xu 
> 
> One nitpick below:
> 
>> ---
>>  migration/migration.c | 22 +++---
>>  1 file changed, 11 insertions(+), 11 deletions(-)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index e9914aa..c828ba7 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -3549,6 +3549,7 @@ void migrate_fd_connect(MigrationState *s, Error 
>> *error_in)
>>  Error *local_err = NULL;
>>  uint64_t rate_limit;
>>  bool resume = s->state == MIGRATION_STATUS_POSTCOPY_PAUSED;
>> +int expect_state = s->state;
> 
> IMHO we can drop this.
> 
>>  
>>  /*
>>   * If there's a previous error, free it and prepare for another one.
>> @@ -3603,11 +3604,7 @@ void migrate_fd_connect(MigrationState *s, Error 
>> *error_in)
>>  if (migrate_postcopy_ram() || migrate_return_path()) {
>>  if (open_return_path_on_source(s)) {
>>  error_setg(_err, "Unable to open return-path for 
>> postcopy");
>> -migrate_set_state(>state, s->state, MIGRATION_STATUS_FAILED);
>> -migrate_set_error(s, local_err);
>> -error_report_err(local_err);
>> -migrate_fd_cleanup(s);
>> -return;
>> +goto fail;
>>  }
>>  }
>>  
>> @@ -3629,12 +3626,8 @@ void migrate_fd_connect(MigrationState *s, Error 
>> *error_in)
>>  }
>>  
>>  if (multifd_save_setup(_err) != 0) {
>> -migrate_set_error(s, local_err);
>> -error_report_err(local_err);
>> -migrate_set_state(>state, MIGRATION_STATUS_SETUP,
>> -  MIGRATION_STATUS_FAILED);
>> -migrate_fd_cleanup(s);
>> -return;
>> +expect_state = MIGRATION_STATUS_SETUP;
> 
> Then drop this.
> 
>> +goto fail;
>>  }
>>  
>>  if (migrate_background_snapshot()) {
>> @@ -3645,6 +3638,13 @@ void migrate_fd_connect(MigrationState *s, Error 
>> *error_in)
>>  migration_thread, s, QEMU_THREAD_JOINABLE);
>>  }
>>  s->migration_thread_running = true;
>> +return;
>> +
>> +fail:
>> +migrate_set_error(s, local_err);
>> +migrate_set_state(>state, expect_state, MIGRATION_STATUS_FAILED);
> 
> Then constantly use s->state.

Yes, if you are OK with it, I also prefer to simplify here.

- Steve

>> +error_report_err(local_err);
>> +migrate_fd_cleanup(s);
>>  }
>>  
>>  static void migration_class_init(ObjectClass *klass, void *data)
>> -- 
>> 1.8.3.1
>>
> 



Re: [PATCH V2 05/11] migration: MigrationEvent for notifiers

2024-01-12 Thread Steven Sistare
On 1/12/2024 10:05 AM, Steve Sistare wrote:
> Passing MigrationState to notifiers is unsound because they could access
> unstable migration state internals or even modify the state.  Instead, pass
> the minimal info needed in a new MigrationEvent struct, which could be
> extended in the future if needed.
> 
> Suggested-by: Peter Xu 
> Signed-off-by: Steve Sistare 
> ---
>  [...]
> diff --git a/include/migration/misc.h b/include/migration/misc.h
> index dcc98bb..0b4ce0f 100644
> --- a/include/migration/misc.h
> +++ b/include/migration/misc.h
> @@ -60,6 +60,11 @@ void migration_object_init(void);
>  void migration_shutdown(void);
>  bool migration_is_idle(void);
>  bool migration_is_active(MigrationState *);
> +
> +typedef struct MigrationEvent {
> +MigrationStatus state;
> +} MigrationEvent;

Hi Peter, I chose to pass MigrationStatus rather than define a new enum and map
MigrationStatus to it.  IMO a new enum adds little value, yet it is more code 
and
another layer of abstraction for coders to grok.  

- Steve



Re: [PATCH V1 2/3] migration: notifier error reporting

2024-01-11 Thread Steven Sistare
On 1/10/2024 9:16 PM, Peter Xu wrote:
> On Wed, Jan 10, 2024 at 01:08:41PM -0500, Steven Sistare wrote:
>> On 1/10/2024 2:18 AM, Peter Xu wrote:
>>> On Wed, Dec 13, 2023 at 10:11:32AM -0800, Steve Sistare wrote:
>>>> After calling notifiers, check if an error has been reported via
>>>> migrate_set_error, and halt the migration.
>>>>
>>>> None of the notifiers call migrate_set_error at this time, so no
>>>> functional change.
>>>>
>>>> Signed-off-by: Steve Sistare 
>>>> ---
>>>>  include/migration/misc.h |  2 +-
>>>>  migration/migration.c| 26 ++
>>>>  2 files changed, 23 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/include/migration/misc.h b/include/migration/misc.h
>>>> index 901d117..231d7e4 100644
>>>> --- a/include/migration/misc.h
>>>> +++ b/include/migration/misc.h
>>>> @@ -65,7 +65,7 @@ MigMode migrate_mode_of(MigrationState *);
>>>>  void migration_add_notifier(Notifier *notify,
>>>>  void (*func)(Notifier *notifier, void *data));
>>>>  void migration_remove_notifier(Notifier *notify);
>>>> -void migration_call_notifiers(MigrationState *s);
>>>> +int migration_call_notifiers(MigrationState *s);
>>>>  bool migration_in_setup(MigrationState *);
>>>>  bool migration_has_finished(MigrationState *);
>>>>  bool migration_has_failed(MigrationState *);
>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>> index d5bfe70..29a9a92 100644
>>>> --- a/migration/migration.c
>>>> +++ b/migration/migration.c
>>>> @@ -1280,6 +1280,8 @@ void migrate_set_state(int *state, int old_state, 
>>>> int new_state)
>>>>  
>>>>  static void migrate_fd_cleanup(MigrationState *s)
>>>>  {
>>>> +bool already_failed;
>>>> +
>>>>  qemu_bh_delete(s->cleanup_bh);
>>>>  s->cleanup_bh = NULL;
>>>>  
>>>> @@ -1327,11 +1329,20 @@ static void migrate_fd_cleanup(MigrationState *s)
>>>>MIGRATION_STATUS_CANCELLED);
>>>>  }
>>>>  
>>>> +already_failed = migration_has_failed(s);
>>>> +if (migration_call_notifiers(s)) {
>>>> +if (!already_failed) {
>>>> +migrate_set_state(>state, s->state, 
>>>> MIGRATION_STATUS_FAILED);
>>>> +/* Notify again to recover from this late failure. */
>>>> +migration_call_notifiers(s);
>>>> +}
>>>> +}
>>>> +
>>>>  if (s->error) {
>>>>  /* It is used on info migrate.  We can't free it */
>>>>  error_report_err(error_copy(s->error));
>>>>  }
>>>> -migration_call_notifiers(s);
>>>> +
>>>>  block_cleanup_parameters();
>>>>  yank_unregister_instance(MIGRATION_YANK_INSTANCE);
>>>>  }
>>>> @@ -1450,9 +1461,10 @@ void migration_remove_notifier(Notifier *notify)
>>>>  }
>>>>  }
>>>>  
>>>> -void migration_call_notifiers(MigrationState *s)
>>>> +int migration_call_notifiers(MigrationState *s)
>>>>  {
>>>>  notifier_list_notify(_state_notifiers, s);
>>>> +return (s->error != NULL);
>>>
>>> Exporting more migration_*() functions is pretty ugly to me..
>>
>> I assume you mean migrate_set_error(), which is currently only called from
>> migration/*.c code.
>>
>> Instead, we could define a new function migrate_set_notifier_error(), defined
>> in the new file migration/notifier.h, so we clearly limit the migration 
>> functions which can be called from notifiers.  (Its implementation just calls
>> migrate_set_error)
> 
> Fundementally this allows another .c to change one more field of
> MigrationState (which is ->error) and I still want to avoid it.
> 
> I just replied in the other thread, but now with all these in mind I think
> I still prefer not passing in MigrationState* at all.  It's already kind of
> abused due to migrate_get_current(), and IMHO it's healthier to limit its
> usage to minimum to cover the core of migration states for migration/ use
> only.
> 
> Shrinking or even stop exporting migrate_get_current() is another more
> challenging task, but now what we can do is stop enlarging the direct use
> of Migrat

Re: [PATCH V1 2/3] migration: notifier error reporting

2024-01-10 Thread Steven Sistare
On 1/10/2024 2:18 AM, Peter Xu wrote:
> On Wed, Dec 13, 2023 at 10:11:32AM -0800, Steve Sistare wrote:
>> After calling notifiers, check if an error has been reported via
>> migrate_set_error, and halt the migration.
>>
>> None of the notifiers call migrate_set_error at this time, so no
>> functional change.
>>
>> Signed-off-by: Steve Sistare 
>> ---
>>  include/migration/misc.h |  2 +-
>>  migration/migration.c| 26 ++
>>  2 files changed, 23 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/migration/misc.h b/include/migration/misc.h
>> index 901d117..231d7e4 100644
>> --- a/include/migration/misc.h
>> +++ b/include/migration/misc.h
>> @@ -65,7 +65,7 @@ MigMode migrate_mode_of(MigrationState *);
>>  void migration_add_notifier(Notifier *notify,
>>  void (*func)(Notifier *notifier, void *data));
>>  void migration_remove_notifier(Notifier *notify);
>> -void migration_call_notifiers(MigrationState *s);
>> +int migration_call_notifiers(MigrationState *s);
>>  bool migration_in_setup(MigrationState *);
>>  bool migration_has_finished(MigrationState *);
>>  bool migration_has_failed(MigrationState *);
>> diff --git a/migration/migration.c b/migration/migration.c
>> index d5bfe70..29a9a92 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -1280,6 +1280,8 @@ void migrate_set_state(int *state, int old_state, int 
>> new_state)
>>  
>>  static void migrate_fd_cleanup(MigrationState *s)
>>  {
>> +bool already_failed;
>> +
>>  qemu_bh_delete(s->cleanup_bh);
>>  s->cleanup_bh = NULL;
>>  
>> @@ -1327,11 +1329,20 @@ static void migrate_fd_cleanup(MigrationState *s)
>>MIGRATION_STATUS_CANCELLED);
>>  }
>>  
>> +already_failed = migration_has_failed(s);
>> +if (migration_call_notifiers(s)) {
>> +if (!already_failed) {
>> +migrate_set_state(>state, s->state, MIGRATION_STATUS_FAILED);
>> +/* Notify again to recover from this late failure. */
>> +migration_call_notifiers(s);
>> +}
>> +}
>> +
>>  if (s->error) {
>>  /* It is used on info migrate.  We can't free it */
>>  error_report_err(error_copy(s->error));
>>  }
>> -migration_call_notifiers(s);
>> +
>>  block_cleanup_parameters();
>>  yank_unregister_instance(MIGRATION_YANK_INSTANCE);
>>  }
>> @@ -1450,9 +1461,10 @@ void migration_remove_notifier(Notifier *notify)
>>  }
>>  }
>>  
>> -void migration_call_notifiers(MigrationState *s)
>> +int migration_call_notifiers(MigrationState *s)
>>  {
>>  notifier_list_notify(_state_notifiers, s);
>> +return (s->error != NULL);
> 
> Exporting more migration_*() functions is pretty ugly to me..

I assume you mean migrate_set_error(), which is currently only called from
migration/*.c code.

Instead, we could define a new function migrate_set_notifier_error(), defined
in the new file migration/notifier.h, so we clearly limit the migration 
functions which can be called from notifiers.  (Its implementation just calls
migrate_set_error)

> Would it be better to pass in "Error** errp" into each notifiers?  That may
> need an open coded notifier_list_notify(), breaking the loop if "*errp".
> 
> And the notifier API currently only support one arg..  maybe we should
> implement the notifiers ourselves, ideally passing in "(int state, Error
> **errp)" instead of "(MigrationState *s)".
> 
> Ideally with that MigrationState* shouldn't be visible outside migration/.

I will regret saying this because of the amount of (mechanical) code change 
involved,
but the cleanest solution is:

* Pass errp to: 
  notifier_with_return_list_notify(NotifierWithReturnList *list, void *data, 
Error *errp)
* Pass errp to the NotifierWithReturn notifier:
  int (*notify)(NotifierWithReturn *notifier, void *data, Error **errp);
* Delete the errp member from struct PostcopyNotifyData and pass errp to the 
notifier function
  Ditto for PrecopyNotifyData.
* Convert all migration notifiers to NotifierWithReturn

- Steve

>>  }
>>  
>>  bool migration_in_setup(MigrationState *s)
>> @@ -2520,7 +2532,9 @@ static int postcopy_start(MigrationState *ms, Error 
>> **errp)
>>   * spice needs to trigger a transition now
>>   */
>>  ms->postcopy_after_devices = true;
>> -migration_call_notifiers(ms);
>> +if (migration_call_notifiers(ms)) {
>> +goto fail;
>> +}
>>  
>>  migration_downtime_end(ms);
>>  
>> @@ -3589,7 +3603,11 @@ void migrate_fd_connect(MigrationState *s, Error 
>> *error_in)
>>  rate_limit = migrate_max_bandwidth();
>>  
>>  /* Notify before starting migration thread */
>> -migration_call_notifiers(s);
>> +if (migration_call_notifiers(s)) {
>> +migrate_set_state(>state, s->state, MIGRATION_STATUS_FAILED);
>> +migrate_fd_cleanup(s);
>> +return;
>> +}
>>  }
>>  
>>  migration_rate_set(rate_limit);
>> -- 
>> 

Re: [PATCH V1 1/3] migration: check mode in notifiers

2024-01-10 Thread Steven Sistare
On 1/10/2024 2:09 AM, Peter Xu wrote:
> On Wed, Dec 13, 2023 at 10:11:31AM -0800, Steve Sistare wrote:
>> The existing notifiers should only apply to normal mode.
>>
>> No functional change.
> 
> Instead of adding such check in every notifier, why not make CPR a separate
> list of notifiers?  Just like the blocker lists.

Sure.   I proposed minimal changes in this current series, but extending the 
api to take migration mode would be nicer.

> Aside of this patch, I just started to look at this "notifier" code, I
> really don't think we should pass in MigrationState* into the notifiers.
> IIUC we only need the "state" as an enum.  Then with two separate
> registers, the device code knows the migration mode.
> 
> What do you think?

If we pass state, the notifier must either compare to enum values such as
MIGRATION_STATUS_COMPLETED instead of calling migration_has_finished(s), or
we must define new accessors such as migration_state_is_finished(state).

IMO passing MigrationState is the best approach.
MigrationState is an incomplete type in most notifiers, and the client can
pass it to a limited set of accessors to get more information -- exactly what 
we want to hide migration internals.  However, we could further limit the
allowed accessors, eg move these to a new file "include/migration/notifier.h".


#include "qemu/notify.h"
void migration_add_notifier(Notifier *notify,
void (*func)(Notifier *notifier, void *data));
void migration_remove_notifier(Notifier *notify);
bool migration_is_active(MigrationState *);
bool migration_in_setup(MigrationState *);
bool migration_has_finished(MigrationState *);
bool migration_has_failed(MigrationState *);
---

- Steve



Re: [PATCH V6 03/14] cpus: stop vm in suspended runstate

2024-01-03 Thread Steven Sistare
On 12/23/2023 12:41 AM, Markus Armbruster wrote:
> Steven Sistare  writes:
> 
>> On 12/22/2023 7:20 AM, Markus Armbruster wrote:
>>> Steve Sistare  writes:
>>>
>>>> Currently, a vm in the suspended state is not completely stopped.  The 
>>>> VCPUs
>>>> have been paused, but the cpu clock still runs, and runstate notifiers for
>>>> the transition to stopped have not been called.  This causes problems for
>>>> live migration.  Stale cpu timers_state is saved to the migration stream,
>>>> causing time errors in the guest when it wakes from suspend, and state that
>>>> would have been modified by runstate notifiers is wrong.
>>>>
>>>> Modify vm_stop to completely stop the vm if the current state is suspended,
>>>> transition to RUN_STATE_PAUSED, and remember that the machine was 
>>>> suspended.
>>>> Modify vm_start to restore the suspended state.
>>>
>>> Can you explain this to me in terms of the @current_run_state state
>>> machine?  Like
>>>
>>> Before the patch, trigger X in state Y goes to state Z.
>>> Afterwards, it goes to ...
>>
>> Old behavior:
>>   RUN_STATE_SUSPENDED --> stop --> RUN_STATE_SUSPENDED
>>
>> New behavior:
>>   RUN_STATE_SUSPENDED --> stop --> RUN_STATE_PAUSED
>>   RUN_STATE_PAUSED--> cont --> RUN_STATE_SUSPENDED
> 
> This clarifies things quite a bit for me.  Maybe work it into the commit
> message?

Will do.

>>>> This affects all callers of vm_stop and vm_start, notably, the qapi stop 
>>>> and
>>>> cont commands.  For example:
>>>>
>>>> (qemu) info status
>>>> VM status: paused (suspended)
>>>>
>>>> (qemu) stop
>>>> (qemu) info status
>>>> VM status: paused
>>>>
>>>> (qemu) cont
>>>> (qemu) info status
>>>> VM status: paused (suspended)
>>>>
>>>> (qemu) system_wakeup
>>>> (qemu) info status
>>>> VM status: running
>>>>
>>>> Suggested-by: Peter Xu 
>>>> Signed-off-by: Steve Sistare 
>>>> ---
>>>>  include/sysemu/runstate.h |  5 +
>>>>  qapi/misc.json| 10 --
>>>>  system/cpus.c | 19 ++-
>>>>  system/runstate.c |  3 +++
>>>>  4 files changed, 30 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
>>>> index f6a337b..1d6828f 100644
>>>> --- a/include/sysemu/runstate.h
>>>> +++ b/include/sysemu/runstate.h
>>>> @@ -40,6 +40,11 @@ static inline bool 
>>>> shutdown_caused_by_guest(ShutdownCause cause)
>>>>  return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN;
>>>>  }
>>>>  
>>>> +static inline bool runstate_is_started(RunState state)
>>>> +{
>>>> +return state == RUN_STATE_RUNNING || state == RUN_STATE_SUSPENDED;
>>>> +}
>>>> +
>>>>  void vm_start(void);
>>>>  
>>>>  /**
>>>> diff --git a/qapi/misc.json b/qapi/misc.json
>>>> index cda2eff..efb8d44 100644
>>>> --- a/qapi/misc.json
>>>> +++ b/qapi/misc.json
>>>> @@ -134,7 +134,7 @@
>>>>  ##
>>>>  # @stop:
>>>>  #
>>>> -# Stop all guest VCPU execution.
>>>> +# Stop all guest VCPU and VM execution.
>>>
>>> Doesn't "stop all VM execution" imply "stop all guest vCPU execution"?
>>
>> Agreed, so we simply have:
>>
>> # @stop:
>> # Stop guest VM execution.
>>
>> # @cont:
>> # Resume guest VM execution.
> 
> Yes, please.

Will do.

>>>>  #
>>>>  # Since: 0.14
>>>>  #
>>>> @@ -143,6 +143,9 @@
>>># Notes: This function will succeed even if the guest is already in
>>># the stopped state.  In "inmigrate" state, it will ensure that
>>>>  # the guest remains paused once migration finishes, as if the -S
>>>>  # option was passed on the command line.
>>>>  #
>>>> +# In the "suspended" state, it will completely stop the VM and
>>>> +# cause a transition to the "paused" state. (Since 9.0)
>>>> +#
>>>
>>> What user-observable (with que

Re: [PATCH V6 03/14] cpus: stop vm in suspended runstate

2024-01-03 Thread Steven Sistare
On 1/3/2024 8:09 AM, Peter Xu wrote:
> Steven,
> 
> The discussions seem to still apply to the latest.  Do you plan to post a
> new version, with everything covered?

Yes, today, thanks - steve




Re: [PATCH V6 03/14] cpus: stop vm in suspended runstate

2023-12-22 Thread Steven Sistare
On 12/22/2023 7:20 AM, Markus Armbruster wrote:
> Steve Sistare  writes:
> 
>> Currently, a vm in the suspended state is not completely stopped.  The VCPUs
>> have been paused, but the cpu clock still runs, and runstate notifiers for
>> the transition to stopped have not been called.  This causes problems for
>> live migration.  Stale cpu timers_state is saved to the migration stream,
>> causing time errors in the guest when it wakes from suspend, and state that
>> would have been modified by runstate notifiers is wrong.
>>
>> Modify vm_stop to completely stop the vm if the current state is suspended,
>> transition to RUN_STATE_PAUSED, and remember that the machine was suspended.
>> Modify vm_start to restore the suspended state.
> 
> Can you explain this to me in terms of the @current_run_state state
> machine?  Like
> 
> Before the patch, trigger X in state Y goes to state Z.
> Afterwards, it goes to ...

Old behavior:
  RUN_STATE_SUSPENDED --> stop --> RUN_STATE_SUSPENDED

New behavior:
  RUN_STATE_SUSPENDED --> stop --> RUN_STATE_PAUSED
  RUN_STATE_PAUSED--> cont --> RUN_STATE_SUSPENDED

>> This affects all callers of vm_stop and vm_start, notably, the qapi stop and
>> cont commands.  For example:
>>
>> (qemu) info status
>> VM status: paused (suspended)
>>
>> (qemu) stop
>> (qemu) info status
>> VM status: paused
>>
>> (qemu) cont
>> (qemu) info status
>> VM status: paused (suspended)
>>
>> (qemu) system_wakeup
>> (qemu) info status
>> VM status: running
>>
>> Suggested-by: Peter Xu 
>> Signed-off-by: Steve Sistare 
>> ---
>>  include/sysemu/runstate.h |  5 +
>>  qapi/misc.json| 10 --
>>  system/cpus.c | 19 ++-
>>  system/runstate.c |  3 +++
>>  4 files changed, 30 insertions(+), 7 deletions(-)
>>
>> diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
>> index f6a337b..1d6828f 100644
>> --- a/include/sysemu/runstate.h
>> +++ b/include/sysemu/runstate.h
>> @@ -40,6 +40,11 @@ static inline bool shutdown_caused_by_guest(ShutdownCause 
>> cause)
>>  return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN;
>>  }
>>  
>> +static inline bool runstate_is_started(RunState state)
>> +{
>> +return state == RUN_STATE_RUNNING || state == RUN_STATE_SUSPENDED;
>> +}
>> +
>>  void vm_start(void);
>>  
>>  /**
>> diff --git a/qapi/misc.json b/qapi/misc.json
>> index cda2eff..efb8d44 100644
>> --- a/qapi/misc.json
>> +++ b/qapi/misc.json
>> @@ -134,7 +134,7 @@
>>  ##
>>  # @stop:
>>  #
>> -# Stop all guest VCPU execution.
>> +# Stop all guest VCPU and VM execution.
> 
> Doesn't "stop all VM execution" imply "stop all guest vCPU execution"?

Agreed, so we simply have:

# @stop:
# Stop guest VM execution.

# @cont:
# Resume guest VM execution.

>>  #
>>  # Since: 0.14
>>  #
>> @@ -143,6 +143,9 @@
># Notes: This function will succeed even if the guest is already in
># the stopped state.  In "inmigrate" state, it will ensure that
>>  # the guest remains paused once migration finishes, as if the -S
>>  # option was passed on the command line.
>>  #
>> +# In the "suspended" state, it will completely stop the VM and
>> +# cause a transition to the "paused" state. (Since 9.0)
>> +#
> 
> What user-observable (with query-status) state transitions are possible
> here?

{"status": "suspended", "singlestep": false, "running": false}
--> stop -->
{"status": "paused", "singlestep": false, "running": false}

>>  # Example:
>>  #
>>  # -> { "execute": "stop" }
>> @@ -153,7 +156,7 @@
>>  ##
>>  # @cont:
>>  #
>> -# Resume guest VCPU execution.
>> +# Resume guest VCPU and VM execution.
>>  #
>>  # Since: 0.14
>>  #
>> @@ -165,6 +168,9 @@
># Returns: If successful, nothing
>#
># Notes: This command will succeed if the guest is currently running.
># It will also succeed if the guest is in the "inmigrate" state;
># in this case, the effect of the command is to make sure the
>>  # guest starts once migration finishes, removing the effect of the
>>  # -S command line option if it was passed.
>>  #
>> +# If the VM was previously suspended, and not been reset or woken,
>> +# this command will transition back to the "suspended" state. (Since 
>> 9.0)
> 
> Long line.

It fits in 80 columns, but perhaps this looks nicer:

# If the VM was previously suspended, and not been reset or woken,
# this command will transition back to the "suspended" state.
# (Since 9.0)

> What user-observable state transitions are possible here?

{"status": "paused", "singlestep": false, "running": false}
--> cont -->
{"status": "suspended", "singlestep": false, "running": false}

>> +#
>>  # Example:
>>  #
>>  # -> { "execute": "cont" }
> 
> Should we update documentation of query-status, too?

IMO no. The new behavior changes the status/RunState field only, and the
domain of values does not change, only the transitions caused by the commands
described here.


Re: [PATCH V8 00/12] fix migration of suspended runstate

2023-12-20 Thread Steven Sistare
On 12/20/2023 9:52 AM, Anthony PERARD wrote:
> On Mon, Dec 18, 2023 at 01:14:51PM +0800, Peter Xu wrote:
>> On Wed, Dec 13, 2023 at 10:35:33AM -0500, Steven Sistare wrote:
>>> Hi Peter, all have RB's, with all i's dotted and t's crossed - steve
>>
>> Yes this seems to be more migration related so maybe good candidate for a
>> pull from migration submodule.
>>
>> But since this is still solving a generic issue, I'm copying a few more
>> people from get_maintainers.pl that this series touches, just in case
>> they'll have something to say before dev cycle starts.
> 
> I did a quick smoke test of migrating a Xen guest. It works fine for me.

Thanks very much, I appreciate it - steve



Re: [PATCH V8 00/12] fix migration of suspended runstate

2023-12-18 Thread Steven Sistare
On 12/18/2023 12:14 AM, Peter Xu wrote:
> On Wed, Dec 13, 2023 at 10:35:33AM -0500, Steven Sistare wrote:
>> Hi Peter, all have RB's, with all i's dotted and t's crossed - steve
> 
> Yes this seems to be more migration related so maybe good candidate for a
> pull from migration submodule.
> 
> But since this is still solving a generic issue, I'm copying a few more
> people from get_maintainers.pl that this series touches, just in case
> they'll have something to say before dev cycle starts.

The key aspects are summarized by the cover letter and the commit messages
pasted below for the first 6 patches:

https://lore.kernel.org/qemu-devel/1702481421-375368-1-git-send-email-steven.sist...@oracle.com

---

[PATCH V8 00/12] fix migration of suspended runstate

Migration of a guest in the suspended runstate is broken.  The incoming
migration code automatically tries to wake the guest, which is wrong;
the guest should end migration in the same runstate it started.  Further,
after saving a snapshot in the suspended state and loading it, the vm_start
fails.  The runstate is RUNNING, but the guest is not.
---

[PATCH V8 01/12] cpus: vm_was_suspended

Add a state variable to remember if a vm previously transitioned into a
suspended state.
---

[PATCH V8 02/12] cpus: stop vm in suspended runstate

Currently, a vm in the suspended state is not completely stopped.  The VCPUs
have been paused, but the cpu clock still runs, and runstate notifiers for
the transition to stopped have not been called.  This causes problems for
live migration.  Stale cpu timers_state is saved to the migration stream,
causing time errors in the guest when it wakes from suspend, and state that
would have been modified by runstate notifiers is wrong.

Modify vm_stop to completely stop the vm if the current state is suspended,
transition to RUN_STATE_PAUSED, and remember that the machine was suspended.
Modify vm_start to restore the suspended state.

This affects all callers of vm_stop and vm_start, notably, the qapi stop and
cont commands.  For example:

(qemu) info status
VM status: paused (suspended)

(qemu) stop
(qemu) info status
VM status: paused

(qemu) system_wakeup
Error: Unable to wake up: guest is not in suspended state

(qemu) cont
(qemu) info status
VM status: paused (suspended)

(qemu) system_wakeup
(qemu) info status
VM status: running

---

[PATCH V8 03/12] cpus: check running not RUN_STATE_RUNNING

When a vm transitions from running to suspended, runstate notifiers are
not called, so the notifiers still think the vm is running.  Hence, when
we call vm_start to restore the suspended state, we call vm_state_notify
with running=1.  However, some notifiers check for RUN_STATE_RUNNING.
They must check the running boolean instead.

No functional change.
---

[PATCH V8 04/12] cpus: vm_resume

Define the vm_resume helper, for use in subsequent patches.
---

[PATCH V8 05/12] migration: propagate suspended runstate

If the outgoing machine was previously suspended, propagate that to the
incoming side via global_state, so a subsequent vm_start restores the
suspended state.  To maintain backward and forward compatibility, reclaim
some space from the runstate member.
---

[PATCH V8 06/12] migration: preserve suspended runstate

A guest that is migrated in the suspended state automaticaly wakes and
continues execution.  This is wrong; the guest should end migration in
the same state it started.  The root cause is that the outgoing migration
code automatically wakes the guest, then saves the RUNNING runstate in
global_state_store(), hence the incoming migration code thinks the guest is
running and continues the guest if autostart is true.

On the outgoing side, delete the call to qemu_system_wakeup_request().
Now that vm_stop completely stops a vm in the suspended state (from the
preceding patches), the existing call to vm_stop_force_state is sufficient
to correctly migrate all vmstate.

On the incoming side, call vm_start if the pre-migration state was running
or suspended.  For the latter, vm_start correctly restores the suspended
state, and a future system_wakeup monitor request will cause the vm to
resume running.
---



Re: [PATCH V8 00/12] fix migration of suspended runstate

2023-12-13 Thread Steven Sistare
Hi Peter, all have RB's, with all i's dotted and t's crossed - steve

On 12/13/2023 10:30 AM, Steve Sistare wrote:
> Migration of a guest in the suspended runstate is broken.  The incoming
> migration code automatically tries to wake the guest, which is wrong;
> the guest should end migration in the same runstate it started.  Further,
> after saving a snapshot in the suspended state and loading it, the vm_start
> fails.  The runstate is RUNNING, but the guest is not.
> 
> See the commit messages for the details.
> 
> Changes in V2:
>   * simplify "start on wakeup request"
>   * fix postcopy, snapshot, and background migration
>   * refactor fixes for each type of migration
>   * explicitly handled suspended events and runstate in tests
>   * add test for postcopy and background migration
> 
> Changes in V3:
>   * rebase to tip
>   * fix hang in new function migrate_wait_for_dirty_mem
> 
> Changes in V4:
>   * rebase to tip
>   * add patch for vm_prepare_start (thanks Peter)
>   * add patch to preserve cpu ticks
> 
> Changes in V5:
>   * rebase to tip
>   * added patches to completely stop vm in suspended state:
>   cpus: refactor vm_stop
>   cpus: stop vm in suspended state
>   * added patch to partially resume vm in suspended state:
>   cpus: start vm in suspended state
>   * modified "preserve suspended ..." patches to use the above.
>   * deleted patch "preserve cpu ticks if suspended".  stop ticks in
> vm_stop_force_state instead.
>   * deleted patch "add runstate function".  defined new helper function
> migrate_new_runstate in "preserve suspended runstate"
>   * Added some RB's, but removed other RB's because the patches changed.
> 
> Changes in V6:
>   * all vm_stop calls completely stop the suspended state
>   * refactored and updated the "cpus" patches
>   * simplified the "preserve suspended" patches
>   * added patch "bootfile per vm"
> 
> Changes in V7:
>   * rebase to tip, add RB-s
>   * fix backwards compatibility for global_state.vm_was_suspended
>   * delete vm_prepare_start state argument, and rename patch
> "pass runstate to vm_prepare_start" to
> "check running not RUN_STATE_RUNNING"
>   * drop patches:
>   tests/qtest: bootfile per vm
>   tests/qtest: background migration with suspend
>   * rename runstate_is_started to runstate_is_live
>   * move wait_for_suspend in tests
> 
> Changes in V8:
>   * rebase to tip
>   * add RB's
>   * add comment for runstate_is_live
>   * simplify global_state - the needed function, and its use of 
> vm_was_suspended
> 
> Steve Sistare (12):
>   cpus: vm_was_suspended
>   cpus: stop vm in suspended runstate
>   cpus: check running not RUN_STATE_RUNNING
>   cpus: vm_resume
>   migration: propagate suspended runstate
>   migration: preserve suspended runstate
>   migration: preserve suspended for snapshot
>   migration: preserve suspended for bg_migration
>   tests/qtest: migration events
>   tests/qtest: option to suspend during migration
>   tests/qtest: precopy migration with suspend
>   tests/qtest: postcopy migration with suspend
> 
>  backends/tpm/tpm_emulator.c  |   2 +-
>  hw/usb/hcd-ehci.c|   2 +-
>  hw/usb/redirect.c|   2 +-
>  hw/xen/xen-hvm-common.c  |   2 +-
>  include/migration/snapshot.h |   7 ++
>  include/sysemu/runstate.h|  20 
>  migration/global_state.c |  47 +
>  migration/migration-hmp-cmds.c   |   8 +-
>  migration/migration.c|  15 +--
>  migration/savevm.c   |  23 +++--
>  qapi/misc.json   |  10 +-
>  system/cpus.c|  47 +++--
>  system/runstate.c|   9 ++
>  system/vl.c  |   2 +
>  tests/migration/i386/Makefile|   5 +-
>  tests/migration/i386/a-b-bootblock.S |  50 +-
>  tests/migration/i386/a-b-bootblock.h |  26 +++--
>  tests/qtest/migration-helpers.c  |  27 ++
>  tests/qtest/migration-helpers.h  |  11 ++-
>  tests/qtest/migration-test.c | 181 
> +--
>  20 files changed, 352 insertions(+), 144 deletions(-)
> 



Re: [PATCH V8 02/12] cpus: stop vm in suspended runstate

2023-12-13 Thread Steven Sistare
FYI for Markus and Blake.  No change since V6 and V7.

- Steve

On 12/13/2023 10:30 AM, Steve Sistare wrote:
> Currently, a vm in the suspended state is not completely stopped.  The VCPUs
> have been paused, but the cpu clock still runs, and runstate notifiers for
> the transition to stopped have not been called.  This causes problems for
> live migration.  Stale cpu timers_state is saved to the migration stream,
> causing time errors in the guest when it wakes from suspend, and state that
> would have been modified by runstate notifiers is wrong.
> 
> Modify vm_stop to completely stop the vm if the current state is suspended,
> transition to RUN_STATE_PAUSED, and remember that the machine was suspended.
> Modify vm_start to restore the suspended state.
> 
> This affects all callers of vm_stop and vm_start, notably, the qapi stop and
> cont commands.  For example:
> 
> (qemu) info status
> VM status: paused (suspended)
> 
> (qemu) stop
> (qemu) info status
> VM status: paused
> 
> (qemu) system_wakeup
> Error: Unable to wake up: guest is not in suspended state
> 
> (qemu) cont
> (qemu) info status
> VM status: paused (suspended)
> 
> (qemu) system_wakeup
> (qemu) info status
> VM status: running
> 
> Suggested-by: Peter Xu 
> Signed-off-by: Steve Sistare 
> Reviewed-by: Peter Xu 
> ---
>  include/sysemu/runstate.h |  9 +
>  qapi/misc.json| 10 --
>  system/cpus.c | 23 +++
>  system/runstate.c |  3 +++
>  4 files changed, 35 insertions(+), 10 deletions(-)
> 
> diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
> index 88a67e2..618eb49 100644
> --- a/include/sysemu/runstate.h
> +++ b/include/sysemu/runstate.h
> @@ -40,6 +40,15 @@ static inline bool shutdown_caused_by_guest(ShutdownCause 
> cause)
>  return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN;
>  }
>  
> +/*
> + * In a "live" state, the vcpu clock is ticking, and the runstate notifiers
> + * think we are running.
> + */
> +static inline bool runstate_is_live(RunState state)
> +{
> +return state == RUN_STATE_RUNNING || state == RUN_STATE_SUSPENDED;
> +}
> +
>  void vm_start(void);
>  
>  /**
> diff --git a/qapi/misc.json b/qapi/misc.json
> index cda2eff..efb8d44 100644
> --- a/qapi/misc.json
> +++ b/qapi/misc.json
> @@ -134,7 +134,7 @@
>  ##
>  # @stop:
>  #
> -# Stop all guest VCPU execution.
> +# Stop all guest VCPU and VM execution.
>  #
>  # Since: 0.14
>  #
> @@ -143,6 +143,9 @@
>  # the guest remains paused once migration finishes, as if the -S
>  # option was passed on the command line.
>  #
> +# In the "suspended" state, it will completely stop the VM and
> +# cause a transition to the "paused" state. (Since 9.0)
> +#
>  # Example:
>  #
>  # -> { "execute": "stop" }
> @@ -153,7 +156,7 @@
>  ##
>  # @cont:
>  #
> -# Resume guest VCPU execution.
> +# Resume guest VCPU and VM execution.
>  #
>  # Since: 0.14
>  #
> @@ -165,6 +168,9 @@
>  # guest starts once migration finishes, removing the effect of the
>  # -S command line option if it was passed.
>  #
> +# If the VM was previously suspended, and not been reset or woken,
> +# this command will transition back to the "suspended" state. (Since 9.0)
> +#
>  # Example:
>  #
>  # -> { "execute": "cont" }
> diff --git a/system/cpus.c b/system/cpus.c
> index 9f631ab..f162435 100644
> --- a/system/cpus.c
> +++ b/system/cpus.c
> @@ -277,11 +277,15 @@ bool vm_get_suspended(void)
>  static int do_vm_stop(RunState state, bool send_stop)
>  {
>  int ret = 0;
> +RunState oldstate = runstate_get();
>  
> -if (runstate_is_running()) {
> +if (runstate_is_live(oldstate)) {
> +vm_was_suspended = (oldstate == RUN_STATE_SUSPENDED);
>  runstate_set(state);
>  cpu_disable_ticks();
> -pause_all_vcpus();
> +if (oldstate == RUN_STATE_RUNNING) {
> +pause_all_vcpus();
> +}
>  vm_state_notify(0, state);
>  if (send_stop) {
>  qapi_event_send_stop();
> @@ -694,11 +698,13 @@ int vm_stop(RunState state)
>  
>  /**
>   * Prepare for (re)starting the VM.
> - * Returns -1 if the vCPUs are not to be restarted (e.g. if they are already
> - * running or in case of an error condition), 0 otherwise.
> + * Returns 0 if the vCPUs should be restarted, -1 on an error condition,
> + * and 1 otherwise.
>   */
>  int vm_prepare_start(bool step_pending)
>  {
> +int ret = vm_was_suspended ? 1 : 0;
> +RunState state = vm_was_suspended ? RUN_STATE_SUSPENDED : 
> RUN_STATE_RUNNING;
>  RunState requested;
>  
>  qemu_vmstop_requested();
> @@ -729,9 +735,10 @@ int vm_prepare_start(bool step_pending)
>  qapi_event_send_resume();
>  
>  cpu_enable_ticks();
> -runstate_set(RUN_STATE_RUNNING);
> -vm_state_notify(1, RUN_STATE_RUNNING);
> -return 0;
> +runstate_set(state);
> +vm_state_notify(1, state);
> +vm_was_suspended = false;
> +

Re: [PATCH V7 02/12] cpus: stop vm in suspended runstate

2023-12-11 Thread Steven Sistare
Hi Markus, Eric, any comment about this change in stop/cont behavior before
it gets pulled?  There is no change since V6.

- Steve

On 12/6/2023 12:23 PM, Steve Sistare wrote:
> Currently, a vm in the suspended state is not completely stopped.  The VCPUs
> have been paused, but the cpu clock still runs, and runstate notifiers for
> the transition to stopped have not been called.  This causes problems for
> live migration.  Stale cpu timers_state is saved to the migration stream,
> causing time errors in the guest when it wakes from suspend, and state that
> would have been modified by runstate notifiers is wrong.
> 
> Modify vm_stop to completely stop the vm if the current state is suspended,
> transition to RUN_STATE_PAUSED, and remember that the machine was suspended.
> Modify vm_start to restore the suspended state.
> 
> This affects all callers of vm_stop and vm_start, notably, the qapi stop and
> cont commands.  For example:
> 
> (qemu) info status
> VM status: paused (suspended)
> 
> (qemu) stop
> (qemu) info status
> VM status: paused
> 
> (qemu) system_wakeup
> Error: Unable to wake up: guest is not in suspended state
> 
> (qemu) cont
> (qemu) info status
> VM status: paused (suspended)
> 
> (qemu) system_wakeup
> (qemu) info status
> VM status: running
> 
> Suggested-by: Peter Xu 
> Signed-off-by: Steve Sistare 
> Reviewed-by: Peter Xu 
> ---
>  include/sysemu/runstate.h |  5 +
>  qapi/misc.json| 10 --
>  system/cpus.c | 23 +++
>  system/runstate.c |  3 +++
>  4 files changed, 31 insertions(+), 10 deletions(-)
> 
> diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
> index 88a67e2..867e53c 100644
> --- a/include/sysemu/runstate.h
> +++ b/include/sysemu/runstate.h
> @@ -40,6 +40,11 @@ static inline bool shutdown_caused_by_guest(ShutdownCause 
> cause)
>  return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN;
>  }
>  
> +static inline bool runstate_is_live(RunState state)
> +{
> +return state == RUN_STATE_RUNNING || state == RUN_STATE_SUSPENDED;
> +}
> +
>  void vm_start(void);
>  
>  /**
> diff --git a/qapi/misc.json b/qapi/misc.json
> index cda2eff..efb8d44 100644
> --- a/qapi/misc.json
> +++ b/qapi/misc.json
> @@ -134,7 +134,7 @@
>  ##
>  # @stop:
>  #
> -# Stop all guest VCPU execution.
> +# Stop all guest VCPU and VM execution.
>  #
>  # Since: 0.14
>  #
> @@ -143,6 +143,9 @@
>  # the guest remains paused once migration finishes, as if the -S
>  # option was passed on the command line.
>  #
> +# In the "suspended" state, it will completely stop the VM and
> +# cause a transition to the "paused" state. (Since 9.0)
> +#
>  # Example:
>  #
>  # -> { "execute": "stop" }
> @@ -153,7 +156,7 @@
>  ##
>  # @cont:
>  #
> -# Resume guest VCPU execution.
> +# Resume guest VCPU and VM execution.
>  #
>  # Since: 0.14
>  #
> @@ -165,6 +168,9 @@
>  # guest starts once migration finishes, removing the effect of the
>  # -S command line option if it was passed.
>  #
> +# If the VM was previously suspended, and not been reset or woken,
> +# this command will transition back to the "suspended" state. (Since 9.0)
> +#
>  # Example:
>  #
>  # -> { "execute": "cont" }
> diff --git a/system/cpus.c b/system/cpus.c
> index 9f631ab..f162435 100644
> --- a/system/cpus.c
> +++ b/system/cpus.c
> @@ -277,11 +277,15 @@ bool vm_get_suspended(void)
>  static int do_vm_stop(RunState state, bool send_stop)
>  {
>  int ret = 0;
> +RunState oldstate = runstate_get();
>  
> -if (runstate_is_running()) {
> +if (runstate_is_live(oldstate)) {
> +vm_was_suspended = (oldstate == RUN_STATE_SUSPENDED);
>  runstate_set(state);
>  cpu_disable_ticks();
> -pause_all_vcpus();
> +if (oldstate == RUN_STATE_RUNNING) {
> +pause_all_vcpus();
> +}
>  vm_state_notify(0, state);
>  if (send_stop) {
>  qapi_event_send_stop();
> @@ -694,11 +698,13 @@ int vm_stop(RunState state)
>  
>  /**
>   * Prepare for (re)starting the VM.
> - * Returns -1 if the vCPUs are not to be restarted (e.g. if they are already
> - * running or in case of an error condition), 0 otherwise.
> + * Returns 0 if the vCPUs should be restarted, -1 on an error condition,
> + * and 1 otherwise.
>   */
>  int vm_prepare_start(bool step_pending)
>  {
> +int ret = vm_was_suspended ? 1 : 0;
> +RunState state = vm_was_suspended ? RUN_STATE_SUSPENDED : 
> RUN_STATE_RUNNING;
>  RunState requested;
>  
>  qemu_vmstop_requested();
> @@ -729,9 +735,10 @@ int vm_prepare_start(bool step_pending)
>  qapi_event_send_resume();
>  
>  cpu_enable_ticks();
> -runstate_set(RUN_STATE_RUNNING);
> -vm_state_notify(1, RUN_STATE_RUNNING);
> -return 0;
> +runstate_set(state);
> +vm_state_notify(1, state);
> +vm_was_suspended = false;
> +return ret;
>  }
>  
>  void vm_start(void)
> @@ -745,7 

Re: [PATCH V7 00/12] fix migration of suspended runstate

2023-12-11 Thread Steven Sistare
On 12/11/2023 1:56 AM, Peter Xu wrote:
> On Wed, Dec 06, 2023 at 12:30:02PM -0500, Steven Sistare wrote:
>> cpus: stop vm in suspended runstate
> 
> This patch still didn't copy the QAPI maintainers, please remember to do so
> in a new post.
> 
> Maybe it would be easier to move the QAPI doc changes into a separate
> patch?

This was intentional.  I did not cc them for the whole series to spare them from
excess email.  You cc'd them for "[PATCH V6 03/14] cpus: stop vm in suspended 
runstate" 
they had no comments, and there is no change for V7, so I assumed we are OK, 
but I will
cc them again for that patch to be sure.

- Steve



Re: [PATCH V7 05/12] migration: propagate suspended runstate

2023-12-11 Thread Steven Sistare
On 12/11/2023 1:46 AM, Peter Xu wrote:
> On Wed, Dec 06, 2023 at 09:23:30AM -0800, Steve Sistare wrote:
>> If the outgoing machine was previously suspended, propagate that to the
>> incoming side via global_state, so a subsequent vm_start restores the
>> suspended state.  To maintain backward and forward compatibility, reclaim
>> some space from the runstate member.
>>
>> Signed-off-by: Steve Sistare 
> 
> Reviewed-by: Peter Xu 
> 
> One nitpick below.
> 
>> ---
>>  migration/global_state.c | 35 +--
>>  1 file changed, 33 insertions(+), 2 deletions(-)
>>
>> diff --git a/migration/global_state.c b/migration/global_state.c
>> index 4e2a9d8..d4f61a1 100644
>> --- a/migration/global_state.c
>> +++ b/migration/global_state.c
>> @@ -22,7 +22,16 @@
>>  
>>  typedef struct {
>>  uint32_t size;
>> -uint8_t runstate[100];
>> +
>> +/*
>> + * runstate was 100 bytes, zero padded, but we trimmed it to add a
>> + * few fields and maintain backwards compatibility.
>> + */
>> +uint8_t runstate[32];
>> +uint8_t has_vm_was_suspended;
>> +uint8_t vm_was_suspended;
>> +uint8_t unused[66];
>> +
>>  RunState state;
>>  bool received;
>>  } GlobalState;
>> @@ -35,6 +44,10 @@ static void global_state_do_store(RunState state)
>>  assert(strlen(state_str) < sizeof(global_state.runstate));
>>  strpadcpy((char *)global_state.runstate, sizeof(global_state.runstate),
>>state_str, '\0');
>> +global_state.has_vm_was_suspended = true;
>> +global_state.vm_was_suspended = vm_get_suspended();
>> +
>> +memset(global_state.unused, 0, sizeof(global_state.unused));
>>  }
>>  
>>  void global_state_store(void)
>> @@ -68,6 +81,12 @@ static bool global_state_needed(void *opaque)
>>  return true;
>>  }
>>  
>> +/* If the suspended state must be remembered, it is needed */
>> +
>> +if (vm_get_suspended()) {
>> +return true;
>> +}
> 
> Can we drop this section?
> 
> I felt unsafe when QEMU can overwrite the option even if user explicitly
> specified store-global-state=off but we still send this..  Ideally I think
> it's better if it's as simple as:
> 
> static bool global_state_needed(void *opaque)
> {
> return migrate_get_current()->store_global_state;
> }

I agree, I also did not see the point of dropping global_state for some states.
I will simplify it to this. 

- Steve

> I don't think we can remove the old trick due to compatibility reasons, but
> maybe nice to not add new ones to make this section more unpredictable in
> the migration stream?
> 
> IMHO it shouldn't matter in reality for the current use case even dropping
> it, as I don't expect any non-Xen QEMU VMs migrates without having the
> option turned on (store-global-state=on) after QEMU 2.4.
> 
> Thanks,
> 



Re: [PATCH V7 05/12] migration: propagate suspended runstate

2023-12-08 Thread Steven Sistare
On 12/8/2023 11:37 AM, Fabiano Rosas wrote:
> Steve Sistare  writes:
> 
>> If the outgoing machine was previously suspended, propagate that to the
>> incoming side via global_state, so a subsequent vm_start restores the
>> suspended state.  To maintain backward and forward compatibility, reclaim
>> some space from the runstate member.
>>
>> Signed-off-by: Steve Sistare 
>> ---
>>  migration/global_state.c | 35 +--
>>  1 file changed, 33 insertions(+), 2 deletions(-)
>>
>> diff --git a/migration/global_state.c b/migration/global_state.c
>> index 4e2a9d8..d4f61a1 100644
>> --- a/migration/global_state.c
>> +++ b/migration/global_state.c
>> @@ -22,7 +22,16 @@
>>  
>>  typedef struct {
>>  uint32_t size;
>> -uint8_t runstate[100];
>> +
>> +/*
>> + * runstate was 100 bytes, zero padded, but we trimmed it to add a
>> + * few fields and maintain backwards compatibility.
>> + */
>> +uint8_t runstate[32];
>> +uint8_t has_vm_was_suspended;
>> +uint8_t vm_was_suspended;
>> +uint8_t unused[66];
>> +
>>  RunState state;
>>  bool received;
>>  } GlobalState;
>> @@ -35,6 +44,10 @@ static void global_state_do_store(RunState state)
>>  assert(strlen(state_str) < sizeof(global_state.runstate));
>>  strpadcpy((char *)global_state.runstate, sizeof(global_state.runstate),
>>state_str, '\0');
>> +global_state.has_vm_was_suspended = true;
>> +global_state.vm_was_suspended = vm_get_suspended();
>> +
>> +memset(global_state.unused, 0, sizeof(global_state.unused));
>>  }
>>  
>>  void global_state_store(void)
>> @@ -68,6 +81,12 @@ static bool global_state_needed(void *opaque)
>>  return true;
>>  }
>>  
>> +/* If the suspended state must be remembered, it is needed */
>> +
>> +if (vm_get_suspended()) {
>> +return true;
>> +}
>> +
>>  /* If state is running or paused, it is not needed */
>>  
>>  if (strcmp(runstate, "running") == 0 ||
>> @@ -85,6 +104,7 @@ static int global_state_post_load(void *opaque, int 
>> version_id)
>>  Error *local_err = NULL;
>>  int r;
>>  char *runstate = (char *)s->runstate;
>> +bool vm_was_suspended = s->has_vm_was_suspended && s->vm_was_suspended;
> 
> Why do you need has_vm_was_suspended? If they're always read like this,
> then one flag should do, no?

has_vm_was_suspended=0 means a legacy qemu.  Currently the dest has no reason 
to care,
and we treat that as a vm_was_suspended=0 case, but I want to keep that field in
case we ever need to know.  But you are correct that this line can simply be:
bool vm_was_suspended = s->vm_was_suspended;

- Steve
 
>>  s->received = true;
>>  trace_migrate_global_state_post_load(runstate);
>> @@ -93,7 +113,7 @@ static int global_state_post_load(void *opaque, int 
>> version_id)
>>  sizeof(s->runstate)) == sizeof(s->runstate)) {
>>  /*
>>   * This condition should never happen during migration, because
>> - * all runstate names are shorter than 100 bytes (the size of
>> + * all runstate names are shorter than 32 bytes (the size of
>>   * s->runstate). However, a malicious stream could overflow
>>   * the qapi_enum_parse() call, so we force the last character
>>   * to a NUL byte.
>> @@ -110,6 +130,14 @@ static int global_state_post_load(void *opaque, int 
>> version_id)
>>  }
>>  s->state = r;
>>  
>> +/*
>> + * global_state is saved on the outgoing side before forcing a stopped
>> + * state, so it may have saved state=suspended and vm_was_suspended=0.
>> + * Now we are in a paused state, and when we later call vm_start, it 
>> must
>> + * restore the suspended state, so we must set vm_was_suspended=1 here.
>> + */
>> +vm_set_suspended(vm_was_suspended || r == RUN_STATE_SUSPENDED);
>> +
>>  return 0;
>>  }
>>  
>> @@ -134,6 +162,9 @@ static const VMStateDescription vmstate_globalstate = {
>>  .fields = (VMStateField[]) {
>>  VMSTATE_UINT32(size, GlobalState),
>>  VMSTATE_BUFFER(runstate, GlobalState),
>> +VMSTATE_UINT8(has_vm_was_suspended, GlobalState),
>> +VMSTATE_UINT8(vm_was_suspended, GlobalState),
>> +VMSTATE_BUFFER(unused, GlobalState),
>>  VMSTATE_END_OF_LIST()
>>  },
>>  };



Re: [PATCH V7 02/12] cpus: stop vm in suspended runstate

2023-12-06 Thread Steven Sistare
On 12/6/2023 1:45 PM, Philippe Mathieu-Daudé wrote:
> Hi Steve,
> 
> On 6/12/23 18:23, Steve Sistare wrote:
>> Currently, a vm in the suspended state is not completely stopped.  The VCPUs
>> have been paused, but the cpu clock still runs, and runstate notifiers for
>> the transition to stopped have not been called.  This causes problems for
>> live migration.  Stale cpu timers_state is saved to the migration stream,
>> causing time errors in the guest when it wakes from suspend, and state that
>> would have been modified by runstate notifiers is wrong.
>>
>> Modify vm_stop to completely stop the vm if the current state is suspended,
>> transition to RUN_STATE_PAUSED, and remember that the machine was suspended.
>> Modify vm_start to restore the suspended state.
>>
>> This affects all callers of vm_stop and vm_start, notably, the qapi stop and
>> cont commands.  For example:
>>
>>  (qemu) info status
>>  VM status: paused (suspended)
>>
>>  (qemu) stop
>>  (qemu) info status
>>  VM status: paused
>>
>>  (qemu) system_wakeup
>>  Error: Unable to wake up: guest is not in suspended state
>>
>>  (qemu) cont
>>  (qemu) info status
>>  VM status: paused (suspended)
>>
>>  (qemu) system_wakeup
>>  (qemu) info status
>>  VM status: running
>>
>> Suggested-by: Peter Xu 
>> Signed-off-by: Steve Sistare 
>> Reviewed-by: Peter Xu 
>> ---
>>   include/sysemu/runstate.h |  5 +
>>   qapi/misc.json    | 10 --
>>   system/cpus.c | 23 +++
>>   system/runstate.c |  3 +++
>>   4 files changed, 31 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
>> index 88a67e2..867e53c 100644
>> --- a/include/sysemu/runstate.h
>> +++ b/include/sysemu/runstate.h
>> @@ -40,6 +40,11 @@ static inline bool shutdown_caused_by_guest(ShutdownCause 
>> cause)
>>   return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN;
>>   }
>>   +static inline bool runstate_is_live(RunState state)
>> +{
>> +    return state == RUN_STATE_RUNNING || state == RUN_STATE_SUSPENDED;
>> +}
> 
> Not being familiar with (live) migration, from a generic vCPU PoV
> I don't get what "runstate_is_live" means. Can we add a comment
> explaining what this helper is for?

Sure.  "live" means the cpu clock is ticking, and the runstate notifiers think
we are running.  It is everything that is enabled in vm_start except for 
starting
the vcpus.

> Since this is a migration particular case, maybe we can be verbose
> in vm_resume() and keep runstate_is_live() -- eventually undocumented
> -- in migration/migration.c.

runstate_is_live is about cpu and vm state, not migration state (the "live" is 
not 
live migration), and is used in multiple places in cpus code and elsewhere, so 
I would 
like to keep it in runstate.h.  It has a specific meaning, and it is useful to 
search 
for it to see who handles "liveness", and distinguish it from code that checks 
the
running and suspended states for other reasons.

- Steve

>  void vm_resume(RunState state)
>  {
>  switch (state) {
>  case RUN_STATE_RUNNING:
>  case RUN_STATE_SUSPENDED:
>  vm_start();
>  break;
>  default:
>  runstate_set(state);
>  }
>  }




Re: [PATCH V7 00/12] fix migration of suspended runstate

2023-12-06 Thread Steven Sistare
FYI, these patches still need RB:
  migration: propagate suspended runstate
  tests/qtest: precopy migration with suspend
  tests/qtest: postcopy migration with suspend

This has RB, but the interaction between vm_start and vm_prepare_start
changed, so needs another look.
cpus: stop vm in suspended runstate

- Steve

On 12/6/2023 12:23 PM, Steve Sistare wrote:
> Migration of a guest in the suspended runstate is broken.  The incoming
> migration code automatically tries to wake the guest, which is wrong;
> the guest should end migration in the same runstate it started.  Further,
> after saving a snapshot in the suspended state and loading it, the vm_start
> fails.  The runstate is RUNNING, but the guest is not.
> 
> See the commit messages for the details.
> 
> Changes in V2:
>   * simplify "start on wakeup request"
>   * fix postcopy, snapshot, and background migration
>   * refactor fixes for each type of migration
>   * explicitly handled suspended events and runstate in tests
>   * add test for postcopy and background migration
> 
> Changes in V3:
>   * rebase to tip
>   * fix hang in new function migrate_wait_for_dirty_mem
> 
> Changes in V4:
>   * rebase to tip
>   * add patch for vm_prepare_start (thanks Peter)
>   * add patch to preserve cpu ticks
> 
> Changes in V5:
>   * rebase to tip
>   * added patches to completely stop vm in suspended state:
>   cpus: refactor vm_stop
>   cpus: stop vm in suspended state
>   * added patch to partially resume vm in suspended state:
>   cpus: start vm in suspended state
>   * modified "preserve suspended ..." patches to use the above.
>   * deleted patch "preserve cpu ticks if suspended".  stop ticks in
> vm_stop_force_state instead.
>   * deleted patch "add runstate function".  defined new helper function
> migrate_new_runstate in "preserve suspended runstate"
>   * Added some RB's, but removed other RB's because the patches changed.
> 
> Changes in V6:
>   * all vm_stop calls completely stop the suspended state
>   * refactored and updated the "cpus" patches
>   * simplified the "preserve suspended" patches
>   * added patch "bootfile per vm"
> 
> Changes in V7:
>   * rebase to tip, add RB-s
>   * fix backwards compatibility for global_state.vm_was_suspended
>   * delete vm_prepare_start state argument, and rename patch
> "pass runstate to vm_prepare_start" to
> "check running not RUN_STATE_RUNNING"
>   * drop patches:
>   tests/qtest: bootfile per vm
>   tests/qtest: background migration with suspend
>   * rename runstate_is_started to runstate_is_live
>   * move wait_for_suspend in tests
> 
> Steve Sistare (12):
>   cpus: vm_was_suspended
>   cpus: stop vm in suspended runstate
>   cpus: check running not RUN_STATE_RUNNING
>   cpus: vm_resume
>   migration: propagate suspended runstate
>   migration: preserve suspended runstate
>   migration: preserve suspended for snapshot
>   migration: preserve suspended for bg_migration
>   tests/qtest: migration events
>   tests/qtest: option to suspend during migration
>   tests/qtest: precopy migration with suspend
>   tests/qtest: postcopy migration with suspend
> 
>  backends/tpm/tpm_emulator.c  |   2 +-
>  hw/usb/hcd-ehci.c|   2 +-
>  hw/usb/redirect.c|   2 +-
>  hw/xen/xen-hvm-common.c  |   2 +-
>  include/migration/snapshot.h |   7 ++
>  include/sysemu/runstate.h|  16 
>  migration/global_state.c |  35 ++-
>  migration/migration-hmp-cmds.c   |   8 +-
>  migration/migration.c|  15 +--
>  migration/savevm.c   |  23 +++--
>  qapi/misc.json   |  10 +-
>  system/cpus.c|  47 +++--
>  system/runstate.c|   9 ++
>  system/vl.c  |   2 +
>  tests/migration/i386/Makefile|   5 +-
>  tests/migration/i386/a-b-bootblock.S |  50 +-
>  tests/migration/i386/a-b-bootblock.h |  26 +++--
>  tests/qtest/migration-helpers.c  |  27 ++
>  tests/qtest/migration-helpers.h  |  11 ++-
>  tests/qtest/migration-test.c | 181 
> +--
>  20 files changed, 354 insertions(+), 126 deletions(-)
> 



Re: [PATCH V7 00/12] fix migration of suspended runstate

2023-12-06 Thread Steven Sistare
Gack, there was cruft in my send mail directory.  Please ignore this threaded 
series,
and I will resend.  Sorry for the noise.

- Steve

On 12/6/2023 12:12 PM, Steve Sistare wrote:
> Migration of a guest in the suspended runstate is broken.  The incoming
> migration code automatically tries to wake the guest, which is wrong;
> the guest should end migration in the same runstate it started.  Further,
> after saving a snapshot in the suspended state and loading it, the vm_start
> fails.  The runstate is RUNNING, but the guest is not.
> 
> See the commit messages for the details.
> 
> Changes in V2:
>   * simplify "start on wakeup request"
>   * fix postcopy, snapshot, and background migration
>   * refactor fixes for each type of migration
>   * explicitly handled suspended events and runstate in tests
>   * add test for postcopy and background migration
> 
> Changes in V3:
>   * rebase to tip
>   * fix hang in new function migrate_wait_for_dirty_mem
> 
> Changes in V4:
>   * rebase to tip
>   * add patch for vm_prepare_start (thanks Peter)
>   * add patch to preserve cpu ticks
> 
> Changes in V5:
>   * rebase to tip
>   * added patches to completely stop vm in suspended state:
>   cpus: refactor vm_stop
>   cpus: stop vm in suspended state
>   * added patch to partially resume vm in suspended state:
>   cpus: start vm in suspended state
>   * modified "preserve suspended ..." patches to use the above.
>   * deleted patch "preserve cpu ticks if suspended".  stop ticks in
> vm_stop_force_state instead.
>   * deleted patch "add runstate function".  defined new helper function
> migrate_new_runstate in "preserve suspended runstate"
>   * Added some RB's, but removed other RB's because the patches changed.
> 
> Changes in V6:
>   * all vm_stop calls completely stop the suspended state
>   * refactored and updated the "cpus" patches
>   * simplified the "preserve suspended" patches
>   * added patch "bootfile per vm"
> 
> Changes in V7:
>   * rebase to tip, add RB-s
>   * fix backwards compatibility for global_state.vm_was_suspended
>   * delete vm_prepare_start state argument, and rename patch
> "pass runstate to vm_prepare_start" to
> "check running not RUN_STATE_RUNNING"
>   * drop patches:
>   tests/qtest: bootfile per vm
>   tests/qtest: background migration with suspend
>   * rename runstate_is_started to runstate_is_live
>   * move wait_for_suspend in tests
> 
> Steve Sistare (12):
>   cpus: vm_was_suspended
>   cpus: stop vm in suspended runstate
>   cpus: check running not RUN_STATE_RUNNING
>   cpus: vm_resume
>   migration: propagate suspended runstate
>   migration: preserve suspended runstate
>   migration: preserve suspended for snapshot
>   migration: preserve suspended for bg_migration
>   tests/qtest: migration events
>   tests/qtest: option to suspend during migration
>   tests/qtest: precopy migration with suspend
>   tests/qtest: postcopy migration with suspend
> 
>  backends/tpm/tpm_emulator.c  |   2 +-
>  hw/usb/hcd-ehci.c|   2 +-
>  hw/usb/redirect.c|   2 +-
>  hw/xen/xen-hvm-common.c  |   2 +-
>  include/migration/snapshot.h |   7 ++
>  include/sysemu/runstate.h|  16 
>  migration/global_state.c |  35 ++-
>  migration/migration-hmp-cmds.c   |   8 +-
>  migration/migration.c|  15 +--
>  migration/savevm.c   |  23 +++--
>  qapi/misc.json   |  10 +-
>  system/cpus.c|  47 +++--
>  system/runstate.c|   9 ++
>  system/vl.c  |   2 +
>  tests/migration/i386/Makefile|   5 +-
>  tests/migration/i386/a-b-bootblock.S |  50 +-
>  tests/migration/i386/a-b-bootblock.h |  26 +++--
>  tests/qtest/migration-helpers.c  |  27 ++
>  tests/qtest/migration-helpers.h  |  11 ++-
>  tests/qtest/migration-test.c | 181 
> +--
>  20 files changed, 354 insertions(+), 126 deletions(-)
> 



Re: [PATCH V6 00/14] fix migration of suspended runstate

2023-12-05 Thread Steven Sistare
Hi Peter and Fabiano,

Any comments on these patches, before I send V7?  They are not affected by
the other changes we have discussed, except for renaming runstate_is_started
to runstate_is_live.

  [PATCH V6 04/14] cpus: vm_resume
  [PATCH V6 06/14] migration: preserve suspended runstate
  [PATCH V6 07/14] migration: preserve suspended for snapshot
  [PATCH V6 08/14] migration: preserve suspended for bg_migration

- Steve

On 11/30/2023 4:37 PM, Steve Sistare wrote:
> Migration of a guest in the suspended runstate is broken.  The incoming
> migration code automatically tries to wake the guest, which is wrong;
> the guest should end migration in the same runstate it started.  Further,
> after saving a snapshot in the suspended state and loading it, the vm_start
> fails.  The runstate is RUNNING, but the guest is not.
> 
> See the commit messages for the details.
> 
> Changes in V2:
>   * simplify "start on wakeup request"
>   * fix postcopy, snapshot, and background migration
>   * refactor fixes for each type of migration
>   * explicitly handled suspended events and runstate in tests
>   * add test for postcopy and background migration
> 
> Changes in V3:
>   * rebase to tip
>   * fix hang in new function migrate_wait_for_dirty_mem
> 
> Changes in V4:
>   * rebase to tip
>   * add patch for vm_prepare_start (thanks Peter)
>   * add patch to preserve cpu ticks
> 
> Changes in V5:
>   * rebase to tip
>   * added patches to completely stop vm in suspended state:
>   cpus: refactor vm_stop
>   cpus: stop vm in suspended state
>   * added patch to partially resume vm in suspended state:
>   cpus: start vm in suspended state
>   * modified "preserve suspended ..." patches to use the above.
>   * deleted patch "preserve cpu ticks if suspended".  stop ticks in
> vm_stop_force_state instead.
>   * deleted patch "add runstate function".  defined new helper function
> migrate_new_runstate in "preserve suspended runstate"
>   * Added some RB's, but removed other RB's because the patches changed.
> 
> Changes in V6:
>   * all vm_stop calls completely stop the suspended state
>   * refactored and updated the "cpus" patches
>   * simplified the "preserve suspended" patches
>   * added patch "bootfile per vm"
> 
> Steve Sistare (14):
>   cpus: pass runstate to vm_prepare_start
>   cpus: vm_was_suspended
>   cpus: stop vm in suspended runstate
>   cpus: vm_resume
>   migration: propagate suspended runstate
>   migration: preserve suspended runstate
>   migration: preserve suspended for snapshot
>   migration: preserve suspended for bg_migration
>   tests/qtest: migration events
>   tests/qtest: option to suspend during migration
>   tests/qtest: precopy migration with suspend
>   tests/qtest: postcopy migration with suspend
>   tests/qtest: bootfile per vm
>   tests/qtest: background migration with suspend
> 
>  backends/tpm/tpm_emulator.c  |   2 +-
>  gdbstub/system.c |   2 +-
>  hw/usb/hcd-ehci.c|   2 +-
>  hw/usb/redirect.c|   2 +-
>  hw/xen/xen-hvm-common.c  |   2 +-
>  include/migration/snapshot.h |   7 +
>  include/sysemu/runstate.h|  19 ++-
>  migration/global_state.c |  10 ++
>  migration/migration-hmp-cmds.c   |   8 +-
>  migration/migration.c|  15 +--
>  migration/savevm.c   |  23 ++--
>  qapi/misc.json   |  10 +-
>  system/cpus.c|  49 +--
>  system/runstate.c|   9 ++
>  system/vl.c  |   2 +
>  tests/migration/i386/Makefile|   5 +-
>  tests/migration/i386/a-b-bootblock.S |  50 +++-
>  tests/migration/i386/a-b-bootblock.h |  26 ++--
>  tests/qtest/migration-helpers.c  |  27 ++--
>  tests/qtest/migration-helpers.h  |  11 +-
>  tests/qtest/migration-test.c | 240 
> +--
>  21 files changed, 382 insertions(+), 139 deletions(-)
> 



Re: [PATCH V6 13/14] tests/qtest: bootfile per vm

2023-12-05 Thread Steven Sistare
On 12/4/2023 5:37 PM, Peter Xu wrote:
> On Mon, Dec 04, 2023 at 06:13:36PM -0300, Fabiano Rosas wrote:
>> Steve Sistare  writes:
>>
>>> Create a separate bootfile for the outgoing and incoming vm, so the block
>>> layer can lock the file during the background migration test.  Otherwise,
>>> the test fails with:
>>>   "Failed to get "write" lock.  Is another process using the image
>>>[/tmp/migration-test-WAKPD2/bootsect]?"
>>
>> Hm.. what is the background migration even trying to access on the boot
>> disk? @Peter?
> 
> I didn't yet notice this patch until you asked, but background snapshot is
> not designed to be used like this, afaict.
> 
> It should normally be used when someone would like to use "savevm", then
> background snapshot makes that snapshot save happen with VM running (live)
> and mostly as performant as "savevm" due to page write protections (IOW, it
> is not dirty tracking, but wr-protect each page so not writtable at all
> until unprotected).
> 
> Another difference (from "savevm") is, instead of storing that image onto
> the block images, it stores that image also separately just like migrating
> with "file:" as of now.
> 
> When the dest QEMU starts it'll try to grab the image lock already because
> it should never run with src running, just like when "loadvm" QEMU doesn't
> assume the QEMU that ran "savevm" will be running.
> 
>>
>> This might be a good use for the -snapshot option. It should stop any
>> attempt to get the write lock. Not a lot of difference, but slightly
>> simpler.
> 
> We don't yet have a background-snapshot test case.  If we ever need,
> that'll need to be done in two steps: start src, save snapshot into file,
> start dest, load from snapshot file.  We just shouldn't boot two together.
> 
> Now after two years when I re-read the snapshot code a bit, I didn't even
> find where QEMU took the disk snapshots.. logically it should be done at
> the start of live background snapshot when VM was dumping device states,
> something like bdrv_all_can_snapshot() orshould be needed to make sure all
> images support snapshot on its own or it should already fail, and take
> snapshots to match the image.
> 
> IOW, I don't even think current raw disk would be able to support
> background snapshot at all, otherwise if VM is live I don't see a way to
> match the image (which is still lively updated by the running VM) to a live
> snapshot taken.  Copy the author, Andrey, for this question.
> 
> Before that is confirmed, maybe the easiest way is we can go without a
> background snapshot test case for suspend vm scenario.

I am OK with dropping the test patches.  It's a lot of change for a trivial 
test.
  tests/qtest: background migration with suspend
  tests/qtest: bootfile per vm

- Steve



Re: [PATCH V6 05/14] migration: propagate suspended runstate

2023-12-05 Thread Steven Sistare
On 12/5/2023 11:50 AM, Peter Xu wrote:
> On Mon, Dec 04, 2023 at 05:23:57PM -0500, Steven Sistare wrote:
>> The V6 code does not pass a state to vm_start, and knowledge of 
>> vm_was_suspended
>> is confined to the global_state and cpus functions.  IMO this is a more 
>> modular
>> and robust solution, as multiple sites may call vm_start(), and the right 
>> thing
>> happens.  Look at patch 6.  The changes are minimal because vm_start "just 
>> works".
> 
> Oh I think I see what you meant.  Sounds good then.
> 
> Shall we hide that into vm_prepare_start()?  It seems three's still one
> more call sites that always pass in RUNNING (gdb_continue_partial).
> 
> If with above, vm_prepare_start() will go into either RUNNING, SUSPENDED,
> or an error.  It returns 0 only if RUNNING, -1 for all the rest.  Maybe we
> can already also touch up the retval of vm_prepare_start() to be a boolean,
> reflecting "whether vcpu needs to be started".

Yes, that is even nicer, thanks.

/**
 * Prepare for (re)starting the VM.
 * Returns 0 if the vCPUs should be restarted, -1 on an error condition,
 * and 1 otherwise.
 */
int vm_prepare_start(bool step_pending)
{
int ret = vm_was_suspended ? 1 : 0;
RunState state = vm_was_suspended ? RUN_STATE_SUSPENDED : RUN_STATE_RUNNING;
...
vm_was_suspended = false;
return ret;
}

void vm_start(void)
{
if (!vm_prepare_start(false)) {
resume_all_vcpus();
}
}

- Steve



Re: [PATCH V6 05/14] migration: propagate suspended runstate

2023-12-05 Thread Steven Sistare
On 12/5/2023 11:52 AM, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
>> On Tue, Dec 05, 2023 at 09:44:12AM -0300, Fabiano Rosas wrote:
>>> Peter Xu  writes:
>>>
 On Mon, Dec 04, 2023 at 06:09:16PM -0300, Fabiano Rosas wrote:
> Right, I got your point. I just think we could avoid designing this new
> string format by creating new fields with the extra space:
>
> typedef struct QEMU_PACKED {
> uint32_t size;
> uint8_t runstate[50];
> uint8_t unused[50];
> RunState state;
> bool received;
> } GlobalState;
>
> In my mind this works seamlessly, or am I mistaken?

 I think what you proposed should indeed work.

 Currently it's:

 .fields = (VMStateField[]) {
 VMSTATE_UINT32(size, GlobalState),
 VMSTATE_BUFFER(runstate, GlobalState),
 VMSTATE_END_OF_LIST()
 },

 I had a quick look at vmstate_info_buffer, it mostly only get()/put() those
 buffers with its sizeof(), so looks all fine.  For sure in all cases we'd
 better test it to verify.

 One side note is since we so far use qapi_enum_parse() for the runstate, I
 think the "size" is not ever used..

 If we do want a split, IMHO we can consider making runstate[] even smaller
 to just free up the rest spaces all in one shot:

   typedef struct QEMU_PACKED {
>>
>> [1]
>>
   uint32_t size;
   /*
* Assuming 16 is good enough to fit all possible runstate strings..
* This field must be a string ending with '\0'.
*/
   uint8_t runstate[16];
   /* 0x00 when QEMU doesn't support it, or "0"/"1" to reflect its 
 state */
   uint8_t vm_was_suspended[1];
   /*
* Still free of use space.  Note that we only have 99 bytes for use
* because the last byte (the 100th byte) must be zero due to legacy
* reasons, if not it may be set to zero after loaded on dest QEMU. 
*/
>>>
>>> I'd add a 'uint8_t reserved;' to go along with this comment instead of
>>> leaving a hole.
>>
>> Note that "struct GlobalState" is not a binary format but only some
>> internal storage, what really matters is vmstate_globalstate.  Here the
>> "uint8_reserved" will be a pure waste of 1 byte in QEMU binary, imho.
>>
> 
> I prefer wasting the byte and make the code more obvious to people who
> might not immediately understand what's going on. We could even
> assert(!global_state.reserved) to sanity check the assumption. Anyway,
> that's minor, I'm fine with it either way.
> 
>> I think I just copied what you had previously and extended it, logically I
>> don't think we ever need QEMU_PACKED right above [1].  We can also drop
>> "size" directly here, but this can be done later.
> 
> Ah right, I was initially thinking of letting the new qemu overrun
> runstate[16] so we wouldn't have to change the code. But that's indeed
> not necessary, your additions to the vmstate make it ok. Thanks.

There is no need to reserve byte 100 in the new scheme.  The incoming side
sets s->runstate[sizeof(s->runstate) - 1] = 0 to protect itself, and now
sizeof(runstate) is smaller.

- Steve



Re: [PATCH V6 11/14] tests/qtest: precopy migration with suspend

2023-12-05 Thread Steven Sistare
On 12/4/2023 3:49 PM, Peter Xu wrote:
> On Thu, Nov 30, 2023 at 01:37:24PM -0800, Steve Sistare wrote:
>> Add a test case to verify that the suspended state is handled correctly
>> during live migration precopy.  The test suspends the src, migrates, then
>> wakes the dest.
>>
>> Signed-off-by: Steve Sistare 
>> ---
>>  tests/qtest/migration-helpers.c |  3 ++
>>  tests/qtest/migration-helpers.h |  2 ++
>>  tests/qtest/migration-test.c| 64 
>> ++---
>>  3 files changed, 65 insertions(+), 4 deletions(-)
>>
>> diff --git a/tests/qtest/migration-helpers.c 
>> b/tests/qtest/migration-helpers.c
>> index fd3b94e..37e8e81 100644
>> --- a/tests/qtest/migration-helpers.c
>> +++ b/tests/qtest/migration-helpers.c
>> @@ -32,6 +32,9 @@ bool migrate_watch_for_events(QTestState *who, const char 
>> *name,
>>  if (g_str_equal(name, "STOP")) {
>>  state->stop_seen = true;
>>  return true;
>> +} else if (g_str_equal(name, "SUSPEND")) {
>> +state->suspend_seen = true;
>> +return true;
>>  } else if (g_str_equal(name, "RESUME")) {
>>  state->resume_seen = true;
>>  return true;
>> diff --git a/tests/qtest/migration-helpers.h 
>> b/tests/qtest/migration-helpers.h
>> index 3d32699..b478549 100644
>> --- a/tests/qtest/migration-helpers.h
>> +++ b/tests/qtest/migration-helpers.h
>> @@ -18,6 +18,8 @@
>>  typedef struct QTestMigrationState {
>>  bool stop_seen;
>>  bool resume_seen;
>> +bool suspend_seen;
>> +bool suspend_me;
>>  } QTestMigrationState;
>>  
>>  bool migrate_watch_for_events(QTestState *who, const char *name,
>> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
>> index e10d5a4..200f023 100644
>> --- a/tests/qtest/migration-test.c
>> +++ b/tests/qtest/migration-test.c
>> @@ -178,7 +178,7 @@ static void bootfile_delete(void)
>>  /*
>>   * Wait for some output in the serial output file,
>>   * we get an 'A' followed by an endless string of 'B's
>> - * but on the destination we won't have the A.
>> + * but on the destination we won't have the A (unless we enabled 
>> suspend/resume)
>>   */
>>  static void wait_for_serial(const char *side)
>>  {
>> @@ -245,6 +245,13 @@ static void wait_for_resume(QTestState *who, 
>> QTestMigrationState *state)
>>  }
>>  }
>>  
>> +static void wait_for_suspend(QTestState *who, QTestMigrationState *state)
>> +{
>> +if (!state->suspend_seen) {
>> +qtest_qmp_eventwait(who, "SUSPEND");
>> +}
>> +}
>> +
>>  /*
>>   * It's tricky to use qemu's migration event capability with qtest,
>>   * events suddenly appearing confuse the qmp()/hmp() responses.
>> @@ -299,7 +306,7 @@ static void wait_for_migration_pass(QTestState *who)
>>  {
>>  uint64_t pass, prev_pass = 0, changes = 0;
>>  
>> -while (changes < 2 && !src_state.stop_seen) {
>> +while (changes < 2 && !src_state.stop_seen && !src_state.suspend_seen) {
>>  usleep(1000);
>>  pass = get_migration_pass(who);
>>  changes += (pass != prev_pass);
>> @@ -595,7 +602,8 @@ static void migrate_wait_for_dirty_mem(QTestState *from,
>>  watch_byte = qtest_readb(from, watch_address);
>>  do {
>>  usleep(1000 * 10);
>> -} while (qtest_readb(from, watch_address) == watch_byte);
>> +} while (qtest_readb(from, watch_address) == watch_byte &&
>> + !src_state.suspend_seen);
> 
> This is hackish to me.
> 
> AFAIU the guest code won't ever dirty anything after printing the initial
> 'B'.  IOW, suspend not seen() should not be called for suspend
> test at all, I guess, because we know it won't.
Calling migrate_wait_for_dirty_mem proves that a source write is propagated
to the dest, even for the suspended case.  We fully expect that, but a good 
test verifies our expectations.  That is done in the first loop of 
migrate_wait_for_dirty_mem.  After that, we must check for the suspended 
state, because the second loop will not terminate.  Here is a more explicit
version:

static void migrate_wait_for_dirty_mem(QTestState *from,
   QTestState *to)
{
uint64_t watch_address = start_address + MAGIC_OFFSET_BASE;
uint64_t marker_address = start_address + MAGIC_OFFSET;
uint8_t watch_byte;

/*
 * Wait for the MAGIC_MARKER to get transferred, as an
 * indicator that a migration pass has made some known
 * amount of progress.
 */
do {
usleep(1000 * 10);
} while (qtest_readq(to, marker_address) != MAGIC_MARKER);

+/* If suspended, src only iterates once, and watch_byte may never change */
+if (src_state.suspend_me) {
+return;
+}

>>  }
>>  
>>  
>> @@ -771,6 +779,7 @@ static int test_migrate_start(QTestState **from, 
>> QTestState **to,
>>  dst_state = (QTestMigrationState) { };
>>  src_state = (QTestMigrationState) { };
>>  bootfile_create(tmpfs, args->suspend_me);
>> +src_state.suspend_me = args->suspend_me;
>>  
>>  if 

Re: [PATCH V6 05/14] migration: propagate suspended runstate

2023-12-05 Thread Steven Sistare
On 12/5/2023 7:44 AM, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
>> On Mon, Dec 04, 2023 at 06:09:16PM -0300, Fabiano Rosas wrote:
>>> Right, I got your point. I just think we could avoid designing this new
>>> string format by creating new fields with the extra space:
>>>
>>> typedef struct QEMU_PACKED {
>>> uint32_t size;
>>> uint8_t runstate[50];
>>> uint8_t unused[50];
>>> RunState state;
>>> bool received;
>>> } GlobalState;
>>>
>>> In my mind this works seamlessly, or am I mistaken?
>>
>> I think what you proposed should indeed work.
>>
>> Currently it's:
>>
>> .fields = (VMStateField[]) {
>> VMSTATE_UINT32(size, GlobalState),
>> VMSTATE_BUFFER(runstate, GlobalState),
>> VMSTATE_END_OF_LIST()
>> },
>>
>> I had a quick look at vmstate_info_buffer, it mostly only get()/put() those
>> buffers with its sizeof(), so looks all fine.  For sure in all cases we'd
>> better test it to verify.
>>
>> One side note is since we so far use qapi_enum_parse() for the runstate, I
>> think the "size" is not ever used..
>>
>> If we do want a split, IMHO we can consider making runstate[] even smaller
>> to just free up the rest spaces all in one shot:
>>
>>   typedef struct QEMU_PACKED {
>>   uint32_t size;
>>   /*
>>* Assuming 16 is good enough to fit all possible runstate strings..
>>* This field must be a string ending with '\0'.
>>*/
>>   uint8_t runstate[16];
>>   /* 0x00 when QEMU doesn't support it, or "0"/"1" to reflect its state 
>> */
>>   uint8_t vm_was_suspended[1];
>>   /*
>>* Still free of use space.  Note that we only have 99 bytes for use
>>* because the last byte (the 100th byte) must be zero due to legacy
>>* reasons, if not it may be set to zero after loaded on dest QEMU. 
>>*/
> 
> I'd add a 'uint8_t reserved;' to go along with this comment instead of
> leaving a hole.

I'll use this scheme, thanks.  It is a clearer than implicitly packing strings.

- Steve



Re: [PATCH V6 05/14] migration: propagate suspended runstate

2023-12-04 Thread Steven Sistare
On 12/4/2023 12:24 PM, Peter Xu wrote:
> On Fri, Dec 01, 2023 at 11:23:33AM -0500, Steven Sistare wrote:
>>>> @@ -109,6 +117,7 @@ static int global_state_post_load(void *opaque, int 
>>>> version_id)
>>>>  return -EINVAL;
>>>>  }
>>>>  s->state = r;
>>>> +vm_set_suspended(s->vm_was_suspended || r == RUN_STATE_SUSPENDED);
>>>
>>> IIUC current vm_was_suspended (based on my read of your patch) was not the
>>> same as a boolean representing "whether VM is suspended", but only a
>>> temporary field to remember that for a VM stop request.  To be explicit, I
>>> didn't see this flag set in qemu_system_suspend() in your previous patch.
>>>
>>> If so, we can already do:
>>>
>>>   vm_set_suspended(s->vm_was_suspended);
>>>
>>> Irrelevant of RUN_STATE_SUSPENDED?
>>
>> We need both terms of the expression.
>>
>> If the vm *is* suspended (RUN_STATE_SUSPENDED), then vm_was_suspended = 
>> false.
>> We call global_state_store prior to vm_stop_force_state, so the incoming
>> side sees s->state = RUN_STATE_SUSPENDED and s->vm_was_suspended = false.
> 
> Right.
> 
>> However, the runstate is RUN_STATE_INMIGRATE.  When incoming finishes by
>> calling vm_start, we need to restore the suspended state.  Thus in 
>> global_state_post_load, we must set vm_was_suspended = true.
> 
> With above, shouldn't global_state_get_runstate() (on dest) fetch SUSPENDED
> already?  Then I think it should call vm_start(SUSPENDED) if to start.

The V6 code does not pass a state to vm_start, and knowledge of vm_was_suspended
is confined to the global_state and cpus functions.  IMO this is a more modular
and robust solution, as multiple sites may call vm_start(), and the right thing
happens.  Look at patch 6.  The changes are minimal because vm_start "just 
works".

> Maybe you're talking about the special case where autostart==false?  We
> used to have this (existing process_incoming_migration_bh()):
> 
> if (!global_state_received() ||
> global_state_get_runstate() == RUN_STATE_RUNNING) {
> if (autostart) {
> vm_start();
> } else {
> runstate_set(RUN_STATE_PAUSED);
> }
> }
> 
> If so maybe I get you, because in the "else" path we do seem to lose the
> SUSPENDED state again, but in that case IMHO we should logically set
> vm_was_suspended only when we "lose" it - we didn't lose it during
> migration, but only until we decided to switch to PAUSED (due to
> autostart==false). IOW, change above to something like:
> 
> state = global_state_get_runstate();
> if (!global_state_received() || runstate_is_alive(state)) {
> if (autostart) {
> vm_start(state);
> } else {
> if (runstate_is_suspended(state)) {
> /* Remember suspended state before setting system to STOPed */
> vm_was_suspended = true;
> }
> runstate_set(RUN_STATE_PAUSED);
> }
> }

This is similar to V5 which tested suspended and fiddled with runstate at
multiple call sites in migration and snapshot.  I believe V6 is cleaner.

> It may or may not have a functional difference even if current patch,
> though.  However maybe clearer to follow vm_was_suspended's strict
> definition.
> 
>>
>> If the vm *was* suspended, but is currently stopped (eg RUN_STATE_PAUSED),
>> then vm_was_suspended = true.  Migration from that state sets
>> vm_was_suspended = s->vm_was_suspended = true in global_state_post_load and 
>> ends with runstate_set(RUN_STATE_PAUSED).
>>
>> I will add a comment here in the code.
>>  
>>>>  return 0;
>>>>  }
>>>> @@ -134,6 +143,7 @@ static const VMStateDescription vmstate_globalstate = {
>>>>  .fields = (VMStateField[]) {
>>>>  VMSTATE_UINT32(size, GlobalState),
>>>>  VMSTATE_BUFFER(runstate, GlobalState),
>>>> +VMSTATE_BOOL(vm_was_suspended, GlobalState),
>>>>  VMSTATE_END_OF_LIST()
>>>>  },
>>>>  };
>>>
>>> I think this will break migration between old/new, unfortunately.  And
>>> since the global state exist mostly for every VM, all VM setup should be
>>> affected, and over all archs.
>>
>> Thanks, I keep forgetting that my binary tricks are no good here.  However,
>> I have one other trick up my sleeve, which is to store vm_was_running in
>> global_state.runstate[strlen(runstate) + 2].  It is forwards and bac

Re: [PATCH V6 03/14] cpus: stop vm in suspended runstate

2023-12-04 Thread Steven Sistare
On 12/4/2023 11:35 AM, Peter Xu wrote:
> On Fri, Dec 01, 2023 at 12:11:32PM -0500, Steven Sistare wrote:
>>>> diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
>>>> index f6a337b..1d6828f 100644
>>>> --- a/include/sysemu/runstate.h
>>>> +++ b/include/sysemu/runstate.h
>>>> @@ -40,6 +40,11 @@ static inline bool 
>>>> shutdown_caused_by_guest(ShutdownCause cause)
>>>>  return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN;
>>>>  }
>>>>  
>>>> +static inline bool runstate_is_started(RunState state)
>>>
>>> Would runstate_has_vm_running() sound better?  It is a bit awkward when
>>> saying something like "start a runstate".
>>
>> I have been searching for the perfect name for this accessor.
>> IMO using "running" in this accessor is confusing because it applies to both
>> the running and suspended state.  So, I invented a new aggregate state called
>> started.  vm_start transitions the machine to a started state.
>>
>> How about runstate_was_started?  It works well at both start and stop call 
>> sites:
>>
>> void vm_resume(RunState state)
>> if (runstate_was_started(state)) {
> 
> This one looks fine, but...
> 
>> vm_start();
>>
>> int vm_stop_force_state(RunState state)
>> if (runstate_was_started(runstate_get())) {
> 
> .. this one makes the past tense not looking good.
> 
>> return vm_stop(state);
> 
> How about runstate_is_alive()?  So far the best I can come up with. :)
> 
> Even if you prefer "started", I'd vote for not using past tense, hence
> runstate_is_started().

runstate_is_live also occurred to me.  I'll use that.

- Steve



Re: [PATCH V6 03/14] cpus: stop vm in suspended runstate

2023-12-01 Thread Steven Sistare
On 11/30/2023 5:10 PM, Peter Xu wrote:
> On Thu, Nov 30, 2023 at 01:37:16PM -0800, Steve Sistare wrote:
>> Currently, a vm in the suspended state is not completely stopped.  The VCPUs
>> have been paused, but the cpu clock still runs, and runstate notifiers for
>> the transition to stopped have not been called.  This causes problems for
>> live migration.  Stale cpu timers_state is saved to the migration stream,
>> causing time errors in the guest when it wakes from suspend, and state that
>> would have been modified by runstate notifiers is wrong.
>>
>> Modify vm_stop to completely stop the vm if the current state is suspended,
>> transition to RUN_STATE_PAUSED, and remember that the machine was suspended.
>> Modify vm_start to restore the suspended state.
>>
>> This affects all callers of vm_stop and vm_start, notably, the qapi stop and
>> cont commands.  For example:
>>
>> (qemu) info status
>> VM status: paused (suspended)
>>
>> (qemu) stop
>> (qemu) info status
>> VM status: paused
>>
>> (qemu) cont
>> (qemu) info status
>> VM status: paused (suspended)
>>
>> (qemu) system_wakeup
>> (qemu) info status
>> VM status: running
> 
> So system_wakeup for a stopped (but used to be suspended) VM will fail
> directly, not touching vm_was_suspended.  It's not mentioned here, but that
> behavior makes sense to me.

Right.  I'll add that example above.

>> Suggested-by: Peter Xu 
>> Signed-off-by: Steve Sistare 
> 
> Reviewed-by: Peter Xu 
> 
> Since you touched qapi/, please copy maintainers too.  I've copied Markus
> and Eric in this reply.

My bad, thanks for the cc.

> I also have some nitpicks which may not affect the R-b, please see below.
> 
>> ---
>>  include/sysemu/runstate.h |  5 +
>>  qapi/misc.json| 10 --
>>  system/cpus.c | 19 ++-
>>  system/runstate.c |  3 +++
>>  4 files changed, 30 insertions(+), 7 deletions(-)
>>
>> diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
>> index f6a337b..1d6828f 100644
>> --- a/include/sysemu/runstate.h
>> +++ b/include/sysemu/runstate.h
>> @@ -40,6 +40,11 @@ static inline bool shutdown_caused_by_guest(ShutdownCause 
>> cause)
>>  return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN;
>>  }
>>  
>> +static inline bool runstate_is_started(RunState state)
> 
> Would runstate_has_vm_running() sound better?  It is a bit awkward when
> saying something like "start a runstate".

I have been searching for the perfect name for this accessor.
IMO using "running" in this accessor is confusing because it applies to both
the running and suspended state.  So, I invented a new aggregate state called
started.  vm_start transitions the machine to a started state.

How about runstate_was_started?  It works well at both start and stop call 
sites:

void vm_resume(RunState state)
if (runstate_was_started(state)) {
vm_start();

int vm_stop_force_state(RunState state)
if (runstate_was_started(runstate_get())) {
return vm_stop(state);

- Steve

>> +{
>> +return state == RUN_STATE_RUNNING || state == RUN_STATE_SUSPENDED;
>> +}
>> +



Re: [PATCH V6 05/14] migration: propagate suspended runstate

2023-12-01 Thread Steven Sistare
On 11/30/2023 6:06 PM, Peter Xu wrote:
> On Thu, Nov 30, 2023 at 01:37:18PM -0800, Steve Sistare wrote:
>> If the outgoing machine was previously suspended, propagate that to the
>> incoming side via global_state, so a subsequent vm_start restores the
>> suspended state.  To maintain backward and forward compatibility, define
>> the new field in a zero'd hole in the GlobalState struct.
>>
>> Signed-off-by: Steve Sistare 
>> ---
>>  migration/global_state.c | 10 ++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/migration/global_state.c b/migration/global_state.c
>> index 4e2a9d8..de2532c 100644
>> --- a/migration/global_state.c
>> +++ b/migration/global_state.c
>> @@ -25,6 +25,7 @@ typedef struct {
>>  uint8_t runstate[100];
>>  RunState state;
>>  bool received;
>> +bool vm_was_suspended;
>>  } GlobalState;
>>  
>>  static GlobalState global_state;
>> @@ -35,6 +36,7 @@ static void global_state_do_store(RunState state)
>>  assert(strlen(state_str) < sizeof(global_state.runstate));
>>  strpadcpy((char *)global_state.runstate, sizeof(global_state.runstate),
>>state_str, '\0');
>> +global_state.vm_was_suspended = vm_get_suspended();
>>  }
>>  
>>  void global_state_store(void)
>> @@ -68,6 +70,12 @@ static bool global_state_needed(void *opaque)
>>  return true;
>>  }
>>  
>> +/* If the suspended state must be remembered, it is needed */
>> +
>> +if (vm_get_suspended()) {
>> +return true;
>> +}
>> +
>>  /* If state is running or paused, it is not needed */
>>  
>>  if (strcmp(runstate, "running") == 0 ||
>> @@ -109,6 +117,7 @@ static int global_state_post_load(void *opaque, int 
>> version_id)
>>  return -EINVAL;
>>  }
>>  s->state = r;
>> +vm_set_suspended(s->vm_was_suspended || r == RUN_STATE_SUSPENDED);
> 
> IIUC current vm_was_suspended (based on my read of your patch) was not the
> same as a boolean representing "whether VM is suspended", but only a
> temporary field to remember that for a VM stop request.  To be explicit, I
> didn't see this flag set in qemu_system_suspend() in your previous patch.
> 
> If so, we can already do:
> 
>   vm_set_suspended(s->vm_was_suspended);
> 
> Irrelevant of RUN_STATE_SUSPENDED?

We need both terms of the expression.

If the vm *is* suspended (RUN_STATE_SUSPENDED), then vm_was_suspended = false.
We call global_state_store prior to vm_stop_force_state, so the incoming
side sees s->state = RUN_STATE_SUSPENDED and s->vm_was_suspended = false.
However, the runstate is RUN_STATE_INMIGRATE.  When incoming finishes by
calling vm_start, we need to restore the suspended state.  Thus in 
global_state_post_load, we must set vm_was_suspended = true.

If the vm *was* suspended, but is currently stopped (eg RUN_STATE_PAUSED),
then vm_was_suspended = true.  Migration from that state sets
vm_was_suspended = s->vm_was_suspended = true in global_state_post_load and 
ends with runstate_set(RUN_STATE_PAUSED).

I will add a comment here in the code.
 
>>  return 0;
>>  }
>> @@ -134,6 +143,7 @@ static const VMStateDescription vmstate_globalstate = {
>>  .fields = (VMStateField[]) {
>>  VMSTATE_UINT32(size, GlobalState),
>>  VMSTATE_BUFFER(runstate, GlobalState),
>> +VMSTATE_BOOL(vm_was_suspended, GlobalState),
>>  VMSTATE_END_OF_LIST()
>>  },
>>  };
> 
> I think this will break migration between old/new, unfortunately.  And
> since the global state exist mostly for every VM, all VM setup should be
> affected, and over all archs.

Thanks, I keep forgetting that my binary tricks are no good here.  However,
I have one other trick up my sleeve, which is to store vm_was_running in
global_state.runstate[strlen(runstate) + 2].  It is forwards and backwards
compatible, since that byte is always 0 in older qemu.  It can be implemented
with a few lines of code change confined to global_state.c, versus many lines 
spread across files to do it the conventional way using a compat property and
a subsection.  Sound OK?  

> We used to have the version_id field right above for adding fields, but I
> _think_ that will still break backward migration fron new->old binary, so
> not wanted.  Juan can keep me honest.
> 
> The best thing is still machine compat properties, afaict, to fix.  It's
> slightly involved, but let me attach a sample diff for you (at the end,
> possibly working with your current patch kind-of squashed, but not ever
> tested), hopefully make it slightly easier.
> 
> I'm wondering how bad it is to just ignore it, it's not as bad as if we
> don't fix stop-during-suspend, in this case the worst case of forgetting
> this field over migration is: if VM stopped (and used to be suspended) then
> after migration it'll keep being stopped, however after "cont" it'll forget
> the suspended state.  

Cont changes state to running, but the VM is broken because wake was never 
called.

> Not that bad!  IIUC SPR should always migrate with
> 

Re: [PATCH V5 02/12] cpus: stop vm in suspended state

2023-11-28 Thread Steven Sistare
On 11/22/2023 11:21 AM, Peter Xu wrote:
> On Wed, Nov 22, 2023 at 09:38:06AM +, Daniel P. Berrangé wrote:
>> On Mon, Nov 20, 2023 at 04:44:50PM -0500, Peter Xu wrote:
>>> On Mon, Nov 20, 2023 at 03:55:54PM -0500, Steven Sistare wrote:
>>>> If we drop force, then all calls to vm_stop will completely stop the
>>>> suspended state, eg an hmp "stop" command. This causes two problems.
>>>> First, that is a change in user-visible behavior for something that
>>>> currently works,
>>>
>>> IMHO it depends on what should be the correct behavior.  IOW, when VM is in
>>> SUSPENDED state and then the user sends "stop" QMP command, what should we
>>> expect?
>>
>> I would say that from a mgmtm app POV "stop" is initiating a state
>> transition, from RUN_STATE_RUNNING to RUN_STATE_PAUSED and "cont"
>> is doing the reverse from PAUSED to RUNNING.
>>
>> It is a little more complicated than that as there are some other
>> states like INMIGRATE that are conceptually equiv to RUNNING,
>> and states where the transition simply doesn't make sense.
> 
> In the qemu impl, INMIGRATE is, imho, more equiv of PAUSED - do_vm_stop()
> mostly ignores every state except RUNNING (putting bdrv operations aside).
> IOW, anything besides "running" is treated as "not running".
> 
> But then Paolo fixed that in 1e9981465f ("qmp: handle stop/cont in
> INMIGRATE state"), wiring that to autostart.
> 
> Now we seem to find that "suspended" should actually fall within (where
> "vm" is running, but "vcpu" is not), and it seems we should treat "vm" and
> "vcpu" differently.
> 
>>
>> So my question is if we're in "SUSPENDED" and someone issues "stop",
>> what state do we go into, and perhaps more importantly what state
>> do we go to in a subsequent "cont".
> 
> I think we must stop the "vm", not only the "vcpu".  I discussed this bit
> in the other thread more or less: it's because qemu_system_wakeup_request()
> can be called in many places, e.g. acpi_pm_tmr_timer().
> 
> It means even after the VM is "stop"ped by the admin QMP (where qmp_stop()
> ignores SUSPENDED, keep the "vm" running), it can silently got waken up
> without admin even noticing it.  I'm not sure what Libvirt will behave if
> it suddenly receives a QAPI_EVENT_WAKEUP randomly after a "stop".
> 
>>
>> If you say SUSPENDED ---(stop)---> PAUSED ---(cont)---> SUSPENDED
>> then we create a problem, because the decision for the transition
>> out of PAUSED needs memory of the previous state.
> 
> Right, that's why I think we at least need one more boolean to remember the
> suspended state, then when we switch from any STOP states into any RUN
> states, we know where to go.  Here STOP states I meant anything except
> RUNNING and SUSPENDED, while RUN -> RUNNING or SUSPENDED.
> 
>>
>>> My understanding is we should expect to fully stop the VM, including the
>>> ticks, for example.  Keeping the ticks running even after QMP "stop"
>>> doesn't sound right, isn't it?
>>
>> The "stop" QMP command is documented as
>>
>> "Stop all guest VCPU execution"
>>
>> the devil is in the detail though, and we've not documented any detail.
>>
>> Whether or not timers keep running across stop/cont I think can be
>> argued to be an impl detail, as long as the headline goal "vcpus
>> don't execute" is satisfied.
> 
> "stop" was since qemu v0.14, so I guess we can't blame the missing of
> details or any form of inaccuracy..  Obviously we do more than "stop vCPU
> executions" in the current implementation.
> 
> But after we reach a consensus on how we should fix the current suspended
> problem, we may want to update the documentation to start containing more
> information.
> 
>>
>>>> vs the migration code where we are fixing brokenness.
>>>
>>> This is not a migration-only bug if above holds, IMO.
>>>
>>>> Second, it does not quite work, because the state becomes
>>>> RUN_STATE_PAUSED, so the suspended state is forgotten, and the hmp "cont"
>>>> will try to set the running state.  I could fix that by introducing a new
>>>> state RUN_STATE_SUSPENDED_STOPPED, but again it is a user-visible change
>>>> in existing behavior.  (I even implemented that while developing, then I
>>>> realized it was not needed to fix the migratio

Re: [PATCH V5 02/12] cpus: stop vm in suspended state

2023-11-21 Thread Steven Sistare
On 11/20/2023 4:44 PM, Peter Xu wrote:
> On Mon, Nov 20, 2023 at 03:55:54PM -0500, Steven Sistare wrote:
>> If we drop force, then all calls to vm_stop will completely stop the
>> suspended state, eg an hmp "stop" command. This causes two problems.
>> First, that is a change in user-visible behavior for something that
>> currently works,
> 
> IMHO it depends on what should be the correct behavior.  IOW, when VM is in
> SUSPENDED state and then the user sends "stop" QMP command, what should we
> expect?
> 
> My understanding is we should expect to fully stop the VM, including the
> ticks, for example.  Keeping the ticks running even after QMP "stop"
> doesn't sound right, isn't it?

I agree, but others may not, and this decision would require approval from 
maintainers in other areas, including upper layers such as libvirt that are
aware of the suspended state.  It is a user-visible change, and may require 
a new libvirt release.

>> vs the migration code where we are fixing brokenness.
> 
> This is not a migration-only bug if above holds, IMO.
> 
>> Second, it does not quite work, because the state becomes
>> RUN_STATE_PAUSED, so the suspended state is forgotten, and the hmp "cont"
>> will try to set the running state.  I could fix that by introducing a new
>> state RUN_STATE_SUSPENDED_STOPPED, but again it is a user-visible change
>> in existing behavior.  (I even implemented that while developing, then I
>> realized it was not needed to fix the migration bugs.)
> 
> Good point.
> 
> Now with above comments, what's your thoughts on whether we should change
> the user behavior?  My answer is still a yes.
> 
> Maybe SUSPENDED should not be a RunState at all? SUSPENDED is guest visible
> behavior, while something like QMP "stop" is not guest visible.  Maybe we
> should remember it separately?
>
> It means qemu_system_suspend() could remember that in a separate field (as
> part of guest state), then when wakeup we should conditionally go back
> with/without vcpus running depending on the new "suspended" state.

Regardless of how we remember it, the status command must still expose the 
suspended state to the user.  The user must be able to see that a guest is 
suspended, and decide when to issue a wakeup command.

If we change the stop command to completely stop a suspended vm, then we must
allow the user to query whether a vm is suspended-running or suspended-stopped,
because the command they must issue to resume is different: wakeup for the
former, and cont for the latter.

This change is visible to libvirt.  Adding it will delay this entire patch
series, and is not necessary for fixing the migration bug.  There is no
downside to drawing the line here for this series, and possibly changing stop
semantics in the future.

- Steve



Re: [PATCH V5 02/12] cpus: stop vm in suspended state

2023-11-20 Thread Steven Sistare
On 11/20/2023 3:47 PM, Fabiano Rosas wrote:
> Peter Xu  writes:
>> On Mon, Nov 13, 2023 at 10:33:50AM -0800, Steve Sistare wrote:
>>> A vm in the suspended state is not completely stopped.  The VCPUs have been
>>> paused, but the cpu clock still runs, and runstate notifiers for the
>>> transition to stopped have not been called.  Modify vm_stop_force_state to
>>> completely stop the vm if the current state is suspended, to be called for
>>> live migration and snapshots.
>>>
>>> Suggested-by: Peter Xu 
>>> Signed-off-by: Steve Sistare 
>>> ---
>>>  system/cpus.c | 8 ++--
>>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/system/cpus.c b/system/cpus.c
>>> index f72c4be..c772708 100644
>>> --- a/system/cpus.c
>>> +++ b/system/cpus.c
>>> @@ -255,6 +255,8 @@ void cpu_interrupt(CPUState *cpu, int mask)
>>>  static int do_vm_stop(RunState state, bool send_stop, bool force)
>>>  {
>>>  int ret = 0;
>>> +bool running = runstate_is_running();
>>> +bool suspended = runstate_check(RUN_STATE_SUSPENDED);
>>>  
>>>  if (qemu_in_vcpu_thread()) {
>>>  qemu_system_vmstop_request_prepare();
>>> @@ -267,10 +269,12 @@ static int do_vm_stop(RunState state, bool send_stop, 
>>> bool force)
>>>  return 0;
>>>  }
>>>  
>>> -if (runstate_is_running()) {
>>> +if (running || (suspended && force)) {
>>>  runstate_set(state);
>>>  cpu_disable_ticks();
>>
>> Not directly relevant, but this is weird that I just notice.
>>
>> If we disable ticks before stopping vCPUs, IIUC it means vcpus can see
>> stall ticks.  I checked the vm_start() and indeed that one did it in the
>> other way round: we'll stop vCPUs before stopping the ticks.
>>
>>> -pause_all_vcpus();
>>> +if (running) {
>>> +pause_all_vcpus();
>>> +}
>>>  vm_state_notify(0, state);
>>>  if (send_stop) {
>>>  qapi_event_send_stop();
>>
>> IIUC the "force" is not actually needed.  It's only used when SUSPENDED,
>> right?

When not suspended, the force flag causes a stopped state to be forced even
if current is a different stopped state.

> That's the overloading I'm complaining about. We're using "force" to say
> both: "include suspended" and: "set the state". This is basically taking
> knowledge from the callsite being the migration code and encoding it in
> that flag.
> 
> I'd prefer something like:
> 
> static int do_vm_stop(RunState state, bool send_stop, bool set_state,
>   bool include_suspended);

This function has always been tailored for use by migration code and no other
callers.  Migration would always pass set_state=true and include_suspended=true.
We have no use case for other combinations and no test for them.

To my mind, "force" naturally implies both behaviors.  We force the machine 
into the
specified stop state, completely stopping suspended execution.

Perhaps renaming vm_stop_force_state would erase the old association of "force" 
with 
only forcing runstate, such as vm_stop_all().

- Steve



  1   2   3   4   >