Re: [PATCH V1 05/26] migration: precreate vmstate
On 5/27/2024 2:16 PM, Peter Xu wrote: On Mon, Apr 29, 2024 at 08:55:14AM -0700, Steve Sistare wrote: Provide the VMStateDescription precreate field to mark objects that must be loaded on the incoming side before devices have been created, because they provide properties that will be needed at creation time. They will be saved to and loaded from their own QEMUFile, via qemu_savevm_precreate_save and qemu_savevm_precreate_load, but these functions are not yet called in this patch. Allow them to be called before or after normal migration is active, when current_migration and current_incoming are not valid. Signed-off-by: Steve Sistare --- include/migration/vmstate.h | 6 migration/savevm.c | 69 + migration/savevm.h | 3 ++ 3 files changed, 73 insertions(+), 5 deletions(-) diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h index 294d2d8..4691334 100644 --- a/include/migration/vmstate.h +++ b/include/migration/vmstate.h @@ -198,6 +198,12 @@ struct VMStateDescription { * a QEMU_VM_SECTION_START section. */ bool early_setup; + +/* + * Send/receive this object in the precreate migration stream. + */ +bool precreate; + int version_id; int minimum_version_id; MigrationPriority priority; diff --git a/migration/savevm.c b/migration/savevm.c index 9789823..a30bcd9 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -239,6 +239,7 @@ static SaveState savevm_state = { #define SAVEVM_FOREACH(se, entry)\ QTAILQ_FOREACH(se, _state.handlers, entry)\ +if (!se->vmsd || !se->vmsd->precreate) #define SAVEVM_FOREACH_ALL(se, entry)\ QTAILQ_FOREACH(se, _state.handlers, entry) @@ -1006,13 +1007,19 @@ static void save_section_header(QEMUFile *f, SaveStateEntry *se, } } +static bool send_section_footer(SaveStateEntry *se) +{ +return (se->vmsd && se->vmsd->precreate) || + migrate_get_current()->send_section_footer; +} Does the precreate vmsd "require" the footer? Or it should also work? IMHO it's less optimal to bind features without good reasons. It is not required. However, IMO we should not treat send-section-footer as a fungible feature. It is strictly an improvement, as was added to catch misformated sections. It is only registered as a feature for backwards compatibility with qemu 2.3 and xen. For a brand new data stream such as precreate, where we are not constrained by backwards compatibility, we should unconditionally use the better protocol, and always send the footer. - Steve
Re: [PATCH V1 00/26] Live update: cpr-exec
On 5/27/2024 1:45 PM, Peter Xu wrote: On Tue, May 21, 2024 at 07:46:12AM -0400, Steven Sistare wrote: I understand, thanks. If I can help with any of your todo list, just ask - steve Thanks for offering the help, Steve. Started looking at this today, then I found that I miss something high-level. Let me ask here, and let me apologize already for starting to throw multiple questions.. IIUC the whole idea of this patchset is to allow efficient QEMU upgrade, in this case not host kernel but QEMU-only, and/or upper. Is there any justification on why the complexity is needed here? It looks to me this one is more involved than cpr-reboot, so I'm thinking how much we can get from the complexity, and whether it's worthwhile. 1000+ LOC is the min support, and if we even expect more to come, that's really important, IMHO. For example, what's the major motivation of this whole work? Is that more on performance, or is it more for supporting the special devices like VFIO which we used to not support, or something else? I can't find them in whatever cover letter I can find, including this one. Firstly, regarding performance, IMHO it'll be always nice to share even some very fundamental downtime measurement comparisons using the new exec mode v.s. the old migration ways to upgrade QEMU binary. Do you perhaps have some number on hand when you started working on this feature years ago? Or maybe some old links on the list would help too, as I didn't follow this work since the start. On VFIO, IIUC you started out this project without VFIO migration being there. Now we have VFIO migration so not sure how much it would work for the upgrade use case. Even with current VFIO migration, we may not want to migrate device states for a local upgrade I suppose, as that can be a lot depending on the type of device assigned. However it'll be nice to discuss this too if this is the major purpose of the series. I think one other challenge on QEMU upgrade with VFIO devices is that the dest QEMU won't be able to open the VFIO device when the src QEMU is still using it as the owner. IIUC this is a similar condition where QEMU wants to have proper ownership transfer of a shared block device, and AFAIR right now we resolved that issue using some form of file lock on the image file. In this case it won't easily apply to a VFIO dev fd, but maybe we still have other approaches, not sure whether you investigated any. E.g. could the VFIO handle be passed over using unix scm rights? I think this might remove one dependency of using exec which can cause quite some difference v.s. a generic migration (from which regard, cpr-reboot is still a pretty generic migration). You also mentioned vhost/tap, is that also a major goal of this series in the follow up patchsets? Is this a problem only because this solution will do exec? Can it work if either the exec()ed qemu or dst qemu create the vhost/tap fds when boot? Meanwhile, could you elaborate a bit on the implication on chardevs? From what I read in the doc update it looks like a major part of work in the future, but I don't yet understand the issue.. Is it also relevant to the exec() approach? In all cases, some of such discussion would be really appreciated. And if you used to consider other approaches to solve this problem it'll be great to mention how you chose this way. Considering this work contains too many things, it'll be nice if such discussion can start with the fundamentals, e.g. on why exec() is a must. The main goal of cpr-exec is providing a fast and reliable way to update qemu. cpr-reboot is not fast enough or general enough. It requires the guest to support suspend and resume for all devices, and that takes seconds. If one actually reboots the host, that adds more seconds, depending on system services. cpr-exec takes 0.1 secs, and works every time, unlike like migration which can fail to converge on a busy system. Live migration also consumes more system and network resources. cpr-exec seamlessly preserves client connections by preserving chardevs, and overall provides a much nicer user experience. chardev's are preserved by keeping their fd open across the exec, and remembering the value of the fd in precreate vmstate so that new qemu can associate the fd with the chardev rather than opening a new one. The approach of preserving open file descriptors is very general and applicable to all kinds of devices, regardless of whether they support live migration in hardware. Device fd's are preserved using the same mechanism as for chardevs. Devices that support live migration in hardware do not like to live migrate in place to the same node. It is not what they are designed for, and some implementations will flat out fail because the source and target interfaces are the same. For vhost/tap, sometimes the management layer opens the dev and passes an fd to qemu, and sometimes qemu opens the dev. The upcoming vhost/tap support allows both
Re: [PATCH V1 07/26] migration: VMStateId
On 5/27/2024 2:20 PM, Peter Xu wrote: On Mon, Apr 29, 2024 at 08:55:16AM -0700, Steve Sistare wrote: Define a type for the 256 byte id string to guarantee the same length is used and enforced everywhere. Signed-off-by: Steve Sistare --- include/exec/ramblock.h | 3 ++- include/migration/vmstate.h | 2 ++ migration/savevm.c | 8 migration/vmstate.c | 3 ++- 4 files changed, 10 insertions(+), 6 deletions(-) diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h index 0babd10..61deefe 100644 --- a/include/exec/ramblock.h +++ b/include/exec/ramblock.h @@ -23,6 +23,7 @@ #include "cpu-common.h" #include "qemu/rcu.h" #include "exec/ramlist.h" +#include "migration/vmstate.h" struct RAMBlock { struct rcu_head rcu; @@ -35,7 +36,7 @@ struct RAMBlock { void (*resized)(const char*, uint64_t length, void *host); uint32_t flags; /* Protected by the BQL. */ -char idstr[256]; +VMStateId idstr; /* RCU-enabled, writes protected by the ramlist lock */ QLIST_ENTRY(RAMBlock) next; QLIST_HEAD(, RAMBlockNotifier) ramblock_notifiers; Hmm.. Don't look like a good idea to include a migration header in ramblock.h? Is this ramblock change needed for this work? Well, entities that are migrated include migration headers, and now that includes RAMBlock. There is precedent: 0 include/exec/ramblock.h 26 #include "migration/vmstate.h" 1 include/hw/acpi/ich9_tco. 14 #include "migration/vmstate.h" 2 include/hw/display/ramfb. 4 #include "migration/vmstate.h" 3 include/hw/hyperv/vmbus.h 16 #include "migration/vmstate.h" 4 include/hw/input/pl050.h 14 #include "migration/vmstate.h" 5 include/hw/pci/shpc.h 7 #include "migration/vmstate.h" 6 include/hw/virtio/virtio. 20 #include "migration/vmstate.h" 7 include/migration/cpu.h8 #include "migration/vmstate.h" Granted, only some of the C files that include ramblock.h need all of vmstate.h. I could define VMStateId in a smaller file such as migration/misc.h, or a new file migration/vmstateid.h, and include that in ramblock.h. Any preference? - Steve
Re: [PATCH V1 08/26] migration: vmstate_info_void_ptr
On 5/27/2024 2:31 PM, Peter Xu wrote: On Mon, Apr 29, 2024 at 08:55:17AM -0700, Steve Sistare wrote: Define VMSTATE_VOID_PTR so the value of a pointer (but not its target) can be saved in the migration stream. This will be needed for CPR. Signed-off-by: Steve Sistare This is really tricky. From a first glance, I don't think migrating a VA is valid at all for migration even if with exec.. and looks insane to me for a cross-process migration, which seems to be allowed to use as a generic VMSD helper.. as VA is the address space barrier for different processes and I think it normally even apply to generic execve(), and we're trying to jailbreak for some reason.. It definitely won't work for any generic migration as sizeof(void*) can be different afaict between hosts, e.g. 32bit -> 64bit migrations. Some description would be really helpful in this commit message, e.g. explain the users and why. Do we need to poison that for generic VMSD use (perhaps with prefixed underscores)? I think I'll need to read on the rest to tell.. Short answer: we never dereference the void* in the new process. And must not. Longer answer: During CPR for vfio, each mapped DMA region is re-registered in the new process using the new VA. The ioctl to re-register identifies the mapping by IOVA and length. The same requirement holds for CPR of iommufd devices. However, in the iommufd framework, IOVA does not uniquely identify a dma mapping, and we need to use the old VA as the unique identifier. The new process re-registers each mapping, passing the old VA and new VA to the kernel. The old VA is never dereferenced in the new process, we just need its value. I suspected that the void* which must not be dereferenced might make people uncomfortable. I have an older version of my code which adds a uint64_t field to RAMBlock for recording and migrating the old VA. The saving and loading code is slightly less elegant, but no big deal. Would you prefer that? - Steve
Re: [PATCH V1 23/26] migration: misc cpr-exec blockers
On 5/24/2024 8:40 AM, Fabiano Rosas wrote: Steve Sistare writes: Add blockers for cpr-exec migration mode for devices and options that do not support it. Signed-off-by: Steve Sistare --- accel/xen/xen-all.c| 5 + backends/hostmem-epc.c | 12 ++-- hw/vfio/migration.c| 3 ++- replay/replay.c| 6 ++ 4 files changed, 23 insertions(+), 3 deletions(-) diff --git a/accel/xen/xen-all.c b/accel/xen/xen-all.c index 0bdefce..9a7ed0f 100644 --- a/accel/xen/xen-all.c +++ b/accel/xen/xen-all.c This file is missing the migration/blocker.h include. Good eyes, will fix - steve
Re: [PATCH V1 20/26] migration: cpr-exec mode
On 5/24/2024 10:58 AM, Fabiano Rosas wrote: Steve Sistare writes: Add the cpr-exec migration mode. Usage: qemu-system-$arch -machine memfd-alloc=on ... migrate_set_parameter mode cpr-exec migrate_set_parameter cpr-exec-args \ ... -incoming migrate -d The migrate command stops the VM, saves state to the URI, directly exec's a new version of QEMU on the same host, replacing the original process while retaining its PID, and loads state from the URI. Guest RAM is preserved in place, albeit with new virtual addresses. Arguments for the new QEMU process are taken from the @cpr-exec-args parameter. The first argument should be the path of a new QEMU binary, or a prefix command that exec's the new QEMU binary. Because old QEMU terminates when new QEMU starts, one cannot stream data between the two, so the URI must be a type, such as a file, that reads all data before old QEMU exits. Memory backend objects must have the share=on attribute, and must be mmap'able in the new QEMU process. For example, memory-backend-file is acceptable, but memory-backend-ram is not. The VM must be started with the '-machine memfd-alloc=on' option. This causes implicit ram blocks (those not explicitly described by a memory-backend object) to be allocated by mmap'ing a memfd. Examples include VGA, ROM, and even guest RAM when it is specified without a memory-backend object. The implementation saves precreate vmstate at the end of normal migration in migrate_fd_cleanup, and tells the main loop to call cpr_exec. Incoming qemu loads preceate state early, before objects are created. The memfds are kept open across exec by clearing the close-on-exec flag, their values are saved in precreate vmstate, and they are mmap'd in new qemu. Note that the memfd-alloc option is not related to memory-backend-memfd. Later patches add support for memory-backend-memfd, and for additional devices, including vfio, chardev, and more. Signed-off-by: Steve Sistare --- include/migration/cpr.h | 14 + include/migration/misc.h | 3 ++ migration/cpr.c | 131 +++ migration/meson.build| 1 + migration/migration.c| 21 migration/migration.h| 5 +- migration/ram.c | 1 + qapi/migration.json | 30 ++- system/physmem.c | 2 + system/vl.c | 4 ++ 10 files changed, 210 insertions(+), 2 deletions(-) create mode 100644 include/migration/cpr.h create mode 100644 migration/cpr.c [...] diff --git a/migration/migration.c b/migration/migration.c index b5af6b5..0d91531 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -239,6 +239,7 @@ void migration_object_init(void) blk_mig_init(); ram_mig_init(); dirty_bitmap_mig_init(); +cpr_mig_init(); } typedef struct { @@ -1395,6 +1396,15 @@ static void migrate_fd_cleanup(MigrationState *s) qemu_fclose(tmp); } +if (migrate_mode() == MIG_MODE_CPR_EXEC) { +Error *err = NULL; +if (migration_precreate_save()) { +migrate_set_error(s, err); +error_report_err(err); There's an error_report_err() call already a few lines down. +migrate_set_state(>state, s->state, MIGRATION_STATUS_FAILED); +} +} Not a fan of saving state in the middle of the cleanup function. This adds extra restrictions to migrate_fd_cleanup() which already tends to be the source of a bunch of bugs. Can this be done either entirely before or after migrate_fd_cleanup()? There's only one callsite from which you actually want to do cpr-exec, migration_iteration_finish(). It's no big deal if we have to play a bit with the notifier call placement. static void migration_iteration_finish(MigrationState *s) { ... migration_bh_schedule(migrate_cpr_exec_bh, s); migration_bh_schedule(migrate_fd_cleanup_bh, s); bql_unlock(); } IIUC, the BQL ensures the ordering here, but if that doesn't work we could just call the cpr function right at the end of migrate_fd_cleanup(). That would already be better than interleaving. static void migrate_cpr_exec_bh(void *opaque) { MigrationState *s = opaque; Error *err = NULL; if (migration_has_failed(s) || migrate_mode() != MIG_MODE_CPR_EXEC) { return; } assert(s->state == MIGRATION_STATUS_COMPLETED); if (migration_precreate_save()) { migrate_set_error(s, err); error_report_err(err); migrate_set_state(>state, s->state, MIGRATION_STATUS_FAILED); migration_call_notifiers(s, MIG_EVENT_PRECOPY_FAILED, NULL); return; } qemu_system_exec_request(cpr_exec, s->parameters.cpr_exec_args); } No problem, I can hoist the cpr exec logic out of migrate_fd_cleanup. I'll call migration_precreate_save prior, and I'll register a notifier for MIG_EVENT_PRECOPY_DONE that calls qemu_system_exec_request. BTW the following does not work
Re: [PATCH V1 00/26] Live update: cpr-exec
On 5/24/2024 9:02 AM, Fabiano Rosas wrote: Steve Sistare writes: This patch series adds the live migration cpr-exec mode. In this mode, QEMU stops the VM, writes VM state to the migration URI, and directly exec's a new version of QEMU on the same host, replacing the original process while retaining its PID. Guest RAM is preserved in place, albeit with new virtual addresses. The user completes the migration by specifying the -incoming option, and by issuing the migrate-incoming command if necessary. This saves and restores VM state, with minimal guest pause time, so that QEMU may be updated to a new version in between. The new interfaces are: * cpr-exec (MigMode migration parameter) * cpr-exec-args (migration parameter) * memfd-alloc=on (command-line option for -machine) * only-migratable-modes (command-line argument) The caller sets the mode parameter before invoking the migrate command. Arguments for the new QEMU process are taken from the cpr-exec-args parameter. The first argument should be the path of a new QEMU binary, or a prefix command that exec's the new QEMU binary, and the arguments should include the -incoming option. Memory backend objects must have the share=on attribute, and must be mmap'able in the new QEMU process. For example, memory-backend-file is acceptable, but memory-backend-ram is not. QEMU must be started with the '-machine memfd-alloc=on' option. This causes implicit RAM blocks (those not explicitly described by a memory-backend object) to be allocated by mmap'ing a memfd. Examples include VGA, ROM, and even guest RAM when it is specified without without reference to a memory-backend object. The memfds are kept open across exec, their values are saved in vmstate which is retrieved after exec, and they are re-mmap'd. The '-only-migratable-modes cpr-exec' option guarantees that the configuration supports cpr-exec. QEMU will exit at start time if not. Example: In this example, we simply restart the same version of QEMU, but in a real scenario one would set a new QEMU binary path in cpr-exec-args. # qemu-kvm -monitor stdio -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on -m 4G -machine memfd-alloc=on ... QEMU 9.1.50 monitor - type 'help' for more information (qemu) info status VM status: running (qemu) migrate_set_parameter mode cpr-exec (qemu) migrate_set_parameter cpr-exec-args qemu-kvm ... -incoming file:vm.state (qemu) migrate -d file:vm.state (qemu) QEMU 9.1.50 monitor - type 'help' for more information (qemu) info status VM status: running cpr-exec mode preserves attributes of outgoing devices that must be known before the device is created on the incoming side, such as the memfd descriptor number, but currently the migration stream is read after all devices are created. To solve this problem, I add two VMStateDescription options: precreate and factory. precreate objects are saved to their own migration stream, distinct from the main stream, and are read early by incoming QEMU, before devices are created. Factory objects are allocated on demand, without relying on a pre-registered object's opaque address, which is necessary because the devices to which the state will apply have not been created yet and hence have not registered an opaque address to receive the state. This patch series implements a minimal version of cpr-exec. Future series will add support for: * vfio * chardev's without loss of connectivity * vhost * fine-grained seccomp controls * hostmem-memfd * cpr-exec migration test Steve Sistare (26): oslib: qemu_clear_cloexec vl: helper to request re-exec migration: SAVEVM_FOREACH migration: delete unused parameter mis migration: precreate vmstate migration: precreate vmstate for exec migration: VMStateId migration: vmstate_info_void_ptr migration: vmstate_register_named migration: vmstate_unregister_named migration: vmstate_register at init time migration: vmstate factory object physmem: ram_block_create physmem: hoist guest_memfd creation physmem: hoist host memory allocation physmem: set ram block idstr earlier machine: memfd-alloc option migration: cpr-exec-args parameter physmem: preserve ram blocks for cpr migration: cpr-exec mode migration: migrate_add_blocker_mode migration: ram block cpr-exec blockers migration: misc cpr-exec blockers seccomp: cpr-exec blocker migration: fix mismatched GPAs during cpr-exec migration: only-migratable-modes accel/xen/xen-all.c| 5 + backends/hostmem-epc.c | 12 +- hmp-commands.hx| 2 +- hw/core/machine.c | 22 +++ hw/core/qdev.c | 1 + hw/intc/apic_common.c | 2 +- hw/vfio/migration.c| 3 +- include/exec/cpu-common.h | 3 +- include/exec/memory.h | 15 ++ include/exec/ramblock.h| 10 +-
Re: [PATCH V1 00/26] Live update: cpr-exec
I understand, thanks. If I can help with any of your todo list, just ask - steve On 5/20/2024 10:31 PM, Peter Xu wrote: Conference back then pto until today, so tomorrow will be my first working day after those. Sorry Steve, will try my best to read it before next week. I didn't dare to read too much my inbox yet. A bit scared but need to face it tomorrow. On Mon, May 20, 2024, 6:28 p.m. Fabiano Rosas <mailto:faro...@suse.de>> wrote: Steven Sistare mailto:steven.sist...@oracle.com>> writes: > Hi Peter, Hi Fabiano, > Will you have time to review the migration guts of this series any time soon? > In particular: > > [PATCH V1 05/26] migration: precreate vmstate > [PATCH V1 06/26] migration: precreate vmstate for exec > [PATCH V1 12/26] migration: vmstate factory object > [PATCH V1 18/26] migration: cpr-exec-args parameter > [PATCH V1 20/26] migration: cpr-exec mode > I'll get to them this week. I'm trying to make some progress with my own code before I forget how to program. I'm also trying to find some time to implement the device options in the migration tests so we can stop these virtio-* breakages that have been popping up.
Re: [PATCH V1 00/26] Live update: cpr-exec
Hi Peter, Hi Fabiano, Will you have time to review the migration guts of this series any time soon? In particular: [PATCH V1 05/26] migration: precreate vmstate [PATCH V1 06/26] migration: precreate vmstate for exec [PATCH V1 12/26] migration: vmstate factory object [PATCH V1 18/26] migration: cpr-exec-args parameter [PATCH V1 20/26] migration: cpr-exec mode - Steve On 4/29/2024 11:55 AM, Steve Sistare wrote: This patch series adds the live migration cpr-exec mode. In this mode, QEMU stops the VM, writes VM state to the migration URI, and directly exec's a new version of QEMU on the same host, replacing the original process while retaining its PID. Guest RAM is preserved in place, albeit with new virtual addresses. The user completes the migration by specifying the -incoming option, and by issuing the migrate-incoming command if necessary. This saves and restores VM state, with minimal guest pause time, so that QEMU may be updated to a new version in between. The new interfaces are: * cpr-exec (MigMode migration parameter) * cpr-exec-args (migration parameter) * memfd-alloc=on (command-line option for -machine) * only-migratable-modes (command-line argument) The caller sets the mode parameter before invoking the migrate command. Arguments for the new QEMU process are taken from the cpr-exec-args parameter. The first argument should be the path of a new QEMU binary, or a prefix command that exec's the new QEMU binary, and the arguments should include the -incoming option. Memory backend objects must have the share=on attribute, and must be mmap'able in the new QEMU process. For example, memory-backend-file is acceptable, but memory-backend-ram is not. QEMU must be started with the '-machine memfd-alloc=on' option. This causes implicit RAM blocks (those not explicitly described by a memory-backend object) to be allocated by mmap'ing a memfd. Examples include VGA, ROM, and even guest RAM when it is specified without without reference to a memory-backend object. The memfds are kept open across exec, their values are saved in vmstate which is retrieved after exec, and they are re-mmap'd. The '-only-migratable-modes cpr-exec' option guarantees that the configuration supports cpr-exec. QEMU will exit at start time if not. Example: In this example, we simply restart the same version of QEMU, but in a real scenario one would set a new QEMU binary path in cpr-exec-args. # qemu-kvm -monitor stdio -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on -m 4G -machine memfd-alloc=on ... QEMU 9.1.50 monitor - type 'help' for more information (qemu) info status VM status: running (qemu) migrate_set_parameter mode cpr-exec (qemu) migrate_set_parameter cpr-exec-args qemu-kvm ... -incoming file:vm.state (qemu) migrate -d file:vm.state (qemu) QEMU 9.1.50 monitor - type 'help' for more information (qemu) info status VM status: running cpr-exec mode preserves attributes of outgoing devices that must be known before the device is created on the incoming side, such as the memfd descriptor number, but currently the migration stream is read after all devices are created. To solve this problem, I add two VMStateDescription options: precreate and factory. precreate objects are saved to their own migration stream, distinct from the main stream, and are read early by incoming QEMU, before devices are created. Factory objects are allocated on demand, without relying on a pre-registered object's opaque address, which is necessary because the devices to which the state will apply have not been created yet and hence have not registered an opaque address to receive the state. This patch series implements a minimal version of cpr-exec. Future series will add support for: * vfio * chardev's without loss of connectivity * vhost * fine-grained seccomp controls * hostmem-memfd * cpr-exec migration test Steve Sistare (26): oslib: qemu_clear_cloexec vl: helper to request re-exec migration: SAVEVM_FOREACH migration: delete unused parameter mis migration: precreate vmstate migration: precreate vmstate for exec migration: VMStateId migration: vmstate_info_void_ptr migration: vmstate_register_named migration: vmstate_unregister_named migration: vmstate_register at init time migration: vmstate factory object physmem: ram_block_create physmem: hoist guest_memfd creation physmem: hoist host memory allocation physmem: set ram block idstr earlier machine: memfd-alloc option migration: cpr-exec-args parameter physmem: preserve ram blocks for cpr migration: cpr-exec mode migration: migrate_add_blocker_mode migration: ram block cpr-exec blockers migration: misc cpr-exec blockers seccomp: cpr-exec blocker migration: fix mismatched GPAs during cpr-exec migration: only-migratable-modes accel/xen/xen-all.c| 5 + backends/hostmem-epc.c |
Re: CPR/liveupdate: test results using prior bug fix
On 5/16/2024 1:24 PM, Michael Galaxy wrote: On 5/14/24 08:54, Michael Tokarev wrote: On 5/14/24 16:39, Michael Galaxy wrote: Steve, OK, so it does not look like this bugfix you wrote was included in 8.2.4 (which was released yesterday). Unfortunately, that means that anyone using CPR in that release will still (eventually) encounter the bug like I did. 8.2.4 is basically a "bugfix" release for 8.2.3 which I somewhat screwed up (in a minor way), plus a few currently (at the time) queued up changes. 8.2.3 was a big release though. I would recommend that y'all consider cherry-picking, perhaps, the relevant commits for a possible 8.2.5 ? Please Cc changes which are relevant for -stable to, well, qemu-sta...@nongnu.org :) Which changes needs to be picked up? Steve, can you comment here, please? At a minimum, we have this one: [PULL 20/25] migration: stop vm for cpr But that pull came with a handful of other changes that are also not in QEMU v8, so I suspect I'm missing some other important changes that might be important for a stable release? - Michael Hi Michael, I sent the full list of commits to this distribution yesterday, and I see it in my Sent email folder. Copying verbatim: Michael Galaxy, I'm afraid you are out of luck with respect to qemu 8.2. It has some of the cpr reboot commits, but is missing the following: 87a2848 migration: massage cpr-reboot documentation cbdafc1 migration: options incompatible with cpr ce5db1c migration: update cpr-reboot description 9867d4d migration: stop vm for cpr 4af667f migration: notifier error checking bf78a04 migration: refactor migrate_fd_connect failures 6835f5a migration: per-mode notifiers 5663dd3 migration: MigrationNotifyFunc c763a23e migration: remove postcopy_after_devices 9d9babf migration: MigrationEvent for notifiers 3e77573 migration: convert to NotifierWithReturn d91f33c migration: remove error from notifier data be19d83 notify: pass error to notifier with return b12635f migration: fix coverity migrate_mode finding 2b58a8b tests/qtest: postcopy migration with suspend b1fdd21 tests/qtest: precopy migration with suspend 5014478 tests/qtest: option to suspend during migration f064975 tests/qtest: migration events 49a5020 migration: preserve suspended for bg_migration 58b1057 migration: preserve suspended for snapshot b4e9ddc migration: preserve suspended runstate d3c86c99 migration: propagate suspended runstate 9ff5e79 cpus: vm_resume 0f1db06 cpus: check running not RUN_STATE_RUNNING b9ae473 cpus: stop vm in suspended runstate f06f316 cpus: vm_was_suspended All of those landed in qemu 9.0. --- - Steve
Re: CPR/liveupdate: test results using prior bug fix
On 5/14/2024 11:33 AM, Michael Tokarev wrote: 14.05.2024 16:54, Michael Tokarev пишет: On 5/14/24 16:39, Michael Galaxy wrote: Steve, OK, so it does not look like this bugfix you wrote was included in 8.2.4 (which was released yesterday). Unfortunately, that means that anyone using CPR in that release will still (eventually) encounter the bug like I did. 8.2.4 is basically a "bugfix" release for 8.2.3 which I somewhat screwed up (in a minor way), plus a few currently (at the time) queued up changes. 8.2.3 was a big release though. I would recommend that y'all consider cherry-picking, perhaps, the relevant commits for a possible 8.2.5 ? Please Cc changes which are relevant for -stable to, well, qemu-sta...@nongnu.org :) Which changes needs to be picked up? Please also note this particular change does not apply cleanly to stable-8.2 branch due to other changes in this area between 8.2 and 9.0, in particular, in postcopy_start(). So if this is to be picked up for stable-8.2, I need help from someone who better understands this code to select changes to pick. Thanks, /mjt Michael Galaxy, I'm afraid you are out of luck with respect to qemu 8.2. It has some of the cpr reboot commits, but is missing the following: 87a2848 migration: massage cpr-reboot documentation cbdafc1 migration: options incompatible with cpr ce5db1c migration: update cpr-reboot description 9867d4d migration: stop vm for cpr 4af667f migration: notifier error checking bf78a04 migration: refactor migrate_fd_connect failures 6835f5a migration: per-mode notifiers 5663dd3 migration: MigrationNotifyFunc c763a23e migration: remove postcopy_after_devices 9d9babf migration: MigrationEvent for notifiers 3e77573 migration: convert to NotifierWithReturn d91f33c migration: remove error from notifier data be19d83 notify: pass error to notifier with return b12635f migration: fix coverity migrate_mode finding 2b58a8b tests/qtest: postcopy migration with suspend b1fdd21 tests/qtest: precopy migration with suspend 5014478 tests/qtest: option to suspend during migration f064975 tests/qtest: migration events 49a5020 migration: preserve suspended for bg_migration 58b1057 migration: preserve suspended for snapshot b4e9ddc migration: preserve suspended runstate d3c86c99 migration: propagate suspended runstate 9ff5e79 cpus: vm_resume 0f1db06 cpus: check running not RUN_STATE_RUNNING b9ae473 cpus: stop vm in suspended runstate f06f316 cpus: vm_was_suspended All of those landed in qemu 9.0. - Steve
Re: CPR/liveupdate: test results using prior bug fix
Hi Michael, No surprise here, I did see some of the same failure messages and they prompted me to submit the fix. They are all symptoms of "the possibility of ram and device state being out of sync" as mentioned in the commit. I am not familiar with the process for maintaining old releases for qemu. Perhaps someone on this list can comment on 8.2.3. - Steve On 5/13/2024 2:22 PM, Michael Galaxy wrote: Hi Steve, We found that this specific change in particular ("migration: stop vm for cpr") fixes a bug that we've identified in testing back-to-back live updates in a lab environment. More specifically, *without* this change (which is not available in 8.2.2, but *is* available in 9.0.0) causes the metadata save file to be corrupted when doing live updates one after another. Typically we see a corrupted save file somewhere in between 20 and 30 live updates and while doing a git bisect, we found that this change makes the problem go away. Were you aware? Is there any plan in place to cherry pick this for 8.2.3, perhaps or a plan to release 8.2.3 at some point? Here are some examples of how the bug manifests in different locations of the QEMU metadata save file: 2024-04-26T13:28:53Z qemu-system-x86_64: Failed to load mtrr_var:base 2024-04-26T13:28:53Z qemu-system-x86_64: Failed to load cpu:env.mtrr_var 2024-04-26T13:28:53Z qemu-system-x86_64: error while loading state for instance 0x1b of device 'cpu' 2024-04-26T13:28:53Z qemu-system-x86_64: load of migration failed: Input/output error And another: 2024-04-17T16:09:47Z qemu-system-x86_64: check_section_footer: Read section footer failed: -5 2024-04-17T16:09:47Z qemu-system-x86_64: load of migration failed: Invalid argument And another: 2024-04-30T21:53:29Z qemu-system-x86_64: Unable to read ID string for section 163 2024-04-30T21:53:29Z qemu-system-x86_64: load of migration failed: Invalid argument And another: 2024-05-01T16:01:44Z qemu-system-x86_64: Unable to read ID string for section 164 2024-05-01T16:01:44Z qemu-system-x86_64: load of migration failed: Invalid argument As you can see, they occur quite randomly, but generally it takes at least 20-30+ live updates before the problem occurs. - Michael On 2/27/24 23:13, pet...@redhat.com wrote: From: Steve Sistare When migration for cpr is initiated, stop the vm and set state RUN_STATE_FINISH_MIGRATE before ram is saved. This eliminates the possibility of ram and device state being out of sync, and guarantees that a guest in the suspended state remains suspended, because qmp_cont rejects a cont command in the RUN_STATE_FINISH_MIGRATE state. Signed-off-by: Steve Sistare Reviewed-by: Peter Xu Link:https://urldefense.com/v3/__https://lore.kernel.org/r/1708622920-68779-11-git-send-email-steven.sistare@oracle.com__;!!GjvTz_vk!QLsFOCX-x2U9bzAo98SdidKlomHrmf_t0UmQKtgudoIcaDVoAJOPm39ZqaNP_nT5I8QqVfSgwhDZmg$ Signed-off-by: Peter Xu --- include/migration/misc.h | 1 + migration/migration.h| 2 -- migration/migration.c| 51 3 files changed, 32 insertions(+), 22 deletions(-) diff --git a/include/migration/misc.h b/include/migration/misc.h index e4933b815b..5d1aa593ed 100644 --- a/include/migration/misc.h +++ b/include/migration/misc.h @@ -60,6 +60,7 @@ void migration_object_init(void); void migration_shutdown(void); bool migration_is_idle(void); bool migration_is_active(MigrationState *); +bool migrate_mode_is_cpr(MigrationState *); typedef enum MigrationEventType { MIG_EVENT_PRECOPY_SETUP, diff --git a/migration/migration.h b/migration/migration.h index aef8afbe1f..65c0b61cbd 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -541,6 +541,4 @@ int migration_rp_wait(MigrationState *s); */ void migration_rp_kick(MigrationState *s); -int migration_stop_vm(RunState state); - #endif diff --git a/migration/migration.c b/migration/migration.c index 37c836b0b0..90a90947fb 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -167,11 +167,19 @@ static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp) return (a > b) - (a < b); } -int migration_stop_vm(RunState state) +static int migration_stop_vm(MigrationState *s, RunState state) { -int ret = vm_stop_force_state(state); +int ret; + +migration_downtime_start(s); + +s->vm_old_state = runstate_get(); +global_state_store(); + +ret = vm_stop_force_state(state); trace_vmstate_downtime_checkpoint("src-vm-stopped"); +trace_migration_completion_vm_stop(ret); return ret; } @@ -1602,6 +1610,11 @@ bool migration_is_active(MigrationState *s) s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE); } +bool migrate_mode_is_cpr(MigrationState *s) +{ +return s->parameters.mode == MIG_MODE_CPR_REBOOT; +} + int migrate_init(MigrationState *s, Error **errp) { int ret; @@ -2454,10 +2467,7 @@ static int postcopy_start(MigrationState *ms,
Re: [PATCH V1 26/26] migration: only-migratable-modes
On 5/9/2024 3:14 PM, Fabiano Rosas wrote: Steve Sistare writes: Add the only-migratable-modes option as a generalization of only-migratable. Only devices that support all requested modes are allowed. Signed-off-by: Steve Sistare --- include/migration/misc.h | 3 +++ include/sysemu/sysemu.h| 1 - migration/migration-hmp-cmds.c | 26 +- migration/migration.c | 22 +- migration/savevm.c | 2 +- qemu-options.hx| 16 ++-- system/globals.c | 1 - system/vl.c| 13 - target/s390x/cpu_models.c | 4 +++- 9 files changed, 75 insertions(+), 13 deletions(-) diff --git a/include/migration/misc.h b/include/migration/misc.h index 5b963ba..3ad2cd9 100644 --- a/include/migration/misc.h +++ b/include/migration/misc.h @@ -119,6 +119,9 @@ bool migration_incoming_postcopy_advised(void); /* True if background snapshot is active */ bool migration_in_bg_snapshot(void); +void migration_set_required_mode(MigMode mode); +bool migration_mode_required(MigMode mode); + /* migration/block-dirty-bitmap.c */ void dirty_bitmap_mig_init(void); diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h index 5b4397e..0a9c4b4 100644 --- a/include/sysemu/sysemu.h +++ b/include/sysemu/sysemu.h @@ -8,7 +8,6 @@ /* vl.c */ -extern int only_migratable; extern const char *qemu_name; extern QemuUUID qemu_uuid; extern bool qemu_uuid_set; diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c index 414c7e8..ca913b7 100644 --- a/migration/migration-hmp-cmds.c +++ b/migration/migration-hmp-cmds.c @@ -16,6 +16,7 @@ #include "qemu/osdep.h" #include "block/qapi.h" #include "migration/snapshot.h" +#include "migration/misc.h" #include "monitor/hmp.h" #include "monitor/monitor.h" #include "qapi/error.h" @@ -33,6 +34,28 @@ #include "options.h" #include "migration.h" +static void migration_dump_modes(Monitor *mon) +{ +int mode, n = 0; + +monitor_printf(mon, "only-migratable-modes: "); + +for (mode = 0; mode < MIG_MODE__MAX; mode++) { +if (migration_mode_required(mode)) { +if (n++) { +monitor_printf(mon, ","); +} +monitor_printf(mon, "%s", MigMode_str(mode)); +} +} + +if (!n) { +monitor_printf(mon, "none\n"); +} else { +monitor_printf(mon, "\n"); +} +} + static void migration_global_dump(Monitor *mon) { MigrationState *ms = migrate_get_current(); @@ -41,7 +64,7 @@ static void migration_global_dump(Monitor *mon) monitor_printf(mon, "store-global-state: %s\n", ms->store_global_state ? "on" : "off"); monitor_printf(mon, "only-migratable: %s\n", - only_migratable ? "on" : "off"); + migration_mode_required(MIG_MODE_NORMAL) ? "on" : "off"); monitor_printf(mon, "send-configuration: %s\n", ms->send_configuration ? "on" : "off"); monitor_printf(mon, "send-section-footer: %s\n", @@ -50,6 +73,7 @@ static void migration_global_dump(Monitor *mon) ms->decompress_error_check ? "on" : "off"); monitor_printf(mon, "clear-bitmap-shift: %u\n", ms->clear_bitmap_shift); +migration_dump_modes(mon); } void hmp_info_migrate(Monitor *mon, const QDict *qdict) diff --git a/migration/migration.c b/migration/migration.c index 4984dee..5535b84 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1719,17 +1719,29 @@ static bool is_busy(Error **reasonp, Error **errp) return false; } -static bool is_only_migratable(Error **reasonp, Error **errp, int modes) +static int migration_modes_required; + +void migration_set_required_mode(MigMode mode) +{ +migration_modes_required |= BIT(mode); +} + +bool migration_mode_required(MigMode mode) +{ +return !!(migration_modes_required & BIT(mode)); +} + +static bool modes_are_required(Error **reasonp, Error **errp, int modes) { ERRP_GUARD(); -if (only_migratable && (modes & BIT(MIG_MODE_NORMAL))) { +if (migration_modes_required & modes) { error_propagate_prepend(errp, *reasonp, -"disallowing migration blocker " -"(--only-migratable) for: "); +"-only-migratable{-modes} specified, but: "); extra space before 'specified' Will fix, thanks. *reasonp = NULL; return true; } + return false; } @@ -1783,7 +1795,7 @@ int migrate_add_blocker_modes(Error **reasonp, Error **errp, MigMode mode, ...) modes = get_modes(mode, ap); va_end(ap); -if (is_only_migratable(reasonp, errp, modes)) { +if (modes_are_required(reasonp, errp, modes)) { return -EACCES; } else if (is_busy(reasonp, errp)) {
Re: [PATCH V1 13/26] physmem: ram_block_create
On 5/13/2024 2:37 PM, Fabiano Rosas wrote: Steve Sistare writes: Create a common subroutine to allocate a RAMBlock, de-duping the code to populate its common fields. Add a trace point for good measure. No functional change. Signed-off-by: Steve Sistare --- system/physmem.c| 47 ++- system/trace-events | 3 +++ 2 files changed, 29 insertions(+), 21 deletions(-) diff --git a/system/physmem.c b/system/physmem.c index c3d04ca..6216b14 100644 --- a/system/physmem.c +++ b/system/physmem.c @@ -52,6 +52,7 @@ #include "sysemu/hw_accel.h" #include "sysemu/xen-mapcache.h" #include "trace/trace-root.h" +#include "trace.h" #ifdef CONFIG_FALLOCATE_PUNCH_HOLE #include @@ -1918,11 +1919,29 @@ out_free: } } +static RAMBlock *ram_block_create(MemoryRegion *mr, ram_addr_t size, + ram_addr_t max_size, uint32_t ram_flags) +{ +RAMBlock *rb = g_malloc0(sizeof(*rb)); + +rb->used_length = size; +rb->max_length = max_size; +rb->fd = -1; +rb->flags = ram_flags; +rb->page_size = qemu_real_host_page_size(); +rb->mr = mr; +rb->guest_memfd = -1; +trace_ram_block_create(rb->idstr, rb->flags, rb->fd, rb->used_length, There's no idstr at this point, is there? I think this needs to be memory_region_name(mr). Thanks, will fix. That is a bug in my patch factoring. I add the call to qemu_ram_set_idstr in patch "physmem: set ram block idstr earlier". - Steve + rb->max_length, mr->align); +return rb; +} + #ifdef CONFIG_POSIX RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, uint32_t ram_flags, int fd, off_t offset, Error **errp) { +void *host; RAMBlock *new_block; Error *local_err = NULL; int64_t file_size, file_align; @@ -1962,19 +1981,14 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, return NULL; } -new_block = g_malloc0(sizeof(*new_block)); -new_block->mr = mr; -new_block->used_length = size; -new_block->max_length = size; -new_block->flags = ram_flags; -new_block->guest_memfd = -1; -new_block->host = file_ram_alloc(new_block, size, fd, !file_size, offset, - errp); -if (!new_block->host) { +new_block = ram_block_create(mr, size, size, ram_flags); +host = file_ram_alloc(new_block, size, fd, !file_size, offset, errp); +if (!host) { g_free(new_block); return NULL; } +new_block->host = host; ram_block_add(new_block, _err); if (local_err) { g_free(new_block); @@ -1982,7 +1996,6 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, return NULL; } return new_block; - } @@ -2054,18 +2067,10 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size, align = MAX(align, TARGET_PAGE_SIZE); size = ROUND_UP(size, align); max_size = ROUND_UP(max_size, align); - -new_block = g_malloc0(sizeof(*new_block)); -new_block->mr = mr; -new_block->resized = resized; -new_block->used_length = size; -new_block->max_length = max_size; assert(max_size >= size); -new_block->fd = -1; -new_block->guest_memfd = -1; -new_block->page_size = qemu_real_host_page_size(); -new_block->host = host; -new_block->flags = ram_flags; +new_block = ram_block_create(mr, size, max_size, ram_flags); +new_block->resized = resized; + ram_block_add(new_block, _err); if (local_err) { g_free(new_block); diff --git a/system/trace-events b/system/trace-events index 69c9044..f0a80ba 100644 --- a/system/trace-events +++ b/system/trace-events @@ -38,3 +38,6 @@ dirtylimit_state_finalize(void) dirtylimit_throttle_pct(int cpu_index, uint64_t pct, int64_t time_us) "CPU[%d] throttle percent: %" PRIu64 ", throttle adjust time %"PRIi64 " us" dirtylimit_set_vcpu(int cpu_index, uint64_t quota) "CPU[%d] set dirty page rate limit %"PRIu64 dirtylimit_vcpu_execute(int cpu_index, int64_t sleep_time_us) "CPU[%d] sleep %"PRIi64 " us" + +# physmem.c +ram_block_create(const char *name, uint32_t flags, int fd, size_t used_length, size_t max_length, size_t align) "%s, flags %u, fd %d, len %lu, maxlen %lu, align %lu"
Re: [PATCH V1 24/26] seccomp: cpr-exec blocker
On 5/10/2024 3:54 AM, Daniel P. Berrangé wrote: On Mon, Apr 29, 2024 at 08:55:33AM -0700, Steve Sistare wrote: cpr-exec mode needs permission to exec. Block it if permission is denied. Signed-off-by: Steve Sistare --- include/sysemu/seccomp.h | 1 + system/qemu-seccomp.c| 10 -- system/vl.c | 6 ++ 3 files changed, 15 insertions(+), 2 deletions(-) diff --git a/include/sysemu/seccomp.h b/include/sysemu/seccomp.h index fe85989..023c0a1 100644 --- a/include/sysemu/seccomp.h +++ b/include/sysemu/seccomp.h @@ -22,5 +22,6 @@ #define QEMU_SECCOMP_SET_RESOURCECTL (1 << 4) int parse_sandbox(void *opaque, QemuOpts *opts, Error **errp); +uint32_t qemu_seccomp_get_opts(void); #endif diff --git a/system/qemu-seccomp.c b/system/qemu-seccomp.c index 5c20ac0..0d2a561 100644 --- a/system/qemu-seccomp.c +++ b/system/qemu-seccomp.c @@ -360,12 +360,18 @@ static int seccomp_start(uint32_t seccomp_opts, Error **errp) return rc < 0 ? -1 : 0; } +static uint32_t seccomp_opts; + +uint32_t qemu_seccomp_get_opts(void) +{ +return seccomp_opts; +} + int parse_sandbox(void *opaque, QemuOpts *opts, Error **errp) { if (qemu_opt_get_bool(opts, "enable", false)) { -uint32_t seccomp_opts = QEMU_SECCOMP_SET_DEFAULT -| QEMU_SECCOMP_SET_OBSOLETE; const char *value = NULL; +seccomp_opts = QEMU_SECCOMP_SET_DEFAULT | QEMU_SECCOMP_SET_OBSOLETE; value = qemu_opt_get(opts, "obsolete"); if (value) { diff --git a/system/vl.c b/system/vl.c index 7252100..b76881e 100644 --- a/system/vl.c +++ b/system/vl.c @@ -76,6 +76,7 @@ #include "hw/block/block.h" #include "hw/i386/x86.h" #include "hw/i386/pc.h" +#include "migration/blocker.h" #include "migration/cpr.h" #include "migration/misc.h" #include "migration/snapshot.h" @@ -2493,6 +2494,11 @@ static void qemu_process_early_options(void) QemuOptsList *olist = qemu_find_opts_err("sandbox", NULL); if (olist) { qemu_opts_foreach(olist, parse_sandbox, NULL, _fatal); +if (qemu_seccomp_get_opts() & QEMU_SECCOMP_SET_SPAWN) { +Error *blocker = NULL; +error_setg(, "-sandbox denies exec for cpr-exec"); +migrate_add_blocker_mode(, MIG_MODE_CPR_EXEC, _fatal); +} } #endi There are a whole pile of features that get blocked wehn -sandbox is used. I'm not convinced we should be adding code to check for specific blocked features, as such a list will always be incomplete at best, and incorrectly block things at worst. I view this primarily as a documentation task for the cpr-exec command. For cpr and live migration, we do our best to prevent breaking the guest for cases we know will fail. Independently, a clear error message here will reduce error reports for this new cpr feature. Would it be more palatable if I move this blocker's creation to cpr_mig_init? - Steve
Re: [PATCH V1 22/26] migration: ram block cpr-exec blockers
On 5/9/2024 2:01 PM, Fabiano Rosas wrote: Steve Sistare writes: Unlike cpr-reboot mode, cpr-exec mode cannot save volatile ram blocks in the migration stream file and recreate them later, because the physical memory for the blocks is pinned and registered for vfio. Add an exec-mode blocker for volatile ram blocks. Also add a blocker for RAM_GUEST_MEMFD. Preserving guest_memfd may be sufficient for cpr-exec, but it has not been tested yet. - Steve extra text here Will fix, thanks - steve Signed-off-by: Steve Sistare Reviewed-by: Fabiano Rosas
Re: [PATCH V1 09/26] migration: vmstate_register_named
On 5/9/2024 10:32 AM, Fabiano Rosas wrote: Fabiano Rosas writes: Steve Sistare writes: Define vmstate_register_named which takes the instance name as its first parameter, instead of generating the name from VMStateIf of the Object. This will be needed to register objects that are not Objects. Pass the new name parameter to vmstate_register_with_alias_id. Signed-off-by: Steve Sistare Reviewed-by: Fabiano Rosas Actually, can't we define a wrapper type just for this purpose? For example, looking at dbus-vmstate.c: One would need to provide a separate wrapper for each struct to be registered as vmstate. This patch set only has RAMBlock, but there are more coming in my next patch sets. vmstate_register_named avoids adding such boilerplate, and makes it easier to add more cpr state in the future. - Steve static void dbus_vmstate_class_init(ObjectClass *oc, void *data) { ... VMStateIfClass *vc = VMSTATE_IF_CLASS(oc); vc->get_id = dbus_vmstate_get_id; ... } static const TypeInfo dbus_vmstate_info = { .name = TYPE_DBUS_VMSTATE, .parent = TYPE_OBJECT, .instance_size = sizeof(DBusVMState), .instance_finalize = dbus_vmstate_finalize, .class_init = dbus_vmstate_class_init, .interfaces = (InterfaceInfo[]) { { TYPE_USER_CREATABLE }, // without this one { TYPE_VMSTATE_IF }, { } } }; static void register_types(void) { type_register_static(_vmstate_info); } type_init(register_types);
Re: [PATCH V1 05/26] migration: precreate vmstate
On 5/7/2024 5:02 PM, Fabiano Rosas wrote: Steve Sistare writes: Provide the VMStateDescription precreate field to mark objects that must be loaded on the incoming side before devices have been created, because they provide properties that will be needed at creation time. They will be saved to and loaded from their own QEMUFile, via It's not obvious to me what the reason is to have a separate QEMUFile. Could you expand on this? The migration stream is read in the calling sequence at B below, but precreate state is needed at A before chardev and memory backends are created. main() qemu_init() A: qemu_create_early_backends() qemu_create_late_backends() migration_object_init() qmp_x_exit_preconfig() qmp_migrate_incoming() qemu_default_main() qemu_main_loop() fd_accept_incoming_migration() migration_channel_process_incoming() migration_ioc_process_incoming() migration_incoming_process() process_incoming_migration_co() B: qemu_loadvm_state() precreate objects could be emitted first in the existing migration stream and read at A, but this requires untangling numerous ordering dependencies amongst migration_object_init, qemu_create_machine, configure_accelerators, monitor init, and the main loop. - Steve
Re: [PATCH V1 06/26] migration: precreate vmstate for exec
On 5/6/2024 7:34 PM, Fabiano Rosas wrote: Steve Sistare writes: Provide migration_precreate_save for saving precreate vmstate across exec. Create a memfd, save its value in the environment, and serialize state to it. Reverse the process in migration_precreate_load. Signed-off-by: Steve Sistare --- include/migration/misc.h | 5 ++ migration/meson.build| 1 + migration/precreate.c| 139 +++ 3 files changed, 145 insertions(+) create mode 100644 migration/precreate.c diff --git a/include/migration/misc.h b/include/migration/misc.h index c9e200f..cf30351 100644 --- a/include/migration/misc.h +++ b/include/migration/misc.h @@ -56,6 +56,11 @@ AnnounceParameters *migrate_announce_params(void); void dump_vmstate_json_to_file(FILE *out_fp); +/* migration/precreate.c */ +int migration_precreate_save(Error **errp); +void migration_precreate_unsave(void); +int migration_precreate_load(Error **errp); + /* migration/migration.c */ void migration_object_init(void); void migration_shutdown(void); diff --git a/migration/meson.build b/migration/meson.build index f76b1ba..50e7cb2 100644 --- a/migration/meson.build +++ b/migration/meson.build @@ -26,6 +26,7 @@ system_ss.add(files( 'ram-compress.c', 'options.c', 'postcopy-ram.c', + 'precreate.c', 'savevm.c', 'socket.c', 'tls.c', diff --git a/migration/precreate.c b/migration/precreate.c new file mode 100644 index 000..0bf5e1f --- /dev/null +++ b/migration/precreate.c @@ -0,0 +1,139 @@ +/* + * Copyright (c) 2022, 2024 Oracle and/or its affiliates. + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#include "qemu/osdep.h" +#include "qemu/cutils.h" +#include "qemu/memfd.h" +#include "qapi/error.h" +#include "io/channel-file.h" +#include "migration/misc.h" +#include "migration/qemu-file.h" +#include "migration/savevm.h" + +#define PRECREATE_STATE_NAME "QEMU_PRECREATE_STATE" + +static QEMUFile *qemu_file_new_fd_input(int fd, const char *name) +{ +g_autoptr(QIOChannelFile) fioc = qio_channel_file_new_fd(fd); +QIOChannel *ioc = QIO_CHANNEL(fioc); +qio_channel_set_name(ioc, name); +return qemu_file_new_input(ioc); +} + +static QEMUFile *qemu_file_new_fd_output(int fd, const char *name) +{ +g_autoptr(QIOChannelFile) fioc = qio_channel_file_new_fd(fd); +QIOChannel *ioc = QIO_CHANNEL(fioc); +qio_channel_set_name(ioc, name); +return qemu_file_new_output(ioc); +} + +static int memfd_create_named(const char *name, Error **errp) +{ +int mfd; +char val[16]; + +mfd = memfd_create(name, 0); +if (mfd < 0) { +error_setg_errno(errp, errno, "memfd_create failed"); +return -1; +} + +/* Remember mfd in environment for post-exec load */ +qemu_clear_cloexec(mfd); +snprintf(val, sizeof(val), "%d", mfd); +g_setenv(name, val, 1); + +return mfd; +} + +static int memfd_find_named(const char *name, int *mfd_p, Error **errp) +{ +const char *val = g_getenv(name); + +if (!val) { +*mfd_p = -1; +return 0; /* No memfd was created, not an error */ +} +g_unsetenv(name); +if (qemu_strtoi(val, NULL, 10, mfd_p)) { +error_setg(errp, "Bad %s env value %s", PRECREATE_STATE_NAME, val); +return -1; +} +lseek(*mfd_p, 0, SEEK_SET); +return 0; +} + +static void memfd_delete_named(const char *name) +{ +int mfd; +const char *val = g_getenv(name); + +if (val) { +g_unsetenv(name); +if (!qemu_strtoi(val, NULL, 10, )) { +close(mfd); +} +} +} + +static QEMUFile *qemu_file_new_memfd_output(const char *name, Error **errp) +{ +int mfd = memfd_create_named(name, errp); + +if (mfd < 0) { +return NULL; +} + +return qemu_file_new_fd_output(mfd, name); +} + +static QEMUFile *qemu_file_new_memfd_input(const char *name, Error **errp) +{ +int ret, mfd; + +ret = memfd_find_named(name, , errp); +if (ret || mfd < 0) { +return NULL; +} + +return qemu_file_new_fd_input(mfd, name); +} + +int migration_precreate_save(Error **errp) +{ +QEMUFile *f = qemu_file_new_memfd_output(PRECREATE_STATE_NAME, errp); + +if (!f) { +return -1; +} else if (qemu_savevm_precreate_save(f, errp)) { +memfd_delete_named(PRECREATE_STATE_NAME); +return -1; +} else { +/* Do not close f, as mfd must remain open. */ +return 0; +} +} + +void migration_precreate_unsave(void) +{ +memfd_delete_named(PRECREATE_STATE_NAME); +} + +int migration_precreate_load(Error **errp) +{ +int ret; +QEMUFile *f = qemu_file_new_memfd_input(PRECREATE_STATE_NAME, errp); Can we avoid the QEMUFile? I don't see it being exported from this file. It is not exported, but within this file, it is the basis for all read and write operations, via the existing functions
Re: [PATCH V1 03/26] migration: SAVEVM_FOREACH
On 5/6/2024 7:17 PM, Fabiano Rosas wrote: Steve Sistare writes: Define an abstraction SAVEVM_FOREACH to loop over all savevm state handlers, and replace QTAILQ_FOREACH. Define variants for ALL so we can loop over all handlers vs a subset of handlers in a subsequent patch, but at this time there is no distinction between the two. No functional change. Signed-off-by: Steve Sistare --- migration/savevm.c | 55 +++--- 1 file changed, 32 insertions(+), 23 deletions(-) diff --git a/migration/savevm.c b/migration/savevm.c index 4509482..6829ba3 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -237,6 +237,15 @@ static SaveState savevm_state = { .global_section_id = 0, }; +#define SAVEVM_FOREACH(se, entry)\ +QTAILQ_FOREACH(se, _state.handlers, entry)\ + +#define SAVEVM_FOREACH_ALL(se, entry)\ +QTAILQ_FOREACH(se, _state.handlers, entry) This feels worse than SAVEVM_FOREACH_NOT_PRECREATED. We'll have to keep coming back to the definition to figure out which FOREACH is the real deal. I take your point, but the majority of the loops do not care about precreated objects, so it seems backwards to make them more verbose with SAVEVM_FOREACH_NOT_PRECREATE. I can go either way, but we need Peter's opinion also. + +#define SAVEVM_FOREACH_SAFE_ALL(se, entry, new_se) \ +QTAILQ_FOREACH_SAFE(se, _state.handlers, entry, new_se) + static SaveStateEntry *find_se(const char *idstr, uint32_t instance_id); static bool should_validate_capability(int capability) @@ -674,7 +683,7 @@ static uint32_t calculate_new_instance_id(const char *idstr) SaveStateEntry *se; uint32_t instance_id = 0; -QTAILQ_FOREACH(se, _state.handlers, entry) { +SAVEVM_FOREACH_ALL(se, entry) { In this patch we can't have both instances... if (strcmp(idstr, se->idstr) == 0 && instance_id <= se->instance_id) { instance_id = se->instance_id + 1; @@ -690,7 +699,7 @@ static int calculate_compat_instance_id(const char *idstr) SaveStateEntry *se; int instance_id = 0; -QTAILQ_FOREACH(se, _state.handlers, entry) { +SAVEVM_FOREACH(se, entry) { ...otherwise one of the two changes will go undocumented because the actual reason for it will only be described in the next patch. Sure, I'll move this to the precreate patch. - Steve if (!se->compat) { continue; } @@ -816,7 +825,7 @@ void unregister_savevm(VMStateIf *obj, const char *idstr, void *opaque) } pstrcat(id, sizeof(id), idstr); -QTAILQ_FOREACH_SAFE(se, _state.handlers, entry, new_se) { +SAVEVM_FOREACH_SAFE_ALL(se, entry, new_se) { if (strcmp(se->idstr, id) == 0 && se->opaque == opaque) { savevm_state_handler_remove(se); g_free(se->compat); @@ -939,7 +948,7 @@ void vmstate_unregister(VMStateIf *obj, const VMStateDescription *vmsd, { SaveStateEntry *se, *new_se; -QTAILQ_FOREACH_SAFE(se, _state.handlers, entry, new_se) { +SAVEVM_FOREACH_SAFE_ALL(se, entry, new_se) { if (se->vmsd == vmsd && se->opaque == opaque) { savevm_state_handler_remove(se); g_free(se->compat); @@ -1223,7 +1232,7 @@ bool qemu_savevm_state_blocked(Error **errp) { SaveStateEntry *se; -QTAILQ_FOREACH(se, _state.handlers, entry) { +SAVEVM_FOREACH(se, entry) { if (se->vmsd && se->vmsd->unmigratable) { error_setg(errp, "State blocked by non-migratable device '%s'", se->idstr); @@ -1237,7 +1246,7 @@ void qemu_savevm_non_migratable_list(strList **reasons) { SaveStateEntry *se; -QTAILQ_FOREACH(se, _state.handlers, entry) { +SAVEVM_FOREACH(se, entry) { if (se->vmsd && se->vmsd->unmigratable) { QAPI_LIST_PREPEND(*reasons, g_strdup_printf("non-migratable device: %s", @@ -1276,7 +1285,7 @@ bool qemu_savevm_state_guest_unplug_pending(void) { SaveStateEntry *se; -QTAILQ_FOREACH(se, _state.handlers, entry) { +SAVEVM_FOREACH(se, entry) { if (se->vmsd && se->vmsd->dev_unplug_pending && se->vmsd->dev_unplug_pending(se->opaque)) { return true; @@ -1291,7 +1300,7 @@ int qemu_savevm_state_prepare(Error **errp) SaveStateEntry *se; int ret; -QTAILQ_FOREACH(se, _state.handlers, entry) { +SAVEVM_FOREACH(se, entry) { if (!se->ops || !se->ops->save_prepare) { continue; } @@ -1321,7 +1330,7 @@ int qemu_savevm_state_setup(QEMUFile *f, Error **errp) json_writer_start_array(ms->vmdesc, "devices"); trace_savevm_state_setup(); -QTAILQ_FOREACH(se, _state.handlers, entry) { +SAVEVM_FOREACH(se, entry) { if (se->vmsd && se->vmsd->early_setup) {
cpr-exec doc (was Re: [PATCH V1 00/26] Live update: cpr-exec)
On 4/29/2024 11:55 AM, Steve Sistare wrote: This patch series adds the live migration cpr-exec mode. Here is the text I plan to add to docs/devel/migration/CPR.rst. It is premature for me to submit this as a patch, because it includes all the functionality I plan to add in this and future series, but it may help you while reviewing this series. - Steve cpr-exec mode --- In this mode, QEMU stops the VM, writes VM state to the migration URI, and directly exec's a new version of QEMU on the same host, replacing the original process while retaining its PID. Guest RAM is preserved in place, albeit with new virtual addresses. The user completes the migration by specifying the ``-incoming`` option, and by issuing the ``migrate-incoming`` command if necessary; see details below. This mode supports vfio devices by preserving device descriptors and hence kernel state across the exec, even for devices that do not support live migration, and preserves tap and vhost descriptors. cpr-exec also preserves descriptors for a subset of chardevs, including socket, file, parallel, pipe, serial, pty, stdio, and null. chardevs that support cpr-exec have the QEMU_CHAR_FEATURE_CPR set in the Chardev object. The client side of a preserved chardev sees no loss of connectivity during cpr-exec. More chardevs could be preserved with additional developement. All chardevs have a ``reopen-on-cpr`` option which causes the chardev to be closed and reopened during cpr-exec. This can be set to allow cpr-exec when the configuration includes a chardev (such as vc) that does not have QEMU_CHAR_FEATURE_CPR. Because the old and new QEMU instances are not active concurrently, the URI cannot be a type that streams data from one instance to the other. Usage ^ Arguments for the new QEMU process are taken from the @cpr-exec-args parameter. The first argument should be the path of a new QEMU binary, or a prefix command that exec's the new QEMU binary, and the arguments should include the ''-incoming'' option. Memory backend objects must have the ``share=on`` attribute, and must be mmap'able in the new QEMU process. For example, memory-backend-file is acceptable, but memory-backend-ram is not. The VM must be started with the ``-machine memfd-alloc=on`` option. This causes implicit RAM blocks (those not explicitly described by a memory-backend object) to be allocated by mmap'ing a memfd. Examples include VGA, ROM, and even guest RAM when it is specified without without reference to a memory-backend object. Add the ``-only-migratable-modes cpr-exec`` option to guarantee that the configuration supports cpr-exec. QEMU will exit at start time if not. Outgoing: * Set the migration mode parameter to ``cpr-exec``. * Set the ``cpr-exec-args`` parameter. * Issue the ``migrate`` command. It is recommended the the URI be a ``file`` type, but one can use other types such as ``exec``, provided the command captures all the data from the outgoing side, and provides all the data to the incoming side. Incoming: * You do not need to explicitly start new QEMU. It is started as a side effect of the migrate command above. * If the VM was running when the outgoing ``migrate`` command was issued, then QEMU automatically resumes VM execution. Example 1: incoming URI ^^^ In these examples, we simply restart the same version of QEMU, but in a real scenario one would set a new QEMU binary path in cpr-exec-args. :: # qemu-kvm -monitor stdio -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on -m 4G -machine memfd-alloc=on ... QEMU 9.1.50 monitor - type 'help' for more information (qemu) info status VM status: running (qemu) migrate_set_parameter mode cpr-exec (qemu) migrate_set_parameter cpr-exec-args qemu-kvm ... -incoming file:vm.state (qemu) migrate -d file:vm.state (qemu) QEMU 9.1.50 monitor - type 'help' for more information (qemu) info status VM status: running Example 2: incoming defer ^ :: # qemu-kvm -monitor stdio -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on -m 4G -machine memfd-alloc=on ... QEMU 9.1.50 monitor - type 'help' for more information (qemu) info status VM status: running (qemu) migrate_set_parameter mode cpr-exec (qemu) migrate_set_parameter cpr-exec-args qemu-kvm ... -incoming defer (qemu) migrate -d file:vm.state (qemu) QEMU 9.1.50 monitor - type 'help' for more information (qemu) info status status: paused (inmigrate) (qemu) migrate_incoming file:vm.state (qemu) info status VM status: running Caveats ^^^ cpr-exec mode may not be used with postcopy, background-snapshot, or COLO. cpr-exec mode requires permission to use the exec system call, which is denied by certain sandbox options, such as spawn. Use finer grained
Re: [PATCH V1 20/26] migration: cpr-exec mode
On 5/2/2024 8:23 AM, Markus Armbruster wrote: Steve Sistare writes: Add the cpr-exec migration mode. Usage: qemu-system-$arch -machine memfd-alloc=on ... migrate_set_parameter mode cpr-exec migrate_set_parameter cpr-exec-args \ ... -incoming migrate -d The migrate command stops the VM, saves state to the URI, directly exec's a new version of QEMU on the same host, replacing the original process while retaining its PID, and loads state from the URI. Guest RAM is preserved in place, albeit with new virtual addresses. Arguments for the new QEMU process are taken from the @cpr-exec-args parameter. The first argument should be the path of a new QEMU binary, or a prefix command that exec's the new QEMU binary. Because old QEMU terminates when new QEMU starts, one cannot stream data between the two, so the URI must be a type, such as a file, that reads all data before old QEMU exits. Memory backend objects must have the share=on attribute, and must be mmap'able in the new QEMU process. For example, memory-backend-file is acceptable, but memory-backend-ram is not. The VM must be started with the '-machine memfd-alloc=on' option. This causes implicit ram blocks (those not explicitly described by a memory-backend object) to be allocated by mmap'ing a memfd. Examples include VGA, ROM, and even guest RAM when it is specified without a memory-backend object. The implementation saves precreate vmstate at the end of normal migration in migrate_fd_cleanup, and tells the main loop to call cpr_exec. Incoming qemu loads preceate state early, before objects are created. The memfds are kept open across exec by clearing the close-on-exec flag, their values are saved in precreate vmstate, and they are mmap'd in new qemu. Note that the memfd-alloc option is not related to memory-backend-memfd. Later patches add support for memory-backend-memfd, and for additional devices, including vfio, chardev, and more. Signed-off-by: Steve Sistare [...] diff --git a/qapi/migration.json b/qapi/migration.json index 49710e7..7c5f45f 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -665,9 +665,37 @@ # or COLO. # # (since 8.2) +# +# @cpr-exec: The migrate command stops the VM, saves state to the URI, +# directly exec's a new version of QEMU on the same host, +# replacing the original process while retaining its PID, and +# loads state from the URI. Guest RAM is preserved in place, +# albeit with new virtual addresses. Do you mean the virtual addresses of guest RAM may differ betwen old and new QEMU process? The VA at which a guest RAM segment is mapped in the QEMU process changes. The end user would not notice or care, so I'll drop that detail here. +# +# Arguments for the new QEMU process are taken from the +# @cpr-exec-args parameter. The first argument should be the +# path of a new QEMU binary, or a prefix command that exec's the +# new QEMU binary. What's a "prefix command"? A wrapper script, perhaps? A prefix command is any command of the form: command1 command1-args command2 command2-args where command1 performs some set up before exec'ing command2. However, I will drop the word "prefix", it adds no meaning here. +# +# Because old QEMU terminates when new QEMU starts, one cannot +# stream data between the two, so the URI must be a type, such as +# a file, that reads all data before old QEMU exits. What happens when you specify a URI that doesn't? Old QEMU will quietly block indefinitely writing to the URI. +# +# Memory backend objects must have the share=on attribute, and +# must be mmap'able in the new QEMU process. For example, +# memory-backend-file is acceptable, but memory-backend-ram is +# not. +# +# The VM must be started with the '-machine memfd-alloc=on' What happens when you don't? If '-only-migratable-modes cpr-exec' is specified, then QEMU will fail to start, and print a clear error message. Otherwise, a blocker is registered and any attempt to cpr-exec will fail with a clear error message. - Steve +# option. This causes implicit ram blocks -- those not explicitly +# described by a memory-backend object -- to be allocated by +# mmap'ing a memfd. Examples include VGA, ROM, and even guest +# RAM when it is specified without a memory-backend object. +# +# (since 9.1) ## { 'enum': 'MigMode', - 'data': [ 'normal', 'cpr-reboot' ] } + 'data': [ 'normal', 'cpr-reboot', 'cpr-exec' ] } ## # @ZeroPageDetection: [...]
Re: [PATCH V1 18/26] migration: cpr-exec-args parameter
On 5/2/2024 8:23 AM, Markus Armbruster wrote: Steve Sistare writes: Create the cpr-exec-args migration parameter, defined as a list of strings. It will be used for cpr-exec migration mode in a subsequent patch. No functional change, except that cpr-exec-args is shown by the 'info migrate' command. Signed-off-by: Steve Sistare [...] diff --git a/qapi/migration.json b/qapi/migration.json index 8c65b90..49710e7 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -914,6 +914,9 @@ # See description in @ZeroPageDetection. Default is 'multifd'. # (since 9.0) # +# @cpr-exec-args: Arguments passed to new QEMU for @cpr-exec mode. +#See @cpr-exec for details. (Since 9.1) +# You mean migration mode @cpr-exec, don't you? If yes, dangling reference until PATCH 20 adds it. Okay, but worth a mention in the commit message. Suggest "See MigMode @cpr-exec for details." Yes to all. Will update as you suggest. - Steve # Features: # # @deprecated: Member @block-incremental is deprecated. Use @@ -948,7 +951,8 @@ { 'name': 'x-vcpu-dirty-limit-period', 'features': ['unstable'] }, 'vcpu-dirty-limit', 'mode', - 'zero-page-detection'] } + 'zero-page-detection', + 'cpr-exec-args'] } ## # @MigrateSetParameters: @@ -1122,6 +1126,9 @@ # See description in @ZeroPageDetection. Default is 'multifd'. # (since 9.0) # +# @cpr-exec-args: Arguments passed to new QEMU for @cpr-exec mode. +#See @cpr-exec for details. (Since 9.1) +# # Features: # # @deprecated: Member @block-incremental is deprecated. Use @@ -1176,7 +1183,8 @@ 'features': [ 'unstable' ] }, '*vcpu-dirty-limit': 'uint64', '*mode': 'MigMode', -'*zero-page-detection': 'ZeroPageDetection'} } +'*zero-page-detection': 'ZeroPageDetection', +'*cpr-exec-args': [ 'str' ]} } ## # @migrate-set-parameters: @@ -1354,6 +1362,9 @@ # See description in @ZeroPageDetection. Default is 'multifd'. # (since 9.0) # +# @cpr-exec-args: Arguments passed to new QEMU for @cpr-exec mode. +#See @cpr-exec for details. (Since 9.1) +# # Features: # # @deprecated: Member @block-incremental is deprecated. Use @@ -1405,7 +1416,8 @@ 'features': [ 'unstable' ] }, '*vcpu-dirty-limit': 'uint64', '*mode': 'MigMode', -'*zero-page-detection': 'ZeroPageDetection'} } +'*zero-page-detection': 'ZeroPageDetection', +'*cpr-exec-args': [ 'str' ]} } ## # @query-migrate-parameters:
Re: [PATCH V4 10/14] migration: stop vm for cpr
On 2/29/2024 8:28 PM, Peter Xu wrote: On Thu, Feb 29, 2024 at 10:21:14AM -0500, Steven Sistare wrote: On 2/25/2024 9:08 PM, Peter Xu wrote: On Thu, Feb 22, 2024 at 09:28:36AM -0800, Steve Sistare wrote: When migration for cpr is initiated, stop the vm and set state RUN_STATE_FINISH_MIGRATE before ram is saved. This eliminates the possibility of ram and device state being out of sync, and guarantees that a guest in the suspended state remains suspended, because qmp_cont rejects a cont command in the RUN_STATE_FINISH_MIGRATE state. Signed-off-by: Steve Sistare Reviewed-by: Peter Xu cpr-reboot mode keeps changing behavior. Could we declare it "experimental" until it's solid? Maybe a patch to document this? Normally IMHO we shouldn't merge a feature if it's not complete, however cpr-reboot is so special that the mode itself is already merged in 8.2 before I started to merge patches, and it keeps changing things. I don't know what else we can do here besides declaring it experimental and not declare it a stable feature. Hi Peter, the planned/committed functionality for cpr-reboot changed only once, in: migration: stop vm for cpr Suspension to support vfio is an enhancement which adds to the basic functionality, it does not change it. This was planned all along, but submitted as a separate If VFIO used to migrate without suspension and now it won't, it's a behavior change? VFIO could not cpr-reboot migrate without suspension. The existing vfio migration blockers applied to all modes: Error: :3a:10.0: VFIO migration is not supported in kernel Now, with suspension, it will. An addition, not a change. series to manage complexity, as I outlined in my qemu community presentation, which I emailed you at the time. Other "changes" that arose during review were just clarifications and explanations. So, I don't think cpr-reboot deserves to be condemned to experimental limbo. IMHO it's not about a feature being condemned, it's about a kindful heads-up to the users that one needs to take risk on using it until it becomes stable, it also makes developers easier because of no limitation on behavior change. If all the changes are landing, then there's no need for such a patch. If so, please propose the planned complete document patch. I had a feeling that cpr is still not fully understood by even many developers on the list. It'll be great that such document will contain all the details one needs to know on the feature, etc. meaning of the name cpr-reboot (what is "cpr"), when to use it, how to use it, how it differs from "normal" mode (etc. lifted limitations on some devices that used to block migration?), what is enforced (vm stop, suspension, etc.) and what is optionally offered (VFIO, shared-mem, etc.), the expected behaviors, etc. When send it, please copy relevant people (Alex & Cedric for VFIO, while Markus could also be a good candidate considering the QMP involvement). Submitted. - Steve
Re: [PATCH V2 00/11] privatize migration.h
On 3/11/2024 4:28 PM, Peter Xu wrote: On Mon, Mar 11, 2024 at 04:24:14PM -0400, Steven Sistare wrote: On 3/11/2024 3:45 PM, Steven Sistare wrote: On 3/11/2024 3:30 PM, Peter Xu wrote: Steve, On Mon, Mar 11, 2024 at 10:48:47AM -0700, Steve Sistare wrote: Changes in V2: * rebase to migration-next, add RB Not apply even to master branch. Note that there're >=1 PULLs sent and merged since my last reply.. Perhaps you rebased to the "old" next? I pulled from branch migration-next in https://gitlab.com/peterx/qemu a few hours ago, but I must have screwed up somewhere. I'll figure it out and post a V4. My pull was a fiew hours old, but my patches still apply cleanly to the most recent tip: a1bb5dd169f4 ("migration: Fix format in error message") I can sent that as V3, but ... Note that you must apply "migration: export fewer options" before "privatize migration.h". If that does not help, I will send V3. Ouch, I forgot that dependency... Sorry. Yeah it works now. No need to resend for now. Great! - steve
Re: [PATCH V2 00/11] privatize migration.h
On 3/11/2024 3:45 PM, Steven Sistare wrote: On 3/11/2024 3:30 PM, Peter Xu wrote: Steve, On Mon, Mar 11, 2024 at 10:48:47AM -0700, Steve Sistare wrote: Changes in V2: * rebase to migration-next, add RB Not apply even to master branch. Note that there're >=1 PULLs sent and merged since my last reply.. Perhaps you rebased to the "old" next? I pulled from branch migration-next in https://gitlab.com/peterx/qemu a few hours ago, but I must have screwed up somewhere. I'll figure it out and post a V4. My pull was a fiew hours old, but my patches still apply cleanly to the most recent tip: a1bb5dd169f4 ("migration: Fix format in error message") I can sent that as V3, but ... Note that you must apply "migration: export fewer options" before "privatize migration.h". If that does not help, I will send V3. - Steve
Re: [PATCH V2 00/11] privatize migration.h
On 3/11/2024 3:30 PM, Peter Xu wrote: Steve, On Mon, Mar 11, 2024 at 10:48:47AM -0700, Steve Sistare wrote: Changes in V2: * rebase to migration-next, add RB Not apply even to master branch. Note that there're >=1 PULLs sent and merged since my last reply.. Perhaps you rebased to the "old" next? I pulled from branch migration-next in https://gitlab.com/peterx/qemu a few hours ago, but I must have screwed up somewhere. I'll figure it out and post a V4. - Steve
Re: [PATCH V2] migration: export fewer options
On 3/1/2024 2:47 AM, Peter Xu wrote: On Thu, Feb 29, 2024 at 07:03:36AM +0100, Markus Armbruster wrote: Steven Sistare writes: Just a reminder, after our further discussion in the V1 thread, this patch is still what I propose, no updates needed. Markus, I think Peter is looking for your blessing on the new file name: include/migration/client-options.h. Not my preference, but no objection. There's yet one alternative, which is to put these exported option functions into misc.h directly. After all that's not so much, and misc.h already hold random stuff from elsewhere. Steve, would you repost this patch (with/without my above comment taken) along with your other series with a rebase to migration-next? It doesn't apply there. Re the other series: one nitpick comment on the last patch, where you may consider splitting the removal of the unused 2 functions into a standalone patch. Other than that it looks good to me. https://gitlab.com/peterx/qemu/-/tree/migration-next Both are rebased and reposted, thanks - steve
Re: [PATCH V4 10/14] migration: stop vm for cpr
On 2/25/2024 9:08 PM, Peter Xu wrote: > On Thu, Feb 22, 2024 at 09:28:36AM -0800, Steve Sistare wrote: >> When migration for cpr is initiated, stop the vm and set state >> RUN_STATE_FINISH_MIGRATE before ram is saved. This eliminates the >> possibility of ram and device state being out of sync, and guarantees >> that a guest in the suspended state remains suspended, because qmp_cont >> rejects a cont command in the RUN_STATE_FINISH_MIGRATE state. >> >> Signed-off-by: Steve Sistare > > Reviewed-by: Peter Xu > > cpr-reboot mode keeps changing behavior. > > Could we declare it "experimental" until it's solid? Maybe a patch to > document this? > > Normally IMHO we shouldn't merge a feature if it's not complete, however > cpr-reboot is so special that the mode itself is already merged in 8.2 > before I started to merge patches, and it keeps changing things. I don't > know what else we can do here besides declaring it experimental and not > declare it a stable feature. Hi Peter, the planned/committed functionality for cpr-reboot changed only once, in: migration: stop vm for cpr Suspension to support vfio is an enhancement which adds to the basic functionality, it does not change it. This was planned all along, but submitted as a separate series to manage complexity, as I outlined in my qemu community presentation, which I emailed you at the time. Other "changes" that arose during review were just clarifications and explanations. So, I don't think cpr-reboot deserves to be condemned to experimental limbo. - Steve
Re: [PATCH V4 14/14] migration: options incompatible with cpr
On 2/29/2024 12:40 AM, Peter Xu wrote: > On Thu, Feb 29, 2024 at 06:31:26AM +0100, Markus Armbruster wrote: >> Hmm, perhaps Peter can still squash in the corrections before posting >> his PR. Peter? > > The PR was sent yesterday, it's already in PeterM's -staging. I worry it's > a bit late to call a stop now. > > https://lore.kernel.org/qemu-devel/20240228051315.400759-23-pet...@redhat.com > > Steve, could you provide a standalone patch for the update? Then I'll do > the best accordingly (e.g. if the PR failed to apply I'll squash when > resend v2; or I'll pick it up for the next). I sent the patch. I also re-wrapped all cpr-reboot paragraphs to 70 columns and addressed Markus' other comments on "migration: update cpr-reboot description". Peter, if you squash it, the last sentence "cpr-reboot may not be used ..." squashes into migration: options incompatible with cpr and everything else squashes into migration: update cpr-reboot description - Steve
Re: [PATCH] migration: re-format cpr-reboot documentation
Please ignore this, I will send a V2 that incorporates additional comments from Markus that I missed in my inbox. - Steve On 2/29/2024 9:15 AM, Steve Sistare wrote: > Re-wrap the cpr-reboot documentation to 70 columns, use '@' for > cpr-reboot references, and capitalize COLO. > > Suggested-by: Markus Armbruster > Signed-off-by: Steve Sistare > --- > qapi/migration.json | 36 +++- > 1 file changed, 19 insertions(+), 17 deletions(-) > > diff --git a/qapi/migration.json b/qapi/migration.json > index c6bfe2e..b69f62a 100644 > --- a/qapi/migration.json > +++ b/qapi/migration.json > @@ -636,28 +636,30 @@ > # > # @normal: the original form of migration. (since 8.2) > # > -# @cpr-reboot: The migrate command stops the VM and saves state to the URI. > -# After quitting qemu, the user resumes by running qemu -incoming. > +# @cpr-reboot: The migrate command stops the VM and saves state to > +# the URI. After quitting qemu, the user resumes by running > +# qemu -incoming. > # > -# This mode allows the user to quit qemu, and restart an updated version > -# of qemu. The user may even update and reboot the OS before restarting, > -# as long as the URI persists across a reboot. > +# This mode allows the user to quit qemu and restart an updated > +# version of qemu. The user may even update and reboot the OS > +# before restarting, as long as the URI persists across a reboot. > # > -# Unlike normal mode, the use of certain local storage options does not > -# block the migration, but the user must not modify guest block devices > -# between the quit and restart. > +# Unlike normal mode, the use of certain local storage options > +# does not block the migration, but the user must not modify guest > +# block devices between the quit and restart. > # > -# This mode supports vfio devices provided the user first puts the guest > -# in the suspended runstate, such as by issuing guest-suspend-ram to the > -# qemu guest agent. > +# This mode supports vfio devices provided the user first puts > +# the guest in the suspended runstate, such as by issuing > +# guest-suspend-ram to the qemu guest agent. > # > -# Best performance is achieved when the memory backend is shared and the > -# @x-ignore-shared migration capability is set, but this is not required. > -# Further, if the user reboots before restarting such a configuration, > the > -# shared backend must be be non-volatile across reboot, such as by > backing > -# it with a dax device. > +# Best performance is achieved when the memory backend is shared > +# and the @x-ignore-shared migration capability is set, but this > +# is not required. Further, if the user reboots before restarting > +# such a configuration, the shared backend must be be non-volatile > +# across reboot, such as by backing it with a dax device. > # > -# cpr-reboot may not be used with postcopy, colo, or background-snapshot. > +# @cpr-reboot may not be used with postcopy, background-snapshot, > +# or COLO. > # > # (since 8.2) > ##
Re: [PATCH V4 11/14] vfio: register container for cpr
On 2/29/2024 3:35 AM, Cédric Le Goater wrote: > Hello Steve, > > On 2/22/24 18:28, Steve Sistare wrote: >> Define entry points to perform per-container cpr-specific initialization >> and teardown. >> >> Signed-off-by: Steve Sistare >> --- >> hw/vfio/container.c | 11 ++- >> hw/vfio/cpr.c | 19 +++ >> hw/vfio/iommufd.c | 6 ++ >> hw/vfio/meson.build | 1 + >> include/hw/vfio/vfio-common.h | 3 +++ >> 5 files changed, 39 insertions(+), 1 deletion(-) >> create mode 100644 hw/vfio/cpr.c >> >> diff --git a/hw/vfio/container.c b/hw/vfio/container.c >> index bd25b9f..096d77e 100644 >> --- a/hw/vfio/container.c >> +++ b/hw/vfio/container.c >> @@ -621,10 +621,15 @@ static int vfio_connect_container(VFIOGroup *group, >> AddressSpace *as, >> goto free_container_exit; >> } >> + ret = vfio_cpr_register_container(bcontainer, errp); >> + if (ret) { >> + goto free_container_exit; >> + } >> + >> ret = vfio_ram_block_discard_disable(container, true); >> if (ret) { >> error_setg_errno(errp, -ret, "Cannot set discarding of RAM >> broken"); >> - goto free_container_exit; >> + goto unregister_container_exit; >> } >> assert(bcontainer->ops->setup); >> @@ -667,6 +672,9 @@ listener_release_exit: >> enable_discards_exit: >> vfio_ram_block_discard_disable(container, false); >> +unregister_container_exit: >> + vfio_cpr_unregister_container(bcontainer); >> + >> free_container_exit: >> g_free(container); >> @@ -710,6 +718,7 @@ static void vfio_disconnect_container(VFIOGroup *group) >> vfio_container_destroy(bcontainer); >> trace_vfio_disconnect_container(container->fd); >> + vfio_cpr_unregister_container(bcontainer); >> close(container->fd); >> g_free(container); >> diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c >> new file mode 100644 >> index 000..3bede54 >> --- /dev/null >> +++ b/hw/vfio/cpr.c >> @@ -0,0 +1,19 @@ >> +/* >> + * Copyright (c) 2021-2024 Oracle and/or its affiliates. >> + * >> + * This work is licensed under the terms of the GNU GPL, version 2 or later. >> + * See the COPYING file in the top-level directory. >> + */ >> + >> +#include "qemu/osdep.h" >> +#include "hw/vfio/vfio-common.h" >> +#include "qapi/error.h" >> + >> +int vfio_cpr_register_container(VFIOContainerBase *bcontainer, Error **errp) >> +{ >> + return 0; >> +} >> + >> +void vfio_cpr_unregister_container(VFIOContainerBase *bcontainer) >> +{ >> +} >> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c >> index 9bfddc1..e1be224 100644 >> --- a/hw/vfio/iommufd.c >> +++ b/hw/vfio/iommufd.c >> @@ -411,6 +411,11 @@ found_container: >> goto err_listener_register; >> } >> + ret = vfio_cpr_register_container(bcontainer, errp); >> + if (ret) { >> + goto err_listener_register; >> + } >> + >> /* >> * TODO: examine RAM_BLOCK_DISCARD stuff, should we do group level >> * for discarding incompatibility check as well? >> @@ -461,6 +466,7 @@ static void iommufd_cdev_detach(VFIODevice *vbasedev) >> iommufd_cdev_ram_block_discard_disable(false); >> } >> + vfio_cpr_unregister_container(bcontainer); >> iommufd_cdev_detach_container(vbasedev, container); >> iommufd_cdev_container_destroy(container); >> vfio_put_address_space(space); >> diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build >> index bb98493..bba776f 100644 >> --- a/hw/vfio/meson.build >> +++ b/hw/vfio/meson.build >> @@ -5,6 +5,7 @@ vfio_ss.add(files( >> 'container-base.c', >> 'container.c', >> 'migration.c', >> + 'cpr.c', >> )) >> vfio_ss.add(when: 'CONFIG_PSERIES', if_true: files('spapr.c')) >> vfio_ss.add(when: 'CONFIG_IOMMUFD', if_true: files( >> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h >> index 4a6c262..b9da6c0 100644 >> --- a/include/hw/vfio/vfio-common.h >> +++ b/include/hw/vfio/vfio-common.h >> @@ -205,6 +205,9 @@ void vfio_detach_device(VFIODevice *vbasedev); >> int vfio_kvm_device_add_fd(int fd, Error **errp); >> int vfio_kvm_device_del_fd(int fd, Error **errp); >> +int vfio_cpr_register_container(VFIOContainerBase *bcontainer, Error >> **errp); > > Should we return bool since we have an errp ? the returned value > is not an errno AFAICT. > > Anyhow, > > Reviewed-by: Cédric Le Goater Hi Cedric, vfio_cpr_register_container sometimes returns the value returned from migrate_add_blocker_modes, which is an errno. Thanks for reviewing these! - Steve >> +void vfio_cpr_unregister_container(VFIOContainerBase *bcontainer); >> + >> extern const MemoryRegionOps vfio_region_ops; >> typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList; >> typedef QLIST_HEAD(VFIODeviceList, VFIODevice) VFIODeviceList; >
Re: [PATCH V2] migration: export fewer options
Just a reminder, after our further discussion in the V1 thread, this patch is still what I propose, no updates needed. Markus, I think Peter is looking for your blessing on the new file name: include/migration/client-options.h. - Steve On 2/26/2024 5:16 PM, Steve Sistare wrote: > A small number of migration options are accessed by migration clients, > but to see them clients must include all of options.h, which is mostly > for migration core code. migrate_mode() in particular will be needed by > multiple clients. > > Refactor the option declarations so clients can see the necessary few via > misc.h, which already exports a portion of the client API. > > Signed-off-by: Steve Sistare > --- > Changes in V2: > * renamed options-pub.h to client-options.h > --- > --- > hw/vfio/migration.c| 1 - > hw/virtio/virtio-balloon.c | 1 - > include/migration/client-options.h | 24 > include/migration/misc.h | 1 + > migration/options.h| 6 +- > system/dirtylimit.c| 1 - > 6 files changed, 26 insertions(+), 8 deletions(-) > create mode 100644 include/migration/client-options.h > > diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c > index 50140ed..5d4a23c 100644 > --- a/hw/vfio/migration.c > +++ b/hw/vfio/migration.c > @@ -18,7 +18,6 @@ > #include "sysemu/runstate.h" > #include "hw/vfio/vfio-common.h" > #include "migration/migration.h" > -#include "migration/options.h" > #include "migration/savevm.h" > #include "migration/vmstate.h" > #include "migration/qemu-file.h" > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c > index 89f853f..a59ff17 100644 > --- a/hw/virtio/virtio-balloon.c > +++ b/hw/virtio/virtio-balloon.c > @@ -32,7 +32,6 @@ > #include "qemu/error-report.h" > #include "migration/misc.h" > #include "migration/migration.h" > -#include "migration/options.h" > > #include "hw/virtio/virtio-bus.h" > #include "hw/virtio/virtio-access.h" > diff --git a/include/migration/client-options.h > b/include/migration/client-options.h > new file mode 100644 > index 000..887fea1 > --- /dev/null > +++ b/include/migration/client-options.h > @@ -0,0 +1,24 @@ > +/* > + * QEMU public migration capabilities > + * > + * Copyright (c) 2012-2023 Red Hat Inc > + * > + * This work is licensed under the terms of the GNU GPL, version 2 or later. > + * See the COPYING file in the top-level directory. > + */ > + > +#ifndef QEMU_MIGRATION_CLIENT_OPTIONS_H > +#define QEMU_MIGRATION_CLIENT_OPTIONS_H > + > +/* capabilities */ > + > +bool migrate_background_snapshot(void); > +bool migrate_dirty_limit(void); > +bool migrate_postcopy_ram(void); > +bool migrate_switchover_ack(void); > + > +/* parameters */ > + > +MigMode migrate_mode(void); > + > +#endif > diff --git a/include/migration/misc.h b/include/migration/misc.h > index 5d1aa59..4c226a4 100644 > --- a/include/migration/misc.h > +++ b/include/migration/misc.h > @@ -17,6 +17,7 @@ > #include "qemu/notify.h" > #include "qapi/qapi-types-migration.h" > #include "qapi/qapi-types-net.h" > +#include "migration/client-options.h" > > /* migration/ram.c */ > > diff --git a/migration/options.h b/migration/options.h > index 246c160..964ebdd 100644 > --- a/migration/options.h > +++ b/migration/options.h > @@ -16,6 +16,7 @@ > > #include "hw/qdev-properties.h" > #include "hw/qdev-properties-system.h" > +#include "migration/client-options.h" > > /* migration properties */ > > @@ -24,12 +25,10 @@ extern Property migration_properties[]; > /* capabilities */ > > bool migrate_auto_converge(void); > -bool migrate_background_snapshot(void); > bool migrate_block(void); > bool migrate_colo(void); > bool migrate_compress(void); > bool migrate_dirty_bitmaps(void); > -bool migrate_dirty_limit(void); > bool migrate_events(void); > bool migrate_ignore_shared(void); > bool migrate_late_block_activate(void); > @@ -37,11 +36,9 @@ bool migrate_multifd(void); > bool migrate_pause_before_switchover(void); > bool migrate_postcopy_blocktime(void); > bool migrate_postcopy_preempt(void); > -bool migrate_postcopy_ram(void); > bool migrate_rdma_pin_all(void); > bool migrate_release_ram(void); > bool migrate_return_path(void); > -bool migrate_switchover_ack(void); > bool migrate_validate_uuid(void); > bool migrate_xbzrle(void); > bool migrate_zero_blocks(void); > @@ -83,7 +80,6 @@ uint8_t migrate_max_cpu_throttle(void); > uint64_t migrate_max_bandwidth(void); > uint64_t migrate_avail_switchover_bandwidth(void); > uint64_t migrate_max_postcopy_bandwidth(void); > -MigMode migrate_mode(void); > int migrate_multifd_channels(void); > MultiFDCompression migrate_multifd_compression(void); > int migrate_multifd_zlib_level(void); > diff --git a/system/dirtylimit.c b/system/dirtylimit.c > index b5607eb..774ff44 100644 > --- a/system/dirtylimit.c > +++ b/system/dirtylimit.c > @@ -26,7 +26,6 @@ > #include "trace.h" > #include "migration/misc.h" >
Re: [PATCH V4 14/14] migration: options incompatible with cpr
On 2/28/2024 11:05 AM, Markus Armbruster wrote: > Steven Sistare writes: > >> On 2/28/2024 2:21 AM, Markus Armbruster wrote: >>> Steve Sistare writes: >>> >>>> Fail the migration request if options are set that are incompatible >>>> with cpr. >>>> >>>> Signed-off-by: Steve Sistare > > [...] > >>>> diff --git a/qapi/migration.json b/qapi/migration.json >>>> index 0990297..c6bfe2e 100644 >>>> --- a/qapi/migration.json >>>> +++ b/qapi/migration.json >>>> @@ -657,6 +657,8 @@ >>>> # shared backend must be be non-volatile across reboot, such as by >>>> backing >>>> # it with a dax device. >>>> # >>>> +# cpr-reboot may not be used with postcopy, colo, or >>>> background-snapshot. >>>> +# >>> >>> @cpr-reboot >>> >>> COLO >>> >>> Wrap the line: >>> >>># @cpr-reboot may not be used with postcopy, COLO, or >>># background-snapshot. >>> >>> This doesn't tell the reader what settings exactly do not work with >>> @cpr-reboot. >>> >>> For instance "background-snapshot" is about enabling migration >>> capability @background-snapshot. We could write something like "is >>> incompatible with enabling migration capability @background-snapshot". >>> >>> Same for the other two. Worthwhile? >> >> I initially was more precise exactly as you suggest, but I realized that >> postcopy encompasses multiple capabilities, and I did not want to enumerate >> them, nor require new capabilities related to these 3 to be listed here >> if/when they are created, so I generalized. A keyword search in this file >> gives unambiguous matches. >> >> Peter pushed the patch a few hours before you sent this. > > Okay. > > A followup patch to correct @cpr-reboot, COLO and line wrapping would be > nice. Will do - steve
Re: [PATCH V4 14/14] migration: options incompatible with cpr
On 2/28/2024 2:21 AM, Markus Armbruster wrote: > Steve Sistare writes: > >> Fail the migration request if options are set that are incompatible >> with cpr. >> >> Signed-off-by: Steve Sistare >> --- >> migration/migration.c | 17 + >> qapi/migration.json | 2 ++ >> 2 files changed, 19 insertions(+) >> >> diff --git a/migration/migration.c b/migration/migration.c >> index 90a9094..7652fd4 100644 >> --- a/migration/migration.c >> +++ b/migration/migration.c >> @@ -1953,6 +1953,23 @@ static bool migrate_prepare(MigrationState *s, bool >> blk, bool blk_inc, >> return false; >> } >> >> +if (migrate_mode_is_cpr(s)) { >> +const char *conflict = NULL; >> + >> +if (migrate_postcopy()) { >> +conflict = "postcopy"; >> +} else if (migrate_background_snapshot()) { >> +conflict = "background snapshot"; >> +} else if (migrate_colo()) { >> +conflict = "COLO"; >> +} >> + >> +if (conflict) { >> +error_setg(errp, "Cannot use %s with CPR", conflict); >> +return false; >> +} >> +} >> + >> if (blk || blk_inc) { >> if (migrate_colo()) { >> error_setg(errp, "No disk migration is required in COLO mode"); >> diff --git a/qapi/migration.json b/qapi/migration.json >> index 0990297..c6bfe2e 100644 >> --- a/qapi/migration.json >> +++ b/qapi/migration.json >> @@ -657,6 +657,8 @@ >> # shared backend must be be non-volatile across reboot, such as by >> backing >> # it with a dax device. >> # >> +# cpr-reboot may not be used with postcopy, colo, or >> background-snapshot. >> +# > > @cpr-reboot > > COLO > > Wrap the line: > ># @cpr-reboot may not be used with postcopy, COLO, or ># background-snapshot. > > This doesn't tell the reader what settings exactly do not work with > @cpr-reboot. > > For instance "background-snapshot" is about enabling migration > capability @background-snapshot. We could write something like "is > incompatible with enabling migration capability @background-snapshot". > > Same for the other two. Worthwhile? I initially was more precise exactly as you suggest, but I realized that postcopy encompasses multiple capabilities, and I did not want to enumerate them, nor require new capabilities related to these 3 to be listed here if/when they are created, so I generalized. A keyword search in this file gives unambiguous matches. Peter pushed the patch a few hours before you sent this. - Steve
Re: [PATCH V1 00/10] privatize migration.h
Please ignore this solo message, I accidentally sent it alone - steve On 2/27/2024 12:42 PM, Steve Sistare wrote: > migration/migration.h is the private interface for code in the migration > sub-directory, but many other clients include it because they need accessors > that are not exported by the publc interface in include/migration/misc.h. > Fix that by refactoring accessors and defining new ones as needed. > > After these fixes, no code outside of migration includes migration.h, > and no code outside of migration uses MigrationState. > > This series depends on the following: > * migration patches in the series "allow cpr-reboot for vfio" > * singleton patch "migration: export fewer options" > > Steve Sistare (10): > migration: remove migration.h references > migration: export migration_is_setup_or_active > migration: export migration_is_active > migration: export migration_is_running > migration: export vcpu_dirty_limit_period > migration: migration_thread_is_self > migration: migration_is_device > migration: migration_file_set_error > migration: privatize colo interfaces > migration: purge MigrationState from public interface > > hw/vfio/common.c | 17 +++--- > hw/vfio/container.c| 1 - > hw/vfio/migration.c| 11 ++- > hw/virtio/vhost-user.c | 1 - > hw/virtio/virtio-balloon.c | 1 - > include/migration/client-options.h | 1 + > include/migration/misc.h | 17 +- > migration/colo.c | 17 ++ > migration/migration.c | 67 > -- > migration/migration.h | 7 ++-- > migration/options.c| 11 +-- > migration/ram.c| 5 ++- > migration/savevm.c | 2 +- > net/colo-compare.c | 3 +- > net/vhost-vdpa.c | 3 +- > stubs/colo.c | 1 - > system/dirtylimit.c| 12 +++ > system/qdev-monitor.c | 1 - > target/loongarch/kvm/kvm.c | 1 - > target/riscv/kvm/kvm-cpu.c | 4 +-- > tests/unit/test-vmstate.c | 1 - > 21 files changed, 96 insertions(+), 88 deletions(-) >
Re: [PATCH v7 0/3] string list functions
All the changes look good - steve On 2/27/2024 10:33 AM, Markus Armbruster wrote: > This is Steve's work (v1 to v5), tweaked by Philippe (v6), and now by > me. > > v7: > * Old PATCH 1 dropped > The proposed str_split() is a composition of g_strsplit() and a > conversion from char ** to strList *. I'm willing to accept a > conversion function str_list_from_strv() next to > strv_from_str_list(). I doubt having a function for the composition > is worth the trouble. > * Old PATCH 4 dropped > The tests mostly test g_strsplit(). I'm willing to accept focused > tests for strv_from_str_list() and str_list_from_strv(). > * PATCH 1-3: Commit messages tweaked > * PATCH 2: strv_from_strList() renamed strv_from_str_list(), and moved > to qapi/qapi-type-helpers.c. Function comment tweaked. Local > variable @argv renamed to @strv. > * PATCH 3: Adjust for the rename. > > Steve Sistare (3): > qapi: New QAPI_LIST_LENGTH() > qapi: New strv_from_str_list() > migration: simplify exec migration functions > > include/qapi/type-helpers.h | 8 ++ > include/qapi/util.h | 13 + > migration/exec.c| 57 ++--- > qapi/qapi-type-helpers.c| 14 + > 4 files changed, 43 insertions(+), 49 deletions(-) >
Re: [PATCH v6 0/5] string list functions
On 2/27/2024 10:28 AM, Markus Armbruster wrote: > Philippe Mathieu-Daudé writes: > >> Hi Markus, >> >> Here are the patches I queued until you told me you'd >> object to the CamelCase filename strList.[ch]. >> >> Steve, please take over ;) > > I'm going to post the part of the series I'm ready to queue as v7, with > minor modifications: > > * Rename strv_from_strList() to strv_from_str_list(), and put it into > qapi/qapi-type-helpers.c. > > * Tweak its function comment. > > * Rename its local variable @argv to @strv. > > * Cosmetic commit message tweaks. > > We can then talk about the remainder. Thanks Markus, that saves some time and email. - Steve
Re: [PATCH V1] migration: export fewer options
On 2/27/2024 3:08 AM, Peter Xu wrote: > On Mon, Feb 26, 2024 at 09:41:15AM -0500, Steven Sistare wrote: >> On 2/26/2024 2:40 AM, Markus Armbruster wrote: >>> Steve Sistare writes: >>> >>>> A small number of migration options are accessed by migration clients, >>>> but to see them clients must include all of options.h, which is mostly >>>> for migration core code. migrate_mode() in particular will be needed by >>>> multiple clients. >>>> >>>> Refactor the option declarations so clients can see the necessary few via >>>> misc.h, which already exports a portion of the client API. >>>> >>>> Signed-off-by: Steve Sistare >>>> --- >>>> I suggest that eventually we should define a single file migration/client.h >>>> which exports everything needed by the simpler clients: blockers, >>>> notifiers, >>>> options, cpr, and state accessors. >>>> --- >>>> --- >>>> hw/vfio/migration.c | 1 - >>>> hw/virtio/virtio-balloon.c | 1 - >>>> include/migration/misc.h| 1 + >>>> include/migration/options-pub.h | 24 >>>> migration/options.h | 6 +- >>> >>> Unusual naming. We have zero headers named -pub.h or -public.h, and >>> dozens named like -int.h or -internal.h. Please stick to the existing >>> convention. >> >> In the spirit of minimizing changes, I went that route to avoid renaming the >> existing migration/options.h and its references: >> >> 0 migration/block-dirty-bit 82 #include "options.h" >> 1 migration/block.c 32 #include "options.h" >> 2 migration/colo.c 37 #include "options.h" >> 3 migration/migration-hmp-c 35 #include "options.h" >> 4 migration/migration.c 68 #include "options.h" >> 5 migration/multifd-zlib.c 21 #include "options.h" >> 6 migration/multifd-zstd.c 21 #include "options.h" >> 7 migration/multifd.c 29 #include "options.h" >> 8 migration/options.c 30 #include "options.h" >> 9 migration/postcopy-ram.c 40 #include "options.h" >> a migration/qemu-file.c 33 #include "options.h" >> b migration/ram-compress.c 37 #include "options.h" >> c migration/ram.c 63 #include "options.h" >> d migration/rdma.c 40 #include "options.h" >> e migration/savevm.c71 #include "options.h" >> f migration/socket.c30 #include "options.h" >> g migration/tls.c 25 #include "options.h" >> >> But I take your point. >> >> Peter, which do you prefer? > > From statistics, "-internal.h" wins "-int.h": > > $ git grep "\-internal.h" | wc -l > 135 > $ git grep "\-int.h" | wc -l > 3 Yes, I did the same search to choose the name for option A below. But in option B, I keep the existing name for the private file, and choose a new name for the public file. There is no suffix in use in qemu to denote a public file; we just use names indicating the functionality, so I chose client-options.h. Let's use client-options.h and move on. I am preparing another cleanup series that I think you will like. - Steve >> A. rename: migration/options.h -> migration/options-internal.h >> rename: include/migration/options-pub.h -> include/migration/options.h >> >> B. rename: include/migration/options.h -> >> include/migration/client-options.h >> >> I prefer B. If you prefer B, but want a different file name, please choose >> the >> final name. > > Personally I don't have a strong opinion on the name. I'll see whether > Markus has any comment. > > [and of course, I removed this patch from -staging queue to keep the > discussion going..] >
Re: [PATCH V4 00/14] allow cpr-reboot for vfio
On 2/26/2024 3:21 PM, Steven Sistare wrote: > On 2/26/2024 4:01 AM, Peter Xu wrote: >> On Mon, Feb 26, 2024 at 09:49:46AM +0100, Cédric Le Goater wrote: >>> Go ahead. It will help me for the changes I am doing on error reporting >>> for VFIO migration. I will rebase on top. >> >> Thanks for confirming. I queued the migration patches then, but leave the >> two vfio one for further discussion. > > Very good, thanks. I am always happy to move the ball a few yards closer to > the goal line :) Peter, beware that patch 3 needs an edit before being queued. This hunk snuck in and should be deleted: [PATCH V4 03/14] migration: convert to NotifierWithReturn diff --git a/roms/seabios-hppa b/roms/seabios-hppa index 03774ed..e4eac85 16 --- a/roms/seabios-hppa +++ b/roms/seabios-hppa @@ -1 +1 @@ -Subproject commit 03774edaad3bfae090ac96ca5450353c641637d1 +Subproject commit e4eac85880e8677f96d8b9e94de9f2eec9c0751f - Steve
Re: [PATCH V4 00/14] allow cpr-reboot for vfio
On 2/26/2024 4:01 AM, Peter Xu wrote: > On Mon, Feb 26, 2024 at 09:49:46AM +0100, Cédric Le Goater wrote: >> Go ahead. It will help me for the changes I am doing on error reporting >> for VFIO migration. I will rebase on top. > > Thanks for confirming. I queued the migration patches then, but leave the > two vfio one for further discussion. Very good, thanks. I am always happy to move the ball a few yards closer to the goal line :) - Steve
Re: [PATCH V1] migration: export fewer options
On 2/25/2024 9:40 PM, Peter Xu wrote: > On Fri, Feb 23, 2024 at 09:13:24AM -0800, Steve Sistare wrote: >> A small number of migration options are accessed by migration clients, >> but to see them clients must include all of options.h, which is mostly >> for migration core code. migrate_mode() in particular will be needed by >> multiple clients. >> >> Refactor the option declarations so clients can see the necessary few via >> misc.h, which already exports a portion of the client API. >> >> Signed-off-by: Steve Sistare > > Sounds reasonable, queued, thanks. > >> --- >> I suggest that eventually we should define a single file migration/client.h >> which exports everything needed by the simpler clients: blockers, notifiers, >> options, cpr, and state accessors. > > What's the difference v.s. current migration/misc.h? This file would be sufficient for most clients: diff --git a/include/migration/client.h b/include/migration/client.h new file mode 100644 index 000..a55e504 --- /dev/null +++ b/include/migration/client.h @@ -0,0 +1,6 @@ +#ifndef MIGRATION_CLIENT_H +#define MIGRATION_CLIENT_H +#include "migration/misc.h" +#include "migration/blocker.h" +#include "migration/client-options.h" +#endif Or, we could rename misc.h -> client.h and include blocker.h and client-options.h in it. I just like the idea that most clients could include a single, obviously named file to use the most-common exported API. "misc.h" is not obvious, and not complete. - Steve
Re: [PATCH v6 0/5] string list functions
Thanks for trying a V6 Philippe. I'll take it from here. It helps to know that someone besides me thinks these functions are worth having. - Steve On 2/26/2024 9:11 AM, Philippe Mathieu-Daudé wrote: > Hi Markus, > > Here are the patches I queued until you told me you'd > object to the CamelCase filename strList.[ch]. > > Steve, please take over ;) > > Since v5: > - Cover files in MAINTAINERS > - Complete @docstring mentioning g_auto. > > v5: > https://lore.kernel.org/qemu-devel/1708638470-114846-3-git-send-email-steven.sist...@oracle.com/ > > Steve Sistare (5): > util: str_split > qapi: QAPI_LIST_LENGTH > util: strv_from_strList > util: strList unit tests > migration: simplify exec migration functions > > MAINTAINERS | 2 + > include/monitor/hmp.h | 1 - > include/qapi/util.h | 13 +++ > include/qemu/strList.h| 33 > migration/exec.c | 57 > monitor/hmp-cmds.c| 19 -- > net/net-hmp-cmds.c| 3 +- > stats/stats-hmp-cmds.c| 3 +- > tests/unit/test-strList.c | 80 +++ > util/strList.c| 38 +++ > tests/unit/meson.build| 1 + > util/meson.build | 1 + > 12 files changed, 180 insertions(+), 71 deletions(-) > create mode 100644 include/qemu/strList.h > create mode 100644 tests/unit/test-strList.c > create mode 100644 util/strList.c >
Re: [PATCH V1] migration: export fewer options
On 2/26/2024 2:40 AM, Markus Armbruster wrote: > Steve Sistare writes: > >> A small number of migration options are accessed by migration clients, >> but to see them clients must include all of options.h, which is mostly >> for migration core code. migrate_mode() in particular will be needed by >> multiple clients. >> >> Refactor the option declarations so clients can see the necessary few via >> misc.h, which already exports a portion of the client API. >> >> Signed-off-by: Steve Sistare >> --- >> I suggest that eventually we should define a single file migration/client.h >> which exports everything needed by the simpler clients: blockers, notifiers, >> options, cpr, and state accessors. >> --- >> --- >> hw/vfio/migration.c | 1 - >> hw/virtio/virtio-balloon.c | 1 - >> include/migration/misc.h| 1 + >> include/migration/options-pub.h | 24 >> migration/options.h | 6 +- > > Unusual naming. We have zero headers named -pub.h or -public.h, and > dozens named like -int.h or -internal.h. Please stick to the existing > convention. In the spirit of minimizing changes, I went that route to avoid renaming the existing migration/options.h and its references: 0 migration/block-dirty-bit 82 #include "options.h" 1 migration/block.c 32 #include "options.h" 2 migration/colo.c 37 #include "options.h" 3 migration/migration-hmp-c 35 #include "options.h" 4 migration/migration.c 68 #include "options.h" 5 migration/multifd-zlib.c 21 #include "options.h" 6 migration/multifd-zstd.c 21 #include "options.h" 7 migration/multifd.c 29 #include "options.h" 8 migration/options.c 30 #include "options.h" 9 migration/postcopy-ram.c 40 #include "options.h" a migration/qemu-file.c 33 #include "options.h" b migration/ram-compress.c 37 #include "options.h" c migration/ram.c 63 #include "options.h" d migration/rdma.c 40 #include "options.h" e migration/savevm.c71 #include "options.h" f migration/socket.c30 #include "options.h" g migration/tls.c 25 #include "options.h" But I take your point. Peter, which do you prefer? A. rename: migration/options.h -> migration/options-internal.h rename: include/migration/options-pub.h -> include/migration/options.h B. rename: include/migration/options.h -> include/migration/client-options.h I prefer B. If you prefer B, but want a different file name, please choose the final name. - Steve
Re: [PATCH V5 1/5] util: str_split
On 2/23/2024 12:41 PM, Philippe Mathieu-Daudé wrote: > On 23/2/24 15:01, Steven Sistare wrote: >> On 2/23/2024 1:01 AM, Philippe Mathieu-Daudé wrote: >>> On 22/2/24 22:47, Steve Sistare wrote: >>>> Generalize hmp_split_at_comma() to take any delimiter string, rename >>>> as str_split(), and move it to util/strList.c. >>>> >>>> No functional change. >>>> >>>> Signed-off-by: Steve Sistare >>>> --- >>>> include/monitor/hmp.h | 1 - >>>> include/qemu/strList.h | 24 >>>> monitor/hmp-cmds.c | 19 --- >>>> net/net-hmp-cmds.c | 3 ++- >>>> stats/stats-hmp-cmds.c | 3 ++- >>>> util/meson.build | 1 + >>>> util/strList.c | 24 >>>> 7 files changed, 53 insertions(+), 22 deletions(-) >>>> create mode 100644 include/qemu/strList.h >>>> create mode 100644 util/strList.c >>> >>> >>>> +#include "qapi/qapi-builtin-types.h" >>>> + >>>> +/* >>>> + * Split @str into a strList using the delimiter string @delim. >>>> + * The delimiter is not included in the result. >>>> + * Return NULL if @str is NULL or an empty string. >>>> + * A leading, trailing, or consecutive delimiter produces an >>>> + * empty string at that position in the output. >>>> + * All strings are g_strdup'd, and the result can be freed >>>> + * using qapi_free_strList. >>> >>> Note "qapi/qapi-builtin-types.h" defines: >>> >>> G_DEFINE_AUTOPTR_CLEANUP_FUNC(strList, qapi_free_strList) >>> >>> Maybe mention we can also use: >>> >>> g_autoptr(strList) >> >> Thanks Philippe. If we get to V6 for this series, I will mention this, >> and also mention g_autoptr(GStrv) in the header comment for >> strv_from_strlist. > > If there is no need for v6, do you mind sharing here what would be > the resulting comment? Maybe Markus can update it directly... > (assuming he takes your series). Sure - steve diff --git a/include/qemu/strList.h b/include/qemu/strList.h index c1eb1dd..b13bd53 100644 --- a/include/qemu/strList.h +++ b/include/qemu/strList.h @@ -17,13 +17,16 @@ * A leading, trailing, or consecutive delimiter produces an * empty string at that position in the output. * All strings are g_strdup'd, and the result can be freed - * using qapi_free_strList. + * using qapi_free_strList, or by declaring a local variable + * with g_autoptr(strList). */ strList *str_split(const char *str, const char *delim); /* * Produce and return a NULL-terminated array of strings from @list. - * The result is g_malloc'd and all strings are g_strdup'd. + * The result is g_malloc'd and all strings are g_strdup'd. The result + * can be freed using g_strfreev, or by declaring a local variable with + * g_auto(GStrv). */ char **strv_from_strList(const strList *list);
Re: [PATCH V5 1/5] util: str_split
On 2/23/2024 1:01 AM, Philippe Mathieu-Daudé wrote: > On 22/2/24 22:47, Steve Sistare wrote: >> Generalize hmp_split_at_comma() to take any delimiter string, rename >> as str_split(), and move it to util/strList.c. >> >> No functional change. >> >> Signed-off-by: Steve Sistare >> --- >> include/monitor/hmp.h | 1 - >> include/qemu/strList.h | 24 >> monitor/hmp-cmds.c | 19 --- >> net/net-hmp-cmds.c | 3 ++- >> stats/stats-hmp-cmds.c | 3 ++- >> util/meson.build | 1 + >> util/strList.c | 24 >> 7 files changed, 53 insertions(+), 22 deletions(-) >> create mode 100644 include/qemu/strList.h >> create mode 100644 util/strList.c > > >> +#include "qapi/qapi-builtin-types.h" >> + >> +/* >> + * Split @str into a strList using the delimiter string @delim. >> + * The delimiter is not included in the result. >> + * Return NULL if @str is NULL or an empty string. >> + * A leading, trailing, or consecutive delimiter produces an >> + * empty string at that position in the output. >> + * All strings are g_strdup'd, and the result can be freed >> + * using qapi_free_strList. > > Note "qapi/qapi-builtin-types.h" defines: > > G_DEFINE_AUTOPTR_CLEANUP_FUNC(strList, qapi_free_strList) > > Maybe mention we can also use: > > g_autoptr(strList) Thanks Philippe. If we get to V6 for this series, I will mention this, and also mention g_autoptr(GStrv) in the header comment for strv_from_strlist. - Steve >> + */ >> +strList *str_split(const char *str, const char *delim); >> + >> +#endif >
Re: [PATCH V4 1/5] util: strList_from_string
On 2/21/2024 12:01 PM, Steven Sistare wrote: > On 2/21/2024 8:29 AM, Markus Armbruster wrote: >> I apologize for the lateness of my review. > > No problem. Thanks for the review. > >> Steve Sistare writes: >> >>> Generalize hmp_split_at_comma() to take any delimiter string, rename >>> as strList_from_string(), and move it to util/strList.c. >>> >>> No functional change. >>> >>> Signed-off-by: Steve Sistare >> >> I can't see an actual use of generalized delimiters outside tests in >> this series. Do you have uses? > > In this series, it is called from hmp_announce_self and stats_filter; > those were formerly calls to hmp_split_at_comma. > > In my live update cpr-exec series, there is an additional call site, with a > space delimiter instead of comma. Live update V9 is posted but is old and > obsolete. > I will post V10 soon, but I am hoping you can pull this series first, so I > can > whittle down my pending patches and omit these from V10. > >>> --- >>> include/monitor/hmp.h | 1 - >>> include/qemu/strList.h | 24 >>> monitor/hmp-cmds.c | 19 --- >>> net/net-hmp-cmds.c | 3 ++- >>> stats/stats-hmp-cmds.c | 3 ++- >>> util/meson.build | 1 + >>> util/strList.c | 24 >>> 7 files changed, 53 insertions(+), 22 deletions(-) >>> create mode 100644 include/qemu/strList.h >>> create mode 100644 util/strList.c >>> >>> diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h >>> index 13f9a2d..2df661e 100644 >>> --- a/include/monitor/hmp.h >>> +++ b/include/monitor/hmp.h >>> @@ -19,7 +19,6 @@ >>> >>> bool hmp_handle_error(Monitor *mon, Error *err); >>> void hmp_help_cmd(Monitor *mon, const char *name); >>> -strList *hmp_split_at_comma(const char *str); >>> >>> void hmp_info_name(Monitor *mon, const QDict *qdict); >>> void hmp_info_version(Monitor *mon, const QDict *qdict); >>> diff --git a/include/qemu/strList.h b/include/qemu/strList.h >>> new file mode 100644 >>> index 000..010237f >>> --- /dev/null >>> +++ b/include/qemu/strList.h >>> @@ -0,0 +1,24 @@ >>> +/* >>> + * Copyright (c) 2022 - 2024 Oracle and/or its affiliates. >>> + * >>> + * This work is licensed under the terms of the GNU GPL, version 2 or >>> later. >>> + * See the COPYING file in the top-level directory. >>> + */ >>> + >>> +#ifndef QEMU_STR_LIST_H >>> +#define QEMU_STR_LIST_H >>> + >>> +#include "qapi/qapi-builtin-types.h" >>> + >>> +/* >>> + * Break @in into a strList using the delimiter string @delim. >>> + * The delimiter is not included in the result. >>> + * Return NULL if @in is NULL or an empty string. >>> + * A leading, trailing, or consecutive delimiter produces an >>> + * empty string at that position in the output. >>> + * All strings are g_strdup'd, and the result can be freed >>> + * using qapi_free_strList. >>> + */ >>> +strList *strList_from_string(const char *in, const char *delim); >> >> The function name no longer tells us explicitly what the function does: >> splitting the string. > > The first sentence does not say it? > "Break @in into a strList using the delimiter string @delim" > > Would you prefer this? > "Split string @in into a strList using the delimiter string @delim" Sorry, I read your comment too quickly. You want a different function name. I propose: strList *str_split(const char *str, const char *delim); - Steve
Re: [PATCH V4 00/14] allow cpr-reboot for vfio
Peter (and David if interested): these patches still need RB: migration: notifier error checking migration: stop vm for cpr migration: update cpr-reboot description migration: options incompatible with cpr Alex, these patches still need RB: vfio: register container for cpr vfio: allow cpr-reboot migration if suspended Thanks! - Steve On 2/22/2024 12:28 PM, Steve Sistare wrote: > Allow cpr-reboot for vfio if the guest is in the suspended runstate. The > guest drivers' suspend methods flush outstanding requests and re-initialize > the devices, and thus there is no device state to save and restore. The > user is responsible for suspending the guest before initiating cpr, such as > by issuing guest-suspend-ram to the qemu guest agent. > > Most of the patches in this series enhance migration notifiers so they can > return an error status and message. The last few patches register a notifier > for vfio that returns an error if the guest is not suspended. > > Changes in V3: > * update to tip, add RB's > * replace MigrationStatus with new enum MigrationEventType > * simplify migrate_fd_connect error recovery > * support vfio iommufd containers > * add patches: > migration: stop vm for cpr > migration: update cpr-reboot description > > Changes in V4: > * rebase to tip, add RB's > * add patch to prevent options such as precopy from being used with cpr. > migration: options incompatible with cpr > * refactor subroutines in "stop vm for cpr" > * simplify "notifier error checking" patch by restricting that only > notifiers for MIG_EVENT_PRECOPY_SETUP may return an error. > * doc that a fail event may be sent without a prior setup event > > Steve Sistare (14): > notify: pass error to notifier with return > migration: remove error from notifier data > migration: convert to NotifierWithReturn > migration: MigrationEvent for notifiers > migration: remove postcopy_after_devices > migration: MigrationNotifyFunc > migration: per-mode notifiers > migration: refactor migrate_fd_connect failures > migration: notifier error checking > migration: stop vm for cpr > vfio: register container for cpr > vfio: allow cpr-reboot migration if suspended > migration: update cpr-reboot description > migration: options incompatible with cpr > > hw/net/virtio-net.c | 13 +-- > hw/vfio/common.c | 2 +- > hw/vfio/container.c | 11 ++- > hw/vfio/cpr.c | 39 + > hw/vfio/iommufd.c | 6 ++ > hw/vfio/meson.build | 1 + > hw/vfio/migration.c | 15 ++-- > hw/vfio/trace-events | 2 +- > hw/virtio/vhost-user.c| 10 +-- > hw/virtio/virtio-balloon.c| 3 +- > include/hw/vfio/vfio-common.h | 5 +- > include/hw/vfio/vfio-container-base.h | 1 + > include/hw/virtio/virtio-net.h| 2 +- > include/migration/misc.h | 47 +-- > include/qemu/notify.h | 8 +- > migration/migration.c | 148 > +++--- > migration/migration.h | 4 - > migration/postcopy-ram.c | 3 +- > migration/postcopy-ram.h | 1 - > migration/ram.c | 3 +- > net/vhost-vdpa.c | 14 ++-- > qapi/migration.json | 37 ++--- > roms/seabios-hppa | 2 +- > ui/spice-core.c | 17 ++-- > util/notify.c | 5 +- > 25 files changed, 275 insertions(+), 124 deletions(-) > create mode 100644 hw/vfio/cpr.c >
Re: [PATCH V3 10/13] migration: stop vm for cpr
On 2/22/2024 4:30 AM, Peter Xu wrote: > On Thu, Feb 22, 2024 at 05:12:53PM +0800, Peter Xu wrote: >> On Wed, Feb 21, 2024 at 04:20:07PM -0500, Steven Sistare wrote: >>> On 2/20/2024 2:33 AM, Peter Xu wrote: >>>> On Thu, Feb 08, 2024 at 10:54:03AM -0800, Steve Sistare wrote: >>>>> When migration for cpr is initiated, stop the vm and set state >>>>> RUN_STATE_FINISH_MIGRATE before ram is saved. This eliminates the >>>>> possibility of ram and device state being out of sync, and guarantees >>>>> that a guest in the suspended state remains suspended, because qmp_cont >>>>> rejects a cont command in the RUN_STATE_FINISH_MIGRATE state. >>>>> >>>>> Signed-off-by: Steve Sistare >>>>> --- >>>>> include/migration/misc.h | 1 + >>>>> migration/migration.c| 32 +--- >>>>> 2 files changed, 26 insertions(+), 7 deletions(-) >>>>> >>>>> diff --git a/include/migration/misc.h b/include/migration/misc.h >>>>> index 6dc234b..54c99a3 100644 >>>>> --- a/include/migration/misc.h >>>>> +++ b/include/migration/misc.h >>>>> @@ -60,6 +60,7 @@ void migration_object_init(void); >>>>> void migration_shutdown(void); >>>>> bool migration_is_idle(void); >>>>> bool migration_is_active(MigrationState *); >>>>> +bool migrate_mode_is_cpr(MigrationState *); >>>>> >>>>> typedef enum MigrationEventType { >>>>> MIG_EVENT_PRECOPY_SETUP, >>>>> diff --git a/migration/migration.c b/migration/migration.c >>>>> index d1fce9e..fc5c587 100644 >>>>> --- a/migration/migration.c >>>>> +++ b/migration/migration.c >>>>> @@ -1603,6 +1603,11 @@ bool migration_is_active(MigrationState *s) >>>>> s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE); >>>>> } >>>>> >>>>> +bool migrate_mode_is_cpr(MigrationState *s) >>>>> +{ >>>>> +return s->parameters.mode == MIG_MODE_CPR_REBOOT; >>>>> +} >>>>> + >>>>> int migrate_init(MigrationState *s, Error **errp) >>>>> { >>>>> int ret; >>>>> @@ -2651,13 +2656,14 @@ static int >>>>> migration_completion_precopy(MigrationState *s, >>>>> bql_lock(); >>>>> migration_downtime_start(s); >>>>> >>>>> -s->vm_old_state = runstate_get(); >>>>> -global_state_store(); >>>>> - >>>>> -ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE); >>>>> -trace_migration_completion_vm_stop(ret); >>>>> -if (ret < 0) { >>>>> -goto out_unlock; >>>>> +if (!migrate_mode_is_cpr(s)) { >>>>> +s->vm_old_state = runstate_get(); >>>>> +global_state_store(); >>>>> +ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE); >>>>> +trace_migration_completion_vm_stop(ret); >>>>> +if (ret < 0) { >>>>> +goto out_unlock; >>>>> +} >>>>> } >>>>> >>>>> ret = migration_maybe_pause(s, current_active_state, >>>>> @@ -3576,6 +3582,7 @@ void migrate_fd_connect(MigrationState *s, Error >>>>> *error_in) >>>>> Error *local_err = NULL; >>>>> uint64_t rate_limit; >>>>> bool resume = s->state == MIGRATION_STATUS_POSTCOPY_PAUSED; >>>>> +int ret; >>>>> >>>>> /* >>>>> * If there's a previous error, free it and prepare for another one. >>>>> @@ -3651,6 +3658,17 @@ void migrate_fd_connect(MigrationState *s, Error >>>>> *error_in) >>>>> goto fail; >>>>> } >>>>> >>>>> +if (migrate_mode_is_cpr(s)) { >>>>> +s->vm_old_state = runstate_get(); >>>>> +global_state_store(); >>>>> +ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE); >>>>> +trace_migration_completion_vm_stop(ret); >>>>> +if (ret < 0) { >>>>> +error_setg(_err, "migration_stop_vm failed, error %d", >>>>> -ret); >>>>> +
Re: [PATCH V3 10/13] migration: stop vm for cpr
On 2/22/2024 4:03 AM, Peter Xu wrote: > On Wed, Feb 21, 2024 at 04:23:07PM -0500, Steven Sistare wrote: >>> How about postcopy? I know it's nonsense to enable postcopy for cpr.. but >>> iiuc we don't yet forbid an user doing so. Maybe we should? >> >> How about this? >> >> --- >> @@ -3600,6 +3600,11 @@ void migrate_fd_connect(MigrationState *s, Error >> *error_in) >> return; >> } >> >> +if (migrate_mode_is_cpr(s) && migrate_postcopy()) { >> +error_setg(_err, "cannot mix postcopy and cpr"); >> +goto fail; >> +} >> + >> if (resume) { >> /* This is a resumed migration */ >> rate_limit = migrate_max_postcopy_bandwidth(); >> > > migrate_fd_connect() will be a bit late, the error won't be able to be > attached in the "migrate" request. Perhaps, migrate_prepare()? Thank you, that is better - steve
Re: [PATCH V3 10/13] migration: stop vm for cpr
On 2/20/2024 2:33 AM, Peter Xu wrote: > On Thu, Feb 08, 2024 at 10:54:03AM -0800, Steve Sistare wrote: >> When migration for cpr is initiated, stop the vm and set state >> RUN_STATE_FINISH_MIGRATE before ram is saved. This eliminates the >> possibility of ram and device state being out of sync, and guarantees >> that a guest in the suspended state remains suspended, because qmp_cont >> rejects a cont command in the RUN_STATE_FINISH_MIGRATE state. >> >> Signed-off-by: Steve Sistare >> --- >> include/migration/misc.h | 1 + >> migration/migration.c| 32 +--- >> 2 files changed, 26 insertions(+), 7 deletions(-) >> >> diff --git a/include/migration/misc.h b/include/migration/misc.h >> index 6dc234b..54c99a3 100644 >> --- a/include/migration/misc.h >> +++ b/include/migration/misc.h >> @@ -60,6 +60,7 @@ void migration_object_init(void); >> void migration_shutdown(void); >> bool migration_is_idle(void); >> bool migration_is_active(MigrationState *); >> +bool migrate_mode_is_cpr(MigrationState *); >> >> typedef enum MigrationEventType { >> MIG_EVENT_PRECOPY_SETUP, >> diff --git a/migration/migration.c b/migration/migration.c >> index d1fce9e..fc5c587 100644 >> --- a/migration/migration.c >> +++ b/migration/migration.c >> @@ -1603,6 +1603,11 @@ bool migration_is_active(MigrationState *s) >> s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE); >> } >> >> +bool migrate_mode_is_cpr(MigrationState *s) >> +{ >> +return s->parameters.mode == MIG_MODE_CPR_REBOOT; >> +} >> + >> int migrate_init(MigrationState *s, Error **errp) >> { >> int ret; >> @@ -2651,13 +2656,14 @@ static int >> migration_completion_precopy(MigrationState *s, >> bql_lock(); >> migration_downtime_start(s); >> >> -s->vm_old_state = runstate_get(); >> -global_state_store(); >> - >> -ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE); >> -trace_migration_completion_vm_stop(ret); >> -if (ret < 0) { >> -goto out_unlock; >> +if (!migrate_mode_is_cpr(s)) { >> +s->vm_old_state = runstate_get(); >> +global_state_store(); >> +ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE); >> +trace_migration_completion_vm_stop(ret); >> +if (ret < 0) { >> +goto out_unlock; >> +} >> } >> >> ret = migration_maybe_pause(s, current_active_state, >> @@ -3576,6 +3582,7 @@ void migrate_fd_connect(MigrationState *s, Error >> *error_in) >> Error *local_err = NULL; >> uint64_t rate_limit; >> bool resume = s->state == MIGRATION_STATUS_POSTCOPY_PAUSED; >> +int ret; >> >> /* >> * If there's a previous error, free it and prepare for another one. >> @@ -3651,6 +3658,17 @@ void migrate_fd_connect(MigrationState *s, Error >> *error_in) >> goto fail; >> } >> >> +if (migrate_mode_is_cpr(s)) { >> +s->vm_old_state = runstate_get(); >> +global_state_store(); >> +ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE); >> +trace_migration_completion_vm_stop(ret); >> +if (ret < 0) { >> +error_setg(_err, "migration_stop_vm failed, error %d", >> -ret); >> +goto fail; >> +} >> +} > > Could we have a helper function for the shared codes? > > How about postcopy? I know it's nonsense to enable postcopy for cpr.. but > iiuc we don't yet forbid an user doing so. Maybe we should? How about this? --- @@ -3600,6 +3600,11 @@ void migrate_fd_connect(MigrationState *s, Error *error_in) return; } +if (migrate_mode_is_cpr(s) && migrate_postcopy()) { +error_setg(_err, "cannot mix postcopy and cpr"); +goto fail; +} + if (resume) { /* This is a resumed migration */ rate_limit = migrate_max_postcopy_bandwidth(); - Steve
Re: [PATCH V3 10/13] migration: stop vm for cpr
On 2/20/2024 2:33 AM, Peter Xu wrote: > On Thu, Feb 08, 2024 at 10:54:03AM -0800, Steve Sistare wrote: >> When migration for cpr is initiated, stop the vm and set state >> RUN_STATE_FINISH_MIGRATE before ram is saved. This eliminates the >> possibility of ram and device state being out of sync, and guarantees >> that a guest in the suspended state remains suspended, because qmp_cont >> rejects a cont command in the RUN_STATE_FINISH_MIGRATE state. >> >> Signed-off-by: Steve Sistare >> --- >> include/migration/misc.h | 1 + >> migration/migration.c| 32 +--- >> 2 files changed, 26 insertions(+), 7 deletions(-) >> >> diff --git a/include/migration/misc.h b/include/migration/misc.h >> index 6dc234b..54c99a3 100644 >> --- a/include/migration/misc.h >> +++ b/include/migration/misc.h >> @@ -60,6 +60,7 @@ void migration_object_init(void); >> void migration_shutdown(void); >> bool migration_is_idle(void); >> bool migration_is_active(MigrationState *); >> +bool migrate_mode_is_cpr(MigrationState *); >> >> typedef enum MigrationEventType { >> MIG_EVENT_PRECOPY_SETUP, >> diff --git a/migration/migration.c b/migration/migration.c >> index d1fce9e..fc5c587 100644 >> --- a/migration/migration.c >> +++ b/migration/migration.c >> @@ -1603,6 +1603,11 @@ bool migration_is_active(MigrationState *s) >> s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE); >> } >> >> +bool migrate_mode_is_cpr(MigrationState *s) >> +{ >> +return s->parameters.mode == MIG_MODE_CPR_REBOOT; >> +} >> + >> int migrate_init(MigrationState *s, Error **errp) >> { >> int ret; >> @@ -2651,13 +2656,14 @@ static int >> migration_completion_precopy(MigrationState *s, >> bql_lock(); >> migration_downtime_start(s); >> >> -s->vm_old_state = runstate_get(); >> -global_state_store(); >> - >> -ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE); >> -trace_migration_completion_vm_stop(ret); >> -if (ret < 0) { >> -goto out_unlock; >> +if (!migrate_mode_is_cpr(s)) { >> +s->vm_old_state = runstate_get(); >> +global_state_store(); >> +ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE); >> +trace_migration_completion_vm_stop(ret); >> +if (ret < 0) { >> +goto out_unlock; >> +} >> } >> >> ret = migration_maybe_pause(s, current_active_state, >> @@ -3576,6 +3582,7 @@ void migrate_fd_connect(MigrationState *s, Error >> *error_in) >> Error *local_err = NULL; >> uint64_t rate_limit; >> bool resume = s->state == MIGRATION_STATUS_POSTCOPY_PAUSED; >> +int ret; >> >> /* >> * If there's a previous error, free it and prepare for another one. >> @@ -3651,6 +3658,17 @@ void migrate_fd_connect(MigrationState *s, Error >> *error_in) >> goto fail; >> } >> >> +if (migrate_mode_is_cpr(s)) { >> +s->vm_old_state = runstate_get(); >> +global_state_store(); >> +ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE); >> +trace_migration_completion_vm_stop(ret); >> +if (ret < 0) { >> +error_setg(_err, "migration_stop_vm failed, error %d", >> -ret); >> +goto fail; >> +} >> +} > > Could we have a helper function for the shared codes? I propose to add code to migration_stop_vm to make it the helper. Some call sites emit more traces (via migration_stop_vm) as a result of my refactoring, and postcopy start sets vm_old_state, which is not used thereafter in that path. Those changes seem harmless to me. Tell me what you think: --- diff --git a/migration/migration.c b/migration/migration.c index fc5c587..30d2b08 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -107,12 +107,6 @@ static int migration_maybe_pause(MigrationState *s, static void migrate_fd_cancel(MigrationState *s); static bool close_return_path_on_source(MigrationState *s); -static void migration_downtime_start(MigrationState *s) -{ -trace_vmstate_downtime_checkpoint("src-downtime-start"); -s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); -} - static void migration_downtime_end(MigrationState *s) { int64_t now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); @@ -161,11 +155,20 @@ static gint page_request_addr_cmp(gconstpointer ap, gconstpo return (a > b) - (a < b); } -int migration_stop_vm(RunState state) +static int migration_stop_vm(MigrationState *s, RunState state) { -int ret = vm_stop_force_state(state); +int ret; + +trace_vmstate_downtime_checkpoint("src-downtime-start"); +s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); + +s->vm_old_state = runstate_get(); +global_state_store(); + +ret = vm_stop_force_state(state); trace_vmstate_downtime_checkpoint("src-vm-stopped"); +trace_migration_completion_vm_stop(ret); return ret; } @@ -2454,10 +2457,7 @@ static int
Re: [PATCH V3 12/13] vfio: allow cpr-reboot migration if suspended
Hi Alex, any comments or RB on this or patch 11? The last few changes I am making for Peter will not change this patch. - Steve On 2/8/2024 1:54 PM, Steve Sistare wrote: > Allow cpr-reboot for vfio if the guest is in the suspended runstate. The > guest drivers' suspend methods flush outstanding requests and re-initialize > the devices, and thus there is no device state to save and restore. The > user is responsible for suspending the guest before initiating cpr, such as > by issuing guest-suspend-ram to the qemu guest agent. > > Relax the vfio blocker so it does not apply to cpr, and add a notifier that > verifies the guest is suspended. > > Signed-off-by: Steve Sistare > --- > hw/vfio/common.c | 2 +- > hw/vfio/cpr.c | 20 > hw/vfio/migration.c | 2 +- > include/hw/vfio/vfio-container-base.h | 1 + > 4 files changed, 23 insertions(+), 2 deletions(-) > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c > index 059bfdc..ff88c3f 100644 > --- a/hw/vfio/common.c > +++ b/hw/vfio/common.c > @@ -128,7 +128,7 @@ int vfio_block_multiple_devices_migration(VFIODevice > *vbasedev, Error **errp) > error_setg(_devices_migration_blocker, > "Multiple VFIO devices migration is supported only if all of " > "them support P2P migration"); > -ret = migrate_add_blocker(_devices_migration_blocker, errp); > +ret = migrate_add_blocker_normal(_devices_migration_blocker, > errp); > > return ret; > } > diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c > index 3bede54..392c2dd 100644 > --- a/hw/vfio/cpr.c > +++ b/hw/vfio/cpr.c > @@ -7,13 +7,33 @@ > > #include "qemu/osdep.h" > #include "hw/vfio/vfio-common.h" > +#include "migration/misc.h" > #include "qapi/error.h" > +#include "sysemu/runstate.h" > + > +static int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier, > +MigrationEvent *e, Error **errp) > +{ > +if (e->type == MIG_EVENT_PRECOPY_SETUP && > +!runstate_check(RUN_STATE_SUSPENDED) && !vm_get_suspended()) { > + > +error_setg(errp, > +"VFIO device only supports cpr-reboot for runstate suspended"); > + > +return -1; > +} > +return 0; > +} > > int vfio_cpr_register_container(VFIOContainerBase *bcontainer, Error **errp) > { > +migration_add_notifier_mode(>cpr_reboot_notifier, > +vfio_cpr_reboot_notifier, > +MIG_MODE_CPR_REBOOT); > return 0; > } > > void vfio_cpr_unregister_container(VFIOContainerBase *bcontainer) > { > +migration_remove_notifier(>cpr_reboot_notifier); > } > diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c > index 50140ed..2050ac8 100644 > --- a/hw/vfio/migration.c > +++ b/hw/vfio/migration.c > @@ -889,7 +889,7 @@ static int vfio_block_migration(VFIODevice *vbasedev, > Error *err, Error **errp) > vbasedev->migration_blocker = error_copy(err); > error_free(err); > > -return migrate_add_blocker(>migration_blocker, errp); > +return migrate_add_blocker_normal(>migration_blocker, errp); > } > > /* -- */ > diff --git a/include/hw/vfio/vfio-container-base.h > b/include/hw/vfio/vfio-container-base.h > index b2813b0..3582d5f 100644 > --- a/include/hw/vfio/vfio-container-base.h > +++ b/include/hw/vfio/vfio-container-base.h > @@ -49,6 +49,7 @@ typedef struct VFIOContainerBase { > QLIST_ENTRY(VFIOContainerBase) next; > QLIST_HEAD(, VFIODevice) device_list; > GList *iova_ranges; > +NotifierWithReturn cpr_reboot_notifier; > } VFIOContainerBase; > > typedef struct VFIOGuestIOMMU {
Re: [PATCH V4 2/5] qapi: QAPI_LIST_LENGTH
On 2/21/2024 8:29 AM, Markus Armbruster wrote: > Steve Sistare writes: > >> Signed-off-by: Steve Sistare >> Reviewed-by: Marc-André Lureau >> --- >> include/qapi/util.h | 13 + >> 1 file changed, 13 insertions(+) >> >> diff --git a/include/qapi/util.h b/include/qapi/util.h >> index 81a2b13..e1b8b1d 100644 >> --- a/include/qapi/util.h >> +++ b/include/qapi/util.h >> @@ -56,4 +56,17 @@ int parse_qapi_name(const char *name, bool complete); >> (tail) = &(*(tail))->next; \ >> } while (0) >> >> +/* >> + * For any GenericList @list, return its length. >> + */ >> +#define QAPI_LIST_LENGTH(list) \ >> +({ \ >> +int len = 0; \ > > size_t ok. >> +typeof(list) elem; \ > > Name this @tail, please. ok. >> +for (elem = list; elem != NULL; elem = elem->next) { \ >> +len++; \ >> +} \ >> +len; \ >> +}) >> + >> #endif > > This is a macro instead of a function so users don't have to cast their > FooList * to GenericList *. > > The only user outside tests is strv_from_strList(). I'd be tempted to > open-code it there and call it a day. Or do you have more users in > mind? That's the only use. If I make it private, I would still define it as a static subroutine in util/strList.c, because it simplifies strv_from_strList. IMO providing a public list length function or macro is pretty basic, but I am not wedded to it. Your call. - Steve > If we keep the macro, please align the backslashes like this: > >#define QAPI_LIST_LENGTH(list) \ >({ \ >int len = 0;\ >typeof(list) elem; \ >for (elem = list; elem != NULL; elem = elem->next) {\ >len++; \ >} \ >len;\ >}) >
Re: [PATCH V4 3/5] util: strv_from_strList
On 2/21/2024 8:14 AM, Markus Armbruster wrote: > Steve Sistare writes: > >> Signed-off-by: Steve Sistare >> Reviewed-by: Marc-André Lureau >> --- >> include/qemu/strList.h | 6 ++ >> util/strList.c | 14 ++ >> 2 files changed, 20 insertions(+) >> >> diff --git a/include/qemu/strList.h b/include/qemu/strList.h >> index 010237f..4b86aa6 100644 >> --- a/include/qemu/strList.h >> +++ b/include/qemu/strList.h >> @@ -21,4 +21,10 @@ >> */ >> strList *strList_from_string(const char *in, const char *delim); >> >> +/* >> + * Produce and return a NULL-terminated array of strings from @args. >> + * The result is g_malloc'd and all strings are g_strdup'd. >> + */ >> +GStrv strv_from_strList(const strList *args); >> + >> #endif >> diff --git a/util/strList.c b/util/strList.c >> index 7991de3..bad4187 100644 >> --- a/util/strList.c >> +++ b/util/strList.c >> @@ -22,3 +22,17 @@ strList *strList_from_string(const char *str, const char >> *delim) >> >> return res; >> } >> + >> +GStrv strv_from_strList(const strList *args) > > Suggest to name the argument @list. > >> +{ >> +const strList *arg; > > Suggest to name this @tail. ok. >> +int i = 0; >> +GStrv argv = g_new(char *, QAPI_LIST_LENGTH(args) + 1); >> + >> +for (arg = args; arg != NULL; arg = arg->next) { >> +argv[i++] = g_strdup(arg->value); >> +} >> +argv[i] = NULL; >> + >> +return argv; >> +} > > Can we use char ** instread of GStrv? I'd find that clearer. For what > it's worth, GLib documentation of functions like g_strsplit() doesn't > use the GStrv typedef, either. ok. - Steve
Re: [PATCH V4 4/5] util: strList unit tests
On 2/21/2024 8:19 AM, Markus Armbruster wrote: > Steve Sistare writes: > >> Signed-off-by: Steve Sistare >> Reviewed-by: Marc-André Lureau >> --- >> tests/unit/meson.build| 1 + >> tests/unit/test-strList.c | 80 >> +++ >> 2 files changed, 81 insertions(+) >> create mode 100644 tests/unit/test-strList.c >> >> diff --git a/tests/unit/meson.build b/tests/unit/meson.build >> index 69f9c05..113d12e 100644 >> --- a/tests/unit/meson.build >> +++ b/tests/unit/meson.build >> @@ -35,6 +35,7 @@ tests = { >>'test-rcu-simpleq': [], >>'test-rcu-tailq': [], >>'test-rcu-slist': [], >> + 'test-strList': [], >>'test-qdist': [], >>'test-qht': [], >>'test-qtree': [], >> diff --git a/tests/unit/test-strList.c b/tests/unit/test-strList.c >> new file mode 100644 >> index 000..49a1cfd >> --- /dev/null >> +++ b/tests/unit/test-strList.c >> @@ -0,0 +1,80 @@ >> +/* >> + * Copyright (c) 2022 - 2024 Oracle and/or its affiliates. >> + * >> + * This work is licensed under the terms of the GNU GPL, version 2 or later. >> + * See the COPYING file in the top-level directory. >> + */ >> + >> +#include "qemu/osdep.h" >> +#include "qemu/strList.h" >> + >> +static strList *make_list(int length) >> +{ >> +strList *head = 0, *list, **prev = >> + >> +while (length--) { >> +list = *prev = g_new0(strList, 1); >> +list->value = g_strdup("aaa"); >> +prev = >next; >> +} >> +return head; >> +} >> + >> +static void test_length(void) >> +{ >> +strList *list; >> +int i; >> + >> +for (i = 0; i < 5; i++) { >> +list = make_list(i); >> +g_assert_cmpint(i, ==, QAPI_LIST_LENGTH(list)); >> +qapi_free_strList(list); >> +} >> +} >> + >> +struct { >> +const char *string; >> +const char *delim; >> +const char *args[5]; >> +} list_data[] = { >> +{ 0, ",", { 0 } }, >> +{ "", ",", { 0 } }, >> +{ "a", ",", { "a", 0 } }, >> +{ "a,b", ",", { "a", "b", 0 } }, >> +{ "a,b,c", ",", { "a", "b", "c", 0 } }, >> +{ "first last", " ", { "first", "last", 0 } }, >> +{ "a:", ":", { "a", "", 0 } }, >> +{ "a::b", ":", { "a", "", "b", 0 } }, >> +{ ":", ":", { "", "", 0 } }, >> +{ ":a", ":", { "", "a", 0 } }, >> +{ "::a", ":", { "", "", "a", 0 } }, >> +}; > > NULL instead of 0, please. ok. >> + >> +static void test_strv(void) >> +{ >> +int i, j; >> +const char **expect; >> +strList *list; >> +GStrv args; > > I'd prefer char **argv; ok. >> + >> +for (i = 0; i < ARRAY_SIZE(list_data); i++) { >> +expect = list_data[i].args; >> +list = strList_from_string(list_data[i].string, list_data[i].delim); >> +args = strv_from_strList(list); >> +qapi_free_strList(list); >> +for (j = 0; expect[j] && args[j]; j++) { > > Loop stops when either array element is null. That's wrong, we need to > exhaust both arrays: || instead of &&. || is not safe. After one array is exhausted, [j] will be out of bounds for the other. The g_assert_null calls guarantee the arrays are the same length and all elements have been compared. - Steve >> +g_assert_cmpstr(expect[j], ==, args[j]); >> +} >> +g_assert_null(expect[j]); >> +g_assert_null(args[j]); >> +g_strfreev(args); >> +} >> +} >> + >> +int main(int argc, char **argv) >> +{ >> +g_test_init(, , NULL); >> +g_test_add_func("/test-string/length", test_length); >> +g_test_add_func("/test-string/strv", test_strv); >> +return g_test_run(); >> +} >
Re: [PATCH V4 1/5] util: strList_from_string
On 2/21/2024 8:29 AM, Markus Armbruster wrote: > I apologize for the lateness of my review. No problem. Thanks for the review. > Steve Sistare writes: > >> Generalize hmp_split_at_comma() to take any delimiter string, rename >> as strList_from_string(), and move it to util/strList.c. >> >> No functional change. >> >> Signed-off-by: Steve Sistare > > I can't see an actual use of generalized delimiters outside tests in > this series. Do you have uses? In this series, it is called from hmp_announce_self and stats_filter; those were formerly calls to hmp_split_at_comma. In my live update cpr-exec series, there is an additional call site, with a space delimiter instead of comma. Live update V9 is posted but is old and obsolete. I will post V10 soon, but I am hoping you can pull this series first, so I can whittle down my pending patches and omit these from V10. >> --- >> include/monitor/hmp.h | 1 - >> include/qemu/strList.h | 24 >> monitor/hmp-cmds.c | 19 --- >> net/net-hmp-cmds.c | 3 ++- >> stats/stats-hmp-cmds.c | 3 ++- >> util/meson.build | 1 + >> util/strList.c | 24 >> 7 files changed, 53 insertions(+), 22 deletions(-) >> create mode 100644 include/qemu/strList.h >> create mode 100644 util/strList.c >> >> diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h >> index 13f9a2d..2df661e 100644 >> --- a/include/monitor/hmp.h >> +++ b/include/monitor/hmp.h >> @@ -19,7 +19,6 @@ >> >> bool hmp_handle_error(Monitor *mon, Error *err); >> void hmp_help_cmd(Monitor *mon, const char *name); >> -strList *hmp_split_at_comma(const char *str); >> >> void hmp_info_name(Monitor *mon, const QDict *qdict); >> void hmp_info_version(Monitor *mon, const QDict *qdict); >> diff --git a/include/qemu/strList.h b/include/qemu/strList.h >> new file mode 100644 >> index 000..010237f >> --- /dev/null >> +++ b/include/qemu/strList.h >> @@ -0,0 +1,24 @@ >> +/* >> + * Copyright (c) 2022 - 2024 Oracle and/or its affiliates. >> + * >> + * This work is licensed under the terms of the GNU GPL, version 2 or later. >> + * See the COPYING file in the top-level directory. >> + */ >> + >> +#ifndef QEMU_STR_LIST_H >> +#define QEMU_STR_LIST_H >> + >> +#include "qapi/qapi-builtin-types.h" >> + >> +/* >> + * Break @in into a strList using the delimiter string @delim. >> + * The delimiter is not included in the result. >> + * Return NULL if @in is NULL or an empty string. >> + * A leading, trailing, or consecutive delimiter produces an >> + * empty string at that position in the output. >> + * All strings are g_strdup'd, and the result can be freed >> + * using qapi_free_strList. >> + */ >> +strList *strList_from_string(const char *in, const char *delim); > > The function name no longer tells us explicitly what the function does: > splitting the string. The first sentence does not say it? "Break @in into a strList using the delimiter string @delim" Would you prefer this? "Split string @in into a strList using the delimiter string @delim" - Steve >> +#endif >> diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c >> index 871898a..66b68a0 100644 >> --- a/monitor/hmp-cmds.c >> +++ b/monitor/hmp-cmds.c >> @@ -38,25 +38,6 @@ bool hmp_handle_error(Monitor *mon, Error *err) >> return false; >> } >> >> -/* >> - * Split @str at comma. >> - * A null @str defaults to "". >> - */ >> -strList *hmp_split_at_comma(const char *str) >> -{ >> -char **split = g_strsplit(str ?: "", ",", -1); >> -strList *res = NULL; >> -strList **tail = >> -int i; >> - >> -for (i = 0; split[i]; i++) { >> -QAPI_LIST_APPEND(tail, split[i]); >> -} >> - >> -g_free(split); >> -return res; >> -} >> - >> void hmp_info_name(Monitor *mon, const QDict *qdict) >> { >> NameInfo *info; >> diff --git a/net/net-hmp-cmds.c b/net/net-hmp-cmds.c >> index 41d326b..e893801 100644 >> --- a/net/net-hmp-cmds.c >> +++ b/net/net-hmp-cmds.c >> @@ -26,6 +26,7 @@ >> #include "qemu/config-file.h" >> #include "qemu/help_option.h" >> #include "qemu/option.h" >> +#include "qemu/strList.h" >> >> void hmp_info_network(Monitor *mon, const QDict *qdict) >> { >> @@ -72,7 +73,7 @@ void hmp_announce_self(Monitor *mon, const QDict *qdict) >> migrate_announce_params()); >> >> qapi_free_strList(params->interfaces); >> -params->interfaces = hmp_split_at_comma(interfaces_str); >> +params->interfaces = strList_from_string(interfaces_str, ","); >> params->has_interfaces = params->interfaces != NULL; >> params->id = g_strdup(id); >> qmp_announce_self(params, NULL); >> diff --git a/stats/stats-hmp-cmds.c b/stats/stats-hmp-cmds.c >> index 1f91bf8..428c0e6 100644 >> --- a/stats/stats-hmp-cmds.c >> +++ b/stats/stats-hmp-cmds.c >> @@ -10,6 +10,7 @@ >> #include "monitor/hmp.h" >> #include "monitor/monitor.h" >> #include "qemu/cutils.h" >> +#include
Re: [PATCH V4 5/5] migration: simplify exec migration functions
On 2/21/2024 10:54 AM, Fabiano Rosas wrote: > Fabiano Rosas writes: > >> Steve Sistare writes: >> >>> Simplify the exec migration code by using list utility functions. >>> >>> As a side effect, this also fixes a minor memory leak. On function return, >>> "g_auto(GStrv) argv" frees argv and each element, which is wrong, because >>> the function does not own the individual elements. To compensate, the code >>> uses g_steal_pointer which NULLs argv and prevents the destructor from >>> running, but argv is leaked. >>> >>> Fixes: cbab4face57b ("migration: convert exec backend ...") >>> Signed-off-by: Steve Sistare >> >> Reviewed-by: Fabiano Rosas > > You'll have to reintroduce the qemu/cutils.h include: > > ../migration/exec.c: In function 'exec_get_cmd_path': > ../migration/exec.c:37:5: error: implicit declaration of function 'pstrcat'; > did you mean 'strcat'? [-Werror=implicit-function-declaration] >37 | pstrcat(detected_path, MAX_PATH, "\\cmd.exe"); > | ^~~ > | strcat > ../migration/exec.c:37:5: error: nested extern declaration of 'pstrcat' > [-Werror=nested-externs] Thanks, I will rebase to the tip and verify all is well before I post V5. - Steve
Re: [PATCH V3 00/13] allow cpr-reboot for vfio
On 2/20/2024 2:49 AM, Peter Xu wrote: > On Thu, Feb 08, 2024 at 10:53:53AM -0800, Steve Sistare wrote: >> Allow cpr-reboot for vfio if the guest is in the suspended runstate. The >> guest drivers' suspend methods flush outstanding requests and re-initialize >> the devices, and thus there is no device state to save and restore. The >> user is responsible for suspending the guest before initiating cpr, such as >> by issuing guest-suspend-ram to the qemu guest agent. >> >> Most of the patches in this series enhance migration notifiers so they can >> return an error status and message. The last few patches register a notifier >> for vfio that returns an error if the guest is not suspended. >> >> Changes in V3: >> * update to tip, add RB's >> * replace MigrationStatus with new enum MigrationEventType >> * simplify migrate_fd_connect error recovery >> * support vfio iommufd containers >> * add patches: >> migration: stop vm for cpr >> migration: update cpr-reboot description > > This doesn't apply to master anymore, please rebase when repost, thanks. Will do. Before I do, any comments on "migration: update cpr-reboot description"? After we converge on that short description, I will submit a longer treatment in docs/devel/migration, which I see you have recently populated. - Steve
Re: [PATCH V3 10/13] migration: stop vm for cpr
On 2/20/2024 2:33 AM, Peter Xu wrote: > On Thu, Feb 08, 2024 at 10:54:03AM -0800, Steve Sistare wrote: >> When migration for cpr is initiated, stop the vm and set state >> RUN_STATE_FINISH_MIGRATE before ram is saved. This eliminates the >> possibility of ram and device state being out of sync, and guarantees >> that a guest in the suspended state remains suspended, because qmp_cont >> rejects a cont command in the RUN_STATE_FINISH_MIGRATE state. >> >> Signed-off-by: Steve Sistare >> --- >> include/migration/misc.h | 1 + >> migration/migration.c| 32 +--- >> 2 files changed, 26 insertions(+), 7 deletions(-) >> >> diff --git a/include/migration/misc.h b/include/migration/misc.h >> index 6dc234b..54c99a3 100644 >> --- a/include/migration/misc.h >> +++ b/include/migration/misc.h >> @@ -60,6 +60,7 @@ void migration_object_init(void); >> void migration_shutdown(void); >> bool migration_is_idle(void); >> bool migration_is_active(MigrationState *); >> +bool migrate_mode_is_cpr(MigrationState *); >> >> typedef enum MigrationEventType { >> MIG_EVENT_PRECOPY_SETUP, >> diff --git a/migration/migration.c b/migration/migration.c >> index d1fce9e..fc5c587 100644 >> --- a/migration/migration.c >> +++ b/migration/migration.c >> @@ -1603,6 +1603,11 @@ bool migration_is_active(MigrationState *s) >> s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE); >> } >> >> +bool migrate_mode_is_cpr(MigrationState *s) >> +{ >> +return s->parameters.mode == MIG_MODE_CPR_REBOOT; >> +} >> + >> int migrate_init(MigrationState *s, Error **errp) >> { >> int ret; >> @@ -2651,13 +2656,14 @@ static int >> migration_completion_precopy(MigrationState *s, >> bql_lock(); >> migration_downtime_start(s); >> >> -s->vm_old_state = runstate_get(); >> -global_state_store(); >> - >> -ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE); >> -trace_migration_completion_vm_stop(ret); >> -if (ret < 0) { >> -goto out_unlock; >> +if (!migrate_mode_is_cpr(s)) { >> +s->vm_old_state = runstate_get(); >> +global_state_store(); >> +ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE); >> +trace_migration_completion_vm_stop(ret); >> +if (ret < 0) { >> +goto out_unlock; >> +} >> } >> >> ret = migration_maybe_pause(s, current_active_state, >> @@ -3576,6 +3582,7 @@ void migrate_fd_connect(MigrationState *s, Error >> *error_in) >> Error *local_err = NULL; >> uint64_t rate_limit; >> bool resume = s->state == MIGRATION_STATUS_POSTCOPY_PAUSED; >> +int ret; >> >> /* >> * If there's a previous error, free it and prepare for another one. >> @@ -3651,6 +3658,17 @@ void migrate_fd_connect(MigrationState *s, Error >> *error_in) >> goto fail; >> } >> >> +if (migrate_mode_is_cpr(s)) { >> +s->vm_old_state = runstate_get(); >> +global_state_store(); >> +ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE); >> +trace_migration_completion_vm_stop(ret); >> +if (ret < 0) { >> +error_setg(_err, "migration_stop_vm failed, error %d", >> -ret); >> +goto fail; >> +} >> +} > > Could we have a helper function for the shared codes? Will do. > How about postcopy? I know it's nonsense to enable postcopy for cpr.. but > iiuc we don't yet forbid an user doing so. Maybe we should? I will assert that mode != cpr in the postcopy path instead, to prevent any nonsense. - Steve >> + >> if (migrate_background_snapshot()) { >> qemu_thread_create(>thread, "bg_snapshot", >> bg_migration_thread, s, QEMU_THREAD_JOINABLE); >> -- >> 1.8.3.1 >> >
Re: [PATCH V3 09/13] migration: notifier error checking
On 2/20/2024 2:12 AM, Peter Xu wrote: > On Thu, Feb 08, 2024 at 10:54:02AM -0800, Steve Sistare wrote: >> Check the status returned by migration notifiers and report errors. >> If notifiers fail, call the notifiers again so they can clean up. >> None of the notifiers return an error status at this time. >> >> Signed-off-by: Steve Sistare >> --- >> include/migration/misc.h | 3 ++- >> migration/migration.c| 40 +--- >> 2 files changed, 31 insertions(+), 12 deletions(-) >> >> diff --git a/include/migration/misc.h b/include/migration/misc.h >> index 0ea1902..6dc234b 100644 >> --- a/include/migration/misc.h >> +++ b/include/migration/misc.h >> @@ -82,7 +82,8 @@ void migration_add_notifier(NotifierWithReturn *notify, >> void migration_add_notifier_mode(NotifierWithReturn *notify, >> MigrationNotifyFunc func, MigMode mode); >> void migration_remove_notifier(NotifierWithReturn *notify); >> -void migration_call_notifiers(MigrationState *s, MigrationEventType type); >> +int migration_call_notifiers(MigrationState *s, MigrationEventType type, >> + Error **errp); >> bool migration_in_setup(MigrationState *); >> bool migration_has_finished(MigrationState *); >> bool migration_has_failed(MigrationState *); >> diff --git a/migration/migration.c b/migration/migration.c >> index 01d8867..d1fce9e 100644 >> --- a/migration/migration.c >> +++ b/migration/migration.c >> @@ -1318,6 +1318,8 @@ void migrate_set_state(int *state, int old_state, int >> new_state) >> >> static void migrate_fd_cleanup(MigrationState *s) >> { >> +Error *local_err = NULL; >> + >> g_free(s->hostname); >> s->hostname = NULL; >> json_writer_free(s->vmdesc); >> @@ -1362,13 +1364,23 @@ static void migrate_fd_cleanup(MigrationState *s) >>MIGRATION_STATUS_CANCELLED); >> } >> >> +if (!migration_has_failed(s) && >> +migration_call_notifiers(s, MIG_EVENT_PRECOPY_DONE, _err)) { >> + >> +migrate_set_state(>state, s->state, MIGRATION_STATUS_FAILED); >> +migrate_set_error(s, local_err); >> +error_free(local_err); >> +} >> + >> if (s->error) { >> /* It is used on info migrate. We can't free it */ >> error_report_err(error_copy(s->error)); >> } >> -migration_call_notifiers(s, s->state == MIGRATION_STATUS_COMPLETED ? >> - MIG_EVENT_PRECOPY_DONE : >> - MIG_EVENT_PRECOPY_FAILED); >> + >> +if (migration_has_failed(s)) { >> +migration_call_notifiers(s, MIG_EVENT_PRECOPY_FAILED, NULL); >> +} > > AFAIU, the whole point of such split is, allowing DONE notifies to fail too > and then if that happens we can invoke FAIL notifiers again. Correct. > > Perhaps we can avoid that complexity, but rather document only SETUP > notifiers can fail? > > The problem is that failing a notifier at this stage (if migration already > finished) can already be too late; dest QEMU can already have started > running, so no way to roll back. We can document that, check and assert > for !SETUP cases to make sure error is never hit? Makes sense. I will modify the patch as you suggest. - Steve >> + >> block_cleanup_parameters(); >> yank_unregister_instance(MIGRATION_YANK_INSTANCE); >> } >> @@ -1481,13 +1493,15 @@ void migration_remove_notifier(NotifierWithReturn >> *notify) >> } >> } >> >> -void migration_call_notifiers(MigrationState *s, MigrationEventType type) >> +int migration_call_notifiers(MigrationState *s, MigrationEventType type, >> + Error **errp) >> { >> MigMode mode = s->parameters.mode; >> MigrationEvent e; >> >> e.type = type; >> -notifier_with_return_list_notify(_state_notifiers[mode], , >> 0); >> +return >> notifier_with_return_list_notify(_state_notifiers[mode], >> +, errp); >> } >> >> bool migration_in_setup(MigrationState *s) >> @@ -2535,7 +2549,9 @@ static int postcopy_start(MigrationState *ms, Error >> **errp) >> * at the transition to postcopy and after the device state; in >> particular >> * spice needs to trigger a transition now >> */ >> -migration_call_notifiers(ms, MIG_EVENT_PRECOPY_DONE); >> +if (migration_call_notifiers(ms, MIG_EVENT_PRECOPY_DONE, errp)) { >> +goto fail; >> +} >> >> migration_downtime_end(ms); >> >> @@ -2555,11 +2571,10 @@ static int postcopy_start(MigrationState *ms, Error >> **errp) >> >> ret = qemu_file_get_error(ms->to_dst_file); >> if (ret) { >> -error_setg(errp, "postcopy_start: Migration stream errored"); >> -migrate_set_state(>state, MIGRATION_STATUS_POSTCOPY_ACTIVE, >> - MIGRATION_STATUS_FAILED); >> +error_setg_errno(errp, -ret, "postcopy_start: Migration stream >> error"); >> +
Re: [PATCH V3 09/13] migration: notifier error checking
On 2/12/2024 4:24 AM, David Hildenbrand wrote: > On 08.02.24 19:54, Steve Sistare wrote: >> Check the status returned by migration notifiers and report errors. >> If notifiers fail, call the notifiers again so they can clean up. > > IIUC, if any of the notifiers will actually start to fail, say, during > MIG_EVENT_PRECOPY_SETUP, you will call MIG_EVENT_PRECOPY_FAILED on all > notifiers. > > That will include notifiers that have never seen a MIG_EVENT_PRECOPY_SETUP > call. Correct. > Is that what we expect notifiers to be able to deal with? Can we document > that? The notifiers have always needed to handle failure without knowing the previous migration states, and are robust about unwinding their own internal state. I will document it. Thanks for all the RBs. - Steve
Re: [PATCH V2 11/11] vfio: allow cpr-reboot migration if suspended
On 1/17/2024 2:12 AM, Peter Xu wrote: > On Tue, Jan 16, 2024 at 03:37:33PM -0500, Steven Sistare wrote: >> On 1/15/2024 2:33 AM, Peter Xu wrote: >>> On Fri, Jan 12, 2024 at 07:05:10AM -0800, Steve Sistare wrote: >>>> Allow cpr-reboot for vfio if the guest is in the suspended runstate. The >>>> guest drivers' suspend methods flush outstanding requests and re-initialize >>>> the devices, and thus there is no device state to save and restore. The >>>> user is responsible for suspending the guest before initiating cpr, such as >>>> by issuing guest-suspend-ram to the qemu guest agent. >>>> >>>> Relax the vfio blocker so it does not apply to cpr, and add a notifier that >>>> verifies the guest is suspended. Skip dirty page tracking, which is N/A >>>> for >>>> cpr, to avoid ioctl errors. >>>> >>>> Signed-off-by: Steve Sistare >>>> --- >>>> hw/vfio/common.c | 2 +- >>>> hw/vfio/cpr.c | 20 >>>> hw/vfio/migration.c | 2 +- >>>> include/hw/vfio/vfio-common.h | 1 + >>>> migration/ram.c | 9 + >>>> 5 files changed, 28 insertions(+), 6 deletions(-) >>>> >>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c >>>> index 0b3352f..09af934 100644 >>>> --- a/hw/vfio/common.c >>>> +++ b/hw/vfio/common.c >>>> @@ -128,7 +128,7 @@ int vfio_block_multiple_devices_migration(VFIODevice >>>> *vbasedev, Error **errp) >>>> error_setg(_devices_migration_blocker, >>>> "Multiple VFIO devices migration is supported only if all >>>> of " >>>> "them support P2P migration"); >>>> -ret = migrate_add_blocker(_devices_migration_blocker, errp); >>>> +ret = migrate_add_blocker_normal(_devices_migration_blocker, >>>> errp); >>>> >>>> return ret; >>>> } >>>> diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c >>>> index bbd1c7a..9f4b1fe 100644 >>>> --- a/hw/vfio/cpr.c >>>> +++ b/hw/vfio/cpr.c >>>> @@ -7,13 +7,33 @@ >>>> >>>> #include "qemu/osdep.h" >>>> #include "hw/vfio/vfio-common.h" >>>> +#include "migration/misc.h" >>>> #include "qapi/error.h" >>>> +#include "sysemu/runstate.h" >>>> + >>>> +static int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier, >>>> +MigrationEvent *e, Error **errp) >>>> +{ >>>> +if (e->state == MIGRATION_STATUS_SETUP && >>>> +!runstate_check(RUN_STATE_SUSPENDED)) { >>>> + >>>> +error_setg(errp, >>>> +"VFIO device only supports cpr-reboot for runstate >>>> suspended"); >>>> + >>>> +return -1; >>>> +} >>> >>> What happens if the guest is suspended during SETUP, but then quickly waked >>> up before CPR migration completes? >> >> That is a management layer bug -- we told them the VM must be suspended. >> However, I will reject a wakeup request if migration is active and mode is >> cpr. > > Yes it needs to be enforced if it is required by cpr-reboot. QEMU > hopefully should never rely on mgmt app behavior. > >> >>>> +return 0; >>>> +} >>>> >>>> int vfio_cpr_register_container(VFIOContainer *container, Error **errp) >>>> { >>>> +migration_add_notifier_mode(>cpr_reboot_notifier, >>>> +vfio_cpr_reboot_notifier, >>>> +MIG_MODE_CPR_REBOOT); >>>> return 0; >>>> } >>>> >>>> void vfio_cpr_unregister_container(VFIOContainer *container) >>>> { >>>> +migration_remove_notifier(>cpr_reboot_notifier); >>>> } >>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c >>>> index 534fddf..488905d 100644 >>>> --- a/hw/vfio/migration.c >>>> +++ b/hw/vfio/migration.c >>>> @@ -895,7 +895,7 @@ static int vfio_block_migration(VFIODevice *vbasedev, >>>> Error *err, Error **errp) >>>> vbasedev->migration_blocker = error_copy(err); >>>> error
Re: [PATCH V2 11/11] vfio: allow cpr-reboot migration if suspended
On 1/16/2024 3:37 PM, Steven Sistare wrote: > On 1/15/2024 2:33 AM, Peter Xu wrote: >> On Fri, Jan 12, 2024 at 07:05:10AM -0800, Steve Sistare wrote: >>> Allow cpr-reboot for vfio if the guest is in the suspended runstate. The >>> guest drivers' suspend methods flush outstanding requests and re-initialize >>> the devices, and thus there is no device state to save and restore. The >>> user is responsible for suspending the guest before initiating cpr, such as >>> by issuing guest-suspend-ram to the qemu guest agent. >>> >>> Relax the vfio blocker so it does not apply to cpr, and add a notifier that >>> verifies the guest is suspended. Skip dirty page tracking, which is N/A for >>> cpr, to avoid ioctl errors. >>> >>> Signed-off-by: Steve Sistare >>> --- >>> hw/vfio/common.c | 2 +- >>> hw/vfio/cpr.c | 20 >>> hw/vfio/migration.c | 2 +- >>> include/hw/vfio/vfio-common.h | 1 + >>> migration/ram.c | 9 + >>> 5 files changed, 28 insertions(+), 6 deletions(-) >>> >>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c >>> index 0b3352f..09af934 100644 >>> --- a/hw/vfio/common.c >>> +++ b/hw/vfio/common.c >>> @@ -128,7 +128,7 @@ int vfio_block_multiple_devices_migration(VFIODevice >>> *vbasedev, Error **errp) >>> error_setg(_devices_migration_blocker, >>> "Multiple VFIO devices migration is supported only if all >>> of " >>> "them support P2P migration"); >>> -ret = migrate_add_blocker(_devices_migration_blocker, errp); >>> +ret = migrate_add_blocker_normal(_devices_migration_blocker, >>> errp); >>> >>> return ret; >>> } >>> diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c >>> index bbd1c7a..9f4b1fe 100644 >>> --- a/hw/vfio/cpr.c >>> +++ b/hw/vfio/cpr.c >>> @@ -7,13 +7,33 @@ >>> >>> #include "qemu/osdep.h" >>> #include "hw/vfio/vfio-common.h" >>> +#include "migration/misc.h" >>> #include "qapi/error.h" >>> +#include "sysemu/runstate.h" >>> + >>> +static int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier, >>> +MigrationEvent *e, Error **errp) >>> +{ >>> +if (e->state == MIGRATION_STATUS_SETUP && >>> +!runstate_check(RUN_STATE_SUSPENDED)) { >>> + >>> +error_setg(errp, >>> +"VFIO device only supports cpr-reboot for runstate suspended"); >>> + >>> +return -1; >>> +} >> >> What happens if the guest is suspended during SETUP, but then quickly waked >> up before CPR migration completes? > > That is a management layer bug -- we told them the VM must be suspended. > However, I will reject a wakeup request if migration is active and mode is > cpr. > >>> +return 0; >>> +} >>> >>> int vfio_cpr_register_container(VFIOContainer *container, Error **errp) >>> { >>> +migration_add_notifier_mode(>cpr_reboot_notifier, >>> +vfio_cpr_reboot_notifier, >>> +MIG_MODE_CPR_REBOOT); >>> return 0; >>> } >>> >>> void vfio_cpr_unregister_container(VFIOContainer *container) >>> { >>> +migration_remove_notifier(>cpr_reboot_notifier); >>> } >>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c >>> index 534fddf..488905d 100644 >>> --- a/hw/vfio/migration.c >>> +++ b/hw/vfio/migration.c >>> @@ -895,7 +895,7 @@ static int vfio_block_migration(VFIODevice *vbasedev, >>> Error *err, Error **errp) >>> vbasedev->migration_blocker = error_copy(err); >>> error_free(err); >>> >>> -return migrate_add_blocker(>migration_blocker, errp); >>> +return migrate_add_blocker_normal(>migration_blocker, errp); >>> } >>> >>> /* -- >>> */ >>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h >>> index 1add5b7..7a46e24 100644 >>> --- a/include/hw/vfio/vfio-common.h >>> +++ b/include/hw/vfio/vfio-common.h >>> @@ -78,6 +78,7 @@ struct VFIOGroup
Re: [PATCH V2 11/11] vfio: allow cpr-reboot migration if suspended
On 1/15/2024 2:33 AM, Peter Xu wrote: > On Fri, Jan 12, 2024 at 07:05:10AM -0800, Steve Sistare wrote: >> Allow cpr-reboot for vfio if the guest is in the suspended runstate. The >> guest drivers' suspend methods flush outstanding requests and re-initialize >> the devices, and thus there is no device state to save and restore. The >> user is responsible for suspending the guest before initiating cpr, such as >> by issuing guest-suspend-ram to the qemu guest agent. >> >> Relax the vfio blocker so it does not apply to cpr, and add a notifier that >> verifies the guest is suspended. Skip dirty page tracking, which is N/A for >> cpr, to avoid ioctl errors. >> >> Signed-off-by: Steve Sistare >> --- >> hw/vfio/common.c | 2 +- >> hw/vfio/cpr.c | 20 >> hw/vfio/migration.c | 2 +- >> include/hw/vfio/vfio-common.h | 1 + >> migration/ram.c | 9 + >> 5 files changed, 28 insertions(+), 6 deletions(-) >> >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c >> index 0b3352f..09af934 100644 >> --- a/hw/vfio/common.c >> +++ b/hw/vfio/common.c >> @@ -128,7 +128,7 @@ int vfio_block_multiple_devices_migration(VFIODevice >> *vbasedev, Error **errp) >> error_setg(_devices_migration_blocker, >> "Multiple VFIO devices migration is supported only if all of >> " >> "them support P2P migration"); >> -ret = migrate_add_blocker(_devices_migration_blocker, errp); >> +ret = migrate_add_blocker_normal(_devices_migration_blocker, >> errp); >> >> return ret; >> } >> diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c >> index bbd1c7a..9f4b1fe 100644 >> --- a/hw/vfio/cpr.c >> +++ b/hw/vfio/cpr.c >> @@ -7,13 +7,33 @@ >> >> #include "qemu/osdep.h" >> #include "hw/vfio/vfio-common.h" >> +#include "migration/misc.h" >> #include "qapi/error.h" >> +#include "sysemu/runstate.h" >> + >> +static int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier, >> +MigrationEvent *e, Error **errp) >> +{ >> +if (e->state == MIGRATION_STATUS_SETUP && >> +!runstate_check(RUN_STATE_SUSPENDED)) { >> + >> +error_setg(errp, >> +"VFIO device only supports cpr-reboot for runstate suspended"); >> + >> +return -1; >> +} > > What happens if the guest is suspended during SETUP, but then quickly waked > up before CPR migration completes? That is a management layer bug -- we told them the VM must be suspended. However, I will reject a wakeup request if migration is active and mode is cpr. >> +return 0; >> +} >> >> int vfio_cpr_register_container(VFIOContainer *container, Error **errp) >> { >> +migration_add_notifier_mode(>cpr_reboot_notifier, >> +vfio_cpr_reboot_notifier, >> +MIG_MODE_CPR_REBOOT); >> return 0; >> } >> >> void vfio_cpr_unregister_container(VFIOContainer *container) >> { >> +migration_remove_notifier(>cpr_reboot_notifier); >> } >> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c >> index 534fddf..488905d 100644 >> --- a/hw/vfio/migration.c >> +++ b/hw/vfio/migration.c >> @@ -895,7 +895,7 @@ static int vfio_block_migration(VFIODevice *vbasedev, >> Error *err, Error **errp) >> vbasedev->migration_blocker = error_copy(err); >> error_free(err); >> >> -return migrate_add_blocker(>migration_blocker, errp); >> +return migrate_add_blocker_normal(>migration_blocker, errp); >> } >> >> /* -- */ >> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h >> index 1add5b7..7a46e24 100644 >> --- a/include/hw/vfio/vfio-common.h >> +++ b/include/hw/vfio/vfio-common.h >> @@ -78,6 +78,7 @@ struct VFIOGroup; >> typedef struct VFIOContainer { >> VFIOContainerBase bcontainer; >> int fd; /* /dev/vfio/vfio, empowered by the attached groups */ >> +NotifierWithReturn cpr_reboot_notifier; >> unsigned iommu_type; >> QLIST_HEAD(, VFIOGroup) group_list; >> } VFIOContainer; >> diff --git a/migration/ram.c b/migration/ram.c >> index 1923366..44ad324 100644 >> --- a/migration/ram.c >> +++ b/migration/ram.c >> @@ -2392,8 +2392,8 @@ static void ram_save_cleanup(void *opaque) >> RAMState **rsp = opaque; >> RAMBlock *block; >> >> -/* We don't use dirty log with background snapshots */ >> -if (!migrate_background_snapshot()) { >> +/* We don't use dirty log with background snapshots or cpr */ >> +if (!migrate_background_snapshot() && migrate_mode() == >> MIG_MODE_NORMAL) { > > Same question here, on what happens if the user resumes the VM before > migration completes? IIUC shared-ram is not required, then it means if > that happens the cpr migration image can contain corrupted data, and that > may be a problem. > > Background snapshot is special in that it relies on totally different > tracking facilities
Re: [PATCH V2 04/11] migration: remove migration_in_postcopy parameter
On 1/15/2024 1:48 AM, Peter Xu wrote: > On Fri, Jan 12, 2024 at 07:05:03AM -0800, Steve Sistare wrote: >> bool migration_in_incoming_postcopy(void) >> diff --git a/ui/spice-core.c b/ui/spice-core.c >> index b3cd229..e43a93f 100644 >> --- a/ui/spice-core.c >> +++ b/ui/spice-core.c >> @@ -580,7 +580,7 @@ static int migration_state_notifier(NotifierWithReturn >> *notifier, >> if (migration_in_setup(s)) { >> spice_server_migrate_start(spice_server); >> } else if (migration_has_finished(s) || >> - migration_in_postcopy_after_devices(s)) { >> + migration_in_postcopy_after_devices()) { > > This can be a reply also to your other email: my previous suggestion of > using PRECOPY_DONE should apply here, where we can convert this chunk into: > > } else if (event == MIG_EVENT_PRECOPY_DONE) {...} > > Because PRECOPY_DONE should also cover the notification from > postcopy_start(), then we can drop migration_in_postcopy_after_devices() > completely, I think. Yes, that works. I will define and use MIG_EVENT_PRECOPY_DONE and friends in "MigrationEvent for notifiers". - Steve
Re: [PATCH V2 00/11] allow cpr-reboot for vfio
On 1/12/2024 4:38 PM, Alex Williamson wrote: > On Fri, 12 Jan 2024 07:04:59 -0800 > Steve Sistare wrote: > >> Allow cpr-reboot for vfio if the guest is in the suspended runstate. The >> guest drivers' suspend methods flush outstanding requests and re-initialize >> the devices, and thus there is no device state to save and restore. The >> user is responsible for suspending the guest before initiating cpr, such as >> by issuing guest-suspend-ram to the qemu guest agent. >> >> Most of the patches in this series enhance migration notifiers so they can >> return an error status and message. The last two patches register a notifier >> for vfio that returns an error if the guest is not suspended. > > Hi Steve, > > This appears to only support legacy container mode and not cdev/iommufd > mode. Shouldn't this support both? Thanks, My bad, thanks! I'll move cpr_reboot_notifier to the base container, and also call cpr register/unregister from iommufd_cdev_attach/iommufd_cdev_detach. - Steve >> Steve Sistare (11): >> notify: pass error to notifier with return >> migration: remove error from notifier data >> migration: convert to NotifierWithReturn >> migration: remove migration_in_postcopy parameter >> migration: MigrationEvent for notifiers >> migration: MigrationNotifyFunc >> migration: per-mode notifiers >> migration: refactor migrate_fd_connect failures >> migration: notifier error checking >> vfio: register container for cpr >> vfio: allow cpr-reboot migration if suspended >> >> hw/net/virtio-net.c| 14 ++--- >> hw/vfio/common.c | 2 +- >> hw/vfio/container.c| 11 +++- >> hw/vfio/cpr.c | 39 ++ >> hw/vfio/meson.build| 1 + >> hw/vfio/migration.c| 13 +++-- >> hw/virtio/vhost-user.c | 10 ++-- >> hw/virtio/virtio-balloon.c | 3 +- >> include/hw/vfio/vfio-common.h | 6 ++- >> include/hw/virtio/virtio-net.h | 2 +- >> include/migration/misc.h | 21 +--- >> include/qemu/notify.h | 7 ++- >> migration/migration.c | 117 >> +++-- >> migration/postcopy-ram.c | 3 +- >> migration/postcopy-ram.h | 1 - >> migration/ram.c| 12 ++--- >> net/vhost-vdpa.c | 15 +++--- >> ui/spice-core.c| 19 +++ >> util/notify.c | 5 +- >> 19 files changed, 206 insertions(+), 95 deletions(-) >> create mode 100644 hw/vfio/cpr.c >> >
Re: [PATCH V2 03/11] migration: convert to NotifierWithReturn
On 1/15/2024 1:44 AM, Peter Xu wrote: > On Fri, Jan 12, 2024 at 07:05:02AM -0800, Steve Sistare wrote: >> Change all migration notifiers to type NotifierWithReturn, so notifiers >> can return an error status in a future patch. For now, pass NULL for the >> notifier error parameter, and do not check the return value. >> >> Signed-off-by: Steve Sistare >> --- >> hw/net/virtio-net.c| 4 +++- >> hw/vfio/migration.c| 4 +++- >> include/hw/vfio/vfio-common.h | 2 +- >> include/hw/virtio/virtio-net.h | 2 +- >> include/migration/misc.h | 6 +++--- >> migration/migration.c | 16 >> net/vhost-vdpa.c | 6 -- >> ui/spice-core.c| 8 +--- >> 8 files changed, 28 insertions(+), 20 deletions(-) >> >> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c >> index 7a2846f..9570353 100644 >> --- a/hw/net/virtio-net.c >> +++ b/hw/net/virtio-net.c >> @@ -3529,11 +3529,13 @@ static void >> virtio_net_handle_migration_primary(VirtIONet *n, MigrationState *s) >> } >> } >> >> -static void virtio_net_migration_state_notifier(Notifier *notifier, void >> *data) >> +static int virtio_net_migration_state_notifier(NotifierWithReturn *notifier, >> + void *data, Error **errp) >> { >> MigrationState *s = data; >> VirtIONet *n = container_of(notifier, VirtIONet, migration_state); >> virtio_net_handle_migration_primary(n, s); >> +return 0; >> } > > I should have mentioned this earlier.. we have a trend recently to modify > retval to boolean when Error** existed, e.g.: > > https://lore.kernel.org/all/20231017202633.296756-5-pet...@redhat.com/ > > Let's start using boolean too here? Previous patches may need touch-ups > too for this. > > Other than that it all looks good here. Thanks, Boolean makes sense when there is only one way to recover from failure. However, when the notifier may fail in different ways, and recovery differs for each, then the function should return an int errno. NotifierWithReturn could have future uses that need multiple return values and multiple recovery paths above the notifier_with_return_list_notify level, so IMO the function should continue to return int for generality. - Steve
Re: [PATCH V2 08/11] migration: refactor migrate_fd_connect failures
On 1/15/2024 2:37 AM, Peter Xu wrote: > On Fri, Jan 12, 2024 at 07:05:07AM -0800, Steve Sistare wrote: >> Move common code for the error path in migrate_fd_connect to a shared >> fail label. No functional change. >> >> Signed-off-by: Steve Sistare > > Reviewed-by: Peter Xu > > One nitpick below: > >> --- >> migration/migration.c | 22 +++--- >> 1 file changed, 11 insertions(+), 11 deletions(-) >> >> diff --git a/migration/migration.c b/migration/migration.c >> index e9914aa..c828ba7 100644 >> --- a/migration/migration.c >> +++ b/migration/migration.c >> @@ -3549,6 +3549,7 @@ void migrate_fd_connect(MigrationState *s, Error >> *error_in) >> Error *local_err = NULL; >> uint64_t rate_limit; >> bool resume = s->state == MIGRATION_STATUS_POSTCOPY_PAUSED; >> +int expect_state = s->state; > > IMHO we can drop this. > >> >> /* >> * If there's a previous error, free it and prepare for another one. >> @@ -3603,11 +3604,7 @@ void migrate_fd_connect(MigrationState *s, Error >> *error_in) >> if (migrate_postcopy_ram() || migrate_return_path()) { >> if (open_return_path_on_source(s)) { >> error_setg(_err, "Unable to open return-path for >> postcopy"); >> -migrate_set_state(>state, s->state, MIGRATION_STATUS_FAILED); >> -migrate_set_error(s, local_err); >> -error_report_err(local_err); >> -migrate_fd_cleanup(s); >> -return; >> +goto fail; >> } >> } >> >> @@ -3629,12 +3626,8 @@ void migrate_fd_connect(MigrationState *s, Error >> *error_in) >> } >> >> if (multifd_save_setup(_err) != 0) { >> -migrate_set_error(s, local_err); >> -error_report_err(local_err); >> -migrate_set_state(>state, MIGRATION_STATUS_SETUP, >> - MIGRATION_STATUS_FAILED); >> -migrate_fd_cleanup(s); >> -return; >> +expect_state = MIGRATION_STATUS_SETUP; > > Then drop this. > >> +goto fail; >> } >> >> if (migrate_background_snapshot()) { >> @@ -3645,6 +3638,13 @@ void migrate_fd_connect(MigrationState *s, Error >> *error_in) >> migration_thread, s, QEMU_THREAD_JOINABLE); >> } >> s->migration_thread_running = true; >> +return; >> + >> +fail: >> +migrate_set_error(s, local_err); >> +migrate_set_state(>state, expect_state, MIGRATION_STATUS_FAILED); > > Then constantly use s->state. Yes, if you are OK with it, I also prefer to simplify here. - Steve >> +error_report_err(local_err); >> +migrate_fd_cleanup(s); >> } >> >> static void migration_class_init(ObjectClass *klass, void *data) >> -- >> 1.8.3.1 >> >
Re: [PATCH V2 05/11] migration: MigrationEvent for notifiers
On 1/12/2024 10:05 AM, Steve Sistare wrote: > Passing MigrationState to notifiers is unsound because they could access > unstable migration state internals or even modify the state. Instead, pass > the minimal info needed in a new MigrationEvent struct, which could be > extended in the future if needed. > > Suggested-by: Peter Xu > Signed-off-by: Steve Sistare > --- > [...] > diff --git a/include/migration/misc.h b/include/migration/misc.h > index dcc98bb..0b4ce0f 100644 > --- a/include/migration/misc.h > +++ b/include/migration/misc.h > @@ -60,6 +60,11 @@ void migration_object_init(void); > void migration_shutdown(void); > bool migration_is_idle(void); > bool migration_is_active(MigrationState *); > + > +typedef struct MigrationEvent { > +MigrationStatus state; > +} MigrationEvent; Hi Peter, I chose to pass MigrationStatus rather than define a new enum and map MigrationStatus to it. IMO a new enum adds little value, yet it is more code and another layer of abstraction for coders to grok. - Steve
Re: [PATCH V1 2/3] migration: notifier error reporting
On 1/10/2024 9:16 PM, Peter Xu wrote: > On Wed, Jan 10, 2024 at 01:08:41PM -0500, Steven Sistare wrote: >> On 1/10/2024 2:18 AM, Peter Xu wrote: >>> On Wed, Dec 13, 2023 at 10:11:32AM -0800, Steve Sistare wrote: >>>> After calling notifiers, check if an error has been reported via >>>> migrate_set_error, and halt the migration. >>>> >>>> None of the notifiers call migrate_set_error at this time, so no >>>> functional change. >>>> >>>> Signed-off-by: Steve Sistare >>>> --- >>>> include/migration/misc.h | 2 +- >>>> migration/migration.c| 26 ++ >>>> 2 files changed, 23 insertions(+), 5 deletions(-) >>>> >>>> diff --git a/include/migration/misc.h b/include/migration/misc.h >>>> index 901d117..231d7e4 100644 >>>> --- a/include/migration/misc.h >>>> +++ b/include/migration/misc.h >>>> @@ -65,7 +65,7 @@ MigMode migrate_mode_of(MigrationState *); >>>> void migration_add_notifier(Notifier *notify, >>>> void (*func)(Notifier *notifier, void *data)); >>>> void migration_remove_notifier(Notifier *notify); >>>> -void migration_call_notifiers(MigrationState *s); >>>> +int migration_call_notifiers(MigrationState *s); >>>> bool migration_in_setup(MigrationState *); >>>> bool migration_has_finished(MigrationState *); >>>> bool migration_has_failed(MigrationState *); >>>> diff --git a/migration/migration.c b/migration/migration.c >>>> index d5bfe70..29a9a92 100644 >>>> --- a/migration/migration.c >>>> +++ b/migration/migration.c >>>> @@ -1280,6 +1280,8 @@ void migrate_set_state(int *state, int old_state, >>>> int new_state) >>>> >>>> static void migrate_fd_cleanup(MigrationState *s) >>>> { >>>> +bool already_failed; >>>> + >>>> qemu_bh_delete(s->cleanup_bh); >>>> s->cleanup_bh = NULL; >>>> >>>> @@ -1327,11 +1329,20 @@ static void migrate_fd_cleanup(MigrationState *s) >>>>MIGRATION_STATUS_CANCELLED); >>>> } >>>> >>>> +already_failed = migration_has_failed(s); >>>> +if (migration_call_notifiers(s)) { >>>> +if (!already_failed) { >>>> +migrate_set_state(>state, s->state, >>>> MIGRATION_STATUS_FAILED); >>>> +/* Notify again to recover from this late failure. */ >>>> +migration_call_notifiers(s); >>>> +} >>>> +} >>>> + >>>> if (s->error) { >>>> /* It is used on info migrate. We can't free it */ >>>> error_report_err(error_copy(s->error)); >>>> } >>>> -migration_call_notifiers(s); >>>> + >>>> block_cleanup_parameters(); >>>> yank_unregister_instance(MIGRATION_YANK_INSTANCE); >>>> } >>>> @@ -1450,9 +1461,10 @@ void migration_remove_notifier(Notifier *notify) >>>> } >>>> } >>>> >>>> -void migration_call_notifiers(MigrationState *s) >>>> +int migration_call_notifiers(MigrationState *s) >>>> { >>>> notifier_list_notify(_state_notifiers, s); >>>> +return (s->error != NULL); >>> >>> Exporting more migration_*() functions is pretty ugly to me.. >> >> I assume you mean migrate_set_error(), which is currently only called from >> migration/*.c code. >> >> Instead, we could define a new function migrate_set_notifier_error(), defined >> in the new file migration/notifier.h, so we clearly limit the migration >> functions which can be called from notifiers. (Its implementation just calls >> migrate_set_error) > > Fundementally this allows another .c to change one more field of > MigrationState (which is ->error) and I still want to avoid it. > > I just replied in the other thread, but now with all these in mind I think > I still prefer not passing in MigrationState* at all. It's already kind of > abused due to migrate_get_current(), and IMHO it's healthier to limit its > usage to minimum to cover the core of migration states for migration/ use > only. > > Shrinking or even stop exporting migrate_get_current() is another more > challenging task, but now what we can do is stop enlarging the direct use > of Migrat
Re: [PATCH V1 2/3] migration: notifier error reporting
On 1/10/2024 2:18 AM, Peter Xu wrote: > On Wed, Dec 13, 2023 at 10:11:32AM -0800, Steve Sistare wrote: >> After calling notifiers, check if an error has been reported via >> migrate_set_error, and halt the migration. >> >> None of the notifiers call migrate_set_error at this time, so no >> functional change. >> >> Signed-off-by: Steve Sistare >> --- >> include/migration/misc.h | 2 +- >> migration/migration.c| 26 ++ >> 2 files changed, 23 insertions(+), 5 deletions(-) >> >> diff --git a/include/migration/misc.h b/include/migration/misc.h >> index 901d117..231d7e4 100644 >> --- a/include/migration/misc.h >> +++ b/include/migration/misc.h >> @@ -65,7 +65,7 @@ MigMode migrate_mode_of(MigrationState *); >> void migration_add_notifier(Notifier *notify, >> void (*func)(Notifier *notifier, void *data)); >> void migration_remove_notifier(Notifier *notify); >> -void migration_call_notifiers(MigrationState *s); >> +int migration_call_notifiers(MigrationState *s); >> bool migration_in_setup(MigrationState *); >> bool migration_has_finished(MigrationState *); >> bool migration_has_failed(MigrationState *); >> diff --git a/migration/migration.c b/migration/migration.c >> index d5bfe70..29a9a92 100644 >> --- a/migration/migration.c >> +++ b/migration/migration.c >> @@ -1280,6 +1280,8 @@ void migrate_set_state(int *state, int old_state, int >> new_state) >> >> static void migrate_fd_cleanup(MigrationState *s) >> { >> +bool already_failed; >> + >> qemu_bh_delete(s->cleanup_bh); >> s->cleanup_bh = NULL; >> >> @@ -1327,11 +1329,20 @@ static void migrate_fd_cleanup(MigrationState *s) >>MIGRATION_STATUS_CANCELLED); >> } >> >> +already_failed = migration_has_failed(s); >> +if (migration_call_notifiers(s)) { >> +if (!already_failed) { >> +migrate_set_state(>state, s->state, MIGRATION_STATUS_FAILED); >> +/* Notify again to recover from this late failure. */ >> +migration_call_notifiers(s); >> +} >> +} >> + >> if (s->error) { >> /* It is used on info migrate. We can't free it */ >> error_report_err(error_copy(s->error)); >> } >> -migration_call_notifiers(s); >> + >> block_cleanup_parameters(); >> yank_unregister_instance(MIGRATION_YANK_INSTANCE); >> } >> @@ -1450,9 +1461,10 @@ void migration_remove_notifier(Notifier *notify) >> } >> } >> >> -void migration_call_notifiers(MigrationState *s) >> +int migration_call_notifiers(MigrationState *s) >> { >> notifier_list_notify(_state_notifiers, s); >> +return (s->error != NULL); > > Exporting more migration_*() functions is pretty ugly to me.. I assume you mean migrate_set_error(), which is currently only called from migration/*.c code. Instead, we could define a new function migrate_set_notifier_error(), defined in the new file migration/notifier.h, so we clearly limit the migration functions which can be called from notifiers. (Its implementation just calls migrate_set_error) > Would it be better to pass in "Error** errp" into each notifiers? That may > need an open coded notifier_list_notify(), breaking the loop if "*errp". > > And the notifier API currently only support one arg.. maybe we should > implement the notifiers ourselves, ideally passing in "(int state, Error > **errp)" instead of "(MigrationState *s)". > > Ideally with that MigrationState* shouldn't be visible outside migration/. I will regret saying this because of the amount of (mechanical) code change involved, but the cleanest solution is: * Pass errp to: notifier_with_return_list_notify(NotifierWithReturnList *list, void *data, Error *errp) * Pass errp to the NotifierWithReturn notifier: int (*notify)(NotifierWithReturn *notifier, void *data, Error **errp); * Delete the errp member from struct PostcopyNotifyData and pass errp to the notifier function Ditto for PrecopyNotifyData. * Convert all migration notifiers to NotifierWithReturn - Steve >> } >> >> bool migration_in_setup(MigrationState *s) >> @@ -2520,7 +2532,9 @@ static int postcopy_start(MigrationState *ms, Error >> **errp) >> * spice needs to trigger a transition now >> */ >> ms->postcopy_after_devices = true; >> -migration_call_notifiers(ms); >> +if (migration_call_notifiers(ms)) { >> +goto fail; >> +} >> >> migration_downtime_end(ms); >> >> @@ -3589,7 +3603,11 @@ void migrate_fd_connect(MigrationState *s, Error >> *error_in) >> rate_limit = migrate_max_bandwidth(); >> >> /* Notify before starting migration thread */ >> -migration_call_notifiers(s); >> +if (migration_call_notifiers(s)) { >> +migrate_set_state(>state, s->state, MIGRATION_STATUS_FAILED); >> +migrate_fd_cleanup(s); >> +return; >> +} >> } >> >> migration_rate_set(rate_limit); >> -- >>
Re: [PATCH V1 1/3] migration: check mode in notifiers
On 1/10/2024 2:09 AM, Peter Xu wrote: > On Wed, Dec 13, 2023 at 10:11:31AM -0800, Steve Sistare wrote: >> The existing notifiers should only apply to normal mode. >> >> No functional change. > > Instead of adding such check in every notifier, why not make CPR a separate > list of notifiers? Just like the blocker lists. Sure. I proposed minimal changes in this current series, but extending the api to take migration mode would be nicer. > Aside of this patch, I just started to look at this "notifier" code, I > really don't think we should pass in MigrationState* into the notifiers. > IIUC we only need the "state" as an enum. Then with two separate > registers, the device code knows the migration mode. > > What do you think? If we pass state, the notifier must either compare to enum values such as MIGRATION_STATUS_COMPLETED instead of calling migration_has_finished(s), or we must define new accessors such as migration_state_is_finished(state). IMO passing MigrationState is the best approach. MigrationState is an incomplete type in most notifiers, and the client can pass it to a limited set of accessors to get more information -- exactly what we want to hide migration internals. However, we could further limit the allowed accessors, eg move these to a new file "include/migration/notifier.h". #include "qemu/notify.h" void migration_add_notifier(Notifier *notify, void (*func)(Notifier *notifier, void *data)); void migration_remove_notifier(Notifier *notify); bool migration_is_active(MigrationState *); bool migration_in_setup(MigrationState *); bool migration_has_finished(MigrationState *); bool migration_has_failed(MigrationState *); --- - Steve
Re: [PATCH V6 03/14] cpus: stop vm in suspended runstate
On 12/23/2023 12:41 AM, Markus Armbruster wrote: > Steven Sistare writes: > >> On 12/22/2023 7:20 AM, Markus Armbruster wrote: >>> Steve Sistare writes: >>> >>>> Currently, a vm in the suspended state is not completely stopped. The >>>> VCPUs >>>> have been paused, but the cpu clock still runs, and runstate notifiers for >>>> the transition to stopped have not been called. This causes problems for >>>> live migration. Stale cpu timers_state is saved to the migration stream, >>>> causing time errors in the guest when it wakes from suspend, and state that >>>> would have been modified by runstate notifiers is wrong. >>>> >>>> Modify vm_stop to completely stop the vm if the current state is suspended, >>>> transition to RUN_STATE_PAUSED, and remember that the machine was >>>> suspended. >>>> Modify vm_start to restore the suspended state. >>> >>> Can you explain this to me in terms of the @current_run_state state >>> machine? Like >>> >>> Before the patch, trigger X in state Y goes to state Z. >>> Afterwards, it goes to ... >> >> Old behavior: >> RUN_STATE_SUSPENDED --> stop --> RUN_STATE_SUSPENDED >> >> New behavior: >> RUN_STATE_SUSPENDED --> stop --> RUN_STATE_PAUSED >> RUN_STATE_PAUSED--> cont --> RUN_STATE_SUSPENDED > > This clarifies things quite a bit for me. Maybe work it into the commit > message? Will do. >>>> This affects all callers of vm_stop and vm_start, notably, the qapi stop >>>> and >>>> cont commands. For example: >>>> >>>> (qemu) info status >>>> VM status: paused (suspended) >>>> >>>> (qemu) stop >>>> (qemu) info status >>>> VM status: paused >>>> >>>> (qemu) cont >>>> (qemu) info status >>>> VM status: paused (suspended) >>>> >>>> (qemu) system_wakeup >>>> (qemu) info status >>>> VM status: running >>>> >>>> Suggested-by: Peter Xu >>>> Signed-off-by: Steve Sistare >>>> --- >>>> include/sysemu/runstate.h | 5 + >>>> qapi/misc.json| 10 -- >>>> system/cpus.c | 19 ++- >>>> system/runstate.c | 3 +++ >>>> 4 files changed, 30 insertions(+), 7 deletions(-) >>>> >>>> diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h >>>> index f6a337b..1d6828f 100644 >>>> --- a/include/sysemu/runstate.h >>>> +++ b/include/sysemu/runstate.h >>>> @@ -40,6 +40,11 @@ static inline bool >>>> shutdown_caused_by_guest(ShutdownCause cause) >>>> return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN; >>>> } >>>> >>>> +static inline bool runstate_is_started(RunState state) >>>> +{ >>>> +return state == RUN_STATE_RUNNING || state == RUN_STATE_SUSPENDED; >>>> +} >>>> + >>>> void vm_start(void); >>>> >>>> /** >>>> diff --git a/qapi/misc.json b/qapi/misc.json >>>> index cda2eff..efb8d44 100644 >>>> --- a/qapi/misc.json >>>> +++ b/qapi/misc.json >>>> @@ -134,7 +134,7 @@ >>>> ## >>>> # @stop: >>>> # >>>> -# Stop all guest VCPU execution. >>>> +# Stop all guest VCPU and VM execution. >>> >>> Doesn't "stop all VM execution" imply "stop all guest vCPU execution"? >> >> Agreed, so we simply have: >> >> # @stop: >> # Stop guest VM execution. >> >> # @cont: >> # Resume guest VM execution. > > Yes, please. Will do. >>>> # >>>> # Since: 0.14 >>>> # >>>> @@ -143,6 +143,9 @@ >>># Notes: This function will succeed even if the guest is already in >>># the stopped state. In "inmigrate" state, it will ensure that >>>> # the guest remains paused once migration finishes, as if the -S >>>> # option was passed on the command line. >>>> # >>>> +# In the "suspended" state, it will completely stop the VM and >>>> +# cause a transition to the "paused" state. (Since 9.0) >>>> +# >>> >>> What user-observable (with que
Re: [PATCH V6 03/14] cpus: stop vm in suspended runstate
On 1/3/2024 8:09 AM, Peter Xu wrote: > Steven, > > The discussions seem to still apply to the latest. Do you plan to post a > new version, with everything covered? Yes, today, thanks - steve
Re: [PATCH V6 03/14] cpus: stop vm in suspended runstate
On 12/22/2023 7:20 AM, Markus Armbruster wrote: > Steve Sistare writes: > >> Currently, a vm in the suspended state is not completely stopped. The VCPUs >> have been paused, but the cpu clock still runs, and runstate notifiers for >> the transition to stopped have not been called. This causes problems for >> live migration. Stale cpu timers_state is saved to the migration stream, >> causing time errors in the guest when it wakes from suspend, and state that >> would have been modified by runstate notifiers is wrong. >> >> Modify vm_stop to completely stop the vm if the current state is suspended, >> transition to RUN_STATE_PAUSED, and remember that the machine was suspended. >> Modify vm_start to restore the suspended state. > > Can you explain this to me in terms of the @current_run_state state > machine? Like > > Before the patch, trigger X in state Y goes to state Z. > Afterwards, it goes to ... Old behavior: RUN_STATE_SUSPENDED --> stop --> RUN_STATE_SUSPENDED New behavior: RUN_STATE_SUSPENDED --> stop --> RUN_STATE_PAUSED RUN_STATE_PAUSED--> cont --> RUN_STATE_SUSPENDED >> This affects all callers of vm_stop and vm_start, notably, the qapi stop and >> cont commands. For example: >> >> (qemu) info status >> VM status: paused (suspended) >> >> (qemu) stop >> (qemu) info status >> VM status: paused >> >> (qemu) cont >> (qemu) info status >> VM status: paused (suspended) >> >> (qemu) system_wakeup >> (qemu) info status >> VM status: running >> >> Suggested-by: Peter Xu >> Signed-off-by: Steve Sistare >> --- >> include/sysemu/runstate.h | 5 + >> qapi/misc.json| 10 -- >> system/cpus.c | 19 ++- >> system/runstate.c | 3 +++ >> 4 files changed, 30 insertions(+), 7 deletions(-) >> >> diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h >> index f6a337b..1d6828f 100644 >> --- a/include/sysemu/runstate.h >> +++ b/include/sysemu/runstate.h >> @@ -40,6 +40,11 @@ static inline bool shutdown_caused_by_guest(ShutdownCause >> cause) >> return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN; >> } >> >> +static inline bool runstate_is_started(RunState state) >> +{ >> +return state == RUN_STATE_RUNNING || state == RUN_STATE_SUSPENDED; >> +} >> + >> void vm_start(void); >> >> /** >> diff --git a/qapi/misc.json b/qapi/misc.json >> index cda2eff..efb8d44 100644 >> --- a/qapi/misc.json >> +++ b/qapi/misc.json >> @@ -134,7 +134,7 @@ >> ## >> # @stop: >> # >> -# Stop all guest VCPU execution. >> +# Stop all guest VCPU and VM execution. > > Doesn't "stop all VM execution" imply "stop all guest vCPU execution"? Agreed, so we simply have: # @stop: # Stop guest VM execution. # @cont: # Resume guest VM execution. >> # >> # Since: 0.14 >> # >> @@ -143,6 +143,9 @@ ># Notes: This function will succeed even if the guest is already in ># the stopped state. In "inmigrate" state, it will ensure that >> # the guest remains paused once migration finishes, as if the -S >> # option was passed on the command line. >> # >> +# In the "suspended" state, it will completely stop the VM and >> +# cause a transition to the "paused" state. (Since 9.0) >> +# > > What user-observable (with query-status) state transitions are possible > here? {"status": "suspended", "singlestep": false, "running": false} --> stop --> {"status": "paused", "singlestep": false, "running": false} >> # Example: >> # >> # -> { "execute": "stop" } >> @@ -153,7 +156,7 @@ >> ## >> # @cont: >> # >> -# Resume guest VCPU execution. >> +# Resume guest VCPU and VM execution. >> # >> # Since: 0.14 >> # >> @@ -165,6 +168,9 @@ ># Returns: If successful, nothing ># ># Notes: This command will succeed if the guest is currently running. ># It will also succeed if the guest is in the "inmigrate" state; ># in this case, the effect of the command is to make sure the >> # guest starts once migration finishes, removing the effect of the >> # -S command line option if it was passed. >> # >> +# If the VM was previously suspended, and not been reset or woken, >> +# this command will transition back to the "suspended" state. (Since >> 9.0) > > Long line. It fits in 80 columns, but perhaps this looks nicer: # If the VM was previously suspended, and not been reset or woken, # this command will transition back to the "suspended" state. # (Since 9.0) > What user-observable state transitions are possible here? {"status": "paused", "singlestep": false, "running": false} --> cont --> {"status": "suspended", "singlestep": false, "running": false} >> +# >> # Example: >> # >> # -> { "execute": "cont" } > > Should we update documentation of query-status, too? IMO no. The new behavior changes the status/RunState field only, and the domain of values does not change, only the transitions caused by the commands described here.
Re: [PATCH V8 00/12] fix migration of suspended runstate
On 12/20/2023 9:52 AM, Anthony PERARD wrote: > On Mon, Dec 18, 2023 at 01:14:51PM +0800, Peter Xu wrote: >> On Wed, Dec 13, 2023 at 10:35:33AM -0500, Steven Sistare wrote: >>> Hi Peter, all have RB's, with all i's dotted and t's crossed - steve >> >> Yes this seems to be more migration related so maybe good candidate for a >> pull from migration submodule. >> >> But since this is still solving a generic issue, I'm copying a few more >> people from get_maintainers.pl that this series touches, just in case >> they'll have something to say before dev cycle starts. > > I did a quick smoke test of migrating a Xen guest. It works fine for me. Thanks very much, I appreciate it - steve
Re: [PATCH V8 00/12] fix migration of suspended runstate
On 12/18/2023 12:14 AM, Peter Xu wrote: > On Wed, Dec 13, 2023 at 10:35:33AM -0500, Steven Sistare wrote: >> Hi Peter, all have RB's, with all i's dotted and t's crossed - steve > > Yes this seems to be more migration related so maybe good candidate for a > pull from migration submodule. > > But since this is still solving a generic issue, I'm copying a few more > people from get_maintainers.pl that this series touches, just in case > they'll have something to say before dev cycle starts. The key aspects are summarized by the cover letter and the commit messages pasted below for the first 6 patches: https://lore.kernel.org/qemu-devel/1702481421-375368-1-git-send-email-steven.sist...@oracle.com --- [PATCH V8 00/12] fix migration of suspended runstate Migration of a guest in the suspended runstate is broken. The incoming migration code automatically tries to wake the guest, which is wrong; the guest should end migration in the same runstate it started. Further, after saving a snapshot in the suspended state and loading it, the vm_start fails. The runstate is RUNNING, but the guest is not. --- [PATCH V8 01/12] cpus: vm_was_suspended Add a state variable to remember if a vm previously transitioned into a suspended state. --- [PATCH V8 02/12] cpus: stop vm in suspended runstate Currently, a vm in the suspended state is not completely stopped. The VCPUs have been paused, but the cpu clock still runs, and runstate notifiers for the transition to stopped have not been called. This causes problems for live migration. Stale cpu timers_state is saved to the migration stream, causing time errors in the guest when it wakes from suspend, and state that would have been modified by runstate notifiers is wrong. Modify vm_stop to completely stop the vm if the current state is suspended, transition to RUN_STATE_PAUSED, and remember that the machine was suspended. Modify vm_start to restore the suspended state. This affects all callers of vm_stop and vm_start, notably, the qapi stop and cont commands. For example: (qemu) info status VM status: paused (suspended) (qemu) stop (qemu) info status VM status: paused (qemu) system_wakeup Error: Unable to wake up: guest is not in suspended state (qemu) cont (qemu) info status VM status: paused (suspended) (qemu) system_wakeup (qemu) info status VM status: running --- [PATCH V8 03/12] cpus: check running not RUN_STATE_RUNNING When a vm transitions from running to suspended, runstate notifiers are not called, so the notifiers still think the vm is running. Hence, when we call vm_start to restore the suspended state, we call vm_state_notify with running=1. However, some notifiers check for RUN_STATE_RUNNING. They must check the running boolean instead. No functional change. --- [PATCH V8 04/12] cpus: vm_resume Define the vm_resume helper, for use in subsequent patches. --- [PATCH V8 05/12] migration: propagate suspended runstate If the outgoing machine was previously suspended, propagate that to the incoming side via global_state, so a subsequent vm_start restores the suspended state. To maintain backward and forward compatibility, reclaim some space from the runstate member. --- [PATCH V8 06/12] migration: preserve suspended runstate A guest that is migrated in the suspended state automaticaly wakes and continues execution. This is wrong; the guest should end migration in the same state it started. The root cause is that the outgoing migration code automatically wakes the guest, then saves the RUNNING runstate in global_state_store(), hence the incoming migration code thinks the guest is running and continues the guest if autostart is true. On the outgoing side, delete the call to qemu_system_wakeup_request(). Now that vm_stop completely stops a vm in the suspended state (from the preceding patches), the existing call to vm_stop_force_state is sufficient to correctly migrate all vmstate. On the incoming side, call vm_start if the pre-migration state was running or suspended. For the latter, vm_start correctly restores the suspended state, and a future system_wakeup monitor request will cause the vm to resume running. ---
Re: [PATCH V8 00/12] fix migration of suspended runstate
Hi Peter, all have RB's, with all i's dotted and t's crossed - steve On 12/13/2023 10:30 AM, Steve Sistare wrote: > Migration of a guest in the suspended runstate is broken. The incoming > migration code automatically tries to wake the guest, which is wrong; > the guest should end migration in the same runstate it started. Further, > after saving a snapshot in the suspended state and loading it, the vm_start > fails. The runstate is RUNNING, but the guest is not. > > See the commit messages for the details. > > Changes in V2: > * simplify "start on wakeup request" > * fix postcopy, snapshot, and background migration > * refactor fixes for each type of migration > * explicitly handled suspended events and runstate in tests > * add test for postcopy and background migration > > Changes in V3: > * rebase to tip > * fix hang in new function migrate_wait_for_dirty_mem > > Changes in V4: > * rebase to tip > * add patch for vm_prepare_start (thanks Peter) > * add patch to preserve cpu ticks > > Changes in V5: > * rebase to tip > * added patches to completely stop vm in suspended state: > cpus: refactor vm_stop > cpus: stop vm in suspended state > * added patch to partially resume vm in suspended state: > cpus: start vm in suspended state > * modified "preserve suspended ..." patches to use the above. > * deleted patch "preserve cpu ticks if suspended". stop ticks in > vm_stop_force_state instead. > * deleted patch "add runstate function". defined new helper function > migrate_new_runstate in "preserve suspended runstate" > * Added some RB's, but removed other RB's because the patches changed. > > Changes in V6: > * all vm_stop calls completely stop the suspended state > * refactored and updated the "cpus" patches > * simplified the "preserve suspended" patches > * added patch "bootfile per vm" > > Changes in V7: > * rebase to tip, add RB-s > * fix backwards compatibility for global_state.vm_was_suspended > * delete vm_prepare_start state argument, and rename patch > "pass runstate to vm_prepare_start" to > "check running not RUN_STATE_RUNNING" > * drop patches: > tests/qtest: bootfile per vm > tests/qtest: background migration with suspend > * rename runstate_is_started to runstate_is_live > * move wait_for_suspend in tests > > Changes in V8: > * rebase to tip > * add RB's > * add comment for runstate_is_live > * simplify global_state - the needed function, and its use of > vm_was_suspended > > Steve Sistare (12): > cpus: vm_was_suspended > cpus: stop vm in suspended runstate > cpus: check running not RUN_STATE_RUNNING > cpus: vm_resume > migration: propagate suspended runstate > migration: preserve suspended runstate > migration: preserve suspended for snapshot > migration: preserve suspended for bg_migration > tests/qtest: migration events > tests/qtest: option to suspend during migration > tests/qtest: precopy migration with suspend > tests/qtest: postcopy migration with suspend > > backends/tpm/tpm_emulator.c | 2 +- > hw/usb/hcd-ehci.c| 2 +- > hw/usb/redirect.c| 2 +- > hw/xen/xen-hvm-common.c | 2 +- > include/migration/snapshot.h | 7 ++ > include/sysemu/runstate.h| 20 > migration/global_state.c | 47 + > migration/migration-hmp-cmds.c | 8 +- > migration/migration.c| 15 +-- > migration/savevm.c | 23 +++-- > qapi/misc.json | 10 +- > system/cpus.c| 47 +++-- > system/runstate.c| 9 ++ > system/vl.c | 2 + > tests/migration/i386/Makefile| 5 +- > tests/migration/i386/a-b-bootblock.S | 50 +- > tests/migration/i386/a-b-bootblock.h | 26 +++-- > tests/qtest/migration-helpers.c | 27 ++ > tests/qtest/migration-helpers.h | 11 ++- > tests/qtest/migration-test.c | 181 > +-- > 20 files changed, 352 insertions(+), 144 deletions(-) >
Re: [PATCH V8 02/12] cpus: stop vm in suspended runstate
FYI for Markus and Blake. No change since V6 and V7. - Steve On 12/13/2023 10:30 AM, Steve Sistare wrote: > Currently, a vm in the suspended state is not completely stopped. The VCPUs > have been paused, but the cpu clock still runs, and runstate notifiers for > the transition to stopped have not been called. This causes problems for > live migration. Stale cpu timers_state is saved to the migration stream, > causing time errors in the guest when it wakes from suspend, and state that > would have been modified by runstate notifiers is wrong. > > Modify vm_stop to completely stop the vm if the current state is suspended, > transition to RUN_STATE_PAUSED, and remember that the machine was suspended. > Modify vm_start to restore the suspended state. > > This affects all callers of vm_stop and vm_start, notably, the qapi stop and > cont commands. For example: > > (qemu) info status > VM status: paused (suspended) > > (qemu) stop > (qemu) info status > VM status: paused > > (qemu) system_wakeup > Error: Unable to wake up: guest is not in suspended state > > (qemu) cont > (qemu) info status > VM status: paused (suspended) > > (qemu) system_wakeup > (qemu) info status > VM status: running > > Suggested-by: Peter Xu > Signed-off-by: Steve Sistare > Reviewed-by: Peter Xu > --- > include/sysemu/runstate.h | 9 + > qapi/misc.json| 10 -- > system/cpus.c | 23 +++ > system/runstate.c | 3 +++ > 4 files changed, 35 insertions(+), 10 deletions(-) > > diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h > index 88a67e2..618eb49 100644 > --- a/include/sysemu/runstate.h > +++ b/include/sysemu/runstate.h > @@ -40,6 +40,15 @@ static inline bool shutdown_caused_by_guest(ShutdownCause > cause) > return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN; > } > > +/* > + * In a "live" state, the vcpu clock is ticking, and the runstate notifiers > + * think we are running. > + */ > +static inline bool runstate_is_live(RunState state) > +{ > +return state == RUN_STATE_RUNNING || state == RUN_STATE_SUSPENDED; > +} > + > void vm_start(void); > > /** > diff --git a/qapi/misc.json b/qapi/misc.json > index cda2eff..efb8d44 100644 > --- a/qapi/misc.json > +++ b/qapi/misc.json > @@ -134,7 +134,7 @@ > ## > # @stop: > # > -# Stop all guest VCPU execution. > +# Stop all guest VCPU and VM execution. > # > # Since: 0.14 > # > @@ -143,6 +143,9 @@ > # the guest remains paused once migration finishes, as if the -S > # option was passed on the command line. > # > +# In the "suspended" state, it will completely stop the VM and > +# cause a transition to the "paused" state. (Since 9.0) > +# > # Example: > # > # -> { "execute": "stop" } > @@ -153,7 +156,7 @@ > ## > # @cont: > # > -# Resume guest VCPU execution. > +# Resume guest VCPU and VM execution. > # > # Since: 0.14 > # > @@ -165,6 +168,9 @@ > # guest starts once migration finishes, removing the effect of the > # -S command line option if it was passed. > # > +# If the VM was previously suspended, and not been reset or woken, > +# this command will transition back to the "suspended" state. (Since 9.0) > +# > # Example: > # > # -> { "execute": "cont" } > diff --git a/system/cpus.c b/system/cpus.c > index 9f631ab..f162435 100644 > --- a/system/cpus.c > +++ b/system/cpus.c > @@ -277,11 +277,15 @@ bool vm_get_suspended(void) > static int do_vm_stop(RunState state, bool send_stop) > { > int ret = 0; > +RunState oldstate = runstate_get(); > > -if (runstate_is_running()) { > +if (runstate_is_live(oldstate)) { > +vm_was_suspended = (oldstate == RUN_STATE_SUSPENDED); > runstate_set(state); > cpu_disable_ticks(); > -pause_all_vcpus(); > +if (oldstate == RUN_STATE_RUNNING) { > +pause_all_vcpus(); > +} > vm_state_notify(0, state); > if (send_stop) { > qapi_event_send_stop(); > @@ -694,11 +698,13 @@ int vm_stop(RunState state) > > /** > * Prepare for (re)starting the VM. > - * Returns -1 if the vCPUs are not to be restarted (e.g. if they are already > - * running or in case of an error condition), 0 otherwise. > + * Returns 0 if the vCPUs should be restarted, -1 on an error condition, > + * and 1 otherwise. > */ > int vm_prepare_start(bool step_pending) > { > +int ret = vm_was_suspended ? 1 : 0; > +RunState state = vm_was_suspended ? RUN_STATE_SUSPENDED : > RUN_STATE_RUNNING; > RunState requested; > > qemu_vmstop_requested(); > @@ -729,9 +735,10 @@ int vm_prepare_start(bool step_pending) > qapi_event_send_resume(); > > cpu_enable_ticks(); > -runstate_set(RUN_STATE_RUNNING); > -vm_state_notify(1, RUN_STATE_RUNNING); > -return 0; > +runstate_set(state); > +vm_state_notify(1, state); > +vm_was_suspended = false; > +
Re: [PATCH V7 02/12] cpus: stop vm in suspended runstate
Hi Markus, Eric, any comment about this change in stop/cont behavior before it gets pulled? There is no change since V6. - Steve On 12/6/2023 12:23 PM, Steve Sistare wrote: > Currently, a vm in the suspended state is not completely stopped. The VCPUs > have been paused, but the cpu clock still runs, and runstate notifiers for > the transition to stopped have not been called. This causes problems for > live migration. Stale cpu timers_state is saved to the migration stream, > causing time errors in the guest when it wakes from suspend, and state that > would have been modified by runstate notifiers is wrong. > > Modify vm_stop to completely stop the vm if the current state is suspended, > transition to RUN_STATE_PAUSED, and remember that the machine was suspended. > Modify vm_start to restore the suspended state. > > This affects all callers of vm_stop and vm_start, notably, the qapi stop and > cont commands. For example: > > (qemu) info status > VM status: paused (suspended) > > (qemu) stop > (qemu) info status > VM status: paused > > (qemu) system_wakeup > Error: Unable to wake up: guest is not in suspended state > > (qemu) cont > (qemu) info status > VM status: paused (suspended) > > (qemu) system_wakeup > (qemu) info status > VM status: running > > Suggested-by: Peter Xu > Signed-off-by: Steve Sistare > Reviewed-by: Peter Xu > --- > include/sysemu/runstate.h | 5 + > qapi/misc.json| 10 -- > system/cpus.c | 23 +++ > system/runstate.c | 3 +++ > 4 files changed, 31 insertions(+), 10 deletions(-) > > diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h > index 88a67e2..867e53c 100644 > --- a/include/sysemu/runstate.h > +++ b/include/sysemu/runstate.h > @@ -40,6 +40,11 @@ static inline bool shutdown_caused_by_guest(ShutdownCause > cause) > return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN; > } > > +static inline bool runstate_is_live(RunState state) > +{ > +return state == RUN_STATE_RUNNING || state == RUN_STATE_SUSPENDED; > +} > + > void vm_start(void); > > /** > diff --git a/qapi/misc.json b/qapi/misc.json > index cda2eff..efb8d44 100644 > --- a/qapi/misc.json > +++ b/qapi/misc.json > @@ -134,7 +134,7 @@ > ## > # @stop: > # > -# Stop all guest VCPU execution. > +# Stop all guest VCPU and VM execution. > # > # Since: 0.14 > # > @@ -143,6 +143,9 @@ > # the guest remains paused once migration finishes, as if the -S > # option was passed on the command line. > # > +# In the "suspended" state, it will completely stop the VM and > +# cause a transition to the "paused" state. (Since 9.0) > +# > # Example: > # > # -> { "execute": "stop" } > @@ -153,7 +156,7 @@ > ## > # @cont: > # > -# Resume guest VCPU execution. > +# Resume guest VCPU and VM execution. > # > # Since: 0.14 > # > @@ -165,6 +168,9 @@ > # guest starts once migration finishes, removing the effect of the > # -S command line option if it was passed. > # > +# If the VM was previously suspended, and not been reset or woken, > +# this command will transition back to the "suspended" state. (Since 9.0) > +# > # Example: > # > # -> { "execute": "cont" } > diff --git a/system/cpus.c b/system/cpus.c > index 9f631ab..f162435 100644 > --- a/system/cpus.c > +++ b/system/cpus.c > @@ -277,11 +277,15 @@ bool vm_get_suspended(void) > static int do_vm_stop(RunState state, bool send_stop) > { > int ret = 0; > +RunState oldstate = runstate_get(); > > -if (runstate_is_running()) { > +if (runstate_is_live(oldstate)) { > +vm_was_suspended = (oldstate == RUN_STATE_SUSPENDED); > runstate_set(state); > cpu_disable_ticks(); > -pause_all_vcpus(); > +if (oldstate == RUN_STATE_RUNNING) { > +pause_all_vcpus(); > +} > vm_state_notify(0, state); > if (send_stop) { > qapi_event_send_stop(); > @@ -694,11 +698,13 @@ int vm_stop(RunState state) > > /** > * Prepare for (re)starting the VM. > - * Returns -1 if the vCPUs are not to be restarted (e.g. if they are already > - * running or in case of an error condition), 0 otherwise. > + * Returns 0 if the vCPUs should be restarted, -1 on an error condition, > + * and 1 otherwise. > */ > int vm_prepare_start(bool step_pending) > { > +int ret = vm_was_suspended ? 1 : 0; > +RunState state = vm_was_suspended ? RUN_STATE_SUSPENDED : > RUN_STATE_RUNNING; > RunState requested; > > qemu_vmstop_requested(); > @@ -729,9 +735,10 @@ int vm_prepare_start(bool step_pending) > qapi_event_send_resume(); > > cpu_enable_ticks(); > -runstate_set(RUN_STATE_RUNNING); > -vm_state_notify(1, RUN_STATE_RUNNING); > -return 0; > +runstate_set(state); > +vm_state_notify(1, state); > +vm_was_suspended = false; > +return ret; > } > > void vm_start(void) > @@ -745,7
Re: [PATCH V7 00/12] fix migration of suspended runstate
On 12/11/2023 1:56 AM, Peter Xu wrote: > On Wed, Dec 06, 2023 at 12:30:02PM -0500, Steven Sistare wrote: >> cpus: stop vm in suspended runstate > > This patch still didn't copy the QAPI maintainers, please remember to do so > in a new post. > > Maybe it would be easier to move the QAPI doc changes into a separate > patch? This was intentional. I did not cc them for the whole series to spare them from excess email. You cc'd them for "[PATCH V6 03/14] cpus: stop vm in suspended runstate" they had no comments, and there is no change for V7, so I assumed we are OK, but I will cc them again for that patch to be sure. - Steve
Re: [PATCH V7 05/12] migration: propagate suspended runstate
On 12/11/2023 1:46 AM, Peter Xu wrote: > On Wed, Dec 06, 2023 at 09:23:30AM -0800, Steve Sistare wrote: >> If the outgoing machine was previously suspended, propagate that to the >> incoming side via global_state, so a subsequent vm_start restores the >> suspended state. To maintain backward and forward compatibility, reclaim >> some space from the runstate member. >> >> Signed-off-by: Steve Sistare > > Reviewed-by: Peter Xu > > One nitpick below. > >> --- >> migration/global_state.c | 35 +-- >> 1 file changed, 33 insertions(+), 2 deletions(-) >> >> diff --git a/migration/global_state.c b/migration/global_state.c >> index 4e2a9d8..d4f61a1 100644 >> --- a/migration/global_state.c >> +++ b/migration/global_state.c >> @@ -22,7 +22,16 @@ >> >> typedef struct { >> uint32_t size; >> -uint8_t runstate[100]; >> + >> +/* >> + * runstate was 100 bytes, zero padded, but we trimmed it to add a >> + * few fields and maintain backwards compatibility. >> + */ >> +uint8_t runstate[32]; >> +uint8_t has_vm_was_suspended; >> +uint8_t vm_was_suspended; >> +uint8_t unused[66]; >> + >> RunState state; >> bool received; >> } GlobalState; >> @@ -35,6 +44,10 @@ static void global_state_do_store(RunState state) >> assert(strlen(state_str) < sizeof(global_state.runstate)); >> strpadcpy((char *)global_state.runstate, sizeof(global_state.runstate), >>state_str, '\0'); >> +global_state.has_vm_was_suspended = true; >> +global_state.vm_was_suspended = vm_get_suspended(); >> + >> +memset(global_state.unused, 0, sizeof(global_state.unused)); >> } >> >> void global_state_store(void) >> @@ -68,6 +81,12 @@ static bool global_state_needed(void *opaque) >> return true; >> } >> >> +/* If the suspended state must be remembered, it is needed */ >> + >> +if (vm_get_suspended()) { >> +return true; >> +} > > Can we drop this section? > > I felt unsafe when QEMU can overwrite the option even if user explicitly > specified store-global-state=off but we still send this.. Ideally I think > it's better if it's as simple as: > > static bool global_state_needed(void *opaque) > { > return migrate_get_current()->store_global_state; > } I agree, I also did not see the point of dropping global_state for some states. I will simplify it to this. - Steve > I don't think we can remove the old trick due to compatibility reasons, but > maybe nice to not add new ones to make this section more unpredictable in > the migration stream? > > IMHO it shouldn't matter in reality for the current use case even dropping > it, as I don't expect any non-Xen QEMU VMs migrates without having the > option turned on (store-global-state=on) after QEMU 2.4. > > Thanks, >
Re: [PATCH V7 05/12] migration: propagate suspended runstate
On 12/8/2023 11:37 AM, Fabiano Rosas wrote: > Steve Sistare writes: > >> If the outgoing machine was previously suspended, propagate that to the >> incoming side via global_state, so a subsequent vm_start restores the >> suspended state. To maintain backward and forward compatibility, reclaim >> some space from the runstate member. >> >> Signed-off-by: Steve Sistare >> --- >> migration/global_state.c | 35 +-- >> 1 file changed, 33 insertions(+), 2 deletions(-) >> >> diff --git a/migration/global_state.c b/migration/global_state.c >> index 4e2a9d8..d4f61a1 100644 >> --- a/migration/global_state.c >> +++ b/migration/global_state.c >> @@ -22,7 +22,16 @@ >> >> typedef struct { >> uint32_t size; >> -uint8_t runstate[100]; >> + >> +/* >> + * runstate was 100 bytes, zero padded, but we trimmed it to add a >> + * few fields and maintain backwards compatibility. >> + */ >> +uint8_t runstate[32]; >> +uint8_t has_vm_was_suspended; >> +uint8_t vm_was_suspended; >> +uint8_t unused[66]; >> + >> RunState state; >> bool received; >> } GlobalState; >> @@ -35,6 +44,10 @@ static void global_state_do_store(RunState state) >> assert(strlen(state_str) < sizeof(global_state.runstate)); >> strpadcpy((char *)global_state.runstate, sizeof(global_state.runstate), >>state_str, '\0'); >> +global_state.has_vm_was_suspended = true; >> +global_state.vm_was_suspended = vm_get_suspended(); >> + >> +memset(global_state.unused, 0, sizeof(global_state.unused)); >> } >> >> void global_state_store(void) >> @@ -68,6 +81,12 @@ static bool global_state_needed(void *opaque) >> return true; >> } >> >> +/* If the suspended state must be remembered, it is needed */ >> + >> +if (vm_get_suspended()) { >> +return true; >> +} >> + >> /* If state is running or paused, it is not needed */ >> >> if (strcmp(runstate, "running") == 0 || >> @@ -85,6 +104,7 @@ static int global_state_post_load(void *opaque, int >> version_id) >> Error *local_err = NULL; >> int r; >> char *runstate = (char *)s->runstate; >> +bool vm_was_suspended = s->has_vm_was_suspended && s->vm_was_suspended; > > Why do you need has_vm_was_suspended? If they're always read like this, > then one flag should do, no? has_vm_was_suspended=0 means a legacy qemu. Currently the dest has no reason to care, and we treat that as a vm_was_suspended=0 case, but I want to keep that field in case we ever need to know. But you are correct that this line can simply be: bool vm_was_suspended = s->vm_was_suspended; - Steve >> s->received = true; >> trace_migrate_global_state_post_load(runstate); >> @@ -93,7 +113,7 @@ static int global_state_post_load(void *opaque, int >> version_id) >> sizeof(s->runstate)) == sizeof(s->runstate)) { >> /* >> * This condition should never happen during migration, because >> - * all runstate names are shorter than 100 bytes (the size of >> + * all runstate names are shorter than 32 bytes (the size of >> * s->runstate). However, a malicious stream could overflow >> * the qapi_enum_parse() call, so we force the last character >> * to a NUL byte. >> @@ -110,6 +130,14 @@ static int global_state_post_load(void *opaque, int >> version_id) >> } >> s->state = r; >> >> +/* >> + * global_state is saved on the outgoing side before forcing a stopped >> + * state, so it may have saved state=suspended and vm_was_suspended=0. >> + * Now we are in a paused state, and when we later call vm_start, it >> must >> + * restore the suspended state, so we must set vm_was_suspended=1 here. >> + */ >> +vm_set_suspended(vm_was_suspended || r == RUN_STATE_SUSPENDED); >> + >> return 0; >> } >> >> @@ -134,6 +162,9 @@ static const VMStateDescription vmstate_globalstate = { >> .fields = (VMStateField[]) { >> VMSTATE_UINT32(size, GlobalState), >> VMSTATE_BUFFER(runstate, GlobalState), >> +VMSTATE_UINT8(has_vm_was_suspended, GlobalState), >> +VMSTATE_UINT8(vm_was_suspended, GlobalState), >> +VMSTATE_BUFFER(unused, GlobalState), >> VMSTATE_END_OF_LIST() >> }, >> };
Re: [PATCH V7 02/12] cpus: stop vm in suspended runstate
On 12/6/2023 1:45 PM, Philippe Mathieu-Daudé wrote: > Hi Steve, > > On 6/12/23 18:23, Steve Sistare wrote: >> Currently, a vm in the suspended state is not completely stopped. The VCPUs >> have been paused, but the cpu clock still runs, and runstate notifiers for >> the transition to stopped have not been called. This causes problems for >> live migration. Stale cpu timers_state is saved to the migration stream, >> causing time errors in the guest when it wakes from suspend, and state that >> would have been modified by runstate notifiers is wrong. >> >> Modify vm_stop to completely stop the vm if the current state is suspended, >> transition to RUN_STATE_PAUSED, and remember that the machine was suspended. >> Modify vm_start to restore the suspended state. >> >> This affects all callers of vm_stop and vm_start, notably, the qapi stop and >> cont commands. For example: >> >> (qemu) info status >> VM status: paused (suspended) >> >> (qemu) stop >> (qemu) info status >> VM status: paused >> >> (qemu) system_wakeup >> Error: Unable to wake up: guest is not in suspended state >> >> (qemu) cont >> (qemu) info status >> VM status: paused (suspended) >> >> (qemu) system_wakeup >> (qemu) info status >> VM status: running >> >> Suggested-by: Peter Xu >> Signed-off-by: Steve Sistare >> Reviewed-by: Peter Xu >> --- >> include/sysemu/runstate.h | 5 + >> qapi/misc.json | 10 -- >> system/cpus.c | 23 +++ >> system/runstate.c | 3 +++ >> 4 files changed, 31 insertions(+), 10 deletions(-) >> >> diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h >> index 88a67e2..867e53c 100644 >> --- a/include/sysemu/runstate.h >> +++ b/include/sysemu/runstate.h >> @@ -40,6 +40,11 @@ static inline bool shutdown_caused_by_guest(ShutdownCause >> cause) >> return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN; >> } >> +static inline bool runstate_is_live(RunState state) >> +{ >> + return state == RUN_STATE_RUNNING || state == RUN_STATE_SUSPENDED; >> +} > > Not being familiar with (live) migration, from a generic vCPU PoV > I don't get what "runstate_is_live" means. Can we add a comment > explaining what this helper is for? Sure. "live" means the cpu clock is ticking, and the runstate notifiers think we are running. It is everything that is enabled in vm_start except for starting the vcpus. > Since this is a migration particular case, maybe we can be verbose > in vm_resume() and keep runstate_is_live() -- eventually undocumented > -- in migration/migration.c. runstate_is_live is about cpu and vm state, not migration state (the "live" is not live migration), and is used in multiple places in cpus code and elsewhere, so I would like to keep it in runstate.h. It has a specific meaning, and it is useful to search for it to see who handles "liveness", and distinguish it from code that checks the running and suspended states for other reasons. - Steve > void vm_resume(RunState state) > { > switch (state) { > case RUN_STATE_RUNNING: > case RUN_STATE_SUSPENDED: > vm_start(); > break; > default: > runstate_set(state); > } > }
Re: [PATCH V7 00/12] fix migration of suspended runstate
FYI, these patches still need RB: migration: propagate suspended runstate tests/qtest: precopy migration with suspend tests/qtest: postcopy migration with suspend This has RB, but the interaction between vm_start and vm_prepare_start changed, so needs another look. cpus: stop vm in suspended runstate - Steve On 12/6/2023 12:23 PM, Steve Sistare wrote: > Migration of a guest in the suspended runstate is broken. The incoming > migration code automatically tries to wake the guest, which is wrong; > the guest should end migration in the same runstate it started. Further, > after saving a snapshot in the suspended state and loading it, the vm_start > fails. The runstate is RUNNING, but the guest is not. > > See the commit messages for the details. > > Changes in V2: > * simplify "start on wakeup request" > * fix postcopy, snapshot, and background migration > * refactor fixes for each type of migration > * explicitly handled suspended events and runstate in tests > * add test for postcopy and background migration > > Changes in V3: > * rebase to tip > * fix hang in new function migrate_wait_for_dirty_mem > > Changes in V4: > * rebase to tip > * add patch for vm_prepare_start (thanks Peter) > * add patch to preserve cpu ticks > > Changes in V5: > * rebase to tip > * added patches to completely stop vm in suspended state: > cpus: refactor vm_stop > cpus: stop vm in suspended state > * added patch to partially resume vm in suspended state: > cpus: start vm in suspended state > * modified "preserve suspended ..." patches to use the above. > * deleted patch "preserve cpu ticks if suspended". stop ticks in > vm_stop_force_state instead. > * deleted patch "add runstate function". defined new helper function > migrate_new_runstate in "preserve suspended runstate" > * Added some RB's, but removed other RB's because the patches changed. > > Changes in V6: > * all vm_stop calls completely stop the suspended state > * refactored and updated the "cpus" patches > * simplified the "preserve suspended" patches > * added patch "bootfile per vm" > > Changes in V7: > * rebase to tip, add RB-s > * fix backwards compatibility for global_state.vm_was_suspended > * delete vm_prepare_start state argument, and rename patch > "pass runstate to vm_prepare_start" to > "check running not RUN_STATE_RUNNING" > * drop patches: > tests/qtest: bootfile per vm > tests/qtest: background migration with suspend > * rename runstate_is_started to runstate_is_live > * move wait_for_suspend in tests > > Steve Sistare (12): > cpus: vm_was_suspended > cpus: stop vm in suspended runstate > cpus: check running not RUN_STATE_RUNNING > cpus: vm_resume > migration: propagate suspended runstate > migration: preserve suspended runstate > migration: preserve suspended for snapshot > migration: preserve suspended for bg_migration > tests/qtest: migration events > tests/qtest: option to suspend during migration > tests/qtest: precopy migration with suspend > tests/qtest: postcopy migration with suspend > > backends/tpm/tpm_emulator.c | 2 +- > hw/usb/hcd-ehci.c| 2 +- > hw/usb/redirect.c| 2 +- > hw/xen/xen-hvm-common.c | 2 +- > include/migration/snapshot.h | 7 ++ > include/sysemu/runstate.h| 16 > migration/global_state.c | 35 ++- > migration/migration-hmp-cmds.c | 8 +- > migration/migration.c| 15 +-- > migration/savevm.c | 23 +++-- > qapi/misc.json | 10 +- > system/cpus.c| 47 +++-- > system/runstate.c| 9 ++ > system/vl.c | 2 + > tests/migration/i386/Makefile| 5 +- > tests/migration/i386/a-b-bootblock.S | 50 +- > tests/migration/i386/a-b-bootblock.h | 26 +++-- > tests/qtest/migration-helpers.c | 27 ++ > tests/qtest/migration-helpers.h | 11 ++- > tests/qtest/migration-test.c | 181 > +-- > 20 files changed, 354 insertions(+), 126 deletions(-) >
Re: [PATCH V7 00/12] fix migration of suspended runstate
Gack, there was cruft in my send mail directory. Please ignore this threaded series, and I will resend. Sorry for the noise. - Steve On 12/6/2023 12:12 PM, Steve Sistare wrote: > Migration of a guest in the suspended runstate is broken. The incoming > migration code automatically tries to wake the guest, which is wrong; > the guest should end migration in the same runstate it started. Further, > after saving a snapshot in the suspended state and loading it, the vm_start > fails. The runstate is RUNNING, but the guest is not. > > See the commit messages for the details. > > Changes in V2: > * simplify "start on wakeup request" > * fix postcopy, snapshot, and background migration > * refactor fixes for each type of migration > * explicitly handled suspended events and runstate in tests > * add test for postcopy and background migration > > Changes in V3: > * rebase to tip > * fix hang in new function migrate_wait_for_dirty_mem > > Changes in V4: > * rebase to tip > * add patch for vm_prepare_start (thanks Peter) > * add patch to preserve cpu ticks > > Changes in V5: > * rebase to tip > * added patches to completely stop vm in suspended state: > cpus: refactor vm_stop > cpus: stop vm in suspended state > * added patch to partially resume vm in suspended state: > cpus: start vm in suspended state > * modified "preserve suspended ..." patches to use the above. > * deleted patch "preserve cpu ticks if suspended". stop ticks in > vm_stop_force_state instead. > * deleted patch "add runstate function". defined new helper function > migrate_new_runstate in "preserve suspended runstate" > * Added some RB's, but removed other RB's because the patches changed. > > Changes in V6: > * all vm_stop calls completely stop the suspended state > * refactored and updated the "cpus" patches > * simplified the "preserve suspended" patches > * added patch "bootfile per vm" > > Changes in V7: > * rebase to tip, add RB-s > * fix backwards compatibility for global_state.vm_was_suspended > * delete vm_prepare_start state argument, and rename patch > "pass runstate to vm_prepare_start" to > "check running not RUN_STATE_RUNNING" > * drop patches: > tests/qtest: bootfile per vm > tests/qtest: background migration with suspend > * rename runstate_is_started to runstate_is_live > * move wait_for_suspend in tests > > Steve Sistare (12): > cpus: vm_was_suspended > cpus: stop vm in suspended runstate > cpus: check running not RUN_STATE_RUNNING > cpus: vm_resume > migration: propagate suspended runstate > migration: preserve suspended runstate > migration: preserve suspended for snapshot > migration: preserve suspended for bg_migration > tests/qtest: migration events > tests/qtest: option to suspend during migration > tests/qtest: precopy migration with suspend > tests/qtest: postcopy migration with suspend > > backends/tpm/tpm_emulator.c | 2 +- > hw/usb/hcd-ehci.c| 2 +- > hw/usb/redirect.c| 2 +- > hw/xen/xen-hvm-common.c | 2 +- > include/migration/snapshot.h | 7 ++ > include/sysemu/runstate.h| 16 > migration/global_state.c | 35 ++- > migration/migration-hmp-cmds.c | 8 +- > migration/migration.c| 15 +-- > migration/savevm.c | 23 +++-- > qapi/misc.json | 10 +- > system/cpus.c| 47 +++-- > system/runstate.c| 9 ++ > system/vl.c | 2 + > tests/migration/i386/Makefile| 5 +- > tests/migration/i386/a-b-bootblock.S | 50 +- > tests/migration/i386/a-b-bootblock.h | 26 +++-- > tests/qtest/migration-helpers.c | 27 ++ > tests/qtest/migration-helpers.h | 11 ++- > tests/qtest/migration-test.c | 181 > +-- > 20 files changed, 354 insertions(+), 126 deletions(-) >
Re: [PATCH V6 00/14] fix migration of suspended runstate
Hi Peter and Fabiano, Any comments on these patches, before I send V7? They are not affected by the other changes we have discussed, except for renaming runstate_is_started to runstate_is_live. [PATCH V6 04/14] cpus: vm_resume [PATCH V6 06/14] migration: preserve suspended runstate [PATCH V6 07/14] migration: preserve suspended for snapshot [PATCH V6 08/14] migration: preserve suspended for bg_migration - Steve On 11/30/2023 4:37 PM, Steve Sistare wrote: > Migration of a guest in the suspended runstate is broken. The incoming > migration code automatically tries to wake the guest, which is wrong; > the guest should end migration in the same runstate it started. Further, > after saving a snapshot in the suspended state and loading it, the vm_start > fails. The runstate is RUNNING, but the guest is not. > > See the commit messages for the details. > > Changes in V2: > * simplify "start on wakeup request" > * fix postcopy, snapshot, and background migration > * refactor fixes for each type of migration > * explicitly handled suspended events and runstate in tests > * add test for postcopy and background migration > > Changes in V3: > * rebase to tip > * fix hang in new function migrate_wait_for_dirty_mem > > Changes in V4: > * rebase to tip > * add patch for vm_prepare_start (thanks Peter) > * add patch to preserve cpu ticks > > Changes in V5: > * rebase to tip > * added patches to completely stop vm in suspended state: > cpus: refactor vm_stop > cpus: stop vm in suspended state > * added patch to partially resume vm in suspended state: > cpus: start vm in suspended state > * modified "preserve suspended ..." patches to use the above. > * deleted patch "preserve cpu ticks if suspended". stop ticks in > vm_stop_force_state instead. > * deleted patch "add runstate function". defined new helper function > migrate_new_runstate in "preserve suspended runstate" > * Added some RB's, but removed other RB's because the patches changed. > > Changes in V6: > * all vm_stop calls completely stop the suspended state > * refactored and updated the "cpus" patches > * simplified the "preserve suspended" patches > * added patch "bootfile per vm" > > Steve Sistare (14): > cpus: pass runstate to vm_prepare_start > cpus: vm_was_suspended > cpus: stop vm in suspended runstate > cpus: vm_resume > migration: propagate suspended runstate > migration: preserve suspended runstate > migration: preserve suspended for snapshot > migration: preserve suspended for bg_migration > tests/qtest: migration events > tests/qtest: option to suspend during migration > tests/qtest: precopy migration with suspend > tests/qtest: postcopy migration with suspend > tests/qtest: bootfile per vm > tests/qtest: background migration with suspend > > backends/tpm/tpm_emulator.c | 2 +- > gdbstub/system.c | 2 +- > hw/usb/hcd-ehci.c| 2 +- > hw/usb/redirect.c| 2 +- > hw/xen/xen-hvm-common.c | 2 +- > include/migration/snapshot.h | 7 + > include/sysemu/runstate.h| 19 ++- > migration/global_state.c | 10 ++ > migration/migration-hmp-cmds.c | 8 +- > migration/migration.c| 15 +-- > migration/savevm.c | 23 ++-- > qapi/misc.json | 10 +- > system/cpus.c| 49 +-- > system/runstate.c| 9 ++ > system/vl.c | 2 + > tests/migration/i386/Makefile| 5 +- > tests/migration/i386/a-b-bootblock.S | 50 +++- > tests/migration/i386/a-b-bootblock.h | 26 ++-- > tests/qtest/migration-helpers.c | 27 ++-- > tests/qtest/migration-helpers.h | 11 +- > tests/qtest/migration-test.c | 240 > +-- > 21 files changed, 382 insertions(+), 139 deletions(-) >
Re: [PATCH V6 13/14] tests/qtest: bootfile per vm
On 12/4/2023 5:37 PM, Peter Xu wrote: > On Mon, Dec 04, 2023 at 06:13:36PM -0300, Fabiano Rosas wrote: >> Steve Sistare writes: >> >>> Create a separate bootfile for the outgoing and incoming vm, so the block >>> layer can lock the file during the background migration test. Otherwise, >>> the test fails with: >>> "Failed to get "write" lock. Is another process using the image >>>[/tmp/migration-test-WAKPD2/bootsect]?" >> >> Hm.. what is the background migration even trying to access on the boot >> disk? @Peter? > > I didn't yet notice this patch until you asked, but background snapshot is > not designed to be used like this, afaict. > > It should normally be used when someone would like to use "savevm", then > background snapshot makes that snapshot save happen with VM running (live) > and mostly as performant as "savevm" due to page write protections (IOW, it > is not dirty tracking, but wr-protect each page so not writtable at all > until unprotected). > > Another difference (from "savevm") is, instead of storing that image onto > the block images, it stores that image also separately just like migrating > with "file:" as of now. > > When the dest QEMU starts it'll try to grab the image lock already because > it should never run with src running, just like when "loadvm" QEMU doesn't > assume the QEMU that ran "savevm" will be running. > >> >> This might be a good use for the -snapshot option. It should stop any >> attempt to get the write lock. Not a lot of difference, but slightly >> simpler. > > We don't yet have a background-snapshot test case. If we ever need, > that'll need to be done in two steps: start src, save snapshot into file, > start dest, load from snapshot file. We just shouldn't boot two together. > > Now after two years when I re-read the snapshot code a bit, I didn't even > find where QEMU took the disk snapshots.. logically it should be done at > the start of live background snapshot when VM was dumping device states, > something like bdrv_all_can_snapshot() orshould be needed to make sure all > images support snapshot on its own or it should already fail, and take > snapshots to match the image. > > IOW, I don't even think current raw disk would be able to support > background snapshot at all, otherwise if VM is live I don't see a way to > match the image (which is still lively updated by the running VM) to a live > snapshot taken. Copy the author, Andrey, for this question. > > Before that is confirmed, maybe the easiest way is we can go without a > background snapshot test case for suspend vm scenario. I am OK with dropping the test patches. It's a lot of change for a trivial test. tests/qtest: background migration with suspend tests/qtest: bootfile per vm - Steve
Re: [PATCH V6 05/14] migration: propagate suspended runstate
On 12/5/2023 11:50 AM, Peter Xu wrote: > On Mon, Dec 04, 2023 at 05:23:57PM -0500, Steven Sistare wrote: >> The V6 code does not pass a state to vm_start, and knowledge of >> vm_was_suspended >> is confined to the global_state and cpus functions. IMO this is a more >> modular >> and robust solution, as multiple sites may call vm_start(), and the right >> thing >> happens. Look at patch 6. The changes are minimal because vm_start "just >> works". > > Oh I think I see what you meant. Sounds good then. > > Shall we hide that into vm_prepare_start()? It seems three's still one > more call sites that always pass in RUNNING (gdb_continue_partial). > > If with above, vm_prepare_start() will go into either RUNNING, SUSPENDED, > or an error. It returns 0 only if RUNNING, -1 for all the rest. Maybe we > can already also touch up the retval of vm_prepare_start() to be a boolean, > reflecting "whether vcpu needs to be started". Yes, that is even nicer, thanks. /** * Prepare for (re)starting the VM. * Returns 0 if the vCPUs should be restarted, -1 on an error condition, * and 1 otherwise. */ int vm_prepare_start(bool step_pending) { int ret = vm_was_suspended ? 1 : 0; RunState state = vm_was_suspended ? RUN_STATE_SUSPENDED : RUN_STATE_RUNNING; ... vm_was_suspended = false; return ret; } void vm_start(void) { if (!vm_prepare_start(false)) { resume_all_vcpus(); } } - Steve
Re: [PATCH V6 05/14] migration: propagate suspended runstate
On 12/5/2023 11:52 AM, Fabiano Rosas wrote: > Peter Xu writes: > >> On Tue, Dec 05, 2023 at 09:44:12AM -0300, Fabiano Rosas wrote: >>> Peter Xu writes: >>> On Mon, Dec 04, 2023 at 06:09:16PM -0300, Fabiano Rosas wrote: > Right, I got your point. I just think we could avoid designing this new > string format by creating new fields with the extra space: > > typedef struct QEMU_PACKED { > uint32_t size; > uint8_t runstate[50]; > uint8_t unused[50]; > RunState state; > bool received; > } GlobalState; > > In my mind this works seamlessly, or am I mistaken? I think what you proposed should indeed work. Currently it's: .fields = (VMStateField[]) { VMSTATE_UINT32(size, GlobalState), VMSTATE_BUFFER(runstate, GlobalState), VMSTATE_END_OF_LIST() }, I had a quick look at vmstate_info_buffer, it mostly only get()/put() those buffers with its sizeof(), so looks all fine. For sure in all cases we'd better test it to verify. One side note is since we so far use qapi_enum_parse() for the runstate, I think the "size" is not ever used.. If we do want a split, IMHO we can consider making runstate[] even smaller to just free up the rest spaces all in one shot: typedef struct QEMU_PACKED { >> >> [1] >> uint32_t size; /* * Assuming 16 is good enough to fit all possible runstate strings.. * This field must be a string ending with '\0'. */ uint8_t runstate[16]; /* 0x00 when QEMU doesn't support it, or "0"/"1" to reflect its state */ uint8_t vm_was_suspended[1]; /* * Still free of use space. Note that we only have 99 bytes for use * because the last byte (the 100th byte) must be zero due to legacy * reasons, if not it may be set to zero after loaded on dest QEMU. */ >>> >>> I'd add a 'uint8_t reserved;' to go along with this comment instead of >>> leaving a hole. >> >> Note that "struct GlobalState" is not a binary format but only some >> internal storage, what really matters is vmstate_globalstate. Here the >> "uint8_reserved" will be a pure waste of 1 byte in QEMU binary, imho. >> > > I prefer wasting the byte and make the code more obvious to people who > might not immediately understand what's going on. We could even > assert(!global_state.reserved) to sanity check the assumption. Anyway, > that's minor, I'm fine with it either way. > >> I think I just copied what you had previously and extended it, logically I >> don't think we ever need QEMU_PACKED right above [1]. We can also drop >> "size" directly here, but this can be done later. > > Ah right, I was initially thinking of letting the new qemu overrun > runstate[16] so we wouldn't have to change the code. But that's indeed > not necessary, your additions to the vmstate make it ok. Thanks. There is no need to reserve byte 100 in the new scheme. The incoming side sets s->runstate[sizeof(s->runstate) - 1] = 0 to protect itself, and now sizeof(runstate) is smaller. - Steve
Re: [PATCH V6 11/14] tests/qtest: precopy migration with suspend
On 12/4/2023 3:49 PM, Peter Xu wrote: > On Thu, Nov 30, 2023 at 01:37:24PM -0800, Steve Sistare wrote: >> Add a test case to verify that the suspended state is handled correctly >> during live migration precopy. The test suspends the src, migrates, then >> wakes the dest. >> >> Signed-off-by: Steve Sistare >> --- >> tests/qtest/migration-helpers.c | 3 ++ >> tests/qtest/migration-helpers.h | 2 ++ >> tests/qtest/migration-test.c| 64 >> ++--- >> 3 files changed, 65 insertions(+), 4 deletions(-) >> >> diff --git a/tests/qtest/migration-helpers.c >> b/tests/qtest/migration-helpers.c >> index fd3b94e..37e8e81 100644 >> --- a/tests/qtest/migration-helpers.c >> +++ b/tests/qtest/migration-helpers.c >> @@ -32,6 +32,9 @@ bool migrate_watch_for_events(QTestState *who, const char >> *name, >> if (g_str_equal(name, "STOP")) { >> state->stop_seen = true; >> return true; >> +} else if (g_str_equal(name, "SUSPEND")) { >> +state->suspend_seen = true; >> +return true; >> } else if (g_str_equal(name, "RESUME")) { >> state->resume_seen = true; >> return true; >> diff --git a/tests/qtest/migration-helpers.h >> b/tests/qtest/migration-helpers.h >> index 3d32699..b478549 100644 >> --- a/tests/qtest/migration-helpers.h >> +++ b/tests/qtest/migration-helpers.h >> @@ -18,6 +18,8 @@ >> typedef struct QTestMigrationState { >> bool stop_seen; >> bool resume_seen; >> +bool suspend_seen; >> +bool suspend_me; >> } QTestMigrationState; >> >> bool migrate_watch_for_events(QTestState *who, const char *name, >> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c >> index e10d5a4..200f023 100644 >> --- a/tests/qtest/migration-test.c >> +++ b/tests/qtest/migration-test.c >> @@ -178,7 +178,7 @@ static void bootfile_delete(void) >> /* >> * Wait for some output in the serial output file, >> * we get an 'A' followed by an endless string of 'B's >> - * but on the destination we won't have the A. >> + * but on the destination we won't have the A (unless we enabled >> suspend/resume) >> */ >> static void wait_for_serial(const char *side) >> { >> @@ -245,6 +245,13 @@ static void wait_for_resume(QTestState *who, >> QTestMigrationState *state) >> } >> } >> >> +static void wait_for_suspend(QTestState *who, QTestMigrationState *state) >> +{ >> +if (!state->suspend_seen) { >> +qtest_qmp_eventwait(who, "SUSPEND"); >> +} >> +} >> + >> /* >> * It's tricky to use qemu's migration event capability with qtest, >> * events suddenly appearing confuse the qmp()/hmp() responses. >> @@ -299,7 +306,7 @@ static void wait_for_migration_pass(QTestState *who) >> { >> uint64_t pass, prev_pass = 0, changes = 0; >> >> -while (changes < 2 && !src_state.stop_seen) { >> +while (changes < 2 && !src_state.stop_seen && !src_state.suspend_seen) { >> usleep(1000); >> pass = get_migration_pass(who); >> changes += (pass != prev_pass); >> @@ -595,7 +602,8 @@ static void migrate_wait_for_dirty_mem(QTestState *from, >> watch_byte = qtest_readb(from, watch_address); >> do { >> usleep(1000 * 10); >> -} while (qtest_readb(from, watch_address) == watch_byte); >> +} while (qtest_readb(from, watch_address) == watch_byte && >> + !src_state.suspend_seen); > > This is hackish to me. > > AFAIU the guest code won't ever dirty anything after printing the initial > 'B'. IOW, suspend not seen() should not be called for suspend > test at all, I guess, because we know it won't. Calling migrate_wait_for_dirty_mem proves that a source write is propagated to the dest, even for the suspended case. We fully expect that, but a good test verifies our expectations. That is done in the first loop of migrate_wait_for_dirty_mem. After that, we must check for the suspended state, because the second loop will not terminate. Here is a more explicit version: static void migrate_wait_for_dirty_mem(QTestState *from, QTestState *to) { uint64_t watch_address = start_address + MAGIC_OFFSET_BASE; uint64_t marker_address = start_address + MAGIC_OFFSET; uint8_t watch_byte; /* * Wait for the MAGIC_MARKER to get transferred, as an * indicator that a migration pass has made some known * amount of progress. */ do { usleep(1000 * 10); } while (qtest_readq(to, marker_address) != MAGIC_MARKER); +/* If suspended, src only iterates once, and watch_byte may never change */ +if (src_state.suspend_me) { +return; +} >> } >> >> >> @@ -771,6 +779,7 @@ static int test_migrate_start(QTestState **from, >> QTestState **to, >> dst_state = (QTestMigrationState) { }; >> src_state = (QTestMigrationState) { }; >> bootfile_create(tmpfs, args->suspend_me); >> +src_state.suspend_me = args->suspend_me; >> >> if
Re: [PATCH V6 05/14] migration: propagate suspended runstate
On 12/5/2023 7:44 AM, Fabiano Rosas wrote: > Peter Xu writes: > >> On Mon, Dec 04, 2023 at 06:09:16PM -0300, Fabiano Rosas wrote: >>> Right, I got your point. I just think we could avoid designing this new >>> string format by creating new fields with the extra space: >>> >>> typedef struct QEMU_PACKED { >>> uint32_t size; >>> uint8_t runstate[50]; >>> uint8_t unused[50]; >>> RunState state; >>> bool received; >>> } GlobalState; >>> >>> In my mind this works seamlessly, or am I mistaken? >> >> I think what you proposed should indeed work. >> >> Currently it's: >> >> .fields = (VMStateField[]) { >> VMSTATE_UINT32(size, GlobalState), >> VMSTATE_BUFFER(runstate, GlobalState), >> VMSTATE_END_OF_LIST() >> }, >> >> I had a quick look at vmstate_info_buffer, it mostly only get()/put() those >> buffers with its sizeof(), so looks all fine. For sure in all cases we'd >> better test it to verify. >> >> One side note is since we so far use qapi_enum_parse() for the runstate, I >> think the "size" is not ever used.. >> >> If we do want a split, IMHO we can consider making runstate[] even smaller >> to just free up the rest spaces all in one shot: >> >> typedef struct QEMU_PACKED { >> uint32_t size; >> /* >>* Assuming 16 is good enough to fit all possible runstate strings.. >>* This field must be a string ending with '\0'. >>*/ >> uint8_t runstate[16]; >> /* 0x00 when QEMU doesn't support it, or "0"/"1" to reflect its state >> */ >> uint8_t vm_was_suspended[1]; >> /* >>* Still free of use space. Note that we only have 99 bytes for use >>* because the last byte (the 100th byte) must be zero due to legacy >>* reasons, if not it may be set to zero after loaded on dest QEMU. >>*/ > > I'd add a 'uint8_t reserved;' to go along with this comment instead of > leaving a hole. I'll use this scheme, thanks. It is a clearer than implicitly packing strings. - Steve
Re: [PATCH V6 05/14] migration: propagate suspended runstate
On 12/4/2023 12:24 PM, Peter Xu wrote: > On Fri, Dec 01, 2023 at 11:23:33AM -0500, Steven Sistare wrote: >>>> @@ -109,6 +117,7 @@ static int global_state_post_load(void *opaque, int >>>> version_id) >>>> return -EINVAL; >>>> } >>>> s->state = r; >>>> +vm_set_suspended(s->vm_was_suspended || r == RUN_STATE_SUSPENDED); >>> >>> IIUC current vm_was_suspended (based on my read of your patch) was not the >>> same as a boolean representing "whether VM is suspended", but only a >>> temporary field to remember that for a VM stop request. To be explicit, I >>> didn't see this flag set in qemu_system_suspend() in your previous patch. >>> >>> If so, we can already do: >>> >>> vm_set_suspended(s->vm_was_suspended); >>> >>> Irrelevant of RUN_STATE_SUSPENDED? >> >> We need both terms of the expression. >> >> If the vm *is* suspended (RUN_STATE_SUSPENDED), then vm_was_suspended = >> false. >> We call global_state_store prior to vm_stop_force_state, so the incoming >> side sees s->state = RUN_STATE_SUSPENDED and s->vm_was_suspended = false. > > Right. > >> However, the runstate is RUN_STATE_INMIGRATE. When incoming finishes by >> calling vm_start, we need to restore the suspended state. Thus in >> global_state_post_load, we must set vm_was_suspended = true. > > With above, shouldn't global_state_get_runstate() (on dest) fetch SUSPENDED > already? Then I think it should call vm_start(SUSPENDED) if to start. The V6 code does not pass a state to vm_start, and knowledge of vm_was_suspended is confined to the global_state and cpus functions. IMO this is a more modular and robust solution, as multiple sites may call vm_start(), and the right thing happens. Look at patch 6. The changes are minimal because vm_start "just works". > Maybe you're talking about the special case where autostart==false? We > used to have this (existing process_incoming_migration_bh()): > > if (!global_state_received() || > global_state_get_runstate() == RUN_STATE_RUNNING) { > if (autostart) { > vm_start(); > } else { > runstate_set(RUN_STATE_PAUSED); > } > } > > If so maybe I get you, because in the "else" path we do seem to lose the > SUSPENDED state again, but in that case IMHO we should logically set > vm_was_suspended only when we "lose" it - we didn't lose it during > migration, but only until we decided to switch to PAUSED (due to > autostart==false). IOW, change above to something like: > > state = global_state_get_runstate(); > if (!global_state_received() || runstate_is_alive(state)) { > if (autostart) { > vm_start(state); > } else { > if (runstate_is_suspended(state)) { > /* Remember suspended state before setting system to STOPed */ > vm_was_suspended = true; > } > runstate_set(RUN_STATE_PAUSED); > } > } This is similar to V5 which tested suspended and fiddled with runstate at multiple call sites in migration and snapshot. I believe V6 is cleaner. > It may or may not have a functional difference even if current patch, > though. However maybe clearer to follow vm_was_suspended's strict > definition. > >> >> If the vm *was* suspended, but is currently stopped (eg RUN_STATE_PAUSED), >> then vm_was_suspended = true. Migration from that state sets >> vm_was_suspended = s->vm_was_suspended = true in global_state_post_load and >> ends with runstate_set(RUN_STATE_PAUSED). >> >> I will add a comment here in the code. >> >>>> return 0; >>>> } >>>> @@ -134,6 +143,7 @@ static const VMStateDescription vmstate_globalstate = { >>>> .fields = (VMStateField[]) { >>>> VMSTATE_UINT32(size, GlobalState), >>>> VMSTATE_BUFFER(runstate, GlobalState), >>>> +VMSTATE_BOOL(vm_was_suspended, GlobalState), >>>> VMSTATE_END_OF_LIST() >>>> }, >>>> }; >>> >>> I think this will break migration between old/new, unfortunately. And >>> since the global state exist mostly for every VM, all VM setup should be >>> affected, and over all archs. >> >> Thanks, I keep forgetting that my binary tricks are no good here. However, >> I have one other trick up my sleeve, which is to store vm_was_running in >> global_state.runstate[strlen(runstate) + 2]. It is forwards and bac
Re: [PATCH V6 03/14] cpus: stop vm in suspended runstate
On 12/4/2023 11:35 AM, Peter Xu wrote: > On Fri, Dec 01, 2023 at 12:11:32PM -0500, Steven Sistare wrote: >>>> diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h >>>> index f6a337b..1d6828f 100644 >>>> --- a/include/sysemu/runstate.h >>>> +++ b/include/sysemu/runstate.h >>>> @@ -40,6 +40,11 @@ static inline bool >>>> shutdown_caused_by_guest(ShutdownCause cause) >>>> return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN; >>>> } >>>> >>>> +static inline bool runstate_is_started(RunState state) >>> >>> Would runstate_has_vm_running() sound better? It is a bit awkward when >>> saying something like "start a runstate". >> >> I have been searching for the perfect name for this accessor. >> IMO using "running" in this accessor is confusing because it applies to both >> the running and suspended state. So, I invented a new aggregate state called >> started. vm_start transitions the machine to a started state. >> >> How about runstate_was_started? It works well at both start and stop call >> sites: >> >> void vm_resume(RunState state) >> if (runstate_was_started(state)) { > > This one looks fine, but... > >> vm_start(); >> >> int vm_stop_force_state(RunState state) >> if (runstate_was_started(runstate_get())) { > > .. this one makes the past tense not looking good. > >> return vm_stop(state); > > How about runstate_is_alive()? So far the best I can come up with. :) > > Even if you prefer "started", I'd vote for not using past tense, hence > runstate_is_started(). runstate_is_live also occurred to me. I'll use that. - Steve
Re: [PATCH V6 03/14] cpus: stop vm in suspended runstate
On 11/30/2023 5:10 PM, Peter Xu wrote: > On Thu, Nov 30, 2023 at 01:37:16PM -0800, Steve Sistare wrote: >> Currently, a vm in the suspended state is not completely stopped. The VCPUs >> have been paused, but the cpu clock still runs, and runstate notifiers for >> the transition to stopped have not been called. This causes problems for >> live migration. Stale cpu timers_state is saved to the migration stream, >> causing time errors in the guest when it wakes from suspend, and state that >> would have been modified by runstate notifiers is wrong. >> >> Modify vm_stop to completely stop the vm if the current state is suspended, >> transition to RUN_STATE_PAUSED, and remember that the machine was suspended. >> Modify vm_start to restore the suspended state. >> >> This affects all callers of vm_stop and vm_start, notably, the qapi stop and >> cont commands. For example: >> >> (qemu) info status >> VM status: paused (suspended) >> >> (qemu) stop >> (qemu) info status >> VM status: paused >> >> (qemu) cont >> (qemu) info status >> VM status: paused (suspended) >> >> (qemu) system_wakeup >> (qemu) info status >> VM status: running > > So system_wakeup for a stopped (but used to be suspended) VM will fail > directly, not touching vm_was_suspended. It's not mentioned here, but that > behavior makes sense to me. Right. I'll add that example above. >> Suggested-by: Peter Xu >> Signed-off-by: Steve Sistare > > Reviewed-by: Peter Xu > > Since you touched qapi/, please copy maintainers too. I've copied Markus > and Eric in this reply. My bad, thanks for the cc. > I also have some nitpicks which may not affect the R-b, please see below. > >> --- >> include/sysemu/runstate.h | 5 + >> qapi/misc.json| 10 -- >> system/cpus.c | 19 ++- >> system/runstate.c | 3 +++ >> 4 files changed, 30 insertions(+), 7 deletions(-) >> >> diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h >> index f6a337b..1d6828f 100644 >> --- a/include/sysemu/runstate.h >> +++ b/include/sysemu/runstate.h >> @@ -40,6 +40,11 @@ static inline bool shutdown_caused_by_guest(ShutdownCause >> cause) >> return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN; >> } >> >> +static inline bool runstate_is_started(RunState state) > > Would runstate_has_vm_running() sound better? It is a bit awkward when > saying something like "start a runstate". I have been searching for the perfect name for this accessor. IMO using "running" in this accessor is confusing because it applies to both the running and suspended state. So, I invented a new aggregate state called started. vm_start transitions the machine to a started state. How about runstate_was_started? It works well at both start and stop call sites: void vm_resume(RunState state) if (runstate_was_started(state)) { vm_start(); int vm_stop_force_state(RunState state) if (runstate_was_started(runstate_get())) { return vm_stop(state); - Steve >> +{ >> +return state == RUN_STATE_RUNNING || state == RUN_STATE_SUSPENDED; >> +} >> +
Re: [PATCH V6 05/14] migration: propagate suspended runstate
On 11/30/2023 6:06 PM, Peter Xu wrote: > On Thu, Nov 30, 2023 at 01:37:18PM -0800, Steve Sistare wrote: >> If the outgoing machine was previously suspended, propagate that to the >> incoming side via global_state, so a subsequent vm_start restores the >> suspended state. To maintain backward and forward compatibility, define >> the new field in a zero'd hole in the GlobalState struct. >> >> Signed-off-by: Steve Sistare >> --- >> migration/global_state.c | 10 ++ >> 1 file changed, 10 insertions(+) >> >> diff --git a/migration/global_state.c b/migration/global_state.c >> index 4e2a9d8..de2532c 100644 >> --- a/migration/global_state.c >> +++ b/migration/global_state.c >> @@ -25,6 +25,7 @@ typedef struct { >> uint8_t runstate[100]; >> RunState state; >> bool received; >> +bool vm_was_suspended; >> } GlobalState; >> >> static GlobalState global_state; >> @@ -35,6 +36,7 @@ static void global_state_do_store(RunState state) >> assert(strlen(state_str) < sizeof(global_state.runstate)); >> strpadcpy((char *)global_state.runstate, sizeof(global_state.runstate), >>state_str, '\0'); >> +global_state.vm_was_suspended = vm_get_suspended(); >> } >> >> void global_state_store(void) >> @@ -68,6 +70,12 @@ static bool global_state_needed(void *opaque) >> return true; >> } >> >> +/* If the suspended state must be remembered, it is needed */ >> + >> +if (vm_get_suspended()) { >> +return true; >> +} >> + >> /* If state is running or paused, it is not needed */ >> >> if (strcmp(runstate, "running") == 0 || >> @@ -109,6 +117,7 @@ static int global_state_post_load(void *opaque, int >> version_id) >> return -EINVAL; >> } >> s->state = r; >> +vm_set_suspended(s->vm_was_suspended || r == RUN_STATE_SUSPENDED); > > IIUC current vm_was_suspended (based on my read of your patch) was not the > same as a boolean representing "whether VM is suspended", but only a > temporary field to remember that for a VM stop request. To be explicit, I > didn't see this flag set in qemu_system_suspend() in your previous patch. > > If so, we can already do: > > vm_set_suspended(s->vm_was_suspended); > > Irrelevant of RUN_STATE_SUSPENDED? We need both terms of the expression. If the vm *is* suspended (RUN_STATE_SUSPENDED), then vm_was_suspended = false. We call global_state_store prior to vm_stop_force_state, so the incoming side sees s->state = RUN_STATE_SUSPENDED and s->vm_was_suspended = false. However, the runstate is RUN_STATE_INMIGRATE. When incoming finishes by calling vm_start, we need to restore the suspended state. Thus in global_state_post_load, we must set vm_was_suspended = true. If the vm *was* suspended, but is currently stopped (eg RUN_STATE_PAUSED), then vm_was_suspended = true. Migration from that state sets vm_was_suspended = s->vm_was_suspended = true in global_state_post_load and ends with runstate_set(RUN_STATE_PAUSED). I will add a comment here in the code. >> return 0; >> } >> @@ -134,6 +143,7 @@ static const VMStateDescription vmstate_globalstate = { >> .fields = (VMStateField[]) { >> VMSTATE_UINT32(size, GlobalState), >> VMSTATE_BUFFER(runstate, GlobalState), >> +VMSTATE_BOOL(vm_was_suspended, GlobalState), >> VMSTATE_END_OF_LIST() >> }, >> }; > > I think this will break migration between old/new, unfortunately. And > since the global state exist mostly for every VM, all VM setup should be > affected, and over all archs. Thanks, I keep forgetting that my binary tricks are no good here. However, I have one other trick up my sleeve, which is to store vm_was_running in global_state.runstate[strlen(runstate) + 2]. It is forwards and backwards compatible, since that byte is always 0 in older qemu. It can be implemented with a few lines of code change confined to global_state.c, versus many lines spread across files to do it the conventional way using a compat property and a subsection. Sound OK? > We used to have the version_id field right above for adding fields, but I > _think_ that will still break backward migration fron new->old binary, so > not wanted. Juan can keep me honest. > > The best thing is still machine compat properties, afaict, to fix. It's > slightly involved, but let me attach a sample diff for you (at the end, > possibly working with your current patch kind-of squashed, but not ever > tested), hopefully make it slightly easier. > > I'm wondering how bad it is to just ignore it, it's not as bad as if we > don't fix stop-during-suspend, in this case the worst case of forgetting > this field over migration is: if VM stopped (and used to be suspended) then > after migration it'll keep being stopped, however after "cont" it'll forget > the suspended state. Cont changes state to running, but the VM is broken because wake was never called. > Not that bad! IIUC SPR should always migrate with >
Re: [PATCH V5 02/12] cpus: stop vm in suspended state
On 11/22/2023 11:21 AM, Peter Xu wrote: > On Wed, Nov 22, 2023 at 09:38:06AM +, Daniel P. Berrangé wrote: >> On Mon, Nov 20, 2023 at 04:44:50PM -0500, Peter Xu wrote: >>> On Mon, Nov 20, 2023 at 03:55:54PM -0500, Steven Sistare wrote: >>>> If we drop force, then all calls to vm_stop will completely stop the >>>> suspended state, eg an hmp "stop" command. This causes two problems. >>>> First, that is a change in user-visible behavior for something that >>>> currently works, >>> >>> IMHO it depends on what should be the correct behavior. IOW, when VM is in >>> SUSPENDED state and then the user sends "stop" QMP command, what should we >>> expect? >> >> I would say that from a mgmtm app POV "stop" is initiating a state >> transition, from RUN_STATE_RUNNING to RUN_STATE_PAUSED and "cont" >> is doing the reverse from PAUSED to RUNNING. >> >> It is a little more complicated than that as there are some other >> states like INMIGRATE that are conceptually equiv to RUNNING, >> and states where the transition simply doesn't make sense. > > In the qemu impl, INMIGRATE is, imho, more equiv of PAUSED - do_vm_stop() > mostly ignores every state except RUNNING (putting bdrv operations aside). > IOW, anything besides "running" is treated as "not running". > > But then Paolo fixed that in 1e9981465f ("qmp: handle stop/cont in > INMIGRATE state"), wiring that to autostart. > > Now we seem to find that "suspended" should actually fall within (where > "vm" is running, but "vcpu" is not), and it seems we should treat "vm" and > "vcpu" differently. > >> >> So my question is if we're in "SUSPENDED" and someone issues "stop", >> what state do we go into, and perhaps more importantly what state >> do we go to in a subsequent "cont". > > I think we must stop the "vm", not only the "vcpu". I discussed this bit > in the other thread more or less: it's because qemu_system_wakeup_request() > can be called in many places, e.g. acpi_pm_tmr_timer(). > > It means even after the VM is "stop"ped by the admin QMP (where qmp_stop() > ignores SUSPENDED, keep the "vm" running), it can silently got waken up > without admin even noticing it. I'm not sure what Libvirt will behave if > it suddenly receives a QAPI_EVENT_WAKEUP randomly after a "stop". > >> >> If you say SUSPENDED ---(stop)---> PAUSED ---(cont)---> SUSPENDED >> then we create a problem, because the decision for the transition >> out of PAUSED needs memory of the previous state. > > Right, that's why I think we at least need one more boolean to remember the > suspended state, then when we switch from any STOP states into any RUN > states, we know where to go. Here STOP states I meant anything except > RUNNING and SUSPENDED, while RUN -> RUNNING or SUSPENDED. > >> >>> My understanding is we should expect to fully stop the VM, including the >>> ticks, for example. Keeping the ticks running even after QMP "stop" >>> doesn't sound right, isn't it? >> >> The "stop" QMP command is documented as >> >> "Stop all guest VCPU execution" >> >> the devil is in the detail though, and we've not documented any detail. >> >> Whether or not timers keep running across stop/cont I think can be >> argued to be an impl detail, as long as the headline goal "vcpus >> don't execute" is satisfied. > > "stop" was since qemu v0.14, so I guess we can't blame the missing of > details or any form of inaccuracy.. Obviously we do more than "stop vCPU > executions" in the current implementation. > > But after we reach a consensus on how we should fix the current suspended > problem, we may want to update the documentation to start containing more > information. > >> >>>> vs the migration code where we are fixing brokenness. >>> >>> This is not a migration-only bug if above holds, IMO. >>> >>>> Second, it does not quite work, because the state becomes >>>> RUN_STATE_PAUSED, so the suspended state is forgotten, and the hmp "cont" >>>> will try to set the running state. I could fix that by introducing a new >>>> state RUN_STATE_SUSPENDED_STOPPED, but again it is a user-visible change >>>> in existing behavior. (I even implemented that while developing, then I >>>> realized it was not needed to fix the migratio
Re: [PATCH V5 02/12] cpus: stop vm in suspended state
On 11/20/2023 4:44 PM, Peter Xu wrote: > On Mon, Nov 20, 2023 at 03:55:54PM -0500, Steven Sistare wrote: >> If we drop force, then all calls to vm_stop will completely stop the >> suspended state, eg an hmp "stop" command. This causes two problems. >> First, that is a change in user-visible behavior for something that >> currently works, > > IMHO it depends on what should be the correct behavior. IOW, when VM is in > SUSPENDED state and then the user sends "stop" QMP command, what should we > expect? > > My understanding is we should expect to fully stop the VM, including the > ticks, for example. Keeping the ticks running even after QMP "stop" > doesn't sound right, isn't it? I agree, but others may not, and this decision would require approval from maintainers in other areas, including upper layers such as libvirt that are aware of the suspended state. It is a user-visible change, and may require a new libvirt release. >> vs the migration code where we are fixing brokenness. > > This is not a migration-only bug if above holds, IMO. > >> Second, it does not quite work, because the state becomes >> RUN_STATE_PAUSED, so the suspended state is forgotten, and the hmp "cont" >> will try to set the running state. I could fix that by introducing a new >> state RUN_STATE_SUSPENDED_STOPPED, but again it is a user-visible change >> in existing behavior. (I even implemented that while developing, then I >> realized it was not needed to fix the migration bugs.) > > Good point. > > Now with above comments, what's your thoughts on whether we should change > the user behavior? My answer is still a yes. > > Maybe SUSPENDED should not be a RunState at all? SUSPENDED is guest visible > behavior, while something like QMP "stop" is not guest visible. Maybe we > should remember it separately? > > It means qemu_system_suspend() could remember that in a separate field (as > part of guest state), then when wakeup we should conditionally go back > with/without vcpus running depending on the new "suspended" state. Regardless of how we remember it, the status command must still expose the suspended state to the user. The user must be able to see that a guest is suspended, and decide when to issue a wakeup command. If we change the stop command to completely stop a suspended vm, then we must allow the user to query whether a vm is suspended-running or suspended-stopped, because the command they must issue to resume is different: wakeup for the former, and cont for the latter. This change is visible to libvirt. Adding it will delay this entire patch series, and is not necessary for fixing the migration bug. There is no downside to drawing the line here for this series, and possibly changing stop semantics in the future. - Steve
Re: [PATCH V5 02/12] cpus: stop vm in suspended state
On 11/20/2023 3:47 PM, Fabiano Rosas wrote: > Peter Xu writes: >> On Mon, Nov 13, 2023 at 10:33:50AM -0800, Steve Sistare wrote: >>> A vm in the suspended state is not completely stopped. The VCPUs have been >>> paused, but the cpu clock still runs, and runstate notifiers for the >>> transition to stopped have not been called. Modify vm_stop_force_state to >>> completely stop the vm if the current state is suspended, to be called for >>> live migration and snapshots. >>> >>> Suggested-by: Peter Xu >>> Signed-off-by: Steve Sistare >>> --- >>> system/cpus.c | 8 ++-- >>> 1 file changed, 6 insertions(+), 2 deletions(-) >>> >>> diff --git a/system/cpus.c b/system/cpus.c >>> index f72c4be..c772708 100644 >>> --- a/system/cpus.c >>> +++ b/system/cpus.c >>> @@ -255,6 +255,8 @@ void cpu_interrupt(CPUState *cpu, int mask) >>> static int do_vm_stop(RunState state, bool send_stop, bool force) >>> { >>> int ret = 0; >>> +bool running = runstate_is_running(); >>> +bool suspended = runstate_check(RUN_STATE_SUSPENDED); >>> >>> if (qemu_in_vcpu_thread()) { >>> qemu_system_vmstop_request_prepare(); >>> @@ -267,10 +269,12 @@ static int do_vm_stop(RunState state, bool send_stop, >>> bool force) >>> return 0; >>> } >>> >>> -if (runstate_is_running()) { >>> +if (running || (suspended && force)) { >>> runstate_set(state); >>> cpu_disable_ticks(); >> >> Not directly relevant, but this is weird that I just notice. >> >> If we disable ticks before stopping vCPUs, IIUC it means vcpus can see >> stall ticks. I checked the vm_start() and indeed that one did it in the >> other way round: we'll stop vCPUs before stopping the ticks. >> >>> -pause_all_vcpus(); >>> +if (running) { >>> +pause_all_vcpus(); >>> +} >>> vm_state_notify(0, state); >>> if (send_stop) { >>> qapi_event_send_stop(); >> >> IIUC the "force" is not actually needed. It's only used when SUSPENDED, >> right? When not suspended, the force flag causes a stopped state to be forced even if current is a different stopped state. > That's the overloading I'm complaining about. We're using "force" to say > both: "include suspended" and: "set the state". This is basically taking > knowledge from the callsite being the migration code and encoding it in > that flag. > > I'd prefer something like: > > static int do_vm_stop(RunState state, bool send_stop, bool set_state, > bool include_suspended); This function has always been tailored for use by migration code and no other callers. Migration would always pass set_state=true and include_suspended=true. We have no use case for other combinations and no test for them. To my mind, "force" naturally implies both behaviors. We force the machine into the specified stop state, completely stopping suspended execution. Perhaps renaming vm_stop_force_state would erase the old association of "force" with only forcing runstate, such as vm_stop_all(). - Steve