On 7/30/2020 1:49 PM, Dr. David Alan Gilbert wrote:
> * Steve Sistare (steven.sist...@oracle.com) wrote:
>> Improve and extend the qemu functions that save and restore VM state so a
>> guest may be suspended and resumed with minimal pause time.  qemu may be
>> updated to a new version in between.
> 
> Nice.
> 
>> The first set of patches adds the cprsave and cprload commands to save and
>> restore VM state, and allow the host kernel to be updated and rebooted in
>> between.  The VM must create guest RAM in a persistent shared memory file,
>> such as /dev/dax0.0 or persistent /dev/shm PKRAM as proposed in
>> https://lore.kernel.org/lkml/1588812129-8596-1-git-send-email-anthony.yzn...@oracle.com/
>>
>> cprsave stops the VCPUs and saves VM device state in a simple file, and
>> thus supports any type of guest image and block device.  The caller must
>> not modify the VM's block devices between cprsave and cprload.
> 
> can I ask why you don't just add a migration flag to skip the devices
> you don't want, and then do a migrate to a file?
> (i.e. migrate "exec:cat > afile")
> We already have the 'x-ignore-shared' capability that's used for doing
> RAM snapshots of VMs; primarily I think for being able to start a VM
> from a RAM snapshot as a fast VM start trick.
> (There's also a xen_save_devices that does something similar).
> If you backed the RAM as you say, enabled x-ignore-shared and then did:
> 
>    migrate "exec:cat > afile"
> 
> and restarted the destination with:
> 
>     migrate_incoming "exec:cat afile"
> 
> what is different (except the later stuff about the vfio magic and
> chardevs)?
> 
> Dave

Yes, I did consider whether to extend the migration syntax and implementation in
save_vmstate and load_vmstate, versus creating something new.  Those functions
handle stuff like bdrv snapshot, aio, and migration which are n/a for the cpr
use case, and the cpr functions handle state that is n/a for the migration
case.  I judged that a single function handling both would be less readable and
maintainable.  At their core all these routines call qemu_loadvm_state() and
qemu_savevm_state().  The surrounding code is mostly different.


Take a look at 
  savevm.c:save_vmstate()   vs   save_cpr_snapshot() attached
and
  savevm.c:load_vmstate()   vs   load_cpr_snapshot() attached

I attached the complete versions of the cpr functions because they are built up
over multiple patches in this series, thus hard to visualize in patch form.

- Steve

> 
>> cprsave and cprload support guests with vfio devices if the caller first
>> suspends the guest by issuing guest-suspend-ram to the qemu guest agent.
>> The guest drivers' suspend methods flush outstanding requests and
>> re-initialize the devices, and thus there is no device state to save and
>> restore.
>>
>>    1 savevm: add vmstate handler iterators
>>    2 savevm: VM handlers mode mask
>>    3 savevm: QMP command for cprsave
>>    4 savevm: HMP Command for cprsave
>>    5 savevm: QMP command for cprload
>>    6 savevm: HMP Command for cprload
>>    7 savevm: QMP command for cprinfo
>>    8 savevm: HMP command for cprinfo
>>    9 savevm: prevent cprsave if memory is volatile
>>   10 kvmclock: restore paused KVM clock
>>   11 cpu: disable ticks when suspended
>>   12 vl: pause option
>>   13 gdbstub: gdb support for suspended state
>>
>> The next patches add a restart method that eliminates the persistent memory
>> constraint, and allows qemu to be updated across the restart, but does not
>> allow host reboot.  Anonymous memory segments used by the guest are
>> preserved across a re-exec of qemu, mapped at the same VA, via a proposed
>> madvise(MADV_DOEXEC) option in the Linux kernel.  See
>> https://lore.kernel.org/lkml/1595869887-23307-1-git-send-email-anthony.yzn...@oracle.com/
>>
>>   14 savevm: VMS_RESTART and cprsave restart
>>   15 vl: QEMU_START_FREEZE env var
>>   16 oslib: add qemu_clr_cloexec
>>   17 util: env var helpers
>>   18 osdep: import MADV_DOEXEC
>>   19 memory: ram_block_add cosmetic changes
>>   20 vl: add helper to request re-exec
>>   21 exec, memory: exec(3) to restart
>>   22 char: qio_channel_socket_accept reuse fd
>>   23 char: save/restore chardev socket fds
>>   24 ui: save/restore vnc socket fds
>>   25 char: save/restore chardev pty fds
>>   26 monitor: save/restore QMP negotiation status
>>   27 vhost: reset vhost devices upon cprsave
>>   28 char: restore terminal on restart
>>
>> The next patches extend the restart method to save and restore vfio-pci
>> state, eliminating the requirement for a guest agent.  The vfio container,
>> group, and device descriptors are preserved across the qemu re-exec.
>>
>>   29 pci: export pci_update_mappings
>>   30 vfio-pci: save and restore
>>   31 vfio-pci: trace pci config
>>   32 vfio-pci: improved tracing
>>
>> Here is an example of updating qemu from v4.2.0 to v4.2.1 using
>> "cprsave restart".  The software update is performed while the guest is
>> running to minimize downtime.
>>
>> window 1                             | window 2
>>                                      |
>> # qemu-system-x86_64 ...             |
>> QEMU 4.2.0 monitor - type 'help' ... |
>> (qemu) info status                   |
>> VM status: running                   |
>>                                      | # yum update qemu
>> (qemu) cprsave /tmp/qemu.sav restart |
>> QEMU 4.2.1 monitor - type 'help' ... |
>> (qemu) info status                   |
>> VM status: paused (prelaunch)        |
>> (qemu) cprload /tmp/qemu.sav         |
>> (qemu) info status                   |
>> VM status: running                   |
>>
>>
>> Here is an example of updating the host kernel using "cprsave reboot".
>>
>> window 1                                         | window 2
>>                                                  |
>> # qemu-system-x86_64 ...mem-path=/dev/dax0.0 ... |
>> QEMU 4.2.1 monitor - type 'help' ...             |
>> (qemu) info status                               |
>> VM status: running                               |
>>                                                  | # yum update kernel-uek
>> (qemu) cprsave /tmp/qemu.sav reboot              |
>>                                                  |
>> # systemctl kexec                                |
>> kexec_core: Starting new kernel                  |
>> ...                                              |
>>                                                  |
>> # qemu-system-x86_64 ...mem-path=/dev/dax0.0 ... |
>> QEMU 4.2.1 monitor - type 'help' ...             |
>> (qemu) info status                               |
>> VM status: paused (prelaunch)                    |
>> (qemu) cprload /tmp/qemu.sav                     |
>> (qemu) info status                               |
>> VM status: running                               |
>>
>>
>> Mark Kanda (5):
>>   char: qio_channel_socket_accept reuse fd
>>   char: save/restore chardev socket fds
>>   ui: save/restore vnc socket fds
>>   monitor: save/restore QMP negotiation status
>>   vhost: reset vhost devices upon cprsave
>>
>> Steve Sistare (27):
>>   savevm: add vmstate handler iterators
>>   savevm: VM handlers mode mask
>>   savevm: QMP command for cprsave
>>   savevm: HMP Command for cprsave
>>   savevm: QMP command for cprload
>>   savevm: HMP Command for cprload
>>   savevm: QMP command for cprinfo
>>   savevm: HMP command for cprinfo
>>   savevm: prevent cprsave if memory is volatile
>>   kvmclock: restore paused KVM clock
>>   cpu: disable ticks when suspended
>>   vl: pause option
>>   gdbstub: gdb support for suspended state
>>   savevm: VMS_RESTART and cprsave restart
>>   vl: QEMU_START_FREEZE env var
>>   oslib: add qemu_clr_cloexec
>>   util: env var helpers
>>   osdep: import MADV_DOEXEC
>>   memory: ram_block_add cosmetic changes
>>   vl: add helper to request re-exec
>>   exec, memory: exec(3) to restart
>>   char: save/restore chardev pty fds
>>   char: restore terminal on restart
>>   pci: export pci_update_mappings
>>   vfio-pci: save and restore
>>   vfio-pci: trace pci config
>>   vfio-pci: improved tracing
>>
>>  MAINTAINERS                    |   7 ++
>>  accel/kvm/kvm-all.c            |   8 +-
>>  accel/kvm/trace-events         |   3 +-
>>  chardev/char-pty.c             |  38 +++++--
>>  chardev/char-socket.c          |  35 ++++++
>>  chardev/char-stdio.c           |   7 ++
>>  chardev/char.c                 |  16 +++
>>  exec.c                         |  88 +++++++++++++--
>>  gdbstub.c                      |  11 +-
>>  hmp-commands.hx                |  46 ++++++++
>>  hw/i386/kvm/clock.c            |   6 +-
>>  hw/pci/msix.c                  |   1 +
>>  hw/pci/pci.c                   |  17 +--
>>  hw/pci/trace-events            |   5 +-
>>  hw/vfio/common.c               | 115 ++++++++++++++++----
>>  hw/vfio/pci.c                  | 179 ++++++++++++++++++++++++++++++-
>>  hw/vfio/platform.c             |   2 +-
>>  hw/vfio/trace-events           |  11 +-
>>  hw/virtio/vhost.c              |  12 +++
>>  include/chardev/char.h         |   8 ++
>>  include/exec/memory.h          |   4 +
>>  include/hw/pci/pci.h           |   2 +
>>  include/hw/vfio/vfio-common.h  |   4 +-
>>  include/io/channel-socket.h    |   3 +-
>>  include/migration/register.h   |   3 +
>>  include/migration/vmstate.h    |  11 ++
>>  include/monitor/hmp.h          |   3 +
>>  include/qemu/cutils.h          |   1 +
>>  include/qemu/env.h             |  31 ++++++
>>  include/qemu/osdep.h           |   8 ++
>>  include/sysemu/sysemu.h        |  10 ++
>>  io/channel-socket.c            |  12 ++-
>>  io/net-listener.c              |   4 +-
>>  migration/block.c              |   1 +
>>  migration/migration.c          |   4 +-
>>  migration/ram.c                |   1 +
>>  migration/savevm.c             | 237 ++++++++++++++++++++++++++++++++++++-----
>>  migration/savevm.h             |   4 +-
>>  monitor/hmp-cmds.c             |  28 +++++
>>  monitor/qmp-cmds.c             |  16 +++
>>  monitor/qmp.c                  |  42 ++++++++
>>  qapi/migration.json            |  35 ++++++
>>  qapi/pragma.json               |   1 +
>>  qemu-options.hx                |   9 ++
>>  scsi/qemu-pr-helper.c          |   2 +-
>>  softmmu/vl.c                   |  65 ++++++++++-
>>  tests/qtest/tpm-emu.c          |   2 +-
>>  tests/test-char.c              |   2 +-
>>  tests/test-io-channel-socket.c |   4 +-
>>  trace-events                   |   2 +
>>  ui/vnc.c                       | 153 +++++++++++++++++++++-----
>>  util/Makefile.objs             |   2 +-
>>  util/env.c                     | 132 +++++++++++++++++++++++
>>  util/oslib-posix.c             |   9 ++
>>  util/oslib-win32.c             |   4 +
>>  55 files changed, 1331 insertions(+), 135 deletions(-)
>>  create mode 100644 include/qemu/env.h
>>  create mode 100644 util/env.c
>>
>> -- 
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> 
void save_cpr_snapshot(const char *file, const char *mode, Error **errp)
{
    int ret = 0;
    QEMUFile *f;
    VMStateMode op;

    if (!strcmp(mode, "reboot")) {
        op = VMS_REBOOT;
    } else if (!strcmp(mode, "restart")) {
        op = VMS_RESTART;
    } else {
        error_setg(errp, "cprsave: bad mode %s", mode);
        return;
    }

    if (op == VMS_REBOOT && qemu_ram_volatile(errp)) {
        return;
    }

    if (op == VMS_RESTART && QEMU_MADV_DOEXEC == QEMU_MADV_INVALID) {
        error_setg(errp, "kernel does not support MADV_DOEXEC.");
        return;
    }

    if (op == VMS_RESTART && xen_enabled()) {
        error_setg(errp, "xen does not support cprsave restart");
        return;
    }

    f = qf_file_open(file, O_CREAT | O_WRONLY | O_TRUNC, 0600, errp);
    if (!f) {
        return;
    }

    ret = global_state_store();
    if (ret) {
        error_setg(errp, "Error saving global state");
        qemu_fclose(f);
        return;
    }

    /* Update timers_state before saving.  Suspend did not do so. */
    if (runstate_check(RUN_STATE_SUSPENDED)) {
        cpu_disable_ticks();
    }

    vm_stop(RUN_STATE_SAVE_VM);

    ret = qemu_savevm_state(f, op, errp);
    if ((ret < 0) && errp && !*errp) {
        error_setg(errp, "qemu_savevm_state failed");
    }
    qemu_fclose(f);
    if (ret < 0) {
        /* Do not shut down or re-exec if the state was not saved. */
        return;
    }

    if (op == VMS_REBOOT) {
        no_shutdown = 0;
        qemu_system_shutdown_request();
    } else if (op == VMS_RESTART) {
        if (qemu_preserve_ram(errp)) {
            return;
        }
        save_chardev_fds();
        save_vnc_fds();
        save_named_fd("mntfd");          /* was received from qemu-cpr */
        save_named_fd("ctlfd");          /* was received from qemu-cpr */
        walkenv(FD_PREFIX, preserve_fd, 0);
        reset_vhost_devices();
        save_qmp_negotiation_status();
        qemu_term_exit();
        qemu_system_exec_request();
        putenv((char *)"QEMU_START_FREEZE=");
    }
}

void load_cpr_snapshot(const char *file, Error **errp)
{
    QEMUFile *f;
    int ret;
    RunState state;

    if (runstate_is_running()) {
        error_setg(errp, "cprload called for a running VM");
        return;
    }

    f = qf_file_open(file, O_RDONLY, 0, errp);
    if (!f) {
        return;
    }

    ret = qemu_loadvm_state(f, VMS_REBOOT | VMS_RESTART);
    qemu_fclose(f);
    if (ret < 0) {
        error_setg(errp, "Error %d while loading VM state", ret);
        return;
    }

    state = global_state_get_runstate();
    if (state == RUN_STATE_RUNNING) {
        vm_start();
    } else {
        runstate_set(state);
        if (runstate_check(RUN_STATE_SUSPENDED)) {
            start_on_wake = 1;
        }
    }

    load_vnc_fds();
}
