On Thu, Jul 30, 2020 at 02:48:44PM -0400, Steven Sistare wrote:
> On 7/30/2020 12:52 PM, Daniel P. Berrangé wrote:
> > On Thu, Jul 30, 2020 at 08:14:04AM -0700, Steve Sistare wrote:
> >> Improve and extend the qemu functions that save and restore VM state so a
> >> guest may be suspended and resumed with minimal pause time.  qemu may be
> >> updated to a new version in between.
> >>
> >> The first set of patches adds the cprsave and cprload commands to save and
> >> restore VM state, and allow the host kernel to be updated and rebooted in
> >> between.  The VM must create guest RAM in a persistent shared memory file,
> >> such as /dev/dax0.0 or persistent /dev/shm PKRAM as proposed in
> >> https://lore.kernel.org/lkml/1588812129-8596-1-git-send-email-anthony.yzn...@oracle.com/
> >>
> >> cprsave stops the VCPUs and saves VM device state in a simple file, and
> >> thus supports any type of guest image and block device.  The caller must
> >> not modify the VM's block devices between cprsave and cprload.
> >>
> >> cprsave and cprload support guests with vfio devices if the caller first
> >> suspends the guest by issuing guest-suspend-ram to the qemu guest agent.
> >> The guest drivers' suspend methods flush outstanding requests and
> >> re-initialize the devices, and thus there is no device state to save and
> >> restore.
> >>
> >>    1 savevm: add vmstate handler iterators
> >>    2 savevm: VM handlers mode mask
> >>    3 savevm: QMP command for cprsave
> >>    4 savevm: HMP Command for cprsave
> >>    5 savevm: QMP command for cprload
> >>    6 savevm: HMP Command for cprload
> >>    7 savevm: QMP command for cprinfo
> >>    8 savevm: HMP command for cprinfo
> >>    9 savevm: prevent cprsave if memory is volatile
> >>   10 kvmclock: restore paused KVM clock
> >>   11 cpu: disable ticks when suspended
> >>   12 vl: pause option
> >>   13 gdbstub: gdb support for suspended state
> >>
> >> The next patches add a restart method that eliminates the persistent memory
> >> constraint, and allows qemu to be updated across the restart, but does not
> >> allow host reboot.  Anonymous memory segments used by the guest are
> >> preserved across a re-exec of qemu, mapped at the same VA, via a proposed
> >> madvise(MADV_DOEXEC) option in the Linux kernel.  See
> >> https://lore.kernel.org/lkml/1595869887-23307-1-git-send-email-anthony.yzn...@oracle.com/
> >>
> >>   14 savevm: VMS_RESTART and cprsave restart
> >>   15 vl: QEMU_START_FREEZE env var
> >>   16 oslib: add qemu_clr_cloexec
> >>   17 util: env var helpers
> >>   18 osdep: import MADV_DOEXEC
> >>   19 memory: ram_block_add cosmetic changes
> >>   20 vl: add helper to request re-exec
> >>   21 exec, memory: exec(3) to restart
> >>   22 char: qio_channel_socket_accept reuse fd
> >>   23 char: save/restore chardev socket fds
> >>   24 ui: save/restore vnc socket fds
> >>   25 char: save/restore chardev pty fds
> > 
> > Keeping FDs open across re-exec is a nice trick, but how are you dealing
> > with the state associated with them, most especially the TLS encryption
> > state ? AFAIK, there's no way to serialize/deserialize the TLS state that
> > GNUTLS maintains, and the patches don't show any sign of dealing with
> > this. IOW it looks like while the FD will be preserved, any TLS session
> > running on it will fail.
> 
> I had not considered TLS.  If a non-qemu library maintains connection state,
> then we won't be able to support it for live update until the library
> provides interfaces to serialize the state.
> 
> For qemu objects, so far vmstate has been adequate to represent the devices
> with descriptors that we preserve.

My main concern about this series is that there is an implicit assumption
that QEMU is *not* configured with certain features that are not handled.
If QEMU is using one of the unsupported features, I don't see anything in
the series which attempts to prevent the actions.

IOW, users can have an arbitrary QEMU config, attempt to use these new features,
the commands may well succeed, but the user is silently left with a broken QEMU.
Such silent failure modes are really undesirable as they'll lead to a never
ending stream of hard to diagnose bug reports for QEMU maintainers.
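One way to avoid that silent failure is an explicit opt-in check that runs before cprsave does anything. A minimal sketch, assuming each device or backend registers a mask of the restart modes it supports (in the spirit of patch 2, "savevm: VM handlers mode mask"); the names cpr_mode, cpr_handler and cpr_check_supported are hypothetical and not taken from the series. It is comparable in intent to the migration blockers QEMU already uses for live migration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical restart modes, one bit each (cf. patch 2's mode mask). */
enum cpr_mode {
    CPR_MODE_REBOOT  = 1 << 0, /* cprsave/cprload across a host reboot */
    CPR_MODE_RESTART = 1 << 1, /* re-exec of qemu keeping memory and fds */
};

/* Hypothetical registry entry: one per device, chardev or UI backend. */
struct cpr_handler {
    const char *name;
    unsigned modes;  /* bitwise OR of the cpr_mode values it can handle */
};

/*
 * Return 0 if every registered handler supports @mode.  Otherwise name the
 * first unsupported handler in @err and return -1, so that cprsave fails
 * up front instead of "succeeding" and leaving a broken QEMU behind.
 */
static int cpr_check_supported(const struct cpr_handler *h, size_t n,
                               unsigned mode, char *err, size_t errlen)
{
    assert(h != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (!(h[i].modes & mode)) {
            snprintf(err, errlen, "%s does not support this cpr mode",
                     h[i].name);
            return -1;
        }
    }
    return 0;
}
```

For example, a TLS-enabled socket chardev would register with a mode mask of 0, so a restart request is rejected with an error naming that chardev rather than completing and leaving the connection unusable.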

TLS is one example of this: the live upgrade will "succeed", but the TLS
connections will be totally non-functional.
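The reason is that keeping a descriptor across exec only preserves the kernel side of the connection. A minimal sketch using plain POSIX fcntl(2), assuming that something like patch 16's qemu_clr_cloexec presumably just clears FD_CLOEXEC; the helper names clear_cloexec and is_cloexec are hypothetical. After exec the socket is still open, but a GnuTLS session that lived in the old process's heap is gone.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Clear FD_CLOEXEC so @fd stays open across execve(2).  Only the kernel
 * object (socket, pty, file) is carried over; anything a library keeps
 * about it in process memory, such as a GnuTLS session, is lost.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int clear_cloexec(int fd)
{
    assert(fd >= 0);

    int flags = fcntl(fd, F_GETFD);
    if (flags < 0) {
        return -1;
    }
    return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC);
}

/* Return 1 if @fd will be closed on exec, 0 if it will be kept, -1 on error. */
static int is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    return flags < 0 ? -1 : (flags & FD_CLOEXEC) != 0;
}
```

Carrying the TLS layer across would additionally need a way to export and re-import the session's keys and sequence numbers, which is the library interface Steven notes does not exist today.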

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

