Re: [PATCH V1 00/32] Live Update

2020-08-11 Thread Dr. David Alan Gilbert
* Daniel P. Berrangé (berra...@redhat.com) wrote:
> On Fri, Jul 31, 2020 at 11:27:45AM -0400, Steven Sistare wrote:
> > On 7/31/2020 4:53 AM, Daniel P. Berrangé wrote:
> > > On Thu, Jul 30, 2020 at 02:48:44PM -0400, Steven Sistare wrote:
> > >> On 7/30/2020 12:52 PM, Daniel P. Berrangé wrote:
> > >>> On Thu, Jul 30, 2020 at 08:14:04AM -0700, Steve Sistare wrote:
> >  Improve and extend the qemu functions that save and restore VM state so a
> >  guest may be suspended and resumed with minimal pause time.  qemu may be
> >  updated to a new version in between.
> > 
> >  The first set of patches adds the cprsave and cprload commands to save and
> >  restore VM state, and allow the host kernel to be updated and rebooted in
> >  between.  The VM must create guest RAM in a persistent shared memory file,
> >  such as /dev/dax0.0 or persistent /dev/shm PKRAM as proposed in
> >  https://lore.kernel.org/lkml/1588812129-8596-1-git-send-email-anthony.yzn...@oracle.com/
> > 
> >  cprsave stops the VCPUs and saves VM device state in a simple file, and
> >  thus supports any type of guest image and block device.  The caller must
> >  not modify the VM's block devices between cprsave and cprload.
> > 
> >  cprsave and cprload support guests with vfio devices if the caller first
> >  suspends the guest by issuing guest-suspend-ram to the qemu guest agent.
> >  The guest drivers' suspend methods flush outstanding requests and
> >  re-initialize the devices, and thus there is no device state to save and
> >  restore.
> > 
> > 1 savevm: add vmstate handler iterators
> > 2 savevm: VM handlers mode mask
> > 3 savevm: QMP command for cprsave
> > 4 savevm: HMP Command for cprsave
> > 5 savevm: QMP command for cprload
> > 6 savevm: HMP Command for cprload
> > 7 savevm: QMP command for cprinfo
> > 8 savevm: HMP command for cprinfo
> > 9 savevm: prevent cprsave if memory is volatile
> >    10 kvmclock: restore paused KVM clock
> >    11 cpu: disable ticks when suspended
> >    12 vl: pause option
> >    13 gdbstub: gdb support for suspended state
> > 
> >  The next patches add a restart method that eliminates the persistent memory
> >  constraint, and allows qemu to be updated across the restart, but does not
> >  allow host reboot.  Anonymous memory segments used by the guest are
> >  preserved across a re-exec of qemu, mapped at the same VA, via a proposed
> >  madvise(MADV_DOEXEC) option in the Linux kernel.  See
> >  https://lore.kernel.org/lkml/1595869887-23307-1-git-send-email-anthony.yzn...@oracle.com/
> > 
> >    14 savevm: VMS_RESTART and cprsave restart
> >    15 vl: QEMU_START_FREEZE env var
> >    16 oslib: add qemu_clr_cloexec
> >    17 util: env var helpers
> >    18 osdep: import MADV_DOEXEC
> >    19 memory: ram_block_add cosmetic changes
> >    20 vl: add helper to request re-exec
> >    21 exec, memory: exec(3) to restart
> >    22 char: qio_channel_socket_accept reuse fd
> >    23 char: save/restore chardev socket fds
> >    24 ui: save/restore vnc socket fds
> >    25 char: save/restore chardev pty fds
> > >>>
> > >>> Keeping FDs open across re-exec is a nice trick, but how are you dealing
> > >>> with the state associated with them, most especially the TLS encryption
> > >>> state? AFAIK, there's no way to serialize/deserialize the TLS state that
> > >>> GNUTLS maintains, and the patches don't show any sign of dealing with
> > >>> this. IOW it looks like while the FD will be preserved, any TLS session
> > >>> running on it will fail.
> > >>
> > >> I had not considered TLS.  If a non-qemu library maintains connection
> > >> state, then we won't be able to support it for live update until the
> > >> library provides interfaces to serialize the state.
> > >>
> > >> For qemu objects, so far vmstate has been adequate to represent the
> > >> devices with descriptors that we preserve.
> > > 
> > > My main concern about this series is that there is an implicit assumption
> > > that QEMU is *not* configured with certain features that are not handled.
> > > If QEMU is using one of the unsupported features, I don't see anything in
> > > the series which attempts to prevent the actions.
> > > 
> > > IOW, users can have an arbitrary QEMU config, attempt to use these new
> > > features, the commands may well succeed, but the user is silently left
> > > with a broken QEMU.  Such silent failure modes are really undesirable as
> > > they'll lead to a never-ending stream of hard to diagnose bug reports for
> > > QEMU maintainers.
> > > 
> > > TLS is one example of this, the live upgrade will "succeed", but the TLS
> > > connections will be totally non-functional.

Re: [PATCH V1 00/32] Live Update

2020-08-04 Thread Steven Sistare
Hi folks, any questions or comments on the vfio and pci changes in 
patch 30?  Or on the means of preserving anonymous memory and re-exec'ing 
in patches 14 - 21?

- Steve

On 7/30/2020 11:14 AM, Steve Sistare wrote:
> Improve and extend the qemu functions that save and restore VM state so a
> guest may be suspended and resumed with minimal pause time.  qemu may be
> updated to a new version in between.
> 
> The first set of patches adds the cprsave and cprload commands to save and
> restore VM state, and allow the host kernel to be updated and rebooted in
> between.  The VM must create guest RAM in a persistent shared memory file,
> such as /dev/dax0.0 or persistent /dev/shm PKRAM as proposed in 
> https://lore.kernel.org/lkml/1588812129-8596-1-git-send-email-anthony.yzn...@oracle.com/
> 
> cprsave stops the VCPUs and saves VM device state in a simple file, and
> thus supports any type of guest image and block device.  The caller must
> not modify the VM's block devices between cprsave and cprload.
> 
> cprsave and cprload support guests with vfio devices if the caller first
> suspends the guest by issuing guest-suspend-ram to the qemu guest agent.
> The guest drivers' suspend methods flush outstanding requests and
> re-initialize the devices, and thus there is no device state to save and
> restore.
> 
>1 savevm: add vmstate handler iterators
>2 savevm: VM handlers mode mask
>3 savevm: QMP command for cprsave
>4 savevm: HMP Command for cprsave
>5 savevm: QMP command for cprload
>6 savevm: HMP Command for cprload
>7 savevm: QMP command for cprinfo
>8 savevm: HMP command for cprinfo
>9 savevm: prevent cprsave if memory is volatile
>   10 kvmclock: restore paused KVM clock
>   11 cpu: disable ticks when suspended
>   12 vl: pause option
>   13 gdbstub: gdb support for suspended state
> 
> The next patches add a restart method that eliminates the persistent memory
> constraint, and allows qemu to be updated across the restart, but does not
> allow host reboot.  Anonymous memory segments used by the guest are
> preserved across a re-exec of qemu, mapped at the same VA, via a proposed
> madvise(MADV_DOEXEC) option in the Linux kernel.  See
> https://lore.kernel.org/lkml/1595869887-23307-1-git-send-email-anthony.yzn...@oracle.com/
> 
>   14 savevm: VMS_RESTART and cprsave restart
>   15 vl: QEMU_START_FREEZE env var
>   16 oslib: add qemu_clr_cloexec
>   17 util: env var helpers
>   18 osdep: import MADV_DOEXEC
>   19 memory: ram_block_add cosmetic changes
>   20 vl: add helper to request re-exec
>   21 exec, memory: exec(3) to restart
>   22 char: qio_channel_socket_accept reuse fd
>   23 char: save/restore chardev socket fds
>   24 ui: save/restore vnc socket fds
>   25 char: save/restore chardev pty fds
>   26 monitor: save/restore QMP negotiation status
>   27 vhost: reset vhost devices upon cprsave
>   28 char: restore terminal on restart
> 
> The next patches extend the restart method to save and restore vfio-pci
> state, eliminating the requirement for a guest agent.  The vfio container,
> group, and device descriptors are preserved across the qemu re-exec.
> 
>   29 pci: export pci_update_mappings
>   30 vfio-pci: save and restore
>   31 vfio-pci: trace pci config
>   32 vfio-pci: improved tracing
> 
> Here is an example of updating qemu from v4.2.0 to v4.2.1 using 
> "cprload restart".  The software update is performed while the guest is
> running to minimize downtime.
> 
> window 1  | window 2
>   |
> # qemu-system-x86_64 ...  |
> QEMU 4.2.0 monitor - type 'help' ...  |
> (qemu) info status|
> VM status: running|
>   | # yum update qemu
> (qemu) cprsave /tmp/qemu.sav restart  |
> QEMU 4.2.1 monitor - type 'help' ...  |
> (qemu) info status|
> VM status: paused (prelaunch) |
> (qemu) cprload /tmp/qemu.sav  |
> (qemu) info status|
> VM status: running|
> 
> 
> Here is an example of updating the host kernel using "cprload reboot"
> 
> window 1  | window 2
>   |
> # qemu-system-x86_64 ...mem-path=/dev/dax0.0 ...|
> QEMU 4.2.1 monitor - type 'help' ...  |
> (qemu) info status|
> VM status: running|
>   | # yum update kernel-uek
> (qemu) cprsave /tmp/qemu.sav restart  |
>   |
> # systemctl kexec |
> kexec_core: Starting new kernel   |
> ...   |
>   |
> # qemu-system-x86_64 ...mem-path=/dev/dax0.0 ...|
> QEMU 4.2.1 monitor - type 'help' ...  |
> (qemu) info status  

Re: [PATCH V1 00/32] Live Update

2020-07-31 Thread Steven Sistare
On 7/30/2020 5:39 PM, Paolo Bonzini wrote:
> On 30/07/20 21:09, Steven Sistare wrote:
>>> please spell it out.  Also, how does the functionality compare to
>>> xen-save-devices-state and xen-load-devices-state?
>>
>> qmp_xen_save_devices_state serializes device state to a file which is loaded 
>> on the target for a live migration.  It performs some of the same actions
>> as cprsave/cprload but does not support live update-in-place.
> 
> So it is a subset, can code be reused across both?  

They use common subroutines, but their bodies check different conditions, so I
don't think merging would be an improvement.  We do provide a new helper
qf_file_open() which could replace a handful of lines in both
qmp_xen_save_devices_state and qmp_xen_load_devices_state.

> Also, live migration
> across versions is supported, so can you describe the special
> update-in-place support more precisely?  I am confused about the use
> cases, which require (or try) to keep file descriptors across re-exec,
> which are for kexec, and so on.

Sure. The first use case allows you to kexec reboot the host and update host
software and/or qemu.  It does not preserve descriptors, and guest ram must be
backed by persistent shared memory.  Guest pause time depends on host reboot
time, which can be seconds to 10's of seconds.

The second case allows you to update qemu in place, but not update the host.
Guest ram can be in shared or anonymous memory.  We call madvise(MADV_DOEXEC)
to tell the kernel to preserve anon memory across the exec.  Open descriptors
are preserved.  Addresses and lengths of saved memory segments are saved in
the environment, and the values of descriptors are saved.  When new qemu
restarts, it finds those values in the environment and uses them when the
various objects are created.  Memory is not realloc'd, it is already present,
and the address and lengths are saved in the ram objects.  Guest pause time
is in the 100 to 200 msec range.  It is less resource intensive than live
migration, and is appropriate if your only goal is to update qemu, as opposed
to evacuating a host.

 cprsave and cprload support guests with vfio devices if the caller first
 suspends the guest by issuing guest-suspend-ram to the qemu guest agent.
 The guest drivers' suspend methods flush outstanding requests and
 re-initialize the devices, and thus there is no device state to save and
 restore.
>>> This probably should be allowed even for regular migration.  Can you
>>> generalize the code as a separate series?
>>
>> Maybe.  I think that would be a distinct patch that ignores the vfio
>> migration blocker if the state is suspended.  Plus a qemu agent call to do
>> the suspend.  Needs more thought.
> 
> The agent already supports suspend, so that should be relatively easy.
> Only the code to add/remove the VFIO migration blocker from a VM state
> change notifier, or something like that, would be needed.

Yes, I have experimented with the guest's suspend method.

- Steve



Re: [PATCH V1 00/32] Live Update

2020-07-31 Thread Steven Sistare
On 7/31/2020 11:52 AM, Daniel P. Berrangé wrote:
> On Fri, Jul 31, 2020 at 11:27:45AM -0400, Steven Sistare wrote:
>> On 7/31/2020 4:53 AM, Daniel P. Berrangé wrote:
>>> On Thu, Jul 30, 2020 at 02:48:44PM -0400, Steven Sistare wrote:
 On 7/30/2020 12:52 PM, Daniel P. Berrangé wrote:
> On Thu, Jul 30, 2020 at 08:14:04AM -0700, Steve Sistare wrote:
>> Improve and extend the qemu functions that save and restore VM state so a
>> guest may be suspended and resumed with minimal pause time.  qemu may be
>> updated to a new version in between.
>>
>> The first set of patches adds the cprsave and cprload commands to save and
>> restore VM state, and allow the host kernel to be updated and rebooted in
>> between.  The VM must create guest RAM in a persistent shared memory file,
>> such as /dev/dax0.0 or persistent /dev/shm PKRAM as proposed in
>> https://lore.kernel.org/lkml/1588812129-8596-1-git-send-email-anthony.yzn...@oracle.com/
>>
>> cprsave stops the VCPUs and saves VM device state in a simple file, and
>> thus supports any type of guest image and block device.  The caller must
>> not modify the VM's block devices between cprsave and cprload.
>>
>> cprsave and cprload support guests with vfio devices if the caller first
>> suspends the guest by issuing guest-suspend-ram to the qemu guest agent.
>> The guest drivers' suspend methods flush outstanding requests and
>> re-initialize the devices, and thus there is no device state to save and
>> restore.
>>
>>1 savevm: add vmstate handler iterators
>>2 savevm: VM handlers mode mask
>>3 savevm: QMP command for cprsave
>>4 savevm: HMP Command for cprsave
>>5 savevm: QMP command for cprload
>>6 savevm: HMP Command for cprload
>>7 savevm: QMP command for cprinfo
>>8 savevm: HMP command for cprinfo
>>9 savevm: prevent cprsave if memory is volatile
>>   10 kvmclock: restore paused KVM clock
>>   11 cpu: disable ticks when suspended
>>   12 vl: pause option
>>   13 gdbstub: gdb support for suspended state
>>
>> The next patches add a restart method that eliminates the persistent memory
>> constraint, and allows qemu to be updated across the restart, but does not
>> allow host reboot.  Anonymous memory segments used by the guest are
>> preserved across a re-exec of qemu, mapped at the same VA, via a proposed
>> madvise(MADV_DOEXEC) option in the Linux kernel.  See
>> https://lore.kernel.org/lkml/1595869887-23307-1-git-send-email-anthony.yzn...@oracle.com/
>>
>>   14 savevm: VMS_RESTART and cprsave restart
>>   15 vl: QEMU_START_FREEZE env var
>>   16 oslib: add qemu_clr_cloexec
>>   17 util: env var helpers
>>   18 osdep: import MADV_DOEXEC
>>   19 memory: ram_block_add cosmetic changes
>>   20 vl: add helper to request re-exec
>>   21 exec, memory: exec(3) to restart
>>   22 char: qio_channel_socket_accept reuse fd
>>   23 char: save/restore chardev socket fds
>>   24 ui: save/restore vnc socket fds
>>   25 char: save/restore chardev pty fds
>
> Keeping FDs open across re-exec is a nice trick, but how are you dealing
> with the state associated with them, most especially the TLS encryption
> state ? AFAIK, there's no way to serialize/deserialize the TLS state that
> GNUTLS maintains, and the patches don't show any sign of dealing with
> this. IOW it looks like while the FD will be preserved, any TLS session
> running on it will fail.

 I had not considered TLS.  If a non-qemu library maintains connection
 state, then we won't be able to support it for live update until the
 library provides interfaces to serialize the state.

 For qemu objects, so far vmstate has been adequate to represent the
 devices with descriptors that we preserve.
>>>
>>> My main concern about this series is that there is an implicit assumption
>>> that QEMU is *not* configured with certain features that are not handled.
>>> If QEMU is using one of the unsupported features, I don't see anything in
>>> the series which attempts to prevent the actions.
>>>
>>> IOW, users can have an arbitrary QEMU config, attempt to use these new
>>> features, the commands may well succeed, but the user is silently left
>>> with a broken QEMU.  Such silent failure modes are really undesirable as
>>> they'll lead to a never-ending stream of hard to diagnose bug reports for
>>> QEMU maintainers.
>>>
>>> TLS is one example of this, the live upgrade will "succeed", but the TLS
>>> connections will be totally non-functional.
>>
>> I agree with all your points and would like to do better in this area.
>> Other than hunting for every use of a descriptor and either supporting it
>> or blocking cpr, do you have any suggestions?  Thinking out loud, maybe we
>> can gather all the fds that we support, then look for all fds in the
>> process, and block the cpr if we find an unrecognized fd.

Re: [PATCH V1 00/32] Live Update

2020-07-31 Thread Daniel P . Berrangé
On Fri, Jul 31, 2020 at 11:27:45AM -0400, Steven Sistare wrote:
> On 7/31/2020 4:53 AM, Daniel P. Berrangé wrote:
> > On Thu, Jul 30, 2020 at 02:48:44PM -0400, Steven Sistare wrote:
> >> On 7/30/2020 12:52 PM, Daniel P. Berrangé wrote:
> >>> On Thu, Jul 30, 2020 at 08:14:04AM -0700, Steve Sistare wrote:
>  Improve and extend the qemu functions that save and restore VM state so a
>  guest may be suspended and resumed with minimal pause time.  qemu may be
>  updated to a new version in between.
> 
>  The first set of patches adds the cprsave and cprload commands to save and
>  restore VM state, and allow the host kernel to be updated and rebooted in
>  between.  The VM must create guest RAM in a persistent shared memory file,
>  such as /dev/dax0.0 or persistent /dev/shm PKRAM as proposed in
>  https://lore.kernel.org/lkml/1588812129-8596-1-git-send-email-anthony.yzn...@oracle.com/
> 
>  cprsave stops the VCPUs and saves VM device state in a simple file, and
>  thus supports any type of guest image and block device.  The caller must
>  not modify the VM's block devices between cprsave and cprload.
> 
>  cprsave and cprload support guests with vfio devices if the caller first
>  suspends the guest by issuing guest-suspend-ram to the qemu guest agent.
>  The guest drivers' suspend methods flush outstanding requests and
>  re-initialize the devices, and thus there is no device state to save and
>  restore.
> 
> 1 savevm: add vmstate handler iterators
> 2 savevm: VM handlers mode mask
> 3 savevm: QMP command for cprsave
> 4 savevm: HMP Command for cprsave
> 5 savevm: QMP command for cprload
> 6 savevm: HMP Command for cprload
> 7 savevm: QMP command for cprinfo
> 8 savevm: HMP command for cprinfo
> 9 savevm: prevent cprsave if memory is volatile
>    10 kvmclock: restore paused KVM clock
>    11 cpu: disable ticks when suspended
>    12 vl: pause option
>    13 gdbstub: gdb support for suspended state
> 
>  The next patches add a restart method that eliminates the persistent memory
>  constraint, and allows qemu to be updated across the restart, but does not
>  allow host reboot.  Anonymous memory segments used by the guest are
>  preserved across a re-exec of qemu, mapped at the same VA, via a proposed
>  madvise(MADV_DOEXEC) option in the Linux kernel.  See
>  https://lore.kernel.org/lkml/1595869887-23307-1-git-send-email-anthony.yzn...@oracle.com/
> 
>    14 savevm: VMS_RESTART and cprsave restart
>    15 vl: QEMU_START_FREEZE env var
>    16 oslib: add qemu_clr_cloexec
>    17 util: env var helpers
>    18 osdep: import MADV_DOEXEC
>    19 memory: ram_block_add cosmetic changes
>    20 vl: add helper to request re-exec
>    21 exec, memory: exec(3) to restart
>    22 char: qio_channel_socket_accept reuse fd
>    23 char: save/restore chardev socket fds
>    24 ui: save/restore vnc socket fds
>    25 char: save/restore chardev pty fds
> >>>
> >>> Keeping FDs open across re-exec is a nice trick, but how are you dealing
> >>> with the state associated with them, most especially the TLS encryption
> >>> state ? AFAIK, there's no way to serialize/deserialize the TLS state that
> >>> GNUTLS maintains, and the patches don't show any sign of dealing with
> >>> this. IOW it looks like while the FD will be preserved, any TLS session
> >>> running on it will fail.
> >>
> >> I had not considered TLS.  If a non-qemu library maintains connection
> >> state, then we won't be able to support it for live update until the
> >> library provides interfaces to serialize the state.
> >>
> >> For qemu objects, so far vmstate has been adequate to represent the
> >> devices with descriptors that we preserve.
> > 
> > My main concern about this series is that there is an implicit assumption
> > that QEMU is *not* configured with certain features that are not handled.
> > If QEMU is using one of the unsupported features, I don't see anything in
> > the series which attempts to prevent the actions.
> > 
> > IOW, users can have an arbitrary QEMU config, attempt to use these new
> > features, the commands may well succeed, but the user is silently left
> > with a broken QEMU.  Such silent failure modes are really undesirable as
> > they'll lead to a never-ending stream of hard to diagnose bug reports for
> > QEMU maintainers.
> > 
> > TLS is one example of this, the live upgrade will "succeed", but the TLS
> > connections will be totally non-functional.
> 
> I agree with all your points and would like to do better in this area.
> Other than hunting for every use of a descriptor and either supporting it
> or blocking cpr, do you have any suggestions?  Thinking out loud, maybe we
> can gather all the fds that we support, then look for all fds in the
> process, and block the cpr if we find an unrecognized fd.

Re: [PATCH V1 00/32] Live Update

2020-07-31 Thread Steven Sistare
On 7/31/2020 4:53 AM, Daniel P. Berrangé wrote:
> On Thu, Jul 30, 2020 at 02:48:44PM -0400, Steven Sistare wrote:
>> On 7/30/2020 12:52 PM, Daniel P. Berrangé wrote:
>>> On Thu, Jul 30, 2020 at 08:14:04AM -0700, Steve Sistare wrote:
 Improve and extend the qemu functions that save and restore VM state so a
 guest may be suspended and resumed with minimal pause time.  qemu may be
 updated to a new version in between.

 The first set of patches adds the cprsave and cprload commands to save and
 restore VM state, and allow the host kernel to be updated and rebooted in
 between.  The VM must create guest RAM in a persistent shared memory file,
 such as /dev/dax0.0 or persistent /dev/shm PKRAM as proposed in 
 https://lore.kernel.org/lkml/1588812129-8596-1-git-send-email-anthony.yzn...@oracle.com/

 cprsave stops the VCPUs and saves VM device state in a simple file, and
 thus supports any type of guest image and block device.  The caller must
 not modify the VM's block devices between cprsave and cprload.

 cprsave and cprload support guests with vfio devices if the caller first
 suspends the guest by issuing guest-suspend-ram to the qemu guest agent.
 The guest drivers' suspend methods flush outstanding requests and
 re-initialize the devices, and thus there is no device state to save and
 restore.

1 savevm: add vmstate handler iterators
2 savevm: VM handlers mode mask
3 savevm: QMP command for cprsave
4 savevm: HMP Command for cprsave
5 savevm: QMP command for cprload
6 savevm: HMP Command for cprload
7 savevm: QMP command for cprinfo
8 savevm: HMP command for cprinfo
9 savevm: prevent cprsave if memory is volatile
   10 kvmclock: restore paused KVM clock
   11 cpu: disable ticks when suspended
   12 vl: pause option
   13 gdbstub: gdb support for suspended state

 The next patches add a restart method that eliminates the persistent memory
 constraint, and allows qemu to be updated across the restart, but does not
 allow host reboot.  Anonymous memory segments used by the guest are
 preserved across a re-exec of qemu, mapped at the same VA, via a proposed
 madvise(MADV_DOEXEC) option in the Linux kernel.  See
 https://lore.kernel.org/lkml/1595869887-23307-1-git-send-email-anthony.yzn...@oracle.com/

   14 savevm: VMS_RESTART and cprsave restart
   15 vl: QEMU_START_FREEZE env var
   16 oslib: add qemu_clr_cloexec
   17 util: env var helpers
   18 osdep: import MADV_DOEXEC
   19 memory: ram_block_add cosmetic changes
   20 vl: add helper to request re-exec
   21 exec, memory: exec(3) to restart
   22 char: qio_channel_socket_accept reuse fd
   23 char: save/restore chardev socket fds
   24 ui: save/restore vnc socket fds
   25 char: save/restore chardev pty fds
>>>
>>> Keeping FDs open across re-exec is a nice trick, but how are you dealing
>>> with the state associated with them, most especially the TLS encryption
>>> state ? AFAIK, there's no way to serialize/deserialize the TLS state that
>>> GNUTLS maintains, and the patches don't show any sign of dealing with
>>> this. IOW it looks like while the FD will be preserved, any TLS session
>>> running on it will fail.
>>
>> I had not considered TLS.  If a non-qemu library maintains connection
>> state, then we won't be able to support it for live update until the
>> library provides interfaces to serialize the state.
>>
>> For qemu objects, so far vmstate has been adequate to represent the
>> devices with descriptors that we preserve.
> 
> My main concern about this series is that there is an implicit assumption
> that QEMU is *not* configured with certain features that are not handled.
> If QEMU is using one of the unsupported features, I don't see anything in
> the series which attempts to prevent the actions.
> 
> IOW, users can have an arbitrary QEMU config, attempt to use these new
> features, the commands may well succeed, but the user is silently left
> with a broken QEMU.  Such silent failure modes are really undesirable as
> they'll lead to a never-ending stream of hard to diagnose bug reports for
> QEMU maintainers.
> 
> TLS is one example of this, the live upgrade will "succeed", but the TLS
> connections will be totally non-functional.

I agree with all your points and would like to do better in this area.
Other than hunting for every use of a descriptor and either supporting it or
blocking cpr, do you have any suggestions?  Thinking out loud, maybe we can
gather all the fds that we support, then look for all fds in the process, and
block the cpr if we find an unrecognized fd.

- Steve



Re: [PATCH V1 00/32] Live Update

2020-07-31 Thread Daniel P . Berrangé
On Thu, Jul 30, 2020 at 02:48:44PM -0400, Steven Sistare wrote:
> On 7/30/2020 12:52 PM, Daniel P. Berrangé wrote:
> > On Thu, Jul 30, 2020 at 08:14:04AM -0700, Steve Sistare wrote:
> >> Improve and extend the qemu functions that save and restore VM state so a
> >> guest may be suspended and resumed with minimal pause time.  qemu may be
> >> updated to a new version in between.
> >>
> >> The first set of patches adds the cprsave and cprload commands to save and
> >> restore VM state, and allow the host kernel to be updated and rebooted in
> >> between.  The VM must create guest RAM in a persistent shared memory file,
> >> such as /dev/dax0.0 or persistent /dev/shm PKRAM as proposed in 
> >> https://lore.kernel.org/lkml/1588812129-8596-1-git-send-email-anthony.yzn...@oracle.com/
> >>
> >> cprsave stops the VCPUs and saves VM device state in a simple file, and
> >> thus supports any type of guest image and block device.  The caller must
> >> not modify the VM's block devices between cprsave and cprload.
> >>
> >> cprsave and cprload support guests with vfio devices if the caller first
> >> suspends the guest by issuing guest-suspend-ram to the qemu guest agent.
> >> The guest drivers' suspend methods flush outstanding requests and
> >> re-initialize the devices, and thus there is no device state to save and
> >> restore.
> >>
> >>1 savevm: add vmstate handler iterators
> >>2 savevm: VM handlers mode mask
> >>3 savevm: QMP command for cprsave
> >>4 savevm: HMP Command for cprsave
> >>5 savevm: QMP command for cprload
> >>6 savevm: HMP Command for cprload
> >>7 savevm: QMP command for cprinfo
> >>8 savevm: HMP command for cprinfo
> >>9 savevm: prevent cprsave if memory is volatile
> >>   10 kvmclock: restore paused KVM clock
> >>   11 cpu: disable ticks when suspended
> >>   12 vl: pause option
> >>   13 gdbstub: gdb support for suspended state
> >>
> >> The next patches add a restart method that eliminates the persistent memory
> >> constraint, and allows qemu to be updated across the restart, but does not
> >> allow host reboot.  Anonymous memory segments used by the guest are
> >> preserved across a re-exec of qemu, mapped at the same VA, via a proposed
> >> madvise(MADV_DOEXEC) option in the Linux kernel.  See
> >> https://lore.kernel.org/lkml/1595869887-23307-1-git-send-email-anthony.yzn...@oracle.com/
> >>
> >>   14 savevm: VMS_RESTART and cprsave restart
> >>   15 vl: QEMU_START_FREEZE env var
> >>   16 oslib: add qemu_clr_cloexec
> >>   17 util: env var helpers
> >>   18 osdep: import MADV_DOEXEC
> >>   19 memory: ram_block_add cosmetic changes
> >>   20 vl: add helper to request re-exec
> >>   21 exec, memory: exec(3) to restart
> >>   22 char: qio_channel_socket_accept reuse fd
> >>   23 char: save/restore chardev socket fds
> >>   24 ui: save/restore vnc socket fds
> >>   25 char: save/restore chardev pty fds
> > 
> > Keeping FDs open across re-exec is a nice trick, but how are you dealing
> > with the state associated with them, most especially the TLS encryption
> > state ? AFAIK, there's no way to serialize/deserialize the TLS state that
> > GNUTLS maintains, and the patches don't show any sign of dealing with
> > this. IOW it looks like while the FD will be preserved, any TLS session
> > running on it will fail.
> 
> I had not considered TLS.  If a non-qemu library maintains connection
> state, then we won't be able to support it for live update until the
> library provides interfaces to serialize the state.
> 
> For qemu objects, so far vmstate has been adequate to represent the
> devices with descriptors that we preserve.

My main concern about this series is that there is an implicit assumption
that QEMU is *not* configured with certain features that are not handled.
If QEMU is using one of the unsupported features, I don't see anything in
the series which attempts to prevent the actions.

IOW, users can have an arbitrary QEMU config, attempt to use these new features,
the commands may well succeed, but the user is silently left with a broken QEMU.
Such silent failure modes are really undesirable as they'll lead to a
never-ending stream of hard to diagnose bug reports for QEMU maintainers.

TLS is one example of this, the live upgrade will "succeed", but the TLS
connections will be totally non-functional.

Regards,
Daniel
-- 
|: https://berrange.com       -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org        -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|




Re: [PATCH V1 00/32] Live Update

2020-07-30 Thread Paolo Bonzini
On 30/07/20 21:09, Steven Sistare wrote:
>> please spell it out.  Also, how does the functionality compare to
>> xen-save-devices-state and xen-load-devices-state?
>
> qmp_xen_save_devices_state serializes device state to a file which is loaded 
> on the target for a live migration.  It performs some of the same actions
> as cprsave/cprload but does not support live update-in-place.

So it is a subset, can code be reused across both?  Also, live migration
across versions is supported, so can you describe the special
update-in-place support more precisely?  I am confused about the use
cases, which require (or try) to keep file descriptors across re-exec,
which are for kexec, and so on.

>>> cprsave and cprload support guests with vfio devices if the caller first
>>> suspends the guest by issuing guest-suspend-ram to the qemu guest agent.
>>> The guest drivers' suspend methods flush outstanding requests and
>>> re-initialize the devices, and thus there is no device state to save and
>>> restore.
>> This probably should be allowed even for regular migration.  Can you
>> generalize the code as a separate series?
>
> Maybe.  I think that would be a distinct patch that ignores the vfio
> migration blocker if the state is suspended.  Plus a qemu agent call to
> do the suspend.  Needs more thought.

The agent already supports suspend, so that should be relatively easy.
Only the code to add/remove the VFIO migration blocker from a VM state
change notifier, or something like that, would be needed.
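
That shape — a VM state-change notifier that adds or removes the vfio
migration blocker — can be sketched in a few lines.  This is only an
illustration with QEMU's types stubbed out, not the real implementation:
QEMU's actual migrate_add_blocker()/migrate_del_blocker() take Error
arguments, which are simplified away here.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for QEMU's migrate_add_blocker()/migrate_del_blocker();
 * the real functions carry an Error describing the blocker. */
static bool vfio_migration_blocked;

static void migrate_add_blocker_stub(void)
{
    vfio_migration_blocked = true;
}

static void migrate_del_blocker_stub(void)
{
    vfio_migration_blocked = false;
}

typedef enum { VM_STATE_RUNNING, VM_STATE_SUSPENDED } VmState;

/* The notifier: a suspended guest has flushed its requests, so there is
 * no vfio device state to migrate and the blocker can be lifted. */
static void vfio_vm_state_changed(VmState state)
{
    if (state == VM_STATE_SUSPENDED) {
        migrate_del_blocker_stub();
    } else {
        migrate_add_blocker_stub();
    }
}
```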

Paolo




Re: [PATCH V1 00/32] Live Update

2020-07-30 Thread Steven Sistare
On 7/30/2020 1:49 PM, Dr. David Alan Gilbert wrote:
> * Steve Sistare (steven.sist...@oracle.com) wrote:
>> Improve and extend the qemu functions that save and restore VM state so a
>> guest may be suspended and resumed with minimal pause time.  qemu may be
>> updated to a new version in between.
> 
> Nice.
> 
>> The first set of patches adds the cprsave and cprload commands to save and
>> restore VM state, and allow the host kernel to be updated and rebooted in
>> between.  The VM must create guest RAM in a persistent shared memory file,
>> such as /dev/dax0.0 or persistent /dev/shm PKRAM as proposed in 
>> https://lore.kernel.org/lkml/1588812129-8596-1-git-send-email-anthony.yzn...@oracle.com/
>>
>> cprsave stops the VCPUs and saves VM device state in a simple file, and
>> thus supports any type of guest image and block device.  The caller must
>> not modify the VM's block devices between cprsave and cprload.
> 
> can I ask why you don't just add a migration flag to skip the devices
> you don't want, and then do a migrate to a file?
> (i.e. migrate "exec:cat > afile")
> We already have the 'x-ignore-shared' capability that's used for doing
> RAM snapshots of VMs; primarily I think for being able to start a VM
> from a RAM snapshot as a fast VM start trick.
> (There's also a xen_save_devices that does something similar).
> If you backed the RAM as you say, enabled x-ignore-shared and then did:
> 
>    migrate "exec:cat > afile"
> 
> and restarted the destination with:
> 
> migrate_incoming "exec:cat afile"
> 
> what is different (except the later stuff about the vfio magic and
> chardevs)?
> 
> Dave

Yes, I did consider whether to extend the migration syntax and implementation in
save_vmstate and load_vmstate, versus creating something new.  Those functions
handle stuff like bdrv snapshot, aio, and migration which are n/a for the cpr
use case, and the cpr functions handle state that is n/a for the migration case.
I judged that a single function handling both would be less readable and
maintainable.  At their core all these routines call qemu_loadvm_state() and
qemu_savevm_state().  The surrounding code is mostly different.


Take a look at 
  savevm.c:save_vmstate()   vs   save_cpr_snapshot() attached
and
  savevm.c:load_vmstate()   vs   load_cpr_snapshot() attached

I attached the complete versions of the cpr functions because they are built up
over multiple patches in this series, thus hard to visualize in patch form.

- Steve

> 
>> cprsave and cprload support guests with vfio devices if the caller first
>> suspends the guest by issuing guest-suspend-ram to the qemu guest agent.
>> The guest drivers suspend methods flush outstanding requests and re-
>> initialize the devices, and thus there is no device state to save and
>> restore.
>>
>>1 savevm: add vmstate handler iterators
>>2 savevm: VM handlers mode mask
>>3 savevm: QMP command for cprsave
>>4 savevm: HMP Command for cprsave
>>5 savevm: QMP command for cprload
>>6 savevm: HMP Command for cprload
>>7 savevm: QMP command for cprinfo
>>8 savevm: HMP command for cprinfo
>>9 savevm: prevent cprsave if memory is volatile
>>   10 kvmclock: restore paused KVM clock
>>   11 cpu: disable ticks when suspended
>>   12 vl: pause option
>>   13 gdbstub: gdb support for suspended state
>>
>> The next patches add a restart method that eliminates the persistent memory
>> constraint, and allows qemu to be updated across the restart, but does not
>> allow host reboot.  Anonymous memory segments used by the guest are
>> preserved across a re-exec of qemu, mapped at the same VA, via a proposed
>> madvise(MADV_DOEXEC) option in the Linux kernel.  See
>> https://lore.kernel.org/lkml/1595869887-23307-1-git-send-email-anthony.yzn...@oracle.com/
>>
>>   14 savevm: VMS_RESTART and cprsave restart
>>   15 vl: QEMU_START_FREEZE env var
>>   16 oslib: add qemu_clr_cloexec
>>   17 util: env var helpers
>>   18 osdep: import MADV_DOEXEC
>>   19 memory: ram_block_add cosmetic changes
>>   20 vl: add helper to request re-exec
>>   21 exec, memory: exec(3) to restart
>>   22 char: qio_channel_socket_accept reuse fd
>>   23 char: save/restore chardev socket fds
>>   24 ui: save/restore vnc socket fds
>>   25 char: save/restore chardev pty fds
>>   26 monitor: save/restore QMP negotiation status
>>   27 vhost: reset vhost devices upon cprsave
>>   28 char: restore terminal on restart
>>
>> The next patches extend the restart method to save and restore vfio-pci
>> state, eliminating the requirement for a guest agent.  The vfio container,
>> group, and device descriptors are preserved across the qemu re-exec.
>>
>>   29 pci: export pci_update_mappings
>>   30 vfio-pci: save and restore
>>   31 vfio-pci: trace pci config
>>   32 vfio-pci: improved tracing
>>
>> Here is an example of updating qemu from v4.2.0 to v4.2.1 using 
>> "cprload restart".  The software update is performed while the guest is
>> running to minimize downtime.
>>
>> 

Re: [PATCH V1 00/32] Live Update

2020-07-30 Thread Steven Sistare
On 7/30/2020 1:15 PM, Paolo Bonzini wrote:
> On 30/07/20 17:14, Steve Sistare wrote:
>> The first set of patches adds the cprsave and cprload commands to save and
>> restore VM state, and allow the host kernel to be updated and rebooted in
>> between.  The VM must create guest RAM in a persistent shared memory file,
>> such as /dev/dax0.0 or persistent /dev/shm PKRAM as proposed in 
>> https://lore.kernel.org/lkml/1588812129-8596-1-git-send-email-anthony.yzn...@oracle.com/
>>
>> cprsave stops the VCPUs and saves VM device state in a simple file, and
>> thus supports any type of guest image and block device.  The caller must
>> not modify the VM's block devices between cprsave and cprload.
> 
> Stupid question, what does cpr stand for?  If it is checkpoint/restore,

Checkpoint/restart.  An acronym from my HPC days.  I will spell it out.

> please spell it out.  Also, how does the functionality compare to
> xen-save-devices-state and xen-load-devices-state?

qmp_xen_save_devices_state serializes device state to a file which is loaded 
on the target for a live migration.  It performs some of the same actions
as cprsave/cprload but does not support live update-in-place.

>> cprsave and cprload support guests with vfio devices if the caller first
>> suspends the guest by issuing guest-suspend-ram to the qemu guest agent.
>> The guest drivers suspend methods flush outstanding requests and re-
>> initialize the devices, and thus there is no device state to save and
>> restore.
> 
> This probably should be allowed even for regular migration.  Can you
> generalize the code as a separate series?

Maybe.  I think that would be a distinct patch that ignores the vfio
migration blocker if the state is suspended.  Plus a qemu agent call to
do the suspend.  Needs more thought.

- Steve

> 
> Thanks,
> 
> Paolo
> 



Re: [PATCH V1 00/32] Live Update

2020-07-30 Thread Steven Sistare
On 7/30/2020 12:52 PM, Daniel P. Berrangé wrote:
> On Thu, Jul 30, 2020 at 08:14:04AM -0700, Steve Sistare wrote:
>> Improve and extend the qemu functions that save and restore VM state so a
>> guest may be suspended and resumed with minimal pause time.  qemu may be
>> updated to a new version in between.
>>
>> The first set of patches adds the cprsave and cprload commands to save and
>> restore VM state, and allow the host kernel to be updated and rebooted in
>> between.  The VM must create guest RAM in a persistent shared memory file,
>> such as /dev/dax0.0 or persistent /dev/shm PKRAM as proposed in 
>> https://lore.kernel.org/lkml/1588812129-8596-1-git-send-email-anthony.yzn...@oracle.com/
>>
>> cprsave stops the VCPUs and saves VM device state in a simple file, and
>> thus supports any type of guest image and block device.  The caller must
>> not modify the VM's block devices between cprsave and cprload.
>>
>> cprsave and cprload support guests with vfio devices if the caller first
>> suspends the guest by issuing guest-suspend-ram to the qemu guest agent.
>> The guest drivers suspend methods flush outstanding requests and re-
>> initialize the devices, and thus there is no device state to save and
>> restore.
>>
>>1 savevm: add vmstate handler iterators
>>2 savevm: VM handlers mode mask
>>3 savevm: QMP command for cprsave
>>4 savevm: HMP Command for cprsave
>>5 savevm: QMP command for cprload
>>6 savevm: HMP Command for cprload
>>7 savevm: QMP command for cprinfo
>>8 savevm: HMP command for cprinfo
>>9 savevm: prevent cprsave if memory is volatile
>>   10 kvmclock: restore paused KVM clock
>>   11 cpu: disable ticks when suspended
>>   12 vl: pause option
>>   13 gdbstub: gdb support for suspended state
>>
>> The next patches add a restart method that eliminates the persistent memory
>> constraint, and allows qemu to be updated across the restart, but does not
>> allow host reboot.  Anonymous memory segments used by the guest are
>> preserved across a re-exec of qemu, mapped at the same VA, via a proposed
>> madvise(MADV_DOEXEC) option in the Linux kernel.  See
>> https://lore.kernel.org/lkml/1595869887-23307-1-git-send-email-anthony.yzn...@oracle.com/
>>
>>   14 savevm: VMS_RESTART and cprsave restart
>>   15 vl: QEMU_START_FREEZE env var
>>   16 oslib: add qemu_clr_cloexec
>>   17 util: env var helpers
>>   18 osdep: import MADV_DOEXEC
>>   19 memory: ram_block_add cosmetic changes
>>   20 vl: add helper to request re-exec
>>   21 exec, memory: exec(3) to restart
>>   22 char: qio_channel_socket_accept reuse fd
>>   23 char: save/restore chardev socket fds
>>   24 ui: save/restore vnc socket fds
>>   25 char: save/restore chardev pty fds
> 
> Keeping FDs open across re-exec is a nice trick, but how are you dealing
> with the state associated with them, most especially the TLS encryption
> state? AFAIK, there's no way to serialize/deserialize the TLS state that
> GNUTLS maintains, and the patches don't show any sign of dealing with
> this. IOW it looks like while the FD will be preserved, any TLS session
> running on it will fail.

I had not considered TLS.  If a non-qemu library maintains connection state,
then we won't be able to support it for live update until the library
provides interfaces to serialize the state.

For qemu objects, so far vmstate has been adequate to represent the devices with
descriptors that we preserve.

> I'm going to presume that you're probably just considering the TLS features
> out of scope for your patch series.  It would be useful if you have any
> info about this and other things you've considered out of scope for this
> patch series.

The descriptors covered in these patches are needed for our use case.  I realize
there are others that could perhaps be preserved, but we have not tried them.
Those descriptors are closed on exec as usual, and are reopened after exec. I
expect that we or others will support more over time.

> I'm not seeing anything in the block layer about preserving open FDs, so
> I presume you're just letting the block layer close and then re-open any
> FDs it has?  

Correct.

> This would have the side effect that any locks held on the
> FDs are lost, so there's a potential race condition where another process
> could acquire the lock and prevent the re-exec completing. That said this
> is unavoidable, because the Linux kernel is completely broken wrt keeping
> fcntl() locks held across a re-exec, always throwing away the locks if
> more than 1 thread is running [1].

Ouch.

- Steve

> 
> Regards,
> Daniel
> 
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1552621
> 



Re: [PATCH V1 00/32] Live Update

2020-07-30 Thread Dr. David Alan Gilbert
* Steve Sistare (steven.sist...@oracle.com) wrote:
> Improve and extend the qemu functions that save and restore VM state so a
> guest may be suspended and resumed with minimal pause time.  qemu may be
> updated to a new version in between.

Nice.

> The first set of patches adds the cprsave and cprload commands to save and
> restore VM state, and allow the host kernel to be updated and rebooted in
> between.  The VM must create guest RAM in a persistent shared memory file,
> such as /dev/dax0.0 or persistent /dev/shm PKRAM as proposed in 
> https://lore.kernel.org/lkml/1588812129-8596-1-git-send-email-anthony.yzn...@oracle.com/
> 
> cprsave stops the VCPUs and saves VM device state in a simple file, and
> thus supports any type of guest image and block device.  The caller must
> not modify the VM's block devices between cprsave and cprload.

can I ask why you don't just add a migration flag to skip the devices
you don't want, and then do a migrate to a file?
(i.e. migrate "exec:cat > afile")
We already have the 'x-ignore-shared' capability that's used for doing
RAM snapshots of VMs; primarily I think for being able to start a VM
from a RAM snapshot as a fast VM start trick.
(There's also a xen_save_devices that does something similar).
If you backed the RAM as you say, enabled x-ignore-shared and then did:

   migrate "exec:cat > afile"

and restarted the destination with:

   migrate_incoming "exec:cat afile"

what is different (except the later stuff about the vfio magic and
chardevs)?

Dave

> cprsave and cprload support guests with vfio devices if the caller first
> suspends the guest by issuing guest-suspend-ram to the qemu guest agent.
> The guest drivers suspend methods flush outstanding requests and re-
> initialize the devices, and thus there is no device state to save and
> restore.
> 
>1 savevm: add vmstate handler iterators
>2 savevm: VM handlers mode mask
>3 savevm: QMP command for cprsave
>4 savevm: HMP Command for cprsave
>5 savevm: QMP command for cprload
>6 savevm: HMP Command for cprload
>7 savevm: QMP command for cprinfo
>8 savevm: HMP command for cprinfo
>9 savevm: prevent cprsave if memory is volatile
>   10 kvmclock: restore paused KVM clock
>   11 cpu: disable ticks when suspended
>   12 vl: pause option
>   13 gdbstub: gdb support for suspended state
> 
> The next patches add a restart method that eliminates the persistent memory
> constraint, and allows qemu to be updated across the restart, but does not
> allow host reboot.  Anonymous memory segments used by the guest are
> preserved across a re-exec of qemu, mapped at the same VA, via a proposed
> madvise(MADV_DOEXEC) option in the Linux kernel.  See
> https://lore.kernel.org/lkml/1595869887-23307-1-git-send-email-anthony.yzn...@oracle.com/
> 
>   14 savevm: VMS_RESTART and cprsave restart
>   15 vl: QEMU_START_FREEZE env var
>   16 oslib: add qemu_clr_cloexec
>   17 util: env var helpers
>   18 osdep: import MADV_DOEXEC
>   19 memory: ram_block_add cosmetic changes
>   20 vl: add helper to request re-exec
>   21 exec, memory: exec(3) to restart
>   22 char: qio_channel_socket_accept reuse fd
>   23 char: save/restore chardev socket fds
>   24 ui: save/restore vnc socket fds
>   25 char: save/restore chardev pty fds
>   26 monitor: save/restore QMP negotiation status
>   27 vhost: reset vhost devices upon cprsave
>   28 char: restore terminal on restart
> 
> The next patches extend the restart method to save and restore vfio-pci
> state, eliminating the requirement for a guest agent.  The vfio container,
> group, and device descriptors are preserved across the qemu re-exec.
> 
>   29 pci: export pci_update_mappings
>   30 vfio-pci: save and restore
>   31 vfio-pci: trace pci config
>   32 vfio-pci: improved tracing
> 
> Here is an example of updating qemu from v4.2.0 to v4.2.1 using 
> "cprload restart".  The software update is performed while the guest is
> running to minimize downtime.
> 
> window 1  | window 2
>   |
> # qemu-system-x86_64 ...  |
> QEMU 4.2.0 monitor - type 'help' ...  |
> (qemu) info status|
> VM status: running|
>   | # yum update qemu
> (qemu) cprsave /tmp/qemu.sav restart  |
> QEMU 4.2.1 monitor - type 'help' ...  |
> (qemu) info status|
> VM status: paused (prelaunch) |
> (qemu) cprload /tmp/qemu.sav  |
> (qemu) info status|
> VM status: running|
> 
> 
> Here is an example of updating the host kernel using "cprload reboot"
> 
> window 1  | window 2
>   |
> # qemu-system-x86_64 ...mem-path=/dev/dax0.0 ...|
> QEMU 4.2.1 monitor - type 'help' ...  |
> (qemu) info status|
> VM status: running|
> 

Re: [PATCH V1 00/32] Live Update

2020-07-30 Thread Paolo Bonzini
On 30/07/20 17:14, Steve Sistare wrote:
> The first set of patches adds the cprsave and cprload commands to save and
> restore VM state, and allow the host kernel to be updated and rebooted in
> between.  The VM must create guest RAM in a persistent shared memory file,
> such as /dev/dax0.0 or persistent /dev/shm PKRAM as proposed in 
> https://lore.kernel.org/lkml/1588812129-8596-1-git-send-email-anthony.yzn...@oracle.com/
> 
> cprsave stops the VCPUs and saves VM device state in a simple file, and
> thus supports any type of guest image and block device.  The caller must
> not modify the VM's block devices between cprsave and cprload.

Stupid question, what does cpr stand for?  If it is checkpoint/restore,
please spell it out.  Also, how does the functionality compare to
xen-save-devices-state and xen-load-devices-state?

> cprsave and cprload support guests with vfio devices if the caller first
> suspends the guest by issuing guest-suspend-ram to the qemu guest agent.
> The guest drivers suspend methods flush outstanding requests and re-
> initialize the devices, and thus there is no device state to save and
> restore.

This probably should be allowed even for regular migration.  Can you
generalize the code as a separate series?

Thanks,

Paolo




Re: [PATCH V1 00/32] Live Update

2020-07-30 Thread Daniel P. Berrangé
On Thu, Jul 30, 2020 at 08:14:04AM -0700, Steve Sistare wrote:
> Improve and extend the qemu functions that save and restore VM state so a
> guest may be suspended and resumed with minimal pause time.  qemu may be
> updated to a new version in between.
> 
> The first set of patches adds the cprsave and cprload commands to save and
> restore VM state, and allow the host kernel to be updated and rebooted in
> between.  The VM must create guest RAM in a persistent shared memory file,
> such as /dev/dax0.0 or persistent /dev/shm PKRAM as proposed in 
> https://lore.kernel.org/lkml/1588812129-8596-1-git-send-email-anthony.yzn...@oracle.com/
> 
> cprsave stops the VCPUs and saves VM device state in a simple file, and
> thus supports any type of guest image and block device.  The caller must
> not modify the VM's block devices between cprsave and cprload.
> 
> cprsave and cprload support guests with vfio devices if the caller first
> suspends the guest by issuing guest-suspend-ram to the qemu guest agent.
> The guest drivers suspend methods flush outstanding requests and re-
> initialize the devices, and thus there is no device state to save and
> restore.
> 
>1 savevm: add vmstate handler iterators
>2 savevm: VM handlers mode mask
>3 savevm: QMP command for cprsave
>4 savevm: HMP Command for cprsave
>5 savevm: QMP command for cprload
>6 savevm: HMP Command for cprload
>7 savevm: QMP command for cprinfo
>8 savevm: HMP command for cprinfo
>9 savevm: prevent cprsave if memory is volatile
>   10 kvmclock: restore paused KVM clock
>   11 cpu: disable ticks when suspended
>   12 vl: pause option
>   13 gdbstub: gdb support for suspended state
> 
> The next patches add a restart method that eliminates the persistent memory
> constraint, and allows qemu to be updated across the restart, but does not
> allow host reboot.  Anonymous memory segments used by the guest are
> preserved across a re-exec of qemu, mapped at the same VA, via a proposed
> madvise(MADV_DOEXEC) option in the Linux kernel.  See
> https://lore.kernel.org/lkml/1595869887-23307-1-git-send-email-anthony.yzn...@oracle.com/
> 
>   14 savevm: VMS_RESTART and cprsave restart
>   15 vl: QEMU_START_FREEZE env var
>   16 oslib: add qemu_clr_cloexec
>   17 util: env var helpers
>   18 osdep: import MADV_DOEXEC
>   19 memory: ram_block_add cosmetic changes
>   20 vl: add helper to request re-exec
>   21 exec, memory: exec(3) to restart
>   22 char: qio_channel_socket_accept reuse fd
>   23 char: save/restore chardev socket fds
>   24 ui: save/restore vnc socket fds
>   25 char: save/restore chardev pty fds

Keeping FDs open across re-exec is a nice trick, but how are you dealing
with the state associated with them, most especially the TLS encryption
state? AFAIK, there's no way to serialize/deserialize the TLS state that
GNUTLS maintains, and the patches don't show any sign of dealing with
this. IOW it looks like while the FD will be preserved, any TLS session
running on it will fail.

I'm going to presume that you're probably just considering the TLS features
out of scope for your patch series.  It would be useful if you have any
info about this and other things you've considered out of scope for this
patch series.

I'm not seeing anything in the block layer about preserving open FDs, so
I presume you're just letting the block layer close and then re-open any
FDs it has?  This would have the side effect that any locks held on the
FDs are lost, so there's a potential race condition where another process
could acquire the lock and prevent the re-exec completing. That said this
is unavoidable, because the Linux kernel is completely broken wrt keeping
fcntl() locks held across a re-exec, always throwing away the locks if
more than 1 thread is running [1].
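
The window is easy to reproduce with plain fcntl() record locks, which are
owned per-process: once the first process's lock is gone, nothing stops a
bystander from taking it.  A minimal, non-QEMU illustration:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Try to take a non-blocking write lock on the whole file. */
static int try_wrlock(int fd)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;   /* l_start = l_len = 0: whole file */
    return fcntl(fd, F_SETLK, &fl);
}
```

While one process holds the lock, a second process's try_wrlock() fails
with EAGAIN; the moment the holder's descriptors are closed — or the
kernel discards the locks across exec, per the bug above — the same call
succeeds, and that is the race.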

Regards,
Daniel

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1552621
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|