On Tue, Sep 20, 2016 at 04:20:49PM +0100, Daniel P. Berrange wrote:
> On Tue, Sep 20, 2016 at 11:01:23AM -0400, Sean Dague wrote:
> > Here is my reconstruction of the snapshot issue from what I can remember
> > of the conversation.
> > Nova defaults to live snapshots. This uses the libvirt facility which
> > dumps both memory and disk. And then we throw away the memory. For large
> > memory guests (especially volume backed ones that might have a fast path
> > for the disk) this leads to a lot of overhead for no gain. The
> > workaround got them past it.
> I think you've got it backwards there.
> Nova defaults to *not* using live snapshots:
> Disable live snapshots when using the libvirt driver.
> When live snapshot is disabled like this, the snapshot code is unable
> to guarantee a consistent disk state. So the libvirt nova driver will
> stop the guest by doing a managed save (this saves all memory to
> disk), then does the disk snapshot, then restores the managed save
> (which loads all memory from disk).
> This is terrible for multiple reasons:
> 1. the guest workload stops running while snapshot is taken
> 2. we churn disk I/O saving & loading VM memory
> 3. you can't do it at all if host PCI devices are attached to
> the VM
> Enabling live snapshots by default fixes all these problems, at the
> risk of hitting the live snapshot bug we saw in the gate CI but never
> anywhere else.
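
To make the two code paths described above concrete, here is a rough
Python sketch. It is only a toy model: FakeGuest, cold_snapshot and
live_snapshot are hypothetical names invented for illustration, not Nova
or libvirt APIs, and the "I/O" is just a counter. It only mirrors the
three drawbacks listed above.

```python
class FakeGuest:
    """Hypothetical stand-in for a guest; not a real libvirt object."""

    def __init__(self, memory_mb, has_pci_passthrough=False):
        self.memory_mb = memory_mb
        self.has_pci_passthrough = has_pci_passthrough
        self.running = True
        self.was_stopped = False  # did the workload ever stop?
        self.events = []          # ordered log of snapshot steps
        self.io_mb = 0            # guest memory written to / read from disk


def cold_snapshot(guest):
    """Managed save, disk snapshot, restore (live snapshots disabled)."""
    if guest.has_pci_passthrough:
        # Drawback 3: not possible with host PCI devices attached.
        raise RuntimeError("managed save not possible with host PCI devices")
    guest.running = False
    guest.was_stopped = True          # drawback 1: workload stops
    guest.events.append("managed_save")
    guest.io_mb += guest.memory_mb    # drawback 2: all memory saved to disk
    guest.events.append("disk_snapshot")
    guest.events.append("restore")
    guest.io_mb += guest.memory_mb    # drawback 2: all memory loaded back
    guest.running = True


def live_snapshot(guest):
    """Disk snapshot while the guest keeps running (assumed happy path)."""
    guest.events.append("disk_snapshot")
```

For a guest with 4096 MB of memory, the cold path in this model moves
8192 MB through disk and stops the workload, while the live path does
neither.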
Yes, FWIW, I agree. In addition to the nice details above, enabling
live snapshots also allows you to quiesce file systems for a consistent
disk state, via the Glance image metadata properties
'hw_qemu_guest_agent' and 'os_require_quiesce'. (The current cold
snapshot mechanism doesn't allow this.)
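
As a rough illustration of how those two image properties could drive
the decision, here is a small Python sketch. The helper names
(_truthy, quiesce_plan) and the returned labels are hypothetical, and
the exact semantics are my assumption rather than a copy of the Nova
code: quiesce when the guest agent is enabled, fail the snapshot when
quiescing is required but not possible, otherwise snapshot without
quiescing.

```python
def _truthy(value):
    """Interpret a Glance property string such as 'yes' or 'true'."""
    return str(value).strip().lower() in ("yes", "true", "1", "on")


def quiesce_plan(image_properties):
    """Return 'quiesce', 'fail' or 'skip' for a live snapshot (assumed logic)."""
    agent = _truthy(image_properties.get("hw_qemu_guest_agent", "no"))
    required = _truthy(image_properties.get("os_require_quiesce", "no"))
    if agent:
        # Guest agent available: freeze file systems around the snapshot.
        return "quiesce"
    if required:
        # Image demands a quiesced snapshot but no agent is available.
        return "fail"
    # No agent and no requirement: take the snapshot without quiescing.
    return "skip"
```

For example, an image with both properties set to 'yes' would get a
quiesced snapshot in this model, while an image with neither property
would be snapshotted without freezing its file systems.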
Anyhow, Sean seems to have submitted the change to toggle the config for
https://review.openstack.org/#/c/373430/ -- "Change