On Tue, Feb 13, 2018 at 04:23:21PM +0100, Kevin Wolf wrote:
> Am 13.02.2018 um 15:58 hat Daniel P. Berrangé geschrieben:
> > On Tue, Feb 13, 2018 at 03:43:10PM +0100, Kevin Wolf wrote:
> > > Am 13.02.2018 um 15:30 hat Roman Kagan geschrieben:
> > > > On Tue, Feb 13, 2018 at 11:50:24AM +0100, Kevin Wolf wrote:
> > > > > Am 11.01.2018 um 14:04 hat Daniel P. Berrange geschrieben:
> > > > > > Then you could just use the regular migrate QMP commands for loading
> > > > > > and saving snapshots.
> > > > >
> > > > > Yes, you could. I think for a proper implementation you would want to
> > > > > do
> > > > > better, though. Live migration provides just a stream, but that's not
> > > > > really well suited for snapshots. When a RAM page is dirtied, you just
> > > > > want to overwrite the old version of it in a snapshot [...]
> > > >
> > > > This means the point in time at which the guest state is snapshotted
> > > > is not when the command is issued, but an unpredictable amount of
> > > > time later.
> > > >
> > > > I'm not sure this is what a user expects.
> > >
> > > I don't think it's necessarily a big problem as long as you set the
> > > expectations right, but good point anyway.
> > >
> > > > A better approach for the save part appears to be to stop the vcpus,
> > > > dump the device state, resume the vcpus, and save the memory contents
> > > > in the background, prioritizing the old copies of the pages that
> > > > change.
> > >
> > > So basically you would let the guest fault whenever it writes to a page
> > > that is not saved yet, and then save it first before you make the page
> > > writable again? Essentially blockdev-backup, except for RAM.
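For anyone following along, here is a minimal sketch of that
fault-and-copy idea as I understand it. This is hypothetical standalone
code; a real QEMU implementation would hook its own fault paths (e.g.
userfaultfd write-protect mode) rather than a SIGSEGV handler:

    #include <signal.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static uint8_t *guest_ram;   /* base of guest RAM mapping */
    static size_t ram_size;
    static int snap_fd;          /* snapshot file; pages at fixed offsets */
    static long page_size;

    /* The first write to a still-protected page lands here. */
    static void wp_fault(int sig, siginfo_t *si, void *ctx)
    {
        uint8_t *page = (uint8_t *)((uintptr_t)si->si_addr &
                                    ~(page_size - 1));

        /* Save the pre-modification copy before the write proceeds. */
        pwrite(snap_fd, page, page_size, page - guest_ram);

        /* Unprotect; the faulting instruction is restarted. */
        mprotect(page, page_size, PROT_READ | PROT_WRITE);
    }

    void start_ram_snapshot(void)
    {
        struct sigaction sa = {
            .sa_sigaction = wp_fault,
            .sa_flags = SA_SIGINFO,
        };

        page_size = sysconf(_SC_PAGESIZE);
        sigaction(SIGSEGV, &sa, NULL);

        /* Write-protect all guest RAM so every first write faults;
         * a background thread would meanwhile save still-clean pages
         * and unprotect them as it goes. */
        mprotect(guest_ram, ram_size, PROT_READ);
    }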
> > The page fault servicing will be delayed by however long it takes to
> > write the page to underlying storage, which could be considerable with
> > non-SSD storage. So guest performance could be significantly impacted
> > on slow storage with a high dirtying rate. On the flip side, it
> > guarantees that a live snapshot completes in finite time, which is
> > good.
> You can just use a bounce buffer for writing out the old page. Then the
> VM is only stopped for the duration of a malloc() + memcpy().
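Concretely that would turn the fault handler above into something like
the following (same caveats as before, reusing the same globals plus
<stdlib.h>; note malloc() is not async-signal-safe, so a real
implementation would use userfaultfd and a preallocated pool):

    struct bounce {
        uint8_t *data;
        off_t offset;
        struct bounce *next;
    };

    static struct bounce *bounce_list;  /* drained by a writer thread */

    static void wp_fault_bounced(int sig, siginfo_t *si, void *ctx)
    {
        uint8_t *page = (uint8_t *)((uintptr_t)si->si_addr &
                                    ~(page_size - 1));
        struct bounce *b = malloc(sizeof(*b));

        b->data = malloc(page_size);
        memcpy(b->data, page, page_size);  /* grab the old contents */
        b->offset = page - guest_ram;
        b->next = bounce_list;             /* (needs locking in reality) */
        bounce_list = b;

        /* The vcpu only stalls for the malloc() + memcpy() above. */
        mprotect(page, page_size, PROT_READ | PROT_WRITE);
    }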
That would allow QEMU memory usage to balloon up to 2x guest RAM with
slow storage and a very fast dirtying rate. I don't think that's viable
unless there is a cap on how much bounce buffering we allow before just
blocking the page faults.
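Something along these lines, i.e. account for outstanding bounce memory
and degrade to a synchronous write once over the limit (BOUNCE_CAP and
queue_bounce_copy() are made-up names for illustration):

    #define BOUNCE_CAP (256 * 1024 * 1024)  /* arbitrary example: 256 MiB */

    static size_t bounce_in_flight;  /* writer thread decrements this */

    static void save_old_page(uint8_t *page)
    {
        off_t offset = page - guest_ram;

        if (bounce_in_flight + page_size > BOUNCE_CAP) {
            /* Over the cap: block the faulting vcpu on the real I/O,
             * throttling the dirtying rate to what storage can absorb. */
            pwrite(snap_fd, page, page_size, offset);
        } else {
            bounce_in_flight += page_size;
            queue_bounce_copy(page, offset);  /* malloc+memcpy as above */
        }
    }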
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|