On Tue, Sep 27, 2016 at 10:27:12AM +0100, Stefan Hajnoczi wrote:
> On Mon, Aug 29, 2016 at 06:56:42PM +0000, Felipe Franciosi wrote:
> > Heya!
> >
> > > On 29 Aug 2016, at 08:06, Stefan Hajnoczi <stefa...@gmail.com> wrote:
> > >
> > > At KVM Forum an interesting idea was proposed to avoid bdrv_drain_all() during live migration. Mike Cui and Felipe Franciosi mentioned running at queue depth 1. It needs more thought to make it workable but I want to capture it here for discussion and to archive it.
> > >
> > > bdrv_drain_all() is synchronous and can cause VM downtime if I/O requests hang. We should find a better way of quiescing I/O that is not synchronous. Up until now I thought we should simply add a timeout to bdrv_drain_all() so it can at least fail (and live migration would fail) if I/O is stuck instead of hanging the VM. But the following approach is also interesting...
> > >
> > > During the iteration phase of live migration we could limit the queue depth so points with no I/O requests in-flight are identified. At these points the migration algorithm has the opportunity to move to the next phase without requiring bdrv_drain_all() since no requests are pending.
> >
> > I actually think that this "io quiesced state" is highly unlikely to _just_ happen on a busy guest. The main idea behind running at QD1 is to naturally throttle the guest and make it easier to "force quiesce" the VQs.
> >
> > In other words, if the guest is busy and we run at QD1, I would expect the rings to be quite full of pending (i.e. unprocessed) requests. At the same time, I would expect that a call to bdrv_drain_all() (as part of do_vm_stop()) should complete much quicker.
> >
> > Nevertheless, you mentioned that this is still problematic as that single outstanding IO could block, leaving the VM paused for longer.
> >
> > My suggestion is therefore that we leave the vCPUs running, but stop picking up requests from the VQs. Provided nothing blocks, you should reach the "io quiesced state" fairly quickly. If you don't, then the VM is at least still running (despite seeing no progress on its VQs).
> >
> > Thoughts on that?
>
> If the guest experiences a hung disk it may enter error recovery. QEMU should avoid this so the guest doesn't remount file systems read-only.
>
> This can be solved by only quiescing the disk for, say, 30 seconds at a time. If we don't reach a point where live migration can proceed during those 30 seconds then the disk will service requests again temporarily to avoid upsetting the guest.
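
For illustration, a minimal sketch of the timed-quiesce idea quoted above. The stop_dequeuing_requests()/resume_dequeuing_requests() hooks and the in_flight counter are hypothetical placeholders standing in for the virtio-blk request path, not existing QEMU APIs; the 30 second limit is simply the value suggested above.

#include <stdbool.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical hooks into the request path -- placeholders, not QEMU APIs. */
static unsigned in_flight;                   /* requests popped from the VQs but not yet completed */
static void stop_dequeuing_requests(void)   { /* leave the vCPUs running, stop popping the VQs */ }
static void resume_dequeuing_requests(void) { /* start servicing the VQs again */ }

#define QUIESCE_LIMIT_SECS 30  /* keep the window short so the guest never sees a hung disk */

/*
 * Try to reach an "I/O quiesced" point without a synchronous
 * bdrv_drain_all().  Returns true if no requests are in flight and
 * migration can move to the next phase, false if we gave up and
 * resumed servicing the virtqueues.
 */
static bool try_timed_quiesce(void)
{
    time_t deadline = time(NULL) + QUIESCE_LIMIT_SECS;

    stop_dequeuing_requests();

    while (time(NULL) < deadline) {
        if (in_flight == 0) {
            return true;
        }
        usleep(10 * 1000);  /* poll every 10ms; a real version would hook the event loop instead */
    }

    /* Did not quiesce in time: service requests again to avoid guest error recovery. */
    resume_dequeuing_requests();
    return false;
}

In a real implementation the polling would presumably be driven from the AioContext rather than a sleep loop, and the quiesce attempt could be retried on a later migration iteration.
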
What is the actual trigger for guest error recovery? If you have a situation where bdrv_drain_all() could hang, then surely even if you start processing requests again after 30 seconds, you might not actually be able to complete those requests for a long time, since the drain still has outstanding work blocking the new requests you just accepted from the guest?

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|