Re: How to improve downtime of Live-Migration caused by bdrv_drain_all()

2020-01-02 Thread Felipe Franciosi


> On Jan 2, 2020, at 3:07 PM, Stefan Hajnoczi  wrote:
> 
> On Thu, Dec 26, 2019 at 05:40:22PM +0800, 张海斌 wrote:
>> Stefan Hajnoczi  wrote on Fri, Mar 29, 2019 at 1:08 AM:
>>> 
>>> On Thu, Mar 28, 2019 at 05:53:34PM +0800, 张海斌 wrote:
 Hi Stefan,
 
 I have faced the same problem you wrote in
 https://lists.gnu.org/archive/html/qemu-devel/2016-08/msg04025.html
 
 Reproduce as follows:
 1. Clone qemu code from https://git.qemu.org/git/qemu.git, add some
 debug information and compile
 2. Start a new VM
 3. In VM, use fio randwrite to add pressure for disk
 4. Live migrate
 
 The log is as follows:
 [2019-03-28 15:10:40.206] /data/qemu/cpus.c:1086: enter do_vm_stop
 [2019-03-28 15:10:40.212] /data/qemu/cpus.c:1097: call bdrv_drain_all
 [2019-03-28 15:10:40.989] /data/qemu/cpus.c:1099: call replay_disable_events
 [2019-03-28 15:10:40.989] /data/qemu/cpus.c:1101: call bdrv_flush_all
 [2019-03-28 15:10:41.004] /data/qemu/cpus.c:1104: done do_vm_stop
 
 Calling bdrv_drain_all() costs 792 milliseconds.
 I just added a bdrv_drain_all() at the start of do_vm_stop(), before
 pause_all_vcpus(), but it doesn't work.
 Is there any way to improve the live-migration downtime caused by
 bdrv_drain_all()?
> 
> I believe there were ideas about throttling storage controller devices
> during the later phases of live migration to reduce the number of
> pending I/Os.
> 
> In other words, if QEMU's virtio-blk/scsi emulation code reduces the
> queue depth as live migration nears the handover point, bdrv_drain_all()
> should become cheaper because fewer I/O requests will be in-flight.
> 
> A simple solution would reduce the queue depth during live migration
> (e.g. queue depth 1).  A smart solution would look at I/O request
> latency to decide what queue depth is acceptable.  For example, if
> requests are taking 4 ms to complete then we might allow 2 or 3 requests
> to achieve a ~10 ms bdrv_drain_all() downtime target.
> 
> As far as I know this has not been implemented.
> 
> Do you want to try implementing this?
> 
> Stefan

It is a really hard problem to solve. Ultimately, if guarantees are
needed about the blackout period, I don't see any viable solution
other than aborting all pending storage commands.

Starting with a "go to QD=1 mode" approach is probably sensible.
Vhost-based backends could even do that off the "you need to log"
message, given that these are only used during migration.
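
For illustration only, a rough sketch of what such a "go to QD=1 mode"
clamp could look like in the device emulation path (the names below are
hypothetical; this is not QEMU's actual virtio-blk code):

#include <stdbool.h>

#define MIG_CONVERGE_QUEUE_DEPTH 1

typedef struct BlkQueueState {
    unsigned inflight;          /* requests submitted but not yet completed */
    bool migration_converging;  /* set as migration nears the handover point */
} BlkQueueState;

/* Consulted before popping another request from the virtqueue. */
static bool blk_may_pop_request(BlkQueueState *q)
{
    if (q->migration_converging &&
        q->inflight >= MIG_CONVERGE_QUEUE_DEPTH) {
        return false;           /* leave the request queued for now */
    }
    return true;
}

static void blk_request_submitted(BlkQueueState *q)
{
    q->inflight++;
}

static void blk_request_completed(BlkQueueState *q)
{
    q->inflight--;
    /* a slot freed up; re-kick virtqueue processing here */
}

The point is just that the limit is checked before new requests are taken
off the ring, so bdrv_drain_all() only ever has to wait for at most
MIG_CONVERGE_QUEUE_DEPTH requests per queue.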

Having a "you are taking too long, abort everything" command might be
something worth looking into, especially if we can *safely* replay them
on the other side. (That may be backend-dependent.)

F.

Re: How to improve downtime of Live-Migration caused by bdrv_drain_all()

2020-01-02 Thread Stefan Hajnoczi
On Thu, Dec 26, 2019 at 05:40:22PM +0800, 张海斌 wrote:
> Stefan Hajnoczi  wrote on Fri, Mar 29, 2019 at 1:08 AM:
> >
> > On Thu, Mar 28, 2019 at 05:53:34PM +0800, 张海斌 wrote:
> > > Hi Stefan,
> > >
> > > I have faced the same problem you wrote in
> > > https://lists.gnu.org/archive/html/qemu-devel/2016-08/msg04025.html
> > >
> > > Reproduce as follows:
> > > 1. Clone qemu code from https://git.qemu.org/git/qemu.git, add some
> > > debug information and compile
> > > 2. Start a new VM
> > > 3. In VM, use fio randwrite to add pressure for disk
> > > 4. Live migrate
> > >
> > > The log is as follows:
> > > [2019-03-28 15:10:40.206] /data/qemu/cpus.c:1086: enter do_vm_stop
> > > [2019-03-28 15:10:40.212] /data/qemu/cpus.c:1097: call bdrv_drain_all
> > > [2019-03-28 15:10:40.989] /data/qemu/cpus.c:1099: call replay_disable_events
> > > [2019-03-28 15:10:40.989] /data/qemu/cpus.c:1101: call bdrv_flush_all
> > > [2019-03-28 15:10:41.004] /data/qemu/cpus.c:1104: done do_vm_stop
> > >
> > > Calling bdrv_drain_all() costs 792 milliseconds.
> > > I just added a bdrv_drain_all() at the start of do_vm_stop(), before
> > > pause_all_vcpus(), but it doesn't work.
> > > Is there any way to improve the live-migration downtime caused by
> > > bdrv_drain_all()?

I believe there were ideas about throttling storage controller devices
during the later phases of live migration to reduce the number of
pending I/Os.

In other words, if QEMU's virtio-blk/scsi emulation code reduces the
queue depth as live migration nears the handover point, bdrv_drain_all()
should become cheaper because fewer I/O requests will be in-flight.

A simple solution would reduce the queue depth during live migration
(e.g. queue depth 1).  A smart solution would look at I/O request
latency to decide what queue depth is acceptable.  For example, if
requests are taking 4 ms to complete then we might allow 2 or 3 requests
to achieve a ~10 ms bdrv_drain_all() downtime target.
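
To illustrate the "smart" variant, a minimal sketch of the arithmetic
(hypothetical names, not an existing QEMU interface) could be:

#include <stdint.h>

/* EWMA of recent request completion latency, in nanoseconds. */
static uint64_t avg_req_latency_ns;

/* Fold a completed request's latency into the running average (1/8 weight). */
static void mig_throttle_record_latency(uint64_t latency_ns)
{
    avg_req_latency_ns = (avg_req_latency_ns * 7 + latency_ns) / 8;
}

/* Queue depth expected to keep drain time under the downtime target. */
static unsigned mig_throttle_queue_depth(uint64_t downtime_target_ns)
{
    unsigned depth;

    if (avg_req_latency_ns == 0) {
        return 1;               /* no samples yet: be conservative */
    }
    depth = downtime_target_ns / avg_req_latency_ns;
    return depth ? depth : 1;   /* never stall the queue completely */
}

With a 10 ms target and requests averaging 4 ms this yields a depth of 2,
matching the example above.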

As far as I know this has not been implemented.

Do you want to try implementing this?

Stefan




Re: [PATCH 3/3] iotests: Test external snapshot with VM state

2020-01-02 Thread Dr. David Alan Gilbert
* Kevin Wolf (kw...@redhat.com) wrote:
> Am 19.12.2019 um 15:26 hat Max Reitz geschrieben:
> > On 17.12.19 15:59, Kevin Wolf wrote:
> > > This tests creating an external snapshot with VM state (which results in
> > > an active overlay over an inactive backing file, which is also the root
> > > node of an inactive BlockBackend), re-activating the images and
> > > performing some operations to test that the re-activation worked as
> > > intended.
> > > 
> > > Signed-off-by: Kevin Wolf 
> > 
> > [...]
> > 
> > > diff --git a/tests/qemu-iotests/280.out b/tests/qemu-iotests/280.out
> > > new file mode 100644
> > > index 00..5d382faaa8
> > > --- /dev/null
> > > +++ b/tests/qemu-iotests/280.out
> > > @@ -0,0 +1,50 @@
> > > +Formatting 'TEST_DIR/PID-base', fmt=qcow2 size=67108864 
> > > cluster_size=65536 lazy_refcounts=off refcount_bits=16
> > > +
> > > +=== Launch VM ===
> > > +Enabling migration QMP events on VM...
> > > +{"return": {}}
> > > +
> > > +=== Migrate to file ===
> > > +{"execute": "migrate", "arguments": {"uri": "exec:cat > /dev/null"}}
> > > +{"return": {}}
> > > +{"data": {"status": "setup"}, "event": "MIGRATION", "timestamp": 
> > > {"microseconds": "USECS", "seconds": "SECS"}}
> > > +{"data": {"status": "active"}, "event": "MIGRATION", "timestamp": 
> > > {"microseconds": "USECS", "seconds": "SECS"}}
> > > +{"data": {"status": "completed"}, "event": "MIGRATION", "timestamp": 
> > > {"microseconds": "USECS", "seconds": "SECS"}}
> > > +
> > > +VM is now stopped:
> > > +completed
> > > +{"execute": "query-status", "arguments": {}}
> > > +{"return": {"running": false, "singlestep": false, "status": 
> > > "postmigrate"}}
> > 
> > Hmmm, I get a finish-migrate status here (on tmpfs)...
> 
> Dave, is it intentional that the "completed" migration event is emitted
> while we are still in finish-migration rather than postmigrate?

Yes, it looks like it; it's the migration state machine hitting
COMPLETED that then _causes_ the runstate transition to POSTMIGRATE.

static void migration_iteration_finish(MigrationState *s)
{
    /* If we enabled cpu throttling for auto-converge, turn it off. */
    cpu_throttle_stop();

    qemu_mutex_lock_iothread();
    switch (s->state) {
    case MIGRATION_STATUS_COMPLETED:
        migration_calculate_complete(s);
        runstate_set(RUN_STATE_POSTMIGRATE);
        break;

Then there are a bunch of error cases: if the migration ended up in
FAILED/CANCELLED etc., we either restart the VM or also go to
POSTMIGRATE.

> I guess we could change wait_migration() in qemu-iotests to wait for the
> postmigrate state rather than the "completed" event, but maybe it would
> be better to change the migration code to avoid similar races in other
> QMP clients.

Given that the migration state machine is driving the runstate state
machine, I think the current order makes sense internally (although I
don't think that order is documented or tested, which we might want to
fix).

Looking at 234 and 262, it looks like you're calling wait_migration on
both the source and dest; I don't think the dest will see the
POSTMIGRATE.  Also note that depending what you're trying to do, with
postcopy you'll be running on the destination before you see COMPLETED.

Waiting for the destination to leave 'inmigrate' state is probably
the best strategy; then wait for the source to be in postmigrate.
You can cause early exits if you see transitions to 'FAILED' - but
actually the destination will likely quit in that case; so it should
be much rarer for you to hit a timeout on a failed migration.

Dave


> Kevin


--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH 00/18] block: Allow exporting BDSs via FUSE

2020-01-02 Thread Stefan Hajnoczi
On Fri, Dec 20, 2019 at 11:30:33AM +0100, Max Reitz wrote:
> On 20.12.19 11:08, Stefan Hajnoczi wrote:
> > On Thu, Dec 19, 2019 at 03:38:00PM +0100, Max Reitz wrote:
> > Please send a follow-up patch that adds a qemu(1) -blockdev
> > 'Driver-specific options for "fuse"' documentation section.
> 
> What exactly do you mean?  This is not a block driver, so it doesn’t
> work as part of -blockdev.  Currently, it can only be used through QMP
> (fuse-export-add/fuse-export-remove).

I don't know what I was thinking :).

Stefan

