On Tue, Sep 16, 2014 at 02:10:39PM +0200, Paolo Bonzini wrote:
> On 16/09/2014 14:02, Alexey Kardashevskiy wrote:
> > I am having problems when migrate a guest via libvirt like this:
> >
> > virsh migrate --live --persistent --undefinesource --copy-storage-all
> > --verbose --desturi qemu+ssh://legkvm/system --domain chig1
> >
> > The XML used to create the guest is at the end of this mail.
> >
> > I see NBD FLUSH command after the destination QEMU received EOF for
> > migration stream and this produces a crash in qcow2_co_flush_to_os() as
> > s->lock is false or s->l2_table_cache is NULL.
>
> Max, Kevin, could the fix be something like this?
>
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 0daf25c..e7459ea 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -1442,6 +1442,7 @@ static void qcow2_invalidate_cache(BlockDriverState *bs, Error **errp)
>          memcpy(&aes_decrypt_key, &s->aes_decrypt_key, sizeof(aes_decrypt_key));
>      }
>  
> +    qemu_co_mutex_lock(&s->lock);
>      qcow2_close(bs);
>  
>      bdrv_invalidate_cache(bs->file, &local_err);
> @@ -1455,6 +1456,7 @@ static void qcow2_invalidate_cache(BlockDriverState *bs, Error **errp)
>  
>      ret = qcow2_open(bs, options, flags, &local_err);
>      QDECREF(options);
> +    qemu_co_mutex_unlock(&s->lock);
>      if (local_err) {
>          error_setg(errp, "Could not reopen qcow2 layer: %s",
>                     error_get_pretty(local_err));
>
> On top of this, *_invalidate_cache needs to be marked as coroutine_fn.
I think the fundamental problem here is that the mirror block job on the
source host does not synchronize with live migration.

Remember that the mirror block job iterates over the dirty bitmap whenever
it likes.

There is no guarantee that the mirror block job has quiesced before
migration handover takes place, right?

IMO that's the fundamental problem, and trying to protect
qcow2_invalidate_cache() seems wrong. We must make sure that the mirror
block job on the source has quiesced before we hand over, otherwise the
destination could see an outdated copy of the disk!

Please let me know if I missed something which makes the operation safe
on the source, but I didn't spot any guards.

Stefan
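To make the race described above concrete, here is a toy C model of a mirror job driven by a dirty bitmap. It is a hypothetical sketch, not QEMU code: struct toy_mirror, guest_write(), mirror_iterate(), mirror_drain() and handover_safe() are invented for illustration. It assumes the guest is stopped once draining starts and ignores in-flight requests, which a real quiesce would also have to wait for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model, not QEMU code: a tiny source disk, its
 * destination copy, and a dirty bitmap of blocks written on the source
 * but not yet copied to the destination. */
#define NBLOCKS 8

struct toy_mirror {
    int src[NBLOCKS];     /* source disk contents */
    int dst[NBLOCKS];     /* destination copy (the NBD target) */
    bool dirty[NBLOCKS];  /* true: dst may be stale for this block */
};

/* Guest write on the source: update the block and mark it dirty. */
static void guest_write(struct toy_mirror *m, size_t blk, int val)
{
    m->src[blk] = val;
    m->dirty[blk] = true;
}

/* One mirror iteration: copy at most max_copies dirty blocks, as an
 * asynchronous job might do each time it happens to get scheduled.
 * Returns the number of blocks copied. */
static size_t mirror_iterate(struct toy_mirror *m, size_t max_copies)
{
    size_t copied = 0;
    for (size_t i = 0; i < NBLOCKS && copied < max_copies; i++) {
        if (m->dirty[i]) {
            m->dst[i] = m->src[i];
            m->dirty[i] = false;
            copied++;
        }
    }
    return copied;
}

/* Number of blocks still waiting to be mirrored. */
static size_t dirty_count(const struct toy_mirror *m)
{
    size_t n = 0;
    for (size_t i = 0; i < NBLOCKS; i++) {
        n += m->dirty[i];
    }
    return n;
}

/* Quiesce before handover: with the guest stopped (no further
 * guest_write() calls), iterate until the dirty bitmap is empty. */
static void mirror_drain(struct toy_mirror *m)
{
    while (dirty_count(m) > 0) {
        mirror_iterate(m, 1);
    }
}

/* Handover is only safe once the destination matches the source. */
static bool handover_safe(const struct toy_mirror *m)
{
    return memcmp(m->src, m->dst, sizeof(m->src)) == 0;
}
```

For example, after two guest writes and a single one-block iteration, handover_safe() returns false because one dirty block has not reached the destination; after mirror_drain() it returns true. Copying one block per call stands in for the job running "whenever it likes"; the point is that only an explicit drain step before handover guarantees the bitmap is empty.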