On Wed, Sep 17, 2014 at 10:25 AM, Paolo Bonzini <pbonz...@redhat.com> wrote:
> On 17/09/2014 11:06, Stefan Hajnoczi wrote:
>> I think the fundamental problem here is that the mirror block job
>> on the source host does not synchronize with live migration.
>>
>> Remember the mirror block job iterates on the dirty bitmap
>> whenever it feels like.
>>
>> There is no guarantee that the mirror block job has quiesced before
>> migration handover takes place, right?
>
> Libvirt does that.  Migration is started only once storage mirroring
> is out of the bulk phase, and the handover looks like:
>
> 1) migration completes
>
> 2) because the source VM is stopped, the disk has quiesced on the source

But the mirror block job might still be writing out dirty blocks.

> 3) libvirt sends block-job-complete

No, it sends block-job-cancel after the source QEMU's migration has
completed.  See the qemuMigrationCancelDriveMirror() call in
src/qemu/qemu_migration.c:qemuMigrationRun().

> 4) libvirt receives BLOCK_JOB_COMPLETED.  The disk has now quiesced on
> the destination as well.

I don't see where this happens in the libvirt source code.  Libvirt does
not watch for block job events from drive-mirror during migration, and
that is why there can still be I/O going on: block-job-cancel is
asynchronous.
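To illustrate the synchronization I'm missing, here is a rough Python
sketch against the QMP monitor socket.  Completely untested; the socket
path and device name are made up, and a real client would have to queue
events instead of discarding them while waiting for a command's return:

import json, socket

def qmp_open(path):
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(path)
    f = s.makefile()
    json.loads(f.readline())                      # consume the QMP greeting
    def cmd(name, args=None):
        m = {'execute': name}
        if args:
            m['arguments'] = args
        s.sendall(json.dumps(m).encode() + b'\n')
        while True:                               # NB: discards events
            r = json.loads(f.readline())
            if 'return' in r or 'error' in r:
                return r
    def wait(*events):
        while True:
            r = json.loads(f.readline())
            if r.get('event') in events:
                return r
    cmd('qmp_capabilities')
    return cmd, wait

src_cmd, src_wait = qmp_open('/tmp/src-qmp.sock')  # made-up path
src_cmd('block-job-cancel', {'device': 'drive-virtio-disk0'})  # made-up name
# block-job-cancel only *requests* cancellation; the disk has quiesced
# only once the job emits its completion event:
src_wait('BLOCK_JOB_CANCELLED', 'BLOCK_JOB_COMPLETED')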
> 5) the VM is started on the destination
>
> 6) the NBD server is stopped on the destination and the source VM is quit.
>
> It is actually a feature that storage migration is completed
> asynchronously with respect to RAM migration.  The problem is that
> qcow2_invalidate_cache happens between (3) and (5), and it doesn't
> like the concurrent I/O received by the NBD server.

I agree that qcow2_invalidate_cache() (and any other invalidate cache
implementation) needs to allow concurrent I/O requests.

Either I'm misreading the libvirt code or libvirt does not actually
ensure that the block job on the source has cancelled/completed before
the guest is resumed on the destination.  So I think there is still a
bug; maybe Eric can verify this?
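For completeness, the handover from your list would then be sequenced
like this (again untested, reusing qmp_open() from the sketch above,
with made-up socket paths and device name):

import time

src_cmd, src_wait = qmp_open('/tmp/src-qmp.sock')
dst_cmd, _ = qmp_open('/tmp/dst-qmp.sock')

# 1) + 2) wait until RAM migration has completed (source VM now stopped)
while src_cmd('query-migrate')['return'].get('status') != 'completed':
    time.sleep(0.1)

# 3) + 4) cancel the mirror job and wait until it has really stopped
src_cmd('block-job-cancel', {'device': 'drive-virtio-disk0'})
src_wait('BLOCK_JOB_CANCELLED', 'BLOCK_JOB_COMPLETED')

# 5) only now is it safe to resume the guest on the destination
dst_cmd('cont')

# 6) stop the NBD server on the destination and quit the source
dst_cmd('nbd-server-stop')
src_cmd('quit')

Stefan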