Juraj Marcin <jmar...@redhat.com> writes: > From: Peter Xu <pet...@redhat.com> > > If a rare split brain happens (e.g. dest QEMU started running somehow, > taking shared drive locks), src QEMU may not be able to activate the > drives anymore. In this case, src QEMU shouldn't start the VM or it might > crash the block layer later with something like: > > bdrv_co_write_req_prepare: Assertion `!(bs->open_flags & > BDRV_O_INACTIVE)' failed.
This patch doesn't fix the assert, so I think it's best not to mention it at all. We've had a few instances of this assert (or similar) and there's a fair chance someone still looks at git log searching for a backport. > > Meanwhile, src QEMU cannot try to continue either even if dest QEMU can > release the drive locks (e.g. by QMP "stop"). Because as long as dest QEMU > started running, it means dest QEMU's RAM is the only version that is > consistent with current status of the shared storage. > > Signed-off-by: Peter Xu <pet...@redhat.com> > --- > migration/migration.c | 29 ++++++++++++++++++++++++----- > 1 file changed, 24 insertions(+), 5 deletions(-) > > diff --git a/migration/migration.c b/migration/migration.c > index 10c216d25d..54dac3db88 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -3502,6 +3502,8 @@ static MigIterateState > migration_iteration_run(MigrationState *s) > > static void migration_iteration_finish(MigrationState *s) > { > + Error *local_err = NULL; > + > bql_lock(); > > /* > @@ -3525,11 +3527,28 @@ static void migration_iteration_finish(MigrationState > *s) > case MIGRATION_STATUS_FAILED: > case MIGRATION_STATUS_CANCELLED: > case MIGRATION_STATUS_CANCELLING: > - /* > - * Re-activate the block drives if they're inactivated. Note, COLO > - * shouldn't use block_active at all, so it should be no-op there. > - */ > - migration_block_activate(NULL); > + if (!migration_block_activate(&local_err)) { > + /* > + * Re-activate the block drives if they're inactivated. Comment formatting is wrong. > + * > + * If it fails (e.g. in case of a split brain, where dest QEMU > + * might have taken some of the drive locks and running!), do > + * not start VM, instead wait for mgmt to decide the next step. > + * > + * If dest already started, it means dest QEMU should contain > + * all the data it needs and it properly owns all the drive > + * locks. Then even if src QEMU got a FAILED in migration, it > + * normally should mean we should treat the migration as > + * COMPLETED. > + * > + * NOTE: it's not safe anymore to start VM on src now even if > + * dest would release the drive locks. It's because as long as > + * dest started running then only dest QEMU's RAM is consistent > + * with the shared storage. > + */ > + error_free(local_err); > + break; > + } > if (runstate_is_live(s->vm_old_state)) { > if (!runstate_check(RUN_STATE_SHUTDOWN)) { > vm_start(); Reviewed-by: Fabiano Rosas <faro...@suse.de>