>>>> Avoid starting a new migration task while the previous one still >>> exist. >>> >>> Can you explain how to reproduce the problem? >>> >> When network disconnection between source and destination happened, >> the migration thread stuck at below stack, >> Then I cancel the migration task, the migration state in qemu will be set to >> MIG_STATE_CANCELLED, so the migration job in libvirt quits. >> Then I perform migration again, at this time, the network reconnected >> successfully, since the TCP timeout retransmission, above stack will not >> return immediately, so two migration tasks exist at the same time. >> And still worse, source qemu will crash, because of accessing the NULL >> pointer in qemu_bh_schedule(s->cleanup_bh); statement in latter migration >> task, since the "s->cleanup_bh" had been deleted by previous migration task. > >Thanks for explaining. CANCELLING looks like a useful addition. > >Why do you need both CANCELLING and COMPLETING? The COMPLETED state should be >set only after all I/O is done. > There is a period of time from the timing of setting COMPLETED state to that of migration task exits, so it's problematic in COMPLETED->CANCELLED transition, but if applying your below proposal, the problem gone. do { old_state = s->state; if (old_state != MIG_STATE_SETUP && old_state != MIG_STATE_ACTIVE) { break; } migrate_set_state(s, old_state, MIG_STATE_CANCELLED); } while (s->state != MIG_STATE_CANCELLED);
>I agree with Eric that the CANCELLING state should not be exposed via QMP. >"info migrate" and "query-migrate" can keep showing "active" for maximum >backwards compatibility. > >More comments below. > > >> - if (s->state != MIG_STATE_COMPLETED) { >> + if (s->state != MIG_STATE_COMPLETING) { >> qemu_savevm_state_cancel(); >> + if (s->state == MIG_STATE_CANCELLING) { >> + migrate_set_state(s, MIG_STATE_CANCELLING, >> MIG_STATE_CANCELLED); >> + } > >I think you can remove the "if" and unconditionally call migrate_set_state. Do you mean to remove the "if (s->state == MIG_STATE_CANCELLING)" ? The s->state probably is MIG_STATE_ERROR here, is it okay to unconditionally call migrate_set_state? Thanks, Zhang Haoyu > >> + }else { >> + migrate_set_state(s, MIG_STATE_COMPLETING, >> + MIG_STATE_COMPLETED); >> } >> >> notifier_list_notify(&migration_state_notifiers, s); } >> >> -static void migrate_set_state(MigrationState *s, int old_state, int >> new_state) -{ >> - if (atomic_cmpxchg(&s->state, old_state, new_state) == new_state) { >> - trace_migrate_set_state(new_state); >> - } >> -} >> - >> void migrate_fd_error(MigrationState *s) { >> DPRINTF("setting error state\n"); @@ -328,7 +337,7 @@ static void >> migrate_fd_cancel(MigrationState *s) { >> DPRINTF("cancelling migration\n"); >> >> - migrate_set_state(s, s->state, MIG_STATE_CANCELLED); >> + migrate_set_state(s, s->state, MIG_STATE_CANCELLING); > >Here probably we want something like > > do { > old_state = s->state; > if (old_state != MIG_STATE_SETUP && old_state != MIG_STATE_ACTIVE) { > break; > } > migrate_set_state(s, old_state, MIG_STATE_CANCELLING); > } while (s->state != MIG_STATE_CANCELLING); > >to avoid a bogus COMPLETED->CANCELLED transition. Please separate the patch >in two parts: > >(1) the first uses the above code, with CANCELLED instead of CANCELLING > >(2) the second, similar to the one you have posted, introduces the new >CANCELLING state > >Thanks, > >Paolo