On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote:
> Migration of a guest in the suspended state is broken. The incoming
> migration code automatically tries to wake the guest, which IMO is
> wrong -- the guest should end migration in the same state it started.
> Further, the wakeup is done by calling qemu_system_wakeup_request(), which
> bypasses vm_start(). The guest appears to be in the running state, but
> it is not.
>
> To fix, leave the guest in the suspended state, but call
> qemu_system_start_on_wakeup_request() so the guest is properly resumed
> later, when the client sends a system_wakeup command.
>
> Signed-off-by: Steve Sistare <steven.sist...@oracle.com>
> ---
>  migration/migration.c | 11 ++++-------
>  softmmu/runstate.c    |  1 +
>  2 files changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 17b4b47..851fe6d 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void *opaque)
>          vm_start();
>      } else {
>          runstate_set(global_state_get_runstate());
> +        if (runstate_check(RUN_STATE_SUSPENDED)) {
> +            /* Force vm_start to be called later. */
> +            qemu_system_start_on_wakeup_request();
> +        }

Is this really needed, along with patch 1?

I have very limited knowledge of suspension, so I may well be mistaken, but from my reading, qemu_system_wakeup_request() (the existing one, not the version after patch 1 is applied) only sets up wakeup_reason and kicks the main thread with qemu_notify_event(). IIUC the actual wakeup work, e.g. resuming the vcpus, is then done in the main thread later on, once qemu_wakeup_requested() returns true.
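Just so we are talking about the same flow, here is a tiny standalone model of that request-then-handle-later pattern as I read it. It is only a sketch with made-up names (wakeup_request(), main_loop_iteration(), ...), not the real runstate code:

#include <stdbool.h>
#include <stdio.h>

/* All names below are invented for this sketch; it is not the real code. */

typedef enum { STATE_RUNNING, STATE_SUSPENDED } RunState;
typedef enum { WAKEUP_NONE, WAKEUP_OTHER } WakeupReason;

static RunState run_state = STATE_SUSPENDED;
static WakeupReason pending_wakeup = WAKEUP_NONE;

/* Stand-in for qemu_notify_event(): just poke the main loop. */
static void notify_main_loop(void)
{
    printf("main loop kicked\n");
}

/* Stand-in for the request side: record a reason and kick, nothing more. */
static void wakeup_request(WakeupReason reason)
{
    if (run_state != STATE_SUSPENDED) {
        return;                         /* nothing to wake up */
    }
    pending_wakeup = reason;
    notify_main_loop();
}

/* Stand-in for qemu_wakeup_requested(). */
static bool wakeup_requested(void)
{
    return pending_wakeup != WAKEUP_NONE;
}

/* The part that, in my reading, runs in the main thread later on. */
static void main_loop_iteration(void)
{
    if (wakeup_requested()) {
        run_state = STATE_RUNNING;
        pending_wakeup = WAKEUP_NONE;
        printf("vcpus resumed in the main loop\n");
    }
}

int main(void)
{
    wakeup_request(WAKEUP_OTHER);       /* e.g. a system_wakeup command */
    main_loop_iteration();              /* the deferred part */
    printf("final state: %s\n",
           run_state == STATE_RUNNING ? "running" : "suspended");
    return 0;
}

The only point of the sketch is that the request side does not start anything by itself; whatever actually resumes the guest happens later in the main loop.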
>      }
>      /*
>       * This must happen after any state changes since as soon as an external
> @@ -2101,7 +2105,6 @@ static int postcopy_start(MigrationState *ms)
>      qemu_mutex_lock_iothread();
>      trace_postcopy_start_set_run();
>
> -    qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
>      global_state_store();
>      ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>      if (ret < 0) {
> @@ -2307,7 +2310,6 @@ static void migration_completion(MigrationState *s)
>      if (s->state == MIGRATION_STATUS_ACTIVE) {
>          qemu_mutex_lock_iothread();
>          s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> -        qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
>
>          s->vm_old_state = runstate_get();
>          global_state_store();
> @@ -3102,11 +3104,6 @@ static void *bg_migration_thread(void *opaque)
>
>      qemu_mutex_lock_iothread();
>
> -    /*
> -     * If VM is currently in suspended state, then, to make a valid runstate
> -     * transition in vm_stop_force_state() we need to wakeup it up.
> -     */
> -    qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);

Removing these three calls seems reasonable to me; otherwise we won't preserve the SUSPENDED state. The comment above was the main reason I used to think the wakeup was needed (again, with essentially no knowledge of this area), but perhaps it was simply wrong? I would assume vm_stop_force_state() will still just work with a suspended guest, am I right?

>      s->vm_old_state = runstate_get();
>
>      global_state_store();
> diff --git a/softmmu/runstate.c b/softmmu/runstate.c
> index e127b21..771896c 100644
> --- a/softmmu/runstate.c
> +++ b/softmmu/runstate.c
> @@ -159,6 +159,7 @@ static const RunStateTransition runstate_transitions_def[] = {
>      { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED },
>      { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING },
>      { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
> +    { RUN_STATE_SUSPENDED, RUN_STATE_PAUSED },
>      { RUN_STATE_SUSPENDED, RUN_STATE_PRELAUNCH },
>      { RUN_STATE_SUSPENDED, RUN_STATE_COLO},
>
> --
> 1.8.3.1
>

--
Peter Xu