On 8/14/2023 3:37 PM, Peter Xu wrote: > On Mon, Aug 14, 2023 at 02:53:56PM -0400, Steven Sistare wrote: >>> Can we just call vm_state_notify() earlier? >> >> We cannot. The guest is not running yet, and will not be until later. >> We cannot call notifiers that perform actions that complete, or react to, >> the guest entering a running state. > > I tried to look at a few examples of the notifees and most of them I read > do not react to "vcpu running" but "vm running" (in which case I think > "suspended" mode falls into "vm running" case); most of them won't care on > the RunState parameter passed in, but only the bool "running". > > In reality, when running=true, it must be RUNNING so far. > > In that case does it mean we should notify right after the switchover, > since after migration the vm is indeed running only if the vcpus are not > during suspend?
I cannot parse your question, but maybe this answers it. If the outgoing VM is running and not suspended, then the incoming side tests for autostart==true and calls vm_start, which calls the notifiers, right after the switchover. > One example (of possible issue) is vfio_vmstate_change(), where iiuc if we > try to suspend a VM it should keep to be VFIO_DEVICE_STATE_RUNNING for that > device; this kind of prove to me that SUSPEND is actually one of > running=true states. > > If we postpone all notifiers here even after we switched over to dest qemu > to the next upcoming suspend wakeup, I think it means these devices will > not be in VFIO_DEVICE_STATE_RUNNING after switchover but perhaps > VFIO_DEVICE_STATE_STOP. or VFIO_DEVICE_STATE_RESUMING, which is set in vfio_load_setup. AFAIK it is OK to remain in that state until wakeup is called later. > Ideally I think we should here call vm_state_notify() with running=true and > state=SUSPEND, but since I do see some hooks are not well prepared for > SUSPEND over running=true, I'd think we should on the safe side call > vm_state_notify(running=true, state=RUNNING) even for SUSPEND at switch > over phase. With that IIUC it'll naturally work (e.g. when wakeup again > later we just need to call no notifiers). Notifiers are just one piece, all the code in vm_prepare_start must be called. Is it correct to call all of that long before we actually resume the CPUs in wakeup? I don't know, but what is the point? The wakeup code still needs modification to conditionally resume the vcpus. The scheme would be roughly: loadvm_postcopy_handle_run_bh() runstat = global_state_get_runstate(); if (runstate == RUN_STATE_RUNNING) { vm_start() } else if (runstate == RUN_STATE_SUSPENDED) vm_prepare_start(); // the start of vm_start() } qemu_system_wakeup_request() if (some condition) resume_all_vcpus(); // the remainder of vm_start() else runstate_set(RUN_STATE_RUNNING) How is that better than my patches [PATCH V3 01/10] vl: start on wakeup request [PATCH V3 02/10] migration: preserve suspended runstate loadvm_postcopy_handle_run_bh() runstate = global_state_get_runstate(); if (runstate == RUN_STATE_RUNNING) vm_start() else runstate_set(runstate); // eg RUN_STATE_SUSPENDED qemu_system_wakeup_request() if (!vm_started) vm_start(); else runstate_set(RUN_STATE_RUNNING); Recall this thread started with your comment "It then can avoid touching the system wakeup code which seems cleaner". We still need to touch the wakeup code. - Steve