On 8/17/2023 2:19 PM, Peter Xu wrote: > On Wed, Aug 16, 2023 at 01:48:13PM -0400, Steven Sistare wrote: >> On 8/14/2023 3:37 PM, Peter Xu wrote: >>> On Mon, Aug 14, 2023 at 02:53:56PM -0400, Steven Sistare wrote: >>>>> Can we just call vm_state_notify() earlier? >>>> >>>> We cannot. The guest is not running yet, and will not be until later. >>>> We cannot call notifiers that perform actions that complete, or react to, >>>> the guest entering a running state. >>> >>> I tried to look at a few examples of the notifees and most of them I read >>> do not react to "vcpu running" but "vm running" (in which case I think >>> "suspended" mode falls into "vm running" case); most of them won't care on >>> the RunState parameter passed in, but only the bool "running". >>> >>> In reality, when running=true, it must be RUNNING so far. >>> >>> In that case does it mean we should notify right after the switchover, >>> since after migration the vm is indeed running only if the vcpus are not >>> during suspend? >> >> I cannot parse your question, but maybe this answers it. >> If the outgoing VM is running and not suspended, then the incoming side >> tests for autostart==true and calls vm_start, which calls the notifiers, >> right after the switchover. > > I meant IMHO SUSPENDED should be seen as "vm running" case to me, just like > RUNNING. Then, we should invoke vm_prepare_start(), just need some touch > ups. > >> >>> One example (of possible issue) is vfio_vmstate_change(), where iiuc if we >>> try to suspend a VM it should keep to be VFIO_DEVICE_STATE_RUNNING for that >>> device; this kind of prove to me that SUSPEND is actually one of >>> running=true states. >>> >>> If we postpone all notifiers here even after we switched over to dest qemu >>> to the next upcoming suspend wakeup, I think it means these devices will >>> not be in VFIO_DEVICE_STATE_RUNNING after switchover but perhaps >>> VFIO_DEVICE_STATE_STOP. >> >> or VFIO_DEVICE_STATE_RESUMING, which is set in vfio_load_setup. >> AFAIK it is OK to remain in that state until wakeup is called later. > > So let me provide another clue of why I think we should call > vm_prepare_start().. > > Firstly, I think RESUME event should always be there right after we > switched over, no matter suspeneded or not. I just noticed that your test > case would work because you put "wakeup" to be before RESUME. I'm not sure > whether that's correct. I'd bet people could rely on that RESUME to > identify the switchover.
Yes, possibly. > More importantly, I'm wondering whether RTC should still be running during > the suspended mode? Sorry again if my knowledge over there is just > limited, so correct me otherwise - but my understanding is during suspend > mode (s1 or s3, frankly I can't tell which one this belongs..), rtc should > still be running along with the system clock. It means we _should_ at > least call cpu_enable_ticks() to enable rtc: Agreed. The comment above cpu_get_ticks says: * return the time elapsed in VM between vm_start and vm_stop Suspending a guest does not call vm_stop, so ticks keeps running. I also verified that experimentally. > /* > * enable cpu_get_ticks() > * Caller must hold BQL which serves as mutex for vm_clock_seqlock. > */ > void cpu_enable_ticks(void) > > I think that'll enable cpu_get_tsc() and make it start to work right. > >> >>> Ideally I think we should here call vm_state_notify() with running=true and >>> state=SUSPEND, but since I do see some hooks are not well prepared for >>> SUSPEND over running=true, I'd think we should on the safe side call >>> vm_state_notify(running=true, state=RUNNING) even for SUSPEND at switch >>> over phase. With that IIUC it'll naturally work (e.g. when wakeup again >>> later we just need to call no notifiers). >> >> Notifiers are just one piece, all the code in vm_prepare_start must be >> called. >> Is it correct to call all of that long before we actually resume the CPUs in >> wakeup? I don't know, but what is the point? > > The point is not only for cleaness (again, I really, really don't like that > new global.. sorry), but also now I think we should make the vm running. I do believe it is safe to call vm_prepare_start immediately, since the vcpus are never running when it is called. >> The wakeup code still needs >> modification to conditionally resume the vcpus. The scheme would be roughly: >> >> loadvm_postcopy_handle_run_bh() >> runstat = global_state_get_runstate(); >> if (runstate == RUN_STATE_RUNNING) { >> vm_start() >> } else if (runstate == RUN_STATE_SUSPENDED) >> vm_prepare_start(); // the start of vm_start() >> } >> >> qemu_system_wakeup_request() >> if (some condition) >> resume_all_vcpus(); // the remainder of vm_start() >> else >> runstate_set(RUN_STATE_RUNNING) > > No it doesn't. wakeup_reason is set there, main loop does the resuming. > See: > > if (qemu_wakeup_requested()) { > pause_all_vcpus(); > qemu_system_wakeup(); > notifier_list_notify(&wakeup_notifiers, &wakeup_reason); > wakeup_reason = QEMU_WAKEUP_REASON_NONE; > resume_all_vcpus(); > qapi_event_send_wakeup(); > } Agreed, we can rely on that main loop code to call resume_all_vcpus, and not modify qemu_system_wakeup_request. That is cleaner. >> How is that better than my patches >> [PATCH V3 01/10] vl: start on wakeup request >> [PATCH V3 02/10] migration: preserve suspended runstate >> >> loadvm_postcopy_handle_run_bh() >> runstate = global_state_get_runstate(); >> if (runstate == RUN_STATE_RUNNING) >> vm_start() >> else >> runstate_set(runstate); // eg RUN_STATE_SUSPENDED >> >> qemu_system_wakeup_request() >> if (!vm_started) >> vm_start(); >> else >> runstate_set(RUN_STATE_RUNNING); >> >> Recall this thread started with your comment "It then can avoid touching the >> system wakeup code which seems cleaner". We still need to touch the wakeup >> code. > > Let me provide some code and reply to your new patchset inlines. Thank you, I have more comments there. - Steve