On Mon, 2025-09-22 at 10:31 -0700, Dongli Zhang wrote:
> Hi David,
>
> Thank you very much for the quick reply!
>
> On 9/22/25 9:58 AM, David Woodhouse wrote:
> > On Mon, 2025-09-22 at 09:37 -0700, Dongli Zhang wrote:
> > > Hi,
> > >
> > > Would you mind helping to confirm whether kvm-clock/guest_tsc should
> > > stop counting elapsed time during the downtime blackout?
> > >
> > > 1. guest_clock=T1, realtime=R1.
> > > 2. (qemu) stop
> > > 3. Wait for several seconds.
> > > 4. (qemu) cont
> > > 5. guest_clock=T2, realtime=R2.
> > >
> > > Should (T1 == T2), or (R2 - R1 == T2 - T1)?
> >
> > Neither.
> >
> > Realtime is something completely different and runs at a different rate
> > to the monotonic clock. In fact its rate compared to the monotonic
> > clock (and the TSC) is *variable* as NTP guides it.
> >
> > In your example of stopping and continuing on the *same* host, the
> > guest TSC *offset* from the host's TSC should remain the same.
> >
> > And the *precise* mathematical relationship that KVM advertises to the
> > guest as "how to turn a TSC value into nanoseconds since boot" should
> > also remain precisely the same.
>
> Does that mean:
>
> Regarding the "stop/cont" scenario, both the kvm-clock and guest_tsc
> values should remain the same, i.e.,
>
> 1. On "stop", kvm-clock=K1, guest_tsc=T1.
> 2. Suppose many hours pass.
> 3. On "cont", the guest VM should see kvm-clock==K1 and guest_tsc==T1,
> by refreshing both PVTI and tsc_offset in KVM.

Assuming a modern host where the TSC just counts sanely at a consistent
rate and never deviates... No.

The PVTI should basically *never* change. Whatever the estimated (not
NTP-skewed) frequency of the TSC is believed to be, the KVM clock PVTI
should indicate that at boot, telling the guest how to convert a TSC
value into 'monotonic nanoseconds since boot'. If it ever changes, that's
a KVM bug.

It should be saved and restored in precisely its native form, using the
KVM_[GS]ET_CLOCK_GUEST I referenced before, for both live update (same
host) and live migration (different host).

The TSC should also continue to count at exactly the same rate as the
host's TSC at all times. No breaks or discontinuities due to any kind of
'steal time'.

For live update that's easy, as you just apply the same *offset*. For
live migration, you have to accept that it depends on clock
synchronization between your source and destination hosts, which is
probably based on realtime.

>
> As demonstrated in my test, guest_tsc currently doesn't stop counting
> during the blackout because of the lack of "MSR_IA32_TSC put" in
> kvmclock_vm_state_change(). As I understand it, this is a bug and we
> may need to fix it.
>
> BTW, kvmclock_vm_state_change() already uses KVM_SET_CLOCK to
> re-configure kvm-clock before continuing the guest VM.
>
> >
> > KVM already lets you restore the TSC correctly. To restore KVM clock
> > correctly, you want something like KVM_SET_CLOCK_GUEST from
> > https://lore.kernel.org/all/20240522001817.619072-4-dw...@infradead.org/
> >
> > For cross-machine migration, you *do* need to use a realtime clock
> > reference as that's the best you have (make sure you use TAI not UTC
> > and don't get affected by leap seconds or smearing). Use that to
> > restore the *TSC* as well as you can to make it appear to have kept
> > running consistently. And then KVM_SET_CLOCK_GUEST just as you would on
> > the same host.
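To make the "precise mathematical relationship" concrete, here is a minimal C sketch of how a guest turns a TSC reading into nanoseconds since boot using the PVTI fields. It follows the same shift-then-32.32-fixed-point-multiply scheme as Linux's pvclock_scale_delta(). struct pvti is a simplified, hypothetical stand-in for struct pvclock_vcpu_time_info, with version and flags omitted. guest_tsc() is a hypothetical helper modelling "guest TSC = host TSC + fixed offset". Neither is QEMU or KVM code. If neither the PVTI nor the offset changes across stop/cont on the same host, the guest's clock is a pure function of a host TSC that never stopped counting.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct pvclock_vcpu_time_info. */
struct pvti {
    uint64_t tsc_timestamp;     /* guest TSC value at which system_time was sampled */
    uint64_t system_time;       /* nanoseconds since boot at tsc_timestamp */
    uint32_t tsc_to_system_mul; /* 32.32 fixed-point ns-per-tick multiplier */
    int8_t   tsc_shift;         /* pre-shift applied to the TSC delta */
};

/* Computes (a * mul) >> 32 exactly, without a 128-bit type. */
static uint64_t mul_u64_u32_shr32(uint64_t a, uint32_t mul)
{
    uint64_t lo = (uint32_t)a;
    uint64_t hi = a >> 32;
    return ((lo * mul) >> 32) + hi * mul;
}

/* Converts a guest TSC reading into nanoseconds since boot using the PVTI. */
static uint64_t pvti_tsc_to_ns(const struct pvti *p, uint64_t tsc)
{
    /* This sketch only handles readings at or after the sample point. */
    assert(tsc >= p->tsc_timestamp);
    assert(p->tsc_shift >= -31 && p->tsc_shift <= 31);

    uint64_t delta = tsc - p->tsc_timestamp;
    if (p->tsc_shift < 0)
        delta >>= -p->tsc_shift;
    else
        delta <<= p->tsc_shift;
    return p->system_time + mul_u64_u32_shr32(delta, p->tsc_to_system_mul);
}

/* Hypothetical: the guest TSC is the host TSC plus a fixed per-VM offset
 * (wrapping modulo 2^64, as a hardware TSC offset does). */
static uint64_t guest_tsc(uint64_t host_tsc, uint64_t tsc_offset)
{
    return host_tsc + tsc_offset;
}
```

Nothing in the conversion depends on when, or for how long, the VM was paused. Only the TSC value read at that moment matters. With the PVTI and offset left untouched, a guest resumed after "cont" simply sees the time that actually elapsed.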
>
> Indeed, QEMU live migration also relies on kvmclock_vm_state_change() to
> temporarily stop/cont the source/target VM.
>
> Do you mean we should expect something different for live migration,
> i.e.,
>
> 1. Live-migrate a source VM to a file.
> 2. Copy the file to another server.
> 3. Wait for 1 hour.
> 4. Migrate from the file to the target VM.
>
> Although this is equivalent to a one-hour downtime, we do need to count
> the missing hour, correct?

I don't look at it as counting anything. The clock keeps running even
when I'm not looking at it. If I wake up and look at it again, there is
no 'counting' how long I was asleep...
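One way to read "restore the *TSC* as well as you can" for the file-based, cross-machine case is sketched below in C. dest_tsc_offset() and ns_to_tsc_ticks() are hypothetical helpers, not QEMU or KVM code. The sketch makes three assumptions. First, the source saves the guest TSC together with a TAI timestamp, for example from clock_gettime(CLOCK_TAI) on Linux. Second, both hosts' TAI clocks are synchronized. Third, source and destination TSCs run at the same rate, given here as the assumed parameter tsc_khz. The destination then picks the TSC offset that makes the guest TSC look as if it had kept running for the whole gap. The PVTI itself would be restored unchanged, e.g. via the proposed KVM_SET_CLOCK_GUEST.

```c
#include <assert.h>
#include <stdint.h>

/* Converts a duration in nanoseconds to TSC ticks at tsc_khz, splitting
 * the multiplication so that long downtimes do not overflow 64 bits. */
static uint64_t ns_to_tsc_ticks(uint64_t ns, uint32_t tsc_khz)
{
    uint64_t q = ns / 1000000u;
    uint64_t r = ns % 1000000u;
    return q * tsc_khz + (r * tsc_khz) / 1000000u;
}

/* Hypothetical helper: returns the TSC offset to program on the destination
 * so that (dst_host_tsc + offset) equals the guest TSC it would have reached
 * had it kept running since src_tai_ns. Timestamps are TAI nanoseconds. */
static uint64_t dest_tsc_offset(uint64_t src_guest_tsc, int64_t src_tai_ns,
                                uint64_t dst_host_tsc, int64_t dst_tai_ns,
                                uint32_t tsc_khz)
{
    assert(tsc_khz != 0);

    /* Assumed policy: if the destination's clock reads earlier than the
     * source's (imperfect synchronization), do not move the TSC backwards. */
    uint64_t elapsed_ns = dst_tai_ns > src_tai_ns
                              ? (uint64_t)(dst_tai_ns - src_tai_ns)
                              : 0;

    uint64_t target_guest_tsc = src_guest_tsc + ns_to_tsc_ticks(elapsed_ns, tsc_khz);

    /* Offsets wrap modulo 2^64, matching how a hardware TSC offset behaves. */
    return target_guest_tsc - dst_host_tsc;
}
```

In this sketch the hour spent sitting in a file is not "counted" anywhere. It simply falls out of the TAI difference between save and restore. The quality of the result is bounded by how well the two hosts' clocks agree, which is the trade-off described above for live migration.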