On Mon, Oct 17, 2016 at 04:50:09PM +0200, Radim Krčmář wrote:
> 2016-10-17 07:47-0200, Marcelo Tosatti:
> > On Fri, Oct 14, 2016 at 06:20:31PM -0300, Eduardo Habkost wrote:
> >> I have been wondering: should we allow live migration with the
> >> invtsc flag enabled, if TSC scaling is available on the
> >> destination?
> > TSC scaling and invtsc flag, yes.
> Yes, if we have well synchronized time between hosts, then we might be
> able to migrate with a TSC shift that cannot be perceived by the guest.
Even if the guest can't detect the TSC difference (relative to realtime),
I suppose the TSC should be advanced to account for the time the guest was
stopped during migration (so that the TSC appears to have incremented at a
"constant rate").
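As a rough illustration of that advance (Python, a sketch only: the
function and parameter names are hypothetical and are not QEMU or KVM
interfaces, and it assumes the stop duration is measured against a clock
both hosts agree on):

```python
TSC_MASK = (1 << 64) - 1  # the TSC is a 64-bit counter


def advance_tsc(tsc_at_stop: int, stopped_ns: int, tsc_hz: int) -> int:
    """Return the TSC value to load on resume.

    The counter is moved forward by the number of ticks that would have
    elapsed during the stop window, so the guest sees a constant rate.
    All names here are hypothetical, for illustration only.
    """
    if stopped_ns < 0:
        raise ValueError("stop window cannot be negative")
    # ticks = seconds * frequency, kept in integer arithmetic
    elapsed_ticks = stopped_ns * tsc_hz // 1_000_000_000
    return (tsc_at_stop + elapsed_ticks) & TSC_MASK


# Example: a 2.5 GHz guest TSC and a 300 ms stop window.
resumed = advance_tsc(1_000_000, 300_000_000, 2_500_000_000)
print(resumed)
```

The accuracy of the result is bounded by how well the two hosts agree on
the length of the stop window.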
> Unless the VM also has a migratable assigned PCI device that uses ART,
> because we have no protocol to update the setting of ART (in CPUID), so
> we should keep migration forbidden then.
What is the use case for ART again? (I need to catch up on that.)
> >> For reference, this is what the Intel SDM says about invtsc:
> >> The time stamp counter in newer processors may support an
> >> enhancement, referred to as invariant TSC. Processor’s support
> >> for invariant TSC is indicated by CPUID.80000007H:EDX.
> >> The invariant TSC will run at a constant rate in all ACPI P-,
> >> C-, and T-states. This is the architectural behavior moving
> >> forward. On processors with invariant TSC support, the OS may
> >> use the TSC for wall clock timer services (instead of ACPI or
> >> HPET timers). TSC reads are much more efficient and do not
> >> incur the overhead associated with a ring transition or access
> >> to a platform resource.
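For readers following along, the flag in question is bit 8 of EDX in
CPUID leaf 80000007H. A small sketch of the check (Python, illustrative
only; it takes an EDX value that has already been read, and the names are
hypothetical):

```python
INVTSC_LEAF = 0x80000007  # CPUID leaf that reports the flag
INVTSC_EDX_BIT = 8        # EDX bit position for invariant TSC


def has_invariant_tsc(edx: int) -> bool:
    """Return True if the given EDX value from leaf 80000007H has the
    invariant TSC bit set. Hypothetical helper, for illustration."""
    return bool((edx >> INVTSC_EDX_BIT) & 1)


print(has_invariant_tsc(0x100))  # bit 8 set
print(has_invariant_tsc(0x0FF))  # only lower bits set
```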
> > Yes. The blockage happened for different reasons:
> > 1) Migration: to host with different TSC frequency.
> We shouldn't have done this even now when emulating anything newer than
> Pentium 4, because those CPUs have constant TSC, which only lacks the
> guarantee that it doesn't stop in deep C-states:
> For [a list of processors we emulate]: the time-stamp counter
> increments at a constant rate. That rate may be set by the maximum
> core-clock to bus-clock ratio of the processor or may be set by the
> maximum resolved frequency at which the processor is booted. The
> maximum resolved frequency may differ from the processor base
> frequency, see Section 18.18.2 for more detail. On certain processors,
> the TSC frequency may not be the same as the frequency in the brand string.
> The specific processor configuration determines the behavior. Constant
> TSC behavior ensures that the duration of each clock tick is uniform
> and supports the use of the TSC as a wall clock timer even if the
> processor core changes frequency. This is the architectural behavior
> moving forward.
> Invariant TSC is more useful, though, so more applications would break
> when migrating to a different TSC frequency.
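To make the frequency problem concrete, here is a sketch of what a guest
sees on a destination with a different TSC frequency, with and without
scaling (Python, illustrative only; the 48-bit fixed-point fraction and
all names are assumptions chosen for the example, not a description of
any particular hardware or of the KVM code):

```python
FRAC_BITS = 48            # assumption: fixed-point fraction width
TSC_MASK = (1 << 64) - 1  # the TSC is a 64-bit counter


def scaling_ratio(guest_hz: int, host_hz: int) -> int:
    """Fixed-point multiplier that maps host ticks to guest ticks."""
    return (guest_hz << FRAC_BITS) // host_hz


def guest_tsc(host_tsc: int, ratio: int, offset: int = 0) -> int:
    """Guest-visible TSC: scale the host counter, then add an offset."""
    return (((host_tsc * ratio) >> FRAC_BITS) + offset) & TSC_MASK


guest_hz = 2_000_000_000  # frequency the guest calibrated against
host_hz = 3_000_000_000   # destination host frequency
host_ticks_per_second = host_hz

# Without scaling, one real second looks like 1.5 seconds to the guest.
print(host_ticks_per_second / guest_hz)

# With scaling, one real second is about guest_hz ticks again
# (off by at most one tick because of fixed-point rounding).
ratio = scaling_ratio(guest_hz, host_hz)
print(guest_tsc(host_ticks_per_second, ratio))
```

Without the multiplier, any guest code that converted ticks to time using
the source frequency would run fast or slow after migration.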
> > 2) Savevm: It is not safe to use the TSC for wall clock timer
> > services.
> With constant TSC, we could argue that a shift to deep C-state happened
> and paused TSC, which is not a good behavior, but somewhat defensible.
> > By allowing savevm, you make a commitment to allow a feature
> > at the expense of not complying with the spec (specifically the
> > statement "the OS may use the TSC for wall clock timer services",
> > because the TSC stops relative to realtime for the duration of the
> > savevm stop window).
> Yep, we should at least guesstimate the TSC to allow the guest to resume
> with as small a TSC shift as possible, and check that the hosts were
> somewhat synchronized with UTC (or whatever time source we choose).
There are two options for savevm:
Option 1) Stop the TSC for savevm duration.
Option 2) Advance TSC to match realtime (this is known to overflow Linux
> > But since Linux guests use kvmclock and Windows guests use Hyper-V
> > enlightenment, it should be fine to disable 2).
> > There is a bug open for this, btw:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1353073
> These people should be happy with just live-migrations, so can't we just
> keep savevm forbidden?
Don't see why. Perhaps savevm should be considered a "special type of
operation" that deviates from bare-metal behaviour: if the user does
savevm, then they know the TSC does not count "at a constant rate"
(so savevm breaks invariant TSC behaviour).