>> >> The main problem is that QEMU changes virtual_tsc_khz when migrating
>> >> without hardware scaling, so KVM is forced to get nanoseconds wrong ...
>> >> If QEMU doesn't want to keep the TSC frequency constant, then it would
>> >> be better if it didn't expose TSC in CPUID -- guest would just use
>> >> kvmclock without being tempted by direct TSC accesses.
>> > Isn't it enough to simply not expose invtsc? Aren't guests expected
>> > to assume the TSC frequency can change if invtsc isn't set on
>> > CPUID?
>> There are exceptions.  An OS can assume constant TSC on some models that
>> QEMU emulates: coreduo, core2duo, Conroe, Penryn, n270, kvm32 and kvm64.
>> The list from SDM (17.15 TIME-STAMP COUNTER):
>>   Pentium 4 processors, Intel Xeon processors (family [0FH], models [03H
>>   and higher]); Intel Core Solo and Intel Core Duo processors (family
>>   [06H], model [0EH]); the Intel Xeon processor 5100 series and Intel
>>   Core 2 Duo processors (family [06H], model [0FH]); Intel Core 2 and
>>   Intel Xeon processors (family [06H], DisplayModel [17H]); Intel Atom
>>   processors (family [06H], DisplayModel [1CH]))
>> Another sad part is that Linux uses the following condition to assume
>> constant TSC frequency:
>>      if ((c->x86 == 0xf && c->x86_model >= 0x03) ||
>>              (c->x86 == 0x6 && c->x86_model >= 0x0e))
>>              set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
>> which sets constant TSC for all modern processors.  It's not a
>> problem on real hardware, because all modern processors likely have
>> invariant TSC.
>> Fun fact: Linux shows constant_tsc flag in /proc/cpuinfo even if the
>>           modern CPU doesn't expose TSC in CPUID.
>> Considering that Linux is fixed on Nehalem and newer processors, we have
>> a few options for the rest:
>>  1) treat TSC like invariant TSC on those models (the guest cannot use
>>     ACPI state, so its OS might assume that they are equivalent)
>>  2) hide TSC on those models
>>  3) ignore the problem
>>  4) remove those models
>> I don't know enough about QEMU design goals to guess which one is the
>> most appropriate.  (4) is the clear winner for me, followed by (3). :)
> (4) can't be implemented because it breaks existing
> configurations. (3) is the current solution.

Existing machine types must remain compatible, but isn't it possible to
cull options in new machine types?

> Option (2) sounds attractive to me, but seems risky.

If users have a setup that works, then any change can break it.

It would have been the best option a few years back, when we wrote the
code, but now the change will happen *in* the guest, so we can't control
it as in the case of (4), where broken guests won't start, or (1), where
broken guests won't migrate.

>                                                      I would like
> to understand the consequences for guests. What could stop
> working if we remove TSC? What about kvmclock?

Hiding TSC in CPUID doesn't disable the RDTSC instruction in the guest.
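
To make the distinction concrete, here is a rough sketch of the two
independent conditions.  The helper names are made up for illustration;
only the bit position (CPUID.1:EDX bit 4) and the CR4.TSD rule are
architectural.  The CPUID bit is purely advisory: whether RDTSC executes
depends on CR4.TSD and the privilege level, not on what CPUID says.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.1:EDX bit 4 advertises the TSC. */
#define CPUID_1_EDX_TSC (UINT32_C(1) << 4)

/* Hypothetical helper: does this CPUID.1:EDX value advertise TSC? */
static bool cpuid_advertises_tsc(uint32_t cpuid_1_edx)
{
    return (cpuid_1_edx & CPUID_1_EDX_TSC) != 0;
}

/*
 * Hypothetical helper: would RDTSC execute without faulting?
 * Only CR4.TSD and the current privilege level matter here;
 * the CPUID bit is not an input at all.
 */
static bool rdtsc_executes(bool cr4_tsd, unsigned int cpl)
{
    return !cr4_tsd || cpl == 0;
}
```

So a guest whose CPUID hides TSC, but which leaves CR4.TSD clear, still
lets its userspace run RDTSC.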

kvmclock is a paravirtual device on top of TSC, so if kvmclock is
present, then it should be safe to assume that the guest can use TSC for
operations with kvmclock.
Linux does that, but I don't think this behavior was ever written down,
so other kvmclock users could break.

The Hyper-V TSC page might stop working, because Windows and other
users could check CPUID.1:EDX.TSC separately.
Linux's implementation would keep working, because it only checks for
the paravirtual feature, as in the case of kvmclock.
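
The difference between the two kinds of guests can be sketched like
this.  It is only an illustration: the two "policy" functions are
hypothetical, and the "strict" one is my guess at what a non-Linux
kvmclock user could do, not something I have seen in a real guest.  The
kvmclock feature bits are the ones in CPUID 0x40000001 EAX.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.1:EDX bit 4 advertises the TSC. */
#define CPUID_1_EDX_TSC          (UINT32_C(1) << 4)

/* kvmclock feature bit numbers in CPUID 0x40000001 EAX. */
#define KVM_FEATURE_CLOCKSOURCE  0
#define KVM_FEATURE_CLOCKSOURCE2 3

static bool kvmclock_advertised(uint32_t kvm_features_eax)
{
    uint32_t mask = (UINT32_C(1) << KVM_FEATURE_CLOCKSOURCE) |
                    (UINT32_C(1) << KVM_FEATURE_CLOCKSOURCE2);

    return (kvm_features_eax & mask) != 0;
}

/* Hypothetical policy: trust the paravirtual feature bit alone. */
static bool lenient_guest_uses_kvmclock(uint32_t kvm_features_eax,
                                        uint32_t cpuid_1_edx)
{
    (void)cpuid_1_edx;  /* deliberately ignored */
    return kvmclock_advertised(kvm_features_eax);
}

/* Hypothetical policy: additionally insist on CPUID.1:EDX.TSC. */
static bool strict_guest_uses_kvmclock(uint32_t kvm_features_eax,
                                       uint32_t cpuid_1_edx)
{
    return kvmclock_advertised(kvm_features_eax) &&
           (cpuid_1_edx & CPUID_1_EDX_TSC) != 0;
}
```

With TSC hidden, the lenient guest carries on as before, while the
strict one loses its paravirtual clock.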

Minor cases include: an OS that has no option other than TSC for its
clock; userspace that checks for TSC before using it; an OS that stops
setting CR4.TSD, so its userspace starts to use TSC; and probably many
others.

> If we implement (2), we could even add an extra check that blocks
> migration (or at least prints a warning) in case:
> 1) TSC is forcibly enabled in the configuration;
> 2) TSC scaling is not available on destination; and
> 3) the family/model values match the ones on the list above.
> And we could even keep TSC enabled by default for users who don't
> want migration (using migratable=false).

That would be nice.
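
Roughly, I imagine the check looking like the sketch below.  This is not
QEMU code: the struct, the field names and both functions are made up,
and "model" stands for the DisplayModel from the SDM list quoted above.
It only encodes the three conditions you listed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of the guest CPU configuration. */
struct guest_cpu {
    uint32_t family;
    uint32_t model;      /* DisplayModel */
    bool     tsc_forced; /* TSC explicitly enabled in the configuration */
};

/* Family/model values from the SDM list (17.15 TIME-STAMP COUNTER). */
static bool model_assumes_constant_tsc(uint32_t family, uint32_t model)
{
    if (family == 0x0f)
        return model >= 0x03;
    if (family == 0x06)
        return model == 0x0e || model == 0x0f ||
               model == 0x17 || model == 0x1c;
    return false;
}

/* Block (or warn about) migration only if all three conditions hold. */
static bool should_block_migration(const struct guest_cpu *cpu,
                                   bool dest_has_tsc_scaling)
{
    return cpu->tsc_forced &&
           !dest_has_tsc_scaling &&
           model_assumes_constant_tsc(cpu->family, cpu->model);
}
```

A family 6, model 0x0f guest with TSC forced on would be blocked when
the destination lacks TSC scaling, and allowed through otherwise.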

