On Thu, 2025-08-21 at 12:43 -0700, Sohil Mehta wrote:
> On 8/21/2025 12:34 PM, Sohil Mehta wrote:
> > On 8/21/2025 6:15 AM, David Woodhouse wrote:
> >
> > > Hm. My test host is INTEL_HASWELL_X (0x63f). For reasons which are
> > > unclear to me, QEMU doesn't set bit 8 of 0x80000007 EDX unless I
> > > explicitly append ',+invtsc' to the existing '-cpu host' on its command
> > > line. So now my guest doesn't think it has X86_FEATURE_CONSTANT_TSC.
> > >
> >
> > Haswell should have X86_FEATURE_CONSTANT_TSC, so I would have expected
> > the guest bit to be set. Until now, X86_FEATURE_CONSTANT_TSC was set
> > based on the Family-model instead of the CPUID enumeration which may
> > have hid the issue.
> >
>
> Correction:
> s/instead/as well as
>
> > From my initial look at the QEMU implementation, this seems intentional.
> >
> > QEMU considers Invariant TSC as un-migratable which prevents it from
> > being exposed to migratable guests (default).
> > target/i386/cpu.c:
> > [FEAT_8000_0007_EDX]
> >         .unmigratable_flags = CPUID_APM_INVTSC,
> >
> > Can you please try '-cpu host,migratable=off'?
>
> This is mainly to verify. If confirmed, I am not sure what the long term
> solution should be.

Yes, explicitly turning it on with -cpu host,+invtsc does work.

I've been looking into why it takes a Xen guest four seconds per vCPU
in this case, but not a KVM guest.

When running as a KVM guest, Linux will infer the TSC frequency from
the KVM clock, or better still, from CPUID; see
https://lore.kernel.org/all/20250816101308.2594298-1-dw...@infradead.org
and/or
https://lore.kernel.org/all/20250227021855.3257188-36-sea...@google.com

As a Xen guest though, Linux doesn't do that. This patch in the guest
should make it work without recalibrating the TSC for each vCPU...

--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -489,7 +489,15 @@ static void xen_setup_vsyscall_time_info(void)
  */
 static int __init xen_tsc_safe_clocksource(void)
 {
-	u32 eax, ebx, ecx, edx;
+	u32 eax, ebx, ecx, edx;
+	u64 lpj;
+
+	/* Leaf 4, sub-leaf 0 (0x40000x03) */
+	cpuid_count(xen_cpuid_base() + 3, 0, &eax, &ebx, &ecx, &edx);
+
+	lpj = ((u64)ecx * 1000);
+	do_div(lpj, HZ);
+	preset_lpj = lpj;
 
 	if (!(boot_cpu_has(X86_FEATURE_CONSTANT_TSC)))
 		return 0;
@@ -500,9 +508,6 @@ static int __init xen_tsc_safe_clocksource(void)
 	if (check_tsc_unstable())
 		return 0;
 
-	/* Leaf 4, sub-leaf 0 (0x40000x03) */
-	cpuid_count(xen_cpuid_base() + 3, 0, &eax, &ebx, &ecx, &edx);
-
 	return ebx == XEN_CPUID_TSC_MODE_NEVER_EMULATE;
 }

... but then I got slightly distracted by the question of why I was
getting *nonsense* in those values, and why KVM is 'correcting' EAX in
subleaf 2 which is supposed to be the *host* TSC, not ECX in subleaf
zero...

Under the Fedora 6.13.8-200 kernel I'm fairly sure the guest was seeing
values in subleaf 0 ECX/EDX that *should* have been in subleaf 1
ECX/EDX, and that problem went away when I rebooted the host into a
mainline kernel. Will have to go back and retest that part...
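
For anyone following the arithmetic in the patch: ECX of that leaf is
the guest TSC frequency in kHz, so multiplying by 1000 and dividing by
HZ gives TSC ticks per jiffy, which is the value handed to preset_lpj.
A minimal userspace sketch of the same calculation follows. The helper
name lpj_from_tsc_khz() is made up for illustration and is not a kernel
function, and the frequencies mentioned below are invented examples,
not measurements from my test host.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel API: mirrors the patch's
 *     lpj = (u64)ecx * 1000; do_div(lpj, HZ);
 * tsc_khz stands in for ECX of the Xen time leaf, sub-leaf 0, and hz
 * for the kernel's CONFIG_HZ.
 */
static uint64_t lpj_from_tsc_khz(uint32_t tsc_khz, uint32_t hz)
{
	/*
	 * Widen before multiplying: tsc_khz * 1000 no longer fits in
	 * 32 bits once the TSC runs faster than about 4.29 GHz.
	 */
	return ((uint64_t)tsc_khz * 1000) / hz;
}
```

With an example 2.4 GHz TSC (tsc_khz = 2400000), this gives 2400000 for
HZ=1000 and 9600000 for HZ=250.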