On Thu, Sep 14, 2017 at 15:41 +0200, Reyk Floeter wrote:
> I'd like to raise another point: as we figured out, the TSC frequency
> can be different than the CPU frequency on modern Intel CPUs.  The
> const+invar TSC can even run faster than the CPU.
>
> For example, my X1 4th Gen currently boots up with the following speeds:
>
> cpu0: Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz, 2808.00 MHz
> ...
> cpu0: Enhanced SpeedStep 2808 MHz: speeds: 2701, 2700, 2600, 2500, 2300,
> 2100, 1900, 1800, 1600, 1400, 1300, 1100, 800, 700, 600, 400 MHz
>
> The 2.8GHz is actually the TSC speed; the fastest CPU speed is 2.7GHz.
>
> (I didn't figure out why ark.intel.com says the max turbo speed is 3.4GHz,
> but the 2.7GHz is probably a vendor configuration from Lenovo based on
> the laptop's cooling capabilities.)
>
> But the MSR function returns a different value, which is supposed to be
> the actual CPU speed (of core 0) after going out of power-saving mode.
> This caused problems on Skylake because the returned value can be very
> different from the TSC - using this value, the kernel failed to
> calibrate the TSC timecounter correctly.  That's why I added the code
> that reads the TSC frequency from CPUID 0x15 and skips the
> MSR/RDTSC-based calibration if CPUID 0x15 is supported and present.
>
> For example, without the CPUID method, I get the MSR-derived speed
> (depending on the on-boot power state that is configured in the BIOS):
>
> cpu0: Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz, 2494.75 MHz
>
> But here is my mistake: I actually thought that the MSR function is
> "wrong" on Skylake.  But it is just a different thing.  We have to
> differentiate whether we want to obtain the TSC or the CPU frequency,
> but the CPU frequency might just be derived from the TSC on older CPUs
> that don't support the MSR.
>
> I attached a diff that illustrates the problem on the -current code,
> but the same could be adjusted for the TSC recalibration diff.
>
> I split the cpu_tsc_freq() function into two functions:
>
> 1. cpu_freq() tries to get the CPU frequency from the MSR and falls back
> to the TSC via cpu_tsc_freq().  This is used for the cpuX dmesg printfs
> and for the global SpeedStep "cpuspeed" variable.
>
> 2. cpu_tsc_freq() tries to get the TSC frequency from CPUID 0x15 or
> falls back to the less accurate rdtsc() method.  This is used for the
> (initial) TSC timecounter frequency.
>
> Thoughts?
>
> (I'm not asking for OKs on the diff)
>
> Reyk
>
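To restate the CPUID 0x15 relationship in code before getting to the diff:
EBX/EAX is the TSC-to-core-crystal-clock ratio and ECX is the crystal
frequency in Hz, so the TSC runs at ECX * EBX / EAX.  Here's a rough
standalone userland sketch of just that calculation (illustration only;
the cpuid() wrapper below is ad hoc, and the in-kernel tsc_freq_cpuid()
in the diff additionally falls back to known crystal values for CPUs that
report ECX as 0):

#include <stdint.h>
#include <stdio.h>

/* ad hoc CPUID wrapper for this illustration */
static void
cpuid(uint32_t leaf, uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
        __asm volatile("cpuid"
            : "=a" (*a), "=b" (*b), "=c" (*c), "=d" (*d)
            : "0" (leaf), "2" (0));
}

int
main(void)
{
        uint32_t eax, ebx, ecx, edx, maxleaf;

        cpuid(0, &maxleaf, &ebx, &ecx, &edx);
        if (maxleaf < 0x15) {
                printf("CPUID leaf 0x15 not available\n");
                return (1);
        }

        cpuid(0x15, &eax, &ebx, &ecx, &edx);
        if (eax == 0 || ebx == 0 || ecx == 0) {
                printf("ratio or crystal clock not enumerated\n");
                return (1);
        }

        /* TSC Hz = crystal Hz * (numerator / denominator) */
        printf("TSC frequency: %llu Hz\n",
            (unsigned long long)ecx * ebx / eax);
        return (0);
}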
Keeping all of what Reyk said in mind, here's an updated diff that
incorporates additional changes:

1) it doesn't break the i386 kernel compilation;

2) it allows for multiple recalibrations against better timecounter
   sources, since acpihpet (and maybe acpitimer) can attach both before
   and after cpu0;

3) it factors the TSC timecounter code out into a separate file, so that
   it's clear what is related to the timecounter code and what isn't; I
   didn't quite like polluting the ACPI bits with unrelated stuff;

4) it cuts down on global variables and provides some additional cleanup.

The same diff is available here: https://github.com/mbelop/src/tree/tsc

Does this look good to everybody?  The acpitimer and acpihpet parts look
a tiny bit gross, I guess, but overall tsc.c can be copied to i386 and
these ifdefs extended to include it.  (A small userland sketch for
querying the new sysctl nodes follows the diff.)

diff --git sys/arch/amd64/amd64/identcpu.c sys/arch/amd64/amd64/identcpu.c
index a448b885ba7..775e342650d 100644
--- sys/arch/amd64/amd64/identcpu.c
+++ sys/arch/amd64/amd64/identcpu.c
@@ -45,26 +45,20 @@
 #include <machine/cpu.h>
 #include <machine/cpufunc.h>
 
 void    replacesmap(void);
-u_int64_t cpu_tsc_freq(struct cpu_info *);
-u_int64_t cpu_tsc_freq_ctr(struct cpu_info *);
+uint64_t cpu_freq(struct cpu_info *);
+void    tsc_timecounter_init(struct cpu_info *);
 
 #if NVMM > 0
 void cpu_check_vmm_cap(struct cpu_info *);
 #endif /* NVMM > 0 */
 
 /* sysctl wants this. */
 char cpu_model[48];
 int cpuspeed;
 
-u_int tsc_get_timecount(struct timecounter *tc);
-
-struct timecounter tsc_timecounter = {
-        tsc_get_timecount, NULL, ~0u, 0, "tsc", -1000, NULL
-};
-
 int amd64_has_xcrypt;
 #ifdef CRYPTO
 int amd64_has_pclmul;
 int amd64_has_aesni;
 #endif
@@ -385,17 +379,16 @@ via_update_sensor(void *args)
                 ci->ci_sensor.value += 273150000;
                 ci->ci_sensor.flags &= ~SENSOR_FINVALID;
         }
 }
 #endif
 
-u_int64_t
-cpu_tsc_freq_ctr(struct cpu_info *ci)
+uint64_t
+cpu_freq_ctr(struct cpu_info *ci)
 {
-        u_int64_t count, last_count, msr;
+        uint64_t count, last_count, msr;
 
         if ((ci->ci_flags & CPUF_CONST_TSC) == 0 ||
-            (ci->ci_flags & CPUF_INVAR_TSC) ||
             (cpu_perf_eax & CPUIDEAX_VERID) <= 1 ||
             CPUIDEDX_NUM_FC(cpu_perf_edx) <= 1)
                 return (0);
 
         msr = rdmsr(MSR_PERF_FIXED_CTR_CTRL);
@@ -423,69 +416,30 @@ cpu_tsc_freq_ctr(struct cpu_info *ci)
         wrmsr(MSR_PERF_GLOBAL_CTRL, msr);
 
         return ((count - last_count) * 10);
 }
 
-u_int64_t
-cpu_tsc_freq(struct cpu_info *ci)
+uint64_t
+cpu_freq(struct cpu_info *ci)
 {
-        u_int64_t last_count, count;
-        uint32_t eax, ebx, khz, dummy;
+        uint64_t last_count, count;
 
-        if (!strcmp(cpu_vendor, "GenuineIntel") &&
-            cpuid_level >= 0x15) {
-                eax = ebx = khz = dummy = 0;
-                CPUID(0x15, eax, ebx, khz, dummy);
-                khz /= 1000;
-                if (khz == 0) {
-                        switch (ci->ci_model) {
-                        case 0x4e: /* Skylake mobile */
-                        case 0x5e: /* Skylake desktop */
-                        case 0x8e: /* Kabylake mobile */
-                        case 0x9e: /* Kabylake desktop */
-                                khz = 24000; /* 24.0 Mhz */
-                                break;
-                        case 0x55: /* Skylake X */
-                                khz = 25000; /* 25.0 Mhz */
-                                break;
-                        case 0x5c: /* Atom Goldmont */
-                                khz = 19200; /* 19.2 Mhz */
-                                break;
-                        }
-                }
-                if (ebx == 0 || eax == 0)
-                        count = 0;
-                else if ((count = khz * ebx / eax) != 0) {
-                        /*
-                         * Using the CPUID-derived frequency increases
-                         * the quality of the TSC time counter.
-                         */
-                        tsc_timecounter.tc_quality = 2000;
-                        return (count * 1000);
-                }
-        }
-
-        count = cpu_tsc_freq_ctr(ci);
+        count = cpu_freq_ctr(ci);
         if (count != 0)
                 return (count);
 
         last_count = rdtsc();
         delay(100000);
         count = rdtsc();
 
         return ((count - last_count) * 10);
 }
 
-u_int
-tsc_get_timecount(struct timecounter *tc)
-{
-        return rdtsc();
-}
-
 void
 identifycpu(struct cpu_info *ci)
 {
+        uint64_t freq = 0;
         u_int32_t dummy, val;
         char mycpu_model[48];
         int i;
         char *brandstr_from, *brandstr_to;
         int skipspace;
@@ -566,22 +520,22 @@ identifycpu(struct cpu_info *ci)
                 /* Check if it's an invariant TSC */
                 if (cpu_apmi_edx & CPUIDEDX_ITSC)
                         ci->ci_flags |= CPUF_INVAR_TSC;
         }
 
-        ci->ci_tsc_freq = cpu_tsc_freq(ci);
+        freq = cpu_freq(ci);
 
         amd_cpu_cacheinfo(ci);
 
         printf("%s: %s", ci->ci_dev->dv_xname, mycpu_model);
 
-        if (ci->ci_tsc_freq != 0)
-                printf(", %llu.%02llu MHz", (ci->ci_tsc_freq + 4999) / 1000000,
-                    ((ci->ci_tsc_freq + 4999) / 10000) % 100);
+        if (freq != 0)
+                printf(", %llu.%02llu MHz", (freq + 4999) / 1000000,
+                    ((freq + 4999) / 10000) % 100);
 
         if (ci->ci_flags & CPUF_PRIMARY) {
-                cpuspeed = (ci->ci_tsc_freq + 4999) / 1000000;
+                cpuspeed = (freq + 4999) / 1000000;
                 cpu_cpuspeed = cpu_amd64speed;
         }
 
         printf("\n%s: ", ci->ci_dev->dv_xname);
 
@@ -721,18 +675,11 @@ identifycpu(struct cpu_info *ci)
                 sensor_attach(&ci->ci_sensordev, &ci->ci_sensor);
                 sensordev_install(&ci->ci_sensordev);
 #endif
         }
 
-        if ((ci->ci_flags & CPUF_PRIMARY) &&
-            (ci->ci_flags & CPUF_CONST_TSC) &&
-            (ci->ci_flags & CPUF_INVAR_TSC)) {
-                printf("%s: TSC frequency %llu Hz\n",
-                    ci->ci_dev->dv_xname, ci->ci_tsc_freq);
-                tsc_timecounter.tc_frequency = ci->ci_tsc_freq;
-                tc_init(&tsc_timecounter);
-        }
+        tsc_timecounter_init(ci);
 
         cpu_topology(ci);
 
 #if NVMM > 0
         cpu_check_vmm_cap(ci);
 #endif /* NVMM > 0 */
diff --git sys/arch/amd64/amd64/machdep.c sys/arch/amd64/amd64/machdep.c
index 937094504d1..3c80d5998c5 100644
--- sys/arch/amd64/amd64/machdep.c
+++ sys/arch/amd64/amd64/machdep.c
@@ -423,10 +423,12 @@ bios_sysctl(int *name, u_int namelen, void *oldp, size_t *oldlenp, void *newp,
  */
 int
 cpu_sysctl(int *name, u_int namelen, void *oldp, size_t *oldlenp, void *newp,
     size_t newlen, struct proc *p)
 {
+        extern uint64_t amd64_tsc_frequency;
+        extern int amd64_has_invariant_tsc;
         extern int amd64_has_xcrypt;
         dev_t consdev;
         dev_t dev;
         int val, error;
 
@@ -494,10 +496,16 @@ cpu_sysctl(int *name, u_int namelen, void *oldp, size_t *oldlenp, void *newp,
                 error = sysctl_int(oldp, oldlenp, newp, newlen, &forceukbd);
                 if (forceukbd)
                         pckbc_release_console();
                 return (error);
 #endif
+        case CPU_TSCFREQ:
+                return (sysctl_rdquad(oldp, oldlenp, newp,
+                    amd64_tsc_frequency));
+        case CPU_INVARIANTTSC:
+                return (sysctl_rdint(oldp, oldlenp, newp,
+                    amd64_has_invariant_tsc));
         default:
                 return (EOPNOTSUPP);
         }
         /* NOTREACHED */
 }
diff --git sys/arch/amd64/amd64/tsc.c sys/arch/amd64/amd64/tsc.c
new file mode 100644
index 00000000000..32a02eb00af
--- /dev/null
+++ sys/arch/amd64/amd64/tsc.c
@@ -0,0 +1,223 @@
+/*	$OpenBSD$	*/
+/*
+ * Copyright (c) 2016,2017 Reyk Floeter <r...@openbsd.org>
+ * Copyright (c) 2017 Adam Steen <a...@adamsteen.com.au>
+ * Copyright (c) 2017 Mike Belopuhov <m...@openbsd.org>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#include <sys/param.h>
+#include <sys/systm.h>
+#include <sys/sysctl.h>
+#include <sys/timetc.h>
+
+#include <machine/cpu.h>
+#include <machine/cpufunc.h>
+
+#define RECALIBRATE_MAX_RETRIES		5
+#define RECALIBRATE_SMI_THRESHOLD	50000
+#define RECALIBRATE_DELAY_THRESHOLD	20
+
+int		tsc_recalibrate;
+
+uint64_t	amd64_tsc_frequency;
+int		amd64_has_invariant_tsc;
+
+uint		tsc_get_timecount(struct timecounter *tc);
+
+struct timecounter tsc_timecounter = {
+        tsc_get_timecount, NULL, ~0u, 0, "tsc", -1000, NULL
+};
+
+uint64_t
+tsc_freq_cpuid(struct cpu_info *ci)
+{
+        uint64_t count;
+        uint32_t eax, ebx, khz, dummy;
+
+        if (!strcmp(cpu_vendor, "GenuineIntel") &&
+            cpuid_level >= 0x15) {
+                eax = ebx = khz = dummy = 0;
+                CPUID(0x15, eax, ebx, khz, dummy);
+                khz /= 1000;
+                if (khz == 0) {
+                        switch (ci->ci_model) {
+                        case 0x4e: /* Skylake mobile */
+                        case 0x5e: /* Skylake desktop */
+                        case 0x8e: /* Kabylake mobile */
+                        case 0x9e: /* Kabylake desktop */
+                                khz = 24000; /* 24.0 Mhz */
+                                break;
+                        case 0x55: /* Skylake X */
+                                khz = 25000; /* 25.0 Mhz */
+                                break;
+                        case 0x5c: /* Atom Goldmont */
+                                khz = 19200; /* 19.2 Mhz */
+                                break;
+                        }
+                }
+                if (ebx == 0 || eax == 0)
+                        count = 0;
+                else if ((count = (uint64_t)khz * (uint64_t)ebx / eax) != 0)
+                        return (count * 1000);
+        }
+
+        return (0);
+}
+
+static inline int
+get_tsc_and_timecount(struct timecounter *tc, uint64_t *tsc, uint64_t *count)
+{
+        uint64_t n, tsc1, tsc2;
+        int i;
+
+        for (i = 0; i < RECALIBRATE_MAX_RETRIES; i++) {
+                tsc1 = rdtsc();
+                n = (tc->tc_get_timecount(tc) & tc->tc_counter_mask);
+                tsc2 = rdtsc();
+
+                if ((tsc2 - tsc1) < RECALIBRATE_SMI_THRESHOLD) {
+                        *count = n;
+                        *tsc = tsc2;
+                        return (0);
+                }
+        }
+        return (1);
+}
+
+static inline uint64_t
+calculate_tsc_freq(uint64_t tsc1, uint64_t tsc2, int usec)
+{
+        uint64_t delta;
+
+        delta = (tsc2 - tsc1);
+        return (delta * 1000000 / usec);
+}
+
+static inline uint64_t
+calculate_tc_delay(struct timecounter *tc, uint64_t count1, uint64_t count2)
+{
+        uint64_t delta;
+
+        if (count2 < count1)
+                count2 += tc->tc_counter_mask;
+
+        delta = (count2 - count1);
+        return (delta * 1000000 / tc->tc_frequency);
+}
+
+uint64_t
+measure_tsc_freq(struct timecounter *tc)
+{
+        uint64_t count1, count2, frequency, min_freq, tsc1, tsc2;
+        u_long ef;
+        int delay_usec, i, err1, err2, usec;
+
+        /* warmup the timers */
+        for (i = 0; i < 3; i++) {
+                (void)tc->tc_get_timecount(tc);
+                (void)rdtsc();
+        }
+
+        min_freq = ULLONG_MAX;
+
+        delay_usec = 100000;
+        for (i = 0; i < 3; i++) {
+                ef = read_rflags();
+                disable_intr();
+
+                err1 = get_tsc_and_timecount(tc, &tsc1, &count1);
+                delay(delay_usec);
+                err2 = get_tsc_and_timecount(tc, &tsc2, &count2);
+
+                write_rflags(ef);
+
+                if (err1 || err2)
+                        continue;
+
+                usec = calculate_tc_delay(tc, count1, count2);
+
+                if ((usec < (delay_usec - RECALIBRATE_DELAY_THRESHOLD)) ||
+                    (usec > (delay_usec + RECALIBRATE_DELAY_THRESHOLD)))
+                        continue;
+
+                frequency = calculate_tsc_freq(tsc1, tsc2, usec);
+
+                min_freq = MIN(min_freq, frequency);
+        }
+
+        return (min_freq);
+}
+
+void
+calibrate_tsc_freq(void)
+{
+        struct timecounter *reference = tsc_timecounter.tc_priv;
+        uint64_t freq;
+
+        if (!reference || !tsc_recalibrate)
+                return;
+
+        if ((freq = measure_tsc_freq(reference)) == 0)
+                return;
+        amd64_tsc_frequency = freq;
+        tsc_timecounter.tc_frequency = freq;
+        if (amd64_has_invariant_tsc)
+                tsc_timecounter.tc_quality = 2000;
+
+        printf("%s: recalibrated TSC frequency %lld Hz\n",
+            reference->tc_name, tsc_timecounter.tc_frequency);
+}
+
+void
+cpu_recalibrate_tsc(struct timecounter *tc)
+{
+        struct timecounter *reference = tsc_timecounter.tc_priv;
+
+        /* Prevent recalibration with a worse timecounter source */
+        if (reference && reference->tc_quality > tc->tc_quality)
+                return;
+
+        tsc_timecounter.tc_priv = tc;
+        calibrate_tsc_freq();
+}
+
+uint
+tsc_get_timecount(struct timecounter *tc)
+{
+        return rdtsc();
+}
+
+void
+tsc_timecounter_init(struct cpu_info *ci)
+{
+        if (!(ci->ci_flags & CPUF_PRIMARY) ||
+            !(ci->ci_flags & CPUF_CONST_TSC) ||
+            !(ci->ci_flags & CPUF_INVAR_TSC))
+                return;
+
+        amd64_tsc_frequency = tsc_freq_cpuid(ci);
+        amd64_has_invariant_tsc = 1;
+
+        /* Newer CPUs don't require recalibration */
+        if (amd64_tsc_frequency > 0) {
+                tsc_timecounter.tc_frequency = amd64_tsc_frequency;
+                tsc_timecounter.tc_quality = 2000;
+        } else {
+                tsc_recalibrate = 1;
+                calibrate_tsc_freq();
+        }
+
+        tc_init(&tsc_timecounter);
+}
diff --git sys/arch/amd64/conf/files.amd64 sys/arch/amd64/conf/files.amd64
index bfbd6153d7a..c141e5071b8 100644
--- sys/arch/amd64/conf/files.amd64
+++ sys/arch/amd64/conf/files.amd64
@@ -8,10 +8,11 @@ file	arch/amd64/amd64/conf.c
 file	arch/amd64/amd64/disksubr.c		disk
 file	arch/amd64/amd64/gdt.c			multiprocessor
 file	arch/amd64/amd64/machdep.c
 file	arch/amd64/amd64/hibernate_machdep.c	hibernate
 file	arch/amd64/amd64/identcpu.c
+file	arch/amd64/amd64/tsc.c
 file	arch/amd64/amd64/via.c
 file	arch/amd64/amd64/locore.S
 file	arch/amd64/amd64/aes_intel.S		crypto
 file	arch/amd64/amd64/aesni.c		crypto
 file	arch/amd64/amd64/amd64errata.c
diff --git sys/arch/amd64/include/cpu.h sys/arch/amd64/include/cpu.h
index 37e2b490eec..e92c950b005 100644
--- sys/arch/amd64/include/cpu.h
+++ sys/arch/amd64/include/cpu.h
@@ -136,11 +136,10 @@ struct cpu_info {
 	u_int32_t	ci_extcacheinfo[4];
 	u_int32_t	ci_signature;
 	u_int32_t	ci_family;
 	u_int32_t	ci_model;
 	u_int32_t	ci_cflushsz;
-	u_int64_t	ci_tsc_freq;
 
 	int		ci_inatomic;
 
 #define ARCH_HAVE_CPU_TOPOLOGY
 	u_int32_t	ci_smt_id;
@@ -425,11 +424,13 @@ void	mp_setperf_init(void);
 #define	CPU_CPUFEATURE		8	/* cpuid features */
 #define	CPU_KBDRESET		10	/* keyboard reset under pcvt */
 #define	CPU_XCRYPT		12	/* supports VIA xcrypt in userland */
 #define	CPU_LIDACTION		14	/* action caused by lid close */
 #define	CPU_FORCEUKBD		15	/* Force ukbd(4) as console keyboard */
-#define	CPU_MAXID		16	/* number of valid machdep ids */
+#define	CPU_TSCFREQ		16	/* tsc frequency */
+#define	CPU_INVARIANTTSC	17	/* has invariant tsc */
+#define	CPU_MAXID		18	/* number of valid machdep ids */
 
 #define	CTL_MACHDEP_NAMES { \
 	{ 0, 0 }, \
 	{ "console_device", CTLTYPE_STRUCT }, \
 	{ "bios", CTLTYPE_INT }, \
@@ -444,10 +445,12 @@ void	mp_setperf_init(void);
 	{ 0, 0 }, \
 	{ "xcrypt", CTLTYPE_INT }, \
 	{ 0, 0 }, \
 	{ "lidaction", CTLTYPE_INT }, \
 	{ "forceukbd", CTLTYPE_INT }, \
+	{ "tscfreq", CTLTYPE_QUAD }, \
+	{ "invarianttsc", CTLTYPE_INT }, \
 }
 
 /*
  * Default cr4 flags.
  * Doesn't really belong here, but doesn't really belong anywhere else
diff --git sys/arch/amd64/include/cpuvar.h sys/arch/amd64/include/cpuvar.h
index 24fc8fe880d..edcf1223b82 100644
--- sys/arch/amd64/include/cpuvar.h
+++ sys/arch/amd64/include/cpuvar.h
@@ -94,7 +94,8 @@ void x86_ipi_init(int);
 #endif
 
 void identifycpu(struct cpu_info *);
 void cpu_init(struct cpu_info *);
 void cpu_init_first(void);
+void cpu_adjust_tsc_freq(uint64_t (*)());
 
 #endif
diff --git sys/dev/acpi/acpihpet.c sys/dev/acpi/acpihpet.c
index 17f5e59facf..0518fca835d 100644
--- sys/dev/acpi/acpihpet.c
+++ sys/dev/acpi/acpihpet.c
@@ -262,10 +262,14 @@ acpihpet_attach(struct device *parent, struct device *self, void *aux)
 	hpet_timecounter.tc_frequency = (u_int32_t)freq;
 	hpet_timecounter.tc_priv = sc;
 	hpet_timecounter.tc_name = sc->sc_dev.dv_xname;
 	tc_init(&hpet_timecounter);
+#if defined(__amd64__)
+	extern void cpu_recalibrate_tsc(struct timecounter *);
+	cpu_recalibrate_tsc(&hpet_timecounter);
+#endif
 	acpihpet_attached++;
 }
 
 u_int
 acpihpet_gettime(struct timecounter *tc)
diff --git sys/dev/acpi/acpitimer.c sys/dev/acpi/acpitimer.c
index fb78acf2564..c4f3aaf1c5c 100644
--- sys/dev/acpi/acpitimer.c
+++ sys/dev/acpi/acpitimer.c
@@ -94,10 +94,14 @@ acpitimerattach(struct device *parent, struct device *self, void *aux)
 	if (psc->sc_fadt->flags & FADT_TMR_VAL_EXT)
 		acpi_timecounter.tc_counter_mask = 0xffffffffU;
 	acpi_timecounter.tc_priv = sc;
 	acpi_timecounter.tc_name = sc->sc_dev.dv_xname;
 	tc_init(&acpi_timecounter);
+#if defined(__amd64__)
+	extern void cpu_recalibrate_tsc(struct timecounter *);
+	cpu_recalibrate_tsc(&acpi_timecounter);
+#endif
 }
 
 u_int
 acpi_get_timecount(struct timecounter *tc)