On Fri, Aug 02, 2019 at 01:29:37PM +0300, Paul Irofti wrote:
> On Mon, Jul 01, 2019 at 10:32:51AM +0200, Mark Kettenis wrote:
> > > Date: Thu, 27 Jun 2019 15:08:00 +0300
> > > From: Paul Irofti <p...@irofti.net>
> > > 
> > > Hi,
> > > 
> > > Here is an initial diff, adapted from NetBSD, that synchronizes TSC
> > > clocks across cores.
> > > 
> > > CPU0 is the reference clock and all others are skewed against it. During
> > > CPU initialization the clocks are synchronized by recording each CPU's
> > > clock skew and adapting the TSC read routine accordingly.
> > > 
> > > I chose this implementation over what FreeBSD is doing (which is just
> > > copying Linux really), because it is clean and elegant.
> > > 
> > > I would love to hear reports from machines that were affected by this
> > > problem. Mine, which never exhibited the problem in the first place, run
> > > just fine with the following diff. In fact I am writing this message on
> > > one such machine.
> > > 
> > > Constructive comments are more than welcome, too!
> > > 
> > > Notes:
> > > 
> > > - cpu_counter_serializing() could probably have a better name
> > >   (tsc_read, for example)
> > > - the PAUSE instruction is probably not needed
> > > - acpi(4) suspend and resume bits are left out on purpose, but should
> > >   be trivial to add once the current diff settles
> > > 
> > > Paul Irofti
> > 
> > I don't think we want to introduce a <machine/tsc.h> header file.
> > 
> > The code suffers from some NetBSD-isms, so that'll need to be fixed.
> > I pointed some of them out below.
> > 
> > Also, how accurate is your skew detection?  What skew is detected on a
> > machine that (supposedly) has the TSCs in sync?  The result will be
> > that you actually slightly desync the counters on different CPUs.
> > 
> > I think Linux uses the TSC_ADJUST MSR and compares its value across
> > cores.  If the skew is small and the TSC_ADJUST values are the same
> > across cores it skips the TSC adjustments.
> 
> Hi,
> 
> Here is an updated diff with a few bugs eliminated from the previous one
> and with most of the concerns raised in private and by Mark fixed.
> 
> I will do the TSC_ADJUST MSR dance in another iteration if the current
> incarnation turns out to be correct for machines suffering from TSCs not
> in sync.
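> 
> For reference, the check I have in mind is roughly the following
> sketch (untested; the MSR_TSC_ADJUST define and the ci_tsc_adjust
> field are illustrations only, neither exists in this diff):
> 
>  /* IA32_TSC_ADJUST, MSR 0x3b, advertised in CPUID.(EAX=7,ECX=0):EBX[1]. */
>  #define MSR_TSC_ADJUST	0x03b
> 
>  /* Each core would sample ci->ci_tsc_adjust = rdmsr(MSR_TSC_ADJUST) early. */
>  int
>  tsc_adjust_consistent(void)
>  {
>  	CPU_INFO_ITERATOR cii;
>  	struct cpu_info *ci;
>  	uint64_t val = 0;
>  	int first = 1;
> 
>  	CPU_INFO_FOREACH(cii, ci) {
>  		if (first) {
>  			val = ci->ci_tsc_adjust;
>  			first = 0;
>  		} else if (ci->ci_tsc_adjust != val)
>  			return 0;	/* firmware skewed at least one core */
>  	}
>  	return 1;	/* same adjustment everywhere, skip skew correction */
>  }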
> 
> The thing I am most worried about now is the following sum:
> 
>  uint
>  tsc_get_timecount(struct timecounter *tc)
>  {
>       return rdtsc() + curcpu()->cpu_cc_skew;
>  }
>  
> Can one term be executed on one CPU and the other on another? Is there a
> way to protect this from happening other than locking?
> 
> I see NetBSD is checking for a change in the number of context switches 
> of the current process.
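> 
> Something like the following sketch could do the same here (untested;
> the rusage counters are my guess at an OpenBSD equivalent of what
> NetBSD watches, so treat the field names as assumptions):
> 
>  uint
>  tsc_get_timecount(struct timecounter *tc)
>  {
>  	struct rusage *ru = &curproc->p_stats->p_ru;
>  	uint64_t tsc;
>  	long ncsw;
> 
>  	do {
>  		/* Snapshot the context-switch count before reading. */
>  		ncsw = ru->ru_nvcsw + ru->ru_nivcsw;
>  		tsc = rdtsc() + curcpu()->cpu_cc_skew;
>  		/*
>  		 * If the count changed we may have migrated and the two
>  		 * terms may come from different CPUs, so retry.
>  		 */
>  	} while (ncsw != ru->ru_nvcsw + ru->ru_nivcsw);
> 
>  	return tsc;
>  }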
> 
> My plan is to have a fix in the tree before 6.6 is released, so I would
> love to hear your thoughts and reports on this.
> 
> Thanks,
> Paul

Hi,

Here is a third version of the TSC diff that also takes into
consideration the suspend-resume path, which was ignored by the previous
version, thus leaving resume broken.

Have a go at it. Reports are welcome. So far I have only got ONE report
from a machine with a broken TSC :(

Paul


Index: arch/amd64/amd64/acpi_machdep.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/acpi_machdep.c,v
retrieving revision 1.86
diff -u -p -u -p -r1.86 acpi_machdep.c
--- arch/amd64/amd64/acpi_machdep.c     23 Oct 2018 17:51:32 -0000      1.86
+++ arch/amd64/amd64/acpi_machdep.c     5 Aug 2019 13:54:33 -0000
@@ -60,6 +60,8 @@ extern paddr_t tramp_pdirpa;
 
 extern int acpi_savecpu(void) __returns_twice;
 
+extern int64_t tsc_drift_observed;
+
 #define ACPI_BIOS_RSDP_WINDOW_BASE        0xe0000
 #define ACPI_BIOS_RSDP_WINDOW_SIZE        0x20000
 
@@ -481,6 +483,8 @@ acpi_resume_cpu(struct acpi_softc *sc)
 {
        fpuinit(&cpu_info_primary);
 
+       cpu_info_primary.cpu_cc_skew = 0;       /* futile */
+       tsc_drift_observed = 0;                 /* reset tsc drift on resume */
        cpu_init(&cpu_info_primary);
        cpu_ucode_apply(&cpu_info_primary);
 
Index: arch/amd64/amd64/cpu.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/cpu.c,v
retrieving revision 1.137
diff -u -p -u -p -r1.137 cpu.c
--- arch/amd64/amd64/cpu.c      28 May 2019 18:17:01 -0000      1.137
+++ arch/amd64/amd64/cpu.c      5 Aug 2019 13:54:34 -0000
@@ -754,6 +754,10 @@ cpu_init(struct cpu_info *ci)
        cr4 = rcr4();
        lcr4(cr4 & ~CR4_PGE);
        lcr4(cr4);
+
+       /* Synchronize TSC */
+       if (cold && !CPU_IS_PRIMARY(ci))
+             tsc_sync_ap(ci);
 #endif
 }
 
@@ -808,6 +812,7 @@ void
 cpu_start_secondary(struct cpu_info *ci)
 {
        int i;
+       u_long s;
 
        ci->ci_flags |= CPUF_AP;
 
@@ -828,6 +833,17 @@ cpu_start_secondary(struct cpu_info *ci)
                printf("dropping into debugger; continue from here to resume 
boot\n");
                db_enter();
 #endif
+       } else {
+               /*
+                * Synchronize time stamp counters. Invalidate cache and do
+                * twice (in tsc_sync_bp) to minimize possible cache effects.
+                * Disable interrupts to try and rule out any external
+                * interference.
+                */
+               s = intr_disable();
+               wbinvd();
+               tsc_sync_bp(ci);
+               intr_restore(s);
        }
 
        if ((ci->ci_flags & CPUF_IDENTIFIED) == 0) {
@@ -852,6 +868,8 @@ void
 cpu_boot_secondary(struct cpu_info *ci)
 {
        int i;
+       int64_t drift;
+       u_long s;
 
        atomic_setbits_int(&ci->ci_flags, CPUF_GO);
 
@@ -864,6 +882,17 @@ cpu_boot_secondary(struct cpu_info *ci)
                printf("dropping into debugger; continue from here to resume 
boot\n");
                db_enter();
 #endif
+       } else if (cold) {
+               /* Synchronize TSC again, check for drift. */
+               drift = ci->cpu_cc_skew;
+               s = intr_disable();
+               wbinvd();
+               tsc_sync_bp(ci);
+               intr_restore(s);
+               drift -= ci->cpu_cc_skew;
+               printf("TSC skew=%lld drift=%lld\n",
+                   (long long)ci->cpu_cc_skew, (long long)drift);
+               tsc_sync_drift(drift);
        }
 }
 
@@ -888,7 +917,14 @@ cpu_hatch(void *v)
                panic("%s: already running!?", ci->ci_dev->dv_xname);
 #endif
 
+       /*
+        * Synchronize the TSC for the first time. Note that interrupts are
+        * off at this point.
+        */
+       wbinvd();
        ci->ci_flags |= CPUF_PRESENT;
+       ci->cpu_cc_skew = 0;    /* reset on resume */
+       tsc_sync_ap(ci);
 
        lapic_enable();
        lapic_startclock();
Index: arch/amd64/amd64/tsc.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
retrieving revision 1.11
diff -u -p -u -p -r1.11 tsc.c
--- arch/amd64/amd64/tsc.c      6 Jun 2019 19:43:35 -0000       1.11
+++ arch/amd64/amd64/tsc.c      5 Aug 2019 13:54:34 -0000
@@ -1,8 +1,10 @@
 /*     $OpenBSD: tsc.c,v 1.11 2019/06/06 19:43:35 kettenis Exp $       */
 /*
+ * Copyright (c) 2008 The NetBSD Foundation, Inc.
  * Copyright (c) 2016,2017 Reyk Floeter <r...@openbsd.org>
  * Copyright (c) 2017 Adam Steen <a...@adamsteen.com.au>
  * Copyright (c) 2017 Mike Belopuhov <m...@openbsd.org>
+ * Copyright (c) 2019 Paul Irofti <piro...@openbsd.org>
  *
  * Permission to use, copy, modify, and distribute this software for any
  * purpose with or without fee is hereby granted, provided that the above
@@ -20,6 +22,7 @@
 #include <sys/param.h>
 #include <sys/systm.h>
 #include <sys/timetc.h>
+#include <sys/atomic.h>
 
 #include <machine/cpu.h>
 #include <machine/cpufunc.h>
@@ -33,6 +36,13 @@ int          tsc_recalibrate;
 uint64_t       tsc_frequency;
 int            tsc_is_invariant;
 
+int64_t        tsc_drift_max = 250;    /* max cycles */
+int64_t        tsc_drift_observed;
+bool   tsc_good;
+
+volatile int64_t       tsc_sync_val;
+volatile struct cpu_info       *tsc_sync_cpu;
+
 uint           tsc_get_timecount(struct timecounter *tc);
 
 struct timecounter tsc_timecounter = {
@@ -172,10 +182,8 @@ calibrate_tsc_freq(void)
                return;
        tsc_frequency = freq;
        tsc_timecounter.tc_frequency = freq;
-#ifndef MULTIPROCESSOR
        if (tsc_is_invariant)
                tsc_timecounter.tc_quality = 2000;
-#endif
 }
 
 void
@@ -194,26 +202,25 @@ cpu_recalibrate_tsc(struct timecounter *
 uint
 tsc_get_timecount(struct timecounter *tc)
 {
-       return rdtsc();
+       return rdtsc() + curcpu()->cpu_cc_skew;
 }
 
 void
 tsc_timecounter_init(struct cpu_info *ci, uint64_t cpufreq)
 {
-       if (!(ci->ci_flags & CPUF_PRIMARY) ||
-           !(ci->ci_flags & CPUF_CONST_TSC) ||
+       if (!(ci->ci_flags & CPUF_CONST_TSC) ||
            !(ci->ci_flags & CPUF_INVAR_TSC))
                return;
 
        tsc_frequency = tsc_freq_cpuid(ci);
        tsc_is_invariant = 1;
 
+       tsc_good = (rdmsr(MSR_TSC) != 0 || rdmsr(MSR_TSC) != 0);
+
        /* Newer CPUs don't require recalibration */
        if (tsc_frequency > 0) {
                tsc_timecounter.tc_frequency = tsc_frequency;
-#ifndef MULTIPROCESSOR
                tsc_timecounter.tc_quality = 2000;
-#endif
        } else {
                tsc_recalibrate = 1;
                tsc_frequency = cpufreq;
@@ -221,5 +228,112 @@ tsc_timecounter_init(struct cpu_info *ci
                calibrate_tsc_freq();
        }
 
-       tc_init(&tsc_timecounter);
+       if (tsc_drift_observed > tsc_drift_max) {
+               printf("ERROR: %lld cycle TSC drift observed\n",
+                   (long long)tsc_drift_observed);
+               tsc_timecounter.tc_quality = -100;
+               tsc_is_invariant = 0;
+       }
+
+       printf("%s: TSC skew=%lld observed drift=%lld\n", __func__,
+           (long long)ci->cpu_cc_skew, (long long)tsc_drift_observed);
+
+       if (ci->ci_flags & CPUF_PRIMARY)
+               tc_init(&tsc_timecounter);
+}
+
+static uint64_t
+cpu_counter_serializing(struct cpu_info *ci)
+{
+       if (tsc_good)
+               return rdmsr(MSR_TSC);
+       else
+               return (rdtsc() + ci->cpu_cc_skew);
+}
+
+/*
+ * Record drift (in clock cycles).  Called during AP startup.
+ */
+void
+tsc_sync_drift(int64_t drift)
+{
+       if (drift < 0)
+               drift = -drift;
+       if (drift > tsc_drift_observed)
+               tsc_drift_observed = drift;
+}
+
+/*
+ * Called during startup of APs, by the boot processor.  Interrupts
+ * are disabled on entry.
+ */
+static void
+tsc_read_bp(struct cpu_info *ci, uint64_t *bptscp, uint64_t *aptscp)
+{
+       uint64_t bptsc;
+
+       if (atomic_swap_ptr(&tsc_sync_cpu, ci) != NULL)
+               panic("tsc_sync_bp: 1");
+
+       /* Flag it and read our TSC. */
+       atomic_setbits_int(&ci->ci_flags, CPUF_SYNCTSC);
+       bptsc = cpu_counter_serializing(ci) >> 1;
+
+       /* Wait for remote to complete, and read ours again. */
+       while ((ci->ci_flags & CPUF_SYNCTSC) != 0)
+               membar_consumer();
+       bptsc += (cpu_counter_serializing(ci) >> 1);
+
+       /* Wait for the results to come in. */
+       while (tsc_sync_cpu == ci)
+               CPU_BUSY_CYCLE();
+       if (tsc_sync_cpu != NULL)
+               panic("tsc_sync_bp: 2");
+
+       *bptscp = bptsc;
+       *aptscp = tsc_sync_val;
+}
+
+void
+tsc_sync_bp(struct cpu_info *ci)
+{
+       uint64_t bptsc, aptsc;
+
+       tsc_read_bp(ci, &bptsc, &aptsc); /* discarded - cache effects */
+       tsc_read_bp(ci, &bptsc, &aptsc);
+
+       /* Compute final value to adjust for skew. */
+       ci->cpu_cc_skew = bptsc - aptsc;
+}
+
+/*
+ * Called during startup of AP, by the AP itself.  Interrupts are
+ * disabled on entry.
+ */
+static void
+tsc_post_ap(struct cpu_info *ci)
+{
+       uint64_t tsc;
+
+       /* Wait for go-ahead from primary. */
+       while ((ci->ci_flags & CPUF_SYNCTSC) == 0)
+               membar_consumer();
+       tsc = (cpu_counter_serializing(ci) >> 1);
+
+       /* Instruct primary to read its counter. */
+       atomic_clearbits_int(&ci->ci_flags, CPUF_SYNCTSC);
+       tsc += (cpu_counter_serializing(ci) >> 1);
+
+       /* Post result.  Ensure the whole value goes out atomically. */
+       (void)atomic_swap_64(&tsc_sync_val, tsc);
+
+       if (atomic_swap_ptr(&tsc_sync_cpu, NULL) != ci)
+               panic("tsc_sync_ap");
+}
+
+void
+tsc_sync_ap(struct cpu_info *ci)
+{
+       tsc_post_ap(ci);
+       tsc_post_ap(ci);
 }
Index: arch/amd64/include/cpu.h
===================================================================
RCS file: /cvs/src/sys/arch/amd64/include/cpu.h,v
retrieving revision 1.131
diff -u -p -u -p -r1.131 cpu.h
--- arch/amd64/include/cpu.h    17 May 2019 19:07:16 -0000      1.131
+++ arch/amd64/include/cpu.h    5 Aug 2019 13:54:34 -0000
@@ -206,6 +206,8 @@ struct cpu_info {
        union           vmm_cpu_cap ci_vmm_cap;
        paddr_t         ci_vmxon_region_pa;
        struct vmxon_region *ci_vmxon_region;
+
+       int64_t         cpu_cc_skew;            /* counter skew vs cpu0 */
 };
 
 #define CPUF_BSP       0x0001          /* CPU is the original BSP */
@@ -221,6 +223,7 @@ struct cpu_info {
 #define CPUF_INVAR_TSC 0x0100          /* CPU has invariant TSC */
 #define CPUF_USERXSTATE        0x0200          /* CPU has curproc's xsave state */
 
+#define        CPUF_SYNCTSC    0x0800          /* Synchronize TSC */
 #define CPUF_PRESENT   0x1000          /* CPU is present */
 #define CPUF_RUNNING   0x2000          /* CPU is running */
 #define CPUF_PAUSE     0x4000          /* CPU is paused in DDB */
Index: arch/amd64/include/cpuvar.h
===================================================================
RCS file: /cvs/src/sys/arch/amd64/include/cpuvar.h,v
retrieving revision 1.9
diff -u -p -u -p -r1.9 cpuvar.h
--- arch/amd64/include/cpuvar.h 6 Oct 2017 13:33:53 -0000       1.9
+++ arch/amd64/include/cpuvar.h 5 Aug 2019 13:54:34 -0000
@@ -98,4 +98,8 @@ void cpu_init(struct cpu_info *);
 void cpu_init_first(void);
 void cpu_adjust_tsc_freq(uint64_t (*)());
 
+void tsc_sync_drift(int64_t);
+void tsc_sync_bp(struct cpu_info *);
+void tsc_sync_ap(struct cpu_info *);
+
 #endif
