On Sun, Jul 29, 2012 at 02:58:36PM +1000, Bruce Evans wrote:
On Sat, 28 Jul 2012, Konstantin Belousov wrote:
...
Handling of usermode will follow later.
I hesitate to mention that this doesn't pessimize all uses of rdtsc:
- x86/isa/clock.c uses raw rdtsc32() for DELAY()
There, fence is not needed because we do not compare counters on
different cores. Note that delay_tc() explicitely performs pin when
tsc is going to be used.
- x86/x86/mca.c uses raw rdtsc() for mca_check_status()
mca use of rdtsc() seems to tbe purely informational.
diff --git a/sys/x86/x86/tsc.c b/sys/x86/x86/tsc.c
index c253a96..101cbb3 100644
--- a/sys/x86/x86/tsc.c
+++ b/sys/x86/x86/tsc.c
...
@@ -328,15 +344,26 @@ init_TSC(void)
#ifdef SMP
-/* rmb is required here because rdtsc is not a serializing instruction. */
+/*
+ * RDTSC is not a serializing instruction, so we need to drain
+ * instruction stream before executing it. It could be fixed by use of
+ * RDTSCP, except the instruction is not available everywhere.
Sentence breaks are 2 spaces in KNF. There was a recent thread about this.
Changed.
RDTSC apparently doesn't need full serialization (else fences wouldn't
be enough, and the necessary serialization would be slower). The
necessary serialization is very complex and not fully described above,
but the above is not a good place to describe many details. It already
describes too much. It is obvious that we shouldn't use large code or
ifdefs to be portable in places where we don't care at all about
efficiency. Only delicate timing requirements would prevent us using
the simple CPUID in such code. For example, in code very much like
this, we might need to _not_ slow things down, since we might want
to see small differences between the TSCs on different CPUs.
This is argument to stop using do_cpuid(), since it causes relatively
large amount of unneeded memory writes. I inlined CPUID instead with
explicit register clobber list.
@@ -592,23 +628,55 @@ sysctl_machdep_tsc_freq(SYSCTL_HANDLER_ARGS)
SYSCTL_PROC(_machdep, OID_AUTO, tsc_freq, CTLTYPE_U64 | CTLFLAG_RW,
0, 0, sysctl_machdep_tsc_freq, QU, Time Stamp Counter frequency);
-static u_int
+static inline u_int
tsc_get_timecount(struct timecounter *tc __unused)
{
return (rdtsc32());
}
Adding `inline' gives up to 5 (!) style bugs at different levels:
- `inline' is usually spelled `__inline'. The former is now standard,
but sys/cdefs.h never caught up with this and has better magic for
the latter. And `inline' doesn't match most existing spellings.
There is no reasons to use __inline in the kernel code anymore. We are in
C99 mode always when compiling kernel.
- the function is still forward-declared without `inline'. The differenc
is confusing.
- forward declaration of inline functions is nonsense. It sort of asks
for early uses to be non-inline and late uses to be inline. But
gcc now doesn't care much about the order, since -funit-at-a-time is
the default.
- this function is never actually inline. You added calls to the others
but not this.
- if this function were inline, then this might happen automatically due
to -finline-functions-called-once. See below.
-static u_int
+static inline u_int
tsc_get_timecount_low(struct timecounter *tc)
{
uint32_t rv;
__asm __volatile(rdtsc; shrd %%cl, %%edx, %0
-: =a (rv) : c ((int)(intptr_t)tc-tc_priv) : edx);
+: =a (rv) : c ((int)(intptr_t)tc-tc_priv) : edx);
return (rv);
}
...
Added inline to declaration of tsc_get_timecounter_low. And removed
'inline' from other functions. Compiler still generates exactly '*fence;
rdtsc' sequences. In truth, for _low variabnts, the load of tc_priv is
done between fence and rdtsc, but I kept it as is for now.
diff --git a/sys/amd64/include/cpufunc.h b/sys/amd64/include/cpufunc.h
index 94d4133..881fcd2 100644
--- a/sys/amd64/include/cpufunc.h
+++ b/sys/amd64/include/cpufunc.h
@@ -290,6 +290,13 @@ popcntq(u_long mask)
}
static __inline void
+lfence(void)
+{
+
+ __asm __volatile(lfence : : : memory);
+}
+
+static __inline void
mfence(void)
{
diff --git a/sys/i386/include/cpufunc.h b/sys/i386/include/cpufunc.h
index 62d268d..7cd3663 100644
--- a/sys/i386/include/cpufunc.h
+++ b/sys/i386/include/cpufunc.h
@@ -155,6 +155,13 @@ cpu_mwait(u_long extensions, u_int hints)
}
static __inline void
+lfence(void)
+{
+
+ __asm __volatile(lfence : : : memory);
+}
+
+static __inline void
mfence(void)
{
diff --git a/sys/x86/x86/tsc.c b/sys/x86/x86/tsc.c
index c253a96..3d8bd30 100644
--- a/sys/x86/x86/tsc.c
+++ b/sys/x86/x86/tsc.c
@@ -82,7 +82,11 @@ static void tsc_freq_changed(void *arg, const struct
cf_level *level,
static void tsc_freq_changing(void *arg, const struct cf_level *level,
int *status);
static unsigned tsc_get_timecount(struct timecounter *tc);
-static unsigned tsc_get_timecount_low(struct timecounter *tc);
+static inline unsigned