On Thu, Jul 16, 2020 at 02:14:45PM +0200, Christian Weisgerber wrote:
> Scott Cheloha:
>
> > Can we add the missing LFENCE instructions to userspace and the
> > kernel? And can we excise the upper 32 bits?
>
> > + uint32_t lo;
> > +
> > + asm volatile("lfence");
> > + asm volatile("rdtsc" : "=a"(lo));
>
> That's wrong. rtdsc will clobber %rdx, whether you use that value
> or not. You need a corresponding constraint:
>
> asm volatile("rdtsc" : "=a"(lo) : : "rdx");
Whoops. Thanks! Updated below.
On Thu, Jul 16, 2020 at 02:26:56PM +0200, Mark Kettenis wrote:
> > Date: Wed, 15 Jul 2020 20:36:04 -0500
> > From: Scott Cheloha <[email protected]>
> >
> > [...]
> >
> > My understanding is that you need an "LFENCE sandwich" to prevent the
> > RDTSC from being reordered in the pipeline.
>
> Hmm, it is my understanding that only the lfence before is needed,
> that is why rdtsc_lfence() looks the way it looks. If that isn't the
> case, that function should probably be adapted.
My reasoning is based on the ISA reference:
https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.html
In particular (p. 1197; Vol. 2B 4-545):
> The RDTSC instruction is not a serializing instruction.
>
> It does not necessarily wait until all previous instructions
> have been executed before reading the counter.
>
> Similarly, subsequent instructions may begin execution before the
> read operation is performed.
>
> If software requires RDTSC to be executed only after all previous
> instructions have completed locally, it can either use RDTSCP (if
> the processor supports that instruction) or execute the sequence
> LFENCE;RDTSC.
Note the third sentence.
Given that, I reason that a serializing instruction before *and* after
the RDTSC should freeze it in place.
Whether or not there are diminishing returns on the 2nd LFENCE is
beyond what I can comment on. However, if you want a guarantee that
it won't move around at all I think you need both LFENCEs.
Updated patch adds the second LFENCE to rdtsc_lfence() in cpufunc.h.
Index: lib/libc/arch/amd64/gen/usertc.c
===================================================================
RCS file: /cvs/src/lib/libc/arch/amd64/gen/usertc.c,v
retrieving revision 1.2
diff -u -p -r1.2 usertc.c
--- lib/libc/arch/amd64/gen/usertc.c 8 Jul 2020 09:17:48 -0000 1.2
+++ lib/libc/arch/amd64/gen/usertc.c 16 Jul 2020 17:21:28 -0000
@@ -21,9 +21,13 @@
static inline u_int
rdtsc(void)
{
- uint32_t hi, lo;
- asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
- return ((uint64_t)lo)|(((uint64_t)hi)<<32);
+ uint32_t lo;
+
+ asm volatile("lfence");
+ asm volatile("rdtsc" : "=a"(lo) : : "rdx");
+ asm volatile("lfence");
+
+ return lo;
}
static int
Index: sys/arch/amd64/include/cpufunc.h
===================================================================
RCS file: /cvs/src/sys/arch/amd64/include/cpufunc.h,v
retrieving revision 1.35
diff -u -p -r1.35 cpufunc.h
--- sys/arch/amd64/include/cpufunc.h 3 Jul 2020 17:54:27 -0000 1.35
+++ sys/arch/amd64/include/cpufunc.h 16 Jul 2020 17:21:28 -0000
@@ -296,7 +296,9 @@ rdtsc_lfence(void)
{
uint32_t hi, lo;
- __asm volatile("lfence; rdtsc" : "=d" (hi), "=a" (lo));
+ __asm volatile("lfence");
+ __asm volatile("rdtsc" : "=d" (hi), "=a" (lo));
+ __asm volatile("lfence");
return (((uint64_t)hi << 32) | (uint64_t) lo);
}
Index: sys/arch/amd64/amd64/tsc.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
retrieving revision 1.19
diff -u -p -r1.19 tsc.c
--- sys/arch/amd64/amd64/tsc.c 6 Jul 2020 13:33:06 -0000 1.19
+++ sys/arch/amd64/amd64/tsc.c 16 Jul 2020 17:21:28 -0000
@@ -211,7 +211,13 @@ cpu_recalibrate_tsc(struct timecounter *
u_int
tsc_get_timecount(struct timecounter *tc)
{
- return rdtsc() + curcpu()->ci_tsc_skew;
+ uint32_t lo;
+
+ asm volatile("lfence");
+ asm volatile("rdtsc" : "=a"(lo) : : "rdx");
+ asm volatile("lfence");
+
+ return lo + curcpu()->ci_tsc_skew;
}
void