On Mon, 25 Nov 2002, Luigi Rizzo wrote:
I just got hit by a peculiar problem related to out-of-order
execution of instructions.
I was doing some low-level timing measurements using the rdtsc()
around selected pieces of code (the rdtsc() is included in
the TSTMP() functions that are in RELENG_4, source is in
sys/i386/isa/clock.c), as follows:

    TSTMP(3, ifp->if_unit, 1, 0);
    tmp = CSR_READ_1(sc, FXP_CSR_SCB_STATACK);
    TSTMP(3, ifp->if_unit, 2, 0);
    TSTMP(3, ifp->if_unit, 3, 0);

CSR_READ_1() goes to do a volatile read on memory across a 33MHz
PCI bus, so it should take a very minimum of 100ns, plus arbitration
and bridge crossing and whatnot. To my surprise, on a 750MHz Athlon
box, the delta between the first two timestamps turned out to be
on the order of 39 clock cycles, whereas the delta between 2 and 3
is in the 270-300 cycle range.
The only explanation I can find is that the rdtsc() within TSTMP()
is executed out of order.
I wonder, is there any 'barrier' instruction on the high-end i386
processors that enforces in-order execution of some piece of code?

The Intel processor manual has an explicit example for this and recommends
you use cpuid as a serializing instruction before the call to rdtsc.
Basically, you call cpuid + rdtsc a bunch of times to calibrate their
average latency. Then do your run with cpuid + rdtsc to get the beginning
and end timestamps, take the difference between the two, and subtract the
latency you calculated above. This gives a good value for the cycles in
your routine.
Other factors, such as ACPI, can also affect rdtsc readings, so beware.
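
A minimal sketch of that procedure in C, assuming GCC or Clang inline
assembly on an x86 processor. The names serialized_rdtsc(),
calibrate_overhead(), measure_cycles() and busy_loop() are hypothetical;
they are not taken from the Intel manual or from TSTMP(). On non-x86
targets the sketch falls back to clock() only so that it still compiles.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read the time-stamp counter right after a serializing cpuid, so that
 * earlier instructions have completed before the counter is sampled. */
static uint64_t serialized_rdtsc(void)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t eax = 0, ebx, ecx, edx;
    /* cpuid (leaf 0) overwrites eax, ebx, ecx and edx; rdtsc then
     * returns the counter in edx:eax. */
    __asm__ __volatile__("cpuid\n\t"
                         "rdtsc"
                         : "+a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
                         :
                         : "memory");
    (void)ebx;
    (void)ecx;
    return ((uint64_t)edx << 32) | eax;
#else
    /* Assumption: not x86, so use a coarse stand-in clock. */
    return (uint64_t)clock();
#endif
}

/* Average cost, in counter ticks, of one back-to-back timestamp pair. */
static uint64_t calibrate_overhead(int samples)
{
    assert(samples > 0);
    uint64_t total = 0;
    for (int i = 0; i < samples; i++) {
        uint64_t start = serialized_rdtsc();
        uint64_t end = serialized_rdtsc();
        total += end - start;
    }
    return total / (uint64_t)samples;
}

/* Time fn(arg) and subtract the calibrated overhead, clamping at zero. */
static uint64_t measure_cycles(void (*fn)(void *), void *arg,
                               uint64_t overhead)
{
    uint64_t start = serialized_rdtsc();
    fn(arg);
    uint64_t end = serialized_rdtsc();
    uint64_t delta = end - start;
    return delta > overhead ? delta - overhead : 0;
}

/* Hypothetical workload: spin for *arg iterations. */
static void busy_loop(void *arg)
{
    volatile unsigned sink = 0;
    for (int i = 0; i < *(int *)arg; i++)
        sink += (unsigned)i;
}
```

A caller would run calibrate_overhead() once (say, with 1000 samples) and
pass the result as the overhead argument to measure_cycles(). Repeating
the measurement and looking at the spread helps spot interrupts and cache
effects.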
-Nate