amd64: i8254_delay(): simpler microsecond->ticks conversion

Scott Cheloha Sat, 18 May 2019 22:48:08 -0700

This code is really fidgety and I think we can do better.

If we use a 64-bit value here for microseconds the compiler arranges
for all the math to be done with 64-bit quantites.  I'm pretty sure
this is standard C numeric type upconversion.  As noted in the comment
for the assembly, use of the 64-bit intermediate avoids overflow for
most plausible inputs.


So with one line of code we get the same result as we do with the
__GNUC__ assembly and the nigh incomprehensible default computation.

We could add error checking, but it wasn't here before, and you won't
overflow an int of 8254 ticks until you try to delay for more than
1799795545 microseconds.  Maybe a KASSERT...

My toy program suggests that this simpler code is actually faster when
compiled with both base gcc and clang.  But that's just a userspace
benchmark, not measurement during practical execution in the kernel.
Is the setup for delay(9) considered a hot path?

Near-identical simplification could be done in arch/i386/isa/clock.c.
I don't have an i386 though so I don't know whether this code might be
faster on such hardware.  It'd definitely be smaller though.

thoughts?

Index: arch/amd64/isa/clock.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/isa/clock.c,v
retrieving revision 1.28
diff -u -p -r1.28 clock.c
--- arch/amd64/isa/clock.c      27 Jul 2018 21:11:31 -0000      1.28
+++ arch/amd64/isa/clock.c      19 May 2019 05:31:04 -0000
@@ -226,6 +226,7 @@ gettick(void)
 void
 i8254_delay(int n)
 {
+       int64_t microseconds;
        int limit, tick, otick;
        static const int delaytab[26] = {
                 0,  2,  3,  4,  5,  6,  7,  9, 10, 11,
@@ -242,31 +243,9 @@ i8254_delay(int n)
        if (n <= 25)
                n = delaytab[n];
        else {
-#ifdef __GNUC__
-               /*
-                * Calculate ((n * TIMER_FREQ) / 1e6) using explicit assembler
-                * code so we can take advantage of the intermediate 64-bit
-                * quantity to prevent loss of significance.
-                */
-               int m;
-               __asm volatile("mul %3"
-                                : "=a" (n), "=d" (m)
-                                : "0" (n), "r" (TIMER_FREQ));
-               __asm volatile("div %4"
-                                : "=a" (n), "=d" (m)
-                                : "0" (n), "1" (m), "r" (1000000));
-#else
-               /*
-                * Calculate ((n * TIMER_FREQ) / 1e6) without using floating
-                * point and without any avoidable overflows.
-                */
-               int sec = n / 1000000,
-                   usec = n % 1000000;
-               n = sec * TIMER_FREQ +
-                   usec * (TIMER_FREQ / 1000000) +
-                   usec * ((TIMER_FREQ % 1000000) / 1000) / 1000 +
-                   usec * (TIMER_FREQ % 1000) / 1000000;
-#endif
+               /* Using a 64-bit intermediate here avoids most overflow. */
+               microseconds = n;
+               n = microseconds * TIMER_FREQ / 1000000;
        }
 
        limit = TIMER_FREQ / hz;

amd64: i8254_delay(): simpler microsecond->ticks conversion

Reply via email to