On Sun, May 19, 2019 at 12:47:11AM -0500, Scott Cheloha wrote:
> This code is really fidgety and I think we can do better.
>
> If we use a 64-bit value here for microseconds the compiler arranges
> for all the math to be done with 64-bit quantites. I'm pretty sure
> this is standard C numeric type upconversion. As noted in the comment
> for the assembly, use of the 64-bit intermediate avoids overflow for
> most plausible inputs.
>
> So with one line of code we get the same result as we do with the
> __GNUC__ assembly and the nigh incomprehensible default computation.
>
> We could add error checking, but it wasn't here before, and you won't
> overflow an int of 8254 ticks until you try to delay for more than
> 1799795545 microseconds. Maybe a KASSERT...
>
> My toy program suggests that this simpler code is actually faster when
> compiled with both base gcc and clang. But that's just a userspace
> benchmark, not measurement during practical execution in the kernel.
> Is the setup for delay(9) considered a hot path?
>
> Near-identical simplification could be done in arch/i386/isa/clock.c.
> I don't have an i386 though so I don't know whether this code might be
> faster on such hardware. It'd definitely be smaller though.
>
> thoughts?
>
What bug or problem is this fixing?
-ml
> Index: arch/amd64/isa/clock.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/amd64/isa/clock.c,v
> retrieving revision 1.28
> diff -u -p -r1.28 clock.c
> --- arch/amd64/isa/clock.c 27 Jul 2018 21:11:31 -0000 1.28
> +++ arch/amd64/isa/clock.c 19 May 2019 05:31:04 -0000
> @@ -226,6 +226,7 @@ gettick(void)
> void
> i8254_delay(int n)
> {
> + int64_t microseconds;
> int limit, tick, otick;
> static const int delaytab[26] = {
> 0, 2, 3, 4, 5, 6, 7, 9, 10, 11,
> @@ -242,31 +243,9 @@ i8254_delay(int n)
> if (n <= 25)
> n = delaytab[n];
> else {
> -#ifdef __GNUC__
> - /*
> - * Calculate ((n * TIMER_FREQ) / 1e6) using explicit assembler
> - * code so we can take advantage of the intermediate 64-bit
> - * quantity to prevent loss of significance.
> - */
> - int m;
> - __asm volatile("mul %3"
> - : "=a" (n), "=d" (m)
> - : "0" (n), "r" (TIMER_FREQ));
> - __asm volatile("div %4"
> - : "=a" (n), "=d" (m)
> - : "0" (n), "1" (m), "r" (1000000));
> -#else
> - /*
> - * Calculate ((n * TIMER_FREQ) / 1e6) without using floating
> - * point and without any avoidable overflows.
> - */
> - int sec = n / 1000000,
> - usec = n % 1000000;
> - n = sec * TIMER_FREQ +
> - usec * (TIMER_FREQ / 1000000) +
> - usec * ((TIMER_FREQ % 1000000) / 1000) / 1000 +
> - usec * (TIMER_FREQ % 1000) / 1000000;
> -#endif
> + /* Using a 64-bit intermediate here avoids most overflow. */
> + microseconds = n;
> + n = microseconds * TIMER_FREQ / 1000000;
> }
>
> limit = TIMER_FREQ / hz;
>