On Sun, May 19, 2019 at 12:47:11AM -0500, Scott Cheloha wrote:
> This code is really fidgety and I think we can do better.
> 
> If we use a 64-bit value here for microseconds the compiler arranges
> for all the math to be done with 64-bit quantites.  I'm pretty sure
> this is standard C numeric type upconversion.  As noted in the comment
> for the assembly, use of the 64-bit intermediate avoids overflow for
> most plausible inputs.
> 
> So with one line of code we get the same result as we do with the
> __GNUC__ assembly and the nigh incomprehensible default computation.
> 
> We could add error checking, but it wasn't here before, and you won't
> overflow an int of 8254 ticks until you try to delay for more than
> 1799795545 microseconds.  Maybe a KASSERT...
> 
> My toy program suggests that this simpler code is actually faster when
> compiled with both base gcc and clang.  But that's just a userspace
> benchmark, not measurement during practical execution in the kernel.
> Is the setup for delay(9) considered a hot path?
> 
> Near-identical simplification could be done in arch/i386/isa/clock.c.
> I don't have an i386 though so I don't know whether this code might be
> faster on such hardware.  It'd definitely be smaller though.
> 
> thoughts?
> 

What bug or problem is this fixing?

-ml

> Index: arch/amd64/isa/clock.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/amd64/isa/clock.c,v
> retrieving revision 1.28
> diff -u -p -r1.28 clock.c
> --- arch/amd64/isa/clock.c    27 Jul 2018 21:11:31 -0000      1.28
> +++ arch/amd64/isa/clock.c    19 May 2019 05:31:04 -0000
> @@ -226,6 +226,7 @@ gettick(void)
>  void
>  i8254_delay(int n)
>  {
> +     int64_t microseconds;
>       int limit, tick, otick;
>       static const int delaytab[26] = {
>                0,  2,  3,  4,  5,  6,  7,  9, 10, 11,
> @@ -242,31 +243,9 @@ i8254_delay(int n)
>       if (n <= 25)
>               n = delaytab[n];
>       else {
> -#ifdef __GNUC__
> -             /*
> -              * Calculate ((n * TIMER_FREQ) / 1e6) using explicit assembler
> -              * code so we can take advantage of the intermediate 64-bit
> -              * quantity to prevent loss of significance.
> -              */
> -             int m;
> -             __asm volatile("mul %3"
> -                              : "=a" (n), "=d" (m)
> -                              : "0" (n), "r" (TIMER_FREQ));
> -             __asm volatile("div %4"
> -                              : "=a" (n), "=d" (m)
> -                              : "0" (n), "1" (m), "r" (1000000));
> -#else
> -             /*
> -              * Calculate ((n * TIMER_FREQ) / 1e6) without using floating
> -              * point and without any avoidable overflows.
> -              */
> -             int sec = n / 1000000,
> -                 usec = n % 1000000;
> -             n = sec * TIMER_FREQ +
> -                 usec * (TIMER_FREQ / 1000000) +
> -                 usec * ((TIMER_FREQ % 1000000) / 1000) / 1000 +
> -                 usec * (TIMER_FREQ % 1000) / 1000000;
> -#endif
> +             /* Using a 64-bit intermediate here avoids most overflow. */
> +             microseconds = n;
> +             n = microseconds * TIMER_FREQ / 1000000;
>       }
>  
>       limit = TIMER_FREQ / hz;
> 

Reply via email to