On Sat, May 18, 2019 at 10:57:32PM -0700, Mike Larkin wrote:
> On Sun, May 19, 2019 at 12:47:11AM -0500, Scott Cheloha wrote:
> > This code is really fidgety and I think we can do better.
> > 
> > If we use a 64-bit value here for microseconds, the compiler arranges
> > for all the math to be done with 64-bit quantities.  This is just the
> > standard C arithmetic conversions at work.  As noted in the comment
> > for the assembly, use of the 64-bit intermediate avoids overflow for
> > most plausible inputs.
> > 
> > So with one line of code we get the same result as we do with the
> > __GNUC__ assembly and the nigh incomprehensible default computation.
> > 
> > We could add error checking, but it wasn't here before, and you won't
> > overflow an int's worth of i8254 ticks until you try to delay for more
> > than 1799795545 microseconds, i.e. about 30 minutes.  Maybe a KASSERT...
> > 
> > My toy program suggests the simpler code is actually faster when
> > compiled with both base gcc and clang.  But that's just a userspace
> > benchmark, not a measurement taken during real execution in the
> > kernel.  Is the setup for delay(9) considered a hot path?
> > 
> > A near-identical simplification could be done in arch/i386/isa/clock.c.
> > I don't have an i386, though, so I can't say whether the change would
> > be faster on that hardware.  It would definitely be smaller.
> > 
> > thoughts?
> 
> What bug or problem is this fixing?

The code is more complicated than it needs to be.

There are two different ways to convert microseconds to i8254 ticks
in this function.  One is inline assembly; the other is non-obvious
integer arithmetic that avoids overflowing an int by accumulating
partial sums.

You can replace both of them with legible, plain C that does exactly
what it looks like it is doing by using a larger intermediate type.
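
Concretely, the whole conversion collapses to something like this
(a sketch, not the literal diff; "n" stands in for the microsecond
argument and TIMER_FREQ for the i8254 input clock, 1193182 Hz):

	/*
	 * Sketch: one 64-bit multiply replaces both the assembly
	 * and the partial-sum arithmetic.  The cast forces the
	 * whole expression into 64 bits, so the product cannot
	 * overflow for any plausible delay.
	 */
	uint64_t ticks = (uint64_t)n * TIMER_FREQ / 1000000;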

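If we wanted the guard I mentioned above, it might look something
like this (again just a sketch; per the arithmetic above, 1799795545
is the largest microsecond count whose i8254 tick conversion still
fits in an int, assuming TIMER_FREQ is 1193182):

	/* Largest delay whose tick count still fits in an int. */
	KASSERT(n <= 1799795545);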