t...@gmplib.org (Torbjörn Granlund) writes:
> ni...@lysator.liu.se (Niels Möller) writes:
> we might also try doing addmul_2 using toom32, which
> would save 1/3 of the mul instructions. Toom32 is nice because we can
> use the four easiest evaluation points: 0, infinity, and +/-1.
>
>
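To make the 1/3 saving concrete, here is a minimal sketch of the 3x2 case on plain integer coefficients. It is an illustration only, not GMP code: the names mul32_schoolbook and mul32_toom are hypothetical, and the coefficients are small int64_t values rather than limbs, so carry propagation and the sign handling of the +/-1 evaluations that a real addmul_2 would need are left out. With a(x) = a0 + a1 x + a2 x^2 and b(x) = b0 + b1 x, the schoolbook product needs 6 multiplications, while evaluating at 0, infinity, +1 and -1 needs 4.

```c
#include <assert.h>
#include <stdint.h>

/* Schoolbook reference: 3 x 2 coefficients, 6 multiplications. */
static void mul32_schoolbook(const int64_t a[3], const int64_t b[2],
                             int64_t c[4])
{
    c[0] = a[0] * b[0];
    c[1] = a[0] * b[1] + a[1] * b[0];
    c[2] = a[1] * b[1] + a[2] * b[0];
    c[3] = a[2] * b[1];
}

/* Toom-3/2 with evaluation points 0, infinity, +1, -1: 4 multiplications.
   Assumes the inputs are small enough that nothing overflows int64_t. */
static void mul32_toom(const int64_t a[3], const int64_t b[2],
                       int64_t c[4])
{
    int64_t v0   = a[0] * b[0];                          /* c(0)   = c0 */
    int64_t vinf = a[2] * b[1];                          /* c(inf) = c3 */
    int64_t v1   = (a[0] + a[1] + a[2]) * (b[0] + b[1]); /* c(1)        */
    int64_t vm1  = (a[0] - a[1] + a[2]) * (b[0] - b[1]); /* c(-1)       */

    /* v1 + vm1 = 2*(c0 + c2) and v1 - vm1 = 2*(c1 + c3), so both are
       even and the halvings below are exact. */
    assert((v1 + vm1) % 2 == 0 && (v1 - vm1) % 2 == 0);

    c[0] = v0;
    c[3] = vinf;
    c[2] = (v1 + vm1) / 2 - v0;
    c[1] = (v1 - vm1) / 2 - vinf;
}
```

For a = {3, -5, 7} and b = {11, 13}, both functions give the coefficients {33, -16, 12, 91}. The interpolation costs only additions, subtractions and two exact halvings, which is why these four points are the easiest ones to use.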
t...@gmplib.org (Torbjörn Granlund) writes:
> Our latest batch of x86-32 code dates from 2011 (for the original Intel
> atom) but we have not done anything for high-end AMD and Intel CPUs
> (e.g., AMD k10, bulldozer, piledriver, steamroller, excavator, zen, or
> Intel penryn, nehalem,
The new measurement reporting pages have highlighted many improvement
opportunities, and as you might have seen I've lately fixed a handful of
the _basecase functions for x86-64.
An aspect not directly covered by the new measurement reporting is that
the 32-bit and 64-bit performance-per-limb is