I doubt we can make addmul_1 run faster on sandybridge.
But I'd like mul_basecase to run much faster than 3 c/l. Then
sqr_basecase, redc_1, and redc_2 should be fixed.
An addmul_2 running at 3 c/l or better would be great. That
means we need to handle a tick in it using <= 17 insns,
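For reference, here is a plain C model of what addmul_1 computes, {rp,n} += {up,n} * v with a carry-out limb, and of a schoolbook mul_basecase built from one addmul_1 pass per multiplier limb (an addmul_2 covers two multiplier limbs per pass over up). This is only a semantic sketch under stated assumptions: it uses hypothetical 32-bit limbs (`limb_t`) so the double-width product fits in `uint64_t`, the names `ref_addmul_1` and `ref_mul_basecase` are invented here, and it says nothing about the sandybridge assembly loops whose cycles-per-limb (c/l) figures are being discussed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit limb so that limb*limb fits in uint64_t.
   (Real 64-bit GMP limbs would need a 128-bit product.) */
typedef uint32_t limb_t;

/* Reference addmul_1: {rp,n} += {up,n} * v, returns the carry-out limb. */
static limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* Max value: (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1, so no overflow. */
        uint64_t t = (uint64_t)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)t;        /* low half stays in place */
        carry = t >> 32;          /* high half moves to the next limb */
    }
    return (limb_t)carry;
}

/* Schoolbook product {rp, un+vn} = {up,un} * {vp,vn}.
   rp must not overlap up or vp. */
static void ref_mul_basecase(limb_t *rp, const limb_t *up, size_t un,
                             const limb_t *vp, size_t vn)
{
    for (size_t i = 0; i < un + vn; i++)
        rp[i] = 0;
    /* Pass j adds up * vp[j] at offset j; rp[un+j] is still zero here,
       so the carry-out can simply be stored there. */
    for (size_t j = 0; j < vn; j++)
        rp[un + j] = ref_addmul_1(rp + j, up, un, vp[j]);
}
```

With these 32-bit limbs, squaring {0xFFFFFFFF, 0xFFFFFFFF} (the value 2^64 - 1) gives the limbs {1, 0, 0xFFFFFFFE, 0xFFFFFFFF}, that is 2^128 - 2^65 + 1.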
Hello,
is there any objection if I replace most uses of ->_mp_alloc with uses of
the ALLOC macro in mp[zqf] (and similarly for _mp_size, etc.)? It helps
when experimenting... I am also considering moving the NUM and DEN macros
from test/mpq/t-cmp* to gmp-impl.h, since I assume mpq_numref and
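The accessor-macro idea can be illustrated with a self-contained sketch: every field access goes through one macro, so an experiment that changes the field layout only has to touch the macro definitions. The struct types (`demo_mpz_struct`, `demo_mpq_struct`), the helper `demo_mpz_realloc`, and the macro bodies below are hypothetical stand-ins that only mirror the `_mp_alloc`/`_mp_size`/`_mp_d` field names; they are not copied from gmp.h or gmp-impl.h.

```c
#include <stdlib.h>

typedef unsigned long limb_t;   /* hypothetical limb type */

/* Hypothetical stand-ins mirroring the field names used in the thread. */
typedef struct { int _mp_alloc; int _mp_size; limb_t *_mp_d; } demo_mpz_struct;
typedef struct { demo_mpz_struct _mp_num; demo_mpz_struct _mp_den; } demo_mpq_struct;

/* All field accesses go through these macros. */
#define ALLOC(x) ((x)->_mp_alloc)
#define SIZ(x)   ((x)->_mp_size)
#define PTR(x)   ((x)->_mp_d)
/* Hypothetical rational accessors, analogous in spirit to mpq_numref/mpq_denref. */
#define NUM(q)   (&(q)->_mp_num)
#define DEN(q)   (&(q)->_mp_den)

/* Example user: resize the limb array (new_alloc > 0), writing only via
   the macros. Returns 1 on success, 0 if realloc fails. */
static int demo_mpz_realloc(demo_mpz_struct *x, int new_alloc)
{
    limb_t *p = realloc(PTR(x), (size_t)new_alloc * sizeof(limb_t));
    if (p == NULL)
        return 0;
    PTR(x) = p;
    ALLOC(x) = new_alloc;
    /* In this sketch, a value whose |size| no longer fits is reset to zero. */
    if (SIZ(x) > new_alloc || -SIZ(x) > new_alloc)
        SIZ(x) = 0;
    return 1;
}
```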
Marc Glisse marc.gli...@inria.fr writes:
Hi,
On Wed, 22 February 2012 7:41 pm, Torbjorn Granlund wrote:
Marc Glisse marc.gli...@inria.fr writes:
their length. By the way, is there any difference between PTR and
LIMBS? For instance, should one be used in some circumstances and the
other elsewhere?
You're welcome to clean up
bodr...@mail.dm.unipi.it writes:
Unrelated :-) We might define more macros like TMP_ALLOC_LIMBS_2, I mean
_3 and _4, so that they can be used to reduce the number of allocations.
Do you agree? (I just touched mpz/gcdext.c, and _4 should be used there.)
I'd vote for killing
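The _3/_4 proposal amounts to one allocation carved into adjacent arrays. A minimal sketch, assuming a hypothetical `ALLOC_LIMBS_3` macro and `limb_t` type: it is not GMP's TMP_ALLOC_LIMBS_2, and it uses plain malloc instead of the TMP_ALLOC machinery so that it is self-contained. Only the first pointer is freed, since the other two point inside the same block.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t limb_t;   /* hypothetical limb type */

/* Hypothetical: one malloc split into three consecutive limb arrays.
   On failure all three pointers are set to NULL. Free only xp. */
#define ALLOC_LIMBS_3(xp, xn, yp, yn, zp, zn)                          \
    do {                                                               \
        (xp) = malloc(((size_t)(xn) + (size_t)(yn) + (size_t)(zn))    \
                      * sizeof(limb_t));                               \
        if ((xp) != NULL) {                                            \
            (yp) = (xp) + (xn);   /* yp starts right after xp */       \
            (zp) = (yp) + (yn);   /* zp starts right after yp */       \
        } else {                                                       \
            (yp) = (zp) = NULL;                                        \
        }                                                              \
    } while (0)
```

The trade-off raised in the replies is visible here: one allocation instead of three, but an overrun of xp writes straight into yp with nothing in between to catch it.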
Torbjorn Granlund t...@gmplib.org writes:
TMP_ALLOC_LIMBS_2 is clutter IMHO.
Sure, it's pointless in a normal build.
As I understand it, the reason for having TMP_ALLOC_LIMBS_2 is to make
--enable-alloca=debug more effective, by getting some kind of red zone
separating the two areas. Whether
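The red-zone reasoning can be illustrated with a build-time switch. In a hypothetical debug configuration (`ALLOC_DEBUG`) each area gets its own allocation, so an overrun past the end of xp no longer lands silently inside yp and a checking allocator has a chance to flag it. In a normal build, the two areas share one allocation. `alloc_limbs_2`, `free_limbs_2` and `ALLOC_DEBUG` are names invented for this sketch; it does not reproduce what --enable-alloca=debug actually does inside GMP.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t limb_t;   /* hypothetical limb type */

/* Allocate two limb areas (sizes assumed nonzero). Returns 1 on success. */
static int alloc_limbs_2(limb_t **xp, size_t xn, limb_t **yp, size_t yn)
{
#ifdef ALLOC_DEBUG
    /* Debug: separate blocks, so the allocator's own padding or guard
       checks sit between the areas. */
    *xp = malloc(xn * sizeof(limb_t));
    *yp = malloc(yn * sizeof(limb_t));
    if (*xp == NULL || *yp == NULL) {
        free(*xp);
        free(*yp);
        *xp = *yp = NULL;
        return 0;
    }
#else
    /* Normal: one block, yp immediately follows xp. */
    *xp = malloc((xn + yn) * sizeof(limb_t));
    if (*xp == NULL) {
        *yp = NULL;
        return 0;
    }
    *yp = *xp + xn;
#endif
    return 1;
}

static void free_limbs_2(limb_t *xp, limb_t *yp)
{
#ifdef ALLOC_DEBUG
    free(xp);
    free(yp);
#else
    (void)yp;   /* yp points inside xp's block */
    free(xp);
#endif
}
```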
On Wed, 22 Feb 2012, Torbjorn Granlund wrote:
bodr...@mail.dm.unipi.it writes:
Marc Glisse marc.gli...@inria.fr writes:
That's for the alloca case. Without alloca, one call to malloc is
better than two (although that usually also means the numbers are big
and the cost of any GMP operation will dwarf that of the allocation). Also, the threshold
between alloca and malloc is quite high, and