Re: hgcd1/2

2019-09-04 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes: > Excellent. I just pushed tuning support, with HGCD2_METHOD == 1 using > plain division for div1, and HGCD2_METHOD == 2 (default) using the old > method with the div1 function. I had a quick look at the machines that completed tuning this night. These

Re: hgcd1/2

2019-09-04 Thread Niels Möller
t...@gmplib.org (Torbjörn Granlund) writes: > ni...@lysator.liu.se (Niels Möller) writes: > > Pushed now. Plan is to next add a tuned HGCD2_METHOD, initially > controlling which div1 to use. Torbjörn, can you add that to the > thresholds web page once it's ready? > > I went ahead and added

Re: hgcd1/2

2019-09-04 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Pushed now. Plan is to next add a tuned HGCD2_METHOD, initially controlling which div1 to use. Torbjörn, can you add that to the thresholds web page once it's ready? I went ahead and added it. When data comes in, the subpage will become more

Re: hgcd1/2

2019-09-04 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes: > First step is to add support to speed. Does the below look reasonable? I > modelled it a bit on SPEED_ROUTINE_MODLIMB_INVERT, which also measures a > fix size, but I don't quite understand all of struct speed_params. Pushed now. Plan is to next add a

Re: mpz_nextprime

2019-09-04 Thread Niels Möller
paul zimmermann writes: > with a small base, no need of k-ary modexp. In the REDC domain, instead of > computing a'*b'/r mod n, where b' = r*b mod n is the REDC encoding of the > base b, just compute a'*b mod n. Right, since the montgomery/redc mapping a <-> a' is linear, the needed multiply

Re: mpz_nextprime

2019-09-04 Thread paul zimmermann
Torbjörn, > With k-ary modexp, the number of multiplications is log2(n)/k, > while the number of squarings are log(n). Therefore a small b will not > make a huge difference. > > As Marco points implies, one may handle some special values like b = 2 > also in REDC coding and still not pay

Small-base modexp (was: Re: mpz_nextprime)

2019-09-04 Thread Niels Möller
"Marco Bodrato" writes: > Moreover one may avoid the windowing on the exponent (we currently > pre-compute b^3, b^5, b^7... so that we need fewer products) and use a > shift by a single bit every time a multiplication times the base is > needed. And in the bit more general small-base case, that

Re: mpz_nextprime

2019-09-04 Thread Torbjörn Granlund
"Marco Bodrato" writes: > I think I've seen comments in the literature saying that 2^e mod m is > "obviously" more efficient than general modular exponentiation. But as > far as I can tell, b^e mod m with small b needs essentially one modular > squaring per bit in the exponent, including

Re: mpz_nextprime

2019-09-04 Thread Marco Bodrato
Ciao, Il Mer, 4 Settembre 2019 10:44 am, Niels Möller ha scritto: > Seth Troisi writes: >> I think the vast majority (98%+) of the function is spent in calls to >> mpz_rabinmiller. > To get better speed, we'd need to either increase the size of the prime > table used for sieving considerably,

Re: mpz_nextprime

2019-09-04 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Good to have some profile numbers! Then microoptimizations elsewhere will not make any significant difference. To get better speed, we'd need to either increase the size of the prime table used for sieving considerably, say by a factor of 2, 5

Re: mpz_nextprime

2019-09-04 Thread Niels Möller
Seth Troisi writes: > Yes I've tried that too. It was the same exact speed, so I started to > profile the function, which hasn't been easy. > I think the vast majority (98%+) of the function is spent in calls to > mpz_rabinmiller. Good to have some profile numbers! Then microoptimizations