Re: mini-gmp mpz_{get,set}_d not fully compatible with GMP

2018-03-11 Thread Adrien Prost-Boucle
On Sun, 2018-03-11 at 00:15 +0100, Niels Möller wrote: > > so that the initializer is evaluated exactly once, after the first entry > to the function, but before any thread uses the value (likely generating > code using pthread_once). Has anything like that made it into recent C > standards? __at

Re: New code for primality testing

2018-11-17 Thread Adrien Prost-Boucle
Hi Marco, Thank you for this improvement of the primality testing :-) I was working on my side on implementing BPSW, with the general aim of proposing it for GMP. So I'm very happy to see interest in this from other people too, and to see a certainly-better-than-mine implementation of these Lucas s

Re: New code for primality testing

2018-11-20 Thread Adrien Prost-Boucle
Hi Marco, > So, you have some possible implementation details in mind... If you want > to read the code I pushed, and ask any question like "why did you > implement this way that step?", I'll be happy to answer. My approach was very modest in its design. Mostly "use" existing mpz functions to rea

Re: New code for primality testing

2018-11-20 Thread Adrien Prost-Boucle
Another way of solving the randstate issue with just spec/doc: > The probabilistic test should take a gmp_randstate_t parameter, so that > repeated calls to the test can give increasing confidence. There may be a simpler (and more generic) solution. The confidence issue can be solved just by ex

Re: New code for primality testing

2018-11-21 Thread Adrien Prost-Boucle
Hi, > Does this mean that we have to implement an unconditional primality > proving function to test if our "probab_primes" are really primes or they > are not? > Adrien will be happy to read this :-D Yes and no... testing all numbers up to 2^64 would be a bit slow xD Besides about BPSW returning

mpn_sqrtrem1

2016-11-03 Thread Adrien Prost-Boucle
Hi, This is a follow-up to the previous discussion about improving mpn_sqrtrem1() and maybe mpn_sqrtrem2(): https://gmplib.org/list-archives/gmp-devel/2016-July/004310.html Getting as much performance as possible while using clear algorithms is something I appreciate very much, so I did a bi

Re: mpn_sqrtrem1

2016-12-19 Thread Adrien Prost-Boucle
Hi Torbjörn, I updated my code since my last message to the mailing list. Now there is a small archive on my server, with the Makefile: http://94.23.21.190/publicshare/sqrt.tar.gz If you find some interest in my code, I can create a github/gitlab repo. Basically my code will compile only when the

Re: PS: mpn_sqrtrem1

2016-12-20 Thread Adrien Prost-Boucle
I'm not sure using a table of invroot*invroot would bring a speedup. On one side, maybe prefetch processor stages can read from the table transparently, but using a table involves adding a table to the binary + doing a memory access. On the other side, doing invroot*invroot is a simple register-on

Re: mpn_sqrtrem1

2016-12-20 Thread Adrien Prost-Boucle
Hi, > Is there a reason why you defined three different invsqrt8_ arrays? > Doesn't invsqrttab contain suitable values? Unlike current GMP code which has only one table, I couldn't find one unique table for all three uses. Note that the tables invsqrt8_32b and invsqrt8_64b are different for sure

Re: mpn_sqrtrem1

2016-12-20 Thread Adrien Prost-Boucle
> It is quite difficult to interpret the numbers, times spent by direct > calls to some functions is compared to the time spent by other functions > called through a wrapper... Ah that's right I wanted to test overhead of function call but forgot xD I've just tested with inline declaration of fun

Re: mpn_sqrtrem1

2016-12-23 Thread Adrien Prost-Boucle
hardware division, then my 64x2 implementation should be better for sure. However don't forget that nothing is fully checked nor proven in my implementation! It may still be wrong in some corner cases! Adrien On Tue, 2016-12-20 at 22:36 +0100, Marco Bodrato wrote: > Ciao, > > Il Lun

Fallback code for two-word multiplication

2017-01-02 Thread Adrien Prost-Boucle
Hi, This is a sort of follow-up to last month's discussion about two-word multiplication. I have been trying another implementation, which has no check for overflow. Most often, it seems to be significantly faster. My code is available here: http://94.23.21.190/publicshare/mul64x2.tar.gz The Mak
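A minimal sketch of how a two-word multiplication fallback can avoid overflow checks in portable C, assuming 64-bit words split into 32-bit halves. The name mul64x64_sketch is hypothetical and this is not the code from the archive linked above; it only illustrates the general idea that the middle sum of three values below 2^32 always fits in 64 bits, so no carry test is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: full 64x64 -> 128-bit product from 32-bit halves. */
static void mul64x64_sketch(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    uint64_t a0 = (uint32_t)a, a1 = a >> 32;
    uint64_t b0 = (uint32_t)b, b1 = b >> 32;

    /* Four partial products, each at most (2^32 - 1)^2. */
    uint64_t p00 = a0 * b0;
    uint64_t p01 = a0 * b1;
    uint64_t p10 = a1 * b0;
    uint64_t p11 = a1 * b1;

    /* Sum of three values below 2^32: stays below 3 * 2^32, no overflow. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;
    assert((mid >> 34) == 0);

    *lo = (mid << 32) | (uint32_t)p00;
    /* The true high word fits in 64 bits, so this sum cannot wrap. */
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}
```

Calling mul64x64_sketch(UINT64_MAX, UINT64_MAX, &hi, &lo) gives hi = 0xFFFFFFFFFFFFFFFE and lo = 1, which matches (2^64 - 1)^2.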

Re: mpn_sqrtrem{1,2}

2017-01-28 Thread Adrien Prost-Boucle
*r to check if the root is too low
usub64x2(&remh, &reml, root >> 63, root << 1, remh, reml);
// Get the full correction
corr += remh >> 63;
#endif
// Apply the error correction
//printf("%"PRIi64" ", corr);

Re: Re: mpn_sqrtrem{1,2}

2017-01-29 Thread Adrien Prost-Boucle
probably other subtleties about floating-point stuff that I am not aware of... So first I'd like to know, what do GMP developers think about using FP there? Adrien On Sat, 2017-01-28 at 17:49 +0100, Adrien Prost-Boucle wrote: > Hi Marco, > > Thank you for the tests, > and f

Re: mpn_sqrtrem{1,2}

2017-01-29 Thread Adrien Prost-Boucle
, Torbjörn Granlund wrote: > > Adrien Prost-Boucle writes: > > So first I'd like to know, > what do GMP developers think about using FP there? > > Making GMP dependent on libm is not OK. > > Using time-critical floating-point features on a CPU-by-CPU basis is

Re: mpn_sqrtrem{1,2}

2017-02-16 Thread Adrien Prost-Boucle
ating sqrt. Regards, Adrien On Wed, 2017-02-01 at 18:55 +0100, Niels Möller wrote: > > Adrien Prost-Boucle writes: > > > Maybe the availability of SSE / AVX / NEON etc instruction sets can be > > checked at compilation time? > > That's what configure (and its helper

Re: mpn_sqrtrem{1,2}

2017-03-09 Thread Adrien Prost-Boucle
Hi, > I'll be grateful if you run your measures again, > in particular for 100 and 128 bits. I pulled the latest changes (hg rev 17327) and launched my tests again. I cleaned up the values and numbers of bits, just so it's a bit cleaner. Reminder: I compiled GMP with functions mpn_sqrtrem{1,2} de

Re: Re: mpn_sqrtrem{1,2}

2017-03-09 Thread Adrien Prost-Boucle
Sorry for alignment in my previous email. Here is a cleaner version. Note that the first time value (left) is for GMP hg17327 And the second one (right) is with the FP functions GMP 6.1.99.hg17327 modified so functions mpn_sqrtrem{1,2} are extern Laptop Core 2 Duo 2GHz All tests are repeated 1

Re: mpn_sqrtrem{1,2}

2017-03-14 Thread Adrien Prost-Boucle
Hi, On Mon, 2017-03-13 at 22:11 +0100, Marco Bodrato wrote: > Not even my change really gave 2x... but, before it, you could obtain > 20-30%, now you measure 50-70%... that's the right direction :-) I just reached 2.4x :-) For that, I added macros to indicate whether sqrtrem{1,2} need normalized

Re: mpn_sqrtrem{1,2}

2017-03-15 Thread Adrien Prost-Boucle
Hi, > I miss a case: 32 bits; to fully evaluate the impact of the patch+FP on > one-limb operands in the range 1..62. Isn't 64-bit and 32-bit data identical, for one mpn_sqrtrem1 call on x86-64? I don't get why we would see a difference. Or, do you mean we should add another test (along with tes

Re: mpn_sqrtrem{1,2}

2017-03-22 Thread Adrien Prost-Boucle
Hi, I now have a working version of sqrtrem1 that uses floating-point sqrt instruction on x86-64. For a quick glance, here is the speedup on my two machines: Laptop Core 2 Duo 2GHz == 1) Time when using only rev 17327 2) Time when using rev 17327 + FP version for sqrtrem1 on

Re: mpn_sqrtrem{1,2}

2017-03-23 Thread Adrien Prost-Boucle
Hi, > About the pure C code, integer version that I was working on, > I now have an exhaustively validated version, with only one table of invsqrt > shared between the 2 versions (32b, 64b, 2x64b). > Previously I observed a moderate but interesting speedup compared to GMP. > But... when I put that

Re: mpn_sqrtrem{1,2}

2017-03-24 Thread Adrien Prost-Boucle
As far as I know, these instructions are affected by the rounding mode. And no instruction specifies the rounding mode. So, I have to assume worst-case and consider the user programs may have set a rounding mode that I don't expect. That's why I propose - as a first proposal - a function that always c

Re: mpn_sqrtrem{1,2}

2017-03-24 Thread Adrien Prost-Boucle
This page, at the end, tends to discourage playing with rounding mode too frequently: https://www.gnu.org/software/libc/manual/html_node/Rounding.html On Fri, 2017-03-24 at 15:56 +0100, Torbjörn Granlund wrote: > I haven't followed this thread carefully, but now I have some questions: > > Are t

Re: mpn_sqrtrem{1,2} - rounding mode

2017-03-25 Thread Adrien Prost-Boucle
Hi, > Let N = 2147580932 = 46342^2 - 1, and rounding be toward +infinity. > Will N be rounded up to 8388989*256 when converted to float? > Then its root 11863552.60.../256 rounded up to 11863553/256? > and this number converted back to integer rounding it up to 46343? > If so, we have a possible +
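A small sketch of the kind of final integer correction that makes a root estimate safe whatever the rounding mode did to it. The helper name correct_root_sketch is hypothetical and is not taken from the patch under discussion; the estimate r is assumed to come from a floating-point square root and to be off by only a few units.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: turn an approximate root r into floor(sqrt(n)). */
static uint64_t correct_root_sketch(uint64_t n, uint64_t r)
{
    /* Estimate too high: r*r > n, tested as r > n / r to avoid overflow. */
    while (r > 0 && r > n / r)
        r--;

    /* Estimate too low: (r + 1)^2 <= n, again tested with a division. */
    while (r + 1 <= n / (r + 1))
        r++;

    /* Post-condition: r*r <= n (the loop above ensures (r + 1)^2 > n). */
    assert(r == 0 || r <= n / r);
    return r;
}
```

With the N quoted above and an estimate of 46343, correct_root_sketch(2147580932, 46343) returns 46341, because 46342^2 = 2147580964 is already larger than that N. The loops are only meant for estimates that are close; a real implementation would likely bound the correction to one or two steps.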

Re: mpn_sqrtrem{1,2} - rounding mode

2017-03-25 Thread Adrien Prost-Boucle
> Il Ven, 24 Marzo 2017 11:54 pm, Adrien Prost-Boucle ha scritto: > > This page, at the end, tends to discourage playing with rounding mode too > > frequently: > > https://www.gnu.org/software/libc/manual/html_node/Rounding.html > > This page is about writing portab

Re: mpn_sqrtrem{1,2} - rounding mode - erratum

2017-03-25 Thread Adrien Prost-Boucle
> I tested with a function that repeatedly sets all rounding modes. > The result is: 995413 calls to fesetround() per second on my laptop > That's extremely slow given the speed of the sqrt function! Ooops there was a typo in my code: division by 4 instead of multiplication by 4 (the number of rou

Re: mpn_sqrtrem{1,2} - patch for pure C implem

2017-03-28 Thread Adrien Prost-Boucle
Hi, On Sat, 2017-03-25 at 15:38 +0100, Marco Bodrato wrote: > Il Gio, 23 Marzo 2017 8:46 pm, Adrien Prost-Boucle ha scritto: > > > About the pure C code, integer version that I was working on, > > > But... when I put that code in GMP code, that resulted in > >

Re: mpn_sqrtrem{1,2} - patch for pure C implem

2017-03-28 Thread Adrien Prost-Boucle
> The only branch is for the final correction at end of mpn_sqrtrem1. > I tried with the previous mpn_sqrtrem1 version, which has a condition, > and with an unconditional code that needs 2 multiplications. > > My version with unconditional correction had same speed. > My version with current mpn_

Re: mpn_sqrtrem{1,2} - patch for pure C implem

2017-03-28 Thread Adrien Prost-Boucle
On Wed, 2017-03-29 at 02:06 +0200, Marco Bodrato wrote: > You didn't try > ./configure ABI=32 && make && make check > did you? Couldn't try, little problem with multilib install... "Release early", they said xD Standard copy/paste problem... Just replace vsh by a0 for now, I'll test ABI=32 when p

Re: mpn_sqrtrem{1,2} - floating-point sqrts{s,d}

2017-04-01 Thread Adrien Prost-Boucle
Hi, On Sat, 2017-03-25 at 21:34 +0100, Torbjörn Granlund wrote: > The sqrtss and sqrtsd are SIMD operations, right? That means that if we > don't initialise all input fields with something, they might contain > special values which trigger exceptional conditions. The Intel docs say that instruc

Re: mpn_sqrtrem{1,2} - patch for pure C implem

2017-04-01 Thread Adrien Prost-Boucle
On Sat, 2017-04-01 at 18:15 +0200, Marco Bodrato wrote: > Sorry, but even correcting the obvious typos, it doesn't pass the > tests. I think I have found the error. The final correction was wrong. I hope it's OK now, but... I still can't compile GMP with ABI=32. Like you suggested I launched: ./c

Re: mpn_sqrtrem{1,2} - patch for pure C implem

2017-04-01 Thread Adrien Prost-Boucle
On Sat, 2017-04-01 at 21:58 +0200, Marc Glisse wrote: > Did you run "make distclean" between the 64-bit build and the 32-bit > build? (doing the build out-of-tree avoids this kind of problem, since > you can easily do the 32-bit build in a different directory) I did clean but forgot about distclea

Re: mpn_sqrtrem{1,2} - patch for pure C implem

2017-04-02 Thread Adrien Prost-Boucle
as you suggest. Regards, Adrien On Sun, 2017-04-02 at 06:44 +0200, Marco Bodrato wrote: > Ciao, > > Il Sab, 1 Aprile 2017 9:02 pm, Adrien Prost-Boucle ha scritto: > > On Sat, 2017-04-01 at 18:15 +0200, Marco Bodrato wrote: > > > After the patch: > > > $ (cd tests

Re: mpn_sqrtrem{1,2} - patch for pure C implem

2017-04-02 Thread Adrien Prost-Boucle
On Sun, 2017-04-02 at 06:44 +0200, Marco Bodrato wrote: > > For ABI=32, can you please tell us the timings obtained with: > > make&&(cd tests/devel/;make sqrtrem_1_2&&time ./sqrtrem_1_2 x 1) > > before, and after your patch (maybe playing with the different flavours of > the correction step :-)?