On Sun, 2018-03-11 at 00:15 +0100, Niels Möller wrote:
>
> so that the initializer is evaluated exactly once, after the first entry
> to the function, but before any thread uses the value (likely generating
> code using pthread_once). Has anything like that made it into recent C
> standards?
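A minimal sketch of the once-only initialization described in the quoted text, using POSIX pthread_once as suggested there. C11 also added call_once and once_flag in the optional <threads.h> header. The names table_value, init_table and get_table_value are hypothetical and only serve the illustration.

```c
#include <pthread.h>

/* Hypothetical lazily initialized value and its guard. */
static pthread_once_t table_once = PTHREAD_ONCE_INIT;
static int table_value;
static int init_calls;   /* counts how many times the initializer ran */

/* Runs at most once, no matter how many threads call get_table_value(). */
static void init_table(void) {
    table_value = 42;    /* placeholder for an expensive computation */
    init_calls++;
}

int get_table_value(void) {
    /* pthread_once returns only after init_table has completed in some thread. */
    pthread_once(&table_once, init_table);
    return table_value;
}
```

Every caller goes through pthread_once, so the initializer finishes before any thread reads the value.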
Hi Marco,
Thank you for this improvement of the primality testing :-)
On my side, I was working on implementing BPSW, with the general aim of proposing
this for GMP.
So I'm very happy to see interest in this from other people too.
And to see a certainly-better-than-mine implementation of these Lucas s
Hi Marco,
> So, you have some possible implementation details in mind... If you want
> to read the code I pushed, and ask any question like "why did you
> implement this way that step?", I'll be happy to answer.
My approach was very modest in its design.
Mostly "use" existing mpz functions to rea
Another way of solving the randstate issue with just spec/doc:
> The probabilistic test should take a gmp_randstate_t parameter, so that
> repeated calls to the test can give increasing confidence.
There may be a simpler (and more generic) solution.
The confidence issue can be solved just by ex
Hi,
> Does this mean that we have to implement an unconditional primality
> proving function to test if our "probab_primes" are really primes or they
> are not?
> Adrien will be happy to read this :-D
Yes and no... testing all numbers up to 2^64 would be a bit slow xD
Besides about BPSW returning
Hi,
This is a follow-up to the previous discussion about improving
mpn_sqrtrem1() and maybe mpn_sqrtrem2():
https://gmplib.org/list-archives/gmp-devel/2016-July/004310.html
Getting as much performance as possible while using clear algorithms is
something I appreciate very much, so I did a bi
Hi Torbjörn,
I updated my code since my last message to the mailing list.
Now there is a small archive on my server, with the Makefile:
http://94.23.21.190/publicshare/sqrt.tar.gz
If you find some interest in my code, I can create a github/gitlab repo.
Basically my code will compile only when the
I'm not sure using a table of invroot*invroot would bring speedup.
On the one hand, maybe the processor's prefetch stages can read from the table
transparently,
but using a table means adding it to the binary and doing a memory access.
On the other hand, doing invroot*invroot is a simple register-on
Hi,
> Is there a reason why you defined three different invsqrt8_ arrays?
> Doesn't invsqrttab contain suitable values?
Unlike current GMP code which has only one table,
I couldn't find one unique table for all three uses.
Note that the tables invsqrt8_32b and invsqrt8_64b are different for sure
> It is quite difficult to interpret the numbers, times spent by direct
> calls to some functions is compared to the time spent by other functions
> called through a wrapper...
Ah, that's right: I wanted to test the overhead of a function call but forgot xD
I've just tested with inline declaration of fun
hardware division, then my 64x2 implementation should be
better for sure.
However don't forget that nothing is fully checked nor proven in my
implementation! It may still be wrong in some corner cases!
Adrien
On Tue, 2016-12-20 at 22:36 +0100, Marco Bodrato wrote:
> Ciao,
>
> Il Lun
Hi,
This is a sort of follow-up to last month's discussion about two-word
multiplication.
I have been trying another implementation, which has no check for overflow.
Most often, it seems to be significantly faster.
My code is available here: http://94.23.21.190/publicshare/mul64x2.tar.gz
The Mak
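As an illustration only, and assuming "two-word multiplication" here means forming the full 128-bit product of two 64-bit words, a portable sketch could look like the following. This is not the code from the archive above; the name umul64_wide is hypothetical. It shows one way to avoid overflow checks: the middle column is summed from 32-bit pieces, so it cannot exceed 64 bits.

```c
#include <stdint.h>

/* Hypothetical helper: 64x64 -> 128-bit product built from 32-bit halves. */
void umul64_wide(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo) {
    uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;   /* contributes to bits   0..63  */
    uint64_t p1 = a_lo * b_hi;   /* contributes to bits  32..95  */
    uint64_t p2 = a_hi * b_lo;   /* contributes to bits  32..95  */
    uint64_t p3 = a_hi * b_hi;   /* contributes to bits  64..127 */

    /* Sum of three values below 2^32 each: cannot overflow 64 bits,
       so no carry test is needed. */
    uint64_t mid = (p0 >> 32) + (p1 & 0xFFFFFFFFu) + (p2 & 0xFFFFFFFFu);

    *lo = (mid << 32) | (p0 & 0xFFFFFFFFu);
    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}
```

On compilers that provide a 128-bit integer type, a single widening multiplication is usually preferable; the sketch above only relies on standard C99 types.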
*r to check if the root is too low
usub64x2(&remh, &reml, root >> 63, root << 1, remh, reml);
// Get the full correction
corr += remh >> 63;
#endif
// Apply the error correction
//printf("%"PRIi64" ", corr);
probably other subtleties about floating-point stuff
that I am not aware of...
So first I'd like to know,
what do GMP developers think about using FP there?
Adrien
On Sat, 2017-01-28 at 17:49 +0100, Adrien Prost-Boucle wrote:
> Hi Marco,
>
> Thank you for the tests,
> and f
, Torbjörn Granlund wrote:
> > Adrien Prost-Boucle writes:
>
> So first I'd like to know,
> what do GMP developers think about using FP there?
>
> Making GMP dependent on libm is not OK.
>
> Using time-critical floating-point features on a CPU-by-CPU basis is
ating sqrt.
Regards,
Adrien
On Wed, 2017-02-01 at 18:55 +0100, Niels Möller wrote:
> > Adrien Prost-Boucle writes:
>
> > Maybe the availability of SSE / AVX / NEON etc instruction sets can be
> > checked at compilation time?
>
> That's what configure (and its helper
Hi,
> I'll be grateful if you run your measures again,
> in particular for 100 and 128 bits.
I pulled the latest changes (hg rev 17327) and launched my tests again.
I cleaned up the values and numbers of bits, just so it's a bit cleaner.
Reminder: I compiled GMP with functions mpn_sqrtrem{1,2} de
Sorry for the alignment in my previous email.
Here is a cleaner version.
Note that the first time value (left) is for GMP hg17327
And the second one (right) is with the FP functions
GMP 6.1.99.hg17327 modified so functions mpn_sqrtrem{1,2} are extern
Laptop Core 2 Duo 2GHz
All tests are repeated 1
Hi,
On Mon, 2017-03-13 at 22:11 +0100, Marco Bodrato wrote:
> Not even my change really gave 2x... but, before it, you could obtain
> 20-30%, now you measure 50-70%... that's the right direction :-)
I just reached 2.4x :-)
For that, I added macros to indicate whether sqrtrem{1,2} need normalized
Hi,
> I miss a case: 32 bits; to fully evaluate the impact of the patch+FP on
> one-limb operands in the range 1..62.
Aren't 64-bit and 32-bit data identical for one mpn_sqrtrem1 call on x86-64?
I don't get why we would see a difference.
Or, do you mean we should add another test (along with tes
Hi,
I now have a working version of sqrtrem1 that uses floating-point sqrt
instruction on x86-64.
For a quick glance, here is the speedup on my two machines:
Laptop Core 2 Duo 2GHz
==
1) Time when using only rev 17327
2) Time when using rev 17327 + FP version for sqrtrem1 on
Hi,
> About the pure C code, integer version that I was working on,
> I now have an exhaustively validated version, with only one table of invsqrt
> shared between the 3 versions (32b, 64b, 2x64b).
> Previously I observed a moderate but interesting speedup compared to GMP.
> But... when I put that
As far as I know, these instructions are affected by the rounding mode.
And no instruction specifies the rounding mode.
So, I have to assume the worst case and consider that user programs may
have set a rounding mode that I don't expect.
That's why I propose - as a first proposal - a function that always
c
This page, at the end, tends to discourage playing with rounding mode too
frequently:
https://www.gnu.org/software/libc/manual/html_node/Rounding.html
On Fri, 2017-03-24 at 15:56 +0100, Torbjörn Granlund wrote:
> I haven't followed this thread carefully, but now I have some questions:
>
> Are t
Hi,
> Let N = 2147580932 = 46342^2 - 1, and rounding be toward +infinity.
> Will N be rounded up to 8388989*256 when converted to float?
> Then its root 11863552.60.../256 rounded up to 11863553/256?
> and this number converted back to integer rounding it up to 46343?
> If so, we have a possible +
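The chain of upward roundings in the quoted example can be checked without touching the FPU rounding mode. The sketch below emulates it in integer arithmetic, assuming a 24-bit float significand and an input in the range 2^31 <= n <= 2^32 - 256; the function names are hypothetical.

```c
#include <stdint.h>

/* floor(sqrt(v)) by binary search; valid for v < 2^62. */
static uint64_t isqrt_floor(uint64_t v) {
    uint64_t lo = 0, hi = (uint64_t)1 << 31;   /* invariant: lo^2 <= v < hi^2 */
    while (hi - lo > 1) {
        uint64_t mid = lo + (hi - lo) / 2;
        if (mid * mid <= v) lo = mid; else hi = mid;
    }
    return lo;
}

/* ceil(sqrt(v)), same range restriction. */
static uint64_t isqrt_ceil(uint64_t v) {
    uint64_t r = isqrt_floor(v);
    return (r * r < v) ? r + 1 : r;
}

/* Emulates integer -> float -> sqrt -> integer with every step rounded
   toward +infinity. Assumes 2^31 <= n <= 2^32 - 256, where one float
   ulp of n is 256 and one ulp of the root is 1/256. */
uint32_t sqrt_all_rounded_up(uint32_t n) {
    uint64_t m = ((uint64_t)n + 255) >> 8;   /* significand: ceil(n / 256)      */
    uint64_t x = m << 8;                     /* value of n after conversion     */
    uint64_t q = isqrt_ceil(x << 16);        /* ceil(256 * sqrt(x))             */
    return (uint32_t)((q + 255) >> 8);       /* ceil(q / 256): back to integer  */
}
```

For N = 2147580932 the intermediate values are m = 8388989 and q = 11863553, and the function returns 46343, while isqrt_floor(N) gives 46341.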
> Il Ven, 24 Marzo 2017 11:54 pm, Adrien Prost-Boucle ha scritto:
> > This page, at the end, tends to discourage playing with rounding mode too
> > frequently:
> > https://www.gnu.org/software/libc/manual/html_node/Rounding.html
>
> This page is about writing portab
> I tested with a function that repeatedly sets all rounding modes.
> The result is: 995413 calls to fesetround() per second on my laptop
> That's extremely slow given the speed of the sqrt function!
Ooops there was a typo in my code: division by 4 instead of
multiplication by 4 (the number of rou
Hi,
On Sat, 2017-03-25 at 15:38 +0100, Marco Bodrato wrote:
> Il Gio, 23 Marzo 2017 8:46 pm, Adrien Prost-Boucle ha scritto:
> > > About the pure C code, integer version that I was working on,
> > > But... when I put that code in GMP code, that resulted in
> >
> The only branch is for the final correction at end of mpn_sqrtrem1.
> I tried with the previous mpn_sqrtrem1 version, which has a condition,
> and with an unconditional code that needs 2 multiplications.
>
> My version with unconditional correction had same speed.
> My version with current mpn_
On Wed, 2017-03-29 at 02:06 +0200, Marco Bodrato wrote:
> You didn't try
> ./configure ABI=32 && make && make check
> did you?
Couldn't try, little problem with multilib install...
"Release early", they said xD
Standard copy/paste problem...
Just replace vsh by a0 for now, I'll test ABI=32 when p
Hi,
On Sat, 2017-03-25 at 21:34 +0100, Torbjörn Granlund wrote:
> The sqrtss and sqrtds are SIMD operations, right? That means that if we
> don't initialise all input fields with something, they might contain
> special values which triggers exceptional conditions.
The Intel docs say that instruc
On Sat, 2017-04-01 at 18:15 +0200, Marco Bodrato wrote:
> Sorry, but even correcting the obvious typos, it doesn't pass the
> tests.
I think I have found the error.
The final correction was wrong.
I hope it's OK now, but... I still can't compile GMP with ABI=32.
Like you suggested I launched: ./c
On Sat, 2017-04-01 at 21:58 +0200, Marc Glisse wrote:
> Did you run "make distclean" between the 64-bit build and the 32-bit
> build? (doing the build out-of-tree avoids this kind of problem, since
> you can easily do the 32-bit build in a different directory)
I did clean but forgot about distclea
as you suggest.
Regards,
Adrien
On Sun, 2017-04-02 at 06:44 +0200, Marco Bodrato wrote:
> Ciao,
>
> Il Sab, 1 Aprile 2017 9:02 pm, Adrien Prost-Boucle ha scritto:
> > On Sat, 2017-04-01 at 18:15 +0200, Marco Bodrato wrote:
> > > After the patch:
> > > $ (cd tests
On Sun, 2017-04-02 at 06:44 +0200, Marco Bodrato wrote:
>
> For ABI=32, can you please tell us the timings obtained with:
>
> make&&(cd tests/devel/;make sqrtrem_1_2&&time ./sqrtrem_1_2 x 1)
>
> before, and after your patch (maybe playing with the different flavours of
> the correction step :-)?