Most popular GMP downloads

2014-04-21 Thread Torbjörn Granlund
Surely, our newer releases with broader testing, faster code, richer set of functions, and broader systems support are much more popular than older releases? Here is the toplist: https://gmplib.org/~tege/stats/usage_201404.html#TOPURLS GMP 4.3.2 is the most popular release, followed by 5.1.1,

Re: Most popular GMP downloads

2014-04-23 Thread Torbjörn Granlund
bodr...@mail.dm.unipi.it writes: If you add hits collected by /var/ftp/pub/gmp-5.1.3/gmp-5.1.3.tar.bz2 with hits to /var/ftp/pub/gmp/gmp-5.1.3.tar.bz2 and so on... you will see that 5.1.3 is more popular than 5.1.1 ... You're right, good! Does this statistics count

Re: mpf: which bug should we correct? (doc or code)

2014-06-02 Thread Torbjörn Granlund
Instead of re-implementing mpfr, an easy fix would be to simply drop mpf simply breaking backward compatibility is easy on our side, but it may be not on users side... Of course I agree, there is no need to (re-)implement correct rounding. The best fix probably is to relax the claim

Stack allocation

2014-06-06 Thread Torbjörn Granlund
This started as a thread in gmp-discuss about crashes due to stack overflow. I modified the TMP_SALLOC macro in gmp-impl.h to print its allocation argument. I did this as I suspected that we sometimes invoke the SALLOC form inappropriately for huge allocation. Below is a sample output. We

Re: Stack allocation

2014-06-06 Thread Torbjörn Granlund
t...@gmplib.org (Torbjörn Granlund) writes: I modified the TMP_SALLOC macro in gmp-impl.h to print its allocation argument. I did this as I suspected that we sometimes invoke the SALLOC form inappropriately for huge allocation. After adding printing of __FILE__ and __LINE__
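The instrumentation described in this thread can be sketched roughly as follows. This is a minimal illustration, not GMP's actual TMP_SALLOC: the names TRACED_ALLOC and traced_alloc are hypothetical, and malloc stands in for the stack allocation only to keep the sketch portable.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: report the requested size and the call site, then
   allocate.  malloc is an assumption made for portability; the macro
   discussed in the thread allocates on the stack instead.  */
void *
traced_alloc (size_t n, const char *file, int line)
{
  fprintf (stderr, "alloc %lu bytes at %s:%d\n",
           (unsigned long) n, file, line);
  return malloc (n);
}

/* __FILE__ and __LINE__ expand where the macro is used, so each report
   names the caller rather than the helper.  */
#define TRACED_ALLOC(n) traced_alloc ((n), __FILE__, __LINE__)
```

Sorting the resulting log by size would then point at the call sites that request unexpectedly large blocks.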

Re: Stack allocation

2014-06-08 Thread Torbjörn Granlund
I made the automated GMP nightbuilds use at most 512 KiB. Now I realise that the testsuite needs might both overestimate and underestimate the actual requirements. The overestimate will come from tests/mpn where we call functions outside their normal operand size envelope. Underestimation might

Re: Stack allocation

2014-06-08 Thread Torbjörn Granlund
I decided to lower the TMP_SALLOC limit to a bit under 2^15 from the previous 2^16. With that change and a couple of other allocation changes, GMP's now using less than 300 KiB of stack. The nightly builds attempt to enforce this limit. Torbjörn Please encrypt, key id 0xC8601622

Re: Stack allocation

2014-06-09 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: t...@gmplib.org (Torbjörn Granlund) writes: I decided to lower the TMP_SALLOC limit to a bit under 2^15 from the previous 2^16. What's the relative cost of allocation vs simple operations like mpn_add_n? For 2^15 limit, that's 512

Re: Stack allocation

2014-06-09 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: What stack usage do you get if you disable use of stack allocation? A good question. My measurements are blunt, using 'ulimit -s'. I don't know how to measure it accurately without instrumenting the code. I assume that GMP will use around 1 KiB,

Re: Segfaults during testing

2014-06-14 Thread Torbjörn Granlund
t...@gmplib.org (Torbjörn Granlund) writes: This function needs to be improved to avoid this silly recursion. I pushed an improved version. I'll try running the nightbuilds with much smaller stack size limit.

Re: mpn_add or mpn_add_n+MPN_COPY+MPN_INCR_U ?

2014-06-15 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Looks like mpn_add_n is an inline function defined in terms of __GMP_AORS, which does carry propagation differently. With an inline mpn_add, there's no good reason for it to have more overhead than mpn_add_n + MPN_COPY + MPN_INCR_U. A

Re: mpn_sec_add_1_itch

2014-07-05 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Can we document that mpn_sec_add_1_itch(n) = n? I see no reason any implementation would need more scratch space, and this makes it possible to skip the function call to the itch function if one is willing to always pass n limbs of scratch.

Re: mpn_sec_add_1_itch

2014-07-06 Thread Torbjörn Granlund
We seem to have forgotten to document that operand overlap is permitted. Perhaps you could fix that too? (mpn_mul_1 has it, copying to addmul_1 and submul_1 would probably be sufficient.) No overlap was intended there (I forgot the length 4 argument in the example). But I've

Testing on Darwin

2014-07-16 Thread Torbjörn Granlund
The GMP project currently has no means of testing our development sources on x86-64 Darwin. (Through emulation we have x86-32 Darwin.) Do you have a Mac OS system where you could allow automated GMP testing to take place? Please contact me at my private email address in that case. Torbjörn

Re: backport patches for gmp-4.x.x

2014-07-28 Thread Torbjörn Granlund
Rongqing Li rongqing...@windriver.com writes: Torbjorn Granlund: we are using gmp-4.2.1, but compiling gmp failed on mips64 with 64-bit userspace. I see the patch 12418 [(mips, powerpc): Provide assembly-free umul_ppmm for newer gcc] can fix it, but it is under GPLv3; Could you

Dynamic libs on Windoze

2014-08-07 Thread Torbjörn Granlund
While investigating a test failure (bikodos64.gmplib.org-bobcat-stat:64) on one of our test Windoze machines, I noticed that only the static lib build resulted in failures, while the shared lib build went OK. Since the bug should have affected both, something strange was going on. Apparently,

Re: Best way to carry on 2-input architecture?

2014-08-17 Thread Torbjörn Granlund
Both I and Niels have looked into ISAs which support GMP operations well. My work is available here: https://gmplib.org/~tege/fisa.pdf You're right that umulhi and addition as well as subtraction with carry/borrow are critical operations. And for multiply throughput is more important than low

Re: mpn_perfpow: do we need a special GCD for mp_bitcnt_t?

2014-09-04 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: A compromise might be to add something like #if MP_BITCNT_T_MAX > MP_LIMB_T_MAX #error mp_limb_t too small !? #endif close to the code which assigns an mp_bitcnt_t to an mp_limb_t. At least, this documents the assumption made.
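A compile-time guard of the kind proposed here might look like the sketch below. The type names bitcnt_t and limb_t and the function bitcnt_to_limb are hypothetical stand-ins rather than GMP's own definitions, and the C11 _Static_assert is swapped in for the preprocessor test because it can inspect typedefs directly.

```c
/* Hypothetical stand-ins for a bit-count type and a limb type.  */
typedef unsigned long bitcnt_t;
typedef unsigned long long limb_t;

/* Compile-time check: the largest bitcnt_t value must fit in a limb_t.
   Casting -1 to an unsigned type yields that type's maximum value.  */
_Static_assert ((bitcnt_t) -1 <= (limb_t) -1,
                "limb_t too small to hold a bitcnt_t");

/* The conversion that the assertion above protects.  */
limb_t
bitcnt_to_limb (bitcnt_t count)
{
  return (limb_t) count;
}
```

Placing the assertion next to the conversion keeps the assumption visible to anyone who later changes either type.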

Re: mpn_perfpow: do we need a special GCD for mp_bitcnt_t?

2014-09-04 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: I've had a look at x86_64/gcd_1.asm, to try to extract an gcd_11.asm from it. Before entering the main binary gcd loop, the code needs to: 1. Remove and keep track of common trailing zeros, 2. Remove trailing zeros on one of the

Re: mpn_perfpow: do we need a special GCD for mp_bitcnt_t?

2014-09-05 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: For a start, let's look at what's needed by gcd_1. It would look something like (untested): mp_limb_t mpn_gcd_1 (mp_srcptr up, mp_size_t size, mp_limb_t vlimb) { mp_limb_t ulimb; ASSERT (size >= 1); ASSERT (vlimb

Re: gcd_11

2014-09-05 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: t...@gmplib.org (Torbjörn Granlund) writes: I don't think gcd(0,0) is possible to define. At least Knuth claims it is convenient to set gcd(0,0) = 0, TAoCP, 4.5.2. Then gcd(u, 0) = |u| for *all* integers u. Also PARI/GP seems to follow
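The convention gcd(0,0) = 0 falls out naturally of a single-word binary gcd, as the following sketch suggests. It is an illustration only, not GMP's gcd_11 or gcd_1 code; the names gcd_ul and count_trailing_zeros are hypothetical, and trailing zeros are counted with a plain loop to stay portable.

```c
/* Count the trailing zero bits of a nonzero value.  */
static int
count_trailing_zeros (unsigned long x)
{
  int count = 0;
  while ((x & 1) == 0)
    {
      x >>= 1;
      count++;
    }
  return count;
}

/* Binary gcd of two words, with gcd(u, 0) = u for all u, so gcd(0, 0) = 0.  */
unsigned long
gcd_ul (unsigned long u, unsigned long v)
{
  int shift;

  if (u == 0)
    return v;                   /* covers gcd(0, 0) = 0 */
  if (v == 0)
    return u;

  shift = count_trailing_zeros (u | v); /* common power of two */
  u >>= count_trailing_zeros (u);       /* make u odd */
  do
    {
      v >>= count_trailing_zeros (v);   /* make v odd */
      if (u > v)
        {
          unsigned long t = u;          /* keep u <= v */
          u = v;
          v = t;
        }
      v -= u;                           /* odd - odd: even, possibly 0 */
    }
  while (v != 0);

  return u << shift;                    /* restore the common factor */
}
```

With the zero cases handled up front, the main loop only ever sees nonzero operands.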

Re: reference

2014-09-30 Thread Torbjörn Granlund
Zimmermann Paul paul.zimmerm...@inria.fr writes: http://eprint.iacr.org/2014/755.pdf, see Fig. 1 page 17. I took a quick glance. They compare against GMP and GMP Optimised. Note that GMP here is some undefined precompiled variant, perhaps 32-bit and surely not the correct compile for

Broad valgrind run

2014-10-30 Thread Torbjörn Granlund
I decided to test running the nightly GMP runs under valgrind. We've run valgrind manually for many configurations, but never over the entire set of test configurations. I expect https://gmplib.org/devel/tm-date.html to have a few spurious failures, but let's be prepared for some genuine

Re: Broad valgrind run

2014-11-05 Thread Torbjörn Granlund
t...@gmplib.org (Torbjörn Granlund) writes: I decided to test running the nightly GMP runs under valgrind. We've run valgrind manually for many configurations, but never over the entire set of test configurations. I expect https://gmplib.org/devel/tm-date.html to have a few spurious

Compiling GMP with clang

2014-11-29 Thread Torbjörn Granlund
I've added clang as an alternative compiler for the nightly GMP builds. The results are not encouraging; we're triggering many problems. For some reason, clang purports to be gcc. This makes GMP and other packages expect it to work as GCC. Unfortunately, it does not. I've accommodated the

Re: error handling

2014-12-17 Thread Torbjörn Granlund
[Moved thread from gmp-devel.] I'd like to think of error handling also in a C perspective, and consider a few more problems at the same time. Which sources of exceptions do we have currently? 1 We divide by 0 to generate a SIGFPE. I think we do that for division by zero as well as some

Re: error handling

2014-12-17 Thread Torbjörn Granlund
t...@gmplib.org (Torbjörn Granlund) writes: [Moved thread from gmp-devel.] Oops, s/from/to/ ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel

Re: Additional memory handler features.

2015-01-03 Thread Torbjörn Granlund
You insist on discussing on a quite abstract level, ignoring Niels' requests for a bit of concreteness. Tell you what, memory allocation is required by any non-trivial library. If this in your opinion breaks the do one thing, and do it well rule, then you won't like any software, not even hello

Re: Adding support for R6 of MIPS architecture

2015-02-02 Thread Torbjörn Granlund
Steve Ellcey sell...@imgtec.com writes: #if __mips_isa_rev < 6 multu $8,$7 #else mulu $11,$8,$7 muhu $12,$8,$7 #endif are not working. I guess something more like: ifdef(`ISA_REV6',` mulu $11,$8,$7 muhu $12,$8,$7 ',`

Re: Adding support for R6 of MIPS architecture

2015-02-06 Thread Torbjörn Granlund
Steve Ellcey sell...@imgtec.com writes: OK, so what I did was to create a mips32r6 directory under mips32 and a mips64r6 directory under mips64 and put copies of the routines that had to be changed for r6 in those directories. I have done test builds for various MIPS targets and verified

Re: Adding support for R6 of MIPS architecture

2015-02-03 Thread Torbjörn Granlund
Marc Glisse marc.gli...@inria.fr writes: Apparently not, the motivation for the patch is that multu has disappeared... Then I see no other robust approach than making mpn/mipsnomultu_64 (or somesuch). Well, an analogous robust approach would be moving the dmultu code into mpn/mips64/dmultu

GMP web

2015-01-15 Thread Torbjörn Granlund
We today enabled the SPDY protocol on GMP's web servers. The result is surprisingly significant; the site feels much zippier. There is nothing users need to do, they only need a reasonably current web browser. Ref: https://en.wikipedia.org/wiki/SPDY

Re: Memory barrier for fat initialization

2015-01-16 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: x86_64 and arm are the most important. OK, those are interesting to GMP as well. What's Nettle's intended fat granularity? I assume existence of instructions is sometimes enough, e.g., if AES hardware support exists then it seems safe to assume

Fat arm support

2015-01-17 Thread Torbjörn Granlund
It is possible to avoid parsing /proc/cpuinfo, and instead call getauxval. This works on most systems, but my A15 system does not seem to have the auxv.h include file. #include <stdio.h> #include <sys/auxv.h> #include <asm/hwcap.h> #define T(FEATURE) \ do {

Re: Memory barrier for fat initialization

2015-01-14 Thread Torbjörn Granlund
Which architectures do you intend to target for fat nettle builds? I really would want to move GMP towards fattyness for all current platforms. Unfortunately, this is not easy, and the problem is neither writing the actual code (fat.c, fat_entry.asm) nor getting memory ordering right. The real

Re: Fat arm support

2015-01-18 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Hmm, I had a quick look in asm/hwcap.h and the getauxval manpage. As far as I see, this doesn't provide any information on the architecture version. How do you get that, is there some other type constant for getauxval which I'm missing? In

Re: Public mpn_divexact_1

2015-02-19 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: To make any progress, we could start with the easy ones. I'm looking into making mpn_divexact_1 public. I guess what it takes is only moving the prototype from gmp-impl.h to gmp-h.in, and document it (including an entry in NEWS). But

Re: Memory barrier for fat initialization

2015-01-13 Thread Torbjörn Granlund
This article summarises things well. The x86/AMD64 indeed only reorders load before stores (except under the obscure OOStore mode). http://www.rdrop.com/users/paulmck/scalability/paper/ordering.2007.09.19a.pdf There is some reasoning about the weaker ordering of some SSE instructions, as well

Re: Memory barrier for fat initialization

2015-01-14 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Sounds doable for the sqr_basecase threshold, at least. On the other hand, on x86_64, maybe all chips we care about have the needed extensions, so it's *easy* to add an mfence or sfence instruction and not have to worry? I guess 32-bit x86 is

Re: Memory barrier for fat initialization

2015-01-13 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: GMP's fat initialization (I'm looking at the x86_64 code now) ends with *((volatile int *) __gmpn_cpuvec_initialized) = 1; I suspect that it's possible (but unlikely) that a different thread on another cpu may read

Re: Memory barrier for fat initialization

2015-01-13 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Fat gmp library, cpuvec not yet initialized. Several threads, on different cpus, call the CPUVEC_THRESHOLD macro at about the same time. That's the scenario. When the first of those threads get to setting __gmpn_cpuvec_initialized, the magic
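The publication problem discussed in this thread (fill in a table, then set an "initialized" flag that other threads test) can be expressed portably with C11 atomics, as in the sketch below. This is not GMP's fat initialization code: the names table_threshold, table_initialized, table_init and get_threshold are hypothetical, the value 42 is a placeholder, and release/acquire ordering is one possible way to state the requirement rather than what GMP does.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a table of CPU-dependent settings.  */
static _Atomic int table_threshold;
static atomic_int table_initialized;

static void
table_init (void)
{
  /* Fill in the table first ...  */
  atomic_store_explicit (&table_threshold, 42, memory_order_relaxed);
  /* ... then publish it.  The release store keeps the table write from
     being reordered after the flag write.  */
  atomic_store_explicit (&table_initialized, 1, memory_order_release);
}

int
get_threshold (void)
{
  /* The acquire load pairs with the release store above: a thread that
     sees the flag set also sees the table contents.  */
  if (!atomic_load_explicit (&table_initialized, memory_order_acquire))
    table_init ();
  return atomic_load_explicit (&table_threshold, memory_order_relaxed);
}
```

Several threads may run table_init at the same time in this sketch; that is harmless because they all store the same values and every store is atomic.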

Re: Memory barrier for fat initialization

2015-01-13 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: I'm a bit confused about this. Then when are memory barriers (mfence and friends) ever needed? I have a pretty vague idea about how memory models work in both theory and practice. I'm thinking about something like: cpu0 for some reason has

Re: Memory barrier for fat initialization

2015-01-13 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: I've tried to look in the intel architecture manual, but I can't find any obvious place where this ordering guarantee is described. It is not well-documented. That makes me wonder even more when mfence (or in particular, sfence) is ever

Re: Adding support for R6 of MIPS architecture

2015-02-11 Thread Torbjörn Granlund
Steve Ellcey sell...@imgtec.com writes: I think the old assembly code should be tweaked for r6 in a slightly deeper way. Two extra move instructions in a critical loop isn't OK. The mips code you started with is seriously out-of-date, with over-scheduling of load; this ought to be

Re: Adding support for R6 of MIPS architecture

2015-02-12 Thread Torbjörn Granlund
I think mpn/alpha/addmul_1.asm might serve as a better starting point than the mips64 lo/hi code. That code is simple enough, yet OK for pipelined in-order and out-of-order cores. I will take a look at that. On second thought, the top-level alpha code is overscheduled, at least

Re: fast inversion

2015-05-19 Thread Torbjörn Granlund
There are new build failures which seem related to this change.

Re: fast inversion

2015-05-18 Thread Torbjörn Granlund
bodr...@mail.dm.unipi.it writes: The new code is faster for n==1, slower for 2 = n = 4, and faster (more than twice) for n = 16. Nice speedup! In mpn/x86_64/fastsse/com.asm we have an mpn_com which will speed things up another 2x. It is not enabled on any platforms now as it needs

Re: fast inversion

2015-05-18 Thread Torbjörn Granlund
bodr...@mail.dm.unipi.it writes: @shell ~/gmp-repo$ tune/speed -s 1-1030 -f 2 -c mpn_neg mpn_com You might want to pass -p100 or somesuch to allow the CPU to speed up. (We might want to change the default, not sure to what.)

Re: Broad valgrind run

2015-06-07 Thread Torbjörn Granlund
Perhaps our private red-zones are obsolete now that we have valgrind and also various memcheck compiler features? I am not familiar with the valgrind annotations you're using and what benefits they might bring. It would be nice to not require code annotations, as these take time to write, debug

Re: Broad valgrind run

2015-06-07 Thread Torbjörn Granlund
I tried adding -fsanitize=address to the default options on the system ivyubu64v1504 (access via shell.gmplib.org as usual). A 64-bit build passed all tests, but alas, a 32-bit build fails two tests: make[4]: Entering directory '/var/tmp/gmp-obj/otmp/tests/mpq' FAIL: t-get_d FAIL: reuse The

Re: Broad valgrind run

2015-06-09 Thread Torbjörn Granlund
t...@gmplib.org (Torbjörn Granlund) writes: I tried adding -fsanitize=address to the default options on the system ivyubu64v1504 (access via shell.gmplib.org as usual). A 64-bit build passed all tests, but alas, a 32-bit build fails two tests: make[4]: Entering directory '/var/tmp

Re: Broad valgrind run

2015-06-09 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: What's the required alignment? x86/README refers to http://www.sco.com/developer/devspecs, which is mostly dead. At function entry, esp mod 8 = 4 should hold. This means that one should explicitly allocate 4 (mod 8) bytes if one makes a call.
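The arithmetic behind the "4 (mod 8) bytes" rule can be checked with a small model. The sketch assumes 32-bit x86, where a call instruction pushes a 4-byte return address; the function names esp_at_callee_entry and entry_is_aligned are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* esp as seen by a callee, given esp at our own entry and the number of
   bytes we allocate before the call.  The call instruction itself pushes
   a 4-byte return address on 32-bit x86.  */
uint32_t
esp_at_callee_entry (uint32_t esp_at_entry, uint32_t allocated_bytes)
{
  return esp_at_entry - allocated_bytes - 4u;
}

/* The condition quoted above: esp mod 8 = 4 at function entry.  */
int
entry_is_aligned (uint32_t esp)
{
  return esp % 8u == 4u;
}
```

Starting from an aligned entry, the callee sees esp congruent to 4 - k - 4 = -k (mod 8) after k bytes are allocated, so its entry is aligned exactly when k is 4 (mod 8).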

Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-06-08 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: t...@gmplib.org (Torbjörn Granlund) writes: Surely, we could make mpn_rootrem run faster, in particular for small arguments. But also for large arguments, 2x slowdown seems like a lot. I've had a quick look. Both mpn_dc_sqrtrem

Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-06-08 Thread Torbjörn Granlund
t...@gmplib.org (Torbjörn Granlund) writes: For mpn_rootrem, i.e., a^(1/k), I suppose we could make the work take O(M(log(a^(1/k)))) as opposed to the current O(M(log(a))), where M(n) is the time for an n-bit multiply. We'd need to make use of mpn_pow_1_highpart. This will be a lot

Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-06-09 Thread Torbjörn Granlund
bodr...@mail.dm.unipi.it writes: To compare them I wrote a quick and dirty specialization of the rootrem algorithm for the k==2 case (use sqr instead of mul, lshift instead of mul_1...) Cool! We should probably use sqr whenever possible, perhaps it is worth having a condition for this in

Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-06-09 Thread Torbjörn Granlund
t...@gmplib.org (Torbjörn Granlund) writes: I am not sure that div_q vs divrem matters a whole lot for this usage (i.e., 2n/n sizes). Apparently wrong. The first column is n, division is 2n-limb by n-limb: mpn_divrem mpn_div_qratio 50.0009

Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-06-10 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: I also had a quick look at the math, and I realized (some of you surely knew that already) that floor(sqrt(a)) is mostly independent of the lowest half of the bits of a. Some of us indeed knew that... And this generalises to kth roots. It is
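The observation about the low half of the bits can be tried out on a single 64-bit word. The sketch below uses the textbook bit-by-bit integer square root, not GMP's mpn_sqrtrem, and the name isqrt64 is hypothetical. For a normalized input a >= 2^62, clearing the low 32 bits lowers a by less than 2^32 while both roots are at least 2^31, so the two floor square roots differ by at most 1.

```c
#include <stdint.h>

/* Bit-by-bit integer square root: returns floor(sqrt(a)).  */
uint64_t
isqrt64 (uint64_t a)
{
  uint64_t root = 0;
  uint64_t bit = (uint64_t) 1 << 62;    /* highest power of 4 in 64 bits */

  while (bit > a)
    bit >>= 2;                          /* start at the largest 4^k <= a */

  while (bit != 0)
    {
      if (a >= root + bit)
        {
          a -= root + bit;              /* this result bit is set */
          root = (root >> 1) + bit;
        }
      else
        root >>= 1;
      bit >>= 2;
    }
  return root;
}
```

Comparing isqrt64 (a) with isqrt64 (a & ~(uint64_t) 0xFFFFFFFF) for a few normalized values shows the effect directly.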

Re: Quality for binary distributions

2015-06-21 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Here's a common scenario where make check may fail to detect problems. Say a distributor compiles gmp, runs make check, and builds some type of binary package, which is distributed to users. For supported platforms (i.e., x86), fat builds are

Symbol hiding and unit tests

2015-06-21 Thread Torbjörn Granlund
I am finally doing something about hiding internal GMP symbols for the shared gmplib.{so,lib,dll}. The mechanism is essentially to use GCC's __attribute__((visibility("hidden"))), which in turn makes use of features of the object file formats. The good effects of this are: 1. The symbols cannot be
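A minimal illustration of the attribute follows. The function names internal_double and public_twice_plus_one are hypothetical and unrelated to GMP's symbols; the visibility attribute is a GCC extension that clang also accepts, and it only has an effect when the file is built into a shared object on a platform whose object format supports it.

```c
/* Internal helper: hidden, so it is not exported from a shared library
   built from this file.  Other files in the same library can still call it.  */
__attribute__ ((visibility ("hidden")))
int
internal_double (int x)
{
  return 2 * x;
}

/* Public entry point: default visibility, exported as usual.  */
__attribute__ ((visibility ("default")))
int
public_twice_plus_one (int x)
{
  /* This internal call can be resolved when the library is linked.  */
  return internal_double (x) + 1;
}
```

One consequence is that unit tests which call internal functions have to link against the object files or a static library rather than the shared one.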

Re: Quality for binary distributions

2015-06-22 Thread Torbjörn Granlund
bodr...@mail.dm.unipi.it writes: A generic (I mean, not recognised) CPU_TYPE should hit all the C versions, doesn't it? Or we can also have different C sources compiled for different CPUs? (e.g. selected by JACOBI_BASE_METHOD ?) Maybe testing at least the generic CPU should hit almost any

Re: mpn_zero_p

2015-06-24 Thread Torbjörn Granlund
bodr...@mail.dm.unipi.it writes: It will be available from the next release. As the manual says about low-level functions: No size argument may be zero. That certainly isn't true in general, at least not for the entire set of mpn functions. I think it might be a mistake to make functions

Re: mpn_zero_p

2015-06-24 Thread Torbjörn Granlund
bodr...@mail.dm.unipi.it writes: In general, when supporting zero sizes is an O(1) cost compared to an O(n) or bigger cost of the function, I may agree. On average mpn_zero_p will return after the first branch (on random data, the first limb we check is non-zero). Supporting zero sizes

Re: mpn_zero_p

2015-06-24 Thread Torbjörn Granlund
bodr...@mail.dm.unipi.it writes: mpn/x86_64/fastsse/com.asm does support zero size with an initial test n, n; jz L(don) while neither the generic C function for the library ( mpn/generic/com.c ), nor the inlined version in gmp-impl.h does. I took a quick look at

Re: mpn_zero_p

2015-06-24 Thread Torbjörn Granlund
bodr...@mail.dm.unipi.it writes: No, according to a comment in gmp-impl.h about MPN_COPY_DECR /* Copy N limbs from SRC to DST decrementing, N==0 allowed. */ Silly me...and that explains the com_n.asm therein; it was made from copyi.asm. (Most of my copyi.asm patch is OK, though. I just

Re: mini-gmp

2015-06-15 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Should make it easier to reach the goal of being able to build gcc using mini-gmp for its modest bignum needs. This makes a lot of sense from the perspective of GCC. It might hurt GMP in an awkward way because GCC developers will no longer tend

Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-06-12 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Hmm. Or maybe this is stupid. I could stop insisting on using a full size inverse (so that A / x or E / x can be computed as a *single* mulhi), and instead work with a half-size inverse, so that the quotient is computed in two steps. Then

Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-06-13 Thread Torbjörn Granlund
bodr...@mail.dm.unipi.it writes: Ciao, Il Ven, 12 Giugno 2015 9:04 am, Torbjörn Granlund ha scritto: We might want to look into the plain division-free sqrt(A) = A*sqrt(1/A) approach before implementing a tricky division sqrt(A). We can try improving the current implementation

Anomaly in mpn_sqrtrem and mpn_rootrem

2015-05-28 Thread Torbjörn Granlund
Paul Zimmermann has pointed out to us that mpn_rootrem is sometimes faster than mpn_sqrtrem, specifically when not requesting the remainder. When requesting the remainder we have anomalous behaviour too, but in the other direction: mpn_sqrtrem mpn_rootrem.2 mpn_rootrem.3 1

Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-06 Thread Torbjörn Granlund
bodr...@mail.dm.unipi.it writes: 4095 #2112866.69 2407388.57 2453779.52 4096 #1849300.02 2377781.13 2416991.81 4097 #2144198.04 2376456.38 2492709.77 4098 #1853563.19 2379253.13 2469931.56 ... yes, I know, we really need to improve also odd

Re: Symbol hiding and unit tests

2015-07-06 Thread Torbjörn Granlund
The good effects of this are: 1. The symbols cannot be reached from outside of the shared lib. 2. The internal references are resolved at library creation time instead of at application startup time 3. The internal references do not need an indirection (via a PLT in the

Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-06-11 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: As far as I see, a plain Newton iteration won't produce an x' inverse with 4n bits of accuracy. One would need either two iterations, or some other trick, maybe mixing in some interpolation of x' - x. I don't think one should struggle in

Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-06-11 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Here's a sketch with some more details. I've tried to work out both sqrt(B^{n-1} A) and sqrt(B^{n-2}). To my surprise, they seem independent, not mutually recursive. Before I try to understand the rest of your reasoning: What is B? It's not

Re: GMP and clang bugginess

2015-05-22 Thread Torbjörn Granlund
bodr...@mail.dm.unipi.it writes: Do we have any working configuration for the x32 ABI? It works. We had testing of it on a Gentoo system until perhaps a year ago. The reason for the ivydeb32v7.gmplib.org-stat-clang-clang++:x32 failure is that clang apparently accepts and ignores -mx32.

Re: GMP and clang bugginess

2015-05-25 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: GMP triggers bugs in clang on every platform where we tried this compiler. It looks like it almost works on x86, except for failures with the (obscure?) x32 ABI. The clang on FreeBSD 10 miscompiles GMP on for some x86 CPU subtypes.

Re: GMP and clang bugginess

2015-05-25 Thread Torbjörn Granlund
Marc Glisse marc.gli...@inria.fr writes: Now I've found it (and reported https://llvm.org/bugs/show_bug.cgi?id=23646 ). Note that the same (?) instruction is spelled differently in the same file: bc+ 12, 28, L(9) vs. blt+cr7, L(24) (there is also a mix of

Re: GMP and clang bugginess

2015-05-25 Thread Torbjörn Granlund
Marc Glisse marc.gli...@inria.fr writes: On powerpc-linux-gnu, clang complains about the bc+ instruction, and indeed I can't find that in IBM's documentation. https://www-01.ibm.com/support/knowledgecenter/ssw_aix_71/com.ibm.aix.alangref/idalangref_bcbr_inst.htm (The + sign manipulates

Re: GMP and clang bugginess

2015-05-25 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Harsh against whom? The point is not to make a statement, but to make it more likely that GMP works correctly for our users. It's going to look very much like you're making a statement, whether or not that's your intention. Please don't

Re: GMP and clang bugginess

2015-05-25 Thread Torbjörn Granlund
Marc Glisse marc.gli...@inria.fr writes: Now I've found it (and reported https://llvm.org/bugs/show_bug.cgi?id=23646 ). Note that the same (?) instruction is spelled differently in the same file: bc+ 12, 28, L(9) vs. blt+cr7, L(24) Note that the former form

Re: GMP and clang bugginess

2015-05-25 Thread Torbjörn Granlund
Marc Glisse marc.gli...@inria.fr writes: bc+ 12, 28, L(9) vs. blt+cr7, L(24) Note that the former form works with clang 3.5 installs. A 3.6 regression? Indeed... One may debate what is a valid instruction form. I suppose one needs to read the specs for

Re: GMP and clang bugginess

2015-05-26 Thread Torbjörn Granlund
I added clang 3.5 and clang 3.6 testing to a Broadwell system. We got one new build failure, and a handful new check failures. I suspect the steamroller failures are a real hardware compatibility problem. I suspect the build failure is due to plain (Intel NUC) hardware without ECC, or Linux

GMP and clang bugginess

2015-05-21 Thread Torbjörn Granlund
GMP triggers bugs in clang on every platform where we tried this compiler. Some configs work, though. To see how bad it is, please take a look here: https://gmplib.org/devel/tm-date.html I think we would help our users by making it hard to use clang with the next release. What do you think?

Re: fast inversion

2015-05-21 Thread Torbjörn Granlund
bodr...@mail.dm.unipi.it writes: But it is not an inline function, it's a macro redefining mpn_com, it will not conflict with the prototype __gmpn_com. (I hope ;-) Thanks, it seems to have helped. I suppose this bug means that we didn't really provide mpn_com in the public interface.

Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-07 Thread Torbjörn Granlund
bodr...@mail.dm.unipi.it writes: https://gmplib.org/repo/gmp/rev/87ba695c8878 But I do not really like it. We alloc-copy-shift to add a dummy limb, then we call a code that allocs-copies-shifts to (virtually) add two more dummy limbs... There is still a very noticeable difference

Re: sqrt algorithm

2015-08-13 Thread Torbjörn Granlund
Marco Bodrato bodr...@mail.dm.unipi.it writes: Ciao, On Wed, August 12, 2015 2:03 pm, Torbjörn Granlund wrote: I tested this approach for sqrlo_basecase too, you can find the code enclosed by #ifdef SQRLO_SHORTCUT_MULTIPLICATIONS But I'm not sure it is faster, so

Re: sqrt algorithm

2015-08-13 Thread Torbjörn Granlund
Current implementation of both mullo and sqrlo do write n limbs only, possibly by full 2n product in a temporary area followed by MPN_COPY. Doing an MPN_COPY in *_basecase is of course not allowable for efficiency reasons. IIRC someone proposed the interface mullo(res, x, y, n,

Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-08-19 Thread Torbjörn Granlund
Marco Bodrato bodr...@mail.dm.unipi.it writes: Before the changes I just pushed, I simply reordered the steps in the loop to shorten the first and the last iteration in the loop... Resulting in even better performance, I presume? How much speed difference is there now, for k = 4 vs

New Intel Skylake support

2015-08-18 Thread Torbjörn Granlund
The GMP development sources now support Intel's new processor family Skylake which debuted last week. This CPU sets speed records for almost every GMP inner loop, as well as for GMPbench. We haven't tweaked any GMP loops for the CPU yet, but the pre-existing Haswell and Broadwell code already

Re: mpq_cmp_z

2015-08-19 Thread Torbjörn Granlund
Marco Bodrato bodr...@mail.dm.unipi.it writes: Should we specialise code in mpq_cmp so that it is faster when a denominator is 1? Then write: Absolutely worth considering. mpq_cmp_z (mpq_srcptr q, mpz_srcptr z) { static const mp_limb_t dummy = 1; mpq_t qz; SIZ(NUM(qz))

Re: mpq_cmp_z

2015-08-20 Thread Torbjörn Granlund
Marco Bodrato bodr...@mail.dm.unipi.it writes: Maybe I've found a good sharing strategy... I'll take a proper look later. One idea which could perhaps avoid some branch is if you accepted a den_size argument for the maybe function. (We might consider adding mpf_cmp_z too, at least in a

Re: mpq_cmp_z

2015-08-21 Thread Torbjörn Granlund
Marco Bodrato bodr...@mail.dm.unipi.it writes: Maybe we can promise the right type, by adding an explicit cast? SIZ((mpz_srcptr) NUM(op2)) Except that we should cast op2, not NUM(ops). I am not sure Marc's reasoning is accurate, nor am I suggesting it is not, I've forgotten this level of

Re: sqrt algorithm

2015-07-29 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Hmm, but if we shift to make the *root* normalized, that also means that the input will always be an even number of limbs. Not entirely sure that's good, in particular for smallish sizes. But it ought to simplify some things. I've seen code

Re: sqrt algorithm

2015-07-29 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: It could likely be made to work if the input is properly normalized, does the current code do that? I kind-of dislike having to normalize numbers up-front, but it might well be a good thing for sqrt. First, it provides better accuracy for a

Re: sqrt algorithm

2015-08-05 Thread Torbjörn Granlund
Marco Bodrato bodr...@mail.dm.unipi.it writes: Do we need a sqrlo_basecase? The DC version of sqrlo would use a full squaring and a single mullo, so that the base_cases for sqrlo_dc are sqr_basecase, mul_basecase and mullo_basecase. A sqrlo_basecase would surely speed things up, at least

Re: sqrt algorithm

2015-08-07 Thread Torbjörn Granlund
Marco Bodrato bodr...@mail.dm.unipi.it writes: I pushed an implementation for mpn_sqrlo, it is based on mpn_mullo. I pushed also a primitive mpn_sqrlo_basecase, based on sqr_basecase. The range where it is faster than sqrmod is narrow: tune/speed -cs 1-9000 -f2 mpn_sqr mpn_sqrlo

Re: sqrt algorithm

2015-08-14 Thread Torbjörn Granlund
Marco Bodrato bodr...@mail.dm.unipi.it writes: I prefer 'do while', whenever the initial branch can be saved. That makes sense. I agree with saving variables which are simultaneously alive. In this code, n and i are not simultaneously alive. I like symmetry, eg: h += up[n] *

Re: sqrt algorithm

2015-08-12 Thread Torbjörn Granlund
Marco Bodrato bodr...@mail.dm.unipi.it writes: We have an explicit example of this: INV_MULMOD_BNM1_THRESHOLD is typically larger than the MULMOD_BNM1_THRESHOLD, the latter is only used internally . OK. These are widely apart, the quotient between them is 3 on average.

Re: 3-prime FFT

2015-07-16 Thread Torbjörn Granlund
paul zimmermann paul.zimmerm...@inria.fr writes: on https://hal.archives-ouvertes.fr/hal-01022383, page 27, table 16, the 3-prime FFT implemented in Mathemagix is faster than GMP for 2^23 to 2^25 bits. Apparently, they do, but just about 10%... Not impressed. :-) We have small primes

Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-15 Thread Torbjörn Granlund
bodr...@mail.dm.unipi.it writes: But I spotted % k and / k there, and those are very expensive, unless you table inverses of k, of course... There is already a single (there are two currently, but we can avoid one) division by k in the code, it is used to compute the first single-bit

Re: sqrt algorithm

2015-07-20 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: To make this fast, we need some variant of divappr_q which don't require any of the uninteresting low limbs. Or alternatively, resurrect the notion of fraction limbs. I would say that padding out the numerator and use divappr_q won't be the

Re: gmpbench update

2015-11-12 Thread Torbjörn Granlund
Joe keane writes: When can we get numbers for 6.1.0? Most numbers on the report page were measured close to the release.
