Re: mpz_t caching
Vincent Lefevre <vinc...@vinc17.net> writes: > In 2014, Patrick Pelissier (in Cc) implemented a mpz_t allocation > cache for MPFR, redefining mpz_init and mpz_clear, in order to > avoid some deallocations/allocations (via the indirect call to > the allocation functions) when mpz_t's cleared and init'ed again > a bit after. I've attached the patch that was applied to MPFR. I take it this was done to improve performance? Do you have any information on what applications benefitted, and some numbers for typical performance improvement? If I understand the patch correctly, the idea is essentially to keep your own free-list for the limb storage pointed to by _mp_d? If you beat libc's malloc, and by how much, is going to be quite system dependent. > Note: A drawback is that it may be necessary to free the caches at > the end of the program so that tools like valgrind don't complain. And there are also thread-safety issues; currently GMP leaves all such problems to the provided allocation function, while a cache would need some thread-local storage (or locks, but I guess that's highly undesirable if the idea is to gain performance). Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Lazy mpz allocation
"Marco Bodrato" <bodr...@mail.dm.unipi.it> writes: >> I'd like a sort of "copy-on-write" interface, that allocates the needed > > I looked into the code and realised that my proposal requires an > MPZ_REALLOC or if(ALLOC(x)==0) branch in ANY place where we write into an > mpz_t. E.g. mpz_combit starts with: > > if (limb_index + 1 < SIZ(x)) { > PTR(x)[limb_index] ^= bit; > return; > } Good point. I'd say we postpone the "copy-on-write" thing for now. And only support the following cases: 1. alloc > 0: Normal case, |size| <= alloc, reallocated as needed. 2. alloc == 0, size == 0: Initial lazy state, produced by mpz_init. Allocated on first write. 3. alloc == 0, size != 0: Allowed for input (read-only) mpz:s. Produced by mpz_roinit. Any attempt at writing the variable is invalid, and should call abort in all cases where checking for this case is cheap enough. One corner case is what should be produced by mpz_roinit with a zero input. After normalization we get size == 0. The intended meaning is that of case (3), but if it is misused as an output variable, it will be treated as case (2), and the allocated storage will leak. But I think it is acceptable to have some random bad things happen on misuse. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Pull request: Raise SIGFPE instead of abort in __gmp_exception`
Yichao Yu <yyc1...@gmail.com> writes: > We (julia) catches the SIGFPE generated by GMP and raise our own > error. Out of curiosity, what are you doing on error? Display a message and exit the process, or do you try to continue? At the very least, you could suffer memory leaks, and perhaps leave other gmp objects in an inconsistent state. Maybe it would be simple and helpful to add a global error callback pointer (where you could install a function looking up some thread-local storage, if you want something thread-specific). But robust cleanup from errors is quite tricky (it has been discussed on list from time to time, one possibility is to combine it with custom memory allocators with markers, and let the error handler longjmp out and clean up and deallocate transient storage). That said, it sounds reasonable to me to replace abort by raise(SIGFPE). A configure check may be needed; I think both abort and raise are ANSI C, but I suspect SIGFPE isn't. Depending on pthreads functions is undesirable. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Pull request: Raise SIGFPE instead of abort in __gmp_exception`
Yichao Yu <yyc1...@gmail.com> writes: > Hmm. What about `#ifdef` on `SIGFPE` and `abort` otherwise. This > should be a sane enough default for the even when the custom handler > is added. Sounds good enough. If noone objects, I'll make a patch to do that and try the patch approval procedure at work. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
mpq_cmp
t how and terminate. If (a == c || b == d) // A large gcd found Compare the un-equal pair of elements and terminate. // Compute full products. If practical, high limb first with early // termination. return sgn (a d - b c) Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Register gmp with pkg-config
Irina Gossmann <irina.gossm...@gmail.com> writes: > Is there any reason for not including a .pc file with gmp? Since it's not > registered with pkg-config, code depending on it won't build without some > addition fiddling (I'm on Linux and OSX, not sure if/how it works > differently on other platforms). I guess the .pc file has to be generated by configure, to get file names right. In a different project, I did it like this: https://git.lysator.liu.se/nettle/nettle/commit/6e371fbfa7107fcd2f89e45693d099e6cd98a0a1 (There might be some helpers with libtool or automake; I don't know those tools too well). pkg-config support should also be documentated, somewhere close to the section https://gmplib.org/manual/Autoconf.html#Autoconf (generated from doc/gmp.texi). > If you have a way of accepting contributions from outside, I can contribute > a pkg-config file. Couldn't find a howto for contributors on the website, > would've sent a patch directly otherwise. Please send patches to this list. For non-trivial changes, we ask for a Free Software Foundation copyright assignment. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Pull request: Raise SIGFPE instead of abort in __gmp_exception`
Yichao Yu <yyc1...@gmail.com> writes: > If I read the patch[1] correctly, the divide by zero is nor performed > if SIGFPE exists. You're right, in that case it skips the divide by zero and goes directly to abort. >> It's possible to add som OSX workaround, say >> >> #if defined (SIGFPE) && defined (MAC_SOMETHING) >> >> But I'd prefer to do that only if there's any real problem. After all, > There is. Can you suggest and test a workaround of that form? > I think the right way is to conditionally (on osx) use `pthread_kill` > (and link to pthread). A link dependency on pthread is highly undesirable. *Maybe* if it can be hidden from users. > I think another difference that worth clarifying between what we want > and what is currently assumed in gmp is that we don't really want the > program to terminate if there's some arithmetic error. Using a signal handler for that sounds brittle to me. I would strongly recommend checking for invalid operands (like, if (mpz_sgn(d) == 0) julia_divide_by_zero_error()) before calling gmp functions which might divide by zero. Allocation failures are a differnt a more difficult matter, but there it's probably better to longjmp out of the allocation function than to raise a signal. > Of course, using a callback to report the error would be even better. > However, as you (or someone else) mentioned, having such API would > probably imply that GMP should still be in a consistent state after an > error is thrown (and handled) But you have very similar consistency issues when using a signal handler. I'm afraid it's going to be a bit brittle. Or are you using some thread local state in the allocation functions, so you can deallocate temporary storage when terminating the thread? > Adding such API and refine it later would still be strictly > better than the current situation though. To me it would make sense with something like mp_set_error_handler (void (*handler)(enum gmp_error err)); where the current __gmp_exception is the default handler. Still tricky to safely use a handler which doesn't terminate the program, but at least cleaner and more portable than using signal handlers. In particular, mixing signals and longjmp really makes me uneasy... Not sure what the other developers think? Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Pull request: Raise SIGFPE instead of abort in __gmp_exception`
Yichao Yu <yyc1...@gmail.com> writes: > However, if I understand correctly, this may cause a regression on OSX > x64 for multi-threaded programs since the signal may be delivered to > any thread instead of the current thread. I don't fully understand the consequences, but if raise ever returns for any reason, __gmp_exception will go on and divide by zero, and if that doesn't cause program termination, it will next call abort. Posix seems pretty clear that raise must deliver the signal to the thread executing raise. http://pubs.opengroup.org/onlinepubs/009695399/functions/raise.html. And the C standard says at least that "If a signal handler is called, the raise function shall not return until after the signal handler does." http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1570.pdf It's possible to add som OSX workaround, say #if defined (SIGFPE) && defined (MAC_SOMETHING) But I'd prefer to do that only if there's any real problem. After all, the typical action is that the program is terminated, and if a signal handler is installed, it typically just adds a friendlier error message before termination. > Unconditionally trying the old approach first should solve this issue. You mean, first try divide by zero, and in case that doesn't crash the program, raise a signal? If we want the specific signal SIGFPE, then I think using raise is preferable, because it has a reasonably well defined meaning in C. While division by zero is undefined behaviour. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Pull request: Raise SIGFPE instead of abort in __gmp_exception`
Yichao Yu <yyc1...@gmail.com> writes: >> Patch checked in now. > > I guess I'm not familiar with hg/gmp work flow. Is checked in > different from commited to the repo? I don't see the change appearing > in the repo[1] and my local fresh clone doesn't have it either. > > [1] https://gmplib.org/repo/gmp/ Maybe there's something wrong with the mirroring to the public repo? It's missing my checkin and a previous one of Torbjörn's, compared to what I see with ssh access to shell.gmplib.org/var/hg/gmp. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: getrusage vs clock
t...@gmplib.org (Torbjörn Granlund) writes: > I believe we already use clock_gettime(2) directly on GNU/Linux, which > is the underlying syscall that clock(3) uses in newer GNU libc versions. Last time I looked, one had to run speed with -o cycles-broken to force using clock_gettime. Maybe it would make sense to use cycle counter and clock_gettime together? But I have no idea how the "supplemented by" logic works. The problem with the cycle counter on linux is that it isn't saved and restored on context switch. clock_gettime is the supported interface for getting high precision per-process cpu time. And it's news to me that the old clock function uses the same timers in recent glibc. But it sounds nice. What's the value of CLOCKS_PER_SEC? Does it match the actual resolution? Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mpn_redc_n
paul zimmermann <paul.zimmerm...@inria.fr> writes: > at the request of Niels I re-raise the issue at > https://gmplib.org/list-archives/gmp-devel/2015-March/003945.html, > i.e., the fact that redc_1 and redc_2 do not have the same convention > than redc_n. When is a good time to try to merge my old bdiv changes? I don't even recall the details, but IIRC, the log message is "New bdiv convention, R B^{qn} = U + Q D", and I think the bdiv return value is also the carry out from that addition. In ~nisse/hack/gmp-bdiv on shell. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Move -DNO_ASM to config.h?
Marc Glisse <marc.gli...@inria.fr> writes: > when configure wants to define a macro, it usually puts it in > config.h. We do have one exception: NO_ASM, which ends up defined in > CFLAGS. Was there a particular reason for this choice? I'm greping for its uses, and it is used only in C files as far as I can see. Moving it to config.h makes sense to me. If we move it, we have make sure that config.h is always included before longlong.h. Most files get config.h via gmp-impl.h. So, e.g., #include "longlong.h" followed by #include "gmp-impl.h" would break. I think the autoconf manual recommends an explicit include of config.h (guarded by HAVE_CONFIG_H) first in every C file. We don't have to do it that way of course, just be aware that we depart from that convention. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Lazy mpz allocation
t...@gmplib.org (Torbjörn Granlund) writes: > I forgot about that we need to have an explicit denominator. > We could surely point that to some static read-only data, but that would > of course incur a cost when "re-allocating". I'm not familiar with mpq internals, but I guess it might be possible to get away with leaving the denominator as zero on initialization. Then one would need to follow the convention that if numerator is zero, then the denominator is ignored. Or (but I guess that's more cumbersome) adapt the convention that zero denominator means "this is an integer", and treated as one. /nisse -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mpn_invert_limb
paul zimmermann <paul.zimmerm...@inria.fr> writes: > would it be possible to add mpn_invert_limb in the public interface? > It would be very useful for GNU MPFR. Makes sense to me, it's a useful and well-defined operations, and discussed at length in various papers. So should be quite easy to document. As far as I understand, it's not always compiled into the library, though. Internally the code uses the macro invert_limb, which expands to a call to mpn_invert_limb only if there exists a native implementation, otherwise, the macro expands to a call to udiv_qrnnd. Not sure if there's any problem with always making mpn_invert_limb a real function and call it explicitly. In the cases where it expands to udiv_qrnnd, it's probably so slow that an extra level of function call shouldn't matter much. Looking at longlong.h, there appears to be some cases where udiv_qrnnd is in turn defined as mpn_invert_limb + udiv_qrnnd_preinv (not entirely clear when that might happen, seems possible only for alpha, arm, cray, and ia64). And there's some room for cleanup in longlong.h, for example a few #if 0, and some constants (UMUL_TIME and UDIV_TIME) which are no longer used, as far as I can see, and most recently mentioned in ChangeLog back in 2002. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Add mpz_inp_str to mini-gmp
t...@gmplib.org (Torbjörn Granlund) writes: > I haven't looked at mini-gmp's mpz_set_str, but I assume it is O(n^2), > so not much is lost by using a trivial implementation like the one > above. It tries to be O(n) when base is a power of two. Which I think is a reasonable tradeoff. I was a bit surprised and disappointed when I found out (a few years ago, no idea if it's still the case) that the base-256 conversion in libtomcrypt, used by the dropbear ssh implementation which is popular on embedded systems, was O(n^2). And one use-case for mini-gmp is constrained embedded systems. > (I don't think we should accomodate spaces in the numbers, since that's > considered a mis-feature of the main GMP.) Then the documentation should perhaps be more explicit and discourage use of that feature. Speaking of documentation, the docs for mpz_inp_str isn't clear at all on what input characters are considered to terminate the input. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Add mpz_inp_str to mini-gmp
Austyn Krutsinger <akrutsin...@gmail.com> writes: > I think the approach is simple enough, but the possible > downside is that the entire string has to be read into memory before being > passed to mpz_set_str, That's no big deal for mini-gmp, it's not intended to be used for huge numbers, or make a lot of effort to be as efficient. Which of course is a judgement call. Austyn Krutsinger <akrutsin...@gmail.com> writes: > According to current documentation mpz_set_str > <https://gmplib.org/manual/Assigning-Integers.html#index-mpz_005fset_005fstr> > allows spaces within the string and mpz_inp_str > <https://gmplib.org/manual/I_002fO-of-Integers.html#I_002fO-of-Integers> > only allows preceding white space. Torbjörn, had mentioned whitespace in an > inputted string is a mis-feature in the main GMP. Just curious what the > intended function of GMP actually is with respect to strings. I'm not sure how mpz_inp_str is intended to behave. If it's acceptable to read everything until white-space or EOF is encountered, that makes things a lot simpler. After a quick look at the code (in mpz), my understanding was that it should terminate at the first non-digit. I.e., for a file containing 0123456789, calling mpz_inp_str with base 10 would read the entire file, but calling mpz_inp_str with base 0 (or 8) would read until it sees the '8', ungetc that character, and convert only the digits preceding '8'. > Additionally, the 6.1.1 documentation says that the mpz_str_ functions > should allow for bases from 2 to 62. Currently mini-gmp only allows for > bases from 2 to 36. If there is any intent on allowing mini-gmp to support > strings with base from 2 to 62, I will write up a patch and submit for > review. Extending support up to base == 62 sounds reasonable to me. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Add mpz_inp_str to mini-gmp
Austyn Krutsinger <akrutsin...@gmail.com> writes: > As I have it now, the only code that is duplicated between my mpz_inp_str > and mpz_set_str is the removal of any leading white spaces between in the > string. I think that duplication is fine. One could add a helper function to eat whitespace, but I don't think that really makes things simpler. I'd like to hear Torbjörn's opinion on using white space as terminator, i.e., reading a file containing "12345678 " with base 8 will result in an error, which I suspect differs from the behavior of mpz_inp_str in the real gmmp. Some more detailed comments further below. > I have updated mini-gmp to support up to base 62 also. Please do this as a separate patch. > I'm really not sure of the optimal way to > allocate then realloc memory for the input buffer. I suppose some kind of > statistical analysis of the size of strings read in from file streams would > be ideal, but since I don't have such statistics, I have arbitrarily chosen > the sizes you see. I think typical cases will be quite small. I'd suggest an initial size of 100 characters, and then increasing the size by a factor of 3/2 at a time. Possibly with some arbitrary upper limit to fail in a predictable way for a malicious input file. > int > mpz_inp_str (mpz_t x, FILE *stream, int base) > { > int c; > size_t nread = 0; > size_t size = 1000; > > if (stream == 0) Either "!stream", or "stream == NULL". > stream = stdin; > > char *buf = (char *) gmp_xalloc (size); The (char *) cast is unnecessary (unless we try to support compilation as C++, do we?). > /* Skip leading whitespace */ > do > { > c = getc (stream); > } while (isspace (c)); > > /* Read input until end of file or a space character */ > while ((c != EOF) && (isspace (c) == 0)) I think "!isspace (c)" is clearer. In in the case c == EOF, I think one ought to also have an if (ferror (stream)) { gmp_free (buf); return 0; } Not sure how the real gmp handles i/o errors in mpz_inp_str, but returning success seems dangerous. > { > if (nread >= size - 1) This looks prone to off-by-one errors (even if I believe it is correct). Maybe let size be the maximum number of digits, with an actual allocation of size+1 to accomodate the terminating NUL character? > { > /* Increase input buffer size */ > size_t old_size = size; > size += 100; I think a multiplicative increase is appropriate. E.g., "size = 3*size/2;". > buf = (char *) gmp_xrealloc (buf, old_size, size); You don't need to bother with passing old_size, mini-gmp is documented to always pass 0 for this argument (see mini-gmp/README). BTW, the restriction to base <= 36 is also documented there. > } > > buf[nread++] = c; > > c = getc (stream); > } > > buf[nread] = '\0'; /* Ensure null terminated string */ "NUL terminated". > ungetc (c, stream); Not sure if ungetc is safe when c == EOF? > c = mpz_set_str(x, buf, base); Don't reuse the c variable, use a separate int res. > gmp_free (buf); > /* 0 on error else number of characters read excluding null-terminator */ Drop " excluding...", or change "null-" to "NUL ". > return (c == -1 ? 0 : nread); > } Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Add mpz_inp_str to mini-gmp
Austyn Krutsinger <akrutsin...@gmail.com> writes: > I am not sure if there was a reason FILE streams were not implemented > before, but if there is I'd be interested in hearing why. mpz_out_str is included in mini-gmp, so it would make sense to also include the companion function mpz_inp_str. However, for mini-gmp, we need something considerably simpler than the code in mpz/inp_str.c. Which isn't entirely trivial. Ideally, it should just read the number and pass it on to mpz_set_str. But to detect the end of the number, it first has to parse any base prefix (like "0x"), and then look for the first non-digit. And it's no good to have a lot of code duplication with mpz_set_str. Without thinking very long about it, maybe one could get something reasonable by extracting the digit logic of mpz_set_str to a helper function. We could also consider omitting support for base == 0 for the mini-gmp version of mpz_inp_str, if that makes things simpler. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: increase mini-gmp base support to 62
"Marco Bodrato" <bodr...@mail.dm.unipi.it> writes: > Applied. But I changed again the code that checks the base. I used the > following: > > digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"; > if (base > 1) >{ > if (base <= 36) >digits = "0123456789abcdefghijklmnopqrstuvwxyz"; > else if (base > 62) >return NULL; >} > else if (base >= -1) >base = 10; It's documented that base = 0 means base 10. But should |base| = 1 also imply base 10? Would make more sense to me to treat that as an error. > else > { > base = -base; > if (base > 36) >return NULL; >} /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mpn_sqrtrem{1,2}
Adrien Prost-Boucle <adrien.prost-bou...@laposte.net> writes: > Maybe the availability of SSE / AVX / NEON etc instruction sets can be > checked at compilation time? That's what configure (and its helper scripts) does. With --enable-fat, we also have runtime detection on certain systems. > The ASM version would be very easy to obtain: > compile sqrtrem1 and sqrtrem2 (an FP implementation) on the right > machine and keep the ASM. Or if these are small and easy functions, write them by hand. > There would be no dependency on libm. > How difficult would it be to add such checks in GMP code? It's not that hard to add new assembly files. You need to know some assembly, of course. And be aware that gmp uses m4 to preprocess assembly source files. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: increase mini-gmp base support to 62
"Marco Bodrato" <bodr...@mail.dm.unipi.it> writes: > But the proposed patch adds constants that enlarge (too much in my > opinion) both the source and the binary of this small-size implementation. I also prefer the conciser version without the large table. > I attach an alternative patch. Please revise it. Looks good. Only detail I don't quite like is the variable name "lowercaseoffset". Maybe "value_of_a" (since it's the value of the digit 'a')? Or at least add an underscore or two, lowercase_offset. Thanks, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Problem with gmp_randinit_set
"Marco Bodrato" <bodr...@mail.dm.unipi.it> writes: > The problem is that Niels' code, mine, and the current mpz code... do all > "reduce" modulo (2^19937-20023) obtaining some non-canonical > representation. I was thinking that one should convert to canonical representation at the end of the powering. But if the current code doesn't do that, staying compatible would need extra care. And it rules out the simple way of just using mpn_powm. (And then I realized that one has to produce a canonical representation also for the initial reduction, since that uses a different modulo, p-4). > If we "do not want to generate different sequences than earlier GMP", we > will have to mimic current behaviour, even in the corner cases... I think that should be doable, if we want to. Without thinking too deeply about it, it looks like your variant with shift should be equivalent to the current mpz code, maybe mine too (it essentially combines the shift with the addmul_1 call). > Is it worth doing? I think it's desirable to eliminate the dependency on mpz. How important it is not change the seed --> sequence mapping, I don't know. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Recent changes to mpn_get_str/mpn_set_str
t...@gmplib.org (Torbjörn Granlund) writes: > I am not sure sliding-window powering makes much sense for powering in Z It could reduce the mumber of mpn_mul_1 calls, but (as always) leave the number of squarings unchanged. > (as opposed to in a ring). But Z *is* a ring ;-) > For radix conversion, we arrange things so that the bases are never > small, so the plain mpn_pow_1 should be very hard to beat. Ah, then sliding window makes less sense, since there will be no interesting single-limb powers to tabulate. And we probably don't want so large table entries that mpn_mul wins over repated mul_1. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mpz reuse test takes too much time
"Marco Bodrato" <bodr...@mail.dm.unipi.it> writes: >> Now we have two possible strategies: >> >> 1. Call mpn_gcdext (B, A), which produces T. And compute S as >> >> S = (G - T B) / A > > This means computing both cofactors, right? Yes. >> 2. First do a division up front, >> >> B = Q A + R >> >>and call mpn_gcdext(A, R). We then get a cofactor S', of m limbs. But >>we also need the other cofactor T', because the wanted cofactor S is > > This also means computing both cofactors, don't it? I think you're right. We compute S' and T' (gcdext after initial division), and then construct S. So I didn't say that we compute T. But in fact, T' = T, since T = (B/G)^-1 (mod A/G), T' = (R/G)^-1 (mod A/G), and B = R (mod A). Right? > I assume that the best way to compute both cofactors is the one currently > implemented in mpz_gcdext. It's best under the theory think that a few large multiplications at the end is more efficient than building up the other cofactor incrementally. But might not be true in all cases... But we should probably reorganize mpn_gcdexp first, before evaluating other strategies regarding the larger cofactor. The divide-and-conquer thing should be done slightly different, and that might make it cheaper to get the other cofactor at least for large operands. >> And a third strategy could be to extend mpn_gcdext to support A < B, and >> hence compute the larger cofactor directly. But I'd guess that would be >> more work, since gcdext uses a quadratic algorithm for much larger sizes >> than for multiplication and division. > > Actually mpn_gcdext supports A<B, but it requires to zero-pad A so that > its length in limbs is not smaller than the length of B. Maybe the way mpn_gcdext handles this can be more or less unchanged. But we shouldn't have to zeropad the input. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: GMP work on symbol visibility
ni...@lysator.liu.se (Niels Möller) writes: > Is this 6 year old post still valid? Hmm, here's a related gcc bug with is marked as fixed, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51880, also marked as a dup of the old bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19520 Ian refers to. > I guess it would also be useful with some function __attribute__ > expressing that "I don't give a damn whether or not pointer comparison > for address(es) to this function work according to the C standard". Anyone else who thinks that would be a good idea? I'm considering filing a bug report/feature request to gcc. Anyway, as to using protected, it seems it's a bit too hairy for its own good. But why doesn't symbol aliases with hidden visibility have the same issues? Say we have an internal symbol mpn_foo_internal with visibility "hidden", and an alias mpn_foo with visibility "default". Then I think the desirable behaviour is that pointers to mpn_foo and mpn_foo_internal are equal if and only if the executable doesn't provide a different implementation of mpn_foo. But I have no idea how that really works. And since symbol aliases aren't C standard, one could argue that the standard's rules for pointer equality don't apply. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: GMP work on symbol visibility
t...@gmplib.org (Torbjörn Granlund) writes: > There are several complex issues I need to understand and resolve. One > issue is how to handle (unit) testing of hidden functions. Do we have a lot of functions which (1) you'd want to hide, and (2) there are unit tests for? I see two simple solutions: Either do those unittests only against the static library. Or use visibility "protected". As I understand it, "protected" does about what we want: Makes sure internal references don't go via PLT / and GOT. There are some drawbacks: It doesn't shrink the symbol table, which would be an issue if we have lots of "protected" symbols, and IIRC, I think Ian said that glibc's ld.so was poorly optimized for this case, so protected symbols could make library load slower than using "default" visibility. /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: [PATCH] Add pkg-config support
Michael Catanzaro <mcatanz...@gnome.org> writes: > Right... using AC_CHECK_LIB has been considered primitive for a long > time now. I was really surprised to see GMP doesn't already have pkg- > config file. I can't imagine why you wouldn't want to support it, since > it's just one file to install. I don't say we don't want to support it, even if I'm not as enthusiastic about it as you are. Also, in my opinion, the main reason to have pkg-config is to make things simpler for projects which are *not* using autoconf, but just want to add CFLAGS=`pkg-config bla bla` directly in their Makefile or build script. > I took a quick peek at the documentation format but it doesn't look > straightforward to update, Source format is texinfo, in the file doc/gmp.texi. If you want to work with GNU packages, you should get at least somewhat familiar with it. > It's optional. One popular convention is that you never break API > without changing the name of the pkg-config file, and adding an API > version there accomplishes that. Are conventions and best practices documented somewhere? > I like it since it's a nice way to ensure > application developers know an API break has occurred, if API breaks > are rare (as they seem to be for GMP). That's tricky to make useful, you have to think carefully about what the number means. There's ABI breaks, API breaks, and compatible extensions, all tricky to capture in a single number. I think I'd prefer to not have any version number here. And if a program needs to test for a particular gmp feature, I *strongly* recommend doing traditional autoconf-style functional tests for the features that are actually needed. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: GMP work on symbol visibility
t...@gmplib.org (Torbjörn Granlund) writes: > We should never forget that the most important aspect of our tests is to > catch miscompilatilons. If we build static and shared libraries from the same object files (I suspect automake and libtool doesn't do that for us now), testing the static library should detect miscompilations. But not mislinking, including broken link-time optimizations. And even if we don't do that, I would hope that the risk that a miscompiled object file is caught by unit tests of internal functions, but not by the tests of public gmp functions, is pretty low. To make an informed decision, I think it would help to compile a rough list of the functions to hide. > http://www.airs.com/blog/archives/307 Is this 6 year old post still valid? The runtime linker could build some index to avoid extra walks over symbol tables. I guess it would also be useful with some function __attribute__ expressing that "I don't give a damn if pointer comparison for address(es) to this function work according to the C standard". > I wonder if one could edit the .so file and make hidden symbols visible > again. Perhaps objcopy knows how to do that? The point of hidden is to completely remove the symbols from the symbol table in the shared object, right? So to get them back, I guess one would need to reconstruct GOT and PLT entries either from the object files or from debug info. Perhaps one can relink the .so file with some linker flag saying "ignore hidden"? Or simply collect the object files into a static library. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: [PATCH] Add pkg-config support
Michael Catanzaro <mcatanz...@gnome.org> writes: > It would be desirable for gmp to install a pkg-config file. This makes > it much easier for other projects to link to gmp, especially projects > that use Autotools build systems. The GMP manual includes the recommended autoconf check. See https://gmplib.org/manual/Autoconf.html#index-Autoconf. If we want to support, or even encourage, pkg-config, this should be documented. > diff -r 31d28753a4e1 -r 95850e2b3e2e Makefile.am > --- a/Makefile.am Thu Jun 02 15:34:47 2016 +0200 > +++ b/Makefile.am Mon Aug 29 08:33:58 2016 -0500 > @@ -113,6 +113,8 @@ > EXTRA_DIST = configfsf.guess configfsf.sub .gdbinit INSTALL.autoconf \ >COPYING.LESSERv3 COPYINGv2 COPYINGv3 > > +pkgconfigdir = $(libdir)/pkgconfig > +pkgconfig_DATA = gmp-6.pc I find the "-6" odd. By which convention should there be a version number in there? > --- /dev/null Thu Jan 01 00:00:00 1970 + > +++ b/gmp-6.pc.in Mon Aug 29 08:33:58 2016 -0500 > @@ -0,0 +1,11 @@ > +prefix=@prefix@ > +exec_prefix=@exec_prefix@ > +libdir=@libdir@ > +includedir=@includedir@ > + > +Name: gmp > +Description: A free library for arbitrary precision arithmetic > +URL: https://gmplib.org/ > +Version: @VERSION@ > +Libs: -L${libdir} -lgmp > +Cflags: -I${includedir} A curious question: If and how does the pkg-config machinery add linker flags like -R or -Wl,-rpath, ? These are needed if using shared libraries installed in some location not included in the default ld.so search path. E.g., say I configure gmp with --prefix=/pkg/gmp/6.1.1/x86_64, without adding the corresponding lib directory to /etc/ld.so.conf or similar. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Flurry of GMP check failures
t...@gmplib.org (Torbjörn Granlund) writes: > But this is odd: At least Debian (didn't have a chance to check Gentoo > yet) does not ban R_X86_64_64 which we use frequently from GMP's > assembly for references to tables of constants. It bans R_X86_64_32S > which we use for tables of function pointer offsets. Hmm, didn't openbsd pioneer this pie-by-default thing some year ago, and there was a similar oddness with supported reloc types? > To me, this discrepancy makes very little sense; either the resulting > user binaries will not be PIE or they will require load-time reloc > patching. IIRC, the motivation was that a 64-bit reloc is large enough to *always* be pathchable to point to the right place at load time, while a 32-bit reloc might be too small, depending in the ultimate location of the target. (If I haven't misunderstood what these relocs are; I have a very fuzzy understanding of the details). And if the code is expected to be pic and sharable at loadtime, neither reloc type should be used in the text or rodata segment; while R_X86_64_64 is perfectly fine for tables placed in the data segment. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Flurry of GMP check failures
t...@gmplib.org (Torbjörn Granlund) writes: > This fails: > $ gcc sym.c -c -fno-pic && gcc sym.o m.c I guess we never need to pass -fno-pic to gcc, we should either pass -fpic, when building shared library, or no pic-reaetd flags at all, when building static libraries and executables. Right? Problem is that when -fpic is on by default, assembly files need to use pic conventions too. And to get this right automatically we'd need configure to detect whether or not pic is on by default. Or change the x86_64 assembly files to always use pic. Is there any penalty to that? My understanding is that pic is somewhat of a pain with 32-bit x86, due to lack if addressing modes using pc + offset, but quite nice and easy on x86_64. > I'm migrating from Debian to Gentoo for GMP's testing, so unless > Debian's approach is gaining momentum in the GNU/Linux world, we could > just ignore this. If debian is following ubuntu, I think we have to say there already is some considerable momentum at work. For reference, in Nettle, everything is built as pic by default since the 1.11 release back in 2004. The motivation back then was the use case that an interpreter, e.g., pike, loading it's crypto support code dynamically using ldopen("Nettle.so"), and the corresponding .so file then linked statically to libnettle.a. To get a non-pic static library, one has to configure with --disable-shared --disable-pic, but I haven't heard about anyone actually doing that. At this point in time, I think use-cases involving static libgmp.a (or libnettle.a) are somewhat obscure. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Flurry of GMP check failures
t...@gmplib.org (Torbjörn Granlund) writes: > 2. Something is broken wrt Debian sid from the past few weeks. >It seems they want static lib code to be PIC. Or rather, PIC code for all executables, to enable randomization of the mapping of the text segment into the process address space. I think the rationale is documented here: https://wiki.debian.org/Hardening. Changes to default gcc seem to be described at https://wiki.debian.org/Hardening/PIEByDefaultTransition It's not entirely clear if and how it's intended to affect builds which don't involve debian-specific tools, maybe something is broken. /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Flurry of GMP check failures
Marc Glisse <marc.gli...@inria.fr> writes: > Uh? I have 2 very natural use cases for libgmp.a: > > 1) to build a static (or at least with few dependencies) executable > that I am going to send to someone else [...] > 2) I am rebuilding GMP on my specific platform in order to get the > best possible performance. [...] I see. In these cases, is it important whether or not libgmp.a uses pic code? Regards, /Niels I might as well use a static libgmp to > squeeze that extra bit of speed out of it (I might even want to use > LTO in that case, although IIRC it currently causes some trouble in > configure). > > Note that I do not care about security in either case. -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mpz_gcd_ext(NULL, ...)
t...@gmplib.org (Torbjörn Granlund) writes: > I suppose it also implies a slight computation overhead > to check size + low limb for the return value. Slight enough to be reasonably for the mpz-layer, I think. One could consider doing that only when the gcd pointer is NULL, and otherwise return some arbitrary value. But that's not a pretty interface to document... > The type of the gcd function change, which breaks e.g. user function > pointers to these functions. You're right, I forgot about that use case. > This sort of things probably below here: > https://gmplib.org/devel/incompatibility.html Agreed, and it's not obvious what's the right interface either. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mpz_gcd_ext(NULL, ...)
t...@gmplib.org (Torbjörn Granlund) writes: > This sort of things probably below here: > https://gmplib.org/devel/incompatibility.html Done. I also added an entry for _mpz_newalloc, while editing that file. www-update failed, though: [nisse@shell /var/hg]$ ./www-update Checking out www. destination directory: www updating to branch default 201 files updated, 0 files merged, 0 files removed, 0 files unresolved Compressing .html and .txt. This can take a minute. ssh: connect to host www.gmplib.org port 22: Operation timed out rsync: connection unexpectedly closed (0 bytes received so far) [sender] rsync error: error in rsync protocol data stream (code 12) at io.c(226) [sender=3.1.2] /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Help stabilising mini-gmp
t...@gmplib.org (Torbjörn Granlund) writes: > Of the remaining https://gmplib.org/devel/mini-gmp-status.html issues I > worry most about #2. Marco adjusted the parameters to make it faster, > but I remain unconvinced that g5.gmplib.org-dyn:32 really needed 400 > seconds for the test with old parameters. I suspect the code can hang > for certain seeds. (We cannot tell the seed because of #1.) If it happens again, the seed should be printed out. https://gmplib.org/repo/gmp/rev/5abbd164e2a3 > I am not sure #8 (alpha/dupont) is for real. I'll try to log in to alpha-gentoo and reproduce. > #15 is strange, I haven't tried to understand why these libgcc link > errors happens for just certain similar configs. The direct links to examples are dead. Do these system normally have LD_LIBRARY_PATH set? If so, maybe something wrong when gmp:s .lib is prepended to the path. > I am glad the "red" bugs had one cause (although a serious one). It > would have been much worse with many errors of this kind. (Well, the > results for end users are same same, but it matters for how to assess > code quality.) When the dust has settled, we'll have to think about whether we should make a "mini-gmp release" with bug-fixes, or just send out an announcement. /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
New mini-gmp macro GMP_CMP?
I was looking at how to simplify mini-gmp's mpz_cmp_si. I considered using mpz_cmpabs_ui to handle the case of both inputs negative, which made it more consice but perhaps not clearer. Then I came up with the following. The pattern (un == 0) ? 0 : u->_mp_d[0] and variants are replaced by calls to mpz_get_ui which does the same thing, and the pattern (a > b) - (a < b) is replaced by a GMP_CMP macro. Resulting context patch below, reducing the code size by a dozen of lines. Do you think this is also an improvement in clarity? /Niels *** /tmp/extdiff.RPnb3V/gmp.afda1bbf3ee3/mini-gmp/mini-gmp.c2016-11-22 09:18:29.455237339 +0100 --- /home/nisse/hack/gmp/mini-gmp/mini-gmp.c2016-11-22 08:54:54.290548487 +0100 *** see https://www.gnu.org/licenses/. */ *** 70,73 --- 70,75 #define GMP_MAX(a, b) ((a) > (b) ? (a) : (b)) + #define GMP_CMP(a,b) (((a) > (b)) - ((a) < (b))) + #define gmp_assert_nocarry(x) do { \ mp_limb_t __cy = (x);\ *** int *** 1778,1784 mpz_sgn (const mpz_t u) { ! mp_size_t usize = u->_mp_size; ! ! return (usize > 0) - (usize < 0); } --- 1780,1784 mpz_sgn (const mpz_t u) { ! return GMP_CMP (u->_mp_size, 0); } *** mpz_cmp_si (const mpz_t u, long v) *** 1795,1805 return 1; else /* usize == -1 */ ! { ! mp_limb_t ul = u->_mp_d[0]; ! if ((mp_limb_t)GMP_NEG_CAST (unsigned long int, v) < ul) ! return -1; ! else ! return (mp_limb_t)GMP_NEG_CAST (unsigned long int, v) > ul; ! } } --- 1795,1799 return 1; else /* usize == -1 */ ! return GMP_CMP (GMP_NEG_CAST (mp_limb_t, v), u->_mp_d[0]); } *** mpz_cmp_ui (const mpz_t u, unsigned long *** 1814,1821 return -1; else ! { ! mp_limb_t ul = (usize > 0) ? u->_mp_d[0] : 0; ! return (ul > v) - (ul < v); ! } } --- 1808,1812 return -1; else ! return GMP_CMP (mpz_get_ui (u), v); } *** int *** 1837,1849 mpz_cmpabs_ui (const mpz_t u, unsigned long v) { ! mp_size_t un = GMP_ABS (u->_mp_size); ! mp_limb_t ul; ! ! if (un > 1) return 1; ! ! ul = (un == 1) ? u->_mp_d[0] : 0; ! ! return (ul > v) - (ul < v); } --- 1828,1835 mpz_cmpabs_ui (const mpz_t u, unsigned long v) { ! if (GMP_ABS (u->_mp_size) > 1) return 1; ! else ! return GMP_CMP (mpz_get_ui (u), v); } -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Help stabilising mini-gmp
t...@gmplib.org (Torbjörn Granlund) writes: > Venerable dupont (alphaev56 gentoo) seems to pass mini-gmp now. Let's > see if the previously consistently failing mini-gmp t-cmp_d now > consistently passes; then I think we can assume this to be fixed. Are there any of the changes which could if it now passes? The LD_LIBRARY_PATH fixes? /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
mini-gmp testing (was: Re: Help stabilising mini-gmp)
t...@gmplib.org (Torbjörn Granlund) writes: > We should at least make sure the algorithms' corner cases are exercised, > e.g., that large quotient are generated for Euclid's algorithm and that > remainders of +-epsilon are used for division. The mini-gmp testsuite is a bit hairy because it uses both gmp and mini-gmp. This works fine with linking, thanks to the name mangling of gmp symbols, but we can't include both gmp.h and mini-gmp.h in the same compilation unit, and we have to types called mpz_t which in general aren't compatible. Most of the current tests works by invoking a function which uses the real gmp to produce random inputs and a reference answer. E.g., (a, b, a+b), or (a, b, gcd(a,b)). These are then translated between gmp mpz_t and mini-gmp mpz_t via hex representation. I think this is reasonable for simple things like add and mul, but maybe not for the more complex things. In particular, for division and gcd(ext), where the result is easy to validate using multiplication and addition, there are different ways to do things. We need to call into gmp to generate inputs or pieces of inputs (because mini-gmp lacks mpz_urandom and mpz_rrandom). And we obviously want to do the operation under test using mini-gmp. But we can choose freely whether or not the code to produce inputs and validate outputs should use gmp or mini-gmp. One opposite way to organize the tests would be to have the main part of the test programs use the real gmp, i.e., include gmp.h rather than mini-gmp.h. And then do wrapper functions for the functions under test, e.g, a mini_mpz_gcd which accepts gmp mpz_t as inputs, converts to hex, passes on to a separate compilation unit which uses mini-gmp.h rather than gmp.h to convert the hex inputs to mini-gmp mpz_t, call the corresponding mini-gmp function, and convert back. That should make it easier to copy testcases from the real gmp. The main drawback I see is that testcases get less clear if there's a hairy wrapper between the the testcode and the function it's testing. Complex wrappers around generation of test inputs and validation of results somehow suits my taste better, but I have no good rational motivation for that. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mpz_gcd_ext(NULL, ...)
Marc Glisse <marc.gli...@inria.fr> writes: > a user was asking if we could support calling mpz_gcd_ext with a NULL > first argument (the gcd), since they are only interested in the > coefficients s and t and would like to save the unnecessary > allocation. I doubt it would save that much, but it seems trivial to > add a check if(g!=NULL) similar to the tests for s and t. Does it make > sense to you? Sounds reasonable to me. But then I'd also consider adding a return value, returning one if the inputs were coprime (gcd == 1), otherwise zero. Would be useful for mpz_invert. If we add a return value, for consistency we should do it for mpz_gcd too, so one could define mpz_coprime_p as #define mpz_coprime_p(u,v) mpz_gcd (NULL, (u), (v)) (Adding an integer return value is a breaking ABI change in theory, but not really in practice, as far as I understand). It won't save much for the computation, only the final copy. But it's also a gain to clarity if callers don't have to pass in dummy, unused, result arguments. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Using MPZ_ROINIT_N in mpz sources
"Marco Bodrato" <bodr...@mail.dm.unipi.it> writes: > A single shared limb is my goal too, but it's a cost (comparable to a > function call?) if we must use a shared symbol for it. IIRC Torbjörn said > that the symbol hiding stuff might not be able to mitigate this cost, > because of the PIC-only policy of some Linux distributions... > On the other side, inlining a function with a static const var inside > means, I believe, replicating lots of statics variables everywhere... Sounds a bit hairy. If we need a function call, ideally, we should have only one, and that should be calling mpz_init... My understanding is that both data and rodata are loaded at a link time constant offset from the code, so the address of a *static* constant would be instruction pointer + link time constant offset. While the address of a global constant is a single load from the GOT, which in turn is located in the data sigment at a link-time constant offset from the code. So if we have x._mp_data = _limb; and %rdp points to x, that should be something like lea (%rsp, constant_limb), %rax move %rax, _mp_data_offset(%rdp) or mov (%rsp, constant_limb@GOT), %rax move %rax, _mp_data_offset(%rdp) for a static and externally visible constant, respectively. Do I get this right? > One to be used in the declaration phase: > mpz_t temp = MPZ_PREINIT(...); I was thinking of something like mpz_t temp = MPZ_INITALIZER; The nice thing about that is that it should work also for compound initialization, like struct { int x; mpz_t y; } foo = { 1, MPZ_INITIALIZER; }; I think it might be useful to eventually make public too. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Seeding in mini-gmp
t...@gmplib.org (Torbjörn Granlund) writes: > Should seeds really be limited to an unsigned long and at the same time > to 4 bytes? Both limits seem unnecessary. It's platform dependent. On 64-bit machines, unsigned long is 64 bits (except windows...), and we read 8 bytes from /dev/random. Which ought to be enough. On 32-bit machines, using gmp_randseed_ui limits the seed to 32-bits. If that's a problem, we need to fix it; I just tried to keep things simple. > I haven't looked deeper into the code, but if there is a seed function > which accepts an mpz_t, then please consider using it instead. And then > follow GMP's example and read 6 bytes of random data from /dev/urandom. There's gmp_randseed. I guess we could use that. Why 6 bytes (48 bits), and not, e.g., 64 bits? Hmm, now I realize that it's quite important to be able to feed the same seed to both 32-bit and 64-bit builds; then it's no good to use gmp_randseed_ui with a platform-dependent range. I'll have to fix that. > +/* Unsigned long may be only 32 bits, and then a plain microsecond > + count would wrap around in only 71 minutes. So instead, xor > + microseconds with the most significant second bits, which are > + the least "random". */ > +return tv.tv_sec ^ (tv.tv_usec << 12); > > You probably need a cast there, else you'll typically end up with 32-bit > arithmetic there. (Or even better, use mpz_t here too.) For this piece of code, I don't think we have much use for more than 32-bits. Iff time_t (type of tv_sec) is 64-bit, we'll use the bits saying whether or not we're beyond 2038, which isn't particularly random. If we switch to using a 48-bit mpz_t instead, we could do this differently and get a better range. E.g, low 48 bits of 100*tv_sec + tv_usec would wraparound in almost 9 years, and tv_sec + tv_usec << 28 would also fill 48 bits and collide rarely. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Help stabilising mini-gmp
t...@gmplib.org (Torbjörn Granlund) writes: > I re-enabled it again, expect hundreds of failures due to the ABI > adherence issue. It's not easy to suppress those. I think this small change should fix that, but I can't test it at the moment. --- a/Makefile.am Wed Nov 16 23:07:02 2016 +0100 +++ b/Makefile.am Thu Nov 17 07:12:08 2016 +0100 @@ -433,7 +433,7 @@ check-mini-gmp: MINI_GMP_DIR="$$abs_srcdir/mini-gmp" \ LDFLAGS="-L../../.libs" \ LIBS="-lgmp -lm" \ - CC="$(CC_FOR_BUILD)" EXTRA_CFLAGS="-g -I../.." check + CC="$(CC)" EXTRA_CFLAGS="$(CFLAGS) -g -I../.." check clean-mini-gmp: if [ -d mini-gmp/tests ] ; then \ Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Help stabilising mini-gmp
t...@gmplib.org (Torbjörn Granlund) writes: > I realise that mini-gmp's requirement of GNU make hurts GMP's build > environment compatibility testing; until now, we have tried to use plain > 'make' on almost all systems. If mini-gmp now forces the entire build > to use GNU make, then that's bad. I agree. Is there any easy way to use GNU make only for make check-mini-gmp? Either in test scripts, or automatically in that target (the make used for the target itself doens't matter, it's the recursive make, currently using $(MAKE), that really needs GNU make. /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Help stabilising mini-gmp
t...@gmplib.org (Torbjörn Granlund) writes: > https://gmplib.org/devel/tm/gmp/check/failure/bwldeb64v7.gmplib.org-dyn-fat:64.txt That's t-powm. I can reproduce that one locally. GMP_CHECK_RANDOMIZE=1479273572 ./t-powm > So we might have 3 pending arithmetic bugs in mini-gmp (unless the gcd > one is a clang bug). I left while GMP_CHECK_RANDOMIZE=0 ./t-gcd ; do : ; done running over night, with no failures (64-bit x86_64, gcc-6). But I can reproduce the problem with the data from shell.gmplib.org-stat-clang35-clang++35-fat:32.txt, GMP_CHECK_RANDOMIZE=1479112190 ./t-gcd So this one isn't clang only. /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Help stabilising mini-gmp
t...@gmplib.org (Torbjörn Granlund) writes: > For those issues which result in a check/failure entry, I have marked > them as "expected" (by means of ~gmp/gmp/expected_check_failures). I > did that to make sure these failures do not hide any problems with GMP > itself. I tried to mark the mini-gmp t-limbs section with , but saving my changes from emacs failed, Cannot write backup file; backing up in /home/nisse/%backup%~ basic-save-buffer-2: Doing chmod: operation not permitted, /home/gmp/gmp/expected_check_failures > If you check in a fix for any of these problems, please strike the > entire item by using Please also clean up the > ~gmp/gmp/check/failure report files. How, more precisely? Just delete the corresponding files? E.g, shell.gmplib.org-dyn-gcc49-g++49:64.txt shell.gmplib.org-dyn-gcc49-g++49:64.txt.gz /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Help stabilising mini-gmp
t...@gmplib.org (Torbjörn Granlund) writes: > I made that and another change to now get the test to run on the > majority of our systems. The other change was this one: > > https://gmplib.org/repo/gmp/rev/88fead579828 > > This makes sure linking knows the ABI. To make sure I understand the issue, this is to get flags like -m32 passed on to the linker? > I am not sure there are no side-effects conflicting with the author's > intended Makefile behaviour. I don't think so. But I think the right way is to simply add $(CFLAGS) to the linker command line. That's the usual way to use it, right? And omitting it in the linking rule in mini-gmp/tests/Makefile is a bug. > Another problem with the Makefile.am check-mini-gmp code has surfaced. > It overrides LD_LIBRARY_PATH (and DYLD_LIBRARY_PATH) for the entire > mini-gmp build. That includes gcc, which as you know is linked to GMP. > The result is compilation problems on at least one platform. Ah, that's an interesting side effect. To do something more specific to the tests, how should that be named? Should we adopt the EMULATOR variable (from automake, not sure if it's still in the current version)? Then one would set EMULATOR="LD_LIBRARY_PATH=... DYLD_LIBRARY_PATH=..." \ $(MAKE) ... when running mini-gmp tests. I think EMULATOR is already supported by the version of run-tests script which is in mini-gmp, it runs the test programs ("$1" is the executable to be run) using if [ -z "$EMULATOR" ] || head -1 "$1" | grep '^#!' > /dev/null; then "$1" $testflags else $EMULATOR "$1" $testflags fi > There is also a problem with t-signed on ppc64 machines. I noticed one of those failures, and it was't trivially reproducible on x86_64. Needs some investigation. The first failure was also a bit interesting, an internal compiler error with gcc-6. In file included from gmp/mini-gmp/tests/testutils.c:24:0: gmp/mini-gmp/tests/../mini-gmp.c: In function 'mpz_set_d': gmp/mini-gmp/tests/../mini-gmp.c:1647:3: internal compiler error: Aborted if (x != x || x == x * 0.5) ^~ https://gmplib.org/devel/tm/gmp/build/failure/sky.gmplib.org-dyn-fat-fake:64.txt Begs for a proper gcc bug report, I guess. I fail to log in to sky, via nshell, at the moment ("ssh -p localhost" on shell fails with connection refused). And this failure was on the non-virtualized host sky, right, not on one of the vms? /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Help stabilising mini-gmp
t...@gmplib.org (Torbjörn Granlund) writes: > ni...@lysator.liu.se (Niels Möller) writes: > I don't think so. But I think the right way is to simply add $(CFLAGS) > to the linker command line. That's the usual way to use it, right? And > omitting it in the linking rule in mini-gmp/tests/Makefile is a bug. > > Since CFLAGS includes -Iblah it seemed like a uglier choice. Changed to pass CFLAGS when linking (according to GNU standards), and move -I flags to CPPFLAGS where they belong. > It seems "trivially" reproducible on ppc64 though, both real metal > versions and fake ones like ppceb-debv8 (and ppcel-debv8). I'll try to debug (if no one else beats me to it), but not today. > In file included from gmp/mini-gmp/tests/testutils.c:24:0: > gmp/mini-gmp/tests/../mini-gmp.c: In function 'mpz_set_d': > gmp/mini-gmp/tests/../mini-gmp.c:1647:3: internal compiler error: > Aborted > if (x != x || x == x * 0.5) > ^~ > > That's the LD_LIBRARY_PATH bug, I think. That could explain it (I was first thinking that "ICE" meant a failed assert in the compiler code, but I guess any SIGSEGV or similar shows the ICE message?). I just pushed an LD_LIBRARY_PATH improvement, we'll see if it reduces the number of failures. /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Help stabilising mini-gmp
Marc Glisse <marc.gli...@inria.fr> writes: > After inlining, there are subtractions. check_si is called at least > once with oi = si + c (c is ±1). gcc simplifies the test si > si - 1 > to true, and warns that this optimization may break your program if > you rely on wrapping. I suspect this test program relies on it. With c == 1, si takes the values 2, 4, 8, ..., (1<<62), (-1<<63), and I guess the check gcc warns about is intended to detect overflow and act as a stop condition. And my adding of debug printf might have pushed the size of the function over some threshold so that it's no longer inlined, comparison not optimized away, and then the test succeeded. > The usefulness of such a warning is debatable, and we tend to drop > some of them from gcc when we think nobody will notice. Not sure. There are two dangers: Programs relying on undefined behavior, and optimization based on the assumption that there will never be any undefined behaviour. Anyway, I can now repro locally, by running make check CFLAGS="-O -Wall -g -fsanitize=undefined -fno-sanitize-recover" in the mini-gmp/tests source directory. This fails with t-signed.c:93:8: runtime error: signed integer overflow: -1 + -9223372036854775808 cannot be represented in type 'long int' FAIL: t-signed It would make sense to test both gmp and mini-gmp with -fsanitize=undefined. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Help stabilising mini-gmp
t...@gmplib.org (Torbjörn Granlund) writes: > ni...@lysator.liu.se (Niels Möller) writes: > > I tried to mark the mini-gmp t-limbs section with , but saving my > changes from emacs failed, > > My bad, please try now. Worked better now. > How, more precisely? Just delete the corresponding files? E.g, > > shell.gmplib.org-dyn-gcc49-g++49:64.txt > shell.gmplib.org-dyn-gcc49-g++49:64.txt.gz > > Yes, if permission allows it. :-) Done, 6 files deleted. /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Help stabilising mini-gmp
"Marco Bodrato" <bodr...@mail.dm.unipi.it> writes: > Should we implement (or adapt) something like the tests/devel/try thing to > widely test the basic operations in mini-gmp? I think devel/try is mainly for testing assembly code, to find register clobbering bugs and the like. For mini-gmp, I hope running the testsuite with a large number of random seeds should give good coverage, so we don't need any additional framework. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Reorganized mini-gmp/tests/t-signed.
ni...@lysator.liu.se (Niels Möller) writes: > Anyway, I can now repro locally, by running > > make check CFLAGS="-O -Wall -g -fsanitize=undefined -fno-sanitize-recover" > > in the mini-gmp/tests source directory. This fails with > > t-signed.c:93:8: runtime error: signed integer overflow: -1 + > -9223372036854775808 cannot be represented in type 'long int' > FAIL: t-signed I've checked in a fix for this. https://gmplib.org/repo/gmp/rev/6a2a9d2f639c Marco, would mind having a look at the reorganized test? I deleted checks at the end of the loop which I didn't quite understand, and which didn't quite fit with the new stop condition. The idea is to instead of checking for overflow by examining signed integer values, compute the mpz version first and use mpz_fits_slong_p to check if the computation on signed long makes sense. And I added tests to check that that function agrees with LONG_MAX and LONG_MIN. I also avoid even computing the signed values when they wouldn't fit, to make it possible to use -fsanitizer=undefined without complaints. It would be cool if -fsanitizer=undefined could cooperate with valgrind, and not complain immediately when there's an undefined value, but instead taint that value, and complain if it or anything which depends in it is ever used for a branch condition, memory access or system call. (And strictly speaking, that would make sense only for implementation defined behaviour, not undefied behaviour, I guess). Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Help stabilising mini-gmp
t...@gmplib.org (Torbjörn Granlund) writes: > There are currently 10 outstanding mini-gmp issues according to > <https://gmplib.org/devel/mini-gmp-status.html>, will be 9 if your > LD_LIBRARY_PATH change is effective. Is it up-to-date? I think the three red arithmetic bugs (t-gcd, t-powm and t-div) were all caused by the bug in mpn_invert_3by2, so hopefully fixed now. Or have you seen any new failures for those? /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Help stabilising mini-gmp
t...@gmplib.org (Torbjörn Granlund) writes: > 2. The latest Gentoo doesn't support -fno-sanitize-recover. I suppose >it works without it? As I understand it, the point of -fno-sanitize-recover is to make programs crash on the spot when there's some undefined operation. Not sure what the behaviour is without that, if the program might complete normally with just logging of problems, or if detected problems still makes it exit with non-zero status. /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Help stabilising mini-gmp
t...@gmplib.org (Torbjörn Granlund) writes: > I created a web page for tracking the mini-gmp problems: > > https://gmplib.org/devel/mini-gmp-status.html From that page: (P2) The seeding triggered by GMP_CHECK_RANDOMIZE is apparently based on seconds since epoch (with some funny skew). This is not good enough, as tests which run at the same time will get the same seed. Seed is currently time(NULL) + getpid(), so a loop like while GMP_CHECK_RANDOMIZE=0 ./t-foo ; do : ; done will typically use an incrementing sequence of seeds, and not repeat until the pid space wraps around. Main gmp tests seem to use /dev/urandom if available, falling back to gettimeofday if available, otherwise only the time function (no pid). For mini-gmp tests, it was my intention that tests should be able to run standalone from the gmp tree (assuming gmp and GNU make is installed on the system, just "make -C tests check" in a mini-gmp source tree should work). So it shouldn't depend on any configure tests, and ideally should use only standard C functions, or the most mainstream functions outside of the standard. (And depending on getpid is not ideal... but a significant improvement over only time(NULL)). Attempting to open /dev/urandom seems very reasonable to me. If we do that, the remaining question is how elaborate a fallback we need if opening fails. If really needed, we could also propagate certain gmp configure values in the make check-mini-gmp target, e.g., HAVE_GETPID and HAVE_GETTIMEOFDAY. /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Using MPZ_ROINIT_N in mpz sources
"Marco Bodrato" <bodr...@mail.dm.unipi.it> writes: > I do not remember if we require C99 to compile the library... > > Can we use MPZ_ROINIT_N inside our sources to improve readability? e.g. For what it's worth, IIRC the non-constant initializer used by MPZ_ROINIT_N has been supported by gcc more or less forever, at least since 2.3 from the mid 1990s. And it would make some sense to me to require c99 or gcc. > Moreover, can we slightly abuse it to obtain a static lazy init w/out > calling the _init function? E.g. If we want to do that, I think we should have a separate macro for the initializer (and preferably point to some shared read-only limb). Or make mpz_init inline? Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mpz reuse test takes too much time
t...@gmplib.org (Torbjörn Granlund) writes: > Target operands which are not also input operands are now overwritten > with garbage via the CLOBBER macro. (It would make sense to do this in > almost every other GMP test file.) > > I split your gcdext check macros into 2-operand and 3-operand variants. > The old macros checked res3 even when it was not part of the > computation, exhibiting possibly undefined behaviour. > > OK with you to commit? Looks good to me, go aehad. /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mpz reuse test takes too much time
"Marco Bodrato" <bodr...@mail.dm.unipi.it> writes: >> ! #define GCDEXT_CHECK2(i1, i2) do { \ >> !mpz_gcdext (res1, res2, NULL, i1, i2); \ >> !MPZ_CHECK_FORMAT (res1);\ >> !MPZ_CHECK_FORMAT (res2);\ >> !if (mpz_cmp (ref1, res1) != 0 || mpz_cmp (ref2, res2) != 0) \ >> ! FAIL2 (mpz_gcdext, i1, i2, NULL); \ >> ! } while (0) > > Should we check both mpz_gcdext (res1, res2, NULL, i1, i2); and mpz_gcdext > (res1, NULL, res2, i2, i1) in this macro? Hmm, you changed the code to not only allow G to be NULL, but allow NULL S too? I think that's unnecessary. As you say, mpz_gcdext(G, NULL, NULL, A, B) is better done as mpz_gcd(G, A, B), and mpz_gcdext(*, NULL, T, A, B) is equivalent to mpz_gcdext(*, T, NULL, B, A). (Maybe in the latter case, return value can differ in some corner case, so I'm not sure it's 100% equivalent, but in general swapping the inputs A and B implies swapping of the S and T outputs). So allowing NULL S adds more cases to implement and test, with no gain in functionality. > Since we currently support NULL also for the first argument, should we > check also the cases > mpz_gcdext (NULL, x, y, x, y); > mpz_gcdext (NULL, y, x, x, y); > mpz_gcdext (NULL, s, y, x, y); mpz_gcdext (NULL, y, t, y, x); > mpz_gcdext (NULL, s, x, x, y); mpz_gcdext (NULL, x, t, y, x); > mpz_gcdext (NULL, NULL, y, x, y); mpz_gcdext (NULL, x, NULL, x, y); ^ > mpz_gcdext (NULL, NULL, x, x, y); mpz_gcdext (NULL, y, NULL, x, y); ? ^ Maybe, except for the above two. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mpz reuse test takes too much time
Marc Glisse <marc.gli...@inria.fr> writes: > On Sat, 3 Dec 2016, Niels Möller wrote: > >> Hmm, you changed the code to not only allow G to be NULL, but allow NULL >> S too? > > That was already the case before my patch. Sorry for the confusin, I had forgotten it was your change. > Since s and t may get swapped depending on the values of a and b, you > cannot really support t==NULL without supporting s==NULL. You may > prefer not documenting it though... I think that's an implementation detail that shouldn't be documented. Can we just strike it from the docs? (And possibly back it up with an ASSERT at function entry). In the case mpn_gcdext(G, S, NULL, A, B) with size(A) < size(B), IIRC we currently handle that by swapping arguments, calling mpn_gcdext to compute T (even though the caller didn't need that, and produce S with an multiplication using multiply + exact division. That doesn't seem ideal. The alternative would be to do an up-front division, and call mpn_gcdext with inputs A, B mod A, and some different postprocessing (I don't have the cofactor stuff in my head at the moment, it should be somewhere between easy and trivial, and likely involving the quotient floor (B/A). One also needs to keep in mind that if |A| < |B|, then typically |T| < |S| too. Which could influence algorithm choice if the size difference |is large. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Changes for GMP 6.1.2
t...@gmplib.org (Torbjörn Granlund) writes: > Whatever is safer. > > I think GMP isn't jealous, it can handle being less lazy than mini-gmp. I think it will be easier for us to keep track if we copy head mini-gmp (except the ChangeLog file which now needs special handling). Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mpz reuse test takes too much time
t...@gmplib.org (Torbjörn Granlund) writes: > Please check code spacing and . vs , in comments. Done and checked in. > Shouldn't res1, res2, res3 be overwritten with some fixed garbage when > they are not also input operands? This remains to do, as well as updating the corresponding mini-gmp test. /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mpz reuse test takes too much time
ni...@lysator.liu.se (Niels Möller) writes: > In the case > > mpn_gcdext(G, S, NULL, A, B) > > with size(A) < size(B), IIRC we currently handle that by swapping > arguments, calling mpn_gcdext to compute T (even though the caller > didn't need that, and produce S with an multiplication using multiply + > exact division. I've looked a bit more at this, and it's not at clear to me what strategy one should use. The cofactors satisfy G = S A + T B, where S is the same order of magnitude as B/G, and T is of the same order of magnitude as A/G. For simplicity, start with the case of small G so we don't have to care about its effect on the sizes, and assume that B is n limbs, A is m limbs, and n > m. Then we'd also expect S to be n limbs and T to be m limbs. Now we have two possible strategies: 1. Call mpn_gcdext (B, A), which produces T. And compute S as S = (G - T B) / A that's one n x m multiply for T B, and one (n + m) / m exact division. 2. First do a division up front, B = Q A + R and call mpn_gcdext(A, R). We then get a cofactor S', of m limbs. But we also need the other cofactor T', because the wanted cofactor S is S = S' - Q T' That's one (n - m) x m multiply. To get T', we have to use T' = (G - S' A) / R Which is a m x m multiply and an 2m / m exact division. I had a quick look at also substituting Q = (B - R)/A into the equations, but that didn't lead to any simplifications. (And we don't count the cost of the initial division, bacause it will be needed for strategy (1) too, done by mpn_gcdext). So the question is, which cost is smallest, MUL(n, m) + DIVE(n + m, m), or MUL(m, m) + MUL(n - m, m) + DIVE(2m, m) ? With subquadratic multiplication, I'd expect MUL(n, m) < MUL(m, m) + MUL(n-m, m) or am I confused? But DIVE(n + m, m) > DIVE(2m, m), since the quotient is larger. And a third strategy could be to extend mpn_gcdext to support A < B, and hence compute the larger cofactor directly. But I'd guess that would be more work, since gcdext uses a quadratic algorithm for much larger sizes than for multiplication and division. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Test compile: mpn_lshift_com optimization 2
Richard Biener <rguent...@suse.de> writes: > This test intoduced with GMP 6.1.2 FAILs with malloc debugging > because it appearantly relies on malloc returing zeroed memory? > Or it tests the wrong array elements? Can you please clarify which test program fails? There's no "t.c" or "a.out" files in my gmp tree. Please also provide the additional information requested in the Bug reporting instructions in the manual. For what it's worth, TESTS_ENVIROMENT=valgrind make check works fine in my gmp build. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mpz_gcd_ext(NULL, ...)
"Marco Bodrato" <bodr...@mail.dm.unipi.it> writes: > While we are looking at gcdext, I propose a small change to save an > allocation when t must be computed. Can you review it, to check if the > size assumptions are safe? Sorry, I haven't looked too closely at this code... I guess it still allocates an extra limb for some arguments, which no longer is required by mpn_gcdext. > PTR () = tmp_gp; > SIZ () = gsize; > > PTR () = tmp_sp; > SIZ () = tmp_ssize; This style makes my uneasy... I'd prefer to either use real reallocatable mpz_t for destination operands, or use mpn functions directly... But that's unrelated to this change. You did one more thing in the committed change 84ce52f0d360: --- a/mpz/gcdext.c Tue Dec 13 19:20:31 2016 +0100 +++ b/mpz/gcdext.c Sat Dec 17 21:26:50 2016 +0100 @@ -64,7 +64,7 @@ mpz_gcdext (mpz_ptr g, mpz_ptr s, mpz_pt if (g != NULL) { - gp = MPZ_REALLOC (g, asize); + gp = MPZ_NEWALLOC (g, asize); MPN_COPY (gp, PTR (a), asize); SIZ (g) = asize; } That will break in case g == a, right? So we'd need to either keep MPZ_REALLOC, or check for that special case. So I'm afraid the reuse test doesn't hit this corner... Regards, and Happy New Year, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Test compile: mpn_lshift_com optimization 2
Emmanuel Thomé <emmanuel.th...@inria.fr> writes: > On Wed, Jan 04, 2017 at 03:08:57PM +0100, Niels Möller wrote: >> I'm also a bit puzzled. And after a quick look, I can't find any >> lshift_com function in mul_fft.c (or anywhere else but acinclude.m4). > > It's mpn_lshiftc (and by the way the comment in ./mpn/generic/lshiftc.c > is misleading -- lshiftc is not lshift). Interesting. I thought that mpn_lshiftc was analogous to, e.g., mpn_add_nc and mpn_addmul_1c, with an additional argument providing the bits to be shifted in. But it seems I were completely wrong. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mpn_sqrtrem1
"Marco Bodrato" <bodr...@mail.dm.unipi.it> writes: > I decided to try an as-branch-free-as-I-can Karatsuba implementation for > umul_ppmm in the simpler environment: mini-gmp. > > Apply the following patch, define UMUL_PPMM_USE_KARATSUBA, and experience > a slowed-down, larger-code version of mini-gmp ;-D > +__x1 = (mp_limb_t) __ul * __vl; \ [...] > +if (__vh)\ > + { \ > + __x2 += __x1; \ > + __ul += __x2 < __x1;\ > + } \ > +else \ > + { \ > + __ul -= __x1 > __x2;\ > + __x2 -= __x1; \ > + } \ I think I'd arrange this as a branch free conditional negation + sign-extend of __x1. If __vh represents sign as 0 or ~0, that would be something like __x1 = (__x1 ^ __vh) - __vh; Now {__vh, __x1} is a two-limb two's complement representation of the product (__ul - __uh) * (__vl - __vh), where the notation in the latter expression refers to the initial values of these variables. Which can be accumulated into the final product. But it still won't beat the plain version on any machine with a reasonable mul-instruction. /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: More mini-gmp asserts?
t...@gmplib.org (Torbjörn Granlund) writes: > ni...@lysator.liu.se (Niels Möller) writes: > > Hi, I'm considering adding no-overlap asserts to mini-gmp's mpn_mul > function. Any reason not to? If we do it by copying the MPN_OVERLAP_P > macro, we should add a GMP_-prefix for namespace reasons. > > This makes sense to me (since I haven't participated actively in > mini-gmp's development I don't have a string opinion). Done. Maybe such asserts should be addded in more places in mini-gmp? Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mpn_invert_limb
t...@gmplib.org (Torbjörn Granlund) writes: > These date back to 68020 and 80386. I.e., quite obsolete. :-) Gone now. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mpn_sqrtrem{1,2} - patch for pure C implem
"Marco Bodrato" <bodr...@mail.dm.unipi.it> writes: > The constant (c) I added to the first iteration should be incorporated in > T1, and of course it is not necessary to really use a constant. A > different value can be used for each entry in the table... In Newton iteration, one usually converges from one side, and it's tempting (to ease analysis and enable use of unsigned arithmetic) to maintain a known and fixed sign of the error. But one could gain about one bit of accuracy with error range roughly centered around zero. For the initial iteration with tabulated values, you'd adjust each table entry, as you say. In later iterations, one could add a suitably choosen constant if one is willing to handle signed adjustments cancellation logic. When I worked on reciprocal quite a while ago, I aimed for a signed symmetric error for the initial table lookup, but for the main iterations, I designed for a non-negative error and I didn't explore using signed errors. > I believe that we can define > > h(a) = a >> GMP_NUMB_BITS - 9 /* the highest 9 bits */ > > l(a) = (a >> GMP_NUMB_BITS - 20) & ((1<<12)-1) /* the next 11 bits */ One could save an instruction with using just l(a) = (a >> GMP_NUMB_BITS - 20) and adjusting the tables accordingly. You'd get a larger multiply, but limiting to 16x16 is probably not a saving on relevant platforms. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: GMP's x86-32 performance
t...@gmplib.org (Torbjörn Granlund) writes: > Our latest batch of x86-32 code dates from 2011 (for the original Intel > atom) but we have not done anything for high-end AMD and Intel CPUs > (e.g., AMD k10, bulldozer, piledriver, steamroller, excavator, zen, or > Intel penryn, nehalem, sandybridge, ivybridge, haswell, broadwell, > skylake, kabylake) in a very long time. When are those later cpus run in 32-bit mode? M$ windows or mac applications? I would have expected 64_64 mode, possibly with some use of the x32 abi (small pointers), to be used almost exclusively by now. > What do I have in mind? I believe pmovzxdq, pmuludq, psrlq (or some > shuffle insn), and paddq could be used to build an addmul_2 which runs > at at close to 1 cycle/limb using sse2, I think I looked at pmuludq in the past, the variant doing two 32x32->64 multiplies, without having any success. IIRC, the throughput of that instruction on then current cpus was too poor to make it useful. Other possible reasons for failure: (i) I didn't try hard enough, (ii) too much shuffling around of the operands are needed. BTW, speaking of addmul_2. Where current addmul_2 wins over addmul_1, that's because we get more independent mul instructions and can more easily saturate multiplication units. At least, that's my understanding. We've considered using karatsuba aka toom2 for addmul_2, but it has always turned out that saving 1/4 of the multiply instructions is very easily eaten up by the additional operations needed. But the other day, it striked me that we might also try doing addmul_2 using toom32, which would save 1/3 of the mul instructions. Toom32 is nice because we can use the four easiest evaluation points: 0, infinity, and +/-1. Or addmul_3 using toom32, which has the additional advantage that more of the evaluation work is loop-invariant, and we could also jump to separate innerloops depending on the carry bits from evaluation. Perhaps this is still crazy, and useful only for machines with very slow multiplication. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
addmul_k with toom (was: Re: GMP's x86-32 performance)
t...@gmplib.org (Torbjörn Granlund) writes: > ni...@lysator.liu.se (Niels Möller) writes: > we might also try doing addmul_2 using toom32, which > would save 1/3 of the mul instructions. Toom32 is nice because we can > use the four easiest evaluation points: 0, infinity, and +/-1. > > Give it a shot! A concern with any toom stuff is side-channel leakage. > We currently assume _basecase is non-leaky. So any low-level trickery > needs to be applied above the _basecase cutoff points (unless we split > out _basecase_sec functions). If we do this with toom32, we'd need a separate basecase_sec. I think there's little chance to gain any speedup if we enforce side-channel silence. Looks better for toom2, there I think it costs only a single instruction in the innerloop to handle all signs in a side-channel silent fashion. If we don't trust cmov to be side-channel silent, we'd also need to do conditional negation using a mask and xor + add instead of cmov, unclear what that means for performance. > Or addmul_3 using toom32, which has the additional advantage that more > of the evaluation work is loop-invariant, and we could also jump to > separate innerloops depending on the carry bits from evaluation. I've looked a bit more at this, writing some pseudo C code for the simplest case where both evaluated values v0 + v1 + v2 and v0 - v1 + v2 are even, non-negative, and fit in a single limb after division by two. One iteration of addmul_3 would then need roughly 5 alu instructions for evaluation, 4 multiplies, 8 instructions to combine products into odd and even terms (u0 v0 + u1 v1 + u0 v2) and (u0 v1 + u1 v0 + u1 v2), and another 14 for the largish final additions, including the three carry words from the previous iteration. If we divide by 6 (since this corresponds to the work of 6 iterations of addmul_1), that's 2/3 multiply instructions and 4.5 alu instructions. A sketch for addmul_2 using toom2 (and evaluating at the -1 point) gives 3 instructions for evaluation, 3 multiplies, 13 instructions for interpolation and addition. Divide by 4, to get 3/4 multiply and 4 alu instructions to do the same amount of work as a single addmul_1 iteration. To compare to x86_64/aorsmul_1.asm, which uses one multiply and 3 adc instructions per iteration, and at best runs at 2.5 cycles per limb, and x86_64/coreibwl/addmul_1.asm, which uses an iteration consisting of mulx, adox, adcx, and runs at about 1.66 cycles per limb. > Slow hardware multiplication helps. :-) At the list https://gmplib.org/devel/asm.html, we have the "Intel noc" (what's that?) with current addmul_1 running at 15 cycles per limb, and Intel Atom where it runs at 19 cycles / limb. Writing up an addmul_2 for intel atom using toom2 is probably the most promising step in this direction. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Fast mpz_bin_ui
Fredrik Johansson <fredrik.johans...@gmail.com> writes: > The current mpz_bin_ui is not asymptotically fast. I propose replacing it > with a call to mpz_bin_uiui when n fits one word (as already suggested in a > code comment) and a plain divide and conquer product otherwise. If I get this right, the speedup comes from a more balanced multiplication tree for n (n-1)...(n-k+1), right? I wonder if one can do something more clever. mpz_bin_ui_ui seems to count the multiplicities of primefactors, somewhat related to the factorial code, I think. It would be nice to avoid having to compute k! and divide it out at the end. For a start, consider the powers of two, and for concreteness, say n = 1000, k = 15. Let !! denote semifactorial, k (k-2) (k-4) We can compute k! = 15!, taking out powers of two, as 15! = 15!! 14!! = 2^7 15!! 7! = 2^7 15!! 7!! 6!! = 2^10 15!! 7!! 3! = 2^11 3^3 (5*7)^2 (15*13*11*9) (Further sieving could take other primes into account too, the current factorial code does that, I think) To do the same splitting for de numerator n(n-1)...(n-k+1), we have 1000*999*...*986 = (1000*998*...*986) (999*997*...987) = 2^7 (500*499*...*493) (999*997*...987) = 2^7 (500*498*496*494) (499*497*495*493) (999*997*...987) = 2^11 (250*249*248*247) (499*497*495*493) (999*997*...987) = 2^11 (250*248) (249*247) (499*497*495*493) (999*997*...987) = 2^13 (125*124) (249*247) (499*497*495*493) (999*997*...987) We could go on to take out factors of two of 124, but above is enough to cancel out the power of two in 15!. Unlike the factorial case, we don't get overlapping ranges and no nice squares or powers of products So we would get 1000 over 15 as 2^2 (125*124) * (249*247) * (499*497*495*493) * (999*997*...987) --- 3^3 (5*7)^2 (15*13*11*9) We could do sieving to take out the other factors of k!, but it's not clear to me that would be an improvement. Assuming n is a bignum, computing k! and dividing by it will be a fairly small operation, and taking of the factors early won't save much work in computing the numerator, as far as I see. > Related to this, it would be useful to have a public function that computes > rising factorials (say mpz_rising_ui). Then mpz_bin_ui could be implemented > very simply by calling either mpz_bin_uiui or > mpz_rising_ui+mpz_fac_ui+mpz_divexact. There are a couple of building blocks in oddfac.c and prodlimbs.c, but all for single-limb inputs, it seems. If we want to do something reasonably simple, it would make sense to use mpz_oddfac (which computes the odd part of the factorial), and mpz_fall_fac or mpz_odd_fall_fac for computing the falling factorial or the odd part thereof. I'm not familiar with the sophisticated things done for mpz_bin_uiui and mpz_fac, but maybe parts of that could be applicable also to the falling factorials of a bignum? It might work to first take out powers of two, and then rewrite multiplies as squares like n (n-2) = (n-1)^2 - 1 n (n-2)(n-4)(n-6) = [(n-3)^2 - 5]^2 - 25 Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Fast mpz_bin_ui
Let me write some more, after pondering the problem for a few days. Regular powers n^k (mod m) can be efficiently computed based on repeated squaring and the basic rule that n^{2k} = (n^2)^k = (n^2)^k (and without mod m, one would use the same algorithm, it's just that the size of the result grows linearly with k). First, let's note that there's unlikely to be any comparatively efficient way to compute falling factorial powers, for it there were, there would be an efficient way to compute n! (mod m), and that would be a fast factoring algorithm: if n is the product of two unknown primes, then the smallest factor is p = gcd (n, floor(sqrt n)! mod n ) The closest analog of a squaring rule for falling factorial powers I've been able to find is n^_{2k} = n^{_k} (n-k)^{_k} where n^{_k} denotes the falling factorial n (n-1) ... (n-k+1). And the right hand side doesn't look much like a square, I'd wish we could rewrite it to a formula like n^_{2k} = ((n-k/2)^{_k})^{_2} + some small number but I'm afraid we can't. So, back to binomial coefficients, n \over k = n^{_k} / k! and the case where n is larger than one limb, k a single limb, and k <= 2n. I guess k has to be reasonably small, for the result to fit in available memory (but I don't have any numbers handy). Then k! is going to be much smaller than n^{_k}, so that computing k! and doing the exact division should be a very small fraction of the total work. So for k!, I'd suggest just using mpz_oddfac_1 (and it would make sense to me to change it to return the exponent for the omitted power of two, which from the implementation of mpz_fac_ui seems to be k - popcount(k)). For n^_{k}, I would suggest a function to compute the odd part and return the exponent of the power of two, which would repeatedly split in even and odd numbers and compute products of the form n (n-2) (n-4) ... for odd n. Each of these products could be computed by grouping four terms at a time and use > n (n-2)(n-4)(n-6) = [(n-3)^2 - 5]^2 - 25 I like this trick, replacing three multiplies by two squares (but it's unclear how much it will save compared to the total running time). For the reasons given above, I wouldn't expect the trick to generalize much further. Then multiply all the resulting odd factors together in some roughly balanced tree. And then we could do something like void mpz_bin_ui (mpz_ptr r, mpz_srcptr n, unsigned long int k) { mpz_t k_fac; mp_bitcnt_t k_exp, n_exp; mpz_init (k_fac); k_exp = mpz_oddfac_1 (k_fac, k); n_exp = mpz_odd_falling_fac (r, n, k); ASSERT (k_exp <= n_exp); mpz_divexact (r, r, k_fac); mpz_mul_2exp (r, r, n_exp - k_exp); mpz_clear (k); } Would you like to give it a try? One could try some sieving and instead produce n^_{k} as a product of prime powers for all primes <= k and a residual, to be able to easily subtract the corresonding prime-power representation of k!. It would be interesting with some analysis of the size of n \over k after dividing out all prime factors smaller than k. If the prime powers of primes <= k makes up a large part, sieving could pay off, since that part can be computed efficiently. Will n \over k be more or less "smooth" than a random number of similar size? I'd guess slightly more smooth, since it's going to be huge compared to n, but still can't have any prime factors larger than n. But we can still have lots of prime factors in the range k < p <= n. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Revisiting gcd and gcdext for small operands
t...@gmplib.org (Torbjörn Granlund) writes: > Nisse and other great common dividers! > > Please consider these measurements: > > ibr32$ ./tune/speed -p1000 -c -s1-60 -f1.2 -r mpn_mul mpn_gcd mpn_gcdext > mpn_mul mpn_gcdmpn_gcdext > 1 #39.146.5512 17.1607 > 2 #50.77 12.4422 51.4993 > 3 #66.27 36.0216 57.9561 > 4 #87.17 40.7439 58.6398 > 5 #110.95 41.7779 57.3457 > 6 #136.56 42.4274 54.8383 > 7 #165.67 42.7125 51.9271 > 8 #205.14 39.7861 47.5393 > 9 #249.47 37.0144 42.8386 > 10#289.77 36.0182 40.6772 > 12#398.58 32.2387 34.2015 > 14#516.54 28.4706 28.4355 > 16#653.70 25.3546 25.7443 > 19#887.40 21.4907 20.0516 > 22 #1196.18 17.9833 17.4368 > 26 #1617.71 15.1357 15.3959 > 31 #2196.36 12.3736 13.3466 > 37 #2963.459.1705 12.3230 > 44 #3970.829.4049 11.9397 > 52 #5309.947.8460 11.6024 > > ibr64# tune/speed -p1000 -c -s1-60 -f1.2 -r mpn_mul mpn_gcd mpn_gcdext > mpn_mul mpn_gcdmpn_gcdext > 1 #34.519.5965 43.4851 > 2 #41.46 24.7985 91.1526 > 3 #63.37 60.3494 89.9912 > 4 #80.39 69.5938 92.7650 > 5 #105.76 69.8544 88.4005 > 6 #134.70 68.6191 83.3540 > 7 #173.62 64.7691 74.9246 > 8 #208.92 62.9363 72.0671 > 9 #262.49 56.4922 63.4748 > 10#314.04 52.3732 59.2200 > 12#437.50 46.1083 50.0230 > 14#578.39 41.1333 42.6137 > 16#714.52 38.2636 38.8416 > 19 #1021.12 32.3526 30.0309 > 22 #1274.16 29.6524 28.3373 > 26 #1653.53 26.4316 25.1063 > 31 #2283.67 22.3448 20.5088 > 37 #3185.51 16.2611 14.3919 > 44 #4003.10 16.8047 15.6203 > 52 #5287.91 12.4089 16.1354 > > Without detail knowledge of the current implementation, these numbers > suggest to me that we could speed small operand gcd and gcdext. But I > might be wrong. Hmm. I'd reason like this: All three functions are expected to be quadratic for small sizes. And then mul should become faster relative to the others above the Karatsuba threshold, since the gcd thresholds are much higher then the Karatsube threshold. But above numbers don't look like that at all. Which might indicate that the linear term of the complexity is *much* higher for gcd than for mul. Do I read the data the same way as you? > Would asm "binary algorithm" code also make sense for, say, n = 10? I > have some doubt, as C code around the current (asm) primitives add_n, > sub_n, rshift probably is close to optimal. I'd guess that assembly code for small sizes, which keeps the operands in registers, could bring some speedup. Could start with just n=2, to see what the gain is. But before doing that beyond n = 2, I think we should consider an assembly implementation of hgcd2. That's the main linear part of gcd-complexity. > I had an idea now for using a Hensel norm Euclidean algorithm. Finding > a^(-1) mod 2^b where b is the limb size in the inner loop would be > expensive, but how about letting b = 8 or thereabout? Then a table > lookup would work for the inverses. Does that make sense? Say we compute a k-bit hensel quotient, and replace (a, b) by ( (a + q b)/2^k, b). Then we get no additional progress by making k larger than the bitsize difference between a and b; making q b much larer than a is useless. So just like the plain Euclidean gcd, useful quotients are pretty small on average. To get more progress than one or two bits per step, one needs either a matrix (representing multiple division steps), or a more general linear combination a <-- k a + m b (like the "accelerated gcd" by Ken Weber, and with some need to handle spurious factors). One can gain one bit by using signed values, though, and choose q so that |q| < 2^{k-1}). Putting k = 3 (as you suggest) and |q| < 4 might give a speedup over plain binary. If it can be implemented to keep branches to a minimum. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mini-gmp
paul zimmermann <paul.zimmerm...@inria.fr> writes: > However, we are still missing a proper (documented) way to distinguish > mini-gmp from plain gmp. We could use the __MINI_GMP_H__ macro but it is not > documented. We could add some documented macro for this purpse, but I'm not sure I understand why you need it. In nettle, I do things like this: In configure.ac: AC_ARG_ENABLE(mini-gmp, AC_HELP_STRING([--enable-mini-gmp], [Enable mini-gmp, used instead of libgmp.]),, [enable_mini_gmp=no]) if test "x$enable_mini_gmp" = xyes ; then NETTLE_USE_MINI_GMP=1 else NETTLE_USE_MINI_GMP=0 fi AC_SUBST([NETTLE_USE_MINI_GMP]) Then in version.h.in (which is subject to configure substitutions, and unlike config.h, nettle/version.h is a public header file): #define NETTLE_USE_MINI_GMP @NETTLE_USE_MINI_GMP@ Finally, in bignum.h, #include "version.h" #if NETTLE_USE_MINI_GMP # include "mini-gmp.h" /* Any other min--gmp-specific hacks */ #else # include #endif And then I have a couple of #if NETTLE_USE_MINI_GMP spread out in the rest of the code. > And I notice that very few of the division functions from mini-gmp correspond > to those of the GMP documentation. Thus to compile MPFR with mini-gmp we had > to re-implement mpn_divrem_1, mpn_divrem and mpn_tdiv_qr. We can contribute > them to mini-gmp if you want. mini-gmp.h doesn't declare any mpn_*div functions. The mpn division interface in GMP needs some work, and the code in mini-gmp is closer in spirit to the new division functions we worked on (must be a decade ago!), which still haven't made it to the public GMP api. For compatibility, it would make some sense to add more of the public mpn division functions to mini-gmp, if they can be done as reasonably simple wrappers around the existing functions. > The other functions we had to re-implement are some random functions > (using srand48/lrand48, thus generating different random sequences, but > for MPFR this is fine), Right, I think sophisticated random numbers is out-of-scope for mini-gmp. > and finally mpz_dump. Hmm. It seems this function is declared in gmp.h, but not mentioned in the manual, and with a discouraging comment at the top of mpz/dump.c. Any reason you can't switch to always use mpz_out_str instead? Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mini-gmp
paul zimmermann <paul.zimmerm...@inria.fr> writes: >> understand why you need it. > > for example, we check the version of GMP is at least 5.0.0: > > #if !__MPFR_GMP(5,0,0) > # error "GMP 5.0.0 or newer is required" > #endif > > where the __MPFR_GMP macro uses __GNU_MP_VERSION, ... So you can detect that mini-gmp is used with #ifndef __GNU_MP_VERSION But that's not the problem? > Since mini-gmp does not provide any version macro, we can't check > the version of mini-gmp is good enough. Is the issue really that you want to distinguish between different versions of mini-gmp? My recommendation is to including a copy of mini-gmp with mpfr, and then you know which version you have at any point, and you can upgrade it in a controlled manner to get new features or bugfixes. If we add a version number to mini-gmp, what should it be? The only "releases" of mini-gmp are gmp releases, should we adopt that, and then have no well-defined version number for mini-gmp between gmp releases? I'm afraid I'm still a bit confused, and I don't quite understand what problem you are aiming to solve with additional defines in mini-gmp. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mini-gmp
Marc Glisse <marc.gli...@inria.fr> writes: > As far as I understand, the goal of mini-gmp is that a user can take a > copy of those 2 files, stick them in his project, and get a > self-contained program. That's right. And preferably with a configure time option to use the real GMP if available. > Unless you provide a similar mini-mpfr, your > user is going to have to install the mpfr dependency anyway, his > project is not self-contained, so he might as well install the real > GMP. Even dropping only the GMP dependency is one thing less that can go wrong. One user of interest is gcc. As far as I understand, it uses mpfr and mpc (for complex arithmetic) for constant propagation. Which means pretty small numbers, and the time spent on this arithmetic is likely a very small proportion of the compile time. Using the real GMP when easily available is nice, but it's also massive overkill. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Generic get_d_2exp failures
Marc Glisse <marc.gli...@inria.fr> writes: > 2) The rounding occurs in the addition in > > weight = 1/MP_BASE_AS_DOUBLE; > d = up[size - 1]; > for (i = size - 2; i >= 0; i--) > { > d += up[i] * weight; > weight /= MP_BASE_AS_DOUBLE; > if (weight == 0) > break; > } > > we could make that code uglier to make sure there is no rounding up, > maybe comparing d+up[i]*weight to d+oldweight, and trying again by > zeroing out an increasing number of low bits of up[i]. It would be nice if we could find a portable way to add two floating point values without rounding up. Would something like this work? s = a + b; /* Assume a > b */ r = (s - a) - b; /* No rounding expected here. */ if (r > 0) s -= 2*r; But doesn't make the tests pass in my initial testing. I guess it might be necessary to isolate the high bit of r (possibly be converting to mp_limb_t ans usign count_leading_zeros), which should correspond to the lsb of the mantissa (or if it's half the lsb). And might also need special handling of the case that s, after rounding, is a power of two. The variant I tested: @@ -368,7 +369,13 @@ mpn_get_d (mp_srcptr up, mp_size_t size, d = up[size - 1]; for (i = size - 2; i >= 0; i--) { - d += up[i] * weight; + double l, s, r; + l = up[i] * weight; + s = d + l; + r = (s - d) - l; + if (r > 0) + s -= 2*r; + d = s; weight /= MP_BASE_AS_DOUBLE; if (weight == 0) break; > 3) Drop the generic path. It hasn't passed the testsuite for a long > time, can't be that important. If we keep it, we'd need to (i) fix it, and (ii) test it regularly. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Generic get_d_2exp failures
Marc Glisse <marc.gli...@inria.fr> writes: > On Sat, 28 Oct 2017, Niels Möller wrote: > >> It would be nice if we could find a portable way to add two floating >> point values without rounding up. Would something like this work? >> >> s = a + b; /* Assume a > b */ >> r = (s - a) - b; /* No rounding expected here. */ >> if (r > 0) >>s -= 2*r; > > Not sure where the 2*r is coming from. And I am not very optimistic > that there is such a simple formula that works for any rounding mode, > though I could easily be wrong. Sorry for the confusion, I mistakenly thought that since the rounding error affects the lsb, it would be of the same order. But I now realize that r may be much smaller, and affect the lsb only after carry propagation through low (otherwise ignored) mantissa of b. So on second thought, I don't think we can fix the rounding by only examining a, b, and s. We could compute rounded_b = b + r, and then we know that s == a + b_rounded, with no rounding, but that still doesn't tell us the position of the lsb, only that it's somewhere between the least signficant 1-bit of b_rounded and the most significant 1-bit of r. > There is also the question of how portable exactly it needs to be. C99 > nextafter can be helpful. So if (r > 0) s = nextafter(s, -infinity) might work. Except that (i) c99 might not be available on obscure systems where this code is used, and (ii) we try to avoid depending on linking with -lm. > It may be easier to assert that FLT_RADIX==2 and use DBL_MANT_DIG to > avoid any rounding. That's probably the sane way to do it. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: udiv_qr_3by2 vs divappr
paul zimmermann <paul.zimmerm...@inria.fr> writes: > if you use udiv_qr_3by2 to divide (say) a 6-limb number by a 3-limb number, > then in the schoolbook loop you might have {n2, n1} = {d1, d0}, in the case > where n0 is smaller than the next divisor limb d[-1]. > > If it is the responsibility of the caller to ensure {n2, n1} < {d1, d0}, > then every caller of udiv_qr_3by2 must deal with that special case. I think the reason udiv_qr_3by2 and udiv_qrnnd_preinv requires n < d is that returning GMP_NUMB_MAX for n == d requires one additional check, and some callers (in particular mpn_div_qr_1 and mpn_div_qr_2) don't need that. Tradeoffs may be different for divappr_2. > In my opinion, it would be simpler to allow {n2, n1} = {d1, d0}, and > return q=GMP_NUMB_MAX in that case. I've tried to analyse the problem of divappr_2 sometimes overflowing q and trying to return GMP_NUMB_MAX + 1. To avoid that, it turns out we need a check if ({n_2, n_1} >= {d_1, d_0} - 2) q = GMP_NUMB_MAX; (and that quotient is still approximate, so we can't omit the schoolbook division adjustment step). May still want to leave that check outside of divappr_2, to take the computation of {d_1, d_0} - 2 out of the loop. My current version of the schoolbook division code does something like d1m2 = d_1 - (d0 < 2); for (...) { if (UNLIKELY (n2 >= d1m2) && (n2 > d1m2 || (n1 >= d0 - 2))) q = GMP_NUMB_MAX; else q = divappr_2(...); submul_1 + adjustment } Left to do is strict analysis of the case d0 == 0, the code seems to work fine without any special case for that, even the computation of r1 (based on the assumption that d0 > 0) is off by one in this case. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: dead code in div_q.c?
paul zimmermann <paul.zimmerm...@inria.fr> writes: >> It would be good to either prove that qh > 0 can't happen, or add a test >> that will exercise the code handling that case. > > yes more generally it would be nice to know for the divappr functions, if the > exact quotient fits on qn limbs (i.e., the exact qh is 0), can the > "approximate" qh be 1? I suspect we may have qh == 0 always in this special context, where we have additional constraints on N and D. But in general, I doubt we can promise that the qh return value from divappr euals what we would get for the correct quotient. I.e., that the qh return values from div and divappr always match. Seems particularly unlikely for the mu divappr, which can have an error larger than 1 unit. Not sure how to come up with an example, though. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: documentation on internals not up to date
Marc Glisse <marc.gli...@inria.fr> writes: > There would be a significant advantage to mpq if we could have a > non-allocated 1 for the denominator. But indeed, with the current code > where only some mpz functions would work, it seems safer to document > that none work. We could do that internally, even if we don't advertise it for other gmp users. Then mpq_init wouldn't do any allocation, right? We could have a single mpq object representing 0/1 with _mp_alloc == 0 for both parts, and initialize with struct assignment or memcpy. We would just ensure that mpz realloc keeps supporting thuis case. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
divappr interface
Hi, after the discussion on div_q, I had another look at divappr, or more precisely, the mpn_sbpi1_divappr_q function. It's not entirely obvious from the implementation which limbs of the {np, nn} argument are actually used. But I think it should access only the qn+2 upper limbs. For accuracy, it should be sufficient with qn+1 limbs, but I suspect qn+2 is more convenient since we lack a variant of mpn_submul_1 which ignores the low half of the lowest limb product (we'd need a function for R -= floor (D * q / B)). For a call to divappr to produce qn quotient limbs (plus possibly a qh being zero or one), inputs should be a numerator of qn+2 limbs and a normalized divisor of at most qn+1 limbs. As long as qn > dn - 1, we loop computing one exact quotient limb and eliminating one high limb of the numerator per iteration, decrementing qn. (This phase would be equivalent to a call to sbpi1_div_qr, I guess). Once we reach qn = dn - 1, keep looping to produce quotient limbs, but also discard one limb of dp in each interation, until we in the final iteration have qn = 1, qn+2 = 3 numerator limbs, and qn+1 = 2 divisor limbs, i.e., a single udiv_qr_3by2 (where we might consider skipping the adjustment steps). I haven't done the error analysis, but I would expect that errors are similar to the current code. This is analogous to the discussion of mpn_bdiv_q, which we discussed years ago, where N and Q are of the same size, and D possibly smaller. The structure with two loops is also similar to the old mpn_divrem function with fractional limbs; it would be cleaner if we could have a function doing only the fractional part, i.e., require that dn == qn + 1. Fixing this should provide a small speedup for mpn_div_q, since new_nn = 2 * qn + 1; ... cy = mpn_lshift (new_np, np + nn - new_nn, new_nn, cnt); ... qh = mpn_sbpi1_divappr_q (tp, new_np, new_nn, new_dp, qn + 1, dinv.inv32); implies that new_nn and the shift are twice as large as they need to be (new_nn should be only qn+2). Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: dead code in div_q.c?
ni...@lysator.liu.se (Niels Möller) writes: > --- a/tests/mpn/t-div.c Wed Apr 25 07:38:14 2018 +0200 > +++ b/tests/mpn/t-div.c Wed Apr 25 22:19:17 2018 +0200 > @@ -445,6 +445,7 @@ main (int argc, char **argv) > alloc = itch + 1; > } > scratch[itch] = ran; > + MPN_COPY (qp, junkp, nn - dn + 1); > mpn_div_q (qp, np, nn, dup, dn, scratch); > ASSERT_ALWAYS (ran == scratch[itch]); > ASSERT_ALWAYS (qp[-1] == qran0); ASSERT_ALWAYS (qp[nn - dn + 1] == > qran1); Committed. > *** /tmp/extdiff.0aBNfO/gmp.765c2c27523b/mpn/generic/div_q.c 2018-04-25 > 21:55:51.457769871 +0200 > --- /home/nisse/hack/gmp/mpn/generic/div_q.c 2018-04-25 21:18:01.516501993 > +0200 > *** mpn_div_q (mp_ptr qp, > *** 171,184 > if (cy == 0) > qp[qn - 1] = qh; > ! else if (UNLIKELY (qh != 0)) > ! { > ! /* This happens only when the quotient is close to B^n and > ! mpn_*_divappr_q returned B^n. */ > ! mp_size_t i, n; > ! n = new_nn - dn; > ! for (i = 0; i < n; i++) > ! qp[i] = GMP_NUMB_MAX; > ! qh = 0; /* currently ignored */ > ! } > } > else /* divisor is already normalised */ > --- 171,176 > if (cy == 0) > qp[qn - 1] = qh; > ! else > ! ASSERT_ALWAYS (qh == 0); > } > else /* divisor is already normalised */ And committed this first part (but with ASSERT instead of ASSERT_ALWAYS);. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: documentation on internals not up to date
"Marco Bodrato" <bodr...@mail.dm.unipi.it> writes: > ... and it's not just a problem of supporting mpq_denref() we should also > change some mpq_ functions. If someone does init an mpq_t with: > mpz_init (mpq_numref (accumulator)); > mpz_roinit_n (mpq_denref (accumulator), _one, 1); > > then a simple, mpq_add (accumulator, accumulator, another_mpq) might fail. I don't think we should document support for that code. If the user initializes parts of an mpq with a read-only mpz objects, the user ought to treat also the mpq object as readonly. That said, it sounds reasonable to me to use an an unallocated one for the internal initialization of the denominator. Then we have to support that the user accesses it and assigned to it like any other mpz object. But that doesn't mean that we have to promise that mpz objects the user creates using ro_init behaves in the same friendly way. E.g., in case we for some reason change the internals to break assignment to non-zero unallocated objects, we could hack mpq_denref to do a real allocation if needed (except that the macro is documented to work on a const mpq_t). Or go back to using an allocated one. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: divappr interface
ni...@lysator.liu.se (Niels Möller) writes: > Once we reach qn = dn - 1, keep looping to produce quotient limbs, but > also discard one limb of dp in each interation, until we in the final > iteration have qn = 1, qn+2 = 3 numerator limbs, and qn+1 = 2 divisor > limbs, i.e., a single udiv_qr_3by2 (where we might consider skipping the > adjustment steps). I haven't done the error analysis, but I would expect > that errors are similar to the current code. In the current mpn_sbpi1_divappr_q, there's a curious flag (or mask) variable used in this loop. It's initially all ones, cleared if we ever fail to cancel the top limb of the current partial remainder. When the flag is cleared, remaining quotient limbs are set to GMP_NUMB_MAX. Torbjörn, Paul, do you remember the analysis behind this? (Code is since 2009...). I would guess it might happen that even if higher quotient limbs are all correct, we might get non-zero high limb in partial remainder - GMP_NUMB_MAX * truncated D If we set the rest of the quotient limbs to GMP_NUMB_MAX, then the quotient should be large enough thanks to the low end divisor limbs which we ignored in the truncation. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
udiv_qr_3by2 vs divappr
It's fun looking at clever things you did a decade ago, and try to improve... I just realized that our 3/2 division, udiv_qr_3by2, work horse for the linear work of schoolbok division, has one more input than it needs. We would get a useful quotient if we always passed GMP_NUMB_MAX for the n0 argument. Which wouldn't provide any useful gain by itself (and require adjustments where the remainder is used). But it shows that we could have a function with inputs n2, n1, d1, d0, computing a suitable quotient approximation. Fixing n0 as above would mean that we always return q = f(n2, n1, d1, d0) = floor ({n2, n1, B-1} / {d1, d0}) but we could allow slightly more freedom in what we return. Inputs are a two limb normlized divisor, {d1, d0}, d1 >= B/2, and two numerator words, {n2, n1}, which must be less than {d1, d0}. I think we'd want the same precomputed inverse as for 3/2. It computes an single word approximate quotient q, roughly B {n2,n1} / {d1, d0}, which, when applied to larger numbers as part of schoolbook division, is never too small, at most one too large, and highly likely to be correct. If convenient, the function could also return a single word remainder r = {n1,n0} - q d1 - mulhi(q, d0), I think such an r would have to be in the range 0 <= r <= d1, with some subteties on when r == d1 is permissible. We might also want to return mullo(q, d0), if that's easily available. Potential I see: * It might be possible to implement this cheaper than udiv_qr_3by2, omitting most of the book-keeping and carry propagation involving the low remainder word r0. * One limb less to extract and handle specially in the schoolbook division loops. It would be particularly nice if we try to write a schoolbook division that doesn't require arguments to be normalized up front, but instead just normalizes top limbs on the fly. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: documentation on internals not up to date
"Marco Bodrato" <bodr...@mail.dm.unipi.it> writes: > What about the following? > > diff -r 85eda8a9f6e2 doc/gmp.texi > --- a/doc/gmp.texiFri May 04 21:43:34 2018 +0200 > +++ b/doc/gmp.texiFri May 04 23:12:34 2018 +0200 > @@ -10155,7 +10155,7 @@ > @item @code{_mp_size} > The number of limbs, or the negative of that when representing a negative > integer. Zero is represented by @code{_mp_size} set to zero, in which case > -the @code{_mp_d} data is unused. > +the @code{_mp_d} data is undefined. > > @item @code{_mp_d} > A pointer to an array of limbs which is the magnitude. These are stored > @@ -10164,17 +10164,21 @@ > significant. Whenever @code{_mp_size} is non-zero, the most significant > limb > is non-zero. > > -Currently there's always at least one limb allocated, so for instance > -@code{mpz_set_ui} never needs to reallocate, and @code{mpz_get_ui} can fetch > -@code{_mp_d[0]} unconditionally (though its value is then only wanted if > -@code{_mp_size} is non-zero). > +Currently there's always at least one readable limb, so for instance > +@code{mpz_get_ui} can fetch @code{_mp_d[0]} unconditionally (though its > +value is undefined if @code{_mp_size} is zero). > > @item @code{_mp_alloc} > @code{_mp_alloc} is the number of limbs currently allocated at @code{_mp_d}, > -and naturally @code{_mp_alloc >= ABS(_mp_size)}. When an @code{mpz} routine > +and normally @code{_mp_alloc >= ABS(_mp_size)}. When an @code{mpz} routine > is about to (or might be about to) increase @code{_mp_size}, it checks > @code{_mp_alloc} to see whether there's enough space, and reallocates if > not. > @code{MPZ_REALLOC} is generally used for this. > + > +@code{mpz_t} variables initialised with the @code{mpz_roinit_n} function > +or the @code{MPZ_ROINIT_N} macro have @code{_mp_alloc = 0} but can have a > +non-zero @code{_mp_size}. They can only be used as read-only constants. > +See @ref{Integer Special Functions} for details. > @end table > > The various bitwise logical functions like @code{mpz_and} behave as if Looks good to me. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: _ptr and _srcptr types
Vincent Lefevre <vinc...@vinc17.net> writes: > I've just noticed that GMP provides mpz_inits, so that mpz_ptr is > necessary: Since type checking cannot be done with variadic functions, > one must provide types compatible with mpz_ptr, or the behavior is > undefined. This is not an issue for mpz_t variables, but the last > argument needs to be (mpz_ptr) 0 or equivalent. Isn't (void*) 0 good enough to get defined behavior? (For which NULL is an alias in most C (but not C++) compilers). Digression: As far as I've understood, the C standard requires NULL to expand to either 0 or ((void*)0), and all sane compilers do the latter. While the C++ standard requires NULL to expand to 0, which means that C++ code calling mpz_inits can't use NULL to terminate the argument list, it should probably use nullptr instead. (And I think at least g++ tries to work around this strange C++ quirk by expanding NULL to some magic *slightly* different from 0, which will produce some warnings if it used as an integer rather then pointer). Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mini-gmp mpz_{get,set}_d not fully compatible with GMP
"Marco Bodrato" <bodr...@mail.dm.unipi.it> writes: > diff -r 41e50c4fdc46 bootstrap.c > --- a/bootstrap.c Wed May 16 08:36:03 2018 +0200 > +++ b/bootstrap.c Fri May 18 07:24:52 2018 +0200 > @@ -29,6 +29,7 @@ > see https://www.gnu.org/licenses/. */ > > > +#define DONT_USE_FLOAT_H 1 > #include "mini-gmp/mini-gmp.c" > > #define MIN(l,o) ((l) < (o) ? (l) : (o)) > diff -r 41e50c4fdc46 mini-gmp/mini-gmp.c > --- a/mini-gmp/mini-gmp.c Wed May 16 08:36:03 2018 +0200 > +++ b/mini-gmp/mini-gmp.c Fri May 18 07:24:52 2018 +0200 > @@ -50,6 +50,9 @@ > > #include "mini-gmp.h" > > +#if !defined(DONT_USE_FLOAT_H) > +#include > +#endif Makes sense to me, provided that the define also excludes whatever mini-gmp functions depend on float.h (which would be get_d and set_d, possibly cmp_d), and is named accordingly. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mini-gmp mpz_{get,set}_d not fully compatible with GMP
"Marco Bodrato" <bodr...@mail.dm.unipi.it> writes: > float.h is only used for > > +#if defined(DBL_MANT_DIG) && FLT_RADIX == 2 > +#define GMP_DBL_MANT_BITS DBL_MANT_DIG > +#else > +#define GMP_DBL_MANT_BITS (53) > +#endif > > This constant is used only by the new version of get_d, to convert the > correct number of bits and avoid rounding. > I mean, the functions do not "depend on" float.h , and we don't need to > exclude them. Maybe we should decide for a different default value... You could consider making this into small step towards modularizing mini-gmp. Something like #ifndef MINI_GMP_FLOAT_SUPPORT #define MINI_GMP_FLOAT_SUPPORT 1 #endif #if MINI_GMP_FLOAT_SUPPORT #include #endif ... #if MINI_GMP_FLOAT_SUPPORT #if FLT_RADIX == 2 #define GMP_DBL_MANT_BITS DBL_MANT_DIG #else /* What are our options? Fail as unsupported? Display warning? Or document expected breakage on systems for non-binary floats? Or approximate as DBL_MANT_DIG * log2(FLT_RADIX) ? */ #define GMP_DBL_MANT_BITS (53) #endif I.e., use float.h constants unconditionally whenever float support is enabled. And gmp/bootstrap.c would then #define MINI_GMP_FLOAT_SUPPORT 0. Independent of this suggestion, I have two minor comments on the your change (43659273c9cf): I think the DONT_USE_FLOAT_H macro should have a GMP or MINI_GMP prefix. And it would be nice with a test, e.g, based on your sample program converting integers 2^i +/- 1 to double and comparing the results (before your fix, mini-gmp rounded to nearest, producing identical doubles for large enough i). Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: _ptr and _srcptr types
Vincent Lefevre <vinc...@vinc17.net> writes: > No, pointers to different types may have different representations, > and in particular, different sizes. Hmm. I was aware that *function* pointers may use different representation and size than data pointers. But not that a data pointer can use different representation than void*. If that really is the case, most occurences of NULL or (void*)0 in the argument list in the call of a varargs function would be undefined behavior, right? Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: _ptr and _srcptr types
t...@gmplib.org (Torbjörn Granlund) writes: > I also think usage of mpz_roinit_n might lead to memory leaks, triggered > by mpz_init (or lack of mpz_clear) before its use. We need to point > this out. The name suggests that it part of the mpz_init family of functions, which should only be used on uninitialized mpz objects. But this could be made more explicit, and we should also mention that mpz_ronit_n shouldn't be followed by mpz_clear. > I believe that both types will not change their role or definition in the > near future, as such they can be documented. > I'm not sure about: how we should explain how to use them. > > That is a challenge. It shouldn't be so difficult for users familiar with the fine points of arrays and pointers in C. mpz_t is a typedef for a size 1 array containing the internal struct, and mpz_ptr is a pointer to the same struct, i.e., the type that an mpz_t "decays" to in expression context. > (Should we provide similar types for mpq and mpf?) Yes. In m opinion, every advertised typedef of an array type should have a corresponding advertised pointer type. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Any interest in multi-threaded code?
"Marco Bodrato" <bodr...@mail.dm.unipi.it> writes: > I merged the current head with that 10 years old repository, try again. Thanks for taking care of that! Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: _ptr and _srcptr types
t...@gmplib.org (Torbjörn Granlund) writes: > Emmanuel Thomé <emmanuel.th...@inria.fr> writes: > > I've personally always used the mpz_ptr and mpz_srcptr in functions that > take mpz arguments, taking the liberty to use this undocumented type. > > I suppose we could declare these. I'd like to hear from the rest of the > GMP "core team". My fairly strong opinion is that we should document them. They are important for user code that needs to handle pointers to mpz_t values. One simple use case: void foo (mp_srcptr a, mp_srcptr b) { if (mpz_cmp (a, b) < 0) { mp_srcptr t = a; a = b; b = t; } ... code that requires a >= b ... } This gets a lot more awkward if the function has to be declared using documented types, as void foo (const mpz_t a, const mpz_t b) Simplest alternative I see would be to copy into local non-const mpz_t variables and use mpz_swap, and that's an extra copy I don't want to force users to make. Functions returning pointers to an mpz_t is another important case (mpz_roinit_n is one (unusual) example, spotted by Emmanuel). mpq_numref and mpq_denref too, if they were implemented as (inline) functions rather than macros. I don't think need for handling pointers is that common in GMP user code, but not so obscure that it should be ignored and unsupported. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: _ptr and _srcptr types
Emmanuel Thomé <emmanuel.th...@inria.fr> writes: > diff -r aab8a010d10f -r 14b2416f45de doc/gmp.texi > --- a/doc/gmp.texiMon May 21 00:13:27 2018 +0200 > +++ b/doc/gmp.texiThu May 24 00:31:28 2018 +0200 > @@ -1982,6 +1982,42 @@ > Also, in general @code{mp_bitcnt_t} is used for bit counts and ranges, and > @code{size_t} is used for byte or character counts. > > +@sp 1 > + > +@cindex Pointer types > +@tindex @code{mpz_ptr} > +@tindex @code{mpz_srcptr} > +@tindex @code{mpq_ptr} > +@tindex @code{mpq_srcptr} > +@tindex @code{mpf_ptr} > +@tindex @code{mpf_srcptr} > +@tindex @code{gmp_randstate_ptr} > +@tindex @code{gmp_randstate_srcptr} > +Internally, GMP data types such as @code{mpz_t} are defined as > +one-element arrays, whose element type is part of the GMP > +internals (@pxref{Internals}). In some cases, it may be desirable to > manipulate > +pointers to this element type. Types such as @code{mpz_ptr} and > +@code{mpz_srcptr} are defined for this purpose. I think I'd like to add something like When an array is used as a function argument in C, it is not passed by value, instead its value is a pointer to the first element. In C jargon, this is sometimes referred to as the array "decaying" to a pointer. For GMP types like mpz_t, that means that the function called gets a pointer to the caller's mpz_t value, which is why no explicit & operator is needed when passing output arguments. [this is related to the section Parameter Conventions] GMP defines names for these pointer types, e.g., mpz_ptr corresponding to mpz_t, and mpz_srcptr corresponding to const mpz_t. Most functions don't need to use these pointer types directly; it works fine to declare a function using the mpz_t or const mpz_t as the argument types, the same "pointer decay" happens in the background regardless. Occasionally, it is useful to manipulate pointers directly, e.g, to conditionally swap *references* to a function's inputs without changing the *values* as seen by the caller, or returning a pointer to an mpz_t which is part of a larger structure. For these cases, the pointer types are necessary. And a mpz_ptr can be passed as argument to any GMP function declared to take an mpz_t argument. > +equivalent to the following code, which is given for illustratory > +purposes only: > + > +@example > +typedef some_internal_data_type mpz_t[1]; > +typedef some_internal_data_type * mpz_ptr; > +typedef const some_internal_data_type * mpz_srcptr; > +@end example I think it's confusing with a made-up name for the internal types, but actual names for types defined. I'd prefer to either use the actual internal name __mpz_struct instead of some_internal_data_type above, or change to the obviously made up name foo consistently, like typedef foo_internal foo_t[1]; typedef foo_internal * foo_ptr; typedef const foo_internal * foo_srcptr; Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mini-gmp mpz_{get,set}_d not fully compatible with GMP
"Marco Bodrato" <bodr...@mail.dm.unipi.it> writes: > Il Dom, 20 Maggio 2018 9:57 pm, Niels Möller ha scritto: >> You could consider making this into small step towards modularizing >> mini-gmp. Something like > >> #ifndef MINI_GMP_FLOAT_SUPPORT > >> ... > > The idea is interesting. How should we modularize? Should we split mini- > in many files (each #include-ing the ones it depends on) or should we have > a single file with selector like the one you propose? The only module we > have, currently is mini-mpq, as a file apart. I think it should stay as one file. Not sure if it's worth the effort, but I was thinking of trying to divide mini-gmp into modules including things like * Add and sub. * Bitwise logic functions * Mul * Div * Base conversion for power-of-2 bases * Base conversion for non-power-of-2 bases, in particular, 10 * Float conversion * Import/export * Various higher level functions, roots, gcd, primality testing, ... We'd need to sort out dependencies between the modules and likely maintain that graph manually. By default, compiling mini-gmp would include everything, but if the user does something like #define MINI_GMP_ENABLE_BASE_CONVERSION_POWER_OF_TWO 1 #define MINI_GMP_ENABLE_MUL 1 #include "mini.gmp.c" only those routines and their dependencies would be included. We'd need a pretty large block of #ifs at the top to implement the dependencies. So the drawback is the complexity of the needed #ifdeffery, and the testing thereof. Uncler to me how far it's worth going in dividing code into smaller modules. To have one or (even two, in the case of get_str) module per function will not be easy to maintain. Alternative might be to recommend applications that care about binary size to compile mini-gmp.c with -ffunction-sections and let the linker sort it out, or rely on link-time-optimization eliminating unused code. But I doubt that LTO will detect, e.g., that mpz_get_str is only ever called with power-of-two arguments, and eliminate the unused mpn_get_str_other together with the division code it depends on. > I'm already tempted to join at least mini-gmp.h and mini-mpq.h Makes sense to me. Could consider renaming mini-gmp.c to mini-mpz.c, but doesn't matter so much. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: mini-gmp mpz_{get,set}_d not fully compatible with GMP
t...@gmplib.org (Torbjörn Granlund) writes: > Would it be beneficial to add some non-IEEE host to the GMP autobuild > setup? We could revive vax using simh. I think so. Does current gcc support VAX, or what compiler would we use? Can VAX be supported using the generic code, without HAVE_DOUBLE_VAX_D? Testing the generic code seems more important than testing VAX-specific optimizations. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: udiv_qr_3by2 vs divappr
t...@gmplib.org (Torbjörn Granlund) writes: > ni...@lysator.liu.se (Niels Möller) writes: > > Yes. By one if divappr_2 can produce a useful single-limb > remainder, otherwise by two. > > I am very curious about the result of this work. A larger submul_1 > tripcount is not for free, but the current 3/2 quotient approximation is > also not for free. The main case of the current schoolbook iteration looks like udiv_qr_3by2 (q, n1, n0, n1, np[1], np[0], d1, d0, dinv); cy = mpn_submul_1 (np - dn, dp, dn, q); cy1 = n0 < cy; n0 = (n0 - cy) & GMP_NUMB_MASK; cy = n1 < cy1; n1 = (n1 - cy1) & GMP_NUMB_MASK; np[0] = n0; if (UNLIKELY (cy != 0)) ... So I would hope that increased submul_1 tripcount would be reasonably cheap compared to these n0 and n1 updates, even if submul_1 might redo one or two multiplications already done inside the quotient approximation. > I suspect that significant speedup would come from early, speculative > quotient approximation. We do its equivalent for some asm redc_1 and > its sibling mpn_sbpi1_bdiv_r (see e.g. mpn/x86_64/zen/sbpi1_bdiv_r.asm). > This was a significant speedup. I'll try to keep that in mind, but it's going to be a bit tricky. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: udiv_qr_3by2 vs divappr
t...@gmplib.org (Torbjörn Granlund) writes: > So the submul_1 size agument will increase slightly? Yes. By one if divappr_2 can produce a useful single-limb remainder, otherwise by two. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel