bug#42269: Remove non-GMP code from coreutils factor.c
On 7/8/20 12:34 PM, Torbjörn Granlund wrote: Any number which does not happen to be B-smooth for, say B < 2^30, will show easily measurable performance difference of 5x to 40x IIRC. Ah, I had tried the example in the manual, (2^31 - 1) * (2^61 - 1). Even though it isn't B-smooth for B < 2^30, the performance difference was only 2x on my machine. I just now tried 2^127 - 1 and saw a similar performance difference, but 2^127 - 3 had a 15x difference so it's a better example. I installed the attached to try to document this better. I have a patch which makes the non-GMP code some 2x - 3x faster. It's been maturing for several years now, so I suppose I should really finish it. (It got tangled with code which improves the GMP case by letting it fall into the non-GMP code as numbers get smaller. That sounds simple but is quite messy for various reasons. It is also not clear how much complexity we could defend for this command of limited utility.) Yes, 'factor' is just a minor utility needed for POSIX compliance. Although it'd be nice to get that 2x-3x improvement whenever you have the time, it's not urgent. Thanks for your guidance on the GMP issue. >From ba1489d763b66dd1fcec08ecb4cba5917745f6bf Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Wed, 8 Jul 2020 18:58:18 -0700 Subject: [PATCH] factor: explain why non-GMP code (Bug#42269) * doc/coreutils.texi (factor invocation): * src/factor.c: Explain why the two-word algorithm is useful. --- doc/coreutils.texi | 24 ++-- src/factor.c | 5 + 2 files changed, 19 insertions(+), 10 deletions(-) diff --git a/doc/coreutils.texi b/doc/coreutils.texi index 6ec1e6c31..656b8bc79 100644 --- a/doc/coreutils.texi +++ b/doc/coreutils.texi @@ -18368,14 +18368,17 @@ Print the program version on standard output, then exit without further processing. @end table -Factoring the product of the eighth and ninth Mersenne primes -takes about 4 milliseconds of CPU time on an Intel Xeon Silver 4116. +If the number to be factored is small (less than @math{2^{127}} on +typical machines), @command{factor} uses a faster algorithm. +For example, on a circa-2017 Intel Xeon Silver 4116, factoring the +product of the eighth and ninth Mersenne primes (approximately +@math{2^{92}}) takes about 4 ms of CPU time: @example -M8=$(echo 2^31-1|bc) -M9=$(echo 2^61-1|bc) -n=$(echo "$M8 * $M9" | bc) -bash -c "time factor $n" +$ M8=$(echo 2^31-1 | bc) +$ M9=$(echo 2^61-1 | bc) +$ n=$(echo "$M8 * $M9" | bc) +$ bash -c "time factor $n" 4951760154835678088235319297: 2147483647 2305843009213693951 real 0m0.004s @@ -18383,11 +18386,12 @@ user 0m0.004s sys 0m0.000s @end example -Similarly, factoring the eighth Fermat number @math{2^{256}+1} takes -about 14 seconds on the same machine. +For larger numbers, @command{factor} uses a slower algorithm. On the +same platform, factoring the eighth Fermat number @math{2^{256} + 1} +takes about 14 seconds, and the slower algorithm would have taken +about 750 ms to factor @math{2^{127} - 3} instead of the 50 ms needed by +the faster algorithm. -The single-precision code uses an algorithm -designed for factoring smaller numbers. Factoring large numbers is, in general, hard. The Pollard-Brent rho algorithm used by @command{factor} is particularly effective for numbers with relatively small factors. If you wish to factor large diff --git a/src/factor.c b/src/factor.c index c1c35a562..1b1607f16 100644 --- a/src/factor.c +++ b/src/factor.c @@ -53,6 +53,11 @@ trick of multiplying all n-residues by the word base, allowing cheap Hensel reductions mod n. +The GMP code uses an algorithm that can be considerably slower; +for example, on a circa-2017 Intel Xeon Silver 4116, factoring +2^{127}-3 takes about 50 ms with the two-word algorithm but would +take about 750 ms with the GMP code. + Improvements: * Use modular inverses also for exact division in the Lucas code, and -- 2.17.1
bug#42269: Remove non-GMP code from coreutils factor.c
ni...@lysator.liu.se (Niels Möller) writes: I'm really not that familiar with state of the art factoring, but I'd guess pollard rho is a bad algorithm choice for that range, and one ought to use, e.g., some variant of the quadratic sieve. Let me modify that statement somewhat. Pollars rho is suitable for finding small factors of any size composites. Small here might mean about < 2^64. Any factoring effort should start with trial division, then some rounds of Pollard rho, then perhaps some EC rounds. Only after that and when a remaining cofactor is non-prime, QS is to be rolled out. That could sound like the GMP code of coreutils factor is great for factoring really large numbers. It is, but only for smooth huge numbers. If we ever get crazy enough to consider QS for coreutils factor, its current GMP Pollard rho code would become part of a general factoring engine for numbers > 2^127. -- Torbjörn Please encrypt, key id 0xC8601622
bug#42269: Remove non-GMP code from coreutils factor.c
t...@gmplib.org (Torbjörn Granlund) writes: > If any code is to be removed, then that would be the GMP code of > coreutils factor. I agree with Torbjörn. The GMP code in GNU factor might have made more sense when most computers were 32-bit, and "bignums" were smaller. But on 64-bit computers, the GMP code would be used only for numbers above 127 bits. I'm really not that familiar with state of the art factoring, but I'd guess pollard rho is a bad algorithm choice for that range, and one ought to use, e.g., some variant of the quadratic sieve. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance.
bug#42269: Remove non-GMP code from coreutils factor.c
Paul Eggert writes: Could you give an example of where the 128-bit code shines, compared to the GMP code on the same arguments? I could add the example as a comment in the factor.c code, to let me and future maintainers know why it's useful for performance. Any number which does not happen to be B-smooth for, say B < 2^30, will show easily measurable performance difference of 5x to 40x IIRC. A semantic difference which sometimes makes the speed difference less pronounced is that the non-GMP code proves that the printed factors are indeed prime. We use a criterion which requires factoring of p-1 for any assumed prime factor p. In unlucky cases that recursive factorisation is costlier than the main factorisation. I have a patch which makes the non-GMP code some 2x - 3x faster. It's been maturing for several years now, so I suppose I should really finish it. (It got tangled with code which improves the GMP case by letting it fall into the non-GMP code as numbers get smaller. That sounds simple but is quite messy for various reasons. It is also not clear how much complexity we could defend for this command of limited utility.) -- Torbjörn Please encrypt, key id 0xC8601622
bug#42269: Remove non-GMP code from coreutils factor.c
On 7/8/20 9:57 AM, Torbjörn Granlund wrote: The non-GMP code of coreutils was extremely well-tuned by me and Niels Möller a couple of years ago. How time flies! The code was merged in 2012. By leaving just the GMP code, you would create a pretty useless factor command. Any naive old factor command would often beat it. It would make much more sense to remove the factor command altogether. OK, thanks. Then let's forget about the patch I just proposed. Could you give an example of where the 128-bit code shines, compared to the GMP code on the same arguments? I could add the example as a comment in the factor.c code, to let me and future maintainers know why it's useful for performance.
bug#42269: Remove non-GMP code from coreutils factor.c
Paul Eggert writes: I recently modified GNU coreutils so that it can assume GMP, possibly by compiling and linking mini-gmp.c. This helps simplify the coreutils source code and makes coreutils behavior more portable. In doing so, I noticed that factor.c has special-purpose code to factor integers up to 127 bits. Although this code added functionality when coreutils could not assume GMP, it's no longer needed for that. And although it runs faster than the GMP code does, while doing the recent surgery on factor.c I began to wonder whether the hassle of maintaining the code outweighed its usefulness. So I wrote up the attached patch, which simply removes the non-GMP code and simplifies factor.c quite a bit. I assume the attached patch will hurt performance significantly in some cases for 127-bit numbers, so I did not install it. Perhaps it would be better to keep the non-GMP algorithm and recode it with GMP. Or perhaps it would be better to leave the factor.c code alone. Comments? The GMP code in coreutils factor.c was writen by me as a demo (see gmp/demos) over 25 years ago. It was put into coreutils' factor.c without consulting me. I would have disagreed if I had been asked. The non-GMP code of coreutils was extremely well-tuned by me and Niels Möller a couple of years ago. It is so fast that it has created some stir in the mathematical community, or so I have been told I expect the GMP code of factor.c to be pretty much unused. Why? Because Pollard rho is suitable only in the range currently covered by the non-GMP code. Iff we were to write well-tuned low-level GMP code, that we could expend the practical range ever so slightly. By leaving just the GMP code, you would create a pretty useless factor command. Any naive old factor command would often beat it. It would make much more sense to remove the factor command altogether. If any code is to be removed, then that would be the GMP code of coreutils factor. -- Torbjörn Please encrypt, key id 0xC8601622