bug#42269: Remove non-GMP code from coreutils factor.c

2020-07-08 Thread Paul Eggert

On 7/8/20 12:34 PM, Torbjörn Granlund wrote:


Any number which does not happen to be B-smooth for, say B < 2^30, will
show easily measurable performance difference of 5x to 40x IIRC.


Ah, I had tried the example in the manual, (2^31 - 1) * (2^61 - 1). Even though 
it isn't B-smooth for B < 2^30, the performance difference was only 2x on my 
machine. I just now tried 2^127 - 1 and saw a similar performance difference, 
but 2^127 - 3 had a 15x difference so it's a better example.


I installed the attached to try to document this better.


I have a patch which makes the non-GMP code some 2x - 3x faster.  It's
been maturing for several years now, so I suppose I should really finish
it.  (It got tangled with code which improves the GMP case by letting it
fall into the non-GMP code as numbers get smaller.  That sounds simple
but is quite messy for various reasons.  It is also not clear how much
complexity we could defend for this command of limited utility.)


Yes, 'factor' is just a minor utility needed for POSIX compliance. Although it'd 
be nice to get that 2x-3x improvement whenever you have the time, it's not 
urgent. Thanks for your guidance on the GMP issue.


>From ba1489d763b66dd1fcec08ecb4cba5917745f6bf Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Wed, 8 Jul 2020 18:58:18 -0700
Subject: [PATCH] factor: explain why non-GMP code (Bug#42269)

* doc/coreutils.texi (factor invocation):
* src/factor.c: Explain why the two-word algorithm is useful.
---
 doc/coreutils.texi | 24 ++--
 src/factor.c   |  5 +
 2 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 6ec1e6c31..656b8bc79 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -18368,14 +18368,17 @@ Print the program version on standard output, then exit without further
 processing.
 @end table
 
-Factoring the product of the eighth and ninth Mersenne primes
-takes about 4 milliseconds of CPU time on an Intel Xeon Silver 4116.
+If the number to be factored is small (less than @math{2^{127}} on
+typical machines), @command{factor} uses a faster algorithm.
+For example, on a circa-2017 Intel Xeon Silver 4116, factoring the
+product of the eighth and ninth Mersenne primes (approximately
+@math{2^{92}}) takes about 4 ms of CPU time:
 
 @example
-M8=$(echo 2^31-1|bc)
-M9=$(echo 2^61-1|bc)
-n=$(echo "$M8 * $M9" | bc)
-bash -c "time factor $n"
+$ M8=$(echo 2^31-1 | bc)
+$ M9=$(echo 2^61-1 | bc)
+$ n=$(echo "$M8 * $M9" | bc)
+$ bash -c "time factor $n"
 4951760154835678088235319297: 2147483647 2305843009213693951
 
 real	0m0.004s
@@ -18383,11 +18386,12 @@ user	0m0.004s
 sys	0m0.000s
 @end example
 
-Similarly, factoring the eighth Fermat number @math{2^{256}+1} takes
-about 14 seconds on the same machine.
+For larger numbers, @command{factor} uses a slower algorithm.  On the
+same platform, factoring the eighth Fermat number @math{2^{256} + 1}
+takes about 14 seconds, and the slower algorithm would have taken
+about 750 ms to factor @math{2^{127} - 3} instead of the 50 ms needed by
+the faster algorithm.
 
-The single-precision code uses an algorithm
-designed for factoring smaller numbers.
 Factoring large numbers is, in general, hard.  The Pollard-Brent rho
 algorithm used by @command{factor} is particularly effective for
 numbers with relatively small factors.  If you wish to factor large
diff --git a/src/factor.c b/src/factor.c
index c1c35a562..1b1607f16 100644
--- a/src/factor.c
+++ b/src/factor.c
@@ -53,6 +53,11 @@
 trick of multiplying all n-residues by the word base, allowing cheap Hensel
 reductions mod n.
 
+The GMP code uses an algorithm that can be considerably slower;
+for example, on a circa-2017 Intel Xeon Silver 4116, factoring
+2^{127}-3 takes about 50 ms with the two-word algorithm but would
+take about 750 ms with the GMP code.
+
   Improvements:
 
 * Use modular inverses also for exact division in the Lucas code, and
-- 
2.17.1



bug#42269: Remove non-GMP code from coreutils factor.c

2020-07-08 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes:

  I'm really not that familiar with state of the art factoring, but I'd
  guess pollard rho is a bad algorithm choice for that range, and one
  ought to use, e.g., some variant of the quadratic sieve.

Let me modify that statement somewhat.

Pollars rho is suitable for finding small factors of any size
composites.  Small here might mean about < 2^64.

Any factoring effort should start with trial division, then some rounds
of Pollard rho, then perhaps some EC rounds.  Only after that and when a
remaining cofactor is non-prime, QS is to be rolled out.

That could sound like the GMP code of coreutils factor is great for
factoring really large numbers.  It is, but only for smooth huge
numbers.

If we ever get crazy enough to consider QS for coreutils factor, its
current GMP Pollard rho code would become part of a general factoring
engine for numbers > 2^127.

-- 
Torbjörn
Please encrypt, key id 0xC8601622





bug#42269: Remove non-GMP code from coreutils factor.c

2020-07-08 Thread Niels Möller
t...@gmplib.org (Torbjörn Granlund) writes:

> If any code is to be removed, then that would be the GMP code of
> coreutils factor.

I agree with Torbjörn. The GMP code in GNU factor might have made more
sense when most computers were 32-bit, and "bignums" were smaller.

But on 64-bit computers, the GMP code would be used only for numbers
above 127 bits. I'm really not that familiar with state of the art
factoring, but I'd guess pollard rho is a bad algorithm choice for that
range, and one ought to use, e.g., some variant of the quadratic sieve.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.





bug#42269: Remove non-GMP code from coreutils factor.c

2020-07-08 Thread Torbjörn Granlund
Paul Eggert  writes:

  Could you give an example of where the 128-bit code shines, compared
  to the GMP code on the same arguments? I could add the example as a
  comment in the factor.c code, to let me and future maintainers know
  why it's useful for performance.

Any number which does not happen to be B-smooth for, say B < 2^30, will
show easily measurable performance difference of 5x to 40x IIRC.

A semantic difference which sometimes makes the speed difference less
pronounced is that the non-GMP code proves that the printed factors are
indeed prime.  We use a criterion which requires factoring of p-1 for
any assumed prime factor p.  In unlucky cases that recursive
factorisation is costlier than the main factorisation.

I have a patch which makes the non-GMP code some 2x - 3x faster.  It's
been maturing for several years now, so I suppose I should really finish
it.  (It got tangled with code which improves the GMP case by letting it
fall into the non-GMP code as numbers get smaller.  That sounds simple
but is quite messy for various reasons.  It is also not clear how much
complexity we could defend for this command of limited utility.)

-- 
Torbjörn
Please encrypt, key id 0xC8601622





bug#42269: Remove non-GMP code from coreutils factor.c

2020-07-08 Thread Paul Eggert

On 7/8/20 9:57 AM, Torbjörn Granlund wrote:


The non-GMP code of coreutils was extremely well-tuned by me and Niels
Möller a couple of years ago.


How time flies! The code was merged in 2012.


By leaving just the GMP code, you would create a pretty useless factor
command.  Any naive old factor command would often beat it.  It would
make much more sense to remove the factor command altogether.


OK, thanks. Then let's forget about the patch I just proposed.

Could you give an example of where the 128-bit code shines, compared to the GMP 
code on the same arguments? I could add the example as a comment in the factor.c 
code, to let me and future maintainers know why it's useful for performance.






bug#42269: Remove non-GMP code from coreutils factor.c

2020-07-08 Thread Torbjörn Granlund
Paul Eggert  writes:

  I recently modified GNU coreutils so that it can assume GMP, possibly
  by compiling and linking mini-gmp.c. This helps simplify the coreutils
  source code and makes coreutils behavior more portable.

  In doing so, I noticed that factor.c has special-purpose code to
  factor integers up to 127 bits. Although this code added functionality
  when coreutils could not assume GMP, it's no longer needed for
  that. And although it runs faster than the GMP code does, while doing
  the recent surgery on factor.c I began to wonder whether the hassle of
  maintaining the code outweighed its usefulness. So I wrote up the
  attached patch, which simply removes the non-GMP code and simplifies
  factor.c quite a bit.

  I assume the attached patch will hurt performance significantly in
  some cases for 127-bit numbers, so I did not install it. Perhaps it
  would be better to keep the non-GMP algorithm and recode it with
  GMP. Or perhaps it would be better to leave the factor.c code alone.

  Comments?

The GMP code in coreutils factor.c was writen by me as a demo (see
gmp/demos) over 25 years ago.  It was put into coreutils' factor.c
without consulting me.  I would have disagreed if I had been asked.

The non-GMP code of coreutils was extremely well-tuned by me and Niels
Möller a couple of years ago.  It is so fast that it has created some
stir in the mathematical community, or so I have been told

I expect the GMP code of factor.c to be pretty much unused.  Why?
Because Pollard rho is suitable only in the range currently covered by
the non-GMP code.  Iff we were to write well-tuned low-level GMP code,
that we could expend the practical range ever so slightly.

By leaving just the GMP code, you would create a pretty useless factor
command.  Any naive old factor command would often beat it.  It would
make much more sense to remove the factor command altogether.

If any code is to be removed, then that would be the GMP code of
coreutils factor.


-- 
Torbjörn
Please encrypt, key id 0xC8601622