subject:"bug#12350\: Composites identified as primes in factor.c \(when HAVE

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-10-17 Thread Philipp Thomas

* Niels Möller (ni...@lysator.liu.se) [20120907 11:10]:

 My understanding is that most gnu/linux distributions build coreutils
 without linking to gmp. So lots of users don't get this capability.

At least openSUSE has been building coreutils with gmp for quite some time.

Philipp

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-10-17 Thread Jim Meyering

Philipp Thomas wrote:
 * Niels Möller (ni...@lysator.liu.se) [20120907 11:10]:

 My understanding is that most gnu/linux distributions build coreutils
 without linking to gmp. So lots of users don't get this capability.

 At least openSUSE has been building coreutils with gmp for quite some time.

So does Fedora.

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-10-09 Thread Jim Meyering

Pádraig Brady wrote:
 On 10/08/2012 03:47 PM, Pádraig Brady wrote:
 diff --git a/src/factor.c b/src/factor.c
 index 5bfbfdc..843542b 100644
 --- a/src/factor.c
 +++ b/src/factor.c
 @@ -526,6 +526,29 @@ factor_insert_large (struct factors *factors,
   }

   #if HAVE_GMP
 +
 +# if !HAVE_DECL_MPZ_INITS
 +
 +#  define mpz_inits(...) mpz_va_init (mpz_init, __VA_ARGS__)
 +#  define mpz_clears(...) mpz_va_init (mpz_clear, __VA_ARGS__)
 +
 +static void
 +mpz_va_init (void (*mpz_single_init)(mpz_t), mpz_ptr mpz, ...)
 +{
 +  va_list ap;
 +
 +  va_start (ap, mpz);
 +
 +  while (mpz != NULL)
 +{
 +  mpz_single_init (mpz);
 +  mpz = va_arg (ap, mpz_ptr);
 +}
 +
 +  va_end (ap);
 +}
 +# endif
 +
   static void mp_factor (mpz_t, struct mp_factors *);

 Actually the above doesn't order the va_arg() call correctly.
 Also it uses mpz_ptr which is not kosher it seems:
   http://gmplib.org/list-archives/gmp-discuss/2009-May/003769.html
 So I've adjusted to:

 #define mpz_inits(...) mpz_va_init (mpz_init, __VA_ARGS__)
 #define mpz_clears(...) mpz_va_init (mpz_clear, __VA_ARGS__)

 static void
 mpz_va_init (void (*mpz_single_init)(mpz_t), ...)
 {
   va_list ap;

   va_start (ap, mpz_single_init);

   mpz_t *mpz;
   while ((mpz = va_arg (ap, mpz_t *)))
 mpz_single_init (*mpz);

   va_end (ap);
 }

Oh!  Were there symptoms?

Ship it :-)

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-10-08 Thread Pádraig Brady


On 09/04/2012 10:55 PM, Torbjorn Granlund wrote:

Pádraig Brady p...@draigbrady.com writes:

   Sure. I was just quantifying the performance change,
   for others who may be referencing or noticing patches.
   (Actually, I'd add a note to the commit message that,
   this increases calculations by about 25%).

And surely mode for certain cases. We spend 25/3 or about 8 times more
effort in Miller Rabin.

As I mentioned in the original post, we will replace the current code
with code that is many times faster.  Your example above will run at
less than a minute on your system.

   I'd left my test files in place in anticipation ;)

Please do, and let me and Niels know if it takes more than 45s.  Your
test case takes 28s on my 3.3 GHz Sandy bridge system with our current
code.  I'm a little disappointed the code doesn't beat the old code more
for small factorisations.


So on my 2.1GHz i3-2310M, running over the range 452,930,477 to 472,882,027.

old broken code = 14m
old fixed code  = 18m
new code= 63s

cheers,
Pádraig.

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-10-08 Thread Pádraig Brady


On 10/07/2012 10:00 AM, Jim Meyering wrote:

Torbjorn Granlund wrote:

Jim Meyering j...@meyering.net writes:

   How about this in place of the final sentence?

   The new program also
   runs a deterministic primality test for each prime factor, not just
   a probabilistic test.

That's better, thanks.


I pushed the actual bug fix (for the issue mentioned in the Subject)
long ago, so I'm closing this issue.

Regarding your upcoming improvements, please start a new thread
when you're ready to discuss them, so that your comments are not
lost in the volume of with this now-done bug report.


A small amendment I'm going to push is not to rely on GMP5.
GMP4 on my fedora 15 system doesn't have mpz_inits().
i.e. support for initializing multiple variables at once.

Patch is simple enough...

diff --git a/src/factor.c b/src/factor.c
index 5bfbfdc..1857297 100644
--- a/src/factor.c
+++ b/src/factor.c
@@ -1335,7 +1335,10 @@ mp_prime_p (mpz_t n)
   if (mpz_cmp_ui (n, (long) FIRST_OMITTED_PRIME * FIRST_OMITTED_PRIME)  0)
 return true;

-  mpz_inits (q, a, nm1, tmp, NULL);
+  mpz_init (q);
+  mpz_init (a);
+  mpz_init (nm1);
+  mpz_init (tmp);

   /* Precomputation for Miller-Rabin.  */
   mpz_sub_ui (nm1, n, 1);
@@ -1399,7 +1402,10 @@ mp_prime_p (mpz_t n)
   if (flag_prove_primality)
 mp_factor_clear (factors);
  ret2:
-  mpz_clears (q, a, nm1, tmp, NULL);
+  mpz_clear (q);
+  mpz_clear (a);
+  mpz_clear (nm1);
+  mpz_clear (tmp);

   return is_prime;
 }
@@ -1608,7 +1614,8 @@ mp_factor_using_pollard_rho (mpz_t n, unsigned long int a,

   debug ([pollard-rho (%lu)] , a);

-  mpz_inits (t, t2, NULL);
+  mpz_init (t);
+  mpz_init (t2);
   mpz_init_set_si (y, 2);
   mpz_init_set_si (x, 2);
   mpz_init_set_si (z, 2);
@@ -1688,7 +1695,12 @@ mp_factor_using_pollard_rho (mpz_t n, unsigned long int 
a,
   mpz_mod (y, y, n);
 }

-  mpz_clears (P, t2, t, z, x, y, NULL);
+  mpz_clear (P);
+  mpz_clear (t2);
+  mpz_clear (t);
+  mpz_clear (z);
+  mpz_clear (x);
+  mpz_clear (y);
 }
 #endif

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-10-08 Thread Jim Meyering

Pádraig Brady wrote:
 On 10/07/2012 10:00 AM, Jim Meyering wrote:
 Torbjorn Granlund wrote:
 Jim Meyering j...@meyering.net writes:

How about this in place of the final sentence?

The new program also
runs a deterministic primality test for each prime factor, not just
a probabilistic test.

 That's better, thanks.

 I pushed the actual bug fix (for the issue mentioned in the Subject)
 long ago, so I'm closing this issue.

 Regarding your upcoming improvements, please start a new thread
 when you're ready to discuss them, so that your comments are not
 lost in the volume of with this now-done bug report.

 A small amendment I'm going to push is not to rely on GMP5.
 GMP4 on my fedora 15 system doesn't have mpz_inits().
 i.e. support for initializing multiple variables at once.

 Patch is simple enough...

Hi Pádraig,

Thanks, but wouldn't that be a slight pessimization?
What do you think about providing the missing function instead?
Maybe not worth the hassle, but still, it would avoid adding those 12
in-function lines.  factor.c is already large and complex enough that
every little bit helps.

BTW, Fedora 15 passed end of life back in June.

 diff --git a/src/factor.c b/src/factor.c
 index 5bfbfdc..1857297 100644
 --- a/src/factor.c
 +++ b/src/factor.c
 @@ -1335,7 +1335,10 @@ mp_prime_p (mpz_t n)
if (mpz_cmp_ui (n, (long) FIRST_OMITTED_PRIME * FIRST_OMITTED_PRIME)  0)
  return true;

 -  mpz_inits (q, a, nm1, tmp, NULL);
 +  mpz_init (q);
 +  mpz_init (a);
 +  mpz_init (nm1);
 +  mpz_init (tmp);
...

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-10-08 Thread Torbjorn Granlund

   Please do, and let me and Niels know if it takes more than 45s.  Your
   test case takes 28s on my 3.3 GHz Sandy bridge system with our current
   code.  I'm a little disappointed the code doesn't beat the old code more
   for small factorisations.
  
  So on my 2.1GHz i3-2310M, running over the range 452,930,477 to 472,882,027.
  
  old broken code = 14m
  old fixed code  = 18m
  new code= 63s
  
OK, this is about 60% slower than I would have expected.  Our code at
http://gmplib.org:8000/factoring/ should run at about 39s on your
system.  (I am using gcc 4.7.1.)

I haven't looked at the final version that went into codeutils, so I
have no idea why it runs slower.  A wild guess is that its actual input
or tokenisation has been slowed down.  For smallish numbers, such things
can dominate over actually factoring the numbers.

I think the current coreutils factor performance for smallish numbers
might be sufficient.  (Larger numbers than 2^100 need a bit too much
time, but we are working on a fix.)

-- 
Torbjörn

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-10-08 Thread Jim Meyering

Torbjorn Granlund wrote:

Please do, and let me and Niels know if it takes more than 45s.  Your
test case takes 28s on my 3.3 GHz Sandy bridge system with our current
code.  I'm a little disappointed the code doesn't beat the old code more
for small factorisations.

   So on my 2.1GHz i3-2310M, running over the range 452,930,477 to 472,882,027.

   old broken code = 14m
   old fixed code  = 18m
   new code= 63s

 OK, this is about 60% slower than I would have expected.  Our code at
 http://gmplib.org:8000/factoring/ should run at about 39s on your
 system.  (I am using gcc 4.7.1.)

 I haven't looked at the final version that went into codeutils, so I
 have no idea why it runs slower.  A wild guess is that its actual input
 or tokenisation has been slowed down.  For smallish numbers, such things
 can dominate over actually factoring the numbers.

 I think the current coreutils factor performance for smallish numbers
 might be sufficient.  (Larger numbers than 2^100 need a bit too much
 time, but we are working on a fix.)

When I compare the factor built from http://gmplib.org:8000/factoring/, and
the one in coreutils.git I see these single-trial times: (on a 3.2GHz i7-970)

  $ seq 452930477 472882027|env time ./factor  k
  29.28user 0.44system 0:30.85elapsed 96%CPU (0avgtext+0avgdata 548maxresident)k
  0inputs+1011088outputs (0major+162minor)pagefaults 0swaps
  ...
  $ seq 452930477 472882027|env time /cu/src/factor  k
  26.47user 0.48system 0:28.07elapsed 96%CPU (0avgtext+0avgdata 692maxresident)k
  0inputs+1011088outputs (0major+205minor)pagefaults 0swaps

I.e., a 9% improvement.  Compiled with bleeding edge gcc 4.8.0 20121007.

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-10-08 Thread Pádraig Brady


On 10/08/2012 01:34 PM, Jim Meyering wrote:

Torbjorn Granlund wrote:


Please do, and let me and Niels know if it takes more than 45s.  Your
test case takes 28s on my 3.3 GHz Sandy bridge system with our current
code.  I'm a little disappointed the code doesn't beat the old code more
for small factorisations.

   So on my 2.1GHz i3-2310M, running over the range 452,930,477 to 472,882,027.

   old broken code = 14m
   old fixed code  = 18m
   new code= 63s

OK, this is about 60% slower than I would have expected.  Our code at
http://gmplib.org:8000/factoring/ should run at about 39s on your
system.  (I am using gcc 4.7.1.)

I haven't looked at the final version that went into codeutils, so I
have no idea why it runs slower.  A wild guess is that its actual input
or tokenisation has been slowed down.  For smallish numbers, such things
can dominate over actually factoring the numbers.

I think the current coreutils factor performance for smallish numbers
might be sufficient.  (Larger numbers than 2^100 need a bit too much
time, but we are working on a fix.)


When I compare the factor built from http://gmplib.org:8000/factoring/, and
the one in coreutils.git I see these single-trial times: (on a 3.2GHz i7-970)

   $ seq 452930477 472882027|env time ./factor  k
   29.28user 0.44system 0:30.85elapsed 96%CPU (0avgtext+0avgdata 
548maxresident)k
   0inputs+1011088outputs (0major+162minor)pagefaults 0swaps
   ...
   $ seq 452930477 472882027|env time /cu/src/factor  k
   26.47user 0.48system 0:28.07elapsed 96%CPU (0avgtext+0avgdata 
692maxresident)k
   0inputs+1011088outputs (0major+205minor)pagefaults 0swaps

I.e., a 9% improvement.  Compiled with bleeding edge gcc 4.8.0 20121007.


Thanks for checking that.

Torbjorn you seem to be interploating from your
3.3 GHz Sandy bridge to my 2.1GHz i3-2310M Sandy Bridge,
but it might not be linear due to cache, turbo boost,
mem bandwidth, ...

cheers,
Pádraig.

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-10-08 Thread Torbjorn Granlund

Pádraig Brady p...@draigbrady.com writes:

  Thanks for checking that.
  
  Torbjorn you seem to be interploating from your
  3.3 GHz Sandy bridge to my 2.1GHz i3-2310M Sandy Bridge,
  but it might not be linear due to cache, turbo boost,
  mem bandwidth, ...
  
It should be linear in clock frequency, yes.

The factor program's working set (code and data) is tiny.  Things should
fit in L1 caches.  We have a prime table, but it is smaller than L1D (at
about 10 kB).

All Sandy bridges in the world have the same L1 cache sizes.

Mem bandwidth is therefore irrelevamt, and so is higher-level caches.
Turbo boost is relevant, but I have that switched off since I am quite
fond of reproducible benchmarking results.

-- 
Torbjörn

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-10-08 Thread Jim Meyering

Torbjorn Granlund wrote:
 Pádraig Brady p...@draigbrady.com writes:

   Thanks for checking that.

   Torbjorn you seem to be interploating from your
   3.3 GHz Sandy bridge to my 2.1GHz i3-2310M Sandy Bridge,
   but it might not be linear due to cache, turbo boost,
   mem bandwidth, ...

 It should be linear in clock frequency, yes.

It meaning the core computation.

However, this little command does a lot of I/O, too:
191M input, 77M output.

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-10-08 Thread Torbjorn Granlund

Jim Meyering j...@meyering.net writes:

  However, this little command does a lot of I/O, too:
  191M input, 77M output.
  
Sure.  I've never seen significant variance for such stuff, measurable
as CPU time.

I tested with a Sandybridge i3-2120T now.  The range takes 32 s.

In both cases, the systems run GNU/Linux.  The kernel version is 3.2.

The factoring speed varies very much with GCC version.  I particular the
trial division code has a very tight loop, and such loops have more
compiler reliance.  GCC 4.6 an later generate code that executes 4 insns
per (unsuccessful) division.

It is also important to use a 32bit binary.  We should perhaps have
provided better 32-bit code paths to be used for numbers  2^32 on
32-bit hardware.  Now, Pádraig's example needs about 3x more time for a
32-bit binary on the same hardware.

-- 
Torbjörn

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-10-08 Thread Pádraig Brady


On 10/08/2012 02:21 PM, Torbjorn Granlund wrote:

Jim Meyering j...@meyering.net writes:

   However, this little command does a lot of I/O, too:
   191M input, 77M output.

Sure.  I've never seen significant variance for such stuff, measurable
as CPU time.

I tested with a Sandybridge i3-2120T now.  The range takes 32 s.

In both cases, the systems run GNU/Linux.  The kernel version is 3.2.

The factoring speed varies very much with GCC version.  I particular the
trial division code has a very tight loop, and such loops have more
compiler reliance.  GCC 4.6 an later generate code that executes 4 insns
per (unsuccessful) division.


gcc (GCC) 4.6.0 20110603 (Red Hat 4.6.0-10)
With -march=native -O2


It is also important to use a 32bit binary.  We should perhaps have


s/to use/to not use/ I presume. Yep I'm on 64 bit.


provided better 32-bit code paths to be used for numbers  2^32 on
32-bit hardware.  Now, Pádraig's example needs about 3x more time for a
32-bit binary on the same hardware.


thanks,
Pádraig.

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-10-08 Thread Pádraig Brady


On 10/08/2012 11:52 AM, Jim Meyering wrote:

Pádraig Brady wrote:

On 10/07/2012 10:00 AM, Jim Meyering wrote:

Torbjorn Granlund wrote:

Jim Meyering j...@meyering.net writes:

How about this in place of the final sentence?

The new program also
runs a deterministic primality test for each prime factor, not just
a probabilistic test.

That's better, thanks.


I pushed the actual bug fix (for the issue mentioned in the Subject)
long ago, so I'm closing this issue.

Regarding your upcoming improvements, please start a new thread
when you're ready to discuss them, so that your comments are not
lost in the volume of with this now-done bug report.


A small amendment I'm going to push is not to rely on GMP5.
GMP4 on my fedora 15 system doesn't have mpz_inits().
i.e. support for initializing multiple variables at once.

Patch is simple enough...


Hi Pádraig,

Thanks, but wouldn't that be a slight pessimization?
What do you think about providing the missing function instead?
Maybe not worth the hassle, but still, it would avoid adding those 12
in-function lines.  factor.c is already large and complex enough that
every little bit helps.


Borderline, but to align code with newer libs,
I've done as you suggest in the attached.


BTW, Fedora 15 passed end of life back in June.


Right, so it's not too old.
The same issue will affect debian stable, RHEL, ...

cheers,
Pádraig.
From 8052c5bf1c5c6dc92b420ce2ed3c6595d9b31797 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= p...@draigbrady.com
Date: Mon, 8 Oct 2012 11:38:41 +0100
Subject: [PATCH] build: support older GMP versions

The new factor code introduced usage of mpz_inits() and
mpz_clears(), which are only available since GMP = 5,
and will result in a compile error when missing.

* m4/gmp.m4 (cu_GMP): Define HAVE_DECL_MPZ_INITS appropriately.
* src/factor (mpz_inits): New function, defined where missing.
(mpz_clears): Likewise.
---
 m4/gmp.m4|2 ++
 src/factor.c |   23 +++
 2 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/m4/gmp.m4 b/m4/gmp.m4
index e337e16..59a664f 100644
--- a/m4/gmp.m4
+++ b/m4/gmp.m4
@@ -30,6 +30,8 @@ AC_DEFUN([cu_GMP],
 LIB_GMP=$ac_cv_search___gmpz_init
 AC_DEFINE([HAVE_GMP], [1],
   [Define if you have GNU libgmp (or replacement)])
+# This only available in GMP = 5
+AC_CHECK_DECLS([mpz_inits], [], [], [[#include gmp.h]])
}],
   [AC_MSG_WARN([libgmp development library was not found or not usable.])
AC_MSG_WARN([AC_PACKAGE_NAME will be built without GMP support.])])
diff --git a/src/factor.c b/src/factor.c
index 5bfbfdc..843542b 100644
--- a/src/factor.c
+++ b/src/factor.c
@@ -526,6 +526,29 @@ factor_insert_large (struct factors *factors,
 }
 
 #if HAVE_GMP
+
+# if !HAVE_DECL_MPZ_INITS
+
+#  define mpz_inits(...) mpz_va_init (mpz_init, __VA_ARGS__)
+#  define mpz_clears(...) mpz_va_init (mpz_clear, __VA_ARGS__)
+
+static void
+mpz_va_init (void (*mpz_single_init)(mpz_t), mpz_ptr mpz, ...)
+{
+  va_list ap;
+
+  va_start (ap, mpz);
+
+  while (mpz != NULL)
+{
+  mpz_single_init (mpz);
+  mpz = va_arg (ap, mpz_ptr);
+}
+
+  va_end (ap);
+}
+# endif
+
 static void mp_factor (mpz_t, struct mp_factors *);
 
 static void
-- 
1.7.6.4

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-10-08 Thread Jim Meyering

Pádraig Brady wrote:
...
 Thanks, but wouldn't that be a slight pessimization?
 What do you think about providing the missing function instead?
 Maybe not worth the hassle, but still, it would avoid adding those 12
 in-function lines.  factor.c is already large and complex enough that
 every little bit helps.

 Borderline, but to align code with newer libs,
 I've done as you suggest in the attached.
...

Thank you!
That looks great.

 Subject: [PATCH] build: support older GMP versions

 The new factor code introduced usage of mpz_inits() and
 mpz_clears(), which are only available since GMP = 5,
 and will result in a compile error when missing.

 * m4/gmp.m4 (cu_GMP): Define HAVE_DECL_MPZ_INITS appropriately.
 * src/factor (mpz_inits): New function, defined where missing.
 (mpz_clears): Likewise.
 ---
  m4/gmp.m4|2 ++
  src/factor.c |   23 +++
  2 files changed, 25 insertions(+), 0 deletions(-)

 diff --git a/m4/gmp.m4 b/m4/gmp.m4
 index e337e16..59a664f 100644
 --- a/m4/gmp.m4
 +++ b/m4/gmp.m4
 @@ -30,6 +30,8 @@ AC_DEFUN([cu_GMP],
  LIB_GMP=$ac_cv_search___gmpz_init
  AC_DEFINE([HAVE_GMP], [1],
[Define if you have GNU libgmp (or replacement)])
 +# This only available in GMP = 5

   # This is available only in GMP = 5

 +AC_CHECK_DECLS([mpz_inits], [], [], [[#include gmp.h]])
 }],
[AC_MSG_WARN([libgmp development library was not found or not usable.])
 AC_MSG_WARN([AC_PACKAGE_NAME will be built without GMP support.])])

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-10-08 Thread Pádraig Brady


On 10/08/2012 03:47 PM, Pádraig Brady wrote:

diff --git a/src/factor.c b/src/factor.c
index 5bfbfdc..843542b 100644
--- a/src/factor.c
+++ b/src/factor.c
@@ -526,6 +526,29 @@ factor_insert_large (struct factors *factors,
  }

  #if HAVE_GMP
+
+# if !HAVE_DECL_MPZ_INITS
+
+#  define mpz_inits(...) mpz_va_init (mpz_init, __VA_ARGS__)
+#  define mpz_clears(...) mpz_va_init (mpz_clear, __VA_ARGS__)
+
+static void
+mpz_va_init (void (*mpz_single_init)(mpz_t), mpz_ptr mpz, ...)
+{
+  va_list ap;
+
+  va_start (ap, mpz);
+
+  while (mpz != NULL)
+{
+  mpz_single_init (mpz);
+  mpz = va_arg (ap, mpz_ptr);
+}
+
+  va_end (ap);
+}
+# endif
+
  static void mp_factor (mpz_t, struct mp_factors *);


Actually the above doesn't order the va_arg() call correctly.
Also it uses mpz_ptr which is not kosher it seems:
  http://gmplib.org/list-archives/gmp-discuss/2009-May/003769.html
So I've adjusted to:

#define mpz_inits(...) mpz_va_init (mpz_init, __VA_ARGS__)
#define mpz_clears(...) mpz_va_init (mpz_clear, __VA_ARGS__)

static void
mpz_va_init (void (*mpz_single_init)(mpz_t), ...)
{
  va_list ap;

  va_start (ap, mpz_single_init);

  mpz_t *mpz;
  while ((mpz = va_arg (ap, mpz_t *)))
mpz_single_init (*mpz);

  va_end (ap);
}

cheers,
Pádraig.

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-10-07 Thread Jim Meyering

Torbjorn Granlund wrote:
 Jim Meyering j...@meyering.net writes:

   How about this in place of the final sentence?
   
   The new program also
   runs a deterministic primality test for each prime factor, not just
   a probabilistic test.
   
 That's better, thanks.

I pushed the actual bug fix (for the issue mentioned in the Subject)
long ago, so I'm closing this issue.

Regarding your upcoming improvements, please start a new thread
when you're ready to discuss them, so that your comments are not
lost in the volume of with this now-done bug report.

Thanks again,
Jim

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-30 Thread Torbjorn Granlund

Jim Meyering j...@meyering.net writes:

   Thanks.  Here's how I've integrated it so far.
   This is not ready to push, but rather just to give you an idea
   of what's coming.  I'm sure I'll have to adjust things before pushing.
  
  There have been a few corrections, and I've fleshed out some log entries.
  
  The following series is ready:
  
[PATCH 01/13] build: remove redundant dependency: $(PROGRAMS):
[PATCH 02/13] factor: prepare for the new factor program
[PATCH 03/13] factor: new implementation; not yet integrated
[PATCH 04/13] build: add rules to build the new factor program
[PATCH 05/13] factor: improvements from
[PATCH 06/13] factor: merge with preexisting factor, integrate
[PATCH 07/13] maint: use __builtin_expect only if __GNUC__
[PATCH 08/13] maint: syntax-check: add make-prime-list exemptions
[PATCH 09/13] factor: 25% speed-up, on output
[PATCH 10/13] build: avoid warning about unused macro
[PATCH 11/13] maint: mark set-but-not-used variables with
[PATCH 12/13] maint: avoid -Wsuggest-attribute=const warning
[PATCH 13/13] maint: make-prime-list: do not ignore write failure
  
  Torbjorn and Niels,
  
  I'd be happy to include more fine-grained changes to factor.c
  if you can convert the http://gmplib.org:8000/factoring commits
  and ChangeLog deltas to git commits where the ChangeLog delta
  appears in the commit log and passes coreutils' commit-log-checking hook.
  But that may be more trouble than it's worth.
  
I think those change logs are not super relevant for the coreutils
ChangeLog.  factor.c: Complete rewrite seem sufficient to me...

Both Niels and I mailed the paperwork to the FSF a week or two ago.
Have you heard from them?  Snail mail tend to disappear.

  The only missing piece is a NEWS entry.
  Would one of you please write that?
  
Sure.  Do you have an example of an old one?  Here is a start:

  The 'factor' program has been completely rewritten for speed and to
  add range.  It can now always factor numbers up to 2^128, even without
  GMP support.  Its speed is from a few times better (for small numbers)
  to over 10,000 times better (just below 2^64).  The new program also
  runs a proper prime criterion on found factors, not a probabilistic
  test.

As you might have spotted from our repo, I and Nisse Möller are working
on a small Quadratic Sieve (QS) factorer, for which we have two goals:

(1) offer it as a HAVE_GMP dependent addition to GNU factor
(2) make a more complex variant intended to be state-of-the-art

QS is one of the most powerful factoring algorithms yet discovered.
With our implementation, we will be able to factor even the most
stubborn 128-bit composites within seconds, but with enough patience
numbers of upp to 300 bits are within reach.

The code is not very large, it will make 'factor' about 30% larger.

It should factor any 128-bit numbers in around 1 second.  Any 30 bits
extra take about 10 times more time.

I don't think these new developments should hold up a commit of our old
factor.c developments.

-- 
Torbjörn

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-30 Thread Jim Meyering

Torbjorn Granlund wrote:

 Jim Meyering j...@meyering.net writes:

Thanks.  Here's how I've integrated it so far.
This is not ready to push, but rather just to give you an idea
of what's coming.  I'm sure I'll have to adjust things before pushing.

   There have been a few corrections, and I've fleshed out some log entries.

   The following series is ready:

 [PATCH 01/13] build: remove redundant dependency: $(PROGRAMS):
 [PATCH 02/13] factor: prepare for the new factor program
 [PATCH 03/13] factor: new implementation; not yet integrated
 [PATCH 04/13] build: add rules to build the new factor program
 [PATCH 05/13] factor: improvements from
 [PATCH 06/13] factor: merge with preexisting factor, integrate
 [PATCH 07/13] maint: use __builtin_expect only if __GNUC__
 [PATCH 08/13] maint: syntax-check: add make-prime-list exemptions
 [PATCH 09/13] factor: 25% speed-up, on output
 [PATCH 10/13] build: avoid warning about unused macro
 [PATCH 11/13] maint: mark set-but-not-used variables with
 [PATCH 12/13] maint: avoid -Wsuggest-attribute=const warning
 [PATCH 13/13] maint: make-prime-list: do not ignore write failure

   Torbjorn and Niels,

   I'd be happy to include more fine-grained changes to factor.c
   if you can convert the http://gmplib.org:8000/factoring commits
   and ChangeLog deltas to git commits where the ChangeLog delta
   appears in the commit log and passes coreutils' commit-log-checking hook.
   But that may be more trouble than it's worth.

 I think those change logs are not super relevant for the coreutils
 ChangeLog.  factor.c: Complete rewrite seem sufficient to me...

 Both Niels and I mailed the paperwork to the FSF a week or two ago.
 Have you heard from them?  Snail mail tend to disappear.

Not yet.  I've just asked them.

   The only missing piece is a NEWS entry.
   Would one of you please write that?

 Sure.  Do you have an example of an old one?  Here is a start:

Here are a few years worth of NEWS entries:

  http://git.sv.gnu.org/cgit/coreutils.git/tree/NEWS

That looks fine.  Thanks.

   The 'factor' program has been completely rewritten for speed and to
   add range.  It can now always factor numbers up to 2^128, even without
   GMP support.  Its speed is from a few times better (for small numbers)
   to over 10,000 times better (just below 2^64).  The new program also
   runs a proper prime criterion on found factors, not a probabilistic
   test.

How about this in place of the final sentence?

The new program also
runs a deterministic primality test for each prime factor, not just
a probabilistic test.

 As you might have spotted from our repo, I and Nisse Möller are working
 on a small Quadratic Sieve (QS) factorer, for which we have two goals:

 (1) offer it as a HAVE_GMP dependent addition to GNU factor
 (2) make a more complex variant intended to be state-of-the-art

 QS is one of the most powerful factoring algorithms yet discovered.
 With our implementation, we will be able to factor even the most
 stubborn 128-bit composites within seconds, but with enough patience
 numbers of upp to 300 bits are within reach.

 The code is not very large, it will make 'factor' about 30% larger.

That sort of code size increase sounds perfectly
reasonable considering the algorithmic improvements.

 It should factor any 128-bit numbers in around 1 second.  Any 30 bits
 extra take about 10 times more time.

That sounds great.

 I don't think these new developments should hold up a commit of our old
 factor.c developments.

I agree.

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-30 Thread Torbjorn Granlund

Jim Meyering j...@meyering.net writes:

  How about this in place of the final sentence?
  
  The new program also
  runs a deterministic primality test for each prime factor, not just
  a probabilistic test.
  
That's better, thanks.

-- 
Torbjörn

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-18 Thread Torbjorn Granlund

Niels and I made more changes, including some needed to silence
-Wshadow.

We also re-enabled the div_smallq code after a bug fix, allowed squfof
to fail with recovery, and simplified the function factor.

I suppose you'll need to apply these manually, because of the TAB/SPC
problem.

I don't think we anticipate more changes to this code anytime soon.

diff -r 3024e91e4b82 factor.c
--- a/factor.c  Mon Sep 17 19:43:40 2012 +0200
+++ b/factor.c  Tue Sep 18 16:28:09 2012 +0200
@@ -140,7 +140,7 @@
 #endif
 #include longlong.h
 #ifdef COUNT_LEADING_ZEROS_NEED_CLZ_TAB
-static const unsigned char factor_clz_tab[129] =
+const unsigned char factor_clz_tab[129] =
 {
   1,2,3,3,4,4,4,4,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
   7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
@@ -479,8 +479,7 @@
 #define factor_insert(f, p) factor_insert_multiplicity(f, p, 1)
 
 static void
-factor_insert_large (struct factors *factors,
-uintmax_t p1, uintmax_t p0)
+factor_insert_large (struct factors *factors, uintmax_t p1, uintmax_t p0)
 {
   if (p1  0)
 {
@@ -1739,8 +1738,10 @@
   return 0;
 }
 
-static const unsigned short invtab[] =
+/* invtab[i] = floor(0x1 / (0x100 + i) */
+static const unsigned short invtab[0x81] =
   {
+0x200,
 0x1fc, 0x1f8, 0x1f4, 0x1f0, 0x1ec, 0x1e9, 0x1e5, 0x1e1,
 0x1de, 0x1da, 0x1d7, 0x1d4, 0x1d0, 0x1cd, 0x1ca, 0x1c7,
 0x1c3, 0x1c0, 0x1bd, 0x1ba, 0x1b7, 0x1b4, 0x1b2, 0x1af,
@@ -1763,17 +1764,22 @@
that q  0x40; here it instead uses a table of (Euclidian) inverses.  */
 #define div_smallq(q, r, u, d) \
   do { \
-if (0  (u) / 0x40  (d)) \
+if ((u) / 0x40  (d))  \
   {
\
int _cnt;   \
uintmax_t _dinv, _mask, _q, _r; \
count_leading_zeros (_cnt, (d));\
-   \
-   _dinv = invtab[((d)  (W_TYPE_SIZE - 8 - _cnt))\
-  - (1  (8 - 1))];   \
-   \
_r = (u);   \
-   _q = _r * _dinv  (W_TYPE_SIZE + 8 - _cnt);\
+   if (UNLIKELY (_cnt  (W_TYPE_SIZE - 8)))\
+ { \
+   _dinv = invtab[((d)  (_cnt + 8 - W_TYPE_SIZE)) - 0x80];   \
+   _q = _dinv * _r  (8 + W_TYPE_SIZE - _cnt);\
+ } \
+   else\
+ { \
+   _dinv = invtab[((d)  (W_TYPE_SIZE - 8 - _cnt)) - 0x7f];   \
+   _q = _dinv * (_r  (W_TYPE_SIZE - 3 - _cnt))  11;\
+ } \
_r -= _q*(d);   \
\
_mask = -(uintmax_t) (_r = (d));   \
@@ -1833,8 +1839,9 @@
 #define MIN(a,b) ((a)  (b) ? (a) : (b))
 #endif
 
-
-static void
+/* Return true on success. Expected to fail only for numbers =
+   2^{2*W_TYPE_SIZE - 2}, or close to that limit. */
+static bool
 factor_using_squfof (uintmax_t n1, uintmax_t n0, struct factors *factors)
 {
   /* Uses algorithm and notation from
@@ -1854,35 +1861,41 @@
   1155, 15, 231, 35, 3, 55, 7, 11, 0
 };
 
-  uintmax_t S;
   const unsigned int *m;
 
   struct { uintmax_t Q; uintmax_t P; } queue[QUEUE_SIZE];
 
-  S = isqrt2 (n1, n0);
+  if (n1 = ((uintmax_t) 1  (W_TYPE_SIZE - 2)))
+return false;
 
-  if (n0 == S * S)
+  uintmax_t sqrt_n = isqrt2 (n1, n0);
+
+  if (n0 == sqrt_n * sqrt_n)
 {
   uintmax_t p1, p0;
 
-  umul_ppmm (p1, p0, S, S);
+  umul_ppmm (p1, p0, sqrt_n, sqrt_n);
   assert (p0 == n0);
 
   if (n1 == p1)
{
- if (prime_p (S))
-   factor_insert_multiplicity (factors, S, 2);
+ if (prime_p (sqrt_n))
+   factor_insert_multiplicity (factors, sqrt_n, 2);
  else
{
  struct factors f;
 
  f.nfactors = 0;
- factor_using_squfof (0, S, f);
+ if (!factor_using_squfof (0, sqrt_n, f))
+   {
+ /* Try pollard rho instead */
+ factor_using_pollard_rho (sqrt_n, 1, f);
+   }
  /* Duplicate the new factors */
  for

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-18 Thread Torbjorn Granlund

Torbjorn Granlund t...@gmplib.org writes:

  Niels and I made more changes, including some needed to silence
  -Wshadow.

Change log entries can be fund in:

http://gmplib.org:8000/factoring/file/61b4eac24ea4/ChangeLog

-- 
Torbjörn

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-18 Thread Jim Meyering

Torbjorn Granlund wrote:

 Torbjorn Granlund t...@gmplib.org writes:

   Niels and I made more changes, including some needed to silence
   -Wshadow.

 Change log entries can be fund in:

 http://gmplib.org:8000/factoring/file/61b4eac24ea4/ChangeLog

Thanks.
I've nearly finished integrating it, but have not yet
added the options that enable new functionality.
I will post soon.

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-17 Thread Torbjorn Granlund

Jim Meyering j...@meyering.net writes:

  At least one other change may be interesting to you:
  another scope-reduction one:
  
  - inline uintmax_t
  + static inline uintmax_t
mulredc (uintmax_t a, uintmax_t b, uintmax_t m, uintmax_t mi)
  
How about going all the way, making everything (but main) static?

-- 
Torbjörn

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-17 Thread Jim Meyering

Torbjorn Granlund wrote:
 Jim Meyering j...@meyering.net writes:

   At least one other change may be interesting to you:
   another scope-reduction one:

   - inline uintmax_t
   + static inline uintmax_t
 mulredc (uintmax_t a, uintmax_t b, uintmax_t m, uintmax_t mi)

 How about going all the way, making everything (but main) static?

That was the intent.
One of the earlier patches I sent converted all of the others.
I had missed mulredc initially because I'd searched manually for
names annotated with a  T  in the output of nm.

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-17 Thread Torbjorn Granlund

Jim Meyering j...@meyering.net writes:

  Would you mind changing the names of a few variables
  or adjusting declarations to avoid some -Wshadow warnings?
  
  I changed the innermost r to rem locally, but there are
  others.  Also, S.
  
The two remaining instances are in squfof.

Niels, please address those.  You'll find them by passing -Wshadow to
gcc.

-- 
Torbjörn

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-17 Thread Jim Meyering

Torbjorn Granlund wrote:
...
 Please consider these additional changes:

 *** .~/cu-factor.c.~1~Mon Sep 17 11:26:31 2012
 --- cu-factor.c   Mon Sep 17 15:45:05 2012
 ***
 *** 134,139 
 --- 134,140 
   #endif
   #define LONGLONG_STANDALONE /* Don't require GMP's longlong.h mdep 
 files */
   #define ASSERT(x)   /* FIXME make longlong.h really standalone 
 */
 + #define __GMP_DECLSPEC  /* FIXME make longlong.h really standalone 
 */

Oh, that must be used in an #else branch that I'm not exercising.
It triggers a symbol defined but never used warning.
Thanks.

   #define __clz_tab factor_clz_tab /* Rename to avoid glibc collision */

BTW, why does __clz_tab use the __ prefix in the first place?

   #ifndef __GMP_GNUC_PREREQ
   #define __GMP_GNUC_PREREQ(a,b) 1
 ***
 *** 143,149 
   # endif
   # include longlong.h
   # ifdef COUNT_LEADING_ZEROS_NEED_CLZ_TAB
 ! const unsigned char factor_clz_tab[129] =
   {
 1,2,3,3,4,4,4,4,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
 --- 144,150 
   #endif
   #include longlong.h
   #ifdef COUNT_LEADING_ZEROS_NEED_CLZ_TAB
 ! static const unsigned char factor_clz_tab[129] =

Thanks.  Done.

   {
 1,2,3,3,4,4,4,4,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
 ***
 *** 1555,1561 
   {
 mpz_t x, z, y, P;
 mpz_t t, t2;
 -   unsigned long long int k, l;

 if (flag_verbose  0)
   {
 --- 1556,1561 
 ***
 *** 1567,1574 
 mpz_init_set_si (x, 2);
 mpz_init_set_si (z, 2);
 mpz_init_set_ui (P, 1);
 !   k = 1;
 !   l = 1;

 while (mpz_cmp_ui (n, 1) != 0)
   {
 --- 1567,1575 
 mpz_init_set_si (x, 2);
 mpz_init_set_si (z, 2);
 mpz_init_set_ui (P, 1);
 !
 !   unsigned long long int k = 1;
 !   unsigned long long int l = 1;

Likewise.

 while (mpz_cmp_ui (n, 1) != 0)
   {
 ***
 *** 2287,2293 

   /* Read white-space delimited items. Return 1 on success, 0 on EOF.
  Exit on I/O errors. */
 ! int
   read_item (struct inbuf *bufstruct)
   {
 int c;
 --- 2288,2294 

   /* Read white-space delimited items. Return 1 on success, 0 on EOF.
  Exit on I/O errors. */
 ! static int

I didn't bother with this one, because in coreutils
I've just switched back to using readtokens to read input.

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-17 Thread Torbjorn Granlund

 #endif
 #define LONGLONG_STANDALONE /* Don't require GMP's longlong.h mdep 
files */
 #define ASSERT(x)   /* FIXME make longlong.h really 
standalone */
   + #define __GMP_DECLSPEC  /* FIXME make longlong.h really 
standalone */
  
  Oh, that must be used in an #else branch that I'm not exercising.
  It triggers a symbol defined but never used warning.
  Thanks.
  
__GMP_DECLSPEC is used from longlong.h when count_leading_zeros ask for
__clz_tab.

 #define __clz_tab factor_clz_tab /* Rename to avoid glibc collision */
  
  BTW, why does __clz_tab use the __ prefix in the first place?
  
It makes sense in glibc.  In GMP we prepend something like __gmp_.  Here
in a non-library we prepend factor_.  I'd like to keep the __ for src
compatibility.

 # endif
 # include longlong.h
 # ifdef COUNT_LEADING_ZEROS_NEED_CLZ_TAB
   ! const unsigned char factor_clz_tab[129] =
 {
   1,2,3,3,4,4,4,4,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
   7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
   --- 144,150 
 #endif
 #include longlong.h
 #ifdef COUNT_LEADING_ZEROS_NEED_CLZ_TAB
   ! static const unsigned char factor_clz_tab[129] =
  
  Thanks.  Done.
  
Actually, that patch is wrong.  :-(

In longlong.h, we declare it without static.  This causes a type clash.

I suggest that we kep clz_tab non-static.  Do you have any other
suggestions?

   while (mpz_cmp_ui (n, 1) != 0)
 {
   ***
   *** 2287,2293 
  
 /* Read white-space delimited items. Return 1 on success, 0 on EOF.
Exit on I/O errors. */
   ! int
 read_item (struct inbuf *bufstruct)
 {
   int c;
   --- 2288,2294 
  
 /* Read white-space delimited items. Return 1 on success, 0 on EOF.
Exit on I/O errors. */
   ! static int
  
  I didn't bother with this one, because in coreutils
  I've just switched back to using readtokens to read input.
  
Ok.  Hopefully, that does not cause too much slowdown for large ranges.
With this fast factoring code, input is a significant part of the time!

I have a variant of input the code, which replaces all the input code
with a single function which freads 4 KiB blocks and plays with
sentinels.  Here is its non-GMP part.  I left this code out for a 20%
performance penalty.

void
read_stdin_and_factor ()
{
  uintmax_t n1, n0;
  unsigned int carry;
  unsigned int c, tc;
  char *bp, *be;
  int valid_char_seen;
  size_t nread;
  enum { BUFSIZE = 4096 };
  char buf[BUFSIZE + 1];

  buf[0] = '\377';  /* sentinel */
  bp = buf;
  be = buf;

 restart:
  n0 = 0;
  valid_char_seen = 0;

#ifndef OPTIMISE_SINGLE_WORD
#define OPTIMISE_SINGLE_WORD
#endif

#if OPTIMISE_SINGLE_WORD
  /* Loop while we have one word */
  while (LIKELY (n0  (~(uintmax_t) 0 - 9) / 10))
{
  c = (unsigned char) *bp++;
  tc = c - '0';

  if (UNLIKELY (tc = 10))
{
  if (bp  be)  /* did we hit the sentinel? */
{
  nread = fread (buf, 1, BUFSIZE, stdin);
  if (nread == 0)
{
  if (valid_char_seen)
{
  factor_and_report (0, n0);
}
  return;
}

  buf[nread] = '\377';  /* sentinel */
  bp = buf;
  be = buf + nread;
  continue;
}
  else
{
  if (valid_char_seen)
{
  factor_and_report (0, n0);
  valid_char_seen = 0;
}
  if (! isspace (c))
{
  fprintf (stderr, `%c' is not a valid positive integer\n, c);
}
}
  goto restart;
}

  /* got valid digit */
  n0 = 10 * n0 + tc;
  valid_char_seen = 1;
}
#endif

  n1 = 0;

  /* Loop while we have two words */
  for (;;)
{
  c = (unsigned char) *bp++;
  tc = c - '0';

  if (UNLIKELY (tc = 10))
{
  if (bp  be)  /* did we hit the sentinel? */
{
  nread = fread (buf, 1, BUFSIZE, stdin);
  if (nread == 0)
{
  if (valid_char_seen)
{
  factor_and_report (n1, n0);
}
  return;
}

  buf[nread] = '\377';  /* sentinel */
  bp = buf;
  be = buf + nread;
  continue;
}
  else
{
  if (valid_char_seen)
{
  factor_and_report (n1, n0);
  valid_char_seen = 0;
}
  if (! isspace (c))
{
  fprintf (stderr, `%c' is not a valid positive integer\n, c);
}
}
  goto restart;
}

  /* got valid

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-16 Thread Jim Meyering

Jim Meyering wrote:
...
 Here are some more suggested changes.
 Sorry about the terse commit logs.
 The changes are mostly stylistic.

 changeset:   121:80954440c618

Hi Torbjorn,

I've begun inserting your factor.c into coreutils.
That enables a lot more warnings, and I've made a few
changes, beginning to accommodate them.  Unfortunately,
I've also converted TABs to spaces, so I'll let you
compute your own diffs (presumably with -b).  Included below:

Would you mind changing the names of a few variables
or adjusting declarations to avoid some -Wshadow warnings?

I changed the innermost r to rem locally, but there are
others.  Also, S.

  make  all-recursive
  make[1]: Entering directory `/h/j/w/co/cu'
  Making all in po
  make[2]: Entering directory `/h/j/w/co/cu/po'
  make[2]: Leaving directory `/h/j/w/co/cu/po'
  Making all in .
  make[2]: Entering directory `/h/j/w/co/cu'
CC   src/factor.o
  src/factor.c: In function 'factor_using_squfof':
  src/factor.c:1896:17: error: declaration of 'S' shadows a previous local 
[-Werror=shadow]
 uintmax_t S, Dh, Dl, Q1, Q, P, L, L1, B;
   ^
  src/factor.c:1860:13: error: shadowed declaration is here [-Werror=shadow]
 uintmax_t S;
   ^
  src/factor.c:1987:25: error: declaration of 'r' shadows a previous local 
[-Werror=shadow]
 uintmax_t r = is_square (Q);
   ^
  src/factor.c:1945:31: error: shadowed declaration is here [-Werror=shadow]
 uintmax_t q, P1, t, r;
 ^
  src/factor.c:2037:33: error: declaration of 'r' shadows a previous local 
[-Werror=shadow]
 uintmax_t r;
   ^
  src/factor.c:1987:25: error: shadowed declaration is here [-Werror=shadow]
 uintmax_t r = is_square (Q);
   ^
  src/factor.c: At top level:
  src/factor.c:2291:1: error: no previous prototype for 'read_item' 
[-Werror=missing-prototypes]
   read_item (struct inbuf *bufstruct)
   ^
  cc1: all warnings being treated as errors
  make[2]: *** [src/factor.o] Error 1
  make[2]: Leaving directory `/h/j/w/co/cu'
  make[1]: *** [all-recursive] Error 1
  make[1]: Leaving directory `/h/j/w/co/cu'
  make: *** [all] Error 2

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-15 Thread Jim Meyering

Torbjorn Granlund wrote:

We won't be sending any more code replacement blobs to this address; it
is most surely the wrong place.

   I guess you're saying that because there's been too little feedback?

 Not really.  I suppose I'm sending a code replacement so that it get
 appended to a bug report.  My initial post was infact a bug report, but
 that was 30 posts ago...

   IMHO, this is great work.

 Thanks.

   You may consider it accepted.  That was clear in my mind from
   the beginning.  Sorry if I didn't make that clear to you.
   Now, it's just a matter of integrating it.

 Good.  I would certainly accept rejection, or partial rejection.  I
 suppose the code is a fair bit more complex than one might perhaps
 expect, but we don't know how to make it neater without a large
 performance penalty.  But we can make it much more complex if desirable.
 :-)

 I think the ugliest part is the interface to longlong.h.  I don't know
 how to make that better, though.

   Here are some suggested changes -- I made these against
   a temporary local git repository using your -005 tarball.
   That was before I learned (just now) that you have a mercurial
   repository.

Here are some more suggested changes.
Sorry about the terse commit logs.
The changes are mostly stylistic.

changeset:   121:80954440c618
tag: tip
user:Jim Meyering meyer...@redhat.com
date:Sat Sep 15 22:37:19 2012 +0200
files:   factor.c
description:
use unsigned long int consistently, not unsigned long


 factor.c |  40 
 1 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/factor.c b/factor.c
--- a/factor.c
+++ b/factor.c
@@ -119,7 +119,7 @@
 #else
 typedef unsigned char UQItype;
 typedef long SItype;
-typedef unsigned long USItype;
+typedef unsigned long int USItype;
 #if HAVE_LONG_LONG
 typedeflong long int DItype;
 typedef unsigned long long int UDItype;
@@ -178,8 +178,8 @@
 struct mp_factors
 {
   mpz_t *p;
-  unsigned long *e;
-  unsigned long nfactors;
+  unsigned long int *e;
+  unsigned long int nfactors;
 };
 #endif

@@ -189,7 +189,7 @@
 #define umul_ppmm(w1, w0, u, v)
\
   do { \
 uintmax_t __x0, __x1, __x2, __x3;  \
-unsigned long __ul, __vl, __uh, __vh;  \
+unsigned long int __ul, __vl, __uh, __vh;  \
 uintmax_t __u = (u), __v = (v);\
\
 __ul = __ll_lowpart (__u); \
@@ -530,9 +530,9 @@
 static void
 mp_factor_insert (struct mp_factors *factors, mpz_t prime)
 {
-  unsigned long nfactors = factors-nfactors;
+  unsigned long int nfactors = factors-nfactors;
   mpz_t *p  = factors-p;
-  unsigned long *e  = factors-e;
+  unsigned long int *e  = factors-e;
   long i;

   /* Locate position for insert new or increment e.  */
@@ -567,7 +567,7 @@
 }

 static void
-mp_factor_insert_ui (struct mp_factors *factors, unsigned long prime)
+mp_factor_insert_ui (struct mp_factors *factors, unsigned long int prime)
 {
   mpz_t pz;

@@ -1367,13 +1367,13 @@
 #endif

 static void
-factor_using_pollard_rho (uintmax_t n, unsigned long a, struct factors 
*factors)
+factor_using_pollard_rho (uintmax_t n, unsigned long int a,
+ struct factors *factors)
 {
   uintmax_t x, z, y, P, t, ni, g;
-  unsigned long k, l;

-  k = 1;
-  l = 1;
+  unsigned long int k = 1;
+  unsigned long int l = 1;

   redcify (P, 1, n);
   addmod (x, P, P, n); /* i.e., redcify(2) */
@@ -1407,7 +1407,7 @@
  z = x;
  k = l;
  l = 2 * l;
- for (unsigned long i = 0; i  k; i++)
+ for (unsigned long int i = 0; i  k; i++)
{
  x = mulredc (x, x, n, ni);
  addmod (x, x, a, n);
@@ -1446,14 +1446,13 @@
 }

 static void
-factor_using_pollard_rho2 (uintmax_t n1, uintmax_t n0, unsigned long a,
+factor_using_pollard_rho2 (uintmax_t n1, uintmax_t n0, unsigned long int a,
   struct factors *factors)
 {
   uintmax_t x1, x0, z1, z0, y1, y0, P1, P0, t1, t0, ni, g1, g0, r1m;
-  unsigned long k, l;

-  k = 1;
-  l = 1;
+  unsigned long int k = 1;
+  unsigned long int l = 1;

   redcify2 (P1, P0, 1, n1, n0);
   addmod2 (x1, x0, P1, P0, P1, P0, n1, n0); /* i.e., redcify(2) */
@@ -1489,7 +1488,7 @@
  z1 = x1; z0 = x0;
  k = l;
  l = 2 * l;
- for (unsigned long i = 0; i  k; i++)
+ for (unsigned long int i = 0; i  k; i++)
{
  x0 = mulredc2 (r1m, x1, x0, x1, x0, n1, n0, ni);
  x1 = r1m;
@@ -1562,11 +1561,12 @@

 #if HAVE_GMP
 static void
-mp_factor_using_pollard_rho (mpz_t n, unsigned long a, struct mp_factors 
*factors)

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-14 Thread Torbjorn Granlund

I merged your changes, and made several analogous changes.
The code now passes strict compilation with and without HAVE_GMP.

I did not merge the tests changes yet.

Please grab our version from our repo.

If your repo is public, please let me now how to access it.  Else,
please either send a diff -c between your version and our repo version,
or your full file.

-- 
Torbjörn

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-14 Thread Jim Meyering

Torbjorn Granlund wrote:

 I merged your changes, and made several analogous changes.
 The code now passes strict compilation with and without HAVE_GMP.

 I did not merge the tests changes yet.

 Please grab our version from our repo.

 If your repo is public, please let me now how to access it.  Else,
 please either send a diff -c between your version and our repo version,
 or your full file.

Thanks for finishing the job!
Regarding the full file, I presume you mean the Makefile,
since you've integrated all of my changes in factor.c.
Included below.

Though note that I've replaced the use of ./ourseq with simply seq,
which depends on your having the very latest version of seq.c from
coreutils v8.19-129-g77f89d0 or newer built and first in your path.

With that Makefile, you can remove the entire tests/ directory.

# Developement Makefile for the NT factor project

# Copyright 2012 Free Software Foundation, Inc.

# This program is free software; you can redistribute it and/or modify it under
# the terms of the GNU General Public License as published by the Free Software
# Foundation; either version 3 of the License, or (at your option) any later
# version.

# This program is distributed in the hope that it will be useful, but WITHOUT
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
# FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more
# details.

# You should have received a copy of the GNU General Public License along with
# this program.  If not, see http://www.gnu.org/licenses/.  */


CC = gcc
CFLAGS = -std=gnu99 -O2 -g -Werror -W -Wall -Wno-unused-but-set-variable

all: factor make-prime-list

factor: factor.o

factor.o: factor.c primes.h

primes.h: make-prime-list
./make-prime-list 5000 primes.h

%: %.c
.PRECIOUS: %.o

%.o: %.c
$(CC) $(CFLAGS) -c $ -o $@

%: %.o
$(CC) $(CFLAGS) $(LDFLAGS) $^ -o $@

clean:
rm -f make-prime-list factor primes.h ourseq *.o

# Use make check CHECK_FLAGS=-s to check squfof
CHECK_FLAGS=

p = 1844674407370
q = 792281625142643375935438

args = $(word 2,$(subst -, ,$@)) $(word 3,$(subst -, ,$@))
tests = \
  t-0-1000-a451244522b1b662c86cb3cbb55aee3e085a61a0 \
  t-1000-2000-c792a2e02f1c8536b5121f624b04039d20187016 \
  t-2000-3000-8115e8dff97d1674134ec054598d939a2a5f6113 \
  t-3000-4000-fe7b832c8e0ed55035152c0f9ebd59de73224a60 \
  t-4000-5000-b8786d66c432e48bc5b342ee3c6752b7f096f206 \
  t-5000-6000-a74fe518c5f79873c2b9016745b88b42c8fd3ede \
  t-6000-7000-689bc70d681791e5d1b8ac1316a05d0c4473d6db \
  t-7000-8000-d370808f2ab8c865f64c2ff909c5722db5b7d58d \
  t-8000-9000-7978aa66bf2bdb446398336ea6f02605e9a77581 \
  t-$(p)8551616-$(p)8651615-66c57cd58f4fb572df7f088d17e4f4c1d4f01bb1 \
  t-$(p)8551616-$(p)8651615-66c57cd58f4fb572df7f088d17e4f4c1d4f01bb1 \
  t-$(p)8651616-$(p)8751615-729228e693b1a568ecc85b199927424c7d16d410 \
  t-$(p)8751616-$(p)8851615-5a0c985017c2d285e4698f836f5a059e0b684563 \
  t-$(p)8851616-$(p)8951615-0482295c514e371c98ce9fd335deed0c9c44a4f4 \
  t-$(p)8951616-$(p)9051615-9c0e1105ac7c45e27e7bbeb5e213f530d2ad1a71 \
  t-$(p)9051616-$(p)9151615-604366d2b1d75371d0679e6a68962d66336cd383 \
  t-$(p)9151616-$(p)9251615-9192d2bdee930135b28d7160e6d395a7027871da \
  t-$(p)9251616-$(p)9351615-bcf56ae55d20d700690cff4d3327b78f83fc01bf \
  t-$(p)9351616-$(p)9451615-16b106398749e5f24d278ba7c58229ae43f650ac \
  t-$(p)9451616-$(p)9551615-ad2c6ed63525f8e7c83c4c416e7715fa1bebc54c \
  t-$(p)9551616-$(p)9651615-2b6f9c11742d9de045515a6627c27a042c49f8ba \
  t-$(p)9651616-$(p)9751615-54851acd51c4819beb666e26bc0100dc9adbc310 \
  t-$(p)9751616-$(p)9851615-6939c2a7afd2d81f45f818a159b7c5226f83a50b \
  t-$(p)9851616-$(p)9951615-0f2c8bc011d2a45e2afa01459391e68873363c6c \
  t-$(p)9951616-18446744073710051615-630dc2ad72f4c222bad1405e6c5bea590f92a98c \
  t-$(q)50336-$(q)60335-51ccb201e35599d545cb942e2bb31aba5bce4fc5

$(tests): factor ourseq
@echo '$(lastword $(subst -, ,$@))  -'  exp.$@
@echo $(args)
@seq $(args) | ./factor $(CHECK_FLAGS) | shasum -c --status exp.$@
@rm exp.$@

check: $(tests) factor ourseq

ver = `cat ver`
dist:
mkdir nt-factor-$(ver)
cp -pr factor.c ChangeLog README Makefile make-prime-list.c ourseq.c 
longlong.h tests nt-factor-$(ver)
tar cf - nt-factor-$(ver) | lzip nt-factor-$(ver).tar.lz
rm -rf nt-factor-$(ver)
./incr ver

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-14 Thread Jim Meyering

Torbjorn Granlund wrote:

 I merged your changes, and made several analogous changes.
 The code now passes strict compilation with and without HAVE_GMP.

 I did not merge the tests changes yet.

 Please grab our version from our repo.

 If your repo is public, please let me now how to access it.  Else,
 please either send a diff -c between your version and our repo version,
 or your full file.

Oh, if you don't mind making some simple, automated changes,
I'd appreciate if you were to expand all TABs in these files:

  factor.c
  longlong.h
  make-prime-list.c
  ourseq.c

And there remain a few longer-than-79 lines:

$ wc -L *.[ch]
83 factor.c
88 longlong.h
79 make-prime-list.c
79 ourseq.c
88 total

Can you split them?

Thanks,

Jim

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-14 Thread Torbjorn Granlund

Jim Meyering j...@meyering.net writes:

  Regarding the full file, I presume you mean the Makefile,
  since you've integrated all of my changes in factor.c.
  Included below.
  
No, I mean factor.c.  I integrated the changes manually, so might have
missed something.

  Though note that I've replaced the use of ./ourseq with simply seq,
  which depends on your having the very latest version of seq.c from
  coreutils v8.19-129-g77f89d0 or newer built and first in your path.
  
  With that Makefile, you can remove the entire tests/ directory.
  
I think I stick to our current organisation for now.  The important file
to keep in synch is factor.c

I suppose this is a good time to take care of paperwork.
Will you ping assign@xxx or should I?

-- 
Torbjörn

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-14 Thread Torbjorn Granlund

Jim Meyering j...@meyering.net writes:

  Torbjorn Granlund wrote:
  
   I merged your changes, and made several analogous changes.
   The code now passes strict compilation with and without HAVE_GMP.
  
   I did not merge the tests changes yet.
  
   Please grab our version from our repo.
  
   If your repo is public, please let me now how to access it.  Else,
   please either send a diff -c between your version and our repo version,
   or your full file.
  
  Oh, if you don't mind making some simple, automated changes,
  I'd appreciate if you were to expand all TABs in these files:
  
factor.c
longlong.h
make-prime-list.c
ourseq.c
  
I would advice against making such changes to longlong.h, since it might
make sense to keep it equal to the GMP version.  We don't plan to stop
using TAB in GMP.

But feel free to make any changes to the coreutils versions of any of
these files; we then simply need to pass -b to diff.

  And there remain a few longer-than-79 lines:
  
  $ wc -L *.[ch]
  83 factor.c
  88 longlong.h
  79 make-prime-list.c
  79 ourseq.c
  88 total
  
  Can you split them?
  
Same goes for this type of changes.  

I don't see any lines that makes the program hard to read.

-- 
Torbjörn

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-13 Thread Torbjorn Granlund

We won't be sending any more code replacement blobs to this address; it
is most surely the wrong place.

Please get our suggested factor.c replacement from
http://gmplib.org:8000/factoring/.

I plan to spend no more time on this project now.  Should the
contribution be accepted, I will make the necessary amendments to the
GNU copyright paperwork.  I am certainly willing to answer questions
about the code, of course.

-- 
Torbjörn

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-13 Thread Jim Meyering

Torbjorn Granlund wrote:
 We won't be sending any more code replacement blobs to this address; it
 is most surely the wrong place.

Hi Torbjorn,

I guess you're saying that because there's been too little feedback?
IMHO, this is great work.
I've been reviewing the latest and had prepared several patches.
Just hadn't made time to send them.

 Please get our suggested factor.c replacement from
 http://gmplib.org:8000/factoring/.

 I plan to spend no more time on this project now.  Should the
 contribution be accepted, I will make the necessary amendments to the

You may consider it accepted.  That was clear in my mind from
the beginning.  Sorry if I didn't make that clear to you.
Now, it's just a matter of integrating it.

 GNU copyright paperwork.  I am certainly willing to answer questions
 about the code, of course.

Here are some suggested changes -- I made these against
a temporary local git repository using your -005 tarball.
That was before I learned (just now) that you have a mercurial
repository.

I made the Makefile parallelization changes mostly to avoid
waiting too long when I run make check -- in the very short run.
Obviously we cannot use its GNU make features in coreutils/tests.

In coreutils, we are pretty strict on warnings, so most of these
changes are to avoid the few that remained.  I've also moved
some declarations down to be nearer their point of first
initialization.  That's another style issue in coreutils.  We
are no longer required to declare all variables at the top of
each block.

From bc5f37195abb5c0f9de7d65f5cc937ab902cd774 Mon Sep 17 00:00:00 2001
From: Jim Meyering meyer...@redhat.com
Date: Tue, 11 Sep 2012 11:59:17 +0200
Subject: [PATCH 01/12] CFLAGS: Add options.

---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 1378caf..9f84629 100644
--- a/Makefile
+++ b/Makefile
@@ -17,7 +17,7 @@


 CC = gcc
-CFLAGS = -O2 -g -Wall -Wno-unused-but-set-variable
+CFLAGS = -std=gnu99 -O2 -g -Werror -W -Wall -Wno-unused-but-set-variable

 all: factor make-prime-list

-- 
1.7.12.363.g53284de


From 6488446b1185498fed02984ac51a1068e0d3bc7c Mon Sep 17 00:00:00 2001
From: Jim Meyering meyer...@redhat.com
Date: Tue, 11 Sep 2012 11:55:30 +0200
Subject: [PATCH 02/12] s/const static/static const/

---
 factor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/factor.c b/factor.c
index abca67c..98e298c 100644
--- a/factor.c
+++ b/factor.c
@@ -793,7 +793,7 @@ mp_factor_using_division (mpz_t t, struct mp_factors 
*factors)
 #endif

 /* Entry i contains (2i+1)^(-1) mod 2^8.  */
-const static unsigned char  binvert_table[128] =
+static const unsigned char  binvert_table[128] =
 {
   0x01, 0xAB, 0xCD, 0xB7, 0x39, 0xA3, 0xC5, 0xEF,
   0xF1, 0x1B, 0x3D, 0xA7, 0x29, 0x13, 0x35, 0xDF,
-- 
1.7.12.363.g53284de


From d2b9e92f883f7817592d2cdbe338866dc9e94d49 Mon Sep 17 00:00:00 2001
From: Jim Meyering meyer...@redhat.com
Date: Tue, 11 Sep 2012 12:00:00 +0200
Subject: [PATCH 03/12] prime_2p: make r and k unsigned, and

millerrabin: make k unsigned; move decl of i into for stmt.
---
 factor.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/factor.c b/factor.c
index 98e298c..0c2999b 100644
--- a/factor.c
+++ b/factor.c
@@ -1038,7 +1038,6 @@ powm2 (uintmax_t *r1m,
 int
 millerrabin (uintmax_t n, uintmax_t ni, uintmax_t b, uintmax_t q, unsigned int 
k, uintmax_t one)
 {
-  unsigned int i;
   uintmax_t y, nm1;

   y = powm (b, q, n, ni, one);
@@ -1048,7 +1047,7 @@ millerrabin (uintmax_t n, uintmax_t ni, uintmax_t b, 
uintmax_t q, unsigned int k
   if (y == one || y == nm1)
 return 1;

-  for (i = 1; i  k; i++)
+  for (unsigned int i = 1; i  k; i++)
 {
   y = mulredc (y, y, n, ni);

@@ -1063,9 +1062,8 @@ millerrabin (uintmax_t n, uintmax_t ni, uintmax_t b, 
uintmax_t q, unsigned int k
 int
 millerrabin2 (const uintmax_t *np, uintmax_t ni,
  const uintmax_t *bp, const uintmax_t *qp,
- int k, const uintmax_t *one)
+ unsigned int k, const uintmax_t *one)
 {
-  unsigned int i;
   uintmax_t y1, y0, nm1_1, nm1_0, r1m;

   y0 = powm2 (r1m, bp, qp, np, ni, one);
@@ -1079,7 +1077,7 @@ millerrabin2 (const uintmax_t *np, uintmax_t ni,
   if (y0 == nm1_0  y1 == nm1_1)
 return 1;

-  for (i = 1; i  k; i++)
+  for (unsigned int i = 1; i  k; i++)
 {
   y0 = mulredc2 (r1m, y1, y0, y1, y0, np[1], np[0], ni);
   y1 = r1m;
@@ -1206,7 +1204,7 @@ prime2_p (uintmax_t n1, uintmax_t n0)
   uintmax_t one[2];
   uintmax_t na[2];
   uintmax_t ni;
-  int k, r;
+  unsigned int k, r;
   struct factors factors;

   if (n1 == 0)
-- 
1.7.12.363.g53284de


From 97d8c18730bd0dbed254ea64f29416c662cf772d Mon Sep 17 00:00:00 2001
From: Jim Meyering meyer...@redhat.com
Date: Tue, 11 Sep 2012 12:25:51 +0200
Subject: [PATCH 04/12] move index decl into for stmt

---
 factor.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/factor.c b/factor.c
index 0c2999b..0ef3667

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-13 Thread Jim Meyering

Jim Meyering wrote:
 Torbjorn Granlund wrote:
 We won't be sending any more code replacement blobs to this address; it
 is most surely the wrong place.

 Hi Torbjorn,

 I guess you're saying that because there's been too little feedback?
 IMHO, this is great work.
 I've been reviewing the latest and had prepared several patches.
 Just hadn't made time to send them.

 Please get our suggested factor.c replacement from
 http://gmplib.org:8000/factoring/.

 I plan to spend no more time on this project now.  Should the
 contribution be accepted, I will make the necessary amendments to the

 You may consider it accepted.  That was clear in my mind from
 the beginning.  Sorry if I didn't make that clear to you.
 Now, it's just a matter of integrating it.

 GNU copyright paperwork.  I am certainly willing to answer questions
 about the code, of course.

 Here are some suggested changes -- I made these against
 a temporary local git repository using your -005 tarball.
 That was before I learned (just now) that you have a mercurial
 repository.

Here's a change without which ourseq misuse clobbers the heap:

$ valgrind ./ourseq 999 1
==7387== Memcheck, a memory error detector
==7387== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==7387== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==7387== Command: ./ourseq 999 1
==7387==
==7387== Invalid write of size 8
==7387==at 0x4A0A0CB: memcpy@@GLIBC_2.14 (mc_replace_strmem.c:837)
==7387==by 0x4006CC: main (ourseq.c:74)
==7387==  Address 0x4c35040 is 0 bytes inside a block of size 3 alloc'd
==7387==at 0x4A0884D: malloc (vg_replace_malloc.c:263)
==7387==by 0x4006AD: main (ourseq.c:71)
==7387==
first string greater than second string
==7387==
==7387== HEAP SUMMARY:
==7387== in use at exit: 5 bytes in 2 blocks
==7387==   total heap usage: 2 allocs, 0 frees, 5 bytes allocated
==7387==
==7387== LEAK SUMMARY:
==7387==definitely lost: 0 bytes in 0 blocks
==7387==indirectly lost: 0 bytes in 0 blocks
==7387==  possibly lost: 0 bytes in 0 blocks
==7387==still reachable: 5 bytes in 2 blocks
==7387== suppressed: 0 bytes in 0 blocks
==7387== Rerun with --leak-check=full to see details of leaked memory
==7387==
==7387== For counts of detected and suppressed errors, rerun with: -v
==7387== ERROR SUMMARY: 3 errors from 1 contexts (suppressed: 2 from 2)


From 2fe3143867a3a2f0b3f4d0ff71c8bfca9c676127 Mon Sep 17 00:00:00 2001
From: Jim Meyering meyer...@redhat.com
Date: Thu, 13 Sep 2012 13:27:48 +0200
Subject: [PATCH] bug fix: ourseq would clobber heap for some out-of-order
 args

* ourseq.c (main): Don't clobber heap for argv[1] longer than argv[2].
Also declare functions static and some parameters const.
---
 ChangeLog |  6 ++
 ourseq.c  | 13 +
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index d71a8c3..b009136 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2012-09-13  Jim Meyering  meyer...@redhat.com
+
+   bug fix: ourseq would clobber heap for some out-of-order args
+   * ourseq.c (main): Don't clobber heap for argv[1] longer than argv[2].
+   Also declare functions static and some parameters const.
+
 2012-09-10  Torbjorn Granlund  t...@gmplib.org

* factor.c (mp_prime_p): Clear out `factors' only after Lucas run.
diff --git a/ourseq.c b/ourseq.c
index cb71f13..a9899f9 100644
--- a/ourseq.c
+++ b/ourseq.c
@@ -1,5 +1,5 @@
 /* A simple seq program that operates directly on the numeric strings.
-   This works around strange limits/bugs in standards seq implementations.  */
+   This works around strange limits/bugs in standard seq implementations.  */

 #include stdlib.h
 #include string.h
@@ -13,8 +13,8 @@ struct string

 typedef struct string string;

-int
-cmp (string *s1, string *s2)
+static int
+cmp (string const *s1, string const *s2)
 {
   size_t len1, len2;

@@ -29,7 +29,7 @@ cmp (string *s1, string *s2)
   return strcmp (s1-str, s2-str);
 }

-void
+static void
 incr (string *st)
 {
   size_t len;
@@ -67,6 +67,11 @@ main (int argc, char **argv)

   len1 = strlen (argv[1]);
   len2 = strlen (argv[2]);
+  if (len2  len1)
+{
+  fprintf (stderr, first string greater than second string\n);
+  exit (1);
+}

   b.str = malloc (len2 + 2);   /* not a typo for len1 */
   e.str = malloc (len2 + 1);
--
1.7.12.363.g53284de

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-13 Thread Torbjorn Granlund


   We won't be sending any more code replacement blobs to this address; it
   is most surely the wrong place.
  
  I guess you're saying that because there's been too little feedback?

Not really.  I suppose I'm sending a code replacement so that it get
appended to a bug report.  My initial post was infact a bug report, but
that was 30 posts ago...

  IMHO, this is great work.

Thanks.

  You may consider it accepted.  That was clear in my mind from
  the beginning.  Sorry if I didn't make that clear to you.
  Now, it's just a matter of integrating it.
  
Good.  I would certainly accept rejection, or partial rejection.  I
suppose the code is a fair bit more complex than one might perhaps
expect, but we don't know how to make it neater without a large
performance penalty.  But we can make it much more complex if desirable.
:-)

I think the ugliest part is the interface to longlong.h.  I don't know
how to make that better, though.

  Here are some suggested changes -- I made these against
  a temporary local git repository using your -005 tarball.
  That was before I learned (just now) that you have a mercurial
  repository.
  
I'll take a look within a few days.

-- 
Torbjörn

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-13 Thread Torbjorn Granlund


  * ourseq.c (main): Don't clobber heap for argv[1] longer than argv[2].
  Also declare functions static and some parameters const.

I put this into our repo, and also tacked on a GPL header to make sure
nobody would mistake this for non-free software.

-- 
Torbjörn

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-10 Thread Torbjorn Granlund

Torbjorn Granlund t...@gmplib.org writes:

  We found a bug causing performance problems for numbers between 2^64 and
  2^127.  It could also trigger asserts with factoring numbers close to
  2^127.
  
  The new version is attached.

Another bug found, this time related to the GMP code.  It would clear
out an uninitialised structure with prime proving disabled.

Here ia a fixed version:



nt-factor-005.tar.lz
Description: Binary data

-- 
Torbjörn

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-07 Thread Jim Meyering

Jim Meyering wrote:
 Jim Meyering wrote:

 Torbjorn Granlund wrote:
 The very old factoring code cut from an now obsolete version GMP does
 not pass proper arguments to the mpz_probab_prime_p function.  It ask
 for 3 Miller-Rabin tests only, which is not sufficient.

 Hi Torbjorn

 Thank you for the patch and explanation.
 I've converted that into the commit below in your name.
 Please proofread it and let me know if you'd like to change anything.
 I tweaked the patch to change MR_REPS from a #define to an enum
 and to add the comment just preceding.

 I'll add NEWS and tests separately.
 ...
 From: Torbjorn Granlund t...@gmplib.org
 Date: Tue, 4 Sep 2012 16:22:47 +0200
 Subject: [PATCH] factor: don't ever declare composites to be prime

 Torbjörn, I've just noticed that I misspelled your name above.

 Here's the NEWS/tests addition.
 Following is an adjusted commit that spells your name properly.

From e561ff991b74dc19f6728aa1e6e61d1927055ac1 Mon Sep 17 00:00:00 2001

There have been enough changes (mostly typo fixes) that I'm re-posting
these for review before I push.  Also, I added this sentence to NEWS
about the performance hit, too

The fix makes factor somewhat slower (~25%) for ranges of consecutive
numbers, and up to 8 times slower for some worst-case individual numbers.


From 68cf62bb04ecd138c81b68539c2a065250ca4390 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Torbj=C3=B6rn=20Granlund?= t...@gmplib.org
Date: Tue, 4 Sep 2012 18:38:29 +0200
Subject: [PATCH 1/2] factor: don't ever declare composites to be prime

The multiple-precision factoring code (with HAVE_GMP) was copied from
a now-obsolete version of GMP that did not pass proper arguments to
the mpz_probab_prime_p function.  It makes that code perform no more
than 3 Miller-Rabin tests only, which is not sufficient.

A Miller-Rabin test will detect composites with at least a probability
of 3/4.  For a uniform random composite, the probability will actually
be much higher.

Or put another way, of the N-3 possible Miller-Rabin tests for checking
the composite N, there is no number N for which more than (N-3)/4 of the
tests will fail to detect the number as a composite.  For most numbers N
the number of false witnesses will be much, much lower.

Problem numbers are of the form N=pq, p,q prime and (p-1)/(q-1) = s,
where s is a small integer.  (There are other problem forms too,
involving 3 or more prime factors.)  When s = 2, we get the 3/4 factor.

It is easy to find numbers of that form that cause coreutils' factor to
fail:

  465658903
  2242724851
  6635692801
  17709149503
  17754345703
  20889169003
  42743470771
  54890944111
  72047131003
  85862644003
  98275842811
  114654168091
  117225546301
  ...

There are 9008992 composites of the form with s=2 below 2^64.  With 3
Miller-Rabin tests, one would expect about 9008992/64 = 140766 to be
invalidly recognized as primes in that range.

* src/factor.c (MR_REPS): Define to 25.
(factor_using_pollard_rho): Use MR_REPS, not 3.
(print_factors_multi): Likewise.
* THANKS.in: Remove my name, now that it will be automatically
included in the generated THANKS file.
---
 THANKS.in| 1 -
 src/factor.c | 9 ++---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/THANKS.in b/THANKS.in
index 1580151..2c3f83c 100644
--- a/THANKS.in
+++ b/THANKS.in
@@ -608,7 +608,6 @@ Tony Leneis t...@plaza.ds.adp.com
 Tony Robinson   a...@eng.cam.ac.uk
 Toomas Soometoomas.so...@elion.ee
 Toralf Förster  toralf.foers...@gmx.de
-Torbjorn Granlund   t...@nada.kth.se
 Torbjorn Lindgren   t...@funcom.no
 Torsten Landschoff  tors...@pclab.ifg.uni-kiel.de
 Travis Gummels  tgumm...@redhat.com
diff --git a/src/factor.c b/src/factor.c
index 1d55805..e63e0e0 100644
--- a/src/factor.c
+++ b/src/factor.c
@@ -153,6 +153,9 @@ factor_using_division (mpz_t t, unsigned int limit)
   mpz_clear (r);
 }

+/* The number of Miller-Rabin tests we require.  */
+enum { MR_REPS = 25 };
+
 static void
 factor_using_pollard_rho (mpz_t n, int a_int)
 {
@@ -222,7 +225,7 @@ S4:

   mpz_div (n, n, g);   /* divide by g, before g is overwritten */

-  if (!mpz_probab_prime_p (g, 3))
+  if (!mpz_probab_prime_p (g, MR_REPS))
 {
   do
 {
@@ -242,7 +245,7 @@ S4:
   mpz_mod (x, x, n);
   mpz_mod (x1, x1, n);
   mpz_mod (y, y, n);
-  if (mpz_probab_prime_p (n, 3))
+  if (mpz_probab_prime_p (n, MR_REPS))
 {
   emit_factor (n);
   break;
@@ -411,7 +414,7 @@ print_factors_multi (mpz_t t)
   if (mpz_cmp_ui (t, 1) != 0)
 {
   debug ([is number prime?] );
-  if (mpz_probab_prime_p (t, 3))
+  if (mpz_probab_prime_p (t, MR_REPS))
 emit_factor (t);
   else
 factor_using_pollard_rho (t, 1);
--
1.7.12.176.g3fc0e4c


From

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-07 Thread Torbjorn Granlund

Paul Eggert egg...@cs.ucla.edu writes:

  On 09/06/2012 02:33 PM, Jim Meyering wrote:
* We have some hardwired W_TYPE_SIZE settings for the code interfacing
  to longlong.h.  It is now 64 bits.  It will break on systems where
  uintmax_t is not a 64-bit type.  Please see the beginning of
  factor.c.
   I wonder how many types of systems would be affected.
  
  It's only a matter of time.  GCC already supports 128-bit
  integers on my everyday host (Fedora 17, x86-64, GCC 4.7.1).
  Eventually uintmax_t will grow past 64 bits, if only for the
  crypto guys.
  
It should however be noted that uintmax_t stays at 64 bits even with
GCC's 128-bit integers.  I think the latter are declared as not being
integers, or something along those lines, to avoid the ABI-breaking
change of redefining uintmax_t.

  If the code needs exactly-64-bit unsigned integers, shouldn't
  it be using uint64_t?  That's the standard way of doing
  that sort of thing.  Gnulib can supply the type on pre-C99
  platforms.  Weird but standard-conforming platforms that
  don't have uint64_t will be out of luck, but surely they're out
  of luck anyway.
  
The code does not need any particular size of uintmax_t, except that we
need a preprocessor-time size measurement of it.  The reason for this is
longlong.h's tests of which single-line asm code to include.

The new factor program works without longlong.h, but some parts of it
will become 3-4 times slower.  To disable longlong.h, please compile
with -DUSE_LONGLONG_H=0. (The worst affected parts would probably be the
single-word Lucas code and all double-word factoring.)

I suppose that an autoconf test of the type size will be needed at least
for theoretical portability, if longlong.h is to be retained.

There is one other place where some (hypothetical) portability problems
may exist, and that's make-prime-list.c.  It prints a list of uintmax_t
literals.

We let the coreutils maintainers worry about the allowable complexity of
the factor program; I and Niels are happy to sacrifice some speed for
lowering the code complexity.  But first we will increase it by
retrofitting GMP factoring code.  :o)

-- 
Torbjörn

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-07 Thread Jim Meyering

Torbjorn Granlund wrote:
...
* We have some hardwired W_TYPE_SIZE settings for the code interfacing
  to longlong.h.  It is now 64 bits.  It will break on systems where
  uintmax_t is not a 64-bit type.  Please see the beginning of
  factor.c.

   I wonder how many types of systems would be affected.

 It is not used currently anywhere in coreutils?  Perhaps coreutils could

uintmax_t is used throughout coreutils, but nowhere (that comes to mind)
does it fail when UINTMAX_MAX happens to be different than 2^64-1.
What I was wondering is how many systems have a uintmax_t that is
only 32 bits wide.  Now that I reread, I suppose this code would be
ok (albeit slower) with uintmax_t wider than 64.

 use autoconf for checking this?  (If we're really crazy, we could speed
 the factor program by an additional 20% by using blocked input with
 e.g. fread.)

 Please take a look at the generated code for factor_using_division,
 towards the end where 8 imulq should be found (on amd64).  The code uses
 mov, imul, cmp, jbe for testing the divisibility of a prime; the branch
 is taken when the prime divides the number being factored, thus highly
 non-taken.  (I suppose we could do a better job at describing the maths,
 with some references.  This particular trick is from Division by
 invariant integers using multiplication.)

Any place you can add a reference would be most welcome.

Here's one where I'd appreciate a reference in a comment:

  #define MAGIC64 ((uint64_t) 0x0202021202030213ULL)
  #define MAGIC63 ((uint64_t) 0x0402483012450293ULL)
  #define MAGIC65 ((uint64_t) 0x218a019866014613ULL)
  #define MAGIC11 0x23b

  /* Returns the square root if the input is a square, otherwise 0. */
  static uintmax_t
  is_square (uintmax_t x)
  {
/* Uses the tests suggested by Cohen. Excludes 99% of squares before
   computing the square root. */
if (((MAGIC64  (x  63))  1)
 ((MAGIC63  (x % 63))  1)
/* Both 0 and 64 are squares mod (65) */
 ((MAGIC65  ((x % 65)  63))  1)
 ((MAGIC11  (x % 11)  1)))
  {
uintmax_t r = isqrt (x);
if (r*r == x)
  return r;
  }
return 0;
  }

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-07 Thread Niels Möller

Jim Meyering j...@meyering.net writes:

 The existing code can factor arbitrarily large numbers quickly, as long
 as they have no large prime factors.  We should retain that capability.

My understanding is that most gnu/linux distributions build coreutils
without linking to gmp. So lots of users don't get this capability.

If this is an important feature, maybe one should consider bundling
mini-gmp and use that as a fallback in case coreutils is configured
without gmp (see
http://gmplib.org:8000/gmp/file/7677276bdf92/mini-gmp/README). I would
expect it to be a constant factor (maybe 10) times slower than the real
gmp for numbers up to a few hundred bits (for larger numbers, it gets
much slower due to lack of sophisticated algorithms, but we probably
can't factor them in reasonable time anyway).

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-07 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 There is one other place where some (hypothetical) portability problems
 may exist, and that's make-prime-list.c.  It prints a list of uintmax_t
 literals.

I don't think the prime sieving is not a problem, but for each (odd)
prime p, it also computes p^{-1} mod 2^{bits} and floor ( (2^{bits} - 1)
/ p), where bits is the size of an uintmax_t. This will break cross
compilation, if uintmax_t is of different size on build and host system,
or if different suffixes (U, UL, ULL) are needed in the generated
primes.h.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-07 Thread Pádraig Brady


On 09/07/2012 09:43 AM, Niels Möller wrote:

Jim Meyeringj...@meyering.net  writes:


The existing code can factor arbitrarily large numbers quickly, as long
as they have no large prime factors.  We should retain that capability.


My understanding is that most gnu/linux distributions build coreutils
without linking to gmp. So lots of users don't get this capability.

If this is an important feature, maybe one should consider bundling
mini-gmp and use that as a fallback in case coreutils is configured
without gmp (see
http://gmplib.org:8000/gmp/file/7677276bdf92/mini-gmp/README). I would
expect it to be a constant factor (maybe 10) times slower than the real
gmp for numbers up to a few hundred bits (for larger numbers, it gets
much slower due to lack of sophisticated algorithms, but we probably
can't factor them in reasonable time anyway).


Bundling libraries is bad if one needed to update it.
The correct approach here is to file a bug against
your distro to enable gmp which is trivial matter
of adding the build and runtime dependency on gmp.

cheers,
Pádraig.

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-07 Thread Pádraig Brady


On 09/07/2012 07:19 AM, Jim Meyering wrote:

There have been enough changes (mostly typo fixes) that I'm re-posting
these for review before I push.  Also, I added this sentence to NEWS
about the performance hit, too

 The fix makes factor somewhat slower (~25%) for ranges of consecutive
 numbers, and up to 8 times slower for some worst-case individual numbers.


Thanks for collating all the tweaks.
+1

Pádraig.

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-07 Thread Niels Möller

Pádraig Brady p...@draigbrady.com writes:

 On 09/07/2012 09:43 AM, Niels Möller wrote:

 If this is an important feature, maybe one should consider bundling
 mini-gmp

 Bundling libraries is bad if one needed to update it.

mini-gmp is not an ordinary library. It's a single portable C source
file (currently around 4000 lines) implementing a subset of the GMP API,
and with performance only a few times slower than the real thing, for
small bignums. It's *intended* for bundling with applications, either
for unconditional use, or for use as a fallback if the real gmp library
is not available. It's never (I hope!) going to be installed in
/usr/lib. To me, coreutil's factor seem to be close match for what it's
intended for.

That said, mini-gmp is pretty new (I wrote most of it around last
Christmas) and I'm not aware of any application or library using it yet.
I think the guile hackers are considering using it (for the benefit of
applications which use guile as an extension language, but don't need
high performance bignums).

So if you decide to use it in coreutils, you'll be pioneers.

It *is* used in the GMP build process, for precomputing various internal
tables.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-07 Thread Pádraig Brady


On 09/07/2012 11:35 AM, Niels Möller wrote:

Pádraig Bradyp...@draigbrady.com  writes:


On 09/07/2012 09:43 AM, Niels Möller wrote:



If this is an important feature, maybe one should consider bundling
mini-gmp



Bundling libraries is bad if one needed to update it.


mini-gmp is not an ordinary library. It's a single portable C source
file (currently around 4000 lines) implementing a subset of the GMP API,
and with performance only a few times slower than the real thing, for
small bignums. It's *intended* for bundling with applications, either
for unconditional use, or for use as a fallback if the real gmp library
is not available. It's never (I hope!) going to be installed in
/usr/lib. To me, coreutil's factor seem to be close match for what it's
intended for.

That said, mini-gmp is pretty new (I wrote most of it around last
Christmas) and I'm not aware of any application or library using it yet.
I think the guile hackers are considering using it (for the benefit of
applications which use guile as an extension language, but don't need
high performance bignums).

So if you decide to use it in coreutils, you'll be pioneers.

It *is* used in the GMP build process, for precomputing various internal
tables.


I can see the need when bootstrapping,
but I'd prefer if coreutils just relied on regular GMP.

That said, I see there is some push back in debian on depending on GMP.
Note expr from coreutils also uses GMP, which may sway the decision.

thanks,
Pádraig.

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-07 Thread Paul Eggert

On 09/07/2012 03:35 AM, Niels Möller wrote:
 It's *intended* for bundling with applications, either
 for unconditional use, or for use as a fallback if the real gmp library
 is not available.

I've been looking for something like that for Emacs, since I want
Emacs to use bignums.  Do you think it'd be suitable?

One hassle I have with combining Emacs and GMP is that
Emacs wants to control how memory is allocated, and wants its
memory allocator to longjmp out if memory gets low, and GMP
is documented to not support that.  If the mini-gmp library
doesn't have this problem I'm thinking that Emacs might use
it *instead* of GMP.

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-07 Thread Torbjorn Granlund

Jim Meyering j...@meyering.net writes:

  uintmax_t is used throughout coreutils, but nowhere (that comes to mind)
  does it fail when UINTMAX_MAX happens to be different than 2^64-1.
  What I was wondering is how many systems have a uintmax_t that is
  only 32 bits wide.  Now that I reread, I suppose this code would be
  ok (albeit slower) with uintmax_t wider than 64.
  
The code with work with longlong.h iff W_TYPE_SIZE is defined to the
bitsize of uintmax_t.

  Any place you can add a reference would be most welcome.
  
I have added comments here and there.  More comments might be desirable.

  Here's one where I'd appreciate a reference in a comment:
  
#define MAGIC64 ((uint64_t) 0x0202021202030213ULL)
#define MAGIC63 ((uint64_t) 0x0402483012450293ULL)
#define MAGIC65 ((uint64_t) 0x218a019866014613ULL)
#define MAGIC11 0x23b
  
I added a comment explaining these constants.

Here is a new version of the code.  It now has GMP factoring code,
updated from the GMP demos code.



nt-factor-002.tar.lz
Description: Binary data

-- 
Torbjörn

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-07 Thread Torbjorn Granlund

I found a problem with the GMP integration.

We have a 100 byte buffer in the stdin reading code, which was adequate
before we used GMP, but now one might want to attempt to factor much
larger numbers.

We'll fix that, but not tonight.

-- 
Torbjörn

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-06 Thread Torbjorn Granlund

I and Niels now would appreciate feedback on the new factor code.

We've put the entire little project in a tar file, which is attached.
The code is also available at http://gmplib.org:8000/factoring/.

Here is the README file:

NT factor  (Niels' and Torbjörn's factor, or New Technology factor)

This is a project for producing a decent 'factor' command for GNU.
The code was written by Torbjörn Granlund and Niels Möller in Aug-Sept
2012, but parts of the code is based on preexisting GMP code.

The old factor program could handle numbers  2^64, unless GMP was
available at build time.  Without GMP, only trial division was used;
with GMP an old version of GMP's demos/factorize.c code was used,
which relies on Pollard rho for much better performance.  The old
factor program used probabilistic Miller-Rabin primes testing.

The new code can factor numbers  2^127 and does not currently make
use of GMP, not even as an option.  It uses fast trial division code,
Pollard rho, and optionally SQUFOF.  It uses the Lucas primality test
instead of a probabilistic test.

The new code is several times faster then the old code, in particular
on 32-bit hardware.  On current 64-bit machines, it is between 3 times
and 1 times faster for ranges of numbers; for 32-bit machines we
have seen 150,000 times improvement for some number range.  The
advantage for the new code is greater for larger numbers, matching
mathematical theory of algorithm efficiency.  (These numbers compare
the new code to the old GMP-less code; GMP-enabled old code is only
between 3 and 10 times slower.)

For smaller numbers, more than half the time is spent in I/O,
buffering, and conversions.  We have not tried to optimise these
parts, but instead kept them clean.


* We don't have any --help or --version options currently.

* Our packaging with separate Makefile, outseq.c and ChangeLog was
  useful during our development.  We don't expect these to be useful
  in coreutils.  In particular, the slow testing of the 'check' target
  is probably quite unsuitable for coreutils (but similar but quicker
  tests would make sense).

* The files probably needed for coreutils are:

  o factor.c -- main factoring code
  o make-prime-list.c -- primes table generator program
  o longlong.h -- arithmetic support routines (from GMP)


Technical considerations:

* Should we handle numbers = 2^127?  That would in effect mean
  merging a current version of GMP's demos/factorize.c into this
  factor.c, and put that under HAVE_GMP (like in the old factor.c).
  It should be understood that factoring such large numbers with only
  Pollard rho is not very feasible.

* We think a --verbose option would be nice, although we don't have
  one in the present version.  It would output information on
  algorithm switches and bounds reached.  Opinions?


Portability caveats:

* We rely on POSIX.1 getchar_unlocked for a performance advantage.

* We have some hardwired W_TYPE_SIZE settings for the code interfacing
  to longlong.h.  It is now 64 bits.  It will break on systems where
  uintmax_t is not a 64-bit type.  Please see the beginning of
  factor.c.


Legal caveat:

* Both Niels and Torbjörn are GNU hackers since long.  We do not
  currently have paperwork in place for coreutils contributions.  This
  will certainly be addressed.




nt-factor.tar.lz
Description: Binary data


-- 
Torbjörn

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-06 Thread Torbjorn Granlund

I and Niels now would appreciate feedback on the new factor code.

We've put the entire little project in a tar file, which is attached.
The code is also available at http://gmplib.org:8000/factoring/.

Here is the README file:

NT factor  (Niels' and Torbjörn's factor, or New Technology factor)

This is a project for producing a decent 'factor' command for GNU.
The code was written by Torbjörn Granlund and Niels Möller in Aug-Sept
2012, but parts of the code is based on preexisting GMP code.

The old factor program could handle numbers  2^64, unless GMP was
available at build time.  Without GMP, only trial division was used;
with GMP an old version of GMP's demos/factorize.c code was used,
which relies on Pollard rho for much better performance.  The old
factor program used probabilistic Miller-Rabin primes testing.

The new code can factor numbers  2^127 and does not currently make
use of GMP, not even as an option.  It uses fast trial division code,
Pollard rho, and optionally SQUFOF.  It uses the Lucas primality test
instead of a probabilistic test.

The new code is several times faster then the old code, in particular
on 32-bit hardware.  On current 64-bit machines, it is between 3 times
and 1 times faster for ranges of numbers; for 32-bit machines we
have seen 150,000 times improvement for some number range.  The
advantage for the new code is greater for larger numbers, matching
mathematical theory of algorithm efficiency.  (These numbers compare
the new code to the old GMP-less code; GMP-enabled old code is only
between 3 and 10 times slower.)

For smaller numbers, more than half the time is spent in I/O,
buffering, and conversions.  We have not tried to optimise these
parts, but instead kept them clean.


* We don't have any --help or --version options currently.

* Our packaging with separate Makefile, outseq.c and ChangeLog was
  useful during our development.  We don't expect these to be useful
  in coreutils.  In particular, the slow testing of the 'check' target
  is probably quite unsuitable for coreutils (but similar but quicker
  tests would make sense).

* The files probably needed for coreutils are:

  o factor.c -- main factoring code
  o make-prime-list.c -- primes table generator program
  o longlong.h -- arithmetic support routines (from GMP)


Technical considerations:

* Should we handle numbers = 2^127?  That would in effect mean
  merging a current version of GMP's demos/factorize.c into this
  factor.c, and put that under HAVE_GMP (like in the old factor.c).
  It should be understood that factoring such large numbers with only
  Pollard rho is not very feasible.

* We think a --verbose option would be nice, although we don't have
  one in the present version.  It would output information on
  algorithm switches and bounds reached.  Opinions?


Portability caveats:

* We rely on POSIX.1 getchar_unlocked for a performance advantage.

* We have some hardwired W_TYPE_SIZE settings for the code interfacing
  to longlong.h.  It is now 64 bits.  It will break on systems where
  uintmax_t is not a 64-bit type.  Please see the beginning of
  factor.c.


Legal caveat:

* Both Niels and Torbjörn are GNU hackers since long.  We do not
  currently have paperwork in place for coreutils contributions.  This
  will certainly be addressed.




nt-factor.tar.lz
Description: Binary data


-- 
Torbjörn

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-06 Thread Jim Meyering

Torbjorn Granlund wrote:
 I and Niels now would appreciate feedback on the new factor code.

 We've put the entire little project in a tar file, which is attached.
 The code is also available at http://gmplib.org:8000/factoring/.

Thanks a lot!  I've started looking at the code.
I was surprised to see make check fail.

$ ./ourseq 0 10  k 
 :
$ ./factor  k  
 :
0:
1:
2: 2
3: 3
4: 2 2
5: 5
6: 2 3
7: 7
8: 2 2 2
9: 3 3
zsh: abort (core dumped)  ./factor  k

That was due to unexpected input.
Poking around, I see that ourseq writes from uninitialized memory:

$ ./ourseq 9 11
9
102
112
$ ./ourseq 9 11
9
10
11
$ ./ourseq 9 11
9
10
11

The fix is to change the memmove to copy one more byte each time:
to copy the required trailing NUL.
With that, it looks like make check will pass.
It will definitely benefit from running the individual
tests in parallel ;-)

From 9e6db73344f43e828b8d716a0ea6a5842980d518 Mon Sep 17 00:00:00 2001
From: Jim Meyering meyer...@redhat.com
Date: Thu, 6 Sep 2012 22:12:41 +0200
Subject: [PATCH] incr: don't omit trailing NUL when incrementing

---
 ourseq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ourseq.c b/ourseq.c
index d2472aa..cb71f13 100644
--- a/ourseq.c
+++ b/ourseq.c
@@ -48,7 +48,7 @@ incr (string *st)
}
   s[i] = '0';
 }
-  memmove (s + 1, s, len);
+  memmove (s + 1, s, len + 1);
   s[0] = '1';
   st-len = len + 1;
 }
--
1.7.12.176.g3fc0e4c

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-06 Thread Torbjorn Granlund

Jim Meyering j...@meyering.net writes:

  zsh: abort (core dumped)  ./factor  k
  
  That was due to unexpected input.

The parsing of the new factor is probably not too bad, but the error
reporting could be better.  :o)

  --- a/ourseq.c
  +++ b/ourseq.c
  @@ -48,7 +48,7 @@ incr (string *st)
}
 s[i] = '0';
   }
  -  memmove (s + 1, s, len);
  +  memmove (s + 1, s, len + 1);
 s[0] = '1';
 st-len = len + 1;
   }

Thanks.  Culpa mea, ourseq is not my finest work, just a hack for
testing factor.
  

-- 
Torbjörn

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-06 Thread Jim Meyering

Torbjorn Granlund wrote:
 I and Niels now would appreciate feedback on the new factor code.
...
 * Our packaging with separate Makefile, outseq.c and ChangeLog was
   useful during our development.  We don't expect these to be useful
   in coreutils.  In particular, the slow testing of the 'check' target
   is probably quite unsuitable for coreutils (but similar but quicker
   tests would make sense).

I think the tests will be fine, as long as they're separate, and hence
can be parallelized by the default mechanism.  We might want to label
most of them as expensive, so that they're run only by those who set
RUN_EXPENSIVE_TESTS=yes in their environment.

 * The files probably needed for coreutils are:

   o factor.c -- main factoring code
   o make-prime-list.c -- primes table generator program
   o longlong.h -- arithmetic support routines (from GMP)


 Technical considerations:

 * Should we handle numbers = 2^127?  That would in effect mean
   merging a current version of GMP's demos/factorize.c into this
   factor.c, and put that under HAVE_GMP (like in the old factor.c).
   It should be understood that factoring such large numbers with only
   Pollard rho is not very feasible.

The existing code can factor arbitrarily large numbers quickly, as long
as they have no large prime factors.  We should retain that capability.

 * We think a --verbose option would be nice, although we don't have
   one in the present version.  It would output information on
   algorithm switches and bounds reached.  Opinions?

I think it would be worthwhile, especially to give an idea of what progress
is being made when factoring very large numbers, but hardly something
that need be done now.

E.g., currently this doesn't print much:

$ M8=$(echo 2^31-1|bc) M9=$(echo 2^61-1|bc) M10=$(echo 2^89-1|bc)
$ factor --verbose $(echo $M8 * $M9 * $M10 | bc)
[using arbitrary-precision arithmetic][trial division (32761)] [is number 
prime?] [pollard-rho (1)]

Ideally it'd print something every second or two.

 Portability caveats:

 * We rely on POSIX.1 getchar_unlocked for a performance advantage.

 * We have some hardwired W_TYPE_SIZE settings for the code interfacing
   to longlong.h.  It is now 64 bits.  It will break on systems where
   uintmax_t is not a 64-bit type.  Please see the beginning of
   factor.c.

I wonder how many types of systems would be affected.

 Legal caveat:

 * Both Niels and Torbjörn are GNU hackers since long.  We do not
   currently have paperwork in place for coreutils contributions.  This
   will certainly be addressed.

Thanks.

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-06 Thread Torbjorn Granlund

Jim Meyering j...@meyering.net writes:

  The existing code can factor arbitrarily large numbers quickly, as long
  as they have no large prime factors.  We should retain that capability.
  
OK, so we'll put the GMP demos program into this one.

This opens another technical concern:

We have moved towards proving primality, since for 128 bit numbers it
can be done quickly.  But if we allow arbitrary large numbers, it is
expensive.

We might want an option for this choosing probabilistic testing, perhaps
--prp (common abbreviation for PRobabilistic Prime).  By default, we
should prove primality, I think.

My current devel version if GMP's demos/factorize has Lucas code.

   * We think a --verbose option would be nice, although we don't have
 one in the present version.  It would output information on
 algorithm switches and bounds reached.  Opinions?
  
  I think it would be worthwhile, especially to give an idea of what progress
  is being made when factoring very large numbers, but hardly something
  that need be done now.
  
  E.g., currently this doesn't print much:
  
  $ M8=$(echo 2^31-1|bc) M9=$(echo 2^61-1|bc) M10=$(echo 2^89-1|bc)
  $ factor --verbose $(echo $M8 * $M9 * $M10 | bc)
  [using arbitrary-precision arithmetic][trial division (32761)] [is number 
prime?] [pollard-rho (1)]
  
  Ideally it'd print something every second or two.
  
I'll let Niels worry about this, since he was the one to ask for it.

   Portability caveats:
  
   * We rely on POSIX.1 getchar_unlocked for a performance advantage.
  
   * We have some hardwired W_TYPE_SIZE settings for the code interfacing
 to longlong.h.  It is now 64 bits.  It will break on systems where
 uintmax_t is not a 64-bit type.  Please see the beginning of
 factor.c.
  
  I wonder how many types of systems would be affected.
  
It is not used currently anywhere in coreutils?  Perhaps coreutils could
use autoconf for checking this?  (If we're really crazy, we could speed
the factor program by an additional 20% by using blocked input with
e.g. fread.)

Please take a look at the generated code for factor_using_division,
towards the end where 8 imulq should be found (on amd64).  The code uses
mov, imul, cmp, jbe for testing the divisibility of a prime; the branch
is taken when the prime divides the number being factored, thus highly
non-taken.  (I suppose we could do a better job at describing the maths,
with some references.  This particular trick is from Division by
invariant integers using multiplication.)

-- 
Torbjörn

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-06 Thread Paul Eggert

On 09/06/2012 02:33 PM, Jim Meyering wrote:
  * We have some hardwired W_TYPE_SIZE settings for the code interfacing
to longlong.h.  It is now 64 bits.  It will break on systems where
uintmax_t is not a 64-bit type.  Please see the beginning of
factor.c.
 I wonder how many types of systems would be affected.

It's only a matter of time.  GCC already supports 128-bit
integers on my everyday host (Fedora 17, x86-64, GCC 4.7.1).
Eventually uintmax_t will grow past 64 bits, if only for the
crypto guys.

If the code needs exactly-64-bit unsigned integers, shouldn't
it be using uint64_t?  That's the standard way of doing
that sort of thing.  Gnulib can supply the type on pre-C99
platforms.  Weird but standard-conforming platforms that
don't have uint64_t will be out of luck, but surely they're out
of luck anyway.

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-05 Thread Jim Meyering

Eric Blake wrote:
 On 09/04/2012 10:42 AM, Jim Meyering wrote:
 Jim Meyering wrote:

  Torbjorn Granlund wrote:

 Problem numbers are of the for N=pq, p,q prime and (p-1)/(q-1) = s,

 s/for/form/

Fixed.  Thanks.

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-05 Thread Jim Meyering

Torbjorn Granlund wrote:
 Jim Meyering j...@meyering.net writes:

   Jim Meyering wrote:

Torbjorn Granlund wrote:
The very old factoring code cut from an now obsolete version GMP does
not pass proper arguments to the mpz_probab_prime_p function.  It ask
for 3 Miller-Rabin tests only, which is not sufficient.
   
Hi Torbjorn
   
Thank you for the patch and explanation.
I've converted that into the commit below in your name.
Please proofread it and let me know if you'd like to change anything.
I tweaked the patch to change MR_REPS from a #define to an enum
and to add the comment just preceding.
   
I'll add NEWS and tests separately.
   ...
From: Torbjorn Granlund t...@gmplib.org
Date: Tue, 4 Sep 2012 16:22:47 +0200
Subject: [PATCH] factor: don't ever declare composites to be prime

   Torbjörn, I've just noticed that I misspelled your name above.

 Did you?

I meant that I used Torbjorn rather than Torbjörn.

 Well, you misspell recognise too, but then again, most people
 on the other side of the pond misspell lots of English words.  :-)

Yes, the dichotomy is unfortunate.
Over the years, it has even caused interface problems, i.e.,
with --colours vs --colors and $LS_COLOURS vs LS_COLORS.
I wanted to settle on one, and US spelling is more common
than British so I settled on that.

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-04 Thread Torbjorn Granlund

The very old factoring code cut from an now obsolete version GMP does
not pass proper arguments to the mpz_probab_prime_p function.  It ask
for 3 Miller-Rabin tests only, which is not sufficient.

I am afraid the original poor code was wrritten by me, where this
particular problem was introduced with 3.0's demos/factorize.c.

A Miller-Rabin test will detect composites with at least a probability
of 3/4.  For a uniform random composite, the probability will actually
by much higher.

Or put another way, of the N-3 possible Miller-Rabin tests for checking
the composite N, there is no number N for which more than (N-3)/4 of the
tests will fail to detect the number as a composite.  For most numbers N
the number of false witnesses will be much, much lower.

Problem numbers are of the for N=pq, p,q prime and (p-1)/(q-1) = s,
where s is a small integer.  (There are other problem forms too,
incvolving 3 or more prime factors.)  When s = 2, we get the 3/4 factor.

It is easy to find numbers of that form that causes coreutils factor to
fail:

465658903
2242724851
6635692801
17709149503
17754345703
20889169003
42743470771
54890944111
72047131003
85862644003
98275842811
114654168091
117225546301
...

There are 9008992 composites of the form with s=2 below 2^64.  With 3
Miller-Rabin test, one would expect about 9008992/4^64 = 140766 to be
invalidly recognised as primes in that range.

Here is a simple patch:



diff
Description: Binary data

I and Niels Möller have written a suggested replacement for coreutils
factor.c.  It fixes a number of issues with the current code:

(1) Much faster trial division code ( 10x) based on a small table of
prime inverses.  Still, the new code doesn't perform lots of trial
dividing.

(2) Pollard rho code using Montgomery representation for numbers  2^64.
(We consider extending this to 128 bits.)  Not dependent on GMP.

(3) Lucas prime proving code instead of probablitic Miller-Rabin primes
testing.

(4) SQUFOF code, which might be included depending on performance
issues.

(5) Replacement GMP code (#if HAVE_GMP) that also includes Lucas proving
code.

The new code is faster than the current code:

Old:
  seq `pexpr 2^64-1000` `pexpr 2^64-1` | time factor /dev/null
  524.57 user

New:
  seq `pexpr 2^64-1000` `pexpr 2^64-1` | time ./factor /dev/null
  0.05 user

For smaller number ranges, the improvements are currently much more
modest, as little as 2x in some cases.

The code should be ready within a few weeks.

-- 
Torbjörn

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-04 Thread Bernhard Voelker



On 09/04/2012 04:46 PM, Jim Meyering wrote:
 incvolving 3 or more prime factors.)  When s = 2, we get the 3/4 factor.

s/incvolving/involving/

Have a nice day,
Berny

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-04 Thread Jim Meyering

Bernhard Voelker wrote:
 On 09/04/2012 04:46 PM, Jim Meyering wrote:
 incvolving 3 or more prime factors.)  When s = 2, we get the 3/4 factor.

 s/incvolving/involving/

Pádraig Brady wrote:
 Miller-Rabin test, one would expect about 9008992/4^64 = 140766 to be
s/4^64/64/ ?

Fixed both.  Thanks.
I've also changed s/s/z/ in recognised.

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-04 Thread Jim Meyering

Jim Meyering wrote:

 Torbjorn Granlund wrote:
 The very old factoring code cut from an now obsolete version GMP does
 not pass proper arguments to the mpz_probab_prime_p function.  It ask
 for 3 Miller-Rabin tests only, which is not sufficient.

 Hi Torbjorn

 Thank you for the patch and explanation.
 I've converted that into the commit below in your name.
 Please proofread it and let me know if you'd like to change anything.
 I tweaked the patch to change MR_REPS from a #define to an enum
 and to add the comment just preceding.

 I'll add NEWS and tests separately.
...
 From: Torbjorn Granlund t...@gmplib.org
 Date: Tue, 4 Sep 2012 16:22:47 +0200
 Subject: [PATCH] factor: don't ever declare composites to be prime

Torbjörn, I've just noticed that I misspelled your name above.

Here's the NEWS/tests addition.
Following is an adjusted commit that spells your name properly.

From e561ff991b74dc19f6728aa1e6e61d1927055ac1 Mon Sep 17 00:00:00 2001
From: Jim Meyering meyer...@redhat.com
Date: Tue, 4 Sep 2012 18:26:25 +0200
Subject: [PATCH] factor: doc and tests
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* NEWS (Bug fixes): Mention it.
* tests/misc/factor.pl: Add five of Torbjörn's tests.
---
 NEWS | 3 +++
 tests/misc/factor.pl | 5 +
 2 files changed, 8 insertions(+)

diff --git a/NEWS b/NEWS
index f3874fd..ffa7939 100644
--- a/NEWS
+++ b/NEWS
@@ -15,6 +15,9 @@ GNU coreutils NEWS-*- 
outline -*-
   it detects this precise type of cycle, diagnoses it as such and
   eventually exits nonzero.

+  factor (when using gmp) would mistakenly declare some composite numbers
+  to be prime, e.g., 465658903, 2242724851, 6635692801.
+
   rm -i -d now prompts the user then removes an empty directory, rather
   than ignoring the -d option and failing with an 'Is a directory' error.
   [bug introduced in coreutils-8.19, with the addition of --dir (-d)]
diff --git a/tests/misc/factor.pl b/tests/misc/factor.pl
index 47f9343..38a5037 100755
--- a/tests/misc/factor.pl
+++ b/tests/misc/factor.pl
@@ -67,6 +67,11 @@ my @Tests =
   {OUT = 4: 2 2\n},
   {ERR = $prog: 'a' is not a valid positive integer\n},
   {EXIT = 1}],
+ ['bug-2012-a', '465658903', {OUT = '15259 30517'}],
+ ['bug-2012-b', '2242724851', {OUT = '33487 66973'}],
+ ['bug-2012-c', '6635692801', {OUT = '57601 115201'}],
+ ['bug-2012-d', '17709149503', {OUT = '94099 188197'}],
+ ['bug-2012-e', '17754345703', {OUT = '94219 188437'}],
 );

 # Prepend the command line argument and append a newline to end
--
1.7.12.176.g3fc0e4c


From 4c21a96443ee26eb0d4da31526ce4cf180ac7a4e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Torbj=C3=B6rn=20Granlund?= t...@gmplib.org
Date: Tue, 4 Sep 2012 18:38:29 +0200
Subject: [PATCH] factor: don't ever declare composites to be prime

The multiple-precision factoring code (with HAVE_GMP) was copied from
a now-obsolete version of GMP that did not pass proper arguments to
the mpz_probab_prime_p function.  It makes that code perform no more
than 3 Miller-Rabin tests only, which is not sufficient.

A Miller-Rabin test will detect composites with at least a probability
of 3/4.  For a uniform random composite, the probability will actually
by much higher.

Or put another way, of the N-3 possible Miller-Rabin tests for checking
the composite N, there is no number N for which more than (N-3)/4 of the
tests will fail to detect the number as a composite.  For most numbers N
the number of false witnesses will be much, much lower.

Problem numbers are of the for N=pq, p,q prime and (p-1)/(q-1) = s,
where s is a small integer.  (There are other problem forms too,
involving 3 or more prime factors.)  When s = 2, we get the 3/4 factor.

It is easy to find numbers of that form that cause coreutils' factor to
fail:

  465658903
  2242724851
  6635692801
  17709149503
  17754345703
  20889169003
  42743470771
  54890944111
  72047131003
  85862644003
  98275842811
  114654168091
  117225546301
  ...

There are 9008992 composites of the form with s=2 below 2^64.  With 3
Miller-Rabin test, one would expect about 9008992/64 = 140766 to be
invalidly recognized as primes in that range.

* src/factor.c (MR_REPS): Define to 25.
(factor_using_pollard_rho): Use MR_REPS, not 3.
(print_factors_multi): Likewise.
* THANKS.in: Remove my name, now that it will be automatically
included in the generated THANKS file.
---
 THANKS.in| 1 -
 src/factor.c | 9 ++---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/THANKS.in b/THANKS.in
index 1580151..2c3f83c 100644
--- a/THANKS.in
+++ b/THANKS.in
@@ -608,7 +608,6 @@ Tony Leneis t...@plaza.ds.adp.com
 Tony Robinson   a...@eng.cam.ac.uk
 Toomas Soometoomas.so...@elion.ee
 Toralf Förster  toralf.foers...@gmx.de
-Torbjorn Granlund   t...@nada.kth.se
 Torbjorn Lindgren

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-04 Thread Eric Blake

On 09/04/2012 10:42 AM, Jim Meyering wrote:
 Jim Meyering wrote:
 
  Torbjorn Granlund wrote:
 
 Problem numbers are of the for N=pq, p,q prime and (p-1)/(q-1) = s,

s/for/form/

-- 
Eric Blake   ebl...@redhat.com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-04 Thread Torbjorn Granlund

Jim Meyering j...@meyering.net writes:

  Jim Meyering wrote:
  
   Torbjorn Granlund wrote:
   The very old factoring code cut from an now obsolete version GMP does
   not pass proper arguments to the mpz_probab_prime_p function.  It ask
   for 3 Miller-Rabin tests only, which is not sufficient.
  
   Hi Torbjorn
  
   Thank you for the patch and explanation.
   I've converted that into the commit below in your name.
   Please proofread it and let me know if you'd like to change anything.
   I tweaked the patch to change MR_REPS from a #define to an enum
   and to add the comment just preceding.
  
   I'll add NEWS and tests separately.
  ...
   From: Torbjorn Granlund t...@gmplib.org
   Date: Tue, 4 Sep 2012 16:22:47 +0200
   Subject: [PATCH] factor: don't ever declare composites to be prime
  
  Torbjörn, I've just noticed that I misspelled your name above.
  
Did you?  Well, you misspell recognise too, but then again, most people
on the other side of the pond misspell lots of English words.  :-)

-- 
Torbjörn

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-04 Thread Torbjorn Granlund

Pádraig Brady p...@draigbrady.com writes:

  On 09/04/2012 03:46 PM, Jim Meyering wrote:
   There are 9008992 composites of the form with s=2 below 2^64.  With 3
   Miller-Rabin test, one would expect about 9008992/4^64 = 140766 to be
  
  s/4^64/64/ ?
  
  For what it's worth I checked the million primes in
  the range 452,930,477 to 472,882,027 and they're
  now identified correctly (465658903 was included previously).
  
  Note processing time has increased with the patch.
  On my 2.1GHz i3-2310M, running over the above range
  used to take 14m, but now takes 18m.
  
It sometimes takes more time to do things correctly.

As I mentioned in the original post, we will replace the current code
with code that is many times faster.  Your example above will run at
less than a minute on your system.

-- 
Torbjörn

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-04 Thread Pádraig Brady


On 09/04/2012 07:10 PM, Torbjorn Granlund wrote:

Pádraig Bradyp...@draigbrady.com  writes:

   On 09/04/2012 03:46 PM, Jim Meyering wrote:
 There are 9008992 composites of the form with s=2 below 2^64.  With 3
 Miller-Rabin test, one would expect about 9008992/4^64 = 140766 to be

   s/4^64/64/ ?

   For what it's worth I checked the million primes in
   the range 452,930,477 to 472,882,027 and they're
   now identified correctly (465658903 was included previously).

   Note processing time has increased with the patch.
   On my 2.1GHz i3-2310M, running over the above range
   used to take 14m, but now takes 18m.

It sometimes takes more time to do things correctly.


Sure. I was just quantifying the performance change,
for others who may be referencing or noticing patches.
(Actually, I'd add a note to the commit message that,
this increases calculations by about 25%).


As I mentioned in the original post, we will replace the current code
with code that is many times faster.  Your example above will run at
less than a minute on your system.


I'd left my test files in place in anticipation ;)

thanks,
Pádraig.

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

2012-09-04 Thread Torbjorn Granlund

Pádraig Brady p...@draigbrady.com writes:

  Sure. I was just quantifying the performance change,
  for others who may be referencing or noticing patches.
  (Actually, I'd add a note to the commit message that,
  this increases calculations by about 25%).
  
And surely mode for certain cases. We spend 25/3 or about 8 times more
effort in Miller Rabin.

   As I mentioned in the original post, we will replace the current code
   with code that is many times faster.  Your example above will run at
   less than a minute on your system.
  
  I'd left my test files in place in anticipation ;)
  
Please do, and let me and Niels know if it takes more than 45s.  Your
test case takes 28s on my 3.3 GHz Sandy bridge system with our current
code.  I'm a little disappointed the code doesn't beat the old code more
for small factorisations.

-- 
Torbjörn

71 matches

Mail list logo