Mersenne: Re: Factoring benefit/cost ratio

2001-12-06 Thread Steinar H. Gunderson

On Wed, Dec 05, 2001 at 07:10:24PM -0600, [EMAIL PROTECTED] wrote:
> Am I correct in interpreting this to mean that you
> think that using 64-bit residuals is more reliable than using 16-bit
> residuals?  If so, then surely you'll grant that 256-bit residuals
> would be even more reliable yet, meaning that there's still room for
> error in our practice of using 64-bit residuals.

Given that the chance (assuming a random error and not a program
error) of a wrongly matched 64-bit residual is about 1 in 2^64, or
roughly 5.4e-20, I think you'll agree that a 64-bit residual would
be sufficient. :-)
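
For illustration, a quick check of that figure in plain Python (not
GIMPS code), comparing residual widths under the same random-error
assumption:

    # Chance that an erroneous run nevertheless reproduces the recorded
    # residual, assuming a bad residual is effectively uniform random.
    for bits in (16, 64, 256):
        print(f"{bits:3d}-bit residual: false-match chance ~ {2.0**-bits:.1e}")

which prints about 1.5e-05, 5.4e-20 and 8.6e-78 respectively.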

/* Steinar */
-- 
Homepage: http://www.sesse.net/



Re: Mersenne: Re: Factoring benefit/cost ratio

2001-12-05 Thread ribwoods

Brian Beesley wrote:
 On 3 Dec 2001, at 20:38, [EMAIL PROTECTED] wrote:
[... snip ...]
  I think our record shows that a verified factor is still
  slightly (by a minute but nonzero margin) more reliable an
  indicator of compositeness than two matching nonzero LL
  residues.

 AFAIK our record does _not_ show any such thing.

Oh?  It doesn't?

If our record does _not_ show that a verified factor is more reliable
an indicator than DC LLs, then what does it show as the (higher,
according to this hypothesis) error rate for verified factors?

Is the observed error rate in reported factors as great as the ~1-1.5%
error rate in reported LL results?  How often does double-checking a
reported factor find that the reported factor was in error?

 In _theory_ there is a very small chance that we _may_ have accepted
 an incorrect residual even after double-checking. There is a small
 chance in such a case that the residual should have been zero, i.e.
 we missed a Mersenne prime.

... and my contention is that those small chances, for L-L results,
exceed the corresponding probabilities in the case of a reported
factor.

There is a small chance that we may accept an incorrect factor even
after double-checking it, but that chance is even smaller than the
small chance that we may accept an incorrect double-checked L-L
residual.

 I've triple-checked thousands of small exponents - some of the
 ones where the accepted residual was recorded to only 16 bits or
 less, which makes the chance of an undetected error _much_
 greater (though still quite small) - so far no substantive errors in
 the database have come to light. A very few (think fingers of one
 hand) instances of incorrectly matched residuals have come to light -

How does that compare to the observed rate of incorrect factors
discovered after triple-checking _them_?

  Some error sources would be more likely to affect the longer LL
  test than the shorter factor verification.)

 Indeed - if errors occur at random intervals, results should get
 less reliable as runs take progressively longer.

... and L-L verification runs take a lot longer than factor
verification runs, so we can reasonably expect that this class of
error will affect L-L verification runs (and thus lower such runs'
reliability) a lot more than it does factor verification runs, all
else being equal (such as use of the same hardware systems).

 However there does seem to be a wide variation in the reliability
 of systems. Some seem to have a lot of problems, some very few
 (if any).

So, while a factor verification run on a system that has a lot of
problems may be less reliable than an L-L verification run on a
system that has very few or no problems, if we compare L-L
verification and factor verification runs on the same system, or on
systems with an equal problem rate, we would expect errors whose
rates are proportional to run time to affect the L-L verification
runs more than the factor verification runs, because of the
significant difference in average run times between the two
categories.  That makes the L-L verification runs less reliable than
the factor verification runs.  Agreed?

 I've detected four problems on my own systems so far; two of these
 were caused by a failed cooling fan on a P100 system (unlike
 current systems, the cooling was _almost_ sufficient even without
 forced ventilation); one was caused by a configuration error -
 running Glucas on an Alpha system I inadvertently allowed fast
 arithmetic and turned off error checking, which was a really stupid
 thing to do on a system with a 53-bit mantissa FPU - as this was a
 QA run on a 33 million range exponent, I was lucky to lose only 3
 months work. The last one I never got to the bottom of, but I
 strongly suspect that Prime95's workspace memory was clobbered
 during a software installation on a Windows 98 system.

How many of those problems caused errors during L-L verifications,
and how many caused errors during factor verifications?

However, you may not have spent anywhere near as much time doing
factor verifications as you have doing L-L verifications, so it may
not be valid to draw any conclusion about comparative error rates on
your system.

Richard B. Woods





Re: Mersenne: Re: Factoring benefit/cost ratio

2001-12-05 Thread bjb
On 5 Dec 2001, at 6:09, [EMAIL PROTECTED] wrote:

> Brian Beesley wrote:
> > On 3 Dec 2001, at 20:38, [EMAIL PROTECTED] wrote:
> [... snip ...]
> > > I think our record shows that a verified factor is still
> > > slightly (by a minute but nonzero margin) more reliable an
> > > indicator of compositeness than two matching nonzero LL
> > > residues.
> >
> > AFAIK our record does _not_ show any such thing.
> 
> Oh?  It doesn't?

There is no evidence of any verified residuals being incorrect.  Neither is there any evidence that any verified factors are incorrect.
Whatever theory states, the experimental evidence is that verified  factors are no more (or less) reliable than verified LL tests.

Suppose a taxi firm runs 10 Fords and 10 Hondas for a year. None  of them break down. On that basis alone, there is no evidence  whatsoever that one make is more reliable than the other.  Naturally, other companies' experimental evidence may vary.
>
> [ big snip ]
> There is a small chance that we may accept an incorrect factor even
> after double-checking it, but that chance is even smaller than the
> small chance that we may accept an incorrect double-checked L-L
> residual.

I doubt very much that we would accept an incorrect factor. The  double-checking is done with completely different code. Besides  which, checking a factor takes a few microseconds, whereas  checking a LL test likely takes a few hundred hours.

If anything goes wrong during a factoring run, we would be far more  likely to miss a factor which we should have found rather than vice  versa. This is relatively unimportant from the point of view of finding  Mersenne primes; the effect is a small loss of efficiency.

> How does that compare to the observed rate of incorrect factors
> discovered after triple-checking _them_?

AFAIK no-one bothers to triple-check factors. Nowadays factors  are verified on the server at the time they're reported. I'm not privy  to the server logs, so I simply don't know how many get rejected  (except for the server _database_ problem reported recently - not  related to the actual factor checking code - which causes problems  with genuine factors > 32 digits in length). However, I can think of  at least one way of pushing up the rejection rate.
> 
> How many of those problems caused errors during L-L verifications,
> and how many caused errors during factor verifications?
> 
All during LL first tests, or QA runs (which are LL & DC done in parallel with intermediate crosscheck points).

> However, you may not have spent anywhere near as much time doing
> factor verifications as you have doing L-L verifications, so it may
> not be valid to draw any conclusion about comparative error rates on
> your system.

I've spent no time at all verifying factors - it would take less than a minute to verify everything in the factors database. The total factoring effort I've put in (ignoring ECM & P-1 on small exponents) is only about 3% of my total contribution, so I would expect not to have had any factoring errors. Besides which, trial factoring _should_ have a lower error rate than LL testing, due to the lower load on the FPU (which is usually the CPU element most sensitive to excess heat) and the smaller memory footprint (less chance of data getting clobbered by rogue software or random bit-flips).


Regards
Brian Beesley


Re: Mersenne: Re: Factoring benefit/cost ratio

2001-12-05 Thread ribwoods

Brian,

I'm wondering whether we may be misunderstanding each other's
contentions here.  I thought you objected to at least some of what I
claimed, but now it seems that you're presenting arguments and
evidence that support what I'm claiming.

Since my previous postings may have had careless wording or otherwise
obscured my intentions, and I did not earlier realize the importance
of certain details to the discussion, let me restate what I've meant
to claim:

1. It is more valuable to know a specific factor of a Mnumber than to
know that that Mnumber is composite without knowing any specific
factor.

(There's little dispute about #1.)

2. Claim #1 is true not only from the viewpoint of mathematics in
general, but also from the narrower viewpoint of the GIMPS search for
Mersenne primes.

3. One (but not the only) justification for claim #2 is that, _in
current practice_, a composite status derived by GIMPS from finding a
specific factor is (slightly) more reliable than a composite status
derived by GIMPS from matching nonzero residues from Lucas-Lehmer
tests.

That is, although in theory, or ideally, those two methods of
determining compositeness are equally reliable, there currently exists
a slight difference in reliability, in favor of the factor, from a
practical standpoint.

4. Our experience (the record), as documented in the Mersenne
mailing list or GIMPS history, supports claim #3.

- - - - -

Brian Beesley wrote:
  AFAIK our record does _not_ show any such thing.

 Oh?  It doesn't?

 There is no evidence of any verified residuals being incorrect.

Wait a second -- just yesterday you wrote that you had triple-checked
thousands of small exponents (which means they had already been
double-checked) and that "A very few (think fingers of one hand)
instances of incorrectly matched residuals have come to light -
completing the double-check in these cases proved that one of the
recorded residuals was correct."

So it seems that the meaning you're assigning to "verified" is
something like "retested and retested until two residuals match".
Is that a correct interpretation?  If not, what is?

My claim #3 means that in practice, factors require fewer verification
runs to produce matching results than do L-L residues, on average.
Do you disagree with that?  If not, then don't we agree about claim
#3?

Furthermore, my claim #4 means that the demonstration that factors
require fewer verification runs to produce matching results than do
L-L residues, on average, rests on the observed history _including the
paragraph you wrote from which I just quoted above!_  Do you disagree?

Also, in that same paragraph you wrote, "... - some of the ones where
the accepted residual was recorded to only 16 bits or less, which
makes the chance of an undetected error _much_ greater (though still
quite small) ..."  Am I correct in interpreting this to mean that you
think that using 64-bit residuals is more reliable than using 16-bit
residuals?  If so, then surely you'll grant that 256-bit residuals
would be even more reliable yet, meaning that there's still room for
error in our practice of using 64-bit residuals.  But a specific
factor is a _complete value_, not some truncation, and so its
reliability is not damaged by the incompleteness which you admit keeps
the L-L residues from being totally reliable - right?

Then you wrote "so far no substantive errors in the database have come
to light", but seemingly contradicted that in the very next sentence,
"A very few (think fingers of one hand) instances of incorrectly
matched residuals have come to light - completing the double-check in
these cases proved that one of the recorded residuals was correct."
... And thus _the other_ recorded residual was _incorrect_.

 Neither is there any evidence that any verified factors are
 incorrect.

Depends on the meaning of "verified", of course.  :-)

Will Edgington (I think) has reported finding errors in his factor
data base ... even though he verifies factors before adding them.
Mistakes happen.  But I think the error rate for factors has been
significantly lower than for L-L residuals.

 Whatever theory states, the experimental evidence is that verified
 factors are no more (or less) reliable than verified LL tests.

Then why don't we triple-check factors as often as we triple-check L-L
results?  Oh, wait ... depends on the meaning of "verified", again.

 Suppose a taxi firm runs 10 Fords and 10 Hondas for a year.
[ snip ]

Let's have an example in which the number and nature of the units is
closer to the gigabytes of data items we're slinging around.

 Besides which, checking a factor takes a few microseconds, whereas
 checking a LL test likely takes a few hundred hours.

... which tends to support my claim #3.

 If anything goes wrong during a factoring run, we would be far more
 likely to miss a factor which we should have found rather than vice
 versa.

... which is in agreement with claim #3.

 How does that compare to the observed rate of incorrect 

Re: Mersenne: Re: Factoring benefit/cost ratio

2001-12-05 Thread Steve Harris

Richard,

Your first interpretation of "verified" residues is correct: they are
retested until two residues match. Any time a double-check reports a
residue which is different from the first LL test, the exponent is returned
to the database to be tested again. This means that at least one of the
residues is incorrect, and it happens (relatively) often, I believe about two
percent of the time.  However, as has been pointed out before, the odds
against two LL tests on different machines returning the _same_ incorrect
residue are astronomical (although the chance is, of course, still non-zero).
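
As a rough illustration (taking the ~2% figure above as a per-test
error rate and assuming a bad run produces an effectively random
64-bit residue):

    # Chance that two independent LL runs are both wrong AND still report
    # the same 64-bit residue, under the random-error assumption.
    per_test_error = 0.02
    print(per_test_error**2 * 2.0**-64)   # ~2.2e-23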

Steve

-Original Message-
From: [EMAIL PROTECTED] [EMAIL PROTECTED]
To: [EMAIL PROTECTED] [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED] [EMAIL PROTECTED]
Date: Wednesday, December 05, 2001 8:34 PM
Subject: Re: Mersenne: Re: Factoring benefit/cost ratio


[... snip ...]

RE: Mersenne: Re: Factoring benefit/cost ratio

2001-12-04 Thread Paul Leyland


 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] 


 
 (Aside - we're rather more likely to find a 75-bit factor than a
 75-digit factor. In fact, finding a 75-digit prime factor of a "no
 factors known" Mersenne number with an exponent under 80 million
 would be a significant achievement in itself.

Finding a 75-digit prime and non-algebraic factor of any integer with
more than, say, 200 digits would be a significant achievement.  The
record today is 55 digits; it reached 53 digits 3 years ago and only a
handful of factors with 50 or more digits have been found ever.  I have
precisely one of them to my credit 8-)

 At the moment, having found one factor, we quit. That's sufficient 
 effort for the purpose of trying to find Mersenne primes. A little 
 more work might break the cofactor down further.

Actually, some of us don't quit.  But we're a small bunch of weirdos,
and we only work on tiny exponents anyway.


Paul



RE: Mersenne: Re: Factoring benefit/cost ratio

2001-12-04 Thread bjb

On 4 Dec 2001, at 1:19, Paul Leyland wrote:

 
  From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] 
 
 
  
  (Aside - we're rather more likely to find a 75-bit factor than a
  75-digit factor. In fact, finding a 75-digit prime factor of a "no
  factors known" Mersenne number with an exponent under 80 million
  would be a significant achievement in itself.
 
 Finding a 75-digit prime and non-algebraic factor of any integer with
 more than, say, 200 digits would be a significant achievement.  The
 record today is 55 digits; it reached 53 digits 3 years ago and only a
 handful of factors with 50 or more digits have been found ever.  I have
 precisely one of them to my credit 8-)

Sure. Not quite the same since there appears to be no certificate of 
primality, but on 30 Aug 2001 there was a message on this list to 
the effect that M727 (c219) = prp98.prp128. So much ECM work 
was done on M727 (before the NFS people started work) that it is 
highly unlikely that there are any factors < 10^50, which means 
that at least the 98-digit probable prime (which, if composite, would 
have a prime factor below 10^49) is almost certainly a 
genuine prime. (Maybe that's been proved by now. ECPP on 
general numbers of around 100 digits isn't very expensive.)

I think the 55 digit record applies to ECM. A number of much larger 
factors (not counting cofactors) have been found using number field 
sieve techniques.
 
  At the moment, having found one factor, we quit. That's sufficient 
  effort for the purpose of trying to find Mersenne primes. A little 
  more work might break the cofactor down further.
 
 Actually, some of us don't quit.  But we're a small bunch of weirdos,
 and we only work on tiny exponents anyway.

I was speaking for the project rather than myself. I'm also one of 
the small bunch of weirdos.


Regards
Brian Beesley



RE: Mersenne: Re: Factoring benefit/cost ratio

2001-12-04 Thread Nathan Russell

At 06:11 PM 12/4/2001 +, [EMAIL PROTECTED] wrote:
> Sure. Not quite the same since there appears to be no certificate of
> primality, but on 30 Aug 2001 there was a message on this list to
> the effect that M727 (c219) = prp98.prp128.

Does anyone know how much CPU time was spent?


> So much ECM work
> was done on M727 (before the NFS people started work) that it is
> highly unlikely that there are any factors < 10^50, which means
> that at least the 98-digit probable prime is almost certainly a
> genuine prime. (Maybe that's been proved by now. ECPP on
> general numbers of around 100 digits isn't very expensive.)

Especially when one uses Primo, which can certify numbers of even a
thousand digits in perhaps an afternoon on my machine.  I doubt very
strongly that I'm the first, but just for reference:

http://www.mail-archive.com/mersenne@base.com/msg06304.html
http://www.acsu.buffalo.edu/~nrussell/M727.zip

Nathan




Re: Mersenne: Re: Factoring benefit/cost ratio

2001-12-04 Thread Daran

- Original Message -
From: [EMAIL PROTECTED]
To: Daran [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Monday, December 03, 2001 11:22 PM
Subject: Re: Mersenne: Re: Factoring benefit/cost ratio

 On 3 Dec 2001, at 20:45, Daran wrote:

  Shouldn't that be 1.015 for double-checking assignments?

 I think 1.03...

Yes, of course.

 ...However you do have a point. P-1 limits do depend on
 the trial factoring depth,...

Now I'm puzzled.  Obviously P-1 and trial factoring limits both depend
upon the exponent, so they will correlate, but I don't see a direct
dependency between them.

 and are much smaller for DC assignments
 than for first tests, so there is already something built in.

Right.  I'd noticed that my P-1 probabilities for DC assignments were about
half those for LLs, but I'd attributed that to the differences between the
magnitudes of the exponents.  This makes more sense.

  Also does the cost part of the calculation recognise the increased cost
of
  trial-factorisation after 2^64?

 Yes. The trial factoring depth is constant at 64 bits from ~8.5
 million to ~13 million. Don't forget that the number of candidates
 which need to be checked is _inversely_ proportional to the
 exponent.

Because any factor must be of form 2kp+1.
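
For anyone following along, a minimal sketch in plain Python (not
Prime95's code; real implementations also sieve the candidates and use
the fact that factors are 1 or 7 mod 8) of why the candidate count at a
fixed bit depth falls as the exponent grows:

    # Every factor q of M(p) = 2^p - 1 has the form q = 2*k*p + 1, so trial
    # factoring to 2^bits only has to try about 2^bits / (2*p) values of k:
    # the larger the exponent, the fewer candidates per bit level.
    def trial_factor(p, bits):
        k = 1
        while 2*k*p + 1 < 2**bits:
            q = 2*k*p + 1
            if pow(2, p, q) == 1:   # q divides 2^p - 1
                return q
            k += 1
        return None

    print(trial_factor(11, 16))     # 23, since M(11) = 2047 = 23 * 89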

[...]

  Finally if P-1 factorisation were to be spun off into a separate work
unit,
  then the optimal arangement would be to trial factor while
  cost_of_trial_factoring * chance_of_P-1_factoring is less than
  cost_of_P-1_factoring * chance_of_trial_factoring.  Then P-1 factorise.
  Then complete trial factorisation according to the above formula.

 Interesting - but I think the effect would be small.

Perhaps, but the effort to implement (without spinning off P-1 into a
separate work type) would also be small.  The existing formula
(cost_of_trial_factoring < chance_of_trial_factoring * cost_of_LL_test *
2.03) ignores the effect of P-1.  A better formula would be:
cost_of_trial_factoring < chance_of_trial_factoring * (cost_of_P-1_factoring
+ (1 - chance_of_P-1_factoring) * cost_of_LL_test * 2.03).

This change on its own would surely be trivial to implement.
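
To make the comparison concrete, the two rules side by side in plain
Python (illustrative numbers only; these are not Prime95's actual costs
or probabilities):

    # Simple rule: factor deeper while the expected LL saving exceeds the cost.
    def worth_tf_simple(tf_cost, tf_chance, ll_cost):
        return tf_cost < tf_chance * ll_cost * 2.03

    # P-1-aware rule: a TF hit also saves the P-1 run, but the LL saving only
    # counts when P-1 would not have found the factor anyway.
    def worth_tf_with_pm1(tf_cost, tf_chance, pm1_cost, pm1_chance, ll_cost):
        return tf_cost < tf_chance * (pm1_cost + (1 - pm1_chance) * ll_cost * 2.03)

    print(worth_tf_simple(6.0, 0.01, 600.0))                # True: 6 < 12.18
    print(worth_tf_with_pm1(6.0, 0.01, 9.0, 0.026, 600.0))  # True: 6 < ~11.95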

But then, if the P-1 fails, you might as well carry on and do a little more
TF, which brings us back to the simpler formula I gave earlier.  You might
also want to lower your P-1 bounds a shade to take into account the fact
that a successful factorisation may only save a little extra TF effort.

The end result is a *tiny* reduction in the probability of finding a factor
(because of the slight reduction in P-1 bounds).  But you'll do less work,
and if you do find one, you'll probably find it earlier.  How much earlier?
I've just completed P-1 on M656677, which took 8787 seconds, about 1.5% of
the DC, which is estimated to take 6 days 23 hours 43 minutes, i.e. 603780
seconds, compared with a 2.6% chance of success.  My guess is that this
would put it optimally just before the last round of TF, which is (or
should be) close to the break-even point.
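
Checking those numbers (plain arithmetic on the figures just quoted):

    pm1_cost = 8787                        # seconds spent on the P-1 run
    dc_cost  = 6*86400 + 23*3600 + 43*60   # 6 d 23 h 43 min = 603780 seconds
    print(pm1_cost / dc_cost)              # ~0.0146, i.e. about 1.5% of the DC
    print(0.026 * dc_cost)                 # ~15698 s expected saving at 2.6%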

I'd need an estimate for the time of the last round of TF to be any more
precise.  The calculation would also need to be repeated for a first-time LL
assignment.

My final remark on the subject of early P-1 is the observation that TF and
P-1 search overlapping factor spaces.  Are there any gains to be had in TF
from sieving out some of the smooth k's that an early P-1 failed to find?
Could you structure the factor candidate generation code so that the
unsieved list doesn't contain any smooth k's in the first place?  Even if
it's not possible or cost effective to do either of these, the fact that
there are smooth k's among the candidates should lower the estimate for the
probability of success.
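
As a toy illustration of such a filter (assuming, as above, that a
failed stage-1 P-1 run with bound B1 rules out factors q = 2kp+1 whose
k is B1-smooth; stage 2 and realistic bounds are ignored here):

    # k values that trial factoring would still need to try after P-1 stage 1.
    def is_smooth(k, B1):
        for d in range(2, B1 + 1):
            while k % d == 0:
                k //= d
        return k == 1

    B1 = 20                                 # toy bound; real B1 values are much larger
    survivors = [k for k in range(1, 50) if not is_smooth(k, B1)]
    print(survivors)                        # [23, 29, 31, 37, 41, 43, 46, 47]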

Another reason to spin off P-1 into a separate work type is to allow
machines with heaps of memory to concentrate on this kind of work.  A little
experimentation shows that the probability of a successful P-1 with 400MB is
roughly double that of one with a minimal configuration, and some machines
will be doing LLs/DCs without the ability to P-1 at all.  What I envisage is
that a virgin exponent would first go out to P-1 (which would also do the
first few rounds of TF, according to my earlier formula), then as a
factoring assignment (to completion), then finally to LL and DC.

 What about factoring to a fractional depth? With a roughly
 logarithmic distribution of factors, surely about half the factors
 between 2^n and 2^(n+1) would be smaller than 2^(n+0.5), whilst
 searching to 2^(n+0.5) would take only about 41% of the time
 taken to search the whole interval.

This had occurred to me.  One could calculate the exact break-even point and
TF thus far and no further.  However the gains would be minimal for many
exponents in the middle of the depth 'bands', which are already being
near-optimally factored.  Even those at the edges of the depth bands can't
be very far from the break-even point.  Also the server would need to record
more information to enable TF to be taken further at a later

Re: Mersenne: Re: Factoring benefit/cost ratio

2001-12-04 Thread Daran

- Original Message -
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Tuesday, December 04, 2001 2:38 AM
Subject: Re: Mersenne: Re: Factoring benefit/cost ratio

  and one that is decades (at least) behind GIMPS. The only reason we
  do any factoring at all is to reduce the time spent on LL testing.

 But if factoring is not really part of GIMPS's purpose (and I agree it
 isn't), how can a separate factoring effort be behind GIMPS at all?
 Aren't they measuring their progress in a different, non-comparable,
 dimension?

Say rather that there are various criteria by which the two projects can be
compared.  It's probably true that it would take them decades or more to
completely factor a set of prime exponents comparable to those which GIMPS
has verified composite (and that's ignoring the effort to factorise the
cofactors of Mersennes with composite exponents).  They're probably not that
far behind GIMPS in terms of total computing done.  How far behind will
depend upon how good they are at mobilising support.

[...]

 In reply to a statement of mine about the extra benefit of finding a
 specific factor,
 Daran G. wrote:
 I can see no way of objectively quantifying this benefit.

 Well -- if there's no objective quantification of the extra benefit of
 finding a specific factor, then it seems to me that there's no
 objectively quantifiable justification for saying that it's not
 valuable to do a certain amount of extra P-1 factoring on a
 once-LL'd Mnumber.  :)

Can't argue with that.

[...]

 Daran G. wrote:
 It seems to be implicitely acknowledged in the way the trial
 factoring
 depths are determined.

 But don't forget that this refers to a depth determined by Prime95
 _in the context of calculating the factoring tradeoff point for
 maximizing GIMPS throughput for the Test= worktype_, NOT the context
 of a Factor= or Pminus1= worktype where the user has explicitly
 specified factoring limits, possibly for reasons beyond the ken of
 Prime95.

That seemed to be the question you were asking.

[...]

 I'm not saying that Prime95 should incorporate into its calculations a
 nonzero value for the benefit of finding a specific factor.

 I'm saying that it is rational for someone to decide to factor past
 the Prime95-calculated tradeoff points, and that it is unjustified to
 criticize extra factoring on the grounds that going past the
 Prime95-calculated tradeoff points is wasted effort.

I agree.  But you can look at this in terms of an even greater project than
GIMPS - The Great Distributed Computing Project that encompasses all such
efforts, and aims to use spare computing cycles to increase the knowledge
base of humankind generally.  What contribution one chooses to make depends
upon one's own personal preferences.

 Richard B. Woods

Daran G.





Re: Mersenne: Re: Factoring benefit/cost ratio

2001-12-04 Thread Daran

- Original Message -
From: George Woltman [EMAIL PROTECTED]
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Tuesday, December 04, 2001 3:41 AM
Subject: Re: Mersenne: Re: Factoring benefit/cost ratio

 I think more of the discussion has centered around stats and the
 formula for picking how far to trial factor, rather than whether factoring
 is of some mathematical value...

That's true, at least for my own recent contributions to this discussion.
However, what I've been trying to do is float a few ideas for increasing the
effectiveness of the factorising effort.  If P-1 were done early, as I
suggested, with bounds unchanged, then exactly the same Mersennes would get
factored sooner, on average, and with less work.  Ditto with the smooth k
sieving idea.  And if the cost of factoring is reduced, the optimal depth
increases.

The benefits of such an exercise are even greater if factors are given an
intrinsic value of their own.

Of course, the real scarce resource is not computer cycles, nor even ideas
for improvement, but in programming effort to implement.  Calculating that
particular trade-off I leave to others.

 -- George

Daran G.





Re: Mersenne: Re: Factoring benefit/cost ratio

2001-12-03 Thread Daran

- Original Message -
From: George Woltman [EMAIL PROTECTED]
To: Gerry Snyder [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Saturday, December 01, 2001 10:39 PM
Subject: Re: Mersenne: Re: Factoring benefit/cost ratio


 I prefer a factor to a double-check.  But it is hard to quantify prefer
in a
 mathematical formula for computing trial factoring limits.  Prime95 uses
 the formula:   cost_of_factoring must be less than
chance_of_finding_a_factor
 times 2.03 * the cost_of_an_LL_test.

Shouldn't that be 1.015 for double-checking assignments?

Also does the cost part of the calculation recognise the increased cost of
trial-factorisation after 2^64?

I've noticed on occasion that I've had to do an extra round of trial
factoring before proceeding with a doublecheck.  Does this indicate that
previous factorisation has been non-optimal, or have the estimates for the
relative costs of factoring vs. LL testing changed with the introduction of
new hardware?

Finally if P-1 factorisation were to be spun off into a separate work unit,
then the optimal arangement would be to trial factor while
cost_of_trial_factoring * chance_of_P-1_factoring is less than
cost_of_P-1_factoring * chance_of_trial_factoring.  Then P-1 factorise.
Then complete trial factorisation according to the above formula.

 -- George

Daran G.





Re: Mersenne: Re: Factoring benefit/cost ratio

2001-12-03 Thread bjb

On 3 Dec 2001, at 20:45, Daran wrote:

 Shouldn't that be 1.015 for double-checking assignments?

I think 1.03. However you do have a point. P-1 limits do depend on 
the trial factoring depth, and are much smaller for DC assignments 
than for first tests, so there is already something built in.
 
 Also does the cost part of the calculation recognise the increased cost of
 trial-factorisation after 2^64?

Yes. The trial factoring depth is constant at 64 bits from ~8.5 
million to ~13 million. Don't forget that the number of candidates 
which need to be checked is _inversely_ proportional to the 
exponent.
 
 I've noticed on occasion that I've had to do an extra round of trial
 factoring before proceeding with a doublecheck.  Does this indicate that
 previous factorisation has been non-optimal, or have the estimates for the
 relative costs of factoring vs. LL testing changed with the introduction of
 new hardware?

I think the latter has something to do with it - PPro is about twice 
as efficient at factoring compared with LL testing as a plain 
Pentium. This is because of much improved pipeline organization 
including the provision of spare registers enabling speculative 
execution, which greatly increased the throughput of the floating 
point unit in particular.

Another factor is that early versions of the program were unable to 
factor as deep as current versions. 
 
 Finally if P-1 factorisation were to be spun off into a separate work unit,
 then the optimal arangement would be to trial factor while
 cost_of_trial_factoring * chance_of_P-1_factoring is less than
 cost_of_P-1_factoring * chance_of_trial_factoring.  Then P-1 factorise.
 Then complete trial factorisation according to the above formula.

Interesting - but I think the effect would be small.

What about factoring to a fractional depth? With a roughly 
logarithmic distribution of factors, surely about half the factors 
between 2^n and 2^(n+1) would be smaller than 2^(n+0.5), whilst 
searching to 2^(n+0.5) would take only about 41% of the time 
taken to search the whole interval.
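
The 41% figure is just (sqrt(2) - 1) of the interval, since the number
of candidates is proportional to its length:

    # Work to search [2^n, 2^(n+0.5)) as a fraction of the whole bit level
    # [2^n, 2^(n+1)), versus the roughly half of the expected factors that a
    # log-uniform factor distribution puts below 2^(n+0.5).
    print((2**0.5 - 1) / (2 - 1))   # ~0.414, i.e. about 41% of the work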

Regards
Brian Beesley



Re: Mersenne: Re: Factoring benefit/cost ratio

2001-12-03 Thread ribwoods

Responses to a couple of postings --

Steve Harris wrote:
 Actually, Richard's statement that a 'Factored' status is better
 for GIMPS than a 'Two LL' status is not quite true. It's better for
 the mathematical community as a whole,

"Better for the mathematical community as a whole" is actually almost
exactly what I had in mind when I started composing that paragraph
... honest!  But then I started revising, and lost some stuff in the
process.

Nevertheless, I stand by my assertion that a 'Factored' status is
better for GIMPS than a 'Two LL' status ... insofar as our LL testing
remains less reliable than our factor verification.  Double-checking
certainly improves LL result reliability, but I think our record shows
that a verified factor is still slightly (by a minute but nonzero
margin) more reliable an indicator of compositeness than two matching
nonzero LL residues.

(See earlier discussions of potential reasons for incorrect LL
results.
Some error sources would be more likely to affect the longer LL test
than the shorter factor verification.)

 It matters to those who are attempting to fully factor Mersenne
 numbers, but that's a different project altogether,

Some of us like to participate in both.  :-)

 and one that is decades (at least) behind GIMPS. The only reason we
 do any factoring at all is to reduce the time spent on LL testing.

But if factoring is not really part of GIMPS's purpose (and I agree it
isn't), how can a separate factoring effort be behind GIMPS at all?
Aren't they measuring their progress in a different, non-comparable,
dimension?

 Besides, if you do manage to find a 75-digit factor of a
 2-million-digit Mersenne number, that still leaves a cofactor of
 nearly two million digits. Really not much help :-)

A journey of a thousand miles begins with but a single step.

Two million three-foot steps total more than one thousand miles.

- - - - -

In reply to a statement of mine about the extra benefit of finding a
specific factor,
Daran G. wrote:
I can see no way of objectively quantifying this benefit.

Well -- if there's no objective quantification of the extra benefit of
finding a specific factor, then it seems to me that there's no
objectively quantifiable justification for saying that it's not
valuable to do a certain amount of extra P-1 factoring on a
once-LL'd Mnumber.  :)

In response to my writing:
 This reflects the view (with which I agree) that it is more valuable
 to know a specific factor of a Mnumber than to know that a Mnumber is
 composite but not to know any specific factor of that Mnumber.

 So a "Factored" status is better for GIMPS than a "Two LL" status, but
 calculations of factoring benefit that consider only the savings of
 L-L test elimination are neglecting the difference between those two
 statuses.  If one consciously wants to neglect that difference ...
 well, okay ... but I prefer to see that explicitly acknowledged.
Daran G. wrote:
It seems to be implicitly acknowledged in the way the trial factoring
depths are determined.

But don't forget that this refers to a depth determined by Prime95
_in the context of calculating the factoring tradeoff point for
maximizing GIMPS throughput for the Test= worktype_, NOT the context
of a Factor= or Pminus1= worktype where the user has explicitly
specified factoring limits, possibly for reasons beyond the ken of
Prime95.

Daran continues:
 If one places a non-zero value on a known factor, then the utility
 of extra factoring work on untested, once tested, and verified
 composites would be increased.

Yes.

But that doesn't have to affect Prime95's calculation of tradeoff
points _for maximizing throughput of the Test= worktype_.

I'm not saying that Prime95 should incorporate into its calculations a
nonzero value for the benefit of finding a specific factor.

I'm saying that it is rational for someone to decide to factor past
the Prime95-calculated tradeoff points, and that it is unjustified to
criticize extra factoring on the grounds that going past the
Prime95-calculated tradeoff points is wasted effort.

Richard B. Woods





Re: Mersenne: Re: Factoring benefit/cost ratio

2001-12-03 Thread George Woltman

At 08:38 PM 12/3/2001 -0600, [EMAIL PROTECTED] wrote:
> Nevertheless, I stand by my assertion that a 'Factored' status is
> better for GIMPS than a 'Two LL' status

I think everyone is in agreement on this.

> But if factoring is not really part of GIMPS's purpose (and I agree it
> isn't),

I think of GIMPS as two projects.  The larger project is let's find the
next Mersenne prime.  The much smaller project is let's maximize our
knowledge of all Mersenne numbers.

> I'm saying that it is rational for someone to decide to factor past
> the Prime95-calculated tradeoff points,

I agree.  I've done several P90 CPU years of that myself.

I think more of the discussion has centered around stats and the
formula for picking how far to trial factor, rather than whether factoring
is of some mathematical value.  I'm not inclined to change the trial
factoring formula.  If I recall correctly, when I last computed the formula
I erred on the side of more factoring rather than less.  As to the stats,
maybe, just maybe, we can upgrade our stats system in 2002.  I
know from other distributed projects that many users are fanatical
about any changes in stats.  We will have to tread very carefully!

-- George




Re: Mersenne: Re: Factoring benefit/cost ratio

2001-12-02 Thread bjb

On 1 Dec 2001, at 17:39, George Woltman wrote:

 This is because my rather limited reporting software only adds up the
 LL results in the verified and one-LL-tests databases.  Once an
 exponent is factored it is removed from those databases.

The other problem here is that the known factors database does 
not include the discoverer.
 
 I prefer a factor to a double-check.  But it is hard to quantify
 prefer in a mathematical formula for computing trial factoring
 limits.  Prime95 uses the formula:   cost_of_factoring must be less
 than chance_of_finding_a_factor times 2.03 * the cost_of_an_LL_test.
 
 This should maximize GIMPS throughput.  The 2.03 is because we must
 run two (or more) LL tests to do a double-check.

Again there is a complication, since the ratio of time to do factoring 
assignment X to time to do LL/DC assignment Y varies according 
to the processor. e.g. PIIs are relatively quick at factoring, whereas 
P4s are much more efficient LL testers.
 
 P.S.  I'll comment on the M#39 news later. For now let's celebrate our
 grand accomplishment rather than worry about non-optimal press
 coverage.

Hear hear. Congratulations to the discoverer (whoever he/she is), to 
George on his program finding a fifth Mersenne Prime, and to 
everyone involved in the project, without whom we wouldn't have 
reached this milestone.


Regards
Brian Beesley



Re: Mersenne: Re: Factoring benefit/cost ratio

2001-12-01 Thread Gerry Snyder

Steve Harris wrote:
 
 Actually, Richard's statement that a 'Factored' status is better for GIMPS
 than a 'Two LL' status is not quite true. It's better for the mathematical
 community as a whole, but not for GIMPS. GIMPS is looking for primes, not
 factors, and without skipping over any. 

Hmmm,

I must be having a senior moment. I would swear George said that one way
a person could lose credit for a correct LL test is if later factoring
finds a factor.

Is my feeble brain making this up, or is finding a factor more important
than stated above?

Gerry
-- 
mailto:[EMAIL PROTECTED]
Gerry Snyder, AIS Director & Symposium Chair, Region 15 RVP
Member San Fernando Valley, Southern California Iris Societies
in warm, winterless Los Angeles--USDA 9b-ish, Sunset 18-19
my work: helping generate data for: http://galileo.jpl.nasa.gov/



Re: Mersenne: Re: Factoring benefit/cost ratio

2001-12-01 Thread George Woltman

Hi Gerry,

At 11:03 AM 12/1/2001 -0800, Gerry Snyder wrote:
> I must be having a senior moment. I would swear George said that one way
> a person could lose credit for a correct LL test is if later factoring
> finds a factor.

This is because my rather limited reporting software only adds up the
LL results in the verified and one-LL-tests databases.  Once an exponent
is factored it is removed from those databases.

> Is my feeble brain making this up, or is finding a factor more important
> than stated above?

I prefer a factor to a double-check.  But it is hard to quantify "prefer" in a
mathematical formula for computing trial factoring limits.  Prime95 uses
the formula:   cost_of_factoring must be less than chance_of_finding_a_factor
times 2.03 * the cost_of_an_LL_test.

This should maximize GIMPS throughput.  The 2.03 is because we must run
two (or more) LL tests to do a double-check.
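
In rough Python pseudocode (a paraphrase of the rule as stated above,
not Prime95's source; the example numbers are made up):

    # Trial factor one more bit level only while the expected saving beats the
    # cost; 2.03 is the two matching LL tests (plus a little re-run allowance)
    # that retiring the exponent would otherwise require.
    def should_factor_deeper(cost_of_factoring, chance_of_finding_a_factor,
                             cost_of_ll_test):
        return cost_of_factoring < chance_of_finding_a_factor * 2.03 * cost_of_ll_test

    print(should_factor_deeper(5.0, 1/60, 400.0))   # True: 5 < ~13.5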

-- George

P.S.  I'll comment on the M#39 news later.  For now let's celebrate our
grand accomplishment rather than worry about non-optimal press coverage.




Re: Mersenne: Re: Factoring benefit/cost ratio

2001-12-01 Thread Steve Harris

George did say that, and I was aware of his statement, but that still has no
effect on the point I was making.
George's GIMPS stats also give no credit at all for finding factors, but
that  doesn't mean he considers finding factors worthless.

Steve

-Original Message-
From: Gerry Snyder [EMAIL PROTECTED]
To: mer [EMAIL PROTECTED]
Date: Saturday, December 01, 2001 12:19 PM
Subject: Re: Mersenne: Re: Factoring benefit/cost ratio


[... snip ...]




Mersenne: Re: Factoring benefit/cost ratio

2001-11-30 Thread Steve Harris

Actually, Richard's statement that a 'Factored' status is better for GIMPS
than a 'Two LL' status is not quite true. It's better for the mathematical
community as a whole, but not for GIMPS. GIMPS is looking for primes, not
factors, and without skipping over any. This means all candidates must be
tested and the non-primes eliminated, and it doesn't matter whether they are
eliminated by 'factored' or by 'two matching nonzero LL residues'. It
matters  to those who are attempting to fully factor Mersenne numbers, but
that's a different project altogether, and one that is decades (at least)
behind GIMPS. The only reason we do any factoring at all is to reduce the
time spent on LL testing.

Besides, if you do manage to find a 75-digit factor of a 2-million-digit
Mersenne number, that still leaves a cofactor of nearly two million digits.
Really not much help :-)

Regards,
Steve Harris

-Original Message-
From: Daran [EMAIL PROTECTED]
To: [EMAIL PROTECTED] [EMAIL PROTECTED]
Date: Friday, November 30, 2001 2:00 AM
Subject: Re: Factoring benefit/cost ratio (was: Mersenne: Fw: The Mersenne
Newsletter, issue #18)


- Original Message -
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Saturday, November 24, 2001 6:49 PM
Subject: Factoring benefit/cost ratio (was: Mersenne: Fw: The Mersenne
Newsletter, issue #18)

 But ones factoring benefit calculation might [should would be in
 line with the popular theme of prescribing what's best for other
 GIMPS participants :)] include not only the time savings of
 eliminating the need for one or two L-L tests, but also the extra
 benefit of finding a specific factor.

I can see no way of objectively quantifying this benefit.

 In the GIMPS Search Status table at www.mersenne.org/status.htm the
 march of progress is from Status Unknown to Composite - One LL to
 Composite - Two LL to ... Composite - Factored.

More desirable - whether or not recorded on that page - would be
"Composite - Least (or greatest) factor known".  Most desirable (other
than "Prime") would be "Composite - Completely factored".

 This reflects the view (with which I agree) that it is more valuable
 to know a specific factor of a Mnumber than to know that a Mnumber is
 composite but not to know any specific factor of that Mnumber.

 So a Factored status is better for GIMPS than a Two LL status, but
 calculations of factoring benefit that consider only the savings of
 L-L test elimination are neglecting the difference between those two
 statuses.  If one consciously wants to neglect that difference ...
 well, okay ... but I prefer to see that explicitly acknowledged.

It seems to be implicitly acknowledged in the way the trial factoring
depths are determined.  If one places a non-zero value on a known factor,
then the utility of extra factoring work on untested, once tested, and
verified composites would be increased.  It would have to be set very high
indeed to make it worthwhile returning to verified composite Mersennes.

 Richard Woods

Daran G.

