RE: Mersenne: ECM question

1999-05-07 Thread Paul Leyland

 The function being minimized, namely
 
 probability of finding a 50-digit factor on one curve
 -----------------------------------------------------
                  time per curve
 
 is flat near its minimum.  Implementation and platform differences 
 can obviously affect the denominator (time per curve).
 The stage-2 strategy affects the numerator.  The two optimal B1's
 are close enough to be considered the same.


Umm.   I think you want to maximize the probability.

Minimizing it is easy.
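For anyone following along, here is a minimal sketch of the quantity under
discussion: pick the B1 whose curves deliver the most probability per unit
of time.  The function and argument names are illustrative only, not taken
from any ECM implementation.

def best_b1(prob_per_curve, seconds_per_curve):
    """Return the candidate B1 that maximizes the ratio quoted above:
    (probability of finding a 50-digit factor on one curve) / (time per curve).

    Both arguments are dicts keyed by candidate B1: the probabilities come
    from ECM theory, the timings from runs on your own hardware.
    """
    return max(prob_per_curve,
               key=lambda b1: prob_per_curve[b1] / seconds_per_curve[b1])

Because the ratio is flat near its optimum, nearby B1 values score almost
identically, which is why the two "optimal" B1's in the quoted text can be
treated as the same.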


Paul




Mersenne: Any statistics majors out there?

1999-05-07 Thread George Woltman

Hi all,

I'm working on version 19 of prime95 and I need your help.
In the past, the exponents at which a larger FFT is used were picked
rather haphazardly.  I simply picked a few exponents trying to find
one that could run a thousand or so iterations without the convolution
error greatly exceeding 0.3.

For this release, I thought it would be nice to use a more
scientific approach.  What I've done is run almost 1 iterations
of an exponent and used Excel to graph the number of iterations with 
an error between 0.200 and 0.201, 0.201 and 0.202, etc.  As you can imagine,
you get a Bell-like curve that peaks at some value.

It has been 25 years since I took statistics courses in college, so I am
at a loss as to how to use this graph to predict the
probability that N iterations will have an error below M.  I could
then use such information to find the exponent such that the chance 
of a convolution error exceeding 0.5 is less than 1 in 10^40 (or some
other criterion that we agree is "safe").
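Not a statistician's answer, but here is one rough way to turn such a run
into a tail estimate, assuming the per-iteration errors really are close to
Gaussian (a big assumption, as the follow-up below points out).  The file
name, bin width, and iteration count N are illustrative placeholders, not
prime95 internals.

import numpy as np
from scipy.stats import norm

# One recorded convolution error per iteration (illustrative file name --
# load the per-iteration errors from wherever the test run logged them).
errors = np.loadtxt("conv_errors.txt")

# The same 0.001-wide bins as in the spreadsheets (0.200-0.201, 0.201-0.202,
# ...); `counts` is what the Excel graphs plot.
counts, edges = np.histogram(errors, bins=np.arange(0.0, 0.501, 0.001))

# Fit a Gaussian to the raw errors and estimate the chance that a single
# iteration exceeds 0.5.
mu, sigma = errors.mean(), errors.std(ddof=1)
p_single = norm.sf(0.5, loc=mu, scale=sigma)

# Chance that at least one of N iterations exceeds 0.5 (independence assumed).
N = 10_000_000                      # illustrative; roughly one LL test's worth
p_any = -np.expm1(N * np.log1p(-p_single))

print(f"mu={mu:.4f}  sigma={sigma:.4f}  p_single={p_single:.3e}  p_any={p_any:.3e}")

When p_single is tiny, p_any is essentially N * p_single; the catch, as the
reply below explains, is whether the real distribution stays Gaussian that
far out in the tail.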

Rather than mail these largish spreadsheets to the entire list,
two sample graphs are at ftp://entropia.com/gimps/x1345000.xls and 
ftp://entropia.com/gimps/x136.xls (in the graphs, a 400 on the X-axis
is really a convolution error of 0.400, and the Y-axis is a count of iterations).

Any help is appreciated.

Best regards,
George 

P.S.  A brief explanation of convolution errors:  Since the FFT in prime95
is done in floating point arithmetic, round-off errors build up.  After the
FFT, every value is rounded back to an integer.  The convolution error is
the largest amount that a result value differs from the correct integer
value.  If the convolution error exceeds 0.5, then the round-back-to-integer
step will round to the wrong integer value - a catastrophic failure.

Another way to look at this:  Each FFT value contains about 20 bits of data.
That is, values ranging from -2^19 to 2^19.  Consider an FFT of 128K elements.
If each FFT value were close to the 2^19 maximum, then each result value would
need 19+19+17 = 55 bits to hold the result (the extra 17 bits come from summing
2^17 = 128K products).  Since a double-precision float only holds 53 mantissa
bits, there would be a loss of information.  Fortunately, probability assures us
that it is extremely unlikely that every (or even most) FFT values will
be close to the 2^19 maximum value.
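A quick back-of-the-envelope check of that bit count (just a sketch; the
128K size and 19-bit magnitude are the figures from the paragraph above,
not values read out of prime95):

import math

fft_size = 128 * 1024      # 128K elements, i.e. 2^17
value_bits = 19            # each element is roughly in the range -2^19 .. 2^19

# Worst case: every product is 2^19 * 2^19 = 2^38, and the convolution sums
# 2^17 of them, so a result could need 19 + 19 + 17 = 55 bits.
worst_case_bits = value_bits + value_bits + int(math.log2(fft_size))
mantissa_bits = 53         # significand of an IEEE-754 double

print(worst_case_bits)                     # 55
print(worst_case_bits > mantissa_bits)     # True: the worst case would lose bits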







Re: Mersenne: Any statistics majors out there?

1999-05-07 Thread Todd Sauke

George,

You indicate that the error distribution looks "like a Bell curve".
There is a reasonable theoretical basis for the errors to follow a Bell
curve: the sum of many random pluses and minuses combines, as in the
classic "random walk" problem, to give a Gaussian probability distribution.
The probabilities for various outcomes of Gaussian-distributed events are
given by the "error function", which is defined roughly as the integral of
a Gaussian probability distribution from its center out to some desired
point (like 0.5), expressed in terms of the width of the distribution
(you've got to worry about the pesky square-root-2s).  The probability of
landing outside 1 sigma is ~0.3; outside 2 sigma, ~0.05; outside 3 sigma,
~0.003; outside 4 sigma, ~6e-5; outside 5 sigma, ~6e-7; outside 6 sigma,
~2e-9; outside 7 sigma, ~3e-12; . . . outside 14 sigma, ~2e-44.  This
function is tabulated in reference books, but usually not out to within
10^-40 of the terminal value.  I suppose there are good approximations
applicable out there, though.  So outside about 14 sigmas you should be
able to say the probability is below 10^-40.

The problem is that if there are small deviations from "Gaussian-ness"
way out on the wings of your distribution, the REAL probability of a
certain result is not well approximated by the error-function result.
Even though there are reasonable theoretical considerations to indicate
that your error distribution "should" be strictly Gaussian, any "weird",
low-probability effects that cause small deviations from "Gaussian-ness"
will make the "computed" probability of, say, 10^-40 wrong and give a
false sense of security.  This is why people usually do what you
instinctively did before and go with a "gut feeling" about how many
"sigmas out on the distribution" to go for safety.  Without a firm idea
of what the "real" distribution is, or unassailable theoretical conviction
that the errors "must" be Gaussian, a probability calculation out to
10^-40 just gives false security.

That said, you really don't need to go out to 10^-40 anyway.  The
probability of having a roundoff error greater than 0.5 over an entire
LL test only needs to be small compared to other typical error risks,
like gamma rays, excessive overclocking, or whatever effects cause the
~1% errors revealed by double-checking.  You probably want the chance of
a per-iteration error greater than 0.5 to be much less than about 10^-10,
say.  So if 0.5 is outside about 6 or 7 sigmas of your error distribution,
you would be OK, ASSUMING the distribution is nearly Gaussian.  You might
look at the integral of the error histograms you get to see whether they
deviate subtly from the error function.  I was unable to load the raw
data you referenced; I'm using the OLD Excel 4.0.  If you want to send me
some raw data readable by Excel 4.0, I would be happy to look at it and
see how faithfully it follows a Gaussian.
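For reference, those tail probabilities are easy to reproduce numerically;
a quick sketch using scipy's complementary error function (the sigma values
are just the ones quoted above):

from math import sqrt
from scipy.special import erfc

# Two-sided Gaussian tail: P(|x - mu| > k * sigma) = erfc(k / sqrt(2)).
for k in (1, 2, 3, 4, 5, 6, 7, 14):
    print(f"{k:2d} sigma: {erfc(k / sqrt(2)):.1e}")

# Prints roughly 3.2e-01, 4.6e-02, 2.7e-03, 6.3e-05, 5.7e-07, 2.0e-09,
# 2.6e-12, and about 1.6e-44 at 14 sigma -- i.e. the Gaussian tail does not
# drop below 10^-40 until roughly 14 sigmas out, as described above.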

Sincerely,

Todd Sauke
