Mersenne: P-1

1999-09-17 Thread Jukka Santala

Well, see the topic. In other words, is it possible to get a quick "P-1
factoring for dummies" rundown? At which point is P-1 factoring worth
the effort? Does it have any overlap with ECM? How should the bounds
be chosen for any given exponent, and will higher bounds always find any
factors smaller bounds would have, or is there an advantage to running
lower bounds? As I understand it, every run with the _same_ bounds would find
the same factors?

 -Donwulff




Re: Mersenne: Celerons vs. Pentium II/III at large FFT lengths?

1999-09-17 Thread St. Dee

On Fri, 17 Sep 1999, Brian J. Beesley wrote:
 On 16 Sep 99, at 18:35, Lucas Wiman wrote:
  
  This brings us to an interesting point.  Should the PrimeNet server start
  assigning Celerons 384K FFT Mersennes by default, and save the larger ones
  for PIIs/PIIIs?
 
 No. Whatever the problem was (I _did_ manage to duplicate it on my 
 Celeron 366 laptop using v18 - a 448K FFT was actually _slower_ than 
 a 512K FFT on the same system!) v19 does not suffer from it. And the 
 PrimeNet server doesn't recognise a Celeron unless v19 is running - 
 in v18, all P6 architecture chips are identified as Pentium Pros.

So I guess I should have my Celerys test larger exponents, at least until
I upgrade to v19... ;-)

Kel




Mersenne: v19 DNS(?) crash...

1999-09-17 Thread Jukka Santala

v19 is giving me trouble now. When I try to start it up, it says
"Contacting PrimeNet Server", hangs there for a while and then
crashes with an Application Error: "The exception unknown software exception
(0x06ba) occurred in the application at location 0x77e1fc45". I don't
need to go through the debugger to guess this is because the university
network I'm using seems to be cut off from the USA at the moment, i.e. DNS
queries, among other things, fail.

 -Donwulff




Mersenne: Timing(?) errors

1999-09-17 Thread Steinar H. Gunderson

All,

I'm running Prime95 v18 on a Dell XPS P60 (no, I don't want to hear `switch
to factoring' or anything -- it's running double-checks). When activity
happens (in this case a Word document being opened and looked at), sometimes
the log shows stuff like:

[lots of 0.814 sec iteration times]
Iteration: 3077000 / 3644xxx. Clocks: [48.8 million] = 0.814 sec.
Iteration: 3078000 / 3644xxx. Clocks: [56.1 million] = 0.936 sec. -- Word
Iteration: 3079000 / 3644xxx. Clocks: [46.7 million] = 0.779 sec.
Iteration: 3080000 / 3644xxx. Clocks: [47.0 million] = 0.784 sec.
Iteration: 3081000 / 3644xxx. Clocks: [46.7 million] = 0.779 sec.

Anybody have an idea why it's actually faster now?
My best guess would be some kind of timing mistake, but that still doesn't
sound right...

(The computer has been left totally untouched after the Word usage. The Word
usage shows itself in the 0.936 timing. The machine should have enough mem --
40 MB. Running Win95, no evil CPU-hogging programs.)

/* Steinar */
-- 
Homepage: http://members.xoom.com/sneeze/



Mersenne: G4: real or hype?

1999-09-17 Thread EWMAYER

Jonathan Zylstra wrote about the new Apple G4 processor (see quotes below).
The G4 is really a hybrid combination of the basic PowerPC CPU with a 128-bit
vector unit called the AltiVec.  Richard Crandall (who consults for Apple)
and Jason Klivington (of Apple) used a beta version of the G4 to do some of
the all-integer verification of our (Crandall, Mayer, Papadopoulos) F24
(24th Fermat number) project. Note that we didn't get a chance to test the
floating-point capabilities of the G4 - both main "wavefront" runs of the
Pépin test of F24 were on other hardware (a 250MHz MIPS R10000 and a
167MHz SPARC Ultra-1). Both achieved a length-1-million FFT-based squaring
time in under 1 second, whereas the all-integer version needed about 5 times
as long on a 400 MHz G4 - one can't conclude anything from that, since it's
like comparing apples and oranges.

Based on the technical specs and the relative all-integer timings on the
G4 vs. the Pentium (with similar amounts of code optimization, the G4 looks
to be somewhat faster than a Pentium, but by less than a factor of 2), it
looks like a good processor, but all the ballyhoo needs to be put into
perspective.

Before I continue, a disclaimer: I neither work nor consult for any computer
manufacturer. My taste in processors is simple: the faster, the better.

"sustained performance of 1 gigaflop." (which makes the G4 a 
'supercomputer' )
"theoretical performance of 4 gigaflops"
"It is a 128 bit processor, and can perform 4 ( sometimes 8 ) 32bit =
floating pt. calculations per cycle."
"It is 3 times faster then the PIII 600Mhz"

While all of this may be technically true, it's also very misleading:

- The 1 Gflop figure comes from the fact that the G4 can in theory dispatch 2
floating-point operations (1 mul, 1 add) per cycle, so at 500 MHz that equals
1 Gflop. I defy you to get that kind of performance out of, say, code for a
large FFT - with very careful coding, you may get half that.

The above FP capabilities are not qualitatively different from those of most
other current high-end processors (Pentium, SPARC, MIPS, Alpha) and are
slightly less than the AMD K7's, which can dispatch up to 3 FP ops per cycle
(2 adds, 1 mul).
Thus, by the same reasoning, AMD could legitimately claim 2 Gflops for a
667 MHz K7. Never mind that one will only see that kind of performance for
perfectly balanced, perfectly pipelined code whose data never leave the
FP registers.

- 128-bit: also misleading. The AltiVec vector unit can do some fairly
nice operations in which a 128-bit integer operand is treated as a vector
of 4, 8, or 16 operands of 32, 16, and 8 bits, respectively, and can do
a nice variety of 4x32-bit vector FP operations, but in my opinion that
is far from constituting a true 128-bit CPU. The above enhancements are
qualitatively similar to the Pentium MMX enhancements - useful for things
like multimedia, but nearly useless when one is doing serious math with
64-bit operands. I say "nearly" since one can, e.g., build various 64-bit
integer operations out of ones on shorter operands, but it's a pain, say,
if one wants a 64x64==128-bit integer multiply, which is potentially
more useful for LL testing than parallel 16x16==32-bit multiplies. The
AltiVec 4x32-bit floats, like the similar MMX instructions, are not very
useful for FFT-based large-integer arithmetic - not enough precision.

- "It is 3 times faster then the PIII 600Mhz." Based on what? Give us some
SPECint or SPECfp figures that support this before making such claims. The
only datum provided (below) indicates a speedup of 1.45x, far less than 3x.

On the comparison table between the PIII and G4, they show this:

Test          PIII Clock Cycles   G4 Clock Cycles   G4 Performance (Adjusted for MHz)
256 Pt. FFT   6.94                4                 1.74x better than PIII (1.45x faster than PIII)

In what way is a tiny 256-pt FFT a good indicator of overall system
performance?
Was this single precision (I suspect so) or double? Was it specially coded
to use the 4x32-bit FP ops supported by the G4? (I suspect so.) If so, was
similar coding attempted to use the PIII MMX instructions? (I suspect not.)
What numbers emerge when the figures are adjusted not just for MHz, but also
for price?

I've also seen some of Apple's ads in the San Jose Mercury News - the phrase
"The fastest desktop computer on earth" was used. Apparently this depends on
one's definition of both "fastest" and of "desktop computer" (perhaps even
of "Earth." :) I can buy a desktop Alpha 21264 which probably blows the G4
out of the water, performance-wise (and there, we HAVE some SPEC numbers to
guide us) on 95% of generic compiled code, say in Numerical Recipes or the
SPEC benchmark suites. So again, folks at Apple, please back your
claims up with performance data based on real-world code, the kind most
programmers really write, that tests more of the instruction set (not just
the special goodies you included).

Re: Mersenne: P-1

1999-09-17 Thread Chris Nash

Hi there

 Well, see the topic. In other words, is it possible to get a quick "P-1
 factoring for dummies" rundown?

Sure, let's give it a try. Suppose we do some calculations modulo some
number N. In effect we're actually doing all the calculations modulo all the
prime factors of N, but of course we don't know what those prime factors
are. But if we could find a calculation for which one of the prime factors
would 'disappear', it would help us find that factor.

Fermat's little theorem tells us that a^(p-1)=1 mod p, for all primes p not
dividing a. In fact, for each a there is a smallest number x > 0 such that
a^x=1 mod p (the order of a modulo p). All we usually know about x is that it
divides p-1. The idea behind P-1 factoring is this: if we compute

R=a^(some big very composite number)-1 mod N

then if our 'big composite number' is divisible by x, R will be
divisible by that unknown factor p of N, and we can find p by computing
gcd(R,N).

Usually the big composite number is a product of powers of many small
primes, so it has many factors and there is a good chance the unknown x
(which is probably p-1) is a factor of it.
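
To make that concrete, here is a rough stage-1 sketch in Python. The base,
the bound and the example number are just illustrative choices of mine -
this shows the idea, not what Prime95 actually does:

from math import gcd

def small_primes(limit):
    # Simple sieve of Eratosthenes.
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(sieve[i * i :: i]))
    return [i for i, is_prime in enumerate(sieve) if is_prime]

def pminus1_stage1(N, B1, a=3):
    # Raise a to a product of all prime powers q^e <= B1 (the "big, very
    # composite number"), then look for a factor with a single gcd.
    x = a
    for q in small_primes(B1):
        e = 1
        while q ** (e + 1) <= B1:
            e += 1
        x = pow(x, q ** e, N)
    g = gcd(x - 1, N)
    # g == 1: the bound was too small; g == N: every factor was caught at
    # once (back off the bound); otherwise g is a proper factor of N.
    return g if 1 < g < N else None

# 2^37 - 1 = 223 * 616318177.  223 - 1 = 2*3*37 is 40-smooth, so a tiny
# bound should already recover 223; the cofactor's q-1 = 2^5*3*37*167*1039
# is out of reach of B1 = 40.
print(pminus1_stage1(2**37 - 1, B1=40))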

 At which point is P-1 factoring worth the effort?

Probably as soon as your trial-factoring attempts have exceeded what the
machine is comfortable with. If you've reached 2^32, try P-1 for a little
while. Trial-factoring would be slower if you carried on trying factors, and
your probability of success with trial-factoring large numbers is extremely
low (p divides a random N with probability 1/p).

Of course P-1 may fail. You may have to go a very long way before the
unknown x divides your big composite number - what if x itself has a large
prime factor? P-1 would not find it until your P-1 loop reached that prime
(in other words, your limit has to be bigger than the biggest prime factor
of the unknown x). However, there are factors that P-1 finds 'easily', and
even a failed P-1 run tells you a little more about the number, which might
help if you try some more trial-factoring (you know, for instance, that for
any remaining factor p, p-1 must have some prime, or prime power, factor
above the limit).

 Does it have any overlap with ECM?

The theory's very similar. Both algorithms attempt to find a point where one
of the unknown prime factors 'disappears' from the calculation. However ECM
has better odds. In P-1 attempts, the unknown x is fixed. You can't do
anything about it, and even if you try using a different base a, you're very
likely going to have a similar x with the same problems (a large factor). In
ECM, choosing a different curve could give an equivalent 'x' value that
varies a lot. One of those 'x' values may disappear quite quickly from the
calculations. (but again, with large values it could be a long time before
that happens).

Of course the steps involved in ECM factoring are a little more complex than
P-1...
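
A toy way to see the difference: over a small prime, count the points on a
few curves by brute force. The group order plays the same role as the fixed
p-1 does in P-1 factoring, but it changes with every curve. The prime 1009
and the curve coefficients below are arbitrary picks, purely for
illustration:

# Brute-force point count of y^2 = x^3 + a*x + b over F_p.
p = 1009

def curve_order(a, b):
    squares = {(y * y) % p for y in range(p)}
    count = 1                      # the point at infinity
    for x in range(p):
        rhs = (x * x * x + a * x + b) % p
        if rhs == 0:
            count += 1             # a single point with y = 0
        elif rhs in squares:
            count += 2             # two points, (x, y) and (x, -y)
    return count

for a, b in [(1, 1), (2, 3), (5, 7), (11, 13)]:
    print(a, b, curve_order(a, b))   # orders scatter around p+1

Brute-force counting like this only works for tiny p, of course - the point
is just to show how much the order moves around from curve to curve, while
P-1 gets exactly one shot at the fixed p-1.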

 How should the bounds
 be chosen for any given exponent, and will higher bounds always find any
 factors smaller bounds would have, or is there an advantage to running
 lower bounds? As I understand it, every run with the _same_ bounds would find
 the same factors?

In theory you can change the base and the bounds. Changing the base often
has little or no effect, unless you are incredibly lucky (of course,
'obvious' bases related to the number are likely to be of little use - don't
use base a=2 for Mersennes!). Changing the bounds, though, can make the
difference between finding a factor and not finding one. We may fail when
we test

a^(some small composite number)-1

but we may succeed when we test

a^((some small composite number)*(some larger composite number))-1

By writing it like this you can also see that the larger bound succeeds
whenever the smaller bound does (the top line always divides the bottom line,
so any factors of the top also appear in the bottom). Of course larger
bounds take longer to calculate, and there is also a possibility that larger
bounds would find "more than one" factor in one run. Ideally you would check
periodically through the calculation to see if you have already met a
factor, but that might take time.
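
A quick Python illustration of that divisibility, reusing the 2^37-1 example
from the earlier sketch; the exponents m and M below are arbitrary
illustrative choices:

from math import gcd

a = 3
m = 2**2 * 3              # exponent from a smaller bound
M = m * 5 * 37            # exponent from a larger bound; m divides M

# "the top line always divides the bottom line"
assert (a**M - 1) % (a**m - 1) == 0

# ...so for any N, whatever gcd(a^m - 1, N) finds, gcd(a^M - 1, N) finds too.
N = 2**37 - 1
g_small = gcd(pow(a, m, N) - 1, N)
g_large = gcd(pow(a, M, N) - 1, N)
assert g_large % g_small == 0
# Here 223 - 1 = 2*3*37 divides M but not m, so the larger exponent should
# pick up the factor 223 while the smaller one comes up empty.
print(g_small, g_large)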

The overriding "decision factor" is based purely on the time you're willing
to spend. Factoring being explicitly harder than primality testing, you
might be happy with, say, spending 10 times as long searching for a factor
as you would on a proof the number was composite. So you might find "some
very big composite number" 10 times the bit length of N was acceptable. 10
times the bit length of N is a good ballpark estimate for the bounds setting
for the P-1 test to get that sort of time required.

Of course if you were willing to spend 100, 1000 times as long, you could
set the bounds higher... but in that case, bear in mind that the P-1 test is
often unsuccessful. If you have that much time to spend, you might prefer to
dedicate it to a more sophisticated algorithm. Just like trial-factoring,
you have to increase the bounds *A LOT* before your chances of 

Re: Mersenne: v19 DNS(?) crash...

1999-09-17 Thread Greg Hewgill

On Fri, Sep 17, 1999 at 07:57:06PM +0300, Jukka Santala wrote:
 "Contacting PrimeNet Server", hangs up there for a while and then
 crashes with Application Error "The exception unknown software exception
 (0x06ba) occured in the application at location 0x77e1fc45".

You might have better luck using HTTP communications rather than RPC (the error
0x6ba is a Win32 RPC error). The HTTP method is the preferred method anyway,
from what I've read. It's more robust and less likely to crash.

Greg Hewgill



Mersenne Digest V1 #627

1999-09-17 Thread Mersenne Digest


Mersenne Digest   Friday, September 17 1999   Volume 01 : Number 627




--

Date: Thu, 16 Sep 1999 11:47:48 +1200 (NZST)
From: Bill Rea [EMAIL PROTECTED]
Subject: Re: Mersenne: SPARC times

Laurent,

 From [EMAIL PROTECTED] Wed Sep 15 20:37:35 1999
 
 Bill Rea wrote:
  
  This is using MacLucasUNIX compiled with the Sun workshop compilers.
 
Which version and which flags did you use?  I guess you ran
 your tests under Solaris 7, right?

The MacLucasUNIX was v6.20. The Ultra 5 and 10 were on Solaris 7; both
E450s were on Solaris 2.6. I tried both the Workshop 4 and Workshop 5
compilers. The options in the makefile which gave the fastest
times on the tests were:

OPT=-fast -xO4 -xtarget=ultra -xarch=v8plusa -xsafe=mem -xdepend \
-xparallel -xchip=ultra

That's very strange!  I have benchmarked some code using both
 flags (with -fast prepended) under Solaris 7 (which is required
 to run code compiled with -xarch=v9) and v9 helped; however the
 code was purely 64-bit integer.

There's also a very interesting flag to test that's not
 documented in the Sun cc docs:  -xinline=all.  I used it by accident but
 it did a great job with the code I was working on.

I'll try this option and also put the -xarch=v9 to a test on a 
new exponent when the next one finishes. The implication in the
WS4 documentation is that the -xinline=all is automatically used 
with -xO4, but it's worth trying anyway.

I don't know the tests supplied, but the difference might result
 from the way time is counted.  I think the best way to check the
 speed of a code is to use getrusage for the process only
 (RUSAGE_SELF) and to take into account only the user time.  This
 way I get very consistent timings for the aforementioned code
 (BTW, the code is ecdl by Robert Harley, used to crack ECC).

The time command reports clock time, user time, and system time. I was
comparing user time. I talked very briefly to a Sun engineer and he
said it probably had more to do with keeping the processor fed.
Sometimes optimizations in the code don't translate into faster speeds
because the limiting factor is memory access time.  CPUs with
bigger caches perform better than you would expect given their clock
speeds. However, he was just as surprised as I was that the v9 option
didn't produce faster code.
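
For anyone who wants to try the getrusage(RUSAGE_SELF) approach Laurent
describes above, here is a minimal sketch (in Python rather than C, with a
stand-in workload - purely illustrative):

from resource import getrusage, RUSAGE_SELF

def user_time():
    # CPU time this process has spent in user mode; system time and time
    # consumed by other processes are excluded, so the figure stays
    # consistent even on a loaded machine.
    return getrusage(RUSAGE_SELF).ru_utime

before = user_time()
total = sum(i * i for i in range(10**6))    # stand-in for the real workload
print("user time: %.3f s" % (user_time() - before))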

Bill Rea, Information Technology Services, University of Canterbury  \_ 
E-Mail b dot rea at its dot canterbury dot ac dot nz /   New 
Phone   64-3-364-2331, Fax 64-3-364-2332/)  Zealand 
Unix Systems Administrator (/' 

--

Date: Wed, 15 Sep 1999 20:13:02 -0500
From: Ken Kriesel [EMAIL PROTECTED]
Subject: Re: Mersenne: Iters between screen outputs

At 10:29 PM 1999/09/15 +0200, "Shot" [EMAIL PROTECTED] wrote:
Hi all,

I was wondering why the "Iterations between screen outputs" setting
was set to 100 by default, and I thought it was because each screen
output takes precious CPU time.

But when I changed it from 100 i/o (usually around 0.430 sec/iter) to 
10 i/o, nothing really slowed down - the time was still between 0.430 
and 0.431 s/i.

So I boldly went where no man has gone before ;) and changed it to 1 
i/o... and the times still stayed between 0.429 and 0.431 s/i.

The question is, how much of the CPU's power is consumed by screen 
outputs?

Hardly anything these days, with fast Pentiums and smart video cards.
But on 486s and 386s it was significant.  I have systems where it's
set to 5 iterations, and systems where it's set at 1000 iterations.
It all depends on system speed, exponent size, and desired update frequency.

Ken

Ken Kriesel, PE [EMAIL PROTECTED]

--

Date: Wed, 15 Sep 1999 22:23:58 -0400
From: Jeff Woods [EMAIL PROTECTED]
Subject: Re: Mersenne: Iters between screen outputs

The actual ITERATION takes the same time, no matter how long the screen output
takes.  The report on the screen is NOT time for the iteration PLUS time to
display the report... it's time for the iteration WITHOUT the screen display.

At 10:29 PM 9/15/99 +0200, you wrote:
Hi all,

I was wondering why the "Iterations between screen outputs" setting
was set to 100 by default, and I thought it was because each screen
output takes precious CPU time.

But when I changed it from 100 i/o (usually around 0.430 sec/iter) to
10 i/o, nothing really slowed down - the time was still between 0.430
and 0.431 s/i.

So I boldly went where no man has gone before ;) and changed it to 1

Re: Mersenne: P-1

1999-09-17 Thread Lucas Wiman

 But as everyone on this list knows, any factor of a Mersenne
 number looks like 2*k*p+1 for N=2^p-1.  Plugging this into
 the above equation gives
 q=2*k*p+1
 q-1=2*k*p

Correct.

 Doesn't this mean the lower bound on a "p-1" method search should
 be greater than the Mersenne exponent (the p in 2^p-1) to have the best
 chance of success?  Then the "upper bound" of the "p-1" search can
 be reserved for cracking a big factor in the "k" of a Mersenne factor.

No.  Simply multiply the exponent on the base by p.  This produces the 
desired result, without having to go to the extra effort of extending the
bound that far.  
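
A self-contained Python sketch of that trick; the base, the tiny bound and
the p=83 example are just illustrative picks of mine, not GIMPS settings:

from math import gcd

def smooth_power(x, B1, N):
    # Raise x to every prime power q^e <= B1, working modulo N.
    for q in range(2, B1 + 1):
        if all(q % d for d in range(2, int(q ** 0.5) + 1)):   # q is prime
            e = 1
            while q ** (e + 1) <= B1:
                e += 1
            x = pow(x, q ** e, N)
    return x

def pminus1_mersenne(p, B1, a=3):
    # Every factor q of 2^p - 1 has q - 1 = 2*k*p, so fold 2*p into the
    # powering exponent once; B1 then only has to cover the k part.
    N = 2**p - 1
    x = pow(a, 2 * p, N)
    x = smooth_power(x, B1, N)
    g = gcd(x - 1, N)
    return g if 1 < g < N else None

# 167 = 2*1*83 + 1 divides 2^83 - 1 and its k = 1 is trivially smooth, so
# even B1 = 10 should recover it once p = 83 is folded into the exponent.
print(pminus1_mersenne(83, 10))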

I probably should add a section on P-1 to the FAQ.

-Lucas



Re: Mersenne: G4: real or hype?

1999-09-17 Thread Petri Holopainen

[EMAIL PROTECTED] wrote:
 
 - "It is 3 times faster then the PIII 600Mhz." Based on what? Give us some
 SPECint or SPECfp figures that support this before making such claims. The
 only datum provided (below) indicates a speedup of 1.45x, far less than 3x.
 

Apple is really blowing this G4 hype out of proportion.
It is a really nice chip, but looking at the SPEC numbers:

http://www.mot.com/SPS/PowerPC/products/semiconductor/cpu/7400.html

... they show that it's basically a G3 + Altivec and a nice FPU:

CPU              SPECint95   SPECfp95
---              ---------   --------
G3-450           21.4        13.8
G4-450           21.4        20.4
P3-550           22.3        15.1
P3-600           24.0        15.9
Athlon-550       25.1        20.6
Athlon-650       29.4        22.4
Alpha EV67-667   37.5        65.5

So this "128-bit 1 Gigaflops supercomputer" is just a load of
Apple marketing propaganda (yeah, Intel does this crap too, remember
"more vibrant colors" with MMX and "enhanced Internet experience"
with SSE...).

-- Petri H.