Re: Mersenne: Java Prime and lotsa questions

1999-05-04 Thread Brian J Beesley

 b1) Using "properly sized" FFT: Yes, I believe using properly sized FFT's
 saves some time. How much depends on the programmer, the data, the
 language/compiler, and the hardware. Non-power-of-2 FFT's can even be
 useful, but it seems that coding up all (or most) of the possible sizes may
 be more trouble than it's worth.

Well, George went to the trouble of adding 3*2^n, 5*2^n and 7*2^n 
run-length code to Prime95 - even though it's less efficient than 
plain powers of two, it's still well worthwhile for the exponents 
that can benefit from it.

I'd be _delighted_ if someone took it on themselves to add code for 
intermediate run lengths into MacLucas/MacLucasUNIX - even just 
3*2^n would be very worthwhile. If I had time, I'd do it myself!
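As a toy illustration of the idea (this is not code from Prime95 or 
MacLucas, and both the size table and the ~19 bits-per-word figure are 
my own rough assumptions, not the careful limits a real client 
computes), picking among k*2^n run lengths can be as simple as:

```java
import java.util.TreeSet;

public class RunLength {
    // Candidate run lengths, Prime95-style: k * 2^n for k in {1, 3, 5, 7}.
    // bitsPerWord is an assumed rule-of-thumb limit for double-precision
    // FFT words; real clients derive this bound much more carefully.
    static long pickLength(int exponent, double bitsPerWord) {
        TreeSet<Long> sizes = new TreeSet<>();
        for (int k : new int[]{1, 3, 5, 7})
            for (int n = 10; n <= 26; n++)
                sizes.add((long) k << n);
        long needed = (long) Math.ceil(exponent / bitsPerWord);
        return sizes.ceiling(needed); // smallest supported length >= needed
    }

    public static void main(String[] args) {
        // Exponent 6972593 needs ~367k words at 19 bits/word, so the
        // 3*2^17 = 393216 run length fits where 2^19 = 524288 would
        // waste a lot of work on zero-padding.
        System.out.println(pickLength(6972593, 19.0));
    }
}
```

The point of the intermediate sizes is visible in the example: without 
3*2^n you'd be forced all the way up to the next power of two.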
 
 b2) Recurring pattern in "N=N^2-2 % (2^P-1)": I don't recall anyone here
 mentioning any interesting repeats, even for composite P. I'm sure many have
 looked, and were disappointed.

If you get to residual 0, then you do get a most interesting repeat - 
the next iteration is -2 (since 0^2 - 2 = -2) and all the subsequent 
ones are +2 (since (-2)^2 - 2 = 2 and 2^2 - 2 = 2).

This proves that 0 cannot occur as an "interim" residual (at any 
iteration prior to the final, (p-2)th one) if 2^p-1 is going to turn 
out to be prime.

Although testing for "early zero" would be very quick, it's 
nevertheless probably not worth doing. We see no res64=2 values 
in the completed results for composite exponents.

Since there are "only" 2^p-1 available values that the residual can 
take, if you iterate long enough, the residual is _certain_ to go into 
a repeating loop, eventually. However, the chance of this 
happening in the first p-2 iterations (the length of an LL test) 
would seem to be vanishingly small. If it _does_ start to repeat 
"early" then 2^p-1 _must_ be composite - because it can't get to zero 
without sticking at 2, see above - but is it really worth adding the 
extra code to check?
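The 0 -> -2 -> 2 -> 2 -> ... behaviour is easy to see with a toy LL 
test over BigInteger (a sketch for illustration only - a real client 
obviously does its squarings via FFT, not like this):

```java
import java.math.BigInteger;

public class LLDemo {
    static final BigInteger TWO = BigInteger.valueOf(2);

    // One Lucas-Lehmer step: s -> s^2 - 2 (mod 2^p - 1)
    static BigInteger step(BigInteger s, BigInteger m) {
        return s.multiply(s).subtract(TWO).mod(m);
    }

    public static void main(String[] args) {
        int p = 5;                       // 2^5 - 1 = 31, prime
        BigInteger m = BigInteger.ONE.shiftLeft(p).subtract(BigInteger.ONE);
        BigInteger s = BigInteger.valueOf(4);
        for (int i = 0; i < p - 2; i++)  // 4 -> 14 -> 8 -> 0
            s = step(s, m);
        System.out.println(s);           // 0: the LL test says 31 is prime

        // Keep squaring past the end of the test:
        BigInteger t = step(s, m);
        System.out.println(t);           // 29, i.e. -2 mod 31
        t = step(t, m);
        System.out.println(t);           // 2
        t = step(t, m);
        System.out.println(t);           // 2 - stuck here forever
    }
}
```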
 
 c2) The LL test does not adapt well to parallel efforts. I think it will
 always be the case that interprocess communication will cause the
 calculations to run slower than if just one processor crunched on the whole
 thing. [Array or vector processor machines are another matter!] But if the
 number of significant bits required were orders of magnitude larger than the
 hardware, then it might make sense to parallel the squarings. We're not
 there yet.

This seems self-evident; however, it is certainly true that there are 
efficient ways to code an FFT for a vector processor.
 
 d) NTT's (Number Theoretic Transforms): There has been some talk of that
 here before. However, until the ratio of integer to floating point
 throughput improves, NTT's will not be very popular. [Note that the FFT
 "automagically" computes the %2^P-1, using the Crandall-Fagin procedure, so
 NTT's have a lot of ground to make up first.]
 
If you have a processor like StrongARM, which has efficient integer 
hardware but no floating-point hardware at all - forcing you to resort 
to software emulation for floating-point - then NTT starts to make a 
lot of sense. You get _some_ ground made up automatically 
because you have no rounding errors, so you can use a smaller 
transform size than experience with real/complex FFT would 
suggest. I'm still "rather hazy" (read, totally baffled) on exactly how 
the Crandall-Fagin procedure works, so I don't know if you could 
adapt NTT to do the mod 2^p-1 automatically. But my gut feeling is 
that it ought to be possible.
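One piece works the same for NTT as for anything else: reducing mod 
2^p-1 never needs a division, because 2^p == 1 (mod 2^p-1) lets you 
fold the high bits back onto the low ones. To be clear, this is *not* 
Crandall-Fagin (which buries the reduction inside the transform 
weights); it's just the plain shift-and-add trick, sketched here with 
BigInteger for readability:

```java
import java.math.BigInteger;

public class MersenneMod {
    // Reduce a nonnegative x mod 2^p - 1 without division: since
    // 2^p == 1 (mod 2^p - 1), the bits above position p can simply be
    // added back onto the low p bits, repeated until x fits in p bits.
    static BigInteger mod(BigInteger x, int p) {
        BigInteger m = BigInteger.ONE.shiftLeft(p).subtract(BigInteger.ONE);
        while (x.bitLength() > p)
            x = x.and(m).add(x.shiftRight(p));
        if (x.compareTo(m) >= 0)     // x == m means residue 0
            x = x.subtract(m);
        return x;
    }
}
```

In a real integer implementation the same fold is done word-by-word on 
the raw limbs, so the whole reduction costs about one extra pass over 
the number.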


Regards
Brian Beesley

Unsubscribe  list info -- http://www.scruz.net/~luke/signup.htm



Re: Mersenne: Update on using Prime95 as a windows shell

1999-05-04 Thread Foghorn Leghorn

 Don't think I didn't try this (and a number of similar ideas). The
 problem was that the forked copy of Prime95 doesn't seem to start
 executing until ReCache completes. Either that or ReCache
 suspends itself until Prime95 completes - *most* unsatisfactory to
 hog all that memory long-term...

That's interesting; I haven't been getting that problem. When pass 2b 
starts, Prime95 launches, and ReCache continues until the pass finishes, and 
then it terminates, leaving Prime95 running at peak performance. I'm running 
Windows 98, and I compiled your code with Borland C++ Builder 4. Could one 
of these factors make a difference?

(If you're interested in trying out my compiled executable, I'd be glad to 
send it to you.)

 As you've found out, running ReCache to completion then starting
 Prime95 does have a good effect (it forces DLLs etc. which aren't
 currently active to be dumped, for a start). However, I certainly get
 more consistent results using the manual start procedure.

The automatic method has been completely consistent for me. Strange...






Mersenne: RE: JavaPrime update... (Lotsa Questions)

1999-05-04 Thread Blosser, Jeremy

Okay, I actually remembered to benchmark JavaPrime on the RS6000 we have
here...
That RS6000 can rock!

FFT = 256k
0.4874s/iteration!!!

And that was with my old JavaLucas version... Gotta try the new one,
which does the radix-8 ops... but it's currently being revamped, so I
suppose when I'm done...
I guess the PPC 603s (or are they 604s... hmmm, it's an RS/6000 63p)
can kick some butt w/ their Java FP libs... Makes me almost want to
try it on the AS/400; I can try it later today... problem is the thing
is so friggin' weird...

Does anyone have benchmarks for MacLucasUnix on the RS6K that I can compare
against? I'd look at the benchmark page, but for some reason my company's
firewall is blocking the site "because it contains sex related content"
Odd...

Anyway, I was thinking yet again (I know, you gotta hate when I do that...)

The FFT algorithm inherently lends itself towards a recursive
implementation, but for speed reasons we do it iteratively. Or in more
modern implementations, semi-iteratively...
However, for a multiple-processor machine... wouldn't it be efficient to
recurse deep enough to spawn X (X being the # of CPUs) threads from there? I
don't see how we run into any memory/access problems in a multithreaded
scenario here... Another thought is to have each CPU perform a portion of
the FFT, and use the Chinese Remainder Theorem to combine the results... (I
think; I need to read up on the CRT some more)...
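The thread-spawning idea can be sketched like this (illustrative only, 
not JavaPrime code: radix-2, power-of-two lengths, completely untuned). 
At each level you hand one half-size sub-transform to a fresh thread 
until you've recursed to the desired depth; the scratch arrays are 
disjoint, so the threads never touch shared data until the join:

```java
public class ParallelFft {
    // Recursive radix-2 DIT FFT; n must be a power of two.
    // 'depth' levels of the recursion each spawn one extra thread,
    // giving roughly 2^depth concurrent sub-transforms.
    static void fft(double[] re, double[] im, int depth)
            throws InterruptedException {
        int n = re.length;
        if (n == 1) return;
        double[] er = new double[n/2], ei = new double[n/2];
        double[] or_ = new double[n/2], oi = new double[n/2];
        for (int k = 0; k < n/2; k++) {          // split even/odd samples
            er[k]  = re[2*k];   ei[k] = im[2*k];
            or_[k] = re[2*k+1]; oi[k] = im[2*k+1];
        }
        if (depth > 0) {
            // Even half on a new thread, odd half on this one.
            Thread t = new Thread(() -> {
                try { fft(er, ei, depth - 1); }
                catch (InterruptedException e) { /* sketch: ignore */ }
            });
            t.start();
            fft(or_, oi, depth - 1);
            t.join();
        } else {
            fft(er, ei, 0);
            fft(or_, oi, 0);
        }
        for (int k = 0; k < n/2; k++) {          // butterfly/combine step
            double ang = -2 * Math.PI * k / n;
            double wr = Math.cos(ang), wi = Math.sin(ang);
            double tr = wr*or_[k] - wi*oi[k], ti = wr*oi[k] + wi*or_[k];
            re[k]     = er[k] + tr;  im[k]     = ei[k] + ti;
            re[k+n/2] = er[k] - tr;  im[k+n/2] = ei[k] - ti;
        }
    }
}
```

One caveat on the second thought: the CRT is the standard way to 
combine several *NTTs over different small primes*; the two halves of 
an ordinary FFT recursion are combined by the butterfly step above, 
not by the CRT.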






Mersenne: RE: JavaPrime update... (Lotsa Questions)

1999-05-04 Thread Blosser, Jeremy


 Okay, I actually remembered to benchmark JavaPrime on the
 RS6000 we have here...
 That RS6000 can rock!
 
 FFT = 256k
 .4874s/iteration!!!
 
 And that was with my old JavaLucas version... Gotta try the
 new one which does the radix 8 ops... but its currently being
 revamped so I suppose when I'm done...
 I guess the PPC 603s (or are they 604s... hmmm its a RS/6000
 63p) can kick some butt w/ their Java FP libs... Makes me
 almost want to try it on the AS/400, I can try it later
 today... problem is the thing is so friggin' weird...


BAH! The AS/400s JVM seems to be out of data... :P Of course, IBM has to
make it incredibly difficult to update anything on the AS/400. I can't even
get the PTFs to update it, I gotta have some IT guy do it... REALLY
annoying...

Oh well...

Anyway, the benchmarks were:
256k FFT
2.573s/iteration

The way IBM does things on the AS/400 is just so danged weird... For
example, before "creating the java program" (via CRTJVAPGM) and upping the
optimization on it, I was getting some 8s/iter ... What the heck the
CRTJVAPGM command does is beyond me... (You can set the optimization level
via the JAVA command on the AS/400 anyway)...

Add to that the fact that the file system on the AS/400 is so wacky, it
sucks...

Oh well, back to real work I suppose. Perhaps I'll get the newer version
done later tonight...




RE: Mersenne: RE: JavaPrime update... (Lotsa Questions)

1999-05-04 Thread Blosser, Jeremy

BAH! The AS/400s JVM seems to be out of data... :P Of course, 

data == date ... doh!
