Re: Mersenne: Java Prime and lotsa questions
b1) Using "properly sized" FFT: Yes, I believe using properly sized FFT's saves some time. How much depends on the programmer, the data, the language/compiler, and the hardware. Non-power-of-2 FFT's can even be useful, but it seems that coding up all (or most) of the possible sizes may be more trouble than it's worth. Well, George went to the trouble of adding 3*2^n, 5*2^n and 7*2^n run length code into Prime95 - even though it's less efficient than plain powers-of-two, it's still well worth while, for the exponents that can benefit from it. I'd be _delighted_ if someone took it on themselves to add code for intermediate run lengths into MacLucas/MacLucasUNIX - even just 3*2^n would be very worthwhile. If I had time, I'd do it myself! b2) Recurring pattern in "N=N^2-2 % (2^P-1)": I don't recall anyone here mentioning any interesting repeats, even for composite P. I'm sure many have looked, and were disappointed. If you get to residual 0, then you do get a most interesting repeat - the next iteration is -2 and all the subsequent ones are +2 This proves that 0 cannot occur as an "interim" residual (any iteration prior to 2^p-3) if 2^p-1 is going to turn out to be prime. Although testing for "early zero" would be very quick, it's nevertheless probably not worth doing. We see no res64=2 values in the completed results for composite exponents. Since there are "only" 2^p-1 available values that the residual can take, if you iterate long enough, the residual is _certain_ to go into a repeating loop, eventually. However, the chance of this happenning in the first 2^p-3 iterations would seem to be vanishingly small. If it _does_ start to repeat "early" then 2^p-1 _must_ be composite - because it can't get to zero without sticking at 2, see above - but is it really worth adding the extra code to check? c2) The LL test does not adapt well to parallel efforts. 
I think it will always be the case that interprocess communication will cause the calculations to run slower than if just one processor crunched on the whole thing. [Array or vector processor machines are another matter!] But if the number of significant bits required were orders of magnitude larger than the hardware, then it might make sense to parallelize the squarings. We're not there yet. This seems self-evident; however, it is certainly true that there is an efficient way to code an FFT for a vector processor.

d) NTTs (Number Theoretic Transforms): There has been some talk of that here before. However, until the ratio of integer to floating-point throughput improves, NTTs will not be very popular. [Note that the FFT "automagically" computes the % 2^P-1, using the Crandall-Fagin procedure, so NTTs have a lot of ground to make up first.]

If you have a processor like StrongARM, which has efficient integer hardware but no floating-point hardware at all - forcing you to resort to software emulation for floating-point - then NTT starts to make a lot of sense. You get _some_ ground made up automatically because you have no rounding errors, so you can use a smaller transform size than experience with real/complex FFTs would suggest. I'm still "rather hazy" (read: totally baffled) on exactly how the Crandall-Fagin procedure works, so I don't know if you could adapt NTT to do the mod 2^p-1 automatically. But my gut feeling is that it ought to be possible.

Regards
Brian Beesley

Unsubscribe list info -- http://www.scruz.net/~luke/signup.htm
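To make the b2) observation concrete, here is a minimal Java sketch (class and method names are mine, not from any of the programs discussed) of the N = N^2 - 2 iteration. It uses the simple hi+lo shift reduction mod 2^p-1 - since 2^p ≡ 1 (mod 2^p-1), you can fold the high bits back onto the low bits instead of dividing; this is the elementary integer trick, not the Crandall-Fagin FFT procedure - and the `main` demonstrates the 0 -> -2 -> 2 -> 2 -> ... "sticking at 2" pattern:

```java
import java.math.BigInteger;

public class LLDemo {
    static final BigInteger TWO = BigInteger.valueOf(2);

    // Reduce n mod 2^p - 1 without division: since 2^p == 1 (mod 2^p - 1),
    // writing n = hi*2^p + lo gives n == hi + lo (mod 2^p - 1).
    static BigInteger modMersenne(BigInteger n, int p, BigInteger m) {
        while (n.bitLength() > p)
            n = n.shiftRight(p).add(n.and(m));
        return n.equals(m) ? BigInteger.ZERO : n;
    }

    // Lucas-Lehmer: 2^p - 1 (p an odd prime) is prime iff the residue
    // is 0 after p-2 iterations of s = s^2 - 2 starting from s = 4.
    static boolean lucasLehmer(int p) {
        BigInteger m = BigInteger.ONE.shiftLeft(p).subtract(BigInteger.ONE);
        BigInteger s = BigInteger.valueOf(4);
        for (int i = 0; i < p - 2; i++)
            s = modMersenne(s.multiply(s).subtract(TWO), p, m);
        return s.signum() == 0;
    }

    public static void main(String[] args) {
        System.out.println("p=7: " + lucasLehmer(7));   // 2^7-1 = 127 is prime
        // Once the residue reaches 0, the next value is -2 (i.e. m-2)
        // and every value after that is 2: 0 -> -2 -> 2 -> 2 -> ...
        BigInteger m = BigInteger.valueOf(127), s = BigInteger.ZERO;
        for (int i = 0; i < 4; i++) {
            s = s.multiply(s).subtract(TWO).mod(m);
            System.out.print(s + " ");                  // prints 125 2 2 2
        }
    }
}
```

Real LL testers of course replace the BigInteger squaring with an FFT-based one; the point here is only the residue arithmetic and the repeat behaviour.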
Re: Mersenne: Update on using Prime95 as a windows shell
> Don't think I didn't try this (and a number of similar ideas). The
> problem was that the forked copy of Prime95 doesn't seem to start
> executing until ReCache completes. Either that or ReCache suspends
> itself until Prime95 completes - *most* unsatisfactory to hog all
> that memory long-term...

That's interesting; I haven't been getting that problem. When pass 2b starts, Prime95 launches, and ReCache continues until the pass finishes and then terminates, leaving Prime95 running at peak performance. I'm running Windows 98, and I compiled your code with Borland C++ Builder 4. Could one of these factors make a difference? (If you're interested in trying out my compiled executable, I'd be glad to send it to you.)

> As you've found out, running ReCache to completion then starting
> Prime95 does have a good effect (it forces DLLs etc. which aren't
> currently active to be dumped, for a start). However, I certainly get
> more consistent results using the manual start procedure.

The automatic method has been completely consistent for me. Strange...
Mersenne: RE: JavaPrime update... (Lotsa Questions)
Okay, I actually remembered to benchmark JavaPrime on the RS6000 we have here... That RS6000 can rock! FFT = 256k: .4874s/iteration!!! And that was with my old JavaLucas version... Gotta try the new one which does the radix-8 ops... but it's currently being revamped, so I suppose when I'm done... I guess the PPC 603s (or are they 604s... hmmm, it's an RS/6000 63p) can kick some butt w/ their Java FP libs...

Makes me almost want to try it on the AS/400; I can try it later today... problem is the thing is so friggin' weird... Does anyone have benchmarks for MacLucasUnix on the RS6K that I can compare against? I'd look at the benchmark page, but for some reason my company's firewall is blocking the site "because it contains sex related content"... Odd...

Anyway, I was thinking yet again (I know, you gotta hate when I do that...). The FFT algorithm inherently lends itself towards a recursive implementation, but for speed reasons we do it iteratively - or, in more modern implementations, semi-iteratively. However, for a multiple-processor machine... wouldn't it be efficient to recurse deep enough to spawn X (X being the # of CPUs) threads from there? I don't see how we run into any memory/access problems in a multithreaded scenario here... Another thought is to have each CPU perform a portion of the FFT, and use the Chinese Remainder Theorem to combine the results... (I think; I need to read up on the CRT some more)...
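The fork-at-the-top-of-the-recursion idea can be sketched like this - a hypothetical Java illustration (not JavaPrime's actual code; names are mine), assuming a plain recursive radix-2 decimation-in-time FFT and `ForkJoinPool` for the threads. Each split produces two half-size sub-transforms on disjoint scratch arrays, so forking one of them onto another CPU for the first few levels involves no shared writable state:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;

public class ParallelFFT {
    static final ForkJoinPool pool = ForkJoinPool.commonPool();

    // Recursive radix-2 DIT FFT over (re, im), in place.
    // While depth > 0 we fork one half-size sub-transform onto another
    // thread; with depth = log2(#CPUs) this spawns #CPUs-1 extra tasks,
    // then each subtree recurses serially.
    static void fft(double[] re, double[] im, int depth) {
        int n = re.length;
        if (n == 1) return;
        double[] er = new double[n/2], ei = new double[n/2];  // even terms
        double[] or = new double[n/2], oi = new double[n/2];  // odd terms
        for (int k = 0; k < n/2; k++) {
            er[k] = re[2*k];   ei[k] = im[2*k];
            or[k] = re[2*k+1]; oi[k] = im[2*k+1];
        }
        if (depth > 0) {
            // The two sub-FFTs touch disjoint arrays - no contention.
            ForkJoinTask<?> t = pool.submit(() -> fft(er, ei, depth - 1));
            fft(or, oi, depth - 1);
            t.join();
        } else {
            fft(er, ei, 0);
            fft(or, oi, 0);
        }
        // Combine: X[k] = E[k] + w^k O[k],  X[k+n/2] = E[k] - w^k O[k].
        for (int k = 0; k < n/2; k++) {
            double ang = -2 * Math.PI * k / n;
            double tr = Math.cos(ang)*or[k] - Math.sin(ang)*oi[k];
            double ti = Math.cos(ang)*oi[k] + Math.sin(ang)*or[k];
            re[k]       = er[k] + tr;  im[k]       = ei[k] + ti;
            re[k + n/2] = er[k] - tr;  im[k + n/2] = ei[k] - ti;
        }
    }
}
```

The catch for LL-testing sizes is exactly the communication cost Brian mentions: the combine pass after the join still walks the whole array on one thread, so the speedup is well short of linear unless that pass is parallelized too.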
Mersenne: RE: JavaPrime update... (Lotsa Questions)
> Okay, I actually remembered to benchmark JavaPrime on the RS6000 we have here... That RS6000 can rock! FFT = 256k .4874s/iteration!!! And that was with my old JavaLucas version... Gotta try the new one which does the radix 8 ops... but its currently being revamped so I suppose when I'm done... I guess the PPC 603s (or are they 604s... hmmm its a RS/6000 63p) can kick some butt w/ their Java FP libs... Makes me almost want to try it on the AS/400, I can try it later today... problem is the thing is so friggin' weird...

BAH! The AS/400s JVM seems to be out of data... :P Of course, IBM has to make it incredibly difficult to update anything on the AS/400. I can't even get the PTFs to update it; I gotta have some IT guy do it... REALLY annoying... Oh well...

Anyway, the benchmarks were: 256k FFT, 2.573s/iteration.

The way IBM does things on the AS/400 is just so danged weird... For example, before "creating the java program" (via CRTJVAPGM) and upping the optimization on it, I was getting some 8s/iter... What the heck the CRTJVAPGM command does is beyond me... (You can set the optimization level via the JAVA command on the AS/400 anyway)... Add to that the fact that the file system on the AS/400 is so wacky, it sucks...

Oh well, back to real work I suppose. Perhaps I'll get the newer version done later tonight...
RE: Mersenne: RE: JavaPrime update... (Lotsa Questions)
> BAH! The AS/400s JVM seems to be out of data... :P

Of course, data == date... doh!