Re: Mersenne: Re: and, from WAY out in left field...
On 25 Sep 99, at 1:23, [EMAIL PROTECTED] wrote:
> As with all distributed efforts, we all contribute what we can. It's not a contest, except perhaps amongst us programmers, in terms of pushing each other to continually improve our codes. There are other ways of contributing as well as simply contributing CPU cycles - valuable though that contribution is!
> Sorry, I am bitter and can't afford a 550 Pentium. Hopefully soon - here prices have dropped below 1000 $US for 500MHz systems - you can get a 300-400MHz PII virtually for free if you're willing to pay $20/month for 36 months of unlimited Internet access. You should try buying online, e.g. at www.valueamerica.com or some such store.

UK "shop window" retail prices _look_ awful - but they insist on selling you a huge monitor, "high-end" multimedia hardware (possibly including an awful scanner and a worse digital camera), a 56K modem (a built-in WinModem, useless unless you're running Windoze) and a huge raft of "free" software (mostly obsolete and/or rubbish they can't offload any other way, except as landfill) as well as the bits you actually need. You can buy basic systems at quite reasonable prices here if you look in the right place, e.g. £440 (ex VAT) for a complete PII-400 system. PIII-500 systems are a couple of hundred pounds more expensive, but that's Intel pricing policy.

In terms of "bangs per buck", the systems with the fastest processors are relatively poor value for money (especially as the performance of "real" programs is not linearly related to the CPU clock speed!) There is an _enormous_ improvement in speed between a Cyrix 333 and a PII-400, and _much_ less between a PII-400 and a PIII-600 (or whatever the fastest chip Intel do at the moment is - I find it hard even to keep track of where the "bleeding edge" is, so fast is the turnover of products!)
Bear in mind the transport costs, the complications with customs duty and the possible risks involved in dealing in foreign exchange (you don't know the exchange rate until the credit card company processes the transaction - they can, and do, manipulate this to your disadvantage). I'm not sure that there's any real benefit in dealing direct with US suppliers for single-unit purchases. Bulk purchasing may be an entirely different matter. Here in the UK, component prices seem to be reasonably competitive, especially when purchased through the many outlets which include free delivery. This is a major saving on a small order.

> How do I know that the exponent I am testing has not already been run to completion?

If someone else has run your exponent to completion (unlikely) you will find out when you check in your completion dates. The PrimeNet server will mutter at you.

Regards
Brian Beesley
_
Unsubscribe list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers
Re: Mersenne: Re: FFTW for GIMPS?
On Sat, 25 Sep 1999, Guillermo Ballester Valor wrote:
> Yes, certainly I've been able to adapt lucdwt and McLucasUNIX in four days. On the other hand, my goal was only to find out whether working with FFTW is a good idea, and the timings obtained make me think it could be.

For really big FFTs you can get major gains by using FFTW as a building block in a bigger program, rather than having it do your entire FFT with a single function call. As Ernst mentioned, the building-block approach lets you fold some of the forward and inverse FFT computations together, and this saves loads of time in avoided cache misses. On the UltraSPARC, using FFTW for the pieces of your computation rather than the whole thing is somewhere between 2 and 10 times faster than FFTW alone.

> If your comparison carried over to Intel machines (which it doesn't), your code would run nearly as fast as mprime! You say your code is twice as fast as FFTW; sure it is, *BUT* on my Pentium-166 the short code I wrote does an iteration of my current exponent 3975659 in 0.901 seconds while mprime takes only 0.359. This makes RPI=40%. Then your code would reach nearly 90% - and without lots of assembler code!

On the Pentium, assembly-coded small FFTs run more than twice as fast as FFTW. Even from C, you can do better on the Pentium (do a web search for djbfft, a free Pentium-optimized FFT package). For a recursive split-radix, you need about 200 lines of assembly; surely this is worth twice the speed!

jasonp
Re: Mersenne: Front-end design
Pierre Abbat wrote:
> I suggest a couple of named pipes for control (the front end writes to one and reads from the other, and mprime vice versa). Since writing to a pipe whose reader is stuck can get you blocked when the pipe is full, and writing to a pipe with nothing at the other end results in a SIGPIPE, mprime should not output anything to the pipe unless told to do so, and should trap SIGPIPE and turn off output.

The current .ini structure is, I feel, sufficient for controlling the operation of the client, and preferable. However, some signal should exist to force the client to re-read these settings after a change; at least in *nixes this should be easy (just handle HUP). As for data out of the client, I like to think that the existing save-state files would be enough, or could be made to be enough. I'm not sure how many cycles generating them frequently would burn, though, which could be a problem. Also, the timing data is not easily accessible this way, but I like the approach used in the recently posted *nix GUI. And I concur with the idea that the "extended GUI" should, if possible, be separate from the main client so as not to run the risk of messing up the results.

-Donwulff
Re: Decamega Tests (was: RE: Mersenne: GIMPS client output)
John R Pierce wrote:
> There's another trick to this, primarily useful in assembler, not C programming...

You can do it in C also. It's called Duff's Device. It's ugly but legal C code. Here's a URL with info on it: http://bs.cs.tu-berlin.de/~jutta/c/duffs-device.html

Regards, Herb Savage

> Say you unroll something 16 times. Take the actual iteration count modulo 16, and JMP into the loop at that offset, with the repeat counter set to start from count/16. I.e. if you need to do, say, 60 iterations of the inner code, that's 48 + 12, so you set the loop count to 3 and jump in past the first 4 of the 16 blocks (16 - 12 = 4). Anyway, prime95 is HEAVILY unrolled, using assembler macros to generate the inline linear inner 'loops'.
> -jrp
Mersenne: Re: Mlucas 2.7 for x86?
Guillermo Ballester Valor ([EMAIL PROTECTED]) writes:
> You say your code is twice as fast as FFTW; sure it is, *BUT* on my Pentium-166 the short code I wrote does an iteration of my current exponent 3975659 in 0.901 seconds while mprime takes only 0.359. This makes RPI=40%. Then your code would reach nearly 90% - and without lots of assembler code!

Hi, Guillermo:

Alas, I don't think it works that way, for several reasons. On RISC microprocessors, instruction scheduling (while nontrivial) is much easier than on CISCs, meaning that a good compiler can usually generate code that runs nearly as fast as hand-tuned assembly code (ASM) written by a skilled programmer. By "nearly" I mean 50-90%, depending on the CPU, the compiler and the ASM programmer. Non-RISCs like the Intel x86 are notoriously difficult to generate good code for, either automatically or by hand - I believe George spent 2 years getting the ASM for the early versions of Prime95 put together, and he (George - correct me if I'm wrong) was an x86 ASM expert to begin with.

I'm not familiar enough with the details of FFTW to say for sure (Steven Johnson or Jason Papadopoulos could answer this), but I'm pretty sure Frigo/Johnson have done some machine-specific ASM of critical portions of the code, at least for some popular architectures - if so, the x86 would be one of the first candidates, for the above reason and the fact that it's so numerous. That may explain why FFTW performs so well on the x86. Whenever I've tried my code (not including v2.7 - that should run at least 2x faster on x86 than 2.6, in my estimation) on x86, it's been 5-10x slower than Prime95. (That's partly due to the fact that my f90 compiler for x86 is awful, but only partly.)

> Is there any linux or windows Mlucas 2.7 executable for intel machines?

No, but anyone with an f90 compiler for x86 is free to download and compile the source.
By far the best compiler I know of for x86 is the Digital/Compaq/Microsoft Visual f90 for Windows - I don't have that one to play with, unfortunately, but I'd be interested to hear from someone who does what timings they get.

Regards, Ernst
Mersenne: Re: Mlucas 2.7 for x86?
[EMAIL PROTECTED] wrote:
> I'm not familiar enough with the details of FFTW to say for sure (Steven Johnson or Jason Papadopoulos could answer this), but I'm pretty sure Frigo/Johnson have done some machine-specific ASM of critical portions of the code, at least for some popular architectures - if so, the x86 would be one of the first candidates, for the above reason and the fact that it's so numerous. That may explain why FFTW performs so well on the x86.

I've had a quick read through the code used, and I've not seen any line of assembler. Perhaps the C code has been tuned with the x86 in mind.

> > Is there any linux or windows Mlucas 2.7 executable for intel machines?
> No, but anyone with an f90 compiler for x86 is free to download and compile the source. By far the best compiler I know of for x86 is the Digital/Compaq/Microsoft Visual f90 for Windows - I don't have that one to play with, unfortunately, but I'd be interested to hear from someone who does what timings they get.

I've found a Microsoft f90 compiler and tried to compile it this afternoon, but it gives me some errors, which I think are easy to solve. The compiler requires interfaces for most FFT routines because the first parameter 'a' is a target !? :-(

I don't have much time, but I'll keep working on it.

Yours sincerely,

| Guillermo Ballester Valor |
| [EMAIL PROTECTED] |
| c/ cordoba, 19 |
| 18151-Ogijares (Spain) |
| (Linux registered user 1171811) |
Re: Mersenne: Re: FFTW for GIMPS?
Jason Stratos Papadopoulos wrote:
> For really big FFTs you can get major gains by using FFTW as a building block in a bigger program, rather than having it do your entire FFT with a single function call. As Ernst mentioned, the building-block approach lets you fold some of the forward and inverse FFT computations together, and this saves loads of time in avoided cache misses. On the UltraSPARC, using FFTW for the pieces of your computation rather than the whole thing is somewhere between 2 and 10 times faster than FFTW alone.

That could be terrific! I'll look into it.

> On the Pentium, assembly-coded small FFTs run more than twice as fast as FFTW. Even from C, you can do better on the Pentium (do a web search for djbfft, a free Pentium-optimized FFT package). For a recursive split-radix, you need about 200 lines of assembly; surely this is worth twice the speed!

I would like to write some C code for general purposes. For tuned assembler we have Woltman's fantastic prime95/mprime code.

Thank you very much for your comments. It will help me a lot. Have a nice weekend.

| Guillermo Ballester Valor |
| [EMAIL PROTECTED] |
| c/ cordoba, 19 |
| 18151-Ogijares (Spain) |
| (Linux registered user 1171811) |
Mersenne: Graphical visualisation of computations
Jukka Santala [EMAIL PROTECTED] wrote on Friday 24 September 1999 at 5:25 PM. Though you ask this, I find the topic rather appropriate for the list, especially given the angle of "HOW can we visualize the process of mathematical operations?"

That brings me to the following observation: I think few of us fully realize the _enormous_ amount of (highly optimized) processing required each iteration to achieve something that looks so simple: a squaring, minus 2, mod (2^p-1). Remember, you see nothing happening! Prime95 just tells you: I used so-and-so-many BILLION clock cycles on the previous iteration.

When a computer is ray-tracing (preferably on some fancy high-end workstation) you are immediately dazzled by the picture-perfect pixels appearing on the screen in brilliant colours. But your result is not for eternity. In GIMPS it is. M7xx tested. Composite. Or: M10yy trial-factored to 2^65. No factors below 2^65. And you know that YOU discovered that mathematical fact. That little 64-bit residue or checksum is precious. If you had a grain of rice for every clock cycle needed to produce just one of those bits, you could feed the entire (current) world population for a whole year! Is that about right? Whatever the correct figure, the program doesn't show you what it's doing.

Actually, it might be interesting to be able to see, say, just the last 64 bits (in hex) of the current number L[n] in the Lucas sequence. The program displays those bits for L[p-1], right? So why not during the sequence? The program should IMHO at least - optionally - display a progress bar showing graphically what it now only shows in digits: progress on the current exponent [95.2% completed]. The bar would probably grow one pixel longer every couple of hours.

Any comments on the issue of graphical visualisation?

Cheers, Robert van der Peijl
Re: Mersenne: Graphical visualisation of computations
> I think few of us fully realize the _enormous_ amount of (highly optimized) processing required each iteration to achieve something that looks so simple: a squaring, minus 2, mod (2^p-1). Whatever the correct figure, the program doesn't show you what it's doing.

This is an interesting point - not so much whether a GUI would be nice or not, but: what would it show? A progress bar, maybe... anything else? There isn't really anything else to show. Intermediate results of the LL test don't themselves have a lot of meaning (even the final result, if non-zero, is devoid of much interpretation). There's not a lot you could plot - a graph of the iteration time would only serve to show when you opened Microsoft Word or something...

I must admit, I dallied (albeit very briefly) with the bovine rc5 client. Were it not for their peculiar statistics ('if keys were drops of water, we could flood the world in a week' etc.) I'd have got bored of the effort a lot sooner than I did. The SETI client looks pretty (more of a screen burner than a screen saver, though) but the display is at best meaningless and at worst a waste of cycles. I think we ought to count ourselves lucky that what we've got is bare-bones and low-impact.

Chris Nash
Lexington KY UNITED STATES
Re: Mersenne: Graphical visualisation of computations
> I must admit, I dallied (albeit very briefly) with the bovine rc5 client.

Gack!

"About the only manufactured challenge comes from ID software." should read: "About the only more manufactured challenge comes from ID software." :)

-Lucas