> As we can see, multiprocessor machines seem to be faster than ones
> with one CPU (at least if they use the Glucas program, because AFAIK mprime
> is single-threaded). I don't know exactly how Glucas works (whether a
> huge amount of data is transferred between processors in a short time or
> not), but there might be a way to split it somehow with reasonable
> improvement.

I don't know how GLucas does that either. I only contributed fixes for
some multi-threading problems.

In a multi-threaded application like GLucas, all threads share the same
memory, so sharing data is easy. On a CC-NUMA machine like the Bull
NovaScale, each block of 4 CPUs has its own local block of memory. For a
thread running on a given block, accessing local memory is (quite) fast,
but accessing the memory of another block is about 2 times slower.

So allocating the memory needed by each thread on the block where that
thread runs would improve the performance of GLucas. But that requires
knowing in detail which part of memory is used by each thread, and
modifying GLucas accordingly.

After that, it could be possible to have GLucas working on ONE exponent
across several machines by using MPI. It just requires the previous work
to be done first. Running GLucas on several machines would add another
factor that slows GLucas down. But I guess checking M42 on 8 machines with
16 ia64 processors each would take only ONE day ...

Tony

_______________________________________________
Prime mailing list
[email protected]
http://hogranch.com/mailman/listinfo/prime
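
To put a rough number on the CC-NUMA point above, here is a small back-of-the-envelope cost model in Python. It is only a sketch under stated assumptions, not a measurement of GLucas: it takes the "2 times slower" figure for remote memory at face value, and it assumes that without NUMA-aware allocation a thread's data ends up spread evenly over all blocks. The function name `average_access_cost` and its parameters are made up for this illustration.

```python
# Rough cost model for memory access on a CC-NUMA machine.
# Assumptions (not measurements):
#   - a remote access costs 2x a local access,
#   - without NUMA-aware allocation, a thread's data is spread
#     evenly over all memory blocks.

def average_access_cost(num_blocks, remote_penalty=2.0, local_fraction=None):
    """Mean cost of one memory access, in units of one local access."""
    if local_fraction is None:
        # Evenly spread data: only 1/num_blocks of it is local.
        local_fraction = 1.0 / num_blocks
    remote_fraction = 1.0 - local_fraction
    return local_fraction * 1.0 + remote_fraction * remote_penalty


cpus = 16            # one machine with 16 processors
cpus_per_block = 4   # each block of 4 CPUs has its own local memory
blocks = cpus // cpus_per_block

spread = average_access_cost(blocks)                       # data spread over all blocks
pinned = average_access_cost(blocks, local_fraction=1.0)   # all data kept local

print(f"blocks: {blocks}")
print(f"average cost, data spread evenly: {spread:.2f} x local")
print(f"average cost, data kept local:    {pinned:.2f} x local")
print(f"ratio: {spread / pinned:.2f}")
```

Under these assumptions the ratio only bounds the gain on memory access time. The real gain for GLucas would depend on how much of its run time is spent waiting on memory and on how much data the threads genuinely have to share.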
