Hi, just as a test: can you run it with np 2?
(The PingPong test is for 2 processes only.)

On 8/13/08, Daniël Mantione <[email protected]> wrote:
>
> On Tue, 12 Aug 2008, Gus Correa wrote:
>
> > Hello Daniel and list
> >
> > Could it be a problem with memory bandwidth / contention in multi-core?
>
> Yes, I believe we are somehow limited by memory performance. Here are
> some numbers from a dual Opteron 2352 system, which has much more memory
> bandwidth:
>
> #---------------------------------------------------
> # Benchmarking PingPong
> # #processes = 2
> # ( 6 additional processes waiting in MPI_Barrier)
> #---------------------------------------------------
>        #bytes #repetitions      t[usec]   Mbytes/sec
>             0         1000         0.86         0.00
>             1         1000         0.97         0.98
>             2         1000         0.95         2.01
>             4         1000         0.96         3.97
>             8         1000         0.95         7.99
>            16         1000         0.96        15.85
>            32         1000         0.99        30.69
>            64         1000         0.97        63.09
>           128         1000         1.02       119.68
>           256         1000         1.18       207.25
>           512         1000         1.40       348.77
>          1024         1000         1.75       556.75
>          2048         1000         2.59       753.22
>          4096         1000         5.10       766.23
>          8192         1000         7.93       985.13
>         16384         1000        14.60      1070.57
>         32768         1000        27.92      1119.23
>         65536          640        46.67      1339.16
>        131072          320        86.03      1453.06
>        262144          160       163.16      1532.21
>        524288           80       310.01      1612.88
>       1048576           40       730.62      1368.69
>       2097152           20      1449.72      1379.57
>       4194304           10      2884.90      1386.53
>
> However, +/- 1200 MB/s (or +/- 1500 MB/s in the case of the AMD system) is
> not even close to the memory performance limits of the systems, so there
> should be room for optimization.
>
> After all, the openib btl manages to transfer the data from the memory of
> one process to the memory of another process just fine, with more
> performance.
>
> > It has been reported in many mailing lists (mpich, beowulf, etc).
> > Here it seems to happen in dual-processor dual-core with our memory
> > intensive programs.
>
> MPICH2 manages to get about 5 GB/s in shared memory performance on the
> Xeon 5420 system.
>
> > Have you checked what happens to the shared memory runs as you
> > increase the number of active cores/processes?
> > Would it help to set the processor affinity in the shared memory runs?
> >
> > http://www.open-mpi.org/faq/?category=building#build-paffinity
> > http://www.open-mpi.org/faq/?category=tuning#using-paffinity
>
> Neither has any effect on the scores.
>
> Daniël
> _______________________________________________
> users mailing list
> [email protected]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
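
For anyone comparing these numbers with other runs: the Mbytes/sec column in the quoted table appears to be the message size divided by the reported time, with 1 Mbyte taken as 2**20 bytes. That unit is an assumption on my part, inferred from the rows themselves rather than from the benchmark documentation. A minimal Python sketch, with a hypothetical helper name and a few rows copied from the table, that recomputes the column:

```python
# Sketch: recompute the Mbytes/sec column of the quoted PingPong table.
# Assumption (inferred from the table, not from the benchmark docs):
# 1 Mbyte = 2**20 bytes, and t[usec] is the time for one message.

BYTES_PER_MBYTE = 2**20


def pingpong_mbytes_per_sec(nbytes, t_usec):
    """Return throughput in Mbytes/sec for nbytes moved in t_usec microseconds."""
    if t_usec <= 0:
        raise ValueError("t_usec must be positive")
    seconds = t_usec * 1e-6
    return nbytes / BYTES_PER_MBYTE / seconds


# (bytes, t[usec], reported Mbytes/sec) rows copied from the quoted table.
ROWS = [
    (1, 0.97, 0.98),
    (1024, 1.75, 556.75),
    (8192, 7.93, 985.13),
    (524288, 310.01, 1612.88),
    (4194304, 2884.90, 1386.53),
]

for nbytes, t_usec, reported in ROWS:
    computed = pingpong_mbytes_per_sec(nbytes, t_usec)
    print(f"{nbytes:>8d} bytes  reported {reported:>8.2f}  computed {computed:>8.2f}")
```

Small differences against the reported column are expected, because the times in the table are rounded to two decimals. In decimal units (10**6 bytes per MB) the same rows come out roughly 5% higher, which matters when comparing against the 5 GB/s figure quoted for MPICH2.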
