Hello Daniel and list
Could it be a problem with memory bandwidth / contention on multi-core nodes?
It has been reported on many mailing lists (MPICH, Beowulf, etc.).
Here it seems to happen on dual-processor dual-core nodes with our
memory-intensive programs.
Have you checked what happens to the shared memory runs as you
increase the number of active cores/processes?
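For instance (just a sketch, if I remember the IMB options right:
-multi runs the benchmark on several process pairs at once, which
should expose memory contention as more cores become active):

  mpirun -np 2 --mca btl self,sm ./IMB-MPI1.openmpi PingPong
  mpirun -np 4 --mca btl self,sm ./IMB-MPI1.openmpi -multi 0 PingPong
  mpirun -np 8 --mca btl self,sm ./IMB-MPI1.openmpi -multi 0 PingPong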
Would it help to set the processor affinity in the shared memory runs?
http://www.open-mpi.org/faq/?category=building#build-paffinity
http://www.open-mpi.org/faq/?category=tuning#using-paffinity
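In the 1.2 series affinity is switched on with the mpi_paffinity_alone
MCA parameter; something like this should pin each process to a core
(untested on your setup):

  mpirun -np 8 --mca btl self,sm,openib --mca mpi_paffinity_alone 1 \
      -hostfile hostfile ./IMB-MPI1.openmpi -npmin 8 PingPong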
Gus Correa
--
---------------------------------------------------------------------
Gustavo J. Ponce Correa, PhD - Email: g...@ldeo.columbia.edu
Lamont-Doherty Earth Observatory - Columbia University
P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------
Daniël Mantione wrote:
Hello,
I'm troubleshooting a weird benchmark situation in which having the sm btl
enabled gives me worse results than disabling it.
For example, this run is on a single compute node with 2x Xeon 5420 CPUs,
8 GB RAM and a ConnectX gen2 IB card, with OFED 1.3 and Open MPI 1.2.6 as
the software setup:
[cvsupport@extern src]$ mpirun -np 8 --mca btl self,sm,openib -hostfile \
hostfile ./IMB-MPI1.openmpi -npmin 8 PingPong
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
# ( 6 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
    #bytes  #repetitions    t[usec]  Mbytes/sec
         0          1000       0.87        0.00
         1          1000       0.98        0.97
         2          1000       0.97        1.96
         4          1000       0.99        3.87
         8          1000       0.98        7.78
        16          1000       1.15       13.33
        32          1000       1.13       26.93
        64          1000       1.12       54.42
       128          1000       1.27       96.31
       256          1000       1.55      157.01
       512          1000       2.04      239.00
      1024          1000       2.75      355.62
      2048          1000       4.58      426.40
      4096          1000       7.12      548.93
      8192          1000      11.29      692.14
     16384          1000      18.83      829.75
     32768          1000      34.57      904.08
     65536           640      60.73     1029.22
    131072           320     112.06     1115.43
    262144           160     215.48     1160.21
    524288            80     423.34     1181.09
   1048576            40     858.18     1165.26
   2097152            20    1744.15     1146.69
   4194304            10    4055.60      986.29
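To disable the sm btl, sm is simply left out of the btl list, i.e.
something like:

[cvsupport@extern src]$ mpirun -np 8 --mca btl self,openib -hostfile \
hostfile ./IMB-MPI1.openmpi -npmin 8 PingPong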
Now, when disabling the sm btl, the score is:
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
# ( 6 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
    #bytes  #repetitions    t[usec]  Mbytes/sec
         0          1000       1.08        0.00
         1          1000       1.42        0.67
         2          1000       1.19        1.60
         4          1000       1.21        3.14
         8          1000       1.61        4.75
        16          1000       1.30       11.70
        32          1000       1.32       23.13
        64          1000       1.61       37.97
       128          1000       2.80       43.53
       256          1000       3.21       76.05
       512          1000       4.06      120.15
      1024          1000       5.03      194.21
      2048          1000       7.15      273.05
      4096          1000      10.05      388.55
      8192          1000      16.02      487.76
     16384          1000      29.63      527.41
     32768          1000      51.23      610.03
     65536           640      92.26      677.43
    131072           320     141.03      886.36
    262144           160     233.62     1070.14
    524288            80     434.56     1150.60
   1048576            40     818.84     1221.24
   2097152            20    1403.75     1424.76
   4194304            10    2523.40     1585.16
Now, I do have fast InfiniBand, but I can't believe that the openib btl is
supposed to be faster than the sm btl. Does anyone know whether
something can be tuned here?
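(The available sm parameters can at least be listed with

  ompi_info --param btl sm

but I don't know which of them, if any, are relevant here.)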
Best regards,
Daniël Mantione