Hello,

I would really appreciate any advice on troubleshooting/tuning Open MPI over ConnectX. More details about our setup can be found here:
http://www.cse.scitech.ac.uk/disco/database/search-machine.php?MID=52

A single process per node (ppn=1) seems to be fine (the IMB results can be found here: http://www.cse.scitech.ac.uk/disco/database/search-pmb.php). However, there is a problem with Alltoall at ppn=8:

    mpiexec --mca btl ^tcp -machinefile hosts32x8.txt -n 128 src/IMB-MPI1.openmpi -npmin 128 Alltoall

    #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
         0         1000         0.01         0.02         0.01
         1         1000        95.70        95.87        95.81
         2         1000       107.59       107.64       107.62
         4         1000       108.46       108.52       108.49
         8         1000       112.25       112.30       112.28
        16         1000       121.07       121.12       121.10
        32         1000       154.12       154.18       154.15
        64         1000       207.85       207.93       207.89
       128         1000       334.52       334.63       334.58
       256         1000      9303.66      9305.98      9304.99
       512         1000      8953.59      8955.71      8955.08
      1024         1000      8607.87      8608.78      8608.42
      2048         1000      8642.59      8643.30      8643.03
      4096         1000      8478.45      8478.64      8478.58
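To put a number on the jump, here is a small sketch that scans the t_avg column above for the largest ratio between consecutive message sizes. The values are copied by hand from the table; the helper name largest_step is hypothetical, and the 0-byte row is left out because it moves no data.

```python
# t_avg[usec] from the IMB Alltoall run above (ConnectX/IB, 128 ranks, ppn=8).
# The 0-byte row is omitted because it transfers no data.
IB_T_AVG = {
    1: 95.81, 2: 107.62, 4: 108.49, 8: 112.28, 16: 121.10,
    32: 154.15, 64: 207.89, 128: 334.58, 256: 9304.99,
    512: 8955.08, 1024: 8608.42, 2048: 8643.03, 4096: 8478.58,
}


def largest_step(t_avg):
    """Return (smaller_size, larger_size, ratio) for the biggest jump in
    average time between consecutive message sizes (hypothetical helper)."""
    sizes = sorted(t_avg)
    best = None
    for small, large in zip(sizes, sizes[1:]):
        ratio = t_avg[large] / t_avg[small]
        if best is None or ratio > best[2]:
            best = (small, large, ratio)
    return best


if __name__ == "__main__":
    small, large, ratio = largest_step(IB_T_AVG)
    print(f"{small} -> {large} bytes: t_avg grows {ratio:.1f}x")
```

Run as-is, it prints "128 -> 256 bytes: t_avg grows 27.8x", whereas every other step in the table is below 1.7x.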
I've tried playing with various parameters, but to no avail. The step-up at the same message size is also noticeable for n=64 and n=32, but progressively less pronounced. Even more surprisingly, Gigabit Ethernet performs better at this message size:

    mpiexec --mca btl self,sm,tcp --mca btl_tcp_if_include eth1 -machinefile hosts32x8.txt -n 128 src/IMB-MPI1.openmpi -npmin 128 Alltoall

    #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
         8         1000       598.66       599.11       598.95
        16         1000       723.07       723.48       723.29
        32         1000      1144.79      1145.46      1145.18
        64         1000      1850.25      1850.97      1850.66
       128         1000      3794.32      3795.23      3794.82
       256         1000      5653.55      5653.97      5653.81
       512         1000      7107.96      7109.90      7109.66
      1024         1000     10310.53     10315.90     10315.63
      2048         1000    350066.92    350152.90    350091.89
      4096         1000     42238.60     42239.53     42239.27
      8192         1000    112781.11    112782.55    112782.10
     16384         1000   2450606.75   2450625.01   2450617.86

Unfortunately, this task never completes.

Thanks in advance, and sorry for the long post.

Igor

PS: I'm following the discussion about the slow sm BTL, but I'm not sure whether this particular problem is related. BTW, the Open MPI build I'm using was compiled with the Intel compiler.

PPS: MVAPICH and MVAPICH2 behave much better, though not perfectly either. Unfortunately, I have other problems with them.

I. Kozin (i.kozin at dl.ac.uk)
STFC Daresbury Laboratory, WA4 4AD, UK
http://www.cse.clrc.ac.uk/disco
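For a side-by-side view of the two runs, a similar sketch lists the message sizes both tables cover (8 to 4096 bytes) and flags where the TCP run has the lower average time. Again the t_avg values are copied by hand, and sizes_where_tcp_wins is a hypothetical helper name.

```python
# t_avg[usec] copied from the two IMB Alltoall runs above (128 ranks, ppn=8).
# Only the sizes that both runs report (8..4096 bytes) are kept.
IB_T_AVG = {
    8: 112.28, 16: 121.10, 32: 154.15, 64: 207.89, 128: 334.58,
    256: 9304.99, 512: 8955.08, 1024: 8608.42, 2048: 8643.03, 4096: 8478.58,
}
TCP_T_AVG = {
    8: 598.95, 16: 723.29, 32: 1145.18, 64: 1850.66, 128: 3794.82,
    256: 5653.81, 512: 7109.66, 1024: 10315.63, 2048: 350091.89, 4096: 42239.27,
}


def sizes_where_tcp_wins(ib, tcp):
    """Message sizes present in both runs where the TCP average time is lower."""
    common = sorted(set(ib) & set(tcp))
    return [size for size in common if tcp[size] < ib[size]]


if __name__ == "__main__":
    for size in sorted(set(IB_T_AVG) & set(TCP_T_AVG)):
        ratio = TCP_T_AVG[size] / IB_T_AVG[size]
        print(f"{size:>5} B  ib {IB_T_AVG[size]:>10.2f}  "
              f"tcp {TCP_T_AVG[size]:>10.2f}  tcp/ib {ratio:6.2f}")
    print("tcp faster at:", sizes_where_tcp_wins(IB_T_AVG, TCP_T_AVG))
```

With these numbers the TCP run comes out ahead only at 256 and 512 bytes, i.e. right after the step seen in the IB table; from 1024 bytes on, IB is faster again.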