On Tue, 27 Aug 2019 14:36:54 -0500 Cooper Burns via users <users@lists.open-mpi.org> wrote:
> Hello all,
>
> I have been doing some MPI benchmarking on an InfiniBand cluster.
>
> Specs are:
>   12 cores/node
>   2.9 GHz/core
>   InfiniBand interconnect (TCP also available)
>
> Some runtime numbers, 192 cores total (16 nodes):
>   IntelMPI:                          0.4 seconds
>   OpenMPI 3.1.3 (--mca btl ^tcp):    2.5 seconds
>   OpenMPI 3.1.3 (--mca btl ^openib): 26 seconds

A 6x difference is quite a lot... Here are a few possible reasons I can
think of:

1) The app was placed/pinned differently by the two MPIs. This alone
   would usually not cause such a big difference.

2) Bad luck with respect to collective performance. Different MPIs have
   different weak spots across the parameter space of rank count,
   transfer size, and choice of collective.

3) You're not on Mellanox InfiniBand but on QLogic/Intel (TrueScale)
   InfiniBand. There, openib is better than tcp but still not ideal:
   TrueScale's native transport is PSM.

4) You changed more than the MPI, e.g. Intel compilers + Intel MPI vs.
   gcc + Open MPI.

Quick ways to check each of these are sketched below.

/Peter K
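For (1), Open MPI can print where it binds each rank, and Intel MPI
reports its pinning at a raised debug level. A minimal check, assuming
your binary is ./app and the same 192 ranks as above:

    # Open MPI: print each rank's core binding at startup
    mpirun --report-bindings -np 192 ./app

    # Intel MPI: pinning details show up at debug level 5
    I_MPI_DEBUG=5 mpirun -np 192 ./app

If the two layouts differ (say, by-core vs by-node placement), that by
itself changes the communication pattern.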
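For (2), running a standard micro-benchmark under both stacks at the
same rank count shows whether one collective is the weak spot. A sketch
assuming the OSU micro-benchmarks are built against each MPI
(osu_allreduce/osu_bcast here, but any IMB/OSU collective test works):

    # same 192 ranks, same nodes, one collective isolated per run
    mpirun --mca btl ^tcp -np 192 ./osu_allreduce
    mpirun --mca btl ^tcp -np 192 ./osu_bcast

Comparing those curves with Intel MPI's would tell you whether the gap
is concentrated in one collective/message-size region.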
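For (3), the adapter type is visible from a compute node, and if it is
TrueScale, Open MPI can be pointed at PSM directly, assuming PSM
support was compiled in (ompi_info will tell you):

    ibv_devinfo | grep hca_id   # qib* = QLogic/TrueScale, mlx* = Mellanox
    ompi_info | grep -i psm     # was the psm MTL built?

    mpirun --mca pml cm --mca mtl psm -np 192 ./app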
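For (4), the compiler behind each wrapper is easy to confirm:

    mpicc --showme                 # Open MPI: prints the underlying compile line
    ompi_info | grep "C compiler"  # what Open MPI itself was built with
    mpiicc -show                   # Intel MPI's icc wrapper, if that was used

If the app is at all compute-bound, icc vs gcc (and their default
optimization flags) can account for part of the gap.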