I tried adding "-mca btl openib,sm,self" but it did not make any difference.
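For reference, a minimal sketch of where that flag sits on the launch line; the rank count, host file name, and executable here are placeholders, not details taken from this thread:

    # restrict Open MPI 1.6.x to the InfiniBand, shared-memory, and self BTLs
    mpirun -np 320 --hostfile hosts -mca btl openib,sm,self ./app
    # adding "-mca btl_base_verbose 30" should also print diagnostic output
    # about which BTL components actually get selected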
Jesus' e-mail this morning has got me thinking. In our system each cabinet has 224 cores, so we reach a different level of the system architecture when we go beyond 224. I got an additional data point at 256 cores and found that performance is already falling off there. Perhaps I did not build OpenMPI properly to support the Mellanox adapters used in the backplane (a quick check is sketched just before the timing table below), or I need some configuration setting similar to FAQ #19 in the Tuning/OpenFabrics section.

From: users-boun...@open-mpi.org On Behalf Of Ralph Castain
Sent: Sunday, June 09, 2013 6:48 PM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: Need advice on performance problem

Strange - it looks like classic oversubscription behavior. Another possibility is that it isn't using IB for some reason when extended to the other nodes. What does your command line look like? Have you tried adding "-mca btl openib,sm,self" just to ensure it doesn't fall back to TCP for some reason?

On Jun 9, 2013, at 4:31 PM, "Blosch, Edwin L" <edwin.l.blo...@lmco.com> wrote:

Correct. 20 nodes, dual-socket with 8 cores per socket on each node = 320.

From: users-boun...@open-mpi.org On Behalf Of Ralph Castain
Sent: Sunday, June 09, 2013 6:18 PM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: Need advice on performance problem

So, just to be sure - when you run 320 "cores", you are running across 20 nodes? Just want to ensure we are using "core" the same way - some people confuse cores with hyperthreads.

On Jun 9, 2013, at 3:50 PM, "Blosch, Edwin L" <edwin.l.blo...@lmco.com> wrote:

16. Dual-socket Xeon E5-2670. I am trying a larger model to see if the performance drop-off happens at a different number of cores, and I'm also running some intermediate core counts to refine the curve a bit. I also added mpi_show_mca_params all and, at the same time, btl_openib_use_eager_rdma 1, just to see if that does anything (a sketch of how these parameters can be passed appears at the end of this thread).

From: users-boun...@open-mpi.org On Behalf Of Ralph Castain
Sent: Sunday, June 09, 2013 5:04 PM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] Need advice on performance problem

Looks to me like things are okay through 160, and then fall apart after that point. How many cores are on a node?

On Jun 9, 2013, at 1:59 PM, "Blosch, Edwin L" <edwin.l.blo...@lmco.com> wrote:

I'm having some trouble getting good scaling with OpenMPI 1.6.4 and I don't know where to start looking. This is an InfiniBand FDR network with Sandy Bridge nodes. I am using affinity (--bind-to-core) but no other options. As the number of cores goes up, the message sizes typically go down. There seem to be lots of options in the FAQ, and I would welcome any advice on where to start. All these timings are on a completely empty system except for me. Thanks.
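Two quick checks consistent with the points above; a minimal sketch, where the rank count, host file name, and executable are placeholders rather than details from this thread:

    # 1. Confirm the openib BTL was actually built into this Open MPI 1.6.4
    #    installation (relevant to the Mellanox build question above):
    ompi_info | grep -i openib

    # 2. General form of the launch command described above (affinity via
    #    --bind-to-core, no other options):
    mpirun -np 320 --hostfile hosts --bind-to-core ./app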
MPI     | # cores | Ave. Rate | Std. Dev. % | # timings | Speedup | Efficiency
===============================================================================
MVAPICH |      16 |    8.6783 |       0.995 |         2 |  16.000 |     1.0000
MVAPICH |      48 |    8.7665 |       1.937 |         3 |  47.517 |     0.9899
MVAPICH |      80 |    8.8900 |       2.291 |         3 |  78.095 |     0.9762
MVAPICH |     160 |    8.9897 |       2.409 |         3 | 154.457 |     0.9654
MVAPICH |     320 |    8.9780 |       2.801 |         3 | 309.317 |     0.9666
MVAPICH |     480 |    8.9704 |       2.316 |         3 | 464.366 |     0.9674
MVAPICH |     640 |    9.0792 |       1.138 |         3 | 611.739 |     0.9558
MVAPICH |     720 |    9.1328 |       1.052 |         3 | 684.162 |     0.9502
MVAPICH |     800 |    9.1945 |       0.773 |         3 | 755.079 |     0.9438
OpenMPI |      16 |    8.6743 |       2.335 |         2 |  16.000 |     1.0000
OpenMPI |      48 |    8.7826 |       1.605 |         2 |  47.408 |     0.9877
OpenMPI |      80 |    8.8861 |       0.120 |         2 |  78.093 |     0.9762
OpenMPI |     160 |    8.9774 |       0.785 |         2 | 154.598 |     0.9662
OpenMPI |     320 |   12.0585 |      16.950 |         2 | 230.191 |     0.7193
OpenMPI |     480 |   14.8330 |       1.300 |         2 | 280.701 |     0.5848
OpenMPI |     640 |   17.1723 |       2.577 |         3 | 323.283 |     0.5051
OpenMPI |     720 |   18.2153 |       2.798 |         3 | 342.868 |     0.4762
OpenMPI |     800 |   19.3603 |       2.254 |         3 | 358.434 |     0.4480
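Regarding the MCA parameters mentioned earlier in the thread (mpi_show_mca_params and btl_openib_use_eager_rdma), a minimal sketch of the two usual ways to pass them; the rank count, host file name, and executable are placeholders, not taken from the thread:

    # On the mpirun command line:
    mpirun -np 320 --hostfile hosts --bind-to-core \
           -mca mpi_show_mca_params all \
           -mca btl_openib_use_eager_rdma 1 \
           ./app

    # Or persistently, via an MCA parameter file ($HOME/.openmpi/mca-params.conf):
    #   mpi_show_mca_params = all
    #   btl_openib_use_eager_rdma = 1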