If you run at 224 and things look okay, then I would suspect something in the
upper-level switch that spans cabinets. At that point, I'd have to leave it to
Mellanox to advise.
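
A quick way to test that boundary, just as a sketch (the binary and hostfile
names here are placeholders):

    mpirun -np 224 --bind-to-core --hostfile hosts ./app   # one full cabinet
    mpirun -np 256 --bind-to-core --hostfile hosts ./app   # crosses into a second cabinet

If 224 is clean and 256 is not, the cross-cabinet fabric is the prime suspect.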


On Jun 11, 2013, at 6:55 AM, "Blosch, Edwin L" <edwin.l.blo...@lmco.com> wrote:

> I tried adding "-mca btl openib,sm,self" but it did not make any difference.
>  
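> For concreteness, the run was invoked along these lines (the binary and
> hostfile names are placeholders):
> 
>     mpirun -np 320 --bind-to-core --hostfile hosts -mca btl openib,sm,self ./app
> 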
> Jesus’ e-mail this morning has got me thinking. In our system, each cabinet
> has 224 cores, and we reach a different level of the system architecture
> when we go beyond 224. I got an additional data point at 256 and found that
> performance is already falling off there. Perhaps I did not build Open MPI
> properly to support the Mellanox adapters used in the backplane, or I need
> some configuration setting similar to FAQ #19 in the Tuning/OpenFabrics
> section.
>  
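> As a sanity check on the build, assuming ompi_info and the OFED tools are on
> the path, I can at least confirm that the openib BTL was compiled in and
> that the HCA is visible to the verbs layer:
> 
>     ompi_info | grep -i openib     # should list "MCA btl: openib ..."
>     ibv_devinfo                    # should show the Mellanox HCA with ports in PORT_ACTIVE
> 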
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On 
> Behalf Of Ralph Castain
> Sent: Sunday, June 09, 2013 6:48 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] EXTERNAL: Re: Need advice on performance problem
>  
> Strange - it looks like a classic oversubscription behavior. Another 
> possibility is that it isn't using IB for some reason when extended to the 
> other nodes. What does your cmd line look like? Have you tried adding "-mca 
> btl openib,sm,self" just to ensure it doesn't use TCP for some reason?
>  
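> A complementary check, sketched here with a placeholder binary name, is to
> exclude TCP outright; that way the job aborts at startup if openib can't
> actually be used between nodes, instead of silently falling back:
> 
>     mpirun -np 320 -mca btl ^tcp ./app
> 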
>  
> On Jun 9, 2013, at 4:31 PM, "Blosch, Edwin L" <edwin.l.blo...@lmco.com> wrote:
> 
> Correct.  20 nodes, 8 cores per socket on each dual-socket node = 320.
>  
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On 
> Behalf Of Ralph Castain
> Sent: Sunday, June 09, 2013 6:18 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] EXTERNAL: Re: Need advice on performance problem
>  
> So, just to be sure - when you run 320 "cores", you are running across 20 
> nodes?
>  
> Just want to ensure we are using "core" the same way - some people confuse 
> cores with hyperthreads.
>  
> On Jun 9, 2013, at 3:50 PM, "Blosch, Edwin L" <edwin.l.blo...@lmco.com> wrote:
> 
> 16.  Dual-socket Xeon E5-2670.
>  
> I am trying a larger model to see if the performance drop-off happens at a
> different number of cores. I’m also running some intermediate core counts to
> refine the curve a bit, and I added mpi_show_mca_params all along with
> btl_openib_use_eager_rdma 1, just to see if that does anything.
>  
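> For the record, those settings go on the command line roughly like this (the
> binary name is a placeholder):
> 
>     mpirun -np 320 --bind-to-core \
>            -mca mpi_show_mca_params all \
>            -mca btl_openib_use_eager_rdma 1 ./app
> 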
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On 
> Behalf Of Ralph Castain
> Sent: Sunday, June 09, 2013 5:04 PM
> To: Open MPI Users
> Subject: EXTERNAL: Re: [OMPI users] Need advice on performance problem
>  
> Looks to me like things are okay thru 160, and then things fall apart after 
> that point. How many cores are on a node?
>  
>  
> On Jun 9, 2013, at 1:59 PM, "Blosch, Edwin L" <edwin.l.blo...@lmco.com> wrote:
> 
> I’m having some trouble getting good scaling with Open MPI 1.6.4, and I don’t
> know where to start looking. This is an InfiniBand FDR network with Sandy
> Bridge nodes. I am using affinity (--bind-to-core) but no other options. As
> the number of cores goes up, the message sizes typically go down. There seem
> to be lots of options in the FAQ, and I would welcome any advice on where to
> start. All these timings are on a completely empty system except for me.
>  
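> As a concrete starting point, this is the shape of the invocation, with
> --report-bindings added to confirm the binding is doing what I intend (the
> binary and hostfile names are placeholders):
> 
>     mpirun -np 320 --bind-to-core --report-bindings --hostfile hosts ./app
> 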
> Thanks
>  
>  
>     MPI     | # cores | Ave. Rate | Std. Dev. % | # timings |  Speedup  | Efficiency
> ====================================================================================
> MVAPICH     |    16   |   8.6783  |    0.995 %  |     2     |   16.000  |   1.0000
> MVAPICH     |    48   |   8.7665  |    1.937 %  |     3     |   47.517  |   0.9899
> MVAPICH     |    80   |   8.8900  |    2.291 %  |     3     |   78.095  |   0.9762
> MVAPICH     |   160   |   8.9897  |    2.409 %  |     3     |  154.457  |   0.9654
> MVAPICH     |   320   |   8.9780  |    2.801 %  |     3     |  309.317  |   0.9666
> MVAPICH     |   480   |   8.9704  |    2.316 %  |     3     |  464.366  |   0.9674
> MVAPICH     |   640   |   9.0792  |    1.138 %  |     3     |  611.739  |   0.9558
> MVAPICH     |   720   |   9.1328  |    1.052 %  |     3     |  684.162  |   0.9502
> MVAPICH     |   800   |   9.1945  |    0.773 %  |     3     |  755.079  |   0.9438
> OpenMPI     |    16   |   8.6743  |    2.335 %  |     2     |   16.000  |   1.0000
> OpenMPI     |    48   |   8.7826  |    1.605 %  |     2     |   47.408  |   0.9877
> OpenMPI     |    80   |   8.8861  |    0.120 %  |     2     |   78.093  |   0.9762
> OpenMPI     |   160   |   8.9774  |    0.785 %  |     2     |  154.598  |   0.9662
> OpenMPI     |   320   |  12.0585  |   16.950 %  |     2     |  230.191  |   0.7193
> OpenMPI     |   480   |  14.8330  |    1.300 %  |     2     |  280.701  |   0.5848
> OpenMPI     |   640   |  17.1723  |    2.577 %  |     3     |  323.283  |   0.5051
> OpenMPI     |   720   |  18.2153  |    2.798 %  |     3     |  342.868  |   0.4762
> OpenMPI     |   800   |  19.3603  |    2.254 %  |     3     |  358.434  |   0.4480