On Jan 6, 2011, at 11:23 PM, Gilbert Grosdidier wrote:
>
> lstopo
> Machine (35GB)
>   NUMANode L#0 (P#0 18GB) + Socket L#0 + L3 L#0 (8192KB)
>     L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
>       PU L#0 (P#0)
>       PU L#1 (P#8)
>     L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1
>       PU L#2 (P#1)
>       PU L#3 (P#9)
>     L2 L#2 (256KB) + L1 L#2 (32KB) + Core L#2
>       PU L#4 (P#2)
>       PU L#5 (P#10)
>     L2 L#3 (256KB) + L1 L#3 (32KB) + Core L#3
>       PU L#6 (P#3)
>       PU L#7 (P#11)
> [snip]
Well, this might disprove my theory. :-\  The OS indexing is not contiguous on the hyperthreads, so I might be wrong about what happened here.

Try this:

    mpirun --mca mpi_paffinity_alone 1 hwloc-bind --get

You can even run that on just one node; let's see what you get. This will tell us what each process is *actually* bound to.

hwloc-bind --get will report a bitmask of the P#'s from above. So if we see 001, 010, 011, etc., then my theory of OMPI binding 1 proc per hyperthread (vs. 1 proc per core) is incorrect.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
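For concreteness, here is a minimal sketch of the check described above, assuming 4 ranks on a single node. The masks shown are hypothetical output, not from the reporter's machine; hwloc-bind --get prints each process's binding cpuset as a hex bitmask over the P# indices:

    $ mpirun -np 4 --mca mpi_paffinity_alone 1 hwloc-bind --get
    0x00000001    # bit for P#0 -> rank bound to core 0, 1st hyperthread
    0x00000002    # bit for P#1 -> rank bound to core 1, 1st hyperthread
    0x00000004    # bit for P#2 -> rank bound to core 2, 1st hyperthread
    0x00000008    # bit for P#3 -> rank bound to core 3, 1st hyperthread

One set bit per rank, each bit on a different core per the lstopo map above, would mean 1 proc per core. If instead some rank reported 0x00000100 (P#8, the 2nd hyperthread of core 0), two ranks would be sharing one physical core, i.e., 1 proc per hyperthread.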