On Jan 6, 2011, at 11:23 PM, Gilbert Grosdidier wrote:
>
> lstopo
> Machine (35GB)
>   NUMANode L#0 (P#0 18GB) + Socket L#0 + L3 L#0 (8192KB)
>     L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
>       PU L#0 (P#0)
>       PU L#1 (P#8)
>     L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1
>       PU L#2 (P#1)
>       PU L#3 (P#9)
>     L2 L#2 (256KB) + L1 L#2 (32KB) + Core L#2
>       PU L#4 (P#2)
>       PU L#5 (P#10)
>     L2 L#3 (256KB) + L1 L#3 (32KB) + Core L#3
>       PU L#6 (P#3)
>       PU L#7 (P#11)
> [snip]
Well, this might disprove my theory. :-\  The OS indexing is not contiguous on the hyperthreads, so I might be wrong about what happened here.

Try this:

    mpirun --mca mpi_paffinity_alone 1 hwloc-bind --get

You can even run that on just one node; let's see what you get. This will tell us what each process is *actually* bound to.

hwloc-bind --get will report a bitmask of the P#'s from above. So if we see 001, 010, 011, etc., then my theory of OMPI binding 1 proc per hyperthread (vs. 1 proc per core) is incorrect.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
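For concreteness, here is a minimal sketch of the check described above, assuming 4 ranks on a single node. The masks shown are hypothetical output, not from the reporter's machine; hwloc-bind --get prints each process's binding cpuset as a hex bitmask over the P# indices:

    $ mpirun -np 4 --mca mpi_paffinity_alone 1 hwloc-bind --get
    0x00000001    # bit for P#0 -> rank bound to core 0, 1st hyperthread
    0x00000002    # bit for P#1 -> rank bound to core 1, 1st hyperthread
    0x00000004    # bit for P#2 -> rank bound to core 2, 1st hyperthread
    0x00000008    # bit for P#3 -> rank bound to core 3, 1st hyperthread

One set bit per rank, each bit on a different core per the lstopo map above, would mean 1 proc per core. If instead some rank reported 0x00000100 (P#8, the 2nd hyperthread of core 0), two ranks would be sharing one physical core, i.e., 1 proc per hyperthread.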