Hi Jeff,
Here is the output of lstopo on one of the workers (thanks Jean-Christophe):
> lstopo
Machine (35GB)
  NUMANode L#0 (P#0 18GB) + Socket L#0 + L3 L#0 (8192KB)
    L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
      PU L#0 (P#0)
      PU L#1 (P#8)
    L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1
      PU L#2 (P#1)
      PU L#3 (P#9)
    L2 L#2 (256KB) + L1 L#2 (32KB) + Core L#2
      PU L#4 (P#2)
      PU L#5 (P#10)
    L2 L#3 (256KB) + L1 L#3 (32KB) + Core L#3
      PU L#6 (P#3)
      PU L#7 (P#11)
  NUMANode L#1 (P#1 18GB) + Socket L#1 + L3 L#1 (8192KB)
    L2 L#4 (256KB) + L1 L#4 (32KB) + Core L#4
      PU L#8 (P#4)
      PU L#9 (P#12)
    L2 L#5 (256KB) + L1 L#5 (32KB) + Core L#5
      PU L#10 (P#5)
      PU L#11 (P#13)
    L2 L#6 (256KB) + L1 L#6 (32KB) + Core L#6
      PU L#12 (P#6)
      PU L#13 (P#14)
    L2 L#7 (256KB) + L1 L#7 (32KB) + Core L#7
      PU L#14 (P#7)
      PU L#15 (P#15)
Tests with --bind-to-core are under way...
What is your conclusion, please?
Thanks, G.
On 06/01/2011 23:16, Jeff Squyres wrote:
On Jan 6, 2011, at 5:07 PM, Gilbert Grosdidier wrote:
Yes Jeff, I'm pretty sure indeed that hyperthreading is enabled, since 16 CPUs
are visible in the /proc/cpuinfo pseudo-file, while it's an 8-core Nehalem node.
However, I always carefully checked that only 8 processes are running on each
node. Could it be that they are assigned to 8 hyperthreads but only 4 cores,
for example? Is this actually possible with paffinity set to 1?
Yes. I actually had this happen to another user recently; I should add this to
the FAQ... (/me adds to to-do list)
Here's what I'm guessing is happening: OMPI's paffinity_alone algorithm is
currently pretty stupid. It simply assigns the first MPI process on the node
to OS processor ID 0. It then assigns the second MPI process on the node to
OS processor ID 1. ...and so on.
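A toy sketch of that naive placement (this is only an illustration of the behavior described above, not OMPI's actual code; the function name is made up):

```python
# Hypothetical sketch of the naive paffinity_alone placement described
# above: local rank i on a node is pinned to OS processor ID i, with no
# awareness of which IDs are hyperthread siblings of the same core.
def naive_placement(num_local_ranks):
    """Return {local_rank: os_processor_id} for one node."""
    return {rank: rank for rank in range(num_local_ranks)}

placement = naive_placement(8)
# Ranks 0..7 land on OS processor IDs 0..7. Whether those IDs are 8
# distinct cores, or 4 cores x 2 hyperthreads, depends entirely on how
# the BIOS/kernel numbered the PUs -- the algorithm never checks.
```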
However, if hyperthreading is enabled, OS processor IDs 0 and 1 might be 2
hyperthreads on the same core. In that case, OMPI has effectively just bound 2
processes to the same core. Ouch!
The output of lstopo can verify whether this is happening: look to see if
processor IDs 0 through 7 are on the same 4 cores.
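Applying that check to the lstopo output earlier in the thread (the PU pairs below are transcribed from that listing; this is just a one-off sanity-check script, not part of any tool):

```python
# (P# of first hyperthread, P# of second hyperthread) for each core,
# transcribed from the lstopo output earlier in the thread.
core_to_pus = {
    0: (0, 8),  1: (1, 9),  2: (2, 10), 3: (3, 11),   # socket 0
    4: (4, 12), 5: (5, 13), 6: (6, 14), 7: (7, 15),   # socket 1
}

# Invert the table: OS processor ID -> core number.
pu_to_core = {pu: core for core, pus in core_to_pus.items() for pu in pus}

# The check described above: do OS processor IDs 0..7 cover 8 distinct
# cores, or do some of them collide on the same core?
cores_used = {pu_to_core[pu] for pu in range(8)}
print(len(cores_used))  # 8 distinct cores on this particular machine
```

On this particular box, P#0 through P#7 are the first hyperthreads of eight distinct cores, so the collision scenario would not occur here; on machines where sibling hyperthreads are numbered 0/1, 2/3, ..., the same check would report only 4 cores.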
Instead of paffinity_alone, use the mpirun --bind-to-core option; that should
bind each MPI process to (the first hyperthread in) its own core.
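The effect of that binding policy on a topology like this one can be sketched as follows (the PU pairs are again transcribed from the lstopo output above; the helper name is made up, and this is an illustration of the idea, not OMPI code):

```python
# Sketch of "bind each rank to the first hyperthread of its own core",
# i.e. the effect described for --bind-to-core. core_to_pus is
# transcribed from the lstopo output earlier in the thread.
core_to_pus = {
    0: (0, 8),  1: (1, 9),  2: (2, 10), 3: (3, 11),
    4: (4, 12), 5: (5, 13), 6: (6, 14), 7: (7, 15),
}

def bind_to_core(num_local_ranks):
    """Map local rank -> OS processor ID of the first PU of core `rank`."""
    return {rank: core_to_pus[rank][0] for rank in range(num_local_ranks)}

bindings = bind_to_core(8)
# Every rank gets a distinct core, regardless of how the hyperthread
# siblings happen to be numbered -- which is the point of the option.
```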
Sidenote: many improvements are coming to our processor affinity system over
the next few releases... See my slides from the Open MPI BOF at SC'10 for some
discussion of what's coming:
http://www.open-mpi.org/papers/sc-2010/