Matt, I'm pretty confident in saying this is entirely in Intel MPI land:
aknister@borgj157:~> I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=enable mpiexec.hydra -np 48 -ppn 24 -print-rank-map /bin/true
(borgj157:0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27)
(borgj164:28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47)

aknister@borgj157:~> I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=disable mpiexec.hydra -np 48 -ppn 24 -print-rank-map /bin/true
(borgj157:0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23)
(borgj164:24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47)

However, if a machinefile argument is passed to mpiexec.hydra (which mpirun does by default), the I_MPI_JOB_RESPECT_PROCESS_PLACEMENT variable isn't respected (see below). Maybe we need an I_MPI_JOB_RESPECT_I_MPI_JOB_RESPECT_PROCESS_PLACEMENT_VARIABLE variable.

aknister@borgj157:~> I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=enable mpiexec.hydra -machinefile $PBS_NODEFILE -np 48 -ppn 24 --print-rank-map true
(borgj157:0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27)
(borgj164:28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47)

aknister@borgj157:~> I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=disable mpiexec.hydra -machinefile $PBS_NODEFILE -np 48 -ppn 24 --print-rank-map true
(borgj157:0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27)
(borgj164:28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47)

Feel free to open an in-house (Footprints) ticket if you'd like to dig into this a little more and find a workable solution on discover.

-Aaron
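[The helloWorld.exe exercised throughout the quoted runs below was never posted to the thread. A minimal MPI program along these lines, a sketch rather than the actual source, would produce the same "Process N of M is on host" output:]

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, namelen;
    char name[MPI_MAX_PROCESSOR_NAME] = {0};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &namelen);

    /* Produces the "Process N of M is on host" lines quoted below. */
    printf("Process %d of %d is on %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}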
On Thu, Apr 30, 2015 at 1:32 PM, Thompson, Matt [SCIENCE SYSTEMS AND APPLICATIONS INC] <[email protected]> wrote:

> Aaron, et al,
>
> No. I tried setting various flags, but nothing seemed to change.
>
> Well, that's not true. Per SLURM's website:
>
> http://slurm.schedmd.com/mpi_guide.html#intel_mpi
>
> I did try a more extreme example. This time, I had 12 nodes. If I run as below, I get the same answer (28, then 20 with mpirun). So I thought, well, let's try srun:
>
>> (1128) $ setenv I_MPI_PMI_LIBRARY /usr/slurm/lib64/libpmi.so
>> (1129) $ srun -n 48 ./helloWorld.exe | sort -k2 -g
>> srun.slurm: cluster configuration lacks support for cpu binding
>> Process 0 of 48 is on borgj102
>> Process 1 of 48 is on borgj102
>> Process 2 of 48 is on borgj102
>> Process 3 of 48 is on borgj102
>> Process 4 of 48 is on borgj105
>> Process 5 of 48 is on borgj105
>> Process 6 of 48 is on borgj105
>> Process 7 of 48 is on borgj105
>> Process 8 of 48 is on borgj106
>> Process 9 of 48 is on borgj106
>> Process 10 of 48 is on borgj106
>> Process 11 of 48 is on borgj106
>> Process 12 of 48 is on borgj108
>> Process 13 of 48 is on borgj108
>> Process 14 of 48 is on borgj108
>> Process 15 of 48 is on borgj108
>> Process 16 of 48 is on borgj111
>> Process 17 of 48 is on borgj111
>> Process 18 of 48 is on borgj111
>> Process 19 of 48 is on borgj111
>> Process 20 of 48 is on borgj112
>> Process 21 of 48 is on borgj112
>> Process 22 of 48 is on borgj112
>> Process 23 of 48 is on borgj112
>> Process 24 of 48 is on borgj130
>> Process 25 of 48 is on borgj130
>> Process 26 of 48 is on borgj130
>> Process 27 of 48 is on borgj130
>> Process 28 of 48 is on borgj133
>> Process 29 of 48 is on borgj133
>> Process 30 of 48 is on borgj133
>> Process 31 of 48 is on borgj133
>> Process 32 of 48 is on borgj134
>> Process 33 of 48 is on borgj134
>> Process 34 of 48 is on borgj134
>> Process 35 of 48 is on borgj134
>> Process 36 of 48 is on borgj140
>> Process 37 of 48 is on borgj140
>> Process 38 of 48 is on borgj140
>> Process 39 of 48 is on borgj140
>> Process 40 of 48 is on borgj143
>> Process 41 of 48 is on borgj143
>> Process 42 of 48 is on borgj143
>> Process 43 of 48 is on borgj143
>> Process 44 of 48 is on borgj145
>> Process 45 of 48 is on borgj145
>> Process 46 of 48 is on borgj145
>> Process 47 of 48 is on borgj145
>
> That looks like a very SLURM-y output. Load balancing everywhere! This seems to support the "mpirun did it" theory.
>
> (Note: Do *not* have I_MPI_PMI_LIBRARY=/usr/slurm/lib64/libpmi.so set when you mpirun. You get fun errors!)
>
> On 04/30/2015 11:09 AM, Aaron Knister wrote:
>
>> Hi Matt,
>>
>> I happen to know the admins of that cluster ;-) I'll take a look and get back to you. Also, are you setting any additional I_MPI variables?
>>
>> -Aaron
>>
>> Sent from my iPhone
>>
>> On Apr 30, 2015, at 10:53 AM, Thompson, Matt [SCIENCE SYSTEMS AND APPLICATIONS INC] (GSFC-610.1) <[email protected]> wrote:
>>
>>> All,
>>>
>>> (Note: I'm also asking this on Intel's forums)
>>>
>>> I'm hoping you can help me with a question. Namely, I'm on a cluster that uses SLURM, and let's say I ask for two 28-core Haswell nodes to run interactively and I get them.
>>> Great, so my environment now has things like:
>>>
>>> SLURM_NTASKS_PER_NODE=28
>>> SLURM_TASKS_PER_NODE=28(x2)
>>> SLURM_JOB_CPUS_PER_NODE=28(x2)
>>> SLURM_CPUS_ON_NODE=28
>>>
>>> Now, let's run a simple HelloWorld (using Intel MPI 5.0.3.048) on, say, 48 processes (and pipe through sort to see things a bit better):
>>>
>>> (1047) $ mpirun -np 48 -print-rank-map ./helloWorld.exe | sort -k2 -g
>>> srun.slurm: cluster configuration lacks support for cpu binding
>>> (borgj102:0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27)
>>> (borgj105:28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47)
>>> Process 0 of 48 is on borgj102
>>> Process 1 of 48 is on borgj102
>>> Process 2 of 48 is on borgj102
>>> Process 3 of 48 is on borgj102
>>> Process 4 of 48 is on borgj102
>>> Process 5 of 48 is on borgj102
>>> Process 6 of 48 is on borgj102
>>> Process 7 of 48 is on borgj102
>>> Process 8 of 48 is on borgj102
>>> Process 9 of 48 is on borgj102
>>> Process 10 of 48 is on borgj102
>>> Process 11 of 48 is on borgj102
>>> Process 12 of 48 is on borgj102
>>> Process 13 of 48 is on borgj102
>>> Process 14 of 48 is on borgj102
>>> Process 15 of 48 is on borgj102
>>> Process 16 of 48 is on borgj102
>>> Process 17 of 48 is on borgj102
>>> Process 18 of 48 is on borgj102
>>> Process 19 of 48 is on borgj102
>>> Process 20 of 48 is on borgj102
>>> Process 21 of 48 is on borgj102
>>> Process 22 of 48 is on borgj102
>>> Process 23 of 48 is on borgj102
>>> Process 24 of 48 is on borgj102
>>> Process 25 of 48 is on borgj102
>>> Process 26 of 48 is on borgj102
>>> Process 27 of 48 is on borgj102
>>> Process 28 of 48 is on borgj105
>>> Process 29 of 48 is on borgj105
>>> Process 30 of 48 is on borgj105
>>> Process 31 of 48 is on borgj105
>>> Process 32 of 48 is on borgj105
>>> Process 33 of 48 is on borgj105
>>> Process 34 of 48 is on borgj105
>>> Process 35 of 48 is on borgj105
>>> Process 36 of 48 is on borgj105
>>> Process 37 of 48 is on borgj105
>>> Process 38 of 48 is on borgj105
>>> Process 39 of 48 is on borgj105
>>> Process 40 of 48 is on borgj105
>>> Process 41 of 48 is on borgj105
>>> Process 42 of 48 is on borgj105
>>> Process 43 of 48 is on borgj105
>>> Process 44 of 48 is on borgj105
>>> Process 45 of 48 is on borgj105
>>> Process 46 of 48 is on borgj105
>>> Process 47 of 48 is on borgj105
>>>
>>> As you can see, the first 28 processes are on node 1 and the last 20 are on node 2. Okay. Now I want to do some load balancing, so I want 24 on each.
>>> In the past, I always used -perhost and it worked, but now:
>>>
>>> (1048) $ mpirun -np 48 -perhost 24 -print-rank-map ./helloWorld.exe | sort -k2 -g
>>> srun.slurm: cluster configuration lacks support for cpu binding
>>> (borgj102:0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27)
>>> (borgj105:28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47)
>>> Process 0 of 48 is on borgj102
>>> Process 1 of 48 is on borgj102
>>> Process 2 of 48 is on borgj102
>>> Process 3 of 48 is on borgj102
>>> Process 4 of 48 is on borgj102
>>> Process 5 of 48 is on borgj102
>>> Process 6 of 48 is on borgj102
>>> Process 7 of 48 is on borgj102
>>> Process 8 of 48 is on borgj102
>>> Process 9 of 48 is on borgj102
>>> Process 10 of 48 is on borgj102
>>> Process 11 of 48 is on borgj102
>>> Process 12 of 48 is on borgj102
>>> Process 13 of 48 is on borgj102
>>> Process 14 of 48 is on borgj102
>>> Process 15 of 48 is on borgj102
>>> Process 16 of 48 is on borgj102
>>> Process 17 of 48 is on borgj102
>>> Process 18 of 48 is on borgj102
>>> Process 19 of 48 is on borgj102
>>> Process 20 of 48 is on borgj102
>>> Process 21 of 48 is on borgj102
>>> Process 22 of 48 is on borgj102
>>> Process 23 of 48 is on borgj102
>>> Process 24 of 48 is on borgj102
>>> Process 25 of 48 is on borgj102
>>> Process 26 of 48 is on borgj102
>>> Process 27 of 48 is on borgj102
>>> Process 28 of 48 is on borgj105
>>> Process 29 of 48 is on borgj105
>>> Process 30 of 48 is on borgj105
>>> Process 31 of 48 is on borgj105
>>> Process 32 of 48 is on borgj105
>>> Process 33 of 48 is on borgj105
>>> Process 34 of 48 is on borgj105
>>> Process 35 of 48 is on borgj105
>>> Process 36 of 48 is on borgj105
>>> Process 37 of 48 is on borgj105
>>> Process 38 of 48 is on borgj105
>>> Process 39 of 48 is on borgj105
>>> Process 40 of 48 is on borgj105
>>> Process 41 of 48 is on borgj105
>>> Process 42 of 48 is on borgj105
>>> Process 43 of 48 is on borgj105
>>> Process 44 of 48 is on borgj105
>>> Process 45 of 48 is on borgj105
>>> Process 46 of 48 is on borgj105
>>> Process 47 of 48 is on borgj105
>>>
>>> Huh. No change and still 28,20. Do you know if there is a way to "override" what appears to be SLURM beating the -perhost flag? I suppose there is that srun.slurm warning being thrown, but that usually is a warning for more "tasks-per-core" sorts of manipulations.
>>>
>>> Thanks,
>>> Matt
>>> --
>>> Matt Thompson SSAI, Sr Software Test Engr
>>> NASA GSFC, Global Modeling and Assimilation Office
>>> Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD 20771
>>> Phone: 301-614-6712 Fax: 301-614-6246
>
> --
> Matt Thompson SSAI, Sr Software Test Engr
> NASA GSFC, Global Modeling and Assimilation Office
> Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD 20771
> Phone: 301-614-6712 Fax: 301-614-6246
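[For anyone reproducing this: placement can also be checked from inside the program rather than with -print-rank-map. The following is a sketch, not from the thread, that gathers each rank's hostname on rank 0 and prints a per-node tally, so a 28/20 versus 24/24 split is obvious at a glance. It assumes ranks on the same node are contiguous in rank order, which holds for every run quoted above:]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, namelen;
    char name[MPI_MAX_PROCESSOR_NAME] = {0};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &namelen);

    char *all = NULL;
    if (rank == 0)
        all = malloc((size_t)size * MPI_MAX_PROCESSOR_NAME);

    /* Every rank sends its fixed-width hostname buffer to rank 0. */
    MPI_Gather(name, MPI_MAX_PROCESSOR_NAME, MPI_CHAR,
               all, MPI_MAX_PROCESSOR_NAME, MPI_CHAR,
               0, MPI_COMM_WORLD);

    if (rank == 0) {
        /* Count consecutive runs of identical hostnames (assumes each
         * node's ranks are contiguous, as in the outputs above). */
        int count = 1;
        for (int i = 1; i <= size; i++) {
            if (i < size && strcmp(all + i * MPI_MAX_PROCESSOR_NAME,
                                   all + (i - 1) * MPI_MAX_PROCESSOR_NAME) == 0) {
                count++;
            } else {
                printf("%s: %d ranks\n",
                       all + (i - 1) * MPI_MAX_PROCESSOR_NAME, count);
                count = 1;
            }
        }
        free(all);
    }

    MPI_Finalize();
    return 0;
}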
