Ahhh... try adding "--mpi=pmi" or "--mpi=pmi2" to your srun command.

Andy

p.s. If this fixes it, you might want to set the mpi default in slurm.conf appropriately.
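
(For concreteness, and assuming the PMI2 plugin is the one needed here, that
would be something like

$ srun --mpi=pmi2 -n X program

and, to make it the default, a line such as

MpiDefault=pmi2

in slurm.conf.)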

On 9/23/2014 3:07 PM, Lev Givon wrote:
Received from Andy Riebs on Tue, Sep 23, 2014 at 02:57:49PM EDT:
On 9/23/2014 2:49 PM, Lev Givon wrote:
I have OpenMPI 1.8.2 compiled with PMI support enabled and slurm 2.6.5
installed on an 8-CPU machine running Ubuntu 14.04.1. I noticed that
attempting to run any program compiled against said OpenMPI installation
via srun using

srun -n X mpiexec program

with X > 1 is effectively equivalent to running

mpiexec -np X program

X times. Is this behavior expected? Running the program via sbatch causes
only a single run over X MPI processes.
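
(For reference, by "via sbatch" I mean a batch script roughly like the
following, where the script and program names are placeholders:

#!/bin/bash
#SBATCH -n X

mpiexec program

submitted with "sbatch job.sh".)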

Lev, if you drop "mpiexec" from your command line, you should see
the desired behaviour, i.e.,

$ srun -n X program
Doing so does launch the program only X times, but the communicator size seen
by each instance is 1, e.g., for the proverbial "Hello world" program, the
output

Hello, world, I am 0 of 1 (myhost)

is generated X times.
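
(For context, the "Hello world" program here is the usual minimal MPI test; a
sketch along the following lines, which is illustrative rather than the exact
source used, prints the rank, communicator size, and host name:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);

    /* With working PMI support under "srun -n X program", size should
       report X; here it reports 1 for every instance. */
    printf("Hello, world, I am %d of %d (%s)\n", rank, size, host);

    MPI_Finalize();
    return 0;
}
)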

Incidentally, I verified that OpenMPI was built against PMI successfully:

$ ldd /opt/openmpi-1.8.2/bin/mpiexec  | grep pmi
         libpmi.so.0 => /usr/lib/libpmi.so.0 (0x00002aed18f66000)
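
(Another check that should work, if I'm not mistaken, is looking for
PMI-enabled components in the build itself, e.g.

$ ompi_info | grep pmi

which should list components such as "MCA ess: pmi" when OpenMPI was
configured with --with-pmi.)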

(Also, be sure to recognize the difference between "-n" (the number of tasks)
and "-N" (the number of nodes)!)
