Ahhh... try adding "--mpi=pmi" or "--mpi=pmi2" to your srun command.
Andy
P.S. If this fixes it, you might want to set the MPI default in
slurm.conf appropriately.
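For example, assuming pmi2 is the plugin that does the trick here (just a
sketch, not verified on your box):

$ srun --mpi=pmi2 -n X program

and, to make it the default so the flag isn't needed on every srun, in
slurm.conf:

MpiDefault=pmi2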
On 9/23/2014 3:07 PM, Lev Givon wrote:
Received from Andy Riebs on Tue, Sep 23, 2014 at 02:57:49PM EDT:
On 9/23/2014 2:49 PM, Lev Givon wrote:
I have OpenMPI 1.8.2 compiled with PMI support enabled and Slurm 2.6.5
installed on an 8-CPU machine running Ubuntu 14.04.1. I noticed that
attempting to run any program compiled against said OpenMPI installation
via srun using

srun -n X mpiexec program

with X > 1 is effectively equivalent to running

mpiexec -np X program

X times. Is this behavior expected? Running the program via sbatch results
in only one run over X MPI processes.
Lev, if you drop "mpiexec" from your command line, you should see
the desired behaviour, i.e.,
$ srun -n X program
Doing so does launch the program only X times, but the communicator size
seen by each instance is 1; e.g., for the proverbial "Hello world" program,
the output

Hello, world, I am 0 of 1 (myhost)

is generated X times.
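For reference, a minimal MPI "Hello world" of that form (a generic sketch,
not necessarily the exact program used here) looks like:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* rank of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* size of MPI_COMM_WORLD; should be X, not 1 */
    MPI_Get_processor_name(host, &len);
    printf("Hello, world, I am %d of %d (%s)\n", rank, size, host);
    MPI_Finalize();
    return 0;
}

When srun and OpenMPI agree on PMI, size is X; the "0 of 1" output above
means each instance is running as a singleton.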
Incidentally, I verified that OpenMPI was built against PMI successfully:
$ ldd /opt/openmpi-1.8.2/bin/mpiexec | grep pmi
libpmi.so.0 => /usr/lib/libpmi.so.0 (0x00002aed18f66000)
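Another check (assuming the usual component names for a PMI-enabled 1.8
build) is to ask ompi_info which MCA components were built in:

$ /opt/openmpi-1.8.2/bin/ompi_info | grep -i pmi

If PMI support made it into the build, this should list PMI-based
components (e.g., in the ess and pubsub frameworks).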
(Also, be sure to recognize the difference between "-n" and "-N"!)
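That is, "-n" sets the number of tasks and "-N" the number of nodes, e.g.:

$ srun -N 2 -n 8 program   # 8 tasks spread across 2 nodes
$ srun -N 8 program        # 8 nodes, one task per node by default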