I've been documenting for my users how to move from Torque to SLURM and what that means for running MPI jobs. Based on the SLURM documentation I've come up with the following:

slurm.conf:

    MpiDefault=none
    MpiParams=ports=30000-39999

Then users run...

OpenMPI:

    srun --mpi=openmpi --resv-ports /path/to/executable

MVAPICH2:

    srun --mpi=none /path/to/executable

To test this and ensure I'm not giving bad instructions, I've been running small 2-node HPL tests (to also test IB functionality), and this is when things go bad:

    $ salloc -N2 --ntasks-per-node=32 --cpus-per-task=1 --mem-per-cpu=1900 -p mpi-core32
    $ module load gcc openmpi openblas
    $ srun --mpi=openmpi --resv-ports $HOME/hpl/bin/openblas_openmpi/xhpl
    <LOTS of errors>
    HPL ERROR from process # 0, on line 419 of function HPL_pdinfo:
    >>> Need at least 64 processes for these tests <<<

Then...

    $ srun --mpi=pmi2 --resv-ports $HOME/hpl/bin/openblas_openmpi/xhpl
    < no errors >

Our install of OpenMPI was compiled like so:

    ../openmpi-1.8.2/configure --prefix=/apps/gcc-4.8.2/openmpi/1.8.2 \
        --libdir=/apps/gcc-4.8.2/openmpi/1.8.2/lib64 \
        --with-slurm --with-pmi --with-verbs \
        --enable-shared --enable-static \
        CFLAGS=-m64 CXXFLAGS=-m64 FFLAGS=-m64 FCFLAGS=-m64

The SLURM documentation [1] seems to indicate that the --mpi type should be OpenMPI. I'm finding, though, that if I set MpiDefault=pmi2 then I'm able to run both OpenMPI and MVAPICH2 without the "--mpi" argument or the "--resv-ports" argument. MVAPICH2 was compiled using "--with-pm=no --with-pmi=slurm".

Is it the case that if OpenMPI is compiled with "--with-pmi" and "--with-slurm" then the pmi2 MPI plugin should be used? Is "--resv-ports" necessary given how OpenMPI was compiled?

Thanks,
- Trey

[1] http://slurm.schedmd.com/mpi_guide.html#open_mpi

=============================
Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: [email protected]
Jabber: [email protected]
