Hey, Shawn! Nice to see you here too:)
I'd like to add that I found the reason for this behavior on your cluster.
It lies in the way OpenMPI launches its processes. It uses srun with just one
slot on each node to launch orted. In this case SLURM binds orted to just
one core, and all children of orted (the actual MPI processes) also run on
that core. I wrote to the OpenMPI developers about this, but we haven't found
a solution yet.
But I may have found a way; I'll check it in the near future.
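
One way to confirm this on a node is to have every rank print the CPUs it is allowed to run on. The sketch below assumes a Linux node where /proc/self/status has a Cpus_allowed_list field; the helper name allowed_cpus and the script name show_cpus.sh are only illustrative.

```shell
#!/bin/bash
# allowed_cpus: hypothetical helper that prints the CPU list the calling
# process may run on, as reported by the Linux kernel (e.g. "0-7" or "3").
allowed_cpus() {
    awk '/^Cpus_allowed_list:/ {print $2}' /proc/self/status
}

# One line per process: node name and the CPUs it is allowed to use.
echo "$(uname -n): $(allowed_cpus)"
```

Saved as show_cpus.sh and started inside the job with something like "mpirun ./show_cpus.sh", it should show the same single core for every rank on a node if the ranks inherited orted's binding.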
A more elegant solution is to configure OpenMPI with PMI support and launch
the MPI program directly with srun, like this:
srun ./mpiname
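
For reference, a minimal sketch of such a build, assuming SLURM's PMI headers and library are installed under /usr and that installing into $HOME/openmpi is acceptable (both paths are assumptions, adjust them for your site):

```shell
# Build Open MPI with SLURM and PMI support.
# /usr and $HOME/openmpi are assumed locations, not taken from this thread.
./configure --prefix="$HOME/openmpi" --with-slurm --with-pmi=/usr
make -j4 && make install

# Inside a SLURM allocation, srun then starts the ranks itself, so SLURM
# applies its own binding to each task instead of to a single orted.
srun ./mpiname
```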
On Saturday, 1 March 2014, L. Shawn Matott wrote:
>
> On our cluster we use SLURM v2.6.3 with cpusets enabled. We sometimes see
> problems with openmpi and incorrect cpu pinning. As a workaround we use the
> following bit of bash code to manually assemble an openmpi rankfile, switch
> from slurm to ssh as the process launch module, and finally launch using
> mpirun instead of srun. Hope this is helpful to someone.....
>
> ----
> L. Shawn Matott, PhD
> Computational Scientist
> University at Buffalo,
> Center for Computational Research
> 701 Ellicott Street, Buffalo, New York 14203
>
> # ============================================================
> # create rank file to explicitly bind cores
> echo "creating hostfile and rankfile"
> uid=`id -u`
> jid=$SLURM_JOB_ID
> nodes=`nodeset -e $SLURM_NODELIST`
>
> # trigger creation of cpuset information and save to working dir
> srun bash -c "cat /cgroup/cpuset/slurm/uid_${uid}/job_${jid}/cpuset.cpus > cpus.\`hostname\`.$SLURM_JOB_ID"
>
> RANKFILE=rankfile.$$
> NODEFILE=nodefile.$$
>
> rm -f $RANKFILE
> rm -f $NODEFILE
> rank=0
> for i in ${nodes}; do
>     # extract space-separated list of assigned cpus
>     cpus=`cat cpus.${i}.${SLURM_JOB_ID}`
>     cpus=`nodeset -Re $cpus`
>     # add cpu assignments to the rank file
>     for j in ${cpus}; do
>         echo "rank ${rank}=$i slot=$j" >> $RANKFILE
>         echo "$i" >> $NODEFILE
>         rank=`expr $rank + 1`
>         if [ "$rank" == "$SLURM_NPROCS" ]; then
>             break;
>         fi
>     done
>     if [ "$rank" == "$SLURM_NPROCS" ]; then
>         break;
>     fi
> done
>
> # use ssh instead of slurm as the launcher
> # the rankfile that was just created will ensure cpusets are still honored.
> export OMPI_MCA_plm=rsh
>
> # launch application using mpirun
> echo "Launching application using mpirun"
> mpirun \
>     --hostfile $NODEFILE \
>     --rankfile $RANKFILE \
>     --prefix $OMPI \
>     --n $SLURM_NPROCS \
>     --display-map \
>     --verbose $EXE $ARGS
> # ============================================================
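
The rankfile loop above relies on "nodeset -Re" (from ClusterShell) to expand a cpuset range such as 0-3,8 into individual CPU ids. Where that tool is not installed, a pure-bash expansion could look roughly like this; expand_cpus is a hypothetical helper name, and the sketch assumes the list contains only plain numbers and simple lo-hi ranges:

```shell
#!/bin/bash
# expand_cpus: hypothetical helper, expands "0-3,8" into "0 1 2 3 8".
# Assumes comma-separated plain numbers and lo-hi ranges only.
expand_cpus() {
    local list=$1 part lo hi c
    local out=()
    local IFS=','                  # split the list on commas
    for part in $list; do
        if [[ $part == *-* ]]; then
            lo=${part%-*}          # text before the dash
            hi=${part#*-}          # text after the dash
            for ((c = lo; c <= hi; c++)); do
                out+=("$c")
            done
        else
            out+=("$part")
        fi
    done
    IFS=' '                        # join the result with spaces
    echo "${out[*]}"
}
```

In the loop it would stand in for the nodeset call, e.g. cpus=$(expand_cpus "$cpus").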
--
Best regards, Artem Y. Polyakov