On our cluster we use SLURM v2.6.3 with cpusets enabled. We sometimes see
Open MPI pin processes to the wrong CPUs. As a workaround we use the
following bit of bash code to manually assemble an Open MPI rankfile, switch
the process launch module (plm) from slurm to rsh/ssh, and finally launch using
mpirun instead of srun. Hope this is helpful to someone.

----
L. Shawn Matott, PhD
Computational Scientist
University at Buffalo,
Center for Computational Research
701 Ellicott Street, Buffalo, New York 14203

# ================================================================================================
# create rank file to explicitly bind cores
echo "creating hostfile and rankfile"
uid=`id -u`
jid=$SLURM_JOB_ID
nodes=`nodeset -e $SLURM_NODELIST`

# trigger creation of cpuset information and save to working dir
srun bash -c "cat /cgroup/cpuset/slurm/uid_${uid}/job_${jid}/cpuset.cpus > cpus.\`hostname\`.$SLURM_JOB_ID"

RANKFILE=rankfile.$$
NODEFILE=nodefile.$$

rm -f $RANKFILE
rm -f $NODEFILE
rank=0
for i in ${nodes}; do
 # extract space-separated list of assigned cpus
 cpus=`cat cpus.${i}.${SLURM_JOB_ID}`
 cpus=`nodeset -Re $cpus`
 # add cpu assignments to the rank file
 for j in ${cpus}; do
   echo "rank ${rank}=$i slot=$j" >> $RANKFILE
   echo "$i" >> $NODEFILE
   rank=`expr $rank + 1`
   if [ "$rank" == "$SLURM_NPROCS" ]; then
     break;
   fi
 done
 if [ "$rank" == "$SLURM_NPROCS" ]; then
   break;
 fi
done

# use ssh instead of slurm as the launcher
# the rankfile that was just created will ensure cpusets are still honored.
export OMPI_MCA_plm=rsh

# launch application using mpirun
echo "Launching application using mpirun"
mpirun \
 --hostfile $NODEFILE \
 --rankfile $RANKFILE  \
 --prefix $OMPI \
 --n $SLURM_NPROCS \
 --display-map  \
 --verbose $EXE $ARGS
# ================================================================================================
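
The script above relies on nodeset (from ClusterShell) to expand the cpuset
list into individual CPU ids. Where nodeset is not installed, that one step
could probably be approximated in plain bash. The following is only a sketch
under that assumption: the helper names expand_cpu_list and print_rank_lines
and the host name node01 are made up for illustration and are not part of the
original script.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): expand a cpuset-style
# list such as "0-3,8,10-11" into a space-separated list "0 1 2 3 8 10 11".
expand_cpu_list() {
  local list=$1 part lo hi i out=""
  local IFS=','                      # split the list on commas only
  for part in $list; do
    if [[ $part == *-* ]]; then
      lo=${part%-*}                  # text before the dash
      hi=${part#*-}                  # text after the dash
      for ((i = lo; i <= hi; i++)); do
        out+=" $i"
      done
    else
      out+=" $part"
    fi
  done
  echo "${out# }"                    # drop the leading space
}

# Hypothetical helper: print one rankfile line per CPU for a single host,
# starting at the given rank, in the same "rank N=host slot=C" format
# that the script above writes.
print_rank_lines() {
  local host=$1 cpulist=$2 rank=${3:-0} cpu
  for cpu in $(expand_cpu_list "$cpulist"); do
    echo "rank ${rank}=${host} slot=${cpu}"
    rank=$((rank + 1))
  done
}

# Example with an assumed host name and CPU list
print_rank_lines node01 "0-1,4"
```

In the loop above, the nodeset call on the cpus variable could then be swapped
for a call to expand_cpu_list, leaving the rest of the script unchanged.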
