On Feb 28, 2014, at 9:13 AM, L. Shawn Matott <[email protected]> wrote:
>
> Danny,
>
> That's good to know. Which of the steps causes the loss of functionality
> (rankfile, ssh as plm, or mpirun instead of srun)?
Again, to clarify - you gain a lot of functionality in terms of mapping,
binding, and other areas. In exchange, you lose atomicity in accounting and
memory limits. Note, however, that you can enforce memory limits on the
individual procs using mpirun if you so choose, so the only actual loss is
the individual process accounting. Mpirun will provide those numbers as well,
if you want, but will not add them to the Slurm accounting database.
>
> --- Shawn
>
> -----Original Message----- From: Danny Auble
> Sent: Friday, February 28, 2014 12:09 PM
> To: slurm-dev
> Subject: [slurm-dev] Re: openmpi misbehaves when started under slurm
>
>
> Just a notice to those attempting to run this way: Slurm will not be
> able to monitor the step, keep accounting, or enforce memory limits
> for jobs launched like this.
>
> On 02/28/2014 09:01 AM, L. Shawn Matott wrote:
>>
>> On our cluster we use SLURM v2.6.3 with cpusets enabled. We sometimes see
>> problems with openmpi and incorrect cpu pinning. As a workaround we use the
>> following bit of bash code to manually assemble an openmpi rankfile, switch
>> from slurm to ssh as the process launch module, and finally launch using
>> mpirun instead of srun. Hope this is helpful to someone.....
>>
>> ----
>> L. Shawn Matott, PhD
>> Computational Scientist
>> University at Buffalo,
>> Center for Computational Research
>> 701 Ellicott Street, Buffalo, New York 14203
>>
>> # ================================================================================================
>> # create rank file to explicitly bind cores
>> echo "creating hostfile and rankfile"
>> uid=`id -u`
>> jid=$SLURM_JOB_ID
>> nodes=`nodeset -e $SLURM_NODELIST`
>>
>> # trigger creation of cpuset information and save to working dir
>> srun bash -c "cat /cgroup/cpuset/slurm/uid_${uid}/job_${jid}/cpuset.cpus > cpus.\`hostname\`.$SLURM_JOB_ID"
>>
>> RANKFILE=rankfile.$$
>> NODEFILE=nodefile.$$
>>
>> rm -f $RANKFILE
>> rm -f $NODEFILE
>> rank=0
>> for i in ${nodes}; do
>>     # extract space-separated list of assigned cpus
>>     cpus=`cat cpus.${i}.${SLURM_JOB_ID}`
>>     cpus=`nodeset -Re $cpus`
>>     # add cpu assignments to the rank file
>>     for j in ${cpus}; do
>>         echo "rank ${rank}=$i slot=$j" >> $RANKFILE
>>         echo "$i" >> $NODEFILE
>>         rank=`expr $rank + 1`
>>         # stop once every requested rank has a slot
>>         if [ "$rank" == "$SLURM_NPROCS" ]; then
>>             break;
>>         fi
>>     done
>>     if [ "$rank" == "$SLURM_NPROCS" ]; then
>>         break;
>>     fi
>> done
>>
>> # use ssh instead of slurm as the launcher
>> # the rankfile that was just created will ensure cpusets are still honored.
>> export OMPI_MCA_plm=rsh
>>
>> # launch application using mpirun
>> echo "Launching application using mpirun"
>> mpirun \
>> --hostfile $NODEFILE \
>> --rankfile $RANKFILE \
>> --prefix $OMPI \
>> --n $SLURM_NPROCS \
>> --display-map \
>> --verbose $EXE $ARGS
>> # ================================================================================================
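
For anyone without ClusterShell's nodeset installed, the "nodeset -Re" step
in the script above (expanding the contents of cpuset.cpus into one cpu id
per word) could probably be done in plain bash. What follows is only a
minimal sketch: expand_cpus is a hypothetical helper name, node01 is a
made-up hostname, and it assumes the list holds plain decimal ids and
ranges such as "0-3,8" with no zero padding.

```shell
#!/usr/bin/env bash
# expand_cpus: hypothetical helper, not part of Slurm, Open MPI, or ClusterShell.
# Expands a cpuset-style list such as "0-3,8,10-11" into "0 1 2 3 8 10 11".
# Assumes plain decimal ids without leading zeros.
expand_cpus() {
    local list=$1
    local part lo hi c
    local -a out=()
    local IFS=','              # split the list on commas (restored on return)
    for part in $list; do
        if [[ $part == *-* ]]; then
            lo=${part%-*}      # text before the dash
            hi=${part#*-}      # text after the dash
        else
            lo=$part
            hi=$part
        fi
        for ((c = lo; c <= hi; c++)); do
            out+=("$c")
        done
    done
    IFS=' '                    # join the result with single spaces
    echo "${out[*]}"
}

# Usage: emit rankfile lines for one (made-up) host.
rank=0
for cpu in $(expand_cpus "0-1,4"); do
    echo "rank ${rank}=node01 slot=${cpu}"
    rank=$((rank + 1))
done
# prints:
#   rank 0=node01 slot=0
#   rank 1=node01 slot=1
#   rank 2=node01 slot=4
```

In the original script this would stand in for the line
cpus=`nodeset -Re $cpus`, i.e. cpus=`expand_cpus $cpus`; the rest of the
loop would stay as it is.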