On Feb 28, 2014, at 9:16 AM, Danny Auble <[email protected]> wrote:
>
>
> On 02/28/2014 09:13 AM, L. Shawn Matott wrote:
>>
>> Danny,
>>
>> That's good to know. Which of the steps causes the loss of functionality
>> (rankfile, ssh as plm, or mpirun instead of srun)?
> Yes :). Primarily mpirun instead of srun though.
>>
>> --- Shawn
>>
>> -----Original Message----- From: Danny Auble
>> Sent: Friday, February 28, 2014 12:09 PM
>> To: slurm-dev
>> Subject: [slurm-dev] Re: openmpi misbehaves when started under slurm
>>
>>
>> Just a notice to those attempting to run this way, Slurm will not be
>> able to monitor the step or keep accounting or enforce memory limits
>> when running this way.
Just to clarify: this isn't precisely true. When mpirun runs under Slurm, it
uses srun to launch its own set of daemons. Those daemons do appear within the
Slurm context, so they and all their children (i.e., your app procs) are
covered by Slurm's accounting and memory limits. However, those functions
apply only at the *aggregate* level: to Slurm, only one process (the daemon)
is running on each node, so accounting is reported for the aggregate and
memory limits apply to the aggregate.
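
One way to look at that aggregate view is to list the steps Slurm recorded
for the job. This is only a sketch: show_job_steps is a hypothetical helper
name, and it assumes accounting storage is enabled so that sacct has data to
report. Under an mpirun launch I would expect the daemons to show up as a
step, with usage summed over their children rather than listed per rank.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the accounting records Slurm keeps for a job.
# Assumes accounting storage is configured; otherwise sacct reports nothing.
show_job_steps() {
    local jobid="${1:?usage: show_job_steps <jobid>}"
    # One row per job step; app ranks started by mpirun are not listed
    # individually because Slurm only tracks the step that holds them.
    sacct -j "$jobid" --format=JobID,JobName,NTasks,MaxRSS,Elapsed
}
```

Calling it as `show_job_steps $SLURM_JOB_ID` after an mpirun run and again
after a direct srun run of the same job should make the difference visible.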
>>
>> On 02/28/2014 09:01 AM, L. Shawn Matott wrote:
>>>
>>> On our cluster we use SLURM v2.6.3 with cpusets enabled. We sometimes see
>>> problems with openmpi and incorrect cpu pinning. As a workaround we use the
>>> following bit of bash code to manually assemble an openmpi rankfile, switch
>>> from slurm to ssh as the process launch module, and finally launch using
>>> mpirun instead of srun. Hope this is helpful to someone.....
>>>
>>> ----
>>> L. Shawn Matott, PhD
>>> Computational Scientist
>>> University at Buffalo,
>>> Center for Computational Research
>>> 701 Ellicott Street, Buffalo, New York 14203
>>>
>>> # ================================================================================================
>>>
>>> # create rank file to explicitly bind cores
>>> echo "creating hostfile and rankfile"
>>> uid=`id -u`
>>> jid=$SLURM_JOB_ID
>>> nodes=`nodeset -e $SLURM_NODELIST`
>>>
>>> # trigger creation of cpuset information and save to working dir
>>> srun bash -c "cat /cgroup/cpuset/slurm/uid_${uid}/job_${jid}/cpuset.cpus > cpus.\`hostname\`.$SLURM_JOB_ID"
>>>
>>> RANKFILE=rankfile.$$
>>> NODEFILE=nodefile.$$
>>>
>>> rm -f $RANKFILE
>>> rm -f $NODEFILE
>>> rank=0
>>> for i in ${nodes}; do
>>>     # extract space-separated list of assigned cpus
>>>     cpus=`cat cpus.${i}.${SLURM_JOB_ID}`
>>>     cpus=`nodeset -Re $cpus`
>>>     # add cpu assignments to the rank file
>>>     for j in ${cpus}; do
>>>         echo "rank ${rank}=$i slot=$j" >> $RANKFILE
>>>         echo "$i" >> $NODEFILE
>>>         rank=`expr $rank + 1`
>>>         if [ "$rank" == "$SLURM_NPROCS" ]; then
>>>             break;
>>>         fi
>>>     done
>>>     if [ "$rank" == "$SLURM_NPROCS" ]; then
>>>         break;
>>>     fi
>>> done
>>>
>>> # use ssh instead of slurm as the launcher
>>> # the rankfile that was just created will ensure cpusets are still honored.
>>> export OMPI_MCA_plm=rsh
>>>
>>> # launch application using mpirun
>>> echo "Launching application using mpirun"
>>> mpirun \
>>>     --hostfile $NODEFILE \
>>>     --rankfile $RANKFILE \
>>>     --prefix $OMPI \
>>>     --n $SLURM_NPROCS \
>>>     --display-map \
>>>     --verbose $EXE $ARGS
>>> # ================================================================================================
>>>
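
The rankfile loop in the script above relies on the nodeset tool (from
ClusterShell, as far as I know) to expand the cpuset list. For sites without
it, here is a minimal sketch of the same expansion in plain bash. The names
expand_cpus and rankfile_lines are hypothetical helpers, not part of Slurm
or Open MPI, and the input is assumed to look like the contents of
cpuset.cpus, e.g. "0-3,8,10-11".

```shell
#!/usr/bin/env bash
# Hypothetical helper: expand a cpuset.cpus string such as "0-3,8,10-11"
# into a space-separated list of individual CPU ids.
expand_cpus() {
    local list="$1" part lo hi out=()
    local IFS=','
    for part in $list; do
        if [[ $part == *-* ]]; then
            # A range "lo-hi": emit every id from lo through hi.
            lo=${part%-*}
            hi=${part#*-}
            while (( lo <= hi )); do
                out+=("$lo")
                lo=$((lo + 1))
            done
        else
            # A single CPU id.
            out+=("$part")
        fi
    done
    IFS=' '
    echo "${out[*]}"
}

# Hypothetical helper: print rankfile lines for one host, numbering ranks
# from the given starting rank, in the "rank N=host slot=cpu" form used above.
rankfile_lines() {
    local host="$1" cpus="$2" rank="$3" cpu
    for cpu in $(expand_cpus "$cpus"); do
        echo "rank ${rank}=${host} slot=${cpu}"
        rank=$((rank + 1))
    done
}
```

For example, `rankfile_lines n01 "0-1" 0 >> $RANKFILE` would append two lines
binding ranks 0 and 1 to slots 0 and 1 on host n01 (n01 is a made-up name).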