P.S.

I planned to check this before reporting it, but since it came up on
the list: perhaps the OpenMPI plugin can help resolve this.

SLURM's MPI plugins have a handler called
mpi_hook_client_single_task_per_node()

It was written specifically for LAM/MPI. Since OpenMPI is a descendant of
LAM/MPI, this routine should probably return true instead of false. I
have to say that I haven't had time to check this, though; I only thought
of it yesterday :).


2014-03-01 0:18 GMT+07:00 Artem Polyakov <[email protected]>:

> Hey, Shawn! Nice to see you here too:)
>
> I'd like to add that I found the reason for this behavior on your cluster.
> It lies in the way OpenMPI launches its processes. It uses srun with just one
> slot on each node to launch orted. In this case SLURM binds orted to just
> one core, and all of orted's children (the real MPI processes) also run on
> that core. I wrote about this to the OpenMPI developers, but we haven't found
> a solution yet.
> I may have found a way, though; I'll check it in the near future.
>
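
The inheritance itself is easy to observe. A minimal sketch, assuming a Linux
node where /proc/self/status exposes Cpus_allowed_list; show_cpus and the
variable names are only illustrative:

```shell
#!/bin/sh
# Print the CPUs the calling process may run on (Linux only).
# Falls back to "unknown" if /proc/self/status cannot be read.
show_cpus() {
    awk '/^Cpus_allowed_list:/ {print $2}' /proc/self/status 2>/dev/null || echo unknown
}

# Affinity as seen from this script.
parent=$(show_cpus)

# Affinity as seen from a process started through an extra shell,
# similar to how orted's children are started by orted.
child=$(sh -c 'awk "/^Cpus_allowed_list:/ {print \$2}" /proc/self/status 2>/dev/null || echo unknown')

echo "parent: $parent"
echo "child:  $child"
```

If this is run through srun with a single slot and binding is in effect, both
lines should report the same restricted core list, since a child process
inherits the CPU mask of its parent.
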
> A more elegant solution is to configure OpenMPI with PMI support and launch
> the MPI program with srun like this:
>
> srun ./mpiname
>
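
A rough sketch of the build step, assuming an Open MPI source tree and that
SLURM's PMI headers and library live under /usr. PMI_DIR, PREFIX and DRY_RUN
are names made up for this example; --with-slurm and --with-pmi are Open MPI
configure options. The script only prints the command unless DRY_RUN=0:

```shell
#!/bin/sh
# Assumed locations; adjust for your site.
PMI_DIR=${PMI_DIR:-/usr}              # where SLURM's PMI files live (assumption)
PREFIX=${PREFIX:-$HOME/openmpi-pmi}   # install prefix (assumption)

# Note: this simple string form breaks if the paths contain spaces.
configure_cmd="./configure --prefix=$PREFIX --with-slurm --with-pmi=$PMI_DIR"

if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: just show what would be run.
    echo "$configure_cmd"
else
    # Must be run from inside an Open MPI source tree.
    $configure_cmd && make && make install
fi
```

With a build like this, srun starts the MPI processes directly, so the
intermediate orted launch described above should not be involved.
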
>
> On Saturday, 1 March 2014, L. Shawn Matott wrote:
>
>
>> On our cluster we use SLURM v2.6.3 with cpusets enabled.  We sometimes see
>> problems with openmpi and incorrect cpu pinning. As a workaround we use the
>> following bit of bash code to manually assemble an openmpi rankfile, switch
>> from slurm to ssh as the process launch module, and finally launch using
>> mpirun instead of srun. Hope this is helpful to someone.....
>>
>> ----
>> L. Shawn Matott, PhD
>> Computational Scientist
>> University at Buffalo,
>> Center for Computational Research
>> 701 Ellicott Street, Buffalo, New York 14203
>>
>> # ================================================================================================
>> # create rank file to explicitly bind cores
>> echo "creating hostfile and rankfile"
>> uid=`id -u`
>> jid=$SLURM_JOB_ID
>> nodes=`nodeset -e $SLURM_NODELIST`
>>
>> # trigger creation of cpuset information and save to working dir
>> srun bash -c "cat /cgroup/cpuset/slurm/uid_${uid}/job_${jid}/cpuset.cpus > cpus.\`hostname\`.$SLURM_JOB_ID"
>>
>> RANKFILE=rankfile.$$
>> NODEFILE=nodefile.$$
>>
>> rm -f $RANKFILE
>> rm -f $NODEFILE
>> rank=0
>> for i in ${nodes}; do
>>  # extract space-separated list of assigned cpus
>>  cpus=`cat cpus.${i}.${SLURM_JOB_ID}`
>>  cpus=`nodeset -Re $cpus`
>>  # add cpu assignments to the rank file
>>  for j in ${cpus}; do
>>    echo "rank ${rank}=$i slot=$j" >> $RANKFILE
>>    echo "$i" >> $NODEFILE
>>    rank=`expr $rank + 1`
>>    if [ "$rank" == "$SLURM_NPROCS" ]; then
>>      break;
>>    fi
>>  done
>>  if [ "$rank" == "$SLURM_NPROCS" ]; then
>>    break;
>>  fi
>> done
>>
>> # use ssh instead of slurm as the launcher
>> # the rankfile that was just created will ensure cpusets are still honored.
>> export OMPI_MCA_plm=rsh
>>
>> # launch application using mpirun
>> echo "Launching application using mpirun"
>> mpirun \
>>  --hostfile $NODEFILE \
>>  --rankfile $RANKFILE  \
>>  --prefix $OMPI \
>>  --n $SLURM_NPROCS \
>>  --display-map  \
>>  --verbose $EXE $ARGS
>> #
>> ================================================================================================
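
For illustration, the core of the rankfile generation can be sketched without
nodeset. This is not the script above, just a plain-sh stand-in under the
assumption that cpuset.cpus holds a list like 0-1,4; expand_cpus,
rankfile_lines and node01 are hypothetical names:

```shell
#!/bin/sh
# Expand a cpuset-style list such as "0-3,8" into "0 1 2 3 8".
# The body runs in a subshell, so the IFS change stays local.
expand_cpus() (
    out=""
    IFS=,
    for part in $1; do
        lo=${part%-*}    # text before the dash (or the whole item)
        hi=${part#*-}    # text after the dash (or the whole item)
        c=$lo
        while [ "$c" -le "$hi" ]; do
            out="${out:+$out }$c"
            c=$((c + 1))
        done
    done
    echo "$out"
)

# Print rankfile lines for one host.
# Arguments: first rank number, host name, cpuset-style CPU list.
rankfile_lines() (
    rank=$1
    for cpu in $(expand_cpus "$3"); do
        echo "rank ${rank}=$2 slot=${cpu}"
        rank=$((rank + 1))
    done
)

rankfile_lines 0 node01 "0-1,4"
# prints:
#   rank 0=node01 slot=0
#   rank 1=node01 slot=1
#   rank 2=node01 slot=4
```

Each output line pins one rank to one core from the job's cpuset, which is the
same "rank N=host slot=C" form the script writes to $RANKFILE.
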
>
>
>
> --
> Best regards, Artem Y. Polyakov
>



-- 
Best regards, Artem Y. Polyakov
