P.S. I had planned to check this before reporting it, but since it came up on the list, I can suggest that the OpenMPI plugin may help resolve it.
In the MPI plugins of SLURM there is a handler called mpi_hook_client_single_task_per_node(). It was made specially for LAM/MPI. Since OpenMPI is a descendant of LAM/MPI, this routine should probably return true instead of false. I have to say that I haven't had time to check this, though; I only thought of it yesterday :).

2014-03-01 0:18 GMT+07:00 Artem Polyakov <[email protected]>:

> Hey, Shawn! Nice to see you here too :)
>
> I'd like to add that I found the reason for this behavior on your cluster.
> It is in the way OpenMPI launches its processes. It uses srun with just one
> slot on each node to launch orted. In this case SLURM binds orted to just
> one core, and all children of orted (the real MPI processes) also run on
> that core. I wrote about this to the OpenMPI developers, but we haven't
> found a solution yet.
> I may have found a way, though; I'll check it in the near future.
>
> A more elegant solution is to configure OpenMPI with PMI support and launch
> MPI with srun like this:
>
> srun ./mpiname
>
>
> On Saturday, 1 March 2014, L. Shawn Matott wrote:
>
>> On our cluster we use SLURM v2.6.3 with cpusets enabled. We sometimes see
>> problems with openmpi and incorrect cpu pinning. As a workaround we use the
>> following bit of bash code to manually assemble an openmpi rankfile, switch
>> from slurm to ssh as the process launch module, and finally launch using
>> mpirun instead of srun. Hope this is helpful to someone.....
>>
>> ----
>> L. Shawn Matott, PhD
>> Computational Scientist
>> University at Buffalo,
>> Center for Computational Research
>> 701 Ellicott Street, Buffalo, New York 14203
>>
>> # ================================================================================================
>> # create rank file to explicitly bind cores
>> echo "creating hostfile and rankfile"
>> uid=`id -u`
>> jid=$SLURM_JOB_ID
>> nodes=`nodeset -e $SLURM_NODELIST`
>>
>> # trigger creation of cpuset information and save to working dir
>> srun bash -c "cat /cgroup/cpuset/slurm/uid_${uid}/job_${jid}/cpuset.cpus > cpus.\`hostname\`.$SLURM_JOB_ID"
>>
>> RANKFILE=rankfile.$$
>> NODEFILE=nodefile.$$
>>
>> rm -f $RANKFILE
>> rm -f $NODEFILE
>> rank=0
>> for i in ${nodes}; do
>>    # extract space-separated list of assigned cpus
>>    cpus=`cat cpus.${i}.${SLURM_JOB_ID}`
>>    cpus=`nodeset -Re $cpus`
>>    # add cpu assignments to the rank file
>>    for j in ${cpus}; do
>>       echo "rank ${rank}=$i slot=$j" >> $RANKFILE
>>       echo "$i" >> $NODEFILE
>>       rank=`expr $rank + 1`
>>       if [ "$rank" == "$SLURM_NPROCS" ]; then
>>          break;
>>       fi
>>    done
>>    if [ "$rank" == "$SLURM_NPROCS" ]; then
>>       break;
>>    fi
>> done
>>
>> # use ssh instead of slurm as the launcher
>> # the rankfile that was just created will ensure cpusets are still honored.
>> export OMPI_MCA_plm=rsh
>>
>> # launch application using mpirun
>> echo "Launching application using mpirun"
>> mpirun \
>>    --hostfile $NODEFILE \
>>    --rankfile $RANKFILE \
>>    --prefix $OMPI \
>>    --n $SLURM_NPROCS \
>>    --display-map \
>>    --verbose $EXE $ARGS
>> # ================================================================================================
>
>
>
> --
> Best regards, Artem Y. Polyakov
>

--
Best regards, Artem Y. Polyakov
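
For anyone who wants to try the rankfile approach on a system without ClusterShell's nodeset, here is a minimal sketch of the cpu-list expansion and the rankfile line generation in plain shell. The helper names expand_cpus and emit_ranks are hypothetical and are not part of the quoted script; the sketch assumes cpuset.cpus holds a comma-separated list of cpu numbers and ranges such as "0-3,8", and it only reproduces the "rank N=host slot=M" lines written by the quoted loop.

```shell
#!/bin/sh
# Hypothetical helper (assumption, not from the quoted script):
# expand a cpuset list such as "0-3,8" into "0 1 2 3 8".
# The body runs in a subshell, so IFS and variables do not leak.
expand_cpus() (
    out=""
    IFS=','
    for part in $1; do
        case $part in
            *-*)
                # a range "lo-hi": emit every cpu from lo to hi
                c=${part%-*}
                hi=${part#*-}
                while [ "$c" -le "$hi" ]; do
                    out="$out $c"
                    c=$((c + 1))
                done
                ;;
            *)
                # a single cpu number
                out="$out $part"
                ;;
        esac
    done
    echo "${out# }"
)

# Hypothetical helper: print rankfile lines for one host.
# $1 = first rank number, $2 = host name, $3 = cpuset list
emit_ranks() (
    rank=$1
    host=$2
    for cpu in $(expand_cpus "$3"); do
        echo "rank ${rank}=${host} slot=${cpu}"
        rank=$((rank + 1))
    done
)
```

For example, emit_ranks 0 node01 "0-1,4" prints three lines, assigning ranks 0, 1 and 2 on node01 to slots 0, 1 and 4. The output could be appended to the rankfile in place of the inner loop, although the early exit at $SLURM_NPROCS would still need to be handled by the caller.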
