Hello,

For everybody interested: the affinity problem that was discussed here
has been fixed in the main OMPI trunk. The fix will be included in version 1.8.1.


2014-03-01 0:45 GMT+07:00 Ralph Castain <[email protected]>:

>
>
> On Feb 28, 2014, at 9:13 AM, L. Shawn Matott <[email protected]> wrote:
>
> >
> > Danny,
> >
> > That's good to know. Which of the steps causes the loss of functionality
> > (rankfile, ssh as plm, or mpirun instead of srun)?
>
> Again, to clarify - you gain a lot of functionality in terms of mapping,
> binding, and other areas. In exchange, you lose atomicity in accounting and
> memory limits. Note, however, that you can enforce memory limits on the
> individual procs using mpirun if you so choose, so the only actual loss
> is the individual process accounting. Mpirun will provide those numbers as
> well, if you want, but will not add them to the Slurm accounting database.
>
> >
> > --- Shawn
> >
> > -----Original Message----- From: Danny Auble
> > Sent: Friday, February 28, 2014 12:09 PM
> > To: slurm-dev
> > Subject: [slurm-dev] Re: openmpi misbehaves when started under slurm
> >
> >
> > Just a notice to those attempting to run this way, Slurm will not be
> > able to monitor the step or keep accounting or enforce memory limits
> > when running this way.
> >
> > On 02/28/2014 09:01 AM, L. Shawn Matott wrote:
> >>
> >> On our cluster we use SLURM v2.6.3 with cpusets enabled. We sometimes see
> >> problems with openmpi and incorrect cpu pinning. As a workaround we use the
> >> following bit of bash code to manually assemble an openmpi rankfile, switch
> >> from slurm to ssh as the process launch module, and finally launch using
> >> mpirun instead of srun. Hope this is helpful to someone.
> >>
> >> ----
> >> L. Shawn Matott, PhD
> >> Computational Scientist
> >> University at Buffalo,
> >> Center for Computational Research
> >> 701 Ellicott Street, Buffalo, New York 14203
> >>
> >> # ================================================================================================
> >> # create rank file to explicitly bind cores
> >> echo "creating hostfile and rankfile"
> >> uid=`id -u`
> >> jid=$SLURM_JOB_ID
> >> nodes=`nodeset -e $SLURM_NODELIST`
> >>
> >> # trigger creation of cpuset information and save to working dir
> >> srun bash -c "cat /cgroup/cpuset/slurm/uid_${uid}/job_${jid}/cpuset.cpus > cpus.\`hostname\`.$SLURM_JOB_ID"
> >>
> >> RANKFILE=rankfile.$$
> >> NODEFILE=nodefile.$$
> >>
> >> rm -f $RANKFILE
> >> rm -f $NODEFILE
> >> rank=0
> >> for i in ${nodes}; do
> >> # extract space-separated list of assigned cpus
> >> cpus=`cat cpus.${i}.${SLURM_JOB_ID}`
> >> cpus=`nodeset -Re $cpus`
> >> # add cpu assignments to the rank file
> >> for j in ${cpus}; do
> >>   echo "rank ${rank}=$i slot=$j" >> $RANKFILE
> >>   echo "$i" >> $NODEFILE
> >>   rank=`expr $rank + 1`
> >>   if [ "$rank" == "$SLURM_NPROCS" ]; then
> >>     break;
> >>   fi
> >> done
> >> if [ "$rank" == "$SLURM_NPROCS" ]; then
> >>   break;
> >> fi
> >> done
> >>
> >> # use ssh instead of slurm as the launcher
> >> # the rankfile that was just created will ensure cpusets are still honored.
> >> export OMPI_MCA_plm=rsh
> >>
> >> # launch application using mpirun
> >> echo "Launching application using mpirun"
> >> mpirun \
> >> --hostfile $NODEFILE \
> >> --rankfile $RANKFILE  \
> >> --prefix $OMPI \
> >> --n $SLURM_NPROCS \
> >> --display-map  \
> >> --verbose $EXE $ARGS
> >> # ================================================================================================
>
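
The rankfile loop in the quoted script depends on `nodeset` (from ClusterShell) to expand a cpuset string such as "0-3,8" into individual CPU ids. As a rough sketch of that one step in plain bash, assuming the cpuset string contains only comma-separated ids and "lo-hi" ranges, something like the following could be used. The function names `expand_cpu_list` and `emit_rank_lines` and the host name `node01` are made up for illustration and are not part of the original script:

```bash
#!/usr/bin/env bash

# Expand a cpuset list such as "0-3,8" into a space-separated list "0 1 2 3 8".
# Assumption: the input only contains single ids and "lo-hi" ranges, comma-separated.
expand_cpu_list() {
    local list=$1 part lo hi c out=()
    local IFS=','                      # split the input on commas
    for part in $list; do
        if [[ $part == *-* ]]; then
            lo=${part%-*}              # text before the dash
            hi=${part#*-}              # text after the dash
            for ((c = lo; c <= hi; c++)); do
                out+=("$c")
            done
        else
            out+=("$part")
        fi
    done
    IFS=' '                            # join the result with spaces
    echo "${out[*]}"
}

# Print rankfile lines for one host.
#   $1 host name, $2 cpuset list, $3 first rank to assign, $4 total number of ranks
# Stops as soon as the total number of ranks has been reached.
emit_rank_lines() {
    local host=$1 cpu_list=$2 rank=$3 max=$4 cpu
    for cpu in $(expand_cpu_list "$cpu_list"); do
        (( rank >= max )) && break
        echo "rank ${rank}=${host} slot=${cpu}"
        rank=$((rank + 1))
    done
}
```

For example, `expand_cpu_list "0-3,8"` prints `0 1 2 3 8`, and `emit_rank_lines node01 "0-1,4" 0 2` prints the two lines `rank 0=node01 slot=0` and `rank 1=node01 slot=1`. The line format is the same one the quoted script writes to $RANKFILE.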



-- 
С Уважением, Поляков Артем Юрьевич
Best regards, Artem Y. Polyakov
