I have one process per node; here is the corresponding line from my job
submission script (with compute nodes named "node1" and "node2"):

#PBS -l select=1:ncpus=1:mpiprocs=1:host=node1+1:ncpus=1:mpiprocs=1:host=node2
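
As a quick sanity check (a sketch, relying only on the standard
OMPI_COMM_WORLD_* environment variables that Open MPI's mpirun exports
to each process), I run something like:

# with one process per node, both ranks should report local_size=1, local_rank=0
mpirun -np 2 sh -c 'echo "$(hostname): local_rank=$OMPI_COMM_WORLD_LOCAL_RANK local_size=$OMPI_COMM_WORLD_LOCAL_SIZE"'

With the pre-built packages I instead see local_size=2 and local_rank
0 and 1, as described below.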

On Tue, Jan 18, 2022 at 10:20 PM Ralph Castain via users
<users@lists.open-mpi.org> wrote:
>
> Afraid I can't understand your scenario - when you say you "submit a job" to 
> run on two nodes, how many processes are you running on each node?
>
>
> > On Jan 18, 2022, at 1:07 PM, Crni Gorac via users 
> > <users@lists.open-mpi.org> wrote:
> >
> > I'm using OpenMPI 4.1.2 from the MLNX_OFED_LINUX-5.5-1.0.3.2 distribution
> > and have PBS 18.1.4 installed on my cluster (the cluster nodes run
> > CentOS 7.9). When I submit a job that runs on two nodes of the
> > cluster, both ranks get OMPI_COMM_WORLD_LOCAL_SIZE set to 2 instead
> > of 1, and OMPI_COMM_WORLD_LOCAL_RANK is set to 0 and 1 instead of
> > being 0 for both. At the same time, the hostfile generated by PBS
> > ($PBS_NODEFILE) correctly lists both nodes.
> >
> > I've also tried OpenMPI 3 from HPC-X, and the same thing happens.
> > However, when I build OpenMPI myself (the notable difference from the
> > pre-built versions mentioned above being that I use the "--with-tm"
> > option to point to my PBS installation), OMPI_COMM_WORLD_LOCAL_SIZE
> > and OMPI_COMM_WORLD_LOCAL_RANK are set correctly.
> >
> > I'm not sure how to debug the problem, or whether it can be fixed at
> > all with a pre-built OpenMPI version, so any suggestion is welcome.
> >
> > Thanks.
>
>
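
For reference, the self-built configuration mentioned in the quoted
message above is along these lines (a sketch; the prefix and the PBS
install path are placeholders, not my exact ones):

# --with-tm points configure at the PBS/Torque tm (task management) library
# /opt/pbs is a placeholder for the actual PBS install prefix
./configure --prefix=$HOME/opt/openmpi-4.1.2 --with-tm=/opt/pbs
make -j && make install

With that build, mpirun detects the PBS allocation through tm and the
per-node local rank/size values come out as expected.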
