I have one process per node; here is the corresponding line from my job submission script (the compute nodes are named "node1" and "node2"):
#PBS -l select=1:ncpus=1:mpiprocs=1:host=node1+1:ncpus=1:mpiprocs=1:host=node2

On Tue, Jan 18, 2022 at 10:20 PM Ralph Castain via users
<users@lists.open-mpi.org> wrote:
>
> Afraid I can't understand your scenario - when you say you "submit a job" to
> run on two nodes, how many processes are you running on each node??
>
>
> > On Jan 18, 2022, at 1:07 PM, Crni Gorac via users
> > <users@lists.open-mpi.org> wrote:
> >
> > Using OpenMPI 4.1.2 from MLNX_OFED_LINUX-5.5-1.0.3.2 distribution, and
> > have PBS 18.1.4 installed on my cluster (cluster nodes are running
> > CentOS 7.9). When I try to submit a job that will run on two nodes in
> > the cluster, both ranks get OMPI_COMM_WORLD_LOCAL_SIZE set to 2,
> > instead of 1, and OMPI_COMM_WORLD_LOCAL_RANK are set to 0 and 1,
> > instead of both being 0. At the same time, the hostfile generated by
> > PBS ($PBS_NODEFILE) properly contains two nodes listed.
> >
> > I've tried with OpenMPI 3 from HPC-X, and the same thing happens too.
> > However, when I build OpenMPI myself (notable difference from above
> > mentioned pre-built MPI versions is that I use "--with-tm" option to
> > point to my PBS installation), then OMPI_COMM_WORLD_LOCAL_SIZE and
> > OMPI_COMM_WORLD_LOCAL_RANK are set properly.
> >
> > I'm not sure how to debug the problem, and whether it is possible to
> > fix it at all with a pre-built OpenMPI version, so any suggestion is
> > welcome.
> >
> > Thanks.
> >
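For anyone reproducing this: a minimal sketch of how the per-node values can be inspected, assuming the standard Open MPI environment variables (OMPI_COMM_WORLD_LOCAL_RANK and OMPI_COMM_WORLD_LOCAL_SIZE) are exported to each rank, as they are under mpirun. Save it as a script and launch it under mpirun across the two nodes; with the correct TM/hostfile mapping each node should report LOCAL_RANK=0 and LOCAL_SIZE=1.

```shell
#!/bin/sh
# Report this rank's host plus the Open MPI per-node placement variables.
# If a variable is not set (e.g. when run outside mpirun), print "unset".
line="$(hostname): LOCAL_RANK=${OMPI_COMM_WORLD_LOCAL_RANK:-unset} LOCAL_SIZE=${OMPI_COMM_WORLD_LOCAL_SIZE:-unset}"
echo "$line"
```

Run it as, e.g., `mpirun -np 2 ./show_local.sh` inside the PBS job; comparing its output against the contents of $PBS_NODEFILE makes the mismatch described above easy to see.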