I'm afraid I don't understand your scenario: when you say you "submit a job" that runs on two nodes, how many processes are you running on each node?
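As a quick sanity check, something like the following can be launched under mpirun to print what each rank actually sees for the per-node placement variables mentioned in your message (the script name and the exact mpirun invocation in the comment are just illustrative):

```python
import os
import socket

# Print the Open MPI per-node placement variables for this process.
# Example launch (illustrative): mpirun -np 2 --map-by node python3 check_local.py
local_rank = os.environ.get("OMPI_COMM_WORLD_LOCAL_RANK", "unset")
local_size = os.environ.get("OMPI_COMM_WORLD_LOCAL_SIZE", "unset")
world_rank = os.environ.get("OMPI_COMM_WORLD_RANK", "unset")

print(f"{socket.gethostname()}: world rank {world_rank}, "
      f"local rank {local_rank} of local size {local_size}")
```

If both ranks report the same hostname, they were placed on one node, which would explain a local size of 2 regardless of what $PBS_NODEFILE contains.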
> On Jan 18, 2022, at 1:07 PM, Crni Gorac via users <users@lists.open-mpi.org> wrote:
>
> Using OpenMPI 4.1.2 from the MLNX_OFED_LINUX-5.5-1.0.3.2 distribution, and have PBS 18.1.4 installed on my cluster (cluster nodes are running CentOS 7.9). When I try to submit a job that will run on two nodes in the cluster, both ranks get OMPI_COMM_WORLD_LOCAL_SIZE set to 2 instead of 1, and OMPI_COMM_WORLD_LOCAL_RANK set to 0 and 1 instead of both being 0. At the same time, the hostfile generated by PBS ($PBS_NODEFILE) correctly lists the two nodes.
>
> I've tried with OpenMPI 3 from HPC-X, and the same thing happens. However, when I build OpenMPI myself (a notable difference from the pre-built MPI versions mentioned above is that I use the "--with-tm" option to point to my PBS installation), then OMPI_COMM_WORLD_LOCAL_SIZE and OMPI_COMM_WORLD_LOCAL_RANK are set properly.
>
> I'm not sure how to debug the problem, or whether it is possible to fix it at all with a pre-built OpenMPI version, so any suggestion is welcome.
>
> Thanks.