I'm afraid I don't understand your scenario - when you say you "submit a job" to 
run on two nodes, how many processes are you running on each node?
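
If it helps to check, a tiny test program along these lines (just a sketch of mine; the file name and the launch command mentioned below are my assumptions, not something from your report) prints what the launcher actually set on each host:

    /* local_check.c - print the hostname plus the OMPI_COMM_WORLD_LOCAL_*
     * environment variables for every rank, so we can see how many ranks
     * each node really received. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank, len;
        char host[MPI_MAX_PROCESSOR_NAME];
        const char *lrank, *lsize;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Get_processor_name(host, &len);

        lrank = getenv("OMPI_COMM_WORLD_LOCAL_RANK");
        lsize = getenv("OMPI_COMM_WORLD_LOCAL_SIZE");
        printf("rank %d on %s: LOCAL_RANK=%s LOCAL_SIZE=%s\n",
               rank, host, lrank ? lrank : "unset", lsize ? lsize : "unset");

        MPI_Finalize();
        return 0;
    }

Compile it with mpicc and launch it under your PBS allocation exactly the way you launch your application; the output should make it clear how many ranks land on each node and what local size the launcher believes each node has.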


> On Jan 18, 2022, at 1:07 PM, Crni Gorac via users <users@lists.open-mpi.org> 
> wrote:
> 
> I'm using OpenMPI 4.1.2 from the MLNX_OFED_LINUX-5.5-1.0.3.2
> distribution and have PBS 18.1.4 installed on my cluster (the cluster
> nodes run CentOS 7.9).  When I submit a job that runs on two nodes in
> the cluster, both ranks get OMPI_COMM_WORLD_LOCAL_SIZE set to 2
> instead of 1, and OMPI_COMM_WORLD_LOCAL_RANK is set to 0 and 1
> instead of both being 0.  At the same time, the hostfile generated by
> PBS ($PBS_NODEFILE) correctly lists the two nodes.
> 
> I've also tried OpenMPI 3 from HPC-X, and the same thing happens.
> However, when I build OpenMPI myself (the notable difference from the
> above-mentioned pre-built MPI versions is that I use the "--with-tm"
> option to point to my PBS installation), OMPI_COMM_WORLD_LOCAL_SIZE
> and OMPI_COMM_WORLD_LOCAL_RANK are set properly.
> 
> I'm not sure how to debug the problem, or whether it is possible to
> fix it at all with a pre-built OpenMPI version, so any suggestions
> are welcome.
> 
> Thanks.
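
On the --with-tm point: it may also be worth checking whether the pre-built Open MPI packages were compiled with TM support at all - I believe running "ompi_info | grep tm" should list the tm components (plm/ras) if they were. That would at least tell us whether the difference between the pre-built packages and your own build comes down to that configure option.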


Reply via email to