Are you launching the job with "mpirun"? I'm not familiar with that PBS 
"select" line and don't know what it does.

The most likely explanation is that the mpirun from the prebuilt versions 
doesn't have TM support, and therefore doesn't understand the 1ppn 
(one-process-per-node) directive in your submission line. My guess is that 
you are falling back to the ssh launcher - the odd thing is that you should 
then wind up with two procs on the first node, in which case those envars 
would actually be correct. If you are instead seeing one proc on each node, 
then something is wrong.
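
One quick way to confirm is to ask the prebuilt installation itself whether 
it was built with TM support - a rough sketch (the exact component names in 
the output are my assumption about what a TM-enabled build reports):

    # run the ompi_info that ships alongside the prebuilt mpirun
    ompi_info | grep tm

A TM-enabled build normally lists tm components (e.g. for the ras and plm 
frameworks); if nothing TM-related shows up, that mpirun can't read the PBS 
allocation and will fall back to ssh.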


> On Jan 18, 2022, at 1:33 PM, Crni Gorac via users <users@lists.open-mpi.org> 
> wrote:
> 
> I have one process per node, here is corresponding line from my job
> submission script (with compute nodes named "node1" and "node2"):
> 
> #PBS -l select=1:ncpus=1:mpiprocs=1:host=node1+1:ncpus=1:mpiprocs=1:host=node2
> 
> On Tue, Jan 18, 2022 at 10:20 PM Ralph Castain via users
> <users@lists.open-mpi.org> wrote:
>> 
>> Afraid I can't understand your scenario - when you say you "submit a job" to 
>> run on two nodes, how many processes are you running on each node??
>> 
>> 
>>> On Jan 18, 2022, at 1:07 PM, Crni Gorac via users 
>>> <users@lists.open-mpi.org> wrote:
>>> 
>>> Using OpenMPI 4.1.2 from MLNX_OFED_LINUX-5.5-1.0.3.2 distribution, and
>>> have PBS 18.1.4 installed on my cluster (cluster nodes are running
>>> CentOS 7.9).  When I try to submit a job that will run on two nodes in
>>> the cluster, both ranks get OMPI_COMM_WORLD_LOCAL_SIZE set to 2,
>>> instead of 1, and OMPI_COMM_WORLD_LOCAL_RANK are set to 0 and 1,
>>> instead of both being 0.  At the same time, the hostfile generated by
>>> PBS ($PBS_NODEFILE) properly contains two nodes listed.
>>> 
>>> I've tried with OpenMPI 3 from HPC-X, and the same thing happens too.
>>> However, when I build OpenMPI myself (notable difference from above
>>> mentioned pre-built MPI versions is that I use "--with-tm" option to
>>> point to my PBS installation), then OMPI_COMM_WORLD_LOCAL_SIZE and
>>> OMPI_COMM_WORLD_LOCAL_RANK are set properly.
>>> 
>>> I'm not sure how to debug the problem, and whether it is possible to
>>> fix it at all with a pre-built OpenMPI version, so any suggestion is
>>> welcome.
>>> 
>>> Thanks.
>> 
>> 
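
For reference, the self-built configuration that the original post says works 
is a plain source build pointed at the PBS installation. A minimal sketch, 
assuming PBS lives under /opt/pbs and using an arbitrary install prefix (both 
paths are placeholders for your site):

    # --with-tm points at the PBS/Torque install so the tm components get built
    ./configure --prefix=$HOME/openmpi-4.1.2-tm --with-tm=/opt/pbs
    make -j 4 && make install

With that mpirun on the PATH inside the job, something like

    mpirun -np 2 env | grep OMPI_COMM_WORLD_LOCAL

should report LOCAL_SIZE=1 and LOCAL_RANK=0 from both nodes when the TM 
launcher is being used.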

