Thank you for the correction, Ralph.
I didn't know there was a (wise) default for the
number of processes when using Torque-enabled OpenMPI.

Gus Correa

Ralph Castain wrote:
Just to correct something said here.

You need to tell mpirun how many processes to launch,
regardless of whether you are using Torque or not.

This is not correct. If you don't tell mpirun how many processes to launch, we will automatically launch one process for every slot in your allocation. In the case described here, there were 16 slots allocated, so we would automatically launch 16 processes.
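A minimal shell sketch of Ralph's point. It assumes, as Gus notes further down, that Torque writes one line per allocated slot to $PBS_NODEFILE, so counting those lines gives the number of processes a Torque-aware mpirun should start when no -n is given. The function name count_slots is hypothetical.

```shell
#!/bin/sh
# Sketch (assumption): Torque's node file lists one line per allocated slot,
# so its non-empty line count equals the number of processes a Torque-aware
# mpirun launches by default when no -n is given.

# count_slots FILE: print the number of non-empty lines (slots) in FILE.
count_slots() {
    grep -c . "$1"
}

# Inside a job submitted with "#PBS -l nodes=2:ppn=8" this should print 16,
# so "mpirun MyProg" ought to behave like "mpirun -n 16 MyProg".
if [ -n "${PBS_NODEFILE:-}" ]; then
    np=$(count_slots "$PBS_NODEFILE")
    echo "allocation has $np slots"
fi
```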


On Aug 10, 2009, at 3:47 PM, Gus Correa wrote:

Hi Jody, list

See comments inline.

Jody Klymak wrote:
On Aug 10, 2009, at 13:01 PM, Gus Correa wrote:
Hi Jody

We don't have Mac OS X, just Linux, so I'm not sure if this applies to you.

Did you configure your OpenMPI with Torque support,
and point it to the same library that provides the
Torque you are using (--with-tm=/path/to/torque-library-directory)?
Not explicitly. I'll check into that....
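A minimal sketch for checking whether the OpenMPI on your PATH was built with Torque support. It assumes an OpenMPI 1.3-style ompi_info that lists the Torque components as "MCA ras: tm" and "MCA plm: tm"; component names differ between versions, so treat those patterns as an assumption. The function name has_tm_support and the configure paths are placeholders.

```shell
#!/bin/sh
# Sketch: check whether the OpenMPI build on PATH includes Torque (tm) support.
# Assumption: ompi_info lists the tm components in lines such as
#   "MCA ras: tm (...)" and "MCA plm: tm (...)".
#
# Rebuilding against a specific Torque install might look like this
# (placeholder paths; check "./configure --help" for your version):
#   ./configure --prefix=/path/to/openmpi --with-tm=/path/to/torque
#   make all install

# has_tm_support: succeed only if ompi_info runs and reports both tm components.
has_tm_support() {
    info=$(ompi_info 2>/dev/null) || return 1
    printf '%s\n' "$info" | grep -q 'MCA ras: tm' &&
        printf '%s\n' "$info" | grep -q 'MCA plm: tm'
}

if has_tm_support; then
    echo "OpenMPI reports Torque (tm) components"
else
    echo "no tm components found: check how OpenMPI was configured"
fi
```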

1) If you don't do it explicitly, configure will use the first libtorque
it finds (and that works, I presume),
which may or may not be the one you want if you have more than one.
If you only have one version of Torque installed,
this shouldn't be the problem.

2) Have you tried something very simple, like the examples/hello_c.c
program, to test the Torque-OpenMPI integration?

3) Also, just in case, put a "cat $PBS_NODEFILE" inside your script,
before mpirun, to see what it reports.
For "#PBS -l nodes=2:ppn=8"
it should show 16 lines, 8 with the name of each node.

4) Finally, just to make sure the syntax is right:
in your message you wrote:

>>> If I submit openMPI with:
>>> #PBS -l nodes=2:ppn=8
>>> mpirun MyProg

Is this the real syntax you used?

Or was it perhaps:

#PBS -l nodes=2:ppn=8
mpirun -n 16 MyProg

You need to tell mpirun how many processes to launch,
regardless of whether you are using Torque or not.
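Items 2 and 3 can be combined into one quick check, sketched below under some assumptions: the node file has one line per slot, "mpirun hostname" prints one host name per launched process, and host names match once the domain part (e.g. ".local") is stripped. The function names and the file ranks.txt are hypothetical. If placement goes wrong, as in Jody's case, the two per-host listings should differ (e.g. all 16 processes on xserve02).

```shell
#!/bin/sh
# Sketch: compare the slots Torque granted (one line per slot in the node
# file) with where mpirun actually placed processes (one host name per
# process). Domain suffixes are stripped so "xserve01.local" matches "xserve01".

# per_host_counts FILE: print "host count" for each distinct host in FILE.
per_host_counts() {
    sed -e 's/\..*//' -e '/^$/d' "$1" | sort | uniq -c | awk '{print $2, $1}'
}

# check_placement NODEFILE HOSTLIST: succeed if both give the same per-host counts.
check_placement() {
    expected=$(per_host_counts "$1")
    actual=$(per_host_counts "$2")
    if [ "$expected" = "$actual" ]; then
        echo "placement matches allocation"
    else
        echo "allocation:"; echo "$expected"
        echo "placement:";  echo "$actual"
        return 1
    fi
}

# Usage inside a Torque job (ranks.txt is a hypothetical scratch file):
if [ -n "${PBS_NODEFILE:-}" ]; then
    mpirun hostname > ranks.txt
    check_placement "$PBS_NODEFILE" ranks.txt
fi
```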

My $0.02

Gus Correa
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA

Are you using the right mpirun? (There are so many out there.)
Yeah, I use the explicit path, and I moved the OS X one.
Thanks! Jody

Jody Klymak wrote:
Hi All,
I've been trying to get Torque PBS to work on my OS X 10.5.7 cluster with OpenMPI (after finding that Xgrid was pretty flaky about connections). I *think* this is an MPI problem (perhaps via operator error!).
If I submit an OpenMPI job with:
#PBS -l nodes=2:ppn=8
mpirun MyProg
PBS reserves the two nodes (checked via "pbsnodes -a" and the job output), but mpirun runs the whole job on the second of the two nodes.
If I run the same job without qsub (i.e. using ssh):
mpirun -n 16 -host xserve01,xserve02 MyProg
it runs fine on all the nodes.
My /var/spool/torque/server_priv/nodes file looks like:
xserve01.local np=8
xserve02.local np=8
Any idea what could be going wrong or how to debug this properly? There is nothing suspicious in the server or mom logs.
Thanks for any help,
Jody Klymak
users mailing list
