Hello all,

We recently set up a 12-node, 24-processor cluster for our research, but I
have a problem with evenly distributing the processes among processors.

If I submit a job with PBS to use two processors per node, I always see one
node running 3 processes while the "execution server" has only one process. I
have spent quite a lot of time trying to figure this out, but I haven't been
able to solve the problem yet. I would really appreciate it if somebody could
give me some insight into it.

For just a little bit of background information:

I included "ppn=2" in the PBS script to use two processes per node, i.e.,
   #PBS -l nodes=11:ppn=2

The process list in "$PBS_NODEFILE" inside the PBS script shows two
processes on the execution server, but when I look at the "PI####" file
generated while the program is running, I can see that the execution server
has only one process.
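
For reference, the per-host count in the node file can be checked with a small
sketch like the one below. The function name count_per_host is only for
illustration and is not part of PBS; the script assumes a POSIX shell with
sort, uniq, and awk available.

```shell
#!/bin/sh
# Print "host count" for every distinct host listed in a node file.
# count_per_host is a hypothetical helper name, not a PBS command.
count_per_host() {
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# Inside a PBS job, PBS sets PBS_NODEFILE; skip the call anywhere else.
if [ -n "${PBS_NODEFILE:-}" ]; then
    count_per_host "$PBS_NODEFILE"
fi
```

With ppn=2, every host in the node file, including the execution server,
shows up with a count of 2.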

The system came with oscar-1.2.1rh72, OpenPBS_2_3_16, and mpich-1.2.1.
I downloaded mpich-1.2.4 and compiled it so that our Fortran research code
would work with the Absoft Fortran compiler.

This problem occurs both when I run our research code and when I run the
linpack test problems (xhpl).
The mpirun from both mpich-1.2.1 and mpich-1.2.4 produces the same problem
for the linpack tests.

If I specify each node twice in a machine file and run a program with
mpirun, the processes are evenly distributed. However, we would like to use
PBS, so I am asking for help.
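
The manual run that does distribute evenly looks roughly like the sketch
below. The host names, the file name machines.txt, the helper name
make_machinefile, and the process count are placeholders for illustration,
not our real setup.

```shell
#!/bin/sh
# Write every host twice so that mpirun places two processes on each node.
# make_machinefile is a hypothetical helper; the host names are placeholders.
make_machinefile() {
    for host in "$@"; do
        echo "$host"
        echo "$host"
    done
}

make_machinefile node01 node02 > machines.txt

# Then launch with MPICH's mpirun (4 processes for the 2 example hosts):
#   mpirun -np 4 -machinefile machines.txt ./xhpl
```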

Thanks in advance,

Tong-Seok Han



