Hello all,

We recently set up a 12-node, 24-processor cluster for our research, but I am having trouble distributing processes evenly among the processors.
When I submit a job through PBS requesting two processors per node, I always see one node running 3 processes while the "execution server" has only one process. I have spent quite a lot of time trying to figure this out, but I have not been able to solve it yet. I would really appreciate it if somebody could give me some insight into this problem.

Some background information: I included "ppn=2" in the PBS script to run two processes per node, i.e., #PBS -l nodes=11:ppn=2. The process list from "$PBS_NODEFILE" in the PBS script shows two processes on the execution server, but when I look at the "PI####" file generated while the program is running, I can see that the execution server has only one process.

The system came with oscar-1.2.1rh72, OpenPBS_2_3_16, and mpich-1.2.1. I downloaded mpich-1.2.4 and compiled it so that our Fortran research code works with the Absoft Fortran compiler. The problem occurs both when I run our research code and when I run the linpack test problems (xhpl). mpirun from both mpich-1.2.1 and mpich-1.2.4 produces the same problem for the linpack tests.

If I list each node twice in a machine file and run the program with mpirun directly, the processes are evenly distributed. However, we would like to use PBS, so I am asking for help.

Thanks in advance,
Tong-Seok Han
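
The check described above (how many entries each host has in $PBS_NODEFILE compared with the requested ppn) could be scripted roughly as follows. This is only a minimal sketch: the function names count_slots and uneven_hosts are hypothetical and not part of PBS or MPICH, and it assumes the node file lists one hostname per line, repeated once per requested processor.

```python
import os
from collections import Counter


def count_slots(nodefile_text):
    """Count how many times each host appears in the node file text."""
    hosts = [line.strip() for line in nodefile_text.splitlines() if line.strip()]
    return Counter(hosts)


def uneven_hosts(counts, expected_ppn):
    """Return the hosts whose entry count differs from the expected ppn."""
    return {host: n for host, n in counts.items() if n != expected_ppn}


if __name__ == "__main__":
    # PBS_NODEFILE is only set inside a running PBS job.
    path = os.environ.get("PBS_NODEFILE")
    if path:
        with open(path) as f:
            counts = count_slots(f.read())
        for host, n in sorted(counts.items()):
            print(host, n)
        # Assumes the job was submitted with ppn=2, as in the script above.
        print("hosts not matching ppn=2:", uneven_hosts(counts, 2))
```

Running this from inside the PBS script only shows what PBS handed to the job; the same functions could be applied to the host list in the "PI####" file, if it is first reduced to one hostname per line, to see where the two disagree.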
