No, I did not.

I did, and finally everything is OK.

I am using MPICH to compile and run my jobs, and mpirun to execute them.

Here is a simple MPI job script that compiles and executes cpi.c:


#!/bin/bash
#PBS -N caca4
#PBS -o outputfile
#PBS -e errorfile
#PBS -l nodes=2:ppn=2
#PBS -m abe
#PBS -M [EMAIL PROTECTED]
#PBS -q workq
# Number of processor slots assigned by PBS (one line per slot in the node file)
nn=`cat $PBS_NODEFILE | wc -l`
# Compile cpi.c
/opt/mpich-1.2.4/bin/mpicc -o \
    /home/oscartst/pbs/DIRECTORY/[EMAIL PROTECTED]/cpi.c/cpi.c.cc \
    /home/oscartst/pbs/DIRECTORY/[EMAIL PROTECTED]/cpi.c/cpi.c
# Run the binary on the assigned nodes
/opt/mpich-1.2.4/bin/mpirun -v -machinefile $PBS_NODEFILE -np $nn \
    /home/oscartst/pbs/DIRECTORY/[EMAIL PROTECTED]/cpi.c/cpi.c.cc

Here are the error and output files I got:

error file:

You have no controlling tty.  Cannot read passphrase.

output file:

running /home/oscartst/pbs/DIRECTORY/[EMAIL PROTECTED]/cpi.c/cpi.c.cc on 4
LINUX ch_p4 processors
Created /home/oscartst/PI1323
p0_1449:  p4_error: Timeout in making connection to remote process on
oscarnode6.oscardomain: 0
p0_1449: (306.041952) net_send: could not write to fd=4, errno = 32

And here is the email from PBS:

PBS Job Id: 148.oscarnode1.oscardomain
Job Name:   caca4
Execution terminated
Exit_status=1
resources_used.cput=00:00:01
resources_used.mem=13216kb
resources_used.vmem=13216kb
resources_used.walltime=00:05:08

I have some doubts about this:

1- Why is there a "remote process on oscarnode6.oscardomain: 0" if I have only
asked for two nodes? I always get this error regardless of the number of nodes I
choose.
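
One way to check which hosts PBS actually assigned to the job would be to
summarise the node file inside the job script, before the mpirun line. This is
only a minimal sketch: the function name summarize_nodefile is hypothetical (it
is not part of my script), and it assumes the node file has one hostname per
line, repeated once per processor slot.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original job script.
# Prints one line per distinct host: "<hostname> <number of slots>".
summarize_nodefile() {
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# PBS sets PBS_NODEFILE inside a job; outside a job this part is skipped.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "Hosts assigned by PBS:"
    summarize_nodefile "$PBS_NODEFILE"
fi
```

If oscarnode6 shows up in that list, then the node named in the p4_error is
simply one of the hosts PBS handed to the job.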

So I ran some tests:

TEST#1: Changed the script to use 4 processors and removed the
#PBS -l nodes=2:ppn=2 option:

/opt/mpich-1.2.4/bin/mpirun -v -machinefile $PBS_NODEFILE -np 4 \
    /home/oscartst/pbs/DIRECTORY/[EMAIL PROTECTED]/cpi.c/cpi.c.cc

And here are the results:

errorfile:
You have no controlling tty.  Cannot read passphrase.

outputfile:
running /home/oscartst/pbs/DIRECTORY/[EMAIL PROTECTED]/cpi.c/cpi.c.cc on 4
LINUX ch_p4 processors
Created /home/oscartst/PI1528
p0_1654:  p4_error: Timeout in making connection to remote process on
oscarnode5.oscardomain: 0
p0_1654: (306.048095) net_send: could not write to fd=4, errno = 32

TEST#2: I decided not to use $PBS_NODEFILE and changed the script to:

/opt/mpich-1.2.4/bin/mpirun -np 4 \
    /home/oscartst/pbs/DIRECTORY/[EMAIL PROTECTED]/cpi.c/cpi.c.cc

And guess what?

Cannot read /opt/mpich-1.2.4/share/machines.LINUX.
Looked for files with extension LINUX in
directory /opt/mpich-1.2.4/share .

(The /opt/mpich-1.2.4/share/machines.LINUX file is owned by root:root with mode 644.)
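
The permissions look fine, so it may be worth checking whether the file is
actually readable on the node where the job runs (for example, it could be
missing there, or its parent directory could be closed to oscartst). A minimal
sketch that could be pasted into the job script; report_readable is a
hypothetical name, and the possible causes above are only guesses.

```shell
#!/bin/bash
# Hypothetical helper: report whether the current user can read a file.
# If not, show who and where we are, plus the permissions of the file
# and its parent directory, and return a non-zero status.
report_readable() {
    if [ -r "$1" ]; then
        echo "readable: $1"
    else
        echo "not readable: $1 (user $(id -un) on $(hostname))"
        ls -ld "$(dirname "$1")" "$1" 2>&1
        return 1
    fi
}

# Example call inside the job script:
#   report_readable /opt/mpich-1.2.4/share/machines.LINUX
```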

TEST#3: So I tried executing mpirun without PBS, as root:

mpirun -np 4 cpi.c.cc

and the job finally ran, although I got some messages:

Warning: the RSA host key for 'oscarnode2' differs from the key for the IP
address '10.0.2.2'
Matching host key in /root/.ssh/known_hosts2:5
Offending key for IP in /root/.ssh/known_hosts2:2

TEST#4: Executed mpirun without PBS, as oscartst.

I was asked 3 or 4 times for the oscartst password.
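
The password prompts here, together with the "Cannot read passphrase" message
in the PBS error file, suggest that the remote logins mpirun relies on do not
work non-interactively for oscartst. A minimal sketch for testing that,
assuming the remote shell is OpenSSH's ssh; check_ssh_hosts is a hypothetical
name. The BatchMode=yes option makes ssh fail instead of prompting, which is
the situation a batch job is in.

```shell
#!/bin/bash
# Hypothetical helper: read hostnames on stdin, one per line, and report
# whether a non-interactive ssh login to each of them succeeds.
check_ssh_hosts() {
    while read -r host; do
        if [ -z "$host" ]; then
            continue
        fi
        # -n keeps ssh from swallowing the rest of the host list on stdin.
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
            echo "$host ok"
        else
            echo "$host FAILED"
        fi
    done
}

# Inside a PBS job, test each assigned host once; skipped outside a job.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    sort -u "$PBS_NODEFILE" | check_ssh_hosts
fi
```

Any host reported as FAILED would need a password or passphrase, which a job
running without a terminal cannot supply.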

I want to use MPICH with PBS, but there seem to be some problems.
I was told that I should use mpiexec.
Is that true? I am using a Fast Ethernet network in my cluster.

Any ideas?

David