No, I did not. I did, and finally everything is OK.
I am using MPICH to compile and run my jobs, and mpirun to execute them. Here is a simple MPI job script that compiles and executes cpi.c:

    #!/bin/bash
    #PBS -N caca4
    #PBS -o outputfile
    #PBS -e errorfile
    #PBS -l nodes=2:ppn=2
    #PBS -m abe
    #PBS -M [EMAIL PROTECTED]
    #PBS -q workq

    nn=`cat $PBS_NODEFILE | wc -l`
    /opt/mpich-1.2.4/bin/mpicc -o /home/oscartst/pbs/DIRECTORY/[EMAIL PROTECTED]/cpi.c/cpi.c.cc /home/oscartst/pbs/DIRECTORY/[EMAIL PROTECTED]/cpi.c/cpi.c
    /opt/mpich-1.2.4/bin/mpirun -v -machinefile $PBS_NODEFILE -np $nn /home/oscartst/pbs/DIRECTORY/[EMAIL PROTECTED]/cpi.c/cpi.c.cc

This is what I got in the error and output files.

error file:

    You have no controlling tty. Cannot read passphrase.

output file:

    running /home/oscartst/pbs/DIRECTORY/[EMAIL PROTECTED]/cpi.c/cpi.c.cc on 4 LINUX ch_p4 processors
    Created /home/oscartst/PI1323
    p0_1449: p4_error: Timeout in making connection to remote process on oscarnode6.oscardomain: 0
    p0_1449: (306.041952) net_send: could not write to fd=4, errno = 32

And here is the email from PBS:

    PBS Job Id: 148.oscarnode1.oscardomain
    Job Name: caca4
    Execution terminated
    Exit_status=1
    resources_used.cput=00:00:01
    resources_used.mem=13216kb
    resources_used.vmem=13216kb
    resources_used.walltime=00:05:08

I have some doubts about this:

1. Why is there a "remote process on oscarnode6.oscardomain: 0" if I only asked for two nodes? I always get this error, regardless of the number of nodes I choose.

So I ran some tests.

TEST#1
I changed the script to use 4 processors and removed the "#PBS -l nodes=2:ppn=2" option:

    /opt/mpich-1.2.4/bin/mpirun -v -machinefile $PBS_NODEFILE -np 4 /home/oscartst/pbs/DIRECTORY/[EMAIL PROTECTED]/cpi.c/cpi.c.cc

Here are the results.

errorfile:

    You have no controlling tty. Cannot read passphrase.

outputfile:

    running /home/oscartst/pbs/DIRECTORY/[EMAIL PROTECTED]/cpi.c/cpi.c.cc on 4 LINUX ch_p4 processors
    Created /home/oscartst/PI1528
    p0_1654: p4_error: Timeout in making connection to remote process on oscarnode5.oscardomain: 0
    p0_1654: (306.048095) net_send: could not write to fd=4, errno = 32

TEST#2
I decided not to use $PBS_NODEFILE and changed the script to:

    /opt/mpich-1.2.4/bin/mpirun -np 4 /home/oscartst/pbs/DIRECTORY/[EMAIL PROTECTED]/cpi.c/cpi.c.cc

And guess what?

    Cannot read /opt/mpich-1.2.4/share/machines.LINUX.
    Looked for files with extension LINUX in directory /opt/mpich-1.2.4/share .

(The /opt/mpich-1.2.4/share/machines.LINUX file is owned by root:root with mode 644.)

TEST#3
I then ran mpirun without PBS, as root:

    mpirun -np 4 cpi.c.cc

This time the job finally ran, although I got some messages:

    Warning: the RSA host key for 'oscarnode2' differs from the key for the IP address '10.0.2.2'
    Matching host key in /root/.ssh/known_hosts2:5
    Offending key for IP in /root/.ssh/known_hosts2:2

TEST#4
I ran mpirun without PBS as oscartst. I was asked 3 or 4 times for the oscartst password.

I want to use MPICH with PBS, but there seem to be some problems. I was told that I should use mpiexec. Is that true?

I am using a Fast Ethernet network in my cluster.

Any ideas?

David

_______________________________________________
Oscar-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/oscar-users
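
For reference, here is a minimal sketch of how the -np value in the job script relates to $PBS_NODEFILE. On OpenPBS-style systems the node file normally has one line per allocated processor, so nodes=2:ppn=2 should give four lines and wc -l should yield 4, which would match the "on 4 LINUX ch_p4 processors" line in the output. The host names nodeA and nodeB and the temporary stand-in file are hypothetical and only let the sketch run outside a job. Inside a real job the file is supplied by PBS.

```shell
#!/bin/bash
# Sketch only: outside a PBS job $PBS_NODEFILE is not set, so build a
# stand-in file. The host names below are hypothetical.
cleanup_nodefile=no
if [ -z "${PBS_NODEFILE:-}" ]; then
    PBS_NODEFILE=$(mktemp)
    printf '%s\n' nodeA nodeA nodeB nodeB > "$PBS_NODEFILE"
    cleanup_nodefile=yes
fi

# Total process slots: one line per processor, as counted in the job script.
nn=$(wc -l < "$PBS_NODEFILE")
nn=$((nn))            # drop padding that some wc implementations add

# Distinct hosts that mpirun will have to reach.
nhosts=$(sort -u "$PBS_NODEFILE" | wc -l)
nhosts=$((nhosts))

echo "slots: $nn"
echo "hosts: $nhosts"
sort "$PBS_NODEFILE" | uniq -c    # slots per host

if [ "$cleanup_nodefile" = yes ]; then
    rm -f "$PBS_NODEFILE"
fi
```

Run inside a job submitted with nodes=2:ppn=2, these commands would be expected to report 4 slots on 2 hosts. Comparing the printed host list with the node named in the p4_error line shows which machine mpirun could not reach.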