Have you tried this:
http://www.open-mpi.org/faq/?category=openfabrics#v1.2-use-early-completion
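Roughly, that would mean adding one MCA parameter to the mpirun line in your script. Below is a minimal sketch, assuming the parameter that FAQ entry describes is pml_ob1_use_early_completion; please check the FAQ text against your exact 1.2.x version before relying on it. The build_cmd helper and the DRY_RUN switch are names made up for illustration. With DRY_RUN=1 the command is only printed, not run.

```shell
#!/bin/bash
# Hypothetical helper: print an mpirun command line with early completion disabled.
# Assumption: the MCA parameter name comes from the FAQ entry linked above.
build_cmd() {
    local nodefile=$1 exe=$2
    local np
    # The PBS node file has one line per slot, so its line count is the process count.
    np=$(wc -l < "$nodefile")
    np=$((np))    # normalize: some wc implementations pad the number with spaces
    echo "/opt/openmpi-1.2.4/gnu/bin/mpirun --mca pml_ob1_use_early_completion 0 -np $np $exe"
}

# Only act when running inside a PBS job (PBS_NODEFILE is set by the batch system).
if [ -n "${PBS_NODEFILE:-}" ]; then
    cmd=$(build_cmd "$PBS_NODEFILE" My_Executable)
    echo "$cmd"
    # Launch the job unless a dry run was requested.
    [ "${DRY_RUN:-0}" = "1" ] || $cmd
fi
```

If the hang goes away with the parameter set to 0, that at least points at early completion rather than at your application.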
On Feb 2, 2009, at 2:52 PM, c.j....@exxonmobil.com wrote:
I am using Open MPI to run a job on 4 nodes, 2 processors per node.
It seems that 5 of the 8 processes ran the application successfully and 3 did not.
Here is the error message I got. The last thing I do in the code is call
MPI_Barrier, and it never returns (probably because 3 of the processes never
get executed properly?).
[0,1,7][btl_openib_component.c:1332:btl_openib_component_progress] from hplcnla160 to: hplcnla162 error polling HP CQ with status LOCAL LENGTH ERROR status number 1 for wr_id 6158264 opcode 0
And here is the script I used:
#!/bin/bash -debug
#PBS -N mytest
#PBS -l nodes=4:ppn=2,walltime=00:05:00,tpn=2
#PBS -j oe
NP=$(wc -l $PBS_NODEFILE | awk '{print $1}')
/opt/openmpi-1.2.4/gnu/bin/mpirun -np $NP My_Executable
Has anybody seen this kind of error before? Thanks.
CJ
--
Jeff Squyres
Cisco Systems