Have you tried this:

    http://www.open-mpi.org/faq/?category=openfabrics#v1.2-use-early-completion

On Feb 2, 2009, at 2:52 PM, c.j....@exxonmobil.com wrote:


I am using Open MPI to run a job on 4 nodes, 2 processors per node. It seems that 5 of the 8 processes executed the application successfully and 3 did not. Here is the error message I got. The last thing in my code is an MPI_Barrier call, and it never returns (probably because 3 of the processes never executed properly?).

[0,1,7][btl_openib_component.c:1332:btl_openib_component_progress] from hplcnla160 to: hplcnla162 error polling HP CQ with status LOCAL LENGTH ERROR status number 1 for wr_id 6158264 opcode 0

And here is the script I used:

#!/bin/bash -debug
#PBS -N mytest
#PBS -l nodes=4:ppn=2,walltime=00:05:00,tpn=2
#PBS -j oe

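# Count the slots PBS allocated (the node file lists one line per slot)
NP=$(wc -l "$PBS_NODEFILE" | awk '{print $1}')
# Launch one MPI process per allocated slot
/opt/openmpi-1.2.4/gnu/bin/mpirun -np "$NP" My_Executable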

Has anybody seen this kind of error before? Thanks.

CJ

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Jeff Squyres
Cisco Systems
