Lin,
Try -np 16, and do not run any of the processes on the head node.
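A rough sketch of what that could look like, assuming the PS3s are
reachable under the hypothetical hostnames ps3-01 through ps3-16 and that
the program is a hypothetical binary called ./my_app (neither name comes
from your message). The hostfile lists only the 16 PS3s, so no rank should
be placed on the head node. The script only prints the launch line rather
than running it:

```shell
#!/bin/sh
# Build a hostfile that lists only the 16 PS3 compute nodes.
# The head node is deliberately left out so no rank is placed there.
# Hostnames ps3-01 .. ps3-16 are hypothetical placeholders.
hostfile="ps3_hosts.txt"
: > "$hostfile"

i=1
while [ "$i" -le 16 ]; do
    # One slot per PS3, i.e. one MPI process per machine.
    printf 'ps3-%02d slots=1\n' "$i" >> "$hostfile"
    i=$((i + 1))
done

# Print (rather than run) the launch line; ./my_app is a hypothetical binary.
echo "mpirun --mca btl tcp,self -np 16 --hostfile $hostfile ./my_app"
```

Note that there is no space after the comma in "tcp,self".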
Doug Reeder
On Jul 6, 2009, at 7:08 PM, Zou, Lin (GE, Research, Consultant) wrote:
Hi all,
The system I use is a PS3 cluster with 16 PS3s and a PowerPC as
the head node, all connected by a high-speed switch.
Inside a loop there are point-to-point communication calls (MPI_Send
and MPI_Recv) with a data size of about 40 KB, and a lot of
computation that takes a long time (about 1 sec). The co-processor
in the PS3 takes care of the computation and the main processor
takes care of the point-to-point communication, so computation and
communication can overlap. The communication functions should
return much faster than the computing function.
My problem is that after some iterations, the time consumed by the
communication functions on a PS3 increases heavily, and the whole
cluster falls out of sync. When I decrease the computing time, the
problem disappears. I am very confused about this.
I think there is a mechanism in Open MPI that causes this. Has
anyone seen this before?
I use "mpirun --mca btl tcp,self -np 17 --hostfile ...". Is there
something I should add?
Lin
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users