Hi, I'm running into a problem with OpenMPI 1.5.1 on a GPU cluster. My program uses MPI to exchange data between nodes, and cudaMemcpyAsync to move data between the host and the GPU within a node. When the MPI message size is less than 1MB everything works fine, but when the message size is larger than 1MB the program hangs: according to my trace, an MPI send never reaches its destination.
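
Below is a stripped-down sketch of the communication pattern (ranks, sizes, and buffer names are simplified for illustration; the host buffers are page-locked via cudaMallocHost, which I suspect matters here):

    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv)
    {
        int rank;
        size_t msg_size = 2 * 1024 * 1024;   /* > 1MB: the size at which it hangs */
        void *host_buf = NULL, *dev_buf = NULL;
        cudaStream_t stream;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        cudaMallocHost(&host_buf, msg_size);   /* page-locked host staging buffer */
        cudaMalloc(&dev_buf, msg_size);
        cudaStreamCreate(&stream);

        if (rank == 0) {
            /* stage data from the GPU into the pinned host buffer, then send it */
            cudaMemcpyAsync(host_buf, dev_buf, msg_size,
                            cudaMemcpyDeviceToHost, stream);
            cudaStreamSynchronize(stream);
            MPI_Send(host_buf, (int)msg_size, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* with msg_size > 1MB this receive never completes */
            MPI_Recv(host_buf, (int)msg_size, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            cudaMemcpyAsync(dev_buf, host_buf, msg_size,
                            cudaMemcpyHostToDevice, stream);
            cudaStreamSynchronize(stream);
        }

        cudaStreamDestroy(stream);
        cudaFreeHost(host_buf);
        cudaFree(dev_buf);
        MPI_Finalize();
        return 0;
    }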
I suspect the issue is related to contention for page-locked (pinned) memory between OpenMPI and CUDA. Has anyone run into this before, or does anyone know how to solve it? Which MCA parameters should I tune so that messages larger than 1MB get through without hanging?
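
To be concrete about what I mean, I know I can override MCA parameters on the mpirun command line, as in the example below, but I don't know which parameters are relevant here (mpi_leave_pinned is shown only to illustrate the syntax, and the process count is made up):

    mpirun -np 16 --mca mpi_leave_pinned 0 ./my_mpi_cuda_app

Any help would be appreciated. Thanks, Fengguang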