On 2009-10-29, at 10:21AM, Vincent Loechner wrote:


It seems that the calls to collective communication are not
returning for some MPI processes, when the number of processes is
greater than or equal to 5. It's reproducible on two different
architectures, with two different versions of OpenMPI (1.3.2 and
1.3.3). It was working correctly with OpenMPI version 1.2.7.

Does it work if you turn off the shared memory transport layer; that is,

mpirun -n 6 -mca btl ^sm ./testmpi

Yes it does, on both my configurations (AMD and Intel processor).
So it seems that the shared memory synchronization process is
broken.

Presumably that is this bug:
https://svn.open-mpi.org/trac/ompi/ticket/2043

I also found by trial and error that increasing the number of fifos, e.g.
-mca btl_sm_num_fifos 5
on a 6-processor job, apparently worked around the problem.
But yes, something seems broken in the OpenMPI shared memory transport with gcc 4.4.x.
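
For concreteness, the two workarounds mentioned in this thread can be written out as full command lines. This is only a sketch: it assumes the same 6-process job and ./testmpi binary as in the example above, and it just prints the commands rather than running them. The variable names are made up for illustration.

```shell
# Sketch only: build and print the two workaround command lines from this
# thread without executing them. NP and PROG are assumptions taken from the
# 6-process ./testmpi example quoted above.
NP=6
PROG=./testmpi

# Workaround 1: exclude the shared-memory BTL (sm) entirely.
CMD_NO_SM="mpirun -n $NP -mca btl ^sm $PROG"

# Workaround 2: keep sm but raise the number of FIFOs
# (5 was found by trial and error for a 6-process job).
CMD_FIFOS="mpirun -n $NP -mca btl_sm_num_fifos 5 $PROG"

echo "$CMD_NO_SM"
echo "$CMD_FIFOS"
```

The first form avoids the shared memory transport altogether, while the second keeps it and only changes the FIFO count, so it is probably the one to try first if shared memory performance matters.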

   Jonathan
--
Jonathan Dursi <ljdu...@scinet.utoronto.ca>



