> >>> It seems that the calls to collective communication are not
> >>> returning for some MPI processes, when the number of processes is
> >>> greater or equal to 5. It's reproducible, on two different
> >>> architectures, with two different versions of OpenMPI (1.3.2 and
> >>> 1.3.3). It was working correctly with OpenMPI version 1.2.7.
> >>
> >> Does it work if you turn off the shared memory transport layer;
> >> that is,
> >>
> >> mpirun -n 6 -mca btl ^sm ./testmpi
> >
> > Yes it does, on both my configurations (AMD and Intel processor).
> > So it seems that the shared memory synchronization process is
> > broken.
>
> Presumably that is this bug:
> https://svn.open-mpi.org/trac/ompi/ticket/2043

Yes it is.

> I also found by trial and error that increasing the number of fifos, e.g.
> -mca btl_sm_num_fifos 5
> on a 6-processor job, apparently worked around the problem.
> But yes, something seems broken in OpenMPI shared memory transport with
> gcc 4.4.x.

Yes, same for me: -mca btl_sm_num_fifos 5 worked.

Thanks for your answer, Jonathan.
If I can help the developers track down this bug in any way, please get
in touch with me.

--Vincent
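
Note for anyone applying the workaround above: rather than repeating the flag on every mpirun line, the same MCA parameter can be placed in the per-user parameter file that Open MPI reads at startup, $HOME/.openmpi/mca-params.conf. The fragment below is only a sketch of that; the value 5 is simply what worked for the 6-process job discussed in this thread, not a recommended setting, and it works around the hang rather than fixing it.

```
# $HOME/.openmpi/mca-params.conf
# Workaround for the shared memory hang discussed in this thread.
# 5 is the value that happened to work for a 6-process job; adjust as needed.
btl_sm_num_fifos = 5
```

The same parameter can also be set through the environment as OMPI_MCA_btl_sm_num_fifos=5. A value given on the mpirun command line overrides both.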