> >>> It seems that the calls to collective communication are not
> >>> returning for some MPI processes, when the number of processes is
> >>> greater than or equal to 5. It's reproducible on two different
> >>> architectures, with two different versions of OpenMPI (1.3.2 and
> >>> 1.3.3). It was working correctly with OpenMPI version 1.2.7.
> >>
> >> Does it work if you turn off the shared memory transport layer;  
> >> that is,
> >>
> >> mpirun -n 6 -mca btl ^sm ./testmpi
> >
> > Yes it does, on both my configurations (AMD and Intel processor).
> > So it seems that the shared memory synchronization process is
> > broken.
> 
> Presumably that is this bug:
> https://svn.open-mpi.org/trac/ompi/ticket/2043

Yes it is.

> I also found by trial and error that increasing the number of FIFOs, e.g.
> -mca btl_sm_num_fifos 5
> on a 6-processor job, apparently worked around the problem.
> But yes, something seems broken in the OpenMPI shared memory transport with
> gcc 4.4.x.

Yes, same for me: -mca btl_sm_num_fifos 5 worked.
Thanks for your answer, Jonathan.

If I can help the developers track down this bug in any way, please get in
touch with me.

--Vincent
