What is your message size and the number of cores per node? Is there any difference when using different algorithms?
2009/1/23 Gabriele Fatigati <g.fatig...@cineca.it>:
> Hi Jeff,
> I would like to understand why, if I run on 512 procs or more, my
> code stops in an MPI collective, even with a small send buffer. All
> processors are locked in the call, doing nothing. But if I add an
> MPI_Barrier after the MPI collective, it works! I run over an
> InfiniBand network.
>
> I know many people with this strange problem; I think there is a
> strange interaction between InfiniBand and Open MPI that causes it.
>
> 2009/1/23 Jeff Squyres <jsquy...@cisco.com>:
> > On Jan 23, 2009, at 6:32 AM, Gabriele Fatigati wrote:
> >
> >> I've noted that Open MPI has asynchronous behaviour in collective
> >> calls. The processors don't wait for the other procs to arrive in
> >> the call.
> >
> > That is correct.
> >
> >> This behaviour can sometimes cause problems in jobs with a lot of
> >> processors.
> >
> > Can you describe what exactly you mean? The MPI spec specifically
> > allows this behavior; OMPI made specific design choices and
> > optimizations to support this behavior. FWIW, I'd be pretty
> > surprised if any optimized MPI implementation defaults to fully
> > synchronous collective operations.
> >
> >> Is there an Open MPI parameter to lock all processes in the
> >> collective call until it is finished? Otherwise I have to insert
> >> many MPI_Barrier calls in my code, and that is very tedious and
> >> strange...
> >
> > As you have noted, MPI_Barrier is the *only* collective operation
> > that MPI guarantees to have any synchronization properties (and
> > it's a fairly weak guarantee at that: no process will exit the
> > barrier until every process has entered the barrier -- but there's
> > no guarantee that all processes leave the barrier at the same
> > time).
> >
> > Why do you need your processes to exit collective operations at
> > the same time?
> >
> > --
> > Jeff Squyres
> > Cisco Systems
>
> --
> Ing. Gabriele Fatigati
> Parallel programmer
> CINECA Systems & Technologies Department
> Supercomputing Group
> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
> www.cineca.it   Tel: +39 051 6171722
> g.fatigati [AT] cineca.it
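
For reference, here is a minimal sketch of the workaround Gabriele
describes: follow each collective with an explicit MPI_Barrier so
that no rank races ahead into the next communication phase. The
reduction, buffer names, and counts are illustrative, not taken from
his code:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        double local = 1.0, sum = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* The collective itself: MPI allows each rank to leave as
         * soon as its own participation is locally complete. */
        MPI_Reduce(&local, &sum, 1, MPI_DOUBLE, MPI_SUM, 0,
                   MPI_COMM_WORLD);

        /* The workaround: MPI_Barrier is the only collective with a
         * synchronization guarantee (no rank exits before every rank
         * has entered), so nobody races ahead into the next phase. */
        MPI_Barrier(MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum = %f\n", sum);

        MPI_Finalize();
        return 0;
    }

If your Open MPI build ships the coll/sync component, an MCA setting
such as "--mca coll_sync_priority 100" may inject such barriers
automatically instead of you editing the code; I am not sure which
releases include it, so check the output of ompi_info first.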
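
The asynchronous behaviour Jeff describes is easy to observe with
small messages, where an implementation may send eagerly and let the
root leave the broadcast before the other ranks have even entered
it. A self-contained test (the sleep and timings are illustrative;
actual behaviour depends on the implementation and interconnect):

    #include <mpi.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int rank, value = 42;
        double t0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Non-root ranks arrive at the collective five seconds late. */
        if (rank != 0)
            sleep(5);

        t0 = MPI_Wtime();
        MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
        printf("rank %d left MPI_Bcast after %.3f s\n",
               rank, MPI_Wtime() - t0);

        MPI_Finalize();
        return 0;
    }

On many systems rank 0 reports a time near zero while the late ranks
report roughly their sleep time -- exactly the non-synchronizing
behaviour the MPI spec permits.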