What is your message size, and how many cores per node do you have?
Is there any difference when you use different collective algorithms?

2009/1/23 Gabriele Fatigati <g.fatig...@cineca.it>

> Hi Jeff,
> I would like to understand why, when I run on 512 processes or more, my
> code hangs in an MPI collective, even with a small send buffer. All
> processes are stuck in the call, doing nothing. But if I add an
> MPI_Barrier after the MPI collective, it works! I am running over an
> InfiniBand network.
>
> I know many people who have this strange problem; I think there is an
> odd interaction between InfiniBand and Open MPI that causes it.
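>
> To make the pattern concrete, here is a minimal sketch of what I mean
> (MPI_Bcast, the buffer size and datatype are just made up for the
> example; my real code is much larger):
>
>   #include <mpi.h>
>
>   int main(int argc, char **argv)
>   {
>       double buf[1024];     /* small send buffer, as in my case */
>       int i, rank;
>
>       MPI_Init(&argc, &argv);
>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>       if (rank == 0)
>           for (i = 0; i < 1024; i++)
>               buf[i] = (double)i;
>
>       /* at >= 512 processes this call hangs for me ... */
>       MPI_Bcast(buf, 1024, MPI_DOUBLE, 0, MPI_COMM_WORLD);
>
>       /* ... but adding this barrier right after it makes it work */
>       MPI_Barrier(MPI_COMM_WORLD);
>
>       MPI_Finalize();
>       return 0;
>   }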
>
>
>
> 2009/1/23 Jeff Squyres <jsquy...@cisco.com>:
>  > On Jan 23, 2009, at 6:32 AM, Gabriele Fatigati wrote:
> >
> >> I've noticed that Open MPI has asynchronous behaviour in collective
> >> calls: the processes don't wait for the other procs to arrive in the
> >> call.
> >
> > That is correct.
> >
> >> This behaviour can sometimes cause problems in jobs with a lot of
> >> processes.
> >
> > Can you describe what exactly you mean?  The MPI spec specifically
> > allows this behavior; OMPI made specific design choices and
> > optimizations to support this behavior.  FWIW, I'd be pretty surprised
> > if any optimized MPI implementation defaults to fully synchronous
> > collective operations.
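> >
> > As a sketch of what the spec permits (the sleep is only there to stagger
> > the ranks for illustration), the root can return from a broadcast before
> > the other ranks have even entered it:
> >
> >   #include <mpi.h>
> >   #include <unistd.h>
> >
> >   int main(int argc, char **argv)
> >   {
> >       int rank, value = 0;
> >
> >       MPI_Init(&argc, &argv);
> >       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >
> >       if (rank != 0)
> >           sleep(10);    /* non-root ranks arrive late on purpose */
> >
> >       if (rank == 0)
> >           value = 42;
> >       MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
> >
> >       /* rank 0 may reach this point while the other ranks are still
> >          sleeping, i.e. before they have entered MPI_Bcast at all */
> >
> >       MPI_Finalize();
> >       return 0;
> >   }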
> >
> >> Is there an Open MPI parameter to keep all processes in the collective
> >> call until it is finished? Otherwise I have to insert many MPI_Barrier
> >> calls in my code, which is very tedious and strange.
> >
> > As you have noted, MPI_Barrier is the *only* collective operation that
> > MPI guarantees to have any synchronization properties (and it's a fairly
> > weak guarantee at that; no process will exit the barrier until every
> > process has entered the barrier -- but there's no guarantee that all
> > processes leave the barrier at the same time).
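> >
> > (A small sketch of that weak guarantee: in something like the code below,
> > no rank can get past the barrier until the late rank has entered it, but
> > MPI_Wtime() would typically show the ranks leaving at slightly different
> > times.)
> >
> >   #include <mpi.h>
> >   #include <stdio.h>
> >   #include <unistd.h>
> >
> >   int main(int argc, char **argv)
> >   {
> >       int rank;
> >       double t_exit;
> >
> >       MPI_Init(&argc, &argv);
> >       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >
> >       if (rank == 0)
> >           sleep(5);                  /* rank 0 enters the barrier late */
> >
> >       MPI_Barrier(MPI_COMM_WORLD);   /* nobody exits before all have entered */
> >       t_exit = MPI_Wtime();          /* ...but exit times can still differ */
> >
> >       printf("rank %d left the barrier at t = %f\n", rank, t_exit);
> >       MPI_Finalize();
> >       return 0;
> >   }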
> >
> > Why do you need your processes to exit collective operations at the same
> > time?
> >
> > --
> > Jeff Squyres
> > Cisco Systems
> >
> >
> >
>
>
>
> --
> Ing. Gabriele Fatigati
>
> Parallel programmer
>
> CINECA Systems & Technologies Department
>
> Supercomputing Group
>
> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>
> www.cineca.it                    Tel:   +39 051 6171722
>
> g.fatigati [AT] cineca.it
>
