Re: [OMPI users] MPI_Reduce performance

2010-09-08 Thread Terry Frankcombe
Gabriele,

Can you clarify... are those timings what is reported for the reduction
call specifically, rather than the total execution time?

If so, then the difference is, to a first approximation, the time you
spend sitting idly by doing absolutely nothing waiting at the barrier.
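
To make that distinction concrete, here is a minimal sketch that times the
MPI_Reduce call per rank separately from the total loop time; the payload,
iteration count and artificial imbalance are invented purely for illustration.

    #include <mpi.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Toy example: each rank does some deliberately uneven "work", then
       reduces.  It reports the time spent inside MPI_Reduce separately
       from the total loop time, which is the distinction asked about. */
    int main(int argc, char **argv)
    {
        int rank, i;
        double local, global = 0.0, t0, in_reduce = 0.0, total;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        total = MPI_Wtime();
        for (i = 0; i < 100; i++) {
            usleep(1000 * (rank % 4));   /* artificial load imbalance */
            local = (double)rank;
            t0 = MPI_Wtime();
            MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0,
                       MPI_COMM_WORLD);
            in_reduce += MPI_Wtime() - t0;
        }
        total = MPI_Wtime() - total;

        printf("rank %3d: %.3f s inside MPI_Reduce, %.3f s total\n",
               rank, in_reduce, total);
        MPI_Finalize();
        return 0;
    }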

Ciao
Terry


-- 
Dr. Terry Frankcombe
Research School of Chemistry, Australian National University
Ph: (+61) 0417 163 509    Skype: terry.frankcombe



Re: [OMPI users] MPI_Reduce performance

2010-09-08 Thread Ashley Pittman

On 8 Sep 2010, at 10:21, Gabriele Fatigati wrote:
> So, in my opinion, it is better to put MPI_Barrier before any MPI_Reduce to 
> mitigate the "asynchronous" behaviour of MPI_Reduce in OpenMPI. I suspect the 
> same holds for other collective communications. Can someone explain to me why 
> MPI_Reduce has this strange behaviour?

There are many cases where adding an explicit barrier before a call to 
reduce would be superfluous, so the standard rightly says that it isn't needed 
and need not be performed.  As you've seen, though, there are also cases where it 
can help.  I'd be interested to know the effect if you only added a barrier 
before MPI_Reduce occasionally, perhaps every one or two hundred iterations; 
this can also have a beneficial effect, as a barrier on every iteration adds 
significant overhead.
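
A rough sketch of that pattern, with an arbitrary interval and placeholder
buffers, might look like this:

    #include <mpi.h>

    /* Reduce on every iteration, but barrier only every Nth iteration, so
       the skew between processes stays bounded without paying the
       synchronisation cost each time.  The interval of 100 is arbitrary. */
    int main(int argc, char **argv)
    {
        int rank, i;
        const int niters = 1000, interval = 100;
        double local, global = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (i = 0; i < niters; i++) {
            local = (double)(rank + i);          /* stand-in for real work */
            if (i % interval == 0)
                MPI_Barrier(MPI_COMM_WORLD);     /* occasional resync only */
            MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0,
                       MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }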

This is a textbook example of where the new asynchronous barrier could help; in 
theory it should have the effect of being able to keep processes in sync without 
any additional overhead in the case that they are already well synchronised.
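
Assuming the asynchronous barrier in question is the nonblocking barrier
proposed for MPI-3 (MPI_Ibarrier, not available in Open MPI 1.3.3), the idea
could be sketched roughly as follows: post the barrier for the next iteration
early, overlap it with the local work, and only wait on it just before the
reduce.

    #include <mpi.h>

    /* Sketch of the nonblocking-barrier idea (MPI_Ibarrier, standardised
       later in MPI-3 and not available in Open MPI 1.3.3).  The barrier
       for the next iteration is posted early, overlaps with the local
       work, and is only waited on just before the reduce, so processes
       that are already in sync pay essentially nothing extra. */
    int main(int argc, char **argv)
    {
        int rank, i;
        const int niters = 1000;
        double local, global = 0.0;
        MPI_Request req;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Ibarrier(MPI_COMM_WORLD, &req);      /* post first barrier */
        for (i = 0; i < niters; i++) {
            local = (double)(rank + i);          /* work overlaps the barrier */
            MPI_Wait(&req, MPI_STATUS_IGNORE);   /* all ranks finished the
                                                    previous pass */
            MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0,
                       MPI_COMM_WORLD);
            MPI_Ibarrier(MPI_COMM_WORLD, &req);  /* barrier for the next pass */
        }
        MPI_Wait(&req, MPI_STATUS_IGNORE);       /* drain the last barrier */

        MPI_Finalize();
        return 0;
    }

This keeps the ranks within one iteration of each other while the barrier's
latency hides behind the compute.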

Ashley,

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk




Re: [OMPI users] MPI_Reduce performance

2010-09-08 Thread David Zhang
Doing Reduce without a Barrier first allows a process to call Reduce and
exit immediately, without waiting for the other processes to call Reduce.
This lets one process advance ahead of the others. I suspect the extra time
in the 2671-second result is the difference between the fastest and the
slowest process. Having a Barrier reduces the time difference because it
forces the faster processes to wait.
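
One way to test that hypothesis is to record each rank's arrival time at the
reduce and compare the earliest and latest arrivals. A rough sketch, assuming
the node clocks are reasonably comparable:

    #include <mpi.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Measure how far apart the ranks arrive at MPI_Reduce by reducing
       each rank's arrival timestamp with MPI_MIN and MPI_MAX.  Assumes
       the node clocks are comparable; the usleep() imbalance is
       artificial, just to make the skew visible. */
    int main(int argc, char **argv)
    {
        int rank;
        double local, global = 0.0, arrival, earliest, latest;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        usleep(10000 * (rank % 8));              /* artificial imbalance */
        local = (double)rank;

        arrival = MPI_Wtime();
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0,
                   MPI_COMM_WORLD);

        MPI_Reduce(&arrival, &earliest, 1, MPI_DOUBLE, MPI_MIN, 0,
                   MPI_COMM_WORLD);
        MPI_Reduce(&arrival, &latest, 1, MPI_DOUBLE, MPI_MAX, 0,
                   MPI_COMM_WORLD);
        if (rank == 0)
            printf("arrival skew at the reduce: %.3f s\n", latest - earliest);

        MPI_Finalize();
        return 0;
    }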

On Wed, Sep 8, 2010 at 3:21 AM, Gabriele Fatigati wrote:

> Dear OpenMPI users,
>
> I'm using OpenMPI 1.3.3 on an InfiniBand 4x interconnection network. My
> parallel application makes intensive use of MPI_Reduce communication over a
> communicator created with MPI_Comm_split.
>
> I've noted strange behaviour during execution. My code is instrumented with
> Scalasca 1.3 to report subroutine execution times. The first execution shows
> the elapsed time with 128 processors (job_communicator is created with
> MPI_Comm_split; in both cases it is composed of the same ranks as
> MPI_COMM_WORLD):
>
> MPI_Reduce(.,job_communicator)
>
> The elapsed time is 2671 sec.
>
> The second run uses MPI_Barrier before MPI_Reduce:
>
> MPI_Barrier(job_communicator..)
> MPI_Reduce(.,job_communicator)
>
> The elapsed time of Barrier+Reduce is 2167 sec (about 8 minutes less).
>
> So, in my opinion, it is better to put MPI_Barrier before any MPI_Reduce to
> mitigate the "asynchronous" behaviour of MPI_Reduce in OpenMPI. I suspect the
> same holds for other collective communications. Can someone explain to me why
> MPI_Reduce has this strange behaviour?
>
> Thanks in advance.



-- 
David Zhang
University of California, San Diego


[OMPI users] MPI_Reduce performance

2010-09-08 Thread Gabriele Fatigati
Dear OpenMPI users,

I'm using OpenMPI 1.3.3 on an InfiniBand 4x interconnection network. My
parallel application makes intensive use of MPI_Reduce communication over a
communicator created with MPI_Comm_split.

I've noted strange behaviour during execution. My code is instrumented with
Scalasca 1.3 to report subroutine execution times. The first execution shows
the elapsed time with 128 processors (job_communicator is created with
MPI_Comm_split; in both cases it is composed of the same ranks as
MPI_COMM_WORLD):

MPI_Reduce(.,job_communicator)

The elapsed time is 2671 sec.

The second run uses MPI_Barrier before MPI_Reduce:

MPI_Barrier(job_communicator..)
MPI_Reduce(.,job_communicator)

The elapsed time of Barrier+Reduce is 2167 sec (about 8 minutes less).
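
For reference, the two timed variants boil down to roughly the following; the
split colour, buffers and counts are placeholders rather than the actual
application code:

    #include <mpi.h>

    /* Rough reconstruction of the two variants being compared.  The split
       colour, buffers, and counts are invented for illustration; they are
       not the actual application code. */
    int main(int argc, char **argv)
    {
        int world_rank;
        double local, global = 0.0;
        MPI_Comm job_communicator;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

        /* Single colour: job_communicator contains the same ranks as
           MPI_COMM_WORLD, as described above. */
        MPI_Comm_split(MPI_COMM_WORLD, 0, world_rank, &job_communicator);

        local = (double)world_rank;

        /* Variant 1: reduce only (the run reported at 2671 sec) */
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0,
                   job_communicator);

        /* Variant 2: barrier first, then reduce (the run reported at 2167 sec) */
        MPI_Barrier(job_communicator);
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0,
                   job_communicator);

        MPI_Comm_free(&job_communicator);
        MPI_Finalize();
        return 0;
    }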

So, in my opinion, it is better to put MPI_Barrier before any MPI_Reduce to
mitigate the "asynchronous" behaviour of MPI_Reduce in OpenMPI. I suspect the
same holds for other collective communications. Can someone explain to me why
MPI_Reduce has this strange behaviour?

Thanks in advance.




-- 
Ing. Gabriele Fatigati

Parallel programmer

CINECA Systems & Tecnologies Department

Supercomputing Group

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.it    Tel: +39 051 6171722

g.fatigati [AT] cineca.it