It looks like the buffering operations consume about 15% as much time as the allreduce operations. Not huge, but not trivial, all the same. Is there any way to avoid the buffering step?
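For reference, here is a minimal sketch of one way the pack/unpack loops might be avoided. It is not tested against the real code. It assumes the loops over ii, ij, ikl cover the full extents of the first three dimensions, so that the slice is contiguous in memory and can be handed to MPI directly. The sizes and the 3-D arrays below are hypothetical stand-ins for array(:,:,:,nl,0,ng) and phim(:,:,:,nl,0,ng), and mpi_comm_world stands in for ang_com.

```fortran
program allreduce_in_place
  use mpi
  implicit none
  ! Hypothetical sizes; the real extents are not shown in this thread.
  integer, parameter :: im = 4, jm = 3, km = 2
  ! Stand-ins for the contiguous slices of the real 6-D arrays.
  real :: array(im, jm, km), phim(im, jm, km)
  integer :: ierr, rank

  call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, rank, ierr)

  ! Dummy local data: every element on this rank is rank+1.
  array = real(rank + 1)

  ! One whole-array copy replaces the hand-written pack loop.
  phim = array

  ! MPI_IN_PLACE makes phim both the send and the receive buffer,
  ! so neither albuf/agbuf nor the unpack loop is needed.
  call mpi_allreduce(mpi_in_place, phim, im*jm*km, mpi_real, &
                     mpi_sum, mpi_comm_world, ierr)

  ! Each element of phim now holds the sum of (rank+1) over all ranks.
  if (rank == 0) print *, 'phim(1,1,1) =', phim(1,1,1)

  call mpi_finalize(ierr)
end program allreduce_in_place
```

One copy (phim = array) still remains because the source and destination arrays differ. If the slice is not contiguous, this does not apply as written, and something like an MPI derived datatype would be needed instead.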
On Thu, Sep 24, 2009 at 6:03 PM, Eugene Loh <eugene....@sun.com> wrote:

> Greg Fischer wrote:
>
> (I apologize in advance for the simplistic/newbie question.)
>
> I'm performing an ALLREDUCE operation on a multi-dimensional array. This
> operation is the biggest bottleneck in the code, and I'm wondering if
> there's a way to do it more efficiently than what I'm doing now. Here's a
> representative example of what's happening:
>
>    ir=1
>    do ikl=1,km
>      do ij=1,jm
>        do ii=1,im
>          albuf(ir)=array(ii,ij,ikl,nl,0,ng)
>          ir=ir+1
>        enddo
>      enddo
>    enddo
>    agbuf=0.0
>    call mpi_allreduce(albuf,agbuf,im*jm*kmloc(coords(2)+1),mpi_real,mpi_sum,ang_com,ierr)
>    ir=1
>    do ikl=1,km
>      do ij=1,jm
>        do ii=1,im
>          phim(ii,ij,ikl,nl,0,ng)=agbuf(ir)
>          ir=ir+1
>        enddo
>      enddo
>    enddo
>
> Is there any way to just do this in one fell swoop, rather than buffering,
> transmitting, and unbuffering? This operation is looped over many times.
> Are there savings to be had here?
>
> There are three steps here: buffering, transmitting, and unbuffering. Any
> idea how the run time is distributed among those three steps? E.g., if most
> time is spent in the MPI call, then combining all three steps into one is
> unlikely to buy you much... and might even hurt. If most of the time is
> spent in the MPI call, then there may be some tuning of collective
> algorithms to do. I don't have any experience doing this with OMPI. I'm
> just saying it makes some sense to isolate the problem a little bit more.
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users