Good evening Eugene,

 First of all, thanks for trying to help me.

 I have already given one profiling tool a try, namely IPM, which is
rather simple to use. Below is some of its output for a 1024-core run.
Unfortunately, I am not yet able to produce the equivalent chart with MPT.

#IPMv0.983####################################################################
#
# command : unknown (completed)
# host    : r34i0n0/x86_64_Linux           mpi_tasks : 1024 on 128 nodes
# start   : 12/21/10/13:18:09              wallclock : 3357.308618 sec
# stop    : 12/21/10/14:14:06              %comm     : 27.67
# gbytes  : 0.00000e+00 total              gflop/sec : 0.00000e+00 total
#
##############################################################################
# region  : *       [ntasks] =   1024
#
#                          [total]         <avg>           min           max
# entries                     1024             1             1             1
# wallclock            3.43754e+06       3356.98       3356.83       3357.31
# user                 2.82831e+06       2762.02       2622.04       2923.37
# system                     376230       367.412       174.603       492.919
# mpi                        951328       929.031       633.137       1052.86
# %comm                                   27.6719       18.8601        31.363
# gflop/sec                       0             0             0             0
# gbytes                          0             0             0             0
#
#
#                           [time]        [calls]       <%mpi>      <%wall>
# MPI_Waitall             741683    7.91081e+07         77.96        21.58
# MPI_Allreduce           114057    2.53665e+07         11.99         3.32
# MPI_Recv               40164.7           2048          4.22         1.17
# MPI_Isend              27420.6    6.53513e+08          2.88         0.80
# MPI_Barrier            25113.5           2048          2.64         0.73
# MPI_Sendrecv            2123.6         212992          0.22         0.06
# MPI_Irecv              464.616    6.53513e+08          0.05         0.01
# MPI_Reduce             215.447         171008          0.02         0.01
# MPI_Bcast              85.0198           1024          0.01         0.00
# MPI_Send              0.377043           2048          0.00         0.00
# MPI_Comm_rank      0.000744925           4096          0.00         0.00
# MPI_Comm_size      0.000252183           1024          0.00         0.00
###############################################################################

It seems to my non-expert eye that MPI_Waitall is dominant among the MPI
calls, but not for the overall application; however, I will have to compare
with MPT before drawing any conclusion.
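
In figures, assuming the [total] column is seconds summed over the 1024 ranks:

  MPI_Waitall / total MPI time : 741683 / 951328      ~ 0.78  (the 77.96 %mpi column)
  MPI_Waitall / wallclock      : 741683 / 3.43754e+06 ~ 0.22  (the 21.58 %wall column)

so the waits account for most of the communication time, but only for about
a fifth of the run.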

 Thanks again for your suggestions, which I will address one by one.
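
 Regarding the suggestion below to mimic the exchanges with plain
MPI_Isend/Irecv/Waitall calls, here is the kind of skeleton I would start
from. It is only a sketch: the ring-neighbour pattern, the message size
(NWORDS) and the iteration count (NITER) are placeholders, not the values
used by the real application.

/* reproducer.c -- minimal nearest-neighbour exchange using
 * MPI_Isend/MPI_Irecv/MPI_Waitall.  Pattern, message size and iteration
 * count are placeholders, NOT those of the real application. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NWORDS 16384      /* doubles per message (placeholder)        */
#define NITER  10000      /* number of exchange sweeps (placeholder)  */

int main(int argc, char **argv)
{
    int rank, size, left, right, i, iter;
    double *sbufl, *sbufr, *rbufl, *rbufr;
    MPI_Request req[4];
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    left  = (rank - 1 + size) % size;   /* fixed pair of neighbours */
    right = (rank + 1) % size;

    /* buffers allocated once and reused, so that registration caching
       (mpi_leave_pinned) can do its job */
    sbufl = malloc(NWORDS * sizeof(double));
    sbufr = malloc(NWORDS * sizeof(double));
    rbufl = malloc(NWORDS * sizeof(double));
    rbufr = malloc(NWORDS * sizeof(double));
    for (i = 0; i < NWORDS; i++) { sbufl[i] = sbufr[i] = (double)rank; }

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();

    for (iter = 0; iter < NITER; iter++) {
        MPI_Irecv(rbufl, NWORDS, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
        MPI_Irecv(rbufr, NWORDS, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &req[1]);
        MPI_Isend(sbufr, NWORDS, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[2]);
        MPI_Isend(sbufl, NWORDS, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &req[3]);
        MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
    }

    t1 = MPI_Wtime();
    if (rank == 0)
        printf("%d sweeps of %d doubles each way: %.3f s\n",
               NITER, NWORDS, t1 - t0);

    free(sbufl); free(sbufr); free(rbufl); free(rbufr);
    MPI_Finalize();
    return 0;
}

 Running it with, e.g., mpirun -np 1024 ./reproducer and varying NWORDS
should tell whether "lots of short messages" or "lots of long messages"
is the bad case.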

 Best,     G.




On 22/12/2010 18:50, Eugene Loh wrote:
Can you isolate a bit more where the time is being spent? The performance effect you're describing appears to be drastic. Have you profiled the code? Some choices of tools can be found in the FAQ (http://www.open-mpi.org/faq/?category=perftools). The results may be "uninteresting" (all time spent in your MPI_Waitall calls, for example), but it'd be good to rule out other possibilities (e.g., I've seen cases where it's the non-MPI time that's the culprit).

If all the time is spent in MPI_Waitall, then I wonder if it would be possible for you to reproduce the problem with just some MPI_Isend|Irecv|Waitall calls that mimic your program. E.g., "lots of short messages", or "lots of long messages", etc. It sounds like there is some repeated set of MPI exchanges, so maybe that set can be extracted and run without the complexities of the application.

Anyhow, some profiling might help guide one to the problem.

Gilbert Grosdidier wrote:

There is indeed a high rate of communication. But the buffer
size is always the same for a given pair of processes, and I thought
that mpi_leave_pinned should avoid de-registering (unpinning) the memory
in this case. Am I wrong?
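
For reference, assuming a reasonably recent Open MPI, that behaviour can be
forced explicitly on the mpirun command line and the current default checked
with ompi_info (the executable name is only a placeholder):

  mpirun --mca mpi_leave_pinned 1 -np 1024 ./my_app
  ompi_info --all | grep mpi_leave_pinned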
