I am a student whose research involves using MPI and OpenACC to
accelerate our in-house CFD code on multiple GPUs. I am having a
major issue related to the progression of operations in MPI, and I
think your input could be very helpful.

     I am currently testing the performance of overlapping communication
and computation in the code. Communication takes place between hosts
(CPUs), while computation is done on the devices (GPUs). However, in my
case the actual communication always starts only after the computation
finishes. So even though I wrote the code in an overlapping style, no
overlap actually occurs, because Open MPI does not support asynchronous
progression. I found that MPI often makes progress (i.e., actually sends
or receives the data) only while I am blocking in a call to MPI_Wait, at
which point no overlap is possible at all. My goal is to use overlap to
hide the communication latency and thus improve the performance of my
code. Is there anything you can suggest? Thank you very much!

     I am using the PGI 17.5 compiler and Open MPI 2.0.0, with 100 Gbps
EDR InfiniBand for MPI traffic. Running "ompi_info" reports the thread
support as: "Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL
support: yes, OMPI progress: no, ORTE progress: yes, Event lib:

Best Regards,

Weicheng Xue