On Nov 13, 2018, at 8:52 PM, Weicheng Xue <weic...@vt.edu> wrote:
>     I am a student whose research work includes using MPI and OpenACC to 
> accelerate our in-house research CFD code on multiple GPUs. I am having a big 
> issue related to the "progression of operations in MPI" and am thinking your 
> inputs can be very helpful.

Someone asked me about an Open MPI + OpenACC issue this past week at the 
Supercomputing trade show.

I'm not sure if anyone in the Open MPI development community is testing with 
Open MPI + OpenACC.  I don't know much about it -- I would *hope* that it "just 
works", but I don't know that for sure.

>      I am now testing the performance of overlapping communication and 
> computation for a code. Communication exists between hosts (CPUs) and 
> computations are done on devices (GPUs). However, in my case, the actual 
> communication always starts when the computations finish. Therefore, even 
> though I wrote my code in an overlapping way, there is no overlap because 
> Open MPI does not support asynchronous progression. I found that MPI 
> often makes progress (i.e., actually sends or receives the data) only if I am 
> blocking in a call to MPI_Wait (and then no overlap occurs at all). My 
> purpose is to use overlapping to hide communication latency and thus improve 
> the performance of my code. Is there a way you can suggest to me? Thank you 
> very much!

Nearly all transports in Open MPI support asynchronous progress -- but only 
some of them offer hardware- and/or OS-assisted asynchronous progress (which is 
probably what you are assuming).  Specifically: I'm quibbling with your choice 
of wording, but the end effect you are observing is likely a) correct, and b) 
dependent upon the network transport that you are using.

>      I am now using PGI/17.5 compiler and openmpi/2.0.0. A 100 Gbps 
> EDR-Infiniband is used for MPI traffic. If I use "ompi_info", then info. 
> about the thread support is "Thread support: posix (MPI_THREAD_MULTIPLE: yes, 
> OPAL support: yes, OMPI progress: no, ORTE progress: yes, Event lib: yes)".

That's a little surprising -- IB should be one of the transports that actually 
supports asynchronous progress.

Are you using UCX for the IB transport?

Jeff Squyres
