Hi, I am a student whose research involves using MPI and OpenACC to accelerate our in-house CFD code on multiple GPUs. I am running into a significant issue related to the progression of operations in MPI, and I think your input could be very helpful.
I am currently testing the performance of overlapping communication and computation in a code. Communication takes place between hosts (CPUs), while computation is done on devices (GPUs). However, in my case the actual communication always starts only after the computation finishes. So even though I wrote my code in an overlapping style, no overlap occurs, because Open MPI does not support asynchronous progression: I found that MPI often makes progress (i.e., actually sends or receives the data) only while I am blocking in a call to MPI_Wait, at which point no overlap happens at all. My goal is to use overlap to hide communication latency and thereby improve the performance of my code. Is there an approach you can suggest? Thank you very much!

I am using the PGI 17.5 compiler and Open MPI 2.0.0, with 100 Gbps EDR InfiniBand for MPI traffic. Running "ompi_info" reports the following thread support: "Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes, OMPI progress: no, ORTE progress: yes, Event lib: yes)".

Best Regards,
Weicheng Xue
_______________________________________________
users mailing list
email@example.com
https://lists.open-mpi.org/mailman/listinfo/users