Hi Jeff S.
Thank you very much for your reply.
I am still a little confused; please guide me.
> The idea is to do this:
>
> MPI_Recv_init()
> MPI_Send_init()
> for (i = 0; i < 1000; ++i) {
> MPI_Startall()
> /* do whatever */
> MPI_Waitall()
> }
> for (i = 0; i < 1000; ++i) {
> MPI_Request_free()
> }
>
> So in your inner loop, you just call MPI_Startall() and a corresponding
> MPI_Test* / MPI_Wait* call to complete those requests.
>
> The idea is that the MPI_*_init() functions do some one-time setup on the
> requests and then you just start and complete those same requests over and
> over and over. When you're done, you free them.
>
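If I have understood you correctly, for an exchange with two neighbors (left and right) it would look roughly like the sketch below. The names rbuf_*, sbuf_*, n, left and right are only placeholders of mine, and the same buffers have to be reused in every iteration:

/* one-time setup, done once before the repeated exchanges */
MPI_Request req[4];
MPI_Recv_init(rbuf_left,  n, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
MPI_Recv_init(rbuf_right, n, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &req[1]);
MPI_Send_init(sbuf_right, n, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[2]);
MPI_Send_init(sbuf_left,  n, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &req[3]);

for (i = 0; i < 1000; ++i) {
    /* refill sbuf_left / sbuf_right with the current boundary values */
    MPI_Startall(4, req);
    /* do whatever needs only local data */
    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
}

for (i = 0; i < 4; ++i)
    MPI_Request_free(&req[i]);   /* free each request once, at the very end */

Is that the intended usage?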
At the moment, what I am doing in my code is the following:
CALL subroutine-(1) 10000 times in the main program.
Subroutine-(1) starts ===================================
Loop A starts here >>>>>>>>>>>>>>>>>>>> (three passes)
Call subroutine-(2)
Subroutine-(2) starts----------------------------
Copy local data from array U into separate arrays, one for each
neighboring process
CALL MPI_IRECV for each neighboring process
CALL MPI_ISEND for each neighboring process
-------perform work that could be done with local data
CALL MPI_WAITALL( )
-------perform work using the received data
Subroutine-(2) ends ----------------------------
------- perform work to update array U
Loop A ends here >>>>>>>>>>>>>>>>>>>>
Subroutine-(1) ends ====================================
I assume that the above setup also overlaps computation with communication
(i.e., it hides communication behind computation).
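Written in the same C-style notation as your example, one pass of subroutine-(2) currently amounts to roughly the following (again simplified to two neighbors; buffer names and ranks are placeholders, and in reality there is one send/receive pair per neighboring process):

/* what one call of subroutine-(2) does now */
MPI_Request req[4];
/* copy boundary strips of U into sbuf_left / sbuf_right */
MPI_Irecv(rbuf_left,  n, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
MPI_Irecv(rbuf_right, n, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &req[1]);
MPI_Isend(sbuf_right, n, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[2]);
MPI_Isend(sbuf_left,  n, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &req[3]);
/* work that needs only local data */
MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
/* work that needs the received halo values */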
My intention now is to use persistent communication to gain more efficiency,
but I am confused about how to apply your proposed model to my code. Please
advise.
best regards,
AA.