amjad ali wrote:
and it's conceivable that you might have better performance with

    CALL MPI_ISEND()
    DO I = 1, N
        call do_a_little_of_my_work()  ! no MPI progress is being made here
        CALL MPI_TEST()            ! enough MPI progress is being made here that the receiver has something to do
    END DO
    CALL MPI_WAIT()

Whether performance improves or not is not guaranteed by the MPI standard.
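
To make that concrete, here is a minimal, self-contained sketch of the same idea (assumptions on my part: two ranks, a small fixed-size message, and a placeholder where your real computation would go):

program overlap_sketch
  implicit none
  include "mpif.h"
  integer, parameter :: nwords = 8, nchunks = 100
  integer me, np, other, i, ier, reqs(2)
  logical flag
  real(8) sendbuf(nwords), recvbuf(nwords)

  call MPI_Init(ier)
  call MPI_Comm_size(MPI_COMM_WORLD, np, ier)
  call MPI_Comm_rank(MPI_COMM_WORLD, me, ier)
  if ( np /= 2 ) stop "np is not 2"
  other = 1 - me
  sendbuf = real(me, 8)

  ! post the receive and the send up front, both nonblocking
  call MPI_Irecv(recvbuf, nwords, MPI_REAL8, other, 100, MPI_COMM_WORLD, reqs(2), ier)
  call MPI_Isend(sendbuf, nwords, MPI_REAL8, other, 100, MPI_COMM_WORLD, reqs(1), ier)

  do i = 1, nchunks
    ! call do_a_little_of_my_work()   ! your computation; no MPI progress is made here
    call MPI_Test(reqs(1), flag, MPI_STATUS_IGNORE, ier)   ! lets the library make progress
  end do

  call MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE, ier)
  ! recvbuf is now safe to use

  call MPI_Finalize(ier)
end program overlap_sketch

The MPI_Test calls do nothing for correctness; they just give the MPI library regular opportunities to push the outstanding send along while you compute, which is the only reason to expect any speedup.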
And the SECOND desire is to use Persistent communication for even better speedup.
Right.  That's a separate issue.

So at this time I am focusing on persistent communication. Based on your suggestions, I developed the following:

The send and receive buffers and the request array are declared in a global module, and their sizes are allocated in the main program. But the following is not working: I get segmentation fault messages coming from just the underlined blue line.
Well, the problem must be in the details of how you're implementing this.  I've attached a program that works for me.

Main program starts------@@@@@@@@@@@@@@@@@@@@@@@.

CALL MPI_RECV_INIT for each neighboring process 
CALL MPI_SEND_INIT for each neighboring process

Loop calling subroutine1 starts--------------------(10000 times in the main program).

Call subroutine1

Subroutine1 starts===================================
   Loop A starts here >>>>>>>>>>>>>>>>>>>> (three passes)
   Call subroutine2

   Subroutine2 starts----------------------------

         Pick local data from array U in separate arrays for each neighboring processor
         CALL MPI_STARTALL
         -------perform work that could be done with local data
         CALL MPI_WAITALL( )
         -------perform work using the received data
   Subroutine2 ends----------------------------


         -------perform work to update array U
   Loop A ends here >>>>>>>>>>>>>>>>>>>>
Subroutine1 ends====================================

Loop calling subroutine1 ends------------(10000 times in the main program).

CALL MPI_Request_free( )

Main program ends------@@@@@@@@@@@@@@@@@@@@@@@.

How should I tackle all this?
module my_mpi_stuff
  integer, parameter :: nmsgs = 1, nwords = 8
  integer me, np
  integer reqs(       nmsgs,2)       ! (...,1) are for sends and (...,2) are for receives
  real(8) bufs(nwords,nmsgs,2)       ! (...,1) are for sends and (...,2) are for receives
end module my_mpi_stuff

program main
  use my_mpi_stuff
  include "mpif.h"

  call MPI_Init(ier)
  call MPI_Comm_size(MPI_COMM_WORLD,np,ier)
  call MPI_Comm_rank(MPI_COMM_WORLD,me,ier)

  ! set up individual sends and receives
  if ( np /= 2 ) stop "np is not 2"      ! this simple example works only for np==2
  call MPI_Recv_init(bufs(1,1,2), nwords, MPI_REAL8, 1-me, 300, MPI_COMM_WORLD, reqs(1,2), ier)
  call MPI_Send_init(bufs(1,1,1), nwords, MPI_REAL8, 1-me, 300, MPI_COMM_WORLD, reqs(1,1), ier)

  do i = 1, 10000
    call sub1()
  end do

  ! dump out buffers
  do imsg = 1, nmsgs
    write(6,'(3i5,8f8.1)') me, imsg, 1, bufs(:,imsg,1)
    write(6,'(3i5,8f8.1)') me, imsg, 2, bufs(:,imsg,2)
  end do

  do imsg = 1, nmsgs
    call MPI_Request_free(reqs(imsg,1), ier)
    call MPI_Request_free(reqs(imsg,2), ier)
  end do

  call MPI_Finalize(ier)
end program main

subroutine sub1()
  do i = 1, 3
    call sub2()
    ! call update(u)
  end do
end subroutine sub1

subroutine sub2()
  use my_mpi_stuff
  include "mpif.h"

  ! Pick local data from array U in separate arrays for each neighboring processor
  do imsg = 1, nmsgs
    do iword = 1, nwords
      bufs(iword,imsg,1) = 10000 * me + 100 * imsg + iword
    end do
  end do

  call MPI_Startall(2*nmsgs,reqs,ier)

  ! -------perform work that could be done with local data

  call MPI_Waitall (2*nmsgs,reqs,MPI_STATUSES_IGNORE,ier)

  ! -------perform work using the received data

end subroutine sub2

#!/bin/csh

setenv OPAL_PREFIX .....
set path = ( $OPAL_PREFIX/bin $path )

mpif90 a.f90
mpirun -n 2 ./a.out
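
If it helps with the multi-neighbor structure in your outline, here is a sketch of how the same pattern could be generalized to one persistent send/receive pair per neighbor. The module name, nneigh, neigh, and the 1-D decomposition (neighbors me-1 and me+1) are all stand-ins I made up; in your code the neighbor list would come from your mesh partitioning, and the work in sub1/sub2 would sit between the Startall and Waitall exactly as in the attached program.

module my_mpi_neigh
  implicit none
  integer, parameter :: nwords = 8
  integer me, np
  integer nneigh                        ! number of neighboring ranks
  integer, allocatable :: neigh(:)      ! their ranks
  integer, allocatable :: reqs(:,:)     ! (...,1) are sends and (...,2) are receives
  real(8), allocatable :: bufs(:,:,:)   ! (word, neighbor, 1=send / 2=recv)
end module my_mpi_neigh

program main
  use my_mpi_neigh
  implicit none
  include "mpif.h"
  integer i, k, ier

  call MPI_Init(ier)
  call MPI_Comm_size(MPI_COMM_WORLD, np, ier)
  call MPI_Comm_rank(MPI_COMM_WORLD, me, ier)
  if ( np < 2 ) stop "run with at least 2 ranks"

  ! made-up 1-D decomposition: neighbors are me-1 and me+1 where they exist
  nneigh = 0
  if ( me > 0    ) nneigh = nneigh + 1
  if ( me < np-1 ) nneigh = nneigh + 1

  ! allocate once, before any *_init call; the buffers and the request array
  ! must stay allocated until after MPI_Request_free at the very end
  allocate(neigh(nneigh), reqs(nneigh,2), bufs(nwords,nneigh,2))
  k = 0
  if ( me > 0    ) then
    k = k + 1;  neigh(k) = me - 1
  end if
  if ( me < np-1 ) then
    k = k + 1;  neigh(k) = me + 1
  end if

  ! one persistent receive and one persistent send per neighbor, set up once
  do k = 1, nneigh
    call MPI_Recv_init(bufs(1,k,2), nwords, MPI_REAL8, neigh(k), 300, &
                       MPI_COMM_WORLD, reqs(k,2), ier)
    call MPI_Send_init(bufs(1,k,1), nwords, MPI_REAL8, neigh(k), 300, &
                       MPI_COMM_WORLD, reqs(k,1), ier)
  end do

  do i = 1, 10000
    bufs(:,:,1) = real(me, 8)                ! fill the send buffers (stand-in for copying from U)
    call MPI_Startall(2*nneigh, reqs, ier)   ! start every send and receive
    ! ---- work that needs only local data goes here ----
    call MPI_Waitall(2*nneigh, reqs, MPI_STATUSES_IGNORE, ier)
    ! ---- work that uses the received halo data goes here ----
  end do

  do k = 1, nneigh
    call MPI_Request_free(reqs(k,1), ier)
    call MPI_Request_free(reqs(k,2), ier)
  end do
  call MPI_Finalize(ier)
end program main

Passing reqs with a count of 2*nneigh works because reqs(:,1) and reqs(:,2) are contiguous in memory, just as in the attached np==2 example.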
