amjad ali wrote:

Hi all.
(sorry if this is a duplicate)

I have to parallelize a CFD code using domain/grid/mesh partitioning among the processes. Before running, we do not know:
(i) how many processes we will use (np is unknown);
(ii) how many neighbouring processes a given process will have (my_nbrs = ?);
(iii) how many entries a process needs to send to a particular neighbouring process.
But once the code runs, I can calculate all of this information easily.


The problem is to copy a number of entries into an array and then send that array to a destination process. The same sender has to repeat this work to send data to each of its neighbouring processes. Is the following code fine:

DO i = 1, my_nbrs
   DO j = 1, few_entries_for_this_neighbour
       send_array(j)   =    my_array(jth_particular_entry)
   ENDDO
   CALL MPI_ISEND(send_array(1:j), j, MPI_REAL8, dest(i), tag, MPI_COMM_WORLD, request1(i), ierr)

Instead of "j" I assume you intended something like "few_entries_for_this_neighbour" (after the inner DO loop finishes, j is one past the last index).

ENDDO

And the corresponding receives, at each process:

DO i = 1, my_nbrs
   k = few_entries_from_this_neighbour
   CALL MPI_IRECV(recv_array(1:k), k, MPI_REAL8, source(i), tag, MPI_COMM_WORLD, request2(i), ierr)
   DO j = 1, few_from_source(i)
       received_data(j)   =    recv_array(j)
   ENDDO
ENDDO

After the above, MPI_WAITALL is called.


I think this code will not work, both for sending and receiving. For the non-blocking sends we cannot reuse send_array like above to send data to the other processes (since we cannot be sure the application buffer is available for reuse). Am I right?

A similar problem exists with the recv array; data from multiple processes cannot be received into the same array like above. Am I right?

Correct for both send and receive. When you call MPI_Isend, the buffer cannot be written to until the MPI_Waitall completes. When you use MPI_Irecv, you cannot read the data until the MPI_Waitall completes. You're reusing both send and receive buffers too often and too soon.
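
As a minimal sketch of that rule (sbuf, rbuf, dest, source and tag below are placeholders, not names from your code):

! assumes USE mpi (or INCLUDE 'mpif.h') in the enclosing program unit
REAL*8  :: sbuf(100), rbuf(100)
INTEGER :: req(2), stats(MPI_STATUS_SIZE,2), ierr

sbuf = 1.0d0      ! fill the send buffer before posting the send

CALL MPI_ISEND(sbuf, 100, MPI_REAL8, dest,   tag, MPI_COMM_WORLD, req(1), ierr)
CALL MPI_IRECV(rbuf, 100, MPI_REAL8, source, tag, MPI_COMM_WORLD, req(2), ierr)

! ... do computation here that touches neither sbuf nor rbuf ...

CALL MPI_WAITALL(2, req, stats, ierr)
! only after the wait may sbuf be overwritten and rbuf be read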

The target is to hide communication behind computation, so we need non-blocking communication. As we do not know the value of np or the value of my_nbrs for each process beforehand, we cannot decide how many arrays to create. Please suggest a solution.

You can allocate memory dynamically, even in Fortran.
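
For example, something along these lines (just a sketch; send_count(i) and recv_count(i) stand for the per-neighbour entry counts you compute at run time):

REAL*8,  ALLOCATABLE :: send_array(:), recv_array(:)
INTEGER, ALLOCATABLE :: request1(:), request2(:)

! sizes are known only at run time, so allocate once they have been computed
ALLOCATE(send_array(SUM(send_count(1:my_nbrs))))
ALLOCATE(recv_array(SUM(recv_count(1:my_nbrs))))
ALLOCATE(request1(my_nbrs), request2(my_nbrs))

! ... pack, communicate, unpack ...

DEALLOCATE(send_array, recv_array, request1, request2)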

A more subtle solution that I can think of is the following:

cc = 0
DO i = 1, my_nbrs
   DO j = 1, few_entries_for_this_neighbour
       send_array(cc+j)   =    my_array(jth_particular_entry)
   ENDDO
   CALL MPI_ISEND(send_array(cc:cc+j), j, MPI_REAL8, dest(i), tag, MPI_COMM_WORLD, request1(i), ierr)
   cc = cc  + j
ENDDO

The same issue with "j" as before, but yes, concatenating the various send buffers in a one-dimensional fashion should work.
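
Spelled out, the send side could look like this (again a sketch: send_count(i) is the number of entries for neighbour i, and send_index(.) is a placeholder for however you map those entries into my_array; note that the slice you send should start at cc+1, not cc). Passing the first element, send_array(cc+1), rather than an array section also avoids any risk of the compiler handing MPI a temporary copy of the buffer:

cc = 0
DO i = 1, my_nbrs
   DO j = 1, send_count(i)
      send_array(cc+j) = my_array(send_index(cc+j))   ! gather entries for neighbour i
   ENDDO
   CALL MPI_ISEND(send_array(cc+1), send_count(i), MPI_REAL8, dest(i), &
                  tag, MPI_COMM_WORLD, request1(i), ierr)
   cc = cc + send_count(i)
ENDDO
! send_array must stay untouched until MPI_WAITALL(my_nbrs, request1, ...) completes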

And the corresponding receives, at each process:

cc = 0
DO i = 1, my_nbrs
   k = few_entries_from_this_neighbour
   CALL MPI_IRECV(recv_array(cc+1:cc+k), k, MPI_REAL8, source(i), tag, MPI_COMM_WORLD, request2(i), ierr)
   DO j = 1, k
       received_data(j)   =    recv_array(cc+j)
   ENDDO
   cc = cc + k
ENDDO

Okay, but you're still reading the data before the MPI_Waitall call. If you call MPI_Irecv(buffer,...), you cannot read the buffer's contents until the corresponding MPI_Waitall (or variant).
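
Concretely, the receive side needs two passes, something like this (a sketch; recv_count(i) is the number of entries expected from neighbour i):

cc = 0
DO i = 1, my_nbrs
   CALL MPI_IRECV(recv_array(cc+1), recv_count(i), MPI_REAL8, source(i), &
                  tag, MPI_COMM_WORLD, request2(i), ierr)
   cc = cc + recv_count(i)
ENDDO

CALL MPI_WAITALL(my_nbrs, request2, MPI_STATUSES_IGNORE, ierr)

! only now is recv_array safe to read
cc = 0
DO i = 1, my_nbrs
   DO j = 1, recv_count(i)
      received_data(cc+j) = recv_array(cc+j)   ! or copy into wherever the application needs it
   ENDDO
   cc = cc + recv_count(i)
ENDDO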

After the above, MPI_WAITALL is called.

This means that the send_array for all neighbours will have a collected shape:
send_array = [... entries for nbr 1 ..., ... entries for nbr 2 ..., ..., ... entries for the last nbr ...]
and the respective entries will be sent to the respective neighbours as above.


The recv_array for all neighbours will have a collected shape:
recv_array = [... entries from nbr 1 ..., ... entries from nbr 2 ..., ..., ... entries from the last nbr ...]
and the entries from those processes will be received into the respective locations/portions of recv_array.


Is this scheme fine and correct?

I am in search of an efficient one.
