Here are the communication operations occurring in the best-case
scenario in Open MPI right now:
Comm_create:
- Communicator ID allocation: 2 Allreduce operations per round of
negotiations
- 1 Allreduce operation for 'activating' the communicator
Comm_split:
- 1 Allgather operation for collecting all color/key values
- Communicator ID allocation: 2 Allreduce operations per round of
negotiations
- 1 Allreduce operation for 'activating' the communicator
As the description above suggests, you might need more than one round
for the communicator ID allocation, depending on the history of the
application and which IDs have already been used.
The details of how these operations are implemented can vary. We could,
however, assume a binary tree for both the reduce and the broadcast
portion of the Allreduce operation, each being O(log P). For the
Allgather we could assume a combination of a linear gather (O(P)) and a
binary tree broadcast (O(log P)).
So as of today, Comm_split is more expensive than Comm_create.
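Back-of-the-envelope, for a single negotiation round: Comm_create is
roughly 3 Allreduces, i.e. on the order of 6 log P message rounds,
while Comm_split pays the same Allreduces plus the O(P) linear gather
on top. If you would rather measure than trust the model, a user-level
sketch along the following lines should do. It only times the two calls
from the outside (it is not Open MPI internals), and the half/half
grouping is arbitrary:

#include <mpi.h>
#include <stdio.h>

/* Timing sketch: compare MPI_Comm_split against MPI_Comm_create when
   both build the same two communicators (lower half / upper half of
   MPI_COMM_WORLD). */
int main(int argc, char **argv)
{
    int rank, size, range[1][3];
    double t0, t_split, t_create;
    MPI_Group world_group, half_group;
    MPI_Comm split_comm, create_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Variant 1: Comm_split by color. */
    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    MPI_Comm_split(MPI_COMM_WORLD, rank < size / 2 ? 0 : 1, rank, &split_comm);
    t_split = MPI_Wtime() - t0;

    /* Variant 2: Comm_create from an explicit group range. */
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);
    range[0][0] = (rank < size / 2) ? 0 : size / 2;            /* first  */
    range[0][1] = (rank < size / 2) ? size / 2 - 1 : size - 1;  /* last   */
    range[0][2] = 1;                                            /* stride */
    MPI_Group_range_incl(world_group, 1, range, &half_group);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    MPI_Comm_create(MPI_COMM_WORLD, half_group, &create_comm);
    t_create = MPI_Wtime() - t0;

    if (rank == 0)
        printf("P = %d: Comm_split %.6f s, Comm_create %.6f s\n",
               size, t_split, t_create);

    MPI_Group_free(&half_group);
    MPI_Group_free(&world_group);
    MPI_Comm_free(&split_comm);
    MPI_Comm_free(&create_comm);
    MPI_Finalize();
    return 0;
}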
Thanks
Edgar
On 1/19/2015 4:13 PM, Jonathan Eckstein wrote:
Dear Open MPIers:
I have been using MPI for many years, most recently Open MPI. But I
have just encountered the first situation in which it will be helpful to
create communicators (for an unstructured sparse matrix algorithm).
I have identified two ways I could create the communicators I need.
Where P denotes the number of MPI processors, Option A is:
1. Exchange of messages between processors of adjacent rank
[O(1) message rounds (one up, one down)]
2. One scan operation
[O(log P) message rounds]
3. One or two calls to MPI_COMM_SPLIT
[Unknown complexity]
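In code form, Option A would look roughly like the following. The
payload exchanged with the neighbours (just the rank here) and the color
rule (blocks of 4 consecutive ranks) are placeholders; the real ones
would come from my sparse matrix structure, and only the MPI skeleton
matters:

#include <mpi.h>
#include <stdio.h>

/* Option A skeleton: one shift up, one shift down, one scan, then
   MPI_Comm_split; the cost of the final call is the open question. */
int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int up   = (rank + 1 < nprocs) ? rank + 1 : MPI_PROC_NULL;
    int down = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
    int mine = rank, from_below = -1, from_above = -1;

    /* Step 1: O(1) message rounds, one shift up and one shift down. */
    MPI_Sendrecv(&mine, 1, MPI_INT, up, 0,
                 &from_below, 1, MPI_INT, down, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&mine, 1, MPI_INT, down, 0,
                 &from_above, 1, MPI_INT, up, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* Step 2: one scan, O(log P) message rounds. */
    int one = 1, prefix = 0;
    MPI_Scan(&one, &prefix, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    /* Step 3: derive a color locally and split. */
    int color = (prefix - 1) / 4;
    MPI_Comm block_comm;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &block_comm);

    printf("rank %d: below=%d above=%d color=%d\n",
           rank, from_below, from_above, color);

    MPI_Comm_free(&block_comm);
    MPI_Finalize();
    return 0;
}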
Option B is:
1. Three scan operations (one in reverse direction)
[O(log P) message rounds + time to make reverse communicator]
2. Each processor calls MPI_GROUP_RANGE_INCL and MPI_COMM_CREATE
at most twice
[Unknown complexity]
All the groups/communicators I am creating are stride-1 ranges of
contiguous processors from MPI_COMM_WORLD. Some of them could overlap
by one processor, hence the possible need to call MPI_COMM_SPLIT or
MPI_COMM_CREATE twice per processor.
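Concretely, for a contiguous block of ranks [lo, hi] I would expect each
process to do something like the following; lo and hi are placeholders
for values computed from the scans:

#include <mpi.h>

/* Build a communicator over the contiguous rank range [lo, hi] of
   MPI_COMM_WORLD. Collective over MPI_COMM_WORLD: every process must
   call it, and processes outside the range get MPI_COMM_NULL back. */
static MPI_Comm make_block_comm(int lo, int hi)
{
    int my_rank, range[1][3];
    MPI_Group world_group, block_group = MPI_GROUP_EMPTY;
    MPI_Comm block_comm;

    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);
    if (lo <= my_rank && my_rank <= hi) {
        range[0][0] = lo;   /* first rank */
        range[0][1] = hi;   /* last rank  */
        range[0][2] = 1;    /* stride 1   */
        MPI_Group_range_incl(world_group, 1, range, &block_group);
    }
    MPI_Comm_create(MPI_COMM_WORLD, block_group, &block_comm);

    if (block_group != MPI_GROUP_EMPTY)
        MPI_Group_free(&block_group);
    MPI_Group_free(&world_group);
    return block_comm;
}

Because of the one-processor overlaps, I would expect to call this twice:
once for the "even" blocks and once for the "odd" ones, with processes
that have no block in a given round passing an empty range (lo > hi) so
that the collective call still matches up everywhere.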
Option A looks easier to code, but I wonder whether it will scale as
well, because I am not sure about the complexity of MPI_COMM_SPLIT. What
are the parallel message complexities of MPI_COMM_SPLIT and
MPI_COMM_CREATE? I poked around the web but could not find much on this
topic.
For option B, I will need to make a communicator that has the same
processes as MPI_COMM_WORLD, but in reverse order. This looks like it
can be done easily with MPI_GROUP_RANGE_INCL with a stride of -1, but
again I am not sure how much communication is required to set up the
communicator -- I would guess O(log P) rounds of messages.
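In other words, I am picturing something like this toy program for the
reversed communicator; what I cannot judge is how much communication the
MPI_COMM_CREATE call itself entails:

#include <mpi.h>
#include <stdio.h>

/* Build a communicator containing every process of MPI_COMM_WORLD,
   but with the rank order reversed, via MPI_Group_range_incl with a
   stride of -1. */
int main(int argc, char **argv)
{
    int rank, size, rev_rank, range[1][3];
    MPI_Group world_group, rev_group;
    MPI_Comm rev_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Comm_group(MPI_COMM_WORLD, &world_group);
    range[0][0] = size - 1;  /* first rank in the new order */
    range[0][1] = 0;         /* last rank                   */
    range[0][2] = -1;        /* negative stride => reversed */
    MPI_Group_range_incl(world_group, 1, range, &rev_group);

    /* Collective over MPI_COMM_WORLD; every process passes the same group. */
    MPI_Comm_create(MPI_COMM_WORLD, rev_group, &rev_comm);

    MPI_Comm_rank(rev_comm, &rev_rank);
    printf("world rank %d -> reversed rank %d\n", rank, rev_rank);

    MPI_Group_free(&rev_group);
    MPI_Group_free(&world_group);
    MPI_Comm_free(&rev_comm);
    MPI_Finalize();
    return 0;
}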
Any advice or explanation you can offer would be much appreciated.
Professor Jonathan Eckstein
Rutgers University