Thanks very much, Edgar. I will definitely go with MPI_Comm_create, then. I will need two extra scans to get set up for those operations, but that sounds much better than an Allgather (on reflection, I had guessed that an Allgather would be used to organize the color keys).

I will be calling MPI_Comm_create simultaneously across all processors in MPI_COMM_WORLD, but with the group arguments set up in such a way that I make multiple disjoint sub-communicators in the process. I'd appreciate it if you'd let me know if that's not OK to do.
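
In case it helps to make the question concrete, here is roughly the pattern
I have in mind (a minimal sketch; the disjoint blocks of 4 consecutive ranks
are just a stand-in for my real partition):

#include <mpi.h>

/* Every rank calls MPI_Comm_create once, passing only the group that
 * contains it; one collective call yields one disjoint sub-communicator
 * per block. */
int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Group world, mine;
    MPI_Comm_group(MPI_COMM_WORLD, &world);

    /* Hypothetical layout: block of 4 ranks containing this rank. */
    int lo = (rank / 4) * 4;
    int hi = (lo + 3 < size - 1) ? lo + 3 : size - 1;
    int range[1][3] = {{ lo, hi, 1 }};
    MPI_Group_range_incl(world, 1, range, &mine);

    MPI_Comm sub;
    MPI_Comm_create(MPI_COMM_WORLD, mine, &sub);

    MPI_Comm_free(&sub);
    MPI_Group_free(&mine);
    MPI_Group_free(&world);
    MPI_Finalize();
    return 0;
}

My understanding is that MPI-2.2 and later explicitly permit different
processes to pass different groups here, as long as the non-empty groups
are disjoint.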

  Thanks again!

      Jonathan


On 1/20/2015 3:22 PM, Edgar Gabriel wrote:
Here are the communication operations occurring in the best case
scenario in Open MPI right now:

Comm_create:
   - Communicator ID allocation: 2 Allreduce operations per round of
     negotiations
   - 1 Allreduce operation for 'activating' the communicator

Comm_split:
   - 1 Allgather operation for collecting all color keys
   - Communicator ID allocation: 2 Allreduce operations per round of
     negotiations
   - 1 Allreduce operation for 'activating' the communicator

As the description above suggests, you might need more than one round
for the communicator ID allocation, depending on the history of the
application and which IDs have already been used.

The details of how the operations are implemented can vary. We could,
however, assume a binary tree for both the reduce and the broadcast
portions of the Allreduce operation, each being O(log P). For the
Allgather we could assume a combination of a linear gather (O(P)) and a
binary tree broadcast (O(log P)).

So as of today, Comm_split is more expensive than Comm_create.
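
For concreteness, here is a minimal user-level sketch of the two calls
being costed above (the half/half partition of MPI_COMM_WORLD is just an
example):

#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int lower = (rank < size / 2);

    /* Comm_split: pays the Allgather of (color, key) pairs on top of
     * the communicator ID allocation and activation. */
    MPI_Comm split_comm;
    MPI_Comm_split(MPI_COMM_WORLD, lower ? 0 : 1, rank, &split_comm);

    /* Comm_create: the caller supplies the group explicitly, so only
     * the ID allocation and activation remain. */
    int range[1][3] = {{ lower ? 0 : size / 2,
                         lower ? size / 2 - 1 : size - 1, 1 }};
    MPI_Group world, half;
    MPI_Comm_group(MPI_COMM_WORLD, &world);
    MPI_Group_range_incl(world, 1, range, &half);
    MPI_Comm create_comm;
    MPI_Comm_create(MPI_COMM_WORLD, half, &create_comm);

    MPI_Group_free(&half);
    MPI_Group_free(&world);
    MPI_Comm_free(&create_comm);
    MPI_Comm_free(&split_comm);
    MPI_Finalize();
    return 0;
}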

Thanks
Edgar

On 1/19/2015 4:13 PM, Jonathan Eckstein wrote:
Dear Open MPIers:

I have been using MPI for many years, most recently Open MPI.  But I
have just encountered the first situation in which it will be helpful to
create communicators (for an unstructured sparse matrix algorithm).

I have identified two ways I could create the communicators I need.
Where P denotes the number of MPI processors, Option A is:
   1.  Exchange of messages between processors of adjacent rank
       [O(1) message rounds (one up, one down)]
   2.  One scan operation
       [O(log P) message rounds]
   3.  One or two calls to MPI_COMM_SPLIT
       [Unknown complexity; see the sketch after this list]
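
Here is a sketch of step 3 under a hypothetical layout: blocks covering
ranks [4i, 4i+4], so adjacent blocks overlap by one rank. Since each rank
can supply only one color per call, the even- and odd-indexed blocks are
built in two separate MPI_COMM_SPLIT calls:

#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int b  = rank / 4;                                  /* primary block  */
    int b2 = (rank % 4 == 0 && rank > 0) ? b - 1 : -1;  /* boundary block */

    /* Both calls are collective over MPI_COMM_WORLD; a rank with no
     * block of the requested parity passes MPI_UNDEFINED and gets
     * MPI_COMM_NULL back. */
    int even_color = (b % 2 == 0) ? b
                   : (b2 >= 0 && b2 % 2 == 0) ? b2 : MPI_UNDEFINED;
    int odd_color  = (b % 2 == 1) ? b
                   : (b2 >= 0 && b2 % 2 == 1) ? b2 : MPI_UNDEFINED;

    MPI_Comm even_comm, odd_comm;
    MPI_Comm_split(MPI_COMM_WORLD, even_color, rank, &even_comm);
    MPI_Comm_split(MPI_COMM_WORLD, odd_color,  rank, &odd_comm);

    if (even_comm != MPI_COMM_NULL) MPI_Comm_free(&even_comm);
    if (odd_comm  != MPI_COMM_NULL) MPI_Comm_free(&odd_comm);
    MPI_Finalize();
    return 0;
}
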
Option B is:
   1.  Three scan operations (one in reverse direction)
       [O(log P) message rounds + time to make reverse communicator]
   2.  Each processor calls MPI_GROUP_RANGE_INCL and MPI_COMM_CREATE
       at most twice
       [Unknown complexity; see the second sketch below]
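
And a sketch of Option B's step 2 on the same hypothetical layout; here a
rank belonging to no block of a given round passes MPI_GROUP_EMPTY and
receives MPI_COMM_NULL:

#include <mpi.h>

/* Build the group for block i: ranks 4i .. min(4i+4, size-1). */
static MPI_Group block_group(MPI_Group world, int block, int last)
{
    int hi = 4 * block + 4;
    int range[1][3] = {{ 4 * block, (hi < last) ? hi : last, 1 }};
    MPI_Group g;
    MPI_Group_range_incl(world, 1, range, &g);
    return g;
}

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Group world;
    MPI_Comm_group(MPI_COMM_WORLD, &world);

    int b  = rank / 4;                                  /* primary block  */
    int b2 = (rank % 4 == 0 && rank > 0) ? b - 1 : -1;  /* boundary block */

    for (int parity = 0; parity < 2; ++parity) {        /* at most twice  */
        int mine = (b % 2 == parity) ? b
                 : (b2 >= 0 && b2 % 2 == parity) ? b2 : -1;
        MPI_Group g = (mine >= 0) ? block_group(world, mine, size - 1)
                                  : MPI_GROUP_EMPTY;
        MPI_Comm c;
        MPI_Comm_create(MPI_COMM_WORLD, g, &c);
        /* ... use c for the block's collective operations ... */
        if (g != MPI_GROUP_EMPTY) MPI_Group_free(&g);
        if (c != MPI_COMM_NULL)   MPI_Comm_free(&c);
    }

    MPI_Group_free(&world);
    MPI_Finalize();
    return 0;
}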

All the groups/communicators I am creating are stride-1 ranges of
contiguous processors from MPI_COMM_WORLD.  Some of them could overlap
by one processor, hence the possible need to call MPI_COMM_SPLIT or
MPI_COMM_CREATE twice per processor.

Option A looks easier to code, but I wonder whether it will scale as
well, because I am not sure about the complexity of MPI_COMM_SPLIT. What
are the parallel message complexities of MPI_COMM_SPLIT and
MPI_COMM_CREATE?  I poked around the web but could not find much on this
topic.

For option B, I will need to make a communicator that has the same
processes as MPI_COMM_WORLD, but in reverse order.  This looks like it
can be done easily with MPI_GROUP_RANGE_INCL with a stride of -1, but
again I am not sure how much communication is required to set up the
communicator -- I would guess O(log P) rounds of messages.
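
To make that concrete, here is my guess at the construction (the group
setup itself is purely local, so whatever communication there is should
come from MPI_COMM_CREATE):

#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, rev_rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* One (first, last, stride) triplet: size-1 down to 0, stride -1. */
    MPI_Group world, reversed;
    MPI_Comm_group(MPI_COMM_WORLD, &world);
    int range[1][3] = {{ size - 1, 0, -1 }};
    MPI_Group_range_incl(world, 1, range, &reversed);

    /* Every rank passes the same full, reordered group. */
    MPI_Comm rev_comm;
    MPI_Comm_create(MPI_COMM_WORLD, reversed, &rev_comm);
    MPI_Comm_rank(rev_comm, &rev_rank);  /* rev_rank == size - 1 - rank */

    MPI_Comm_free(&rev_comm);
    MPI_Group_free(&reversed);
    MPI_Group_free(&world);
    MPI_Finalize();
    return 0;
}

An MPI_Scan over rev_comm then runs in the opposite direction from one
over MPI_COMM_WORLD.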

Any advice or explanation you can offer would be much appreciated.

    Professor Jonathan Eckstein
    Rutgers University

