Here are the communication operations occurring in the best-case
scenario in Open MPI right now:
Comm_create:
- Communicator ID allocation: 2 Allreduce operations per round of
negotiations
- 1 Allreduce operation for 'activating' the communicator
Comm_split:
- 1 Allgather operation for collecting all color/key values
- Communicator ID allocation: 2 Allreduce operations per round of
negotiations
- 1 Allreduce operation for 'activating' the communicator
As the description above suggests, you might need more than one round
for the communicator ID allocation, depending on the history of the
application and which IDs have already been used.
The details of how these operations are implemented can vary. We could,
however, assume a binary tree for both the reduce and the broadcast
portion of the Allreduce operation, each being O(log P). For the
Allgather we could assume a combination of a linear gather (O(P)) and a
binary tree broadcast (O(log P)).
So as of today, Comm_split is more expensive than Comm_create.
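Back-of-the-envelope, for a single negotiation round: Comm_create is
roughly 3 Allreduces, i.e. on the order of 6 log P message rounds,
while Comm_split pays the same Allreduces plus the O(P) linear gather
on top. If you would rather measure than trust the model, a user-level
sketch along the following lines should do. It only times the two calls
from the outside (it is not Open MPI internals), and the half/half
grouping is arbitrary:

#include <mpi.h>
#include <stdio.h>

/* Timing sketch: compare MPI_Comm_split against MPI_Comm_create when
   both build the same two communicators (lower half / upper half of
   MPI_COMM_WORLD). */
int main(int argc, char **argv)
{
    int rank, size, range[1][3];
    double t0, t_split, t_create;
    MPI_Group world_group, half_group;
    MPI_Comm split_comm, create_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Variant 1: Comm_split by color. */
    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    MPI_Comm_split(MPI_COMM_WORLD, rank < size / 2 ? 0 : 1, rank, &split_comm);
    t_split = MPI_Wtime() - t0;

    /* Variant 2: Comm_create from an explicit group range. */
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);
    range[0][0] = (rank < size / 2) ? 0 : size / 2;            /* first  */
    range[0][1] = (rank < size / 2) ? size / 2 - 1 : size - 1;  /* last   */
    range[0][2] = 1;                                            /* stride */
    MPI_Group_range_incl(world_group, 1, range, &half_group);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    MPI_Comm_create(MPI_COMM_WORLD, half_group, &create_comm);
    t_create = MPI_Wtime() - t0;

    if (rank == 0)
        printf("P = %d: Comm_split %.6f s, Comm_create %.6f s\n",
               size, t_split, t_create);

    MPI_Group_free(&half_group);
    MPI_Group_free(&world_group);
    MPI_Comm_free(&split_comm);
    MPI_Comm_free(&create_comm);
    MPI_Finalize();
    return 0;
}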
Thanks
Edgar
On 1/19/2015 4:13 PM, Jonathan Eckstein wrote:
Dear Open MPIers:
I have been using MPI for many years, most recently Open MPI. But I
have just encountered the first situation in which it will be helpful to
create communicators (for an unstructured sparse matrix algorithm).
I have identified two ways I could create the communicators I need.
Where P denotes the number of MPI processors, Option A is:
1. Exchange of messages between processors of adjacent rank
[O(1) message rounds (one up, one down)]
2. One scan operation
[O(log P) message rounds]
3. One or two calls to MPI_COMM_SPLIT
[Unknown complexity]
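In code form, Option A would look roughly like the following. The
payload exchanged with the neighbours (just the rank here) and the color
rule (blocks of 4 consecutive ranks) are placeholders; the real ones
would come from my sparse matrix structure, and only the MPI skeleton
matters:

#include <mpi.h>
#include <stdio.h>

/* Option A skeleton: one shift up, one shift down, one scan, then
   MPI_Comm_split; the cost of the final call is the open question. */
int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int up   = (rank + 1 < nprocs) ? rank + 1 : MPI_PROC_NULL;
    int down = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
    int mine = rank, from_below = -1, from_above = -1;

    /* Step 1: O(1) message rounds, one shift up and one shift down. */
    MPI_Sendrecv(&mine, 1, MPI_INT, up, 0,
                 &from_below, 1, MPI_INT, down, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&mine, 1, MPI_INT, down, 0,
                 &from_above, 1, MPI_INT, up, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* Step 2: one scan, O(log P) message rounds. */
    int one = 1, prefix = 0;
    MPI_Scan(&one, &prefix, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    /* Step 3: derive a color locally and split. */
    int color = (prefix - 1) / 4;
    MPI_Comm block_comm;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &block_comm);

    printf("rank %d: below=%d above=%d color=%d\n",
           rank, from_below, from_above, color);

    MPI_Comm_free(&block_comm);
    MPI_Finalize();
    return 0;
}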
Option B is:
1. Three scan operations (one in reverse direction)
[O(log P) message rounds + time to make reverse communicator]
2. Each processor calls MPI_GROUP_RANGE_INCL and MPI_COMM_CREATE
at most twice
[Unknown complexity]
All the groups/communicators I am creating are stride-1 ranges of
contiguous processors from MPI_COMM_WORLD. Some of them could overlap
by one processor, hence the possible need to call MPI_COMM_SPLIT or
MPI_COMM_CREATE twice per processor.
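Concretely, for a contiguous block of ranks [lo, hi] I would expect each
process to do something like the following; lo and hi are placeholders
for values computed from the scans:

#include <mpi.h>

/* Build a communicator over the contiguous rank range [lo, hi] of
   MPI_COMM_WORLD. Collective over MPI_COMM_WORLD: every process must
   call it, and processes outside the range get MPI_COMM_NULL back. */
static MPI_Comm make_block_comm(int lo, int hi)
{
    int my_rank, range[1][3];
    MPI_Group world_group, block_group = MPI_GROUP_EMPTY;
    MPI_Comm block_comm;

    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);
    if (lo <= my_rank && my_rank <= hi) {
        range[0][0] = lo;   /* first rank */
        range[0][1] = hi;   /* last rank  */
        range[0][2] = 1;    /* stride 1   */
        MPI_Group_range_incl(world_group, 1, range, &block_group);
    }
    MPI_Comm_create(MPI_COMM_WORLD, block_group, &block_comm);

    if (block_group != MPI_GROUP_EMPTY)
        MPI_Group_free(&block_group);
    MPI_Group_free(&world_group);
    return block_comm;
}

Because of the one-processor overlaps, I would expect to call this twice:
once for the "even" blocks and once for the "odd" ones, with processes
that have no block in a given round passing an empty range (lo > hi) so
that the collective call still matches up everywhere.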
Option A looks easier to code, but I wonder whether it will scale as
well, because I am not sure about the complexity of MPI_COMM_SPLIT. What
are the parallel message complexities of MPI_COMM_SPLIT and
MPI_COMM_CREATE? I poked around the web but could not find much on this
topic.
For option B, I will need to make a communicator that has the same
processes as MPI_COMM_WORLD, but in reverse order. This looks like it
can be done easily with MPI_GROUP_RANGE_INCL with a stride of -1, but
again I am not sure how much communication is required to set up the
communicator -- I would guess O(log P) rounds of messages.
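In other words, I am picturing something like this toy program for the
reversed communicator; what I cannot judge is how much communication the
MPI_COMM_CREATE call itself entails:

#include <mpi.h>
#include <stdio.h>

/* Build a communicator containing every process of MPI_COMM_WORLD,
   but with the rank order reversed, via MPI_Group_range_incl with a
   stride of -1. */
int main(int argc, char **argv)
{
    int rank, size, rev_rank, range[1][3];
    MPI_Group world_group, rev_group;
    MPI_Comm rev_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Comm_group(MPI_COMM_WORLD, &world_group);
    range[0][0] = size - 1;  /* first rank in the new order */
    range[0][1] = 0;         /* last rank                   */
    range[0][2] = -1;        /* negative stride => reversed */
    MPI_Group_range_incl(world_group, 1, range, &rev_group);

    /* Collective over MPI_COMM_WORLD; every process passes the same group. */
    MPI_Comm_create(MPI_COMM_WORLD, rev_group, &rev_comm);

    MPI_Comm_rank(rev_comm, &rev_rank);
    printf("world rank %d -> reversed rank %d\n", rank, rev_rank);

    MPI_Group_free(&rev_group);
    MPI_Group_free(&world_group);
    MPI_Comm_free(&rev_comm);
    MPI_Finalize();
    return 0;
}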
Any advice or explanation you can offer would be much appreciated.
Professor Jonathan Eckstein
Rutgers University