Re: [OMPI users] UDP like messaging with MPI

2011-11-21 Thread Jeremiah Willcock

On Mon, 21 Nov 2011, Mudassar Majeed wrote:


Thank you for your answer. Actually, I used the term UDP to describe 
connectionless messaging. TCP creates a connection between the two 
communicating parties, but in UDP a message can be sent to any IP/port where a 
process/thread is listening, and if that process is busy doing something else, 
all received messages are queued for it; whenever it calls the recv function, 
one message is taken from the queue.


That is how MPI message matching works; messages sit in a queue until you 
call MPI_Irecv (or MPI_Recv or MPI_Probe, etc.) to get them.  Unlike UDP, 
MPI messages do not need to complete on the sender until they are 
received, so you will probably need to use MPI_Isend to avoid deadlocks.



I am implementing a distributed algorithm that will provide 
communication-sensitive load balancing for computational loads. For example, 
suppose we have 10 nodes, each containing 10 cores (100 cores in total). When 
the MPI application starts with, say, 1000 processes (more than one process 
per core), I will run my distributed algorithm MPI_Balance (sorry for the MPI_ 
prefix, as it is not part of MPI, but I am trying to make it part of MPI ;) ). 
That algorithm will place processes that communicate heavily on the same node, 
while keeping the computational load balanced across that node's 10 cores.

So that was a little bit of explanation. For that, my distributed algorithm 
requires that some processes communicate with each other to collaborate on 
something. So I need the kind of messaging I explained above. It is a kind of 
UDP messaging: no connection before sending a message, the message is always 
queued on the receiver's side, and the sender is not blocked; it just sends 
the message, and the receiver takes it when it gets free from its other work.


One difficulty in doing this is managing the MPI requests from the sends and 
polling them with MPI_Test periodically.  You can keep the requests in an 
array (std::vector in C++) that is grown as needed: to send a message, call 
MPI_Isend and put the request into the array, and periodically call 
MPI_Testany or MPI_Testsome on the array to find completed requests.  Note 
that you will need to keep the data being sent intact in its buffer until the 
request completes.  Here's a naive version that does extra copies and doesn't 
clean out its arrays of requests or buffers:


#include <mpi.h>
#include <deque>
#include <vector>
using namespace std;

class message_send_engine {
  vector<MPI_Request> requests;
  deque<vector<char> > buffers; // deque: growing it never moves buffers that
                                // are still being sent

  public:
  void send(const void* buf, int byte_len, int dest, int tag) {
    size_t buf_num = buffers.size();
    buffers.resize(buf_num + 1);
    buffers[buf_num].assign((const char*)buf, (const char*)buf + byte_len);
    requests.resize(buf_num + 1);
    MPI_Isend(&buffers[buf_num][0], byte_len, MPI_BYTE, dest, tag,
              MPI_COMM_WORLD, &requests[buf_num]);
  }

  void poll() { // Call this periodically
    while (!requests.empty()) {
      int index, flag;
      MPI_Testany((int)requests.size(), &requests[0], &index, &flag,
                  MPI_STATUS_IGNORE);
      if (flag && index != MPI_UNDEFINED) {
        buffers[index].clear(); // Free the message data; the completed request
                                // is now MPI_REQUEST_NULL and Testany skips it
      } else {
        break;
      }
    }
  }
};

bool test_for_message(MPI_Status& st) {
  int flag;
  MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &st);
  return (flag != 0);
}

If test_for_message returns true, you can then use MPI_Recv to get the 
message.


I have tried combinations of MPI_Send, MPI_Recv, MPI_Iprobe, MPI_Isend, 
MPI_Irecv, MPI_Test, etc., but I am not getting the behavior I am looking 
for. I think MPI should provide this as well; maybe it is just not in my 
knowledge. That's why I am asking the experts. I am still looking for it :(


-- Jeremiah Willcock


Re: [OMPI users] UDP like messaging with MPI

2011-11-21 Thread Mudassar Majeed
Thank you for your answer. Actually, I used the term UDP to describe 
connectionless messaging. TCP creates a connection between the two 
communicating parties, but in UDP a message can be sent to any IP/port where a 
process/thread is listening, and if that process is busy doing something else, 
all received messages are queued for it; whenever it calls the recv function, 
one message is taken from the queue. 

I am implementing a distributed algorithm that will provide 
communication-sensitive load balancing for computational loads. For example, 
suppose we have 10 nodes, each containing 10 cores (100 cores in total). When 
the MPI application starts with, say, 1000 processes (more than one process 
per core), I will run my distributed algorithm MPI_Balance (sorry for the MPI_ 
prefix, as it is not part of MPI, but I am trying to make it part of MPI ;) ). 
That algorithm will place processes that communicate heavily on the same node, 
while keeping the computational load balanced across that node's 10 cores. 

So that was a little bit of explanation. For that, my distributed algorithm 
requires that some processes communicate with each other to collaborate on 
something. So I need the kind of messaging I explained above. It is a kind of 
UDP messaging: no connection before sending a message, the message is always 
queued on the receiver's side, and the sender is not blocked; it just sends 
the message, and the receiver takes it when it gets free from its other work. 

I have tried combinations of MPI_Send, MPI_Recv, MPI_Iprobe, MPI_Isend, 
MPI_Irecv, MPI_Test, etc., but I am not getting the behavior I am looking 
for. I think MPI should provide this as well; maybe it is just not in my 
knowledge. That's why I am asking the experts. I am still looking for it :(

thanks and regards,
Mudassar Majeed
PhD Student
Linkoping University
PhD Topic: Parallel Computing (Optimal composition of parallel programs and 
runtime support).






 From: Jeff Squyres <jsquy...@cisco.com>
To: mudassar...@yahoo.com; Open MPI Users <us...@open-mpi.org> 
Cc: "li...@razik.name" <li...@razik.name> 
Sent: Monday, November 21, 2011 6:07 PM
Subject: Re: [OMPI users] UDP like messaging with MPI
 
MPI defines only reliable communications -- it's not quite the same thing as 
UDP.  

Hence, if you send something, it is guaranteed to be able to be received.  UDP 
may drop packets whenever it feels like it (e.g., when it is out of resources).

Most MPI implementations will do some form of buffering of unexpected 
receives.  So if process A sends message X to process B and B hasn't yet 
posted a matching receive for X, B will likely silently accept the message 
under the covers and buffer it (or at least buffer part of it).  Hence, when 
you finally post the matching receive for X in B, whatever part of X was 
already received will already be there, but B may need to send a 
clear-to-send (CTS) to A to get the rest of the message.

Specifically: if X is "short", A may eagerly send the whole message to B.  If X 
is "long", A may send only the first part of X and wait for a CTS before 
sending the rest of it.

MPI implementations typically do this in order to conserve buffer space -- 
i.e., if A sends a 10MB message, there's no point in buffering it at B until 
the matching receive is made and the message can be received directly into the 
destination 10MB buffer that B has made available.  If B accepted the 10MB X 
early, it would cost an additional 10MB to buffer it.  Ick.

Alternatively, what I think Lukas was trying to suggest was that you can post 
non-blocking receives and simply test for completion later.  This allows MPI to 
receive straight into the target buffer without intermediate copies or 
additional buffers.  Then you can just check to see when the receive(s) is(are) 
done.


On Nov 19, 2011, at 10:47 AM, Mudassar Majeed wrote:

> I know about these functions; they have special requirements, like the 
> MPI_Irecv call that should be made in every process. My processes should not 
> look for messages or implicitly receive them. But messages should go into 
> their msg queues and be retrieved when needed. Just like UDP communication.
> 
> Regards
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


Re: [OMPI users] UDP like messaging with MPI

2011-11-19 Thread Lukas Razik
Hi!

>I know about these functions; they have special requirements, like the 
>MPI_Irecv call that should be made in every process. My processes should not 
>look for messages or implicitly receive them.

I understand. But then I think your UDP comparison is wrong - whatever... :)



> But messages should go into their msg queues and be retrieved when needed.

I don't know whether MPI has something like message queues. If not, I would 
use "MPI RMA" (i.e. MPI Remote Memory Access, from MPI-2) and implement a 
message queue like this:

1. On the node which shall get the messages (I call it G), I would create a 
buffer whose size is big enough to store sufficiently many messages from the 
processes which shall write into the message queue.
2. With the collective operation MPI_Win_create(), I would make G's buffer 
available to all participating nodes.
3. Now the sender nodes can put data into the window buffer with MPI_Put().
Here the problem is that they must know which part of the buffer has already 
been overwritten with messages from other senders.
Therefore I would use an integer value at the beginning of the window buffer. 
This value holds the next free position in the window buffer.
In other words, each sender who wants to save a message in the window buffer
 a) reads the integer value from the buffer
 b) saves its message at the free place in the buffer
 c) increments the integer value by the size of the written message
4. To avoid race conditions, you must lock the accesses to the window with 
MPI_Win_lock() and MPI_Win_unlock().
BTW: There are three types of synchronization calls:
 a) MPI_Win_fence()
 b) MPI_Win_start(), MPI_Win_complete(), MPI_Win_post(), MPI_Win_wait()
 c) MPI_Win_lock(), MPI_Win_unlock()
I think the right one here is c), because then the target process of your 
MPI_Put() (i.e. G) doesn't need to be involved in the communication. A 
communication which is synchronized by method c) is therefore called a 
"passive target communication". Have a look into the MPI-2 standard if you 
don't know what that means.

5. When G wants to "receive" from the window buffer, it must also call the 
MPI_Win_lock() & MPI_Win_unlock() operations. Then it reads one or more 
messages from the window buffer. After that it must decrement the integer 
value by the size of the read message(s).


Hints:
This would be a LIFO message queue. If you want a FIFO queue then implement a 
"ring buffer". Therefore you could use two integer values at the beginning of 
the buffer which show the head and the tail of the queue. Maybe there's also an 
more efficient way but that's an idea I have.



> Just like udp communication.


To the best of my knowledge, "normal" UDP sockets have no receive queues. So 
if there's no receiver waiting for an incoming UDP datagram, it is discarded, 
isn't it? (Maybe asynchronous UDP sockets have queues...)


Regards,
Lukas



