Re: [OMPI users] UDP like messaging with MPI
On Mon, 21 Nov 2011, Mudassar Majeed wrote:

> Thank you for your answer. Actually, I used the term UDP to describe
> connectionless messaging. TCP creates a connection between the two
> communicating parties, but in UDP a message can be sent to any IP/port
> where a process/thread is listening; if the process is busy doing
> something, received messages are queued for it, and whenever it calls the
> recv function one message is taken from the queue.

That is how MPI message matching works: messages sit in a queue until you call MPI_Irecv (or MPI_Recv, MPI_Probe, etc.) to get them. Unlike UDP, an MPI send is not guaranteed to complete on the sender until the message is received, so you will probably need to use MPI_Isend to avoid deadlocks.

> I am implementing a distributed algorithm that will provide
> communication-sensitive load balancing for computational loads. For
> example, suppose we have 10 nodes, each containing 10 cores (100 cores in
> total). When the MPI application starts (say with 1000 processes, i.e.
> more than 1 process per core), I will run my distributed algorithm
> MPI_Balance (sorry for using the MPI_ prefix; it is not part of MPI, but I
> am trying to make it part of MPI ;) ). The algorithm will place processes
> that communicate more onto the same node, while keeping the computational
> load on the 10 cores of that node balanced. So that was a little bit of
> explanation. For that, my distributed algorithm requires that some
> processes communicate with each other to collaborate on something, so I
> need the kind of messaging I explained above. It is a kind of UDP
> messaging: no connection before sending a message, the message is always
> queued on the receiver's side, and the sender is not blocked; it just
> sends the message and the receiver takes it when it gets free from its
> other work.

The one difficulty in doing this is managing the MPI requests from the sends and polling them with MPI_Test periodically.
You can just keep the requests in an array (std::vector in C++) which can be expanded when needed; to send a message, call MPI_Isend and put the request into the array, and periodically call MPI_Testany or MPI_Testsome on the array to find completed requests. Note that you will need to keep the data being sent intact in its buffer until the request completes. Here's a naive version that does extra copies and doesn't clean out its arrays of requests or buffers:

  class message_send_engine {
    std::vector<MPI_Request> requests;
    std::vector<std::vector<char> > buffers;

  public:
    void send(void* buf, int byte_len, int dest, int tag) {
      size_t buf_num = buffers.size();
      buffers.resize(buf_num + 1);
      buffers[buf_num].assign((char*)buf, (char*)buf + byte_len);
      requests.resize(buf_num + 1);
      MPI_Isend(&buffers[buf_num][0], byte_len, MPI_BYTE, dest, tag,
                MPI_COMM_WORLD, &requests[buf_num]);
    }

    void poll() { // Call this periodically
      while (true) {
        int index, flag;
        MPI_Testany((int)requests.size(), &requests[0], &index, &flag,
                    MPI_STATUS_IGNORE);
        if (flag && index != MPI_UNDEFINED) {
          buffers[index].clear(); // Free memory
        } else {
          break;
        }
      }
    }
  };

  bool test_for_message(MPI_Status& st) {
    int flag;
    MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &st);
    return (flag != 0);
  }

If test_for_message returns true, you can then use MPI_Recv to get the message.

> I have tried to use a combination of MPI_Send, MPI_Recv, MPI_Iprobe,
> MPI_Isend, MPI_Irecv, MPI_Test, etc., but I am not getting the thing that
> I am looking for. I think MPI should also provide this. Maybe it is not in
> my knowledge. That's why I am asking the experts. I am still looking for
> it :(

-- Jeremiah Willcock
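[Editor's note: the queueing and matching behavior described above can be modeled without MPI at all. The sketch below is a toy illustration of per-(source, tag) message matching, not Open MPI's actual internals; the class and method names are invented for this example.]

```cpp
#include <deque>
#include <map>
#include <string>
#include <utility>

// Toy model of an MPI-style message queue: "sent" messages are queued at
// the receiver and matched later by (source, tag), the way MPI_Recv and
// MPI_Iprobe match messages. Real MPI internals are far more involved.
class toy_match_queue {
  std::map<std::pair<int, int>, std::deque<std::string> > queues;

public:
  // A "send": the message is queued at the receiver, sender not blocked.
  void deliver(int source, int tag, const std::string& payload) {
    queues[std::make_pair(source, tag)].push_back(payload);
  }

  // Like MPI_Iprobe: is a matching message waiting?
  bool probe(int source, int tag) const {
    std::map<std::pair<int, int>, std::deque<std::string> >::const_iterator
        it = queues.find(std::make_pair(source, tag));
    return it != queues.end() && !it->second.empty();
  }

  // Like MPI_Recv after a successful probe: pop the oldest match.
  std::string receive(int source, int tag) {
    std::deque<std::string>& q = queues[std::make_pair(source, tag)];
    std::string msg = q.front();
    q.pop_front();
    return msg;
  }
};
```

The point of the model: the receiver never has to be "looking" at the moment of the send; messages wait in the queue until probed and received, which is the UDP-like behavior being asked for.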
Re: [OMPI users] UDP like messaging with MPI
Thank you for your answer. Actually, I used the term UDP to describe connectionless messaging. TCP creates a connection between the two communicating parties, but in UDP a message can be sent to any IP/port where a process/thread is listening; if the process is busy doing something, received messages are queued for it, and whenever it calls the recv function one message is taken from the queue.

I am implementing a distributed algorithm that will provide communication-sensitive load balancing for computational loads. For example, suppose we have 10 nodes, each containing 10 cores (100 cores in total). When the MPI application starts (say with 1000 processes, i.e. more than 1 process per core), I will run my distributed algorithm MPI_Balance (sorry for using the MPI_ prefix; it is not part of MPI, but I am trying to make it part of MPI ;) ). The algorithm will place processes that communicate more onto the same node, while keeping the computational load on the 10 cores of that node balanced. So that was a little bit of explanation.

For that, my distributed algorithm requires that some processes communicate with each other to collaborate on something, so I need the kind of messaging I explained above. It is a kind of UDP messaging: no connection before sending a message, the message is always queued on the receiver's side, and the sender is not blocked; it just sends the message and the receiver takes it when it gets free from its other work.

I have tried to use a combination of MPI_Send, MPI_Recv, MPI_Iprobe, MPI_Isend, MPI_Irecv, MPI_Test, etc., but I am not getting the thing that I am looking for. I think MPI should also provide this. Maybe it is not in my knowledge. That's why I am asking the experts. I am still looking for it :(

thanks and regards,
Mudassar Majeed
PhD Student, Linkoping University
PhD Topic: Parallel Computing (Optimal composition of parallel programs and runtime support).
From: Jeff Squyres <jsquy...@cisco.com>
To: mudassar...@yahoo.com; Open MPI Users <us...@open-mpi.org>
Cc: "li...@razik.name" <li...@razik.name>
Sent: Monday, November 21, 2011 6:07 PM
Subject: Re: [OMPI users] UDP like messaging with MPI

MPI defines only reliable communications -- it's not quite the same thing as UDP. Hence, if you send something, it is guaranteed to be able to be received. UDP may drop packets whenever it feels like it (e.g., when it is out of resources).

Most MPI implementations will do some form of buffering of unexpected receives. So if process A sends message X to process B, and B hasn't posted a matching receive for message X yet, B will likely silently accept the message under the covers and buffer it (or at least buffer part of it). Hence, when you finally post the matching receive for X in B, whatever part of X was already received will already be there, but B may need to send a clear-to-send to A to get the rest of the message.

Specifically: if X is "short", A may eagerly send the whole message to B. If X is "long", A may only send the first part of X and wait for a CTS before sending the rest of it. MPI implementations typically do this in order to conserve buffer space -- i.e., if A sends a 10MB message, there's no point in buffering it at B until the matching receive is made and the message can be received directly into the destination 10MB buffer that B has made available. If B accepted the 10MB X early, it would cost an additional 10MB to buffer it. Ick.

Alternatively, what I think Lukas was trying to suggest was that you can post non-blocking receives and simply test for completion later. This allows MPI to receive straight into the target buffer without intermediate copies or additional buffers. Then you can just check to see when the receive(s) is(are) done.

On Nov 19, 2011, at 10:47 AM, Mudassar Majeed wrote:

> I know about these functions; they have special requirements, like the
> MPI_Irecv call should be made in every process. My processes should not
> look for messages or implicitly receive them. But messages should go into
> their message queues and be retrieved when needed. Just like UDP
> communication.
>
> Regards
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] UDP like messaging with MPI
MPI defines only reliable communications -- it's not quite the same thing as UDP. Hence, if you send something, it is guaranteed to be able to be received. UDP may drop packets whenever it feels like it (e.g., when it is out of resources).

Most MPI implementations will do some form of buffering of unexpected receives. So if process A sends message X to process B, and B hasn't posted a matching receive for message X yet, B will likely silently accept the message under the covers and buffer it (or at least buffer part of it). Hence, when you finally post the matching receive for X in B, whatever part of X was already received will already be there, but B may need to send a clear-to-send to A to get the rest of the message.

Specifically: if X is "short", A may eagerly send the whole message to B. If X is "long", A may only send the first part of X and wait for a CTS before sending the rest of it. MPI implementations typically do this in order to conserve buffer space -- i.e., if A sends a 10MB message, there's no point in buffering it at B until the matching receive is made and the message can be received directly into the destination 10MB buffer that B has made available. If B accepted the 10MB X early, it would cost an additional 10MB to buffer it. Ick.

Alternatively, what I think Lukas was trying to suggest was that you can post non-blocking receives and simply test for completion later. This allows MPI to receive straight into the target buffer without intermediate copies or additional buffers. Then you can just check to see when the receive(s) is(are) done.

On Nov 19, 2011, at 10:47 AM, Mudassar Majeed wrote:

> I know about these functions; they have special requirements, like the
> MPI_Irecv call should be made in every process. My processes should not
> look for messages or implicitly receive them. But messages should go into
> their message queues and be retrieved when needed. Just like UDP
> communication.
> Regards
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
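[Editor's note: the eager-vs-rendezvous decision Jeff describes can be sketched abstractly. The following is a toy model only -- the threshold constant, struct, and function names are all made up for illustration and do not correspond to Open MPI's implementation.]

```cpp
#include <cstddef>
#include <string>

// Toy model of the eager/rendezvous protocol choice: a short message is
// buffered in full at the receiver ("eager"); a long message leaves only a
// first fragment there, and the rest is sent only after the matching
// receive posts and a clear-to-send (CTS) goes back to the sender.
const std::size_t EAGER_LIMIT = 64; // invented threshold, in bytes

struct pending_message {
  std::string buffered; // what the receiver has buffered so far
  bool needs_cts;       // true if the sender must wait for a CTS
};

// What the receiver holds right after the send arrives, before any
// matching receive has been posted.
pending_message arrival(const std::string& payload) {
  pending_message p;
  if (payload.size() <= EAGER_LIMIT) {
    p.buffered = payload; // eager: whole message buffered at the receiver
    p.needs_cts = false;
  } else {
    p.buffered = payload.substr(0, EAGER_LIMIT); // only the first fragment
    p.needs_cts = true; // remainder arrives after the CTS
  }
  return p;
}
```

This captures the trade-off in the paragraph above: the receiver never commits more than EAGER_LIMIT bytes of buffer space to a message it has not yet matched.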
Re: [OMPI users] UDP like messaging with MPI
Hi!

> I know about these functions; they have special requirements, like the
> MPI_Irecv call should be made in every process. My processes should not
> look for messages or implicitly receive them.

I understand. But then I think your UDP comparison is wrong - whatever... :)

> But messages should go into their msg queues and be retrieved when needed.

I don't know if MPI has something like message queues. If not, then I would use MPI RMA (i.e., MPI Remote Memory Access from MPI-2) and implement a message queue like this:

1. On the node which shall get the messages (I call it G), create a buffer big enough to store the messages from the processes which shall write into the message queue.

2. With the collective operation MPI_Win_create(), make the buffer of G available to all participating nodes.

3. Now the sender nodes can put data into the window buffer with MPI_Put(). The problem here is that they must know which part of the buffer has already been overwritten with messages from other senders. Therefore I would use an integer value at the beginning of the window buffer; this value gives the next free position in the buffer. In other words, each sender who wants to put a message into the window buffer
   a) reads the integer value from the buffer,
   b) saves its message at the free place in the buffer, and
   c) increments the integer value by the size of the written message.

4. To avoid race conditions you must protect accesses to the window with MPI_Win_lock() and MPI_Win_unlock(). BTW: there are three types of synchronization calls:
   a) MPI_Win_fence()
   b) MPI_Win_start(), MPI_Win_complete(), MPI_Win_post(), MPI_Win_wait()
   c) MPI_Win_lock(), MPI_Win_unlock()
   But I think the right one here is c), because then the target process of your MPI_Put() (i.e., G) doesn't need to be involved in the communication. A communication which is synchronized by method c) is therefore called "passive target communication".
Have a look into the MPI-2 standard if you don't know what that means.

5. When G wants to "receive" from the window buffer, it must also call MPI_Win_lock() and MPI_Win_unlock(). Then it reads one or more messages from the window buffer. After that it must decrement the integer value by the size of the read message(s).

Hints: this would be a LIFO message queue. If you want a FIFO queue, implement a "ring buffer" instead. For that you could use two integer values at the beginning of the buffer which give the head and the tail of the queue. Maybe there's also a more efficient way, but that's the idea I have.

> Just like udp communication.

To the best of my knowledge, "normal" UDP sockets have no receive queues. So if there's no receiver waiting for an incoming UDP datagram, it's discarded, isn't it? (Maybe asynchronous UDP sockets have queues...)

Regards,
Lukas
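[Editor's note: the head/tail ring buffer Lukas hints at can be sketched in plain C++. This shows only the index bookkeeping; the MPI_Win_lock()/MPI_Win_unlock() around each operation, and the MPI_Put()/window machinery, are deliberately left out, and the class name is invented.]

```cpp
#include <cstddef>
#include <vector>

// FIFO ring buffer with explicit head/tail bookkeeping, as would live at
// the start of the RMA window. In the real scheme, push() is what a sender
// does under MPI_Win_lock() via MPI_Put(), and pop() is what the target
// process G does locally, also under the lock.
class toy_ring_queue {
  std::vector<int> slots;
  std::size_t head;  // next position to read
  std::size_t tail;  // next position to write
  std::size_t count; // number of queued items

public:
  explicit toy_ring_queue(std::size_t capacity)
      : slots(capacity), head(0), tail(0), count(0) {}

  bool push(int value) {
    if (count == slots.size()) return false; // queue full, sender must retry
    slots[tail] = value;
    tail = (tail + 1) % slots.size(); // wrap around the buffer
    ++count;
    return true;
  }

  bool pop(int& value) {
    if (count == 0) return false; // queue empty
    value = slots[head];
    head = (head + 1) % slots.size();
    --count;
    return true;
  }
};
```

Because push advances tail and pop advances head, messages come out in the order they went in (FIFO), unlike the single-counter variant above, which behaves as a stack (LIFO).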
Re: [OMPI users] UDP like messaging with MPI
I know about these functions; they have special requirements, like the MPI_Irecv call should be made in every process. My processes should not look for messages or implicitly receive them. But messages should go into their message queues and be retrieved when needed. Just like UDP communication.

Regards