Re: [OMPI devel] OpenIB BTL and SRQs

2007-07-13 Thread Don Kerr



Jeff Squyres wrote:


On Jul 12, 2007, at 1:18 PM, Don Kerr wrote:

 


- So if you want to simply eliminate the flow control, choose M high
enough (or just a total number of receive buffers to post to the SRQ)
that you won't ever run out of resources and you should see some
speedup from lack of flow control.  This obviously mainly helps apps
with lots of small messages; it may not help in many other cases.

 


Is there any distinction by the size of the message? If the "M"
parameter is set high, does the openib BTL post this many recv buffers
for the SRQ on both QPs?  Or are SRQs only created on one of the QPs?
   



Keep in mind that the SRQs are only for send/receive messages, not  
RDMA messages.
 

That is obvious enough, but isn't there a window for MPI messages that 
are greater than the eager limit but smaller than the point where the 
RDMA protocol kicks in? Fragments for messages in that size range are 
larger than the eager size.


Maybe this is where openib's high- and low-priority QPs differ from udapl, 
which chooses which endpoint to use based on the size of the 
fragment. That is why I was curious whether openib was using SRQs on both 
queue pairs.


Each receive buffer has a max size (the eager limit, IIRC).  So if  
the message is larger than that, we'll fragment per the pipeline  
protocol, possibly subject to doing RDMA if the message is large  
enough, yadda yadda yadda.  More specifically, the size of the buffer  
is not dependent upon an individual message that is being sent or  
received (since they're pre-posted -- we have no idea what the  
message sizes will be).


As for whether the SRQ is on both QPs, this is a Galen/George/Gleb  
(G^3) question...


 



Re: [OMPI devel] OpenIB BTL and SRQs

2007-07-12 Thread Jeff Squyres

On Jul 12, 2007, at 1:18 PM, Don Kerr wrote:


- So if you want to simply eliminate the flow control, choose M high
enough (or just a total number of receive buffers to post to the SRQ)
that you won't ever run out of resources and you should see some
speedup from lack of flow control.  This obviously mainly helps apps
with lots of small messages; it may not help in many other cases.


Is there any distinction by the size of the message? If the "M"
parameter is set high, does the openib BTL post this many recv buffers
for the SRQ on both QPs?  Or are SRQs only created on one of the QPs?


Keep in mind that the SRQs are only for send/receive messages, not  
RDMA messages.


Each receive buffer has a max size (the eager limit, IIRC).  So if  
the message is larger than that, we'll fragment per the pipeline  
protocol, possibly subject to doing RDMA if the message is large  
enough, yadda yadda yadda.  More specifically, the size of the buffer  
is not dependent upon an individual message that is being sent or  
received (since they're pre-posted -- we have no idea what the  
message sizes will be).
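The size logic described above can be sketched as a toy model. Everything here is illustrative: the thresholds, fragment size, and function names are invented for this sketch and are not the openib BTL's actual defaults or code.

```python
# Toy model of the send-side size decision described above.
# EAGER_LIMIT and RDMA_THRESHOLD are invented values, not OMPI defaults.

EAGER_LIMIT = 12 * 1024       # assumed max payload of a pre-posted receive buffer
RDMA_THRESHOLD = 1024 * 1024  # assumed size at which large-message RDMA kicks in

def fragment(msg_len, frag_size=64 * 1024):
    """Return (protocol, list of fragment lengths) for a msg_len-byte message."""
    if msg_len <= EAGER_LIMIT:
        return "eager", [msg_len]          # fits a single send/receive buffer
    proto = "rdma" if msg_len >= RDMA_THRESHOLD else "pipeline"
    # The first fragment fits a pre-posted buffer; in this sketch the
    # later pipeline fragments are larger than the eager size.
    frags, rest = [EAGER_LIMIT], msg_len - EAGER_LIMIT
    while rest > 0:
        step = min(frag_size, rest)
        frags.append(step)
        rest -= step
    return proto, frags
```

The sketch also encodes why the pre-posted buffer size cannot depend on any one message: the receiver does not know msg_len in advance, so anything landing in a send/receive buffer must fit EAGER_LIMIT.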


As for whether the SRQ is on both QPs, this is a Galen/George/Gleb  
(G^3) question...


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] OpenIB BTL and SRQs

2007-07-12 Thread Don Kerr



Jeff Squyres wrote:


There are a few benefits:

- Remember that you post a big pool of buffers instead of num_peers  
individual sets of receive buffers.  Hence, if you post M buffers for  
each of N peers, each peer -- due to flow control -- can only have M  
outstanding sends at a time.  So if you have apps sending lots of  
small messages, you can get better utilization of buffer space  
because a single peer has more than M buffers to receive into.


- You can also post less than M*N buffers by playing the statistics  
of your app -- if you know that you won't have more than M*N messages  
outstanding at any given time, you can post fewer receive buffers.
 

- At the same time, there's a problem with flow control (meaning that  
there is none): how can a sender know when they have overflowed the  
receiver (other than an RNR)?  So it's not necessarily as safe.


- So if you want to simply eliminate the flow control, choose M high  
enough (or just a total number of receive buffers to post to the SRQ)  
that you won't ever run out of resources and you should see some  
speedup from lack of flow control.  This obviously mainly helps apps  
with lots of small messages; it may not help in many other cases. 
 

Is there any distinction by the size of the message? If the "M" 
parameter is set high, does the openib BTL post this many recv buffers 
for the SRQ on both QPs?  Or are SRQs only created on one of the QPs?




On Jul 12, 2007, at 12:29 PM, Don Kerr wrote:

 


Through MCA parameters one can select the use of shared receive queues
in the openib BTL. Other than having fewer queues, I am wondering what
the benefits of using this option are. Can anyone elaborate on using
them vs. the default?

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
   




 



Re: [OMPI devel] OpenIB BTL and SRQs

2007-07-12 Thread Don Kerr
Interesting. So with SRQs there is no flow control. I am guessing the 
BTL sets some reasonable default but essentially relies on the user 
to adjust other parameters so the buffers are not overrun.


And yes Galen I would like to read your paper.

Jeff Squyres wrote:


There are a few benefits:

- Remember that you post a big pool of buffers instead of num_peers  
individual sets of receive buffers.  Hence, if you post M buffers for  
each of N peers, each peer -- due to flow control -- can only have M  
outstanding sends at a time.  So if you have apps sending lots of  
small messages, you can get better utilization of buffer space  
because a single peer has more than M buffers to receive into.


- You can also post less than M*N buffers by playing the statistics  
of your app -- if you know that you won't have more than M*N messages  
outstanding at any given time, you can post fewer receive buffers.


- At the same time, there's a problem with flow control (meaning that  
there is none): how can a sender know when they have overflowed the  
receiver (other than an RNR)?  So it's not necessarily as safe.


- So if you want to simply eliminate the flow control, choose M high  
enough (or just a total number of receive buffers to post to the SRQ)  
that you won't ever run out of resources and you should see some  
speedup from lack of flow control.  This obviously mainly helps apps  
with lots of small messages; it may not help in many other cases.



On Jul 12, 2007, at 12:29 PM, Don Kerr wrote:

 


Through MCA parameters one can select the use of shared receive queues
in the openib BTL. Other than having fewer queues, I am wondering what
the benefits of using this option are. Can anyone elaborate on using
them vs. the default?

   




 



Re: [OMPI devel] OpenIB BTL and SRQs

2007-07-12 Thread Jeff Squyres

There are a few benefits:

- Remember that you post a big pool of buffers instead of num_peers  
individual sets of receive buffers.  Hence, if you post M buffers for  
each of N peers, each peer -- due to flow control -- can only have M  
outstanding sends at a time.  So if you have apps sending lots of  
small messages, you can get better utilization of buffer space  
because a single peer has more than M buffers to receive into.


- You can also post less than M*N buffers by playing the statistics  
of your app -- if you know that you won't have more than M*N messages  
outstanding at any given time, you can post fewer receive buffers.


- At the same time, there's a problem with flow control (meaning that  
there is none): how can a sender know when they have overflowed the  
receiver (other than an RNR)?  So it's not necessarily as safe.


- So if you want to simply eliminate the flow control, choose M high  
enough (or just a total number of receive buffers to post to the SRQ)  
that you won't ever run out of resources and you should see some  
speedup from lack of flow control.  This obviously mainly helps apps  
with lots of small messages; it may not help in many other cases.
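The buffer accounting in the bullets above can be made concrete with a small model. The numbers and function names are made up for illustration; they are not Open MPI parameters or code.

```python
# Toy accounting for per-peer receive queues vs. a shared receive queue
# (SRQ). Illustrative only -- not Open MPI code.

def pp_posting(m_per_peer, n_peers):
    """Per-peer: post M buffers for each of N peers. Total posted is M*N,
    and flow control caps any single peer at M outstanding sends."""
    return m_per_peer * n_peers, m_per_peer

def srq_posting(pool_size):
    """SRQ: post one shared pool. A bursty peer may draw on the whole
    pool, so a pool smaller than M*N can suffice if bursts don't
    overlap -- but nothing stops senders from overrunning it."""
    return pool_size, pool_size

total_pp, per_peer_cap = pp_posting(8, 64)  # 512 buffers, cap of 8 per peer
total_srq, burst_cap = srq_posting(128)     # 128 buffers, any peer can use all
```

The shared pool posts a quarter of the buffers yet gives each peer a sixteen-times-larger burst allowance, which is the statistical bet described above.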



On Jul 12, 2007, at 12:29 PM, Don Kerr wrote:


Through MCA parameters one can select the use of shared receive queues
in the openib BTL. Other than having fewer queues, I am wondering what
the benefits of using this option are. Can anyone elaborate on using
them vs. the default?




--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] OpenIB BTL and SRQs

2007-07-12 Thread Galen Shipman


On Jul 12, 2007, at 10:29 AM, Don Kerr wrote:


Through MCA parameters one can select the use of shared receive queues
in the openib BTL. Other than having fewer queues, I am wondering what
the benefits of using this option are. Can anyone elaborate on using
them vs. the default?

In the trunk the number of queue pairs is the same regardless of SRQ  
or non-SRQ (henceforth called PP, per-peer).
The difference is that PP receive resources scale with the number of  
active QP connections; SRQ receive resources do not.
So the real difference is the memory footprint of the receive  
resources, and SRQ's is potentially much smaller. This comes at a cost:  
SRQ has no flow control, because we cannot reserve resources for a  
particular peer, so we have the possibility of an RNR (receiver  
not ready) NAK if all the shared receive resources are consumed while  
some peer is still transmitting messages. This carries a performance  
penalty, as an RNR NAK stalls the IB pipeline. With PP, we can  
guarantee that resources are available to each peer and thereby avoid  
RNR (although there is a bug in the trunk right now in that we sometimes  
get RNR even with PP, but this is being worked on).
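A back-of-the-envelope model of the scaling described above; the buffer size and counts are invented for illustration and are not actual openib BTL defaults.

```python
# Receive-resource footprint: per-peer (PP) grows with the number of
# active QP connections, SRQ does not. All numbers are illustrative.

BUF_SIZE = 12 * 1024  # assumed bytes per pre-posted receive buffer

def pp_footprint(bufs_per_peer, n_connections):
    """Per-peer: footprint is linear in the number of connections."""
    return bufs_per_peer * n_connections * BUF_SIZE

def srq_footprint(pool_bufs):
    """SRQ: one shared pool, independent of connection count."""
    return pool_bufs * BUF_SIZE

pp_1k = pp_footprint(8, 1024)  # 8 bufs/peer across 1024 peers: 96 MiB
srq = srq_footprint(256)       # 256-buffer shared pool: 3 MiB
```

At scale the per-peer footprint dominates, which is the memory argument for SRQ; the trade, as noted above, is the possibility of RNR NAKs.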


I have been working on a modification to the OpenIB BTL which allows  
the user to specify SRQ and PP QPs arbitrarily. That is, we can use a  
mix of PP and SRQ QPs, with a mix of receive sizes for each. This is  
coming into the trunk very soon, perhaps tomorrow, but we need to  
verify the branch with some additional testing.


I hope this helps. I have a paper at EuroPVM/MPI that discusses much  
of this; I will send you a copy off-list.


- Galen







[OMPI devel] OpenIB BTL and SRQs

2007-07-12 Thread Don Kerr
Through MCA parameters one can select the use of shared receive queues 
in the openib BTL. Other than having fewer queues, I am wondering what 
the benefits of using this option are. Can anyone elaborate on using 
them vs. the default?