Re: [OMPI devel] Threaded progress for CPCs

2008-05-21 Thread Jeff Squyres
One more point that Pasha and I hashed out yesterday in IM... To avoid the problem of posting a short handshake buffer to already-existing SRQs, we will only do the extra handshake if there are PPRQs in receive_queues. The handshake will go across the smallest PPRQ, and represent all QPs…
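The rule above can be modeled in a few lines. This is a hypothetical sketch, not the openib BTL's actual data structures: queue kinds and the tuple layout are assumptions, following the "P" (per-peer) / "S" (shared) letters used in openib receive_queues specs.

```python
# Hypothetical model of the rule described above: the extra handshake is
# only performed when receive_queues contains at least one per-peer
# receive queue (PPRQ), and it travels over the smallest such queue.
from typing import List, Optional, Tuple

# Each queue is (kind, buffer_size). Kinds mimic the openib BTL spec
# letters: "P" = per-peer RQ, "S" = shared RQ (layout assumed here).
Queue = Tuple[str, int]

def handshake_queue(receive_queues: List[Queue]) -> Optional[int]:
    """Return the index of the smallest PPRQ, or None when the spec is
    SRQ-only (in which case the handshake is skipped entirely, avoiding
    posting a short buffer to an already-existing SRQ)."""
    pprqs = [(size, i) for i, (kind, size) in enumerate(receive_queues)
             if kind == "P"]
    if not pprqs:
        return None
    return min(pprqs)[1]  # smallest buffer size wins
```

For example, with a mixed spec `[("P", 256), ("P", 4096), ("S", 65536)]` the handshake would ride the 256-byte PPRQ at index 0, while an SRQ-only spec yields None.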

Re: [OMPI devel] Threaded progress for CPCs

2008-05-20 Thread Jeff Squyres
Ok, I think we're mostly converged on a solution. This might not get implemented immediately (got some other pending v1.3 stuff to bug fix, etc.), but it'll happen for v1.3. - endpoint creation will mpool alloc/register a small buffer for handshake - cpc does not need to call _post_recvs()…
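The first bullet of the converged scheme can be sketched as follows. All names here are hypothetical stand-ins (the post only says the endpoint allocates/registers a "small buffer" through the mpool at creation time, so the CPC never needs a _post_recvs()-style call); the buffer size is an assumption.

```python
# Sketch of eager handshake-buffer setup at endpoint creation, so the
# CPC progress thread never has to post receives itself.
class FakeMpool:
    """Stand-in for the BTL registration cache: hands out buffers and
    records that each one was "registered" with the HCA."""
    def __init__(self):
        self.registered = []

    def alloc_register(self, size):
        buf = bytearray(size)
        self.registered.append(buf)
        return buf

class Endpoint:
    HANDSHAKE_SIZE = 64  # assumed; the post only says "small"

    def __init__(self, mpool):
        # Done eagerly at creation time, in the main thread, not
        # inside the CPC: the handshake buffer is ready before any
        # connection progress happens.
        self.handshake_buf = mpool.alloc_register(self.HANDSHAKE_SIZE)
```

The design point being modeled: registration happens once, up front, so no registration or posting work remains for the CPC thread.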

Re: [OMPI devel] Threaded progress for CPCs

2008-05-20 Thread Pavel Shamis (Pasha)
Is it possible to have sane SRQ implementation without HW flow control? It seems pretty unlikely if the only available HW flow control is to terminate the connection. ;-) Even if we can get the iWARP semantics to work, this feels kinda icky. Perhaps I'm overreacting and this is…

Re: [OMPI devel] Threaded progress for CPCs

2008-05-20 Thread Gleb Natapov
On Mon, May 19, 2008 at 01:38:53PM -0400, Jeff Squyres wrote: > >> 5. ...? > > What about moving posting of receive buffers into main thread. With > > SRQ it is easy: don't post anything in CPC thread. Main thread will > > prepost buffers automatically after first fragment received on the > > endpoint…

Re: [OMPI devel] Threaded progress for CPCs

2008-05-19 Thread Steve Wise
Jeff Squyres wrote: On May 19, 2008, at 4:44 PM, Steve Wise wrote: 1. Posting more at low watermark can lead to DoS-like behavior when you have a fast sender and a slow receiver. This is exactly the resource-exhaustion kind of behavior that a high quality MPI implementation is supposed to avoid…

Re: [OMPI devel] Threaded progress for CPCs

2008-05-19 Thread Jeff Squyres
On May 19, 2008, at 4:44 PM, Steve Wise wrote: 1. Posting more at low watermark can lead to DoS-like behavior when you have a fast sender and a slow receiver. This is exactly the resource-exhaustion kind of behavior that a high quality MPI implementation is supposed to avoid -- we really should…
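The resource-exhaustion concern above is what receiver-managed credits address: the sender may only send while it holds credits, so a slow receiver throttles a fast sender instead of being buried by it. A toy model of the idea (all names hypothetical; this is not the openib BTL's credit machinery):

```python
# Credit-based flow control in miniature: each credit represents a
# receive buffer the peer has actually posted. A sender with no
# credits must back off rather than force the receiver to post more.
class CreditedChannel:
    def __init__(self, credits):
        self.credits = credits   # buffers currently posted by the peer
        self.delivered = 0

    def try_send(self):
        if self.credits == 0:
            return False         # fast sender stalls; no DoS on receiver
        self.credits -= 1
        self.delivered += 1
        return True

    def receiver_reposts(self, n):
        self.credits += n        # credits flow back as buffers are reposted
```

The contrast with "post more at low watermark" is that here the *sender* absorbs the backpressure, which is the behavior the post argues a high-quality MPI should have.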

Re: [OMPI devel] Threaded progress for CPCs

2008-05-19 Thread Steve Wise
Jeff Squyres wrote: On May 19, 2008, at 3:40 PM, Jon Mason wrote: iWARP needs preposted recv buffers (or it will drop the connection). So this isn't a good option. I was talking about SRQ only. You said above that iwarp does retransmit for SRQ. openib BTL relies on HW retransmit…

Re: [OMPI devel] Threaded progress for CPCs

2008-05-19 Thread Jeff Squyres
On May 19, 2008, at 3:40 PM, Jon Mason wrote: iWARP needs preposted recv buffers (or it will drop the connection). So this isn't a good option. I was talking about SRQ only. You said above that iwarp does retransmit for SRQ. openib BTL relies on HW retransmit when using SRQ, so if iwarp…

Re: [OMPI devel] Threaded progress for CPCs

2008-05-19 Thread Jon Mason
On Mon, May 19, 2008 at 10:12:19PM +0300, Gleb Natapov wrote: > On Mon, May 19, 2008 at 01:52:22PM -0500, Jon Mason wrote: > > On Mon, May 19, 2008 at 05:17:57PM +0300, Gleb Natapov wrote: > > > On Mon, May 19, 2008 at 05:08:17PM +0300, Pavel Shamis (Pasha) wrote: > > > > >> 5. ...? …

Re: [OMPI devel] Threaded progress for CPCs

2008-05-19 Thread Gleb Natapov
On Mon, May 19, 2008 at 01:52:22PM -0500, Jon Mason wrote: > On Mon, May 19, 2008 at 05:17:57PM +0300, Gleb Natapov wrote: > > On Mon, May 19, 2008 at 05:08:17PM +0300, Pavel Shamis (Pasha) wrote: > > > >> 5. ...? > > > >> > > > > What about moving posting of receive buffers into main thread.

Re: [OMPI devel] Threaded progress for CPCs

2008-05-19 Thread Jon Mason
On Mon, May 19, 2008 at 01:38:53PM -0400, Jeff Squyres wrote: > On May 19, 2008, at 8:25 AM, Gleb Natapov wrote: > > > Is it possible to have sane SRQ implementation without HW flow > > control? > > It seems pretty unlikely if the only available HW flow control is to > terminate the connection…

Re: [OMPI devel] Threaded progress for CPCs

2008-05-19 Thread Jon Mason
On Mon, May 19, 2008 at 05:17:57PM +0300, Gleb Natapov wrote: > On Mon, May 19, 2008 at 05:08:17PM +0300, Pavel Shamis (Pasha) wrote: > > >> 5. ...? > > >> > > > What about moving posting of receive buffers into main thread. With > > > SRQ it is easy: don't post anything in CPC thread. Main thread…

Re: [OMPI devel] Threaded progress for CPCs

2008-05-19 Thread Jeff Squyres
On May 19, 2008, at 8:25 AM, Gleb Natapov wrote: Is it possible to have sane SRQ implementation without HW flow control? It seems pretty unlikely if the only available HW flow control is to terminate the connection. ;-) Even if we can get the iWARP semantics to work, this feels kinda icky…

Re: [OMPI devel] Threaded progress for CPCs

2008-05-19 Thread Gleb Natapov
On Mon, May 19, 2008 at 07:39:13PM +0300, Pavel Shamis (Pasha) wrote: So this solution will cost 1 buffer on each srq ... sounds acceptable for me. But I don't see too much difference compared to #1, as I understand we will anyway need the pipe for communication with the main thread…

Re: [OMPI devel] Threaded progress for CPCs

2008-05-19 Thread Pavel Shamis (Pasha)
What about moving posting of receive buffers into main thread. With SRQ it is easy: don't post anything in CPC thread. Main thread will prepost buffers automatically after first fragment received on the endpoint (in btl_openib_handle_incoming()). It still doesn't guarantee that…
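Gleb's SRQ suggestion can be modeled in miniature. The structure below is an assumption for illustration (the real work would live in a btl_openib_handle_incoming()-style handler); only the idea is from the post: the CPC thread posts nothing, and the first fragment arriving on an endpoint triggers the main thread to prepost that endpoint's buffers.

```python
# Lazy prepost: nothing is posted from the CPC thread; the main thread
# preposts on first receipt, mimicking btl_openib_handle_incoming().
class LazyEndpoint:
    def __init__(self, prepost_count=16):   # count assumed, not from the post
        self.posted = 0
        self.received = 0
        self.prepost_count = prepost_count

    def handle_incoming(self):
        """Runs in the main thread for every received fragment."""
        self.received += 1
        if self.posted == 0:                 # first fragment on this endpoint
            self.posted = self.prepost_count # prepost happens here, lazily
```

The appeal is that all posting stays in the main thread; the cost, as the thread discusses, is that it only works where something (the SRQ, or a handshake buffer) can absorb the very first arrival.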

Re: [OMPI devel] Threaded progress for CPCs

2008-05-19 Thread Gleb Natapov
On Mon, May 19, 2008 at 05:08:17PM +0300, Pavel Shamis (Pasha) wrote: > >> 5. ...? > >> > > What about moving posting of receive buffers into main thread. With > > SRQ it is easy: don't post anything in CPC thread. Main thread will > > prepost buffers automatically after first fragment received…

Re: [OMPI devel] Threaded progress for CPCs

2008-05-19 Thread Pavel Shamis (Pasha)
1. When CM progress thread completes an incoming connection, it sends a command down a pipe to the main thread indicating that a new endpoint is ready to use. The pipe message will be noticed by opal_progress() in the main thread and will run a function to do all necessary housekeeping…
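A minimal sketch of the pipe mechanism in step 1, under assumed names (CMD_ENDPOINT_READY, the callback's return value, and the helper functions are all illustrative; the real path would go through opal_progress() and C file descriptors): the CM progress thread writes a one-byte command into a pipe, and the main loop polls the pipe and runs the housekeeping there, so all endpoint setup stays in the main thread.

```python
# CM progress thread -> main thread notification over a pipe, with the
# main loop doing a select()-style poll the way opal_progress() would.
import os
import select
import threading

CMD_ENDPOINT_READY = b"R"   # command byte (name/value assumed)

def cm_thread(wfd):
    """Runs in the CM progress thread once a connection completes."""
    os.write(wfd, CMD_ENDPOINT_READY)

def main_progress(rfd, timeout=5.0):
    """One iteration of the main thread's progress loop."""
    ready, _, _ = select.select([rfd], [], [], timeout)
    if ready:
        cmd = os.read(rfd, 1)
        if cmd == CMD_ENDPOINT_READY:
            return "housekeeping-run"   # endpoint setup happens here
    return None

def demo():
    rfd, wfd = os.pipe()
    t = threading.Thread(target=cm_thread, args=(wfd,))
    t.start()
    result = main_progress(rfd)
    t.join()
    os.close(rfd)
    os.close(wfd)
    return result
```

The point of the design is visible in the sketch: the progress thread touches nothing but the pipe, so no heavyweight locking is needed between it and the main thread.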

Re: [OMPI devel] Threaded progress for CPCs

2008-05-19 Thread Gleb Natapov
On Sun, May 18, 2008 at 11:38:36AM -0400, Jeff Squyres wrote: > ==> Remember that the goal for this work was to have a separate > progress thread *without* all the heavyweight OMPI thread locks. > Specifically: make it work in a build without --enable-progress-threads or --enable-mpi-threads…