Re: [OMPI devel] RFC: sm Latency

2009-01-22 Thread Richard Graham
rules. Rich On 1/22/09 12:51 PM, "Eugene Loh" <eugene@sun.com> wrote: > Richard Graham wrote: >> In the recvi function, do you first try to >> match off the unexpected list before you try and match data in the fifo's? >

Re: [OMPI devel] RFC: sm Latency

2009-01-22 Thread Richard Graham
BTW, in the recvi function, do you first try to match off the unexpected list before you try and match data in the fifo's? Rich On 1/21/09 8:00 PM, "Eugene Loh" wrote: > Ron Brightwell wrote: >> >>> >>> If you poll only the queue that corresponds to a posted

Re: [OMPI devel] RFC: sm Latency

2009-01-21 Thread Eugene Loh
Ron Brightwell wrote: If you poll only the queue that corresponds to a posted receive, you only optimize micro-benchmarks, until they start using ANY_SOURCE. Note that the HPCC RandomAccess benchmark only uses MPI_ANY_SOURCE (and MPI_ANY_TAG). But HPCC RandomAccess also

Re: [OMPI devel] RFC: sm Latency

2009-01-21 Thread Ron Brightwell
> > Possibly, you meant to ask how one does directed polling with a wildcard > > source MPI_ANY_SOURCE. If that was your question, the answer is we > > punt. We report failure to the ULP, which reverts to the standard code > > path. > > Sorry, I meant ANY_SOURCE. If you poll only the queue

Re: [OMPI devel] RFC: sm Latency

2009-01-21 Thread Patrick Geoffray
Eugene Loh wrote: Possibly, you meant to ask how one does directed polling with a wildcard source MPI_ANY_SOURCE. If that was your question, the answer is we punt. We report failure to the ULP, which reverts to the standard code path. Sorry, I meant ANY_SOURCE. If you poll only the queue

Re: [OMPI devel] RFC: sm Latency

2009-01-21 Thread Eugene Loh
Patrick Geoffray wrote: Eugene Loh wrote: To recap: 1) The work is already done. How do you do "directed polling" with ANY_TAG? Not sure I understand the question. So, maybe we start by being explicit about what we mean by "directed polling". Currently, the sm BTL

Re: [OMPI devel] RFC: sm Latency

2009-01-21 Thread Jeff Squyres
Brian is referring to the "rdma" onesided component (OMPI osd framework) that directly invokes the BTL functions (vs. using the PML send/receive functions). The osd matching is quite different than pt2pt matching. His concern is that that model continues to work -- e.g., if the rdma osd

Re: [OMPI devel] RFC: sm Latency

2009-01-21 Thread Eugene Loh
Richard Graham wrote: On 1/20/09 8:53 PM, "Jeff Squyres" wrote: Eugene: you mentioned that there are other possibilities to having the BTL understand match headers, such as a callback into the PML. Have you tried this approach to see what the performance cost

Re: [OMPI devel] RFC: sm Latency

2009-01-21 Thread Eugene Loh
Richard Graham wrote: On 1/20/09 2:08 PM, "Eugene Loh" <eugene@sun.com> wrote: Richard Graham wrote: First, the performance improvements look really nice. A few questions: - How much o

Re: [OMPI devel] RFC: sm Latency

2009-01-21 Thread Eugene Loh
Brian Barrett wrote: I unfortunately don't have time to look in depth at the patch. But my concern is that currently (today, not at some made up time in the future, maybe), we use the BTLs for more than just MPI point-to- point. The rdma one-sided component (which was added for 1.3 and

Re: [OMPI devel] RFC: sm Latency

2009-01-20 Thread Patrick Geoffray
Eugene, All my remarks are related to the receive side. I think the send side optimizations are fine, but don't take my word for it. Eugene Loh wrote: > To recap: > 1) The work is already done. How do you do "directed polling" with ANY_TAG ? How do you ensure you check all incoming queues from

Re: [OMPI devel] RFC: sm Latency

2009-01-20 Thread Richard Graham
On 1/20/09 8:53 PM, "Jeff Squyres" wrote: > This all sounds really great to me. I agree with most of what has > been said -- e.g., benchmarks *are* important. Improving them can > even sometimes have the side effect of improving real applications. ;-) > > My one big

Re: [OMPI devel] RFC: sm Latency

2009-01-20 Thread Richard Graham
On 1/20/09 2:08 PM, "Eugene Loh" <eugene@sun.com> wrote: > Richard Graham wrote: >> First, the performance improvements look >> really nice. >> A few questions: >> - How much of an abstraction violation doe

Re: [OMPI devel] RFC: sm Latency

2009-01-20 Thread Brian Barrett
I unfortunately don't have time to look in depth at the patch. But my concern is that currently (today, not at some made up time in the future, maybe), we use the BTLs for more than just MPI point-to- point. The rdma one-sided component (which was added for 1.3 and hopefully will be the

Re: [OMPI devel] RFC: sm Latency

2009-01-20 Thread Jeff Squyres
On Jan 20, 2009, at 8:53 PM, Jeff Squyres wrote: This all sounds really great to me. I agree with most of what has been said -- e.g., benchmarks *are* important. Improving them can even sometimes have the side effect of improving real applications. ;-) My one big concern is the moving

Re: [OMPI devel] RFC: sm Latency

2009-01-20 Thread Jeff Squyres
This all sounds really great to me. I agree with most of what has been said -- e.g., benchmarks *are* important. Improving them can even sometimes have the side effect of improving real applications. ;-) My one big concern is the moving of architectural boundaries of making the btl

Re: [OMPI devel] RFC: sm Latency

2009-01-20 Thread Eugene Loh
Patrick Geoffray wrote: >Eugene Loh wrote: > > >>>replace the fifo’s with a single link list per process in shared >>>memory, with senders to this process adding match envelopes >>>atomically, with each process reading its own link list (multiple >>> >>> >>*) Doesn't strike me as a

Re: [OMPI devel] RFC: sm Latency

2009-01-20 Thread Patrick Geoffray
Hi Eugene, Eugene Loh wrote: >> replace the fifo’s with a single link list per process in shared >> memory, with senders to this process adding match envelopes >> atomically, with each process reading its own link list (multiple > *) Doesn't strike me as a "simple" change. Actually, it's

Re: [OMPI devel] RFC: sm Latency

2009-01-20 Thread Eugene Loh
Richard Graham wrote: First, the performance improvements look really nice. A few questions: - How much of an abstraction violation does this introduce? Doesn't need to be much of an abstraction violation at all if, by that, we mean teaching the BTL

Re: [OMPI devel] RFC: sm Latency

2009-01-20 Thread Graham, Richard L.
and memory contention is manageable. Rich - Original Message - From: devel-boun...@open-mpi.org <devel-boun...@open-mpi.org> To: Open MPI Developers <de...@open-mpi.org> Sent: Tue Jan 20 06:56:53 2009 Subject: Re: [OMPI devel] RFC: sm Latency Richard Graham wrote: > First,

Re: [OMPI devel] RFC: sm Latency

2009-01-20 Thread Terry Dontje
Richard Graham wrote: > First, the performance improvements look really nice. > A few questions: > - How much of an abstraction violation does this introduce? This > looks like the btl needs to start “knowing” about MPI level semantics. > Currently, the btl purposefully is ulp agnostic. I ask for

Re: [OMPI devel] RFC: sm Latency

2009-01-17 Thread Richard Graham
First, the performance improvements look really nice. A few questions: - How much of an abstraction violation does this introduce? This looks like the btl needs to start “knowing” about MPI level semantics. Currently, the btl purposefully is ulp agnostic. I ask for 2 reasons - you

[OMPI devel] RFC: sm Latency

2009-01-17 Thread Eugene Loh
RFC: sm Latency WHAT: Introducing optimizations to reduce ping-pong latencies over the sm BTL. WHY: This is a visible benchmark of MPI performance. We can improve shared-memory latencies from 30% (if hardware latency is the limiting factor) to 2× or more (if MPI