rules.
Rich
On 1/22/09 12:51 PM, "Eugene Loh" <eugene@sun.com> wrote:
> Richard Graham wrote:
>> Re: [OMPI devel] RFC: sm Latency In the recvi function, do you first try to
>> match off the unexpected list before you try and match data in the fifo's?
>
BTW,
In the recvi function, do you first try to match off the unexpected list
before you try and match data in the fifo's?
Rich
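For what it's worth, the ordering Rich is asking about can be sketched roughly as follows. This is a toy model, not Open MPI's actual data structures or API (frag_t, match(), recvi() here are all made-up names): a recvi-style call first searches the unexpected-message list, and only then drains the incoming FIFO, stashing non-matching fragments onto the unexpected list in arrival order.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct frag {
    int src, tag;
    struct frag *next;
} frag_t;

static bool match(const frag_t *f, int src, int tag) {
    return f->src == src && f->tag == tag;
}

/* Pop the first matching fragment from the unexpected list. */
static frag_t *match_unexpected(frag_t **head, int src, int tag) {
    for (frag_t **p = head; *p != NULL; p = &(*p)->next) {
        if (match(*p, src, tag)) {
            frag_t *f = *p;
            *p = f->next;
            return f;
        }
    }
    return NULL;
}

/* recvi sketch: unexpected list first, then drain the FIFO,
   appending non-matching fragments to the unexpected list. */
frag_t *recvi(frag_t **unexpected, frag_t *(*fifo_pop)(void),
              int src, int tag) {
    frag_t *f = match_unexpected(unexpected, src, tag);
    if (f != NULL)
        return f;                     /* matched an earlier arrival */
    while ((f = fifo_pop()) != NULL) {
        if (match(f, src, tag))
            return f;                 /* matched fresh data */
        f->next = NULL;               /* stash in arrival order */
        frag_t **t = unexpected;
        while (*t != NULL)
            t = &(*t)->next;
        *t = f;
    }
    return NULL;                      /* nothing matched yet */
}
```

Checking the unexpected list first is required for MPI ordering: a fragment that arrived earlier must match before newer data in the FIFO.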
On 1/21/09 8:00 PM, "Eugene Loh" wrote:
> Ron Brightwell wrote:
>>
>>>
>>> If you poll only the queue that corresponds to a posted
Ron Brightwell wrote:
If you poll only the queue that corresponds to a posted receive, you only optimize micro-benchmarks, until they start using ANY_SOURCE.
Note that the HPCC RandomAccess benchmark only uses MPI_ANY_SOURCE (and
MPI_ANY_TAG).
But HPCC RandomAccess also
> > Possibly, you meant to ask how one does directed polling with a wildcard
> > source MPI_ANY_SOURCE. If that was your question, the answer is we
> > punt. We report failure to the ULP, which reverts to the standard code
> > path.
>
> Sorry, I meant ANY_SOURCE. If you poll only the queue
Eugene Loh wrote:
Possibly, you meant to ask how one does directed polling with a wildcard
source MPI_ANY_SOURCE. If that was your question, the answer is we
punt. We report failure to the ULP, which reverts to the standard code
path.
Sorry, I meant ANY_SOURCE. If you poll only the queue
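The punt Eugene describes (a fast directed path when the receive names a specific source, reporting failure on MPI_ANY_SOURCE so the upper-level protocol reverts to scanning every queue) might look something like this toy sketch. All names and types here are invented for illustration; this is not the actual sm BTL code.

```c
#include <stdbool.h>
#include <stddef.h>

#define MY_ANY_SOURCE (-1)   /* stand-in for MPI_ANY_SOURCE */

typedef struct {
    int *queues[8];          /* one incoming queue slot per peer (toy) */
    size_t nqueues;
} endpoint_t;

/* Returns true if the directed fast path could be used. */
bool directed_poll(endpoint_t *ep, int src, int *out) {
    if (src == MY_ANY_SOURCE)
        return false;                 /* punt: caller takes the slow path */
    int *q = ep->queues[src];
    if (q != NULL) {
        *out = *q;                    /* "drain" one entry (toy model) */
        ep->queues[src] = NULL;
    } else {
        *out = -1;
    }
    return true;
}

/* Standard path: scan all queues; needed for wildcards. */
int poll_all(endpoint_t *ep, int *out) {
    for (size_t i = 0; i < ep->nqueues; i++) {
        if (ep->queues[i] != NULL) {
            *out = *ep->queues[i];
            ep->queues[i] = NULL;
            return (int)i;            /* which source matched */
        }
    }
    return -1;
}
```

The point of the design is that the wildcard case pays the old cost while the common named-source case gets the shortcut.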
Patrick Geoffray wrote:
Eugene Loh wrote:
To recap:
1) The work is already done.
How do you do "directed polling" with ANY_TAG?
Not sure I understand the question. So, maybe we start by being
explicit about what we mean by "directed polling".
Currently, the sm BTL
Brian is referring to the "rdma" onesided component (OMPI osd
framework) that directly invokes the BTL functions (vs. using the PML
send/receive functions). The osd matching is quite different than
pt2pt matching.
His concern is that that model continues to work -- e.g., if the rdma
osd
Richard Graham wrote:
On 1/20/09 8:53 PM, "Jeff Squyres" wrote:
Eugene: you mentioned that there are other possibilities to having the
BTL understand match headers, such as a callback into the PML. Have
you tried this approach to see what the performance cost
Richard Graham wrote:
Re: [OMPI devel] RFC: sm Latency
On 1/20/09 2:08 PM, "Eugene Loh" <eugene@sun.com> wrote:
Richard Graham wrote:
Re: [OMPI devel] RFC: sm Latency First, the
performance improvements look really nice.
A few questions:
- How much o
Brian Barrett wrote:
I unfortunately don't have time to look in depth at the patch. But
my concern is that currently (today, not at some made up time in the
future, maybe), we use the BTLs for more than just MPI point-to-
point. The rdma one-sided component (which was added for 1.3 and
Eugene,
All my remarks are related to the receive side. I think the send side
optimizations are fine, but don't take my word for it.
Eugene Loh wrote:
> To recap:
> 1) The work is already done.
How do you do "directed polling" with ANY_TAG? How do you ensure you
check all incoming queues from
On 1/20/09 8:53 PM, "Jeff Squyres" wrote:
> This all sounds really great to me. I agree with most of what has
> been said -- e.g., benchmarks *are* important. Improving them can
> even sometimes have the side effect of improving real applications. ;-)
>
> My one big
On 1/20/09 2:08 PM, "Eugene Loh" <eugene@sun.com> wrote:
> Richard Graham wrote:
>> Re: [OMPI devel] RFC: sm Latency First, the performance improvements look
>> really nice.
>> A few questions:
>> - How much of an abstraction violation doe
I unfortunately don't have time to look in depth at the patch. But my
concern is that currently (today, not at some made up time in the
future, maybe), we use the BTLs for more than just MPI point-to-
point. The rdma one-sided component (which was added for 1.3 and
hopefully will be the
On Jan 20, 2009, at 8:53 PM, Jeff Squyres wrote:
This all sounds really great to me. I agree with most of what has
been said -- e.g., benchmarks *are* important. Improving them can
even sometimes have the side effect of improving real
applications. ;-)
My one big concern is the moving
This all sounds really great to me. I agree with most of what has
been said -- e.g., benchmarks *are* important. Improving them can
even sometimes have the side effect of improving real applications. ;-)
My one big concern is the moving of architectural boundaries of making
the btl
Patrick Geoffray wrote:
>Eugene Loh wrote:
>
>
>>>replace the fifo's with a single linked list per process in shared
>>>memory, with senders to this process adding match envelopes
>>>atomically, with each process reading its own link list (multiple
>>>
>>>
>>*) Doesn't strike me as a
Hi Eugene,
Eugene Loh wrote:
>> replace the fifo's with a single linked list per process in shared
>> memory, with senders to this process adding match envelopes
>> atomically, with each process reading its own link list (multiple
> *) Doesn't strike me as a "simple" change.
Actually, it's
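The single-list alternative quoted above (senders atomically pushing match envelopes onto one per-receiver list in shared memory, the receiver reading its own list) is essentially a multi-producer/single-consumer lock-free push. Here is a generic sketch in C11 atomics, under the assumption that a compare-and-swap push is what is meant; it is not Open MPI's implementation. Note that a single-CAS push yields LIFO order, so the receiver must reverse the detached list to restore arrival order before matching.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct env {
    int src, tag;
    struct env *next;
} env_t;

typedef struct {
    _Atomic(env_t *) head;   /* shared: senders push, receiver drains */
} inbox_t;

/* Sender side: lock-free push of one match envelope. */
void inbox_push(inbox_t *in, env_t *e) {
    env_t *old = atomic_load_explicit(&in->head, memory_order_relaxed);
    do {
        e->next = old;
    } while (!atomic_compare_exchange_weak_explicit(
                 &in->head, &old, e,
                 memory_order_release, memory_order_relaxed));
}

/* Receiver side: detach the whole list atomically, then reverse
   it so envelopes come back in arrival (FIFO) order. */
env_t *inbox_drain(inbox_t *in) {
    env_t *lifo = atomic_exchange_explicit(&in->head, NULL,
                                           memory_order_acquire);
    env_t *fifo = NULL;
    while (lifo != NULL) {
        env_t *next = lifo->next;
        lifo->next = fifo;
        fifo = lifo;
        lifo = next;
    }
    return fifo;
}
```

One attraction of this shape is that the receiver polls exactly one location regardless of the number of senders, which is where the ANY_SOURCE/ANY_TAG objection to per-peer FIFOs goes away; the cost moves to sender-side CAS contention.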
Richard Graham wrote:
Re: [OMPI devel] RFC: sm Latency
First, the performance improvements look
really nice.
A few questions:
- How much of an abstraction violation does this introduce?
Doesn't need to be much of an abstraction violation at all if, by that,
we mean teaching the BTL
and memory contention
is manageable.
Rich
- Original Message -
From: devel-boun...@open-mpi.org <devel-boun...@open-mpi.org>
To: Open MPI Developers <de...@open-mpi.org>
Sent: Tue Jan 20 06:56:53 2009
Subject: Re: [OMPI devel] RFC: sm Latency
Richard Graham wrote:
> First,
Richard Graham wrote:
> First, the performance improvements look really nice.
> A few questions:
> - How much of an abstraction violation does this introduce? This
> looks like the btl needs to start “knowing” about MPI level semantics.
> Currently, the btl purposefully is ulp agnostic. I ask for
First, the performance improvements look really nice.
A few questions:
- How much of an abstraction violation does this introduce? This looks
like the btl needs to start “knowing” about MPI level semantics. Currently,
the btl purposefully is ulp agnostic. I ask for 2 reasons
- you
RFC: sm Latency
WHAT: Introducing optimizations to reduce ping-pong
latencies over the sm BTL.
WHY: This is a visible benchmark of MPI performance.
We can improve shared-memory latencies from 30% (if hardware
latency is the limiting factor) to 2× or more (if MPI