On Mon, Dec 17, 2007 at 08:08:02PM -0500, Richard Graham wrote:
> Needless to say (for the nth time :-) ) that changing this bit of code makes me
> nervous.
I've noticed it already :)
> However, it occurred to me that there is a much better way to test
> this code than setting
On Thu, Dec 13, 2007 at 08:04:21PM -0500, Richard Graham wrote:
> Yes, should be a bit more clear. Need an independent way to verify that
> data is matched in the correct order; sending this information as payload
> is one way to do this. So, sending unique data in every message, and
On Fri, Dec 14, 2007 at 06:53:55AM -0500, Richard Graham wrote:
> If you have positive confirmation that such things have happened, this will
> go a long way.
I instrumented the code to log all kinds of info about fragment reordering while
I chased a bug in openib that caused matching logic to
If you have positive confirmation that such things have happened, this will
go a long way. I will not trust the code until this has also been done with
multiple independent network paths. I very rarely express such strong
opinions, even if I don't agree with what is being done, but this is the
Yes, should be a bit more clear. Need an independent way to verify that
data is matched in the correct order; sending this information as payload
is one way to do this. So, sending unique data in every message, and making
sure that it arrives in the user buffers in the expected order is a
The situation that needs to be triggered, just as George has mentioned, is
where we have a lot of unexpected messages, to make sure that when one that
we can match against comes in, all the unexpected messages that can be
matched with pre-posted receives are matched. Since we attempt to match only
Rich was referring to the fact that the reordering of fragments other
than the matching ones is irrelevant to Gleb's change. In order to
exercise the changed code we need to force a lot of small unexpected
messages over multiple networks. The testing environment should have
multiple similar
On Wed, Dec 12, 2007 at 03:10:10PM -0600, Brian W. Barrett wrote:
> On Wed, 12 Dec 2007, Gleb Natapov wrote:
>
> > On Wed, Dec 12, 2007 at 03:46:10PM -0500, Richard Graham wrote:
> >> This is better than nothing, but really not very helpful for looking at the
> >> specific issues that can arise
[mailto:gl...@voltaire.com]
Sent: Wednesday, December 12, 2007 03:54 PM Eastern Standard Time
To: Open MPI Developers
Subject: Re: [OMPI devel] matching code rewrite in OB1
On Wed, Dec 12, 2007 at 03:52:17PM -0500, Jeff Squyres wrote:
> On Dec 12, 2007, at 3:20 PM, Gleb Natapov wr
Was Rich referring to ensuring that the test codes checked that their
payloads were correct (and not re-assembled in the wrong order)?
On Dec 12, 2007, at 4:10 PM, Brian W. Barrett wrote:
On Wed, 12 Dec 2007, Gleb Natapov wrote:
On Wed, Dec 12, 2007 at 03:46:10PM -0500, Richard Graham wrote:
This is better than nothing, but really not very helpful for looking at the
specific issues that can arise with this, unless these systems have several
parallel networks, with tests that
: [OMPI devel] matching code rewrite in OB1
On Wed, Dec 12, 2007 at 03:52:17PM -0500, Jeff Squyres wrote:
> On Dec 12, 2007, at 3:20 PM, Gleb Natapov wrote:
>
> >> How about making a tarball with this patch in it that can be thrown at
> >> everyone's MTT? (we can put the tarball on www.open-mpi.org somewhere)
> > I don't have
On Dec 12, 2007, at 3:20 PM, Gleb Natapov wrote:
How about making a tarball with this patch in it that can be thrown at
everyone's MTT? (we can put the tarball on www.open-mpi.org somewhere)
I don't have access to www.open-mpi.org, but I can send you the patch.
I can send you a tarball too,
On Wed, Dec 12, 2007 at 11:57:11AM -0500, Jeff Squyres wrote:
> Gleb --
>
> How about making a tarball with this patch in it that can be thrown at
> everyone's MTT? (we can put the tarball on www.open-mpi.org somewhere)
I don't have access to www.open-mpi.org, but I can send you the patch.
I
Gleb --
How about making a tarball with this patch in it that can be thrown at
everyone's MTT? (we can put the tarball on www.open-mpi.org somewhere)
On Dec 11, 2007, at 4:14 PM, Richard Graham wrote:
I will re-iterate my concern. The code that is there now is mostly nine
years old (with some mods made when it was brought over to Open MPI). It
took about 2 months of testing on systems with 5-13 way network parallelism
to track down all KNOWN race conditions. This code is at the center of MPI
On Tue, Dec 11, 2007 at 08:03:52AM -0800, Andrew Friedley wrote:
> Try UD, frags are reordered at a very high rate so should be a good test.
mpi-ping works fine with UD BTL and the patch.
>
> Andrew
>
> Richard Graham wrote:
> > Gleb,
> > I would suggest that before this is checked in this be
Possibly, though I have results from a benchmark I've written indicating
the reordering happens at the sender. I believe I found it was due to
the QP striping trick I use to get more bandwidth -- if you back down to
one QP (there's a define in the code you can change), the reordering rate
Try UD, frags are reordered at a very high rate so should be a good test.
Andrew
Richard Graham wrote:
Gleb,
I would suggest that before this is checked in this be tested on a system
that has N-way network parallelism, where N is as large as you can find.
This is a key bit of code for MPI correctness, and out-of-order operations
will break it, so you want to maximize the chance for such
On Tue, 11 Dec 2007, Gleb Natapov wrote:
Hi,
I did a rewrite of matching code in OB1. I made it much simpler and half
the size (which is good: less code, fewer bugs). I also got rid of huge
macros, which is very helpful if you need to debug something. There is no
performance degradation; actually I even see very small performance