Ah, gotcha.
On Nov 4, 2014, at 5:41 PM, Steve Wise wrote:
Correct: I don't see the bug in the 1.8.4rc1 release.
On 11/4/2014 4:33 PM, Nathan Hjelm wrote:
Looks like there is no issue in 1.8.4 except for the message coalescing
bug. Ralph, Howard, and I agree that disabling message coalescing for
1.8.4 is the safest way forward. We can back-port the real fix for an
eventual 1.8.5. Message rates no longer seem to care about message
coalescing in the
There is one other bug fix to address the message coalescing bug. The
rest is the BTL RDMA revamp.
If there is a need I can probably pull those out and apply them to
master sooner than SC.
-Nathan
On Tue, Nov 04, 2014 at 10:11:26PM +, Jeff Squyres (jsquyres) wrote:
> It sounds like this
That sounds fine, but I think Steve's point is that he is being bitten by this
bug now, so it would probably be good to even include this one particular fix
in 1.8.4.
On Nov 4, 2014, at 5:24 PM, Nathan Hjelm wrote:
Going to put the RFC out today with a timeout of about 2 weeks. This
will give me some time to talk with other Open MPI developers
face-to-face at SC14.
If the RFC fails I will still bring that and a couple of other fixes
into the master.
-Nathan
On Tue, Nov 04, 2014 at 04:06:45PM -0600, Steve
It sounds like this fix should be merged in soon.
Nathan: are your other changes bug fixes, or part of your BTL revamp branch?
On Nov 4, 2014, at 5:06 PM, Steve Wise wrote:
Ok, sounds like I should let you continue the good work! :) When do you
plan to merge this into ompi proper?
On 11/4/2014 3:58 PM, Nathan Hjelm wrote:
That certainly addresses part of the problem. I am working on a complete
revamp of the btl RDMA interface. It contains this fix:
https://github.com/hjelmn/ompi/commit/66fa429e306beb9fca59da0a4554e9b98d788316
-Nathan
On Tue, Nov 04, 2014 at 03:27:23PM -0600, Steve Wise wrote:
> I found the bug.
I'll issue a pull request for this and the other change I'm making.
On 11/4/2014 3:27 PM, Steve Wise wrote:
I found the bug. Here is the fix:
[root@stevo1 openib]# git diff
diff --git a/opal/mca/btl/openib/btl_openib_component.c
b/opal/mca/btl/openib/btl_openib_component.c
index d876e21..8a5ea82 100644
--- a/opal/mca/btl/openib/btl_openib_component.c
+++ b/opal/mca/btl/openib/btl_openib_component.c
I have run into the issue as well. I will open a pull request for 1.8.4
as part of a patch fixing the coalescing issues.
-Nathan
On Tue, Nov 04, 2014 at 02:50:30PM -0600, Steve Wise wrote:
> On 11/4/2014 2:09 PM, Steve Wise wrote:
> >Hi,
> >
> >I'm running ompi top-o-tree from github and seeing
On 11/4/2014 2:09 PM, Steve Wise wrote:
Hi,
I'm running ompi top-o-tree from github and seeing an openib btl issue
where the qp/srq configuration is incorrect for the given device id.
This works fine in 1.8.4rc1, but I see the problem in top-of-tree. A
simple 2 node IMB-MPI1 pingpong fails