Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Jeff Squyres (jsquyres)
Ah, gotcha. On Nov 4, 2014, at 5:41 PM, Steve Wise wrote: > Correct: I don't see the bug in the 1.8.4rc1 release. > > > On 11/4/2014 4:33 PM, Nathan Hjelm wrote: >> Looks like there is no issue in 1.8.4 except for the message coalescing >> bug. Ralph, Howard, and

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Steve Wise
Correct: I don't see the bug in the 1.8.4rc1 release. On 11/4/2014 4:33 PM, Nathan Hjelm wrote: Looks like there is no issue in 1.8.4 except for the message coalescing bug. Ralph, Howard, and I agree that disabling message coalescing for 1.8.4 is the safest way forward. We can back-port the

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Nathan Hjelm
Looks like there is no issue in 1.8.4 except for the message coalescing bug. Ralph, Howard, and I agree that disabling message coalescing for 1.8.4 is the safest way forward. We can back-port the real fix for an eventual 1.8.5. Message rates no longer seem to care about message coalescing in the

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Nathan Hjelm
There is one other bug fix to address the message coalescing bug. The rest is the BTL RDMA revamp. If there is a need I can probably pull those out and apply them to master sooner than SC. -Nathan On Tue, Nov 04, 2014 at 10:11:26PM +, Jeff Squyres (jsquyres) wrote: > It sounds like this

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Jeff Squyres (jsquyres)
That sounds fine, but I think Steve's point is that he is being bitten by this bug now, so it would probably be good to even include this one particular fix in 1.8.4. On Nov 4, 2014, at 5:24 PM, Nathan Hjelm wrote: > Going to put the RFC out today with a timeout of about 2

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Nathan Hjelm
Going to put the RFC out today with a timeout of about 2 weeks. This will give me some time to talk with other Open MPI developers face-to-face at SC14. If the RFC fails I will still bring that and a couple of other fixes into the master. -Nathan On Tue, Nov 04, 2014 at 04:06:45PM -0600, Steve

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Jeff Squyres (jsquyres)
It sounds like this fix should be merged in soon. Nathan: are your other changes bug fixes, or part of your BTL revamp branch? On Nov 4, 2014, at 5:06 PM, Steve Wise wrote: > Ok, sounds like I should let you continue the good work! :) When do you plan > to merge

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Steve Wise
Ok, sounds like I should let you continue the good work! :) When do you plan to merge this into ompi proper? On 11/4/2014 3:58 PM, Nathan Hjelm wrote: That certainly addresses part of the problem. I am working on a complete revamp of the btl RDMA interface. It contains this fix:

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Nathan Hjelm
That certainly addresses part of the problem. I am working on a complete revamp of the btl RDMA interface. It contains this fix: https://github.com/hjelmn/ompi/commit/66fa429e306beb9fca59da0a4554e9b98d788316 -Nathan On Tue, Nov 04, 2014 at 03:27:23PM -0600, Steve Wise wrote: > I found the bug.

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Steve Wise
I'll issue a pull request for this and the other change I'm making. On 11/4/2014 3:27 PM, Steve Wise wrote: I found the bug. Here is the fix: [root@stevo1 openib]# git diff diff --git a/opal/mca/btl/openib/btl_openib_component.c b/opal/mca/btl/openib/btl_openib_component.c index

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Steve Wise
I found the bug. Here is the fix: [root@stevo1 openib]# git diff diff --git a/opal/mca/btl/openib/btl_openib_component.c b/opal/mca/btl/openib/btl_openib_component.c index d876e21..8a5ea82 100644 --- a/opal/mca/btl/openib/btl_openib_component.c +++ b/opal/mca/btl/openib/btl_openib_component.c

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Nathan Hjelm
I have run into the issue as well. I will open a pull request for 1.8.4 as part of a patch fixing the coalescing issues. -Nathan On Tue, Nov 04, 2014 at 02:50:30PM -0600, Steve Wise wrote: > On 11/4/2014 2:09 PM, Steve Wise wrote: > >Hi, > > > >I'm running ompi top-o-tree from github and seeing

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Steve Wise
On 11/4/2014 2:09 PM, Steve Wise wrote: Hi, I'm running ompi top-o-tree from github and seeing an openib btl issue where the qp/srq configuration is incorrect for the given device id. This works fine in 1.8.4rc1, but I see the problem in top-of-tree. A simple 2 node IMB-MPI1 pingpong fails