Re: [OMPI devel] [1.8.4rc1] REGRESSION on Solaris-11/x86 with two subnets

2014-11-04 Thread Gilles Gouaillardet
Ralph, FYI, here is attached the patch i am working on (still testing ...) aa207ad2f3de5b649e5439d06dca90d86f5a82c2 should be reverted then. Cheers, Gilles On 2014/11/04 13:56, Paul Hargrove wrote: > Ralph, > > You will see from the message I sent a moment ago that -D_REENTRANT on > Solaris

Re: [OMPI devel] [1.8.4rc1] REGRESSION on Solaris-11/x86 with two subnets

2014-11-04 Thread Ralph Castain
Curious - why put it under condition of pthread config? I just added it to the “if solaris” section - i.e., add the flag if we are under solaris, regardless of someone asking for thread support. Since we require that libevent be thread-enabled, it seemed safer to always ensure those flags are

Re: [OMPI devel] [1.8.4rc1] REGRESSION on Solaris-11/x86 with two subnets

2014-11-04 Thread Gilles Gouaillardet
That works too since pthread is mandatory now (i previously made a RFC and removing the --with-threads configure option is in my todo list) On 2014/11/04 14:10, Ralph Castain wrote: > Curious - why put it under condition of pthread config? I just added it to > the "if solaris" section - i.e.,

Re: [OMPI devel] [1.8.4rc1] REGRESSION on Solaris-11/x86 with two subnets

2014-11-04 Thread Ralph Castain
Ah, okay - thanks for clarifying that! > On Nov 3, 2014, at 9:12 PM, Gilles Gouaillardet > wrote: > > That works too since pthread is mandatory now > (i previously made a RFC and removing the --with-threads configure option is > in my todo list) > > On

Re: [OMPI devel] osu_mbw_mr error

2014-11-04 Thread Joshua Ladd
Thanks, Nathan. After a bit more investigation yesterday, this was our conclusion too: it is a longstanding bug in the OpenIB BTL, and we just happened to start triggering the broken flow with some recent changes made to the default max_lmc parameter. Let us know if you need anything from our end.

Re: [OMPI devel] OMPI 1.8.4rc1 issues

2014-11-04 Thread Gilles Gouaillardet
Ralph, On 2014/11/04 1:54, Ralph Castain wrote: > Hi folks > > Looking at the over-the-weekend MTT reports plus at least one comment on the > list, we have the following issues to address: > > * many-to-one continues to fail. Shall I just assume this is an unfixable > problem or a bad test and

[OMPI devel] Request for a Open MPI SotU BoF slot for VampirTrace

2014-11-04 Thread Bert Wesarg
All, the TU Dresden would like to talk a little bit about the current state of VampirTrace in Open MPI, its successor Score-P [1] and the future of the collaboration at the SC'14 BoF. I think a 5min talk to present the basic idea for Score-P project would be great to have, following an open

Re: [OMPI devel] OMPI 1.8.4rc1 issues

2014-11-04 Thread Ralph Castain
> On Nov 4, 2014, at 12:44 AM, Gilles Gouaillardet > wrote: > > Ralph, > > On 2014/11/04 1:54, Ralph Castain wrote: >> Hi folks >> >> Looking at the over-the-weekend MTT reports plus at least one comment on the >> list, we have the following issues to address:

[OMPI devel] thread-tests hang

2014-11-04 Thread Alina Sklarevich
Hi, We observe a hang when running the multi-threading support test "latency.c" (attached to this report), which uses MPI_THREAD_MULTIPLE. The hang happens immediately at the beginning of the test and is reproduced in the v1.8 release branch. The command line to reproduce the behavior is: $

[OMPI devel] OpenMPI Developers Face to Face Q1 2015 poll

2014-11-04 Thread Howard Pritchard
Hi OMPI folks, We're planning to hold another developers face to face in Q1 2015. Currently, we're thinking of holding the face to face either the last week of January, or one of the first two weeks of February. The format will be similar to the previous f2f in Chicago - start on Monday afternoon

Re: [OMPI devel] thread-tests hang

2014-11-04 Thread Ralph Castain
That would be correct - we restored some configure flags that are required to make multi-thread programs work. Jeff can probably provide more info. > On Nov 4, 2014, at 9:15 AM, Alina Sklarevich > wrote: > > Hi, > > We observe a hang when running the

[OMPI devel] Open MPI Developers Face to Face Q1 2015 (updated doodle poll link)

2014-11-04 Thread Howard Pritchard
Hi Folks, Per request to have a yes/yesifneedbe/no poll, and limitation of doodle to change options, a new doodle poll for deciding on the date for the next developers f2f is at: https://doodle.com/zzaupgxge9y6medu There is also a wiki page for the meeting:

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Steve Wise
On 11/4/2014 2:09 PM, Steve Wise wrote: Hi, I'm running ompi top-o-tree from github and seeing an openib btl issue where the qp/srq configuration is incorrect for the given device id. This works fine in 1.8.4rc1, but I see the problem in top-of-tree. A simple 2 node IMB-MPI1 pingpong fails

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Nathan Hjelm
I have run into the issue as well. I will open a pull request for 1.8.4 as part of a patch fixing the coalescing issues. -Nathan On Tue, Nov 04, 2014 at 02:50:30PM -0600, Steve Wise wrote: > On 11/4/2014 2:09 PM, Steve Wise wrote: > >Hi, > > > >I'm running ompi top-o-tree from github and seeing

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Steve Wise
I found the bug. Here is the fix: [root@stevo1 openib]# git diff diff --git a/opal/mca/btl/openib/btl_openib_component.c b/opal/mca/btl/openib/btl_openib_component.c index d876e21..8a5ea82 100644 --- a/opal/mca/btl/openib/btl_openib_component.c +++ b/opal/mca/btl/openib/btl_openib_component.c

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Steve Wise
I'll issue a pull request for this and the other change I'm making. On 11/4/2014 3:27 PM, Steve Wise wrote: I found the bug. Here is the fix: [root@stevo1 openib]# git diff diff --git a/opal/mca/btl/openib/btl_openib_component.c b/opal/mca/btl/openib/btl_openib_component.c index

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Nathan Hjelm
That certainly addresses part of the problem. I am working on a complete revamp of the btl RDMA interface. It contains this fix: https://github.com/hjelmn/ompi/commit/66fa429e306beb9fca59da0a4554e9b98d788316 -Nathan On Tue, Nov 04, 2014 at 03:27:23PM -0600, Steve Wise wrote: > I found the bug.

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Steve Wise
Ok, sounds like I should let you continue the good work! :) When do you plan to merge this into ompi proper? On 11/4/2014 3:58 PM, Nathan Hjelm wrote: That certainly addresses part of the problem. I am working on a complete revamp of the btl RDMA interface. It contains this fix:

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Jeff Squyres (jsquyres)
It sounds like this fix should be merged in soon. Nathan: are your other changes bug fixes, or part of your BTL revamp branch? On Nov 4, 2014, at 5:06 PM, Steve Wise wrote: > Ok, sounds like I should let you continue the good work! :) When do you plan > to merge

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Nathan Hjelm
Going to put the RFC out today with a timeout of about 2 weeks. This will give me some time to talk with other Open MPI developers face-to-face at SC14. If the RFC fails I will still bring that and a couple of other fixes into the master. -Nathan On Tue, Nov 04, 2014 at 04:06:45PM -0600, Steve

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Jeff Squyres (jsquyres)
That sounds fine, but I think Steve's point is that he is being bitten by this bug now, so it would probably be good to even include this one particular fix in 1.8.4. On Nov 4, 2014, at 5:24 PM, Nathan Hjelm wrote: > Going to put the RFC out today with a timeout of about 2

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Nathan Hjelm
There is one other bug fix to address the message coalescing bug. The rest is the BTL RDMA revamp. If there is a need I can probably pull those out and apply them to master sooner than SC. -Nathan On Tue, Nov 04, 2014 at 10:11:26PM +, Jeff Squyres (jsquyres) wrote: > It sounds like this

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Nathan Hjelm
Looks like there is no issue in 1.8.4 except for the message coalescing bug. Ralph, Howard, and I agree that disabling message coalescing for 1.8.4 is the safest way forward. We can back-port the real fix for an eventual 1.8.5. Message rates no longer seem to care about message coalescing in the

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Steve Wise
Correct: I don't see the bug in the 1.8.4rc1 release. On 11/4/2014 4:33 PM, Nathan Hjelm wrote: Looks like there is no issue in 1.8.4 except for the message coalescing bug. Ralph, Howard, and I agree that disabling message coalescing for 1.8.4 is the safest way forward. We can back-port the

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Jeff Squyres (jsquyres)
Ah, gotcha. On Nov 4, 2014, at 5:41 PM, Steve Wise wrote: > Correct: I don't see the bug in the 1.8.4rc1 release. > > > On 11/4/2014 4:33 PM, Nathan Hjelm wrote: >> Looks like there is no issue in 1.8.4 except for the message coalescing >> bug. Ralph, Howard, and

Re: [hwloc-devel] upcoming feature removal

2014-11-04 Thread Jeff Squyres (jsquyres)
On Nov 3, 2014, at 5:49 AM, Brice Goglin wrote: > * don't put I/O objects in "normal" children since it confuses programs > consulting the children list. rather place them under a dedicated child > pointer special objects such as Misc may go there as well. If you're going

Re: [hwloc-devel] upcoming feature removal

2014-11-04 Thread Brice Goglin
Le 04/11/2014 14:59, Jeff Squyres (jsquyres) a écrit : > If you're going to separate them from the normal "children", then how about > naming them for what they are? E.g.: > > - pe_children > - io_children > - misc_children OK but what does "pe" stand for? did you mean "pu" to match our PU

Re: [hwloc-devel] upcoming feature removal

2014-11-04 Thread Jeff Squyres (jsquyres)
On Nov 4, 2014, at 9:10 AM, Brice Goglin wrote: >> - pu_children >> - io_children >> - misc_children > > OK but what does "pe" stand for? did you mean "pu" to match our PU objects? I said "pu_children". Really. I didn't edit the text above and make it look like I
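
The hwloc thread above discusses keeping I/O and Misc objects out of the "normal" children list and naming each list for what it holds (pu_children, io_children, misc_children). A minimal C sketch of that idea follows. It is an illustration only: `sketch_obj`, `sketch_kind`, `sketch_add_child` and `sketch_count` are hypothetical names invented here, and none of this is the actual hwloc structure or API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object kinds; NOT the real hwloc type enum. */
enum sketch_kind { SKETCH_PU, SKETCH_IO, SKETCH_MISC };

/* Hypothetical topology node; NOT the real hwloc object struct. */
struct sketch_obj {
    enum sketch_kind kind;
    struct sketch_obj *next_sibling;  /* next entry in the same child list */
    struct sketch_obj *pu_children;   /* "normal" children */
    struct sketch_obj *io_children;   /* I/O objects, kept out of the normal list */
    struct sketch_obj *misc_children; /* Misc objects, likewise kept separate */
};

/* Pick the list head that matches the child's kind. */
static struct sketch_obj **sketch_list_for(struct sketch_obj *parent,
                                           enum sketch_kind kind)
{
    switch (kind) {
    case SKETCH_IO:   return &parent->io_children;
    case SKETCH_MISC: return &parent->misc_children;
    default:          return &parent->pu_children;
    }
}

/* Append child to the end of the list selected by its kind. */
static void sketch_add_child(struct sketch_obj *parent, struct sketch_obj *child)
{
    assert(parent != NULL && child != NULL);
    struct sketch_obj **slot = sketch_list_for(parent, child->kind);
    while (*slot != NULL)
        slot = &(*slot)->next_sibling;
    child->next_sibling = NULL;
    *slot = child;
}

/* Count the entries of one child list. */
static size_t sketch_count(const struct sketch_obj *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->next_sibling)
        n++;
    return n;
}
```

With this layout, a program that walks only `pu_children` never encounters an I/O or Misc object, which is the confusion the proposal in the thread aims to avoid.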