Re: [OMPI devel] Pull requests on the trunk

2014-11-06 Thread Ralph Castain
Yeah - to be clear, I had no problem with anything you did, Gilles. I was only noting that several of them had positive comments, but they weren’t being merged. Hate to see the good work lost or forgotten :-) > On Nov 6, 2014, at 5:29 PM, Jeff Squyres (jsquyres) wrote:

[hwloc-devel] Create success (hwloc git dev-266-g88e6e89)

2014-11-06 Thread MPI Team
Creating nightly hwloc snapshot git tarball was a success. Snapshot: hwloc dev-266-g88e6e89 Start time: Thu Nov 6 21:01:01 EST 2014 End time: Thu Nov 6 21:02:47 EST 2014 Your friendly daemon, Cyrador

Re: [OMPI devel] Pull requests on the trunk

2014-11-06 Thread Jeff Squyres (jsquyres)
Actually, I like the PRs; I like the nice GitHub tools for commenting and discussing. I'm sorry I haven't followed up on the two you filed for me yet. :-( On Nov 6, 2014, at 8:23 PM, Gilles Gouaillardet wrote: > My bad (mostly) > > I made quite a lot of PRs

Re: [OMPI devel] Pull requests on the trunk

2014-11-06 Thread Gilles Gouaillardet
My bad (mostly). I made quite a lot of PRs to get some review before committing to the master, and did not follow up in a timely manner. I closed two obsolete PRs today. #245 should be ready for prime time. #227 too, unless George has an objection. I asked Jeff to review #232 and #228 because

Re: [OMPI devel] Pull requests on the trunk

2014-11-06 Thread Jeff Squyres (jsquyres)
On Nov 6, 2014, at 6:21 PM, Ralph Castain wrote: > I agree - I sent the note because I see people doing things a bit differently > than expected. I have no issue with PRs for things where people want extra > eyes on something before committing, or as part of an RFC. Just

Re: [OMPI devel] Pull requests on the trunk

2014-11-06 Thread Ralph Castain
I agree - I sent the note because I see people doing things a bit differently than expected. I have no issue with PRs for things where people want extra eyes on something before committing, or as part of an RFC. Just want to ensure folks aren’t letting them languish expecting some kind of

Re: [OMPI devel] Pull requests on the trunk

2014-11-06 Thread Howard Pritchard
Hi Ralph, We should discuss this on Tuesday. I thought we'd decided for master to use a model where developers would push directly to ompi/master. I'd be willing to pull the requests from Gilles marked as bugs tomorrow. Howard 2014-11-06 13:16 GMT-07:00 Ralph Castain: >

Re: [OMPI devel] RFC: revamp btl rdma interface

2014-11-06 Thread Nathan Hjelm
Looks like put and get functions should be added if possible. The MTL layer looks like it is designed for two-sided only with no intention of supporting one-sided. -Nathan On Thu, Nov 06, 2014 at 03:21:32PM -0700, Nathan Hjelm wrote: > > Great! We should probably try to figure out how the mtl
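For illustration only, a one-sided hook grafted onto the MTL in the spirit of the revamped BTL interface might look like the sketch below. Every name here is hypothetical - nothing like this exists in the OMPI tree today; it only shows the kind of entry point the thread is discussing.

    /* Hypothetical function-pointer type for an MTL-level fetch-and-add.
     * Purely a sketch -- none of these names are real OMPI symbols. */
    typedef int (*mca_mtl_base_module_atomic_fadd_fn_t)(
        struct mca_mtl_base_module_t *mtl,  /* MTL module issuing the op */
        struct ompi_proc_t *target,         /* remote process */
        uint64_t remote_address,            /* address in the target window */
        uint64_t operand,                   /* value to add */
        uint64_t *result);                  /* returns the fetched old value */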

Re: [OMPI devel] RFC: revamp btl rdma interface

2014-11-06 Thread Nathan Hjelm
Great! We should probably try to figure out how the mtl layer can be modified to expose those atomics. If possible this should be done before the 1.9 branch to ensure the feature is available in the next release series. -Nathan On Thu, Nov 06, 2014 at 05:15:30PM -0500, Joshua Ladd wrote: >

Re: [OMPI devel] RFC: revamp btl rdma interface

2014-11-06 Thread Joshua Ladd
MXM supports atomics. On Thursday, November 6, 2014, Nathan Hjelm wrote: > I haven't look at that yet. Would be great to get the new osc component > working over both btls and mtls. I know portals supports atomics but I > don't know whether psm does. > > -Nathan > > On Thu,

Re: [OMPI devel] MTT diligence

2014-11-06 Thread Ralph Castain
> On Nov 6, 2014, at 1:51 PM, Jeff Squyres (jsquyres) wrote: > > On Nov 6, 2014, at 4:06 PM, Joshua Ladd wrote: > >> Once again, many thanks to Alina for discovering and reporting this. Keep up >> the MTT vigilance! > > (this is worthy of its

Re: [OMPI devel] osu_mbw_mr error

2014-11-06 Thread Ralph Castain
> On Nov 6, 2014, at 1:39 PM, Nathan Hjelm wrote: > > On Thu, Nov 06, 2014 at 04:29:44PM -0500, Joshua Ladd wrote: >> On Thursday, November 6, 2014, Nathan Hjelm wrote: >> >> On Thu, Nov 06, 2014 at 04:06:23PM -0500, Joshua Ladd wrote: >>> Nathan, >>>

Re: [OMPI devel] thread-tests hang

2014-11-06 Thread Ralph Castain
FWIW: I’m not planning on releasing tomorrow as we aren’t ready. We aren’t going to release with a bug as bad as threading on by default, since we know we can’t really support that situation. Nothing sacred about the release date - it’s just a target. Frankly, I would even listen to the argument of

[OMPI devel] MTT diligence

2014-11-06 Thread Jeff Squyres (jsquyres)
On Nov 6, 2014, at 4:06 PM, Joshua Ladd wrote: > Once again, many thanks to Alina for discovering and reporting this. Keep up > the MTT vigilance! (this is worthy of its own thread) +100 MTT vigilance is a tough job; many thanks for submitting good bug reports on what

Re: [OMPI devel] osu_mbw_mr error

2014-11-06 Thread Nathan Hjelm
On Thu, Nov 06, 2014 at 04:29:44PM -0500, Joshua Ladd wrote: > On Thursday, November 6, 2014, Nathan Hjelm wrote: > > On Thu, Nov 06, 2014 at 04:06:23PM -0500, Joshua Ladd wrote: > > Nathan, > > Has this bug always been present in OpenIB or is this a

Re: [OMPI devel] osu_mbw_mr error

2014-11-06 Thread Joshua Ladd
On Thursday, November 6, 2014, Nathan Hjelm wrote: > On Thu, Nov 06, 2014 at 04:06:23PM -0500, Joshua Ladd wrote: > > Nathan, > > Has this bug always been present in OpenIB or is this a recent > addition? > > If this is a regression, I would also be inclined to say that

Re: [OMPI devel] osu_mbw_mr error

2014-11-06 Thread Nathan Hjelm
On Thu, Nov 06, 2014 at 04:06:23PM -0500, Joshua Ladd wrote: > Nathan, > Has this bug always been present in OpenIB or is this a recent addition? > If this is a regression, I would also be inclined to say that this is a [...] The bug is as old as the message coalescing feature in the openib btl.
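Given that diagnosis, an obvious experiment (assuming the stock openib MCA parameter name, btl_openib_use_message_coalescing) is to re-run the failing benchmark with coalescing disabled and see whether the error disappears:

    # Disable openib message coalescing for one run of the failing benchmark
    mpirun --mca btl_openib_use_message_coalescing 0 -np 2 ./osu_mbw_mr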

[OMPI devel] osu_mbw_mr error

2014-11-06 Thread Joshua Ladd
Nathan, Has this bug always been present in OpenIB or is this a recent addition? If this is a regression, I would also be inclined to say that this is a blocker for 1.8.4. This is a SIGNIFICANT bug. Both Howard and I were quite surprised that all the while this code has been in use at LANL in

Re: [OMPI devel] thread-tests hang

2014-11-06 Thread Jeff Squyres (jsquyres)
On Nov 6, 2014, at 3:39 PM, Joshua Ladd wrote: > Thank you for taking the time to investigate this, Jeff. SC is a hectic and > stressful time for everyone on this list with many deadlines looming. This > bug isn't a priority for us; however, it seems to me that your

Re: [OMPI devel] thread-tests hang

2014-11-06 Thread Joshua Ladd
Thank you for taking the time to investigate this, Jeff. SC is a hectic and stressful time for everyone on this list with many deadlines looming. This bug isn't a priority for us; however, it seems to me that your original revert, the one that simply wants to disable threading by default (and for

Re: [OMPI devel] Pull requests on the trunk

2014-11-06 Thread Jeff Squyres (jsquyres)
I have 2, namely #228 (Fix --with-fortran=... logic) and #232 (RFC/weak symbols status ignore). I will look at them eventually, there just haven't been enough hours in the day yet, especially with SC coming up. :-( On Nov 6, 2014, at 3:16 PM, Ralph Castain wrote: >

[OMPI devel] Fwd: [ompi] move down jobid, vpid from ORTE to OPAL layer (#249)

2014-11-06 Thread Ralph Castain
I suppose it’s too much to ask, but can we turn this thing “off” until you get it fixed? Maybe you could test it by posting to yourself in the meantime? > Begin forwarded message: > > Date: November 6, 2014 at 12:17:48 PM PST > From: mellanox-github > Reply-To:

[OMPI devel] Pull requests on the trunk

2014-11-06 Thread Ralph Castain
Hey folks We seem to be creating a bunch of pull requests on the trunk (well, by “we” I mean mostly Gilles) that are then being left hanging there, going stale. Some of these are going to start conflicting with changes being made by others, or even conflict with each other. Can we do a

Re: [OMPI devel] RFC: revamp btl rdma interface

2014-11-06 Thread Nathan Hjelm
I'm not handling the multi-rail case at this point; atomics and rdma operations are only issued over a single btl module (which should be a single HCA). -Nathan On Thu, Nov 06, 2014 at 12:15:13PM -0700, Howard Pritchard wrote: > Hi Nathan, > How would you get things right with atomics and

Re: [OMPI devel] thread-tests hang

2014-11-06 Thread Jeff Squyres (jsquyres)
This thread digressed significantly from the original bug report; I did not realize that the discussion was revolving around the fact that MPI_THREAD_MULTIPLE no longer works *at all*. So here's where we are: 1. MPI_THREAD_MULTIPLE doesn't work, even if you configure with --enable-mpi-thread-multiple. 2. It
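A minimal probe (my sketch, not code from this thread) makes point 1 easy to check on any build: request MPI_THREAD_MULTIPLE and print what the library actually grants. Compile with mpicc and run under mpirun.

    /* check_threads.c - report the thread level the MPI library provides */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        printf("requested level %d (MPI_THREAD_MULTIPLE), got %d\n",
               MPI_THREAD_MULTIPLE, provided);
        MPI_Finalize();
        return 0;
    }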

Re: [OMPI devel] RFC: revamp btl rdma interface

2014-11-06 Thread Howard Pritchard
Hi Nathan, How would you get things right with atomics and multirail? Getting the memory consistency right would be really difficult. You'd have to keep issuing zero-length rdma reads and hope that that would have the effect of a pci-e flush in the case of multiple updates to a given target

Re: [OMPI devel] RFC: revamp btl rdma interface

2014-11-06 Thread Nathan Hjelm
I haven't looked at that yet. It would be great to get the new osc component working over both btls and mtls. I know Portals supports atomics but I don't know whether PSM does. -Nathan On Thu, Nov 06, 2014 at 08:45:15PM +0200, Mike Dubman wrote: > btw, do you plan to add an atomics API to the MTL layer

Re: [OMPI devel] RFC: revamp btl rdma interface

2014-11-06 Thread Mike Dubman
btw, do you plan to add an atomics API to the MTL layer as well? On Thu, Nov 6, 2014 at 5:23 PM, Nathan Hjelm wrote: > At the moment I select the lowest latency BTL that can reach all of the > ranks in the communicator used to create the window. I can add code to > round-robin

Re: [OMPI devel] Prepping for 1.8.4 release

2014-11-06 Thread Ralph Castain
Yeah, my bad - somehow, it showed up on the github pull request list for ompi-release. I’ll remove it. > On Nov 6, 2014, at 9:19 AM, Joshua Ladd wrote: > > We filed an RFC for the trunk at Jeff's request. This is a new feature. > > > Josh > > On Thu, Nov 6, 2014 at

Re: [OMPI devel] Prepping for 1.8.4 release

2014-11-06 Thread Joshua Ladd
We filed an RFC for the trunk at Jeff's request. This is a new feature. Josh On Thu, Nov 6, 2014 at 12:13 PM, Joshua Ladd wrote: > Yalla is only in trunk. Unless you want us to push it to 1.8.4 - we won't > object :) > > Josh > > On Thu, Nov 6, 2014 at 11:46 AM, Ralph

Re: [OMPI devel] Prepping for 1.8.4 release

2014-11-06 Thread Joshua Ladd
Yalla is only in trunk. Unless you want us to push it to 1.8.4 - we won't object :) Josh On Thu, Nov 6, 2014 at 11:46 AM, Ralph Castain wrote: > Hey folks > > Here is the NEWS I have for 1.8.4 so far - please respond with any > additions/mods you would like to suggest >

[OMPI devel] Prepping for 1.8.4 release

2014-11-06 Thread Ralph Castain
Hey folks Here is the NEWS I have for 1.8.4 so far - please respond with any additions/mods you would like to suggest

+1.8.4
+-----
+- Removed inadvertent change that set --enable-mpi-thread-multiple "on"
+  by default, thus impacting performance for non-threaded apps
+- Significantly reduced

Re: [OMPI devel] mpirun does not honor rankfile

2014-11-06 Thread Ralph Castain
IIRC, you prefix the core number with a P to indicate physical. I’ll see what I can do about getting the physical notation re-implemented - just can’t promise when that will happen. > On Nov 6, 2014, at 8:30 AM, Tom Wurgler wrote: > > Well, unless we can get LSF to use
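Going by that description, a rankfile pinning two ranks to physical cores 0 and 4 might look like the sketch below; the hostname is a placeholder and the exact placement of the P prefix should be checked against the mpirun man page for the release in use.

    rank 0=node01 slot=p0
    rank 1=node01 slot=p4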

Re: [OMPI devel] mpirun does not honor rankfile

2014-11-06 Thread Tom Wurgler
Well, unless we can get LSF to use physical numbering, we are dead in the water without a translator of some sort. We are trying to figure out how we can automate the translation in the meantime, but we have a mix of clusters and the mapping is different between them. We use openmpi 1.6.4 daily

Re: [OMPI devel] simple_spawn test fails using different set of btls.

2014-11-06 Thread Dave Goodell (dgoodell)
On Nov 6, 2014, at 12:44 AM, George Bosilca wrote: > PS: Sorry Dave I also pushed a master branch merge ... It's not the end of the world, just try to keep an eye on it and avoid doing it in the future. If you need any help avoiding it, feel free to ping me or the devel@

Re: [OMPI devel] mpirun does not honor rankfile

2014-11-06 Thread Ralph Castain
Ugh… we used to have a switch for that purpose, but it became hard to manage the code. I could reimplement it at some point, but it won’t be in the immediate future. I gather the issue is that the system tools report physical numbering, and so you have to mentally translate to create the

Re: [OMPI devel] RFC: revamp btl rdma interface

2014-11-06 Thread Nathan Hjelm
At the moment I select the lowest latency BTL that can reach all of the ranks in the communicator used to create the window. I can add code to round-robin windows over the available BTLs on multi-rail systems. -Nathan On Wed, Nov 05, 2014 at 06:38:25PM -0800, Paul Hargrove wrote: >All
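As a sketch of that selection rule (the helper itself is hypothetical; btl_latency is the latency field BTL modules advertise, and the caller is assumed to have already filtered out modules that cannot reach every rank in the communicator):

    /* Pick the module with the lowest advertised latency among the BTLs
     * that can reach every rank in the window's communicator. */
    static mca_btl_base_module_t *
    select_osc_btl(mca_btl_base_module_t **btls, size_t n_btls)
    {
        mca_btl_base_module_t *best = NULL;
        for (size_t i = 0; i < n_btls; ++i) {
            if (NULL == best || btls[i]->btl_latency < best->btl_latency) {
                best = btls[i];
            }
        }
        return best;
    }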

Re: [OMPI devel] mpirun does not honor rankfile

2014-11-06 Thread Tom Wurgler
So we used lstopo with an arg of "--logical" and the output showed the core numbering 0,1,2,3...47 instead of 0,4,8,12 etc. The multiplying by 4 you speak of falls apart when you get to the second socket, as its physical numbers are 1,5,9,13... and its logical is 12,13,14,15. So the
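The same logical-vs-physical mapping can be dumped programmatically with hwloc (a small standalone sketch; compile with -lhwloc):

    /* print_cores.c - show hwloc logical vs. physical (OS) core indexes */
    #include <stdio.h>
    #include <hwloc.h>

    int main(void)
    {
        hwloc_topology_t topo;
        hwloc_obj_t core = NULL;

        hwloc_topology_init(&topo);
        hwloc_topology_load(topo);

        /* Walk every core and print both numberings side by side */
        while ((core = hwloc_get_next_obj_by_type(topo, HWLOC_OBJ_CORE,
                                                  core)) != NULL) {
            printf("logical %u -> physical (OS) %u\n",
                   core->logical_index, core->os_index);
        }

        hwloc_topology_destroy(topo);
        return 0;
    }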

Re: [OMPI devel] simple_spawn test fails using different set of btls.

2014-11-06 Thread Elena Elkina
Thanks! It fixes the problem with tcp. Best regards, Elena On Thu, Nov 6, 2014 at 10:44 AM, George Bosilca wrote: > I pushed a slightly better patch for the TCP BTL > (54ddb0aece0892dcdb1a1293a3bd3902b5f3acdc). The correct scheme would be to > OBJ_RETAIN the proc once it

Re: [OMPI devel] simple_spawn test fails using different set of btls.

2014-11-06 Thread George Bosilca
I pushed a slightly better patch for the TCP BTL (54ddb0aece0892dcdb1a1293a3bd3902b5f3acdc). The correct scheme would be to OBJ_RETAIN the proc once it is attached to the btl_proc and release it upon destruction of the btl_proc. However, for some obscure reason this doesn't quite work, as the
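The retain/release scheme George describes would look roughly like this; the helper and field names below are abridged and illustrative, not the exact tree layout:

    /* Take a reference when the proc is attached to the btl_proc ... */
    static void btl_proc_attach(mca_btl_tcp_proc_t *btl_proc,
                                opal_proc_t *proc)
    {
        OBJ_RETAIN(proc);          /* btl_proc now co-owns the proc */
        btl_proc->proc_opal = proc;
    }

    /* ... and drop it when the btl_proc itself is destructed. */
    static void btl_proc_destruct(mca_btl_tcp_proc_t *btl_proc)
    {
        if (NULL != btl_proc->proc_opal) {
            OBJ_RELEASE(btl_proc->proc_opal);
            btl_proc->proc_opal = NULL;
        }
    }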

Re: [OMPI devel] simple_spawn test fails using different set of btls.

2014-11-06 Thread Gilles Gouaillardet
Ralph, I updated the MODEX flag to PMIX_GLOBAL: https://github.com/open-mpi/ompi/commit/d542c9ff2dc57ca5d260d0578fd5c1c556c598c7 Elena, I was able to reproduce the issue (salloc -N 5 mpirun -np 2 is enough). I was "lucky" to reproduce the issue: it happened because one of the nodes was misconfigured
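Spelled out, the reproducer is simply (test binary per the thread subject; the path is illustrative):

    salloc -N 5 mpirun -np 2 ./simple_spawn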