Re: [OMPI devel] OPAL_PMIX_NODEID is not set by orted

2016-08-11 Thread r...@open-mpi.org
I’m working on providing the info, guys - just sitting in a branch right now. Too many meetings...sigh. > On Aug 11, 2016, at 10:09 AM, George Bosilca wrote: > > I just pushed a solution to this problem in 8d0baf140f. If we are unable to > extract the expected information from the RTE, we sim

Re: [OMPI devel] OPAL_PMIX_NODEID is not set by orted

2016-08-12 Thread r...@open-mpi.org
Fixed in https://github.com/open-mpi/ompi/pull/1959 > On Aug 11, 2016, at 6:23 PM, Gilles Gouaillardet wrote: > > Thanks George, > > > fwiw, note the current behavior is a bit more "twisted" than that. > > OPAL_MODEX_RECV_VALUE() returns successfully (e.g. err == OPAL_SUCCESS) but > the OPA

[OMPI devel] Coll/sync component missing???

2016-08-19 Thread r...@open-mpi.org
Hi folks I had a question arise regarding a problem being seen by an OMPI user - has to do with the old bugaboo I originally dealt with back in my LANL days. The problem is with an app that repeatedly hammers on a collective, and gets overwhelmed by unexpected messages when one of the procs fal

Re: [OMPI devel] Coll/sync component missing???

2016-08-19 Thread r...@open-mpi.org
legacy codes that don’t want to refactor their algorithms. > On Aug 19, 2016, at 8:48 PM, Nathan Hjelm wrote: > >> On Aug 19, 2016, at 4:24 PM, r...@open-mpi.org wrote: >> >> Hi folks >> >> I had a question arise regarding a problem being seen by an OM

[OMPI devel] Warnings on master

2016-08-20 Thread r...@open-mpi.org
We seem to have gotten into a state again of generating a ton of warnings on master - can folks take a look at these and clean them up? opal_datatype_pack.c: In function ‘pack_predefined_heterogeneous’: opal_datatype_pack.c:421:24: warning: variable ‘_l_blength’ set but not used [-Wunused-but-se

Re: [OMPI devel] Coll/sync component missing???

2016-08-20 Thread r...@open-mpi.org
exacerbate > the issue. However, doing a loop around a small MPI_Send will also end on a > memory exhaustion issue, one that would not be easily circumvented by adding > synchronizations deep inside the library. > > George. > > > On Sat, Aug 20, 2016 at 12:30 AM

Re: [OMPI devel] Coll/sync component missing???

2016-08-20 Thread r...@open-mpi.org
des a fix or workaround, the end user will be left > with some important load imbalance, which is far from being optimal from > his/her performance point of view. > > > Cheers, > > Gilles > > On Sunday, August 21, 2016, r...@open-mpi.org

Re: [OMPI devel] Coll/sync component missing???

2016-08-22 Thread r...@open-mpi.org
Send will also end on a >> memory exhaustion issue, one that would not be easily circumvented by adding >> synchronizations deep inside the library. >> >> George. >> >> >> On Sat, Aug 20, 2016 at 12:30 AM, r...@open-mpi.org >> <mailto:r...@open-

[OMPI devel] OMPI v2.0.1rc1 available for test

2016-08-22 Thread r...@open-mpi.org
Hello folks Dunno where the head honchos are hiding, but per their request: the newest v2.0.1 release candidate has been posted in the usual place: https://www.open-mpi.org/software/ompi/v2.0/ Beat it up, please! Ralph 2.0.1 -- 23 August 2016 --- Bug fixes/minor improveme

Re: [OMPI devel] stdin issue with master

2016-08-22 Thread r...@open-mpi.org
Yeah, I started working on it earlier this evening - will look some more tomorrow > On Aug 22, 2016, at 7:57 PM, Gilles Gouaillardet wrote: > > Folks, > > > i made a trivial test > > > echo hello | mpirun -np 1 cat > > > and with v2.x and v1.10, the output is "hello" as expected > > but

Re: [OMPI devel] stdin issue with master

2016-08-22 Thread r...@open-mpi.org
Fixed in 9210230 > On Aug 22, 2016, at 8:49 PM, r...@open-mpi.org wrote: > > Yeah, I started working on it earlier this evening - will look some more > tomorrow > >> On Aug 22, 2016, at 7:57 PM, Gilles Gouaillardet wrote: >> >> Folks, >> >>

Re: [OMPI devel] [2.0.1.rc1] runtime failure on MacOS 10.6

2016-08-22 Thread r...@open-mpi.org
Huh - I’ll take a look. Thanks! > On Aug 22, 2016, at 9:11 PM, Paul Hargrove wrote: > > On a Mac OSX 10.6 system: > > $ mpirun -mca btl sm,self -np 2 examples/ring_c' > dyld: lazy symbol binding failed: Symbol not found: _strnlen > Referenced from: > /Users/paul/OMPI/openmpi-2.0.1rc1-macos10

Re: [OMPI devel] [2.0.1.rc1] runtime failure on MacOS 10.6

2016-08-22 Thread r...@open-mpi.org
Hey Paul I just checked on my Mac and had no problem. However, I’m at 10.11, and so I’m wondering if the old 10.6 just doesn’t have strnlen on it? What compiler were you using? > On Aug 22, 2016, at 9:14 PM, r...@open-mpi.org wrote: > > Huh - I’ll take a look. Thanks! > >
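For readers following the strnlen question above, here is a minimal sketch of the kind of fallback a project can compile in when the C library does not provide strnlen. The name compat_strnlen is hypothetical and is not taken from the Open MPI tree; this is an illustration under that assumption, not the fix that was applied.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fallback, for illustration only: returns the length of s,
 * but never looks at more than maxlen bytes. */
static size_t compat_strnlen(const char *s, size_t maxlen)
{
    /* memchr stops at the first NUL byte or after maxlen bytes,
     * whichever comes first. */
    const char *end = memchr(s, '\0', maxlen);
    return (NULL == end) ? maxlen : (size_t)(end - s);
}
```

A configure-time function check would typically decide whether a fallback like this gets used in place of the system strnlen.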

Re: [OMPI devel] [2.0.1.rc1] runtime failure on MacOS 10.6

2016-08-22 Thread r...@open-mpi.org
in POSIX.1 and so might be on most any > Linux system regardless of age. > > -Paul > > On Mon, Aug 22, 2016 at 9:17 PM, r...@open-mpi.org wrote: > Hey Paul > > I just checked on my Mac and had no problem

Re: [OMPI devel] [2.0.1.rc1] Solaris MPIX failure

2016-08-23 Thread r...@open-mpi.org
Thanks Gilles! > On Aug 23, 2016, at 3:42 AM, Gilles Gouaillardet > wrote: > > Thanks Paul, > > at first glance, something is going wrong in the sec module under solaris. > I will keep digging tomorrow > > Cheers, > > Gilles > > On Tuesday, August 23, 2016, Paul Hargrove

Re: [OMPI devel] [2.0.1.rc1] runtime failure on MacOS 10.6

2016-08-23 Thread r...@open-mpi.org
Actually, I found that we already dealt with this, but the version in the 2.0.1 branch didn’t include the update. I’ll see what else is missing and ask that it be brought across. Thanks Paul Ralph > On Aug 22, 2016, at 9:25 PM, r...@open-mpi.org wrote: > > Hmmm...okay. I guess we’l

Re: [OMPI devel] [2.0.1.rc1] Solaris MPIX failure

2016-08-23 Thread r...@open-mpi.org
native? > On Aug 23, 2016, at 5:55 AM, r...@open-mpi.org wrote: > > Thanks Gilles! > >> On Aug 23, 2016, at 3:42 AM, Gilles Gouaillardet >> wrote: >> >> Thanks Paul, >> >> at first glance, some

Re: [OMPI devel] [2.0.1.rc1] Solaris MPIX failure

2016-08-23 Thread r...@open-mpi.org
Looks like Solaris has a “getpeerucred” - can you take a look at it, Gilles? We’d have to add that to our AC_CHECK_FUNCS and update the native sec component. > On Aug 23, 2016, at 6:32 AM, r...@open-mpi.org wrote: > > I took a quick glance at this one, and the only way I can see to

Re: [OMPI devel] Binding with --oversubscribe in 2.0.0

2016-08-24 Thread r...@open-mpi.org
Well, that’s a new one! I imagine we could modify the logic to allow a combination of oversubscribe and overload flags. Won’t get out until 2.1, though you could pull the patch in advance if it is holding you up. > On Aug 23, 2016, at 11:46 PM, Ben Menadue wrote: > > Hi, > > One of our users

[OMPI devel] OMPI v1.10.4

2016-08-24 Thread r...@open-mpi.org
Hi folks Looks like we will need to release a 1.10.4 to pick up a desired patch, and so I plan to do so next week (week of Aug 29th). I know of one issue I’ve heard about - please let me know if there are any others, and/or if someone wants any additional changes in it. There are two outstanding

Re: [OMPI devel] Binding with --oversubscribe in 2.0.0

2016-08-24 Thread r...@open-mpi.org
who don’t have such kind scenarios, and don’t realize we are otherwise binding by default. So in your case, you’d want something like: mpirun --map-by core:oversubscribe --bind-to core:overload HTH Ralph > On Aug 24, 2016, at 7:33 AM, r...@open-mpi.org wrote: > > Well, that’s a n

Re: [OMPI devel] [2.0.1.rc1] Solaris MPIX failure

2016-08-24 Thread r...@open-mpi.org
raw/open-mpi/ompi-release/pull/1336.patch> > (note you need recent autotools in order to use it) > > > Cheers, > > > Gilles > > On 8/23/2016 10:40 PM, r...@open-mpi.org wrote: >> Looks like Solaris has a “getpeerucred” - c

Re: [OMPI devel] Binding with --oversubscribe in 2.0.0

2016-08-24 Thread r...@open-mpi.org
d-allowed", which also works). > > Cheers, > Ben > > > -Original Message- > From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of > r...@open-mpi.org > Sent: Thursday, 25 August 2016 2:03 AM > To: OpenMPI Devel > Subject: Re: [OMPI devel] Bind

Re: [OMPI devel] Binding with --oversubscribe in 2.0.0

2016-08-24 Thread r...@open-mpi.org
hanks for that... that option's not on the man page for mpirun, but I can > see it in the --help message (as "overload-allowed", which also works). > > Cheers, > Ben > > > -Original Message- > From: devel [mailto:devel-boun...@lists.op

Re: [OMPI devel] Binding with --oversubscribe in 2.0.0

2016-08-24 Thread r...@open-mpi.org
'Open MPI Developers' >> Subject: Re: [OMPI devel] Binding with --oversubscribe in 2.0.0 >> >> Hi Ralph, >> >> Thanks for that... that option's not on the man page for mpirun, but I can >> see it in the --help message (as "overload-allowed"

Re: [OMPI devel] C89 support

2016-08-29 Thread r...@open-mpi.org
I hadn’t realized we still have a --disable-c99 configure option - that sounds bad as we can’t possibly build that way. We need to internally perform the configure check, but we shouldn’t be exposing a configure option as that just confuses people into thinking it really is an option. > On Aug

Re: [OMPI devel] C89 support

2016-08-29 Thread r...@open-mpi.org
Just so people don’t spend a lot of time on this: as the release manager for the 1.10 series, you are going to have to provide me with a great deal of motivation to accept this proposed change. We ended C89 support way back in the 1.7 series, so reviving it here would really seem odd. I haven’t

Re: [OMPI devel] C89 support

2016-08-29 Thread r...@open-mpi.org
Just to clarify: we primarily use c99 features in our plugins as a means of directly specifying which functions are being implemented, and which are not. In c89, this can only be done by maintaining positional alignment - c99 allows us to do this using the function names. Thus, the c99 method is
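The difference described above can be sketched with a small function-pointer table. The demo_* names below are hypothetical and purely illustrative; they are not Open MPI's actual plugin structures.

```c
#include <stddef.h>

/* Hypothetical plugin interface, for illustration only. */
typedef int (*demo_fn_t)(int);

typedef struct {
    demo_fn_t init;
    demo_fn_t send;
    demo_fn_t recv;
    demo_fn_t finalize;
} demo_module_t;

static int demo_init(int x)     { return x + 1; }
static int demo_finalize(int x) { return x - 1; }

/* C89 style: positional. Every slot must appear in declaration order,
 * with NULL placeholders for the functions that are not implemented. */
static const demo_module_t module_c89 = {
    demo_init,
    NULL,
    NULL,
    demo_finalize
};

/* C99 style: designated. Only the implemented functions are named;
 * members that are left out are implicitly initialized to NULL. */
static const demo_module_t module_c99 = {
    .init     = demo_init,
    .finalize = demo_finalize
};
```

With the positional form, adding or reordering a member in the struct silently shifts every initializer that follows it; the designated form keeps each function attached to its field name.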

Re: [OMPI devel] C89 support

2016-08-30 Thread r...@open-mpi.org
Chris For me, this is the critical point: > On Aug 29, 2016, at 9:50 PM, Gilles Gouaillardet wrote: > > iirc, we use C99 struct initialisers, so strictly speaking, i do not think > Open MPI can be built with a pure C89 compiler when configure'd > > with the --disable-c99 option. i'd rather im

Re: [OMPI devel] C89 support

2016-08-30 Thread r...@open-mpi.org
Chris At the risk of being annoying, it would really help me if you could answer my question: is Gilles correct in his feeling that we are looking at a scenario where you can support 90% of C99 (e.g., C99-style comments, named structure fields), and only the things modified in this PR are requi

[OMPI devel] PMIx shared memory dstore now off by default

2016-09-01 Thread r...@open-mpi.org
Hi folks In order to let some folks continue working on dynamic operations on the master, I have turned the PMIx shared memory data store support “off” by default for the embedded code. You can turn it “on” using the --enable-pmix3-dstore option. Once the dynamics support is functional, we wil

Re: [OMPI devel] Question about Open MPI bindings

2016-09-02 Thread r...@open-mpi.org
I’ll dig more later, but just checking offhand, I can’t replicate this on my box, so it may be something in hwloc for that box (or maybe you have some MCA params set somewhere?): $ mpirun -n 2 --bind-to core --report-bindings hostname [rhc001:83938] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]:

Re: [OMPI devel] Question about Open MPI bindings

2016-09-03 Thread r...@open-mpi.org
Okay, can you add --display-devel-map --mca rmaps_base_verbose 10 to your cmd line? It sounds like there is something about that topo that is bothering the mapper > On Sep 2, 2016, at 9:31 PM, George Bosilca wrote: > > Thanks Gilles, that's a very useful trick. The bindings reported by ORTE ar

Re: [OMPI devel] Question about Open MPI bindings

2016-09-03 Thread r...@open-mpi.org
p://dancer.icl.utk.edu:17451/>] [[41198,0],0] > GOT 1 CPUS > [dancer.icl.utk.edu:17451] [[41198,0],0] > PROC [[41198,1],2] BITMAP 1,9 > [dancer.icl.utk.edu:17451] [[41198,0],0] > BOUND PROC [[41198,1],2][arc0

Re: [OMPI devel] OMPI devel] Question about Open MPI bindings

2016-09-03 Thread r...@open-mpi.org
ion ? > > Bottom line, you might have to set yet an other MCA param equivalent to the > --hetero-nodes option. > > Cheers, > > Gilles > > r...@open-mpi.org wrote: > Interesting - well, it looks like ORTE is working correctly. The map is what > you would expect,

[OMPI devel] Hanging tests

2016-09-05 Thread r...@open-mpi.org
Hey folks All of the tests that involve either ISsend_ator, SSend_ator, ISsend_rtoa, or SSend_rtoa are hanging on master and v2.x. Does anyone know what these tests do, and why we never seem to pass them? Do we care? Ralph ___ devel mailing list deve

Re: [OMPI devel] Hanging tests

2016-09-05 Thread r...@open-mpi.org
here > > Cheers, > > Gilles > > r...@open-mpi.org wrote: >> Hey folks >> >> All of the tests that involve either ISsend_ator, SSend_ator, ISsend_rtoa, >> or SSend_rtoa are hanging on master and v2.x. Does anyone know what these

Re: [OMPI devel] Question about Open MPI bindings

2016-09-05 Thread r...@open-mpi.org
nt the behavior you describe, then you simply tell ORTE to “--map-by core --bind-to core” > On Sep 5, 2016, at 11:05 AM, George Bosilca wrote: > > On Sat, Sep 3, 2016 at 10:34 AM, r...@open-mpi.org wrote: > Inter

Re: [OMPI devel] OMPI devel] Question about Open MPI bindings

2016-09-05 Thread r...@open-mpi.org
> PS: Is there an MCA parameter for "hetero-nodes” ? orte_hetero_nodes > > > On Sat, Sep 3, 2016 at 8:07 PM, r...@open-mpi.org <mailto:r...@open-mpi.org> > mailto:r...@open-mpi.org>> wrote: > Ah, indeed - if the node where mpirun is executing doesn’t match the c

Re: [OMPI devel] Hanging tests

2016-09-06 Thread r...@open-mpi.org
e(&comm); > > MPI_Comm_dup(MPI_COMM_WORLD, &comm); > if (1 == rank) { > b = 0x; > MPI_Recv(&b, 1, MPI_INT, 0, 0, comm, MPI_STATUS_IGNORE); > if (0x != b) MPI_Abort(MPI_COMM_WORLD, 2); > } > MPI_Comm_free(&c

Re: [OMPI devel] link issue on master with --disable-shared --enable-static --disable-dlopen

2016-09-13 Thread r...@open-mpi.org
I should think we could pass the disable-pdl-open option downward - can’t see any reason why not. > On Sep 13, 2016, at 7:51 PM, Gilles Gouaillardet wrote: > > Folks, > > > i configure'd Open MPI with > > --disable-shared --enable-static --disable-dlopen > > and i can no longer link a simpl

Re: [OMPI devel] OpenMPI 2.x: bug: violent break at beginning with (sequential) runs...

2016-09-14 Thread r...@open-mpi.org
This has nothing to do with PMIx, Josh - the error is coming out of the usock OOB component. > On Sep 14, 2016, at 7:17 AM, Joshua Ladd wrote: > > Eric, > > We are looking into the PMIx code path that sets up the jobid. The session > directories are created based on the jobid. It might be th

Re: [OMPI devel] OpenMPI 2.x: bug: violent break at beginning with (sequential) runs...

2016-09-14 Thread r...@open-mpi.org
> is there any reason to use a session directory based on the jobid (or job > family) ? > I mean, could we use mkstemp to generate a unique directory, and then > propagate the path via orted comm or the environment ? > > Cheers, > > Gilles > > On Wednesday, Septe

Re: [OMPI devel] Lots of new features rolled out on github.com today

2016-09-14 Thread r...@open-mpi.org
> On Sep 14, 2016, at 11:37 AM, Jeff Squyres (jsquyres) > wrote: > > - Code reviews got better / more organized > - Some project management tools now available > - We can enforce the use of 2-factor authentication Please don’t do that... > > https://github.com/blog/2256-a-whole-new-github-un

Re: [OMPI devel] Lots of new features rolled out on github.com today

2016-09-14 Thread r...@open-mpi.org
I’d want to _fully_ understand the implications before forcing something on everyone that might prove burdensome, especially when it “solves” a currently non-existent problem > On Sep 14, 2016, at 11:43 AM, Jeff Squyres (jsquyres) > wrote: > > On Sep 14, 2016, at 2:40 PM, r...@

Re: [OMPI devel] Lots of new features rolled out on github.com today

2016-09-14 Thread r...@open-mpi.org
> > HPC-DES > Los Alamos National Laboratory > > > > > > On 9/14/16, 12:53 PM, "devel on behalf of Jeff Squyres (jsquyres)" > wrote: > >> Sure. There's no rush at all; in fact, this is probably a decent topic >> for our next face-to-face.

Re: [OMPI devel] toward a unique session directory

2016-09-14 Thread r...@open-mpi.org
away temp dirs. It isn’t the RM-based environment that is of concern - it’s the non-RM one where epilog scripts don’t exist that is the problem. > On Sep 14, 2016, at 6:05 PM, Gilles Gouaillardet wrote: > > Ralph, > > On 9/15/2016 12:11 AM, r...@open-mpi.org <mailto:r...@o
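As a rough illustration of the idea discussed in this thread (create a uniquely named session directory and hand its path to child processes through the environment), here is a minimal sketch. It uses mkdtemp, the directory counterpart of mkstemp. The helper name demo_create_session_dir and the variable DEMO_SESSION_DIR are hypothetical, not anything Open MPI defines.

```c
/* Request POSIX.1-2008 declarations (mkdtemp, setenv) on strict compilers. */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical environment variable name, for illustration only. */
#define DEMO_SESSION_ENV "DEMO_SESSION_DIR"

/* Create a unique directory under base, store its path in out, and export
 * the path so that child processes inherit it. Returns 0 on success. */
static int demo_create_session_dir(const char *base, char *out, size_t outlen)
{
    /* mkdtemp requires a template that ends in six 'X' characters. */
    int n = snprintf(out, outlen, "%s/demo-session-XXXXXX", base);
    if (n < 0 || (size_t)n >= outlen) {
        return -1;  /* formatting failed or the path would be truncated */
    }
    if (NULL == mkdtemp(out)) {
        return -1;  /* directory could not be created; errno is set */
    }
    return setenv(DEMO_SESSION_ENV, out, 1);
}
```

Nothing in this sketch removes the directory afterwards. That cleanup is the concern raised above for environments where no resource-manager epilog script exists to do it.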

Re: [OMPI devel] OpenMPI 2.x: bug: violent break at beginning with (sequential) runs...

2016-09-14 Thread r...@open-mpi.org
Nah, something isn’t right here. The singleton doesn’t go thru that code line, or it isn’t supposed to do so. I think the problem lies in the way the singleton in 2.x is starting up. Let me take a look at how singletons are working over there. > On Sep 14, 2016, at 8:10 PM, Gilles Gouaillardet

Re: [OMPI devel] OpenMPI 2.x: bug: violent break at beginning with (sequential) runs...

2016-09-14 Thread r...@open-mpi.org
Ah...I take that back. We changed this and now we _do_ indeed go down that code path. Not good. So yes, we need that putenv so it gets the jobid from the HNP that was launched, like it used to do. You want to throw that in? Thanks Ralph > On Sep 14, 2016, at 8:18 PM, r...@open-mpi.org wr

Re: [OMPI devel] OpenMPI 2.x: bug: violent break at beginning with (sequential) runs...

2016-09-14 Thread r...@open-mpi.org
what is causing the trouble. > On Sep 14, 2016, at 8:26 PM, r...@open-mpi.org wrote: > > Ah...I take that back. We changed this and now we _do_ indeed go down that > code path. Not good. > > So yes, we need that putenv so it gets the jobid from the HNP that was > launched, li

Re: [OMPI devel] toward a unique session directory

2016-09-15 Thread r...@open-mpi.org
> On Sep 15, 2016, at 12:51 AM, Gilles Gouaillardet wrote: > > Ralph, > > > > my reply is in the text > > > On 9/15/2016 11:11 AM, r...@open-mpi.org <mailto:r...@open-mpi.org> wrote: >> If we are going to make a change, then let’s do it only

Re: [OMPI devel] OpenMPI 2.x: bug: violent break at beginning with (sequential) runs...

2016-09-15 Thread r...@open-mpi.org
I don’t understand this fascination with PMIx. PMIx didn’t calculate this jobid - OMPI did. Yes, it is in the opal/pmix layer, but it had -nothing- to do with PMIx. So why do you want to continue to blame PMIx for this problem?? > On Sep 15, 2016, at 4:29 AM, Joshua Ladd wrote: > > Great cat

Re: [OMPI devel] toward a unique session directory

2016-09-15 Thread r...@open-mpi.org
please remind me how to test if an app was launched by >> mpirun/orted or direct launched by the RM ? >> >> right now, which direct launch method is supported ? >> i am aware of srun (SLURM) and aprun (CRAY), are there any other ? >> >> Cheers, >> >

Re: [OMPI devel] OpenMPI 2.x: bug: violent break at beginning with (sequential) runs...

2016-09-15 Thread r...@open-mpi.org
Josh > > On Thu, Sep 15, 2016 at 10:07 AM, r...@open-mpi.org > wrote: > I don’t understand this fascination with PMIx. PMIx didn’t calculate this > jobid - OMPI did. Yes, it is in the opal/pmix layer, but i

Re: [OMPI devel] OMPI devel] OpenMPI 2.x: bug: violent break at beginning with (sequential) runs...

2016-09-15 Thread r...@open-mpi.org
I don’t think a collision was the issue here. We were taking the mpirun-generated jobid and passing it thru the hash, thus creating an incorrect and invalid value. What I’m more surprised by is that it doesn’t -always- fail. Only thing I can figure is that, unlike with PMIx, the usock oob compon

Re: [OMPI devel] Sample of merging ompi and ompi-release

2016-09-19 Thread r...@open-mpi.org
One question, to be discussed on the webex: now that github has a “reviewed” feature, do we still need/want the “thumbs-up” bot? If we retain it, then how do we deal with the non-sync’d, duplicative mechanisms? > On Sep 19, 2016, at 4:23 PM, George Bosilca wrote: > > :+1: > > George. > >

Re: [OMPI devel] OMPI devel] RFC: Reenabling the TCP BTL over local interfaces (when specifically requested)

2016-09-21 Thread r...@open-mpi.org
FWIW: you know the location of every proc (to at least the host level) from time of orte_init, which should precede anything in the BTL > On Sep 21, 2016, at 8:31 AM, Gilles Gouaillardet > wrote: > > George, > > Is proc locality already set at that time ? > > If yes, then we could keep a har

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. v2.x-dev-2911-gc7bf9a0

2016-09-22 Thread r...@open-mpi.org
Hey Gilles This fix doesn’t look right to me. > +/* we read something - better get more */ > +num_chars_read += rc; > +orted_uri = realloc((void*)orted_uri, buffer_length+chunk); > +memset(&orted_uri[buffer_length], 0, chunk); > +buffer
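For comparison with the quoted patch, here is a minimal sketch of one conventional way to grow a buffer in fixed-size chunks while reading from a descriptor. It is not the code from the commit under review, and because the objection above is cut off, it makes no claim to address it. The names demo_read_all and DEMO_CHUNK are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

#define DEMO_CHUNK 64  /* illustrative growth increment */

/* Read fd until EOF into a NUL-terminated heap buffer.
 * The caller frees the result. Returns NULL on error. */
static char *demo_read_all(int fd, size_t *len_out)
{
    size_t cap = DEMO_CHUNK;      /* usable bytes, excluding the NUL */
    size_t len = 0;               /* bytes read so far */
    char *buf = malloc(cap + 1);  /* +1 keeps room for the terminator */
    if (NULL == buf) {
        return NULL;
    }

    for (;;) {
        if (len == cap) {
            /* Grow through a temporary so buf is not lost on failure. */
            char *tmp = realloc(buf, cap + DEMO_CHUNK + 1);
            if (NULL == tmp) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap += DEMO_CHUNK;
        }
        ssize_t rc = read(fd, buf + len, cap - len);
        if (rc < 0) {
            if (EINTR == errno) {
                continue;         /* interrupted by a signal: retry */
            }
            free(buf);
            return NULL;
        }
        if (0 == rc) {
            break;                /* EOF */
        }
        len += (size_t)rc;
    }

    buf[len] = '\0';
    if (NULL != len_out) {
        *len_out = len;
    }
    return buf;
}
```

Tracking the capacity and the number of bytes read as separate counters, and terminating the string once at the end, avoids having to zero each newly added chunk.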

[OMPI devel] Error in hwloc configury

2016-09-22 Thread r...@open-mpi.org
Hey folks I’m encountering an issue with the way we detect external HWLOC. If I have a directory that includes an hwloc installation in my CPPFLAGS, then we fail to build, even if I don’t specify anything with regard to hwloc on my configure cmd line. The errors I get look like: In file includ

Re: [OMPI devel] Error in hwloc configury

2016-09-22 Thread r...@open-mpi.org
Is the root cause we append our stuff to CPPFLAGS, instead of prepend ? > > You can retrieve the compile command line with > make V=1 > > If my guess is correct, does someone know the rationale for append vs prepend > ? > > Cheers, > > Gilles > > r...@open-mpi.

Re: [OMPI devel] Error in hwloc configury

2016-09-22 Thread r...@open-mpi.org
> >> >>unset file >> >> ]) >> >> OPAL_VAR_SCOPE_POP >> >> diff --git a/opal/mca/hwloc/hwloc1113/hwloc/config/hwloc.m4 >> b/opal/mca/hwloc/hwloc1113/hwloc/config/hwloc.m4 >> >> index 6807624..3f6a6fe 10

Re: [OMPI devel] mtl/psm2 and $PSM2_DEVICES

2016-09-29 Thread r...@open-mpi.org
iable PSM2_DEVICES="self,shm” ). >> This is to avoid “reserving” HW resources in the HFI card that wouldn’t be >> used unless you later on spawn ranks in other nodes. Therefore, to allow >> dynamic process to be spawned on other nodes you need to tell PSM2 to >> instruct

Re: [OMPI devel] Open MPI, PMIx and munge

2016-10-03 Thread r...@open-mpi.org
OMPI should not build munge support unless specifically requested to do so > On Oct 2, 2016, at 7:04 PM, Gilles Gouaillardet wrote: > > Folks, > > > Open MPI policy is to build munge support if it is found, whereas PMIx policy > is to build munge support only if it requested > > from a pragm

Re: [OMPI devel] use of OBJ_NEW and related calls

2016-10-10 Thread r...@open-mpi.org
See opal/class/opal_object.h And your assumption is correct :-) > On Oct 10, 2016, at 1:18 PM, Emani, Murali wrote: > > Hi, > > Could someone help me in understanding where the functions OBJ_NEW/ > OBJ_CONSTRUCT/ OBJ_DESTRUCT are defined in the source code. Are these > specific to OpenMPI c

Re: [OMPI devel] New Open MPI Community Bylaws to discuss

2016-10-12 Thread r...@open-mpi.org
The OMPI community members have had their respective legal offices review the changes, but we decided to provide notice and get input from others prior to the formal vote of acceptance. Once approved, there will no longer be a CLA at all. The only requirement for contribution will be the sign-of

Re: [OMPI devel] New Open MPI Community Bylaws to discuss

2016-10-12 Thread r...@open-mpi.org
ote: > > Regardless, I would have to notify legal teams about amendment of the > existing CLA. If organizations that already signed the agreement don't have > any say, then this conversation is pointless. > > -Pasha > > On Wed, Oct 12, 2016 at 9:29 AM, r...@open-m

Re: [OMPI devel] RFC: Rename nightly snapshot tarballs

2016-10-17 Thread r...@open-mpi.org
You make a valid point - I too prefer simplicity to a 50-character-long name. There should be some simple way of making it clear which branch the tarball came from...your suggestions seem reasonable and easy to do. I’m sure we’ll be talking about this on the telecon in the morning. > On Oct 17,

[OMPI devel] Supercomputing 2016: Birds-of-a-Feather meetings

2016-10-24 Thread r...@open-mpi.org
Hello all This year, we will again be hosting Birds-of-a-Feather meetings for Open MPI and PMIx. Open MPI: Wed, Nov 16th, 5:15-7pm http://sc16.supercomputing.org/presentation/?id=bof103&sess=sess322 PMIx: Wed, Nov 16th, 12

[OMPI devel] Update to Open MPI Administrative Rules

2016-10-25 Thread r...@open-mpi.org
Hello all Some of you may have noticed that we have been receiving pull requests on Github from contributors who have not signed a formal Contributor’s Agreement. This has raised some discussion in the community about how we accept such contributions without conflicting with our bylaws. Accord

Re: [OMPI devel] Update to Open MPI Administrative Rules

2016-10-25 Thread r...@open-mpi.org
M, Jeff Squyres (jsquyres) > wrote: > > On Oct 25, 2016, at 12:43 PM, r...@open-mpi.org wrote: >> >> signed-off by: > > Just to nit-pick: it's "Signed-off-by" (with a capital S and a -). It's the > output you get when you "git

Re: [OMPI devel] direct launch problem with master

2016-10-31 Thread r...@open-mpi.org
I should hope bisecting would be a last resort. The simplest interim solution is to set OMPI_MCA_routed=direct in your environment. I’ll take a look at a more permanent solution in the morning. > On Oct 30, 2016, at 6:33 PM, Pritchard Jr., Howard wrote: > > Hi Folks, > > While trying to solve

Re: [OMPI devel] direct launch problem with master

2016-10-31 Thread r...@open-mpi.org
Fixed in PR https://github.com/open-mpi/ompi/pull/2322 > On Oct 31, 2016, at 1:20 AM, r...@open-mpi.org wrote: > > I should hope bisecting would be a last resort. The simplest interim solution > is to set OMPI_MCA_routed=direct i

Re: [OMPI devel] PMIx in 2.x

2016-11-07 Thread r...@open-mpi.org
I’m not sure your description of the 1.x behavior is entirely accurate. What actually happened in that scenario is that the various mpirun’s would connect to each other, proxying the various MPI dynamic calls across each other. You had to tell each mpirun how to find the other - this was in the

Re: [OMPI devel] PMIx in 2.x

2016-11-08 Thread r...@open-mpi.org
> > > PMIX ERROR: ERROR in file src/server/pmix_server.c at line 1881 > >

Re: [OMPI devel] PMIx in 2.x

2016-11-08 Thread r...@open-mpi.org
in master). > > Thanks, > Pieter > From: devel on behalf of r...@open-mpi.org > > Sent: Tuesday, November 8, 2016 12:17:29 PM > To: OpenMPI Devel > Subject: Re: [OMPI devel] PMIx in 2.x > > Should be handled more gracefully, of course. When a proc departs, we cle

Re: [OMPI devel] Developing MPI program without mpirun

2016-11-18 Thread r...@open-mpi.org
The 2.0.1 NEWS states that the MPI dynamics operations (comm_spawn, connect, and accept) do not work on that release. They are being fixed for the 2.0.2 release. > On Nov 18, 2016, at 7:48 AM, Rui Liu wrote: > > Hi Howard, > > 1. I am using a cluster which involves 20 ubuntu 14.04 servers, an

Re: [OMPI devel] [OMPI users] funny SIGSEGV in 'ompi_info'

2016-11-22 Thread r...@open-mpi.org
The “correct” answer is, of course, to propagate the error upwards so that the highest level caller (e.g., MPI_Init or ompi_info) can return it to the user, who can then decide what to do. Disregarding the parameter is not an option as it violates our “do what the user said to do, else return a

Re: [OMPI devel] master nightly tarballs stopped on 11/21

2016-11-23 Thread r...@open-mpi.org
I’ll turn my crontab back on for the holiday, in case Brian isn’t available - worst case, the tarball gets pushed upstream twice. > On Nov 23, 2016, at 7:59 AM, Pritchard Jr., Howard wrote: > > Hi Brian, > > Could you check what’s going on with the nightly tarball builds? > Nothing new has bee

Re: [OMPI devel] master nightly tarballs stopped on 11/21

2016-11-23 Thread r...@open-mpi.org
n Nov 23, 2016, at 08:28, "r...@open-mpi.org" > wrote: > >> I’ll turn my crontab back on for the holiday, in case Brian isn’t available >> - worst case, the tarball gets pushed upstream twice. >

[OMPI devel] Reminder: sign up for OMPI Jan meeting

2016-12-01 Thread r...@open-mpi.org
Hey folks So far, I’m the only one who has signed up for the Jan 24-26 meeting in the Bay area. I’m going to be a very lonely person! The meeting page is: https://github.com/open-mpi/ompi/wiki/Meeting-2017-01 If you cannot sign up yoursel

Re: [OMPI devel] heads up about OMPI/master

2016-12-01 Thread r...@open-mpi.org
?? I see a bunch of commits that were all collected in a single PR from Gilles yesterday - is that what you are referring to? > On Dec 1, 2016, at 1:58 PM, Howard Pritchard wrote: > > Hi Folks, > > Just an FYI it looks like a bunch of commits may have been accidentally > pushed to > master s

Re: [OMPI devel] heads up about OMPI/master

2016-12-01 Thread r...@open-mpi.org
I should add, FWIW: I’m working with the HEAD of master right now, and not seeing any problems. > On Dec 1, 2016, at 2:10 PM, r...@open-mpi.org wrote: > > ?? I see a bunch of commits that were all collected in a single PR from > Gilles yesterday - is that what you are referring

Re: [OMPI devel] heads up about OMPI/master

2016-12-01 Thread r...@open-mpi.org
ommit/1e2019ce2a903be24361b3424d8e98d27e941c6c> > > Cheers, > > Gilles > > On Friday, December 2, 2016, r...@open-mpi.org > > wrote: > I should add, FWIW: I’m working with the HEAD of master right now, and not > seeing any problems. > >> On Dec 1,

Re: [OMPI devel] heads up about OMPI/master

2016-12-01 Thread r...@open-mpi.org
systems are different and it is hard to compete in coverage with our set > of Jenkins' :). > > 2016-12-01 14:51 GMT-08:00 r...@open-mpi.org: > FWIW: I verified it myself, and it was fine on my systems

Re: [OMPI devel] heads up about OMPI/master

2016-12-01 Thread r...@open-mpi.org
aster. > > Howard > > > 2016-12-01 16:59 GMT-07:00 r...@open-mpi.org: > Ummm...guys, it was done via PR. I saw it go by, and it was all done to > procedure: > > https://github.com/open-mpi/ompi

Re: [OMPI devel] heads up about OMPI/master

2016-12-01 Thread r...@open-mpi.org
> > > git log --oneline --topo-order > > you don't see a Merge pull request #2488 in the history for master. > > Howard > > > 2016-12-01 16:59 GMT-07:00 r...@open-mpi.org: > Ummm...guys, it w

Re: [OMPI devel] heads up about OMPI/master

2016-12-01 Thread r...@open-mpi.org
Enough folks - we all have different methods of working, and despite all the angst, it all seems to work. Gilles: there is nothing wrong with the master. It works fine. Let’s get back to doing something useful > On Dec 1, 2016, at 5:15 PM, Gilles Gouaillardet wrote: > > Paul, > > > that is

Re: [OMPI devel] Errors with CXX=pgc++ (but CXX=pgCC OK)

2016-12-17 Thread r...@open-mpi.org
Added to 1.10 README - thanks! > On Dec 16, 2016, at 4:18 PM, Paul Hargrove wrote: > > With the 1.10.5rc1 tarball on linux/x86-64 and various versions of the PGI > compilers I have configured with > --prefix=[...] --enable-debug CC=pgcc CXX=pgc++ FC=pgfortran > > I see the following with versi

[OMPI devel] Last call: v1.10.5

2016-12-19 Thread r...@open-mpi.org
Any last concerns or desired changes? Otherwise, barring hearing anything by noon Pacific, I’ll build/release the final version Ralph ___ devel mailing list devel@lists.open-mpi.org https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

Re: [OMPI devel] Last call: v1.10.5

2016-12-19 Thread r...@open-mpi.org
> Thanks, > > _MAC > > -Original Message- > From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of > r...@open-mpi.org > Sent: Monday, December 19, 2016 8:57 AM > To: OpenMPI Devel > Subject: [OMPI devel] Last call: v1.10.5 > > Any last concerns o

Re: [OMPI devel] OMPI devel] hwloc missing NUMANode object

2017-01-05 Thread r...@open-mpi.org
I can add a check to see if we have NUMA, and if not we can fall back to socket (if present) or just “none” > On Jan 5, 2017, at 1:39 AM, Gilles Gouaillardet > wrote: > > Thanks Brice, > > Right now, the user facing issue is that numa binding is requested, and there > is no numa, so mpirun a
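A minimal sketch of the fallback order described in this message, assuming the caller has already counted the NUMA and socket objects in the topology. The enum and function names are hypothetical, chosen for illustration; they are not Open MPI's or hwloc's actual identifiers, and the real fix may be structured differently.

```c
/* Hypothetical binding levels, for illustration only. */
typedef enum { BIND_NUMA, BIND_SOCKET, BIND_NONE } bind_policy_t;

/* Choose a binding level when NUMA binding was requested.
 * num_numa and num_sockets are the object counts the caller
 * obtained from the topology; both are assumptions of this sketch. */
bind_policy_t fallback_from_numa(unsigned num_numa, unsigned num_sockets)
{
    if (num_numa > 0) {
        return BIND_NUMA;      /* NUMA objects exist: honor the request */
    }
    if (num_sockets > 0) {
        return BIND_SOCKET;    /* no NUMA: fall back to socket */
    }
    return BIND_NONE;          /* neither present: do not bind */
}

/* Human-readable name for a binding level. */
const char *bind_policy_name(bind_policy_t p)
{
    switch (p) {
    case BIND_NUMA:   return "numa";
    case BIND_SOCKET: return "socket";
    default:          return "none";
    }
}
```

With this ordering, a topology that reports zero NUMA objects but at least one socket yields "socket", and a topology with neither yields "none", so the request degrades instead of aborting.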

Re: [OMPI devel] OMPI devel] hwloc missing NUMANode object

2017-01-11 Thread r...@open-mpi.org
Should be fixed here: https://github.com/open-mpi/ompi/pull/2711 > On Jan 5, 2017, at 6:42 AM, r...@open-mpi.org wrote: > > I can add a check to see if we have NUMA, and if not we can fall back to > socket (if present) or just “no

Re: [OMPI devel] Fwd: Re: [OMPI users] still segmentation fault with openmpi-2.0.2rc3 on Linux

2017-01-11 Thread r...@open-mpi.org
I specified > "--slot-list 0:0-5,1:0-5". Does incorrect mean that it isn't > allowed to specify more slots than available, to specify fewer > slots than available, or to specify more slots than needed for > the processes? > > > Kind regards > > Siegma

Re: [OMPI devel] [OMPI users] still segmentation fault with openmpi-2.0.2rc3 on Linux

2017-01-11 Thread r...@open-mpi.org
Looking at this note again: how many procs is spawn_master generating? > On Jan 11, 2017, at 7:39 PM, r...@open-mpi.org wrote: > > Sigh - yet another corner case. Lovely. Will take a poke at it later this > week. Thx for tracking it down > >> On Jan 11, 2017, at 5:27 PM

Re: [OMPI devel] [OMPI users] still segmentation fault with openmpi-2.0.2rc3 on Linux

2017-01-12 Thread r...@open-mpi.org
or > the processes? > > > Kind regards > > Siegmar > > Am 11.01.2017 um 10:04 schrieb Gilles Gouaillardet: > Siegmar, > > I was able to reproduce the issue on my vm > (No need for a real heterogeneous cluster here) > > I will keep digging tomorrow. > No

[OMPI devel] OMPI v1.10.6

2017-01-12 Thread r...@open-mpi.org
Hi folks It looks like we may have motivation to release 1.10.6 in the near future. Please check to see if you have anything that should be included, or is pending review. Thanks Ralph

Re: [OMPI devel] OMPI v1.10.6

2017-01-18 Thread r...@open-mpi.org
Last call for v1.10.6 changes - we still have a few pending for review, but none marked as critical. If you want them included, please push for a review _now_ Thanks Ralph > On Jan 12, 2017, at 1:54 PM, r...@open-mpi.org wrote: > > Hi folks > > It looks like we may have motiva

Re: [OMPI devel] OMPI v1.10.6

2017-01-18 Thread r...@open-mpi.org
Will someone be submitting that PR soon? > On Jan 18, 2017, at 10:09 AM, George Bosilca wrote: > > https://github.com/open-mpi/ompi/issues/2750 > > George. > > > > On Wed, Jan 18, 2017 at 12:57 PM, r.

[OMPI devel] Reminder: assign as well as request review

2017-01-27 Thread r...@open-mpi.org
Hey folks Just a reminder. If you request a review from someone, GitHub doesn’t show that person’s icon when looking at the list of PRs. It only shows their icon and marks the PR with their ID if you actually “assign” it to that person. Thus, just requesting a review without assigning the PR to

[OMPI devel] Problem on master

2017-01-27 Thread r...@open-mpi.org
Hello all There is a known issue on master that we are attempting to debug. Sadly, it is one that only shows on multi-node operations, and the signature varies based on your environment. We hope to have this resolved soon (and no, it doesn’t appear to be due to any one specific commit). In the
