Re: [OMPI devel] [OMPI users] Class information in OpenMPI

2016-07-07 Thread Ralph Castain
We used to have Doxygen support that would create what you are asking for, but I don’t think anyone has maintained it in a long time. I ran “doxygen” at the top-level directory and it did indeed generate a bunch of html, but I’m not sure it is all that helpful. You might take a look and see if

[OMPI devel] BML/R2 error

2016-07-03 Thread Ralph Castain
I agree with the compiler - I can’t figure out exactly what was meant here either: bml_r2.c: In function ‘mca_bml_r2_endpoint_add_btl’: bml_r2.c:271:21: warning: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses] if ((btl_in_use && (btl_flags & MCA_BTL_FLAGS_RDMA) ||
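For readers unfamiliar with the warning quoted above: in C, `&&` binds more tightly than `||`, so an unparenthesized `a && b || c` is parsed as `(a && b) || c`. The sketch below is illustrative only; the function and parameter names are hypothetical and are not taken from bml_r2.c, whose full condition is cut off in the snippet. It shows the two possible groupings written out explicitly, which is what the compiler is asking for.

```c
#include <stdbool.h>

/* Hypothetical helper: the grouping the compiler applies to "a && b || c"
 * when no parentheses are written, since && has higher precedence. */
bool and_binds_first(bool a, bool b, bool c)
{
    return (a && b) || c;
}

/* Hypothetical helper: the other grouping an author might have intended. */
bool or_grouped_first(bool a, bool b, bool c)
{
    return a && (b || c);
}
```

The two groupings differ whenever `a` is false and `c` is true, which is why spelling out the intended one matters.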

Re: [MTT devel] updated mtt wiki

2016-06-30 Thread Ralph Castain
Wow - thanks Deb!! > On Jun 30, 2016, at 8:34 AM, Rezanka, Deborah > wrote: > > Hi, > > I have uploaded the new wiki files to the mtt.wiki. The files are: > > How to Set Up Python Virtual Environments > Converting Python Code for Compatibility > Python 2

[OMPI devel] 1.10 series is complete

2016-06-21 Thread Ralph Castain
Hi folks We have now released 1.10.3, and there are no planned milestones for a 1.10.4, so we can now turn our undivided attention to the 2.x series! As always, any bug fixes filed against the 1.10 series will simply be allowed to accumulate until a critical mass accrues, and then we’ll worry

[OMPI devel] Master: warnings on Mac

2016-06-18 Thread Ralph Castain
runtime/ompi_mpi_init.c:103:5: warning: "HAVE___MALLOC_INITIALIZE_HOOK" is not defined [-Wundef] #if HAVE___MALLOC_INITIALIZE_HOOK In file included from ../../../../opal/class/opal_value_array.h:31:0, from ../../../../opal/mca/base/mca_base_var.h:66, from
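As background on the -Wundef warning above: an identifier that is not defined evaluates to 0 inside `#if`, and -Wundef flags that silent substitution. One common way to keep the same behavior without the warning is to test `defined(...)` first. The sketch below uses a hypothetical macro name (HAVE_EXAMPLE_HOOK), deliberately left undefined, and is not a claim about how the Open MPI code was actually changed.

```c
/* HAVE_EXAMPLE_HOOK is a hypothetical feature macro, intentionally not
 * defined here. Checking defined() first means the undefined identifier
 * is never evaluated on its own, so -Wundef has nothing to report. */
#if defined(HAVE_EXAMPLE_HOOK) && HAVE_EXAMPLE_HOOK
#define EXAMPLE_HOOK_AVAILABLE 1
#else
#define EXAMPLE_HOOK_AVAILABLE 0
#endif

/* Returns 1 when the hypothetical hook is available, 0 otherwise. */
int example_hook_available(void)
{
    return EXAMPLE_HOOK_AVAILABLE;
}
```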

Re: [OMPI devel] Issue with 2.0.0rc3, singleton init

2016-06-16 Thread Ralph Castain
ard Pritchard > HPC-DES > Los Alamos National Laboratory > > > From: devel <devel-boun...@open-mpi.org <mailto:devel-boun...@open-mpi.org>> > on behalf of Ralph Castain <r...@open-mpi.org <mailto:r...@open-mpi.org>> > Reply-To: Open MPI Developers <de...@o

Re: [OMPI devel] Issue with 2.0.0rc3, singleton init

2016-06-16 Thread Ralph Castain
FWIW: I am able to replicate and will provide a patch later today > On Jun 16, 2016, at 8:19 AM, Howard Pritchard wrote: > > Hi Lisandro, > > Thanks for giving the rc3 a try. Could you post the output of ompi_info from > your > install to the list? > > Thanks, > >

[OMPI devel] Last call: OMPI v1.10.3 release

2016-06-14 Thread Ralph Castain
Hi folks Does anyone have an outstanding reason to delay release of v1.10.3? If not, I will release it tomorrow. Ralph

Re: [OMPI devel] Event notification in OMPI

2016-06-11 Thread Ralph Castain
The corresponding PMIx RFC is now available for comment: https://github.com/pmix/RFCs/pull/2 <https://github.com/pmix/RFCs/pull/2> > On Jun 9, 2016, at 8:37 AM, Ralph Castain <r...@open-mpi.org> wrote: > > Hi folks > > There is a PR that has cleared Jenkins,

[OMPI devel] Event notification in OMPI

2016-06-09 Thread Ralph Castain
Hi folks There is a PR that has cleared Jenkins, but it represents a fairly significant change in OMPI capabilities. Thus, I think it merits a little more attention. The PR (https://github.com/open-mpi/ompi/pull/1767 ) brings the PMIx event

Re: [OMPI devel] Jenkins testing - what purpose are we striving to achieve?

2016-06-07 Thread Ralph Castain
ing bug(s) is fixed. > > I don't think it makes much sense to run a jenkins script against PRs if it > fails when run against master. > The purpose of jenkins PR testing is to trap new problems, not to keep > reminding us there are problems > with the underlying branch which the

Re: [OMPI devel] Seldom deadlock in mpirun

2016-06-06 Thread Ralph Castain
I think I have this fixed here: https://github.com/open-mpi/ompi/pull/1756 <https://github.com/open-mpi/ompi/pull/1756> George - can you please try it on your system? > On Jun 5, 2016, at 4:18 PM, Ralph Castain <r...@open-mpi.org> wrote: > > Yeah, I can reproduce on my b

Re: [OMPI devel] Seldom deadlock in mpirun

2016-06-05 Thread Ralph Castain
le to reproduce. In this case, I'm falling > on the wrong side of whatever race condition is happening... > > >> On Jun 4, 2016, at 7:57 PM, Ralph Castain <r...@open-mpi.org> wrote: >> >> I may have an idea of what’s going on here - I just need to finish something

Re: [OMPI devel] Seldom deadlock in mpirun

2016-06-04 Thread Ralph Castain
I may have an idea of what’s going on here - I just need to finish something else first and then I’ll take a look. > On Jun 4, 2016, at 4:20 PM, George Bosilca <bosi...@icl.utk.edu> wrote: > >> >> On Jun 5, 2016, at 07:53 , Ralph Castain <r...@open-mpi.org &

Re: [OMPI devel] Seldom deadlock in mpirun

2016-06-04 Thread Ralph Castain
> On Jun 4, 2016, at 1:11 PM, George Bosilca <bosi...@icl.utk.edu> wrote: > > > > On Sat, Jun 4, 2016 at 11:05 PM, Ralph Castain <r...@open-mpi.org > <mailto:r...@open-mpi.org>> wrote: > He can try adding "-mca state_base_verbose 5”, but if we a

Re: [OMPI devel] Seldom deadlock in mpirun

2016-06-04 Thread Ralph Castain
> >> On Jun 4, 2016, at 6:43 AM, Ralph Castain <r...@open-mpi.org> wrote: >> >> Neither of those threads have anything to do with catching the sigchld - >> threads 4-5 are listening for OOB and PMIx connection requests. It looks >> more like mpirun thoug

Re: [OMPI devel] Seldom deadlock in mpirun

2016-06-04 Thread Ralph Castain
Neither of those threads have anything to do with catching the sigchld - threads 4-5 are listening for OOB and PMIx connection requests. It looks more like mpirun thought it had picked everything up and has begun shutting down, but I can’t really tell for certain. > On Jun 4, 2016, at 6:29 AM,

[OMPI devel] 1.10.3rc4 ready for test

2016-06-04 Thread Ralph Castain
Hello folks The release candidate is in the usual place: https://www.open-mpi.org/software/ompi/v1.10/ Please note that the OMPI web site will be down for maintenance 7-8am US Eastern time on June 6th. I would like to get a round of final

Re: [OMPI devel] Seldom deadlock in mpirun

2016-06-02 Thread Ralph Castain
running the IBM collective and pt2pt tests, > but each time it deadlocked was in a different test. If you are interested in > some particular values, I would be happy to attach a debugger next time it > happens. > > George. > > > On Wed, Jun 1, 2016 at 10:47 PM, Ralph Castain

Re: [OMPI devel] Seldom deadlock in mpirun

2016-06-01 Thread Ralph Castain
What kind of apps are they? Or does it matter what you are running? > On Jun 1, 2016, at 7:37 PM, George Bosilca wrote: > > I have a seldomly occurring deadlock on an OS X laptop if I use more than 2 > processes. It is coming up once every 200 runs or so. > > Here is

[OMPI devel] Hangs on master

2016-05-27 Thread Ralph Castain
Hey folks MTT is reporting a massive wave of hangs on master from last night - they all look like this: libibverbs: Warning: couldn't load driver 'cxgb3': libcxgb3-rdmav2.so: cannot open shared object file: No such file or directory libibverbs: Warning: couldn't load driver 'mthca':

Re: [OMPI devel] 1.10.3rc status

2016-05-27 Thread Ralph Castain
>> --without-mpi-param-check), >> >> then something will go wrong in MPI_File_close >> >> >> that raises several questions ... >> >> - why does MPI-IO default behavior is to fail silently ? >> >> (point to point or collective

[OMPI devel] One-sided failures on master

2016-05-26 Thread Ralph Castain
I’m seeing a lot of onesided hangs on master when trying to run an MTT scan on it tonight - did something go in that might be having trouble? Ralph

Re: [OMPI devel] mpirun fails with the latest git pull

2016-05-26 Thread Ralph Castain
If a .m4 changes, then you always must re-run autogen.pl > On May 26, 2016, at 1:48 PM, dpchoudh . wrote: > > Hello all > > With a git pull of roughly 4 PM EDT (US), that had a .m4 file (something to > do with MXM) in the change set, mpirun does not work anymore. The

[OMPI devel] 1.10.3rc status

2016-05-26 Thread Ralph Castain
I’m seeing three errors in MTT today - of these, I only consider the first two to be of significant concern: onesided/cxx_win_attr : https://mtt.open-mpi.org/index.php?do_redir=2326 [**ERROR**]: MPI_COMM_WORLD rank 0, file cxx_win_attr.cc:50:

Re: [OMPI devel] [1.10.3.rc3] test results summary

2016-05-26 Thread Ralph Castain
Thanks Paul! > On May 25, 2016, at 7:39 PM, Paul Hargrove wrote: > > Sometime earlier today while I was busy in meetings, my tests of the > 1.10.3rc3 tarball completed. > > The two issues I had reported in RC2 did not occur with RC3. > > The issue with static linking

[OMPI devel] v1.10.3rc3 out for test

2016-05-25 Thread Ralph Castain
Hi folks I believe this is ready for final test now. Please give it a whirl and let us know! https://www.open-mpi.org/software/ompi/v1.10/ Ralph

Re: [OMPI devel] modex getting corrupted

2016-05-21 Thread Ralph Castain
Please provide the exact code used for both send/recv - you likely have an error in the syntax > On May 20, 2016, at 9:36 PM, dpchoudh . wrote: > > Hello all > > I have a naive question: > > My 'cluster' consists of two nodes, connected back to back with a proprietary >

[OMPI devel] v1.10.3rc2 ready for final testing

2016-05-21 Thread Ralph Castain
Hi folks I have posted rc2 of 1.10.3 - this contains all the changes that have been provided, and should be the final release candidate. Barring anything you find, I expect to just make a final pass on the NEWS items and release by end of May. https://www.open-mpi.org/software/ompi/v1.10/

[OMPI devel] Datatype flag?

2016-05-20 Thread Ralph Castain
Hey folks Is there some flag by which the datatype code can know what transport is being used? For example, suppose a transport can handle certain datatype configurations itself, without the convertor dealing with them (e.g., contiguous vs non-contiguous). Essentially, it’s an “offload”

Re: [OMPI devel] RFC: Public Test Repo

2016-05-19 Thread Ralph Castain
Assume that we are able to “package” MTT so it can be upstreamed to the OpenHPC folks (as they have requested). It then becomes a little more “salable” as a generalized testing tool, which means that the tests in our repo could be used by others without having to deal with the MTT-specific

Re: [OMPI devel] default mapping on master vs v2.x

2016-05-18 Thread Ralph Castain
I don’t think your stated analysis is quite correct. First, the topology is -always- retrieved. The only question is whether or not we set the #slots equal to the number of detected cpus. If the user specifies the #slots, then we respect that designation. If the user does not specify #slots,

Re: [OMPI devel] Process connectivity map

2016-05-16 Thread Ralph Castain
You are welcome to raise the question of default mapping behavior on master yet again, but please do so on a separate thread so we can make sense of it. Note that I will not be making more modifications of that behavior, so if someone feels strongly that they want it to change, please go ahead

Re: [OMPI devel] Process connectivity map

2016-05-16 Thread Ralph Castain
Sounds like something has been broken - what Jeff describes is the intended behavior > On May 16, 2016, at 8:00 AM, Gilles Gouaillardet > wrote: > > Jeff, > > this is not what I observed > (tcp btl, 2 to 4 nodes with one task per node, cutoff=0) > the add_procs

Re: [MTT devel] GitHub Issue Cleanup

2016-05-13 Thread Ralph Castain
+1 on all these Thanks for taking point, Josh! > On May 13, 2016, at 11:08 AM, Josh Hursey wrote: > > We are seeing more activity with MTT development, and there is a desire to > push to a formal release at some point in the not-too-distant future. As such, > I think it

Re: [OMPI devel] [v2.x] printf format warnings w/ -m32

2016-05-11 Thread Ralph Castain
I took a look at this, and the problem isn’t in the print statements. The problem is that PRIsize_t is being incorrectly set to “unsigned long” instead of something correct for the -m32 directive in that environment > On May 6, 2016, at 9:48 AM, Paul Hargrove wrote: > >

[OMPI devel] New Github labels

2016-05-11 Thread Ralph Castain
Hi folks For PRs on the master, I have added two labels: Target: 2.x Target: 1.10 These are intended to mark that this PR should be ported to the target branch once it has been committed to the master. I’m hoping it helps us to avoid missing PRs that address problems found on release

Re: [OMPI devel] Jenkins mindist test now failing in 2.x

2016-05-10 Thread Ralph Castain
/aix component; it has nothing > to do with mindist: > >https://github.com/open-mpi/ompi-release/pull/1153 > > > > On May 10, 2016, at 5:47 PM, Ralph Castain <r...@open-mpi.org> wrote: > > > > Cannot be the same reason, Jeff - the schizo updates never

Re: [OMPI devel] Jenkins mindist test now failing in 2.x

2016-05-10 Thread Ralph Castain
Cannot be the same reason, Jeff - the schizo updates never went over there. If mindist is failing in 2.x, it is for a totally different reason On Tue, May 10, 2016 at 2:11 PM, Jeff Squyres (jsquyres) wrote: > Ralph -- > > You fixed the mindist test on master, right? > >

Re: [OMPI devel] Process placement

2016-05-07 Thread Ralph Castain
I believe this has been fixed. Note that the allocation display occurs prior to mapping, and thus the slots_inuse will be zero at that point. You’ll see those numbers change if you do a comm_spawn, but otherwise they will always be zero > On May 5, 2016, at 8:37 PM, Ralph Castain <r..

Re: [OMPI devel] Process placement

2016-05-06 Thread Ralph Castain
= > dancer01 > dancer00 > dancer01 > dancer01 > dancer01 > dancer00 > dancer00 > dancer00 > dancer00 > dancer00 > dancer00 > dancer00 > > > -- > Aurélien Bouteiller, Ph.D. ~~ https://icl.cs.utk.edu/~bouteill/ &g

Re: [OMPI devel] [2.0.0rc2] xlc build failure (pmix)

2016-05-02 Thread Ralph Castain
Fix setup here: https://github.com/open-mpi/ompi-release/pull/1129 > On May 2, 2016, at 5:13 PM, Paul Hargrove wrote: > > AFTER fixing the inline asm problem detailed in >

[OMPI devel] v2.0.0rc issues

2016-04-30 Thread Ralph Castain
With external libevent, hwloc, and pmix 1.1.4: +-+-+-+--+--+--+--+--+---+ | Phase | Section| Pass | Fail | Time out

[OMPI devel] Warnings in 2.0 release candidate

2016-04-30 Thread Ralph Castain
On CentOS-7 using gcc 4.8: btl_tcp.c: In function ‘mca_btl_tcp_add_procs’: btl_tcp.c:97:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] for (int j = 0 ; j < tcp_proc->proc_endpoint_count ; ++j) { ^ btl_tcp_proc.c: In
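For context on the -Wsign-compare warning above: it fires when a signed loop index is compared against an unsigned count, because the signed value is converted to unsigned before the comparison. A minimal sketch of the usual remedy, giving the index the same unsigned type as the count, follows. The struct and field names are hypothetical stand-ins, and the assumption that the count is a `size_t` is mine; the snippet does not show the real declaration.

```c
#include <stddef.h>

/* Hypothetical stand-in for a structure holding an unsigned element count. */
struct example_proc {
    size_t endpoint_count;  /* assumed unsigned, as the warning implies */
    const int *endpoints;
};

/* Sums the entries; the index is size_t, so both sides of the loop
 * comparison are unsigned and no sign-compare warning is produced. */
int example_sum_endpoints(const struct example_proc *proc)
{
    int total = 0;
    for (size_t j = 0; j < proc->endpoint_count; ++j) {
        total += proc->endpoints[j];
    }
    return total;
}
```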

Re: [OMPI devel] 2.0.0 is coming: what do we need to communicate to users?

2016-04-29 Thread Ralph Castain
Didn’t OSHMEM up-level its API? I believe we also have some early support in there for DVM and Singularity, but not the full-blown capability that is in master. Unsure if we want to advertise that for 2.0, maybe wait for the updates in 2.1? > On Apr 29, 2016, at 10:55 AM, Jeff Squyres

Re: [OMPI devel] 2.0.0 is coming: what do we need to communicate to users?

2016-04-29 Thread Ralph Castain
FWIW: I think Matias has a good point, though perhaps it belongs on a wiki page. When we moved the BTLs down to OPAL, for example, it didn’t impact users, but it would be worth ensuring developers and ISVs had a convenient place to see what changed. > On Apr 29, 2016, at 10:47 AM, Jeff Squyres

Re: [OMPI devel] Open MPI v2.0.0rc2

2016-04-29 Thread Ralph Castain
All “green” from me: Passed: 1347 Failed: 0 > On Apr 28, 2016, at 4:01 PM, Jeff Squyres (jsquyres) > wrote: > > At long last, here's the next v2.0.0 release candidate: 2.0.0rc2: > >https://www.open-mpi.org/software/ompi/v2.x/ > > We didn't keep a good list of all

Re: [OMPI devel] modex receive

2016-04-29 Thread Ralph Castain
CM is not being selected for TCP - you specified TCP for the BTLs, but that assumes that a BTL will be selected. You obviously have something in your system that is supported by an MTL, and that will always be selected before a BTL. > On Apr 28, 2016, at 8:22 PM, dpchoudh .

Re: [OMPI devel] Process affinity detection

2016-04-26 Thread Ralph Castain
info. Doable, but results in a scaling penalty, and so definitely not something we want to do by default. > > On 04/26/2016 02:56 PM, Ralph Castain wrote: >> Hmmm…you mean for procs on the same node? I’m not sure how you can do it >> without introducing another data exchange,

Re: [OMPI devel] Process affinity detection

2016-04-26 Thread Ralph Castain
Hmmm…you mean for procs on the same node? I’m not sure how you can do it without introducing another data exchange, and that would require the app to execute it since otherwise we have no idea when they set the affinity. If we assume they set the affinity prior to calling MPI_Init, then we

Re: [OMPI devel] 1.10.3rc MTT failures

2016-04-25 Thread Ralph Castain
ub (ssh) > > for that reason, I would not simply assume the latest test suite is used :-( > > and fwiw, Jeff uses an internally mirrored repo for ompi-tests, so it Cisco > > clusters should use the latest test suites. > > > > Geoffrey, > > can you please comment o

Re: [OMPI devel] 1.10.3rc MTT failures

2016-04-25 Thread Ralph Castain
suites. > > Geoffrey, > can you please comment on the config of the ibm cluster ? > > Cheers, > > Gilles > > On Monday, April 25, 2016, Ralph Castain <r...@open-mpi.org> wrote: > I don’t know -

Re: [OMPI devel] 1.10.3rc MTT failures

2016-04-25 Thread Ralph Castain
u make sure the ibm test suite is up to date ? > I pushed a fix for datatypes a few days ago, and it should be fine now. > > I will double check this tomorrow anyway > > Cheers, > > Gilles > > On Monday, April 25, 2016, Ralph Castain <r...@open-mpi.org > <ma

[OMPI devel] 1.10.3rc MTT failures

2016-04-25 Thread Ralph Castain
I’m seeing some consistent errors in the 1.10.3rc MTT results and would appreciate it if folks could check them out: ONESIDED: onesided/cxx_win_attr: [**ERROR**]: MPI_COMM_WORLD rank 0, file cxx_win_attr.cc:50: Win::Get_attr: Got wrong value for disp unit [**ERROR**]: MPI_COMM_WORLD rank 1, file

Re: [OMPI devel] Common symbol warnings in tarballs (was: make install warns about 'common symbols')

2016-04-20 Thread Ralph Castain
> On Apr 20, 2016, at 10:24 AM, Dave Goodell (dgoodell) > wrote: > > On Apr 20, 2016, at 9:14 AM, Jeff Squyres (jsquyres) > wrote: >> >> I was under the impression that this warning script only ran for developer >> builds. But it looks like it's

Re: [OMPI devel] Common symbol warnings in tarballs (was: make install warns about 'common symbols')

2016-04-20 Thread Ralph Castain
Agreed - it was only supposed to run for developer builds > On Apr 20, 2016, at 7:14 AM, Jeff Squyres (jsquyres) > wrote: > > I was under the impression that this warning script only ran for developer > builds. But it looks like it's unconditionally run at the end of

Re: [OMPI devel] psm2 and psm2_ep_open problems

2016-04-15 Thread Ralph Castain
I have a patch that I think will resolve this problem - would you please take a look? Ralph matias.diff Description: Binary data On Apr 15, 2016, at 7:32 AM, Ralph Castain <r...@open-mpi.org> wrote: Actually, it did come across the developer list :-) Why don’t I resolve this by just en

Re: [OMPI devel] Fwd: psm2 and psm2_ep_open problems

2016-04-15 Thread Ralph Castain
Actually, it did come across the developer list :-) Why don’t I resolve this by just ensuring that the key we create is properly filled? It’s a trivial fix in the PMI ess component > On Apr 15, 2016, at 7:26 AM, Howard Pritchard wrote: > > I didn't copy dev on this. > >

Re: [OMPI devel] Process placement

2016-04-13 Thread Ralph Castain
The --map-by node option should now be fixed on master, and PRs waiting for 1.10 and 2.0 Thx! > On Apr 12, 2016, at 6:45 PM, Ralph Castain <r...@open-mpi.org> wrote: > > FWIW: speaking just to the --map-by node issue, Josh Ladd reported the problem > on master as well yesterda

Re: [OMPI devel] Process placement

2016-04-12 Thread Ralph Castain
FWIW: speaking just to the --map-by node issue, Josh Ladd reported the problem on master as well yesterday. I’ll be looking into it on Wed. > On Apr 12, 2016, at 5:53 PM, George Bosilca wrote: > > > > On Wed, Apr 13, 2016 at 1:59 AM, Gilles Gouaillardet

[OMPI devel] 1.10.3rc1 available for test

2016-04-08 Thread Ralph Castain
Hi folks We are prepping for release of 1.10.3, so please test the release candidate: https://www.open-mpi.org/software/ompi/v1.10/ Changes include: 1.10.3 -- - Minor manpage cleanups - Implement atomic support in OSHMEM/UCX - Fix support

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-3792-g92290b9

2016-04-07 Thread Ralph Castain
thing more than a typo > pinpointed and fixed in a matter of seconds. > > George. > > > On Thu, Apr 7, 2016 at 1:58 PM, Ralph Castain <r...@open-mpi.org > <mailto:r...@open-mpi.org>> wrote: > Just as a suggestion: please express such changes in the f

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-3792-g92290b9

2016-04-07 Thread Ralph Castain
Just as a suggestion: please express such changes in the form of a Pull Request instead of a direct commit to avoid getting such mistakes into the code base. I’m not advocating it for truly trivial stuff - but changing the thread_unlock to an OB1 call probably should be given a chance for

Re: [OMPI devel] Scaling down open mpi for embedded application

2016-03-19 Thread Ralph Castain
There have been a couple of folks who did this before (one for a set-top cable TV box, another for a small satellite), and some folks run OMPI on small RaspberryPi “clusters”, so it is indeed doable. I would suggest going with a newer version as Gilles said, just so you start with something we

[OMPI devel] RFC: RML change to multi-select

2016-03-15 Thread Ralph Castain
Hi folks We are working on integrating the RML with libfabric so we have access to both management Ethernet and fabric transports. A first step in enabling this is to convert the RML framework to multi-select of active components. The stub functions then scan the components in priority order

[OMPI devel] Event notification in OMPI

2016-03-15 Thread Ralph Castain
Hello folks I have (finally) written a description of the PMIx event notification system. Please see the wiki page here: https://github.com/pmix/master/wiki/2.9-PMIx-Event-Notification This is the same event notification method

Re: [OMPI devel] RFC: warn if running a debug build

2016-03-02 Thread Ralph Castain
erious" benchmark should list the third party libs and their > versions, so that could be enough. > > Cheers, > > Gilles > > On Wednesday, March 2, 2016, Ralph Castain <r...@open-mpi.org > <mailto:r...@open-mpi.org>> wrote: > What about this crazy idea? We alrea

Re: [OMPI devel] RFC: warn if running a debug build

2016-03-02 Thread Ralph Castain
What about this crazy idea? We already have .opal_unignore that looks at the username. Well, what if we did the same thing here? Have autogen.pl look at the username - if it is a known developer, then enable debug. If not, then disable it. I am just concerned that we are going to spend a bunch

Re: [OMPI devel] RFC: warn if running a debug build

2016-03-01 Thread Ralph Castain
I’ll bet we get a rash of complaints about this behavior…at the very least, let’s not do it if somebody deliberately asks for a debug build. I think people generally hate getting annoying warnings just because a few people do something wrong. > On Mar 1, 2016, at 8:27 PM, Gilles Gouaillardet

Re: [OMPI devel] Singletons

2016-02-29 Thread Ralph Castain
The fix is waiting in a PR - Howard said he will do the final review this evening (need verification that Cray isn’t broken) > On Feb 29, 2016, at 10:35 AM, George Bosilca wrote: > > Singletons are broken with version (e5d6b97db4fa1) compiled in both debug and > optimized

Re: [OMPI devel] Crash in orte_iof_hnp_read_local_handler

2016-02-28 Thread Ralph Castain
"sleep 1" instead of "ls /", no crash. > > So I run this loop : > while mpirun -host -np 6 ls /; do true; done > > > I'm not sure why MTT is reproducing the error ... does it write to mpirun > stdin ? > > On 02/26/2016 11:46 AM, Ralph Castain wr

Re: [OMPI devel] Crash in orte_iof_hnp_read_local_handler

2016-02-26 Thread Ralph Castain
So the child processes are not calling orte_init or anything like that? I can check it - any chance you can give me a line number via a debug build? > On Feb 26, 2016, at 11:42 AM, Sylvain Jeaugey wrote: > > I got this strange crash on master this night running

[OMPI devel] Confused topic for developer's meeting

2016-02-26 Thread Ralph Castain
There was some confusion yesterday at the developer’s meeting over a topic regarding framework dependencies. I apologize - I should have looked over the agenda more closely in advance to ensure I recalled everything. Instead of the topic I had wanted to discuss, we wound up discussing embedding

[OMPI devel] ORTED process group

2016-02-23 Thread Ralph Castain
Hello all The question was raised at today's developer workshop about our current practice of putting the application processes in a separate process group from their parent ORTE daemon. This has the unfortunate side effect of making the processes "invisible" to any host resource manager when

Re: [OMPI devel] Trunk is broken

2016-02-17 Thread Ralph Castain
> > > 2016-02-17 11:34 GMT-07:00 Ralph Castain <r...@open-mpi.org > <mailto:r...@open-mpi.org>>: > FWIW: I wouldn’t have seen that because I don’t have IB on my system. > >> On Feb 17, 2016, at 10:11 AM, Nysal Jan K A <jny...@gmail.com >> &

Re: [OMPI devel] Trunk is broken

2016-02-17 Thread Ralph Castain
I built with "--with-memory-manager=none" > > Regards > --Nysal > > On Tue, Feb 16, 2016 at 10:19 AM, Ralph Castain <r...@open-mpi.org > <mailto:r...@open-mpi.org>> wrote: > It is very easy to reproduce - configure with: > enable_mem_debug=no > en

Re: [OMPI devel] Trunk is broken

2016-02-15 Thread Ralph Castain
is warning ? i do not see it. > fwiw, the abstraction violation was kind of already here, so i am surprised > it pops up now only > > Cheers, > > Gilles > > On 2/16/2016 1:17 PM, Ralph Castain wrote: >> Looks like someone broke the master build o

[OMPI devel] Trunk is broken

2016-02-15 Thread Ralph Castain
Looks like someone broke the master build on Linux: ../../../ompi/.libs/libmpi.so: undefined reference to `opal_memory_linux_malloc_init_hook' I suspect it was a hard-coded reference to some component’s variable? Ralph

Re: [OMPI devel] Error using MPI_Pack_external / MPI_Unpack_external

2016-02-11 Thread Ralph Castain
built >> with heterogeneous support! The other (working) machine is a large HPC, and >> it seems OpenMPI was built >> without heterogeneous support. >> >> Currently we work around the problem for packing and unpacking by having a >> compiler switch >

Re: [OMPI devel] Error using MPI_Pack_external / MPI_Unpack_external

2016-02-10 Thread Ralph Castain
Out of curiosity: if both systems are Intel, then why are you enabling hetero? You don’t need it in that scenario. Admittedly, we do need to fix the bug - just trying to understand why you are configuring that way. > On Feb 10, 2016, at 8:46 PM, Michael Rezny wrote:

Re: [OMPI devel] ompi_procs_cutoff, jobid and vpid

2016-02-05 Thread Ralph Castain
les Gouaillardet > <gilles.gouaillar...@gmail.com> wrote: > > Thanks Ralph, > > I will implement the second option. > conversion from sentinel to process name will require a few extra steps, but > that should not be in the critical path. > > Cheers, > > Gilles

Re: [OMPI devel] ompi_procs_cutoff, jobid and vpid

2016-02-05 Thread Ralph Castain
There are two potential places you could use: * the vpid itself is 32-bits in size - we are quite some years away from needing all of them, so taking the upper-most bit for this purpose should be okay * the lower 16-bits of the jobid is the local jobid - i.e., the number of times someone
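The first option above, borrowing the upper-most bit of a 32-bit vpid as a marker, can be sketched roughly as follows. This is only an illustration under stated assumptions: the `uint32_t` typedef, the macro, and the function names are all hypothetical and are not the identifiers used in the Open MPI source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 32-bit vpid type and flag; names are illustrative only. */
typedef uint32_t example_vpid_t;
#define EXAMPLE_SENTINEL_BIT UINT32_C(0x80000000)

/* Sets the top bit to mark the value as a sentinel. */
example_vpid_t example_mark_sentinel(example_vpid_t vpid)
{
    return vpid | EXAMPLE_SENTINEL_BIT;
}

/* Reports whether the top bit is set. */
bool example_is_sentinel(example_vpid_t value)
{
    return (value & EXAMPLE_SENTINEL_BIT) != 0;
}

/* Recovers the original vpid by clearing the top bit. */
example_vpid_t example_strip_sentinel(example_vpid_t value)
{
    return value & ~EXAMPLE_SENTINEL_BIT;
}
```

The cost of this scheme is that only 31 bits remain for the vpid itself, which matches the remark that all 32 bits will not be needed for quite some years.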

Re: [OMPI devel] ompi_procs_cutoff, jobid and vpid

2016-02-05 Thread Ralph Castain
FWIW: we do have a macro that safely returns either field of the jobid, whether in 32 or 64 bit environments. Is there some reason not to just use those? > On Feb 5, 2016, at 3:58 PM, Gilles Gouaillardet > wrote: > > Jeff, > > first, cutoff currently

Re: [OMPI devel] mpirun --launch-proxy options

2016-02-05 Thread Ralph Castain
I’m pretty sure you can by simply enclosing the entire launch proxy command in quotes, but I can take a look a little later today > On Feb 5, 2016, at 7:17 AM, Justin Cinkelj wrote: > > I'm starting mpi program via --launch-proxy, and would like to pass some >

Re: [OMPI devel] RFC: set MCA param mpi_add_procs_cutoff default to 32

2016-02-04 Thread Ralph Castain
related, but not in a rigid sense. Maybe they should be...? On Thu, Feb 4, 2016 at 9:31 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote: > On Feb 4, 2016, at 9:18 AM, Ralph Castain <r...@open-mpi.org> wrote: > > > > +1, with an addition and modification: &g

Re: [OMPI devel] RFC: set MCA param mpi_add_procs_cutoff default to 32

2016-02-04 Thread Ralph Castain
+1, with an addition and modification: * add the async_modex on by default * make the change in master and let it "stew" for awhile before moving to 2.0. I believe only Cisco has been running MTT against that setup so far. On Thu, Feb 4, 2016 at 6:04 AM, Gilles Gouaillardet <

Re: [OMPI devel] orted-children communication

2016-01-26 Thread Ralph Castain
see a socket opened between orted and the application >> while it's running, what is that for? >> >> 2016-01-19 17:40 GMT+01:00 Ralph Castain <r...@open-mpi.org>: >> >>> This is on master, yes? The only orted-children communication on the >>> master

Re: [OMPI devel] tm-less tm module

2016-01-25 Thread Ralph Castain
ted at the end of configure, rather >> than "make install", it could save folks some time spent compiling >> incorrectly configured builds. >> > >> > >> > Another thing one might independently want to consider is having >> configure warn

Re: [OMPI devel] tm-less tm module

2016-01-25 Thread Ralph Castain
compile" test fails. > > This would, for instance, catch the situation when the "libfoo" packages > is installed but the "libfoo-dev" package is not. > > This approach, however, may require non-trivial changes to how all the > configure probes are perform

Re: [OMPI devel] tm-less tm module

2016-01-25 Thread Ralph Castain
you make valid points. So -- no tm "addition". We >> just have to rely on people using functionality like "--with-tm" in the >> configure line to force/ensure that tm (or whatever feature) will actually >> get built. >> >> >> > On Jan 25, 20

Re: [OMPI devel] tm-less tm module

2016-01-25 Thread Ralph Castain
hatever feature) will actually > get built. > > > > On Jan 25, 2016, at 1:31 PM, Ralph Castain <r...@open-mpi.org> wrote: > > > > I think we would be opening a real can of worms with this idea. There > are environments, for example, that use PBSPro for one part

Re: [OMPI devel] tm-less tm module

2016-01-25 Thread Ralph Castain
I think we would be opening a real can of worms with this idea. There are environments, for example, that use PBSPro for one part of the system (e.g., IO nodes), but something else for the compute section. Personally, I'd rather follow Howard's suggestion. On Mon, Jan 25, 2016 at 10:21 AM,

Re: [OMPI devel] Benchmark with multiple orteds

2016-01-25 Thread Ralph Castain
Cheers, > > Gilles > > > On Monday, January 25, 2016, Ralph Castain <r...@open-mpi.org> wrote: > >> I believe the performance penalty will still always be greater than zero, >> however, as the TCP stack is smart enough to take an optimized path when >

Re: [OMPI devel] Benchmark with multiple orteds

2016-01-25 Thread Ralph Castain
I believe the performance penalty will still always be greater than zero, however, as the TCP stack is smart enough to take an optimized path when doing a loopback as opposed to inter-node communication. On Mon, Jan 25, 2016 at 4:28 AM, Gilles Gouaillardet < gilles.gouaillar...@gmail.com> wrote:

[OMPI devel] Open MPI v1.10.2 released

2016-01-21 Thread Ralph Castain
The Open MPI Team, representing a consortium of research, academic, and industry partners, is pleased to announce the release of Open MPI version 1.10.2. v1.10.2 is primarily a bug fix release, but does include some API upgrades to OSHMEM. All users are encouraged to upgrade to v1.10.2 when

Re: [OMPI devel] orted-children communication

2016-01-19 Thread Ralph Castain
This is on master, yes? The only orted-children communication on the master (and going forward) is via PMIx. I’ve got a branch that contains the error notification support so the orted can alert the child about changes such as migration, and Annu Dasari and Dave Solt are working on the error

Re: [OMPI devel] Contributing to mpi

2016-01-07 Thread Ralph Castain
Hi Mudit We welcome anyone interested in contributing! You have a couple of choices on method, depending on the magnitude of the contribution and whether or not you want formal credit for it in the “authors” section: * you can just do some work on a fork of the master repo and generate a pull

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-3330-g213b2ab

2016-01-06 Thread Ralph Castain
nce proc_list[i] in order to access its proc_name. > Note this commit is incomplete and I pushed a second one right after I > figured it out. > > Cheers, > > Gilles > > On Wednesday, January 6, 2016, Ralph Castain <r...@open-mpi.org > <mailto:r...@open-mpi.org>>

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-3330-g213b2ab

2016-01-06 Thread Ralph Castain
Hi Gilles Could you please explain this one - I honestly don’t understand the change, and haven’t encountered a problem. Thanks Ralph > On Jan 5, 2016, at 11:22 PM, git...@crest.iu.edu wrote: > > This is an automated email from the git hooks/post-receive script. It was > generated because a

Re: [OMPI devel] OMPI v1.10.2rc3 is out

2016-01-01 Thread Ralph Castain
compilers older than 12.2. > I am unable to test *any* PGI compilers now, or for the foreseeable future. > I have also given up on regaining access to any SPARC hardware. > > -Paul > > On Thu, Dec 24, 2015 at 9:47 AM, Ralph Castain <r...@open-mpi.org > <mailto:r...
