Re: [OMPI devel] 1.8.4rc Status

2014-12-17 Thread Ralph Castain
Thanks Paul! Sorry I was out all day - stuck in meetings, I fear. On Wed, Dec 17, 2014 at 7:17 PM, Paul Hargrove wrote: > > Short version: > > v1.8 nightly (v1.8.3-313-g54c80c2) PASSED my testing. > > In full: > > I gave openmpi-v1.8.3-313-g54c80c2 a try. > In this test I

[hwloc-devel] Create success (hwloc git 1.10.0-35-g0f77ac3)

2014-12-17 Thread MPI Team
Creating nightly hwloc snapshot git tarball was a success. Snapshot: hwloc 1.10.0-35-g0f77ac3 Start time: Wed Dec 17 21:03:26 EST 2014 End time: Wed Dec 17 21:04:59 EST 2014 Your friendly daemon, Cyrador

[hwloc-devel] Create success (hwloc git dev-329-g0f4ab7c)

2014-12-17 Thread MPI Team
Creating nightly hwloc snapshot git tarball was a success. Snapshot: hwloc dev-329-g0f4ab7c Start time: Wed Dec 17 21:01:02 EST 2014 End time: Wed Dec 17 21:03:08 EST 2014 Your friendly daemon, Cyrador

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Gilles Gouaillardet
Ralph, You get it right. The latest nightly tarball should work out of the box. (well, -m64 must be passed manually, but this is not related whatsoever to the issue discussed here) Cheers, Gilles "Jeff Squyres (jsquyres)" wrote: >Paul -- > >The __sun macro check is now in

Re: [OMPI devel] Patch proposed: opal_set_using_threads(true) in ompi/runtime/ompi_mpi_init.c is called to late

2014-12-17 Thread Jeff Squyres (jsquyres)
All the BTLs now have been fixed to not use opal_using_threads() in their component_init functions. If that means we can move the init of the thread level back further down in ompi_mpi_init(), then anyone who wants to should feel free to do so. :-) On Dec 13, 2014, at 2:38 AM, George Bosilca

Re: [OMPI devel] 1.8.4rc Status

2014-12-17 Thread Jeff Squyres (jsquyres)
Paul -- The __sun macro check is now in the OMPI 1.8 tree, and is in the latest nightly tarball. If I'm following this thread right -- and I might not be! -- I think Gilles is saying that now that the __sun check is in, it should fix this -mt/-D_REENTRANT/whatever problem. Can you confirm?

Re: [OMPI devel] ofi/mtl causing problems

2014-12-17 Thread Jeff Squyres (jsquyres)
This issue should be fixed now. Please let me know if there are any other libfabric-related issues. Thanks for your patience! On Dec 17, 2014, at 4:22 PM, Jeff Squyres (jsquyres) wrote: > On Dec 17, 2014, at 4:10 PM, Howard Pritchard wrote: > >> I'm

Re: [OMPI devel] 1.8.4rc Status

2014-12-17 Thread Tom Wurgler
Sorry for the delayed response. I got and built the tarball 1.8.3-272-g4e4f997 below. A single node job runs ok, with correct cores etc. A multi-node job dies with the following error (no core dumps now): >>>A specified physical processor does not exist in this topology: >>> CPU

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-564-g6c468b8

2014-12-17 Thread Jeff Squyres (jsquyres)
Blarg -- I had a typo in my command-line search for an alps component, and thought that it wasn't in the tree any longer. I just replaced it; sorry. On Dec 17, 2014, at 4:40 PM, Howard Pritchard wrote: > Hi Jeff, > > Why did you delete the il > >

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-564-g6c468b8

2014-12-17 Thread Howard Pritchard
Hi Jeff, Why did you delete the il libmca_common_alps_so_version? That's going to break my stuff. 2014-12-17 14:36 GMT-07:00 : > > This is an automated email from the git hooks/post-receive script. It was > generated because a ref change was pushed to the repository

Re: [OMPI devel] ofi/mtl causing problems

2014-12-17 Thread Jeff Squyres (jsquyres)
On Dec 17, 2014, at 4:10 PM, Howard Pritchard wrote: > I'm fixing the libfabric m4 file. It just says it's happy if > infiniband/verbs.h is there, then goes on to check some specific providers, but > doesn't go back and check whether the specific providers are actually >

Re: [OMPI devel] ofi/mtl causing problems

2014-12-17 Thread Howard Pritchard
I'm fixing the libfabric m4 file. It just says it's happy if infiniband/verbs.h is there, then goes on to check some specific providers, but doesn't go back and check whether the specific providers are actually available. 2014-12-17 12:53 GMT-07:00 Jeff Squyres (jsquyres) : > > On

Re: [OMPI devel] Solaris/x86-64 SEGV with 1.8-latest

2014-12-17 Thread Paul Hargrove
Sorry about 'idx' vs 'index'. In case Rolf is not correct about this being fixed, see below for the topology object. -Paul (dbx) print *topology *topology = { nb_levels = 3736059629U next_group_depth = 3736059629U level_nbobjects= (3800392320U, 4294966655U, 1U,

Re: [OMPI devel] Solaris/x86-64 SEGV with 1.8-latest

2014-12-17 Thread Rolf vandeVaart
I think this has already been fixed by Ralph this morning. I had observed the same issue but it is now gone. From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Brice Goglin Sent: Wednesday, December 17, 2014 3:53 PM To: de...@open-mpi.org Subject: Re: [OMPI devel] Solaris/x86-64 SEGV

Re: [OMPI devel] Solaris/x86-64 SEGV with 1.8-latest

2014-12-17 Thread Brice Goglin
On 17/12/2014 21:43, Paul Hargrove wrote: > > Dbx gives me > > t@1 (l@1) terminated by signal SEGV (no mapping at the fault address) > Current function is opal_hwloc172_hwloc_get_obj_by_depth >74 return topology->levels[depth][idx]; > (dbx) where > current thread:

[OMPI devel] Solaris/x86-64 SEGV with 1.8-latest

2014-12-17 Thread Paul Hargrove
I tried last night's v1.8 tarball (openmpi-v1.8.3-272-g4e4f997.tar.bz2) with the Studio Compilers (v12.3) on a Solaris/x86-64 system. Configure args (other than prefix) were: --enable-debug --with-verbs \ CC=cc CXX=CC FC=f90 \ CFLAGS=-m64 --with-wrapper-cflags=-m64 \ FCFLAGS=-m64

Re: [OMPI devel] ofi/mtl causing problems

2014-12-17 Thread Jeff Squyres (jsquyres)
On Dec 17, 2014, at 2:19 PM, Howard Pritchard wrote: > I did another mtt run with --disable-libfabric included on the configure line > and still failed with the same problem, mtl/ofi thinks it's okay to build... FWIW: this problem is because the switch is

Re: [OMPI devel] ofi/mtl causing problems

2014-12-17 Thread Jeff Squyres (jsquyres)
I believe the community took that position after oshmem caused disruption for a while. The ofi MTL has been in the tree for less than 24 hours. Please give us a little time to sort out running on other people's systems. On Dec 17, 2014, at 1:48 PM, Joshua Ladd wrote:

Re: [OMPI devel] ofi/mtl causing problems

2014-12-17 Thread Jeff Squyres (jsquyres)
On Dec 17, 2014, at 1:40 PM, Howard Pritchard wrote: > I think the problem is that the libfabric configure is finding ibverbs.h > header file and thinking everything is fine, > so the mtl/ofi thinks it can build, even though there will be no libfabric > for it to resolve

Re: [OMPI devel] ofi/mtl causing problems

2014-12-17 Thread Joshua Ladd
Then I would propose adding a ".ompi_ignore" to the 'ofi' component until they get their configury straightened out. Josh On Wed, Dec 17, 2014 at 2:19 PM, Howard Pritchard wrote: > > Hi Josh, > > I did another mtt run with --disable-libfabric included on the configure >

Re: [OMPI devel] ofi/mtl causing problems

2014-12-17 Thread Howard Pritchard
Hi Josh, I did another mtt run with --disable-libfabric included on the configure line and still failed with the same problem, mtl/ofi thinks it's okay to build... Howard 2014-12-17 11:48 GMT-07:00 Joshua Ladd : > > Seems to me this should be disabled by default until folks

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Paul Hargrove
Results of tests described below: 1) SEGV in hwloc - will report later 2) PASS 3) PASS So, both -D_REENTRANT and -mt are working for me IF added to both the CFLAGS and wrapper-cflags. -Paul On Tue, Dec 16, 2014 at 10:56 PM, Paul Hargrove wrote: > > I've queued 3 tests: > > 1)

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Paul Hargrove
I did run the nightly and it SEGVs in hwloc! I will provide more info when I am able. -Paul On Tue, Dec 16, 2014 at 10:59 PM, Gilles Gouaillardet < gilles.gouaillar...@iferc.org> wrote: > > Thanks Paul ! > > imho the first test is useless since it does not include the commit that > sets the

Re: [OMPI devel] ofi/mtl causing problems

2014-12-17 Thread Joshua Ladd
Seems to me this should be disabled by default until folks can quiet the noise. If memory serves me, that's the position the community took with OSHMEM. Josh On Wed, Dec 17, 2014 at 1:40 PM, Howard Pritchard wrote: > > Jeff, > > I think the problem is that the libfabric

Re: [OMPI devel] ofi/mtl causing problems

2014-12-17 Thread Howard Pritchard
Jeff, I think the problem is that the libfabric configure is finding ibverbs.h header file and thinking everything is fine, so the mtl/ofi thinks it can build, even though there will be no libfabric for it to resolve symbols when the mtl framework is opened. As you'll see from the make output,

Re: [OMPI devel] ofi/mtl causing problems

2014-12-17 Thread Howard Pritchard
Nope, it's a Cray XC with good old (well new) MLNX cards installed on the login and i/o nodes, so libfab is likely getting built with ofed/ib provider - or at least thinks it can build. I'm rerunning the MTT script to diagnose the problem, just wondering if others have seen this yet. 2014-12-17

Re: [OMPI devel] ofi/mtl causing problems

2014-12-17 Thread Jeff Squyres (jsquyres)
Is this on a PSM-enabled cluster? Can you send the full output from configure, the config.log, and the output from "make"? Are you building statically (i.e., libmpi.a)? On Dec 17, 2014, at 12:04 PM, Howard Pritchard wrote: > I noticed my MTT smoke test failed with

[OMPI devel] ofi/mtl causing problems

2014-12-17 Thread Howard Pritchard
I noticed my MTT smoke test failed with today's master build: name=PMI_process_mapping, (val=(vector,(0,4,4))) ./c_hello./c_hello: : symbol lookup errorsymbol lookup error: :

Re: [OMPI devel] OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Jeff Squyres (jsquyres)
Turns out that this problem was caused by not having a Fortran compiler. I fixed that in https://github.com/open-mpi/ompi-release/commit/b90c8142d343b12cbcc1023cb767801ea2d567a4. There are still 2 other minor problems (a cleanfile and a conditional source include); working on those... On Dec

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Gilles Gouaillardet
Ralph, what goes wrong ? (e.g. which command ?) and which compiler (e.g. gcc < 4.9.1 ?) are you using ? Cheers, Gilles On 2014/12/17 17:30, Ralph Castain wrote: > I'm afraid I cannot generate a new rc, nor will there be a new 1.8 nightly > tarball as (ahem) Jeff's fortran commit broke the

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Ralph Castain
I'm afraid I cannot generate a new rc, nor will there be a new 1.8 nightly tarball as (ahem) Jeff's fortran commit broke the build system. I tried to figure out a fix, but am too tired to get it right. So I'm afraid we are stuck for the moment until Jeff returns in the morning and fixes the

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Gilles Gouaillardet
Thanks Paul ! imho the first test is useless since it does not include the commit that sets the -D_REENTRANT CFLAGS on solaris/solarisstudio https://github.com/open-mpi/ompi-release/commit/ac8b84ce674b958dbf8c9481b300beeef0548b83 Cheers, Gilles On 2014/12/17 15:56, Paul Hargrove wrote: > I've

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Paul Hargrove
I've queued 3 tests: 1) openmpi-v1.8.3-272-g4e4f997 2) openmpi-v1.8.4rc4 + adding -D_REENTRANT to CFLAGS and wrapper-cflags 3) openmpi-v1.8.4rc4 + adding -mt to CFLAGS and wrapper-cflags I hope to be able to login and collect the results around noon pacific time on Wed. -Paul On Tue, Dec 16,

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Paul Hargrove
Gilles, If I have done my testing correctly (not 100% sure) then adding "-D_REENTRANT" was NOT sufficient, where "-mt" was. I can at least test 1 tarball with one set of configure args each evening. Anything more than that I cannot commit to. My scripts are capable of grabbing the v1.8 nightly

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Gilles Gouaillardet
Ralph, i think that will not work. here is the full story : once upon a time, on solaris, we did not try to compile pthread'ed app without any special parameters. that was a minor annoyance on solaris 10 with old gcc : configure passed a flag (-pthread if i remember correctly) that was not

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Paul Hargrove
Ralph, No change with the patch you supplied. The test that uses the "pflags" set by your patch is guarded by the value of ompi_pthread_c_success. So, I think there must be some other patch needed to the body of OMPI_INTL_POSIX_THREADS_PLAIN_C to even reach the code changed by the patch you sent

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Ralph Castain
Hi Paul Can you try the attached patch? It would require running autogen, I fear. Otherwise, I can add it to the tarball. Ralph On Tue, Dec 16, 2014 at 9:59 PM, Paul Hargrove wrote: > > Gilles, > > The 1.8.3 test works where the 1.8.4rc4 one fails with identical configure

Re: [OMPI devel] OMPI devel] 1.8.4rc Status

2014-12-17 Thread Paul Hargrove
Gilles, The 1.8.3 test works where the 1.8.4rc4 one fails with identical configure arguments. While it may be overkill, I configured 1.8.4rc4 with CFLAGS="-m64 -mt" --with-wrapper-cflags="-m64 -mt" \ LDFLAGS="-mt" --with-wrapper-ldflags="-mt" The resulting run worked! So, I very