Thanks Paul! Sorry I was out all day - stuck in meetings, I fear.
On Wed, Dec 17, 2014 at 7:17 PM, Paul Hargrove wrote:
>
> Short version:
>
> v1.8 nightly (v1.8.3-313-g54c80c2) PASSED my testing.
>
> In full:
>
> I gave openmpi-v1.8.3-313-g54c80c2 a try.
> In this test I
Creating nightly hwloc snapshot git tarball was a success.
Snapshot: hwloc 1.10.0-35-g0f77ac3
Start time: Wed Dec 17 21:03:26 EST 2014
End time: Wed Dec 17 21:04:59 EST 2014
Your friendly daemon,
Cyrador
Creating nightly hwloc snapshot git tarball was a success.
Snapshot: hwloc dev-329-g0f4ab7c
Start time: Wed Dec 17 21:01:02 EST 2014
End time: Wed Dec 17 21:03:08 EST 2014
Your friendly daemon,
Cyrador
Ralph,
You got it right.
The latest nightly tarball should work out of the box.
(well, -m64 must be passed manually, but this is not related whatsoever to the
issue discussed here)
Cheers,
Gilles
"Jeff Squyres (jsquyres)" wrote:
>Paul --
>
>The __sun macro check is now in
All the BTLs now have been fixed to not use opal_using_threads() in their
component_init functions.
If that means we can move the init of the thread level back further down in
ompi_mpi_init(), then anyone who wants to should feel free to do so. :-)
On Dec 13, 2014, at 2:38 AM, George Bosilca
Paul --
The __sun macro check is now in the OMPI 1.8 tree, and is in the latest nightly
tarball.
If I'm following this thread right -- and I might not be! -- I think Gilles is
saying that now that the __sun check is in, it should fix this
-mt/-D_REENTRANT/whatever problem.
Can you confirm?
This issue should be fixed now.
Please let me know if there are any other libfabric-related issues.
Thanks for your patience!
On Dec 17, 2014, at 4:22 PM, Jeff Squyres (jsquyres) wrote:
> On Dec 17, 2014, at 4:10 PM, Howard Pritchard wrote:
>
>> I'm
Sorry for the delayed response.
I got and built the tarball 1.8.3-272-g4e4f997 below.
A single-node job runs OK, with correct cores etc.
A multi-node job dies with the following error (no core dumps now):
>>>A specified physical processor does not exist in this topology:
>>> CPU
Blarg -- I had a typo in my command-line search for an alps component, and
thought that it wasn't in the tree any longer.
I just replaced it; sorry.
On Dec 17, 2014, at 4:40 PM, Howard Pritchard wrote:
> Hi Jeff,
>
> Why did you delete the il
>
>
Hi Jeff,
Why did you delete the il
libmca_common_alps_so_version
That's going to break my stuff.
2014-12-17 14:36 GMT-07:00 :
>
> This is an automated email from the git hooks/post-receive script. It was
> generated because a ref change was pushed to the repository
On Dec 17, 2014, at 4:10 PM, Howard Pritchard wrote:
> I'm fixing the libfabric m4 file. It just says it's happy if
> infiniband/verbs.h is there, then goes on to check some specific providers, but
> doesn't go back and check whether the specific providers are actually
>
I'm fixing the libfabric m4 file. It just says it's happy if
infiniband/verbs.h is there, then goes on to check some specific providers, but
doesn't go back and check whether the specific providers are actually
available.
2014-12-17 12:53 GMT-07:00 Jeff Squyres (jsquyres) :
>
> On
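The configure behaviour Howard describes above can be sketched roughly as follows. This is only an illustration in Python, not the actual m4: the function names and the provider list are hypothetical, and the "fixed" variant is one plausible reading of the intended check rather than the change that was committed.

```python
from typing import Sequence

# Hypothetical sketch of the configure decision described above.
# None of these names come from the Open MPI or libfabric sources.


def libfabric_happy_buggy(have_verbs_header: bool,
                          providers_available: Sequence[str]) -> bool:
    """Reported behaviour: the header alone decides the outcome."""
    # Providers are probed, but the result is never consulted afterwards.
    return have_verbs_header


def libfabric_happy_fixed(have_verbs_header: bool,
                          providers_available: Sequence[str]) -> bool:
    """Assumed intent: the header plus at least one usable provider."""
    return have_verbs_header and len(providers_available) > 0


if __name__ == "__main__":
    # Header present, but no provider can actually be built.
    print(libfabric_happy_buggy(True, []))  # mtl/ofi is told it can build
    print(libfabric_happy_fixed(True, []))  # mtl/ofi would be skipped
```

With the header present and an empty provider list, the first function still reports success, which matches the symptom in this thread: mtl/ofi is built although there is no libfabric for it to resolve symbols against.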
Sorry about 'idx' vs 'index'.
In case Rolf is not correct about this being fixed, see below for the
topology object.
-Paul
(dbx) print *topology
*topology = {
nb_levels = 3736059629U
next_group_depth = 3736059629U
level_nbobjects= (3800392320U, 4294966655U, 1U,
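One observation about the dump above, offered as a hint rather than a diagnosis: printed in hexadecimal, the value shown for both nb_levels and next_group_depth does not look like a plausible level count. A short Python check, using only the number from the dbx output (the helper name is made up for this example):

```python
# Value copied from the dbx output above (nb_levels and next_group_depth).
nb_levels = 3736059629


def as_hex32(value: int) -> str:
    """Format an unsigned 32-bit value as zero-padded hexadecimal."""
    return f"0x{value & 0xFFFFFFFF:08x}"


if __name__ == "__main__":
    print(as_hex32(nb_levels))  # prints 0xdeafbeed
```

A value of that shape reads like a debug fill pattern, which would fit a topology structure that was never initialised or had already been released. The thread itself does not confirm this.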
I think this has already been fixed by Ralph this morning. I had observed the
same issue, but it is now gone.
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Brice Goglin
Sent: Wednesday, December 17, 2014 3:53 PM
To: de...@open-mpi.org
Subject: Re: [OMPI devel] Solaris/x86-64 SEGV
Le 17/12/2014 21:43, Paul Hargrove a écrit :
>
> Dbx gives me
>
> t@1 (l@1) terminated by signal SEGV (no mapping at the fault address)
> Current function is opal_hwloc172_hwloc_get_obj_by_depth
>74 return topology->levels[depth][idx];
> (dbx) where
> current thread:
I tried last night's v1.8 tarball (openmpi-v1.8.3-272-g4e4f997.tar.bz2) with
the Studio Compilers (v12.3) on a Solaris/x86-64 system.
Configure args (other than prefix) were:
--enable-debug --with-verbs \
CC=cc CXX=CC FC=f90 \
CFLAGS=-m64 --with-wrapper-cflags=-m64 \
FCFLAGS=-m64
On Dec 17, 2014, at 2:19 PM, Howard Pritchard wrote:
> I did another mtt run with --disable-libfabric included on the configure line
> and still failed with the same problem, mtl/ofi thinks it's okay to build...
FWIW: this problem is because the switch is
I believe the community took that position after oshmem caused disruption for a
while.
The ofi MTL has been in the tree for less than 24 hours. Please give us a
little time to sort out running on other people's systems.
On Dec 17, 2014, at 1:48 PM, Joshua Ladd wrote:
On Dec 17, 2014, at 1:40 PM, Howard Pritchard wrote:
> I think the problem is that the libfabric configure is finding the ibverbs.h
> header file and thinking everything is fine,
> so the mtl/ofi thinks it can build, even though there will be no libfabric
> for it to resolve
Then I would propose adding a ".ompi_ignore" to the 'ofi' component until
they get their configury straightened out.
Josh
On Wed, Dec 17, 2014 at 2:19 PM, Howard Pritchard
wrote:
>
> Hi Josh,
>
> I did another mtt run with --disable-libfabric included on the configure
>
Hi Josh,
I did another mtt run with --disable-libfabric included on the configure
line and still failed with the same problem, mtl/ofi thinks it's okay to
build...
Howard
2014-12-17 11:48 GMT-07:00 Joshua Ladd :
>
> Seems to me this should be disabled by default until folks
Results of tests described below:
1) SEGV in hwloc - will report later
2) PASS
3) PASS
So, either -D_REENTRANT or -mt works for me IF added to both the CFLAGS
and wrapper-cflags.
-Paul
On Tue, Dec 16, 2014 at 10:56 PM, Paul Hargrove wrote:
>
> I've queued 3 tests:
>
> 1)
I did run the nightly and it SEGVs in hwloc!
I will provide more info when I am able.
-Paul
On Tue, Dec 16, 2014 at 10:59 PM, Gilles Gouaillardet <
gilles.gouaillar...@iferc.org> wrote:
>
> Thanks Paul !
>
> imho the first test is useless since it does not include the commit that
> sets the
Seems to me this should be disabled by default until folks can quiet the
noise. If memory serves me, that's the position the community took with
OSHMEM.
Josh
On Wed, Dec 17, 2014 at 1:40 PM, Howard Pritchard
wrote:
>
> Jeff,
>
> I think the problem is that the libfabric
Jeff,
I think the problem is that the libfabric configure is finding the ibverbs.h
header file and thinking everything is fine,
so the mtl/ofi thinks it can build, even though there will be no libfabric
for it to resolve symbols when the mtl
framework is opened.
As you'll see from the make output,
Nope, it's a Cray XC with good old (well new) MLNX cards installed on the
login and i/o nodes,
so libfab is likely getting built with ofed/ib provider - or at least
thinks it can build.
I'm rerunning the MTT script to diagnose the problem, just wondering if
others have seen this yet.
2014-12-17
Is this on a PSM-enabled cluster?
Can you send the full output from configure, the config.log, and the output
from "make"?
Are you building statically (i.e., libmpi.a)?
On Dec 17, 2014, at 12:04 PM, Howard Pritchard wrote:
> I noticed my MTT smoke test failed with
I noticed my MTT smoke test failed with today's master build:
name=PMI_process_mapping, (val=(vector,(0,4,4)))
./c_hello./c_hello: : symbol lookup errorsymbol lookup error: :
Turns out that this problem was caused by not having a Fortran compiler. I
fixed that in
https://github.com/open-mpi/ompi-release/commit/b90c8142d343b12cbcc1023cb767801ea2d567a4.
There are still 2 other minor problems (a cleanfile and a conditional source
include); working on those...
On Dec
Ralph,
what goes wrong?
(e.g. which command?)
and which compiler (e.g. gcc < 4.9.1?) are you using?
Cheers,
Gilles
On 2014/12/17 17:30, Ralph Castain wrote:
> I'm afraid I cannot generate a new rc, nor will there be a new 1.8 nightly
> tarball as (ahem) Jeff's fortran commit broke the
I'm afraid I cannot generate a new rc, nor will there be a new 1.8 nightly
tarball as (ahem) Jeff's fortran commit broke the build system. I tried to
figure out a fix, but am too tired to get it right.
So I'm afraid we are stuck for the moment until Jeff returns in the morning
and fixes the
Thanks Paul !
imho the first test is useless since it does not include the commit that
sets the -D_REENTRANT CFLAGS on solaris/solarisstudio
https://github.com/open-mpi/ompi-release/commit/ac8b84ce674b958dbf8c9481b300beeef0548b83
Cheers,
Gilles
On 2014/12/17 15:56, Paul Hargrove wrote:
> I've
I've queued 3 tests:
1) openmpi-v1.8.3-272-g4e4f997
2) openmpi-v1.8.4rc4 + adding -D_REENTRANT to CFLAGS and wrapper-cflags
3) openmpi-v1.8.4rc4 + adding -mt to CFLAGS and wrapper-cflags
I hope to be able to login and collect the results around noon pacific time
on Wed.
-Paul
On Tue, Dec 16,
Gilles,
If I have done my testing correctly (not 100% sure) then adding
"-D_REENTRANT" was NOT sufficient, where "-mt" was.
I can at least test 1 tarball with one set of configure args each evening.
Anything more than that I cannot commit to.
My scripts are capable of grabbing the v1.8 nightly
Ralph,
i think that will not work.
here is the full story :
once upon a time, on solaris, we did not try to compile pthread'ed app
without any special parameters.
that was a minor annoyance on solaris 10 with old gcc : configure passed
a flag (-pthread if i remember correctly)
that was not
Ralph,
No change with the patch you supplied.
The test that uses the "pflags" set by your patch is guarded by the value
of ompi_pthread_c_success.
So, I think there must be some other patch needed to the body
of OMPI_INTL_POSIX_THREADS_PLAIN_C to even reach the code changed by the
patch you sent
Hi Paul
Can you try the attached patch? It would require running autogen, I fear.
Otherwise, I can add it to the tarball.
Ralph
On Tue, Dec 16, 2014 at 9:59 PM, Paul Hargrove wrote:
>
> Gilles,
>
> The 1.8.3 test works where the 1.8.4rc4 one fails with identical configure
Gilles,
The 1.8.3 test works where the 1.8.4rc4 one fails with identical configure
arguments.
While it may be overkill, I configured 1.8.4rc4 with
CFLAGS="-m64 -mt" --with-wrapper-cflags="-m64 -mt" \
LDFLAGS="-mt" --with-wrapper-ldflags="-mt"
The resulting run worked!
So, I very