Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-13 Thread Philipp Thomas
Hi Paul,
sorry for chiming in so late, but this list is on low priority for me at the
moment.

* Paul Hargrove (phhargr...@lbl.gov) [20150202 22:58]:

> is erroneous is that /usr/lib contains 32-bit libs (and target is 64-bit).
> Therefore libtool should have replaced -lltdl with /usr/lib64/libltdl.so

It doesn't need to do so. If only -lltdl is passed, the linker will search
/usr/lib64 by default. As SUSE's maintainer of libtool (and of Open MPI :),
I'll gladly try to help with any issue.

Philipp


Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-03 Thread Paul Hargrove
On Mon, Feb 2, 2015 at 9:26 PM, Paul Hargrove  wrote:

> I am now going to see about a PGI compiler on a system at another center
> (or two?) in order to see how universal the problem is.


That was a dead-end.

Of the many non-NERSC non-Cray institutions where I have accounts, I could
only find one that still has PGI compilers.  However, they don't have
libltdl installed!

-Paul


-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-03 Thread Paul Hargrove
On Mon, Feb 2, 2015 at 5:47 PM, Paul Hargrove  wrote:

> I'll report my test results more completely later, but all 4 PGI-based
> builds I have results for so far have failed with libtool replacing
> "-lltdl" in the link command line with "/usr/lib/libltdl.so" rather than the
> correct "/usr/lib64/libltdl.so".
>
> So, this is a PGI compiler issue, not a Cray one.
> I will know later if "PGI" needs to be replaced with "non-GNU".
>


All non-PGI compilers tested out fine, including Open64, PathScale,
Oracle/Studio, IBM and Intel.  I found no other problems with Jeff's
tarball that aren't also present in master.

My PGI testers (one each for v 9, 10, 11, 12, 13, and 14) are all on 2
systems at NERSC.
I am now going to see about a PGI compiler on a system at another center
(or two?) in order to see how universal the problem is.

-Paul



Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Paul Hargrove
On Mon, Feb 2, 2015 at 5:22 PM, Paul Hargrove  wrote:

> So, the overhead for me is pretty small as long as the number of failures
> is kept low.


I jinxed it!!!

I have, I believe, about 7 different failures now on various systems.
All of those appear UNRELATED to the libltdl changes.

I went ahead and reported the OpenBSD/arc4random issue, since it appears to
be a regression of something I reported against v1.8 6 months ago.
However, for the other issues I've encountered I am going to re-run against
a trunk tarball before reporting (to avoid wasting my time and yours).

-Paul



Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Paul Hargrove
On Mon, Feb 2, 2015 at 4:13 PM, Paul Hargrove  wrote:

> HOWEVER - switching from PGI to GNU compilers made the problem go away.
> So, I suspect it may be an issue with the installation/configuration of
> the PGI compilers.
>


I've reproduced the problem on a non-Cray system with four different
installations of the PGI compilers.
The system has PGI 10.x and 11.x installed by the sys admins.
It also has my private installs of 9.x and 12.x, which I know were
installed with just the defaults.

I'll report my test results more completely later, but all 4 PGI-based
builds I have results for so far have failed with libtool replacing
"-lltdl" in the link command line with "/usr/lib/libltdl.so" rather than the
correct "/usr/lib64/libltdl.so".
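
To illustrate why the order of the search path matters here, the following sketch models a first-match lookup: it walks an ordered list of directories and reports the first one that contains libNAME.so. This is a simplified model under that assumption, not libtool's actual search logic, and the function name first_match is hypothetical.

```shell
# Hypothetical helper: print the first "DIR/libNAME.so" that exists when the
# directories are searched in the order given. A simplified model of a
# first-match library lookup, not libtool's real implementation.
first_match() {
    name=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/lib$name.so" ]; then
            printf '%s\n' "$dir/lib$name.so"
            return 0
        fi
    done
    # No directory in the list contained the library.
    return 1
}
```

On a system where both directories hold a libltdl.so, "first_match ltdl /usr/lib /usr/lib64" would report the /usr/lib copy, while swapping the two directories would report the /usr/lib64 one.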

So, this is a PGI compiler issue not a Cray one.
I will know later if "PGI" needs to be replaced with "non-GNU".

-Paul



Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Paul Hargrove
Jeff,

Having already pointed my script at your tarball's URL, typing
"./test-ompi" releases about 60 "hounds".  I get an email for each system
as its tests complete, and gmail filters tag only the ones where one or
more configurations failed.  So, the overhead for me is pretty small as
long as the number of failures is kept low.

I'll report what I find, but at this point I am expecting only the Cray+PGI
issue we know about.  I am under the impression that you've fixed
everything else I had reported.

-Paul

On Mon, Feb 2, 2015 at 5:05 PM, Jeff Squyres (jsquyres) 
wrote:

> Paul --
>
> If you've got the cycles and it's easy, release the hounds on the tarball
> that I just uploaded to:
>
> http://www.open-mpi.org/~jsquyres/unofficial/
>
> Thanks!

Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Paul Hargrove
Jeff,

If you are still chasing the goal of getting this branch to "just work",
then I am willing to keep testing.  Let me know when a new tarball is ready
and I'll give it a run on all of my systems.

-Paul

On Mon, Feb 2, 2015 at 4:15 PM, Jeff Squyres (jsquyres) 
wrote:

> I had fixed it in my local tree but not yet pushed to my github branch; I
> was waiting to see what happened w.r.t. your failure on the NERSC machine.
>
> I pushed the fix up to my branch now; do you want a new tarball?
Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Paul Hargrove
On Mon, Feb 2, 2015 at 1:58 PM, Paul Hargrove  wrote:

> 2b.  I am retrying now with all of Cray's environment modules unloaded
> except the one for the PGI compiler.  Nathan had suggested something like
> this to me in the past, but I've never had issues with the default
> environment.  I will report the result when available.



The result is unchanged after unloading all the Cray environment modules.

However, I did notice that configure found (for instance) ALPS support
despite my unloading all the Cray environment modules and included a
message recognizing the system as CLE4.   So, it is possible that unloading
the modules was insufficient to avoid the Cray-specific aspects of the
system.

HOWEVER - switching from PGI to GNU compilers made the problem go away.
So, I suspect it may be an issue with the installation/configuration of the
PGI compilers.

-Paul



Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Paul Hargrove
Jeff,

Looks like you didn't hit all the un-guarded references to lt_dladvise.
Specifically you missed a struct decl:

/[]/openmpi-libltdl-linux-x86_64-gcc/openmpi-gitclone/opal/util/lt_interface.c:25:8:
error: unknown type name 'lt_dladvise'
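
One rough way to hunt for remaining un-guarded references like this one is to list every line that mentions the symbol while no preprocessor conditional is open. The sketch below is a heuristic only: it tracks #if/#ifdef/#ifndef nesting depth but does not check which macro guards a region, and it treats #else branches as guarded. The function name unguarded_refs is hypothetical.

```shell
# Hypothetical helper: print "LINE:TEXT" for each line of FILE that mentions
# SYMBOL while no #if/#ifdef/#ifndef region is open. Heuristic only: it does
# not look at which macro guards a region, and #else branches count as guarded.
unguarded_refs() {
    symbol=$1
    file=$2
    awk -v sym="$symbol" '
        /^[ \t]*#[ \t]*if/    { depth++; next }
        /^[ \t]*#[ \t]*endif/ { if (depth > 0) depth--; next }
        depth == 0 && index($0, sym) { print NR ":" $0 }
    ' "$file"
}
```

For example, "unguarded_refs lt_dladvise opal/util/lt_interface.c" would print any line outside a conditional block that still names the type.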

-Paul


On Sat, Jan 31, 2015 at 4:44 AM, Jeff Squyres (jsquyres)  wrote:

> Looks like the lt_interface.c code didn't properly use the lt_dladvise
> #if. How did that ever work, I wonder?
>
> Fixed now.  On to your second finding...
>
>
> > On Jan 30, 2015, at 7:42 PM, Paul Hargrove  wrote:
> >
> > Not meeting with the greatest of success.
> > This is a report of just the first (of at least 2) failure modes I am
> > seeing.
> >
> > On a Scientific Linux 5.5 system (a RHEL-5.5 clone like CentOS) I get
> > the build failure described below.
> > At least Solaris-11 and a few other Linux systems (including RHAS-4.4)
> > are also failing in what appears to be the same manner.
> > I am sure there are more, but I am aborting this round of testing at
> > this point.
> >
> > I again await a new tarball with a less broken-by-default behavior.
> >
> > -Paul
> >
> >
> > The configure output includes
> > checking ltdl.h usability... yes
> > checking ltdl.h presence... yes
> > checking for ltdl.h... yes
> > looking for library without search path
> > checking for lt_dlopen in -lltdl... yes
> > checking for lt_dladvise_init... no
> > configure: WARNING: *
> > configure: WARNING: Could not find lt_dladvise_init in libltdl
> > configure: WARNING: This could mean that your libltdl version
> > configure: WARNING: is old.  If you could upgrade, that would be great.
> > configure: WARNING: *
> > checking for lt_dladvise... no
> >
> > However, it looks like opal/util/lt_interface.c is still attempting to
> > call lt_dladvise:
> > PGC-S-0040-Illegal use of symbol, lt_dladvise (/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-pgi-12.10/openmpi-gitclone/opal/util/lt_interface.c: 25)
> > PGC-W-0156-Type not specified, 'int' assumed (/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-pgi-12.10/openmpi-gitclone/opal/util/lt_interface.c: 25)
> > PGC/x86-64 Linux 12.10-0: compilation completed with severe errors
> >
> > The output of "libtool --version" says "1.5.22" and we have
> > libltdl.so.3.1.4.
> > However, the rpm database is not readable, preventing me from checking
> > the package version associated with the libltdl.
> >
> > The failing Solaris-11/x86-64 system says 1.5.22 without any ambiguity:
> > $ pkg info libltdl | grep Version
> >    Version: 1.5.22
> >
> >
> > -Paul
> >
> >
> >
> >
> >
> >
> >
> > On Fri, Jan 30, 2015 at 3:51 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> > New tarball posted (same location).  Now featuring 100% fewer "make
> > check" failures.
> >
> > http://www.open-mpi.org/~jsquyres/unofficial/
> >
> >
> > > On Jan 30, 2015, at 5:14 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> > >
> > > Shame on me for not running "make check".
> > >
> > > Fixing...
> > >
> > >
> > >> On Jan 30, 2015, at 4:58 PM, Paul Hargrove wrote:
> > >>
> > >> Jeff,
> > >>
> > >> I ran on just one (Mac OS X 10.8) system first as a "smoke test".
> > >> It encountered the failure shown below on "make check", at which point
> > >> I decided not to test 60+ platforms.
> > >> Please advise how I should proceed (best guess is to wait for a new
> > >> tarball).
> > >>
> > >> -Paul
> > >>
> > >> Making check in test
> > >> Making check in support
> > >> make  libsupport.a
> > >>  CC   components.o
> > >> /Users/Paul/OMPI/openmpi-libltdl-macos10.8-x86-clang/openmpi-gitclone/test/support/components.c:27:10: fatal error: 'opal/libltdl/ltdl.h' file not found
> > >> #include "opal/libltdl/ltdl.h"
> > >> ^
> > >>
> > >>
> > >> On Fri, Jan 30, 2015 at 1:29 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> > >> On Jan 30, 2015, at 2:46 PM, Paul Hargrove wrote:
> > >>>
> > >>> If I had new enough autotools to autogen on this old system then I
> > >>> wouldn't have asked about libltdl from libtool-1.4.  So, please *do*
> > >>> generate a tarball and I will test (on *all* of my systems).
> > >>
> > >> Sweet, thank you.  I just posted a tarball here:
> > >>
> > >>http://www.open-mpi.org/~jsquyres/unofficial/
> > >>

Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Christopher Samuel
On 03/02/15 05:09, Ralph Castain wrote:

> Just out of curiosity: I see you are reporting about a build on the
> headnode of a BG cluster. We've never ported OMPI to BG - are you using
> it on such a system? Or were you just test building the code on a
> convenient server?

Just a convenient server with a not-so-mainstream architecture (and an
older RHEL release through necessity).  Sorry to get your hopes up! :-)

All the best,
Chris
-- 
 Christopher SamuelSenior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Jeff Squyres (jsquyres)
On Feb 2, 2015, at 5:24 PM, Jeff Squyres (jsquyres)  wrote:
> 
> IANAL, but after talking through the license stuff, we think there will be 
> new license issues caused by --disable-dlopen behavior.

ARRGH -- that should have been:

...we think there will be ***NO*** new license issues caused by 
--disable-dlopen behavior.

Sorry for any confusion!

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Jeff Squyres (jsquyres)
Ralph and I just chatted about this on the phone.

IANAL, but after talking through the license stuff, we think there will be new 
license issues caused by --disable-dlopen behavior.

It feels like there are a lot of unexpected issues coming up with (more-or-less) 
causing (most?) people to build with --disable-dlopen:

- (probably?) larger libraries and process memory footprint
- wonky behavior on Cray/NERSC Hopper system (but perhaps Howard will solve 
that one?)
- after talking to Howard and Rolf today, might well need to (re)add 
--with-libltdl=DIR support (for libltdl installed in non-standard locations)
- difference in behavior between git clone builds (require libltdl by default) 
and production builds (build libltdl support or not)
- it seems that there are valid use cases where people want to add plugins to 
existing Open MPI installations

It might well be worth investigating manually embedding libltdl ourselves 
(i.e., git committing libltdl vs. having autogen copy it in).  The 
bootstrapping will be a bit different; Dave raised the point last week that 
it's not guaranteed that this will work -- would need to be investigated.




> On Feb 2, 2015, at 2:25 PM, Jeff Squyres (jsquyres)  
> wrote:
> 
> Uuuurggghhh.
> 
> More below.
> 
> 
>> On Feb 2, 2015, at 1:04 PM, Ralph Castain  wrote:
>> 
>> Returning to the libltdl question: I think we may have a problem here. If we 
>> remove libltdl and default to disable-dlopen, then the user will - without 
>> warning - slurp all components that are able to build into libompi. This 
>> includes everything they specified, BUT because of our "build if you can" 
>> policy, it also includes a lot of stuff that they didn't specify and may not 
>> even realize is present.
> 
> Yes, this is true -- the size of libmpi.so (etc.) will actually go up.
> 
> 
> 
> It would be an interesting experiment to see if the process size actually 
> increases.  When you dlopen() a DSO, it's loaded into distinct pages -- even 
> components that are fairly small (e.g., mca_btl_self.so is 63726 bytes on my 
> system) are automatically inflated to be multiples of 4K.  When all the 
> components are packed into libmpi.so (etc), the end result is actually 
> smaller.
> 
> That being said, when built as DSOs, OMPI can (and likely does) dlclose 
> components that you don't use at run time.  You obviously can't do that when 
> all the components are in libmpi.so (etc.).  Meaning: there are forces pulling 
> both ways here -- I wonder whether users will typically grow or shrink their 
> process sizes...?
> 
> The answer may be an obvious "your process will grow", but it may not be.  If 
> someone has some spare cycles (hah!), this would be an interesting 
> experiment.  :-)
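
As a small worked example of the page rounding described above, this sketch rounds a file size up to the next multiple of the page size. The 4096-byte default and the 63726-byte figure come from the message; the function name round_up_to_page is hypothetical, and a real loader maps each segment separately, so the true overhead of a DSO may differ from this single rounding.

```shell
# Hypothetical helper: round SIZE (bytes) up to the next multiple of PAGE
# (bytes, default 4096). Models one rounding of the whole file only.
round_up_to_page() {
    size=$1
    page=${2:-4096}
    echo $(( ($size + $page - 1) / $page * $page ))
}
```

With the mca_btl_self.so size quoted above, "round_up_to_page 63726" gives 65536, i.e. 16 pages, of which 1810 bytes are padding.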
> 
> 
> 
> We've had these discussions before; the conclusion of which was to ensure 
> that we provide "--disable" and "--without" options for those people who know 
> exactly what they want, and don't want anything else.
> 
> So Ralph -- I hear the cautionary warning that you're raising.  Are the 
> --disable/--without options no longer viable?
> 
>> As a result, they not only will have a bloated memory footprint, but they 
>> also may very well have slurped in GPL libraries (e.g., if Slurm is present) 
>> that could potentially impact their legal situation. We may need to 
>> reconsider our build policy in light of this situation.
> 
> IANAL and all that.
> 
> If you're distributing binaries, my understanding is that this doesn't change 
> your legal situation.  I.e., if you're a) building an OMPI component that 
> links against GPL libraries, and then b) distributing those binaries, it 
> doesn't matter if you built the component as a DSO or as part of (for 
> example) libmpi.so.
> 
> -
> 
> All that being said, yes, removing our default model of plugins is a *big* 
> change.  There are many subtle issues involved (including those that Ralph 
> brought up in this mail).
> 
> If we want to keep this model (plugins by default), the only way I can think 
> of to do that is to manually embed libltdl ourselves.
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Paul Hargrove
Jeff and Howard,

Just a couple minor points:

1.  In case one has lost track, the reason the behavior described by Jeff
is erroneous is that /usr/lib contains 32-bit libs (and target is 64-bit).
Therefore libtool should have replaced -lltdl with /usr/lib64/libltdl.so
(if at all).

2a.  Jeff does raise a good point that the problem might be Cray-specific.
It is worth noting that I was performing a build for the login node (not
the compute nodes), using the PGI-14.7.0 compiler.  Configure options are
--prefix-...  --enable-debug CC=pgcc CXX=pgCC FC=pgf90

2b.  I am retrying now with all of Cray's environment modules unloaded
except the one for the PGI compiler.  Nathan had suggested something like
this to me in the past, but I've never had issues with the default
environment.  I will report the result when available.
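
For anyone wanting to confirm point 1 on their own system, a quick check is to read the ELF class byte of each candidate library. The sketch below assumes an od that supports the POSIX -j and -N options; the function name elf_class is hypothetical. Byte 4 of the ELF identification (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit objects.

```shell
# Hypothetical helper: print 32 or 64 according to the EI_CLASS byte
# (offset 4) of an ELF file; print "unknown" and return 1 otherwise.
elf_class() {
    # The first four bytes of an ELF file are 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    [ "$magic" = "7f454c46" ] || { echo unknown; return 1; }
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case $class in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo unknown; return 1 ;;
    esac
}
```

Running "elf_class /usr/lib/libltdl.so" and "elf_class /usr/lib64/libltdl.so" would show which of the two matches a 64-bit target.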

-Paul

On Mon, Feb 2, 2015 at 1:18 PM, Jeff Squyres (jsquyres) 
wrote:

> Re-adding devel, since Paul sent me the logs off-list.
>
> (per Ralph's comment, we may or may not stick with this
> don't-build-libltdl philosophy, but I'd still like to run this issue down)
>
> Howard: see Paul's notes below.  It's on the Hopper system at NERSC.
>
> Do you have any Cray insight here?  (see below for the exact issue)
>
>
> > On Feb 1, 2015, at 3:52 AM, Paul Hargrove  wrote:
> >
> > Jeff (off-list),
> >
> > Original make was with V=1, so I skipped the "make clean" before
> > "record/send the output of make w/ V=1".
> > All the requested files should be in the attached .tar.bz2.
> >
> > What I see from configure is the following, which is explicit about
> > "without search path":
> > configure:63392: result: looking for library without search path
> > configure:63394: checking for lt_dlopen in -lltdl
> > configure:63419: pgcc -o conftest -g   conftest.c -lltdl  -lrt -lutil  >&5
> > configure:63419: $? = 0
> > configure:63428: result: yes
> >
> > The "make V=1" shows "-lltdl" passed to libtool in the line before the
> > one I quoted previously.
> > Libtool then *instead* passes "/usr/lib/libltdl.so" to the link command.
> > So, I've included the generated config.lt, which appears to place
> > /usr/lib ahead of /usr/lib64 in its search path(s).
> >
> > Let me know what else you may need.
> > This is on NERSC's Hopper, where Howard and Nathan both have accounts
> > (though I did see the message about Nathan taking some time off).
>
> Here's the full output from the logs that Paul sent to me -- you can see
> that the Makefile passes "-lltdl" and then libtool converts it to
> /usr/lib/libltdl.so:
>
> /bin/sh ../libtool  --tag=CC   --mode=link pgcc
> -DOPAL_CONFIGURE_HOST="\"hopper09\"" -g  -version-info 0:0:0  -o
> libopen-pal.la -rpath
> /scratch/scratchdirs/hargrove/OMPI/openmpi-libltdl-linux-x86_64-pgi-14.7/INST/lib
> class/opal_bitmap.lo class/opal_free_list.lo class/opal_hash_table.lo
> class/opal_hotel.lo class/opal_tree.lo class/opal_list.lo
> class/opal_object.lo class/opal_graph.lo class/opal_lifo.lo
> class/opal_fifo.lo class/opal_pointer_array.lo class/opal_value_array.lo
> class/opal_ring_buffer.lo class/opal_rb_tree.lo class/ompi_free_list.lo
> memoryhooks/memory.lo runtime/opal_progress.lo runtime/opal_finalize.lo
> runtime/opal_init.lo runtime/opal_params.lo runtime/opal_cr.lo
> runtime/opal_info_support.lo runtime/opal_progress_threads.lo
> threads/condition.lo threads/mutex.lo threads/thread.lo
> dss/dss_internal_functions.lo dss/dss_compare.lo dss/dss_copy.lo
> dss/dss_dump.lo dss/dss_load_unload.lo dss/dss_lookup.lo dss/dss_pack.lo
> dss/dss_peek.lo dss/dss_print.lo dss/dss_register.lo dss/dss_unpack.lo
> dss/dss_open_close.lo asm/libasm.la datatype/libdatatype.la mca/base/
> libmca_base.la util/libopalutil.la  mca/allocator/libmca_allocator.la
> mca/backtrace/libmca_backtrace.la mca/backtrace/execinfo/
> libmca_backtrace_execinfo.la  mca/btl/libmca_btl.la  mca/compress/
> libmca_compress.la  mca/crs/libmca_crs.la  mca/dstore/libmca_dstore.la
> mca/event/libmca_event.la mca/event/libevent2022/
> libmca_event_libevent2022.la  mca/hwloc/libmca_hwloc.la
> mca/hwloc/hwloc191/libmca_hwloc_hwloc191.la  mca/if/libmca_if.la
> mca/if/posix_ipv4/libmca_if_posix_ipv4.la mca/if/linux_ipv6/
> libmca_if_linux_ipv6.la  mca/installdirs/libmca_installdirs.la
> mca/installdirs/config/libmca_installdirs_config.la mca/installdirs/env/
> libmca_installdirs_env.la  mca/memchecker/libmca_memchecker.la
> mca/memcpy/libmca_memcpy.la  mca/memory/libmca_memory.la mca/memory/linux/
> libmca_memory_linux.la  mca/mpool/libmca_mpool.la  mca/pmix/libmca_pmix.la
> mca/pstat/libmca_pstat.la  mca/rcache/libmca_rcache.la  mca/sec/
> libmca_sec.la  mca/shmem/libmca_shmem.la  mca/timer/libmca_timer.la
> mca/timer/linux/libmca_timer_linux.la  -lrt -lutil  -lltdl   -lrt -lutil
> -lltdl
> libtool: link: pgcc -shared  -fpic -DPIC  class/.libs/opal_bitmap.o
> class/.libs/opal_free_list.o class/.libs/opal_hash_table.o
> class/.libs/opal_hotel.o class/.libs/opal_tree.o class/.libs/opal_list.o
> 

Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Jeff Squyres (jsquyres)
Re-adding devel, since Paul sent me the logs off-list.

(per Ralph's comment, we may or may not stick with this don't-build-libltdl 
philosophy, but I'd still like to run this issue down)

Howard: see Paul's notes below.  It's on the Hopper system at NERSC.

Do you have any Cray insight here?  (see below for the exact issue)


> On Feb 1, 2015, at 3:52 AM, Paul Hargrove  wrote:
> 
> Jeff (off-list),
> 
> Original make was with V=1, so I skipped the "make clean" before "record/send 
> the output of make w/ V=1".
> All the requested files should be in the attached .tar.bz2.
> 
> What I see from configure is the following, which is explicit about 
> "without search path":
> configure:63392: result: looking for library without search path
> configure:63394: checking for lt_dlopen in -lltdl
> configure:63419: pgcc -o conftest -g   conftest.c -lltdl  -lrt -lutil  >&5
> configure:63419: $? = 0
> configure:63428: result: yes
> 
> The "make V=1" shows "-lltdl" passed to libtool in the line before the one I 
> quoted previously.
> Libtool then *instead* passes "/usr/lib/libltdl.so" to the link command.
> So, I've included the generated config.lt, which appears to place /usr/lib 
> ahead of /usr/lib64 in its search path(s).
> 
> Let me know what else you may need.
> This is on NERSC's Hopper, where Howard and Nathan both have accounts (though 
> I did see the message about Nathan taking some time off).
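
To make the failure mode concrete, here is a rough sketch of why a search path that lists /usr/lib ahead of /usr/lib64 can end up picking the 32-bit library.  This is a toy model, not libtool's actual logic: first_match(), elf_class() and demo() are hypothetical helpers, and the directories are stand-ins created under a temporary root.  The only real-world fact it relies on is the ELF header layout (magic bytes, then the class byte: 1 for 32-bit, 2 for 64-bit).

```python
import os
import tempfile

# e_ident[EI_CLASS] values from the ELF header: 1 = ELFCLASS32, 2 = ELFCLASS64
ELFCLASS = {1: "32-bit", 2: "64-bit"}


def elf_class(path):
    """Return '32-bit' or '64-bit' based on the ELF header, or None if not ELF."""
    with open(path, "rb") as f:
        ident = f.read(5)
    if len(ident) < 5 or ident[:4] != b"\x7fELF":
        return None
    return ELFCLASS.get(ident[4])


def first_match(libname, search_dirs):
    """Naive search (an assumption, not libtool's real algorithm):
    return the first <dir>/lib<name>.so that exists, without checking
    whether it matches the target word size."""
    for d in search_dirs:
        candidate = os.path.join(d, "lib%s.so" % libname)
        if os.path.exists(candidate):
            return candidate
    return None


def demo():
    """Build stand-in usr/lib (32-bit) and usr/lib64 (64-bit) directories
    and resolve 'ltdl' with both search orders."""
    root = tempfile.mkdtemp()
    lib32 = os.path.join(root, "usr", "lib")
    lib64 = os.path.join(root, "usr", "lib64")
    for d, cls in ((lib32, 1), (lib64, 2)):
        os.makedirs(d)
        # Minimal fake e_ident: magic, class byte, zero padding.
        with open(os.path.join(d, "libltdl.so"), "wb") as f:
            f.write(b"\x7fELF" + bytes([cls]) + b"\0" * 11)
    lib_first = first_match("ltdl", [lib32, lib64])
    lib64_first = first_match("ltdl", [lib64, lib32])
    return elf_class(lib_first), elf_class(lib64_first)
```

With lib listed first, the naive search returns the 32-bit file, which is consistent with the "File in wrong format" error from the 64-bit link.  Running elf_class() on the real /usr/lib/libltdl.so and /usr/lib64/libltdl.so would be one quick way to confirm what each path holds on the affected system.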

Here's the full output from the logs that Paul sent to me -- you can see that 
the Makefile passes "-lltdl" and then libtool converts it to 
/usr/lib/libltdl.so:

/bin/sh ../libtool  --tag=CC   --mode=link pgcc 
-DOPAL_CONFIGURE_HOST="\"hopper09\"" -g  -version-info 0:0:0  -o libopen-pal.la 
-rpath 
/scratch/scratchdirs/hargrove/OMPI/openmpi-libltdl-linux-x86_64-pgi-14.7/INST/lib
  class/opal_bitmap.lo class/opal_free_list.lo class/opal_hash_table.lo 
class/opal_hotel.lo class/opal_tree.lo class/opal_list.lo class/opal_object.lo 
class/opal_graph.lo class/opal_lifo.lo class/opal_fifo.lo 
class/opal_pointer_array.lo class/opal_value_array.lo class/opal_ring_buffer.lo 
class/opal_rb_tree.lo class/ompi_free_list.lo memoryhooks/memory.lo 
runtime/opal_progress.lo runtime/opal_finalize.lo runtime/opal_init.lo 
runtime/opal_params.lo runtime/opal_cr.lo runtime/opal_info_support.lo 
runtime/opal_progress_threads.lo threads/condition.lo threads/mutex.lo 
threads/thread.lo dss/dss_internal_functions.lo dss/dss_compare.lo 
dss/dss_copy.lo dss/dss_dump.lo dss/dss_load_unload.lo dss/dss_lookup.lo 
dss/dss_pack.lo dss/dss_peek.lo dss/dss_print.lo dss/dss_register.lo 
dss/dss_unpack.lo dss/dss_open_close.lo asm/libasm.la datatype/libdatatype.la 
mca/base/libmca_base.la util/libopalutil.la  mca/allocator/libmca_allocator.la  
mca/backtrace/libmca_backtrace.la 
mca/backtrace/execinfo/libmca_backtrace_execinfo.la  mca/btl/libmca_btl.la  
mca/compress/libmca_compress.la  mca/crs/libmca_crs.la  
mca/dstore/libmca_dstore.la  mca/event/libmca_event.la 
mca/event/libevent2022/libmca_event_libevent2022.la  mca/hwloc/libmca_hwloc.la 
mca/hwloc/hwloc191/libmca_hwloc_hwloc191.la  mca/if/libmca_if.la 
mca/if/posix_ipv4/libmca_if_posix_ipv4.la 
mca/if/linux_ipv6/libmca_if_linux_ipv6.la  
mca/installdirs/libmca_installdirs.la 
mca/installdirs/config/libmca_installdirs_config.la 
mca/installdirs/env/libmca_installdirs_env.la  
mca/memchecker/libmca_memchecker.la  mca/memcpy/libmca_memcpy.la  
mca/memory/libmca_memory.la mca/memory/linux/libmca_memory_linux.la  
mca/mpool/libmca_mpool.la  mca/pmix/libmca_pmix.la  mca/pstat/libmca_pstat.la  
mca/rcache/libmca_rcache.la  mca/sec/libmca_sec.la  mca/shmem/libmca_shmem.la  
mca/timer/libmca_timer.la mca/timer/linux/libmca_timer_linux.la  -lrt -lutil  
-lltdl   -lrt -lutil  -lltdl  
libtool: link: pgcc -shared  -fpic -DPIC  class/.libs/opal_bitmap.o 
class/.libs/opal_free_list.o class/.libs/opal_hash_table.o 
class/.libs/opal_hotel.o class/.libs/opal_tree.o class/.libs/opal_list.o 
class/.libs/opal_object.o class/.libs/opal_graph.o class/.libs/opal_lifo.o 
class/.libs/opal_fifo.o class/.libs/opal_pointer_array.o 
class/.libs/opal_value_array.o class/.libs/opal_ring_buffer.o 
class/.libs/opal_rb_tree.o class/.libs/ompi_free_list.o 
memoryhooks/.libs/memory.o runtime/.libs/opal_progress.o 
runtime/.libs/opal_finalize.o runtime/.libs/opal_init.o 
runtime/.libs/opal_params.o runtime/.libs/opal_cr.o 
runtime/.libs/opal_info_support.o runtime/.libs/opal_progress_threads.o 
threads/.libs/condition.o threads/.libs/mutex.o threads/.libs/thread.o 
dss/.libs/dss_internal_functions.o dss/.libs/dss_compare.o dss/.libs/dss_copy.o 
dss/.libs/dss_dump.o dss/.libs/dss_load_unload.o dss/.libs/dss_lookup.o 
dss/.libs/dss_pack.o dss/.libs/dss_peek.o dss/.libs/dss_print.o 
dss/.libs/dss_register.o dss/.libs/dss_unpack.o dss/.libs/dss_open_close.o  

Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Jeff Squyres (jsquyres)
Uuuurggghhh.

More below.


> On Feb 2, 2015, at 1:04 PM, Ralph Castain  wrote:
> 
> Returning to the libltdl question: I think we may have a problem here. If we 
> remove libltdl and default to disable-dlopen, then the user will - without 
> warning - slurp all components that are able to build into libompi. This 
> includes everything they specified, BUT because of our "build if you can" 
> policy, it also includes a lot of stuff that they didn't specify and may not 
> even realize is present.

Yes, this is true -- the size of libmpi.so (etc.) will actually go up.



It would be an interesting experiment to see if the process size actually 
increases.  When you dlopen() a DSO, it's loaded into distinct pages -- even 
components that are fairly small (e.g., mca_btl_self.so is 63726 bytes on my 
system) are automatically inflated to be multiples of 4K.  When all the 
components are packed into libmpi.so (etc), the end result is actually smaller.
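
As a back-of-the-envelope illustration of the rounding effect, here is a small sketch.  It assumes a 4096-byte page and treats each DSO as one contiguous mapping, which is a simplification (a real DSO is mapped as several segments, each with its own alignment).  The helper names are hypothetical, and every size other than the 63726-byte mca_btl_self.so figure is made up for the example.

```python
PAGE_SIZE = 4096  # the 4K page size mentioned above; other systems may differ


def pages_needed(nbytes, page_size=PAGE_SIZE):
    """Number of whole pages needed to hold nbytes (ceiling division)."""
    return -(-nbytes // page_size)


def mapped_size(nbytes, page_size=PAGE_SIZE):
    """Size after rounding up to a whole number of pages."""
    return pages_needed(nbytes, page_size) * page_size


def total_separate(sizes, page_size=PAGE_SIZE):
    """Each component rounded up on its own, as with separately loaded DSOs."""
    return sum(mapped_size(s, page_size) for s in sizes)


def total_packed(sizes, page_size=PAGE_SIZE):
    """All components concatenated first, then rounded up once."""
    return mapped_size(sum(sizes), page_size)


# The one real number from this mail: a 63726-byte component.
print(pages_needed(63726))          # 16
print(mapped_size(63726))           # 65536
print(mapped_size(63726) - 63726)   # 1810 bytes of padding
```

Under this simplified model, total_separate() is never smaller than total_packed() for the same list of sizes, which is the "packed is smaller" half of the argument.  It says nothing about the dlclose half, so it doesn't settle the question either way.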

That being said, when built as DSOs, OMPI can (and likely does) dlclose 
components that you don't use at run time.  You obviously can't do that when 
all the components are in libmpi.so (etc.).  Meaning: there are forces pulling 
both ways here -- I wonder whether users will typically grow or shrink their 
process sizes...?

The answer may be an obvious "your process will grow", but it may not be.  If 
someone has some spare cycles (hah!), this would be an interesting experiment.  
:-)



We've had these discussions before; the conclusion was to ensure that we 
provide "--disable" and "--without" options for those people who know 
exactly what they want, and don't want anything else.

So Ralph -- I hear the cautionary warning that you're raising.  Are the 
--disable/--without options no longer viable?
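
To spell out the interaction between "build if you can" and explicit opt-outs, here is a toy model.  It is not how OMPI's configure is implemented: components_built() is a hypothetical helper, and the component names in the usage note are only placeholders.

```python
def components_built(buildable, requested=(), disabled=()):
    """Toy model of a "build if you can" policy.

    buildable: components whose prerequisites were found on the system
    requested: components the user explicitly asked for
    disabled:  components the user explicitly turned off

    Returns (built, unrequested): everything that ends up in the build,
    and the subset the user never asked for.
    """
    requested = set(requested)
    disabled = set(disabled)
    # Everything that can build does build, unless explicitly disabled.
    built = [name for name in buildable if name not in disabled]
    # These are the components the user may not realize are present.
    unrequested = [name for name in built if name not in requested]
    return built, unrequested
```

For example, components_built(["self", "tcp", "slurm"], requested=["tcp"]) reports "self" and "slurm" as unrequested, and adding disabled=["slurm"] drops the latter.  The point is only that the opt-out has to be explicit; nothing is excluded by default.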

> As a result, they not only will have a bloated memory footprint, but they 
> also may very well have slurped in GPL libraries (e.g., if Slurm is present) 
> that could potentially impact their legal situation. We may need to 
> reconsider our build policy in light of this situation.

IANAL and all that.

If you're distributing binaries, my understanding is that this doesn't change 
your legal situation.  I.e., if you're a) building an OMPI component that links 
against GPL libraries, and then b) distributing those binaries, it doesn't 
matter if you built the component as a DSO or as part of (for example) 
libmpi.so.

-

All that being said, yes, removing our default model of plugins is a *big* 
change.  There are many subtle issues involved (including those that Ralph 
brought up in this mail).

If we want to keep this model (plugins by default), the only way I can think of 
to do that is to manually embed libltdl ourselves.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Ralph Castain
Hi Chris

Just out of curiosity: I see you are reporting about a build on the
headnode of a BG cluster. We've never ported OMPI to BG - are you using it
on such a system? Or were you just test building the code on a convenient
server?

Ralph


On Mon, Feb 2, 2015 at 3:52 AM, Chris Samuel  wrote:

> On Mon, 2 Feb 2015 11:35:40 AM Jeff Squyres wrote:
>
> > Ah -- the point being that this is not an issue related to the libltdl
> work.
>
> Sorry - I saw the request to test the tarball and tried it out, missed the
> significance of the subject. :-/
>
> --
>  Christopher Samuel        Senior Systems Administrator
>  VLSCI - Victorian Life Sciences Computation Initiative
>  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
>  http://www.vlsci.org.au/  http://twitter.com/vlsci
>


Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Ralph Castain
Returning to the libltdl question: I think we may have a problem here. If
we remove libltdl and default to disable-dlopen, then the user will -
without warning - slurp all components that are able to build into libompi.
This includes everything they specified, BUT because of our "build if you
can" policy, it also includes a lot of stuff that they didn't specify and
may not even realize is present.

As a result, they not only will have a bloated memory footprint, but they
also may very well have slurped in GPL libraries (e.g., if Slurm is
present) that could potentially impact their legal situation. We may need
to reconsider our build policy in light of this situation.


On Mon, Feb 2, 2015 at 3:35 AM, Jeff Squyres (jsquyres) 
wrote:

> Ah -- the point being that this is not an issue related to the libltdl
> work.
>
>
> > On Feb 2, 2015, at 2:51 AM, Adrian Reber  wrote:
> >
> > I have reported the same error a few days ago and submitted it now as a
> > github issue: https://github.com/open-mpi/ompi/issues/371
> >
> > On Mon, Feb 02, 2015 at 12:36:54PM +1100, Christopher Samuel wrote:
> >> On 31/01/15 10:51, Jeff Squyres (jsquyres) wrote:
> >>
> >>> New tarball posted (same location).  Now featuring 100% fewer "make
> check" failures.
> >>
> >> On our BG/Q front-end node (PPC64, RHEL 6.4) I see:
> >>
> >> ../../config/test-driver: line 95: 30173 Segmentation fault  (core
> dumped) "$@" > $log_file 2>&1
> >> FAIL: opal_lifo
> >>
> >> Stack trace implies the culprit is in:
> >>
> >> #0  0x10001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
> >>    at
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
> >> 51  old = *addr;
> >>
> >> I've attached a script of gdb doing "thread apply all bt full" in
> >> case that's helpful.
> >>
> >> All the best,
> >> Chris
> >> --
> >> Christopher Samuel        Senior Systems Administrator
> >> VLSCI - Victorian Life Sciences Computation Initiative
> >> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
> >> http://www.vlsci.org.au/  http://twitter.com/vlsci
> >>
> >
> >> Script started on Mon 02 Feb 2015 12:32:56 EST
> >>
> >> [samuel@avoca class]$ gdb
> /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/test/class/.libs/lt-opal_lifo
> core.32444
> >> GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
> >> Copyright (C) 2010 Free Software Foundation, Inc.
> >> License GPLv3+: GNU GPL version 3 or later <
> http://gnu.org/licenses/gpl.html>
> >> This is free software: you are free to change and redistribute it.
> >> There is NO WARRANTY, to the extent permitted by law.  Type "show
> copying"
> >> and "show warranty" for details.
> >> This GDB was configured as "ppc64-redhat-linux-gnu".
> >> For bug reporting instructions, please see:
> >> ...
> >> Reading symbols from
> /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/test/class/.libs/lt-opal_lifo...done.
> >> [New Thread 32465]
> >> [New Thread 32464]
> >> [New Thread 32466]
> >> [New Thread 32444]
> >> [New Thread 32469]
> >> [New Thread 32467]
> >> [New Thread 32470]
> >> [New Thread 32463]
> >> [New Thread 32468]
> >> Missing separate debuginfo for
> /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/opal/.libs/libopen-pal.so.0
> >> Try: yum --disablerepo='*' --enablerepo='*-debug*' install
> /usr/lib/debug/.build-id/de/a09192aa84bbc15579ae5190dc8acd16eb94fe
> >> Missing separate debuginfo for /usr/local/slurm/14.03.10/lib/libpmi.so.0
> >> Try: yum --disablerepo='*' --enablerepo='*-debug*' install
> /usr/lib/debug/.build-id/28/09dfc4706ed44259cc31a5898c8d1a9b76b949
> >> Missing separate debuginfo for
> /usr/local/slurm/14.03.10/lib/libslurm.so.27
> >> Try: yum --disablerepo='*' --enablerepo='*-debug*' install
> /usr/lib/debug/.build-id/e2/39d8a2994ae061ab7ada0ebb7719b8efa5de96
> >> Missing separate debuginfo for
> >> Try: yum --disablerepo='*' --enablerepo='*-debug*' install
> /usr/lib/debug/.build-id/1a/063e3d64bb5560021ec2ba5329fb1e420b470f
> >> Reading symbols from
> /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/opal/.libs/libopen-pal.so.0...done.
> >> Loaded symbols for
> /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/opal/.libs/libopen-pal.so.0
> >> Reading symbols from /usr/local/slurm/14.03.10/lib/libpmi.so.0...done.
> >> Loaded symbols for /usr/local/slurm/14.03.10/lib/libpmi.so.0
> >> Reading symbols from
> /usr/local/slurm/14.03.10/lib/libslurm.so.27...done.
> >> Loaded symbols for /usr/local/slurm/14.03.10/lib/libslurm.so.27
> >> Reading symbols from /lib64/libdl.so.2...(no debugging symbols
> found)...done.
> >> Loaded symbols for /lib64/libdl.so.2
> >> Reading symbols from /lib64/libpthread.so.0...(no debugging symbols
> found)...done.
> >> [Thread debugging using libthread_db enabled]
> >> Loaded symbols for /lib64/libpthread.so.0
> >> Reading symbols from /lib64/librt.so.1...(no debugging symbols
> found)...done.
> >> Loaded symbols for /lib64/librt.so.1
> >> Reading symbols from 

Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Chris Samuel
On Mon, 2 Feb 2015 11:35:40 AM Jeff Squyres wrote:

> Ah -- the point being that this is not an issue related to the libltdl work.

Sorry - I saw the request to test the tarball and tried it out, missed the 
significance of the subject. :-/

-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Jeff Squyres (jsquyres)
Ah -- the point being that this is not an issue related to the libltdl work.


> On Feb 2, 2015, at 2:51 AM, Adrian Reber  wrote:
> 
> I have reported the same error a few days ago and submitted it now as a
> github issue: https://github.com/open-mpi/ompi/issues/371
> 
> On Mon, Feb 02, 2015 at 12:36:54PM +1100, Christopher Samuel wrote:
>> On 31/01/15 10:51, Jeff Squyres (jsquyres) wrote:
>> 
>>> New tarball posted (same location).  Now featuring 100% fewer "make check" 
>>> failures.
>> 
>> On our BG/Q front-end node (PPC64, RHEL 6.4) I see:
>> 
>> ../../config/test-driver: line 95: 30173 Segmentation fault  (core 
>> dumped) "$@" > $log_file 2>&1
>> FAIL: opal_lifo
>> 
>> Stack trace implies the culprit is in:
>> 
>> #0  0x10001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
>>    at 
>> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
>> 51  old = *addr;
>> 
>> I've attached a script of gdb doing "thread apply all bt full" in
>> case that's helpful.
>> 
>> All the best,
>> Chris
>> -- 
>> Christopher Samuel        Senior Systems Administrator
>> VLSCI - Victorian Life Sciences Computation Initiative
>> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
>> http://www.vlsci.org.au/  http://twitter.com/vlsci
>> 
> 
>> Script started on Mon 02 Feb 2015 12:32:56 EST
>> 
>> [samuel@avoca class]$ gdb 
>> /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/test/class/.libs/lt-opal_lifo 
>> core.32444
>> GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
>> Copyright (C) 2010 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later 
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>> and "show warranty" for details.
>> This GDB was configured as "ppc64-redhat-linux-gnu".
>> For bug reporting instructions, please see:
>> ...
>> Reading symbols from 
>> /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/test/class/.libs/lt-opal_lifo...done.
>> [New Thread 32465]
>> [New Thread 32464]
>> [New Thread 32466]
>> [New Thread 32444]
>> [New Thread 32469]
>> [New Thread 32467]
>> [New Thread 32470]
>> [New Thread 32463]
>> [New Thread 32468]
>> Missing separate debuginfo for 
>> /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/opal/.libs/libopen-pal.so.0
>> Try: yum --disablerepo='*' --enablerepo='*-debug*' install 
>> /usr/lib/debug/.build-id/de/a09192aa84bbc15579ae5190dc8acd16eb94fe
>> Missing separate debuginfo for /usr/local/slurm/14.03.10/lib/libpmi.so.0
>> Try: yum --disablerepo='*' --enablerepo='*-debug*' install 
>> /usr/lib/debug/.build-id/28/09dfc4706ed44259cc31a5898c8d1a9b76b949
>> Missing separate debuginfo for /usr/local/slurm/14.03.10/lib/libslurm.so.27
>> Try: yum --disablerepo='*' --enablerepo='*-debug*' install 
>> /usr/lib/debug/.build-id/e2/39d8a2994ae061ab7ada0ebb7719b8efa5de96
>> Missing separate debuginfo for 
>> Try: yum --disablerepo='*' --enablerepo='*-debug*' install 
>> /usr/lib/debug/.build-id/1a/063e3d64bb5560021ec2ba5329fb1e420b470f
>> Reading symbols from 
>> /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/opal/.libs/libopen-pal.so.0...done.
>> Loaded symbols for 
>> /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/opal/.libs/libopen-pal.so.0
>> Reading symbols from /usr/local/slurm/14.03.10/lib/libpmi.so.0...done.
>> Loaded symbols for /usr/local/slurm/14.03.10/lib/libpmi.so.0
>> Reading symbols from /usr/local/slurm/14.03.10/lib/libslurm.so.27...done.
>> Loaded symbols for /usr/local/slurm/14.03.10/lib/libslurm.so.27
>> Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
>> Loaded symbols for /lib64/libdl.so.2
>> Reading symbols from /lib64/libpthread.so.0...(no debugging symbols 
>> found)...done.
>> [Thread debugging using libthread_db enabled]
>> Loaded symbols for /lib64/libpthread.so.0
>> Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
>> Loaded symbols for /lib64/librt.so.1
>> Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
>> Loaded symbols for /lib64/libm.so.6
>> Reading symbols from /lib64/libutil.so.1...(no debugging symbols 
>> found)...done.
>> Loaded symbols for /lib64/libutil.so.1
>> Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
>> Loaded symbols for /lib64/libc.so.6
>> Reading symbols from /lib64/ld64.so.1...(no debugging symbols found)...done.
>> Loaded symbols for /lib64/ld64.so.1
>> Core was generated by 
>> `/vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/test/class/.libs/lt-opal_lifo '.
>> Program terminated with signal 11, Segmentation fault.
>> #0  0x10001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
>>    at 
>> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
>> 51   old = *addr;
>> Missing separate debuginfos, use: debuginfo-install 
>> glibc-2.12-1.107.el6_4.5.ppc64
>> 

Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-02 Thread Adrian Reber
I have reported the same error a few days ago and submitted it now as a
github issue: https://github.com/open-mpi/ompi/issues/371

On Mon, Feb 02, 2015 at 12:36:54PM +1100, Christopher Samuel wrote:
> On 31/01/15 10:51, Jeff Squyres (jsquyres) wrote:
> 
> > New tarball posted (same location).  Now featuring 100% fewer "make check" 
> > failures.
> 
> On our BG/Q front-end node (PPC64, RHEL 6.4) I see:
> 
> ../../config/test-driver: line 95: 30173 Segmentation fault  (core 
> dumped) "$@" > $log_file 2>&1
> FAIL: opal_lifo
> 
> Stack trace implies the culprit is in:
> 
> #0  0x10001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
> at 
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
> 51  old = *addr;
> 
> I've attached a script of gdb doing "thread apply all bt full" in
> case that's helpful.
> 
> All the best,
> Chris
> -- 
>  Christopher Samuel        Senior Systems Administrator
>  VLSCI - Victorian Life Sciences Computation Initiative
>  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
>  http://www.vlsci.org.au/  http://twitter.com/vlsci
> 

> Script started on Mon 02 Feb 2015 12:32:56 EST
> 
> [samuel@avoca class]$ gdb 
> /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/test/class/.libs/lt-opal_lifo 
> core.32444
> GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
> Copyright (C) 2010 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later 
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "ppc64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> ...
> Reading symbols from 
> /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/test/class/.libs/lt-opal_lifo...done.
> [New Thread 32465]
> [New Thread 32464]
> [New Thread 32466]
> [New Thread 32444]
> [New Thread 32469]
> [New Thread 32467]
> [New Thread 32470]
> [New Thread 32463]
> [New Thread 32468]
> Missing separate debuginfo for 
> /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/opal/.libs/libopen-pal.so.0
> Try: yum --disablerepo='*' --enablerepo='*-debug*' install 
> /usr/lib/debug/.build-id/de/a09192aa84bbc15579ae5190dc8acd16eb94fe
> Missing separate debuginfo for /usr/local/slurm/14.03.10/lib/libpmi.so.0
> Try: yum --disablerepo='*' --enablerepo='*-debug*' install 
> /usr/lib/debug/.build-id/28/09dfc4706ed44259cc31a5898c8d1a9b76b949
> Missing separate debuginfo for /usr/local/slurm/14.03.10/lib/libslurm.so.27
> Try: yum --disablerepo='*' --enablerepo='*-debug*' install 
> /usr/lib/debug/.build-id/e2/39d8a2994ae061ab7ada0ebb7719b8efa5de96
> Missing separate debuginfo for 
> Try: yum --disablerepo='*' --enablerepo='*-debug*' install 
> /usr/lib/debug/.build-id/1a/063e3d64bb5560021ec2ba5329fb1e420b470f
> Reading symbols from 
> /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/opal/.libs/libopen-pal.so.0...done.
> Loaded symbols for 
> /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/opal/.libs/libopen-pal.so.0
> Reading symbols from /usr/local/slurm/14.03.10/lib/libpmi.so.0...done.
> Loaded symbols for /usr/local/slurm/14.03.10/lib/libpmi.so.0
> Reading symbols from /usr/local/slurm/14.03.10/lib/libslurm.so.27...done.
> Loaded symbols for /usr/local/slurm/14.03.10/lib/libslurm.so.27
> Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /lib64/libpthread.so.0...(no debugging symbols 
> found)...done.
> [Thread debugging using libthread_db enabled]
> Loaded symbols for /lib64/libpthread.so.0
> Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib64/librt.so.1
> Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libm.so.6
> Reading symbols from /lib64/libutil.so.1...(no debugging symbols 
> found)...done.
> Loaded symbols for /lib64/libutil.so.1
> Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/ld64.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib64/ld64.so.1
> Core was generated by 
> `/vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/test/class/.libs/lt-opal_lifo '.
> Program terminated with signal 11, Segmentation fault.
> #0  0x10001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
> at 
> /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
> 51  old = *addr;
> Missing separate debuginfos, use: debuginfo-install 
> glibc-2.12-1.107.el6_4.5.ppc64
> (gdb) thread apply all bt full
> 
> Thread 9 (Thread 0xfff7a0ef200 (LWP 32468)):
> #0  0x0080adb6629c in .__libc_write () from /lib64/libpthread.so.0
> No symbol table info available.
> #1  0x0fff7d6905b4 in show_stackframe (signo=11, 

Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-01 Thread Christopher Samuel
On 31/01/15 10:51, Jeff Squyres (jsquyres) wrote:

> New tarball posted (same location).  Now featuring 100% fewer "make check" 
> failures.

On our BG/Q front-end node (PPC64, RHEL 6.4) I see:

../../config/test-driver: line 95: 30173 Segmentation fault  (core dumped) 
"$@" > $log_file 2>&1
FAIL: opal_lifo

Stack trace implies the culprit is in:

#0  0x10001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
at 
/vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
51  old = *addr;

I've attached a script of gdb doing "thread apply all bt full" in
case that's helpful.
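
One possible reading of addr=0x20 (an assumption on my part, the trace alone doesn't prove it) is that a structure pointer was NULL and the code took the address of a member 0x20 bytes into it.  A small sketch of that arithmetic follows; HypotheticalQueue and field_address() are made up for illustration and are not OMPI's real types or layout.

```python
import ctypes


class HypotheticalQueue(ctypes.Structure):
    """Made-up layout: 0x20 bytes of other members, then a 32-bit word."""
    _fields_ = [
        ("padding", ctypes.c_uint8 * 0x20),
        ("counter", ctypes.c_int32),
    ]


def field_address(base, struct_type, field_name):
    """Address a C compiler would compute for &ptr->field, given ptr == base."""
    return base + getattr(struct_type, field_name).offset


# With a NULL base pointer, &ptr->counter is just the member's offset.
print(hex(field_address(0, HypotheticalQueue, "counter")))  # 0x20
```

If that reading is right, the thing to look for in the full backtrace is which caller passed the NULL structure, rather than anything in opal_atomic_swap_32 itself.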

All the best,
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci

Script started on Mon 02 Feb 2015 12:32:56 EST

[samuel@avoca class]$ gdb /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/test/class/.libs/lt-opal_lifo core.32444
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "ppc64-redhat-linux-gnu".
For bug reporting instructions, please see:
...
Reading symbols from /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/test/class/.libs/lt-opal_lifo...done.
[New Thread 32465]
[New Thread 32464]
[New Thread 32466]
[New Thread 32444]
[New Thread 32469]
[New Thread 32467]
[New Thread 32470]
[New Thread 32463]
[New Thread 32468]
Missing separate debuginfo for /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/opal/.libs/libopen-pal.so.0
Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/de/a09192aa84bbc15579ae5190dc8acd16eb94fe
Missing separate debuginfo for /usr/local/slurm/14.03.10/lib/libpmi.so.0
Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/28/09dfc4706ed44259cc31a5898c8d1a9b76b949
Missing separate debuginfo for /usr/local/slurm/14.03.10/lib/libslurm.so.27
Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/e2/39d8a2994ae061ab7ada0ebb7719b8efa5de96
Missing separate debuginfo for 
Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/1a/063e3d64bb5560021ec2ba5329fb1e420b470f
Reading symbols from /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/opal/.libs/libopen-pal.so.0...done.
Loaded symbols for /vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/opal/.libs/libopen-pal.so.0
Reading symbols from /usr/local/slurm/14.03.10/lib/libpmi.so.0...done.
Loaded symbols for /usr/local/slurm/14.03.10/lib/libpmi.so.0
Reading symbols from /usr/local/slurm/14.03.10/lib/libslurm.so.27...done.
Loaded symbols for /usr/local/slurm/14.03.10/lib/libslurm.so.27
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libutil.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libutil.so.1
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld64.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld64.so.1
Core was generated by `/vlsci/VLSCI/samuel/tmp/OMPI/build-gcc/test/class/.libs/lt-opal_lifo '.
Program terminated with signal 11, Segmentation fault.
#0  0x10001048 in opal_atomic_swap_32 (addr=0x20, newval=1)
at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/include/opal/sys/atomic_impl.h:51
51	old = *addr;
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6_4.5.ppc64
(gdb) thread apply all bt full

Thread 9 (Thread 0xfff7a0ef200 (LWP 32468)):
#0  0x0080adb6629c in .__libc_write () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x0fff7d6905b4 in show_stackframe (signo=11, info=0xfff7a0ee3d8, p=0xfff7a0edd00)
at /vlsci/VLSCI/samuel/tmp/OMPI/openmpi-gitclone/opal/util/stacktrace.c:81
print_buffer = "[avoca:32444] *** Process received signal ***\n", '\000' 
tmp = 0xfff7a0ed858 "[avoca:32444] *** Process received signal ***\n"
size = 1024
ret = 46
si_code_str = 0xfff7d75bab8 ""
#2  
No symbol table info available.
#3  0x10001048 in opal_atomic_swap_32 

Re: [OMPI devel] RFC: Remove embedded libltdl

2015-01-31 Thread Jeff Squyres (jsquyres)
Hmm.

I'm unable to find where this happens -- we don't explicitly list "libltdl.so" 
anywhere, for example.  Libltdl is added by AC_CHECK_LIB, like most other 
libraries OMPI links against.

Can you send more info (perhaps off-list, if the attachments get large):

- full output of configure
- config.log
- config.status
- run make clean, and then record/send the output of "make V=1"

Thanks.


> On Jan 30, 2015, at 7:46 PM, Paul Hargrove  wrote:
> 
> Observed failure mode 2-of-2:
> 
> On at least one SLES-11.1 system you are attempting to link libltdl.so by 
> full path and are erroneously using /usr/lib/libltdl.so instead of 
> /usr/lib64/libltdl.so, resulting in the failure below.
> 
> -Paul
> 
> libtool: link: pgcc -shared  -fpic -DPIC  class/.libs/opal_bitmap.o 
> class/.libs/opal_free_list.o class/.libs/opal_hash_table.o 
> class/.libs/opal_hotel.o class/.libs/opal_tree.o class/.libs/opal_list.o 
> class/.libs/opal_object.o class/.libs/opal_graph.o class/.libs/opal_lifo.o 
> class/.libs/opal_fifo.o class/.libs/opal_pointer_array.o 
> class/.libs/opal_value_array.o class/.libs/opal_ring_buffer.o 
> class/.libs/opal_rb_tree.o class/.libs/ompi_free_list.o 
> memoryhooks/.libs/memory.o runtime/.libs/opal_progress.o 
> runtime/.libs/opal_finalize.o runtime/.libs/opal_init.o 
> runtime/.libs/opal_params.o runtime/.libs/opal_cr.o 
> runtime/.libs/opal_info_support.o runtime/.libs/opal_progress_threads.o 
> threads/.libs/condition.o threads/.libs/mutex.o threads/.libs/thread.o 
> dss/.libs/dss_internal_functions.o dss/.libs/dss_compare.o 
> dss/.libs/dss_copy.o dss/.libs/dss_dump.o dss/.libs/dss_load_unload.o 
> dss/.libs/dss_lookup.o dss/.libs/dss_pack.o dss/.libs/dss_peek.o 
> dss/.libs/dss_print.o dss/.libs/dss_register.o dss/.libs/dss_unpack.o 
> dss/.libs/dss_open_close.o  
> -Wl,--whole-archive,asm/.libs/libasm.a,datatype/.libs/libdatatype.a,mca/base/.libs/libmca_base.a,util/.libs/libopalutil.a,mca/allocator/.libs/libmca_allocator.a,mca/backtrace/.libs/libmca_backtrace.a,mca/backtrace/execinfo/.libs/libmca_backtrace_execinfo.a,mca/btl/.libs/libmca_btl.a,mca/compress/.libs/libmca_compress.a,mca/crs/.libs/libmca_crs.a,mca/dstore/.libs/libmca_dstore.a,mca/event/.libs/libmca_event.a,mca/event/libevent2022/.libs/libmca_event_libevent2022.a,mca/hwloc/.libs/libmca_hwloc.a,mca/hwloc/hwloc191/.libs/libmca_hwloc_hwloc191.a,mca/if/.libs/libmca_if.a,mca/if/posix_ipv4/.libs/libmca_if_posix_ipv4.a,mca/if/linux_ipv6/.libs/libmca_if_linux_ipv6.a,mca/installdirs/.libs/libmca_installdirs.a,mca/installdirs/config/.libs/libmca_installdirs_config.a,mca/installdirs/env/.libs/libmca_installdirs_env.a,mca/memchecker/.libs/libmca_memchecker.a,mca/memcpy/.libs/libmca_memcpy.a,mca/memory/.libs/libmca_memory.a,mca/memory/linux/.libs/libmca_memory_linux.a,mca/mpool/.libs/libmca_mpool.a,mca/pmix/.libs/libmca_pmix.a,mca/pstat/.libs/libmca_pstat.a,mca/rcache/.libs/libmca_rcache.a,mca/sec/.libs/libmca_sec.a,mca/shmem/.libs/libmca_shmem.a,mca/timer/.libs/libmca_timer.a,mca/timer/linux/.libs/libmca_timer_linux.a
>  -Wl,--no-whole-archive  -lm -lpciaccess -lrt -lutil /usr/lib/libltdl.so -ldl 
> -lc  -Wl,-soname -Wl,libopen-pal.so.0 -o .libs/libopen-pal.so.0.0.0
> /usr/lib/libltdl.so: could not read symbols: File in wrong format
> 
> On Fri, Jan 30, 2015 at 3:51 PM, Jeff Squyres (jsquyres)  
> wrote:
> New tarball posted (same location).  Now featuring 100% fewer "make check" 
> failures.
> 
> http://www.open-mpi.org/~jsquyres/unofficial/
> 
> 
> > On Jan 30, 2015, at 5:14 PM, Jeff Squyres (jsquyres)  
> > wrote:
> >
> > Shame on me for not running "make check".
> >
> > Fixing...
> >
> >
> >> On Jan 30, 2015, at 4:58 PM, Paul Hargrove  wrote:
> >>
> >> Jeff,
> >>
> >> I ran on just one (Mac OS X 10.8) system first as a "smoke test".
> >> It encountered the failure shown below on "make check" at which point I 
> >> decided not to test 60+ platforms.
> >> Please advise how I should proceed (best guess is wait for a new tarball).
> >>
> >> -Paul
> >>
> >> Making check in test
> >> Making check in support
> >> make  libsupport.a
> >>  CC   components.o
> >> /Users/Paul/OMPI/openmpi-libltdl-macos10.8-x86-clang/openmpi-gitclone/test/support/components.c:27:10:
> >>  fatal error: 'opal/libltdl/ltdl.h' file not found
> >> #include "opal/libltdl/ltdl.h"
> >> ^
> >>
> >>
> >> On Fri, Jan 30, 2015 at 1:29 PM, Jeff Squyres (jsquyres) 
> >>  wrote:
> >> On Jan 30, 2015, at 2:46 PM, Paul Hargrove  wrote:
> >>>
> >>> If I had new enough autotools to autogen on this old system then I 
> >>> wouldn't have asked about libltdl from libtool-1.4.  So, please *do* 
> >>> generate a tarball and I will test (on *all* of my systems).
> >>
> >> Sweet, thank you.  I just posted a tarball here:
> >>
> >>http://www.open-mpi.org/~jsquyres/unofficial/
> >>
> >> --
> >> Jeff Squyres
> >> jsquy...@cisco.com
> >> For corporate legal 

Re: [OMPI devel] RFC: Remove embedded libltdl

2015-01-31 Thread Jeff Squyres (jsquyres)
Looks like the lt_interface.c code didn't properly use the lt_dladvise #if. How 
did that ever work, I wonder?

Fixed now.  On to your second finding...


> On Jan 30, 2015, at 7:42 PM, Paul Hargrove  wrote:
> 
> Not meeting with the greatest of success.
> This is a report of just the first (of at least 2) failure modes I am seeing.
> 
> On a Scientific Linux 5.5 system (a RHEL-5.5 clone, like CentOS) I get the 
> build failure described below.
> At least Solaris-11 and a few other linux systems (including RHAS-4.4) are 
> also failing in what appears to be the same manner.
> I am sure there are more, but I am aborting this round of testing at this 
> point.
> 
> I again await a new tarball with a less broken-by-default behavior.
> 
> -Paul
> 
> 
> The configure output includes
> checking ltdl.h usability... yes
> checking ltdl.h presence... yes
> checking for ltdl.h... yes
> looking for library without search path
> checking for lt_dlopen in -lltdl... yes
> checking for lt_dladvise_init... no
> configure: WARNING: *
> configure: WARNING: Could not find lt_dladvise_init in libltdl
> configure: WARNING: This could mean that your libltdl version
> configure: WARNING: is old.  If you could upgrade, that would be great.
> configure: WARNING: *
> checking for lt_dladvise... no
> 
> However, it looks like opal/util/lt_interface.c is still attempting to call 
> lt_dladvise:
> PGC-S-0040-Illegal use of symbol, lt_dladvise 
> (/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-pgi-12.10/openmpi-gitclone/opal/util/lt_interface.c:
>  25)
> PGC-W-0156-Type not specified, 'int' assumed 
> (/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-pgi-12.10/openmpi-gitclone/opal/util/lt_interface.c:
>  25)
> PGC/x86-64 Linux 12.10-0: compilation completed with severe errors
> 
> The output of "libtool --version" says "1.5.22" and we have libltdl.so.3.1.4.
> However, the rpm database is not readable, preventing me from checking a 
> package version associated with the libltdl.
> 
> The failing Solaris-11/x86-64 system says 1.5.22 without any ambiguity:
> $ pkg info libltdl | grep Version
>Version: 1.5.22
> 
> 
> -Paul

Re: [OMPI devel] RFC: Remove embedded libltdl

2015-01-30 Thread Paul Hargrove
Observed failure mode 2-of-2:

On at least one SLES-11.1 system you are attempting to link libltdl.so by
full path and are erroneously using /usr/lib/libltdl.so instead of
/usr/lib64/libltdl.so, resulting in the failure below.

-Paul

libtool: link: pgcc -shared  -fpic -DPIC  class/.libs/opal_bitmap.o
class/.libs/opal_free_list.o class/.libs/opal_hash_table.o
class/.libs/opal_hotel.o class/.libs/opal_tree.o class/.libs/opal_list.o
class/.libs/opal_object.o class/.libs/opal_graph.o class/.libs/opal_lifo.o
class/.libs/opal_fifo.o class/.libs/opal_pointer_array.o
class/.libs/opal_value_array.o class/.libs/opal_ring_buffer.o
class/.libs/opal_rb_tree.o class/.libs/ompi_free_list.o
memoryhooks/.libs/memory.o runtime/.libs/opal_progress.o
runtime/.libs/opal_finalize.o runtime/.libs/opal_init.o
runtime/.libs/opal_params.o runtime/.libs/opal_cr.o
runtime/.libs/opal_info_support.o runtime/.libs/opal_progress_threads.o
threads/.libs/condition.o threads/.libs/mutex.o threads/.libs/thread.o
dss/.libs/dss_internal_functions.o dss/.libs/dss_compare.o
dss/.libs/dss_copy.o dss/.libs/dss_dump.o dss/.libs/dss_load_unload.o
dss/.libs/dss_lookup.o dss/.libs/dss_pack.o dss/.libs/dss_peek.o
dss/.libs/dss_print.o dss/.libs/dss_register.o dss/.libs/dss_unpack.o
dss/.libs/dss_open_close.o
 
-Wl,--whole-archive,asm/.libs/libasm.a,datatype/.libs/libdatatype.a,mca/base/.libs/libmca_base.a,util/.libs/libopalutil.a,mca/allocator/.libs/libmca_allocator.a,mca/backtrace/.libs/libmca_backtrace.a,mca/backtrace/execinfo/.libs/libmca_backtrace_execinfo.a,mca/btl/.libs/libmca_btl.a,mca/compress/.libs/libmca_compress.a,mca/crs/.libs/libmca_crs.a,mca/dstore/.libs/libmca_dstore.a,mca/event/.libs/libmca_event.a,mca/event/libevent2022/.libs/libmca_event_libevent2022.a,mca/hwloc/.libs/libmca_hwloc.a,mca/hwloc/hwloc191/.libs/libmca_hwloc_hwloc191.a,mca/if/.libs/libmca_if.a,mca/if/posix_ipv4/.libs/libmca_if_posix_ipv4.a,mca/if/linux_ipv6/.libs/libmca_if_linux_ipv6.a,mca/installdirs/.libs/libmca_installdirs.a,mca/installdirs/config/.libs/libmca_installdirs_config.a,mca/installdirs/env/.libs/libmca_installdirs_env.a,mca/memchecker/.libs/libmca_memchecker.a,mca/memcpy/.libs/libmca_memcpy.a,mca/memory/.libs/libmca_memory.a,mca/memory/linux/.libs/libmca_memory_linux.a,mca/mpool/.libs/libmca_mpool.a,mca/pmix/.libs/libmca_pmix.a,mca/pstat/.libs/libmca_pstat.a,mca/rcache/.libs/libmca_rcache.a,mca/sec/.libs/libmca_sec.a,mca/shmem/.libs/libmca_shmem.a,mca/timer/.libs/libmca_timer.a,mca/timer/linux/.libs/libmca_timer_linux.a
-Wl,--no-whole-archive  -lm -lpciaccess -lrt -lutil /usr/lib/libltdl.so
-ldl -lc-Wl,-soname -Wl,libopen-pal.so.0 -o .libs/libopen-pal.so.0.0.0
/usr/lib/libltdl.so: could not read symbols: File in wrong format
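
The "File in wrong format" message is consistent with the linker being handed a
library of the wrong ELF class. A quick way to check which class each candidate
libltdl.so has is sketched below. This is not part of the Open MPI build; the
helper name elf_class is hypothetical, and it only inspects the first five
bytes of the file (the ELF magic plus the EI_CLASS byte, where 1 means 32-bit
and 2 means 64-bit).

```python
import os

ELF_MAGIC = b"\x7fELF"
# EI_CLASS values from the ELF identification bytes.
EI_CLASS_NAMES = {1: "ELF32", 2: "ELF64"}


def elf_class(path):
    """Return "ELF32", "ELF64", or None if path is not a recognizable ELF file."""
    with open(path, "rb") as f:
        ident = f.read(5)
    if len(ident) < 5 or ident[:4] != ELF_MAGIC:
        return None
    return EI_CLASS_NAMES.get(ident[4])


if __name__ == "__main__":
    # The two paths discussed in this report; either may be absent.
    for candidate in ("/usr/lib/libltdl.so", "/usr/lib64/libltdl.so"):
        if os.path.exists(candidate):
            print(candidate, elf_class(candidate))
```

On a system where a .so name is a linker script rather than a symlink to the
real library, this sketch reports None for it.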


Re: [OMPI devel] RFC: Remove embedded libltdl

2015-01-30 Thread Paul Hargrove
Not meeting with the greatest of success.
This is a report of just the first (of at least 2) failure modes I am
seeing.

On a Scientific Linux 5.5 system (a RHEL-5.5 clone, like CentOS) I get the
build failure described below.
At least Solaris-11 and a few other linux systems (including RHAS-4.4) are
also failing in what appears to be the same manner.
I am sure there are more, but I am aborting this round of testing at this
point.

I again await a new tarball with a less broken-by-default behavior.

-Paul


The configure output includes

checking ltdl.h usability... yes
checking ltdl.h presence... yes
checking for ltdl.h... yes
looking for library without search path
checking for lt_dlopen in -lltdl... yes
checking for lt_dladvise_init... no
configure: WARNING: *
configure: WARNING: Could not find lt_dladvise_init in libltdl
configure: WARNING: This could mean that your libltdl version
configure: WARNING: is old.  If you could upgrade, that would be great.
configure: WARNING: *
checking for lt_dladvise... no


However, it looks like opal/util/lt_interface.c is still attempting to
call lt_dladvise:

PGC-S-0040-Illegal use of symbol, lt_dladvise
(/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-pgi-12.10/openmpi-gitclone/opal/util/lt_interface.c:
25)
PGC-W-0156-Type not specified, 'int' assumed
(/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-pgi-12.10/openmpi-gitclone/opal/util/lt_interface.c:
25)
PGC/x86-64 Linux 12.10-0: compilation completed with severe errors


The output of "libtool --version" says "1.5.22" and we have libltdl.so.3.1.4.
However, the rpm database is not readable, preventing me from checking a
package version associated with the libltdl.

The failing Solaris-11/x86-64 system says 1.5.22 without any ambiguity:

$ pkg info libltdl | grep Version
   Version: 1.5.22



-Paul








Re: [OMPI devel] RFC: Remove embedded libltdl

2015-01-30 Thread Jeff Squyres (jsquyres)
New tarball posted (same location).  Now featuring 100% fewer "make check" 
failures.

http://www.open-mpi.org/~jsquyres/unofficial/




-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] RFC: Remove embedded libltdl

2015-01-30 Thread Jeff Squyres (jsquyres)
Shame on me for not running "make check".

Fixing...


> On Jan 30, 2015, at 4:58 PM, Paul Hargrove  wrote:
> 
> Jeff,
> 
> I ran on just one (Mac OS X 10.8) system first as a "smoke test".
> It encountered the failure shown below on "make check", at which point I 
> decided not to test 60+ platforms.
> Please advise how I should proceed (best guess is wait for a new tarball).
> 
> -Paul
> 
> Making check in test
> Making check in support
> make  libsupport.a
>   CC   components.o
> /Users/Paul/OMPI/openmpi-libltdl-macos10.8-x86-clang/openmpi-gitclone/test/support/components.c:27:10:
>  fatal error: 'opal/libltdl/ltdl.h' file not found
> #include "opal/libltdl/ltdl.h"
>  ^


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] RFC: Remove embedded libltdl

2015-01-30 Thread Paul Hargrove
Jeff,

I ran on just one (Mac OS X 10.8) system first as a "smoke test".
It encountered the failure shown below on "make check", at which point I
decided not to test 60+ platforms.
Please advise how I should proceed (best guess is wait for a new tarball).

-Paul

Making check in test
Making check in support
make  libsupport.a
  CC   components.o
/Users/Paul/OMPI/openmpi-libltdl-macos10.8-x86-clang/openmpi-gitclone/test/support/components.c:27:10:
fatal error: 'opal/libltdl/ltdl.h' file not found
#include "opal/libltdl/ltdl.h"
 ^


On Fri, Jan 30, 2015 at 1:29 PM, Jeff Squyres (jsquyres)  wrote:

> On Jan 30, 2015, at 2:46 PM, Paul Hargrove  wrote:
> >
> > If I had new enough autotools to autogen on this old system then I
> wouldn't have asked about libltdl from libtool-1.4.  So, please *do*
> generate a tarball and I will test (on *all* of my systems).
>
> Sweet, thank you.  I just posted a tarball here:
>
> http://www.open-mpi.org/~jsquyres/unofficial/
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] RFC: Remove embedded libltdl

2015-01-30 Thread Jeff Squyres (jsquyres)
On Jan 30, 2015, at 2:46 PM, Paul Hargrove  wrote:
> 
> If I had new enough autotools to autogen on this old system then I wouldn't 
> have asked about libltdl from libtool-1.4.  So, please *do* generate a 
> tarball and I will test (on *all* of my systems).

Sweet, thank you.  I just posted a tarball here:

http://www.open-mpi.org/~jsquyres/unofficial/

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] RFC: Remove embedded libltdl

2015-01-30 Thread Jeff Squyres (jsquyres)
On Jan 29, 2015, at 9:11 PM, Paul Hargrove  wrote:
> 
> If I understand correctly, one is (or will be soon) expected to have 
> libtool-dev(el) installed on the build system, even if one is not an OMPI 
> developer.

No.  We don't want to raise the bar that high for simple user installations.

If you are installing an OMPI tarball from source (that is not a git clone):

1. If you have libltdl devel support installed (i.e., ltdl.h), then OMPI will 
build as it always has: with plugin support.

2. If you do not have libltdl devel support installed, then OMPI will 
effectively behave as if --disable-dlopen was specified.  I.e., there will be 
no plugin support and all OMPI plugins will be slurped up into their 
upper-level libraries (e.g., BTLs are slurped up into libmpi.so or libmpi.a).

Hence, from OMPI tarballs, *OMPI will always build and function correctly* -- 
regardless of whether you have libltdl / libltdl-devel.

I'm guessing that many user installations will now build without plugin support 
(because libltdl-devel is typically not installed in Linux distros / OS X by 
default).  However, after talking this through in Dallas this week, I don't 
think that this is a problem.

> How does this plan to cease embedding libltdl align with the fact that 
> autogen.pl currently applies patches to the parts of the generated configure 
> from the *packager's* system?  Isn't there now going to be a disconnect 
> between the versions of libtool-related portions of the configure script 
> shipped in a tarball and the version (if any) of libltdl on the system 
> building from that tarball?

I think that's two questions:

1. Will OMPI continue to patch configure/etc. w.r.t. libltdl functionality?

No; there's no need, because we were effectively patching against the embedded 
libltdl.  Since we're not embedding, there's nothing to patch.

We will lose the "bug fix" that we were patching, however (there is a giant 
comment in this file that explains what it is for): 
https://github.com/open-mpi/ompi/blob/master/config/libltdl-preopen-error.diff

That may be a bit annoying.  ...but then again, if most users are going to end 
up not building plugin support, the need for that patch effectively goes away.

2. What happens if OMPI tries to build against an old libltdl?

We *should* be ok.  libltdl has been stable for a long time.  libltdl added the 
lt_dladvise functionality at one point, but we added configure tests to check 
for that a long time ago (i.e., the C code can handle whether or not your 
libltdl has support for lt_dladvise).

This PR actually adds the results of those lt_dladvise configury tests to 
ompi_info output.
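
The question that configure asks ("checking for lt_dladvise_init...") can be
approximated outside the build system. The following is only a sketch, in
Python rather than configure's shell: the helper name has_symbol is
hypothetical, and it relies on ctypes.util.find_library locating the system
libltdl, which may not be the same library the compiler would link against.

```python
import ctypes
import ctypes.util


def has_symbol(library_name, symbol):
    """Report whether a shared library exports a symbol.

    Returns True or False if the library could be loaded, and None if it
    could not be found or loaded at all.
    """
    path = ctypes.util.find_library(library_name)
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    # Attribute lookup on a CDLL raises AttributeError for a missing symbol,
    # which hasattr() turns into False.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    print("lt_dlopen:", has_symbol("ltdl", "lt_dlopen"))
    print("lt_dladvise_init:", has_symbol("ltdl", "lt_dladvise_init"))
```

A result of True for lt_dlopen together with False for lt_dladvise_init would
match the situation reported for libltdl 1.5.22 elsewhere in this thread.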

> Also, I can still build v1.8 on an old Red Hat 8 system where the system 
> libtool/libltdl is 1.4.2, perhaps only because Open MPI embeds a recent 
> version.

Could be.  Would you mind testing the OMPI PR branch on this old system?  I can 
make you a tarball if it would help.

> What minimum version of libltdl is required after the proposed change?
> Will I need to install a libtool-2.x on that old system to be able to build 
> OpenMPI with dlopen support?

I don't know what the minimum required Libtool/libltdl version is; I didn't try 
to backtrack to find it.

I think that if we can still build on a sufficiently-old system (e.g., Red Hat 
8 with your LT 1.4.2), that is probably good enough.

Also, remember: libltdl has *never been required* for Open MPI.  Although 
building with libltdl has always been the default, you could always have 
disabled it.  This PR effectively now changes the default to 
build-it-if-we-got-it-and-skip-it-if-we-don't for users, and developers must 
specify --disable-dlopen if they don't have libltdl-devel (per the assumption 
that most developers will want to build with dlopen support).
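
As a rough sketch of the defaults described above (this is not the actual
configure logic; the function name, argument names, and return strings are all
made up for illustration, and the "error" outcome is an assumed reading of
"developers must specify --disable-dlopen"):

```python
def dlopen_default(have_ltdl_devel, from_git_clone=False, disable_dlopen=False):
    """Sketch of the build outcome described in this message.

    Returns one of:
      "plugins"    - dlopen/plugin support is built
      "no-plugins" - components are folded into their upper-level libraries
      "error"      - assumed: a developer build without libltdl devel support
                     that did not pass --disable-dlopen stops in configure
    """
    if disable_dlopen:
        return "no-plugins"
    if have_ltdl_devel:
        return "plugins"
    # No libltdl devel support, and dlopen was not explicitly disabled.
    if from_git_clone:
        return "error"
    return "no-plugins"


if __name__ == "__main__":
    # A user building from a tarball without ltdl.h installed.
    print(dlopen_default(have_ltdl_devel=False))
```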

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] RFC: Remove embedded libltdl

2015-01-29 Thread Paul Hargrove
Jeff,

If I understand correctly, one is (or will be soon) expected to have
libtool-dev(el) installed on the build system, even if one is not an OMPI
developer.

How does this plan to cease embedding libltdl align with the fact that
autogen.pl currently applies patches to the parts of the generated
configure from the *packager's* system?  Isn't there now going to be a
disconnect between the versions of libtool-related portions of the
configure script shipped in a tarball and the version (if any) of libltdl
on the system building from that tarball?

If I've missed something obvious, please enlighten me.

Also, I can still build v1.8 on an old Red Hat 8 system where the system
libtool/libltdl is 1.4.2, perhaps only because Open MPI embeds a recent
version.
What minimum version of libltdl is required after the proposed change?
Will I need to install a libtool-2.x on that old system to be able to build
OpenMPI with dlopen support?

-Paul

On Thu, Jan 29, 2015 at 5:27 PM, Jeff Squyres (jsquyres)  wrote:

> WHAT: Remove the embedded libltdl from the OPAL source tree
>
> WHY: Fixes #311
>
> WHERE: Various configury and .c files in the code base (see
> https://github.com/open-mpi/ompi/pull/366)
>
> TIMEOUT: Let's discuss next Tuesday during the Dallas meeting
> roundup/summary
>
> MORE DETAIL:
>
> We've known for a while that upgrading to Libtool 2.4.4 (i.e., the latest
> Libtool) broke something in the OMPI build.
>
> Per #311 (https://github.com/open-mpi/ompi/issues/311), I made a small
> reproducer and filed a bug with upstream Libtool.  Turns out that this bad
> behavior is a bug in Libtool and/or autoreconf (it isn't immediately
> obvious which) when you embed libltdl in a larger project.
>
> Upstream Libtool/Autoconf is not anxious to fix this bug.  :-\
>
> We talked about this issue this week here in Dallas and came to the
> conclusion that we might as well just take out the embedded libltdl and use
> the system-provided one when available, and fall back to --disable-dlopen
> behavior when a system-provided libltdl is not available.
>
> I've filed PR #366 that does this.
>
> https://github.com/open-mpi/ompi/pull/366 contains a writeup describing
> what happens when you don't have libltdl support, etc.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/01/16848.php
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900