Re: [OMPI devel] Open-MPI backwards compatibility and library version changes

2018-11-14 Thread Gilles Gouaillardet
Chris,

I am a bit puzzled by your logs.

As far as I understand,

ldd libhhgttg.so.1

reports that libopen-rte.so.40 and libopen-pal.so.40 are both
dependencies, but that does not say anything about
which library depends on them. They could be needed directly by
libhhgttg.so.1 (I hope not, and I do not think that is the case),
or indirectly via libmpi.so.40 (I would rather bet on that).

In the latter case, having libhhgttg.so.1 point to another
libmpi.so.40 that depends on newer opal/orte libraries should just
work.

You might want to run strings libhhgttg.so.1 and look for libmpi.so.40
(I found it), libopen-pal.so.40 (I did not find it) and
libopen-rte.so.40 (I did not find it either).
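
A more targeted check than strings, as a minimal sketch: it assumes
binutils' readelf is installed, and the helper names
has_private_ompi_needed and check_lib are made up for illustration. It
only looks at the NEEDED entries of the file itself, so libraries pulled
in indirectly through libmpi.so.40 are not reported.

```shell
# Hypothetical helper: reads "readelf -d" output on stdin and succeeds
# (exit 0) only if libopen-pal or libopen-rte is a direct NEEDED entry.
has_private_ompi_needed() {
    grep -Eq '\(NEEDED\).*\[libopen-(pal|rte)\.so'
}

# Wrapper for a real file; assumes readelf is on PATH.
check_lib() {
    if readelf -d "$1" | has_private_ompi_needed; then
        echo "$1: links the Open MPI internal libraries directly"
    else
        echo "$1: no direct dependency on libopen-pal/libopen-rte"
    fi
}
```

For example, check_lib libhhgttg.so.1 should print the second message
when only libmpi.so.40 is linked directly.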


Note that if you run
gcc -shared -o libhhgttg.so.1 libhhgttg.c -lmpi -lopen-rte -lopen-pal
then your lib will explicitly depend on the "internal" MPI libraries
and you will face the same issue as your end user.
You should not need to do that (I assume you do not explicitly call
internal opal/orte subroutines), so avoid doing it.
That being said, keep in mind that some build systems might do that
for you under the hood (I have seen that, but I cannot remember which
one), and that would be a bad thing, at least from an Open MPI point
of view.
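
One possible guard against a build system adding the internal libraries
behind your back, sketched under assumptions: gcc driving GNU ld (or a
linker that accepts the same option), and a hypothetical helper name
link_shared_as_needed. With --as-needed the linker records a NEEDED
entry only for a library that actually satisfies a symbol reference, so
-lopen-rte and -lopen-pal should be dropped unless your code really
calls into them. The option only affects libraries that follow it on
the command line.

```shell
# Hypothetical helper: link a shared library with --as-needed placed
# before any -l options. CC defaults to gcc and can be overridden.
link_shared_as_needed() {
    out=$1
    shift
    ${CC:-gcc} -shared -fPIC -Wl,--as-needed -o "$out" "$@"
}

# Usage (not run here):
#   link_shared_as_needed libhhgttg.so.1 hhgttg.o -lmpi -lopen-rte -lopen-pal
```

Note that a library which calls no MPI function at all, like the toy
example in this thread, would lose its libmpi.so.40 entry as well.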


Cheers,

Gilles
On Wed, Nov 14, 2018 at 6:46 PM Christopher Samuel wrote:
>
> On 15/11/18 2:16 am, Barrett, Brian via devel wrote:
>
> > In practice, this should not be a problem. The wrapper compilers (and
> > our instructions for linking when not using the wrapper compilers)
> > only link against libmpi.so (or a set of libraries if using Fortran),
> > as libmpi.so contains the public interface. libmpi.so has a
> > dependency on libopen-pal.so, so the loader will load the version of
> > libopen-pal.so that matches the version of Open MPI used to build
> > libmpi.so. However, if someone explicitly links against libopen-pal.so,
> > you end up where we are today.
>
> Unfortunately that's not the case, just creating a shared library
> that only links in libmpi.so will create dependencies on the private
> libraries too in the final shared library. :-(
>
> Here's a toy example to illustrate that.
>
> [csamuel@farnarkle2 libtool]$ cat hhgttg.c
> int answer(void)
> {
>     return(42);
> }
>
> [csamuel@farnarkle2 libtool]$ gcc hhgttg.c -c -o hhgttg.o
>
> [csamuel@farnarkle2 libtool]$ gcc -shared -Wl,-soname,libhhgttg.so.1 -o libhhgttg.so.1 hhgttg.o -lmpi
>
> [csamuel@farnarkle2 libtool]$ ldd libhhgttg.so.1
>     linux-vdso.so.1 =>  (0x7ffc625b3000)
>     libmpi.so.40 => /apps/skylake/software/compiler/gcc/6.4.0/openmpi/3.0.0/lib/libmpi.so.40 (0x7f018a582000)
>     libc.so.6 => /lib64/libc.so.6 (0x7f018a09e000)
>     libopen-rte.so.40 => /apps/skylake/software/compiler/gcc/6.4.0/openmpi/3.0.0/lib/libopen-rte.so.40 (0x7f018a4b5000)
>     libopen-pal.so.40 => /apps/skylake/software/compiler/gcc/6.4.0/openmpi/3.0.0/lib/libopen-pal.so.40 (0x7f0189fde000)
>     libdl.so.2 => /lib64/libdl.so.2 (0x7f0189dda000)
>     librt.so.1 => /lib64/librt.so.1 (0x7f0189bd2000)
>     libutil.so.1 => /lib64/libutil.so.1 (0x7f01899cf000)
>     libm.so.6 => /lib64/libm.so.6 (0x7f01896cd000)
>     libpthread.so.0 => /lib64/libpthread.so.0 (0x7f01894b1000)
>     libz.so.1 => /lib64/libz.so.1 (0x7f018929b000)
>     libhwloc.so.5 => /lib64/libhwloc.so.5 (0x7f018905e000)
>     /lib64/ld-linux-x86-64.so.2 (0x7f018a46b000)
>     libnuma.so.1 => /lib64/libnuma.so.1 (0x7f0188e52000)
>     libltdl.so.7 => /lib64/libltdl.so.7 (0x7f0188c48000)
>     libgcc_s.so.1 => /apps/skylake/software/core/gcccore/6.4.0/lib64/libgcc_s.so.1 (0x7f018a499000)
>
>
> All the best,
> Chris
> --
>   Christopher Samuel OzGrav Senior Data Science Support
>   ARC Centre of Excellence for Gravitational Wave Discovery
>   http://www.ozgrav.org/  http://twitter.com/ozgrav
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel
___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] Open-MPI backwards compatibility and library version changes

2018-11-14 Thread Christopher Samuel
On 15/11/18 12:10 pm, Christopher Samuel wrote:

> I wonder if it's because they use libtool instead?

Yup, it's libtool - using it to compile my toy example shows the same
behaviour, with "readelf -d" showing the private libraries pulled in directly. :-(

[csamuel@farnarkle2 libtool]$ cat hhgttg.c
int answer(void)
{
    return(42);
}


[csamuel@farnarkle2 libtool]$ libtool compile gcc hhgttg.c -c -o hhgttg.o
libtool: compile:  gcc hhgttg.c -c  -fPIC -DPIC -o .libs/hhgttg.o
libtool: compile:  gcc hhgttg.c -c -o hhgttg.o >/dev/null 2>&1


[csamuel@farnarkle2 libtool]$ libtool link gcc -o libhhgttg.la hhgttg.lo -lmpi -rpath /usr/local/lib
libtool: link: gcc -shared  -fPIC -DPIC  .libs/hhgttg.o   -Wl,-rpath
-Wl,/apps/skylake/software/compiler/gcc/6.4.0/openmpi/3.0.0/lib
-Wl,-rpath
-Wl,/apps/skylake/software/compiler/gcc/6.4.0/openmpi/3.0.0/lib
/apps/skylake/software/compiler/gcc/6.4.0/openmpi/3.0.0/lib/libmpi.so
-L/apps/skylake/software/core/gcccore/6.4.0/lib64
-L/apps/skylake/software/core/gcccore/6.4.0/lib
-L/apps/skylake/software/compiler/gcc/6.4.0/openmpi/3.0.0/lib
/apps/skylake/software/compiler/gcc/6.4.0/openmpi/3.0.0/lib/libopen-rte.so
/apps/skylake/software/compiler/gcc/6.4.0/openmpi/3.0.0/lib/libopen-pal.so
-ldl -lrt -lutil -lm -lpthread -lz -lhwloc  -Wl,-soname
-Wl,libhhgttg.so.0 -o .libs/libhhgttg.so.0.0.0
libtool: link: (cd ".libs" && rm -f "libhhgttg.so.0" && ln -s 
"libhhgttg.so.0.0.0" "libhhgttg.so.0")
libtool: link: (cd ".libs" && rm -f "libhhgttg.so" && ln -s 
"libhhgttg.so.0.0.0" "libhhgttg.so")
libtool: link: ar cru .libs/libhhgttg.a  hhgttg.o
libtool: link: ranlib .libs/libhhgttg.a
libtool: link: ( cd ".libs" && rm -f "libhhgttg.la" && ln -s 
"../libhhgttg.la" "libhhgttg.la" )


[csamuel@farnarkle2 libtool]$ readelf -d .libs/libhhgttg.so.0 | fgrep -i lib
  0x0001 (NEEDED) Shared library: [libmpi.so.40]
  0x0001 (NEEDED) Shared library: [libopen-rte.so.40]
  0x0001 (NEEDED) Shared library: [libopen-pal.so.40]
  0x0001 (NEEDED) Shared library: [libdl.so.2]
  0x0001 (NEEDED) Shared library: [librt.so.1]
  0x0001 (NEEDED) Shared library: [libutil.so.1]
  0x0001 (NEEDED) Shared library: [libm.so.6]
  0x0001 (NEEDED) Shared library: [libpthread.so.0]
  0x0001 (NEEDED) Shared library: [libz.so.1]
  0x0001 (NEEDED) Shared library: [libhwloc.so.5]
  0x0001 (NEEDED) Shared library: [libc.so.6]
  0x000e (SONAME) Library soname: [libhhgttg.so.0]
  0x001d (RUNPATH) Library runpath: [/apps/skylake/software/compiler/gcc/6.4.0/openmpi/3.0.0/lib]
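
A likely explanation, sketched under assumptions: libtool records each
library's own link dependencies in the dependency_libs field of its .la
file and, by default, adds them to the link line of anything that links
against that library. The helper name below is made up, and the sketch
assumes a libmpi.la is installed next to libmpi.so, which the logs above
do not show.

```shell
# Hypothetical helper: print the dependency_libs entry of a libtool
# archive (.la file), i.e. what libtool would append to the link line.
la_dependency_libs() {
    sed -n "s/^dependency_libs='\(.*\)'\$/\1/p" "$1"
}

# Usage (not run here; OMPI_PREFIX is a placeholder for the install prefix):
#   la_dependency_libs "$OMPI_PREFIX/lib/libmpi.la"
```

If the output lists libopen-rte and libopen-pal, that would match the
expanded link line libtool printed above.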


All the best,
Chris
-- 
  Christopher Samuel OzGrav Senior Data Science Support
  ARC Centre of Excellence for Gravitational Wave Discovery
  http://www.ozgrav.org/  http://twitter.com/ozgrav


Re: [OMPI devel] Open-MPI backwards compatibility and library version changes

2018-11-14 Thread Christopher Samuel
On 15/11/18 11:45 am, Christopher Samuel wrote:

> Unfortunately that's not the case, just creating a shared library
> that only links in libmpi.so will create dependencies on the private
> libraries too in the final shared library. :-(

Hmm, I might be misinterpreting the output of "ldd"; it looks like it
reports the dependencies of dependencies, not just the direct
dependencies.  "readelf -d" seems more reliable.

[csamuel@farnarkle2 libtool]$ readelf -d libhhgttg.so.1 | fgrep -i lib
  0x0001 (NEEDED) Shared library: [libmpi.so.40]
  0x0001 (NEEDED) Shared library: [libc.so.6]
  0x000e (SONAME) Library soname: [libhhgttg.so.1]
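
The difference between the two tools can be made explicit. A minimal
sketch, assuming ldd and binutils' readelf are available; all helper
names are hypothetical. It prints the libraries that ldd resolves but
that the file does not itself list as NEEDED, which are the indirect
dependencies.

```shell
# Library names from "ldd" output on stdin
# (only lines of the form "name => path (addr)").
ldd_names() {
    awk '$2 == "=>" { print $1 }'
}

# Direct NEEDED names from "readelf -d" output on stdin.
needed_names() {
    sed -n 's/.*(NEEDED).*\[\(.*\)].*/\1/p'
}

# Names that appear in ldd's list but not among the direct NEEDED entries.
indirect_only() {
    direct=$(readelf -d "$1" | needed_names)
    ldd "$1" | ldd_names | grep -vxF -e "$direct"
}

# Usage (not run here):
#   indirect_only libhhgttg.so.1
```

For the toy library above, libopen-rte.so.40 and libopen-pal.so.40
should show up in this list rather than among the direct entries.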

Whereas the HDF5 libraries really do have them listed as direct dependencies.

[csamuel@farnarkle2 1.10.1]$ readelf -d ./lib/libhdf5_fortran.so.100 | fgrep -i lib
  0x0001 (NEEDED) Shared library: [libhdf5.so.101]
  0x0001 (NEEDED) Shared library: [libsz.so.2]
  0x0001 (NEEDED) Shared library: [libmpi_usempif08.so.40]
  0x0001 (NEEDED) Shared library: [libmpi_usempi_ignore_tkr.so.40]
  0x0001 (NEEDED) Shared library: [libmpi_mpifh.so.40]
  0x0001 (NEEDED) Shared library: [libmpi.so.40]
  0x0001 (NEEDED) Shared library: [libopen-rte.so.40]
  0x0001 (NEEDED) Shared library: [libopen-pal.so.40]
  0x0001 (NEEDED) Shared library: [libdl.so.2]
  0x0001 (NEEDED) Shared library: [librt.so.1]
  0x0001 (NEEDED) Shared library: [libutil.so.1]
  0x0001 (NEEDED) Shared library: [libpthread.so.0]
  0x0001 (NEEDED) Shared library: [libz.so.1]
  0x0001 (NEEDED) Shared library: [libhwloc.so.5]
  0x0001 (NEEDED) Shared library: [libgfortran.so.3]
  0x0001 (NEEDED) Shared library: [libm.so.6]
  0x0001 (NEEDED) Shared library: [libquadmath.so.0]
  0x0001 (NEEDED) Shared library: [libc.so.6]
  0x0001 (NEEDED) Shared library: [libgcc_s.so.1]
  0x000e (SONAME) Library soname: [libhdf5_fortran.so.100]
  0x001d (RUNPATH) Library runpath: [/apps/skylake/software/mpi/gcc/6.4.0/openmpi/3.0.0/hdf5/1.10.1/lib:/apps/skylake/software/core/szip/2.1.1/lib:/apps/skylake/software/compiler/gcc/6.4.0/openmpi/3.0.0/lib:/apps/skylake/software/core/gcccore/6.4.0/lib/../lib64]

I wonder if it's because they use libtool instead?

All the best,
Chris
-- 
  Christopher Samuel OzGrav Senior Data Science Support
  ARC Centre of Excellence for Gravitational Wave Discovery
  http://www.ozgrav.org/  http://twitter.com/ozgrav


Re: [OMPI devel] Open-MPI backwards compatibility and library version changes

2018-11-14 Thread Christopher Samuel
On 15/11/18 2:16 am, Barrett, Brian via devel wrote:

> In practice, this should not be a problem. The wrapper compilers (and
> our instructions for linking when not using the wrapper compilers)
> only link against libmpi.so (or a set of libraries if using Fortran),
> as libmpi.so contains the public interface. libmpi.so has a
> dependency on libopen-pal.so, so the loader will load the version of
> libopen-pal.so that matches the version of Open MPI used to build
> libmpi.so. However, if someone explicitly links against libopen-pal.so,
> you end up where we are today.

Unfortunately that's not the case, just creating a shared library
that only links in libmpi.so will create dependencies on the private
libraries too in the final shared library. :-(

Here's a toy example to illustrate that.

[csamuel@farnarkle2 libtool]$ cat hhgttg.c
int answer(void)
{
    return(42);
}

[csamuel@farnarkle2 libtool]$ gcc hhgttg.c -c -o hhgttg.o

[csamuel@farnarkle2 libtool]$ gcc -shared -Wl,-soname,libhhgttg.so.1 -o libhhgttg.so.1 hhgttg.o -lmpi

[csamuel@farnarkle2 libtool]$ ldd libhhgttg.so.1
    linux-vdso.so.1 =>  (0x7ffc625b3000)
    libmpi.so.40 => /apps/skylake/software/compiler/gcc/6.4.0/openmpi/3.0.0/lib/libmpi.so.40 (0x7f018a582000)
    libc.so.6 => /lib64/libc.so.6 (0x7f018a09e000)
    libopen-rte.so.40 => /apps/skylake/software/compiler/gcc/6.4.0/openmpi/3.0.0/lib/libopen-rte.so.40 (0x7f018a4b5000)
    libopen-pal.so.40 => /apps/skylake/software/compiler/gcc/6.4.0/openmpi/3.0.0/lib/libopen-pal.so.40 (0x7f0189fde000)
    libdl.so.2 => /lib64/libdl.so.2 (0x7f0189dda000)
    librt.so.1 => /lib64/librt.so.1 (0x7f0189bd2000)
    libutil.so.1 => /lib64/libutil.so.1 (0x7f01899cf000)
    libm.so.6 => /lib64/libm.so.6 (0x7f01896cd000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x7f01894b1000)
    libz.so.1 => /lib64/libz.so.1 (0x7f018929b000)
    libhwloc.so.5 => /lib64/libhwloc.so.5 (0x7f018905e000)
    /lib64/ld-linux-x86-64.so.2 (0x7f018a46b000)
    libnuma.so.1 => /lib64/libnuma.so.1 (0x7f0188e52000)
    libltdl.so.7 => /lib64/libltdl.so.7 (0x7f0188c48000)
    libgcc_s.so.1 => /apps/skylake/software/core/gcccore/6.4.0/lib64/libgcc_s.so.1 (0x7f018a499000)


All the best,
Chris
-- 
  Christopher Samuel OzGrav Senior Data Science Support
  ARC Centre of Excellence for Gravitational Wave Discovery
  http://www.ozgrav.org/  http://twitter.com/ozgrav


Re: [OMPI devel] Open-MPI backwards compatibility and library version changes

2018-11-14 Thread Barrett, Brian via devel
Chris -

When we look at ABI stability for Open MPI releases, we look only at the MPI 
and SHMEM interfaces, not the interfaces used internally by Open MPI.  
libopen-pal.so is an internal library, and we do not guarantee its ABI stability 
across minor releases.  In 3.0.3, there was a backwards-incompatible change in 
libopen-pal.so, which is why the shared library version numbers were increased 
in a way that prevents loading the new version of libopen-pal.so when the 
application was linked against an earlier version of the library.
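
A quick way to spot such a version bump before swapping one install for
another, as a sketch: it assumes binutils' readelf, and the helper names
and the two prefix variables in the usage comment are hypothetical.

```shell
# SONAME from "readelf -d" output on stdin.
soname_from_readelf() {
    sed -n 's/.*(SONAME).*\[\(.*\)].*/\1/p'
}

# Compare the SONAMEs of two builds of the same library.
# Returns 0 if they match and 1 if they differ.
compare_sonames() {
    old=$(readelf -d "$1" | soname_from_readelf)
    new=$(readelf -d "$2" | soname_from_readelf)
    if [ "$old" = "$new" ]; then
        echo "same soname: $old"
    else
        echo "soname changed: $old -> $new"
        return 1
    fi
}

# Usage (not run here; OLD_PREFIX and NEW_PREFIX are placeholders):
#   compare_sonames "$OLD_PREFIX/lib/libopen-pal.so" "$NEW_PREFIX/lib/libopen-pal.so"
```

For the two releases discussed in this thread, this would be expected to
report a change from libopen-pal.so.40 to libopen-pal.so.42.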

In practice, this should not be a problem.  The wrapper compilers (and our 
instructions for linking when not using the wrapper compilers) only link 
against libmpi.so (or a set of libraries if using Fortran), as libmpi.so 
contains the public interface.  libmpi.so has a dependency on libopen-pal.so, 
so the loader will load the version of libopen-pal.so that matches the version 
of Open MPI used to build libmpi.so.  However, if someone explicitly links 
against libopen-pal.so, you end up where we are today.

There’s probably a bug in HDF5’s mechanism for linking against Open MPI, since 
it pulled in a dependency on libopen-pal.so.  However, there may be some things 
we can do in the future to better handle this scenario.  Unfortunately, most of 
the Open MPI developers (myself included) are at the SC’18 conference this 
week, so it will take us some time to investigate further.

Brian

> On Nov 14, 2018, at 5:20 AM, Christopher Samuel wrote:
> 
> Hi folks,
> 
> Just resub'd after a long time to ask a question about binary/backwards 
> compatibility.
> 
> We got bitten when upgrading from 3.0.0 to 3.0.3, which we assumed would be 
> binary compatible. After some testing to confirm that it was, we replaced our 
> existing 3.0.0 install with the 3.0.3 one (because we're using hierarchical 
> namespaces in Lmod, this meant we avoided needing to recompile everything we'd 
> already built over the last 12 months with 3.0.0).
> 
> However, once we'd done that we heard from a user that their code would no 
> longer run because it couldn't find libopen-pal.so.40, and we saw that 3.0.3 
> instead had libopen-pal.so.42.
> 
> Initially we thought this was some odd build system problem, but on 
> digging further we realised that they were linking against libraries (HDF5) 
> that were in turn built against Open MPI, and that those had embedded the 
> libopen-pal.so.40 names.
> 
> Of course our testing hadn't found that because we weren't linking against 
> anything like those for our MPI tests. :-(
> 
> But I was really surprised to see that these version numbers were changing; I 
> thought the idea was to keep things backwards compatible within a release series?
> 
> Fortunately, our reason for doing the forced upgrade (we found our 3.0.0 
> didn't work with our upgrade to Slurm 18.08.3) turned out to be us missing one 
> combination in our testing whilst fault-finding. Having gotten that going, we've 
> been able to drop back to the original 3.0.0, which fixed it for them.
> 
> But is this something that you folks have come across before?
> 
> All the best,
> Chris
> -- 
>  Christopher Samuel OzGrav Senior Data Science Support
>  ARC Centre of Excellence for Gravitational Wave Discovery
>  http://www.ozgrav.org/  http://twitter.com/ozgrav
> 
> 
> 


[OMPI devel] Open-MPI backwards compatibility and library version changes

2018-11-14 Thread Christopher Samuel
Hi folks,

Just resub'd after a long time to ask a question about binary/backwards 
compatibility.

We got bitten when upgrading from 3.0.0 to 3.0.3, which we assumed would be 
binary compatible. After some testing to confirm that it was, we replaced our 
existing 3.0.0 install with the 3.0.3 one (because we're using hierarchical 
namespaces in Lmod, this meant we avoided needing to recompile everything we'd 
already built over the last 12 months with 3.0.0).

However, once we'd done that we heard from a user that their code would no 
longer run because it couldn't find libopen-pal.so.40, and we saw that 3.0.3 
instead had libopen-pal.so.42.

Initially we thought this was some odd build system problem, but on 
digging further we realised that they were linking against libraries (HDF5) 
that were in turn built against Open MPI, and that those had embedded the 
libopen-pal.so.40 names.

Of course our testing hadn't found that because we weren't linking against 
anything like those for our MPI tests. :-(
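
A pre-upgrade audit along these lines might have caught it. This is only
a sketch: it assumes binutils' readelf and find, the helper name is
hypothetical, and it only inspects files whose names contain ".so", so
executables are not covered.

```shell
# Hypothetical helper: list shared libraries under a directory tree that
# directly need a given soname (for example libopen-pal.so.40).
find_direct_dependents() {
    root=$1
    soname=$2
    find "$root" -type f -name '*.so*' | while IFS= read -r f; do
        # Non-ELF files make readelf fail; its errors are discarded.
        if readelf -d "$f" 2>/dev/null | grep -F '(NEEDED)' | grep -qF "[$soname]"; then
            printf '%s\n' "$f"
        fi
    done
}

# Usage (not run here; the path is a placeholder for the software tree):
#   find_direct_dependents /path/to/software libopen-pal.so.40
```

Anything it lists would need rebuilding, or the old Open MPI libraries
kept around, before the old install is replaced.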

But I was really surprised to see that these version numbers were changing; I 
thought the idea was to keep things backwards compatible within a release series?

Fortunately, our reason for doing the forced upgrade (we found our 3.0.0 
didn't work with our upgrade to Slurm 18.08.3) turned out to be us missing one 
combination in our testing whilst fault-finding. Having gotten that going, we've 
been able to drop back to the original 3.0.0, which fixed it for them.

But is this something that you folks have come across before?

All the best,
Chris
-- 
  Christopher Samuel OzGrav Senior Data Science Support
  ARC Centre of Excellence for Gravitational Wave Discovery
  http://www.ozgrav.org/  http://twitter.com/ozgrav


