Re: [OMPI devel] Shared object dependencies

2018-06-11 Thread Tyson Whitehead
I have now also tried release 3.1.0 and see the same thing (where I have replaced
/nix/store/glx60yay0hmmizhlxhqhnx9w3k4j9g1z-openmpi-3.1.0 with ...)

[orc-login2:107400] mca_base_component_repository_open: unable to open
mca_fcoll_individual: .../lib/openmpi/mca_fcoll_individual.so:
undefined symbol: mca_common_ompio_file_write (ignored)
[orc-login2:107400] mca_base_component_repository_open: unable to open
mca_fcoll_dynamic_gen2: .../lib/openmpi/mca_fcoll_dynamic_gen2.so:
undefined symbol: mca_common_ompio_register_print_entry (ignored)
[orc-login2:107400] mca_base_component_repository_open: unable to open
mca_fcoll_dynamic: .../lib/openmpi/mca_fcoll_dynamic.so: undefined
symbol: mca_common_ompio_register_print_entry (ignored)
[orc-login2:107400] mca_base_component_repository_open: unable to open
mca_fcoll_two_phase: .../lib/openmpi/mca_fcoll_two_phase.so: undefined
symbol: mca_common_ompio_register_print_entry (ignored)
[orc-login2:107400] mca_base_component_repository_open: unable to open
mca_fcoll_static: .../lib/openmpi/mca_fcoll_static.so: undefined
symbol: mca_common_ompio_register_print_entry (ignored)
                 Package: Open MPI nixbld@localhost Distribution
                Open MPI: 3.1.0
  Open MPI repo revision: v3.1.0
   Open MPI release date: May 07, 2018
                Open RTE: 3.1.0
  Open RTE repo revision: v3.1.0
   Open RTE release date: May 07, 2018
                    OPAL: 3.1.0
      OPAL repo revision: v3.1.0
       OPAL release date: May 07, 2018

I straced the process and, as far as I could tell, it mostly just
opens the shared objects in alphabetical order.  I would appreciate
any insight, in particular whether this is normal behaviour that I can
safely ignore.
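
Given the alphabetical open order, one thing that might be worth
checking is whether each failing component sorts before the library
that provides its missing symbol.  A small sketch of that check
follows; the component/provider pairings are the two I worked out in
my original message, the function name is made up, and I have not
confirmed that load order is actually the cause:

```python
def opened_before_provider(pairs):
    """Given (component, provider) file names, return the pairs where the
    component sorts first, i.e. would be opened before its provider if
    the files are opened in plain alphabetical order (an assumption)."""
    return [(component, provider)
            for component, provider in pairs
            if component < provider]


# Pairings taken from the original message; nothing else is assumed.
pairs = [
    ("mca_btl_openib.so", "mca_mpool_grdma.so"),
    ("mca_fcoll_individual.so", "mca_io_ompio.so"),
]

for component, provider in opened_before_provider(pairs):
    print(f"{component} would be opened before {provider}")
```

Both pairs come back from this check, which is at least consistent
with the errors, but it says nothing about how the loader is meant to
resolve such symbols.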

Thanks!  -Tyson

[OMPI devel] Shared object dependencies

2018-06-08 Thread Tyson Whitehead
This email starts out talking about version 1.10.7 to give a complete
picture.  I tested 2.1.3 as well; it also exhibits this issue,
although to a lesser extent, and it is the release I am asking for
help with.

I was compiling the OpenMPI 1.10.7 shipped with NixOS against a newer
libibverbs with a large set of drivers, and I get some strange errors
when running ompi_info (I've replaced the common prefix
/nix/store/9zm0pqsh67fw0xi5cpnybnd7hgzryffs-openmpi-1.10.7 with ...)

[mon241:04077] mca: base: component_find: unable to open
.../lib/openmpi/mca_btl_openib: .../lib/openmpi/mca_btl_openib.so:
undefined symbol: mca_mpool_grdma_evict (ignored)
[mon241:04077] mca: base: component_find: unable to open
.../lib/openmpi/mca_fcoll_individual:
.../lib/openmpi/mca_fcoll_individual.so: undefined symbol:
mca_io_ompio_file_write (ignored)
[mon241:04077] mca: base: component_find: unable to open
.../lib/openmpi/mca_fcoll_ylib: .../lib/openmpi/mca_fcoll_ylib.so:
undefined symbol: ompi_io_ompio_scatter_data (ignored)
[mon241:04077] mca: base: component_find: unable to open
.../lib/openmpi/mca_fcoll_dynamic:
.../lib/openmpi/mca_fcoll_dynamic.so: undefined symbol:
ompi_io_ompio_allgatherv_array (ignored)
[mon241:04077] mca: base: component_find: unable to open
.../lib/openmpi/mca_fcoll_two_phase:
.../lib/openmpi/mca_fcoll_two_phase.so: undefined symbol:
ompi_io_ompio_set_aggregator_props (ignored)
[mon241:04077] mca: base: component_find: unable to open
.../lib/openmpi/mca_fcoll_static: .../lib/openmpi/mca_fcoll_static.so:
undefined symbol: ompi_io_ompio_allgather_array (ignored)
                 Package: Open MPI nixbld@ Distribution
                Open MPI: 1.10.7
  Open MPI repo revision: v1.10.6-48-g5e373bf
   Open MPI release date: May 16, 2017
                Open RTE: 1.10.7
  Open RTE repo revision: v1.10.6-48-g5e373bf
   Open RTE release date: May 16, 2017
                    OPAL: 1.10.7
      OPAL repo revision: v1.10.6-48-g5e373bf
       OPAL release date: May 16, 2017
...
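
To keep track of which component is missing which symbol, the
messages above can be reduced to (component, symbol) pairs.  This is a
minimal sketch, assuming every message contains the text
"<path>.so: undefined symbol: <name>" (true of the ones quoted in this
thread); `missing_symbols` is just a name I made up:

```python
import re

# Matches "<path>.so: undefined symbol: <name>", which appears in both the
# 1.10.x ("component_find") and 2.x/3.x ("component_repository_open") messages.
_UNDEFINED = re.compile(r"(\S+)\.so: undefined symbol: (\w+)")


def missing_symbols(log_text):
    """Return a list of (component, symbol) pairs found in log_text."""
    # The messages are wrapped across lines here, so collapse all
    # whitespace into single spaces before matching.
    flat = " ".join(log_text.split())
    pairs = []
    for path, symbol in _UNDEFINED.findall(flat):
        component = path.rsplit("/", 1)[-1]  # drop the directory part
        pairs.append((component, symbol))
    return pairs


sample = """[mon241:04077] mca: base: component_find: unable to open
.../lib/openmpi/mca_btl_openib: .../lib/openmpi/mca_btl_openib.so:
undefined symbol: mca_mpool_grdma_evict (ignored)"""

print(missing_symbols(sample))
```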

I dug into the first of these (figured out which library provides the
symbol, looked at the declared dependencies, poked around in the
automake file) and, as far as I could determine, mca_btl_openib.so
simply isn't linked so as to list mca_mpool_grdma.so (which provides
the symbol) as a dependency.
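
For reference, the declared dependencies can be read from the NEEDED
entries that `readelf -d` prints for a shared object.  The sketch
below parses that output; it assumes GNU binutils' readelf is on the
PATH, and the helper names are my own:

```python
import re
import subprocess

# readelf -d prints lines such as:
#  0x0000000000000001 (NEEDED)   Shared library: [libc.so.6]
_NEEDED = re.compile(r"\(NEEDED\)\s+Shared library: \[(.+?)\]")


def needed_libraries(readelf_output):
    """Return the library names listed as NEEDED in `readelf -d` output."""
    return _NEEDED.findall(readelf_output)


def readelf_dynamic(path):
    """Run `readelf -d` on a shared object and return its text output."""
    result = subprocess.run(["readelf", "-d", path],
                            check=True, capture_output=True, text=True)
    return result.stdout
```

With the real store path in place of the elided prefix,
`"mca_mpool_grdma.so" in needed_libraries(readelf_dynamic(".../lib/openmpi/mca_btl_openib.so"))`
would be the check I did by hand.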

Seeing as 1.10.7 is no longer supported, I figured I would try 2.1.3
in case this has been fixed.  I compiled it as well, and it seems
all but the mca_fcoll_individual one have been resolved (I've replaced
/nix/store/4kh0zbn8pmdqhvwagicswg70rwnpm570-openmpi-2.1.3 with ...)

[mon241:05544] mca_base_component_repository_open: unable to open
mca_fcoll_individual: .../lib/openmpi/mca_fcoll_individual.so:
undefined symbol: ompio_io_ompio_file_read (ignored)
                 Package: Open MPI nixbld@ Distribution
                Open MPI: 2.1.3
  Open MPI repo revision: v2.1.2-129-gcfd8f3f
   Open MPI release date: Mar 13, 2018
                Open RTE: 2.1.3
  Open RTE repo revision: v2.1.2-129-gcfd8f3f
   Open RTE release date: Mar 13, 2018
                    OPAL: 2.1.3
      OPAL repo revision: v2.1.2-129-gcfd8f3f
       OPAL release date: Mar 13, 2018
...

Again I was able to find this symbol in the mca_io_ompio.so library.
I looked through the source again, and it seems pretty clear that the
function is indeed called, but the library isn't linked so as to list
mca_io_ompio.so as a dependency.
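
Finding which library provides a symbol can be done by scanning the
output of `nm -D --defined-only` for each shared object.  A rough
sketch, assuming GNU binutils' nm is available; the function names
are hypothetical:

```python
import subprocess


def defined_symbols(nm_output):
    """Parse `nm -D --defined-only` text into a set of symbol names."""
    symbols = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined entries look like "<address> <type> <name>".
        if len(fields) == 3:
            # Strip any "@VERSION" / "@@VERSION" suffix nm may append.
            symbols.add(fields[2].split("@")[0])
    return symbols


def providers(symbol, nm_outputs):
    """nm_outputs maps library name -> nm text; return the sorted names
    of the libraries that define symbol."""
    return sorted(lib for lib, text in nm_outputs.items()
                  if symbol in defined_symbols(text))


def nm_dynamic(path):
    """Run `nm -D --defined-only` on a shared object and return its output."""
    result = subprocess.run(["nm", "-D", "--defined-only", path],
                            check=True, capture_output=True, text=True)
    return result.stdout
```

Building `nm_outputs` from every `.so` file in the component
directory and calling `providers` with the symbol from the error
message gives the library to look at.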

Looking through the various shared libraries in the .../lib/openmpi
directory, though, it seems none of them declare dependencies on each
other.  How is this supposed to work?  Is the component library just
supposed to load everything so that all symbols get resolved?  Is the
error I'm seeing above a real error, then?
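
One way to experiment with the "load everything" idea outside of Open
MPI is to dlopen the libraries by hand with RTLD_GLOBAL, so that
symbols from libraries loaded earlier are visible to ones loaded
later.  This is only a sketch using Python's ctypes; I have not
confirmed that Open MPI's own loader works this way, and
`load_in_order` is a made-up name:

```python
import ctypes


def load_in_order(paths):
    """dlopen each path with RTLD_GLOBAL, in the order given.

    Returns (handles, errors): handles maps path -> loaded library and
    errors maps path -> the loader's error message for any failure.
    """
    handles, errors = {}, {}
    for path in paths:
        try:
            handles[path] = ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
        except OSError as exc:
            # A load failure (for example an unresolved symbol or a
            # missing file) is reported by ctypes as OSError.
            errors[path] = str(exc)
    return handles, errors
```

Passing the provider first and the dependent component second, and
then the reverse order, would show whether the failure depends on load
order.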

Any insight would be appreciated.

Thanks!  -Tyson

PS:  Please note that the openmpi code was compiled without any
patches and without any special configure flags other than
--prefix= (NixOS also adds --disable-static and
--disable-dependency-tracking by default, but I removed those and it
didn't make a difference).