Gilles,

You described the problem exactly. I think we were able to nail down a
solution through judicious use of the -rpath $MPI_DIR/lib linker flag,
which lets the runtime linker properly resolve OpenMPI's symbols. We're
operational. Thanks for your help.
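
For the archives, the change was on our transport library's link line,
with the flag passed through the compiler driver. A rough sketch (the
object and library names are illustrative, not our real ones):

  mpicc -shared -o libtransport_mpi.so transport_mpi.o \
        -Wl,-rpath,$MPI_DIR/lib

With the rpath baked into libtransport_mpi.so, the runtime linker can
locate the Open MPI libraries without relying solely on LD_LIBRARY_PATH.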

-Sean

--
Sean Ahern
Computational Engineering International
919-363-0883

On Mon, Oct 17, 2016 at 9:45 PM, Gilles Gouaillardet <gil...@rist.or.jp>
wrote:

> Sean,
>
>
> If I understand correctly, you built a libtransport_mpi.so library that
> depends on Open MPI, and your main program dlopens libtransport_mpi.so.
>
> In this case, and at least for the time being, you need to use
> RTLD_GLOBAL in your dlopen flags.
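>
> A minimal sketch of what that looks like on your side (the library name
> and error handling here are illustrative):
>
>   #include <dlfcn.h>
>   #include <stdio.h>
>
>   int main(void)
>   {
>       /* RTLD_GLOBAL exports the Open MPI symbols pulled in by the
>        * transport to the global symbol table, so the mca_* plugins
>        * that Open MPI itself dlopens later can resolve opal_ and
>        * ompi_ symbols. */
>       void *handle = dlopen("libtransport_mpi.so", RTLD_NOW | RTLD_GLOBAL);
>       if (handle == NULL) {
>           fprintf(stderr, "dlopen failed: %s\n", dlerror());
>           return 1;
>       }
>       /* ... look up the transport's entry points with dlsym() ... */
>       dlclose(handle);
>       return 0;
>   }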
>
>
> Cheers,
>
>
> Gilles
>
> On 10/18/2016 4:53 AM, Sean Ahern wrote:
>
> Folks,
>
> For our code, we have a communication layer that abstracts the code that
> does the actual transfer of data. We call these "transports", and we link
> them as shared libraries. We have created an MPI transport that
> compiles/links against OpenMPI 2.0.1 using the compiler wrappers. When I
> compile OpenMPI with the --disable-dlopen option (thus cramming all of
> OpenMPI's plugins into the MPI library directly), things work great with
> our transport shared library. But when I have a "normal" OpenMPI (without
> --disable-dlopen) and create the same transport shared library, things
> fail. Upon launch, it appears that OpenMPI is unable to find the
> appropriate plugins:
>
> [hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to open mca_patcher_overwrite: /home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_patcher_overwrite.so: undefined symbol: mca_patcher_base_patch_t_class (ignored)
> [hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to open mca_shmem_mmap: /home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_shmem_mmap.so: undefined symbol: opal_show_help (ignored)
> [hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to open mca_shmem_posix: /home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_shmem_posix.so: undefined symbol: opal_show_help (ignored)
> [hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to open mca_shmem_sysv: /home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_shmem_sysv.so: undefined symbol: opal_show_help (ignored)
> --------------------------------------------------------------------------
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   opal_shmem_base_select failed
>   --> Returned value -1 instead of OPAL_SUCCESS
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   opal_init failed
>   --> Returned value Error (-1) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems.  This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
>   ompi_mpi_init: ompi_rte_init failed
>   --> Returned "Error" (-1) instead of "Success" (0)
>
>
> If I skip our shared libraries and instead write a standard MPI-based
> "hello, world" program that links against MPI directly (without
> --disable-dlopen), everything is again fine.
>
> It seems that the double dlopen is preventing OpenMPI from finding its
> own shared libraries (the MCA plugins).
>
> Note: I do have LD_LIBRARY_PATH pointing to …"openmpi-2.0.1/lib", as well
> as OPAL_PREFIX pointing to …"openmpi-2.0.1".
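>
> In shell terms, that environment looks roughly like this (the install
> prefix is shortened to a placeholder here):
>
>   export OPAL_PREFIX=/path/to/openmpi-2.0.1
>   export LD_LIBRARY_PATH=$OPAL_PREFIX/lib:$LD_LIBRARY_PATH
>
> OPAL_PREFIX tells Open MPI where its (relocated) installation lives, and
> LD_LIBRARY_PATH lets the runtime linker find libmpi and its companions.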
>
> Any thoughts about how I can try to tease out what's going wrong here?
>
> -Sean
>
> --
> Sean Ahern
> Computational Engineering International
> 919-363-0883
>
>
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
