Sean,
If I understand correctly, you built a libtransport_mpi.so library that
depends on Open MPI, and your main program dlopens libtransport_mpi.so.
In this case, and at least for the time being, you need to use
RTLD_GLOBAL in your dlopen() flags.
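
Something like this, as a rough sketch (the library name is taken from
your description; the error handling is only illustrative):

    #include <dlfcn.h>
    #include <stdio.h>

    int main(void)
    {
        /* RTLD_GLOBAL exports the Open MPI symbols pulled in by the
         * transport, so the MCA plugins (mca_shmem_*, mca_patcher_*, ...)
         * that Open MPI itself dlopen()s later can resolve them. */
        void *handle = dlopen("libtransport_mpi.so", RTLD_NOW | RTLD_GLOBAL);
        if (handle == NULL) {
            fprintf(stderr, "dlopen failed: %s\n", dlerror());
            return 1;
        }
        /* ... look up the transport entry points with dlsym() as usual ... */
        return 0;
    }

With the default RTLD_LOCAL, symbols such as opal_show_help stay private
to the transport's dependency chain, which matches the "undefined symbol"
errors in your log below.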
Cheers,
Gilles
On 10/18/2016 4:53 AM, Sean Ahern wrote:
Folks,
For our code, we have a communication layer that abstracts the code
that does the actual transfer of data. We call these "transports", and
we link them as shared libraries. We have created an MPI transport
that compiles/links against OpenMPI 2.0.1 using the compiler wrappers.
When I compile OpenMPI with the --disable-dlopen option (thus cramming
all of OpenMPI's plugins into the MPI library directly), things work
great with our transport shared library. But when I have a "normal"
OpenMPI (without --disable-dlopen) and create the same transport
shared library, things fail. Upon launch, it appears that OpenMPI is
unable to find the appropriate plugins:
[hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to open mca_patcher_overwrite: /home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_patcher_overwrite.so: undefined symbol: mca_patcher_base_patch_t_class (ignored)
[hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to open mca_shmem_mmap: /home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_shmem_mmap.so: undefined symbol: opal_show_help (ignored)
[hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to open mca_shmem_posix: /home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_shmem_posix.so: undefined symbol: opal_show_help (ignored)
[hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to open mca_shmem_sysv: /home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_shmem_sysv.so: undefined symbol: opal_show_help (ignored)
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_shmem_base_select failed
  --> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_init failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "Error" (-1) instead of "Success" (0)
If I skip our shared libraries and instead write a standard MPI-based
"hello, world" program that links against MPI directly (still using the
OpenMPI built without --disable-dlopen), everything is again fine.
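
For reference, the test program was nothing more elaborate than the usual
mpicc-built hello world, along these lines:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        printf("hello, world from rank %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }
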
It seems that the double dlopen is causing problems for OpenMPI
finding its own shared libraries.
Note: I do have LD_LIBRARY_PATH pointing to …"openmpi-2.0.1/lib", as
well as OPAL_PREFIX pointing to …"openmpi-2.0.1".
Any thoughts about how I can try to tease out what's going wrong here?
-Sean
--
Sean Ahern
Computational Engineering International
919-363-0883
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users