On Thu, Apr 5, 2018 at 7:59 PM, Gilles Gouaillardet
<gilles.gouaillar...@gmail.com> wrote:
> That being said, the error suggests mca_oob_ud.so is a module from a
> previous install,
> Open MPI was not built on the system it is running, or libibverbs.so.1
> has been removed after
> Open MPI was built.

yes, understood, i compiled openmpi on a node that has all the
libraries installed for our various interconnects, opa/psm/mxm/ib, but
i ran mpirun on a node that has none of them

so these are the resulting warnings i get (component: missing library):

mca_btl_openib: librdmacm.so.1
mca_btl_usnic: libfabric.so.1
mca_oob_ud: libibverbs.so.1
mca_mtl_mxm: libmxm.so.2
mca_mtl_ofi: libfabric.so.1
mca_mtl_psm: libpsm_infinipath.so.1
mca_mtl_psm2: libpsm2.so.2
mca_pml_yalla: libmxm.so.2

you referenced them as "errors" above, but mpi actually runs just fine
for me even with these msgs, so i would consider them more warnings.

> So I do encourage you to take a step back, and think if you can find a
> better solution for your site.

there are two alternatives

1 i can compile a specific version of openmpi for each of our clusters
against that cluster's specific interconnect libraries

2 i can install all the libraries on all the machines regardless of
whether the interconnect is present

both are certainly plausible, but my effort here is to see if i can
reduce the size of our software stack and/or reduce the number of
compiled versions of openmpi

it would be nice if openmpi had (or perhaps it already has) a simple
switch that lets me disable entire portions of the library chain, i.e.
this host doesn't have a particular interconnect, so don't load any of
its libraries.  this might run counter to how openmpi discovers and
loads libs though.
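
One candidate for such a switch is Open MPI's MCA component selection: a framework parameter can be given an exclusion list with a leading `^`, either on the command line (`--mca btl ^openib,usnic`) or through an `OMPI_MCA_<framework>` environment variable. The sketch below only builds those environment settings from the component names in the warnings above; the `UNLOADABLE` list and the `exclusion_env` helper are hypothetical names for illustration. Whether excluding a component also skips the attempt to load it (and so silences the warning) is an assumption that should be checked against the installed Open MPI version.

```python
from collections import defaultdict

# Component names copied from the warnings listed earlier in this message.
UNLOADABLE = [
    "mca_btl_openib",
    "mca_btl_usnic",
    "mca_oob_ud",
    "mca_mtl_mxm",
    "mca_mtl_ofi",
    "mca_mtl_psm",
    "mca_mtl_psm2",
    "mca_pml_yalla",
]


def exclusion_env(components):
    """Turn mca_<framework>_<component> names into OMPI_MCA_<framework>=^a,b settings."""
    by_framework = defaultdict(list)
    for name in components:
        # Split into at most three parts so component names keep any underscores.
        prefix, framework, component = name.split("_", 2)
        if prefix != "mca":
            raise ValueError(f"not an MCA component name: {name}")
        by_framework[framework].append(component)
    # A leading "^" marks the comma-separated list as an exclusion list.
    return {
        f"OMPI_MCA_{framework}": "^" + ",".join(names)
        for framework, names in by_framework.items()
    }


if __name__ == "__main__":
    # Print shell export lines that could go in a per-cluster environment module.
    for key, value in exclusion_env(UNLOADABLE).items():
        print(f"export {key}='{value}'")
```

The same values could instead go in an `openmpi-mca-params.conf` file on the hosts that lack the interconnect, which would keep a single build of openmpi while varying only the per-host configuration. If the goal is only to hide the messages, the `mca_base_component_show_load_errors` parameter may be another option worth checking.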