Jeff,

Thanks for the reply and your attention to this.

> Can you -- and anyone else in similar circumstances -- let me know
> how common this scenario is?

I think this depends on the environment. For us and many other ISVs, it is very common. The build host is almost always physically different from the target systems, and the target systems usually have only a subset of the network hardware for which the application was originally configured (and may have drivers installed in different places). The application is configured for all possible interconnects. On each target system (each possibly with different interconnect types), the specific interconnect is selected either by user input or by auto-detection. Thus, we would build for mx, gm, mvapi, openib, tcp, sm, ...; however, some systems may have only mx, only mvapi, or neither.
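
A rough sketch of the selection step described above (explicit user choice, otherwise auto-detection) might look like the following. It is only an illustration under stated assumptions, not the actual logic of any application or of Open MPI: the helper names (can_load, select_btls, mca_args) are hypothetical, and it assumes that being able to load a vendor library is an acceptable proxy for the interconnect being present. The two library names are the ones that appear in the error messages quoted later in this thread.

```python
import ctypes

# Assumed mapping: btl component name -> vendor library it depends on.
VENDOR_LIBS = {"mvapi": "libvapi.so", "mx": "libmyriexpress.so"}
# Components assumed to be usable on every target system.
BASELINE = ["self", "sm", "tcp"]


def can_load(libname):
    """Return True if the dynamic loader can open libname."""
    try:
        ctypes.CDLL(libname)
        return True
    except OSError:
        return False


def select_btls(user_choice=None, probe=can_load):
    """Honor an explicit user choice, else auto-detect from loadable libraries."""
    if user_choice:
        return list(user_choice)
    detected = [name for name, lib in VENDOR_LIBS.items() if probe(lib)]
    return BASELINE + detected


def mca_args(btls):
    """Build the mpirun arguments that restrict the btl components."""
    return ["--mca", "btl", ",".join(btls)]
```

For example, mca_args(select_btls()) on a system with neither vendor library yields the arguments for "--mca btl self,sm,tcp". The probe is passed in as a parameter so the detection rule can be swapped for something more reliable than a library check.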

The MCA of OpenMPI seems very well suited to such a development environment because certain components can be selectively activated at run time depending on the system. Your idea of applying the filter earlier and only opening the desired modules sounds like an excellent approach.

Thanks for considering the issue. Please let me know if I can provide any more information.

-Patrick


Jeff Squyres (jsquyres) wrote:

This is due to the way OMPI finds and loads modules.  What actually
happens is that OMPI looks for *all* modules of a given type and
dlopen's them.  It then applies the filter of which components are
desired and dlclose's all the undesired ones.  It certainly would be
better to apply the filter earlier and only open the desired modules.
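
The difference between the two orderings can be illustrated with a small
simulation.  This is a sketch only, not OMPI's actual code: the function
names are hypothetical, fake_open stands in for dlopen, and the filter
syntax handled is just the "a,b" include form and the "^a,b" exclude form
used in this thread.

```python
def is_wanted(name, spec):
    """Apply a btl-style filter: 'a,b' includes only a and b; '^a,b' excludes them."""
    exclude = spec.startswith("^")
    names = set(spec.lstrip("^").split(","))
    return (name not in names) if exclude else (name in names)


def open_all_then_filter(found, spec, try_open):
    """Behavior as described: open everything, then drop the undesired."""
    opened, warnings = [], []
    for name in found:
        error = try_open(name)
        if error is None:
            opened.append(name)
        else:
            warnings.append(error)  # reported even if the component is unwanted
    return [n for n in opened if is_wanted(n, spec)], warnings


def filter_then_open(found, spec, try_open):
    """Alternative: only attempt to open components that pass the filter."""
    opened, warnings = [], []
    for name in found:
        if not is_wanted(name, spec):
            continue
        error = try_open(name)
        if error is None:
            opened.append(name)
        else:
            warnings.append(error)
    return opened, warnings


# Simulated target system without the mvapi and mx vendor libraries.
MISSING = {"mvapi": "libvapi.so", "mx": "libmyriexpress.so"}


def fake_open(name):
    """Stand-in for dlopen: return an error string, or None on success."""
    lib = MISSING.get(name)
    return None if lib is None else f"unable to open: {lib} (ignored)"
```

With the component list self, sm, tcp, mvapi, mx and the filter
"self,tcp,sm", both functions keep the same three components, but only
open_all_then_filter collects warnings for the two that were never
wanted.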

We actually identified this behavior quite a while ago, but never put a
high priority on fixing it because we didn't think it would be much of
an issue (because most people build/run in homogeneous environments).
But pending resource availability, I agree that this behavior is
sub-optimal and should be fixed.  I'll enter this issue on the bug
tracker so that we don't forget about it.  Can you -- and anyone else in
similar circumstances -- let me know how common this scenario is?

There is one workaround, however.  The MCA parameter
mca_component_show_load_errors defaults to a "1" value.  When it's 1,
all warnings regarding the loading of components are displayed (i.e.,
the messages you're seeing).  Setting this value to 0 will disable the
messages.  However, you won't see *any* messages about components not
loading.  For example, if you have components that you think should be
loading but are not, you won't be notified.
That being said, these messages are not usually a concern for end users;
they are typically more useful for the OMPI developers.  For
example, if a developer accidentally does something to make a plugin
un-loadable (e.g., leaves a symbol out), having these messages displayed
at mpirun time can be *very* useful.  Plugins that are shipped in a
tarball hopefully do not suffer from such issues :-), and usually have
rpath information compiled in them so even LD_LIBRARY_PATH issues
shouldn't be much of a problem.
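
As a sketch of how a launcher script might apply the workaround above:
the parameter can be passed on the mpirun command line with --mca, or,
assuming the usual OMPI_MCA_ prefix convention for setting MCA parameters
through the environment, exported as an environment variable.  The helper
names and the ./app executable are hypothetical.

```python
import os


def quiet_env(base_env=None):
    """Return a copy of the environment with load-error warnings disabled."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumes the OMPI_MCA_<parameter> environment variable convention.
    env["OMPI_MCA_mca_component_show_load_errors"] = "0"
    return env


def quiet_mpirun(app_argv, btl="self,tcp,sm"):
    """Build an mpirun command line that passes the parameter explicitly."""
    return ["mpirun", "--mca", "btl", btl,
            "--mca", "mca_component_show_load_errors", "0", *app_argv]
```

The resulting list can be handed to subprocess.run, e.g.
subprocess.run(quiet_mpirun(["./app"])).  Either route silences *all*
component load warnings, so it is probably best left off while debugging
a component that should be loading.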


-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Patrick Jessee
Sent: Wednesday, June 28, 2006 5:18 PM
To: Open MPI Users
Subject: [OMPI users] error messages for btl components that aren't loaded


Hello. I'm getting some odd error messages in certain situations associated with the btl components (happens with both 1.0.2 and 1.1). When certain btl components are NOT loaded, openMPI issues error messages associated with those very components. For instance, consider an application that is built with an openMPI installation that was configured with mvapi and mx (in addition to tcp,sm,self). If that application is taken to a system that does not have mvapi and mx interconnects installed and is explicitly started for TCP by using "--mca btl self,tcp,sm", then the following comes from openMPI:

[devi01:01659] mca: base: component_find: unable to open: libvapi.so: cannot open shared object file: No such file or directory (ignored)
[devi01:01659] mca: base: component_find: unable to open: libvapi.so: cannot open shared object file: No such file or directory (ignored)
[devi01:01659] mca: base: component_find: unable to open: libmyriexpress.so: cannot open shared object file: No such file or directory (ignored)
[devi02:31845] mca: base: component_find: unable to open: libvapi.so: cannot open shared object file: No such file or directory (ignored)
[devi02:31845] mca: base: component_find: unable to open: libvapi.so: cannot open shared object file: No such file or directory (ignored)
[devi02:31845] mca: base: component_find: unable to open: libmyriexpress.so: cannot open shared object file: No such file or directory (ignored)

These are not fatal, but they definitely give the wrong impression that something is not right. The "--mca btl self,tcp,sm" option should tell openMPI only to load loopback, tcp, and shared memory components (as these are the only btl components that should be operational on the system). The mvapi and mx components (which need libvapi.so and libmyriexpress.so, respectively), should not be loaded and thus libvapi.so and libmyriexpress.so should not be needed or even searched for. The same thing happens with "--mca btl ^mvapi,mx". Interestingly, even on a system that does have MX, the libmyriexpress.so errors show up if the mx btl component is not loaded.

Does anyone know (a) why openMPI is complaining about a shared library from a component that isn't even loaded, and (b) how to avoid the seemingly superfluous error messages? Any help is greatly appreciated.

-Patrick




_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

