On Sep 20, 2010, at 1:20 PM, Richard Walsh wrote:

> I was not expecting things to work, and find that codes compiled using
> OpenMPI 1.4.1 commands under SLES 10.2 produce the following message
> when run under SLES11:
> 
> mca: base: component_find: unable to open 
> /share/apps/openmpi-intel/1.4.1/lib/openmpi/mca_btl_openib: perhaps a missing 
> symbol, or compiled for a different version of Open MPI? (ignored)
> 
> This file is in position and is NOT the result of a faulty mixed-release 
> over-build (things work great under SLES10.2).
> 
> The message indicates that (as the default is to build OpenMPI dynamically
> with shared objects) in loading this required IB-related library there must
> be a format incompatibility.   However, I find that if I force the use of GE 
> with:
> 
> -mca btl tcp,self
> 
> things seem to run OK under SLES 11.
> 
> Could someone add some detail here on what, if anything, I can expect to
> work when we try to run old SLES 10.2 build OpenMPI 1.4.1 binaries under
> SLES 11.   I would have thought NOTHING, but maybe that is not quite right.

I do not have any experience with SLES, so I can't comment for sure.  But I'd 
*guess* that there was a symbol change between 10.2 and 11 in the OpenFabrics 
libraries such that the openib BTL is unable to find a symbol that it needs.  
Another possibility is the dependent libraries of libibverbs.so changed (e.g., 
perhaps libibverbs.so required -lsysfs in 10.2, but then libsysfs.so doesn't 
exist in 11...?).  Do the SLES release notes say anything about binary 
compatibility (particularly of the OpenFabrics libraries) between SLES 10.2 and 
11?
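
A minimal sketch of how the "dependent libraries changed" guess could be 
checked on a SLES 11 node, assuming Python 3 and ldd are available there.  The 
function names are hypothetical, and any library names you feed it are only 
illustrations of what ldd might print, not real output from either SLES 
release:

```python
import subprocess
import sys
from typing import List


def missing_libraries(ldd_output: str) -> List[str]:
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # ldd prints unresolved dependencies as "<name> => not found"
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip().startswith("not found"):
            missing.append(name)
    return missing


def check_module(path: str) -> List[str]:
    """Run ldd on a shared object and return its unresolved dependencies."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return missing_libraries(result.stdout)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: pass the full path to mca_btl_openib.so as the only argument
    for lib in check_module(sys.argv[1]):
        print("missing:", lib)
```

An empty result only means that every dependent library was found; a symbol 
inside one of those libraries could still have changed.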

I'm quite sure that recompiling all of OMPI should make it work -- I'd be very 
surprised if the OpenFabrics libraries in SLES 11 were inconsistent such that 
you couldn't just rebuild and have it work.

You may be able to recompile *just the openib BTL module* on SLES 11, drop it 
in your OMPI 1.4.1 installation, and have it work again.  But that's not a 
guarantee -- other things may have changed such that a recompile may change 
some struct sizes or somesuch.  

Probably your best bet would be:

- investigate if there's a missing symbol or library in the current 
mca_btl_openib.so (e.g., run ldd and nm on mca_btl_openib.so and ensure that 
all the libraries and symbols it needs are present in SLES 11)
    - if it's a missing library, see if you can supply a dummy library to make 
it work (that may involve a little trickery)
- recompile OMPI 1.4.1 under SLES 11
    - copy in the mca_btl_openib.so from that install to your old OMPI install
    - run some apps and see if it works
    - if it does, relax, have a beer^H^H^H^Hnon-caffeinated tea
- if it does not work, you may have to go the recompile-everything route
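
The missing-symbol check in the first step could be sketched roughly as 
follows, assuming Python 3 and GNU nm (with its -D, --undefined-only and 
--defined-only options).  The helper names are hypothetical.  Note that the 
module also pulls symbols from libc and from Open MPI's own libraries, so 
anything reported here is only unresolved relative to the libraries you 
actually list:

```python
import subprocess
from typing import Iterable, Set


def parse_nm(nm_output: str, wanted_types: str) -> Set[str]:
    """Collect symbol names from nm output whose type letter is in wanted_types."""
    names = set()
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # blank lines and "file:" headers
        sym_type, name = fields[-2], fields[-1]
        if len(sym_type) == 1 and sym_type in wanted_types:
            # Drop any symbol-version suffix such as "@@IBVERBS_1.1"
            names.add(name.split("@")[0])
    return names


def unresolved_symbols(module_nm: str, library_nm_outputs: Iterable[str]) -> Set[str]:
    """Symbols the module needs (type U) that none of the given libraries define."""
    needed = parse_nm(module_nm, "U")
    provided: Set[str] = set()
    for output in library_nm_outputs:
        provided |= parse_nm(output, "TWVBDRi")
    return needed - provided


def run_nm(path: str, selector: str) -> str:
    """Run `nm -D <selector> <path>`; selector is --undefined-only or --defined-only."""
    result = subprocess.run(["nm", "-D", selector, path],
                            capture_output=True, text=True)
    return result.stdout
```

For example, pass run_nm(module, "--undefined-only") as the first argument and 
a list of run_nm(lib, "--defined-only") results for the SLES 11 OpenFabrics 
libraries as the second.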

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

