On Oct 28, 2010, at 2:18 PM, Ray Muno wrote:

> On 10/22/2010 07:36 AM, Scott Atchley wrote:
>> Ray,
>>
>> Looking back at your original message, you say that it works if you use the
>> Myricom supplied mpirun from the Myrinet roll. I wonder if this is a
>> mismatch between libraries on the compute nodes.
>>
>> What do you get if you use your OMPI's mpirun with:
>>
>> $ mpirun -n 1 -H <remote_host> ldd $PWD/<your_binary>
>>
>> I am wondering if ldd finds the libraries from your compile or the Myrinet
>> roll.
>>
>
> OK, a bit of a hiatus trying to get this resolved. Had to tend other
> fires...
>
> I do think I had an issue of mixed environments. It is a Rocks 5.3
> test cluster and it had an old version of OpenMPI installed as part of
> the Rocks 5.3 HPC roll. I have now removed the HPC roll. All nodes were
> rebuilt.
>
> In the previous setup, we could actually run OpenMPI jobs over MX.
>
> With all other spurious versions of OpenMPI (and MPICH for that matter)
> removed, I have rebuilt and re-installed, from a fresh source tree,
> OpenMPI 1.4.3. It was built with PGI 10.8 compilers.
>
> Now, we cannot run with MX at all.
>
> The install was built with MX.
>
> $ ompi_info | grep mx
> MCA btl: mx (MCA v2.0, API v2.0, Component v1.4.3)
> MCA mtl: mx (MCA v2.0, API v2.0, Component v1.4.3)
>
> I can run with TCP, but now I get
>
> [compute-0-1.local:24863] mca: base: component_find: unable to open
> /share/apps/opt/OpenMPI/1.4.3/PGI/10.8/lib/openmpi/mca_mtl_mx: perhaps a
> missing symbol, or compiled for a different version of Open MPI? (ignored)
>
> $ ls -l /share/apps/opt/OpenMPI/1.4.3/PGI/10.8/lib/openmpi/mca_mtl_mx*
> -rwxr-xr-x 1 muno muno 1070 Oct 28 12:49
> /share/apps/opt/OpenMPI/1.4.3/PGI/10.8/lib/openmpi/mca_mtl_mx.la
> -rwxr-xr-x 1 muno muno 32044 Oct 28 12:49
> /share/apps/opt/OpenMPI/1.4.3/PGI/10.8/lib/openmpi/mca_mtl_mx.so
>
> mpirun -v -nolocal -np 96 --x MX_RCACHE=2 -hostfile machines --mca mtl
> mx --mca pml cm cpi.pgi

Does your environment have LD_LIBRARY_PATH set to point to $OMPI/lib and
$MX/lib? Does it get set on login? Is $OMPI/bin in your PATH?

Scott
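
P.S. In case it helps, here is a rough sketch of one way to check the three things asked above. It is only an illustration: path_contains and check_env are made-up helper names, and the two arguments stand for whatever $OMPI and $MX are on your cluster (I am not assuming any particular MX install path).

```shell
#!/bin/sh
# Hypothetical helpers, not part of Open MPI or MX.

# Return 0 if the colon-separated list $1 contains the directory $2 exactly.
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# check_env OMPI_PREFIX MX_PREFIX
# Reports whether OMPI_PREFIX/lib and MX_PREFIX/lib are in LD_LIBRARY_PATH
# and whether OMPI_PREFIX/bin is in PATH.
check_env() {
    ompi_prefix=$1
    mx_prefix=$2
    for dir in "$ompi_prefix/lib" "$mx_prefix/lib"; do
        if path_contains "${LD_LIBRARY_PATH:-}" "$dir"; then
            echo "ok: $dir is in LD_LIBRARY_PATH"
        else
            echo "MISSING: $dir is not in LD_LIBRARY_PATH"
        fi
    done
    if path_contains "${PATH:-}" "$ompi_prefix/bin"; then
        echo "ok: $ompi_prefix/bin is in PATH"
    else
        echo "MISSING: $ompi_prefix/bin is not in PATH"
    fi
}
```

After sourcing the file, call it as check_env "$OMPI" "$MX". Running it in a non-interactive shell on a compute node, and not only on the head node, is the more telling test, since that is closer to the environment the launched processes actually see. Running ldd on mca_mtl_mx.so on a compute node and looking for "not found" lines should show the same problem from the loader's side.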