On 10/22/2010 07:36 AM, Scott Atchley wrote:
> Ray,
>
> Looking back at your original message, you say that it works if you use the
> Myricom supplied mpirun from the Myrinet roll. I wonder if this is a mismatch
> between libraries on the compute nodes.
>
> What do you get if you use your OMPI's mpirun with:
>
> $ mpirun -n 1 -H <remote_host> ldd $PWD/<your_binary>
>
> I am wondering if ldd find the libraries from your compile or the Myrinet
> roll.
>
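A rough sketch along the lines of the ldd check suggested above: when the ldd output is long, it can help to filter it down to the libraries the dynamic linker could not resolve. The helper name `missing_libs` is made up for illustration and is not part of Open MPI or MX; it only assumes that ldd marks unresolved libraries with the text "not found".

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI or MX): read ldd output on
# stdin and print only the entries the dynamic linker could not resolve.
# ldd reports those as "<library> => not found".
missing_libs() {
    # grep exits non-zero when nothing matches; "|| true" keeps an
    # all-resolved result from looking like a failure to the caller.
    grep 'not found' || true
}

# Example usage, with the same placeholders as the command quoted above:
#   mpirun -n 1 -H <remote_host> ldd $PWD/<your_binary> | missing_libs
```

No output would mean every library was resolved on the remote host. The remaining question is then whether the resolved paths point at the intended install, which still needs a look at the full ldd listing.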
OK, a bit of a hiatus trying to get this resolved. Had to tend to other fires...

I do think I had an issue of mixed environments. It is a Rocks 5.3 test cluster and it had an old version of OpenMPI installed as part of the Rocks 5.3 HPC roll. I have now removed the HPC roll. All nodes were rebuilt. In the previous setup, we could actually run OpenMPI jobs over MX.

With all other spurious versions of OpenMPI (and MPICH for that matter) removed, I have rebuilt and re-installed, from a fresh source tree, OpenMPI 1.4.3. It was built with PGI 10.8 compilers.

Now, we cannot run with MX at all. The install was built with MX.

$ ompi_info | grep mx
                 MCA btl: mx (MCA v2.0, API v2.0, Component v1.4.3)
                 MCA mtl: mx (MCA v2.0, API v2.0, Component v1.4.3)

I can run with TCP, but now I get

[compute-0-1.local:24863] mca: base: component_find: unable to open /share/apps/opt/OpenMPI/1.4.3/PGI/10.8/lib/openmpi/mca_mtl_mx: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)

$ ls -l /share/apps/opt/OpenMPI/1.4.3/PGI/10.8/lib/openmpi/mca_mtl_mx*
-rwxr-xr-x 1 muno muno  1070 Oct 28 12:49 /share/apps/opt/OpenMPI/1.4.3/PGI/10.8/lib/openmpi/mca_mtl_mx.la
-rwxr-xr-x 1 muno muno 32044 Oct 28 12:49 /share/apps/opt/OpenMPI/1.4.3/PGI/10.8/lib/openmpi/mca_mtl_mx.so

mpirun -v -nolocal -np 96 --x MX_RCACHE=2 -hostfile machines --mca mtl mx --mca pml cm cpi.pgi

[compute-0-3.local:21116] mca: base: component_find: unable to open /share/apps/opt/OpenMPI/1.4.3/PGI/10.8/lib/openmpi/mca_mtl_mx: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[compute-0-3.local:21115] mca: base: component_find: unable to open /share/apps/opt/OpenMPI/1.4.3/PGI/10.8/lib/openmpi/mca_mtl_mx: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened.
This means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.

Host:      compute-0-3.local
Framework: mtl
Component: mx
--------------------------------------------------------------------------
[compute-0-3.local:21116] mca: base: components_open: component pml / cm open function failed
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened.
This means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.

Host:      compute-0-3.local
Framework: mtl
Component: mx
--------------------------------------------------------------------------
[compute-0-3.local:21115] mca: base: components_open: component pml / cm open function failed
[compute-0-3.local:21117] mca: base: component_find: unable to open /share/apps/opt/OpenMPI/1.4.3/PGI/10.8/lib/openmpi/mca_mtl_mx: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
--------------------------------------------------------------------------

--
Ray Muno
University of Minnesota