I just installed Open MPI on our cluster and whenever I try to execute a process on more than one node, I get this error:

$ mpirun -hostfile $HOSTFILE -n 1 hello_c
orted: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory

...followed by a whole bunch of timeout errors that I'm assuming were caused by the library error above.
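(For reference, unresolved dependencies can be listed with ldd; this is just a diagnostic sketch, and the orted path below assumes the --prefix=/usr/local install described further down:)

```shell
# check_deps: print any shared libraries that the runtime loader
# cannot find for the given binary (no output means all resolved).
check_deps() {
    ldd "$1" | grep "not found"
}

# On a compute node; /usr/local/bin/orted matches --prefix=/usr/local.
# check_deps /usr/local/bin/orted
```

Running this on a compute node should show libimf.so among the unresolved entries if the Intel runtime libraries aren't on the loader path there.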

The cluster has 16 nodes and is running Ubuntu 8.04 Server. The Open MPI source was compiled with openib support using the Intel compilers:

$ ./configure --prefix=/usr/local --with-openib=/usr/local/lib CC=icc CFLAGS=-m64 \
    CXX=icpc CXXFLAGS=-m64 F77=ifort FFLAGS=-m64 FC=ifort FCFLAGS=-m64

I've installed the Intel compilers on the master node only, but in the /usr/local directory, which is accessible to all nodes via NFS. Similarly, I've compiled and installed Open MPI only on the master node, but into the NFS-shared /usr/local directory as well. Finally, I've compiled and installed all of the OpenFabrics libraries on the master node only, but into the NFS-shared /usr/local/lib directory.

I've run the iccvars.sh and ifortvars.sh scripts on each node to ensure that the environment variables were set up for the Intel compilers on each node. Additionally, I've modified the LD_LIBRARY_PATH variable on each node to include /usr/local/lib and /usr/local/lib/openmpi so that each node can see the InfiniBand and Open MPI libraries.
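Concretely, the per-node setup amounts to something like the following in each node's shell startup file (the Intel script paths here are placeholders, shown commented out; adjust them to your actual install):

```shell
# Source the Intel compiler environment scripts (paths are assumptions;
# substitute wherever iccvars.sh / ifortvars.sh actually live):
# . /usr/local/intel/cc/bin/iccvars.sh
# . /usr/local/intel/fc/bin/ifortvars.sh

# Make the NFS-shared InfiniBand and Open MPI libraries visible
# to the runtime loader on this node.
export LD_LIBRARY_PATH=/usr/local/lib:/usr/local/lib/openmpi${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
```

One thing I'm not sure about: orted is launched over a non-interactive ssh session, which may not read the same startup files as an interactive login, so these exports might not reach the remote orted even if they're visible at an interactive prompt.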

If I only execute Open MPI on the master node, it works fine:
$ mpirun -hostfile $HOSTFILE -n 1 hello_c
Hello, world, I am 0 of 1

Sorry for the long post and thanks for your help in advance!

-------------------------------------------
Chris Tanner
Space Systems Design Lab
Georgia Institute of Technology
christopher.tan...@gatech.edu
-------------------------------------------


