I just installed Open MPI on our cluster and whenever I try to execute
a process on more than one node, I get this error:
$ mpirun -hostfile $HOSTFILE -n 2 hello_c
orted: error while loading shared libraries: libimf.so: cannot open
shared object file: No such file or directory
... followed by a whole bunch of timeout errors that I'm assuming were
caused by the library error above.
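For reference, here's the check I've been running to see which of orted's
shared libraries the linker can't resolve (the orted path matches my
--prefix=/usr/local install; I run this over ssh on a compute node to see
what that node's linker resolves):

```shell
# Ask the dynamic linker which of orted's shared libraries are unresolved.
# The path below matches a --prefix=/usr/local install; adjust if yours
# differs. Guarded so the snippet is safe to run where orted isn't present.
ORTED=/usr/local/bin/orted
if [ -x "$ORTED" ]; then
    ldd "$ORTED" | grep 'not found' || echo "all of orted's libraries resolved"
else
    echo "orted not found at $ORTED"
fi
```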
The cluster has 16 nodes and is running Ubuntu 8.04 Server. The Open
MPI source was compiled with openib support using the Intel compilers:
$ ./configure --prefix=/usr/local --with-openib=/usr/local/lib \
    CC=icc CFLAGS=-m64 CXX=icpc CXXFLAGS=-m64 \
    F77=ifort FFLAGS=-m64 FC=ifort FCFLAGS=-m64
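In case it's relevant, this is how I've been double-checking that the build
actually picked up the Intel compilers (ompi_info ships with Open MPI; the
grep pattern is just illustrative):

```shell
# ompi_info reports the compilers Open MPI was configured with; icc/icpc/
# ifort should show up in these lines. Guarded so the snippet also runs
# where Open MPI isn't on the PATH.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info | grep -i 'compiler'
    STATUS=checked
else
    echo "ompi_info not on PATH"
    STATUS=skipped
fi
```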
I've installed the Intel compilers on the master node only, but under
/usr/local, which is accessible to all nodes via NFS. Similarly, I've
compiled and installed Open MPI only on the master node, but into the
NFS-shared /usr/local directory as well. Finally, I've compiled and
installed all of the OpenFabrics libraries on the master node only, but
into the NFS-shared /usr/local/lib directory.
I've run the iccvars.sh and ifortvars.sh scripts on each node to ensure
that the environment variables were set up for the Intel compilers on
each node. Additionally, I've modified the LD_LIBRARY_PATH variable on
each node to include /usr/local/lib and /usr/local/lib/openmpi so that
each node can see the InfiniBand and Open MPI libraries.
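Concretely, the lines I append on each node look roughly like this (the
iccvars.sh/ifortvars.sh locations vary by install, so treat those paths as
placeholders for wherever the compilers actually live):

```shell
# Environment setup on each node. The two 'source' lines are commented out
# here because the Intel script locations are install-specific placeholders;
# on the nodes they point at the NFS-shared compiler directories.
#   source /path/to/intel/cc/bin/iccvars.sh
#   source /path/to/intel/fc/bin/ifortvars.sh
export LD_LIBRARY_PATH=/usr/local/lib:/usr/local/lib/openmpi:${LD_LIBRARY_PATH:-}
echo "$LD_LIBRARY_PATH"
```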
If I execute Open MPI programs on the master node only, everything works fine:
$ mpirun -hostfile $HOSTFILE -n 1 hello_c
Hello, world, I am 0 of 1
Sorry for the long post and thanks for your help in advance!
-------------------------------------------
Chris Tanner
Space Systems Design Lab
Georgia Institute of Technology
christopher.tan...@gatech.edu
-------------------------------------------