I am having a strange problem launching cases with OpenMPI 1.4.3.  It is most 
likely caused by a particular node in our cluster: the same job runs fine on 
some submissions but fails on others, and whether it fails seems to depend on 
the node list.  I'm just having trouble working out which node is at fault and 
what its problem is.

One or more of the orted daemons report that they cannot find the Intel math 
library (libimf.so).  The error is:
/release/cfd/openmpi-intel/bin/orted: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory
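Since the failure follows the node list, one way to narrow it down is to run ldd on orted on every node in the PBS machinefile over ssh. That roughly approximates the environment a non-interactive ssh session gives orted, though not exactly what mpirun sets up. A minimal sketch in Python 3.7+, assuming passwordless ssh and ldd on every node; the helper names are hypothetical:

```python
"""Report, per node, which shared libraries orted cannot resolve.

Assumptions: passwordless ssh to each node, ldd installed on each node.
The orted path is taken from the error message.
"""
import subprocess
import sys

ORTED = "/release/cfd/openmpi-intel/bin/orted"


def unique_hosts(machinefile_text):
    """Return hostnames in first-seen order (PBS lists a host once per slot)."""
    hosts = []
    for line in machinefile_text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        if fields[0] not in hosts:
            hosts.append(fields[0])
    return hosts


def missing_libs(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            missing.append(line.split("=>", 1)[0].strip())
    return missing


def check_host(host):
    """Run ldd on orted through a non-interactive ssh command on host."""
    try:
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", host, "ldd", ORTED],
            stdin=subprocess.DEVNULL,  # keep ssh from consuming our stdin
            capture_output=True, text=True, timeout=30,
        )
    except subprocess.TimeoutExpired:
        return ["<ssh timed out>"]
    if result.returncode != 0:
        return ["<ssh/ldd failed: %s>" % result.stderr.strip()]
    return missing_libs(result.stdout)


def main(machinefile):
    with open(machinefile) as f:
        hosts = unique_hosts(f.read())
    for host in hosts:
        problems = check_host(host)
        print("%-20s %s" % (host, ", ".join(problems) if problems else "OK"))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it with the job's machinefile as the only argument, e.g. /var/spool/PBS/aux/20761.maruhpc4-mgt while the job exists; any node that lists libimf.so is a candidate for the bad node.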

I've checked the environment just before launching mpirun, and LD_LIBRARY_PATH 
includes the directory containing the Intel shared libraries.  Furthermore, my 
mpirun command line exports LD_LIBRARY_PATH with -x:
Executing ['/release/cfd/openmpi-intel/bin/mpirun', '--machinefile /var/spool/PBS/aux/20761.maruhpc4-mgt', '-np 160', '-x LD_LIBRARY_PATH', '-x MPI_ENVIRONMENT=1', '/tmp/fv420761.maruhpc4-mgt/falconv4_openmpi_jsgl', '-v', '-cycles', '10000', '-ri', 'restart.1', '-ro', '/tmp/fv420761.maruhpc4-mgt/restart.1']
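The "Executing" line appears to be a Python argument list printed by a wrapper script; the wrapper itself isn't shown, so this is an assumption. If that list is passed directly to subprocess without a shell, elements such as '-np 160' and '-x LD_LIBRARY_PATH' reach mpirun as single arguments containing a space rather than as a flag followed by its value. A minimal sketch of building the list with every flag and value as its own element; build_mpirun_argv is a hypothetical helper, and the values are the ones from the log line:

```python
import shlex

MPIRUN = "/release/cfd/openmpi-intel/bin/mpirun"


def build_mpirun_argv(machinefile, nprocs, exported_vars, program, program_args):
    """Build an mpirun argv where each flag and its value are separate elements."""
    argv = [MPIRUN, "--machinefile", machinefile, "-np", str(nprocs)]
    for var in exported_vars:
        # "-x NAME" forwards NAME's current value; "-x NAME=VALUE" sets it.
        argv += ["-x", var]
    argv.append(program)
    argv.extend(program_args)
    return argv


if __name__ == "__main__":
    argv = build_mpirun_argv(
        "/var/spool/PBS/aux/20761.maruhpc4-mgt",
        160,
        ["LD_LIBRARY_PATH", "MPI_ENVIRONMENT=1"],
        "/tmp/fv420761.maruhpc4-mgt/falconv4_openmpi_jsgl",
        ["-v", "-cycles", "10000", "-ri", "restart.1",
         "-ro", "/tmp/fv420761.maruhpc4-mgt/restart.1"],
    )
    # Print a shell-quoted command for inspection; this does not launch anything.
    # To launch, the same list could be passed to subprocess.run(argv).
    print(" ".join(shlex.quote(a) for a in argv))
```

If the wrapper instead joins the list into one string and runs it through a shell, the shell splits '-np 160' correctly and the original form is harmless.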

My shell-initialization script (.bashrc) does not overwrite LD_LIBRARY_PATH.  
OpenMPI was built explicitly --without-torque, so it should be using ssh to 
launch orted.
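The .bashrc detail matters because orted is started by ssh on each remote node. As far as I understand the launch sequence, -x forwards variables to the application processes, but orted has to load libimf.so before it can receive them. It therefore depends on whatever environment a non-interactive ssh command gets on that node, and many .bashrc files return early for non-interactive shells. A sketch that prints the LD_LIBRARY_PATH each node hands to a non-interactive ssh command, assuming passwordless ssh; INTEL_LIB_DIR is a hypothetical placeholder, since the actual Intel library directory isn't given above:

```python
"""Show the LD_LIBRARY_PATH a non-interactive ssh command sees on each node.

Assumptions: passwordless ssh. INTEL_LIB_DIR is a placeholder; replace it
with the directory that actually holds libimf.so on this cluster.
"""
import os.path
import subprocess
import sys

INTEL_LIB_DIR = "/opt/intel/lib/intel64"  # hypothetical placeholder


def dir_in_path(path_value, directory):
    """True if directory is one of the colon-separated entries (trailing '/' ignored)."""
    target = os.path.normpath(directory)
    return any(os.path.normpath(entry) == target
               for entry in path_value.split(":") if entry)


def remote_ld_library_path(host):
    """Return the remote LD_LIBRARY_PATH ('' if unset), or None if ssh failed."""
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, "printenv", "LD_LIBRARY_PATH"],
        stdin=subprocess.DEVNULL, capture_output=True, text=True, timeout=30,
    )
    if result.returncode == 255:  # ssh's own error status (connection, auth, ...)
        return None
    # printenv exits 1 when the variable is unset; stdout is then empty.
    return result.stdout.strip()


if __name__ == "__main__" and len(sys.argv) > 1:
    for host in sys.argv[1:]:
        try:
            value = remote_ld_library_path(host)
        except subprocess.TimeoutExpired:
            print("%-20s ssh timed out" % host)
            continue
        if value is None:
            print("%-20s ssh failed" % host)
            continue
        status = "OK" if dir_in_path(value, INTEL_LIB_DIR) else "MISSING"
        print("%-20s %-8s %s" % (host, status, value or "<unset>"))
```

Pass the node names as arguments; a node reporting MISSING (or an LD_LIBRARY_PATH that differs from the others) would explain why only some node lists fail.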

What options can I add to get more debugging output about problems launching 
orted?

Thanks,

Ed
