Add -mca plm_base_verbose 5 --leave-session-attached to the mpirun command line. That will show the ssh command being used to start each orted.
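
Since the quoted launcher appears to build its mpirun command as a Python list, here is a minimal sketch of how the two options might be spliced in. The helper name add_launch_debug is hypothetical, the argument list is shortened from the one in the quoted message, and each option and its value are assumed to be passed as separate list elements. Treat it as an illustration, not as the launcher's actual code.

```python
import shlex

# Debug options taken from the reply above.
DEBUG_ARGS = ["-mca", "plm_base_verbose", "5", "--leave-session-attached"]


def add_launch_debug(cmd):
    """Return a copy of an mpirun argument list with the debug options
    inserted directly after the mpirun executable (hypothetical helper)."""
    return [cmd[0]] + DEBUG_ARGS + list(cmd[1:])


# Shortened version of the argument list from the quoted message.
# Assumption: options and their values are split into separate elements.
original = [
    "/release/cfd/openmpi-intel/bin/mpirun",
    "--machinefile", "/var/spool/PBS/aux/20761.maruhpc4-mgt",
    "-np", "160",
    "-x", "LD_LIBRARY_PATH",
    "-x", "MPI_ENVIRONMENT=1",
    "/tmp/fv420761.maruhpc4-mgt/falconv4_openmpi_jsgl",
]

debug_cmd = add_launch_debug(original)

# Print the command as it could be pasted into a shell; nothing is executed.
print(" ".join(shlex.quote(arg) for arg in debug_cmd))
```

The sketch only prints the resulting command line; the verbose launch output itself comes from running mpirun with those options.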

On Dec 14, 2012, at 12:17 PM, "Blosch, Edwin L" <edwin.l.blo...@lmco.com> wrote:

> I am having a weird problem launching cases with OpenMPI 1.4.3. It is most
> likely a problem with a particular node of our cluster, as the jobs will run
> fine on some submissions, but not other submissions. It seems to depend on
> the node list. I just am having trouble diagnosing which node, and what is
> the nature of the problem it has.
>
> One or perhaps more of the orted are indicating they cannot find an Intel
> Math library. The error is:
> /release/cfd/openmpi-intel/bin/orted: error while loading shared libraries:
> libimf.so: cannot open shared object file: No such file or directory
>
> I’ve checked the environment just before launching mpirun, and
> LD_LIBRARY_PATH includes the necessary component to point to where the Intel
> shared libraries are located. Furthermore, my mpirun command line says to
> export the LD_LIBRARY_PATH variable:
> Executing ['/release/cfd/openmpi-intel/bin/mpirun', '--machinefile
> /var/spool/PBS/aux/20761.maruhpc4-mgt', '-np 160', '-x LD_LIBRARY_PATH', '-x
> MPI_ENVIRONMENT=1', '/tmp/fv420761.maruhpc4-mgt/falconv4_openmpi_jsgl', '-v',
> '-cycles', '10000', '-ri', 'restart.1', '-ro',
> '/tmp/fv420761.maruhpc4-mgt/restart.1']
>
> My shell-initialization script (.bashrc) does not overwrite LD_LIBRARY_PATH.
> OpenMPI is built explicitly --without-torque and should be using ssh to
> launch the orted.
>
> What options can I add to get more debugging of problems launching orted?
>
> Thanks,
>
> Ed
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users