Note that exporting LD_LIBRARY_PATH on the mpirun command line does not 
necessarily apply to launching the remote orteds (it applies to launching the 
remote MPI processes, which are children of the orteds).
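
A quick sanity check is to look at what a non-interactive ssh shell on a 
remote node actually sees (the node name below is just a placeholder):

    # What LD_LIBRARY_PATH does a non-interactive shell on that node get?
    ssh node01 'echo $LD_LIBRARY_PATH'
    # Can orted resolve the Intel runtime in that environment?
    ssh node01 'ldd /release/cfd/openmpi-intel/bin/orted | grep "not found"'

If the second command lists libimf.so, that node's startup files are not 
setting LD_LIBRARY_PATH for non-interactive shells.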

Since you're using ssh, you might want to check the shell startup scripts on 
the target nodes (e.g., .bashrc).  It's not sufficient that they merely avoid 
overwriting LD_LIBRARY_PATH -- make sure it is actually being set to the 
location of the Intel support libraries.
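
For example, each node's .bashrc would need something along these lines (the 
Intel directory shown is only an illustration -- point it at wherever 
libimf.so actually lives on your nodes):

    # Example only: the directory must contain libimf.so
    export LD_LIBRARY_PATH=/opt/intel/composerxe/lib/intel64:$LD_LIBRARY_PATH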

You might also want to check your .bashrc to be sure that you're not setting 
LD_LIBRARY_PATH (or PATH, or ...) after the point where it exits for 
non-interactive shells.  This is a common optimization trick in shell startup 
files: exit early when the script detects a non-interactive shell, and skip a 
bunch of work that is presumably only needed for interactive logins (e.g., 
creating shell aliases and the like).
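
In bash, that guard usually looks something like the sketch below; any 
LD_LIBRARY_PATH export placed after it never runs in the shell that ssh 
starts for orted:

    # Common early-exit guard near the top of .bashrc
    [ -z "$PS1" ] && return      # non-interactive shell: stop here

    # Anything below this point is skipped for non-interactive shells,
    # so an export down here does not help orted:
    export LD_LIBRARY_PATH=/opt/intel/composerxe/lib/intel64:$LD_LIBRARY_PATH

If that's what you find, move the export above the guard (or into a file that 
is always sourced).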

Random question: is there a reason you're not using Torque support?  When you 
use Torque support, Torque will automatically copy your current environment -- 
including LD_LIBRARY_PATH -- to the target node before launching the orted.  
Hence, it can actually make LD_LIBRARY_PATH problems like this easier to 
avoid.
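
If you want to try it, Open MPI has to be rebuilt with TM support; the 
configure invocation would look roughly like this (the Torque install prefix 
here is just a guess -- use yours):

    ./configure --with-tm=/opt/torque --prefix=/release/cfd/openmpi-intel ...
    make all install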



On Dec 14, 2012, at 3:17 PM, Blosch, Edwin L wrote:

> I am having a weird problem launching cases with OpenMPI 1.4.3.  It is most 
> likely a problem with a particular node of our cluster, as the jobs will run 
> fine on some submissions but not others.  It seems to depend on the node 
> list.  I'm just having trouble diagnosing which node it is, and what the 
> nature of its problem is.
>  
> One or perhaps more of the orted are indicating they cannot find an Intel 
> Math library.  The error is:
> /release/cfd/openmpi-intel/bin/orted: error while loading shared libraries: 
> libimf.so: cannot open shared object file: No such file or directory
>  
> I’ve checked the environment just before launching mpirun, and 
> LD_LIBRARY_PATH includes the necessary component to point to where the Intel 
> shared libraries are located.  Furthermore, my mpirun command line says to 
> export the LD_LIBRARY_PATH variable:
> Executing ['/release/cfd/openmpi-intel/bin/mpirun', '--machinefile 
> /var/spool/PBS/aux/20761.maruhpc4-mgt', '-np 160', '-x LD_LIBRARY_PATH', '-x 
> MPI_ENVIRONMENT=1', '/tmp/fv420761.maruhpc4-mgt/falconv4_openmpi_jsgl', '-v', 
> '-cycles', '10000', '-ri', 'restart.1', '-ro', 
> '/tmp/fv420761.maruhpc4-mgt/restart.1']
>  
> My shell-initialization script (.bashrc) does not overwrite LD_LIBRARY_PATH.  
> OpenMPI is built explicitly --without-torque and should be using ssh to 
> launch the orted.
>  
> What options can I add to get more debugging of problems launching orted?
>  
> Thanks,
>  
> Ed


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

