Make sure that your LD_LIBRARY_PATH is being set in your shell startup files 
for *non-interactive logins*.

For example, ensure that LD_LIBRARY_PATH is set properly, even in this case:

-----
ssh some-other-node env | grep LD_LIBRARY_PATH
-----

(note that this is different from "ssh some-other-node echo $LD_LIBRARY_PATH", 
because "$LD_LIBRARY_PATH" will be evaluated by the local shell, before ssh is 
even invoked)
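
For instance (a minimal illustration; "some-other-node" is just a placeholder 
hostname), compare:

-----
# Expanded by your *local* shell before ssh runs -- shows the local value
ssh some-other-node echo $LD_LIBRARY_PATH

# Single quotes defer expansion to the remote shell -- shows the remote value
ssh some-other-node 'echo $LD_LIBRARY_PATH'
-----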

I mention this because some shell startup files distinguish between interactive 
and non-interactive logins; they sometimes terminate early for non-interactive 
logins.  Look for "exit" statements, or conditional blocks that are only 
invoked during interactive logins, for example.
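
As a sketch (assuming bash and a ~/.bashrc; adapt to your shell and site), the 
library path should be exported *above* any such interactive-only guard:

-----
# ~/.bashrc (sketch) -- paths are set for every shell, interactive or not
export PATH=/usr/local/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

# A common guard like this stops non-interactive shells right here; nothing
# below it (prompts, aliases, etc.) runs for "ssh some-other-node <command>"
[ -z "$PS1" ] && return
-----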



On Feb 14, 2012, at 5:40 AM, Richard Bardwell wrote:

> Jeff,
> 
> I wiped out all versions of openmpi on all the nodes including the distro 
> installed version.
> I reinstalled version 1.4.4 on all nodes.
> I now get the error that libopen-rte.so.0 cannot be found when running 
> mpiexec across
> different nodes, even though the LD_LIBRARY_PATH for all nodes points to 
> /usr/local/lib
> where the file exists. Any ideas ?
> 
> Many Thanks
> 
> Richard
> 
> ----- Original Message ----- From: "Jeff Squyres" <jsquy...@cisco.com>
> To: "Open MPI Users" <us...@open-mpi.org>
> Sent: Monday, February 13, 2012 6:28 PM
> Subject: Re: [OMPI users] MPI orte_init fails on remote nodes
> 
> 
>> You might want to fully uninstall the distro-installed version of Open MPI on 
>> all the nodes (e.g., Red Hat may have installed a different version of Open 
>> MPI, and that version is being found in your $PATH before your 
>> custom-installed version).
>> 
>> 
>> On Feb 13, 2012, at 12:12 PM, Richard Bardwell wrote:
>> 
>>> OK, 1.4.4 is happily installed on both machines. But, I now get a really
>>> weird error when running on the 2 nodes. I get
>>> Error: unknown option "--daemonize"
>>> even though I am just running with -np 2 -hostfile test.hst
>>> 
>>> The program runs fine on 2 cores if running locally on each node.
>>> 
>>> Any ideas ??
>>> 
>>> Thanks
>>> 
>>> Richard
>>> ----- Original Message ----- From: "Gustavo Correa" <g...@ldeo.columbia.edu>
>>> To: "Open MPI Users" <us...@open-mpi.org>
>>> Sent: Monday, February 13, 2012 4:22 PM
>>> Subject: Re: [OMPI users] MPI orte_init fails on remote nodes
>>> 
>>> 
>>>> On Feb 13, 2012, at 11:02 AM, Richard Bardwell wrote:
>>>>> Ralph
>>>>> I had done a make clean in the 1.2.8 directory if that is what you meant ?
>>>>> Or do I need to do something else ?
>>>>> I appreciate your help on this by the way ;-)
>>>> Hi Richard
>>>> You can install in a different directory, totally separate from 1.2.8.
>>>> Create a new work directory [which is not the final installation 
>>>> directory, just work, say /tmp/openmpi/1.4.4/work].
>>>> Launch the OpenMPI 1.4.4 configure script from this new work directory 
>>>> with the --prefix pointing to your desired installation directory [e.g. 
>>>> /home/richard/openmpi/1.4.4/].
>>>> I am assuming this is NFS mounted on the nodes [if you have a cluster].
>>>> [Check all options with 'configure --help'.]
>>>> Then do make, make install.
>>>> Finally set your PATH and LD_LIBRARY_PATH to point to the new installation 
>>>> directory,
>>>> to prevent mixing with the old 1.2.8.
>>>> I have a number of OpenMPI versions here, compiled with various compilers,
>>>> and they coexist well this way.
>>>> I hope this helps,
>>>> Gus Correa
>>>>> ----- Original Message -----
>>>>> From: Ralph Castain
>>>>> To: Open MPI Users
>>>>> Sent: Monday, February 13, 2012 3:41 PM
>>>>> Subject: Re: [OMPI users] MPI orte_init fails on remote nodes
>>>>> You need to clean out the old attempt - that is a stale file
>>>>> Sent from my iPad
>>>>> On Feb 13, 2012, at 7:36 AM, "Richard Bardwell" <rich...@sharc.co.uk> 
>>>>> wrote:
>>>>>> OK, I installed 1.4.4, rebuilt the exec and guess what ...... I now get 
>>>>>> some weird errors as below:
>>>>>> mca: base: component_find: unable to open 
>>>>>> /usr/local/lib/openmpi/mca_ras_dash_host
>>>>>> along with a few other files
>>>>>> even though the .so / .la files are all there !
>>>>>> ----- Original Message -----
>>>>>> From: Ralph Castain
>>>>>> To: Open MPI Users
>>>>>> Sent: Monday, February 13, 2012 2:59 PM
>>>>>> Subject: Re: [OMPI users] MPI orte_init fails on remote nodes
>>>>>> Good heavens - where did you find something that old? Can you use a more 
>>>>>> recent version?
>>>>>> Sent from my iPad
>>>>>> 
>>>>>>> Gentlemen
>>>>>>> I am struggling to get MPI working when the hostfile contains different 
>>>>>>> nodes.
>>>>>>> I get the error below. Any ideas ?? I can ssh without password between 
>>>>>>> the two
>>>>>>> nodes. I am running 1.2.8 MPI on both machines.
>>>>>>> Any help most appreciated !!!!!
>>>>>>> MPITEST/v8_mpi_test> mpiexec -n 2 --debug-daemons -hostfile test.hst 
>>>>>>> /home/sharc/MPITEST/v8_mpi_test/mpitest
>>>>>>> Daemon [0,0,1] checking in as pid 10490 on host 192.0.2.67
>>>>>>> [linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file 
>>>>>>> runtime/orte_init_stage1.c at line 182
>>>>>>> --------------------------------------------------------------------------
>>>>>>> It looks like orte_init failed for some reason; your parallel process is
>>>>>>> likely to abort. There are many reasons that a parallel process can
>>>>>>> fail during orte_init; some of which are due to configuration or
>>>>>>> environment problems. This failure appears to be an internal failure;
>>>>>>> here's some additional information (which may only be relevant to an
>>>>>>> Open MPI developer):
>>>>>>> orte_rml_base_select failed
>>>>>>> --> Returned value -13 instead of ORTE_SUCCESS
>>>>>>> --------------------------------------------------------------------------
>>>>>>> [linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file 
>>>>>>> runtime/orte_system_init.c at line 42
>>>>>>> [linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file 
>>>>>>> runtime/orte_init.c at line 52
>>>>>>> Open RTE was unable to initialize properly. The error occured while
>>>>>>> attempting to orte_init(). Returned value -13 instead of ORTE_SUCCESS.
>>>>>>> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received message from [0,0,0]
>>>>>>> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received kill_local_procs
>>>>>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file 
>>>>>>> base/pls_base_orted_cmds.c at line 275
>>>>>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file 
>>>>>>> pls_rsh_module.c at line 1158
>>>>>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c 
>>>>>>> at line 90
>>>>>>> [linux-tmpw:10489] ERROR: A daemon on node 192.0.2.68 failed to start 
>>>>>>> as expected.
>>>>>>> [linux-tmpw:10489] ERROR: There may be more information available from
>>>>>>> [linux-tmpw:10489] ERROR: the remote shell (see above).
>>>>>>> [linux-tmpw:10489] ERROR: The daemon exited unexpectedly with status 
>>>>>>> 243.
>>>>>>> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received message from [0,0,0]
>>>>>>> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received exit
>>>>>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file 
>>>>>>> base/pls_base_orted_cmds.c at line 188
>>>>>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file 
>>>>>>> pls_rsh_module.c at line 1190
>>>>>>> --------------------------------------------------------------------------
>>>>>>> mpiexec was unable to cleanly terminate the daemons for this job. 
>>>>>>> Returned value Timeout instead of ORTE_SUCCESS.
>>>>>>> --------------------------------------------------------------------------
>>>> 
>>> 
>> 
>> 
> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

