Make sure that your LD_LIBRARY_PATH is set in your shell startup files for *non-interactive logins*. For example, ensure that LD_LIBRARY_PATH is set properly even in this case:

-----
ssh some-other-node env | grep LD_LIBRARY_PATH
-----

(Note that this is different from "ssh some-other-node echo $LD_LIBRARY_PATH", because "$LD_LIBRARY_PATH" will be evaluated on the local node, before ssh is even invoked.)

I mention this because some shell startup files distinguish between interactive and non-interactive logins, and sometimes terminate early for non-interactive logins. Look for "exit" statements, or for conditional blocks that are only executed during interactive logins.
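As a concrete illustration, here is a minimal sketch of the kind of startup file to look for (assuming bash and a ~/.bashrc; the /usr/local/lib path matches the installation discussed below, everything else is illustrative):

-----
# ~/.bashrc (sketch)

# Exports placed *above* the guard are seen by non-interactive logins
# too (e.g., "ssh some-other-node env"):
export PATH=/usr/local/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

# A guard like this stops non-interactive shells here:
case $- in
    *i*) ;;        # interactive: keep reading the rest of the file
    *) return ;;   # non-interactive: stop sourcing here
esac

# Anything below this line, including any LD_LIBRARY_PATH setting,
# is never executed for non-interactive logins such as "ssh node command",
# which is exactly the failure mode described above.
-----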
On Feb 14, 2012, at 5:40 AM, Richard Bardwell wrote:

> Jeff,
>
> I wiped out all versions of Open MPI on all the nodes, including the
> distro-installed version. I reinstalled version 1.4.4 on all nodes.
> I now get the error that libopen-rte.so.0 cannot be found when running
> mpiexec across different nodes, even though the LD_LIBRARY_PATH on all
> nodes points to /usr/local/lib, where the file exists. Any ideas?
>
> Many thanks
>
> Richard
>
> ----- Original Message -----
> From: "Jeff Squyres" <jsquy...@cisco.com>
> To: "Open MPI Users" <us...@open-mpi.org>
> Sent: Monday, February 13, 2012 6:28 PM
> Subject: Re: [OMPI users] MPI orte_init fails on remote nodes
>
>> You might want to fully uninstall the distro-installed version of Open MPI
>> on all the nodes (e.g., Red Hat may have installed a different version of
>> Open MPI, and that version is being found in your $PATH before your
>> custom-installed version).
>>
>> On Feb 13, 2012, at 12:12 PM, Richard Bardwell wrote:
>>
>>> OK, 1.4.4 is happily installed on both machines. But I now get a really
>>> weird error when running on the 2 nodes. I get
>>>
>>>    Error: unknown option "--daemonize"
>>>
>>> even though I am just running with -np 2 -hostfile test.hst
>>>
>>> The program runs fine on 2 cores if running locally on each node.
>>>
>>> Any ideas??
>>>
>>> Thanks
>>>
>>> Richard
>>>
>>> ----- Original Message -----
>>> From: "Gustavo Correa" <g...@ldeo.columbia.edu>
>>> To: "Open MPI Users" <us...@open-mpi.org>
>>> Sent: Monday, February 13, 2012 4:22 PM
>>> Subject: Re: [OMPI users] MPI orte_init fails on remote nodes
>>>
>>>> On Feb 13, 2012, at 11:02 AM, Richard Bardwell wrote:
>>>>
>>>>> Ralph
>>>>> I had done a make clean in the 1.2.8 directory, if that is what you
>>>>> meant? Or do I need to do something else?
>>>>> I appreciate your help on this, by the way ;-)
>>>>
>>>> Hi Richard
>>>>
>>>> You can install in a different directory, totally separate from 1.2.8.
>>>> Create a new work directory [which is not the final installation
>>>> directory, just a work area, say /tmp/openmpi/1.4.4/work].
>>>> Launch the Open MPI 1.4.4 configure script from this new work directory
>>>> with --prefix pointing to your desired installation directory [e.g.
>>>> /home/richard/openmpi/1.4.4]. I am assuming this is NFS-mounted on the
>>>> nodes [if you have a cluster]. [Check all options with 'configure --help'.]
>>>> Then do make, make install.
>>>> Finally, set your PATH and LD_LIBRARY_PATH to point to the new
>>>> installation directory, to prevent mixing with the old 1.2.8.
>>>> I have a number of Open MPI versions here, compiled with various
>>>> compilers, and they coexist well this way.
>>>>
>>>> I hope this helps,
>>>> Gus Correa
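Gus's recipe, condensed into a shell sketch (a sketch only: the path to the unpacked 1.4.4 source tree is a placeholder, and the work and installation directories follow his example paths):

-----
# Separate work directory (not the source tree, not the install dir)
mkdir -p /tmp/openmpi/1.4.4/work
cd /tmp/openmpi/1.4.4/work

# Run configure from the work directory; --prefix is the final,
# ideally NFS-mounted, installation directory
/path/to/openmpi-1.4.4/configure --prefix=/home/richard/openmpi/1.4.4
make
make install

# Point the environment at the new installation so it is found
# ahead of the old 1.2.8
export PATH=/home/richard/openmpi/1.4.4/bin:$PATH
export LD_LIBRARY_PATH=/home/richard/openmpi/1.4.4/lib:$LD_LIBRARY_PATH
-----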
>>>>>
>>>>> ----- Original Message -----
>>>>> From: Ralph Castain
>>>>> To: Open MPI Users
>>>>> Sent: Monday, February 13, 2012 3:41 PM
>>>>> Subject: Re: [OMPI users] MPI orte_init fails on remote nodes
>>>>>
>>>>> You need to clean out the old attempt - that is a stale file.
>>>>>
>>>>> Sent from my iPad
>>>>>
>>>>> On Feb 13, 2012, at 7:36 AM, "Richard Bardwell" <rich...@sharc.co.uk> wrote:
>>>>>
>>>>>> OK, I installed 1.4.4, rebuilt the exec, and guess what ... I now get
>>>>>> some weird errors as below:
>>>>>>
>>>>>>    mca: base: component_find: unable to open
>>>>>>    /usr/local/lib/openmpi/mca_ras_dash_host
>>>>>>
>>>>>> along with a few other files, even though the .so / .la files are all
>>>>>> there!
>>>>>>
>>>>>> ----- Original Message -----
>>>>>> From: Ralph Castain
>>>>>> To: Open MPI Users
>>>>>> Sent: Monday, February 13, 2012 2:59 PM
>>>>>> Subject: Re: [OMPI users] MPI orte_init fails on remote nodes
>>>>>>
>>>>>> Good heavens - where did you find something that old? Can you use a
>>>>>> more recent version?
>>>>>>
>>>>>> Sent from my iPad
>>>>>>
>>>>>>> Gentlemen
>>>>>>>
>>>>>>> I am struggling to get MPI working when the hostfile contains
>>>>>>> different nodes. I get the error below. Any ideas?? I can ssh without
>>>>>>> a password between the two nodes. I am running Open MPI 1.2.8 on both
>>>>>>> machines. Any help most appreciated!
>>>>>>>
>>>>>>> MPITEST/v8_mpi_test> mpiexec -n 2 --debug-daemons -hostfile test.hst /home/sharc/MPITEST/v8_mpi_test/mpitest
>>>>>>> Daemon [0,0,1] checking in as pid 10490 on host 192.0.2.67
>>>>>>> [linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init_stage1.c at line 182
>>>>>>> --------------------------------------------------------------------------
>>>>>>> It looks like orte_init failed for some reason; your parallel process is
>>>>>>> likely to abort. There are many reasons that a parallel process can
>>>>>>> fail during orte_init; some of which are due to configuration or
>>>>>>> environment problems. This failure appears to be an internal failure;
>>>>>>> here's some additional information (which may only be relevant to an
>>>>>>> Open MPI developer):
>>>>>>>
>>>>>>>   orte_rml_base_select failed
>>>>>>>   --> Returned value -13 instead of ORTE_SUCCESS
>>>>>>> --------------------------------------------------------------------------
>>>>>>> [linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_system_init.c at line 42
>>>>>>> [linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 52
>>>>>>> Open RTE was unable to initialize properly. The error occured while
>>>>>>> attempting to orte_init(). Returned value -13 instead of ORTE_SUCCESS.
>>>>>>> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received message from [0,0,0]
>>>>>>> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received kill_local_procs
>>>>>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
>>>>>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1158
>>>>>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
>>>>>>> [linux-tmpw:10489] ERROR: A daemon on node 192.0.2.68 failed to start as expected.
>>>>>>> [linux-tmpw:10489] ERROR: There may be more information available from
>>>>>>> [linux-tmpw:10489] ERROR: the remote shell (see above).
>>>>>>> [linux-tmpw:10489] ERROR: The daemon exited unexpectedly with status 243.
>>>>>>> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received message from [0,0,0]
>>>>>>> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received exit
>>>>>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
>>>>>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1190
>>>>>>> --------------------------------------------------------------------------
>>>>>>> mpiexec was unable to cleanly terminate the daemons for this job.
>>>>>>> Returned value Timeout instead of ORTE_SUCCESS.
>>>>>>> --------------------------------------------------------------------------
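A closing note on the recurring symptoms in this thread: an orted that rejects "--daemonize", and components that fail to open, typically mean that two different Open MPI installations are being mixed across (or within) nodes. A quick consistency check might look like the sketch below (the node name and the orted path are illustrative; the /usr/local prefix matches this thread):

-----
# Local node: which binaries are found first, and what version are they?
which mpiexec orted
ompi_info | grep "Open MPI:"

# Each remote node, via a non-interactive login (quote the pipeline so
# it runs on the remote side):
ssh some-other-node 'which mpiexec orted'
ssh some-other-node 'ompi_info | grep "Open MPI:"'

# Check that the runtime library resolves on the remote node, too:
ssh some-other-node 'ldd /usr/local/bin/orted | grep libopen-rte'
-----

Every node should report the same paths and the same version; a node that reports a different version, or "not found" for libopen-rte.so.0, is the one still picking up a stale or distro-installed copy.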
--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/