On Feb 13, 2012, at 1:28 PM, Jeff Squyres wrote:

> You might want to fully uninstall the distro-installed version of Open MPI on 
> all the nodes (e.g., Red Hat may have installed a different version of Open 
> MPI, and that version is being found in your $PATH before your 
> custom-installed version).
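
To check for a distro-installed Open MPI on Red Hat type systems, something
along these lines should work [the exact package names are a guess and vary
by distro]:

  rpm -qa | grep -i openmpi    # list any vendor-installed Open MPI packages
  yum remove openmpi           # remove them, on all the nodes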

Besides Jeff's suggestion, also prepend [rather than append] the new OpenMPI
to your PATH and LD_LIBRARY_PATH [bash style here]:

export PATH=/my/openmpi/1.4.4/bin:$PATH
export LD_LIBRARY_PATH=/my/openmpi/1.4.4/lib:$LD_LIBRARY_PATH

Multiple installed flavors, releases, and versions of MPI are often a source of 
confusion.
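
A quick way to check which installation actually gets picked up, both locally
and on the remote node [the "--daemonize" error below usually means that
mpirun and the remote orted come from different Open MPI versions; 'othernode'
is just a placeholder for your second node]:

  which mpirun               # should be /my/openmpi/1.4.4/bin/mpirun
  mpirun --version           # should report 1.4.4
  ssh othernode which orted  # the remote orted must come from 1.4.4 as well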

I hope this helps,
Gus Correa

> 
> 
> On Feb 13, 2012, at 12:12 PM, Richard Bardwell wrote:
> 
>> OK, 1.4.4 is happily installed on both machines. But I now get a really
>> weird error when running on the 2 nodes. I get
>> Error: unknown option "--daemonize"
>> even though I am just running with -np 2 -hostfile test.hst
>> 
>> The program runs fine on 2 cores if running locally on each node.
>> 
>> Any ideas ??
>> 
>> Thanks
>> 
>> Richard
>> ----- Original Message -----
>> From: "Gustavo Correa" <g...@ldeo.columbia.edu>
>> To: "Open MPI Users" <us...@open-mpi.org>
>> Sent: Monday, February 13, 2012 4:22 PM
>> Subject: Re: [OMPI users] MPI orte_init fails on remote nodes
>> 
>> 
>>> On Feb 13, 2012, at 11:02 AM, Richard Bardwell wrote:
>>>> Ralph
>>>> I had done a make clean in the 1.2.8 directory if that is what you meant ?
>>>> Or do I need to do something else ?
>>>> I appreciate your help on this by the way ;-)
>>> Hi Richard
>>> You can install in a different directory, totally separate from 1.2.8.
>>> Create a new work directory [which is not the final installation directory, 
>>> just a work area, say /tmp/openmpi/1.4.4/work].
>>> Launch the OpenMPI 1.4.4 configure script from this new work directory with 
>>> the --prefix pointing to your desired installation directory [e.g. 
>>> /home/richard/openmpi/1.4.4/].
>>> I am assuming this is NFS mounted on the nodes [if you have a cluster].
>>> [Check all options with 'configure --help'.]
>>> Then do make, make install.
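>>> For example [all directory names here are just placeholders]:
>>>
>>>   mkdir -p /tmp/openmpi/1.4.4/work
>>>   cd /tmp/openmpi/1.4.4/work
>>>   /path/to/openmpi-1.4.4-source/configure --prefix=/home/richard/openmpi/1.4.4
>>>   make
>>>   make install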
>>> Finally set your PATH and LD_LIBRARY_PATH to point to the new installation 
>>> directory,
>>> to prevent mixing with the old 1.2.8.
>>> I have a number of OpenMPI versions here, compiled with various compilers,
>>> and they coexist well this way.
>>> I hope this helps,
>>> Gus Correa
>>>> ----- Original Message -----
>>>> From: Ralph Castain
>>>> To: Open MPI Users
>>>> Sent: Monday, February 13, 2012 3:41 PM
>>>> Subject: Re: [OMPI users] MPI orte_init fails on remote nodes
>>>> You need to clean out the old attempt - that is a stale file
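>>>> [One way to do that, assuming the stale files are the leftover 1.2.8
>>>> components under /usr/local/lib/openmpi:
>>>>   rm -rf /usr/local/lib/openmpi   # stale mca_* plugins from the old install
>>>> and then re-run 'make install' for 1.4.4.]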
>>>> Sent from my iPad
>>>> On Feb 13, 2012, at 7:36 AM, "Richard Bardwell" <rich...@sharc.co.uk> 
>>>> wrote:
>>>>> OK, I installed 1.4.4, rebuilt the exec and guess what ...... I now get 
>>>>> some weird errors as below:
>>>>> mca: base: component_find: unable to open 
>>>>> /usr/local/lib/openmpi/mca_ras_dash_host
>>>>> along with a few other files
>>>>> even though the .so / .la files are all there !
>>>>> ----- Original Message -----
>>>>> From: Ralph Castain
>>>>> To: Open MPI Users
>>>>> Sent: Monday, February 13, 2012 2:59 PM
>>>>> Subject: Re: [OMPI users] MPI orte_init fails on remote nodes
>>>>> Good heavens - where did you find something that old? Can you use a more 
>>>>> recent version?
>>>>> Sent from my iPad
>>>>> 
>>>>>> Gentlemen
>>>>>> I am struggling to get MPI working when the hostfile contains different 
>>>>>> nodes.
>>>>>> I get the error below. Any ideas ?? I can ssh without password between 
>>>>>> the two
>>>>>> nodes. I am running 1.2.8 MPI on both machines.
>>>>>> Any help most appreciated !!!!!
>>>>>> MPITEST/v8_mpi_test> mpiexec -n 2 --debug-daemons -hostfile test.hst 
>>>>>> /home/sharc/MPITEST/v8_mpi_test/mpitest
>>>>>> Daemon [0,0,1] checking in as pid 10490 on host 192.0.2.67
>>>>>> [linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file 
>>>>>> runtime/orte_init_stage1.c at line 182
>>>>>> --------------------------------------------------------------------------
>>>>>> It looks like orte_init failed for some reason; your parallel process is
>>>>>> likely to abort. There are many reasons that a parallel process can
>>>>>> fail during orte_init; some of which are due to configuration or
>>>>>> environment problems. This failure appears to be an internal failure;
>>>>>> here's some additional information (which may only be relevant to an
>>>>>> Open MPI developer):
>>>>>> orte_rml_base_select failed
>>>>>> --> Returned value -13 instead of ORTE_SUCCESS
>>>>>> --------------------------------------------------------------------------
>>>>>> [linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file 
>>>>>> runtime/orte_system_init.c at line 42
>>>>>> [linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file 
>>>>>> runtime/orte_init.c at line 52
>>>>>> Open RTE was unable to initialize properly. The error occured while
>>>>>> attempting to orte_init(). Returned value -13 instead of ORTE_SUCCESS.
>>>>>> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received message from [0,0,0]
>>>>>> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received kill_local_procs
>>>>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file 
>>>>>> base/pls_base_orted_cmds.c at line 275
>>>>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file 
>>>>>> pls_rsh_module.c at line 1158
>>>>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c 
>>>>>> at line 90
>>>>>> [linux-tmpw:10489] ERROR: A daemon on node 192.0.2.68 failed to start as 
>>>>>> expected.
>>>>>> [linux-tmpw:10489] ERROR: There may be more information available from
>>>>>> [linux-tmpw:10489] ERROR: the remote shell (see above).
>>>>>> [linux-tmpw:10489] ERROR: The daemon exited unexpectedly with status 243.
>>>>>> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received message from [0,0,0]
>>>>>> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received exit
>>>>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file 
>>>>>> base/pls_base_orted_cmds.c at line 188
>>>>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file 
>>>>>> pls_rsh_module.c at line 1190
>>>>>> --------------------------------------------------------------------------
>>>>>> mpiexec was unable to cleanly terminate the daemons for this job. 
>>>>>> Returned value Timeout instead of ORTE_SUCCESS.
>>>>>> --------------------------------------------------------------------------
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 