Hi,

I have tried the following to no avail.

On 499 machines running Open MPI 1.2.7:

mpirun -np 499 -bynode -hostfile nodelist /main/mpiHelloWorld ...

with different combinations of the following parameters:

-mca btl_base_verbose 1 -mca btl_base_debug 2 -mca oob_base_verbose 1 -mca
oob_tcp_debug 1 -mca oob_tcp_listen_mode listen_thread -mca
btl_tcp_endpoint_cache 65536 -mca oob_tcp_peer_retries 120
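For what it's worth, rather than repeating these on every command line, the same settings can be kept in a per-user MCA parameter file; a sketch, assuming the standard location that Open MPI reads by default:

```
# $HOME/.openmpi/mca-params.conf
oob_tcp_listen_mode = listen_thread
oob_tcp_peer_retries = 120
btl_tcp_endpoint_cache = 65536
```

Values given with -mca on the command line still override anything set in this file.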

I still get the "No route to host" error messages.

I also tried -mca pls_rsh_num_concurrent 499 --debug-daemons, which did not
produce any useful debug output beyond the same error messages.

I did notice one strange thing, though. The following is always successful
(at least in all my attempts):

mpirun -np 100 -bynode -hostfile nodelist /main/mpiHelloWorld

but

mpirun -np 100 -bynode -hostfile nodelist /main/mpiHelloWorld
--debug-daemons

prints these error messages at the end from each of the nodes:

[idx2:04064] [0,0,1] orted_recv_pls: received message from [0,0,0]
[idx2:04064] [0,0,1] orted_recv_pls: received exit
[idx2:04064] *** Process received signal ***
[idx2:04064] Signal: Segmentation fault (11)
[idx2:04064] Signal code:  (128)
[idx2:04064] Failing at address: (nil)
[idx2:04064] [ 0] /lib/libpthread.so.0 [0x2b92cc729f30]
[idx2:04064] [ 1] /usr/lib64/libopen-rte.so.0(orte_pls_base_close+0x18)
[0x2b92cc0202a2]
[idx2:04064] [ 2] /usr/lib64/libopen-rte.so.0(orte_system_finalize+0x70)
[0x2b92cc00b5ac]
[idx2:04064] [ 3] /usr/lib64/libopen-rte.so.0(orte_finalize+0x20)
[0x2b92cc00875c]
[idx2:04064] [ 4] /usr/bin/orted(main+0x8a6) [0x4024ae]
[idx2:04064] *** End of error message ***


I am not sure whether this points to the actual cause of these issues. Could
it have to do with Open MPI 1.2.7 having POSIX support enabled in the current
configuration on these nodes?

Thanks again for your continued help.

Regards,

Prasanna.  

> Message: 2
> Date: Thu, 11 Sep 2008 12:16:50 -0400
> From: Jeff Squyres <jsquy...@cisco.com>
> Subject: Re: [OMPI users] Need help resolving No route to host error
> with OpenMPI 1.1.2
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <7110e2d0-eb89-4293-a241-8487174b4...@cisco.com>
> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
> 
> On Sep 10, 2008, at 9:29 PM, Prasanna Ranganathan wrote:
> 
>> I have upgraded to 1.2.7 and am still noticing the issue.
> 
> FWIW, we didn't change anything with regards to OOB and TCP from 1.2.6
> -> 1.2.7, but it's still good to be at the latest version.
> 
> Try running with this MCA parameter:
> 
>      mpirun --mca oob_tcp_listen_mode listen_thread ...
> 
> Sorry; I forgot that we did not enable that option by default in the
> v1.2 series.
