I'm assuming this happens during startup, shortly after mpirun launches
the job (i.e., during MPI_INIT), right?

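It looks like the MPI processes were unable to connect back to the
rendezvous point (mpirun) during startup.  Assuming these are Linux
errno values, errno=111 is ECONNREFUSED (connection refused) and
errno=104 is ECONNRESET (connection reset by peer).  Do you have any
firewalls or port blocking running in your cluster?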
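
If it helps to rule the network in or out, here is a minimal sketch of a
TCP probe you could run from a compute node toward the node where mpirun
runs.  It is not part of Open MPI; the function name probe() and the
host/port you pass in are placeholders you would have to choose yourself
(for example, a port where you have started a test listener).  The errno
table assumes Linux numbering.

```python
import errno
import socket

# Linux errno values that appear in the log (numbering differs on other OSes).
LINUX_ERRNO_NAMES = {
    104: "ECONNRESET",    # connection reset by peer
    111: "ECONNREFUSED",  # nothing accepted the connection on that port
}


def probe(host, port, timeout=3.0):
    """Attempt one TCP connect; return "ok" or a short failure description.

    host and port are hypothetical inputs chosen by the caller.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.timeout:
        # No answer at all: packets silently dropped somewhere on the path.
        return "timeout"
    except OSError as exc:
        # Map the local errno to its symbolic name when one is known.
        return errno.errorcode.get(exc.errno, str(exc))
```

A "timeout" result would be consistent with a firewall silently dropping
packets, while "ECONNREFUSED" means the host was reachable but nothing
was listening on that port (or a firewall actively rejected the attempt).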


> -----Original Message-----
> From: users-boun...@open-mpi.org 
> [mailto:users-boun...@open-mpi.org] On Behalf Of Prakash Velayutham
> Sent: Friday, April 14, 2006 11:00 AM
> To: us...@open-mpi.org
> Cc: Prakash Velayutham
> Subject: [OMPI users] Open MPI error
> 
> Hi All,
> 
> What does this error mean?
> 
> **************************************************************
> ****************
> socket 10: [wins02:19102] [0,0,3]-[0,0,0] mca_oob_tcp_msg_recv: readv
> failed with errno=104
> socket 12: [wins01:19281] [0,0,4]-[0,0,0] mca_oob_tcp_msg_recv: readv
> failed with errno=104
> socket 6: [wins05:00939] [0,0,1]-[0,0,0] mca_oob_tcp_msg_send_handler:
> writev failed with errno=104
> socket 6: [wins05:00939] [0,0,1] ORTE_ERROR_LOG: Communication failure
> in file gpr_proxy_put_get.c at line 143
> socket 6: [wins05:00939] [0,0,1]-[0,0,0]
> mca_oob_tcp_peer_complete_connect: connection failed (errno=111) -
> retrying (pid=939)
> socket 6: [wins05:00939] mca_oob_tcp_peer_timer_handler
> socket 6: [wins05:00939] [0,0,1]-[0,0,0]
> mca_oob_tcp_peer_complete_connect: connection failed (errno=111) -
> retrying (pid=939)
> socket 6: [wins05:00939] mca_oob_tcp_peer_timer_handler
> socket 6: [wins05:00939] [0,0,1]-[0,0,0]
> mca_oob_tcp_peer_complete_connect: connection failed (errno=111) -
> retrying (pid=939)
> **************************************************************
> *****************
> 
> I am still debugging the code I am working on, but just wanted to get
> some insight into where I should be looking at.
> 
> I am running openmpi-1.0.1.
> 
> Thanks,
> Prakash
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
