I'm assuming that this is during the startup shortly after mpirun, right? (i.e., during MPI_INIT)
It looks like MPI processes were unable to connect back to the rendezvous point (mpirun) during startup. Do you have any firewalls or port blocking running in your cluster?

> -----Original Message-----
> From: users-boun...@open-mpi.org
> [mailto:users-boun...@open-mpi.org] On Behalf Of Prakash Velayutham
> Sent: Friday, April 14, 2006 11:00 AM
> To: us...@open-mpi.org
> Cc: Prakash Velayutham
> Subject: [OMPI users] Open MPI error
>
> Hi All,
>
> What does this error mean?
>
> ******************************************************************************
> socket 10: [wins02:19102] [0,0,3]-[0,0,0] mca_oob_tcp_msg_recv: readv failed with errno=104
> socket 12: [wins01:19281] [0,0,4]-[0,0,0] mca_oob_tcp_msg_recv: readv failed with errno=104
> socket 6: [wins05:00939] [0,0,1]-[0,0,0] mca_oob_tcp_msg_send_handler: writev failed with errno=104
> socket 6: [wins05:00939] [0,0,1] ORTE_ERROR_LOG: Communication failure in file gpr_proxy_put_get.c at line 143
> socket 6: [wins05:00939] [0,0,1]-[0,0,0] mca_oob_tcp_peer_complete_connect: connection failed (errno=111) - retrying (pid=939)
> socket 6: [wins05:00939] mca_oob_tcp_peer_timer_handler
> socket 6: [wins05:00939] [0,0,1]-[0,0,0] mca_oob_tcp_peer_complete_connect: connection failed (errno=111) - retrying (pid=939)
> socket 6: [wins05:00939] mca_oob_tcp_peer_timer_handler
> socket 6: [wins05:00939] [0,0,1]-[0,0,0] mca_oob_tcp_peer_complete_connect: connection failed (errno=111) - retrying (pid=939)
> *******************************************************************************
>
> I am still debugging the code I am working on, but just wanted to get
> some insight into where I should be looking at.
>
> I am running openmpi-1.0.1.
>
> Thanks,
> Prakash
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
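For reference, the errno values in the log are operating-system error codes, and their meaning is platform-specific. On Linux, 104 is ECONNRESET ("Connection reset by peer") and 111 is ECONNREFUSED ("Connection refused"), which fits the picture of processes failing to talk to [0,0,0]. Below is a minimal sketch, assuming Python 3 is available on the node that produced the log; the helper name decode_errnos and the sample text are hypothetical, just a way to translate the numbers on that machine rather than guessing:

```python
import errno
import os
import re

# Hypothetical helper: pull "errno=NNN" values out of Open MPI OOB log lines.
# The numbers are platform-specific, so run this on the host that logged them.
ERRNO_RE = re.compile(r"errno=(\d+)")


def decode_errnos(log_text):
    """Return sorted (number, symbolic name, message) tuples for each errno seen."""
    seen = sorted({int(m) for m in ERRNO_RE.findall(log_text)})
    return [(n, errno.errorcode.get(n, "UNKNOWN"), os.strerror(n)) for n in seen]


# Two lines copied from the log in the quoted message.
sample = """\
[wins02:19102] [0,0,3]-[0,0,0] mca_oob_tcp_msg_recv: readv failed with errno=104
[wins05:00939] [0,0,1]-[0,0,0] mca_oob_tcp_peer_complete_connect: connection failed (errno=111) - retrying (pid=939)
"""

if __name__ == "__main__":
    for num, name, msg in decode_errnos(sample):
        print(f"errno={num}: {name} ({msg})")
```

Note that ECONNREFUSED on a connect usually means nothing was accepting connections at the target address/port, or something actively rejected the attempt; a firewall that silently drops packets typically shows up as a timeout instead.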