Hi Jeff,

Thanks to you, I figured out the problem. As you suspected, it was iptables, which was acting as a firewall on some of the machines. After I stopped iptables, the MPI communication works fine. I even tried 5 machines together, and the communication works there as well.

Thanks again,
Jagannath
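For anyone who hits the same "No route to host (113)" error while ssh still works, here is a minimal Python sketch for checking whether a given TCP port on another node is reachable, independent of MPI. The function name `probe`, the 3-second default timeout, and the script name and port 5000 in the usage line are illustrative assumptions, not anything Open MPI defines; the host name yethiraj31 comes from the hostfile quoted below. A "no-route" result on a port while port 22 reports "open" often points to a firewall rule rejecting that port; "refused" means the connection attempt reached the host and nothing was listening there.

```python
import errno
import socket
import sys


def probe(host, port, timeout=3.0):
    """Try one TCP connection and classify the outcome as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all: packets may be silently dropped.
        return "timeout"
    except ConnectionRefusedError:
        # Host answered, but no process is listening on this port.
        return "refused"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            # Same condition as "No route to host (113)" in the log on Linux.
            return "no-route"
        return "error: %s" % exc


if __name__ == "__main__":
    # Hypothetical usage: python probe.py HOST PORT [PORT ...]
    if len(sys.argv) >= 3:
        target = sys.argv[1]
        for port_arg in sys.argv[2:]:
            print(target, port_arg, probe(target, int(port_arg)))
```

Run from yethiraj30 as, for example, `python probe.py yethiraj31 22 5000`. This only tests reachability of the ports you name; it does not know which ports mpirun will actually pick.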
On Thu, May 26, 2011 at 5:19 AM, Jeff Squyres <jsquy...@cisco.com> wrote:

> ssh may be allowed but other random TCP ports may not.
>
> iptables is the typical firewall software that most Linux installations use; it may have been enabled by default.
>
> I'm a little doubtful that this is your problem, though, because you're apparently able to *launch* your application, which means that OMPI's out-of-band communication system was able to make some sockets. So it's a little weird that the MPI layer's TCP sockets were borked. But let's check for firewall software, first...
>
> On May 26, 2011, at 12:42 AM, Jagannath Mondal wrote:
>
> > Hi Jeff,
> > I was wondering how I can check whether there is any firewall software. In fact I can use ssh to go from one machine to another. But, only with mpirun, it does not work. I was wondering whether it is possible that even in presence of firewall ssh may work but mpirun may not.
> > Jagannath
> >
> > On Wed, May 25, 2011 at 10:42 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> > Are you running any firewall software?
> >
> > Sent from my phone. No type good.
> >
> > On May 25, 2011, at 10:41 PM, "Jagannath Mondal" <jagannath.mon...@gmail.com> wrote:
> >
> >> Hi,
> >> I am having a problem in running mpirun over multiple nodes.
> >> To run a job over two 8-core processors, I generated a hostfile as follows:
> >>
> >> yethiraj30 slots=8 max_slots=8
> >> yethiraj31 slots=8 max_slots=8
> >>
> >> These two machines are intra-connected and I have installed openmpi 1.3.3.
> >> Then If I try to run the replica exchange simulation using the following command:
> >>
> >> mpirun -np 16 --hostfile hostfile mdrun_4mpi -s topol_.tpr -multi 16 -replex 100 >& log_replica_test
> >>
> >> But I find following error and job does not proceed at all:
> >>
> >> btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
> >>
> >> Here is the full details:
> >>
> >> NNODES=16, MYRANK=0, HOSTNAME=yethiraj30
> >> NNODES=16, MYRANK=1, HOSTNAME=yethiraj30
> >> NNODES=16, MYRANK=4, HOSTNAME=yethiraj30
> >> NNODES=16, MYRANK=2, HOSTNAME=yethiraj30
> >> NNODES=16, MYRANK=6, HOSTNAME=yethiraj30
> >> NNODES=16, MYRANK=3, HOSTNAME=yethiraj30
> >> NNODES=16, MYRANK=5, HOSTNAME=yethiraj30
> >> NNODES=16, MYRANK=7, HOSTNAME=yethiraj30
> >> [yethiraj30][[22604,1],0][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
> >> [yethiraj30][[22604,1],4][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
> >> [yethiraj30][[22604,1],6][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
> >> [yethiraj30][[22604,1],1][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
> >> [yethiraj30][[22604,1],3][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
> >> [yethiraj30][[22604,1],2][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.31 failed: No route to host (113)
> >> NNODES=16, MYRANK=10, HOSTNAME=yethiraj31
> >> NNODES=16, MYRANK=12, HOSTNAME=yethiraj31
> >>
> >> I am not sure how to resolve this issue. In general, I can go from one machine to another without any problem using ssh. But, when I am trying to run openmpi over both the machines, I get this error. Any help will be appreciated.
> >>
> >> Jagannath
> >> _______________________________________________
> >> users mailing list
> >> us...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/