My /etc/hosts already has those entries. And as I mentioned, I can ping from qdr2 to qdr1. But nc -v qdr1 6818 still reports "No route to host".
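
For reference, here is a minimal Python sketch of the same TCP check that nc performs, run from qdr2. It assumes Python 3 is available there, and the helper name probe is made up for illustration. It only separates the failure modes: "refused" means the host answered but nothing is listening on the port, while "no route to host" is the EHOSTUNREACH error seen above. When ping works but TCP returns that error, one common cause is a firewall on the target rejecting the connection with an ICMP host-prohibited reply. That is only a guess here, since nothing in the output so far confirms it.

```python
import errno
import socket


def probe(host, port, timeout=3.0):
    """Attempt a TCP connection and return a short label for the outcome."""
    try:
        # create_connection resolves the name (via /etc/hosts or DNS) and connects.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all within the timeout (packets silently dropped).
        return "timeout"
    except ConnectionRefusedError:
        # Host reachable, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            # Same condition nc and slurmd report as "No route to host".
            return "no route to host"
        return f"error: {exc}"


if __name__ == "__main__":
    # Host name and ports taken from this thread: 6817 (slurmctld), 6818 (slurmd).
    for port in (6817, 6818):
        print(f"qdr1:{port} -> {probe('qdr1', port)}")
```

Run on qdr2, this prints one line per port. Running it on qdr1 as well shows whether the ports are reachable locally but not from the other node.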
On Tue, Nov 5, 2013 at 9:58 AM, Ludovic Prevost <ludovic.prev...@emea.nec.com> wrote:

> Hi,
>
> Could you try to populate your /etc/hosts like:
>
> 130.1.2.205 qdr1
> 130.1.2.206 qdr2
>
> And try again:
>
> $ nc -v qdr1 6818
>
> Best Regards,
> PREVOST Ludovic
> NEC HPC Europe
>
> *From:* Arjun J Rao [mailto:rectangle.k...@gmail.com]
> *Sent:* Tuesday, November 5, 2013 08:54
> *To:* slurm-dev
> *Subject:* [slurm-dev] Fwd: Failed to contact primary controller : No route to host
>
> Have SLURM installed on two nodes qdr1 and qdr2 with IP addresses 130.1.2.205 and 130.1.2.206. Started slurmctld on qdr1. Started slurmd on both qdr1 and qdr2.
>
> The slurmd on qdr1 is running fine. But the slurmd on qdr2 gives the following error message:
>
> slurmd: debug2: _slurm_connect failed.: No route to host
> slurmd: debug2: Error connecting slurm stream socket at 130.1.2.205:6817: No route to host
> slurmd: debug: Failed to contact primary controller: No route to host
>
> Now I have tried netstat -lnt on qdr1 (130.1.2.205) and it shows this:
>
> Proto Recv-Q Send-Q LocalAddress  ForeignAddress State
> tcp   0      0      0.0.0.0:6817  0.0.0.0:*      LISTEN
> tcp   0      0      0.0.0.0:6818  0.0.0.0:*      LISTEN
>
> This shows that both slurmctld and slurmd on qdr1 are listening and talking to each other.
>
> But doing nc -zv qdr1 6818 from qdr2 gives me the following error:
>
> nc: Connect to qdr1 port 6818(tcp) failed: No route to host
>
> Edit: pinging from qdr2 to qdr1 and vice versa works fine.