I have SLURM installed on two nodes, qdr1 and qdr2, with IP addresses 130.1.2.205 and 130.1.2.206 respectively. I started slurmctld on qdr1 and slurmd on both qdr1 and qdr2.
The slurmd on qdr1 is running fine, but the slurmd on qdr2 gives the following error messages:

    slurmd: debug2: _slurm_connect failed.: No route to host
    slurmd: debug2: Error connecting slurm stream socket at 130.1.2.205:6817: No route to host
    slurmd: debug: Failed to contact primary controller: No route to host

Running netstat -lnt on qdr1 (130.1.2.205) shows this:

    Proto Recv-Q Send-Q Local Address   Foreign Address   State
    tcp        0      0 0.0.0.0:6817    0.0.0.0:*         LISTEN
    tcp        0      0 0.0.0.0:6818    0.0.0.0:*         LISTEN

This shows that both slurmctld (port 6817) and slurmd (port 6818) on qdr1 are listening on all interfaces. But running nc -zv qdr1 6818 from qdr2 gives the following error:

    nc: Connect to qdr1 port 6818(tcp) failed: No route to host
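To tell "nothing is listening" apart from "packets are rejected or dropped on the way", the nc -zv check can be reproduced with a small Python sketch that classifies the error returned by connect(). The function name probe and the 3-second timeout are illustrative assumptions, not part of SLURM or nc. On Linux, the message "No route to host" is the text for errno EHOSTUNREACH; a refused connection (ECONNREFUSED) would instead mean the host was reached but no process was listening on that port.

```python
import errno
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connect to host:port and classify the result, similar to `nc -zv`.

    Hypothetical helper for illustration only.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"  # TCP handshake completed
    except socket.timeout:
        return "timeout"  # no reply at all: packets silently dropped somewhere
    except ConnectionRefusedError:
        return "refused"  # host reachable, but nothing listening on this port
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            # Same condition nc and slurmd report as "No route to host";
            # commonly a firewall REJECT rule or a routing/ARP problem.
            return "no route to host"
        if exc.errno == errno.ENETUNREACH:
            return "network unreachable"
        return f"error: {exc}"
```

Run from qdr2, probe("130.1.2.205", 6817) checks the slurmctld port that slurmd failed to reach, and probe("130.1.2.205", 6818) mirrors the nc test. A "no route to host" result on both, while qdr1 itself shows both ports in LISTEN, points at something between the hosts or on qdr1's network stack rejecting the connection, rather than at the SLURM daemons.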