I have SLURM installed on two nodes, qdr1 and qdr2, with IP addresses
130.1.2.205 and 130.1.2.206 respectively. I started slurmctld on qdr1 and
slurmd on both qdr1 and qdr2.

The slurmd on qdr1 is running fine, but the slurmd on qdr2 gives the
following error messages:

slurmd: debug2: _slurm_connect failed.: No route to host
slurmd: debug2: Error connecting slurm stream socket at 130.1.2.205:6817: No route to host
slurmd: debug: Failed to contact primary controller: No route to host

I then ran netstat -lnt on qdr1 (130.1.2.205), which shows:

Proto  Recv-Q  Send-Q  LocalAddress   ForeignAddress  State
tcp    0       0       0.0.0.0:6817   0.0.0.0:*       LISTEN
tcp    0       0       0.0.0.0:6818   0.0.0.0:*       LISTEN

This shows that slurmctld and slurmd on qdr1 are both listening (ports 6817
and 6818, on all interfaces), and locally the two talk to each other.

However, running nc -zv qdr1 6818 from qdr2 gives me the following error:

nc: Connect to qdr1 port 6818(tcp) failed: No route to host
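
In case it helps anyone reproduce this, the nc check above can also be scripted so that both ports are probed and the exact error code is reported. What follows is only a minimal sketch in Python, not something from my setup: the function names probe, classify and report are hypothetical, and the hints attached to each errno describe general TCP behaviour as I understand it, not anything SLURM itself reports. The address and ports are the ones from the output above.

```python
import errno
import socket

# Controller address and the two ports seen in the netstat output above.
TARGETS = [("130.1.2.205", 6817), ("130.1.2.205", 6818)]

# Rough interpretation of common connect() failures (an assumption, not a
# diagnosis): the same errno can have more than one cause.
HINTS = {
    0: "port reachable",
    errno.EHOSTUNREACH: "no route to host: routing problem, or a firewall rejecting the packet",
    errno.ECONNREFUSED: "connection refused: host reachable, nothing listening on the port",
    errno.ETIMEDOUT: "timed out: packets silently dropped, or host down",
}


def probe(host, port, timeout=3.0):
    """Attempt a TCP connection; return 0 on success, else the errno."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return 0
    except socket.timeout:
        # Must be caught before OSError, since it is a subclass of it.
        return errno.ETIMEDOUT
    except OSError as exc:
        # Some errors (e.g. name resolution) may carry no usual errno.
        return exc.errno if exc.errno is not None else -1


def classify(code):
    """Map a probe() result to a short human-readable hint."""
    return HINTS.get(code, f"other error (errno {code})")


def report(targets):
    """Probe every (host, port) pair and print one line per result."""
    for host, port in targets:
        print(f"{host}:{port} -> {classify(probe(host, port))}")
```

Calling report(TARGETS) from qdr2 would print one line per port. The point of looking at the errno is that "No route to host" and "Connection refused" are different failures: the first means the connection attempt never reached a listening or non-listening port at all, while the second would mean the host answered but nothing was bound to the port.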

Attachment: slurm.conf