[slurm-dev] Re: Communication error

2017-05-08 Thread Jason Bacon
Following a tip from Moe, I actually traced the problem to an iptables issue specific to the TCP port used by SLURM. It's resolved now. Thanks for the suggestions, though. On 05/08/17 09:01, Felip Moll wrote: Re: [slurm-dev] Communication error Do you have any kind of firewall in your net
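
A minimal sketch of how the ports in question might be identified before inspecting the firewall rules. The message does not say which port was blocked. The helper name slurm_ports is hypothetical, and the values 6817 and 6818 mentioned in the comments are the usual defaults for slurmctld and slurmd, assumed here rather than taken from the thread.

```shell
# Hypothetical helper: read "scontrol show config" output on stdin and
# print the configured slurmctld and slurmd TCP ports.
slurm_ports() {
    awk '$1 == "SlurmctldPort" || $1 == "SlurmdPort" { print $1, $3 }'
}

# Possible usage on a node with SLURM installed (not run here):
#   scontrol show config | slurm_ports
#   iptables -L INPUT -n --line-numbers   # then look for rules touching those ports
# With default settings the ports would be 6817 and 6818 (an assumption).
```

If a DROP or REJECT rule covers one of the printed ports, that would be consistent with the kind of problem described above.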

[slurm-dev] Re: Communication error

2017-05-08 Thread John Hearns
Jason, note that compute-2018 is in IDLE* status, which means that it is not reachable. As Felip suggests, log into that compute node and run tail -f /var/log/slurmd.log. I would also suggest running scontrol on your master node to set that node to DRAIN and then RESUME, then log into the node and (re
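
The DRAIN then RESUME step suggested above could look roughly like this. It is a sketch only: the function name bounce_node, the SCONTROL variable, and the Reason text are hypothetical. scontrol expects a Reason when a node is set to DRAIN.

```shell
# Allow the scontrol binary to be overridden (hypothetical convenience,
# e.g. SCONTROL=echo for a dry run).
SCONTROL="${SCONTROL:-scontrol}"

# Set a node to DRAIN, then back to RESUME, from the master node.
bounce_node() {
    local node="$1"
    $SCONTROL update NodeName="$node" State=DRAIN Reason="communication check"
    $SCONTROL update NodeName="$node" State=RESUME
}

# Possible usage (not run here):
#   bounce_node compute-2018
#   sinfo -R    # list nodes that are still down or drained, with reasons
```

Running it with SCONTROL=echo prints the two scontrol commands instead of executing them, which is a cheap way to check the node name first.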

[slurm-dev] Re: Communication error

2017-05-08 Thread Felip Moll
Do you have any kind of firewall in your network? I would have suggested a problem with the dates (clock skew), but since you tested munge -n we can rule that out. Can you run pdsh -w compute-* date | dshbak -c anyway? Can you show the nodes' slurmd log output? *--Felip Moll Marquès* Computer Science Engineer E-Mail
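
The date check suggested above can be made a little more mechanical by comparing epoch seconds instead of reading date strings. This is a sketch under assumptions: the function name check_skew is hypothetical, the 300-second threshold is an assumed tolerance, and the input is expected in the "hostname: output" form that pdsh prints.

```shell
# Hypothetical helper: read "hostname: epoch_seconds" lines on stdin and
# report hosts whose clock differs from the reference time ($1) by more
# than the allowed skew in seconds ($2, assumed default 300).
check_skew() {
    local now="$1" max_skew="${2:-300}"
    awk -F': ' -v now="$now" -v max="$max_skew" '
        {
            d = $2 - now
            if (d < 0) d = -d
            if (d > max) printf "%s skew=%ds\n", $1, d
        }'
}

# Possible usage (not run here):
#   pdsh -w 'compute-*' date +%s | check_skew "$(date +%s)"
```

No output means every node answered within the tolerance. Nodes that do not answer at all will not appear in the input, so they have to be spotted separately.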