Following a tip from Moe, I actually traced the problem to an iptables
issue specific to the TCP port used by SLURM. It's resolved now.
Thanks for the suggestions, though.
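Since the root cause was a firewall rule on the SLURM port, a plain TCP reachability test from the controller to each node's slurmd port (and from a node back to the controller) can confirm or rule out this class of problem before digging into logs. Below is a minimal Python sketch. It assumes the default ports 6817 (slurmctld) and 6818 (slurmd). The real values come from SlurmctldPort and SlurmdPort in slurm.conf. The script name is hypothetical.

```python
import socket
import sys

# Assumed defaults; check SlurmctldPort / SlurmdPort in your slurm.conf.
SLURMCTLD_PORT = 6817
SLURMD_PORT = 6818


def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, an iptables DROP (shows up as a timeout),
        # or an unresolvable hostname are all reported as unreachable.
        return False


if __name__ == "__main__":
    # Hypothetical usage: python check_slurm_ports.py compute-2018 compute-2019
    for host in sys.argv[1:]:
        ok = port_reachable(host, SLURMD_PORT)
        print(f"{host}:{SLURMD_PORT} {'open' if ok else 'UNREACHABLE'}")
```

A REJECT rule fails fast with "connection refused", while a DROP rule only fails after the timeout. Both cases are treated as unreachable here, and that is usually what matters when a node shows up with a trailing * in sinfo.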
On 05/08/17 09:01, Felip Moll wrote:
Re: [slurm-dev] Communication error
Jason, note that compute-2018 is in IDLE* state, which means it is
not reachable (the trailing * marks a node that is not responding).
As Felip suggests, log into that compute node and run:
tail -f /var/log/slurmd.log
I would also suggest running scontrol on your master node to set that
node to DRAIN and then RESUME,
then log into the node and (re
Do you have any kind of firewall in your network?
I would suspect a problem with the clocks (dates), but since you tested
munge -n we can rule that out.
Could you run pdsh -w compute-* date | dshbak -c anyway?
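The dshbak -c check groups nodes that print identical output, so a badly skewed node stands out by eye. To put a number on the skew, one option is to print epoch seconds instead (pdsh -w compute-* date +%s > clocks.txt) and then compute the spread with a small script. Below is a minimal Python sketch. The helper name and the clocks.txt file are hypothetical. It assumes pdsh's usual "host: output" line prefix. pdsh starts the remote commands in parallel but not at exactly the same instant, so a second or two of spread is expected noise.

```python
import sys
from typing import Dict, Tuple


def clock_spread(pdsh_output: str) -> Tuple[int, Dict[str, int]]:
    """Parse 'host: epoch_seconds' lines; return (max - min, per-host stamps)."""
    stamps: Dict[str, int] = {}
    for line in pdsh_output.splitlines():
        host, sep, value = line.partition(":")
        value = value.strip()
        if not sep or not value.isdigit():
            continue  # skip blank lines and pdsh error messages
        stamps[host.strip()] = int(value)
    if not stamps:
        return 0, stamps
    return max(stamps.values()) - min(stamps.values()), stamps


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python clock_spread.py clocks.txt
    with open(sys.argv[1]) as fh:
        spread, stamps = clock_spread(fh.read())
    print(f"{len(stamps)} nodes, clock spread {spread} s")
```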
Can you show the slurmd log output from the nodes?
*--Felip Moll Marquès*
Computer Science Engineer