Do you have the same munge key installed on the head node and compute nodes?
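
One way to check is to compare a checksum of the key on the head node against each compute node. The following is only a rough sketch, not a tested tool. It assumes the key is at the default location /etc/munge/munge.key, that it is run as root on the head node (the key is normally readable only by the munge user and root), and that root can ssh to the nodes. The node names in it are hypothetical placeholders.

```python
import hashlib
import subprocess

# Default MUNGE key location; adjust if your installation differs.
KEY_PATH = "/etc/munge/munge.key"


def local_digest(path: str = KEY_PATH) -> str:
    """Return the SHA-256 hex digest of the key file on this host."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def remote_digest(host: str, path: str = KEY_PATH) -> str:
    """Return the SHA-256 hex digest of the key file on `host`, read over ssh."""
    result = subprocess.run(
        ["ssh", host, "sha256sum", path],
        check=True, capture_output=True, text=True,
    )
    # sha256sum prints "<digest>  <filename>"; keep only the digest.
    return result.stdout.split()[0]


def mismatched_nodes(head_digest: str, node_digests: dict[str, str]) -> list[str]:
    """Return the sorted names of nodes whose digest differs from the head node's."""
    return sorted(h for h, d in node_digests.items() if d != head_digest)


if __name__ == "__main__":
    # Hypothetical node names; replace with your own compute nodes.
    nodes = ["compute-0-0", "compute-0-1"]
    head = local_digest()
    bad = mismatched_nodes(head, {n: remote_digest(n) for n in nodes})
    print("nodes with a different munge key:", bad or "none")
```

If any node shows up as different, copy the head node's key to it and restart munged there. A quicker end-to-end check is to run "munge -n | ssh <node> unmunge" from the head node, which should decode successfully when the keys match.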

On 09/14/2015 05:59 PM, Jan Dettmer wrote:
Hi,

All nodes in my cluster are listed as down. The network connections work 
fine and I can ssh to the nodes.

slurmctld.log contains a large number of errors; the following lines repeat 
many times:


[2015-09-14T14:40:35.384] error: authentication: Invalid credential
[2015-09-14T14:40:35.910] error: Munge decode failed: Invalid credential
[2015-09-14T14:40:35.910] error: slurm_receive_msg: MESSAGE_NODE_REGISTRATION_STATUS has authentication error: Invalid credential
[2015-09-14T14:40:35.910] error: slurm_receive_msg: Protocol authentication error
[2015-09-14T14:40:35.920] error: slurm_receive_msg: Protocol authentication error


The issue appeared after I had to reinstall the compute nodes (the Rocks 
cluster installation hung after upgrading to Slurm 15.08, so I had to remove 
the compute nodes from the database and then reinstall them; this also caused 
the ssh keys to change).

Has anyone experienced this?

Thanks, Jan
