Do you have the same munge key installed on the head node and compute nodes?
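One way to check this without exposing the key itself is to compare a checksum of the key file from each machine. The following is only a minimal sketch: the helper names `key_digest` and `keys_match` are made up for illustration, and it assumes each node's key (commonly /etc/munge/munge.key, readable only by root or the munge user) has been copied somewhere the script can read it.

```python
import hashlib
from pathlib import Path


def key_digest(path):
    """Return the SHA-256 hex digest of the file at `path`."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def keys_match(paths):
    """Return True if every file in `paths` has identical contents."""
    digests = {key_digest(p) for p in paths}
    return len(digests) == 1


if __name__ == "__main__":
    import sys

    # Hypothetical usage: pass copies of each node's key file as arguments.
    for p in sys.argv[1:]:
        print(key_digest(p), p)
    if len(sys.argv) > 2:
        print("keys match" if keys_match(sys.argv[1:]) else "keys DIFFER")
```

Alternatively, run just `key_digest` on each node and compare the printed digests by eye. If they differ, the usual fix is to copy the head node's key to the compute nodes and restart munged there.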
On 09/14/2015 05:59 PM, Jan Dettmer wrote:
Hi,

I am having the issue that all nodes in the cluster are listed as down. The network connections work fine and I can ssh to the nodes.

In slurmctld.log, I get a large number of errors (the following lines are repeated many times):

[2015-09-14T14:40:35.384] error: authentication: Invalid credential
[2015-09-14T14:40:35.910] error: Munge decode failed: Invalid credential
[2015-09-14T14:40:35.910] error: slurm_receive_msg: MESSAGE_NODE_REGISTRATION_STATUS has authentication error: Invalid credential
[2015-09-14T14:40:35.910] error: slurm_receive_msg: Protocol authentication error
[2015-09-14T14:40:35.920] error: slurm_receive_msg: Protocol authentication error

The issue appeared after I had to reinstall the compute nodes (the Rocks cluster installation hung after upgrading to Slurm 15.08, and I had to remove the compute nodes from the database and then reinstall them; this also causes the ssh keys to change).

Has anyone experienced this?

Thanks,
Jan
