Hi Andy, thanks very much! That was the issue. I am not sure why the keys differed, since I did update munge.key with the 411 commands in Rocks Cluster, but somehow that did not work properly. I manually copied the key to each node with scp and the nodes now show as idle.
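In case it helps anyone who hits the same thing: below is a minimal sketch, in Python, of the check that Andy's question boils down to, namely comparing a checksum of the head node's munge key against the key on each compute node. The function names and the node names are made up for illustration, and how the key bytes are collected is left open (on most installs the key lives in /etc/munge/munge.key, but that path is an assumption here, not something I verified on every node).

```python
import hashlib
from typing import Dict, List


def key_digest(key_bytes: bytes) -> str:
    """Return the SHA-256 hex digest of a munge key's raw bytes."""
    return hashlib.sha256(key_bytes).hexdigest()


def find_mismatched_nodes(head_key: bytes, node_keys: Dict[str, bytes]) -> List[str]:
    """Return the sorted names of nodes whose key differs from the head node's.

    node_keys maps a (hypothetical) node name to the raw bytes of the key
    found on that node; gathering those bytes is outside this sketch.
    """
    expected = key_digest(head_key)
    return sorted(
        name for name, key in node_keys.items() if key_digest(key) != expected
    )


if __name__ == "__main__":
    # Dummy key material for illustration only, not a real munge key.
    head = b"\x01" * 32
    nodes = {
        "compute-0-0": head,          # same key as the head node
        "compute-0-1": b"\x02" * 32,  # stale key, e.g. from a reinstall
    }
    print(find_mismatched_nodes(head, nodes))
```

Any node reported by this check would be a candidate for the "Munge decode failed: Invalid credential" errors quoted below, and copying the head node's key to it (followed by restarting munge) is what fixed it for me.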
-Jan

> On Sep 14, 2015, at 3:58 PM, Andy Riebs <[email protected]> wrote:
>
> Do you have the same munge key installed on the head node and compute nodes?
>
> On 09/14/2015 05:59 PM, Jan Dettmer wrote:
>> Hi,
>>
>> I am having the issue that all nodes in the cluster are listed as down. The
>> network connections work fine and I can ssh to the nodes.
>>
>> In slurmctld.log, I get a large number of errors (the following lines many
>> times).
>>
>> [2015-09-14T14:40:35.384] error: authentication: Invalid credential
>> [2015-09-14T14:40:35.910] error: Munge decode failed: Invalid credential
>> [2015-09-14T14:40:35.910] error: slurm_receive_msg: MESSAGE_NODE_REGISTRATION_STATUS has authentication error: Invalid credential
>> [2015-09-14T14:40:35.910] error: slurm_receive_msg: Protocol authentication error
>> [2015-09-14T14:40:35.920] error: slurm_receive_msg: Protocol authentication error
>>
>> The issue appeared after I had to reinstall the compute nodes (the rocks
>> cluster installation hung after upgrading to slurm 15.08 and I had to remove
>> the compute nodes from the database and then reinstall them; this also
>> causes the ssh keys to change).
>>
>> Has anyone experienced this?
>>
>> Thanks, Jan
