Hi Andy, 

Thanks very much! That was the issue. I am not sure why the keys differed, since
I did update munge.key with the 411 commands in the Rocks cluster, but somehow
that did not work properly. I manually copied the key to each node with scp, and
the nodes now show as idle.
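
For anyone hitting the same errors, here is a rough sketch of how one might
check that every node carries the same key as the head node. It is only an
illustration under assumptions: the key path is the usual default
/etc/munge/munge.key, the node names in the usage note are made up, and
remote_digest assumes passwordless ssh as a user allowed to read the key
(normally root) and that sha256sum exists on the nodes.

```python
import hashlib
import subprocess

# Assumed default location of the MUNGE key; adjust if your install differs.
KEY_PATH = "/etc/munge/munge.key"


def key_digest(path=KEY_PATH):
    """Return the SHA-256 hex digest of a local key file."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def remote_digest(host, path=KEY_PATH):
    """Return the SHA-256 hex digest of the key file on a remote host.

    Assumes passwordless ssh and a remote sha256sum binary.
    """
    result = subprocess.run(
        ["ssh", host, "sha256sum", path],
        capture_output=True, text=True, check=True,
    )
    # sha256sum prints "<digest>  <filename>"; keep only the digest.
    return result.stdout.split()[0]


def mismatched_nodes(reference_digest, node_digests):
    """Return the sorted names of nodes whose digest differs from the reference."""
    return sorted(
        node for node, digest in node_digests.items()
        if digest != reference_digest
    )
```

On the head node one could then run something like
mismatched_nodes(key_digest(), {n: remote_digest(n) for n in nodes}) with
nodes set to a list such as ["compute-0-0", "compute-0-1"] (hypothetical
names). Any node it returns needs the key copied over again and munge
restarted there.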

-Jan

> On Sep 14, 2015, at 3:58 PM, Andy Riebs <[email protected]> wrote:
> 
> 
> Do you have the same munge key installed on the head node and compute nodes?
> 
> On 09/14/2015 05:59 PM, Jan Dettmer wrote:
>> Hi,
>> 
>> I am having the issue that all nodes in the cluster are listed as down. The
>> network connections work fine and I can ssh to the nodes.
>> 
>> In slurmctld.log, I get a large number of errors (the following lines repeat
>> many times).
>> 
>> 
>> [2015-09-14T14:40:35.384] error: authentication: Invalid credential
>> [2015-09-14T14:40:35.910] error: Munge decode failed: Invalid credential
>> [2015-09-14T14:40:35.910] error: slurm_receive_msg: 
>> MESSAGE_NODE_REGISTRATION_STATUS has authentication error: Invalid credential
>> [2015-09-14T14:40:35.910] error: slurm_receive_msg: Protocol authentication 
>> error
>> [2015-09-14T14:40:35.920] error: slurm_receive_msg: Protocol authentication 
>> error
>> 
>> 
>> The issue appeared after I had to reinstall the compute nodes (the Rocks
>> cluster installation hung after upgrading to slurm 15.08 and I had to remove
>> the compute nodes from the database and then reinstall them; this also
>> causes the ssh keys to change).
>> 
>> Has anyone experienced this?
>> 
>> Thanks, Jan
