This seems like a FAQ, but I think I've eliminated the usual causes.

The Slurm control node logs the following, repeated once per second:

[2016-05-10T23:49:10.184] error: Munge decode failed: Invalid credential
[2016-05-10T23:49:10.184] error: slurm_receive_msg: MESSAGE_NODE_REGISTRATION_STATUS has authentication error: Invalid credential
[2016-05-10T23:49:10.184] error: slurm_receive_msg: Protocol authentication error
[2016-05-10T23:49:10.194] error: slurm_receive_msg: Protocol authentication error


munged.log has corresponding messages, also one per second:

2016-05-10 23:49:10 Info:      Invalid credential
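
To check that the two logs really line up second for second, something like the following could be used. It is only a minimal sketch: it assumes the timestamp formats shown in the excerpts above, the function name count_per_second is made up for illustration, and the sample lines are copied from this message rather than read from the real log files.

```python
import re
from collections import Counter

# Timestamp prefixes as they appear in the two log excerpts in this message.
# Assumption: every relevant line starts with one of these formats.
SLURM_TS = re.compile(r"^\[(\d{4}-\d{2}-\d{2})T(\d{2}:\d{2}:\d{2})\.\d+\]")
MUNGE_TS = re.compile(r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) ")


def count_per_second(lines, ts_pattern, needle):
    """Count lines containing `needle`, keyed by 'YYYY-MM-DD HH:MM:SS'."""
    counts = Counter()
    for line in lines:
        m = ts_pattern.match(line)
        if m and needle in line:
            counts[f"{m.group(1)} {m.group(2)}"] += 1
    return counts


# Sample lines copied from the excerpts; in practice these would be read
# from the slurmctld log and munged.log.
slurm_lines = [
    "[2016-05-10T23:49:10.184] error: Munge decode failed: Invalid credential",
    "[2016-05-10T23:49:10.184] error: slurm_receive_msg: Protocol authentication error",
]
munge_lines = ["2016-05-10 23:49:10 Info:      Invalid credential"]

slurm_counts = count_per_second(slurm_lines, SLURM_TS, "Munge decode failed")
munge_counts = count_per_second(munge_lines, MUNGE_TS, "Invalid credential")

# Print one row per second: decode failures in slurmctld vs. munged entries.
for second in sorted(set(slurm_counts) | set(munge_counts)):
    print(second, slurm_counts[second], munge_counts[second])
```

If the per-second counts in the two columns match, each slurmctld decode failure corresponds to one rejected credential in munged on the same host.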


Compute nodes register and work correctly, so I don't know where these
messages are coming from.

All the clocks are in sync.

All the /etc/munge/munge.key files match.

munge -n | unmunge works locally and between control and compute nodes.

These messages start immediately when the slurmctld daemon starts, as
soon as the munge plugins load, and regardless of whether any slurmd
are running on the compute nodes. So it's acting as if it's local to the
control node.

Slurm version 14.11.9 on RHEL 6.7

Would appreciate any ideas of what else to check. Maximum debugging
verbosity still doesn't seem to reveal the source of these messages.

Allan
