This seems like a FAQ, but I think I've eliminated the usual causes. Slurm control node logs the following, repeated once per second:
[2016-05-10T23:49:10.184] error: Munge decode failed: Invalid credential
[2016-05-10T23:49:10.184] error: slurm_receive_msg: MESSAGE_NODE_REGISTRATION_STATUS has authentication error: Invalid credential
[2016-05-10T23:49:10.184] error: slurm_receive_msg: Protocol authentication error
[2016-05-10T23:49:10.194] error: slurm_receive_msg: Protocol authentication error

munged.log has corresponding messages, also one per second:

2016-05-10 23:49:10 Info: Invalid credential

Compute nodes register and work correctly, so I don't know where this is coming from.

All the clocks are in sync.
All the /etc/munge/munge.key files match.
munge -n | unmunge works locally and between control and compute nodes.

These messages start immediately when the slurmctld daemon starts, as soon as the munge plugins load, and regardless of whether any slurmd are running on the compute nodes. So it's acting as if it's local to the control node.

Slurm version 14.11.9 on RHEL 6.7.

Would appreciate any ideas of what else to check. Maximum debugging verbosity still doesn't seem to reveal the source of these messages.

Allan
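
For anyone wanting to repeat the checks described above, they can be sketched roughly as the shell functions below. This is an illustration under stated assumptions, not a record of the exact commands that were run: the function names are made up for this example, comparing SHA-256 digests is just one way to confirm the key files match, and the remote check assumes ssh access to the node and that the munge and unmunge commands are installed on both ends.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, chosen for this example.

# Succeeds if every file given has the same SHA-256 digest.
# (Assumption: digest comparison stands in for "the key files match".)
keys_match() {
    sha256sum "$@" | awk '{print $1}' | sort -u | awk 'END { exit (NR != 1) }'
}

# Encode a credential and decode it again on the local host.
check_local() {
    munge -n | unmunge
}

# Encode locally and decode on the remote host given as $1,
# then encode on the remote host and decode locally.
check_remote() {
    munge -n | ssh "$1" unmunge &&
    ssh "$1" munge -n | unmunge
}
```

After sourcing this, check_local exercises the local round trip and check_remote with a compute node's hostname exercises both directions. keys_match takes two or more file paths, for example the local key and a copy fetched from another node.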
