Maybe the error message could be improved along these lines:
"This is the Slurm controller host. slurmd does not need to run on the
controller host unless you also list it as a compute node (not recommended)."
I think you shouldn't run slurmd on your ControlMachine node (only
slurmctld and slurmdbd), since I don't see a NodeName line for
slurm_master in your configuration.
So you should either add slurm_master to a NodeName line in your
slurm.conf or not start slurmd on slurm_master.
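
If you go for the first option, the line could look roughly like the sketch
below. The hardware values are simply the "(hw)" numbers slurmd printed in your
log, so this assumes that detection is correct; I haven't seen the rest of your
slurm.conf, so treat it as a starting point rather than a tested configuration.

```conf
# Hypothetical NodeName entry for the controller host.
# Values are taken from the "(hw)" figures in the quoted slurmd output:
# 2 sockets x 8 cores x 2 threads = 32 CPUs.
NodeName=slurm_master CPUs=32 Boards=1 SocketsPerBoard=2 CoresPerSocket=8 ThreadsPerCore=2
```

Keep in mind that slurm.conf should stay identical on all nodes, and that once
slurm_master is defined this way it is a compute node like any other, so jobs
can land on it if you also put it in a partition.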


> Hi,
> I've upgraded slurm 15.08.3 (built from rpmbuild -tb <tarball>) to 17.02.6 on 
> centos-7-x86_64.
> Since I've done that, slurmd refuses to start on the ControlMachine and on the 
> BackupController. (It starts fine on compute nodes.)
> The error is: slurmd: fatal: Unable to determine this slurmd's NodeName
> If I try to specify the nodename it fails with a different error message:
> [root@slurm_master] # slurmd -D -N $(hostname -s)
> slurmd: Node configuration differs from hardware: CPUs=0:32(hw) 
> Boards=0:1(hw) SocketsPerBoard=0:2(hw) CoresPerSocket=0:8(hw) 
> ThreadsPerCore=0:2(hw)
> slurmd: Message aggregation disabled
> slurmd: error: find_node_record: lookup failure for slurm_master
> slurmd: fatal: ROUTE -- slurm_master not found in node_record_table
> [root@slurm_master]# hostname -s
> slurm_master
> Trying to debug seems to show that the hostname is not in the node hash table.
> slurmdbd and slurmctld start fine.
> I've googled around, but I only find problems related to compute nodes, not 
> Controller or Backup.
> Any ideas?

