Hi, I previously used *short* hostnames (e.g. node306) for all my compute nodes and in my SLURM settings, and that worked well. However, since I set the compute nodes' hostnames to FQDNs, the following error message appears in /var/log/slurmctld.log:
[2017-04-15T22:50:06.149] error: find_node_record: lookup failure for node306.yourdomain.com

On node306:

$ hostname
node306.yourdomain.com
$ hostname -s
node306
$ hostname -f
node306.yourdomain.com

In /etc/slurm/slurm.conf, short names are used, since FQDNs prevent the use of hostlist expressions. That is, "node[001-332].yourdomain.com" is invalid.

NodeName=node[001-332] CPUs=16 SocketsPerBoard=2 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=64100

So far, SLURM works fine despite the error message appearing in the log every 10 minutes.

I would appreciate any suggestions on this issue.

Best,
Jianwen