Read the slurm.conf manual, specifically the parameters whose names start
with Node (NodeName, NodeHostname, NodeAddr). They discuss this situation.
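
For illustration, here is a minimal sketch of how those parameters might look
for a single node, reusing the names from the message quoted below. It is an
untested assumption, not a confirmed fix: NodeName is the name Slurm uses for
the node, and NodeHostname (which defaults to NodeName) can be set to the
name the node itself reports.

```
# Hypothetical slurm.conf entry for one node (untested sketch).
# NodeName:     the short name Slurm uses to refer to the node.
# NodeHostname: the name the node reports; defaults to NodeName if omitted.
NodeName=node306 NodeHostname=node306.yourdomain.com CPUs=16 SocketsPerBoard=2 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=64100
```

Whether a bracketed range can be combined with a domain suffix in
NodeHostname is not confirmed here, so check the man page for your Slurm
version before applying this to the whole node range.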

--
____
|| \\UTGERS,       |---------------------------*O*---------------------------
||_// the State     |         Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ     | Office of Advanced Research Computing - MSB C630, Newark
    `'

On Apr 15, 2017, at 11:47, Jianwen Wei <wei.jian...@gmail.com> wrote:

Hi,

I previously used *short* hostnames (say, node306) on all my compute nodes
and in my SLURM settings, and that worked well. However, error messages
appear in /var/log/slurmctld.log now that I have set FQDNs as the hostnames
of the compute nodes.

[2017-04-15T22:50:06.149] error: find_node_record: lookup failure for 
node306.yourdomain.com

On node306:

$ hostname
node306.yourdomain.com
$ hostname -s
node306
$ hostname -f
node306.yourdomain.com

In /etc/slurm/slurm.conf, short names are used, since an FQDN prevents the
use of a hostlist expression. That is, "node[001-332].yourdomain.com" is
invalid.

NodeName=node[001-332] CPUs=16 SocketsPerBoard=2 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=64100

So far, SLURM works fine, although the error message appears in the log
every 10 minutes. I would appreciate any suggestions on this issue.

Best,

Jianwen
