Hi,

        I previously used *short* hostnames (e.g. node306) on all my compute 
nodes and in my SLURM settings, and that worked well. However, error messages 
appear in /var/log/slurmctld.log now that I have set an FQDN as the hostname 
of the compute nodes.

                [2017-04-15T22:50:06.149] error: find_node_record: lookup failure for node306.yourdomain.com

        On node306:

                $ hostname node306.yourdomain.com
                $ hostname -s
                node306
                $ hostname -f
                node306.yourdomain.com

        In /etc/slurm/slurm.conf, short names are used, since an FQDN prevents 
the use of a hostlist expression. That is, "node[001-332].yourdomain.com" is 
invalid.

                NodeName=node[001-332] CPUs=16 SocketsPerBoard=2 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=64100
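
        To make clear what the hostlist stands for, here is a minimal shell 
sketch that prints the 332 names with the domain appended, i.e. the FQDNs the 
nodes now report. expand_nodes is a hypothetical helper written only for this 
illustration and is not part of SLURM; the "node" prefix, the range and 
yourdomain.com are the placeholder values used in this message.

```shell
# Hypothetical helper (not a SLURM tool): print node<NNN>.<domain> for each
# number in the range first..last, zero-padded to three digits.
expand_nodes() {
    first=$1
    last=$2
    domain=$3
    i=$first
    while [ "$i" -le "$last" ]; do
        printf 'node%03d.%s\n' "$i" "$domain"
        i=$((i + 1))
    done
}

# Line 306 of the expanded list is the name that appears in the log message.
expand_nodes 1 332 yourdomain.com | sed -n '306p'
```

        The names in the error message have this form, while slurm.conf only 
knows the short form.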
        
        So far, SLURM works fine despite the error message appearing in the 
log every 10 minutes. I would appreciate any suggestions on this issue.

Best,

Jianwen

                
