Dear Group,

I am running a cluster with Slurm 14.11.6, and the following errors appear in the log files every 5 minutes:
error: find_node_record: lookup failure for X
error: find_node_record: lookup failure for Y

where X is the machine that hosts the Slurm controller and Y is a login node (that does not run any Slurm processes). This is an unexpected message, as both X and Y exist in /etc/hosts, are pingable, etc. If I place X and Y in the slurm.conf file (as compute nodes in DOWN state), the error goes away.

Any ideas?

-- 
Simon Michnowicz
Monash e-Research Centre
PH: (03) 9902 0794
Mob: 0418 302 046
www.monash.edu.au/eresearch
