Hi guys. I was testing slurm-14.11.5 and am now using slurm-15.08.0-0pre3.
Both versions give this error:

  slurmctld: error: find_node_record: lookup failure for JGNODE1
  slurmctld: error: build_part_bitmap: invalid node name JGNODE1
  slurmctld: fatal: Invalid node names in partition CLUSTER

On node JGNODE1, slurmd -Dvvv is running and prints this message:

  slurmd: debug2: Error connecting slurm stream socket at 192.168.1.1:6817: Connection refused
  slurmd: debug: Failed to contact primary controller: Connection refused

That is expected, because slurmctld does not start.

The slurm.conf on the controller has these lines:

  ...
  # COMPUTE NODES
  NodeName=JGHCSLURM CPUs=1 State=UNKNOWN
  PartitionName=debug Nodes=JGNODE1 Default=yes MaxTime=INFINITE
  #PartitionName=CLUSTER Default=yes State=UP nodes=JGNODE[1-1]

The HC can ping JGNODE1, and its /etc/hosts has the IP, FQDN and name. I also wrote a simple script to test gethostname(), and it resolves the name JGNODE1.

Can anyone help me find the problem?

Best,
jG0|s
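
P.S. A name-resolution check of the kind mentioned above can be sketched in Python roughly as follows. This is only an illustrative sketch, not the exact script that was run: the resolve() helper is a made-up name, and it only shows what the system resolver returns for the local hostname and for JGNODE1 on the machine where it runs.

```python
import socket


def resolve(name):
    """Return the IPv4 address the system resolver gives for name, or None on failure."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        # socket.gaierror (a subclass of OSError) is raised when the lookup fails
        return None


if __name__ == "__main__":
    local = socket.gethostname()
    print("local hostname:", local)
    # JGNODE1 is the node name used in the partition line of slurm.conf
    for name in (local, "JGNODE1"):
        print(name, "->", resolve(name))
```

A result of None for JGNODE1 would point at /etc/hosts or DNS on that machine; an address only shows that the resolver knows the name.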
