Hi guys.
I was testing slurm-14.11.5 and am now using slurm-15.08.0-0pre3.

Both versions give this error when slurmctld starts:

> slurmctld: error: find_node_record: lookup failure for JGNODE1
> slurmctld: error: build_part_bitmap: invalid node name JGNODE1
> slurmctld: fatal: Invalid node names in partition CLUSTER
>

On node JGNODE1 I have slurmd -Dvvv running, and it shows this message:

> slurmd: debug2: Error connecting slurm stream socket at 192.168.1.1:6817:
> Connection refused
> slurmd: debug:  Failed to contact primary controller: Connection refused
>
But that is expected, because slurmctld does not start.

The slurm.conf on the controller has these lines:
...
# COMPUTE NODES
NodeName=JGHCSLURM CPUs=1 State=UNKNOWN
PartitionName=debug Nodes=JGNODE1 Default=yes MaxTime=INFINITE
#PartitionName=CLUSTER Default=yes State=UP nodes=JGNODE[1-1]


HC can ping JGNODE1, and has its IP, FQDN and name in /etc/hosts.

I wrote a simple script to test gethostname() and to resolve the name
JGNODE1.
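
A minimal sketch of that kind of check, assuming Python (this is not the exact script used, and the helper names local_name, resolve and report are made up for illustration):

```python
import socket


def local_name():
    """Return the hostname this machine reports via gethostname()."""
    return socket.gethostname()


def resolve(name):
    """Return the IPv4 address for name, or None if the lookup fails."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError
        return None


def report(names):
    """Map each name to its resolved address (None when unresolved)."""
    return {name: resolve(name) for name in names}
```

Calling report(["JGNODE1"]) on the controller and printing local_name() on each machine shows what the resolver (including /etc/hosts) returns for each name.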

Can anyone help me find the problem?

Best,
  jG0|s
