Your node definition doesn't match the node you assigned to the partition
'debug'.  You probably want NodeName=JGNODE1 instead of NodeName=JGHCSLURM.
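
As a rough sketch, assuming JGNODE1 is your only compute node and that
CPUs=1 really describes it (both taken from the lines you posted, not
verified on my side), the section would look something like this:

    # COMPUTE NODES
    # Assumption: the NodeName must match the name used in Nodes= below
    NodeName=JGNODE1 CPUs=1 State=UNKNOWN
    PartitionName=debug Nodes=JGNODE1 Default=yes MaxTime=INFINITE

Every name listed in a partition's Nodes= has to be defined by a NodeName
line, which is why slurmctld reports a lookup failure for JGNODE1.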

- Trey

=============================

Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: [email protected]
Jabber: [email protected]

On Tue, Apr 21, 2015 at 3:31 PM, Jorge Góis <[email protected]> wrote:

> Hi guys.
> I tested slurm-14.11.5 and am now using slurm-15.08.0-0pre3.
>
> But both versions give this error:
>
>> slurmctld: error: find_node_record: lookup failure for JGNODE1
>> slurmctld: error: build_part_bitmap: invalid node name JGNODE1
>> slurmctld: fatal: Invalid node names in partition CLUSTER
>>
>
> On node JGNODE1 I have slurmd -Dvvv running, and it shows this message:
>
>> slurmd: debug2: Error connecting slurm stream socket at 192.168.1.1:6817:
>> Connection refused
>> slurmd: debug:  Failed to contact primary controller: Connection refused
>>
> But that is expected, because slurmctld doesn't start.
>
> The slurm.conf on the controller has these lines:
> ...
> # COMPUTE NODES
> NodeName=JGHCSLURM CPUs=1 State=UNKNOWN
> PartitionName=debug Nodes=JGNODE1 Default=yes MaxTime=INFINITE
> #PartitionName=CLUSTER Default=yes State=UP nodes=JGNODE[1-1]
>
>
> HC can ping JGNODE1, and has its IP, FQDN and name in /etc/hosts.
>
> I wrote a simple script to test gethostname() and resolve the name
> JGNODE1.
>
> Can you help me find the problem?
>
> Best,
>   jG0|s
>
>
>
