Your node definition doesn't match the node you assigned to the partition 'debug'. You probably want NodeName=JGNODE1 instead of NodeName=JGHCSLURM.
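For example, a corrected COMPUTE NODES section might look like the fragment below. This is a sketch that assumes JGNODE1 is your only compute node. It keeps the CPUs, State and partition parameters from your config unchanged, so adjust CPUs to match the real hardware.

```
# COMPUTE NODES
# Define the node that the partition actually references.
NodeName=JGNODE1 CPUs=1 State=UNKNOWN
PartitionName=debug Nodes=JGNODE1 Default=yes MaxTime=INFINITE
#PartitionName=CLUSTER Default=yes State=UP nodes=JGNODE[1-1]
```

Whichever partition you enable (debug or CLUSTER), every name in its Nodes= list has to be defined on a NodeName line. Otherwise slurmctld stops with the "Invalid node names in partition" fatal error you're seeing.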
- Trey

=============================

Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: [email protected]
Jabber: [email protected]

On Tue, Apr 21, 2015 at 3:31 PM, Jorge Góis <[email protected]> wrote:
> Hi guys.
> I'm testing slurm-14.11.5 and am now using slurm-15.08.0-0pre3.
>
> But both versions give this error:
>
>> slurmctld: error: find_node_record: lookup failure for JGNODE1
>> slurmctld: error: build_part_bitmap: invalid node name JGNODE1
>> slurmctld: fatal: Invalid node names in partition CLUSTER
>>
>
> On node JGNODE1 I have slurmd -Dvvv running, and it shows this message:
>
>> slurmd: debug2: Error connecting slurm stream socket at 192.168.1.1:6817:
>> Connection refused
>> slurmd: debug: Failed to contact primary controller: Connection refused
>>
> But that is expected, because slurmctld doesn't start.
>
> In slurm.conf on the controller I have these lines:
> ...
> # COMPUTE NODES
> NodeName=JGHCSLURM CPUs=1 State=UNKNOWN
> PartitionName=debug Nodes=JGNODE1 Default=yes MaxTime=INFINITE
> #PartitionName=CLUSTER Default=yes State=UP nodes=JGNODE[1-1]
>
>
> HC can ping JGNODE1, and /etc/hosts has its IP, FQDN and name.
>
> I made a simple script to test gethostname() and resolve the name
> JGNODE1.
>
> Can you help me find the problem?
>
> Best,
> jG0|s
