Thank you Trey, it works. I put this config on the controller and it works:
> # COMPUTE NODES
> NodeName=JGNODE[1-1] CPUs=1 State=UNKNOWN
> #PartitionName=debug Nodes=JGNODE1 Default=yes MaxTime=INFINITE
> PartitionName=CLUSTER Default=yes State=UP nodes=JGNODE[1-1]

But now I have a problem with "munge.socket.2" on the controller and the nodes:

> Munge encode failed: Failed to access "/var/run/munge/munge.socket.2"

P.S. I'm using CentOS 6.6 x86.

Best,
Góis

2015-04-21 21:43 GMT+01:00 Trey Dockendorf <[email protected]>:

> Your node definition doesn't match what you assigned the partition
> 'debug'. You probably want NodeName=JGNODE1 instead of NodeName=JGHCSLURM.
>
> - Trey
>
> =============================
>
> Trey Dockendorf
> Systems Analyst I
> Texas A&M University
> Academy for Advanced Telecommunications and Learning Technologies
> Phone: (979)458-2396
> Email: [email protected]
> Jabber: [email protected]
>
> On Tue, Apr 21, 2015 at 3:31 PM, Jorge Góis <[email protected]> wrote:
>
>> Hi guys.
>> I was testing slurm-14.11.5 and am now using slurm-15.08.0-0pre3.
>>
>> But in both versions I get this error:
>>
>>> slurmctld: error: find_node_record: lookup failure for JGNODE1
>>> slurmctld: error: build_part_bitmap: invalid node name JGNODE1
>>> slurmctld: fatal: Invalid node names in partition CLUSTER
>>
>> On node JGNODE1 I have slurmd -Dvvv running and get this message:
>>
>>> slurmd: debug2: Error connecting slurm stream socket at 192.168.1.1:6817: Connection refused
>>> slurmd: debug: Failed to contact primary controller: Connection refused
>>
>> But that is normal, because slurmctld doesn't start.
>>
>> In slurm.conf on the controller I have these lines:
>> ...
>> # COMPUTE NODES
>> NodeName=JGHCSLURM CPUs=1 State=UNKNOWN
>> PartitionName=debug Nodes=JGNODE1 Default=yes MaxTime=INFINITE
>> #PartitionName=CLUSTER Default=yes State=UP nodes=JGNODE[1-1]
>>
>> HC can ping JGNODE1, and has the IP, FQDN and NAME in /etc/hosts.
>>
>> I made a simple script to test gethostname() and resolve the name
>> JGNODE1.
>>
>> Can you help me find the problem?
>>
>> Best,
>> jG0|s
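
For reference, the mismatch Trey pointed out (a partition listing a node that has no NodeName entry) can be checked before starting slurmctld. Below is a minimal sketch in Python, not part of Slurm: the function names are hypothetical, it only understands simple expressions such as JGNODE1 or JGNODE[1-3], and it ignores special values and any included files.

```python
import re


def expand_hostlist(expr):
    """Expand a simple host expression such as 'JGNODE[1-3]' or 'a,b'.

    Assumption: at most one bracket group per name, plain numeric ranges only.
    """
    names = []
    # Split on commas that are not inside a [...] group.
    for part in re.split(r",(?![^\[]*\])", expr):
        if not part:
            continue
        m = re.fullmatch(r"([^\[\]]*)\[([^\]]+)\]([^\[\]]*)", part)
        if not m:
            names.append(part)
            continue
        prefix, body, suffix = m.groups()
        for item in body.split(","):
            lo, _, hi = item.partition("-")
            hi = hi or lo
            width = len(lo)  # keep zero padding, e.g. [01-10]
            for n in range(int(lo), int(hi) + 1):
                names.append(f"{prefix}{str(n).zfill(width)}{suffix}")
    return names


def undefined_partition_nodes(conf_text):
    """Return {partition: [nodes without a NodeName entry]} for slurm.conf text."""
    defined = set()
    partitions = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        # Keyword names are matched case-insensitively (the thread uses both
        # 'Nodes=' and 'nodes='); node names are compared as written.
        fields = {}
        for tok in line.split():
            if "=" in tok:
                key, value = tok.split("=", 1)
                fields[key.lower()] = value
        if "nodename" in fields:
            defined.update(expand_hostlist(fields["nodename"]))
        elif "partitionname" in fields:
            partitions[fields["partitionname"]] = expand_hostlist(fields.get("nodes", ""))
    result = {}
    for part, nodes in partitions.items():
        missing = [n for n in nodes if n not in defined]
        if missing:
            result[part] = missing
    return result


# The two configurations quoted in this thread.
BROKEN = """
# COMPUTE NODES
NodeName=JGHCSLURM CPUs=1 State=UNKNOWN
PartitionName=debug Nodes=JGNODE1 Default=yes MaxTime=INFINITE
#PartitionName=CLUSTER Default=yes State=UP nodes=JGNODE[1-1]
"""

FIXED = """
# COMPUTE NODES
NodeName=JGNODE[1-1] CPUs=1 State=UNKNOWN
#PartitionName=debug Nodes=JGNODE1 Default=yes MaxTime=INFINITE
PartitionName=CLUSTER Default=yes State=UP nodes=JGNODE[1-1]
"""

print(undefined_partition_nodes(BROKEN))  # {'debug': ['JGNODE1']}
print(undefined_partition_nodes(FIXED))   # {}
```

The first result shows the original problem: partition debug names JGNODE1, but the only defined node is JGHCSLURM. The second config defines JGNODE[1-1], so nothing is reported.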
