Sorry, noob problem: I forgot to start munge.

Best,
Gois
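For anyone who hits the same "Failed to access /var/run/munge/munge.socket.2" error, here is a minimal sketch of a check for the socket. The function name check_munge_socket is hypothetical (it is not part of munge or Slurm), and the default path is just the one from the error message quoted below; adjust it if your munge build uses another location.

```shell
#!/bin/sh
# Hypothetical helper: report whether the munge socket exists.
# The default path is taken from the error message in this thread.
check_munge_socket() {
    sock="${1:-/var/run/munge/munge.socket.2}"
    if [ -S "$sock" ]; then
        # -S is true only for an existing UNIX domain socket
        echo "munge socket present: $sock"
        return 0
    fi
    echo "munge socket missing: $sock (is munged running?)"
    return 1
}
```

Run it on the controller and on every node. If the socket is missing, the munge daemon is most likely not running; on CentOS 6, assuming the packaged init script is installed, "service munge start" should start it and "chkconfig munge on" should enable it at boot. Once it is up, "munge -n | unmunge" is a quick way to confirm that credentials can be encoded and decoded locally.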
2015-04-21 21:57 GMT+01:00 Jorge Góis <[email protected]>:

> Thank you Trey, it works.
>
> I put this config on the controller and it works:
>
>> # COMPUTE NODES
>> NodeName=JGNODE[1-1] CPUs=1 State=UNKNOWN
>> #PartitionName=debug Nodes=JGNODE1 Default=yes MaxTime=INFINITE
>> PartitionName=CLUSTER Default=yes State=UP nodes=JGNODE[1-1]
>>
>
> But now I have a problem with "munge.socket.2" on the controller and the
> nodes.
>
>> Munge encode failed: Failed to access "/var/run/munge/munge.socket.2"
>>
>
> P.S. Using CentOS 6.6 x86.
> Best,
> Góis
>
>
>
> 2015-04-21 21:43 GMT+01:00 Trey Dockendorf <[email protected]>:
>
>> Your node definition doesn't match what you assigned to the partition
>> 'debug'. You probably want NodeName=JGNODE1 instead of NodeName=JGHCSLURM.
>>
>> - Trey
>>
>> =============================
>>
>> Trey Dockendorf
>> Systems Analyst I
>> Texas A&M University
>> Academy for Advanced Telecommunications and Learning Technologies
>> Phone: (979)458-2396
>> Email: [email protected]
>> Jabber: [email protected]
>>
>> On Tue, Apr 21, 2015 at 3:31 PM, Jorge Góis <[email protected]> wrote:
>>
>>> Hi guys.
>>> I was testing slurm-14.11.5 and am now using slurm-15.08.0-0pre3.
>>>
>>> But in both versions I get this error:
>>>
>>>> slurmctld: error: find_node_record: lookup failure for JGNODE1
>>>> slurmctld: error: build_part_bitmap: invalid node name JGNODE1
>>>> slurmctld: fatal: Invalid node names in partition CLUSTER
>>>>
>>>
>>> On node JGNODE1 I have slurmd -Dvvv running and get this message:
>>>
>>>> slurmd: debug2: Error connecting slurm stream socket at
>>>> 192.168.1.1:6817: Connection refused
>>>> slurmd: debug: Failed to contact primary controller: Connection refused
>>>>
>>> But that is normal because slurmctld doesn't start.
>>>
>>> In slurm.conf on the controller I have these lines:
>>> ...
>>> # COMPUTE NODES
>>> NodeName=JGHCSLURM CPUs=1 State=UNKNOWN
>>> PartitionName=debug Nodes=JGNODE1 Default=yes MaxTime=INFINITE
>>> #PartitionName=CLUSTER Default=yes State=UP nodes=JGNODE[1-1]
>>>
>>>
>>> HC can ping JGNODE1, and /etc/hosts has its IP, FQDN and name.
>>>
>>> I made a simple script to test gethostname() and resolve the name
>>> JGNODE1.
>>>
>>> Can you help me find the problem?
>>>
>>> Best,
>>> jG0|s
>>>
>>>
>>>
>>
>
