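Your slurmctld daemon exited right after starting, so nothing is listening on port 6817; that is why slurmd and sinfo get "Connection refused". Check your slurmctld log. Your partition is configured (twice) with nodes "linux[0-11]", and those nodes do not exist.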
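A minimal sketch of a single-host layout, assuming the controller and the only compute node are the same machine with 12 CPUs. The hostname "testnode" is hypothetical and must be replaced with the machine's real short hostname (what `hostname -s` prints). The point is that NodeName and the partition's Nodes list name a host that actually exists, and that the partition is defined only once:

```
# Hypothetical single-host configuration: "testnode" is a placeholder
# for the real short hostname of the machine.
# Older releases use ControlMachine; newer ones use SlurmctldHost.
ControlMachine=testnode
SlurmctldPort=6817
SlurmdPort=6818
# One node entry describing the local machine and its 12 CPUs.
NodeName=testnode CPUs=12 State=UNKNOWN
# Define the partition exactly once, listing only nodes that exist.
PartitionName=debug Nodes=testnode Default=YES MaxTime=INFINITE State=UP
```

After correcting the file, restart slurmctld and confirm it stays running before retrying sinfo.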
Quoting DIAM code distribution DIAM/CDRH/FDA:
> I am trying to install and start SLURM on a single node with 12 cpus (just
> for testing purposes). So the controller and compute nodes are the same.
> The slurmd and slurmctld daemons start okay without any errors.
> But "sinfo" gives error "slurm_load_partitions: Unable to contact slurm
> controller (connect failure)".
>
> The slurmd log file gave error "_slurm_connect failed: Connection refused
> debug2: Error connecting slurm stream socket at 192.168.44.25:6817:
> Connection refused
> Failed to contact primary controller: Connection refused"
>
> The slurm.conf file contains the following network configuration:-
> (attached is the complete config file)
> SlurmctldPidFile=/var/run/slurmctld.pid
> SlurmctldPort=6817
> SlurmdPidFile=/var/run/slurmd.pid
> SlurmdPort=6818
> SlurmdSpoolDir=/tmp/slurmd
> SlurmUser=slurmuser
> SlurmdUser=root
>
> I do not understand why am I getting network errors when I just have one
> node. I should not be getting the communication errors as there is no
> communication.
>
> Please help!
>
