Your slurmctld daemon exited right away. Check your slurmctld log.
With no controller listening on port 6817, both sinfo and slurmd get
"Connection refused". Your partition is configured (twice) with nodes
"linux[0-11]" that do not exist.
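
As a rough sketch only: assuming the machine's short hostname is "myhost"
(a placeholder, since I do not know the real name) and a single partition
called "debug" (also an assumption), the node and partition lines in
slurm.conf would look something like this:

```
# "myhost" is hypothetical: replace it with the output of `hostname -s`.
# CPUs=12 comes from your description of the machine.
NodeName=myhost CPUs=12 State=UNKNOWN
# Define the partition once, and only with nodes that are defined above.
PartitionName=debug Nodes=myhost Default=YES MaxTime=INFINITE State=UP
```

After changing slurm.conf, restart slurmctld and slurmd, then run
"scontrol ping" to confirm that the controller answers.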

Quoting DIAM code distribution DIAM/CDRH/FDA <[email protected]>:

> I am trying to install and start SLURM on a single node with 12 cpus (just
> for testing purposes). So the controller and compute nodes are the same.
> The slurmd and slurmctld daemons start okay without any errors.
> But "sinfo" gives error "slurm_load_partitions: Unable to contact slurm
> controller (connect failure)".
>
> The slurmd log file gave error "_slurm_connect failed: Connection refused
> debug2: Error connecting slurm stream socket at 192.168.44.25:6817:
> Connection refused
> Failed to contact primary controller: Connection refused"
>
> The slurm.conf file contains the following network configuration:-
> (attached is the complete config file)
> SlurmctldPidFile=/var/run/slurmctld.pid
> SlurmctldPort=6817
> SlurmdPidFile=/var/run/slurmd.pid
> SlurmdPort=6818
> SlurmdSpoolDir=/tmp/slurmd
> SlurmUser=slurmuser
> SlurmdUser=root
>
> I do not understand why am I getting network errors when I just have one
> node. I should not be getting the communication errors as there is no
> communication.
>
> Please help!
>
