What is configured for ControlMachine= and ControlAddr= ?
Anything other than localhost or 127.0.0.1 is probably wrong.
A workaround might be to open the ports in your iptables.
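Before touching iptables it may be worth checking whether anything
accepts connections on the slurmctld port at all. Here is a rough
Python sketch of such a check. The function name port_is_open is made
up for illustration; the address and port are the ones from your log.

```python
import socket

def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection resolves the host and attempts a TCP connect
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers "connection refused", timeouts and unreachable hosts
        return False
```

Running port_is_open("127.0.0.1", 6817) and
port_is_open("192.168.44.25", 6817) on the node shows whether the two
addresses behave differently. It only reports whether the connect
succeeds, not why it failed.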
Personally I think it's easier to change your /etc/hosts file so
your hostname maps to 127.0.0.1. But that assumes you're not part
of NIS or the like, and/or that /etc/nsswitch.conf lists files as
the first source on its hosts: line.
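As a rough sketch, the relevant lines in the three files could look
something like this. "mynode" is only a placeholder for your real
hostname, and I have not checked how this interacts with the NodeName
lines in your attached config.

```
# /etc/hosts ("mynode" is a placeholder, not your actual hostname)
127.0.0.1   localhost mynode

# /etc/nsswitch.conf (files first, so /etc/hosts is consulted before NIS or DNS)
hosts:      files dns

# slurm.conf
ControlMachine=localhost
ControlAddr=127.0.0.1
```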
On 28/02/2012 21:24, DIAM code distribution DIAM/CDRH/FDA wrote:
Network error "_slurm_connect failed: Connection refused"
I am trying to install and start SLURM on a single node with 12
CPUs (just for testing purposes), so the controller and the compute
node are the same machine. The slurmd and slurmctld daemons start
okay without any errors.
But "sinfo" gives error "slurm_load_partitions: Unable to
contact slurm controller (connect failure)".
The slurmd log file gave these errors:
_slurm_connect failed: Connection refused
debug2: Error connecting slurm stream socket at 192.168.44.25:6817: Connection refused
Failed to contact primary controller: Connection refused
The slurm.conf file contains the following network configuration
(the complete config file is attached):
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/tmp/slurmd
SlurmUser=slurmuser
SlurmdUser=root
I do not understand why I am getting network errors when I have
just one node. I would not expect communication errors when
nothing has to leave the machine.
Please help!