I've also experienced the same problem in 15.08.x. I run both slurmdbd and
slurmctld on the same head node, but I've explicitly configured Slurm to use
a non-localhost IP address as the ControlAddr.

- slurm.conf -> /etc/slurm/slurm.conf
ControlMachine=hpc
ControlAddr=192.168.5.3

- slurmdbd.conf -> /etc/slurm/slurmdbd.conf
DbdAddr=192.168.5.3
DbdHost=hpc

- running slurmdbd in debug mode -> slurmdbd -D -vvv
slurmdbd: slurmdbd version 15.08.7 started
slurmdbd: debug2: running rollup at Wed Apr 06 12:20:13 2016
slurmdbd: debug2: slurm_connect failed: Connection refused
slurmdbd: debug2: Error connecting slurm stream socket at 127.0.0.1:6817: Connection refused

- slurmctld is bound to the non-localhost IP address
# lsof -n -i -P | grep 6817
slurmctld 11392       slurm    4u  IPv4 298513417      0t0  TCP 192.168.5.3:6817 (LISTEN)
# echo | nc -v 127.0.0.1 6817
nc: connect to 127.0.0.1 port 6817 (tcp) failed: Connection refused
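
For anyone who wants to repeat the check above without lsof/nc, here is a rough Python sketch. It is only an illustration, not anything Slurm ships: parse_conf and can_connect are hypothetical helper names, the parser handles only simple one-setting-per-line "Key=Value" entries (not multi-key lines such as NodeName=...), and it assumes the default slurmctld port of 6817 when SlurmctldPort is not set.

```python
import socket


def parse_conf(text):
    """Return {lowercased key: value} for simple 'Key=Value' lines.

    Simplification: everything after the first '=' is taken as the value,
    so lines carrying several Key=Value pairs are not split further.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip()
    return settings


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Sample input copied from the slurm.conf excerpt in this message.
SAMPLE = """\
ControlMachine=hpc
ControlAddr=192.168.5.3
"""

conf = parse_conf(SAMPLE)
# Fall back to ControlMachine if ControlAddr is not set.
control_addr = conf.get("controladdr", conf.get("controlmachine"))
port = int(conf.get("slurmctldport", 6817))
print(control_addr, port)

# On the head node, compare the two results:
#   can_connect("127.0.0.1", port)
#   can_connect(control_addr, port)
```

If the first call returns False and the second returns True, that matches what I see here: slurmctld listens only on the ControlAddr, while slurmdbd tries 127.0.0.1.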

I didn't have this issue back in 14.11.x.

On Wed, Apr 6, 2016 at 9:59 AM, Christopher Samuel <[email protected]>
wrote:

>
> On 31/03/16 16:04, Bill Broadley wrote:
>
> > So any sacctmgr change would trigger slurmdbd to try to talk to
> > slurmctld over 127.0.0.1 and fail.  But restarting slurmctld would work.
>
> Yeah, we would never have noticed as we run a central slurmdbd on a
> different machine so they've always connected to their external IP
> addresses.
>
> I suspect it might be related to this commit that went into 15.08 (as
> that was the first major change to the logic since 2009, *if* I'm
> reading the code right - not a given!):
>
> commit ebfbada369d4a0341c65a50d237441541f98cef1
> Author: Brian Christiansen <[email protected]>
> Date:   Fri Sep 11 09:28:48 2015 -0700
>
>      Allow ControlMachine, BackupController, DbdHost and DbdBackupHost to be either short or long hostname.
>
>     Bug 1921
>
>
> Best of luck!
> Chris
> --
>  Christopher Samuel        Senior Systems Administrator
>  VLSCI - Victorian Life Sciences Computation Initiative
>  Email: [email protected] Phone: +61 (0)3 903 55545
>  http://www.vlsci.org.au/      http://twitter.com/vlsci
>



-- 
*James Oguya*
