I've also experienced the same problem with 15.08.x. I run both slurmdbd and slurmctld on the same head node, but I've explicitly configured Slurm to use a non-localhost IP address as the ControlAddr.
- slurm.conf -> /etc/slurm/slurm.conf
    ControlMachine=hpc
    ControlAddr=192.168.5.3

- slurmdbd.conf -> /etc/slurm/slurmdbd.conf
    DbdAddr=192.168.5.3
    DbdHost=hpc

- running slurmdbd in debug mode -> slurmdbd -D -vvv
    slurmdbd: slurmdbd version 15.08.7 started
    slurmdbd: debug2: running rollup at Wed Apr 06 12:20:13 2016
    slurmdbd: debug2: slurm_connect failed: Connection refused
    slurmdbd: debug2: Error connecting slurm stream socket at 127.0.0.1:6817: Connection refused

- slurmctld is bound to the non-localhost IP address
    # lsof -n -i -P | grep 6817
    slurmctld 11392 slurm 4u IPv4 298513417 0t0 TCP 192.168.5.3:6817 (LISTEN)

    # echo | nc -v 127.0.0.1 6817
    nc: connect to 127.0.0.1 port 6817 (tcp) failed: Connection refused

I didn't have this issue back in 14.11.x.

On Wed, Apr 6, 2016 at 9:59 AM, Christopher Samuel <[email protected]> wrote:
>
> On 31/03/16 16:04, Bill Broadley wrote:
>
> > So any sacctmgr change would trigger slurmdbd to try to talk to
> > slurmctld over 127.0.0.1 and fail. But restarting slurmctld would work.
>
> Yeah, we would never have noticed as we run a central slurmdbd on a
> different machine so they've always connected to their external IP
> addresses.
>
> I suspect it might be related to this commit that went into 15.08 (as
> that was the first major change to the logic since 2009, *if* I'm
> reading the code right - not a given!):
>
> commit ebfbada369d4a0341c65a50d237441541f98cef1
> Author: Brian Christiansen <[email protected]>
> Date: Fri Sep 11 09:28:48 2015 -0700
>
>     Allow ControlMachine, BackupController, DbdHost and DbdBackupHost to
>     be either short or long hostname.
>
>     Bug 1921
>
>
> Best of luck!
> Chris
> --
> Christopher Samuel Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: [email protected] Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/ http://twitter.com/vlsci
>

--
*James Oguya*
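For anyone who wants to repeat the lsof/nc check above on their own head node, here is a minimal Python sketch (not part of Slurm; the function names are hypothetical). It reads ControlAddr and SlurmctldPort from slurm.conf-style text and tries a plain TCP connect to both that address and 127.0.0.1. It assumes SlurmctldPort falls back to Slurm's default of 6817 when unset, and its parser is deliberately simplified (no quoting, no Include directives).

```python
import socket


def parse_slurm_conf(text):
    """Collect Key=Value settings from slurm.conf-style text.

    Simplified sketch: strips '#' comments, lowercases keys (slurm.conf
    parameter names are case-insensitive) and treats every whitespace-
    separated token as one Key=Value pair. Quoting and Include lines
    are not handled.
    """
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        for token in line.split():
            if "=" in token:
                key, value = token.split("=", 1)
                settings[key.lower()] = value
    return settings


def can_connect(host, port, timeout=2.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts and unreachable hosts.
        return False


def probe_slurmctld(conf_text):
    """Probe the configured ControlAddr and loopback on the slurmctld port."""
    settings = parse_slurm_conf(conf_text)
    # SlurmctldPort may be a range such as 6817-6818; probe the first port.
    port = int(settings.get("slurmctldport", "6817").split("-")[0])
    results = {}
    for host in (settings.get("controladdr"), "127.0.0.1"):
        if host and host not in results:
            results[host] = can_connect(host, port)
    return results
```

To use it, pass the contents of /etc/slurm/slurm.conf to probe_slurmctld(). With the setup described above one would expect True for 192.168.5.3 and False for 127.0.0.1, matching the lsof and nc output. Note that a successful TCP connect only shows that something is listening on that address; it does not speak Slurm's RPC protocol.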
