I think I found the problem and the solution. The Slurm 15.08 configuration tool [1] says:

    Define the hostname of the computer on which the Slurm controller and optional backup controller will execute. You can also specify addresses of these computers if desired (defaults to their hostnames). The IP addresses can be either numeric IP addresses or names. Hostname values should not be the fully qualified domain name (e.g. use tux rather than tux.abc.com).

    ControlMachine: Master Controller Hostname
    ControlAddr: Master Controller Address (optional)

So I normally fill out ControlMachine = hostname and ControlAddr = IP address. It turns out that when configured this way, slurmctld listens ONLY on that IP address and NOT on 127.0.0.1, so telnet 127.0.0.1 6817 fails. As you might imagine, slurmdbd's connection fails as well:

    [2016-03-30T19:31:46.971] debug: sending updates to MyClust at 127.0.0.1(6817) ver 7424
    [2016-03-30T19:31:46.971] debug2: Error connecting slurm stream socket at 127.0.0.1:6817: Connection refused

The strange thing is that I see no documented way for slurmdbd.conf to know the IP address of the Slurm controller. However, if you look at the updated documentation at [2]:

    ControlAddr
    Name that ControlMachine should be referred to in establishing a communications path. This name will be used as an argument to the gethostbyname() function for identification. For example, "elx0000" might be used to designate the Ethernet address for node "lx0000". By default the ControlAddr will be identical in value to ControlMachine.

Calling gethostbyname() on an IP address apparently isn't what Slurm expects, hence the breakage. It seems weird to call gethostbyname() on a variable called ControlAddr.

So the fix is really easy:

    ControlMachine=SlurmHead
    ControlAddr=SlurmHead

Or, I imagine, just leave ControlAddr blank.

BTW, this seems new. We have a Slurm 14.11.7 install that happily allows telnet localhost 6817 to work, even if ControlAddr is set to the IP address.

Gene, can you try my fix?
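
For anyone who wants to script the telnet check above instead of running it by hand, here is a minimal sketch in Python. It only tests whether a TCP connection can be opened; it knows nothing about the Slurm protocol. Port 6817 is the one from the log, the helper name port_open is made up for this example, and 192.0.2.10 is a placeholder that you would replace with your controller's real IP.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises an OSError subclass on refusal or timeout
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder (documentation address), not a real controller
    for host in ("127.0.0.1", "192.0.2.10"):
        state = "open" if port_open(host, 6817) else "closed or unreachable"
        print(f"{host}:6817 is {state}")
```

If the loopback line reports closed while the controller IP reports open, that would match what I am seeing with ControlAddr set to the IP address.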

Seems like ControlAddr should either:

    A) accept an IP address, or
    B) be deleted, since it's identical to ControlMachine.

Out of curiosity, how do you tell slurmdbd which IP to use to connect to slurmctld? I tried sneaking in a ControlMachine= or ControlAddr= line, which causes an error and failure. slurmdbd always seems to use 127.0.0.1.

[1] http://slurm.schedmd.com/configurator.easy.html
[2] http://slurm.schedmd.com/slurm.conf.html
