On 03/30/2016 08:15 PM, Gene Soudlenkov wrote:
>
> Hmmmm I don't think this is the case - throughout the code they use
> gethostname (not byname) to get the name of the particular host.
I didn't track down the source, but the documentation claims gethostbyname.
To quote the slurm.conf page:
ControlAddr:
This name will be used as an argument to the gethostbyname()
function for identification.
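For anyone wanting to see what the two calls return on their own
controller, here is a minimal sketch using Python's wrappers around the
same libc functions. It only shows what the resolver hands back; it says
nothing about which call Slurm actually makes internally. The helper
name resolve() is mine, not anything from Slurm.

```python
import socket


def resolve(name):
    """Return the IPv4 address gethostbyname() gives for name, or None."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError
        return None


if __name__ == "__main__":
    local = socket.gethostname()  # just the local host's name, no lookup
    print("gethostname():", local)
    # gethostbyname() goes through the resolver (/etc/hosts, DNS, ...)
    print("gethostbyname(%r): %s" % (local, resolve(local)))
    print("gethostbyname('localhost'):", resolve("localhost"))
```

The address printed for the local hostname is worth comparing against
the one you expect slurmctld to be reachable on.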
In any case, on 15.08 with:
ControlMachine=<hostname>
ControlMachine=<ip address>
This would fail:
telnet 127.0.0.1 6817
But this would work:
telnet 10.15.0.1 6817
So any sacctmgr change would trigger slurmdbd to try to talk to
slurmctld over 127.0.0.1 and fail. But restarting slurmctld would work.
After the change, both telnets work, slurmdbd can connect over 127.0.0.1,
and adding or deleting accounts with sacctmgr takes effect immediately
without restarting anything.
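The telnet checks can also be scripted so they are easy to rerun before
and after a config change. This is a rough sketch, assuming slurmctld
listens on port 6817 and that 10.15.0.1 is the controller's address as
in my example; can_connect() is a hypothetical helper, so substitute
your own addresses.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, unreachable network, etc.
        return False


if __name__ == "__main__":
    # Addresses and port taken from the telnet tests above.
    for host in ("127.0.0.1", "10.15.0.1"):
        state = "open" if can_connect(host, 6817) else "unreachable"
        print(host, 6817, state)
```

A successful connect only shows that something is listening on that
address, the same as telnet does.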
This seems to be specific to 15.08; it didn't happen in 14.11.
It seems like the fix would be forcing slurmdbd to use the IP address
instead of 127.0.0.1 for localhost, but I couldn't figure out how to do
that. The slurm.conf page didn't help.