Chris is right. If you ever hit this problem, the failure should be clearly logged in both the slurmctld and slurmdbd logs. The usual culprit is a firewall such as iptables, or different Slurm users set in the various .conf files, as mentioned before.
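
For what it's worth, here is a rough Python sketch of how one might check both of those causes from the slurmdbd host. It is only an illustration, not anything shipped with Slurm: the helper names slurm_user and can_connect are made up for this example, and port 6817 is assumed to be your SlurmctldPort (it is the stock default, but check your slurm.conf).

```python
import re
import socket

# Assumption: slurmctld listens on the stock default port. Check SlurmctldPort
# in your own slurm.conf before relying on this value.
SLURMCTLD_PORT = 6817


def slurm_user(conf_text):
    """Return the SlurmUser value found in the config text, or None if unset.

    Hypothetical helper: pass it the contents of slurm.conf or slurmdbd.conf.
    """
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments
        match = re.match(r"SlurmUser\s*=\s*(\S+)", line, re.IGNORECASE)
        if match:
            return match.group(1)
    return None


def can_connect(host, port=SLURMCTLD_PORT, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds in time.

    Hypothetical helper: a False result from the slurmdbd host suggests a
    firewall or routing problem between slurmdbd and slurmctld.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

To use it, read slurm.conf and slurmdbd.conf, compare the two slurm_user() results, and call can_connect() with the ControlHost value that sacctmgr reports for each cluster. A successful TCP connection only rules out the firewall; it does not prove the RPC itself is accepted, so the slurmdbd log is still the first place to look.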

On March 30, 2016 5:57:19 PM PDT, Gene Soudlenkov <[email protected]> wrote:
>
>We've been having the same problem for years - and we still need to do it.
>
>Gene
>
>On 31/03/16 13:46, Christopher Samuel wrote:
>> On 31/03/16 11:33, Terri Knight wrote:
>>
>>> Upon further testing, I only need restart the slurmctld daemon to get
>>> the new user added such that he can run a job.
>>
>> I think when you add a user with sacctmgr slurmdbd will try and do an
>> RPC to slurmctld on the registered clusters to inform them of this.
>>
>> If slurmdbd can't do so then you should see an error logged in the
>> slurmdbd logs and consequently slurmctld won't realise this new user
>> exists until it reloads its list of users from slurmdbd (say on a restart).
>>
>> Check your slurmdbd logs and also check that:
>>
>> sacctmgr list cluster format=cluster,controlhost
>>
>> reports an IP address that slurmdbd can talk to for each cluster.
>>
>> Best of luck,
>> Chris
>
>--
>New Zealand eScience Infrastructure
>Centre for eResearch
>The University of Auckland
>e: [email protected]
>p: +64 9 3737599 ext 89834 c: +64 21 840 825 f: +64 9 373 7453
>w: www.nesi.org.nz
