Also check that munge is functioning properly: running "munge -n | unmunge" on the slurmdbd host should decode the credential and report success.
On Wed, Apr 26, 2017 at 10:00 AM, Jeff Tan <jeffe...@au1.ibm.com> wrote:
> Hi Mahmood
>
> > [root@cluster ~]# ps aux | grep slurmdb
> > root 3406 0.0 0.0 338636 2672 ? Sl 00:26 0:01 /usr/sbin/slurmdbd
> > root 17146 0.0 0.0 105308 888 pts/2 S+ 13:26 0:00 grep slurmdb
>
> That's good. What does its /var/log/slurm/slurmdbd.log say? Any errors?
>
> > >Is slurm.conf pointing to the right slurmdbd host (AccountingStorageHost)?
> > There is no such parameter
>
> Odd. What version of Slurm have you got installed? I've got one in mine
> (16.05.9) and I've set it when I was using older versions. But let's assume
> that it's defaulting to localhost: can you check that you can telnet
> localhost 6819 and get a TCP connection? BTW, was the failing sacctmgr
> running from the same machine where slurmdbd is running? If not, can you
> check that you can telnet <slurmdbd host> 6819 and get a TCP connection?
>
> Something else to check is if you've got a firewall getting in the way, or
> if sacctmgr is going to the right IP address. Some of the things I'd try:
>
> - tcpdump for packet tracing
> - strace to see what connections are being attempted by sacctmgr
>
> Something else I've seen happen before is slurmdbd being unresponsive due
> to a memory leak that it used to have on some older version. Keep digging
> and you should find clues as to why this is not working for you.
>
> Regards
> Jeff
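
If telnet isn't installed on the node, the TCP check Jeff describes above can be approximated with a few lines of Python. This is only a rough sketch: the function name can_connect and the 3-second timeout are my own choices, and 6819 is the port mentioned in the quoted message. It only tells you whether something accepts a TCP connection on that port, not whether slurmdbd is actually healthy.

```python
import socket


def can_connect(host: str, port: int = 6819, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: roughly what "telnet <host> 6819" tells you.
    """
    try:
        # create_connection resolves the name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Add the slurmdbd host here if sacctmgr runs on a different machine.
    for host in ("localhost",):
        status = "reachable" if can_connect(host) else "NOT reachable"
        print(f"{host}:6819 {status}")
```

A "NOT reachable" result from the machine running sacctmgr, but "reachable" on the slurmdbd host itself, would point towards a firewall or the wrong address rather than slurmdbd being down.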