Hi Mahmood

> [root@cluster ~]# ps aux | grep slurmdb
> root 3406 0.0 0.0 338636 2672 ? Sl 00:26 0:01 /usr/sbin/slurmdbd
> root 17146 0.0 0.0 105308 888 pts/2 S+ 13:26 0:00 grep slurmdb
That's good. What does its /var/log/slurm/slurmdbd.log say? Any errors?

> > Is slurm.conf pointing to the right slurmdbd host (AccountingStorageHost)?
> There is no such parameter

Odd. What version of Slurm have you got installed? I've got it in mine (16.05.9), and I also set it back when I was using older versions.

But let's assume that it's defaulting to localhost: can you check that telnet localhost 6819 gets you a TCP connection?

BTW, was the failing sacctmgr run on the same machine as slurmdbd? If not, can you check that telnet <slurmdbd host> 6819 gets you a TCP connection from the machine where sacctmgr runs?

Also check whether a firewall is getting in the way, and whether sacctmgr is connecting to the right IP address.

Some of the things I'd try:
- tcpdump, for packet tracing
- strace, to see which connections sacctmgr attempts

I've also seen slurmdbd become unresponsive because of a memory leak it used to have in some older versions.

Keep digging and you should find clues as to why this is not working for you.

Regards
Jeff
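If telnet isn't installed, the TCP check suggested above can be approximated in bash. This is a minimal sketch, assuming bash (for its /dev/tcp pseudo-device) and the coreutils timeout command are available; port_open is a hypothetical helper name, and 6819 is the slurmdbd port used in the checks above.

```shell
#!/usr/bin/env bash
# Sketch: test whether a TCP connection to slurmdbd can be opened,
# as a stand-in for "telnet <host> 6819". Requires bash (/dev/tcp)
# and coreutils "timeout". port_open is a hypothetical helper.

# port_open HOST PORT: exit status 0 if HOST:PORT accepts a TCP connection.
port_open() {
    local host="$1" port="$2"
    # A child bash opens fd 3 to HOST:PORT; the redirection fails (non-zero
    # status) if the connection is refused, and timeout kills it after 3s.
    timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Host to test: first argument, else localhost. Run this on the machine
# where the failing sacctmgr runs.
dbd_host="${1:-localhost}"
dbd_port=6819

if port_open "$dbd_host" "$dbd_port"; then
    echo "TCP connection to ${dbd_host}:${dbd_port} OK"
else
    echo "cannot connect to ${dbd_host}:${dbd_port} (slurmdbd down, firewall, or wrong host?)"
fi
```

If the port is reachable but sacctmgr still fails, running it under strace -f -e trace=network (for example strace -f -e trace=network sacctmgr show cluster) shows which address and port it actually tries.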
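The AccountingStorageHost question above can also be answered by pulling the accounting settings out of slurm.conf with a small script. This is a minimal sketch: conf_value is a hypothetical helper, and /etc/slurm/slurm.conf is an assumed default path that installations may place elsewhere (it can be overridden with SLURM_CONF). The script only reports what the file contains. scontrol show config | grep -i AccountingStorage reports the effective values, defaults included, and sinfo --version prints the installed Slurm version.

```shell
#!/usr/bin/env bash
# Sketch: print the accounting-related settings found in slurm.conf.
# The default path is an assumption; Slurm itself honours $SLURM_CONF if set.

# conf_value FILE KEY: print the last value assigned to KEY (case-insensitive),
# ignoring comments; handles several KEY=value pairs on one line.
conf_value() {
    local file="$1" key="$2"
    sed 's/#.*//' "$file" \
        | grep -o -i -E "(^|[[:space:]])${key}=[^[:space:]]+" \
        | tail -n 1 \
        | cut -d= -f2-
}

conf="${SLURM_CONF:-/etc/slurm/slurm.conf}"
if [ -r "$conf" ]; then
    for key in AccountingStorageType AccountingStorageHost AccountingStoragePort; do
        value="$(conf_value "$conf" "$key")"
        # An empty value means the key is absent and Slurm's default applies.
        printf '%s=%s\n' "$key" "${value:-(not set)}"
    done
else
    echo "cannot read $conf" >&2
fi
```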