Hi Mahmood

> [root@cluster ~]# ps aux | grep slurmdb
> root      3406  0.0  0.0 338636  2672 ?        Sl   00:26   0:01 /usr/sbin/slurmdbd
> root     17146  0.0  0.0 105308   888 pts/2    S+   13:26   0:00 grep

That's good. What does its /var/log/slurm/slurmdbd.log say? Any errors?

> >Is slurm.conf pointing to the right slurmdbd host
> There is no such parameter

Odd. What version of Slurm have you got installed? My slurm.conf has one
(AccountingStorageHost; I'm on 16.05.9), and I set it on older versions
too. But let's assume that it's defaulting to localhost: can you check
that "telnet localhost 6819" gets a TCP connection? BTW, was the failing
sacctmgr run on the same machine where slurmdbd is running? If not, can
you check that "telnet <slurmdbd host> 6819" from that machine gets a TCP
connection?
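
If telnet isn't installed, something like the following should do the same
job. It's only a rough sketch: check_port is a name I made up, and it
assumes bash and the coreutils "timeout" command are available. Swap
localhost for your slurmdbd host if sacctmgr runs elsewhere.

```shell
#!/bin/sh
# check_port HOST PORT: succeeds if a TCP connection can be opened.
# (check_port is a hypothetical helper, not part of Slurm.)
check_port() {
    # bash's /dev/tcp pseudo-device opens a TCP socket; timeout guards
    # against a firewall that silently drops the packets.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# 6819 is the slurmdbd port from the telnet check above.
if check_port localhost 6819; then
    echo "slurmdbd port reachable"
else
    echo "cannot connect to localhost:6819"
fi
```

A quick "cannot connect" usually means nothing is listening on that port;
one that takes the full three seconds smells more like a firewall.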

Something else to check is whether a firewall is getting in the way, or
whether sacctmgr is connecting to the right IP address. Some of the things
I'd try:

- tcpdump for packet tracing
- strace to see what connections are being attempted by sacctmgr
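
Roughly how I'd run those two, wrapped in shell functions so they're easy
to paste. The function names are just mine, the port is the 6819 from
above, and tcpdump will need root. Treat it as a starting point rather
than the one right invocation.

```shell
#!/bin/sh
# Watch traffic on the slurmdbd port on all interfaces (run as root).
# -nn skips name and port lookups so the raw IPs and ports are shown.
capture_slurmdbd() {
    tcpdump -nn -i any tcp port 6819
}

# Show only the network-related syscalls (socket, connect, ...) that
# sacctmgr makes, following any child processes it forks.
trace_sacctmgr() {
    strace -f -e trace=network sacctmgr "$@"
}
```

Run capture_slurmdbd in one terminal and, in another, something like
"trace_sacctmgr show cluster". The connect() lines in the strace output
tell you which address and port sacctmgr is really trying to reach.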

Something else I've seen before is slurmdbd becoming unresponsive because
of a memory leak that some older versions had. Keep digging and you should
find clues as to why this is not working for you.

