Also check to see if munge is functioning properly.
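
One quick way to do that is to round-trip a credential through the local munged daemon. This is only a sketch: the function name check_munge is made up for illustration, and it assumes the standard munge and unmunge command-line tools are installed on the node.

```shell
# Round-trip a credential through the local munged daemon.
# Returns 0 on success, 1 if the encode/decode fails,
# and 2 if the munge tools are not in PATH.
# (check_munge is a hypothetical helper name, not part of munge or Slurm.)
check_munge() {
    if ! command -v munge >/dev/null 2>&1 || ! command -v unmunge >/dev/null 2>&1; then
        echo "munge/unmunge not found in PATH" >&2
        return 2
    fi
    if munge -n | unmunge >/dev/null 2>&1; then
        echo "munge: local credential round trip OK"
        return 0
    fi
    echo "munge: local credential round trip FAILED (is munged running?)" >&2
    return 1
}
```

Run check_munge on both the machine running sacctmgr and the slurmdbd host. If they are different machines, piping "munge -n" over ssh into "unmunge" on the other host should also show whether the two share the same key.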


On Wed, Apr 26, 2017 at 10:00 AM, Jeff Tan <jeffe...@au1.ibm.com> wrote:

> Hi Mahmood
>
> > [root@cluster ~]# ps aux | grep slurmdb
> > root      3406  0.0  0.0 338636  2672 ?        Sl   00:26   0:01 /usr/sbin/slurmdbd
> > root     17146  0.0  0.0 105308   888 pts/2    S+   13:26   0:00 grep slurmdb
>
> That's good. What does its /var/log/slurm/slurmdbd.log say? Any errors?
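
For the log check, something like the following may save some scrolling. It is a minimal sketch: slurmdbd_errors is a made-up helper name, the default path is the one from Jeff's message, and the search words "error" and "fatal" are just a guess at what is worth looking for.

```shell
# Print the last 20 lines mentioning "error" or "fatal" in a slurmdbd log.
# The default path is the one quoted in the thread; pass another path as $1
# if LogFile in slurmdbd.conf points somewhere else.
# (slurmdbd_errors is a hypothetical helper name.)
slurmdbd_errors() {
    log=${1:-/var/log/slurm/slurmdbd.log}
    grep -iE 'error|fatal' "$log" | tail -n 20
}
```

Calling slurmdbd_errors with no argument reads the default path. An empty result only means those two words did not appear, so it is still worth reading the tail of the log by eye.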
>
> > > Is slurm.conf pointing to the right slurmdbd host (AccountingStorageHost)?
> > There is no such parameter
>
> Odd. What version of Slurm have you got installed? My slurm.conf (16.05.9)
> has that parameter, and I also set it on older versions. But let's assume
> that it's defaulting to localhost: can you check that "telnet localhost 6819"
> gets a TCP connection? BTW, was the failing sacctmgr running on the same
> machine as slurmdbd? If not, can you check that
> "telnet <slurmdbd host> 6819" gets a TCP connection?
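
If telnet is not installed, the same TCP check can be sketched with bash's /dev/tcp redirection. This assumes bash and the coreutils timeout command are available. check_port is a made-up helper name, and 6819 is the port mentioned above.

```shell
# Return 0 if a TCP connection to host ($1) and port ($2) succeeds
# within 3 seconds, non-zero otherwise.
# (check_port is a hypothetical helper name; it relies on bash's
# /dev/tcp pseudo-device and on the coreutils "timeout" command.)
check_port() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}
```

For example, "check_port localhost 6819 && echo reachable" on the slurmdbd machine, and the same with the slurmdbd host name from the machine where sacctmgr fails. A success only shows that something accepts connections on that port, not that it is slurmdbd.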
>
> Something else to check is if you've got a firewall getting in the way, or
> if sacctmgr is going to the right IP address. Some of the things I'd try:
>
> - tcpdump for packet tracing
> - strace to see what connections are being attempted by sacctmgr
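
The two tools listed above could be invoked roughly as follows. This is a sketch only: both function names are made up, tcpdump normally needs root, "-i any" is Linux-specific, and the default port 6819 is the one mentioned earlier in the thread.

```shell
# Watch traffic to and from the slurmdbd port (needs root; Ctrl-C to stop).
# -nn disables name lookups so raw IP addresses and ports are shown.
# (trace_slurmdbd_traffic is a hypothetical helper name.)
trace_slurmdbd_traffic() {
    tcpdump -i any -nn tcp port "${1:-6819}"
}

# Run a command under strace and show only its connect() calls,
# following child processes with -f.
# (trace_connects is a hypothetical helper name.)
trace_connects() {
    strace -f -e trace=connect "$@"
}
```

Running "trace_connects sacctmgr show cluster" in one terminal while trace_slurmdbd_traffic runs in another should show which address and port sacctmgr tries to reach, and whether any packets actually arrive.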
>
> Something else I've seen happen before is slurmdbd becoming unresponsive
> due to a memory leak that some older versions had. Keep digging and you
> should find clues as to why this is not working for you.
>
> Regards
> Jeff
>
