What is your throughput like? I.e., how many jobs are you running through on
an hourly/daily basis, and what is your MinJobAge setting?

If your throughput is extremely high, it could be that slurmctld is holding
all of those completed job records in memory, and it just keeps growing
without purging.

We have it set VERY short as we can get that information back quickly via
the slurmdbd and the sqlog tool.

      MinJobAge
              The minimum age of a completed job before its record is purged
              from SLURM's active database. Set the values of MaxJobCount and
              MinJobAge to insure the slurmctld daemon does not exhaust its
              memory or other resources. The default value is 300 seconds.
              A value of zero prevents any job record purging. May not
              exceed 65533.
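
As a rough back-of-the-envelope illustration of why throughput and MinJobAge
matter together: in steady state, the number of completed job records that
slurmctld still holds is roughly the job completion rate times MinJobAge.
The Python sketch below is only that arithmetic. The function name and the
example numbers are made up for illustration, it is not anything Slurm
provides, and it ignores pending/running jobs and the MaxJobCount cap.

```python
import math


def retained_completed_jobs(jobs_per_hour: float, min_job_age_s: int) -> float:
    """Estimate completed job records still held in memory (hypothetical helper).

    Assumes a steady completion rate and that each record is purged once it
    is min_job_age_s seconds old. A MinJobAge of zero disables purging, so
    the count is unbounded here (in practice MaxJobCount limits it).
    """
    if min_job_age_s == 0:
        return math.inf
    return jobs_per_hour * min_job_age_s / 3600.0


if __name__ == "__main__":
    # Illustrative throughput only; substitute your own site's numbers.
    for age in (30, 300, 3600):
        n = retained_completed_jobs(jobs_per_hour=20000, min_job_age_s=age)
        print(f"MinJobAge={age:>5}s -> ~{n:.0f} completed records retained")
```

If the estimate for your own numbers is small but slurmctld memory is still
large, MinJobAge is probably not the main cause.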

--Jerry




On 6/26/13 9:10 AM, "Mario Kadastik" <[email protected]> wrote:

>
>> The only thing that comes to mind is your accounting database is down
>> and slurmctld is storing all the data in memory.
>
>Ok, maybe I shouldn't have deleted the other lines from grep:
>
>[root@slurm-1 ~]# ps -eao pid,user,rss,cmd|grep slurm
>21613 slurm    6735956 /usr/sbin/slurmctld
>21675 slurm     3372 /usr/sbin/slurmdbd
>
>And I just ran sreport commands to check and got nice reports back so the
>accounting DB is running.
>
>Mario Kadastik, PhD
>Researcher
>
>---
>  "Physics is like sex, sure it may have practical reasons, but that's
>not why we do it" 
>     -- Richard P. Feynman
