> What is your throughput like? I.e., how many jobs are you running through on
> an hourly/daily basis, and what is your MinJobAge setting?
Well, not that many... I'd estimate somewhere around a few thousand per hour.
MinJobAge is 200 seconds, which isn't terribly long.
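As a rough sanity check on "not that many," Little's law gives the number of finished-job records held at any moment as roughly completion rate times MinJobAge. This is a back-of-envelope sketch under assumptions: jobs complete at a steady rate, each record is dropped about MinJobAge seconds after its job ends, and "a few thousand per hour" means 2000 to 5000. Because slurmctld purges old records periodically rather than instantly, treat the result as an approximate lower bound. The function name is hypothetical.

```python
# Back-of-envelope estimate (Little's law, L = lambda * W) of how many completed
# job records slurmctld keeps at steady state. Assumptions: constant completion
# rate, and each record is purged about MinJobAge seconds after the job ends.
# Periodic purging means the real count can be somewhat higher.

MIN_JOB_AGE_S = 200  # MinJobAge value quoted in this thread


def retained_completed_jobs(jobs_per_hour: float, min_job_age_s: float = MIN_JOB_AGE_S) -> float:
    """Hypothetical helper: average number of finished-job records still in memory."""
    jobs_per_second = jobs_per_hour / 3600.0
    return jobs_per_second * min_job_age_s


# "A few thousand per hour" is assumed here to mean 2000 to 5000 jobs/hour.
for rate in (2000, 3000, 5000):
    print(f"{rate} jobs/h -> ~{retained_completed_jobs(rate):.0f} completed records")
```

Even at the high end this comes to only a few hundred completed-job records, which agrees with the point that MinJobAge at this throughput isn't terribly long.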
> If your throughput is extremely high, it could be you are holding all that
> info in memory and it is just growing and growing without purging.
Also, right now we have a cooling failure, which means all worker nodes are
down, yet the Slurm controller is still using 6.5 GB of RSS memory. There can't
be any job churn right now; the last jobs ended hours ago.
Mario Kadastik, PhD
Researcher