Hi,

Is it normal for slurmctld to consume in excess of 10GB of RAM? I originally
created the Slurm controller VM with 2GB of RAM, which at times caused
slurmctld to be killed by the OOM killer. I increased it to 6GB, which let us
run a little longer, and then to 10GB because the crashes still occurred. I
have now just witnessed the first OOM kill of the Slurm controller at 10GB.

We're running Slurm 2.5.3. The last 1000 lines of the slurmctld log before the
crash are here:
http://cms.hep.kbfi.ee/~mario/dbg/slurmctld-preOOM.log

The OOM kill:
Jun 25 18:21:32 slurm-1 kernel: [5463683.553994] OOM killed process 5070 (slurmdbd) vm:269284kB, rss:10312kB, swap:628kB
Jun 25 18:21:32 slurm-1 kernel: [5463683.909668] OOM killed process 802 (slurmctld) vm:11688184kB, rss:10409300kB, swap:241096kB
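
For reference, a minimal Python sketch that pulls the sizes out of the two
kernel lines above and converts them from kB to GiB. It is not part of Slurm;
the names parse_oom_kills and kb_to_gib are made up for illustration, and the
regular expression assumes the exact message layout shown above. For slurmctld
it gives a resident set of roughly 9.9 GiB.

```python
import re

# Kernel log lines copied from the message above, each on a single line.
LOG = """\
Jun 25 18:21:32 slurm-1 kernel: [5463683.553994] OOM killed process 5070 (slurmdbd) vm:269284kB, rss:10312kB, swap:628kB
Jun 25 18:21:32 slurm-1 kernel: [5463683.909668] OOM killed process 802 (slurmctld) vm:11688184kB, rss:10409300kB, swap:241096kB
"""

# Assumed layout: "OOM killed process <pid> (<name>) vm:<n>kB, rss:<n>kB, swap:<n>kB".
# \s+ also tolerates a line break between the pid and the process name.
PATTERN = re.compile(
    r"OOM killed process (?P<pid>\d+)\s+\((?P<name>[^)]+)\)\s+"
    r"vm:(?P<vm>\d+)kB, rss:(?P<rss>\d+)kB, swap:(?P<swap>\d+)kB"
)


def kb_to_gib(kb):
    """Convert kB (treated as KiB, as the kernel reports them) to GiB."""
    return kb / (1024 ** 2)


def parse_oom_kills(text):
    """Return one dict per OOM-kill message found in text."""
    kills = []
    for m in PATTERN.finditer(text):
        kills.append({
            "pid": int(m["pid"]),
            "name": m["name"],
            "vm_kb": int(m["vm"]),
            "rss_kb": int(m["rss"]),
            "swap_kb": int(m["swap"]),
        })
    return kills


if __name__ == "__main__":
    for kill in parse_oom_kills(LOG):
        print(
            f"{kill['name']} (pid {kill['pid']}): "
            f"vm {kb_to_gib(kill['vm_kb']):.2f} GiB, "
            f"rss {kb_to_gib(kill['rss_kb']):.2f} GiB, "
            f"swap {kb_to_gib(kill['swap_kb']):.2f} GiB"
        )
```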

The config file: http://cms.hep.kbfi.ee/~mario/dbg/slurm.conf

We have 167 worker nodes with a total of ~5000 compute cores. Do we really need
to give Slurm far more RAM, or is that amount unreasonable and more likely a
sign of a memory leak?

Mario Kadastik, PhD
Researcher

---
  "Physics is like sex, sure it may have practical reasons, but that's not why 
we do it" 
     -- Richard P. Feynman
