Hi,

Is it normal for slurmctld to consume in excess of 10 GB of RAM? I originally created the Slurm controller VM with 2 GB of RAM, which at times caused slurmctld to be killed by the OOM killer. I increased it to 6 GB, which let us run a little longer, and then to 10 GB because the crashes still occurred. I have now just witnessed the first OOM kill of the Slurm controller at 10 GB.

We're running 2.5.3. The last 1000 lines of the log before the crash are here:
http://cms.hep.kbfi.ee/~mario/dbg/slurmctld-preOOM.log

The OOM kill:

Jun 25 18:21:32 slurm-1 kernel: [5463683.553994] OOM killed process 5070 (slurmdbd) vm:269284kB, rss:10312kB, swap:628kB
Jun 25 18:21:32 slurm-1 kernel: [5463683.909668] OOM killed process 802 (slurmctld) vm:11688184kB, rss:10409300kB, swap:241096kB

The config file:
http://cms.hep.kbfi.ee/~mario/dbg/slurm.conf

We have 167 worker nodes with a total of ~5000 compute cores. Do we really need to give Slurm far more RAM, or is that amount unreasonable and more likely a sign of a memory leak?

Mario Kadastik, PhD
Researcher

---
"Physics is like sex, sure it may have practical reasons, but that's not why we do it"
-- Richard P. Feynman
