We use the valgrind tool to test all Slurm daemons for memory leaks with a variety of configurations. See if you can identify the source of the leaks. Instructions are in src/slurmctld/controller.c:
/**************************************************************************\
 * To test for memory leaks, set MEMORY_LEAK_DEBUG to 1 using
 * "configure --enable-memory-leak-debug" then execute
 * $ valgrind --tool=memcheck --leak-check=yes --num-callers=8 \
 *   --leak-resolution=med ./slurmctld -Dc >valg.ctld.out 2>&1
 *
 * Then exercise the slurmctld functionality before executing
 * > scontrol shutdown

Quoting Mario Kadastik <[email protected]>:

>
>> I have encountered that slurmctld uses more than 20GB of virtual memory.
>> But the RSS is less than 1GB. I am not sure whether this is OK or there
>> is some leakage.
>>
>> On Tue, 2013-06-25 at 11:56 -0700, Mario Kadastik wrote:
>>> The OOM kill:
>>> Jun 25 18:21:32 slurm-1 kernel: [5463683.553994] OOM killed
>>> process 5070 (slurmdbd) vm:269284kB, rss:10312kB, swap:628kB
>>> Jun 25 18:21:32 slurm-1 kernel: [5463683.909668] OOM killed
>>> process 802 (slurmctld) vm:11688184kB, rss:10409300kB, swap:241096kB
>
>
> As you can see in my case the RSS was 10GB and that was the cause
> for the kill. The VM was 11GB. But maybe I should increase the
> virtual machines VM size, maybe that'd keep the RSS down a bit, but
> I doubt this is the case. If there have been memory leak
> improvements with regard to 2.5.3 to current release, then I could
> upgrade, but I'd really like to know that this is a known effect as
> this is a production system.
>
> Thanks,
>
> Mario Kadastik, PhD
> Researcher
>
> ---
> "Physics is like sex, sure it may have practical reasons, but
> that's not why we do it"
> -- Richard P. Feynman
>
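Once a valgrind run like the one described in the controller.c comment has finished, the leak totals can be pulled out of valg.ctld.out with a small script. The following is only a minimal sketch, not part of Slurm: the function name parse_leak_summary is made up for illustration, and it assumes the usual memcheck "LEAK SUMMARY" line format ("==PID==    definitely lost: N bytes in M blocks"). If the log contains more than one summary, the last one wins.

```python
import os
import re
import sys

# Matches memcheck LEAK SUMMARY lines, for example:
#   ==802==    definitely lost: 1,024 bytes in 3 blocks
LEAK_LINE = re.compile(
    r"^==\d+==\s+"
    r"(definitely lost|indirectly lost|possibly lost|still reachable|suppressed):"
    r"\s+([\d,]+) bytes in ([\d,]+) blocks"
)


def parse_leak_summary(text):
    """Return {category: (bytes, blocks)} found in valgrind memcheck output."""
    summary = {}
    for line in text.splitlines():
        match = LEAK_LINE.match(line)
        if match:
            category, nbytes, nblocks = match.groups()
            # valgrind prints thousands separators, so strip the commas
            summary[category] = (
                int(nbytes.replace(",", "")),
                int(nblocks.replace(",", "")),
            )
    return summary


if __name__ == "__main__":
    # Default file name taken from the valgrind command in the comment above.
    path = sys.argv[1] if len(sys.argv) > 1 else "valg.ctld.out"
    if os.path.exists(path):
        with open(path, errors="replace") as handle:
            totals = parse_leak_summary(handle.read())
        for category, (nbytes, nblocks) in totals.items():
            print(f"{category}: {nbytes} bytes in {nblocks} blocks")
```

A non-zero "definitely lost" figure is the one worth chasing first; "still reachable" memory is often just data the daemon holds until exit.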
