What exactly is the workload on your system (how many jobs are in the queue)?
Do you have any error messages in the log like "unpack error" or "Incomplete job record"? I could see a possibility for a small leak if that happens, but nothing that would be ongoing.

Danny

> I was only able to run valgrind on the slurmctld for about a minute;
> otherwise memcheck would lose control of itself after the slurmctld was
> stopped and go into an endless loop of copying the alphabet into memory
> (I'm not sure how well it would work to run valgrind on valgrind...)
>
> So, I loaded up SLURM until the slurmctld was starting to increase its
> memory consumption, then killed slurmctld and restarted it under valgrind
> for about a minute. There are a few tiny "possibly lost" records (not
> included since I had to type the results from a printout) and one
> "definitely lost":
>
> ==7792== HEAP SUMMARY:
> ==7792==     in use at exit: 277,623 bytes in 5,276 blocks
> ==7792==   total heap usage: 49,809 allocs, 44,533 frees, 37,882,635 bytes allocated
>
> ...
>
> ==7792== 246,581 (74,176 direct, 172,405 indirect) bytes in 244 blocks
> are definitely lost in loss record 100 of 100
> ==7792==    at 0x4A05E1C: malloc (vg_replace_malloc.c:195)
> ==7792==    by 0x4778BC: slurm_xmalloc (xmalloc.c:94)
> ==7792==    by 0x43022C: create_job_record (job_mgr.c:220)
> ==7792==    by 0x43EC8F: load_all_job_state (job_mgr.c:862)
> ==7792==    by 0x45C40A: read_slurm_conf (read_config.c:740)
> ==7792==    by 0x42923E: main (controller.c:473)
> ==7792==
> ==7792== LEAK SUMMARY:
> ==7792==    definitely lost: 74,176 bytes in 244 blocks
> ==7792==    indirectly lost: 172,405 bytes in 4,736 blocks
> ==7792==      possibly lost: 214 bytes in 7 blocks
> ==7792==    still reachable: 30,828 bytes in 289 blocks
> ==7792==         suppressed: 0 bytes in 0 blocks
>
> Thanks
> -Phil
>
> On 03/16/2011 01:48 PM, Jette, Moe wrote:
> > I don't see any problems with your configuration.
> >
> > We use valgrind to test for memory leaks using a variety of SLURM
> > configurations, although it is not possible to test all configurations.
> > It would be great if you could run the slurmctld under valgrind and
> > check for leaks:
> >
> > 1. Run configure with --enable-memory-leak-debug
> > 2. Start slurmctld under valgrind:
> >    valgrind --tool=memcheck --leak-check=yes --num-callers=6 \
> >      --leak-resolution=med slurmctld -D >val.out 2>&1
> > 3. After a while, shut it down:
> >    scontrol shutdown
> > 4. Restart the SLURM daemons normally
> > 5. Check the end of val.out for a memory leak report.
> >
> > ________________________________________
> > From: [email protected] [[email protected]] On Behalf Of
> > Phil Sharfstein [[email protected]]
> > Sent: Wednesday, March 16, 2011 1:26 PM
> > To: [email protected]
> > Subject: [slurm-dev] slurmctld high memory utilization
> >
> > The slurmctld process on my primary control machine is using over 90% of
> > the available memory (16GB). After restarting slurmctld, its memory
> > utilization is only a few percent. However, within 24 hours, it is
> > consuming over 90% of the memory again.
> >
> > Our SLURM version is 2.2.0 running on RHEL 5.6. We are using backfill
> > scheduling and the cons_res select plugin. Our jobs are all submitted
> > with unlimited time limits and primarily use generic resources and
> > licenses for resource allocation. We have one long-running process using
> > the master resource on each of the nodes that launches a number of
> > parallel slave processes, scheduled one per node.
> >
> > We will generally have 40 running master processes, 50-100 pending
> > master processes, 40 running slave processes, and 500+ pending slave
> > processes. Slave processes are prioritized (nice value) to ensure that
> > those scheduled by the first-launched master processes jump to the front
> > of the queue (master jobs finish in the order they were launched, in the
> > shortest amount of time).
> > A master process runs for 1+ hours (some finish 24+ hours after launch,
> > waiting for resources to complete their slave jobs), while a single
> > slave process generally completes in 5-20 minutes.
> >
> > I'm pretty sure that we are doing something wrong with our configuration
> > or conops that is causing the excess memory consumption. However, I have
> > not been able to track it down.
> >
> > Thanks,
> > -Phil
> >
> > Our slurm.conf (excuse any typos; this was transcribed from a printout):
> >
> > ControlMachine=blade0204
> > ControlAddr=10.1.53.49
> > BackupController=blade0201
> > BackupAddr=10.1.53.146
> > AuthType=auth/munge
> > CacheGroups=1
> > CryptoType=crypto/munge
> > GresTypes=master,slave
> > Licenses=fcx*3,obc*6
> > MaxJobCount=3000
> > MpiDefault=none
> > ProctrackType=proctrack/pgid
> > ReturnToService=1
> > SlurmctldPidFile=/var/run/slurmctld.pid
> > SlurmctldPort=6817
> > SlurmdPidFile=/var/run/slurmd.pid
> > SlurmdPort=6818
> > SlurmdSpoolDir=/tmp/slurmd
> > SlurmUser=bin
> > StateSaveLocation=/gpfs/fs0/slurm
> > SwitchType=switch/none
> > TaskPlugin=task/none
> > HealthCheckInterval=60
> > HealthCheckProgram=/etc/slurm/healthcheck.sh
> > InactiveLimit=0
> > KillWait=30
> > MessageTimeout=90
> > MinJobAge=10
> > SlurmctldTimeout=90
> > SlurmdTimeout=300
> > Waittime=0
> > FastSchedule=1
> > SchedulerType=sched/backfill
> > SchedulerParameters=max_job_bf=1000
> > SchedulerPort=7321
> > SelectType=select/cons_res
> > AccountingStorageType=accounting_storage/none
> > ClusterName=cluster
> > JobCompType=jobcomp/none
> > JobAcctGatherFrequency=30
> > JobAcctGatherType=jobacct_gather/none
> > SlurmctldDebug=3
> > SlurmdDebug=3
> >
> > NodeName=blade02[01-16] NodeAddr=10.1.153.[146-161] Procs=8 RealMemory=1600 Sockets=2 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN Gres=master:1,slave:1
> > NodeName=blade03[01-16] NodeAddr=10.1.153.[162-177] Procs=8 RealMemory=1600 Sockets=2 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN Gres=master:1,slave:1
> > NodeName=blade04[01-16] NodeAddr=10.1.153.[178-193] Procs=8 RealMemory=1600 Sockets=2 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN Gres=master:1,slave:1
> >
> > PartitionName=clust Nodes=blade02[09-16],blade03[01-16],blade04[01-16] Default=YES MaxTime=INFINITE State=UP
> >
> > PartitionName=clusttest Nodes=blade02[01-09] Default=NO MaxTime=INFINITE State=UP
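[Editor's note: once the valgrind procedure described in the thread has produced a val.out, the leak report at its end can be checked from the shell. A minimal sketch follows; the here-document embeds the LEAK SUMMARY lines Phil posted purely as sample data. In a real run valgrind writes val.out itself, and the numbers will be your own.]

```shell
# Sample val.out content, taken from the leak summary in this thread.
# In practice this file is produced by the valgrind invocation above,
# so skip this step and point the grep at your own val.out instead.
cat > val.out <<'EOF'
==7792== LEAK SUMMARY:
==7792==    definitely lost: 74,176 bytes in 244 blocks
==7792==    indirectly lost: 172,405 bytes in 4,736 blocks
==7792==      possibly lost: 214 bytes in 7 blocks
==7792==    still reachable: 30,828 bytes in 289 blocks
==7792==         suppressed: 0 bytes in 0 blocks
EOF

# Print the summary block: the "LEAK SUMMARY:" line plus the five
# lines that follow it.  A "definitely lost" total that grows across
# repeated runs points at a real leak, as opposed to the one-time
# "still reachable" allocations made at daemon startup.
grep -A5 'LEAK SUMMARY' val.out
```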
