Moe,
It looks like these patches took care of the memory problems. Now
slurmctld is using almost no memory, even with several hundred jobs in
the queue (0.0% according to top). Also, valgrind no longer shows any
leaks.
We'll schedule some high-load jobs to see whether we can reproduce the
slurmctld memory issues. However, it is looking pretty good right now.
Thanks,
Phil
On 03/18/2011 04:00 PM, Jette, Moe wrote:
One more question, what's the virtual and real memory use?
This is from top on one of our clusters:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5905 slurm 15 0 568m 40m 1920 S 0.7 0.2 28:58.68 slurmctld
This is the patch to release the license data structure at shutdown (it
does not grow over time, but it is the only other leak I've found):
Index: src/slurmctld/controller.c
===================================================================
--- src/slurmctld/controller.c (revision 22800)
+++ src/slurmctld/controller.c (working copy)
@@ -668,6 +668,7 @@
switch_fini();
/* purge remaining data structures */
+ license_free();
slurm_cred_ctx_destroy(slurmctld_config.cred_ctx);
slurm_crypto_fini(); /* must be after ctx_destroy */
slurm_conf_destroy();
And here is the first patch that I sent for the job leak; again, this
should not grow over time:
Index: src/slurmctld/job_mgr.c
===================================================================
--- src/slurmctld/job_mgr.c (revision 22800)
+++ src/slurmctld/job_mgr.c (working copy)
@@ -252,7 +252,8 @@
return;
xassert (job_entry->details->magic == DETAILS_MAGIC);
- _delete_job_desc_files(job_entry->job_id);
+ if (IS_JOB_FINISHED(job_entry))
+ _delete_job_desc_files(job_entry->job_id);
for (i=0; i<job_entry->details->argc; i++)
xfree(job_entry->details->argv[i]);
@@ -4632,8 +4633,7 @@
fatal("job hash error");
*job_pptr = job_ptr->job_next;
- if (IS_JOB_FINISHED(job_ptr))
- delete_job_details(job_ptr);
+ delete_job_details(job_ptr);
xfree(job_ptr->account);
xfree(job_ptr->alloc_node);
xfree(job_ptr->comment);
________________________________________
From: Phil Sharfstein [[email protected]]
Sent: Friday, March 18, 2011 1:52 PM
To: [email protected]
Cc: Jette, Moe
Subject: Re: [slurm-dev] slurmctld high memory utilization
I was only able to run valgrind on the slurmctld for about a minute;
otherwise memcheck would lose control of itself after the slurmctld was
stopped and go into an endless loop of copying the alphabet into memory
(I'm not sure how well it would work to run valgrind on valgrind...).
So, I loaded up SLURM until the slurmctld was starting to increase its
memory consumption, then killed slurmctld and restarted it under
valgrind for about a minute. There are a few tiny "possibly lost"
records (not included, since I have to type the results from a
printout) and one "definitely lost" record:
==7792== HEAP SUMMARY:
==7792== in use at exit: 277,623 bytes in 5,276 blocks
==7792== total heap usage: 49,809 allocs, 44,533 frees, 37,882,635 bytes allocated
...
==7792== 246,581 (74,176 direct, 172,405 indirect) bytes in 244 blocks are definitely lost in loss record 100 of 100
==7792== at 0x4A05E1C: malloc (vg_replace_malloc.c:195)
==7792== by 0x4778BC: slurm_xmalloc (xmalloc.c:94)
==7792== by 0x43022C: create_job_record (job_mgr.c:220)
==7792== by 0x43EC8F: load_all_job_state (job_mgr.c:862)
==7792== by 0x45C40A: read_slurm_conf (read_config.c:740)
==7792== by 0x42923E: main (controller.c:473)
==7792==
==7792== LEAK SUMMARY:
==7792== definitely lost: 74,176 bytes in 244 blocks
==7792== indirectly lost: 172,405 bytes in 4,736 blocks
==7792== possibly lost: 214 bytes in 7 blocks
==7792== still reachable: 30,828 bytes in 289 blocks
==7792== suppressed: 0 bytes in 0 blocks
Thanks
-Phil
On 03/16/2011 01:48 PM, Jette, Moe wrote:
I don't see any problems with your configuration. We use valgrind to
test for memory leaks using a variety of SLURM configurations, although
it is not possible to test all configurations. It would be great if you
could run the slurmctld under valgrind and check for leaks.
1. Run configure with --enable-memory-leak-debug
2. Start slurmctld under valgrind:
valgrind --tool=memcheck --leak-check=yes --num-callers=6 --leak-resolution=med slurmctld -D >val.out 2>&1
3. After a while, shut it down:
scontrol shutdown
4. Restart the SLURM daemons normally
5. Check the end of val.out for a memory leak report.
________________________________________
From: [email protected] [[email protected]] On Behalf
Of Phil Sharfstein [[email protected]]
Sent: Wednesday, March 16, 2011 1:26 PM
To: [email protected]
Subject: [slurm-dev] slurmctld high memory utilization
The slurmctld process on my primary control machine is using over 90% of
the available memory (16GB). After restarting slurmctld, its memory
utilization is only a few percent. However, within 24 hours, it is
consuming over 90% of the memory.
Our SLURM version is 2.2.0, running on RHEL 5.6. We are using backfill
scheduling and cons_res select. Our jobs are all submitted with
unlimited time limits and primarily use generic resources and licenses
for resource allocation. We have one long-running process using the
master resource on each node, which launches a number of parallel slave
processes that are scheduled one per node.
We will generally have 40 running master processes, 50-100 pending
master processes, 40 running slave processes, and 500+ pending slave
processes. Slave processes are prioritized (nice value) to ensure that
those scheduled by the first-launched master processes jump to the
front of the queue (so master jobs finish in the order they were
launched, in the shortest amount of time). A master process runs for 1+
hours (some finish 24+ hours after launch, waiting for resources to
complete their slave jobs), while a single slave process generally
completes in 5-20 minutes.
I'm pretty sure that we are doing something wrong with our configuration
or conops that is causing the excess memory consumption. However, I
have not been able to track it down.
Thanks,
-Phil
Our slurm.conf (excuse any typos; this was transcribed from a printout):
ControlMachine=blade0204
ControlAddr=10.1.53.49
BackupController=blade0201
BackupAddr=10.1.53.146
AuthType=auth/munge
CacheGroups=1
CryptoType=crypto/munge
GresTypes=master,slave
Licenses=fcx*3,obc*6
MaxJobCount=3000
MpiDefault=none
ProctrackType=proctrack/pgid
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/tmp/slurmd
SlurmUser=bin
StateSaveLocation=/gpfs/fs0/slurm
SwitchType=switch/none
TaskPlugin=task/none
HealthCheckInterval=60
HealthCheckProgram=/etc/slurm/healthcheck.sh
InactiveLimit=0
KillWait=30
MessageTimeout=90
MinJobAge=10
SlurmctldTimeout=90
SlurmdTimeout=300
Waittime=0
FastSchedule=1
SchedulerType=sched/backfill
SchedulerParameters=max_job_bf=1000
SchedulerPort=7321
SelectType=select/cons_res
AccountingStorageType=accounting_storage/none
ClusterName=cluster
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=3
SlurmdDebug=3
NodeName=blade02[01-16] NodeAddr=10.1.153.[146-161] Procs=8 RealMemory=1600 Sockets=2 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN Gres=master:1,slave:1
NodeName=blade03[01-16] NodeAddr=10.1.153.[162-177] Procs=8 RealMemory=1600 Sockets=2 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN Gres=master:1,slave:1
NodeName=blade04[01-16] NodeAddr=10.1.153.[178-193] Procs=8 RealMemory=1600 Sockets=2 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN Gres=master:1,slave:1
PartitionName=clust Nodes=blade02[09-16],blade03[01-16],blade04[01-16] Default=YES MaxTime=INFINITE State=UP
PartitionName=clusttest Nodes=blade02[01-09] Default=NO MaxTime=INFINITE State=UP