I've been using the example documented at: http://slurm.schedmd.com/preempt.html
Specifically, this excerpt from slurm.conf:

    PartitionName=low Nodes=linux Default=YES Shared=NO Priority=10 PreemptMode=requeue
    PartitionName=med Nodes=linux Default=NO Shared=FORCE:1 Priority=20 PreemptMode=suspend
    PartitionName=high Nodes=linux Default=NO Shared=FORCE:1 Priority=30 PreemptMode=off

All my compute nodes have at least as much swap as RAM. This works quite well: high priority jobs can suspend medium priority jobs, and if there's memory pressure on the node, the suspended jobs can be pushed to swap. I enforce memory limits, so jobs that use more RAM than they asked for get killed.

With the slurm 2.6.5 to 14.11 upgrade, slurm added the ability to manage memory limits as well as CPU limits. So I started adding GrpMemory to users, so that if they purchase 4 nodes they can allocate a total of 4 nodes' worth of CPUs or 4 nodes' worth of memory in the high priority queue. I have entries like:

    User-'test':Partition='high':DefaultAccount='testgrp':GrpCPUs=128:GrpMemory=256000

I also set DefMemPerCPU=2000, so that users who do not ask for a specific memory allocation get 2GB per CPU. My nodes have 64GB of RAM and 32 CPUs.

This works quite well, but it broke preemption. Now if I'm running 32 2GB jobs in the medium queue, no high priority jobs can run, because all the RAM is allocated (32 × 2000 MB = 64000 MB).

That seems odd to me. If a job is suspended with SIGSTOP, any memory pressure should force the suspended job's pages into swap. Since the suspended job isn't running, that shouldn't cause much I/O: each page is written out just once, with no churning.

Is there any way to get slurm to not count a suspended job's memory allocation toward the node's total memory in use? Any suggestions on how to get the old behavior back, where high priority jobs can suspend medium priority jobs?
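To make the arithmetic concrete, here is a toy model of the node memory accounting described above. It is only a sketch: it is not Slurm's scheduler code, and the `Job`/`Node` classes and the `allocated_mem`/`can_start` helpers are hypothetical names for illustration. The numbers (32 CPUs, 64000 MB per node, DefMemPerCPU=2000) come from the post, and the assumption that allocated memory includes suspended jobs reflects the behavior observed there rather than anything verified against Slurm itself.

```python
# Toy model of per-node memory accounting (hypothetical; NOT Slurm source code).
from dataclasses import dataclass, field

NODE_MEM_MB = 64000     # "64GB" node, in MB as slurm.conf expresses memory
NODE_CPUS = 32
DEF_MEM_PER_CPU = 2000  # DefMemPerCPU=2000 (MB)


@dataclass
class Job:
    partition: str
    cpus: int
    mem_mb: int
    suspended: bool = False


@dataclass
class Node:
    jobs: list = field(default_factory=list)

    def allocated_mem(self, count_suspended: bool) -> int:
        # Sum the memory of jobs on the node, optionally skipping suspended ones.
        return sum(j.mem_mb for j in self.jobs if count_suspended or not j.suspended)

    def can_start(self, job: Job, count_suspended: bool) -> bool:
        # A new job fits only if its memory request fits in what is left.
        return self.allocated_mem(count_suspended) + job.mem_mb <= NODE_MEM_MB


def default_job(partition: str, cpus: int = 1) -> Job:
    # A job that did not request memory gets DefMemPerCPU per allocated CPU.
    return Job(partition, cpus, cpus * DEF_MEM_PER_CPU)


# 32 single-CPU medium jobs fill the node's memory exactly.
node = Node([default_job("med") for _ in range(NODE_CPUS)])
high = default_job("high")

# Assume preemption has suspended all the medium jobs.
for j in node.jobs:
    j.suspended = True

print("allocated MB (suspended counted):", node.allocated_mem(True))
print("high job fits, suspended memory counted:", node.can_start(high, True))
print("high job fits, suspended memory ignored:", node.can_start(high, False))
```

Run as-is, this prints 64000, then `False`, then `True`: when suspended jobs' memory still counts as allocated, even a single 2000 MB high priority job cannot be placed, which matches the symptom described in the question.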
