I've been using the example documented at:
  http://slurm.schedmd.com/preempt.html

Specifically, this excerpt from slurm.conf:
PartitionName=low  Nodes=linux Default=YES Shared=NO      Priority=10 PreemptMode=requeue
PartitionName=med  Nodes=linux Default=NO  Shared=FORCE:1 Priority=20 PreemptMode=suspend
PartitionName=high Nodes=linux Default=NO  Shared=FORCE:1 Priority=30 PreemptMode=off

All my compute nodes have at least as much swap as RAM. This works quite well:
high priority jobs can suspend medium priority jobs, and if there's memory
pressure on the node, the suspended jobs can be pushed to swap. I enforce the
memory limits, so jobs using more RAM than they ask for get killed. With the
upgrade from Slurm 2.6.5 to 14.11, Slurm added the ability to manage memory
limits as well as CPU limits.

So I started adding GrpMemory limits to users, so that if they purchase 4 nodes
they can allocate a total of 4 nodes' worth of CPUs or 4 nodes' worth of memory
in the high priority queue. I have entries like:
User-'test':Partition='high':DefaultAccount='testgrp':GrpCPUs=128:GrpMemory=256000

I set DefMemPerCPU=2000 so that users who do not request a specific amount of
memory get 2GB per CPU. My nodes have 64GB of RAM and 32 CPUs. This works
quite well, but it broke preemption.
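A quick check of how these numbers line up. This is a sketch under two assumptions: memory is counted in MB (the unit DefMemPerCPU and GrpMemory use), and "64GB" is taken to mean 64000 MB of allocatable memory per node (the real figure is whatever RealMemory the node reports). The constant names are hypothetical, chosen for illustration.

```python
# Per-node hardware as described above; 64000 MB for "64GB" is an assumption
NODE_CPUS = 32
NODE_MEM_MB = 64000
NODES_PURCHASED = 4

DEF_MEM_PER_CPU_MB = 2000  # DefMemPerCPU=2000 (MB per allocated CPU)

# Group limits corresponding to "4 nodes of CPUs or 4 nodes of memory"
grp_cpus = NODES_PURCHASED * NODE_CPUS          # matches GrpCPUs=128
grp_memory_mb = NODES_PURCHASED * NODE_MEM_MB   # matches GrpMemory=256000

# Memory claimed when every CPU on a node runs a job using the default amount
default_fill_mb = NODE_CPUS * DEF_MEM_PER_CPU_MB

print(grp_cpus, grp_memory_mb, default_fill_mb)
```

In other words, a node filled with default-sized single-CPU jobs has every MB of its memory allocated, not just every CPU.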

So now if I'm running 32 jobs of 2GB each in the medium queue, no high priority
job can run, because all of the RAM is allocated. That seems quite weird to me:
if a job is suspended with SIGSTOP, any memory pressure should force its pages
into swap. Since the suspended job isn't running, that shouldn't cause much
I/O; each page is written out just once, with no churning.

Is there any way to get Slurm to not count suspended jobs' memory allocations
towards the node's total memory in use?
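The difference being asked about can be shown with a toy model of a single node. This is a hypothetical sketch, not Slurm's actual scheduler code: the `Job` class and `fits` function are invented for illustration, and the model simply assumes (as the behaviour described above suggests) that a suspended job's CPUs are treated as free while its memory may or may not still be counted.

```python
from dataclasses import dataclass


@dataclass
class Job:
    cpus: int
    mem_mb: int
    suspended: bool = False


# Same hypothetical node as before: 32 CPUs, 64000 MB allocatable
NODE_CPUS = 32
NODE_MEM_MB = 64000


def fits(node_jobs, new_job, count_suspended_mem):
    """Hypothetical check of whether new_job fits on the node.

    CPUs held by suspended jobs are always treated as free; memory held by
    suspended jobs is counted only when count_suspended_mem is True.
    """
    used_cpus = sum(j.cpus for j in node_jobs if not j.suspended)
    used_mem = sum(j.mem_mb for j in node_jobs
                   if count_suspended_mem or not j.suspended)
    return (used_cpus + new_job.cpus <= NODE_CPUS
            and used_mem + new_job.mem_mb <= NODE_MEM_MB)


# 32 medium-priority jobs, 1 CPU and 2000 MB each, all suspended
medium = [Job(cpus=1, mem_mb=2000, suspended=True) for _ in range(32)]
high = Job(cpus=1, mem_mb=2000)

print(fits(medium, high, count_suspended_mem=True))   # suspended memory still counted
print(fits(medium, high, count_suspended_mem=False))  # only running jobs counted
```

With suspended memory counted, even a single 1-CPU, 2000 MB high priority job cannot fit (64000 + 2000 > 64000); with it excluded, the job fits and any physical memory shortfall would have to be covered by swap.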

Any suggestions on how to get the old behavior back, where high priority jobs
can suspend medium priority jobs?
