[slurm-dev] Re: Partition for unused resources until needed by any other partition

Mikael Johansson Mon, 20 Oct 2014 11:18:46 -0700


Hello,

Yeah, I looked at that, and have now four partitions defined like this:

PartitionName=short    Nodes=node[005-026] Default=YES MaxNodes=6  MaxTime=02:00:00  AllowGroups=ALL Priority=2 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=no PreemptMode=off
PartitionName=medium   Nodes=node[009-026] Default=NO  MaxNodes=4  MaxTime=168:00:00 AllowGroups=ALL Priority=2 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=no PreemptMode=off
PartitionName=long     Nodes=node[001-004] Default=NO  MaxNodes=4  MaxTime=744:00:00 AllowGroups=ALL Priority=2 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=no PreemptMode=off
PartitionName=backfill Nodes=node[001-026] Default=NO  MaxNodes=10 MaxTime=168:00:00 AllowGroups=ALL Priority=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=no PreemptMode=requeue


And I've set up:

PreemptType=preempt/partition_prio
PreemptMode=requeue
PriorityType=priority/multifactor
PriorityWeightFairshare=10000
PriorityWeightAge=2000
SelectType=select/cons_res

It works insofar as a job running in "backfill" is requeued when a job in one of the other partitions starts.

The problem is that there are plenty of free cores on the cluster that don't get assigned jobs from "backfill". If I understand things correctly, this is because there are jobs with a higher priority queueing in the other partitions.

So I would maybe need a mechanism that increases the priority of the backfill jobs while queueing, but then immediately decreases it when the jobs start?



Cheers,
    Mikael J.
    http://www.iki.fi/~mpjohans/


On Mon, 20 Oct 2014, [email protected] wrote:

This should help:
http://slurm.schedmd.com/preempt.html


Quoting Mikael Johansson <[email protected]>:
Hello All,
I've been scratching my head for a while now trying to figure this one out, which I would think would be a rather common setup.
I would need to set up a partition (or whatever, maybe a partition is actually not the way to go) with the following properties:
1. If there are any unused cores on the cluster, jobs submitted to this
   one would use them, and immediately have access to them.

2. The jobs should only use these resources until _any_ other job in
   another partition needs them. In this case, the jobs should be
   preempted and requeued.
So this should be some sort of "shadow" queue/partition, that shouldn't affect the scheduling of other jobs on the cluster, but just use up any free resources that momentarily happen to be available. So SLURM should just continue scheduling everything else normally, and treat the cores used by this shadow queue as free resources, and then just immediately cancel and requeue any jobs there, when a "real" job starts.
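
The two requirements above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class ToyCluster and its methods are hypothetical names invented for illustration, job sizes are plain core counts, and nothing here reflects how SLURM's scheduler is actually implemented or configured.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical toy model of a cluster with a preemptible "shadow" partition."""
    total_cores: int
    real: dict = field(default_factory=dict)     # job name -> cores (normal partitions)
    shadow: dict = field(default_factory=dict)   # job name -> cores (running shadow jobs)
    pending: list = field(default_factory=list)  # shadow jobs waiting or requeued

    def free_cores(self) -> int:
        """Cores used by neither real nor shadow jobs."""
        return self.total_cores - sum(self.real.values()) - sum(self.shadow.values())

    def submit_shadow(self, name: str, cores: int) -> bool:
        """Requirement 1: a shadow job starts at once if enough cores are idle."""
        if cores <= self.free_cores():
            self.shadow[name] = cores
            return True
        self.pending.append((name, cores))
        return False

    def submit_real(self, name: str, cores: int) -> bool:
        """Requirement 2: a real job treats cores held by shadow jobs as free."""
        if cores > self.total_cores - sum(self.real.values()):
            return False  # would not fit even with every shadow job preempted
        # Preempt shadow jobs, most recently started first, until the job fits.
        while cores > self.free_cores():
            victim, victim_cores = self.shadow.popitem()
            self.pending.append((victim, victim_cores))
        self.real[name] = cores
        return True


# Two shadow jobs fill an idle 16-core cluster, then a 12-core real job arrives.
cluster = ToyCluster(total_cores=16)
cluster.submit_shadow("bf1", 8)
cluster.submit_shadow("bf2", 8)
cluster.submit_real("long1", 12)
print(cluster.real, cluster.shadow, cluster.pending)
```

In this model the real job never waits for a shadow job: both shadow jobs are pushed back to the pending list as soon as the real job needs their cores, which is the behaviour described in points 1 and 2.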
If anyone has something like this set up, example configs would be very welcome, as of course all other suggestions and ideas.
Cheers,
    Mikael J.
    http://www.iki.fi/~mpjohans/
--
Morris "Moe" Jette
CTO, SchedMD LLC
