I advise using the following SchedulerParameters, in particular partition_job_depth and bf_max_job_part. These force the scheduler to consider jobs from each partition; otherwise it takes a strictly top-down approach through the whole queue.


This is what we run:

# default_queue_depth should be some multiple of the partition_job_depth,
# ideally number_of_partitions * partition_job_depth.
SchedulerParameters=default_queue_depth=5700,partition_job_depth=100,bf_interval=1,bf_continue,bf_window=2880,bf_resolution=3600,bf_max_job_test=50000,bf_max_job_part=50000,bf_max_job_user=1,bf_max_job_start=100,max_rpc_cnt=8

These parameters work well for a cluster of 50,000 cores, 57 queues, and about 40,000 jobs per day. We are running Slurm 14.03.8.
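To make the comment above concrete: with 57 partitions, number_of_partitions * partition_job_depth works out as

57 * 100 = 5700

which is exactly the default_queue_depth used in the line above.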

-Paul Edmon-

On 10/20/2014 02:19 PM, Mikael Johansson wrote:


Hello,

Yeah, I looked at that, and have now four partitions defined like this:

PartitionName=short Nodes=node[005-026] Default=YES MaxNodes=6 MaxTime=02:00:00 AllowGroups=ALL Priority=2 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=no PreemptMode=off
PartitionName=medium Nodes=node[009-026] Default=NO MaxNodes=4 MaxTime=168:00:00 AllowGroups=ALL Priority=2 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=no PreemptMode=off
PartitionName=long Nodes=node[001-004] Default=NO MaxNodes=4 MaxTime=744:00:00 AllowGroups=ALL Priority=2 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=no PreemptMode=off
PartitionName=backfill Nodes=node[001-026] Default=NO MaxNodes=10 MaxTime=168:00:00 AllowGroups=ALL Priority=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=no PreemptMode=requeue


And I've set up:

PreemptType=preempt/partition_prio
PreemptMode=requeue
PriorityType=priority/multifactor
PriorityWeightFairshare=10000
PriorityWeightAge=2000
SelectType=select/cons_res


It works insofar as a job running in "backfill" does get requeued when a job in one of the other partitions starts.

The problem is that there are plenty of free cores on the cluster that don't get assigned jobs from "backfill". If I understand things correctly, this is because jobs with higher priority are queued in the other partitions.

So I would maybe need a mechanism that increases the priority of the backfill jobs while they are queued, but then immediately decreases it once the jobs start?
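A note on the mechanics here: once a job starts, its queue priority no longer matters, and with preempt/partition_prio preemption is decided by the partitions' Priority values, so no decrease-on-start should be needed. What keeps the backfill jobs pending is the scheduler's strictly top-down pass over the queue, which is what the per-partition parameters in Paul's reply above are meant to address. A minimal sketch of that direction (the depth is illustrative, not a tuned value):

SchedulerType=sched/backfill
# consider up to 100 pending jobs from each partition, rather than only
# the highest-priority jobs across the whole queue
SchedulerParameters=partition_job_depth=100,bf_continue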


Cheers,
    Mikael J.
    http://www.iki.fi/~mpjohans/


On Mon, 20 Oct 2014, je...@schedmd.com wrote:


This should help:
http://slurm.schedmd.com/preempt.html


Quoting Mikael Johansson <mikael.johans...@iki.fi>:

Hello All,

I've been scratching my head for a while now trying to figure this one out, which I would think would be a rather common setup.

I would need to set up a partition (or whatever, maybe a partition is actually not the way to go) with the following properties:

1. If there are any unused cores on the cluster, jobs submitted to this
   one would use them, and immediately have access to them.

2. The jobs should only use these resources until _any_ other job in
   another partition needs them. In this case, the jobs should be
   preempted and requeued.

So this should be some sort of "shadow" queue/partition that doesn't affect the scheduling of other jobs on the cluster, but just uses up any free resources that momentarily happen to be available. SLURM should continue scheduling everything else normally, treating the cores used by this shadow queue as free resources, and then immediately cancel and requeue any jobs there when a "real" job starts.
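A minimal sketch of the usual way to get this, using partition-priority preemption; the partition names and node list here are placeholders, and the configs quoted further up in the thread are a fuller, real example:

PreemptType=preempt/partition_prio
PreemptMode=REQUEUE
# "real" work: higher partition priority, never preempted
PartitionName=normal Nodes=node[001-026] Default=YES Priority=2 PreemptMode=off
# "shadow" queue: lower priority, requeued whenever normal jobs need the cores
PartitionName=scavenge Nodes=node[001-026] Default=NO Priority=1 PreemptMode=requeue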

If anyone has something like this set up, example configs would be very welcome, as would, of course, any other suggestions and ideas.


Cheers,
    Mikael J.
    http://www.iki.fi/~mpjohans/


--
Morris "Moe" Jette
CTO, SchedMD LLC
