Quoting Satrajit Ghosh <[email protected]>:
hi folks,

we are trying to setup a cluster in a mixed usage scenario. thus far we have had two slurm partitions (all_nodes, interactive). interactive at present contains a single node that is also part of all_nodes.

---
PartitionName=all_nodes Default=YES MinNodes=1 AllowGroups=ALL Priority=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=FORCE:4 GraceTime=0 ReqResv=NO PreemptMode=GANG State=UP Nodes=node[001-030]
PartitionName=interactive Default=NO MinNodes=1 MaxNodes=1 DefaultTime=01:00:00 MaxTime=01:00:00 AllowGroups=ALL Priority=10 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 MaxCPUsPerNode=32 ReqResv=NO PreemptMode=GANG State=UP Nodes=node017
---

what we are trying to achieve is a balance between cluster utilization and interactive jobs. are there ways in which we can balance these two options effectively? this would be our list of constraints:

1. compute resources are time sliced across jobs. (this is already the case, but doesn't appear to be compatible with constraint #2)
You'll want to set SelectTypeParameters in order to manage memory and avoid overcommitting it (e.g. CR_CPU_MEMORY, plus DefMemPerCPU, MaxMemPerCPU, etc.).
I would remove the PreemptMode=GANG on each partition and instead set PreemptMode=GANG,SUSPEND on a separate line to apply globally.
You'll also need to configure Shared=FORCE:1 in partition "interactive" if you want it to preempt jobs running in the "all_nodes" partition.
For more information, see:
http://slurm.schedmd.com/slurm.conf.html
http://slurm.schedmd.com/gang_scheduling.html
http://slurm.schedmd.com/preempt.html
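As a rough sketch only, the changes suggested above might look like this in slurm.conf. The partition lines are trimmed copies of the ones quoted in the question. The memory values are placeholders and should be sized to your nodes. SelectType=select/cons_res and PreemptType=preempt/partition_prio are not mentioned above; they are included on the assumption that consumable-resource tracking and partition-priority preemption are what is wanted, so please check them against the documentation for your Slurm version.

```
# Global settings (sketch, not a tested configuration)
SelectType=select/cons_res            # assumption: needed for the CR_* parameters
SelectTypeParameters=CR_CPU_Memory    # track CPUs and memory as consumable resources
DefMemPerCPU=1024                     # MB, placeholder value
MaxMemPerCPU=12288                    # MB, placeholder value
PreemptType=preempt/partition_prio    # assumption: preempt by partition priority
PreemptMode=GANG,SUSPEND              # set once, globally

# Partitions: the per-partition PreemptMode=GANG is removed,
# and interactive changes from Shared=NO to Shared=FORCE:1
PartitionName=all_nodes Default=YES Priority=1 Shared=FORCE:4 State=UP Nodes=node[001-030]
PartitionName=interactive Default=NO MaxNodes=1 DefaultTime=01:00:00 MaxTime=01:00:00 Priority=10 Shared=FORCE:1 MaxCPUsPerNode=32 State=UP Nodes=node017
```

The other partition options from the original configuration (AllowGroups, GraceTime, and so on) can stay as they were; they are left out here only to keep the lines short.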
2. an interactive job request should get priority and exclusive access within at most the time slicing window (we are using the default 30s) independent on the number of jobs running on the node.
There isn't a fundamental difference in prioritization or scheduling for interactive jobs vs. batch jobs. You might use a job_submit plugin to check for batch jobs (anything with a script) and set a nice value on them to lower their scheduling priority. Also, at some point memory and/or CPUs can be exhausted, so jobs may still need to queue and wait for resources.
http://slurm.schedmd.com/job_submit_plugins.html
3. we would like to control the max number of slots an interactive job could ask for.
Slurm supports a bunch of per job, per user, and per account limits. See: http://slurm.schedmd.com/resource_limits.html
4. we would like these partitions to overlap. i.e. we don't want to carve out compute resources for the interactive partition.
No problem.
any guidance would be much appreciated. also, these nodes have 1:12 core to memory ratio, so many jobs can be launched and suspended on any node. cheers, satra
