Hi David,

Baker D.J. <[email protected]> writes:

> Hello,
>
> This is hopefully a very simple set of questions for someone. I’m
> evaluating slurm with a view to replacing our existing torque/moab
> system, and I’ve been reading about defining partitions and QoSs. I
> like the idea of being able to use a QoS to throttle user activity --
> for example to set maxcpus/user, maxjobs/user and maxnodes/user, etc,
> etc. Also I’m going to define a very simple set of partitions to
> reflect the different types of nodes in the cluster. For example
>
> Batch – normal compute nodes
>
> Highmem – high memory nodes
>
> Gpu – gpu nodes

We have a similar range of hardware, albeit with three different
categories of memory, but we decided against setting these up as
separate partitions.  The disadvantage is that small memory jobs can
potentially clog up the large memory nodes; the advantage is that small
memory jobs can use the large memory nodes if they would otherwise be
empty.
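For what it's worth, a rough sketch of that single-partition approach (the node names, memory sizes and feature labels below are invented for illustration, not our actual configuration): the nodes carry a memory Feature, and jobs which genuinely need a large-memory node request it with --constraint:

```
# slurm.conf sketch -- names and sizes are made up
NodeName=node[001-100]  RealMemory=128000  Feature=mem128
NodeName=bigmem[01-04]  RealMemory=1024000 Feature=mem1024
PartitionName=batch Nodes=node[001-100],bigmem[01-04] Default=YES State=UP

# a job that must land on a large-memory node
sbatch --constraint=mem1024 job.sh
```

Small jobs which omit the constraint can then spill onto the big nodes when those would otherwise sit idle.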

> So presumably it makes sense to associate the “normal” QOS with the
> batch queue and define throttling limits as needs. Then define
> corresponding QoSs for the highmem and gpu partitions. In this
> respect do the QOS definitions override any definitions on the
> PartitionName line? For example does QOS Maxwall override MaxTime?

The hierarchy of the limits is given here:

https://slurm.schedmd.com/resource_limits.html

However, unless you have specific needs, having limits defined on both
the partitions and QOS might be overkill.  If, as you say later, you
have a heterogeneous job mix, you probably also have a heterogeneous
user base, some of whom might find the setup confusing.  For that
reason, I would start with a fairly simple configuration and only add to
that as the need arises.
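To make the MaxWall/MaxTime interaction concrete, here is a minimal sketch (the values are invented, and this is my reading of the docs, so do check it against the page above): as I understand it, a plain job QOS will not let a job exceed the partition's MaxTime unless the QOS is given the appropriate flag, e.g. PartitionTimeLimit:

```
# slurm.conf: partition-level time limit
PartitionName=batch Nodes=node[001-100] MaxTime=3-00:00:00 State=UP

# QOS with a longer MaxWall; if I read the docs correctly, the
# partition's 3-day MaxTime still caps the job unless the QOS has
# the PartitionTimeLimit flag set
sacctmgr add qos normal
sacctmgr modify qos normal set MaxWall=7-00:00:00 Flags=PartitionTimeLimit
```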

> Also I suspect I’ll need to define a test queue with a high level of
> throttling to enable users to get a limited number of small test
> jobs through the system quickly. In this respect does it make sense
> for my batch and test partitions to overlap either partially or
> completely? At any one time the test partition will only take a few
> resources out of the pool of normal compute nodes?

We originally had a separate test partition, but have now moved to a
'short' QOS on the main batch partition which increases the priority for
a limited number of jobs with a short maximum run-time.  If you have
overlapping batch and test partitions, the batch jobs can clog the test
nodes, although you could have different priorities for each partition.
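In case it helps, this is roughly what such a 'short' QOS looks like (the numbers here are invented, not our actual limits):

```
# create the QOS with a priority boost, a short time limit and a
# cap on concurrent jobs per user
sacctmgr add qos short
sacctmgr modify qos short set Priority=10000 MaxWall=00:30:00 MaxJobsPerUser=2

# slurm.conf: let the batch partition accept it
PartitionName=batch Nodes=node[001-100] AllowQos=normal,short
```

Users then submit their test jobs with 'sbatch --qos=short'.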

> Another issue is that we do have a large mix of small and large
> jobs. In our torque/moab cluster we make use of the XFACTOR
> component to make sure that small jobs don’t get starved out of the
> system. I don’t think there is an analog of this parameter in slurm,
> and so I need to understand how to enable smaller jobs to compete
> with the larger jobs and not get starved out. Using slurm I
> understand that the backfill mechanism and priority flags like
> PriorityFavorSmall=NO and SMALL_RELATIVE_TO_TIME can help the
> situation. What are your thoughts?

We also have a very heterogeneous job mix, but don't have any problem
with small jobs starving.  On the contrary, as we share nodes, small
jobs with moderate memory requirements have an advantage, as there are
always a few cores available somewhere in the cluster, even when it is
quite full.  For this reason we favour large jobs slightly.
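If you do want to nudge the balance one way or the other, the relevant knobs in slurm.conf are the multifactor priority weights (the values below are only illustrative, not our configuration):

```
PriorityType=priority/multifactor
PriorityFavorSmall=NO         # with a non-zero size weight, larger jobs score higher
PriorityWeightJobSize=1000
PriorityWeightAge=2000
PriorityWeightFairshare=10000
```

Together with backfill (SchedulerType=sched/backfill), short small jobs still slot into the gaps left while large jobs wait for resources.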

> Your advice on the above points would be appreciated, please.
>
> Best regards,
>
> David

Cheers,

Loris

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email [email protected]
