I agree that having multiple partitions will decrease the efficiency of the scheduler. That said, if you have to do it, you have to do it. Using features is a good way to go if people need specific hardware. I could see having multiple partitions so you can charge differently for each generation of hardware, as run times will invariably be different. Still, if that isn't a concern, just have a single queue.
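For example (just a sketch with made-up partition and node names; the limits would need to match your site), splitting by hardware generation in slurm.conf might look like:

PartitionName=gen1 Nodes=gen1-[001-032] Default=YES MaxTime=7-00:00:00 State=UP
PartitionName=gen2 Nodes=gen2-[001-032] MaxTime=7-00:00:00 State=UP

versus a single PartitionName line covering all the nodes.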

For multifactor I would turn on fairshare and age. JobSize really isn't useful unless you have people running multicore jobs and you want to prioritize or deprioritize those.
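As a rough sketch (the weights below are placeholders, not recommendations), the relevant slurm.conf settings would be along the lines of:

PriorityType=priority/multifactor
PriorityDecayHalfLife=14-0       # how quickly past usage stops counting against you
PriorityWeightFairshare=100000   # make fairshare the dominant factor
PriorityWeightAge=1000           # jobs slowly gain priority while they wait
PriorityMaxAge=7-0               # age contribution tops out after a week
PriorityWeightJobSize=0          # leave job size out, per the above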

If you end up in a multipartition scenario then I recommend having a backfill partition that underlies all the other partitions and setting up REQUEUE on that partition. That way people can farm idle cycles. This is especially good for people who are hardware agnostic and don't really care when their jobs get done, but rather just have a ton of work that can be interrupted at any moment. That's what we do here, and we have 110 partitions. Our backfill queue does a pretty good job of picking up the idle cores, but with that many partitions there are still structural inefficiencies, so we never get above about 70% usage of our hardware.
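For what it's worth, a minimal sketch of that kind of setup (partition names, node lists and preemption settings are purely illustrative):

PreemptType=preempt/partition_prio
PreemptMode=REQUEUE
PartitionName=backfill Nodes=ALL PriorityTier=1 MaxTime=7-00:00:00 State=UP
PartitionName=gen1 Nodes=gen1-[001-032] PriorityTier=10 MaxTime=7-00:00:00 State=UP

Jobs in the backfill partition get requeued whenever a higher-tier partition wants the nodes, so they need to be safe to interrupt.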

So just keep that in mind when you are setting things up. More partitions mean more structural inefficiency, but they do give you other benefits, such as isolating hardware for specific uses. It really depends on what you need. I highly recommend experimenting to figure out what fits you and your users best.

-Paul Edmon-

On 1/16/2017 10:16 AM, Loris Bennett wrote:
David WALTER <david.wal...@ens.fr> writes:

Dear Loris,

Thanks for your response !

I'm going to look at these features in slurm.conf.  I only configured
the CPUs, Sockets, etc. per node.  Do you have any example or link to
explain how it works and what I can use?
It's not very complicated.  A feature is just a label, so if you had
some nodes with Intel processors and some with AMD, you could attach the
features, e.g.

NodeName=node[001,002] Procs=12 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 RealMemory=42000 State=unknown Feature=intel
NodeName=node[003,004] Procs=12 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 RealMemory=42000 State=unknown Feature=amd

Users then just request the required CPU type in their batch scripts as
a constraint, e.g.:

#SBATCH --constraint="intel"

My goal is to respond to people's needs and launch their jobs as fast as
possible without losing time when one partition is idle while the
others are fully loaded.
The easiest way to avoid the problem you describe is to just have one
partition.  If you have multiple partitions, the users have to
understand what the differences are so that they can choose sensibly.

That's why I thought the fair share factor was the best solution.
Fairshare won't really help you with the problem that one partition
might be full while another is empty.  It will just affect the ordering
of jobs in the full partition, although the weight of the partition term
in the priority expression can affect the relative attractiveness of the
partitions.
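(For reference, that term is controlled by PriorityWeightPartition in
slurm.conf together with a per-partition PriorityJobFactor, e.g., with
purely illustrative values:

PriorityWeightPartition=1000
PartitionName=new_hw Nodes=node[001,002] PriorityJobFactor=10
PartitionName=old_hw Nodes=node[003,004] PriorityJobFactor=1

so jobs in new_hw would get a larger partition contribution to their
priority.)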

In general, however, I would suggest you start with a simple set-up.
You can always add to it later to address specific issues as they arise.
For instance, you could start with one partition and two QOS: one for
normal jobs and one for test jobs.  The latter could have a higher
priority, but only a short maximum run-time and possibly a low maximum
number of jobs per user.
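As a rough sketch (the QOS name and limits here are just examples), the
test QOS could be created with sacctmgr:

sacctmgr add qos test
sacctmgr modify qos test set Priority=100 MaxWall=00:30:00 MaxJobsPerUser=2

together with something like AccountingStorageEnforce=limits,qos in
slurm.conf so the limits are actually enforced.  Users would then submit
their test jobs with

#SBATCH --qos=test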

Cheers,

Loris
