Thanks for the info. The spillover feature would be handy, but I can
definitely see the difficulty in coding it. That said, a similar
mechanism already exists for partitions: you can list multiple
partitions and the job executes in whichever one can start it first.
Could that be imported into QoS?
-Paul Edmon-
On 5/21/2014 6:52 PM, je...@schedmd.com wrote:
Quoting Paul Edmon ped...@cfa.harvard.edu:
We have just started using QoS here and I was curious about a few
features which would make our lives easier.
1. Spillover/overflow: Essentially, if you use up one QoS you would
spill over into your next lower-priority QoS. For instance, if you had
used up your group's QoS but still had jobs pending and there were idle
cycles, the jobs pending in your high-priority QoS would move to the
low-priority normal QoS.
There isn't a great way to do this today. Each job is associated with
a single QOS.
One possibility would be to submit one job to each QOS and then
whichever job started first would kill the others. A job submit plugin
could probably handle the multiple submissions (e.g. if the --qos
option has multiple comma-separated names, then submit one job for
each QOS). Offhand I'm not sure what would be a good way to identify
and purge the extra jobs. Some variation of the --depend=singleton
logic would probably do the trick.
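As a rough sketch of that wrapper idea (everything here is an assumption,
not an existing tool: the script, the job-name scheme, and the purge command
prepended to the batch script), one could submit one copy of the job per QoS
and let whichever copy starts first cancel its still-pending siblings:

    #!/usr/bin/env python
    # Hypothetical spillover wrapper: one sbatch submission per QoS,
    # all sharing a unique job name; the first copy to start purges
    # the still-pending copies with scancel.
    import subprocess, sys, uuid

    def submit_spillover(script_path, qos_list):
        name = "spill-" + uuid.uuid4().hex[:8]   # unique name shared by every copy
        with open(script_path) as f:
            lines = f.read().splitlines(True)
        # Keep the shebang first, then insert a purge of pending siblings.
        purge = 'scancel --name=%s --state=PENDING --user="$USER"\n' % name
        patched = lines[0] + purge + "".join(lines[1:])
        for qos in qos_list:
            subprocess.run(["sbatch", "--job-name=" + name, "--qos=" + qos],
                           input=patched, text=True, check=True)

    if __name__ == "__main__":
        submit_spillover(sys.argv[1], sys.argv[2].split(","))

There is an obvious race if two copies start in the same scheduling pass,
which is why doing this inside slurmctld with singleton-style logic would
be more robust.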
2. Gres: Adding limits on the number of GPUs or other Gres quantities
that can be used under a QoS.
This has been discussed, but not implemented yet.
3. Requeue/No Requeue: There are some partitions where we want to allow
a QoS to requeue jobs and others where we don't. For instance, we have
a general queue on which we don't want requeueing, but we also have a
backfill queue on which we do permit it. If the QoS could kill the
backfill jobs first to find space, and just wait on the general queue,
that would be great. We haven't experimented with QoS requeue but we
may in the future, so this is just looking forward.
You can configure different preemption mechanisms and preempt by
either QoS or partition. Take a look at:
http://slurm.schedmd.com/preempt.html
For example, you might enable QoS high to requeue jobs in QoS low,
but wait for jobs in QoS medium.
There is no mechanism to configure QoS high to preempt jobs by
partition.
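For reference, a minimal configuration along those lines might look like
this (the QoS names high/medium/low are placeholders; the preempt.html page
above covers the options in detail):

    # slurm.conf
    PreemptType=preempt/qos
    PreemptMode=REQUEUE

    # Allow QoS high to preempt (requeue) jobs running under QoS low.
    # Since medium is not in high's Preempt list, high jobs just wait on it.
    sacctmgr modify qos high set Preempt=low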
We were also wondering if jobs asking for Gres could get higher
priority on those nodes, such that they can grab the GPUs and leave
the CPUs for everyone else. After all, the Gres resources are usually
scarcer than the CPU resources, and we would hate for a Gres resource
to sit idle just because CPU-only jobs took up all the slots.
-Paul Edmon-
This has also been discussed, but not implemented yet. One option
might be to use a job_submit plugin to adjust a job's nice option
based upon GRES.
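As a very rough user-side illustration of that idea (the real mechanism
would be a job_submit plugin; this wrapper and its nice value are purely
hypothetical), one could deprioritize CPU-only submissions with a positive
nice value, which requires no special privilege, so that GRES jobs sort
ahead of them in the queue:

    #!/usr/bin/env python
    # Hypothetical wrapper around sbatch: add a positive --nice to jobs
    # that do not request any GRES, so GRES jobs get relatively higher
    # priority on the shared nodes.
    import subprocess, sys

    def submit(args, cpu_only_nice=100):
        wants_gres = any(a.startswith("--gres") for a in args)
        if not wants_gres:
            args = ["--nice=%d" % cpu_only_nice] + args
        subprocess.run(["sbatch"] + args, check=True)

    if __name__ == "__main__":
        submit(sys.argv[1:])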
There is a partition parameter, MaxCPUsPerNode, that might be useful to
limit the number of CPUs consumed on each node by each partition. For
that to work well you would probably need a separate partition/queue for
GPU jobs, so it may not fit your setup.
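For completeness, MaxCPUsPerNode is set per partition in slurm.conf; e.g.,
on hypothetical 16-core GPU nodes one might cap the CPU partition so a few
cores per node stay free for GPU jobs:

    # slurm.conf (node names and limits are placeholders)
    PartitionName=general Nodes=gpu[01-08],node[001-064] MaxCPUsPerNode=12 Default=YES
    PartitionName=gpu     Nodes=gpu[01-08]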
Let me know if you need help pursuing these options.
Moe Jette