Thanks for the info. The spillover feature would be handy, but I can definitely see the difficulties in coding it. A similar mechanism already exists for partitions, though: you can list multiple partitions and the job will run in whichever one can start it first. Could that be imported into QoS?
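
For reference, the partition mechanism I mean is the comma-separated list accepted by the --partition option; the job runs in whichever listed partition can start it first. The partition names here are just examples:

    sbatch --partition=general,backfill jobscript.sh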

-Paul Edmon-

On 5/21/2014 6:52 PM, [email protected] wrote:

Quoting Paul Edmon <[email protected]>:

We have just started using QoS here and I was curious about a few features which would make our lives easier.

1. Spillover/overflow: Essentially, if you use up one QoS, your jobs would spill over into your next-lower-priority QoS. For instance, if your group's QoS were used up but you still had jobs pending and there were idle cycles, the jobs pending under your high-priority QoS would run under the lower-priority normal QoS instead.

There isn't a great way to do this today. Each job is associated with a single QOS.

One possibility would be to submit one job to each QOS and then whichever job started first would kill the others. A job submit plugin could probably handle the multiple submissions (e.g. if the --qos option has multiple comma-separated names, then submit one job for each QOS). Offhand I'm not sure what would be a good way to identify and purge the extra jobs. Some variation of the "--depend=singleton" logic would probably do the trick.
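
Just as a rough sketch of the manual version of that idea (not the singleton variation; the QOS, script, and job names below are only examples): submit the same script once per QOS under a shared job name, and have the script cancel its still-pending copies as soon as one of them starts. There is an obvious race if two copies start at nearly the same time, so treat this as illustrative only.

    #!/bin/bash
    # job.sh -- whichever copy starts first cancels the copies of this
    # job (same user, same job name) still pending under the other QOS.
    scancel --user="$USER" --name="$SLURM_JOB_NAME" --state=PENDING
    # ... real work follows ...

    # Submit one copy per QOS under a shared job name:
    sbatch --qos=high   --job-name=spill_demo job.sh
    sbatch --qos=normal --job-name=spill_demo job.sh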


2. GRES: Adding limits to a QoS on the number of GPUs or other GRES that can be used.

This has been discussed, but not implemented yet.


3. Requeue/No Requeue: There are some partitions where we want to allow QoS-based requeue and others where we don't. For instance, we have a general queue on which we don't want requeue, but we also have a backfill queue on which we do permit it. If the QoS could kill the backfill jobs first to find space, and just wait on the general queue, that would be great. We haven't experimented with QoS requeue yet, but we may in the future, so this is just looking forward.

You can configure different preemption mechanisms and preempt by either QoS or partition. Take a look at:
http://slurm.schedmd.com/preempt.html
For example, you might enable QoS "high" to requeue jobs in QoS "low", but wait for jobs in QoS "medium". There is no mechanism to configure QoS "high" to preempt jobs by partition.
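
As a rough configuration sketch of that example (exact settings depend on your site, so treat this as illustrative rather than a drop-in config):

    # slurm.conf: preempt based on QOS, requeue preempted jobs
    PreemptType=preempt/qos
    PreemptMode=REQUEUE

    # sacctmgr: let "high" preempt "low"; "medium" is not in the Preempt
    # list, so jobs in "high" simply wait for it.
    sacctmgr modify qos high set Preempt=low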

We were also wondering if jobs asking for GRES could get higher priority on those nodes, so that they can grab the GPUs and leave the CPUs for everyone else. After all, GRES resources are usually scarcer than CPUs, and we would hate for a GRES resource to sit idle just because CPU jobs took up all the slots.


This has also been discussed, but not implemented yet. One option might be to use a job_submit plugin to adjust a job's "nice" value based upon GRES. There is also a partition parameter, MaxCPUsPerNode, that can limit the number of CPUs consumed on each node by jobs in a given partition. You would probably need a separate partition/queue for GPU jobs for that to work well, so it may not work for you.
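
For illustration only (node, partition, and core counts below are made up), the MaxCPUsPerNode idea would look something like this in slurm.conf, with two partitions sharing the GPU nodes:

    # gpu[01-04]: 16 cores and 2 GPUs each (example numbers).
    # Jobs in the "shared" partition may use at most 12 cores per node,
    # leaving 4 cores on each node for jobs in the "gpu" partition.
    PartitionName=shared Nodes=gpu[01-04] MaxCPUsPerNode=12
    PartitionName=gpu    Nodes=gpu[01-04]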

Let me know if you need help pursuing these options.

Moe Jette
