We have just started using QoS here and I was curious about a few features which would make our lives easier.

1. Spillover/overflow: Essentially, if you use up one QoS, your jobs would spill over into your next lower priority QoS. For instance, if you had used up your group's QoS but still had pending jobs and there were idle cycles, the jobs pending for your high priority QoS would run under the low priority normal QoS instead.

2. Gres: Adding a limit on the number of GPUs or other Gres quantities that can be used under a QoS.

3. Requeue/No Requeue: There are some partitions where we want to allow a QoS to requeue jobs, and others where we don't. For instance, we have a general queue which we don't want requeue on, but we also have a backfill queue that we do permit it on. If the QoS could kill the backfill jobs first to find space, and just wait on the general queue, that would be great. We haven't experimented with QoS requeue yet, but we may in the future, so this is just looking forward.
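To make the spillover idea in item 1 concrete, here is a rough sketch of the selection logic we have in mind. It is only an illustration in Python, not anything Slurm provides today: the `QoS` class, its fields, and `pick_qos` are hypothetical names, and a single CPU limit stands in for whatever limits a real QoS carries.

```python
from dataclasses import dataclass


@dataclass
class QoS:
    """Hypothetical stand-in for a QoS with a single CPU limit."""
    name: str
    priority: int      # higher number = higher priority
    cpu_limit: int     # maximum CPUs usable under this QoS
    cpus_in_use: int = 0

    def has_room(self, cpus: int) -> bool:
        return self.cpus_in_use + cpus <= self.cpu_limit


def pick_qos(qos_chain, cpus):
    """Return the highest priority QoS with room for the job, or None.

    This is the spillover behaviour: try the high priority QoS first,
    and fall through to lower priority ones when it is used up.
    """
    for qos in sorted(qos_chain, key=lambda q: q.priority, reverse=True):
        if qos.has_room(cpus):
            return qos
    return None  # nothing has room, so the job stays pending


# Example: the group QoS is nearly full, the normal QoS is empty.
group = QoS("group", priority=100, cpu_limit=64, cpus_in_use=60)
normal = QoS("normal", priority=10, cpu_limit=1000)

small_job = pick_qos([normal, group], cpus=4)   # still fits in "group"
large_job = pick_qos([normal, group], cpus=8)   # spills over to "normal"
```

A real implementation would also have to check that idle cycles actually exist before spilling over, which this sketch leaves out.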

We were also wondering if jobs asking for Gres could get higher priority on those nodes, such that they can grab the GPUs and leave the CPUs for everyone else. After all, the Gres resources are usually scarcer than the CPU resources, and we would hate for a Gres resource to sit idle just because all the CPU jobs took up the slots.
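As a rough illustration of the Gres priority idea, the sketch below orders pending jobs for a GPU node so that jobs requesting GPUs sort ahead of CPU-only jobs. Everything in it is assumed for the example: the `Job` class, `order_for_node`, and the `gres_bonus` value are hypothetical and do not correspond to existing Slurm options.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical pending job."""
    name: str
    base_priority: int
    gpus_requested: int = 0


def effective_priority(job, node_has_gpus, gres_bonus=1000):
    """Add a bonus (assumed value) to Gres jobs, but only on Gres nodes."""
    if node_has_gpus and job.gpus_requested > 0:
        return job.base_priority + gres_bonus
    return job.base_priority


def order_for_node(jobs, node_has_gpus):
    """Return jobs sorted from highest to lowest effective priority."""
    return sorted(
        jobs,
        key=lambda j: effective_priority(j, node_has_gpus),
        reverse=True,
    )


pending = [
    Job("cpu_only", base_priority=500),
    Job("gpu_job", base_priority=100, gpus_requested=2),
]

on_gpu_node = [j.name for j in order_for_node(pending, node_has_gpus=True)]
on_cpu_node = [j.name for j in order_for_node(pending, node_has_gpus=False)]
```

On the GPU node the GPU job goes first despite its lower base priority; on a CPU-only node the normal ordering is unchanged.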

-Paul Edmon-
