Can you elaborate a little? I'm not sure what kind of QoS will help, nor how to implement one that will satisfy the requirements.

On 11/19/2015 04:52 PM, Paul Edmon wrote:

You might consider a QoS for this. It may not do everything you want, but it will give you some flexibility.
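
A rough, untested sketch of one QoS-based approach, assuming Slurm 15.08 or later (partition QOS and TRES limits) with accounting through slurmdbd. The QoS name public_cap and the cap of 90 nodes are illustrative assumptions, not values from this thread:

```
# Created once with sacctmgr (older releases use GrpNodes=90 instead of GrpTRES):
#   sacctmgr add qos public_cap
#   sacctmgr modify qos public_cap set GrpTRES=node=90
AccountingStorageEnforce=limits,qos
PartitionName=public Nodes=tux[001-100] Default=YES QOS=public_cap
# The special partition keeps Nodes=tux[001-100] and stays restricted to the project.
```

Here both partitions span every node, so no fixed list is set aside. The cap limits how many nodes public jobs may hold in aggregate, which would leave about 5 of the ~95 healthy nodes for the project. The cap itself is static, so the number left free still varies with how many nodes are down, and node limits are summed per job, which matters if jobs share nodes.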

-Paul Edmon-

On 11/19/2015 04:49 AM, Daniel Letai wrote:

Hi,

Suppose I have a 100-node cluster with ~5% of nodes down at any given time (maintenance/hw failure/...).

One of the projects requires exclusive use of 5 nodes, and must also be able to use the entire cluster when it is available (when other projects aren't running).

I can do this easily if I maintain a static list of the exclusive nodes in slurm.conf:

PartitionName=public Nodes=tux0[01-95] Default=YES
PartitionName=special Nodes=tux[001-100] Default=NO

and allow only that project to use the special partition.
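
One way that restriction might be written, assuming the project submits under a single Slurm account (the account name projx is hypothetical):

```
# AllowAccounts limits submission to the listed accounts; projx is a placeholder.
PartitionName=special Nodes=tux[001-100] Default=NO AllowAccounts=projx
```

AllowGroups would be the equivalent if access is managed by Unix group rather than by account.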

However, because about 5% of the nodes are down at any time, I'd like the 5 exclusive nodes to be assigned dynamically rather than taken from a fixed list.
Any suggestions?

The project is serial and deployed as an array of single-node jobs, so I can run it even when the other 95 nodes are full.

Thanks,
--Dani_L.
