Hi,

Suppose I have a 100-node cluster with ~5% of the nodes down at any given time (maintenance, hardware failure, ...).

One of the projects requires exclusive use of 5 nodes, and should also be able to use the entire cluster when it is available (when other projects aren't running).

I can do this easily if I maintain a static list of the exclusive nodes in slurm.conf:

PartitionName=public Nodes=tux0[01-95] Default=YES
PartitionName=special Nodes=tux[001-100] Default=NO

I would then allow only that project to use partition special.
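
For concreteness, the restriction could look roughly like the line below. This is an untested sketch: it assumes the project's jobs run under a Slurm account, and the account name "projx" is made up for illustration. AllowGroups could be used in the same way if access is managed by Unix group instead.

PartitionName=special Nodes=tux[001-100] Default=NO AllowAccounts=projx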

However, since ~5% of the nodes are down at any given time, I'd like the 5 exclusive nodes to be chosen dynamically rather than fixed in a static list.
Any suggestions?

The project is serial and deployed as an array of single-node jobs, so I can run it even when the other 95 nodes are full.

Thanks,
--Dani_L.
