You might consider a QoS for this. It may not do everything you want,
but it will give you the flexibility you're looking for.
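One possible reading of the QoS suggestion, sketched under assumptions that do not come from the thread: the other projects run under a QoS hypothetically named "public", the Slurm version supports TRES limits (GrpTRES, 15.08+), and node states look like sinfo's compact output. The idea is to cap that QoS at (usable nodes - 5), so about five healthy nodes always stay free for the special project, whichever nodes happen to be up.

```python
# Hypothetical sketch: cap the general-purpose QoS at (usable nodes - reserved)
# so that a few healthy nodes always remain free for the special project.
# Assumptions: QoS name "public" is hypothetical; GrpTRES needs Slurm 15.08+.

RESERVED_FOR_SPECIAL = 5
# Compact node states (as printed by `sinfo -N -o "%N %t"`) that cannot take new jobs.
UNUSABLE = {"down", "drain", "drng", "fail", "maint"}


def public_node_cap(node_states, reserved=RESERVED_FOR_SPECIAL):
    """Max nodes the public QoS may hold, given a {node: state} mapping."""
    # sinfo appends flags such as '*' (not responding) to the state name.
    usable = sum(
        1 for state in node_states.values()
        if state.lower().rstrip("*~#!%$@+^-") not in UNUSABLE
    )
    return max(usable - reserved, 0)


def qos_update_command(cap, qos="public"):
    """Build (not run) the sacctmgr call; -i commits without prompting."""
    return ["sacctmgr", "-i", "modify", "qos", qos, "set", f"GrpTRES=node={cap}"]


if __name__ == "__main__":
    states = {f"tux{i:03d}": "idle" for i in range(1, 101)}
    for name in ("tux013", "tux042", "tux077", "tux088", "tux099"):
        states[name] = "down*"
    cap = public_node_cap(states)
    print(cap)  # 90
    print(" ".join(qos_update_command(cap)))
```

Run periodically (e.g. from cron), this would let the special project, under its own QoS, still use the whole cluster when public jobs leave nodes idle. For the cap to apply, public jobs must actually run under that QoS (for instance via a partition QOS= setting or users' default QoS); that wiring is also an assumption here.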
-Paul Edmon-
On 11/19/2015 04:49 AM, Daniel Letai wrote:
Hi,
Suppose I have a 100-node cluster with ~5% of nodes down at any given
time (maintenance/hw failure/...).
One of the projects requires exclusive use of 5 nodes, and must also be
able to use the entire cluster when available (when other projects aren't running).
I can do this easily if I maintain a static list of the exclusive
nodes in slurm.conf:
PartitionName=public Nodes=tux0[01-95] Default=YES
PartitionName=special Nodes=tux[001-100] Default=NO
and allowing only that project to use the special partition.
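The two hostlists above differ by exactly five nodes. A small Python check makes the static split explicit; expand_simple_hostlist is a hypothetical helper that only understands the single prefix[lo-hi] form used here (`scontrol show hostnames` is Slurm's real expander).

```python
import re


def expand_simple_hostlist(expr):
    """Expand a hostlist with one bracketed numeric range, e.g. 'tux0[01-95]'.

    Hypothetical helper: handles only 'prefix[lo-hi]'; real Slurm hostlists
    (commas, several ranges) are richer.
    """
    m = re.fullmatch(r"(\S*)\[(\d+)-(\d+)\]", expr)
    if m is None:
        return [expr]  # plain hostname, no range
    prefix, lo, hi = m.groups()
    width = len(lo)  # keep zero padding, e.g. '01' -> width 2
    return [f"{prefix}{i:0{width}d}" for i in range(int(lo), int(hi) + 1)]


public = expand_simple_hostlist("tux0[01-95]")
special = expand_simple_hostlist("tux[001-100]")
# Nodes only the special partition can reach: the statically reserved ones.
exclusive = sorted(set(special) - set(public))
print(exclusive)  # ['tux096', 'tux097', 'tux098', 'tux099', 'tux100']
```

Because that reserved set is fixed, every one of those five nodes that is down shrinks the project's guaranteed share.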
However, since about 5% of the nodes are down at any time, I'd like the
5 exclusive nodes to be chosen dynamically rather than fixed.
Any suggestions?
The project is serial and deployed as an array of single-node jobs, so I
can run it even when the other 95 nodes are full.
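One way to get a dynamic exclusive set, sketched here as an assumption rather than a known Slurm feature: periodically pick 5 usable nodes, preferring last round's picks to avoid churn, and rebuild the public partition from every other node with `scontrol update`. The special partition keeps Nodes=tux[001-100], so the project can still spread across the cluster. Node states are assumed to be sinfo compact states, and the command is only built, not executed.

```python
# Hypothetical sketch: choose the exclusive nodes from currently usable ones,
# then give the public partition every other node. Partition name "public"
# matches the slurm.conf above; the scontrol command is built, not executed.

UNUSABLE = {"down", "drain", "drng", "fail", "maint"}


def is_usable(state):
    # Strip sinfo flag suffixes such as '*' (not responding) before comparing.
    return state.lower().rstrip("*~#!%$@+^-") not in UNUSABLE


def pick_exclusive(node_states, previous, n_exclusive=5):
    """Return up to n_exclusive usable nodes, reusing earlier picks when possible."""
    usable = sorted(n for n, s in node_states.items() if is_usable(s))
    # Keep still-healthy nodes from the previous round to avoid needless churn.
    keep = [n for n in sorted(previous) if n in usable][:n_exclusive]
    # Top up from the highest-numbered usable nodes not already kept.
    extra = [n for n in reversed(usable) if n not in keep]
    return sorted(keep + extra[: n_exclusive - len(keep)])


def public_update_command(node_states, exclusive):
    """Build (not run) an scontrol call giving public every non-exclusive node."""
    excluded = set(exclusive)
    public = sorted(n for n in node_states if n not in excluded)
    return ["scontrol", "update", "PartitionName=public", "Nodes=" + ",".join(public)]


if __name__ == "__main__":
    states = {f"tux{i:03d}": "idle" for i in range(1, 101)}
    states["tux098"] = "down*"
    previous = ["tux096", "tux097", "tux098", "tux099", "tux100"]
    exclusive = pick_exclusive(states, previous)
    print(exclusive)  # ['tux095', 'tux096', 'tux097', 'tux099', 'tux100']
    print(public_update_command(states, exclusive)[:3])
```

Changes made with `scontrol update` are not written back to slurm.conf, so a script like this would presumably run periodically (e.g. from cron) rather than once.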
Thanks,
--Dani_L.