We have a heterogeneous cluster with several different types of node.
We also have a couple of policies:
i) Jobs won't run across multiple nodes of a given type if they can fit
into a single node of that type.
ii) Jobs that run across multiple nodes have exclusive access to those
nodes, while jobs that run within a single node share it if sufficient
resources are available.

A consequence of this is that some jobs get exclusive access to some
nodes but not to others. Nor is it simply a case of some node types
being bigger than others in every respect.
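To make that consequence concrete, here is a rough Python sketch applying policy i) to two node types. The type names typeA/typeB, their slot and memory figures, and the assumption that memory is requested per slot are made up for illustration; the point is only that the number of nodes a job needs depends on more than one resource.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    # Hypothetical per-node capacities; in practice these would come
    # from the host complex_values (slots and a memory complex).
    name: str
    slots: int
    mem_gb: int


def nodes_needed(node: NodeType, job_slots: int, mem_per_slot_gb: int):
    """Smallest number of nodes of this type that can hold the job,
    or None if even one slot's memory does not fit on a node."""
    # Slots one node can host, limited by both slot count and memory.
    per_node = min(node.slots, node.mem_gb // mem_per_slot_gb)
    if per_node == 0:
        return None
    return (job_slots + per_node - 1) // per_node  # ceiling division


TYPE_A = NodeType("typeA", slots=16, mem_gb=64)
TYPE_B = NodeType("typeB", slots=32, mem_gb=48)

for job_slots, mem in [(24, 1), (8, 7)]:
    for node in (TYPE_A, TYPE_B):
        n = nodes_needed(node, job_slots, mem)
        if n is None:
            mode = "cannot run"
        else:
            mode = "exclusive" if n > 1 else "shared"
        print(f"{job_slots} slots x {mem} GB on {node.name}: {n} node(s), {mode}")
```

With these figures the 24-slot, 1 GB/slot job spans two typeA nodes (exclusive) but fits on one typeB node (shared), while the 8-slot, 7 GB/slot job is the other way round, so neither type dominates.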

The way we implement this is to have one exclusive resource per node
type, which is declared in the complex_values of each node of that type
and on the queues of nodes of all other types. We declare the total
number of slots we want running on each host in its complex_values. On
each node we have one queue that runs serial and $pe_slots PEs for every
2 slots defined in the host's complex_values. The JSV works out how many
nodes of each type are required and requests exclusive resources and
other variables appropriately to route the job. To simplify the
calculation, we suppress soft requests and PE ranges.
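The core of the JSV calculation could look roughly like the Python sketch below. It assumes a fixed slot count (no PE range), no soft requests, memory requested per slot, and hypothetical exclusive resources named excl_<type>. It only builds the list of hard requests; the JSV protocol itself, the other routing variables, and node types the job cannot fit on at all are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    name: str     # hypothetical node-type label
    slots: int    # slots per node (from host complex_values)
    mem_gb: int   # memory per node, in whole GB


def nodes_needed(node, job_slots, mem_per_slot_gb):
    """Nodes of this type needed for the job, or None if it cannot fit."""
    per_node = min(node.slots, node.mem_gb // mem_per_slot_gb)
    if per_node == 0:
        return None  # one slot's memory exceeds a whole node
    return (job_slots + per_node - 1) // per_node


def exclusive_requests(node_types, job_slots, mem_per_slot_gb):
    """Hard requests to add: one hypothetical excl_<type>=true for each
    node type on which the job would have to span several nodes."""
    requests = []
    for node in node_types:
        n = nodes_needed(node, job_slots, mem_per_slot_gb)
        if n is not None and n > 1:
            requests.append(f"excl_{node.name}=true")
    return requests


TYPES = [NodeType("typeA", 16, 64), NodeType("typeB", 32, 48)]
print(",".join(exclusive_requests(TYPES, 24, 1)))
print(",".join(exclusive_requests(TYPES, 40, 1)))
```

Here a 24-slot job only needs excl_typeA, whereas a 40-slot job spans several nodes of both types and so picks up both exclusive requests.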

Am I missing a trick? Is there a simpler way to do this?

William
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
