We have a heterogeneous cluster with several different types of node. We also have a couple of policies: i) Jobs won't run across multiple nodes of a given type if they can fit into a single node of that type. ii) Jobs that run across multiple nodes have exclusive access to those nodes, while jobs that run within a single node share it if sufficient resources are available.
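To make the two policies concrete, here is a minimal Python sketch of the per-type decision. It is only an illustration: the node type names, the slot counts and the `excl_` resource prefix are hypothetical, and it assumes slots are the only resource that decides whether a job fits in one node.

```python
from math import ceil

# Hypothetical node types and slots per node; not taken from our real cluster.
NODE_TYPES = {"small": 16, "big": 64}


def plan(slots, node_types=NODE_TYPES):
    """Return {type: (nodes_needed, exclusive)} for a job asking for `slots`.

    Policy i: a job that fits in one node of a type uses exactly one node.
    Policy ii: a job spanning several nodes of a type gets them exclusively.
    """
    result = {}
    for name, per_node in node_types.items():
        nodes = ceil(slots / per_node)
        result[name] = (nodes, nodes > 1)
    return result


def exclusive_requests(slots, node_types=NODE_TYPES, prefix="excl_"):
    """Names of the (hypothetical) per-type exclusive resources to request."""
    decisions = plan(slots, node_types)
    return sorted(prefix + name for name, (_, excl) in decisions.items() if excl)
```

For example, with these made-up sizes a 24-slot job would need two "small" nodes exclusively but would share a single "big" node, so only `excl_small` would be requested.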
A consequence of this is that some jobs get exclusive access to some nodes but not to others. It's not a simple case of some nodes being bigger in all respects than others, either.

The way we implement this is to have one exclusive resource per node type, which is declared in the complex_values of each node of that type and on the queues of nodes of all other types. We declare the total number of slots we want running on a per-host basis. On each node we have one queue that runs serial jobs and $pe_slots PEs for every 2 slots defined in the host's complex_values. The JSV works out how many nodes of each type are required, then requests exclusive resources and other variables appropriately to route the job. To simplify the calculation we suppress soft requests and PE ranges.

Am I missing a trick? Is there a simpler way to do this?

William
