[gridengine users] "Packing" jobs on nodes

James Gladden Tue, 17 May 2011 09:37:07 -0700

We have a cluster consisting of 24 eight-processor nodes. The load on the cluster is dominated by parallel jobs which typically occupy between one and three nodes (8, 16, or 24 processors). However, there are a few users that run smaller jobs: some single-processor "serial" jobs and a few parallel jobs that use two or four processors. Because the cluster is heavily used, this mix of jobs leads to conflicts.

Specifically, when dispatching a job requiring fewer than eight processors, the SGE scheduler tends to assign it to the "least-loaded" node with slots available. The result is that such jobs get scattered around the cluster in a manner that then blocks scheduling of jobs requiring entire nodes. We would prefer that jobs requiring fewer than eight processors get dispatched instead to the "most-loaded" node that has the required number of available slots. This would cause the small jobs to get "packed" onto a few nodes rather than scattered around the cluster. While this is somewhat counter to usual scheduling practice, I believe it makes sense in our environment.
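
To make the difference concrete, here is a minimal Python sketch of the two placement policies. It is only an illustration, not SGE code: the node names, the four-node cluster, and the job sizes are made up, and "least-loaded" is simplified to "most free slots" (SGE's default load formula uses np_load_avg instead).

```python
def pick_node(free_slots, needed, pack):
    """Return the node a job of `needed` slots would go to, or None.

    free_slots: dict mapping a (hypothetical) node name to its free slot count.
    pack=True  -> "most-loaded" node that still fits the job (fewest free slots).
    pack=False -> "least-loaded" node (most free slots), a simplification of
                  what the scheduler does by default.
    """
    # Sort by name so that ties are broken deterministically.
    candidates = sorted(n for n, free in free_slots.items() if free >= needed)
    if not candidates:
        return None
    if pack:
        return min(candidates, key=lambda n: free_slots[n])
    return max(candidates, key=lambda n: free_slots[n])


def place_jobs(num_nodes, slots_per_node, job_sizes, pack):
    """Place jobs one by one and return the resulting free-slot map."""
    free = {f"node{i:02d}": slots_per_node for i in range(1, num_nodes + 1)}
    for size in job_sizes:
        node = pick_node(free, size, pack)
        if node is not None:
            free[node] -= size
    return free


def empty_nodes(free_slots, slots_per_node):
    """Count nodes that are still completely free for whole-node jobs."""
    return sum(1 for free in free_slots.values() if free == slots_per_node)


if __name__ == "__main__":
    small_jobs = [2, 2, 2, 2]  # four two-processor jobs
    for pack in (False, True):
        free = place_jobs(4, 8, small_jobs, pack)
        label = "packed" if pack else "scattered"
        print(label, free, "whole nodes left:", empty_nodes(free, 8))
```

In this toy case, scattering leaves one small job on each of the four nodes, so no node remains available for an eight-processor job, while packing puts all four small jobs on a single node and keeps three nodes whole.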

Unfortunately, I have not been able to figure out how to get SGE to do this. I have tried setting the queue sorting method to "Sort by sequence number." This helps, inasmuch as a series of small jobs submitted together will tend to pack onto the lowest sequence-numbered nodes with available slots. However, in general, a job gets assigned to the lowest sequence-numbered node rather than packed onto the "most-loaded" node with available slots.

Today I tried the following experiment. I set the queue sorting method to "Sort by load", and then I changed the Load Formula from the default "np_load_avg" to simply "slots". The idea was to create a perversely backwards load calculation. If a node is empty, the value of the "slots" resource will be eight and thus the node will appear heavily loaded. Conversely, a node with seven slots already allocated will have a "slots" resource value of one and thus appear lightly loaded.
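
The ordering I expected from this inverted load formula can be sketched in a few lines of Python. This models my expectation only, not what the scheduler actually does; the node names and slot counts are hypothetical, and the sketch assumes "Sort by load" simply ranks nodes by ascending load value.

```python
def order_by_inverted_load(free_slots, needed):
    """Rank eligible nodes the way "Sort by load" would if load == free slots.

    free_slots: dict mapping a (hypothetical) node name to its free slot count,
                which stands in for the value of the "slots" resource.
    A lower load value sorts first, so the fullest node that still has room
    for the job ends up at the head of the list.
    """
    eligible = [(free, name) for name, free in free_slots.items() if free >= needed]
    # Tuples sort by load value first, then by node name to break ties.
    return [name for _, name in sorted(eligible)]


if __name__ == "__main__":
    free = {"node01": 8, "node02": 1, "node03": 4}
    print(order_by_inverted_load(free, 1))  # fullest node first
    print(order_by_inverted_load(free, 2))  # node02 no longer fits
```

With one slot requested, the nearly full node02 would be tried first and the empty node01 last, which is the packing behavior I was hoping to provoke.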

Unfortunately, this experiment has produced no discernible result. My tests suggest the scheduler is continuing to assign jobs to the lowest sequentially numbered queue instance with available slots. Should this work? If so, is there some way to debug this? Is there some way to put the scheduler into a verbose logging mode that will compel it to reveal exactly why it chose a particular node? Any suggestions would be greatly appreciated.


BTW, the version of SGE is 6.2u2-1.

James Gladden
