We have a cluster consisting of 24 eight-processor nodes. The load on the cluster is dominated by parallel jobs, which typically occupy between one and three nodes (8, 16, or 24 processors). However, there are a few users who run smaller jobs: some single-processor "serial" jobs and a few parallel jobs that use two or four processors. Because the cluster is heavily used, this mix of jobs leads to conflicts.

Specifically, when dispatching a job requiring fewer than eight processors, the SGE scheduler tends to assign it to the "least-loaded" node with slots available. The result is that such jobs get scattered around the cluster in a manner that then blocks scheduling of jobs requiring entire nodes. We would prefer that jobs requiring fewer than eight processors instead be dispatched to the "most-loaded" node that has the required number of available slots. This would cause the small jobs to get "packed" onto a few nodes rather than scattered around the cluster. While this is somewhat counter to usual scheduling practice, I believe it makes sense in our environment.
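To make the difference between the two policies concrete, here is a minimal Python sketch. It is only a toy model, not SGE code: it measures "load" purely by allocated slots (the real scheduler uses a load formula such as np_load_avg), and the names pick_node, place_jobs, and whole_nodes_free are hypothetical helpers invented for the illustration.

```python
from typing import Dict, List, Optional


def pick_node(free_slots: Dict[str, int], needed: int, pack: bool) -> Optional[str]:
    """Pick a node for a job needing `needed` slots, or None if nothing fits.

    pack=True  -> "most-loaded" node that still fits the job (fewest free slots)
    pack=False -> "least-loaded" node (most free slots)
    """
    candidates = {n: f for n, f in free_slots.items() if f >= needed}
    if not candidates:
        return None
    chooser = min if pack else max
    # Iterating over sorted names makes ties resolve to the lowest node name.
    return chooser(sorted(candidates), key=lambda n: candidates[n])


def place_jobs(num_nodes: int, slots_per_node: int,
               jobs: List[int], pack: bool) -> Dict[str, int]:
    """Place jobs one at a time and return the free slots left on each node."""
    free = {f"node{i:02d}": slots_per_node for i in range(1, num_nodes + 1)}
    for needed in jobs:
        node = pick_node(free, needed, pack)
        if node is not None:
            free[node] -= needed
    return free


def whole_nodes_free(free: Dict[str, int], slots_per_node: int) -> int:
    """Count nodes that are still completely empty."""
    return sum(1 for f in free.values() if f == slots_per_node)


if __name__ == "__main__":
    small_jobs = [1, 1, 2, 4]  # slots requested by four small jobs
    for pack in (False, True):
        free = place_jobs(4, 8, small_jobs, pack)
        label = "pack" if pack else "spread"
        print(label, free, "whole nodes free:", whole_nodes_free(free, 8))
```

In this toy model the "spread" policy leaves every node partially occupied, while the "pack" policy fills one node and keeps the rest free for whole-node jobs, which is the behavior we are after.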

Unfortunately, I have not been able to figure out how to get SGE to do this. I have tried setting the queue sorting method to "Sort by sequence number." This helps, inasmuch as a series of small jobs submitted together will tend to pack onto the lowest sequence-numbered nodes with available slots. However, in general, a job gets assigned to the lowest sequence-numbered node rather than packed onto the "most-loaded" node with available slots.

Today I tried the following experiment. I set the queue sorting method to "Sort by load", and then I changed the Load Formula from the default "np_load_avg" to simply "slots". The idea was to create a perversely backwards load calculation. If a node is empty, the value of the "slots" resource will be eight, and thus the node will appear heavily loaded. Conversely, a node with seven slots already allocated will have a "slots" resource value of one and thus appear lightly loaded.
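The ordering I was hoping for can be sketched as follows. This is only a model of the intent, assuming the scheduler would sort queue instances by ascending load value and that "slots" evaluates to the free slots on each node; it is not how the SGE scheduler is actually implemented. QueueInstance, order_by_seqno, and order_by_slots_formula are hypothetical names made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueueInstance:
    host: str
    seq_no: int       # queue sequence number
    free_slots: int   # assumed value of the "slots" resource on this node


def order_by_seqno(qis: List[QueueInstance], needed: int) -> List[str]:
    """Candidate order under 'sort by sequence number'."""
    fits = [q for q in qis if q.free_slots >= needed]
    return [q.host for q in sorted(fits, key=lambda q: q.seq_no)]


def order_by_slots_formula(qis: List[QueueInstance], needed: int) -> List[str]:
    """Candidate order hoped for with 'sort by load' and load formula 'slots'.

    Free slots stand in for the load value, so the node with the fewest free
    slots looks least loaded and sorts first. Ties fall back to seq_no
    (an assumption made for this sketch).
    """
    fits = [q for q in qis if q.free_slots >= needed]
    return [q.host for q in sorted(fits, key=lambda q: (q.free_slots, q.seq_no))]


if __name__ == "__main__":
    cluster = [
        QueueInstance("node01", 1, 8),  # empty node
        QueueInstance("node02", 2, 1),  # seven slots in use
        QueueInstance("node03", 3, 3),  # five slots in use
    ]
    print("seqno order:        ", order_by_seqno(cluster, 1))
    print("slots-formula order:", order_by_slots_formula(cluster, 1))
```

Under sequence-number sorting, a single-slot job would land on the empty node01; under the inverted load formula as modeled here, it would go to node02, the fullest node that still has room.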

Unfortunately, this experiment has produced no discernible result. Further tests suggest the scheduler is continuing to assign jobs to the lowest sequentially numbered queue instance with available slots. Should this work? If so, is there some way to debug this? Is there some way to put the scheduler into a verbose logging mode that will compel it to reveal exactly why it chose a particular node? Any suggestions would be greatly appreciated.

BTW, the version of SGE is 6.2u2-1.

James Gladden