Hello guys,

The functional policy appears to be working for me so far. Here's another challenge for my configuration: Jesse, you mentioned that I can suspend jobs by other means, such as load thresholds and subordinate queues. Can you give me some ideas on how this should be configured? My goal is that if a queue goes over its given percentage of resources (slots in this case) and someone from another queue wants their share, some running jobs in the over-limit queue will be suspended.

Thank you for any advice.
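P.S. To make the goal concrete, here is a rough Python sketch of the behaviour I'm after. It is not SGE code or configuration, just the arithmetic: the function names and the "reclaim from the group furthest over its share first" rule are my own assumptions for illustration, and the guaranteed slots follow the 24/60/12/12/12 split Jesse worked out for 120 slots.

```python
def guaranteed_slots(total_slots, shares):
    """Turn percentage shares into whole guaranteed slots per group."""
    return {group: total_slots * pct // 100 for group, pct in shares.items()}


def slots_to_suspend(total_slots, shares, running, requester, requested):
    """Return {group: slots to suspend} so `requester` can start its job.

    Hypothetical policy: free slots are used first, the requester may only
    force suspensions up to its own guarantee, and no group is pushed
    below its guarantee.
    """
    guarantee = guaranteed_slots(total_slots, shares)
    free = total_slots - sum(running.values())
    # Slots the requester can still claim by right of its share.
    entitled = max(0, guarantee[requester] - running.get(requester, 0))
    target = min(requested, max(free, entitled))
    needed = max(0, target - free)

    plan = {}
    # Reclaim from the groups furthest above their guarantee first.
    others = sorted(
        (g for g in running if g != requester),
        key=lambda g: running[g] - guarantee[g],
        reverse=True,
    )
    for group in others:
        excess = running[group] - guarantee[group]
        take = min(excess, needed)
        if take > 0:
            plan[group] = take
            needed -= take
    return plan


shares = {"math": 20, "chem": 50, "cs": 10, "bio": 10, "other": 10}
print(guaranteed_slots(120, shares))
print(slots_to_suspend(120, shares, {"math": 120}, "chem", 50))
```

With math filling all 120 slots and chem asking for 50, this prints the 24/60/12/12/12 guarantees and then `{'math': 50}`, which is the suspension I would like the scheduler to carry out.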
On Tue, Nov 27, 2012 at 3:08 PM, Reuti <[email protected]> wrote:

> Hi,
>
> On 27.11.2012 at 22:43, Jesse Becker wrote:
>
> > On Tue, Nov 27, 2012 at 04:27:49PM -0500, Allan Tran wrote:
> >> Hello,
> >> I'm running SGE 8.1.2 from https://arc.liv.ac.uk/trac/SGE. Things are
> >> running fine by default. Now I need to set up a share policy, but I'm
> >> not sure how best to approach this.
> >> The scenario is that we have different groups of users, and I need to
> >> give each of them a defined resource (or slots) so that at any given
> >> time each group has a guarantee of slots.
> >> Say I have 120 slots (10 x 12 core procs) and 5 groups; math 20% (or 2
> >> nodes), chem 50% (5), cs 10%, bio 10% (1) and other 10% (1).
> >> 1. If the cluster is idle, any user in any group can get whatever they
> >> ask for.
> >> 2. If the cluster is busy with all math users running (120 slots) and
> >> then a chem user needs 50 slots, then 50 slots of math jobs will be
> >> suspended to allow chem users to run. Then if any other group needs to
> >> run, more math jobs will be suspended, but math is guaranteed to keep
> >> at least 20 slots.
> >> Does it make sense?
> >>
> >> I was thinking of enabling the functional share policy and actually
> >> setting it up, following these instructions
> >> (http://docs.oracle.com/cd/E19080-01/n1.grid.eng6/817-5677/i999885/index.html)
> >> However, I'm not quite clear how the number of functional tickets
> >> translates to SGE slots. Will jobs be suspended or resumed by default
> >> with this setup? Or does it even do what I'm after here?
> >> Thanks for your response and advice.
> >
> > Functional shares (alone) won't suspend any jobs. They are used for
> > scheduling jobs, to try to balance job distribution as best they can
> > according to the ticket policy you've set.
>
> Yes, when a job is allowed to start, it will be allowed to run up to the
> end. There is nothing in SGE to reschedule or suspend a job according to
> the tickets.
>
> One small addition: a functional policy can also change the "nice" value
> of jobs to achieve a certain distribution (if nodes are oversubscribed of
> course). Settings: "reprioritize_interval" in the scheduler and in
> addition for now "reprioritize" in the SGE configuration
> (`man sge_priority`).
>
> -- Reuti
>
> > With 120 total slots in a single queue, and assuming sufficient jobs
> > from each "group," SGE will try to allocate 24 SLOTS to math, 60 SLOTS
> > to chem, 12 to CS, 12 to Bio, and 12 to "other." Note that I said
> > "slots," not "nodes." Unless there's a good reason not to "mix" jobs
> > from different groups on the same node, don't try to segregate things.
> >
> > Functional shares also won't inherently suspend any jobs; they deal
> > with scheduling and dispatch. You can suspend jobs via other means
> > though, including load threshold and subordinate queues.
> >
> > Incidentally, the "share tree" works basically the same way as
> > functional shares, except that it takes past usage into account.
> > Functional shares *only* look at the current state of the queues *right
> > now*. This may or may not be appropriate for your circumstance.
> >
> > You might want to look into "resource quotas" as well, to keep a given
> > group from taking over the cluster.
> >
> > --
> > Jesse Becker
> > NHGRI Linux support (Digicon Contractor)
> > _______________________________________________
> > users mailing list
> > [email protected]
> > https://gridengine.org/mailman/listinfo/users
