I'm using SoGE 8.1.6 and I'm looking for a way to allow queued jobs
from one class of user ("PayingCustomers") to have control over running
jobs from another group of users ("Freebies").

The goal is to kill (possibly checkpoint & then kill) the Freebie jobs
when they are using resources and PayingCustomer jobs are waiting,
rather than merely suspending them.

Here's what I'm thinking so far:

        Create a 2nd queue, assigned to all our nodes. This queue (FreebieQ)
        would have access to all cluster resources.

        Assign the FreebieQ as a subordinate of all.q.

        With user_lists and xuser_lists values, restrict Freebie user jobs
        to just the FreebieQ.

        Set the "suspend_method" value to send SIGKILL for the FreebieQ.
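
To make the plan above concrete, here is a rough sketch that only builds
(it does not run) the qconf command lines I have in mind. The access-list
name "Freebies" and the user names are made up for illustration, FreebieQ is
assumed to exist already, and I am assuming the queue attributes are spelled
user_lists and xuser_lists as in queue_conf:

```python
# Sketch only: builds (does not execute) qconf command lines for the plan.
# "Freebies" is a hypothetical access-list name; FreebieQ is assumed to have
# been created beforehand (for example interactively with `qconf -aq FreebieQ`).

def freebie_queue_commands(freebie_users, acl="Freebies",
                           free_q="FreebieQ", paid_q="all.q"):
    """Return the qconf invocations as argument lists, in the order of the plan."""
    cmds = []
    # Put each Freebie user into the access list.
    for user in freebie_users:
        cmds.append(["qconf", "-au", user, acl])
    # Make FreebieQ a subordinate of all.q (set on the superior queue).
    cmds.append(["qconf", "-mattr", "queue", "subordinate_list", free_q, paid_q])
    # Allow the Freebie access list on FreebieQ and exclude it from all.q.
    cmds.append(["qconf", "-mattr", "queue", "user_lists", acl, free_q])
    cmds.append(["qconf", "-mattr", "queue", "xuser_lists", acl, paid_q])
    # "Suspend" FreebieQ jobs by sending SIGKILL instead of the default signal.
    cmds.append(["qconf", "-mattr", "queue", "suspend_method", "SIGKILL", free_q])
    return cmds


if __name__ == "__main__":
    for cmd in freebie_queue_commands(["alice", "bob"]):
        print(" ".join(cmd))
```

Printing the commands rather than running them makes it easy to review the
changes before touching the live configuration.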

So far, so good.

My experience with job subordination leads me to think that the
subordination will only happen when:

        jobs of the two different groups are running on a single node
  -AND- 
        a load value on that node has been exceeded
(please let me know if this is wrong!).

My question is whether there is a mechanism for PayingCustomer jobs in
the queue to kill Freebie jobs if they need those resources in order to
move out of the queue and run.

For example, if there is an 8-slot server, with 3 Freebie jobs running,
a PayingCustomer job that requires 6 cores will not start running if
slots are shared between the queues.
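
The arithmetic behind that example, as a small sketch. It assumes each
Freebie job occupies exactly one slot (my example does not say otherwise),
and the function name is just for illustration:

```python
# Sketch of the shared-slot arithmetic; assumes one slot per Freebie job.

def slot_shortfall(total_slots, freebie_jobs_running, slots_requested):
    """Return (free_slots, freebie_jobs_that_would_have_to_go)."""
    free_slots = total_slots - freebie_jobs_running
    # Number of extra slots the waiting job needs beyond what is free.
    shortfall = max(0, slots_requested - free_slots)
    return free_slots, shortfall


if __name__ == "__main__":
    free, must_kill = slot_shortfall(total_slots=8,
                                     freebie_jobs_running=3,
                                     slots_requested=6)
    print(f"free slots: {free}, Freebie jobs to kill: {must_kill}")
```

With 8 slots and 3 Freebie jobs, only 5 slots are free, so a single Freebie
job would have to be killed before the 6-core job could start.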

Even if slots were 'oversubscribed' (i.e., both the PayingCustomer queue
and the FreebieQ on the node were assigned 8 slots each), then the load
average (or available memory, or some other load sensor) from Freebie
jobs alone could still prevent PayingCustomer jobs from being assigned
to that node.
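
A toy model of that concern, not the scheduler's real logic. It assumes a
single load threshold on np_load_avg of 1.75 (which I believe is the default
load_thresholds setting) and uses made-up load values:

```python
# Toy model only: a node accepts a job if its queue instance has enough free
# slots AND the node's load value does not exceed the queue's load threshold.
# The 1.75 threshold and the load figures below are assumptions.

def can_dispatch(queue_slots, queue_slots_used, slots_requested,
                 np_load_avg, load_threshold=1.75):
    has_slots = (queue_slots - queue_slots_used) >= slots_requested
    load_ok = np_load_avg <= load_threshold
    return has_slots and load_ok


if __name__ == "__main__":
    # PayingCustomer queue has its own 8 slots, all free, but the Freebie
    # jobs alone have pushed the (hypothetical) load above the threshold.
    print(can_dispatch(8, 0, 6, np_load_avg=2.0))
    # Same node once the Freebie load is gone.
    print(can_dispatch(8, 0, 6, np_load_avg=0.4))
```

In this model the first call is refused purely because of load, even though
the PayingCustomer queue has all 8 of its slots free.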

Any suggestions?

Thanks,

Mark
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
