I'm using SoGE 8.1.6 and I'm looking for a way to allow queued jobs from one class of user ("PayingCustomers") to have control over running jobs from another group of users ("Freebies").
The goal is to kill (possibly checkpoint and then kill) the Freebie jobs when they are using resources and PayingCustomer jobs are waiting, not merely to suspend them.

Here's what I'm thinking so far:

- Create a second queue (FreebieQ), assigned to all our nodes, with access to all cluster resources.
- Make FreebieQ a subordinate of all.q.
- Use the user_lists and xuser_lists values to restrict Freebie user jobs to just FreebieQ.
- Set the "suspend_method" value for FreebieQ to send SIGKILL.

So far, so good. My experience with job subordination leads me to think that subordination will only happen when jobs of the two different groups are running on a single node -AND- a load value on that node has been exceeded (please let me know if this is wrong!).

My question is whether there is a mechanism for PayingCustomer jobs waiting in the queue to kill Freebie jobs when they need those resources in order to start running.

For example, on an 8-slot server with 3 Freebie jobs running, a PayingCustomer job that requires 6 cores will not start if slots are shared between the queues (only 5 slots are free, assuming each Freebie job holds one slot). Even if slots were 'oversubscribed' (i.e., both the PayingCustomer queue and FreebieQ were assigned 8 slots each on the node), the load average (or available memory, or some other load sensor) from Freebie jobs alone could still prevent PayingCustomer jobs from being assigned to that node.

Any suggestions?

Thanks,
Mark
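
P.S. For concreteness, the queue settings I have in mind would look roughly like the sketch below. It shows only the relevant queue_conf attributes. The ACL name "Freebies", the host group, the slot count, and the "=1" subordination threshold are placeholders I picked for illustration, not values from a working setup. The "#" lines are annotations for the reader and are not meant to be pasted into qconf.

```
# FreebieQ: only members of the (assumed) "Freebies" user set may run here
qname                 FreebieQ
hostlist              @allhosts
slots                 8
user_lists            Freebies
xuser_lists           NONE
# deliver SIGKILL instead of the default SIGSTOP when the queue is suspended
suspend_method        SIGKILL

# all.q: Freebie users are excluded, and FreebieQ is subordinated to it
qname                 all.q
hostlist              @allhosts
slots                 8
user_lists            NONE
xuser_lists           Freebies
# assumed threshold: suspend FreebieQ on a host once 1 slot of all.q is in use there
subordinate_list      FreebieQ=1
```

With this layout, subordination only acts on FreebieQ on a host where all.q already has a job, which is exactly the limitation I'm asking about: a PayingCustomer job still waiting in the queue has no way to trigger it.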