In the message dated: Fri, 14 Aug 2015 17:44:09 +0200,
The pithy ruminations from Reuti on
<Re: [gridengine users] subordinating running jobs in favor of queued jobs> were:
=>
=> > On 14.08.2015 at 11:28, Tina Friedrich <[email protected]> wrote:
=> >
=> > Hi Mark,
Tina & Reuti,

Thanks for getting back to me about this.

=> >
=> >> My experience with job subordination leads me to think that the
=> >> subordination will only happen when:
=> >>
=> >>     jobs of the two different groups are running on a single node
=> >>         -AND-
=> >>     a load value on that node has been exceeded
=> >> (please let me know if this is wrong!).
=> >
=> > I don't think that's correct. Queue subordination can be set up so that any job going into the predominant queue on a node will suspend all jobs (actually, all queue instances and all jobs in them, naturally) in the subordinate queue.

Ah. OK. I haven't used subordination that way.

=>
=> I assume he refers to "suspend_thresholds". Sure, it should be possible

Yes, suspend_thresholds.

=> to set up a consumable and mimic the used slot count in the paid queue
=> this way.

I don't quite understand. Are you saying to track the used/free slot count per node as a consumable, and use that within the FreebieQ to limit the jobs assigned per node?

=>
=> (Although I just found that complexes can't be used as suspend_thresholds, while they are still working as load_thresholds.)
=>
=> -- Reuti
=>
=>
=> > We rely on this mechanism quite heavily here :)
=> >
=> > Set up is (in queue conf of 'higher order' queue, called test-medium.q:)
=> >
=> > subordinate_list      test-low.q=1
=> >
=> > and that suspends test-low.q the moment a job goes into test-medium.q. And the job in test-medium.q would always start, as it only sees free slots (all slots on all nodes are in all queues, if that makes sense?)

Hmmm...that would mean suspending all Freebie jobs on a node, not just the minimum required to allow PayingCustomer jobs to run, correct?

=> >
=> >> My question is whether there is a mechanism for PayingCustomer jobs in
=> >> the queue to kill Freebie jobs if they need those resources in order to
=> >> move out of the queue and run.
=> >
=> > Well, I don't kill them, just suspend them; I have a scenario where certain jobs/users need to be able to always run jobs now, and that's how we achieve it.
=> >

I was looking at killing Freebie jobs, rather than suspending them, because we're using memory (h_vmem) as a consumable. My understanding is that suspending a job doesn't update SGE's idea of available memory--the memory complex still sees the memory assigned to the suspended job as consumed.

I also believe that the h_vmem complex is per-host (applies to every queue on a node), not unique per queue (per node).

So, if a Freebie job is using a lot of memory, then subordinating the FreebieQ (and suspending that job) still might not allow other jobs to run, as SGE would believe the node had insufficient memory.

Thanks,

Mark

=> >
=> > --
=> > Tina Friedrich, Computer Systems Administrator, Diamond Light Source Ltd
=> > Diamond House, Harwell Science and Innovation Campus - 01235 77 8442
=> > _______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
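
To make the h_vmem concern concrete, here is a toy model of the accounting as I understand it. It is only a sketch, not SGE's scheduler: the Job and Host classes, the job name and the memory figures (a 64 GB node, a 48 GB Freebie job, a 32 GB PayingCustomer request) are all made up for illustration, and it assumes that a suspended job's h_vmem stays booked against the per-host consumable.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Job:
    """Hypothetical job record: a name and its h_vmem request in GB."""
    name: str
    h_vmem_gb: float
    suspended: bool = False


@dataclass
class Host:
    """Hypothetical execution host with one per-host h_vmem consumable."""
    h_vmem_total_gb: float
    jobs: List[Job] = field(default_factory=list)

    def h_vmem_free_gb(self) -> float:
        # Assumption taken from the message: a suspended job's h_vmem still
        # counts as consumed, so the 'suspended' flag is ignored on purpose.
        return self.h_vmem_total_gb - sum(j.h_vmem_gb for j in self.jobs)

    def can_start(self, request_gb: float) -> bool:
        # A new job fits only if its request is covered by the free h_vmem.
        return request_gb <= self.h_vmem_free_gb()

    def suspend(self, name: str) -> None:
        # Suspension stops the job but leaves its reservation in place.
        for j in self.jobs:
            if j.name == name:
                j.suspended = True

    def kill(self, name: str) -> None:
        # Removing the job releases its h_vmem reservation.
        self.jobs = [j for j in self.jobs if j.name != name]


# Illustrative numbers only.
node = Host(h_vmem_total_gb=64.0, jobs=[Job("freebie", 48.0)])
paying_request_gb = 32.0

node.suspend("freebie")
fits_after_suspend = node.can_start(paying_request_gb)

node.kill("freebie")
fits_after_kill = node.can_start(paying_request_gb)

print(fits_after_suspend, fits_after_kill)
```

Under those assumptions the PayingCustomer request does not fit while the Freebie job is merely suspended, and fits once the Freebie job is gone. If suspension does in fact release the consumable, h_vmem_free_gb() would skip suspended jobs and the two cases would behave the same.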
