In the message dated: Fri, 14 Aug 2015 17:44:09 +0200,
The pithy ruminations from Reuti on
<Re: [gridengine users] subordinating running jobs in favor of queued jobs> were:
=> 
=> > On 14.08.2015 at 11:28, Tina Friedrich <[email protected]> wrote:
=> > 
=> > Hi Mark,

Tina & Reuti,

Thanks for getting back to me about this.

=> > 
=> >> My experience with job subordination leads me to think that the
=> >> subordination will only happen when:
=> >> 
=> >>   jobs of the two different groups are running on a single node
=> >>   -AND-
=> >>   a load value on that node has been exceeded
=> >> (please let me know if this is wrong!).
=> > 
=> > I don't think that's correct. Queue subordination can be set up so that
=> > any job going into the predominant queue on a node will suspend all jobs
=> > (actually, all queue instances and all jobs in them, naturally) in the
=> > subordinate queue.

Ah. OK. I haven't used subordination that way.

=> 
=> I assume he refers to "suspend_thresholds". Sure, it should be possible

Yes, suspend_thresholds.

=> to set up a consumable and mimic the used slot count in the paid queue
=> this way.

I don't quite understand. Are you suggesting tracking the used/free slot
count per node as a consumable, and using that within the FreebieQ to limit
the jobs assigned per node?


=> 
=> (Although I just found that complexes can't be used as suspend_thresholds,
=> while they are still working as load_thresholds.)
=> 
=> -- Reuti
=> 
=> 
=> > We rely on this mechanism quite heavily here :)
=> > 
=> > The setup is (in the queue conf of the 'higher order' queue, called test-medium.q):
=> > 
=> > subordinate_list      test-low.q=1
=> > 
=> > and that suspends test-low.q the moment a job goes into test-medium.q. And
=> > the job in test-medium.q would always start, as it only sees free slots (all
=> > slots on all nodes are in all queues, if that makes sense?)

Hmmm...that would mean suspending all Freebie jobs on a node, not just
the minimum required to allow PayingCustomer jobs to run, correct?
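
To check that I'm reading that right, here is a toy model in plain Python
(not SGE code; the function names, job names and the 8-slot node are made
up for illustration). The first function mimics the threshold behaviour as
Tina describes it, the second is the "suspend only what is needed"
behaviour I was hoping for:

```python
def suspend_queue_wise(freebie_jobs, paying_slots_used, threshold=1):
    """Model of 'subordinate_list test-low.q=1' as described above:
    once `threshold` slots are busy in the superordinate queue on a node,
    every job in the subordinate queue instance is suspended."""
    if paying_slots_used >= threshold:
        return list(freebie_jobs)
    return []


def suspend_minimal(freebie_jobs, slots_needed, free_slots):
    """Hypothetical alternative: suspend only enough Freebie jobs to
    free the slots the PayingCustomer job still lacks."""
    shortfall = slots_needed - free_slots
    suspended = []
    for name, slots in freebie_jobs.items():
        if shortfall <= 0:
            break
        suspended.append(name)
        shortfall -= slots
    return suspended


# Example node: 8 slots, three 2-slot Freebie jobs, 2 slots free.
freebie = {"f1": 2, "f2": 2, "f3": 2}
print(suspend_queue_wise(freebie, paying_slots_used=3))
print(suspend_minimal(freebie, slots_needed=3, free_slots=2))
```

In this model a 3-slot PayingCustomer job suspends all three Freebie jobs
under the queue-wise rule, even though stopping one of them would have
been enough.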

=> > 
=> >> My question is whether there is a mechanism for PayingCustomer jobs in
=> >> the queue to kill Freebie jobs if they need those resources in order to
=> >> move out of the queue and run.
=> > 
=> > Well, I don't kill them, just suspend them; I have a scenario where
=> > certain jobs/users need to be able to always run jobs now, and that's how we
=> > achieve it.
=> > 

I was looking at killing Freebie jobs rather than suspending them
because we're using memory (h_vmem) as a consumable. My understanding is
that suspending a job doesn't update SGE's idea of available memory: the
memory complex still counts the memory assigned to the suspended job as
consumed. I also believe that the h_vmem complex is per-host (it applies
to every queue on a node), not tracked separately per queue instance. So,
if a Freebie job is using a lot of memory, then subordinating the FreebieQ
(and suspending that job) still might not allow other jobs to run,
as SGE would believe the node had insufficient memory.
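
To make the concern concrete, here is a small sketch in plain Python (a toy
model of my understanding, not SGE code; the class names and the 64 GB /
48 GB / 32 GB figures are invented). It assumes, as described above, that
a suspended job keeps its h_vmem reservation on the host:

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    h_vmem_gb: int
    suspended: bool = False


@dataclass
class Host:
    total_gb: int
    jobs: list = field(default_factory=list)

    def available_gb(self):
        # Assumption: suspended jobs still count against the consumable,
        # so the `suspended` flag is deliberately ignored here.
        return self.total_gb - sum(j.h_vmem_gb for j in self.jobs)

    def can_start(self, request_gb):
        return request_gb <= self.available_gb()


host = Host(total_gb=64)
freebie = Job("freebie", h_vmem_gb=48)
host.jobs.append(freebie)

before = host.can_start(32)           # not enough memory left
freebie.suspended = True
after_suspend = host.can_start(32)    # suspension frees nothing
host.jobs.remove(freebie)             # killing the job releases h_vmem
after_kill = host.can_start(32)
print(before, after_suspend, after_kill)
```

If that assumption holds, only removing the Freebie job lets the 32 GB
request fit, which is why I was asking about killing rather than
suspending.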

Thanks,

Mark

=> > 
=> > -- 
=> > Tina Friedrich, Computer Systems Administrator, Diamond Light Source Ltd
=> > Diamond House, Harwell Science and Innovation Campus - 01235 77 8442
=> > 

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
