On 14.08.2015 at 18:30, berg...@merctech.com wrote:

> In the message dated: Fri, 14 Aug 2015 17:44:09 +0200,
> The pithy ruminations from Reuti on 
> <Re: [gridengine users] subordinating running jobs in favor of queued jobs> were:
> => 
> => > On 14.08.2015 at 11:28, Tina Friedrich
> => > <tina.friedr...@diamond.ac.uk> wrote:
> => > 
> => > Hi Mark,
> 
> Tina & Reuti,
> 
> Thanks for getting back to me about this.
> 
> => > 
> => >> My experience with job subordination leads me to think that the
> => >> subordination will only happen when:
> => >> 
> => >>         jobs of the two different groups are running on a single node
> => >>   -AND-
> => >>         a load value on that node has been exceeded
> => >> (please let me know if this is wrong!).
> => > 
> => > I don't think that's correct. Queue subordination can be set up so that
> => > any job going into the predominant queue on a node will suspend all jobs
> => > (actually, all queue instances and all jobs in them, naturally) in the
> => > subordinate queue.
> 
> Ah. OK. I haven't used subordination that way.
> 
> => 
> => I assume he refers to "suspend_thresholds". Sure, it should be possible
> 
> Yes, suspend_thresholds.
> 
> => to set up a consumable and mimic the used slot count in the paid queue
> => this way.
> 
> I don't quite understand. Are you saying track the used/free slot count
> per-node as a consumable, and use that within the FreebieQ to limit jobs
> assigned per node?

Yep. One could have each paid job request payed=1 per slot and remove all
freebies as soon as a single paid slot is in use (the alarm should be triggered
as soon as the remaining count falls below a certain limit, e.g. 8 for an
8-core machine). Essentially the same as subordination with a value of one.
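A minimal Python sketch of the trigger logic in this first variant, for illustration only. It assumes a per-host consumable named `payed` whose capacity equals the core count, and that freebies.q goes into alarm as soon as the remaining amount drops below that capacity. The function name is hypothetical; it models the arithmetic, not SGE's scheduler.

```python
def freebies_suspended(cores: int, paid_slots_in_use: int) -> bool:
    """Return True if all freebie jobs on the host should be suspended.

    Hypothetical model: each paid job requests payed=1 per slot, the
    per-host capacity of 'payed' equals the core count, and the alarm
    fires as soon as the remaining amount drops below that capacity.
    """
    if not 0 <= paid_slots_in_use <= cores:
        raise ValueError("paid slots must be between 0 and the core count")
    remaining = cores - paid_slots_in_use  # value the host consumable would report
    return remaining < cores               # trips on the very first paid slot


if __name__ == "__main__":
    for used in (0, 1, 8):
        print(used, freebies_suspended(cores=8, paid_slots_in_use=used))
```

On an 8-core host the first paid slot already trips the alarm, which is why this behaves like a subordinate_list entry with a threshold of 1.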

Or, to suspend the freebies one after the other, let all jobs of both types
request this resource and check whether the remaining count falls below a
certain limit. Since the count can't be negative, you would grant 16 of this
resource on an 8-core machine. When the remaining count falls below 8, one of
the freebie jobs should be suspended. Unfortunately, this doesn't work:

https://arc.liv.ac.uk/trac/SGE/ticket/201
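The intended arithmetic of this second variant (which, per the ticket, SGE does not act on as a suspend_threshold) can be sketched as below. Assumptions: every job requests one unit, the per-host capacity is 2 * cores, and suspended jobs keep their unit. `freebies_to_suspend` is a hypothetical helper.

```python
def freebies_to_suspend(cores: int, holding_jobs: int) -> int:
    """Return how many freebie jobs should be suspended on one host.

    Hypothetical model: every job (paid or freebie) requests one unit of a
    per-host consumable with capacity 2 * cores, and suspended jobs keep
    their unit, so holding_jobs counts running and suspended jobs alike.
    """
    capacity = 2 * cores  # doubled so the remaining count never goes negative
    if not 0 <= holding_jobs <= capacity:
        raise ValueError("holding_jobs must be between 0 and 2 * cores")
    remaining = capacity - holding_jobs
    # Every unit below 'cores' is one job more than the host has cores for.
    return max(0, cores - remaining)


if __name__ == "__main__":
    # 8 freebies plus 1 paid job on an 8-core host
    print(freebies_to_suspend(cores=8, holding_jobs=9))
```

Because suspended jobs are assumed to keep their unit, the result is a target number of suspended freebies. A plain "suspend one while below the limit" rule would never see the count recover and would keep suspending on every suspend_interval.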

But see below.

Nevertheless, such a setup can be used with load_thresholds to drain a queue,
i.e. freebies.q would be closed for new jobs on this node, but the running ones
would continue.
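A sketch of this drain behaviour, assuming the queue instance goes into alarm once the consumable's remaining count drops below the load threshold. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FreebieQueueInstance:
    """Hypothetical model of freebies.q on one host under a load_thresholds drain."""
    remaining: int    # current remaining value of the per-host consumable
    threshold: int    # queue goes into alarm once remaining drops below this
    running: int = 0  # freebie jobs already running on the host

    def in_alarm(self) -> bool:
        return self.remaining < self.threshold

    def try_dispatch(self) -> bool:
        """Accept a new freebie job only while the queue instance is not in alarm."""
        if self.in_alarm():
            return False  # closed for new jobs; running ones are left alone
        self.running += 1
        return True
```

Unlike suspension, nothing happens to the jobs already counted in `running`; the host simply stops receiving new freebies until the remaining count recovers.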


> => 
> => (Although I just found that complexes can't be used as suspend_thresholds,
> => while they still work as load_thresholds.)
> => 
> => -- Reuti
> => 
> => 
> => > We rely on this mechanism quite heavily here :)
> => > 
> => > The setup is (in the queue configuration of the 'higher-order' queue, test-medium.q):
> => > 
> => > subordinate_list      test-low.q=1
> => > 
> => > and that suspends test-low.q the moment a job goes into test-medium.q.
> => > And the job in test-medium.q would always start, as it only sees free
> => > slots (all slots on all nodes are in all queues, if that makes sense?)
> 
> Hmmm...that would mean suspending all Freebie jobs on a node, not just
> the minimum required to allow PayingCustomer jobs to run, correct?

You may want to look into slot-wise preemption (`man queue_conf`), which keeps
the overall number of used slots per node below a certain limit. But IIRC this
only works for serial jobs.
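To illustrate slot-wise preemption for serial jobs, here is a hypothetical Python model: each job occupies exactly one slot, and subordinate-queue jobs are suspended until the host is back at the limit. Which job SGE actually picks first is an assumption here (most recently started); check `man queue_conf` for the real selection rules. Queue names are hypothetical.

```python
from typing import NamedTuple


class Job(NamedTuple):
    job_id: int
    queue: str       # e.g. "paid.q" or "freebies.q" (hypothetical names)
    start_time: int  # larger value = started later


def slotwise_suspend(jobs: list[Job], max_slots: int, subordinate: str) -> list[int]:
    """Return ids of subordinate-queue jobs to suspend so that at most
    max_slots serial (one-slot) jobs remain active on the host.

    Assumption for illustration: the most recently started subordinate
    job is suspended first.
    """
    excess = len(jobs) - max_slots
    if excess <= 0:
        return []
    candidates = sorted(
        (j for j in jobs if j.queue == subordinate),
        key=lambda j: j.start_time,
        reverse=True,  # newest first
    )
    return [j.job_id for j in candidates[:excess]]
```

With 8 freebies and one paid job on an 8-slot limit, only one freebie is suspended instead of the whole queue instance.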

-- Reuti


> => > 
> => >> My question is whether there is a mechanism for PayingCustomer jobs in
> => >> the queue to kill Freebie jobs if they need those resources in order to
> => >> move out of the queue and run.
> => > 
> => > Well, I don't kill them, just suspend them; I have a scenario where
> => > certain jobs/users must always be able to start jobs immediately, and
> => > that's how we achieve it.
> => > 
> 
> I was looking at killing Freebie jobs, rather than suspending them
> because we're using memory (h_vmem) as a consumable. My understanding is
> that suspending a job doesn't update SGE's idea of available memory--the
> memory complex still sees the memory assigned to the suspended job as
> consumed. I also believe that the h_vmem complex is per-host (applies
> to every queue on a node), not unique per queue (per node).  So,
> if a Freebie job is using a lot of memory, then subordinating the FreebieQ
> (and suspending that job) still might not allow other jobs to run,
> as SGE would believe the node had insufficient memory.
> 
> Thanks,
> 
> Mark
> 
> => > 
> => > -- 
> => > Tina Friedrich, Computer Systems Administrator, Diamond Light Source Ltd
> => > Diamond House, Harwell Science and Innovation Campus - 01235 77 8442
> => > 
> 
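The memory concern in the quoted message can be modelled as below, assuming (as described there) that a suspended job still books its h_vmem on the host-level consumable while a killed job releases it. The `Host` class is hypothetical and values are in GB.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical model of a host-level h_vmem consumable (values in GB)."""
    h_vmem_total: float
    # job id -> (booked h_vmem, state), state is "running" or "suspended"
    jobs: dict[str, tuple[float, str]] = field(default_factory=dict)

    def booked(self) -> float:
        # Assumption from the thread: suspended jobs still count as booked.
        return sum(mem for mem, _state in self.jobs.values())

    def fits(self, request: float) -> bool:
        return self.booked() + request <= self.h_vmem_total

    def suspend(self, job_id: str) -> None:
        mem, _ = self.jobs[job_id]
        self.jobs[job_id] = (mem, "suspended")  # memory stays booked

    def kill(self, job_id: str) -> None:
        del self.jobs[job_id]  # memory is released
```

With a 48 GB freebie on a 64 GB host, a 32 GB paid job does not fit whether the freebie is running or suspended; only killing it frees the memory.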


_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
