Hi,

On 18.08.2015 at 23:49, Laurent Planchon wrote:

> Hi
>  
> A quick glance at the archive didn’t show anything on this topic, so here it 
> is:
>  
> I have a high priority queue hi_sim and a low priority one, lo_sim, which 
> gets suspended (nsuspend = 1) when the number of licenses (detected through a 
> load_sensor) becomes less than x, so that any new job on hi_sim is always 
> able to grab a license. This works fine, except that one job per host 
> (queue instance) gets suspended, which is way too many, while ideally only 
> one job for the whole lo_sim queue should get suspended (the shortest 
> running one, for instance). Is there a way to achieve this?

As you already run a load sensor to check the overall license count, it could 
also select and suspend an appropriate job. It would have to handle the 
unsuspend too, though.
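A minimal sketch of such a load sensor, assuming the license count is reported as a global value named "lic_free". get_free_licenses, suspend_one_lo_sim_job and LIC_MIN are hypothetical placeholders; the job selection and the unsuspend bookkeeping are deliberately left as stubs:

```shell
#!/bin/sh
# Load sensor skeleton. Assumptions: the value is reported as the global
# complex "lic_free"; get_free_licenses is a stub for the real license query.
LIC_MIN=${LIC_MIN:-2}   # hypothetical value for the threshold "x"

get_free_licenses() {
    echo 0              # stub: always reports 0 free licenses
}

# Hypothetical hook: select one lo_sim job (e.g. the shortest running one),
# suspend it and remember it for the later unsuspend. It must not write to
# stdout, as stdout is read by SGE.
suspend_one_lo_sim_job() {
    :
}

# One report in the format SGE expects from a load sensor.
report() {
    free=$(get_free_licenses)
    if [ "$free" -lt "$LIC_MIN" ]; then
        suspend_one_lo_sim_job
    fi
    echo "begin"
    echo "global:lic_free:$free"
    echo "end"
}

# SGE writes a line to stdin for every load report and "quit" on shutdown.
load_sensor_loop() {
    while read -r line; do
        if [ "$line" = "quit" ]; then
            return 0
        fi
        report
    done
}
```

The installed sensor script would end with a call to load_sensor_loop. The hook would also need a counterpart which unsuspends the remembered job once enough licenses are free again.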

Question 1: Are you using a load sensor because the licenses are managed by 
something like FLEXlm and can't be implemented as a consumable?

Question 2: Will a suspended job return its license?


Another (more convoluted?) option:

Idea: for each scheduled low priority job, we submit a dummy job to a dummy 
queue. If a dummy job gets suspended, the queue's suspend_method suspends the 
corresponding real job too. As all dummy jobs run in one single queue 
instance, the suspend_thresholds allow SGE to select any one of the dummy jobs, 
instead of one job per host. So you don't have to keep a list of suspensions 
already done and of what to unsuspend; that's done by SGE.


1) define a forced boolean complex "license_job"; only jobs requesting it can 
run in a queue it is attached to

2) create one dummy queue residing on the qmaster machine with unlimited slots, 
and attach the complex "license_job" to it

2a) define the suspend_thresholds (using the value from the load sensor) for 
this newly created queue only, i.e. no longer for the low priority queue

3) for the low priority queue you have right now, define a prolog which 
submits a dummy job running `sleep 99999999` and requesting "license_job"

3a) the name of this dummy job should contain the real $job_id in an 
unambiguous form, e.g. __${job_id}

4) for the low priority queue you have right now, define an epilog which will 
`qdel` the job from 3)

5) for the queue in 2) we need a suspend_method and a resume_method which:

5a) suspend_method: defined as "suspend_licensed_job $job_pid $job_name", a 
script which sends SIGSTOP to the dummy job itself (i.e. to its process group) 
with "kill -STOP -- -$1", where $1 is the $job_pid from the suspend_method 
entry of the queue, and

5aa) `qmod -s ${2#__}` to suspend the real job

5b) resume_method: sends SIGCONT to the dummy job like in 5a), i.e. 
"kill -CONT -- -$1", and

5bb) `qmod -us ${2#__}` to unsuspend the real job
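
Steps 1), 2) and 2a) could be sketched like this, assuming the dummy queue was already created with `qconf -aq license.q` on the qmaster host, and the load sensor value "lic_free" is defined with relop <= (so the queue goes into suspend alarm when it drops to 2 or below). The queue name, threshold, slot count and the DRY_RUN switch are illustrations, not from the mail:

```shell
#!/bin/sh
# Sketch of steps 1), 2) and 2a). Hypothetical: queue "license.q",
# load value "lic_free", threshold 2. With DRY_RUN set, commands are printed.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1): line to add with "qconf -mc" (name shortcut type relop
# requestable consumable default urgency); FORCED = must be requested.
complex_line() {
    echo "license_job  lj  BOOL  ==  FORCED  NO  FALSE  0"
}

# Steps 2) and 2a): attach the complex and set the thresholds on the dummy queue.
setup_dummy_queue() {
    q=${1:-license.q}
    run qconf -mattr queue complex_values license_job=TRUE "$q"
    run qconf -mattr queue slots 9999 "$q"
    run qconf -mattr queue suspend_thresholds lic_free=2 "$q"
    run qconf -mattr queue nsuspend 1 "$q"
}
```

With DRY_RUN=1 the commands are only printed, so they can be reviewed before calling setup_dummy_queue for real.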

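The prolog and epilog of steps 3), 3a) and 4) might look like the following sketch. It relies on SGE setting JOB_ID in the prolog/epilog environment; the function names and the DRY_RUN switch are hypothetical:

```shell
#!/bin/sh
# Sketch of steps 3), 3a) and 4). JOB_ID is the id of the real job, as set
# by SGE in the prolog/epilog environment. DRY_RUN only prints the commands.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 3a): name of the dummy job, e.g. 4711 -> __4711.
dummy_name() {
    echo "__$1"
}

# Step 3): prolog of the low priority queue submits the dummy job.
prolog_submit_dummy() {
    run qsub -N "$(dummy_name "$JOB_ID")" -l license_job=TRUE -b y \
        -o /dev/null -e /dev/null sleep 99999999
}

# Step 4): epilog of the low priority queue removes the dummy job again.
epilog_delete_dummy() {
    run qdel "$(dummy_name "$JOB_ID")"
}
```

The prolog script would end with `prolog_submit_dummy` and the epilog with `epilog_delete_dummy`. As qsub and qdel are run on the execution hosts here, those hosts have to be submit hosts.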

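The two methods of step 5) could be sketched as below. The queue entries would be e.g. "suspend_method /path/to/suspend_licensed_job $job_pid $job_name" and the equivalent for resume_method (SGE requires an absolute path); the paths, the name resume_licensed_job and DRY_RUN are assumptions:

```shell
#!/bin/sh
# Sketch of step 5) for the dummy queue. $1 = $job_pid, $2 = $job_name of
# the dummy job (e.g. "__4711"). DRY_RUN only prints the commands.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# 5a) and 5aa): stop the dummy job's process group, then suspend the real job.
suspend_licensed_job() {
    pid=$1
    real_id=${2#__}               # "__4711" -> "4711"
    run kill -STOP -- "-$pid"
    run qmod -s "$real_id"
}

# 5b) and 5bb): continue the dummy job's process group, then unsuspend the real job.
resume_licensed_job() {
    pid=$1
    real_id=${2#__}
    run kill -CONT -- "-$pid"
    run qmod -us "$real_id"
}
```

Each installed method script would just call the corresponding function with "$1" "$2". As SGE runs a configured method instead of sending the signal itself, stopping and continuing the dummy job has to be done by the script.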

-- Reuti


> Thanks !
> 
> Laurent
>  
>  
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users
