Hi,

On 18.08.2015 at 23:49, Laurent Planchon wrote:

> Hi
>
> A quick glance at the archive didn’t show anything on this topic so here it
> is:
>
> I have a high priority queue hi_sim and a low priority one, lo_sim which
> gets suspended (nsuspend = 1) when the number of licenses (detected through a
> load_sensor) becomes less than x in order for any new job on hi_sim to always
> be able to grab a license. This works fine except that in this case, one job
> per host (queue instance) gets suspended, which is way too many, while
> ideally only one job for the whole lo_sim queue should get suspended (the
> shortest running one for instance). Is there a way to achieve this ?

As you run a load sensor, the logic could be put there to select and suspend
an appropriate job, since you already check the overall license count with it.
Well, it would have to handle the unsuspend too.

Question 1: you are using a load sensor as the license count is done with
something like FLEXlm and can't be implemented as a consumable?

Question 2: a suspended job will return the license?

Another (more convoluted?) option:

Idea: we submit a dummy job to a dummy queue for each scheduled low priority
job. If a dummy job gets suspended, the suspend_method will suspend the real
job too. Hence the suspend_threshold will allow SGE to select any of the dummy
jobs in the dummy queue. So you don't have to keep a list of already done
suspensions and what to unsuspend - it's done by SGE.

1) define a forced boolean complex "license_job"; only jobs requesting it
   will run where it's attached

2) create one dummy queue residing on the qmaster machine with unlimited
   slots, and attach the complex "license_job" to it

2a) define the suspend_threshold with the load sensor for this just created
    queue only

3) for the low priority queue you have right now, define a prolog which will
   submit a dummy job with a `sleep 99999999` while requesting the
   "license_job"

3a) the name of this "dummy" job should include the $job_id in an unambiguous
    form, maybe __${job_id}

4) for the low priority queue you have right now, define an epilog which will
   `qdel` the job from 3)

5) for the queue in 2) we need a suspend_method and a resume_method which
   will:

5a) suspend_method: defined as "suspend_licensed_job $job_pid $job_name", a
    script which sends SIGSTOP to the dummy job itself (process group) with
    `kill -STOP -- -$1`, where $1 is $job_pid in the suspend entry of the
    queue, and

5aa) `qmod -s ${2#__}` to suspend the real job

5b) resume_method: send SIGCONT like in 5a)

5bb) `qmod -us ${2#__}` to unsuspend the real job

-- Reuti

> Thanks !
>
> Laurent
>
>
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users
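
PS: steps 3) to 5bb) above could be sketched as one small shell file. This is untested against a real cluster and only meant to show the moving parts. The function names `dummy_name`, `real_job_id`, `submit_dummy_job`, `delete_dummy_job` and `resume_licensed_job`, and the dispatch keywords at the end, are made up for illustration. It also assumes that `qsub`, `qdel` and `qmod` are in the PATH of the hosts running the prolog, epilog and the dummy queue, and that those hosts are allowed to submit and modify jobs.

```shell
#!/bin/sh
# Sketch only: helpers for the dummy-job scheme (steps 3 to 5bb).
# Function names and dispatch keywords are hypothetical, not part of SGE.

# Name of the dummy job shadowing real job $1, e.g. 4711 -> __4711 (step 3a).
dummy_name() {
    printf '__%s\n' "$1"
}

# Recover the real job id from a dummy job name, e.g. __4711 -> 4711.
real_job_id() {
    printf '%s\n' "${1#__}"
}

# Step 3: called from the prolog of the low priority queue with the job id.
# The forced complex "license_job" routes the dummy job to the dummy queue.
submit_dummy_job() {
    qsub -b y -N "$(dummy_name "$1")" -l license_job=TRUE sleep 99999999
}

# Step 4: called from the epilog of the low priority queue with the job id.
delete_dummy_job() {
    qdel "$(dummy_name "$1")"
}

# Steps 5a/5aa: $1 is $job_pid and $2 is $job_name of the dummy job.
suspend_licensed_job() {
    kill -STOP -- "-$1"             # stop the dummy job's process group
    qmod -s "$(real_job_id "$2")"   # suspend the real job
}

# Steps 5b/5bb: same arguments as suspend_licensed_job.
resume_licensed_job() {
    kill -CONT -- "-$1"             # continue the dummy job's process group
    qmod -us "$(real_job_id "$2")"  # unsuspend the real job
}

# Dispatch on the first argument so one file can serve all four hooks.
# Nothing happens when the file is run or sourced without arguments.
case "${1:-}" in
    prolog)  submit_dummy_job "$2" ;;
    epilog)  delete_dummy_job "$2" ;;
    suspend) suspend_licensed_job "$2" "$3" ;;
    resume)  resume_licensed_job "$2" "$3" ;;
esac
```

If the file were installed as, say, /path/to/license_shadow.sh (a hypothetical path), the dummy queue would get "/path/to/license_shadow.sh suspend $job_pid $job_name" as suspend_method and the same with "resume" as resume_method, while the low priority queue would call it with "prolog $job_id" and "epilog $job_id".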