On 24.10.2012 at 09:57, Lars van der bijl wrote:

> On 23 October 2012 22:08, Reuti <[email protected]> wrote:
>> Hi,
>> 
>> On 23.10.2012 at 21:41, Lars van der bijl wrote:
>> 
>>> I've got two queues:
>>> 
>>> $ qconf -sq final.q
>>> qname                 final.q
>>> hostlist              @allhosts
>>> suspend_thresholds    NONE
>>> nsuspend              1
>>> suspend_interval      00:01:00
>>> pe_list               make smp
>>> rerun                 TRUE
>>> 
>>> 
>>> $ qconf -sq quick.q
>>> qname                 quick.q
>>> hostlist              @allhosts
>>> suspend_thresholds    NONE
>>> nsuspend              1
>>> suspend_interval      00:01:00
>>> pe_list               make smp
>>> rerun                 TRUE
>>> subordinate_list      final.q=1
>>> 
>>> we have about 325 procs, and both queues have access to the same machines.
>>> 
>>> what I'd expect to see is that if I have 200 slots running in final.q
>>> and I submit a task to quick.q, it would suspend the task in final.q and
>>> push the new task in front.
>>> However, what I am seeing is that only 32 slots are being used, and
>>> not all tasks are being pushed in front of the final.q tasks.
>>> 
>>> we only use parallel submission, in case that makes a difference.
>>> 
>>> what could I change to get this behavior?
>> 
>> Hard to tell from the information you posted, as I don't see how 32 relates 
>> in any way to 325 procs without knowing more details. So a few remarks; 
>> maybe you can refine the setup or the question then:
>> 
>> - the subordinate_list works only per queue instance on each exechost
>> - in your current setup, all slots of the final.q queue instance on a 
>> particular exechost will be suspended as soon as one slot in quick.q is used
>> - (maybe you are looking for slot-wise subordination?)
> 
> the problem I have with the slot-wise setup is that you can only set one
> slot value for the subordinate_list.
> What we have is a lot of 8-core machines, plus a few with 4, 6 and 12
> cores, so those would have to go into separate queues, I'd imagine.
> 
> We frequently submit the same task with 4 cores or 8 cores, using "-pe
> smp 4" or "-pe smp 8". This makes a slot-wise setup difficult, because
> if it's set to 2 slots per host, then a task submitted with 8 procs
> won't get suspended.
> 
>> - jobs in the quick.q don't have a higher priority
>> - it's best not to submit to queues in SGE, but to think in terms of 
>> "requesting resources"; SGE will then select an appropriate queue for your job
>> 
>> For your setup this could mean defining a BOOL complex "quick" as 
>> "requestable FORCED" and attaching it to quick.q, then requesting "-l quick" 
>> (which implies "-l quick=TRUE") and in addition attaching a high "urgency" 
>> value to this complex. Then these jobs should also go to the top of the 
>> list, and only jobs requesting "quick" will run in this queue.
> 
> thanks, this is a different way of thinking about the problem for me.
> 
> To specify which hosts can run a type of job, we currently submit with
> hostgroups like so:
> 
> quick.q@@mantra

Assuming you have a complex "mantra" attached to the exechosts:

-l quick,mantra

or:

-l quick -q "*@@mantra"

Maybe one complex would do already: qmantra is attached only to @mantra 
exechosts, qfoobar only to @foobar.
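A sketch of the complex setup suggested above; the complex name "quick", the 
urgency value 1000, and job.sh are assumptions to illustrate the idea:

```shell
# Add a BOOL complex "quick" to the complex configuration (qconf -mc opens
# an editor; the line to add would look like this):
#
#   quick    quick    BOOL    ==    FORCED    NO    FALSE    1000
#
# (name / shortcut / type / relop / requestable / consumable / default / urgency)

# Attach it to quick.q so that only jobs requesting it land there:
qconf -mattr queue complex_values quick=TRUE quick.q

# Submit a quick job; "-l quick" implies "-l quick=TRUE":
qsub -l quick -pe smp 4 job.sh
```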


> Now for other types of task we have a hostgroup set up, because we only
> have 10 licenses for an application. A single machine can run more than
> one of these tasks at a time, but the license is only consumed once per
> host.
> Is there a way to have this setup with a complex?

Unfortunately no, although such a type of complex was requested as an RFE a 
long time ago and came up several times on the list:

http://arc.liv.ac.uk/pipermail/gridengine-users/2010-November.txt (please 
search for HOSTONCE)

or

https://arc.liv.ac.uk/trac/SGE/ticket/1318


But as you have mostly an SMP environment: what about submitting in bunches? I 
mean: we have access to a remote cluster where complete exechosts are always 
granted to a job, even if the job uses only 2 out of 24 cores. This is similar 
to your setup, as you can start additional computations on the same machine 
without the need for another license. I adjusted our submission scripts so 
that several tasks are started in the background by "&" within one job 
submission, and later on, after the "wait" in the jobscript, all results are 
collected to assemble one email for the overall outcome of the individual 
tasks. For best usage, this implies the assumption that all tasks in the job 
have around the same execution time.
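A minimal jobscript sketch of this bunching pattern; the per-task command, the 
result files, and report.txt are placeholders, not the actual scripts:

```shell
#!/bin/sh
# Start several tasks in the background; each one writes its own result file.
# (The echo stands in for the real per-task computation.)
for i in 1 2 3 4; do
    sh -c "echo \"result of task $i\" > result.$i" &
done

# Block until all background tasks have finished.
wait

# Collect all per-task results into one report for the overall outcome.
cat result.1 result.2 result.3 result.4 > report.txt
cat report.txt
```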

-- Reuti
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users