Re: [gridengine users] round robin PE config

Reuti Wed, 14 Dec 2011 02:51:56 -0800

On 14.12.2011 at 10:48, Lars van der bijl wrote:

> On 13 December 2011 18:38, Reuti wrote:
>> Hi Lars,
>> 
>> On 13.12.2011 at 16:57, Lars van der bijl wrote:
>> 
>>> hey Reuti,
>>> 
>>> I wrote a Python API using networkx and a database layer called clue,
>>> so I've removed a few of the flags from the commands below.
>>> 
>>> qsub -r y -l mem_free=1.9G,s_vmem=2G,hbatch=1  -pe smp 1 -N
>> 
>> is there any specific reason to request "-pe smp 1"?
> 
> We pass the requested slots this way. We don't use anything other than
> PE tasks; it keeps things simple in our situation.
> 
>> 
>>> qsub -pe smp 1 -t 1-50:5 \
>>>   /tmp/gridTask_sshot3__out__rs_mantra1__ifd_gen.30851.0.sh
>>> 
>>> qsub -pe smp 4 -hold_jid_ad 47673 -t 1-50:1 \
>>>   /tmp/gridTask_sshot3__out__rs_mantra1__rs_mantra_tile1__mantra_render_split_1_seed_0.30771.0.sh
>>> qsub -pe smp 4 -hold_jid_ad 47673 -t 1-50:1 \
>>>   /tmp/gridTask_sshot3__out__rs_mantra1__rs_mantra_tile1__mantra_render_split_3_seed_0.30753.0.sh
>>> qsub -pe smp 4 -hold_jid_ad 47673 -t 1-50:1 \
>>>   /tmp/gridTask_sshot3__out__rs_mantra1__rs_mantra_tile1__mantra_render_split_2_seed_0.30657.0.sh
>>> qsub -pe smp 4 -hold_jid_ad 47673 -t 1-50:1 \
>>>   /tmp/gridTask_sshot3__out__rs_mantra1__rs_mantra_tile1__mantra_render_split_0_seed_0.30680.0.sh
>>> 
>>> qsub -pe smp 1 -hold_jid_ad 47677,47674,47675,47676 -t 1-50:1 \
>>>   /tmp/gridTask_sshot3__out__rs_mantra1__rs_mantra_tile1__joinexr.30998.0.sh
>>> 
>>> Now the commands run fine, but the queue seems to want to finish one PE
>>> job before moving on to the next one, whereas what I'd want is to run the
>>> splits in a round-robin fashion.
>> 
>> I think you are seeing the effect that once an array job is scheduled, its
>> remaining tasks are eligible to be executed before the scheduler looks at
>> other jobs. So you see the array tasks of the first job, then those of the
>> first PE job, and so on.
>> 
>> What you can try is the option "-tc N", which limits each of the arrays to
>> running only N instances at a time (`man qsub`). While you still have free
>> slots left, they should get filled by the next parallel job.
>> 
>> Overall I think the execution time will be the same, but this way you will
>> see the results of the first array indices earlier.
>> 
>> -- Reuti
> 
> Yes, that's the idea. I'm not sure about using the -tc flag, though,
> because it would limit a single job on our farm to N tasks even when it
> could easily be using the whole farm for its execution.


Yep, "-tc N" isn't "elastic", or whatever you may want to call it: the limit
stays at N even when free slots are left on the farm.
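For reference, a minimal sketch of what "-tc" would look like on the split
render arrays, reusing the flags quoted above. The function name
submit_capped, the cap value in the usage line and the QSUB override (which
only exists so the commands can be dry-run) are assumptions, not part of the
thread.

```shell
#!/bin/sh
# Sketch: submit the split render arrays with a per-array cap on running tasks.
# Flags mirror the thread; the function name, cap value and QSUB override are
# assumptions for illustration.

# submit_capped HOLD_JOB_ID MAX_RUNNING SCRIPT...
submit_capped() {
  hold=$1
  cap=$2
  shift 2
  for script in "$@"; do
    # -tc: at most $cap tasks of this array run at the same time, no matter
    # how many slots are free (the "not elastic" part).
    # $QSUB is left unquoted on purpose so QSUB="echo qsub" works as a dry run.
    ${QSUB:-qsub} -pe smp 4 -hold_jid_ad "$hold" -t 1-50:1 -tc "$cap" "$script"
  done
}
```

For example, `submit_capped 47673 8 split_0.sh split_1.sh split_2.sh split_3.sh`
(placeholder script names) would let each of the four arrays run at most 8
tasks at once.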

Another approach: combine all four render jobs into one, i.e. request 16 slots
from a PE with a fixed allocation rule of 4, so that you get 4 slots on each of
4 hosts. Inside the job script, spread 3 tasks to the other nodes with
`qrsh -inherit ... &` and start the fourth one locally as usual. Before exiting,
wait for the three external ones to return.
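A rough sketch of such a combined job script. Assumptions not from the thread:
a PE with the hypothetical name "smp4x" configured with allocation_rule 4 and
control_slaves TRUE (needed for `qrsh -inherit`), a placeholder render command
in RENDER_SPLIT that takes the split number and array index, and a shared
filesystem so that command exists on every node. Split numbers simply follow
the order of hosts in $PE_HOSTFILE; QRSH can be overridden for a dry run.

```shell
#!/bin/sh
# Hypothetical combined job script: one array task runs all four render splits.
# Assumed PE "smp4x" (not from the thread): allocation_rule 4, control_slaves TRUE.
#$ -pe smp4x 16
#$ -t 1-50:1

# Placeholder render command, called as: $RENDER_SPLIT <split> <array-index>.
# Both variables are expanded unquoted on purpose, so they may hold a command
# plus arguments (e.g. QRSH="echo qrsh" for a dry run).
RENDER_SPLIT=${RENDER_SPLIT:-/path/to/render_split.sh}
QRSH=${QRSH:-qrsh}

run_splits() {
  # Compare short host names, since $PE_HOSTFILE may list FQDNs
  me=$(hostname | cut -d. -f1)
  split=0
  pids=
  # Each $PE_HOSTFILE line: host slots queue processor-range
  while read -r host nslots queue procs; do
    if [ "${host%%.*}" = "$me" ]; then
      # Master host: start this split locally, like usual
      $RENDER_SPLIT "$split" "$SGE_TASK_ID" < /dev/null &
    else
      # Other hosts: start the split inside the granted slots via qrsh -inherit
      $QRSH -inherit "$host" $RENDER_SPLIT "$split" "$SGE_TASK_ID" < /dev/null &
    fi
    pids="$pids $!"
    split=$((split + 1))
  done < "$PE_HOSTFILE"

  # Wait for the local and the remote splits; fail if any of them failed
  status=0
  for pid in $pids; do
    wait "$pid" || status=1
  done
  return "$status"
}

# Only run inside a parallel job, where SGE sets $PE_HOSTFILE
if [ -n "${PE_HOSTFILE:-}" ]; then
  run_splits
  exit $?
fi
```

Submitted as an ordinary array job, each index would then run its four splits
side by side instead of one split array after the other.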

This assumes that the 4 jobs have more or less the same runtime per index;
otherwise cores sit idle until the slowest one has finished.
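To put a rough number on that caveat (a back-of-the-envelope illustration,
not from the thread): the combined job keeps all 16 slots until the slowest
split returns, so if split i takes t_i for a given index, its 4 cores idle
for max_j t_j - t_i. The runtimes in the second line are made up.

```latex
\[
  T_{\text{idle}} = 4 \sum_{i=1}^{4} \Bigl( \max_{j} t_j - t_i \Bigr)
\]
\[
  t = (10, 10, 10, 14)\ \text{min} \;\Rightarrow\;
  T_{\text{idle}} = 4\,(4 + 4 + 4 + 0) = 48\ \text{core-minutes per index}
\]
```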

-- Reuti
_______________________________________________
users mailing list
https://gridengine.org/mailman/listinfo/users
