Re: [gridengine users] round robin PE config

Lars van der bijl Wed, 14 Dec 2011 01:49:12 -0800

On 13 December 2011 18:38, Reuti <[email protected]> wrote:
> Hej Lars,
>
> On 13.12.2011 at 16:57, Lars van der bijl wrote:
>
>> hey Reuti,
>>
>> I wrote a Python API using networkx and a database layer called clue,
>> so I've removed a few of the flags.
>>
>> qsub -r y -l mem_free=1.9G,s_vmem=2G,hbatch=1  -pe smp 1 -N
>
> is there any specific reason to request "-pe smp 1"?


We pass the requested slots this way. We don't use anything other than
PE tasks; it keeps things simple in our situation.

>
>> qsub -pe smp 1 -t 1-50:5 
>> /tmp/gridTask_sshot3__out__rs_mantra1__ifd_gen.30851.0.sh
>>
>> qsub -pe smp 4 -hold_jid_ad 47673  -t 1-50:1 
>> /tmp/gridTask_sshot3__out__rs_mantra1__rs_mantra_tile1__mantra_render_split_1_seed_0.30771.0.sh
>> qsub -pe smp 4 -hold_jid_ad 47673  -t 1-50:1 
>> /tmp/gridTask_sshot3__out__rs_mantra1__rs_mantra_tile1__mantra_render_split_3_seed_0.30753.0.sh
>> qsub -pe smp 4 -hold_jid_ad 47673  -t 1-50:1 
>> /tmp/gridTask_sshot3__out__rs_mantra1__rs_mantra_tile1__mantra_render_split_2_seed_0.30657.0.sh
>> qsub -pe smp 4 -hold_jid_ad 47673  -t 1-50:1 
>> /tmp/gridTask_sshot3__out__rs_mantra1__rs_mantra_tile1__mantra_render_split_0_seed_0.30680.0.sh
>>
>> qsub -pe smp 1 -hold_jid_ad 47677,47674,47675,47676 -t 1-50:1 
>> /tmp/gridTask_sshot3__out__rs_mantra1__rs_mantra_tile1__joinexr.30998.0.sh
>>
>> Now the commands run fine, but the queue seems to want to finish one
>> PE job before moving on to the next, whereas what I'd want is for the
>> splits to be run in a round-robin fashion.
>
> I think you are seeing this effect: once an array job has been scheduled,
> its remaining tasks are already eligible to run before the scheduler looks
> at other jobs. So you see the array instances of the first job, then those
> of the first PE job, and so on.
>
> What you can try is the option "-tc N" to limit each of the arrays to run
> only N instances at a time (`man qsub`). If there are still free slots left,
> they should get filled by the next parallel job.
>
> Overall I think the execution time will be the same, but you see the results 
> of the first array indices earlier this way.
>
> -- Reuti

Yes, that's the idea. I'm not sure about using the -tc flag though, because
it could mean a single job on our farm is limited to N tasks when it
could easily be using the whole farm for its execution.
It's good to know there is no behaviour implemented natively in SGE to
do this. Stops me searching for it :)
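
For reference, a rough sketch of how a -tc value could be derived so that
sibling split jobs share the farm evenly instead of one array draining first.
The helper name tc_limit and the figures (64 farm slots, 4 sibling jobs) are
assumptions for illustration only, not numbers from our setup; the script
path is a placeholder and the command is printed rather than submitted.

```shell
#!/bin/sh
# Hypothetical helper: per-job concurrent-task limit for N sibling array jobs.
#   $1 = total slots on the farm (assumed value)
#   $2 = slots requested per task (the "-pe smp N" value)
#   $3 = number of sibling array jobs that should run side by side
tc_limit() {
    limit=$(( $1 / ($2 * $3) ))
    # Always allow at least one running task per job.
    if [ "$limit" -lt 1 ]; then
        limit=1
    fi
    echo "$limit"
}

# Print the resulting submit line for one of the four split jobs.
echo "qsub -pe smp 4 -tc $(tc_limit 64 4 4) -hold_jid_ad 47673 -t 1-50:1 <script>"
```

The drawback mentioned above still applies: a job submitted alone would be
capped at its -tc value even when the farm is otherwise idle. According to the
qsub man page, -tc is also accepted by qalter, so the limit could be raised
afterwards if that happens.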

thanks,

Lars

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
