On 14.12.2011 at 10:48, Lars van der bijl wrote:

> On 13 December 2011 18:38, Reuti <[email protected]> wrote:
>> Hej Lars,
>>
>> On 13.12.2011 at 16:57, Lars van der bijl wrote:
>>
>>> hey Reuti,
>>>
>>> I wrote a python api using networkx and a database layer called clue
>>> so i've removed a few of the flags.
>>>
>>> qsub -r y -l mem_free=1.9G,s_vmem=2G,hbatch=1 -pe smp 1 -N
>>
>> is there any specific reason to request "-pe smp 1"?
>
> we pass the requested slots this way. we don't use anything other than
> PE tasks. it makes it simple in our situation.
>
>>
>>> qsub -pe smp 1 -t 1-50:5
>>> /tmp/gridTask_sshot3__out__rs_mantra1__ifd_gen.30851.0.sh
>>>
>>> qsub -pe smp 4 -hold_jid_ad 47673 -t 1-50:1
>>> /tmp/gridTask_sshot3__out__rs_mantra1__rs_mantra_tile1__mantra_render_split_1_seed_0.30771.0.sh
>>> qsub -pe smp 4 -hold_jid_ad 47673 -t 1-50:1
>>> /tmp/gridTask_sshot3__out__rs_mantra1__rs_mantra_tile1__mantra_render_split_3_seed_0.30753.0.sh
>>> qsub -pe smp 4 -hold_jid_ad 47673 -t 1-50:1
>>> /tmp/gridTask_sshot3__out__rs_mantra1__rs_mantra_tile1__mantra_render_split_2_seed_0.30657.0.sh
>>> qsub -pe smp 4 -hold_jid_ad 47673 -t 1-50:1
>>> /tmp/gridTask_sshot3__out__rs_mantra1__rs_mantra_tile1__mantra_render_split_0_seed_0.30680.0.sh
>>>
>>> qsub -pe smp 1 -hold_jid_ad 47677,47674,47675,47676 -t 1-50:1
>>> /tmp/gridTask_sshot3__out__rs_mantra1__rs_mantra_tile1__joinexr.30998.0.sh
>>>
>>> now the command runs fine. but the behaviour of the queue seems to be
>>> to want to finish a PE job before moving on to the next one, whereas
>>> what i'd want is to do the splits in a round robin fashion.
>>
>> I think you are seeing the effect that once an array job is scheduled,
>> its remaining tasks are already eligible to be executed before the
>> scheduler looks into other jobs. So you see the array instances of the
>> first job, and then those of the first PE job, and so on.
>>
>> What you can try is to use the option "-tc N" to limit each of the arrays
>> to run only N instances at a time (`man qsub`). When you still have free
>> slots left, they should get filled by the next parallel job.
>>
>> Overall I think the execution time will be the same, but you see the results
>> of the first array indices earlier this way.
>>
>> -- Reuti
>
> Yes that's the idea. I'm not sure about using the -tc flag though, because
> it could mean a single job on our farm could be limited to N tasks
> where it could easily be using all of the farm for its execution.

Yep, the "-tc N" isn't "elastic" or whatever you may want to call it: it is
a fixed cap and won't grow when the rest of the cluster is idle.

Another approach: combine all four render jobs into one, i.e. you need 16
slots with a fixed allocation rule of 4. Inside the job script you spread
three tasks to the other nodes with `qrsh -inherit ... &` and start the
fourth one locally as usual. Before exiting, wait for the three remote ones
to return.

This assumes that the four jobs have more or less the same runtime per
index, to avoid idling cores.

-- Reuti
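
A minimal sketch of such a combined job script, under some assumptions: the
per-split command `./render_split.sh` (taking a split number and the array
index) is hypothetical, as is the `RENDER` override variable; the script
relies on `PE_HOSTFILE` and `SGE_TASK_ID` being set by Grid Engine, and on
the first column of the hostfile matching the output of `hostname`. Each
granted host is also assumed to appear on one line only, which may not hold
if the slots come from several queues on the same host.

```shell
#!/bin/bash
# Hypothetical per-split command: <split number> <array task index>.
RENDER=${RENDER:-./render_split.sh}

# Print the hosts granted to this job, one per line.
# Each line of the pe_hostfile starts with the host name.
granted_hosts() {
    awk '{ print $1 }' "$1"
}

# Start one split on every other granted host in the background,
# run split 0 on the local host, then wait for the remote ones.
run_splits() {
    local hostfile=$1 task_id=$2 local_host=$3
    local split=1 host
    for host in $(granted_hosts "$hostfile"); do
        # The local host gets split 0 below, so skip it here.
        [ "$host" = "$local_host" ] && continue
        qrsh -inherit "$host" "$RENDER" "$split" "$task_id" &
        split=$((split + 1))
    done
    "$RENDER" 0 "$task_id"
    # Block until all background qrsh calls have returned.
    wait
}

# Only run when started by Grid Engine inside a parallel environment.
if [ -n "${PE_HOSTFILE:-}" ]; then
    run_splits "$PE_HOSTFILE" "${SGE_TASK_ID:-1}" "$(hostname)"
fi
```

This would be submitted once as an array job requesting 16 slots from a PE
whose allocation_rule is 4 (the PE name depends on your setup), in place of
the four separate "-pe smp 4" submissions. Note that the plain `wait` does
not propagate the exit status of the remote splits; that would need to be
collected separately.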
