Re: [gridengine users] Queue concept and resource management for large jobs

Gaya Nadarajan Mon, 06 Aug 2012 00:29:02 -0700

Quoting Reuti <[email protected]> on Fri, 3 Aug 2012 12:16:02 +0200:

Hi,

On 03.08.2012 at 11:55, Gaya Nadarajan wrote:
I'm relatively new to SGE; however, I have the task of making optimal use of 7 multi-core VM nodes (5 in a cluster and 2 individual) for a set of tasks. The grid engine is installed on one of the individual VMs, which is serving as the master node.
I have a set of independent jobs (hundreds or thousands), each of which has 3 subtasks that have to be run in sequence. I would like a node to be assigned to each chunk of 3 subtasks so that they can share the data between them.
Reading the sge manual suggests that a resource is really a queue.
Yes, you can say so. You submit a job and SGE selects a queue instance (i.e. a queue on a particular exechost) for you which fulfills the resource requests you specified.
Should I create many queues, each with the 3 subtasks so that the data and task dependencies could be dealt with?
Usually it's best to have as few queues as possible.
Running the jobs in sequence is no problem; you can use -hold_jid for it. But the "problem" is the temporary data. I assume you want the temporary data on the local disk and not in a shared file space where it can be accessed by all nodes. The best would be to assemble one job from your 3 subtasks instead. This way you can also use $TMPDIR, which is created and removed by SGE, preferably on a local /scratch file space.

So what you are saying is to merge the 3 subtasks into one job? Is there any way this can be expressed in an SGE job submission?

You are right regarding the data dependency: all 3 subtasks depend on the same data, which is downloaded locally. While I can use -hold_jid to state the dependency between subtasks 1 and 2, the command for subtask 3 can only be generated after subtask 2 has finished executing. The parameters to subtask 3 rely on the result produced by subtask 2. So I can't even submit it with the -hold_jid option, because the command itself is not complete due to the missing parameters.
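
For illustration, here is a minimal sketch of what a single combined job script might look like, with the parameter for subtask 3 read from the output of subtask 2 at run time. The three functions, the file names and the job name chunk_job are hypothetical stand-ins, not the real commands from this thread; the only SGE-specific parts are the `#$` directives and the $TMPDIR variable mentioned above.

```shell
#!/bin/sh
#$ -N chunk_job
#$ -cwd
set -e

# Hypothetical stand-ins for the three real subtasks; replace with the actual commands.
subtask1() { echo "input data" > "$1/data.txt"; }                    # e.g. fetch the data
subtask2() { wc -c < "$1/data.txt" | tr -d ' ' > "$1/result.txt"; }  # produce a result
subtask3() { echo "subtask3 ran with parameter $2" > "$1/final.txt"; }

# Under SGE, $TMPDIR is a per-job directory that SGE creates and removes.
# Outside SGE this falls back to /tmp so the sketch can be tried locally.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/chunk.XXXXXX")

subtask1 "$workdir"
subtask2 "$workdir"

# The argument for subtask 3 is only known now, after subtask 2 has finished.
param=$(cat "$workdir/result.txt")
subtask3 "$workdir" "$param"

cat "$workdir/final.txt"
```

Because all three steps run inside one job, they run on the same node and share the same local directory, and no -hold_jid is needed. Such a script would be submitted once per chunk, e.g. with `qsub chunk_job.sh` (the file name is an assumption).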



Any thoughts on these? Thanks.

Otherwise you would need to tell the other two jobs to run on the same node the first task was scheduled to (or route them all by hand during submission to particular exechosts).
Also, I hear that array task dependencies could be good for such tasks, but I am not sure if I should use them.
I think this is not suitable for your setup. E.g. you have two array jobs A and B, each running from [1..10]. Then B[1] should be allowed to start as soon as A[1] has finished. Similarly for the other 9 job indices.
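
As a rough sketch of the array dependency described above, the two submissions could look like the following. It is written as a dry run that only prints the qsub command lines; step_a.sh and step_b.sh are hypothetical script names, and it assumes an SGE version that supports the -hold_jid_ad option of qsub.

```shell
#!/bin/sh
# Dry run: print the qsub command lines instead of submitting them.
# Replace the body of submit() with: qsub "$@"   to submit for real.
submit() { echo "qsub $*"; }

# Array job A with task indices 1..10.
cmd_a=$(submit -N A -t 1-10 step_a.sh)

# Array job B with the same index range; -hold_jid_ad makes task B[i]
# wait only for task A[i], not for the whole of job A.
cmd_b=$(submit -N B -t 1-10 -hold_jid_ad A step_b.sh)

echo "$cmd_a"
echo "$cmd_b"
```

Note that this only orders the tasks; it does not place B[i] on the same node as A[i], which is why it does not help with the node-local data.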
-- Reuti
Any advice or suggestions would be much appreciated.


Many Thanks,
Gaya


--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users



