Quoting Reuti <[email protected]> on Fri, 3 Aug 2012 12:16:02 +0200:

Hi,

On 03.08.2012 at 11:55, Gaya Nadarajan wrote:

I'm relatively new to SGE; however, I have the task of making optimal use of 7 multi-core VM nodes (5 in a cluster and 2 individual) for a set of tasks. Grid Engine is installed on one of the individual VMs, which serves as the master node.

I have a set of independent jobs (hundreds or thousands), each of which has 3 subtasks that have to be run in sequence. I would like a node to be assigned to each chunk of 3 subtasks so that they can share data between them.

Reading the SGE manual suggests that a resource is really a queue.

Yes, you can say so. You submit a job and SGE selects a queue instance (i.e. a queue on a particular exechost) for you which fulfills the resource requests you specified.


Should I create many queues, each with the 3 subtasks so that the data and task dependencies could be dealt with?

Usually it's best to have as few queues as possible.

Running the jobs in sequence is no problem; you can use -hold_jid for it. But the "problem" is the temporary data. I assume you want the temporary data on the local disk and not in a shared file space where it can be accessed by all nodes. The best would be to assemble one job from your 3 subtasks instead. This way you can also use $TMPDIR, which is created and removed by SGE, preferably on a local /scratch file space.
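
A minimal sketch of such a combined job script follows. The three functions subtask1, subtask2 and subtask3 are hypothetical placeholders for the real commands, and the file names are made up for illustration. $TMPDIR is set by SGE inside a running job; the mktemp fallback is only an assumption so that the script can also be tried outside the scheduler.

```shell
#!/bin/sh
#$ -N chunk
#$ -cwd

# Hypothetical placeholders for the three real subtasks.
subtask1() { echo "raw data" > "$1/input.dat"; }
subtask2() { tr 'a-z' 'A-Z' < "$1/input.dat" > "$1/stage2.out"; }
subtask3() { cat "$1/stage2.out" > "$1/result.txt"; }

run_chunk() {
    dest=$1
    # SGE sets TMPDIR inside a job; fall back to mktemp outside SGE.
    work=${TMPDIR:-$(mktemp -d)}
    # Run the subtasks in order and stop at the first failure.
    subtask1 "$work" && subtask2 "$work" && subtask3 "$work" || return 1
    # Copy the result out before SGE removes the scratch directory.
    cp "$work/result.txt" "$dest/"
}

# Only run when a destination directory is given as first argument.
if [ $# -gt 0 ]; then
    run_chunk "$1"
fi
```

Saved as, say, chunk.sh, it could be submitted with something like "qsub chunk.sh /path/to/results", one submission per chunk. All intermediate files then stay on the node that runs the job.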


What you are saying is to merge the 3 subtasks into one job? Is there any way this can be expressed in SGE job submission?

You are right regarding the data dependency: all 3 subtasks depend on the same data, which is downloaded locally. While I can use -hold_jid to state the dependency between subtasks 1 and 2, the command for subtask 3 can only be generated after subtask 2 has finished executing. The parameters to subtask 3 rely on the result produced by subtask 2. So I can't even run it with the -hold_jid option, because the command itself is not complete due to the missing parameters.
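
To make this dependency concrete: inside a single job script, the command line for subtask 3 does not have to exist at submission time, since it can be assembled after subtask 2 has finished. A rough sketch, where subtask2 and subtask3 are hypothetical stand-ins and the "--threshold 42" parameter is invented purely for illustration:

```shell
#!/bin/sh
# Hypothetical stand-ins: subtask2 prints the parameters subtask3 needs.
subtask2() { echo "--threshold 42"; }
subtask3() { echo "subtask3 called with: $*"; }

run_steps_2_and_3() {
    # Capture the result of subtask 2 once it has finished ...
    params=$(subtask2) || return 1
    # ... and only then assemble the command line for subtask 3.
    # $params is left unquoted on purpose so it splits into arguments.
    subtask3 $params
}
```

If subtask 2 writes its result to a file instead of standard output, the same idea applies with the parameters read from that file.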


Any thoughts on these? Thanks.


Otherwise you would need to tell the other two jobs to run on the same node the first task was scheduled to (or route them all by hand at submission time to particular exechosts).
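
A sketch of what that could look like with qsub, combining -hold_jid with a hostname resource request. The helper name submit_after_on_host, the QSUB override variable, and the job ids, host and script names in the usage note are assumptions for illustration; -terse (print only the job id), -hold_jid and "-l hostname=..." are regular qsub options.

```shell
#!/bin/sh
# QSUB can be overridden for a dry run, e.g. QSUB=echo.
QSUB=${QSUB:-qsub}

# Submit a follow-up job that waits for job id $1 and is pinned to host $2.
# The remaining arguments are the job script and its parameters.
submit_after_on_host() {
    prev=$1
    host=$2
    shift 2
    "$QSUB" -terse -hold_jid "$prev" -l hostname="$host" "$@"
}
```

For example, "submit_after_on_host 101 node03 step2.sh" would hold step2.sh until job 101 has finished and restrict it to node03. The host has to be known at submission time, so either all three jobs are routed to a chosen host by hand, or the first job records its own host name and submits its successors itself (which assumes the exechosts are also submit hosts).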


Also, I hear that array task dependencies could be good for such tasks, but I am not sure if I should use them.

I think this is not suitable for your setup. E.g. you have two array jobs A and B, each running from [1..10]. Then B[1] should be allowed to start as soon as A[1] has finished. Similarly for the other 9 job indices.
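
For illustration, such a pair of array jobs could be submitted roughly as below, using qsub's -t for the task range and -hold_jid_ad for the per-task dependency. The helper name submit_array_pair and the QSUB override are assumptions. Note that this only orders the tasks; as far as I know it does not by itself place B[i] on the host where A[i] ran.

```shell
#!/bin/sh
# QSUB can be overridden for a dry run, e.g. QSUB=echo.
QSUB=${QSUB:-qsub}

# Submit array jobs A and B over tasks 1..N.
# Task B[i] is held until task A[i] has finished.
submit_array_pair() {
    n=$1
    script_a=$2
    script_b=$3
    "$QSUB" -N A -t "1-$n" "$script_a" &&
    "$QSUB" -N B -t "1-$n" -hold_jid_ad A "$script_b"
}
```

A call such as "submit_array_pair 10 a.sh b.sh" would correspond to the [1..10] example above.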

-- Reuti


Any advice or suggestions would be much appreciated.


Many Thanks,
Gaya


--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users






