They are just suggesting that you put the 3 subtasks into one job script.

Regards
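
P.S. A minimal sketch of what such a combined job could look like, assuming the job script is written in Python. The three commands are hypothetical stand-ins (not your real subtasks), and `run`, `run_chunk`, and `data.txt` are names made up for illustration. The point it shows is that the command for subtask 3 is only built after subtask 2 has finished, and that all three share one directory under `$TMPDIR` on the same node.

```python
#!/usr/bin/env python3
"""One SGE job that runs all three subtasks in sequence on the same node."""
import os
import subprocess
import sys
import tempfile


def run(cmd, cwd):
    """Run one subtask in cwd and return its stdout; raises if it fails."""
    done = subprocess.run(cmd, cwd=cwd, check=True,
                          stdout=subprocess.PIPE, universal_newlines=True)
    return done.stdout.strip()


def run_chunk(workdir, item):
    """Run subtasks 1-3 for one work item, sharing files in workdir."""
    py = sys.executable
    # Hypothetical stand-ins: replace these three commands with the real ones.
    # Subtask 1: put the shared input data into the local work directory.
    run([py, "-c", "import sys; open('data.txt', 'w').write(sys.argv[1])", item],
        workdir)
    # Subtask 2: process the data and print a result on stdout.
    result = run([py, "-c", "print(len(open('data.txt').read()))"], workdir)
    # Subtask 3: its command line is only built now, from subtask 2's result.
    return run([py, "-c", "import sys; print('subtask3 got ' + sys.argv[1])", result],
               workdir)


if __name__ == "__main__":
    item = sys.argv[1] if len(sys.argv) > 1 else "example"
    # SGE sets TMPDIR to a per-job scratch directory and removes it afterwards;
    # the fallback only exists so the sketch also runs outside SGE.
    scratch = os.environ.get("TMPDIR") or tempfile.gettempdir()
    workdir = tempfile.mkdtemp(dir=scratch)
    print(run_chunk(workdir, item))
```

You would then submit one such job per work item with qsub, passing the item as an argument; the exact qsub options depend on your setup. Since SGE removes `$TMPDIR` when the job ends, any results you want to keep have to be copied out before the script exits.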

On Aug 6, 2012, at 3:26 AM, Gaya Nadarajan <[email protected]> wrote:

> Quoting Reuti <[email protected]> on Fri, 3 Aug 2012 12:16:02 +0200:
>
>> Hi,
>>
>> On 03.08.2012 at 11:55, Gaya Nadarajan wrote:
>>
>>> I'm relatively new to sge, however I have the task of making optimal use of
>>> 7 multi-core VM nodes (5 in a cluster and 2 individual) for a set of tasks.
>>> The grid engine is installed in one of the individual VMs which is serving
>>> as the master node.
>>>
>>> I have a set of independent jobs (in hundreds or thousands) which each have
>>> 3 subtasks that have to be run in sequence. I would like a node to be
>>> assigned to the chunk of 3 jobs so that they can share the data between
>>> them.
>>>
>>> Reading the sge manual suggests that a resource is really a queue.
>>
>> Yes, you can say so. You submit a job and SGE selects a queue instance (i.e.
>> a queue on a particular exechost) for you which fulfills the resource
>> requests you specified.
>>
>>
>>> Should I create many queues, each with the 3 subtasks so that the data and
>>> task dependencies could be dealt with?
>>
>> Usually it's best to have as few queues as possible.
>>
>> Running the jobs in sequence is no problem, you can use -hold_jid for it.
>> But the "problem" is the temporary data. I assume, you want the temporary
>> data on the local disk and not a shared file space, where it can be accessed
>> by all nodes. The best would be to assemble one job from your 3 subtasks
>> instead. This way you can also use the $TMPDIR, which is created and removed
>> by SGE - preferably on a local /scratch file space.
>>
>
> What you are saying is merge the 3 subtasks into one job? Is there any way
> this can be expressed in sge job submission?
>
> You are right regarding the data dependency, all 3 subtasks depend on the
> same data which is downloaded locally. While I can use hold_jid to state the
> dependency between subtask 1 and 2, the command for subtask 3 can only be
> generated after subtask 2 has finished executing. The parameters to subtask 3
> rely on the result produced by subtask 2. So I can't even run it with the
> hold_jid option because the command itself is not complete due to missing
> parameters.
>
>
> Any thoughts on these? Thanks.
>
>
>> Otherwise you would need to tell the other two jobs to run on the same node
>> the first task was scheduled to (or route all by hand during the submission
>> to individual particular exechosts).
>>
>>
>>> Also I hear that array task dependencies could be good for such tasks but
>>> not sure if I should use them.
>>
>> I think this is not suitable for your setup. E.g. you have two array jobs A
>> and B, each running from [1..10]. Then B[1] should be allowed to start as
>> soon as A[1] finished. Similar for the other 9 job indices.
>>
>> -- Reuti
>>
>>
>>> Any advice or suggestions would be much appreciated.
>>>
>>>
>>> Many Thanks,
>>> Gaya
>>>
>>>
>>> --
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.
>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> [email protected]
>>> https://gridengine.org/mailman/listinfo/users
