Re: [gridengine users] Queue concept and resource management for large jobs

Gaya Nadarajan Mon, 06 Aug 2012 01:43:21 -0700

Quoting William Hay <[email protected]> on Mon, 6 Aug 2012 08:51:14 +0100:

On 6 August 2012 08:26, Gaya Nadarajan <[email protected]> wrote:

Quoting Reuti <[email protected]> on Fri, 3 Aug 2012 12:16:02 +0200:

Hi,

Am 03.08.2012 um 11:55 schrieb Gaya Nadarajan:

I'm relatively new to sge, however I have the task of making
optimal use of 7 multi-core VM nodes (5 in a cluster and 2
individual) for a set of tasks. The grid engine is installed in one
of the individual VMs which is serving as the master node.

I have a set of independent jobs (in hundreds or thousands) which
each have 3 subtasks that have to be run in sequence. I would like
a node to be assigned to each chunk of 3 subtasks so that they can
share the data between them.

Reading the sge manual suggests that a resource is really a queue.


Yes, you can say so. You submit a job and SGE selects a queue
instance (i.e. a queue on a particular exechost) for you which
fulfills the resource requests you specified.

Should I create many queues, each with the 3 subtasks so that the
data and task dependencies could be dealt with?


Usually it's best to have as few queues as possible.

Running the jobs in sequence is no problem, you can use -hold_jid
for it. But the "problem" is the temporary data. I assume you want
the temporary data on the local disk and not on a shared file space,
where it can be accessed by all nodes. The best would be to assemble
one job from your 3 subtasks instead. This way you can also use the
$TMPDIR, which is created and removed by SGE - preferably on a local
/scratch file space.
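
As a rough sketch of the -hold_jid chaining mentioned above: the helper below only prints the three qsub command lines for one chunk, so they can be inspected before anything is submitted. The script names (sub1.sh, sub2.sh, sub3.sh) and the job naming scheme built from a chunk id are assumptions made for illustration.

```shell
#!/bin/bash
# Print the qsub command lines for one chunk, chained with -hold_jid.
# The chunk id and the script names sub1.sh..sub3.sh are hypothetical.
chain_cmds() {
    local chunk="$1"
    echo "qsub -N c${chunk}_s1 sub1.sh"
    echo "qsub -N c${chunk}_s2 -hold_jid c${chunk}_s1 sub2.sh"
    echo "qsub -N c${chunk}_s3 -hold_jid c${chunk}_s2 sub3.sh"
}

chain_cmds 42
```

Note that -hold_jid only orders the jobs. It does not place them on the same host, so on its own it does not solve the local temporary data problem.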


What you are saying is to merge the 3 subtasks into one job? Is there
any way this can be expressed in sge job submission?


We're suggesting you avoid using grid engine features for this and
just write a fancy job script, something like:


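#!/bin/bash
#$ -l h_rt=72:0:0
#$ -l h_vmem=2G
# Stage the input data into the job-private scratch directory
scp user@host:/path/to/data.tar.gz "${TMPDIR}"
cd "${TMPDIR}" || exit 1
tar -xvzf data.tar.gz
task1
task2
# Build the command for task 3 from task 2's result, then run it
$(generate command for task 3)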

If this doesn't work for you then, by making your worker nodes submit
nodes as well (or ssh'ing back to a submit node), you could have each
job qsub its successor using -l h=$(hostname) to ensure it ends up on
the same node.
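
A minimal sketch of that last idea, assuming the running job is allowed to call qsub: the helper builds the command line that submits a successor pinned to the current host. The script name task2.sh and the optional host argument are hypothetical. The command is only printed here, not executed.

```shell
#!/bin/bash
# Build the qsub command a running job could use to submit its successor
# onto the same execution host. The script name passed in is hypothetical.
successor_cmd() {
    local next_script="$1"
    local host="${2:-$(hostname)}"   # default: the host this job is running on
    echo "qsub -l h=${host} ${next_script}"
}

successor_cmd task2.sh
```

The host name probably has to match the name under which the exechost is known to grid engine (short name versus fully qualified), so the output of hostname may need adjusting on some installations.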

I didn't give enough background on my work. My program itself
generates the tasks automatically (using planning) from other programs
invoking it with options passed in, schedules the tasks for execution
and monitors their progress (and updates a DB accordingly). I'm using
the DRMAA_API to dispatch jobs to sge. I'm trying to control this
dependency from within my program itself, which is proving a bit tricky.

This sounds like a good option: instead of generating and dispatching
tasks for execution one by one using sge setting options, I should
generate a script for each chunk of 3 subtasks and send the script for
execution. Generating task 3 would require a callback to my program,
which is not that straightforward right now, but I should figure out
what would be best to get things to work.
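
Generating one script per chunk could look roughly like the sketch below. Everything named here is an assumption for illustration: task1, task2 and build_task3_cmd stand in for the real subtasks and for the callback that produces the task 3 command, and the chunk id is passed to them as a plain argument.

```shell
#!/bin/bash
# Write one job script per chunk of three subtasks and print its path.
# task1, task2 and build_task3_cmd are hypothetical placeholders.
make_chunk_script() {
    local chunk="$1" outdir="$2"
    local script="${outdir}/chunk_${chunk}.sh"
    cat > "${script}" <<EOF
#!/bin/bash
#\$ -N chunk_${chunk}
cd "\${TMPDIR}" || exit 1
task1 ${chunk} || exit 1
task2 ${chunk} > task2.out || exit 1
# The task 3 command is only known once task 2 has finished
\$(build_task3_cmd task2.out)
EOF
    chmod +x "${script}"
    echo "${script}"
}

make_chunk_script 17 "$(mktemp -d)"
```

The printed path could then be set as the remote command of the DRMAA job template, so the dispatching program submits one job per chunk instead of three.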


Thanks and do post any more comments or suggestions.

Gaya


You are right regarding the data dependency, all 3 subtasks depend on
the same data which is downloaded locally. While I can use hold_jid to
state the dependency between subtask 1 and 2, the command for subtask
3 can only be generated after subtask 2 has finished executing. The
parameters to subtask 3 rely on the result produced by subtask 2. So I
can't even run it with the hold_jid option because the command itself is
not complete due to missing parameters.
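
Inside a single job script this late binding is straightforward, because task 3 is only assembled after task 2 has returned. A small sketch follows, with stand-in functions in place of the real subtasks and assuming task 2 reports its result as a single value on stdout:

```shell
#!/bin/bash
# Stand-ins for the real subtasks; both are hypothetical.
task2() { echo "120"; }                 # pretend result of subtask 2
task3() { echo "task3 --frames $1"; }   # pretend subtask 3, echoes its call

run_tail_of_chunk() {
    local result
    result="$(task2)" || return 1   # wait for task 2 and capture its result
    task3 "${result}"               # the parameter exists only at this point
}

run_tail_of_chunk
```

With the stand-ins above this prints "task3 --frames 120". In the real script the captured result would be whatever subtask 2 actually produces, possibly read from a file instead of stdout.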


Any thoughts on these? Thanks.

Otherwise you would need to tell the other two jobs to run on the
same node the first task was scheduled to (or route all by hand
during the submission to individual particular exechosts).

Also I hear that array task dependencies could be good for such
tasks but not sure if I should use them.


I think this is not suitable for your setup. E.g. you have two array
jobs A and B, each running from [1..10]. Then B[1] should be allowed
to start as soon as A[1] finished. Similar for the other 9 job
indices.
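
For reference, the per-index dependency described above corresponds, as far as I know, to qsub's -hold_jid_ad option. The sketch below only prints the two submission lines. The scripts a.sh and b.sh are hypothetical, and the option requires a grid engine version that supports array dependencies.

```shell
#!/bin/bash
# Print the qsub lines for two array jobs A and B where B[i] waits for A[i].
# a.sh and b.sh are hypothetical job scripts.
array_dep_cmds() {
    local first="$1" last="$2"
    echo "qsub -N A -t ${first}-${last} a.sh"
    echo "qsub -N B -t ${first}-${last} -hold_jid_ad A b.sh"
}

array_dep_cmds 1 10
```

This still neither keeps A[i] and B[i] on the same node nor helps with a command that is not known at submission time.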

-- Reuti

Any advice or suggestions would be much appreciated.


Many Thanks,
Gaya


--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users




