On 6 August 2012 08:26, Gaya Nadarajan <[email protected]> wrote:
> Quoting Reuti <[email protected]> on Fri, 3 Aug 2012 12:16:02 +0200:
>
>> Hi,
>>
>> Am 03.08.2012 um 11:55 schrieb Gaya Nadarajan:
>>
>>> I'm relatively new to SGE, but I have the task of making
>>> optimal use of 7 multi-core VM nodes (5 in a cluster and 2
>>> individual) for a set of tasks. The grid engine is installed on one
>>> of the individual VMs, which serves as the master node.
>>>
>>> I have a set of independent jobs (hundreds or thousands), each of
>>> which has 3 subtasks that have to be run in sequence. I would like
>>> a node to be assigned to each chunk of 3 subtasks so that they can
>>> share the data between them.
>>>
>>> Reading the SGE manual suggests that a resource is really a queue.
>>
>> Yes, you can say so. You submit a job and SGE selects a queue
>> instance (i.e. a queue on a particular exechost) for you which
>> fulfills the resource requests you specified.
>>
>>
>>> Should I create many queues, each with the 3 subtasks so that the
>>> data and task dependencies could be dealt with?
>>
>> Usually it's best to have as few queues as possible.
>>
>> Running the jobs in sequence is no problem; you can use -hold_jid
>> for that. But the "problem" is the temporary data. I assume you want
>> the temporary data on a local disk rather than in a shared file space
>> where it can be accessed by all nodes. The best approach would be to
>> assemble one job from your 3 subtasks instead. This way you can also
>> use $TMPDIR, which is created and removed by SGE - preferably on a
>> local /scratch file space.
>>
>
> What you are saying is to merge the 3 subtasks into one job? Is there
> any way this can be expressed in an SGE job submission?

We're suggesting you avoid using grid engine features for this and
just write a slightly fancier job script, something like:


#!/bin/bash
#$ -l h_rt=72:0:0
#$ -l h_vmem=2G
# Fetch the input data into $TMPDIR, the job-private scratch directory
# that SGE creates at job start and removes when the job ends.
scp user@host:/path/to/data.tar.gz ${TMPDIR}
cd ${TMPDIR}
tar -xvzf data.tar.gz
# Run the three subtasks in sequence on the same node, sharing the data.
task1
task2
# The command line for task 3 depends on task 2's output, so it can be
# generated here, after task 2 has finished.
$(generate command for task 3)

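To give a rough idea (assuming the script above is saved as job.sh, a
name chosen purely for illustration), each of your independent jobs
would then be submitted in the usual way:

qsub job.sh

# or, since the jobs are independent, as one array job whose tasks
# pick their input via $SGE_TASK_ID:
qsub -t 1-1000 job.sh
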
If this doesn't work for you, then by making your worker nodes submit
hosts as well (or by ssh'ing back to a submit host) you could have each
job qsub its successor with -l h=$(hostname) to ensure it ends up on
the same node.
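
As a very rough sketch (assuming the exec hosts have been added as
submit hosts with qconf -as, and that subtask2.sh is just a placeholder
name), the end of subtask 1's script could then look like:

# Keep the downloaded data in a persistent local scratch path rather
# than $TMPDIR, since $TMPDIR is removed when this job ends.
# Submit the next step, pinned to this same exec host:
qsub -l h=$(hostname) subtask2.sh

The same trick works for subtask 2 submitting subtask 3; and because
the command for subtask 3 is only known once subtask 2 has run,
subtask 2's script can build that command line just before it calls
qsub.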

>
> You are right regarding the data dependency: all 3 subtasks depend on
> the same data, which is downloaded locally. While I can use -hold_jid
> to state the dependency between subtasks 1 and 2, the command for
> subtask 3 can only be generated after subtask 2 has finished executing.
> The parameters to subtask 3 rely on the result produced by subtask 2.
> So I can't even run it with the -hold_jid option, because the command
> itself is not complete due to the missing parameters.
>
>
> Any thoughts on these? Thanks.
>
>
>> Otherwise you would need to tell the other two jobs to run on the
>> same node the first task was scheduled on (or route everything by
>> hand at submission time to particular exechosts).
>>
>>
>>> Also I hear that array task dependencies could be good for such
>>> tasks but not sure if I should use them.
>>
>> I think this is not suitable for your setup. E.g., you have two array
>> jobs A and B, each running over the indices [1..10]. Then B[1] should
>> be allowed to start as soon as A[1] has finished, and similarly for
>> the other 9 job indices.
>>
>> -- Reuti
>>
>>
>>> Any advice or suggestions would be much appreciated.
>>>
>>>
>>> Many Thanks,
>>> Gaya
>>>
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
