Re: [gridengine users] Queue concept and resource management for large jobs

Hung-sheng Tsao Mon, 06 Aug 2012 03:35:03 -0700

Keep in mind that you can always copy the result back to the central server
and then copy it to the new execd host,
so you do not need to run the next job on the same host.
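
A minimal sketch of that staging step, assuming a directory on the central server that every exec host can reach. The names CENTRAL_DIR, stage_out and stage_in are hypothetical, and plain cp stands in for whatever transfer tool (scp, for example) fits the site:

```shell
# Hypothetical staging helpers. CENTRAL_DIR is an assumed location on the
# central server that all exec hosts can read and write.
CENTRAL_DIR="${CENTRAL_DIR:-/path/to/central/results}"

# Copy one result file from the current host to the central location.
stage_out() {
    local result_file="$1" chunk_id="$2"
    mkdir -p "${CENTRAL_DIR}/${chunk_id}" &&
        cp "${result_file}" "${CENTRAL_DIR}/${chunk_id}/"
}

# Copy every staged result of a chunk into a working directory on the new host.
stage_in() {
    local chunk_id="$1" work_dir="$2"
    mkdir -p "${work_dir}" &&
        cp "${CENTRAL_DIR}/${chunk_id}/"* "${work_dir}/"
}
```

The first job would call stage_out at its end, and the next job would call stage_in at its start, whichever host it lands on.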
Regards



Sent from my iPhone

On Aug 6, 2012, at 4:33 AM, Gaya Nadarajan <[email protected]> wrote:

> Quoting William Hay <[email protected]> on Mon, 6 Aug 2012 08:51:14 +0100:
> 
>> On 6 August 2012 08:26, Gaya Nadarajan <[email protected]> wrote:
>>> Quoting Reuti <[email protected]> on Fri, 3 Aug 2012 12:16:02 +0200:
>>> 
>>>> Hi,
>>>> 
>>>> On 03.08.2012 at 11:55, Gaya Nadarajan wrote:
>>>> 
>>>>> I'm relatively new to sge, however I have the task of making
>>>>> optimal use of 7 multi-core VM nodes (5 in a cluster and 2
>>>>> individual) for a set of tasks. The grid engine is installed in one
>>>>> of the individual VMs which is serving as the master node.
>>>>> 
>>>>> I have a set of independent jobs (hundreds or thousands), each of
>>>>> which has 3 subtasks that have to be run in sequence. I would like
>>>>> a node to be assigned to each chunk of 3 subtasks so that they can
>>>>> share the data between them.
>>>>> 
>>>>> Reading the sge manual suggests that a resource is really a queue.
>>>> 
>>>> Yes, you can say so. You submit a job and SGE selects a queue
>>>> instance (i.e. a queue on a particular exechost) for you which
>>>> fulfills the resource requests you specified.
>>>> 
>>>> 
>>>>> Should I create many queues, each with the 3 subtasks so that the
>>>>> data and task dependencies could be dealt with?
>>>> 
>>>> Usually it's best to have as few queues as possible.
>>>> 
>>>> Running the jobs in sequence is no problem; you can use -hold_jid
>>>> for it. But the "problem" is the temporary data. I assume you want
>>>> the temporary data on the local disk and not in a shared file space
>>>> where it can be accessed by all nodes. The best would be to assemble
>>>> one job from your 3 subtasks instead. This way you can also use
>>>> $TMPDIR, which is created and removed by SGE, preferably on a local
>>>> /scratch file space.
>>>> 
>>> 
>>> What you are saying is merge the 3 subtasks into one job? Is there
>>> any way this can be expressed in sge job submission?
>> 
>> We're suggesting you avoid using grid engine features for this and
>> just write a fancy job script, something like:
>> 
>> 
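>> #!/bin/bash
>> #$ -l h_rt=72:0:0
>> #$ -l h_vmem=2G
>> # Fetch the input data into the job's scratch directory and unpack it.
>> scp user@host:/path/to/data.tar.gz "${TMPDIR}"
>> cd "${TMPDIR}" || exit 1
>> tar -xvzf data.tar.gz
>> # Run the first two subtasks in sequence.
>> task1
>> task2
>> # Placeholder: build and run the task 3 command from task 2's result.
>> $(generate command for task 3)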
>> 
>> If this doesn't work for you, then by making your worker nodes submit
>> nodes as well (or by ssh'ing back to a submit node)
>> you could have each job qsub its successor using -l h=$(hostname) to
>> ensure it ends up on the same node.
> 
> 
> I didn't give enough background on my work. My program itself generates the 
> tasks automatically (using planning) from other programs invoking it with 
> options passed in, schedules the tasks for execution and monitors their 
> progress (and updates a DB accordingly). I'm using the DRMAA_API to dispatch 
> jobs to sge. I'm trying to control this dependency from within my program 
> itself, which is proving a bit tricky.
> 
> This sounds like a good option: instead of generating and dispatching tasks
> for execution one by one using sge options, I should generate a
> script for each chunk of 3 subtasks and send the script for execution.
> Generating task 3 would require a callback to my program, which is not that
> straightforward right now, but I should figure out what would be best to get
> things to work.
> 
> Thanks and do post any more comments or suggestions.
> 
> Gaya
> 
>> 
>> 
>> 
>> 
>> 
>>> 
>>> You are right regarding the data dependency, all 3 subtasks depend on
>>> the same data which is downloaded locally. While I can use hold_jid to
>>> state the dependency between subtask 1 and 2, the command for subtask
>>> 3 can only be generated after subtask 2 has finished executing. The
>>> parameters to subtask 3 rely on the result produced by subtask 2. So I
>>> can't even run it with the hold_jid option because the command itself is
>>> not complete due to missing parameters.
>>> 
>>> 
>>> Any thoughts on these? Thanks.
>>> 
>>> 
>>>> Otherwise you would need to tell the other two jobs to run on the
>>>> same node the first task was scheduled to (or route them all by hand
>>>> to particular exechosts during submission).
>>>> 
>>>> 
>>>>> Also I hear that array task dependencies could be good for such
>>>>> tasks but not sure if I should use them.
>>>> 
>>>> I think this is not suitable for your setup. E.g. you have two array
>>>> jobs A and B, each running from [1..10]. Then B[1] should be allowed
>>>> to start as soon as A[1] has finished. Similarly for the other 9 job
>>>> indices.
>>>> 
>>>> -- Reuti
>>>> 
>>>> 
>>>>> Any advice or suggestions would be much appreciated.
>>>>> 
>>>>> 
>>>>> Many Thanks,
>>>>> Gaya
>>>>> 
>>>>> 
>>>>> --
>>>>> The University of Edinburgh is a charitable body, registered in
>>>>> Scotland, with registration number SC005336.
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> [email protected]
>>>>> https://gridengine.org/mailman/listinfo/users
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
> 
> 
> 

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
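
The single-job approach discussed in this thread could be sketched as one script per chunk, where the command for subtask 3 is assembled only after subtask 2 has written its result. This is an illustration under assumptions, not a definitive implementation: the helper name run_chunk, the file name task2.out and the three command arguments are hypothetical placeholders. TMPDIR is the per-job scratch directory mentioned in the thread, with a fallback to /tmp for runs outside SGE.

```shell
#!/bin/bash
#$ -l h_rt=72:0:0
#$ -l h_vmem=2G

# run_chunk TASK1_CMD TASK2_CMD TASK3_PREFIX   (hypothetical helper)
# Runs the three subtasks of one chunk in order inside a single job.
# The body is a subshell, so the cd does not affect the caller.
run_chunk() (
    task1_cmd="$1"
    task2_cmd="$2"
    task3_prefix="$3"

    # Work in the per-job scratch directory; fall back to /tmp outside SGE.
    cd "${TMPDIR:-/tmp}" || exit 1

    # Subtask 1.
    sh -c "${task1_cmd}" || exit 1

    # Subtask 2: keep its result in a file (assumed name task2.out).
    sh -c "${task2_cmd}" > task2.out || exit 1

    # Subtask 3: its parameters are only known now, so build the command here.
    task3_params=$(cat task2.out)
    sh -c "${task3_prefix} ${task3_params}"
)

# Example call with placeholder commands:
# run_chunk "task1" "task2" "task3"
```

Because all three subtasks run in one job on one host, no -hold_jid chain or host pinning is needed, and the program that generates the scripts only has to supply the prefix of the task 3 command.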
