On Mon, Feb 6, 2012 at 4:54 PM, Burian, John
<[email protected]> wrote:
>
> On 2/6/12 4:25 PM, "Lane Schwartz" <[email protected]> wrote:
>
>>Hi all,
>>
>>I have a large number of jobs that I need to run. Each of these jobs
>>kicks off a number of child jobs. The child jobs do most of the actual
>>work - the parent jobs mostly sit and wait until the child jobs have
>>completed.
>>
>>Ideally, I would like to kick off all of my parent jobs, and let them
>>spawn off all of their respective child jobs, and wait until
>>everything finishes. But there's a problem with this. If I kick off
>>all of the parent jobs, they occupy so many slots in my grid that it
>>takes far longer than it should for the grid to work through all of
>>the child jobs.
>>
>>To solve this problem, it occurred to me that it would be nice if I
>>could specify (perhaps by job name) a maximum number of parent jobs
>>that can simultaneously be executing.
>>
>>The way I'm currently working around this problem is the following. I
>>launch one or two parent jobs, then wait until they have spawned their
>>child jobs. At this point all of the slots in my grid have been
>>filled. I then launch the rest of my parent jobs, which don't run,
>>because no slots are available. I then use qmon to lower the priority
>>of my waiting parent jobs. This works OK, but later on I still
>>sometimes end up with too many parent jobs running simultaneously.
>>
>>I've looked through the documentation to try to find a better
>>solution. The closest thing I've found is the -tc flag to qsub, which
>>limits how many tasks of an array job can execute concurrently.
>>Unfortunately, the parent jobs are not themselves array jobs, and
>>while I suppose I could try to rewrite the parent launch scripts to
>>launch as an array job, this would be less than ideal.
>>
>>I was wondering if anyone has any other ideas on how to specify that
>>no more than n instances of jobs with a specified name should be able
>>to run simultaneously. I'd be open to other mechanisms, too.
>>
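For illustration, here is a minimal sketch of the array-job rewrite Lane mentions. Instead of N separate parent submissions, you submit one array job with `-t 1-N` and cap how many of its tasks run at once with `-tc`. The script name `parent.sh` and the job name `parents` are hypothetical. The sketch assumes `parent.sh` has been adapted to choose its work from `$SGE_TASK_ID`. The function only builds the qsub argument list and does not submit anything.

```python
from typing import List


def parent_array_cmd(script: str, job_name: str,
                     n_parents: int, max_running: int) -> List[str]:
    """Build (but do not run) a qsub command that submits n_parents copies
    of `script` as a single array job, with at most `max_running` tasks
    executing at the same time."""
    if n_parents < 1 or max_running < 1:
        raise ValueError("n_parents and max_running must be positive")
    return [
        "qsub",
        "-N", job_name,           # one name for the whole set of parents
        "-t", f"1-{n_parents}",   # task IDs 1..n_parents
        "-tc", str(max_running),  # throttle: max concurrently running tasks
        script,                   # hypothetical script; reads $SGE_TASK_ID
    ]


if __name__ == "__main__":
    cmd = parent_array_cmd("parent.sh", "parents", n_parents=40, max_running=2)
    print(" ".join(cmd))  # qsub -N parents -t 1-40 -tc 2 parent.sh
    # To actually submit: subprocess.run(cmd, check=True)
```

The `-tc` limit applies only to the tasks of this one array job. Parents that are submitted separately are not counted against it.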
>
> We do something similar, but we accomplish it differently. We have a
> script that runs on the submit host that identifies how many chunks the
> input dataset will be divided into, then submits an array job to process
> that many chunks. This array job is submitted with a name (using '-N
> <name>') that is generated by the script. The script then submits an
> 'accumulation' job that assembles the results of the array job, but uses
> 'qsub -hold_jid <name>' so it waits in queue until all tasks of the array
> job finish. Of course, if your child jobs have to actually talk to your
> parent job periodically, this won't do you much good.

John,

My child jobs do not need to communicate with the parent job. But I
don't see how your solution solves the problem of too many parent jobs
running simultaneously. Am I missing something?

Thanks,
Lane
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
