On Mon, Feb 6, 2012 at 4:54 PM, Burian, John <[email protected]> wrote:
>
> On 2/6/12 4:25 PM, "Lane Schwartz" <[email protected]> wrote:
>
>>Hi all,
>>
>>I have a large number of jobs that I need to run. Each of these jobs
>>kicks off a number of child jobs. The child jobs do most of the actual
>>work - the parent jobs mostly sit and wait until the child jobs have
>>completed.
>>
>>Ideally, I would like to kick off all of my parent jobs, and let them
>>spawn off all of their respective child jobs, and wait until
>>everything finishes. But there's a problem with this. If I kick off
>>all of the parent jobs, then the parent jobs take up lots of slots in
>>my grid, and it takes far longer than it should for the grid to work
>>through all of the child jobs, because the parent jobs are taking up
>>so many compute slots.
>>
>>To solve this problem, it occurred to me that it would be nice if I
>>could specify (perhaps by job name) a maximum number of parent jobs
>>that can simultaneously be executing.
>>
>>The way I'm currently working around this problem is the following. I
>>launch one or two parent jobs, then wait until they have spawned their
>>child jobs. At this point all of the slots in my grid have been
>>filled. I then launch the rest of my parent jobs, which don't run,
>>because no slots are available. I then use qmon to lower the priority
>>of my waiting parent jobs. This works OK, but later on I still
>>sometimes end up with too many parent jobs running simultaneously.
>>
>>I've looked through the documentation to try to find a better
>>solution. The closest thing I've found is the -tc flag to qsub, which
>>allows me to limit the number of concurrent array jobs executing.
>>Unfortunately, the parent jobs are not themselves array jobs, and
>>while I suppose I could try to rewrite the parent launch scripts to
>>launch as an array job, this would be less than ideal.
>>
>>I was wondering if anyone has any other ideas on how to specify that
>>no more than n instances of jobs with a specified name should be able
>>to run simultaneously. I'd be open to other mechanisms, too.
>>
>
> We do something similar, but we accomplish it differently. We have a
> script that runs on the submit host that identifies how many chunks the
> input dataset will be divided into, then submits an array job to process
> that many chunks. This array job is submitted with a name (using '-N
> <name>') that is generated by the script. The script then submits an
> 'accumulation' job that assembles the results of the array job, but uses
> 'qsub -hold_jid <name>' so it waits in queue until all tasks of the array
> job finish. Of course, if your child jobs have to actually talk to your
> parent job periodically, this won't do you much good.
John,

My child jobs do not need to communicate with the parent job. But I
don't see how your solution solves the problem of too many parent jobs
running simultaneously. Am I missing something?

Thanks,
Lane
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
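
For readers following the thread, here is a minimal sketch of the submit-host workflow John describes: an array job named with -N, followed by an accumulation job that is held with -hold_jid until every task of the array job has finished. Under this scheme nothing plays the role of a waiting parent: the accumulation job sits in the queue in a hold state rather than occupying a slot, which appears to be how it sidesteps the problem. The script names (process_chunk.sh, accumulate.sh), the job-name scheme, and the Python helper functions are all hypothetical, invented for illustration; only the qsub flags -N, -t, -tc and -hold_jid come from the thread or from qsub itself. The sketch only builds the command lines and prints them; it does not call qsub.

```python
import shlex


def build_array_job(name, script, n_chunks, max_concurrent=None):
    """Build the qsub argument list for an array job with one task per chunk."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    cmd = ["qsub", "-N", name, "-t", f"1-{n_chunks}"]
    if max_concurrent is not None:
        # -tc caps how many tasks of this array job run at the same time.
        cmd += ["-tc", str(max_concurrent)]
    cmd.append(script)
    return cmd


def build_accumulation_job(name, script, wait_for):
    """Build the qsub argument list for a job held until `wait_for` finishes."""
    return ["qsub", "-N", name, "-hold_jid", wait_for, script]


def plan_dataset(dataset_id, n_chunks, max_concurrent=None):
    """Return both submissions for one dataset, in the order to submit them."""
    # Hypothetical naming scheme; any generated name works as long as the
    # accumulation job refers to the same one in -hold_jid.
    work_name = f"work_{dataset_id}"
    return [
        build_array_job(work_name, "process_chunk.sh", n_chunks, max_concurrent),
        build_accumulation_job(f"accumulate_{dataset_id}", "accumulate.sh", work_name),
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in plan_dataset("run42", 8, max_concurrent=4):
        print(shlex.join(cmd))
```

To actually submit, each argument list could be passed to subprocess.run on the submit host. Whether this fits Lane's case depends on whether the work each parent does before and after its child jobs can be moved into a submit-host script and a final accumulation job.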
