Hi all,

I have a large number of jobs that I need to run. Each of these jobs kicks off a number of child jobs. The child jobs do most of the actual work; the parent jobs mostly sit and wait until their child jobs have completed.
Ideally, I would like to kick off all of my parent jobs, let them spawn all of their respective child jobs, and wait until everything finishes. But there's a problem with this. If I kick off all of the parent jobs at once, they occupy many of the slots in my grid. The grid then takes far longer than it should to work through the child jobs, because so many compute slots are held by parents that are just waiting.

To solve this problem, it occurred to me that it would be nice if I could specify (perhaps by job name) a maximum number of parent jobs that can execute simultaneously.

My current workaround is the following. I launch one or two parent jobs, then wait until they have spawned their child jobs. At this point all of the slots in my grid have been filled. I then launch the rest of my parent jobs, which don't run, because no slots are available. I then use qmon to lower the priority of my waiting parent jobs. This works OK, but later on I still sometimes end up with too many parent jobs running simultaneously.

I've looked through the documentation to try to find a better solution. The closest thing I've found is the -tc flag to qsub, which limits how many tasks of an array job can run concurrently. Unfortunately, the parent jobs are not themselves array jobs, and while I suppose I could rewrite the parent launch scripts to launch as an array job, this would be less than ideal.

Does anyone have other ideas on how to specify that no more than n instances of jobs with a specified name may run simultaneously? I'd be open to other mechanisms, too.

Thanks,
Lane

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
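One of the "other mechanisms" asked about is a client-side throttle: instead of submitting every parent up front, a small launcher submits parents one at a time and waits whenever n of them are already in the system. The sketch below is a hypothetical illustration, not a Grid Engine feature. The function names (`count_named_jobs`, `submit_parent`, `launch_throttled`), the `max_parents` parameter, and the job name "parent" are all made up for this example. It relies on `qsub -N` to give every parent the same name, and it assumes that `qstat -xml` reports each job's name in a `JB_name` element, which matches SGE 6.x-style output but should be checked against your installation. It deliberately counts pending as well as running parents. If it counted only running ones, queued parents could pile up while the grid is full and then all start together, which is the problem described above.

```python
import getpass
import subprocess
import time
import xml.etree.ElementTree as ET
from typing import Callable, Sequence


def count_named_jobs(job_name: str) -> int:
    """Count the current user's jobs named `job_name`, pending or running.

    ASSUMPTION: `qstat -xml` emits one <JB_name> element per listed job,
    as SGE 6.x-style releases do; verify against your installation.
    """
    out = subprocess.run(
        ["qstat", "-u", getpass.getuser(), "-xml"],
        capture_output=True, text=True, check=True,
    ).stdout
    root = ET.fromstring(out)
    return sum(1 for el in root.iter("JB_name") if el.text == job_name)


def submit_parent(script: str, job_name: str) -> None:
    """Submit one parent script; `qsub -N` gives every parent the same name."""
    subprocess.run(["qsub", "-N", job_name, script], check=True)


def launch_throttled(
    scripts: Sequence[str],
    max_parents: int,
    count_active: Callable[[], int],
    submit: Callable[[str], None],
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Submit `scripts` in order, never letting more than `max_parents`
    parent jobs be queued or running at the same time (hypothetical helper)."""
    pending = list(scripts)
    while pending:
        if count_active() < max_parents:
            # Room for another parent: submit the next one right away.
            submit(pending.pop(0))
        else:
            # Cap reached: wait for an existing parent to finish.
            sleep(poll_seconds)


# Hypothetical usage: cap at 4 parents sharing the job name "parent".
# launch_throttled(
#     ["parent1.sh", "parent2.sh", "parent3.sh"], max_parents=4,
#     count_active=lambda: count_named_jobs("parent"),
#     submit=lambda s: submit_parent(s, "parent"),
# )
```

The counting and submitting functions are passed in as callables so the throttling logic can be exercised without a live cluster, or pointed at a different way of counting jobs. Note that the launcher itself has to keep running (for example on a submit host) until the last parent has been submitted, and it only caps the number of parents; each parent still holds a slot while it waits.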
