On Wed, 18 Jan 2012 12:02:11 -0700
Moe Jette <je...@schedmd.com> wrote:

> We have held some discussions on this subject and it isn't simple to  
> resolve. The best way to do this would probably be to establish  
> finer-grained locking so there can be more parallelism, say by locking  
> individual job records rather than the entire job list. That would  
> impact quite a few sub-systems, for example how we preserve job state.
> 
> If you could submit a smaller number of jobs that each have many job  
> steps, that could address your problem today (say submitting 1000 jobs  
> each with 1000 steps).

The problem is (correct me if I'm wrong) that if I submit a job with 1000 steps, I 
have to manage the concurrency among the steps manually. I think we already 
discussed this on the list, when I was talking about implementing "swait" to 
group jobs, and why in the end it wasn't a good solution given the way steps are 
actually run.
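To make concrete what I mean by manual concurrency management: a sketch of the kind of batch script this approach forces on me (the node/task counts and the `work.sh` payload are hypothetical), packing many steps into one job and throttling them by hand:

```shell
#!/bin/bash
#SBATCH --job-name=many-steps
#SBATCH --nodes=4         # hypothetical allocation
#SBATCH --ntasks=64       # hypothetical task count

# Launch 1000 single-task steps, never more than $SLURM_NTASKS at once.
# The throttling logic lives in the script itself: the script, not the
# scheduler's priority machinery, decides how steps are grouped and waited on.
for i in $(seq 1 1000); do
    srun --ntasks=1 --exclusive ./work.sh "$i" &   # work.sh is a placeholder
    # Crude throttle: once $SLURM_NTASKS steps are in flight, wait for
    # one of them to finish before launching the next.
    while [ "$(jobs -rp | wc -l)" -ge "$SLURM_NTASKS" ]; do
        wait -n    # needs bash >= 4.3
    done
done
wait   # collect the remaining steps
```

All of this bookkeeping (the background launches, the `jobs`/`wait -n` loop) is exactly the concurrency management that per-job scheduling would otherwise give us for free.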

Also, I would have to request a specific cpu/node count (which basically means: 
the entire cluster), which I would like to avoid. Once the whole cluster is 
allocated to that one job, all the other features such as priorities etc. would 
become useless.
