If you provide a patch for the documentation we will apply it for the next release.
Thanks,
Danny

On 09/12/12 17:10, Evren Yurtesen IB wrote:
>
> Hello,
>
> Thanks for the information and the detailed explanation.
>
> Perhaps there could be a partial solution where the counter is not
> incremented if the assigned node (after selecting a node as usual) is
> already running a task from the same group, and at exit the counter is
> not decremented if the node is still running a task from the same
> group? This could at least stop unnecessary increments of the counter.
>
> If I understand correctly, a complete solution would, in addition to
> the partial solution above, require node selection to first check all
> nodes where group processes are currently running and which still have
> enough resources for the new task. The number of nodes found would then
> be subtracted from the requested number to check whether it is within
> the limit. That would have high overhead...
>
> So, is it possible to implement the partial solution at least? If not,
> I feel this should be documented. Perhaps a notice that if the tasks
> are assigned to the same node, the node is counted twice.
>
> Thanks,
> Evren
>
> On Wed, 12 Sep 2012, Danny Auble wrote:
>
>>
>> This would actually be a bit more involved. When this check is done the
>> nodes haven't been assigned to the job yet. So we would have to pull
>> this logic into the select plugin as well to pick the correct nodes a
>> job could use without going over the limit. IMHO this is way more
>> complexity and overhead than the return on investment would justify.
>>
>> Danny
>>
>> On 09/12/12 09:53, Moe Jette wrote:
>>> The code currently increments and decrements a counter when jobs start
>>> and end. It would be possible to track the specific nodes allocated to
>>> each group and avoid counting nodes twice, but that would require code
>>> changes and higher overhead than simply incrementing and decrementing
>>> a counter.
>>>
>>> Quoting Evren Yurtesen IB <[email protected]>:
>>>
>>>>
>>>>
>>>> On Tue, 11 Sep 2012, Danny Auble wrote:
>>>>
>>>>> On 09/11/12 05:59, Evren Yurtesen IB wrote:
>>>>>> Hello,
>>>>>>
>>>>>> We have a cluster with 12 cores on each node. I made a QOS entry
>>>>>> with GrpNodes 4 (I don't want this group of users to be able to
>>>>>> use more than 4 nodes).
>>>>>>
>>>>>> If somebody queues tasks running (2 jobs on the same node):
>>>>>>
>>>>>> job 1 - node 1
>>>>>> job 2 - node 1
>>>>>> job 3 - node 2
>>>>>> job 4 - node 2
>>>>>>
>>>>>> It looks like slurm is thinking 4 nodes are used? Because I see
>>>>>> the next task queued in the system shows Nodes 1 and is pending
>>>>>> due to QOSResourceLimit. In my opinion, 2 nodes are used? :)
>>>>>>
>>>>>> Could it be that it counts the same nodes again because they are
>>>>>> in different jobs? (v2.4.2 is used on this system)
>>>>>>
>>>>> Yes
>>>> Well, is there a way to make it count each node once? :)
>>>>
>>>> Thanks,
>>>> Evren
>>
>>
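The accounting trade-off discussed in the thread can be sketched as follows. This is illustrative Python, not Slurm's actual C implementation; the class and attribute names are hypothetical. It contrasts the simple per-job counter (which double-counts a node shared by two jobs in the same group) with exact per-node tracking (which is what Moe describes as possible but higher-overhead).

```python
class CounterAccounting:
    """Naive accounting: add the job's node count at start, subtract at
    end. A node shared by several jobs in the same group is counted once
    per job, so it can be double-counted against a GrpNodes-style limit."""
    def __init__(self):
        self.used_nodes = 0

    def job_start(self, nodes):
        self.used_nodes += len(nodes)

    def job_end(self, nodes):
        self.used_nodes -= len(nodes)


class NodeSetAccounting:
    """Exact accounting: track how many of the group's jobs run on each
    node, so a shared node is counted only once. This requires per-node
    bookkeeping instead of a single counter, hence the extra overhead."""
    def __init__(self):
        self.jobs_per_node = {}

    def job_start(self, nodes):
        for n in nodes:
            self.jobs_per_node[n] = self.jobs_per_node.get(n, 0) + 1

    def job_end(self, nodes):
        for n in nodes:
            self.jobs_per_node[n] -= 1
            if self.jobs_per_node[n] == 0:
                del self.jobs_per_node[n]

    @property
    def used_nodes(self):
        return len(self.jobs_per_node)


# The scenario from the thread: four single-node jobs packed onto two nodes.
counter, exact = CounterAccounting(), NodeSetAccounting()
for job_nodes in (["node1"], ["node1"], ["node2"], ["node2"]):
    counter.job_start(job_nodes)
    exact.job_start(job_nodes)

print(counter.used_nodes)  # 4 -- would hit a GrpNodes=4 limit
print(exact.used_nodes)    # 2 -- the real number of distinct nodes
```

With the counter approach, the fifth job pends on QOSResourceLimit even though only two physical nodes are in use; the set-based approach would admit it, at the cost of tracking node allocations per group.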
