If you provide a patch for the documentation we will apply it for the 
next release.

Thanks,
Danny

On 09/12/12 17:10, Evren Yurtesen IB wrote:
>
> Hello,
>
> Thanks for the information and the detailed explanation.
>
> Perhaps there could be a partial solution: the counter is not 
> incremented if the assigned node (after selecting a node as usual) is 
> already running a task from the same group, and on exit the counter is 
> not decremented if the node is still running a task from the same 
> group. This would at least stop unnecessary increments of the counter.
>
> If I understand correctly, a complete solution would, in addition to 
> the partial solution above, require the node selection to first check 
> all nodes where the group's processes are currently running and which 
> still have enough resources for the new task. The number of such nodes 
> would then be subtracted from the requested number to determine whether 
> the request is within the limit. That would have high overhead...
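The increment/decrement rule proposed above can be illustrated outside of Slurm. The class and names below are hypothetical, a minimal sketch of the idea rather than Slurm internals:

```python
# Hypothetical sketch (not Slurm code): count a node toward a group's
# GrpNodes limit only while at least one of the group's jobs runs on it.
from collections import defaultdict

class GroupNodeCounter:
    def __init__(self):
        # node name -> number of this group's jobs currently on it
        self.jobs_on_node = defaultdict(int)
        self.grp_nodes_used = 0

    def job_started(self, node):
        # Increment only for the first group job landing on the node.
        if self.jobs_on_node[node] == 0:
            self.grp_nodes_used += 1
        self.jobs_on_node[node] += 1

    def job_ended(self, node):
        # Decrement only when the last group job on the node exits.
        self.jobs_on_node[node] -= 1
        if self.jobs_on_node[node] == 0:
            self.grp_nodes_used -= 1

counter = GroupNodeCounter()
for node in ["node1", "node1", "node2", "node2"]:  # jobs 1-4 from the example
    counter.job_started(node)
print(counter.grp_nodes_used)  # 2 rather than 4
```

Under this rule the four jobs from the example consume only 2 of a GrpNodes=4 limit, so a fifth job would not be held pending.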
>
> So, is it possible to implement the partial solution at least? If not, 
> I feel this should be documented. Perhaps a notice that if the tasks 
> are assigned to the same node, the node is counted twice.
>
> Thanks,
> Evren
>
> On Wed, 12 Sep 2012, Danny Auble wrote:
>
>>
>> This would actually be a bit more involved.  When this check is done the
>> nodes haven't been assigned to the job yet.  So we would have to pull
>> this logic into the select plugin as well to pick the correct nodes a
>> job could use without going over the limit.  IMHO this is far more
>> complexity and overhead than the return on investment would justify.
>>
>> Danny
>>
>> On 09/12/12 09:53, Moe Jette wrote:
>>> The code currently increments and decrements a counter when jobs start
>>> and end. It would be possible to track the specific nodes allocated to
>>> each group and avoid counting nodes twice, but that would require code
>>> changes and higher overhead than simply incrementing and decrementing
>>> a counter.
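The trade-off described above can be shown with a short sketch (illustrative only, not Slurm code); the allocation table mirrors the example later in the thread:

```python
# The scenario from the thread: four jobs of one group on two nodes.
allocations = {"job1": ["node1"], "job2": ["node1"],
               "job3": ["node2"], "job4": ["node2"]}

# Counter approach (current behavior): each job start adds its node
# count, so a node shared by several jobs is counted once per job.
counted = sum(len(nodes) for nodes in allocations.values())

# Node-tracking approach: count the distinct nodes the group occupies,
# at the cost of remembering every running job's allocation.
distinct = len({n for nodes in allocations.values() for n in nodes})

print(counted, distinct)  # 4 2
```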
>>>
>>> Quoting Evren Yurtesen IB <[email protected]>:
>>>
>>>>
>>>>
>>>> On Tue, 11 Sep 2012, Danny Auble wrote:
>>>>
>>>>> On 09/11/12 05:59, Evren Yurtesen IB wrote:
>>>>>> Hello,
>>>>>>
>>>>>> We have a cluster with 12 cores on each node. I made a QOS entry
>>>>>> with GrpNodes=4 (I don't want this group of users to be able to
>>>>>> use more than 4 nodes).
>>>>>>
>>>>>> If somebody queues jobs so that 2 jobs run on the same node:
>>>>>>
>>>>>> job 1 - node 1
>>>>>> job 2 - node 1
>>>>>> job 3 - node 2
>>>>>> job 4 - node 2
>>>>>>
>>>>>> It looks like Slurm thinks 4 nodes are used, because the next
>>>>>> job queued in the system shows Nodes 1 and is pending due to
>>>>>> QOSResourceLimit. In my opinion, only 2 nodes are used? :)
>>>>>>
>>>>>> Could it be that it counts the same nodes again because they are
>>>>>> in different jobs? (v2.4.2 is used on this system)
>>>>>>
>>>>> Yes
>>>> Well, is there a way to make it count each node once? :)
>>>>
>>>> Thanks,
>>>> Evren
>>
>>
