[ 
https://issues.apache.org/jira/browse/YARN-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15379671#comment-15379671
 ] 

Sunil G commented on YARN-5342:
-------------------------------

Thanks [~Naganarasimha Garla] for the insightful thoughts.

Looking at just one aspect, *“improve the allocation for non-exclusive 
label when requests are from an application of no_label”*, we can try to help 
each such app proceed with its allocation on a non-exclusive label without 
waiting for all node heartbeats.
For that, I think we only need to look into that one partition (the partition 
of the node whose heartbeat is currently being processed for the app) and see 
whether we can use some of its resources for this no_label app. Yes, I agree 
with your top-level view, and it's good to have an idea about the other 
non-exclusive partitions as well. But since the current heartbeat already 
gives us a node with some free space, if we can place a no_label container 
there within limits, I think we are solving the problem step by step.
I also very much agree with the comment about the chance of preemption 
kicking in. I think we need a fair balance between the speed of allocations 
for no_label apps on a label and larger imbalances over the queue’s capacity 
that could cause preemption to kick in.

So the checks I mentioned can be made w.r.t. an app or its queue, so that we 
try to solve the problem app by app. A much better, higher-level solution may 
need a lot of refactoring, I guess, so I suggested a simpler approach here. 
Thoughts?
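
As a rough illustration of the per-heartbeat check described above, here is a 
minimal sketch. It is not taken from the patch or from CapacityScheduler: the 
class, method, and parameter names are hypothetical, resources are reduced to 
plain megabyte counts, and the "limit" is assumed to be a single per-queue cap 
on the partition.

```java
// Hypothetical sketch only; none of these names exist in Hadoop YARN.
final class NonExclusiveHeartbeatCheck {

    private NonExclusiveHeartbeatCheck() {
    }

    /**
     * Decides whether a no_label request may be placed on the node whose
     * heartbeat is being processed, looking only at that node's partition.
     *
     * @param partitionIsNonExclusive true if the node's partition is non-exclusive
     * @param nodeFreeMb              free memory on the heartbeating node
     * @param requestMb               size of the pending no_label request
     * @param queueUsedOnPartitionMb  what the app's queue already uses on this partition
     * @param queueLimitOnPartitionMb assumed cap for the queue on this partition
     */
    static boolean canPlaceNoLabelContainer(boolean partitionIsNonExclusive,
                                            long nodeFreeMb,
                                            long requestMb,
                                            long queueUsedOnPartitionMb,
                                            long queueLimitOnPartitionMb) {
        // An exclusive partition never serves no_label requests.
        if (!partitionIsNonExclusive) {
            return false;
        }
        // The node must have enough free space in this heartbeat.
        if (requestMb > nodeFreeMb) {
            return false;
        }
        // Stay within the limit so a large imbalance does not invite preemption.
        return queueUsedOnPartitionMb + requestMb <= queueLimitOnPartitionMb;
    }
}
```

The check uses only information available while processing one node 
heartbeat, which is the point of the simpler approach: no view of the other 
non-exclusive partitions is needed.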

> Improve non-exclusive node partition resource allocation in Capacity Scheduler
> ------------------------------------------------------------------------------
>
>                 Key: YARN-5342
>                 URL: https://issues.apache.org/jira/browse/YARN-5342
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Wangda Tan
>            Assignee: Sunil G
>         Attachments: YARN-5342.1.patch
>
>
> In the previous implementation, one non-exclusive container allocation is 
> possible when the missed-opportunity >= #cluster-nodes, and 
> missed-opportunity is reset when a container is allocated to any node.
> This slows down the frequency of container allocation on a non-exclusive 
> node partition: *When a non-exclusive partition=x has idle resource, we can 
> only allocate one container for this app in every 
> X=nodemanagers.heartbeat-interval secs for the whole cluster.*
> In this JIRA, I propose a fix to reset missed-opportunity only if we have >0 
> pending resource for the non-exclusive partition OR we get allocation from 
> the default partition.
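
The counter behaviour in the description above can be modelled roughly as 
follows. This is a simplified sketch under stated assumptions, not the code in 
YARN-5342.1.patch: the class and method names are hypothetical, the pending 
resource is a plain number, and the reset condition is a literal reading of 
the description.

```java
// Hypothetical model of the missed-opportunity counter; not CapacityScheduler code.
final class MissedOpportunitySketch {

    private final int clusterNodes;
    private int missedOpportunities = 0;

    MissedOpportunitySketch(int clusterNodes) {
        this.clusterNodes = clusterNodes;
    }

    /** A node heartbeat passed without an allocation for this app. */
    void recordMiss() {
        missedOpportunities++;
    }

    /** A non-exclusive allocation is allowed once misses reach the node count. */
    boolean canAllocateNonExclusive() {
        return missedOpportunities >= clusterNodes;
    }

    /** Previous behaviour: any container allocation resets the counter. */
    void onAllocatedPrevious() {
        missedOpportunities = 0;
    }

    /**
     * Proposed behaviour: reset only if there is pending resource for the
     * non-exclusive partition, or the allocation came from the default partition.
     */
    void onAllocatedProposed(long pendingForNonExclusivePartition,
                             boolean allocatedFromDefaultPartition) {
        if (pendingForNonExclusivePartition > 0 || allocatedFromDefaultPartition) {
            missedOpportunities = 0;
        }
    }

    int getMissedOpportunities() {
        return missedOpportunities;
    }
}
```

With the previous rule, every allocation on partition x forces the app to wait 
through a full round of heartbeats again. With the proposed rule, a no_label 
allocation on x (no pending resource for x, not from the default partition) 
leaves the counter untouched, so the next heartbeat from x can allocate again.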


