[
https://issues.apache.org/jira/browse/YARN-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15379671#comment-15379671
]
Sunil G commented on YARN-5342:
-------------------------------
Thanks [~Naganarasimha Garla] for the insightful thoughts.
If we focus on one aspect, such as *“improve the allocation for non-exclusive
label when requests are from an application of no_label”*, we can help
each such app go ahead with its allocation on a non-exclusive label without
waiting for all node heartbeats.
For that, I think we can look only at that very partition (the partition of
the node whose heartbeat is being processed for an app) and see whether we
can use some of its resources for this no_label app. Yes, I agree with your
top-level view, and it’s good to have an idea about the other non-exclusive
partitions as well. Since we already have a node with some free space in the
current heartbeat, if we can place a no_label container there within limits,
I think we are solving the problem step by step.
I also very much agree with the comment about the chance of preemption
kicking in. I think we need a fair balance between the speed of allocations
for no_label apps on a label and larger imbalances over the queue’s capacity,
which may cause preemption to kick in.
So the checks I mentioned can be made w.r.t. an app or its queue, so that
we solve the problem app by app. A much better, higher-level solution may
require a lot of refactoring, I guess, so I suggested a simpler approach
here. Thoughts?
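To make the per-heartbeat check above concrete, here is a minimal Java sketch.
It is not code from the patch: the class name, method name, parameters, and
the use of plain megabyte counters instead of YARN's Resource type are all
assumptions made for illustration. It only shows the idea of placing a
no_label container on the heartbeating node's non-exclusive partition when the
node has room and an app-level or queue-level limit is not exceeded.

```java
// Hypothetical sketch; names and the MB-based resource model are assumptions.
final class NonExclusiveHeartbeatSketch {

    /**
     * Decides whether a no_label request may be placed on the node whose
     * heartbeat is being processed, looking only at that node's partition.
     *
     * @param nodeFreeMb         free memory on the heartbeating node
     * @param requestMb          size of the pending no_label container
     * @param usedOnPartitionMb  what the app (or its queue) already uses
     *                           on this non-exclusive partition
     * @param limitOnPartitionMb assumed cap for the app (or its queue)
     *                           on this non-exclusive partition
     */
    static boolean canPlaceNoLabelContainer(long nodeFreeMb,
                                            long requestMb,
                                            long usedOnPartitionMb,
                                            long limitOnPartitionMb) {
        if (requestMb <= 0) {
            return false; // nothing pending, nothing to place
        }
        boolean fitsOnNode = requestMb <= nodeFreeMb;
        // Staying under the limit is what keeps the imbalance small enough
        // that preemption is less likely to kick in later.
        boolean underLimit = usedOnPartitionMb + requestMb <= limitOnPartitionMb;
        return fitsOnNode && underLimit;
    }
}
```

How tight the limit is decides the trade-off discussed above: a higher limit
speeds up no_label allocations on the label, a lower one keeps queue usage
closer to capacity.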
> Improve non-exclusive node partition resource allocation in Capacity Scheduler
> ------------------------------------------------------------------------------
>
> Key: YARN-5342
> URL: https://issues.apache.org/jira/browse/YARN-5342
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Wangda Tan
> Assignee: Sunil G
> Attachments: YARN-5342.1.patch
>
>
> In the previous implementation, one non-exclusive container allocation is
> possible when missed-opportunity >= #cluster-nodes, and
> missed-opportunity is reset when a container is allocated to any node.
> This reduces the frequency of container allocation on a non-exclusive
> node partition: *When a non-exclusive partition=x has idle resources, we can
> only allocate one container for this app every
> X=nodemanagers.heartbeat-interval secs for the whole cluster.*
> In this JIRA, I propose a fix to reset missed-opportunity only if we have >0
> pending resource for the non-exclusive partition OR we get an allocation from
> the default partition.
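
The counter behavior described in the quoted issue text can be sketched as
follows. This is a simplified illustration, not the scheduler's code: the
class, field, and method names are hypothetical, and the proposed reset
condition is written exactly as the description words it.

```java
// Hypothetical sketch of the missed-opportunity counter; all names are assumptions.
final class MissedOpportunitySketch {
    private long missedOpportunities = 0;

    /** Called when a node heartbeat passes without an allocation for the app. */
    void recordMissedOpportunity() {
        missedOpportunities++;
    }

    /** A non-exclusive allocation is possible once the counter reaches the node count. */
    boolean nonExclusiveAllocationAllowed(int clusterNodes) {
        return missedOpportunities >= clusterNodes;
    }

    /** Previous behavior: any container allocation on any node resets the counter. */
    void onAllocationPrevious() {
        missedOpportunities = 0;
    }

    /**
     * Proposed behavior: reset only if there is pending resource (> 0) for the
     * non-exclusive partition OR the allocation came from the default partition.
     */
    void onAllocationProposed(boolean allocatedFromDefaultPartition,
                              long pendingForNonExclusivePartition) {
        if (pendingForNonExclusivePartition > 0 || allocatedFromDefaultPartition) {
            missedOpportunities = 0;
        }
    }

    long count() {
        return missedOpportunities;
    }
}
```

With the previous rule the counter has to climb back to #cluster-nodes after
every allocation; under the proposed rule an allocation that meets neither
condition leaves the counter where it was.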