[ 
https://issues.apache.org/jira/browse/YARN-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193897#comment-16193897
 ] 

Haibo Chen commented on YARN-4511:
----------------------------------

Thanks for the reviews, [~leftnoteasy]!

The scope of the variable allocationInThisHeartbeat is supposed to be a round 
of allocation on a given node. In the case of fair scheduler, it is every 
nodeUpdate() method call. Can you point out where else do we also do allocation 
that is decoupled from heartbeat? Here is what this variable is used for in the 
case of fair scheduler: every time there is a node update, we'd do allocation 
of guaranteed containers first followed by opportunistic containers. W need to 
consider the just-allocated-yet-to-launch guaranteed containers to project how 
much resource we have left to allocate opportunistic containers. Any guidance 
on how to handle allocation decouple from heartbeat is appreciated.  

bq. In what case the oldNode and newNode will be null and should we throw 
exception when this happens?
This is more to avoid NPE defensively. I guess you are right, we should 
probably throw exception since it indicates errors and will lead to problems if 
we update one but not the other.

bq. This part of logic looks confusing, since old containers will be finished 
inside pullNewlyUpdatedContainers, do we really need this method? 
This arises from the fact that RMContainer events trigger update of 
schedulerNode, whereas the current implementation of container 
incr/decr/promotion/demotion only swaps the underly Container instance for 
RMContainers, which does not trigger schedulerNode updates, so the resulting 
accounting on that node is incorrect. [~asuresh] any suggestion to simply this?

bq. Why containerLaunched is added? Should we just increase allocated 
opportunistic/guaranteed resource?
I only try to preserve the containerLaunched flag. Can you be more specific 
about what you're referring to in the patch?

bq. I think one of the ResourceThresholds and OverAllocationInfo should be 
removed, they're kind of duplicated. We should try to reduce unnecessary 
#PB-records.
Good point. The initial intension was to include more than resourceThresholds 
in OverAllocationInfo, such as per node max allocation in terms of a 
percentage. We may be able to get rid of OverAllocationInfo down the road if 
there is no such need any more.

There is a jira open to consolidate with Resource Profiles (YARN-6690). Is that 
a good place to do the work to accommodate other resources?


> Common scheduler changes supporting scheduler-specific implementations
> ----------------------------------------------------------------------
>
>                 Key: YARN-4511
>                 URL: https://issues.apache.org/jira/browse/YARN-4511
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Wangda Tan
>            Assignee: Haibo Chen
>         Attachments: YARN-4511-YARN-1011.00.patch, 
> YARN-4511-YARN-1011.01.patch, YARN-4511-YARN-1011.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to