[ 
https://issues.apache.org/jira/browse/YARN-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16202335#comment-16202335
 ] 

Haibo Chen commented on YARN-4511:
----------------------------------

Thanks for the background on YARN-5139, [~leftnoteasy]. 

My understanding of SchedulerNode, from the scheduler's perspective, is that
it keeps track of the set of containers allocated on a given node and how much
of the node's resources are in use or left for allocation. The SchedulerNode
is notified whenever a container is allocated, launched or released on that
node so that it can update its bookkeeping. The major change to SchedulerNode
in this patch is to account for Opportunistic containers differently from
Guaranteed containers. Specifically, we don't include the resources of
Opportunistic containers in SchedulerNode.allocatedResource. A quick look at
the Capacity Scheduler shows me that SchedulerNode is notified of a container
allocation only when the allocation proposal is accepted, so I believe this
patch won't change how YARN-5139 behaves.
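
To make the accounting difference concrete, here is a minimal sketch. It is
not the code in the patch: the class name, the MB-based fields and the
separate opportunistic tally are all hypothetical, and only the rule that
Opportunistic containers are kept out of allocatedResource comes from the
description above.

```java
// Hypothetical stand-in for the bookkeeping described above, not the real SchedulerNode.
final class NodeAccountingSketch {
    enum ExecutionType { GUARANTEED, OPPORTUNISTIC }

    // Mirrors the role of SchedulerNode.allocatedResource: Guaranteed containers only.
    private long allocatedMb;
    // Assumption: Opportunistic containers are tallied separately.
    private long opportunisticMb;

    void containerAllocated(ExecutionType type, long requestedMb) {
        if (type == ExecutionType.GUARANTEED) {
            allocatedMb += requestedMb;
        } else {
            opportunisticMb += requestedMb;
        }
    }

    void containerReleased(ExecutionType type, long requestedMb) {
        if (type == ExecutionType.GUARANTEED) {
            allocatedMb -= requestedMb;
        } else {
            opportunisticMb -= requestedMb;
        }
    }

    long getAllocatedMb() {
        return allocatedMb;
    }

    long getOpportunisticMb() {
        return opportunisticMb;
    }
}
```

With this shape, allocating an Opportunistic container leaves getAllocatedMb()
unchanged, which is the property the paragraph above relies on.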

{code:java}allocationInThisHeartbeat{code}, however, does need to be changed,
given that scheduling is not driven by node heartbeats in YARN-5139.
The purpose of this variable is to track how many resources the allocated
containers that have not yet launched are going to use (based on their
resource requests, since in the worst case they can use everything they
requested once they run on the node). To illustrate the workflow of this patch
and what allocationInThisHeartbeat is for, let's say that on a node with 10GB
of memory there are already 10 containers running (which requested 10GB of
memory in aggregate), and the resource utilization reported in the node
heartbeat is 5GB of memory. There are also 2 containers that have just been
allocated but not yet launched, and together they request 2GB of memory. In
the case of oversubscription, the scheduler will try to allocate Opportunistic
containers based on node resource utilization. 5GB is what the running
containers are using and 2GB is probably soon to be utilized, so the scheduler
assumes that the resource utilization is 7GB and that only 3GB is left, and
then decides whether to continue allocating OPPORTUNISTIC containers given the
node's overallocation threshold. The 3GB is calculated from
allowedResourceForOverAllocation() and allocationInThisHeartbeat.
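
The arithmetic in the example above can be sketched as follows. This is only
an illustration of the numbers, not the body of
allowedResourceForOverAllocation(): the class and method names are
hypothetical, memory is tracked in MB for simplicity, and the overallocation
threshold check is left out.

```java
// Hypothetical helper that reproduces the 10GB / 5GB / 2GB example.
final class OverAllocationSketch {

    // Utilization the scheduler assumes: what running containers use now,
    // plus the full requests of containers allocated but not yet launched.
    static long projectedUtilizationMb(long reportedUtilizationMb, long pendingLaunchMb) {
        return reportedUtilizationMb + pendingLaunchMb;
    }

    // Node memory still considered free after the projection, never negative.
    static long remainingMb(long nodeCapacityMb, long reportedUtilizationMb, long pendingLaunchMb) {
        long projected = projectedUtilizationMb(reportedUtilizationMb, pendingLaunchMb);
        return Math.max(0L, nodeCapacityMb - projected);
    }

    public static void main(String[] args) {
        long capacityMb = 10 * 1024; // node with 10GB of memory
        long reportedMb = 5 * 1024;  // utilization reported in the node heartbeat
        long pendingMb = 2 * 1024;   // two containers allocated but not yet launched

        System.out.println("projected utilization (MB): "
            + projectedUtilizationMb(reportedMb, pendingMb));
        System.out.println("left for allocation (MB): "
            + remainingMb(capacityMb, reportedMb, pendingMb));
    }
}
```

Running main prints 7168 and 3072, i.e. the 7GB and 3GB from the example.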

I am thinking of decoupling allocationInThisHeartbeat from the node heartbeat
by renaming it to resourcesOfContainersPendingLaunch and updating it in the
containerStarted() method instead of resetting it on every node heartbeat.
Let me know what you think.
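
A rough sketch of what I have in mind is below. It is not taken from the
patch: the class, the String container ids and the MB-based counter are
hypothetical simplifications, and containerAllocated() here only stands in for
wherever the node learns about a new allocation. Only the idea of shrinking
the counter in containerStarted(), rather than resetting it on each heartbeat,
comes from the proposal above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical tracker for containers that are allocated but not yet launched.
final class PendingLaunchTracker {
    // Requested memory of each container still waiting to launch.
    private final Map<String, Long> pendingMbByContainer = new HashMap<>();
    // Plays the role of the proposed resourcesOfContainersPendingLaunch.
    private long resourcesOfContainersPendingLaunchMb;

    // Called when a container is allocated on the node.
    void containerAllocated(String containerId, long requestedMb) {
        Long previous = pendingMbByContainer.put(containerId, requestedMb);
        if (previous != null) {
            // Same id allocated twice: replace the old request instead of double counting.
            resourcesOfContainersPendingLaunchMb -= previous;
        }
        resourcesOfContainersPendingLaunchMb += requestedMb;
    }

    // Called when the container actually launches; its usage now shows up in
    // the utilization reported by the node, so stop counting its request.
    void containerStarted(String containerId) {
        Long requestedMb = pendingMbByContainer.remove(containerId);
        if (requestedMb != null) {
            resourcesOfContainersPendingLaunchMb -= requestedMb;
        }
    }

    long getResourcesOfContainersPendingLaunchMb() {
        return resourcesOfContainersPendingLaunchMb;
    }
}
```

Because the counter changes only on allocation and launch events, it no longer
depends on when or whether a node heartbeat triggers scheduling. A container
that is released before it launches would presumably also need to be removed
from the tally; that case is not covered in this sketch.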

bq. I'm not sure why we need a separate launchedOnNode flag because we already 
have a launchedContainer map.
This is indeed confusing. launchedContainer should probably be renamed to
allocatedContainer, and launchedOnNode tracks whether the allocated container
has actually been launched on the node. This piece of code already exists. I
can do the renaming if you are fine with it.

bq.  otherwise it gonna be very hard to modify defined protos in a future 
release.
For very much the same reason you are thinking of here, I am more inclined to
keep OverAllocationInfo for now. If we just have ResourceThresholds, I am not
sure how we can keep backward compatibility in a clean way if we ever want to
include more NM overallocation configs. I agree we should consolidate this
with resource profiles before the release; I think we can revisit this topic
then.


> Common scheduler changes supporting scheduler-specific implementations
> ----------------------------------------------------------------------
>
>                 Key: YARN-4511
>                 URL: https://issues.apache.org/jira/browse/YARN-4511
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Wangda Tan
>            Assignee: Haibo Chen
>         Attachments: YARN-4511-YARN-1011.00.patch, 
> YARN-4511-YARN-1011.01.patch, YARN-4511-YARN-1011.02.patch, 
> YARN-4511-YARN-1011.03.patch
>
>




