[https://issues.apache.org/jira/browse/YARN-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16202335#comment-16202335]
Haibo Chen commented on YARN-4511:
----------------------------------
Thanks for the background on YARN-5139, [~leftnoteasy].
My understanding of SchedulerNode, from the scheduler's perspective, is that
it keeps track of the set of allocated containers on a given node and how
many of the node's resources are in use or left for allocation. The
SchedulerNode is notified whenever a container is allocated, launched or
released on that node so that it can update its bookkeeping. The major
change to SchedulerNode in this patch is to account for Opportunistic
containers differently from Guaranteed containers. Specifically, we don't
include the resources of Opportunistic containers in
SchedulerNode.allocatedResource. A quick look at Capacity Scheduler shows me
that SchedulerNode is notified of a container allocation only when the
allocation proposal is accepted, so I believe this patch won't change how
YARN-5139 behaves.
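To make the accounting change concrete, here is a minimal sketch of how I
read it; the field name allocatedResourceOpportunistic and the overall shape
are illustrative, not the exact patch code:
{code:java}
import org.apache.hadoop.yarn.api.records.ExecutionType;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer;
import org.apache.hadoop.yarn.util.resource.Resources;

public class SchedulerNodeAccountingSketch {
  // Guaranteed containers only; Opportunistic allocations no longer count here.
  private Resource allocatedResource = Resource.newInstance(0, 0);
  // Hypothetical field: Opportunistic containers are tracked separately.
  private Resource allocatedResourceOpportunistic = Resource.newInstance(0, 0);

  public synchronized void allocateContainer(RMContainer container) {
    Resource requested = container.getAllocatedResource();
    if (container.getExecutionType() == ExecutionType.GUARANTEED) {
      // Reduces what the scheduler sees as available for Guaranteed containers.
      Resources.addTo(allocatedResource, requested);
    } else {
      // Does not eat into allocatedResource, so the node can be oversubscribed.
      Resources.addTo(allocatedResourceOpportunistic, requested);
    }
  }
}
{code}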
{code:java}allocationInThisHeartbeat{code}, however, does need to be changed
given that scheduling is no longer driven by node heartbeats in YARN-5139.
The purpose of this variable is to track how much resource the containers
that have been allocated but not yet launched are going to use (based on
their resource requests, since in the worst case they can use everything
they requested once they run on the node). To illustrate the workflow of
this patch and what allocationInThisHeartbeat is for, let's say that on a
node with 10GB of memory there are already 10 containers running (requesting
10GB of memory in aggregate), the resource utilization reported in the node
heartbeat is 5GB of memory, and there are 2 containers that have just been
allocated but not yet launched, together requesting 2GB of memory. In the
case of oversubscription, the scheduler will try to allocate Opportunistic
containers based on node resource utilization. 5GB is what the running
containers are using and 2GB is probably soon to be utilized, so the
scheduler should assume the resource utilization is 7GB, conclude that only
3GB is left, and then decide whether to continue to allocate OPPORTUNISTIC
containers given the node's overallocation threshold. The 3GB is calculated
by allowedResourceForOverAllocation() together with allocationInThisHeartbeat.
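The arithmetic of that example, as a self-contained illustration; the 0.8
threshold and all names here are hypothetical, only
allowedResourceForOverAllocation() and allocationInThisHeartbeat come from
the patch:
{code:java}
public class OverAllocationMath {
  public static void main(String[] args) {
    long nodeCapacityMB = 10 * 1024;   // node with 10GB of memory
    long utilizedMB = 5 * 1024;        // utilization reported in the node heartbeat
    long pendingLaunchMB = 2 * 1024;   // allocationInThisHeartbeat: allocated, not yet launched

    // Worst case: the two pending containers use everything they requested.
    long projectedUtilizationMB = utilizedMB + pendingLaunchMB;          // 7GB
    long leftMB = nodeCapacityMB - projectedUtilizationMB;               // 3GB

    // The overallocation threshold then caps what can actually be handed
    // out as OPPORTUNISTIC; 0.8 is just an example value.
    double threshold = 0.8;
    long allowedMB = Math.max(0,
        (long) (nodeCapacityMB * threshold) - projectedUtilizationMB);   // 1GB here

    System.out.println(leftMB + " MB left, " + allowedMB + " MB allowed");
  }
}
{code}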
I am thinking of decoupling allocationInThisHeartbeat from node heartbeats
by renaming it to resourcesOfContainersPendingLaunch and updating it in the
containerStarted() method instead of resetting it on every node heartbeat.
Let me know what you think.
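Sketched out, the decoupling would look something like this
(resourcesOfContainersPendingLaunch is the proposed name; the
containerAllocated() hook is illustrative):
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class PendingLaunchTrackingSketch {
  // Proposed rename of allocationInThisHeartbeat.
  private Resource resourcesOfContainersPendingLaunch = Resource.newInstance(0, 0);

  // Add the container's request when it is allocated on this node.
  public synchronized void containerAllocated(Resource requested) {
    Resources.addTo(resourcesOfContainersPendingLaunch, requested);
  }

  // Subtract it once the NM reports the container has launched,
  // instead of resetting the whole thing on every node heartbeat.
  public synchronized void containerStarted(Resource requested) {
    Resources.subtractFrom(resourcesOfContainersPendingLaunch, requested);
  }
}
{code}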
bq. I'm not sure why we need a separate launchedOnNode flag because we already
have a launchedContainer map.
This is indeed confusing. The launchedContainer map should probably be
renamed to allocatedContainer, and launchedOnNode tracks whether an
allocated container has actually been launched on the node. This piece of
code already exists; I can do the renaming if you are fine with it.
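For reference, the shape of the existing bookkeeping after the proposed
rename would be roughly this (class names here are illustrative):
{code:java}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer;

public class AllocatedContainerBookkeeping {
  // Mirrors the existing inner class pairing a container with its launch state.
  static class ContainerInfo {
    final RMContainer container;
    boolean launchedOnNode;  // flips to true once the NM actually launches it
    ContainerInfo(RMContainer container) {
      this.container = container;
    }
  }

  // Proposed rename: entries are added at allocation time, before launch,
  // so allocatedContainer describes the map better than launchedContainer.
  private final Map<ContainerId, ContainerInfo> allocatedContainer = new HashMap<>();
}
{code}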
bq. otherwise it gonna be very hard to modify defined protos in a future
release.
For much the same reason you bring up here, I am more inclined to keep
OverAllocationInfo for now. If we only have ResourceThresholds, I am not
sure how we can keep backward compatibility in a clean way if we ever want
to include more NM overallocation configs. I agree we should do the
consolidation with resource profiles before the release; we can revisit
this topic then.
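To spell out the compatibility concern: with the wrapper kept, future NM
overallocation knobs become new fields on OverAllocationInfo without
touching ResourceThresholds or any message that embeds it. A rough sketch
(method names approximate, the commented-out field purely hypothetical):
{code:java}
public abstract class OverAllocationInfoSketch {
  // The thresholds we have today.
  public abstract ResourceThresholdsSketch getOverAllocationThresholds();

  // Future knobs slot in here without changing ResourceThresholds, e.g.:
  // public abstract int getMaxOversubscribedContainers();  // hypothetical

  /** Stand-in for the real ResourceThresholds record. */
  public abstract static class ResourceThresholdsSketch {
    public abstract float getMemoryThreshold();
    public abstract float getCpuThreshold();
  }
}
{code}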
> Common scheduler changes supporting scheduler-specific implementations
> ----------------------------------------------------------------------
>
> Key: YARN-4511
> URL: https://issues.apache.org/jira/browse/YARN-4511
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Wangda Tan
> Assignee: Haibo Chen
> Attachments: YARN-4511-YARN-1011.00.patch,
> YARN-4511-YARN-1011.01.patch, YARN-4511-YARN-1011.02.patch,
> YARN-4511-YARN-1011.03.patch
>
>