[
https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Arun Suresh updated YARN-4597:
------------------------------
Attachment: YARN-4597.003.patch
Updating the patch based on the thoughtful reviews from [~vvasudev], [~kasha] and
[~jianhe].
bq. The methods for killing containers as needed all seem to be hardcoded to
only consider allocated resources. Can we abstract it out further to allow for
passing either allocation or utilization based on whether oversubscription is
enabled?
In the latest patch, I've introduced a {{ResourceUtilizationManager}}. The
default implementation accumulates the allocated container resources. This can
be made pluggable once we have resource utilization information.
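To make the idea concrete, here is a minimal sketch of what such an abstraction could look like. It is not the API from the patch: the interface {{UtilizationManager}}, its method names, the {{AllocationBasedManager}} class, and the use of a single integer "slot" count instead of real resource vectors are all hypothetical, chosen only for illustration.

```java
// Hypothetical sketch only: the names below are invented for illustration
// and are not the API introduced by the patch. Resources are simplified
// to an integer number of slots.
class UtilizationSketch {

  // The pluggable point: callers ask for "current utilization" without
  // knowing whether it is derived from allocation or from measurement.
  interface UtilizationManager {
    void containerStarted(int slots);
    void containerFinished(int slots);
    int currentUtilization();
  }

  // Default behaviour as described above: accumulate the resources
  // allocated to containers. No locking is used here, on the assumption
  // that all calls are made from a single serialized code path.
  static final class AllocationBasedManager implements UtilizationManager {
    private int allocated;

    @Override
    public void containerStarted(int slots) {
      allocated += slots;
    }

    @Override
    public void containerFinished(int slots) {
      allocated -= slots;
    }

    @Override
    public int currentUtilization() {
      return allocated;
    }
  }

  public static void main(String[] args) {
    UtilizationManager manager = new AllocationBasedManager();
    manager.containerStarted(4);
    manager.containerStarted(3);
    manager.containerFinished(4);
    System.out.println(manager.currentUtilization());
  }
}
```

A utilization-based implementation would implement the same interface but report measured usage, so the container-killing logic would not need to change.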
bq. Can you explain why we need the synchronized block here?
{code}
synchronized (this.containersAllocation)
{code}
Ah, it's not required. I removed it, since access to this class is actually
serialized.
bq. resourcesToFreeUp is initialized to container allocation on the node.
Shouldn't it be initialized to zero? Maybe I am missing something.
The algorithm is as follows. Assume an NM has 10 slots that are currently
full (7 guaranteed and 3 opportunistic), and that a guaranteed container
request comes in:
# Start with the currently utilized/allocated slots: 10
# Add to the value from step 1 the resources of any guaranteed containers that
are yet to start: 10 + 1 (say 1 guaranteed request had come in before this) = 11
# Subtract from the value from step 2 the total resources available to all
containers: 11 - 10 = 1
# From the value from step 3, keep subtracting the resources of running
opportunistic containers (in reverse startup order, skipping any that have
already been marked to kill) till the value goes < 0.
# Kill all opportunistic containers identified in step 4.
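The steps above can be sketched roughly as follows. This is an illustration, not the code from the patch: the class {{KillSelectionSketch}}, the {{Running}} type, the parameter names, and the use of integer slots in place of real resource vectors are all assumptions. The stopping rule follows the wording above literally (stop once the value goes below 0).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the selection steps described above. None of these
// names come from the patch, and resources are simplified to integer slots.
class KillSelectionSketch {

  // A running opportunistic container (hypothetical type).
  static final class Running {
    final String id;
    final int slots;
    final boolean markedToKill;

    Running(String id, int slots, boolean markedToKill) {
      this.id = id;
      this.slots = slots;
      this.markedToKill = markedToKill;
    }
  }

  // opportunisticInStartOrder lists running opportunistic containers,
  // oldest first. Returns the ids selected in step 4, which the caller
  // would then kill (step 5).
  static List<String> selectContainersToKill(int allocatedSlots,
      int queuedGuaranteedSlots, int totalSlots,
      List<Running> opportunisticInStartOrder) {
    int value = allocatedSlots;       // step 1
    value += queuedGuaranteedSlots;   // step 2
    value -= totalSlots;              // step 3

    List<String> toKill = new ArrayList<>();
    // Step 4: walk in reverse startup order (most recently started first)
    // and stop as soon as the value has gone below 0.
    for (int i = opportunisticInStartOrder.size() - 1;
        i >= 0 && value >= 0; i--) {
      Running c = opportunisticInStartOrder.get(i);
      if (c.markedToKill) {
        continue; // already marked to kill, so it is not counted again
      }
      value -= c.slots;
      toKill.add(c.id);
    }
    return toKill;
  }

  public static void main(String[] args) {
    List<Running> opportunistic = new ArrayList<>();
    opportunistic.add(new Running("opp-1", 1, false));
    opportunistic.add(new Running("opp-2", 1, false));
    opportunistic.add(new Running("opp-3", 1, false));
    // Numbers from the example: 10 allocated, 1 queued guaranteed, 10 total.
    System.out.println(selectContainersToKill(10, 1, 10, opportunistic));
  }
}
```

Under this literal reading, and assuming each opportunistic container occupies one slot, the example numbers select the two most recently started containers, because the value only drops below 0 after the second subtraction.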
bq. I see there are two code paths leading to starting a container, depending
on whether enough resources are available or not. Did you consider a single
path where we queue containers directly and let another thread launch them?
Good point. I did think about that, but it would mean moving the logic into
a dedicated thread that is used to run containers. I was thinking we keep it
as it is for now, until we reach a point where that complexity is really
warranted.
The tests (except {{TestContainer}}) all seem to run fine locally.
> Add SCHEDULE to NM container lifecycle
> --------------------------------------
>
> Key: YARN-4597
> URL: https://issues.apache.org/jira/browse/YARN-4597
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: Chris Douglas
> Assignee: Arun Suresh
> Attachments: YARN-4597.001.patch, YARN-4597.002.patch,
> YARN-4597.003.patch
>
>
> Currently, the NM immediately launches containers after resource
> localization. Several features could be more cleanly implemented if the NM
> included a separate stage for reserving resources.