[ 
https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-4597:
------------------------------
    Attachment: YARN-4597.003.patch

Updating the patch based on the thoughtful reviews from [~vvasudev], [~kasha] and 
[~jianhe].

bq. The methods for killing containers as needed all seem to be hardcoded to 
only consider allocated resources. Can we abstract it out further to allow for 
passing either allocation or utilization based on whether oversubscription is 
enabled?
In the latest patch, I've introduced a {{ResourceUtilizationManager}}. The 
default implementation accumulates the allocated container resources. This can 
be made pluggable once we have resource utilization.
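To illustrate the idea, here is a minimal sketch of what an allocation-based tracker behind a pluggable interface could look like. The interface and class names below, and the use of plain integer slots instead of YARN resource objects, are assumptions made for illustration; this is not the code in the patch.

```java
// Hypothetical sketch: names and the integer "slots" unit are illustrative only.
interface UtilizationTrackerSketch {
  void containerStarted(int slots);
  void containerFinished(int slots);
  int currentUtilization();
  boolean hasRoomFor(int slots);
}

// Default flavor: "utilization" is simply the sum of allocated container resources.
// Not thread-safe on purpose; callers are assumed to serialize access.
final class AllocationBasedTrackerSketch implements UtilizationTrackerSketch {
  private final int capacity;
  private int allocated;

  AllocationBasedTrackerSketch(int capacity) {
    this.capacity = capacity;
  }

  @Override
  public void containerStarted(int slots) {
    allocated += slots;
  }

  @Override
  public void containerFinished(int slots) {
    // Clamp at zero so a duplicate "finished" event cannot drive the total negative.
    allocated = Math.max(0, allocated - slots);
  }

  @Override
  public int currentUtilization() {
    return allocated;
  }

  @Override
  public boolean hasRoomFor(int slots) {
    return allocated + slots <= capacity;
  }
}
```

A utilization-based variant would implement the same interface but report measured usage instead of the allocated total.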

bq. Can you explain why we need the synchronized block here?
{code}
+    synchronized (this.containersAllocation)
{code}
Ah, it's not required. I've removed it, since access to this class is actually 
serialized.

bq. resourcesToFreeUp is initialized to container allocation on the node. 
Shouldn't it be initialized to zero? Maybe I am missing something.
The algorithm is as follows. Assume an NM has 10 slots that are currently 
full (7 guaranteed and 3 opportunistic), and a guaranteed container request 
comes in:
# Start with the currently utilized/allocated slots: 10
# Add to the value from step 1 any guaranteed containers yet to start: 10 + 1 (say 
1 guaranteed request had come in before this) = 11
# Subtract from the value in step 2 the total resources available to all containers: 11 - 10 = 1
# From the value in step 3, keep subtracting the resources of running opportunistic containers 
(those not already marked to kill, in reverse startup order) until the 
value drops to 0 or below.
# Kill all opportunistic containers identified in step 4.
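
The steps above can be sketched roughly as follows. This is a simplified illustration using integer slots, not the code in the patch; the class and method names are made up for the example, and the loop stops once the amount still to free is zero or less, which is my reading of step 4.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names and the integer "slots" unit are illustrative only.
final class OpportunisticKillSketch {

  /**
   * @param allocatedSlots        slots currently allocated on the node (step 1)
   * @param queuedGuaranteedSlots slots needed by guaranteed containers yet to start (step 2)
   * @param nodeCapacitySlots     total slots available to all containers (step 3)
   * @param runningOpportunistic  slot sizes of running opportunistic containers that are
   *                              not already marked to kill, in startup order (oldest first)
   * @return indices into runningOpportunistic of the containers to kill, newest first
   */
  static List<Integer> pickContainersToKill(int allocatedSlots,
                                            int queuedGuaranteedSlots,
                                            int nodeCapacitySlots,
                                            List<Integer> runningOpportunistic) {
    // Steps 1-3: how much has to be freed for the queued guaranteed containers to fit.
    int toFree = allocatedSlots + queuedGuaranteedSlots - nodeCapacitySlots;

    // Step 4: walk opportunistic containers in reverse startup order until enough is freed.
    List<Integer> victims = new ArrayList<>();
    for (int i = runningOpportunistic.size() - 1; i >= 0 && toFree > 0; i--) {
      toFree -= runningOpportunistic.get(i);
      victims.add(i);
    }

    // Step 5: the caller kills every container whose index is returned here.
    return victims;
  }
}
```

With the numbers from the example (10 allocated, 1 guaranteed container waiting, capacity 10, three opportunistic containers of 1 slot each), only the most recently started opportunistic container is selected.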

bq. I see there are two code paths leading to starting a container for when 
enough resources are available or not. Did you consider a single path where we 
queue containers directly and let another thread launch them?
Good point. I did think about that, but it would mean moving the launch logic 
into a dedicated thread that runs the containers. I'd rather keep it as it is 
for now, until we really need that complexity.

The tests (except {{TestContainer}}) all seem to run fine locally.



> Add SCHEDULE to NM container lifecycle
> --------------------------------------
>
>                 Key: YARN-4597
>                 URL: https://issues.apache.org/jira/browse/YARN-4597
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Chris Douglas
>            Assignee: Arun Suresh
>         Attachments: YARN-4597.001.patch, YARN-4597.002.patch, 
> YARN-4597.003.patch
>
>
> Currently, the NM immediately launches containers after resource 
> localization. Several features could be more cleanly implemented if the NM 
> included a separate stage for reserving resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
