[ 
https://issues.apache.org/jira/browse/YARN-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16091823#comment-16091823
 ] 

Arun Suresh commented on YARN-6831:
-----------------------------------

I was thinking about removing *maxOppQueueLength*, which led me to think about
the following.
In YARN-5972, we are trying to get the NM to pause an opportunistic container
instead of killing it. Both the cgroup freezer and Windows job objects implement
freezing in the following way:
When a process is frozen, its CPU share is reduced to 0, and its working set
remains in memory as long as there is no external memory pressure. If the OS
can't keep the frozen process in memory, its memory is swapped out to disk and
restored when the process is thawed. This implies that the number of paused
containers (more precisely, their combined memory footprint) is limited by the
total swap space on the NM. This should be another local NM config, maybe
something like *maxConsumedOpportunisticResources*, which places an additional
limit on the number of running opportunistic containers.
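A minimal sketch of how such a swap-based limit could gate pausing, under stated assumptions: the class PausedContainerBudget and its methods are hypothetical (not NM code), only memory is tracked (in MB), and the budget assumes the worst case in which every paused container's working set is swapped out. A real *maxConsumedOpportunisticResources* setting might cover more than memory.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not NodeManager code: bounds the combined memory of
// paused (frozen) opportunistic containers by a configured budget, assuming
// the worst case in which every paused working set ends up in swap.
class PausedContainerBudget {
  private final long maxPausedMemoryMb; // e.g. the swap space available on the NM
  private final Map<String, Long> pausedMemoryMb = new HashMap<>();
  private long totalPausedMb = 0;

  PausedContainerBudget(long maxPausedMemoryMb) {
    this.maxPausedMemoryMb = maxPausedMemoryMb;
  }

  // True if pausing a container of this size keeps the total within budget.
  boolean canPause(long memoryMb) {
    return totalPausedMb + memoryMb <= maxPausedMemoryMb;
  }

  // Records the pause and returns true; returns false (recording nothing) if
  // the container is already paused or would exceed the budget. A caller
  // might then fall back to killing the container instead (assumed policy).
  boolean pause(String containerId, long memoryMb) {
    if (pausedMemoryMb.containsKey(containerId) || !canPause(memoryMb)) {
      return false;
    }
    pausedMemoryMb.put(containerId, memoryMb);
    totalPausedMb += memoryMb;
    return true;
  }

  // Returns the budget held by a container once it is resumed or killed.
  void release(String containerId) {
    Long memoryMb = pausedMemoryMb.remove(containerId);
    if (memoryMb != null) {
      totalPausedMb -= memoryMb;
    }
  }

  long getTotalPausedMb() {
    return totalPausedMb;
  }
}
```

With a 4096 MB budget, two paused 2048 MB containers exhaust it; a third pause is refused until one of them is released.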

> Miscellaneous refactoring changes of ContainScheduler 
> ------------------------------------------------------
>
>                 Key: YARN-6831
>                 URL: https://issues.apache.org/jira/browse/YARN-6831
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>
> While reviewing YARN-6706, Karthik pointed out a few areas for improvement in
> ContainerScheduler:
> * Make ResourceUtilizationTracker pluggable. That way, we could use a
> different tracker when oversubscription is enabled.
> * ContainerScheduler
>   ## Why do we need maxOppQueueLength given queuingLimit?
>   ## Is there value in splitting runningContainers into runningGuaranteed and
> runningOpportunistic?
>   ## The getOpportunisticContainersStatus method implementation feels awkward.
> How about capturing the state in a field here, and having metrics etc. pull
> from here?
>   ## startContainersFromQueue: Local variable resourcesAvailable is unnecessary.
> * OpportunisticContainersStatus
>   ## Let us clearly differentiate between allocated, used and utilized. Maybe
> we should rename the current Used methods to Allocated?
>   ## I prefer either the full name Opportunistic (in method names) or Opp (the
> shortest name that makes sense). Opport is neither short nor fully descriptive.
>   ## Have we considered folding the ContainerQueuingLimit class into this?
> We decided to move these issues into this follow-up JIRA to keep YARN-6706
> moving forward and unblock the oversubscription work.

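The first suggestion in the description, making ResourceUtilizationTracker pluggable, could look roughly like the sketch below. Everything here is hypothetical: UtilizationTracker is a simplified stand-in for the NM interface, the tracker names are made up, and the registry stands in for whatever config-driven lookup the NM would use. Purely for illustration, the "oversubscription" tracker admits containers up to a fixed over-allocation ratio; a real oversubscription tracker would more likely look at measured utilization.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical stand-in for a pluggable utilization tracker.
interface UtilizationTracker {
  boolean hasResourcesAvailable(long memoryMb, int vcores);
  void addContainer(long memoryMb, int vcores);
  void subtractContainer(long memoryMb, int vcores);
}

// Admits a container only if allocated + requested stays within node capacity.
class AllocationTracker implements UtilizationTracker {
  protected final long capacityMb;
  protected final int capacityVcores;
  private long allocatedMb = 0;
  private int allocatedVcores = 0;

  AllocationTracker(long capacityMb, int capacityVcores) {
    this.capacityMb = capacityMb;
    this.capacityVcores = capacityVcores;
  }

  protected long limitMb() { return capacityMb; }
  protected int limitVcores() { return capacityVcores; }

  public boolean hasResourcesAvailable(long memoryMb, int vcores) {
    return allocatedMb + memoryMb <= limitMb()
        && allocatedVcores + vcores <= limitVcores();
  }

  public void addContainer(long memoryMb, int vcores) {
    allocatedMb += memoryMb;
    allocatedVcores += vcores;
  }

  public void subtractContainer(long memoryMb, int vcores) {
    allocatedMb -= memoryMb;
    allocatedVcores -= vcores;
  }
}

// Illustrative only: raises the admission limit by a fixed ratio.
class OversubscribingTracker extends AllocationTracker {
  private final double ratio;

  OversubscribingTracker(long capacityMb, int capacityVcores, double ratio) {
    super(capacityMb, capacityVcores);
    this.ratio = ratio;
  }

  @Override protected long limitMb() { return (long) (capacityMb * ratio); }
  @Override protected int limitVcores() { return (int) (capacityVcores * ratio); }
}

// Picks an implementation by name, e.g. from a (hypothetical) NM config key.
final class TrackerFactory {
  interface TrackerConstructor {
    UtilizationTracker create(long capacityMb, int capacityVcores);
  }

  private static final Map<String, TrackerConstructor> REGISTRY = new HashMap<>();
  static {
    REGISTRY.put("allocation", AllocationTracker::new);
    REGISTRY.put("oversubscription", (mb, vc) -> new OversubscribingTracker(mb, vc, 1.5));
  }

  static UtilizationTracker create(String name, long capacityMb, int capacityVcores) {
    TrackerConstructor ctor = REGISTRY.get(name);
    if (ctor == null) {
      throw new IllegalArgumentException("Unknown tracker: " + name);
    }
    return ctor.create(capacityMb, capacityVcores);
  }
}
```

The scheduler would then only depend on the UtilizationTracker interface, so enabling oversubscription becomes a matter of choosing a different implementation.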

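For the question about splitting runningContainers into runningGuaranteed and runningOpportunistic, a minimal sketch of what the split could look like. ExecutionType below is a local stand-in for YARN's enum, and RunningContainer and RunningContainers are hypothetical simplifications, not the actual ContainerScheduler fields.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Local stand-in for YARN's ExecutionType enum.
enum ExecutionType { GUARANTEED, OPPORTUNISTIC }

// Hypothetical, minimal view of a running container.
final class RunningContainer {
  final String id;
  final ExecutionType type;

  RunningContainer(String id, ExecutionType type) {
    this.id = id;
    this.type = type;
  }
}

// Keeps the two execution types in separate insertion-ordered maps, so code
// that only cares about one type (e.g. choosing opportunistic containers to
// stop) does not have to filter the whole running set.
class RunningContainers {
  private final Map<String, RunningContainer> runningGuaranteed = new LinkedHashMap<>();
  private final Map<String, RunningContainer> runningOpportunistic = new LinkedHashMap<>();

  private Map<String, RunningContainer> mapFor(ExecutionType type) {
    return type == ExecutionType.GUARANTEED ? runningGuaranteed : runningOpportunistic;
  }

  void add(RunningContainer c) {
    mapFor(c.type).put(c.id, c);
  }

  void remove(RunningContainer c) {
    mapFor(c.type).remove(c.id);
  }

  int numGuaranteed() { return runningGuaranteed.size(); }

  int numOpportunistic() { return runningOpportunistic.size(); }

  // Read-only view of opportunistic containers in the order they were added.
  Collection<RunningContainer> opportunisticInStartOrder() {
    return Collections.unmodifiableCollection(runningOpportunistic.values());
  }
}
```

The trade-off is that every add and remove must route by execution type, in exchange for cheaper per-type counts and iteration.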

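The suggestion to capture the opportunistic-container state in a field and have metrics pull from it could be sketched as follows, assuming an immutable snapshot that the scheduler replaces on each transition. OppContainersStatus, OppStatusHolder and their fields are hypothetical; the "Opp" and "Allocated" naming simply follows the conventions floated in the description.

```java
// Hypothetical immutable snapshot of opportunistic-container state.
final class OppContainersStatus {
  final int queuedOppContainers;
  final int runningOppContainers;
  final long oppMemoryAllocatedMb;

  OppContainersStatus(int queued, int running, long allocatedMb) {
    this.queuedOppContainers = queued;
    this.runningOppContainers = running;
    this.oppMemoryAllocatedMb = allocatedMb;
  }
}

// The scheduler updates a single status field on every transition; readers
// such as metrics take the latest snapshot instead of recomputing it.
class OppStatusHolder {
  private volatile OppContainersStatus status = new OppContainersStatus(0, 0, 0);

  // An opportunistic container was added to the queue.
  synchronized void onOppQueued() {
    OppContainersStatus s = status;
    status = new OppContainersStatus(
        s.queuedOppContainers + 1, s.runningOppContainers, s.oppMemoryAllocatedMb);
  }

  // A queued container was started with the given memory allocation.
  synchronized void onOppStarted(long memoryMb) {
    OppContainersStatus s = status;
    status = new OppContainersStatus(
        s.queuedOppContainers - 1, s.runningOppContainers + 1,
        s.oppMemoryAllocatedMb + memoryMb);
  }

  // A running container finished and released its allocation.
  synchronized void onOppFinished(long memoryMb) {
    OppContainersStatus s = status;
    status = new OppContainersStatus(
        s.queuedOppContainers, s.runningOppContainers - 1,
        s.oppMemoryAllocatedMb - memoryMb);
  }

  // Lock-free read of the latest snapshot, e.g. for metrics.
  OppContainersStatus getStatus() {
    return status;
  }
}
```

Writers are serialized by the synchronized methods, while the volatile field lets readers see a consistent snapshot without taking the lock.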
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
