[ https://issues.apache.org/jira/browse/YARN-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16091823#comment-16091823 ]
Arun Suresh commented on YARN-6831:
-----------------------------------

I was thinking about removing *maxOppQueueLength*, which led me to think about the following.

In YARN-5972, we are trying to get the NM to pause an opportunistic container instead of killing it. Both the cgroup freezer and Windows job objects implement freezing in the following way: when a process is frozen, its CPU share is reduced to 0 and its working set remains in memory as long as there is no external memory pressure. If the OS can't keep the frozen process in memory, its memory is swapped out to disk and restored when the process is thawed.

This implies that the number of paused containers is bounded by the total swap space on the NM: in the worst case, the combined memory of all paused containers has to fit in swap. This should be another local NM config, maybe something like *maxConsumedOpportunisticResources*, which places an additional limit on the number of running opportunistic containers.

> Miscellaneous refactoring changes of ContainScheduler
> ------------------------------------------------------
>
>                 Key: YARN-6831
>                 URL: https://issues.apache.org/jira/browse/YARN-6831
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>
> While reviewing YARN-6706, Karthik pointed out a few issues for improvement in
> ContainerScheduler
> *Make ResourceUtilizationTracker pluggable. That way, we could use a
> different tracker when oversubscription is enabled.
> *ContainerScheduler
> ##Why do we need maxOppQueueLength given queuingLimit?
> ##Is there value in splitting runningContainers into runningGuaranteed and
> runningOpportunistic?
> ##getOpportunisticContainersStatus method implementation feels awkward. How
> about capturing the state in the field here, and have metrics etc. pull from
> here?
> ##startContainersFromQueue: Local variable resourcesAvailable is unnecessary
> *OpportunisticContainersStatus
> ##Let us clearly differentiate between allocated, used and utilized.
> Maybe, we should rename current Used methods to Allocated?
> ##I prefer either full name Opportunistic (in method) or Opp (shortest name
> that makes sense). Opport is neither short nor fully descriptive.
> ##Have we considered folding ContainerQueuingLimit class into this?
> We decided to move the issues into this follow up jira to keep YARN-6706
> moving forward to unblock oversubscription work.
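The proposed *maxConsumedOpportunisticResources* limit can be pictured as an admission check on opportunistic containers. What follows is a minimal sketch under stated assumptions, not YARN code or an agreed design: the class name `OpportunisticResourceLimit`, tracking memory only (in MB), and deriving the cap from the NM's swap size are all hypothetical. It refuses to start an opportunistic container if, in the worst case where every running opportunistic container is paused and swapped out, their combined memory would exceed the cap.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a cap on resources consumed by running opportunistic
 * containers. Name, memory-only accounting and the swap-derived cap are
 * illustrative assumptions, not part of the YARN code base.
 */
public class OpportunisticResourceLimit {
    // Assumed stand-in for maxConsumedOpportunisticResources, memory only (MB),
    // e.g. sized to the swap space the NM can spend on paused containers.
    private final long maxConsumedOpportunisticMb;
    // Memory (MB) of each running opportunistic container, keyed by container id.
    private final Map<String, Long> runningOpportunistic = new HashMap<>();
    private long consumedMb = 0;

    public OpportunisticResourceLimit(long maxConsumedOpportunisticMb) {
        this.maxConsumedOpportunisticMb = maxConsumedOpportunisticMb;
    }

    /** True if starting a container of this size keeps the worst-case swap need within the cap. */
    public boolean canStart(long memoryMb) {
        return consumedMb + memoryMb <= maxConsumedOpportunisticMb;
    }

    /** Records a started container; returns false and records nothing if it would exceed the cap. */
    public boolean tryStart(String containerId, long memoryMb) {
        if (runningOpportunistic.containsKey(containerId) || !canStart(memoryMb)) {
            return false;
        }
        runningOpportunistic.put(containerId, memoryMb);
        consumedMb += memoryMb;
        return true;
    }

    /** Releases the memory of a container that finished or was killed. */
    public void onContainerFinished(String containerId) {
        Long memoryMb = runningOpportunistic.remove(containerId);
        if (memoryMb != null) {
            consumedMb -= memoryMb;
        }
    }

    public long getConsumedMb() {
        return consumedMb;
    }

    public static void main(String[] args) {
        // Assume 4096 MB of swap is available for paused containers.
        OpportunisticResourceLimit limit = new OpportunisticResourceLimit(4096);
        System.out.println(limit.tryStart("opp_1", 2048)); // true
        System.out.println(limit.tryStart("opp_2", 2048)); // true
        System.out.println(limit.tryStart("opp_3", 1024)); // false: 5120 > 4096
        limit.onContainerFinished("opp_1");
        System.out.println(limit.tryStart("opp_3", 1024)); // true
        System.out.println(limit.getConsumedMb());         // 3072
    }
}
```

Bounding running (rather than only paused) opportunistic containers means a later decision to pause all of them can never overrun swap, at the cost of possibly admitting fewer containers than RAM alone would allow.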
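The "make ResourceUtilizationTracker pluggable" item amounts to a strategy pattern: the scheduler depends only on an interface, and the implementation is chosen once from configuration. The sketch below is illustrative only. `UtilizationTracker` is a hypothetical, simplified interface (not the real `ResourceUtilizationTracker` API), and the ratio-based `OversubscribingTracker` is a swapped-in placeholder for whatever admission policy oversubscription actually ends up using.

```java
/** Hypothetical, simplified tracker contract (NOT the real YARN interface). */
interface UtilizationTracker {
    /** Returns true if a container needing memoryMb may start now. */
    boolean hasResourcesAvailable(long memoryMb);
    void addContainer(long memoryMb);
    void removeContainer(long memoryMb);
}

/** Default behaviour: admit containers only up to the node's capacity. */
class AllocationBasedTracker implements UtilizationTracker {
    protected final long capacityMb;
    protected long allocatedMb = 0;

    AllocationBasedTracker(long capacityMb) {
        this.capacityMb = capacityMb;
    }

    /** Upper bound on allocated memory; subclasses may change the policy. */
    protected long admissionLimitMb() {
        return capacityMb;
    }

    @Override
    public boolean hasResourcesAvailable(long memoryMb) {
        return allocatedMb + memoryMb <= admissionLimitMb();
    }

    @Override
    public void addContainer(long memoryMb) {
        allocatedMb += memoryMb;
    }

    @Override
    public void removeContainer(long memoryMb) {
        allocatedMb -= memoryMb;
    }
}

/** Hypothetical oversubscription policy: admit up to capacity * ratio. */
class OversubscribingTracker extends AllocationBasedTracker {
    private final double overallocationRatio;

    OversubscribingTracker(long capacityMb, double overallocationRatio) {
        super(capacityMb);
        this.overallocationRatio = overallocationRatio;
    }

    @Override
    protected long admissionLimitMb() {
        return (long) (capacityMb * overallocationRatio);
    }
}

public class TrackerFactory {
    /** The scheduler sees only the interface; the implementation is picked from config. */
    public static UtilizationTracker create(boolean oversubscriptionEnabled,
                                            long capacityMb, double ratio) {
        return oversubscriptionEnabled
                ? new OversubscribingTracker(capacityMb, ratio)
                : new AllocationBasedTracker(capacityMb);
    }

    public static void main(String[] args) {
        UtilizationTracker plain = create(false, 8192, 1.5);
        UtilizationTracker over = create(true, 8192, 1.5);
        plain.addContainer(8192);
        over.addContainer(8192);
        System.out.println(plain.hasResourcesAvailable(1024)); // false: limit is 8192
        System.out.println(over.hasResourcesAvailable(1024));  // true: limit is 12288
    }
}
```

Keeping the choice behind a factory means ContainerScheduler code paths stay identical whether or not oversubscription is enabled; only the tracker instance differs.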