[
https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15658402#comment-15658402
]
Konstantinos Karanasos commented on YARN-4597:
----------------------------------------------
Here are some comments on the {{ContainerScheduler}}:
- {{queuedOpportunisticContainers}} will have concurrency issues. We are
updating it when containers arrive but also in the
{{shedQueuedOpportunisticContainers}}.
- {{queuedGuaranteedContainers}} and {{queuedOpportunisticContainers}}: I think
we should use queues. I don't think we retrieve the container by the key
anywhere either ways.
- {{oppContainersMarkedForKill}}: could be a Set, right?
- {{scheduledToRunContainers}} are containers that are either already running
or are going to run very soon (transitioning from SCHEDULED to RUNNING state).
Name is a bit misleading, because it sounds like they are only the ones
belonging to the second category. I would rather say {{runningContainers}} and
specify in a comment that they might not be running at this very moment but
will be running very soon.
- In the {{onContainerCompleted()}}, the
{{scheduledToRunContainers.remove(container.getContainerId())}} and the
{{startPendingContainers()}} can go inside the if statement above. If the
container was not running and no resources were freed up, we don't need to call
the {{startPendingContainers()}}.
- fields of the {{opportunisticContainersStatus}} are set in different places.
Due to that, when we call {{getOpportunisticContainersStatus()}} we may see an
inconsistent object. Let's set the fields only in the
{{getOpportunisticContainersStatus()}}.
- line 252, indeed we can now do extraOpportContainersToKill ->
opportContainersToKill, as Karthik mentioned at a comment.
- line 87: increase -> increases
- {{shedQueuedOpportunisticContainers}}:
-- {{numAllowed}} is the number of allowed containers in the queue. Instead, we
are killing numAllowed containers. In other words, we should not kill
numAllowed, but {{queuedOpportunisticContainers.size() - numAllowed}}.
-- "Container Killed to make room for Guaranteed Container." -> "Container
killed to meet NM queuing limits". Instead of kill, you can also say de-queued.
> Add SCHEDULE to NM container lifecycle
> --------------------------------------
>
> Key: YARN-4597
> URL: https://issues.apache.org/jira/browse/YARN-4597
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: nodemanager
> Reporter: Chris Douglas
> Assignee: Arun Suresh
> Labels: oct16-hard
> Attachments: YARN-4597.001.patch, YARN-4597.002.patch,
> YARN-4597.003.patch, YARN-4597.004.patch, YARN-4597.005.patch,
> YARN-4597.006.patch, YARN-4597.007.patch, YARN-4597.008.patch,
> YARN-4597.009.patch
>
>
> Currently, the NM immediately launches containers after resource
> localization. Several features could be more cleanly implemented if the NM
> included a separate stage for reserving resources.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]