[
https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15645979#comment-15645979
]
Jian He commented on YARN-4597:
-------------------------------
bq. scheduledToRunContainers
sounds good
bq. Using shouldLaunchContainer also causes the CONTAINER_LAUNCH event to be
fired which I do not want.
I think shouldLaunchContainer is also used to skip launching the container
when a kill has already been requested (see the comment in the code below:
"// Check if the container is signalled to be killed."). As for the additional
event, I think the existing code should be changed as well: the code that
sends the CONTAINER_LAUNCHED event should be inside the 'else' block, where
the container is actually launched.
{code}
// LaunchContainer is a blocking call. We are here almost means the
// container is launched, so send out the event.
dispatcher.getEventHandler().handle(new ContainerEvent(
    containerId,
    ContainerEventType.CONTAINER_LAUNCHED));
context.getNMStateStore().storeContainerLaunched(containerId);
// Check if the container is signalled to be killed.
if (!shouldLaunchContainer.compareAndSet(false, true)) {
  LOG.info("Container " + containerId + " not launched as "
      + "cleanup already called");
  return ExitCode.TERMINATED.getExitCode();
{code}
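To make the suggestion concrete, here is a minimal, self-contained sketch of the ordering I have in mind. It is not the actual launch code: LaunchGateSketch, cleanup(), the events list and the exit-code constants are hypothetical stand-ins, and I am assuming the cleanup path flips the same flag with compareAndSet. The point is only that the event and the state-store write happen in the 'else' branch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the launch path, for illustration only.
class LaunchGateSketch {
  static final int SUCCESS = 0;      // placeholder exit codes (assumption)
  static final int TERMINATED = -1;

  // Whoever flips this flag first wins: the launcher or the cleanup path.
  final AtomicBoolean shouldLaunchContainer = new AtomicBoolean(false);
  // Stand-in for the dispatcher and state store: records what was emitted.
  final List<String> events = new ArrayList<>();

  // Assumed behaviour of the kill/cleanup path: claim the flag so that a
  // later call() sees the container as already signalled to be killed.
  void cleanup() {
    shouldLaunchContainer.compareAndSet(false, true);
  }

  int call() {
    // Check if the container is signalled to be killed.
    if (!shouldLaunchContainer.compareAndSet(false, true)) {
      // Cleanup already ran: no event, no state-store write, no launch.
      return TERMINATED;
    } else {
      // Only now is the container really going to be launched.
      events.add("CONTAINER_LAUNCHED");
      events.add("storeContainerLaunched");
      return SUCCESS; // the blocking launch call would go here
    }
  }
}
```

With this ordering, a container killed before launch never produces a CONTAINER_LAUNCHED event or a "launched" record in the state store.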
bq. Once queue limit is reached, no new opportunistic containers should also be
queued. The AM is free to request it again. The MRAppMaster, for eg.
re-requests the same task as a GUARANTEED container.
Sorry for being unclear, I meant the code below. It calls
'storeContainerQueued' unconditionally, even when the container is not queued
because the queue-length limit has been reached. Shouldn't we skip the
storeContainerQueued call in that case?
{code}
{
  try {
    this.context.getNMStateStore().storeContainerQueued(
        container.getContainerId());
  } catch (IOException e) {
    LOG.warn("Could not store container state into store..", e);
  }
  LOG.info("No available resources for container {} to start its execution "
      + "immediately.", container.getContainerId());
  if (container.getContainerTokenIdentifier().getExecutionType() ==
      ExecutionType.GUARANTEED) {
    queuedGuaranteedContainers.put(container.getContainerId(), container);
    // Kill running opportunistic containers to make space for
    // guaranteed container.
    killOpportunisticContainers(container);
  } else {
    if (queuedOpportunisticContainers.size() <= maxOppQueueLength) {
      LOG.info("Opportunistic container {} will be queued at the NM.",
          container.getContainerId());
      queuedOpportunisticContainers.put(
          container.getContainerId(), container);
    } else {
      LOG.info("Opportunistic container [{}] will not be queued at the NM" +
          "since max queue length [{}] has been reached",
          container.getContainerId(), maxOppQueueLength);
      container.sendKillEvent(
          ContainerExitStatus.KILLED_BY_CONTAINER_SCHEDULER,
          "Opportunistic container queue is full.");
    }
  }
}
{code}
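Roughly what I mean, as a simplified sketch rather than a patch: QueueingSketch, enqueue(), storedAsQueued and killed are hypothetical names, containers are reduced to string ids, and the lists stand in for the NM state store and the kill event. The size check mirrors the comparison in the snippet above. The state-store call moves into the branches where the container is actually put on a queue.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the queuing path, for illustration only.
class QueueingSketch {
  enum ExecutionType { GUARANTEED, OPPORTUNISTIC }

  final int maxOppQueueLength;
  final Map<String, ExecutionType> queuedGuaranteedContainers = new LinkedHashMap<>();
  final Map<String, ExecutionType> queuedOpportunisticContainers = new LinkedHashMap<>();
  final List<String> storedAsQueued = new ArrayList<>(); // stand-in for the state store
  final List<String> killed = new ArrayList<>();         // stand-in for sendKillEvent

  QueueingSketch(int maxOppQueueLength) {
    this.maxOppQueueLength = maxOppQueueLength;
  }

  void enqueue(String containerId, ExecutionType type) {
    if (type == ExecutionType.GUARANTEED) {
      queuedGuaranteedContainers.put(containerId, type);
      storedAsQueued.add(containerId); // queued, so record it
      // The real code would kill running opportunistic containers here.
    } else if (queuedOpportunisticContainers.size() <= maxOppQueueLength) {
      // Same comparison as the quoted snippet.
      queuedOpportunisticContainers.put(containerId, type);
      storedAsQueued.add(containerId); // queued, so record it
    } else {
      // Queue is full: the container is rejected and never recorded as queued.
      killed.add(containerId);
    }
  }
}
```

This way a container that is rejected because the queue is full is never recorded as queued in the state store.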
> Add SCHEDULE to NM container lifecycle
> --------------------------------------
>
> Key: YARN-4597
> URL: https://issues.apache.org/jira/browse/YARN-4597
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: nodemanager
> Reporter: Chris Douglas
> Assignee: Arun Suresh
> Labels: oct16-hard
> Attachments: YARN-4597.001.patch, YARN-4597.002.patch,
> YARN-4597.003.patch, YARN-4597.004.patch, YARN-4597.005.patch,
> YARN-4597.006.patch
>
>
> Currently, the NM immediately launches containers after resource
> localization. Several features could be more cleanly implemented if the NM
> included a separate stage for reserving resources.