[
https://issues.apache.org/jira/browse/YARN-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061387#comment-14061387
]
Karthik Kambatla commented on YARN-2244:
----------------------------------------
# Can we use {{AbstractYarnScheduler#killOrphanContainerOnNode()}} instead?
{code}
+ this.rmContext.getDispatcher().getEventHandler()
+ .handle(new RMNodeCleanContainerEvent(node.getNodeID(),
containerId));
{code}
# Thanks for moving the following to a separate method. IMO, we should clean it
up more:
{code}
protected void waitForContainerCleanup(DrainDispatcher dispatcher, MockNM nm,
NodeHeartbeatResponse resp) throws
Exception {
int waitCount;
dispatcher.await();
List<ContainerId> contsToClean = resp.getContainersToCleanup();
int cleanedConts = contsToClean.size();
waitCount = 0;
while (cleanedConts < 1 && waitCount++ < 200) {
LOG.info("Waiting to get cleanup events.. cleanedConts: " + cleanedConts);
Thread.sleep(100);
resp = nm.nodeHeartbeat(true);
dispatcher.await();
contsToClean = resp.getContainersToCleanup();
cleanedConts += contsToClean.size();
}
if (contsToClean.isEmpty()) {
LOG.error("Failed to get any containers to cleanup");
} else {
LOG.info("Got cleanup for " + contsToClean.get(0));
}
Assert.assertEquals(1, cleanedConts);
}
{code}
## One line is over 80 chars.
## {{int waitCount = 0}} can go on one line.
## Fetching the containers to clean and the other arithmetic before the while
loop can be moved into the while loop. {{cleanedConts}} can be initialized to
zero. I am okay with a do-while too.
## Remove the logging - I am not sure why we are logging that information 200
times.
## Parametrize the method to also take the number of container cleanups to wait
for, and use it everywhere.
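For illustration, the cleanup suggested above could look roughly like the sketch below. It is not the patch code: {{CleanupWaitSketch}}, the {{heartbeat}} supplier, and the {{maxAttempts}} and {{sleepMillis}} parameters are hypothetical stand-ins so the example runs without the YARN test classes. The supplier is assumed to wrap {{nm.nodeHeartbeat(true)}}, {{dispatcher.await()}}, and {{resp.getContainersToCleanup()}}.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in class; not part of the YARN test code.
class CleanupWaitSketch {

  /**
   * Polls until at least expectedCleanups containers have been reported for
   * cleanup, or until maxAttempts polls were made. Returns the number seen.
   */
  static int waitForContainerCleanup(Supplier<List<String>> heartbeat,
      int expectedCleanups, int maxAttempts, long sleepMillis) {
    int cleanedConts = 0;
    int waitCount = 0;
    // All fetching and arithmetic lives inside the loop; no per-iteration logging.
    while (cleanedConts < expectedCleanups && waitCount++ < maxAttempts) {
      cleanedConts += heartbeat.get().size();
      if (cleanedConts < expectedCleanups) {
        sleepQuietly(sleepMillis);
      }
    }
    return cleanedConts;
  }

  // Sleeps, converting an interrupt into an unchecked exception.
  private static void sleepQuietly(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting", e);
    }
  }

  public static void main(String[] args) {
    // Simulated heartbeat responses: nothing, then one container, then another.
    Iterator<List<String>> responses = Arrays.asList(
        Collections.<String>emptyList(),
        Arrays.asList("container_1"),
        Arrays.asList("container_2")).iterator();
    int cleaned = waitForContainerCleanup(responses::next, 2, 200, 10L);
    System.out.println("cleanedConts = " + cleaned);
  }
}
```

In the actual test, the caller would presumably assert on the returned count against the expected number of cleanups, rather than hard-coding 1 inside the helper.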
> FairScheduler missing handling of containers for unknown application attempts
> ------------------------------------------------------------------------------
>
> Key: YARN-2244
> URL: https://issues.apache.org/jira/browse/YARN-2244
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Reporter: Anubhav Dhoot
> Assignee: Anubhav Dhoot
> Priority: Critical
> Attachments: YARN-2224.patch, YARN-2244.001.patch, YARN-2244.002.patch
>
>
> We are missing the changes from patch MAPREDUCE-3596 in FairScheduler. Among
> other fixes that were common across schedulers, there were some
> scheduler-specific fixes added to handle containers for unknown application
> attempts. Without these, FairScheduler simply logs that an unknown container
> was found and continues to let it run.