[
https://issues.apache.org/jira/browse/YARN-10341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bilwa S T updated YARN-10341:
-----------------------------
Description:
If there are 10 workers running and containers get killed, after a while we see
that there are just 9 workers running. This is because the CONTAINER_COMPLETED
event is not processed on the AM side.
The issue is in the code below:
{code:java}
public void onContainersCompleted(List<ContainerStatus> statuses) {
  for (ContainerStatus status : statuses) {
    ContainerId containerId = status.getContainerId();
    ComponentInstance instance = liveInstances.get(status.getContainerId());
    if (instance == null) {
      LOG.warn(
          "Container {} Completed. No component instance exists. "
              + "exitStatus={}. diagnostics={} ",
          containerId, status.getExitStatus(), status.getDiagnostics());
      // Problem: this exits the method, so the remaining statuses are skipped.
      return;
    }
    ComponentEvent event =
        new ComponentEvent(instance.getCompName(), CONTAINER_COMPLETED)
            .setStatus(status).setInstance(instance)
            .setContainerId(containerId);
    dispatcher.getEventHandler().handle(event);
  }
}
{code}
If the component instance doesn't exist for a container, the loop doesn't
iterate over the remaining containers because it returns from the method. This
happens when restart_policy is "ON_FAILURE".
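One possible way to avoid the early exit is to skip only the unknown container and keep looping. The sketch below is an illustration, not the attached patch: CompletedContainerSketch is a hypothetical stand-in, and plain strings replace ContainerId, ComponentInstance and ComponentEvent so that it runs without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the AM callback (assumption, not Hadoop code).
final class CompletedContainerSketch {

  // Returns the instances that would receive a CONTAINER_COMPLETED event.
  static List<String> onContainersCompleted(
      Map<String, String> liveInstances, List<String> completedContainerIds) {
    List<String> dispatched = new ArrayList<>();
    for (String containerId : completedContainerIds) {
      String instance = liveInstances.get(containerId);
      if (instance == null) {
        System.err.println("Container " + containerId
            + " completed. No component instance exists.");
        // Skip only this status; a `return` here would drop the rest.
        continue;
      }
      // Stands in for dispatcher.getEventHandler().handle(event).
      dispatched.add(instance);
    }
    return dispatched;
  }

  public static void main(String[] args) {
    Map<String, String> live = new HashMap<>();
    live.put("container_01", "worker-0");
    live.put("container_03", "worker-2");
    // container_02 has no live instance, yet container_03 is still handled.
    System.out.println(onContainersCompleted(
        live, Arrays.asList("container_01", "container_02", "container_03")));
  }
}
```

With `continue`, an unknown container in the middle of the list no longer prevents the statuses after it from being dispatched.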
was:
If there are 10 workers running and containers get killed, after a while we see
that there are just 9 workers running. This is because the CONTAINER_COMPLETED
event is not processed on the AM side.
The issue is in the code below:
{code:java}
public void onContainersCompleted(List<ContainerStatus> statuses) {
  for (ContainerStatus status : statuses) {
    ContainerId containerId = status.getContainerId();
    ComponentInstance instance = liveInstances.get(status.getContainerId());
    if (instance == null) {
      LOG.warn(
          "Container {} Completed. No component instance exists. "
              + "exitStatus={}. diagnostics={} ",
          containerId, status.getExitStatus(), status.getDiagnostics());
      // Problem: this exits the method, so the remaining statuses are skipped.
      return;
    }
    ComponentEvent event =
        new ComponentEvent(instance.getCompName(), CONTAINER_COMPLETED)
            .setStatus(status).setInstance(instance)
            .setContainerId(containerId);
    dispatcher.getEventHandler().handle(event);
  }
}
{code}
If the component instance doesn't exist for a container, the loop doesn't
iterate over the remaining containers because it returns from the method.
> Yarn Service Container Completed event doesn't get processed
> -------------------------------------------------------------
>
> Key: YARN-10341
> URL: https://issues.apache.org/jira/browse/YARN-10341
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bilwa S T
> Assignee: Bilwa S T
> Priority: Critical
> Fix For: 3.4.0, 3.3.1
>
> Attachments: YARN-10341.001.patch, YARN-10341.002.patch,
> YARN-10341.003.patch, YARN-10341.004.patch
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)