[
https://issues.apache.org/jira/browse/YARN-10341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152241#comment-17152241
]
Eric Yang commented on YARN-10341:
----------------------------------
[~BilwaST] Sorry, I am confused by this ticket and by how the proposed patch
fixes the described problem.
The container "restart_policy" controls whether a container is restarted
after it fails or is killed. If the policy is not set, the container is always
restarted. If it is set to "NEVER", the container is not restarted. The
completion events are secondary information that helps decide whether to
restart the containers. Using return or break in the onContainersCompleted
method does not make any difference.
Maybe I am missing something. Could you give more information on how this patch
addresses the observed issue?
> Yarn Service Container Completed event doesn't get processed
> -------------------------------------------------------------
>
> Key: YARN-10341
> URL: https://issues.apache.org/jira/browse/YARN-10341
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bilwa S T
> Assignee: Bilwa S T
> Priority: Critical
> Attachments: YARN-10341.001.patch
>
>
> If there are 10 workers running and some containers get killed, after a while
> we see that there are only 9 workers running. This happens because the
> CONTAINER_COMPLETED event is not processed on the AM side.
> The issue is in the code below:
> {code:java}
> public void onContainersCompleted(List<ContainerStatus> statuses) {
>   for (ContainerStatus status : statuses) {
>     ContainerId containerId = status.getContainerId();
>     ComponentInstance instance =
>         liveInstances.get(status.getContainerId());
>     if (instance == null) {
>       LOG.warn(
>           "Container {} Completed. No component instance exists. "
>               + "exitStatus={}. diagnostics={} ",
>           containerId, status.getExitStatus(), status.getDiagnostics());
>       return;
>     }
>     ComponentEvent event =
>         new ComponentEvent(instance.getCompName(), CONTAINER_COMPLETED)
>             .setStatus(status).setInstance(instance)
>             .setContainerId(containerId);
>     dispatcher.getEventHandler().handle(event);
>   }
> }
> {code}
> If no component instance exists for a container, the loop does not iterate
> over the remaining containers, because the method returns at that point.
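
The control-flow point in the description can be illustrated with a minimal, self-contained sketch. It does not use the YARN classes and is not the content of the attached patch. The class name CompletedEventSketch, both method names, and the string container IDs are hypothetical, and the use of continue is only an assumption about one way to skip an unknown container. The sketch compares a loop that returns on the first unknown container with one that skips it and keeps going.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the AM callback; plain strings replace
// ContainerStatus and ComponentInstance.
class CompletedEventSketch {

  // Mirrors the quoted loop: leaves the method at the first unknown container.
  static List<String> processWithReturn(List<String> completed,
      Map<String, String> live) {
    List<String> dispatched = new ArrayList<>();
    for (String containerId : completed) {
      String instance = live.get(containerId);
      if (instance == null) {
        return dispatched; // later entries in the batch are never examined
      }
      dispatched.add(instance); // stands in for dispatching the event
    }
    return dispatched;
  }

  // Assumed alternative: skip the unknown container and keep iterating.
  static List<String> processWithContinue(List<String> completed,
      Map<String, String> live) {
    List<String> dispatched = new ArrayList<>();
    for (String containerId : completed) {
      String instance = live.get(containerId);
      if (instance == null) {
        continue; // only this entry is skipped
      }
      dispatched.add(instance);
    }
    return dispatched;
  }

  public static void main(String[] args) {
    // "c2" has no live instance, so it plays the role of the unknown container.
    Map<String, String> live = Map.of("c1", "worker-0", "c3", "worker-2");
    List<String> completed = List.of("c1", "c2", "c3");
    System.out.println(processWithReturn(completed, live));   // [worker-0]
    System.out.println(processWithContinue(completed, live)); // [worker-0, worker-2]
  }
}
```

In this sketch, replacing return with break would give the same result as processWithReturn, since both stop at the first unknown container. Only skipping the entry lets the remaining completed containers in the same batch be handled.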