Omkar Vinit Joshi created YARN-1421:
---------------------------------------
Summary: Node managers will not receive application finish event
where containers ran before RM restart
Key: YARN-1421
URL: https://issues.apache.org/jira/browse/YARN-1421
Project: Hadoop YARN
Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
Priority: Critical
Problem :- Today, for every application, we track the node managers on which
its containers ran. When the application finishes, the RM notifies all of those
node managers of the application finish event (via the node manager heartbeat).
However, if the RM restarts, we lose this past information, so those node
managers never receive the application finish event and keep reporting the
finished applications.
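To make the failure mode concrete, here is a minimal sketch of the bookkeeping described above. It assumes a simplified in-memory map from application ID to node IDs; the class `InMemoryAppNodeTracker` and its methods are hypothetical and are not the real ResourceManager code. The point it shows is that a restarted RM starts from an empty tracker, so it has no nodes to notify.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the current RM bookkeeping (NOT the real
// Hadoop code): per application, remember the nodes where its containers ran.
public class InMemoryAppNodeTracker {
    // appId -> nodeIds where the app's containers ran; lives only in RM memory
    private final Map<String, Set<String>> nodesByApp = new HashMap<>();

    // Record that a container for appId was launched on nodeId.
    public void containerLaunched(String appId, String nodeId) {
        nodesByApp.computeIfAbsent(appId, k -> new HashSet<>()).add(nodeId);
    }

    // When appId finishes, return the nodes that must be told about it
    // (on their next heartbeat) and forget the entry.
    public Set<String> appFinished(String appId) {
        Set<String> nodes = nodesByApp.remove(appId);
        return nodes == null ? Collections.emptySet() : nodes;
    }

    public static void main(String[] args) {
        InMemoryAppNodeTracker beforeRestart = new InMemoryAppNodeTracker();
        beforeRestart.containerLaunched("app1", "nm1");
        beforeRestart.containerLaunched("app1", "nm2");

        // An RM restart begins with a fresh tracker: the history is gone.
        InMemoryAppNodeTracker afterRestart = new InMemoryAppNodeTracker();
        System.out.println(beforeRestart.appFinished("app1").size()); // 2
        System.out.println(afterRestart.appFinished("app1").size());  // 0
    }
}
```

Because nothing in this map survives a restart, the nodes nm1 and nm2 in the example would never be told that app1 finished.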
Proposed Solution :- Instead of remembering the node managers on which
containers ran for a particular application, it would be better to rely on the
node manager heartbeat to make this decision. That is, when a node manager
heartbeats saying it is running applications (app1, app2), we should check
those applications' status in the RM's memory ({code}rmContext.getRMApps(){code}).
If an application is either not found (a very old application) or in a final
state (FINISHED, KILLED, FAILED), we should immediately notify the node manager
of the application finish event.
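The heartbeat-time check proposed above could look roughly like the following sketch. It assumes a plain `Map<String, AppState>` standing in for `rmContext.getRMApps()`; the `AppState` enum, the class `FinishedAppCheck`, and the method `appsToFinish` are hypothetical simplifications, not the real YARN API. For each application the node manager reports as running, the RM flags it for a finish event if it is unknown or already in a final state.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the proposed check. The Map stands in for
// rmContext.getRMApps(); names here are illustrative, not the YARN API.
public class FinishedAppCheck {
    enum AppState { NEW, SUBMITTED, ACCEPTED, RUNNING, FINISHED, KILLED, FAILED }

    private static final Set<AppState> FINAL_STATES =
        EnumSet.of(AppState.FINISHED, AppState.KILLED, AppState.FAILED);

    // Given the apps a node manager reports as running, return those the RM
    // should send an application finish event for: apps unknown to the RM
    // (e.g. very old ones) or apps already in a final state.
    static List<String> appsToFinish(List<String> reportedRunningApps,
                                     Map<String, AppState> rmApps) {
        List<String> toFinish = new ArrayList<>();
        for (String appId : reportedRunningApps) {
            AppState state = rmApps.get(appId);
            if (state == null || FINAL_STATES.contains(state)) {
                toFinish.add(appId);
            }
        }
        return toFinish;
    }

    public static void main(String[] args) {
        Map<String, AppState> rmApps = new HashMap<>();
        rmApps.put("app1", AppState.RUNNING);
        rmApps.put("app2", AppState.FINISHED);
        // app3 is not known to the RM at all
        System.out.println(appsToFinish(List.of("app1", "app2", "app3"), rmApps));
        // prints [app2, app3]
    }
}
```

One appeal of this design is that the decision depends only on state the RM already holds in `getRMApps()` plus what the node manager reports, so no per-application node history has to survive a restart.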
--
This message was sent by Atlassian JIRA
(v6.1#6144)