Daniel Templeton created YARN-4373:
--------------------------------------
Summary: Jobs can be temporarily forgotten during recovery
Key: YARN-4373
URL: https://issues.apache.org/jira/browse/YARN-4373
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.7.1
Reporter: Daniel Templeton
Assignee: Daniel Templeton
Priority: Critical
The RM becomes available to service requests before state store recovery has
started. Before and during recovery, a client can request an application report
for a running application, and the RM will respond that the application is
unknown.
I'm seeing this issue with Oozie during an RM failover. Until the active RM
finishes recovery, it reports erroneous information to Oozie, which doesn't
have the context to know that it should just try again later.
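
As a rough illustration of the "try again later" behavior a client would need, here is a minimal client-side sketch. It is not the YARN or Oozie API: the class ReportRetry, the method retryWhileUnknown, and the simulated lookup are all hypothetical names made up for this example, and an empty Optional stands in for an "application unknown" response. It also assumes the client can afford to wait, and it cannot tell a recovering RM apart from an application that really doesn't exist.

```java
import java.util.Optional;
import java.util.function.Supplier;

public class ReportRetry {
    /**
     * Calls lookup until it returns a value or maxAttempts is used up,
     * sleeping delayMillis between attempts. An empty Optional models an
     * "application unknown" answer (hypothetical stand-in, not a YARN type).
     */
    public static <T> Optional<T> retryWhileUnknown(
            Supplier<Optional<T>> lookup, int maxAttempts, long delayMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> result = lookup.get();
            if (result.isPresent()) {
                return result;
            }
            // Don't sleep after the final attempt; just give up.
            if (attempt < maxAttempts) {
                Thread.sleep(delayMillis);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated RM: answers "unknown" twice, then reports the app.
        int[] calls = {0};
        Optional<String> report = ReportRetry.<String>retryWhileUnknown(
                () -> ++calls[0] < 3
                        ? Optional.<String>empty()
                        : Optional.of("RUNNING"),
                5, 10L);
        System.out.println(report.orElse("UNKNOWN") + " after " + calls[0] + " calls");
    }
}
```

A workaround like this only hides the symptom on the client side. The underlying problem described above is that the RM answers at all before recovery has finished.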