Yesha Vora created YARN-4101:
--------------------------------
Summary: RM should print alert messages if ZooKeeper and
ResourceManager have connection issues
Key: YARN-4101
URL: https://issues.apache.org/jira/browse/YARN-4101
Project: Hadoop YARN
Issue Type: Bug
Components: yarn
Reporter: Yesha Vora
Priority: Critical
Currently, there is no way for a user to tell that the connection between ZooKeeper (Zk)
and the ResourceManager (RM) has problems. In an HA environment, the RM depends heavily
on ZooKeeper. If the connection between the RM and Zk is jeopardized, the cluster is
likely to end up in a bad state.
Example: RM1 is active and RM2 is standby. If the connection between RM2 and Zk is
lost, RM2 will never become active. If RM1 then hits an error and cannot be
restarted, the cluster goes into a bad state. This situation is very hard for a user
to debug. With better alert messages, the user could fix the Zk-RM connection issue
before the cluster ends up in a bad state.
Thus, we need a better way to alert the user when the connection between Zk and the
active RM, or between Zk and the standby RM, is degrading.
Here are the suggestions:
1) Show a connection-lost alert in the RM UI.
2) Print alert messages when running any YARN command, such as yarn logs, yarn
application, etc.
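Both suggestions need the RM to remember its current ZooKeeper session state so a UI banner or a CLI command can report it. A minimal sketch, assuming the RM receives connection-state callbacks from its ZooKeeper client (for example Apache Curator's ConnectionStateListener). The class ZkConnectionAlerts, its methods, and the alert wording are hypothetical and not part of Hadoop YARN; the state names are a subset of Curator's ConnectionState values.

```java
import java.time.Instant;
import java.util.Optional;

/**
 * Hypothetical tracker for one RM's ZooKeeper session state (not a YARN class).
 * A connection-state callback feeds it; the RM UI or a CLI reads the alert.
 */
final class ZkConnectionAlerts {

    /** Subset of states modeled on Apache Curator's ConnectionState enum. */
    enum State { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    private final String rmId;           // illustrative RM id, e.g. "rm1" or "rm2"
    private State state = State.CONNECTED;
    private Instant since = Instant.now();

    ZkConnectionAlerts(String rmId) {
        this.rmId = rmId;
    }

    /** Record a transition; intended to be called from the ZK client's listener. */
    synchronized void onStateChange(State newState) {
        if (newState != state) {
            state = newState;
            since = Instant.now();
        }
    }

    /** True while this RM cannot rely on its ZooKeeper session. */
    synchronized boolean isUnhealthy() {
        return state == State.SUSPENDED || state == State.LOST;
    }

    /** Alert text for a UI banner or CLI output, or empty when healthy. */
    synchronized Optional<String> currentAlert() {
        if (!isUnhealthy()) {
            return Optional.empty();
        }
        return Optional.of(String.format(
            "WARNING: ResourceManager %s: ZooKeeper connection %s since %s; "
                + "this RM may be unable to become active.",
            rmId, state, since));
    }
}
```

Once rm2 reports LOST, currentAlert() returns the warning; after RECONNECTED it returns empty, so the message clears on its own. Tracking the standby RM as well as the active one is what would surface the scenario above, where only the standby has lost its session.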
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)