[ https://issues.apache.org/jira/browse/YARN-149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13715388#comment-13715388 ]
Bikas Saha commented on YARN-149:
---------------------------------

The document describes the overall intent for an interested reader. It does not imply that the implementation will follow it step by step. We would like to make the changes incrementally, e.g. start with manual failover only, to test the HAServiceProtocol implementation. Similarly, we can start with services that are unaware of the HA state and add that awareness later on.

To be clear, the document says that only the externally facing layer (the RPC API layer) may be aware of HAState, and even that is not required at the beginning. HDFS does something similar: each client operation is checked to see whether it is allowed in the current HAState. Internal services are not expected to know about HAState, and no activity is expected in the internal state machines until the RM instance becomes active.

Can you please take another look and let me know if this is not clear from the document? I can edit it to clarify.

> ResourceManager (RM) High-Availability (HA)
> -------------------------------------------
>
>                 Key: YARN-149
>                 URL: https://issues.apache.org/jira/browse/YARN-149
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Harsh J
>            Assignee: Bikas Saha
>         Attachments: rm-ha-phase1-approach-draft1.pdf, rm-ha-phase1-draft2.pdf, YARN ResourceManager Automatic Failover-rev-07-21-13.pdf
>
>
> This jira tracks work needed to be done to support one RM instance failing
> over to another RM instance so that we can have RM HA. Work includes leader
> election, transfer of control to leader and client re-direction to new leader.
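The incremental plan described in the comment (HA state consulted only at the client-facing RPC layer, internal services idle until the RM becomes active, manual failover first) can be sketched roughly as follows. This is a minimal, hypothetical Java sketch, not the YARN implementation: `RmHaSketch`, `HAState`, `RmNotActiveException`, `InternalService`, `checkOperationAllowed` and the `transitionTo*` methods are illustrative names assumed here. The transition methods only play the role that an HAServiceProtocol implementation would play during a manual failover.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: HA state is checked only at the client-facing RPC layer.
// None of these names are actual YARN or Hadoop classes.
public class RmHaSketch {

    /** Hypothetical HA states for this sketch. */
    enum HAState { STANDBY, ACTIVE }

    /** Hypothetical exception returned to clients that reach a non-active RM. */
    static class RmNotActiveException extends RuntimeException {
        RmNotActiveException(String msg) { super(msg); }
    }

    /** An internal service that knows nothing about HA; it is only started or stopped. */
    static class InternalService {
        final String name;
        boolean running;
        InternalService(String name) { this.name = name; }
        void start() { running = true; }
        void stop() { running = false; }
    }

    private volatile HAState state = HAState.STANDBY;
    private final List<InternalService> services = new ArrayList<>();
    private final List<String> submitted = new ArrayList<>();

    RmHaSketch() {
        services.add(new InternalService("scheduler"));
        services.add(new InternalService("app-manager"));
    }

    // Manual failover entry points (the role an HAServiceProtocol impl would play).
    synchronized void transitionToActive() {
        if (state == HAState.ACTIVE) return;
        for (InternalService s : services) s.start(); // internal work begins only now
        state = HAState.ACTIVE;
    }

    synchronized void transitionToStandby() {
        if (state == HAState.STANDBY) return;
        state = HAState.STANDBY; // stop accepting client calls first
        for (InternalService s : services) s.stop();
    }

    // Every client-facing operation checks whether the current state allows it.
    private void checkOperationAllowed(String op) {
        if (state != HAState.ACTIVE) {
            throw new RmNotActiveException(op + " is not allowed in state " + state);
        }
    }

    synchronized void submitApplication(String appName) {
        checkOperationAllowed("submitApplication");
        submitted.add(appName);
    }

    boolean anyServiceRunning() {
        for (InternalService s : services) {
            if (s.running) return true;
        }
        return false;
    }

    List<String> getSubmitted() { return submitted; }

    public static void main(String[] args) {
        RmHaSketch rm = new RmHaSketch();
        try {
            rm.submitApplication("app-1"); // rejected: the RM starts in STANDBY
        } catch (RmNotActiveException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        rm.transitionToActive(); // stands in for a manual failover command
        rm.submitApplication("app-2");
        System.out.println("accepted: " + rm.getSubmitted());
    }
}
```

Running `main` prints `rejected: submitApplication is not allowed in state STANDBY` followed by `accepted: [app-2]`. The internal services never inspect the HA state themselves; they are simply started on activation and stopped on standby, which keeps HA awareness confined to the RPC layer as the comment suggests.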