[ https://issues.apache.org/jira/browse/YARN-149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13715388#comment-13715388 ]
Bikas Saha commented on YARN-149:
---------------------------------
The document describes the overall intent for an interested reader. It does not
imply that the implementation steps will follow it verbatim. We would like to
make the changes incrementally, e.g. implement only manual failover at first in
order to test the HAServiceProtocol implementation. Similarly, we can start with
services being unaware of HA state and add that awareness later. To be clear,
the document says that only the external-facing layer (the RPC API layer) may be
aware of HAState, and even this is not required at the beginning. HDFS does
something similar, checking each client operation to see whether it is allowed
in the current HAState. Internal services are not expected to know about
HAState; the expectation is that there is no activity in the internal state
machines until the RM instance becomes active. Can you please take another look
and let me know if this is not clear from the document? I can edit it to
clarify.
> ResourceManager (RM) High-Availability (HA)
> -------------------------------------------
>
> Key: YARN-149
> URL: https://issues.apache.org/jira/browse/YARN-149
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Harsh J
> Assignee: Bikas Saha
> Attachments: rm-ha-phase1-approach-draft1.pdf,
> rm-ha-phase1-draft2.pdf, YARN ResourceManager Automatic
> Failover-rev-07-21-13.pdf
>
>
> This jira tracks work needed to be done to support one RM instance failing
> over to another RM instance so that we can have RM HA. Work includes leader
> election, transfer of control to leader and client re-direction to new leader.