[ 
https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750258#comment-13750258
 ] 

Bikas Saha commented on YARN-1027:
----------------------------------

It's a good idea to draft a path in which the HA protocol becomes another 
service within the RM. We should think through various 
startup/transitionToActive()/transitionToStandby() scenarios to determine the 
best approach to code this. 

Consider, e.g., repeated transitions from active->standby->active for the same 
RM without bringing the process down. This means that all apps in the RM (i.e. 
all internal stateful objects like appmanager, scheduler, rmappimpl etc.) 
should be completely cleaned up during transitionToStandby(). Currently the RM 
simply shuts down, and hence that cleanup is not necessary.

This may also suggest that we logically divide RM internal objects into 2 
groups: 1) stuff that can be started once and kept running until the RM stops, 
and 2) stuff that needs to be cleaned up every time the RM becomes standby and 
re-inited when the RM becomes active. The second group would contain things 
like the scheduler, while the first would contain things like the RPC 
services. The first set would be transparent to HA, while the second set would 
need to be aware of HA.
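
A rough sketch of what that split could look like. This is only an 
illustration: SimpleService, RecordingService and SketchResourceManager are 
hypothetical names, not existing Hadoop or RM classes, and the real RM would 
presumably build on its existing service framework instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for an RM-internal service (not the Hadoop Service API).
interface SimpleService {
    void start();
    void stop();
}

// Records lifecycle events so the ordering can be inspected.
class RecordingService implements SimpleService {
    private final String name;
    private final List<String> log;

    RecordingService(String name, List<String> log) {
        this.name = name;
        this.log = log;
    }

    @Override
    public void start() { log.add("start " + name); }

    @Override
    public void stop() { log.add("stop " + name); }
}

// Hypothetical RM skeleton that keeps the two groups apart.
class SketchResourceManager {
    // Group 1: started once, kept until the RM stops (e.g. RPC services).
    private final List<SimpleService> alwaysOn = new ArrayList<>();
    // Group 2: factories, so a fresh instance is built on every activation.
    private final List<Supplier<SimpleService>> activeOnlyFactories = new ArrayList<>();
    // Group 2 instances that exist only while the RM is active.
    private final List<SimpleService> activeServices = new ArrayList<>();
    private boolean active = false;

    void addAlwaysOn(SimpleService service) { alwaysOn.add(service); }

    void addActiveOnly(Supplier<SimpleService> factory) { activeOnlyFactories.add(factory); }

    void start() {
        for (SimpleService s : alwaysOn) {
            s.start();
        }
    }

    void transitionToActive() {
        if (active) {
            return; // already active, nothing to do
        }
        for (Supplier<SimpleService> factory : activeOnlyFactories) {
            SimpleService s = factory.get(); // fresh object, no leftover state
            s.start();
            activeServices.add(s);
        }
        active = true;
    }

    void transitionToStandby() {
        if (!active) {
            return; // already standby, nothing to do
        }
        // Stop in reverse start order, then drop the instances entirely.
        for (int i = activeServices.size() - 1; i >= 0; i--) {
            activeServices.get(i).stop();
        }
        activeServices.clear();
        active = false;
    }

    void stop() {
        transitionToStandby();
        for (int i = alwaysOn.size() - 1; i >= 0; i--) {
            alwaysOn.get(i).stop();
        }
    }
}
```

The sketch uses factories for the second group so that every 
transitionToActive() starts from newly constructed objects. Cleaning up and 
reusing the same instances would be an equally valid design choice.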

Perhaps before we tackle this jira to completion, we should open and commit 
another jira that identifies all stateful objects within the RM and adds 
support to clean them up during RM shutdown. Those cleanup methods can be 
re-used during transitionToStandby(). This jira can build on top of that.
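
To illustrate the reuse, here is a minimal sketch in which shutdown and 
transitionToStandby() share one cleanup path. SketchAppRegistry and SketchRm 
are made-up names for illustration only and do not correspond to actual RM 
classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stateful component, standing in for something like the app manager.
class SketchAppRegistry {
    private final Map<String, String> apps = new HashMap<>();

    void submit(String appId, String state) { apps.put(appId, state); }

    int size() { return apps.size(); }

    // The cleanup logic is written once, here.
    void cleanup() { apps.clear(); }
}

// Hypothetical RM skeleton showing the shared cleanup path.
class SketchRm {
    private final SketchAppRegistry registry = new SketchAppRegistry();
    private boolean active = false;
    private boolean stopped = false;

    void transitionToActive() {
        if (stopped) {
            throw new IllegalStateException("RM already stopped");
        }
        active = true;
    }

    void submit(String appId) {
        if (!active) {
            throw new IllegalStateException("RM is not active");
        }
        registry.submit(appId, "RUNNING");
    }

    // Standby reuses the same cleanup that shutdown uses.
    void transitionToStandby() {
        cleanupState();
        active = false;
    }

    void stop() {
        cleanupState();
        active = false;
        stopped = true;
    }

    int trackedApps() { return registry.size(); }

    private void cleanupState() { registry.cleanup(); }
}
```

With this shape, repeated active->standby->active cycles in one process 
begin each active period with no apps left over from the previous one.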
                
> Implement RMHAServiceProtocol
> -----------------------------
>
>                 Key: YARN-1027
>                 URL: https://issues.apache.org/jira/browse/YARN-1027
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Karthik Kambatla
>         Attachments: yarn-1027-1.patch
>
>
> Implement existing HAServiceProtocol from Hadoop common. This protocol is the 
> single point of interaction between the RM and HA clients/services.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
