[
https://issues.apache.org/jira/browse/YARN-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842810#comment-13842810
]
Karthik Kambatla commented on YARN-1481:
----------------------------------------
Thanks [~vinodkv] for the refactoring. The code reads better now.
As for the TODO items in the patch:
# HAServiceStatus returned by getServiceStatus() has an isReadyToBecomeActive
method which is used by the FailoverController to check if an RM is in a state
to become Active. This corresponds to Active and Standby states, as the RM can
not be made Active when it is in the INITIALIZING or STOPPING states. If I have
missed the question entirely, please let me know.
{code}
+ if (isRMActive() || haState == HAServiceProtocol.HAServiceState.STANDBY) {
// TODO: What ?!
{code}
# Previously, AdminService was altering the Configuration and Bikas had
suggested we set the RM's conf in case any one decides to make a copy of the
Configuration in AdminService. Long story short, it can be removed.
{code}
+ if (this.rmContext.isHAEnabled()) {
+ HAUtil.verifyAndSetConfiguration(conf);
+ setConf(conf); // TODO: Why is this needed? Remove.
+ }
{code}
# I am not sure why we need a lock there in the first place. It should be safe
to remove it here, as transitionToStandby itself is synchronized.
{code}
+ synchronized(rm) { // TODO: Lock which class?
+ if (rmContext.isHAEnabled()) {
try {
// Transition to standby and reinit active services
LOG.info("Transitioning RM to Standby mode");
- adminService.transitionToStandby(true);
+ rm.transitionToStandby(true);
{code}
Comments on the patch itself:
# The field haState in ResourceManager can be private. Looks like we forgot to
make it private along with other changes.
# {{RMContextImpl#setHAServiceState}} should be called everywhere haState is
set in the RM, AdminService depends on this. Looks like the test failures are
also because of this.
# Given RMContext has the haState, it might be better to remove it altogether
from ResourceManager and use the rmContext version.
# get and set HAState in RMContextImpl should be synchronized on haState before
setting or getting haState
> ResourceManager and AdminService interact in a convoluted manner after
> YARN-1318
> --------------------------------------------------------------------------------
>
> Key: YARN-1481
> URL: https://issues.apache.org/jira/browse/YARN-1481
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Vinod Kumar Vavilapalli
> Attachments: YARN-1481-20131207.txt
>
>
> This is something I found while reviewing YARN-1318, but didn't halt that
> patch as many cycles went there already. Some top level issues
> - Not easy to follow RM's service life cycle
> -- RM adds only AdminService as its service directly.
> -- Other services are added to RM when AdminService's init calls
> RM.activeServices.init()
> - Overall, AdminService shouldn't encompass all of RM's HA state management.
> It was originally supposed to be the implementation of just the RPC server.
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)