Karthik Kambatla commented on YARN-2579:

Glad to see we are getting rid of the RMFatalEventDispatcher. I am assuming we 
want to keep the changes to a minimum in this patch, and do a follow-up JIRA to 
clean this up better. I would love to work on the follow-up; I noticed a few 
discrepancies while working on YARN-2010 that continue to exist with this 
patch as well. 

Functionally, the patch looks good to me. In the interest of unblocking 2.6, I 
am +1 to committing it as well, but would like to point out some follow-up work 
that I see. Filed YARN-2814 to work on these items.

I see the following follow-up items to simplify the surrounding code and 
improve readability, if we do commit the existing patch.
# Get rid of RMFatalEventDispatcher and RMFatalEvent* altogether.
# Given all other events are specific to RMActiveServices, we should move the 
dispatcher also into RMActiveServices.
# I am not a fan of having a pointer to the RM in the store as well, 
particularly since we have RMContext primarily to hold the information other 
classes need. I am concerned about more classes needing this information in the 
# Add a shutdownOrTransitionToStandby method in the RM to transparently handle 
non-HA and HA cases.
# Unrelated to this patch: we should make the existing 
{{transitionToStandby(boolean)}} private, and add a package-private 
{{transitionToStandby()}} to be called from AdminService and 
# Instead of calling ExitUtil#terminate at multiple places in the RM, we should 
have a {{protected shutdown()}} method that does this and can be overridden in 
MockRM for better testing. 
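
To make the last few items concrete, here is a rough sketch of the shape I have in mind. It is not taken from the patch: {{SketchResourceManager}} and {{SketchMockRM}} are hypothetical stand-ins for the RM and MockRM, the {{haEnabled}} flag and the exit status of 1 are assumptions, and {{System.exit}} is used only to keep the example free of Hadoop dependencies (the real method would call ExitUtil#terminate).

```java
// Hypothetical stand-in for the ResourceManager; NOT the real Hadoop class.
class SketchResourceManager {
    private final boolean haEnabled; // assumed to come from configuration
    private boolean active = true;

    SketchResourceManager(boolean haEnabled) {
        this.haEnabled = haEnabled;
    }

    // Single entry point for fatal errors: fall back to standby when HA is
    // enabled, otherwise shut the process down.
    void shutdownOrTransitionToStandby(String reason) {
        if (haEnabled) {
            transitionToStandby();
        } else {
            shutdown(1, reason);
        }
    }

    // Package-private, no-argument variant (assumed shape).
    void transitionToStandby() {
        active = false;
    }

    // The only place that ends the process. The real RM would call
    // ExitUtil#terminate here; System.exit keeps the sketch self-contained.
    protected void shutdown(int status, String reason) {
        System.err.println("Shutting down: " + reason);
        System.exit(status);
    }

    boolean isActive() {
        return active;
    }
}

// Hypothetical stand-in for MockRM: records the shutdown request instead of
// killing the JVM, so tests can assert on it.
class SketchMockRM extends SketchResourceManager {
    boolean shutdownCalled = false;
    int exitStatus = 0;

    SketchMockRM(boolean haEnabled) {
        super(haEnabled);
    }

    @Override
    protected void shutdown(int status, String reason) {
        shutdownCalled = true;
        exitStatus = status;
    }
}
```

With this shape, a test can construct the mock in either mode, call {{shutdownOrTransitionToStandby}}, and check whether a shutdown was requested or the RM merely left the active state, without the test JVM ever exiting.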

> Both RM's state is Active , but 1 RM is not really active.
> ----------------------------------------------------------
>                 Key: YARN-2579
>                 URL: https://issues.apache.org/jira/browse/YARN-2579
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.5.1
>            Reporter: Rohith
>            Assignee: Rohith
>            Priority: Blocker
>         Attachments: YARN-2579-20141105.1.patch, YARN-2579-20141105.2.patch, 
> YARN-2579-20141105.3.patch, YARN-2579-20141105.patch, YARN-2579.patch, 
> YARN-2579.patch
> I encountered a situation where both RMs' web pages were accessible and 
> each displayed its state as Active, but one of the RMs' ActiveServices was 
> stopped.
