[
https://issues.apache.org/jira/browse/YARN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199475#comment-14199475
]
Karthik Kambatla commented on YARN-2579:
----------------------------------------
Glad to see we are getting rid of the RMFatalEventDispatcher. I am assuming we
want to keep the changes to a minimum in this patch, and do a follow-up JIRA to
clean this up better. I would love to work on the follow-up; I noticed a few
discrepancies while working on YARN-2010 that continue to exist with this
patch.
Functionally, the patch looks good to me. In the interest of unblocking 2.6, I
am +1 to committing it as well, and have filed YARN-2814 to track the
follow-up work.
If we do commit the existing patch, I see the following follow-up items to
simplify the surrounding code and improve readability:
# Get rid of RMFatalEventDispatcher and RMFatalEvent* altogether.
# Given all other events are specific to RMActiveServices, we should move the
dispatcher also into RMActiveServices.
# I am not a fan of having a pointer to the RM in the store as well,
particularly since we have RMContext primarily to hold the information other
classes need. I am concerned about more classes needing this information in the
future.
# Add a shutdownOrTransitionToStandby method in the RM to transparently handle
non-HA and HA cases.
# Unrelated to this patch: we should make the existing
{{transitionToStandby(boolean)}} private, and add a package-private
{{transitionToStandby()}} to be called from AdminService and
EmbeddedElectorService.
# Instead of calling ExitUtil#terminate at multiple places in the RM, we should
have a {{protected shutdown()}} method that does this and can be overridden in
MockRM for better testing.
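
The items on shutdownOrTransitionToStandby, the {{transitionToStandby}}
overloads, and the overridable shutdown method could be sketched roughly as
follows. This is only an illustration under assumptions, not the actual
ResourceManager code: the class names SketchRM and SketchMockRM, the haEnabled
flag, and the String reason parameter are hypothetical, and System.exit stands
in for the ExitUtil#terminate call.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the RM; not the real ResourceManager class. */
class SketchRM {
    private final boolean haEnabled; // assumed flag; the real RM reads this from config
    private boolean active = true;

    SketchRM(boolean haEnabled) {
        this.haEnabled = haEnabled;
    }

    /** Single entry point for fatal errors: standby when HA is on, shutdown otherwise. */
    void shutdownOrTransitionToStandby(String reason) {
        if (haEnabled) {
            transitionToStandby();
        } else {
            shutdown(reason);
        }
    }

    /** Package-private variant for callers such as AdminService and EmbeddedElectorService. */
    void transitionToStandby() {
        transitionToStandby(true);
    }

    /** Private variant; the flag mirrors the existing boolean parameter. */
    private void transitionToStandby(boolean initialize) {
        // A real RM would stop its active services here and, if asked, reinitialize them.
        active = false;
    }

    /** The one place that terminates the process, so tests can override it. */
    protected void shutdown(String reason) {
        System.exit(1); // placeholder for ExitUtil#terminate
    }

    boolean isActive() {
        return active;
    }
}

/** Hypothetical test double: records shutdown requests instead of exiting the JVM. */
class SketchMockRM extends SketchRM {
    final List<String> shutdownReasons = new ArrayList<>();

    SketchMockRM(boolean haEnabled) {
        super(haEnabled);
    }

    @Override
    protected void shutdown(String reason) {
        shutdownReasons.add(reason);
    }
}
```

With this shape, a test can create a SketchMockRM, trigger a fatal condition,
and check whether a shutdown was requested or the RM moved to standby, without
the test JVM exiting.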
> Both RM's state is Active , but 1 RM is not really active.
> ----------------------------------------------------------
>
> Key: YARN-2579
> URL: https://issues.apache.org/jira/browse/YARN-2579
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.5.1
> Reporter: Rohith
> Assignee: Rohith
> Priority: Blocker
> Attachments: YARN-2579-20141105.1.patch, YARN-2579-20141105.2.patch,
> YARN-2579-20141105.3.patch, YARN-2579-20141105.patch, YARN-2579.patch,
> YARN-2579.patch
>
>
> I encountered a situation where both RMs' web pages were accessible and
> their state was displayed as Active, but one RM's ActiveServices were
> stopped.