[
https://issues.apache.org/jira/browse/YARN-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759443#comment-13759443
]
Bikas Saha commented on YARN-1098:
----------------------------------
I dont think Alejandro meant wrapping ResourceManager inside an outer class. I
think he means define that class separately instead of defining it within the
scope of the ResourceManager class. ResourceManager class is still top-level
and uses the new class as a member object.
I did not understand why setRMStateStore method needed to be changed in the
first place? It was written with the explicit purpose of making sure that the
RM's internal dispatcher is always set on the state store. Sending the
dispatcher in an argument defeats that purpose.
My guess is that we are attempting too much in this patch. How about we do the
following.
1) Maintain the ActiveServices as an inner class so that it has direct access
to all members and methods of ResourceManager. This is same as your suggestion
above in response to Alejandro's comments.
2) Move the entire code from ResourceManager.serviceInit/Start/Stop() to
ActiveServices.serviceInit()/Start/Stop(). Have
ResourceManager.serviceInit/Start/Stop() simply call
activeService.serviceInit/Start()/Stop(). This is a cosmetic code refactor
suggestion that I think will significantly reduce the size of this patch.
Everything should just work at this point and we have separated all services
into a new layer. We can close this jira.
Next in YARN-1027, we add HAServiceProtocol service to ResourceManager.
TransitionToActive will create a new instance of ActiveServices, call
ActiveService.serviceInit/Start(). TransitionToStandby will call
ActiveServices.stop() and remove the reference. Hopefully at this point
everything in ActiveService will get GC'd. Even if the service API is changed
to allow restarting a service I dont think every service we have will be
restartable without more effort. So its best that we "delete" that object and
create another one.
At this point we have failover capability added to the ResourceManager and we
can start working on client side pieces and testing.
When we improve a service (say RMAdmin service) to be AlwaysOn we will remove
it from ActiveServices and add it to ResourceManager. Eventually ActiveServices
will contain a collection of objects that needs to get deleted upon standby and
re-created on active. This can be an incremental work that can go at its own
pace. If one of them is a restartable service we can be more efficient and
restart it vs re-creating the service.
I would strongly suggest keeping these patches as small as possible so that we
can quickly review/commit and iterate while keeping the code base stable.
> Separate out RM services into "Always On" and "Active"
> ------------------------------------------------------
>
> Key: YARN-1098
> URL: https://issues.apache.org/jira/browse/YARN-1098
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: 2.1.0-beta
> Reporter: Karthik Kambatla
> Assignee: Karthik Kambatla
> Labels: ha
> Attachments: yarn-1098-1.patch, yarn-1098-2.patch,
> yarn-1098-approach.patch, yarn-1098-approach.patch
>
>
> From discussion on YARN-1027, it makes sense to separate out services that
> are stateful and stateless. The stateless services can run perennially
> irrespective of whether the RM is in Active/Standby state, while the stateful
> services need to be started on transitionToActive() and completely shutdown
> on transitionToStandby().
> The external-facing stateless services should respond to the client/AM/NM
> requests depending on whether the RM is Active/Standby.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira