[jira] [Commented] (YARN-1098) Separate out RM services into "Always On" and "Active"

Bikas Saha (JIRA) Thu, 05 Sep 2013 15:41:36 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759443#comment-13759443
 ]


Bikas Saha commented on YARN-1098:
----------------------------------

I dont think Alejandro meant wrapping ResourceManager inside an outer class. I 
think he means define that class separately instead of defining it within the 
scope of the ResourceManager class. ResourceManager class is still top-level 
and uses the new class as a member object.

I did not understand why setRMStateStore method needed to be changed in the 
first place? It was written with the explicit purpose of making sure that the 
RM's internal dispatcher is always set on the state store. Sending the 
dispatcher in an argument defeats that purpose.

My guess is that we are attempting too much in this patch. How about we do the 
following.

1) Maintain the ActiveServices as an inner class so that it has direct access 
to all members and methods of ResourceManager. This is same as your suggestion 
above in response to Alejandro's comments.
2) Move the entire code from ResourceManager.serviceInit/Start/Stop() to 
ActiveServices.serviceInit()/Start/Stop(). Have 
ResourceManager.serviceInit/Start/Stop() simply call 
activeService.serviceInit/Start()/Stop(). This is a cosmetic code refactor 
suggestion that I think will significantly reduce the size of this patch.

Everything should just work at this point and we have separated all services 
into a new layer. We can close this jira.

Next in YARN-1027, we add HAServiceProtocol service to ResourceManager. 
TransitionToActive will create a new instance of ActiveServices, call 
ActiveService.serviceInit/Start(). TransitionToStandby will call 
ActiveServices.stop() and remove the reference. Hopefully at this point 
everything in ActiveService will get GC'd. Even if the service API is changed 
to allow restarting a service I dont think every service we have will be 
restartable without more effort. So its best that we "delete" that object and 
create another one.

At this point we have failover capability added to the ResourceManager and we 
can start working on client side pieces and testing.

When we improve a service (say RMAdmin service) to be AlwaysOn we will remove 
it from ActiveServices and add it to ResourceManager. Eventually ActiveServices 
will contain a collection of objects that needs to get deleted upon standby and 
re-created on active. This can be an incremental work that can go at its own 
pace. If one of them is a restartable service we can be more efficient and 
restart it vs re-creating the service.

I would strongly suggest keeping these patches as small as possible so that we 
can quickly review/commit and iterate while keeping the code base stable.
                
> Separate out RM services into "Always On" and "Active"
> ------------------------------------------------------
>
>                 Key: YARN-1098
>                 URL: https://issues.apache.org/jira/browse/YARN-1098
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>              Labels: ha
>         Attachments: yarn-1098-1.patch, yarn-1098-2.patch, 
> yarn-1098-approach.patch, yarn-1098-approach.patch
>
>
> From discussion on YARN-1027, it makes sense to separate out services that 
> are stateful and stateless. The stateless services can  run perennially 
> irrespective of whether the RM is in Active/Standby state, while the stateful 
> services need to  be started on transitionToActive() and completely shutdown 
> on transitionToStandby().
> The external-facing stateless services should respond to the client/AM/NM 
> requests depending on whether the RM is Active/Standby.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (YARN-1098) Separate out RM services into "Always On" and "Active"

Reply via email to