[jira] [Commented] (YARN-1027) Implement RMHAServiceProtocol

Bikas Saha (JIRA) Sun, 08 Sep 2013 13:47:59 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761524#comment-13761524
 ]


Bikas Saha commented on YARN-1027:
----------------------------------

This code is confusing. The state shouldnt be initializing if the service is 
stopped. Can we open a jira to add a meaningful state to HAServiceState and 
refer to that jira in this code so that we can fix it in that jira too. 
{code}
+  public void serviceStop() throws Exception {
+    // Stop all services
+    transitionToStandby();
+
+    // Update haState as RM can no longer be active
+    haState = HAServiceState.INITIALIZING;
+    super.serviceStop();
{code}

Lets not leave orphan TODOs in the code. Please refer to YARN-1068 or open a 
new jira.
{code}
+    // TODO: When automatic failover is enabled, check if transition should be
+    // allowed for this request
{code}

In transtionToStandby() we are changing state to STANDBY after stopping all 
services. This is fine for now. We must keep this in mind later on when we 
start having ha-aware alwaysOn services. They need to stop signalling the 
ActiveServices before we stop them. Eg. RPC services would need to start 
rejecting requests before we stop the activeServices.

createAndStartActiveServices() and related methods should be package visibility 
and not protected. Protected would mean that we intend a derived class to see 
these methods too.

Is the commented code going to be uncommented or removed? The code is valid and 
should work in an active state. So it should probably be uncommented. What 
happens if we call this method when the RM is in standby mode? I am wondering 
if we may be able to call this during that time and verify that the RM is 
indeed not active.
{code}
+  private void checkActiveRMisFunctional() {
+    try {
+      rm.getNewAppId();
+//      rm.registerNode("node1", 2048);
+      rm.submitApp(1024);
{code}

Locking on the RMHAServiceProtocol is confusing. Some public methods are 
synchronized while others are not. Will these lead to race conditions in the 
future. How about we make them all public synchronized since they are not 
expected to be high performance and so heavy locking is fine. Caveat to this 
would be if ZKFC expects getServiceState()/monitorHealth() to work even while 
the service is transitioning to active/standby. Again, that probably doesnt 
matter if these operations happen in a reasonably short time.

Depending on your conclusion in YARN-1077 we can keep the approach in 1027-3 or 
1027-4 wrt RMHAServiceProtocol being always present or not.

Patch is close to being ready for commit! The main thing to verify (even if 
manually) is that the ActiveService objects are being GC'd correctly or not.
                
> Implement RMHAServiceProtocol
> -----------------------------
>
>                 Key: YARN-1027
>                 URL: https://issues.apache.org/jira/browse/YARN-1027
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Karthik Kambatla
>         Attachments: test-yarn-1027.patch, yarn-1027-1.patch, 
> yarn-1027-2.patch, yarn-1027-3.patch, yarn-1027-4.patch, 
> yarn-1027-including-yarn-1098-3.patch, yarn-1027-in-rm-poc.patch
>
>
> Implement existing HAServiceProtocol from Hadoop common. This protocol is the 
> single point of interaction between the RM and HA clients/services.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (YARN-1027) Implement RMHAServiceProtocol

Reply via email to