[jira] [Commented] (YARN-1027) Implement RMHAServiceProtocol

Bikas Saha (JIRA) Thu, 05 Sep 2013 23:10:59 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759973#comment-13759973
 ]


Bikas Saha commented on YARN-1027:
----------------------------------

The patch look clean overall. 

I would suggest keeping haEnabled concept within the HAServiceProtocol service 
instead of mixing it between the ResourceManager and HAServiceProtocol. Thus 
the RM always addService(HAServiceProtocol). HAServiceProtocol is the one that 
checks if haEnabled in serviceStart(). If enabled then it transitions to 
standby and waits for active signal. If not, then it directly transitions to 
active.

Shouldn't we simply call transitionToStandby() here? That would ensure 
getServiceStatus() returns non active status for anyone that cares to know.
{code}
+  public void serviceStop() throws Exception {
+    if (rm.haState == HAServiceState.ACTIVE) {
+      rm.stopActiveServices();
{code}

This is fine for now but we might have to invest in better health check in a 
different jira. Any ideas?
{code}
public synchronized void monitorHealth() throws HealthCheckFailedException {
+    if (rm.haState == HAServiceState.ACTIVE && !rm.areActiveServicesRunning()) 
{
{code}

We probably want the log before the if stmt.
Should we change state to standby before we stop services? Assuming that HA 
aware services would need to know about this earlier rather than later so that 
they can stop signaling Active services and allow them to be drained/stopped.
{code}
+    if (rm.haState == HAServiceState.ACTIVE) {
+      rm.stopActiveServices();
+    }
+
+    LOG.info("Transitioning to standby");
+    rm.haState = HAServiceState.STANDBY;
{code}

Didnt quite get this comment. Is this do with change being requested by 
user/admin/ZKFC?
{code}
+  public void transitionToActive(StateChangeRequestInfo reqInfo) {
+    // TODO: When automatic failover is enabled, check if transition should
+    // be allowed for this request
{code}

What are the pros of making haState a member of ResourceManager instead of 
HAServiceProtocol? A pro of the latter is that it keeps all HA stuff in one 
place.

Why is there a lock used in ResourceManager.startActive() etc. Why are these 
methods protected. If testing, then lets add an @visiblefortesting annotation.

Is there a way to confirm that the active service objects are all being GC'd?

testStartAndTransitions() - How about calling getServiceStatus() and 
monitorHealth() in addition to checking the internal members, in all places 
where internal members are being checked. So we can test and exercise those 
methods too. How about completing 
Active->Standby->Active->Standby->Active->RM.serviceStop(). This would fully 
simulate multiple full cycles of transitions and also verify the shutdown case.
We can also issue some requests like createApplication() to the RM, when in 
active state, and verify that the RM is really working.

TestRMHADisabled. It confusing to read that the RM has started but its 
haState==INITIALIZING. Also, we can probably move this test in TestRMHA.java to 
keep related tests in one place. 


Minor nits

LOG instead of print?
{code}
+    } catch (Exception e) {
+      e.printStackTrace();
{code}

RM_HA_PREFIX instead of HA_PREFIX
                
> Implement RMHAServiceProtocol
> -----------------------------
>
>                 Key: YARN-1027
>                 URL: https://issues.apache.org/jira/browse/YARN-1027
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Karthik Kambatla
>         Attachments: test-yarn-1027.patch, yarn-1027-1.patch, 
> yarn-1027-2.patch, yarn-1027-3.patch, yarn-1027-including-yarn-1098-3.patch, 
> yarn-1027-in-rm-poc.patch
>
>
> Implement existing HAServiceProtocol from Hadoop common. This protocol is the 
> single point of interaction between the RM and HA clients/services.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (YARN-1027) Implement RMHAServiceProtocol

Reply via email to