[ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556037#comment-14556037
 ] 

Junping Du commented on YARN-41:
--------------------------------

Thanks [~devaraj.k] for updating the patch! Just finish my review.
Two major comments:
{code}
+    // the isStopped check is for avoiding multiple unregistrations.
+    if (this.registeredWithRM && !this.isStopped && !isNMUnderSupervision()) {
+      unRegisterNM();
+    }
{code}
According to discussion above, I think we need to check 
"yarn.nodemanager.recovery.enabled" as well. Because even NM is under 
supervision, if we disable nm recovery, we should continue to unregister NM to 
RM.

{code}
      .addTransition(NodeState.RUNNING, NodeState.DECOMMISSIONED,
-         RMNodeEventType.DECOMMISSION,
+         EnumSet.of(RMNodeEventType.DECOMMISSION, RMNodeEventType.SHUTDOWN),
{code}
So, the node after shutdown will become DECOMMISSIONED as final state? I think 
we don't expect these nodes show in DECOMMISSIONED list. Isn't it? May be we 
should have some new NodeState as SHUTDOWN for this case. This could make 
changes incompatible, at least for behaviors and UI. We may need to mark this 
JIRA as incompatible and document these changes somewhere when patch is done.

Some minor comments:
Add tests for new PB objects UnRegisterNodeManagerRequestPBImpl, 
UnRegisterNodeManagerResponsePBImpl into TestYarnServerApiClasses.java.

{code}
catch (YarnException e) {
+      throw new ServiceException(e);
+    } catch (IOException e) {
+      throw new ServiceException(e);
+    }
{code}
Better to replace with
{code}
catch (YarnException | IOException e) {
    throw new ServiceException(e);
}
{code} 

{code}
+  @Test
+  public void testUnRegisterNodeManager() throws Exception {
+    UnRegisterNodeManagerRequest request = recordFactory
+        .newRecordInstance(UnRegisterNodeManagerRequest.class);
+    assertNotNull(client.unRegisterNodeManager(request));
+
+    ResourceTrackerTestImpl.exception = true;
+    try {
+      client.unRegisterNodeManager(request);
+      fail("there  should be YarnException");
+    } catch (YarnException e) {
+      assertTrue(e.getMessage().startsWith("testMessage"));
+    } finally {
+      ResourceTrackerTestImpl.exception = false;
+    }
+  }
{code}
If other exception get thrown here with wrong message, the test would still 
pass. Isn't it? better to catch all exceptions and check if it is 
YarnException. 

{code}
+  private void unRegisterNM() {
+    RecordFactory recordFactory = RecordFactoryPBImpl.get();
+    UnRegisterNodeManagerRequest request = recordFactory
+        .newRecordInstance(UnRegisterNodeManagerRequest.class);
+    request.setNodeId(this.nodeId);
+    try {
+      resourceTracker.unRegisterNodeManager(request);
+      LOG.info("Successfully Unregistered the Node with ResourceManager");
+    } catch (Exception e) {
+      LOG.warn("Unregistration of Node failed.", e);
+    }
+  }
{code}
Put nodeId in the log could help in trouble shooting, also add the missing 
period in log.

> The RM should handle the graceful shutdown of the NM.
> -----------------------------------------------------
>
>                 Key: YARN-41
>                 URL: https://issues.apache.org/jira/browse/YARN-41
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager, resourcemanager
>            Reporter: Ravi Teja Ch N V
>            Assignee: Devaraj K
>         Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, 
> MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, 
> YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41.patch
>
>
> Instead of waiting for the NM expiry, RM should remove and handle the NM, 
> which is shutdown gracefully.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to