[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13622783#comment-13622783 ]

Bikas Saha commented on YARN-479:
---------------------------------

I don't see the value of waitForever if we can specify a large value for the
retry interval (1 day or so).

Not sure what retryCounts is buying us.
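A minimal sketch of that suggestion, assuming the connect loop is bounded by a total wait budget plus a fixed pause between attempts (the 1-day value is read here as that budget; ConnectRetryPolicy and its fields are hypothetical names, not taken from the patch). A very large budget already behaves like waitForever in practice, and the attempt count can be derived from the two values, so neither a waitForever flag nor a separate retryCounts setting would be needed.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical names: these are not the fields or config keys used in the patch.
final class ConnectRetryPolicy {
    private final long maxWaitMs;       // total time the NM keeps trying to reach the RM
    private final long retryIntervalMs; // pause between two connection attempts

    ConnectRetryPolicy(long maxWaitMs, long retryIntervalMs) {
        if (maxWaitMs < 0 || retryIntervalMs <= 0) {
            throw new IllegalArgumentException("need maxWaitMs >= 0 and retryIntervalMs > 0");
        }
        this.maxWaitMs = maxWaitMs;
        this.retryIntervalMs = retryIntervalMs;
    }

    // One initial attempt plus one retry per full interval that fits in the budget,
    // so the attempt count is derived rather than configured separately.
    long maxAttempts() {
        return maxWaitMs / retryIntervalMs + 1;
    }

    public static void main(String[] args) {
        // "Wait forever" in practice: a one-day budget with a 30 s interval.
        ConnectRetryPolicy day =
                new ConnectRetryPolicy(TimeUnit.DAYS.toMillis(1), TimeUnit.SECONDS.toMillis(30));
        System.out.println(day.maxAttempts()); // prints 2881
    }
}
```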

What's the intention of catching and rethrowing the exception without doing
anything else?
{code}
+          } catch (YarnException e) {
+            //catch and throw the exception if tried MAX wait time to connect RM
+            throw e;
{code}

There is a finally block that makes the code sleep for longer than necessary
before exiting. This matters because admins might kill the NM after waiting
only a few seconds for it to exit. In that window the NM has to do a bunch of
cleanup tasks, and this extra sleep does not help.
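A minimal sketch of a retry loop that avoids both points, assuming a generic Callable stands in for the call to the RM (RetryLoop, Sleeper and callWithRetries are hypothetical names, not from the patch). The catch block exists only to decide whether another attempt follows, and the pause happens only between attempts, so once the budget is spent the last exception propagates immediately rather than after one more sleep.

```java
import java.util.concurrent.Callable;

// Hypothetical hook so the pause between attempts can be swapped out (e.g. in tests).
@FunctionalInterface
interface Sleeper {
    void sleep(long millis) throws InterruptedException;
}

final class RetryLoop {
    static <T> T callWithRetries(Callable<T> attempt, int maxAttempts, long retryIntervalMs)
            throws Exception {
        return callWithRetries(attempt, maxAttempts, retryIntervalMs, Thread::sleep);
    }

    static <T> T callWithRetries(Callable<T> attempt, int maxAttempts, long retryIntervalMs,
                                 Sleeper sleeper) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attemptNo = 1; ; attemptNo++) {
            try {
                return attempt.call();
            } catch (Exception e) {
                if (attemptNo >= maxAttempts) {
                    // Budget used up: propagate right away, with no sleep on the way out.
                    throw e;
                }
            }
            // Reached only when another attempt follows, unlike a sleep in a finally block.
            sleeper.sleep(retryIntervalMs);
        }
    }
}
```

The Sleeper parameter is there only so the pauses can be observed in a test; the three-argument overload simply uses Thread::sleep.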

Unrelated to this change, but does the NM really shut down when the heartbeat
fails right now? It looks like the thread just keeps running. After this
change, it looks like the heartbeat thread will just exit. That does not mean
the NM will shut down, does it?
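On the Java side, a thread that returns from run() ends only itself; any other non-daemon threads keep the JVM alive, so the process stays up unless something explicitly asks it to stop. A minimal sketch of the heartbeat loop signalling its owner instead of silently exiting, assuming a hypothetical requestShutdown callback (HeartbeatLoop and its fields are illustrative names, not NodeManager APIs):

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch, not NodeManager code: all names are illustrative.
final class HeartbeatLoop implements Runnable {
    private final BooleanSupplier sendHeartbeat; // returns false when the RM cannot be reached
    private final int maxConsecutiveFailures;
    private final long intervalMs;
    private final Runnable requestShutdown;      // e.g. asks the owning service to stop

    HeartbeatLoop(BooleanSupplier sendHeartbeat, int maxConsecutiveFailures,
                  long intervalMs, Runnable requestShutdown) {
        this.sendHeartbeat = sendHeartbeat;
        this.maxConsecutiveFailures = maxConsecutiveFailures;
        this.intervalMs = intervalMs;
        this.requestShutdown = requestShutdown;
    }

    @Override
    public void run() {
        int failures = 0;
        try {
            while (!Thread.currentThread().isInterrupted()) {
                if (sendHeartbeat.getAsBoolean()) {
                    failures = 0;
                } else if (++failures >= maxConsecutiveFailures) {
                    // Returning alone would end only this thread; tell the owner to shut down.
                    requestShutdown.run();
                    return;
                }
                Thread.sleep(intervalMs);
            }
        } catch (InterruptedException e) {
            // Normal stop path: preserve the interrupt flag for the caller.
            Thread.currentThread().interrupt();
        }
    }
}
```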
                
> NM retry behavior for connection to RM should be similar for lost heartbeats
> ----------------------------------------------------------------------------
>
>                 Key: YARN-479
>                 URL: https://issues.apache.org/jira/browse/YARN-479
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Hitesh Shah
>            Assignee: Jian He
>         Attachments: YARN-479.1.patch, YARN-479.2.patch, YARN-479.3.patch, 
> YARN-479.4.patch, YARN-479.5.patch
>
>
> Regardless of connection loss at the start or at an intermediate point, NM's 
> retry behavior to the RM should follow the same flow. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
