[ 
https://issues.apache.org/jira/browse/YARN-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16141673#comment-16141673
 ] 

Jason Lowe commented on YARN-6640:
----------------------------------

+1 lgtm as well.  Committing this.

>  AM heartbeat stuck when responseId overflows MAX_INT
> -----------------------------------------------------
>
>                 Key: YARN-6640
>                 URL: https://issues.apache.org/jira/browse/YARN-6640
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Botong Huang
>            Assignee: Botong Huang
>            Priority: Blocker
>         Attachments: YARN-6640.v1.patch, YARN-6640.v2.patch
>
>
> The current code in {{ApplicationMasterService}}:
>   if ((request.getResponseId() + 1) == lastResponse.getResponseId()) {
>     /* old heartbeat */
>     return lastResponse;
>   } else if (request.getResponseId() + 1 < lastResponse.getResponseId()) {
>     throw ...
>   }
>   // process the heartbeat...
> When a heartbeat comes in, in the usual case we expect
> request.getResponseId() == lastResponse.getResponseId(). The "if" handles a
> duplicate heartbeat that is one step old, the "else if" throws for
> heartbeats that are two or more steps old, and otherwise we accept the new
> heartbeat and process it.
> So the bug is: when lastResponse.getResponseId() == MAX_INT, the newest
> heartbeat comes in with responseId == MAX_INT. However, responseId + 1
> overflows to MIN_INT, so we fall into the "else if" case and the RM throws.
> The AM heartbeat is then stuck.
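
The overflow described above can be reproduced in isolation. The following is a minimal sketch, not the committed patch: the class name, the method names, and the string results are all hypothetical, and the wrap-safe variant is only one possible approach (it keeps ids in the range 0..Integer.MAX_VALUE by masking off the sign bit, and it rejects anything that is neither the current id nor its immediate predecessor).

```java
// Hypothetical stand-alone sketch; names are illustrative and are not
// taken from ApplicationMasterService or from the attached patches.
final class ResponseIdDemo {

    /** The check as described in the issue: "+ 1" overflows at Integer.MAX_VALUE. */
    static String classifyNaive(int requestId, int lastResponseId) {
        if (requestId + 1 == lastResponseId) {
            return "DUPLICATE";            // heartbeat that is one step old
        } else if (requestId + 1 < lastResponseId) {
            return "REJECT";               // heartbeat that is too old: the RM throws
        }
        return "PROCESS";                  // accept and process the heartbeat
    }

    /** Assumed id scheme: wrap from Integer.MAX_VALUE back to 0, never negative. */
    static int nextResponseId(int responseId) {
        return (responseId + 1) & Integer.MAX_VALUE;
    }

    /** Wrap-safe variant: compare against the wrapped successor instead of "+ 1". */
    static String classifyWrapSafe(int requestId, int lastResponseId) {
        if (requestId == lastResponseId) {
            return "PROCESS";
        }
        if (nextResponseId(requestId) == lastResponseId) {
            return "DUPLICATE";
        }
        return "REJECT";
    }

    public static void main(String[] args) {
        int max = Integer.MAX_VALUE;
        // Up-to-date heartbeat at the boundary: the naive check rejects it.
        System.out.println(classifyNaive(max, max));     // REJECT
        System.out.println(classifyWrapSafe(max, max));  // PROCESS
        // After wrapping to 0, a resend of the MAX_VALUE heartbeat is a duplicate.
        System.out.println(classifyWrapSafe(max, nextResponseId(max)));  // DUPLICATE
    }
}
```

Away from the boundary both variants agree on up-to-date, one-step-old, and older heartbeats; they differ only at Integer.MAX_VALUE, where the naive comparison sees Integer.MIN_VALUE.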



