[ 
https://issues.apache.org/jira/browse/YARN-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16141700#comment-16141700
 ] 

Hudson commented on YARN-6640:
------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12240 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/12240/])
YARN-6640. AM heartbeat stuck when responseId overflows MAX_INT. (jlowe: rev 
3a4e861169dc3da9df0158ba6f44a9bc8576e217)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterService.java


>  AM heartbeat stuck when responseId overflows MAX_INT
> -----------------------------------------------------
>
>                 Key: YARN-6640
>                 URL: https://issues.apache.org/jira/browse/YARN-6640
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Botong Huang
>            Assignee: Botong Huang
>            Priority: Blocker
>         Attachments: YARN-6640.v1.patch, YARN-6640.v2.patch
>
>
> The current code in {{ApplicationMasterService}}: 
> if ((request.getResponseId() + 1) == lastResponse.getResponseId()) {
>   /* old heartbeat */
>   return lastResponse;
> } else if (request.getResponseId() + 1 < lastResponse.getResponseId()) {
>   throw ...
> }
> process the heartbeat...
> When a heartbeat comes in, in the usual case we expect 
> request.getResponseId() == lastResponse.getResponseId(). The “if” handles a 
> duplicate heartbeat that is one step old, the “else if” throws for 
> heartbeats that are two or more steps old, and otherwise we accept the new 
> heartbeat and process it.
> So the bug is: when lastResponse.getResponseId() == MAX_INT, the newest 
> heartbeat comes in with responseId == MAX_INT. However, responseId + 1 
> overflows to MIN_INT, so we fall into the “else if” case and the RM throws. 
> The AM heartbeat is then stuck, because a retry with the same responseId 
> hits the same check.
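
The overflow described above can be reproduced in isolation. The sketch
below is only an illustration, not the committed patch: the class name
ResponseIdOverflowDemo and the helpers nextResponseId, classifyOriginal and
classifyWrapSafe are hypothetical, and the wrap-safe variant is one possible
approach that assumes response ids are meant to cycle through
0..Integer.MAX_VALUE.

```java
final class ResponseIdOverflowDemo {

    // Hypothetical helper: successor of an id, wrapping from
    // Integer.MAX_VALUE back to 0 instead of overflowing to a negative value.
    static int nextResponseId(int responseId) {
        return (responseId + 1) & Integer.MAX_VALUE;
    }

    // Mirrors the check quoted in the issue description.
    static String classifyOriginal(int requestId, int lastResponseId) {
        if (requestId + 1 == lastResponseId) {
            return "duplicate";          // one step old: resend last response
        } else if (requestId + 1 < lastResponseId) {
            return "throw";              // treated as too old
        }
        return "process";
    }

    // Assumed alternative: compare for equality only, so integer wrap-around
    // cannot turn the newest heartbeat into an "old" one.
    static String classifyWrapSafe(int requestId, int lastResponseId) {
        if (nextResponseId(requestId) == lastResponseId) {
            return "duplicate";
        } else if (requestId != lastResponseId) {
            return "throw";
        }
        return "process";
    }

    public static void main(String[] args) {
        int max = Integer.MAX_VALUE;
        // max + 1 overflows to Integer.MIN_VALUE, which is < max.
        System.out.println("original:  " + classifyOriginal(max, max));
        System.out.println("wrap-safe: " + classifyWrapSafe(max, max));
        System.out.println("next id after max: " + nextResponseId(max));
    }
}
```

With both ids at Integer.MAX_VALUE, the original check classifies the newest
heartbeat as "throw", while the wrap-safe variant returns "process" and the
next id becomes 0. Note that the wrap-safe variant is stricter than the
quoted code: it also rejects a request id that is ahead of the last response,
which the original check would have processed.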



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
