[ https://issues.apache.org/jira/browse/YARN-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Botong Huang updated YARN-6640:
-------------------------------
Attachment: YARN-6640.v1.patch
> AM heartbeat stuck when responseId overflows MAX_INT
> -----------------------------------------------------
>
> Key: YARN-6640
> URL: https://issues.apache.org/jira/browse/YARN-6640
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Botong Huang
> Assignee: Botong Huang
> Priority: Minor
> Attachments: YARN-6640.v1.patch
>
>
> The current code in {{ApplicationMasterService}}:
>   if ((request.getResponseId() + 1) == lastResponse.getResponseId()) {
>     /* old heartbeat */
>     return lastResponse;
>   } else if (request.getResponseId() + 1 < lastResponse.getResponseId()) {
>     throw ...
>   }
>   process the heartbeat...
> When a heartbeat comes in, the usual case is
> request.getResponseId() == lastResponse.getResponseId(). The “if” handles a
> duplicate heartbeat that is one step old; the “else if” throws for
> heartbeats that are two or more steps old. Otherwise we accept the new
> heartbeat and process it.
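
To make the three branches concrete, here is a minimal standalone sketch of the quoted check. It is not the Hadoop source: the class name HeartbeatCheckSketch, the Outcome enum, and the classify method are hypothetical names used only for illustration, and plain ints stand in for the request and response objects.

```java
// Standalone sketch of the quoted comparison logic; not the actual
// ApplicationMasterService code. All names here are hypothetical.
final class HeartbeatCheckSketch {
  enum Outcome { PROCESS, RESEND_LAST, REJECT }

  // Mirrors the quoted "if" / "else if", including the unchecked "+ 1".
  static Outcome classify(int requestId, int lastResponseId) {
    if (requestId + 1 == lastResponseId) {
      return Outcome.RESEND_LAST;  // duplicate heartbeat, one step old
    } else if (requestId + 1 < lastResponseId) {
      return Outcome.REJECT;       // two or more steps old
    }
    return Outcome.PROCESS;        // expected case: ids match
  }

  public static void main(String[] args) {
    System.out.println(classify(5, 5));  // PROCESS
    System.out.println(classify(4, 5));  // RESEND_LAST
    System.out.println(classify(3, 5));  // REJECT
  }
}
```
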
> So the bug is: when lastResponse.getResponseId() == MAX_INT, the newest
> heartbeat comes in with responseId == MAX_INT. However, responseId + 1
> overflows to MIN_INT, so we fall into the “else if” case and the RM throws.
> Then we are stuck here…
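
The overflow, and one possible way around it, can be sketched as follows. This is an illustration under assumptions, not the content of the attached patch: nextResponseId and classify are hypothetical helpers, and the sketch assumes the RM would advance its own lastResponse id with the same wrapping helper so that ids go from MAX_INT back to 0.

```java
// Sketch of the overflow and a wrap-aware comparison.
// All names are hypothetical; this is not the attached patch.
final class ResponseIdWrapSketch {
  // Advance an id, wrapping from Integer.MAX_VALUE back to 0
  // instead of overflowing to Integer.MIN_VALUE.
  static int nextResponseId(int responseId) {
    return (responseId + 1) & Integer.MAX_VALUE;
  }

  // Wrap-aware variant of the check: compare for equality only,
  // so no "<" is evaluated on an overflowed value.
  static String classify(int requestId, int lastResponseId) {
    if (nextResponseId(requestId) == lastResponseId) {
      return "RESEND_LAST";  // duplicate heartbeat, one step old
    } else if (requestId != lastResponseId) {
      return "REJECT";       // anything else that is not the current id
    }
    return "PROCESS";        // ids match
  }

  public static void main(String[] args) {
    int max = Integer.MAX_VALUE;
    // The arithmetic behind the bug: MAX_INT + 1 wraps to MIN_INT.
    System.out.println(max + 1 == Integer.MIN_VALUE);  // true
    System.out.println(nextResponseId(max));           // 0
    System.out.println(classify(max, max));            // PROCESS
    System.out.println(classify(max, 0));              // RESEND_LAST
    System.out.println(classify(max - 1, 0));          // REJECT
  }
}
```

Note that this variant is stricter than the quoted code: a request id ahead of lastResponse is rejected here, whereas the original comparisons would let it fall through to processing.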
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)