[
https://issues.apache.org/jira/browse/YARN-7102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16176461#comment-16176461
]
Jason Lowe commented on YARN-7102:
----------------------------------
Forgot to mention that the above scenario is probably happening a lot more in
practice than one might initially think. The NM heartbeat interval is on the
order of seconds, but the NM performs an out-of-band heartbeat when a container
completes. Therefore it is very likely that at some point a nodemanager
heartbeats just as a container completes and ends up heartbeating back-to-back,
greatly increasing the likelihood of this race occurring.
> NM heartbeat stuck when responseId overflows MAX_INT
> ----------------------------------------------------
>
> Key: YARN-7102
> URL: https://issues.apache.org/jira/browse/YARN-7102
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Botong Huang
> Assignee: Botong Huang
> Priority: Critical
> Attachments: YARN-7102.v1.patch, YARN-7102.v2.patch,
> YARN-7102.v3.patch, YARN-7102.v4.patch, YARN-7102.v5.patch, YARN-7102.v6.patch
>
>
> ResponseId overflow problem in NM-RM heartbeat. This is same as AM-RM
> heartbeat in YARN-6640, please refer to YARN-6640 for details.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]