[ https://issues.apache.org/jira/browse/YARN-7102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16176461#comment-16176461 ]
Jason Lowe commented on YARN-7102:
----------------------------------

Forgot to mention that the above scenario is probably happening a lot more in practice than one might initially think. The NM heartbeat interval is on the order of seconds, but the NM performs an out-of-band heartbeat when a container completes. Therefore it is very likely that at some point a nodemanager heartbeats just as a container completes and ends up heartbeating back-to-back, greatly increasing the likelihood of this race occurring.

> NM heartbeat stuck when responseId overflows MAX_INT
> ----------------------------------------------------
>
>                 Key: YARN-7102
>                 URL: https://issues.apache.org/jira/browse/YARN-7102
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Botong Huang
>            Assignee: Botong Huang
>            Priority: Critical
>         Attachments: YARN-7102.v1.patch, YARN-7102.v2.patch,
> YARN-7102.v3.patch, YARN-7102.v4.patch, YARN-7102.v5.patch, YARN-7102.v6.patch
>
>
> ResponseId overflow problem in NM-RM heartbeat. This is same as AM-RM
> heartbeat in YARN-6640, please refer to YARN-6640 for details.
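
For readers unfamiliar with the overflow named in the issue title, here is a minimal Java sketch of why a plain increment of a 32-bit response ID misbehaves at MAX_INT, and of one way to wrap it instead. The class, method, and enum names are hypothetical and are not taken from the attached patches. The classification logic is an assumption about how a server might compare the ID a node reports against the ID it last sent.

```java
// Illustrative sketch only: the names below are hypothetical, not YARN's actual code.
final class ResponseIdSketch {

    enum Outcome { ACCEPT, DUPLICATE, OUT_OF_SYNC }

    /** Naive increment: Integer.MAX_VALUE + 1 silently overflows to Integer.MIN_VALUE. */
    static int naiveNext(int responseId) {
        return responseId + 1;
    }

    /** Wrapping increment: clearing the sign bit makes Integer.MAX_VALUE roll over to 0. */
    static int wrappingNext(int responseId) {
        return (responseId + 1) & Integer.MAX_VALUE;
    }

    /**
     * Assumed server-side check: the node reports the last response ID it received,
     * and the server compares it with the ID of the last response it sent.
     */
    static Outcome classify(int idFromNode, int lastIdSentByServer) {
        if (idFromNode == lastIdSentByServer) {
            return Outcome.ACCEPT;        // node saw the latest response
        }
        if (wrappingNext(idFromNode) == lastIdSentByServer) {
            return Outcome.DUPLICATE;     // node is exactly one response behind
        }
        return Outcome.OUT_OF_SYNC;       // anything else would need a resync
    }

    public static void main(String[] args) {
        System.out.println(naiveNext(Integer.MAX_VALUE));     // negative after overflow
        System.out.println(wrappingNext(Integer.MAX_VALUE));  // wraps to 0
        System.out.println(classify(Integer.MAX_VALUE, 0));   // one behind across the wrap
    }
}
```

With the wrapping increment, a node that is one response behind is still recognized as such across the MAX_INT boundary. With the naive increment the next ID becomes negative, and a comparison that assumes non-negative, increasing IDs can no longer match.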