[ https://issues.apache.org/jira/browse/YARN-7102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Botong Huang updated YARN-7102:
-------------------------------
Attachment: YARN-7102.v2.patch
Some explanations, since the v2 patch is much bigger. This change revealed more
flaky tests around MockNM heartbeats to the RM. Every heartbeat triggers events
that are dispatched asynchronously in the RM, and in many cases these events
need to be drained. Furthermore, because this change enforces a stricter
responseId check, we now need to drain the RM dispatcher events after every
MockNM heartbeat. Otherwise, when two MockNM heartbeats are sent back to back,
the second one fails because the RM is still processing the event from the
first heartbeat.
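A toy model of this race may help. It is a minimal sketch, not Hadoop code: {{ToyRM}} and {{ToyNM}} are hypothetical classes, and the sketch assumes the RM validates a heartbeat synchronously but only records the new responseId when its dispatcher later processes the queued status event (modelled here as a plain queue). Duplicate-heartbeat handling is omitted for brevity.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical toy model, not YARN code: the "RM" checks the responseId right
// away, but only stores the new id when its dispatcher processes the event.
class ToyRM {
    private final Queue<Runnable> dispatcherQueue = new ArrayDeque<>();
    private int lastResponseId = 0;

    /** Returns the new responseId, or -1 if the request is out of sync. */
    int nodeHeartbeat(int requestResponseId) {
        // Strict check (assumed): the NM must echo exactly the last id issued.
        if (requestResponseId != lastResponseId) {
            return -1;
        }
        int next = (lastResponseId + 1) & Integer.MAX_VALUE;
        // The state update is deferred, as if handled by an async dispatcher.
        dispatcherQueue.add(() -> lastResponseId = next);
        return next;
    }

    /** Runs every pending event, similar in spirit to a draining dispatcher. */
    void drainEvents() {
        Runnable event;
        while ((event = dispatcherQueue.poll()) != null) {
            event.run();
        }
    }
}

// Hypothetical NM stand-in that heartbeats WITHOUT draining the RM.
class ToyNM {
    private final ToyRM rm;
    private int responseId = 0;

    ToyNM(ToyRM rm) {
        this.rm = rm;
    }

    /** Returns true if the RM accepted the heartbeat. */
    boolean heartbeat() {
        int resp = rm.nodeHeartbeat(responseId);
        if (resp < 0) {
            return false;
        }
        responseId = resp;
        return true;
    }
}
```

With these classes, a first {{heartbeat()}} succeeds, but a second one without an intervening {{drainEvents()}} is rejected: the NM now echoes id 1 while the RM still holds 0.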
Instead of going through every place where {{nm.nodeHeartbeat}} is called and
adding {{rm.drainEvent}} afterwards, I changed the MockNM API so that the
heartbeat method itself calls drain.
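The pattern of draining inside the heartbeat can be sketched as a small wrapper. This is a hypothetical illustration, not the real MockNM API: {{DrainingNodeHeartbeater}}, {{rawHeartbeat}} and {{drainRmEvents}} are made-up names, and the RM side is abstracted as two callbacks so the sketch stays self-contained.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical sketch of the "drain inside the heartbeat" pattern; names and
// constructor parameters are illustrative, not the actual MockNM signature.
class DrainingNodeHeartbeater {
    // Sends the NM's current responseId; returns the RM's new id, or -1 if rejected.
    private final IntUnaryOperator rawHeartbeat;
    // Processes every event the RM dispatcher has queued.
    private final Runnable drainRmEvents;
    private int responseId = 0;

    DrainingNodeHeartbeater(IntUnaryOperator rawHeartbeat, Runnable drainRmEvents) {
        this.rawHeartbeat = rawHeartbeat;
        this.drainRmEvents = drainRmEvents;
    }

    /** Sends one heartbeat, then drains the RM so the next call sees updated state. */
    int nodeHeartbeat() {
        int next = rawHeartbeat.applyAsInt(responseId);
        // Drain even on rejection so no pending events leak into the next call.
        drainRmEvents.run();
        if (next < 0) {
            throw new IllegalStateException("RM rejected responseId " + responseId);
        }
        responseId = next;
        return next;
    }
}
```

The design choice mirrors the comment: callers simply call {{nodeHeartbeat()}} repeatedly, and the wrapper guarantees the RM has finished processing each heartbeat before the next one is sent.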
For ease of review, the real changes are in these four files:
{{ResourceTrackerService}}, {{MockNM}}, {{MockRM}} and
{{TestResourceTrackerService}}. All other file changes are simply due to the
API change in MockNM.
> NM heartbeat stuck when responseId overflows MAX_INT
> ----------------------------------------------------
>
> Key: YARN-7102
> URL: https://issues.apache.org/jira/browse/YARN-7102
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Botong Huang
> Assignee: Botong Huang
> Priority: Critical
> Attachments: YARN-7102.v1.patch, YARN-7102.v2.patch
>
>
> ResponseId overflow problem in NM-RM heartbeat. This is the same as the AM-RM
> heartbeat issue in YARN-6640; please refer to YARN-6640 for details.
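To illustrate how an int responseId can get a heartbeat stuck, here is a minimal sketch. It assumes the RM classifies a heartbeat as stale with an ordering comparison like {{requestId + 1 < lastId}}; the names {{ResponseIdCheck}} and {{HeartbeatKind}} are hypothetical, and this is not claimed to be the code in YARN-6640 or in this patch. Because Java int addition wraps, {{Integer.MAX_VALUE + 1}} becomes {{Integer.MIN_VALUE}}, so a perfectly normal heartbeat at the boundary looks stale. A stricter check that only accepts exact matches, with ids cycling through 0..{{Integer.MAX_VALUE}}, avoids this.

```java
// Hypothetical classification of a NM heartbeat against the RM's last issued
// responseId; illustrative only, not the actual YARN implementation.
enum HeartbeatKind { NORMAL, DUPLICATE, OUT_OF_SYNC }

final class ResponseIdCheck {
    private ResponseIdCheck() {}

    /** Ordering-based check: misfires when requestId == Integer.MAX_VALUE. */
    static HeartbeatKind orderingCheck(int requestId, int lastId) {
        if (requestId + 1 == lastId) {
            return HeartbeatKind.DUPLICATE; // NM missed the RM's last response
        }
        if (requestId + 1 < lastId) {
            // At MAX_VALUE, requestId + 1 wraps to MIN_VALUE, so even a
            // normal heartbeat (requestId == lastId) ends up here.
            return HeartbeatKind.OUT_OF_SYNC;
        }
        return HeartbeatKind.NORMAL;
    }

    /** Next id cycling through 0..Integer.MAX_VALUE; never negative. */
    static int nextResponseId(int id) {
        // Clearing the sign bit maps MAX_VALUE + 1 (== MIN_VALUE) back to 0.
        return (id + 1) & Integer.MAX_VALUE;
    }

    /** Strict check: only exact matches count, so wrap-around is harmless. */
    static HeartbeatKind strictCheck(int requestId, int lastId) {
        if (requestId == lastId) {
            return HeartbeatKind.NORMAL;
        }
        if (nextResponseId(requestId) == lastId) {
            return HeartbeatKind.DUPLICATE;
        }
        return HeartbeatKind.OUT_OF_SYNC;
    }
}
```

Note that the strict check also rejects a request that is ahead of the RM (for example {{strictCheck(1, 0)}}), whereas the ordering check treats it as normal. This is consistent with the comment above that tests now need to drain RM events between heartbeats.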