Brook Zhou created YARN-7098:
--------------------------------
Summary: LocalizerRunner should immediately send heartbeat
response LocalizerStatus.DIE when the Container transitions from LOCALIZING to
KILLING
Key: YARN-7098
URL: https://issues.apache.org/jira/browse/YARN-7098
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Reporter: Brook Zhou
Assignee: Brook Zhou
Priority: Minor
Currently, the following can happen:
1. ContainerLocalizer heartbeats to ResourceLocalizationService.
2. LocalizerTracker.processHeartbeat verifies that there is a LocalizerRunner
for the localizerId (containerId).
3. Container receives kill event, goes from LOCALIZING -> KILLING. The
LocalizerRunner for the localizerId is removed from LocalizerTracker.
4. Since check (2) passed, LocalizerRunner sends heartbeat response with
LocalizerStatus.LIVE and the next file to download.
Instead, (4) should send LocalizerStatus.DIE, because (3) happened before the
heartbeat response in (4) was built. This saves the container from
potentially downloading an extra resource that will end up being deleted
anyway.
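
The race in steps (2) through (4) can be reproduced with a small
self-contained sketch. This is not the NodeManager code: Tracker, Runner,
processHeartbeat's extra parameters, and simulate are hypothetical stand-ins,
and re-checking a "killed" flag just before responding is only one assumed
way to close the window, not necessarily the fix that should be applied.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HeartbeatRaceSketch {
    enum LocalizerStatus { LIVE, DIE }

    /** Hypothetical stand-in for LocalizerRunner (not the real class). */
    static final class Runner {
        private volatile boolean killed = false;
        void kill() { killed = true; }
        boolean isKilled() { return killed; }
    }

    /** Hypothetical stand-in for LocalizerTracker (not the real class). */
    static final class Tracker {
        private final Map<String, Runner> runners = new ConcurrentHashMap<>();

        void register(String localizerId) {
            runners.put(localizerId, new Runner());
        }

        // Step 3: the container goes LOCALIZING -> KILLING and its runner
        // is removed from the tracker.
        void onContainerKill(String localizerId) {
            Runner r = runners.remove(localizerId);
            if (r != null) {
                r.kill();
            }
        }

        // Steps 2 and 4. The 'window' callback simulates an event arriving
        // between the existence check and the response. 'recheck' toggles
        // the assumed fix.
        LocalizerStatus processHeartbeat(String localizerId, Runnable window,
                                         boolean recheck) {
            Runner r = runners.get(localizerId);   // step 2: runner exists?
            if (r == null) {
                return LocalizerStatus.DIE;
            }
            window.run();                          // step 3 may happen here
            if (recheck && r.isKilled()) {         // assumed fix: look again
                return LocalizerStatus.DIE;
            }
            return LocalizerStatus.LIVE;           // step 4: current behavior
        }
    }

    /** Runs one heartbeat with a kill event landing inside the window. */
    static LocalizerStatus simulate(boolean recheck) {
        Tracker tracker = new Tracker();
        String id = "container_1";                 // made-up localizerId
        tracker.register(id);
        return tracker.processHeartbeat(
            id, () -> tracker.onContainerKill(id), recheck);
    }

    public static void main(String[] args) {
        System.out.println("without recheck: " + simulate(false)); // LIVE
        System.out.println("with recheck:    " + simulate(true));  // DIE
    }
}
```

Without the second check the localizer is told to keep going even though its
container is already being killed. With it, the response reflects the kill
that arrived during the heartbeat.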