[jira] [Updated] (YARN-7098) LocalizerRunner should immediately send heartbeat response LocalizerStatus.DIE when the Container transitions from LOCALIZING to KILLING

Brook Zhou (JIRA) Mon, 28 Aug 2017 15:52:00 -0700

     [ https://issues.apache.org/jira/browse/YARN-7098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Brook Zhou updated YARN-7098:
-----------------------------
    Description: 
Currently, the following can happen:

1. ContainerLocalizer heartbeats to ResourceLocalizationService.
2. LocalizerTracker.processHeartbeat verifies that there is a LocalizerRunner 
for the localizerId (containerId).
3. Container receives kill event, goes from LOCALIZING -> KILLING. The 
LocalizerRunner is not removed from LocalizerTracker due to locking.
4. Since check (2) passed, LocalizerRunner sends heartbeat response with 
LocalizerStatus.LIVE and the next file to download.

Instead, (4) should send LocalizerStatus.DIE, because (3) happened before 
the heartbeat response in (4). This spares the container from downloading 
an extra resource that will be deleted anyway.

  was:
Currently, the following can happen:

1. ContainerLocalizer heartbeats to ResourceLocalizationService.
2. LocalizerTracker.processHeartbeat verifies that there is a LocalizerRunner 
for the localizerId (containerId).
3. Container receives kill event, goes from LOCALIZING -> KILLING. The 
LocalizerRunner for the localizerId is removed from LocalizerTracker.
4. Since check (2) passed, LocalizerRunner sends heartbeat response with 
LocalizerStatus.LIVE and the next file to download.

What should happen here is that (4) sends a LocalizerStatus.DIE, since (3) 
happened before the heartbeat response in (4). This saves the container from 
potentially downloading an extra resource which will end up being deleted 
anyway.


> LocalizerRunner should immediately send heartbeat response 
> LocalizerStatus.DIE when the Container transitions from LOCALIZING to KILLING
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-7098
>                 URL: https://issues.apache.org/jira/browse/YARN-7098
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Brook Zhou
>            Assignee: Brook Zhou
>            Priority: Minor
>
> Currently, the following can happen:
> 1. ContainerLocalizer heartbeats to ResourceLocalizationService.
> 2. LocalizerTracker.processHeartbeat verifies that there is a LocalizerRunner 
> for the localizerId (containerId).
> 3. Container receives kill event, goes from LOCALIZING -> KILLING. The 
> LocalizerRunner is not removed from LocalizerTracker due to locking.
> 4. Since check (2) passed, LocalizerRunner sends heartbeat response with 
> LocalizerStatus.LIVE and the next file to download.
> Instead, (4) should send LocalizerStatus.DIE, because (3) happened before 
> the heartbeat response in (4). This spares the container from downloading 
> an extra resource that will be deleted anyway.
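
The proposed behavior can be illustrated with a minimal, self-contained
sketch. It is not the NodeManager code: HeartbeatSketch, Runner, Response,
kill() and the resource names are hypothetical stand-ins, and only the
LOCALIZING/KILLING states and the LIVE/DIE statuses come from the
description above. The idea is that the runner re-checks the container
state at the moment it builds the heartbeat response, rather than relying
on the earlier lookup in step (2).

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch; these are NOT the real YARN NodeManager classes.
final class HeartbeatSketch {
    enum ContainerState { LOCALIZING, KILLING }
    enum LocalizerStatus { LIVE, DIE }

    /** Heartbeat reply: a status plus the next resource to fetch, if any. */
    static final class Response {
        final LocalizerStatus status;
        final String nextResource; // null when status is DIE or nothing is pending

        Response(LocalizerStatus status, String nextResource) {
            this.status = status;
            this.nextResource = nextResource;
        }
    }

    /** Stand-in for a per-container localizer runner. */
    static final class Runner {
        private final AtomicReference<ContainerState> state =
            new AtomicReference<>(ContainerState.LOCALIZING);
        private final Queue<String> pending = new ArrayDeque<>();

        Runner(String... resources) {
            for (String r : resources) {
                pending.add(r);
            }
        }

        /** Step (3): the container receives a kill event. */
        void kill() {
            state.set(ContainerState.KILLING);
        }

        /** Step (4): build the response, re-checking the state first. */
        synchronized Response processHeartbeat() {
            // Even if the runner is still registered with the tracker,
            // a container that has left LOCALIZING gets DIE, not more work.
            if (state.get() != ContainerState.LOCALIZING) {
                return new Response(LocalizerStatus.DIE, null);
            }
            return new Response(LocalizerStatus.LIVE, pending.poll());
        }
    }

    public static void main(String[] args) {
        Runner runner = new Runner("a.jar", "b.jar");
        Response first = runner.processHeartbeat();
        System.out.println(first.status + " " + first.nextResource);
        runner.kill();
        Response second = runner.processHeartbeat();
        System.out.println(second.status + " " + second.nextResource);
    }
}
```

This check narrows the window rather than closing it: a kill that arrives
after the state is read but before the response is sent can still yield
one LIVE response.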



