[ https://issues.apache.org/jira/browse/YARN-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated YARN-3464:
----------------------------
    Description: 
Race condition in LocalizerRunner causes container localization timeout.
Currently, LocalizerRunner kills the ContainerLocalizer when the pending list 
of LocalizerResourceRequestEvents is empty.
{code}
      } else if (pending.isEmpty()) {
        action = LocalizerAction.DIE;
      }
{code}
If a LocalizerResourceRequestEvent is added after LocalizerRunner kills the 
ContainerLocalizer because the pending list was empty, that 
LocalizerResourceRequestEvent will never be handled.
Without a ContainerLocalizer, LocalizerRunner#update will never be called.
The container will stay in the LOCALIZING state until it is killed by the 
AM due to TASK_TIMEOUT.
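
The interleaving described above can be reproduced with a small stand-alone model. This is only a sketch under stated assumptions: the class LocalizerRaceSketch, its Runner, and the methods addResource, isLocalizerAlive and pendingCount are hypothetical names invented for illustration, and the model handles every pending request in a single heartbeat, which is a simplification rather than the NodeManager's actual behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of the race; this is NOT the NodeManager code.
public class LocalizerRaceSketch {

    enum LocalizerAction { LIVE, DIE }

    static class Runner {
        // Stand-in for the pending list of LocalizerResourceRequestEvents.
        private final List<String> pending = new ArrayList<>();
        private final List<String> handled = new ArrayList<>();
        private boolean localizerAlive = true;

        // Stand-in for a new resource request arriving at the runner.
        synchronized void addResource(String request) {
            pending.add(request);
        }

        // Stand-in for LocalizerRunner#update, which only runs when a live
        // localizer sends a heartbeat.
        synchronized LocalizerAction update() {
            if (!localizerAlive) {
                throw new IllegalStateException("no localizer left to heartbeat");
            }
            if (pending.isEmpty()) {
                // Same decision as the snippet in this issue: empty list means DIE.
                localizerAlive = false;
                return LocalizerAction.DIE;
            }
            // Assumption: all pending requests are handled in one heartbeat.
            handled.addAll(pending);
            pending.clear();
            return LocalizerAction.LIVE;
        }

        synchronized boolean isLocalizerAlive() {
            return localizerAlive;
        }

        synchronized int pendingCount() {
            return pending.size();
        }
    }

    public static void main(String[] args) {
        Runner runner = new Runner();
        runner.addResource("resource-1");
        System.out.println(runner.update()); // request handled, localizer stays alive
        System.out.println(runner.update()); // pending list is empty, localizer is told to die
        runner.addResource("resource-2");    // arrives after the DIE decision
        // Nothing will call update() again, so resource-2 is stranded.
        System.out.println("alive=" + runner.isLocalizerAlive()
                + " stranded=" + runner.pendingCount());
    }
}
```

In this model the second request stays in the pending list with no live localizer to trigger another update, which corresponds to the container remaining in LOCALIZING until the timeout.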

  was:
Race condition in LocalizerRunner causes container localization timeout.
Currently, LocalizerRunner kills the ContainerLocalizer when the pending list 
of LocalizerResourceRequestEvents is empty.
{code}
      } else if (pending.isEmpty()) {
        action = LocalizerAction.DIE;
      }
{code}
If a LocalizerResourceRequestEvent is added after LocalizerRunner kills the 
ContainerLocalizer because the pending list was empty, that 
LocalizerResourceRequestEvent will never be handled.
The container will stay in the LOCALIZING state until it is killed by the 
AM due to TASK_TIMEOUT.


> Race condition in LocalizerRunner causes container localization timeout.
> ------------------------------------------------------------------------
>
>                 Key: YARN-3464
>                 URL: https://issues.apache.org/jira/browse/YARN-3464
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>            Priority: Critical



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
