[ 
https://issues.apache.org/jira/browse/YARN-10059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-10059:
----------------------------
    Attachment: YARN-10059.001.patch

> Final states of failed-to-localize containers are not recorded in NM state 
> store
> --------------------------------------------------------------------------------
>
>                 Key: YARN-10059
>                 URL: https://issues.apache.org/jira/browse/YARN-10059
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Major
>         Attachments: YARN-10059.001.patch
>
>
> We found that, after an NM restart, localizers were launched for many 
> already-completed containers and exhausted the memory and CPU of the 
> machine. These containers had all failed and completed while localizing to 
> a non-existent local directory (the result of a separate problem), but 
> their final states were not recorded in the NM state store.
>  The flow for a container that fails to localize is as follows:
> {noformat}
> ResourceLocalizationService$LocalizerRunner#run
> -> ContainerImpl$ResourceFailedTransition#transition handle LOCALIZING -> LOCALIZATION_FAILED upon RESOURCE_FAILED
>       dispatch LocalizationEventType.CLEANUP_CONTAINER_RESOURCES
>       -> ResourceLocalizationService#handleCleanupContainerResources handle CLEANUP_CONTAINER_RESOURCES
>           dispatch ContainerEventType.CONTAINER_RESOURCES_CLEANEDUP
>           -> ContainerImpl$LocalizationFailedToDoneTransition#transition handle LOCALIZATION_FAILED -> DONE upon CONTAINER_RESOURCES_CLEANEDUP
> {noformat}
> Nothing in this flow currently updates the state store. Such an update is 
> required to avoid unnecessary localizations after the NM restarts.
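
The effect described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the attached patch: the class `LocalizationFailureSketch`, the `InMemoryStateStore` type, its methods, and the placeholder exit code are all hypothetical stand-ins for the real NodeManager classes. It shows that a container which reaches DONE via LOCALIZATION_FAILED is picked up again on recovery unless its completion is written to the store.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model; none of these types are real NodeManager classes.
class LocalizationFailureSketch {

  enum State { LOCALIZING, LOCALIZATION_FAILED, DONE }

  // Placeholder exit code for a failed localization (assumption, not a YARN constant).
  static final int PLACEHOLDER_EXIT_CODE = -1;

  /** Stand-in for the NM state store: tracks known and completed containers. */
  static class InMemoryStateStore {
    private final List<String> known = new ArrayList<>();
    private final Map<String, Integer> completed = new HashMap<>();

    void storeContainer(String containerId) {
      known.add(containerId);
    }

    void storeContainerCompleted(String containerId, int exitCode) {
      completed.put(containerId, exitCode);
    }

    /** On recovery, every known container without a recorded final state is localized again. */
    List<String> containersToLocalizeOnRecovery() {
      List<String> result = new ArrayList<>();
      for (String id : known) {
        if (!completed.containsKey(id)) {
          result.add(id);
        }
      }
      return result;
    }
  }

  static class Container {
    final String id;
    State state = State.LOCALIZING;

    Container(String id) {
      this.id = id;
    }

    // LOCALIZING -> LOCALIZATION_FAILED upon RESOURCE_FAILED
    void onResourceFailed() {
      state = State.LOCALIZATION_FAILED;
    }

    // LOCALIZATION_FAILED -> DONE upon CONTAINER_RESOURCES_CLEANEDUP
    void onResourcesCleanedUp(InMemoryStateStore store, boolean recordFinalState) {
      if (recordFinalState) {
        // The missing step: persist the final state before the container is done.
        store.storeContainerCompleted(id, PLACEHOLDER_EXIT_CODE);
      }
      state = State.DONE;
    }
  }

  /** Runs the failure flow, then returns what a restarted NM would localize again. */
  static List<String> simulate(boolean recordFinalState) {
    InMemoryStateStore store = new InMemoryStateStore();
    Container container = new Container("container_1");
    store.storeContainer(container.id);
    container.onResourceFailed();
    container.onResourcesCleanedUp(store, recordFinalState);
    return store.containersToLocalizeOnRecovery();
  }

  public static void main(String[] args) {
    System.out.println("without final-state update: " + simulate(false));
    System.out.println("with final-state update:    " + simulate(true));
  }
}
```

In this model, the run without the update returns `container_1` as needing localization after the restart, while the run with the update returns an empty list.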



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
