[jira] [Commented] (YARN-4771) Some containers can be skipped during log aggregation after NM restart

Jason Lowe (JIRA) Tue, 08 Mar 2016 06:48:11 -0800

    [ https://issues.apache.org/jira/browse/YARN-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184991#comment-15184991 ]


Jason Lowe commented on YARN-4771:
----------------------------------

The problem occurs because removeVeryOldStoppedContainersFromCache removes 
containers from the state store once they have been complete for at least 
yarn.nodemanager.duration-to-track-stopped-containers milliseconds.  Once a 
container's state is removed from the state store, there is nothing to recover 
for that container when the NM restarts.  With no recovered information about 
the container, the log aggregation service does not know that it needs to 
aggregate the container's logs, so the container is skipped during log 
aggregation.
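
The sequence above can be illustrated with a small standalone sketch.  This is 
not the NodeManager code: the class StoppedContainerSketch, its methods, and 
the in-memory map and set standing in for the stopped-container cache and the 
state store are all hypothetical.  It also assumes that the purge runs when 
another container completes, which is consistent with the second condition in 
the issue description but is not confirmed here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the behavior described above; not Hadoop code.
class StoppedContainerSketch {
    // Stand-in for yarn.nodemanager.duration-to-track-stopped-containers.
    private final long trackMillis;
    // Container id -> completion time, for recently stopped containers.
    private final Map<String, Long> recentlyStopped = new LinkedHashMap<>();
    // Stand-in for the NM state store: ids that survive a restart.
    private final Set<String> stateStore = new HashSet<>();

    StoppedContainerSketch(long trackMillis) {
        this.trackMillis = trackMillis;
    }

    void containerStarted(String id) {
        stateStore.add(id);
    }

    // Assumption: the purge of old entries is triggered by a completion.
    void containerCompleted(String id, long now) {
        removeVeryOldStoppedContainers(now);
        recentlyStopped.put(id, now);
    }

    // Drops containers that completed at least trackMillis ago from both
    // the cache and the (simulated) state store.
    private void removeVeryOldStoppedContainers(long now) {
        Iterator<Map.Entry<String, Long>> it =
            recentlyStopped.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            String id = entry.getKey();
            long completedAt = entry.getValue();
            if (now - completedAt >= trackMillis) {
                it.remove();
                stateStore.remove(id);
            }
        }
    }

    // After a restart only the state store contents can be recovered, so
    // only these containers are visible to log aggregation.
    List<String> recoverAfterRestart() {
        return new ArrayList<>(new TreeSet<>(stateStore));
    }

    public static void main(String[] args) {
        StoppedContainerSketch nm = new StoppedContainerSketch(1000L);
        nm.containerStarted("c1");
        nm.containerStarted("c2");
        nm.containerCompleted("c1", 0L);
        nm.containerCompleted("c2", 5000L); // purges c1 from the store
        System.out.println(nm.recoverAfterRestart()); // prints [c2]
    }
}
```

In this model c1 is missing from the recovered list, so nothing would prompt 
log aggregation for it after the restart.  If c2 had not completed before the 
restart, no purge would have run and c1 would still be recoverable.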

> Some containers can be skipped during log aggregation after NM restart
> ----------------------------------------------------------------------
>
>                 Key: YARN-4771
>                 URL: https://issues.apache.org/jira/browse/YARN-4771
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.2
>            Reporter: Jason Lowe
>
> A container can be skipped during log aggregation after a work-preserving 
> nodemanager restart if the following events occur:
> # Container completes more than 
> yarn.nodemanager.duration-to-track-stopped-containers milliseconds before the 
> restart
> # At least one other container completes after the above container and before 
> the restart



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
