[
https://issues.apache.org/jira/browse/YARN-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939352#comment-16939352
]
Adam Antal commented on YARN-4946:
----------------------------------
I'm +1 (non-binding) on reverting this patch and then examining it in detail.
I have to add that the state store does not save the Log Aggregation Status, so
it defaults to "NOT_STARTED". During recovery, without a final aggregation
status, the RM probably thinks that those apps have not completed (although
they have), and keeps doing so. This might have been an edge case that was not
covered by this patch. In any case, I think we should revisit the whole thing,
but as it has a severe impact on performance, we should revert it.
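To illustrate the suspected edge case, here is a minimal, self-contained sketch. It is not the RM code: the enum, the method names, and the class RecoverySketch are all hypothetical, and the status names are simply the ones used in this thread. It only shows why an app whose status was never persisted would never look "fully completed" after recovery.

```java
import java.util.Optional;

public class RecoverySketch {
    // Hypothetical subset of statuses, named as in this thread (not the real YARN enum).
    enum LogAggregationStatus { NOT_STARTED, RUNNING, SUCCEEDED, FAILED, TIME_OUT }

    // Terminal states as listed in the issue description.
    static boolean isTerminal(LogAggregationStatus status) {
        return status == LogAggregationStatus.SUCCEEDED
                || status == LogAggregationStatus.FAILED
                || status == LogAggregationStatus.TIME_OUT;
    }

    // Assumption: nothing was persisted, so recovery falls back to the default.
    static LogAggregationStatus recoveredStatus(Optional<LogAggregationStatus> persisted) {
        return persisted.orElse(LogAggregationStatus.NOT_STARTED);
    }

    // The rule from the issue: forget an app only once aggregation is terminal.
    static boolean canForget(boolean appFinished, LogAggregationStatus status) {
        return appFinished && isTerminal(status);
    }

    public static void main(String[] args) {
        // A finished app whose aggregation status was not saved in the state store.
        LogAggregationStatus status = recoveredStatus(Optional.empty());
        System.out.println(status + " -> removable: " + canForget(true, status));
        // prints "NOT_STARTED -> removable: false"
    }
}
```

Under these assumptions, a recovered app stays in the non-removable set indefinitely unless the status is persisted or re-derived after recovery.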
> RM should not consider an application as COMPLETED when log aggregation is
> not in a terminal state
> --------------------------------------------------------------------------------------------------
>
> Key: YARN-4946
> URL: https://issues.apache.org/jira/browse/YARN-4946
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: log-aggregation
> Affects Versions: 2.8.0
> Reporter: Robert Kanter
> Assignee: Szilard Nemeth
> Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-4946.001.patch, YARN-4946.002.patch,
> YARN-4946.003.patch, YARN-4946.004.patch
>
>
> MAPREDUCE-6415 added a tool that combines the aggregated log files for each
> Yarn App into a HAR file. When run, it seeds the list by looking at the
> aggregated logs directory, and then filters out ineligible apps. One of the
> criteria involves checking with the RM that an Application's log aggregation
> status is not still running and has not failed. When the RM "forgets" about
> an older completed Application (e.g. RM failover, enough time has passed,
> etc), the tool won't find the Application in the RM and will just assume that
> its log aggregation succeeded, even if it actually failed or is still running.
> We can solve this problem by doing the following:
> The RM should not consider an app to be fully completed (and thus removed
> from its history) until the aggregation status has reached a terminal state
> (e.g. SUCCEEDED, FAILED, TIME_OUT).