[
https://issues.apache.org/jira/browse/YARN-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Szilard Nemeth reassigned YARN-4946:
------------------------------------
Assignee: Szilard Nemeth
> RM should write out Aggregated Log Completion file flag next to logs
> --------------------------------------------------------------------
>
> Key: YARN-4946
> URL: https://issues.apache.org/jira/browse/YARN-4946
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: log-aggregation
> Affects Versions: 2.8.0
> Reporter: Robert Kanter
> Assignee: Szilard Nemeth
> Priority: Major
>
> MAPREDUCE-6415 added a tool that combines the aggregated log files for each
> YARN Application into a HAR file. When run, it seeds its list of candidates
> by looking at the aggregated logs directory, and then filters out ineligible
> Applications. One of the criteria involves checking with the RM that an
> Application's log aggregation status is not still running and has not failed.
> When the RM "forgets" about an older completed Application (e.g. after an RM
> failover, or once enough time has passed), the tool won't find the
> Application in the RM and will just assume that its log aggregation
> succeeded, even if it actually failed or is still running.
> We can solve this problem by doing the following:
> # When the RM sees that an Application has successfully finished aggregating
> its logs, it will write a flag file next to that Application's log files.
> # The tool no longer talks to the RM at all. When looking at the FileSystem,
> it now uses that flag file to determine if it should process those log files.
> If the file is there, it archives the logs; otherwise it does not.
> # As part of the archiving process, the tool will delete the flag file.
> # (If you don't run the tool, the flag file will eventually be cleaned up by
> the JHS when it cleans up the aggregated logs because it's in the same
> directory)
> This improvement has several advantages:
> # The edge case about "forgotten" Applications is fixed
> # The tool no longer has to talk to the RM; it only has to consult HDFS,
> which is simpler.
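
The flag-file steps proposed in the description could be sketched roughly as follows. This is only an illustration, not the actual patch. It uses java.nio.file on a local directory as a stand-in for the Hadoop FileSystem API. The flag file name `_AGGREGATION_COMPLETE`, the class name, and all method names are hypothetical, since the issue does not specify any of them.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class AggregationFlagSketch {
    // Hypothetical flag file name; the issue does not name the file.
    public static final String FLAG_NAME = "_AGGREGATION_COMPLETE";

    /** RM side: mark an Application's log directory as fully aggregated. */
    public static void writeFlag(Path appLogDir) throws IOException {
        Path flag = appLogDir.resolve(FLAG_NAME);
        if (!Files.exists(flag)) {
            Files.createFile(flag);
        }
    }

    /** Tool side: only directories that contain the flag file are eligible. */
    public static List<Path> findEligibleApps(Path aggregatedLogsRoot) throws IOException {
        List<Path> eligible = new ArrayList<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(aggregatedLogsRoot)) {
            for (Path appDir : dirs) {
                if (Files.isDirectory(appDir) && Files.exists(appDir.resolve(FLAG_NAME))) {
                    eligible.add(appDir);
                }
            }
        }
        return eligible;
    }

    /** Tool side: remove the flag as part of archiving. */
    public static void deleteFlag(Path appLogDir) throws IOException {
        Files.deleteIfExists(appLogDir.resolve(FLAG_NAME));
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("aggregated-logs");
        Path finished = Files.createDirectory(root.resolve("application_1_0001"));
        Files.createDirectory(root.resolve("application_1_0002")); // never flagged

        writeFlag(finished);
        System.out.println(findEligibleApps(root).size()); // prints 1

        deleteFlag(finished); // the archiving itself is omitted in this sketch
        System.out.println(findEligibleApps(root).size()); // prints 0
    }
}
```

In this sketch the tool never contacts the RM. An Application without a flag file is simply skipped, whether its aggregation failed, is still running, or the RM has forgotten it.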