Robert Kanter created YARN-4946:
-----------------------------------

             Summary: RM should write out Aggregated Log Completion file flag next to logs
                 Key: YARN-4946
                 URL: https://issues.apache.org/jira/browse/YARN-4946
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: log-aggregation
    Affects Versions: 2.8.0
            Reporter: Robert Kanter
            Assignee: Haibo Chen


MAPREDUCE-6415 added a tool that combines the aggregated log files for each 
YARN application into a HAR file.  When run, it seeds its candidate list from 
the aggregated logs directory and then filters out ineligible apps.  One of the 
criteria involves checking with the RM that an Application's log aggregation 
status is neither still running nor failed.  When the RM "forgets" about an 
older completed Application (e.g., after an RM failover, or once enough time 
has passed), the tool won't find the Application in the RM and will simply 
assume that its log aggregation succeeded, even if it actually failed or is 
still running.

We can solve this problem by doing the following:
# When the RM sees that an Application has successfully finished aggregating 
its logs, it will write a flag file next to that Application's log files.
# The tool no longer talks to the RM at all.  When scanning the FileSystem, it 
uses that flag file to decide whether to process an Application's log files: 
if the file is there, it archives them; otherwise, it skips them.
# As part of the archiving process, it will delete the flag file.
# (If you don't run the tool, the flag file will eventually be cleaned up by 
the JHS along with the aggregated logs, because it's in the same directory.)
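
The proposed flag-file protocol can be sketched roughly as follows. This is a minimal, non-authoritative sketch, assuming a local java.nio.file directory as a stand-in for the HDFS FileSystem API and a flat one-directory-per-application layout under the aggregated logs root. The flag file name (_LOG_AGGREGATION_SUCCEEDED), the class name, and all method names are hypothetical; the issue does not specify them, and the actual HAR-building step is omitted.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

/**
 * Hypothetical sketch of the proposed completion-flag protocol.
 * java.nio.file stands in for the HDFS FileSystem API.
 */
public class LogAggregationFlagSketch {

    // Hypothetical flag file name; the issue does not define one.
    public static final String FLAG_NAME = "_LOG_AGGREGATION_SUCCEEDED";

    // RM side: called when the RM sees an Application's log aggregation succeed.
    // Writes the flag next to that Application's log files (idempotent).
    public static void markAggregationSucceeded(Path appLogDir) throws IOException {
        Path flag = appLogDir.resolve(FLAG_NAME);
        if (Files.notExists(flag)) {
            Files.createFile(flag);
        }
    }

    // Tool side: an Application is eligible only if its flag file is present.
    public static boolean isEligible(Path appLogDir) {
        return Files.isRegularFile(appLogDir.resolve(FLAG_NAME));
    }

    // Tool side: seed candidates from the aggregated logs directory and keep
    // only flagged Applications. No call to the RM is made.
    public static List<Path> findEligibleApps(Path aggregatedLogsRoot) throws IOException {
        List<Path> eligible = new ArrayList<>();
        try (Stream<Path> apps = Files.list(aggregatedLogsRoot)) {
            apps.filter(Files::isDirectory)
                .filter(LogAggregationFlagSketch::isEligible)
                .sorted()
                .forEach(eligible::add);
        }
        return eligible;
    }

    // Tool side: after archiving an Application's logs, delete its flag file.
    // Building the HAR file itself is out of scope for this sketch.
    public static void finishArchiving(Path appLogDir) throws IOException {
        Files.deleteIfExists(appLogDir.resolve(FLAG_NAME));
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("aggregated-logs");
        Path finished = Files.createDirectory(root.resolve("application_1_0001"));
        // Second Application is still aggregating, so it gets no flag.
        Files.createDirectory(root.resolve("application_1_0002"));

        markAggregationSucceeded(finished);
        for (Path app : findEligibleApps(root)) {
            System.out.println("archiving " + app.getFileName());
            finishArchiving(app);
        }
        System.out.println("eligible after archiving: " + findEligibleApps(root).size());
    }
}
```

Because eligibility is decided purely from the filesystem, an Application the RM has already forgotten is still skipped until its flag appears, and deleting the flag after archiving keeps the tool from processing the same Application twice.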

This improvement has two advantages:
# The edge case of "forgotten" Applications is fixed.
# The tool no longer has to talk to the RM; it only has to consult HDFS, which 
is simpler.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
