[ https://issues.apache.org/jira/browse/YARN-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567434#comment-16567434 ]
Robert Kanter commented on YARN-4946:
-------------------------------------

I'm not sure if any of the 3 versions of the ATS have the log aggregation status info. But I agree that we shouldn't add this dependency if possible. I also think it makes sense for the RM to remember Applications if they're still doing something, including the log aggregation.

Thanks for the patch [~snemeth], a couple things:
# I'm not sure creating so many helper methods is necessary, especially the ones that are one or two lines of code like {{recordLogAggregationStartTime}}.
# The current approach is changing when an App is considered finished ({{APP_COMPLETED}}) and delaying it until the log aggregation has finished. That could take minutes after the App actually finishes, so this is going to add a considerable delay on a bunch of other things - definitely something users will notice. I think we should try to limit the scope of the changes so that we leave the App lifecycle as-is, but only change the part where we decide to evict an App from the RM.
#- More specifically, if you look at {{RMAppManager#checkAppNumCompletedLimit}}, you can see that it's comparing a counter for the number of completed apps vs the configured max. We can simply adjust the logic here or the counter to only count an App once it's both completed _and_ log aggregation has completed.

> RM should not consider an application as COMPLETED when log aggregation is
> not in a terminal state
> --------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4946
>                 URL: https://issues.apache.org/jira/browse/YARN-4946
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: log-aggregation
>    Affects Versions: 2.8.0
>            Reporter: Robert Kanter
>            Assignee: Szilard Nemeth
>            Priority: Major
>         Attachments: YARN-4946.001.patch, YARN-4946.002.patch
>
>
> MAPREDUCE-6415 added a tool that combines the aggregated log files for each
> Yarn App into a HAR file. When run, it seeds the list by looking at the
> aggregated logs directory, and then filters out ineligible apps. One of the
> criteria involves checking with the RM that an Application's log aggregation
> status is not still running and has not failed. When the RM "forgets" about
> an older completed Application (e.g. RM failover, enough time has passed,
> etc), the tool won't find the Application in the RM and will just assume that
> its log aggregation succeeded, even if it actually failed or is still running.
> We can solve this problem by doing the following:
> The RM should not consider an app to be fully completed (and thus removed
> from its history) until the aggregation status has reached a terminal state
> (e.g. SUCCEEDED, FAILED, TIME_OUT).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
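
For illustration only, the eviction-side idea from the second point of the comment above (leave the App lifecycle as-is and count an App toward the completed-apps limit only once its log aggregation is also done) might look roughly like the sketch below. This is not the actual {{RMAppManager}} code: {{CompletedAppEvictionSketch}}, {{Tracker}}, {{LogAggregationState}}, {{evictOverLimit}} and the {{RUNNING}} state are hypothetical names made up for the example. Only the terminal states SUCCEEDED, FAILED and TIME_OUT are taken from the issue description.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model; none of these types are the real YARN classes.
final class CompletedAppEvictionSketch {

  /** Illustrative states; the terminal ones mirror those named in the issue. */
  enum LogAggregationState {
    RUNNING(false), SUCCEEDED(true), FAILED(true), TIME_OUT(true);

    final boolean terminal;

    LogAggregationState(boolean terminal) {
      this.terminal = terminal;
    }
  }

  /** Remembers completed apps and evicts only those whose log aggregation is terminal. */
  static final class Tracker {
    private final int maxCompletedApps;
    // Insertion order is completion order, so the oldest completed app is visited first.
    private final Map<String, LogAggregationState> completedApps = new LinkedHashMap<>();

    Tracker(int maxCompletedApps) {
      this.maxCompletedApps = maxCompletedApps;
    }

    /** The app lifecycle is untouched: an app is recorded as completed right away. */
    void appCompleted(String appId, LogAggregationState state) {
      completedApps.put(appId, state);
    }

    /** Updates the state of an app that is still remembered; keeps its position. */
    void logAggregationUpdated(String appId, LogAggregationState state) {
      completedApps.replace(appId, state);
    }

    /** Counts only apps that are completed and whose log aggregation has finished. */
    private int evictableCount() {
      int count = 0;
      for (LogAggregationState state : completedApps.values()) {
        if (state.terminal) {
          count++;
        }
      }
      return count;
    }

    /** Evicts the oldest evictable apps until the evictable count fits the limit. */
    List<String> evictOverLimit() {
      List<String> evicted = new ArrayList<>();
      int excess = evictableCount() - maxCompletedApps;
      Iterator<Map.Entry<String, LogAggregationState>> it =
          completedApps.entrySet().iterator();
      while (excess > 0 && it.hasNext()) {
        Map.Entry<String, LogAggregationState> entry = it.next();
        if (entry.getValue().terminal) {
          evicted.add(entry.getKey());
          it.remove();
          excess--;
        }
      }
      return evicted;
    }

    boolean remembers(String appId) {
      return completedApps.containsKey(appId);
    }
  }

  public static void main(String[] args) {
    Tracker tracker = new Tracker(1);
    tracker.appCompleted("app_1", LogAggregationState.RUNNING);
    tracker.appCompleted("app_2", LogAggregationState.SUCCEEDED);
    tracker.appCompleted("app_3", LogAggregationState.FAILED);
    // app_1 is the oldest but is still aggregating, so app_2 is evicted instead.
    System.out.println(tracker.evictOverLimit()); // prints [app_2]
  }
}
```

One consequence of this shape, under the assumptions above, is that an App whose aggregation never leaves a non-terminal state is never evicted, so a real change would presumably need some bound or timeout on top of it.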