[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326637#comment-14326637 ]

Karthik Kambatla commented on YARN-2942:
----------------------------------------

Thanks for clarifying that, Robert.

Also, I don't think we should use the word "compaction" for this. I would prefer combined-aggregated-logs or uber-aggregated-logs.

Can we split this JIRA into sub-tasks for easier reviewing: curator-ChildReaper, reader/writer, LogCombiner, and NMs calling the LogCombiner (including coordination)?

> Aggregated Log Files should be compacted
> ----------------------------------------
>
>                 Key: YARN-2942
>                 URL: https://issues.apache.org/jira/browse/YARN-2942
>             Project: Hadoop YARN
>          Issue Type: New Feature
>    Affects Versions: 2.6.0
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: CompactedAggregatedLogsProposal_v1.pdf,
> CompactedAggregatedLogsProposal_v2.pdf, YARN-2942-preliminary.001.patch,
> YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch,
> YARN-2942.003.patch
>
>
> Turning on log aggregation allows users to easily store container logs in
> HDFS and subsequently view them in the YARN web UIs from a central place.
> Currently, there is a separate log file for each Node Manager. This can be a
> problem for HDFS if you have a cluster with many nodes as you’ll slowly start
> accumulating many (possibly small) files per YARN application. The current
> “solution” for this problem is to configure YARN (actually the JHS) to
> automatically delete these files after some amount of time.
> We should improve this by compacting the per-node aggregated log files into
> one log file per application.