[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Kanter updated YARN-2942:
--------------------------------
    Attachment: YARN-2942-preliminary.001.patch
                CompactedAggregatedLogsProposal_v1.pdf

I've uploaded a design proposal for a solution and a preliminary patch that has the compaction code in it. Playing around with it locally, I was able to easily hack the JHS to display logs from a compacted log file.

> Aggregated Log Files should be compacted
> ----------------------------------------
>
>                 Key: YARN-2942
>                 URL: https://issues.apache.org/jira/browse/YARN-2942
>             Project: Hadoop YARN
>          Issue Type: New Feature
>    Affects Versions: 2.6.0
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: CompactedAggregatedLogsProposal_v1.pdf,
>                      YARN-2942-preliminary.001.patch
>
>
> Turning on log aggregation allows users to easily store container logs in
> HDFS and subsequently view them in the YARN web UIs from a central place.
> Currently, there is a separate log file for each Node Manager. This can be a
> problem for HDFS if you have a cluster with many nodes as you’ll slowly start
> accumulating many (possibly small) files per YARN application. The current
> “solution” for this problem is to configure YARN (actually the JHS) to
> automatically delete these files after some amount of time.
> We should improve this by compacting the per-node aggregated log files into
> one log file per application.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)