[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522642#comment-14522642
 ] 

Robert Kanter commented on YARN-2942:
-------------------------------------

Thanks for pointing me to YARN-1376 and related.  I'll have to look into the 
code to get a better idea, but perhaps we can take advantage of this to try a 
completely different approach to combining the logs.  Now that we have a way 
of checking the status of log aggregation across all nodes in the cluster, 
instead of having to use ZK locks to coordinate all the NMs to append the logs, 
we can have a single server append the logs (maybe a small thread pool in the 
RM that handles this?).  We'd still use append, and the new format, but we 
wouldn't need to use ZooKeeper, and using a single server to do the combining 
should simplify things.  We'd probably need to add new 
{{LogAggregationStatus}} enum values for "COMBINING" and "COMBINED" or something.  
I'll look into this some more, though what do you think [~vinodkv], [~jlowe], 
[~knoguchi]?
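
To make the idea concrete, here is a rough, self-contained sketch of what a 
single-server combiner could look like.  Everything in it is hypothetical: 
the class {{LogCombinerSketch}}, the nested {{Status}} enum (a stand-in for 
the proposed new {{LogAggregationStatus}} values), and the in-memory strings 
used in place of the per-node aggregated log files in HDFS.  It is not taken 
from any patch on this JIRA; it only illustrates a small thread pool doing 
the appends without any cross-node locking.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: one server owns the combining, so no ZK locks are needed.
class LogCombinerSketch {
    // Stand-in for the proposed new LogAggregationStatus values (assumption).
    enum Status { AGGREGATED, COMBINING, COMBINED }

    // "Small thread pool"; the size 2 is an arbitrary choice for the sketch.
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final Map<String, Status> appStatus = new ConcurrentHashMap<>();

    // Called once every node has reported that its aggregation finished.
    // perNodeLogs maps a node name to that node's aggregated log content.
    Future<String> combine(String appId, Map<String, String> perNodeLogs) {
        appStatus.put(appId, Status.COMBINING);
        return pool.submit(() -> {
            StringBuilder combined = new StringBuilder();
            // Append node by node in sorted order so the result is deterministic.
            for (Map.Entry<String, String> e : new TreeMap<>(perNodeLogs).entrySet()) {
                combined.append('[').append(e.getKey()).append("]\n")
                        .append(e.getValue()).append('\n');
            }
            appStatus.put(appId, Status.COMBINED);
            return combined.toString();
        });
    }

    Status status(String appId) {
        return appStatus.get(appId);
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

Because only this one component ever appends, the per-application status is 
the only coordination state; a real version would append to a file in HDFS 
rather than build a string, and would need to handle failures partway 
through a combine.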

> Aggregated Log Files should be combined
> ---------------------------------------
>
>                 Key: YARN-2942
>                 URL: https://issues.apache.org/jira/browse/YARN-2942
>             Project: Hadoop YARN
>          Issue Type: New Feature
>    Affects Versions: 2.6.0
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: CombinedAggregatedLogsProposal_v3.pdf, 
> CombinedAggregatedLogsProposal_v6.pdf, 
> CompactedAggregatedLogsProposal_v1.pdf, 
> CompactedAggregatedLogsProposal_v2.pdf, 
> ConcatableAggregatedLogsProposal_v4.pdf, 
> ConcatableAggregatedLogsProposal_v5.pdf, YARN-2942-preliminary.001.patch, 
> YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, 
> YARN-2942.003.patch
>
>
> Turning on log aggregation allows users to easily store container logs in 
> HDFS and subsequently view them in the YARN web UIs from a central place.  
> Currently, there is a separate log file for each Node Manager.  This can be a 
> problem for HDFS if you have a cluster with many nodes as you’ll slowly start 
> accumulating many (possibly small) files per YARN application.  The current 
> “solution” for this problem is to configure YARN (actually the JHS) to 
> automatically delete these files after some amount of time.  
> We should improve this by compacting the per-node aggregated log files into 
> one log file per application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
