[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-2942:
--------------------------------
    Attachment: CombinedAggregatedLogsProposal_v7.pdf

I’ve uploaded a v7 doc which uses the new log aggregation status.  The v6 
and v7 designs each have advantages and disadvantages relative to the other, 
so I’ve put together the comparison below.  

||v6 design||v7 design||
|depends on ZooKeeper|*no external dependencies*|
|requires synchronizing all NMs|*requires no extra synchronization*|
|*can combine even if some aggregated logs fail*|requires that all aggregated logs for an app succeed in order to combine them|
|*implicitly distributes the HDFS load across the cluster for better balance*|HDFS load concentrated on RM’s datanode or all writes are remote|
|overall is more complicated|*overall is simpler*|
(bold is better)
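
To make the third row of the table concrete, here is a toy sketch of the difference in when an app's logs become eligible for combining. It is not taken from either design doc: the `NodeLogStatus` enum, the function names, and the "at least one node succeeded" rule for v6 are assumptions made only for illustration.

```python
from enum import Enum


class NodeLogStatus(Enum):
    """Hypothetical per-node aggregation outcome (not YARN's actual type)."""
    SUCCEEDED = "succeeded"
    FAILED = "failed"


def can_combine_v6(statuses):
    """v6 row: combining can go ahead even if some nodes' aggregated logs failed.

    Assumption: at least one node must have aggregated successfully.
    """
    return any(s is NodeLogStatus.SUCCEEDED for s in statuses.values())


def can_combine_v7(statuses):
    """v7 row: every node's aggregated log for the app must have succeeded."""
    return bool(statuses) and all(
        s is NodeLogStatus.SUCCEEDED for s in statuses.values()
    )


# One app that ran on three nodes, with aggregation failing on one of them.
app_statuses = {
    "nm1": NodeLogStatus.SUCCEEDED,
    "nm2": NodeLogStatus.FAILED,
    "nm3": NodeLogStatus.SUCCEEDED,
}
print(can_combine_v6(app_statuses), can_combine_v7(app_statuses))
```

Under these assumptions, a single failed node leaves the app's logs uncombined in the v7 model, while the v6 model would still combine the logs that did aggregate.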

Please let me know if you have any feedback on the v7 design, and which design 
you prefer.  

> Aggregated Log Files should be combined
> ---------------------------------------
>
>                 Key: YARN-2942
>                 URL: https://issues.apache.org/jira/browse/YARN-2942
>             Project: Hadoop YARN
>          Issue Type: New Feature
>    Affects Versions: 2.6.0
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: CombinedAggregatedLogsProposal_v3.pdf, 
> CombinedAggregatedLogsProposal_v6.pdf, CombinedAggregatedLogsProposal_v7.pdf, 
> CompactedAggregatedLogsProposal_v1.pdf, 
> CompactedAggregatedLogsProposal_v2.pdf, 
> ConcatableAggregatedLogsProposal_v4.pdf, 
> ConcatableAggregatedLogsProposal_v5.pdf, YARN-2942-preliminary.001.patch, 
> YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, 
> YARN-2942.003.patch
>
>
> Turning on log aggregation allows users to easily store container logs in 
> HDFS and subsequently view them in the YARN web UIs from a central place.  
> Currently, there is a separate log file for each Node Manager.  This can be a 
> problem for HDFS if you have a cluster with many nodes as you’ll slowly start 
> accumulating many (possibly small) files per YARN application.  The current 
> “solution” for this problem is to configure YARN (actually the JHS) to 
> automatically delete these files after some amount of time.  
> We should improve this by compacting the per-node aggregated log files into 
> one log file per application.
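
As a rough illustration of the file-count problem described above, the following sketch compares the number of aggregated log files in HDFS with one file per node per application against one combined file per application. The function name and the example node counts are made up for illustration; real counts depend on how many nodes each application actually ran containers on.

```python
def aggregated_file_counts(nodes_per_app):
    """Return (per_node_files, combined_files) for a list of applications.

    nodes_per_app[i] is the number of nodes that produced an aggregated
    log file for application i (hypothetical input).
    """
    per_node_files = sum(nodes_per_app)   # one file per node per application
    combined_files = len(nodes_per_app)   # one file per application
    return per_node_files, combined_files


# Three applications that ran on 200, 50, and 1 node(s) respectively.
print(aggregated_file_counts([200, 50, 1]))  # (251, 3)
```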



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
