[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555382#comment-14555382
 ] 

Robert Kanter commented on YARN-2942:
-------------------------------------

[~kasha], [~vinodkv], [~jlowe], and I had a discussion earlier today about the 
best way to move forward on this.  We came up with a design that mostly 
picks and chooses from the previous designs:
- The log files get aggregated to HDFS by each NM as they do now, except we use 
a concat-friendly format (like the one I've used in patches on this JIRA)
- We'd then concat the aggregated log files for an application into a single 
file.  Designs v4 and v5 had this, but they used ZooKeeper to coordinate the 
NMs, each of which concatenated its own file.  Now that the RM knows when all 
NMs are done aggregating, it can take care of the concatenation via a new 
Service which concats the aggregated log files for a particular job into a 
single file at some interval.  (So ZooKeeper isn't required and no 
coordination is really needed.)
- In the discussion, we talked about having another new RM service that would 
periodically compact the concatenated files (i.e. copy and replace them) to 
clean up the blocks.  Ideally, this would be something that HDFS could add 
itself, and we wouldn't need this step.  However, [~kasha] and I talked with 
some HDFS folks and they're not sure this is something they want to put in 
HDFS.  In order to ensure that the compaction doesn't run while the NN is busy, 
they suggested having it triggered by a command that the admin runs (like 
what's done with HDFS balancing).  In the meantime, I think that's a better 
idea than having the RM do it automatically at arbitrary times.  If HDFS ever 
adds this in the future, this last step can easily be deprecated.
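
To make the first two steps concrete, here is a rough sketch of what 
"concat-friendly" could mean.  It is an illustration only: the record layout, 
the function names, and the in-memory dict standing in for the per-NM files on 
HDFS are all assumptions made for this example, not the format from the 
patches on this JIRA and not the RM Service itself.  The idea is that each 
per-node file is self-describing, so combining files is a plain append, and 
the combine step runs only once every expected NM has reported.

```python
import io
import struct

# Hypothetical record layout (an assumption for this sketch, NOT the format
# used in the YARN-2942 patches):
#   4-byte big-endian node-id length, node-id bytes,
#   8-byte big-endian payload length, payload bytes.
_NAME_LEN = struct.Struct(">I")
_PAYLOAD_LEN = struct.Struct(">Q")


def encode_node_log(node_id, payload):
    """Serialize one node's aggregated log as a self-describing record."""
    name = node_id.encode("utf-8")
    return _NAME_LEN.pack(len(name)) + name + _PAYLOAD_LEN.pack(len(payload)) + payload


def concat_when_done(per_node_files, expected_nodes):
    """Append the per-node files once every expected node has reported.

    Returns None while any expected node is still missing, mirroring the
    idea that concatenation waits until all NMs are done aggregating.
    """
    if not set(expected_nodes).issubset(per_node_files):
        return None
    # Plain byte appending: no record is rewritten or re-indexed.
    return b"".join(per_node_files[node] for node in sorted(expected_nodes))


def read_records(blob):
    """Yield (node_id, payload) pairs from a concatenated file."""
    stream = io.BytesIO(blob)
    while True:
        head = stream.read(_NAME_LEN.size)
        if not head:
            return
        (name_len,) = _NAME_LEN.unpack(head)
        node_id = stream.read(name_len).decode("utf-8")
        (size,) = _PAYLOAD_LEN.unpack(stream.read(_PAYLOAD_LEN.size))
        yield node_id, stream.read(size)
```

Because every record carries its own lengths, a reader can walk the combined 
file without a separate index, which is what makes a simple append sufficient 
in this sketch.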

I'll write a v8 document with the formal details and upload it sometime 
tomorrow.

> Aggregated Log Files should be combined
> ---------------------------------------
>
>                 Key: YARN-2942
>                 URL: https://issues.apache.org/jira/browse/YARN-2942
>             Project: Hadoop YARN
>          Issue Type: New Feature
>    Affects Versions: 2.6.0
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: CombinedAggregatedLogsProposal_v3.pdf, 
> CombinedAggregatedLogsProposal_v6.pdf, CombinedAggregatedLogsProposal_v7.pdf, 
> CompactedAggregatedLogsProposal_v1.pdf, 
> CompactedAggregatedLogsProposal_v2.pdf, 
> ConcatableAggregatedLogsProposal_v4.pdf, 
> ConcatableAggregatedLogsProposal_v5.pdf, YARN-2942-preliminary.001.patch, 
> YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, 
> YARN-2942.003.patch
>
>
> Turning on log aggregation allows users to easily store container logs in 
> HDFS and subsequently view them in the YARN web UIs from a central place.  
> Currently, there is a separate log file for each Node Manager.  This can be a 
> problem for HDFS if you have a cluster with many nodes as you’ll slowly start 
> accumulating many (possibly small) files per YARN application.  The current 
> “solution” for this problem is to configure YARN (actually the JHS) to 
> automatically delete these files after some amount of time.  
> We should improve this by compacting the per-node aggregated log files into 
> one log file per application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
