[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540407#comment-14540407
 ] 

Jason Lowe commented on YARN-2942:
----------------------------------

bq. Can you give some more details on this? Is it something you can share?

It's a hack to help mitigate the log aggregation namespace scaling issues on 
our large clusters.  Essentially it's a periodic process that runs an Oozie 
workflow that does the following:

# determines which applications are good candidates for log archiving (i.e., 
many files but a small total size)
# runs a streaming job with a shell script that takes the list of applications 
to archive as input
# for each application it runs a local-mode archive job to archive the log 
contents
# when the archive has been created it swaps out the application directory with 
a symlink into the har archive
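
As a rough illustration of the first step only, candidate selection could look 
something like the sketch below.  This is not the actual script: the 
AppLogStats record, the function name, and both thresholds are hypothetical 
and would need tuning per cluster.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AppLogStats:
    """Summary of one application's aggregated log directory (hypothetical)."""
    app_id: str
    file_count: int
    total_bytes: int


def select_archive_candidates(
    stats: Iterable[AppLogStats],
    min_files: int = 50,               # assumed threshold, not from the real script
    max_total_bytes: int = 1024 ** 3,  # assumed threshold (1 GiB)
) -> List[str]:
    """Return IDs of applications with many log files but a modest total size."""
    return [
        s.app_id
        for s in stats
        if s.file_count >= min_files and s.total_bytes <= max_total_bytes
    ]
```

The resulting list of application IDs is what would be fed to the streaming 
job as input.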

The symlink makes the archive transparent to the readers.  Both the JHS and the 
"yarn logs" command use FileContext and "just worked" with the symlink into the 
har without modifications.

So yes, we are running a MapReduce job to archive the logs, which itself 
creates more logs.  However, each archiving job processes the logs of many 
applications.  If there is sufficient interest we can look into sharing it, 
but the script is specific to how we configure our nodes and clusters and 
relies on unsupported symlinks.  I'm hoping the outcome of this JIRA allows us 
to move away from the need for it.

bq. We'd have to implement your last bullet point to have the NMs serve the 
logs in the meantime, as I don't think that's there today. 

That feature is indeed there today.  Links to the app logs on the NM will try 
to serve the local app logs first, then redirect to the log server if the local 
logs are unavailable.  See NMController and ContainerLogsPage.  It only becomes 
an issue when something links directly to the aggregated log server before the 
NM has finished aggregating the logs.
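
The local-first behavior described above can be sketched roughly as follows.  
This is only an illustration of the decision, not the NM code itself: the 
function name, the set of locally available containers, and the redirect URL 
layout are all assumptions.

```python
from typing import Set, Tuple


def resolve_log_request(
    container_id: str,
    local_containers: Set[str],   # containers whose logs are still on this node
    log_server_url: str,          # base URL of the aggregated log server
) -> Tuple[str, str]:
    """Serve local logs when present, otherwise redirect to the log server."""
    if container_id in local_containers:
        return ("serve_local", container_id)
    # Hypothetical URL layout; the real redirect target is built by the NM.
    return ("redirect", f"{log_server_url.rstrip('/')}/{container_id}")
```

A link that goes straight to the log server skips the first branch, which is 
why it can fail while aggregation is still in progress.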

> Aggregated Log Files should be combined
> ---------------------------------------
>
>                 Key: YARN-2942
>                 URL: https://issues.apache.org/jira/browse/YARN-2942
>             Project: Hadoop YARN
>          Issue Type: New Feature
>    Affects Versions: 2.6.0
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: CombinedAggregatedLogsProposal_v3.pdf, 
> CombinedAggregatedLogsProposal_v6.pdf, CombinedAggregatedLogsProposal_v7.pdf, 
> CompactedAggregatedLogsProposal_v1.pdf, 
> CompactedAggregatedLogsProposal_v2.pdf, 
> ConcatableAggregatedLogsProposal_v4.pdf, 
> ConcatableAggregatedLogsProposal_v5.pdf, YARN-2942-preliminary.001.patch, 
> YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, 
> YARN-2942.003.patch
>
>
> Turning on log aggregation allows users to easily store container logs in 
> HDFS and subsequently view them in the YARN web UIs from a central place.  
> Currently, there is a separate log file for each Node Manager.  This can be a 
> problem for HDFS if you have a cluster with many nodes as you’ll slowly start 
> accumulating many (possibly small) files per YARN application.  The current 
> “solution” for this problem is to configure YARN (actually the JHS) to 
> automatically delete these files after some amount of time.  
> We should improve this by compacting the per-node aggregated log files into 
> one log file per application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
