[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538087#comment-14538087 ]

Jason Lowe commented on YARN-2942:
----------------------------------

My apologies for taking so long to respond.  I took a look at the v6 and v7 
proposals.  If I understand them correctly, they both propose that the NMs 
upload the original per-node aggregated logs to HDFS and then something (either 
the NMs or the RM) later comes along and creates the aggregate-of-aggregates 
log, with a side index for faster searching and the ability to correct for 
failed appends.  These are reasonable ideas, and I prefer the simpler approach.  
However, I didn't see details on solving the race condition where a log reader 
comes along, sees from the index file that the desired log isn't in the 
aggregate-of-aggregates, then opens the per-node log and reads from it just as 
that log is deleted by the entity appending it to the aggregate-of-aggregates.  
Since we don't have UNIX-style refcounting of open files in HDFS, deleting the 
log while a reader has it open is going to be disruptive.
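To make the window concrete, here's a rough reader-side sketch.  The 
inAggregate flag stands in for the proposals' index lookup, and the class and 
method names are made up for illustration; only the FileSystem calls are real 
Hadoop APIs.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class RacyLogRead {
  // 'inAggregate' is the answer from the proposals' index lookup.
  static void readContainerLog(Configuration conf, Path perNodeLog,
      boolean inAggregate) throws Exception {
    if (!inAggregate) {
      // RACE: between the index check above and the open/read below, the
      // aggregator can append perNodeLog to the aggregate-of-aggregates
      // and then delete the original file.
      FileSystem fs = FileSystem.get(conf);
      try (FSDataInputStream in = fs.open(perNodeLog)) {
        // HDFS has no UNIX-style refcount on open files, so that delete
        // can pull the blocks out from under this reader mid-stream.
        IOUtils.copyBytes(in, System.out, conf, false);
      }
    }
  }
}
{code}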

One thing to consider in the proposals: do we want a size threshold above which 
we do not try to append a per-node log file to the aggregate-of-aggregates 
file?  We have an internal solution where we create per-application har files 
of the logs, and that process intentionally skips files that are already "big 
enough" on their own.  That saves significant time and network traffic by not 
re-copying files that are already beefy enough to justify their existence; 
we're primarily concerned with cleaning up the tiny per-node, per-app logs.
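The check itself could be as simple as the following sketch; the 256MB cutoff 
is a made-up value and would presumably be configurable.

{code:java}
import org.apache.hadoop.fs.FileStatus;

public class CombineFilter {
  // Assumed cutoff; 256MB here is purely illustrative.
  static final long SKIP_THRESHOLD_BYTES = 256L * 1024 * 1024;

  // Skip per-node logs that are already "big enough" to justify their own
  // existence; the goal is folding up the many tiny per-node, per-app files.
  static boolean shouldCombine(FileStatus perNodeLog) {
    return perNodeLog.getLen() < SKIP_THRESHOLD_BYTES;
  }
}
{code}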

Another issue we've seen in practice is that the proposals don't address the 
significant write load the per-node aggregate files place on the namenode.  
Addressing it isn't an absolute requirement for the design, but we've noticed 
the problem is not just the number of files and blocks being created; the 
overall write load associated with those files matters as well.  It would be 
really nice to reduce that load significantly.  Thinking off the top of my 
head, one possibility is to have the RM coordinate log aggregation across the 
nodes.  It would work something like this:
- NMs do not upload logs for an application to the aggregate file until told to 
do so by the RM (probably via the NM heartbeat response)
- NMs provide periodic progress reports in their heartbeats on how aggregation 
is proceeding and report when it succeeds or fails
- The RM coordinates and tracks the aggregation process (which NM is "active", 
revoking NMs that have taken too long without progress, etc.); see the sketch 
after this list
- Logs would remain on the NM's local disk and be served from there until they 
are uploaded into the app aggregate file, similar to how they work today with 
the per-node aggregate file
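A rough sketch of the RM-side bookkeeping for one application follows.  Every 
class, field, and method name here is hypothetical; none of this is an 
existing YARN API.

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

public class AppAggregationTracker {
  private final Queue<String> pending = new ArrayDeque<>();  // NMs waiting
  private String active;             // NM currently allowed to upload
  private long lastProgressMillis;

  public synchronized void addNode(String nm) { pending.add(nm); }

  // Called while building an NM's heartbeat response: grant the upload slot
  // to one NM at a time; the rest keep serving logs from local disk.
  public synchronized boolean mayAggregate(String nm, long now) {
    if (active == null && nm.equals(pending.peek())) {
      active = pending.poll();
      lastProgressMillis = now;
    }
    return nm.equals(active);
  }

  // Progress/success reports arrive in later heartbeats.
  public synchronized void onProgress(String nm, boolean done, long now) {
    if (!nm.equals(active)) {
      return;
    }
    lastProgressMillis = now;
    if (done) {
      active = null;                 // next heartbeat grants the next NM
    }
  }

  // Revoke an NM that has gone too long without progress so another node
  // can take over the serialized upload.
  public synchronized void checkTimeout(long now, long limitMillis) {
    if (active != null && now - lastProgressMillis > limitMillis) {
      pending.add(active);           // let it retry later
      active = null;
    }
  }
}
{code}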

This has the advantage of uploading the logs to HDFS only once, as a single 
aggregate file (plus index), and it doesn't require ZooKeeper.  A significant 
downside is that it prolongs the average time until an application's logs 
become available on HDFS, due to the serialized upload process.

> Aggregated Log Files should be combined
> ---------------------------------------
>
>                 Key: YARN-2942
>                 URL: https://issues.apache.org/jira/browse/YARN-2942
>             Project: Hadoop YARN
>          Issue Type: New Feature
>    Affects Versions: 2.6.0
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: CombinedAggregatedLogsProposal_v3.pdf, 
> CombinedAggregatedLogsProposal_v6.pdf, CombinedAggregatedLogsProposal_v7.pdf, 
> CompactedAggregatedLogsProposal_v1.pdf, 
> CompactedAggregatedLogsProposal_v2.pdf, 
> ConcatableAggregatedLogsProposal_v4.pdf, 
> ConcatableAggregatedLogsProposal_v5.pdf, YARN-2942-preliminary.001.patch, 
> YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, 
> YARN-2942.003.patch
>
>
> Turning on log aggregation allows users to easily store container logs in 
> HDFS and subsequently view them in the YARN web UIs from a central place.  
> Currently, there is a separate log file for each Node Manager.  This can be a 
> problem for HDFS if you have a cluster with many nodes as you’ll slowly start 
> accumulating many (possibly small) files per YARN application.  The current 
> “solution” for this problem is to configure YARN (actually the JHS) to 
> automatically delete these files after some amount of time.  
> We should improve this by compacting the per-node aggregated log files into 
> one log file per application.



