[ https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131892#comment-14131892 ]
Xuan Gong commented on YARN-2468:
---------------------------------
After further investigation and offline discussion, this turns out to be a
really hard problem, so we have decided to solve it step by step.
For the first step, we will stick to the original proposal and change the log
layout: create a directory named after the node ID of the NM, and every time
AppLogAggregatorImpl starts to upload container logs, it will create a file
under this directory named node_id + timestamp.
This method will increase the number of log files, but it will work fine for a
small cluster.
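
To make the proposed layout concrete, here is a minimal sketch of how such a
path could be built. The function names, the application log root, and the
replacement of ":" with "_" in the node ID are assumptions for illustration
only; they are not taken from the patch.

```python
import posixpath


def node_dir_name(node_id: str) -> str:
    # Assumption: ':' in "host:port" is replaced so the name is a safe path component.
    return node_id.replace(":", "_")


def upload_file_path(app_log_root: str, node_id: str, timestamp_ms: int) -> str:
    # One directory per NM; one new file per upload cycle, named node_id + timestamp.
    node_dir = node_dir_name(node_id)
    return posixpath.join(app_log_root, node_dir, f"{node_dir}_{timestamp_ms}")


# Hypothetical application log root and node ID, for illustration only.
print(upload_file_path("/app-logs/app_0001", "host1:45454", 1410000000000))
```

Each upload cycle produces a distinct file because the timestamp changes,
which is why the file count grows over the lifetime of the application.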
For the next step, we need to find a way to handle the logs more efficiently.
We would like to aggregate the logs of all containers that belong to the same
NM into a single file, so that the total number of log files is bounded. But
we need to find a more scalable way than TFile to do it. I will open a
separate ticket for this.
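
As a rough illustration of why the file count matters, the sketch below
compares the two layouts for one application. The cluster size and upload
interval are made-up numbers, and the function names are hypothetical.

```python
def files_with_per_upload_layout(num_nodes: int, uploads_per_node: int) -> int:
    # First step: every upload cycle on every NM creates a new file.
    return num_nodes * uploads_per_node


def files_with_single_file_layout(num_nodes: int) -> int:
    # Next step: one aggregated file per NM, regardless of how long the app runs.
    return num_nodes


# Assumed example: 10 NMs, hourly uploads, application running for 30 days.
uploads = 24 * 30
print(files_with_per_upload_layout(10, uploads))   # 7200
print(files_with_single_file_layout(10))           # 10
```

The first count grows with the running time of the application, while the
second depends only on the number of NMs.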
> Log handling for LRS
> --------------------
>
> Key: YARN-2468
> URL: https://issues.apache.org/jira/browse/YARN-2468
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: log-aggregation, nodemanager, resourcemanager
> Reporter: Xuan Gong
> Assignee: Xuan Gong
> Attachments: YARN-2468.1.patch
>
>
> Currently, when an application is finished, the NM will start to do the log
> aggregation. But for long-running service (LRS) applications, this is not ideal.
> The problems we have are:
> 1) LRS applications are expected to run for a long time (weeks, months).
> 2) Currently, all the container logs (from one NM) will be written into a
> single file. The files could become larger and larger.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)