[ https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131892#comment-14131892 ]

Xuan Gong commented on YARN-2468:
---------------------------------

After more investigation and some offline discussion, it turns out this is a
really hard problem, so we decided to solve it step by step.

For the first step, we will stick to the original proposal and change the log
layout: create a directory named after the node ID of the NM. Under this
directory, every time AppLogAggregatorImpl starts to upload container logs, it
will create a new file (named node_id + timestamp).
This approach will increase the number of log files, but it should work fine
for a small cluster.
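A minimal sketch of the first-step layout, to make the naming concrete. It assumes the per-NM directory sits under the application's existing remote log directory, and that node IDs look like "host:port" with ':' swapped for '_' to keep names filesystem-safe; both are assumptions, and the class and method names are hypothetical, not taken from the attached patch.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical sketch of the proposed per-NM log layout (not the actual patch). */
public class PerNodeLogLayout {

    // Assumption: node IDs are "host:port"; replace ':' so the ID is a safe path component.
    static String nodeString(String nodeId) {
        return nodeId.replace(':', '_');
    }

    // One directory per NM, named after its node ID, under the app's log directory.
    static Path nodeDir(Path appLogDir, String nodeId) {
        return appLogDir.resolve(nodeString(nodeId));
    }

    // Each upload cycle writes a new file named node ID + timestamp.
    static Path uploadFile(Path appLogDir, String nodeId, long timestampMillis) {
        return nodeDir(appLogDir, nodeId)
                .resolve(nodeString(nodeId) + "_" + timestampMillis);
    }

    public static void main(String[] args) {
        // Illustrative application log directory; the real root is configurable.
        Path appDir = Paths.get("/tmp/logs/alice/logs/application_1410000000000_0001");
        // On POSIX prints .../application_1410000000000_0001/host1_45454/host1_45454_1410500000000
        System.out.println(uploadFile(appDir, "host1:45454", 1410500000000L));
    }
}
```

Because every upload cycle produces a distinct file, nothing already uploaded has to be rewritten, which is why this layout trades file count for simplicity.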

For the next step, we need to find a way to handle the logs more efficiently.
We would like to aggregate the logs of all containers that belong to the same
NM into a single file. That way, the total number of log files is bounded. But
we need to find a more scalable way than TFile to do it. Will open a separate
ticket for this.
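A back-of-the-envelope sketch of why the two approaches differ for a single long-running application: with one file per upload cycle the count grows with uptime, while one file per NM stays fixed. The cluster size and hourly upload interval below are hypothetical numbers chosen only for illustration.

```java
/** Hypothetical file-count comparison for one long-running application. */
public class LogFileCount {

    // First step: every NM creates a new file per upload cycle, so the count grows with uptime.
    static long perUploadFiles(int nodes, long uploadCycles) {
        return (long) nodes * uploadCycles;
    }

    // Follow-up idea: one file per NM, independent of how many cycles have run.
    static long perNodeFiles(int nodes, long uploadCycles) {
        return nodes;
    }

    public static void main(String[] args) {
        int nodes = 100;             // hypothetical cluster size
        long cycles = 30L * 24;      // hypothetical: hourly uploads for 30 days
        System.out.println(perUploadFiles(nodes, cycles)); // 72000
        System.out.println(perNodeFiles(nodes, cycles));   // 100
    }
}
```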


> Log handling for LRS
> --------------------
>
>                 Key: YARN-2468
>                 URL: https://issues.apache.org/jira/browse/YARN-2468
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: log-aggregation, nodemanager, resourcemanager
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>         Attachments: YARN-2468.1.patch
>
>
> Currently, when an application finishes, the NM starts log aggregation. But
> for long-running service (LRS) applications, this is not ideal.
> The problems we have are:
> 1) LRS applications are expected to run for a long time (weeks, months).
> 2) Currently, all the container logs (from one NM) are written into a
> single file, which could keep growing larger and larger.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
