Zhijie Shen commented on YARN-2468:

I'm afraid it may not be fair to compare log file creation between a single
long-running service and a short-term application.

I'm thinking about the file problem from a different angle. Let's count how
many log files are created across a YARN cluster. For example, suppose a
long-running service takes 10% of the cluster's resources, runs for 10 days,
and produces 1 log file per day. On the other side, suppose a normal
application also takes 10% of the cluster's resources, runs for 1 day, and
produces 1 log file. If that application is started every day, then over
10 days the long-running service and the 10 runs of the application each
produce 10 log files.

So from the cluster's point of view, the number of log files is proportional
to resource usage rather than to the number of applications: similar resource
usage results in a similar number of log files. The situation may therefore
not get any worse when we look at the whole cluster. However, I agree that we
lose the opportunity to have a long-running service write a single log file,
which would reduce the total number of log files.
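The arithmetic in the example can be sketched as a small model. It assumes
that each NodeManager uploads one aggregated file per roll interval per run.
A short application is treated as rolling once, when it finishes, so its roll
interval equals its lifetime. The names Workload and aggregated_log_files are
hypothetical and exist only for this illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical workload model, used only to illustrate the counting."""
    runs: int                  # number of application instances started
    days_per_run: float        # how long each instance runs
    roll_interval_days: float  # LRS: upload interval; short app: its lifetime
    nodes: int = 1             # NMs the workload occupies (its cluster share)


def aggregated_log_files(w: Workload) -> int:
    # Assumption: each NM writes one aggregated file per roll interval per
    # run, and a partial final interval still produces a file.
    per_run = math.ceil(w.days_per_run / w.roll_interval_days)
    return w.runs * w.nodes * per_run


# The example above: the same 10% share of the cluster over 10 days.
lrs = Workload(runs=1, days_per_run=10, roll_interval_days=1)
daily_app = Workload(runs=10, days_per_run=1, roll_interval_days=1)
# The lost opportunity: an LRS that keeps appending to a single file.
single_file_lrs = Workload(runs=1, days_per_run=10, roll_interval_days=10)

print(aggregated_log_files(lrs))              # 10
print(aggregated_log_files(daily_app))        # 10
print(aggregated_log_files(single_file_lrs))  # 1
```

Under these assumptions, the file count depends only on how long and how
widely resources are held, not on how many applications hold them. Stretching
the service's roll interval toward its lifetime drives its count toward 1.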

To completely resolve the too-many-files problem, we may consider the timeline
server, which has a store layer that handles the actual I/O on your behalf.
Another optimization may be log retention; I'm not sure whether that feature
already exists or has been proposed as part of this solution.

> Log handling for LRS
> --------------------
>                 Key: YARN-2468
>                 URL: https://issues.apache.org/jira/browse/YARN-2468
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: log-aggregation, nodemanager, resourcemanager
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>         Attachments: YARN-2468.1.patch
> Currently, when an application finishes, the NM starts log aggregation.
> But for long-running service applications, this is not ideal.
> The problems we have are:
> 1) LRS applications are expected to run for a long time (weeks, months).
> 2) Currently, all the container logs (from one NM) will be written into a 
> single file. The files could become larger and larger.

This message was sent by Atlassian JIRA