[ https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152532#comment-14152532 ]

Zhijie Shen commented on YARN-2583:
-----------------------------------

Some thoughts about the log deletion service of LRS:

1. I'm not sure it's a good idea to do normal log deletion in
AggregatedLogDeletionService while deleting rolling logs in
AppLogAggregatorImpl. AggregatedLogDeletionService (inside the JHS) will still
try to delete the whole app log dir while the LRS is still running.

2. Retention is usually done by time rather than by size (i.e., the number of
log files), and the two policies are inconsistent between
AggregatedLogDeletionService and AppLogAggregatorImpl. While
AggregatedLogDeletionService keeps all logs newer than T1, AppLogAggregatorImpl
may already have deleted logs newer than T1 to cap the number of logs of the
LRS. As a result, it becomes unpredictable how long logs remain available for
access.

3. Another problem with NM_LOG_AGGREGATION_RETAIN_RETENTION_SIZE_PER_APP is
that the config favors a longer rollingIntervalSeconds. For example, let
NM_LOG_AGGREGATION_RETAIN_RETENTION_SIZE_PER_APP = 10. If an LRS sets
rollingIntervalSeconds = 1D, after 10D it still keeps all of its logs.
However, if the LRS sets rollingIntervalSeconds = 0.5D, after 10D it can only
keep the last 5D of logs, even though the amount of generated logs is the same.

4. Assuming we do the deletion in AppLogAggregatorImpl, should we delete first
and upload next, so that the number of logs never temporarily exceeds the cap?
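
The arithmetic behind points 2 to 4 can be made concrete with a small toy
model. This is a hypothetical Python sketch, not YARN code: it assumes exactly
one log file is uploaded per rolling interval, measures time in days, and uses
made-up helper names (upload_times, keep_by_time, keep_by_count,
retained_window_days, peak_file_count). keep_by_time is a simplified per-file
view of time-based retention, not AggregatedLogDeletionService's actual
per-directory logic.

```python
from __future__ import annotations

# Hypothetical toy model: one log file is rolled and uploaded per
# rollingIntervalSeconds (expressed here in days); names are illustrative.


def upload_times(interval_days: float, elapsed_days: float) -> list[float]:
    """Upload time (days since the LRS started) of each rolled log file."""
    count = int(elapsed_days / interval_days)
    return [interval_days * (i + 1) for i in range(count)]


def keep_by_time(times: list[float], now: float, max_age_days: float) -> list[float]:
    """Time-based retention: keep every file newer than the cut-off (point 2)."""
    cutoff = now - max_age_days
    return [t for t in times if t > cutoff]


def keep_by_count(times: list[float], retain_size: int) -> list[float]:
    """Count-based retention: keep only the newest retain_size files."""
    return sorted(times)[-retain_size:] if retain_size > 0 else []


def retained_window_days(kept: list[float], interval_days: float, now: float) -> float:
    """How far back the kept files reach; each file covers one rolling interval."""
    if not kept:
        return 0.0
    return now - (min(kept) - interval_days)


def peak_file_count(existing: int, retain_size: int, delete_first: bool) -> int:
    """Most files present at once while one new file is rolled in (point 4)."""
    if delete_first:
        # Trim to leave room for the incoming file, then upload it.
        after_delete = min(existing, retain_size - 1)
        return max(existing, after_delete + 1)
    # Upload first; the cap is only enforced after the new file lands.
    return existing + 1


if __name__ == "__main__":
    cap, now = 10, 10.0  # cap stands in for NM_LOG_AGGREGATION_RETAIN_RETENTION_SIZE_PER_APP
    # Point 3: the same cap covers 10 days at a 1-day interval, 5 days at 0.5 days.
    print(retained_window_days(keep_by_count(upload_times(1.0, now), cap), 1.0, now))  # 10.0
    print(retained_window_days(keep_by_count(upload_times(0.5, now), cap), 0.5, now))  # 5.0
    # Point 2: files newer than a 7-day cut-off that the count cap already removed.
    times = upload_times(0.5, now)
    print(sorted(set(keep_by_time(times, now, 7.0)) - set(keep_by_count(times, cap))))
    # -> [3.5, 4.0, 4.5, 5.0]
    # Point 4: upload-then-delete briefly exceeds the cap; delete-then-upload does not.
    print(peak_file_count(cap, cap, delete_first=False))  # 11
    print(peak_file_count(cap, cap, delete_first=True))   # 10
```

In this model the retained window is simply cap x rolling interval, so halving
the interval halves how far back the logs reach. Deleting first keeps the count
at or below the cap throughout; the trade-off (not modeled above) is that the
oldest file is already gone if the subsequent upload fails.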

> Modify the LogDeletionService to support Log aggregation for LRS
> ----------------------------------------------------------------
>
>                 Key: YARN-2583
>                 URL: https://issues.apache.org/jira/browse/YARN-2583
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, resourcemanager
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>         Attachments: YARN-2583.1.patch
>
>
> Currently, AggregatedLogDeletionService deletes old logs from HDFS. It
> checks the cut-off time: if all logs for an application are older than
> this cut-off time, the app-log-dir is deleted from HDFS. This does not
> work for LRS, because an LRS application is expected to keep running for a
> long time. There are two different scenarios:
> 1) If rollingIntervalSeconds is configured, new log files are
> continuously uploaded to HDFS. The number of log files for this application
> keeps growing, and no log files are ever deleted.
> 2) If rollingIntervalSeconds is not configured, the log file can only
> be uploaded to HDFS after the application finishes. It is quite possible
> that the logs are uploaded after the cut-off time. This causes a problem
> because by then the app-log-dir for this application in HDFS has already
> been deleted.
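
The cut-off rule in the quoted description, and why both scenarios break it,
can be sketched as a small model. This is a hypothetical Python sketch, not
the actual AggregatedLogDeletionService code: AppLogDir and should_delete are
illustrative names, times are in days, and it assumes the app-log-dir exists
from application start, so a directory with no uploaded files yet counts as
having "all logs older than the cut-off" once the directory itself is older.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AppLogDir:
    """Hypothetical stand-in for one application's log dir on HDFS."""
    created_at: float  # days; assumed to be the application start time
    file_times: list[float] = field(default_factory=list)  # upload time of each file


def should_delete(app_dir: AppLogDir, now: float, retain_days: float) -> bool:
    """Delete only if every log file is older than the cut-off.

    Assumption: an empty dir qualifies once the dir itself is older than the
    cut-off, since all() over no files is vacuously True.
    """
    cutoff = now - retain_days
    return app_dir.created_at < cutoff and all(t < cutoff for t in app_dir.file_times)


if __name__ == "__main__":
    now, retain_days = 30.0, 7.0
    # Scenario 1: a file is rolled in every day, so the newest is always recent;
    # the dir is never eligible and the file count only grows.
    rolling = AppLogDir(created_at=0.0, file_times=[float(d) for d in range(1, 31)])
    print(should_delete(rolling, now, retain_days), len(rolling.file_times))  # False 30
    # Scenario 2: nothing is uploaded until the LRS finishes, so the still-empty
    # dir is deleted while the application is running.
    no_rolling = AppLogDir(created_at=0.0)
    print(should_delete(no_rolling, now, retain_days))  # True
```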



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
