[jira] [Updated] (YARN-11188) Only files belong to the first file controller are removed even if multiple log aggregation file controllers are configured

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11188:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Only files belong to the first file controller are removed even if multiple 
> log aggregation file controllers are configured
> ---
>
> Key: YARN-11188
> URL: https://issues.apache.org/jira/browse/YARN-11188
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.4.0
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Log aggregation can be configured with a comma-separated list of file 
> controllers.
> The current behaviour only removes files that belong to the first file 
> controller.
> This can be problematic.
> For example, if a user configures IFile as the file controller and later 
> changes the setting to specify multiple file controllers (e.g. 
> value = TFile,IFile), only the first controller is considered and only the 
> files belonging to it are removed. In this case, files written by the TFile 
> controller are removed, while files created with the IFile controller are 
> kept.
> This behaviour should be changed so that the files of every configured file 
> controller are removed when multiple file controllers are enabled.
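The scenario above can be modelled in a few lines. This is a minimal Java sketch, not Hadoop code: `CleanupScenario`, `parseControllers` and `survivors` are hypothetical names, and each "file" is just a string tagged with the format of the controller that wrote it (e.g. `IFile:app_1`). It contrasts cleaning only the first configured controller with cleaning all of them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the issue scenario; not part of Hadoop.
class CleanupScenario {

  // Split a comma-separated controller list such as "TFile,IFile",
  // trimming whitespace and skipping empty entries.
  static List<String> parseControllers(String value) {
    List<String> result = new ArrayList<>();
    for (String part : value.split(",")) {
      String name = part.trim();
      if (!name.isEmpty()) {
        result.add(name);
      }
    }
    return result;
  }

  // Returns the files that survive a cleanup pass. Each file must be tagged
  // as "<format>:<name>"; files whose format was cleaned are dropped.
  static List<String> survivors(List<String> files, List<String> cleanedFormats) {
    List<String> kept = new ArrayList<>();
    for (String file : files) {
      String format = file.substring(0, file.indexOf(':'));
      if (!cleanedFormats.contains(format)) {
        kept.add(file);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // Older files written while only IFile was configured, plus a newer TFile file.
    List<String> files = List.of("IFile:app_1", "IFile:app_2", "TFile:app_3");
    List<String> configured = parseControllers("TFile,IFile");

    // Current behaviour: only the first configured controller is cleaned.
    System.out.println(survivors(files, configured.subList(0, 1))); // [IFile:app_1, IFile:app_2]
    // Desired behaviour: every configured controller is cleaned.
    System.out.println(survivors(files, configured)); // []
  }
}
```

The IFile files survive the first pass only because IFile is no longer first in the list, which is exactly the leak the issue describes.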
> h2. CODE PATH
> 
> 1. 
> [AggregatedLogDeletionService$LogDeletionTask#run|https://github.com/apache/hadoop/blob/d336227e5c63a70db06ac26697994c96ed89d230/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogDeletionService.java#L82-L108]:
>  
> Let's understand what this method does.
> 1.1 An important detail is how the value of the field called 
> 'retentionMillis' is set. The constructor of LogDeletionTask receives a 
> parameter called 'retentionSecs', which is simply multiplied by 1000 to get 
> a millisecond value.
> Let's see where 'retentionSecs' comes from.
> 1.2 
> [AggregatedLogDeletionService#scheduleLogDeletionTask|https://github.com/apache/hadoop/blob/d336227e5c63a70db06ac26697994c96ed89d230/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogDeletionService.java#L258-L283]
>  is where the value of retentionSecs is set.
> The config key for this value is 'yarn.log-aggregation.retain-seconds'.
> The javadoc says: "How long to wait before deleting aggregated logs, -1 
> disables. Be careful set this too small and you will spam the name node."
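Steps 1.1 and 1.2 together amount to "read a seconds value, treat -1 as disabled, convert to milliseconds". A minimal Java sketch of that flow follows; it uses a plain `Map<String, String>` instead of Hadoop's `Configuration`, and it assumes (not verified against the linked code) that a missing key means -1 and that any negative value disables deletion. `RetentionConfig` and `retentionMillis` are hypothetical names.

```java
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical sketch of steps 1.1-1.2; a Map stands in for Hadoop's Configuration.
class RetentionConfig {
  static final String RETAIN_SECONDS_KEY = "yarn.log-aggregation.retain-seconds";

  // Returns the retention in milliseconds, or an empty value when deletion is
  // disabled. Assumptions: a missing key defaults to -1, and any negative
  // value is treated as "disabled".
  static OptionalLong retentionMillis(Map<String, String> conf) {
    long retentionSecs = Long.parseLong(conf.getOrDefault(RETAIN_SECONDS_KEY, "-1").trim());
    if (retentionSecs < 0) {
      return OptionalLong.empty();
    }
    // Same conversion as described in step 1.1: seconds * 1000.
    return OptionalLong.of(retentionSecs * 1000L);
  }

  public static void main(String[] args) {
    System.out.println(retentionMillis(Map.of(RETAIN_SECONDS_KEY, "604800"))); // OptionalLong[604800000]
    System.out.println(retentionMillis(Map.of())); // OptionalLong.empty
  }
}
```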
> 1.3 Going back to 
> [https://github.com/apache/hadoop/blob/d336227e5c63a70db06ac26697994c96ed89d230/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogDeletionService.java#L82-L108],
>  the 'cutOffMillis' value is computed as the current time in milliseconds 
> minus retentionMillis.
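The cut-off arithmetic from step 1.3 is small enough to show directly. In this hedged Java sketch, `CutoffCheck` and its methods are hypothetical, and the eligibility rule (modification time strictly older than the cut-off) is an assumption about how the cut-off is used, not a quote of the linked code.

```java
// Hypothetical sketch of step 1.3; not Hadoop code.
class CutoffCheck {

  // cutOffMillis = current time - retentionMillis.
  static long cutOffMillis(long nowMillis, long retentionMillis) {
    return nowMillis - retentionMillis;
  }

  // Assumption: an entry is eligible for deletion when its modification
  // time is strictly older than the cut-off.
  static boolean isExpired(long modificationTimeMillis, long cutOffMillis) {
    return modificationTimeMillis < cutOffMillis;
  }

  public static void main(String[] args) {
    long dayMillis = 24L * 60 * 60 * 1000;
    long retentionMillis = 7 * dayMillis; // e.g. retain-seconds = 604800
    long cutOff = cutOffMillis(System.currentTimeMillis(), retentionMillis);
    long eightDaysAgo = System.currentTimeMillis() - 8 * dayMillis;
    System.out.println("8-day-old entry expired: " + isExpired(eightDaysAgo, cutOff));
  }
}
```

Passing the current time in as a parameter keeps the calculation deterministic and easy to test.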
> 1.4 The main purpose of this method is to iterate over the entries in the 
> remote root log dir (field called 'remoteRootLogDir') and check whether each 
> one is a directory. If so, a new Path is created for that directory ([code 
> link|https://github.com/apache/hadoop/blob/d336227e5c63a70db06ac26697994c96ed89d230/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogDeletionService.java#L90-L96]).
> One more important thing to mention: there's a field called 'suffix' that is 
> appended when this path is built.
> Let's check how the 'remoteRootLogDir' and 'suffix' fields get their values, 
> as this is crucial to understanding how the log dirs are deleted.
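The directory walk in step 1.4 can be sketched on a local filesystem. This is an illustration under stated assumptions: `java.nio.file` stands in for Hadoop's `FileSystem` API, the `<root>/<user>/<suffix>` layout is our reading of the step, and `UserDirScan`, `dirsToInspect` and the "logs" suffix are hypothetical. Each returned path is where the deletion task would then compare modification times against the cut-off.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of step 1.4; java.nio stands in for Hadoop's FileSystem.
class UserDirScan {

  // For every directory directly under the root, append the suffix to build
  // the path whose contents would be checked against the cut-off. Plain
  // files under the root are ignored.
  static List<Path> dirsToInspect(Path remoteRootLogDir, String suffix) throws IOException {
    List<Path> result = new ArrayList<>();
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(remoteRootLogDir)) {
      for (Path entry : entries) {
        if (Files.isDirectory(entry)) {
          result.add(entry.resolve(suffix));
        }
      }
    }
    Collections.sort(result); // directory listing order is not guaranteed
    return result;
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("remote-app-logs");
    Files.createDirectory(root.resolve("alice"));
    Files.createDirectory(root.resolve("bob"));
    Files.createFile(root.resolve("not-a-user-dir.txt"));
    for (Path p : dirsToInspect(root, "logs")) {
      System.out.println(p);
    }
  }
}
```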
> 1.5 remoteRootLogDir is set in the constructor of LogDeletionTask, 
> [here|https://github.com/apache/hadoop/blob/d336227e5c63a70db06ac26697994c96ed89d230/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogDeletionService.java#L77].
> Its value comes from calling fileController.getRemoteRootLogDir().
> The LogAggregationFileController instance itself is created by 
> LogAggregationFileControllerFactory.
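Since the root dir and suffix come from a single controller instance, a deletion task built from one controller only ever sees that controller's location. The hedged Java sketch below models this; `DeletionRootsSketch` and its nested `Controller` are hypothetical stand-ins (not Hadoop classes), and the root and suffix values in `main` are purely illustrative. It assumes, as the issue describes, that only the first configured controller is used today.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of step 1.5; none of these types are Hadoop classes.
class DeletionRootsSketch {

  // Stand-in for a file controller: only the location data the deletion
  // task reads from it.
  static final class Controller {
    final String format;
    final String remoteRootLogDir;
    final String suffix;

    Controller(String format, String remoteRootLogDir, String suffix) {
      this.format = format;
      this.remoteRootLogDir = remoteRootLogDir;
      this.suffix = suffix;
    }

    // Location pattern a deletion task would scan: <root>/<user>/<suffix>.
    String scanPattern() {
      return remoteRootLogDir + "/<user>/" + suffix;
    }
  }

  // A deletion task built from the first configured controller only.
  static List<String> scannedByFirstOnly(List<Controller> controllers) {
    List<String> patterns = new ArrayList<>();
    if (!controllers.isEmpty()) {
      patterns.add(controllers.get(0).scanPattern());
    }
    return patterns;
  }

  // One deletion pass per configured controller.
  static List<String> scannedByAll(List<Controller> controllers) {
    List<String> patterns = new ArrayList<>();
    for (Controller c : controllers) {
      patterns.add(c.scanPattern());
    }
    return patterns;
  }

  public static void main(String[] args) {
    // Illustrative values only; real roots and suffixes come from configuration.
    List<Controller> controllers = List.of(
        new Controller("TFile", "/tmp/logs", "logs-tfile"),
        new Controller("IFile", "/tmp/logs", "logs-ifile"));
    System.out.println(scannedByFirstOnly(controllers));
    System.out.println(scannedByAll(controllers));
  }
}
```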
> 
> *The process of determining the log aggregation file controller is quite 
> messy, so let me describe it in detail.*
> *There are 2 types of file controllers: LogAggregationIndexedFileController 
> and LogAggregationTFileController*
> *The test case 
> [TestLogAggregationFileControllerFactory#testLogAggregationFileControllerFactory|#testLogAggregationFileControllerFactory]
>  shows how the LogAggregationFileControllerFactory is configured.*
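To make the configuration side concrete, here is a hedged Java sketch of resolving the ordered list of controller classes. To our understanding, YARN reads the comma-separated format names from `yarn.log-aggregation.file-formats` and each implementation class from `yarn.log-aggregation.file-controller.<format>.class`; check yarn-default.xml for your version. `ControllerClassResolver` is hypothetical, the "TFile" default is an assumption, and unlike the real factory this sketch simply rejects a format with no class key instead of applying any fallback.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; a Map stands in for Hadoop's Configuration.
class ControllerClassResolver {
  static final String FILE_FORMATS_KEY = "yarn.log-aggregation.file-formats";
  static final String CONTROLLER_CLASS_FMT = "yarn.log-aggregation.file-controller.%s.class";

  // Returns format name -> class name, preserving the configured order.
  // Assumption: a missing formats key defaults to "TFile".
  static Map<String, String> resolve(Map<String, String> conf) {
    Map<String, String> result = new LinkedHashMap<>();
    String formats = conf.getOrDefault(FILE_FORMATS_KEY, "TFile");
    for (String raw : formats.split(",")) {
      String format = raw.trim();
      if (format.isEmpty()) {
        continue;
      }
      String key = String.format(CONTROLLER_CLASS_FMT, format);
      String className = conf.get(key);
      if (className == null) {
        throw new IllegalArgumentException("No class configured for " + key);
      }
      result.put(format, className);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(FILE_FORMATS_KEY, "TFile,IFile");
    conf.put(String.format(CONTROLLER_CLASS_FMT, "TFile"),
        "org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.LogAggregationTFileController");
    conf.put(String.format(CONTROLLER_CLASS_FMT, "IFile"),
        "org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController");
    resolve(conf).forEach((format, cls) -> System.out.println(format + " -> " + cls));
  }
}
```

Keeping the configured order matters here: it is what makes "the first controller" well-defined in the first place.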
> 2.1 First, some important configs:
> 2.1.1

[jira] [Updated] (YARN-11188) Only files belong to the first file controller are removed even if multiple log aggregation file controllers are configured

2022-06-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-11188:
--
Labels: pull-request-available  (was: )

> Only files belong to the first file controller are removed even if multiple 
> log aggregation file controllers are configured
> ---
>
> Key: YARN-11188
> URL: https://issues.apache.org/jira/browse/YARN-11188
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>

[jira] [Updated] (YARN-11188) Only files belong to the first file controller are removed even if multiple log aggregation file controllers are configured

2022-06-20 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-11188:
--
Summary: Only files belong to the first file controller are removed even if 
multiple log aggregation file controllers are configured  (was: Only files 
belong to the first first file controller are removed even if multiple log 
aggregation file controllers are configured)

> Only files belong to the first file controller are removed even if multiple 
> log aggregation file controllers are configured
> ---
>
> Key: YARN-11188
> URL: https://issues.apache.org/jira/browse/YARN-11188
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.4.0
>
>