[jira] [Commented] (YARN-8617) Aggregated Application Logs accumulates for long running jobs
[ https://issues.apache.org/jira/browse/YARN-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790508#comment-16790508 ]

Tarun Parimi commented on YARN-8617:

Hi [~bibinchundatt],

I was also facing this issue, and on testing in my local cluster I observed the following:

{quote}1. limit number of files per node
public static final String NM_LOG_AGGREGATION_NUM_LOG_FILES_SIZE_PER_APP = NM_PREFIX + "log-aggregation.num-log-files-per-app";{quote}
This does not currently seem to work for IndexedFileFormat. After a file exceeds LOG_ROLL_OVER_MAX_FILE_SIZE_GB, a new file is created, but the older node files keep accumulating for as long as the app is running. Should we implement this config for IndexedFileFormat as well, as a fix?

{quote}For a long-running service, the modification time of the application folder (e.g. user/logs/application_1234) is updated on every upload cycle. This could cause a node file to remain in HDFS if no new containers are allocated to the same node.{quote}
Should we check and delete node files in AggregatedLogDeletionService for RUNNING apps without checking the condition appDir.getModificationTime() < cutoffMillis? Doing so would delete the older node files and fix the problem of old node files accumulating.

> Aggregated Application Logs accumulates for long running jobs
> -------------------------------------------------------------
>
>                 Key: YARN-8617
>                 URL: https://issues.apache.org/jira/browse/YARN-8617
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: log-aggregation
>    Affects Versions: 2.7.4
>            Reporter: Prabhu Joseph
>            Priority: Major
>
> Currently AggregatedLogDeletionService deletes older aggregated log files
> only once they are complete. This causes logs to accumulate for long-running
> jobs such as LLAP and Spark Streaming.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
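As a rough illustration of the per-node file limit discussed in the comment above, the sketch below picks the oldest node files once a node has more than a configured number of them. It is not the Hadoop implementation: `NodeFileLimiter`, `NodeFile`, and `filesToDelete` are hypothetical names, and the file names and timestamps are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class NodeFileLimiter {
  // Hypothetical stand-in for an aggregated node file; not a Hadoop class.
  static final class NodeFile {
    final String name;
    final long modificationTime;

    NodeFile(String name, long modificationTime) {
      this.name = name;
      this.modificationTime = modificationTime;
    }
  }

  // Returns the names of the oldest files that exceed maxFiles, oldest first.
  static List<String> filesToDelete(List<NodeFile> nodeFiles, int maxFiles) {
    List<NodeFile> sorted = new ArrayList<>(nodeFiles);
    sorted.sort(Comparator.comparingLong((NodeFile f) -> f.modificationTime));
    // A negative limit is treated as zero so the sub-list bounds stay valid.
    int excess = Math.max(0, sorted.size() - Math.max(0, maxFiles));
    List<String> toDelete = new ArrayList<>();
    for (NodeFile f : sorted.subList(0, excess)) {
      toDelete.add(f.name);
    }
    return toDelete;
  }

  public static void main(String[] args) {
    List<NodeFile> files = Arrays.asList(
        new NodeFile("node1_45454_300", 300L),
        new NodeFile("node1_45454_100", 100L),
        new NodeFile("node1_45454_200", 200L));
    // Keep the two newest files; only the oldest one is selected.
    System.out.println(filesToDelete(files, 2));
  }
}
```

A real fix would also have to decide whether the limit applies per node or per application, which this sketch leaves open.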
[jira] [Commented] (YARN-8617) Aggregated Application Logs accumulates for long running jobs
[ https://issues.apache.org/jira/browse/YARN-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714696#comment-16714696 ]

Bibin A Chundatt commented on YARN-8617:

[~Prabhu Joseph]

Looked into the issue again. YARN-2583 contains 2 parts:
# Limit the number of files per node:
public static final String NM_LOG_AGGREGATION_NUM_LOG_FILES_SIZE_PER_APP = NM_PREFIX + "log-aggregation.num-log-files-per-app";
# Delete files older than the expiry time:
{code}
if (appDir.isDirectory() && appDir.getModificationTime() < cutoffMillis) {
{code}

{quote}
AggregatedLogDeletionService deletes logs for a running job based on the file modification time, which is always recent because the rolled logs are regularly appended to the node1 file.
{quote}
For a long-running service, the modification time of the *application folder* (e.g. user/logs/application_1234) is updated on every upload cycle. This could cause a node file to remain in HDFS if no new containers are allocated to the same node.
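The effect of the directory-level condition can be illustrated with a simplified model. The sketch below is an assumption-laden toy, not the Hadoop code: `DirGatedDeletion`, `Entry`, `gated`, and `ungated` are hypothetical names, and timestamps are plain numbers. It compares a check that only looks at node files when the application directory is older than the cutoff with a variant that checks each node file on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DirGatedDeletion {
  // Hypothetical stand-in for an HDFS file status; only the fields used here.
  static final class Entry {
    final String name;
    final long modificationTime;

    Entry(String name, long modificationTime) {
      this.name = name;
      this.modificationTime = modificationTime;
    }
  }

  // Node files are examined only if the application directory is itself old.
  static List<String> gated(Entry appDir, List<Entry> nodeFiles, long cutoffMillis) {
    List<String> deleted = new ArrayList<>();
    if (appDir.modificationTime < cutoffMillis) {
      deleted.addAll(ungated(nodeFiles, cutoffMillis));
    }
    return deleted;
  }

  // Each node file is checked against the cutoff, ignoring the directory.
  static List<String> ungated(List<Entry> nodeFiles, long cutoffMillis) {
    List<String> deleted = new ArrayList<>();
    for (Entry f : nodeFiles) {
      if (f.modificationTime < cutoffMillis) {
        deleted.add(f.name);
      }
    }
    return deleted;
  }

  public static void main(String[] args) {
    // The directory was touched by the latest upload cycle (time 900).
    Entry appDir = new Entry("application_1234", 900L);
    List<Entry> nodeFiles = Arrays.asList(
        new Entry("node1_old", 100L),   // no container on node1 any more
        new Entry("node2_new", 900L));  // still being written
    long cutoff = 500L;
    System.out.println("gated:   " + gated(appDir, nodeFiles, cutoff));
    System.out.println("ungated: " + ungated(nodeFiles, cutoff));
  }
}
```

In this model the gated check deletes nothing, because the directory is always fresh, while the per-file check selects only the stale node file.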
[jira] [Commented] (YARN-8617) Aggregated Application Logs accumulates for long running jobs
[ https://issues.apache.org/jira/browse/YARN-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570773#comment-16570773 ]

Prabhu Joseph commented on YARN-8617:

Thanks [~bibinchundatt]. Still analyzing the issue in our cluster, will open this later if needed.
[jira] [Commented] (YARN-8617) Aggregated Application Logs accumulates for long running jobs
[ https://issues.apache.org/jira/browse/YARN-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570205#comment-16570205 ]

Bibin A Chundatt commented on YARN-8617:

[~Prabhu Joseph]

For IndexedFileFormat, the rolling is based on size:
{code}
@Private
@VisibleForTesting
public long getRollOverLogMaxSize(Configuration conf) {
  return 1024L * 1024 * 1024 * conf.getInt(
      LOG_ROLL_OVER_MAX_FILE_SIZE_GB, 10);
}
{code}
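To make the size arithmetic concrete, the sketch below reproduces the multiplication from the quoted method without any Hadoop dependency. `RollOverSize` and `shouldRollOver` are hypothetical names, and the `>=` comparison is an assumption about how the threshold might be applied, not a statement about the actual controller code.

```java
class RollOverSize {
  static final long BYTES_PER_GB = 1024L * 1024 * 1024;

  // Same arithmetic as the quoted method, with the configured value passed in.
  static long rollOverMaxSizeBytes(int maxFileSizeGb) {
    return BYTES_PER_GB * maxFileSizeGb;
  }

  // Hypothetical helper: whether a file of the given size has hit the limit.
  static boolean shouldRollOver(long currentFileSizeBytes, int maxFileSizeGb) {
    return currentFileSizeBytes >= rollOverMaxSizeBytes(maxFileSizeGb);
  }

  public static void main(String[] args) {
    // With the default of 10 used in the quoted code: 10 * 2^30 bytes.
    System.out.println(rollOverMaxSizeBytes(10));
    // The 29207-byte file from the listing in this thread is far below it.
    System.out.println(shouldRollOver(29207L, 10));
  }
}
```

With the default of 10, a node file has to reach 10737418240 bytes before a new one is started, so a small long-running container may write into the same file for a very long time.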
[jira] [Commented] (YARN-8617) Aggregated Application Logs accumulates for long running jobs
[ https://issues.apache.org/jira/browse/YARN-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569983#comment-16569983 ]

Bibin A Chundatt commented on YARN-8617:

[~Prabhu Joseph]

IIUC, in LogAggregationService -> AppLogAggregatorImpl, every upload cycle produces a new file per node:
{code}
node3.openstack_45454_1533315091898
node3.openstack_45454_
node3.openstack_45454_
{code}
The {{node3.openstack_45454_1533315091898}} file will not be changed once uploaded, so AggregatedLogDeletionService should delete the file after the log retention time.
{code}
Set uploadedFilePathsInThisCycle =
    aggregator.doContainerLogAggregation(logAggregationFileController,
        appFinished, finishedContainers.contains(container));
...
logControllerContext.setLogUploadTimeStamp(System.currentTimeMillis());
try {
  this.logAggregationFileController.postWrite(logControllerContext);
  diagnosticMessage = "Log uploaded successfully for Application: "
      + appId + " in NodeManager: "
      + LogAggregationUtils.getNodeString(nodeId) + " at "
      + Times.format(logControllerContext.getLogUploadTimeStamp())
      + "\n";
  ...
{code}
this.logAggregationFileController.postWrite(logControllerContext); --> renames the file with the time stamp
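The naming pattern visible in the listing above (host, port, and upload time stamp joined by underscores) can be sketched as follows. The pattern is inferred from the single complete file name in this thread, and `NodeFileNames`, `rolledFileName`, and `uploadTimestamp` are hypothetical helpers rather than Hadoop APIs.

```java
class NodeFileNames {
  // Builds a name such as node3.openstack_45454_1533315091898.
  static String rolledFileName(String host, int port, long uploadTimestampMillis) {
    return host + "_" + port + "_" + uploadTimestampMillis;
  }

  // Reads the time stamp back from the suffix after the last underscore.
  // Throws NumberFormatException if the suffix is missing or not numeric.
  static long uploadTimestamp(String fileName) {
    int idx = fileName.lastIndexOf('_');
    return Long.parseLong(fileName.substring(idx + 1));
  }

  public static void main(String[] args) {
    String name = rolledFileName("node3.openstack", 45454, 1533315091898L);
    System.out.println(name);
    System.out.println(uploadTimestamp(name));
  }
}
```

If every cycle really produces a separately named file, each file keeps the modification time of its own upload, which is what the per-file retention check relies on.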
[jira] [Commented] (YARN-8617) Aggregated Application Logs accumulates for long running jobs
[ https://issues.apache.org/jira/browse/YARN-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568742#comment-16568742 ]

Prabhu Joseph commented on YARN-8617:

[~bibinchundatt] yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds is already set to 3600. Let me explain the issue more clearly.

A long-running container on node1 writes daemon.log and rotates it (daemon.log.1, daemon.log.2, etc.). YARN log aggregation picks up the rolled log files as expected, combines all files for that node, and places them under the HDFS path in a log file called the node1 file.

{code}
[hdfs@node2 tmp]$ hadoop fs -ls /app-logs/hive/logs-ifile/application_1533261669437_0001/node3.openstack_45454_1533315091898
-rw-r- 3 hive hadoop 29207 2018-08-03 19:51 /app-logs/hive/logs-ifile/application_1533261669437_0001/node3.openstack_45454_1533315091898
{code}

AggregatedLogDeletionService deletes logs for a running job based on the file modification time, which is always recent because the rolled logs are regularly appended to the node1 file. As a result, older log contents (daemon.log.2 etc.) that are part of the node1 file are never deleted, and the node1 file keeps growing as long as a container is running on that node.

https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogDeletionService.java#L129

{code}
for (FileStatus node : logFiles) {
  if (node.getModificationTime() < cutoffMillis) {
    try {
      fs.delete(node.getPath(), true);
    } catch (IOException ex) {
      logException("Could not delete " + appDir.getPath(), ex);
    }
  }
}
{code}
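A small simulation can show why a file that is appended to on every upload cycle never passes the modification-time check quoted above. This is a sketch under stated assumptions: `RetentionCutoff` and its methods are hypothetical, the cutoff is taken to be the current time minus the retention period, the deletion check is assumed to run hourly, and the 7-day retention (604800 seconds) is an example value, not one taken from this thread.

```java
class RetentionCutoff {
  // Assumed cutoff: files modified before (now - retention) are expired.
  static long cutoffMillis(long nowMillis, long retainSeconds) {
    return nowMillis - retainSeconds * 1000L;
  }

  static boolean isExpired(long modificationTime, long nowMillis, long retainSeconds) {
    return modificationTime < cutoffMillis(nowMillis, retainSeconds);
  }

  // Runs an hourly deletion check over totalSeconds for a single file that is
  // rewritten every uploadIntervalSeconds, and counts the checks that find it
  // expired.
  static int expiredChecks(long totalSeconds, long uploadIntervalSeconds, long retainSeconds) {
    int expired = 0;
    long uploadIntervalMillis = uploadIntervalSeconds * 1000L;
    for (long now = 0L; now <= totalSeconds * 1000L; now += 3_600_000L) {
      // Time of the most recent upload at or before 'now'.
      long lastModified = (now / uploadIntervalMillis) * uploadIntervalMillis;
      if (isExpired(lastModified, now, retainSeconds)) {
        expired++;
      }
    }
    return expired;
  }

  public static void main(String[] args) {
    long thirtyDays = 30L * 24 * 3600;
    long sevenDays = 7L * 24 * 3600;  // example retention, an assumption
    // File appended every hour: its modification time is always fresh.
    System.out.println(expiredChecks(thirtyDays, 3600L, sevenDays));
    // File written once and never touched again within the window.
    System.out.println(expiredChecks(thirtyDays, 60L * 24 * 3600, sevenDays));
  }
}
```

In the first scenario the count stays at zero for the whole 30 days, which matches the accumulation described in the comment; in the second, the file starts failing the check once it is older than the retention period.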
[jira] [Commented] (YARN-8617) Aggregated Application Logs accumulates for long running jobs
[ https://issues.apache.org/jira/browse/YARN-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568316#comment-16568316 ]

Prabhu Joseph commented on YARN-8617:

Yes [~bibinchundatt], YARN-2583 matches what I am expecting. Will test with a non-negative value for yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds. Thanks a lot.
[jira] [Commented] (YARN-8617) Aggregated Application Logs accumulates for long running jobs
[ https://issues.apache.org/jira/browse/YARN-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568306#comment-16568306 ]

Bibin A Chundatt commented on YARN-8617:

Jira id YARN-2583
[jira] [Commented] (YARN-8617) Aggregated Application Logs accumulates for long running jobs
[ https://issues.apache.org/jira/browse/YARN-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568285#comment-16568285 ]

Prabhu Joseph commented on YARN-8617:

Yes, the log files in the NM local directory will be removed. But I don't see the aggregated log files placed in the HDFS path yarn.nodemanager.remote-app-log-dir being removed by the JHS after yarn.log-aggregation.retain-seconds while the job is *RUNNING*. It deletes them only for completed apps.
[jira] [Commented] (YARN-8617) Aggregated Application Logs accumulates for long running jobs
[ https://issues.apache.org/jira/browse/YARN-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568261#comment-16568261 ]

Bibin A Chundatt commented on YARN-8617:

In {{AppLogAggregatorImpl#uploadLogsForContainers}}, all successfully uploaded logs are deleted too.
{code}
Set uploadedFilePathsInThisCycle =
    aggregator.doContainerLogAggregation(logAggregationFileController,
        appFinished, finishedContainers.contains(container));
if (uploadedFilePathsInThisCycle.size() > 0) {
  uploadedLogsInThisCycle = true;
  List uploadedFilePathsInThisCycleList = new ArrayList<>();
  uploadedFilePathsInThisCycleList.addAll(uploadedFilePathsInThisCycle);
  deletionTask = new FileDeletionTask(delService,
      this.userUgi.getShortUserName(), null,
      uploadedFilePathsInThisCycleList);
}
{code}
[jira] [Commented] (YARN-8617) Aggregated Application Logs accumulates for long running jobs
[ https://issues.apache.org/jira/browse/YARN-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568234#comment-16568234 ]

Prabhu Joseph commented on YARN-8617:

[~bibinchundatt] Will this also remove the aggregated log files from yarn.nodemanager.remote-app-log-dir while the job is running?
[jira] [Commented] (YARN-8617) Aggregated Application Logs accumulates for long running jobs
[ https://issues.apache.org/jira/browse/YARN-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567967#comment-16567967 ]

Bibin A Chundatt commented on YARN-8617:

[~Prabhu Joseph]

For long-running jobs, we can enable rolling aggregation at the NodeManagers. Configure a non-negative value for {{yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds}}. The default value is -1.