[jira] [Commented] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails
[ https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536684#comment-14536684 ] Hudson commented on YARN-3476: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2138 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2138/]) YARN-3476. Nodemanager can fail to delete local logs if log aggregation fails. Contributed by Rohith (jlowe: rev 25e2b02122c4ed760227ab33c49d3445c23b9276) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java * hadoop-yarn-project/CHANGES.txt > Nodemanager can fail to delete local logs if log aggregation fails > -- > > Key: YARN-3476 > URL: https://issues.apache.org/jira/browse/YARN-3476 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Rohith > Labels: BB2015-05-TBR > Fix For: 2.7.1 > > Attachments: 0001-YARN-3476.patch, 0001-YARN-3476.patch, > 0002-YARN-3476.patch > > > If log aggregation encounters an error trying to upload the file then the > underlying TFile can throw an illegalstateexception which will bubble up > through the top of the thread and prevent the application logs from being > deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails
[ https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536637#comment-14536637 ] Hudson commented on YARN-3476: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #190 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/190/]) YARN-3476. Nodemanager can fail to delete local logs if log aggregation fails. Contributed by Rohith (jlowe: rev 25e2b02122c4ed760227ab33c49d3445c23b9276) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java > Nodemanager can fail to delete local logs if log aggregation fails > -- > > Key: YARN-3476 > URL: https://issues.apache.org/jira/browse/YARN-3476 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Rohith > Labels: BB2015-05-TBR > Fix For: 2.7.1 > > Attachments: 0001-YARN-3476.patch, 0001-YARN-3476.patch, > 0002-YARN-3476.patch > > > If log aggregation encounters an error trying to upload the file then the > underlying TFile can throw an illegalstateexception which will bubble up > through the top of the thread and prevent the application logs from being > deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails
[ https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536575#comment-14536575 ] Hudson commented on YARN-3476: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #180 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/180/]) YARN-3476. Nodemanager can fail to delete local logs if log aggregation fails. Contributed by Rohith (jlowe: rev 25e2b02122c4ed760227ab33c49d3445c23b9276) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java > Nodemanager can fail to delete local logs if log aggregation fails > -- > > Key: YARN-3476 > URL: https://issues.apache.org/jira/browse/YARN-3476 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Rohith > Labels: BB2015-05-TBR > Fix For: 2.7.1 > > Attachments: 0001-YARN-3476.patch, 0001-YARN-3476.patch, > 0002-YARN-3476.patch > > > If log aggregation encounters an error trying to upload the file then the > underlying TFile can throw an illegalstateexception which will bubble up > through the top of the thread and prevent the application logs from being > deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails
[ https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536524#comment-14536524 ] Hudson commented on YARN-3476: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2120 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2120/]) YARN-3476. Nodemanager can fail to delete local logs if log aggregation fails. Contributed by Rohith (jlowe: rev 25e2b02122c4ed760227ab33c49d3445c23b9276) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java > Nodemanager can fail to delete local logs if log aggregation fails > -- > > Key: YARN-3476 > URL: https://issues.apache.org/jira/browse/YARN-3476 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Rohith > Labels: BB2015-05-TBR > Fix For: 2.7.1 > > Attachments: 0001-YARN-3476.patch, 0001-YARN-3476.patch, > 0002-YARN-3476.patch > > > If log aggregation encounters an error trying to upload the file then the > underlying TFile can throw an illegalstateexception which will bubble up > through the top of the thread and prevent the application logs from being > deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails
[ https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536429#comment-14536429 ] Hudson commented on YARN-3476: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #922 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/922/]) YARN-3476. Nodemanager can fail to delete local logs if log aggregation fails. Contributed by Rohith (jlowe: rev 25e2b02122c4ed760227ab33c49d3445c23b9276) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java > Nodemanager can fail to delete local logs if log aggregation fails > -- > > Key: YARN-3476 > URL: https://issues.apache.org/jira/browse/YARN-3476 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Rohith > Labels: BB2015-05-TBR > Fix For: 2.7.1 > > Attachments: 0001-YARN-3476.patch, 0001-YARN-3476.patch, > 0002-YARN-3476.patch > > > If log aggregation encounters an error trying to upload the file then the > underlying TFile can throw an illegalstateexception which will bubble up > through the top of the thread and prevent the application logs from being > deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails
[ https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536380#comment-14536380 ] Hudson commented on YARN-3476: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #191 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/191/]) YARN-3476. Nodemanager can fail to delete local logs if log aggregation fails. Contributed by Rohith (jlowe: rev 25e2b02122c4ed760227ab33c49d3445c23b9276) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java * hadoop-yarn-project/CHANGES.txt > Nodemanager can fail to delete local logs if log aggregation fails > -- > > Key: YARN-3476 > URL: https://issues.apache.org/jira/browse/YARN-3476 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Rohith > Labels: BB2015-05-TBR > Fix For: 2.7.1 > > Attachments: 0001-YARN-3476.patch, 0001-YARN-3476.patch, > 0002-YARN-3476.patch > > > If log aggregation encounters an error trying to upload the file then the > underlying TFile can throw an illegalstateexception which will bubble up > through the top of the thread and prevent the application logs from being > deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails
[ https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536066#comment-14536066 ] Hudson commented on YARN-3476: -- FAILURE: Integrated in Hadoop-trunk-Commit #7780 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7780/]) YARN-3476. Nodemanager can fail to delete local logs if log aggregation fails. Contributed by Rohith (jlowe: rev 25e2b02122c4ed760227ab33c49d3445c23b9276) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java * hadoop-yarn-project/CHANGES.txt > Nodemanager can fail to delete local logs if log aggregation fails > -- > > Key: YARN-3476 > URL: https://issues.apache.org/jira/browse/YARN-3476 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Rohith > Labels: BB2015-05-TBR > Fix For: 2.7.1 > > Attachments: 0001-YARN-3476.patch, 0001-YARN-3476.patch, > 0002-YARN-3476.patch > > > If log aggregation encounters an error trying to upload the file then the > underlying TFile can throw an illegalstateexception which will bubble up > through the top of the thread and prevent the application logs from being > deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails
[ https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535784#comment-14535784 ] Jason Lowe commented on YARN-3476: -- +1 lgtm. Test failure is unrelated, and I'll fix whitespace nit on commit. > Nodemanager can fail to delete local logs if log aggregation fails > -- > > Key: YARN-3476 > URL: https://issues.apache.org/jira/browse/YARN-3476 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Rohith > Labels: BB2015-05-TBR > Attachments: 0001-YARN-3476.patch, 0001-YARN-3476.patch, > 0002-YARN-3476.patch > > > If log aggregation encounters an error trying to upload the file then the > underlying TFile can throw an illegalstateexception which will bubble up > through the top of the thread and prevent the application logs from being > deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails
[ https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535062#comment-14535062 ]

Hadoop QA commented on YARN-3476:

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 40s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 7m 36s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 39s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 38s | There were no new checkstyle issues. |
| {color:red}-1{color} | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 2s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
| {color:red}-1{color} | yarn tests | 5m 47s | Tests failed in hadoop-yarn-server-nodemanager. |
| | | 41m 55s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12731375/0002-YARN-3476.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f4ebbc6 |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7810/artifact/patchprocess/whitespace.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7810/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7810/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7810/console |

This message was automatically generated.

> Nodemanager can fail to delete local logs if log aggregation fails
> --
>
> Key: YARN-3476
> URL: https://issues.apache.org/jira/browse/YARN-3476
> Project: Hadoop YARN
> Issue Type: Bug
> Components: log-aggregation, nodemanager
> Affects Versions: 2.6.0
> Reporter: Jason Lowe
> Assignee: Rohith
> Labels: BB2015-05-TBR
> Attachments: 0001-YARN-3476.patch, 0001-YARN-3476.patch, 0002-YARN-3476.patch
>
> If log aggregation encounters an error trying to upload the file then the
> underlying TFile can throw an illegalstateexception which will bubble up
> through the top of the thread and prevent the application logs from being
> deleted.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails
[ https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521471#comment-14521471 ]

Hadoop QA commented on YARN-3476:

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 35s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:red}-1{color} | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. |
| {color:green}+1{color} | javac | 7m 31s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 33s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 7m 40s | The applied patch generated 1 additional checkstyle issues. |
| {color:green}+1{color} | install | 1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 1s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests | 5m 58s | Tests passed in hadoop-yarn-server-nodemanager. |
| | | 48m 51s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12729459/0001-YARN-3476.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / de9404f |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7554/artifact/patchprocess/whitespace.txt |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7554/artifact/patchprocess/checkstyle-result-diff.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7554/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7554/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7554/console |

This message was automatically generated.

> Nodemanager can fail to delete local logs if log aggregation fails
> --
>
> Key: YARN-3476
> URL: https://issues.apache.org/jira/browse/YARN-3476
> Project: Hadoop YARN
> Issue Type: Bug
> Components: log-aggregation, nodemanager
> Affects Versions: 2.6.0
> Reporter: Jason Lowe
> Assignee: Rohith
> Attachments: 0001-YARN-3476.patch, 0001-YARN-3476.patch
>
> If log aggregation encounters an error trying to upload the file then the
> underlying TFile can throw an illegalstateexception which will bubble up
> through the top of the thread and prevent the application logs from being
> deleted.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails
[ https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509841#comment-14509841 ] Jason Lowe commented on YARN-3476: -- I'm OK with deleting the logs upon error uploading. It should be a rare occurrence, and log availability is already a best-effort rather than guaranteed service. Even if we try to retain the logs it has questionable benefit in practice, as the history of a job always points to the aggregated logs, not the node's copy of the logs, and thus the logs will still be "lost" from the end-user's point of view. Savvy users may realize the logs could still be on the original node, but most won't know to check there or how to form the URL to find them. If we always point to the node then that defeats one of the features of log aggregation, since loss of the node will mean the node's URL is bad and we fail to show the logs even if they are aggregated. So for now I say we keep it simple and just cleanup the files on errors to prevent leaks. Speaking of which I took a look at the patch. It will fix the particular error we saw with TFiles, but there could easily be other non-IOExceptions that creep out of the code, especially as it is maintained over time. Would it be better to wrap the cleanup in a finally block or something a little more broadly applicable to errors that occur? > Nodemanager can fail to delete local logs if log aggregation fails > -- > > Key: YARN-3476 > URL: https://issues.apache.org/jira/browse/YARN-3476 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Rohith > Attachments: 0001-YARN-3476.patch > > > If log aggregation encounters an error trying to upload the file then the > underlying TFile can throw an illegalstateexception which will bubble up > through the top of the thread and prevent the application logs from being > deleted. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
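The finally-block idea raised in the comment above can be sketched roughly as follows. This is a minimal illustration, not the committed change to AppLogAggregatorImpl: the class `FinallyCleanupSketch`, the `Uploader` interface, and the method names are all hypothetical, and the upload step is simulated. The point is only that cleanup placed in `finally` still runs when an unchecked exception such as IllegalStateException escapes the upload.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical sketch; not the actual AppLogAggregatorImpl code. */
public class FinallyCleanupSketch {

    /** Stand-in for the upload step (assumed name); may throw anything. */
    public interface Uploader {
        void upload(Path localLogDir) throws IOException;
    }

    /** Runs the upload and always deletes the local log directory afterwards. */
    public static void aggregateAndCleanUp(Path localLogDir, Uploader uploader) {
        try {
            uploader.upload(localLogDir);
        } catch (IOException e) {
            // Expected failure mode: report it and fall through to cleanup.
            System.err.println("Upload failed: " + e.getMessage());
        } finally {
            // Runs even when upload throws an unchecked exception such as
            // IllegalStateException, so the local logs are not leaked.
            deleteRecursively(localLogDir);
        }
    }

    /** Deletes a directory tree, children before parents. */
    static void deleteRecursively(Path dir) {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("app-logs");
        Files.write(dir.resolve("stderr"), "log line".getBytes());
        try {
            // Simulate the unchecked exception described in the issue.
            aggregateAndCleanUp(dir, d -> {
                throw new IllegalStateException("simulated TFile error");
            });
        } catch (IllegalStateException e) {
            System.out.println("Exception still propagated: " + e.getMessage());
        }
        System.out.println("Local logs deleted: " + !Files.exists(dir));
    }
}
```

Note that `finally` does not swallow the unchecked exception; it still reaches the caller after the cleanup has run, which is why the sketch's `main` catches it separately.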
[jira] [Commented] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails
[ https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498951#comment-14498951 ] Rohith commented on YARN-3476: -- Thanks [~sunilg] for sharing your thoughts. If we go with retention logic or a timer, then for NM recovery the retention state would have to be stored in the state store. The NM would then need to support state store updates in LogAggregationService, similar to NonAggregatingLogHandler. [~jlowe] I attached a patch with a straightforward fix that handles the exception and does the post-aggregation cleanup. Kindly share your opinion on the 2 approaches, i.e. 1. handling the exception and 2. retention logic. > Nodemanager can fail to delete local logs if log aggregation fails > -- > > Key: YARN-3476 > URL: https://issues.apache.org/jira/browse/YARN-3476 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Rohith > Attachments: 0001-YARN-3476.patch > > > If log aggregation encounters an error trying to upload the file then the > underlying TFile can throw an illegalstateexception which will bubble up > through the top of the thread and prevent the application logs from being > deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails
[ https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498387#comment-14498387 ] Sunil G commented on YARN-3476: --- Hi [~jlowe] and [~rohithsharma], retention logic to handle this error may become more complex when multiple failures are seen during aggregation across applications. If this happens rarely, strong retention logic with a timer is helpful. On a generic level, considering more failures, a cleanup after aggregation can save disk space. That is acceptable since we encountered an error, and there may not be real pressure to provide 100% good logs when an error occurs during aggregation. > Nodemanager can fail to delete local logs if log aggregation fails > -- > > Key: YARN-3476 > URL: https://issues.apache.org/jira/browse/YARN-3476 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Rohith > Attachments: 0001-YARN-3476.patch > > > If log aggregation encounters an error trying to upload the file then the > underlying TFile can throw an illegalstateexception which will bubble up > through the top of the thread and prevent the application logs from being > deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails
[ https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496099#comment-14496099 ] Hadoop QA commented on YARN-3476: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12724974/0001-YARN-3476.patch against trunk revision fddd552. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7346//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7346//console This message is automatically generated. 
> Nodemanager can fail to delete local logs if log aggregation fails > -- > > Key: YARN-3476 > URL: https://issues.apache.org/jira/browse/YARN-3476 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Rohith > Attachments: 0001-YARN-3476.patch > > > If log aggregation encounters an error trying to upload the file then the > underlying TFile can throw an illegalstateexception which will bubble up > through the top of the thread and prevent the application logs from being > deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails
[ https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492593#comment-14492593 ] Rohith commented on YARN-3476: -- bq. Not all of the application's logs were available in HDFS because it encountered an error (token-related) trying to upload the logs. Is this failure caused by the IllegalStateException? There are 2 options: # do post-aggregation cleanup by handling IllegalStateException, OR # schedule a timer for those log directories which were neither uploaded nor deleted. I am wondering, is the IllegalStateException what causes the logs not to be found in HDFS? If it is not, then I think the simple way to handle this is the 1st approach. > Nodemanager can fail to delete local logs if log aggregation fails > -- > > Key: YARN-3476 > URL: https://issues.apache.org/jira/browse/YARN-3476 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Rohith > > If log aggregation encounters an error trying to upload the file then the > underlying TFile can throw an illegalstateexception which will bubble up > through the top of the thread and prevent the application logs from being > deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
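For comparison, the second option in the comment above (a timer that later removes log directories that were neither uploaded nor deleted) might look roughly like the sketch below. Everything here is an assumption made for illustration: `LeftoverLogSweeper`, `markFailed`, `sweep`, and the retention parameter do not exist in the NodeManager, and the bookkeeping is kept in memory only.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of the timer option; these names are not NodeManager APIs. */
public class LeftoverLogSweeper {
    // Directory -> time (ms) at which its upload failed.
    private final Map<File, Long> failedDirs = new ConcurrentHashMap<>();
    private final long retainMillis;

    public LeftoverLogSweeper(long retainMillis) {
        this.retainMillis = retainMillis;
    }

    /** Records a local log directory whose upload failed at the given time. */
    public void markFailed(File dir, long nowMillis) {
        failedDirs.put(dir, nowMillis);
    }

    /** Deletes every recorded directory that has outlived the retention period. */
    public List<File> sweep(long nowMillis) {
        List<File> deleted = new ArrayList<>();
        for (Map.Entry<File, Long> e : failedDirs.entrySet()) {
            if (nowMillis - e.getValue() >= retainMillis) {
                deleteRecursively(e.getKey());
                failedDirs.remove(e.getKey());
                deleted.add(e.getKey());
            }
        }
        return deleted;
    }

    private static void deleteRecursively(File f) {
        File[] children = f.listFiles(); // null when f is not a directory
        if (children != null) {
            for (File c : children) {
                deleteRecursively(c);
            }
        }
        f.delete();
    }

    /** Starts a periodic sweep; the caller is responsible for shutting it down. */
    public ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(() -> sweep(System.currentTimeMillis()),
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return ses;
    }

    public static void main(String[] args) {
        File dir = new File(System.getProperty("java.io.tmpdir"),
                "leftover-logs-" + System.nanoTime());
        dir.mkdirs();
        LeftoverLogSweeper sweeper = new LeftoverLogSweeper(1000L);
        sweeper.markFailed(dir, 0L);
        System.out.println("Swept at t=500: " + sweeper.sweep(500L).size());
        System.out.println("Swept at t=1000: " + sweeper.sweep(1000L).size());
        System.out.println("Directory still exists: " + dir.exists());
    }
}
```

Even in this reduced form the timer approach carries extra state (the map of failed directories and their timestamps), and an in-memory map would not survive a NodeManager restart, which is part of why handling the exception directly is the simpler of the two options.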
[jira] [Commented] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails
[ https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14490233#comment-14490233 ] Jason Lowe commented on YARN-3476: -- bq. Such cases we can use the aggregated log-retention configuration i.e 'yarn.log-aggregation.retain-seconds' for deleting from disk. No, that's not going to be OK. One is the lifetime on the aggregation filesystem and what we're considering is the lifetime on the local disk. I can hold a lot more logs in HDFS than I can in the local disk. > Nodemanager can fail to delete local logs if log aggregation fails > -- > > Key: YARN-3476 > URL: https://issues.apache.org/jira/browse/YARN-3476 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Rohith > > If log aggregation encounters an error trying to upload the file then the > underlying TFile can throw an illegalstateexception which will bubble up > through the top of the thread and prevent the application logs from being > deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)