[jira] [Commented] (YARN-3792) Test case failures in TestDistributedShell and some issue fixes related to ATSV2
[ https://issues.apache.org/jira/browse/YARN-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594455#comment-14594455 ] Naganarasimha G R commented on YARN-3792: - Hi [~sjlee0], I missed checking your comment yesterday; will get this done ASAP. And oops, I also missed your second comment to fix earlier... Test case failures in TestDistributedShell and some issue fixes related to ATSV2 Key: YARN-3792 URL: https://issues.apache.org/jira/browse/YARN-3792 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Naganarasimha G R Assignee: Naganarasimha G R Attachments: YARN-3792-YARN-2928.001.patch, YARN-3792-YARN-2928.002.patch # Encountered [test case failures|https://builds.apache.org/job/PreCommit-YARN-Build/8233/testReport/] which were happening even without the patch modifications in YARN-3044: TestDistributedShell.testDSShellWithoutDomainV2CustomizedFlow, TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow, TestDistributedShellWithNodeLabels.testDSShellWithNodeLabelExpression # Remove unused {{enableATSV1}} in TestDistributedShell # Container metrics need to be published only for the v2 test cases of TestDistributedShell # A NullPointerException was thrown in TimelineClientImpl.constructResURI when the aux service was not configured and {{TimelineClient.putObjects}} was getting invoked. # Race condition between the publishing of application events and the test case verification of the RM's ApplicationFinished timeline events # Application tags are converted to lowercase in ApplicationSubmissionContextPBImpl, hence RMTimelineCollector was not able to detect the custom flow details of the app (see the sketch below) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
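To illustrate the last point above, here is a minimal, self-contained Java sketch (not the actual YARN code; the tag prefix name is purely illustrative): if the submission context lowercases application tags while the collector matches the flow tag prefix case-sensitively, the lookup misses, whereas a case-insensitive prefix match tolerates the lowercasing.
{code:java}
// Illustrative sketch only. The prefix "TIMELINE_FLOW_NAME_TAG:" is a
// stand-in name, not a confirmed YARN constant.
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

public class FlowTagLookup {
  private static final String FLOW_NAME_PREFIX = "TIMELINE_FLOW_NAME_TAG:";

  public static String extractFlowName(Set<String> applicationTags) {
    for (String tag : applicationTags) {
      // Compare prefixes ignoring case, since the PB impl may have
      // lowercased the stored tag.
      if (tag.regionMatches(true, 0, FLOW_NAME_PREFIX, 0, FLOW_NAME_PREFIX.length())) {
        return tag.substring(FLOW_NAME_PREFIX.length());
      }
    }
    return null;
  }

  public static void main(String[] args) {
    Set<String> tags = new TreeSet<>();
    // The submission context stored the tag lowercased:
    tags.add("TIMELINE_FLOW_NAME_TAG:my_flow".toLowerCase(Locale.ENGLISH));
    System.out.println(extractFlowName(tags)); // prints "my_flow"
  }
}
{code}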
[jira] [Updated] (YARN-3827) Migrate YARN native build to new CMake framework
[ https://issues.apache.org/jira/browse/YARN-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Burlison updated YARN-3827: Attachment: YARN-3827.001.patch Patch to migrate YARN over to the new CMake infrastructure. Requires HADOOP-12036 Migrate YARN native build to new CMake framework Key: YARN-3827 URL: https://issues.apache.org/jira/browse/YARN-3827 Project: Hadoop YARN Issue Type: Sub-task Components: build Affects Versions: 2.7.0 Reporter: Alan Burlison Assignee: Alan Burlison Attachments: YARN-3827.001.patch As per HADOOP-12036, the CMake infrastructure should be refactored and made common across all Hadoop components. This bug covers the migration of YARN to the new CMake infrastructure. This change will also add support for building YARN Native components on Solaris. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594538#comment-14594538 ] Hadoop QA commented on YARN-3116: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 42s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 2m 18s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 30s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 2m 1s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 50m 50s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 98m 29s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12740712/YARN-3116.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 20c03c9 | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8298/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8298/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8298/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8298/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8298/console | This message was automatically generated. [Collector wireup] We need an assured way to determine if a container is an AM container on NM -- Key: YARN-3116 URL: https://issues.apache.org/jira/browse/YARN-3116 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, timelineserver Reporter: Zhijie Shen Assignee: Giovanni Matteo Fumarola Attachments: YARN-3116.patch In YARN-3030, to start the per-app aggregator only for a started AM container, we need to determine if the container is an AM container or not from the context in NM (we can do it on RM). 
This information is missing, so we worked around it by treating the container with ID ending in _01 as the AM container. Unfortunately, this is neither a necessary nor a sufficient condition. We need a way to determine if a container is an AM container on the NM. We can add a flag to the container object or create an API to make the judgement. Perhaps the distributed AM information may also be useful to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
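A small illustrative sketch of the two approaches discussed above (hypothetical names, not an actual YARN API): the container-ID heuristic versus an explicit flag carried with the container.
{code:java}
// Illustrative only. ContainerStartContext and isAMContainer() are
// hypothetical names used to show the shape of a flag-based check.
public class AmContainerCheck {

  // Fragile heuristic: assumes the AM is always container number 1 of the
  // attempt, which work-preserving restarts or other launch orders can break.
  static boolean looksLikeAmByIdSuffix(String containerId) {
    return containerId.endsWith("_000001");
  }

  // Hypothetical explicit signal carried in the container start context.
  static final class ContainerStartContext {
    private final boolean amContainer;
    ContainerStartContext(boolean amContainer) { this.amContainer = amContainer; }
    boolean isAMContainer() { return amContainer; }
  }

  static boolean isAm(ContainerStartContext ctx) {
    return ctx.isAMContainer();
  }

  public static void main(String[] args) {
    System.out.println(looksLikeAmByIdSuffix("container_1431957472783_0001_01_000001")); // true
    System.out.println(isAm(new ContainerStartContext(true)));                            // true
  }
}
{code}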
[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers
[ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594468#comment-14594468 ] Hadoop QA commented on YARN-3051: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 53s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 56s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 2s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 0s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 0s | The patch does not introduce any new Findbugs (version ) warnings. | | | | 35m 25s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12740776/YARN-3051.Reader_API_4.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 20c03c9 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8295/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8295/console | This message was automatically generated. [Storage abstraction] Create backing storage read interface for ATS readers --- Key: YARN-3051 URL: https://issues.apache.org/jira/browse/YARN-3051 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Varun Saxena Attachments: YARN-3051-YARN-2928.003.patch, YARN-3051-YARN-2928.03.patch, YARN-3051-YARN-2928.04.patch, YARN-3051.Reader_API.patch, YARN-3051.Reader_API_1.patch, YARN-3051.Reader_API_2.patch, YARN-3051.Reader_API_3.patch, YARN-3051.Reader_API_4.patch, YARN-3051.wip.02.YARN-2928.patch, YARN-3051.wip.patch, YARN-3051_temp.patch Per design in YARN-2928, create backing storage read interface that can be implemented by multiple backing storage implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3176) In Fair Scheduler, child queue should inherit maxApp from its parent
[ https://issues.apache.org/jira/browse/YARN-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594500#comment-14594500 ] Hadoop QA commented on YARN-3176: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 27s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 43s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 51s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 32s | The applied patch generated 10 new checkstyle issues (total was 0, now 10). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 28s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 51m 7s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 89m 43s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12740742/YARN-3176.v2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 20c03c9 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8297/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8297/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8297/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8297/console | This message was automatically generated. In Fair Scheduler, child queue should inherit maxApp from its parent Key: YARN-3176 URL: https://issues.apache.org/jira/browse/YARN-3176 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-3176.v1.patch, YARN-3176.v2.patch if the child queue does not have a maxRunningApp limit, it will use the queueMaxAppsDefault. This behavior is not quite right, since queueMaxAppsDefault is normally a small number, whereas some parent queues do have maxRunningApp set to be more than the default -- This message was sent by Atlassian JIRA (v6.3.4#6332)
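A minimal sketch of the inheritance behavior proposed above (an assumption about intent, not the FairScheduler's actual code): a queue with no explicit maxRunningApps takes the nearest configured ancestor's value, and only falls back to queueMaxAppsDefault when nothing is set on the path.
{code:java}
import java.util.HashMap;
import java.util.Map;

public class MaxAppsResolution {
  static final int QUEUE_MAX_APPS_DEFAULT = 5;

  // queue name -> explicitly configured maxRunningApps (absent = not configured)
  static int resolveMaxApps(String queue, Map<String, Integer> configured) {
    for (String q = queue; q != null; q = parentOf(q)) {
      Integer v = configured.get(q);
      if (v != null) {
        return v;                    // inherit the closest configured ancestor value
      }
    }
    return QUEUE_MAX_APPS_DEFAULT;   // nothing configured anywhere on the path
  }

  static String parentOf(String queue) {
    int idx = queue.lastIndexOf('.');
    return idx < 0 ? null : queue.substring(0, idx);
  }

  public static void main(String[] args) {
    Map<String, Integer> conf = new HashMap<>();
    conf.put("root.marketing", 200);  // parent has a large limit
    System.out.println(resolveMaxApps("root.marketing.adhoc", conf)); // 200, not 5
  }
}
{code}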
[jira] [Updated] (YARN-3779) Aggregated Logs Deletion doesnt work after refreshing Log Retention Settings in secure cluster
[ https://issues.apache.org/jira/browse/YARN-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3779: --- Attachment: YARN-3779.03.patch Aggregated Logs Deletion doesnt work after refreshing Log Retention Settings in secure cluster -- Key: YARN-3779 URL: https://issues.apache.org/jira/browse/YARN-3779 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Environment: mrV2, secure mode Reporter: Zhang Wei Assignee: Varun Saxena Priority: Critical Attachments: YARN-3779.01.patch, YARN-3779.02.patch, YARN-3779.03.patch, log_aggr_deletion_on_refresh_error.log, log_aggr_deletion_on_refresh_fix.log {{GSSException}} is thrown everytime log aggregation deletion is attempted after executing bin/mapred hsadmin -refreshLogRetentionSettings in a secure cluster. The problem can be reproduced by following steps: 1. startup historyserver in secure cluster. 2. Log deletion happens as per expectation. 3. execute {{mapred hsadmin -refreshLogRetentionSettings}} command to refresh the configuration value. 4. All the subsequent attempts of log deletion fail with {{GSSException}} Following exception can be found in historyserver's log if log deletion is enabled. {noformat} 2015-06-04 14:14:40,070 | ERROR | Timer-3 | Error reading root log dir this deletion attempt is being aborted | AggregatedLogDeletionService.java:127 java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: vm-31/9.91.12.31; destination host is: vm-33:25000; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764) at org.apache.hadoop.ipc.Client.call(Client.java:1414) at org.apache.hadoop.ipc.Client.call(Client.java:1363) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy9.getListing(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:519) at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy10.getListing(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1767) at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1750) at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:691) at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102) at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:753) at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:749) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:749) at org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService$LogDeletionTask.run(AggregatedLogDeletionService.java:68) at java.util.TimerThread.mainLoop(Timer.java:555) at java.util.TimerThread.run(Timer.java:505) Caused by: java.io.IOException: javax.security.sasl.SaslException: 
GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:677) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1641) at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:640) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:724) at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1462) at org.apache.hadoop.ipc.Client.call(Client.java:1381) ... 21 more Caused by: javax.security.sasl.SaslException: GSS
[jira] [Commented] (YARN-3835) hadoop-yarn-server-resourcemanager test package bundles core-site.xml, yarn-site.xml
[ https://issues.apache.org/jira/browse/YARN-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594497#comment-14594497 ] Hadoop QA commented on YARN-3835: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 43s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 35s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | yarn tests | 50m 50s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 85m 19s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12740772/YARN-3835.patch | | Optional Tests | javadoc javac unit | | git revision | trunk / 20c03c9 | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8296/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8296/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8296/console | This message was automatically generated. hadoop-yarn-server-resourcemanager test package bundles core-site.xml, yarn-site.xml Key: YARN-3835 URL: https://issues.apache.org/jira/browse/YARN-3835 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Vamsee Yarlagadda Assignee: Vamsee Yarlagadda Priority: Minor Attachments: YARN-3835.patch It looks like by default yarn is bundling core-site.xml, yarn-site.xml in test artifact of hadoop-yarn-server-resourcemanager which means that any downstream project which uses this a dependency can have a problem in picking up the user supplied/environment supplied core-site.xml, yarn-site.xml So we should ideally exclude these .xml files from being bundled into the test-jar. (Similar to YARN-1748) I also proactively looked at other YARN modules where this might be happening. {code} vamsee-MBP:hadoop-yarn-project vamsee$ find . 
-name *-site.xml ./hadoop-yarn/conf/yarn-site.xml ./hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/resources/yarn-site.xml ./hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/resources/yarn-site.xml ./hadoop-yarn/hadoop-yarn-client/src/test/resources/core-site.xml ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/core-site.xml ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/core-site.xml ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site.xml ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/target/test-classes/core-site.xml ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/target/test-classes/yarn-site.xml ./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/resources/core-site.xml {code} And out of these only two modules (hadoop-yarn-server-resourcemanager, hadoop-yarn-server-tests) are building test-jars. In future, if we start building test-jar of other modules, we should exclude these xml files from being bundled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3779) Aggregated Logs Deletion doesnt work after refreshing Log Retention Settings in secure cluster
[ https://issues.apache.org/jira/browse/YARN-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594551#comment-14594551 ] Varun Saxena commented on YARN-3779: Added a patch and submitted it, fixing both cases. This JIRA should move to MAPREDUCE. But not moving it because not sure if Jenkins will be able to post results of the submitted patch then Aggregated Logs Deletion doesnt work after refreshing Log Retention Settings in secure cluster -- Key: YARN-3779 URL: https://issues.apache.org/jira/browse/YARN-3779 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Environment: mrV2, secure mode Reporter: Zhang Wei Assignee: Varun Saxena Priority: Critical Attachments: YARN-3779.01.patch, YARN-3779.02.patch, YARN-3779.03.patch, log_aggr_deletion_on_refresh_error.log, log_aggr_deletion_on_refresh_fix.log {{GSSException}} is thrown everytime log aggregation deletion is attempted after executing bin/mapred hsadmin -refreshLogRetentionSettings in a secure cluster. The problem can be reproduced by following steps: 1. startup historyserver in secure cluster. 2. Log deletion happens as per expectation. 3. execute {{mapred hsadmin -refreshLogRetentionSettings}} command to refresh the configuration value. 4. All the subsequent attempts of log deletion fail with {{GSSException}} Following exception can be found in historyserver's log if log deletion is enabled. {noformat} 2015-06-04 14:14:40,070 | ERROR | Timer-3 | Error reading root log dir this deletion attempt is being aborted | AggregatedLogDeletionService.java:127 java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: vm-31/9.91.12.31; destination host is: vm-33:25000; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764) at org.apache.hadoop.ipc.Client.call(Client.java:1414) at org.apache.hadoop.ipc.Client.call(Client.java:1363) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy9.getListing(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:519) at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy10.getListing(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1767) at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1750) at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:691) at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102) at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:753) at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:749) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:749) at 
org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService$LogDeletionTask.run(AggregatedLogDeletionService.java:68) at java.util.TimerThread.mainLoop(Timer.java:555) at java.util.TimerThread.run(Timer.java:505) Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:677) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1641) at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:640) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:724) at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
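A hedged sketch of one possible direction for the fix described above (an assumption, not the attached patch): run the deletion task's filesystem access as the service's login user and re-login from the keytab, so a settings refresh does not leave the timer task without Kerberos credentials. The remote root log dir below is a placeholder.
{code:java}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class SecureLogDeletionSketch {
  public static FileStatus[] listRemoteRootLogDir(final Configuration conf,
      final Path remoteRootLogDir) throws Exception {
    UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
    // Re-acquire a TGT if it has expired since the last refresh.
    loginUser.checkTGTAndReloginFromKeytab();
    // Perform the HDFS listing with the login user's Kerberos credentials so a
    // refreshLogRetentionSettings does not leave the timer task without a TGT.
    return loginUser.doAs(new PrivilegedExceptionAction<FileStatus[]>() {
      @Override
      public FileStatus[] run() throws Exception {
        FileSystem fs = remoteRootLogDir.getFileSystem(conf);
        return fs.listStatus(remoteRootLogDir);
      }
    });
  }
}
{code}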
[jira] [Updated] (YARN-3792) Test case failures in TestDistributedShell and some issue fixes related to ATSV2
[ https://issues.apache.org/jira/browse/YARN-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3792: Attachment: YARN-3792-YARN-2928.003.patch Hi [~sjlee0], fixed the 2 nits you mentioned; it seems the test case failures are not related to this JIRA, will check again in the next run. Test case failures in TestDistributedShell and some issue fixes related to ATSV2 Key: YARN-3792 URL: https://issues.apache.org/jira/browse/YARN-3792 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Naganarasimha G R Assignee: Naganarasimha G R Attachments: YARN-3792-YARN-2928.001.patch, YARN-3792-YARN-2928.002.patch, YARN-3792-YARN-2928.003.patch # Encountered [test case failures|https://builds.apache.org/job/PreCommit-YARN-Build/8233/testReport/] which were happening even without the patch modifications in YARN-3044: TestDistributedShell.testDSShellWithoutDomainV2CustomizedFlow, TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow, TestDistributedShellWithNodeLabels.testDSShellWithNodeLabelExpression # Remove unused {{enableATSV1}} in TestDistributedShell # Container metrics need to be published only for the v2 test cases of TestDistributedShell # A NullPointerException was thrown in TimelineClientImpl.constructResURI when the aux service was not configured and {{TimelineClient.putObjects}} was getting invoked. # Race condition between the publishing of application events and the test case verification of the RM's ApplicationFinished timeline events # Application tags are converted to lowercase in ApplicationSubmissionContextPBImpl, hence RMTimelineCollector was not able to detect the custom flow details of the app -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3809) Failed to launch new attempts because ApplicationMasterLauncher's threads all hang
[ https://issues.apache.org/jira/browse/YARN-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594424#comment-14594424 ] Jun Gong commented on YARN-3809: Attached a new patch to address [~jlowe]'s suggestions. Thanks for the review. Failed to launch new attempts because ApplicationMasterLauncher's threads all hang -- Key: YARN-3809 URL: https://issues.apache.org/jira/browse/YARN-3809 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-3809.01.patch, YARN-3809.02.patch, YARN-3809.03.patch ApplicationMasterLauncher creates a thread pool of size 10 to handle AMLauncherEventType events (LAUNCH and CLEANUP). In our cluster, there were many NMs with 10+ AMs running on them, and one shut down for some reason. After the RM marked the NM as LOST, it cleaned up the AMs running on it, and ApplicationMasterLauncher had to handle these 10+ CLEANUP events. ApplicationMasterLauncher's thread pool filled up, and the threads all hung in containerMgrProxy.stopContainers(stopRequest) because the NM was down; the default RPC timeout is 15 mins. This means that for 15 mins ApplicationMasterLauncher could not handle new events such as LAUNCH, so new attempts failed to launch because of the timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
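A small illustrative sketch of the failure mode and one mitigation direction (an assumption, not the attached patch): with a fixed pool of 10 threads shared by LAUNCH and CLEANUP, CLEANUP calls against a dead NM can occupy every worker for the full RPC timeout; a configurable pool size keeps LAUNCH events flowing. The property name below is hypothetical.
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LauncherPoolSketch {
  public static void main(String[] args) throws InterruptedException {
    // Hypothetical knob; the point is that the size is no longer hard-coded to 10.
    int poolSize = Integer.getInteger("am.launcher.thread-count", 50);
    ExecutorService launcherPool = Executors.newFixedThreadPool(poolSize);

    // Stand-ins for CLEANUP calls that block on a dead NM until the RPC timeout.
    for (int i = 0; i < 15; i++) {
      launcherPool.submit(() -> {
        try {
          TimeUnit.SECONDS.sleep(1); // stand-in for a slow stopContainers() call
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    // With a larger pool, a LAUNCH event is picked up without waiting 15 minutes.
    launcherPool.submit(() -> System.out.println("LAUNCH handled without waiting"));
    launcherPool.shutdown();
    launcherPool.awaitTermination(1, TimeUnit.MINUTES);
  }
}
{code}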
[jira] [Commented] (YARN-3831) Localization failed when a local disk turns from bad to good without NM initializes it
[ https://issues.apache.org/jira/browse/YARN-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594431#comment-14594431 ] zhihai xu commented on YARN-3831: - Hi [~hex108], thanks for reporting this issue, What version is your code? YARN-3491 fixed a race condition when a local disk turns from bad to good. Localization failed when a local disk turns from bad to good without NM initializes it -- Key: YARN-3831 URL: https://issues.apache.org/jira/browse/YARN-3831 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Jun Gong Assignee: Jun Gong A local disk turns from bad to good without NM initializes it(create /path-to-local-dir/usercache and /path-to-local-dir/filecache). When localizing a container, container-executor will try to create directories under /path-to-local-dir/usercache, and it will fail. Then container's localization will fail. Related log is as following: {noformat} 2015-06-19 18:00:01,205 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1431957472783_38706012_01_000465 2015-06-19 18:00:01,212 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file /data8/yarnenv/local/nmPrivate/container_1431957472783_38706012_01_000465.tokens. Credentials list: 2015-06-19 18:00:01,216 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1431957472783_38706012_01_000465 startLocalizer is : 20 org.apache.hadoop.util.Shell$ExitCodeException: at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:205) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:981) 2015-06-19 18:00:01,216 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : command provided 0 2015-06-19 18:00:01,216 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : user is tdwadmin 2015-06-19 18:00:01,216 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Failed to create directory /data2/yarnenv/local/usercache/tdwadmin - No such file or directory 2015-06-19 18:00:01,216 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localizer failed java.io.IOException: Application application_1431957472783_38706012 initialization failed (exitCode=20) with output: main : command provided 0 main : user is tdwadmin Failed to create directory /data2/yarnenv/local/usercache/tdwadmin - No such file or directory at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:214) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:981) Caused by: org.apache.hadoop.util.Shell$ExitCodeException: at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:205) ... 1 more 2015-06-19 18:00:01,216 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1431957472783_38706012_01_000465 transitioned from LOCALIZING to LOCALIZATION_FAILED {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
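A minimal sketch, assuming the directory layout visible in the log above (usercache, filecache, nmPrivate), of what re-initializing a local dir that turned good again could look like; this is illustrative, not the NM's actual recovery code.
{code:java}
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class LocalDirInitSketch {
  // Ensure the per-disk directories exist before localization; otherwise
  // container-executor fails with "No such file or directory".
  static void ensureInitialized(String localDirRoot) throws IOException {
    List<File> required = Arrays.asList(
        new File(localDirRoot, "usercache"),
        new File(localDirRoot, "filecache"),
        new File(localDirRoot, "nmPrivate"));
    for (File dir : required) {
      if (!dir.isDirectory() && !dir.mkdirs()) {
        throw new IOException("Could not create " + dir);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    // Re-run initialization for every dir that just transitioned from bad to good.
    for (String dir : Arrays.asList("/tmp/yarn-local-demo/data2")) { // placeholder path
      ensureInitialized(dir);
    }
    System.out.println("local dirs initialized");
  }
}
{code}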
[jira] [Updated] (YARN-3809) Failed to launch new attempts because ApplicationMasterLauncher's threads all hang
[ https://issues.apache.org/jira/browse/YARN-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Gong updated YARN-3809: --- Attachment: YARN-3809.03.patch Failed to launch new attempts because ApplicationMasterLauncher's threads all hang -- Key: YARN-3809 URL: https://issues.apache.org/jira/browse/YARN-3809 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-3809.01.patch, YARN-3809.02.patch, YARN-3809.03.patch ApplicationMasterLauncher creates a thread pool of size 10 to handle AMLauncherEventType events (LAUNCH and CLEANUP). In our cluster, there were many NMs with 10+ AMs running on them, and one shut down for some reason. After the RM marked the NM as LOST, it cleaned up the AMs running on it, and ApplicationMasterLauncher had to handle these 10+ CLEANUP events. ApplicationMasterLauncher's thread pool filled up, and the threads all hung in containerMgrProxy.stopContainers(stopRequest) because the NM was down; the default RPC timeout is 15 mins. This means that for 15 mins ApplicationMasterLauncher could not handle new events such as LAUNCH, so new attempts failed to launch because of the timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3779) Aggregated Logs Deletion doesnt work after refreshing Log Retention Settings in secure cluster
[ https://issues.apache.org/jira/browse/YARN-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594423#comment-14594423 ] Varun Saxena commented on YARN-3779: [~vinodkv], thats correct. So do you want me to raise another JIRA for that ? Or do it as part of this one only ? Aggregated Logs Deletion doesnt work after refreshing Log Retention Settings in secure cluster -- Key: YARN-3779 URL: https://issues.apache.org/jira/browse/YARN-3779 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Environment: mrV2, secure mode Reporter: Zhang Wei Assignee: Varun Saxena Priority: Critical Attachments: YARN-3779.01.patch, YARN-3779.02.patch, log_aggr_deletion_on_refresh_error.log, log_aggr_deletion_on_refresh_fix.log {{GSSException}} is thrown everytime log aggregation deletion is attempted after executing bin/mapred hsadmin -refreshLogRetentionSettings in a secure cluster. The problem can be reproduced by following steps: 1. startup historyserver in secure cluster. 2. Log deletion happens as per expectation. 3. execute {{mapred hsadmin -refreshLogRetentionSettings}} command to refresh the configuration value. 4. All the subsequent attempts of log deletion fail with {{GSSException}} Following exception can be found in historyserver's log if log deletion is enabled. {noformat} 2015-06-04 14:14:40,070 | ERROR | Timer-3 | Error reading root log dir this deletion attempt is being aborted | AggregatedLogDeletionService.java:127 java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: vm-31/9.91.12.31; destination host is: vm-33:25000; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764) at org.apache.hadoop.ipc.Client.call(Client.java:1414) at org.apache.hadoop.ipc.Client.call(Client.java:1363) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy9.getListing(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:519) at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy10.getListing(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1767) at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1750) at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:691) at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102) at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:753) at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:749) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:749) at org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService$LogDeletionTask.run(AggregatedLogDeletionService.java:68) at java.util.TimerThread.mainLoop(Timer.java:555) at 
java.util.TimerThread.run(Timer.java:505) Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:677) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1641) at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:640) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:724) at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1462) at
[jira] [Created] (YARN-3837) javadocs of TimelineAuthenticationFilterInitializer give wrong prefix for auth options
Steve Loughran created YARN-3837: Summary: javadocs of TimelineAuthenticationFilterInitializer give wrong prefix for auth options Key: YARN-3837 URL: https://issues.apache.org/jira/browse/YARN-3837 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.8.0 Reporter: Steve Loughran Priority: Minor The javadocs for {{TimelineAuthenticationFilterInitializer}} talk about the prefix {{yarn.timeline-service.authentication.}}, but the code uses {{yarn.timeline-service.http-authentication.}} as the prefix. Best to use {{@value}} and let the javadocs sort themselves out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
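A short sketch of the {{@value}} suggestion above (class and constant names are illustrative, not the actual initializer): the javadoc references the constant, so the documented prefix can never drift from the code.
{code:java}
public class TimelineAuthConstantsSketch {
  /**
   * Prefix for timeline HTTP authentication settings.
   * Documented as {@value #PREFIX} so the javadoc always matches the code.
   */
  public static final String PREFIX = "yarn.timeline-service.http-authentication.";
}
{code}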
[jira] [Moved] (YARN-3838) Rest API failing when ip configured in RM address in secure https mode
[ https://issues.apache.org/jira/browse/YARN-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt moved HADOOP-12096 to YARN-3838: - Component/s: (was: net) (was: security) security Key: YARN-3838 (was: HADOOP-12096) Project: Hadoop YARN (was: Hadoop Common) Rest API failing when ip configured in RM address in secure https mode -- Key: YARN-3838 URL: https://issues.apache.org/jira/browse/YARN-3838 Project: Hadoop YARN Issue Type: Bug Components: security Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: 0001-HADOOP-12096.patch, 0001-YARN-3810.patch, 0002-YARN-3810.patch Steps to reproduce === 1. Configure hadoop.http.authentication.kerberos.principal as below {code:xml} <property> <name>hadoop.http.authentication.kerberos.principal</name> <value>HTTP/_h...@hadoop.com</value> </property> {code} 2. In the RM web address, also configure an IP 3. Start up the RM and call the REST API for the RM: {{curl -i -k --insecure --negotiate -u : https://<IP>/ws/v1/cluster/info}} *Actual* The REST API fails: {code} 2015-06-16 19:03:49,845 DEBUG org.apache.hadoop.security.authentication.server.AuthenticationFilter: Authentication exception: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails) org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails) at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:399) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:348) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:519) at org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.doFilter(RMAuthenticationFilter.java:82) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
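For context on where the SPNEGO principal comes from, here is a hedged sketch (an assumption about the direction, not the attached patch): the {{_HOST}} placeholder must be expanded to a name the keytab actually contains, so when the web address is configured with an IP, expanding against the canonical host name of the bind address rather than the raw IP keeps the Kerberos lookup working. The realm and bind address below are placeholders.
{code:java}
import java.net.InetAddress;
import org.apache.hadoop.security.SecurityUtil;

public class SpnegoPrincipalSketch {
  public static void main(String[] args) throws Exception {
    String configuredPrincipal = "HTTP/_HOST@EXAMPLE.COM";     // placeholder realm
    InetAddress bindAddr = InetAddress.getByName("127.0.0.1"); // placeholder bind address
    // Expand _HOST using the canonical host name of the bind address,
    // not the raw IP string from the configuration.
    String principal =
        SecurityUtil.getServerPrincipal(configuredPrincipal, bindAddr.getCanonicalHostName());
    System.out.println(principal);
  }
}
{code}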
[jira] [Commented] (YARN-3779) Aggregated Logs Deletion doesnt work after refreshing Log Retention Settings in secure cluster
[ https://issues.apache.org/jira/browse/YARN-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594783#comment-14594783 ] Hadoop QA commented on YARN-3779: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 56s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 46s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 52s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 28s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 55s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | mapreduce tests | 5m 53s | Tests passed in hadoop-mapreduce-client-hs. | | | | 43m 25s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12740836/YARN-3779.03.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 055cd5a | | hadoop-mapreduce-client-hs test log | https://builds.apache.org/job/PreCommit-YARN-Build/8301/artifact/patchprocess/testrun_hadoop-mapreduce-client-hs.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8301/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8301/console | This message was automatically generated. Aggregated Logs Deletion doesnt work after refreshing Log Retention Settings in secure cluster -- Key: YARN-3779 URL: https://issues.apache.org/jira/browse/YARN-3779 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Environment: mrV2, secure mode Reporter: Zhang Wei Assignee: Varun Saxena Priority: Critical Attachments: YARN-3779.01.patch, YARN-3779.02.patch, YARN-3779.03.patch, log_aggr_deletion_on_refresh_error.log, log_aggr_deletion_on_refresh_fix.log {{GSSException}} is thrown everytime log aggregation deletion is attempted after executing bin/mapred hsadmin -refreshLogRetentionSettings in a secure cluster. The problem can be reproduced by following steps: 1. startup historyserver in secure cluster. 2. Log deletion happens as per expectation. 3. execute {{mapred hsadmin -refreshLogRetentionSettings}} command to refresh the configuration value. 4. All the subsequent attempts of log deletion fail with {{GSSException}} Following exception can be found in historyserver's log if log deletion is enabled. 
{noformat} 2015-06-04 14:14:40,070 | ERROR | Timer-3 | Error reading root log dir this deletion attempt is being aborted | AggregatedLogDeletionService.java:127 java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: vm-31/9.91.12.31; destination host is: vm-33:25000; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764) at org.apache.hadoop.ipc.Client.call(Client.java:1414) at org.apache.hadoop.ipc.Client.call(Client.java:1363) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy9.getListing(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:519) at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at
[jira] [Updated] (YARN-3838) Rest API failing when ip configured in RM address in secure https mode
[ https://issues.apache.org/jira/browse/YARN-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3838: --- Component/s: (was: security) webapp Rest API failing when ip configured in RM address in secure https mode -- Key: YARN-3838 URL: https://issues.apache.org/jira/browse/YARN-3838 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: 0001-HADOOP-12096.patch, 0001-YARN-3810.patch, 0002-YARN-3810.patch Steps to reproduce === 1. Configure hadoop.http.authentication.kerberos.principal as below {code:xml} <property> <name>hadoop.http.authentication.kerberos.principal</name> <value>HTTP/_h...@hadoop.com</value> </property> {code} 2. In the RM web address, also configure an IP 3. Start up the RM and call the REST API for the RM: {{curl -i -k --insecure --negotiate -u : https://<IP>/ws/v1/cluster/info}} *Actual* The REST API fails: {code} 2015-06-16 19:03:49,845 DEBUG org.apache.hadoop.security.authentication.server.AuthenticationFilter: Authentication exception: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails) org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails) at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:399) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:348) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:519) at org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.doFilter(RMAuthenticationFilter.java:82) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3809) Failed to launch new attempts because ApplicationMasterLauncher's threads all hang
[ https://issues.apache.org/jira/browse/YARN-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594746#comment-14594746 ] Hadoop QA commented on YARN-3809: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 48s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 39s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 42s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 52s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 57s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 50m 45s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 98m 47s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12740804/YARN-3809.03.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / bcb3c40 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8299/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8299/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8299/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8299/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8299/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8299/console | This message was automatically generated. 
Failed to launch new attempts because ApplicationMasterLauncher's threads all hang -- Key: YARN-3809 URL: https://issues.apache.org/jira/browse/YARN-3809 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-3809.01.patch, YARN-3809.02.patch, YARN-3809.03.patch ApplicationMasterLauncher creates a thread pool of size 10 to handle AMLauncherEventType events (LAUNCH and CLEANUP). In our cluster, there were many NMs with 10+ AMs running on them, and one shut down for some reason. After the RM marked the NM as LOST, it cleaned up the AMs running on it, and ApplicationMasterLauncher had to handle these 10+ CLEANUP events. ApplicationMasterLauncher's thread pool filled up, and the threads all hung in containerMgrProxy.stopContainers(stopRequest) because the NM was down; the default RPC timeout is 15 mins. This means that for 15 mins ApplicationMasterLauncher could not handle new events such as LAUNCH, so new attempts failed to launch because of the timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3837) javadocs of TimelineAuthenticationFilterInitializer give wrong prefix for auth options
[ https://issues.apache.org/jira/browse/YARN-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3837: --- Attachment: 0001-YARN-3837.patch Attaching a patch for the same. Please assign it to me if it's fine. javadocs of TimelineAuthenticationFilterInitializer give wrong prefix for auth options -- Key: YARN-3837 URL: https://issues.apache.org/jira/browse/YARN-3837 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.8.0 Reporter: Steve Loughran Priority: Minor Attachments: 0001-YARN-3837.patch Original Estimate: 0.5h Remaining Estimate: 0.5h The javadocs for {{TimelineAuthenticationFilterInitializer}} talk about the prefix {{yarn.timeline-service.authentication.}}, but the code uses {{yarn.timeline-service.http-authentication.}} as the prefix. Best to use {{@value}} and let the javadocs sort themselves out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3838) Rest API failing when ip configured in RM address in secure https mode
[ https://issues.apache.org/jira/browse/YARN-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594801#comment-14594801 ] Bibin A Chundatt commented on YARN-3838: Typo in my earlier comment: {quote} As per the discussion till now we should handle in HttpServer2 {quote} It should read: as per the discussion till now, we should handle it in HttpServer2.builder. Rest API failing when ip configured in RM address in secure https mode -- Key: YARN-3838 URL: https://issues.apache.org/jira/browse/YARN-3838 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: 0001-HADOOP-12096.patch, 0001-YARN-3810.patch, 0002-YARN-3810.patch Steps to reproduce === 1. Configure hadoop.http.authentication.kerberos.principal as below {code:xml} <property> <name>hadoop.http.authentication.kerberos.principal</name> <value>HTTP/_h...@hadoop.com</value> </property> {code} 2. In the RM web address, also configure an IP 3. Start up the RM and call the REST API for the RM: {{curl -i -k --insecure --negotiate -u : https://<IP>/ws/v1/cluster/info}} *Actual* The REST API fails: {code} 2015-06-16 19:03:49,845 DEBUG org.apache.hadoop.security.authentication.server.AuthenticationFilter: Authentication exception: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails) org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails) at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:399) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:348) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:519) at org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.doFilter(RMAuthenticationFilter.java:82) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3792) Test case failures in TestDistributedShell and some issue fixes related to ATSV2
[ https://issues.apache.org/jira/browse/YARN-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594778#comment-14594778 ] Hadoop QA commented on YARN-3792: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 17m 30s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 56s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 57s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 40s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 1s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 46s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 42s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 6m 6s | The patch appears to introduce 8 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 8m 11s | Tests passed in hadoop-yarn-applications-distributedshell. | | {color:green}+1{color} | yarn tests | 2m 3s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 6m 8s | Tests passed in hadoop-yarn-server-nodemanager. | | {color:red}-1{color} | yarn tests | 52m 43s | Tests failed in hadoop-yarn-server-resourcemanager. | | {color:green}+1{color} | yarn tests | 1m 22s | Tests passed in hadoop-yarn-server-timelineservice. 
| | | | 116m 36s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-applications-distributedshell | | FindBugs | module:hadoop-yarn-server-resourcemanager | | Failed unit tests | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12740818/YARN-3792-YARN-2928.003.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | YARN-2928 / 8c036a1 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8300/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8300/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-applications-distributedshell.html | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8300/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/8300/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8300/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8300/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8300/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8300/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8300/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8300/console | This message was automatically generated. Test case failures in TestDistributedShell and some issue fixes related to ATSV2 Key: YARN-3792 URL: https://issues.apache.org/jira/browse/YARN-3792 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Naganarasimha G R Assignee: Naganarasimha G R Attachments: YARN-3792-YARN-2928.001.patch, YARN-3792-YARN-2928.002.patch, YARN-3792-YARN-2928.003.patch # encountered [testcase failures|https://builds.apache.org/job/PreCommit-YARN-Build/8233/testReport/] which was happening even without the patch modifications in YARN-3044
[jira] [Commented] (YARN-3838) Rest API failing when ip configured in RM address in secure https mode
[ https://issues.apache.org/jira/browse/YARN-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594737#comment-14594737 ] Bibin A Chundatt commented on YARN-3838: In the case of the resourcemanager, the HTTP server is started as below and the URL used is just the IP address {{WebApps#start}}
{code}
HttpServer2.Builder builder = new HttpServer2.Builder()
    .setName(name)
    .addEndpoint(
        URI.create(httpScheme + bindAddress + ":" + port))
    .setConf(conf)
    .setFindPort(findPort)
    .setACL(new AccessControlList(conf.get(
        YarnConfiguration.YARN_ADMIN_ACL,
        YarnConfiguration.DEFAULT_YARN_ADMIN_ACL)))
    .setPathSpec(pathList.toArray(new String[0]));
{code}
Comparing the same to the HDFS side, for the NameNode the URL is formed as below {{DFSUtil#httpServerTemplateForNNAndJN}}
{code}
URI uri = URI.create("http://" + NetUtils.getHostPortString(httpAddr));
{code}
Seems like this is the reason why there is a difference between HDFS and YARN in *REST API functionality when an IP is configured in Kerberos mode*. In the case of HDFS it works, but in YARN it doesn't. Can we change the RM HttpServer2.Builder as below?
{code}
HttpServer2.Builder builder = new HttpServer2.Builder()
    .setName(name)
    .addEndpoint(
        URI.create(httpScheme + NetUtils.getHostPortString(
            new InetSocketAddress(bindAddress, port))))
    .setConf(conf)
    .setFindPort(findPort)
    .setACL(
        new AccessControlList(conf.get(
            YarnConfiguration.YARN_ADMIN_ACL,
            YarnConfiguration.DEFAULT_YARN_ADMIN_ACL)))
    .setPathSpec(pathList.toArray(new String[0]));
{code}
Please do correct me if I am wrong. Rest API failing when ip configured in RM address in secure https mode -- Key: YARN-3838 URL: https://issues.apache.org/jira/browse/YARN-3838 Project: Hadoop YARN Issue Type: Bug Components: security Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: 0001-HADOOP-12096.patch, 0001-YARN-3810.patch, 0002-YARN-3810.patch Steps to reproduce === 1. Configure hadoop.http.authentication.kerberos.principal as below
{code:xml}
<property>
  <name>hadoop.http.authentication.kerberos.principal</name>
  <value>HTTP/_h...@hadoop.com</value>
</property>
{code}
2. In the RM web address also configure the IP 3. Start up the RM and call the REST API for the RM: {{curl -i -k --insecure --negotiate -u : https://<IP>/ws/v1/cluster/info}} *Actual* Rest API failing
{code}
2015-06-16 19:03:49,845 DEBUG org.apache.hadoop.security.authentication.server.AuthenticationFilter: Authentication exception: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails)
org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos credentails)
  at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:399)
  at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:348)
  at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:519)
  at org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.doFilter(RMAuthenticationFilter.java:82)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
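To see why the two ways of building the endpoint behave differently in a Kerberized setup, here is a plain-JDK sketch that approximates the proposed change: it uses java.net.InetSocketAddress#getHostName (a reverse lookup when the configured address is an IP) in place of Hadoop's NetUtils.getHostPortString, and hard-codes sample values for illustration:
{code}
import java.net.InetSocketAddress;

public class EndpointHostSketch {
  public static void main(String[] args) {
    String bindAddress = "127.0.0.1";   // IP as configured for the RM web address
    int port = 8088;                    // sample port, illustration only

    // Current behaviour described above: the raw IP goes straight into the URI.
    String rawEndpoint = "https://" + bindAddress + ":" + port;

    // Proposed behaviour: resolve the address first, so the URI carries a
    // hostname that can match the Kerberos HTTP/<host> service principal.
    InetSocketAddress addr = new InetSocketAddress(bindAddress, port);
    String resolvedEndpoint = "https://" + addr.getHostName() + ":" + addr.getPort();

    System.out.println("raw:      " + rawEndpoint);
    System.out.println("resolved: " + resolvedEndpoint);
  }
}
{code}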
[jira] [Assigned] (YARN-3837) javadocs of TimelineAuthenticationFilterInitializer give wrong prefix for auth options
[ https://issues.apache.org/jira/browse/YARN-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt reassigned YARN-3837: -- Assignee: Bibin A Chundatt javadocs of TimelineAuthenticationFilterInitializer give wrong prefix for auth options -- Key: YARN-3837 URL: https://issues.apache.org/jira/browse/YARN-3837 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.8.0 Reporter: Steve Loughran Assignee: Bibin A Chundatt Priority: Minor Attachments: 0001-YARN-3837.patch Original Estimate: 0.5h Remaining Estimate: 0.5h The javadocs for {{TimelineAuthenticationFilterInitializer}} talk about the prefix {{yarn.timeline-service.authentication.}}, but the code uses {{yarn.timeline-service.http-authentication.}} as the prefix. Best to use {{@value}} and let the javadocs sort it out for themselves. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3835) hadoop-yarn-server-resourcemanager test package bundles core-site.xml, yarn-site.xml
[ https://issues.apache.org/jira/browse/YARN-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594876#comment-14594876 ] Vamsee Yarlagadda commented on YARN-3835: - Manually verified tests.jar to confirm core-site.xml and yarn-site.xml are missing. hadoop-yarn-server-resourcemanager test package bundles core-site.xml, yarn-site.xml Key: YARN-3835 URL: https://issues.apache.org/jira/browse/YARN-3835 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Vamsee Yarlagadda Assignee: Vamsee Yarlagadda Priority: Minor Attachments: YARN-3835.patch It looks like by default YARN is bundling core-site.xml and yarn-site.xml in the test artifact of hadoop-yarn-server-resourcemanager, which means that any downstream project which uses this as a dependency can have a problem picking up the user-supplied or environment-supplied core-site.xml and yarn-site.xml. So we should ideally exclude these .xml files from being bundled into the test-jar. (Similar to YARN-1748) I also proactively looked at other YARN modules where this might be happening.
{code}
vamsee-MBP:hadoop-yarn-project vamsee$ find . -name *-site.xml
./hadoop-yarn/conf/yarn-site.xml
./hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/resources/yarn-site.xml
./hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/resources/yarn-site.xml
./hadoop-yarn/hadoop-yarn-client/src/test/resources/core-site.xml
./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/core-site.xml
./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/core-site.xml
./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site.xml
./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/target/test-classes/core-site.xml
./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/target/test-classes/yarn-site.xml
./hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/resources/core-site.xml
{code}
And out of these, only two modules (hadoop-yarn-server-resourcemanager, hadoop-yarn-server-tests) are building test-jars. In future, if we start building test-jars of other modules, we should exclude these xml files from being bundled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
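The downstream shadowing risk is easy to check: the sketch below (plain JDK, no Hadoop APIs) lists every core-site.xml visible on the classpath in resolution order; if the resourcemanager test-jar bundles one, it shows up here alongside, and possibly ahead of, the configuration file the user intended to supply.
{code}
import java.net.URL;
import java.util.Collections;

public class BundledSiteXmlCheck {
  public static void main(String[] args) throws Exception {
    // Every core-site.xml reachable on the classpath, in lookup order.
    for (URL url : Collections.list(
        BundledSiteXmlCheck.class.getClassLoader().getResources("core-site.xml"))) {
      System.out.println(url);
    }
  }
}
{code}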
[jira] [Updated] (YARN-3806) Proposal of Generic Scheduling Framework for YARN
[ https://issues.apache.org/jira/browse/YARN-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Shao updated YARN-3806: --- Attachment: ProposalOfGenericSchedulingFrameworkForYARN-V1.06.pdf Proposal of Generic Scheduling Framework for YARN - Key: YARN-3806 URL: https://issues.apache.org/jira/browse/YARN-3806 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Wei Shao Attachments: ProposalOfGenericSchedulingFrameworkForYARN-V1.05.pdf, ProposalOfGenericSchedulingFrameworkForYARN-V1.06.pdf Currently, a typical YARN cluster runs many different kinds of applications: production applications, ad hoc user applications, long running services and so on. Different YARN scheduling policies may be suitable for different applications. For example, capacity scheduling can manage production applications well since applications can get a guaranteed resource share, and fair scheduling can manage ad hoc user applications well since it can enforce fairness among users. However, the current YARN scheduling framework doesn't have a mechanism for multiple scheduling policies to work hierarchically in one cluster. YARN-3306 discussed many issues of today's YARN scheduling framework and proposed a per-queue policy-driven framework. In detail, it supported different scheduling policies for leaf queues. However, support of different scheduling policies for upper-level queues has not been seriously considered yet. A generic scheduling framework is proposed here to address these limitations. It supports different policies (fair, capacity, fifo and so on) for any queue consistently. The proposal tries to solve many other issues in the current YARN scheduling framework as well. Two newly proposed scheduling policies, YARN-3807 and YARN-3808, are based on the generic scheduling framework brought up in this proposal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3807) Proposal of Guaranteed Capacity Scheduling for YARN
[ https://issues.apache.org/jira/browse/YARN-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Shao updated YARN-3807: --- Attachment: ProposalOfGuaranteedCapacitySchedulingForYARN-V1.05.pdf Proposal of Guaranteed Capacity Scheduling for YARN --- Key: YARN-3807 URL: https://issues.apache.org/jira/browse/YARN-3807 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, fairscheduler Reporter: Wei Shao Attachments: ProposalOfGuaranteedCapacitySchedulingForYARN-V1.04.pdf, ProposalOfGuaranteedCapacitySchedulingForYARN-V1.05.pdf This proposal talks about the limitations of the YARN scheduling policies for SLA applications, and tries to solve them by YARN-3806 and a new scheduling policy called guaranteed capacity scheduling. Guaranteed capacity scheduling guarantees to applications that they can get resources under a specified capacity cap in a totally predictable manner. An application can meet its SLA more easily since it is self-contained in the shared cluster - external uncertainties are eliminated. For example, suppose queue A has an initial capacity of 100G memory, and there are two pending applications 1 and 2, where 1's specified capacity is 70G and 2's specified capacity is 50G. Queue A may accept application 1 to run first and guarantee that 1 can get resources exponentially up to its capacity and won't be preempted (if the allocation of 1 is 5G in scheduling cycle N, its demand is 80G, and the exponential factor is 2, then in N+1 it can get 5G, in N+2 it can get 10G, in N+3 it can get 20G, and in N+4 it can get 30G, reaching its capacity). Later, when the cluster is free, queue A may decide to scale up by increasing its capacity to 120G, so it can accept application 2 and make the same guarantee to it as well. Queue A can scale down to its initial capacity when any application completes. Guaranteed capacity scheduling also has other features that the example doesn't illustrate. See the proposal for more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
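The ramp-up arithmetic in the example can be checked with a few lines. This is only one reading of the numbers in the description, assuming each figure is the additional grant per cycle and the final grant is capped at the remaining capacity; the actual policy is defined in the attached proposal.
{code}
public class ExponentialRampUpCheck {
  public static void main(String[] args) {
    int allocated = 5;    // 5G already held at cycle N
    int capacity = 70;    // application 1's specified capacity cap, in G
    int grant = 5;        // additional grant at cycle N+1, in G
    int factor = 2;       // exponential factor from the example

    for (int cycle = 1; allocated < capacity; cycle++) {
      int given = Math.min(grant, capacity - allocated);  // cap the last step
      allocated += given;
      System.out.printf("N+%d: +%dG -> %dG total%n", cycle, given, allocated);
      grant *= factor;
    }
    // Prints +5G, +10G, +20G, +30G, reaching the 70G capacity at N+4.
  }
}
{code}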
[jira] [Commented] (YARN-3792) Test case failures in TestDistributedShell and some issue fixes related to ATSV2
[ https://issues.apache.org/jira/browse/YARN-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594821#comment-14594821 ] Naganarasimha G R commented on YARN-3792: - * The reported test case failure is not caused by this patch; YARN-3790 has already been raised to address it. * The whitespace issue is not caused by this patch. * The findbugs alert is incorrect; the report shows no issues. [~sjlee0], I think it's in a good state now! Test case failures in TestDistributedShell and some issue fixes related to ATSV2 Key: YARN-3792 URL: https://issues.apache.org/jira/browse/YARN-3792 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Naganarasimha G R Assignee: Naganarasimha G R Attachments: YARN-3792-YARN-2928.001.patch, YARN-3792-YARN-2928.002.patch, YARN-3792-YARN-2928.003.patch # encountered [testcase failures|https://builds.apache.org/job/PreCommit-YARN-Build/8233/testReport/] which were happening even without the patch modifications in YARN-3044: TestDistributedShell.testDSShellWithoutDomainV2CustomizedFlow TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow TestDistributedShellWithNodeLabels.testDSShellWithNodeLabelExpression # Remove unused {{enableATSV1}} in TestDistributedShell # Container metrics need to be published only for v2 test cases of TestDistributedShell # NullPointerException was thrown in TimelineClientImpl.constructResURI when the aux service was not configured and {{TimelineClient.putObjects}} was getting invoked. # Race condition between the application events being published and the test case verification of the RM's ApplicationFinished timeline events # Application tags were converted to lowercase in ApplicationSubmissionContextPBImpl, hence RMTimelineCollector was not able to detect the custom flow details of the app -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3808) Proposal of Time Extended Fair Scheduling for YARN
[ https://issues.apache.org/jira/browse/YARN-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Shao updated YARN-3808: --- Attachment: ProposalOfTimeExtendedFairSchedulingForYARN-V1.03.pdf Proposal of Time Extended Fair Scheduling for YARN -- Key: YARN-3808 URL: https://issues.apache.org/jira/browse/YARN-3808 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler, scheduler Reporter: Wei Shao Attachments: ProposalOfTimeBasedFairSchedulingForYARN-V1.02.pdf, ProposalOfTimeExtendedFairSchedulingForYARN-V1.03.pdf This proposal talks about the issues of the YARN fair scheduling policy, and tries to solve them by YARN-3806 and a new scheduling policy called time extended fair scheduling. The time extended fair scheduling policy is proposed to enforce fairness over time among users. For example, if two users share the cluster weekly, each user's fair share is half of the cluster per week. In a particular week, if the first user has used the whole cluster for the first half of the week, then in the second half of the week the second user will always have priority to use cluster resources, since the first user has already used up its time extended fair share of the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
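A minimal sketch of the bookkeeping behind the weekly example; the accounting unit (cluster fraction multiplied by days) and the priority rule are assumptions made for illustration, not taken from the attached proposal.
{code}
public class TimeExtendedFairShareSketch {
  public static void main(String[] args) {
    double windowDays = 7.0;                  // weekly accounting window
    double fairShare = 0.5 * windowDays;      // each of two users: half the cluster-week

    double user1Usage = 1.0 * 3.5;            // whole cluster for the first half of the week
    double user2Usage = 0.0;                  // nothing used yet

    double user1Remaining = fairShare - user1Usage;   // 0.0 cluster-days left
    double user2Remaining = fairShare - user2Usage;   // 3.5 cluster-days left

    // Whoever has more time-extended share remaining gets priority next.
    String next = user2Remaining > user1Remaining ? "user 2" : "user 1";
    System.out.printf("remaining: user1=%.1f, user2=%.1f cluster-days; priority: %s%n",
        user1Remaining, user2Remaining, next);
  }
}
{code}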
[jira] [Updated] (YARN-3808) Proposal of Time Extended Fair Scheduling for YARN
[ https://issues.apache.org/jira/browse/YARN-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Shao updated YARN-3808: --- Summary: Proposal of Time Extended Fair Scheduling for YARN (was: Proposal of Time Based Fair Scheduling for YARN) Proposal of Time Extended Fair Scheduling for YARN -- Key: YARN-3808 URL: https://issues.apache.org/jira/browse/YARN-3808 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler, scheduler Reporter: Wei Shao Attachments: ProposalOfTimeBasedFairSchedulingForYARN-V1.02.pdf This proposal talks about the issues of the YARN fair scheduling policy, and tries to solve them by YARN-3806 and a new scheduling policy called time based fair scheduling. The time based fair scheduling policy is proposed to enforce time-based fairness among users. For example, if two users share the cluster weekly, each user's fair share is half of the cluster per week. In a particular week, if the first user has used the whole cluster for the first half of the week, then in the second half of the week the second user will always have priority to use cluster resources, since the first user has already used up its fair share of the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3808) Proposal of Time Extended Fair Scheduling for YARN
[ https://issues.apache.org/jira/browse/YARN-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Shao updated YARN-3808: --- Attachment: (was: ProposalOfTimeBasedFairSchedulingForYARN-V1.02.pdf) Proposal of Time Extended Fair Scheduling for YARN -- Key: YARN-3808 URL: https://issues.apache.org/jira/browse/YARN-3808 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler, scheduler Reporter: Wei Shao Attachments: ProposalOfTimeExtendedFairSchedulingForYARN-V1.03.pdf This proposal talks about the issues of the YARN fair scheduling policy, and tries to solve them by YARN-3806 and a new scheduling policy called time extended fair scheduling. The time extended fair scheduling policy is proposed to enforce fairness over time among users. For example, if two users share the cluster weekly, each user's fair share is half of the cluster per week. In a particular week, if the first user has used the whole cluster for the first half of the week, then in the second half of the week the second user will always have priority to use cluster resources, since the first user has already used up its time extended fair share of the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3808) Proposal of Time Extended Fair Scheduling for YARN
[ https://issues.apache.org/jira/browse/YARN-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Shao updated YARN-3808: --- Description: This proposal talks about the issues of the YARN fair scheduling policy, and tries to solve them by YARN-3806 and a new scheduling policy called time extended fair scheduling. The time extended fair scheduling policy is proposed to enforce fairness over time among users. For example, if two users share the cluster weekly, each user's fair share is half of the cluster per week. In a particular week, if the first user has used the whole cluster for the first half of the week, then in the second half of the week the second user will always have priority to use cluster resources, since the first user has already used up its time extended fair share of the cluster. was: This proposal talks about the issues of the YARN fair scheduling policy, and tries to solve them by YARN-3806 and a new scheduling policy called time based fair scheduling. The time based fair scheduling policy is proposed to enforce time-based fairness among users. For example, if two users share the cluster weekly, each user's fair share is half of the cluster per week. In a particular week, if the first user has used the whole cluster for the first half of the week, then in the second half of the week the second user will always have priority to use cluster resources, since the first user has already used up its fair share of the cluster. Proposal of Time Extended Fair Scheduling for YARN -- Key: YARN-3808 URL: https://issues.apache.org/jira/browse/YARN-3808 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler, scheduler Reporter: Wei Shao Attachments: ProposalOfTimeBasedFairSchedulingForYARN-V1.02.pdf, ProposalOfTimeExtendedFairSchedulingForYARN-V1.03.pdf This proposal talks about the issues of the YARN fair scheduling policy, and tries to solve them by YARN-3806 and a new scheduling policy called time extended fair scheduling. The time extended fair scheduling policy is proposed to enforce fairness over time among users. For example, if two users share the cluster weekly, each user's fair share is half of the cluster per week. In a particular week, if the first user has used the whole cluster for the first half of the week, then in the second half of the week the second user will always have priority to use cluster resources, since the first user has already used up its time extended fair share of the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)