[jira] [Created] (YARN-2470) A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager to crash. Slider needs this value to be high. Setting a very high value throws an exception and
Shivaji Dutta created YARN-2470: --- Summary: A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager to crash. Slider needs this value to be high. Setting a very high value throws an exception and nodemanager does not start Key: YARN-2470 URL: https://issues.apache.org/jira/browse/YARN-2470 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.4.1 Reporter: Shivaji Dutta Priority: Minor -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2449) Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set
[ https://issues.apache.org/jira/browse/YARN-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114911#comment-14114911 ] Zhijie Shen commented on YARN-2449: --- The patch should work, but we can improve the logic a bit. {code} -if (!actualInitializers.equals(initializers)) { +if (!actualInitializers.equals(initializers) || modifiedInitializers) { {code} We can set modifiedInitializers = true when TimelineAuthenticationFilterInitializer is added and AuthenticationFilterInitializer is skipped. These are the only two possible changes. Then, we don't need to check !actualInitializers.equals(initializers), but only modifiedInitializers in the aforementioned condition. Can you add one more case: {code} +driver.put(, TimelineAuthenticationFilterInitializer.class.getName()); {code} Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set --- Key: YARN-2449 URL: https://issues.apache.org/jira/browse/YARN-2449 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Environment: Security-enabled cluster with ATS also enabled and running, but no hadoop.http.filter.initializers set in core-site.xml Reporter: Karam Singh Assignee: Varun Vasudev Priority: Critical Attachments: apache-yarn-2449.0.patch This looks like a regression from YARN-2397: after it, when no hadoop.http.filter.initializers is set, trying to fetch a DELEGATION token from the timelineserver returns an invalid token. Tried to fetch a timeline delegation token by using curl commands: {code} 1. curl -i -k -s -b 'timeline-cookie.txt' 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=hrt_qa' Or 2. curl -i -k -s --negotiate -u : 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=test_user' {code} The response for both queries is: {code} {"About":"Timeline API"} {code} Whereas before YARN-2397, or with hadoop.http.filter.initializers set to TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer, the first query returned a DT and the second used to fail. -- This message was sent by Atlassian JIRA (v6.2#6252)
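To make the flag-based suggestion above concrete, here is a minimal sketch, assuming illustrative names (the string constants, method shape, and conf-key handling are assumptions, not the actual YARN-2449 patch): the boolean is set at exactly the two points where the initializer list can change, so the final check no longer needs the string comparison.
{code}
import org.apache.hadoop.conf.Configuration;

public class FilterInitializerRewrite {
  // Fully-qualified names written as strings to keep the sketch self-contained.
  static final String TIMELINE_INIT = "org.apache.hadoop.yarn.server.timeline."
      + "security.TimelineAuthenticationFilterInitializer";
  static final String AUTH_INIT =
      "org.apache.hadoop.security.AuthenticationFilterInitializer";

  static void rewrite(Configuration conf) {
    String initializers = conf.get("hadoop.http.filter.initializers", "");
    StringBuilder actual = new StringBuilder();
    boolean modifiedInitializers = false; // the two changes below set this
    boolean hasTimeline = false;
    for (String name : initializers.split(",")) {
      if (name.isEmpty()) {
        continue;
      }
      if (name.equals(AUTH_INIT)) {
        modifiedInitializers = true; // change #1: skipped, timeline filter subsumes it
        continue;
      }
      if (name.equals(TIMELINE_INIT)) {
        hasTimeline = true;
      }
      actual.append(name).append(",");
    }
    if (!hasTimeline) {
      actual.append(TIMELINE_INIT).append(","); // change #2: timeline filter added
      modifiedInitializers = true;
    }
    if (modifiedInitializers) { // replaces !actualInitializers.equals(initializers)
      conf.set("hadoop.http.filter.initializers", actual.toString());
    }
  }
}
{code}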
[jira] [Updated] (YARN-2405) NPE in FairSchedulerAppsBlock
[ https://issues.apache.org/jira/browse/YARN-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2405: --- Summary: NPE in FairSchedulerAppsBlock (was: NPE in FairSchedulerAppsBlock (scheduler page)) NPE in FairSchedulerAppsBlock - Key: YARN-2405 URL: https://issues.apache.org/jira/browse/YARN-2405 Project: Hadoop YARN Issue Type: Bug Reporter: Maysam Yabandeh Assignee: Tsuyoshi OZAWA Attachments: YARN-2405.1.patch, YARN-2405.2.patch, YARN-2405.3.patch, YARN-2405.4.patch FairSchedulerAppsBlock#render throws an NPE at this line: {code} int fairShare = fsinfo.getAppFairShare(attemptId); {code} This causes the scheduler page to not show the apps, since it lacks the definition of appsTableData: {code} Uncaught ReferenceError: appsTableData is not defined {code} The problem is temporary, meaning it usually resolves by itself, either after a retry or after a few hours. -- This message was sent by Atlassian JIRA (v6.2#6252)
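For context, a minimal sketch of one plausible failure mode and a null-safe lookup, under assumed names (a plain map stands in for the scheduler's attempt tracking; this is not the YARN-2405 patch itself): the render path can look up an attempt the scheduler has already forgotten, and an unguarded conversion to int is where the NPE would surface.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FairShareLookup {
  static final int INVALID_FAIR_SHARE = -1; // sentinel for "attempt unknown"
  final Map<String, Integer> fairShares =
      new ConcurrentHashMap<String, Integer>();

  int getAppFairShare(String attemptId) {
    // Unguarded version: returning fairShares.get(attemptId) directly as an
    // int auto-unboxes null and throws NullPointerException for a missing
    // attempt, matching the transient failure described above.
    Integer share = fairShares.get(attemptId);
    return share == null ? INVALID_FAIR_SHARE : share;
  }
}
{code}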
[jira] [Commented] (YARN-2405) NPE in FairSchedulerAppsBlock (scheduler page)
[ https://issues.apache.org/jira/browse/YARN-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114910#comment-14114910 ] Karthik Kambatla commented on YARN-2405: +1. NPE in FairSchedulerAppsBlock (scheduler page) -- Key: YARN-2405 URL: https://issues.apache.org/jira/browse/YARN-2405 Project: Hadoop YARN Issue Type: Bug Reporter: Maysam Yabandeh Assignee: Tsuyoshi OZAWA Attachments: YARN-2405.1.patch, YARN-2405.2.patch, YARN-2405.3.patch, YARN-2405.4.patch FairSchedulerAppsBlock#render throws an NPE at this line: {code} int fairShare = fsinfo.getAppFairShare(attemptId); {code} This causes the scheduler page to not show the apps, since it lacks the definition of appsTableData: {code} Uncaught ReferenceError: appsTableData is not defined {code} The problem is temporary, meaning it usually resolves by itself, either after a retry or after a few hours. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2033) Investigate merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114913#comment-14114913 ] Zhijie Shen commented on YARN-2033: --- The test failure should not be related. It seems that the configuration resource was not read correctly on jenkins: {code} java.lang.RuntimeException: java.util.zip.ZipException: oversubscribed dynamic bit lengths tree at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:147) at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:105) at java.io.FilterInputStream.read(FilterInputStream.java:66) at org.apache.xerces.impl.XMLEntityManager$RewindableInputStream.read(Unknown Source) at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source) at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) at org.apache.xerces.parsers.DOMParser.parse(Unknown Source) at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source) at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:153) at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2334) at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2322) at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2393) at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2346) at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2263) at org.apache.hadoop.conf.Configuration.get(Configuration.java:1088) at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:605) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:247) at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:296) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.setup(TestRMRestart.java:119) {code} Investigate merging generic-history into the Timeline Store --- Key: YARN-2033 URL: https://issues.apache.org/jira/browse/YARN-2033 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, YARN-2033.1.patch, YARN-2033.2.patch, YARN-2033.3.patch, YARN-2033.4.patch, YARN-2033.5.patch, YARN-2033.6.patch, YARN-2033.7.patch, YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch, YARN-2033_ALL.3.patch, YARN-2033_ALL.4.patch Having two different stores isn't amicable to generic insights on what's happening with applications. This is to investigate porting generic-history into the Timeline Store. One goal is to try and retain most of the client side interfaces as close to what we have today. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2405) NPE in FairSchedulerAppsBlock
[ https://issues.apache.org/jira/browse/YARN-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114972#comment-14114972 ] Tsuyoshi OZAWA commented on YARN-2405: -- Thanks Maysam, Gera, and Karthik for the review! NPE in FairSchedulerAppsBlock - Key: YARN-2405 URL: https://issues.apache.org/jira/browse/YARN-2405 Project: Hadoop YARN Issue Type: Bug Reporter: Maysam Yabandeh Assignee: Tsuyoshi OZAWA Fix For: 2.6.0 Attachments: YARN-2405.1.patch, YARN-2405.2.patch, YARN-2405.3.patch, YARN-2405.4.patch FairSchedulerAppsBlock#render throws an NPE at this line: {code} int fairShare = fsinfo.getAppFairShare(attemptId); {code} This causes the scheduler page to not show the apps, since it lacks the definition of appsTableData: {code} Uncaught ReferenceError: appsTableData is not defined {code} The problem is temporary, meaning it usually resolves by itself, either after a retry or after a few hours. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2406) Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto
[ https://issues.apache.org/jira/browse/YARN-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114977#comment-14114977 ] Tsuyoshi OZAWA commented on YARN-2406: -- Thanks Jian for reviewing and updating! Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto Key: YARN-2406 URL: https://issues.apache.org/jira/browse/YARN-2406 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Tsuyoshi OZAWA Fix For: 2.6.0 Attachments: YARN-2406.1.patch, YARN-2406.2.patch Today most recovery-related proto records are defined in yarn_server_resourcemanager_service_protos.proto, which is inside the YARN-API module. Since these records are internally used by the RM only, we can move them to the yarn_server_resourcemanager_recovery.proto file inside the RM-server module. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2470) A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager to crash. Slider needs this value to be high. Setting a very high value throws an exception a
[ https://issues.apache.org/jira/browse/YARN-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115011#comment-14115011 ] Beckham007 commented on YARN-2470: -- Could you give some logs about this? A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager to crash. Slider needs this value to be high. Setting a very high value throws an exception and nodemanager does not start -- Key: YARN-2470 URL: https://issues.apache.org/jira/browse/YARN-2470 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.4.1 Reporter: Shivaji Dutta Priority: Minor -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2280) Resource manager web service fields are not accessible
[ https://issues.apache.org/jira/browse/YARN-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Horvath updated YARN-2280: Attachment: (was: YARN-2280.patch) Resource manager web service fields are not accessible -- Key: YARN-2280 URL: https://issues.apache.org/jira/browse/YARN-2280 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0, 2.4.1 Reporter: Krisztian Horvath Assignee: Krisztian Horvath Priority: Minor Fix For: 2.6.0 Attachments: YARN-2280.patch Using the resource manager's REST API (org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices), some REST calls return a class whose fields cannot be accessed after unmarshalling, for example SchedulerTypeInfo -> schedulerInfo. Using the same classes on the client side, these fields are only accessible via reflection. -- This message was sent by Atlassian JIRA (v6.2#6252)
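A small runnable sketch of the inconvenience being reported, with a stand-in class (SchedulerTypeInfoStub and its field are illustrative, not the real DAO): when an unmarshalled class exposes no getter, client code falls back to reflection.
{code}
import java.lang.reflect.Field;

public class ReflectiveFieldAccess {
  static class SchedulerTypeInfoStub {
    private String schedulerInfo = "fifo"; // populated by unmarshalling, no getter
  }

  public static void main(String[] args) throws Exception {
    SchedulerTypeInfoStub dto = new SchedulerTypeInfoStub();
    // The reflective workaround the reporter describes:
    Field f = SchedulerTypeInfoStub.class.getDeclaredField("schedulerInfo");
    f.setAccessible(true);
    System.out.println("schedulerInfo = " + f.get(dto));
  }
}
{code}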
[jira] [Updated] (YARN-2280) Resource manager web service fields are not accessible
[ https://issues.apache.org/jira/browse/YARN-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Horvath updated YARN-2280: Attachment: YARN-2280.patch Resource manager web service fields are not accessible -- Key: YARN-2280 URL: https://issues.apache.org/jira/browse/YARN-2280 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0, 2.4.1 Reporter: Krisztian Horvath Assignee: Krisztian Horvath Priority: Minor Fix For: 2.6.0 Attachments: YARN-2280.patch Using the resource manager's REST API (org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices), some REST calls return a class whose fields cannot be accessed after unmarshalling, for example SchedulerTypeInfo -> schedulerInfo. Using the same classes on the client side, these fields are only accessible via reflection. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2280) Resource manager web service fields are not accessible
[ https://issues.apache.org/jira/browse/YARN-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115081#comment-14115081 ] Hadoop QA commented on YARN-2280: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12665305/YARN-2280.patch against trunk revision 4ae8178. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4769//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4769//console This message is automatically generated. Resource manager web service fields are not accessible -- Key: YARN-2280 URL: https://issues.apache.org/jira/browse/YARN-2280 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0, 2.4.1 Reporter: Krisztian Horvath Assignee: Krisztian Horvath Priority: Minor Fix For: 2.6.0 Attachments: YARN-2280.patch Using the resource manager's REST API (org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices), some REST calls return a class whose fields cannot be accessed after unmarshalling, for example SchedulerTypeInfo -> schedulerInfo. Using the same classes on the client side, these fields are only accessible via reflection. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2449) Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set
[ https://issues.apache.org/jira/browse/YARN-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2449: Attachment: apache-yarn-2449.1.patch Uploaded a new patch addressing [~zjshen]'s comments. Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set --- Key: YARN-2449 URL: https://issues.apache.org/jira/browse/YARN-2449 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Environment: Security-enabled cluster with ATS also enabled and running, but no hadoop.http.filter.initializers set in core-site.xml Reporter: Karam Singh Assignee: Varun Vasudev Priority: Critical Attachments: apache-yarn-2449.0.patch, apache-yarn-2449.1.patch This looks like a regression from YARN-2397: after it, when no hadoop.http.filter.initializers is set, trying to fetch a DELEGATION token from the timelineserver returns an invalid token. Tried to fetch a timeline delegation token by using curl commands: {code} 1. curl -i -k -s -b 'timeline-cookie.txt' 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=hrt_qa' Or 2. curl -i -k -s --negotiate -u : 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=test_user' {code} The response for both queries is: {code} {"About":"Timeline API"} {code} Whereas before YARN-2397, or with hadoop.http.filter.initializers set to TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer, the first query returned a DT and the second used to fail. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2406) Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto
[ https://issues.apache.org/jira/browse/YARN-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115142#comment-14115142 ] Hudson commented on YARN-2406: -- FAILURE: Integrated in Hadoop-Yarn-trunk #663 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/663/]) YARN-2406. Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto. Contributed by Tsuyoshi OZAWA (jianhe: rev 7b3e27ab7393214e35a575bc9093100e94dd8c89) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/ApplicationStateData.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/ApplicationAttemptStateDataPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/EpochPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/ApplicationAttemptStateData.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/ApplicationStateDataPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/Epoch.java Add CHANGES.txt for YARN-2406. (jianhe: rev 9d68445710feff9fda9ee69847beeaf3e99b85ef) * hadoop-yarn-project/CHANGES.txt Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto Key: YARN-2406 URL: https://issues.apache.org/jira/browse/YARN-2406 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Tsuyoshi OZAWA Fix For: 2.6.0 Attachments: YARN-2406.1.patch, YARN-2406.2.patch Today most recovery-related proto records are defined in yarn_server_resourcemanager_service_protos.proto, which is inside the YARN-API module. Since these records are internally used by the RM only, we can move them to the yarn_server_resourcemanager_recovery.proto file inside the RM-server module. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2405) NPE in FairSchedulerAppsBlock
[ https://issues.apache.org/jira/browse/YARN-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115144#comment-14115144 ] Hudson commented on YARN-2405: -- FAILURE: Integrated in Hadoop-Yarn-trunk #663 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/663/]) YARN-2405. NPE in FairSchedulerAppsBlock. (Tsuyoshi Ozawa via kasha) (kasha: rev fa80ca49bdd741823ff012ddbd7a0f1aecf26195) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerAppsBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebAppFairScheduler.java * hadoop-yarn-project/CHANGES.txt NPE in FairSchedulerAppsBlock - Key: YARN-2405 URL: https://issues.apache.org/jira/browse/YARN-2405 Project: Hadoop YARN Issue Type: Bug Reporter: Maysam Yabandeh Assignee: Tsuyoshi OZAWA Fix For: 2.6.0 Attachments: YARN-2405.1.patch, YARN-2405.2.patch, YARN-2405.3.patch, YARN-2405.4.patch FairSchedulerAppsBlock#render throws an NPE at this line: {code} int fairShare = fsinfo.getAppFairShare(attemptId); {code} This causes the scheduler page to not show the apps, since it lacks the definition of appsTableData: {code} Uncaught ReferenceError: appsTableData is not defined {code} The problem is temporary, meaning it usually resolves by itself, either after a retry or after a few hours. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2405) NPE in FairSchedulerAppsBlock
[ https://issues.apache.org/jira/browse/YARN-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115257#comment-14115257 ] Hudson commented on YARN-2405: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1854 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1854/]) YARN-2405. NPE in FairSchedulerAppsBlock. (Tsuyoshi Ozawa via kasha) (kasha: rev fa80ca49bdd741823ff012ddbd7a0f1aecf26195) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebAppFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerAppsBlock.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerInfo.java NPE in FairSchedulerAppsBlock - Key: YARN-2405 URL: https://issues.apache.org/jira/browse/YARN-2405 Project: Hadoop YARN Issue Type: Bug Reporter: Maysam Yabandeh Assignee: Tsuyoshi OZAWA Fix For: 2.6.0 Attachments: YARN-2405.1.patch, YARN-2405.2.patch, YARN-2405.3.patch, YARN-2405.4.patch FairSchedulerAppsBlock#render throws an NPE at this line: {code} int fairShare = fsinfo.getAppFairShare(attemptId); {code} This causes the scheduler page to not show the apps, since it lacks the definition of appsTableData: {code} Uncaught ReferenceError: appsTableData is not defined {code} The problem is temporary, meaning it usually resolves by itself, either after a retry or after a few hours. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2406) Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto
[ https://issues.apache.org/jira/browse/YARN-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115255#comment-14115255 ] Hudson commented on YARN-2406: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1854 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1854/]) YARN-2406. Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto. Contributed by Tsuyoshi OZAWA (jianhe: rev 7b3e27ab7393214e35a575bc9093100e94dd8c89) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/ApplicationAttemptStateData.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/EpochPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/ApplicationStateDataPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/ApplicationAttemptStateDataPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/Epoch.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/ApplicationStateData.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto Add CHANGES.txt for YARN-2406. (jianhe: rev 9d68445710feff9fda9ee69847beeaf3e99b85ef) * hadoop-yarn-project/CHANGES.txt Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto Key: YARN-2406 URL: https://issues.apache.org/jira/browse/YARN-2406 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Tsuyoshi OZAWA Fix For: 2.6.0 Attachments: YARN-2406.1.patch, YARN-2406.2.patch Today most recovery-related proto records are defined in yarn_server_resourcemanager_service_protos.proto, which is inside the YARN-API module. Since these records are internally used by the RM only, we can move them to the yarn_server_resourcemanager_recovery.proto file inside the RM-server module. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Moved] (YARN-2471) DEFAULT_YARN_APPLICATION_CLASSPATH doesn't honor hadoop-layout.sh
[ https://issues.apache.org/jira/browse/YARN-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved HADOOP-11024 to YARN-2471: - Key: YARN-2471 (was: HADOOP-11024) Project: Hadoop YARN (was: Hadoop Common) DEFAULT_YARN_APPLICATION_CLASSPATH doesn't honor hadoop-layout.sh - Key: YARN-2471 URL: https://issues.apache.org/jira/browse/YARN-2471 Project: Hadoop YARN Issue Type: Bug Reporter: Allen Wittenauer In 0.21, hadoop-layout.sh was introduced to allow vendors to reorganize the Hadoop distribution in a way that pleases them. DEFAULT_YARN_APPLICATION_CLASSPATH hard-codes the paths that hadoop-layout.sh was meant to override. -- This message was sent by Atlassian JIRA (v6.2#6252)
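For reference, the constant in question has roughly the following shape (paraphrased, not a verbatim copy of YarnConfiguration; exact entries vary by release): the share/hadoop/* layout is baked into the strings, so a layout overridden in hadoop-layout.sh is never consulted.
{code}
public class ClasspathDefaults {
  // Paraphrased shape of DEFAULT_YARN_APPLICATION_CLASSPATH: each entry
  // hard-codes the stock directory layout rather than reading it from
  // hadoop-layout.sh.
  public static final String[] DEFAULT_YARN_APPLICATION_CLASSPATH = {
      "$HADOOP_CONF_DIR",
      "$HADOOP_COMMON_HOME/share/hadoop/common/*",
      "$HADOOP_COMMON_HOME/share/hadoop/common/lib/*",
      "$HADOOP_HDFS_HOME/share/hadoop/hdfs/*",
      "$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*",
      "$HADOOP_YARN_HOME/share/hadoop/yarn/*",
      "$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*"
  };
}
{code}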
[jira] [Commented] (YARN-2470) A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager to crash. Slider needs this value to be high. Setting a very high value throws an exception a
[ https://issues.apache.org/jira/browse/YARN-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115324#comment-14115324 ] Shivaji Dutta commented on YARN-2470: - 2014-08-27 23:37:30,566 INFO service.AbstractService (AbstractService.java:noteFailure(272)) - Service org.apache.hadoop.yarn.server.nodemanager.DeletionService failed in state INITED; cause: java.lang.NumberFormatException: For input string: 36 java.lang.NumberFormatException: For input string: 36 at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Integer.parseInt(Integer.java:495) at java.lang.Integer.parseInt(Integer.java:527) at org.apache.hadoop.conf.Configuration.getInt(Configuration.java:1094) at org.apache.hadoop.yarn.server.nodemanager.DeletionService.serviceInit(DeletionService.java:105) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:186) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:357) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:404) A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager to crash. Slider needs this value to be high. Setting a very high value throws an exception and nodemanager does not start -- Key: YARN-2470 URL: https://issues.apache.org/jira/browse/YARN-2470 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.4.1 Reporter: Shivaji Dutta Priority: Minor -- This message was sent by Atlassian JIRA (v6.2#6252)
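The root cause is visible in the trace: Configuration.getInt parses the value with Integer.parseInt, which rejects anything above Integer.MAX_VALUE (2147483647, roughly 68 years in seconds). A runnable sketch, with a hypothetical oversized value standing in for the one truncated in the log above:
{code}
public class DelayOverflowDemo {
  public static void main(String[] args) {
    String delay = "3600000000"; // hypothetical value > Integer.MAX_VALUE
    try {
      int seconds = Integer.parseInt(delay); // what Configuration.getInt does
      System.out.println("parsed: " + seconds);
    } catch (NumberFormatException e) {
      // Mirrors the NM failure: DeletionService never leaves serviceInit().
      System.out.println("rejected: " + e.getMessage());
    }
    // The same string parses fine as a long, which is why reading the
    // property with getLong would tolerate such values.
    System.out.println("as long: " + Long.parseLong(delay));
  }
}
{code}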
[jira] [Commented] (YARN-2470) A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager to crash. Slider needs this value to be high. Setting a very high value throws an exception a
[ https://issues.apache.org/jira/browse/YARN-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115326#comment-14115326 ] Shivaji Dutta commented on YARN-2470: - The number is obscenely high, since I was experimenting with it. I used Ambari to set this value. Ambari should have at least given me a warning for this. A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager to crash. Slider needs this value to be high. Setting a very high value throws an exception and nodemanager does not start -- Key: YARN-2470 URL: https://issues.apache.org/jira/browse/YARN-2470 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.4.1 Reporter: Shivaji Dutta Priority: Minor -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2470) A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager to crash. Slider needs this value to be high. Setting a very high value throws an exception a
[ https://issues.apache.org/jira/browse/YARN-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115334#comment-14115334 ] Shivaji Dutta commented on YARN-2470: - I have filed an Ambari issue for validating the field - AMBARI-7082. A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager to crash. Slider needs this value to be high. Setting a very high value throws an exception and nodemanager does not start -- Key: YARN-2470 URL: https://issues.apache.org/jira/browse/YARN-2470 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.4.1 Reporter: Shivaji Dutta Priority: Minor -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-913) Add a way to register long-lived services in a YARN cluster
[ https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-913: Component/s: resourcemanager Target Version/s: 2.6.0 Affects Version/s: (was: 3.0.0) 2.5.0 2.4.1 Assignee: Steve Loughran (was: Robert Joseph Evans) Add a way to register long-lived services in a YARN cluster --- Key: YARN-913 URL: https://issues.apache.org/jira/browse/YARN-913 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Affects Versions: 2.5.0, 2.4.1 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: RegistrationServiceDetails.txt In a YARN cluster you can't predict where services will come up, or on what ports. The services need to work those things out as they come up and then publish them somewhere. Applications need to be able to find the service instance they are to bond to, and not any others in the cluster. Some kind of service registry (in the RM, in ZK) could do this. If the RM held the write access to the ZK nodes, it would be more secure than having apps register with ZK themselves. -- This message was sent by Atlassian JIRA (v6.2#6252)
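A minimal sketch of the registry idea using the plain ZooKeeper client, assuming the parent path already exists and with a hypothetical path layout (in the proposal the RM would own the writes; here the app writes directly, purely for illustration):
{code}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ServiceRegistrySketch {
  public static void main(String[] args) throws Exception {
    ZooKeeper zk = new ZooKeeper("zkhost:2181", 15000, null);
    // Hypothetical layout: /services/<user>/<service-instance>; the parent
    // znodes are assumed to exist.
    String path = "/services/user1/hbase-on-yarn";
    byte[] endpoint = "host1.example.com:48123".getBytes("UTF-8");
    // Ephemeral: the record disappears if the registering process dies,
    // which is one way to cope with services moving between hosts/ports.
    zk.create(path, endpoint, ZooDefs.Ids.OPEN_ACL_UNSAFE,
        CreateMode.EPHEMERAL);
    // Client side: resolve the instance it should bond to, and no other.
    byte[] resolved = zk.getData(path, false, null);
    System.out.println(new String(resolved, "UTF-8"));
    zk.close();
  }
}
{code}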
[jira] [Created] (YARN-2472) yarn-daemons.sh should just call yarn directly
Allen Wittenauer created YARN-2472: -- Summary: yarn-daemons.sh should just call yarn directly Key: YARN-2472 URL: https://issues.apache.org/jira/browse/YARN-2472 Project: Hadoop YARN Issue Type: Improvement Reporter: Allen Wittenauer There is little-to-no need for it to go through yarn-daemon.sh anymore. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2449) Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set
[ https://issues.apache.org/jira/browse/YARN-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115415#comment-14115415 ] Zhijie Shen commented on YARN-2449: --- +1, will commit the latter patch. Timelineserver returns invalid Delegation token in secure kerberos enabled cluster when hadoop.http.filter.initializers are not set --- Key: YARN-2449 URL: https://issues.apache.org/jira/browse/YARN-2449 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Environment: Security-enabled cluster with ATS also enabled and running, but no hadoop.http.filter.initializers set in core-site.xml Reporter: Karam Singh Assignee: Varun Vasudev Priority: Critical Attachments: apache-yarn-2449.0.patch, apache-yarn-2449.1.patch This looks like a regression from YARN-2397: after it, when no hadoop.http.filter.initializers is set, trying to fetch a DELEGATION token from the timelineserver returns an invalid token. Tried to fetch a timeline delegation token by using curl commands: {code} 1. curl -i -k -s -b 'timeline-cookie.txt' 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=hrt_qa' Or 2. curl -i -k -s --negotiate -u : 'http://atshost:8188/ws/v1/timeline?op=GETDELEGATIONTOKEN&renewer=test_user' {code} The response for both queries is: {code} {"About":"Timeline API"} {code} Whereas before YARN-2397, or with hadoop.http.filter.initializers set to TimelineAuthenticationFilterInitializer or AuthenticationFilterInitializer, the first query returned a DT and the second used to fail. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2406) Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto
[ https://issues.apache.org/jira/browse/YARN-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115427#comment-14115427 ] Hudson commented on YARN-2406: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1880 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1880/]) YARN-2406. Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto. Contributed by Tsuyoshi OZAWA (jianhe: rev 7b3e27ab7393214e35a575bc9093100e94dd8c89) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/ApplicationStateDataPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/ApplicationAttemptStateDataPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/ApplicationAttemptStateData.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/Epoch.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/EpochPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/ApplicationStateData.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java Add CHANGES.txt for YARN-2406. (jianhe: rev 9d68445710feff9fda9ee69847beeaf3e99b85ef) * hadoop-yarn-project/CHANGES.txt Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto Key: YARN-2406 URL: https://issues.apache.org/jira/browse/YARN-2406 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Tsuyoshi OZAWA Fix For: 2.6.0 Attachments: YARN-2406.1.patch, YARN-2406.2.patch Today most recovery-related proto records are defined in yarn_server_resourcemanager_service_protos.proto, which is inside the YARN-API module. Since these records are internally used by the RM only, we can move them to the yarn_server_resourcemanager_recovery.proto file inside the RM-server module. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2405) NPE in FairSchedulerAppsBlock
[ https://issues.apache.org/jira/browse/YARN-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115429#comment-14115429 ] Hudson commented on YARN-2405: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1880 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1880/]) YARN-2405. NPE in FairSchedulerAppsBlock. (Tsuyoshi Ozawa via kasha) (kasha: rev fa80ca49bdd741823ff012ddbd7a0f1aecf26195) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerAppsBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebAppFairScheduler.java NPE in FairSchedulerAppsBlock - Key: YARN-2405 URL: https://issues.apache.org/jira/browse/YARN-2405 Project: Hadoop YARN Issue Type: Bug Reporter: Maysam Yabandeh Assignee: Tsuyoshi OZAWA Fix For: 2.6.0 Attachments: YARN-2405.1.patch, YARN-2405.2.patch, YARN-2405.3.patch, YARN-2405.4.patch FairSchedulerAppsBlock#render throws an NPE at this line: {code} int fairShare = fsinfo.getAppFairShare(attemptId); {code} This causes the scheduler page to not show the apps, since it lacks the definition of appsTableData: {code} Uncaught ReferenceError: appsTableData is not defined {code} The problem is temporary, meaning it usually resolves by itself, either after a retry or after a few hours. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic
[ https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115447#comment-14115447 ] Carlo Curino commented on YARN-1707: [~jianhe] that is expected. As I was saying in one of the [early comments | https://issues.apache.org/jira/browse/YARN-1707?focusedCommentId=14075076&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14075076], we are cutting YARN-1051 into several smaller patches for ease of reviewing, but we are not trying to make each patch work standalone (too many dependencies, and a bit of a waste of time, as they will not be valuable independently). So the fact that it doesn't compile is expected. We mark them as patch available to signal they are ready to be reviewed. [~wangda]: we have implemented the getDisplayName alternative I mentioned above, and we are in the process of testing it. We will post an updated patch soon (again, not necessarily a stand-alone one). Thanks again to both of you for quick rounds of review and insightful comments. Making the CapacityScheduler more dynamic - Key: YARN-1707 URL: https://issues.apache.org/jira/browse/YARN-1707 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Carlo Curino Assignee: Carlo Curino Labels: capacity-scheduler Attachments: YARN-1707.2.patch, YARN-1707.3.patch, YARN-1707.4.patch, YARN-1707.patch The CapacityScheduler is rather static at the moment, and refreshqueue provides a rather heavy-handed way to reconfigure it. Moving towards long-running services (tracked in YARN-896) and to enable more advanced admission control and resource parcelling, we need to make the CapacityScheduler more dynamic. This is instrumental to the umbrella jira YARN-1051. Concretely this requires the following changes: * create queues dynamically * destroy queues dynamically * dynamically change queue parameters (e.g., capacity) * modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% instead of == 100% We limit this to LeafQueues. -- This message was sent by Atlassian JIRA (v6.2#6252)
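A small sketch of the relaxed check in the last bullet, under assumed names (this is not CapacityScheduler code): with dynamically created and destroyed children, a parent's children may legitimately sum to less than 100% capacity, so only oversubscription is rejected.
{code}
public class CapacityValidation {
  static final float EPSILON = 1e-5f;

  static void validate(float[] childCapacities) {
    float sum = 0f;
    for (float c : childCapacities) {
      sum += c;
    }
    // Old rule: Math.abs(sum - 100f) > EPSILON was a configuration error.
    // Relaxed rule: reject only when the children oversubscribe the parent.
    if (sum > 100f + EPSILON) {
      throw new IllegalArgumentException(
          "child capacities sum to " + sum + "%, which exceeds 100%");
    }
  }
}
{code}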
[jira] [Created] (YARN-2473) YARN never cleans up container directories from a full disk
Jason Lowe created YARN-2473: Summary: YARN never cleans up container directories from a full disk Key: YARN-2473 URL: https://issues.apache.org/jira/browse/YARN-2473 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.4.0 Reporter: Jason Lowe Priority: Blocker After YARN-1781 when a container ends up filling a local disk the nodemanager will mark it as a bad disk and remove it from the list of good local dirs. When the container eventually completes the files that filled the disk will not be removed because the NM thinks the directory is bad. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1781) NM should allow users to specify max disk utilization for local disks
[ https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115534#comment-14115534 ] Jason Lowe commented on YARN-1781: -- We've run into situations where this new behavior results in disks that are filled by containers remaining full and never recovering. See YARN-2473. YARN-90 won't help much in this case because the files that filled the disk won't be deleted. Prior to this change the disks would auto-recover when the container completed, so this is a significant regression. NM should allow users to specify max disk utilization for local disks - Key: YARN-1781 URL: https://issues.apache.org/jira/browse/YARN-1781 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.4.0 Attachments: apache-yarn-1781.0.patch, apache-yarn-1781.1.patch, apache-yarn-1781.2.patch, apache-yarn-1781.3.patch, apache-yarn-1781.4.patch This is related to YARN-257 (it's probably a sub-task?). Currently, the NM does not detect full disks and allows full disks to be used by containers, leading to repeated failures. YARN-257 deals with graceful handling of full disks; this ticket is only about detection of full disks by the disk health checkers. The NM should allow users to set a maximum disk utilization for local disks and mark disks as bad once they exceed that utilization. At the very least, the NM should detect full disks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1506: -- Attachment: YARN-1506-v17.patch Replace set resource change on RMNode/SchedulerNode directly with event notification. - Key: YARN-1506 URL: https://issues.apache.org/jira/browse/YARN-1506 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, scheduler Reporter: Junping Du Assignee: Junping Du Attachments: YARN-1506-v1.patch, YARN-1506-v10.patch, YARN-1506-v11.patch, YARN-1506-v12.patch, YARN-1506-v13.patch, YARN-1506-v14.patch, YARN-1506-v15.patch, YARN-1506-v16.patch, YARN-1506-v17.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch, YARN-1506-v7.patch, YARN-1506-v8.patch, YARN-1506-v9.patch According to Vinod's comments on YARN-312 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087), we should replace RMNode.setResourceOption() with some resource change event. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2473) YARN never cleans up container directories from a full disk
[ https://issues.apache.org/jira/browse/YARN-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115540#comment-14115540 ] Varun Vasudev commented on YARN-2473: - [~jlowe] are you going to work on this? I can take it up if it's fine by you. YARN never cleans up container directories from a full disk --- Key: YARN-2473 URL: https://issues.apache.org/jira/browse/YARN-2473 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.4.0 Reporter: Jason Lowe Priority: Blocker After YARN-1781 when a container ends up filling a local disk the nodemanager will mark it as a bad disk and remove it from the list of good local dirs. When the container eventually completes the files that filled the disk will not be removed because the NM thinks the directory is bad. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115543#comment-14115543 ] Jian He commented on YARN-1506: --- bq. In AdminService: we may updateNodeResource only if node resource changes? I think this may not be accurate, as the previous update event may still be in transit. Updated the patch myself with this change reverted. Replace set resource change on RMNode/SchedulerNode directly with event notification. - Key: YARN-1506 URL: https://issues.apache.org/jira/browse/YARN-1506 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, scheduler Reporter: Junping Du Assignee: Junping Du Attachments: YARN-1506-v1.patch, YARN-1506-v10.patch, YARN-1506-v11.patch, YARN-1506-v12.patch, YARN-1506-v13.patch, YARN-1506-v14.patch, YARN-1506-v15.patch, YARN-1506-v16.patch, YARN-1506-v17.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch, YARN-1506-v7.patch, YARN-1506-v8.patch, YARN-1506-v9.patch According to Vinod's comments on YARN-312 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087), we should replace RMNode.setResourceOption() with some resource change event. -- This message was sent by Atlassian JIRA (v6.2#6252)
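A sketch of the event-notification shape under discussion, with illustrative names (not the YARN-1506 patch): the admin path only dispatches, and per the comment above it must not filter "unchanged" values, since an earlier update event may still be in transit.
{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ResourceUpdateSketch {
  static class NodeResourceUpdateEvent {
    final String nodeId;
    final int memoryMB;
    final int vcores;
    NodeResourceUpdateEvent(String nodeId, int memoryMB, int vcores) {
      this.nodeId = nodeId;
      this.memoryMB = memoryMB;
      this.vcores = vcores;
    }
  }

  // Stand-in for the RM's async dispatcher.
  final BlockingQueue<NodeResourceUpdateEvent> dispatcher =
      new LinkedBlockingQueue<NodeResourceUpdateEvent>();

  // Admin side: always enqueue; comparing against the node's current
  // resource here would race with update events still in flight.
  void updateNodeResource(String nodeId, int memoryMB, int vcores) {
    dispatcher.add(new NodeResourceUpdateEvent(nodeId, memoryMB, vcores));
  }

  // Node side: the resource is mutated only by the event handler.
  void handle(NodeResourceUpdateEvent event) {
    // apply event.memoryMB / event.vcores to the RMNode/SchedulerNode
  }
}
{code}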
[jira] [Commented] (YARN-2473) YARN never cleans up container directories from a full disk
[ https://issues.apache.org/jira/browse/YARN-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115556#comment-14115556 ] Varun Vasudev commented on YARN-2473: - My apologies for missing this when I put up the patch for YARN-1781. YARN never cleans up container directories from a full disk --- Key: YARN-2473 URL: https://issues.apache.org/jira/browse/YARN-2473 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.4.0 Reporter: Jason Lowe Priority: Blocker After YARN-1781 when a container ends up filling a local disk the nodemanager will mark it as a bad disk and remove it from the list of good local dirs. When the container eventually completes the files that filled the disk will not be removed because the NM thinks the directory is bad. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2473) YARN never cleans up container directories from a full disk
[ https://issues.apache.org/jira/browse/YARN-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115580#comment-14115580 ] Jason Lowe commented on YARN-2473: -- No worries, Varun, we all missed it. ;-) We may need to track full disks separately from bad disks so we can know whether or not it's OK to try to delete a container directory from a particular disk that isn't a known good disk. I'm hesitant to have the NM try to remove container directories even from bad disks since touching them can cause a very long pause for the thread that did it. YARN never cleans up container directories from a full disk --- Key: YARN-2473 URL: https://issues.apache.org/jira/browse/YARN-2473 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.4.0 Reporter: Jason Lowe Priority: Blocker After YARN-1781 when a container ends up filling a local disk the nodemanager will mark it as a bad disk and remove it from the list of good local dirs. When the container eventually completes the files that filled the disk will not be removed because the NM thinks the directory is bad. -- This message was sent by Atlassian JIRA (v6.2#6252)
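A sketch of the separation suggested above, under assumed names: track full directories apart from errored ones, so cleanup can still delete from a full disk (which is exactly how it recovers) without touching a disk that failed its health check.
{code}
import java.util.HashSet;
import java.util.Set;

public class DirStateSketch {
  final Set<String> goodDirs = new HashSet<String>();
  final Set<String> fullDirs = new HashSet<String>(); // over the utilization cap
  final Set<String> badDirs = new HashSet<String>();  // failed health checks

  boolean okToDeleteFrom(String dir) {
    // Deleting from a full disk frees it; deleting from a bad disk risks
    // hanging the deletion thread on a faulty device, as noted above.
    return goodDirs.contains(dir) || fullDirs.contains(dir);
  }
}
{code}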
[jira] [Assigned] (YARN-2473) YARN never cleans up container directories from a full disk
[ https://issues.apache.org/jira/browse/YARN-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reassigned YARN-2473: Assignee: Varun Vasudev YARN never cleans up container directories from a full disk --- Key: YARN-2473 URL: https://issues.apache.org/jira/browse/YARN-2473 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.4.0 Reporter: Jason Lowe Assignee: Varun Vasudev Priority: Blocker After YARN-1781 when a container ends up filling a local disk the nodemanager will mark it as a bad disk and remove it from the list of good local dirs. When the container eventually completes the files that filled the disk will not be removed because the NM thinks the directory is bad. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2470) A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager to crash. Slider needs this value to be high. Setting a very high value throws an exception a
[ https://issues.apache.org/jira/browse/YARN-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115581#comment-14115581 ] Chris Douglas commented on YARN-2470: - Failing to start is the correct behavior; that timeout is not valid. Is your intent to disable cleanup entirely? A high value for yarn.nodemanager.delete.debug-delay-sec causes Nodemanager to crash. Slider needs this value to be high. Setting a very high value throws an exception and nodemanager does not start -- Key: YARN-2470 URL: https://issues.apache.org/jira/browse/YARN-2470 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.4.1 Reporter: Shivaji Dutta Priority: Minor -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled
[ https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115586#comment-14115586 ] Hadoop QA commented on YARN-2459: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12665124/YARN-2459.3.patch against trunk revision 4bd0194. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4771//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4771//console This message is automatically generated. RM crashes if App gets rejected for any reason and HA is enabled Key: YARN-2459 URL: https://issues.apache.org/jira/browse/YARN-2459 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch If RM HA is enabled and the ZooKeeper store is used for the RM state store: if for any reason an app gets rejected and goes directly from NEW to FAILED, the final transition adds it to the RMApps and completed-apps memory structures, but it never makes it to the state store. Now when the RMApps default limit is reached, the RM starts deleting apps from memory and the store. In that case it tries to delete this app from the store and fails, which causes the RM to crash. Thanks, Mayank -- This message was sent by Atlassian JIRA (v6.2#6252)
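One illustrative way to avoid the crash described in this issue, sketched under assumed names (the actual YARN-2459 patch may take a different approach): remember which applications actually reached the state store, and make removal a no-op for apps that were never written.
{code}
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class StoreRemovalGuard {
  final Set<String> storedApps =
      Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

  void onAppStored(String appId) {
    storedApps.add(appId);
  }

  void removeApplication(String appId) {
    // Apps rejected straight from NEW to FAILED were never persisted, so
    // there is nothing to delete and no reason to bring the RM down.
    if (!storedApps.remove(appId)) {
      return;
    }
    // ... delete the app's znode from the ZK state store ...
  }
}
{code}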
[jira] [Commented] (YARN-2450) Fix typos in log messages
[ https://issues.apache.org/jira/browse/YARN-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115606#comment-14115606 ] Hitesh Shah commented on YARN-2450: --- +1. Committing shortly. Fix typos in log messages - Key: YARN-2450 URL: https://issues.apache.org/jira/browse/YARN-2450 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.0 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Trivial Labels: newbie Attachments: YARN-2450-01.patch There are a bunch of typos in log messages. HADOOP-10946 was initially created, but may have failed due to being in multiple components. Try fixing typos on a per-component basis. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2447) RM web services app submission doesn't pass secrets correctly
[ https://issues.apache.org/jira/browse/YARN-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115630#comment-14115630 ] Jian He commented on YARN-2447: --- looks good. committing RM web services app submission doesn't pass secrets correctly - Key: YARN-2447 URL: https://issues.apache.org/jira/browse/YARN-2447 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2447.0.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115637#comment-14115637 ] Hadoop QA commented on YARN-1506: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12665373/YARN-1506-v17.patch against trunk revision 4bd0194. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4772//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4772//console This message is automatically generated. Replace set resource change on RMNode/SchedulerNode directly with event notification. - Key: YARN-1506 URL: https://issues.apache.org/jira/browse/YARN-1506 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, scheduler Reporter: Junping Du Assignee: Junping Du Attachments: YARN-1506-v1.patch, YARN-1506-v10.patch, YARN-1506-v11.patch, YARN-1506-v12.patch, YARN-1506-v13.patch, YARN-1506-v14.patch, YARN-1506-v15.patch, YARN-1506-v16.patch, YARN-1506-v17.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch, YARN-1506-v7.patch, YARN-1506-v8.patch, YARN-1506-v9.patch According to Vinod's comments on YARN-312 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087), we should replace RMNode.setResourceOption() with some resource change event. -- This message was sent by Atlassian JIRA (v6.2#6252)
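For context on what an event-based resource change can look like, below is a hedged sketch of a dispatcher event that would replace direct RMNode.setResourceOption() calls. The class name and the RESOURCE_UPDATE enum constant are assumptions about the patch, not verified against the v17 attachment.
{code}
import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.api.records.ResourceOption;
import org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEvent;
import org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType;

// Hedged sketch: instead of callers mutating the RMNode directly, they
// dispatch an event that the RMNode state machine handles, so the change
// flows through the normal event-driven path.
public class RMNodeResourceUpdateEvent extends RMNodeEvent {
  private final ResourceOption resourceOption;

  public RMNodeResourceUpdateEvent(NodeId nodeId, ResourceOption resourceOption) {
    // RESOURCE_UPDATE is assumed to be added to RMNodeEventType by the patch.
    super(nodeId, RMNodeEventType.RESOURCE_UPDATE);
    this.resourceOption = resourceOption;
  }

  public ResourceOption getResourceOption() {
    return resourceOption;
  }
}
{code}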
[jira] [Commented] (YARN-2447) RM web services app submission doesn't pass secrets correctly
[ https://issues.apache.org/jira/browse/YARN-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115663#comment-14115663 ] Hadoop QA commented on YARN-2447: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12664088/apache-yarn-2447.0.patch against trunk revision 4bd0194. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4773//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4773//console This message is automatically generated. RM web services app submission doesn't pass secrets correctly - Key: YARN-2447 URL: https://issues.apache.org/jira/browse/YARN-2447 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.6.0 Attachments: apache-yarn-2447.0.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2450) Fix typos in log messages
[ https://issues.apache.org/jira/browse/YARN-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115688#comment-14115688 ] Ray Chiang commented on YARN-2450: -- Great. Thanks! Fix typos in log messages - Key: YARN-2450 URL: https://issues.apache.org/jira/browse/YARN-2450 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.0 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Trivial Labels: newbie Fix For: 2.6.0 Attachments: YARN-2450-01.patch There are a bunch of typos in log messages. HADOOP-10946 was initially created, but may have failed due to being in multiple components. Try fixing typos on a per-component basis. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2395) FairScheduler: Preemption timeout should be configurable per queue
[ https://issues.apache.org/jira/browse/YARN-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2395: -- Attachment: YARN-2395-4.patch Updated the patch to address the backward compatibility issue. FairScheduler: Preemption timeout should be configurable per queue -- Key: YARN-2395 URL: https://issues.apache.org/jira/browse/YARN-2395 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Ashwin Shankar Assignee: Wei Yan Attachments: YARN-2395-1.patch, YARN-2395-2.patch, YARN-2395-3.patch, YARN-2395-3.patch, YARN-2395-4.patch Currently in fair scheduler, the preemption logic considers fair share starvation only at leaf queue level. This jira is created to implement it at the parent queue as well. It involves : 1. Making check for fair share starvation and amount of resource to preempt recursive such that they traverse the queue hierarchy from root to leaf. 2. Currently fairSharePreemptionTimeout is a global config. We could make it configurable on a per queue basis,so that we can specify different timeouts for parent queues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2462) TestNodeManagerResync#testBlockNewContainerRequestsOnStartAndResync should have a test timeout
[ https://issues.apache.org/jira/browse/YARN-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115736#comment-14115736 ] Jason Lowe commented on YARN-2462: -- +1 lgtm. Committing this. TestNodeManagerResync#testBlockNewContainerRequestsOnStartAndResync should have a test timeout -- Key: YARN-2462 URL: https://issues.apache.org/jira/browse/YARN-2462 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Eric Payne Labels: newbie Attachments: YARN-2462.201408281422.txt, YARN-2462.201408281427.txt TestNodeManagerResync#testBlockNewContainerRequestsOnStartAndResync can hang indefinitely and should have a test timeout. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
[ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-2198: --- Attachment: YARN-2198.3.patch This 3.patch addresses the code review feedback. It also adds the separate etc/hadoop/wsce-site.xml configuration for winutils (the location and file name are configured from hadoop-common's pom.xml). While at it, I fixed winutils/libwinutils to use 'target/winutils' as the intermediate build path, removed the hardcoded '../../../target/bin' output path and the like, and instead use msbuild params passed from pom.xml. Remove the need to run NodeManager as privileged account for Windows Secure Container Executor -- Key: YARN-2198 URL: https://issues.apache.org/jira/browse/YARN-2198 Project: Hadoop YARN Issue Type: Improvement Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, YARN-2198.separation.patch YARN-1972 introduces a Secure Windows Container Executor. However, this executor requires the process launching the container to be LocalSystem or a member of the local Administrators group. Since the process in question is the NodeManager, the requirement translates to the entire NM running as a privileged account, a very large surface area to review and protect. This proposal is to move the privileged operations into a dedicated NT service. The NM can run as a low-privilege account and communicate with the privileged NT service when it needs to launch a container. This would reduce the surface exposed to the high privileges. There has to exist a secure, authenticated and authorized channel of communication between the NM and the privileged NT service. Possible alternatives are a new TCP endpoint, Java RPC, etc. My proposal though would be to use Windows LPC (Local Procedure Calls), which is a Windows platform specific inter-process communication channel that satisfies all requirements and is easy to deploy. The privileged NT service would register and listen on an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop with libwinutils, which would host the LPC client code. The client would connect to the LPC port (NtConnectPort) and send a message requesting a container launch (NtRequestWaitReplyPort). LPC provides authentication, and the privileged NT service can use the authorization API (AuthZ) to validate the caller. -- This message was sent by Atlassian JIRA (v6.2#6252)
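Purely as an illustration of the NM-side interop described above, here is a hypothetical JNI stub for asking the privileged NT service to launch a container over LPC. All names here are invented; the real client code lives in libwinutils and the patch's native sources.
{code}
import java.io.IOException;

// Illustrative only: a hypothetical JNI surface for the NM side of the
// LPC channel. The native implementation (assumed to live in libwinutils)
// would do the NtConnectPort/NtRequestWaitReplyPort calls.
public final class PrivilegedExecutorClient {
  static {
    System.loadLibrary("winutils"); // native library name assumed
  }

  /**
   * Connects to the privileged NT service's LPC port and asks it to launch
   * a container process as the given user; returns a process handle/PID.
   */
  public static native long launchContainerAsUser(
      String user, String containerId, String commandLine, String workDir)
      throws IOException;

  private PrivilegedExecutorClient() {}
}
{code}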
[jira] [Created] (YARN-2474) document the wsce-site.xml keys in hadoop-yarn-site/src/site/apt/SecureContainer.apt.vm
Remus Rusanu created YARN-2474: -- Summary: document the wsce-site.xml keys in hadoop-yarn-site/src/site/apt/SecureContainer.apt.vm Key: YARN-2474 URL: https://issues.apache.org/jira/browse/YARN-2474 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Remus Rusanu Assignee: Remus Rusanu Priority: Critical document the keys used to configure WSCE -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2395) FairScheduler: Preemption timeout should be configurable per queue
[ https://issues.apache.org/jira/browse/YARN-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115797#comment-14115797 ] Hadoop QA commented on YARN-2395: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12665397/YARN-2395-4.patch against trunk revision b1dce2a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4774//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4774//console This message is automatically generated. FairScheduler: Preemption timeout should be configurable per queue -- Key: YARN-2395 URL: https://issues.apache.org/jira/browse/YARN-2395 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Ashwin Shankar Assignee: Wei Yan Attachments: YARN-2395-1.patch, YARN-2395-2.patch, YARN-2395-3.patch, YARN-2395-3.patch, YARN-2395-4.patch Currently in fair scheduler, the preemption logic considers fair share starvation only at leaf queue level. This jira is created to implement it at the parent queue as well. It involves : 1. Making check for fair share starvation and amount of resource to preempt recursive such that they traverse the queue hierarchy from root to leaf. 2. Currently fairSharePreemptionTimeout is a global config. We could make it configurable on a per queue basis,so that we can specify different timeouts for parent queues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1709) Admission Control: Reservation subsystem
[ https://issues.apache.org/jira/browse/YARN-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-1709: --- Attachment: YARN-1709.patch Updating the patch as a result of API changes based on [~vinodkv]'s [feedback|https://issues.apache.org/jira/browse/YARN-1708?focusedCommentId=14112669] on YARN-1708. Admission Control: Reservation subsystem Key: YARN-1709 URL: https://issues.apache.org/jira/browse/YARN-1709 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Subramaniam Krishnan Attachments: YARN-1709.patch, YARN-1709.patch, YARN-1709.patch This JIRA is about the key data structure used to track resources over time to enable YARN-1051. The Reservation subsystem is conceptually a plan of how the scheduler will allocate resources over time. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2360: --- Attachment: yarn-2360-6.patch Patch looks good to me. Uploading a patch with minor language changes. [~wei.yan] - does that look okay to you? Fair Scheduler : Display dynamic fair share for queues on the scheduler page Key: YARN-2360 URL: https://issues.apache.org/jira/browse/YARN-2360 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Ashwin Shankar Assignee: Ashwin Shankar Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, Screen_Shot_v3.png, Screen_Shot_v4.png, Screen_Shot_v5.png, YARN-2360-v1.txt, YARN-2360-v2.txt, YARN-2360-v3.patch, YARN-2360-v4.patch, YARN-2360-v5.patch, yarn-2360-6.patch Based on the discussion in YARN-2026, we'd like to display dynamic fair share for queues on the scheduler page. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2395) FairScheduler: Preemption timeout should be configurable per queue
[ https://issues.apache.org/jira/browse/YARN-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115826#comment-14115826 ] Karthik Kambatla commented on YARN-2395: Good catch, Ashwin. I missed the backward incompatibility issue. FairScheduler: Preemption timeout should be configurable per queue -- Key: YARN-2395 URL: https://issues.apache.org/jira/browse/YARN-2395 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Ashwin Shankar Assignee: Wei Yan Attachments: YARN-2395-1.patch, YARN-2395-2.patch, YARN-2395-3.patch, YARN-2395-3.patch, YARN-2395-4.patch Currently in fair scheduler, the preemption logic considers fair share starvation only at leaf queue level. This jira is created to implement it at the parent queue as well. It involves : 1. Making check for fair share starvation and amount of resource to preempt recursive such that they traverse the queue hierarchy from root to leaf. 2. Currently fairSharePreemptionTimeout is a global config. We could make it configurable on a per queue basis,so that we can specify different timeouts for parent queues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115830#comment-14115830 ] Wei Yan commented on YARN-2360: --- Thanks, Karthik. LGTM. Fair Scheduler : Display dynamic fair share for queues on the scheduler page Key: YARN-2360 URL: https://issues.apache.org/jira/browse/YARN-2360 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Ashwin Shankar Assignee: Ashwin Shankar Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, Screen_Shot_v3.png, Screen_Shot_v4.png, Screen_Shot_v5.png, YARN-2360-v1.txt, YARN-2360-v2.txt, YARN-2360-v3.patch, YARN-2360-v4.patch, YARN-2360-v5.patch, yarn-2360-6.patch Based on the discussion in YARN-2026, we'd like to display dynamic fair share for queues on the scheduler page. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager
[ https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-2080: --- Attachment: YARN-2080.patch Uploading a new patch that adds a scheduler agnostic AbstractReservationSystem which is extended by the CapacityReservationSystem scheduler configuration as suggested by [~kasha]. CapacityReservationSystem essentially just loads configs from capacity scheduler xml. Attempted to converge this with Fair Scheduler as part of YARN-2386 but figured that it was not feasible. It has also minor changes as a result of API changes based on [~vinodkv] [feedback | https://issues.apache.org/jira/browse/YARN-1708?focusedCommentId=14112669] on YARN-1708. Admission Control: Integrate Reservation subsystem with ResourceManager --- Key: YARN-2080 URL: https://issues.apache.org/jira/browse/YARN-2080 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Subramaniam Krishnan Assignee: Subramaniam Krishnan Attachments: YARN-2080.patch, YARN-2080.patch, YARN-2080.patch This JIRA tracks the integration of Reservation subsystem data structures introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring of YARN-1051. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled
[ https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115851#comment-14115851 ] Vinod Kumar Vavilapalli commented on YARN-2459: --- Can we please add two more tests for future-proofing this? - Add one in TestRMRestart to get an app rejected and make sure that the final-status gets recorded - Another one in RMStateStoreTestBase to ensure it is okay to have an updateApp call without a storeApp call like in this case. RM crashes if App gets rejected for any reason and HA is enabled Key: YARN-2459 URL: https://issues.apache.org/jira/browse/YARN-2459 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch If RM HA is enabled with ZooKeeper as the RM state store, and an app gets rejected for any reason and goes directly from NEW to FAILED, the final transition adds the app to the RMApps and completed-apps in-memory structures, but it never makes it to the state store. When the default RMApps limit is reached, the RM starts deleting apps from both memory and the store; it then tries to delete this app from the store, fails, and crashes. Thanks, Mayank -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2395) FairScheduler: Preemption timeout should be configurable per queue
[ https://issues.apache.org/jira/browse/YARN-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115854#comment-14115854 ] Karthik Kambatla commented on YARN-2395: Thanks for quickly updating the patch, Wei. The patch looks mostly good, a couple of minor comments (sorry, I should have done a more thorough review earlier): # Instead of calling updatePreemptionTimeouts() in FairScheduler multiple times, we should probably call it in QueueManager#updateAllocationConfiguration once where we call recomputeSteadyShares(). # Can we augment the test (or add a new one) to verify we are not breaking backward compatibility with the preemptionTimeout defaults? FairScheduler: Preemption timeout should be configurable per queue -- Key: YARN-2395 URL: https://issues.apache.org/jira/browse/YARN-2395 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Ashwin Shankar Assignee: Wei Yan Attachments: YARN-2395-1.patch, YARN-2395-2.patch, YARN-2395-3.patch, YARN-2395-3.patch, YARN-2395-4.patch Currently in fair scheduler, the preemption logic considers fair share starvation only at leaf queue level. This jira is created to implement it at the parent queue as well. It involves : 1. Making check for fair share starvation and amount of resource to preempt recursive such that they traverse the queue hierarchy from root to leaf. 2. Currently fairSharePreemptionTimeout is a global config. We could make it configurable on a per queue basis,so that we can specify different timeouts for parent queues. -- This message was sent by Atlassian JIRA (v6.2#6252)
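A minimal, self-contained sketch of the single-pass propagation suggested above: after the allocation file is reloaded, one recursive walk (alongside recomputeSteadyShares()) lets queues without an explicit timeout inherit their parent's value. Queue is a simplified stand-in for FSQueue, and the field and method names are assumptions, not the committed code.
{code}
import java.util.ArrayList;
import java.util.List;

// Sketch only: propagate fairSharePreemptionTimeout down the queue tree
// in a single recursive pass, called once per allocation-file reload.
public class PreemptionTimeoutSketch {
  static class Queue {
    long fairSharePreemptionTimeout = -1; // -1: not set in the alloc file
    final List<Queue> children = new ArrayList<Queue>();
  }

  // Entry point: updatePreemptionTimeouts(root, clusterWideDefault).
  static void updatePreemptionTimeouts(Queue queue, long inherited) {
    if (queue.fairSharePreemptionTimeout < 0) {
      queue.fairSharePreemptionTimeout = inherited; // inherit from parent
    }
    for (Queue child : queue.children) {
      updatePreemptionTimeouts(child, queue.fairSharePreemptionTimeout);
    }
  }
}
{code}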
[jira] [Commented] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager
[ https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115860#comment-14115860 ] Subramaniam Krishnan commented on YARN-2080: Typo in previous comment. Read it as: Uploading a new patch that adds a scheduler agnostic AbstractReservationSystem which is extended by the CapacityReservationSystem for capacity scheduler as suggested by [~kasha]. CapacityReservationSystem essentially just loads configs from capacity scheduler xml. Attempted to converge this with Fair Scheduler as part of YARN-2386 but figured that it was not feasible. It has also minor changes as a result of API changes based on [~vinodkv] [feedback | https://issues.apache.org/jira/browse/YARN-1708?focusedCommentId=14112669] on YARN-1708. Admission Control: Integrate Reservation subsystem with ResourceManager --- Key: YARN-2080 URL: https://issues.apache.org/jira/browse/YARN-2080 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Subramaniam Krishnan Assignee: Subramaniam Krishnan Attachments: YARN-2080.patch, YARN-2080.patch, YARN-2080.patch This JIRA tracks the integration of Reservation subsystem data structures introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring of YARN-1051. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2385) Consider splitting getAppsinQueue to getRunningAppsInQueue + getPendingAppsInQueue
[ https://issues.apache.org/jira/browse/YARN-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115863#comment-14115863 ] Subramaniam Krishnan commented on YARN-2385: Thanks [~sunilg] for verifying. I am fine either way, i.e. whether you want to take up the splitting now or later, as we have currently ensured that the behavior of CS and FS is consistent for _getAppsInQueue_. [~leftnoteasy], [~zjshen] what do you guys feel? Consider splitting getAppsinQueue to getRunningAppsInQueue + getPendingAppsInQueue -- Key: YARN-2385 URL: https://issues.apache.org/jira/browse/YARN-2385 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler Reporter: Subramaniam Krishnan Labels: abstractyarnscheduler Currently getAppsinQueue returns both pending and running apps. The purpose of the JIRA is to explore splitting it to getRunningAppsInQueue + getPendingAppsInQueue, which will provide more flexibility to callers -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2395) FairScheduler: Preemption timeout should be configurable per queue
[ https://issues.apache.org/jira/browse/YARN-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2395: -- Attachment: YARN-2395-5.patch Updated the patch to address Karthik's comments. bq. Can we augment the test (or add a new one) to verify we are not breaking backward compatibility with the preemptionTimeout defaults? I have already added test cases in both TestAllocationFileLoaderService and TestFairScheduler. TestAllocationFileLoaderService.testBackwardsCompatibleAllocationFileParsing():
{code}
// Set fair share preemption timeout to 5 minutes
out.println("<fairSharePreemptionTimeout>300</fairSharePreemptionTimeout>");
out.println("</allocations>");
{code}
TestFairScheduler.testBackwardsCompatiblePreemptionConfiguration():
{code}
out.print("<defaultMinSharePreemptionTimeout>15</defaultMinSharePreemptionTimeout>");
out.print("<defaultFairSharePreemptionTimeout>25</defaultFairSharePreemptionTimeout>");
out.print("<fairSharePreemptionTimeout>30</fairSharePreemptionTimeout>");
out.println("</allocations>");
{code}
FairScheduler: Preemption timeout should be configurable per queue -- Key: YARN-2395 URL: https://issues.apache.org/jira/browse/YARN-2395 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Ashwin Shankar Assignee: Wei Yan Attachments: YARN-2395-1.patch, YARN-2395-2.patch, YARN-2395-3.patch, YARN-2395-3.patch, YARN-2395-4.patch, YARN-2395-5.patch Currently in fair scheduler, the preemption logic considers fair share starvation only at leaf queue level. This jira is created to implement it at the parent queue as well. It involves : 1. Making check for fair share starvation and amount of resource to preempt recursive such that they traverse the queue hierarchy from root to leaf. 2. Currently fairSharePreemptionTimeout is a global config. We could make it configurable on a per queue basis,so that we can specify different timeouts for parent queues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115914#comment-14115914 ] Hadoop QA commented on YARN-2360: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12665421/yarn-2360-6.patch against trunk revision b03653f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4776//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4776//console This message is automatically generated. Fair Scheduler : Display dynamic fair share for queues on the scheduler page Key: YARN-2360 URL: https://issues.apache.org/jira/browse/YARN-2360 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Ashwin Shankar Assignee: Ashwin Shankar Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, Screen_Shot_v3.png, Screen_Shot_v4.png, Screen_Shot_v5.png, YARN-2360-v1.txt, YARN-2360-v2.txt, YARN-2360-v3.patch, YARN-2360-v4.patch, YARN-2360-v5.patch, yarn-2360-6.patch Based on the discussion in YARN-2026, we'd like to display dynamic fair share for queues on the scheduler page. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1709) Admission Control: Reservation subsystem
[ https://issues.apache.org/jira/browse/YARN-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115913#comment-14115913 ] Hadoop QA commented on YARN-1709: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12665419/YARN-1709.patch against trunk revision b03653f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4775//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4775//console This message is automatically generated. Admission Control: Reservation subsystem Key: YARN-1709 URL: https://issues.apache.org/jira/browse/YARN-1709 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Subramaniam Krishnan Attachments: YARN-1709.patch, YARN-1709.patch, YARN-1709.patch This JIRA is about the key data structure used to track resources over time to enable YARN-1051. The Reservation subsystem is conceptually a plan of how the scheduler will allocate resources over-time. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager
[ https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115968#comment-14115968 ] Hadoop QA commented on YARN-2080: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12665427/YARN-2080.patch against trunk revision c60da4d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.TestRMContainerImpl org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4777//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4777//console This message is automatically generated. Admission Control: Integrate Reservation subsystem with ResourceManager --- Key: YARN-2080 URL: https://issues.apache.org/jira/browse/YARN-2080 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Subramaniam Krishnan Assignee: Subramaniam Krishnan Attachments: YARN-2080.patch, YARN-2080.patch, YARN-2080.patch This JIRA tracks the integration of Reservation subsystem data structures introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring of YARN-1051. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2395) FairScheduler: Preemption timeout should be configurable per queue
[ https://issues.apache.org/jira/browse/YARN-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115969#comment-14115969 ] Hadoop QA commented on YARN-2395: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12665433/YARN-2395-5.patch against trunk revision c60da4d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4778//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4778//console This message is automatically generated. FairScheduler: Preemption timeout should be configurable per queue -- Key: YARN-2395 URL: https://issues.apache.org/jira/browse/YARN-2395 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Ashwin Shankar Assignee: Wei Yan Attachments: YARN-2395-1.patch, YARN-2395-2.patch, YARN-2395-3.patch, YARN-2395-3.patch, YARN-2395-4.patch, YARN-2395-5.patch Currently in fair scheduler, the preemption logic considers fair share starvation only at leaf queue level. This jira is created to implement it at the parent queue as well. It involves : 1. Making check for fair share starvation and amount of resource to preempt recursive such that they traverse the queue hierarchy from root to leaf. 2. Currently fairSharePreemptionTimeout is a global config. We could make it configurable on a per queue basis,so that we can specify different timeouts for parent queues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic
[ https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115999#comment-14115999 ] Jian He commented on YARN-1707: --- Hi Carlo, thanks for your work! I looked at the patch; some comments and questions:
- to simplify, we can use the getNumApplications() method:
{code}
disposableLeafQueue.getApplications().size() > 0
    || disposableLeafQueue.pendingApplications.size() > 0
{code}
- PlanQueue.java: 80-column limit
- why “newQueue.changeCapacity(sesConf.getCapacity());” is inside the check and “queue.setMaxCapacity(sesConf.getMaxCapacity());” is outside the check
- CapacityScheduler#getReservationQueueNames seems to be getting the child reservation queues of the given plan queue. We can use planQueue#childQueues directly
- DynamicQueueConf, how about calling it QueueEntitlement to be consistent?
- CapacityScheduler#parseQueue method, I think we can simplify the condition for the isReservableQueue flag, something like this:
{code}
boolean isReservableQueue = conf.isReservableQueue(fullQueueName);
if (isReservableQueue) {
  ParentQueue parentQueue = new PlanQueue(csContext, queueName, parent,
      oldQueues.get(queueName));
  queue = hook.hook(parentQueue);
} else if ((childQueueNames == null || childQueueNames.length == 0))
{code}
- just to simplify, this log msg may be put after the previous “qiter.remove();” to avoid the removed boolean flag:
{code}
if (LOG.isDebugEnabled()) {
  LOG.debug("updateChildQueues (action: remove queue): " + removed + " "
      + getChildQueuesToPrint());
}
{code}
- we can add a new reinitialize in ReservationQueue which does all these initializations:
{code}
CSQueueUtils.updateQueueStatistics(
    schedulerContext.getResourceCalculator(), ses, this,
    schedulerContext.getClusterResource(),
    schedulerContext.getMinimumResourceCapability());
ses.reinitialize(ses, clusterResource);
((ReservationQueue) ses).setMaxApplications(this
    .getMaxApplicationsForReservations());
((ReservationQueue) ses).setMaxApplicationsPerUser(this
    .getMaxApplicationsPerUserForReservation());
{code}
- IIUC, right now, queueName here is for the planQueue (inherits ParentQueue), and the reservationID is for the reservationQueue (inherits from LeafQueue). I think if we can get the proper reservationQueueName (leaf queue) upfront and pass it as the queueName parameter into this method, we can avoid some if/else condition changes inside this method and the method signature:
{code}
private synchronized void addApplication(ApplicationId applicationId,
    String queueName, String user, boolean isAppRecovering,
    ReservationId reservationID)
{code}
Making the CapacityScheduler more dynamic - Key: YARN-1707 URL: https://issues.apache.org/jira/browse/YARN-1707 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Carlo Curino Assignee: Carlo Curino Labels: capacity-scheduler Attachments: YARN-1707.2.patch, YARN-1707.3.patch, YARN-1707.4.patch, YARN-1707.patch The CapacityScheduler is rather static at the moment, and refreshqueue provides a rather heavy-handed way to reconfigure it. Moving towards long-running services (tracked in YARN-896) and to enable more advanced admission control and resource parcelling, we need to make the CapacityScheduler more dynamic. This is instrumental to the umbrella jira YARN-1051.
Concretely this requires the following changes: * create queues dynamically * destroy queues dynamically * dynamically change queue parameters (e.g., capacity) * modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% instead of == 100% We limit this to LeafQueues. -- This message was sent by Atlassian JIRA (v6.2#6252)
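To illustrate the relaxed validation item in the list above, here is a self-contained sketch: children of a reservable (plan) queue may sum to less than 100% of the parent's capacity, since reservation queues are created and resized dynamically, while regular parents keep the exact-100% rule. Plain floats stand in for the CapacityScheduler's queue objects; the flag and method names are assumptions.
{code}
// Sketch of the relaxed refreshqueue validation: <= 100% for reservable
// (plan) parents, exactly 100% for everything else.
public class CapacityValidationSketch {
  static final float EPSILON = 1e-5f;

  static void validate(float[] childCapacities, boolean reservableParent) {
    float sum = 0f;
    for (float c : childCapacities) {
      sum += c;
    }
    boolean ok = reservableParent
        ? sum <= 1.0f + EPSILON                // sum(child.getCapacity()) <= 100%
        : Math.abs(sum - 1.0f) <= EPSILON;     // legacy rule: exactly 100%
    if (!ok) {
      throw new IllegalArgumentException(
          "Illegal children capacities, sum = " + sum);
    }
  }
}
{code}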
[jira] [Updated] (YARN-2395) FairScheduler: Preemption timeout should be configurable per queue
[ https://issues.apache.org/jira/browse/YARN-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2395: -- Attachment: YARN-2395-5.patch All tests passed locally. Just re-trigger the jenkins. FairScheduler: Preemption timeout should be configurable per queue -- Key: YARN-2395 URL: https://issues.apache.org/jira/browse/YARN-2395 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Ashwin Shankar Assignee: Wei Yan Attachments: YARN-2395-1.patch, YARN-2395-2.patch, YARN-2395-3.patch, YARN-2395-3.patch, YARN-2395-4.patch, YARN-2395-5.patch, YARN-2395-5.patch Currently in fair scheduler, the preemption logic considers fair share starvation only at leaf queue level. This jira is created to implement it at the parent queue as well. It involves : 1. Making check for fair share starvation and amount of resource to preempt recursive such that they traverse the queue hierarchy from root to leaf. 2. Currently fairSharePreemptionTimeout is a global config. We could make it configurable on a per queue basis,so that we can specify different timeouts for parent queues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116013#comment-14116013 ] Karthik Kambatla commented on YARN-2360: The test failure should be unrelated, it passes locally. +1. Fair Scheduler : Display dynamic fair share for queues on the scheduler page Key: YARN-2360 URL: https://issues.apache.org/jira/browse/YARN-2360 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Ashwin Shankar Assignee: Ashwin Shankar Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, Screen_Shot_v3.png, Screen_Shot_v4.png, Screen_Shot_v5.png, YARN-2360-v1.txt, YARN-2360-v2.txt, YARN-2360-v3.patch, YARN-2360-v4.patch, YARN-2360-v5.patch, yarn-2360-6.patch Based on the discussion in YARN-2026, we'd like to display dynamic fair share for queues on the scheduler page. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1707) Making the CapacityScheduler more dynamic
[ https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-1707: --- Attachment: YARN-1707.5.patch Making the CapacityScheduler more dynamic - Key: YARN-1707 URL: https://issues.apache.org/jira/browse/YARN-1707 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Carlo Curino Assignee: Carlo Curino Labels: capacity-scheduler Attachments: YARN-1707.2.patch, YARN-1707.3.patch, YARN-1707.4.patch, YARN-1707.5.patch, YARN-1707.patch The CapacityScheduler is rather static at the moment, and refreshqueue provides a rather heavy-handed way to reconfigure it. Moving towards long-running services (tracked in YARN-896) and to enable more advanced admission control and resource parcelling, we need to make the CapacityScheduler more dynamic. This is instrumental to the umbrella jira YARN-1051. Concretely this requires the following changes: * create queues dynamically * destroy queues dynamically * dynamically change queue parameters (e.g., capacity) * modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% instead of == 100% We limit this to LeafQueues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic
[ https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116029#comment-14116029 ] Carlo Curino commented on YARN-1707: [~jianhe] Thanks for the feedback... The version I just posted contains the getDisplayName implementation, but does not address your last comments yet. We will get to those next. Making the CapacityScheduler more dynamic - Key: YARN-1707 URL: https://issues.apache.org/jira/browse/YARN-1707 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Carlo Curino Assignee: Carlo Curino Labels: capacity-scheduler Attachments: YARN-1707.2.patch, YARN-1707.3.patch, YARN-1707.4.patch, YARN-1707.5.patch, YARN-1707.patch The CapacityScheduler is rather static at the moment, and refreshqueue provides a rather heavy-handed way to reconfigure it. Moving towards long-running services (tracked in YARN-896) and to enable more advanced admission control and resource parcelling, we need to make the CapacityScheduler more dynamic. This is instrumental to the umbrella jira YARN-1051. Concretely this requires the following changes: * create queues dynamically * destroy queues dynamically * dynamically change queue parameters (e.g., capacity) * modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% instead of == 100% We limit this to LeafQueues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2475) ReservationSystem: replan upon capacity reduction
Carlo Curino created YARN-2475: -- Summary: ReservationSystem: replan upon capacity reduction Key: YARN-2475 URL: https://issues.apache.org/jira/browse/YARN-2475 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino In the context of YARN-1051, if capacity of the cluster drops significantly upon machine failures we need to trigger a reorganization of the planned reservations. As reservations are absolute it is possible that they will not all fit, and some need to be rejected a-posteriori. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2475) ReservationSystem: replan upon capacity reduction
[ https://issues.apache.org/jira/browse/YARN-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116037#comment-14116037 ] Carlo Curino commented on YARN-2475: The first version of this is a simple greedy policy that walks the plan and, for every instant in time that violates the new capacity, removes reservations in reverse acceptance order (i.e., the reservation accepted last is the first to be rejected, thus protecting older reservations). ReservationSystem: replan upon capacity reduction - Key: YARN-2475 URL: https://issues.apache.org/jira/browse/YARN-2475 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino In the context of YARN-1051, if capacity of the cluster drops significantly upon machine failures we need to trigger a reorganization of the planned reservations. As reservations are absolute it is possible that they will not all fit, and some need to be rejected a-posteriori. -- This message was sent by Atlassian JIRA (v6.2#6252)
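A self-contained sketch of the greedy walk described above, with simplified stand-ins for the plan and reservation types (the real patch works on the YARN-1709 data structures):
{code}
import java.util.List;

// Sketch only: walk the plan step by step; whenever the committed total
// exceeds the reduced capacity, reject the most recently accepted
// reservation active at that step, protecting older reservations.
public class GreedyReplanSketch {
  public static class Reservation {
    final int start, end;   // active over plan steps [start, end)
    final long amount;      // resources held at every active step
    public Reservation(int start, int end, long amount) {
      this.start = start; this.end = end; this.amount = amount;
    }
  }

  public static void replan(long[] committed, long newCapacity,
      List<Reservation> byAcceptanceOrder) {
    for (int t = 0; t < committed.length; t++) {
      while (committed[t] > newCapacity) {
        // Find the newest-accepted reservation active at instant t.
        Reservation victim = null;
        for (int i = byAcceptanceOrder.size() - 1; i >= 0; i--) {
          Reservation r = byAcceptanceOrder.get(i);
          if (r.start <= t && t < r.end) {
            victim = r;
            break;
          }
        }
        if (victim == null) {
          break; // over-commitment not attributable to any reservation
        }
        byAcceptanceOrder.remove(victim); // reject a-posteriori
        for (int u = victim.start; u < victim.end; u++) {
          committed[u] -= victim.amount;
        }
      }
    }
  }
}
{code}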
[jira] [Updated] (YARN-2475) ReservationSystem: replan upon capacity reduction
[ https://issues.apache.org/jira/browse/YARN-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-2475: --- Issue Type: Sub-task (was: Bug) Parent: YARN-1051 ReservationSystem: replan upon capacity reduction - Key: YARN-2475 URL: https://issues.apache.org/jira/browse/YARN-2475 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino In the context of YARN-1051, if capacity of the cluster drops significantly upon machine failures we need to trigger a reorganization of the planned reservations. As reservations are absolute it is possible that they will not all fit, and some need to be rejected a-posteriori. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2395) FairScheduler: Preemption timeout should be configurable per queue
[ https://issues.apache.org/jira/browse/YARN-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116039#comment-14116039 ] Hadoop QA commented on YARN-2395: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12665475/YARN-2395-5.patch against trunk revision 9ad413b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4779//console This message is automatically generated. FairScheduler: Preemption timeout should be configurable per queue -- Key: YARN-2395 URL: https://issues.apache.org/jira/browse/YARN-2395 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Ashwin Shankar Assignee: Wei Yan Attachments: YARN-2395-1.patch, YARN-2395-2.patch, YARN-2395-3.patch, YARN-2395-3.patch, YARN-2395-4.patch, YARN-2395-5.patch, YARN-2395-5.patch Currently in fair scheduler, the preemption logic considers fair share starvation only at leaf queue level. This jira is created to implement it at the parent queue as well. It involves : 1. Making check for fair share starvation and amount of resource to preempt recursive such that they traverse the queue hierarchy from root to leaf. 2. Currently fairSharePreemptionTimeout is a global config. We could make it configurable on a per queue basis,so that we can specify different timeouts for parent queues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic
[ https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116041#comment-14116041 ] Hadoop QA commented on YARN-1707: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12665477/YARN-1707.5.patch against trunk revision 9ad413b. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4780//console This message is automatically generated. Making the CapacityScheduler more dynamic - Key: YARN-1707 URL: https://issues.apache.org/jira/browse/YARN-1707 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Carlo Curino Assignee: Carlo Curino Labels: capacity-scheduler Attachments: YARN-1707.2.patch, YARN-1707.3.patch, YARN-1707.4.patch, YARN-1707.5.patch, YARN-1707.patch The CapacityScheduler is rather static at the moment, and refreshqueue provides a rather heavy-handed way to reconfigure it. Moving towards long-running services (tracked in YARN-896) and to enable more advanced admission control and resource parcelling, we need to make the CapacityScheduler more dynamic. This is instrumental to the umbrella jira YARN-1051. Concretely this requires the following changes: * create queues dynamically * destroy queues dynamically * dynamically change queue parameters (e.g., capacity) * modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% instead of == 100% We limit this to LeafQueues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2475) ReservationSystem: replan upon capacity reduction
[ https://issues.apache.org/jira/browse/YARN-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-2475: --- Attachment: YARN-2475.patch ReservationSystem: replan upon capacity reduction - Key: YARN-2475 URL: https://issues.apache.org/jira/browse/YARN-2475 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Attachments: YARN-2475.patch In the context of YARN-1051, if capacity of the cluster drops significantly upon machine failures we need to trigger a reorganization of the planned reservations. As reservations are absolute it is possible that they will not all fit, and some need to be rejected a-posteriori. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1710) Admission Control: agents to allocate reservation
[ https://issues.apache.org/jira/browse/YARN-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-1710: --- Attachment: YARN-1710.1.patch Admission Control: agents to allocate reservation - Key: YARN-1710 URL: https://issues.apache.org/jira/browse/YARN-1710 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Attachments: YARN-1710.1.patch, YARN-1710.patch This JIRA tracks the algorithms used to allocate a user ReservationRequest coming in from the new reservation API (YARN-1708), in the inventory subsystem (YARN-1709) maintaining the current plan for the cluster. The focus of these agents is to quickly find a solution for the set of constraints provided by the user and the physical constraints of the plan. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1712) Admission Control: plan follower
[ https://issues.apache.org/jira/browse/YARN-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-1712: --- Attachment: YARN-1712.1.patch Admission Control: plan follower Key: YARN-1712 URL: https://issues.apache.org/jira/browse/YARN-1712 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Labels: reservations, scheduler Attachments: YARN-1712.1.patch, YARN-1712.patch This JIRA tracks a thread that continuously propagates the current state of an inventory subsystem to the scheduler. As the inventory subsystem stores the plan of how the resources should be subdivided, the work we propose in this JIRA realizes that plan by dynamically instructing the CapacityScheduler to add/remove/resize queues to follow the plan. -- This message was sent by Atlassian JIRA (v6.2#6252)
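A minimal sketch of the follower thread described above: a periodic task that pushes the plan's current queue shapes into the scheduler. PlanFollower here is a hypothetical callback, not the patch's actual interface.
{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch only: drive a plan-synchronization callback at the plan's step
// granularity; each tick creates/resizes/removes reservation queues so the
// scheduler tracks the plan.
public class PlanFollowerDriver {
  interface PlanFollower {
    /** Synchronize the scheduler's queues with the plan at time {@code now}. */
    void synchronizePlan(long now);
  }

  static ScheduledExecutorService start(final PlanFollower follower,
      long stepMillis) {
    ScheduledExecutorService exec =
        Executors.newSingleThreadScheduledExecutor();
    exec.scheduleAtFixedRate(new Runnable() {
      @Override
      public void run() {
        follower.synchronizePlan(System.currentTimeMillis());
      }
    }, 0L, stepMillis, TimeUnit.MILLISECONDS);
    return exec;
  }
}
{code}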
[jira] [Updated] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
[ https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-1711: --- Attachment: YARN-1711.1.patch CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709 -- Key: YARN-1711 URL: https://issues.apache.org/jira/browse/YARN-1711 Project: Hadoop YARN Issue Type: Sub-task Reporter: Carlo Curino Assignee: Carlo Curino Labels: reservations Attachments: YARN-1711.1.patch, YARN-1711.patch This JIRA tracks the development of a policy that enforces user quotas (a time-extension of the notion of capacity) in the inventory subsystem discussed in YARN-1709. -- This message was sent by Atlassian JIRA (v6.2#6252)
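To make "a time-extension of the notion of capacity" concrete, here is a self-contained sketch of one plausible check: a user's allocation must respect an instantaneous cap and an average cap over every sliding window. Arrays of per-step allocations stand in for the plan's real time-indexed structures, and the parameterization is an assumption, not the policy's actual API.
{code}
// Sketch only: reject a plan if the user's per-step allocation ever exceeds
// the instantaneous cap, or if its running sum over any window of
// windowSteps steps exceeds the average cap times the window length.
public class OverTimeQuotaSketch {
  static boolean fits(long[] userAlloc, long instantMax,
      long avgMax, int windowSteps) {
    long windowSum = 0;
    for (int t = 0; t < userAlloc.length; t++) {
      if (userAlloc[t] > instantMax) {
        return false;                 // violates the instantaneous cap
      }
      windowSum += userAlloc[t];
      if (t >= windowSteps) {
        windowSum -= userAlloc[t - windowSteps]; // slide the window
      }
      if (windowSum > avgMax * Math.min(t + 1, windowSteps)) {
        return false;                 // violates the average-over-window cap
      }
    }
    return true;
  }
}
{code}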
[jira] [Commented] (YARN-1712) Admission Control: plan follower
[ https://issues.apache.org/jira/browse/YARN-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116068#comment-14116068 ] Hadoop QA commented on YARN-1712: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12665487/YARN-1712.1.patch against trunk revision 9ad413b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4782//console This message is automatically generated. Admission Control: plan follower Key: YARN-1712 URL: https://issues.apache.org/jira/browse/YARN-1712 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Labels: reservations, scheduler Attachments: YARN-1712.1.patch, YARN-1712.patch This JIRA tracks a thread that continuously propagates the current state of an inventory subsystem to the scheduler. As the inventory subsystem stores the plan of how the resources should be subdivided, the work we propose in this JIRA realizes that plan by dynamically instructing the CapacityScheduler to add/remove/resize queues to follow the plan. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1710) Admission Control: agents to allocate reservation
[ https://issues.apache.org/jira/browse/YARN-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116069#comment-14116069 ] Hadoop QA commented on YARN-1710: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12665487/YARN-1712.1.patch against trunk revision 9ad413b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test file. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4781//console This message is automatically generated. Admission Control: agents to allocate reservation - Key: YARN-1710 URL: https://issues.apache.org/jira/browse/YARN-1710 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Attachments: YARN-1710.1.patch, YARN-1710.patch This JIRA tracks the algorithms used to allocate a user ReservationRequest, coming in from the new reservation API (YARN-1708), into the inventory subsystem (YARN-1709) that maintains the current plan for the cluster. The focus of these agents is to quickly find a solution for the set of constraints provided by the user and the physical constraints of the plan. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
[ https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116079#comment-14116079 ] Hadoop QA commented on YARN-1711: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12665488/YARN-1711.1.patch against trunk revision 9ad413b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4783//console This message is automatically generated. CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709 -- Key: YARN-1711 URL: https://issues.apache.org/jira/browse/YARN-1711 Project: Hadoop YARN Issue Type: Sub-task Reporter: Carlo Curino Assignee: Carlo Curino Labels: reservations Attachments: YARN-1711.1.patch, YARN-1711.patch This JIRA tracks the development of a policy that enforces user quotas (a time-extension of the notion of capacity) in the inventory subsystem discussed in YARN-1709. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2360) Fair Scheduler: Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2360: --- Summary: Fair Scheduler: Display dynamic fair share for queues on the scheduler page (was: Fair Scheduler : Display dynamic fair share for queues on the scheduler page) Fair Scheduler: Display dynamic fair share for queues on the scheduler page --- Key: YARN-2360 URL: https://issues.apache.org/jira/browse/YARN-2360 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Ashwin Shankar Assignee: Ashwin Shankar Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, Screen_Shot_v3.png, Screen_Shot_v4.png, Screen_Shot_v5.png, YARN-2360-v1.txt, YARN-2360-v2.txt, YARN-2360-v3.patch, YARN-2360-v4.patch, YARN-2360-v5.patch, yarn-2360-6.patch Based on the discussion in YARN-2026, we'd like to display dynamic fair share for queues on the scheduler page. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic
[ https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116184#comment-14116184 ] Wangda Tan commented on YARN-1707: -- Carlo, thanks for updating the patch. In addition to Jian's comment, I think the changes for displayQueueName look good to me. I don't have further comments about this patch for now. Thanks, Wangda Making the CapacityScheduler more dynamic - Key: YARN-1707 URL: https://issues.apache.org/jira/browse/YARN-1707 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Carlo Curino Assignee: Carlo Curino Labels: capacity-scheduler Attachments: YARN-1707.2.patch, YARN-1707.3.patch, YARN-1707.4.patch, YARN-1707.5.patch, YARN-1707.patch The CapacityScheduler is rather static at the moment, and refreshqueue provides a rather heavy-handed way to reconfigure it. Moving towards long-running services (tracked in YARN-896) and to enable more advanced admission control and resource parcelling, we need to make the CapacityScheduler more dynamic. This is instrumental to the umbrella JIRA YARN-1051. Concretely this requires the following changes: * create queues dynamically * destroy queues dynamically * dynamically change queue parameters (e.g., capacity) * modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% instead of == 100% We limit this to LeafQueues. -- This message was sent by Atlassian JIRA (v6.2#6252)
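The last bullet is the key enabler for dynamic queues. A hypothetical sketch of the relaxed validation, assuming the reading that children may now sum to at most 100% with the slack left as headroom:
{code}
// Hypothetical sketch of the relaxed refreshqueue validation from the last
// bullet above: children may now sum to at most 100%, and the slack is
// headroom for dynamically created queues.
import java.util.List;

class QueueValidationSketch {
  static void validate(List<Float> childCapacities) {
    float sum = 0f;
    for (float c : childCapacities) sum += c;
    if (sum > 100f + 1e-3f) {   // was effectively: if (sum != 100f)
      throw new IllegalArgumentException(
          "Children capacities sum to " + sum + "%, exceeding 100%");
    }
  }
}
{code}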
[jira] [Created] (YARN-2476) Apps are scheduled in random order after RM failover
Santosh Marella created YARN-2476: - Summary: Apps are scheduled in random order after RM failover Key: YARN-2476 URL: https://issues.apache.org/jira/browse/YARN-2476 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Environment: Linux Reporter: Santosh Marella RM HA is configured with 2 RMs. Used FileSystemRMStateStore. Fairscheduler allocation file is configured in yarn-site.xml:
{code}
<property>
  <name>yarn.scheduler.fair.allocation.file</name>
  <value>/opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop/allocation-pools.xml</value>
</property>
{code}
FS allocation-pools.xml:
{code}
<?xml version="1.0"?>
<allocations>
  <queue name="dev">
    <minResources>1 mb,10vcores</minResources>
    <maxResources>19000 mb,100vcores</maxResources>
    <maxRunningApps>5525</maxRunningApps>
    <weight>4.5</weight>
    <schedulingPolicy>fair</schedulingPolicy>
    <fairSharePreemptionTimeout>3600</fairSharePreemptionTimeout>
  </queue>
  <queue name="default">
    <minResources>1 mb,10vcores</minResources>
    <maxResources>19000 mb,100vcores</maxResources>
    <maxRunningApps>5525</maxRunningApps>
    <weight>1.5</weight>
    <schedulingPolicy>fair</schedulingPolicy>
    <fairSharePreemptionTimeout>3600</fairSharePreemptionTimeout>
  </queue>
  <defaultMinSharePreemptionTimeout>600</defaultMinSharePreemptionTimeout>
  <fairSharePreemptionTimeout>600</fairSharePreemptionTimeout>
</allocations>
{code}
Submitted 10 sleep jobs to a FS queue using the command:
{code}
hadoop jar hadoop-mapreduce-examples-2.4.1-mapr-4.0.1-SNAPSHOT.jar sleep -Dmapreduce.job.queuename=root.dev -m 10 -r 10 -mt 1 -rt 1
{code}
All the jobs were submitted by the same user, with the same priority and to the same queue. No other jobs were running in the cluster. Jobs started executing in the order in which they were submitted (jobs 6 to 10 were active, while 11 to 15 were waiting):
{code}
root@perfnode131:/opt/mapr/hadoop/hadoop-2.4.1/logs# yarn application -list
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]): 10
Application-Id                  Application-Name  Application-Type  User   Queue     State     Final-State  Progress  Tracking-URL
application_1408572781346_0012  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
application_1408572781346_0014  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
application_1408572781346_0011  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
application_1408572781346_0010  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode132:52799
application_1408572781346_0008  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode131:33766
application_1408572781346_0009  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode132:50964
application_1408572781346_0007  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode134:52966
application_1408572781346_0015  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
application_1408572781346_0006  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    9.5%      http://perfnode134:34094
application_1408572781346_0013  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
{code}
Stopped RM1. There was a failover and RM2 became active.
But the jobs seem to have started in a different order:
{code}
root@perfnode131:~/scratch/raw_rm_logs_fs_hang# yarn application -list
14/08/21 07:26:13 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]): 10
Application-Id                  Application-Name  Application-Type  User   Queue     State    Final-State  Progress  Tracking-URL
application_1408572781346_0012  Sleep job         MAPREDUCE         userA  root.dev  RUNNING  UNDEFINED    5%        http://perfnode134:59351
application_1408572781346_0014  Sleep job         MAPREDUCE         userA  root.dev  RUNNING
{code}
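For illustration only, not a fix proposed in this JIRA: one way to make recovery order deterministic is to sort the applications read back from the state store by submission time before re-adding them to the scheduler. AppState below is a stand-in type.
{code}
// Hypothetical illustration, not a fix taken from this JIRA: impose a
// deterministic recovery order by sorting the applications read back from
// the RMStateStore by submit time before re-adding them to the scheduler.
// AppState is a stand-in type.
import java.util.Comparator;
import java.util.List;

class RecoveryOrderSketch {
  static class AppState {
    final String appId;
    final long submitTime;
    AppState(String appId, long submitTime) {
      this.appId = appId;
      this.submitTime = submitTime;
    }
  }

  static void sortBySubmitTime(List<AppState> recovered) {
    // A filesystem-backed store's directory listing gives no ordering
    // guarantee, so an explicit order must be imposed on recovery.
    recovered.sort(Comparator.comparingLong(a -> a.submitTime));
  }
}
{code}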
[jira] [Updated] (YARN-2404) Remove ApplicationAttemptState and ApplicationState class in RMStateStore class
[ https://issues.apache.org/jira/browse/YARN-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2404: - Attachment: YARN-2404.1.patch Attached a first patch. Remove ApplicationAttemptState and ApplicationState class in RMStateStore class Key: YARN-2404 URL: https://issues.apache.org/jira/browse/YARN-2404 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Tsuyoshi OZAWA Attachments: YARN-2404.1.patch We can remove the ApplicationState and ApplicationAttemptState classes in RMStateStore, given that we already have the ApplicationStateData and ApplicationAttemptStateData records. We may simply replace ApplicationState with ApplicationStateData, and similarly for ApplicationAttemptState. -- This message was sent by Atlassian JIRA (v6.2#6252)
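To see why the wrapper classes are removable, here is a hypothetical sketch of the shape of the redundancy, with invented field names: ApplicationState mostly delegates to the ApplicationStateData record it wraps, so callers can hold the record directly.
{code}
// Hypothetical sketch of the redundancy this JIRA removes; field names are
// invented. The wrapper adds no state of its own beyond the record.
class ApplicationStateDataSketch {      // stands in for the existing record
  long submitTime;
  String user;
}

class ApplicationStateSketch {          // thin wrapper slated for removal
  private final ApplicationStateDataSketch data;
  ApplicationStateSketch(ApplicationStateDataSketch data) { this.data = data; }
  long getSubmitTime() { return data.submitTime; }
  String getUser() { return data.user; }
}
{code}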
[jira] [Updated] (YARN-2394) FairScheduler: Configure fairSharePreemptionThreshold per queue
[ https://issues.apache.org/jira/browse/YARN-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2394: --- Summary: FairScheduler: Configure fairSharePreemptionThreshold per queue (was: Fair Scheduler : ability to configure fairSharePreemptionThreshold per queue) FairScheduler: Configure fairSharePreemptionThreshold per queue --- Key: YARN-2394 URL: https://issues.apache.org/jira/browse/YARN-2394 Project: Hadoop YARN Issue Type: New Feature Components: fairscheduler Reporter: Ashwin Shankar Assignee: Wei Yan Attachments: YARN-2394-1.patch, YARN-2394-2.patch Preemption based on fair-share starvation happens when the usage of a queue is less than 50% of its fair share. This 50% is currently hardcoded. We'd like to make it configurable on a per-queue basis, so that we can choose the threshold at which we want to preempt. We call this config fairSharePreemptionThreshold. -- This message was sent by Atlassian JIRA (v6.2#6252)
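The starvation test then keeps its shape, with the hardcoded 0.5 replaced by the per-queue config. A hedged sketch, with names assumed rather than taken from the patch:
{code}
// Hypothetical sketch; names are not taken from the patch. The starvation
// test is unchanged except that the threshold is now a per-queue setting.
class StarvationCheckSketch {
  static boolean isStarvedForFairShare(double usage, double fairShare,
                                       double fairSharePreemptionThreshold) {
    // Previously equivalent to: usage < 0.5 * fairShare
    return usage < fairSharePreemptionThreshold * fairShare;
  }
}
{code}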