[jira] [Created] (YARN-1554) add an env variable for the YARN AM classpath
Steve Loughran created YARN-1554: Summary: add an env variable for the YARN AM classpath Key: YARN-1554 URL: https://issues.apache.org/jira/browse/YARN-1554 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.2.0 Reporter: Steve Loughran Priority: Minor Currently YARN apps set up their classpath via the default value {{YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH}} or an overridden property {{yarn.application.classpath}}. If you don't have the classpath right, the AM won't start up. This means the client needs to be explicitly configured with the CP. If the node manager exported the classpath property via an env variable {{YARN_APPLICATION_CLASSPATH}}, then the classpath could be set up in the AM simply by referencing that property, rather than hoping its setting is in sync. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1554) add an env variable for the YARN AM classpath
[ https://issues.apache.org/jira/browse/YARN-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860131#comment-13860131 ] Steve Loughran commented on YARN-1554: -- As noted in {{org.apache.hadoop.yarn.applications.distributedshell.Client}}: {code} // At some point we should not be required to add // the hadoop specific classpaths to the env. // It should be provided out of the box. // For now setting all required classpaths including // the classpath to . for the application jar {code}
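The client-side work the distributed-shell comment refers to can be sketched roughly as follows. This is an illustrative stand-in in plain Java (the class and constant names are invented, not the actual YarnConfiguration API): the client must itself assemble the AM's CLASSPATH from {{yarn.application.classpath}} or the default, because no environment variable carries it.

```java
import java.util.Arrays;

// Hypothetical sketch of what clients do today: assemble the AM's CLASSPATH
// from yarn.application.classpath (or a built-in default) because the node
// manager does not export it as an env variable. Illustrative names only.
public class AmClasspathSketch {
    static final String[] DEFAULT_YARN_APPLICATION_CLASSPATH = {
        "$HADOOP_CONF_DIR",
        "$HADOOP_COMMON_HOME/share/hadoop/common/*",
        "$HADOOP_YARN_HOME/share/hadoop/yarn/*"
    };

    // Build the CLASSPATH value the client must ship in the AM's environment.
    public static String buildClasspath(String configured) {
        String[] entries = (configured == null || configured.isEmpty())
            ? DEFAULT_YARN_APPLICATION_CLASSPATH
            : configured.split(",");
        // "." goes first so the application jar itself is found, as the
        // distributed-shell comment notes.
        return "." + ":" + String.join(":",
            Arrays.stream(entries).map(String::trim).toArray(String[]::new));
    }
}
```

With an exported {{YARN_APPLICATION_CLASSPATH}}, this assembly could instead happen inside the AM by reading the variable, so client and cluster settings could not drift apart.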
[jira] [Updated] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1506: - Attachment: YARN-1506-v1.patch Uploading the first version of the patch, which includes: - replacing direct setting of resources on RMNode/SchedulerNode with event notification - updating both the total resource and the available resource on SchedulerNode (fixing a bug in the previous patch) - removing resourceOption from RMNode, as RMNode does not need to be aware of overcommitTimeout - other necessary changes in related tests Replace set resource change on RMNode/SchedulerNode directly with event notification. - Key: YARN-1506 URL: https://issues.apache.org/jira/browse/YARN-1506 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, scheduler Reporter: Junping Du Assignee: Junping Du Priority: Blocker Attachments: YARN-1506-v1.patch According to Vinod's comments on YARN-312 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087), we should replace RMNode.setResourceOption() with some resource change event.
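The pattern the patch moves toward can be sketched minimally like this (illustrative names only, not the actual RMNode/SchedulerNode classes or the RM's async dispatcher): rather than calling a setter on the node, the RM fires a resource-change event and the node updates itself when handling it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Minimal sketch of event notification replacing a direct setter.
// All names here are invented for illustration.
public class ResourceEventSketch {
    static class NodeResourceUpdateEvent {
        final int newMemoryMb;
        NodeResourceUpdateEvent(int newMemoryMb) { this.newMemoryMb = newMemoryMb; }
    }

    static class SchedulerNode implements Consumer<NodeResourceUpdateEvent> {
        int totalMemoryMb;
        int availableMemoryMb;
        SchedulerNode(int total) { totalMemoryMb = total; availableMemoryMb = total; }
        @Override public void accept(NodeResourceUpdateEvent e) {
            // Update both total and available resource, mirroring the bug
            // fix mentioned in the v1 patch notes above.
            int delta = e.newMemoryMb - totalMemoryMb;
            totalMemoryMb = e.newMemoryMb;
            availableMemoryMb += delta;
        }
    }

    // A trivial synchronous dispatcher standing in for the RM's event bus.
    static class Dispatcher {
        final List<Consumer<NodeResourceUpdateEvent>> handlers = new ArrayList<>();
        void register(Consumer<NodeResourceUpdateEvent> h) { handlers.add(h); }
        void dispatch(NodeResourceUpdateEvent e) { handlers.forEach(h -> h.accept(e)); }
    }
}
```

The benefit is that the node owns its own state transitions, and every interested component can observe the change through the same event.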
[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860253#comment-13860253 ] Hadoop QA commented on YARN-1506: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621074/YARN-1506-v1.patch against trunk revision . {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2772//console This message is automatically generated.
[jira] [Updated] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1506: - Attachment: YARN-1506-v2.patch Jenkins did not pick up the patch automatically, and a manually started build failed with an error message that makes no sense. Renaming the patch to v2 and submitting it again.
[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860286#comment-13860286 ] Hadoop QA commented on YARN-1506: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621084/YARN-1506-v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2773//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2773//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2773//console This message is automatically generated.
[jira] [Updated] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1506: - Attachment: YARN-1506-v3.patch Fixed the findbugs and javadoc warnings in the v3 patch.
[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860326#comment-13860326 ] Hadoop QA commented on YARN-1506: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621095/YARN-1506-v3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2774//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2774//console This message is automatically generated.
[jira] [Assigned] (YARN-1138) yarn.application.classpath is set to point to $HADOOP_CONF_DIR etc., which does not work on Windows
[ https://issues.apache.org/jira/browse/YARN-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingda Chen reassigned YARN-1138: - Assignee: (was: Yingda Chen) I am not actively working on YARN for the time being. yarn.application.classpath is set to point to $HADOOP_CONF_DIR etc., which does not work on Windows --- Key: YARN-1138 URL: https://issues.apache.org/jira/browse/YARN-1138 Project: Hadoop YARN Issue Type: Bug Reporter: Yingda Chen yarn-default.xml has the yarn.application.classpath entry set to $HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*. This does not work on Windows and needs to be fixed.
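For context, the issue is that the default value uses Unix-style `$VAR` expansion, which the Windows shell does not understand. A hedged sketch of what a Windows-specific override in yarn-site.xml might look like (the exact value and escaping are assumptions, not a tested configuration):

```xml
<!-- Illustrative only: Windows expands %VAR%, not $VAR -->
<property>
  <name>yarn.application.classpath</name>
  <value>%HADOOP_CONF_DIR%,%HADOOP_COMMON_HOME%\share\hadoop\common\*,%HADOOP_YARN_HOME%\share\hadoop\yarn\*</value>
</property>
```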
[jira] [Updated] (YARN-1553) Do not use HttpConfig.isSecure() in YARN
[ https://issues.apache.org/jira/browse/YARN-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated YARN-1553: - Attachment: YARN-1553.001.patch Do not use HttpConfig.isSecure() in YARN Key: YARN-1553 URL: https://issues.apache.org/jira/browse/YARN-1553 Project: Hadoop YARN Issue Type: Bug Reporter: Haohui Mai Attachments: YARN-1553.000.patch, YARN-1553.001.patch HDFS-5305 and related jiras decided that each individual project will have its own configuration for HTTP policy. {{HttpConfig.isSecure}} is a global static method that no longer fits the design. The same functionality should be moved into the YARN code base.
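A minimal sketch of the direction described above: a YARN-local helper consulting a project-specific policy key instead of a global static. The key name, values, and default below are assumptions for illustration, not the actual configuration constants from the patch.

```java
import java.util.Map;

// Hedged sketch: project-local HTTP policy check replacing the global
// HttpConfig.isSecure(). Key name and default are assumed for illustration.
public class YarnHttpPolicySketch {
    static final String POLICY_KEY = "yarn.http.policy";   // assumed key name

    public static boolean isHttpsOnly(Map<String, String> conf) {
        return "HTTPS_ONLY".equals(conf.getOrDefault(POLICY_KEY, "HTTP_ONLY"));
    }

    public static String webAppScheme(Map<String, String> conf) {
        return isHttpsOnly(conf) ? "https://" : "http://";
    }
}
```

The point of the refactoring is that each project reads its own configuration, so HDFS and YARN can run with different HTTP policies in the same JVM.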
[jira] [Commented] (YARN-1554) add an env variable for the YARN AM classpath
[ https://issues.apache.org/jira/browse/YARN-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860661#comment-13860661 ] Sandy Ryza commented on YARN-1554: -- Is this a duplicate of YARN-973?
[jira] [Assigned] (YARN-1553) Do not use HttpConfig.isSecure() in YARN
[ https://issues.apache.org/jira/browse/YARN-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reassigned YARN-1553: - Assignee: Haohui Mai
[jira] [Commented] (YARN-1549) TestUnmanagedAMLauncher#testDSShell fails in trunk
[ https://issues.apache.org/jira/browse/YARN-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860696#comment-13860696 ] Hudson commented on YARN-1549: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4949 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4949/]) YARN-1549. Fixed a bug in ResourceManager's ApplicationMasterService that was causing unmanaged AMs to not finish correctly. Contributed by haosdent. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1554886) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/TestUnmanagedAMLauncher.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java TestUnmanagedAMLauncher#testDSShell fails in trunk -- Key: YARN-1549 URL: https://issues.apache.org/jira/browse/YARN-1549 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.2.0 Reporter: Ted Yu Assignee: haosdent Fix For: 2.4.0 Attachments: YARN-1549.1.patch, YARN-1549.patch The following error is reproducible: {code} testDSShell(org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher) Time elapsed: 14.911 sec ERROR!
java.lang.RuntimeException: Failed to receive final expected state in ApplicationReport, CurrentState=RUNNING, ExpectedStates=FINISHED,FAILED,KILLED at org.apache.hadoop.yarn.applications.unmanagedamlauncher.UnmanagedAMLauncher.monitorApplication(UnmanagedAMLauncher.java:447) at org.apache.hadoop.yarn.applications.unmanagedamlauncher.UnmanagedAMLauncher.run(UnmanagedAMLauncher.java:352) at org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher.testDSShell(TestUnmanagedAMLauncher.java:147) {code} See https://builds.apache.org/job/Hadoop-Yarn-trunk/435
[jira] [Commented] (YARN-1493) Schedulers don't recognize apps separately from app-attempts
[ https://issues.apache.org/jira/browse/YARN-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860738#comment-13860738 ] Hudson commented on YARN-1493: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4951 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4951/]) YARN-1493. Changed ResourceManager and Scheduler interfacing to recognize app-attempts separately from apps. Contributed by Jian He. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1554896) * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSCapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/event/RMAppAttemptRejectedEvent.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ActiveUsersManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerAppReport.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerAppUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplication.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/AppAddedSchedulerEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/AppAttemptAddedSchedulerEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/AppRemovedSchedulerEvent.java *
[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860766#comment-13860766 ] Hadoop QA commented on YARN-1506: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621095/YARN-1506-v3.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2775//console This message is automatically generated.
[jira] [Commented] (YARN-1553) Do not use HttpConfig.isSecure() in YARN
[ https://issues.apache.org/jira/browse/YARN-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860768#comment-13860768 ] Hadoop QA commented on YARN-1553: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621133/YARN-1553.001.patch against trunk revision . {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2776//console This message is automatically generated.
[jira] [Commented] (YARN-1297) Miscellaneous Fair Scheduler speedups
[ https://issues.apache.org/jira/browse/YARN-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860792#comment-13860792 ] Karthik Kambatla commented on YARN-1297: First round of comments: # It would be nice to see what gains come from replacing ResourcePBImpl with SimpleResource. If they are not noticeable, it might be better to leave it as is. # In a couple of places, instead of modifying the resource usage of a queue this way, it would be better to add a method to FSQueue that does this. {code} + Resources.addTo(cur.getResourceUsage(), container.getResource()); {code} # I am surprised that direct comparisons instead of DefaultResourceCalculator make a noticeable performance difference. Can we measure the gains due to this change, and drop it if there are none? {code} - Resource minShare1 = Resources.min(RESOURCE_CALCULATOR, null, - s1.getMinShare(), s1.getDemand()); + int minShare1 = Math.min(s1.getMinShare().getMemory(), + s1.getDemand().getMemory()); {code} # I am not an expert, but I hear Math#signum is supposed to be optimized for performance. Just curious - how much did changing this help? Miscellaneous Fair Scheduler speedups - Key: YARN-1297 URL: https://issues.apache.org/jira/browse/YARN-1297 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1297-1.patch, YARN-1297.patch, YARN-1297.patch I ran the Fair Scheduler's core scheduling loop through a profiler and identified a bunch of minimally invasive changes that can shave off a few milliseconds. The main one is demoting a couple of INFO log messages to DEBUG, which brought my benchmark down from 16000 ms to 6000 ms. A few others (which had far less of an impact) were: 
* Most of the time in comparisons was being spent in Math.signum. I switched this to direct ifs and elses, which halved the percentage of time spent in comparisons.
* I removed some unnecessary instantiations of Resource objects.
* I made it so that queues' usage isn't recalculated from the applications up each time getResourceUsage is called.
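The Math.signum change discussed above can be illustrated with a small sketch (method names invented; not the actual Fair Scheduler comparator). Both forms yield the same ordering; the branch form avoids the long-to-float round-trip in a hot comparison loop:

```java
// Illustration of replacing (int) Math.signum(a - b) with direct branches.
public class ComparatorSketch {
    static int viaSignum(long a, long b) {
        // Math.signum has only float/double overloads, so (a - b) is
        // widened to float here; fine for small values, wasteful in a loop.
        return (int) Math.signum(a - b);
    }

    static int viaBranches(long a, long b) {
        if (a < b) return -1;
        if (a > b) return 1;
        return 0;
    }
}
```

Note also that `viaSignum` can misbehave when `a - b` overflows a long, which is another reason to prefer the branch form beyond speed.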
[jira] [Updated] (YARN-1413) [YARN-321] AHS WebUI should serve aggregated logs as well
[ https://issues.apache.org/jira/browse/YARN-1413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-1413: Attachment: YARN-1413-5.patch Thanks [~vinodkv] for the review. Updating the patch. This change calls the AHS server, so no nodes will be called. Thanks, Mayank [YARN-321] AHS WebUI should serve aggregated logs as well -- Key: YARN-1413 URL: https://issues.apache.org/jira/browse/YARN-1413 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Mayank Bansal Attachments: YARN-1413-1.patch, YARN-1413-2.patch, YARN-1413-3.patch, YARN-1413-4.patch, YARN-1413-5.patch
[jira] [Updated] (YARN-1482) WebApplicationProxy should be always-on w.r.t HA even if it is embedded in the RM
[ https://issues.apache.org/jira/browse/YARN-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1482: Attachment: YARN-1482.3.patch WebApplicationProxy should be always-on w.r.t HA even if it is embedded in the RM - Key: YARN-1482 URL: https://issues.apache.org/jira/browse/YARN-1482 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Attachments: YARN-1482.1.patch, YARN-1482.2.patch, YARN-1482.3.patch This way, even if an RM goes to standby mode, we can effect a redirect to the active RM. More importantly, users will not suddenly see all their links stop working.
[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860887#comment-13860887 ] Xuan Gong commented on YARN-1410: - Thanks for the comments, [~bikassaha] and [~kkambatl]. bq. But I think we have a separate createApplication() in order to get an appId for which to request RM tokens so that those tokens can be inserted in the AppSubmitContext before app submission It looks like requesting RM tokens does not actually need an appId, but I agree that we still need createApplication(): it gives us a globally unique id that we can use for several things, such as creating the JobId for a MapReduce job, or using it as part of the path when setting up local resources. For this ticket, I think the better approach is to ask clients to always use submitApplication() (adding comments on the YarnClient API). In submitApplication(), we can check whether an appId is provided in the ASC. If it is, we use that appId to submit the application (after checking whether it was issued by the currently active RM); if not, we request one and then do the submission. Handle client failover during 2 step client API's like app submission - Key: YARN-1410 URL: https://issues.apache.org/jira/browse/YARN-1410 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1410.1.patch App submission involves 1) creating an appId and 2) using that appId to submit an ApplicationSubmissionContext to the RM. The client may have obtained an appId from an RM, the RM may have failed over, and the client may submit the app to the new RM. Since the new RM has a different notion of the cluster timestamp (used to create app ids), it may reject the submission, resulting in an unexpected failure on the client side. The same may happen for other 2-step client API operations.
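The submission flow proposed in the comment above can be sketched as follows. The types and the validity check are illustrative stand-ins, not the real YarnClient/RM API: submitApplication() accepts a context with or without an appId, and mints a fresh id when the provided one is missing or was issued by a previous RM instance.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hedged sketch of failover-tolerant app submission. Illustrative names only.
public class SubmitSketch {
    static class SubmissionContext { Integer appId; }

    private final AtomicInteger nextId = new AtomicInteger();

    // Stand-in for createApplication(): hands out a cluster-unique id.
    int newAppId() { return nextId.incrementAndGet(); }

    // Placeholder validity check; a real RM would compare the cluster
    // timestamp embedded in the id against its own.
    boolean issuedByThisRm(int appId) { return appId >= 1 && appId <= nextId.get(); }

    // Returns the appId actually used for the submission.
    int submitApplication(SubmissionContext ctx) {
        if (ctx.appId == null || !issuedByThisRm(ctx.appId)) {
            ctx.appId = newAppId();   // missing or stale id: request a new one
        }
        return ctx.appId;
    }
}
```

This keeps createApplication() available for clients that need the id up front (e.g. to embed it in paths), while making submitApplication() safe to call against a freshly failed-over RM.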
[jira] [Commented] (YARN-1413) [YARN-321] AHS WebUI should serve aggregated logs as well
[ https://issues.apache.org/jira/browse/YARN-1413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860955#comment-13860955 ] Hadoop QA commented on YARN-1413: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621155/YARN-1413-5.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2778//console This message is automatically generated.
[jira] [Commented] (YARN-1553) Do not use HttpConfig.isSecure() in YARN
[ https://issues.apache.org/jira/browse/YARN-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861028#comment-13861028 ] Hadoop QA commented on YARN-1553: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621133/YARN-1553.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2777//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2777//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2777//console This message is automatically generated. Do not use HttpConfig.isSecure() in YARN Key: YARN-1553 URL: https://issues.apache.org/jira/browse/YARN-1553 Project: Hadoop YARN Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Attachments: YARN-1553.000.patch, YARN-1553.001.patch HDFS-5305 and related jira decide that each individual project will have their own configuration on http policy. {{HttpConfig.isSecure}} is a global static method which does not fit the design anymore. The same functionality should be moved into the YARN code base. 
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1539) Queue admin ACLs should NOT be similar to submit-acls w.r.t hierarchy.
[ https://issues.apache.org/jira/browse/YARN-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861051#comment-13861051 ] Sandy Ryza commented on YARN-1539: -- Verified that the Fair Scheduler has the same behavior. Queue admin ACLs should NOT be similar to submit-acls w.r.t hierarchy. -- Key: YARN-1539 URL: https://issues.apache.org/jira/browse/YARN-1539 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Today, Queue admin ACLs are similar to submit-acls w.r.t hierarchy in that if one has to be able to administer a queue, he/she should be an admin of all the queues in the ancestry - an unnecessary burden. This was added in YARN-899 and I believe is wrong semantics as well as implementation. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1482) WebApplicationProxy should be always-on w.r.t HA even if it is embedded in the RM
[ https://issues.apache.org/jira/browse/YARN-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861075#comment-13861075 ] Hadoop QA commented on YARN-1482: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621164/YARN-1482.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 3 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy: org.apache.hadoop.yarn.client.api.impl.TestYarnClient {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2779//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2779//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2779//console This message is automatically generated. 
WebApplicationProxy should be always-on w.r.t HA even if it is embedded in the RM - Key: YARN-1482 URL: https://issues.apache.org/jira/browse/YARN-1482 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Attachments: YARN-1482.1.patch, YARN-1482.2.patch, YARN-1482.3.patch This way, even if an RM goes to standby mode, we can affect a redirect to the active. And more importantly, users will not suddenly see all their links stop working. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1490: -- Attachment: YARN-1490.1.patch RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: YARN-1490.1.patch This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers; some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1551) Allow user-specified reason for killApplication
[ https://issues.apache.org/jira/browse/YARN-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated YARN-1551: Attachment: YARN-1551.v03.patch Fixing broken tests. Allow user-specified reason for killApplication --- Key: YARN-1551 URL: https://issues.apache.org/jira/browse/YARN-1551 Project: Hadoop YARN Issue Type: Improvement Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1551.v01.patch, YARN-1551.v02.patch, YARN-1551.v03.patch This completes MAPREDUCE-5648 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He reassigned YARN-1490: - Assignee: Jian He (was: Vinod Kumar Vavilapalli) RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-1490.1.patch This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers; some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861093#comment-13861093 ] Jian He commented on YARN-1490: ---
- Create a field in AppSubmissionContext to indicate whether or not to clean up the containers on AM failure.
- Copy the data structures (liveContainers etc.) inside SchedulerApplicationAttempt over when the new attempt is recovering the failed attempt's scheduler info.
- Similarly, copy the needed data structures (finished containers etc.) inside RMAppAttempt over when the new attempt is recovering the failed RMAppAttempt's info.
- The failed attempt is changed to still receive container events and record the finished containers, and the new attempt is created with references to the previous attempt's objects.
- The per-attempt data structures inside the schedulers are removed; only SchedulerApplication.getCurrentAppAttempt is used to retrieve the current attempt.
RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-1490.1.patch This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers; some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
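The "copy the data structures over" step in the design notes above can be modeled in a few lines. This is a hedged sketch, not the patch's actual code: SchedulerAttempt and its recoverFrom method are hypothetical stand-ins for the scheduler-side attempt bookkeeping, and the boolean flag stands in for the proposed AppSubmissionContext field.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model of work-preserving AM restart: on AM failure, the new
// attempt adopts the failed attempt's running containers instead of the RM
// killing them. The flag mirrors the proposed per-app opt-in.
class SchedulerAttempt {
    final Map<String, String> liveContainers = new HashMap<>(); // containerId -> node

    static SchedulerAttempt recoverFrom(SchedulerAttempt failed, boolean keepContainers) {
        SchedulerAttempt next = new SchedulerAttempt();
        if (keepContainers) {
            // Work-preserving restart: inherit the still-running containers.
            next.liveContainers.putAll(failed.liveContainers);
        }
        return next;
    }
}
```

With the flag off, the new attempt starts empty, matching today's behavior of killing all containers when the AM exits.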
[jira] [Commented] (YARN-1038) LocalizationProtocolPBClientImpl RPC failing
[ https://issues.apache.org/jira/browse/YARN-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861094#comment-13861094 ] haosdent commented on YARN-1038: Complete logs would help us pinpoint the cause of this error. LocalizationProtocolPBClientImpl RPC failing Key: YARN-1038 URL: https://issues.apache.org/jira/browse/YARN-1038 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0 Reporter: Alejandro Abdelnur Priority: Blocker Trying to run an MR job in trunk is failing with: {code} 2013-08-06 22:24:21,498 WARN org.apache.hadoop.ipc.Client: interrupted waiting to send rpc request to server java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1279) at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1019) at org.apache.hadoop.ipc.Client.call(Client.java:1372) at org.apache.hadoop.ipc.Client.call(Client.java:1352) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy25.heartbeat(Unknown Source) at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:250) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:164) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:107) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:977) {code} -- This message was sent by Atlassian
JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1506: - Attachment: YARN-1506-v4.patch Some new changes coming to the trunk make v3 patch stale. Re-sync to v4 patch. Replace set resource change on RMNode/SchedulerNode directly with event notification. - Key: YARN-1506 URL: https://issues.apache.org/jira/browse/YARN-1506 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, scheduler Reporter: Junping Du Assignee: Junping Du Priority: Blocker Attachments: YARN-1506-v1.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, YARN-1506-v4.patch According to Vinod's comments on YARN-312 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087), we should replace RMNode.setResourceOption() with some resource change event. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861102#comment-13861102 ] Hadoop QA commented on YARN-1506: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621217/YARN-1506-v4.patch against trunk revision . {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2782//console This message is automatically generated. Replace set resource change on RMNode/SchedulerNode directly with event notification. - Key: YARN-1506 URL: https://issues.apache.org/jira/browse/YARN-1506 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, scheduler Reporter: Junping Du Assignee: Junping Du Priority: Blocker Attachments: YARN-1506-v1.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, YARN-1506-v4.patch According to Vinod's comments on YARN-312 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087), we should replace RMNode.setResourceOption() with some resource change event. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1431) TestWebAppProxyServlet is failing on trunk
[ https://issues.apache.org/jira/browse/YARN-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861107#comment-13861107 ] haosdent commented on YARN-1431: On my machine, this test passes. The error log from Surefire looks normal because of this code. {code:java} URL wrongUrl = new URL("http://localhost:9099/proxy/app"); {code} TestWebAppProxyServlet is failing on trunk -- Key: YARN-1431 URL: https://issues.apache.org/jira/browse/YARN-1431 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.3.0 Reporter: Omkar Vinit Joshi Priority: Blocker Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 6.609 sec FAILURE! - in org.apache.hadoop.yarn.server.webproxy.TestWebAppProxyServlet testWebAppProxyServerMainMethod(org.apache.hadoop.yarn.server.webproxy.TestWebAppProxyServlet) Time elapsed: 5.006 sec ERROR! java.lang.Exception: test timed out after 5000 milliseconds at java.net.Inet4AddressImpl.getHostByAddr(Native Method) at java.net.InetAddress$1.getHostByAddr(InetAddress.java:881) at java.net.InetAddress.getHostFromNameService(InetAddress.java:560) at java.net.InetAddress.getCanonicalHostName(InetAddress.java:531) at org.apache.hadoop.security.SecurityUtil.getLocalHostName(SecurityUtil.java:227) at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:247) at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServer.doSecureLogin(WebAppProxyServer.java:72) at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServer.serviceInit(WebAppProxyServer.java:57) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServer.startServer(WebAppProxyServer.java:99) at org.apache.hadoop.yarn.server.webproxy.TestWebAppProxyServlet.testWebAppProxyServerMainMethod(TestWebAppProxyServlet.java:187) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861108#comment-13861108 ] Hadoop QA commented on YARN-1490: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621206/YARN-1490.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 15 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.client.api.impl.TestNMClient org.apache.hadoop.yarn.client.api.impl.TestAMRMClient {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2781//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2781//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2781//console This message is automatically generated. 
RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-1490.1.patch This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers; some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (YARN-1431) TestWebAppProxyServlet is failing on trunk
[ https://issues.apache.org/jira/browse/YARN-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent reassigned YARN-1431: -- Assignee: haosdent TestWebAppProxyServlet is failing on trunk -- Key: YARN-1431 URL: https://issues.apache.org/jira/browse/YARN-1431 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.3.0 Reporter: Omkar Vinit Joshi Assignee: haosdent Priority: Blocker Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 6.609 sec FAILURE! - in org.apache.hadoop.yarn.server.webproxy.TestWebAppProxyServlet testWebAppProxyServerMainMethod(org.apache.hadoop.yarn.server.webproxy.TestWebAppProxyServlet) Time elapsed: 5.006 sec ERROR! java.lang.Exception: test timed out after 5000 milliseconds at java.net.Inet4AddressImpl.getHostByAddr(Native Method) at java.net.InetAddress$1.getHostByAddr(InetAddress.java:881) at java.net.InetAddress.getHostFromNameService(InetAddress.java:560) at java.net.InetAddress.getCanonicalHostName(InetAddress.java:531) at org.apache.hadoop.security.SecurityUtil.getLocalHostName(SecurityUtil.java:227) at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:247) at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServer.doSecureLogin(WebAppProxyServer.java:72) at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServer.serviceInit(WebAppProxyServer.java:57) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServer.startServer(WebAppProxyServer.java:99) at org.apache.hadoop.yarn.server.webproxy.TestWebAppProxyServlet.testWebAppProxyServerMainMethod(TestWebAppProxyServlet.java:187) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1431) TestWebAppProxyServlet is failing on trunk
[ https://issues.apache.org/jira/browse/YARN-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861113#comment-13861113 ] haosdent commented on YARN-1431: The real cause of the timeout exception is this code; it may depend on your machine. {code:java} mainServer = WebAppProxyServer.startServer(conf); {code} {code} java.lang.Exception: test timed out after 5000 milliseconds at java.net.Inet4AddressImpl.getHostByAddr(Native Method) {code} TestWebAppProxyServlet is failing on trunk -- Key: YARN-1431 URL: https://issues.apache.org/jira/browse/YARN-1431 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.3.0 Reporter: Omkar Vinit Joshi Priority: Blocker Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 6.609 sec FAILURE! - in org.apache.hadoop.yarn.server.webproxy.TestWebAppProxyServlet testWebAppProxyServerMainMethod(org.apache.hadoop.yarn.server.webproxy.TestWebAppProxyServlet) Time elapsed: 5.006 sec ERROR! java.lang.Exception: test timed out after 5000 milliseconds at java.net.Inet4AddressImpl.getHostByAddr(Native Method) at java.net.InetAddress$1.getHostByAddr(InetAddress.java:881) at java.net.InetAddress.getHostFromNameService(InetAddress.java:560) at java.net.InetAddress.getCanonicalHostName(InetAddress.java:531) at org.apache.hadoop.security.SecurityUtil.getLocalHostName(SecurityUtil.java:227) at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:247) at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServer.doSecureLogin(WebAppProxyServer.java:72) at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServer.serviceInit(WebAppProxyServer.java:57) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServer.startServer(WebAppProxyServer.java:99) at org.apache.hadoop.yarn.server.webproxy.TestWebAppProxyServlet.testWebAppProxyServerMainMethod(TestWebAppProxyServlet.java:187) -- This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Updated] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated YARN-1529: Attachment: YARN-1529.v02.patch Moved YARN-changes from MAPREDUCE-5696 Add Localization overhead metrics to NM --- Key: YARN-1529 URL: https://issues.apache.org/jira/browse/YARN-1529 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1529.v01.patch, YARN-1529.v02.patch Users are often unaware of localization cost that their jobs incur. To measure effectiveness of localization caches it is necessary to expose the overhead in the form of metrics. We propose addition of the following metrics to NodeManagerMetrics. When a container is about to launch, its set of LocalResources has to be fetched from a central location, typically on HDFS, that results in a number of download requests for the files missing in caches. LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache misses. LocalizedFilesCached: total localization requests that were served from local caches. Cache hits. LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses. LocalizedBytesCached: total bytes satisfied from local caches. Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that were served out of cache: ratio = 100 * caches / (caches + misses) LocalizationDownloadNanos: total elapsed time in nanoseconds for a container to go from ResourceRequestTransition to LocalizedTransition -- This message was sent by Atlassian JIRA (v6.1.5#6160)
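The cached-ratio formula in the YARN-1529 proposal (ratio = 100 * caches / (caches + misses)) can be sketched as a tiny helper. This is an illustration, not NodeManager code; the class name and the zero-total guard are assumptions about how the metric would behave before any localization has happened.

```java
// Illustrative computation of the proposed Localized(Files|Bytes)CachedRatio:
// an integer percentage of localization requests (or bytes) served from the
// local caches rather than downloaded from DFS.
final class LocalizationRatio {
    static long cachedRatio(long cached, long missed) {
        long total = cached + missed;
        // Guard against division by zero before any localization has occurred.
        return total == 0 ? 0 : (100 * cached) / total;
    }
}
```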
[jira] [Updated] (YARN-304) RM Tracking Links for purged applications needs a long-term solution
[ https://issues.apache.org/jira/browse/YARN-304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-304: - Issue Type: Sub-task (was: Improvement) Parent: YARN-321 RM Tracking Links for purged applications needs a long-term solution Key: YARN-304 URL: https://issues.apache.org/jira/browse/YARN-304 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.5 Reporter: Derek Dagit This JIRA is intended to track a proper long-term fix for the issue described in YARN-285. The following is from the original description: As applications complete, the RM tracks their IDs in a completed list. This list is routinely truncated to limit the total number of applications remembered by the RM. When a user clicks the History link for a job, the browser is redirected to the application's tracking link obtained from the stored application instance. But when the application has been purged from the RM, an error is displayed. In very busy clusters the rate at which applications complete can cause applications to be purged from the RM's internal list within hours, which breaks the proxy URLs users have saved for their jobs. We would like the RM to provide valid tracking links that persist so that users are not frustrated by broken links. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1482) WebApplicationProxy should be always-on w.r.t HA even if it is embedded in the RM
[ https://issues.apache.org/jira/browse/YARN-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1482: Attachment: YARN-1482.4.patch WebApplicationProxy should be always-on w.r.t HA even if it is embedded in the RM - Key: YARN-1482 URL: https://issues.apache.org/jira/browse/YARN-1482 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Attachments: YARN-1482.1.patch, YARN-1482.2.patch, YARN-1482.3.patch, YARN-1482.4.patch This way, even if an RM goes to standby mode, we can affect a redirect to the active. And more importantly, users will not suddenly see all their links stop working. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1482) WebApplicationProxy should be always-on w.r.t HA even if it is embedded in the RM
[ https://issues.apache.org/jira/browse/YARN-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861150#comment-13861150 ] Xuan Gong commented on YARN-1482: - Fixed the -1 on release audit. The test case failure is unrelated; will open a ticket to track it. WebApplicationProxy should be always-on w.r.t HA even if it is embedded in the RM - Key: YARN-1482 URL: https://issues.apache.org/jira/browse/YARN-1482 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Attachments: YARN-1482.1.patch, YARN-1482.2.patch, YARN-1482.3.patch, YARN-1482.4.patch This way, even if an RM goes to standby mode, we can affect a redirect to the active. And more importantly, users will not suddenly see all their links stop working. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags
[ https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861156#comment-13861156 ] Vinod Kumar Vavilapalli commented on YARN-1399: --- Agree that oozie should NOT kill apps as a privileged user in any case. bq. I am open to enforcing specifying either a user/queue when searching for a tag. However, in principle, this could happen with application-types as well: a user could submit a number of random YARN applications with type MAPREDUCE. I thought the way we were restricting exposure of these (tags/types) was through ACLs on a secure cluster. Agreed about application-types too, but two wrongs don't make a ... Still, I see your reasoning. Solving this faking of application-types is hard. I think that it boils down to:
- Documenting very clearly that types and tags can clash with other users' inputs and so need to be used judiciously.
- Having options when listing apps filtered by application-type or by application-tags (an app-type is a specific kind of tag, so the latter clearly subsumes the former).
The important bit is that the default option be as restrictive as possible. Today the getApplications() API returns the list of ALL applications, whether accessible or not. That's a bad default that I thought I had filed a ticket about. By default we should only return the apps of the current user. Then there should be options to list all accessible apps (apps with a view-acl), and then finally all apps across all users. We could follow something similar for this JIRA and set it as a precedent for fixing the other existing issues with the default listing and the listing against app-types? Maybe we should punt on the last one (listing ALL apps) altogether so that users can only obtain lists of apps that they have access to.
This does break the existing getApplications() API which we can leave as is - in any case, information about all the apps that are not accessible is completely blacked out, so it just serves as a simple listing. Allow users to annotate an application with multiple tags - Key: YARN-1399 URL: https://issues.apache.org/jira/browse/YARN-1399 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Assignee: Zhijie Shen Nowadays, when submitting an application, users can fill the applicationType field to facilitate searching it later. IMHO, it's good to accept multiple tags to allow users to describe their applications in multiple aspects, including the application type. Then, searching by tags may be more efficient for users to reach their desired application collection. It's pretty much like the tag system of online photo/video/music and etc. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
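The listing defaults discussed in the comment above can be sketched in plain Java. This is a hedged illustration of the proposed semantics, not the real getApplications() API: App, Scope, and listApps below are hypothetical names introduced only for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch of restrictive-by-default listing: OWN returns only the caller's
// apps; VIEWABLE additionally returns apps whose view-ACL includes the
// caller. An unrestricted "all apps" mode is deliberately absent.
class App {
    final String owner;
    final Set<String> viewAcl; // users allowed to view this app
    App(String owner, Set<String> viewAcl) {
        this.owner = owner;
        this.viewAcl = viewAcl;
    }
}

enum Scope { OWN, VIEWABLE }

final class AppListing {
    static List<App> listApps(List<App> all, String caller, Scope scope) {
        List<App> result = new ArrayList<>();
        for (App a : all) {
            boolean own = a.owner.equals(caller);
            if (own || (scope == Scope.VIEWABLE && a.viewAcl.contains(caller))) {
                result.add(a);
            }
        }
        return result;
    }
}
```

The design choice mirrors the comment: the default scope returns the least data, and each wider scope must be asked for explicitly.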
[jira] [Updated] (YARN-1413) [YARN-321] AHS WebUI should server aggregated logs as well
[ https://issues.apache.org/jira/browse/YARN-1413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1413: -- Attachment: YARN-1413-6.patch The original patch looked good. Updating the same patch without the extraneous pom.xml changes. [YARN-321] AHS WebUI should server aggregated logs as well -- Key: YARN-1413 URL: https://issues.apache.org/jira/browse/YARN-1413 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Mayank Bansal Attachments: YARN-1413-1.patch, YARN-1413-2.patch, YARN-1413-3.patch, YARN-1413-4.patch, YARN-1413-5.patch, YARN-1413-6.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (YARN-1413) [YARN-321] AHS WebUI should server aggregated logs as well
[ https://issues.apache.org/jira/browse/YARN-1413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-1413. --- Resolution: Fixed Fix Version/s: YARN-321 Hadoop Flags: Reviewed Committed this to YARN-321 branch. Thanks Mayank! It'll be great to have tests for this and more. Will identify areas missing test-coverage and file tickets. [YARN-321] AHS WebUI should server aggregated logs as well -- Key: YARN-1413 URL: https://issues.apache.org/jira/browse/YARN-1413 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Mayank Bansal Fix For: YARN-321 Attachments: YARN-1413-1.patch, YARN-1413-2.patch, YARN-1413-3.patch, YARN-1413-4.patch, YARN-1413-5.patch, YARN-1413-6.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1551) Allow user-specified reason for killApplication
[ https://issues.apache.org/jira/browse/YARN-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861163#comment-13861163 ] Hadoop QA commented on YARN-1551: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621207/YARN-1551.v03.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.client.api.impl.TestYarnClient The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2780//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2780//console This message is automatically generated. 
Allow user-specified reason for killApplication --- Key: YARN-1551 URL: https://issues.apache.org/jira/browse/YARN-1551 Project: Hadoop YARN Issue Type: Improvement Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1551.v01.patch, YARN-1551.v02.patch, YARN-1551.v03.patch This completes MAPREDUCE-5648 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861167#comment-13861167 ] Hadoop QA commented on YARN-1529: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621228/YARN-1529.v02.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2783//console This message is automatically generated. Add Localization overhead metrics to NM --- Key: YARN-1529 URL: https://issues.apache.org/jira/browse/YARN-1529 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1529.v01.patch, YARN-1529.v02.patch Users are often unaware of localization cost that their jobs incur. To measure effectiveness of localization caches it is necessary to expose the overhead in the form of metrics. We propose addition of the following metrics to NodeManagerMetrics. When a container is about to launch, its set of LocalResources has to be fetched from a central location, typically on HDFS, that results in a number of download requests for the files missing in caches. LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache misses. LocalizedFilesCached: total localization requests that were served from local caches. Cache hits. LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses. LocalizedBytesCached: total bytes satisfied from local caches. Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that were served out of cache: ratio = 100 * caches / (caches + misses) LocalizationDownloadNanos: total elapsed time in nanoseconds for a container to go from ResourceRequestTransition to LocalizedTransition -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1555) [YARN-321] Failing tests in org.apache.hadoop.yarn.server.applicationhistoryservice.*
Vinod Kumar Vavilapalli created YARN-1555: - Summary: [YARN-321] Failing tests in org.apache.hadoop.yarn.server.applicationhistoryservice.* Key: YARN-1555 URL: https://issues.apache.org/jira/browse/YARN-1555 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Several tests are failing on the latest YARN-321 branch. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1482) WebApplicationProxy should be always-on w.r.t HA even if it is embedded in the RM
[ https://issues.apache.org/jira/browse/YARN-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861171#comment-13861171 ] Hadoop QA commented on YARN-1482: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621230/YARN-1482.4.patch against trunk revision . {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2785//console This message is automatically generated. WebApplicationProxy should be always-on w.r.t HA even if it is embedded in the RM - Key: YARN-1482 URL: https://issues.apache.org/jira/browse/YARN-1482 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Attachments: YARN-1482.1.patch, YARN-1482.2.patch, YARN-1482.3.patch, YARN-1482.4.patch This way, even if an RM goes to standby mode, we can effect a redirect to the active. And more importantly, users will not suddenly see all their links stop working. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1555) [YARN-321] Failing tests in org.apache.hadoop.yarn.server.applicationhistoryservice.*
[ https://issues.apache.org/jira/browse/YARN-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1555: -- Attachment: YARN-1555-20140102.txt Simple fixes to the three failing tests. [YARN-321] Failing tests in org.apache.hadoop.yarn.server.applicationhistoryservice.* - Key: YARN-1555 URL: https://issues.apache.org/jira/browse/YARN-1555 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: YARN-1555-20140102.txt Several tests are failing on the latest YARN-321 branch. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1555) [YARN-321] Failing tests in org.apache.hadoop.yarn.server.applicationhistoryservice.*
[ https://issues.apache.org/jira/browse/YARN-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861178#comment-13861178 ] Hadoop QA commented on YARN-1555: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621244/YARN-1555-20140102.txt against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2786//console This message is automatically generated. [YARN-321] Failing tests in org.apache.hadoop.yarn.server.applicationhistoryservice.* - Key: YARN-1555 URL: https://issues.apache.org/jira/browse/YARN-1555 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: YARN-1555-20140102.txt Several tests are failing on the latest YARN-321 branch. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861186#comment-13861186 ] Hadoop QA commented on YARN-1506: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621217/YARN-1506-v4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2784//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2784//console This message is automatically generated. Replace set resource change on RMNode/SchedulerNode directly with event notification. 
- Key: YARN-1506 URL: https://issues.apache.org/jira/browse/YARN-1506 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, scheduler Reporter: Junping Du Assignee: Junping Du Priority: Blocker Attachments: YARN-1506-v1.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, YARN-1506-v4.patch According to Vinod's comments on YARN-312 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087), we should replace RMNode.setResourceOption() with some resource change event. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1551) Allow user-specified reason for killApplication
[ https://issues.apache.org/jira/browse/YARN-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861188#comment-13861188 ] Gera Shegalov commented on YARN-1551: - Test failure is unrelated. TestYarnClient.testAMMRTokens fails on trunk without patch as well. Allow user-specified reason for killApplication --- Key: YARN-1551 URL: https://issues.apache.org/jira/browse/YARN-1551 Project: Hadoop YARN Issue Type: Improvement Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1551.v01.patch, YARN-1551.v02.patch, YARN-1551.v03.patch This completes MAPREDUCE-5648 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated YARN-1529: Attachment: (was: YARN-1529.v02.patch) Add Localization overhead metrics to NM --- Key: YARN-1529 URL: https://issues.apache.org/jira/browse/YARN-1529 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1529.v01.patch, YARN-1529.v02.patch Users are often unaware of localization cost that their jobs incur. To measure effectiveness of localization caches it is necessary to expose the overhead in the form of metrics. We propose addition of the following metrics to NodeManagerMetrics. When a container is about to launch, its set of LocalResources has to be fetched from a central location, typically on HDFS, that results in a number of download requests for the files missing in caches. LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache misses. LocalizedFilesCached: total localization requests that were served from local caches. Cache hits. LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses. LocalizedBytesCached: total bytes satisfied from local caches. Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that were served out of cache: ratio = 100 * caches / (caches + misses) LocalizationDownloadNanos: total elapsed time in nanoseconds for a container to go from ResourceRequestTransition to LocalizedTransition -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated YARN-1529: Attachment: YARN-1529.v02.patch Add Localization overhead metrics to NM --- Key: YARN-1529 URL: https://issues.apache.org/jira/browse/YARN-1529 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1529.v01.patch, YARN-1529.v02.patch Users are often unaware of localization cost that their jobs incur. To measure effectiveness of localization caches it is necessary to expose the overhead in the form of metrics. We propose addition of the following metrics to NodeManagerMetrics. When a container is about to launch, its set of LocalResources has to be fetched from a central location, typically on HDFS, that results in a number of download requests for the files missing in caches. LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache misses. LocalizedFilesCached: total localization requests that were served from local caches. Cache hits. LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses. LocalizedBytesCached: total bytes satisfied from local caches. Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that were served out of cache: ratio = 100 * caches / (caches + misses) LocalizationDownloadNanos: total elapsed time in nanoseconds for a container to go from ResourceRequestTransition to LocalizedTransition -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1398) Deadlock in capacity scheduler leaf queue and parent queue for getQueueInfo and completedConatiner call
[ https://issues.apache.org/jira/browse/YARN-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861194#comment-13861194 ] Sunil G commented on YARN-1398: --- As per YARN-325, this issue was fixed before 2.1.0. But in 2.1.0 we can still see ParentQueue.completedContainer being called while holding a lock on the LeafQueue, which can cause the same issue mentioned in YARN-325. Is there any reason why the ParentQueue.completedContainer call was added back while holding the leaf queue lock? As per the YARN-325 fix, the change was precisely to remove this call, and this is mentioned in its comments too. Deadlock in capacity scheduler leaf queue and parent queue for getQueueInfo and completedConatiner call --- Key: YARN-1398 URL: https://issues.apache.org/jira/browse/YARN-1398 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.2.0 Reporter: Sunil G Priority: Critical getQueueInfo in ParentQueue will call child.getQueueInfo(), which tries to acquire the leaf queue lock while holding the parent queue lock. If, at the same time, a completedContainer call has acquired the LeafQueue lock, it will wait for ParentQueue's completedContainer call. This lock acquisition order is inconsistent and can lead to deadlock. With JCarder, this shows up as a potential deadlock scenario. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861211#comment-13861211 ] Hadoop QA commented on YARN-1529: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621247/YARN-1529.v02.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 5 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2787//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2787//console This message is automatically generated. Add Localization overhead metrics to NM --- Key: YARN-1529 URL: https://issues.apache.org/jira/browse/YARN-1529 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1529.v01.patch, YARN-1529.v02.patch Users are often unaware of localization cost that their jobs incur. 
To measure the effectiveness of localization caches, it is necessary to expose the overhead in the form of metrics. We propose adding the following metrics to NodeManagerMetrics. When a container is about to launch, its set of LocalResources has to be fetched from a central location, typically on HDFS, which results in a number of download requests for the files missing from caches. LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache misses. LocalizedFilesCached: total localization requests that were served from local caches. Cache hits. LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses. LocalizedBytesCached: total bytes satisfied from local caches. Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that were served out of cache: ratio = 100 * cached / (cached + missed). LocalizationDownloadNanos: total elapsed time in nanoseconds for a container to go from ResourceRequestTransition to LocalizedTransition -- This message was sent by Atlassian JIRA (v6.1.5#6160)
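The proposed counters and the cached-ratio formula above can be sketched minimally as follows. Plain long fields stand in for Hadoop's metrics classes; the field names follow the proposal, but the class and method names are illustrative only.

```java
// Minimal sketch of the proposed NodeManager localization metrics and the
// cached-ratio formula from the description:
//   ratio = 100 * cached / (cached + missed)
// Plain longs stand in for Hadoop's mutable counter types; this class is
// an assumption-laden illustration, not the actual NodeManagerMetrics.
public class LocalizationMetricsSketch {
    long localizedFilesMissed;   // files downloaded from DFS (cache misses)
    long localizedFilesCached;   // requests served from local caches (hits)
    long localizedBytesMissed;   // bytes downloaded due to cache misses
    long localizedBytesCached;   // bytes satisfied from local caches

    void recordMiss(long bytes) { localizedFilesMissed++; localizedBytesMissed += bytes; }
    void recordHit(long bytes)  { localizedFilesCached++; localizedBytesCached += bytes; }

    // LocalizedFilesCachedRatio: percentage of requests served from cache
    double filesCachedRatio() {
        long total = localizedFilesCached + localizedFilesMissed;
        return total == 0 ? 0.0 : 100.0 * localizedFilesCached / total;
    }

    // LocalizedBytesCachedRatio: percentage of bytes served from cache
    double bytesCachedRatio() {
        long total = localizedBytesCached + localizedBytesMissed;
        return total == 0 ? 0.0 : 100.0 * localizedBytesCached / total;
    }
}
```

Guarding against a zero total keeps the ratio well defined before any localization has happened, which a real metrics implementation would also need to handle.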
[jira] [Commented] (YARN-1398) Deadlock in capacity scheduler leaf queue and parent queue for getQueueInfo and completedConatiner call
[ https://issues.apache.org/jira/browse/YARN-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861236#comment-13861236 ] Sunil G commented on YARN-1398: --- During the YARN-569 defect fix for adding a scheduling policy, the code segment below was added back inside the LeafQueue lock section:
// Inform the parent queue
getParent().completedContainer(clusterResource, application, node, rmContainer, null, event, this);
Please let us know whether this call is really required inside the synchronized block of LeafQueue's completedContainer call. Deadlock in capacity scheduler leaf queue and parent queue for getQueueInfo and completedConatiner call --- Key: YARN-1398 URL: https://issues.apache.org/jira/browse/YARN-1398 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.2.0 Reporter: Sunil G Priority: Critical getQueueInfo in ParentQueue will call child.getQueueInfo(), which tries to acquire the leaf queue lock while holding the parent queue lock. If, at the same time, a completedContainer call has acquired the LeafQueue lock, it will wait for ParentQueue's completedContainer call. This lock acquisition order is inconsistent and can lead to deadlock. With JCarder, this shows up as a potential deadlock scenario. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
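The lock-order inversion described in YARN-1398 above can be illustrated with a small sketch. This is not the CapacityScheduler code: explicit ReentrantLocks stand in for the synchronized methods, and the "fix" shown is the YARN-325-style shape where the parent is informed only after the leaf lock has been released, so no thread ever holds both locks in leaf-then-parent order.

```java
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch of the deadlock discussed above. getQueueInfo nests
// parent -> leaf; the deadlock-prone completedContainer would nest
// leaf -> parent, inverting that order. Here completedContainer instead
// releases the leaf lock before touching the parent, which removes the
// inversion. Not the actual CapacityScheduler implementation.
public class QueueLockSketch {
    private final ReentrantLock parentLock = new ReentrantLock();
    private final ReentrantLock leafLock = new ReentrantLock();

    public void getQueueInfo() {
        parentLock.lock();            // parent first ...
        try {
            leafLock.lock();          // ... then leaf (parent -> leaf order)
            try { /* read child queue info */ } finally { leafLock.unlock(); }
        } finally { parentLock.unlock(); }
    }

    // Deadlock-prone variant (kept as a comment only): taking parentLock
    // while still inside leafLock inverts the order used by getQueueInfo.
    public void completedContainer() {
        leafLock.lock();
        try { /* update leaf queue book-keeping */ } finally { leafLock.unlock(); }
        parentLock.lock();            // parent informed outside the leaf lock
        try { /* propagate completion to the parent queue */ } finally { parentLock.unlock(); }
    }
}
```

With the leaf lock released before the parent call, two threads interleaving getQueueInfo and completedContainer can no longer block each other in a cycle.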
[jira] [Updated] (YARN-1482) WebApplicationProxy should be always-on w.r.t HA even if it is embedded in the RM
[ https://issues.apache.org/jira/browse/YARN-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1482: Attachment: YARN-1482.4.patch WebApplicationProxy should be always-on w.r.t HA even if it is embedded in the RM - Key: YARN-1482 URL: https://issues.apache.org/jira/browse/YARN-1482 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Attachments: YARN-1482.1.patch, YARN-1482.2.patch, YARN-1482.3.patch, YARN-1482.4.patch, YARN-1482.4.patch This way, even if an RM goes to standby mode, we can effect a redirect to the active. And more importantly, users will not suddenly see all their links stop working. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1166) YARN 'appsFailed' metric should be of type 'counter'
[ https://issues.apache.org/jira/browse/YARN-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1166: -- Attachment: YARN-1166.6.patch Given YARN-1493 separates app and app-attempt events, I created a new patch which not only changes 'appsFailed' from MutableGaugeInt to MutableCounterInt, but also binds the change of each counter to the correct event:
1. SubmitApp: increment appsSubmitted
2. SubmitAppAttempt: increment appsPending
3. RunAppAttempt: decrement appsPending, increment appsRunning
4. FinishAppAttempt: decrement appsRunning
5. FinishApp: increment appsCompleted/appsKilled/appsFailed
1, 2 and 5 are bound to the app-level events and only ever increase, while 3 and 4 are bound to the app-attempt events and can both increase and decrease. YARN 'appsFailed' metric should be of type 'counter' Key: YARN-1166 URL: https://issues.apache.org/jira/browse/YARN-1166 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-1166.2.patch, YARN-1166.3.patch, YARN-1166.4.patch, YARN-1166.5.patch, YARN-1166.6.patch, YARN-1166.patch Currently in YARN's queue metrics, the cumulative metric 'appsFailed' is of type 'gauge' - which means the exact value will be reported. All other cumulative queue metrics (AppsSubmitted, AppsCompleted, AppsKilled) are all of type 'counter' - meaning Ganglia will use slope to provide deltas between time-points. To be consistent, AppsFailed metric should also be of type 'counter'. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
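The five event bindings listed in the comment above can be sketched with plain int fields standing in for Hadoop's MutableCounterInt (monotonic, counter semantics) and MutableGaugeInt (can rise and fall, gauge semantics). The class and method names are illustrative, not the QueueMetrics API.

```java
// Sketch of the event bindings from the YARN-1166 comment above. Counters
// (app-level events 1, 2 and 5) only ever increase; gauges (attempt-level
// events 3 and 4) rise and fall. Plain ints stand in for Hadoop's
// MutableCounterInt / MutableGaugeInt; this is an illustration only.
public class QueueMetricsSketch {
    // counter-like: bound to app events, monotonically increasing
    int appsSubmitted, appsCompleted, appsKilled, appsFailed;
    // gauge-like: bound to app-attempt events, can increase and decrease
    int appsPending, appsRunning;

    void submitApp()        { appsSubmitted++; }               // 1. SubmitApp
    void submitAppAttempt() { appsPending++; }                 // 2. SubmitAppAttempt
    void runAppAttempt()    { appsPending--; appsRunning++; }  // 3. RunAppAttempt
    void finishAppAttempt() { appsRunning--; }                 // 4. FinishAppAttempt
    void finishApp(String finalState) {                        // 5. FinishApp
        switch (finalState) {
            case "FAILED": appsFailed++; break;
            case "KILLED": appsKilled++; break;
            default:       appsCompleted++; break;
        }
    }
}
```

Because appsFailed is only ever incremented (never reset or decremented), counter semantics are the right fit: a monitoring system like Ganglia can derive deltas from its slope, which is exactly the consistency the issue asks for.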