[jira] [Commented] (YARN-1166) YARN 'appsFailed' metric should be of type 'counter'
[ https://issues.apache.org/jira/browse/YARN-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861324#comment-13861324 ] Hadoop QA commented on YARN-1166: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621270/YARN-1166.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2789//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2789//console This message is automatically generated. YARN 'appsFailed' metric should be of type 'counter' Key: YARN-1166 URL: https://issues.apache.org/jira/browse/YARN-1166 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-1166.2.patch, YARN-1166.3.patch, YARN-1166.4.patch, YARN-1166.5.patch, YARN-1166.6.patch, YARN-1166.patch Currently in YARN's queue metrics, the cumulative metric 'appsFailed' is of type 'gauge' - which means the exact value will be reported. All other cumulative queue metrics (AppsSubmitted, AppsCompleted, AppsKilled) are of type 'counter' - meaning Ganglia will use slope to provide deltas between time-points. To be consistent, the AppsFailed metric should also be of type 'counter'. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
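For readers less familiar with the Hadoop metrics2 library, the counter/gauge distinction is carried by the mutable metric type a field is declared with. A minimal illustrative sketch of what moving appsFailed from a gauge to a counter amounts to (the class below is hypothetical, not the actual YARN-1166 patch):
{code}
// Hypothetical sketch, not the YARN-1166 patch itself. In metrics2, counters and
// gauges are distinct mutable metric types; Ganglia graphs counters as slopes/deltas.
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableCounterInt;
import org.apache.hadoop.metrics2.lib.MutableGaugeInt;

@Metrics(context = "yarn")
class ExampleQueueMetrics {
  // A counter, as the other cumulative app metrics already are:
  @Metric("# of apps submitted") MutableCounterInt appsSubmitted;
  // Before the change: a gauge, so the exact value is reported each interval.
  @Metric("# of apps failed (gauge)") MutableGaugeInt appsFailedAsGauge;
  // After the change: a counter, consistent with appsSubmitted/Completed/Killed.
  @Metric("# of apps failed (counter)") MutableCounterInt appsFailedAsCounter;

  // Fields are instantiated when the source is registered with the metrics system,
  // e.g. DefaultMetricsSystem.instance().register(...); after that:
  void failApp() {
    appsFailedAsCounter.incr();
  }
}
{code}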
[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861341#comment-13861341 ] Junping Du commented on YARN-1506: -- Looks like the JVM crashed while running the test. Kicking off the Jenkins test again. Replace set resource change on RMNode/SchedulerNode directly with event notification. - Key: YARN-1506 URL: https://issues.apache.org/jira/browse/YARN-1506 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, scheduler Reporter: Junping Du Assignee: Junping Du Priority: Blocker Attachments: YARN-1506-v1.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, YARN-1506-v4.patch According to Vinod's comments on YARN-312 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087), we should replace RMNode.setResourceOption() with some resource change event. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
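As a rough illustration of the event-notification approach being discussed, a resource change could be delivered to the node through the RM dispatcher rather than by calling RMNode.setResourceOption() directly. The event class and enum value below are assumed names for the sketch, not necessarily what the attached patch introduces:
{code}
// Illustrative sketch only; RMNodeResourceUpdateEvent and RESOURCE_UPDATE are
// assumed names, not necessarily those of the YARN-1506 patch.
import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.server.api.records.ResourceOption;
import org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEvent;
import org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType;

class RMNodeResourceUpdateEvent extends RMNodeEvent {
  private final ResourceOption resourceOption;

  RMNodeResourceUpdateEvent(NodeId nodeId, ResourceOption resourceOption) {
    super(nodeId, RMNodeEventType.RESOURCE_UPDATE); // assumed event type
    this.resourceOption = resourceOption;
  }

  ResourceOption getResourceOption() {
    return resourceOption;
  }
}

// The caller (e.g. AdminService) would then dispatch instead of mutating the node:
//   rmContext.getDispatcher().getEventHandler()
//       .handle(new RMNodeResourceUpdateEvent(nodeId, resourceOption));
// and RMNodeImpl would handle RESOURCE_UPDATE in its state machine, notifying the
// scheduler about the new capacity in turn.
{code}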
[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861362#comment-13861362 ] Hadoop QA commented on YARN-1506: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621217/YARN-1506-v4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2790//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2790//console This message is automatically generated. Replace set resource change on RMNode/SchedulerNode directly with event notification. - Key: YARN-1506 URL: https://issues.apache.org/jira/browse/YARN-1506 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, scheduler Reporter: Junping Du Assignee: Junping Du Priority: Blocker Attachments: YARN-1506-v1.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, YARN-1506-v4.patch According to Vinod's comments on YARN-312 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087), we should replace RMNode.setResourceOption() with some resource change event. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated YARN-1529: Attachment: YARN-1529.v03.patch addressing javadoc warning Add Localization overhead metrics to NM --- Key: YARN-1529 URL: https://issues.apache.org/jira/browse/YARN-1529 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1529.v01.patch, YARN-1529.v02.patch, YARN-1529.v03.patch Users are often unaware of the localization cost that their jobs incur. To measure the effectiveness of localization caches, it is necessary to expose the overhead in the form of metrics. We propose addition of the following metrics to NodeManagerMetrics. When a container is about to launch, its set of LocalResources has to be fetched from a central location, typically on HDFS, which results in a number of download requests for the files missing in caches. LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache misses. LocalizedFilesCached: total localization requests that were served from local caches. Cache hits. LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses. LocalizedBytesCached: total bytes satisfied from local caches. Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that were served out of cache: ratio = 100 * caches / (caches + misses) LocalizationDownloadNanos: total elapsed time in nanoseconds for a container to go from ResourceRequestTransition to LocalizedTransition -- This message was sent by Atlassian JIRA (v6.1.5#6160)
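As a rough sketch of how such counters typically look in the metrics2 style used by NodeManagerMetrics, including the cached-ratio computation from the formula above (illustrative only; the real field names and types are whatever the attached patch defines):
{code}
// Illustrative sketch; names follow the proposal above, not necessarily the patch.
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;

@Metrics(about = "Localization metrics sketch", context = "yarn")
class LocalizationMetricsSketch {
  @Metric("files downloaded from DFS (cache misses)") MutableCounterLong localizedFilesMissed;
  @Metric("requests served from local caches (cache hits)") MutableCounterLong localizedFilesCached;
  @Metric("bytes downloaded from DFS due to cache misses") MutableCounterLong localizedBytesMissed;
  @Metric("bytes satisfied from local caches") MutableCounterLong localizedBytesCached;
  @Metric("nanoseconds spent downloading resources") MutableCounterLong localizationDownloadNanos;

  // ratio = 100 * caches / (caches + misses), per the description above
  @Metric("percentage of localized files served out of cache")
  public int getLocalizedFilesCachedRatio() {
    long hits = localizedFilesCached.value();
    long total = hits + localizedFilesMissed.value();
    return total == 0 ? 0 : (int) (100 * hits / total);
  }
}
{code}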
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861403#comment-13861403 ] Hadoop QA commented on YARN-1529: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621292/YARN-1529.v03.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2791//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2791//console This message is automatically generated. Add Localization overhead metrics to NM --- Key: YARN-1529 URL: https://issues.apache.org/jira/browse/YARN-1529 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1529.v01.patch, YARN-1529.v02.patch, YARN-1529.v03.patch Users are often unaware of localization cost that their jobs incur. To measure effectiveness of localization caches it is necessary to expose the overhead in the form of metrics. We propose addition of the following metrics to NodeManagerMetrics. When a container is about to launch, its set of LocalResources has to be fetched from a central location, typically on HDFS, that results in a number of download requests for the files missing in caches. LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache misses. LocalizedFilesCached: total localization requests that were served from local caches. Cache hits. LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses. LocalizedBytesCached: total bytes satisfied from local caches. Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that were served out of cache: ratio = 100 * caches / (caches + misses) LocalizationDownloadNanos: total elapsed time in nanoseconds for a container to go from ResourceRequestTransition to LocalizedTransition -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1493) Schedulers don't recognize apps separately from app-attempts
[ https://issues.apache.org/jira/browse/YARN-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861428#comment-13861428 ] Hudson commented on YARN-1493: -- FAILURE: Integrated in Hadoop-Yarn-trunk #441 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/441/]) YARN-1493. Changed ResourceManager and Scheduler interfacing to recognize app-attempts separately from apps. Contributed by Jian He. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1554896) * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSCapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/event/RMAppAttemptRejectedEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ActiveUsersManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerAppReport.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerAppUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplication.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/AppAddedSchedulerEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/AppAttemptAddedSchedulerEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/AppRemovedSchedulerEvent.java *
[jira] [Commented] (YARN-1549) TestUnmanagedAMLauncher#testDSShell fails in trunk
[ https://issues.apache.org/jira/browse/YARN-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861426#comment-13861426 ] Hudson commented on YARN-1549: -- FAILURE: Integrated in Hadoop-Yarn-trunk #441 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/441/]) YARN-1549. Fixed a bug in ResourceManager's ApplicationMasterService that was causing unamanged AMs to not finish correctly. Contributed by haosdent. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1554886) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/TestUnmanagedAMLauncher.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java TestUnmanagedAMLauncher#testDSShell fails in trunk -- Key: YARN-1549 URL: https://issues.apache.org/jira/browse/YARN-1549 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.2.0 Reporter: Ted Yu Assignee: haosdent Fix For: 2.4.0 Attachments: YARN-1549.1.patch, YARN-1549.patch The following error is reproducible: {code} testDSShell(org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher) Time elapsed: 14.911 sec ERROR! java.lang.RuntimeException: Failed to receive final expected state in ApplicationReport, CurrentState=RUNNING, ExpectedStates=FINISHED,FAILED,KILLED at org.apache.hadoop.yarn.applications.unmanagedamlauncher.UnmanagedAMLauncher.monitorApplication(UnmanagedAMLauncher.java:447) at org.apache.hadoop.yarn.applications.unmanagedamlauncher.UnmanagedAMLauncher.run(UnmanagedAMLauncher.java:352) at org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher.testDSShell(TestUnmanagedAMLauncher.java:147) {code} See https://builds.apache.org/job/Hadoop-Yarn-trunk/435 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
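For context, the failing assertion comes from a monitoring loop of the following general shape: poll the ApplicationReport until the application reaches a terminal state, or give up. This is only an illustrative sketch using the public YarnClient API, not UnmanagedAMLauncher's own code:
{code}
// Illustrative sketch of a report-polling loop; not UnmanagedAMLauncher itself.
import java.util.EnumSet;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;

final class AppCompletionMonitor {
  private static final EnumSet<YarnApplicationState> TERMINAL_STATES = EnumSet.of(
      YarnApplicationState.FINISHED, YarnApplicationState.FAILED, YarnApplicationState.KILLED);

  static YarnApplicationState waitForCompletion(YarnClient client, ApplicationId appId,
      long timeoutMs) throws Exception {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      ApplicationReport report = client.getApplicationReport(appId);
      YarnApplicationState state = report.getYarnApplicationState();
      if (TERMINAL_STATES.contains(state)) {
        return state;
      }
      Thread.sleep(1000); // poll again; the test fails when RUNNING never progresses
    }
    throw new RuntimeException("Failed to receive final expected state in ApplicationReport");
  }
}
{code}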
[jira] [Commented] (YARN-1549) TestUnmanagedAMLauncher#testDSShell fails in trunk
[ https://issues.apache.org/jira/browse/YARN-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861511#comment-13861511 ] Hudson commented on YARN-1549: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1633 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1633/]) YARN-1549. Fixed a bug in ResourceManager's ApplicationMasterService that was causing unamanged AMs to not finish correctly. Contributed by haosdent. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1554886) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/TestUnmanagedAMLauncher.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java TestUnmanagedAMLauncher#testDSShell fails in trunk -- Key: YARN-1549 URL: https://issues.apache.org/jira/browse/YARN-1549 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.2.0 Reporter: Ted Yu Assignee: haosdent Fix For: 2.4.0 Attachments: YARN-1549.1.patch, YARN-1549.patch The following error is reproducible: {code} testDSShell(org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher) Time elapsed: 14.911 sec ERROR! java.lang.RuntimeException: Failed to receive final expected state in ApplicationReport, CurrentState=RUNNING, ExpectedStates=FINISHED,FAILED,KILLED at org.apache.hadoop.yarn.applications.unmanagedamlauncher.UnmanagedAMLauncher.monitorApplication(UnmanagedAMLauncher.java:447) at org.apache.hadoop.yarn.applications.unmanagedamlauncher.UnmanagedAMLauncher.run(UnmanagedAMLauncher.java:352) at org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher.testDSShell(TestUnmanagedAMLauncher.java:147) {code} See https://builds.apache.org/job/Hadoop-Yarn-trunk/435 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1493) Schedulers don't recognize apps separately from app-attempts
[ https://issues.apache.org/jira/browse/YARN-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861513#comment-13861513 ] Hudson commented on YARN-1493: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1633 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1633/]) YARN-1493. Changed ResourceManager and Scheduler interfacing to recognize app-attempts separately from apps. Contributed by Jian He. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1554896) * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSCapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/event/RMAppAttemptRejectedEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ActiveUsersManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerAppReport.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerAppUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplication.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/AppAddedSchedulerEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/AppAttemptAddedSchedulerEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/AppRemovedSchedulerEvent.java *
[jira] [Commented] (YARN-1549) TestUnmanagedAMLauncher#testDSShell fails in trunk
[ https://issues.apache.org/jira/browse/YARN-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861556#comment-13861556 ] Hudson commented on YARN-1549: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1658 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1658/]) YARN-1549. Fixed a bug in ResourceManager's ApplicationMasterService that was causing unamanged AMs to not finish correctly. Contributed by haosdent. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1554886) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/TestUnmanagedAMLauncher.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java TestUnmanagedAMLauncher#testDSShell fails in trunk -- Key: YARN-1549 URL: https://issues.apache.org/jira/browse/YARN-1549 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.2.0 Reporter: Ted Yu Assignee: haosdent Fix For: 2.4.0 Attachments: YARN-1549.1.patch, YARN-1549.patch The following error is reproducible: {code} testDSShell(org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher) Time elapsed: 14.911 sec ERROR! java.lang.RuntimeException: Failed to receive final expected state in ApplicationReport, CurrentState=RUNNING, ExpectedStates=FINISHED,FAILED,KILLED at org.apache.hadoop.yarn.applications.unmanagedamlauncher.UnmanagedAMLauncher.monitorApplication(UnmanagedAMLauncher.java:447) at org.apache.hadoop.yarn.applications.unmanagedamlauncher.UnmanagedAMLauncher.run(UnmanagedAMLauncher.java:352) at org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher.testDSShell(TestUnmanagedAMLauncher.java:147) {code} See https://builds.apache.org/job/Hadoop-Yarn-trunk/435 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1493) Schedulers don't recognize apps separately from app-attempts
[ https://issues.apache.org/jira/browse/YARN-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861558#comment-13861558 ] Hudson commented on YARN-1493: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1658 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1658/]) YARN-1493. Changed ResourceManager and Scheduler interfacing to recognize app-attempts separately from apps. Contributed by Jian He. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1554896) * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSCapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/event/RMAppAttemptRejectedEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ActiveUsersManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerAppReport.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerAppUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplication.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/AppAddedSchedulerEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/AppAttemptAddedSchedulerEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/AppRemovedSchedulerEvent.java *
[jira] [Created] (YARN-1556) NPE getting application report with a null appId
Steve Loughran created YARN-1556: Summary: NPE getting application report with a null appId Key: YARN-1556 URL: https://issues.apache.org/jira/browse/YARN-1556 Project: Hadoop YARN Issue Type: Bug Reporter: Steve Loughran Priority: Trivial If you accidentally pass in a null appId to get application report, you get an NPE back. This is arguably as intended, except that maybe a guard statement could report this in such a way as to make it easy for callers to track down the cause. {code} java.lang.NullPointerException: java.lang.NullPointerException org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333) at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:243) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:120) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:241) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042) at org.apache.hadoop.ipc.Client.call(Client.java:1347) at org.apache.hadoop.ipc.Client.call(Client.java:1300) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy75.getApplicationReport(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137) ... 28 more {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
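A guard of the kind suggested could look roughly like the following. This is a hedged sketch, not the eventual fix; whether to throw ApplicationNotFoundException or a plain IllegalArgumentException is an open choice:
{code}
// Hedged sketch of a fail-fast guard for a null ApplicationId; not the committed fix.
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;

final class ApplicationIdGuard {
  static ApplicationId checkApplicationId(ApplicationId applicationId)
      throws ApplicationNotFoundException {
    if (applicationId == null) {
      // Report the problem explicitly instead of letting the null reach the
      // ConcurrentHashMap lookup in ClientRMService and surface as a bare NPE.
      throw new ApplicationNotFoundException(
          "Invalid ApplicationId: null. Pass a valid application id when requesting a report.");
    }
    return applicationId;
  }
}
{code}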
[jira] [Created] (YARN-1557) TestYarnClient#testAMMRTokens fails in trunk
Xuan Gong created YARN-1557: --- Summary: TestYarnClient#testAMMRTokens fails in trunk Key: YARN-1557 URL: https://issues.apache.org/jira/browse/YARN-1557 Project: Hadoop YARN Issue Type: Test Reporter: Xuan Gong -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1482) WebApplicationProxy should be always-on w.r.t HA even if it is embedded in the RM
[ https://issues.apache.org/jira/browse/YARN-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861678#comment-13861678 ] Xuan Gong commented on YARN-1482: - The testcase failure can be tracked in https://issues.apache.org/jira/browse/YARN-1557 WebApplicationProxy should be always-on w.r.t HA even if it is embedded in the RM - Key: YARN-1482 URL: https://issues.apache.org/jira/browse/YARN-1482 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Attachments: YARN-1482.1.patch, YARN-1482.2.patch, YARN-1482.3.patch, YARN-1482.4.patch, YARN-1482.4.patch This way, even if an RM goes to standby mode, we can affect a redirect to the active. And more importantly, users will not suddenly see all their links stop working. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1029) Allow embedding leader election into the RM
[ https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861713#comment-13861713 ] Bikas Saha commented on YARN-1029: -- Patch looks good to me, although the flakiness of the new test needs to be monitored. One option would be to walk through the test in a debugger to satisfy yourself that things are indeed happening the way they should. Let's commit the patch and move on to the next items. I think this patch may have partially covered some of the work of the ZKFC jira. We can address further comments as they come. Allow embedding leader election into the RM --- Key: YARN-1029 URL: https://issues.apache.org/jira/browse/YARN-1029 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: embedded-zkfc-approach.patch, yarn-1029-0.patch, yarn-1029-0.patch, yarn-1029-1.patch, yarn-1029-2.patch, yarn-1029-3.patch, yarn-1029-4.patch, yarn-1029-5.patch, yarn-1029-6.patch, yarn-1029-7.patch, yarn-1029-7.patch, yarn-1029-8.patch, yarn-1029-9.patch, yarn-1029-approach.patch It should be possible to embed the common ActiveStandbyElector into the RM such that ZooKeeper-based leader election and notification is built in. In conjunction with a ZK state store, this configuration will be a simple deployment option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1029) Allow embedding leader election into the RM
[ https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861716#comment-13861716 ] Bikas Saha commented on YARN-1029: -- Thanks for your patience through the review. These things are pretty subtle and the more time we spent making it simple and thinking through stuff the better later on. Although I am sure we will be surprised by real life later on :P Allow embedding leader election into the RM --- Key: YARN-1029 URL: https://issues.apache.org/jira/browse/YARN-1029 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: embedded-zkfc-approach.patch, yarn-1029-0.patch, yarn-1029-0.patch, yarn-1029-1.patch, yarn-1029-2.patch, yarn-1029-3.patch, yarn-1029-4.patch, yarn-1029-5.patch, yarn-1029-6.patch, yarn-1029-7.patch, yarn-1029-7.patch, yarn-1029-8.patch, yarn-1029-9.patch, yarn-1029-approach.patch It should be possible to embed common ActiveStandyElector into the RM such that ZooKeeper based leader election and notification is in-built. In conjunction with a ZK state store, this configuration will be a simple deployment option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861731#comment-13861731 ] Xuan Gong commented on YARN-1410: - [~bikassaha], [~kkambatl]: any further comments? Handle client failover during 2 step client API's like app submission - Key: YARN-1410 URL: https://issues.apache.org/jira/browse/YARN-1410 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1410.1.patch App submission involves 1) creating an appId and 2) using that appId to submit an ApplicationSubmissionContext to the RM. The client may have obtained an appId from an RM, the RM may have failed over, and the client may submit the app to the new RM. Since the new RM has a different notion of cluster timestamp (used to create the app id), the new RM may reject the app submission, resulting in an unexpected failure on the client side. The same may happen for other 2 step client API operations. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
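For readers unfamiliar with the two steps being referred to, here they are using the public YarnClient API (illustration only; the failover handling itself is what this JIRA is about and is not shown):
{code}
// The two client-side steps referred to above, shown for illustration.
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;

final class TwoStepSubmission {
  static ApplicationId submit(YarnClient client) throws Exception {
    // Step 1: obtain an ApplicationId; it encodes the issuing RM's cluster timestamp.
    YarnClientApplication app = client.createApplication();
    ApplicationSubmissionContext context = app.getApplicationSubmissionContext();
    // ... fill in AM container spec, resources, queue, etc. ...

    // Step 2: submit. If the RM failed over between steps, the new RM sees an appId
    // minted with a different cluster timestamp and may reject the submission.
    return client.submitApplication(context);
  }
}
{code}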
[jira] [Commented] (YARN-1557) TestYarnClient#testAMMRTokens fails in trunk
[ https://issues.apache.org/jira/browse/YARN-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861776#comment-13861776 ] Jian He commented on YARN-1557: --- This is caused by YARN-1493 and will be fixed in YARN-1490 TestYarnClient#testAMMRTokens fails in trunk Key: YARN-1557 URL: https://issues.apache.org/jira/browse/YARN-1557 Project: Hadoop YARN Issue Type: Test Reporter: Xuan Gong -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1029) Allow embedding leader election into the RM
[ https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861777#comment-13861777 ] Vinod Kumar Vavilapalli commented on YARN-1029: --- bq. Vinod Kumar Vavilapalli - did you get a chance to look at the latest patch? Looking at it right now.. Allow embedding leader election into the RM --- Key: YARN-1029 URL: https://issues.apache.org/jira/browse/YARN-1029 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: embedded-zkfc-approach.patch, yarn-1029-0.patch, yarn-1029-0.patch, yarn-1029-1.patch, yarn-1029-2.patch, yarn-1029-3.patch, yarn-1029-4.patch, yarn-1029-5.patch, yarn-1029-6.patch, yarn-1029-7.patch, yarn-1029-7.patch, yarn-1029-8.patch, yarn-1029-9.patch, yarn-1029-approach.patch It should be possible to embed common ActiveStandyElector into the RM such that ZooKeeper based leader election and notification is in-built. In conjunction with a ZK state store, this configuration will be a simple deployment option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1557) TestYarnClient#testAMMRTokens fails in trunk
[ https://issues.apache.org/jira/browse/YARN-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861780#comment-13861780 ] Jian He commented on YARN-1557: --- btw, this is a test issue, not a core code issue. TestYarnClient#testAMMRTokens fails in trunk Key: YARN-1557 URL: https://issues.apache.org/jira/browse/YARN-1557 Project: Hadoop YARN Issue Type: Test Reporter: Xuan Gong -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1490: -- Attachment: YARN-1490.2.patch RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-1490.1.patch, YARN-1490.2.patch This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers, some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
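From the submitting application's point of view, making this optional presumably means a per-app flag on the ApplicationSubmissionContext. A hedged sketch follows; the flag name is an assumption for illustration, and the actual knob is whatever this patch defines:
{code}
// Hedged sketch; setKeepContainersAcrossApplicationAttempts is an assumed flag name.
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;

final class WorkPreservingSubmitSketch {
  static void submitKeepingContainers(YarnClient client) throws Exception {
    YarnClientApplication app = client.createApplication();
    ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
    // Ask the RM not to kill running containers when the AM exits, so a restarted
    // AM attempt can re-register and reconnect to them.
    ctx.setKeepContainersAcrossApplicationAttempts(true);
    client.submitApplication(ctx);
  }
}
{code}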
[jira] [Commented] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861850#comment-13861850 ] Hadoop QA commented on YARN-1490: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621367/YARN-1490.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 15 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2792//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2792//console This message is automatically generated. RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-1490.1.patch, YARN-1490.2.patch This is needed to enable work-preserving AM restart. Some apps can chose to reconnect with old running containers, some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1495) Allow moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861864#comment-13861864 ] Sandy Ryza commented on YARN-1495: -- Good point, Bikas. Filed YARN-1558 for this. Allow moving apps between queues Key: YARN-1495 URL: https://issues.apache.org/jira/browse/YARN-1495 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza This is an umbrella JIRA for work needed to allow moving YARN applications from one queue to another. The work will consist of additions in the command line options, additions in the client RM protocol, and changes in the schedulers to support this. I have a picture of how this should function in the Fair Scheduler, but I'm not familiar enough with the Capacity Scheduler to say the same there. Ultimately, the decision of whether an application can be moved should go down to the scheduler - some schedulers may wish not to support this at all. However, schedulers that do support it should share some common semantics around ACLs and what happens to running containers. Here is how I see the general semantics working out: * A move request is issued by the client. After it gets past ACLs, the scheduler checks whether executing the move will violate any constraints. For the Fair Scheduler, these would be queue maxRunningApps and queue maxResources constraints * All running containers are transferred from the old queue to the new queue * All outstanding requests are transferred from the old queue to the new queue Here is how I see the ACLs of this working out: * To move an app from a queue a user must have modify access on the app or administer access on the queue * To move an app to a queue a user must have submit access on the queue or administer access on the queue -- This message was sent by Atlassian JIRA (v6.1.5#6160)
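A hedged sketch of the flow described above, with purely illustrative names standing in for whatever the individual schedulers end up exposing:
{code}
// Illustrative sketch of the move semantics above; all names are stand-ins.
final class MoveAppSketch {

  static void moveApplication(SchedulerView scheduler, String user,
      String appId, String targetQueue) throws Exception {
    // 1. ACLs: modify-app or administer-queue on the source; submit or administer on the target.
    if (!scheduler.checkMoveAccess(user, appId, targetQueue)) {
      throw new SecurityException("User " + user + " may not move " + appId);
    }
    // 2. Constraints: e.g. Fair Scheduler maxRunningApps / maxResources on the target queue.
    scheduler.checkMoveConstraints(appId, targetQueue);
    // 3. Transfer accounting from the old queue to the new one.
    scheduler.transferRunningContainers(appId, targetQueue);
    scheduler.transferOutstandingRequests(appId, targetQueue);
  }

  /** Hypothetical facade over whatever a concrete scheduler exposes. */
  interface SchedulerView {
    boolean checkMoveAccess(String user, String appId, String targetQueue);
    void checkMoveConstraints(String appId, String targetQueue) throws Exception;
    void transferRunningContainers(String appId, String targetQueue);
    void transferOutstandingRequests(String appId, String targetQueue);
  }
}
{code}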
[jira] [Created] (YARN-1558) Persist app queue changes in the RM state store
Sandy Ryza created YARN-1558: Summary: Persist app queue changes in the RM state store Key: YARN-1558 URL: https://issues.apache.org/jira/browse/YARN-1558 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sandy Ryza -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1490: -- Attachment: YARN-1490.3.patch RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-1490.1.patch, YARN-1490.2.patch, YARN-1490.3.patch This is needed to enable work-preserving AM restart. Some apps can chose to reconnect with old running containers, some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861906#comment-13861906 ] Hadoop QA commented on YARN-1490: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621381/YARN-1490.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 16 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2793//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2793//console This message is automatically generated. RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-1490.1.patch, YARN-1490.2.patch, YARN-1490.3.patch This is needed to enable work-preserving AM restart. Some apps can chose to reconnect with old running containers, some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1496) Protocol additions to allow moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861921#comment-13861921 ] Sandy Ryza commented on YARN-1496: -- Uploading a polished patch Protocol additions to allow moving apps between queues -- Key: YARN-1496 URL: https://issues.apache.org/jira/browse/YARN-1496 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1496.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1496) Protocol additions to allow moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1496: - Attachment: YARN-1496-1.patch Protocol additions to allow moving apps between queues -- Key: YARN-1496 URL: https://issues.apache.org/jira/browse/YARN-1496 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1496-1.patch, YARN-1496.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1496) Protocol additions to allow moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861935#comment-13861935 ] Hadoop QA commented on YARN-1496: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621388/YARN-1496-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2794//console This message is automatically generated. Protocol additions to allow moving apps between queues -- Key: YARN-1496 URL: https://issues.apache.org/jira/browse/YARN-1496 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1496-1.patch, YARN-1496.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1496) Protocol additions to allow moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1496: - Attachment: YARN-1496-2.patch Protocol additions to allow moving apps between queues -- Key: YARN-1496 URL: https://issues.apache.org/jira/browse/YARN-1496 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1496-1.patch, YARN-1496-2.patch, YARN-1496.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1496) Protocol additions to allow moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861949#comment-13861949 ] Sandy Ryza commented on YARN-1496: -- Fixing compilation issue Protocol additions to allow moving apps between queues -- Key: YARN-1496 URL: https://issues.apache.org/jira/browse/YARN-1496 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1496-1.patch, YARN-1496-2.patch, YARN-1496.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861951#comment-13861951 ] Jian He commented on YARN-1490: --- sounds better, thanks! RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-1490.1.patch, YARN-1490.2.patch, YARN-1490.3.patch This is needed to enable work-preserving AM restart. Some apps can chose to reconnect with old running containers, some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1136) Replace junit.framework.Assert with org.junit.Assert
[ https://issues.apache.org/jira/browse/YARN-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861957#comment-13861957 ] Hadoop QA commented on YARN-1136: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12620886/yarn1136.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2796//console This message is automatically generated. Replace junit.framework.Assert with org.junit.Assert Key: YARN-1136 URL: https://issues.apache.org/jira/browse/YARN-1136 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Chen He Labels: newbie, test Attachments: yarn1136.patch There are several places where we are using junit.framework.Assert instead of org.junit.Assert. {code}grep -rn junit.framework.Assert hadoop-yarn-project/ --include=*.java{code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
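The mechanical change per test file is just the import swap; an illustrative example:
{code}
// Before: import junit.framework.Assert;   (JUnit 3 vintage, deprecated)
// After:
import org.junit.Assert;
import org.junit.Test;

public class TestAssertImportExample {
  @Test
  public void testSomething() {
    Assert.assertEquals("values should match", 42, 40 + 2);
  }
}
{code}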
[jira] [Commented] (YARN-1136) Replace junit.framework.Assert with org.junit.Assert
[ https://issues.apache.org/jira/browse/YARN-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861967#comment-13861967 ] Chen He commented on YARN-1136: --- The console output link (https://builds.apache.org/job/PreCommit-YARN-Build/2796//console) returns "Service Temporarily Unavailable". Replace junit.framework.Assert with org.junit.Assert Key: YARN-1136 URL: https://issues.apache.org/jira/browse/YARN-1136 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Chen He Labels: newbie, test Attachments: yarn1136.patch There are several places where we are using junit.framework.Assert instead of org.junit.Assert. {code}grep -rn junit.framework.Assert hadoop-yarn-project/ --include=*.java{code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1555) [YARN-321] Failing tests in org.apache.hadoop.yarn.server.applicationhistoryservice.*
[ https://issues.apache.org/jira/browse/YARN-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861971#comment-13861971 ] Mayank Bansal commented on YARN-1555: - +1 Committing Thanks, Mayank [YARN-321] Failing tests in org.apache.hadoop.yarn.server.applicationhistoryservice.* - Key: YARN-1555 URL: https://issues.apache.org/jira/browse/YARN-1555 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: YARN-1555-20140102.txt Several tests are failing on the latest YARN-321 branch. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1555) [YARN-321] Failing tests in org.apache.hadoop.yarn.server.applicationhistoryservice.*
[ https://issues.apache.org/jira/browse/YARN-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861989#comment-13861989 ] Mayank Bansal commented on YARN-1555: - Committed to YARN-321 branch. Thanks [~vinodkv] [YARN-321] Failing tests in org.apache.hadoop.yarn.server.applicationhistoryservice.* - Key: YARN-1555 URL: https://issues.apache.org/jira/browse/YARN-1555 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: YARN-1555-20140102.txt Several tests are failing on the latest YARN-321 branch. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1029) Allow embedding leader election into the RM
[ https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861990#comment-13861990 ] Vinod Kumar Vavilapalli commented on YARN-1029: --- Some comments/questions on the last patch: - yarn_server_resourcemanager_service_protos.proto: RMActiveNodeInfoProto -> ActiveRMInfoProto? - yarn-default.xml: “This kind of failover is embedded in the RM and does not explicitly fence stores.” - “does not” or “does”? - I think we should force admins to set yarn.resourcemanager.cluster-id explicitly (only in case HA is enabled for now). Defaults don’t tend to be changed and a default cluster-id can potentially cause hard-to-debug issues. - No need for YarnBadConfigurationException. It isn’t adding any value and is inconsistent with how we tackle misconfigs everywhere. Let’s just use YarnRuntimeException. - Why is ZK added to hadoop-yarn-client module? It should be only in server-common? - RMFatalEventType.EMBEDDED_ELECTOR -> EMBEDDED_ELECTOR_FAILED or something like that? Similarly STORE_FENCED to STATE_STORE_FENCED and STORE_OP_FAILED to STATE_STORE_OP_FAILED for making it explicit. - EmbeddedElectorService - Initialized in AdminService? It can be initialized in the ResourceManager class itself and it can access AdminService via RMContext. - It can similarly access rmDispatcher from RMContext. Testing - We should have one test that switches off the automatic failover. Maybe retain the old testExplicitFailover test in TestRMFailover? - TestRMHA.testTransitionsWhenAutomaticFailoverEnabled: After each transition, check the state? Allow embedding leader election into the RM --- Key: YARN-1029 URL: https://issues.apache.org/jira/browse/YARN-1029 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: embedded-zkfc-approach.patch, yarn-1029-0.patch, yarn-1029-0.patch, yarn-1029-1.patch, yarn-1029-2.patch, yarn-1029-3.patch, yarn-1029-4.patch, yarn-1029-5.patch, yarn-1029-6.patch, yarn-1029-7.patch, yarn-1029-7.patch, yarn-1029-8.patch, yarn-1029-9.patch, yarn-1029-approach.patch It should be possible to embed common ActiveStandyElector into the RM such that ZooKeeper based leader election and notification is in-built. In conjunction with a ZK state store, this configuration will be a simple deployment option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1029) Allow embedding leader election into the RM
[ https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13862013#comment-13862013 ] Karthik Kambatla commented on YARN-1029: bq. Initialized in AdminService? It can be initialize in ResourceManager class itself and it can access AdminService via RMContext. We initially had it in the RM, but thought AdminService is a better place. http://tinyurl.com/qdo2vos bq. Why is ZK added to hadoop-yarn-client module? It should be only in server-common? TestRMFailover needs it. bq. yarn-default.xml: This kind of failover is embedded in the RM and does not explicitly fence stores.” - “does not” or “does”? The elector doesn't explicitly fence (as in the way HDFS does), it is implicit and the store is supposed to ensure a single RM can modify it at any point in time. bq. I think we should force admins to set yarn.resourcemanager.cluster-id explicitly (only in case HA is enabled for now). Defaults don’t tend to be changed and a default cluster-id can potentially cause hard-to-debug issues. I am okay either way, but I think the fewer configs we *force* admins to set the better. If there is a single cluster, it should be perfectly okay to just use the default. No? Will address remaining suggestions. Allow embedding leader election into the RM --- Key: YARN-1029 URL: https://issues.apache.org/jira/browse/YARN-1029 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: embedded-zkfc-approach.patch, yarn-1029-0.patch, yarn-1029-0.patch, yarn-1029-1.patch, yarn-1029-2.patch, yarn-1029-3.patch, yarn-1029-4.patch, yarn-1029-5.patch, yarn-1029-6.patch, yarn-1029-7.patch, yarn-1029-7.patch, yarn-1029-8.patch, yarn-1029-9.patch, yarn-1029-approach.patch It should be possible to embed common ActiveStandyElector into the RM such that ZooKeeper based leader election and notification is in-built. In conjunction with a ZK state store, this configuration will be a simple deployment option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
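To make the wiring under discussion concrete, here is a hedged, purely illustrative sketch (hypothetical names, not the EmbeddedElectorService from the attached patches): the elector's leadership callbacks drive the RM's HA transitions through the AdminService that is reachable from the RMContext, which is what the RMContext-related review comments are about.
{code}
// Hedged sketch only: a hypothetical embedded elector callback, not the class
// in the attached patches. It illustrates the wiring being discussed.
public class EmbeddedElectorSketch {

  /** Stand-in for the subset of AdminService used here (hypothetical). */
  interface HATransitioner {
    void transitionToActive() throws Exception;
    void transitionToStandby() throws Exception;
  }

  private final HATransitioner admin;

  EmbeddedElectorSketch(HATransitioner adminFromRMContext) {
    // Per the review comments, the AdminService (and the dispatcher) would be
    // looked up via RMContext rather than being handed to the elector directly.
    this.admin = adminFromRMContext;
  }

  // Called by the leader elector when this RM wins the ZooKeeper election.
  void becomeActive() throws Exception {
    admin.transitionToActive();
  }

  // Called when leadership is lost or the ZK session lapses.
  void becomeStandby() throws Exception {
    admin.transitionToStandby();
  }
}
{code}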
[jira] [Commented] (YARN-1496) Protocol additions to allow moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13862051#comment-13862051 ] Hadoop QA commented on YARN-1496: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621392/YARN-1496-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2795//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2795//console This message is automatically generated. Protocol additions to allow moving apps between queues -- Key: YARN-1496 URL: https://issues.apache.org/jira/browse/YARN-1496 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1496-1.patch, YARN-1496-2.patch, YARN-1496.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1461) RM API and RM changes to handle tags for running jobs
[ https://issues.apache.org/jira/browse/YARN-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13862099#comment-13862099 ] Zhijie Shen commented on YARN-1461: --- Thanks Karthik for the patch. In addition to the discussion in YARN-1399, here're some comments on the patch. 1. How about making the two constants configurable? {code} + @InterfaceStability.Evolving + public static final int MAX_TAGS = 10; + @InterfaceStability.Evolving + public static final int MAX_TAG_LENGTH = 25; {code} 2. Should ApplicationSubmissionContext#newInstance have String[] tags as well? Same for ApplicationReport and GetApplicationsRequest. Or did you leave it out on purpose for the sake of compatibility? If so, I'm just feeling we're going to have more newInstance methods that cannot cover all the fields the objects should have. 3. Should we consider both case-sensitive and -insensitive matching, and both AND and OR logic? {code} + if (tags != null && !tags.isEmpty()) { +   Set<String> appTags = application.getTags(); +   if (appTags == null || appTags.isEmpty()) { +     continue; +   } +   boolean match = false; +   for (String tag : tags) { +     if (appTags.contains(tag)) { +       match = true; +       break; +     } +   } +   if (!match) { +     continue; +   } + } {code} 4. IMHO, one useful web UI addition is to list the top tags (or a tag cloud) on the side bar. When one tag is clicked, the applications with this tag are shown on the page. Anyway, we can deal with the new UI in a separate ticket. RM API and RM changes to handle tags for running jobs - Key: YARN-1461 URL: https://issues.apache.org/jira/browse/YARN-1461 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1461-1.patch, yarn-1461-2.patch, yarn-1461-3.patch, yarn-1461-4.patch, yarn-1461-5.patch, yarn-1461-6.patch, yarn-1461-6.patch, yarn-1461-7.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
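On point 3, a case-insensitive OR match only needs both sides lower-cased before the containment check; a standalone sketch under that assumption (not the patch code):
{code}
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class TagMatchSketch {
  /** True if any requested tag matches an application tag, ignoring case (OR semantics). */
  static boolean matchesAny(Set<String> requestedTags, Set<String> appTags) {
    if (requestedTags == null || requestedTags.isEmpty()) {
      return true; // no tag filter requested
    }
    if (appTags == null || appTags.isEmpty()) {
      return false;
    }
    Set<String> lowered = new HashSet<String>();
    for (String tag : appTags) {
      lowered.add(tag.toLowerCase(Locale.ENGLISH));
    }
    for (String tag : requestedTags) {
      if (lowered.contains(tag.toLowerCase(Locale.ENGLISH))) {
        return true;
      }
    }
    return false;
  }
}
{code}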
[jira] [Updated] (YARN-1453) [JDK8] Fix Javadoc errors caused by incorrect or illegal tags in doc comments
[ https://issues.apache.org/jira/browse/YARN-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated YARN-1453: - Attachment: 1453-trunk.patch 1453-branch-2.patch Updated patches refreshed to latest trunk and branch-2. [JDK8] Fix Javadoc errors caused by incorrect or illegal tags in doc comments - Key: YARN-1453 URL: https://issues.apache.org/jira/browse/YARN-1453 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Andrew Purtell Priority: Minor Attachments: 1453-branch-2.patch, 1453-branch-2.patch, 1453-trunk.patch, 1453-trunk.patch Javadoc is more strict by default in JDK8 and will error out on malformed or illegal tags found in doc comments. Although tagged as JDK8 all of the required changes are generic Javadoc cleanups. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
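For context on the kind of cleanup these patches contain (illustrative only, not lifted from them): JDK8's doclint turns previously tolerated HTML problems in doc comments into errors, so unescaped angle brackets and ampersands have to be escaped or wrapped in {@code ...}.
{code}
// Before (JDK8 doclint error, e.g. "malformed HTML"):
//   /** Returns pairs of <name, value> for each queue & its parent. */
//
// After: escape the characters or wrap them in {@code ...}.
/** Returns pairs of {@code <name, value>} for each queue &amp; its parent. */
public final class JavadocCleanupExample {
  private JavadocCleanupExample() { }
}
{code}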
[jira] [Commented] (YARN-1029) Allow embedding leader election into the RM
[ https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13862110#comment-13862110 ] Vinod Kumar Vavilapalli commented on YARN-1029: --- bq. We initially had it in the RM, but thought AdminService is a better place. http://tinyurl.com/qdo2vos Sure. It's not a big deal either way. Let's leave it the way you had in the latest patch. But the comment about using fields from RMContext holds. bq. TestRMFailover needs it. Hm.. then let's put it in server-common as a compile-time dependency and specifically in hadoop-yarn-client as a test-dependency. Okay? bq. The elector doesn't explicitly fence [...] Maybe state that somehow? It did confuse me a little. bq. I am okay either way, but I think the fewer configs we force admins to set the better. If there is a single cluster, it should be perfectly okay to just use the default. No? Yeah, thought about it. But it seemed to me that the problem of debugging bad issues with conflicting cluster-ids is worse than the little convenience the default value is bringing. Allow embedding leader election into the RM --- Key: YARN-1029 URL: https://issues.apache.org/jira/browse/YARN-1029 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: embedded-zkfc-approach.patch, yarn-1029-0.patch, yarn-1029-0.patch, yarn-1029-1.patch, yarn-1029-2.patch, yarn-1029-3.patch, yarn-1029-4.patch, yarn-1029-5.patch, yarn-1029-6.patch, yarn-1029-7.patch, yarn-1029-7.patch, yarn-1029-8.patch, yarn-1029-9.patch, yarn-1029-approach.patch It should be possible to embed common ActiveStandyElector into the RM such that ZooKeeper based leader election and notification is in-built. In conjunction with a ZK state store, this configuration will be a simple deployment option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1029) Allow embedding leader election into the RM
[ https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13862111#comment-13862111 ] Vinod Kumar Vavilapalli commented on YARN-1029: --- Oh, and apologies for the delayed review, holidays and all. And tx for being patient too. I hope to commit this over this week-end or as soon as you can make it available. Tx. Allow embedding leader election into the RM --- Key: YARN-1029 URL: https://issues.apache.org/jira/browse/YARN-1029 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: embedded-zkfc-approach.patch, yarn-1029-0.patch, yarn-1029-0.patch, yarn-1029-1.patch, yarn-1029-2.patch, yarn-1029-3.patch, yarn-1029-4.patch, yarn-1029-5.patch, yarn-1029-6.patch, yarn-1029-7.patch, yarn-1029-7.patch, yarn-1029-8.patch, yarn-1029-9.patch, yarn-1029-approach.patch It should be possible to embed common ActiveStandyElector into the RM such that ZooKeeper based leader election and notification is in-built. In conjunction with a ZK state store, this configuration will be a simple deployment option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1555) [YARN-321] Failing tests in org.apache.hadoop.yarn.server.applicationhistoryservice.*
[ https://issues.apache.org/jira/browse/YARN-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1555: -- Fix Version/s: YARN-321 Hadoop Flags: Reviewed Tx Mayank! You have to set the reviewed-flag and the fix-version during the commit. Setting this one myself for now. [YARN-321] Failing tests in org.apache.hadoop.yarn.server.applicationhistoryservice.* - Key: YARN-1555 URL: https://issues.apache.org/jira/browse/YARN-1555 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Fix For: YARN-321 Attachments: YARN-1555-20140102.txt Several tests are failing on the latest YARN-321 branch. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1559) Race between ServerRMProxy and ClientRMProxy setting RMProxy#INSTANCE
Karthik Kambatla created YARN-1559: -- Summary: Race between ServerRMProxy and ClientRMProxy setting RMProxy#INSTANCE Key: YARN-1559 URL: https://issues.apache.org/jira/browse/YARN-1559 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Blocker RMProxy#INSTANCE is a non-final static field and both ServerRMProxy and ClientRMProxy set it. This leads to races as witnessed on - YARN-1482. Sample trace: {noformat} java.lang.IllegalArgumentException: RM does not support this client protocol at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) at org.apache.hadoop.yarn.client.ClientRMProxy.checkAllowedProtocols(ClientRMProxy.java:119) at org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider.init(ConfiguredRMFailoverProxyProvider.java:58) at org.apache.hadoop.yarn.client.RMProxy.createRMFailoverProxyProvider(RMProxy.java:158) at org.apache.hadoop.yarn.client.RMProxy.createRMProxy(RMProxy.java:88) at org.apache.hadoop.yarn.server.api.ServerRMProxy.createRMProxy(ServerRMProxy.java:56) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
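A hedged, standalone illustration of the hazard being reported (names are hypothetical, not the RMProxy code): when two unrelated classes both assign one mutable static field, the last writer wins and the other caller can end up talking to the wrong proxy; giving each side its own final instance removes the race.
{code}
public class StaticInstanceRace {

  // Problematic pattern: one mutable static shared by two unrelated callers.
  static class SharedProxy {
    static SharedProxy INSTANCE; // last writer wins; readers may see the wrong one
  }

  // Safer pattern: each caller owns its own final instance, so there is
  // nothing to race on.
  static class ClientSideProxy {
    private static final ClientSideProxy INSTANCE = new ClientSideProxy();
    static ClientSideProxy get() { return INSTANCE; }
  }

  static class ServerSideProxy {
    private static final ServerSideProxy INSTANCE = new ServerSideProxy();
    static ServerSideProxy get() { return INSTANCE; }
  }
}
{code}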
[jira] [Commented] (YARN-1482) WebApplicationProxy should be always-on w.r.t HA even if it is embedded in the RM
[ https://issues.apache.org/jira/browse/YARN-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13862118#comment-13862118 ] Karthik Kambatla commented on YARN-1482: bq. in TestRMFailOver.java to pass the test case. Otherwise it will throw out this exception: Ran into something similar - believe it is because of a race between ClientRMProxy and ServerRMProxy - the way we set INSTANCE is unorthodox and lends itself to these. Created YARN-1559. WebApplicationProxy should be always-on w.r.t HA even if it is embedded in the RM - Key: YARN-1482 URL: https://issues.apache.org/jira/browse/YARN-1482 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Attachments: YARN-1482.1.patch, YARN-1482.2.patch, YARN-1482.3.patch, YARN-1482.4.patch, YARN-1482.4.patch This way, even if an RM goes to standby mode, we can affect a redirect to the active. And more importantly, users will not suddenly see all their links stop working. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1029) Allow embedding leader election into the RM
[ https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13862120#comment-13862120 ] Karthik Kambatla commented on YARN-1029: No problem. Thanks for the clarification, Vinod. Will take care of these changes as well. While adding testExplicitFailover back, ran into YARN-1559. Might make sense to fix that first. Allow embedding leader election into the RM --- Key: YARN-1029 URL: https://issues.apache.org/jira/browse/YARN-1029 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: embedded-zkfc-approach.patch, yarn-1029-0.patch, yarn-1029-0.patch, yarn-1029-1.patch, yarn-1029-2.patch, yarn-1029-3.patch, yarn-1029-4.patch, yarn-1029-5.patch, yarn-1029-6.patch, yarn-1029-7.patch, yarn-1029-7.patch, yarn-1029-8.patch, yarn-1029-9.patch, yarn-1029-approach.patch It should be possible to embed common ActiveStandyElector into the RM such that ZooKeeper based leader election and notification is in-built. In conjunction with a ZK state store, this configuration will be a simple deployment option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13862121#comment-13862121 ] Junping Du commented on YARN-1506: -- The patch is available for review. [~vinodkv] and [~bikassaha], mind giving it a review? Thanks! Replace set resource change on RMNode/SchedulerNode directly with event notification. - Key: YARN-1506 URL: https://issues.apache.org/jira/browse/YARN-1506 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, scheduler Reporter: Junping Du Assignee: Junping Du Priority: Blocker Attachments: YARN-1506-v1.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, YARN-1506-v4.patch According to Vinod's comments on YARN-312 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087), we should replace RMNode.setResourceOption() with some resource change event. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
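As a hedged sketch of the shape being proposed (all names are assumptions, not necessarily what the v4 patch uses): instead of callers invoking RMNode.setResourceOption() directly, the admin path would dispatch a node-scoped event carrying the new resource, and the RMNode and scheduler would apply it on their own dispatcher threads.
{code}
// Hypothetical names throughout; this only illustrates the event-driven shape.
public class NodeResourceUpdateSketch {

  /** Stand-in for the resource payload (e.g. ResourceOption in YARN). */
  static class ResourcePayload {
    final int memoryMB;
    final int vcores;
    ResourcePayload(int memoryMB, int vcores) {
      this.memoryMB = memoryMB;
      this.vcores = vcores;
    }
  }

  /** Event sent to an RMNode instead of calling a setter on it directly. */
  static class NodeResourceUpdateEvent {
    final String nodeId;
    final ResourcePayload updated;
    NodeResourceUpdateEvent(String nodeId, ResourcePayload updated) {
      this.nodeId = nodeId;
      this.updated = updated;
    }
  }

  /** The node applies the update inside its own state machine/dispatcher thread. */
  interface NodeEventHandler {
    void handle(NodeResourceUpdateEvent event);
  }
}
{code}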
[jira] [Assigned] (YARN-1556) NPE getting application report with a null appId
[ https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent reassigned YARN-1556: -- Assignee: haosdent NPE getting application report with a null appId Key: YARN-1556 URL: https://issues.apache.org/jira/browse/YARN-1556 Project: Hadoop YARN Issue Type: Bug Reporter: Steve Loughran Assignee: haosdent Priority: Trivial If you accidentally pass in a null appId to get application report, you get an NPE back. This is arguably as intended, except that maybe a guard statement could report this in such a way as to make it easy for callers to track down the cause. {code} java.lang.NullPointerException: java.lang.NullPointerException org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333) at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:243) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:120) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:241) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042) at org.apache.hadoop.ipc.Client.call(Client.java:1347) at org.apache.hadoop.ipc.Client.call(Client.java:1300) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy75.getApplicationReport(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137) ... 28 more {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1556) NPE getting application report with a null appId
[ https://issues.apache.org/jira/browse/YARN-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13862238#comment-13862238 ] haosdent commented on YARN-1556: How about check if the appid is null in getApplicationReport and return a more friendly error message? NPE getting application report with a null appId Key: YARN-1556 URL: https://issues.apache.org/jira/browse/YARN-1556 Project: Hadoop YARN Issue Type: Bug Reporter: Steve Loughran Assignee: haosdent Priority: Trivial If you accidentally pass in a null appId to get application report, you get an NPE back. This is arguably as intended, except that maybe a guard statement could report this in such a way as to make it easy for callers to track down the cause. {code} java.lang.NullPointerException: java.lang.NullPointerException org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333) at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:243) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:120) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:241) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042) at org.apache.hadoop.ipc.Client.call(Client.java:1347) at org.apache.hadoop.ipc.Client.call(Client.java:1300) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy75.getApplicationReport(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137) ... 28 more {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
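A minimal sketch of the guard being suggested, factored as a helper for readability; the exception type and message are assumptions, not the committed fix.
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;

public final class AppIdGuard {
  private AppIdGuard() { }

  /**
   * Fail fast with a descriptive message instead of letting the
   * ConcurrentHashMap lookup throw a bare NullPointerException.
   */
  public static void checkApplicationId(ApplicationId applicationId)
      throws ApplicationNotFoundException {
    if (applicationId == null) {
      throw new ApplicationNotFoundException(
          "Invalid application id: null. Please specify an application id.");
    }
  }
}
{code}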