[jira] [Commented] (YARN-2679) add container launch prepare time metrics to NM.
[ https://issues.apache.org/jira/browse/YARN-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219119#comment-14219119 ] Hadoop QA commented on YARN-2679: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682592/YARN-2679.001.patch against trunk revision a9a0cc3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5884//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5884//console This message is automatically generated. add container launch prepare time metrics to NM. Key: YARN-2679 URL: https://issues.apache.org/jira/browse/YARN-2679 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2679.000.patch, YARN-2679.001.patch add metrics in NodeManagerMetrics to get prepare time to launch container. The prepare time is the duration between sending ContainersLauncherEventType.LAUNCH_CONTAINER event and receiving ContainerEventType.CONTAINER_LAUNCHED event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
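A minimal sketch of how such a prepare-time metric could be recorded with Hadoop's metrics2 library is shown below, assuming a MutableRate field on NodeManagerMetrics; the class, field and method names are illustrative assumptions, not the contents of the attached patch.
{code}
// Minimal sketch, assuming a metrics2 MutableRate on NodeManagerMetrics;
// names are illustrative and not taken from YARN-2679.001.patch.
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableRate;

@Metrics(about = "NodeManager metrics sketch", context = "yarn")
public class NodeManagerMetricsSketch {
  // Tracks the time between sending LAUNCH_CONTAINER and receiving CONTAINER_LAUNCHED.
  @Metric("Container launch prepare time in ms")
  MutableRate containerLaunchPrepareTime;

  public void addContainerLaunchPrepareDuration(long durationMs) {
    containerLaunchPrepareTime.add(durationMs);
  }
}
{code}
The caller would stamp the clock when ContainersLauncherEventType.LAUNCH_CONTAINER is dispatched and add the delta when ContainerEventType.CONTAINER_LAUNCHED is handled.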
[jira] [Updated] (YARN-2404) Remove ApplicationAttemptState and ApplicationState class in RMStateStore class
[ https://issues.apache.org/jira/browse/YARN-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2404: - Attachment: YARN-2404.5.patch Refreshed the patch. I found that TestRMRestart#testAppRecoveredInOrderOnRMRestart fails after the refactoring, since recoverApplication loads data from RMStateStore#RMState#appState, which is created as an instance of HashMap. We should make it a TreeMap to preserve the restore order by key, so I fixed that in this patch. [~jianhe], could you take a look? Remove ApplicationAttemptState and ApplicationState class in RMStateStore class Key: YARN-2404 URL: https://issues.apache.org/jira/browse/YARN-2404 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Tsuyoshi OZAWA Attachments: YARN-2404.1.patch, YARN-2404.2.patch, YARN-2404.3.patch, YARN-2404.4.patch, YARN-2404.5.patch We can remove the ApplicationState and ApplicationAttemptState classes in RMStateStore, given that we already have ApplicationStateData and ApplicationAttemptStateData records. We may just replace ApplicationState with ApplicationStateData, and similarly for ApplicationAttemptState. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
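To illustrate the ordering point above, here is a small self-contained example (generic String keys stand in for the real state-store keys): HashMap gives no iteration-order guarantee, while TreeMap iterates in ascending key order, so recovery replays entries deterministically.
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class RecoveryOrderDemo {
  public static void main(String[] args) {
    Map<String, String> hashState = new HashMap<>(); // iteration order unspecified
    Map<String, String> treeState = new TreeMap<>(); // iterates in ascending key order
    for (String id : new String[] {"app_003", "app_001", "app_002"}) {
      hashState.put(id, "state");
      treeState.put(id, "state");
    }
    System.out.println("HashMap order: " + hashState.keySet()); // implementation-dependent
    System.out.println("TreeMap order: " + treeState.keySet()); // [app_001, app_002, app_003]
  }
}
{code}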
[jira] [Updated] (YARN-2675) the containersKilled metrics is not updated when the container is killed during localization.
[ https://issues.apache.org/jira/browse/YARN-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2675: Attachment: YARN-2675.004.patch the containersKilled metrics is not updated when the container is killed during localization. - Key: YARN-2675 URL: https://issues.apache.org/jira/browse/YARN-2675 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2675.000.patch, YARN-2675.001.patch, YARN-2675.002.patch, YARN-2675.003.patch, YARN-2675.004.patch The containersKilled metric is not updated when the container is killed during localization. We should add the KILLING state to the finished handling in ContainerImpl.java so that killedContainer is updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
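A simplified, hypothetical sketch of the fix being described, i.e. counting a container as killed when it finishes out of the KILLING state; the enum and counter below are stand-ins for the real ContainerImpl state machine and NodeManagerMetrics, not the attached patch.
{code}
public class KilledContainerMetricSketch {
  // Stand-in for the relevant terminal/near-terminal ContainerImpl states.
  enum FinalState { EXITED_WITH_SUCCESS, EXITED_WITH_FAILURE, KILLING, CONTAINER_CLEANEDUP_AFTER_KILL }

  static int containersKilled = 0;

  static void recordFinished(FinalState state) {
    switch (state) {
      case KILLING:                        // previously missed: kill arrived during localization
      case CONTAINER_CLEANEDUP_AFTER_KILL: // kill after the container was already running
        containersKilled++;                // real code would call metrics.killedContainer()
        break;
      default:
        break;
    }
  }

  public static void main(String[] args) {
    recordFinished(FinalState.KILLING);
    System.out.println("containersKilled = " + containersKilled); // 1
  }
}
{code}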
[jira] [Commented] (YARN-2675) the containersKilled metrics is not updated when the container is killed during localization.
[ https://issues.apache.org/jira/browse/YARN-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219141#comment-14219141 ] zhihai xu commented on YARN-2675: - Hi [~kasha], Good suggestion, I added unit tests to exercise all the newly added transitions in the new patch YARN-2675.004.patch. thanks zhihai the containersKilled metrics is not updated when the container is killed during localization. - Key: YARN-2675 URL: https://issues.apache.org/jira/browse/YARN-2675 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2675.000.patch, YARN-2675.001.patch, YARN-2675.002.patch, YARN-2675.003.patch, YARN-2675.004.patch The containersKilled metrics is not updated when the container is killed during localization. We should add KILLING state in finished of ContainerImpl.java to update killedContainer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2243) Order of arguments for Preconditions.checkNotNull() is wrong in SchedulerApplicationAttempt ctor
[ https://issues.apache.org/jira/browse/YARN-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-2243: Attachment: YARN-2243.patch Order of arguments for Preconditions.checkNotNull() is wrong in SchedulerApplicationAttempt ctor Key: YARN-2243 URL: https://issues.apache.org/jira/browse/YARN-2243 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Assignee: Devaraj K Priority: Minor Attachments: YARN-2243.patch, YARN-2243.patch {code} public SchedulerApplicationAttempt(ApplicationAttemptId applicationAttemptId, String user, Queue queue, ActiveUsersManager activeUsersManager, RMContext rmContext) { Preconditions.checkNotNull("RMContext should not be null", rmContext); {code} The order of arguments is wrong for Preconditions.checkNotNull(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
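Guava's Preconditions.checkNotNull(T reference, Object errorMessage) takes the reference first and the message second, which is exactly what the code above gets backwards. A minimal demonstration:
{code}
import com.google.common.base.Preconditions;

public class CheckNotNullOrderDemo {
  public static void main(String[] args) {
    Object rmContext = null;

    // Wrong order: the message String is treated as the reference being checked,
    // so this silently passes even though rmContext is null.
    Preconditions.checkNotNull("RMContext should not be null", rmContext);

    // Correct order: reference first, message second -> throws NullPointerException.
    try {
      Preconditions.checkNotNull(rmContext, "RMContext should not be null");
    } catch (NullPointerException expected) {
      System.out.println("Correct order throws: " + expected.getMessage());
    }
  }
}
{code}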
[jira] [Assigned] (YARN-2267) Auxiliary Service support in RM
[ https://issues.apache.org/jira/browse/YARN-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith reassigned YARN-2267: Assignee: Rohith Auxiliary Service support in RM --- Key: YARN-2267 URL: https://issues.apache.org/jira/browse/YARN-2267 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Naganarasimha G R Assignee: Rohith Currently the RM does not have a provision to run any auxiliary services. For health/monitoring in the RM, it is better to have a plugin mechanism in the RM itself, similar to the NM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2243) Order of arguments for Preconditions.checkNotNull() is wrong in SchedulerApplicationAttempt ctor
[ https://issues.apache.org/jira/browse/YARN-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219196#comment-14219196 ] Hadoop QA commented on YARN-2243: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682603/YARN-2243.patch against trunk revision a9a0cc3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5886//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5886//console This message is automatically generated. Order of arguments for Preconditions.checkNotNull() is wrong in SchedulerApplicationAttempt ctor Key: YARN-2243 URL: https://issues.apache.org/jira/browse/YARN-2243 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Assignee: Devaraj K Priority: Minor Attachments: YARN-2243.patch, YARN-2243.patch {code} public SchedulerApplicationAttempt(ApplicationAttemptId applicationAttemptId, String user, Queue queue, ActiveUsersManager activeUsersManager, RMContext rmContext) { Preconditions.checkNotNull(RMContext should not be null, rmContext); {code} Order of arguments is wrong for Preconditions.checkNotNull(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2675) the containersKilled metrics is not updated when the container is killed during localization.
[ https://issues.apache.org/jira/browse/YARN-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219195#comment-14219195 ] Hadoop QA commented on YARN-2675: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682601/YARN-2675.004.patch against trunk revision a9a0cc3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 19 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/5885//artifact/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5885//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5885//console This message is automatically generated. the containersKilled metrics is not updated when the container is killed during localization. - Key: YARN-2675 URL: https://issues.apache.org/jira/browse/YARN-2675 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2675.000.patch, YARN-2675.001.patch, YARN-2675.002.patch, YARN-2675.003.patch, YARN-2675.004.patch The containersKilled metrics is not updated when the container is killed during localization. We should add KILLING state in finished of ContainerImpl.java to update killedContainer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2675) the containersKilled metrics is not updated when the container is killed during localization.
[ https://issues.apache.org/jira/browse/YARN-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2675: Attachment: YARN-2675.004.patch the containersKilled metrics is not updated when the container is killed during localization. - Key: YARN-2675 URL: https://issues.apache.org/jira/browse/YARN-2675 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2675.000.patch, YARN-2675.001.patch, YARN-2675.002.patch, YARN-2675.003.patch, YARN-2675.004.patch The containersKilled metrics is not updated when the container is killed during localization. We should add KILLING state in finished of ContainerImpl.java to update killedContainer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2675) the containersKilled metrics is not updated when the container is killed during localization.
[ https://issues.apache.org/jira/browse/YARN-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2675: Attachment: (was: YARN-2675.004.patch) the containersKilled metrics is not updated when the container is killed during localization. - Key: YARN-2675 URL: https://issues.apache.org/jira/browse/YARN-2675 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2675.000.patch, YARN-2675.001.patch, YARN-2675.002.patch, YARN-2675.003.patch, YARN-2675.004.patch The containersKilled metrics is not updated when the container is killed during localization. We should add KILLING state in finished of ContainerImpl.java to update killedContainer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2865) Application recovery continuously fails with Application with id already present. Cannot duplicate
[ https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219221#comment-14219221 ] Hudson commented on YARN-2865: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #11 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/11/]) YARN-2865. Fixed RM to always create a new RMContext when transtions from StandBy to Active. Contributed by Rohith Sharmaks (jianhe: rev 9cb8b75ba57f18639492bfa3b7e7c11c00bb3d3b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/CHANGES.txt Application recovery continuously fails with Application with id already present. Cannot duplicate Key: YARN-2865 URL: https://issues.apache.org/jira/browse/YARN-2865 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Critical Fix For: 2.7.0 Attachments: YARN-2865.1.patch, YARN-2865.patch, YARN-2865.patch YARN-2588 handles exception thrown while transitioningToActive and reset activeServices. But it misses out clearing RMcontext apps/nodes details and ClusterMetrics and QueueMetrics. This causes application recovery to fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2878) Fix DockerContainerExecutor.apt.vm formatting
[ https://issues.apache.org/jira/browse/YARN-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219222#comment-14219222 ] Hudson commented on YARN-2878: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #11 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/11/]) YARN-2878. Fix DockerContainerExecutor.apt.vm formatting. Contributed by Abin Shahab (jianhe: rev bc4ee5e06f89b2037e0967f8ba91089ced4b7f0e) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/DockerContainerExecutor.apt.vm Fix DockerContainerExecutor.apt.vm formatting - Key: YARN-2878 URL: https://issues.apache.org/jira/browse/YARN-2878 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.6.0 Reporter: Abin Shahab Assignee: Abin Shahab Fix For: 2.7.0 Attachments: YARN-1964-docs.patch The formatting on DockerContainerExecutor.apt.vm is off. Needs correction -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2802) ClusterMetrics to include AM launch and register delays
[ https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219218#comment-14219218 ] Hudson commented on YARN-2802: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #11 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/11/]) YARN-2802. ClusterMetrics to include AM launch and register delays. (Zhihai Xu via kasha) (kasha: rev c90fb84aaa902e6676de65d0016dee3a5414eb95) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClusterMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClusterMetrics.java ClusterMetrics to include AM launch and register delays --- Key: YARN-2802 URL: https://issues.apache.org/jira/browse/YARN-2802 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.7.0 Attachments: YARN-2802.000.patch, YARN-2802.001.patch, YARN-2802.002.patch, YARN-2802.003.patch, YARN-2802.004.patch, YARN-2802.005.patch add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue. Added two metrics in QueueMetrics: aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl. aMRegisterDelay: the time waiting from receiving event RMAppAttemptEventType.LAUNCHED to receiving event RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster) in RMAppAttemptImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
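A hedged sketch of how the two delays described above are typically measured and published via metrics2 MutableRate gauges; the class, field and method names are assumptions for illustration, not the committed ClusterMetrics code.
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableRate;

// Illustrative sketch only; names are assumptions, not the committed code.
@Metrics(about = "AM delay metrics sketch", context = "yarn")
public class AmDelayMetricsSketch {
  @Metric("AM container launch delay in ms") MutableRate aMLaunchDelay;
  @Metric("AM register delay in ms")         MutableRate aMRegisterDelay;

  private long launchSentTime; // stamped when AMLauncherEventType.LAUNCH is sent
  private long launchedTime;   // stamped when RMAppAttemptEventType.LAUNCHED is received

  public void onLaunchSent() {
    launchSentTime = System.currentTimeMillis();
  }

  public void onLaunched() {
    launchedTime = System.currentTimeMillis();
    aMLaunchDelay.add(launchedTime - launchSentTime);
  }

  public void onRegistered() {
    aMRegisterDelay.add(System.currentTimeMillis() - launchedTime);
  }
}
{code}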
[jira] [Commented] (YARN-2880) Add a test in TestRMRestart to make sure node labels will be recovered if it is enabled
[ https://issues.apache.org/jira/browse/YARN-2880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219224#comment-14219224 ] Rohith commented on YARN-2880: -- Hi Wangda Tan, I am trying to write test cases for node label recovery. IIUC, as of now recovery is not supported until YARN-2800 is committed. I just started using the NodeLabel feature (it is still in development) and I am stuck with several doubts about how to use it. Is any documentation available? 1. How can I configure node labels? Is it only via rmadmin as of now? 2. I set labels on NMs from rmadmin, but how do I make use of these labels? If you don't mind, please give me crisp details. Add a test in TestRMRestart to make sure node labels will be recovered if it is enabled --- Key: YARN-2880 URL: https://issues.apache.org/jira/browse/YARN-2880 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Rohith As suggested by [~ozawa], [link|https://issues.apache.org/jira/browse/YARN-2800?focusedCommentId=14217569&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14217569]. We should have such a test to make sure there will be no regression. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2315) FairScheduler: Set current capacity in addition to capacity
[ https://issues.apache.org/jira/browse/YARN-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219220#comment-14219220 ] Hudson commented on YARN-2315: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #11 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/11/]) YARN-2315. FairScheduler: Set current capacity in addition to capacity. (Zhihai Xu via kasha) (kasha: rev a9a0cc3679432774154a07d3157ffa0a43e0bf01) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java FairScheduler: Set current capacity in addition to capacity --- Key: YARN-2315 URL: https://issues.apache.org/jira/browse/YARN-2315 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.7.0 Attachments: YARN-2315.001.patch, YARN-2315.002.patch, YARN-2315.003.patch, YARN-2315.patch Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler. In function getQueueInfo of FSQueue.java, we call setCapacity twice with different parameters so the first call is overrode by the second call. queueInfo.setCapacity((float) getFairShare().getMemory() / scheduler.getClusterResource().getMemory()); queueInfo.setCapacity((float) getResourceUsage().getMemory() / scheduler.getClusterResource().getMemory()); We should change the second setCapacity call to setCurrentCapacity to configure the current used capacity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
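To make the distinction concrete, here is a small sketch using the public QueueInfo record: setCapacity carries the configured/fair share, while setCurrentCapacity carries the currently used share. The numbers and the use of Records.newRecord are stand-ins for the FairScheduler internals, not the actual FSQueue code.
{code}
import org.apache.hadoop.yarn.api.records.QueueInfo;
import org.apache.hadoop.yarn.util.Records;

public class QueueCapacitySketch {
  public static void main(String[] args) {
    // Stand-in numbers: fair share 4096 MB, current usage 1024 MB, cluster 16384 MB.
    float fairShareMb = 4096f, usedMb = 1024f, clusterMb = 16384f;

    QueueInfo queueInfo = Records.newRecord(QueueInfo.class);
    queueInfo.setCapacity(fairShareMb / clusterMb);   // configured share: 0.25
    queueInfo.setCurrentCapacity(usedMb / clusterMb); // used share: 0.0625 (this overwrote capacity before the fix)

    System.out.println("capacity=" + queueInfo.getCapacity()
        + " currentCapacity=" + queueInfo.getCurrentCapacity());
  }
}
{code}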
[jira] [Commented] (YARN-2802) ClusterMetrics to include AM launch and register delays
[ https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219226#comment-14219226 ] Hudson commented on YARN-2802: -- FAILURE: Integrated in Hadoop-Yarn-trunk #749 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/749/]) YARN-2802. ClusterMetrics to include AM launch and register delays. (Zhihai Xu via kasha) (kasha: rev c90fb84aaa902e6676de65d0016dee3a5414eb95) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClusterMetrics.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClusterMetrics.java ClusterMetrics to include AM launch and register delays --- Key: YARN-2802 URL: https://issues.apache.org/jira/browse/YARN-2802 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.7.0 Attachments: YARN-2802.000.patch, YARN-2802.001.patch, YARN-2802.002.patch, YARN-2802.003.patch, YARN-2802.004.patch, YARN-2802.005.patch add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue. Added two metrics in QueueMetrics: aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl. aMRegisterDelay: the time waiting from receiving event RMAppAttemptEventType.LAUNCHED to receiving event RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster) in RMAppAttemptImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2865) Application recovery continuously fails with Application with id already present. Cannot duplicate
[ https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219230#comment-14219230 ] Hudson commented on YARN-2865: -- FAILURE: Integrated in Hadoop-Yarn-trunk #749 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/749/]) YARN-2865. Fixed RM to always create a new RMContext when transtions from StandBy to Active. Contributed by Rohith Sharmaks (jianhe: rev 9cb8b75ba57f18639492bfa3b7e7c11c00bb3d3b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java * hadoop-yarn-project/CHANGES.txt Application recovery continuously fails with Application with id already present. Cannot duplicate Key: YARN-2865 URL: https://issues.apache.org/jira/browse/YARN-2865 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Critical Fix For: 2.7.0 Attachments: YARN-2865.1.patch, YARN-2865.patch, YARN-2865.patch YARN-2588 handles exception thrown while transitioningToActive and reset activeServices. But it misses out clearing RMcontext apps/nodes details and ClusterMetrics and QueueMetrics. This causes application recovery to fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2315) FairScheduler: Set current capacity in addition to capacity
[ https://issues.apache.org/jira/browse/YARN-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219229#comment-14219229 ] Hudson commented on YARN-2315: -- FAILURE: Integrated in Hadoop-Yarn-trunk #749 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/749/]) YARN-2315. FairScheduler: Set current capacity in addition to capacity. (Zhihai Xu via kasha) (kasha: rev a9a0cc3679432774154a07d3157ffa0a43e0bf01) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java FairScheduler: Set current capacity in addition to capacity --- Key: YARN-2315 URL: https://issues.apache.org/jira/browse/YARN-2315 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.7.0 Attachments: YARN-2315.001.patch, YARN-2315.002.patch, YARN-2315.003.patch, YARN-2315.patch Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler. In function getQueueInfo of FSQueue.java, we call setCapacity twice with different parameters so the first call is overrode by the second call. queueInfo.setCapacity((float) getFairShare().getMemory() / scheduler.getClusterResource().getMemory()); queueInfo.setCapacity((float) getResourceUsage().getMemory() / scheduler.getClusterResource().getMemory()); We should change the second setCapacity call to setCurrentCapacity to configure the current used capacity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2878) Fix DockerContainerExecutor.apt.vm formatting
[ https://issues.apache.org/jira/browse/YARN-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219231#comment-14219231 ] Hudson commented on YARN-2878: -- FAILURE: Integrated in Hadoop-Yarn-trunk #749 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/749/]) YARN-2878. Fix DockerContainerExecutor.apt.vm formatting. Contributed by Abin Shahab (jianhe: rev bc4ee5e06f89b2037e0967f8ba91089ced4b7f0e) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/DockerContainerExecutor.apt.vm * hadoop-yarn-project/CHANGES.txt Fix DockerContainerExecutor.apt.vm formatting - Key: YARN-2878 URL: https://issues.apache.org/jira/browse/YARN-2878 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.6.0 Reporter: Abin Shahab Assignee: Abin Shahab Fix For: 2.7.0 Attachments: YARN-1964-docs.patch The formatting on DockerContainerExecutor.apt.vm is off. Needs correction -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2675) the containersKilled metrics is not updated when the container is killed during localization.
[ https://issues.apache.org/jira/browse/YARN-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219244#comment-14219244 ] Hadoop QA commented on YARN-2675: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682609/YARN-2675.004.patch against trunk revision a9a0cc3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5887//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5887//console This message is automatically generated. the containersKilled metrics is not updated when the container is killed during localization. - Key: YARN-2675 URL: https://issues.apache.org/jira/browse/YARN-2675 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2675.000.patch, YARN-2675.001.patch, YARN-2675.002.patch, YARN-2675.003.patch, YARN-2675.004.patch The containersKilled metrics is not updated when the container is killed during localization. We should add KILLING state in finished of ContainerImpl.java to update killedContainer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2301) Improve yarn container command
[ https://issues.apache.org/jira/browse/YARN-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2301: Attachment: YARN-2301.20141120-1.patch bq. NM can setup SSL and so the port can also be https port. Ok. I cross-checked the code; the http/https port is set in RMNode.httpport based on the configuration, so there should not be any issues. bq. I meant Times.format is internally doing the check. Ok, corrected. bq. we may set the conf object in the rmContext and get it from context Ok, corrected. Improve yarn container command -- Key: YARN-2301 URL: https://issues.apache.org/jira/browse/YARN-2301 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Naganarasimha G R Labels: usability Attachments: YARN-2301.01.patch, YARN-2301.03.patch, YARN-2301.20141120-1.patch, YARN-2303.patch While running the yarn container -list <Application Attempt ID> command, some observations: 1) the scheme (e.g. http/https) before LOG-URL is missing 2) the start-time is printed as milliseconds (e.g. 1405540544844); better to print it in a time format. 3) finish-time is 0 if the container is not yet finished; may be N/A 4) May have an option to run as yarn container -list appId OR yarn application -list-containers appId also. As the attempt Id is not shown on the console, it is easier for the user to just copy the appId and run it; this may also be useful for container-preserving AM restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
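Observations 2 and 3 above amount to the formatting shown below. This is a plain-JDK illustration of the intended CLI output (the comment says the real code relies on Times.format); the zero-means-N/A convention is an assumption taken from the issue description, not the patch itself.
{code}
import java.text.SimpleDateFormat;
import java.util.Date;

public class ContainerTimeFormatDemo {
  // Render an epoch-millis timestamp for CLI output: a readable date,
  // or "N/A" when the container has not finished yet (finish-time == 0).
  static String format(long epochMillis) {
    if (epochMillis == 0) {
      return "N/A";
    }
    return new SimpleDateFormat("EEE MMM dd HH:mm:ss Z yyyy").format(new Date(epochMillis));
  }

  public static void main(String[] args) {
    System.out.println(format(1405540544844L)); // human-readable instead of raw millis
    System.out.println(format(0L));             // "N/A"
  }
}
{code}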
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219287#comment-14219287 ] Naganarasimha G R commented on YARN-2495: - The following 3 test failures do not seem to be introduced by my modifications: {quote} TestApplicationClientProtocolOnHA.testGetContainersOnHA:154 TestApplicationClientProtocolOnHA.testSubmitApplicationOnHA:173 TestApplicationClientProtocolOnHA.testGetClusterMetricsOnHA:85 {quote} Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495_20141022.1.patch The target of this JIRA is to allow the admin to specify labels on each NM. This covers: - User can set labels on each NM (by setting yarn-site.xml or using a script suggested by [~aw]) - NM will send labels to RM via the ResourceTracker API - RM will set labels in NodeLabelManager when NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2599) Standby RM should also expose some jmx and metrics
[ https://issues.apache.org/jira/browse/YARN-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219307#comment-14219307 ] Rohith commented on YARN-2599: -- I do agree that the standby RM should expose jmx and metrics. I did some analysis by comparing /jmx for the active RM and the standby RM (with a private patch). All the metrics listed in /jmx are consistent between the active and standby details. But I could not find any details on the /metrics page when I tried from browsers (IE, Chrome and Firefox); it displayed an empty page. I think the /metrics details are embedded in /jmx only. Standby RM should also expose some jmx and metrics -- Key: YARN-2599 URL: https://issues.apache.org/jira/browse/YARN-2599 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Rohith YARN-1898 redirects jmx and metrics to the Active. As discussed there, we need to separate out the metrics displayed so the Standby RM can also be monitored. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2243) Order of arguments for Preconditions.checkNotNull() is wrong in SchedulerApplicationAttempt ctor
[ https://issues.apache.org/jira/browse/YARN-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219322#comment-14219322 ] Hadoop QA commented on YARN-2243: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682603/YARN-2243.patch against trunk revision a9a0cc3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5888//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5888//console This message is automatically generated. Order of arguments for Preconditions.checkNotNull() is wrong in SchedulerApplicationAttempt ctor Key: YARN-2243 URL: https://issues.apache.org/jira/browse/YARN-2243 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Assignee: Devaraj K Priority: Minor Attachments: YARN-2243.patch, YARN-2243.patch {code} public SchedulerApplicationAttempt(ApplicationAttemptId applicationAttemptId, String user, Queue queue, ActiveUsersManager activeUsersManager, RMContext rmContext) { Preconditions.checkNotNull(RMContext should not be null, rmContext); {code} Order of arguments is wrong for Preconditions.checkNotNull(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2404) Remove ApplicationAttemptState and ApplicationState class in RMStateStore class
[ https://issues.apache.org/jira/browse/YARN-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219324#comment-14219324 ] Hadoop QA commented on YARN-2404: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682598/YARN-2404.5.patch against trunk revision a9a0cc3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5889//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5889//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5889//console This message is automatically generated. Remove ApplicationAttemptState and ApplicationState class in RMStateStore class Key: YARN-2404 URL: https://issues.apache.org/jira/browse/YARN-2404 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Tsuyoshi OZAWA Attachments: YARN-2404.1.patch, YARN-2404.2.patch, YARN-2404.3.patch, YARN-2404.4.patch, YARN-2404.5.patch We can remove ApplicationState and ApplicationAttemptState class in RMStateStore, given that we already have ApplicationStateData and ApplicationAttemptStateData records. we may just replace ApplicationState with ApplicationStateData, similarly for ApplicationAttemptState. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2881) Implement PlanFollower for FairScheduler
Anubhav Dhoot created YARN-2881: --- Summary: Implement PlanFollower for FairScheduler Key: YARN-2881 URL: https://issues.apache.org/jira/browse/YARN-2881 Project: Hadoop YARN Issue Type: Sub-task Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2165) Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero
[ https://issues.apache.org/jira/browse/YARN-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasanth kumar RJ updated YARN-2165: --- Attachment: YARN-2165.3.patch [~zjshen] Implemented your suggestion. Kindly review. Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero - Key: YARN-2165 URL: https://issues.apache.org/jira/browse/YARN-2165 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Karam Singh Assignee: Vasanth kumar RJ Attachments: YARN-2165.1.patch, YARN-2165.2.patch, YARN-2165.3.patch, YARN-2165.patch The Timeline Server should validate that yarn.timeline-service.ttl-ms is greater than zero. Currently, if you set yarn.timeline-service.ttl-ms=0 or yarn.timeline-service.ttl-ms=-86400, the Timeline Server starts successfully, merely logging: {code} 2014-06-15 14:52:16,562 INFO timeline.LeveldbTimelineStore (LeveldbTimelineStore.java:init(247)) - Starting deletion thread with ttl -60480 and cycle interval 30 {code} At startup the Timeline Server should check that yarn.timeline-service.ttl-ms > 0; otherwise, especially for a negative value, the discard-old-entities timestamp will be set to a future value, which may lead to inconsistent behavior: {code} public void run() { while (true) { long timestamp = System.currentTimeMillis() - ttl; try { discardOldEntities(timestamp); Thread.sleep(ttlInterval); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
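A minimal sketch of the requested validation, failing fast at startup instead of spawning a deletion thread with a non-positive TTL; the configuration key is the real one, while the class, default value and method are stand-ins and not the attached patch.
{code}
import org.apache.hadoop.conf.Configuration;

public class TimelineTtlValidationSketch {
  static final String TTL_MS = "yarn.timeline-service.ttl-ms";
  static final long DEFAULT_TTL_MS = 7L * 24 * 60 * 60 * 1000; // stand-in default (one week)

  // Fail fast instead of silently starting the deletion thread with a negative TTL.
  static long validatedTtl(Configuration conf) {
    long ttl = conf.getLong(TTL_MS, DEFAULT_TTL_MS);
    if (ttl <= 0) {
      throw new IllegalArgumentException(TTL_MS + " must be greater than zero, got " + ttl);
    }
    return ttl;
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.setLong(TTL_MS, -86400L);
    System.out.println(validatedTtl(conf)); // throws IllegalArgumentException
  }
}
{code}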
[jira] [Commented] (YARN-2865) Application recovery continuously fails with Application with id already present. Cannot duplicate
[ https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219406#comment-14219406 ] Hudson commented on YARN-2865: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #11 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/11/]) YARN-2865. Fixed RM to always create a new RMContext when transtions from StandBy to Active. Contributed by Rohith Sharmaks (jianhe: rev 9cb8b75ba57f18639492bfa3b7e7c11c00bb3d3b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java Application recovery continuously fails with Application with id already present. Cannot duplicate Key: YARN-2865 URL: https://issues.apache.org/jira/browse/YARN-2865 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Critical Fix For: 2.7.0 Attachments: YARN-2865.1.patch, YARN-2865.patch, YARN-2865.patch YARN-2588 handles exception thrown while transitioningToActive and reset activeServices. But it misses out clearing RMcontext apps/nodes details and ClusterMetrics and QueueMetrics. This causes application recovery to fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2878) Fix DockerContainerExecutor.apt.vm formatting
[ https://issues.apache.org/jira/browse/YARN-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219407#comment-14219407 ] Hudson commented on YARN-2878: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #11 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/11/]) YARN-2878. Fix DockerContainerExecutor.apt.vm formatting. Contributed by Abin Shahab (jianhe: rev bc4ee5e06f89b2037e0967f8ba91089ced4b7f0e) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/DockerContainerExecutor.apt.vm * hadoop-yarn-project/CHANGES.txt Fix DockerContainerExecutor.apt.vm formatting - Key: YARN-2878 URL: https://issues.apache.org/jira/browse/YARN-2878 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.6.0 Reporter: Abin Shahab Assignee: Abin Shahab Fix For: 2.7.0 Attachments: YARN-1964-docs.patch The formatting on DockerContainerExecutor.apt.vm is off. Needs correction -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2315) FairScheduler: Set current capacity in addition to capacity
[ https://issues.apache.org/jira/browse/YARN-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219405#comment-14219405 ] Hudson commented on YARN-2315: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #11 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/11/]) YARN-2315. FairScheduler: Set current capacity in addition to capacity. (Zhihai Xu via kasha) (kasha: rev a9a0cc3679432774154a07d3157ffa0a43e0bf01) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java FairScheduler: Set current capacity in addition to capacity --- Key: YARN-2315 URL: https://issues.apache.org/jira/browse/YARN-2315 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.7.0 Attachments: YARN-2315.001.patch, YARN-2315.002.patch, YARN-2315.003.patch, YARN-2315.patch Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler. In function getQueueInfo of FSQueue.java, we call setCapacity twice with different parameters so the first call is overrode by the second call. queueInfo.setCapacity((float) getFairShare().getMemory() / scheduler.getClusterResource().getMemory()); queueInfo.setCapacity((float) getResourceUsage().getMemory() / scheduler.getClusterResource().getMemory()); We should change the second setCapacity call to setCurrentCapacity to configure the current used capacity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2315) FairScheduler: Set current capacity in addition to capacity
[ https://issues.apache.org/jira/browse/YARN-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219413#comment-14219413 ] Hudson commented on YARN-2315: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1939 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1939/]) YARN-2315. FairScheduler: Set current capacity in addition to capacity. (Zhihai Xu via kasha) (kasha: rev a9a0cc3679432774154a07d3157ffa0a43e0bf01) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java FairScheduler: Set current capacity in addition to capacity --- Key: YARN-2315 URL: https://issues.apache.org/jira/browse/YARN-2315 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.7.0 Attachments: YARN-2315.001.patch, YARN-2315.002.patch, YARN-2315.003.patch, YARN-2315.patch Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler. In function getQueueInfo of FSQueue.java, we call setCapacity twice with different parameters so the first call is overrode by the second call. queueInfo.setCapacity((float) getFairShare().getMemory() / scheduler.getClusterResource().getMemory()); queueInfo.setCapacity((float) getResourceUsage().getMemory() / scheduler.getClusterResource().getMemory()); We should change the second setCapacity call to setCurrentCapacity to configure the current used capacity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2865) Application recovery continuously fails with Application with id already present. Cannot duplicate
[ https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219414#comment-14219414 ] Hudson commented on YARN-2865: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1939 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1939/]) YARN-2865. Fixed RM to always create a new RMContext when transtions from StandBy to Active. Contributed by Rohith Sharmaks (jianhe: rev 9cb8b75ba57f18639492bfa3b7e7c11c00bb3d3b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java Application recovery continuously fails with Application with id already present. Cannot duplicate Key: YARN-2865 URL: https://issues.apache.org/jira/browse/YARN-2865 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Critical Fix For: 2.7.0 Attachments: YARN-2865.1.patch, YARN-2865.patch, YARN-2865.patch YARN-2588 handles exception thrown while transitioningToActive and reset activeServices. But it misses out clearing RMcontext apps/nodes details and ClusterMetrics and QueueMetrics. This causes application recovery to fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2165) Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero
[ https://issues.apache.org/jira/browse/YARN-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219417#comment-14219417 ] Hadoop QA commented on YARN-2165: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682646/YARN-2165.3.patch against trunk revision a9a0cc3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5890//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5890//console This message is automatically generated. Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero - Key: YARN-2165 URL: https://issues.apache.org/jira/browse/YARN-2165 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Karam Singh Assignee: Vasanth kumar RJ Attachments: YARN-2165.1.patch, YARN-2165.2.patch, YARN-2165.3.patch, YARN-2165.patch Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero Currently if set yarn.timeline-service.ttl-ms=0 Or yarn.timeline-service.ttl-ms=-86400 Timeline server start successfully with complaining {code} 2014-06-15 14:52:16,562 INFO timeline.LeveldbTimelineStore (LeveldbTimelineStore.java:init(247)) - Starting deletion thread with ttl -60480 and cycle interval 30 {code} At starting timelinserver should that yarn.timeline-service-ttl-ms 0 otherwise specially for -ive value discard oldvalues timestamp will be set future value. Which may lead to inconsistancy in behavior {code} public void run() { while (true) { long timestamp = System.currentTimeMillis() - ttl; try { discardOldEntities(timestamp); Thread.sleep(ttlInterval); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2315) FairScheduler: Set current capacity in addition to capacity
[ https://issues.apache.org/jira/browse/YARN-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219468#comment-14219468 ] Hudson commented on YARN-2315: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1963 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1963/]) YARN-2315. FairScheduler: Set current capacity in addition to capacity. (Zhihai Xu via kasha) (kasha: rev a9a0cc3679432774154a07d3157ffa0a43e0bf01) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java FairScheduler: Set current capacity in addition to capacity --- Key: YARN-2315 URL: https://issues.apache.org/jira/browse/YARN-2315 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.7.0 Attachments: YARN-2315.001.patch, YARN-2315.002.patch, YARN-2315.003.patch, YARN-2315.patch Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler. In function getQueueInfo of FSQueue.java, we call setCapacity twice with different parameters so the first call is overrode by the second call. queueInfo.setCapacity((float) getFairShare().getMemory() / scheduler.getClusterResource().getMemory()); queueInfo.setCapacity((float) getResourceUsage().getMemory() / scheduler.getClusterResource().getMemory()); We should change the second setCapacity call to setCurrentCapacity to configure the current used capacity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2878) Fix DockerContainerExecutor.apt.vm formatting
[ https://issues.apache.org/jira/browse/YARN-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219470#comment-14219470 ] Hudson commented on YARN-2878: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1963 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1963/]) YARN-2878. Fix DockerContainerExecutor.apt.vm formatting. Contributed by Abin Shahab (jianhe: rev bc4ee5e06f89b2037e0967f8ba91089ced4b7f0e) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/DockerContainerExecutor.apt.vm Fix DockerContainerExecutor.apt.vm formatting - Key: YARN-2878 URL: https://issues.apache.org/jira/browse/YARN-2878 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.6.0 Reporter: Abin Shahab Assignee: Abin Shahab Fix For: 2.7.0 Attachments: YARN-1964-docs.patch The formatting on DockerContainerExecutor.apt.vm is off. Needs correction -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2865) Application recovery continuously fails with Application with id already present. Cannot duplicate
[ https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219469#comment-14219469 ] Hudson commented on YARN-2865: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1963 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1963/]) YARN-2865. Fixed RM to always create a new RMContext when transtions from StandBy to Active. Contributed by Rohith Sharmaks (jianhe: rev 9cb8b75ba57f18639492bfa3b7e7c11c00bb3d3b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java Application recovery continuously fails with Application with id already present. Cannot duplicate Key: YARN-2865 URL: https://issues.apache.org/jira/browse/YARN-2865 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Critical Fix For: 2.7.0 Attachments: YARN-2865.1.patch, YARN-2865.patch, YARN-2865.patch YARN-2588 handles exception thrown while transitioningToActive and reset activeServices. But it misses out clearing RMcontext apps/nodes details and ClusterMetrics and QueueMetrics. This causes application recovery to fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2802) ClusterMetrics to include AM launch and register delays
[ https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219466#comment-14219466 ] Hudson commented on YARN-2802: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1963 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1963/]) YARN-2802. ClusterMetrics to include AM launch and register delays. (Zhihai Xu via kasha) (kasha: rev c90fb84aaa902e6676de65d0016dee3a5414eb95) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClusterMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClusterMetrics.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java ClusterMetrics to include AM launch and register delays --- Key: YARN-2802 URL: https://issues.apache.org/jira/browse/YARN-2802 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.7.0 Attachments: YARN-2802.000.patch, YARN-2802.001.patch, YARN-2802.002.patch, YARN-2802.003.patch, YARN-2802.004.patch, YARN-2802.005.patch add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue. Added two metrics in QueueMetrics: aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl. aMRegisterDelay: the time waiting from receiving event RMAppAttemptEventType.LAUNCHED to receiving event RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster) in RMAppAttemptImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
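As a rough illustration of how such delay metrics are typically declared with the Hadoop metrics2 library (the class and field names below are assumptions, not the committed code):
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableRate;

// Illustrative sketch only: two MutableRate metrics that record the delays in ms.
// In real code the instance must be registered with DefaultMetricsSystem so that
// the @Metric fields are initialized.
@Metrics(context = "yarn")
public class AMDelayMetricsSketch {
  @Metric("AM container launch delay") MutableRate aMLaunchDelay;
  @Metric("AM register delay") MutableRate aMRegisterDelay;

  public void addAMLaunchDelay(long delayMs) { aMLaunchDelay.add(delayMs); }
  public void addAMRegisterDelay(long delayMs) { aMRegisterDelay.add(delayMs); }
}
{code}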
[jira] [Commented] (YARN-2878) Fix DockerContainerExecutor.apt.vm formatting
[ https://issues.apache.org/jira/browse/YARN-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219482#comment-14219482 ] Hudson commented on YARN-2878: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #11 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/11/]) YARN-2878. Fix DockerContainerExecutor.apt.vm formatting. Contributed by Abin Shahab (jianhe: rev bc4ee5e06f89b2037e0967f8ba91089ced4b7f0e) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/DockerContainerExecutor.apt.vm Fix DockerContainerExecutor.apt.vm formatting - Key: YARN-2878 URL: https://issues.apache.org/jira/browse/YARN-2878 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.6.0 Reporter: Abin Shahab Assignee: Abin Shahab Fix For: 2.7.0 Attachments: YARN-1964-docs.patch The formatting on DockerContainerExecutor.apt.vm is off. Needs correction -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2802) ClusterMetrics to include AM launch and register delays
[ https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219478#comment-14219478 ] Hudson commented on YARN-2802: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #11 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/11/]) YARN-2802. ClusterMetrics to include AM launch and register delays. (Zhihai Xu via kasha) (kasha: rev c90fb84aaa902e6676de65d0016dee3a5414eb95) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClusterMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClusterMetrics.java * hadoop-yarn-project/CHANGES.txt ClusterMetrics to include AM launch and register delays --- Key: YARN-2802 URL: https://issues.apache.org/jira/browse/YARN-2802 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.7.0 Attachments: YARN-2802.000.patch, YARN-2802.001.patch, YARN-2802.002.patch, YARN-2802.003.patch, YARN-2802.004.patch, YARN-2802.005.patch add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue. Added two metrics in QueueMetrics: aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl. aMRegisterDelay: the time waiting from receiving event RMAppAttemptEventType.LAUNCHED to receiving event RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster) in RMAppAttemptImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2865) Application recovery continuously fails with Application with id already present. Cannot duplicate
[ https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219481#comment-14219481 ] Hudson commented on YARN-2865: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #11 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/11/]) YARN-2865. Fixed RM to always create a new RMContext when transtions from StandBy to Active. Contributed by Rohith Sharmaks (jianhe: rev 9cb8b75ba57f18639492bfa3b7e7c11c00bb3d3b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java Application recovery continuously fails with Application with id already present. Cannot duplicate Key: YARN-2865 URL: https://issues.apache.org/jira/browse/YARN-2865 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Critical Fix For: 2.7.0 Attachments: YARN-2865.1.patch, YARN-2865.patch, YARN-2865.patch YARN-2588 handles exception thrown while transitioningToActive and reset activeServices. But it misses out clearing RMcontext apps/nodes details and ClusterMetrics and QueueMetrics. This causes application recovery to fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2315) FairScheduler: Set current capacity in addition to capacity
[ https://issues.apache.org/jira/browse/YARN-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219480#comment-14219480 ] Hudson commented on YARN-2315: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #11 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/11/]) YARN-2315. FairScheduler: Set current capacity in addition to capacity. (Zhihai Xu via kasha) (kasha: rev a9a0cc3679432774154a07d3157ffa0a43e0bf01) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/CHANGES.txt FairScheduler: Set current capacity in addition to capacity --- Key: YARN-2315 URL: https://issues.apache.org/jira/browse/YARN-2315 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.7.0 Attachments: YARN-2315.001.patch, YARN-2315.002.patch, YARN-2315.003.patch, YARN-2315.patch Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler. In function getQueueInfo of FSQueue.java, we call setCapacity twice with different parameters so the first call is overrode by the second call. queueInfo.setCapacity((float) getFairShare().getMemory() / scheduler.getClusterResource().getMemory()); queueInfo.setCapacity((float) getResourceUsage().getMemory() / scheduler.getClusterResource().getMemory()); We should change the second setCapacity call to setCurrentCapacity to configure the current used capacity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2404) Remove ApplicationAttemptState and ApplicationState class in RMStateStore class
[ https://issues.apache.org/jira/browse/YARN-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2404: - Attachment: YARN-2404.6.patch Fixed warnings by findbugs. Remove ApplicationAttemptState and ApplicationState class in RMStateStore class Key: YARN-2404 URL: https://issues.apache.org/jira/browse/YARN-2404 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Tsuyoshi OZAWA Attachments: YARN-2404.1.patch, YARN-2404.2.patch, YARN-2404.3.patch, YARN-2404.4.patch, YARN-2404.5.patch, YARN-2404.6.patch We can remove ApplicationState and ApplicationAttemptState class in RMStateStore, given that we already have ApplicationStateData and ApplicationAttemptStateData records. we may just replace ApplicationState with ApplicationStateData, similarly for ApplicationAttemptState. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2727) In RMAdminCLI usage display, instead of yarn.node-labels.fs-store.root-dir, yarn.node-labels.fs-store.uri is being displayed
[ https://issues.apache.org/jira/browse/YARN-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219491#comment-14219491 ] Naganarasimha G R commented on YARN-2727: - Hi [~wangda], As discussed, shall I close this issue, as you have handled it as part of another JIRA? In RMAdminCLI usage display, instead of yarn.node-labels.fs-store.root-dir, yarn.node-labels.fs-store.uri is being displayed Key: YARN-2727 URL: https://issues.apache.org/jira/browse/YARN-2727 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Minor Attachments: YARN-2727.20141023.1.patch In the org.apache.hadoop.yarn.client.cli.RMAdminCLI usage display, yarn.node-labels.fs-store.uri is being used instead of yarn.node-labels.fs-store.root-dir. Some modifications to the description are also needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-2375: Attachment: YARN-2375.patch Thanks for taking a look [~zjshen] and [~jeagles]. Attaching updated patch to address Zhijie's comments. *Additions:* # In ApplicationMaster.finish(), we now stop the timeline client if the timelineClient instance is not null. # Fixed the indent issue in TimelineClientImpl#serviceInit(). # LOG.info(Timeline server is (not) enabled) changed to LOG.info(Timeline service is (not) enabled); to be consistent with the log statements in other places. I still have not added the test case for testing the scenario if MAPREDUCE_JOB_EMIT_TIMELINE_DATA = true and TIMELINE_SERVICE_ENABLED = false; MiniMRYarnCluster doesn't start the timeline server. MiniMRYarnCluster seems to follow a little different path for starting up the timeline service. I am investigating that currently. I propose to address that in a followup jira. That way we can have the important fix checked in. If you guys are ok with that, I will file a jira. Allow enabling/disabling timeline server per framework -- Key: YARN-2375 URL: https://issues.apache.org/jira/browse/YARN-2375 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2375.patch, YARN-2375.patch, YARN-2375.patch, YARN-2375.patch This JIRA is to remove the ats enabled flag check within the TimelineClientImpl. Example where this fails is below. While running secure timeline server with ats flag set to disabled on resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
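The first point above can be pictured with a small fragment like the following; this is only a sketch of the guard, assuming a timelineClient field, not the exact patch code.
{code}
// Sketch: only stop the timeline client if it was actually created, e.g. when the
// timeline service is disabled the field may stay null.
if (timelineClient != null) {
  timelineClient.stop();
}
{code}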
[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219546#comment-14219546 ] Hadoop QA commented on YARN-2375: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682667/YARN-2375.patch against trunk revision a9a0cc3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5891//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5891//console This message is automatically generated. Allow enabling/disabling timeline server per framework -- Key: YARN-2375 URL: https://issues.apache.org/jira/browse/YARN-2375 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2375.patch, YARN-2375.patch, YARN-2375.patch, YARN-2375.patch This JIRA is to remove the ats enabled flag check within the TimelineClientImpl. Example where this fails is below. While running secure timeline server with ats flag set to disabled on resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2517) Implement TimelineClientAsync
[ https://issues.apache.org/jira/browse/YARN-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2517: - Attachment: YARN-2517.2.patch Sorry for the delay. Attached a Future-based implementation for simplicity. I think this design is one of the best ways to go. [~vinodkv], [~zjshen], should we add read APIs in another JIRA? And do you have any opinions about the Future-based design? Implement TimelineClientAsync - Key: YARN-2517 URL: https://issues.apache.org/jira/browse/YARN-2517 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Tsuyoshi OZAWA Attachments: YARN-2517.1.patch, YARN-2517.2.patch In some scenarios, we'd like to put timeline entities in another thread no to block the current one. It's good to have a TimelineClientAsync like AMRMClientAsync and NMClientAsync. It can buffer entities, put them in a separate thread, and have callback to handle the responses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
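A rough shape of a Future-based asynchronous client, for discussion only; this is not the attached patch and the interface and method names are assumptions.
{code}
import java.util.concurrent.Future;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.api.records.timeline.TimelinePutResponse;

// Illustrative interface: callers get a Future instead of blocking on the put.
public interface TimelineClientAsyncSketch {
  Future<TimelinePutResponse> putEntitiesAsync(TimelineEntity... entities);
}
{code}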
[jira] [Commented] (YARN-2517) Implement TimelineClientAsync
[ https://issues.apache.org/jira/browse/YARN-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219554#comment-14219554 ] Tsuyoshi OZAWA commented on YARN-2517: -- [~mitdesai] I think you're one of the users of TimelineClient. If you have any feedbacks about the interface, please let me know. Implement TimelineClientAsync - Key: YARN-2517 URL: https://issues.apache.org/jira/browse/YARN-2517 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Tsuyoshi OZAWA Attachments: YARN-2517.1.patch, YARN-2517.2.patch In some scenarios, we'd like to put timeline entities in another thread no to block the current one. It's good to have a TimelineClientAsync like AMRMClientAsync and NMClientAsync. It can buffer entities, put them in a separate thread, and have callback to handle the responses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2404) Remove ApplicationAttemptState and ApplicationState class in RMStateStore class
[ https://issues.apache.org/jira/browse/YARN-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219581#comment-14219581 ] Hadoop QA commented on YARN-2404: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682668/YARN-2404.6.patch against trunk revision a9a0cc3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5892//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5892//console This message is automatically generated. Remove ApplicationAttemptState and ApplicationState class in RMStateStore class Key: YARN-2404 URL: https://issues.apache.org/jira/browse/YARN-2404 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Tsuyoshi OZAWA Attachments: YARN-2404.1.patch, YARN-2404.2.patch, YARN-2404.3.patch, YARN-2404.4.patch, YARN-2404.5.patch, YARN-2404.6.patch We can remove ApplicationState and ApplicationAttemptState class in RMStateStore, given that we already have ApplicationStateData and ApplicationAttemptStateData records. we may just replace ApplicationState with ApplicationStateData, similarly for ApplicationAttemptState. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2517) Implement TimelineClientAsync
[ https://issues.apache.org/jira/browse/YARN-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219587#comment-14219587 ] Hadoop QA commented on YARN-2517: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682679/YARN-2517.2.patch against trunk revision a9a0cc3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5893//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5893//console This message is automatically generated. Implement TimelineClientAsync - Key: YARN-2517 URL: https://issues.apache.org/jira/browse/YARN-2517 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Tsuyoshi OZAWA Attachments: YARN-2517.1.patch, YARN-2517.2.patch In some scenarios, we'd like to put timeline entities in another thread no to block the current one. It's good to have a TimelineClientAsync like AMRMClientAsync and NMClientAsync. It can buffer entities, put them in a separate thread, and have callback to handle the responses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2786) Create yarn cluster CLI to enable list node labels collection
[ https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219598#comment-14219598 ] Allen Wittenauer commented on YARN-2786: bq. Does this make sense? No, it doesn't. It completely ignores 20+ years of industry experience towards operations and configuration management of large scale installations. This is mostly exemplified by this comment: bq. I'd rather have the tools call an API instead of 'automatically' sshing into 1000 machines and changing labels. I'm completely stunned and saddened by this ignorance. I suspect that there is a corporate mandate to get Ambari working as a third tier scheduling system by dictating where services run. But that mandate (and its likely required deliverable time) has put blinders on the architecture and may very well cause long term pain and could potentially prevent other, more complex needs from being met. The only silver linings I'm seeing are thus: * We still have time to undo the damage either now or in 3.x. * Selfishly, this will give me years of material about how not to design a system to be operationally friendly. Create yarn cluster CLI to enable list node labels collection - Key: YARN-2786 URL: https://issues.apache.org/jira/browse/YARN-2786 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch With YARN-2778, we can list node labels on existing RM nodes. But it is not enough, we should be able to: 1) list node labels collection The command should start with yarn cluster ..., in the future, we can add more functionality to the yarnClusterCLI -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2356) yarn status command for non-existent application/application attempt/container is too verbose
[ https://issues.apache.org/jira/browse/YARN-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2356: -- Attachment: 0002-YARN-2356.patch bq.I thought it would be good if we avoid rehandling the same exception Yes [~devaraj.k]. I also feel that is more better. Double throwing of exception can be removed. I also updated the test case as mentioned. yarn status command for non-existent application/application attempt/container is too verbose -- Key: YARN-2356 URL: https://issues.apache.org/jira/browse/YARN-2356 Project: Hadoop YARN Issue Type: Bug Components: client Reporter: Sunil G Assignee: Sunil G Priority: Minor Attachments: 0001-YARN-2356.patch, 0002-YARN-2356.patch, Yarn-2356.1.patch *yarn application -status* or *applicationattempt -status* or *container status* commands can suppress exception such as ApplicationNotFound, ApplicationAttemptNotFound and ContainerNotFound for non-existent entries in RM or History Server. For example, below exception can be suppressed better sunildev@host-a:~/hadoop/hadoop/bin ./yarn application -status application_1402668848165_0015 No GC_PROFILE is given. Defaults to medium. 14/07/25 16:21:45 INFO client.RMProxy: Connecting to ResourceManager at /10.18.40.77:45022 Exception in thread main org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1402668848165_0015' doesn't exist in RM. at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:285) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:607) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2099) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2095) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2093) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) at 
$Proxy12.getApplicationReport(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.printApplicationReport(ApplicationCLI.java:428) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:153) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:76) Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException): Application with id 'application_1402668848165_0015' doesn't exist in RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
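One possible shape of the client-side handling (printing a short message instead of the remote stack trace) is sketched below; the surrounding names such as client, sysout, and appId are assumptions about the CLI, not the attached patch.
{code}
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;

// Sketch: catch the not-found case and report it concisely instead of letting
// the full RemoteException stack trace reach the console.
ApplicationReport appReport;
try {
  appReport = client.getApplicationReport(appId);
} catch (ApplicationNotFoundException e) {
  sysout.println("Application with id '" + appId + "' doesn't exist in RM.");
  return -1;
}
{code}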
[jira] [Updated] (YARN-2854) The document about timeline service and generic service needs to be updated
[ https://issues.apache.org/jira/browse/YARN-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2854: Attachment: YARN-2854.20141120-1.patch Patch for documentation issues in the timeline server. [~zjshen], can you please review? The document about timeline service and generic service needs to be updated --- Key: YARN-2854 URL: https://issues.apache.org/jira/browse/YARN-2854 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Naganarasimha G R Priority: Critical Attachments: YARN-2854.20141120-1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2877) Extend YARN to support distributed scheduling
[ https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino reassigned YARN-2877: -- Assignee: Carlo Curino Extend YARN to support distributed scheduling - Key: YARN-2877 URL: https://issues.apache.org/jira/browse/YARN-2877 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Sriram Rao Assignee: Carlo Curino This is an umbrella JIRA that proposes to extend YARN to support distributed scheduling. Briefly, some of the motivations for distributed scheduling are the following: 1. Improve cluster utilization by opportunistically executing tasks otherwise idle resources on individual machines. 2. Reduce allocation latency. Tasks where the scheduling time dominates (i.e., task execution time is much less compared to the time required for obtaining a container from the RM). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2877) Extend YARN to support distributed scheduling
[ https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-2877: --- Assignee: (was: Carlo Curino) Extend YARN to support distributed scheduling - Key: YARN-2877 URL: https://issues.apache.org/jira/browse/YARN-2877 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Sriram Rao This is an umbrella JIRA that proposes to extend YARN to support distributed scheduling. Briefly, some of the motivations for distributed scheduling are the following: 1. Improve cluster utilization by opportunistically executing tasks otherwise idle resources on individual machines. 2. Reduce allocation latency. Tasks where the scheduling time dominates (i.e., task execution time is much less compared to the time required for obtaining a container from the RM). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2356) yarn status command for non-existent application/application attempt/container is too verbose
[ https://issues.apache.org/jira/browse/YARN-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219723#comment-14219723 ] Hadoop QA commented on YARN-2356: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682693/0002-YARN-2356.patch against trunk revision a9a0cc3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5894//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5894//console This message is automatically generated. yarn status command for non-existent application/application attempt/container is too verbose -- Key: YARN-2356 URL: https://issues.apache.org/jira/browse/YARN-2356 Project: Hadoop YARN Issue Type: Bug Components: client Reporter: Sunil G Assignee: Sunil G Priority: Minor Attachments: 0001-YARN-2356.patch, 0002-YARN-2356.patch, Yarn-2356.1.patch *yarn application -status* or *applicationattempt -status* or *container status* commands can suppress exception such as ApplicationNotFound, ApplicationAttemptNotFound and ContainerNotFound for non-existent entries in RM or History Server. For example, below exception can be suppressed better sunildev@host-a:~/hadoop/hadoop/bin ./yarn application -status application_1402668848165_0015 No GC_PROFILE is given. Defaults to medium. 14/07/25 16:21:45 INFO client.RMProxy: Connecting to ResourceManager at /10.18.40.77:45022 Exception in thread main org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1402668848165_0015' doesn't exist in RM. 
at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:285) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:607) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2099) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2095) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2093) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[jira] [Created] (YARN-2882) Introducing container types
Konstantinos Karanasos created YARN-2882: Summary: Introducing container types Key: YARN-2882 URL: https://issues.apache.org/jira/browse/YARN-2882 Project: Hadoop YARN Issue Type: Sub-task Reporter: Konstantinos Karanasos This JIRA introduces the notion of container types. We propose two initial types of containers: guaranteed-start and queueable containers. Guaranteed-start are the existing containers, which are allocated by the central RM and are instantaneously started, once allocated. Queueable is a new type of container, which allows containers to be queued in the NM, thus their execution may be arbitrarily delayed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2882) Introducing container types
[ https://issues.apache.org/jira/browse/YARN-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219763#comment-14219763 ] Carlo Curino commented on YARN-2882: To help understand this notion, think of containers types as a priority. The *guaranteed-start* containers, have higher priority, and are never over-booked (i.e., when they show up in the NM they are started instantaneously). By contrast the *queueable* containers are sent to the NM, and will be started only when there is room in the node. Also if *guaranteed-start* containers show up in a node that was completely utilized running *queueable* containers, the *queueable* containers are preempted/killed, to guarantee the start of the higher priority containers. _(The rest of the comment below is covered in other sub-JIRAs of YARN-2877, adding here some hints to the ideas for context)_ By having an explicit notion of container types, the AM can control when to use one type vs the other. For example, one can use *queueable* containers for tasks that are not yet on the critical path, and/or for short-running tasks (higher chance to complete). One important use of *queueable* containers is to allow us to boost utilization of the nodes (having a queue of work, minimize the times in which the NM is idle). Introducing container types --- Key: YARN-2882 URL: https://issues.apache.org/jira/browse/YARN-2882 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Konstantinos Karanasos This JIRA introduces the notion of container types. We propose two initial types of containers: guaranteed-start and queueable containers. Guaranteed-start are the existing containers, which are allocated by the central RM and are instantaneously started, once allocated. Queueable is a new type of container, which allows containers to be queued in the NM, thus their execution may be arbitrarily delayed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
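As a minimal way to picture the proposal (the actual API introduced by this work may differ from this sketch):
{code}
// Illustrative only: the two container types proposed in this JIRA.
public enum ContainerTypeSketch {
  // Allocated by the central RM; starts as soon as it reaches the NM.
  GUARANTEED_START,
  // May be queued at the NM; started when room is available, and may be
  // preempted to make room for guaranteed-start containers.
  QUEUEABLE
}
{code}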
[jira] [Commented] (YARN-2773) ReservationSystem's use of Queue names vs paths is inconsistent for CapacityReservationSystem and FairReservationSystem
[ https://issues.apache.org/jira/browse/YARN-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219769#comment-14219769 ] Anubhav Dhoot commented on YARN-2773: - getAdmissionPolicy uses the planQueuePath (fully qualified) while rest of the methods (e.g. getPlanQueueCapacity) uses planQueueName (just leaf queue name). ReservationSystem's use of Queue names vs paths is inconsistent for CapacityReservationSystem and FairReservationSystem - Key: YARN-2773 URL: https://issues.apache.org/jira/browse/YARN-2773 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Anubhav Dhoot Priority: Minor Reservation system requires use the ReservationDefinition to use a queue name to choose which reservation queue is being used. CapacityScheduler does not allow duplicate leaf queue names. Because of this we can refer to a unique leaf queue by simply using its name and not full path (which includes parentName + .). FairScheduler allows duplicate leaf queue names because of which one needs to refer to the full queue name to identify a queue uniquely. This is inconsistent for the implementation of the AbstractReservationSystem where one implementation of getQueuePath will do conversion (CapacityReservationSystem) while the FairReservationSystem will return the same value back -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2883) Queuing of container requests in the NM
Konstantinos Karanasos created YARN-2883: Summary: Queuing of container requests in the NM Key: YARN-2883 URL: https://issues.apache.org/jira/browse/YARN-2883 Project: Hadoop YARN Issue Type: Sub-task Reporter: Konstantinos Karanasos We propose to add a queue in each NM, where queueable container requests can be held. Based on the available resources in the node and the containers in the queue, the NM will decide when to allow the execution of a queued container. In order to ensure the instantaneous start of a guaranteed-start container, the NM may decide to pre-empt/kill running queueable containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2882) Introducing container types
[ https://issues.apache.org/jira/browse/YARN-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219783#comment-14219783 ] Konstantinos Karanasos commented on YARN-2882: -- The queuing of containers is discussed in YARN-2883. Introducing container types --- Key: YARN-2882 URL: https://issues.apache.org/jira/browse/YARN-2882 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Konstantinos Karanasos This JIRA introduces the notion of container types. We propose two initial types of containers: guaranteed-start and queueable containers. Guaranteed-start are the existing containers, which are allocated by the central RM and are instantaneously started, once allocated. Queueable is a new type of container, which allows containers to be queued in the NM, thus their execution may be arbitrarily delayed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2884) Proxying all AM-RM communications
Carlo Curino created YARN-2884: -- Summary: Proxying all AM-RM communications Key: YARN-2884 URL: https://issues.apache.org/jira/browse/YARN-2884 Project: Hadoop YARN Issue Type: Sub-task Reporter: Carlo Curino We introduce the notion of an RMProxy, running on each node (or once per rack). Upon start the AM is forced (via tokens and configuration) to direct all its requests to a new services running on the NM that provide a proxy to the central RM. This give us a place to: 1) perform distributed scheduling decisions 2) throttling mis-behaving AMs 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2885) LocalRM: distributed scheduling decisions for queueable containers
Konstantinos Karanasos created YARN-2885: Summary: LocalRM: distributed scheduling decisions for queueable containers Key: YARN-2885 URL: https://issues.apache.org/jira/browse/YARN-2885 Project: Hadoop YARN Issue Type: Sub-task Reporter: Konstantinos Karanasos We propose to add a Local ResourceManager (LocalRM) to the NM in order to support distributed scheduling decisions. Architecturally we leverage the RMProxy, introduced in YARN-2884. The LocalRM makes distributed decisions for queuable containers requests. Guaranteed-start requests are still handled by the central RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2877) Extend YARN to support distributed scheduling
[ https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219844#comment-14219844 ] Karthik Kambatla commented on YARN-2877: +1 to the idea, particularly to reduce the allocation latency. I definitely see Impala wanting to use this in the future. Though not mentioned in the description, I believe scale is probably another big reason for distributed scheduling. bq. Improve cluster utilization by opportunistically executing tasks otherwise idle resources on individual machines. A centralized RM could schedule tasks opportunistically too? Is the intention to quickly adapt to changing resource usage on the node, and the latency due to NM-RM-NM communication being too long to lose this window of opportunity? Extend YARN to support distributed scheduling - Key: YARN-2877 URL: https://issues.apache.org/jira/browse/YARN-2877 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Sriram Rao This is an umbrella JIRA that proposes to extend YARN to support distributed scheduling. Briefly, some of the motivations for distributed scheduling are the following: 1. Improve cluster utilization by opportunistically executing tasks otherwise idle resources on individual machines. 2. Reduce allocation latency. Tasks where the scheduling time dominates (i.e., task execution time is much less compared to the time required for obtaining a container from the RM). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219850#comment-14219850 ] Karthik Kambatla commented on YARN-2884: Given we already have an RMProxy, can we go with LocalRM as Sriram suggested on YARN-2877? Proxying all AM-RM communications - Key: YARN-2884 URL: https://issues.apache.org/jira/browse/YARN-2884 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Carlo Curino We introduce the notion of an RMProxy, running on each node (or once per rack). Upon start the AM is forced (via tokens and configuration) to direct all its requests to a new services running on the NM that provide a proxy to the central RM. This give us a place to: 1) perform distributed scheduling decisions 2) throttling mis-behaving AMs 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2886) Estimating waiting time in NM container queues
Konstantinos Karanasos created YARN-2886: Summary: Estimating waiting time in NM container queues Key: YARN-2886 URL: https://issues.apache.org/jira/browse/YARN-2886 Project: Hadoop YARN Issue Type: Sub-task Reporter: Konstantinos Karanasos This JIRA is about estimating the waiting time of each NM queue. Having these estimates is crucial for the distributed scheduling of container requests, as it allows the LocalRM to decide in which NMs to queue the queuable container requests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2887) AM policies for choosing type of containers
Konstantinos Karanasos created YARN-2887: Summary: AM policies for choosing type of containers Key: YARN-2887 URL: https://issues.apache.org/jira/browse/YARN-2887 Project: Hadoop YARN Issue Type: Sub-task Reporter: Konstantinos Karanasos Each AM can employ policies that determine what type of container (guaranteed-start or queueable) should be requested for each task. An example policy may be to use only guaranteed-start or only queueable containers, or to randomly pick a percentage of the requests to be queueable, or to choose the container type based on the characteristics of the tasks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
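The "random percentage" policy mentioned above could look roughly like this toy sketch; the enum and class below are assumptions for illustration, not a proposed API.
{code}
import java.util.concurrent.ThreadLocalRandom;

// Toy policy: request a queueable container for roughly the given fraction of tasks.
public final class RandomQueueablePolicy {
  public enum ContainerType { GUARANTEED_START, QUEUEABLE }

  private final double queueableFraction;

  public RandomQueueablePolicy(double queueableFraction) {
    this.queueableFraction = queueableFraction;
  }

  public ContainerType chooseType() {
    return ThreadLocalRandom.current().nextDouble() < queueableFraction
        ? ContainerType.QUEUEABLE : ContainerType.GUARANTEED_START;
  }
}
{code}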
[jira] [Commented] (YARN-2877) Extend YARN to support distributed scheduling
[ https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219871#comment-14219871 ] Carlo Curino commented on YARN-2877: Karthik, you are correct... Karthik, glad you like the idea, and you ask good questions... This could be relevant to lower the load on the central RM (hence help with scale), in particular if we have a vast number of short-lived tasks (heavy scheduling cost for little work). (However, we have other ongoing work towards that, which we will post soon, hence the focus on utilization) What takes care of the fast adaption to node conditions is having a local queue (from which to pick more work if I am idle), and the notion of different containers types (i.e., I can kick out the optimistic containers if I am overbooked). With this in mind, the RM could be the one making scheduling decisions for queueable/optimistic containers as well, as you pointed out. What is constant (whether you make the scheduling decisions centrally or distributed), is the notion of different container types (see YARN-2882). This should be exposed to the AM, as it comes with very different level of guarantees on the container start/completion. Thus the AM need to know which type of containers to use for different tasks (e.g., short lived or non-critical-path containers can be optimistic). Extend YARN to support distributed scheduling - Key: YARN-2877 URL: https://issues.apache.org/jira/browse/YARN-2877 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Sriram Rao This is an umbrella JIRA that proposes to extend YARN to support distributed scheduling. Briefly, some of the motivations for distributed scheduling are the following: 1. Improve cluster utilization by opportunistically executing tasks otherwise idle resources on individual machines. 2. Reduce allocation latency. Tasks where the scheduling time dominates (i.e., task execution time is much less compared to the time required for obtaining a container from the RM). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2888) Corrective mechanisms for rebalancing NM container queues
Konstantinos Karanasos created YARN-2888: Summary: Corrective mechanisms for rebalancing NM container queues Key: YARN-2888 URL: https://issues.apache.org/jira/browse/YARN-2888 Project: Hadoop YARN Issue Type: Sub-task Reporter: Konstantinos Karanasos Bad queuing decisions by the LocalRMs (e.g., due to the distributed nature of the scheduling decisions or due to having a stale image of the system) may lead to an imbalance in the waiting times of the NM container queues. This can in turn have an impact in job execution times and cluster utilization. To this end, we introduce corrective mechanisms that may remove (whenever needed) container requests from overloaded queues, adding them to less-loaded ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2877) Extend YARN to support distributed scheduling
[ https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219874#comment-14219874 ] Sriram Rao commented on YARN-2877: -- [~kasha] (1) Yes, the central RM can allocate optimistic containers, however, as you note it introduces extra latency. (2) Scaling the RM's allocation particularly when you have small tasks is another motivation as well. Extend YARN to support distributed scheduling - Key: YARN-2877 URL: https://issues.apache.org/jira/browse/YARN-2877 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Sriram Rao This is an umbrella JIRA that proposes to extend YARN to support distributed scheduling. Briefly, some of the motivations for distributed scheduling are the following: 1. Improve cluster utilization by opportunistically executing tasks otherwise idle resources on individual machines. 2. Reduce allocation latency. Tasks where the scheduling time dominates (i.e., task execution time is much less compared to the time required for obtaining a container from the RM). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219881#comment-14219881 ] Carlo Curino commented on YARN-2884: I agree we should give it another name... but the LocalRM is a slightly different concept YARN-2885, i.e., it is the logic making distributed scheduling decisions. The Proxy itself is just the mechanics to hijack the connection between AM-RM, which we will need for some more work on federating multiple RMs (JIRAs coming soon). Hence the need to call out separately the architectural piece (proxy) and the distributed scheduling logic (LocalRM). Any name suggestion? Proxying all AM-RM communications - Key: YARN-2884 URL: https://issues.apache.org/jira/browse/YARN-2884 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Carlo Curino We introduce the notion of an RMProxy, running on each node (or once per rack). Upon start the AM is forced (via tokens and configuration) to direct all its requests to a new services running on the NM that provide a proxy to the central RM. This give us a place to: 1) perform distributed scheduling decisions 2) throttling mis-behaving AMs 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219894#comment-14219894 ] Subru Krishnan commented on YARN-2884: -- What about RMAgent ? Proxying all AM-RM communications - Key: YARN-2884 URL: https://issues.apache.org/jira/browse/YARN-2884 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Carlo Curino We introduce the notion of an RMProxy, running on each node (or once per rack). Upon start the AM is forced (via tokens and configuration) to direct all its requests to a new services running on the NM that provide a proxy to the central RM. This give us a place to: 1) perform distributed scheduling decisions 2) throttling mis-behaving AMs 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219898#comment-14219898 ] Konstantinos Karanasos commented on YARN-2884: -- Karthik, just a clarification: what is the current RMProxy responsible for? As Carlo says, the functionality needed for the distributed scheduling is explained in more detail in YARN-2885, where we introduce the LocalRM. Proxying all AM-RM communications - Key: YARN-2884 URL: https://issues.apache.org/jira/browse/YARN-2884 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Carlo Curino We introduce the notion of an RMProxy, running on each node (or once per rack). Upon start the AM is forced (via tokens and configuration) to direct all its requests to a new services running on the NM that provide a proxy to the central RM. This give us a place to: 1) perform distributed scheduling decisions 2) throttling mis-behaving AMs 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2604) Scheduler should consider max-allocation-* in conjunction with the largest node
[ https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-2604: Attachment: YARN-2604.patch The new patch addresses Karthik's 2nd suggestion. That actually made it so that we didn't need to exclude anything from findbugs, making the 1st suggestion moot now. I spoke to Karthik offline; the 3rd suggestion does not apply because we're not looking for the node to remove it; we're looking for the new largest node. Scheduler should consider max-allocation-* in conjunction with the largest node --- Key: YARN-2604 URL: https://issues.apache.org/jira/browse/YARN-2604 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Robert Kanter Attachments: YARN-2604.patch, YARN-2604.patch, YARN-2604.patch, YARN-2604.patch, YARN-2604.patch If the scheduler max-allocation-* values are larger than the resources available on the largest node in the cluster, an application requesting resources between the two values will be accepted by the scheduler but the requests will never be satisfied. The app essentially hangs forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2889) Limit in the number of queueable container requests per AM
Konstantinos Karanasos created YARN-2889: Summary: Limit in the number of queueable container requests per AM Key: YARN-2889 URL: https://issues.apache.org/jira/browse/YARN-2889 Project: Hadoop YARN Issue Type: Sub-task Reporter: Konstantinos Karanasos We introduce a way to limit the number of queueable requests that each AM can submit to the LocalRM. This way we can restrict the number of queueable containers handed out by the system, as well as throttle down misbehaving AMs (asking for too many queueable containers). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2738) Add FairReservationSystem for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219914#comment-14219914 ] Anubhav Dhoot commented on YARN-2738: - The issue is that the only configuration in the system is at the per-queue level. I can add a new configuration level for global defaults in addition to this if needed in the future. I have opened YARN-2881 for the FairSchedulerPlanFollower. Add FairReservationSystem for FairScheduler --- Key: YARN-2738 URL: https://issues.apache.org/jira/browse/YARN-2738 Project: Hadoop YARN Issue Type: Sub-task Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-2738.001.patch, YARN-2738.002.patch, YARN-2738.003.patch Need to create a FairReservationSystem that will implement ReservationSystem for FairScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219930#comment-14219930 ] Jonathan Eagles commented on YARN-2375: --- I think creating a separate ticket for enabling timeline server in the mini MR cluster is a good idea. changes look good to me. [~zjshen], any additional feedback before this goes in? Allow enabling/disabling timeline server per framework -- Key: YARN-2375 URL: https://issues.apache.org/jira/browse/YARN-2375 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2375.patch, YARN-2375.patch, YARN-2375.patch, YARN-2375.patch This JIRA is to remove the ats enabled flag check within the TimelineClientImpl. Example where this fails is below. While running secure timeline server with ats flag set to disabled on resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2375: -- Attachment: YARN-2375.1.patch +1. LGTM. I uploaded a new patch to just fix the indent issue for one line. Allow enabling/disabling timeline server per framework -- Key: YARN-2375 URL: https://issues.apache.org/jira/browse/YARN-2375 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2375.1.patch, YARN-2375.patch, YARN-2375.patch, YARN-2375.patch, YARN-2375.patch This JIRA is to remove the ats enabled flag check within the TimelineClientImpl. Example where this fails is below. While running secure timeline server with ats flag set to disabled on resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2604) Scheduler should consider max-allocation-* in conjunction with the largest node
[ https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220030#comment-14220030 ] Hadoop QA commented on YARN-2604: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682729/YARN-2604.patch against trunk revision eb4045e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5895//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5895//console This message is automatically generated. Scheduler should consider max-allocation-* in conjunction with the largest node --- Key: YARN-2604 URL: https://issues.apache.org/jira/browse/YARN-2604 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Robert Kanter Attachments: YARN-2604.patch, YARN-2604.patch, YARN-2604.patch, YARN-2604.patch, YARN-2604.patch If the scheduler max-allocation-* values are larger than the resources available on the largest node in the cluster, an application requesting resources between the two values will be accepted by the scheduler but the requests will never be satisfied. The app essentially hangs forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so
Mit Desai created YARN-2890: --- Summary: MiniMRYarnCluster should turn on timeline service if configured to do so Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Fix For: 2.6.1 Currently the MiniMRYarnCluster does not consider the configuration value for enabling timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN
[ https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220047#comment-14220047 ] Arun C Murthy commented on YARN-2139: - Sorry, been busy with 2.6.0 - just coming up for air. What are we modeling with vdisk again? What is the metric? Is it directly the blkio parameter? If so, that is my biggest concern. [Umbrella] Support for Disk as a Resource in YARN -- Key: YARN-2139 URL: https://issues.apache.org/jira/browse/YARN-2139 Project: Hadoop YARN Issue Type: New Feature Reporter: Wei Yan Attachments: Disk_IO_Scheduling_Design_1.pdf, Disk_IO_Scheduling_Design_2.pdf, YARN-2139-prototype-2.patch, YARN-2139-prototype.patch YARN should consider disk as another resource for (1) scheduling tasks on nodes, (2) isolation at runtime, (3) spindle locality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220045#comment-14220045 ] Mit Desai commented on YARN-2375: - typo YARN-2890 Allow enabling/disabling timeline server per framework -- Key: YARN-2375 URL: https://issues.apache.org/jira/browse/YARN-2375 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2375.1.patch, YARN-2375.patch, YARN-2375.patch, YARN-2375.patch, YARN-2375.patch This JIRA is to remove the ats enabled flag check within the TimelineClientImpl. Example where this fails is below. While running secure timeline server with ats flag set to disabled on resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220044#comment-14220044 ] Mit Desai commented on YARN-2375: - Thanks [~zjshen] for picking that indenting issue. I have filed YARH-2890 for addressing the test case scenario. Allow enabling/disabling timeline server per framework -- Key: YARN-2375 URL: https://issues.apache.org/jira/browse/YARN-2375 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2375.1.patch, YARN-2375.patch, YARN-2375.patch, YARN-2375.patch, YARN-2375.patch This JIRA is to remove the ats enabled flag check within the TimelineClientImpl. Example where this fails is below. While running secure timeline server with ats flag set to disabled on resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN
[ https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220077#comment-14220077 ] Karthik Kambatla commented on YARN-2139: It is very similar to vcores. vdisks is the number of virtual disks, no metric, just a number. If we want to allow up to 'n' tasks to share a disk, {{vdisks = n * num-disks}}. For cases with n > 1, spindle locality will help with ensuring all the 'n' vdisks correspond to the same spindle(s). [Umbrella] Support for Disk as a Resource in YARN -- Key: YARN-2139 URL: https://issues.apache.org/jira/browse/YARN-2139 Project: Hadoop YARN Issue Type: New Feature Reporter: Wei Yan Attachments: Disk_IO_Scheduling_Design_1.pdf, Disk_IO_Scheduling_Design_2.pdf, YARN-2139-prototype-2.patch, YARN-2139-prototype.patch YARN should consider disk as another resource for (1) scheduling tasks on nodes, (2) isolation at runtime, (3) spindle locality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
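As a quick illustration of the sizing rule above, here is a toy calculation of {{vdisks = n * num-disks}}. The class and variable names are invented purely for this example; they are not YARN APIs, configuration keys, or part of any patch on this JIRA.
{code:java}
// Toy illustration of the vdisks sizing rule described in the comment above.
public final class VdisksExample {
  public static void main(String[] args) {
    int numDisks = 8;           // physical disks on the node
    int n = 4;                  // 'n': tasks allowed to share one disk
    int vdisks = n * numDisks;  // vdisks = n * num-disks => 32 virtual disks advertised
    System.out.println("Node advertises " + vdisks + " vdisks");
  }
}
{code}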
[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220091#comment-14220091 ] Hadoop QA commented on YARN-2375: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682743/YARN-2375.1.patch against trunk revision eb4045e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5896//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5896//console This message is automatically generated. Allow enabling/disabling timeline server per framework -- Key: YARN-2375 URL: https://issues.apache.org/jira/browse/YARN-2375 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2375.1.patch, YARN-2375.patch, YARN-2375.patch, YARN-2375.patch, YARN-2375.patch This JIRA is to remove the ats enabled flag check within the TimelineClientImpl. Example where this fails is below. While running secure timeline server with ats flag set to disabled on resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2604) Scheduler should consider max-allocation-* in conjunction with the largest node
[ https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220199#comment-14220199 ] Karthik Kambatla commented on YARN-2604: Thanks Robert. One more thing I missed - we need to handle vcores in addition to memory. I was hoping this would come for free with Resource suggestion, but from looking at the code, I think we should handle vcores alongside memory the way the patch does now. Scheduler should consider max-allocation-* in conjunction with the largest node --- Key: YARN-2604 URL: https://issues.apache.org/jira/browse/YARN-2604 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Robert Kanter Attachments: YARN-2604.patch, YARN-2604.patch, YARN-2604.patch, YARN-2604.patch, YARN-2604.patch If the scheduler max-allocation-* values are larger than the resources available on the largest node in the cluster, an application requesting resources between the two values will be accepted by the scheduler but the requests will never be satisfied. The app essentially hangs forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
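As a side note for readers following the thread, here is a minimal sketch of the clamping idea being discussed, i.e. bounding the configured max-allocation by what the largest registered node offers for both memory and vcores. The helper class and method names are invented for illustration; this is not the actual YARN-2604 patch.
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;

// Hypothetical helper: the effective max-allocation is the configured value,
// capped by the resources of the largest node currently registered.
class MaxAllocationSketch {
  static Resource effectiveMaxAllocation(Resource configuredMax, Resource largestNode) {
    int memoryMb = Math.min(configuredMax.getMemory(), largestNode.getMemory());
    int vcores = Math.min(configuredMax.getVirtualCores(), largestNode.getVirtualCores());
    return Resource.newInstance(memoryMb, vcores);
  }
}
{code}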
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220203#comment-14220203 ] Karthik Kambatla commented on YARN-2884: RMAgent seems okay to me. RMProxy is responsible for creating a proxy for whichever protocol the client wants to use to converse with the RM. Proxying all AM-RM communications - Key: YARN-2884 URL: https://issues.apache.org/jira/browse/YARN-2884 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Carlo Curino We introduce the notion of an RMProxy, running on each node (or once per rack). Upon start, the AM is forced (via tokens and configuration) to direct all its requests to a new service running on the NM that provides a proxy to the central RM. This gives us a place to: 1) perform distributed scheduling decisions, 2) throttle mis-behaving AMs, 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2806) log container allocation requests
[ https://issues.apache.org/jira/browse/YARN-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220214#comment-14220214 ] Eric Wohlstadter commented on YARN-2806: Looking at scheduler.AppSchedulingInfo (lines 141-146, trunk): What is the significance of ResourceRequest.ANY, in terms of determining whether to LOG a ResourceRequest? Why only ResourceRequest.ANY? Why is the ANY location the only one which can determine that updatePendingResources = true? Are all updated resource requests from the AM initiated with a ResourceRequest at the ANY location? Can all allocate calls from the AM which do not include a ResourceRequest.ANY be considered followup requests to a previous initial request for those resources (e.g. by asking for fewer containers in the followup or by modifying preferred locations in the followup)?
{code:title=AppSchedulingInfo(141-146) |borderStyle=solid}
if (resourceName.equals(ResourceRequest.ANY)) {
  if (LOG.isDebugEnabled()) {
    LOG.debug("update:" + " application=" + applicationId + " request=" + request);
  }
  updatePendingResources = true;
{code}
log container allocation requests - Key: YARN-2806 URL: https://issues.apache.org/jira/browse/YARN-2806 Project: Hadoop YARN Issue Type: Bug Reporter: Allen Wittenauer Assignee: Eric Wohlstadter Attachments: YARN-2806.patch I might have missed it, but I don't see where we log application container requests outside of the DEBUG context. Without this being logged, we have no idea, on a per-application basis, of the lag an application might be having in the allocation system. We should probably add this as an event to the RM audit log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Mazzucchelli updated YARN-2664: -- Attachment: YARN-2664.3.patch Hi, Submitted the new version of the patch. The patch includes unit tests and corrections of the previous errors. Improve RM webapp to expose info about reservations. Key: YARN-2664 URL: https://issues.apache.org/jira/browse/YARN-2664 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Matteo Mazzucchelli Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.patch YARN-1051 provides a new functionality in the RM to ask for reservation on resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220239#comment-14220239 ] Hadoop QA commented on YARN-2664: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682778/YARN-2664.3.patch against trunk revision 90194ca. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5897//console This message is automatically generated. Improve RM webapp to expose info about reservations. Key: YARN-2664 URL: https://issues.apache.org/jira/browse/YARN-2664 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Matteo Mazzucchelli Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.patch YARN-1051 provides a new functionality in the RM to ask for reservation on resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2188) Client service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220256#comment-14220256 ] Karthik Kambatla commented on YARN-2188: Sorry, didn't realize this patch was updated. Patch looks mostly good. Some minor comments: # Rename yarn.sharedcache.client.server.* to yarn.sharedcache.client-server.*? # ClientSCMProtocolPBClientImpl#close should set this.proxy to null. # In the test, cleanUp() should set variables to null after stopping them. Client service for cache manager Key: YARN-2188 URL: https://issues.apache.org/jira/browse/YARN-2188 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2188-trunk-v1.patch, YARN-2188-trunk-v2.patch, YARN-2188-trunk-v3.patch, YARN-2188-trunk-v4.patch Implement the client service for the shared cache manager. This service is responsible for handling client requests to use and release resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
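For context on review comment 2 above, here is a minimal sketch of the close() pattern being asked for; the field name, the stand-in type, and the RPC.stopProxy reference follow the usual Hadoop PB-client layout and are assumptions, not the actual ClientSCMProtocolPBClientImpl code.
{code:java}
import java.io.Closeable;

// Sketch only: after stopping the proxy, drop the reference so a closed client
// cannot accidentally reuse it (the point of review comment 2 above).
class PbClientCloseSketch implements Closeable {
  private Object proxy = new Object();  // stand-in for the protobuf RPC proxy

  @Override
  public void close() {
    if (proxy != null) {
      // In the real client this would be RPC.stopProxy(proxy) (assumption).
      proxy = null;
    }
  }
}
{code}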
[jira] [Commented] (YARN-2786) Create yarn cluster CLI to enable list node labels collection
[ https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220269#comment-14220269 ] Vinod Kumar Vavilapalli commented on YARN-2786: --- Let me conclude this; it has gone on for far too long. We should agree to disagree. It's ops _and_ developers, not _or_. If you want to configure manually through scripts, you will get that in the distributed-config setup. The rest of us want to configure programmatically through APIs, and we will have that as an option. I don't see a technical argument against what is done so far, only opinions on which is the right approach. This JIRA is not the place for this discussion; if you have more qualms about this, you should comment on YARN-796. Suggesting design changes in a leaf JIRA is not constructive. Create yarn cluster CLI to enable list node labels collection - Key: YARN-2786 URL: https://issues.apache.org/jira/browse/YARN-2786 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch With YARN-2778, we can list node labels on existing RM nodes. But it is not enough; we should be able to: 1) list the node labels collection. The command should start with "yarn cluster ..."; in the future, we can add more functionality to the yarnClusterCLI -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-2664: --- Attachment: YARN-2664.4.patch simply adding a new-line at the end to get patch to apply Improve RM webapp to expose info about reservations. Key: YARN-2664 URL: https://issues.apache.org/jira/browse/YARN-2664 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Matteo Mazzucchelli Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.4.patch, YARN-2664.patch YARN-1051 provides a new functionality in the RM to ask for reservation on resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220389#comment-14220389 ] Hadoop QA commented on YARN-2664: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682782/YARN-2664.4.patch against trunk revision 90194ca. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 5 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebApp org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebAppFairScheduler org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5898//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5898//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5898//console This message is automatically generated. Improve RM webapp to expose info about reservations. Key: YARN-2664 URL: https://issues.apache.org/jira/browse/YARN-2664 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Matteo Mazzucchelli Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.4.patch, YARN-2664.patch YARN-1051 provides a new functionality in the RM to ask for reservation on resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220395#comment-14220395 ] Carlo Curino commented on YARN-2664: I tried to test this, but: # the patch was missing a new-line at the end to go through with patch... I added it. # It is missing a couple of .js files (I guess imported by other .js)... I think d3.js, possibly more. That was true of your previous patch as well (I manually fixed it in my previous tests). You should definitely include those files in the patch, and please make sure to test this on a clean machine with a clean browser. I am happy to try out a new patch once that is done. (In fact, I would deploy it in a research cluster where we are using this stuff actively, so we get some foot-traffic on the UI). Improve RM webapp to expose info about reservations. Key: YARN-2664 URL: https://issues.apache.org/jira/browse/YARN-2664 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Matteo Mazzucchelli Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.4.patch, YARN-2664.patch YARN-1051 provides a new functionality in the RM to ask for reservation on resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2604) Scheduler should consider max-allocation-* in conjunction with the largest node
[ https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-2604: Attachment: YARN-2604.patch The new patch adds similar code for vcores. I made some other minor changes and updated the unit tests. Scheduler should consider max-allocation-* in conjunction with the largest node --- Key: YARN-2604 URL: https://issues.apache.org/jira/browse/YARN-2604 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Robert Kanter Attachments: YARN-2604.patch, YARN-2604.patch, YARN-2604.patch, YARN-2604.patch, YARN-2604.patch, YARN-2604.patch If the scheduler max-allocation-* values are larger than the resources available on the largest node in the cluster, an application requesting resources between the two values will be accepted by the scheduler but the requests will never be satisfied. The app essentially hangs forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2604) Scheduler should consider max-allocation-* in conjunction with the largest node
[ https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220510#comment-14220510 ] Hadoop QA commented on YARN-2604: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12682805/YARN-2604.patch against trunk revision 90194ca. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5899//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5899//console This message is automatically generated. Scheduler should consider max-allocation-* in conjunction with the largest node --- Key: YARN-2604 URL: https://issues.apache.org/jira/browse/YARN-2604 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Robert Kanter Attachments: YARN-2604.patch, YARN-2604.patch, YARN-2604.patch, YARN-2604.patch, YARN-2604.patch, YARN-2604.patch If the scheduler max-allocation-* values are larger than the resources available on the largest node in the cluster, an application requesting resources between the two values will be accepted by the scheduler but the requests will never be satisfied. The app essentially hangs forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework
[ https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220551#comment-14220551 ] Hudson commented on YARN-2375: -- FAILURE: Integrated in Hadoop-trunk-Commit #6584 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6584/]) YARN-2375. Allow enabling/disabling timeline server per framework. (Mit Desai via jeagles) (jeagles: rev c298a9a845f89317eb9efad332e6657c56736a4d) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java Allow enabling/disabling timeline server per framework -- Key: YARN-2375 URL: https://issues.apache.org/jira/browse/YARN-2375 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2375.1.patch, YARN-2375.patch, YARN-2375.patch, YARN-2375.patch, YARN-2375.patch This JIRA is to remove the ats enabled flag check within the TimelineClientImpl. Example where this fails is below. While running secure timeline server with ats flag set to disabled on resource manager, Timeline delegation token renewer throws an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2356) yarn status command for non-existent application/application attempt/container is too verbose
[ https://issues.apache.org/jira/browse/YARN-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220580#comment-14220580 ] Sunil G commented on YARN-2356: --- Test case failure of TestApplicationClientProtocolOnHA is not related to this patch. yarn status command for non-existent application/application attempt/container is too verbose -- Key: YARN-2356 URL: https://issues.apache.org/jira/browse/YARN-2356 Project: Hadoop YARN Issue Type: Bug Components: client Reporter: Sunil G Assignee: Sunil G Priority: Minor Attachments: 0001-YARN-2356.patch, 0002-YARN-2356.patch, Yarn-2356.1.patch *yarn application -status* or *applicationattempt -status* or *container status* commands can suppress exception such as ApplicationNotFound, ApplicationAttemptNotFound and ContainerNotFound for non-existent entries in RM or History Server. For example, below exception can be suppressed better sunildev@host-a:~/hadoop/hadoop/bin ./yarn application -status application_1402668848165_0015 No GC_PROFILE is given. Defaults to medium. 14/07/25 16:21:45 INFO client.RMProxy: Connecting to ResourceManager at /10.18.40.77:45022 Exception in thread main org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1402668848165_0015' doesn't exist in RM. at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:285) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:607) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2099) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2095) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2093) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) at $Proxy12.getApplicationReport(Unknown Source) at 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.printApplicationReport(ApplicationCLI.java:428) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:153) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:76) Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException): Application with id 'application_1402668848165_0015' doesn't exist in RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
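To make the requested behaviour concrete, here is a hedged sketch of catching the not-found exception in a CLI-style helper and printing a one-line message instead of the stack trace above. The helper class and method are invented for illustration; this is not the actual ApplicationCLI change.
{code:java}
import java.io.IOException;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Illustrative helper: suppress the verbose stack trace and emit a short message.
class StatusSketch {
  static int printStatus(YarnClient client, ApplicationId appId) throws IOException, YarnException {
    try {
      ApplicationReport report = client.getApplicationReport(appId);
      System.out.println("Application " + appId + " is in state " + report.getYarnApplicationState());
      return 0;
    } catch (ApplicationNotFoundException e) {
      System.err.println("Application with id '" + appId + "' doesn't exist in RM.");
      return -1;
    }
  }
}
{code}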
[jira] [Updated] (YARN-2881) Implement PlanFollower for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-2881: Attachment: YARN-2881.prelim.patch Based on YARN-2738. FairSchedulerPlanFollower with unit tests. Implement PlanFollower for FairScheduler Key: YARN-2881 URL: https://issues.apache.org/jira/browse/YARN-2881 Project: Hadoop YARN Issue Type: Sub-task Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-2881.prelim.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2637: -- Attachment: YARN-2637.6.patch Updated patch which passes existing unit tests in the resourcemanager/capacity scheduler area. Still has extra debug logging and needs unit tests specific to the change. Setting patch available to see if unit tests outside what I have checked are impacted/etc. maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.2.patch, YARN-2637.6.patch Currently, the number of AMs in a leaf queue is calculated in the following way:
{code}
max_am_resource = queue_max_capacity * maximum_am_resource_percent
#max_am_number = max_am_resource / minimum_allocation
#max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
{code}
And when a new application is submitted to the RM, it checks whether the app can be activated in the following way:
{code}
for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext(); ) {
  FiCaSchedulerApp application = i.next();
  // Check queue limit
  if (getNumActiveApplications() >= getMaximumActiveApplications()) {
    break;
  }
  // Check user limit
  User user = getUser(application.getUser());
  if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
    user.activateApplication();
    activeApplications.add(application);
    i.remove();
    LOG.info("Application " + application.getApplicationId() + " from user: " + application.getUser() + " activated in queue: " + getQueueName());
  }
}
{code}
An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum resource that AMs can use is 200M. Assuming minimum_allocation = 1M, the number of AMs that can be launched is 200, and if a user uses 5M for each AM (> minimum_allocation), all apps can still be activated, and they will occupy all the resources of the queue instead of only a max_am_resource_percent of the queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
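To spell out the arithmetic in the example above, here is a toy calculation (memory only, values in MB, class name invented, not YARN code) showing how admitting AMs by minimum_allocation lets actual AM usage blow well past the max_am_resource cap:
{code:java}
// Toy arithmetic for the description's example: 1G queue, 20% AM cap,
// 1M minimum allocation, 5M actual AM size.
public final class AmLimitExample {
  public static void main(String[] args) {
    int queueCapacityMb = 1000;
    double maxAmResourcePercent = 0.2;
    int minimumAllocationMb = 1;
    int actualAmSizeMb = 5;

    double maxAmResourceMb = queueCapacityMb * maxAmResourcePercent;  // 200M cap for AMs
    int maxAmNumber = (int) (maxAmResourceMb / minimumAllocationMb);  // 200 AMs admitted
    int actualAmUsageMb = maxAmNumber * actualAmSizeMb;               // 1000M actually used by AMs

    System.out.println("AM cap = " + maxAmResourceMb + "M, actual AM usage = " + actualAmUsageMb + "M");
  }
}
{code}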